
P. Durrant (2014). Corpus frequency and second language learners' knowledge of collocations: A meta-analysis. International Journal of Corpus Linguistics, 19(4): 443-477


NB. This is the author’s post-refereed version of a paper to be published in the International Journal of Corpus Linguistics 19/4 in November 2014. Copyright belongs to John Benjamins Publishing. Please contact John Benjamins for further reprinting or re-use. This paper was first submitted for publication October 2012.

Corpus frequency and second language learners’ knowledge of collocations

A meta-analysis

Philip Durrant, University of Exeter

Tests of second language learners’ knowledge of collocation have lacked a principled strategy for item selection, making claims about learners’ knowledge beyond the particular collocations tested difficult to evaluate. Corpus frequency may offer a good basis for item selection, if a reliable relationship can be demonstrated between frequency and learner knowledge. However, such a relationship is difficult to establish satisfactorily, given the small number of items and narrow range of test-takers involved in any individual study. In this study, a meta-analysis is used to determine the correlation between learner knowledge and frequency data across nineteen previously-reported tests. Frequency is shown to correlate moderately with knowledge, but the strength of this correlation varies widely across corpora. Strength of association measures (such as mutual information) do not correlate with learner knowledge. These findings are discussed in terms of their implications for collocation testing and models of collocation learning.

Keywords: collocation, testing, frequency, formulaic language, vocabulary, SLA

1. Introduction

It has long been recognized that collocations are pervasive in language (Hoey 2005, Sinclair 2004), and that a healthy repertoire of collocations is essential to mastery of a foreign language (e.g. Kjellmer 1990, Lewis 1993, Nattinger & DeCarrico 1992, Palmer 1933, Pawley & Syder 1983). Research on second language learners’ knowledge and acquisition of collocations has gathered pace in recent years, with greater integration of corpus and psycholinguistic methods advancing our understanding and allowing ever more specific questions to be addressed (e.g. Durrant & Schmitt 2010; Webb & Kagimoto 2011; Wolter & Gyllstad 2011, 2013; Yamashita & Jiang 2010).

A major outstanding issue in this area is that of how learners’ knowledge of collocations can be validly assessed. While a number of recent studies have evaluated various test formats (Barfield 2003, Bonk 2001, Gyllstad 2007, Moreno Jaén 2009, Revier 2009), the key question of how collocations can be reliably sampled for inclusion as test items has not been addressed.

As with all ‘selective’ (Read 2000) vocabulary tests, collocation tests utilize samples of items which are small in proportion to the population from which they are drawn. Because we are usually interested not just in learners’ knowledge of the particular items tested, but of collocations more generally, it is essential that items be selected in a principled way to allow inference beyond the sample.

In relation to single-word vocabulary, word frequency has been shown to correlate with likelihood of knowledge (Milton 2009) and it has become common practice to sample items according to this variable. Typically, words are grouped into frequency ‘bands’ and learners’ performance on a sample of words is taken to reflect their knowledge of words in that band (Nation 1990). Since collocation can be seen as a type of vocabulary (in that collocations are linguistic items which need to be specifically learned, rather than being derivable from rules – see Section 2, below) and since models of collocation have claimed that L1 collocation learning is frequency-driven (e.g. Ellis 2001, Hoey 2005), it is tempting to extend this strategy to collocation tests. However, at least two considerations suggest that it would be unwise to do so without further evidence.
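The banding practice described above can be sketched as follows. This is a minimal illustration only: the band size follows the common 1,000-word convention mentioned in the vocabulary-testing literature, and the ranks used in the example are invented.

```python
def frequency_band(rank, band_size=1000):
    """Assign an item to a frequency band by its rank in a corpus
    frequency list: ranks 1-1000 -> band 1, 1001-2000 -> band 2, etc."""
    return (rank - 1) // band_size + 1

# Hypothetical frequency-list ranks for three items:
print(frequency_band(350), frequency_band(1001), frequency_band(3999))
```

A test constructed this way samples a few items from each band and reads performance on the sample as an estimate of knowledge of the whole band.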

First, some researchers have doubted whether collocation learning is frequency-driven for second language learners. Wray (2002), for example, has claimed that adult L2 learners tend not to notice and remember the collocations they encounter. This view suggests that collocation learning is usually the result of explicit memorization of selected forms, rather than exposure, and so implies that collocation knowledge may not be sensitive to frequency.

A second reason to question the frequency-knowledge link for collocations lies in the nature of the corpus data on which frequency counts are based. The logic behind using frequency to predict knowledge is that the more frequent an item is, the more likely learners are to have met it repeatedly. However, few corpora are likely to be representative of the language which any individual learner has encountered (Durrant & Doherty 2010). Corpora are generally designed to represent, not individuals’ experiences, but rather particular types of discourse. A further problem is introduced by limitations in the ways that frequency counts are conducted. In particular, in the absence of fine-grained semantic tagging, counts do not distinguish different senses of polysemous words. Since it is likely that language learners do make such distinctions, this constitutes a further distortion of their experience of the language.

Studies have shown that corpus-based frequency counts are a reasonable guide to learners’ knowledge of relatively frequent words (especially for the 4,000 most frequent words) (Milton 2009). However, the correlation weakens considerably at lower frequencies (Milton 2009). This is probably because, whereas frequent words tend to be frequent across a wide range of situations, lower frequency words are usually associated with particular contexts, and their frequency therefore tends to vary between corpora. For such words, the assumption of a correlation between frequency in a particular corpus and frequency in a given learner’s experience is dubious. This raises problems for collocations because individual items tend to be relatively infrequent. Shin & Nation (2008), for example, find that only 891 collocations have frequencies similar to those of the 4,000 most frequent single words, where Milton (2009) finds frequency to be a reliable predictor.

While the factors discussed above suggest that the relationship between corpus frequency and L2 knowledge of collocations may not be entirely straightforward, a number of recent studies have suggested that some relationship does exist. Durrant & Schmitt (2010) find that – contrary to Wray’s (2002) claims – adult second language learners do retain memories of which words appear together in the language they meet, and that greater repetition leads to greater retention. Similarly, both Siyanova-Chanturia et al. (2011) and Wolter & Gyllstad (2013) find that adult L2 users’ speed of processing English collocations was affected by collocation frequency.

While these studies suggest that frequency is related to L2 knowledge, at least two caveats should be noted. Firstly, these studies were conducted with a relatively narrow range of high proficiency learners. Durrant & Schmitt’s (2010) and Siyanova-Chanturia et al.’s (2011) participants were all students at a single British university, while Wolter & Gyllstad’s (2013) participants showed an impressive mean vocabulary size of 7,350 words, putting them above even the 3,750-5,000 words which are associated with the highest levels (C1 and C2) of the Common European Framework (Wolter & Gyllstad 2013). Whether similar effects will be seen for learners below these high levels remains an open question.

Secondly, these studies did not aim to measure knowledge of the type which is tapped in typical test formats, but rather efficiency of processing (Siyanova-Chanturia et al. 2011, Wolter & Gyllstad 2013) or priming relationships between words (Durrant & Schmitt 2010). Further work is needed to determine whether the frequency effects which these studies show through eye-fixation durations, response times to decision tasks, or priming are also found in students’ responses on standard test tasks.

In response to these issues, the present paper will investigate the extent to which learners’ knowledge of collocations, as measured by typical test formats, is related to collocations’ frequency in a corpus. It aims both to establish whether corpus frequency is a valid strategy for sampling collocation test items and to give guidance on which types of frequency information are most relevant to collocation sampling.

One way of studying the frequency-knowledge relationship would be to create a test including a set of collocations of different frequencies and to determine whether the number of students knowing each collocation is correlated with collocation frequency. However, any individual test administration would be limited by the inevitably small sample of collocations used, the testing method employed and any peculiarities of the test-takers. To gain a more robust data set, therefore, existing literature was reviewed to identify studies which report collocation tests. Frequency data were then retrieved for the collocations in these tests, and correlations between frequency and the percentage of students answering correctly were determined. Meta-analytic techniques were then used to determine overall correlations across all studies. As with all meta-analyses, the logic is that, if the effect we are seeking is robust across multiple studies despite the different types of error inherent in each, we can have a high degree of confidence that the effect is real (Cooper 1998).
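The pooling step can be sketched as follows. This is an illustration of the standard Fisher z-transformation method for averaging correlations across studies, not necessarily the exact computation used in the paper, and the per-test correlations and item counts below are invented.

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform a mean z to a correlation."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def combine_correlations(results):
    """Fixed-effect mean correlation across tests.

    results: list of (r, n) pairs, where r is the correlation between
    item frequency and percentage of learners answering correctly,
    and n is the number of items in that test. Each z is weighted by
    n - 3, the inverse of its sampling variance."""
    num = sum((n - 3) * fisher_z(r) for r, n in results)
    den = sum(n - 3 for r, n in results)
    return inverse_fisher_z(num / den)

# Hypothetical per-test results:
studies = [(0.45, 99), (0.30, 59), (0.55, 38)]
print(round(combine_correlations(studies), 3))
```

Because each correlation is transformed before averaging, tests with more items contribute proportionally more to the pooled estimate.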

2. Defining collocation

The term ‘collocation’ has been used by researchers from a variety of traditions and has been defined in several different ways (see e.g. Barfield & Gyllstad 2009, Nesselhauf 2004). It is therefore important for any study of collocation to define its scope clearly.

Durrant & Mathews-Aydinli (2010) describe three main orientations:

(i) ‘Phraseological approaches’ (e.g. Cowie 1998, Nesselhauf 2004) define collocations as word combinations in which one element does not carry its usual meaning (e.g. take a step, explode a myth) or in which there are restrictions on which words can enter a combination (e.g. commit can only be followed by a small number of nouns, related to wrongdoing; shrug combines almost exclusively with shoulders).

(ii) ‘Frequency-based approaches’ (e.g. Biber 2009, Hoey 1991, Sinclair 1991) define collocations as sets of words which have a statistical tendency to co-occur in texts. These are likely to include collocations as defined in the phraseological approach (e.g. shrug is statistically highly likely to co-occur with shoulders) but also include combinations which do not exhibit semantic specialization or restriction (e.g. next week, drink tea).

(iii) ‘Psycholinguistic approaches’ (e.g. Hoey 2005, Wray 2002) define collocations as combinations of words which have psychological reality in that they are stored holistically or there is an associative link between their elements. This has clear overlaps with the previous categories in that both semantic specialization/restriction and high frequency of occurrence are likely to imply some form of mental representation.

In spite of their differences, all three of these approaches share the idea that collocations are combinations whose behaviour cannot be fully explained in terms of features of their component words and therefore need to be handled as partially independent entities, with their own semantic/distributional/psycholinguistic properties. For language learning purposes, this corresponds to the idea that collocations are combinations of words which need to be independently learned. This conception is captured well by Palmer’s (1933) definition of collocations as:

successions of words […] that (for various, different and overlapping reasons) […] must or should be learnt, or is best or most conveniently learnt as an integral whole or independent entity, rather than by the process of placing together their component parts. (Palmer 1933: 4)

To be of use, however, Palmer’s (1933) definition needs to be developed in two ways. First, it is necessary to spell out the “various, different and overlapping reasons” (Palmer 1933: 4) why a succession of words might be best learnt as a whole. Three main reasons can be cited here:

(i) Semantic opacity: the collocation is semantically non-transparent, i.e. its meaning cannot be reliably predicted on the basis of a knowledge of the meaning its component parts have in other contexts. Examples include small talk and curry favour. Without specific knowledge of the meanings of such collocations, a learner is unlikely to be able to understand or produce them accurately.

(ii) Received usage: particular collocations may become the conventional way of expressing a particular meaning, even though other phrasings are equally plausible. Examples include answer phone and slit throat. Without specific knowledge of such pairings, a learner has a good chance of guessing the “wrong” combination and their language is likely to sound “inauthentic” (Pawley & Syder 1983).

(iii) Fluency: the combination occurs with such high frequency that learning it as an item is likely to promote fast and accurate (efficient) language processing. Possible examples include sunny day and salt and pepper. Without knowledge of such collocations, a learner may not be able to achieve nativelike fluency (Pawley & Syder 1983).

Second, Palmer (1933) does not specify how many words collocations can have. His examples (e.g. there is something the matter with you, to be difficult for someone to do something) seem to indicate that he has no particular limit in mind. Some researchers in the frequency-based/psychological traditions have similarly called combinations of any number of words ‘collocations’ (Biber 2009, Hoey 2005, Kjellmer 1990, Sinclair 1991). However, corpus linguists often make a distinction between two-word combinations and longer sequences, with longer combinations commonly referred to by other terms, such as ‘lexical bundle’ (Biber et al. 1999, Ellis et al. 2008), ‘n-gram’ or ‘concgram’ (Cheng et al. 2006). The differences between these labels are important in corpus research because each involves a different search strategy. For example, whereas lexical bundles are retrieved as fixed contiguous sequences of words, collocations are usually searched for as pairs of words frequently appearing within a certain distance of each other, so allowing greater flexibility regarding their relative positions. Similarly, the various measures which are used to quantify collocation frequency (reviewed below) can vary dramatically across combinations of different length, with frequency dropping and mutual information increasing sharply as the lengths of combinations increase. For these reasons, frequency data about positionally-flexible two-word collocations are not strictly comparable with frequency data about other types of word combination. For studies, such as the present one, which make extensive use of such data, it is therefore important to maintain a distinction between combinations of different lengths. For this reason, the term ‘collocation’ will be used here to refer only to combinations of two words within a given span.

Taking these points into consideration, Palmer’s (1933) formulation can be adapted to define collocations as:

combinations of two words that are best learned as integral wholes or independent entities, rather than by the process of placing together their component parts, either because (i) they may not be understood or appropriately produced without specific knowledge, or (ii) they occur with sufficient frequency that their independent learning will facilitate fluency.

3. Material and methods

Meta-analysis is a technique for synthesizing existing research in order to clarify the relationships between the main variables and to understand the effects of moderating variables (e.g. Cooper 1998, Lipsey & Wilson 2000). Meta-analytic work in the field of language learning is usefully reviewed by Norris & Ortega (2006), who describe three main stages in a meta-analysis: sampling of relevant studies; coding of data relevant to each study; and analysis. The following sections will describe each of these stages in turn.

3.1 Sampling

The first step in the meta-analysis was a comprehensive search of the literature to identify relevant data. To ensure relevance to the research questions and comparability between studies, it is important at this stage to define clear criteria for inclusion in the review. In the present case, studies needed to include descriptions of selective tests of non-native speakers’ knowledge of English collocations and provide information about the numbers of learners answering each test item correctly.

The first step in the review was to search five major databases using the search term: collocation* AND (test* OR learn* OR knowledge). The databases searched were: (i) Web of Knowledge (topic search, refined to ‘Arts Humanities’ and ‘Social Sciences’); (ii) ERIC (abstract search); (iii) Linguistics and Language Behavior Abstracts (abstract search); (iv) PsychInfo (abstract search); (v) PsycArticles (abstract search). Further, both Google and Google Scholar were searched using the search term language collocation* test* learn* acquisition*. Because of the large number of (often irrelevant) hits returned by Google, only the first 250 results were used.

The abstracts of all retrieved items were checked to determine whether they included empirical studies which involved selective tests of non-native speakers’ collocation knowledge in English. 35 such studies were identified.

The second step was to check the bibliographies of all relevant studies for further studies. Google Scholar was also checked for studies citing the retrieved works. Any publication whose title suggested it included some evaluation of learners’ collocational knowledge was retrieved and checked to see if it met the inclusion criteria. This process was repeated recursively with all newly-identified studies.

The review was restricted to papers written in English and either freely available online or accessible through my institution’s library. In a small number of cases, references from other sources suggested that a source which was written in another language or which was not freely available contained information of the type required. These sources (i.e. Jaén 2009, Barfield 2003) were obtained through direct contact with the authors or through my institution’s library.

This process returned a total of 85 studies. Of these, very few included the information required for the meta-analysis. Only 46 studies recorded which collocations were included in their tests. Of these, 14 provided data on the number of students answering each item correctly. Four of these were excluded because test items did not have unique correct answers and so did not show whether learners knew specific target collocations; one was excluded because it focused on collocations from a narrowly-defined area of discourse (Maritime English) for which available corpora were unlikely to provide valid frequency data.

Some of the nine remaining publications included more than one test and so provided multiple data sources. As discussed above, tests were only included if items had a single correct answer. This meant that, for example, Gyllstad’s (2007) COLLMATCH tests and Jaén’s (2009) test 3 were not included in the analysis.

This process provided a total of 19 different tests, summarized in Table 1. The tests were conducted by nine different researchers in eight different countries. Participant numbers ranged from 18 to 340, with a total of 1,568 distinct test takers.

Table 1. Tests included in the meta-analysis

Abdul-Fattah 2001
Participants: 340 10th grade students at 10 different schools in Jordan
Items: 12 V + N, 2 Adj + N
Format: 4-option selected response sentence completion; node given, collocate selected

Barfield 2003 (Chp 3)
Participants: 93 students at a university in Japan (various departments)
Items: 99 V + N
Format: 4-point self-report knowledge scale; no context given

Brashi 2009
Participants: 20 senior undergraduate English Language students at a university in Saudi Arabia
Items: 20 V + N
Format: 4-option selected response completion of sentence context; noun given, verb selected

Farghal & Obeidat 1995 (Test 1)
Participants: 34 junior/senior English majors at a university in Jordan
Items: 7 Adj + N
Format: Sentence completion

Farghal & Obeidat 1995 (Test 2)
Participants: 23 senior English majors at a university in Jordan
Items: 15 Adj + N, 2 N + N
Format: Whole sentence translation from L1 (Arabic)

Gyllstad 2007 (COLLEX 1)
Participants: 18 2nd year undergraduate ELT students at a university in Sweden
Items: 59 V + N
Format: 2-option selected response; noun given, verb selected; no context given

Gyllstad 2007 (COLLEX 2)
Participants: 84 1st year undergraduate English Language students at a university in Sweden
Items: 48 V + N, 12 Adj + N, 2 N + V, 1 Adv + Adj
Format: 2-option selected response; noun given, collocate selected; no context given

Gyllstad 2007 (COLLEX 3)
Participants: 116 1st-2nd year undergraduate English Language students at a university in Sweden
Items: 38 V + N, 8 Adj + N, 2 Adv + Adj
Format: 2-option selected response; noun given, collocate selected; no context given

Gyllstad 2007 (COLLEX 4)
Participants: 188 students in Sweden (26 10th grade high school; 28 11th grade high school; 134 1st year English language undergraduates)
Items: 38 V + N, 8 Adj + N, 2 Adv + Adj
Format: 2-option selected response; noun given, collocate selected; no context given

Gyllstad 2007 (COLLEX 5)
Participants: 24 students in Sweden (7 11th grade high school; 17 1st year undergraduate English Language students)
Items: 38 V + N
Format: 3-option selected response; noun given, verb selected; no context given

Jaén 2009 (Test 1)
Participants: 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain
Items: 22 Adj + N, 13 V + N, 6 N + N, 1 N + V
Format: C-test with sentence context; noun and first letter of collocate given

Jaén 2009 (Test 2)
Participants: 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain
Items: 16 Adj + N, 11 V + N, 1 N + N
Format: Translation: L1 phrase and English node given; test takers supply collocate

Jaén 2009 (Test 4)
Participants: 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain
Items: 23 Adj + N, 15 V + N, 6 N + N, 1 N + V
Format: 4-option selected response sentence completion; node given, collocate selected

Koya 2005 (Test B)
Participants: 130 students at a university in Japan (various departments)
Items: 68 V + N
Format: 3-option selected response completion of sentence context; noun given, verb selected

Kurosaki 2012 (selected response - French)
Participants: 34 French undergraduate students studying English part-time in Paris
Items: 16 V + N, 7 Adj + N, 5 Adv + Adj
Format: 4-option selected response sentence completion; node given, collocate selected

Kurosaki 2012 (selected response - Japanese)
Participants: 30 3rd/4th year non-English major undergraduate students in Japan
Items: 16 V + N, 7 Adj + N, 5 Adv + Adj
Format: 4-option selected response sentence completion; node given, collocate selected

Kurosaki 2012 (translation - French)
Participants: 29 French undergraduate students studying English part-time in Paris
Items: 13 V + N, 8 Adj + Adj, 5 Adj + N
Format: Translation from L1; target sentence provided with whole collocation removed

Kurosaki 2012 (translation - Japanese)
Participants: 38 3rd/4th year non-English major undergraduate students in Japan
Items: 13 V + N, 9 Adj + Adj, 5 Adj + N
Format: Translation from L1; target sentence provided with whole collocation removed

Revier 2009
Participants: 56 students in Denmark (20 10th grade high school; 17 11th grade high school; 19 1st year undergraduate)
Items: 19 V + N
Format: 3-option selected response completion of sentence contexts; each component of collocation selected separately

For various reasons, not all items on all tests were included in the present analysis. Specifically, items were omitted if they did not test collocations as defined in this study (e.g. if they included more than two words or included a non-lexical word) or if more than one answer was accepted by the researchers as correct. Table 1 shows the number and grammatical type of collocations remaining in each test. After adjustments, the tests comprised between 7 and 100 items each, with a total of 724 items across the 19 tests. There was some overlap between tests in the items used. For this reason, the total number of unique (lemmatized) collocations was lower, at 476. The majority of items were verb + noun combinations (349), followed by adjective + noun (99), noun + noun (15) and adverb + adjective (13).

A common problem with meta-analyses is that of publication bias – i.e. that studies tend only to get published if they achieve significant results. This means that meta-analyses which incorporate only published studies may inadvertently exclude contrary evidence. However, the present study is unusual amongst meta-analyses in that the main effect it studies (the relationship between frequency and knowledge) was not a focus of the original studies reviewed. There is therefore no reason to believe that the studies included will demonstrate a greater or lesser relationship between frequency and knowledge than would unpublished studies.

3.2 Coding

The second stage of the meta-analysis was that of coding studies for variables of interest. In this study, the main variables are the percentage of participants correctly answering each item and the frequency of each collocation. The former was provided by the original studies. The latter was retrieved directly from corpora. Because quantification of collocation frequency is a complex issue, involving a number of decisions, this will be described in detail below (Section 3.3).

As well as the main variables, studies need to be coded for any potential moderator variables that might be relevant to the analysis. Four such variables were identified in the current set of studies:

(i) Students’ experience of studying English. Tests can be broadly divided into those in which the test-takers were full-time students on university programmes directly related to English language and those which were not (Gyllstad’s (2007) COLLEX 4 and 5 drew on a mix of university and pre-university students and so will not be included in this analysis);

(ii) Students’ L1. These can be divided into European languages (Danish, French, Spanish and Swedish), Arabic and Japanese;

(iii) Test task type. The main types used are selected response and translation. Three other task types (self-report, sentence-completion, and C-test) are combined under the category ‘other’;

(iv) Whether test-takers are asked to provide the whole collocation or only the collocate.

Table 2 shows how the 19 tests are categorized on each of these variables.

Table 2. Categorization of tests according to possible moderators

Source                      English majors  L1        Task type          Whole collocation required
Abdul-Fattah                No              Arabic    Selected response  No
Barfield                    No              Japanese  Other              Yes
Brashi                      Yes             Arabic    Selected response  No
Farghal & Obeidat (Test 1)  Yes             Arabic    Other              No
Farghal & Obeidat (Test 2)  Yes             Arabic    Translation        Yes
Gyllstad (COLLEX 1)         Yes             European  Selected response  No
Gyllstad (COLLEX 2)         Yes             European  Selected response  No
Gyllstad (COLLEX 3)         Yes             European  Selected response  No
Gyllstad (COLLEX 4)         Mixed           European  Selected response  No
Gyllstad (COLLEX 5)         Mixed           European  Selected response  No
Jaén (Test 1)               Yes             European  Other              No
Jaén (Test 2)               Yes             European  Translation        No
Jaén (Test 4)               Yes             European  Selected response  No
Koya                        No              Japanese  Selected response  No
Kurosaki (MC Fr)            No              European  Selected response  No
Kurosaki (MC Jp)            No              Japanese  Selected response  No
Kurosaki (trans. Fr)        No              European  Translation        Yes
Kurosaki (trans. JP)        No              Japanese  Translation        Yes
Revier                      No              European  Selected response  Yes

3.3 Frequency data

Collocation frequency can be quantified in a number of different ways (see Schmitt 2010 for a review). Since it is unclear which of these is most likely to be related to learner knowledge, several different methods were used.

The first variable which needs to be considered in counting collocations is the ‘span’ of text within which two words need to occur to be counted as an example of the collocation. Collocates can occur at quite some distance from each other, as the following Example (1) of the collocation realize dream, taken from the Corpus of Contemporary American English (COCA) (Davies 2008-), illustrates:

(1) The old dream of wireless communication through space has now been realized

Thus if the span used in our search for collocations is too narrow, many genuine examples will be missed. However, as the span is widened, the chances of counting word pairs which are not in a collocational relationship increase. Consider Example (2), again taken from COCA:

(2) she realizes that the buzzing sound from her dream is present in her bedroom.

The balance we need to achieve in setting a search span, therefore, is to maximize the number of genuine collocations while minimizing the number of false hits. The former pushes us to widen our search span, while the latter pushes us to keep it narrow. Jones & Sinclair’s (1974) claim that most collocates are found within four words to the left or right of their node has led to the widespread adoption of a 4:4 span. However, there has been little direct validation of this claim. The present research will therefore adopt two spans: a conservative 4:4 and a more liberal 9:9. Results from both types of search will be compared with student scores to see which is the better predictor of knowledge.
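The two span settings can be illustrated with a sketch of a simple window-based co-occurrence counter (a simplified illustration assuming pre-tokenized text, not the search software actually used in the study). Run on Example (2), it shows how widening the span from 4:4 to 9:9 turns a non-collocational pairing into a hit:

```python
def collocation_count(tokens, node, collocate, span):
    """Count occurrences of collocate within +/- span words of node,
    i.e. a symmetric window (span=4 for the 4:4 setting, 9 for 9:9)."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            # Words up to `span` positions to the left and right of the node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + span + 1]
            count += window.count(collocate)
    return count

# Example (2) from the text: 'dream' is 7 words to the right of 'realizes'.
text = ("she realizes that the buzzing sound from her dream "
        "is present in her bedroom").split()
print(collocation_count(text, "realizes", "dream", 4))  # 4:4 span: no hit
print(collocation_count(text, "realizes", "dream", 9))  # 9:9 span: one (false) hit
```

As the text notes, the wider span catches more genuine collocations in real corpora, but, as here, it also counts pairs that merely happen to fall within the window.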

A second variable that must be considered is that of

whether counts for separate forms of a word should be

combined – such that, for example, argue strongly and argued

strongly would count as two occurrences of a single

collocation – or whether separate counts should be made for

each form. While Halliday (1966) argues for the former on

the grounds that treating different forms separately would

add complexity without a gain in descriptive power, many

corpus linguists have noted that conflating forms risks

disguising important differences between the collocations

of different forms of a word (Clear 1993, Hoey 2005,

Sinclair 1991, Stubbs 1996). Both of these arguments, it

should be noted, are based on the priorities of descriptive

linguists. For our purposes, the important question is

which approach produces counts which are relevant to

students’ likelihood of knowing a collocation. While there

is some evidence that learners do not always transfer their

knowledge of one form of a word to another (Schmitt &

Zimmerman 2002), I would argue that the default assumption

should be that learning will usually take place at least at

the lemma level – for example, encountering argue strongly

will increase a learner’s chances of recognizing argued

strongly as an appropriate collocation. Most of the frequency

counts used in this study therefore combined counts of

differently inflected forms of the component words.

However, since the assumption that lemmatised counts

provide a better estimate of knowledge is yet to be

substantiated, one frequency count based on unlemmatised

word forms was also provided for comparison.
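As an illustration of what lemmatised counting involves, the sketch below collapses hypothetical per-form counts into a single lemma-level count. The figures and the lemma map are invented; a real study would use a proper lemmatizer:

```python
from collections import Counter

# Hypothetical per-form counts for the argue + strongly collocation
form_counts = {
    ("argue", "strongly"): 120,
    ("argues", "strongly"): 45,
    ("argued", "strongly"): 160,
    ("arguing", "strongly"): 30,
}

# Minimal hand-made lemma map (assumption for illustration only)
lemma_of = {"argue": "argue", "argues": "argue", "argued": "argue",
            "arguing": "argue", "strongly": "strongly"}

# Collapse form-form counts into lemma-lemma counts
lemma_counts = Counter()
for (w1, w2), n in form_counts.items():
    lemma_counts[(lemma_of[w1], lemma_of[w2])] += n

print(lemma_counts[("argue", "strongly")])  # 355
```

Under this treatment, argue strongly and argued strongly contribute to one count; the unlemmatised alternative simply keeps the four entries of `form_counts` separate.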

A third factor that needs to be considered is the

measure used to quantify collocation frequency. The

simplest approach is to record the number of times a

combination appears. However, such counts tend to give

undue prominence to combinations of very high-frequency

words (of the, and a, etc.), which co-occur very frequently

by chance alone, while sidelining genuine collocations

which consist of low-frequency words (abject poverty, battering

ram, etc.). A number of methods have been suggested to

overcome these problems. Perhaps the most widely used are

the ‘t-score’ and ‘mutual information’ (MI) statistics. The

rationale for and calculation of these statistics are

discussed in detail elsewhere (Manning & Schütze 1999) so

will be described only briefly here.

Both statistics work by comparing the actual frequency

of co-occurrence of a pair of words with the frequency we

would expect them to co-occur by chance alone, given the

individual frequency of each word. Expected frequency E is

calculated using the formula

E = (w1 × w2) / C

where C is the total number of word tokens in the corpus,

and w1 and w2 are the frequencies of each of the component

words.

T-score and MI are then calculated with the formulas

t = (O − E) / √O

MI = log2(O / E)

where O is the observed frequency of a combination.
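A minimal sketch of these calculations; the frequencies in the example are invented for illustration:

```python
import math

def association_scores(O, w1, w2, C):
    """Expected frequency, t-score and MI for a word pair.
    O: observed co-occurrence frequency of the pair;
    w1, w2: frequencies of the component words;
    C: total number of word tokens in the corpus."""
    E = (w1 * w2) / C            # co-occurrence expected by chance
    t = (O - E) / math.sqrt(O)   # evidence that O exceeds chance
    mi = math.log2(O / E)        # strength of association
    return E, t, mi

# Invented figures: a pair seen 50 times in a 100-million-word
# corpus, whose components occur 2,000 and 3,000 times.
E, t, mi = association_scores(O=50, w1=2000, w2=3000, C=100_000_000)
```

With these figures the pair occurs far more often than chance predicts (E well below 1), so both t-score and MI are high; a pair of very frequent function words would instead have a large E and a much lower MI.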

The logic behind these two statistics is rather

different, and this results in characteristically different

types of collocations being highlighted by each. MI is a

measure of the extent to which the probability of meeting

one word increases once we encounter the other. T-score, on

the other hand, is a hypothesis testing technique, which

evaluates how much evidence there is that a particular

combination occurs more frequently than we would expect by

chance alone, given the frequencies of its component parts.

As Clear (1993) puts it, whereas “MI is a measure of the

strength of association between two words”, t-score indicates “the

confidence with which we can claim there is some association” (Clear

1993: 279-282, original emphases). Clear (1993) gives the

example of taste arbiters as a combination with a high MI.

Though the pairing is not particularly frequent, a high

proportion of occurrences of each of its component words

are found as part of this collocation, with, according to

Clear’s (1993) data, one quarter of all occurrences of

arbiters being found within a two word span of an occurrence

of taste. The two words are therefore strongly associated in

that, where we find arbiters, we are also likely to find

taste. However, the relatively low frequency of the

collocation means that we cannot be confident that the

association is generalisable – i.e. that we would encounter

it in other samples of language. The pairing taste for, on

the other hand, is an example of a collocation with a high

t-score. Though the association between these words is

weaker than that between arbiter and taste, in that neither

word is a strong predictor of the other, the pair occurs

much more frequently, so we can be more confident in the

generalisability of the association.

Both of these measures of association are non-

directional, in that it makes no difference which word is

taken as node and which as collocate. Clearly, however, the

relationship between two parts of a collocation is often

not symmetrical. The association from arbiters to taste, for

example, is likely to be much stronger than that from taste

to arbiters since, while a very high proportion of

occurrences of arbiters is found in co-occurrence with taste,

the reverse is not true. Since many of the task types

included in the present analysis ask test-takers to

identify a collocate when a particular node is given, this

directionality may be important. For this reason, the

analysis will also include the ‘conditional probability’

measure described by Durrant (2008: 84-85). This shows the

probability of a particular word appearing, given that

another particular word has appeared. It is calculated as:

P(w2 | w1) = f(w1, w2) / f(w1)

where f(w1, w2) is the observed frequency of the pair and f(w1) is the frequency of the word that is given.
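As a sketch, the measure and its directionality can be illustrated as follows. The figures loosely echo Clear's taste/arbiters example but are invented:

```python
def conditional_probability(pair_freq, given_word_freq):
    """Probability of meeting the collocate, given an occurrence of
    the node word. Unlike t-score and MI, swapping node and
    collocate generally changes the result."""
    return pair_freq / given_word_freq

# Invented figures: a quarter of 'arbiters' tokens occur with
# 'taste', but only a tiny fraction of 'taste' tokens occur
# with 'arbiters'.
p_given_arbiters = conditional_probability(25, 100)    # 0.25
p_given_taste = conditional_probability(25, 40000)     # 0.000625
```

The asymmetry between the two values is exactly what the non-directional t-score and MI cannot express.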

A further point that needs to be taken into account when

quantifying collocation frequency is the nature of the

corpus consulted. To determine the extent to which

learners’ knowledge of collocation is frequency-driven, the

best corpus would be one representative of each student’s

lifetime exposure to the language. Since such corpora do

not exist, we need to work instead with more general

corpora which may approximate to the types of exposure a

variety of learners, on average, experience. With this aim,

two widely used corpora were consulted: the British National

Corpus (accessed through Davies’s BYU-BNC interface (Davies

2004-)) and the Corpus of Contemporary American English (Davies

2008-). Both of these corpora are intended to be

representative of a national variety of English. The BNC is

a corpus of approximately 100 million words of British

English from the late 20th century. It includes around 10

million words of transcribed spoken language and 90 million

words of written language, sampled from across five genres

(academic, fiction, magazine, newspapers, non-academic non-

fiction) plus one “miscellaneous” category. At the time of

writing, the COCA includes around 450 million words of

American English from the years 1990 to 2012. It is sampled

in roughly equal amounts from spoken, academic, fiction,

newspaper and magazine genres. Since it is possible that

certain genres within each corpus will be more

representative of learners’ experience than others,

frequency information was retrieved both for the corpora as

wholes and separately for each genre within them.

A related issue is that of ‘dispersion’ – i.e. the

extent to which a collocation’s occurrences are evenly

spread throughout a corpus. Items which are frequent only

because they are used intensively in a narrow range of

texts represent a different learning prospect from items

which occur regularly throughout the language. In general,

it seems likely that more learners will have more exposure

to a collocation that is widely dispersed than one which is

restricted to a small range of texts. It is therefore worth

asking whether learners have a better chance of knowing

more widely dispersed collocations than those which are

more restricted in their use. Several measures of

dispersion have been proposed in the literature (Gries

2008). The measure adopted here was Gries’s (2008) DP. This

is calculated by (i) dividing the corpus into sections (in

the present analysis, the sections will be the separate

genres within each corpus); (ii) determining the size of

each section and normalizing this against the overall size

of the corpus to determine what percentage of occurrences

of a collocation can be expected to appear in that section,

if the collocation is equally distributed across sections;

(iii) determining the actual percentage of occurrences of

the collocation which is found in each section; (iv)

computing the differences between expected and actual

occurrences of the collocation in each section, summing

these differences and dividing them by two. This provides a

number, ranging between 0 and 1, where values close to 0

show an even distribution of the collocation across

sections and values close to 1 show a strong bias towards

particular sections.
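Steps (i)-(iv) can be sketched as a single function; the genre sizes and occurrence counts in the example are invented:

```python
def gries_dp(section_sizes, section_counts):
    """Gries's (2008) DP for one collocation.
    section_sizes: token counts of each corpus section (e.g. genre);
    section_counts: the collocation's occurrences in each section.
    Values close to 0 = even dispersion; close to 1 = concentrated."""
    total_size = sum(section_sizes)
    total_count = sum(section_counts)
    diff_sum = 0.0
    for size, count in zip(section_sizes, section_counts):
        expected = size / total_size     # step (ii): expected share
        observed = count / total_count   # step (iii): actual share
        diff_sum += abs(expected - observed)  # step (iv): differences
    return diff_sum / 2                  # step (iv): sum halved

# Five equal-sized genres, all 100 occurrences in one genre: DP ≈ 0.8
uneven = gries_dp([20, 20, 20, 20, 20], [100, 0, 0, 0, 0])
# Occurrences spread evenly across the genres: DP = 0
even = gries_dp([20, 20, 20, 20, 20], [20, 20, 20, 20, 20])
```

A collocation confined to, say, the fiction sections of a corpus would thus receive a high DP even if its raw frequency were substantial.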

As the discussion so far shows, collocation frequency

can be quantified in many ways. The present research aims

to determine both whether frequency in general is related

to learners’ likelihood of knowing a collocation and which

of the methods of quantifying frequency are the best

predictors of knowledge. With this aim, several different

frequency statistics were employed. The first analyses

employed frequency data from BNC and COCA as wholes.

Collocation frequency was calculated in a number of ways.

As the 4:4 span appears to be the most commonly-used in the

literature (Hoey 2005) and as lemmatized frequencies have

been argued to be the more relevant, the main analysis used

lemmatized frequency with a span of 4:4 words. To determine

whether different results are obtained when span and

lemmatization change, additional counts were made based on

lemmatized frequency with a span of 9:9 words and non-

lemmatized frequency with span of 4:4 words.

In addition, the three measures of association (t-score;

MI; conditional probability) and the measure of dispersion

(DP) discussed above were calculated. To avoid an

unmanageable multiplication of analyses, these measures

were not calculated separately for all of the three

collocation counts. For the reasons described in the

previous paragraph, counts of lemmatized frequency with a

span of 4:4 were used for this purpose. As a second step,

separate frequency data were provided for each genre within

the two corpora, i.e. in COCA: Academic; Fiction; Magazine;

Newspapers; Spoken. In BNC: Academic; Fiction; Magazine;

Newspapers; Non-academic; Spoken. Again to avoid an

unsustainable multiplication of analyses, only lemmatized

collocation frequency with a span of 4:4 was used for each

genre.

3.4 Analysis

Data analysis took place in two stages. First, for each

test, the percentage of learners correctly answering each

item was correlated with each of the frequency measures

described above. Second, a meta-analysis was conducted to

find the average correlations across all 19 tests. While

the first stage is straightforward, the second is more

complicated and will be described here in detail. The

procedures described here draw on the guidance provided by

Lipsey & Wilson (2000).

The aims of a meta-analysis are to provide a single mean

effect size which summarizes results from different studies

and to determine the variation between different studies.

While the former gives an overall indication of the

influence of the main predictor variable, the latter allows

examination of what other variables moderate this effect.

Because studies which are conducted with a large number of

participants are, other things being equal, more likely to

provide a reliable effect size than studies based on

smaller samples, the mean effect size is weighted to give

more importance to studies with larger subject samples.

Weighting is achieved by multiplying each effect size by

the inverse of the standard error for the sample. Because

correlations have problematic standard error formulations,

they are usually transformed using Fisher’s Z-transform

before the weighting takes place. Z-transformed

correlations are calculated using the formula:

ES_zr = 0.5 × loge((1 + r) / (1 − r))

Once Fisher’s Z transformation has been made, the mean

weighted effect size is found by:

(i) Calculating a weighting for each effect size.

This is the inverse of the variance for the sample. In

the present case

SE_zr = 1 / √(n − 3)

w_zr = 1 / SE_zr² = n − 3

where n is the sample size;

(ii) Calculating weighted effect sizes by multiplying

each effect size by its weighting;

(iii) Calculating mean weighted effect size by

dividing the sum of weighted effect sizes by the sum of

weightings;

(iv) Calculating the standard error of the weighted

mean effect size. This is calculated as:

SE_ES = √(1 / Σ wi)

(v) Calculating the 95% confidence intervals for the

mean using the standard error. This is calculated by

adding/subtracting the product of the standard error and

the critical value for the z-distribution (1.96)

to/from the mean weighted effect size:

ES_L = ES − 1.96 × (SE_ES)
ES_U = ES + 1.96 × (SE_ES)

(vi) Converting the mean correlation and confidence

interval from Z-transformed figures back to the

original correlation type using the inverse

transformation:

r = (e^(2 × ES_zr) − 1) / (e^(2 × ES_zr) + 1)
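The six steps can be sketched as a single function. The correlations and sample sizes in the usage example are invented, not taken from the tests analysed here:

```python
import math

def weighted_mean_correlation(correlations, sample_sizes):
    """Fisher-Z weighted mean correlation with a 95% CI,
    following steps (i)-(vi)."""
    # Fisher's Z transformation of each correlation
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in correlations]
    # (i) weights: inverse variance, which is n - 3 for z-transformed r
    ws = [n - 3 for n in sample_sizes]
    # (ii)-(iii) weighted mean effect size
    mean_z = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # (iv) standard error of the weighted mean
    se = math.sqrt(1 / sum(ws))
    # (v) 95% confidence interval in z-space
    lo_z, hi_z = mean_z - 1.96 * se, mean_z + 1.96 * se
    # (vi) back-transform z values to correlations
    inv = lambda z: (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)
    return inv(mean_z), inv(lo_z), inv(hi_z)

# Invented example: three tests with r = .26, .20, .10 and
# 50, 120 and 80 test-takers respectively.
mean_r, lo, hi = weighted_mean_correlation([0.26, 0.20, 0.10], [50, 120, 80])
```

Note how the weighting pulls the mean towards the correlations from the larger samples, rather than averaging the three values equally.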

As discussed above, the aim of this meta-analysis is to

allow generalization both to a wider body of L2 learners

and to a broader population of collocations. For this

reason, there are two sample sizes of relevance: the number

of participants taking a test, and the number of

collocations included on that test. For this reason, two

meta-analyses were performed, one for each sample size.

Meta-analyses rely on the assumption that results from

the different effect sizes they combine are independent of

each other. This assumption is usually thought to be met if

no more than one effect size in the analysis is taken from

a single subject sample, though some researchers have

argued that results conducted by the same team should also

be considered dependent (Lipsey & Wilson 2000: 112). In the

present meta-analysis, three types of violation of

independence are relevant. Firstly, as Table 2 showed,

there is some overlap between the collocations sampled in

each test. In most cases, the overlaps are small. However,

the two versions of Farghal & Obiedat’s (1995) test, the

five versions of Gyllstad’s (2007) COLLEX test and the four

versions of Kurosaki’s (2012) test have substantial

overlaps. It is therefore not likely that the effect sizes

from these four tests will be independent of each other.

Secondly, the three tests conducted by Jaén (2009) were all

carried out with the same group of participants. Again,

therefore, the assumption of independence is likely not to

have been met. Thirdly, the four sets of studies just

mentioned were each conducted by the same researchers. In

addition to the overlaps in their samples, therefore, they

also fail to meet the stricter criterion that effect sizes

from studies conducted by the same researchers not be

considered independent. For this reason, the correlations

from each of these four sets of tests were combined into

four single values by taking weighted averages of the

correlations from each test. These average correlations

were then used in the meta-analysis, rather than separate

correlations for each test.

4. Results

Results from the first stage in the analysis are shown in

Table 3 (for COCA data) and 4 (for BNC data). As

collocation frequencies are not normally distributed,

Spearman’s r was used to quantify correlation. All three

counts of collocation frequency showed positive

correlations with learner knowledge for the majority of

tests, though the size of the correlation varied widely

across tests and between COCA and BNC counts (with the

former producing the higher correlations). The same pattern

holds for correlations with t-scores and conditional

probability. DP shows the expected negative correlation in

a majority of cases. The results for MI show a high degree

of variability, with positive correlations in 13 tests

using COCA data and in 9 tests using BNC data.

There are not sufficient data here to enable a reliable

analysis of factors that might affect variation in scores

between tests. However, it is worth looking at how these

data vary across potential moderators. This is important

both to provide clues as to potential effects that future

research might investigate and to support interpretation of

the meta-analysis, which relies on the assumption that

effect sizes come from a single population and that

differences between effect sizes are due to random errors,

rather than systematic moderating factors. Section 3.2

described four variables that might moderate the current

findings: learners’ experience of studying English (English

majors vs. non-English majors); learners’ L1; task type

(selected-response vs. translation); and whether test-

takers are asked to provide the whole collocation or only

the collocate.

Table 3. Correlations of learner knowledge with COCA frequency data (Spearman’s r)

                            Lemma-   Lemma-   Form-             Cond.    Gries’s
Test                        Lemma    Lemma    Form    t-score   MI       prob.    DP
                            4:4      9:9      4:4
Abdul-Fattah                 0.23     0.16     0.43    0.25      0.32     0.24     0.13
Barfield                     0.26     0.31     0.23    0.24     -0.31    -0.14    -0.24
Brashi                       0.49     0.53     0.40    0.45     -0.22     0.24    -0.08
Farghal & Obeidat (Test 1)   0.34     0.34     0.25    0.45      0.51     0.60    -0.80
Farghal & Obeidat (Test 2)   0.26     0.22     0.27    0.26      0.23     0.07    -0.31
Gyllstad (COLLEX 1)          0.45     0.44     0.38    0.45      0.08     0.52    -0.26
Gyllstad (COLLEX 2)          0.57     0.56     0.39    0.57     -0.03     0.33     0.07
Gyllstad (COLLEX 3)          0.39     0.37     0.27    0.39      0.16     0.17    -0.05
Gyllstad (COLLEX 4)          0.23     0.22     0.19    0.22     -0.08     0.15    -0.14
Gyllstad (COLLEX 5)          0.07     0.05     0.10    0.06      0.04     0.14    -0.10
Jaén (Test 1)                0.47     0.49     0.43    0.45      0.15     0.35    -0.64
Jaén (Test 2)                0.15     0.16     0.12    0.14      0.09     0.22    -0.35
Jaén (Test 4)                0.10     0.13     0.03    0.09     -0.26     0.14    -0.17
Koya                         0.06     0.03     0.09    0.09      0.40     0.10     0.26
Kurosaki (MC Fr)             0.04    -0.02    -0.18    0.07      0.02     0.04     0.41
Kurosaki (MC Jp)             0.42     0.37     0.19    0.46      0.07     0.30     0.29
Kurosaki (trans. Fr)         0.52     0.53     0.35    0.51      0.21     0.62     0.33
Kurosaki (trans. JP)         0.39     0.38     0.41    0.42      0.19     0.40     0.02
Revier                      -0.06     0.01     0.18   -0.02     -0.45    -0.24    -0.29

Table 4. Correlations of learner knowledge with BNC frequency data (Spearman’s r)

                            Lemma-   Lemma-   Form-             Cond.    Gries’s
Test                        Lemma    Lemma    Form    t-score   MI       prob.    DP
                            4:4      9:9      4:4
Abdul-Fattah                 0.01     0.15     0.09    0.29      0.16     0.23     0.06
Barfield                     0.06     0.12     0.16    0.05     -0.30    -0.22    -0.13
Brashi                       0.57     0.62     0.51    0.57     -0.11     0.35    -0.43
Farghal & Obeidat (Test 1)   0.28     0.34     0.11    0.22      0.17     0.32    -0.28
Farghal & Obeidat (Test 2)   0.04     0.12     0.11    0.01     -0.10    -0.13     0.09
Gyllstad (COLLEX 1)          0.32     0.33     0.29    0.31     -0.03     0.45    -0.29
Gyllstad (COLLEX 2)          0.44     0.44     0.27    0.44     -0.15     0.22     0.02
Gyllstad (COLLEX 3)          0.14     0.16     0.12    0.14      0.06     0.02    -0.01
Gyllstad (COLLEX 4)          0.08     0.08     0.17    0.08     -0.09     0.13    -0.04
Gyllstad (COLLEX 5)          0.03     0.04     0.18    0.03      0.00     0.14     0.02
Jaén (Test 1)                0.35     0.39     0.34    0.34     -0.20     0.38    -0.35
Jaén (Test 2)               -0.11    -0.06    -0.03   -0.11     -0.18     0.13    -0.22
Jaén (Test 4)                0.20     0.20     0.02    0.19     -0.22     0.27    -0.17
Koya                        -0.15    -0.14    -0.02   -0.13      0.33    -0.02    -0.03
Kurosaki (MC Fr)             0.07     0.05     0.00    0.10      0.10     0.21     0.01
Kurosaki (MC Jp)             0.38     0.37     0.15    0.38      0.03     0.20    -0.06
Kurosaki (trans. Fr)         0.31     0.34     0.13    0.30      0.03     0.47     0.24
Kurosaki (trans. JP)         0.26     0.25     0.19    0.27      0.11     0.30     0.03
Revier                      -0.13    -0.15    -0.20   -0.16     -0.38    -0.14    -0.39

The boxplots in Figures 1-4 show the spreads of

correlations between learner knowledge and each frequency

measure for each of these variables. Space restrictions do

not allow figures to be included for all analyses, so

lemmatized 4:4 counts only are used to represent frequency

counts. Patterns were similar across 9:9 span and non-

lemmatized counts. The majority of the plots do not show

any obvious differences between groups. The only strong

difference is seen in Figure 3, which shows that the three

tests which use neither translation nor selected-response

formats tend to show a negative correlation with DP which is

not evident on the other test types. There was also a

(slightly weaker) tendency for English majors to show a

negative correlation with DP, which was not present for the

non-English majors (see Figure 1). While these patterns are

weak, and based on a relatively small number of cases

(especially the “other” category of test type), they do

suggest that care will need to be taken in the

interpretation of meta-analysis results related to DP.

The aim of the meta-analysis is to provide a clearer

overview of the trends in these rather mixed data. Results

are shown in Table 5. For COCA data, both frequency and t-

score show weak-to-moderate correlations with learner

knowledge. No clear differences are evident between

different frequency counts: in the by-item analysis,

lemmatized counts had a stronger correlation, while in the

by-participants analysis, the non-lemmatized count did

better. Differences in span also did not seem to affect the

correlation in a consistent way.

Evidence for the other measures was rather mixed.

Conditional probability and DP both showed weak

correlations with knowledge (the former positive, the

latter negative), though neither was statistically reliable

in the by-items analysis. Results for MI are inconsistent,

showing a small negative correlation in the analysis by

items and a positive correlation in the analysis by

participants.

Data from the BNC correlated more weakly with learner

knowledge. As with COCA, there was little difference

between the various measures of collocation frequency. DP

shows a negative (and statistically reliable) correlation

with knowledge at about the same level (r = -.12 / -.10) as

that seen for COCA. Conditional probability also showed a

correlation of similar magnitude to that seen for COCA, but

this was again unreliable in the by-items analysis. MI

showed only weak, and statistically unreliable, negative

correlations.

Figure 1: Correlations across English majors (shaded) vs. non-English majors (white). [Boxplots for Lemma-lemma ±4, MI, Conditional probability and DP, each for COCA and BNC.]

Figure 2: Correlations across L1s (Arabic (dark grey) / European (light grey) / Japanese (white)). [Boxplots for Lemma-lemma ±4, Lemma-lemma ±9, Form-form ±4, MI, Conditional probability and DP, each for COCA and BNC.]

Figure 3: Correlations across test types (selected response (dark grey) / translation (light grey) / other (white)). [Panels as in Figure 2.]

Figure 4: Correlations across collocate-only (shaded) vs. whole-collocation (white) tests. [Panels as in Figure 2.]

Table 5. Weighted mean correlations of learner knowledge with frequency data (Spearman’s r)

                          COCA, by items        COCA, by participants   BNC, by items          BNC, by participants
                          mean r  [95% CI]      mean r  [95% CI]        mean r  [95% CI]       mean r  [95% CI]
Lemma-Lemma ±4             0.26  [0.14, 0.38]    0.20  [0.14, 0.26]      0.10  [-0.01, 0.22]    0.07  [0.01, 0.13]
Lemma-Lemma ±9             0.27  [0.16, 0.39]    0.19  [0.12, 0.25]      0.14  [0.03, 0.26]     0.13  [0.07, 0.20]
Form-Form ±4               0.24  [0.12, 0.35]    0.27  [0.20, 0.33]      0.13  [0.02, 0.25]     0.09  [0.03, 0.16]
t-score                    0.26  [0.14, 0.38]    0.21  [0.14, 0.27]      0.11  [0.00, 0.23]     0.15  [0.10, 0.22]
MI                        -0.02  [-0.12, 0.10]   0.10  [0.04, 0.17]     -0.07  [-0.17, 0.04]   -0.02  [-0.07, 0.04]
Conditional probability    0.09  [-0.02, 0.21]   0.15  [0.09, 0.22]      0.04  [-0.07, 0.15]    0.15  [0.09, 0.22]
Gries’s DP                -0.09  [-0.19, 0.02]  -0.08  [-0.14, -0.02]   -0.12  [-0.22, -0.02]  -0.10  [-0.15, -0.04]

Table 6 shows the correlations between learners’ knowledge

and collocation frequency in the separate genres of each

corpus. Since there is little reason to believe that the

relationship between genre and knowledge will vary across

different test types, Figures 5 and 6 show boxplots for

correlations across the groups of L1 and English major vs.

non-English major only. As before, the majority of

comparisons show little evidence of patterning across

groups. However, the Academic genre shows a stronger

correlation in the English major than the non-English major

groups and, for the COCA corpus, a trend whereby L1 Arabic

students show the strongest correlation, followed by

European students, and L1 Japanese students the weakest.

The results of meta-analyses averaging these figures

across tests are shown in Table 7. Across both corpora and

both types of analysis, the fiction genre shows the

strongest correlations with learner knowledge while the

academic genre shows the weakest correlations.

5. Discussion

The findings reported above show that some types of corpus

data are reliably related to learner knowledge. This is

consistent with previous research which has suggested that

collocation learning and processing is related to frequency

(Durrant & Schmitt 2010, Siyanova-Chanturia et al. 2011,

Wolter & Gyllstad 2013). It extends the findings of

previous studies by showing: (i) that this relationship is

statistically reliable across a wide range of learners and

collocations; (ii) that it is reflected in a level of

learner knowledge that can be tapped through traditional

test formats; (iii) how the relationship between frequency

and knowledge varies across different corpora and frequency

measures and across different learner groups. COCA was

more strongly related to learner knowledge than the older

and smaller BNC, and data from the fiction sub-corpora of

each corpus were more strongly related with knowledge than

those from other registers. Frequency data from academic

registers had the weakest relationship with knowledge.

Correlations with the academic genre were higher for

English majors than non-English majors and for students

from Arabic-speaking countries than for those from Japan

(with European students being intermediate between the

two).

Table 6. Correlations of learner knowledge with genre frequency data (Spearman’s r)

                            COCA                                      BNC
Test                        acad.  fict.  mag.   news.  spoken       acad.  fict.  mag.   news.  non-ac.  spoken
Abdul-Fattah                 0.26   0.28   0.42   0.20   0.19        -0.14   0.25   0.05   0.18   0.04     0.03
Barfield                     0.16   0.34   0.30   0.12   0.23         0.01   0.25   0.11  -0.03  -0.03     0.12
Brashi                       0.30   0.39   0.42   0.47   0.39         0.47   0.49   0.60   0.54   0.51     0.60
Farghal & Obeidat (Test 1)   0.51   0.11   0.34   0.40   0.78         0.69   0.11  -0.14   0.26   0.37     0.68
Farghal & Obeidat (Test 2)   0.30   0.17   0.26   0.42   0.17        -0.33  -0.10   0.12  -0.09  -0.03    -0.20
Gyllstad (COLLEX 1)          0.28   0.49   0.41   0.44   0.51         0.18   0.41   0.20   0.39   0.26     0.36
Gyllstad (COLLEX 2)          0.33   0.62   0.50   0.44   0.52         0.34   0.50   0.37   0.29   0.28     0.43
Gyllstad (COLLEX 3)          0.19   0.42   0.32   0.31   0.35         0.08   0.32   0.16   0.00   0.07     0.16
Gyllstad (COLLEX 4)          0.13   0.41   0.22   0.11   0.27         0.04   0.35   0.10  -0.02   0.05     0.20
Gyllstad (COLLEX 5)          0.06   0.24   0.12   0.00   0.13         0.03   0.26  -0.02   0.04   0.09     0.08
Jaén (Test 1)                0.32   0.70   0.57   0.43   0.40         0.25   0.64   0.31   0.27   0.22     0.31
Jaén (Test 2)               -0.26   0.38   0.21   0.18   0.31        -0.26   0.34   0.21   0.31  -0.24     0.09
Jaén (Test 4)                0.19   0.07   0.07   0.10   0.14         0.27   0.14   0.23   0.38   0.23     0.10
Koya                        -0.11   0.24  -0.04  -0.03   0.02        -0.16   0.11  -0.13  -0.20  -0.21    -0.05
Kurosaki (MC Fr)            -0.26   0.22   0.03   0.07  -0.02        -0.31   0.11   0.14   0.09  -0.17     0.11
Kurosaki (MC Jp)             0.08   0.56   0.45   0.44   0.34         0.11   0.41   0.27   0.42   0.21     0.21
Kurosaki (trans. Fr)         0.25   0.61   0.49   0.52   0.50         0.14   0.38   0.28   0.32   0.19     0.39
Kurosaki (trans. JP)         0.09   0.52   0.41   0.45   0.46         0.12   0.43   0.20   0.30   0.14     0.28
Revier                       0.13   0.29   0.10   0.02   0.13         0.01   0.13   0.20   0.15   0.05     0.10

Figure 5: Correlations by genre across English majors (shaded) vs. non-English majors (white). [Boxplots for Academic, Fiction, Magazine, Newspaper, Non-academic and Spoken, each for COCA and BNC.]

Figure 6: Correlations by genre across L1s (Arabic (dark grey) / European (light grey) / Japanese (white)). [Panels as in Figure 5.]

Table 7. Weighted mean correlations of learner knowledge with genre frequency data (Spearman’s r)

               COCA, by items        COCA, by participants   BNC, by items          BNC, by participants
               mean r  [95% CI]      mean r  [95% CI]        mean r  [95% CI]       mean r  [95% CI]
academic        0.12  [0.01, 0.24]    0.15  [0.09, 0.22]      0.03  [-0.08, 0.14]   -0.01  [-0.07, 0.05]
fiction         0.39  [0.27, 0.52]    0.38  [0.31, 0.44]      0.29  [0.17, 0.41]     0.30  [0.24, 0.37]
magazine        0.26  [0.15, 0.39]    0.32  [0.26, 0.39]      0.13  [0.02, 0.25]     0.12  [0.06, 0.19]
newspaper       0.19  [0.07, 0.31]    0.20  [0.13, 0.26]      0.08  [-0.03, 0.20]    0.16  [0.10, 0.23]
non-academic      -         -           -         -           0.03  [-0.08, 0.14]    0.03  [-0.03, 0.09]
spoken          0.25  [0.13, 0.37]    0.25  [0.18, 0.31]      0.15  [0.04, 0.27]     0.12  [0.05, 0.18]

The finding regarding English majors is an intuitively

satisfying one, since these students seem likely to have

had greater exposure to academic writing in English than

the other test-takers. The finding regarding L1s is

difficult to interpret in the absence of a survey of the

materials typically used across these different regional

contexts.

A priority for future research should be to understand

more about the relationship between types of corpora and

learner knowledge. This relationship is likely to depend on

the degree of fit between a particular corpus and a

particular group of learners, and it is important to

understand these group differences if we are to produce

fair tests and valid research. In the present study, group

differences only became evident when drawing on the

relatively specialized academic sub-corpora of COCA and the

BNC. The full national corpora, and the other sub-corpora

within them, did not evidence strong group-based biases.

This suggests that principled choice of corpora is needed

to create tests that can be used fairly across groups.

Given the limited range of groups covered, however, further

research is needed to understand these relationships.

A small but statistically reliable (in three of the four

analyses) relationship was also found between learners’

knowledge and the dispersion of collocations across the

corpora. This suggests that it may be worth incorporating

this measure into future studies of collocation and

sampling strategies for collocation tests. However, it is

important to note that the effect sizes for DP may not have

been equal across groups, and the assumptions of the meta-

analysis therefore violated. It would be of interest for

future research to understand whether (and why) the

influence of DP can be shown to differ consistently across

learner groups and test types.

The relationships between learner knowledge and

conditional probability and MI were inconsistent. The

former failed to show a statistically reliable correlation

in two out of the four meta-analyses; the latter failed to

show a relationship in three out of four. The sizes of

relationships were also small and – in the case of MI –

inconsistent in terms of direction (conditional probability

achieving rs of .09, .15, .04 and .15; MI achieving

-.02, .10, -.07 and -.02).

To my knowledge, this is the first research to test the

relationship between conditional probability and learner

knowledge. The finding with regard to MI, however, is

consistent with Ellis et al. (2008), who found in a range of

psycholinguistic experiments that, while native speakers’

accuracy and fluency of processing lexical bundles

increased as MI score increased, the same was not true of

advanced L2 learners at a US university. The confirmation

of this finding here suggests that learners’ lack of

sensitivity to MI is a general one, applicable to two-word

collocations, as well as lexical bundles; to knowledge as

it is tapped by traditional test methods, as well as to

processing speed and accuracy; and to a wide range of L2

learners in EFL as well as ESL contexts.

As was discussed above, both MI and conditional

probability differ from raw frequency measures and t-score

in that they depend primarily on the strength of attraction

between a collocation’s elements, rather than on its

overall frequency as an item. Native speakers’ sensitivity

to, and ability to acquire, relatively infrequent, but

strongly associated, pairs suggests that they are aware,

not only of how frequent a collocation is, but of how

frequent its parts are, compared to the frequency of the

whole. Non-natives, it seems, do not typically have this

awareness. Much discussion with regard to collocation

learning has started from the suggestion that L1 learners

pay attention to chunks of language, ignoring their

constituent parts, while L2 learners focus on individual

words (e.g. Wray 2002). The present findings, and those of

Ellis et al. (2008), seem to point to the opposite

conclusion: L1 learners notice both collocations and their

components, while L2 learners focus only on the whole

collocation.
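
The contrast between frequency-based and attraction-based measures can be made concrete with a small calculation. The following sketch uses textbook formulas for MI, t-score and conditional probability (cf. Manning & Schütze 1999); the counts are invented for illustration, and real tools differ in window size, smoothing and exact variants.

```python
import math

def association_scores(f_xy, f_x, f_y, N):
    """Score a word pair on frequency-based and attraction-based measures.

    f_xy: co-occurrence count; f_x, f_y: counts of each word; N: number
    of (bigram) positions in the corpus.
    """
    expected = f_x * f_y / N                 # chance co-occurrence count
    mi = math.log2(f_xy / expected)          # pointwise MI: pure attraction
    t = (f_xy - expected) / math.sqrt(f_xy)  # t-score: tracks raw frequency
    p_y_given_x = f_xy / f_x                 # conditional probability P(y|x)
    return {"raw": f_xy, "t": round(t, 2),
            "MI": round(mi, 2), "P(y|x)": round(p_y_given_x, 3)}

N = 100_000_000  # hypothetical corpus size

# A frequent pair whose component words are themselves frequent...
print(association_scores(f_xy=5000, f_x=200_000, f_y=300_000, N=N))
# ...vs. a rare pair whose components occur almost nowhere else.
print(association_scores(f_xy=60, f_x=80, f_y=120, N=N))
```

With these figures, the first pair scores highest on raw frequency and t-score, while the second, rare but strongly attracted, pair scores far higher on MI and conditional probability: detecting the second type requires attending to the parts as well as the whole.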

Such a model would provide an explanation for the

results of Durrant & Schmitt (2009), who report that L2

learners’ writing makes extensive use of high-frequency

collocations, but underuses lower-frequency collocations

with high MIs (e.g. densely populated, bated breath, arbiter of taste).

It is the underuse of this type of collocation, Durrant &

Schmitt argue, rather than a lack of collocation knowledge

in general, that accounts for the often-reported sense that

L2 writing lacks idiomaticity due to a lack of collocations

(e.g. Kjellmer 1990). If L2 learners tend to notice only

collocation frequency, and not strength of attraction, it

is unsurprising that they fail to develop a repertoire of

low-frequency, high-MI collocations.

Some theoretical models (most prominently, Nation 2001)

conceptualize collocation knowledge as an aspect of single-

word knowledge: one thing we know about a word is the other

words with which it is likely to collocate. It is also

possible to see collocation knowledge as a construct in its

own right. The definition of collocations outlined in

Section 2 allows for collocations to be known as

independent entities, divorced from knowledge of any

individual words. It seems likely that a comprehensive

model of collocation knowledge would need to combine these

perspectives. However, the above considerations suggest

that the single-word perspective may be less pertinent in

non-native than in native speaker knowledge. Models of

collocation knowledge covering both native and non-native

learners will need to take this possibility into account.

A final point that should be addressed concerns the

sparse nature of the data which I was able to accumulate

for analysis. My review of the literature identified 85

studies reporting tests of collocation knowledge. However,

little more than half of these (46/85) indicated the items

they had used in their tests. Only about one study in six

provided results on an item-by-item basis. In all general

tests of collocation, the sample of items tested is, of

necessity, small in proportion to the population of

collocations to which we wish to generalize. Given this, I

would argue that providing detailed information about the

contents of each test is vital if meaningful conclusions

are to be drawn, and I would call on all researchers working

in this area to provide such information in future work.

6. Conclusions

The analyses presented here suggest that frequency data

should be used as part of the process of sampling

collocations for selective tests. However, they also show

that different corpora correlate differently with knowledge

in different groups of learners. Testers and researchers

need to take this variation into account in their test

designs.

While frequency is clearly important, it will never

predict knowledge perfectly. As Ellis & Larsen-Freeman

(2006) warn, learning is an outcome of complex processes

and single factors rarely account for more than 16-25% of

the variance in knowledge. The inevitable mismatch between

the contents of corpora and learners’ actual experience of

the language, and the error built into frequency counts due

to our current inability to distinguish between different

uses of polysemous items also mean that the frequency-

knowledge relationship will inevitably be a noisy one.

Though tests of single-word vocabulary have tended to focus

on frequency in sampling, therefore, it is important to

remember that this can provide only a rudimentary strategy.

To achieve more generalisable results, we need to work

towards integrating further variables into the sampling

frame. Previous research on collocation suggests that, at

the very least, the influence of the L1 should be included

(Wolter & Gyllstad 2011, Yamashita & Jiang 2010), and the

current study also suggests that learners from different

language backgrounds and different types of language

education may be differently sensitive to frequencies in

particular registers. This argues for a multi-dimensional

approach to item sampling, which we are not yet in a

position fully to specify.
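
One simple way frequency could enter such a sampling frame is stratified sampling across corpus-frequency bands, to which further dimensions (L1 congruency, register, collocation type) could later be added. The sketch below is hypothetical: the collocations, frequencies and band boundaries are invented, and `sample_by_band` is an illustrative helper, not an existing tool.

```python
import random

def sample_by_band(collocations, bands, k, seed=0):
    """Draw up to k test items from each corpus-frequency band.

    collocations: dict mapping a collocation to its corpus frequency;
    bands: list of (low, high) frequency ranges defining the strata.
    """
    rng = random.Random(seed)  # seeded for a reproducible item set
    sample = []
    for low, high in bands:
        pool = [c for c, f in collocations.items() if low <= f < high]
        sample.extend(rng.sample(pool, min(k, len(pool))))
    return sample

# Invented frequencies for illustration only.
freqs = {"make progress": 900, "strong tea": 420, "take place": 2600,
         "bated breath": 35, "densely populated": 60, "run risk": 700}
items = sample_by_band(freqs,
                       bands=[(0, 100), (100, 1000), (1000, 10_000)],
                       k=2)
print(items)
```

Stratifying in this way guarantees that low-frequency items are represented in the test rather than being swamped by the high-frequency majority, which matters if, as argued above, knowledge of rare but strongly associated pairs is precisely where learners differ from native speakers.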

Further work is needed, therefore, to investigate

exactly how frequency and other variables can be combined

in a balanced strategy. Candidate variables might include

L1, learning context, learner proficiency, collocation type

and test type. We also need to acknowledge that the

unpredictable interaction of variables, many of which

cannot be measured, implies that no sampling strategy will

ever enable error-free generalizations beyond a sample. It

is important, therefore, to determine how much variation is

likely to be missed by our sampling strategies so that we

can hedge our interpretations accordingly. Until we have

more information on these issues, researchers’ and testers’

claims based on tests of collocation knowledge (and,

indeed, tests of vocabulary more generally) need to be

treated with appropriate caution.

This study has also provided further evidence for L2

learners’ lack of sensitivity to the strength of attraction

between words (as opposed to overall collocational

frequency). I have suggested that this may indicate a

holistic approach to collocation learning, which does not

take account of the frequencies of individual component

words. I have argued that this provides an explanation for

previous findings that L2 learners’ writing is

distinguished from that of native speakers by the

relatively low levels of use of collocations which are

infrequent but highly salient to natives because of the

strength of attraction between their parts. Further

research is needed to test this possibility.

Finally, it has become apparent in the course of this

study that studies of collocation knowledge often suffer

from insufficient reporting of the details of the tests

they use. It is important that future research in this area

details the contents of tests and provides statistics which

enable readers to understand the impacts of different items

on overall test scores.

References

Abdul-Fattah, H. S. 2001. “Collocation: A missing chain from

Jordanian basic education stage English language

curriculum and pedagogyˮ. Dirasat, Human and Social Sciences, 28

(2), 582-596.

Barfield, A. 2003. Collocation Recognition and Production: Research Insights.

Tokyo: Chuo University.

Barfield, A. & Gyllstad, H. 2009. “Introduction: Researching L2

collocation knowledge and developmentˮ. In A. Barfield &

H. Gyllstad (Eds.), Researching Collocations in Another Language.

Basingstoke: Palgrave Macmillan, 1-18.

Biber, D. 2009. “A corpus-driven approach to formulaic language

in English: Multi-word patterns in speech and writingˮ.

International Journal of Corpus Linguistics, 14 (3), 275-311.

Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E.

1999. Longman Grammar of Spoken and Written English. Harlow:

Longman.

Bonk, W. J. 2001. “Testing ESL learnersʼ knowledge of

collocationsˮ. In T. Hudson & J. D. Brown (Eds.), A Focus on

Language Test Development. Honolulu: University of Hawaii

Press, 113-142.

Brashi, A. 2009. “Collocability as a problem in L2 productionˮ.

Reflections in English Language Teaching, 8 (1), 21-34.

Cheng, W., Greaves, C. & Warren, M. 2006. “From n-gram to

skipgram to concgramˮ. International Journal of Corpus Linguistics, 11

(4), 411-433.

Clear, J. 1993. “From Firth principles: Computational tools for

the study of collocationsˮ. In M. Baker, G. Francis & E.

Tognini-Bonelli (Eds.), Text and Technology: in Honour of John

Sinclair. Amsterdam: John Benjamins, 271-292.

Cooper, H. 1998. Synthesizing Research: A Guide for Literature Reviewers.

London: Sage.

Cowie, A. P. (Ed.) 1998. Phraseology: Theory, Analysis, and Applications.

Oxford: Oxford University Press.

Davies, M. 2004-. BYU-BNC (Based on the British National Corpus

from Oxford University Press). Available online at

http://corpus.byu.edu/bnc/ (accessed July 2014).

Davies, M. 2008-. The Corpus of Contemporary American English (COCA): 450 million words,

1990-present. Available online at:

http://corpus.byu.edu/coca/ (accessed July 2014).

Durrant, P. 2008. High-frequency Collocations and Second Language Learning.

Unpublished PhD thesis, University of Nottingham,

Nottingham.

Durrant, P. & Doherty, A. 2010. “Are high-frequency collocations

psychologically real? Investigating the thesis of

collocational primingˮ. Corpus Linguistics and Linguistic Theory, 6

(2), 125-155.

Durrant, P. & Mathews-Aydinli, J. 2011. “A function-first

approach to identifying formulaic language in academic

writingˮ. Journal of English for Specific Purposes, 30 (1), 58-72.

Durrant, P. & Schmitt, N. 2009. “To what extent do native and

non-native writers make use of collocations?ˮ. International

Review of Applied Linguistics, 47 (2), 157-177.

Durrant, P. & Schmitt, N. 2010. “Adult learnersʼ retention of

collocations from exposureˮ. Second Language Research, 26 (2),

163-188.

Ellis, N. C. 2001. “Memory for languageˮ. In P. Robinson (Ed.),

Cognition and Second Language Instruction. Cambridge: Cambridge

University Press, 33-68.

Ellis, N. C. & Larsen-Freeman, D. 2006. “Language emergence:

Implications for applied linguistics – Introduction to the

Special Issueˮ. Applied Linguistics, 27 (4), 558-589.

Ellis, N. C., Simpson-Vlach, R. & Maynard, C. 2008. “Formulaic

language in native and second-language speakers:

Psycholinguistics, corpus linguistics, and TESOLˮ. TESOL

Quarterly, 42 (3), 375-396.

Gardner, D. 2008. “Validating the construct of word in applied

corpus-based vocabulary research: A critical surveyˮ.

Applied Linguistics, 28 (2), 241-265.

Goldberg, A. E. 2006. Constructions at Work: The Nature of Generalization in

Language. Oxford: Oxford University Press.

Gries, S. T. 2008. “Dispersions and adjusted frequencies in

corporaˮ. International Journal of Corpus Linguistics, 13 (4), 403-

437.

Gyllstad, H. 2007. Testing English Collocations: Developing Receptive Tests for

Use with Advanced Swedish Learners. Lund University, Lund.

Halliday, M. A. K. 1966. “Lexis as a linguistic levelˮ. In C. E.

Bazell, J. C. Catford, M. A. K. Halliday & R. H. Robins

(Eds.), In Memory of J. R. Firth. London: Longmans, Green and Co.

Ltd., 148-162.

Hoey, M. 1991. Patterns of Lexis in Text. Oxford: Oxford University

Press.

Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. London:

Routledge.

Howarth, P. 1998. “The phraseology of learnersʼ academic

writingˮ. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis, and

Applications. Oxford: Oxford University Press, 161-186.

Jaén, M. M. 2009. Recopilación, Desarrollo Pedagógico y Evaluación de un Banco

de Colocaciones Frecuentes de la Lengua Inglesa a Través de la Lingüística de

Corpus y Computacional. Unpublished PhD thesis. Universidad de

Granada, Granada.

Jones, S. & Sinclair, J. M. 1974. “English lexical collocations.

A study in computational linguisticsˮ. Cahiers de Lexicologie,

24 (2), 15-61.

Kjellmer, G. 1990. “A mint of phrasesˮ. In K. Aijmer & B.

Altenberg (Eds.), English Corpus Linguistics: Studies in Honour of Jan

Svartvik. London: Longman, 111-127.

Kurosaki, S. 2012. An Analysis of the Knowledge and Use of English Collocations

by French and Japanese Learners. Unpublished PhD thesis.

University of London Institute in Paris, Paris.

Larsen-Freeman, D. & Cameron, L. 2008. Complex Systems and Applied

Linguistics. Oxford: Oxford University Press.

Lewis, M. 1993. The Lexical Approach: The State of ELT and a Way Forward.

London: Thomson Heinle.

Lipsey, M. W. & Wilson, D. B. 2000. Practical Meta-Analysis. London:

Sage.

Manning, C. D. & Schütze, H. 1999. Foundations of Statistical Natural

Language Processing. Cambridge, MA: MIT Press.

Milton, J. 2009. Measuring Second Language Vocabulary Acquisition.

Bristol: Multilingual Matters.

Nation, P. 1990. Teaching and Learning Vocabulary. Boston: Heinle and

Heinle.

Nation, P. 2001. Learning Vocabulary in Another Language. Cambridge:

Cambridge University Press.

Nattinger, J. R. & DeCarrico, J. S. 1992. Lexical Phrases and Language

Teaching. Oxford: Oxford University Press.

Nesselhauf, N. 2004. “What are collocations?ˮ. In D. J.

Allerton, N. Nesselhauf & P. Skandera (Eds.), Phraseological

Units: Basic Concepts and their Application. Basel: Schwabe, 1-21.

Norris, J. M. & Ortega, L. 2006. “The value and practice of

research synthesis for language learning and teachingˮ. In

J. M. Norris & L. Ortega (Eds.), Synthesizing Research on

Language Learning and Teaching. Amsterdam: John Benjamins, 3-50.

Palmer, H. E. 1933. Second Interim Report on English Collocations. Tokyo:

Kaitakusha.

Pawley, A. & Syder, F. H. 1983. “Two puzzles for linguistic

theory: Nativelike selection and nativelike fluencyˮ. In

J. C. Richards & R. W. Schmidt (Eds.), Language and

Communication. New York: Longman, 191-226.

Read, J. 2000. Assessing Vocabulary. Cambridge: Cambridge University

Press.

Revier, R. L. 2009. “Evaluating a new test of whole English

collocationsˮ. In A. Barfield & H. Gyllstad (Eds.),

Researching Collocations in Another Language. Basingstoke: Palgrave

Macmillan, 125-138.

Schmitt, N. 2010. Researching Vocabulary: A Vocabulary Research Manual.

Basingstoke: Palgrave Macmillan.

Schmitt, N. & Zimmerman, C. B. 2002. “Derivative word forms:

What do learners know?ˮ. TESOL Quarterly, 36 (2), 145-171.

Shin, D. & Nation, P. 2008. “Beyond single words: The most

frequent collocations in spoken Englishˮ. ELT Journal, 62

(4), 339-348.

Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford

University Press.

Sinclair, J. M. 2004. “The search for units of meaningˮ. In J.

M. Sinclair, Trust the Text: Language, Corpus and Discourse. London:

Routledge, 24-48.

Siyanova-Chanturia, A., Conklin, K. & van Heuven, W. J. B. 2011.

“Seeing a phrase ʻtime and againʼ matters: The role of

phrasal frequency in the processing of multiword

sequencesˮ. Journal of Experimental Psychology: Learning, Memory, and

Cognition, 37 (3), 776-784.

Stefanowitsch, A. & Gries, S. T. 2003. “Collostructions:

Investigating the interaction of words and constructionsˮ.

International Journal of Corpus Linguistics, 8 (2), 209-243.

Stubbs, M. 1996. Text and Corpus Analysis. Oxford: Blackwell.

Taeko, K. 2005. The Acquisition of Basic Collocations by Japanese Learners of

English. Waseda University.

Webb, S. & Kagimoto, E. 2011. “Learning collocations: Do the

number of collocates, position of the node word, and

synonymy affect learning?ˮ. Applied Linguistics, 32 (3), 259-

276.

Wolter, B. & Gyllstad, H. 2011. “Collocational links in the L2

mental lexicon and the influence of L1 intralexical

knowledgeˮ. Applied Linguistics, 32 (4), 430-449.

Wolter, B. & Gyllstad, H. 2013. “Frequency of input and L2

collocational processing: A comparison of congruent and

incongruent collocationsˮ. Studies in Second Language Acquisition,

35 (3), 451-482.

Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge

University Press.

Yamashita, J. & Jiang, N. 2010. “L1 influence on the acquisition

of L2 collocations: Japanese ESL users and EFL learners

acquiring English collocationsˮ. TESOL Quarterly, 44 (4),

647-668.

Author’s address

Philip Durrant

Graduate School of Education

University of Exeter

St. Luke’s Campus, Heavitree Road

EX1 2LU, Exeter

UK

[email protected]