NB. This is the author’s post-refereed version
of a paper to be published in the International
Journal of Corpus Linguistics 19/4 in November 2014.
Copyright belongs to John Benjamins
Publishing. Please contact John Benjamins for
further reprinting or re-use. This paper was
first submitted for publication October 2012.
Corpus frequency and second language learners’
knowledge of collocations
A meta-analysis
Philip Durrant
University of Exeter
Tests of second language learners’ knowledge of collocation
have lacked a principled strategy for item selection,
making claims about learners’ knowledge beyond the
particular collocations tested difficult to evaluate.
Corpus frequency may offer a good basis for item selection,
if a reliable relationship can be demonstrated between
frequency and learner knowledge. However, such a
relationship is difficult to establish satisfactorily,
given the small number of items and narrow range of test-
takers involved in any individual study. In this study, a
meta-analysis is used to determine the correlation between
learner knowledge and frequency data across nineteen
previously-reported tests. Frequency is shown to correlate
moderately with knowledge, but the strength of this
correlation varies widely across corpora. Strength of
association measures (such as mutual information) do not
correlate with learner knowledge. These findings are
discussed in terms of their implications for collocation
testing and models of collocation learning.
Keywords: collocation, testing, frequency, formulaic
language, vocabulary, SLA
1. Introduction
It has long been recognized that collocations are pervasive
in language (Hoey 2005, Sinclair 2004), and that a healthy
repertoire of collocations is essential to mastery of a
foreign language (e.g. Kjellmer 1990, Lewis 1993, Nattinger
& DeCarrico 1992, Palmer 1933, Pawley & Syder 1983).
Research on second language learners’ knowledge and
acquisition of collocations has gathered pace in recent
years, with greater integration of corpus and
psycholinguistic methods advancing our understanding and
allowing ever more specific questions to be addressed (e.g.
Durrant & Schmitt 2010; Webb & Kagimoto 2011; Wolter &
Gyllstad 2011, 2013; Yamashita & Jiang 2010).
A major outstanding issue in this area is that of how
learners’ knowledge of collocations can be validly assessed.
While a number of recent studies have evaluated various
test formats (Barfield 2003, Bonk 2001, Gyllstad 2007,
Moreno Jaén 2009, Revier 2009), the key question of how
collocations can be reliably sampled for inclusion as test
items has not been addressed.
As with all ‘selective’ (Read 2000) vocabulary tests,
collocation tests utilize samples of items which are small
in proportion to the population from which they are drawn.
Because we are usually interested not just in learners’
knowledge of the particular items tested, but of
collocations more generally, it is essential that items be
selected in a principled way to allow inference beyond the
sample.
In relation to single-word vocabulary, word frequency
has been shown to correlate with likelihood of knowledge
(Milton 2009) and it has become common practice to sample
items according to this variable. Typically, words are
grouped into frequency ‘bands’ and learners’ performance on
a sample of words is taken to reflect their knowledge of
words in that band (Nation 1990). Since collocation can be
seen as a type of vocabulary (in that collocations are
linguistic items which need to be specifically learned,
rather than being derivable from rules – see Section 2,
below) and since models of collocation have claimed that L1
collocation learning is frequency-driven (e.g. Ellis 2001,
Hoey 2005), it is tempting to extend this strategy to
collocation tests. However, at least two considerations
suggest that it would be unwise to do so without further
evidence.
First, some researchers have doubted whether collocation
learning is frequency-driven for second language learners.
Wray (2002), for example, has claimed that adult L2
learners tend not to notice and remember the collocations
they encounter. This view suggests that collocation
learning is usually the result of explicit memorization of
selected forms, rather than exposure, and so implies that
collocation knowledge may not be sensitive to frequency.
A second reason to question the frequency-knowledge link
for collocations lies in the nature of the corpus data on
which frequency counts are based. The logic behind using
frequency to predict knowledge is that the more frequent an
item is, the more likely learners are to have met it
repeatedly. However, few corpora are likely to be
representative of the language which any individual learner
has encountered (Durrant & Doherty 2010). Corpora are
generally designed to represent, not individuals’
experiences, but rather particular types of discourse. A
further problem is introduced by limitations in the ways
that frequency counts are conducted. In particular, in the
absence of fine-grained semantic tagging, counts do not
distinguish different senses of polysemous words. Since it
is likely that language learners do make such distinctions,
this constitutes a further distortion of their experience
of the language.
Studies have shown that corpus-based frequency counts
are a reasonable guide to learners’ knowledge of relatively
frequent words (especially for the 4,000 most frequent
words) (Milton 2009). However, the correlation weakens
considerably at lower frequencies (Milton 2009). This is
probably because, whereas frequent words tend to be
frequent across a wide range of situations, lower frequency
words are usually associated with particular contexts, and
their frequency therefore tends to vary between corpora.
For such words, the assumption of a correlation between
frequency in a particular corpus and frequency in a given
learner’s experience is dubious. This raises problems for
collocations because individual items tend to be relatively
infrequent. Shin & Nation (2008), for example, find that
only 891 collocations have frequencies similar to those of
the 4,000 most frequent single words, where Milton (2009)
finds frequency to be a reliable predictor.
While the factors discussed above suggest that the
relationship between corpus frequency and L2 knowledge of
collocations may not be entirely straightforward, a number
of recent studies have suggested that some relationship
does exist. Durrant & Schmitt (2010) find that – contrary
to Wray’s (2002) claims – adult second language learners do
retain memories of which words appear together in the
language they meet, and that greater repetition leads to
greater retention. Similarly, both Siyanova-Chanturia et
al. (2011) and Wolter & Gyllstad (2013) find that adult L2
users’ speed of processing English collocations was
affected by collocation frequency.
While these studies suggest that frequency is related to
L2 knowledge, at least two caveats should be noted.
Firstly, these studies were conducted with a relatively
narrow range of high proficiency learners. Durrant &
Schmitt (2010) and Siyanova-Chanturia et al.’s (2011)
participants were all students at a single British
university, while Wolter & Gyllstad’s (2013) participants
showed an impressive mean vocabulary size of 7,350 words,
putting them above even the 3,750-5,000 words which are
associated with the highest levels (C1 and C2) of the
Common European Framework (Wolter & Gyllstad 2013). Whether
similar effects will be seen for learners below these high
levels remains an open question.
Secondly, these studies did not aim to measure knowledge
of the type which is tapped in typical test formats, but
rather efficiency of processing (Siyanova-Chanturia, et al.
2011, Wolter & Gyllstad 2013) or priming relationships
between words (Durrant & Schmitt 2010). Further work is
needed to determine whether the frequency effects which
these studies show through eye-fixation durations, response
times to decision tasks, or priming are also found in
students’ responses on standard test tasks.
In response to these issues, the present paper will
investigate the extent to which learners’ knowledge of
collocations, as measured by typical test formats, is
related to collocations’ frequency in a corpus. It aims
both to establish whether corpus frequency is a valid
strategy for sampling collocation test items and to give
guidance on which types of frequency information are most
relevant to collocation sampling.
One way of studying the frequency-knowledge relationship
would be to create a test including a set of collocations
of different frequencies and to determine whether the
number of students knowing each collocation is correlated
with collocation frequency. However, any individual test
administration would be limited by the inevitably small
sample of collocations used, the testing method employed
and any peculiarities of the test-takers. To gain a more
robust data set, therefore, existing literature was
reviewed to identify studies which report collocation
tests. Frequency data were then retrieved for the
collocations in these tests, and correlations between
frequency and the percentage of students answering
correctly were determined. Meta-analytic techniques were
then used to determine overall correlations across all
studies. As with all meta-analyses, the logic is that, if
the effect we are seeking is robust across multiple studies
despite the different types of error inherent in each, we
can have a high degree of confidence that the effect is
real (Cooper 1998).
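The pooling step described here can be sketched in a few lines of Python. This is an illustrative implementation of the standard inverse-variance weighting of Fisher-transformed correlations; the correlations and sample sizes shown are made up for demonstration, not the study’s actual data:

```python
import math

def pooled_correlation(studies):
    """Combine per-study Pearson correlations into one weighted estimate.

    studies: list of (r, n) pairs -- a correlation and its sample size.
    Each r is converted to Fisher's z, weighted by n - 3 (the inverse
    variance of z), averaged, and converted back to r.
    """
    num = 0.0
    den = 0.0
    for r, n in studies:
        z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's r-to-z transform
        w = n - 3                               # inverse-variance weight
        num += w * z
        den += w
    z_bar = num / den
    return math.tanh(z_bar)                     # z back to r

# Hypothetical example: three tests with different correlations and sizes
print(round(pooled_correlation([(0.40, 93), (0.25, 340), (0.55, 56)]), 3))
```

Note that the weighting by sample size means a large study with a weak correlation pulls the pooled estimate down more than a small study with a strong correlation pulls it up.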
2. Defining collocation
The term ‘collocation’ has been used by researchers from a
variety of traditions and has been defined in several
different ways (see e.g. Barfield & Gyllstad 2009,
Nesselhauf 2004). It is therefore important for any study
of collocation to define its scope clearly.
Durrant & Mathews-Aydinli (2010) describe three main
orientations:
(i) ‘Phraseological approaches’ (e.g. Cowie 1998,
Nesselhauf 2004) define collocations as word
combinations in which one element does not carry its
usual meaning (e.g. take a step, explode a myth) or in which
there are restrictions on which words can enter a
combination (e.g. commit can only be followed by a
small number of nouns, related to wrongdoing, shrug
combines almost exclusively with shoulders).
(ii) ‘Frequency-based approaches’ (e.g. Biber 2009,
Hoey 1991, Sinclair 1991) define collocations as sets
of words which have a statistical tendency to co-occur
in texts. These are likely to include collocations as
defined in the phraseological approach (e.g. shrug is
statistically highly likely to co-occur with shoulders)
but also include combinations which do not exhibit
semantic specialization or restriction (e.g. next
week, drink tea).
(iii) ‘Psycholinguistic approaches’ (e.g. Hoey 2005,
Wray 2002) define collocations as combinations of
words which have psychological reality in that they
are stored holistically or there is an associative
link between their elements. This has clear overlaps
with the previous categories in that both semantic
specialization/restriction and high frequency of
occurrence are likely to imply some form of mental
representation.
In spite of their differences, all three of these
approaches share the idea that collocations are
combinations whose behaviour cannot be fully explained in
terms of features of their component words and therefore
need to be handled as partially independent entities, with
their own semantic/ distributional/psycholinguistic
properties. For language learning purposes, this
corresponds to the idea that collocations are combinations
of words which need to be independently learned. This
conception is captured well by Palmer’s (1933) definition
of collocations as:
successions of words […] that (for various, different
and overlapping reasons) […] must or should be learnt,
or is best or most conveniently learnt as an integral
whole or independent entity, rather than by the
process of placing together their component parts.
(Palmer 1933: 4)
To be of use, however, Palmer’s (1933) definition needs to
be developed in two ways. First, it is necessary to spell
out the “various, different and overlapping reasons”
(Palmer 1933: 4) why a succession of words might be best
learnt as a whole. Three main reasons can be cited here:
(i) Semantic opacity: the collocation is semantically
non-transparent, i.e. its meaning cannot be reliably
predicted on the basis of a knowledge of the meaning
its component parts have in other contexts. Examples
include: small talk and curry favour. Without specific
knowledge of the meanings of such collocations, a
learner is unlikely to be able to understand or produce
them accurately.
(ii) Received usage: particular collocations may
become the conventional way of expressing a particular
meaning, even though other phrasings are equally
plausible. Examples include: answer phone and slit throat.
Without specific knowledge of such pairings, a learner
has a good chance of guessing the “wrong” combination
and their language is likely to sound “inauthentic”
(Pawley & Syder 1983).
(iii)Fluency: the combination occurs with such high
frequency that learning it as an item is likely to
promote fast and accurate (efficient) language
processing. Possible examples include: sunny day and salt
and pepper. Without knowledge of such collocations, a
learner may not be able to achieve nativelike fluency
(Pawley & Syder 1983).
Second, Palmer (1933) does not specify how many words
collocations can have. His examples (e.g. there is something the
matter with you, to be difficult for someone to do something) seem to
indicate that he has no particular limit in mind. Some
researchers in the frequency-based/psychological traditions
have similarly called combinations of any number of words
‘collocations’ (Biber 2009, Hoey 2005, Kjellmer 1990,
Sinclair 1991). However, corpus linguists often make a
distinction between two-word combinations and longer
sequences, with longer combinations commonly referred to by
other terms, such as ‘lexical bundle’ (Biber et al. 1999,
Ellis et al. 2008), ‘n-gram’ or ‘concgram’ (Cheng et al.
2006). The differences between these labels are important
in corpus research because each involves a different search
strategy. For example, whereas lexical bundles are
retrieved as fixed contiguous sequences of words,
collocations are usually searched for as pairs of words
frequently appearing within a certain distance of each
other, so allowing greater flexibility regarding their
relative positions. Similarly, the various measures which
are used to quantify collocation frequency (reviewed below)
can vary dramatically across combinations of different
length, with frequency dropping and mutual information
increasing sharply as the lengths of combinations increase.
For these reasons, frequency data about positionally-
flexible two-word collocations are not strictly comparable
with frequency data about other types of word combination.
For studies, such as the present one, which make extensive
use of such data, it is therefore important to maintain a
distinction between combinations of different lengths. For
this reason, the term ‘collocation’ will be used here to
refer only to combinations of two words within a given
span.
Taking these points into consideration, Palmer’s (1933)
formulation can be adapted to define collocations as:
combinations of two words that are best learned as
integral wholes or independent entities, rather than
by the process of placing together their component
parts, either because (i) they may not be understood
or appropriately produced without specific knowledge,
or (ii) they occur with sufficient frequency that
their independent learning will facilitate fluency.
3. Material and methods
Meta-analysis is a technique for synthesizing existing
research in order to clarify the relationships between the
main variables and to understand the effects of moderating
variables (e.g. Cooper 1998, Lipsey & Wilson 2000). Meta-
analytic work in the field of language learning is usefully
reviewed by Norris & Ortega (2006), who describe three main
stages in a meta-analysis: sampling of relevant studies;
coding of data relevant to each study; and analysis. The
following sections will describe each of these stages in
turn.
3.1 Sampling
The first step in the meta-analysis was a comprehensive
search of the literature to identify relevant data. To
ensure relevance to the research questions and
comparability between studies, it is important at this
stage to define clear criteria for inclusion in the review.
In the present case, studies needed to include descriptions
of selective tests of non-native speakers’ knowledge of
English collocations and provide information about the
numbers of learners answering each test item correctly.
The first step in the review was to search five major
databases using the search term: collocation* AND (test* OR learn*
OR knowledge). The databases searched were: (i) Web of
Knowledge (topic search, refined to ‘Arts Humanities’ and
‘Social Sciences’); (ii) ERIC (abstract search); (iii)
Linguistics and Language Behavior Abstracts (abstract
search); (iv) PsycINFO (abstract search); (v) PsycArticles
(abstract search). Further, both Google and Google Scholar
were searched using the search term language collocation* test*
learn* acquisition*. Because of the large number of (often
irrelevant) hits returned by Google, only the first 250
results were used.
The abstracts of all retrieved items were checked to
determine whether they included empirical studies which
involved selective tests of non-native speakers’
collocation knowledge in English. 35 such studies were
identified.
The second step was to check the bibliographies of all
relevant studies for further studies. Google Scholar was
also checked for studies citing the retrieved works. Any
publication whose title suggested it included some
evaluation of learners’ collocational knowledge was
retrieved and checked to see if it met the inclusion
criteria. This process was repeated recursively with all
newly-identified studies.
The review was restricted to papers written in English
and either freely available online or accessible through my
institution’s library. In a small number of cases,
references from other sources suggested that a source which
was written in another language or which was not freely
available contained information of the type required. These
sources (i.e. Jaén 2009, Barfield 2003) were obtained
through direct contact with the authors or through my
institution’s library.
This process returned a total of 85 studies. Of these,
very few included the information required for the meta-
analysis. Only 46 studies recorded which collocations were
included in their tests. Of these, 14 provided data on the
number of students answering each item correctly. Four of
these were excluded because test items did not have unique
correct answers and so did not show whether learners
knew specific target collocations; one was excluded because
it focused on collocations from a narrowly-defined area of
discourse (Maritime English) for which available corpora
were unlikely to provide valid frequency data.
Some of the nine remaining publications included more
than one test and so provided multiple data sources. As
discussed above, tests were only included if items had a
single correct answer. This meant that, for example,
Gyllstad’s (2007) COLLMATCH tests and Jaén’s (2009) Test 3
were not included in the analysis.
This process provided a total of 19 different tests,
summarized in Table 1. The tests were conducted by nine
different researchers in eight different countries.
Participant numbers ranged from 18 to 340, with a total of
1,568 distinct test takers.
Table 1. Tests included in the meta-analysis

Source | Participants | Items | Test format
Abdul-Fattah 2001 | 340 10th grade students at 10 different schools in Jordan | 12 V + N; 2 Adj + N | 4-option selected response sentence completion; node given, collocate selected
Barfield 2003 (Chp 3) | 93 students at a university in Japan (various departments) | 99 V + N | 4-point self-report knowledge scale; no context given
Brashi 2009 | 20 senior undergraduate English Language students at a university in Saudi Arabia | 20 V + N | 4-option selected response completion of sentence context; noun given, verb selected
Farghal & Obeidat 1995 (Test 1) | 34 junior/senior English majors at a university in Jordan | 7 Adj + N | Sentence completion
Farghal & Obeidat 1995 (Test 2) | 23 senior English majors at a university in Jordan | 15 Adj + N; 2 N + N | Whole sentence translation from L1 (Arabic)
Gyllstad 2007 (COLLEX 1) | 18 2nd year undergraduate ELT students at a university in Sweden | 59 V + N | 2-option selected response; noun given, verb selected; no context given
Gyllstad 2007 (COLLEX 2) | 84 1st year undergraduate English Language students at a university in Sweden | 48 V + N; 12 Adj + N; 2 N + V; 1 Adv + Adj | 2-option selected response; noun given, collocate selected; no context given
Gyllstad 2007 (COLLEX 3) | 116 1st-2nd year undergraduate English Language students at a university in Sweden | 38 V + N; 8 Adj + N; 2 Adv + Adj | 2-option selected response; noun given, collocate selected; no context given
Gyllstad 2007 (COLLEX 4) | 188 students in Sweden (26 10th grade high school; 28 11th grade high school; 134 1st year English language undergraduates) | 38 V + N; 8 Adj + N; 2 Adv + Adj | 2-option selected response; noun given, collocate selected; no context given
Gyllstad 2007 (COLLEX 5) | 24 students in Sweden (7 11th grade high school; 17 1st year undergraduate English Language students) | 38 V + N | 3-option selected response; noun given, verb selected; no context given
Jaén 2009 (Test 1) | 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain | 22 Adj + N; 13 V + N; 6 N + N; 1 N + V | C-test with sentence context; noun and first letter of collocate given
Jaén 2009 (Test 2) | 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain | 16 Adj + N; 11 V + N; 1 N + N | Translation: L1 phrase and English node given; test takers supply collocate
Jaén 2009 (Test 4) | 311 undergraduate English Philology/English Translation and Interpretation students at three universities in Spain | 23 Adj + N; 15 V + N; 6 N + N; 1 N + V | 4-option selected response sentence completion; node given, collocate selected
Koya 2005 (Test B) | 130 students at a university in Japan (various departments) | 68 V + N | 3-option selected response completion of sentence context; noun given, verb selected
Kurosaki 2012 (selected response - French) | 34 French undergraduate students studying English part-time in Paris | 16 V + N; 7 Adj + N; 5 Adv + Adj | 4-option selected response sentence completion; node given, collocate selected
Kurosaki 2012 (selected response - Japanese) | 30 3rd/4th year non-English major undergraduate students in Japan | 16 V + N; 7 Adj + N; 5 Adv + Adj | 4-option selected response sentence completion; node given, collocate selected
Kurosaki 2012 (translation - French) | 29 French undergraduate students studying English part-time in Paris | 13 V + N; 8 Adj + Adj; 5 Adj + N | Translation from L1; target sentence provided with whole collocation removed
Kurosaki 2012 (translation - Japanese) | 38 3rd/4th year non-English major undergraduate students in Japan | 13 V + N; 9 Adj + Adj; 5 Adj + N | Translation from L1; target sentence provided with whole collocation removed
Revier 2009 | 56 students in Denmark (20 10th grade high school; 17 11th grade high school; 19 1st year undergraduate) | 19 V + N | 3-option selected response completion of sentence contexts; each component of collocation selected separately
For various reasons, not all items on all tests were
included in the present analysis. Specifically, items were
omitted if they did not test collocations as defined in
this study (e.g. if they included more than one word or
included a non-lexical word) or if more than one answer was
accepted by the researchers as correct. Table 1 shows the
number and grammatical type of collocations remaining in
each test. After adjustments, the tests comprised between 7
and 100 items each, with a total of 724 items across the 19
tests. There was some overlap between tests in the items
used. For this reason, the total number of unique
(lemmatized) collocations was lower, at 476. The majority
of items were verb + noun combinations (349), followed by
adjective + noun (99), noun + noun (15) and adverb +
adjective (13).
A common problem with meta-analyses is that of
publication bias – i.e. that studies tend only to get
published if they achieve significant results. This means
that meta-analyses which incorporate only published studies
may inadvertently exclude contrary evidence. However, the
present study is unusual amongst meta-analyses in that the
main effect it studies (the relationship between frequency
and knowledge) was not a focus of the original studies
reviewed. There is therefore no reason to believe that the
studies included will demonstrate a greater or lesser
relationship between frequency and knowledge than would
unpublished studies.
3.2 Coding
The second stage of the meta-analysis was that of coding
studies for variables of interest. In this study, the main
variables are the percentage of participants correctly
answering each item and the frequency of each collocation.
The former was provided by the original studies. The latter
was retrieved directly from corpora. Because quantification
of collocation frequency is a complex issue, involving a
number of decisions, this will be described in detail below
(Section 3.3).
As well as the main variables, studies need to be coded
for any potential moderator variables that might be
relevant to the analysis. Four such variables were
identified in the current set of studies:
(i) Students’ experience of studying English. Tests can
be broadly divided into those in which the test-
takers were full-time students on university
programmes directly related to English language and
those in which they were not (Gyllstad’s (2007) COLLEX 4
and 5 drew on a mix of university and pre-
university students and so will not be included in
this analysis);
(ii) Students’ L1. These can be divided into European
languages (Danish, French, Spanish and Swedish),
Arabic and Japanese;
(iii) Test task type. The main types used are selected
response and translation. Three other task types
(self-report, sentence-completion, and C-test) are
combined under the category ‘other’;
(iv) Whether test-takers are asked to provide the whole
collocation or only the collocate.
Table 2 shows how the 19 tests are categorized on each of
these variables.
Table 2. Categorization of tests according to possible moderators

Source | English majors | L1 | Task type | Whole collocation required
Abdul-Fattah | No | Arabic | Selected response | No
Barfield | No | Japanese | Other | Yes
Brashi | Yes | Arabic | Selected response | No
Farghal & Obeidat (Test 1) | Yes | Arabic | Other | No
Farghal & Obeidat (Test 2) | Yes | Arabic | Translation | Yes
Gyllstad (COLLEX 1) | Yes | European | Selected response | No
Gyllstad (COLLEX 2) | Yes | European | Selected response | No
Gyllstad (COLLEX 3) | Yes | European | Selected response | No
Gyllstad (COLLEX 4) | Mixed | European | Selected response | No
Gyllstad (COLLEX 5) | Mixed | European | Selected response | No
Jaén (Test 1) | Yes | European | Other | No
Jaén (Test 2) | Yes | European | Translation | No
Jaén (Test 4) | Yes | European | Selected response | No
Koya | No | Japanese | Selected response | No
Kurosaki (MC Fr) | No | European | Selected response | No
Kurosaki (MC Jp) | No | Japanese | Selected response | No
Kurosaki (trans. Fr) | No | European | Translation | Yes
Kurosaki (trans. JP) | No | Japanese | Translation | Yes
Revier | No | European | Selected response | Yes
3.3 Frequency data
Collocation frequency can be quantified in a number of
different ways (see Schmitt 2010 for a review). Since it is
unclear which of these is most likely to be related to
learner knowledge, several different methods were used.
The first variable which needs to be considered in
counting collocations is the ‘span’ of text within which
two words need to occur to be counted as an example of the
collocation. Collocates can occur at quite some distance
from each other, as the following Example (1) of the
collocation realize dream, taken from the Corpus of
Contemporary American English (COCA) (Davies 2008-), illustrates:
(1) The old dream of wireless communication through
space has now been realized
Thus if the span used in our search of collocations is too
narrow, many genuine examples will be missed. However, as
the span is widened, the chances of counting word pairs
which are not in a collocational relationship increases.
Consider Example (2), again taken from COCA:
(2) she realizes that the buzzing sound from her dream
is present in her bedroom.
The balance we need to achieve in setting a search span,
therefore, is to maximize the number of genuine
collocations while minimizing the number of false hits. The
former pushes us to widen our search span, while the latter
pushes us to keep it narrow. Jones & Sinclair’s (1974)
claim that most collocates are found within four words to
the left or right of their node has led to the widespread
adoption of a 4:4 span. However, there has been little
direct validation of this claim. The present research will
therefore adopt two spans: a conservative 4:4 and a more
liberal 9:9. Results from both types of search will be
compared with student scores to see which is the better
predictor of knowledge.
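The trade-off between the two spans can be illustrated with a minimal sketch of window-based counting. The `span_count` function below is an illustrative toy, not the retrieval software actually used in the study; it is applied here to the tokenized text of Example (1):

```python
def span_count(tokens, node, collocate, span):
    """Count occurrences of `node` that have `collocate` within
    `span` words to the left or right.

    tokens: the corpus as a list of (already lemmatized) word tokens.
    """
    count = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            # the window: `span` tokens before the node and `span` after
            window = tokens[max(0, i - span): i] + tokens[i + 1: i + 1 + span]
            if collocate in window:
                count += 1
    return count

text = ("the old dream of wireless communication "
        "through space has now been realized").split()
print(span_count(text, "dream", "realized", 4))  # 0: outside a 4:4 span
print(span_count(text, "dream", "realized", 9))  # 1: caught by a 9:9 span
```

In Example (1) the two words are nine positions apart, so the conservative 4:4 search misses this instance while the liberal 9:9 search counts it; widening the span further would, as noted above, also begin to admit pairs like those in Example (2) that are not in a collocational relationship.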
A second variable that must be considered is that of
whether counts for separate forms of a word should be
combined – such that, for example, argue strongly and argued
strongly would count as two occurrences of a single
collocation – or whether separate counts should be made for
each form. While Halliday (1966) argues for the former on
the grounds that treating different forms separately would
add complexity without a gain in descriptive power, many
corpus linguists have noted that conflating forms risks
disguising important differences between the collocations
of different forms of a word (Clear 1993, Hoey 2005,
Sinclair 1991, Stubbs 1996). Both of these arguments, it
should be noted, are based on the priorities of descriptive
linguists. For our purposes, the important question is
which approach produces counts which are relevant to
students’ likelihood of knowing a collocation. While there
is some evidence that learners do not always transfer their
knowledge of one form of a word to another (Schmitt &
Zimmerman 2002), I would argue that the default assumption
should be that learning will usually take place at least at
the lemma level – for example, encountering argue strongly
will increase a learner’s chances of recognizing argued
strongly as an appropriate collocation. Most of the frequency
counts used in this study therefore combined counts of
differently inflected forms of the component words.
However, since the assumption that lemmatised counts
provide a better estimate of knowledge is yet to be
substantiated, one frequency count based on unlemmatised
word forms was also provided for comparison.
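The difference between the two counting policies can be made concrete with a short sketch. The hand-written `LEMMA` map below is purely illustrative; an actual study would rely on a proper lemmatizer or a lemmatized corpus:

```python
from collections import Counter

# Illustrative hand-written lemma map; a real study would use a
# lemmatizer or a tagged corpus rather than an explicit dictionary.
LEMMA = {"argue": "argue", "argued": "argue",
         "argues": "argue", "arguing": "argue"}

def pair_counts(bigrams, lemmatize):
    """Count word pairs either as raw forms or collapsed to lemmas."""
    if lemmatize:
        bigrams = [(LEMMA.get(a, a), LEMMA.get(b, b)) for a, b in bigrams]
    return Counter(bigrams)

pairs = [("argue", "strongly"), ("argued", "strongly"), ("argues", "strongly")]
print(pair_counts(pairs, lemmatize=False))  # three separate counts of 1
print(pair_counts(pairs, lemmatize=True))   # one count of 3: (argue, strongly)
```

Under the form-based policy the three attestations remain distinct low-frequency pairs; under the lemma-based policy they pool into a single count of three for argue + strongly.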
A third factor that needs to be considered is the
measure used to quantify collocation frequency. The
simplest approach is to record the number of times a
combination appears. However, such counts tend to give
undue prominence to combinations of very high-frequency
words (of the, and a, etc.), which co-occur very frequently
by chance alone, while sidelining genuine collocations
which consist of low-frequency words (abject poverty, battering
ram, etc.). A number of methods have been suggested to
overcome these problems. Perhaps the most widely used are
the ‘t-score’ and ‘mutual information’ (MI) statistics. The
rationale for and calculation of these statistics are
discussed in detail elsewhere (Manning & Schütze 1999) so
will be described only briefly here.
Both statistics work by comparing the actual frequency
of co-occurrence of a pair of words with the frequency we
would expect them to co-occur by chance alone, given the
individual frequency of each word. Expected frequency E is
calculated using the formula
E = (w1 × w2) / C
where C is the total number of word tokens in the corpus,
and w1 and w2 are the frequencies of each of the component
words.
T-score and MI are then calculated with the formulas
t = (O − E) / √O

MI = log2(O / E)
where O is the observed frequency of a combination.
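To make these definitions concrete, the two statistics can be computed as follows. This is a minimal Python sketch; the function name and the counts passed in at the bottom are invented for illustration, not taken from either corpus.

```python
import math

def association_scores(o, w1, w2, corpus_size):
    """Return expected frequency, t-score and MI for a word pair.

    o           -- observed co-occurrence frequency of the pair
    w1, w2      -- individual frequencies of the two component words
    corpus_size -- total number of word tokens in the corpus (C)
    """
    e = (w1 * w2) / corpus_size    # expected co-occurrence by chance
    t = (o - e) / math.sqrt(o)     # t-score: evidence for an association
    mi = math.log2(o / e)          # MI: strength of the association
    return e, t, mi

# Invented counts for a 100-million-word corpus:
e, t, mi = association_scores(o=50, w1=1000, w2=2000, corpus_size=100_000_000)
```

Note that a rare but exclusive pairing (small O together with small component frequencies) can achieve a high MI while its t-score stays modest, which is the contrast discussed below.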
The logic behind these two statistics is rather
different, and this results in characteristically different
types of collocations being highlighted by each. MI is a
measure of the extent to which the probability of meeting
one word increases once we encounter the other. T-score, on
the other hand, is a hypothesis testing technique, which
evaluates how much evidence there is that a particular
combination occurs more frequently than we would expect by
chance alone, given the frequencies of its component parts.
As Clear (1993) puts it, whereas “MI is a measure of the
strength of association between two words”, t-score indicates “the
confidence with which we can claim there is some association” (Clear
1993: 279-282, original emphases). Clear (1993) gives the
example of taste arbiters as a combination with a high MI.
Though the pairing is not particularly frequent, a high
proportion of occurrences of each of its component words
are found as part of this collocation, with, according to
Clear’s (1993) data, one quarter of all occurrences of
arbiters being found within a two word span of an occurrence
of taste. The two words are therefore strongly associated in
that, where we find arbiters, we are also likely to find
taste. However, the relatively low frequency of the
collocation means that we cannot be confident that the
association is generalisable – i.e. that we would encounter
it in other samples of language. The pairing taste for, on
the other hand, is an example of a collocation with a high
t-score. Though the association between these words is
weaker than that between arbiters and taste, in that neither
word is a strong predictor of the other, the pair occurs
much more frequently, so we can be more confident in the
generalisability of the association.
Both of these measures of association are non-
directional, in that it makes no difference which word is
taken as node and which as collocate. Clearly, however, the
relationship between two parts of a collocation is often
not symmetrical. The association from arbiters to taste, for
example, is likely to be much stronger than that from taste
to arbiters since, while a very high proportion of
occurrences of arbiters is found in co-occurrence with taste,
the reverse is not true. Since many of the task types
included in the present analysis ask test-takers to
identify a collocate when a particular node is given, this
directionality may be important. For this reason, the
analysis will also include the ‘conditional probability’
measure described by Durrant (2008: 84-85). This shows the
probability of a particular word appearing, given that
another particular word has appeared. It is calculated as:
P(w1 | w2) = O / w2

where O is the observed frequency of the combination and w2
is the frequency of the word that is taken as given.
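In code, the measure is simply the joint frequency divided by the frequency of the given word. The sketch below is illustrative only: the figures echo Clear’s taste/arbiters example (a quarter of the occurrences of arbiters falling near taste) and are not real corpus counts.

```python
def conditional_probability(joint_freq, given_word_freq):
    """Directional measure: probability of meeting one word,
    given that the other word has occurred.

    joint_freq      -- observed co-occurrence frequency of the pair
    given_word_freq -- frequency of the word taken as given
    """
    return joint_freq / given_word_freq

# P(taste | arbiters): illustrative counts only
p_taste_given_arbiters = conditional_probability(joint_freq=25,
                                                 given_word_freq=100)
# P(arbiters | taste): same joint count, far more frequent given word
p_arbiters_given_taste = conditional_probability(joint_freq=25,
                                                 given_word_freq=20000)
```

The asymmetry is immediate: the first probability is 0.25, the second only 0.00125, which is why directionality matters for tasks that supply the node and ask for a collocate.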
A further point that needs to be taken into account when
quantifying collocation frequency is the nature of the
corpus consulted. To determine the extent to which
learners’ knowledge of collocation is frequency-driven, the
best corpus would be one representative of each student’s
lifetime exposure to the language. Since such corpora do
not exist, we need to work instead with more general
corpora which may approximate to the types of exposure a
variety of learners, on average, experience. With this aim,
two widely used corpora were used: the British National
Corpus (accessed through Davies’s BYU-BNC interface (Davies
2004-)) and the Corpus of Contemporary American English (Davies
2008-). Both of these corpora are intended to be
representative of a national variety of English. The BNC is
a corpus of approximately 100 million words of British
English from the late 20th century. It includes around 10
million words of transcribed spoken language and 90 million
words of written language, sampled from across five genres
(academic, fiction, magazine, newspapers, non-academic non-
fiction) plus one “miscellaneous” category. At the time of
writing, the COCA includes around 450 million words of
American English from the years 1990 to 2012. It is sampled
in roughly equal amounts from spoken, academic, fiction,
newspaper and magazine genres. Since it is possible that
certain genres within each corpus will be more
representative of learners’ experience than others,
frequency information was retrieved both for the corpora as
wholes and separately for each genre within them.
A related issue is that of ‘dispersion’ – i.e. the
extent to which a collocation’s occurrences are evenly
spread throughout a corpus. Items which are frequent only
because they are used intensively in a narrow range of
texts represent a different learning prospect from items
which occur regularly throughout the language. In general,
it seems likely that more learners will have more exposure
to a collocation that is widely dispersed than one which is
restricted to a small range of texts. It is therefore worth
asking whether learners have a better chance of knowing
more widely dispersed collocations than those which are
more restricted in their use. Several measures of
dispersion have been proposed in the literature (Gries
2008). The measure adopted here was Gries’s (2008) DP. This
is calculated by (i) dividing the corpus into sections (in
the present analysis, the sections will be the separate
genres within each corpus); (ii) determining the size of
each section and normalizing this against the overall size
of the corpus to determine what percentage of occurrences
of a collocation can be expected to appear in that section,
if the collocation is equally distributed across sections;
(iii) determining the actual percentage of occurrences of
the collocation which is found in each section; (iv)
computing the differences between expected and actual
occurrences of the collocation in each section, summing
these differences and dividing them by two. This provides a
number, ranging between 0 and 1, where values close to 0
show an even distribution of the collocation across
sections and values close to 1 show a strong bias towards
particular sections.
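Steps (i)-(iv) above can be sketched as follows (Python; the two-section corpus at the bottom is an invented toy example, and the function name is my own):

```python
def gries_dp(section_sizes, section_counts):
    """Gries's (2008) DP dispersion measure.

    section_sizes  -- number of word tokens in each corpus section
    section_counts -- occurrences of the collocation in each section
    """
    total_size = sum(section_sizes)
    total_count = sum(section_counts)
    # (ii) expected proportion of occurrences per section,
    #      based on section size alone
    expected = [size / total_size for size in section_sizes]
    # (iii) actual proportion of occurrences per section
    observed = [count / total_count for count in section_counts]
    # (iv) sum the absolute differences and divide by two
    return sum(abs(e - o) for e, o in zip(expected, observed)) / 2

# Evenly dispersed across two equal-sized sections:
dp_even = gries_dp([500_000, 500_000], [10, 10])
# Confined entirely to one of the two sections:
dp_skewed = gries_dp([500_000, 500_000], [20, 0])
```

With more, unevenly sized sections (as with the genre divisions used here), values closer to 1 become possible for items confined to a small section.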
As the discussion so far shows, collocation frequency
can be quantified in many ways. The present research aims
to determine both whether frequency in general is related
to learners’ likelihood of knowing a collocation and which
of the methods of quantifying frequency are the best
predictors of knowledge. With this aim, several different
frequency statistics were employed. The first analyses
employed frequency data from BNC and COCA as wholes.
Collocation frequency was calculated in a number of ways.
As the 4:4 span appears to be the most commonly-used in the
literature (Hoey 2005) and as lemmatized frequencies have
been argued to be the more relevant, the main analysis used
lemmatized frequency with a span of 4:4 words. To determine
whether different results are obtained when span and
lemmatization change, additional counts were made based on
lemmatized frequency with a span of 9:9 words and
non-lemmatized frequency with a span of 4:4 words.
In addition, the three measures of association (t-score;
MI; conditional probability) and the measure of dispersion
(DP) discussed above were calculated. To avoid an
unmanageable multiplication of analyses, these measures
were not calculated separately for all of the three
collocation counts. For the reasons described in the
previous paragraph, counts of lemmatized frequency with a
span of 4:4 were used for this purpose. As a second step,
separate frequency data were provided for each genre within
the two corpora, i.e. in COCA: Academic; Fiction; Magazine;
Newspapers; Spoken. In BNC: Academic; Fiction; Magazine;
Newspapers; Non-academic; Spoken. Again to avoid an
unsustainable multiplication of analyses, only lemmatized
collocation frequency with a span of 4:4 was used for each
genre.
3.4 Analysis
Data analysis took place in two stages. First, for each
test, the percentage of learners correctly answering each
item was correlated with each of the frequency measures
described above. Second, a meta-analysis was conducted to
find the average correlations across all 19 tests. While
the first stage is straightforward, the second is more
complicated and will be described here in detail. The
procedures described here draw on the guidance provided by
Lipsey & Wilson (2000).
The aims of a meta-analysis are to provide a single mean
effect size which summarizes results from different studies
and to determine the variation between different studies.
While the former gives an overall indication of the
influence of the main predictor variable, the latter allows
examination of what other variables moderate this effect.
Because studies which are conducted with a large number of
participants are, other things being equal, more likely to
provide a reliable effect size than studies based on
smaller samples, the mean effect size is weighted to give
more importance to studies with larger subject samples.
Weighting is achieved by multiplying each effect size by
the inverse of the standard error for the sample. Because
correlations have problematic standard error formulations,
they are usually transformed using Fisher’s Z-transform
before the weighting takes place. Z-transformed
correlations are calculated using the formula:
ES_zr = 0.5 × loge[(1 + r) / (1 − r)]
Once Fisher’s Z transformation has been made, the mean
weighted effect size is found by:
(i) Calculating a weighting for each effect size.
This is the inverse of the variance for the sample. In
the present case
SE_zr = 1 / √(n − 3)

w_zr = 1 / SE_zr² = n − 3
where n is the sample size;
(ii) Calculating weighted effect sizes by multiplying
each effect size by its weighting;
(iii) Calculating mean weighted effect size by
dividing the sum of weighted effect sizes by the sum of
weightings;
(iv) Calculating the standard error of the weighted
mean effect size. This is calculated as:
SE_ES = √(1 / Σwi)
(v) Calculating the 95% confidence intervals for the
mean using the standard error. This is calculated by
adding/subtracting the product of the standard error and
the critical value for the z-distribution (1.96)
to/from the mean weighted effect size:
ES_L = ES − 1.96 × SE_ES
ES_U = ES + 1.96 × SE_ES
(vi) Converting the mean correlation and confidence
interval from Z-transformed figures back to the
original correlation type using the inverse
transformation:
r = (e^(2 × ES_zr) − 1) / (e^(2 × ES_zr) + 1)
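Steps (i)-(vi) can be combined into a single routine. The following is a Python sketch of this fixed-effect meta-analysis of correlations; the function name, and the correlations and sample sizes passed in at the bottom, are invented for illustration.

```python
import math

def meta_analyze_correlations(rs, ns):
    """Weighted mean correlation with a 95% CI, via Fisher's Z.

    rs -- per-study correlations
    ns -- per-study sample sizes
    """
    # Fisher Z-transform each correlation
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in rs]
    # (i) weight = inverse variance = n - 3
    ws = [n - 3 for n in ns]
    # (ii)-(iii) weighted mean effect size
    mean_z = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # (iv) standard error of the weighted mean
    se = math.sqrt(1 / sum(ws))
    # (v) 95% confidence interval on the Z scale
    z_low, z_high = mean_z - 1.96 * se, mean_z + 1.96 * se
    # (vi) back-transform to the correlation scale
    def inverse_z(z):
        return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)
    return inverse_z(mean_z), inverse_z(z_low), inverse_z(z_high)

# Invented example: three tests with differing correlations and samples
mean_r, ci_low, ci_high = meta_analyze_correlations([0.26, 0.10, 0.40],
                                                    [60, 35, 90])
```

For the analysis by items, the same routine applies with ns taken as the number of collocations on each test rather than the number of participants.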
As discussed above, the aim of this meta-analysis is to
allow generalization both to a wider body of L2 learners
and to a broader population of collocations. For this
reason, there are two sample sizes of relevance: the number
of participants taking a test, and the number of
collocations included on that test. For this reason, two
meta-analyses were performed, one for each sample size.
Meta-analyses rely on the assumption that results from
the different effect sizes they combine are independent of
each other. This assumption is usually thought to be met if
no more than one effect size in the analysis is taken from
a single subject sample, though some researchers have
argued that results conducted by the same team should also
be considered dependent (Lipsey & Wilson 2000: 112). In the
present meta-analysis, three types of violation of
independence are relevant. Firstly, as Table 2 showed,
there is some overlap between the collocations sampled in
each test. In most cases, the overlaps are small. However,
the two versions of Farghal & Obiedat’s (1995) test, the
five versions of Gyllstad’s (2007) COLLEX test and the four
versions of Kurosaki’s (2012) test have substantial
overlaps. It is therefore unlikely that the effect sizes
from the tests within each of these sets will be
independent of each other.
Secondly, the three tests conducted by Jaén (2009) were all
carried out with the same group of participants. Again,
therefore, the assumption of independence is likely not to
have been met. Thirdly, the four sets of studies just
mentioned were each conducted by the same researchers. In
addition to the overlaps in their samples, therefore, they
also fail to meet the stricter criterion that effect sizes
from studies conducted by the same researchers not be
considered independent. For this reason, the correlations
from each of these four sets of tests were combined into
four single values by taking weighted averages of the
correlations from each test. These average correlations
were then used in the meta-analysis, rather than separate
correlations for each test.
4. Results
Results from the first stage in the analysis are shown in
Table 3 (for COCA data) and 4 (for BNC data). As
collocation frequencies are not normally distributed,
Spearman’s r was used to quantify correlation. All three
counts of collocation frequency showed positive
correlations with learner knowledge for the majority of
tests, though the size of the correlation varied widely
across tests and between COCA and BNC counts (with the
former producing the higher correlations). The same pattern
holds for correlations with t-scores and conditional
probability. DP shows the expected negative correlation in
a majority of cases. The results for MI show a high degree
of variability, with positive correlations in 13 tests
using COCA data and in 9 tests using BNC data.
There are not sufficient data here to enable a reliable
analysis of factors that might affect variation in scores
between tests. However, it is worth looking at how these
data vary across potential moderators. This is important
both to provide clues as to potential effects that future
research might investigate and to support interpretation of
the meta-analysis, which relies on the assumption that
effect sizes come from a single population and that
differences between effect sizes are due to random errors,
rather than systematic moderating factors. Section 3.2
described four variables that might moderate the current
findings: learners’ experience of studying English (English
majors vs. non-English majors); learners’ L1; task type
(selected-response vs. translation); and whether test-
takers are asked to provide the whole collocation or only
the collocate.
Table 3. Correlations of learner knowledge with COCA frequency data (Spearman’s r)

Test | Lemma-Lemma 4:4 | Lemma-Lemma 9:9 | Form-Form 4:4 | t-score | MI | Conditional probability | Gries’s DP
Abdul-Fattah | 0.23 | 0.16 | 0.43 | 0.25 | 0.32 | 0.24 | 0.13
Barfield | 0.26 | 0.31 | 0.23 | 0.24 | -0.31 | -0.14 | -0.24
Brashi | 0.49 | 0.53 | 0.40 | 0.45 | -0.22 | 0.24 | -0.08
Farghal & Obiedat (Test 1) | 0.34 | 0.34 | 0.25 | 0.45 | 0.51 | 0.60 | -0.80
Farghal & Obiedat (Test 2) | 0.26 | 0.22 | 0.27 | 0.26 | 0.23 | 0.07 | -0.31
Gyllstad (COLLEX 1) | 0.45 | 0.44 | 0.38 | 0.45 | 0.08 | 0.52 | -0.26
Gyllstad (COLLEX 2) | 0.57 | 0.56 | 0.39 | 0.57 | -0.03 | 0.33 | 0.07
Gyllstad (COLLEX 3) | 0.39 | 0.37 | 0.27 | 0.39 | 0.16 | 0.17 | -0.05
Gyllstad (COLLEX 4) | 0.23 | 0.22 | 0.19 | 0.22 | -0.08 | 0.15 | -0.14
Gyllstad (COLLEX 5) | 0.07 | 0.05 | 0.10 | 0.06 | 0.04 | 0.14 | -0.10
Jaén (Test 1) | 0.47 | 0.49 | 0.43 | 0.45 | 0.15 | 0.35 | -0.64
Jaén (Test 2) | 0.15 | 0.16 | 0.12 | 0.14 | 0.09 | 0.22 | -0.35
Jaén (Test 4) | 0.10 | 0.13 | 0.03 | 0.09 | -0.26 | 0.14 | -0.17
Koya | 0.06 | 0.03 | 0.09 | 0.09 | 0.40 | 0.10 | 0.26
Kurosaki (MC Fr) | 0.04 | -0.02 | -0.18 | 0.07 | 0.02 | 0.04 | 0.41
Kurosaki (MC Jp) | 0.42 | 0.37 | 0.19 | 0.46 | 0.07 | 0.30 | 0.29
Kurosaki (trans. Fr) | 0.52 | 0.53 | 0.35 | 0.51 | 0.21 | 0.62 | 0.33
Kurosaki (trans. JP) | 0.39 | 0.38 | 0.41 | 0.42 | 0.19 | 0.40 | 0.02
Revier | -0.06 | 0.01 | 0.18 | -0.02 | -0.45 | -0.24 | -0.29
Table 4. Correlations of learner knowledge with BNC frequency data (Spearman’s r)

Test | Lemma-Lemma 4:4 | Lemma-Lemma 9:9 | Form-Form 4:4 | t-score | MI | Conditional probability | Gries’s DP
Abdul-Fattah | 0.01 | 0.15 | 0.09 | 0.29 | 0.16 | 0.23 | 0.06
Barfield | 0.06 | 0.12 | 0.16 | 0.05 | -0.30 | -0.22 | -0.13
Brashi | 0.57 | 0.62 | 0.51 | 0.57 | -0.11 | 0.35 | -0.43
Farghal & Obiedat (Test 1) | 0.28 | 0.34 | 0.11 | 0.22 | 0.17 | 0.32 | -0.28
Farghal & Obiedat (Test 2) | 0.04 | 0.12 | 0.11 | 0.01 | -0.10 | -0.13 | 0.09
Gyllstad (COLLEX 1) | 0.32 | 0.33 | 0.29 | 0.31 | -0.03 | 0.45 | -0.29
Gyllstad (COLLEX 2) | 0.44 | 0.44 | 0.27 | 0.44 | -0.15 | 0.22 | 0.02
Gyllstad (COLLEX 3) | 0.14 | 0.16 | 0.12 | 0.14 | 0.06 | 0.02 | -0.01
Gyllstad (COLLEX 4) | 0.08 | 0.08 | 0.17 | 0.08 | -0.09 | 0.13 | -0.04
Gyllstad (COLLEX 5) | 0.03 | 0.04 | 0.18 | 0.03 | 0.00 | 0.14 | 0.02
Jaén (Test 1) | 0.35 | 0.39 | 0.34 | 0.34 | -0.20 | 0.38 | -0.35
Jaén (Test 2) | -0.11 | -0.06 | -0.03 | -0.11 | -0.18 | 0.13 | -0.22
Jaén (Test 4) | 0.20 | 0.20 | 0.02 | 0.19 | -0.22 | 0.27 | -0.17
Koya | -0.15 | -0.14 | -0.02 | -0.13 | 0.33 | -0.02 | -0.03
Kurosaki (MC Fr) | 0.07 | 0.05 | 0.00 | 0.10 | 0.10 | 0.21 | 0.01
Kurosaki (MC Jp) | 0.38 | 0.37 | 0.15 | 0.38 | 0.03 | 0.20 | -0.06
Kurosaki (trans. Fr) | 0.31 | 0.34 | 0.13 | 0.30 | 0.03 | 0.47 | 0.24
Kurosaki (trans. JP) | 0.26 | 0.25 | 0.19 | 0.27 | 0.11 | 0.30 | 0.03
The boxplots in Figures 1-4 show the spreads of
correlations between learner knowledge and each frequency
measure for each of these variables. Space restrictions do
not allow figures to be included for all analyses, so
lemmatized 4:4 counts only are used to represent frequency
counts. Patterns were similar across 9:9 span and non-
lemmatized counts. The majority of the plots do not show
any obvious differences between groups. The only strong
difference is seen in Figure 3, which shows that the three
tests which use neither translation nor selected-response
formats tend to show a negative correlation with DP which is
not evident in the other test types. There was also a
(slightly weaker) tendency for English majors to show a
negative correlation with DP, which was not present for the
non-English majors (see Figure 1). While these patterns are
weak, and based on a relatively small number of cases
(especially the “other” category of test type), they do
suggest that care will need to be taken in the
interpretation of meta-analysis results related to DP.
The aim of the meta-analysis is to provide a clearer
overview of the trends in these rather mixed data. Results
are shown in Table 5. For COCA data, both frequency and t-
score show weak-to-moderate correlations with learner
knowledge. No clear differences are evident between
different frequency counts: in the by-item analysis,
lemmatized counts had a stronger correlation, while in the
by-participants analysis, the non-lemmatized count did
better. Differences in span also did not seem to affect the
correlation in a consistent way.
Evidence for the other measures was rather mixed.
Conditional probability and DP both showed weak
correlations with knowledge (the former positive, the
latter negative), though neither was statistically reliable
in the by-items analysis. Results for MI are inconsistent,
showing a small negative correlation in the analysis by
items and a positive correlation in the analysis by
participants.
Data from the BNC correlated more weakly with learner
knowledge. As with COCA, there was little difference
between the various measures of collocation frequency. DP
shows a negative (and statistically reliable) correlation
with knowledge at about the same level (r= - .12/-.10) as
that seen for COCA. Conditional probability also showed a
correlation of similar magnitude to that seen for COCA, but
this was again unreliable in the by-items analysis. MI
showed only weak, and statistically unreliable, negative
correlations.
Figure 1: Correlations across English majors (shaded) vs. non-English majors (white). [Boxplots omitted: panels Lemma-lemma ±4, MI, Conditional probability and DP, each shown for COCA and BNC.]

Figure 2: Correlations across L1s (Arabic (dark grey) / European (light grey) / Japanese (white)). [Boxplots omitted: panels Lemma-lemma ±4, Lemma-lemma ±9, Form-form ±4, MI, Conditional probability and DP, each shown for COCA and BNC.]

Figure 3: Correlations across test types (selected response (dark grey) / translation (light grey) / other (white)). [Boxplots omitted: same panels as Figure 2.]

Figure 4: Correlations across collocate-only (shaded) vs. whole-collocation (white) tests. [Boxplots omitted: same panels as Figure 2.]
Table 5. Weighted mean correlations of learner knowledge with frequency data (Spearman’s r; 95% confidence intervals in brackets)

Measure | COCA, by items | COCA, by participants | BNC, by items | BNC, by participants
Lemma-Lemma ±4 | 0.26 [0.14, 0.38] | 0.20 [0.14, 0.26] | 0.10 [-0.01, 0.22] | 0.07 [0.01, 0.13]
Lemma-Lemma ±9 | 0.27 [0.16, 0.39] | 0.19 [0.12, 0.25] | 0.14 [0.03, 0.26] | 0.13 [0.07, 0.20]
Form-Form ±4 | 0.24 [0.12, 0.35] | 0.27 [0.20, 0.33] | 0.13 [0.02, 0.25] | 0.09 [0.03, 0.16]
t-score | 0.26 [0.14, 0.38] | 0.21 [0.14, 0.27] | 0.11 [0.00, 0.23] | 0.15 [0.10, 0.22]
MI | -0.02 [-0.12, 0.10] | 0.10 [0.04, 0.17] | -0.07 [-0.17, 0.04] | -0.02 [-0.07, 0.04]
Conditional probability | 0.09 [-0.02, 0.21] | 0.15 [0.09, 0.22] | 0.04 [-0.07, 0.15] | 0.15 [0.09, 0.22]
Gries’s DP | -0.09 [-0.19, 0.02] | -0.08 [-0.14, -0.02] | -0.12 [-0.22, -0.02] | -0.10 [-0.15, -0.04]
Table 6 shows the correlations between learners’ knowledge
and collocation frequency in the separate genres of each
corpus. Since there is little reason to believe that the
relationship between genre and knowledge will vary across
different test types, Figures 5 and 6 show boxplots for
correlations across the groups of L1 and English major vs.
non-English major only. As before, the majority of
comparisons show little evidence of patterning across
groups. However, the Academic genre shows a stronger
correlation in the English-major than the non-English-major
groups and, for the COCA corpus, a trend whereby L1 Arabic
students show the strongest correlation, followed by
European students, with L1 Japanese students showing the
weakest.
The results of meta-analyses averaging these figures
across tests are shown in Table 7. Across both corpora and
both types of analysis, the fiction genre shows the
strongest correlations with learner knowledge while the
academic genre shows the weakest correlations.
5. Discussion
The findings reported above show that some types of corpus
data are reliably related to learner knowledge. This is
consistent with previous research which has suggested that
collocation learning and processing is related to frequency
(Durrant & Schmitt 2010, Siyanova-Chanturia et al. 2011,
Wolter & Gyllstad 2013). It extends the findings of
previous studies by showing: (i) that this relationship is
statistically reliable across a wide range of learners and
collocations; (ii) that it is reflected in a level of
learner knowledge that can be tapped through traditional
test formats; (iii) how the relationship between frequency
and knowledge varies across different corpora and frequency
measures and across different learner groups. COCA was
more strongly related to learner knowledge than the older
and smaller BNC, and data from the fiction sub-corpora of
each corpus were more strongly related with knowledge than
those from other registers. Frequency data from academic
registers had the weakest relationship with knowledge.
Correlations with the academic genre were higher for
English majors than non-English majors and for students
from Arabic-speaking countries than for those from Japan
(with European students being intermediate between the
two).
Table 6. Correlations of learner knowledge with genre frequency data (Spearman’s r)

Test | COCA acad. | COCA fict. | COCA mag. | COCA news. | COCA spoken | BNC acad. | BNC fict. | BNC mag. | BNC news. | BNC non-acad. | BNC spoken
Abdul-Fattah | 0.26 | 0.28 | 0.42 | 0.20 | 0.19 | -0.14 | 0.25 | 0.05 | 0.18 | 0.04 | 0.03
Barfield | 0.16 | 0.34 | 0.30 | 0.12 | 0.23 | 0.01 | 0.25 | 0.11 | -0.03 | -0.03 | 0.12
Brashi | 0.30 | 0.39 | 0.42 | 0.47 | 0.39 | 0.47 | 0.49 | 0.60 | 0.54 | 0.51 | 0.60
Farghal & Obiedat (Test 1) | 0.51 | 0.11 | 0.34 | 0.40 | 0.78 | 0.69 | 0.11 | -0.14 | 0.26 | 0.37 | 0.68
Farghal & Obiedat (Test 2) | 0.30 | 0.17 | 0.26 | 0.42 | 0.17 | -0.33 | -0.10 | 0.12 | -0.09 | -0.03 | -0.20
Gyllstad (COLLEX 1) | 0.28 | 0.49 | 0.41 | 0.44 | 0.51 | 0.18 | 0.41 | 0.20 | 0.39 | 0.26 | 0.36
Gyllstad (COLLEX 2) | 0.33 | 0.62 | 0.50 | 0.44 | 0.52 | 0.34 | 0.50 | 0.37 | 0.29 | 0.28 | 0.43
Gyllstad (COLLEX 3) | 0.19 | 0.42 | 0.32 | 0.31 | 0.35 | 0.08 | 0.32 | 0.16 | 0.00 | 0.07 | 0.16
Gyllstad (COLLEX 4) | 0.13 | 0.41 | 0.22 | 0.11 | 0.27 | 0.04 | 0.35 | 0.10 | -0.02 | 0.05 | 0.20
Gyllstad (COLLEX 5) | 0.06 | 0.24 | 0.12 | 0.00 | 0.13 | 0.03 | 0.26 | -0.02 | 0.04 | 0.09 | 0.08
Jaén (Test 1) | 0.32 | 0.70 | 0.57 | 0.43 | 0.40 | 0.25 | 0.64 | 0.31 | 0.27 | 0.22 | 0.31
Jaén (Test 2) | -0.26 | 0.38 | 0.21 | 0.18 | 0.31 | -0.26 | 0.34 | 0.21 | 0.31 | -0.24 | 0.09
Jaén (Test 4) | 0.19 | 0.07 | 0.07 | 0.10 | 0.14 | 0.27 | 0.14 | 0.23 | 0.38 | 0.23 | 0.10
Koya | -0.11 | 0.24 | -0.04 | -0.03 | 0.02 | -0.16 | 0.11 | -0.13 | -0.20 | -0.21 | -0.05
Kurosaki (MC Fr) | -0.26 | 0.22 | 0.03 | 0.07 | -0.02 | -0.31 | 0.11 | 0.14 | 0.09 | -0.17 | 0.11
Kurosaki (MC Jp) | 0.08 | 0.56 | 0.45 | 0.44 | 0.34 | 0.11 | 0.41 | 0.27 | 0.42 | 0.21 | 0.21
Kurosaki (trans. Fr) | 0.25 | 0.61 | 0.49 | 0.52 | 0.50 | 0.14 | 0.38 | 0.28 | 0.32 | 0.19 | 0.39
Kurosaki (trans. JP) | 0.09 | 0.52 | 0.41 | 0.45 | 0.46 | 0.12 | 0.43 | 0.20 | 0.30 | 0.14 | 0.28
Revier | 0.13 | 0.29 | 0.10 | 0.02 | 0.13 | 0.01 | 0.13 | 0.20 | 0.15 | 0.05 | 0.10
Figure 5: Correlations by genre across English majors (shaded) vs. non-English majors (white). [Boxplots omitted: panels Academic, Fiction, Magazine, Newspaper, Non-academic and Spoken, each shown for COCA and BNC.]

Figure 6: Correlations by genre across L1s (Arabic (dark grey) / European (light grey) / Japanese (white)). [Boxplots omitted: same panels as Figure 5.]
Table 7. Weighted mean correlations of learner knowledge with genre frequency data (Spearman’s r; 95% confidence intervals in brackets)

Genre | COCA, by items | COCA, by participants | BNC, by items | BNC, by participants
Academic | 0.12 [0.01, 0.24] | 0.15 [0.09, 0.22] | 0.03 [-0.08, 0.14] | -0.01 [-0.07, 0.05]
Fiction | 0.39 [0.27, 0.52] | 0.38 [0.31, 0.44] | 0.29 [0.17, 0.41] | 0.30 [0.24, 0.37]
Magazine | 0.26 [0.15, 0.39] | 0.32 [0.26, 0.39] | 0.13 [0.02, 0.25] | 0.12 [0.06, 0.19]
Newspaper | 0.19 [0.07, 0.31] | 0.20 [0.13, 0.26] | 0.08 [-0.03, 0.20] | 0.16 [0.10, 0.23]
Non-academic | n/a | n/a | 0.03 [-0.08, 0.14] | 0.03 [-0.03, 0.09]
Spoken | 0.25 [0.13, 0.37] | 0.25 [0.18, 0.31] | 0.15 [0.04, 0.27] | 0.12 [0.05, 0.18]
The finding regarding English majors is an intuitively
satisfying one, since these students seem likely to have
had greater exposure to academic writing in English than
the other test-takers. The finding regarding L1s is
difficult to interpret in the absence of a survey of the
materials typically used across these different regional
contexts.
A priority for future research should be to understand
more about the relationship between types of corpora and
learner knowledge. This relationship is likely to depend on
the degree of fit between a particular corpus and a
particular group of learners, and it is important to
understand these group differences if we are to produce
fair tests and valid research. In the present study, group
differences only became evident when drawing on the
relatively specialized academic sub-corpora of COCA and the
BNC. The full national corpora, and the other sub-corpora
within them, did not evidence strong group-based biases.
This suggests that principled choice of corpora is needed
to create tests that can be used fairly across groups.
Given the limited range of groups covered, however, further
research is needed to understand these relationships.
A small but statistically reliable (in three of the four
analyses) relationship was also found between learners’
knowledge and the dispersion of collocations across the
corpora. This suggests that it may be worth incorporating
this measure into future studies of collocation and
sampling strategies for collocation tests. However, it is
important to note that the effect sizes for DP may not have
been equal across groups, and the assumptions of the meta-
analysis therefore violated. It would be of interest for
future research to understand whether (and why) the
influence of DP can be shown to differ consistently across
learner groups and test types.
The relationships between learner knowledge and
conditional probability and MI were inconsistent. The
former failed to show a statistically reliable correlation
in two out of the four meta-analyses; the latter failed to
show a relationship in three out of four. The sizes of
relationships were also small and – in the case of MI –
inconsistent in terms of direction (conditional probability
achieving rs of .09, .15, .04 and .15; MI achieving
-.02, .10, -.07 and -.02).
To my knowledge, this is the first research to test the
relationship between conditional probability and learner
knowledge. The finding with regard to MI, however, is
consistent with Ellis et al. (2008), who find in a range of
psycholinguistic experiments that, while native speakers’
accuracy and fluency of processing lexical bundles
increased as MI score increased, the same was not true of
advanced L2 learners at a US university. The confirmation
of this finding here suggests that learners’ lack of
sensitivity to MI is a general one, applicable to two-word
collocations, as well as lexical bundles; to knowledge as
it is tapped by traditional test methods, as well as to
processing speed and accuracy; and to a wide range of L2
learners in EFL as well as ESL contexts.
As was discussed above, both MI and conditional
probability differ from raw frequency measures and t-score
in that they depend primarily on the strength of attraction
between a collocation’s elements, rather than on its
overall frequency as an item. Native speakers’ sensitivity
to, and ability to acquire, relatively infrequent, but
strongly associated, pairs suggests that they are aware,
not only of how frequent a collocation is, but of how
frequent its parts are, compared to the frequency of the
whole. Non-natives, it seems, do not typically have this
awareness. Much discussion with regard to collocation
learning has started from the suggestion that L1 learners
pay attention to chunks of language, ignoring their
constituent parts, while L2 learners focus on individual
words (e.g. Wray 2002). The present findings, and those of
Ellis et al (2008) seem to point to the opposite
conclusion: L1 learners notice both collocations and their
components, while L2 learners focus only on the whole
collocation.
Such a model would provide an explanation for the
results of Durrant & Schmitt (2009), who report that L2
learners’ writing makes extensive use of high-frequency
collocations, but underuses lower-frequency collocations
with high MIs (e.g. densely populated, bated breath, arbiter taste).
It is the underuse of this type of collocation, Durrant &
Schmitt argue, rather than a lack of collocation knowledge
in general, that accounts for the often-reported sense that
L2 writing lacks idiomaticity due to a lack of collocations
(e.g. Kjellmer 1990). If L2 learners tend to notice only
collocation frequency, and not strength of attraction, it
is unsurprising that they fail to develop a repertoire of
low-frequency, high-MI collocations.
Some theoretical models (most prominently, Nation 2001)
conceptualize collocation knowledge as an aspect of single-
word knowledge: one thing we know about a word is the other
words with which it is likely to collocate. It is also
possible to see collocation knowledge as a construct in its
own right. The definition of collocations outlined in
Section 2 allows for collocations to be known as
independent entities, divorced from knowledge of any
individual words. It seems likely that a comprehensive
model of collocation knowledge would need to combine these
perspectives. However, the above considerations suggest
that the single-word perspective may be less pertinent in
non-native than in native speaker knowledge. Models of
collocation knowledge covering both native and non-native
learners will need to take this possibility into account.
A final point that should be addressed concerns the
sparse nature of the data which I was able to accumulate
for analysis. My review of the literature identified 85
studies reporting tests of collocation knowledge. However,
little more than half of these (46/85) indicated the items
they had used in their tests. Only about one study in six
provided results on an item-by-item basis. In all general
tests of collocation, the sample of items tested is, of
necessity, small in proportion to the population of
collocations to which we wish to generalize. Given this, I
would argue that providing detailed information about the
contents of each test is vital if meaningful conclusions
are to be drawn and I would call on all researchers working
in this area to provide such information in future work.
6. Conclusions
The analyses presented here suggest that frequency data
should be used as part of the process of sampling
collocations for selective tests. However, they also show
that different corpora correlate differently with knowledge
in different groups of learners. Testers and researchers
need to take this variation into account in their test
designs.
While frequency is clearly important, it will never
predict knowledge perfectly. As Ellis & Larsen-Freeman
(2006) warn, learning is an outcome of complex processes
and single factors rarely account for more than 16-25% of
the variance in knowledge. The inevitable mismatch between
the contents of corpora and learners’ actual experience of
the language, and the error built into frequency counts by
our current inability to distinguish between different uses
of polysemous items, also mean that the frequency-knowledge
relationship will be a noisy one.
Though tests of single-word vocabulary have tended to focus
on frequency in sampling, therefore, it is important to
remember that this can provide only a rudimentary strategy.
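The 16-25% figure follows directly from the arithmetic of correlation: the share of variance a single predictor explains is the squared correlation coefficient, so correlations of .40 and .50 bound that range:

```python
# Variance explained by a single predictor is the squared correlation:
# r values of .40 and .50 correspond to the 16-25% range cited above.
for r in (0.40, 0.50):
    print(f"r = {r:.2f}  ->  variance explained = {r * r:.0%}")
```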
To achieve more generalizable results, we need to work
towards integrating further variables into the sampling
frame. Previous research on collocation suggests that, at
the very least, the influence of the L1 should be included
(Wolter & Gyllstad 2011, Yamashita & Jiang 2010), and the
current study also suggests that learners from different
language backgrounds and different types of language
education may be differently sensitive to frequencies in
particular registers. This argues for a multi-dimensional
approach to item sampling, which we are not yet in a
position fully to specify.
Further work is needed, therefore, to investigate
exactly how frequency and other variables can be combined
in a balanced strategy. Candidate variables might include
L1, learning context, learner proficiency, collocation type
and test type. We also need to acknowledge that the
unpredictable interaction of variables, many of which
cannot be measured, implies that no sampling strategy will
ever enable error-free generalizations beyond a sample. It
is important, therefore, to determine how much variation is
likely to be missed by our sampling strategies so that we
can hedge our interpretations accordingly. Until we have
more information on these issues, researchers’ and testers’
claims based on tests of collocation knowledge (and,
indeed, tests of vocabulary more generally) need to be
treated with appropriate caution.
This study has also provided further evidence for L2
learners’ lack of sensitivity to the strength of attraction
between words (as opposed to overall collocational
frequency). I have suggested that this may indicate a
holistic approach to collocation learning, which does not
take account of the frequencies of individual component
words. I have argued that this provides an explanation for
previous findings that L2 learners’ writing is
distinguished from that of native speakers by the
relatively low levels of use of collocations which are
infrequent but highly salient to natives because of the
strength of attraction between their parts. Further
research is needed to test this possibility.
Finally, it has become apparent in the course of this
study that research on collocation knowledge often suffers
from insufficient reporting of the details of the tests
used. It is important that future research in this area
details the contents of tests and provides statistics which
enable readers to understand the impacts of different items
on overall test scores.
References
Abdul-Fattah, H. S. 2001. “Collocation: A missing chain from
Jordanian basic education stage English language
curriculum and pedagogyˮ. Dirasat, Human and Social Sciences, 28
(2), 582-596.
Barfield, A. 2003. Collocation Recognition and Production: Research Insights.
Tokyo: Chuo University.
Barfield, A. & Gyllstad, H. 2009. “Introduction: Researching L2
collocation knowledge and developmentˮ. In A. Barfield &
H. Gyllstad (Eds.), Researching Collocations in Another Language.
Basingstoke: Palgrave Macmillan, 1-18.
Biber, D. 2009. “A corpus-driven approach to formulaic language
in English: Multi-word patterns in speech and writingˮ.
International Journal of Corpus Linguistics, 14 (3), 275-311.
Biber, D., Johansson, S., Leech, G., Conrad, S. & Finegan, E.
1999. Longman Grammar of Spoken and Written English. Harlow:
Longman.
Bonk, W. J. 2001. “Testing ESL learnersʼ knowledge of
collocationsˮ. In T. Hudson & J. D. Brown (Eds.), A Focus on
Language Test Development. Honolulu: University of Hawaii
Press, 113-142.
Brashi, A. 2009. “Collocability as a problem in L2 productionˮ.
Reflections in English Language Teaching, 8 (1), 21-34.
Cheng, W., Greaves, C. & Warren, M. 2006. “From n-gram to
skipgram to concgramˮ. International Journal of Corpus Linguistics, 11
(4), 411-433.
Clear, J. 1993. “From Firth principles: Computational tools for
the study of collocationsˮ. In M. Baker, G. Francis & E.
Tognini-Bonelli (Eds.), Text and Technology: in Honour of John
Sinclair. Amsterdam: John Benjamins, 271-292.
Cooper, H. 1998. Synthesizing Research: A Guide for Literature Reviewers.
London: Sage.
Cowie, A. P. (Ed.) 1998. Phraseology: Theory, Analysis, and Applications.
Oxford: Oxford University Press.
Davies, M. 2004-. BYU-BNC (Based on the British National Corpus
from Oxford University Press). Available online at
http://corpus.byu.edu/bnc/ (accessed July 2014).
Davies, M. 2008-. The Corpus of Contemporary American English (COCA):
450 million words,
1990-present. Available online at:
http://corpus.byu.edu/coca/ (accessed July 2014).
Durrant, P. 2008. High-frequency Collocations and Second Language Learning.
Unpublished PhD thesis, University of Nottingham,
Nottingham.
Durrant, P. & Doherty, A. 2010. “Are high-frequency collocations
psychologically real? Investigating the thesis of
collocational primingˮ. Corpus Linguistics and Linguistic Theory, 6
(2), 125-155.
Durrant, P. & Mathews-Aydinli, J. 2011. “A function-first
approach to identifying formulaic language in academic
writingˮ. Journal of English for Specific Purposes, 30 (1), 58-72.
Durrant, P. & Schmitt, N. 2009. “To what extent do native and
non-native writers make use of collocations?ˮ. International
Review of Applied Linguistics, 47 (2), 157-177.
Durrant, P. & Schmitt, N. 2010. “Adult learnersʼ retention of
collocations from exposureˮ. Second Language Research, 26 (2),
163-188.
Ellis, N. C. 2001. “Memory for languageˮ. In P. Robinson (Ed.),
Cognition and Second Language Instruction. Cambridge: Cambridge
University Press, 33-68.
Ellis, N. C. & Larsen-Freeman, D. 2006. “Language emergence:
Implications for applied linguistics – Introduction to the
Special Issueˮ. Applied Linguistics, 27 (4), 558-589.
Ellis, N. C., Simpson-Vlach, R. & Maynard, C. 2008. “Formulaic
language in native and second-language speakers:
Psycholinguistics, corpus linguistics, and TESOLˮ. TESOL
Quarterly, 41 (3), 375-396.
Gardner, D. 2008. “Validating the construct of word in applied
corpus-based vocabulary research: A critical surveyˮ.
Applied Linguistics, 28 (2), 241-265.
Goldberg, A. E. 2006. Constructions at Work: The Nature of Generalization in
Language. Oxford: Oxford University Press.
Gries, S. T. 2008. “Dispersions and adjusted frequencies in
corporaˮ. International Journal of Corpus Linguistics, 13 (4), 403-
437.
Gyllstad, H. 2007. Testing English Collocations: Developing Receptive Tests for
Use with Advanced Swedish Learners. PhD thesis. Lund University, Lund.
Halliday, M. A. K. 1966. “Lexis as a linguistic levelˮ. In C. E.
Bazell, J. C. Catford, M. A. K. Halliday & R. H. Robins
(Eds.), In Memory of J. R. Firth. London: Longmans, Green and Co.
Ltd., 148-162.
Hoey, M. 1991. Patterns of Lexis in Text. Oxford: Oxford University
Press.
Hoey, M. 2005. Lexical Priming: A New Theory of Words and Language. London:
Routledge.
Howarth, P. 1998. “The phraseology of learnersʼ academic
writingˮ. In A. P. Cowie (Ed.), Phraseology: Theory, Analysis, and
Applications. Oxford: Oxford University Press, 161-186.
Jaén, M. M. 2009. Recopilación, Desarrollo Pedagógico y Evaluación de un Banco
de Colocaciones Frecuentes de la Lengua Inglesa a Través de la Lingüística de
Corpus y Computacional. Unpublished PhD thesis. Universidad de
Granada, Granada.
Jones, S. & Sinclair, J. M. 1974. “English lexical collocations.
A study in computational linguisticsˮ. Cahiers de Lexicologie,
24 (2), 15-61.
Kjellmer, G. 1990. “A mint of phrasesˮ. In K. Aijmer & B.
Altenberg (Eds.), English Corpus Linguistics: Studies in Honour of Jan
Svartvik. London: Longman, 111-127.
Kurosaki, S. 2012. An Analysis of the Knowledge and Use of English Collocations
by French and Japanese Learners. Unpublished PhD thesis.
University of London Institute in Paris, Paris.
Larsen-Freeman, D. & Cameron, L. 2008. Complex Systems and Applied
Linguistics. Oxford: Oxford University Press.
Lewis, M. 1993. The Lexical Approach: The State of ELT and a Way Forward.
London: Thomson Heinle.
Lipsey, M. W. & Wilson, D. B. 2000. Practical Meta-Analysis. London:
Sage.
Manning, C. D. & Schütze, H. 1999. Foundations of Statistical Natural
Language Processing. Cambridge, MA: MIT Press.
Milton, J. 2009. Measuring Second Language Vocabulary Acquisition.
Bristol: Multilingual Matters.
Nation, P. 1990. Teaching and Learning Vocabulary. Boston: Heinle and
Heinle.
Nation, P. 2001. Learning Vocabulary in Another Language. Cambridge:
Cambridge University Press.
Nattinger, J. R. & DeCarrico, J. S. 1992. Lexical Phrases and Language
Teaching. Oxford: Oxford University Press.
Nesselhauf, N. 2004. “What are collocations?ˮ. In D. J.
Allerton, N. Nesselhauf & P. Skandera (Eds.), Phraseological
Units: Basic Concepts and their Application. Basel: Schwabe, 1-21.
Norris, J. M. & Ortega, L. 2006. “The value and practice of
research synthesis for language learning and teachingˮ. In
J. M. Norris and L. Ortega (Eds.). Synthesizing Research on
Language Learning and Teaching. Amsterdam: John Benjamins, 3-50.
Palmer, H. E. 1933. Second Interim Report on English Collocations. Tokyo:
Kaitakusha.
Pawley, A. & Syder, F. H. 1983. “Two puzzles for linguistic
theory: Nativelike selection and nativelike fluencyˮ. In
J. C. Richards & R. W. Schmidt (Eds.), Language and
Communication. New York: Longman, 191-226.
Read, J. 2000. Assessing Vocabulary. Cambridge: Cambridge University
Press.
Revier, R. L. 2009. “Evaluating a new test of whole English
collocationsˮ. In A. Barfield & H. Gyllstad (Eds.),
Researching Collocations in Another Language. Basingstoke: Palgrave
Macmillan, 125-138.
Schmitt, N. 2010. Researching Vocabulary: A Vocabulary Research Manual.
Basingstoke: Palgrave Macmillan.
Schmitt, N. & Zimmerman, C. B. 2002. “Derivative word forms:
What do learners know?ˮ. TESOL Quarterly, 36 (2), 145-171.
Shin, D. & Nation, P. 2008. “Beyond single words: The most
frequent collocations in spoken Englishˮ. ELT Journal, 62
(4), 339-348.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford
University Press.
Sinclair, J. M. 2004. “The search for units of meaningˮ. In J.
M. Sinclair, Trust the Text: Language, Corpus and Discourse. London:
Routledge, 24-48.
Siyanova-Chanturia, A., Conklin, K. & van Heuven, W. J. B. 2011.
“Seeing a phrase ʻtime and againʼ matters: The role of
phrasal frequency in the processing of multiword
sequencesˮ. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 37 (3), 776-784.
Stefanowitsch, A. & Gries, S. T. 2003. “Collostructions:
Investigating the interaction of words and constructionsˮ.
International Journal of Corpus Linguistics, 8 (2), 209-243.
Stubbs, M. 1996. Text and Corpus Analysis. Oxford: Blackwell.
Taeko, K. 2005. The Acquisition of Basic Collocations by Japanese Learners of
English. Waseda University.
Webb, S. & Kagimoto, E. 2011. “Learning collocations: Do the
number of collocates, position of the node word, and
synonymy affect learning?ˮ. Applied Linguistics, 32 (3), 259-
276.
Wolter, B. & Gyllstad, H. 2011. “Collocational links in the L2
mental lexicon and the influence of L1 intralexical
knowledgeˮ. Applied Linguistics, 32 (4), 430-449.
Wolter, B. & Gyllstad, H. 2013. “Frequency of input and L2
collocational processing: A comparison of congruent and
incongruent collocationsˮ. Studies in Second Language Acquisition,
35 (3), 451-482.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge
University Press.
Yamashita, J. & Jiang, N. 2010. “L1 influence on the acquisition
of L2 collocations: Japanese ESL users and EFL learners
acquiring English collocationsˮ. TESOL Quarterly, 44 (4),
647-668.
Author’s address
Philip Durrant
Graduate School of Education
University of Exeter
St. Luke’s Campus, Heavitree Road
EX1 2LU, Exeter
UK