College Writing: a Corpus-based Analysis

COLLEGE ESSAY-WRITING: A CORPUS-BASED ANALYSIS

Teodora Popescu “1 Decembrie 1918” University of Alba Iulia

1. Introduction Starting with the 70’s, Error Analysis (EA) became a scientific method in its own

right, owing a lot to the research done by Corder (1967), Richards (1971) and Selinker (1972), who identified different aspects of the second/foreign language learners’ own language system, which is neither the L1 (mother tongue), nor the L2 (second/foreign language). The essential shift that their studies brought about in linguistics is a reassessment of the importance of errors made by ESL/EFL learners. Therefore, according to Corder (1967), a learner’s errors are not random, but systematic (unsystematic errors occur in one’s native language) and they are not negative or interfering with learning the Target Language, but on the contrary, they represent a necessary positive, facilitative factor, indispensable to the learning process, highly indicative of individual learner strategies. Further on, Richards (1971) identified three types of errors: a) interference errors generated by L1 transfer; b) intralingual errors which result from incorrect, incomplete or overgeneralised) application of language rules; c) developmental errors caused by the construction of faulty hypotheses in L2.

By the same token, Selinker (1972, and more recently, 1992) elaborated on the theory of interlanguage, by which we understand a third language, with its own lexicon, grammar and discourse structure, phonological traits, etc. The basic processes through which interlanguage is created are: language transfer (negative transfer, positive transfer, avoidance, and overuse), overgeneralization (at phonetic, grammatical, lexical, discourse level) and simplification (both syntactic and semantic).

This process-oriented approach to error-analysis (investigation into the reasons why language errors are made, and learners’ active strategies) has allowed for the adoption of a learning-based perspective. It follows that teachers now view errors as necessary stages in all language learning, as the product of intelligent cognitive strategies, hence as potentially useful indicators of what processes the student is using.

In our endeavour to investigate students’ errors occurring in essay-writing, we first tried to identify and categorise these mistakes, and further on we attempted to explore the reasons why they might have come about. In order to ascertain learners’ writing competence (in L2), we analysed learner errors from a linguistic perspective: (spelling – partly accounting for phonetic inaccuracies, morphological, syntactic, collocational and discursive – in terms of non-achieving coherence and cohesion). The approach we adopted was one provided by electronic tools of concordancing software.

2. Corpus Linguistics The term corpus, derived from the Latin word for body, was first encountered in the

6th century to refer to a collection of legal texts, Corpus Juris Civilis (Francis 1992: 17). The term corpus has preserved this initial meaning, i.e. a body of text; nevertheless this definition is not entirely satisfactory for corpus linguists. According to one of the five definitions provided by the Oxford English Dictionary, a corpus is ‘the body of written or spoken material upon which a linguistic analysis is based’. It results that a corpus is not just a collection of texts; it represents in fact ‘a collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis’ (Francis 1982: 7 apud Francis 1992: 17). Furthermore, Francis (1992) mentions three main areas in which corpora have traditionally been used: lexicographical

182

TRANSLATION STUDIES: RETROSPECTIVE AND PROSPECTIVE VIEWS. 2nd Edition. 1 – 2 November 2007. Gala i, Romania

183

studies in the creation of dictionaries, dialectological studies and the creation of grammars. Modern corpus linguists nevertheless are quite different from their early fellows. Kennedy (1992) underlined the fact that initial corpora were mostly of written texts only, just the forms were counted, not the meanings and they were untagged, so homonyms were often classed as one word. Another important reason was that traditionally, linguists had been strongly influenced by Chomsky’s theory that corpora were inadequate whereas intuition was. Chomsky contested the concept of empiricism on which corpus linguistics had been based and offered a rationalist approach instead, supporting a sort of methodology by which ‘rather than try and account for language observationally, one should try to account for language introspectively’ (McEnery & Wilson 1996: 6). Chomsky condemned corpus-based studies asserting that ‘Any natural corpus will be skewed…the corpus, if natural, will be so wildly skewed that the description would be no more than a mere list’ (Chomsky 1962: 159 apud Leech 1991: 8). This theory is not surprising as long as Chomsky, more interested in competence than performance, was against an approach that was foremost based on actual performance data.

Nonetheless corpora research continued in spite of early criticisms, and it even strengthened due to technological advances in computer software. Now it is possible to process texts of several million words in length (Sinclair 1991).

Nelson (2000) pointed out that there are several reasons that speak in favour of using corpora in linguistics analysis: objectivity vs. intuition, verifiability of results (Svartvik 1992, Biber 1996), broadness of language able to be represented (Svartvik 1992, Biber 1995, Biber, Conrad & Reppen 1994), access, broad scope of analysis, pedagogic – face validity, authenticity, motivation (Johns 1988, Tribble & Jones 1990), possibility of cumulative results (Biber 1995), accountability, reliability, view of all language (Sinclair 2000).

3. Methodology The learner essay-writing corpus (LEWC) was created by collecting the essays

written by 30 Romanian-speaking university students of economics in their 2nd year at an intermediate level of language learning. We need to mention that students typed their own essays, which were subsequently compiled by the teacher. This fact might a priori account for the automatic correction of some spelling mistakes (made by the word editor students used), as well as for a limited amount of correction in the case of morphological or syntactic errors. Results would have undoubtedly been different had we had our students handwrite their essays. The essays are argumentative, non-technical, including titles such as ‘Crime does not pay’, ‘Most university degrees are theoretical and do not prepare students for the real world. They are therefore, of very little value’, ‘Living in a city has greater advantages than living in a small town or country’, and have an average length of about 500 words each. The first step was to analyse the corpus as a whole, in order to identify the most frequent words that were used. The figure below will show a screen shot of the frequency count of our corpus.

Teodora Popescu College Essay-Writing: A Corpus-Based Analysis

184

Figure 1 Frequency sort for LEWC

As can be seen from the picture above, the most frequent words are articles (definite and indefinite), prepositions and conjunctions, personal pronouns – mainly 1st person singular since it was an argumentative essay and students expressed their own views, as well as the verb to be. The most frequent noun was people, again because an argumentative essay has to be rather referential and generalising. Next in our count we have crime, person and life. Person is the singular of people, therefore the same explanation as above might hold true. Crime was mainly used because most of the essays were written on the topic ‘Crime does not pay’.

The next step was to highlight the errors that occurred in the corpus and to try to classify them. As we mentioned in the introduction to this study we considered the errors from a linguistic point of view: (spelling, morphological, lexical – inappropriate use of lexis, lexical – collocational, syntactic, and discursive).

Type of error Examples No. of occurrences

Spelling * […] and you remain like a dad person […] a dead person… * They kill for example for unpayed debts[…] unpaid debts

25

Morphological *Crimes always catches up with the criminal… Crimes and criminals are always discovered… * so they have to action […] […] so they have to act / take action […]

54

Lexical – inappropriate use of lexis

* My device1 in life is… My motto/watchword/slogan in life is […] * I think rappers are equal with the criminals…

68


185

Lexical – collocations

*…where they have a lot of opportunities and where have achieved respect and wealth […] […] have earned respect and acquired wealth…

147

Syntactic * In our country, is still that old system […] In our country there still exists […]

43

Discursive * Instead of studying math when at a law school it would be better that everyone will have a course that teaches them or improves their skills when it comes to society and person to person relationships? Wouldn’t it be better, if, while a Law student, one could take a course in personal and social communication skills instead of studying Math?

29

Table 1. Classification of student errors in the LEWC

We previously draw attention to the fact that the relatively low number of spelling mistakes is due to the quasi-satisfactory computer-literacy of our students. There are specific mistakes that a spell-check tool will automatically correct, while others will be just underlined. By a right click on the mouse the student can see the alternatives offered by the computer. Needless to say that the options of the computer are not always the best. There were nevertheless, spelling mistakes which could not be corrected with the help of the computer; these are, in general, words that have more than two misspelled letters. E.g. * […] the person who died wasn’t crossing the street regulamentary […] (in this case the student’s mistake is twofold, both intralingual and interlingual; probably knowing that certain adjectives in English – e.g. customary, preliminary, etc. have the –ary suffix, he/she added it to the Romanian adjective ‘regulamentar’ – both adj. and adv.; in English ‘lawful/-ly’). Neither could the word ‘consilier’ be corrected. What the computer offered as solutions were: ‘consoler’, ‘costlier’, ‘consulter’, ‘consular’ and ‘consigliore’. The best rendition of the idea would be in fact ‘counsellor’. A third example is a rather common mistake that Romanian students make in English: “Is it worthed to live my life in death […]”. Once students learn the passive voice in English, they will develop a strategy of always using the subject + verb to be + past participle. This overgeneralization works detrimentally, since ‘worthed’ does not exist in the English language.

The mistakes that students made in terms of morphology, syntax and discourse did not come necessarily as a surprise, as we knew from the outset that our students’ language competence was at an intermediate level (similar to B1 of the CEF) and we can’t speak of proper foreign language writing competence as long as a certain linguistic competence is achieved. One typical mistake that we recorded was the omission of the subject: “In the majority of the countries is a mixture […]”, which can be easily ascribed to the fact that subject-elliptic utterances are quite frequent in Romanian. Another example of bad morphology/syntax/discourse is the incorrect use of ‘what’ (interrogative pronoun) instead of ‘that’: *”[…] if you accomplish something what is not palpable[…]”; * “Many professions what are needed in a society are now missing people to practice them[…]” or * “In our country are few respectable companies what can give the recompense that top people need.” In Romanian there is one single word for both functions: ‘ce’.

We noticed a high occurrence of lexical mistakes, especially in the case of collocational patterns. We considered therefore that it might be useful to delve even deeper into this type of errors. This time it was relatively easier to identify mistakes with the help of a concordancer. We started form analysing the most frequent words in their vicinities and subsequently categorised the mistakes. We will present the sub-classification of collocational errors in the following table:

Type of error Example No of occurrences

Noun + Verb * crime always gets paid. …a crime will always be punished

10


186

Verb + Noun * achieve offence commit an offence * are now missing people now lack people

27

Adj + Noun *dense course elaborate course

22

Adverb + Verb * will suffer very hard will suffer deeply

15

Verb + Preposition

* graduate the university graduate from university

26

Noun + Preposition

* experience on computer experience in/with computers

18

Preposition + Noun

* in university at university * face to face with a computer in front of a computer

21

Multi-word expressions

* be indicate that it is advisable to * it’s the best to it’s best to

8

Table 2 Sub-classification of collocational errors

We would like to present in the following some of the most frequent words in the corpus and the way they collocate.

Some of the incorrect patterns that you can see below are: *Crime does not worth; *one can make a crime; *crime always gets paid. Nevertheless, the most common verb used was ‘to commit a crime:’

Fig 2 Crime concordances


187

Fig 3 People concordances

A word with such a high degree of generality was by and large used in a correct collocational manner, the mistakes being of ‘grammar’ type: 1. *… essentially if people are being paid by the “quantify” of work they do, … - incorrect use of tense / incorrect use of morphological category; 2. *… I think that the rehabilitate is for people that achieved minor offences like theft – incorrect use of definite article / incorrect use of morphological category / collocational error / morphological error; 3. * The people general opinion is that a better live is far away … - incorrect use of definite article / incorrect use of genitive.

4. Pedagogical implications We are aware that the present study does not cover all aspects pertaining to errors occurring in the LEWC and there still exist numerous facets that deserve a researcher’s and practitioner’s attention.What this study has nevertheless shown is that collocational errors were the most numerous, which entails that the educator should devote substantial time and space to the teaching of collocations. Despite all criticisms brought to the behaviouristic approach, which laid particular emphasis on learning by heart, thus shaping appropriate language behaviours, it is important to teach and learn a large amount of collocations explicitly. In particular, as can be inferred from the above error analysis samples, it is important to focus on the larger context in which collocations / lexical items occur.

5. Suggestions for further research Starting from the findings of the research already carried out and in view of further analyses of students’ writing skills, we will try to focus on specific methods strategies of teaching essay writing, with particular emphasis on the development of collocational competence alongside the syntactic and morphological ones.


188

Notes:1 The Oxford Genie CD-ROM gives the following definitions for device (Engl.): 1. an object or a piece of equipment that has been designed to do a particular job 2. a bomb or weapon that will explode 3. a method of doing sth that produces a particular result or effect 4.a plan or trick that is used to get sth that sb wants. Unfortunately, in Levi chi’s Dic ionar român-englez. Romanian-English Dictionary (1998), the first definition for deviz (Rom.) is device, motto. This was an important means of identifying yet another type or error source: consulting a faulty dictionary.

Bibliography: o Biber, D. (1995) Dimensions of Register Variation, Cambridge: Cambridge University Press. o Biber, D., Conrad, S. & Reppen, R. (1994) ‘Corpus-Based Approaches to Issues in Applied Linguistics,’ Applied

Linguistics, Vol.15, No.2.o Corder, S.P. (1976) ‘The Significance of Learners’ Errors,’ International Review of Applied Linguistics, 5, 161-170. o McEnery, T. and Wilson, A. (1996) Corpus Linguistics, Edinburgh: Edinburgh University Press o Leech, G. (1991) ‘The State-of-the-Art in Corpus Linguistics,’ in Aijmer, K. & Altenberg, B. (eds) English Corpus

Linguistics, Studies in Honour of Jan Svartvik, London and New York: Longman. o Nelson, M. (2000) A Corpus-Based Study of Business English and Business English Teaching Materials.

[UnpublishedPhD Thesis. Manchester: University of Manchester.] Thesis available at http://users.utu.fi/micnel/thesis.html

o Popescu, T. (2007) ‘Concordancing for Increased Learner Independence and Translation Skills,’ Proceedings of the International Conference Foreign Language Competence as an Integral Component of a University Graduate Profile, University of Defence in Brno, Czech Republic, 221-229.

o Popescu, T. (2007) ‘Teaching Business Collocations,’ in D. Galova (ed), Languages for Specific Purposes: Searching for Common Solutions, Newcastle: Cambridge Scholars Publishing, 164-174.

o Popescu, T. (2006) ‘Encouraging Learner Independence through Teaching Collocations,’ Messages, Sages and Ages 2, 805-814.

o Richards, J.C. (1971) ‘A Non-Contrastive Approach to Error-Analysis,’ English Language Teaching, 25, 204-219. o Selinker, L. (1972) Interlanguage. Error Analysis, London: Longman,o Selinker, L. (1992) Rediscovering Interlanguage, New York: Longman Inc. o Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford: Oxford University Press. o Svartvik, J. (1992) ‘Corpus Linguistics Comes of Age,’ in Svartvik, J. (ed) Directions in Corpus Linguistics,

Proceedings of Nobel Symposium 82 Stockholm, 4-8 August 1991, Berlin, New York: Mouton de Gruyter.

Documents

College Writing: a Corpus-based Analysis