
Technische Universität Chemnitz

Faculty of Humanities

Institute of English and American Studies

English Language & Linguistics

Bachelor's Thesis

submitted for the academic degree of

“Bachelor of Arts”

in English and American Studies

Probably it is only a Matter of Time

-

An Empirical Comparison of

Connecting Adverbials

in Timed and Untimed Student Writing

Supervisor: Prof. Dr. Josef Schmied

E-mail:

Address:

Student ID number:

Date of birth:

Degree program: Bachelor English and American Studies

Submission date: 12 December 2013


Table of Contents

1. Introduction
1.1. Defining the Topic
1.2. Research Questions
2. Literature Review
2.1. Connecting Adverbials
2.1.1. Cohesion
2.1.2. Reference
2.1.3. Categories of Connecting Adverbials
2.2. Timed and Untimed Writing
2.3. Student Writing and Genre
2.4. Gender
2.5. L1
2.6. Prototype
2.7. Previous Studies on the Topic
2.7.1. Hůlková (2011)
2.7.2. Bolton, Nelson and Hung (2002)
2.7.3. Milton and Tsang (1993)
3. Analysis
3.1. Methodology
3.2. Corpus
3.3. Monitor Corpus
3.4. Corpus Compatibility and Issues
3.4.1. Text Type
3.4.2. Gender
3.4.3. Department
3.4.4. Other
3.5. Analysis Tool
3.6. Statistics
3.7. Results
3.8. Limitations
4. Conclusion
5. Outlook
6. References
7. Appendices
7.1. Corpora
7.2. Statistics
7.3. Sorting Tool Source Code
7.4. Analysis Tool Source Code
7.5. Declaration of Authorship


List of Tables

Table 1: Comparison of Semantic Categories of Connecting Adverbials
Table 2: Number of Texts
Table 3: Number of Words
Table 4: Average Text Length of Corpora

List of Figures

Figure 1: from Halliday & Hasan (1976)
Figure 2: ICLE Metadata Database File Header
Figure 3: Argumentative Essay (Hyland (1990), p. 69)
Figure 4: ChemCorpus Connecting Adverbials per One Million Words
Figure 5: ICLE Connecting Adverbials per One Million Words
Figure 6: Comparison timed – untimed in ChemCorpus and ICLE
Figure 7: Functional Categories, timed – untimed
Figure 8: ChemCorpus Connecting Adverbials by Functional Category
Figure 9: ICLE Connecting Adverbials by Functional Category
Figure 10: Functional Categories by Gender
Figure 11: Functional Categories by L1
Figure 12: ChemCorpus Functional Categories by Prototype
Figure 13: ICLE Functional Categories by Prototype
Figure 14: Overall Connecting Adverbial Shares


1. Introduction

Academic writing in English differs from other forms of written language. It has

been claimed that writing academically has to be learnt by native and non-native

speakers of English alike. Thus, it is important to analyze expert academic writing

and novice academic writing and to compare the results to deduce implications for

teaching academic writing. In addition, analyzing and comparing academic writing

produced by students can prove useful to identify factors that influence the students’

writing. Possible influential factors can be sociolinguistic, such as mother tongue or

gender, but can also stem from other conditions, such as text type or academic tradition.

These factors can cause differences in the use of the language not only compared to

the expert texts in the genre, but also compared to other novice texts. Thus, the

analysis of student writing can provide useful insight into which of the factors is

most influential.

In this paper, I analyze the use of connecting adverbials, or connectors, in timed and untimed writing by German students of English. Before the actual analysis can take place, however, it is necessary to identify the theoretical concepts underlying the topic and to take a closer look at each of them. Afterwards, the ChemCorpus will be introduced in greater detail, since it is the main database used for my research. Analogously, the International Corpus of Learner English (ICLE), which is used as the reference corpus, will be introduced. Additionally, the tool developed to carry out the analysis will be presented. In the next part, the analysis and the statistics will be introduced and explained in detail. Finally, the findings will be presented, limitations will be clarified, and an outlook will be given.

1.1. Defining the Topic

A closer look at the topic of this paper reveals three main areas that are relevant for the research. First, there is the matter of connecting adverbials, which will be discussed in section 2.1. below. A definition of the term will be provided, together with the different labels that different authors have given to the concept. Furthermore, the broader context of connecting adverbials, coherence and cohesion, will be discussed, as well as the role of connectors. In the context of cohesion, different types of reference, such as endophoric and exophoric, will be discussed. Afterwards, four categories will be established and related to the frameworks established by authors in previous studies.

The next area that will be examined is the concept of genre. As this area is very broad, it is important to narrow it down further. Consequently, a definition adapted from works by other authors will be given, and the partly overlapping terms register and genre will be distinguished. Moreover, it will be explained why the term text type is used rather than genre. Furthermore, attention will be paid to genre analysis and, in particular, to the genres of learners’ English and academic prose. The term learners’ English will be defined, and its features will be explained in detail. Similarly, the genre of academic prose will be defined and explained, so that the two genres can be contrasted with a focus on connector use. In the last part of this section, the two text types of the corpus used for analysis, written exam papers and final theses, will be looked at in more detail.

As the last of the three areas, the effect of timedness on student writing will be investigated. To this end, however, it is necessary to define the term and to look at previous studies that allow a prediction of possible effects. At this point, the concept of timedness will be related to the features of the two text types discussed above.

1.2. Research Questions

There are five major research questions that cover everything that has been analyzed

in the two corpora. The first three research questions comprise the variables of

timedness, gender and mother tongue (L1). The fourth research question deals with

the prototypical connectors in the functional categories. Finally, the fifth research

question compares the findings of the analysis of the ChemCorpus to findings of the

ICLE.

Q 1. Do L2 learners of English use more connecting adverbials in timed or untimed writing?

Q 1.1. Are more complex connecting adverbials used in untimed writing due to the extended time period available for editing?

Q 2. Do women use more connecting adverbials than men?

Q 3. Do German L1 writers use more connecting adverbials than L1 writers of other Germanic languages?

Q 4. What are the prototypical connectors for each of the four functional categories?

Q 5. How does the writing of Chemnitz students compare to other L2 writers?

The first research question (Q 1.) investigates whether more connectors are used in timed or untimed writing. My hypothesis is that, as there is more time available to edit the texts in untimed writing, a higher number of connecting adverbials is likely to be observed there. Furthermore, the sub-question (Q 1.1.) investigates whether student writers use more complex connecting adverbials if they have no time constraints and can thus edit their work more intensively.

The second research question deals with the variable of gender. It will be

investigated whether female or male students use more connectors. The initial

assumption is that female authors use more connecting adverbials than male authors.

When answering this research question it is also important to keep the stratification

of the corpora in mind, since, due to the generally low number of male students in

the field of English and American Studies, the ChemCorpus contains only relatively

few texts by male authors.

The third research question deals with the mother tongue of the writers. More

precisely, German L1 writers will be compared to L1 writers of other Germanic

languages, such as Dutch or Swedish, and it will be possible to see whether German

writers use more connecting adverbials than the writers with other Germanic

languages as their L1. It is important to consider that the ICLE, which will be used

for comparison, contains texts written by authors with only two other Germanic

native tongues, Dutch and Swedish.

The fourth research question steps away from the social variables and investigates

which connecting adverbials are prototypical for each of the four functional

categories. Afterwards, the prototypical connectors will be compared to the rest of

the connectors in this category. It is hard to make a prediction here, but I would

expect that the prototypical connecting adverbials make up a large percentage of the

connectors in the respective functional categories.

Finally, the fifth research question compares the overall usage of connecting

adverbials in the ChemCorpus to the ICLE. It will be possible to see whether the

Chemnitz students use connectors differently from other learners of English.

However, my assumption is that their usage is not very different, since academic

English has to be learnt by all writers alike, regardless of their language background.

After answering all five research questions presented, a comprehensive picture

of connecting adverbial usage by non-native writers of English will emerge.

Furthermore, possible areas for future research can be highlighted.


2. Literature Review

In this part of the paper, most of the theoretical concepts and background information

will be provided. There will be seven subchapters, the first six of which will deal

with theory, while the last one will review previous studies in the field of connecting

adverbials. First, connecting adverbials will be covered. Afterwards, the

differences between timed and untimed writing will be explored, and the topic of

student writing and genre will be discussed. In the next part, three of the factors that

will be analyzed in both corpora, namely gender, L1 and prototype, will be

introduced. In the last part of chapter 2, three previous studies (Hůlková (2011);

Bolton, Nelson & Hung (2002); Milton & Tsang (1993)) will be reviewed with

regard to my own study.

2.1. Connecting Adverbials

This chapter will cover the theory of connectors, or connecting adverbials. Three sub-chapters will deal with different aspects related to connectors. First, the concept of cohesion will be introduced and explained. Afterwards, the topic of reference will be introduced, that is, the property that lexical items need to have in order to express the semantic ties described as cohesion. Lastly, different systems of categorizing connecting adverbials, which differ in both the definition and the number of categories, will be reviewed, and my own system of categorizing connecting adverbials will be introduced based on this review.

2.1.1. Cohesion

Before discussing connecting adverbials, it is necessary to step back and move from the very general perspective of a whole text to the small units of connecting adverbials. The most general form of data that can be linguistically analyzed is a text. In linguistics, a text is characterized as “any passage [of] spoken or written [language], of whatever length, that does form a unified whole” (Halliday & Hasan, 1976, p. 1). It is important to realize that a text is to be understood as a semantic and not as a grammatical unit. A text is made of sentences, but as Halliday & Hasan (1976) point out, a text does not consist of sentences but is realized by them. This means that the sentences encode the semantic unit of a text, and to achieve this, coherence and cohesion are needed. As grammar and lexicon are used to put the semantic concepts expressed in a text into language, cohesion is needed to bridge the gap between the semantic concepts and the words and grammatical structures of a text. This can be illustrated as shown in Figure 1 below.

meaning
  • the semantic system
wording
  • the lexicogrammatical system
    • grammar
    • vocabulary
'sounding'/writing
  • the phonological system
  • the orthographic system

Figure 1: from Halliday & Hasan (1976)

The figure shows the concept of a text as a top-down model. At the top is meaning, which is realized via the semantic system and the semantic concepts in our minds. Below that is wording, which is the way the semantic concepts are realized using a language. However, there needs to be something that enables us to express the semantic concepts coherently in a language, and that is cohesion. It is the concept that ties together meaning and wording. Cohesion is realized by both lexical items and grammar; thus, there is lexical cohesion and grammatical cohesion (Halliday & Hasan, 1976, p. 6). The distinction between these two is not binary but one of degree, since it is impossible to say that a semantic relation is exclusively one or the other. The distinction is especially difficult when talking about connecting adverbials, as they almost always have a grammatical and a lexical component. In sum, cohesion establishes relations between the different semantic concepts expressed in a text.

2.1.2. Reference

Another important concept to consider is that of reference. The previous chapter has shown that cohesion is used to logically express semantic concepts. In order to do so, language items that have the property of reference are necessary. In the following, the concept of reference will be explained in greater detail.

This concept can be defined very straightforwardly. Reference expresses a semantic relation in which the information has to be retrieved from somewhere else (Halliday & Hasan, 1976, p. 31). This “somewhere else” can be put into two categories: situational (exophoric) or textual (endophoric). Exophoric means that the meaning lies in the context of the utterance, as the prefix exo- suggests that it lies outside of the utterance. Endophoric reference is basically the direct opposite, as the reference lies within the text. Here, the terminology can be further distinguished into anaphoric reference, referring to preceding text, and cataphoric reference, referring to following

text (Halliday & Hasan, 1976, p. 33). The relevant type of reference for this paper is,

as mentioned above, endophoric reference, as connectors are used to establish links

between utterances in a text. The distinction between anaphoric and cataphoric

reference, however, can be ignored for the purpose of this paper. On the one hand,

the distinction is not easy, as reference can be ambiguous when looking at individual

language items, and on the other hand, the classification can hardly be made by

automatic tools. Consequently, it is not suitable for this quantitative analysis.

2.1.3. Categories of Connecting Adverbials

Since this paper aims at investigating a large number of connecting adverbials, it is

necessary to categorize them into adequate semantic categories. Nonetheless, this

categorization is not easy, as different authors have come up with different models.

In this chapter, I present previous research on the categorization of connecting

adverbials and then introduce the categories used in my analysis.

The first categorization that will be considered is by Biber et al. (1999). They establish six semantic categories: enumeration and addition, summation, apposition, result/inference, contrast/concession, and transition (Biber et al., 1999). It is notable that Biber et al. are among the few who use a categorization with six categories rather than just four, as other researchers, such as Halliday & Hasan (1976) or Bolton, Nelson & Hung (2002), do. The first category is enumeration and addition. The main function of connecting adverbials belonging to this category is either “the enumeration of information in an order chosen by the speaker/writer [or] […] the addition of items of discourse to one another” (Biber et al., 1999, p. 875). The semantic concept underlying this category, adding items in a text either according to a sequence or simply to one another, is slightly ambiguous, since other authors regard enumeration and addition as two separate concepts. The next category, summation, contains connecting adverbials that “show that a unit of discourse is intended to conclude or sum up the information in the preceding discourse” (Biber et al., 1999, p. 876). Appositional connecting adverbials function as markers that “the second unit [of discourse] is to be taken as equivalent to or included in the preceding unit” (Biber et al., 1999, p. 876). The following category is result/inference, which, as the name already suggests, “show[s] that the second unit of discourse states the result or consequence […] of the preceding discourse” (Biber et al., 1999, p. 877). The category that follows, contrast/concession, is the broadest category in the model Biber et al. establish. This category contains adverbials that “mark incompatibility between information in different discourse units, or that signal concessive relationships” (Biber et al., 1999, p. 878). The last category in this model is transition, which “mark[s] the insertion of an item that does not follow directly from the previous discourse” (Biber et al., 1999, p. 879).

The next categorization that will be covered is by Halliday & Hasan (1976). They

establish four categories, which are additive, adversative, causal and temporal.

Additive is defined as a derived form of coordination and the ‘and’ relation (Halliday

& Hasan, 1976, p. 244). This means that additive connecting adverbials link two

units of text that belong to the same topic and express similar meaning. The next

category is adversative, which they define as “contrary to expectation” (Halliday &

Hasan, 1976, p. 250). Consequently, adversative connecting adverbials link two units

of text that are related, but the second unit expresses meaning that deviates from the

expected meaning that could be deduced from the first unit of text. The third

category is causal connectors, which express a cohesive relation in which a text unit

logically entails another (Halliday & Hasan, 1976, p. 256). The last category is

temporal connecting adverbials, which express succession, as one unit of text is

“subsequent to the other” (Halliday & Hasan, 1976, p. 261). This includes not only

connecting adverbials that directly relate to a temporal succession, such as ‘then’, but

also all items that express a form of succession, such as ‘first’, ‘second’, ‘third’.

It is, furthermore, notable that Halliday & Hasan also define a number of

subcategories to further distinguish each group. However, this subdivision will not be

considered in this paper because it is not only difficult to assign a connector to a

specific subgroup, but also nearly impossible to implement in a rather large-scale

quantitative analysis such as the one carried out in this paper.

The classification of connecting adverbials established by Halliday & Hasan is

directly adopted by Bolton, Nelson and Hung (2002). This study is discussed in more

detail in chapter 2.7.2.

The last model for categorizing connecting adverbials that shall be mentioned here is proposed by Hůlková (2011). Similar to Biber et al. above, her model comprises six semantic categories: appositional, listing, contrastive/concessive, resultive/inferential, summative and transitional (Hůlková, 2011, p. 138), of which the latter four are the same as in Biber et al. above. The only two categories that differ are appositional and listing, though they directly correspond to addition and enumeration respectively in Biber et al.’s model. It is, furthermore, notable that Hůlková also establishes subcategories for some types of connecting adverbials, but, again, they are not relevant for the present study.

After the review of the semantic categories established in previous research, the following part introduces the classification system used in this study. I opted for a model with four categories, since some of the categories in Biber et al. and Hůlková contain only very few connecting adverbials. The categories used here are additive, adversative, causal and sequential, which allow the classification of all possible connecting adverbials. The first category, additive, contains connectors that express the semantic concept of linking one unit of text to another by marking the second unit as being similar to the first (Biber et al., 1999, p. 878). The second category contains adversative connecting adverbials and is thus more or less the inverse of the additive category, as it also links two units of text but marks the second as being different from the first. The next category is causal connectors, which express a logical link between two units of text, usually marking the second unit as a result of the previous unit. Finally, there are sequential connecting adverbials, which include all items that express a certain type of sequence. This includes enumerations, listings, temporal expressions and summations. Note that summative connecting adverbials belong to the group of sequential connectors and not to the causal ones, as they do not express a logical consequence.
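The four-way classification described above lends itself to a simple lookup. The following Python sketch is only an illustration under stated assumptions and is not the analysis tool used in this study: the word lists are hypothetical and deliberately short, and the names `CATEGORIES` and `count_by_category` are invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short word lists for illustration only;
# they are not the lists used in the analysis.
CATEGORIES = {
    "additive": ["moreover", "furthermore", "in addition"],
    "adversative": ["however", "nevertheless", "on the other hand"],
    "causal": ["therefore", "thus", "consequently"],
    "sequential": ["first", "second", "then", "finally", "in conclusion"],
}


def count_by_category(text: str) -> Counter:
    """Count case-insensitive whole-word matches of each connector, grouped by category."""
    lowered = text.lower()
    counts = Counter({category: 0 for category in CATEGORIES})
    for category, connectors in CATEGORIES.items():
        for connector in connectors:
            # \b ensures that "then" does not match inside "strengthen", for example.
            pattern = r"\b" + re.escape(connector) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts


sample = (
    "First, the data were collected. However, the sample was small. "
    "Therefore, more texts were added. In addition, the results were checked."
)
print(count_by_category(sample))
```

A plain string lookup of this kind cannot tell whether an item such as ‘second’ is used as a connecting adverbial or as an ordinary numeral (‘a second corpus’), so counts obtained this way would still need to be checked manually.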

Table 1 below illustrates how the different semantic categories from previous research correspond to the categories used in this paper. It is notable that the sequential category corresponds to more than one category from the six-category models. The reason for this is that the respective categories in the six-category models are defined more narrowly and thus contain considerably fewer connecting adverbials than the sequential category as defined for this paper.


Model                                  | additive                             | adversative            | causal                | sequential
---------------------------------------+--------------------------------------+------------------------+-----------------------+---------------------------------
Halliday & Hasan; Bolton, Nelson, Hung | additive                             | adversative            | causal                | temporal
Biber et al.                           | enumeration and addition; apposition | contrast/concession    | result/inference      | transition; summation
Hůlková                                | appositional                         | contrastive/concessive | resultive/inferential | listing; summative; transitional

Table 1: Comparison of Semantic Categories of Connecting Adverbials, based on previous Studies

2.2. Timed and Untimed Writing

The influence of timedness on writing is not very well researched: there are hardly any studies that investigate the effect of timedness on writing (Gregg, Coleman, Davis & Chalk, 2007). The reason for this is unclear to me. Either the research has simply not been carried out, which is relatively unlikely, or timedness has no effect on writing. Moreover, timed and untimed texts rarely belong to the same text type, which makes it hard to compare them. Intuitively, I would assume that untimed writing contains more connecting adverbials, since the authors have more time to rewrite and edit their work. It is also possible that untimed writing contains a wider variety of connecting adverbials than timed writing, since the authors have the possibility to consult dictionaries or thesauri. This study will show whether these assumptions can be confirmed.
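Since timed and untimed sub-corpora may differ in size, raw counts of connecting adverbials cannot be compared directly; the figure titles in this study accordingly refer to connecting adverbials per one million words. A minimal sketch of this normalization in Python could look as follows. The function name `per_million` is an assumption made for this example, and the counts are invented and are not results of this study.

```python
def per_million(hits: int, total_words: int) -> float:
    """Return the frequency of `hits` normalized to one million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return hits * 1_000_000 / total_words


# Invented counts for illustration only; they are not taken from the corpora.
timed_rate = per_million(hits=450, total_words=60_000)
untimed_rate = per_million(hits=9_000, total_words=1_000_000)

print(f"timed:   {timed_rate:.0f} connecting adverbials per million words")
print(f"untimed: {untimed_rate:.0f} connecting adverbials per million words")
```

With normalized rates of this kind, a small collection of timed texts can be compared to a much larger collection of untimed texts on the same scale.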

2.3. Student Writing and Genre

This section will deal with the concept of genre and will relate this concept to student

writing. First of all, it is important to define the term genre. The term is widely

recognized within the field of literature studies, in which genres such as prose,

poetry, or drama are important. Swales defines the term genre in linguistics as

follows:


A genre comprises a class of communicative events, the members of which share

some set of communicative purposes. These purposes are recognized by the expert

members of the parent discourse community, and thereby constitute the rationale for

the genre. […] In addition to purpose, exemplars of a genre exhibit various patterns of

similarity in terms of structure, style, content and intended audience. If all high

probability expectations are realized, the exemplar will be viewed as prototypical by

the parent discourse community. (Swales, 1990, p. 58)

The first part of the definition deals mostly with the context and generic features of

genres. According to this definition, genres can contain any kind of communicative

event, either written or spoken, that is used to communicate with members of the

discourse community. However, the expert members of the community have to

recognize the purposes of the communication, which usually means that the form of

the communication follows a certain pattern. Texts belonging to a genre have to conform

to a certain norm in order to be recognized as specimens of that particular genre.

Consequently, there have to be “typical linguistic features [that are] frequent and

pervasive” (Biber & Conrad, 2009, p. 16). This means that, similar to genres in

literature, genres are not defined by content but by form. Furthermore, what genre a

text belongs to is not defined by a single authority but in a rather democratic way by

the expert members of the discourse community. These experts are also the ones who

create models to which future texts of the genre can be compared. As a result, it is

possible to say that “genres link users to their discourse community” (Schmied,

2011, p. 5). This also means that genres are closely linked to their discourse

community and may only be recognized as such by members of the respective

community.

In the second part of this chapter, the facts mentioned above will be applied to the

case of student writing. The most important fact about student writing is that texts

produced by students can be seen as part of an apprenticeship to become a recognized

member of the discourse community (Schmied, 2011; Hüttner, 2007). As an

apprenticeship signifies the process of acquiring certain skills, the texts produced by

students will lack some of the linguistic features the expert texts of the genre have.

Furthermore, the communicative purposes and target audiences of student and expert

texts differ considerably. This raises the question whether student genres are just

weaker copies of expert genres or should be seen as genres of their own. Hüttner

(2007) argues that student genres have to be considered as separate genres from

expert genres, since they are not just weak copies, as novice genres would be, but are

separate genres because of their special communicative purposes. She argues that


student genres are almost entirely produced by students and usually have a very

limited audience, such as the corrector of a paper, and the communicative purpose is

also very different from expert texts, as student texts usually aim at displaying the

progress of learning in a certain discipline (Hüttner, 2007, pp. 58-62). Consequently,

the ways and strategies utilized in the papers will differ, and because of this the

linguistic features will also differ. However, student genres still belong to the greater

complex of academic discourse. In accordance with Hüttner’s argument, it may be

more useful to compare texts from student genres amongst themselves instead of

comparing them to expert genres, particularly with the different cultural backgrounds

and native languages of students in mind (Schmied, 2011, p. 14).

To conclude this chapter, it can be argued that student genres have to be

considered on their own, as their communicative purpose is unique and they differ

significantly from expert genres. However, it must not be forgotten that the student

genres are sub-genres within the field of academic research and mark stages in the

apprenticeship towards the goal of producing texts belonging to the expert genres.

2.4. Gender

There has been a lot of research in the field of gender and its impact on language

since the emergence of feminism and feminist linguistics. These studies mainly focus

on gender as a sociological or socio-cultural phenomenon (Kortmann, 2005, p. 277).

These studies have shown that there are considerable differences in the language

of men and women, particularly in spoken language. For written language, novels in

particular, Livia (2003) points out that female authors use more cohesive devices

than male authors. For academic writing, gender seems to have no influence on the

use of connecting adverbials (Hůlková, 2011). Since these two findings, albeit about

different genres, point in different directions, it will be interesting to see what this study reveals

about the use of connecting adverbials by gender.

2.5. L1

The influence of the mother tongue on academic writing in a foreign language is a hotly debated topic with two opposing points of view. On the one side, there are researchers who suggest that the L1 has a considerable influence on academic writing in the L2. On the other side are those who claim that the influence of the L1 can either be neglected or cannot be the only factor explaining the usage patterns of learner writing (Gilquin & Paquot, 2008, p. 54). Other possible factors include education or the discipline in which the papers are written.

Those who deem the influence of the L1 to be of minor importance base their

claim on the universality hypothesis, which “implies that the methods

and concepts of a science form a secondary cultural system” (Martin-Martin, 2005, p.

193). Martin-Martin quotes Widdowson, according to whom the scientific discourse

is “basically independent of its realization in a particular language” (Martin-Martin,

2005, p. 193). Furthermore, he provides evidence of studies that have found that the

educational experience influences the academic writing of L2 authors more than the

interference from the L1 (Martin-Martin, 2005, p. 196). In addition, he argues that

the writing conventions of certain genres or disciplines are more influential than

cultural influences of the writers (Martin-Martin, 2005, p. 200). In conclusion,

Martin-Martin states:

[T]here are certain aspects of academic discourse which are more amenable to the

restrictions of the writing conventions in a specific discipline and in a specific genre,

and that this would tend to be universal, whereas there may be other aspects that are

governed by socio-cultural factors, which are therefore culture-specific. (Martin-

Martin, 2005, p. 200)

Thus, it is difficult to say whether the use of connecting adverbials is influenced mostly by genre, discipline or L1 interference. However, since coherence and cohesion are very prominent features of academic discourse, I assume that there might be some variation in the use of connecting adverbials between the different L1 groups. It would, however, be beyond the scope of this study to investigate whether the variation is due to L1 interference or the educational background of the writers.

2.6. Prototype

The concept of prototype originates in the field of cognitive semantics and is related

to the categorization of things. Prototype semantics generally assumes that the

boundaries of semantic categories are “much more flexible and fuzzy than is

suggested by traditional componential semantics” (Kortmann, 2005, p. 209). Due to

this fuzziness, there are “central or typical members of a category” (Saeed, 2009, p.
37), which are the best representation of the underlying cognitive concept. For this
study, the statistics will show which connecting adverbial in each of the four functional
categories occurs most frequently and can thus be assumed to be the most central,
or prototypical, member of that category, and how it compares to the other
members of the category.
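The operational reading of "prototype" used here, the most frequent member of each functional category, can be sketched in a few lines of Perl. This is only an illustration and not part of the analysis tool: the function name, the category assignment and all frequencies in the usage example are invented for demonstration and are not results of this study.

```perl
use strict;
use warnings;

# Returns a hash reference that maps each category to its most frequent item.
# $freq:        item => frequency (e.g. occurrences per one million words)
# $category_of: item => category label
sub prototypes {
    my ($freq, $category_of) = @_;
    my %best;
    for my $item (sort keys %$freq) {    # sorted, so ties are resolved alphabetically
        my $category = $category_of->{$item};
        next unless defined $category;   # skip items without a category
        if (!exists $best{$category}
            || $freq->{$item} > $freq->{ $best{$category} }) {
            $best{$category} = $item;
        }
    }
    return \%best;
}

# Usage with INVENTED numbers (hypothetical, for illustration only).
my $result = prototypes(
    { also => 900, moreover => 120, but => 800, however => 300 },
    {
        also     => 'additive',
        moreover => 'additive',
        but      => 'adversative',
        however  => 'adversative',
    },
);
print "$_: $result->{$_}\n" for sort keys %$result;
```

The remaining members of a category can then be compared against the selected item, for example by their share of the category total.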

2.7. Previous Studies on the Topic

2.7.1. Hůlková (2011)

In 2011, Hůlková conducted a study similar to this one, investigating possible
differences in men’s and women’s use of conjunctive adverbials, as she calls them.
Hůlková bases her work on a corpus of 50 research articles from five different
disciplines, with a total of 350,000 words. First, she presents some theoretical
considerations regarding the register of academic prose, which she considers to be
“explicit, unambiguous and logical” (Hůlková, 2011), and the sub-register of
research articles, which is, according to her, mostly defined by its structures. It is
notable that Hůlková uses the term ‘register’ rather than ‘genre’ to describe academic
prose and research articles, and that her “conjunctive adverbials” correspond to what I
call connecting adverbials. Hůlková analyzes these conjunctive elements in terms of
frequency not only in total, but also by academic discipline and by gender.
Moreover, she divides the connectors into six semantic categories and analyzes the
frequency for each category in the same way as for the whole corpus. At merely
350,000 words, the corpus Hůlková uses is comparatively small. In contrast, her list
of 90 conjunctive adverbials is long. Her findings show that, first and foremost,
certain connectors are generally used more often than others.
Taking into account the different distinctions she made within her corpus, Hůlková
concludes that gender does not have any influence on the use of connectors. There
might be some minor differences, but they can be explained as
idiosyncratic. The division by academic field shows notable differences, which
Hůlková attributes to the different needs and goals of the authors in the respective
fields.

The most important finding of her study for my own research is the fact that

gender has no influence on the use of connectors. The results Hůlková presents are

very clear in this respect. Nevertheless, I included gender to see whether her finding

can be confirmed.

2.7.2. Bolton, Nelson and Hung (2002)

Another study was conducted in 2002 by Bolton, Nelson and Hung, who
investigated the underuse and overuse of connectors in the writing of Chinese
students of English. They use the Hong Kong and the Great Britain components of
the International Corpus of English for their analysis. In their paper, they establish
four categories of cohesive devices: additive, adversative, causal and sequential
(Bolton, Nelson, & Hung, 2002). They investigate the over- and underuse of
connectors in Hong Kong undergraduate students’ writing. For this purpose, they
compiled a corpus of ten untimed essays and ten timed examination scripts, with a
total of 46,460 words (Bolton, Nelson, & Hung, 2002). Of particular interest for my
own work is their categorization of the connectors, which I used as a basis to develop
my own word list. It is furthermore notable that, even though Bolton, Nelson and
Hung include both timed and untimed writing, they do not go into detail about the
effects of timedness on writing. Additionally, they use the British component of the
International Corpus of English (ICE-GB) as a reference corpus to investigate the
over- or underuse of connectors. This methodology was adapted for the current
paper; however, I decided to use the International Corpus of Learner English (ICLE)
instead of the ICE corpus, since I do not want to investigate under- or overuse
compared to native speakers, but the general usage of connectors in student writing
and the effect of the writing being timed or untimed. Their study concludes that
connectors are generally overused in student writing, both by native speakers and
by learners of English.

2.7.3. Milton and Tsang (1993)

The last study I want to mention in this section was conducted by Milton and Tsang
in 1993. They investigated the usage of logical connectors in non-native students’
writing with a special focus on giving directions for future research. In their paper,
Milton and Tsang start by outlining the necessary background information on the
usage of connectors and the electronically-aided study of student writing. The corpus
used in this study contains about four million words, split into about 2,000
assignments written by first year undergraduates and about 200 scripts from the
Hong Kong Examinations Authorities’ ‘A’ level Use of English examination
(Milton & Tsang, 1993). To compare their data and their findings, Milton and Tsang
chose three native speaker corpora (Brown Corpus, LOB Corpus, HKUST Corpus).
The results of the study show that there is generally considerable variation in
which connectors are used in the different corpora. Only the three
most frequent connectors are the same in all four corpora, namely “and, also and
because” (Milton & Tsang, 1993). Subsequently, Milton and Tsang focus
on the English as a Foreign Language (EFL) aspect of their work, particularly on the
over-, under- and misuse of some connectors. They also investigate the origins of
these usage issues and deduce implications for future teaching. They
conclude that these errors in students’ writing originate from teaching
and that, consequently, teaching methods and materials have to be adapted to cater for the
special needs of learners of English in an academic context.

3. Analysis

3.1. Methodology

In the following, I want to take a close look at the methodology implemented in this
study. Broadly speaking, the study is a quantitative analysis of two corpora.
The corpus that is the main subject of the analysis is the ChemCorpus, which has
been compiled at the Chemnitz University of Technology. To provide a point of
comparison for the results, the International Corpus of Learner English (ICLE) will be
used as a monitor corpus. Furthermore, the corpora will each be divided into two

sub-corpora according to the timedness of the texts. However, for some of the ICLE

texts, the required metadata of whether a text is timed or untimed is not available.

Thus, only the texts for which the information is available will be part of the

analysis.

After the selection of the data, the actual analysis will be carried out by a tool that
I developed specifically for this task. It is a Perl script that has two major tasks. The
first is to filter unwanted text, such as the table of contents or the references, from the
files. This applies only to the ChemCorpus data, which has been tagged in this respect.
The ICLE data, on the other hand, contains no such tags, since the texts do not
contain these elements, and the script will simply not perform these operations. The
second main task of the tool is to count the occurrences of all tokens that
are to be researched as well as the total number of words.

In the third step, the data gathered by the analysis script will be stored in a

Microsoft Excel table, which will then be used to calculate comparable, normalized

frequencies. The data will also be aggregated into the four functional categories.

Finally, the results of this step will be used to create graphs that visualize the results.
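The arithmetic behind the third step can be sketched as follows. In the study itself this step is carried out in Microsoft Excel, so the Perl below is only an illustration: the function names are hypothetical, and the category labels in the comments are borrowed from Bolton, Nelson and Hung (2002) as an assumption about what the four functional categories look like.

```perl
use strict;
use warnings;

# Normalized frequency: occurrences per one million words.
sub per_million {
    my ($count, $total_words) = @_;
    die "total word count must be positive\n" unless $total_words > 0;
    return $count * 1_000_000 / $total_words;
}

# Sum the counts of all connectors that belong to the same category.
# $counts:      connector => absolute count
# $category_of: connector => category label (e.g. 'additive', 'causal')
sub aggregate_by_category {
    my ($counts, $category_of) = @_;
    my %sum;
    for my $item (keys %$counts) {
        my $category = $category_of->{$item} // 'uncategorized';
        $sum{$category} += $counts->{$item};
    }
    return \%sum;
}
```

Normalizing before comparing is what makes sub-corpora of very different sizes comparable: a count of 5 in a text of 2,500 words and a count of 2,000 in one million words both correspond to 2,000 occurrences per one million words.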

3.2. Corpus

This section describes in detail the corpus that is subject to my analysis, the
ChemCorpus¹, which was compiled, as the name suggests, at the Chemnitz University of
Technology. For this paper, not the whole corpus will be used but a subset
containing two text types, magister theses (MagTheses) and written
magister exams (MagWritten). The corpus has been compiled from the year 2001
onwards, with the most recent texts used for the analysis in this paper dating to
2011. At present, the complete corpus contains 1,709,983 words and
consists of four text types, i.e. written magister exams, magister theses, bachelor
theses and master theses. The sub-corpus used in this paper consists of 1,059,263
words and contains two text types, as mentioned above. The magister
theses are further subdivided by subject area into Culture and Literature
(MagCultLit) and Linguistics (MagLing). Table 2 below shows the distribution
of texts by category. The texts in the MagTheses category are final papers
written by students at TU Chemnitz, and thus, they represent the untimed category.
In contrast, the texts in the MagWritten category were produced during the
final written exam at TU Chemnitz, and consequently, they represent the timed
category.

                         MagTheses                MagWritten
                         MagCultLit    MagLing
Number of texts          10            23         52
Total number of texts    85

Table 2: Number of Texts

                         MagTheses    MagWritten
Total number of words    922,343      136,920

Table 3: Number of Words

Table 2 reveals a minor problem with the data set. The MagTheses texts
are very unevenly distributed across the subcategories: there are more than twice as
many texts in the MagLing category as in the MagCultLit category.
Similarly, the number of texts in the MagWritten category is much higher
than in the MagTheses category. This skew in the number of
texts is completely reversed when looking at the number of words in the respective
categories. Here, the number of words in MagTheses is almost seven times higher
than in MagWritten, despite the fact that the category MagWritten contains a
considerably higher number of texts.

¹ http://www.tu-chemnitz.de/phil/english/ling/chemcorpus.php

Taken together, the two skewed distributions show that this
is not an ideally stratified corpus, although there is a sufficient amount of data for both
categories. Furthermore, it is essential to note that the disproportion in the figures
originates in the different lengths of the texts in the categories. The texts in the
written exams section contain on average 1,987 words, with a standard deviation of
593 words, whereas the texts in the theses section contain 29,379 words on average,
with a standard deviation of 9,990 words. This may be an issue, since the untimed
part of the corpus is several times as large as the timed component. However, the
analysis will have to show whether this has an impact on the results.
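For readers who want to reproduce such descriptive figures from a list of text lengths, a minimal Perl sketch follows. It is an illustration only: the function names are hypothetical, and since it is not stated whether the standard deviations reported here are sample or population values, the sketch simply assumes the sample standard deviation (divisor n - 1).

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of a list of numbers (e.g. word counts per text).
sub mean {
    my @values = @_;
    die "no values given\n" unless @values;
    return sum(@values) / @values;
}

# Sample standard deviation (divisor n - 1); this choice is an assumption.
sub sample_sd {
    my @values = @_;
    die "need at least two values\n" if @values < 2;
    my $mean    = mean(@values);
    my $squares = sum(map { ($_ - $mean) ** 2 } @values);
    return sqrt($squares / (@values - 1));
}
```

Called with the word counts of all texts in one category, mean() and sample_sd() return the two summary values for that category.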

3.3. Monitor Corpus

For this paper, the International Corpus of Learner English (ICLE) is used as a
monitor corpus.

The International Corpus of Learner English contains argumentative essays written by

higher intermediate to advanced learners of English from several mother tongue

backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian,

Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Tswana, Turkish). (UCL -

ICLE, 2012)

However, before the corpus could be used for the analysis, an issue with the data had
to be resolved. The data was not available in a format that could be processed in the
same way as the ChemCorpus. The data files were all stored in a single
directory with a unique ID as file name. To access them, there was only one special
piece of software available, which allows the user to filter for certain sociolinguistic
criteria. The downside of this tool is that it cannot export the files that
match the search criteria. This functionality would, however, have been vital for carrying
out further analysis with the tool used to analyze the ChemCorpus. The
documentation of the ICLE tool states that the metadata is stored in a
database file, which is located in the data folder. This led to a further problem,
since the database file was stored with a special file extension, ‘.ICLE’. This problem
could be resolved by taking a look at the file header using a hex editor (see Figure 2
below), which revealed that the file was simply a renamed Microsoft Jet 4.0
database² and could consequently be accessed easily using Microsoft Access.

Figure 2: ICLE Metadata Database File Header
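The same check can also be done programmatically instead of with a hex editor. The following Perl sketch is not part of the tools used in this study. It rests on the assumption that Jet database files carry the text "Standard Jet DB" at byte offset 4 of the file; the function name is hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if the file carries the signature assumed for Microsoft Jet
# databases ("Standard Jet DB" at byte offset 4), otherwise 0.
sub looks_like_jet_db {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my $bytes_read = read($fh, my $header, 19);    # 4 bytes + 15 signature bytes
    close($fh);
    return 0 unless defined $bytes_read && $bytes_read == 19;
    return substr($header, 4, 15) eq 'Standard Jet DB' ? 1 : 0;
}
```

A positive result only suggests that the file can be opened as a Jet database regardless of its extension; it does not verify the database version.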

This made it possible to create two lists containing the file names of all timed and
untimed pieces of writing. These lists were then fed into a script (see Appendix A.2.)
that was developed to copy the respective files to a new directory, thus creating two
sub-corpora of timed and untimed writing. Afterwards, it was possible to use the
same analysis script that was used for the ChemCorpus. It is, furthermore, important
to note that there is a considerable number of texts that cannot be classified as either
timed or untimed, since the corresponding metadata field states “Unknown”. These
texts have been excluded from the analysis in this paper, because they cannot be
integrated into the established framework of the research.
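The sorting step described above can be sketched as follows. This is not the script from the appendix but a minimal illustration under stated assumptions: the list file is assumed to contain one file name per line, and the function name and all paths are hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

# Copies every file named in $list_file from $source_dir to $target_dir
# and returns the number of files copied. Missing files are skipped.
sub copy_listed_files {
    my ($list_file, $source_dir, $target_dir) = @_;
    make_path($target_dir) unless -d $target_dir;
    open(my $list, '<', $list_file) or die "Cannot open $list_file: $!\n";
    my $copied = 0;
    while (my $name = <$list>) {
        $name =~ s/^\s+|\s+$//g;    # trim whitespace and line endings
        next if $name eq '';        # ignore empty lines
        my $source = File::Spec->catfile($source_dir, $name);
        unless (-f $source) {
            warn "Skipping missing file: $source\n";
            next;
        }
        copy($source, File::Spec->catfile($target_dir, $name))
            or die "Copying $name failed: $!\n";
        $copied++;
    }
    close($list);
    return $copied;
}
```

Calling the function once with the list of timed texts and once with the list of untimed texts, each with its own target directory, yields the two sub-corpora.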

In the following, a few statistics concerning the ICLE data used will be
presented. As mentioned above, the corpus has been split into two sub-corpora of
timed and untimed writing. The former contains 639,673 words and the latter
1,684,555 words, totaling 2,324,228 words. The untimed texts are on
average 700 words long, with a standard deviation of 295 words, whereas the timed
texts average 619 words with a standard deviation of 302 words. The two categories
thus differ little in length and deviation, in marked contrast to the ChemCorpus, where
timed and untimed texts differ in length by more than a factor of ten. More on this in
section 3.4 below.

2 http://support.microsoft.com/kb/275561/en

3.4. Corpus Compatibility and Issues

When comparing two corpora, the question of data compatibility arises. Attention
has to be paid to the stratification of the corpora as well as to their comparability.
When analyzing the two corpora used in this paper, some issues arose concerning
the numbers of texts and the word counts. While the corpus that is
primarily researched contains roughly 1 million words, the monitor corpus contains
approximately 2.3 million words, more than twice the size of the primary corpus.
However, this issue does not carry much weight in terms of comparability, since the
absolute values can easily be transformed into normalized, relative
values for the comparison. When taking a closer look at the texts comprising the
corpora, however, another issue becomes apparent.

                                 ICLE       ICLE     ChemCorpus   ChemCorpus
                                 untimed    timed    timed        untimed
Average text length in words     700        619      1987         29379
Standard deviation               295        302      593          9990

Table 4: Average Text Length of Corpora

Table 4 shows that the texts in the timed components of the ChemCorpus and the
ICLE differ in length by a factor of more than three. The differences in text length are
even more extreme for the untimed components, where the texts differ by a factor of more
than 40. In general, the ICLE is stratified in terms of text length, whereas the
ChemCorpus is not. This might seem like an issue; however, the two corpora
will not be compared as a whole. Instead, the timed and untimed components will
be compared. It is furthermore important to keep in mind that, although the ChemCorpus
consists of longer texts, the ICLE still has more texts and more words overall.
Consequently, the text length might have an impact on the stratification of the
corpus, but is not relevant for the comparison.
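The factors mentioned above follow directly from the average text lengths in Table 4. The first two ratios compare the ChemCorpus with the ICLE for timed and untimed texts respectively; the third compares untimed with timed texts within the ChemCorpus:

```latex
\[
\frac{1987}{619} \approx 3.21, \qquad
\frac{29379}{700} \approx 41.97, \qquad
\frac{29379}{1987} \approx 14.79
\]
```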

3.4.1. Text Type

With regard to the texts that comprise the corpora, the different text types are also
important. While the ChemCorpus contains exam essays and theses, the ICLE
contains argumentative essays. The comparability of these text types, however, is
still subject to discussion. They differ not only in length, as seen above, but also in
the communicative strategies utilized. While the argumentative essay usually tries to
convince the reader of something by presenting a structured argumentation, the thesis
presents academic work that has been carried out. Due to the different
communicative purposes, the linguistic strategies to achieve these purposes will
differ as well. Consequently, it is necessary to compare these two text types to assess
whether they are comparable or not. Theses generally follow the Introduction –
Method – Results – Discussion (IMRD) structure of research articles (Samraj, 2008,
p. 57). However, Samraj shows that, depending on the department, different
strategies are sometimes utilized. Overall, it can be expected that students’ theses
are written in a style similar to that of research articles, which implies a similar use
of connecting adverbials.

The argumentative essay, however, has a different purpose, i.e. “to persuade the
reader of the correctness of a central statement” (Hyland, 1990, p. 68). Consequently,
the structure is different too, as Figure 3 below shows. The text is structured in three
stages. Firstly, there is the thesis, which is accompanied by an attention grabber and
background information on the topic, as well as a short evaluation, which briefly
supports the thesis. Secondly, the main argumentation follows, which presents a
number of claims that are supported by evidence. Lastly, the conclusion, which rounds
off the argumentation and reaffirms the thesis, completes the essay. As a result of this
structure, the connecting adverbials used will differ too. Since it is the purpose of an
argumentative essay to persuade the reader, the connecting adverbials used to create
cohesion will be mostly additive and causal. Adversative connecting adverbials will
most likely be used less frequently, since they would not match the communicative
purpose of relating arguments to each other.

The structure of an argumentative essay is, according to Hyland (1990), thesis –
argument – conclusion. Each stage is furthermore divided into up to five moves (cf.
Figure 3). The thesis, which introduces the proposition that the essay will argue for,
also includes the general introduction to the topic, which is often realized by an
attention grabber, some background information, and a brief support of the thesis.
Afterwards, the main part of the essay contains the actual argumentation, which, after
an introduction, makes a claim that is then supported by explicit
assumptions or data and citations. Notably, there is no limit to the number of
arguments that can be presented. The final part of the argumentative essay is the
conclusion, which wraps up the argumentation and shows that the hypothesis has
been proven right or wrong.

Figure 3: Argumentative Essay (Hyland (1990), p. 69)

After reviewing the structure of theses and argumentative essays, it can be seen that
there are numerous differences. First of all, a thesis usually follows the IMRD
structure of research articles, while the argumentative essay has a thesis – argument –
conclusion structure. However, the difference is not only that of a four-part structure
in contrast to a three-part structure. Apart from the introduction, there are no
similarities in the communicative purpose of the different parts. An argumentative
essay contains neither a methodology description nor a presentation of results or a
discussion of findings. Nevertheless, there might be some overlap, as a thesis also
needs to present some arguments, for example to justify the choice of one
methodological approach over another.

In conclusion, the issue of text type compatibility potentially has a big impact on

the results. The two text types are fundamentally different in structure and

communicative purpose. Thus, the results of the analysis have to be examined very

carefully and critically, since the effect of the different text types in the two corpora

cannot be reliably predicted.

3.4.2. Gender

Another issue is the stratification of the corpora in terms of gender. As Schmied (2011)
points out, there is a severe lack of male students within the field of English Language
Studies at universities. This problem is mirrored in the data used for this study. The
ChemCorpus contains only nine texts written by male students out of a total of 85 texts,
with 15 texts for which there is no information concerning gender. In the ICLE,
there are 2,833 texts written by female students, 601 texts written by male
students and nine texts without available gender information. These numbers show a
clear skew towards texts written by female students, which means that the corpora are
poorly stratified in terms of gender. While this is an issue affecting the stratification, it

is rather unlikely to have any influence on the results, since Hůlková (2011) has

shown that gender has no significant influence on the usage of connecting adverbials

and that possible minor differences in the statistics are merely idiosyncratic

(Hůlková, 2011, p. 137).

3.4.3. Department

Furthermore, the ICLE does not contain information regarding the department to which
the students who produced the texts belong. The ChemCorpus, on
the other hand, contains this information, since there is a clear separation between
Linguistics texts and Culture and Literature texts. But as pointed out in section 3.2.,
there are issues with that as well, since the number of Linguistics papers is
considerably higher. Furthermore, the information regarding the department is only
available for untimed texts. Although the ChemCorpus distinguishes between the two
departments, there is no evidence that there will be much
difference in the usage of connecting adverbials, since both departments belong to
the field of humanities. There are studies that have researched the impact of
department on the usage of connectors. However, the departments researched belong
to quite different academic fields, such as politics, psychology or management
(Hůlková, 2011). Thus, it is debatable whether department has an influence, particularly if
the departments belong to the same academic field. Since the department data is not
available for the ICLE, I decided not to investigate the department
variable, as there would be no grounds for comparison.

3.4.4. Other

Finally, the age of the corpora differs. The ICLE was released in 2002 and, therefore,

does not contain newer texts. The ChemCorpus, on the other hand, contains texts

from the years 2001 to 2012, which makes the corpus very up-to-date. Since they
were compiled in the same decade, I do not expect age to have any
considerable influence, as it takes time to incorporate changes in the teaching of
academic writing into university curricula.

All in all, the two corpora at hand differ in some ways, but the skew in the
structure of the data seems to be similar. This fact, and the relatively large amount of
data, totaling approximately 1 million and 2.3 million words for the ChemCorpus
and the ICLE respectively, still make the corpora a reasonable database for analysis.

3.5. Analysis Tool

The first step was to convert all data files into a uniform format. The data were
available in Microsoft Office format, which has the drawback of being hard to
process outside of Microsoft Office products, due to the closed-source design of the
file format. Consequently, all the files had to be converted to plain text in order to be
accessible for further analysis with external tools. Since the documents at hand did
not use any of the markup features that MS Office file formats offer, no information
would be lost in the conversion to plain text, which removes all markup.
Since the number of files was rather high, converting them manually would have been
too time-consuming. I therefore decided to
use an open-source tool named AbiWord (The AbiSource community, 2012), a word
processor that allows the use of certain functions via the command prompt,
which makes automated conversion fairly easy. Ultimately, the following command
accomplished the task by looking for all MS Word files in a given
directory and converting them to plain text files.

for /f %a in ('dir /b *.doc') do call "C:\Program Files (x86)\AbiWord\bin\AbiWord.exe" --to=txt %a

The for loop iterates through all files in the directory that have the .doc file
extension and calls AbiWord for each of them with the option to convert the file into
plain text.

Once all the files were available in plain text, the analysis could be carried out. At
that point, another issue emerged. The default procedure would have been to analyze
the files with AntConc (Anthony, 2011) in order to calculate the frequency of certain
words in the text. Considering that I wanted to analyze a rather large number of
about 55 words, this would have been a very long and repetitive task. Consequently,
I chose to develop a tool that would perform the analysis automatically. In the
following, I want to outline what the script does and how it works. For the full source
code, please see the appendix.

The first step was deciding on a programming language. I chose Perl because it
has a very powerful regular expression implementation, which is quite useful for text
processing, and because there are a lot of modules providing additional functionality
available via CPAN³, which is a repository for Perl extension modules. As a result,
the source code is rather short, despite the different tasks the script accomplishes.

The script itself has four main functions:

1. load a wordlist

2. filter out unwanted text

3. calculate frequencies

4. save results in a MS Excel readable file format

The next step was to decide on an input and output file format. Ideally, both should
be the same, MS Excel should have read and write support for the format, and the
implementation should not need more code than the actual script itself. Taking all
these conditions into account, I decided to use comma-separated value (CSV) files, as
they are easy to edit in MS Excel (or any other spreadsheet application or even a text
editor) and there was a Perl module, Text::CSV_XS (Brand, 2012), providing all the
necessary functions. For a detailed description of CSV files, refer to RFC 4180
(Shafranovich, 2005), which describes the MIME type text/csv. Since handling CSV
files using this module is quite convenient, I decided to use the file format for the
output of the tool as well, given that Microsoft Excel reads the files without a
problem.

³ http://cpan.org/

Now, I want to take a detailed look at the source code of the tool and will describe

how it works. The first few lines of code load all the required modules and set up all

global variables the tool needs. Then the function LoadData() is called, which loads

the CSV file containing the linguistic variables that are going to be analyzed. It,

furthermore, initializes the array that will later contain the statistics. The next step,

which actually is two steps, is done by the line

find(\&Count, $data_path);

This line uses the File::Find module, which is part of the standard Perl distribution,
to recursively go through all the files in the directory given in the second argument,
executing the function given in the first argument for every file in the directory. Note
that it is important to provide a reference to the function and not to call the function
directly. The function Count() opens the current file and filters out all passages that

are not text produced by the student. I decided to remove everything that is not full

sentences, namely quotations, headings, figures, tables, the table of contents, the

reference section and the appendices. Since the documents had already been tagged

using (X)HTML/XML style tags, filtering these can be accomplished using regular

expressions.

s/<tag.*?<\/tag>//sg

This regular expression performs a global search for a string that begins with “<tag” and
ends with “</tag>”, with any number of characters in between (including none), and
replaces that string with an empty string, consequently deleting the given passage. The
s modifier lets the dot also match line breaks, so that tagged passages spanning several
lines are removed as well, and the non-greedy quantifier stops at the first closing tag.
Secondly, the function iterates through the array whose elements are the words
previously loaded from the CSV file. For each element, the script counts the
occurrences of the element in the current text. The third function of this subroutine is
to count the total number of words in all the documents that are analyzed. After the
completion of this subroutine, the program returns to its main part, and the
function SaveToCSV() is called. This function compiles an array that represents a
row in the CSV file containing the word, the corresponding frequency, and the
relative frequency in all of the analyzed texts. These rows are then written to the
output file. Note that it is important to specify the separation character when
constructing a new Text::CSV_XS object, so that Microsoft Excel can handle the
output file correctly. Lastly, the tool prints status information, namely the file it is
processing at the moment.
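The traversal, filtering and counting steps described above can be condensed into the following sketch. It is not the source code from the appendix: all function names, the tag names, the restriction to .txt files and the whole-word, case-insensitive matching are assumptions made for illustration. Only core Perl modules are used.

```perl
use strict;
use warnings;
use File::Find;

# Delete every passage enclosed in one of the given tags, e.g. <quote>...</quote>.
sub strip_tagged {
    my ($text, @tags) = @_;
    for my $tag (@tags) {
        # s: dot also matches line breaks; g: all passages; .*? is non-greedy
        $text =~ s/<\Q$tag\E\b.*?<\/\Q$tag\E>//sg;
    }
    return $text;
}

# Count case-insensitive whole-word occurrences of each connector.
sub count_connectors {
    my ($text, @connectors) = @_;
    my %count;
    for my $item (@connectors) {
        my @hits = $text =~ /\b\Q$item\E\b/gi;
        $count{$item} = scalar @hits;
    }
    return \%count;
}

# Count whitespace-separated words.
sub count_words {
    my ($text) = @_;
    my @words = split ' ', $text;
    return scalar @words;
}

# Walk a directory tree and sum the counts over all .txt files.
sub analyze_directory {
    my ($dir, $tags, $connectors) = @_;
    my %total = map { $_ => 0 } @$connectors;
    my $words = 0;
    find(
        {
            no_chdir => 1,    # $_ holds the full path of each entry
            wanted   => sub {
                return unless -f $_ && /\.txt\z/;
                open(my $fh, '<', $_) or die "Cannot open $_: $!\n";
                my $text = do { local $/; <$fh> };    # read the whole file
                close($fh);
                $text = strip_tagged($text, @$tags);
                my $count = count_connectors($text, @$connectors);
                $total{$_} += $count->{$_} for @$connectors;
                $words += count_words($text);
            },
        },
        $dir
    );
    return (\%total, $words);
}
```

The filtering has to happen before the counting, so that connectors inside quotations or headings contribute neither to the connector counts nor to the total number of words.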

3.6. Statistics

In this chapter, I want to provide a detailed description of the statistical data my

analysis produced. The complete tables that are described in this section can be

found in the appendices. The analysis tool produced tables with the absolute

occurrences of the tokens in an MS Excel readable file format. The rows of the tables

contain the tokens and the columns represent the corpus files. Additionally, the

number of words of the individual files is printed in the last row. With this basis, the

next step in the analysis was to transpose the table, making further operations easier.

Afterwards, the values were normalized per 1 million words. Finally, the data was

aggregated into a table with the four functional categories and the metadata. This

aggregated table was then used to create the figures to visualize the findings. This

last step is especially important, since the raw numbers do not reveal much insight,

as they are hard to read, so plotting them might reveal more. However, upon doing

so, I noticed one major issue, due to the high number of occurrences of ‘and’, the

graphs for the other connecting adverbials were hardly distinguishable. Thus, I

decided to ignore the values for ‘and’ in the graphs that show data that was not

aggregated. Furthermore, plotting all the connecting adverbials in one figure is only

useful to a limited extend, as it only reveals extremely high usage levels, but hardly

any tendencies are visible. Afterwards, the timed – untimed categories were plotted

and examined, followed by the four functional categories. Plotting the connecting

adverbials according to their functional categories yielded very nice results, showing

a clear, skewed usage pattern. The next section shows the statistics concerning

gender, L1 and prototype. In the following, these findings will be thoroughly

discussed.
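The processing steps described above (transposing the table, normalizing per one million words, aggregating by functional category) can be sketched as follows. This is only an illustration in Python and not the analysis tool used in this thesis; the file names, counts and the category mapping are hypothetical.

```python
# Hypothetical miniature of the tool's output table: rows are tokens,
# columns are corpus files, and the last row holds the word count per file.
header = ["token", "file_a.txt", "file_b.txt"]
rows = [
    ["and", 30, 12],
    ["but", 6, 3],
    ["because", 4, 1],
    ["words", 2000, 1000],
]

# Assumed mapping of tokens to functional categories (illustrative only).
CATEGORY = {"and": "additive", "but": "adversative", "because": "causal"}


def transpose(header, rows):
    """Turn the token-by-file table into one record per file."""
    files = header[1:]
    records = {name: {} for name in files}
    for row in rows:
        token, values = row[0], row[1:]
        for name, value in zip(files, values):
            records[name][token] = value
    return records


def normalize(record, per=1_000_000):
    """Scale absolute counts to occurrences per `per` words."""
    words = record["words"]
    return {tok: cnt * per / words for tok, cnt in record.items() if tok != "words"}


def aggregate(normalized):
    """Sum normalized token frequencies within each functional category."""
    totals = {}
    for token, freq in normalized.items():
        cat = CATEGORY[token]
        totals[cat] = totals.get(cat, 0.0) + freq
    return totals


records = transpose(header, rows)
by_file = {name: aggregate(normalize(rec)) for name, rec in records.items()}
print(by_file)
```

Normalizing each file before aggregating keeps texts of very different length comparable, which matters here because the corpus texts differ considerably in length.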

Figure 4: ChemCorpus Connecting Adverbials per One Million Words

Figure 5: ICLE Connecting Adverbials per One Million Words

First, Figures 4 and 5 above show the normalized distribution of all connecting

adverbials in the two corpora. These two figures allow the identification of the most

frequent connectors in the corpora. For the ChemCorpus, the most frequent

connecting adverbials are ‘also’, ‘and’ and ‘but’. For the ICLE, on the other hand, there

are four connectors that occur most frequently: ‘also’, ‘and’, ‘but’ and ‘because’.

Apart from the connecting adverbials with very high frequencies, I also want to point

out that some connectors do not occur at all. ‘On account of this’ and ‘whence’ do not

appear in either corpus, whereas ‘incidentally’, ‘on this basis’, ‘anyhow’ and ‘at last’

do not occur in the ChemCorpus. When comparing the two figures with one another, it can be

seen that the ICLE seems to have more connecting adverbials overall than the

ChemCorpus; however, this needs further investigation.

The next aspect of the corpus that has been analyzed is timedness. Figure 6 shows

the average total of connectors according to timed – untimed writing in the two

corpora side by side.

Figure 6: Comparison timed – untimed in ChemCorpus and ICLE

The figure shows that the numbers are very close together for the two branches

within each corpus, as well as for both corpora in comparison, averaging at around

20,000 connectors per one million words, which equals two percent of the words. It

is furthermore curious that there are more connecting adverbials in the timed section

of the ChemCorpus, whereas in the ICLE there are more connecting adverbials in the

untimed section.
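The two statements above, that roughly 20,000 connectors per one million words equal two percent of the words and that the timed and untimed values lie close together, can be checked with a few lines of arithmetic. The sketch below is in Python and uses the data labels of Figure 6; measuring the gap relative to the mean of the two values is an assumption made for illustration, not a measure used in this thesis.

```python
# Values read from the data labels of Figure 6 (connectors per one million words).
FIGURE_6 = {
    "ChemCorpus": {"timed": 20447, "untimed": 19848},
    "ICLE": {"timed": 21324, "untimed": 21841},
}


def share_of_words(per_million):
    """Express a per-million frequency as a percentage of all words."""
    return per_million / 1_000_000 * 100


def relative_gap(a, b):
    """Absolute difference between two values relative to their mean, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


gaps = {}
for corpus, branch in FIGURE_6.items():
    gaps[corpus] = relative_gap(branch["timed"], branch["untimed"])
    print(corpus, round(share_of_words(branch["timed"]), 2), round(gaps[corpus], 1))
```

A descriptive gap of this kind says nothing about statistical significance; it only makes the size of the difference explicit.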

(Figure 6 data labels, connectors per one million words: ChemCorpus timed 20,447, untimed 19,848; ICLE timed 21,324, untimed 21,841.)

Figure 7: Functional Categories, timed – untimed

To look further into the effect of timedness, Figure 7 shows the connecting adverbials

grouped into the four functional categories compared by timedness. As the average

total has already suggested, there is hardly any apparent difference. The numbers of

connecting adverbials in the additive category are equally high, and those in the

sequential category equally low, and they lie very close together. In contrast, the numbers for the adversative

and causal categories differ more. There are roughly 800 connecting adverbials more

in the ICLE in each of the two categories, which corresponds to about four percent of the average total. While this

difference is not very large, it accounts for the slightly higher average total that

has been discussed earlier.

(Figure 7 chart: the four functional categories, additive, adversative, causal and sequential, for the timed and untimed branches of the ChemCorpus and the ICLE; vertical axis from 0 to 14,000.)

Figure 8: ChemCorpus Connecting Adverbials by Functional Category (per one million words)

(Figure 8 data labels: timed: additive 12,849, adversative 4,592, causal 2,320, sequential 687; untimed: additive 12,276, adversative 4,588, causal 2,238, sequential 746.)

Figure 9: ICLE Connecting Adverbials by Functional Category (per one million words)

Having been discussed in terms of timedness, the functional categories will now be

discussed separately. Figures 8 and 9 show the number of connectors in each

functional category for the two corpora, normalized per one million words. It is

striking that, in both corpora, the additive category has more than twice as many

connectors as the next most frequent category, adversative connecting adverbials. The

graphs show a general trend of decline, with additive connectors being the most

frequent category, followed by adversative, causal and sequential. The adversative

category has less than half the number of connectors of the additive, the causal

roughly half of the adversative and the sequential less than half of the causal category.

This trend, again, appears in both corpora. One might argue that the trend is only

visible due to the arrangement of the categories, which is quite arbitrary. At first

glance this may be true. However, there are two factors that influenced the sequence

of the categories. Firstly, as they have been modeled following different

categorizations from literature, the categories follow a similar structure. Secondly,

the categories are sequenced by frequency, from the category with the most tokens to

the category with the least.
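The decline from one category to the next can be made explicit by computing each category's share of the total and its ratio to the category ranked directly above it. The following Python sketch uses the data labels printed in Figures 8 and 9. The assignment of the labels to the timed and untimed series is read from their order in the charts and is an assumption, although the resulting sums match the totals in Figure 6 to within one unit.

```python
CATEGORIES = ("additive", "adversative", "causal", "sequential")

# Data labels as printed in Figures 8 and 9 (connectors per one million words).
FIGURES = {
    ("ChemCorpus", "timed"): (12849, 4592, 2320, 687),
    ("ChemCorpus", "untimed"): (12276, 4588, 2238, 746),
    ("ICLE", "timed"): (12403, 5216, 2926, 779),
    ("ICLE", "untimed"): (12843, 5362, 3040, 596),
}


def shares(values):
    """Share of each category in the total of its series."""
    total = sum(values)
    return [v / total for v in values]


def step_ratios(values):
    """Ratio of each category to the one ranked directly above it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for (corpus, branch), values in FIGURES.items():
    pct = ", ".join(f"{c} {s:.0%}" for c, s in zip(CATEGORIES, shares(values)))
    ratios = ", ".join(f"{r:.2f}" for r in step_ratios(values))
    print(f"{corpus} {branch}: total {sum(values)}; {pct}; step ratios {ratios}")
```

In all four series the additive share is above one half, and every step ratio stays below 0.6, which is the declining pattern described above.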

(Figure 9 data labels: timed: additive 12,403, adversative 5,216, causal 2,926, sequential 779; untimed: additive 12,843, adversative 5,362, causal 3,040, sequential 596.)

In the following, I want to shift the attention to the variable of gender. Even

though Hůlková (2011) finds that in her data gender did not have any influence on

the usage of connecting adverbials, I decided to include gender in my analysis to see

whether her findings could be confirmed. Unfortunately, information on gender was

not available for all texts in the ICLE; thus, only those texts that had information on

gender in the metadata database were considered. Again, the analysis splits up the

connecting adverbials into the four functional categories to provide a more detailed

view.

Figure 10 below shows the results.

Figure 10: Functional Categories by Gender

The bars in the chart already suggest that female writers use connecting adverbials

more frequently than their male counterparts, but the difference is not as clear-cut as

expected. While women use roughly 1,600 more connectors per one million

words in the ChemCorpus and 1,200 more per one million words in the ICLE, the

functional categories do not exhibit the same usage pattern throughout. The

ChemCorpus data shows that women use more additive and

adversative connecting adverbials than men, whereas it is the other way round for the

causal and sequential categories. For the ICLE data, the usage patterns are even less

visible. Female authors use more additive and causal connecting adverbials, whereas

male authors use more adversative and sequential connectors. However, the

differences for the sequential and especially for the adversative category are very

low. For the adversative category, female authors use 5,304 connecting adverbials

per one million words and male authors use 5,397 connecting adverbials per one

million words. This difference is so small that it does not allow a decision on

whether it is due to gender or to statistical variance.

In the next part, the influence of the mother tongue of the authors on their usage

of connecting adverbials will be investigated. More precisely, the German L1 writers

of English of the ChemCorpus will be compared with Germanic L1 writers of

English of the ICLE.

Figure 11: Functional Categories by L1

The figure shows that authors with Swedish as their L1 use additive and adversative

connecting adverbials more frequently than others, whereas authors with Dutch as

their L1 use more causal connectors. It is furthermore notable that there is a

discrepancy between the usage of connecting adverbials by German L1 authors of

the ChemCorpus and the ICLE. The ICLE texts exhibit a higher usage of adversative

and sequential connectors, which may be due to the different text types of the two

corpora. Overall, the authors with Swedish or Dutch as their L1 use roughly 2,000

connecting adverbials more per one million words than their German colleagues.

The last aspect that will be analyzed in this section is prototype. Here, the most

frequent connecting adverbials were treated as prototypical and compared to the rest

of connectors in the respective categories. For the most frequent connecting adverbials,

compare Figures 4 and 5. The following connecting adverbials were selected as

prototypical: ‘and’ (additive), ‘but’ (adversative), ‘because’ (causal) and ‘next’ (sequential).

Moreover, the timed – untimed division was disregarded, since it could be seen above

that it has no significant influence. Figures 12 and 13 show the plotted statistics for

the comparison of the prototypical connectors against the rest.

Figure 12: ChemCorpus Functional Categories by Prototype

Figure 13: ICLE Functional Categories by Prototype

It is striking that the results for the two corpora are almost completely different. The

statistics for the ChemCorpus show that the additive prototype occurs more

frequently than the other connecting adverbials in the category, whereas for the other

three categories the frequency of the prototype is lower than the rest of the

connecting adverbials combined. In contrast, the statistics for the ICLE data

consistently show that the prototypical connector in three of the four categories occurs

more often than the rest of the connectors; however, the sequential prototype occurs

less frequently than the rest of the connectors in the category combined. It is

interesting that the statistics for the ICLE data show basically the reverse of the tendencies

that could be observed for the ChemCorpus data. Moreover, the difference between

prototype and rest is bigger for the ICLE data, which hints at a lower diversity of

connecting adverbials used in the texts.
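The comparison of prototype against rest amounts to computing the prototype's share of its category. A minimal Python sketch follows; the frequencies are hypothetical per-million values chosen for illustration and are not taken from either corpus.

```python
def prototype_share(frequencies, prototype):
    """Return the prototype's share of a category's total frequency (0.0 to 1.0)."""
    total = sum(frequencies.values())
    if total == 0:
        return 0.0
    return frequencies[prototype] / total


# Hypothetical per-million frequencies for one category; not corpus data.
causal = {"because": 1200, "therefore": 500, "thus": 200, "hence": 100}

share = prototype_share(causal, "because")
dominates = share > 0.5  # True when the prototype outweighs the rest combined
print(f"prototype share: {share:.1%}, majority: {dominates}")
```

A share above one half corresponds to the case where the prototype occurs more often than all other connectors of the category combined.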

3.7. Results

Following the closer look at the statistics, the results and deductions that can be made

will now be presented. The first and arguably most important conclusion is that

there is a clear usage pattern that, despite some minor differences, can be found in both

corpora. More specifically, there are six very evident usage patterns.

Firstly, there are three connecting adverbials that are by far the most frequent:

‘and’, ‘also’ and ‘but’. These three occur more than twice as frequently as any other

connector in both corpora, with the ChemCorpus exhibiting this feature more clearly

than the ICLE. This finding is also in line with the results of the study by Milton

and Tsang (1993).

Secondly, the statistics have shown that whether a text is timed or untimed writing

has most likely no influence on the usage of connecting adverbials (Q 1.). When

comparing the overall connecting adverbial use in the timed and untimed branches of

the corpora, the numbers are almost equal. While there are minor differences, with

the timed texts in the ChemCorpus having more connectors than the untimed texts

and vice versa for the ICLE, these differences are very small

and can be accounted for by the unevenly distributed number of texts in the categories or

by a certain deviation due to other possible influences such as the cultural background of

the author or personal preference, none of which has been taken into consideration in

this study. In addition, the data suggests that, since there is no change in the usage

patterns due to time, it cannot be assumed that there are more complex (i.e. causal or

sequential) connecting adverbials in untimed writing (Q 1.1.).

Thirdly, there is a very clear usage pattern of connecting adverbials for the four

functional categories. Most notably, the usage pattern is apparent in both corpora in

the same way, with only small differences in frequencies. There is a very clear decline

in frequency from one category to the next. The fact that additive connectors are

most frequent is logical, since it is the category that has on the one hand the highest

number of connectors and on the other hand contains two of the most frequent

connecting adverbials. In similar fashion, the sequential category, which is least

frequent, does not differ in the two corpora. Only for the adversative and causal

categories does the ICLE exhibit slightly more connectors; however, a four percent

difference is still within a tolerable range. Thus, it is difficult to deduce

whether this is a difference due to the writing style of the text or due to statistical

variance.

Figure 14: Overall Connecting Adverbial Shares

Since the difference between the frequencies of the categories in the two corpora

is so small, the shares of the connectors can be visualized together as in

Figure 14, with more than half of the connectors belonging to the additive category.

Fourthly, there is a clear pattern of use for gender. The statistics have shown that

female student writers use more connecting adverbials in total, although this is not

exactly the same for all four functional categories (Q 2.). This overall difference was

even more prominent in the ChemCorpus than in the ICLE. The gender-specific use

of connecting adverbials for the four functional categories exhibits a pattern in the

ChemCorpus, where female authors use more additive and adversative connectors,

whereas male authors use more causal and sequential connectors. These findings

directly contradict the study by Hůlková (2011), who found that, for her data,

gender did not have any influence on the use of connecting adverbials. This might

suggest that the influence of gender on the use of connecting adverbials is also

dependent on the text type.

Fifthly, the analysis has shown that the L1 of the authors has an influence on the

use of connecting adverbials. It could be seen that writers with Dutch or Swedish as

their L1 use more connecting adverbials in general. In particular, Dutch writers of

English use more adversative and causal connectors whereas Swedish writers of

English use more additive and adversative connectors. These findings are quite

astonishing, as the initial assumption was that German authors would use more

connecting adverbials than writers with other Germanic L1s (Q 3.).

Sixthly, there is a clear usage pattern in terms of prototype. The data shows that

the prototypical connecting adverbial, meaning the most frequently used, accounts

for a large share, and for parts of the data even the majority, of the overall connector use in

the respective functional category. At this point, it is necessary to differentiate the

results for the two corpora. The ChemCorpus shows that the prototypical connecting

adverbials make up a large share of all of the connecting adverbials in the functional

categories, but not the majority, except for the additive category. The ICLE data

shows the opposite. Here, the prototype accounts for the majority of connecting

adverbials in the functional categories, except for the sequential category. This shows

that the ChemCorpus has a greater diversity of connecting adverbials, whereas the

ICLE is dominated by the prototypical connecting adverbials.

Comparing the total numbers, the result is that Chemnitz learners of English use

connecting adverbials less frequently than learners of English from other countries,

and that there are some differences in their way of using them (Q 5.). The research

has further shown that there are different variables that influence these differences.

First, there is the influence of timedness on the use of connecting adverbials,

which this study has found to be of no importance. Secondly, it has been shown that

gender has a considerable influence on the use of connectors in the ChemCorpus in

contrast to the ICLE. Thirdly, the comparison of the ChemCorpus data to the ICLE

data has shown that the L1 of the authors also seems to have an influence on their use

of connecting adverbials. Specifically, it has been shown that writers with

Germanic mother tongues other than German use connectors slightly differently. Lastly, it

has been shown that the most important factor in the use of connecting adverbials by

L2 writers of English seems to be prototype. This means that a large share of the

connecting adverbials in each of the four functional categories is constituted by the

prototypical connector. This tendency is clearly visible in the ChemCorpus, but even

more prominent in the ICLE, where the prototypical connecting adverbial represents

more than half of all connecting adverbials in three of the four categories.

To sum up the results, it can be stated that the Chemnitz L2 writers of English

generally use connecting adverbials in a similar way to their international colleagues, yet

there are some differences, which can be attributed to different variables.

3.8. Limitations

The analysis that has been carried out in this study has of course a number of

limitations, which have to be kept in mind when considering the results and drawing

conclusions. Some of the issues, especially with the data, have been discussed in

section 3.4., but they will be mentioned here again and complemented with further

considerations.

First of all, the categorization of the connecting adverbials has some aspects that

need to be carefully considered when assessing the data. One aspect is the number of

tokens in the categories. The additive category has the most tokens (for a complete

table of the categories and tokens, see the appendix section), and the analysis has

shown that more than half of all occurrences of connectors found belong to the

additive category. However, I want to suggest that there is no direct correlation

between the number of items in the category and the number of tokens found in the

analysis. This notion is supported by the fact that the second most frequently used

category, adversative connecting adverbials, has the fewest items.

Furthermore, the least frequently identified category of sequential connectors has the

second most items. In addition, the differences in the number of items in each

category are not large. There is no category that has twice as many items as any

other. Thus, the uneven distribution of the connectors across the

categories is most unlikely to have any impact on the result, but still has to be kept in

mind.

The second aspect I want to focus on in this section is text type. As I have already

discussed in 3.4.1., the two corpora consist of different text types. While the

ChemCorpus uses thesis papers for the untimed component and written exam texts

for the timed component, the ICLE uses argumentative essays for both components.

The text types differ in multiple ways. First of all, they are of different length. The

texts in the ChemCorpus are longer than the ICLE texts, and particularly the thesis

papers are significantly longer than the ICLE untimed texts. Furthermore, the ICLE

timed and untimed texts are mostly similar in length, whereas the respective

ChemCorpus texts are of considerably different length. Secondly, the communicative

purpose differs across the text types. An argumentative essay undoubtedly has a very

different communicative purpose than a thesis paper. The difference in the

communicative purpose is likely to have an effect on the choice of connecting

adverbials, since the choice of connectors will differ between trying to persuade

someone with an argumentation and describing research that has been carried out.

Another interesting aspect in this regard is the department with which the students

are enrolled. As Samraj (2008) has shown, there are differences in the structures of

master’s theses across different disciplines. Consequently, it can be expected that the

usage of connectors also varies across the different departments. In this study, I did

not consider this aspect, mostly because information regarding the department was

not available for all data. Thus, there is no evidence in this study to either support or

contradict Samraj’s notion.

The last issue I want to address in this chapter is the case of ‘and’. The statistics

that have been produced by my analysis exhibit a large number of occurrences for the

token ‘and’. The extraordinarily high number can be attributed to

the fact that ‘and’ can occur not only as an additive connecting adverbial, but also

in enumerations. Due to this, the high occurrence of ‘and’ distorts the results, as the

number of tokens in the additive category is presumably higher than the number of

actual additive connectors in the data. The reason for this might not be instantly

clear, since it is to be found in the format of the data and the way the analysis script

works. Since the tool that counts the occurrences of the tokens automatically does

nothing but regular expression matching, it is not possible for it to differentiate

between the conjunctive and the enumerative ‘and’. However, the reason for this is not

purely technical. The distinction would be possible if the data were completely

part-of-speech tagged and ideally in XML format. However, the data at hand is in plain

text format and the only tagged parts are tables of contents, appendices, and quotes in

the theses that are part of the ChemCorpus. One possible solution would be to

exclude ‘and’ from the analysis; however, this would distort the results too, possibly

even more, since previous studies have shown that ‘and’ is one of the most frequently

used connecting adverbials (cf. Bolton, Nelson, & Hung, 2002). Thus, the best

solution for this study is to leave the data and the results as they are, while keeping in

mind that there are too many tokens in the additive category.
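The limitation can be illustrated with a small Python sketch that counts ‘and’ by regular expression matching alone. The pattern is an assumed stand-in and not the expression used by the analysis tool of this thesis; the two sentences are corpus quotations that also appear in the conclusion, one with enumerative and one with connecting ‘and’.

```python
import re

# Whole-word, case-insensitive pattern; an assumed stand-in for the tool's expression.
AND_PATTERN = re.compile(r"\band\b", re.IGNORECASE)

examples = {
    "enumeration": "working with computer and Internet",
    "connecting adverbial": "And life itself is in fact a chain of many simple things.",
}

# Both uses produce exactly the same kind of match, so they are counted alike.
counts = {label: len(AND_PATTERN.findall(text)) for label, text in examples.items()}
print(counts)
```

Since the match carries no grammatical information, telling the two uses apart would require additional annotation such as part-of-speech tags.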

Overall, this chapter has shown that there are a number of limitations to this study.

The data is only compatible to a certain extent. Connecting adverbials are unevenly

distributed across the functional categories and there is an issue with ‘and’. Still, I am

convinced that the study can prove useful, since general trends in the usage of

connecting adverbials by Chemnitz students have become visible, and putting them

into context by comparing the results with the same analysis carried out with data

from the International Corpus of Learner English can provide basic conclusions

concerning the teaching of academic writing.

4. Conclusion

On the previous pages I have presented the use of connecting adverbials in timed and

untimed academic student writing. My research questions were whether the usage of

connectors by the Chemnitz students varies between timed and untimed writing, what

influence gender, L1 and prototype have, and finally whether there is an overall difference

between the Chemnitz students and other L2 learners of English. After

the presentation of my research questions, the next chapter reviewed the relevant

literature. The topics of the review are connectors, which are further subdivided into

cohesion, reference and the categorization of the connectors, timed and untimed

writing, and the question of whether student writing is a genre in its own right. The

last parts covered three studies, which also deal with the topic of students’ use of

connecting adverbials. In this respect, I also discussed to what extent these studies are

relevant for my thesis.

Afterwards, the main part with the analysis followed. First, I gave details about

the corpus used for the analysis as well as the monitor corpus. In the next sub-

chapter, the compatibility of the corpora was researched. The topics here were firstly

text type, which dealt with the difference between theses and argumentative essays.

The findings showed that the structure and communicative purpose of the

argumentative essay and theses differ significantly and the usage of connecting

adverbials is also very likely to differ. Secondly, the shares of texts written by male

and female students were discussed. The main issue with this topic is that, in the field

of English and American Studies, male students are under-represented and

“hard to find” (Schmied, 2011, p. 17). The last part of the corpus compatibility

section was the influence of the department in which the theses were written.

Following the compatibility discussion, I presented my analysis tool, which had been

developed for this study. The fifth sub-chapter presented the statistics that have been

compiled using the analysis tool. Since the complete tables are far too large, only

visualizations have been presented in this section, and the full tables have been

included in the appendices. The next sub-chapter presented the results of the present

study. It could be seen that there is no significant difference in the usage of

connecting adverbials in timed and untimed writing for either the ChemCorpus texts

or the ICLE texts. For the distribution in the four functional categories, there is a

clear trend that is present in both the ChemCorpus and the ICLE. The most

frequently used connectors are additive, which make up more than half of all connecting

adverbials. The second most frequent category is adversative connecting adverbials.

The other two categories, causal and sequential, are the least frequently used, and

they comprise only a fraction of the overall connectors. Moreover, the influence of

gender, L1 and prototype on the use of connecting adverbials has been researched. It

has been found that gender seems to have an influence on the use of connecting

adverbials, namely that women use more connectors than men. Additionally, this

influence was more prominent in the ChemCorpus. The influence of L1 also showed

interesting results. It could be seen that authors with other Germanic first languages

use connecting adverbials differently than German L1 writers. Finally, it has been

shown that the prototypical connecting adverbials in each of the four functional

categories account for large shares of the overall number of connectors, even for

more than 50 percent in the ICLE data.

After having discussed the results of my study, the next chapter covered the

limitations of the study. These limitations, which result partly from methodological

issues and partly from the data, are important to consider when interpreting the

results of the study. The first limitation that was addressed was the issue of

categorizing the connecting adverbials. The issue deals not with the actual setting up

of the categories, but with the assignment of the connectors to the individual

categories. The 55 connecting adverbials that are investigated in this paper are

not evenly distributed among the four functional categories. The category of additive

connectors, which has the most connecting adverbials assigned to it, correlates with

the highest number of occurrences. However, this apparent correlation between the

number of connectors assigned to the category and the number of tokens found in the

corpora for this category cannot be verified with the other categories. The second

most frequently identified category of adversative connectors has the fewest

connectors, while the least frequently found category of sequential connecting

adverbials has the second highest number of connectors. Thus, even though the first

glance may suggest a correlation, closer investigation does not verify this

assumption. The second limitation that has been discussed is the different text types

of the texts that were used to compile the corpora. There is a difference in the

structure and communicative purpose of theses and argumentative essays, which is

very likely to also result in a different usage of connecting adverbials, making the

corpora only conditionally compatible. Lastly, the problematic connecting adverbial

‘and’ has been addressed as a limitation. The problem is the inability of the analysis

tool to distinguish between ‘and’ as an enumerative conjunction and as a connecting

adverbial. Examples E 1. to E 4. show instances of both enumerative ‘and’ and connecting

adverbial ‘and’ from both corpora.

E 1. “working with computer and Internet”

enumeration [MG05Ft_KM – ChemCorpus untimed]

E 2. “Until 1840 most of the British inhabitants in Australia were convicts

and only a small number of free settlers had arrived.”

connecting adverbial [f_W0809_K_N – ChemCorpus timed]

E 3. “[…] a world that was strongly divided in Blacks and Whites”

enumeration [DNNI4006 – ICLE timed]

E 4. “And life itself is in fact a chain of many simple things.”

connecting adverbial [BGSU1127 – ICLE untimed]

This issue can only be solved by using completely part-of-speech tagged data, since it

is impossible to automatically recognize the part of speech with only regular

expressions.

5. Outlook

After having conducted the analysis and having presented the results of my thesis, I

want to give an outlook with suggestions for further research. As could be seen

when presenting the results and especially in the discussion of the limitations of this

study, there is certainly enough room for further refining the analysis and continuing

to research the matter. The first approach for further research I want to suggest is a

more detailed analysis of the ICLE data. Since the texts in the corpus have been

produced by students with a variety of L1s, it would be interesting to investigate

whether the results of the corpus as a whole are the same for the different L1 groups.

It could be expected that there is a difference, since previous studies, such as Bolton,

Nelson & Hung (2002), have found that L2 learners’ usage of

connecting adverbials differs from that of L1 English writers. It would be very interesting to

see whether different language families have an influence on the usage of connectors

in the L2 and furthermore of what kind this influence is, for instance whether there is an under- or

overuse of certain connecting adverbials.

Another possible refinement of this study would be the same analysis with two

truly compatible and stratified corpora. Schmied (2011, p. 18) gives suggestions on

what the ideal, stratified, ten-million-word ChemCorpus would look like. In this

respect, it would also be interesting to redo the study with two fully part-of-speech

tagged corpora. This would be especially helpful to differentiate between ‘and’ as

connecting adverbial and as enumeration.

Furthermore, a possible continuation of this study could be a comparison between

student theses and research articles. It has been shown in the literature review that

student writing can be seen as apprentice work on the way to becoming a member

of the academic discourse community. While the issue of whether student theses

constitute a genre in their own right or are merely an imitation of professional

academic writing, such as research articles, is still debatable, it is safe to say that by

analyzing student theses and comparing them to research articles, helpful facts to

improve the students’ writing can be gathered.

Lastly, connecting the linguistic research to the field of teaching English, it would

be interesting to deduce implications for teaching from research. Especially when

comparing student theses to research articles, possible fields of action for teachers

can be identified. The results of the analysis in this paper show that students do not

use sequential connecting adverbials very frequently. A possible implication for

teaching might be that it could be helpful to incorporate more sequential connecting

adverbials into writing courses by focusing on structuring the texts more thoroughly.

However, there is still a lot of room for research in this area, so this paper can be

seen as an initial study that shows different aspects for future research.

6. References

Anthony, L. (2011). AntConc (Version 3.2.4) [Software]. Available from

http://www.antlab.sci.waseda.ac.jp/antconc_index.html

Biber, D., & Conrad, S. (2009). Register, Genre, and Style. Cambridge: Cambridge

University Press.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman

Grammar of Spoken and Written English. Essex: Pearson.

Bolton, K., Nelson, G., & Hung, J. (2002). A corpus-based study of connectors in

student writing: Research from the International Corpus of English in Hong

Kong (ICE-HK). International Journal of Corpus Linguistics, 7(2), 156-182.

Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register

variation. English Text Construction, 1(1), 41-61.

Halliday, M. A., & Hasan, R. (1976). Cohesion in English. New York: Longman.

Hůlková, I. (2011). Conjunctive Adverbials in Academic Written Discourse. In J.

Schmied (Ed.), Academic Writing in Europe: Empirical Perspectives (pp. 129-142).

Hüttner, J. I. (2007). Academic Writing in a Foreign Language - An Extended Genre

Analysis of Student Texts. Frankfurt am Main: Peter Lang.

Hyland, K. (1990). A Genre Description of the Argumentative Essay. RELC

Journal, 21, 66-78. doi:10.1177/003368829002100105

Kortmann, B. (2005). English Linguistics: Essentials. Berlin: Cornelsen.

Livia, A. (2003). Linguistic Approaches to Gender. In J. Holmes & M. Meyerhoff (Eds.),

The Handbook of Language and Gender (pp. 142-158). Oxford: Blackwell.

Martin-Martin, P. (2005). Scientific Writing: A Universal or a Culture-Specific Type

of Discourse? Revista de Lenguas para Fines Especificos, 11-12, 191-203.

Saeed, J. I. (2009). Semantics (3rd ed.). Oxford: Wiley-Blackwell.

Samraj, B. (2008). A discourse analysis of master’s theses across disciplines. Journal

of English for Academic Purposes, 7, 55-67.

Schmied, J. (2011). Academic Writing in Europe: a Survey of Approaches and

Problems. In J. Schmied (Ed.), Academic Writing in Europe: Empirical

Perspectives (pp. 1-22).

Swales, J. M. (1990). Genre Analysis - English in academic and research settings.

Cambridge: Cambridge University Press.

UCL - ICLE. (2012, 12 19). Retrieved from

http://www.uclouvain.be/en-cecl-icle.html

7. Appendices

The following sections denote the paths under which the content can be found on the

CD accompanying this paper.

7.1. Corpora

\Corpora\ChemCorpus\

\Corpora\ICLE\

7.2. Statistics

\Statistics\

7.3. Sorting Tool Source Code

\Tools\sort_ICLE_data.pl

7.4. Analysis Tool Source Code

\Tools\analyze_data.pl

7.5. Declaration of Authorship (Selbstständigkeitserklärung)

Documents
