Technische Universität Chemnitz
Faculty of Humanities
Institute of English and American Studies
English Language & Linguistics
Bachelor's Thesis
submitted for the academic degree of
“Bachelor of Arts”
in English and American Studies
Probably it is only a Matter of Time
-
An Empirical Comparison of
Connecting Adverbials
in Timed and Untimed Student Writing
Supervisor: Prof. Dr. Josef Schmied
E-mail:
Address:
Student ID number:
Date of birth:
Degree program: Bachelor in English and American Studies
Submission date: 12.12.2013
Table of Contents
1. Introduction
1.1. Defining the Topic
1.2. Research Questions
2. Literature Review
2.1. Connecting Adverbials
2.1.1. Cohesion
2.1.2. Reference
2.1.3. Categories of Connecting Adverbials
2.2. Timed and Untimed Writing
2.3. Student Writing and Genre
2.4. Gender
2.5. L1
2.6. Prototype
2.7. Previous Studies on the Topic
2.7.1. Hůlková (2011)
2.7.2. Bolton, Nelson and Hung (2002)
2.7.3. Milton and Tsang (1993)
3. Analysis
3.1. Methodology
3.2. Corpus
3.3. Monitor Corpus
3.4. Corpus Compatibility and Issues
3.4.1. Text Type
3.4.2. Gender
3.4.3. Department
3.4.4. Other
3.5. Analysis Tool
3.6. Statistics
3.7. Results
3.8. Limitations
4. Conclusion
5. Outlook
6. References
7. Appendices
7.1. Corpora
7.2. Statistics
7.3. Sorting Tool Source Code
7.4. Analysis Tool Source Code
7.5. Declaration of Authorship (Selbstständigkeitserklärung)
List of Tables
Table 1: Comparison of Semantic Categories of Connecting Adverbials
Table 2: Number of Texts
Table 3: Number of Words
Table 4: Average Text Length of Corpora
List of Figures
Figure 1: from Halliday & Hasan (1976)
Figure 2: ICLE Metadata Database File Header
Figure 3: Argumentative Essay (Hyland, 1990, p. 69)
Figure 4: ChemCorpus Connecting Adverbials per One Million Words
Figure 5: ICLE Connecting Adverbials per One Million Words
Figure 6: Comparison timed – untimed in ChemCorpus and ICLE
Figure 7: Functional Categories, timed – untimed
Figure 8: ChemCorpus Connecting Adverbials by Functional Category
Figure 9: ICLE Connecting Adverbials by Functional Category
Figure 10: Functional Categories by Gender
Figure 11: Functional Categories by L1
Figure 12: ChemCorpus Functional Categories by Prototype
Figure 13: ICLE Functional Categories by Prototype
Figure 14: Overall Connecting Adverbial Shares
1. Introduction
Academic writing in English differs from other forms of written language. It has
been claimed that writing academically has to be learnt by native and non-native
speakers of English alike. Thus, it is important to analyze expert academic writing
and novice academic writing and to compare the results to deduce implications for
teaching academic writing. In addition, analyzing and comparing academic writing
produced by students can prove useful to identify factors that influence the students’
writing. Possible influential factors can be socio-linguistic, such as mother tongue or
gender, but can also stem from other conditions, such as text type or academic tradition.
These factors can cause differences in the use of the language not only compared to
the expert texts in the genre, but also compared to other novice texts. Thus, the
analysis of student writing can provide useful insight into which of the factors is
most influential.
In this paper, I analyze the use of connecting adverbials, or connectors, in
timed and untimed writing of German students of English. However, before the
actual analysis can take place, it is necessary to identify the different theoretical
concepts underlying the topic and then to take a closer look at each of them.
Afterwards, the ChemCorpus will be introduced in greater detail, since it is the main
database used for my research. Likewise, the International Corpus of Learner
English (ICLE), which is used as a reference corpus, will be introduced. Additionally,
the tool developed to carry out the analysis will be presented. In the next part, the
analysis and the statistics will be explained in detail. Finally, the
findings will be presented, limitations will be clarified and an outlook will
be given.
1.1. Defining the Topic
A closer look at the topic of this paper reveals three main areas that are of relevance
for the research. First, there is the matter of connecting adverbials, which will be
discussed in section 2.1. below. A definition of the term will be provided, along with
the different labels that different authors have given to the concept of connecting
adverbials. Furthermore, the broader context of connecting adverbials,
coherence and cohesion, will be discussed, as well as the role of connectors. In this
context of cohesion, different types of reference, such as endophoric and exophoric,
will be discussed. Afterwards, four categories will be established and placed in the
different frameworks established by authors in previous studies.
The next area that will be examined is the concept of genre. As this area is very
broad, it is important to narrow it down further. Consequently, a definition
adapted from works by other authors will be given, and the partly overlapping
terms register and genre will be distinguished. Moreover,
it will be explained why the term text type is used rather than genre. Furthermore,
attention will be paid to genre analysis and, particularly, the genres of learners’
English and academic prose. The term learners’ English will be defined, and its
features will be explained in detail. Similarly, the genre of academic prose will be
defined and explained, so that the two genres can be contrasted, with a focus on
connector use. In the last part of this section, the two text types of the corpus used
for analysis, written exam papers and final theses, will be looked at in more detail.
As the last of the three areas, the effect of timedness on student writing will be
investigated. To this end, however, it is necessary to define the term and to look at
previous studies that allow predictions about possible effects. At this point, the concept
of timedness will be related to the features of the two text types discussed above.
1.2. Research Questions
There are five major research questions that cover everything that has been analyzed
in the two corpora. The first three research questions comprise the variables of
timedness, gender and mother tongue (L1). The fourth research question deals with
the prototypical connectors in the functional categories. Finally, the fifth research
question compares the findings of the analysis of the ChemCorpus to findings of the
ICLE.
Q 1. Do L2 learners of English use more connecting adverbials in timed or
untimed writing?
Q 1.1. Are more complex connecting adverbials used in untimed writing due
to the extended time period available for editing?
Q 2. Do women use more connecting adverbials than men?
Q 3. Do German L1 writers use more connecting adverbials than L1 writers of
other Germanic languages?
Q 4. What are the prototypical connectors for each of the four functional
categories?
Q 5. How does the writing of Chemnitz students compare to other L2 writers?
The first research question (Q 1.) investigates whether more connectors are used in
timed or untimed writing. My hypothesis is that untimed writing will contain a higher
number of connecting adverbials, as there is more time available to edit the texts.
Furthermore, the sub-question (Q 1.1.) investigates whether student writers use more
complex connecting adverbials if they have no time constraints and can thus edit
their work more intensively.
The second research question deals with the variable of gender. It will be
investigated whether female or male students use more connectors. The initial
assumption is that female authors use more connecting adverbials than male authors.
When answering this research question it is also important to keep the stratification
of the corpora in mind, since, due to the generally low number of male students in
the field of English and American Studies, the ChemCorpus contains only relatively
few texts by male authors.
The third research question deals with the mother tongue of the writers. More
precisely, German L1 writers will be compared to L1 writers of other Germanic
languages, such as Dutch or Swedish, and it will be possible to see whether German
writers use more connecting adverbials than the writers with other Germanic
languages as their L1. It is important to consider that the ICLE, which will be used
for comparison, contains texts written by authors with only two other Germanic
native tongues, Dutch and Swedish.
The fourth research question steps away from the social variables and investigates
which connecting adverbials are prototypical for each of the four functional
categories. Afterwards, the prototypical connectors will be compared to the rest of
the connectors in this category. It is hard to make a prediction here, but I would
guess that the prototypical connecting adverbials make up a large percentage of the
connectors in the respective functional categories.
Finally, the fifth research question compares the overall usage of connecting
adverbials in the ChemCorpus to the ICLE. It will be possible to see whether the
Chemnitz students use connectors differently than other learners of English.
However, my assumption is that their usage is not very different, since academic
English has to be learnt by all writers alike, regardless of their language background.
Once all five research questions have been answered, an extensive picture
of connecting adverbial usage by non-native writers of English will emerge.
Furthermore, possible areas for future research can be highlighted.
2. Literature Review
In this part of the paper, most of the theoretical concepts and background information
will be provided. There will be seven subchapters, the first six of which will deal
with theory, while the last one will review previous studies in the field of connecting
adverbials. First, connecting adverbials will be covered. Afterwards, the
differences between timed and untimed writing will be explored, and the topic of
student writing and genre will be discussed. In the next part, three of the factors that
will be analyzed in both corpora, namely gender, L1 and prototype, will be
introduced. In the last part of chapter 2, three previous studies (Hůlková, 2011;
Bolton, Nelson & Hung, 2002; Milton & Tsang, 1993) will be reviewed with
regard to my own study.
2.1. Connecting Adverbials
This chapter will cover the theory of connectors, or connecting adverbials. Three
sub-chapters will deal with different aspects related to connectors. First,
the concept of cohesion will be introduced and explained. Afterwards, the topic of
reference will be introduced. It covers the property of reference that lexical items
need to have in order to express the semantic ties described as cohesion. Lastly,
different systems of categorizing connecting adverbials, which differ in both their
definitions and the number of categories, will be reviewed, and my own system of
categorizing connecting adverbials will be introduced on the basis of this review.
2.1.1. Cohesion
Before discussing connecting adverbials, it is necessary to step back and take the
path from a very general perspective of a whole text, to the small units of connecting
adverbials. The most general form of data that can be linguistically analyzed is a text.
In linguistics, a text is characterized as “any passage [of] spoken or written
[language], of whatever length, that does form a unified whole” (Halliday & Hasan,
1976, p. 1). It is important to realize that a text is to be understood as a semantic and
not as a grammatical unit. A text is made up of sentences, but, as Halliday & Hasan
(1976) point out, a text does not consist of sentences but is
realized by them. This means that the sentences encode the semantic unit of a text,
and for this to work, coherence and cohesion are needed. As grammar and lexicon are
used to put the semantic concepts expressed in a text into language, cohesion is
needed to bridge the gap between the semantic concepts and the words and
grammatical structures of a text. This is illustrated in Figure 1 below.
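[Figure placeholder; levels from top to bottom: meaning (the semantic system);
wording (the lexicogrammatical system: grammar, vocabulary);
'sounding'/writing (the phonological system, the orthographic system)]
Figure 1: from Halliday & Hasan (1976)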
The figure shows the concept of a text as a top-down model. At the top is meaning,
which is realized via the semantic system and the semantic concepts in our minds.
Below that is wording, which is the way the semantic concepts are realized using a
language. However, there needs to be something that enables us to express the
semantic concepts coherently in a language, and that is cohesion. It is the concept
that ties together meaning and wording. Cohesion is realized by both lexical items
and grammar; thus, there is lexical cohesion and grammatical cohesion (Halliday &
Hasan, 1976, p. 6). The distinction between these two is not binary but a matter of
degree, since it is impossible to say that a semantic relation is exclusively one or
the other. The distinction is especially difficult when talking about connecting
adverbials, as they almost always have a grammatical and a lexical component. In sum,
cohesion establishes relations between the different semantic
concepts expressed in a text.
2.1.2. Reference
Another important concept to consider is that of reference. The previous chapter has
shown that cohesion is used to logically express semantic concepts. In order to do so,
language items that have the property of reference are necessary. In the following,
the concept of reference will be explained in greater detail.
This concept can be defined quite straightforwardly. Reference expresses a semantic
relation in which the information has to be retrieved from somewhere else (Halliday &
Hasan, 1976, p. 31). This “somewhere else” can be put into two categories:
situational (exophoric) or textual (endophoric). Exophoric means that the meaning
lies in the context of the utterance, as the prefix exo- suggests that it lies outside of the
utterance. Endophoric reference is the direct opposite, as the referent lies
within the text. Here, the terminology can be further distinguished into anaphoric
reference, referring to preceding text, and cataphoric reference, referring to following
text (Halliday & Hasan, 1976, p. 33). The relevant type of reference for this paper is,
as mentioned above, endophoric reference, as connectors are used to establish links
between utterances in a text. The distinction between anaphoric and cataphoric
reference, however, can be ignored for the purpose of this paper. On the one hand,
the distinction is not easy to draw, as reference can be ambiguous when looking at
individual language items, and on the other hand, the classification can hardly be made by
automatic tools. Consequently, it is not suitable for this quantitative analysis.
2.1.3. Categories of Connecting Adverbials
Since this paper aims at investigating a large number of connecting adverbials, it is
necessary to sort them into adequate semantic categories. However, this
categorization is not easy, as different authors have come up with different models.
In this chapter, I review previous attempts to categorize
connecting adverbials and then present the categories used in my analysis.
The first categorization that will be considered is by Biber et al. (1999). They
establish six semantic categories, which are enumeration and addition, summation,
apposition, result/inference, contrast and concession, and transition (Biber et al.,
1999). It is notable that Biber et al. are among the few who use a categorization
with six categories rather than just four, as other researchers, such as Halliday &
Hasan (1976) or Bolton, Nelson & Hung (2002), do. The first category is
enumeration and addition. The main function of connecting adverbials
belonging to this category is either “the enumeration of information in an order
chosen by the speaker/writer [or] […] the addition of items of discourse to one
another” (Biber et al., 1999, p. 875). The semantic concept underlying this category,
adding items in a text according to a sequence or simply to one another, is slightly
ambiguous, since other authors regard enumeration and addition as two separate
concepts. The next category, summation, contains connecting adverbials that “show
that a unit of discourse is intended to conclude or sum up the information in the
preceding discourse” (Biber et al., 1999, p. 876). Appositional connecting adverbials
function as markers that “the second unit [of discourse] is to be taken as equivalent to
or included in the preceding unit” (Biber et al., 1999, p. 876). The following
category is result/inference, which, as the name already suggests, “show[s] that the
second unit of discourse states the result or consequence […] of the preceding
discourse” (Biber et al., 1999, p. 877). The category that follows,
contrast/concession, is the broadest category in the model Biber et al. establish. This
category contains adverbials that “mark incompatibility between information in
different discourse units, or that signal concessive relationships” (Biber et al., 1999,
p. 878). The last category in this model is transition, which “mark[s] the insertion of
an item that does not follow directly from the previous discourse” (Biber et al.,
1999, p. 879).
The next categorization that will be covered is by Halliday & Hasan (1976). They
establish four categories, which are additive, adversative, causal and temporal.
Additive is defined as a derived form of coordination and the ‘and’ relation (Halliday
& Hasan, 1976, p. 244). This means that additive connecting adverbials link two
units of text that belong to the same topic and express similar meaning. The next
category is adversative, which they define as “contrary to expectation” (Halliday &
Hasan, 1976, p. 250). Consequently, adversative connecting adverbials link two units
of text that are related, but the second unit expresses meaning that deviates from the
expected meaning that could be deduced from the first unit of text. The third
category is causal connectors, which express a cohesive relation in which a text unit
logically entails another (Halliday & Hasan, 1976, p. 256). The last category is
temporal connecting adverbials, which express succession, as one unit of text is
“subsequent to the other” (Halliday & Hasan, 1976, p. 261). This includes not only
connecting adverbials that directly relate to a temporal succession, such as ‘then’, but
also all items that express a form of succession, such as ‘first’, ‘second’, ‘third’.
It is, furthermore, notable that Halliday & Hasan also define a number of
subcategories to further distinguish each group. However, this subdivision will not be
considered in this paper because it is not only difficult to assign a connector to a
specific subgroup, but also nearly impossible to implement in a large-scale
quantitative analysis such as the one conducted in this paper.
The classification of connecting adverbials established by Halliday & Hasan is
directly adopted by Bolton, Nelson and Hung (2002). This study is discussed in more
detail in chapter 2.7.2.
The last model for categorizing connecting adverbials that shall be mentioned
here is proposed by Hůlková (2011). Her model comprises, similar to that of Biber et al.
above, six semantic categories. The categories are appositional, listing,
contrastive/concessive, resultive/inferential, summative and transitional (Hůlková,
2011, p. 138), of which the latter four are the same as in Biber et al. above. The only
two categories that are different are appositional and listing, though they directly
correspond to addition and enumeration respectively in Biber et al.’s model. It is,
furthermore, notable that Hůlková also establishes subcategories for some types of
connecting adverbials, but, again, they are not relevant for the present study.
After the review of the semantic categories established in previous research, the
following part will introduce the classification system used in this study. I opted for a
model with four categories, since some of the categories in Biber et al. and Hůlková
contain only very few connecting adverbials. The categories used here are additive,
adversative, causal and sequential, which allow the classification of all possible
connecting adverbials. The first category, additive, contains connectors that express
the semantic concept of linking one unit of text to another by marking the second
unit as being similar to the first (Biber et al., 1999, p. 878). The second category
contains adversative connecting adverbials and is thus more or less the inverse of
the additive category, as it also links two units of text, but marks the second as being
different from the first. The next category is causal connectors, which express a
logical link between two units of text, usually marking the second unit as a result
of the previous unit. Finally, there are sequential connecting adverbials, which
include all items that express a certain type of sequence. This includes enumerations,
listings, as well as temporal expressions, and summations. Note that summative
connecting adverbials are placed within the group of sequential connectors and not
causal ones, as they do not express a logical consequence.
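The four-way classification described above can be thought of as a simple lookup from
connector to category. The following Python sketch is a minimal illustration under
stated assumptions and is not the analysis tool used in this study: the connector lists
are assumed examples chosen for illustration (only 'first' and 'then' appear in the
discussion of temporal connectors above), and the names CATEGORIES and
count_by_category are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: the connector lists are illustrative assumptions,
# not the list of connecting adverbials actually used in this study.
CATEGORIES = {
    "additive": ["furthermore", "moreover", "in addition"],
    "adversative": ["however", "nevertheless", "on the other hand"],
    "causal": ["therefore", "thus", "consequently"],
    "sequential": ["first", "then", "finally", "in conclusion"],
}

def count_by_category(text):
    """Count whole-word, case-insensitive connector matches per category."""
    counts = Counter({category: 0 for category in CATEGORIES})
    lowered = text.lower()
    for category, connectors in CATEGORIES.items():
        for connector in connectors:
            # \b anchors the match at word boundaries, so "then" does not match "they".
            pattern = r"\b" + re.escape(connector) + r"\b"
            counts[category] += len(re.findall(pattern, lowered))
    return counts

sample = ("First, the corpus is introduced. However, it is small. "
          "Therefore, the results are tentative. Moreover, they vary.")
print(dict(count_by_category(sample)))
# {'additive': 1, 'adversative': 1, 'causal': 1, 'sequential': 1}
```

Plain string matching of this kind cannot tell a connecting use of an item such as
'then' or 'first' from its other uses, so counts obtained in this way are likely to
overestimate the number of true connecting adverbials for such items.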
Table 1 below illustrates how the different semantic categories from previous
research correspond to the categories used in this paper. It is notable that the
sequential category corresponds to more than one category from the six-category
models. The reason for this is that the respective categories in the six-category
models are defined more narrowly and thus each contain considerably fewer connecting
adverbials than my sequential category.
Model                                   additive                              adversative             causal                 sequential
Halliday & Hasan; Bolton, Nelson, Hung  additive                              adversative             causal                 temporal
Biber et al.                            enumeration and addition; apposition  contrast/concession     result/inference       transition; summation
Hůlková                                 appositional                          contrastive/concessive  resultive/inferential  listing; summative; transitional
Table 1: Comparison of Semantic Categories of Connecting Adverbials, based on previous Studies
2.2. Timed and Untimed Writing
The influence of timedness on writing is
not very well researched. There are hardly any studies that
investigate the effect of timedness on writing (Gregg, Coleman, Davis & Chalk,
2007). The reason for this is unclear to me: either the topic has simply not been
investigated, which is relatively unlikely, or timedness simply has no effect on the writing.
Moreover, timed and untimed texts are rarely of the same text type, which makes it
hard to compare them. Intuitively, I would assume that untimed writing contains
more connecting adverbials, since the authors have more time to rewrite and edit
their work. It is also possible that untimed writing contains a wider variety of
connecting adverbials than timed writing, since the authors have the opportunity to
consult dictionaries or thesauri. This study will show whether these assumptions can
be confirmed.
2.3. Student Writing and Genre
This section will deal with the concept of genre and will relate this concept to student
writing. First of all, it is important to define the term genre. The term is widely
recognized within the field of literary studies, in which genres such as prose,
poetry, or drama are important. Swales defines the term genre in linguistics as
follows:
A genre comprises a class of communicative events, the members of which share
some set of communicative purposes. These purposes are recognized by the expert
members of the parent discourse community, and thereby constitute the rationale for
the genre. […] In addition to purpose, exemplars of a genre exhibit various patterns of
similarity in terms of structure, style, content and intended audience. If all high
probability expectations are realized, the exemplar will be viewed as prototypical by
the parent discourse community. (Swales, 1990, p. 58)
The first part of the definition deals mostly with the context and generic features of
genres. According to this definition, genres can contain any kind of communicative
event, either written or spoken, that is used to communicate with members of the
discourse community. However, the expert members of the community have to
recognize the purposes of the communication, which usually means that the form of
the communication follows a certain pattern. Texts belonging to a genre have to conform
to a certain norm in order to be recognized as specimens of that particular genre.
Consequently, there have to be “typical linguistic features [that are] frequent and
pervasive” (Biber & Conrad, 2009, p. 16). This means that, similar to genres in
literature, genres are not defined by content but by form. Furthermore, what genre a
text belongs to is not defined by a single authority but in a rather democratic way by
the expert members of the discourse community. These experts are also the ones who
create models to which future texts of the genre can be compared. As a result, it is
possible to say that “genres link users to their discourse community” (Schmied,
2011, p. 5). This also means that genres are closely linked to their discourse
community and may only be recognized as such by members of the respective
community.
In the second part of this chapter, the facts mentioned above will be applied to the
case of student writing. The most important fact about student writing is that texts
produced by students can be seen as part of an apprenticeship to become a recognized
member of the discourse community (Schmied, 2011; Hüttner, 2007). As an
apprenticeship signifies the process of acquiring certain skills, the texts produced by
students will lack some of the linguistic features the expert texts of the genre have.
Furthermore, the communicative purposes and target audiences of student and expert
texts differ considerably. This raises the question whether student genres are just
weaker copies of expert genres or should be seen as genres of their own. Hüttner
(2007) argues that student genres have to be considered as separate genres from
expert genres, since they are not just weak copies, as novice genres would be, but are
separate genres because of their special communicative purposes. She argues that
student genres are almost entirely produced by students and usually have a very
limited audience, such as the corrector of a paper, and the communicative purpose is
also very different from expert texts, as student texts usually aim at displaying the
progress of learning in a certain discipline (Hüttner, 2007, pp. 58-62). Consequently,
the ways and strategies utilized in the papers will differ, and because of this the
linguistic features will also differ. However, student genres still belong to the greater
complex of academic discourse. In accordance with Hüttner’s argumentation, it may be
more useful to compare texts from student genres amongst themselves instead of
comparing them to expert genres, particularly with the different cultural backgrounds
and native languages of students in mind (Schmied, 2011, p. 14).
To conclude this chapter, it can be argued that student genres have to be
considered on their own, as their communicative purpose is unique and they differ
significantly from expert genres. However, it must not be forgotten that the student
genres are sub-genres within the field of academic research and mark stages in the
apprenticeship towards the goal of producing texts belonging to the expert genres.
2.4. Gender
There has been a lot of research in the field of gender and its impact on language
since the emergence of feminism and feminist linguistics. These studies mainly focus
on gender as a sociological or socio-cultural phenomenon (Kortmann, 2005, p. 277).
All these studies have shown that there are considerable differences in the language
of men and women, particularly in spoken language. For written language, novels in
particular, Livia (2003) points out that female authors use more cohesive devices
than male authors. For academic writing, gender seems to have no influence on the
use of connecting adverbials (Hůlková, 2011). Since these two findings, albeit about
different genres, are contradictory, it will be interesting to see what this study reveals
about the use of connecting adverbials by gender.
2.5. L1
The influence of the mother tongue on academic writing in a foreign language is a
hotly debated topic with two opposing points of view. On the one side, there are
those researchers who suggest that the L1 has a considerable influence on
academic writing in the L2. On the other side are those who claim that the influence of
the L1 is either negligible or cannot be the only factor explaining the usage
patterns of learner writing (Gilquin & Paquot, 2008, p. 54). These other possible
factors include education or the discipline in which the papers are written.
Those who deem the influence of the L1 to be of minor importance base their
claim on the universality hypothesis, which “implies that the methods
and concepts of a science form a secondary cultural system” (Martin-Martin, 2005, p.
193). Martin-Martin quotes Widdowson, according to whom scientific discourse
is “basically independent of its realization in a particular language” (Martin-Martin,
2005, p. 193). Furthermore, he cites studies that have found that
educational experience influences the academic writing of L2 authors more than
interference from the L1 (Martin-Martin, 2005, p. 196). In addition, he argues that
the writing conventions of certain genres or disciplines are more influential than
the cultural background of the writers (Martin-Martin, 2005, p. 200). In conclusion,
Martin-Martin states:
[T]here are certain aspects of academic discourse which are more amenable to the
restrictions of the writing conventions in a specific discipline and in a specific genre,
and that this would tend to be universal, whereas there may be other aspects that are
governed by socio-cultural factors, which are therefore culture-specific. (Martin-
Martin, 2005, p. 200)
Thus, it is difficult to say whether the use of connecting adverbials is influenced
mostly by genre, discipline or L1 interference. Since coherence and
cohesion are very prominent features of academic discourse, I assume that there
might be some variation in the use of connecting adverbials between the different L1
groups. However, it would be beyond the scope of this study to investigate whether
the variation is due to L1 interference or the educational background of the writers.
2.6. Prototype
The concept of the prototype originates in the field of cognitive semantics and is
related to the categorization of things. Prototype semantics generally assumes that
the boundaries of semantic categories are “much more flexible and fuzzy than is
suggested by traditional componential semantics” (Kortmann, 2005, p. 209). Due to
this fuzziness, there are “central or typical members of a category” (Saeed, 2009, p.
37), which are the best representation of the underlying cognitive concept. For this
study, the statistics will show which connecting adverbial in each of the four
functional categories occurs most frequently and can thus be assumed to be the most
central, or prototypical, member of that category, and how it compares to the other
members of the category.
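The idea of treating the most frequent member of a category as its prototype can be made concrete with a short sketch. It is written in Python rather than the Perl of the actual analysis tool, and it is an illustration only: the function name, the category assignments and the frequencies are made-up placeholders, not results of this study.

```python
def most_frequent_per_category(frequencies, categories):
    """Return, for each category, the member with the highest frequency.

    frequencies: dict mapping connector -> normalized frequency
    categories:  dict mapping connector -> functional category
    """
    best = {}
    for connector, freq in frequencies.items():
        category = categories[connector]
        # Keep the connector if it is the first or the most frequent so far.
        if category not in best or freq > frequencies[best[category]]:
            best[category] = connector
    return best


# Hypothetical toy data, for illustration only.
toy_categories = {
    "also": "additive", "moreover": "additive",
    "but": "adversative", "however": "adversative",
}
toy_frequencies = {"also": 120.0, "moreover": 15.0, "but": 90.0, "however": 60.0}

prototypes = most_frequent_per_category(toy_frequencies, toy_categories)
```

With real data, the remaining members of each category could then be ranked against the selected connector to see how far they lie from the center of the category.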
2.7. Previous Studies on the Topic
2.7.1. Hůlková (2011)
In 2011, Hůlková conducted a study, similar to this one, investigating possible
differences in men’s and women’s use of conjunctive adverbials, as she calls them.
Hůlková bases her work on a corpus of 50 research articles from five different
disciplines, with a total of 350,000 words. First, she presents some theoretical
considerations regarding the register of academic prose, which she considers to be
“explicit, unambiguous and logical” (Hůlková, 2011), and the sub-register of
research articles, which is, according to her, mostly defined by its structures. At this
point, it is notable that Hůlková uses the term ‘register’ to describe academic prose
and research articles instead of the term ‘genre’. Concerning terminology, it is also
evident that she uses the term “conjunctive adverbials” to describe what I call
connecting adverbials. Hůlková analyzes these conjunctive elements in terms of
frequency not only in total, but also by five different academic disciplines and by
men versus women. Moreover, she divides the connectors into six semantic
categories and analyzes the frequency for each category in the same way as she did
for the whole corpus. At merely 350,000 words, the corpus Hůlková uses for her
analysis is comparatively small. In contrast, she uses a long list of 90
conjunctive adverbials for her analysis. Her findings show that, first and foremost,
there are certain connectors that are generally used more often than the others.
Taking into account the different distinctions she made within her corpus, Hůlková
concludes that gender does not have any influence on the use of connectors. There
might be some minor differences, though they can be explained as being
idiosyncratic. The division by academic fields shows notable differences, which
Hůlková attributes to the different needs and goals the authors in the respective fields
have.
The most important finding of her study for my own research is the fact that
gender has no influence on the use of connectors. The results Hůlková presents are
very clear in this respect. Nevertheless, I included gender to see whether her finding
can be confirmed.
2.7.2. Bolton, Nelson and Hung (2002)
Another study was conducted in 2002 by Bolton, Nelson and Hung, who
investigated the underuse and overuse of connectors in the writing of Chinese
students of English. They use the Hong Kong and the Great Britain components of
the International Corpus of English for their analysis. In their paper, they establish
four categories of cohesive devices: additive, adversative, causal and sequential
(Bolton, Nelson, & Hung, 2002). To investigate the over- and underuse of
connectors in Hong Kong undergraduate students’ writing, they compiled a corpus
of ten untimed essays and ten timed examination scripts, with a total of 46,460
words (Bolton, Nelson, & Hung, 2002). Of particular interest for my own work is
their categorization of the connectors, which I used as a basis to develop my own
word list. It is furthermore notable that, even though Bolton, Nelson and Hung
include both timed and untimed writing, they do not go into detail about the effects
of timedness on writing. Additionally, they use the British component of the
International Corpus of English (ICE-GB) as a reference corpus to investigate the
over- or underuse of connectors. This methodology was adapted for the current
paper. However, I decided to use the International Corpus of Learner English (ICLE)
instead of the ICE corpus, since I do not want to investigate under- or overuse
compared to native speakers, but rather the general usage of connectors in student
writing and the effect of the writing being timed or untimed. Their study concludes
that connectors are generally overused in student writing, both by native speakers
and by learners of English.
2.7.3. Milton and Tsang (1993)
The last study I want to mention in this section was conducted by Milton and Tsang
in 1993. They investigated the usage of logical connectors in non-native students’
writing with a special focus on giving directions for future research. In their paper,
Milton and Tsang start by outlining the necessary background information on the
usage of connectors and the electronically aided study of student writing. The corpus
used in this study contains about four million words, split into about 2,000
assignments written by first-year undergraduates and about 200 scripts from the
Hong Kong Examinations Authority’s ‘A’ level Use of English examination
(Milton & Tsang, 1993). To compare their data and their findings, Milton and Tsang
chose three native speaker corpora (Brown Corpus, LOB Corpus, HKUST Corpus).
The results of the study show that there is generally considerable variation in which
connectors are used in the different corpora. Only the three most frequent connectors
are the same in all four corpora, namely “and, also and because” (Milton & Tsang,
1993). Milton and Tsang then focus on the English as a Foreign Language (EFL)
aspect of their work, particularly on the over-, under- and misuse of some connectors.
They also investigate the origins of these usage issues and, from this, deduce
implications for future teaching. They conclude that these errors in students’ writing
originate in teaching and that, consequently, teaching methods and materials have to
be adapted to cater for the special needs of learners of English in an academic
context.
3. Analysis
3.1. Methodology
In the following, I want to take a close look at the methodology implemented in this
study. Broadly speaking, the study is a quantitative analysis of two corpora.
The corpus that is the main subject of the analysis is the ChemCorpus, which has
been compiled at the Chemnitz University of Technology. To provide a point of
comparison for the results, the International Corpus of Learner English (ICLE) will be
used as a monitor corpus. Furthermore, the corpora will each be divided into two
sub-corpora according to the timedness of the texts. However, for some of the ICLE
texts, the required metadata of whether a text is timed or untimed is not available.
Thus, only the texts for which the information is available will be part of the
analysis.
After the selection of the data, the actual analysis will be carried out by a tool that
I developed specifically for this task. It is a Perl script that has two major tasks. The
first is filtering unwanted text, such as the table of contents or the references, from
the files. This applies only to the ChemCorpus data, which has been tagged in this
respect. The ICLE data, on the other hand, contains no such tags, since the texts do
not contain these elements, and the script will simply not perform these operations.
The second main task the tool will carry out is counting the occurrences of all tokens
that are to be researched and counting the total number of words.
In the third step, the data gathered by the analysis script will be stored in a
Microsoft Excel table, which will then be used to calculate comparable, normalized
frequencies. The data will also be aggregated into the four functional categories.
Finally, the results of this step will be used to create graphs that visualize the results.
3.2. Corpus
In this section, the corpus that is subject to my analysis, the ChemCorpus¹, which
was compiled, as the name suggests, at the Chemnitz University of Technology, will
be described in detail. For this paper, not the whole corpus will be used but a subset
of texts containing two basic text types: written magister theses (MagTheses) and
written magister exams (MagWritten). The corpus has been compiled from the year
2001 onwards, with the most recent texts used for the analysis in this paper dating to
2011. At present, the complete corpus contains 1,709,983 words and consists of four
text types, i.e. written magister exams, magister theses, bachelor theses and master
theses. The sub-corpus used in this paper consists of 1,059,263 words and contains
two text types, as mentioned above. Concerning the magister theses, there is a
further subdivision by subject area into Culture and Literature (MagCultLit) and
Linguistics (MagLing). Table 2 below shows the distribution of texts by category.
The texts in the MagTheses category are final papers written by students at TU
Chemnitz, and thus they represent the untimed category. In contrast, the texts in the
MagWritten category were produced during the final written exam at TU Chemnitz,
and consequently they represent the timed category.
                        MagTheses                 MagWritten
                        MagCultLit    MagLing
Number of texts         10            23          52
Total number of texts   85
Table 2: Number of Texts

                        MagTheses     MagWritten
Total number of words   922,343       136,920
Table 3: Number of Words
Table 2 reveals a minor problem with the data set. The MagTheses texts are very
unevenly distributed across the subcategories: there are more than twice as many
texts in the MagLing category as in the MagCultLit category. Similarly, the number
of texts in the MagWritten category is much higher than in the MagTheses category.
This skew in the number of texts is completely reversed when looking at the number
of words in the respective categories (Table 3). Here, the number of words in
MagTheses is almost seven times higher than in MagWritten, despite the fact that the
category MagWritten contains a considerably higher number of texts.
¹ http://www.tu-chemnitz.de/phil/english/ling/chemcorpus.php
The combination of the two skewed figures above allows the conclusion that the
corpus is not ideally stratified, although there is a sufficient amount of data for both
categories. Furthermore, it is essential to note that the disproportion in the figures
originates in the different lengths of the texts in the categories. The texts in the
written exams section contain on average 1,987 words, with a standard deviation of
593 words, whereas the texts in the theses section contain 29,379 words on average,
with a standard deviation of 9,990 words. This may be an issue, since the untimed
part of the corpus is several times as large as the timed component. However, the
analysis will have to show whether this has an impact on the results.
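As an illustration of how such length statistics can be obtained, the following Python sketch computes the mean and the standard deviation of a list of word counts. It is not the procedure used for this paper, which relied on a spreadsheet. In particular, it is an assumption that the sample standard deviation is the appropriate variant, and the word counts are invented.

```python
import statistics


def length_summary(word_counts):
    """Return (mean, sample standard deviation) of a list of text lengths."""
    mean = statistics.mean(word_counts)
    # statistics.stdev computes the sample standard deviation (n - 1 in the
    # denominator); statistics.pstdev would give the population variant.
    spread = statistics.stdev(word_counts)
    return mean, spread


# Hypothetical word counts of four texts, not taken from the ChemCorpus.
toy_lengths = [1500, 2000, 2500, 2000]
mean_length, sd_length = length_summary(toy_lengths)
```

Applied to the word counts of one sub-corpus, the two returned values correspond to the kind of figures reported above for the timed and untimed texts.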
3.3. Monitor Corpus
For this paper, the International Corpus of Learner English (ICLE) is used as a
monitor corpus:
The International Corpus of Learner English contains argumentative essays written by
higher intermediate to advanced learners of English from several mother tongue
backgrounds (Bulgarian, Chinese, Czech, Dutch, Finnish, French, German, Italian,
Japanese, Norwegian, Polish, Russian, Spanish, Swedish, Tswana, Turkish). (UCL -
ICLE, 2012)
However, before the corpus could be used for the analysis, an issue with the data had
to be resolved. The data was not available in a format that could be processed in the
same way as the ChemCorpus. The data files were all stored in a single directory with
a unique ID as file name. To access them, there was only one special piece of
software available, which allows the user to filter for certain sociolinguistic criteria.
The only downside of this tool is that it cannot export the files that match the search
criteria. This functionality would, however, have been vital to carry out further
analysis using the tool used to analyze the ChemCorpus. The documentation of the
ICLE tool states that the metadata is stored in a certain database file, which is located
in the data folder. This led to a further problem, since the database file was stored
with a special file extension, ‘.ICLE’. This problem could be resolved by taking a
look at the file header using a hex editor (see Figure 2 below), which revealed that
the file was simply a renamed Microsoft Jet 4.0 database² and could consequently be
easily accessed using Microsoft Access.
Figure 2: ICLE Metadata Database File Header
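The manual inspection with a hex editor could also be automated. The following Python sketch is an illustration only and was not part of the original workflow. It assumes that Jet database files carry the ASCII signature "Standard Jet DB" starting at byte offset 4; it does not distinguish between Jet versions, and the file name used in the demonstration is hypothetical.

```python
import os
import tempfile

JET_SIGNATURE = b"Standard Jet DB"  # signature assumed at byte offset 4


def looks_like_jet_database(path):
    """Return True if the file header carries the Jet database signature."""
    with open(path, "rb") as handle:
        header = handle.read(4 + len(JET_SIGNATURE))
    return header[4:] == JET_SIGNATURE


# Self-contained demonstration with a fabricated header, not the real file.
with tempfile.TemporaryDirectory() as tmp:
    fake_db = os.path.join(tmp, "metadata.ICLE")  # hypothetical file name
    with open(fake_db, "wb") as handle:
        handle.write(b"\x00\x01\x00\x00" + JET_SIGNATURE + b"\x00" * 8)
    fake_txt = os.path.join(tmp, "essay.txt")
    with open(fake_txt, "wb") as handle:
        handle.write(b"This is plain text, not a database.")
    db_detected = looks_like_jet_database(fake_db)
    txt_detected = looks_like_jet_database(fake_txt)
```

A check of this kind only confirms the container format; the table layout inside the database still has to be explored with a database client.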
This made it possible to create two lists containing all the filenames of timed and
untimed pieces of writing. These lists were then fed into a script (see Appendix A.2.)
that was developed to copy the respective files to a new directory, thus creating two
sub-corpora of timed and untimed writing. Afterwards, it was possible to use the
same analysis script that was used for the ChemCorpus. It is, furthermore, important
to note that there is a considerable number of texts that cannot be classified as either
timed or untimed, since the corresponding metadata field states “Unknown”. These
texts have been ignored in the analysis in this paper, as it would be impossible to
integrate them into the established framework of the research.
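The sorting step described above can be sketched as follows. The sketch is written in Python rather than the Perl of the actual tool, and it is not the script from the appendix. The function name, the directory names and the file names are hypothetical, and one file name per line is assumed as the list format.

```python
import os
import shutil
import tempfile


def copy_listed_files(list_file, source_dir, target_dir):
    """Copy every file named in list_file from source_dir to target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    copied = []
    with open(list_file, encoding="utf-8") as handle:
        for line in handle:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the list
            source = os.path.join(source_dir, name)
            if os.path.isfile(source):
                shutil.copy(source, target_dir)
                copied.append(name)
    return copied


# Demonstration with temporary, made-up files.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    for name in ("A001.txt", "A002.txt", "A003.txt"):
        with open(os.path.join(data_dir, name), "w", encoding="utf-8") as f:
            f.write("sample text")
    timed_list = os.path.join(tmp, "timed.txt")
    with open(timed_list, "w", encoding="utf-8") as f:
        f.write("A001.txt\nA003.txt\n")
    timed_dir = os.path.join(tmp, "timed")
    copied_names = copy_listed_files(timed_list, data_dir, timed_dir)
    timed_contents = sorted(os.listdir(timed_dir))
```

Running the function once with the list of timed texts and once with the list of untimed texts yields the two sub-corpora; files that appear in neither list are simply left where they are.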
In the following, a few statistics concerning the data used from the ICLE will be
presented. As mentioned above, the corpus has been split into two sub-corpora of
timed and untimed writing. The former contains 639,673 words and the latter
1,684,555 words, totaling 2,324,228 words. The untimed texts are on
average 700 words long, with a standard deviation of 295 words, whereas the timed
texts average 619 words with a standard deviation of 302 words. There is not much
difference in length and deviation between the two categories, which contrasts
markedly with the ChemCorpus, where timed and untimed texts differ in length by
more than a factor of ten. More on this in section 3.4 below.
² http://support.microsoft.com/kb/275561/en
3.4. Corpus Compatibility and Issues
When comparing two corpora, the question of data compatibility arises. Attention
has to be paid to the stratification of the corpora as well as to their comparability.
When analyzing the two corpora used in this paper, some issues arose concerning
the number of texts and the word count. While the corpus that is primarily
researched contains roughly 1 million words, the monitor corpus contains
approximately 2.3 million words, more than twice the size of the primary corpus.
However, this issue does not carry much weight in terms of comparability, since the
absolute values can easily be transformed into normalized, relative values for
comparing the two corpora. When taking a closer look at the texts comprising the
corpora, however, another issue becomes apparent.
                                ICLE      ICLE    ChemCorpus   ChemCorpus
                                untimed   timed   timed        untimed
Average text length in words    700       619     1987         29379
Standard deviation              295       302     593          9990
Table 4: Average Text Length of Corpora
Table 4 shows that the texts in the timed components of the ChemCorpus and the
ICLE differ in length by a factor of more than three. The differences in text length
are even more extreme for the untimed components, where the texts differ by a
factor of more than 40. In general, the ICLE is stratified in terms of text length,
whereas the ChemCorpus is not. This might seem like an issue. However, the two
corpora will not be compared as a whole; instead, their timed and untimed
components will be compared. It is furthermore important to keep in mind that the
ChemCorpus might consist of longer texts, but the ICLE still has more texts and
overall more words. Consequently, the text length might have an impact on the
stratification of the corpus, but is not relevant for the comparison.
3.4.1. Text Type
With regard to the texts that comprise the corpora, the different text types are also
important. While the ChemCorpus contains exam essays and theses, the ICLE
contains argumentative essays. The comparability of these text types, however, is
still subject to discussion. They differ not only in length, as seen above, but also in
the communicative strategies utilized. While the argumentative essay usually tries to
convince the reader of something by presenting a structured argumentation, the
thesis presents academic work that has been carried out. Due to the different
communicative purposes, the linguistic strategies to achieve these purposes will
differ as well. Consequently, it is necessary to compare these two text types to assess
whether they are comparable or not. Theses generally follow the Introduction –
Method – Results – Discussion (IMRD) structure of research articles (Samraj, 2008,
p. 57), although Samraj shows that, depending on the department, different
strategies are sometimes utilized. As a result, it can be expected that students’ theses
are written in a style similar to that of research articles. This implies a use of
connecting adverbials similar to research articles.
The argumentative essay, however, has a different purpose, i.e. “to persuade the
reader of the correctness of a central statement” (Hyland, 1990, p. 68). Consequently,
the structure is different too, as Figure 3 below shows. The text is structured in three
stages. Firstly, there is the thesis, which is accompanied by an attention grabber and
background information on the topic, as well as a short evaluation, which briefly
supports the thesis. Secondly, the main argumentation follows, which presents a
number of claims that are supported by evidence. Lastly, the conclusion, which
rounds off the argumentation and reaffirms the thesis, completes the essay. As a
result of this structure, the connecting adverbials used will differ too. Since it is the
purpose of an argumentative essay to persuade the reader, the connecting adverbials
used to create cohesion will be mostly additive and causal. Adversative connecting
adverbials will most likely be used less frequently, since they would not match the
communicative purpose of relating arguments to each other.
The structure of an argumentative essay is, according to Hyland (1990), thesis –
argument – conclusion. Each stage is furthermore divided into up to five moves (cf.
Figure 3). The thesis, which introduces the proposition that the essay will argue for,
also includes the general introduction to the topic, which is often realized by an
attention grabber, some background information, and a brief support of the thesis.
Afterwards, the main part of the essay contains the actual argumentation, which, after
an introduction, makes a claim that is then supported by explicit assumptions or data
and citations. Notably, there is no limit to the number of arguments that can be
presented. The final part of the argumentative essay is the conclusion, which wraps up
the argumentation and shows that the hypothesis has been proven right or wrong.
Figure 3: Argumentative Essay (Hyland (1990), p. 69)
After reviewing the structure of theses and argumentative essays, it can be seen that
there are numerous differences. First of all, a thesis usually follows the IMRD
structure of research articles, while the argumentative essay has a thesis – argument –
conclusion structure. However, the difference is not only the four-part structure in
contrast to a three-part structure. Apart from the introduction, there are no
similarities in the communicative purpose of the different parts. There is neither a
methodology description, nor a presentation of results or a discussion of findings in
an argumentative essay. However, there might be some similarities, as a thesis also
needs to present some arguments, for example to justify the choice of one
methodological approach over another.
In conclusion, the issue of text type compatibility potentially has a big impact on
the results. The two text types are fundamentally different in structure and
communicative purpose. Thus, the results of the analysis have to be examined very
carefully and critically, since the effect of the different text types in the two corpora
cannot be reliably predicted.
3.4.2. Gender
Another issue is the stratification of the corpora. As Schmied (2011) points out, there
is a severe lack of male students within the field of English Language Studies at
universities. This problem is mirrored in the data used for this study. The
ChemCorpus contains only nine texts written by male students out of a total of 85
texts, with 15 texts for which there is no information concerning gender. For the ICLE
corpus there are 2833 texts written by female students, 601 texts written by male
students and nine texts without available gender information. These numbers show a
clear skew towards texts written by female students, which means that the corpus is
not well stratified in terms of gender. While this is an issue affecting the stratification, it
is rather unlikely to have any influence on the results, since Hůlková (2011) has
shown that gender has no significant influence on the usage of connecting adverbials
and that possible minor differences in the statistics are merely idiosyncratic
(Hůlková, 2011, p. 137).
3.4.3. Department
Furthermore, the ICLE does not contain information regarding the department to
which the students who produced the texts in the corpus belong. The ChemCorpus,
on the other hand, contains this information, since there is a clear separation between
Linguistics texts and Culture and Literature texts. But as pointed out in section 3.2.,
there are issues with that as well, since the number of Linguistics papers is
considerably higher. Furthermore, the information regarding the department is only
available for untimed texts. Although there might be a distinction between the two
departments in the ChemCorpus, there is no evidence that there will be much
difference in the usage of connecting adverbials, since both departments belong to
the field of humanities. There are studies that have researched the impact of
department on the usage of connectors. However, the departments researched belong
to quite different academic fields, such as politics, psychology or management
(Hůlková, 2011). Thus, it is debatable whether department has an influence,
particularly if the departments belong to the same academic field. Since the
department data is not available for the ICLE corpus, I decided not to investigate the
department variable, as there would be no grounds for comparison.
3.4.4. Other
Finally, the age of the corpora differs. The ICLE was released in 2002 and, therefore,
does not contain newer texts. The ChemCorpus, on the other hand, contains texts
from the years 2001 to 2012, which makes the corpus very up-to-date. As both
corpora have been compiled in the same decade, I would not expect age to have any
considerable influence, since it takes time to incorporate changes in teaching
academic writing into the curricula of universities.
All in all, the two corpora at hand differ in some ways, but the skewness in the
structure of the data seems to be similar. This fact, and the relatively large amount of
data, totaling approximately 1 million and 2.3 million words for the ChemCorpus
and the ICLE respectively, still make the corpora reasonable databases for analysis.
3.5. Analysis Tool
The first step was to convert all data files into a consistent format. The data were
available in Microsoft Office format, which has the drawback of being hard to
process outside of Microsoft Office products, due to the closed-source design of the
file format. Consequently, all the files had to be converted to plain text in order to be
accessible for further analysis with external tools. Since the documents at hand were
not using any of the markup features that MS Office file formats offer, no information
would get lost when converting the files to plain text files, as the conversion process
removes all markup features. Since the number of files to convert was rather high and
converting all the files manually would have been too time-consuming, I decided to
use an open-source tool named AbiWord (The AbiSource community, 2012), which
is a word processor that allows the use of certain functions via the command prompt,
making automated conversion fairly easy. Ultimately, the following command
accomplished the task by looking for all MS Word files in a given directory and
converting them to plain text files.
for /f %a in ('dir /b *.doc') do call "C:\Program Files (x86)\AbiWord\bin\AbiWord.exe" --to=txt %a
The for loop iterates through all files in the directory and calls AbiWord with the
option to convert a file into plain text for every Microsoft Word file with the .doc file
extension.
Once all the files were available in plain text, the analysis could be carried out. At
that point, another issue emerged. The default procedure would have been to analyze
the files with AntConc (Anthony, 2011) in order to calculate the frequency of certain
words in the text. Considering that I wanted to analyze a rather large number of
about 55 words, this would have been a very long and repetitive task. Consequently,
I chose to develop a tool that would automatically perform the analysis. In the
following, I want to outline what the script does and how it works. For the full source
code, please see the appendix.
The first step was deciding on a programming language. I chose Perl because it
has a very powerful regular expression implementation, which is quite useful for text
processing, and because there are a lot of modules providing additional functionality
available via CPAN³, which is a repository for Perl extension modules. As a result,
the source code is rather short, despite the different tasks the script accomplishes.
The script itself has four main functions:
1. load a wordlist
2. filter out unwanted text
3. calculate frequencies
4. save results in a MS Excel readable file format
The next step was to decide on an input and output file format. Ideally, both should
be the same, MS Excel should be able to read and write the format, and the
implementation should not need more code than the actual script itself. Taking all
these conditions into account, I decided to use comma-separated value (CSV) files, as
they are easy to edit in MS Excel (or any other spreadsheet application or even a text
editor) and there was a Perl module, Text::CSV_XS (Brand, 2012), providing all the
necessary functions. For a detailed description of CSV files, refer to RFC 4180
(Shafranovich, 2005), which describes the MIME type text/csv. Since handling CSV
files using this module is quite convenient, I decided to use the file format for the
output of the tool as well, given that Microsoft Excel reads the files without a
problem.
³ http://cpan.org/
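To illustrate why one delimited format is convenient for both input and output, the following sketch reads a word list and writes a result table with the same settings. It uses Python's csv module instead of the Perl module Text::CSV_XS used by the actual tool. The semicolon as separator, the two-column layout of the word list and the function names are assumptions made for this example only.

```python
import csv
import io

DELIMITER = ";"  # assumption: the separator actually used is not stated here


def read_word_list(handle):
    """Read (connector, category) pairs from a delimited file handle."""
    reader = csv.reader(handle, delimiter=DELIMITER)
    return [(row[0], row[1]) for row in reader if row]


def write_results(handle, rows):
    """Write (connector, count) rows using the same delimiter."""
    writer = csv.writer(handle, delimiter=DELIMITER, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical word list; the real list contains about 55 items.
word_list = read_word_list(
    io.StringIO("however;adversative\non the other hand;adversative\n")
)
output = io.StringIO()
write_results(output, [("however", 12), ("on the other hand", 3)])
csv_text = output.getvalue()
```

Because reader and writer share one delimiter setting, a file produced by the tool can be opened, edited and fed back in without any conversion step.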
Now, I want to take a detailed look at the source code of the tool and will describe
how it works. The first few lines of code load all the required modules and set up all
global variables the tool needs. Then the function LoadData() is called, which loads
the CSV file containing the linguistic variables that are going to be analyzed. It,
furthermore, initializes the array that will later contain the statistics. The next step,
which actually is two steps, is done by the line
find(\&Count, $data_path);
This line uses the File::Find module, which is part of the standard Perl distribution,
to recursively go through all the files in the directory given in the second argument,
executing the function given in the first argument for every file in the directory. Note
that it is important to provide a reference to the function and not to call the function
directly. The function Count() opens the current file and filters out all passages that
are not text produced by the student. I decided to remove everything that does not
consist of full sentences, namely quotations, headings, figures, tables, the table of
contents, the reference section and the appendices. Since the documents had already
been tagged using (X)HTML/XML-style tags, filtering these can be accomplished
using regular expressions.
s/<tag.*?<\/tag>//sg
This regular expression performs a global search for a string that starts with “<tag”
and ends with “</tag>”, with any number of characters (possibly zero) in between,
and replaces that string with an empty string, consequently deleting the given
passage. The s modifier lets the dot match line breaks, so that passages spanning
several lines are removed as well. Secondly, the function iterates through the array
whose elements are the words previously loaded from the CSV file. For each element,
the script counts the occurrences of the element in the current text. The third function
of this subroutine is to count the total number of words in all the documents that are
analyzed. After the completion of this subroutine, the program goes back to its main
part, and the function SaveToCSV() is called. This function compiles an array that
represents a row in the CSV file containing the word, the corresponding frequency,
and the relative frequency in all of the analyzed texts. These rows are then written to
the output file. Note that it is important to specify the separation character when
constructing a new Text::CSV_XS object, so that Microsoft Excel can handle the
output file correctly. Lastly, the tool prints status information, namely the file it is
processing at the moment.
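The three tasks of the counting subroutine can be summarized in a compact sketch. It is written in Python rather than Perl and is not the source code from the appendix. The tag name quote is a hypothetical stand-in for the actual tags, and both the case-insensitive whole-word matching and the whitespace-based word count are assumptions of this example.

```python
import re

# Hypothetical tag name; the text above only shows the generic pattern.
TAG_PATTERN = re.compile(r"<quote.*?</quote>", re.DOTALL)


def strip_tagged(text):
    """Remove every tagged passage, including the tags themselves."""
    return TAG_PATTERN.sub("", text)


def count_connectors(text, connectors):
    """Count whole-word, case-insensitive occurrences of each connector."""
    counts = {}
    for connector in connectors:
        pattern = re.compile(r"\b" + re.escape(connector) + r"\b", re.IGNORECASE)
        counts[connector] = len(pattern.findall(text))
    return counts


def count_words(text):
    """Count whitespace-separated tokens."""
    return len(text.split())


sample = "However, this works. <quote>However, ignore\nthis.</quote> It works, however."
clean = strip_tagged(sample)
counts = count_connectors(clean, ["however", "on the other hand"])
total = count_words(clean)
# Relative frequency: occurrences divided by the total number of words.
relative = {word: n / total for word, n in counts.items()}
```

The re.DOTALL flag plays the role of the s modifier in the Perl expression, and the non-greedy quantifier ensures that each tagged passage is removed separately instead of everything between the first opening and the last closing tag.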
3.6. Statistics
In this chapter, I want to provide a detailed description of the statistical data my
analysis produced. The complete tables that are described in this section can be
found in the appendices. The analysis tool produced tables with the absolute
occurrences of the tokens in an MS Excel readable file format. The rows of the tables
contain the tokens and the columns represent the corpus files. Additionally, the
number of words of the individual files is printed in the last row. On this basis, the
next step in the analysis was to transpose the table, making further operations easier.
Afterwards, the values were normalized per 1 million words. Finally, the data was
aggregated into a table with the four functional categories and the metadata. This
aggregated table was then used to create the figures to visualize the findings. This
last step is especially important, since the raw numbers are hard to read and do not
offer much insight, whereas plotting them might reveal more. However, upon doing
so, I noticed one major issue: due to the high number of occurrences of ‘and’, the
graphs for the other connecting adverbials were hardly distinguishable. Thus, I
decided to ignore the values for ‘and’ in the graphs that show data that was not
aggregated. Furthermore, plotting all the connecting adverbials in one figure is only
useful to a limited extent, as it only reveals extremely high usage levels, while hardly
any tendencies are visible. Afterwards, the timed – untimed categories were plotted
and examined, followed by the four functional categories. Plotting the connecting
adverbials according to their functional categories yielded very clear results, showing
a distinctly skewed usage pattern. The next section shows the statistics concerning
gender, L1 and prototype. In the following, these findings will be thoroughly
discussed.
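The normalization and aggregation steps, which were carried out in a spreadsheet for this paper, can be expressed in a few lines. The Python sketch below is an illustration only; the function names, the counts, the category assignments and the corpus size are invented for the example.

```python
def per_million(count, total_words):
    """Normalize an absolute count to occurrences per one million words."""
    return count * 1_000_000 / total_words


def aggregate_by_category(normalized, categories):
    """Sum normalized frequencies within each functional category."""
    totals = {}
    for connector, value in normalized.items():
        category = categories[connector]
        totals[category] = totals.get(category, 0.0) + value
    return totals


# Hypothetical counts in a sub-corpus of 200,000 words.
toy_counts = {"also": 400, "moreover": 20, "but": 300, "therefore": 60}
toy_categories = {
    "also": "additive", "moreover": "additive",
    "but": "adversative", "therefore": "causal",
}
toy_total_words = 200_000

normalized = {w: per_million(c, toy_total_words) for w, c in toy_counts.items()}
by_category = aggregate_by_category(normalized, toy_categories)
```

Normalizing before aggregating makes sub-corpora of very different sizes directly comparable, which is the reason for reporting all figures per one million words.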
Figure 4: ChemCorpus Connecting Adverbials per One Million Words
Figure 5: ICLE Connecting Adverbials per One Million Words
Figures 4 and 5 above show the normalized distribution of all connecting
adverbials in the two corpora. These two figures allow the identification of the most
frequent connectors in the corpora. For the ChemCorpus, the most frequent
connecting adverbials are ‘also’, ‘and’ and ‘but’. For the ICLE, on the other hand,
there are four connectors that occur most frequently, which are ‘also’, ‘and’, ‘but’
and ‘because’. Apart from the connecting adverbials with very high frequencies, I
also want to point out that some connectors do not occur at all. ‘On account of this’
and ‘whence’ do not appear in either corpus, whereas ‘incidentally’, ‘on this basis’,
‘anyhow’ and ‘at last’ do not occur in the ChemCorpus. When comparing the two
figures with one another, it can be seen that the ICLE seems to have more connecting
adverbials overall than the ChemCorpus; however, this needs further investigation.
The next aspect of the corpus that has been analyzed is timedness. Figure 6 shows
the average total of connectors according to timed – untimed writing in the two
corpora side by side.
Figure 6: Comparison timed – untimed in ChemCorpus and ICLE
The figure shows that the numbers are very close together for the two branches
within each corpus, as well as across the two corpora, averaging around
20,000 connectors per one million words, which equals two percent of the words. It
is furthermore curious that there are more connecting adverbials in the timed section
of the ChemCorpus, whereas in the ICLE there are more connecting adverbials in the
untimed section.
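The arithmetic behind these statements can be retraced with a short Python sketch. It takes the four values shown in Figure 6 (connectors per one million words) and converts them into a percentage of all words and into the relative difference between the timed and untimed branches. The function and variable names are illustrative assumptions and are not taken from the analysis tool.

```python
# Connectors per one million words, as shown in Figure 6.
PER_MILLION = {
    "ChemCorpus": {"timed": 20447, "untimed": 19848},
    "ICLE": {"timed": 21324, "untimed": 21841},
}


def share_of_words(per_million):
    """Convert a per-million frequency into a percentage of all words."""
    return per_million / 1_000_000 * 100


def relative_difference(a, b):
    """Difference between a and b as a percentage of the smaller value."""
    return abs(a - b) / min(a, b) * 100


for corpus, values in PER_MILLION.items():
    timed, untimed = values["timed"], values["untimed"]
    print(
        f"{corpus}: timed {share_of_words(timed):.2f}% of words, "
        f"untimed {share_of_words(untimed):.2f}% of words, "
        f"relative difference {relative_difference(timed, untimed):.1f}%"
    )
```

On these figures, the two branches differ by about 3.0 percent in the ChemCorpus and by about 2.4 percent in the ICLE.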
[Figure 6 data, connectors per one million words: ChemCorpus: timed 20,447, untimed 19,848; ICLE: timed 21,324, untimed 21,841]
Figure 7: Functional Categories, timed – untimed
To look further into the timed – untimed difference, Figure 7 shows the connecting adverbials
grouped into the four functional categories and compared by timedness. As the average
total has already suggested, there is hardly any apparent difference. The numbers of
connecting adverbials in the additive category are similarly high, and those in the
sequential category similarly low, and in both cases very close together. In contrast,
the numbers for the adversative and causal categories differ more. In each of these two
categories, the ICLE has roughly 800 more connecting adverbials per one million words,
which is about four percent of the average total of roughly 20,000. While this
difference is small, it accounts for the slightly higher average total that
has been discussed earlier.
Figure 8: ChemCorpus Connecting Adverbials by Functional Category (per one million words)
[Figure 8 data: timed: additive 12,849, adversative 4,592, causal 2,320, sequential 687; untimed: additive 12,276, adversative 4,588, causal 2,238, sequential 746]
Figure 9: ICLE Connecting Adverbials by Functional Category (per one million words)
After discussing the functional categories in terms of timedness, they will now be
discussed separately. Figures 8 and 9 show the number of connectors in each
functional category for the two corpora, normalized per one million words. It is
striking that, in both corpora, the additive category has more than twice as many
connectors as the next most frequent category, adversative connecting adverbials. The
graphs show a general trend of decline, with additive connectors being the most
frequent category, followed by adversative, causal and sequential. The adversative
category has at most half as many connectors as the additive, the causal about
half as many as the adversative, and the sequential less than half as many as the causal category.
This trend, again, appears in both corpora. One might argue that the trend is only
visible due to the arrangement of the categories, which is quite arbitrary. At first
glance this may be true. However, two factors influenced the sequence
of the categories. Firstly, as they have been modeled on different
categorizations from the literature, the categories follow a similar structure. Secondly,
the categories are sequenced by frequency, from the category with the most tokens to
the category with the fewest.
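The step-wise decline can be checked against the data labels of Figures 8 and 9. The Python sketch below divides each category's frequency by that of the preceding, more frequent category. It assumes that the data labels are listed in the order additive, adversative, causal, sequential, which is consistent with the totals in Figure 6. All names in the code are illustrative.

```python
# Connectors per one million words by functional category (Figures 8 and 9).
# Assumed label order: additive, adversative, causal, sequential.
CATEGORIES = ["additive", "adversative", "causal", "sequential"]
FREQUENCIES = {
    "ChemCorpus timed": [12849, 4592, 2320, 687],
    "ChemCorpus untimed": [12276, 4588, 2238, 746],
    "ICLE timed": [12403, 5216, 2926, 779],
    "ICLE untimed": [12843, 5362, 3040, 596],
}


def step_ratios(values):
    """Ratio of each category to the preceding, more frequent one."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for branch, values in FREQUENCIES.items():
    steps = ", ".join(
        f"{CATEGORIES[i + 1]}/{CATEGORIES[i]} = {ratio:.2f}"
        for i, ratio in enumerate(step_ratios(values))
    )
    print(f"{branch}: {steps}")
```

Every ratio is below 1 in all four branches, which is what the declining trend described above amounts to.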
[Figure 9 data: timed: additive 12,403, adversative 5,216, causal 2,926, sequential 779; untimed: additive 12,843, adversative 5,362, causal 3,040, sequential 596]
In the following, I want to shift the attention to the variable of gender. Even
though Hůlková (2011) found that gender did not have any influence on
the usage of connecting adverbials in her data, I decided to include gender in my analysis to see
whether her findings could be confirmed. Unfortunately, information on gender was
not available for all texts in the ICLE, so only those texts which had information on
gender in the metadata database were considered. Again, the analysis splits up the
connecting adverbials into the four functional categories to provide a more detailed
view.
Figure 10 below shows the results.
Figure 10: Functional Categories by Gender
The bars in the chart suggest that female writers use connecting adverbials
more frequently than their male counterparts, but the difference is not as clear-cut as
expected. While women use roughly 1,600 more connectors per one million
words in the ChemCorpus and 1,200 more in the ICLE, the
functional categories do not exhibit the same usage pattern throughout. The
ChemCorpus data shows that women use more additive and
adversative connecting adverbials than men, whereas it is the other way round for the
causal and sequential categories. For the ICLE data, the usage patterns are even less
visible. Female authors use more additive and causal connecting adverbials, whereas
male authors use more adversative and sequential connectors. However, the
differences for the sequential and especially for the adversative category are very
small. For the adversative category, female authors use 5,304 connecting adverbials
per one million words and male authors use 5,397, which makes the difference
very small and does not allow a
decision on whether it is due to gender or due to statistical variance.
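Whether such a difference exceeds chance variation can be checked with a significance test once raw counts and subcorpus sizes are known. The Python sketch below uses the log-likelihood (G2) statistic that is common in corpus comparison. It is not part of the analysis tool used in this study, and the subcorpus size of one million words per gender is a purely hypothetical assumption, chosen so that the raw counts equal the normalized adversative frequencies reported above.

```python
import math


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for one item.

    count_*: raw occurrences of the item; size_*: corpus size in words.
    """
    total_count = count_a + count_b
    total_size = size_a + size_b
    expected_a = size_a * total_count / total_size
    expected_b = size_b * total_count / total_size
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Hypothetical subcorpus sizes (NOT taken from the study): one million words
# each, so that raw counts equal the per-million adversative frequencies.
HYPOTHETICAL_SIZE = 1_000_000
g2 = log_likelihood(5304, HYPOTHETICAL_SIZE, 5397, HYPOTHETICAL_SIZE)

CRITICAL_VALUE_P05 = 3.84  # chi-square critical value, 1 degree of freedom
print(f"G2 = {g2:.2f}, significant at p < 0.05: {g2 > CRITICAL_VALUE_P05}")
```

With the assumed sizes, the statistic stays well below the critical value of 3.84. With the actual subcorpus sizes the result may differ, so the sketch only illustrates the procedure.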
In the next part, the influence of the mother tongue of the authors on their usage
of connecting adverbials will be investigated. More precisely, the German L1 writers
of English of the ChemCorpus will be compared with Germanic L1 writers of
English of the ICLE.
Figure 11: Functional Categories by L1
The figure shows that authors with Swedish as their L1 use additive and adversative
connecting adverbials more frequently than others, whereas authors with Dutch as
their L1 use more causal connectors. It is furthermore notable that there is a
discrepancy between the usage of connecting adverbials by German L1 authors of
the ChemCorpus and the ICLE. The ICLE texts exhibit a higher usage of adversative
and sequential connectors, which may be due to the different text types of the two
corpora. Overall, the authors with Swedish or Dutch as their L1 use roughly 2,000
connecting adverbials more per one million words than their German colleagues.
The last aspect that will be analyzed in this section is prototype. Here, the most
frequent connecting adverbial in each category was treated as prototypical and compared
to the rest of the connectors in the respective category. For the most frequent
connecting adverbials, compare Figures 4 and 5. The following connecting adverbials
were selected as prototypical: ‘and’ (additive), ‘but’ (adversative), ‘because’ (causal)
and ‘next’ (sequential). Moreover, the timed – untimed division was disregarded, since
it could be seen above that it has no significant influence. Figures 12 and 13 show the
plotted statistics for the comparison of the prototypical connectors against the rest.
Figure 12: ChemCorpus Functional Categories by Prototype
Figure 13: ICLE Functional Categories by Prototype
It is striking that the results for the two corpora are almost completely different. The
statistics for the ChemCorpus show that the additive prototype occurs more
frequently than the other connecting adverbials in the category combined, whereas for
the other three categories the frequency of the prototype is lower than that of the rest
of the connecting adverbials combined. In contrast, the statistics for the ICLE data
consistently show that the prototypical connector in three of the four categories occurs
more often than the rest of the connectors; only the sequential prototype occurs
less frequently than the rest of the connectors in the category combined. It is
interesting that the statistics for the ICLE data show basically the reverse of the
tendencies that could be observed for the ChemCorpus data. Moreover, the difference between
prototype and rest is bigger for the ICLE data, which hints at a lower diversity of
connecting adverbials used in the texts.
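The comparison of prototype against rest can be sketched in Python as follows. The per-token frequencies are hypothetical placeholders and not data from the study. Only the choice of ‘but’ as the adversative prototype comes from the text, and the function name is an assumption made for illustration.

```python
# Hypothetical per-million frequencies for one category (NOT data from the study).
HYPOTHETICAL_ADVERSATIVE = {
    "but": 2600,
    "however": 1300,
    "yet": 400,
    "nevertheless": 300,
}


def prototype_split(frequencies, prototype):
    """Return (prototype frequency, combined rest frequency, prototype share)."""
    prototype_freq = frequencies[prototype]
    rest_freq = sum(
        freq for token, freq in frequencies.items() if token != prototype
    )
    return prototype_freq, rest_freq, prototype_freq / (prototype_freq + rest_freq)


proto, rest, share = prototype_split(HYPOTHETICAL_ADVERSATIVE, "but")
print(f"prototype: {proto}, rest: {rest}, prototype share: {share:.0%}")
```

A prototype share above 50 percent corresponds to the pattern described for three of the four ICLE categories. A share below 50 percent corresponds to the pattern described for the non-additive ChemCorpus categories.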
3.7. Results
After taking a closer look at the statistics, the results and deductions that can be made
will now be presented. The first and arguably most important conclusion is that
there is a clear usage pattern that, albeit with some minor differences, can be found in both
corpora. More specifically, there are six very evident usage patterns.
Firstly, there are three connecting adverbials that are by far the most frequent:
‘and’, ‘also’ and ‘but’. These three occur more than twice as frequently as any other
connector in both corpora, with the ChemCorpus exhibiting this feature more clearly
than the ICLE. This finding is also consistent with the results of the study by Milton
and Tsang (1993).
Secondly, the statistics have shown that whether a text is timed or untimed writing
most likely has no influence on the usage of connecting adverbials (Q 1.). When
comparing the overall connecting adverbial use in the timed and untimed branches of
the corpora, the numbers are almost equal. While there are minor differences, with
the timed texts in the ChemCorpus having more connectors than the untimed texts
and vice versa for the ICLE, these differences are far from statistically significant
and can be accounted for by the unevenly distributed number of texts in the categories or
by a certain deviation due to other possible influences such as the cultural background of
the author or personal preference, none of which have been taken into consideration in
this study. In addition, the data suggests that, since there is no change in the usage
patterns due to time, it cannot be assumed that there are more complex (i.e. causal or
sequential) connecting adverbials in untimed writing (Q 1.1.).
Thirdly, there is a very clear usage pattern of connecting adverbials for the four
functional categories. Most notably, the usage pattern is apparent in both corpora in
the same way, with only small differences in frequencies. There is a very clear decline
in frequency from one category to the next. The fact that additive connectors are
most frequent is logical, since it is the category that on the one hand has the highest
number of connectors and on the other hand contains two of the most frequent
connecting adverbials. Similarly, the sequential category, which is least
frequent, does not differ between the two corpora. Only for the adversative and causal
categories does the ICLE exhibit slightly more connectors; however, with a four percent
difference, this is still within a statistically tolerable range. Thus it is difficult to deduce
whether this difference is due to the writing style of the texts or due to statistical
variance.
Figure 14: Overall Connecting Adverbial Shares
Since the difference between the frequencies of the categories in the two corpora
is so small, the shares of the connectors can be visualized jointly as in
Figure 14, with more than half of the connectors belonging to the additive category.
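The shares can be approximated from the per-million frequencies shown in Figures 8 and 9. The Python sketch below sums the four branches per category and divides by the grand total. It assumes that each branch is weighted equally, which may differ from how Figure 14 was actually compiled, and the names in the code are illustrative.

```python
CATEGORIES = ["additive", "adversative", "causal", "sequential"]
# Connectors per one million words; label order assumed as in CATEGORIES.
BRANCHES = [
    [12849, 4592, 2320, 687],  # ChemCorpus timed
    [12276, 4588, 2238, 746],  # ChemCorpus untimed
    [12403, 5216, 2926, 779],  # ICLE timed
    [12843, 5362, 3040, 596],  # ICLE untimed
]


def category_shares(branches):
    """Share of each category in the summed per-million frequencies."""
    totals = [sum(column) for column in zip(*branches)]
    grand_total = sum(totals)
    return [total / grand_total for total in totals]


shares = category_shares(BRANCHES)
for name, value in zip(CATEGORIES, shares):
    print(f"{name}: {value:.1%}")
```

Under this equal-weighting assumption, the additive category accounts for roughly 60 percent of all connectors, in line with the statement that it makes up more than half.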
Fourthly, there is a clear pattern of use for gender. The statistics have shown that
female student writers use more connecting adverbials in total, although this does not
hold in exactly the same way for all four functional categories (Q 2.). This overall difference was
even more prominent in the ChemCorpus than in the ICLE. The gender-specific use
of connecting adverbials for the four functional categories exhibits a pattern in the
ChemCorpus, where female authors use more additive and adversative connectors,
whereas male authors use more causal and sequential connectors. These findings
contradict the study by Hůlková (2011), who found that, for her data,
gender did not have any influence on the use of connecting adverbials. This might
suggest that the influence of gender on the use of connecting adverbials is also
dependent on the text type.
Fifthly, the analysis has shown that the L1 of the authors has an influence on the
use of connecting adverbials. It could be seen that writers with Dutch or Swedish as
their L1 use more connecting adverbials in general. In particular, Dutch writers of
English use more adversative and causal connectors whereas Swedish writers of
English use more additive and adversative connectors. These findings are quite
surprising, as the initial assumption was that German authors would use more
connecting adverbials than writers with other Germanic L1s (Q 3.).
Sixthly, there is a clear usage pattern in terms of prototype. The data shows that
the prototypical connecting adverbial, meaning the most frequently used, accounts
for a big, and for parts of the data even major, share of the overall connector use in
the respective functional category. At this point, it is necessary to differentiate the
results for the two corpora. The ChemCorpus shows that the prototypical connecting
adverbials make up a large share of all of the connecting adverbials in the functional
categories, but not the majority, except for the additive category. The ICLE data
shows the opposite. Here, the prototype accounts for the majority of connecting
adverbials in the functional categories, except for the sequential category. This shows
that the ChemCorpus has a greater diversity of connecting adverbials, whereas the
ICLE is dominated by the prototypical connecting adverbials.
Comparing the total numbers, the result is that Chemnitz learners of English use
connecting adverbials less frequently than learners of English from other countries,
and that there are some differences in their way of using them (Q 5.). The research
has further shown that different variables influence these differences.
Firstly, there is the influence of timedness on the use of connecting adverbials,
which this study has found to be of no importance. Secondly, it has been shown that
gender has a considerable influence on the use of connectors in the ChemCorpus, in
contrast to the ICLE. Thirdly, the comparison of the ChemCorpus data to the ICLE
data has shown that the L1 of the authors also seems to have an influence on their use
of connecting adverbials. Specifically, it has been shown that writers with
Germanic mother tongues other than German use connectors slightly differently. Lastly, it
has been shown that the most important factor in the use of connecting adverbials by
L2 writers of English seems to be prototype. This means that a large share of the
connecting adverbials in each of the four functional categories is constituted by the
prototypical connector. This tendency is clearly visible in the ChemCorpus, but even
more prominent in the ICLE, where the prototypical connecting adverbial represents
more than half of all connecting adverbials in three of the four categories.
To sum up the results, it can be stated that the Chemnitz L2 writers of English
generally use connecting adverbials in a similar way to their international colleagues, yet
there are some differences, which can be attributed to different variables.
3.8. Limitations
The analysis that has been carried out in this study has of course a number of
limitations, which have to be kept in mind when considering the results and drawing
conclusions. Some of the issues, especially with the data, have been discussed in
section 3.4., but they will be mentioned here again and complemented with further
considerations.
First of all, the categorization of the connecting adverbials has some aspects that
need to be carefully considered when assessing the data. One aspect is the number of
items in the categories. The additive category has the most items (for a complete
table of the categories and their items, see the appendix section), and the analysis has
shown that more than half of all occurrences of connectors found belong to the
additive category. However, I want to suggest that there is no direct correlation
between the number of items in a category and the number of tokens found in the
analysis. This notion is supported by the fact that the second most frequently used
category, adversative connecting adverbials, has the fewest items.
Furthermore, the least frequently identified category of sequential connectors has the
second most items. In addition, the differences in the number of items in each
category are not too big. There is no category that has twice as many items as
another. Thus, the uneven distribution of the connectors across the
categories is very unlikely to have any impact on the result, but it still has to be kept in
mind.
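The argument can be made concrete with a rank correlation. The Python sketch below ranks the four categories once by the number of items assigned to them and once by the number of tokens found, using only the orderings stated in the text. The third place of the causal category in the item ranking follows by elimination. With only four categories the coefficient is merely illustrative, and the function name is an assumption.

```python
# Rank 1 = most. Rankings as stated in the text above.
# Number of items (connectors) assigned to each category:
ITEM_RANK = {"additive": 1, "sequential": 2, "causal": 3, "adversative": 4}
# Number of tokens found in the corpora for each category:
TOKEN_RANK = {"additive": 1, "adversative": 2, "causal": 3, "sequential": 4}


def spearman_rho(rank_x, rank_y):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2)/(n*(n^2-1))."""
    n = len(rank_x)
    d_squared = sum((rank_x[key] - rank_y[key]) ** 2 for key in rank_x)
    return 1 - 6 * d_squared / (n * (n**2 - 1))


rho = spearman_rho(ITEM_RANK, TOKEN_RANK)
print(f"Spearman's rho = {rho:.2f}")
```

The coefficient comes out at 0.2, a weak association that is in line with the claim that the number of items in a category does not determine the number of tokens found.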
The second aspect I want to focus on in this section is text type. As I have already
discussed in 3.4.1., the two corpora are composed of different text types. While the
ChemCorpus uses thesis papers for the untimed component and written exam texts
for the timed component, the ICLE uses argumentative essays for both components.
The text types differ in multiple ways. First of all, they are of different length. The
texts in the ChemCorpus are longer than the ICLE texts, and the thesis
papers in particular are significantly longer than the ICLE untimed texts. Furthermore, the ICLE
timed and untimed texts are mostly similar in length, whereas the respective
ChemCorpus texts are of considerably different length. Secondly, the communicative
purpose differs across the text types. An argumentative essay undoubtedly has a very
different communicative purpose from a thesis paper. The difference in the
communicative purpose is likely to have an effect on the choice of connecting
adverbials, since the choice of connectors will differ between trying to persuade
someone with an argumentation and describing research that has been carried out.
Another interesting aspect in this regard is the department in which the students
are enrolled. As Samraj (2008) has shown, there are differences in the structures of
master’s theses across different disciplines. Consequently, it can be expected that the
usage of connectors also varies across the different departments. In this study, I did
not consider this aspect, mostly because information regarding the department was
not available for all data. Thus, there is no evidence in this study to either support or
contradict Samraj’s notion.
The last issue I want to address in this chapter is the case of ‘and’. The statistics
that have been produced by my analysis exhibit a large number of occurrences of the
token ‘and’. The extraordinarily high number can be attributed to
the fact that ‘and’ can occur not only as an additive connecting adverbial, but also as
part of an enumeration. Due to this, the high occurrence of ‘and’ distorts the results, as the
number of tokens in the additive category is presumably higher than the number of
actual additive connectors in the data. The reason for this might not be instantly
clear, since it is to be found in the format of the data and the way the analysis script
works. Since the tool that counts the occurrences of the tokens automatically does
nothing but regular expression matching, it is not possible for it to differentiate
between the conjunctive and the enumerative ‘and’. However, the reason for this is not
purely technical. The distinction would be possible if the data were completely
part-of-speech tagged and ideally in XML format. However, the data at hand is in plain
text format and the only tagged parts are tables of contents, appendices, and quotes in
the theses that are part of the ChemCorpus. One possible solution would be to
exclude ‘and’ from the analysis; however, this would distort the results too, possibly
even more, since previous studies have shown that ‘and’ is one of the most frequently
used connecting adverbials (cf. Bolton, Nelson, & Hung, 2002). Thus, the best
solution for this study is to leave the data and the results as they are, while keeping in
mind that there are too many tokens in the additive category.
Overall, this chapter has shown that there are a number of limitations to this study.
The data is only compatible to a certain extent. Connecting adverbials are unevenly
distributed across the functional categories and there is an issue with ‘and’. Still, I am
convinced that the study can prove useful, since general trends in the usage of
connecting adverbials by Chemnitz students have become visible, and putting them
into context by comparing the results with the same analysis carried out with data
from the International Corpus of Learner English can provide basic conclusions
concerning the teaching of academic writing.
4. Conclusion
On the previous pages I have presented the use of connecting adverbials in timed and
untimed academic student writing. My research questions were whether the usage of
connectors by the Chemnitz students varies between timed and untimed writing, what
influence gender, L1 and prototype have, and finally whether there is an overall difference
between the usage by the Chemnitz students and that by other L2 learners of English. After
the presentation of my research questions, the next chapter reviewed the relevant
literature. The topics of the review were connectors, further subdivided into
cohesion, reference and the categorization of the connectors, timed and untimed
writing, and the question of whether student writing is a genre in its own right. The
last parts covered three studies which also deal with the topic of students’ use of
connecting adverbials. In this respect, I also discussed to what extent these studies are
relevant for my thesis.
Afterwards, the main part with the analysis followed. First, I gave details about
the corpus used for the analysis as well as the monitor corpus. In the next
sub-chapter, the compatibility of the corpora was examined. The first topic here was
text type, which dealt with the difference between theses and argumentative essays.
The findings showed that the structure and communicative purpose of
argumentative essays and theses differ significantly and that the usage of connecting
adverbials is also very likely to differ. Secondly, the shares of texts written by male
and female students were discussed. The main issue with this topic is the
under-representation of male students in the field of English and American Studies, who
are “hard to find” (Schmied, 2011, p. 17). The last part of the corpus compatibility
section was the influence of the department in which the theses were written.
Following the compatibility discussion, I presented my analysis tool, which had been
developed for this study. The following sub-chapter presented the statistics that have been
compiled using the analysis tool. Since the complete tables are far too large, only
visualizations have been presented in this section, and the full tables have been
included in the appendices. The next sub-chapter presented the results of the present
study. It could be seen that there is no significant difference in the usage of
connecting adverbials in timed and untimed writing for either the ChemCorpus texts
or the ICLE texts. For the distribution across the four functional categories, there is a
clear trend that is present in both the ChemCorpus and the ICLE. The most
frequently used connectors are additive, which make up more than half of all connecting
adverbials. The second most frequent category is adversative connecting adverbials.
The other two categories, causal and sequential, are the least frequently used, and
they comprise only a fraction of the overall connectors. Moreover, the influence of
gender, L1 and prototype on the use of connecting adverbials has been examined. It
has been found that gender seems to have an influence on the use of connecting
adverbials, namely that women use more connectors than men. Additionally, this
influence was more prominent in the ChemCorpus. The influence of L1 also showed
interesting results. It could be seen that authors with other Germanic first languages
use connecting adverbials differently from German L1 writers. Finally, it has been
shown that the prototypical connecting adverbials in each of the four functional
categories account for large shares of the overall number of connectors, even for
more than 50 percent in the ICLE data.
After the discussion of the results of my study, the next sub-chapter covered the
limitations of the study. These limitations, which result partly from methodological
issues and partly from the data, are important to consider when interpreting the
results of the study. The first limitation that was addressed was the issue of
categorizing the connecting adverbials. The issue lies not in the actual setting up
of the categories, but in the assignment of the connectors to the individual
categories. The 55 connecting adverbials which are investigated in this paper are
not evenly distributed among the four functional categories. The category of additive
connectors, which has the most connecting adverbials assigned to it, also has
the highest number of occurrences. However, this apparent correlation between the
number of connectors assigned to the category and the number of tokens found in the
corpora for this category cannot be verified for the other categories. The second
most frequently identified category of adversative connectors has the fewest
connectors, while the least frequently found category of sequential connecting
adverbials has the second highest number of connectors. Thus, even though the first
glance may suggest a correlation, closer investigation does not confirm this
assumption. The second limitation that has been discussed is the different text types
of the texts that were used to compile the corpora. There is a difference in the
structure and communicative purpose of theses and argumentative essays, which is
very likely to also result in a different usage of connecting adverbials, making the
corpora only conditionally compatible. Lastly, the problematic connecting adverbial
‘and’ has been addressed as a limitation. The problem is the inability of the analysis
tool to distinguish between ‘and’ as an enumerative conjunction and as a connecting
adverbial. Examples E 1. to E 4. show instances of both enumerative and connecting
adverbial ‘and’ from both corpora.
E 1. “working with computer and Internet”
enumeration [MG05Ft_KM – ChemCorpus untimed]
E 2. “Until 1840 most of the British inhabitants in Australia were convicts
and only a small number of free settlers had arrived.”
connecting adverbial [f_W0809_K_N – ChemCorpus timed]
E 3. “[…] a world that was strongly divided in Blacks and Whites”
enumeration [DNNI4006 – ICLE timed]
E 4. “And life itself is in fact a chain of many simple things.”
connecting adverbial [BGSU1127 – ICLE untimed]
This issue can only be solved by using fully part-of-speech-tagged data, since it
is impossible to reliably recognize the part of speech automatically with regular
expressions alone.
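The limitation can be illustrated with examples E 1. to E 4. themselves. The sketch below is written in Python rather than in the Perl of the analysis tool, and the pattern is an assumed whole-word match, not the tool's actual expression. It shows that a purely form-based match counts every ‘and’ once, regardless of the function assigned to it above.

```python
import re

# Examples E 1. to E 4. with the function assigned to "and" in the text.
EXAMPLES = [
    ("working with computer and Internet", "enumeration"),
    (
        "Until 1840 most of the British inhabitants in Australia were convicts "
        "and only a small number of free settlers had arrived.",
        "connecting adverbial",
    ),
    ("[…] a world that was strongly divided in Blacks and Whites", "enumeration"),
    ("And life itself is in fact a chain of many simple things.", "connecting adverbial"),
]

# Whole-word, case-insensitive match; an assumed pattern, not the tool's own.
AND_PATTERN = re.compile(r"\band\b", re.IGNORECASE)

matches = [len(AND_PATTERN.findall(sentence)) for sentence, _ in EXAMPLES]
for (sentence, function), count in zip(EXAMPLES, matches):
    print(f"{count} match(es), {function}: {sentence[:40]}")
```

All four sentences yield exactly one match, so the count alone carries no information about whether ‘and’ links clauses or enumerates items.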
5. Outlook
After having conducted the analysis and presented the results of my thesis, I
want to give an outlook with suggestions for further research. As could be seen
when presenting the results and especially in the discussion of the limitations of this
study, there is certainly enough room for further refining the analysis and continuing
to research the matter. The first approach for further research I want to suggest is a
more detailed analysis of the ICLE data. Since the texts in the corpus have been
produced by students with a variety of L1s, it would be interesting to investigate
whether the results for the corpus as a whole are the same for the different L1 groups.
It could be expected that there is a difference, since previous studies, such as Bolton,
Nelson & Hung (2002), have found that L2 learners’ usage of
connecting adverbials differs from that of L1 English writers. It would be very interesting to
see whether different language families have an influence on the usage of connectors
in the L2 and, furthermore, of what kind this influence is, i.e. whether there is an under- or
overuse of certain connecting adverbials.
Another possible refinement of this study would be the same analysis with two
truly compatible and stratified corpora. Schmied (2011, p. 18) gives suggestions on
what the ideal, stratified, ten-million-word ChemCorpus would look like. In this
respect, it would also be interesting to redo the study with two fully part-of-speech
tagged corpora. This would be especially helpful to differentiate between ‘and’ as
connecting adverbial and as enumeration.
Furthermore, a possible continuation of this study could be a comparison between
student theses and research articles. It has been shown in the literature review that
student writing can be seen as apprentice work on the way to becoming a member
of the academic discourse community. While the issue of whether student theses
constitute a genre in their own right or are merely an imitation of professional
academic writing, such as research articles, is still debatable, it is safe to say that by
analyzing student theses and comparing them to research articles, helpful insights for
improving the students’ writing can be gathered.
Lastly, connecting the linguistic research to the field of teaching English, it would
be interesting to deduce implications for teaching from research. Especially when
comparing student theses to research articles, possible fields of action for teachers
can be identified. The results of the analysis in this paper show that students do not
use sequential connecting adverbials very frequently. A possible implication for
teaching might be that it could be helpful to incorporate more sequential connecting
adverbials into writing courses by focusing on structuring the texts more thoroughly.
However, there is still a lot of room for research in this area, so this paper can be
seen as an initial study that shows different aspects for future research.
6. References
Anthony, L. (2011). AntConc (Version 3.2.4) [Software]. Available from
http://www.antlab.sci.waseda.ac.jp/antconc_index.html
Biber, D., & Conrad, S. (2009). Register, Genre, and Style. Cambridge: Cambridge
University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman
Grammar of Spoken and Written English. Essex: Pearson.
Bolton, K., Nelson, G., & Hung, J. (2002). A corpus-based study of connectors in
student writing: Research from the International Corpus of English in Hong
Kong (ICE-HK). International Journal of Corpus Linguistics 7:2, 156-182.
Gilquin, G., & Paquot, M. (2008). Too chatty: Learner academic writing and register
variation. In English Text Construction (1 ed., Vol. 1, pp. 41-61). John
Benjamins Publishing Company.
Halliday, M. A., & Hasan, R. (1976). Cohesion in English. New York: Longman.
Hůlková, I. (2011). Conjunctive Adverbials in Academic Written Discourse:
Conjunctive Adverbials in Academic Written Discourse. In J. Schmied,
Academic Writing in Europe: Empirical Perspectives (pp. 129-142).
Hüttner, J. I. (2007). Academic Writing in a Foreign Language - An Extended Genre
Analysis of Student Texts. Frankfurt am Main: Peter Lang.
Hyland, K. (1990). A Genre Description of the Argumentative Essay. RELC
Journal(21), 66-78. doi:10.1177/003368829002100105
Kortmann, B. (2005). English Linguistics: Essentials. Berlin: Cornelsen.
Livia, A. (2003). Linguistic Approaches to Gender. In J. Holmes, & M. Meyerhoff,
The Handbook of Language and Gender (pp. 142-158). Oxford: Blackwell.
Martin-Martin, P. (2005). Scientific Writing: A Universal or a Culture-Specific Type
of Discourse? In Revista de Lenguas para Fines Especificos (Vols. 11-12, pp.
191-203).
Saeed, J. I. (2009). Semantics (3 ed.). Oxford: Wiley-Blackwell.
Samraj, B. (2008). A discourse analysis of master’s theses across disciplines. Journal
of English for Academic Purposes(7), 55-67.
Schmied, J. (2011). Academic Writing in Europe: a Survey of Approaches and
Problems. In J. Schmied, Academic Writing in Europe: Empirical
Perspectives (pp. 1-22).
Swales, J. M. (1990). Genre Analysis - English in academic and research settings.
Cambridge: Cambridge University Press.
UCL - ICLE. (2012, 12 19). Retrieved from http://www.uclouvain.be/en-cecl-icle.html
7. Appendices
The following sections denote the paths under which the content can be found on the
CD accompanying this paper.
7.1. Corpora
\Corpora\ChemCorpus\
\Corpora\ICLE\
7.2. Statistics
\Statistics\
7.3. Sorting Tool Source Code
\Tools\sort_ICLE_data.pl
7.4. Analysis Tool Source Code
\Tools\analyze_data.pl
7.5. Declaration of Authorship (Selbstständigkeitserklärung)