
Page 1: Corpora in literary and stylistic studies Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

Corpora in literary and stylistic studies

Corpus Linguistics
Richard Xiao

lancsxiaoz@googlemail.com


Aims of this session

• Lecture
  – An overview of applications of corpora in literary and stylistic studies
  – Case study: Culpeper’s (2002) keyword analysis of six characters in Romeo and Juliet
• Lab session
  – To duplicate Culpeper’s (2002) study


Corpora vs. literary stylistics

• Stylistic shifts in usage may be observed with reference to features associated with either particular situations of use or particular groups of speakers (cf. Schilling-Estes 2002: 375)
  – In this sense, similar to registers and genres or dialects and language varieties
  – …but stylisticians are typically more interested in individual works by individual authors rather than language or language variety as such
• The use of corpora in stylistics and literary studies is presently very limited


Potential uses of corpora

• Study of prose style
• Study of individual authorial styles
• Authorship attribution
• Literary appreciation and criticism
• Teaching of stylistics
• Study of literariness in discourses other than literary texts (e.g. Carter 1999)


Study of prose style

• In stylistics, there is a long tradition of focusing on the representation of speech and thought in fiction
• Leech and Short’s (1981) influential model of speech and thought presentation
  – Style in Fiction, Longman, 1981
• Further refined in Short, Semino and Culpeper (1996), and Semino, Short and Culpeper (1997)


S&TP: Lancaster Speech, Thought and Writing Presentation Corpus

• Developed during 1994-2003
  – Written: 260,000 words in size, three narrative genres: prose fiction, newspaper reportage and (auto)biography, which are further divided into ‘serious’ and ‘popular’ sections
  – Spoken: created with the express aim of comparing S&TP in spoken and written languages systematically, 260,000 words, 60 samples from BNCdemo, and 60 samples from oral history archives in the Centre for North West Regional Studies at Lancaster

• Download: http://ota.ahds.ac.uk/headers/2464.xml


S&TP categories

• Direct category, e.g.
  – direct speech, direct thought and direct writing
• Free direct category, e.g.
  – free direct speech, free direct thought, free direct writing
• Indirect category, e.g.
  – indirect speech, indirect thought, indirect writing
• Free indirect category
  – free indirect speech, free indirect thought, free indirect writing
• Representation of speech/thought/writing act category
• Representation of voice/internal state/writing category
• Report category, e.g.
  – report of speech, report of thought, report of writing


Authorial styles of individual authors

• Typically specialized corpora of the works of individual authors, e.g.
  – A corpus composed of their early and later works to track any stylistic shift over time
  – A corpus composed of their works belonging to different genres (e.g. plays and essays) to compare their styles across genres
  – A corpus composed of works by different authors to compare their different authorial styles

• Large general corpora can provide ‘a means of establishing a norm for comparison when discussing features of literary style’ (Hunston 2002: 128)


Techniques of studying authorial styles

• Corpus stylistics goes well beyond simple counting and relies heavily on sophisticated statistical approaches
  – Multidimensional analysis (MDA, e.g. Watson 1994)
  – Principal Component Analysis (e.g. Binongo and Smith 1999)
  – Multivariate analysis (or more specifically, cluster analysis, e.g. Watson 1999; Hoover 2003)
• Stylistics + computation + statistics
  – stylometry, stylometrics, computational stylistics, statistical stylistics, corpus stylistics


Authorship attribution

• Is the work by Shakespeare or Marlowe?
• Cluster analysis of frequent words, frequent word sequences, and frequent collocations provides an accurate and robust method for authorship attribution (Hoover 2001, 2002, 2003a, 2003b)
• Corpus-based authorship attribution has been used as linguistic evidence in court (“forensic linguistics”)
  – Confession/witness statements (e.g. Coulthard 1993)
  – Blackmail/ransom/suicide notes (Baldauf 1999)
• Plagiarism detection in academic and educational settings (e.g. Turnitin UK)


The Derek Bentley case

• Derek Bentley was hanged in the UK in 1953 for allegedly encouraging his young companion Chris Craig (a minor) to shoot a policeman
  – The evidence that weighed against him was a confession statement which he signed in police custody but which, he later claimed at the trial, the police had ‘helped’ him to produce
• The case was re-opened in 1993, 40 years after Derek was hanged
  – Malcolm Coulthard, a forensic linguist, was commissioned by Bentley’s family to examine the confession as part of an appeal to get a posthumous pardon for Derek


The Derek Bentley case

• The appeal was initially rejected by the Home Secretary

• In 1998, another court of appeal overturned the original conviction and found Derek Bentley innocent

• In 1999 the Home Secretary awarded compensation to the Bentley family


The Derek Bentley case

• In Bentley’s confession, the word ‘then’ was unusually frequent
  – It occurred 10 times in his 582-word confession statement, ranking as the 8th most frequent word in the statement
  – It ranked 58th in a corpus of spoken English, and 83rd in the Bank of English (on average once every 500 words)
• Six witness statements
  – 3 made by other witnesses: ‘then’ occurs just once in 980 words
  – 3 by police officers, including two involved in the Bentley case: ‘then’ occurs 29 times – once in every 78 words!
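The rates quoted above are simple ratios of running words to occurrences. A minimal Python sketch of that arithmetic follows; the helper name `words_per_occurrence` is invented for this illustration and is not part of Coulthard's analysis.

```python
def words_per_occurrence(total_words: int, occurrences: int) -> float:
    """Return the average number of running words per occurrence."""
    if occurrences <= 0:
        raise ValueError("occurrences must be a positive integer")
    return total_words / occurrences


# Counts quoted on the slide for the word "then".
bentley_rate = words_per_occurrence(582, 10)   # Bentley's confession statement
witness_rate = words_per_occurrence(980, 1)    # three ordinary witness statements

print(f"Bentley: one 'then' every {bentley_rate:.1f} words")
print(f"Other witnesses: one 'then' every {witness_rate:.0f} words")
```

Expressing each count as "one occurrence every N words" makes texts of very different lengths directly comparable, which is what the slide's comparison relies on.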


The Derek Bentley case

• The position of ‘then’
  – Subject + ‘then’ (e.g. ‘I then’, ‘Chris then’) was unusually frequent in Bentley’s confession
    • ‘I then’ occurs three times (once every 190 words)
    • In a 1.5-million-word corpus of spoken English, the sequence occurs just nine times (once every 165,000 words)
    • No instance of ‘I then’ was found in ordinary witness statements
    • Nine occurrences were found in the police statement
    • In the spoken BoE, ‘then I’ was 10 times as frequent as ‘I then’


The Derek Bentley case

• The sequence subject + ‘then’ was characteristic of the police statement
• Although the police denied Bentley’s claim and said that the statement was a verbatim record of what Bentley had actually said, the unusual frequency of ‘then’ and its abnormal position could be taken to be indicative of some intrusion of the policemen’s register in the statement


Culpeper (2002)

• Culpeper, Jonathan (2002) Computers, language and characterisation: An analysis of six characters in Romeo and Juliet. In U. Melander-Marttala, C. Östman and Merja Kytö (eds.), Conversation in Life and in Literature. Uppsala: Universitetstryckeriet, pp.11-30.
  – www.lexically.net/wordsmith/corpus_linguistics_links/Keywords-Culpeper.pdf


Aim of Culpeper (2002)

• ‘The broad aim of this paper is to show how the study of an important area within “stylistics”, namely characterisation, can benefit from an empirical approach, specifically, a methodology for identifying what might be the “key” words of a text … Such an approach can reveal significant lexical and grammatical patterns without reliance on speculations about what the relevant dimensions are’ (Culpeper 2002: 12)


Keywords vs. style-markers

• Enkvist (1964: 29)
  – ‘Style is concerned with frequencies of linguistic items in a given context, and thus with contextual probabilities.’
  – ‘To measure the style of a passage, the frequencies of its linguistic items […] must be compared with the corresponding features in another text or corpus which is regarded as a norm and which has a definite relationship with this passage.’
• Style as a matter of ‘frequencies’, ‘probabilities’ and ‘norms’
  – ‘We may […] define style markers as those linguistic items that only appear, or are most or least frequent in, one group of contexts. In other words, style markers are contextually bound linguistic elements…’ (ibid. 34-5)
  – ‘Elements that are not style markers are stylistically neutral.’ (ibid. 35)
• ‘Style-markers…are words whose frequencies differ significantly from their frequencies in a norm’ (Culpeper 2002: 13)
  – Keywords (positive and negative)


Preparing the text

• Problem 1: Which text to use … original version or modern version?
  – Culpeper opted for a modern edition (to get round problem of spelling variation: sweet vs. sweete, etc.)
• Problem 2: Shakespeare plays are full of dialogue
  – How can we get the tool to distinguish between different characters?
  – Culpeper used a simple tagging scheme, e.g. <ROM>…<\ROM> <JUL>…<\JUL>


Who is worth concentrating on …?

• Culpeper chose his characters based on the number of words that they “spoke”

Character        Total no. of words spoken
Romeo            5031
Juliet           4564
Friar Lawrence   2901
Nurse            2369
Capulet          2292
Mercutio         2254
Benvolio         1293


Choosing a reference corpus

• Culpeper opted to make 6 reference corpora – one for each character, e.g.
  – RC for Romeo = whole play minus Romeo’s contributions
  – RC for Juliet = whole play minus Juliet’s contributions
  – RC for Nurse = whole play minus Nurse’s contributions
  – …
• Why use a reference corpus of the same play?
  – ‘Characters are partly shaped by their context. Thus, it makes little sense to compare, say, the characters of Romeo and Juliet with the characters of Macbeth or Anthony and Cleopatra, since the fictional worlds of Italy, Scotland and Egypt provide very different contextual influences. Furthermore, characters, like people, are partly perceived in terms of whom they interact with …’ (Culpeper 2002: 16)


Alternative reference corpora …?

• Scott and Tribble (2006) have compared Romeo and Juliet against
  – The Complete Works of Shakespeare
  – Plays only
  – Tragedies only
  – The BNC
• Interestingly … they found that
  – A ‘robust core’ of keywords occurs whichever reference corpus is used. These include personal and place names like “Benvolio”, “Romeo”, “Juliet” and “Mantua” but also terms like “banished”, “county”, “love” and “night”
• In contrast to Scott and Tribble (2006), Culpeper (2002) found that his results were more meaningful - in terms of characterisation - when using the rest of Romeo and Juliet (i.e. the play minus the target character) as a reference corpus


Making wordlists for each character

• Making the characters’ word lists
  – Involves telling Wordsmith to only include <…> … <\…>
  – Procedure …
    • Wordlist – Settings – Wordlist specific – Tags – Only part of file – Sections to keep – [specifying start/end tags]
• Making the reference corpora
  – Involves telling Wordsmith to exclude anything between <…> … <\…>
  – Procedure …
    • Wordlist – Settings – Wordlist specific – Tags – Only part of file – Sections to cut out – [specifying start/end tags]


Top 10 on wordlists (frequency)

RANK  ROMEO  JULIET  CAPULET  NURSE  MERCUTIO  FRIAR L  PLAY  PRES-DAY SPOKEN ENGLISH  PRES-DAY WRITTEN ENGLISH
1     AND    I       TO       I      A         AND      AND   THE                      THE
2     I      TO      YOU      A      THE       THE      THE   I                        OF
3     THE    AND     AND      AND    OF        TO       I     YOU                      AND
4     TO     MY      A        THE    AND       IN       TO    AND                      A
5     MY     THE     MY       YOU    TO        THY      A     IT                       IN
6     THAT   THAT    I        TO     THAT      THOU     OF    A                        TO (INF)
7     A      THOU    IS       IT     I         OF       MY    ‘S                       IS
8     OF     IS      THE      IS     IS        IS       THAT  TO                       TO (PR)
9     ME     A       HER      MY     IN        THAT     IS    OF                       WAS
10    IN     BE      NOT      O      THOU      A        IN    THAT                     IT

Q: Do they tell us anything interesting/worthwhile and, if so, what?


Positive keywords for the six characters

Romeo: Beauty, Blessed, Love, Eyes, More, Mine, Rich, Dear, Yonder, Farewell, Me, Sick, Lips, Stars, Fair, Thine, Hand, Banished
Juliet: If, Or, Sweet, Be, News, My, Night, I, Would, Yet, Thou, Words, Name, Nurse, Tybalt, Send, Husband, That, Swear
Capulet: Go, Wife, Thank, Ha, You, Thursday, Her, Child, Welcome, We, Haste, Gentlemen, Tis, Our, Make, Now, Daughter, Well
Nurse: Day, He’s, You, Quoth, Woeful, God, Warrant, Madam, Lord, Lady, Hie, It, Your, Faith, Said, Ay, She, About, Ever, Sir, Marry, Ah, Fall, Well
Mercutio: A, Hare, Very, Of, He, The, O’er
Friar L: Thy, From, Thyself, Mantua, Part, Heaven, Forth, Her, Alone, Time, Married, Letter

What differences can you spot between the results here and the results on the previous table?


What key words can tell us about characterisation …

• Romeo’s top three key words – ‘beauty’, ‘blessed’, ‘love’
    • Expected? Surprising? … the lover of the play
  – Other keywords related to ‘love talk’ = ‘dear’, ‘stars’, ‘fair’
  – Keywords relating to body parts – ‘eyes’, ‘lips’, ‘hand’ – obsessed with the physical?
• Juliet’s top key words – ‘if’, ‘or’, ‘be’, ‘yet’, ‘would’ (conditional + modals)
  – Reflecting her state of mind – anxiety and uncertainty?
• Capulet’s most ‘key’ key word – ‘go’
  – Context reveals that it is mostly used as an imperative … Capulet as head of the household directing other people (see also ‘make’ and ‘haste’), e.g.
    • Go wake Juliet, go and trim her up…
• Nurse’s keywords are surge features (i.e. reflecting outbursts of emotion) – ‘god’, ‘warrant’, ‘woeful’, ‘faith’, ‘marry’, ‘ah’


Negative key words for the six characters

Romeo Juliet Capulet Nurse Mercutio Friar L

You

Romeo

He

Go

Her

The

You

And

Go

Thou

That

The

Of

And

With

Thou

My

I

What

I

You

A

Have

My

IMPORTANT: These represent words that are used unusually infrequently (statistically speaking) by these characters.

Do you notice anything interesting?


Use of Pronouns within Romeo and Juliet

      Juliet       Romeo            Capulet            Nurse                     Mercutio  Friar L
POS   MY, I, THOU  ME, MINE, THINE  YOU, WE, TIS, OUR  HE’S, YOU, IT, YOUR, SHE  HE        THY, THYSELF
NEG   YOU          YOU, HE          THOU               THOU                      MY, I     I, YOU, MY

• Romeo and Juliet use first and second person pronouns
  – Expected? - “at the heart of the social interaction in the play”
• But compare Romeo’s use of ‘me/mine’ with Juliet’s use of ‘I’ …
  – Culpeper’s (2002) conclusion: ‘Juliet spends much time in the play bearing her soul … whereas Romeo is much more conscious of his own role as a lover and of the effect of the circumstances upon him’ (ibid: 24)
• What about Capulet? – “you”, “we”, “our”, why?
• Thou-forms vs. you-forms to be covered


Culpeper’s Conclusion (2002: 27)

• “In some cases, my analysis provided solid evidence for what one might have guessed (e.g. Romeo’s keywords ‘beauty’ and ‘love’) …”
• “… in others, it revealed what I think would be very difficult to guess but fits well a possible interpretation (e.g. Juliet’s keywords ‘if’ and ‘yet’).”
• “… keywords analysis also offers a way into analysing function words, such as pronouns, and accounting for their contribution to style and meaning”


What should we take note of …?

• How he was able to come to his conclusions
  – The importance of having the right reference corpus
  – The need to use mark-up (as a means of identifying the different characters)
  – Knowing how to use Wordsmith …
    • To make the different wordlists
    • To make the keyword lists


Any potential weaknesses …

• It did not attempt to lemmatize the word forms, so that, for example, ‘loves’ was counted separately from ‘love’ rather than forming part of its word count (Culpeper 2002: 27)
• Contractions (e.g. I’ll) would also have been counted separately
• Key word analysis …
  – makes us focus on ‘statistical deviations from a relative norm, and ignores the significance of relatively infrequent deviations from absolute norms’ (i.e. what your given texts may have in common)
  – ignores one-off occurrences of words


Now it’s your turn…

Duplicating Culpeper (2002)


The Romeo text

• Download the “Oxford Shakespeare” version of Romeo and Juliet
  – http://www.bartleby.com/70/index38.html
  – Local copy available
• Use tags to separate stage directions from dialogue
  – Did Culpeper do this?
• Tag words spoken by each character
• Alternatively, you can use a local version I have prepared


Sample of tagged text

• <Exeunt MONTAGUE and LADY. ROMEO. >
• <Ben.> Good morrow, cousin. <\Ben.>
• <Rom.> Is the day so young? <\Rom.>
• <Ben.> But new struck nine. <\Ben.>
• <Rom.> Ay me! sad hours seem long. Was that my father that went hence so fast? <\Rom.>
• <Ben.> It was. What sadness lengthens Romeo’s hours? <\Ben.>
• <Rom.> Not having that, which having, makes them short. <\Rom.>
• <Ben.> In love? <\Ben.>
• <Rom.> Out— <\Rom.>
• <Ben.> Of love? <\Ben.>
• <Rom.> Out of her favour, where I am in love. <\Rom.>


Separating words by apostrophes

clear ‘ from this box and press OK


Making a wordlist for each character

• Start wordlist function
• Load the text
• Setting – Tags – Only part of File - “Sections to keep” – type in the start/end tags given below
  – “Ignore <*>” is the default setting, so stage directions are ignored
• Make a wordlist for
  – Romeo_TC (<Rom.>…<\Rom.>)
  – Juliet_TC (<Jul.>…<\Jul.>)
  – Capulet_TC (<Cap.>…<\Cap.>)
  – Nurse_TC (<Nurse.>…<\Nurse.>)
  – Mercutio_TC (<Mer.>…<\Mer.>)
  – Friar_L_TC (<Fri._L.>…<\Fri._L.>)
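For readers without WordSmith, the "Sections to keep" step can be approximated in a few lines of Python. This is only a rough sketch under stated assumptions: the function names `speeches` and `wordlist` are invented here, the sample is a short excerpt in the tagging format used in this session, and the tokenisation rule (letters, with word-internal apostrophes kept) is a simplification rather than WordSmith's own.

```python
import re
from collections import Counter

# A few lines in the session's tagging format (closing tags use a backslash).
SAMPLE = r"""<Exeunt MONTAGUE and LADY. ROMEO. >
<Ben.> Good morrow, cousin. <\Ben.>
<Rom.> Is the day so young? <\Rom.>
<Ben.> But new struck nine. <\Ben.>
<Rom.> Ay me! sad hours seem long. <\Rom.>"""


def speeches(text: str, tag: str) -> list[str]:
    """Return the text found between each opening and closing speaker tag."""
    pattern = re.escape(f"<{tag}>") + r"(.*?)" + re.escape(f"<\\{tag}>")
    return [s.strip() for s in re.findall(pattern, text, flags=re.DOTALL)]


def wordlist(passages: list[str]) -> Counter:
    """Count lower-cased word forms, keeping contractions such as I'll whole."""
    tokens = []
    for passage in passages:
        tokens.extend(re.findall(r"[a-z]+(?:['’][a-z]+)*", passage.lower()))
    return Counter(tokens)


romeo_tc = wordlist(speeches(SAMPLE, "Rom."))
print(sum(romeo_tc.values()), "running words for Romeo")
print(romeo_tc.most_common(3))
```

Because the stage direction has no matching closing tag, it never falls inside a kept section, which mirrors the effect of ignoring stage directions.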


Tag and markup

Only Part of file


Making a reference list for each character

• Setting – Tags – Only part of File - “Sections to cut out” – type in the start/end tags given below
  – Excluding what is said by the target character
• Make a wordlist for
  – Romeo_RC (<Rom.>…<\Rom.>)
  – Juliet_RC (<Jul.>…<\Jul.>)
  – Capulet_RC (<Cap.>…<\Cap.>)
  – Nurse_RC (<Nurse.>…<\Nurse.>)
  – Mercutio_RC (<Mer.>…<\Mer.>)
  – Friar_L_RC (<Fri._L.>…<\Fri._L.>)
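The "Sections to cut out" step amounts to deleting the target character's tagged speeches and then discarding whatever mark-up remains. A minimal Python sketch of that idea is given below; `reference_text` is a hypothetical helper written for this illustration, and stripping every remaining `<...>` span is an assumption standing in for the "Ignore <*>" default.

```python
import re

# A short excerpt in the session's tagging format.
SAMPLE = r"""<Exeunt MONTAGUE and LADY. ROMEO. >
<Ben.> Good morrow, cousin. <\Ben.>
<Rom.> Is the day so young? <\Rom.>
<Ben.> But new struck nine. <\Ben.>"""


def reference_text(text: str, tag: str) -> str:
    """Drop the target speaker's sections, then every remaining <...> tag."""
    target = re.escape(f"<{tag}>") + r".*?" + re.escape(f"<\\{tag}>")
    without_target = re.sub(target, " ", text, flags=re.DOTALL)
    without_tags = re.sub(r"<[^>]*>", " ", without_target)
    return " ".join(without_tags.split())  # normalise whitespace


romeo_rc = reference_text(SAMPLE, "Rom.")
print(romeo_rc)
```

The result is the play minus Romeo's contributions, with stage directions and speaker tags removed, ready to be turned into a reference wordlist.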


Running words

Character        In our file   Culpeper (2002)
Romeo            4842          5031
Juliet           4438          4564
Friar Lawrence   2860          2901
Capulet          2282          2292
Nurse            2250          2369
Mercutio         2169          2254


Discrepancies: Some explanations

• Different tagging
  – We ignored stage directions
  – We tried what Culpeper (2002) suggested at the end of his paper, treating contracted words such as “I’ll” as two words
• A potential problem of this approach with Shakespearean texts
  – danc’d, disturb’d, rais’d, etc. all became two words!
  – Is there a need to annotate the text?
    • Not done here or in Culpeper (2002), but worth the effort
      – the city’s side
      – let’s away
      – Where’s this girl?
• Want to have a try?
  – http://ucrel.lancs.ac.uk/claws/trial.html
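The effect of treating the apostrophe as a word separator can be seen with a small Python sketch. The `tokenize` function and the test phrase are made up for this illustration; the regular expressions are a simplification and do not reproduce WordSmith's tokeniser.

```python
import re


def tokenize(text: str, split_on_apostrophe: bool) -> list[str]:
    """Split text into word tokens, optionally breaking words at apostrophes."""
    if split_on_apostrophe:
        pattern = r"[A-Za-z]+"
    else:
        pattern = r"[A-Za-z]+(?:['’][A-Za-z]+)*"
    return re.findall(pattern, text)


phrase = "I'll say she danc'd by the city's side"  # made-up test phrase
kept = tokenize(phrase, split_on_apostrophe=False)
split = tokenize(phrase, split_on_apostrophe=True)
print(len(kept), "tokens with apostrophes kept:", kept)
print(len(split), "tokens with apostrophes as separators:", split)
```

With apostrophes as separators, fragments such as "d", "ll" and "s" become tokens in their own right, which inflates the running-word totals and can push such fragments into the keyword lists.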


Top 10 on wordlists

[Figure panels: Romeo, Juliet, Capulet; Nurse, Mercutio, Friar L, whole play]


Keyword settings

Selected statistic formula

Min. Frequency

Cutoff p value


Making a keyword list per character

• Romeo_kw
  – Romeo_TC + Romeo_RC
• Juliet_kw
  – Juliet_TC + Juliet_RC
• Capulet_kw
  – Capulet_TC + Capulet_RC
• Nurse_kw
  – Nurse_TC + Nurse_RC
• Mercutio_kw
  – Mercutio_TC + Mercutio_RC
• Friar_L_kw
  – Friar_L_TC + Friar_L_RC
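Each keyword list compares a character's wordlist (TC) with the matching reference wordlist (RC). The Python sketch below illustrates the idea with the log-likelihood statistic, which is one common choice for keyness; which formula, minimum frequency and cutoff were actually selected in the session's settings is not stated here, so the defaults below are assumptions (3.84 is the chi-square critical value for p < 0.05 with one degree of freedom, and `min_freq` is a simple combined-frequency floor). The toy counts are invented.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood for a word occurring a times in a target text of c words
    and b times in a reference corpus of d words."""
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / expected_target)
    if b:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def keywords(target: Counter, reference: Counter,
             min_freq: int = 2, cutoff: float = 3.84) -> list[tuple[str, float]]:
    """Return (word, signed score) pairs; negative scores mark negative keywords."""
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]
        if a + b < min_freq:
            continue
        score = log_likelihood(a, b, c, d)
        if score >= cutoff:
            sign = 1 if a / c > b / d else -1
            scored.append((word, sign * score))
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Invented toy counts, only to show the call; they are not from the play.
toy_tc = Counter({"love": 10, "the": 20})
toy_rc = Counter({"love": 2, "the": 80, "go": 18})
for word, score in keywords(toy_tc, toy_rc):
    print(f"{word:6s} {score:8.2f}")
```

Words that are relatively more frequent in the target than in the reference come out as positive keywords, and words that are relatively less frequent come out as negative keywords.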


Romeo’s keywords by keyness

• Positive keywords
  – Aboutness: beauty, love, blessed, dream, joy, sin, kiss, death, poison, soul …
  – Love talk: dear, farewell, stars
  – Body parts: eyes, lips, hand
  – Pronouns: mine, me, thine, thee, my
• Negative keywords
  – Himself: Romeo, he, him
  – Both: you, we
  – Movement: come, go, up


Juliet’s keywords by keyness

• Positive keywords
  – People in interaction: nurse, Romeo, sweet, husband, mother, father
  – State of mind: if, or, be, yet, would
  – Pronouns: my, I, thou
  – Aboutness: news, words, night, swear, send, tongue, speak
• Negative keywords
  – Herself: her
  – Both: we, you
  – Movement: here, go


Why “nurse” and “husband”?

(vocative function)


You-forms vs. thou-forms

• You-forms vs. thou-forms
  – Plural: ye, you, your, yours, yourself
  – Singular: thou, thee, thy, thine, thyself
• You-forms vs. thou-forms (thou, thine, thee) – socio-pragmatic implications
  – Romeo and Juliet prefer thou-forms (positive) and avoid you-forms (negative)
    • High status social equals use you-forms
    • You-forms are dispassionate and emotionally unmarked
    • Thou-forms are strongly expressive: positive (affection and love) or negative (anger and contempt) – intimacy, love talk
  – Friar Laurence prefers thou-forms: He is engaged in intimate and emotionally charged discourse
  – Capulet and the Nurse prefer you-forms: used among social superiors, or individuals of low status talking to people of high social status
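A quick way to profile a stretch of speech for the two pronoun sets listed above is to count their members directly. The Python sketch below does this for one well-known line of Juliet's; the function name `pronoun_profile` is invented for the illustration, and the simple letters-only tokenisation is an assumption.

```python
import re
from collections import Counter

# The two pronoun sets as listed on the slide.
YOU_FORMS = {"ye", "you", "your", "yours", "yourself"}
THOU_FORMS = {"thou", "thee", "thy", "thine", "thyself"}


def pronoun_profile(text: str) -> dict[str, int]:
    """Count you-forms and thou-forms in a stretch of text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        "you-forms": sum(counts[w] for w in YOU_FORMS),
        "thou-forms": sum(counts[w] for w in THOU_FORMS),
    }


juliet_line = "wherefore art thou Romeo? Deny thy father and refuse thy name"
print(pronoun_profile(juliet_line))
```

Run over each character's extracted speeches, the same counts give a rough check on the you/thou preferences that the keyword lists point to.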


Capulet’s keywords by keyness

• Positive keywords
  – Directions: go, haste, make, now, look (imperatives)
  – Pronouns: you, we, her, our (directing and speaking on behalf of the household)
  – etc…
  – [you vs. thou: imperative; less emotional]
• Negative keywords
  – Pron: thy, thou
  – Others: the, of, that, etc.
  – [full of actions, not a ‘nouny’ style]


Nurse’s keywords by keyness

• Positive keywords
  – Emotional: ay, ah, O, God, woeful, warrant, faith
  – Pronouns: you, your, he, I
  – Address terms: lady, madam, lord, sir
  – Why “day”? - “O day! O day! O day! O hateful day!”
• Negative keywords
  – Pron: thou
  – Why ’d?


Why “d”?

Culpeper might have made the correct decision to treat contractions as one word?


Mercutio’s keywords by keyness

• Positive keywords
  – “Noun-y” style: a, of, the, an – akin to written, less interactive
• Negative keywords
  – Less interactive style:
    • Lack of question word: what
    • Lack of 1st person pron: I, my


Friar L’s keywords by keyness

• Positive keywords
  – Pronouns: thy, thyself, thou - involved in intimate and emotionally charged discourse, "emotional mirror"
  – A man of the Church: heaven, from (heaven)
  – Roles he played in facilitating the plot: Mantua, letter
• Negative keywords
  – Less emotional (than Nurse): O
  – Pronouns: my, you, I


Planning your own study …

What should I do first …?

– Choose your data and/or tool

– Determine what interests you about the data

– Come up with some “hypotheses” that you’d like to test out

• This can be data-driven (what seems to “jump out” at you from your data)

• This can be theory-oriented (i.e. testing out something about the language that’s taken for granted)