Using corpora in critical discourse analysis Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com


Using corpora in critical discourse analysis

Corpus Linguistics

Richard Xiao

lancsxiaoz@googlemail.com

Aims of this session

• Lecture
– Corpora versus critical discourse analysis
– The state of the art of corpus-based discourse studies
– Case study: How is Islam constructed in the UK and US press before and after 9/11?

• Lab session
– Using Wmatrix to explore political discourse: Michael Howard’s and Tony Blair’s farewell speeches to their parties

Critical discourse analysis (CDA)

• Discourse
– Language use above the sentence level
– Language use in context
– Real language use

• CDA examines language as a form of cultural and social practice, focusing on the relationship between power and discourse, and between language and ideology

CL vs. CDA

• Both rely heavily on real language

• ‘a cultural divide’ (Leech 2000: 678-680)

– CDA emphasizes the integrity of text while CL tends to use representative samples

– CDA is primarily qualitative while corpus linguistics is essentially quantitative

– CDA focuses on the contents expressed by language while CL is interested in language (form) per se

– The collector, transcriber and analyst are often the same person(s) in CDA while this is rarely the case in CL

– The data used in CDA is rarely widely available while corpora are typically made widely available

A diminishing divide…

• Some important ‘points of contact’ (McEnery and Wilson 2001: 114)
– The common computer-aided analytic techniques
– The great potential of standard corpora in CDA as control data

Use of corpora in CDA: pros and cons

• Cons…
– The corpus-based approach tends to obscure ‘the character of each text as a text’ and ‘the role of the text producer and the society of which they are a part’ (Hunston 2002: 110)
  • CL focuses on text, not text producer
– Analyzing a lot of text from a corpus simultaneously would force the analyst to lose ‘contact with text’ (Martin 1999: 52)

• Pros…
– Corpora present a real opportunity to discourse analysis, because the automatic analysis of a large number of texts at one time ‘can throw into relief the non-obvious in a single text’ (Partington 2003: 7)

Use of corpora in CDA: pros and cons

• Pros
– ‘Obviously, the methods for doing a ‘critical discourse analysis’ of corpus data are far from established yet. Even when we have examined a fairly large set of attestations, we cannot be certain whether our own interpretations of key items and collocations are genuinely representative of the large populations who produced the data. But we can be fairly confident of accessing a range of interpretative issues that is both wider and more precise than we could access by relying on our own personal usages and intuitions. Moreover, when we observe our own ideological position in contest with others, we are less likely to overlook it or take it for granted.’ (de Beaugrande 1999: 287)

CL and CDA: interaction and synergy

• Partington (2003: 12) proposes a scalar view of the uses of CL, pointing towards a rationale for using CL-related methods to carry out CDA
– ‘At the simplest level, corpus technology helps find other examples of a phenomenon one has already noted. At the other extreme, it reveals patterns of use previously unthought of. In between, it can reinforce, refute or revise a researcher’s intuition and show them why and how much their suspicions were grounded.’

• Partington (2004, 2006) provides a systematic description of CADS (corpus-assisted discourse studies)

CL and CDA: interaction and synergy

• CL and CDA are complementary, and their interaction benefits both areas of research

• CL can provide a general ‘pattern map’ of the data, mainly in terms of frequencies, key words/clusters and collocations, as well as their diachronic development (the latter contributing to the historical perspective in DHA: Discourse Historical Approach represented and pioneered by Ruth Wodak), which helps pinpoint specific periods for text selection or sites of interest

• The CDA analysis can point towards patterns to be further explored through the CL lens and also provide explanations for corpus findings


CL and CDA: interaction and synergy

• CL can also examine frequencies (or at least provide strong indicators of the frequency) of specific phenomena recognized in CDA (e.g., topoi, topics, metaphors) by examining lexical patterns

• CL can add a quantitative dimension to CDA to make it more objective

• CL in general, and concordance analysis in particular, can benefit from exposure to, and familiarity with, CDA analytical techniques


CL and CDA: interaction and synergy

• CL needs to be supplemented by the close analysis of selected texts using CDA theory and methodology

• CDA, in turn, can benefit from incorporating more objective, quantitative CL approaches, as quantification can reveal the degree of generality of, or confidence in, the study findings and conclusions in CDA


Possible stages in CADS

Baker et al (2008: 295)

Construction of Islam in UK and US press around 9/11

• How do news stories construct Islam?
• Have there been any changes before and after 9/11?
• Are there differences between reporting on Islam (as a religion) and Muslims (as a people)?
• Are there any differences/similarities between tabloids and broadsheets?
• Are there any differences/similarities between American and British newspapers?

Why Islam?

• Post WWII – demand for unskilled labour results in migration of Pakistani and Bangladeshi Muslims to the UK

• In April 2001 the former British Foreign Secretary Robin Cook reported that Britain’s national dish is chicken tikka masala

• September 2001 – terrorist attacks on the US, believed to be associated with Islamic extremists

• July 2005 – terrorist attacks on UK

Data

• UK and US newspapers in 1998-2005 (pre- and post-9/11)

• 87 million words of British news
– Broadsheets (65 M words): The Business, The Guardian, The Independent & Independent on Sunday, The Observer, The Times & Sunday Times, Daily Telegraph & Sunday Telegraph
– Tabloids (22 M words): The Daily Express & Sunday Express, The Daily Mail & Mail on Sunday, Daily Mirror & Sunday Mirror, The People, Daily Star & Sunday Star, The Sun

• 40 million words of American news
– Financial Times, New York Times, Washington Post, San Francisco Chronicle


Search terms related to Islam

• Alah OR Allah OR ayatolah OR burka! OR burqa! OR chador! OR fatwa! OR hejab! OR imam! OR islam! OR Koran OR Mecca OR Medina OR Mohammedan! OR Moslem! OR Muslim! OR mosque OR mufti! OR mujaheddin! OR mujahedin! OR mullah! OR muslim! OR Prophet Mohammed OR Q'uran OR rupoush OR rupush OR sharia OR shari'a OR shia! OR shi-ite! OR Shi'ite! OR sunni! OR the Prophet OR wahabi OR yashmak! AND NOT Islamabad AND NOT shiatsu AND NOT sunnily
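
As a rough illustration of how such a query behaves, the sketch below approximates a small subset of the search terms in Python. It rests on two assumptions that the slide does not state: that a trailing ! is a truncation wildcard (the stem plus any word ending), and that AND NOT excludes a whole article containing the excluded term. The names INCLUDE, EXCLUDE, term_to_regex and matches_query are hypothetical and not part of the original study.

```python
import re

# Subset of the slide's search terms. A trailing "!" is ASSUMED to be a
# truncation wildcard: the stem followed by any word characters.
INCLUDE = ["Allah", "burqa!", "imam!", "islam!", "Koran", "Moslem!",
           "Muslim!", "mosque", "shia!", "sunni!"]
# Terms after AND NOT; ASSUMED to exclude the whole article.
EXCLUDE = ["Islamabad", "shiatsu", "sunnily"]


def term_to_regex(term):
    """Turn one query term into a regex fragment."""
    if term.endswith("!"):
        return re.escape(term[:-1]) + r"\w*"
    return re.escape(term)


def compile_terms(terms):
    """Build one case-insensitive, whole-word pattern from a term list."""
    body = "|".join(term_to_regex(t) for t in terms)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


INCLUDE_RE = compile_terms(INCLUDE)
EXCLUDE_RE = compile_terms(EXCLUDE)


def matches_query(text):
    """True if any include term occurs and no exclude term occurs."""
    return bool(INCLUDE_RE.search(text)) and not EXCLUDE_RE.search(text)
```

Under these assumptions, the exclusions matter because islam! would otherwise match Islamabad, shia! would match shiatsu, and sunni! would match sunnily.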

Frequencies of articles over time

[Figure: number of articles over time, x-axis from 1998-01 to 2005-07, y-axis from 0 to 4000; labelled point: 2001-09 (printed as “2011-09”)]

Method

1. Corpora split into 4:
   UK pre 9/11 (27 million)     US pre 9/11
   UK post 9/11 (60 million)    US post 9/11
2. All sub-corpora compared to a reference corpus (BNC written – 90 million words)
3. UK sub-corpora compared with US sub-corpora
4. Keywords extracted and analysed via concordances with respect to moral panic categories
5. UK broadsheets vs. UK tabloids
6. Collocational and concordance analysis of Islam, Islamic, Muslim, Muslims
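
The collocational and concordance analysis in step 6 can be sketched in a few lines of Python. This is a minimal illustration only, assuming a naive word tokenizer and a fixed window on either side of the node word; kwic and collocates are hypothetical helper names, not the tools used in the study.

```python
import re
from collections import Counter


def kwic(text, node, width=30):
    """Return keyword-in-context lines for each occurrence of `node`."""
    lines = []
    pattern = r"\b" + re.escape(node) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def collocates(tokens, node, span=3):
    """Count words occurring within `span` tokens of `node`."""
    counts = Counter()
    node = node.lower()
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(w.lower() for w in window)
    return counts


# Invented example sentence, for illustration only.
sample = "Moderate Muslims condemned the attack. British Muslims want peace."
sample_tokens = re.findall(r"\w+", sample)
```

Calling kwic(sample, "Muslims") lists each hit with its surrounding text, while collocates(sample_tokens, "Muslims", span=1) counts the immediate neighbours. A real study would also test whether a collocate is statistically associated with the node rather than merely frequent.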

Moral panic

• Conceived by Stanley Cohen (1972) in his study of Mods and Rockers in the UK
– Violent clash between the gangs of Mods and Rockers in 1964
– Two conflicting British subcultures in the mid 1960s

• Referring to the intensity of feeling expressed by a large number of people about a specific group of people who appear to threaten the social order at a given time

Features of moral panic

• Build-up of concern over a social issue
• A scapegoat (social group)
• Solutions proposed: moral entrepreneurs
– A person who seeks to influence a social group to adopt or maintain a norm, e.g. MADD (Mothers Against Drunk Driving), and the anti-tobacco lobby
• Moral panic is often expressed as outrage rather than fear
• Emotive language is used
• Threat is normally exaggerated

McEnery’s (2005) moral panic categories

• 1. object of offence
– that which is identified as problematic

• 2. consequence
– the negative results which it is claimed will follow if the object of offence is not eliminated

• 3. corrective action
– the actions to be taken to eliminate the object of offence

McEnery’s (2005) moral panic categories

• 4. desired outcome
– the positive results which will follow from the elimination of the object of offence

• 5. moral entrepreneur
– the person/group campaigning against the object of offence

• 6. scapegoat
– that which is the cause of, or which propagates the cause of offence

• 7. rhetoric
– register marked by a strong reliance on evaluative lexis that is polar and extreme (strong language)

UK keywords pre 9/11

• No evidence of moral panic
• References to Iraq, Israel, Kosovo, Palestine
• Muslims often mentioned ‘in passing’ rather than as main subject of article
• A wider range of contexts pre 9/11
– fashion, famous, tourists, music, hotel, cricket, sex, leisure, dance, ski, museum, divorce, café, wine, gardens, film, beer, holidays, football, exotic, fun

UK - After 9/11

• British Muslims and what they believe
– ‘“The vast, vast majority, of Muslims living in the UK support policing efforts, fear terrorism and want to work with us,” said [Sir Ian].’ (The Guardian, October 29, 2004)

• Focus on belief
– moderate, militants, fanatics, fundamentalist, extremists

• Focus on immigration, political correctness and scroungerphobia (taxpayers)

UK moral panic post 9/11?

Category: Positive keywords in that category
Consequence: anger, angry, bad, bombing, bombings, conflict, crime, dead, death, destruction, died, evil, fear, fears, injured, kill, killed, killing, murder, terror, threat, victims, violence, wounded, wrong
Corrective action: arrested, fight, fighting, invasion, jail, justice, moderate, occupation, police, revenge, troops
Desired outcome: best, better, freedom, good, peace, support
Moral entrepreneur: America, American, Britain, British
Object of offence: atrocities, attack, attacks, bomb, bombs, criminal, extremism, failed, hatred, illegal, jihad, radical, regime, terrible, terrorism, weapons
Scapegoat: Arab, (suicide) bombers, enemy, extremists, immigrants, Iran, Iraq, Iraqi, Islam, mosque, Muslim, Muslims, Pakistan, Palestinian, religious, suicide, terrorists
Rhetoric: question, need, must, why


US – before 9/11

• Keywords are mainly proper nouns relating to Israel/Palestine, Bosnia, Kosovo, Indonesia.

• Peace is a keyword – focus on contexts where Muslims are aggressed against

• Muslims (occasionally cast as internal to the US)

US keywords post 9/11

Consequence: attacks, Sept
Corrective action: American, Americans, forces, intelligence, marine, marines, military, officials, (war on) terror, war (on terror)
Desired outcome: NONE
Moral entrepreneur: Bush, pentagon, United States, US
Object of offence: Terrorism
Scapegoat: (al) Qaeda, afghan, Afghanistan, al (Qaeda), bin (laden), (Saddam) Hussein, Hussein’s, insurgents, Iraq, Iraq’s, Iraqi, Iraqis, (bin) Laden, Saddam (Hussein), Shiite, Shiites, Sunni, Taliban, terrorist, terrorists
Rhetoric: NONE

Tabloids vs. Broadsheets

• Style and spelling
– Tabloids (chatty, interactive style)
  Pronouns: I, my, me, myself, we, he, she
  Emphatic adjectives: stunning, fantastic, terrible, wonderful
– Broadsheets (logical, formal, ‘nouny’ style)
  Conjunctions/determiners: the, that, which, however, thus, than
  Formal terms of address: Mr, Ms

Moslem – key in the tabloids

• 7,282 tabloid uses
• 4,834 in the Daily Mail
• 2,208 Daily Express

[Figure: frequency of Moslem(s) vs. Muslim(s) over time, x-axis from 98-01 into 2005, y-axis from 0 to 800]

‘Bin Laden’ in tabloid newspapers

• powerful (mastermind, terrorist godfather, millionaire, Al Qaeda leader)
• warrior leader (chief, warlord)
• outcast (dissident, exile, fugitive)
• insane (maniac, twisted)
• evil (gloating menace, evil, terrorist, murderous)
• fanatical (extremist, fanatic, fanatical)

Tabloid villains

• Direct references to terrorist attacks
– terror, terrorists, Taliban, Osama, Bin, Laden, bomb, bombs, bomber, bombers, plane, suicide, killers, attack, crash, hijack, September, twin and towers

• Emotive/evaluative reaction: emotionally charged lexis
– atrocity, atrocities, tragedy, carnage, horror, terrible, evil

Other tabloid categories

• Brainwashing
– lure, rant, rants, spew, rouser, brainwashed

“Children are being brainwashed into becoming Islamic extremists at 300 "Taliban schools" in Britain, it was reported last night. Youngsters are being indoctrinated with radical Islamic ideals by militant groups across the country, said leading British Muslim Dr Zaki Badawi.” (The Sun, December 28, 2001)

• Also, ‘scroungerphobia’ and political correctness


Types of belief in tabloid vs. broadsheet

• In the tabloids, Muslims are fanatics and extremists

• In the broadsheets, Muslims are radicals, fundamentalists, separatists but also moderates and progressives

Broadsheet keywords

• More focus on Islam
– The media: book, novel, television, film, poetry
– Other religions: Hindu, Christian, Buddhist, Judaism
– World events: Iran, Iraq, Iraqi, Arab, Israeli, Israel, Palestinian, Baghdad, Jerusalem, Lebanon, Syria
– War and conflict: military, conflict, army, resistance, violence, occupied, ceasefire, genocide, peace, invasion

Muslim(s) vs. Islam(ic)

• Tabloids: more focus on Muslims (the people)
– Muslims as terrorists; evil preachers, Muslims as British and desiring peace, women as victims (honor killings, arranged marriage, hijab), men as potential terrorists or victims of racism

• Broadsheets: more focus on Islam (as a religion)
– Stories on terrorism restricted to the word Islamic

Political discourse: Howard vs. Blair

• Use Wmatrix to tag the following two texts
– Tip: it is good practice to create one folder for each file

• Michael Howard’s farewell speech to his party (2005)
– Leader of the Conservative Party, 2003-2005

• Tony Blair’s farewell speech to his party (2006)
– Leader of the Labour Party, 1994-2007 (Prime Minister 1997-2007)

A quick “how to”!

(part of speech tagging, semantic tagging, frequency lists)

• Enter new workarea name (Blair / Howard)
• Click the browse button to select the right file
• Click the “upload now” button …
• A new screen will provide you with an update report … e.g.

You will then be taken to your work area [My folders]


What you’ll see in the Simple “VIEW of folder”

Click on Frequency to see the most frequent words: what are they?

You can also do concordance searches of words/phrases

--- and investigate Word clouds (= the most “key” words)

Scroll down to see Tag clouds - “key” concepts


The word cloud of Howard’s farewell speech (compared with Blair)


We use a similar method to investigate keywords (as with WordSmith)

i.e. we compare text A

… with text B

… so that we can discover the most significant items within text A

… and not only the frequent items
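
A minimal sketch of this comparison in Python is given here. It assumes that log-likelihood (LL) is the keyness statistic, as the LL cut-offs used in this session suggest, and it uses a naive tokenizer. The function names and the default cut-off are illustrative and are not Wmatrix’s own code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lower-cased runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """LL for a word occurring a times in text A (c tokens in total)
    and b times in text B (d tokens in total)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in A
    e2 = d * (a + b) / (c + d)  # expected frequency in B
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(text_a, text_b, min_ll=6.63):
    """Words relatively more frequent in A than in B, ranked by LL.
    Both texts are assumed to be non-empty."""
    freq_a, freq_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    c, d = sum(freq_a.values()), sum(freq_b.values())
    results = []
    for word, a in freq_a.items():
        b = freq_b.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                results.append((word, round(ll, 2)))
    return sorted(results, key=lambda item: -item[1])
```

A word such as "the" may top a frequency list in both texts and still not be key, because its relative frequency is similar in A and B. This is why the comparison surfaces significant items rather than merely frequent ones.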

Exploring keywords (as word clouds) in simple view

Under 3. Word clouds, scroll down the pop-up menu to choose Blair

Then click on Go

- and any keywords with LL15+ will appear


Advanced View of Howard Folder

Click on Frequency to see the most frequent words (as before)

How might we discover the most ‘frequent’ POS? Jot them down

… and the most ‘frequent’ semantic fields? Make a note of them

We can also see all of the keywords using this VIEW

--- and investigate key parts of speech (POS) and key concepts / domains


Frequency of words in Howard and Blair (using advanced view)

Make a note of the similarities and differences …

Exploring keywords using advanced view

Find the “key words compared to:” drop-down menu, and click Go

You will be taken to a web-page, which shows ALL keywords …

Keywords for Howard (when compared with Blair)

IMPORTANT
– anything above LL 15 = 99.99% confidence of significance
– anything above LL 6.63 = 99% confidence of significance

• How many keywords from the Howard text have LL values of 15+? What are they?

• How many keywords have LL values of 7+? What are they?

• Do you notice anything interesting about these keywords?

• Do any of the keywords share the same semantic fields?
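
The link between LL values and confidence levels can be checked with a short calculation. The sketch assumes, as is usual for this statistic, that LL is compared against the chi-square distribution with 1 degree of freedom; ll_to_p is a hypothetical helper name.

```python
import math


def ll_to_p(ll):
    """p-value for a log-likelihood score, ASSUMING it follows a
    chi-square distribution with 1 degree of freedom.
    For 1 d.f. the upper tail is erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(ll / 2))


# Print the p-value and confidence level for a few LL scores.
for score in (3.84, 6.63, 10.83, 15.13):
    p = ll_to_p(score)
    print(f"LL {score:>5}: p = {p:.5f}, confidence = {100 * (1 - p):.2f}%")
```

On this reading, LL 6.63 corresponds to p of about 0.01 (99% confidence), and p = 0.0001 (99.99%) is reached at roughly LL 15.13, so "LL 15" works as a rounded cut-off.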


Same procedure for key POS and key domains

Find the “key POS compared to:” drop-down menu, and click Go

Find the “key concepts compared to:” drop-down menu, and click Go


Exploring key domains (Howard, in comparison to Blair)

• What do you notice about the “key” domains?

• Do we capture more words by undertaking a key domain analysis than we do by undertaking a keyword analysis? And, if so, why do you think this is the case?

• Undertake a keyword analysis of Blair (using Howard as the reference corpus) to determine the differences between the two speeches
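
One way to think about the key domain question is sketched below in Python. It assumes that a key domain analysis pools the counts of all words sharing a semantic tag before applying the same LL test. The mini-lexicon and the counts are invented for illustration; they are not the actual Wmatrix tagset and not figures from either speech.

```python
import math
from collections import Counter

# HYPOTHETICAL mini-lexicon (word -> domain label); not the real tagset.
DOMAIN = {"tax": "MONEY", "taxes": "MONEY", "spending": "MONEY",
          "budget": "MONEY", "debt": "MONEY"}


def log_likelihood(a, b, c, d):
    """LL for frequency a in text A (c tokens) vs. b in text B (d tokens)."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def domain_counts(word_counts):
    """Pool word frequencies by semantic domain."""
    totals = Counter()
    for word, n in word_counts.items():
        if word in DOMAIN:
            totals[DOMAIN[word]] += n
    return totals


# INVENTED toy counts: five related words, each rare in text A, absent in B.
counts_a = Counter({"tax": 3, "taxes": 3, "spending": 3, "budget": 3, "debt": 3})
counts_b = Counter()
SIZE_A = SIZE_B = 1000  # assumed total tokens in each text

word_ll = {w: log_likelihood(n, counts_b[w], SIZE_A, SIZE_B)
           for w, n in counts_a.items()}
domain_ll = log_likelihood(domain_counts(counts_a)["MONEY"],
                           domain_counts(counts_b)["MONEY"], SIZE_A, SIZE_B)
```

With these toy figures no single word reaches the LL 6.63 cut-off, but the pooled domain exceeds LL 15. This is one plausible reason why a key domain analysis can capture low-frequency words that a keyword analysis misses.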