Political Text Analysis
Lecture 4

Kohei Watanabe


Statistical analysis of text
Why is it possible?


Distributional hypothesis

• Distributional Structure (Harris 1954)
  • Cooccurrence of words offers a complete description of language without considering its historical or psychological aspects
  • We do not need to treat words as symbols to understand them
• Words do not cooccur randomly
  • For a word, the surrounding words are its environment (context)
  • Words cooccur in a certain environment because of semantic necessity
    • e.g. “Liberty without learning is always in peril, and learning without liberty is always in vain” (John F Kennedy).
• Words are synonyms if they occur in the same environment
  • e.g. “A president’s hardest task is not to do what is right, but to know what is right” (Lyndon Johnson).
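The Johnson quote can illustrate what "the same environment" means. The sketch below is not from the lecture: it assumes whitespace tokenization and a one-word window on each side, and the helper name `environments` is hypothetical. It lists the word pairs that share at least one identical environment.

```python
from collections import defaultdict

def environments(tokens, window=1):
    """Map each word to the set of (left context, right context) pairs it occurs in."""
    env = defaultdict(set)
    for i, word in enumerate(tokens):
        left = tuple(tokens[max(0, i - window):i])
        right = tuple(tokens[i + 1:i + 1 + window])
        env[word].add((left, right))
    return env

quote = ("a president's hardest task is not to do what is right "
         "but to know what is right")
env = environments(quote.split())

# Pairs of distinct words that occur in at least one identical environment
shared = sorted((a, b) for a in env for b in env if a < b and env[a] & env[b])
print(shared)
```

In this toy case only "do" and "know" share an environment ("to _ what"), which is the sense in which the slide treats them as distributionally similar.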


Proksch et al., forthcoming

• The authors wanted to perform sentiment analysis of European legislative debates
• The legislatures are those of the Czech Republic, Finland, Germany, the Netherlands, Spain, Sweden, and the United Kingdom
• Dictionaries are commonly used in sentiment analysis, but the Lexicoder Sentiment Dictionary (LSD) is available only in English
• The authors translated the LSD into 20 languages
  • Used the Google Translate API
  • Checked the translations by comparing the frequency of words in the multilingual European Parliament debates
  • Assumed that the frequency distributions will be the same if the dictionaries are equivalent


Sentiment analysis in English


Sentiment analysis in German


Potential issues and limitations

• Be careful about the use of APIs
  • We do not really know how the system behind the API works
    • There is no transparency
  • The results will change if the system is updated
    • Reproduction would be difficult
• Machine translation does not always work
  • Algorithms disambiguate words based on context
    • A lexicon does not have contextual information
  • Accuracy of translation depends on the language pair
    • Translation between European and Asian languages is difficult


Types of statistical analysis (1)

• Non-positional analysis (“bag-of-words”)
  • Analysis is based solely on the frequency of words in documents
  • Information about the relative positions of words is discarded
  • The mainstream of current text analysis
• For example,
  • Simple frequency
  • Relative frequency
  • Document/feature similarity
  • Document classification/scaling
• This simplistic approach makes computation and statistical estimation easier
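A minimal sketch of the bag-of-words idea, not code from the lecture: the two documents are hypothetical and the helper name `build_dfm` is an assumption. It counts words per document and shows that two texts with the same words in a different order become identical rows.

```python
from collections import Counter

def build_dfm(docs):
    """Return (features, rows): a sorted vocabulary and one count vector per document."""
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    features = sorted(set().union(*counts.values()))
    rows = {name: [c[f] for f in features] for name, c in counts.items()}
    return features, rows

# Hypothetical documents: same words, different order
docs = {
    "doc1": "government taxes the bank",
    "doc2": "the bank taxes government",
}
features, rows = build_dfm(docs)
print(features)
print(rows)
```

Who taxes whom is lost once the texts are reduced to counts, which is the information the slide says is discarded.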

Types of statistical analysis (2)

• Positional analysis (“string-of-words”)
  • Analysis is based on the relative positions of words and their frequency
    • Positions are defined by either contiguity (sequence) or proximity (n-word window)
  • N-grams or skip-grams preserve the positions of words in the bag-of-words framework
  • There are only a few models for positional analysis
• For example,
  • Collocation analysis
  • Word embeddings
• Positional analysis is computationally intensive
  • There will be more models for positional analysis in the future
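To make the difference between contiguity and proximity concrete, here is a small sketch, not from the lecture. The function names and the `max_skip` parameter are hypothetical; `skipgrams` returns ordered word pairs with at most `max_skip` tokens between them.

```python
def ngrams(tokens, n=2):
    """Contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, max_skip=1):
    """Ordered pairs of tokens with at most `max_skip` tokens skipped between them."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in range(i + 1, min(i + 2 + max_skip, len(tokens)))]

tokens = "more people depend on social benefits".split()
print(ngrams(tokens))     # contiguous pairs only
print(skipgrams(tokens))  # contiguous pairs plus pairs one token apart
```

Each n-gram or skip-gram can then be treated as a single feature, which is how position is kept inside an otherwise bag-of-words representation.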


Unit of cooccurrences

• The unit of cooccurrence changes the level of language phenomena that we can discover
• Document
  • Discourse
    • Discover topic, ideology, sentiment etc.
• Sentence/window (collocation)
  • Semantics (meaning of words)
    • Identify synonyms or antonyms
• Sequence (contiguous collocation)
  • Syntactical/lexical information
    • Extract multi-word expressions (phrases and proper nouns)


Frequency analysis


Frequency analysis

• Imagine we have British election manifestos from 3 parties
• In non-positional analysis, we only have the frequency of individual words
• However, we can still say many things about the texts

          government  people  bank
Party A            0       1     3
Party B            2       3     1
Party C            2       1     3

Frequency analysis

• The document-feature matrix (DFM) tells us that
  • A and C discuss banks more than people, but B does the opposite
    • A and C are probably conservative, but B is liberal
  • B and C mention the government but A does not
    • A might be the ruling party, and B and C the opposition


Interpretation of word frequency

• The DFM with marginals tells us that
  • B and C are longer than A
    • Opposition parties are criticising the government
  • Parties talk more about banks than people (7 vs 5)
    • Economic policy is more salient than social policy in the election

          government  people  bank  Length
Party A            0       1     3       4
Party B            2       3     1       6
Party C            2       1     3       6
Total              4       5     7      16
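The marginals can be recomputed directly from the three party rows. The sketch below is illustrative only (the variable names are assumptions): it sums each row for document length and each column for the feature total.

```python
features = ["government", "people", "bank"]
dfm = {
    "Party A": [0, 1, 3],
    "Party B": [2, 3, 1],
    "Party C": [2, 1, 3],
}

# Row marginals: document length
lengths = {party: sum(row) for party, row in dfm.items()}
# Column marginals: total frequency of each feature
totals = {f: sum(row[j] for row in dfm.values()) for j, f in enumerate(features)}
grand_total = sum(lengths.values())

print(lengths)
print(totals)
print(grand_total)
```

The column totals must add up to the same grand total as the row totals, which is a quick consistency check for any DFM with marginals.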


Normalized frequency

• The normalized DFM tells us the salience of words
  • Bank: A > C > B
  • People: B > A > C
  • Government: B = C > A

          government  people  bank
Party A         0.00    0.25  0.75
Party B         0.33    0.50  0.17
Party C         0.33    0.17  0.50
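Normalization here means dividing each count by the length of its document. A minimal sketch under that assumption, with illustrative variable names:

```python
features = ["government", "people", "bank"]
dfm = {
    "Party A": [0, 1, 3],
    "Party B": [2, 3, 1],
    "Party C": [2, 1, 3],
}

# Divide every count by its row total so that each row sums to 1
normalized = {party: [count / sum(row) for count in row] for party, row in dfm.items()}
for party, row in normalized.items():
    print(party, [f"{share:.2f}" for share in row])

# Rank parties by the salience of "bank"
bank = features.index("bank")
ranking = sorted(normalized, key=lambda party: normalized[party][bank], reverse=True)
print(ranking)
```

Parties B and C use "bank" 1 and 3 times in documents of equal length, but A's 3 mentions weigh more because its document is shorter.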

Exercise 1

• How do you interpret the frequency of words?
  • Hint: Be careful about the meanings of “benefits”

          immigrants  economic  benefits  crime
Party A            3         2         1      0
Party B            3         1         1      1
Party C            4         0         3      2
Total             10         3         5      3

Interpretation

• Examples
  • A says migrants contribute to economic growth
  • B talks about immigration with little attention to other issues
  • C associates migrants with crimes such as welfare fraud (“social benefits”)


Document/feature similarity

• Document/feature similarity is the ‘correlation’ between pairs of vectors for documents or features
• Similarity measures include Pearson’s correlation, cosine, Jaccard, Dice, Hamann, Faith etc.

          government  people  bank
Party A            0       1     3
Party B            2       3     1
Party C            2       1     3

Document/feature similarity

• The document/feature similarity matrix is symmetric
• Diagonal elements (similarity of an item to itself) are all 1.0

            government  people   bank
government        1.00    0.50  -0.50
people            0.50    1.00  -1.00
bank             -0.50   -1.00   1.00

          Party A  Party B  Party C
Party A      1.00    -0.65     0.65
Party B     -0.65     1.00    -1.00
Party C      0.65    -1.00     1.00

Document/feature similarity

• Formal definition of document/feature similarity:

1. Let the DFM be $X$
2. Document and feature vectors are

$$D_i = (x_{i,j},\ x_{i,j+1},\ x_{i,j+2}), \qquad F_j = (x_{i,j},\ x_{i+1,j},\ x_{i+2,j})$$

3. Document and feature similarities are

$$\text{document similarity} = \operatorname{sim}(D_m, D_n), \qquad \text{feature similarity} = \operatorname{sim}(F_m, F_n)$$

4. There are a few similarity measures


Similarity measures

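• Pearson’s correlation

$$\operatorname{cor}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

• Cosine

$$\operatorname{cosine}(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$$

• Jaccard, Dice, Hamann, Faith etc.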
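As a check on the two formulas, the sketch below implements them in plain Python and applies them to the three-party DFM used in the earlier slides. The function and variable names are illustrative; Pearson's correlation gives the values shown in the similarity matrices.

```python
import math

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def cosine(x, y):
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

names = ["government", "people", "bank"]
dfm = {"Party A": [0, 1, 3], "Party B": [2, 3, 1], "Party C": [2, 1, 3]}
# Feature vectors are the columns of the DFM
features = {name: [row[j] for row in dfm.values()] for j, name in enumerate(names)}

print(round(pearson(dfm["Party A"], dfm["Party B"]), 2))                 # document similarity
print(round(pearson(features["government"], features["people"]), 2))    # feature similarity
print(round(cosine(dfm["Party A"], dfm["Party C"]), 2))                  # cosine on raw counts
```

Note that cosine on the same raw counts gives a different number from Pearson's correlation because it does not subtract the mean.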


Document/feature similarity

• Cosine similarity is a spatial measure
• The angle between two lines becomes 0° if all features are the same
  similarity = cos 0° = 1
• The angle between two lines becomes 90° if no features are shared
  similarity = cos 90° = 0
• Cosine similarity is between 0 and 1.0 for raw frequency counts

[Figure: Doc A and Doc B drawn as lines in a plane with axes Feature 1 and Feature 2]

Interpretation of document similarity

• Document similarity in the DFM tells us that
  • A is most similar to C
    • A and C pursue similar policies (both conservative parties)
  • B is most dissimilar to C
    • B emphasises its difference from the rival opposition party in the campaign

          Party A  Party B  Party C
Party A      1.00    -0.65     0.65
Party B     -0.65     1.00    -1.00
Party C      0.65    -1.00     1.00

Interpretation of feature similarity

• Feature similarity in the DFM tells us that
  • “Government” is more similar to “people” than to “bank”
    • The government is discussed in the context of social policy rather than economic policy
  • “People” and “bank” are most dissimilar
    • They represent different policy agendas

            government  people   bank
government        1.00    0.50  -0.50
people            0.50    1.00  -1.00
bank             -0.50   -1.00   1.00

Exercise 2

• How do you interpret this similarity matrix?

            immigrants  economic  benefits  crime
immigrants        1.00     -0.87      1.00   0.87
economic         -0.87      1.00     -0.87  -1.00
benefits          1.00     -0.87      1.00   0.87
crime             0.87     -1.00      0.87   1.00

Interpretation

• Examples
  • High similarity between “immigrants” and “crime”
    • Immigrants are framed negatively as potential criminals
  • High dissimilarity between “benefits” and “economic”
    • In the election campaign, “benefits” is not about economic benefit


Jansa et al. 2018
Copy and Paste Lawmaking: Legislative Professionalism and Policy Reinvention in the States


Research question

• RQ: How much does the professionalization of state legislatures affect the adoption of bills from other states?

• US states with less professional legislatures are more likely to copy bills from other states


Data

• 400 bills for 12 policies that diffused across the US states between 1982 and 2014

• State legislature characteristics
  • Ideology of the government
  • Salary of legislators
  • Existence of lobby groups
  • Staff funding
  • Professionalism indices
    • Combine staff resources, session length, and salary


Analysis

• Compute cosine similarity of bills
  • Compute similarity between new and old bills in different legislatures
• Included similarity scores in a two-stage regression model
  • First stage
    • Predicted adoption of bills by state legislature characteristics
  • Second stage
    • Explained the largest similarity score by
      • Salary of legislators (Salary)
      • Professionalism index (Professionalism)
      • Staff funding (Staff Expenditure)
      • Length of meetings (Session Length)
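The first step, finding the largest similarity between a new bill and earlier bills from other states, can be sketched as follows. This is not the authors' code: the bills are hypothetical, tokenization is a plain whitespace split, and `max_similarity_to_earlier` is an illustrative helper name.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical bills: (state, year, text)
bills = [
    ("State X", 1990, "no person shall smoke in a public building"),
    ("State Y", 1995, "no person shall smoke in a public building or vehicle"),
    ("State Z", 1998, "smoking is prohibited in restaurants and bars"),
]

def max_similarity_to_earlier(bills):
    """For each bill, the largest cosine similarity to an earlier bill from another state."""
    vectors = [Counter(text.split()) for _, _, text in bills]
    result = {}
    for i, (state, year, _) in enumerate(bills):
        earlier = [cosine(vectors[i], vectors[j])
                   for j, (other_state, other_year, _) in enumerate(bills)
                   if other_year < year and other_state != state]
        if earlier:
            result[state] = max(earlier)
    return result

scores = max_similarity_to_earlier(bills)
print({state: round(score, 2) for state, score in scores.items()})
```

In this toy data State Y's bill is nearly a copy of State X's, while State Z's bill is written largely from scratch. Scores of this kind are what the second-stage regression takes as its dependent variable.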


Conclusion

• High professionalism and staff expenditure make state legislatures less likely to copy bills from other states


Relative frequency analysis


More realistic DFM

• Usually interesting features are infrequent
  • The most frequent words are “the” and “policy”
• We can remove “the” using a stop list
  • It is difficult to remove words like “policy”

          government  people  bank  the  policy
Party A            0       1     3   10       5
Party B            2       3     1   12       6
Party C            2       1     3   11       6

Keyness

• Keyness is a relative frequency measure in WordSmith (Scott 2006)
• It compares the frequency of words between two groups of documents

            Chi-square     p  Party A  Party B & C
bank              0.25  0.62        3            4
policy            0.03  0.85        5           12
the               0.01  0.94        9           23
people           -0.16  0.69        1            4
government       -0.49  0.48        0            4

Computing keyness

• Keyness is a signed categorical association measure
  • Chi-squared
  • Likelihood ratio (G score)
  • Point-wise mutual information (PMI)
• Documents are separated into target and reference groups

                   Word j            Words not j            Total
Document i         $x_{ij}$          $x_{i\hat{j}}$         $x_{i\cdot}$
Documents not i    $x_{\hat{i}j}$    $x_{\hat{i}\hat{j}}$   $x_{\hat{i}\cdot}$
Total              $x_{\cdot j}$     $x_{\cdot\hat{j}}$     $x_{\cdot\cdot}$

Computing keyness

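• Chi-squared
  • Given expected values $e_{ij}, e_{i\hat{j}}, e_{\hat{i}j}, e_{\hat{i}\hat{j}}$ computed from the marginals

$$\chi^2 = \frac{(x_{ij} - e_{ij})^2}{e_{ij}} + \frac{(x_{i\hat{j}} - e_{i\hat{j}})^2}{e_{i\hat{j}}} + \frac{(x_{\hat{i}j} - e_{\hat{i}j})^2}{e_{\hat{i}j}} + \frac{(x_{\hat{i}\hat{j}} - e_{\hat{i}\hat{j}})^2}{e_{\hat{i}\hat{j}}}$$

$$\text{keyness} = \begin{cases} \chi^2, & x_{ij} \ge e_{ij} \\ -\chi^2, & x_{ij} < e_{ij} \end{cases}$$

• Point-wise mutual information (PMI)
  • When $p_{ij}, p_{i\cdot}, p_{\cdot j}$ are the joint and marginal probabilities of $x_{ij}$

$$\text{keyness} = \mathrm{PMI} = \log \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}}$$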
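The two formulas can be sketched in plain Python using the counts from the keyness table (target: Party A, reference: Party B & C). The function names are illustrative. The sketch applies the signed chi-squared exactly as written, with no continuity correction, so it matches the table for "policy", "the" and "people" but not for "bank" and "government". That pattern would be consistent with a correction having been applied where expected counts are small, but this is an inference, not something the slides state.

```python
import math

# Counts taken from the keyness table (target: Party A, reference: Party B & C)
target = {"bank": 3, "policy": 5, "the": 9, "people": 1, "government": 0}
reference = {"bank": 4, "policy": 12, "the": 23, "people": 4, "government": 4}

def contingency(word):
    """2x2 cells: (word, other words) in the target, then (word, other words) in the reference."""
    a = target[word]
    b = sum(target.values()) - a
    c = reference[word]
    d = sum(reference.values()) - c
    return a, b, c, d

def signed_chi2(a, b, c, d):
    n = a + b + c + d
    observed = (a, b, c, d)
    expected = ((a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Positive if the word is over-represented in the target group
    return chi2 if a >= expected[0] else -chi2

def pmi(a, b, c, d):
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # log of zero joint probability
    return math.log((a / n) / (((a + b) / n) * ((a + c) / n)))

for word in target:
    cells = contingency(word)
    print(f"{word}: chi2={signed_chi2(*cells):.2f}, pmi={pmi(*cells):.2f}")
```

Both measures agree on the sign: words that Party A uses more than expected are positive, and words it uses less than expected are negative.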


Collocation analysis


Collocation analysis

• Positional analysis exploits two types of relationships
  • Contiguous relationship
    • “more people”, “people depend”, “depend on” etc.
  • Non-contiguous relationship
    • “more” & “people”, “more” & “depend”, “more” & “on”, etc.
• These are called “collocations”
  • Collocations are recorded in a feature-cooccurrence matrix (FCM)

Example text: More people depend on social benefits without economic growth… We need to stimulate economic growth through social and economic reforms.

Contiguous collocations

• (Contiguous) collocations count sequences of words
  • Counting of contiguous collocations is usually ordered
  • Usually low-level linguistic phenomena
• Frequent collocations are multi-word expressions
  • “social benefits”, “economic growth”

          social  benefits  economic  growth
social         0         1         0       0
benefits       0         0         0       0
economic       0         0         0       2
growth         0         0         0       0
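The ordered counts for contiguous collocations can be reproduced from the example text with a few lines of Python. This is an illustrative sketch: it assumes lower-casing, whitespace tokenization, and that sequences do not cross the sentence boundary.

```python
from collections import Counter

sentences = [
    "More people depend on social benefits without economic growth",
    "We need to stimulate economic growth through social and economic reforms",
]
targets = ["social", "benefits", "economic", "growth"]

# Count ordered pairs of adjacent words within each sentence
bigrams = Counter()
for sentence in sentences:
    tokens = sentence.lower().split()
    bigrams.update(zip(tokens, tokens[1:]))

# Rows are the first word, columns the second word
fcm = [[bigrams[(first, second)] for second in targets] for first in targets]
for word, row in zip(targets, fcm):
    print(f"{word:<9}", row)
```

Because the counting is ordered, "economic growth" is recorded in the "economic" row only, and the matrix is not symmetric.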

Contiguous collocation analysis

• For sequences of 2 words
  • Two-by-two contingency tables
    • For words $x$ and $y$, there are four patterns $xy$, $x\bar{y}$, $\bar{x}y$, $\bar{x}\bar{y}$ to consider
    • Similar to computing keyness (Chi-squared, PMI, likelihood ratio)
• For sequences of 3 words or more
  • 3-word sequences will be two-by-two-by-two cubes
  • 4 words will be…
  • Use the model proposed by Blaheta & Johnson (2001)
    • Treat collocations as interaction terms of saturated log-linear models


Non-contiguous collocation

• Non-contiguous collocations are counted within boundaries
  • Document, sentence or window ($w = 3$ in the example)
  • Counting of non-contiguous collocations is usually unordered
  • Usually high-level linguistic phenomena
• Non-contiguous collocations are often related words
  • “social” & “economic”, “economic” & “growth”

          social  benefits  economic  growth
social         0         1         3       1
benefits       1         0         1       1
economic       3         1         0       2
growth         1         1         2       0
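The unordered window counts can also be reproduced from the example text. The sketch below is illustrative and rests on two assumptions: two words count as collocated when they are at most `w = 3` tokens apart, and windows do not cross the sentence boundary. Under these assumptions the result equals the matrix above.

```python
from collections import Counter
from itertools import combinations

sentences = [
    "More people depend on social benefits without economic growth",
    "We need to stimulate economic growth through social and economic reforms",
]
targets = ["social", "benefits", "economic", "growth"]
w = 3  # window size (assumed to mean "at most w tokens apart")

pairs = Counter()
for sentence in sentences:
    tokens = sentence.lower().split()
    for (i, a), (j, b) in combinations(enumerate(tokens), 2):
        if a != b and j - i <= w:
            pairs[frozenset((a, b))] += 1  # unordered pair

fcm = [[pairs[frozenset((a, b))] if a != b else 0 for b in targets] for a in targets]
for word, row in zip(targets, fcm):
    print(f"{word:<9}", row)
```

Unlike the contiguous case, the matrix is symmetric because the order of the two words is ignored.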

Non-contiguous collocation analysis

• Non-contiguous collocations are pairs of words
  • Documents or sentences
    • The same as relative frequency analysis
  • Windows
    • Construct two-by-two contingency tables for

$$\begin{cases} |d| \ge w \\ |d| < w \end{cases}$$

      where $d$ is the distance from the target word and $w$ is the size of the collocation window
    • Compare the relative frequency of words inside and outside the window


Baker et al. 2012
Sketching Muslims: A Corpus Driven Analysis of Representations Around the Word 'Muslim' in the British Press 1998-2009


Research questions

• RQ: How did British newspapers portray Muslims and Islam?
• Earlier studies suggest that negative media representation of Muslims or Islam marginalizes Muslim people
  • Especially when the media emphasize the homogeneity of Muslim people


Data

• The authors collected articles about Muslims and Islam from 8 newspapers in 1999-2009
  • Both broadsheets and tabloids
  • Downloaded 200,037 articles using an exhaustive search query:

alah OR allah OR ayatollah! OR burka! OR burqa! OR chador! OR fatwa! OR hejab! OR imam! OR islam! OR koran OR mecca OR medina OR mohammedan! OR moslem! OR muslim! OR mosque! OR mufti! OR mujaheddin! OR mujahedin! OR mullah! OR prophetmohammed OR q’uran OR rupoush OR rupush OR sharia OR shari’a OR shia! OR shi-ite! OR shi’ite! OR sunni! OR the prophet OR wahabi OR yashmak! AND NOT islamabad AND NOT shiatsu AND NOT sunnily


Analysis

• Collocation analysis
  • Extract only nouns modified by “Muslim”
    • Used part-of-speech (POS) tagging to classify words as verbs, nouns, adjectives etc.
  • The collocation measure was log-Dice in Sketch Engine
• Simple frequency analysis
  • Count the frequencies of expressions that indicate the homogeneity of Muslim people
    • Only “Muslim community” and “Muslim world”
    • Exclude “Muslim communities”
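The simple frequency analysis can be sketched with a regular expression. This is not the authors' procedure (they worked in Sketch Engine): the sample text is hypothetical, and `count_homogeneity` is an illustrative helper.

```python
import re
from collections import Counter

# Only the singular expressions are matched; the plural "Muslim communities"
# has a different spelling and therefore does not match.
PATTERN = re.compile(r"\bmuslim (?:community|world)\b", re.IGNORECASE)

def count_homogeneity(text):
    """Count case-insensitive occurrences of the two homogeneity expressions."""
    return Counter(match.group(0).lower() for match in PATTERN.finditer(text))

# Hypothetical sample text
sample = ("The Muslim community responded quickly. Leaders across the Muslim world "
          "and local Muslim communities disagreed. The muslim community met again.")
counts = count_homogeneity(sample)
print(counts)
```

Counting the excluded plural form with a second pattern would give a benchmark for expressions that suggest heterogeneity.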


Frequency of collocation categories


Keywords in the context


Frequency of homogeneity words


Conclusions

• Newspapers portray Muslims as a distinct community that contains dangerous radical elements

• Negative representations appear in newspapers’ columns and readers’ corner sections
  • Publishers can distance themselves from the writers’ views when negative expressions appear in opinion pieces


Limitations

• There might be more expressions that indicate the homogeneity of Muslim people
  • The trend was shown only for “Muslim community” and “Muslim world”
• The increase in absolute frequency does not tell us much
  • There might be a greater increase in expressions related to heterogeneity
  • The authors could count the frequency of “Muslim communities” (excluded from the analysis) as a benchmark
• The corpus includes articles about Muslims overseas
  • A geographical classifier can be used to remove foreign news stories


References

Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.

Proksch, S.-O., Lowe, W., Wäckerle, J., & Soroka, S. (n.d.). Multilingual Sentiment Analysis: A New Approach to Measuring Conflict in Legislative Speeches. Legislative Studies Quarterly, 0(0).

Scott, M. (2006). WordSmith Tools. Oxford, UK: Oxford University Press.

Blaheta, D., & Johnson, M. (2001). Unsupervised Learning of Multi-Word Verbs. In Proceedings of the ACL/EACL 2001 Workshop on the Computational Extraction, Analysis and Exploitation of Collocations (pp. 54–60).

Jansa, J. M., Hansen, E. R., & Gray, V. H. (2018). Copy and Paste Lawmaking: Legislative Professionalism and Policy Reinvention in the States. American Politics Research, 1532673X18776628. https://doi.org/10.1177/1532673X18776628

Baker, P., Gabrielatos, C., & McEnery, T. (2012). Sketching Muslims: A Corpus Driven Analysis of Representations Around the Word “Muslim” in the British Press 1998-2009. Applied Linguistics, 34(3), 255–278. https://doi.org/10.1093/applin/ams048
