Language and Political Ideology


Language and Political Ideology:

A Comparison of the Word Choice of Liberal and Conservative American Columnists

Alex Nisnevich

Ling 55AC

Professor Houser

Fall 2010



Introduction

The topic of this study is the connection between language and thought, and

particularly how language is used and chosen, even subconsciously, both as a political tool

and a marker of political identity. To test the relationship between political persuasion and

language, word frequencies of a wide range of opinion pieces by American liberal and

conservative writers are analyzed to determine if there is a statistically significant

difference in the choice of words by liberals and conservatives. This data is then used to

build a computer algorithm that tries to determine the political persuasion of documents

based on their word frequencies, and the remainder of the study consists of tests run with

this algorithm.

If an effective algorithm can be constructed that accurately determines the political

persuasion of a document based on its word use, then this is a very significant result for

linguistics, because it conclusively demonstrates that American liberals and American

conservatives use language in a way that can be modeled and predicted. In this way, a

strong connection between language and political beliefs, and on a broader level between

language and thought, can be shown.

My working hypothesis is that there is a statistically significant correlation between

political identification and word choice and that it is theoretically possible for a computer

model to predict political identification based on a word-frequency analysis of a document

with a reasonable degree of accuracy. However, I still believe that personal ideology does

not entirely determine language, and so I don't expect the algorithm I develop to be any

more than 80-90% accurate in pinpointing the ideology of well-known writers.



Literature Review

The connection between language use and political ideology is loosely related to the

theory of linguistic relativity, which states that people of different linguistic communities

think differently because their language offers them different ways of expressing the world

around them (Kramsch 11). While the linguistic relativity hypothesis, or Sapir-Whorf

hypothesis, only makes a statement about different languages as they relate to different

world-views, it is conceivable that the same language can be used to encode different world-

views for different ideological groups, and this is related to linguistic relativity in that both

stem from the fact that signs, despite generally being arbitrary in form, are non-arbitrary in

their use.

The relationship between language and politics in particular was famously explored by

George Orwell, in his aptly-titled 1946 essay "Politics and the English Language." Orwell

alludes to the linguistic relativity hypothesis by writing, "If thought corrupts language,

language can also corrupt thought" (Orwell 7), and theorizes that political writing is driven

by what he described as "meaningless words," that is, words such as "fascism,"

"democracy," "freedom," "patriotic," and "justice," that have many different meanings for

different people but are used in an empty and dishonest way (Orwell 4). Orwell's essay

suggests that such "meaningless words" tend to be used in political writing of all

persuasions, due to their variable meaning. Several years later, Orwell went on to brilliantly

describe how language can affect the thoughts of people in a repressive regime in his novel

Nineteen Eighty-Four.

Noam Chomsky frequently discussed the connection between language and politics,

and his view was that political language consisted of mere terms of propaganda such as

"the free world" and "the national interest" (Chomsky 472). Like Orwell, Chomsky believes



that language is often abused to enforce ideological goals (Chomsky 472). In Chomsky's

view, the primary persuasive power of language was in the connotations of words. As an

example of the significance of connotations, he describes the time when Nicaragua was

painted negatively by the American media due to its plan to purchase MiG fighter planes,

when Nicaragua would have been perfectly happy to buy French Mirages if they had been

allowed to. In the eyes of the media and the public, the MiG had heavily Soviet (and thus

very negative) connotations, while the Mirage, despite being essentially identical, had no

negative connotations, and Chomsky attests that the MiG purchase was heavily played up by

the hawkish media to provoke public opinion against Nicaragua over a triviality (Chomsky

610-611).

Orwell and Chomsky both discussed how political language can influence and

persuade, but George Lakoff took the discussion in a different direction by theorizing that

political opinions could be correlated to different internal metaphors about government

that people subscribe to. In his view, conservatives tend to envision government under a

"strict father" model where citizens are disciplined into being responsible, moral adults and

then left alone (Lakoff 65), while liberals tend to envision government under a "nurturant

parent" model where essentially good citizens are kept away from corrupting influences

(Lakoff 108). In Lakoff's view, it is metaphor, a form of language, that largely drives

political opinions, not the other way around. Originally I had hoped to be able to empirically

test Lakoff's theories, but since they concern feelings on a deep conceptual level, it is

unlikely that a correlation could be made to word frequencies in documents.

More recently, with the rise of interactivity over the Internet, the connection

between language and politics has become a focus of some websites that track the

popularity of different words among different political groups. For instance, a CNN site



launched this year allows users to submit a brief statement of their beliefs, and then

measures overall word frequency clouds, as well as frequency clouds for Democrats and for

Republicans (see Figures 1-3). It is important to note, however, that the results of this

survey will likely differ significantly from the results of my study, due to the vastly different

surveyed population (namely, the general public as opposed to political columnists).

Figure 1. CNN's most popular words (independent of ideology)

Figure 2. CNN's most popular words (Democrats)

Figure 3. CNN's most common words (Republicans)

Source for Fig. 1-3: http://www.cnn.com/interactive/2010/10/politics/ireport.elex.project/?hpt=C1



Component I: Frequency Analysis of Liberal and Conservative Articles

Methodology

I chose to break up this study into two components. In the first part, I conducted a

frequency analysis of 50 liberal and 50 conservative editorials (10 editorials by each of 10

writers) to determine if certain words were used significantly more by liberal or conservative

columnists. In the second part, I attempted to build an algorithm to accurately determine an

article's political ideology using the frequencies of certain words and the dataset produced

earlier (see Component II).

The first step of frequency analysis was the selection of authors and articles. To

select five liberal and five conservative authors, I looked for columnists who satisfied the

following seven criteria:

1. Living American print columnist
2. Self-described liberal/conservative
3. Considered by readers and the general public to be liberal/conservative
4. Preferably not too close to the center
5. Regular or semi-regular columns, with the last one written no more than a few months ago
6. Columns are broadly about politics, and are not solely focused on any particular policy area, such as foreign policy or economics.
7. Reasonably well-known

The reason for such a stringent selection process was to eliminate as many sources

of bias as possible. I eliminated from the running writers who were self-described centrists,

libertarians, or populists, or whose public perception did not match their self-described



ideology, in order to be able to classify the writers into two distinct categories, liberal and

conservative, with no overlap. Furthermore, I only looked for active writers and avoided

writers who only focused on a single policy area, so that the articles would all have a

relatively similar general focus and time period. This selection process gave me the

following writers:

Liberal

- Jonathan Alter (Newsweek)
- Maureen Dowd (New York Times)
- Paul Krugman (New York Times)
- Robert Reich (aggregated on RealClearPolitics)
- Frank Rich (New York Times)

Conservative

- Pat Buchanan (syndicated)
- David Horowitz (FrontPage Magazine)
- Charles Krauthammer (Washington Post)
- Michelle Malkin (syndicated)
- George Will (Washington Post)

Once I had my ten writers, selection of articles to examine proved not to be difficult.

With the exception of David Horowitz, all of these writers had their own entries on the news

aggregator site RealClearPolitics.com, so I was able to see all of their most recent articles. In

David Horowitz's case, I simply looked at the latest articles that he wrote for FrontPage

Magazine. In general, I tried to use the ten most recent articles by each writer, though in a

few cases I had to skip over some articles in the event that they clearly had nothing to do

with government or politics. Including such articles would only have confused the data,

since different fields use different jargon. (For a full list of URLs of articles used, see

Appendix II.)

To count word frequencies within each document, I wrote a PHP script (see

count.php in Appendix I for source code). The script used the following rules:



- Remove all punctuation characters before counting
- Ignore articles, copulas, prepositions, conjunctions, connectors, and pronouns.
- Count words twice if they are in the article title or subtitle (on the grounds that these are words that the writer must have deemed especially important)
- Ignore words that are capitalized >50% of the time in an article (to throw out proper nouns)

The decision to throw out proper nouns was a difficult one to make, but ultimately I

decided to do so on the grounds that counting proper nouns would simply make it too easy

to find differences in language use. For instance, only a liberal article would likely mention

F.D.R. or Kennedy, and only a conservative article would likely talk about Communists. Even

barring such extreme examples, proper nouns would tend to point more clearly than other

words toward a particular ideology, and so I made the decision to ignore words that are

primarily capitalized in an article.
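
The counting rules above can be illustrated with a minimal Python sketch. This is not the count.php script from Appendix I: the stop list IGNORE is a hypothetical stand-in for the actual ignore list, the tokenizing pattern is an assumption, and weighting title words by 2 in the total word count is likewise an assumption.

```python
import re
from collections import Counter

# Hypothetical stop list; the ignore list actually used is not reproduced here.
IGNORE = {"the", "a", "an", "is", "are", "of", "in", "on", "and", "but", "he", "she", "it"}

def count_words(title, body, ignore=IGNORE):
    """Return (word frequencies, total word count) for one article."""
    freq = Counter()         # lowercase word -> weighted count
    capitalized = Counter()  # lowercase word -> weighted count of capitalized uses
    total = 0
    # Title words count twice, body words once.
    for text, weight in ((title, 2), (body, 1)):
        # Tokens are runs of letters, apostrophes, and hyphens; punctuation is dropped.
        for token in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            total += weight
            word = token.lower()
            if word in ignore:
                continue
            freq[word] += weight
            if token[0].isupper():
                capitalized[word] += weight
    # Throw out words capitalized more than 50% of the time (likely proper nouns).
    kept = {w: n for w, n in freq.items() if 2 * capitalized[w] <= n}
    return kept, total

example_counts, example_total = count_words(
    "The tax debate",
    "Obama said the tax plan helps. Obama disagreed.",
)
```

In this toy example "obama" is discarded because every occurrence is capitalized, while "tax" is kept with a count of 3 (two from the title, one from the body).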

The script integrated into a database and kept running totals of word frequencies by

author and by ideology, so I was able to simply run the script on 100 articles and receive the

cumulative word frequency per author and per ideology. Total word counts were also

returned, which were necessary for the next step.

After I obtained the results for all 100 articles, I calculated the proportional usage of

each word by each writer by dividing the word frequency by the total word count, and

performed the same calculation for the running liberal totals and conservative totals.

Finally, for each word I subtracted its conservative proportional usage from its liberal

proportional usage, obtaining a number I called the bias. A positive bias thus meant that a



word was used more often by liberal writers, and a negative bias meant that a word was

used more often by conservative writers. Sorting the table by bias enabled me to finally see

the words that are used significantly more often by liberal columnists and by conservative

columnists.
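
As a worked example from Table 1, the word "president" has a liberal proportional usage of 0.002241 and a conservative proportional usage of 0.000997, giving a bias of 0.002241 - 0.000997 = 0.001244. The same calculation can be sketched in Python; the function names and the toy counts below are illustrative assumptions, not the study's data.

```python
def proportions(counts, total):
    """Convert raw word counts into proportional usage."""
    return {word: n / total for word, n in counts.items()}

def liberal_bias(lib_counts, lib_total, con_counts, con_total):
    """Return (word, bias) pairs sorted from most liberal to most conservative."""
    lib = proportions(lib_counts, lib_total)
    con = proportions(con_counts, con_total)
    words = set(lib) | set(con)
    # bias = liberal proportional usage minus conservative proportional usage
    bias = {w: lib.get(w, 0.0) - con.get(w, 0.0) for w in words}
    return sorted(bias.items(), key=lambda item: item[1], reverse=True)

# Toy counts (hypothetical) out of 10,000 words per side.
ranked = liberal_bias(
    {"tax": 18, "war": 9}, 10000,
    {"tax": 6, "war": 18, "freedom": 10}, 10000,
)
```

Here "tax" comes out on top with a bias of about +0.0012, and "freedom" at the bottom with about -0.0010.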

As one final step, I selected 15 liberal key words and 15 conservative key words

from the top and bottom of the table, respectively. I took the most biased words that

satisfied two criteria:

- Related in at least some way to politics or government (that is, not a word like "not" or "did")
- For a liberal word, at least 3 of the 5 liberal columnists use it more often than the average conservative columnist, and vice versa. (This was a necessary criterion to avoid words that are used frequently but only by one or two columnists and thus are poor representatives of words used by columnists in general.)
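
The second criterion can be sketched as a small check. The first criterion is a judgment call and is not modeled here. The per-writer frequencies are copied from the guess.php listing in Appendix I, under the assumption that the first array of each call holds the five liberal writers and the second the five conservative writers; taking "the average conservative columnist" to be the mean of the five per-writer values is also an assumption.

```python
def passes_spread_test(own_side, other_side, min_writers=3):
    """True if at least min_writers on one side use the word more often
    than the average writer on the other side."""
    other_avg = sum(other_side) / len(other_side)
    return sum(freq > other_avg for freq in own_side) >= min_writers

# Per-writer proportional usage, as listed in guess.php (array order assumed).
cuts_lib = [0.000639, 0.000248, 0.001969, 0.001469, 0.001368]
cuts_con = [0.000728, 0.000000, 0.000000, 0.000425, 0.000135]
unions_lib = [0.000000, 0.000000, 0.000000, 0.000000, 0.000000]
unions_con = [0.000000, 0.000612, 0.000000, 0.000849, 0.000541]

cuts_is_liberal_key = passes_spread_test(cuts_lib, cuts_con)
unions_is_conservative_key = passes_spread_test(unions_con, unions_lib)
# Hypothetical word used heavily by a single writer only: rejected.
one_writer_only = passes_spread_test([0.005, 0, 0, 0, 0], [0.0005] * 5)
```

With these numbers, four of the five liberal writers exceed the conservative average for "cuts," and three of the five conservative writers exceed the liberal average for "unions," so both words pass.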

Results and Analysis

Tables 1-3 show the 50 most frequently used words, the 30 most liberally biased

words, and the 30 most conservatively biased words, respectively. Liberal key words are

written in bold blue, and conservative key words are written in bold red. (Note that all

values given are proportional, so, for instance, if a word has a "liberal-avg" value of 0.001,

it accounts for 1 out of every 1,000 words written by liberal columnists on average.)

Table 1. Most frequently used words (regardless of ideology)

word          liberal-avg   conservative-avg   liberal bias
not           0.004418      0.005734           -0.001317
have          0.004097      0.004716           -0.000619
has           0.003329      0.003179            0.000150
will          0.003223      0.002909            0.000314
would         0.002390      0.002576           -0.000186
had           0.002369      0.002389           -0.000020
government    0.001601      0.001724           -0.000124
years         0.001451      0.001828           -0.000377
president     0.002241      0.000997            0.001244
political     0.001387      0.001621           -0.000233
do            0.001366      0.001558           -0.000192
war           0.000918      0.001828           -0.000911
percent       0.001686      0.000935            0.000751
new           0.001409      0.001163            0.000245
can           0.001707      0.000831            0.000876
should        0.001409      0.001060            0.000349
tax           0.001793      0.000644            0.001149
time          0.001409      0.001018            0.000390
people        0.001216      0.001163            0.000053
just          0.001387      0.000935            0.000452
said          0.001280      0.000935            0.000346
economy       0.001537      0.000603            0.000934
get           0.001430      0.000623            0.000807
could         0.001067      0.000977            0.000091
own           0.001195      0.000831            0.000364
public        0.001110      0.000852            0.000258
last          0.001323      0.000623            0.000700
did           0.000726      0.001184           -0.000459
there         0.000619      0.001288           -0.000669
country       0.000704      0.001143           -0.000438
election      0.000854      0.000977           -0.000123
two           0.000854      0.000977           -0.000123
economic      0.001366      0.000395            0.000971
back          0.001195      0.000540            0.000655
world         0.000683      0.001039           -0.000356
money         0.001003      0.000686            0.000317
academic      0.000000      0.001662           -0.001662
year          0.001088      0.000561            0.000527
policy        0.001024      0.000603            0.000422
health        0.000960      0.000623            0.000337
why           0.000896      0.000686            0.000211
right         0.001024      0.000540            0.000484
way           0.000854      0.000561            0.000293
top           0.001195      0.000208            0.000987
cuts          0.001174      0.000208            0.000966
might         0.001067      0.000312            0.000755
don't         0.000982      0.000395            0.000587
may           0.000811      0.000540            0.000271
same          0.000768      0.000582            0.000187

Table 2. Most liberally biased words

word           liberal-total   conservative-total   liberal bias
president      0.002241        0.000997             0.001244
tax            0.001793        0.000644             0.001149
top            0.001195        0.000208             0.000987
economic       0.001366        0.000395             0.000971
cuts           0.001174        0.000208             0.000966
big            0.001152        0.000187             0.000965
economy        0.001537        0.000603             0.000934
can            0.001707        0.000831             0.000876
get            0.001430        0.000623             0.000807
can't          0.001024        0.000229             0.000796
might          0.001067        0.000312             0.000755
percent        0.001686        0.000935             0.000751
last           0.001323        0.000623             0.000700
plan           0.000790        0.000104             0.000686
won't          0.000726        0.000042             0.000684
back           0.001195        0.000540             0.000655
jobs           0.000896        0.000249             0.000647
know           0.000875        0.000229             0.000646
debt           0.000960        0.000353             0.000607
don't          0.000982        0.000395             0.000587
spending       0.000918        0.000332             0.000585
deficit        0.000726        0.000145             0.000580
year           0.001088        0.000561             0.000527
cut            0.000726        0.000208             0.000518
unemployment   0.000704        0.000187             0.000517
costs          0.000598        0.000083             0.000514
business       0.000683        0.000187             0.000496
income         0.000662        0.000166             0.000495
didn't         0.000619        0.000125             0.000494
right          0.001024        0.000540             0.000484

Table 3. Most conservatively biased words

word         liberal-total   conservative-total   liberal bias
academic     0.000000        0.001662             -0.001662
not          0.004418        0.005734             -0.001317
students     0.000000        0.001184             -0.001184
freedom      0.000000        0.001039             -0.001039
war          0.000918        0.001828             -0.000911
left         0.000171        0.001039             -0.000868
liberal      0.000171        0.000977             -0.000806
university   0.000000        0.000769             -0.000769
there        0.000619        0.001288             -0.000669
faculty      0.000000        0.000665             -0.000665
have         0.004097        0.004716             -0.000619
hearings     0.000021        0.000623             -0.000602
state        0.000277        0.000873             -0.000595
treaty       0.000043        0.000603             -0.000560
radical      0.000149        0.000706             -0.000557
women        0.000192        0.000748             -0.000556
nuclear      0.000021        0.000561             -0.000540
says         0.000299        0.000810             -0.000512
such         0.000405        0.000914             -0.000509
security     0.000043        0.000519             -0.000477
did          0.000726        0.001184             -0.000459
professors   0.000000        0.000457             -0.000457
social       0.000299        0.000748             -0.000449
country      0.000704        0.001143             -0.000438
unions       0.000000        0.000436             -0.000436
left-wing    0.000021        0.000436             -0.000415
states       0.000107        0.000519             -0.000413
members      0.000149        0.000561             -0.000412
course       0.000384        0.000790             -0.000405
illegal      0.000000        0.000395             -0.000395



Table 4 displays the 15 "key" liberal words and 15 "key" conservative words that

were obtained from the above data:

Table 4. Key liberal and conservative words

#    Liberal        Conservative
1    president      students
2    tax            freedom
3    economic       war
4    cuts           left
5    economy        liberal
6    plan           state
7    jobs           treaty
8    spending       radical
9    deficit        women
10   cut            nuclear
11   unemployment   security
12   costs          social
13   business       country
14   income         unions
15   financial      states

The most liberally biased word, or the word that liberal columnists used the most in

comparison to conservative columnists, is "president," which may stem from the fact that

liberal articles tended to refer to Barack Obama as "President Obama," while conservative

articles generally did not (this trend would likely be the opposite when a Republican

president is in power). Other than this, the liberally biased words tended to relate to the

economy ("tax," "economic," "cuts," etc.), while the conservatively biased words tended to

relate to foreign policy ("war," "treaty," "nuclear"), to liberalism ("left," "liberal"), and to

nationalism ("state," "country," "states"). Comparing these results with those of the CNN

iReport (see Figures 1-3) shows that both datasets put "government" as the most popular

politics-related word, but other than that there is very little agreement, which, as mentioned



before, could be ascribed to the two different populations studied. The fact that Paul

Krugman is an economist as well as columnist may have contributed to the high usage of

economic terms by the liberal columnists, though it couldn't have been the only factor, since

almost all of the liberal columnists (with the occasional exceptions of Alter and/or Dowd)

showed relatively high frequencies of use for the economic terms.

Applying the theories of Orwell and Chomsky, there is certainly a prevalence of what

Orwell described as "meaningless words" in the dataset. In particular, the word "freedom,"

the second most conservatively biased key word, is one that both Orwell and Chomsky pointed

out has no inherent meaning due to the many conflicting connotations that it could have. To

a lesser extent, this could be said for a great many of the words on these lists: as

Chomsky would argue, these frequency lists show that the principal goal of political

language, even in private newspapers such as the New York Times and the Washington Post,

is to persuade, and persuasion is accomplished with the aid of imprecise language that

avoids the need for concrete details.

In this part of the project I demonstrated that there are significant differences in

word choice between liberal and conservative columnists, but I had not yet determined how

predictable these differences were. This is the topic that I addressed in the second part of

the project.

Component II: Prediction of Ideology from Word Frequencies

Methodology

Suppose that an article is presented that is written by either a liberal columnist or a

conservative columnist, but no other information is given aside from the text of the article



itself. My goal in this part of the project was to write an algorithm that tried to predict

whether a given article was liberal or conservative, based on the frequency of certain

words. I made use of the key liberal and key conservative words that I found in the first part

of this project and decided that the algorithm would test each of the 30 key words,

determining the likelihood of the article being liberal or conservative based on the

frequency of each key word and then adding the results together.

More precisely, the script that I wrote (see guess.php in Appendix I for source code)

functioned as follows:

I. For each of the 30 key words,
   1. Find the frequency of the key word in the given article.
   2. Determine how many of the 10 writers tested in Component I used this word at least as often on average as in the given article.
   3. a. If there are some writers who used the word this often, find the percentage of them who are liberal. This is the percentage chance that the article has a liberal bias in terms of word frequency, based on the data from just this word.
      b. If there are no writers who used the word this often, then this article is either very liberal (if it's a liberal key word) or very conservative (if it's a conservative key word). Thus, give it either a 100% chance of being liberal or a 0% chance, appropriately, based on the data from just this word.
   4. Subtract 50% from this percentage to obtain the "bias number". If the bias is positive, the article is likely to be liberal based on this word (+0.5 bias equates to 100% chance of being liberal). If the bias is negative, the article is likely to be conservative based on this word (-0.5 bias equates to 100% chance of being conservative).
II. Finally, take the 30 bias numbers that are calculated (one for each key word), and add them together to obtain the overall bias number for the article. In theory, this bias could range from -7.5 to +7.5, and in practice it has ranged from -4.45 (almost certainly conservative) to +6.5 (almost certainly liberal).
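
The steps above can be sketched in Python as follows. This is one reading of the written description, not a transcription of guess.php. The per-writer frequencies for "cuts" and "unions" are copied from the guess.php listing in Appendix I, assuming that the first array of each call holds the liberal writers and the second the conservative writers; only two of the 30 key words are included to keep the example short.

```python
def word_bias(article_freq, lib_freqs, con_freqs, is_liberal_key):
    """Bias contribution of one key word, between -0.5 and +0.5."""
    # Step 2: writers who used the word at least as often as the article does.
    lib_hits = sum(freq >= article_freq for freq in lib_freqs)
    con_hits = sum(freq >= article_freq for freq in con_freqs)
    hits = lib_hits + con_hits
    if hits:
        # Step 3a: share of those writers who are liberal.
        chance_liberal = lib_hits / hits
    else:
        # Step 3b: nobody used it this often, so go by the key word's side.
        chance_liberal = 1.0 if is_liberal_key else 0.0
    # Step 4: subtract 50% to get the bias number.
    return chance_liberal - 0.5

def article_bias(word_counts, total_words, key_words):
    """Step II: sum the bias numbers over all key words."""
    return sum(
        word_bias(word_counts.get(word, 0) / total_words, lib, con, is_lib)
        for word, (lib, con, is_lib) in key_words.items()
    )

# word -> (liberal writers' frequencies, conservative writers' frequencies, liberal key word?)
KEY_WORDS = {
    "cuts": ([0.000639, 0.000248, 0.001969, 0.001469, 0.001368],
             [0.000728, 0.000000, 0.000000, 0.000425, 0.000135], True),
    "unions": ([0.000000, 0.000000, 0.000000, 0.000000, 0.000000],
               [0.000000, 0.000612, 0.000000, 0.000849, 0.000541], False),
}

liberal_looking = article_bias({"cuts": 2}, 1000, KEY_WORDS)
conservative_looking = article_bias({"unions": 1}, 1000, KEY_WORDS)
```

Under this reading, a key word that does not appear in the article contributes a bias of 0, because all ten writers trivially used it at least that often and exactly half of them are liberal.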

From here, my work on the project consisted of testing this algorithm, the results of

which appear in the next section.

Results and Analysis

To test the results of the algorithm, I ran it on each of the 100 articles I had looked at

previously. Table 5 below shows the resulting bias score for each of the ten articles (in

order according to the list in Appendix II) by each of the ten writers, as well as an average

score for each writer. Figure 4 below is a bar graph of the resulting average bias scores for

each writer.

Table 5. Bias scores for each of the tested articles, and average scores per writer

#     Alter   Dowd    Krugman   Reich   Rich    Buchanan   Horowitz   Krauthammer   Malkin   Will
1     +1.00   +0.33   +3.67     +6.50   +1.32   +1.00      -3.50      -1.00         -0.83    -0.2
2     +1.92   +1.00   +2.50     +3.50   +2.76   -2.33      -1.25      -0.50         +1.50    -1.0
3     +1.00   +1.42   +2.92     +4.50   +3.50   -3.50      -2.25      -1.08         -2.25    +0.7
4     +0.00   +0.00   +2.00     +4.00   +1.31   +1.17      -3.00      -0.25         +1.00    +1.5
5     +2.50   +2.50   +3.50     +1.00   +0.06   -0.83      -4.23      -0.50         +0.75    +2.0
6     +3.60   -1.50   +2.50     +1.83   +2.67   +1.67      -3.00      +0.25         -1.00    +0.0
7     -1.50   -0.33   +2.67     +2.50   +3.10   -1.83      -4.22      -1.00         +0.75    +0.0
8     -1.00   -0.50   +2.17     +2.50   +1.55   +1.00      -2.00      +1.00         -2.25    +0.0
9     +2.75   +0.67   +3.33     +3.50   +3.50   -1.00      -2.78      +0.00         -2.50    -1.5
10    +1.00   +0.25   +2.42     +1.25   -2.33   +0.08      -4.45      -0.58         -2.25    +1.0
AVG   +1.13   +0.38   +2.77     +3.11   +1.74   -0.46      -3.07      -0.37         -0.71    +0.2

Figure 4. Average bias scores for each columnist (bar graph, one bar per columnist, horizontal axis from -4.00 to +4.00)

As can be seen, the results are somewhat promising but not as clear-cut as I'd hoped

they would be. While all but one liberal columnist has an average score above +1.00, only one

conservative columnist, Horowitz, managed an average score below -1.00, with the rest floating

close to the +0.00 line. Most troubling, the algorithm gave George Will a liberal bias score

that is close to that of Maureen Dowd, despite their nearly

diametrically opposite views. Looking at the table shows that the bias scores jump

all over the place even for articles written by the same author, apparently with minor

fluctuations in word choice from article to article having huge repercussions. All told, 6

liberal articles were miscategorized as conservative and 17 conservative articles were

miscategorized as liberal, for a total of 23 mistakes or a 77% success rate, which is about

what I expected the algorithm to be able to achieve.



Why isn't the algorithm able to give more accurate results? I believe that the issue is

that, while the differences in word choice between liberal and conservative columnists can

be demonstrated (see Component I), they cannot necessarily be predicted in advance:

individual authors will differ from each other in different and possibly unexpected ways,

and word frequency count is simply not precise enough a measurement to be able to

achieve a near-perfect accuracy.

The fact that I could achieve 77% accuracy with such a simple

algorithm, however, does seem to provide further evidence for the interrelation between

political ideology and language. Nevertheless, this accuracy rating is somewhat suspect,

since the algorithm was applied on the very same articles that were used to determine how

it operates, which could introduce some bias. Further testing is needed to ascertain how

useful this algorithm or its variations could be on an unpredictable sample set.
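
One way such further testing could be set up, offered only as a sketch and not something done in this study, is to hold out one writer at a time: the per-writer frequency data would be rebuilt from the other writers and the algorithm scored only on the held-out writer's articles. The function and the toy corpus below are hypothetical.

```python
def leave_one_writer_out(articles_by_writer):
    """Yield (held_out_writer, training_articles, test_articles) triples."""
    for held_out, test_articles in articles_by_writer.items():
        # Training data excludes every article by the held-out writer.
        training = {
            writer: articles
            for writer, articles in articles_by_writer.items()
            if writer != held_out
        }
        yield held_out, training, test_articles

# Hypothetical toy corpus: writer -> list of article identifiers.
corpus = {
    "writer_a": ["a1", "a2"],
    "writer_b": ["b1", "b2"],
    "writer_c": ["c1", "c2"],
}
splits = list(leave_one_writer_out(corpus))
```

Each split keeps the test articles entirely separate from the data used to set up the algorithm, which avoids the circularity described above.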

Conclusion

Ultimately, both of my hypotheses came out as expected: I demonstrated in

Component I a statistically significant correlation between political identification and word

choice, but only managed to achieve 77% accuracy in predicting political persuasion based

on word choice. As mentioned above, this suggests that the differences in word choice

between liberal and conservative columnists, while present, are somewhat unpredictable.

Furthermore, I believe that personal ideology does not entirely determine language use,

though it is a large contributor, so it would make sense that perfect accuracy is impossible

in this context. Finally, there is no such thing as a simple duality of political beliefs, and

there is huge variety in what one can believe in even if one identifies as liberal or as

conservative. In light of this, the results can be interpreted to mean that, while the 5



liberal authors can be shown to write significantly differently from the 5 conservative

authors, there's nothing close to complete agreement in word choice among the liberal

authors and among the conservative authors. As the cultural linguistic concept of agency

makes clear, membership in a group contributes to identity but does not establish identity

(Bucholtz 422).

If I had to do this paper again, I think the one thing that I would definitely try to do

differently would be to include more authors, since only comparing five liberal columnists

and five conservative columnists involves such a small sample size that unexpected bias can

easily be introduced as a result. However, significantly increasing the sample size would

come at a cost, as it would also make the project much more tedious and time-consuming. I

am also interested in what would have happened if I hadn't removed proper nouns from the

comparison: I still think that it was the right thing for my project to ignore proper nouns,

but the results would certainly be different and notable in their own way if I had chosen to

keep them in.

Accepting the necessary failings of any attempt to definitively classify writing by

political persuasion, there are still many useful avenues of research for this topic. In

particular, if the linguistic bias detection algorithm is made a little more robust, there are

many sources of data that it could examine and questions that it could consider: Do

columnists who self-identify as nonpartisan still have a significant conscious or

subconscious linguistic bias? Are presidential speeches written to appeal more to partisans

or to centrists, judging by the bias in their word choice? How do political blog articles

compare to editorials in terms of bias? These and many other questions could be

investigated with such an algorithm, and could lead to interesting results.



Works Cited

Bucholtz, Mary. "Language, Gender, and Sexuality." Language in the USA: Themes for the

Twenty-first Century. By Edward Finegan and John R. Rickford. Cambridge:

Cambridge UP, 2004. Print.

Chomsky, Noam, and Carlos Peregrín Otero. Language and Politics. Oakland, CA: AK, 2004.

Print.

"IReport Election Project." CNN.com. Cable News Network, 27 Oct. 2010. Web. 2 Dec. 2010.

Kramsch, Claire J. Language and Culture. Oxford, OX: Oxford UP, 1998. Print.

Lakoff, George. Moral Politics: How Liberals and Conservatives Think. Chicago: Univ. of

Chicago, 2001. Print.

Orwell, George. "Politics and the English Language." Horizon 1946. Web.



Appendix I: Source Code

count.php



{
    // Increment total word count
    $wordCount++;

    // Ignore words in ignore list
    if (!in_array($word, $ignoreList)) {

        // If the word is uppercase, make lowercase and also add to
        // uppercase frequency table
        if (ctype_upper(substr($word, 0, 1))) {
            $word = strtolower($word);
            array_key_exists($word, $uppercase) ? $uppercase[$word]++ : ($uppercase[$word] = 1);
        }

        // For each word found in the frequency table, increment
        // its value by one
        array_key_exists($word, $freqData) ? $freqData[$word]++ : ($freqData[$word] = 1);
    }
}

// Insert a "Total" entry for total word count
$freqData['#TOTAL#'] = $wordCount;

// Now, we must insert results into the db

// First, add db column for this author
add_column_if_not_exist("research_ling_polwordfreq", $author);

// If a word is uppercase more than 50% of the time, ignore it
// Otherwise, add it to two columns: one for the author and one for the
// ideology
foreach ($freqData as $word => $freq) {
    if (!array_key_exists($word, $uppercase) || $freq > (2 * $uppercase[$word])) {
        $query = "INSERT INTO `research_ling_polwordfreq` (`word`, `$author`, `$ideology-total`) VALUES('$word', $freq, $freq) ON DUPLICATE KEY UPDATE `$author` = `$author` + $freq, `$ideology-total` = `$ideology-total` + $freq";
        $result = mysql_query($query);
        echo "$query ... $result\n";
    }
}
echo "DONE";

// Function for adding a column only if it doesn't already exist
function add_column_if_not_exist($db, $column, $column_attr = "INT NOT NULL")
{
    $exists = false;
    $columns = mysql_query("show columns from $db");
    while ($c = mysql_fetch_assoc($columns)) {
        if ($c['Field'] == $column) {
            $exists = true;
            break;
        }
    }
    if (!$exists) {
        $query = "ALTER TABLE `$db` ADD `$column` $column_attr";
        $result = mysql_query($query);
        echo "$query ... $result\n";
    }
}

?>

guess.php



    array(0.000128, 0.000124, 0.002339, 0.003072, 0.001303),
    array(0.000850, 0.000111, 0.000537, 0.000283, 0.000541));

$bias += test_word("cuts", 1, $freqData, $wordCount,
    array(0.000639, 0.000248, 0.001969, 0.001469, 0.001368),
    array(0.000728, 0.000000, 0.000000, 0.000425, 0.000135));

$bias += test_word("economy", 1, $freqData, $wordCount,
    array(0.000895, 0.000248, 0.003200, 0.004408, 0.000261),
    array(0.001214, 0.000111, 0.000939, 0.000425, 0.000946));

$bias += test_word("plan", 1, $freqData, $wordCount,
    array(0.001278, 0.000743, 0.001723, 0.000401, 0.000261),
    array(0.000121, 0.000000, 0.000000, 0.000566, 0.000000));

$bias += test_word("jobs", 1, $freqData, $wordCount,
    array(0.000511, 0.000248, 0.000615, 0.001870, 0.001107),
    array(0.000243, 0.000111, 0.000000, 0.000425, 0.000676));

$bias += test_word("spending", 1, $freqData, $wordCount,
    array(0.000383, 0.000619, 0.001600, 0.001603, 0.000651),
    array(0.000486, 0.000000, 0.000134, 0.000283, 0.001216));

$bias += test_word("deficit", 1, $freqData, $wordCount,
    array(0.000128, 0.000124, 0.001723, 0.001870, 0.000261),
    array(0.000607, 0.000000, 0.000000, 0.000000, 0.000270));

$bias += test_word("cut", 1, $freqData, $wordCount,
    array(0.000383, 0.000000, 0.000369, 0.002271, 0.000717),
    array(0.000607, 0.000000, 0.000134, 0.000425, 0.000135));

$bias += test_word("unemployment", 1, $freqData, $wordCount,
    array(0.000383, 0.000000, 0.001600, 0.001202, 0.000521),
    array(0.000243, 0.000000, 0.000537, 0.000142, 0.000270));

$bias += test_word("costs", 1, $freqData, $wordCount,
    array(0.001534, 0.000000, 0.000615, 0.001336, 0.000065),
    array(0.000000, 0.000000, 0.000000, 0.000425, 0.000135));

$bias += test_word("business", 1, $freqData, $wordCount,
    array(0.000383, 0.000124, 0.000123, 0.002004, 0.000782),
    array(0.000121, 0.000111, 0.000134, 0.000708, 0.000000));

$bias += test_word("income", 1, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000492, 0.002004, 0.000782),
    array(0.000121, 0.000000, 0.000134, 0.000849, 0.000000));

$bias += test_word("financial", 1, $freqData, $wordCount,
    array(0.000767, 0.000248, 0.000739, 0.000267, 0.001172),
    array(0.000243, 0.000056, 0.000671, 0.000425, 0.000135));

$bias += test_word("states", 0, $freqData, $wordCount,
    array(0.000256, 0.000000, 0.000000, 0.000267, 0.000065),
    array(0.000364, 0.000111, 0.000402, 0.001840, 0.000541));

$bias += test_word("unions", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
    array(0.000000, 0.000612, 0.000000, 0.000849, 0.000541));

$bias += test_word("country", 0, $freqData, $wordCount,
    array(0.000895, 0.000867, 0.000492, 0.000000, 0.000977),
    array(0.002064, 0.001113, 0.001342, 0.000566, 0.000541));

$bias += test_word("social", 0, $freqData, $wordCount,
    array(0.000000, 0.000619, 0.000123, 0.000000, 0.000521),
    array(0.000728, 0.000779, 0.000402, 0.000708, 0.001081));

$bias += test_word("security", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000123, 0.000134, 0.000000),
    array(0.000607, 0.000167, 0.000805, 0.001274, 0.000270));

$bias += test_word("nuclear", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000000, 0.000000, 0.000065),
    array(0.001700, 0.000056, 0.000939, 0.000000, 0.000676));



$bias += test_word("women", 0, $freqData, $wordCount,
    array(0.000000, 0.000867, 0.000123, 0.000134, 0.000000),
    array(0.000121, 0.001502, 0.000268, 0.000566, 0.000270));

$bias += test_word("radical", 0, $freqData, $wordCount,
    array(0.000000, 0.000248, 0.000123, 0.000000, 0.000261),
    array(0.000121, 0.001613, 0.000268, 0.000283, 0.000000));

$bias += test_word("treaty", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000246, 0.000000, 0.000000),
    array(0.001821, 0.000000, 0.000805, 0.000000, 0.001081));

$bias += test_word("state", 0, $freqData, $wordCount,
    array(0.000256, 0.000372, 0.000246, 0.000534, 0.000130),
    array(0.000850, 0.000668, 0.000671, 0.001274, 0.001216));

$bias += test_word("liberal", 0, $freqData, $wordCount,
    array(0.000256, 0.000124, 0.000000, 0.000134, 0.000261),
    array(0.000607, 0.001335, 0.000537, 0.001415, 0.000541));

$bias += test_word("left", 0, $freqData, $wordCount,
    array(0.000128, 0.000248, 0.000246, 0.000000, 0.000195),
    array(0.000607, 0.002059, 0.000671, 0.000425, 0.000000));

$bias += test_word("war", 0, $freqData, $wordCount,
    array(0.000895, 0.000743, 0.000000, 0.000000, 0.001954),
    array(0.004128, 0.001892, 0.001744, 0.000708, 0.000270));

$bias += test_word("freedom", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
    array(0.000243, 0.002615, 0.000000, 0.000142, 0.000000));

$bias += test_word("students", 0, $freqData, $wordCount,
    array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
    array(0.000121, 0.002782, 0.000000, 0.000849, 0.000000));

// compares a key word frequency with that of the 5 liberal and 5 conservative writers,
// and returns a corresponding bias score (higher = more liberal, lower = more conservative)
function test_word($word, $is_lib_keyword, $freqData, $wordCount, $lib_freqs, $con_freqs) {
    if (array_key_exists($word, $freqData)) {
        $freq = $freqData[$word] / $wordCount;
    } else {
        $freq = 0;
    }

    $libs = 0;
    $cons = 0;
    foreach ($lib_freqs as $lib_f) {
        if ($lib_f >= $freq) {
            $libs++;
        }
    }
    foreach ($con_freqs as $con_f) {
        if ($con_f >= $freq) {
            $cons++;
        }
    }

    if ($libs == 0 && $cons == 0) {
        // either "more liberal" or "more conservative" than any of the columnists,
        // so we'll give it either a +0.5 or a -0.5 (maximum for a single test)
        return ($is_lib_keyword - .5);
    } else {
        // What is the chance that an article using the given word this many times is liberal?
        // Find the proportion of writers with this word freq or higher that are liberal.
        // (Then subtract 50% to get a score between +0.5 and -0.5)
        return ($libs / ($libs + $cons) - .5);
    }
}

echo ("On a scale of -7.5 (most conservative) to +7.5 (most liberal), this text has $bias bias.");

?>
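To make the scoring rule in test_word concrete, the sketch below applies the same comparison to the "deficit" frequencies from the listing. It is an illustration only: score_keyword is a hypothetical helper that takes the document's relative frequency directly instead of looking it up in $freqData, and the four input frequencies are assumed values, not measurements from any article in the study.

```php
<?php
// Hypothetical helper mirroring the scoring rule of test_word in guess.php.
// $freq is the keyword's relative frequency in the document being classified.
function score_keyword($freq, $is_lib_keyword, $lib_freqs, $con_freqs) {
    $libs = 0;
    $cons = 0;
    // Count the columnists on each side who use the word at least this often
    foreach ($lib_freqs as $f) {
        if ($f >= $freq) { $libs++; }
    }
    foreach ($con_freqs as $f) {
        if ($f >= $freq) { $cons++; }
    }
    if ($libs == 0 && $cons == 0) {
        // More frequent than in any columnist: maximum score for this keyword
        return $is_lib_keyword - .5;
    }
    // Share of those columnists who are liberal, shifted to the range -0.5..+0.5
    return $libs / ($libs + $cons) - .5;
}

// "deficit" frequencies of the 5 liberal and 5 conservative columnists (from the listing)
$lib = array(0.000128, 0.000124, 0.001723, 0.001870, 0.000261);
$con = array(0.000607, 0.000000, 0.000000, 0.000000, 0.000270);

// Assumed document frequencies, chosen only for illustration
foreach (array(0.0, 0.0005, 0.001, 0.002) as $freq) {
    printf("freq %.4f -> score %+.3f\n", $freq, score_keyword($freq, 1, $lib, $con));
}
```

A document that never uses the word scores 0, because all ten columnists have a frequency of at least zero. At 0.0005, two liberal columnists and one conservative columnist use the word at least that often, so the score is 2/3 - 0.5, or about +0.167. At 0.001 only the two liberal columnists remain, which gives +0.5, and at 0.002 no columnist remains, so the keyword's own side decides the sign.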



o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/11/07/IN4J1G5VII.DTL

o http://online.wsj.com/article/SB10001424052702304173704575578200086257706.html

o http://www.salon.com/news/politics/2010_elections/index.html?story=/news/feature/2010/10/25/why_democrats_move_to_the_center

o http://www.huffingtonpost.com/robert-reich/the-secret-bigmoney-takeo_b_754938.html

o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/10/03/INC41FL1DM.DTL

o http://www.huffingtonpost.com/robert-reich/republican-economics-as-s_b_739654.html

o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/09/26/INTI1FHHQQ.DTL

o http://www.salon.com/news/feature/2010/09/21/stimulus_not_enough/index.html

Frank Rich

o http://www.nytimes.com/2010/11/28/opinion/28rich.html

o http://www.nytimes.com/2010/11/14/opinion/14rich.html

o http://www.nytimes.com/2010/11/07/opinion/07rich.html?_r=1

o http://www.nytimes.com/2010/10/24/opinion/24rich.html?_r=1&ref=opinion

o http://www.nytimes.com/2010/10/10/opinion/10rich.html?_r=1&ref=opinion

o http://www.nytimes.com/2010/10/03/opinion/03rich.html?_r=1

o http://www.nytimes.com/2010/09/12/opinion/12rich.html?_r=1&ref=opinion

o http://www.nytimes.com/2010/08/29/opinion/29rich.html?_r=1&ref=opinion

o http://www.nytimes.com/2010/08/08/opinion/08rich.html?_r=1&ref=opinion

o http://www.nytimes.com/2010/08/01/opinion/01rich.html?_r=1

Conservative Columnists

Pat Buchanan

o http://www.realclearpolitics.com/articles/2010/11/30/european_union_rip_108087.html

o http://www.realclearpolitics.com/articles/2010/11/26/why_are_we_still_in_korea_108069.html

o http://www.realclearpolitics.com/articles/2010/11/23/is_gop_risking_a_new_cold_war_108035.html

o http://www.realclearpolitics.com/articles/2010/11/19/who_fed_the_tiger_108001.html

o http://www.realclearpolitics.com/articles/2010/11/16/tea_partys_winning_hand_107963.html



o http://www.realclearpolitics.com/articles/2010/11/12/the_fed_trashes_the_dollar_107928.html

o http://www.realclearpolitics.com/articles/2010/11/09/the_murderers_of_christianity_107884.html

o http://www.realclearpolitics.com/articles/2010/11/05/has_history_passed_obama_by_107847.html

o http://www.realclearpolitics.com/articles/2010/11/02/broders_brainstorm_107802.html

o http://www.realclearpolitics.com/articles/2010/10/29/we_are_in_uncharted_waters.html

David Horowitz

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36385

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36267

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36236

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36189

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35156

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35117

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34689

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34836

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34790

o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34348

Charles Krauthammer

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/25/AR2010112502232.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/18/AR2010111804494.html

o http://www.nationalreview.com/articles/253121/why-obama-right-about-india-charles-krauthammer

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/04/AR2010110406581.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/28/AR2010102806270.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/21/AR2010102104856.html?hpid=opinionsbox1

o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/14/AR2010101405234.html

o http://articles.ocregister.com/2010-10-07/opinion/24649228_1_debt-problem-national-debt-democrats

o http://www.nationalreview.com/articles/248433/why-he-sending-them-charles-krauthammer

o http://www.washingtonpost.com/wp-dyn/content/article/2010/09/23/AR2010092304746.html

Michelle Malkin

o http://www.realclearpolitics.com/articles/2010/12/01/the_littlest_victims_of_obamacare_108102.html

o http://www.realclearpolitics.com/articles/2010/11/24/giving_thanks_for_american_ingenuity_108048.html

o http://www.realclearpolitics.com/articles/2010/11/19/ray_lahood_obamas_power-mad_cell_phone_czar_108007.html



o http://www.realclearpolitics.com/articles/2010/11/19/dude_wheres_my_obamacare_waiver_107978.html

o http://www.realclearpolitics.com/articles/2010/11/12/throw_carol_browner_under_the_bus_107934.html

o http://www.realclearpolitics.com/articles/2010/11/10/no_illegal_alien_pilot_left_behind_107899.html

o http://www.realclearpolitics.com/articles/2010/11/05/voters_speak_no_to_soak-the-rich_schemes_107848.html

o http://www.realclearpolitics.com/articles/2010/10/29/standing_tall_the_rise_and_resilience_of_conservative_women_107768.html

o http://www.realclearpolitics.com/articles/2010/10/27/the_lefts_voter_fraud_whitewash_107740.html

o http://www.realclearpolitics.com/articles/2010/10/22/free_the_taxpayers_defund_state-sponsored_media_107676.html

George Will

o http://www.washingtonpost.com/wp-dyn/content/article/2010/12/01/AR2010120104728.html

o http://www.newsweek.com/2010/11/20/will-a-senator-looks-back-to-the-future.html

o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_709095.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/12/AR2010111204494.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/10/AR2010111005499.html

o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/03/AR2010110303844.html

o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_706364.html
