Language and Political Ideology

  • 8/8/2019 Language and Political Ideology

    Nisnevich 1

    Language and Political Ideology:

    A Comparison of the Word Choice of Liberal and Conservative American Columnists

    Alex Nisnevich

    Ling 55AC

    Professor Houser

    Fall 2010

    Introduction

    The topic of this study is the connection between language and thought, and

    particularly how language is used and chosen, even subconsciously, both as a political tool

    and a marker of political identity. To test the relationship between political persuasion and

    language, word frequencies of a wide range of opinion pieces by American liberal and

    conservative writers are analyzed to determine if there is a statistically significant

    difference in the choice of words by liberals and conservatives. This data is then used to

    build a computer algorithm that tries to determine the political persuasion of documents

    based on their word frequencies, and the remainder of the study consists of tests run with

    this algorithm.

    If an effective algorithm can be constructed that accurately determines the political

    persuasion of a document based on its word use, then this is a very significant result for

    linguistics, because it conclusively demonstrates that American liberals and American

    conservatives use language in a way that can be modeled and predicted. In this way, a

    strong connection between language and political beliefs, and on a broader level between

    language and thought, can be shown.

    My working hypothesis is that there is a statistically significant correlation between

    political identification and word choice and that it is theoretically possible for a computer

    model to predict political identification based on a word-frequency analysis of a document

    with a reasonable degree of accuracy. However, I still believe that personal ideology does

    not entirely determine language, and so I don't expect the algorithm I develop to be any

    more than 80-90% accurate in pinpointing the ideology of well-known writers.

    Literature Review

    The connection between language use and political ideology is loosely related to the

    theory of linguistic relativity, which states that people of different linguistic communities

    think differently because their language offers them different ways of expressing the world

    around them (Kramsch 11). While the linguistic relativity hypothesis, or Sapir-Whorf

    hypothesis, only makes a statement about different languages as they relate to different

    world-views, it is conceivable that the same language can be used to encode different world-

    views for different ideological groups, and this is related to linguistic relativity in that both

    stem from the fact that signs, despite generally being arbitrary in form, are non-arbitrary in

    their use.

    The relationship between language and politics in particular was first explored by

    George Orwell, in his aptly-titled 1946 essay "Politics and the English Language." Orwell

    alludes to the linguistic relativity hypothesis by writing, "If thought corrupts language,

    language can also corrupt thought" (Orwell 7), and theorizes that political writing is driven

    by what he described as "meaningless words," that is, words, such as "fascism,"

    "democracy," "freedom," "patriotic," and "justice," that have many different meanings for

    different people but are used in an empty and dishonest way (Orwell 4). Orwell's essay

    suggests that such "meaningless words" tend to be used in political writing of all

    persuasions, due to their variable meaning. Several years later, Orwell went on to brilliantly

    describe how language can affect the thoughts of people in a repressive regime in his novel

    Nineteen Eighty-Four.

    Noam Chomsky frequently discussed the connection between language and politics,

    and his view was that political language consisted of mere terms of propaganda such as

    "the free world" and "the national interest" (Chomsky 472). Like Orwell, Chomsky believes

    that language is often abused to enforce ideological goals (Chomsky 472). In Chomsky's

    view, the primary persuasive power of language was in the connotations of words. As an

    example of the significance of connotations, he describes the time when Nicaragua was

    painted negatively by the American media due to its plan to purchase MiG fighter planes,

    when Nicaragua would have been perfectly happy to buy French Mirages if they had been

    allowed to. In the eyes of the media and the public, the MiG had heavily Soviet (and thus

    very negative) connotations, while the Mirage, despite being essentially identical, had no

    negative connotations, and Chomsky attests that the MiG purchase was heavily played up by

    the hawkish media to provoke public opinion against Nicaragua over a triviality (Chomsky

    610-611).

    Orwell and Chomsky both discussed how political language can influence and

    persuade, but George Lakoff took the discussion in a different direction by theorizing that

    political opinions could be correlated to different internal metaphors about government

    that people subscribe to. In his view, conservatives tend to envision government under a

    "strict father" model where citizens are disciplined into being responsible, moral adults and

    then left alone (Lakoff 65), while liberals tend to envision government under a "nurturant

    parent" model where essentially good citizens are kept away from corrupting influences

    (Lakoff 108). In Lakoff's view, it is metaphor, a form of language, that largely drives

    political opinions, not the other way around. Originally I had hoped to be able to empirically

    test Lakoff's theories, but since they concern feelings on a deep conceptual level, it is

    unlikely that a correlation could be made to word frequencies in documents.

    More recently, with the rise of interactivity over the Internet, the connection

    between language and politics has become a focus of some websites that track the

    popularity of different words among different political groups. For instance, a CNN site

    launched this year allows users to submit a brief statement of their beliefs, and then

    measures overall word frequency clouds, as well as frequency clouds for Democrats and for

    Republicans (see Figures 1-3). It is important to note, however, that the results of this

    survey will likely differ significantly from the results of my study, due to the vastly different

    surveyed population (namely, the general public as opposed to political columnists).

    Figure 1. CNN's most popular words (independent of ideology)

    Figure 2. CNN's most popular words (Democrats)

    Figure 3. CNN's most common words (Republicans)

    Source for Fig. 1-3: http://www.cnn.com/interactive/2010/10/politics/ireport.elex.project/?hpt=C1

    Component I: Frequency Analysis of Liberal and Conservative Articles

    Methodology

    I chose to break up this study into two components. In the first part, I conducted a

    frequency analysis of 50 liberal and 50 conservative editorials (10 editorials by each of 10 writers)

    each) to determine if certain words were used significantly more by liberal or conservative

    columnists. In the second part, I attempted to build an algorithm to accurately determine an

    article's political ideology using the frequencies of certain words and the dataset produced

    earlier (see Component II).

    The first step of frequency analysis was the selection of authors and articles. To

    select five liberal and five conservative authors, I looked for columnists who satisfied the

    following seven criteria:

    1. Living American print columnist

    2. Self-described liberal/conservative

    3. Considered by readers and the general public to be liberal/conservative

    4. Preferably not too close to the center

    5. Regular or semi-regular columns, with the last one written no more than a few months ago

    6. Columns are broadly about politics, and are not solely focused on any particular policy area, such as foreign policy or economics

    7. Reasonably well-known

    The reason for such a stringent selection process was to eliminate as many sources

    of bias as possible. I eliminated from the running writers who were self-described centrists,

    libertarians, or populists, or whose public perception did not match their self-described

    ideology, in order to be able to classify the writers into two distinct categories (liberal and

    conservative) with no overlap. Furthermore, I only looked for active writers and avoided

    writers who only focused on a single policy area, so that the articles would all have a

    relatively similar general focus and time period. This selection process gave me the

    following writers:

    Liberal

    Jonathan Alter (Newsweek)

    Maureen Dowd (New York Times)

    Paul Krugman (New York Times)

    Robert Reich (aggregated on RealClearPolitics)

    Frank Rich (New York Times)

    Conservative

    Pat Buchanan (syndicated)

    David Horowitz (FrontPage Magazine)

    Charles Krauthammer (Washington Post)

    Michelle Malkin (syndicated)

    George Will (Washington Post)

    Once I had my ten writers, selection of articles to examine proved not to be difficult.

    With the exception of David Horowitz, all of these writers had their own entries on the news

    aggregator site RealClearPolitics.com, so I was able to see all of their most recent articles. In

    David Horowitz's case, I simply looked at the latest articles that he wrote for FrontPage

    Magazine. In general, I tried to use the ten most recent articles by each writer, though in a

    few cases I had to skip over some articles in the event that they clearly had nothing to do

    with government or politics. Including such articles would only have confused the data,

    since different fields use different jargon. (For a full list of URLs of articles used, see

    Appendix II.)

    To count word frequencies within each document, I wrote a PHP script (see

    count.php in Appendix I for source code). The script used the following rules:

    - Remove all punctuation characters before counting.

    - Ignore articles, copulas, prepositions, conjunctions, connectors, and pronouns.

    - Count words twice if they are in the article title or subtitle (on the grounds that these are words that the writer must have deemed especially important).

    - Ignore words that are capitalized >50% of the time in an article (to throw out proper nouns).
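    The counting rules above can be sketched in a few lines. This is an illustrative Python sketch, not the count.php script itself: the function names and the abbreviated IGNORE list are hypothetical, and keeping apostrophes and hyphens inside words is an assumption based on entries such as "don't" and "left-wing" in the results tables. Whether the doubled title words also count toward the total word count is likewise assumed here.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop list; the real ignore list is not reproduced here.
IGNORE = {"the", "a", "an", "is", "are", "was", "of", "in", "on",
          "and", "or", "but", "he", "she", "it", "they"}


def tokenize(text):
    """Replace punctuation with spaces (keeping ' and - inside words) and split."""
    cleaned = re.sub(r"[^\w\s'-]", " ", text)
    return [t.strip("'-") for t in cleaned.split() if t.strip("'-")]


def count_words(title, body):
    """Return (word frequencies, total word count) for one article."""
    # Title words are repeated so that each one is counted twice.
    tokens = tokenize(title) * 2 + tokenize(body)
    total = len(tokens)  # assumption: ignored words still count toward the total
    freq, capitalized = Counter(), Counter()
    for tok in tokens:
        word = tok.lower()
        if word in IGNORE:
            continue
        freq[word] += 1
        if tok[0].isupper():
            capitalized[word] += 1
    # Drop words capitalized more than 50% of the time (likely proper nouns).
    kept = {w: n for w, n in freq.items() if capitalized[w] * 2 <= n}
    return kept, total


if __name__ == "__main__":
    print(count_words("Tax cuts now", "Obama said tax cuts help."))
```

    One side effect worth noting: a title written in title case raises the capitalized share of its words, so short articles can lose ordinary words to the proper-noun filter.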

    The decision to throw out proper nouns was a difficult one to make, but ultimately I

    decided to do so on the grounds that counting proper nouns would simply make it too easy

    to find differences in language use. For instance, only a liberal article would likely mention

    F.D.R. or Kennedy, and only a conservative article would likely talk about Communists. Even

    barring such extreme examples, proper nouns would tend to point more clearly than other

    words toward a particular ideology, and so I made the decision to ignore words that are

    primarily capitalized in an article.

    The script was integrated with a database and kept running totals of word frequencies by

    author and by ideology, so I was able to simply run the script on 100 articles and receive the

    cumulative word frequency per author and per ideology. Total word counts were also

    returned, which were necessary for the next step.

    After I obtained the results for all 100 articles, I calculated the proportional usage of

    each word by each writer by dividing the word frequency by the total word count, and

    performed the same calculation for the running liberal totals and conservative totals.

    Finally, for each word I subtracted its conservative proportional usage from its liberal

    proportional usage, obtaining a number I called the "bias." A positive bias thus meant that a

    word was used more often by liberal writers, and a negative bias meant that a word was

    used more often by conservative writers. Sorting the table by bias enabled me to finally see

    the words that are used significantly more often by liberal columnists and by conservative

    columnists.
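    The bias calculation just described amounts to a subtraction of two proportions. For instance, the word "president" has a liberal proportional usage of 0.002241 and a conservative one of 0.000997, so its bias is 0.002241 - 0.000997 = 0.001244. The Python sketch below illustrates the same arithmetic; the function names and the toy counts in the usage line are hypothetical and are not taken from the study's database.

```python
def proportional_usage(counts, total_words):
    """Divide each word's raw count by the total word count."""
    return {word: n / total_words for word, n in counts.items()}


def bias_table(liberal_counts, liberal_total, conservative_counts, conservative_total):
    """Return (word, bias) pairs sorted from most liberal to most conservative.

    bias = liberal proportional usage - conservative proportional usage
    """
    lib = proportional_usage(liberal_counts, liberal_total)
    con = proportional_usage(conservative_counts, conservative_total)
    words = set(lib) | set(con)
    table = {w: lib.get(w, 0.0) - con.get(w, 0.0) for w in words}
    return sorted(table.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy counts, for illustration only.
    for word, bias in bias_table({"tax": 18, "war": 9}, 10000,
                                 {"tax": 6, "war": 18}, 10000):
        print(f"{word:10s} {bias:+.6f}")
```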

    As one final step, I selected 15 liberal key words and 15 conservative key words

    from the top and bottom of the table, respectively. I took the most biased words that

    satisfied two criteria:

    - Related in at least some way to politics or government (that is, not a word like "not" or "did")

    - For a liberal word, at least 3 of the 5 liberal columnists use it more often than the average conservative columnist, and vice versa. (This was a necessary criterion to avoid words that are used frequently but only by one or two columnists and thus are poor representatives of words used by columnists in general.)
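    The second criterion can be checked mechanically, while the first remains a manual judgment. The Python sketch below is one possible reading of the two criteria, not the procedure actually used: the function names are hypothetical, "the average conservative columnist" is assumed to mean the mean of the five writers' proportional usages, and the politics-relatedness test is passed in as a caller-supplied function.

```python
def passes_breadth_test(word, own_side_usage, other_side_usage, minimum=3):
    """True if at least `minimum` writers on one side use `word` more often
    than the average writer on the other side.

    Both usage arguments map writer -> {word: proportional usage}.
    """
    other_avg = sum(u.get(word, 0.0) for u in other_side_usage.values()) / len(other_side_usage)
    supporters = sum(1 for u in own_side_usage.values() if u.get(word, 0.0) > other_avg)
    return supporters >= minimum


def select_key_words(ranked_words, own_side_usage, other_side_usage, is_political, n=15):
    """Walk the bias-sorted word list and keep the first n words meeting both criteria."""
    selected = []
    for word in ranked_words:
        if is_political(word) and passes_breadth_test(word, own_side_usage, other_side_usage):
            selected.append(word)
        if len(selected) == n:
            break
    return selected
```

    For conservative key words, the same functions would be called with the two sides swapped and the word list sorted from the most negative bias upward.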

    Results and Analysis

    Tables 1-3 show the 50 most frequently used words, the 30 most liberally biased

    words, and the 30 most conservatively biased words, respectively. Liberal key words are

    written in bold blue, and conservative key words are written in bold red. (Note that all

    values given are proportional, so, for instance, if a word has a liberal-avg value of 0.001,

    it accounts for 1 out of every 1,000 words written by liberal columnists on average.)

    Table 1. Most frequently used words (regardless of ideology)

    word liberal-avg conservative-avg liberal bias

    not 0.004418 0.005734 -0.001317

    have 0.004097 0.004716 -0.000619

    has 0.003329 0.003179 0.000150

    will 0.003223 0.002909 0.000314

    would 0.002390 0.002576 -0.000186

    had 0.002369 0.002389 -0.000020

    government 0.001601 0.001724 -0.000124

    years 0.001451 0.001828 -0.000377

    president 0.002241 0.000997 0.001244

    political 0.001387 0.001621 -0.000233

    do 0.001366 0.001558 -0.000192

    war 0.000918 0.001828 -0.000911

    percent 0.001686 0.000935 0.000751

    new 0.001409 0.001163 0.000245

    can 0.001707 0.000831 0.000876

    should 0.001409 0.001060 0.000349

    tax 0.001793 0.000644 0.001149

    time 0.001409 0.001018 0.000390

    people 0.001216 0.001163 0.000053

    just 0.001387 0.000935 0.000452

    said 0.001280 0.000935 0.000346

    economy 0.001537 0.000603 0.000934

    get 0.001430 0.000623 0.000807

    could 0.001067 0.000977 0.000091

    own 0.001195 0.000831 0.000364

    public 0.001110 0.000852 0.000258

    last 0.001323 0.000623 0.000700

    did 0.000726 0.001184 -0.000459

    there 0.000619 0.001288 -0.000669

    country 0.000704 0.001143 -0.000438

    election 0.000854 0.000977 -0.000123

    two 0.000854 0.000977 -0.000123

    economic 0.001366 0.000395 0.000971

    back 0.001195 0.000540 0.000655

    world 0.000683 0.001039 -0.000356

    money 0.001003 0.000686 0.000317

    academic 0.000000 0.001662 -0.001662

    year 0.001088 0.000561 0.000527

    policy 0.001024 0.000603 0.000422

    health 0.000960 0.000623 0.000337

    why 0.000896 0.000686 0.000211

    right 0.001024 0.000540 0.000484

    way 0.000854 0.000561 0.000293

    top 0.001195 0.000208 0.000987

    cuts 0.001174 0.000208 0.000966

    might 0.001067 0.000312 0.000755

    don't 0.000982 0.000395 0.000587

    may 0.000811 0.000540 0.000271

    same 0.000768 0.000582 0.000187

    Table 2. Most liberally biased words

    word liberal-total conservative-total liberal bias

    president 0.002241 0.000997 0.001244

    tax 0.001793 0.000644 0.001149

    top 0.001195 0.000208 0.000987

    economic 0.001366 0.000395 0.000971

    cuts 0.001174 0.000208 0.000966

    big 0.001152 0.000187 0.000965

    economy 0.001537 0.000603 0.000934

    can 0.001707 0.000831 0.000876

    get 0.001430 0.000623 0.000807

    can't 0.001024 0.000229 0.000796

    might 0.001067 0.000312 0.000755

    percent 0.001686 0.000935 0.000751

    last 0.001323 0.000623 0.000700

    plan 0.000790 0.000104 0.000686

    won't 0.000726 0.000042 0.000684

    back 0.001195 0.000540 0.000655

    jobs 0.000896 0.000249 0.000647

    know 0.000875 0.000229 0.000646

    debt 0.000960 0.000353 0.000607

    don't 0.000982 0.000395 0.000587

    spending 0.000918 0.000332 0.000585

    deficit 0.000726 0.000145 0.000580

    year 0.001088 0.000561 0.000527

    cut 0.000726 0.000208 0.000518

    unemployment 0.000704 0.000187 0.000517

    costs 0.000598 0.000083 0.000514

    business 0.000683 0.000187 0.000496

    income 0.000662 0.000166 0.000495

    didn't 0.000619 0.000125 0.000494

    right 0.001024 0.000540 0.000484

    Table 3. Most conservatively biased words

    word liberal-total conservative-total liberal bias

    academic 0.000000 0.001662 -0.001662

    not 0.004418 0.005734 -0.001317

    students 0.000000 0.001184 -0.001184

    freedom 0.000000 0.001039 -0.001039

    war 0.000918 0.001828 -0.000911

    left 0.000171 0.001039 -0.000868

    liberal 0.000171 0.000977 -0.000806

    university 0.000000 0.000769 -0.000769

    there 0.000619 0.001288 -0.000669

    faculty 0.000000 0.000665 -0.000665

    have 0.004097 0.004716 -0.000619

    hearings 0.000021 0.000623 -0.000602

    state 0.000277 0.000873 -0.000595

    treaty 0.000043 0.000603 -0.000560

    radical 0.000149 0.000706 -0.000557

    women 0.000192 0.000748 -0.000556

    nuclear 0.000021 0.000561 -0.000540

    says 0.000299 0.000810 -0.000512

    such 0.000405 0.000914 -0.000509

    security 0.000043 0.000519 -0.000477

    did 0.000726 0.001184 -0.000459

    professors 0.000000 0.000457 -0.000457

    social 0.000299 0.000748 -0.000449

    country 0.000704 0.001143 -0.000438

    unions 0.000000 0.000436 -0.000436

    left-wing 0.000021 0.000436 -0.000415

    states 0.000107 0.000519 -0.000413

    members 0.000149 0.000561 -0.000412

    course 0.000384 0.000790 -0.000405

    illegal 0.000000 0.000395 -0.000395

    Table 4 displays the 15 key liberal words and 15 key conservative words that

    were obtained from the above data:

    Table 4. Key liberal and conservative words

    # Liberal Conservative

    1 president students

    2 tax freedom

    3 economic war

    4 cuts left

    5 economy liberal

    6 plan state

    7 jobs treaty

    8 spending radical

    9 deficit women

    10 cut nuclear

    11 unemployment security

    12 costs social

    13 business country

    14 income unions

    15 financial states

    The most liberally biased word, or the word that liberal columnists used the most in

    comparison to conservative columnists, is "president," which may stem from the fact that

    liberal articles tended to refer to Barack Obama as "President Obama," while conservative

    articles generally did not (this trend would likely be the opposite when a Republican

    president is in power). Other than this, the liberally biased words tended to relate to the

    economy ("tax," "economic," "cuts," etc.), while the conservatively biased words tended to

    relate to foreign policy ("war," "treaty," "nuclear"), to liberalism ("left," "liberal"), and to

    nationalism ("state," "country," "states"). Comparing these results with those of the CNN

    iReport (see Figures 1-3) shows that both datasets put "government" as the most popular

    politics-related word, but other than that there is very little agreement, which, as mentioned

    before, could be ascribed to the two different populations studied. The fact that Paul

    Krugman is an economist as well as a columnist may have contributed to the high usage of

    economic terms by the liberal columnists, though it couldn't have been the only factor, since

    almost all of the liberal columnists (with the occasional exceptions of Alter and/or Dowd)

    showed relatively high frequencies of use for the economic terms.

    Viewed through the theories of Orwell and Chomsky, the dataset certainly shows a prevalence

    of what Orwell described as "meaningless words." In particular, the word "freedom,"

    the second-ranked conservative key word, is one that both Orwell and Chomsky pointed

    out has no inherent meaning due to the many conflicting connotations that it could have. To

    a lesser extent, this could be said for a great many of the words on these lists: as

    Chomsky would argue, these frequency lists show that the principal goal of political

    language, even in private newspapers such as the New York Times and the Washington Post,

    is to persuade, and persuasion is accomplished with the aid of imprecise language that

    avoids the need for concrete details.

    In this part of the project I demonstrated that there are significant differences in

    word choice between liberal and conservative columnists, but I had not yet determined how

    predictable these differences were. This is the topic that I addressed in the second part of

    the project.

    Component II: Prediction of Ideology from Word Frequencies

    Methodology

    Suppose that an article is presented that is written by either a liberal columnist or a

    conservative columnist, but no other information is given aside from the text of the article

    itself. My goal in this part of the project was to write an algorithm that tried to predict

    whether a given article was liberal or conservative, based on the frequency of certain

    words. I made use of the key liberal and key conservative words that I found in the first part

    of this project and decided that the algorithm would test each of the 30 key words,

    determining the likelihood of the article being liberal or conservative based on the

    frequency of each key word and then adding the results together.

    More precisely, the script that I wrote (see guess.php in Appendix I for source code)

    functioned as follows:

    I. For each of the 30 key words,

        1. Find the frequency of the key word in the given article.

        2. Determine how many of the 10 writers tested in Component I used this word at least as often on average as in the given article.

        3. a. If there are some writers who used the word this often, find the percentage of them who are liberal. This is the percentage chance that the article has a liberal bias in terms of word frequency, based on the data from just this word.

           b. If there are no writers who used the word this often, then this article is either very liberal (if it's a liberal key word) or very conservative (if it's a conservative key word). Thus, give it either a 100% chance of being liberal or a 0% chance, appropriately, based on the data from just this word.

        4. Subtract 50% from this percentage to obtain the "bias number." If the bias is positive, the article is likely to be liberal based on this word (+0.5 bias equates to 100% chance of being liberal). If the bias is negative, the article is likely to be conservative based on this word (-0.5 bias equates to 100% chance of being conservative).

    II. Finally, take the 30 bias numbers that are calculated (one for each key word), and add them together to obtain the overall bias number for the article. In theory, this bias could range from -7.5 to +7.5 (assuming each key word pushes the score only toward its own side), and in practice it has ranged from -4.45 (almost certainly conservative) to +6.5 (almost certainly liberal).
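    The scoring procedure described in this section can be sketched as follows. This is an illustrative Python sketch rather than the guess.php script itself: the function names and data layout are hypothetical, and "used this word at least as often" is read as the writer's average proportional usage being greater than or equal to the article's.

```python
def word_bias(article_freq, writer_usage, writer_is_liberal, key_word_is_liberal):
    """Bias contribution of one key word, between -0.5 and +0.5.

    writer_usage maps writer -> average proportional usage of this word.
    writer_is_liberal maps writer -> True (liberal) or False (conservative).
    """
    # Step 2: writers who use the word at least as often as the article does.
    matching = [w for w, usage in writer_usage.items() if usage >= article_freq]
    if matching:
        # Step 3a: share of those writers who are liberal.
        liberal_share = sum(writer_is_liberal[w] for w in matching) / len(matching)
    else:
        # Step 3b: nobody uses the word this often, so the article is extreme.
        liberal_share = 1.0 if key_word_is_liberal else 0.0
    # Step 4: center the percentage on zero.
    return liberal_share - 0.5


def article_bias(article_freqs, key_words, usage_by_word, writer_is_liberal):
    """Sum the per-word biases; key_words maps word -> True for liberal key words."""
    return sum(
        word_bias(article_freqs.get(word, 0.0), usage_by_word[word],
                  writer_is_liberal, is_liberal_word)
        for word, is_liberal_word in key_words.items()
    )
```

    Note that a key word absent from the article matches all ten writers, half of whom are liberal, so it contributes exactly zero to the total.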

    From here, my work on the project consisted of testing this algorithm, the results of

    which appear in the next section.

    Results and Analysis

    To test the results of the algorithm, I ran it on each of the 100 articles I had looked at

    previously. Table 5 below shows the resulting bias score for each of the ten articles (in

    order according to the list in Appendix II) by each of the ten writers, as well as an average

    score for each writer. Figure 4 below is a bar graph of the resulting average bias scores for

    each writer.

    Table 5. Bias scores for each of the tested articles, and average scores per writer

    Alter Dowd Krugman Reich Rich Buchanan Horowitz Krauthammer Malkin Will

    +1.00 +0.33 +3.67 +6.50 +1.32 +1.00 -3.50 -1.00 -0.83 -0.2

    +1.92 +1.00 +2.50 +3.50 +2.76 -2.33 -1.25 -0.50 +1.50 -1.0

    +1.00 +1.42 +2.92 +4.50 +3.50 -3.50 -2.25 -1.08 -2.25 +0.7

    +0.00 +0.00 +2.00 +4.00 +1.31 +1.17 -3.00 -0.25 +1.00 +1.5

    +2.50 +2.50 +3.50 +1.00 +0.06 -0.83 -4.23 -0.50 +0.75 +2.0

    +3.60 -1.50 +2.50 +1.83 +2.67 +1.67 -3.00 +0.25 -1.00 +0.0

    -1.50 -0.33 +2.67 +2.50 +3.10 -1.83 -4.22 -1.00 +0.75 +0.0

    -1.00 -0.50 +2.17 +2.50 +1.55 +1.00 -2.00 +1.00 -2.25 +0.0

    +2.75 +0.67 +3.33 +3.50 +3.50 -1.00 -2.78 +0.00 -2.50 -1.5

    +1.00 +0.25 +2.42 +1.25 -2.33 +0.08 -4.45 -0.58 -2.25 +1.0

    AVG +1.13 +0.38 +2.77 +3.11 +1.74 -0.46 -3.07 -0.37 -0.71 +0.2

    Figure 4. Average bias scores for each columnist

    [Bar chart: average bias score per columnist (the AVG row of Table 5), on a horizontal axis running from -4.00 to +4.00.]

    As can be seen, the results are somewhat promising but not as clear-cut as I'd hoped

    they would be. While all but one liberal columnist has a score above +1.00, only one

    conservative columnist, Horowitz, managed a score below -1.00, with the rest floating

    close to the +0.00 line. Most troubling, the algorithm gave George Will a liberal bias score

    that's statistically indistinguishable from that of Maureen Dowd, despite their nearly

    diametrically opposite views. Looking at the table shows that the bias scores given jump

    all over the place even for articles written by the same author, apparently with minor

    fluctuations in word choice from article to article having huge repercussions. All told, 6

    liberal articles were miscategorized as conservative and 17 conservative articles were

    miscategorized as liberal, for a total of 23 mistakes or a 77% success rate, which is about

    what I expected the algorithm to be able to achieve.
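    The error tally reported in this section can be recomputed directly from the bias scores. The Python sketch below does so for three of the ten columns of Table 5 (Alter, Dowd, and Horowitz), copied as printed; the function name is hypothetical, and treating a score of exactly zero as "not miscategorized" is an assumption about how the tally was made.

```python
# Three columns of Table 5, copied as printed (ten articles per writer).
SCORES = {
    "Alter": [1.00, 1.92, 1.00, 0.00, 2.50, 3.60, -1.50, -1.00, 2.75, 1.00],
    "Dowd": [0.33, 1.00, 1.42, 0.00, 2.50, -1.50, -0.33, -0.50, 0.67, 0.25],
    "Horowitz": [-3.50, -1.25, -2.25, -3.00, -4.23, -3.00, -4.22, -2.00, -2.78, -4.45],
}
IS_LIBERAL = {"Alter": True, "Dowd": True, "Horowitz": False}


def tally(scores_by_writer, writer_is_liberal):
    """Return (errors, total articles, accuracy).

    An article is an error when the sign of its score points to the wrong side;
    a score of exactly zero is not counted as an error (assumption).
    """
    errors = total = 0
    for writer, scores in scores_by_writer.items():
        for score in scores:
            total += 1
            if writer_is_liberal[writer] and score < 0:
                errors += 1
            elif not writer_is_liberal[writer] and score > 0:
                errors += 1
    return errors, total, (total - errors) / total


if __name__ == "__main__":
    errors, total, accuracy = tally(SCORES, IS_LIBERAL)
    print(f"{errors} errors in {total} articles, accuracy {accuracy:.0%}")
```

    Applied to all ten columns, the same arithmetic gives the reported figure: 23 mistakes out of 100 articles leaves 77 correct, or 77%.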

    Why isn't the algorithm able to give more accurate results? I believe that the issue is

    that, while the differences in word choice between liberal and conservative columnists can

    be demonstrated (see Component I), they cannot necessarily be predicted in advance:

    individual authors will differ from each other in different and possibly unexpected ways,

    and word frequency count is simply not precise enough a measurement to be able to

    achieve a near-perfect accuracy.

    The fact that I could achieve 77% accuracy with such a simple

    algorithm, however, does seem to provide further evidence for the interrelation between

    political ideology and language. Nevertheless, this accuracy rating is somewhat suspect,

    since the algorithm was applied on the very same articles that were used to determine how

    it operates, which could introduce some bias. Further testing is needed to ascertain how

    useful this algorithm or its variations could be on an unpredictable sample set.
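    One possible way to run such a test, which was not part of this study, is to hold out one writer at a time: the reference frequencies are rebuilt from the other nine writers, and only the held-out writer's articles are scored. The Python sketch below shows only the splitting step; the function name and data layout are hypothetical.

```python
def leave_one_writer_out(articles_by_writer):
    """Yield (held_out_writer, training_data, test_articles) for every writer.

    articles_by_writer maps writer -> list of that writer's articles.
    The training data never contains the held-out writer's articles.
    """
    for held_out in articles_by_writer:
        training = {writer: articles
                    for writer, articles in articles_by_writer.items()
                    if writer != held_out}
        yield held_out, training, articles_by_writer[held_out]


if __name__ == "__main__":
    toy = {"Writer A": ["a1", "a2"], "Writer B": ["b1"], "Writer C": ["c1"]}
    for held_out, training, test_articles in leave_one_writer_out(toy):
        print(held_out, sorted(training), test_articles)
```

    Because each article would then be scored against data that excludes its own author, the resulting accuracy would be a fairer estimate of performance on unseen writers.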

    Conclusion

    Ultimately, both of my hypotheses came out as expected: I demonstrated in

    Component I a statistically significant correlation between political identification and word

    choice, but only managed to achieve 77% accuracy in predicting political persuasion based

    on word choice. As mentioned above, this suggests that the differences in word choice

    between liberal and conservative columnists, while present, are somewhat unpredictable.

    Furthermore, I believe that personal ideology does not entirely determine language use,

    though it is a large contributor, so it would make sense that perfect accuracy is impossible

    in this context. Finally, there is no such thing as a simple duality of political beliefs, and

    there is huge variety in what one can believe in even if one identifies as liberal or as

    conservative. In light of this, the results can be interpreted to mean that, while the 5

    liberal authors can be shown to write significantly differently from the 5 conservative

    authors, there's nothing close to complete agreement in word choice among the liberal

    authors and among the conservative authors. As the cultural linguistic concept of agency

    makes clear, membership in a group contributes to identity but does not establish identity

    (Bucholtz 422).

    If I had to do this paper again, I think the one thing that I would definitely try to do

    differently would be to include more authors, since only comparing five liberal columnists

    and five conservative columnists involves such a small sample size that unexpected bias can

    easily be introduced as a result. However, significantly increasing the sample size would

    come at a cost, as it would also make the project much more tedious and time-consuming. I

    am also interested in what would have happened if I hadn't removed proper nouns from the

    comparison: I still think that it was the right thing for my project to ignore proper nouns,

    but the results would certainly be different and notable in their own way if I had chosen to

    keep them in.

    Accepting the necessary failings of any attempt to definitively classify writing by

    political persuasion, there are still many useful avenues of research for this topic. In

    particular, if the linguistic bias detection algorithm is made a little more robust, there are

    many sources of data that it could examine and questions that it could consider: Do

    columnists who self-identify as nonpartisan still have a significant conscious or

    subconscious linguistic bias? Are presidential speeches written to appeal more to partisans

    or to centrists, judging by the bias in their word choice? How do political blog articles

    compare to editorials in terms of bias? These and many other questions could be

    investigated with such an algorithm, and could lead to interesting results.

    Works Cited

    Bucholtz, Mary. "Language, Gender, and Sexuality." Language in the USA: Themes for the

    Twenty-first Century. By Edward Finegan and John R. Rickford. Cambridge:

    Cambridge UP, 2004. Print.

    Chomsky, Noam, and Carlos Peregrín Otero. Language and Politics. Oakland, CA: AK, 2004.

    Print.

    "IReport Election Project." CNN.com. Cable News Network, 27 Oct. 2010. Web. 2 Dec. 2010.

    Kramsch, Claire J. Language and Culture. Oxford, OX: Oxford UP, 1998. Print.

    Lakoff, George. Moral Politics: How Liberals and Conservatives Think. Chicago: Univ. of

    Chicago, 2001. Print.

    Orwell, George. "Politics and the English Language." Horizon, 1946. Web.

    Appendix I: Source Code

    count.php

    {
        // Increment total word count
        $wordCount++;

        // Ignore words in ignore list
        if (!in_array($word, $ignoreList)) {
            // If the word is uppercase, make lowercase and also add to
            // uppercase frequency table
            if (ctype_upper(substr($word, 0, 1))) {
                $word = strtolower($word);
                array_key_exists($word, $uppercase) ? $uppercase[$word]++ : $uppercase[$word] = 1;
            }

            // For each word found in the frequency table, increment its value by one
            array_key_exists($word, $freqData) ? $freqData[$word]++ : $freqData[$word] = 1;
        }
    }

    // Insert a "Total" entry for total word count
    $freqData['#TOTAL#'] = $wordCount;

    // Now, we must insert results into the db

    // First, add db column for this author
    add_column_if_not_exist("research_ling_polwordfreq", $author);

    // If a word is uppercase more than 50% of the time, ignore it
    // Otherwise, add it to two columns: one for the author and one for the ideology
    foreach ($freqData as $word => $freq) {
        if (!array_key_exists($word, $uppercase) || $freq > (2 * $uppercase[$word])) {
            $query = "INSERT INTO `research_ling_polwordfreq` (`word`, `$author`, `$ideology-total`) VALUES('$word', $freq, $freq) ON DUPLICATE KEY UPDATE `$author` = `$author` + $freq, `$ideology-total` = `$ideology-total` + $freq";
            $result = mysql_query($query);
            echo "$query ... $result\n";
        }
    }
    echo "DONE";

    // Function for adding a column only if it doesn't already exist
    function add_column_if_not_exist($db, $column, $column_attr = "INT NOT NULL")
    {
        $exists = false;
        $columns = mysql_query("show columns from $db");
        while ($c = mysql_fetch_assoc($columns)) {
            if ($c['Field'] == $column) {
                $exists = true;
                break;
            }
        }

        if (!$exists) {
            $query = "ALTER TABLE `$db` ADD `$column` $column_attr";
            $result = mysql_query($query);
            echo "$query ... $result\n";
        }
    }

    ?>
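
    The capitalization rule used by count.php to discard likely proper nouns can be tried in isolation. The following is a minimal sketch, not the original script: filter_proper_nouns() and the sample token list are hypothetical names and data introduced only for illustration, the ignore list and the database step are left out, and PHP 7.4 or later is assumed.

```php
<?php
// Illustrative sketch only: filter_proper_nouns() is a hypothetical helper that
// applies the same capitalization rule as count.php to an array of tokens.
function filter_proper_nouns(array $tokens): array
{
    $freq = [];   // lowercase word => total occurrences
    $upper = [];  // lowercase word => occurrences that began with a capital letter

    foreach ($tokens as $token) {
        if ($token === '') {
            continue; // skip empty tokens
        }
        $word = strtolower($token);
        if (ctype_upper($token[0])) {
            $upper[$word] = ($upper[$word] ?? 0) + 1;
        }
        $freq[$word] = ($freq[$word] ?? 0) + 1;
    }

    // Keep a word only when its total count exceeds twice its capitalized count,
    // i.e. when it is capitalized in fewer than half of its occurrences.
    return array_filter(
        $freq,
        fn($count, $word) => $count > 2 * ($upper[$word] ?? 0),
        ARRAY_FILTER_USE_BOTH
    );
}

// Hypothetical sample input: "The" is capitalized once out of three uses,
// "Obama" is capitalized every time.
$tokens = ['The', 'deficit', 'grew', 'while', 'the', 'the', 'Obama', 'plan', 'stalled', 'Obama'];
print_r(filter_proper_nouns($tokens));
```

    In this sample, "the" survives because only one of its three occurrences is capitalized, while "obama" is dropped. Note that with a strict greater-than comparison, a word capitalized in exactly half of its occurrences is also dropped, which is slightly stricter than the "more than 50%" wording of the comment in count.php.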

    guess.php

        array(0.000128, 0.000124, 0.002339, 0.003072, 0.001303),
        array(0.000850, 0.000111, 0.000537, 0.000283, 0.000541));

    $bias += test_word("cuts", 1, $freqData, $wordCount,
        array(0.000639, 0.000248, 0.001969, 0.001469, 0.001368),
        array(0.000728, 0.000000, 0.000000, 0.000425, 0.000135));

    $bias += test_word("economy", 1, $freqData, $wordCount,
        array(0.000895, 0.000248, 0.003200, 0.004408, 0.000261),
        array(0.001214, 0.000111, 0.000939, 0.000425, 0.000946));

    $bias += test_word("plan", 1, $freqData, $wordCount,
        array(0.001278, 0.000743, 0.001723, 0.000401, 0.000261),
        array(0.000121, 0.000000, 0.000000, 0.000566, 0.000000));

    $bias += test_word("jobs", 1, $freqData, $wordCount,
        array(0.000511, 0.000248, 0.000615, 0.001870, 0.001107),
        array(0.000243, 0.000111, 0.000000, 0.000425, 0.000676));

    $bias += test_word("spending", 1, $freqData, $wordCount,
        array(0.000383, 0.000619, 0.001600, 0.001603, 0.000651),
        array(0.000486, 0.000000, 0.000134, 0.000283, 0.001216));

    $bias += test_word("deficit", 1, $freqData, $wordCount,
        array(0.000128, 0.000124, 0.001723, 0.001870, 0.000261),
        array(0.000607, 0.000000, 0.000000, 0.000000, 0.000270));

    $bias += test_word("cut", 1, $freqData, $wordCount,
        array(0.000383, 0.000000, 0.000369, 0.002271, 0.000717),
        array(0.000607, 0.000000, 0.000134, 0.000425, 0.000135));

    $bias += test_word("unemployment", 1, $freqData, $wordCount,
        array(0.000383, 0.000000, 0.001600, 0.001202, 0.000521),
        array(0.000243, 0.000000, 0.000537, 0.000142, 0.000270));

    $bias += test_word("costs", 1, $freqData, $wordCount,
        array(0.001534, 0.000000, 0.000615, 0.001336, 0.000065),
        array(0.000000, 0.000000, 0.000000, 0.000425, 0.000135));

    $bias += test_word("business", 1, $freqData, $wordCount,
        array(0.000383, 0.000124, 0.000123, 0.002004, 0.000782),
        array(0.000121, 0.000111, 0.000134, 0.000708, 0.000000));

    $bias += test_word("income", 1, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000492, 0.002004, 0.000782),
        array(0.000121, 0.000000, 0.000134, 0.000849, 0.000000));

    $bias += test_word("financial", 1, $freqData, $wordCount,
        array(0.000767, 0.000248, 0.000739, 0.000267, 0.001172),
        array(0.000243, 0.000056, 0.000671, 0.000425, 0.000135));

    $bias += test_word("states", 0, $freqData, $wordCount,
        array(0.000256, 0.000000, 0.000000, 0.000267, 0.000065),
        array(0.000364, 0.000111, 0.000402, 0.001840, 0.000541));

    $bias += test_word("unions", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
        array(0.000000, 0.000612, 0.000000, 0.000849, 0.000541));

    $bias += test_word("country", 0, $freqData, $wordCount,
        array(0.000895, 0.000867, 0.000492, 0.000000, 0.000977),
        array(0.002064, 0.001113, 0.001342, 0.000566, 0.000541));

    $bias += test_word("social", 0, $freqData, $wordCount,
        array(0.000000, 0.000619, 0.000123, 0.000000, 0.000521),
        array(0.000728, 0.000779, 0.000402, 0.000708, 0.001081));

    $bias += test_word("security", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000123, 0.000134, 0.000000),
        array(0.000607, 0.000167, 0.000805, 0.001274, 0.000270));

    $bias += test_word("nuclear", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000000, 0.000000, 0.000065),
        array(0.001700, 0.000056, 0.000939, 0.000000, 0.000676));

    $bias += test_word("women", 0, $freqData, $wordCount,
        array(0.000000, 0.000867, 0.000123, 0.000134, 0.000000),
        array(0.000121, 0.001502, 0.000268, 0.000566, 0.000270));

    $bias += test_word("radical", 0, $freqData, $wordCount,
        array(0.000000, 0.000248, 0.000123, 0.000000, 0.000261),
        array(0.000121, 0.001613, 0.000268, 0.000283, 0.000000));

    $bias += test_word("treaty", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000246, 0.000000, 0.000000),
        array(0.001821, 0.000000, 0.000805, 0.000000, 0.001081));

    $bias += test_word("state", 0, $freqData, $wordCount,
        array(0.000256, 0.000372, 0.000246, 0.000534, 0.000130),
        array(0.000850, 0.000668, 0.000671, 0.001274, 0.001216));

    $bias += test_word("liberal", 0, $freqData, $wordCount,
        array(0.000256, 0.000124, 0.000000, 0.000134, 0.000261),
        array(0.000607, 0.001335, 0.000537, 0.001415, 0.000541));

    $bias += test_word("left", 0, $freqData, $wordCount,
        array(0.000128, 0.000248, 0.000246, 0.000000, 0.000195),
        array(0.000607, 0.002059, 0.000671, 0.000425, 0.000000));

    $bias += test_word("war", 0, $freqData, $wordCount,
        array(0.000895, 0.000743, 0.000000, 0.000000, 0.001954),
        array(0.004128, 0.001892, 0.001744, 0.000708, 0.000270));

    $bias += test_word("freedom", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
        array(0.000243, 0.002615, 0.000000, 0.000142, 0.000000));

    $bias += test_word("students", 0, $freqData, $wordCount,
        array(0.000000, 0.000000, 0.000000, 0.000000, 0.000000),
        array(0.000121, 0.002782, 0.000000, 0.000849, 0.000000));

    // Compares a key word frequency with that of the 5 liberal and 5 conservative writers,
    // and returns a corresponding bias score (higher = more liberal, lower = more conservative)
    function test_word($word, $is_lib_keyword, $freqData, $wordCount, $lib_freqs, $con_freqs)
    {
        if (array_key_exists($word, $freqData)) {
            $freq = $freqData[$word] / $wordCount;
        } else {
            $freq = 0;
        }

        $libs = 0;
        $cons = 0;
        foreach ($lib_freqs as $lib_f) {
            if ($lib_f >= $freq) {
                $libs++;
            }
        }
        foreach ($con_freqs as $con_f) {
            if ($con_f >= $freq) {
                $cons++;
            }
        }

        if ($libs == 0 && $cons == 0) {
            // either "more liberal" or "more conservative" than any of the columnists,
            // so we'll give it either a +0.5 or a -0.5 (maximum for a single test)
            return ($is_lib_keyword - .5);
        } else {
            // What is the chance that an article using the given word this many times is liberal?
            // Find the proportion of writers with this word freq or higher that are liberal.
            // (Then subtract 50% to get a score between +0.5 and -0.5)
            return ($libs / ($libs + $cons) - .5);
        }
    }

    echo("On a scale of -7.5 (most conservative) to +7.5 (most liberal), this text has $bias bias.");

    ?>
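
    To make the scoring rule in test_word concrete, the sketch below re-implements the same comparison and applies it to the "deficit" row of the guess.php listing. It is an illustration under stated assumptions rather than the original code: score_keyword() is a hypothetical name, the sample document frequencies are invented, and PHP 7.4 or later is assumed.

```php
<?php
// Illustrative sketch only: score_keyword() is a hypothetical stand-in that
// mirrors the comparison logic of test_word() from guess.php.
function score_keyword(float $docFreq, bool $isLibKeyword, array $libFreqs, array $conFreqs): float
{
    // Count the columnists on each side who use the word at least as often as the document.
    $libs = count(array_filter($libFreqs, fn($f) => $f >= $docFreq));
    $cons = count(array_filter($conFreqs, fn($f) => $f >= $docFreq));

    if ($libs === 0 && $cons === 0) {
        // The document uses the word more than every columnist: maximum score for one test.
        return $isLibKeyword ? 0.5 : -0.5;
    }

    // Share of the matching columnists who are liberal, recentered to the range -0.5..+0.5.
    return $libs / ($libs + $cons) - 0.5;
}

// "deficit" frequencies copied from the guess.php listing (liberal, then conservative).
$lib = [0.000128, 0.000124, 0.001723, 0.001870, 0.000261];
$con = [0.000607, 0.000000, 0.000000, 0.000000, 0.000270];

// Hypothetical documents with different relative frequencies of "deficit".
echo score_keyword(0.0,   true, $lib, $con), "\n"; // word absent: all ten columnists match
echo score_keyword(0.001, true, $lib, $con), "\n"; // two liberals match, no conservatives
echo score_keyword(0.005, true, $lib, $con), "\n"; // more frequent than every columnist
```

    With these inputs an absent word scores 0, because every columnist's frequency is at least zero, so a missing keyword neither helps nor hurts. A frequency of 0.001 scores +0.5, since the only columnists who use "deficit" that often are liberal, and 0.005 also scores +0.5 through the fallback branch.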

    o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/11/07/IN4J1G5VII.DTL

    o http://online.wsj.com/article/SB10001424052702304173704575578200086257706.html

    o http://www.salon.com/news/politics/2010_elections/index.html?story=/news/feature/2010/10/25/why_democrats_move_to_the_center

    o http://www.huffingtonpost.com/robert-reich/the-secret-bigmoney-takeo_b_754938.html

    o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/10/03/INC41FL1DM.DTL

    o http://www.huffingtonpost.com/robert-reich/republican-economics-as-s_b_739654.html

    o http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2010/09/26/INTI1FHHQQ.DTL

    o http://www.salon.com/news/feature/2010/09/21/stimulus_not_enough/index.html

    Frank Rich

    o http://www.nytimes.com/2010/11/28/opinion/28rich.html

    o http://www.nytimes.com/2010/11/14/opinion/14rich.html

    o http://www.nytimes.com/2010/11/07/opinion/07rich.html?_r=1

    o http://www.nytimes.com/2010/10/24/opinion/24rich.html?_r=1&ref=opinion

    o http://www.nytimes.com/2010/10/10/opinion/10rich.html?_r=1&ref=opinion

    o http://www.nytimes.com/2010/10/03/opinion/03rich.html?_r=1

    o http://www.nytimes.com/2010/09/12/opinion/12rich.html?_r=1&ref=opinion

    o http://www.nytimes.com/2010/08/29/opinion/29rich.html?_r=1&ref=opinion

    o http://www.nytimes.com/2010/08/08/opinion/08rich.html?_r=1&ref=opinion

    o http://www.nytimes.com/2010/08/01/opinion/01rich.html?_r=1

    Conservative Columnists

    Pat Buchanan

    o http://www.realclearpolitics.com/articles/2010/11/30/european_union_rip_108087.html

    o http://www.realclearpolitics.com/articles/2010/11/26/why_are_we_still_in_korea_108069.html

    o http://www.realclearpolitics.com/articles/2010/11/23/is_gop_risking_a_new_cold_war_108035.html

    o http://www.realclearpolitics.com/articles/2010/11/19/who_fed_the_tiger_108001.html

    o http://www.realclearpolitics.com/articles/2010/11/16/tea_partys_winning_hand_107963.html

    o http://www.realclearpolitics.com/articles/2010/11/12/the_fed_trashes_the_dollar_107928.html

    o http://www.realclearpolitics.com/articles/2010/11/09/the_murderers_of_christianity_107884.html

    o http://www.realclearpolitics.com/articles/2010/11/05/has_history_passed_obama_by_107847.html

    o http://www.realclearpolitics.com/articles/2010/11/02/broders_brainstorm_107802.html

    o http://www.realclearpolitics.com/articles/2010/10/29/we_are_in_uncharted_waters.html

    David Horowitz

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36385

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36267

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36236

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=36189

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35156

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=35117

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34689

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34836

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34790

    o http://archive.frontpagemag.com/readArticle.aspx?ARTID=34348

    Charles Krauthammer

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/25/AR2010112502232.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/18/AR2010111804494.html

    o http://www.nationalreview.com/articles/253121/why-obama-right-about-india-charles-krauthammer

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/04/AR2010110406581.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/28/AR2010102806270.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/21/AR2010102104856.html?hpid=opinionsbox1

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/10/14/AR2010101405234.html

    o http://articles.ocregister.com/2010-10-07/opinion/24649228_1_debt-problem-national-debt-democrats

    o http://www.nationalreview.com/articles/248433/why-he-sending-them-charles-krauthammer

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/09/23/AR2010092304746.html

    Michelle Malkin

    o http://www.realclearpolitics.com/articles/2010/12/01/the_littlest_victims_of_obamacare_108102.html

    o http://www.realclearpolitics.com/articles/2010/11/24/giving_thanks_for_american_ingenuity_108048.html

    o http://www.realclearpolitics.com/articles/2010/11/19/ray_lahood_obamas_power-mad_cell_phone_czar_108007.html

    o http://www.realclearpolitics.com/articles/2010/11/19/dude_wheres_my_obamacare_waiver_107978.html

    o http://www.realclearpolitics.com/articles/2010/11/12/throw_carol_browner_under_the_bus_107934.html

    o http://www.realclearpolitics.com/articles/2010/11/10/no_illegal_alien_pilot_left_behind_107899.html

    o http://www.realclearpolitics.com/articles/2010/11/05/voters_speak_no_to_soak-the-rich_schemes_107848.html

    o http://www.realclearpolitics.com/articles/2010/10/29/standing_tall_the_rise_and_resilience_of_conservative_women_107768.html

    o http://www.realclearpolitics.com/articles/2010/10/27/the_lefts_voter_fraud_whitewash_107740.html

    o http://www.realclearpolitics.com/articles/2010/10/22/free_the_taxpayers_defund_state-sponsored_media_107676.html

    George Will

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/12/01/AR2010120104728.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/26/AR2010112603490.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/24/AR2010112405841.html

    o http://www.newsweek.com/2010/11/20/will-a-senator-looks-back-to-the-future.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/17/AR2010111705316.html

    o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_709095.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/12/AR2010111204494.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/10/AR2010111005499.html

    o http://www.washingtonpost.com/wp-dyn/content/article/2010/11/03/AR2010110303844.html

    o http://www.pittsburghlive.com/x/pittsburghtrib/opinion/s_706364.html