Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech Matthew Gentzkow, Stanford and NBER Jesse M. Shapiro, Brown and NBER Matt Taddy, Microsoft and Chicago Booth

Page 1: Measuring Polarization in High-Dimensional Data

Measuring Polarization in High-Dimensional Data: Method and Application to Congressional Speech

Matthew Gentzkow, Stanford and NBER
Jesse M. Shapiro, Brown and NBER
Matt Taddy, Microsoft and Chicago Booth

Page 2: Measuring Polarization in High-Dimensional Data

Tax Relief

DEATH TAX

freedom fighters

illegal alien

terrorists

ESTATE TAX

Tax Breaks

undocumented worker

Wealthiest

1 percent

living wage

fair labor

capitalist

African American

Pro choice

equality

Tax Freedom

War on Terror

Pro life

Big Government

entrepreneurs

Right to life

Washington takeover

Welfare Queens

Page 3: Measuring Polarization in High-Dimensional Data

Origins

Dr. Frank I. Luntz – The Language of Healthcare 2009

THE LANGUAGE OF HEALTHCARE 2009

THE 10 RULES FOR STOPPING THE “WASHINGTON TAKEOVER” OF HEALTHCARE

(1) Humanize your approach. Abandon and exile ALL references to the “healthcare system.” From now on, healthcare is about people. Before you speak, think of the three components of tone that matter most: Individualize. Personalize. Humanize.

(2) Acknowledge the “crisis” or suffer the consequences. If you say there is no healthcare crisis, you give your listener permission to ignore everything else you say. It is a credibility killer for most Americans. A better approach is to define the crisis in your terms. “If you’re one of the millions who can’t afford healthcare, it is a crisis.” Better yet, “If some bureaucrat puts himself between you and your doctor, denying you exactly what you need, that’s a crisis.” And the best: “If you have to wait weeks for tests and months for treatment, that’s a healthcare crisis.”

(3) “Time” is the government healthcare killer. As Mick Jagger once sang, “Time is on Your Side.” Nothing else turns people against the government takeover of healthcare than the realistic expectation that it will result in delayed and potentially even denied treatment, procedures and/or medications. “Waiting to buy a car or even a house won’t kill you. But waiting for the healthcare you need – could. Delayed care is denied care.”

(4) The arguments against the Democrats’ healthcare plan must center around “politicians,” “bureaucrats,” and “Washington” … not the free market, tax incentives, or competition. Stop talking economic theory and start personalizing the impact of a government takeover of healthcare. They don’t want to hear that you’re opposed to government healthcare because it’s too expensive (any help from the government to lower costs will be embraced) or because it’s anti-competitive (they don’t know about or care about current limits to competition). But they are deathly afraid that a government takeover will lower their quality of care – so they are extremely receptive to the anti-Washington approach. It’s not an economic issue. It’s a bureaucratic issue.

(5) The healthcare denial horror stories from Canada & Co. do resonate, but you have to humanize them. You’ll notice we recommend the phrase “government takeover” rather than “government run” or “government controlled” It’s because too many politician say “we don’t want a government run healthcare system like Canada or Great Britain” without explaining those consequences. There is a better approach. “In countries with government run healthcare, politicians make YOUR healthcare decisions. THEY decide if you’ll get the procedure you need, or if you are disqualified because the treatment is too expensive or because you are too old. We can’t have that in America.”

Page 4: Measuring Polarization in High-Dimensional Data

Example: Social Security

• Luntz (2006):
  • “Never say ‘privatization / private accounts.’ Instead say ‘personalization / personal accounts.’ Two-thirds of America want to personalize security while only one third would privatize it. Why? [Personalization] suggests ownership and control... while [privatization] suggests a profit motive and winners and losers.”

Page 5: Measuring Polarization in High-Dimensional Data

Example: Social Security

• 2005 Congress

                        Rep   Dem
  “personal account”    184    48
  “private account”       5   542

• Media coverage, 6/23/05
  • “House GOP offers plan for Social Security; Bush’s private accounts would be scaled back” (Washington Post)
  • “GOP backs use of Social Security surplus; Finds funding for personal accounts” (Washington Times)

Page 7: Measuring Polarization in High-Dimensional Data

Is partisan speech a new phenomenon?

Page 8: Measuring Polarization in High-Dimensional Data

This Paper

• Goal: Measure trends in partisanship of political speech
• Data: US Congressional Record, 1873-2009
• Challenge: Speech is high-dimensional choice data
  • Potential for severe finite-sample bias
  • Computation can be difficult
• Solution: Structural estimation with machine-learning methods
• Approach exportable to other contexts (e.g. web browsing, residential segregation)

Page 9: Measuring Polarization in High-Dimensional Data

Literature

• Polarization in Congress
  • E.g., Poole & Rosenthal (1984, 1997); McCarty et al. (2006)
• Polarization more broadly
  • E.g., Fiorina et al. (2006); Fiorina & Abrams (2006); Abramowitz & Saunders (2008)
• Congressional speech
  • E.g., Grimmer (2010, 2013); Quinn et al. (2010)
  • Jensen et al. (2012)

Page 10: Measuring Polarization in High-Dimensional Data

Data

Page 11: Measuring Polarization in High-Dimensional Data

Data

• US Congressional Record, 1873-2009
• Use automated script to identify speaker and tag with metadata
• Use some rules of thumb to remove procedural phrases
  • “I yield the remainder of my time...”
• Turn into counts of two-word phrases after stemming and removing stopwords
  • “war on terrorism” and “war on terror” become “war terror”
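The phrase-count step can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the stopword list and the suffix-stripping `crude_stem` helper are hypothetical stand-ins for a full stopword list and a real stemmer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def crude_stem(word):
    """Toy stand-in for a real stemmer: strip one of a few common suffixes."""
    for suffix in ("ism", "ing", "ed", "s"):
        # Only strip when at least four characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def bigram_counts(text):
    """Lowercase, drop stopwords, stem, then count adjacent two-word phrases."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    return Counter(" ".join(pair) for pair in zip(stems, stems[1:]))


# Both inputs collapse to the single phrase "war terror".
print(bigram_counts("war on terrorism"))
print(bigram_counts("war on terror"))
```

Stopwords are removed before pairing, so words that were separated only by a stopword end up adjacent in the resulting phrase.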

Page 12: Measuring Polarization in High-Dimensional Data

Trends in Verbosity

[Figure: total utterances per speaker by year, 1880-2000; vertical axis ticks at 5000, 10000, 15000.]

Page 13: Measuring Polarization in High-Dimensional Data

Model

Page 14: Measuring Polarization in High-Dimensional Data

Statistical Model

• Vector of phrase counts $c_{it}$ for members $i$
• Party affiliation $P(i) \in \{R, D\}$
• Speaker characteristics $x_{it}$
• Verbosity $m_{it} = \sum_j c_{ijt}$
• Assume throughout that

$$c_{it} \sim \mathrm{MN}\left(m_{it},\; q_t^{P(i)}(x_{it})\right)$$
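To make the sampling assumption concrete, the following sketch draws phrase counts from the multinomial model for a toy session. The six-phrase vocabulary, the probability vectors `q_R` and `q_D`, and the verbosity values are invented for illustration, and speaker characteristics are held constant so that $q$ depends only on party.

```python
import numpy as np


def simulate_counts(m, q, rng):
    """Draw one phrase-count vector c_i ~ MN(m_i, q) for each speaker i."""
    return np.vstack([rng.multinomial(m_i, q) for m_i in m])


rng = np.random.default_rng(0)

# Hypothetical phrase probabilities for each party (each sums to 1).
q_R = np.array([0.30, 0.25, 0.20, 0.15, 0.05, 0.05])
q_D = np.array([0.05, 0.05, 0.15, 0.20, 0.25, 0.30])

# Hypothetical verbosity m_i: total phrases spoken by each member.
m_R = np.array([40, 120, 80])
m_D = np.array([60, 100])

c_R = simulate_counts(m_R, q_R, rng)  # shape (3 speakers, 6 phrases)
c_D = simulate_counts(m_D, q_D, rng)  # shape (2 speakers, 6 phrases)
```

Each row of `c_R` and `c_D` sums to that speaker's verbosity, matching $m_{it} = \sum_j c_{ijt}$.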

Page 15: Measuring Polarization in High-Dimensional Data

Question

• How different are choices of R and D at each $t$?
  • Translation: how different are $q_t^R(\cdot)$ and $q_t^D(\cdot)$?
• Approach: measure partisanship by diagnosticity
  • How much can I learn about your party from what you say?

Page 16: Measuring Polarization in High-Dimensional Data

Posteriors

• Posterior belief of an observer with a neutral prior after hearing phrase $j$

$$\rho_{jt}(x) = \frac{q_{jt}^R(x)}{q_{jt}^R(x) + q_{jt}^D(x)}$$

• Posterior that the observer expects to assign to the speaker’s true party

$$\pi_t(x) = \frac{1}{2}\, q_t^R(x)' \cdot \rho_t(x) + \frac{1}{2}\, q_t^D(x)' \cdot \left(1 - \rho_t(x)\right)$$
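A short numerical sketch of the two posteriors may help. The function names are hypothetical, and the convention of setting $\rho_{jt} = 1/2$ for a phrase that neither party ever uses is an assumption made here to avoid dividing by zero.

```python
import numpy as np


def posterior_republican(q_R, q_D):
    """rho_j = q_R[j] / (q_R[j] + q_D[j]); set to 0.5 where both are zero."""
    q_R = np.asarray(q_R, dtype=float)
    q_D = np.asarray(q_D, dtype=float)
    total = q_R + q_D
    return np.divide(q_R, total, out=np.full_like(total, 0.5), where=total > 0)


def expected_posterior(q_R, q_D):
    """pi = 0.5 * q_R' rho + 0.5 * q_D' (1 - rho)."""
    q_R = np.asarray(q_R, dtype=float)
    q_D = np.asarray(q_D, dtype=float)
    rho = posterior_republican(q_R, q_D)
    return 0.5 * q_R @ rho + 0.5 * q_D @ (1.0 - rho)


# Identical phrase distributions: speech is uninformative.
print(expected_posterior([0.5, 0.5], [0.5, 0.5]))  # 0.5
# Disjoint vocabularies: one phrase fully reveals the party.
print(expected_posterior([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

The two printed cases are the extremes of the measure: 1/2 when the parties speak identically and 1 when they share no phrases.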

Page 18: Measuring Polarization in High-Dimensional Data

Measure of Partisanship

$$\pi_t = \frac{1}{N_t} \sum_i \pi_t(x_{it})$$

• Between $\tfrac{1}{2}$ (speech uninformative) and 1 (speech fully revealing)

• Close cousin of isolation (White 1986, Cutler et al 1999)

Page 19: Measuring Polarization in High-Dimensional Data

Estimation

Page 20: Measuring Polarization in High-Dimensional Data

Plug-In Estimator

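• Empirical analogues

$$\hat q_{jt}^P = \frac{\sum_{i \in P} c_{ijt}}{\sum_{i \in P} m_{it}}$$

$$\hat\rho_{jt} = \frac{\hat q_{jt}^R}{\hat q_{jt}^R + \hat q_{jt}^D}$$

$$\hat\pi_t^{PLUGIN} = \frac{1}{2}\,(\hat q_t^R)'\hat\rho_t + \frac{1}{2}\,(\hat q_t^D)'(1 - \hat\rho_t)$$

• This is the MLE when $x_{it}$ is constant
• Consistent as quantity of speech grows large holding size of vocabulary fixed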
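The finite-sample problem with the plug-in estimator can be illustrated with a small simulation. In this sketch both parties draw from the same uniform phrase distribution, so true partisanship is exactly 1/2. The vocabulary size, speaker counts, and verbosity are arbitrary choices made for illustration, not values from the data.

```python
import numpy as np


def plugin_partisanship(c_R, c_D):
    """Plug-in estimate from speaker-by-phrase count matrices for each party."""
    q_R = c_R.sum(axis=0) / c_R.sum()  # party-level phrase frequencies
    q_D = c_D.sum(axis=0) / c_D.sum()
    total = q_R + q_D
    # Phrases never spoken by either party get rho = 0.5; they carry zero weight.
    rho = np.divide(q_R, total, out=np.full_like(total, 0.5), where=total > 0)
    return 0.5 * q_R @ rho + 0.5 * q_D @ (1.0 - rho)


rng = np.random.default_rng(0)
J, n, m = 5000, 50, 100  # phrases, speakers per party, phrases per speaker
q = np.full(J, 1.0 / J)  # same distribution for both parties: true pi = 0.5

c_R = rng.multinomial(m, q, size=n)
c_D = rng.multinomial(m, q, size=n)
pi_null = plugin_partisanship(c_R, c_D)
print(pi_null)  # well above 0.5 even though the parties speak identically
```

With many more phrases than observations per phrase, sampling noise alone makes each party's speech look distinctive, so the plug-in estimate sits well above the true value.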

Page 21: Measuring Polarization in High-Dimensional Data

Maximum Likelihood Estimator

[Figure: average partisanship by year, 1870-2010; vertical axis 0.54 to 0.64. Series: real.]

Page 22: Measuring Polarization in High-Dimensional Data

Maximum Likelihood Estimator

[Figure: average partisanship by year, 1870-2010; vertical axis 0.54 to 0.64. Series: random, real.]

Page 23: Measuring Polarization in High-Dimensional Data

Bias

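$$E\left[(\hat q_t^R)'\hat\rho_t - (q_t^R)'\rho_t\right] = (q_t^R)'\,E\left(\hat\rho_t - \rho_t\right) + \mathrm{Cov}\left[(\hat q_t^R - q_t^R)',\; (\hat\rho_t - \rho_t)\right]$$

• $\hat q_t^P$ is unbiased for $q_t^P$
• First term non-zero because $\hat\rho_t$ is a non-linear function of $\hat q_t^P$
• Second term non-zero because $\hat\rho_t$ is an increasing function of $\hat q_t^R$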

Page 24: Measuring Polarization in High-Dimensional Data

Jensen et al. (2012)

[Figure: standardized polarization by year, 1870-2010; vertical axis -1 to 3. Series: random, real.]

Page 25: Measuring Polarization in High-Dimensional Data

Restrict to Commonly Occurring Phrases?

[Figure: eight panels of average partisanship by year, 1870-2010, each comparing random and real series. Left column: top 90, 50, 10, and 1 percent of phrases. Right column: phrases spoken more than 5, 20, 100, and 500 times.]

Page 26: Measuring Polarization in High-Dimensional Data

Leave-Out Estimator

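• Define $\hat\rho_{-i,t}$ which leaves out $i$
• Define

$$\hat\pi_t^{LOE} = \frac{1}{2}\,\frac{1}{|R_t|} \sum_{i \in R_t} \hat q_{i,t}' \cdot \hat\rho_{-i,t} + \frac{1}{2}\,\frac{1}{|D_t|} \sum_{i \in D_t} \hat q_{i,t}' \cdot \left(1 - \hat\rho_{-i,t}\right)$$

• Enforces independence of $\hat q$ and $\hat\rho$
• Still biased because of non-linear $\hat\rho$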
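A sketch of the leave-out estimator follows, under the same null simulation idea (both parties drawing from one uniform distribution, so true partisanship is 1/2). The function name, the simulation sizes, and the choice to set $\hat\rho = 1/2$ for phrases with no remaining observations are assumptions for illustration. The code also assumes every speaker has positive verbosity and each party has at least two speakers.

```python
import numpy as np


def leave_out_partisanship(c_R, c_D):
    """Leave-out estimate: rho for speaker i is computed without i's own speech."""
    C_R, C_D = c_R.sum(axis=0), c_D.sum(axis=0)  # party totals per phrase
    M_R, M_D = c_R.sum(), c_D.sum()              # party total verbosity

    def rho_from(q_R, q_D):
        total = q_R + q_D
        return np.divide(q_R, total, out=np.full_like(total, 0.5), where=total > 0)

    terms_R = []
    for c_i in c_R:
        m_i = c_i.sum()
        rho = rho_from((C_R - c_i) / (M_R - m_i), C_D / M_D)
        terms_R.append((c_i / m_i) @ rho)          # q_hat_i' rho_{-i}

    terms_D = []
    for c_i in c_D:
        m_i = c_i.sum()
        rho = rho_from(C_R / M_R, (C_D - c_i) / (M_D - m_i))
        terms_D.append((c_i / m_i) @ (1.0 - rho))  # q_hat_i' (1 - rho_{-i})

    return 0.5 * np.mean(terms_R) + 0.5 * np.mean(terms_D)


rng = np.random.default_rng(0)
J, n, m = 5000, 50, 100
q = np.full(J, 1.0 / J)
c_R = rng.multinomial(m, q, size=n)
c_D = rng.multinomial(m, q, size=n)
pi_loe = leave_out_partisanship(c_R, c_D)
```

Because a speaker's own counts no longer enter the $\hat\rho$ they are scored against, the estimate under this null should land close to 1/2 rather than far above it.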

Page 27: Measuring Polarization in High-Dimensional Data

Leave-Out Estimator

[Figure: average partisanship by year, 1870-2010; vertical axis 0.500 to 0.525. Series: random, real.]

• Controlling bias
  • Add lasso type penalty to likelihood
  • Shrinks $\hat\rho_{jt}$ toward $\tfrac{1}{2}$
• Making computation feasible
  • Approximate likelihood with Poisson
  • Allows distributed computing (Taddy 2015)
• Controlling for confounds ($x_{it}$)
  • geography, chamber, gender, indicator for being in majority party
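The shrinkage idea can be illustrated with a heavily simplified toy, which is not the authors' estimator. Assume party is the only covariate and model each phrase count as $c_{ij} \sim \mathrm{Poisson}\left(m_i \exp(\alpha_j + \varphi_j 1\{i \in R\})\right)$ with an unnormalized penalty $\lambda |\varphi_j|$ on the party loading. Under those assumptions each phrase can be solved on its own (which is what makes the problem easy to distribute), and the first-order conditions give a closed form that soft-thresholds the party totals. The function name, the model simplification, and the closed form are all part of this illustration.

```python
import numpy as np


def lasso_poisson_rho(c_R, c_D, lam):
    """Per-phrase rho implied by an L1-penalized Poisson fit with a party dummy.

    Toy case only: one intercept and one penalized party loading per phrase,
    offset log(m_i), no other covariates. Returns sigmoid(phi_hat_j).
    """
    C_R = c_R.sum(axis=0).astype(float)
    C_D = c_D.sum(axis=0).astype(float)
    M_R, M_D = C_R.sum(), C_D.sum()
    C, M = C_R + C_D, M_R + M_D

    # Score of the unpenalized objective at phi_j = 0 (intercept profiled out).
    score = C_R - M_R * C / M
    # Loading stays at zero unless the score exceeds the penalty in magnitude.
    s = np.where(score > lam, 1.0, np.where(score < -lam, -1.0, 0.0))

    rate_R = (C_R - lam * s) / M_R  # fitted Republican rate exp(alpha + phi)
    rate_D = (C_D + lam * s) / M_D  # fitted Democratic rate exp(alpha)
    rho = np.full_like(rate_R, 0.5)
    active = s != 0
    rho[active] = rate_R[active] / (rate_R[active] + rate_D[active])
    return rho


c_R = np.array([[30, 10, 10]])
c_D = np.array([[10, 10, 30]])
print(lasso_poisson_rho(c_R, c_D, lam=0.0))   # [0.75  0.5   0.25]
print(lasso_poisson_rho(c_R, c_D, lam=5.0))   # [0.625 0.5   0.375]
print(lasso_poisson_rho(c_R, c_D, lam=20.0))  # [0.5 0.5 0.5]
```

With no penalty the result matches the plug-in $\hat\rho_{jt}$. As $\lambda$ grows, each $\hat\rho_{jt}$ moves toward 1/2 and eventually equals it, which is the shrinkage described in the bullets above.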

Page 28: Measuring Polarization in High-Dimensional Data

Main Results

Page 29: Measuring Polarization in High-Dimensional Data

Baseline Specification

[Figure: average partisanship by year, 1870-2010; vertical axis 0.500 to 0.525. Series: real, random.]

Page 30: Measuring Polarization in High-Dimensional Data

Magnitude

[Figure: expected posterior (0.5 to 1.0) against number of phrases (0 to 100), with a marker at one minute of speech. Series: 1873-1874, 1989-1990, 2007-2008.]

Page 31: Measuring Polarization in High-Dimensional Data

Comparison: Roll Call Votes

[Figure: average partisanship (speech) and distance between parties (roll-call voting) by year, 1870-2010; two vertical axes, 0.500 to 0.525 and 0.50 to 0.85.]

Page 32: Measuring Polarization in High-Dimensional Data

Comparison: Roll Call Votes

[Figure: partisanship (0.44 to 0.56) against NOMINATE score (-1.0 to 1.0), shown separately for Democrats and Republicans.]

Page 33: Measuring Polarization in High-Dimensional Data

Unpacking Partisanship

Page 34: Measuring Polarization in High-Dimensional Data

Most Partisan Phrases

• Define the partisanship of phrase $j$ in session $t$ to be the effect on $\pi_t$ of removing phrase $j$ from the vocabulary (redistributing probability mass to other phrases proportionally)
  • Let $\tilde q_{kt}^P$ equal $q_{kt}^P / (1 - q_{jt}^P)$ if $k \neq j$ and 0 otherwise
  • Recompute $\pi_t$ replacing $q_t^P$ with $\tilde q_t^P$ and holding $\rho_t$ constant
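The removal calculation can be sketched as follows. The function name is hypothetical, and the sign convention (original $\pi_t$ minus recomputed $\pi_t$, so that a positive value means the phrase raises measured partisanship) is an assumption, since the slide does not fix one. The code assumes every phrase has positive probability under at least one party and no single phrase has probability 1.

```python
import numpy as np


def phrase_partisanship(q_R, q_D):
    """For each phrase j: pi minus pi recomputed with j removed, rho held fixed."""
    q_R = np.asarray(q_R, dtype=float)
    q_D = np.asarray(q_D, dtype=float)
    rho = q_R / (q_R + q_D)

    S_R = q_R * rho          # per-phrase contributions to q_R' rho
    S_D = q_D * (1.0 - rho)  # per-phrase contributions to q_D' (1 - rho)
    pi = 0.5 * S_R.sum() + 0.5 * S_D.sum()

    # Dropping phrase j and renormalizing divides the remaining mass by (1 - q_j).
    pi_without = (
        0.5 * (S_R.sum() - S_R) / (1.0 - q_R)
        + 0.5 * (S_D.sum() - S_D) / (1.0 - q_D)
    )
    return pi - pi_without


# Toy vocabulary: phrase 0 is used far more by Republicans than by Democrats.
effects = phrase_partisanship([0.5, 0.25, 0.25], [0.1, 0.45, 0.45])
print(effects)
```

In this toy vocabulary the first phrase has the largest positive effect: without it, both parties would split their speech evenly over the remaining two phrases and the recomputed $\pi_t$ would fall to 1/2.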

Page 35: Measuring Polarization in High-Dimensional Data

60th Congress (1907-08)

Most Republican      Most Democratic
infantri war         section corner
indian war           ship subsidi
mount volunt         republ panama
feet thenc           level canal
postal save          powder trust
spain pay            print paper
war pay              lock canal
first regiment       bureau corpor
soil survey          senatori term
nation forest        remove wreck

• 1908 Rep platform: Calls for “generous provision” for veterans of Spanish-American and Indian wars

• 1908 Dem platform: “Free the Government from the grip of those who have made it a business asset of the favor-seeking corporations.”

• William Cox (D-IN): “the entire United States is now being held up by a great hydra-headed monster, known in ordinary parlance as a ‘powder trust’.”

Page 38: Measuring Polarization in High-Dimensional Data

80th Congress (1947-48)

Most Republican      Most Democratic
steam plant          admir denfeld
coast guard          public busi
stop communism       labor standard
depart agricultur    intern labor
lend leas            tax refund
zone germani         concili service
british loan         standard act
approv compact       soil conserv
unit kingdom         school lunch
union shop           cent hour

• Aftermath of WWII

• 1948 Dem platform: Advocates amending Fair Labor Standards Act to raise the federal minimum wage to 75 cents per hour; also advocates school lunch program

Page 41: Measuring Polarization in High-Dimensional Data

100th Congress (1987-88)

Most Republican      Most Democratic
freedom fighter      star war
doubl breast         contra aid
abort industri       nuclear weapon
demand second        contra war
heifer tax           support contra
reserv object        nuclear wast
incom ballist        agent orang
communist govern     central american
withdraw reserv      nicaraguan govern
abort demand         hatian peopl

• Debate over support for Contra rebels fighting Sandinista government in Nicaragua; Iran-Contra affair

• Debate over Reagan’s “Star Wars” missile defense initiative & nuclear weapons policy

Page 44: Measuring Polarization in High-Dimensional Data

104th Congress (1995-96)

Most Republican      Most Democratic
medic save           tax break
partialbirth abort   nurs home
big govern           comp time
feder debt           break wealthi
tax increas          break wealthiest
tax relief           communiti polic
term limit           million children
nation debt          assault weapon
tax freedom          deficit reduct
item veto            head start

• Debate over taxes and fiscal policy; Republicans using language from Luntz memos and Contract with America

Page 46: Measuring Polarization in High-Dimensional Data

Distribution of Phrase-Level Partisanship

[Figure: quantiles of the phrase-level posterior (0.0 to 1.0) by year, 1976-2008. Quantiles shown: 0.001, 0.01, 0.05, 0.1, 0.9, 0.95, 0.99, 0.999.]

Page 47: Measuring Polarization in High-Dimensional Data

Neologisms

[Figure: average partisanship by year, 1870-2010; vertical axis 0.50 to 0.65. Series: baseline, pre-1980 vocabulary, post-1980 vocabulary.]

Page 48: Measuring Polarization in High-Dimensional Data

Topic Decomposition

• Are trends in partisanship driven by
  • Divergence in which topics Dems/Reps emphasize?
  • Divergence in how the parties talk about a given topic?

Page 49: Measuring Polarization in High-Dimensional Data

Topics

alcohol      environment   mail
budget       federalism    minorities
business     foreign       money
crime        government    religion
defense      health        tax
economy      immigration   trade
education    justice
elections    labor

Page 50: Measuring Polarization in High-Dimensional Data

[Figure: average partisanship by year, 1870-2010; vertical axis 0.500 to 0.530. Series: overall, within, between.]

Page 51: Measuring Polarization in High-Dimensional Data

[Figure: topic "alcohol". Average partisanship (0.50 to 0.60) and phrase frequency (0.000 to 0.001) by year, 1870-2010.]

Page 52: Measuring Polarization in High-Dimensional Data

[Figure: topic "defense". Average partisanship (0.50 to 0.60) and phrase frequency (0.000 to 0.027) by year, 1870-2010.]

Page 53: Measuring Polarization in High-Dimensional Data

[Figure: topic "minorities". Average partisanship (0.50 to 0.60) and phrase frequency (0.000 to 0.009) by year, 1870-2010.]

Page 54: Measuring Polarization in High-Dimensional Data

[Figure: six topic panels, each showing average partisanship (0.50 to 0.60) and phrase frequency by year, 1870-2010. Topics, with the upper tick of the frequency axis: budget (0.015), crime (0.005), government (0.018), health (0.016), immigration (0.002), tax (0.011).]

Page 55: Measuring Polarization in High-Dimensional Data

Individual Tax Phrases

[Figure: posterior probability that speaker is Republican (0 to 1) by year, 1870-2006. Phrases: death tax, tax break, tax spend, tax loophol, tax relief, close tax, tax freedom, share tax.]

Page 56: Measuring Polarization in High-Dimensional Data

Explanations

Page 57: Measuring Polarization in High-Dimensional Data

Political Innovation

• Contract with America (1994)
  • Republicans take control of Congress for first time since 1952
  • Frank Luntz: novel polling techniques, memos to Republican candidates
  • In the aftermath, Democrats launch an effort to improve their own choice of language

You believe language can change a paradigm? “I don’t believe it – I know it. I’ve seen it with my own eyes... I watched in 1994 when the group of Republicans got together and said: ‘We’re going to do this completely differently than it’s ever been done before.’... Every politician and every political party issues a platform, but only these people signed a contract.” - Luntz (2004)

“Republican framing superiority had played a major role in their takeover of Congress in 1994. I and others had hoped that... a widespread understanding of how framing worked would allow Democrats to reverse the trend.” - Lakoff (2014)

Page 58: Measuring Polarization in High-Dimensional Data

Phrases from CWA

[Figure: phrases from the Contract with America. Average partisanship (0.50 to 0.54) and phrase frequency (0.000 to 0.018) by year, 1870-2010.]

Page 59: Measuring Polarization in High-Dimensional Data

Broader Context

• Party discipline in speech
  • Democratic Message Board (1989-1991)
  • Republican Theme Team (1991-1993): “develop ideas and phrases to be used by all Republicans”
• Changing media environment
  • 1979: C-SPAN (House of Representatives)
  • 1983: C-SPAN2 (Senate)

“When asked whether he would be the Republican leader without C-SPAN, Gingrich... [replied] ‘No’... C-SPAN provided a group of media-savvy House conservatives in the mid-1980s with a method of... winning a prime-time audience.” (Frantzich & Sullivan 1996)

Page 60: Measuring Polarization in High-Dimensional Data

Summary

[Figure: average partisanship by year, 1976-2008; vertical axis 0.500 to 0.525. Annotated events: C-SPAN, C-SPAN2, Contract with America. Presidential terms marked: Ford, Carter, Reagan, Bush, Clinton, Bush.]

Page 61: Measuring Polarization in High-Dimensional Data

Conclusion

Page 62: Measuring Polarization in High-Dimensional Data

Does Language Matter?

• Partisan language in Congress diffuses to broader public
  • Gentzkow & Shapiro 2010; Martin & Yurukoglu 2016; Greenstein & Zhu 2012
• Issue framing affects public opinion
  • Lathrop 2003; Graetz and Shapiro 2006; Druckman et al. 2013
• Language affects group identity
  • Kinzler et al. 2007; Clots-Figueras and Masella 2013

• “Human beings do not live in the objective world alone, nor alone in the world of social activity as ordinarily understood, but are very much at the mercy of the particular language which has become the medium of expression.” (Sapir 1954)

• “When we successfully reframe public discourse, we change the way the public sees the world. We change what counts as common sense.... Thinking differently requires speaking differently.” (Lakoff 2014)