Point of Departure: Labeled Blogs
Left-leaning blogs (387)
Right-leaning blogs (644)
From Benkler and Shaw “A tale of two blogospheres” (2010) and Wonkosphere Blog Directory
Who are these People?
• Use self-provided age and gender and ZIP-derived estimates
• People clicking on right-leaning blogs:
– Are older (50 vs. 45 years)
– Are more male (63% vs. 55%)
– Are more white (81% vs. 78%)
– More likely to work for Yahoo (92.3% vs. 11.4%)
All these trends agree with voters' demographics
“huffingtonpost.com” is left-leaning → a click on it counts as a left-leaning vote for “pizza is a vegetable”
Aggregate votes across all clicks on political blogs to compute overall leaning
From Blogs to Queries
v_L = left clicks for query
V_L = total left clicks
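The vote counting can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `BLOG_LEANING` table, the `tally_votes` helper, the domain `example-right-blog.com`, and the sample clicks are all hypothetical.

```python
from collections import Counter

# Hypothetical blog labels; the real lists hold 387 left and 644 right blogs.
BLOG_LEANING = {
    "huffingtonpost.com": "L",
    "example-right-blog.com": "R",  # made-up domain for illustration
}

def tally_votes(clicks):
    """Count per-query and total votes from (query, clicked_domain) pairs."""
    per_query = {"L": Counter(), "R": Counter()}
    totals = Counter()
    for query, domain in clicks:
        side = BLOG_LEANING.get(domain)
        if side is None:  # a click on an unlabeled site casts no vote
            continue
        per_query[side][query] += 1
        totals[side] += 1
    return per_query, totals

# Made-up click log.
clicks = [
    ("pizza is a vegetable", "huffingtonpost.com"),
    ("pizza is a vegetable", "huffingtonpost.com"),
    ("pizza is a vegetable", "example-right-blog.com"),
    ("cost obama trip to india", "example-right-blog.com"),
    ("weather", "unlabeled.example"),
]
per_query, totals = tally_votes(clicks)
v_L = per_query["L"]["pizza is a vegetable"]  # left clicks for this query
V_L = totals["L"]                             # total left clicks
```

Dividing the per-query count by the per-side total keeps the larger blog list from dominating simply because it attracts more clicks overall.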
Examples of Assigned Leaning
Examples using Wikipedia mapping for 6 months of data, July 4, 2011 – January 8, 2012.
Queries for Wikipedia entity “Patient Protection & Affordable Care Act”
obama healthcare bill text (.91) | who pays for obamacare (.04)
obama health care privileges (.83) | obamacare reaches the supreme court (.09)
is affordable care act unconstitutional (.78) | is obamacare constitutional (.16)
Queries for Wikipedia category “Occupy”
who started occupy wall street (.94) | occupy wall street rape (.09)
we are the 99% (.91) | occupy movement violence (.25)
occupy movement supporters (.78) | crime in occupy movement (.44)
lies protest
“cost obama trip to india”
Mapping Queries to Statements
364 distinct queries mapped to true facts
574 distinct queries mapped to false facts
• Correlation with leaning? Any guess?
– None.
• Correlation with leaning, when conditioned on source? Any guess?
– None.
• Correlation with volume? Any guess?
– Well ...
Impact of Truth Value
Data Set
• Start with a seed set of users with known political orientation, e.g. @BarackObama or @MittRomney
• Get their tweets
U.S. Users Only
• Lots of international interest in U.S. politics
• People from all over the world retweet
• Use Yahoo! Placemaker to remove non-US users: http://developer.yahoo.com/geo/placemaker/
Evaluating Data Quality
• Do we have the correct political leaning?
– Accuracy = 0.98 and 0.93 for Wefollow and Twellow, respectively
– Inspection: “greatest environmentalist. Also, despise republicans”
– Corrected accuracy: 0.99 and 0.95
Detecting Political Hashtags
• Most hashtags are non-political
– #x, #FavouriteAlbums, …
• Not always obvious
– #yes4m, #usmc, …
• Co-occurrence with seed political hashtags: #p2, #tcot, #gop, #ows, obama*, romney*, …
– Keep top 10% in terms of P(POL|h)
• Time dependence
– #america during Olympics and during elections
• Remove low-volume hashtags
– Mostly noise and no “large” political issues
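One way to read the filtering steps above is sketched below. It assumes P(POL|h) is estimated as the share of tweets containing h that also contain a seed hashtag; that estimator, the `political_hashtags` helper, the `min_volume` threshold, and the sample tweets are assumptions for illustration. The wildcard seeds (obama*, romney*) and the time dependence are left out for brevity.

```python
from collections import Counter

SEEDS = {"#p2", "#tcot", "#gop", "#ows"}  # seed political hashtags from the list above

def political_hashtags(tweets, min_volume=2, keep_fraction=0.10):
    """Rank non-seed hashtags by an estimate of P(POL|h) and keep the top share.

    tweets: list of hashtag lists, one list per tweet.
    min_volume is a hypothetical cut-off for "low-volume" hashtags.
    """
    volume = Counter()     # tweets containing h
    with_seed = Counter()  # tweets containing h and at least one seed
    for tags in tweets:
        tags = set(tags)
        has_seed = bool(tags & SEEDS)
        for h in tags - SEEDS:
            volume[h] += 1
            if has_seed:
                with_seed[h] += 1
    # Drop low-volume hashtags, then score the rest.
    scores = {h: with_seed[h] / n for h, n in volume.items() if n >= min_volume}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, round(len(ranked) * keep_fraction)) if ranked else 0
    return ranked[:keep], scores

# Made-up tweets.
tweets = [
    ["#tcot", "#debt"],
    ["#gop", "#debt"],
    ["#debt"],
    ["#FavouriteAlbums"],
    ["#FavouriteAlbums", "#music"],
    ["#p2", "#rare"],
]
kept, scores = political_hashtags(tweets)
```

Here `#rare` co-occurs with a seed every time it appears, but it appears only once, so the volume filter removes it before ranking.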
Detecting Trending Hashtags
• Trending = currently popular
– Having a higher volume than “expected”
#obamagotosama: May 1 to May 8, 2011
#ows: Sep. 25 to Oct. 2, 2011
Non-trending hashtags: #vote, #democracy
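How the “expected” volume is computed is not stated here. The sketch below assumes a simple baseline: the mean of the preceding weeks plus a multiple of their standard deviation. The `is_trending` helper, the parameters `z` and `min_history`, and the weekly volumes are hypothetical.

```python
from statistics import mean, pstdev

def is_trending(weekly_volume, z=2.0, min_history=4):
    """Flag the latest week if its volume clearly exceeds the historical baseline."""
    *history, current = weekly_volume
    if len(history) < min_history:
        return False  # too little history to define "expected"
    expected = mean(history)
    spread = max(pstdev(history), 1.0)  # floor avoids a zero threshold on flat series
    return current > expected + z * spread

# Made-up weekly tweet counts.
burst = is_trending([10, 12, 9, 11, 300])          # sudden spike in the last week
steady = is_trending([500, 510, 495, 505, 502])    # popular but stable
```

Under this reading a steadily popular hashtag such as #vote stays non-trending, because its volume is high but not higher than expected.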
Assigning a Leaning to Hashtags
• Voting approach:
• Mere counts:
• Normalized counts:
• + smoothing:
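The exact formulas are not given here, so the three variants below are one plausible reading, not the authors' definitions. The function names, the additive smoothing constant `s`, and the numbers are assumptions; by the convention of this sketch a leaning of 1 means all votes come from the left and 0 means all come from the right.

```python
def leaning_counts(v_L, v_R):
    """Mere counts: share of left votes among all votes for the hashtag."""
    return v_L / (v_L + v_R)

def leaning_normalized(v_L, v_R, V_L, V_R):
    """Normalized counts: compare each side's share of its own total volume."""
    left, right = v_L / V_L, v_R / V_R
    return left / (left + right)

def leaning_smoothed(v_L, v_R, V_L, V_R, s=0.001):
    """Normalized counts with a small additive constant on each side (assumed form)."""
    left, right = v_L / V_L, v_R / V_R
    return (left + s) / (left + right + 2 * s)

# Made-up example: the right side is four times as active overall.
raw = leaning_counts(30, 10)
norm = leaning_normalized(30, 10, 1000, 4000)
one_vote_norm = leaning_normalized(1, 0, 1000, 4000)
one_vote_smooth = leaning_smoothed(1, 0, 1000, 4000)
```

Normalizing corrects for one side simply tweeting more, and smoothing pulls hashtags with very few votes back toward 0.5 instead of letting a single tweet produce an extreme score.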
Detecting “Change Points”
• Filter hashtags without sufficient support
– Total number of weeks > 4
• Relative and absolute change in leaning from previous week
– Change from previous week > std and change from previous week > 0.25
• Change from average value is big
– Current value - average value > std
• Change in leaning is in the direction of other leaning
– Change in direction = TRUE
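The four rules translate fairly directly into code. The sketch below makes several assumptions: changes are compared by absolute value, the mean and standard deviation are taken over the weeks before the current one, and “direction of other leaning” means moving away from the side the hashtag has leaned toward on average (with 0.5 as the midpoint). The `is_change_point` helper and the sample series are hypothetical.

```python
from statistics import mean, pstdev

def is_change_point(weekly_leaning, min_abs_change=0.25, min_weeks=4):
    """Apply the four change-point rules to the latest week of a leaning series."""
    if len(weekly_leaning) <= min_weeks:  # rule 1: more than 4 weeks of support
        return False
    *history, current = weekly_leaning
    avg, std = mean(history), pstdev(history)
    change = current - history[-1]
    # Rule 2: change from previous week is large, relatively and absolutely.
    big_vs_previous = abs(change) > std and abs(change) > min_abs_change
    # Rule 3: current value is far from the average value.
    big_vs_average = abs(current - avg) > std
    # Rule 4: the move points toward the opposite side of the average leaning.
    toward_other_side = (change < 0) if avg > 0.5 else (change > 0)
    return big_vs_previous and big_vs_average and toward_other_side

# Made-up weekly leanings.
flip = is_change_point([0.80, 0.82, 0.79, 0.81, 0.80, 0.30])
drift = is_change_point([0.80, 0.82, 0.79, 0.81, 0.80, 0.78])
same_side = is_change_point([0.60, 0.62, 0.58, 0.60, 0.61, 0.95])
```

The third series jumps sharply but moves further toward the side it already leaned to, so rule 4 rejects it.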
What Causes Change Points
• Volume-to-user ratio:
– High means small, active set (“hijackers”)
– Low means general masses
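A minimal sketch of the ratio, assuming it is the number of tweets using a hashtag divided by the number of distinct users who posted them. The `volume_to_user_ratio` helper and the user IDs are made up.

```python
def volume_to_user_ratio(tweet_authors):
    """tweet_authors: one user ID per tweet that used the hashtag in a given week."""
    users = set(tweet_authors)
    return len(tweet_authors) / len(users) if users else 0.0

# Ten tweets from two very active accounts versus four tweets from four people.
hijacked = volume_to_user_ratio(["a"] * 8 + ["b"] * 2)
organic = volume_to_user_ratio(["a", "b", "c", "d"])
```

A ratio near 1 means each user tweeted about once, which is what broad participation looks like; a high ratio points to a few accounts producing most of the volume.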
Topical Clustering of Hashtags
• Hashtags are often “micro-topics”
• Cluster hashtags to have more high-level topics
• We used simple k-means clustering on co-occurrence feature vectors
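The clustering step can be sketched in plain Python as below. The feature construction (raw co-occurrence counts with every other hashtag), the deterministic initialization from the first k points, and the sample tweets are assumptions made to keep the example short and reproducible; the slides do not specify these details.

```python
import math

def cooccurrence_vectors(tweets):
    """Build one count vector per hashtag: how often it appears with each other hashtag."""
    vocab = sorted({h for tags in tweets for h in tags})
    index = {h: i for i, h in enumerate(vocab)}
    vectors = {h: [0.0] * len(vocab) for h in vocab}
    for tags in tweets:
        unique = set(tags)
        for a in unique:
            for b in unique:
                if a != b:
                    vectors[a][index[b]] += 1.0
    return vocab, vectors

def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (assumption for determinism)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Made-up tweets covering two topics.
tweets = [
    ["#obamacare", "#aca", "#healthcare"],
    ["#aca", "#healthcare"],
    ["#obamacare", "#healthcare"],
    ["#ows", "#occupy", "#99percent"],
    ["#occupy", "#99percent"],
    ["#ows", "#99percent"],
]
vocab, vectors = cooccurrence_vectors(tweets)
labels = dict(zip(vocab, kmeans([vectors[h] for h in vocab], k=2)))
```

On this toy input the health-care hashtags end up in one cluster and the Occupy hashtags in the other. In practice a library implementation with better initialization and normalized vectors would be a more robust choice.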
Ongoing Work
• Beyond 2-party systems: UK and Germany
• Fractional party membership
• Visualization challenges
• Hans Rosling’s “bubbles”