[Figure: Raccoon architecture. Offline processing: Social Media, Topic-Signal Creator, Topic-Signal DB. Online operation: Web UI, User Query (e.g., keywords "job, unemployed, hiring" with a signal over 2012 to 2014), Topic-Signal DB, Scoring and Ranking (lists of relevant topics), Aggregation.]
Raccoon: A Query System for Social Media Signals
Architecture
Dolan Antenucci, Michael R. Anderson, Michael Cafarella
University of Michigan
Real-world phenomena such as unemployment and disease behavior have been estimated using social media, a process known as nowcasting [1].
These estimates are cheaper and faster to produce than traditional data collection methods such as phone surveys.
Faster, cheaper estimates save money and support better policy.
Motivation: Forecasting the Present via Social Media
Example User Query (Target: Unemployment Behavior)
Problem: Nowcasting Requires Ground Truth Data
Scalability with the Cloud
Current Prototype System:
• 40 billion tweets (collected 2011-2015)
• 150 million topic-signal pairs (after threshold filtering)
• Query processing runtime: ~20s (on 10 EC2 c3.8xlarge servers)
• Topic-signal pairs support daily updating
dol@umich.edu | http://web.eecs.umich.edu/~dol
[1] S. L. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series. In Economics of Digitization. University of Chicago Press, 2014.
[2] A. Greenspan. The Age of Turbulence. Penguin Press, 2007.
[3] D. Antenucci, M. Cafarella, M. C. Levenstein, C. Ré, and M. D. Shapiro. Using social media to measure labor market flows. Working Paper 20010, National Bureau of Economic Research, March 2014.
Related Work
10/17 i need a job 491
12/15 i love you 5092
1/28 justin bieber 940,291
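The sample rows above pair a date and a phrase with a count. As a minimal sketch of how such topic-signal pairs might be produced, the code below assumes that a topic is a short word n-gram and that its signal is the number of tweets per day containing it. The function name `build_topic_signals`, the n-gram length `max_n`, and the threshold `min_total` are hypothetical and are not taken from the poster.

```python
from collections import Counter, defaultdict


def build_topic_signals(tweets, max_n=3, min_total=2):
    """Count, per day, the tweets that contain each word n-gram.

    tweets: iterable of (date, text) pairs.
    Returns {topic: {date: tweet_count}}, keeping only topics whose
    total count reaches min_total (an assumed form of threshold filtering).
    """
    signals = defaultdict(Counter)
    for date, text in tweets:
        words = text.lower().split()
        # Collect each n-gram once per tweet so a repeated phrase
        # inside one tweet is not counted twice.
        topics_in_tweet = set()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                topics_in_tweet.add(" ".join(words[i:i + n]))
        for topic in topics_in_tweet:
            signals[topic][date] += 1
    return {
        topic: dict(counts)
        for topic, counts in signals.items()
        if sum(counts.values()) >= min_total
    }


# Toy input, invented for illustration only.
sample_tweets = [
    ("10/17", "i need a job"),
    ("10/17", "I need a job now"),
    ("10/18", "i love you"),
]
sample_signals = build_topic_signals(sample_tweets)
print(sample_signals["need a job"])
```

Counting each phrase once per tweet keeps the signal interpretable as "tweets per day", which is one plausible reading of the counts shown in the rows above.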
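[Figure: nowcasting with ground truth data, with parts numbered 1 to 3: topic-signal pairs such as "I need a job", "Ground Truth Data", and unemployment plotted over Apr, May, Jun.]
[Figure: example user query with keywords "job, unemployment" and a signal of relative change (0 to 1) over 2012 to 2015.]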
Users Given Trade-Off Between Query Components
[Figure: three result panels. Each shows the resulting target estimate (relative change, 0 to 1, over 2012 to 2015), its correlation with the query signal (0.897), and the topics used: job new year; job interview tomorrow; first job interview today; working two jobs; job interview; #hiring #jobs; ...]
[Figure labels: Topic-Signal Pairs; Nowcasting Result]
Like a search engine, users iteratively submit queries:
[Figure: the Raccoon user submits a user query (q, r), e.g., keywords "job, unemployment" with a signal of relative change over 2012 to 2015, to the Raccoon nowcasting system, which returns a nowcasting result (T, s, β), displayed as the resulting target estimate, its correlation with the query signal (0.897), and the topics used.]
User Interaction Loop
Unemployment-related keywords
Seasonal trends of unemployment (drawn by user)
Yet, economists [3] still want estimates for targets lacking ground truth data:
Solution: Substitute Domain Knowledge for Ground Truth
Our target users have some semantic and signal domain knowledge about their target phenomena:
Semantic Knowledge (keywords, etc.)
Signal Knowledge (seasonal trends, etc.)
[Figure: topic-signal pairs such as "I need a job", combined with the user's domain knowledge in place of "Ground Truth Data", yield a nowcasting result for unemployment (Apr, May, Jun).]
• 10-day auto sales (once used, but no longer available [2])
• Physical movement (e.g., out of parents' house, for a job)
• etc.
Topic-Signal Pairs
Prefer Signal Query Component
Users can explore, in real time, a set of query results produced by varying the relative influence of the semantic and signal query components:
Prefer Semantic Query Component
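The trade-off between the two query components can be sketched as a weighted blend of two per-topic scores. The poster does not specify the scoring functions, so everything below is an assumption: the semantic score is a word-overlap (Jaccard) measure against the query keywords, the signal score is a Pearson correlation with the user-drawn signal, clipped at zero, and `alpha` is a hypothetical weight where 0 prefers the semantic component and 1 prefers the signal component.

```python
import math


def semantic_score(topic, keywords):
    """Jaccard overlap between the topic's words and the query keywords."""
    topic_words = set(topic.lower().split())
    query_words = {k.lower() for k in keywords}
    if not topic_words or not query_words:
        return 0.0
    return len(topic_words & query_words) / len(topic_words | query_words)


def signal_score(topic_signal, query_signal):
    """Pearson correlation with the query signal, clipped to [0, 1]."""
    n = len(topic_signal)
    mean_a = sum(topic_signal) / n
    mean_b = sum(query_signal) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(topic_signal, query_signal))
    var_a = sum((a - mean_a) ** 2 for a in topic_signal)
    var_b = sum((b - mean_b) ** 2 for b in query_signal)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signal carries no trend information
    return max(cov / math.sqrt(var_a * var_b), 0.0)


def rank_topics(topic_signals, keywords, query_signal, alpha, k=3):
    """Return the top-k topics under a blend of the two scores."""
    scored = []
    for topic, signal in topic_signals.items():
        score = ((1 - alpha) * semantic_score(topic, keywords)
                 + alpha * signal_score(signal, query_signal))
        scored.append((score, topic))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [topic for _, topic in scored[:k]]


# Toy data, invented for illustration only.
toy_signals = {
    "job interview": [1, 2, 3, 4],
    "i love you": [1, 2, 3, 4.5],
    "need job": [4, 3, 2, 1],
}
for alpha in (0.0, 0.5, 1.0):
    print(alpha, rank_topics(toy_signals, ["job", "hiring"], [1, 2, 3, 4], alpha))
```

Sweeping `alpha` over a few values yields one ranked topic list per setting, which mirrors the idea of presenting several results that the user can compare side by side.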
[Figure: online query pipeline. Components, in order of appearance: User Query; Signal Scorer; Semantic Scorer; Trade-off Generator (producing several lists of Topic 1, Topic 2, ...); Feature Reduction via PCA; Topic-Signal DB; Final Aggregation via Linear Regression.]
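The last two pipeline stages named above, feature reduction via PCA and final aggregation via linear regression, can be illustrated with a small standard-library sketch. It rests on assumptions not stated in the poster: only the first principal component is kept (found by power iteration), and the user-drawn query signal serves as the regression target. All function names are hypothetical.

```python
import math


def standardize(signal):
    """Rescale a signal to zero mean and unit variance (zeros if constant)."""
    n = len(signal)
    mean = sum(signal) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in signal]


def first_component_weights(signals, iters=200):
    """Loadings of the first principal component via power iteration.

    signals: list of standardized signals of equal length.
    Caveat: power iteration stalls if the start vector happens to be
    orthogonal to the leading eigenvector; the uneven start below makes
    that unlikely but not impossible.
    """
    k, n = len(signals), len(signals[0])
    cov = [[sum(a * b for a, b in zip(signals[i], signals[j])) / n
            for j in range(k)] for i in range(k)]
    v = [1.0 / (i + 1) for i in range(k)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def aggregate(topic_signals, query_signal):
    """Project topic signals onto one component, then fit it to the query."""
    z = [standardize(s) for s in topic_signals]
    weights = first_component_weights(z)
    n = len(query_signal)
    pc = [sum(w * s[t] for w, s in zip(weights, z)) for t in range(n)]
    pc_mean = sum(pc) / n
    y_mean = sum(query_signal) / n
    var = sum((p - pc_mean) ** 2 for p in pc)
    if var == 0:
        return [y_mean] * n  # nothing to fit; fall back to the mean
    # Ordinary least squares with a single feature and an intercept.
    slope = sum((p - pc_mean) * (y - y_mean)
                for p, y in zip(pc, query_signal)) / var
    intercept = y_mean - slope * pc_mean
    return [intercept + slope * p for p in pc]


# Toy data, invented for illustration only.
estimate = aggregate(
    [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]],
    [10, 20, 30, 40, 50],
)
print([round(x, 2) for x in estimate])
```

Reducing many correlated topic signals to a few components before regressing keeps the number of fitted coefficients small relative to the number of time points, which is a common reason to pair PCA with linear regression.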
Scoring and Ranking Aggregation