University of Michigan · Raccoon: A Query System for Social Media Signals Architecture Dolan...

Preview:

Citation preview

Online Operation

Aggregation

Web UI

Relevant Topic 1Relevant Topic 2...

Relevant Topic 1Relevant Topic 2...

Relevant Topic 1Relevant Topic 2...

Scoring and

Ranking

Offline ProcessingSocial Media

Topic-Signal DB

Topic-Signal Creator

job, unemployed, hiring

2012 2014

User Query

Topic-Signal DB

Raccoon: A Query System for Social Media Signals

Architecture

Dolan Antenucci, Michael R. Anderson, Michael CafarellaUniversity of Michigan

Real world phenomena like unemployment and disease behavior have been estimated using social media – a process known as nowcasting [1].

These estimates are cheaper and faster than traditional data collection methods like phone surveys.

Faster and cheaper means saved money and better policy.

Motivation: Forecasting the Present via Social Media Example User Query (Target: Unemployment Behavior)

Problem: Nowcasting Requires Ground Truth Data

Scalability with the CloudCurrent Prototype System:•  40 billion tweets (collected 2011-2015)•  150 million topic-signal pairs (after threshold filtering)•  Query processing runtime: ~20s (on 10 EC2 c3.8xlarge servers)•  Topic-signal pairs support daily updating

dol@umich.edu w http://web.eecs.umich.edu/~dol

[1] S. L. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series. In Economics of Digitization. University of Chicago Press, 2014.[2] A. Greenspan. The Age of Turbulence. Penguin Press, 2007.[3] D. Antenucci, M. Cafarella, M. C. Levenstein, C. Re ́, and M. D. Shapiro. Using social media to measure labor market flows. Working Paper 20010, National Bureau of Economic Research, March 2014.

Related Work

10/17   i  need  a  job   491  

12/15   i  love  you   5092  

1/28   jus8n  bieber   940,291  

2.  

3.  

“Ground Truth Data”

1.  

         Apr                  May                    Jun  

     Une

mploymen

t  

 Apr                  May                    Jun        

Une

mploymen

t  

“I  need  a  job”,     ) ( “I  need  a  job”,     ) ( “I  need  a  job”,     ) (

1

0Rela

tive

Chan

ge

job, unemployment

2012 20142013 2015

Users Given Trade-Off Between Query Components

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Topic-Signal Pairs Nowcasting Result

Like a search engine, users iteratively submit queries:

Raccoon User

Raccoon Nowcasting

System

User Query (q, r)

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Nowcasting Result (T, s, β)

1

0Rela

tive

Chan

ge

job, unemployment

2012 20142013 2015

User Interaction Loop

Unemployment-­‐related  keywords  

Seasonal  trends  of  unemployment  (drawn  by  user)  

Yet, economists [3] still want estimates for targets lacking ground truth data:

Solution: Substitute Domain Knowledge for Ground Truth

Our target users have some semantic and signal domain knowledge about their target phenomena:

Seman=c    Knowledge  (keywords,  etc.)  

Signal  Knowledge  (seasonal  trends,  etc.)  

 Apr                  May                    Jun        

Une

mploymen

t  

Nowcasting Result

“I  need  a  job”,     ) ( “I  need  a  job”,     ) ( “I  need  a  job”,     ) (

User’s Domain Knowledge

“Ground Truth Data”

•  10-day auto sales (once used, but no longer available [2])•  Physical movement (e.g., out of parents house, for job)•  etc.

Topic-Signal Pairs

Prefer Signal Query Component

Users can explore a set of query results in real-time, produced by varying the relative influence of the semantic and signal query components:

Prefer SemanticQuery Component

User  Query  

Signal  Scorer  

Seman=c  Scorer  

Trade-­‐off  Generator  

Topic  1  Topic  2  …  Topic  1  Topic  2  …  Topic  1  Topic  2  …  

Feature  Reduc=on  via  PCA  

Topic  1  Topic  2  …  Topic  1  Topic  2  …  Topic  1  Topic  2  …  

Topic-­‐Signal  DB  

Final  Aggrega=on  via  Linear  Regression  

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Scoring and Ranking Aggregation

Recommended