1
Online Operation Aggregation Web UI Relevant Topic 1 Relevant Topic 2 ... Relevant Topic 1 Relevant Topic 2 ... Relevant Topic 1 Relevant Topic 2 ... Scoring and Ranking Offline Processing Social Media Topic- Signal DB Topic-Signal Creator job, unemployed, hiring 2012 2014 User Query Topic- Signal DB Raccoon: A Query System for Social Media Signals Architecture Dolan Antenucci, Michael R. Anderson, Michael Cafarella University of Michigan Real world phenomena like unemployment and disease behavior have been estimated using social media – a process known as nowcasting [1]. These estimates are cheaper and faster than traditional data collection methods like phone surveys. Faster and cheaper means saved money and better policy. Motivation: Forecasting the Present via Social Media Example User Query (Target: Unemployment Behavior) Problem: Nowcasting Requires Ground Truth Data Scalability with the Cloud Current Prototype System: 40 billion tweets (collected 2011-2015) 150 million topic-signal pairs (after threshold filtering) Query processing runtime: ~20s (on 10 EC2 c3.8xlarge servers) Topic-signal pairs support daily updating [email protected] http://web.eecs.umich.edu/~dol [1] S. L. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series. In Economics of Digitization. University of Chicago Press, 2014. [2] A. Greenspan. The Age of Turbulence. Penguin Press, 2007. [3] D. Antenucci, M. Cafarella, M. C. Levenstein, C. Ré, and M. D. Shapiro. Using social media to measure labor market flows. Working Paper 20010, National Bureau of Economic Research, March 2014. Related Work 10/17 i need a job 491 12/15 i love you 5092 1/28 jus8n bieber 940,291 2. 3. “Ground Truth Data” 1. Apr May Jun Unemployment Apr May Jun Unemployment ) ( ) ( “I need a job”, ) ( 1 0 Relative Change job, unemployment 2012 2014 2013 2015 Users Given Trade-Off Between Query Components Relative Change Relative Change 2012 2014 2013 Correlation with query signal: 0.897 2015 Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs ... Resulting Target Estimate: 1 0 Relative Change Topic-Signal Pairs Nowcasting Result Like a search engine, users iteratively submit queries: Raccoon User Raccoon Nowcasting System User Query 2012 2014 2013 Correlation with query signal: 0.897 2015 Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs ... Resulting Target Estimate: 1 0 Relative Change Nowcasting Result 1 0 Relative Change job, unemployment 2012 2014 2013 2015 User Interaction Loop Unemployment related keywords Seasonal trends of unemployment (drawn by user) Yet, economists [3] still want estimates for targets lacking ground truth data: Solution: Substitute Domain Knowledge for Ground Truth Our target users have some semantic and signal domain knowledge about their target phenomena: Seman=c Knowledge (keywords, etc.) Signal Knowledge (seasonal trends, etc.) Apr May Jun Unemployment Nowcasting Result ) ( ) ( “I need a job”, ) ( User’s Domain Knowledge “Ground Truth Data” 10-day auto sales (once used, but no longer available [2]) Physical movement (e.g., out of parents house, for job) etc. Topic-Signal Pairs Prefer Signal Query Component Users can explore a set of query results in real-time, produced by varying the relative influence of the semantic and signal query components: Prefer Semantic Query Component User Query Signal Scorer Seman=c Scorer Tradeoff Generator Topic 1 Topic 2 Topic 1 Topic 2 Topic 1 Topic 2 Feature Reduc=on via PCA Topic 1 Topic 2 Topic 1 Topic 2 Topic 1 Topic 2 Topic Signal DB Final Aggrega=on via Linear Regression Relative Change Relative Change 2012 2014 2013 Correlation with query signal: 0.897 2015 Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs ... Resulting Target Estimate: 1 0 Relative Change Scoring and Ranking Aggregation

University of Michigan · Raccoon: A Query System for Social Media Signals Architecture Dolan Antenucci, Michael R. Anderson, Michael Cafarella University of Michigan Real world phenomena

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: University of Michigan · Raccoon: A Query System for Social Media Signals Architecture Dolan Antenucci, Michael R. Anderson, Michael Cafarella University of Michigan Real world phenomena

Online Operation

Aggregation

Web UI

Relevant Topic 1Relevant Topic 2...

Relevant Topic 1Relevant Topic 2...

Relevant Topic 1Relevant Topic 2...

Scoring and

Ranking

Offline ProcessingSocial Media

Topic-Signal DB

Topic-Signal Creator

job, unemployed, hiring

2012 2014

User Query

Topic-Signal DB

Raccoon: A Query System for Social Media Signals

Architecture

Dolan Antenucci, Michael R. Anderson, Michael CafarellaUniversity of Michigan

Real world phenomena like unemployment and disease behavior have been estimated using social media – a process known as nowcasting [1].

These estimates are cheaper and faster than traditional data collection methods like phone surveys.

Faster and cheaper means saved money and better policy.

Motivation: Forecasting the Present via Social Media Example User Query (Target: Unemployment Behavior)

Problem: Nowcasting Requires Ground Truth Data

Scalability with the CloudCurrent Prototype System:•  40 billion tweets (collected 2011-2015)•  150 million topic-signal pairs (after threshold filtering)•  Query processing runtime: ~20s (on 10 EC2 c3.8xlarge servers)•  Topic-signal pairs support daily updating

[email protected] w http://web.eecs.umich.edu/~dol

[1] S. L. Scott and H. Varian. Bayesian variable selection for nowcasting economic time series. In Economics of Digitization. University of Chicago Press, 2014.[2] A. Greenspan. The Age of Turbulence. Penguin Press, 2007.[3] D. Antenucci, M. Cafarella, M. C. Levenstein, C. Re ́, and M. D. Shapiro. Using social media to measure labor market flows. Working Paper 20010, National Bureau of Economic Research, March 2014.

Related Work

10/17   i  need  a  job   491  

12/15   i  love  you   5092  

1/28   jus8n  bieber   940,291  

2.  

3.  

“Ground Truth Data”

1.  

         Apr                  May                    Jun  

     Une

mploymen

t  

 Apr                  May                    Jun        

Une

mploymen

t  

“I  need  a  job”,     ) ( “I  need  a  job”,     ) ( “I  need  a  job”,     ) (

1

0Rela

tive

Chan

ge

job, unemployment

2012 20142013 2015

Users Given Trade-Off Between Query Components

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Topic-Signal Pairs Nowcasting Result

Like a search engine, users iteratively submit queries:

Raccoon User

Raccoon Nowcasting

System

User Query (q, r)

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Nowcasting Result (T, s, β)

1

0Rela

tive

Chan

ge

job, unemployment

2012 20142013 2015

User Interaction Loop

Unemployment-­‐related  keywords  

Seasonal  trends  of  unemployment  (drawn  by  user)  

Yet, economists [3] still want estimates for targets lacking ground truth data:

Solution: Substitute Domain Knowledge for Ground Truth

Our target users have some semantic and signal domain knowledge about their target phenomena:

Seman=c    Knowledge  (keywords,  etc.)  

Signal  Knowledge  (seasonal  trends,  etc.)  

 Apr                  May                    Jun        

Une

mploymen

t  

Nowcasting Result

“I  need  a  job”,     ) ( “I  need  a  job”,     ) ( “I  need  a  job”,     ) (

User’s Domain Knowledge

“Ground Truth Data”

•  10-day auto sales (once used, but no longer available [2])•  Physical movement (e.g., out of parents house, for job)•  etc.

Topic-Signal Pairs

Prefer Signal Query Component

Users can explore a set of query results in real-time, produced by varying the relative influence of the semantic and signal query components:

Prefer SemanticQuery Component

User  Query  

Signal  Scorer  

Seman=c  Scorer  

Trade-­‐off  Generator  

Topic  1  Topic  2  …  Topic  1  Topic  2  …  Topic  1  Topic  2  …  

Feature  Reduc=on  via  PCA  

Topic  1  Topic  2  …  Topic  1  Topic  2  …  Topic  1  Topic  2  …  

Topic-­‐Signal  DB  

Final  Aggrega=on  via  Linear  Regression  

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

2012 20142013Correlation with query signal: 0.897

2015

Topics Used: • job new year • job interview tomorrow • first job interview today • working two jobs • job interview • #hiring #jobs • ...

Resulting Target Estimate:1

0Rela

tive

Chan

ge

Scoring and Ranking Aggregation