Metro Maps of Science

Dafna Shahaf, Carlos Guestrin, Eric Horvitz


"The abundance of books is a distraction"

Lucius Annaeus Seneca (4 BC – 65 AD)

… and it does not get any better

• 129,864,880 books (Google estimate)
• Research:
  – PubMed: 19 million papers (one paper added per minute!)
  – Scopus: 40 million papers

[Figure: papers vs. innovative papers]

So, you want to understand a research topic…

Now what?

Search Engines are Great

• But they do not show how it all fits together

Timeline Systems

Research is not Linear

Metro Map

• A map is a set of lines of articles
• Each line follows a coherent narrative thread
• Temporal dynamics + structure

[Figure: example metro map with stations labeled austerity, bailout, junk status, Germany, Merkel, protests, strike, labor unions]

Map Definition

• A map M is a pair (G, P) where
  – G = (V, E) is a directed graph
  – P is a set of paths in G (metro lines)
  – Each e ∈ E must belong to at least one metro line
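The definition can be checked mechanically. Below is a sketch of a hypothetical helper (`is_valid_map` is not from the slides). It assumes vertices are hashable article ids, edges are `(u, v)` pairs, and each metro line is a list of vertices in order. The station names are borrowed from the example map, but the edges are made up.

```python
def is_valid_map(vertices, edges, lines):
    """Check M = (G, P): G = (V, E) is directed, P is a set of paths (metro lines),
    and every edge of E lies on at least one metro line."""
    if any(u not in vertices or v not in vertices for u, v in edges):
        return False  # E must connect vertices of V
    on_some_line = set()
    for line in lines:
        if len(set(line)) != len(line):
            return False  # a path does not revisit a vertex
        steps = list(zip(line, line[1:]))
        if any(step not in edges for step in steps):
            return False  # consecutive stops must be joined by an edge of G
        on_some_line.update(steps)
    # every edge must belong to at least one metro line
    return on_some_line == set(edges)


V = {"bailout", "austerity", "protests", "strike"}
E = {("bailout", "austerity"), ("austerity", "protests"), ("protests", "strike")}
print(is_valid_map(V, E, [["bailout", "austerity", "protests"], ["protests", "strike"]]))  # True
print(is_valid_map(V, E, [["bailout", "austerity", "protests"]]))  # False: protests -> strike is on no line
```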


Game Plan

• Objective
• Algorithm
• Does it work?

Properties of a Good Map

1. Coherence

???

Coherence: Main Idea (Connecting the Dots [Shahaf, Guestrin, KDD’10])

[Figure: a chain of five articles (1 to 5) with words such as Greece, Europe, Italy, Republican, Protest, debt default]

Coherence is not a property of local interactions:

Incoherent: each pair shares different words

A more coherent chain:

[Figure: a chain of five articles (1 to 5) with words such as Greece, Austerity, Italy, Republican, Protest, debt default]

Coherent: a small number of words captures the story
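Here is a toy sketch that makes the contrast between the two chains concrete. It assumes each article is reduced to a set of words. This is not the Connecting the Dots objective. It only captures two intuitions: a chain is as coherent as its weakest transition, and a coherent chain lets the same few words explain every transition. The number of words, `k`, is a hypothetical parameter, and the word sets are invented to mimic the figures.

```python
from itertools import combinations


def local_overlap(chain):
    """Weakest-link score that only looks at adjacent pairs (no shared vocabulary)."""
    return min(len(a & b) for a, b in zip(chain, chain[1:]))


def coherence(chain, k):
    """Weakest-link score when every transition must be explained by the same k words."""
    vocab = sorted(set().union(*chain))
    best = 0
    # activating more words never lowers the score, so try subsets of exactly k words
    for active in combinations(vocab, min(k, len(vocab))):
        active = set(active)
        score = min(len(a & b & active) for a, b in zip(chain, chain[1:]))
        best = max(best, score)
    return best


incoherent = [{"greece", "europe"}, {"europe", "italy"}, {"italy", "republican"},
              {"republican", "protest"}, {"protest", "debt default"}]
coherent = [{"greece", "austerity", "debt default"}, {"greece", "austerity", "italy"},
            {"greece", "austerity", "republican"}, {"greece", "austerity", "protest"},
            {"greece", "austerity", "debt default"}]
print(local_overlap(incoherent), coherence(incoherent, k=2))  # 1 0
print(local_overlap(coherent), coherence(coherent, k=2))      # 2 2
```

Every pair in the first chain shares a word, so the purely local score is positive. No two words explain all of its transitions, however, so its coherence drops to 0.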

Words are too Simple

[Figure: a three-article chain (1 to 3) linked by the word "network" as used in sensor networks, Bayesian networks and social networks; other labels: Probability, Cost]

Using the Citation Graph

• Create a graph per word
  – All papers mentioning the word
  – Edge weight = strength of influence [El-Arini, Guestrin KDD’11]

[Figure: citation graph for the word "Network" over papers 1 to 9]

Where did paper 8 get the idea?

Do papers 8 and 9 mean the same thing?
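Here is a minimal sketch of the per-word citation graph and the two questions above. It assumes a citation dictionary and per-paper word sets, both hypothetical. It deliberately ignores the influence weights of [El-Arini, Guestrin KDD’11] and uses plain reachability instead. `sources` and `shared_lineage` are illustrative names, and the Jaccard overlap of two papers' sources is only a crude stand-in for "same meaning".

```python
from collections import deque


def word_graph(cites, words, w):
    """Restrict the citation graph to papers mentioning word w.

    cites: paper -> list of papers it cites; words: paper -> set of words.
    Returns paper -> cited papers, keeping only edges between papers that mention w.
    """
    nodes = {p for p, ws in words.items() if w in ws}
    return {p: [q for q in cites.get(p, []) if q in nodes] for p in nodes}


def sources(graph, p):
    """Papers that p transitively cites within the word graph ("where did p get the idea?")."""
    seen, queue = set(), deque(graph.get(p, []))
    while queue:
        q = queue.popleft()
        if q not in seen:
            seen.add(q)
            queue.extend(graph.get(q, []))
    return seen


def shared_lineage(graph, a, b):
    """Jaccard overlap of two papers' sources: a crude proxy for 'same meaning'."""
    sa, sb = sources(graph, a), sources(graph, b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


words = {p: {"network"} for p in range(1, 10)}
cites = {8: [4, 5], 4: [2], 2: [1], 9: [7], 7: [6], 6: [3]}
g = word_graph(cites, words, "network")
print(sorted(sources(g, 8)))    # [1, 2, 4, 5]
print(shared_lineage(g, 8, 9))  # 0.0
```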

Words are too Simple

[Figure: the same chain (sensor networks, Bayesian networks, social networks; labels Probability, Network, Cost), now marked Incoherent]

Properties of a Good Map

1. Coherence

Is it enough?

[Figure: max-coherence map for the query "Reinforcement Learning"]

Properties of a Good Map

1. Coherence

2. Coverage

Should cover diverse topics important to the user

Coverage: What to Cover?

• Perhaps words?
• Not enough:
  1. SVM in Oracle Database 10g (Milenova et al., VLDB '05)
  2. Support Vector Machines in Relational Databases (Ruping, SVM '02)

Similar content, different impact. Citing venues and authors of papers 1 and 2:
• Affected more authors/venues
• Very little intersection

What to Cover?

• Instead of words…
• Cover papers
• A paper covers papers that it had an impact on
• High-coverage map: impact on a lot of the corpus
• Why descendants?
• Soft notion: [0,1]

p has high impact on q if there are many (especially short) coherent paths from p to q.
• Formalize with coherent random walks

[Figure: example citation contexts, "Note that our protocol is different from previous work…" vs. "We use the algorithm of…"]
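"Many paths (especially short)" can be turned into a number in [0,1] with a random walk that has a stopping probability. The sketch below uses a plain random walk, not the coherent random walk the slide refers to: it does not restrict walks to coherent paths. The `stop=0.2` value and the toy graph are assumptions. The sketch computes the hitting probability exactly by recursion rather than by sampling, so it assumes an acyclic citation graph.

```python
from functools import lru_cache


def impact(cites, p, q, stop=0.2):
    """Probability that a random walk started at q reaches p.

    At each step the walk stops with probability `stop`; otherwise it moves to a
    uniformly random paper cited by the current one. More paths and shorter
    paths raise the score, which always lies in [0, 1].
    Assumes the citation graph is acyclic (papers only cite older papers).
    """
    @lru_cache(maxsize=None)
    def hit(x):
        if x == p:
            return 1.0
        refs = cites.get(x, [])
        if not refs:
            return 0.0
        return (1 - stop) * sum(hit(y) for y in refs) / len(refs)

    return hit(q)


cites = {"q": ["a", "b"], "a": ["p"], "b": ["p"],  # two length-2 paths from q to p
         "r": ["c"], "c": ["d"], "d": ["p"]}       # one length-3 path from r to p
print(round(impact(cites, "p", "q"), 3))  # 0.64
print(round(impact(cites, "p", "r"), 3))  # 0.512
```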

Map Coverage

• Documents cover pieces of the corpus

[Figure: corpus coverage of a high-coverage, coherent map]
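The slides do not spell out how per-document coverage combines into map coverage. This sketch assumes a weighted noisy-or, a common submodular choice: a corpus paper counts as covered unless every selected document fails to cover it. The values in `cover` and `weight` are invented. The example shows diminishing returns, the property that submodular orienteering relies on.

```python
def map_coverage(selected, cover, weight):
    """Noisy-or coverage of the corpus by the selected documents.

    cover[p][q] in [0, 1]: how much document p covers corpus paper q.
    weight[q]: importance of q. q counts as covered unless every selected
    document independently fails to cover it.
    """
    total = 0.0
    for q, w in weight.items():
        miss = 1.0
        for p in selected:
            miss *= 1.0 - cover.get(p, {}).get(q, 0.0)
        total += w * (1.0 - miss)
    return total


cover = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.5}, "C": {"z": 1.0}}
weight = {"x": 1.0, "y": 1.0, "z": 1.0}
print(map_coverage({"B"}, cover, weight))       # 0.5: B alone adds 0.5
print(map_coverage({"A"}, cover, weight))       # 1.0
print(map_coverage({"A", "B"}, cover, weight))  # 1.25: B adds only 0.25 on top of A
```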

Properties of a Good Map

1. Coherence

2. Coverage

3. Connectivity

Definition: Connectivity

• Experimented with several formulations
• Users do not care about connection type
• Encourage connections between pairs of lines

Lines with No Intersection

Solution: Reward lines that had impact on each other

[Figure: two lines that do not intersect; stations include Perceptrons, SVM, Optimizing Kernels for SVM, Face Detection, SVM for Facial Recognition]
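Here is a sketch of a connectivity score in the spirit of the slide: it counts pairs of lines that either intersect or where one line had impact on the other. The `impact` function, the 0.5 threshold, and the assignment of stations to lines in the example are all assumptions for illustration.

```python
from itertools import combinations


def connectivity(lines, impact, threshold=0.5):
    """Count pairs of metro lines that are connected.

    Two lines count as connected if they share a stop or if some paper on one
    line has impact >= threshold on a paper on the other (either direction).
    """
    count = 0
    for a, b in combinations(lines, 2):
        shares_stop = bool(set(a) & set(b))
        influenced = any(impact(p, q) >= threshold or impact(q, p) >= threshold
                         for p in a for q in b)
        if shares_stop or influenced:
            count += 1
    return count


lines = [["perceptrons", "svm", "kernels_for_svm"],
         ["face_detection", "svm_for_facial_recognition"],
         ["q_learning", "rmax"]]


def impact(p, q):
    # hypothetical scores: only SVM -> SVM for Facial Recognition is strong
    return 0.9 if (p, q) == ("svm", "svm_for_facial_recognition") else 0.0


print(connectivity(lines, impact))                  # 1
print(connectivity(lines, impact, threshold=0.95))  # 0
```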

Tying it all Together: Map Objective

• Coherence
  – Either coherent or not: constraint
• Coverage
  – Must have!
• Connectivity
  – Nice to have

Consider all coherent maps with maximum possible coverage. Find the most connected one.
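The lexicographic objective can be written as a small selection routine. This sketch assumes the candidate maps are already enumerated and that `is_coherent`, `coverage`, and `connectivity` are callables supplied by the caller (all hypothetical). The `tol` parameter is an added knob for floating-point coverage; `tol=0.0` matches "maximum possible coverage" on the slide.

```python
def best_map(candidates, is_coherent, coverage, connectivity, tol=0.0):
    """Coherence is a hard constraint, coverage comes first, connectivity breaks ties.

    Maps whose coverage is within tol of the best coherent coverage count as maximal.
    """
    coherent = [m for m in candidates if is_coherent(m)]
    if not coherent:
        return None
    top = max(coverage(m) for m in coherent)
    near_top = [m for m in coherent if coverage(m) >= top - tol]
    return max(near_top, key=connectivity)


maps = [
    {"name": "A", "coherent": True, "cov": 2.0, "conn": 0},
    {"name": "B", "coherent": True, "cov": 2.0, "conn": 1},
    {"name": "C", "coherent": False, "cov": 3.0, "conn": 5},
    {"name": "D", "coherent": True, "cov": 1.0, "conn": 3},
]
best = best_map(maps, lambda m: m["coherent"], lambda m: m["cov"], lambda m: m["conn"])
print(best["name"])  # B
```

Map C has the highest coverage but is ruled out by the coherence constraint. A and B tie on coverage, so connectivity picks B.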

Game Plan

• Objective
• Algorithm
• Does it work?

Approach Overview

Documents D
1. Coherence graph G
2. Coverage function f
3. Increase connectivity

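Coherence Graph: Main Idea

• Vertices correspond to short coherent chains
• Directed edges between chains which can be conjoined and remain coherent

[Figure: chains such as 1-2-3, 4-5-6 and 5-8-9 as vertices; an edge conjoins 1-2-3 and 5-8-9 into the coherent chain 1-2-3-5-8-9]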
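Here is a minimal sketch of building the coherence graph. It assumes chains are tuples of document ids and that some coherence test is available. The `is_coherent` test below (chronological order plus one shared topic label) and the topic labels are hypothetical stand-ins, chosen to reproduce the figure's 1-2-3 + 5-8-9 example.

```python
def coherence_graph(chains, is_coherent):
    """Vertices are short coherent chains (tuples of document ids); add an edge
    u -> v when the conjoined chain u + v is still coherent."""
    return {u: [v for v in chains if v != u and is_coherent(u + v)] for u in chains}


# Hypothetical coherence test for the example: documents must be in
# chronological order (ids stand in for dates) and share at least one topic.
topics = {1: {"svm"}, 2: {"svm"}, 3: {"svm"}, 4: {"rl"}, 5: {"svm", "rl"},
          6: {"rl"}, 8: {"svm"}, 9: {"svm"}}


def is_coherent(chain):
    in_order = all(a < b for a, b in zip(chain, chain[1:]))
    return in_order and bool(set.intersection(*(topics[d] for d in chain)))


g = coherence_graph([(1, 2, 3), (4, 5, 6), (5, 8, 9)], is_coherent)
print(g[(1, 2, 3)])  # [(5, 8, 9)]
```

Any path in the resulting graph spells out a longer chain, and every such chain passed the coherence test when it was built.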

Finding High-Coverage Chains

• Paths correspond to coherent chains.
• Problem: find a path of length K maximizing coverage of underlying articles

[Figure: which chain has higher coverage, 1-2-3-4-5-6 or 1-2-3-5-8-9?]
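Before any approximation, the problem can be stated as an exhaustive search over paths of K vertices. This sketch is feasible only for tiny graphs. It assumes unit-weight coverage (the number of distinct underlying articles), and the names are hypothetical. It also shows why two chains of the same length can differ in coverage: an article shared by two chains counts only once.

```python
def best_k_path(adj, articles, k):
    """Exhaustively search simple paths of k vertices in the coherence graph.

    adj: vertex -> successors; articles: vertex -> set of underlying article ids.
    Coverage here is simply the number of distinct underlying articles.
    """
    best, best_val = None, -1

    def extend(path):
        nonlocal best, best_val
        if len(path) == k:
            val = len(set().union(*(articles[v] for v in path)))
            if val > best_val:
                best, best_val = list(path), val
            return
        for v in adj.get(path[-1], []):
            if v not in path:
                path.append(v)
                extend(path)
                path.pop()

    for start in adj:
        extend([start])
    return best, best_val


adj = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
articles = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5, 6, 7}, "D": {7, 8}}
print(best_k_path(adj, articles, 3))  # (['A', 'C', 'D'], 7)
```

The search is exponential in K.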

Reformulation

• Submodular orienteering
  – [Chekuri and Pal, 2005]
  – Quasipolynomial-time recursive greedy
  – O(log OPT) approximation

Orienteering: find a bounded-length path maximizing a function of the nodes visited
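The recursive-greedy idea can be sketched in a few dozen lines. This is a simplified reading of the Chekuri and Pal approach. It assumes a directed acyclic graph with unit-length edges, a hop budget, fixed endpoints `s` and `t`, and a caller-supplied monotone set function `f`. It omits details of the original algorithm and carries no approximation guarantee as written. At each level it tries every midpoint and budget split, solves the first half, and then solves the second half given what the first half already covers.

```python
from collections import deque


def shortest_path(adj, s, t):
    """BFS shortest s -> t path in a directed graph, or None if t is unreachable."""
    prev, queue = {s: None}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            break
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if t not in prev:
        return None
    path = [t]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def recursive_greedy(adj, f, s, t, budget, covered, depth):
    """Find an s -> t path of at most `budget` edges with large f(covered | path).

    Assumes adj is a DAG listing every vertex as a key, so joined halves stay simple.
    """
    best = shortest_path(adj, s, t)
    if best is None or len(best) - 1 > budget:
        return None  # infeasible within the budget
    best_val = f(covered | set(best))
    if depth == 0:
        return best
    for v in adj:  # candidate midpoint
        for b1 in range(budget + 1):  # budget split
            p1 = recursive_greedy(adj, f, s, v, b1, covered, depth - 1)
            if p1 is None:
                continue
            p2 = recursive_greedy(adj, f, v, t, budget - b1, covered | set(p1), depth - 1)
            if p2 is None:
                continue
            val = f(covered | set(p1) | set(p2))
            if val > best_val:
                best, best_val = p1 + p2[1:], val
    return best


articles = {"a": {1}, "b": {2, 3}, "c": {4}, "d": {5}}
adj = {"a": ["c", "b"], "b": ["d"], "c": ["d"], "d": []}


def coverage(nodes):
    return len(set().union(*(articles[n] for n in nodes)))


print(recursive_greedy(adj, coverage, "a", "d", 2, frozenset(), depth=1))  # ['a', 'b', 'd']
```

With `depth=0`, the function returns the BFS path `['a', 'c', 'd']`. One level of recursion finds the higher-coverage route through `b`. Running time grows very quickly with `depth`, because every level loops over all midpoints and budget splits.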

Approach Overview: Recap

Documents D
1. Coherence graph G: encodes all coherent chains as graph paths
2. Coverage function f: submodular orienteering [Chekuri & Pal, 2005], quasipolynomial-time recursive greedy, O(log OPT) approximation
3. Increase connectivity

Example Map: Reinforcement Learning

[Figure: example map; lines labeled by their words:
• multi-agent, cooperative, joint, team
• mdp, states, pomdp, transition, option
• control, motor, robot, skills, arm
• bandit, regret, dilemma, exploration, arm
• q-learning, bound, optimal, rmax, mdp]

Example Map Detail: SVM

Game Plan

• Objective
• Algorithm
• Does it work?

User Study

• Tricky!
  – No double-blind, no within-subject
  – Domain: understandable yet unfamiliar
  – Reinforcement Learning (RL)

User Study

• 30 participants
• First-year grad student, Reinforcement Learning project
• Update a survey paper from 1996
• Identify research directions + relevant papers
  – Google Scholar
  – Map and Google Scholar
  – Baselines: Map, Wikipedia

Results (in a nutshell)

[Figure: two bar charts comparing Google and Us (map users); higher is better]

Map users find better papers and cover more important areas

User Comments

• Helpful: "noticed directions I didn't know about", "great starting point", "… get a basic idea of what science is up to"
• "why don't you draw words on edges?"
• "Legend is confusing"
• "hard to get an idea from paper title alone"

Conclusions

• Formulated metrics characterizing good maps for the scientific domain
• Efficient methods with theoretical guarantees
• User studies highlight the promise of the method
• Website on the way!
• Personalization

Thank you!
