
Landmark-Based User Location Inference in Social Media

YUTO YAMAGUCHI†, TOSHIYUKI AMAGASA† AND HIROYUKI KITAGAWA†

†UNIVERSITY OF TSUKUBA

13/10/08

COSN 2013 - Yuto Yamaguchi 1

LOCATION-RELATED INFORMATION


Eating seafood !!!

I’m at Logan airport

Profile

Residence: Tokyo, Japan

COSN @ northeastern

APPLICATIONS

Research Using Home Locations

Outbreak Modeling [Poul+, ICWSM’12]

Real-World Event Detection [Sakaki+, WWW’12]

Analyzing Disasters [Mandel+, LSM’12]

Other Useful Applications

Location-aware Recommender [Levandoski+, ICDE’12]

Marketing, Ads

Disaster Warning


OUR PROBLEM

Location profiles are not available for …

76% of Twitter users [Cheng et al., CIKM’10]

94% of Facebook users [Backstrom et al., WWW’10]

This limits the opportunities to use location information

User Home Location Inference


USER HOME LOCATION INFERENCE

Content-Based Approaches

[Cheng et al., CIKM’10] [Kinsella et al., SMUC’11] [Chandra et al., SocialCom’11]

Graph-Based Approaches

[Backstrom et al., WWW’10] [Sadilek et al., WSDM’12] [Jurgens, ICWSM’13]


Our focus: graph-based approaches

GRAPH-BASED APPROACH (1/2)

Basic Idea


(Figure: a target user whose friends live in Boston, Boston, Boston, Chicago, and New York is inferred to live in Boston.)
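The basic idea can be illustrated as a majority vote over the known home locations of a user's friends. This is a minimal sketch, not a reproduction of any cited method: the function name `infer_by_majority` and the city-name input format are hypothetical, and the published graph-based approaches use richer models.

```python
from collections import Counter

def infer_by_majority(friend_locations):
    """Return the most common known home location among a user's friends.

    friend_locations: list of city names, with None for friends whose location is unknown.
    Returns None when no friend has a known location.
    """
    known = [loc for loc in friend_locations if loc is not None]
    if not known:
        return None
    # most_common(1) returns [(location, count)]; ties keep first-seen order
    return Counter(known).most_common(1)[0][0]

# Hypothetical friends of the target user
friends = ["Boston", "Boston", "Boston", "Chicago", "New York"]
print(infer_by_majority(friends))  # Boston
```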

GRAPH-BASED APPROACH (2/2)

Closeness Assumption


Friends are spatially close

Non-friends are spatially distant

Really close?

60% of friend pairs are more than 100 km apart
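A statistic like this one can be computed from geocoded friend pairs using the haversine great-circle distance. This is a minimal sketch, assuming each pair is given as two (latitude, longitude) points in degrees. The names `haversine_km` and `fraction_distant`, along with the sample coordinates, are illustrative and not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp guards against h drifting slightly above 1 from rounding
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def fraction_distant(pairs, threshold_km=100.0):
    """Fraction of friend pairs whose home locations are more than threshold_km apart."""
    if not pairs:
        return 0.0
    far = sum(1 for a, b in pairs if haversine_km(a, b) > threshold_km)
    return far / len(pairs)

# Hypothetical friend pairs (approximate city-center coordinates)
boston, cambridge = (42.36, -71.06), (42.37, -71.11)
new_york, chicago = (40.71, -74.01), (41.88, -87.63)
pairs = [(boston, cambridge), (boston, new_york), (boston, chicago)]
print(fraction_distant(pairs))  # 0.6666666666666666
```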

CONCENTRATION ASSUMPTION


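(Figure: an Unknown user follows a LANDMARK whose followers are concentrated in Boston, and is therefore inferred to be in Boston; NY and Chicago also appear.)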

LANDMARKS

REQUIREMENTS

Small dispersion

Large centrality
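One way to put these two requirements into practice is sketched below. It assumes each account's followers have known (latitude, longitude) home locations. Dispersion is measured as the mean squared distance of followers from their centroid, and follower count is used as a stand-in for centrality. Both measures, the thresholds, and all names are hypothetical choices for illustration. Latitude and longitude are treated as planar coordinates, which is only a rough approximation.

```python
def dispersion(locs):
    """Mean squared distance (degrees^2) of (lat, lon) points from their centroid.

    This equals the trace of the population covariance matrix.
    """
    n = len(locs)
    mlat = sum(p[0] for p in locs) / n
    mlon = sum(p[1] for p in locs) / n
    return sum((p[0] - mlat) ** 2 + (p[1] - mlon) ** 2 for p in locs) / n

def find_landmarks(followers_by_account, max_dispersion, min_followers):
    """Return accounts with many located followers (centrality proxy) and small dispersion."""
    landmarks = []
    for account, locs in followers_by_account.items():
        if len(locs) < max(min_followers, 1):
            continue  # too few followers; this also avoids dividing by zero
        if dispersion(locs) <= max_dispersion:
            landmarks.append(account)
    return landmarks

# Hypothetical accounts and their followers' home locations
followers = {
    "local_bakery": [(42.36, -71.06), (42.37, -71.11), (42.35, -71.05), (42.34, -71.08)],
    "national_brand": [(42.36, -71.06), (40.71, -74.01), (41.88, -87.63), (34.05, -118.24)],
    "tiny_account": [(42.36, -71.06)],
}
print(find_landmarks(followers, max_dispersion=1.0, min_followers=3))  # ['local_bakery']
```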


EXAMPLES IN TWITTER


LANDMARKS MAPPING


(Map. Red: all users; blue: landmarks)

PROPOSED METHOD

OVERVIEW

Probabilistic Model

Modeling


Each user has a location distribution

Location inference = selecting the location in the location set $L$ with the largest probability density:

$\hat{l}_u = \arg\max_{l \in L} p_u(l)$
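The inference step can be sketched as follows, assuming the location set is a finite set of candidate locations and the user's density can be evaluated at each one. The names `infer_location` and `toy_density` are hypothetical. The toy density stands in for a user's actual location distribution; because only the argmax matters, it does not need to be normalized.

```python
import math

def infer_location(density, location_set):
    """Return the candidate location with the largest density value."""
    return max(location_set, key=density)

def toy_density(loc, center=(42.36, -71.06), scale=1.0):
    """Hypothetical unnormalized isotropic bump centered near Boston."""
    d2 = (loc[0] - center[0]) ** 2 + (loc[1] - center[1]) ** 2
    return math.exp(-d2 / (2 * scale ** 2))

cities = {"Boston": (42.36, -71.06), "New York": (40.71, -74.01), "Chicago": (41.88, -87.63)}
best = infer_location(lambda name: toy_density(cities[name]), cities)
print(best)  # Boston
```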

LANDMARK MIXTURE MODEL

DOMINANCE DISTRIBUTION

Spatial distribution of followers’ home locations

Modeled as Gaussian

Landmarks have small covariances (many followers near the center)


(Figure: a dominance distribution over latitude and longitude, with many followers near the center and few followers farther away.)
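Fitting a dominance distribution can be sketched as computing a maximum-likelihood 2-D Gaussian over the (latitude, longitude) home locations of an account's followers. This is an illustration under stated assumptions. Coordinates are treated as planar. The small `eps` added to the diagonal keeps a near-degenerate covariance invertible and is an assumption not taken from the source. The function names are hypothetical.

```python
import math

def fit_gaussian(points):
    """Maximum-likelihood mean and covariance of 2-D (lat, lon) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def gaussian_pdf(x, mean, cov, eps=1e-6):
    """2-D Gaussian density at x; eps is added to the diagonal to avoid a singular covariance."""
    (a, b), (_, d) = cov
    a, d = a + eps, d + eps
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # dx^T Sigma^{-1} dx via the closed-form inverse of a 2x2 matrix
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# Hypothetical followers of a local account, all near Boston
followers = [(42.36, -71.06), (42.37, -71.11), (42.35, -71.05), (42.34, -71.08)]
mean, cov = fit_gaussian(followers)
print(gaussian_pdf((42.36, -71.06), mean, cov) > gaussian_pdf((40.71, -74.01), mean, cov))  # True
```

A landmark's small covariance concentrates the density near its followers' center, which is what lets it dominate the mixture in that region.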

LANDMARK MIXTURE MODEL (LMM)


(Figure: the inference target user follows one landmark and two non-landmarks; each followee contributes its dominance distribution with a mixture weight, and the landmark receives a large weight.)
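Following the description above, the target user's location density can be written as a weighted sum of the dominance distributions of the accounts they follow: $p_u(l) = \sum_v w_v\,\mathcal{N}(l; \mu_v, \Sigma_v)$. The sketch below is minimal. The followees, their parameters, and the weights are made up for illustration, and the exact formulation in the study may differ.

```python
import math

def gaussian_pdf(x, mean, cov):
    """2-D Gaussian density; cov = ((a, b), (b, d)) must be positive definite."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    q = (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def lmm_density(x, components):
    """Mixture density sum_v w_v * N(x; mu_v, Sigma_v) over the accounts the target follows.

    components: list of (weight, mean, cov), one per followee; weights should sum to 1.
    """
    return sum(w * gaussian_pdf(x, mu, cov) for w, mu, cov in components)

# Hypothetical followees: a Boston landmark (small covariance, large weight)
# and a non-landmark whose followers are spread widely (large covariance, small weight).
followees = [
    (0.8, (42.36, -71.06), ((0.1, 0.0), (0.0, 0.1))),
    (0.2, (39.0, -95.0), ((25.0, 0.0), (0.0, 100.0))),
]
boston, kansas = (42.36, -71.06), (39.0, -95.0)
print(lmm_density(boston, followees) > lmm_density(kansas, followees))  # True
```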

MIXTURE WEIGHTS


Proportional to centrality

Landmark: large mixture weight

Non-landmark: small mixture weight
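Making weights proportional to centrality amounts to dividing each followee's centrality score by the total. The sketch below assumes centrality has already been computed for each followee. The follower counts used here are a hypothetical stand-in for whatever centrality measure the method uses.

```python
def mixture_weights(centrality):
    """Normalize followees' centrality scores into mixture weights that sum to 1."""
    total = sum(centrality.values())
    if total <= 0:
        raise ValueError("need at least one followee with positive centrality")
    return {v: c / total for v, c in centrality.items()}

# Hypothetical centrality scores (e.g. follower counts) of the target user's followees
weights = mixture_weights({"boston_landmark": 9000, "friend_a": 600, "friend_b": 400})
print(weights)  # {'boston_landmark': 0.9, 'friend_a': 0.06, 'friend_b': 0.04}
```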

CONFIDENCE CONSTRAINT

If the distribution does not have a clear peak, we should not infer the location of that user


High precision but low recall
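The confidence constraint can be sketched as a threshold on the peak of the location distribution. The argmax location is returned only if its probability reaches a threshold (presumably the p0 varied in the experiments); otherwise the method abstains. This sketch assumes a discrete distribution over candidate locations, and the names and numbers are hypothetical.

```python
def infer_with_confidence(dist, p0):
    """Return the most probable location, or None if its probability is below p0.

    dist maps each candidate location to its probability.
    """
    best = max(dist, key=dist.get)
    return best if dist[best] >= p0 else None

peaked = {"Boston": 0.7, "New York": 0.2, "Chicago": 0.1}
flat = {"Boston": 0.34, "New York": 0.33, "Chicago": 0.33}
print(infer_with_confidence(peaked, p0=0.5))  # Boston
print(infer_with_confidence(flat, p0=0.5))    # None
```

Abstaining on flat distributions is what trades recall for precision.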

CENTRALITY CONSTRAINT

We can reduce the computational cost by ignoring non-landmarks


Low cost but low recall

(Figure: the inference target user follows one landmark and two non-landmarks; only the landmark is used.)
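The centrality constraint can be sketched as removing the target user's followees whose centrality falls below a threshold (presumably the c0 varied in the experiments) before the mixture is built. A user who follows no account above the threshold is left with no components and gets no inference, which is one plausible source of the recall loss. Names and values are hypothetical.

```python
def prune_followees(followees, centrality, c0):
    """Keep only followees whose centrality is at least c0; the rest are ignored."""
    return [v for v in followees if centrality.get(v, 0) >= c0]

centrality = {"boston_landmark": 9000, "friend_a": 600, "friend_b": 400}
print(prune_followees(["boston_landmark", "friend_a", "friend_b"], centrality, c0=1000))
# ['boston_landmark']
print(prune_followees(["friend_a", "friend_b"], centrality, c0=1000))
# [] -> no components left, so this user's location is not inferred
```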

EXPERIMENTS

DATASET

Twitter dataset provided by [Li et al., KDD’12]

3M users in the U.S.

285M follow edges

Geocoded their location profiles to obtain ground truth

465K labeled users (15%)

Test set

46K users (10% of labeled users)
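The labeled/test split can be sketched as a uniform random hold-out of 10% of the labeled users. Whether the study sampled its test set this way is an assumption, and `split_labeled` is a hypothetical helper.

```python
import random

def split_labeled(labeled_users, test_fraction=0.1, seed=0):
    """Randomly hold out test_fraction of the labeled users; returns (train, test)."""
    users = list(labeled_users)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(users)
    n_test = round(len(users) * test_fraction)
    return users[n_test:], users[:n_test]

train, test = split_labeled(range(1000))
print(len(train), len(test))  # 900 100
```

With 465K labeled users, the same 10% hold-out gives roughly 46K test users.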


PERFORMANCE COMPARISON


Compared three methods:

LMM: our method

UDI: [Li+, KDD'12]

Naïve: spatial median
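This sketch assumes the Naïve baseline takes the geometric (spatial) median of the home locations of the accounts the target user follows; the exact definition used in the study is not given here. The geometric median can be computed with Weiszfeld's algorithm. Latitude and longitude are treated as planar coordinates, and the rule for an iterate that lands exactly on a data point is a simplification.

```python
import math

def geometric_median(points, iters=500, tol=1e-9):
    """Approximate geometric median of (lat, lon) points via Weiszfeld's algorithm.

    Coordinates are treated as planar, which is only a rough approximation on the globe.
    """
    x = sum(p[0] for p in points) / len(points)  # start at the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < tol:
                # Simplification: stop when the iterate lands on a data point; a more
                # careful version would check whether that point is actually optimal.
                return (px, py)
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        nx, ny = num_x / denom, num_y / denom  # distance-weighted average
        if math.hypot(nx - x, ny - y) < tol:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)

# Hypothetical home locations of the accounts a target user follows
boston, chicago, los_angeles = (42.36, -71.06), (41.88, -87.63), (34.05, -118.24)
locs = [boston, boston, boston, chicago, los_angeles]
print(geometric_median(locs))  # close to Boston; the centroid would be near (40.6, -83.8)
```

Unlike the centroid, the median is not pulled far toward a few distant friends. It still ignores the landmark structure that LMM exploits.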

EFFECT OF CONFIDENCE CONSTRAINT


Threshold p0

We can adjust the trade-off between precision and recall by varying p0
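The way a confidence threshold trades precision for recall can be illustrated on hypothetical per-user results. This sketch assumes precision is measured over users whose location is inferred and recall over all test users; the study's evaluation definitions may differ.

```python
def precision_recall(results, p0):
    """results: list of (peak_probability, is_correct) for each test user.

    A user is inferred only when peak_probability >= p0. Precision is computed over
    inferred users (0.0 if none, by convention); recall is the fraction of all test
    users whose location is inferred correctly.
    """
    inferred = [ok for peak, ok in results if peak >= p0]
    correct = sum(inferred)
    precision = correct / len(inferred) if inferred else 0.0
    recall = correct / len(results) if results else 0.0
    return precision, recall

results = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
for p0 in (0.0, 0.5, 0.85):
    print(p0, precision_recall(results, p0))
```

Raising p0 from 0.0 to 0.85 moves precision from 0.6 to 1.0 while recall drops from 0.6 to 0.2.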

EFFECT OF CENTRALITY CONSTRAINT


Threshold c0

We can adjust the trade-off between cost and recall by varying c0

CONCLUSION

Introduced the concentration assumption instead of the widely used closeness assumption

There exist landmarks

Proposed landmark mixture model

Outperforms the state-of-the-art method

Confidence / Centrality constraint

Future work

Other applications of landmarks

Recommending landmarks or their tweets