26
Takeaways in Large-scale Human Mobility Data Mining Guangshuo Chen, Aline Carneiro Viana, and Marco Fiore

Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Takeaways in Large-scale Human Mobility Data Mining

Guangshuo Chen, Aline Carneiro Viana, and Marco Fiore

Page 2: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Human Mobility Investigation

!2

timeLocations

General Networking

•Prediction•Reconstruction•Characterization

•Paging management•Caching optimization•Resource allocation

Page 3: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Human Mobility Investigation1. Data collection Operator (CDR), App + Volunteers, WiFi, …

2. Data preliminary/processing

!3

3. Data utilization

•Prediction•Reconstruction•Characterization

•Paging•Caching•Resource allocation

Page 4: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

How to Obtain 100K Users’ Locations?

!4

WiFi? App + Volunteers? No! Only CDR!

• (Legacy) Call Detail Records• (Now) Charging Data Records• 3GPP TS 32.240• Calls, SMS, data sessions,

mobility updates, on/off, …• Necessary for billing• Large populations

Page 5: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Phone Time Location

User ID 1 2015-01-0106:47:56

(19.028, -98.209)

User ID 1 2015-01-0108:23:08

(18.993, -98.202)

User ID 2 2015-01-0216:59:34

(19.025, -98.217)

Payload of

communication events

CDR Data

Page 6: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

• Collaboration- Operators- Data companies- Researchers

• Internet! • Publicly available

datasets• Where to find?

!6

from CDR datasets?

How to Obtain 100K Users’ Locations?

Page 7: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Publicly Available CDR datasets• Data competitions• Orange D2D 2013• TIM Big Data Challenge 2015• flowminder.org, South Africa• dandelion.eu, Milan, Italy• (Chinese) kesci.com Big Data Competitions• 2016-2017, China Unicom, Shanghai,

642K users (2016), 1 weeks• (Chinese) zjdex.com, China Mobile,

Hangzhou, 7K users, 1 month

!7

Page 8: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Data Preliminary Is Always Required

• Imperfectness • temporal heterogeneity• abnormalities

!8

0 20 40 60 80 100

120

140

160

180

200

220

240

260

280

300

320

340

day

06

1218

hour

Number of users (⇥104)

510152025

Page 9: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Data Preliminary: Common Practices

• Extract location coordinates • Filter out “bad” users • Reduce resolutions• Segment observing periods• Correlate with mobility loss • Perform controlled experiments• Fill spatiotemporal gaps

!9

Page 10: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Extract Location Coordinates• Locations in CDR

- Geographical coordinates - Cell tower IDs (extraction required) ‣MCC-MNC-LAC-CID

• Reliable cell tower locations- France OpenData, www.data.gouv.fr

• Crowdsensing, third-party services- OpenCelliD, Google Geolocation,

Unwired Labs, OpenSignal, and Mozilla Location Service

!10

Page 11: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Unwired Labs

Page 12: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Filter Out Users• Shrinking user population• 6M -> 100K•Gonzalez et al. "Understanding individual human mobility

patterns,” Nature, 2008

• 10M -> 50K•Song et al. "Limits of predictability in human mobility,” Science,

2010

• 1M -> 700•Hoteit et al. "Estimating human trajectories and hotspots through

mobile phone data." Computer Networks, 2014

• To cut off, to let go, and to move on!12

Page 13: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Collaborate with Mobility Loss

!13

Entropy

Incompleteness (%)

10%

20%

30%

Page 14: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness• Categories

• Individual• Displacement• Travelled distance• Radius of gyration

!14

Page 15: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage • Repetitiveness• Categories

• Individual• Displacement• Travelled distance• Radius of gyration

!15

km0 10 20 30 40

km0

5

10

15

20

25

30

35

40

45

0

1

2

3

4

5

6

7

8

9

10

Page 16: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness • Categories

• Individual• Displacement• Travelled distance• Radius of gyration

!16

100 101 102

L-th most visited location

10�3

10�2

10�1

100

P(L

)

(L)�1

Page 17: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness• Categories

• Individual• Displacement• Travelled distance• Radius of gyration

!17

Home Work

Page 18: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness• Categories

• Individual• Displacement • Travelled distance• Radius of gyration

!18

Page 19: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness• Categories

• Individual• Displacement• Travelled distance • Radius of gyration

!19

Page 20: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Before Filtering: Mobility Measurement

• Location• Cell coverage• Repetitiveness• Categories

• Individual• Displacement• Travelled distance• Radius of gyration

!20

Page 21: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Case Study: Next-location Prediction

• Dataset: CDR (voice calls), ~1M users, 15 months

• User filtering:• Metropolitan area• Radius of gyration < 32km (urban, peri-urban)• Completeness > 20%• Weekdays only• Three or more unique locations• Data traffic > 1KB/day• Active days > 150 days

• Users of study: 7K• Travel history: >150 days for each user

!21

Page 22: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

• Problem:

• Accuracy Percentage of correct predictions • Theoretical predictability

- Entropy + Fano’s inequality => Accuracy upper bound

• Practical predictive models- PPM: Prediction by partial matching- MLP: Multi-layer perceptron

l̂t = arg maxl2Cells

P (Lt = l|lt�1, lt�2, · · · )

Page 23: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

0.4 0.5 0.6 0.7 0.8 0.9 1.0Predictability ⇡(L)

0.0

0.2

0.4

0.6

0.8

1.0

F(⇡

)⇡PPM(L)

⇡MLP(L)

⇡MLP-CI(L)

⇡MLP-CI(L)PastV⇧max

u (L) Theoretical predictability

Travel History

Travel History + Time

Travel history +Time+Data Traffic

CDF

Page 24: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

How to Improve Next-location Prediction?

• Still from travel history?• (2006) Markov Chain, Text Compression Algorithms

• Song, Libo, et al. "Evaluating next-cell predictors with extensive Wi-Fi mobility data." IEEE Transactions on Mobile Computing 5.12 (2006): 1633-1649.

• (2010) Matrix/Tensor Factorization• Zheng, Vincent Wenchen, et al. "Collaborative Filtering Meets Mobile Recommendation: A

User-Centered Approach." AAAI. Vol. 10. 2010.

• (2012) ARIMA models• Li, Xiaolong, et al. "Prediction of urban human mobility using large-scale taxi traces and its

applications." Frontiers of Computer Science 6.1 (2012): 111-121.

• (2016) Non-parametric Bayesian + MCMC• Jeong et al. “Cluster-aided mobility predictions." INFOCOM 2016, IEEE, 2016.

• (2016) Recurrent Neural Networks• Liu, Qiang, et al. "Predicting the Next Location: A Recurrent Model with Spatial and

Temporal Contexts." AAAI. 2016.

•New techniques are limited

!24

Page 25: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

• Travel history + Context + Deep learning

Travel history

AND

Shared trajectories

(2016) by Jeong et al.

Mobile Traffic

(2018) Call/SMS by González et al.

(2018) Data Sessions by us

?Future

Environment (WiFi/Bluetooth)

POIs

Detailed Traffic

Page 26: Takeaways in Large-scale Human Mobility Data … › papers › lanman_slides.pdf• Data competitions • Orange D2D 2013 • TIM Big Data Challenge 2015 • flowminder.org, South

Questions about mobility data processing?

[email protected]