Similarity-based Analysis for Trajectory Data
Kevin Zheng
25/04/2014 DASFAA 2014 Tutorial 1
Outline
• Background
  – What is a trajectory
  – Where do they come from
  – Why are they useful
  – Characteristics
• Trajectory similarity search
  – Query classification
  – Trajectory similarity measures
  – Trajectory index
• Similarity-based trajectory mining
  – Popular route mining
  – Co-traveller discovery
  – Trajectory clustering
What is a trajectory?
• Historical location records of moving objects
• In mathematics
  – Continuous function: time → location
  – Location can be of any dimension
• In real applications
  – Locations are sampled periodically
  – A finite sequence of time-stamped locations: <p1, t1>, <p2, t2>, …, <pn, tn>
  – p: two or three dimensions (e.g. longitude, latitude)
Where is it from?
• GPS modules on moving objects
  – Vehicles, mobile phone users, animals
• Online social networks
  – Twitter, Flickr, Facebook, Weibo
• Sensors
  – Surveillance cameras, RFID, WiFi
• More …
Who cares about it?
• Government
  – Traffic pattern analysis
  – Public transportation management
  – Urban planning
• Business
  – Location-based services
  – Personalized advertisement & recommendation
  – Taxi companies, logistics companies
• Scientists & researchers
  – Zoologists, meteorologists, astronomers
  – Open problems, challenging tasks
• More …
Trajectory data are BIG
• Volume
• Velocity
• Variety
Volume
• In 2010, 1 billion vehicles
  – Taxi and logistics companies keep tracking their vehicles
  – Self-driving cars in the near future?
• In 2012, 1.08 billion smartphone users
• In 2013, 20 million surveillance cameras in China
• They are all data generators!
  – The data keep accumulating
Velocity
• Not just huge: the data are also generated quickly
• Vehicle tracking & navigation
  – Positions are reported every few seconds
• Geo-tagged social media
  – 2 million Flickr photos per day, 5% geo-tagged
  – 100 million posts on Sina Weibo per day, 1-2% geo-tagged
  – 400 million tweets per day, 1% geo-tagged
• Sensors
  – How many cars pass a road camera every day?
Geo-tagged tweets
[Figure: maps of geo-tagged tweets. Images courtesy of Twitter]
Variety
• Data source
• Tracking devices
  – Car GPS, smartphones, sensors
• Tracking methods
  – Sampling strategy, sampling rate
• Spatial length & temporal duration
• Data quality
Research directions
• Scalable, real-time data processing
• Flexible database storage and index
• Effective similarity measures
• Uncertainty management
• Data compression
Key and fundamental research problem: similarity-based analysis
Similarity-based analysis for trajectories
• Core problem: trajectory similarity search
  – Input: a trajectory dataset D, a query Q
  – Output: the subset of trajectories in D that are ‘similar’ to Q
• Foundation
  – Trajectory similarity measures
• Approach
  – Index and search algorithm
• Application
  – Popular route mining (route recommendation)
  – Co-traveller discovery, clustering, classification, etc.
Similarity query classification
• P-query
  – Query: point(s)
• R-query
  – Query: region (spatial & temporal dimension)
• T-query
  – Query: trajectory
P-query (single point)
• Query location: q
• Temporal constraint (optional): tc = [ts, te]
• $D(q, T) = \min_{p \in T,\ p \text{ satisfies } tc} dist(q, p)$
• dist(q, p) can be
  – Lp-norm
  – Network distance
[Tao2002] Tao Y., Papadias D. and Shen Q., Continuous nearest neighbour search, VLDB, 2002
P-query (multiple points)
• Query locations Q: q1, q2, q3, q4
• D(Q, T) is an aggregate function of D(q, T)
[Chen2010] Chen Z., Shen HT., Zhou X., Zheng Y. and Xie X., Searching trajectories by locations: an efficiency study. SIGMOD 2010
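The two P-query distances can be sketched as follows. This is a minimal illustration, not the method of [Tao2002] or [Chen2010]: it assumes Euclidean distance for dist(q, p), trajectories stored as lists of ((x, y), t) samples, and a plain sum as the aggregate function (the slides only say "an aggregate function"). The function names are hypothetical.

```python
import math

def point_to_trajectory(q, traj, tc=None):
    """D(q, T): minimum Euclidean distance from q to any sample of T.

    traj is a list of ((x, y), t) samples; tc is an optional (ts, te) window.
    Returns math.inf when no sample satisfies the temporal constraint.
    """
    best = math.inf
    for (x, y), t in traj:
        if tc is not None and not (tc[0] <= t <= tc[1]):
            continue  # sample lies outside the temporal constraint
        best = min(best, math.hypot(x - q[0], y - q[1]))
    return best

def points_to_trajectory(queries, traj, tc=None, aggregate=sum):
    """D(Q, T): aggregate of D(q, T) over all query points (sum is an assumption)."""
    return aggregate(point_to_trajectory(q, traj, tc) for q in queries)
```

Passing `aggregate=max` instead would rank trajectories by their worst-served query point; which aggregate is appropriate depends on the application.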
R-query
• Spatial region: R
• Temporal interval: [ts, te]
• Ask for trajectories in a given region during a time interval
[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis, Novel approaches to the indexing of moving object trajectories. VLDB, 2000
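As a baseline, an R-query can be answered by a linear scan. The sketch below assumes a rectangular region, a dataset given as a dict from a trajectory id to a list of ((x, y), t) samples, and sample-level containment only (movement between two samples is not considered). The name `r_query` is hypothetical.

```python
def r_query(dataset, region, interval):
    """Return ids of trajectories with at least one sample inside
    region = (xmin, ymin, xmax, ymax) during interval = (ts, te)."""
    xmin, ymin, xmax, ymax = region
    ts, te = interval
    hits = []
    for traj_id, traj in dataset.items():
        if any(xmin <= x <= xmax and ymin <= y <= ymax and ts <= t <= te
               for (x, y), t in traj):
            hits.append(traj_id)
    return hits
```

The index structures discussed later in the tutorial exist precisely to avoid this full scan.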
T-query
• Query: a trajectory Tq
• How to measure the distance between Tq and a trajectory in the dataset?
Trajectory similarity measures
• Many-to-many mapping
• Different semantics/applications
• Different lengths
• Different sampling rates
• Noise
• Temporal dimension?
Classification
• Spatial-only (consider location only) vs. spatial-temporal (consider both location and time)
• Discrete (based on location samples) vs. continuous (based on line segments or curves)

             Spatial-only              Spatial-temporal
Discrete     Lp-norm, DTW, LCSS, EDR   DTW, LCSS, EDR with time constraint
Continuous   OWD, LIP                  Synchronous Euclidean Distance
Lp-norm
• Average Lp-norm distance of all matched locations
• 1-to-1 mapping
• Trajectories must be of the same length
• Cannot detect similar trajectories with different sampling rates
• Sensitive to noise
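A minimal sketch of the Lp-norm trajectory distance described above, assuming 2-D points given as (x, y) tuples and index-wise (1-to-1) matching. The function name is hypothetical.

```python
def lp_norm_distance(t1, t2, p=2):
    """Average Lp distance between the i-th points of two equal-length trajectories."""
    if len(t1) != len(t2) or not t1:
        raise ValueError("Lp-norm distance needs non-empty trajectories of equal length")
    total = 0.0
    for (x1, y1), (x2, y2) in zip(t1, t2):
        total += (abs(x1 - x2) ** p + abs(y1 - y2) ** p) ** (1.0 / p)
    return total / len(t1)
```

Because every point contributes to the average, a single outlier sample shifts the result directly, which is the noise sensitivity noted on the slide.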
DTW
• Dynamic Time Warping distance
  – Adapted from a time series distance measure
  – Used to handle time shifting and scaling in time series
• Optimal order-aware alignment between two sequences
  – Goal: minimize the aggregate distance between matched points
• 1-to-many mapping
Yi, Byoung-Kee, Jagadish, HV and Faloutsos, Christos, Efficient retrieval of similar time sequences under time warping. ICDE 1998
DTW for trajectories
• Despite its name, it does not use the timestamps at all
• Useful for detecting similar trajectories with different sampling rates
• Sensitive to noise
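The alignment can be computed with the textbook dynamic program. The sketch below assumes Euclidean point distance, the sum as the aggregate, and no warping-window constraint; it is an illustration rather than the exact formulation of the cited paper.

```python
import math

def dtw(t1, t2):
    """Dynamic Time Warping distance between two lists of (x, y) points."""
    n, m = len(t1), len(t2)
    # d[i][j] = cost of the best alignment of t1[:i] with t2[:j]
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(t1[i - 1], t2[j - 1])
            # a point may be matched to several points of the other sequence
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

Two trajectories that follow the same path but repeat or skip samples can still align at zero cost, which is why DTW tolerates different sampling rates.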
LCSS
• Longest Common Sub-Sequence
• Adaptation of string similarity
  – Lcss(‘abcde’, ‘bd’) = 2
• Threshold-based equality relationship
  – Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)
• 1-to-(1 or null) mapping
VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002
LCSS
• Insensitive to noise
• Not easy to define the threshold
• May return dissimilar trajectories
[Figure: trajectory p1 … p5 matched against trajectory p’1 … p’3]
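A minimal LCSS sketch, assuming two points are "equal" when their Euclidean distance is at most a threshold `eps` (the cited paper uses its own matching condition; this one is chosen for brevity).

```python
import math

def lcss(t1, t2, eps):
    """Length of the longest common subsequence of two point lists,
    where points within distance eps of each other count as equal."""
    n, m = len(t1), len(t2)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(t1[i - 1], t2[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # leave one of the two points unmatched (the "null" mapping)
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

An outlier simply stays unmatched instead of adding a large distance, which explains the robustness to noise; the result, however, depends strongly on the choice of `eps`.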
EDR
• Edit Distance on Real sequence
• Adaptation of edit distance on strings
  – Number of insert, delete and replace operations needed to convert A into B
• Threshold-based equality relationship
  – Two locations are regarded as equal if they’re ‘close’ (compared to a threshold)
Lei Chen, M. Tamer Ozsu, Vincent Oria, Robust and Fast Similarity Search for Moving Object Trajectories. SIGMOD 2005
EDR
• The value is the number of operations, not the “distance between locations”
  – Insensitive to noise
[Figure: trajectories p1 … p5 and p’1 … p’3, with edit operations labelled insert, insert, replace]
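A minimal EDR sketch. It assumes unit cost for insert, delete and replace, and uses Euclidean distance within `eps` as the equality test (an assumption made for brevity, not necessarily the matching condition of the cited paper).

```python
import math

def edr(t1, t2, eps):
    """Edit distance between two point lists; points within eps are 'equal'."""
    n, m = len(t1), len(t2)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all remaining points of t1
    for j in range(m + 1):
        d[0][j] = j  # insert all remaining points of t2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subcost = 0 if math.dist(t1[i - 1], t2[j - 1]) <= eps else 1
            d[i][j] = min(d[i - 1][j - 1] + subcost,  # match or replace
                          d[i - 1][j] + 1,            # delete
                          d[i][j - 1] + 1)            # insert
    return d[n][m]
```

An outlier costs at most one operation no matter how far away it is, so the score counts mismatches rather than accumulating spatial error.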
LCSS and EDR
• They are both count-based
  – LCSS counts the number of matched pairs
  – EDR counts the cost of the operations needed to fix the unmatched pairs
• Higher LCSS, lower EDR
• If cost(replace) = cost(insert) + cost(delete), then $EDR(X, Y) = L(X) + L(Y) - 2\,LCSS(X, Y)$, where L(·) is the length of a sequence
OWD
• The One Way Distance from T1 to T2 is:
  – The integral of the distance from the points of T1 to T2
  – Divided by the length of T1
• Make it a symmetric measure, for example by averaging the two directions
Bin Lin, Jianwen Su, One Way Distance: For Shape Based Similarity Search of Moving Object Trajectories. In Geoinformatica (2008)
OWD example
• Consider one trajectory as a piece-wise line segment, and the other as discrete samples
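The discrete setting of the example can be sketched as below. This is an approximation under stated assumptions: the integral over T1 is replaced by the mean over T1's samples (reasonable only when they are roughly evenly spaced), T2 is a polyline with at least two vertices, and the symmetric version averages both directions. The function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # projection parameter of p onto the segment, clamped to [0, 1]
    u = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))

def one_way_distance(samples, polyline):
    """Mean distance from each sample of T1 to the polyline T2."""
    segments = list(zip(polyline, polyline[1:]))
    return sum(min(point_segment_distance(p, a, b) for a, b in segments)
               for p in samples) / len(samples)

def symmetric_owd(t1, t2):
    """Average of the two one-way distances (an assumed symmetrization)."""
    return 0.5 * (one_way_distance(t1, t2) + one_way_distance(t2, t1))
```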
LIP distance
• Locality In-between Polylines
  – Based on the set of polygons formed between the intersection points of the two polylines
Nikos Pelekis et al, Similarity Search in Trajectory Databases. Symposium on Temporal Representation and Reasoning 2007
LIP distance
• Only works for 2-dimensional trajectories
• Polygon → polyhedron: a non-trivial change
Spatial-temporal similarity measure
• All the spatial-only similarity measures can incorporate the time information in trajectories
• Lp-norm and DTW can apply a temporal constraint for more synchronized alignment
• EDR and LCSS can apply a temporal threshold on top of the spatial threshold
DTW with temporal constraint
[Figure: DTW alignment of two time-stamped trajectories with time tolerance = 2]
Spatial-temporal LCSS
• A temporal threshold controls how far in time we can go in order to match two locations
VLACHOS, M., GUNOPULOS, D., AND KOLLIOS, G. Discovering similar multidimensional trajectories. ICDE 2002
[Figure: trajectory p1 … p5 with timestamps t1=1, t2=2, t3=4, t4=7, t5=11 matched against trajectory p’1 … p’3 with timestamps t’1=1, t’2=3, t’3=4; time threshold = 2]
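A sketch of LCSS with an added temporal threshold. It assumes samples of the form ((x, y), t), Euclidean closeness within `eps`, and a time difference of at most `delta`; the names are hypothetical and the matching rule is a simplification of the cited paper.

```python
import math

def st_lcss(t1, t2, eps, delta):
    """LCSS where two samples match only if they are close in space (eps)
    and in time (delta)."""
    n, m = len(t1), len(t2)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (p, tp), (q, tq) = t1[i - 1], t2[j - 1]
            if math.dist(p, q) <= eps and abs(tp - tq) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

With the timestamps from the figure and made-up positions, a pair that is spatially close but 7 time units apart (t5=11 versus t’3=4) is rejected when the time threshold is 2.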
SED
• Synchronous Euclidean Distance
  – Euclidean distance between the locations of two trajectories at the same time instance
• Regards a trajectory as a continuous function of time
Mirco Nanni, Dino Pedreschi, Time-focused clustering of trajectories of moving objects. Journal of Intelligent Information Systems (2006)
POTAMIAS, M., PATROUMPAS, K., AND SELLIS, T. K. Sampling trajectory streams with spatiotemporal criteria. SSDBM 2006
SED
[Figure: two trajectories sampled at different timestamps; a sample point is virtually created at t=3 so that both can be compared at the same time instance]
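The "virtual sample point" can be sketched with linear interpolation between consecutive samples, which is an assumption here (the slide does not state how the point is created). Trajectories are lists of ((x, y), t) samples sorted by time; the function names are hypothetical.

```python
import math

def position_at(traj, t):
    """Linearly interpolated position of a trajectory at time t."""
    if t < traj[0][1] or t > traj[-1][1]:
        raise ValueError("t lies outside the trajectory's time span")
    for ((x0, y0), t0), ((x1, y1), t1) in zip(traj, traj[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return (x0, y0)
            r = (t - t0) / (t1 - t0)
            return (x0 + r * (x1 - x0), y0 + r * (y1 - y0))
    return traj[0][0]  # single-sample trajectory queried at its own timestamp

def sed(traj1, traj2, t):
    """Synchronous Euclidean Distance between two trajectories at time t."""
    (x1, y1), (x2, y2) = position_at(traj1, t), position_at(traj2, t)
    return math.hypot(x1 - x2, y1 - y2)
```

A trajectory-level distance can then be obtained by aggregating `sed` over a set of common time instances, for example its sum or average.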
Trajectory index
• Similarity measures define how to calculate the distance between two trajectories
• However …
  – There is a huge number of trajectories
  – A linear scan is inefficient
Trajectory index
• 3D R-tree
• STR-tree (Spatio-temporal R-tree)
• TB-tree (Trajectory Bundle)
• Multi-version R-tree
  – Partition the temporal dimension
• Grid-based index
  – Partition the spatial dimension
3D R-tree
• Indexing position samples only
  – Cannot answer queries about movements in-between those samples
• Indexing line segments
[Figure: trajectory line segments in a 3D space with axes x, y and time]
Problems of the R-tree
• Large “dead space”
  – An MBB (minimum bounding box) covers a large portion of the space with no data
  – Low pruning power
• Trajectory preservation
  – Line segments are grouped merely by spatial proximity
  – Regardless of which trajectory they belong to
  – Retrieving a trajectory requires visiting different paths in the tree
Augmented 3D R-tree
• Augment leaf nodes with orientation information
• The distance to line segments can be approximated more accurately
[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis, Novel approaches to the indexing of moving object trajectories. VLDB, 2000
STR-tree
• Extension of the augmented 3D R-tree
• Insertion
  – Try to keep the line segments belonging to the same trajectory together
  – Find the leaf node containing the predecessor
• Node split
  – Put disconnected segments into a new node
  – Put the most recent backward-connected segment into a new node
[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis, Novel approaches to the indexing of moving object trajectories. VLDB, 2000
TB-tree (Trajectory Bundle)
• A leaf node only contains segments belonging to the same trajectory
• Leaf nodes containing the same trajectory are linked
• Strictly preserves trajectories
[Pfoser2000] Dieter Pfoser, Christian S. Jensen, Yannis Theodoridis, Novel approaches to the indexing of moving object trajectories. VLDB, 2000
Reflection
• The previous indexes treat the spatial and temporal dimensions equally
  – 3D tree structure
• In a trajectory database, the temporal dimension is more dynamic
  – New segments are appended to existing trajectories
  – Archived trajectories are rarely updated
• Indexing the spatial and temporal dimensions separately
  – Partition the temporal dimension first
  – Partition the spatial dimension first
Multi-version R-tree
• HR-tree
  – For each timestamp, an R-tree is created. So, there are many R-trees. These R-trees are indexed.
• Query for trajectories in a given region and in a given time interval:
  1. The R-tree at the timestamp is found first
  2. The trajectories in the specified region are retrieved from the R-tree
[Nascimento1998] Nascimento, M., Silva, J. Towards Historical R-trees. ACM SAC, 1998
[Tao2001a] Tao, Y., Papadias, D. Efficient Historical R-trees. SSDBM, 2001
[Xu2005] Xu, X., Han, J., Lu, W. RT-tree: An improved R-tree indexing structure for temporal spatial databases. Int. Symp. on Spatial Data Handling, 2005
[Tao2001b] Tao, Y., Papadias, D. MV3R-tree: A spatio-temporal access method for timestamp and interval queries. VLDB, pp. 431-440, 2001
Grid-based index
• Partition space into non-overlapping cells
• Trajectory segments in each cell are indexed on the temporal dimension
• Query processing:
  – Spatial filtering
  – Temporal filtering
[Prasad2003] V. Prasad Chakka, Adam C. Everspaugh, Jignesh M. Patel, Indexing Large Trajectory Data Sets With SETI, CIDR 2003
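The filter-then-refine idea can be sketched with a toy grid. This is not the SETI implementation: it indexes individual samples rather than segments, keeps a plain list per cell instead of a temporal index, and the class name `GridIndex` is hypothetical.

```python
from collections import defaultdict

class GridIndex:
    """Uniform grid over space; each cell stores (t, traj_id, x, y) entries."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, traj_id, x, y, t):
        self.cells[self._cell(x, y)].append((t, traj_id, x, y))

    def query(self, region, interval):
        """Ids of trajectories with a sample in region during interval."""
        xmin, ymin, xmax, ymax = region
        ts, te = interval
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        result = set()
        for cx in range(cx0, cx1 + 1):          # spatial filtering: only
            for cy in range(cy0, cy1 + 1):      # cells overlapping the region
                for t, tid, x, y in self.cells.get((cx, cy), ()):
                    # temporal filtering plus exact spatial refinement
                    if ts <= t <= te and xmin <= x <= xmax and ymin <= y <= ymax:
                        result.add(tid)
        return result
```

A real system would replace the per-cell list with a structure ordered on time so that the temporal filter does not scan every entry of a cell.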
Popular route mining
• The shortest path may not be the most favourable one
• Find the most popular/desirable path using the GPS trajectories of past travellers
• Classification
  – No specific source/destination: hot route discovery
  – Specific source/destination: popular route search
T-pattern
• A set of individual trajectories that share the property of visiting the same sequence of places with similar travel times
  1. Spatial discretization: discretize space into a finite set of regions of interest (RoI)
  2. Translate trajectories into sequences of RoIs
  3. Adapt sequential pattern mining algorithms with a time constraint
F. Giannotti, M. Nanni, F. Pinelli, and D. Pedreschi. Trajectory pattern mining. In SIGKDD, pages 330–339, 2007
Periodic pattern
• Find the repeating patterns in an individual’s trajectory
  1. Pre-defined and synchronized timestamps
  2. Adaptive regions: density-based clusters
  3. Trajectories are translated into sequences of regions at pre-defined time instances
N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. W. Cheung. Mining, indexing, and querying historical spatiotemporal data. In SIGKDD, pages 236–245, 2004.
Interesting travel sequence
• A sequence of interesting locations that is travelled by experienced drivers frequently
• Interestingness of a location?
  – How many experienced travellers visited it
• Experience of a traveller?
  – How many interesting locations she has visited
• The mutual reinforcement is modelled with Hyperlink-Induced Topic Search (HITS)
Y. Zheng, L. Zhang, X. Xie, and W.-Y. Ma. Mining interesting locations and travel sequences from gps trajectories. In WWW, pages 791–800, 2009.
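The mutual reinforcement between travellers and locations can be sketched with a basic HITS iteration in which travellers play the role of hubs and locations the role of authorities. This is a simplified illustration, not the model of the cited paper: the input format (a dict from traveller to the set of visited locations) and the function name are assumptions.

```python
import math

def hits(visits, iterations=20):
    """Return (interest per location, experience per traveller)."""
    locations = {loc for locs in visits.values() for loc in locs}
    experience = {u: 1.0 for u in visits}
    interest = {loc: 1.0 for loc in locations}
    for _ in range(iterations):
        # a location is interesting if experienced travellers visited it
        interest = {loc: sum(experience[u] for u, locs in visits.items() if loc in locs)
                    for loc in locations}
        norm = math.sqrt(sum(v * v for v in interest.values())) or 1.0
        interest = {loc: v / norm for loc, v in interest.items()}
        # a traveller is experienced if she visited interesting locations
        experience = {u: sum(interest[loc] for loc in locs) for u, locs in visits.items()}
        norm = math.sqrt(sum(v * v for v in experience.values())) or 1.0
        experience = {u: v / norm for u, v in experience.items()}
    return interest, experience
```

Normalizing after each step keeps the scores bounded; only their relative order matters for ranking locations and travellers.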
Popular route search
• Given a source and a destination, find/estimate the most popular route in between
• Ideally, we can just count the number of trajectories on the different paths connecting the two locations
Z. Chen, H. T. Shen, and X. Zhou. Discovering popular routes from trajectories. In ICDE, pages 900–911, 2011
Popular route discovery
• In real scenarios, it’s not easy to find such well-divided groups
• Even worse, there may be no trajectory connecting the two locations at all!
• Construct a transfer network from the raw trajectories as an intermediate result to capture the moving behaviors between locations
  – Node: cluster of turning points
  – Edge: trajectories passing two nodes
• Transfer probability
• Find the route with the highest joint transfer probability w.r.t. the destination
Find popular route from uncertain trajectories
• Low-sampling-rate trajectories have a high degree of uncertainty
• Uncertain trajectories are prevalent!
• Is it possible to recover the original route given a low-sampling-rate trajectory?
• What if…
  – There is a historical set of uncertain trajectories
Kai Zheng, Yu Zheng, Xing Xie and Xiaofang Zhou. Reducing Uncertainty of Low-Sampling-Rate Trajectories. ICDE 2012
Find popular route from uncertain trajectories
• Can we find popular routes for a specific source/destination using low-sampling-rate trajectories?
• Infrequent samples on the same path can reinforce each other, and they collectively form a more ‘dense’ trajectory
Find popular route from uncertain trajectories
• Find the popular route (PR) for a given sequence of locations
• Two-phase approach
  – Local PR construction
  – Global PR search
Find popular route from uncertain trajectories without road networks
• A road network is not always available or applicable
  1. Discretize space into disjoint cells
  2. Derive the transfer graph using cells
  3. Infer frequent ‘virtual edges’ on the graph
L.-Y. Wei, Y. Zheng, and W.-C. Peng. Constructing popular routes from uncertain trajectories. In ACM SIGKDD, pages 195–203, 2012.
Time period-based most frequent path (TPMFP)
• Find the most frequent path (MFP) for a specific source/destination and time period
• Desired properties of an MFP
  – Suffix optimal: a suffix of an MFP is also an MFP
  – Length insensitive: an MFP shouldn’t favor long or short routes
  – Bottleneck free: an MFP shouldn’t contain infrequent edges
Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni. Finding Time Period-Based Most Frequent Path in Big Trajectory Data. SIGMOD 2013
Time period-based most frequent path (TPMFP)
• Footmark graph
  – A weighted sub-graph
  – Edge frequency: number of trajectories passing the edge that reach the destination vd during time period T
  – Path frequency: non-decreasingly sorted sequence of edge frequencies
• Example edge frequencies
  – v1 → v2: 14, v2 → v3: 10, v3 → v12: 10, v2 → v12: 8
• Example path frequencies
  – v1 → v2 → v12: (8, 14)
  – v1 → v2 → v3 → v12: (10, 10, 14)
  – v1 → v10 → v11 → v12: (1, 21, 21)
Time period-based most frequent path (TPMFP)
• Compare path frequencies
  – More-frequent-than relation (>)
  – F > F’ if, at the first position j where they differ, fj > fj’
  – (10, 10, 14) > (8, 14) > (1, 21, 21)
• Properties
  – It is a total order, so an MFP always exists
  – Guarantees suffix optimality
  – The length of a path doesn’t matter much
  – Paths containing infrequent edges are disadvantaged
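The comparison rule can be sketched as follows, using the edge frequencies from the footmark graph example. The function names are hypothetical, and the tie-break used when one sequence is a prefix of the other (the shorter one wins) is an assumption, since the slides do not cover that case.

```python
def path_frequency(path, edge_freq):
    """Non-decreasingly sorted tuple of the edge frequencies along a path."""
    return tuple(sorted(edge_freq[(u, v)] for u, v in zip(path, path[1:])))

def more_frequent(f, g):
    """True if path frequency f is more frequent than g."""
    for a, b in zip(f, g):
        if a != b:
            return a > b  # decided at the first position where they differ
    # Assumption (not stated on the slides): if one is a prefix of the
    # other, the shorter sequence is treated as more frequent.
    return len(f) < len(g)

edge_freq = {("v1", "v2"): 14, ("v2", "v3"): 10, ("v3", "v12"): 10, ("v2", "v12"): 8}
short_path = path_frequency(["v1", "v2", "v12"], edge_freq)
long_path = path_frequency(["v1", "v2", "v3", "v12"], edge_freq)
```

Sorting first means the comparison starts at the weakest edge of each path, so a path with one rarely used edge loses even if its other edges are very frequent.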
PR search is more challenging
• Specific source/destination (and time constraint)
  – Data are sparse
  – How to utilize more relevant data?
• Online performance
  – Efficiency is critical
Summary
• Background
  – What is a trajectory
  – Where do they come from
  – Why are they useful
  – Characteristics
• Trajectory similarity search
  – Query classification
  – Trajectory similarity measures
  – Trajectory index
• Similarity-based trajectory mining
  – Popular route mining
  – Co-traveller discovery
  – Trajectory clustering
Thank you
• Questions
  – [email protected]
• DKE group at UQ
  – http://www.itee.uq.edu.au/dke/dke-lab
• My research
  – http://staff.itee.uq.edu.au/kevinz/