Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
From Trajectories to Activities: A Spatio-Temporal Join Approach*
Kexin Xie, Ke Deng, Xiaofang Zhou
School of Information Technology & Electrical EngineeringUniversity of Queensland
Brisbane, Australia{kexin, dengke, zxf}@itee.uq.edu.au
University
Cafe
Restaurant A
Petrol Shop
Fast Food
SupermarketRestaurant B
ParkT_1 T_2
T_3
9:30 17:3018:00
18:4519:00
20:00
T_4
T_5
T_6
T_7
T_8
17:40
17:50
17:55
* This work is partially supported by Microsoft Research Asia Internet Services Theme Research Program
Activity Sequences
• What do we do each day?
• E.g.
‣ Drink after work?
‣ Drunk after semester ends?
‣ Jogging for 1 hour every Saturday?
‣ Watching movie after 4 hours of shopping?
‣ Only have one meal everyday during semester breaks?
‣ Sleep when the sun rises?
Data Mining with Activity Sequences
• Technique => Knowledge
‣ Frequent Sequential Pattern => Common Behavior (e.g. Collaborative promotions)
‣ Periodic Pattern => Periodic Behavior (e.g. POI suggestion)
‣ Outlier Detection => Odd Behavior (e.g. Criminal Investigation)
‣ Clustering => Similar Behavior (e.g. Friend suggestion)
• The Problem?
Spend $80 or more at Gardencity,
Get 20% Discount for
movie tickets
Data Mining with Activity Sequences
• Technique => Knowledge
‣ Frequent Sequential Pattern => Common Behavior (e.g. Collaborative promotions)
‣ Periodic Pattern => Periodic Behavior (e.g. POI suggestion)
‣ Outlier Detection => Odd Behavior (e.g. Criminal Investigation)
‣ Clustering => Similar Behavior (e.g. Friend suggestion)
• The Problem?
‣ Facebook vs Twitter
Spend $80 or more at Gardencity,
Get 20% Discount for
movie tickets
Data Mining with Activity Sequences
• Technique => Knowledge
‣ Frequent Sequential Pattern => Common Behavior (e.g. Collaborative promotions)
‣ Periodic Pattern => Periodic Behavior (e.g. POI suggestion)
‣ Outlier Detection => Odd Behavior (e.g. Criminal Investigation)
‣ Clustering => Similar Behavior (e.g. Friend suggestion)
• The Problem?
‣ Facebook vs Twitter
‣ How to obtain the activity sequences?
Spend $80 or more at Gardencity,
Get 20% Discount for
movie tickets
Travel Sequences
• Guess activity from travel sequences
‣ From office to a bar every workday.
‣ Stay at night club for 5 hours twice a year.
‣ Moving slightly faster than working speed in the park every Saturday.
‣ Moving slowly in shopping center for 4 hours then stays in a cinema.
• Where can we get such travel sequences?
‣ GPS-enabled mobile devices
From Trajectory to Sequence of Activities
• Problem
‣ Given a set of trajectories and a set of POIs, find the sequence of activities that might be performed during those set of trajectories
• Rationale
‣ If the user stays at a POI for long enough time, them some activity may take place.
• Question
‣ Which POIs did the user stay?
‣ What activities the user performed?
An Influence Model
• Given a set of POIs P, a segment of trajectory T is influenced by a p ∈ P if for every point pt on T, there does not exists a point p’ ∈ P, dist(pt, p’) < dist(pt, p).
• The influence duration of a POI p given a segment of trajectory T influenced by p, is the timestamp difference of T.
Influence Example
University
Cafe
Restaurant A
Petrol Shop
Fast Food
SupermarketRestaurant B
ParkT_1 T_2
T_3
9:30 17:3018:00
18:4519:00
20:00
T_4
T_5
T_6
T_7
T_8
17:40
17:50
17:55
Study@University Shopping@Supermarket Dining@Restaurant_B
Trajectories to Activities using Influence Duration• How long is long enough?
‣ Different POI have different requirement
‣ e.g. Fast food restaurant vs Luxury restaurant
• Which activity took place?
‣ Different activity requires different time length
‣ e.g. Working in restaurant vs Dining in restaurant
POI-Activity Mapping Set
A POI-Activity Mapping Set (PAMS) is a set of quadruple M = {(p, e, tmin, tmax)}, where p is some POI, e is some activity, and tmin, tmax are the minimum and maximum elapsed time for user activity e happened at p.
Assign User Activities
• Influence duration are checked agains tmin and tmax in PAMS
• Multiple activities may happen.
E.g. Given a subtrajectory Ts influenced by POI pi with influence duration 50, the PAMS is given as follow:
The possible activity are:
• a and b
• b
• c
Trajectory Semantic Join
• Input: A set of trajectories ST, A set of POIs P and a PAMS M
• Output: All subtrajectory-activity doubles (Ts, e), such that e may happen at Ts
( )
(T_1, (Studying@University ORWorking@University)), (T_2, Shopping@Supermarket), (T_3, Dining@Restaurant)
University
Cafe
Restaurant A
Petrol Shop
Fast Food
SupermarketRestaurant B
ParkT_1 T_2
T_3
9:30 17:3018:00
18:4519:00
20:00
T_4
T_5
T_6
T_7
T_8
17:40
17:50
17:55
Filter and Refine Approach
• Filter: Trajectory-POI join - find all subtrajectory-POI pairs that activities may took place
• Refine: assign possible events to these subtrajectory-POI pairs
University
Cafe
Restaurant A
Petrol Shop
Fast Food
SupermarketRestaurant B
ParkT_1 T_2
T_3
9:30 17:3018:00
18:4519:00
20:00
T_4
T_5
T_6
T_7
T_8
17:40
17:50
17:55
University
Cafe
Restaurant A
Petrol Shop
Fast Food
SupermarketRestaurant B
ParkT_1 T_2
T_3
9:30 17:3018:00
18:45
19:00
20:00
T_4
T_5
T_6
T_7
T_8
17:40
17:50
17:55
Trajectory Semantic Join using Voronoi Diagram
• Voronoi Diagram
‣ A Voronoi diagram of a set of points P is the subdivision of the plain into n cells, one of each site in P, with the property that a point q lies in the cell corresponding to a site pi if and only if dist(p, pi) < dist (q, pj) for each pj ∈ P, with j ≠ i.
• Computes Voronoi Diagram
‣ Fortune’s Algorithm to Compute Voronoi diagram in O(n log n) time.
(V0, t0)
(v1, t1)
(v2, t2)(v3, t3)
(v4, t4)
x1
x2
x3
P1
P2
P3
P4
P5
P6
P7
Trajectory Semantic Join using Voronoi Diagram
• Voronoi Diagram
‣ A Voronoi diagram of a set of points P is the subdivision of the plain into n cells, one of each site in P, with the property that a point q lies in the cell corresponding to a site pi if and only if dist(p, pi) < dist (q, pj) for each pj ∈ P, with j ≠ i.
• Computes Voronoi Diagram
‣ Fortune’s Algorithm to Compute Voronoi diagram in O(n log n) time.
Trajectory POI Join Algorithm
TP Join (Compute Line Intersections)
• Plane-sweep for Line Intersections
‣ Line segment becomes active when intersect with the sweep line
‣ Only “active” line segments can intersect with each other
‣ Perform intersection
check between “active”
line segments
TP Join - Organize and Output
• Observation: Given a trajectory intersects Voronoi edges on a sequence of points (p0, p1, ..., pn), where ∀0≤i<n timestamp(pi, T) ≤ timestamp(pi+1, T). All consecutive points pi and pi+1 (0≤i<n) have at least one common influenced POI influencing POI pc, and the subtrajectory Ti of T, defined by pi and pi+1 is influenced by pc.
(V0, t0)
(v1, t1)
(v2, t2)(v3, t3)
(v4, t4)
x1
x2
x3
P1
P2
P3
P4
P5
P6
P7
• Temporally sorted intersection points enclosed subtrajectories are fully influenced by some POI.
Partition Based Trajectory Semantic Join (PTSJ)
• Problem with TP Join
‣ Memory based techniques
‣ Need sort Voronoi edges and segment of trajectories globally
• Partition Based TSVID Join (PTSJ)
‣ A modified algorithm from PBSM (Patel SIGMOD 96)
‣ Partition the space into smaller size grids that spatial object can be fit into memory, insert the Key-Pointer-Element (KPE) of spatial objects into each partition if their MBR overlap with the partition.
‣ Estimate number of partitions:
‣ Perform in memory technique for each partition.
‣ Repartition may be needed if not enough memory.
Partition Based Trajectory Semantic Join (PTSJ)
Partition
TP Join for each Partition
Problems with PTSJ
• Trajectories and Voronoi cells may be inserted into multiple partition entries.
• Many duplicated entries
Reuse duplicated entries
• Duplication Reuse
‣ Each KPE stores the partitions the spatial object is inserted in, when retrieve the spatial objects of the next partition, the spatial objects with duplicated KPE do not need to be loaded in to memory (save I/O cost).
‣ Order of performing computations for the partitions are important.
Number of duplicated entries
• If KPEo is duplicated with partition P and one of its opposite partitions Popp, then it must have duplicated entires in one of its adjacent partitions.
• If KPEo is duplicated with partition P and one of its disjoint partitions Pdist, then it must have duplicated entires in one of its adjacent partitions.
• Hence Ndup(P, adj(P)) ≥ Ndup(P, opp(P)) ≥ Ndup(P, disj(P))
• Hillbert order always traverse adjacent partitions one after the other.
Adjacent
Adjacent
Adjacent
Adjacent
Opposite
Opposite Opposite
Opposite
DisjointDisjointDisjointDisjoint
Disjoint
Disjoint
Disjoint
0
1 2
3 4 5
67
8 9
101112
1314
15
PTSJ+ - PTSJ with duplication reuse
• Insert KPEs of each spatial object in to each partition.
• Each KPEs include the IDs of all partitions it is inserted into.
• Partitions need to be sorted in Hillbert order.
• Do TP-Join in for each partition in Hillbert order
• Keep track of shared spatial objects
• Avoid reloading already loaded spatial objects
Experiments
• Experimental Setting
‣ POI: 6487 POIs cropped from 105725 California POIs
‣ Trajectories: simulated from California road networks
‣ Time consumed by building index, retrieving spatial objects (I/O) is measured.
Experimental Result
0
20
40
60
80
100
120
0 4000 8000 12000
Ela
psed T
ime (
sec)
Number of Trajectories
PTSJPTSJ+
0
20
40
60
80
100
120
0 5 10 15
Ela
psed T
ime (
sec)
Trajectory Length (Hours)
PTSJ+PTSJ
Conclusion
• Trajectory and Semantic Issues:
‣ Preprocessing for Data Mining
‣ Understand User Behaviours
• Future works
‣ Further Optimisations
‣ Privacy Issues
Questions?