GDC: Group Discovery using Co-location Traces
Steve Mardenfeld, Daniel Boston, Susan Juan Pan, Quentin Jones†, Adriana Iamnitchi‡, Cristian Borcea
Department of Computer Science, New Jersey Institute of Technology
†Department of Information Systems, NJIT
‡Department of Computer Science, USF
Physical Groups
Informally: groups of people that meet face to face
◦ Formal definition: Homans’ sociology book “The Human Group”
Groups can be used in social or socially aware applications
◦ Recommender systems: recommend concerts to people who go to concerts together
◦ Data forwarding in delay-tolerant ad hoc networks: give priority to members of the same group as the destination when selecting the next hop
How can groups be detected automatically?
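The forwarding idea in the last bullet can be sketched as a simple next-hop rule. This is a minimal illustration, not the forwarding scheme of any particular system: the function name `choose_next_hop` and the `groups` map (from each user to the users who share a group with them) are hypothetical, and returning `None` models a carrier that keeps the message until a better contact appears.

```python
from typing import Dict, Iterable, Optional, Set

def choose_next_hop(neighbors: Iterable[str], destination: str,
                    groups: Dict[str, Set[str]]) -> Optional[str]:
    """Pick the next hop among current neighbors (hypothetical rule)."""
    neighbors = list(neighbors)
    # Deliver directly if the destination is in range.
    if destination in neighbors:
        return destination
    # Otherwise prefer a neighbor that shares a group with the destination.
    same_group = groups.get(destination, set())
    for n in neighbors:
        if n in same_group:
            return n
    # No suitable neighbor: keep carrying the message.
    return None

groups = {"dst": {"alice", "bob"}}
print(choose_next_hop(["carol", "bob"], "dst", groups))  # bob
```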
Group Detection Using Location Traces
Users carry mobile phones and upload their location to a central server
The server analyzes the location traces to detect groups
In previous work, we developed an algorithm for group/place detection
◦ Achieved 96% accuracy with low false positives
Problems:
◦ Location privacy
◦ Battery power
GDC: Use Bluetooth Co-location Traces
Advantages
◦ Improved location privacy
◦ Low power consumption
◦ Practicality due to the ubiquity of Bluetooth in mobile phones
◦ Accuracy due to the short Bluetooth transmission range
[Figure: phones A, B, and C log Bluetooth sightings and upload them to the server over the Internet]
User  Seen  Time
A     B     1:00
B     A     1:05
A     B     1:07
B     C     1:05
A     C     1:07
Challenges
Attendance at a group is variable
People may be merely passing near a group, not remaining part of it
Group members spend different lengths of time with the group
Sampling frequency and user mobility can affect data completeness
Each user may have a different perspective on the same meeting
Outline
GDC Algorithm
User Study Results
Distributed GDC
Conclusions
GDC in a Nutshell
Transform raw Bluetooth records into meeting records between pairs of users
Discover and record all combinations of users appearing at the same meeting (user clusters)
Resolve differences in user perspectives on shared clusters
Select all significant clusters and output them as user groups
Creating Pair-wise Meeting Records
Raw Bluetooth records logged by each phone:

Time Stamp  User   User With
11:02:01    djb38  jp238
11:02:01    djb38  mak43
11:04:14    djb38  jp238
11:04:14    djb38  mak43
11:07:05    djb38  mak43

Time Stamp  User   User With
11:02:15    jp238  djb38
11:02:15    jp238  mak43
11:05:02    jp238  mak43
11:07:50    jp238  djb38
11:07:50    jp238  mak43

Time Stamp  User   User With
11:01:30    mak43  jp238
11:01:30    mak43  djb38
11:04:18    mak43  jp238
11:10:10    mak43  jp238
Records merged per user (the user's own sightings plus other phones' sightings of that user):

User mak43
Time Stamp  User With
11:01:30    jp238
11:01:30    djb38
11:02:01    djb38
11:02:15    jp238
11:04:14    djb38
11:04:18    jp238
11:05:02    jp238
11:07:05    djb38
11:07:50    jp238
11:10:10    jp238

User djb38
Time Stamp  User With
11:01:30    mak43
11:02:01    jp238
11:02:01    mak43
11:02:15    jp238
11:04:14    jp238
11:04:14    mak43
11:07:05    mak43
11:07:50    jp238

User jp238
Time Stamp  User With
11:01:30    mak43
11:02:01    djb38
11:02:15    djb38
11:02:15    mak43
11:04:14    djb38
11:04:18    mak43
11:05:02    mak43
11:07:50    djb38
11:07:50    mak43
11:10:10    mak43
Pair-wise meeting records, Meeting Granularity (MG) = 5 min:

User mak43
User With  Start Time  End Time
jp238      11:01:30    11:10:10
djb38      11:01:30    11:07:05

User djb38
User With  Start Time  End Time
jp238      11:02:01    11:07:50
mak43      11:01:30    11:07:05

User jp238
User With  Start Time  End Time
mak43      11:01:30    11:10:10
djb38      11:02:01    11:07:05
Pair-wise meeting records, MG = 2 ½ min:

User mak43
User With  Start Time  End Time
jp238      11:01:30    11:04:18
jp238      11:07:50    11:10:10
djb38      11:01:30    11:04:14

User djb38
User With  Start Time  End Time
jp238      11:02:01    11:04:14
mak43      11:01:30    11:04:14

User jp238
User With  Start Time  End Time
mak43      11:01:30    11:05:02
mak43      11:07:50    11:10:10
djb38      11:02:01    11:04:14

Decreasing the Meeting Granularity (MG) from 5 min to 2 ½ min produces noticeable changes: meetings become shorter, and the meeting between mak43 and jp238 splits into two records.
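The transformation from raw sightings to pair-wise meeting records can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact GDC rule: sightings of the same partner are chained while consecutive sightings are at most MG apart, and an isolated single sighting is dropped as someone merely passing by. Under these assumptions it reproduces the MG = 5 min records for mak43 and the MG = 2 ½ min records for djb38 shown above. The names `meeting_records` and `Record` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Record = Tuple[str, str, str]  # (time stamp "HH:MM:SS", user, user seen)

def _parse(ts: str) -> datetime:
    # Assumes all records fall on the same day, as in the example.
    return datetime.strptime(ts, "%H:%M:%S")

def meeting_records(records: List[Record], user: str,
                    mg: timedelta) -> Dict[str, List[Tuple[str, str]]]:
    """Pair-wise meeting records of `user` for meeting granularity `mg`."""
    # Merge the user's own sightings with other phones' sightings of the user.
    sightings: Dict[str, List[datetime]] = defaultdict(list)
    for ts, observer, seen in records:
        if observer == user:
            sightings[seen].append(_parse(ts))
        elif seen == user:
            sightings[observer].append(_parse(ts))

    meetings: Dict[str, List[Tuple[str, str]]] = {}
    for partner, times in sightings.items():
        times.sort()
        # Chain sightings at most `mg` apart into runs of [start, end, count].
        runs = [[times[0], times[0], 1]]
        for t in times[1:]:
            if t - runs[-1][1] <= mg:
                runs[-1][1] = t
                runs[-1][2] += 1
            else:
                runs.append([t, t, 1])
        # Assumption: an isolated single sighting is someone passing by.
        kept = [(s.strftime("%H:%M:%S"), e.strftime("%H:%M:%S"))
                for s, e, n in runs if n >= 2]
        if kept:
            meetings[partner] = kept
    return meetings
```

Feeding it the three raw logs and calling `meeting_records(records, "mak43", timedelta(minutes=5))` yields one record per partner, matching the first mak43 table.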
Creating User Clusters

User mak43
User With  Start Time  End Time
jp238      11:01:30    11:10:10
djb38      11:01:30    11:07:05

User djb38
User With  Start Time  End Time
jp238      11:02:01    11:07:50
mak43      11:01:30    11:07:05

User jp238
User With  Start Time  End Time
mak43      11:01:30    11:10:10
djb38      11:02:01    11:07:05

User mak43
Users With    Time Spent
jp238, djb38  00:05:35
jp238         00:08:40
djb38         00:05:35

User djb38
Users With    Time Spent
jp238, mak43  00:05:04
jp238         00:05:49
mak43         00:05:35

User jp238
Users With    Time Spent
djb38, mak43  00:05:04
djb38         00:05:04
mak43         00:08:40
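Building one user's clusters amounts to intersecting the meeting intervals of every combination of that user's partners and summing the overlap. The sketch below assumes each partner's intervals are disjoint and uses the hypothetical names `user_clusters` and `_intersect`; enumerating all partner subsets makes the cost exponential in the number of co-located partners.

```python
from datetime import datetime
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[datetime, datetime]

def _intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Overlaps between two lists of disjoint intervals."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            start, end = max(s1, s2), min(e1, e2)
            if start < end:
                out.append((start, end))
    return out

def user_clusters(meetings: Dict[str, List[Interval]]) -> Dict[FrozenSet[str], int]:
    """Seconds during which all partners in each subset were present together."""
    partners = sorted(meetings)
    clusters: Dict[FrozenSet[str], int] = {}
    # Enumerate every non-empty subset of partners.
    for k in range(1, len(partners) + 1):
        for subset in combinations(partners, k):
            common = meetings[subset[0]]
            for p in subset[1:]:
                common = _intersect(common, meetings[p])
            seconds = int(sum((e - s).total_seconds() for s, e in common))
            if seconds > 0:
                clusters[frozenset(subset)] = seconds
    return clusters

def at(hms: str) -> datetime:
    return datetime.strptime(hms, "%H:%M:%S")

mak43 = {"jp238": [(at("11:01:30"), at("11:10:10"))],
         "djb38": [(at("11:01:30"), at("11:07:05"))]}
clusters = user_clusters(mak43)
```

For mak43 this gives 335 s (00:05:35) for {jp238, djb38}, 520 s (00:08:40) for {jp238}, and 335 s for {djb38}, matching the table.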
Creating Global Clusters
Resolve perspective differences
◦ Use Minimum Group Time (MGT)
◦ Use Minimum Group Meeting Frequency (MGMF)

User mak43
Users With    Time Spent
jp238, djb38  00:05:35
jp238         00:08:40
djb38         00:05:35

User djb38
Users With    Time Spent
jp238, mak43  00:05:04
jp238         00:05:49
mak43         00:05:35

User jp238
Users With    Time Spent
djb38, mak43  00:05:04
djb38         00:05:04
mak43         00:08:40

Each global cluster keeps the minimum time and frequency reported across its members' perspectives:

Cluster              Minimum Time  Min. Frequency
djb38, jp238, mak43  00:05:04      1
djb38, mak43         00:05:35      1
djb38, jp238         00:05:04      1
jp238, mak43         00:08:40      1
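Resolving perspective differences can be sketched as taking, for each cluster, the minimum time and frequency over its members' own views and then applying the MGT and MGMF thresholds. The requirement that every member must report the cluster, and the names `global_clusters`, `Stats`, and `UserClusters`, are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Stats = Tuple[int, int]  # (seconds together, number of meetings)
UserClusters = Dict[str, Dict[FrozenSet[str], Stats]]  # owner -> partners -> stats

def global_clusters(per_user: UserClusters, mgt: int = 0,
                    mgmf: int = 1) -> Dict[FrozenSet[str], Stats]:
    """Merge per-user views into global clusters, keeping minimum values."""
    views: Dict[FrozenSet[str], List[Tuple[str, Stats]]] = defaultdict(list)
    for owner, clusters in per_user.items():
        for partners, stats in clusters.items():
            # A user's cluster plus the user is the global member set.
            views[partners | {owner}].append((owner, stats))

    result: Dict[FrozenSet[str], Stats] = {}
    for members, reports in views.items():
        # Assumption: every member must see the cluster from its own perspective.
        if {owner for owner, _ in reports} != members:
            continue
        min_time = min(stats[0] for _, stats in reports)
        min_freq = min(stats[1] for _, stats in reports)
        # Apply Minimum Group Time (MGT) and Minimum Group Meeting Frequency (MGMF).
        if min_time >= mgt and min_freq >= mgmf:
            result[members] = (min_time, min_freq)
    return result
```

With the three per-user tables above (in seconds, one meeting each) and no thresholds, it returns the four clusters in the table: 304 s, 335 s, 304 s, and 520 s.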
Selecting the User Groups
Identify and remove subgroups of significant groups
◦ Keep a subgroup only if it meets for at least double the time of the group that includes it

Cluster              Minimum Time
djb38, jp238, mak43  00:05:04
djb38, mak43         00:05:35
jp238, mak43         00:10:40

Group                Min. Time
djb38, jp238, mak43  00:05:04
jp238, mak43         00:10:40

(jp238, mak43 is kept because 00:10:40 ≥ 2 × 00:05:04 = 00:10:08; djb38, mak43 is removed because 00:05:35 < 00:10:08.)
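The subgroup rule can be sketched by deciding larger clusters first and keeping a subgroup only when its time is at least twice that of every already selected group containing it. Comparing against all selected supergroups (rather than one particular one) and the name `select_groups` are assumptions.

```python
from typing import Dict, FrozenSet

def select_groups(clusters: Dict[FrozenSet[str], int],
                  factor: float = 2.0) -> Dict[FrozenSet[str], int]:
    """Drop subgroups unless they meet at least `factor` times as long."""
    groups: Dict[FrozenSet[str], int] = {}
    # Decide larger clusters first, so supergroups are known before subgroups.
    for members in sorted(clusters, key=len, reverse=True):
        seconds = clusters[members]
        supergroups = [g for g in groups if members < g]
        if all(seconds >= factor * groups[g] for g in supergroups):
            groups[members] = seconds
    return groups
```

On the three clusters above (304 s, 335 s, 640 s) it keeps {djb38, jp238, mak43} and {jp238, mak43}, since 640 ≥ 2 × 304 while 335 is not.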
Complexity Analysis
R: total number of Bluetooth records
N: total number of users in the dataset
L: maximum number of users in a group
◦ Small value because relatively few users are in the transmission range (10 m)
◦ Our experiments: max = 15, avg = 6.8
Creating Pair-wise Meeting Records: $O(R)$
Creating User Clusters: $O(R \cdot 2^L)$
Creating Global Clusters: $O(N \cdot 2^L)$
Selecting the User Groups: $O(R \cdot 2^L)$
Total Complexity: $O(R \cdot 2^L)$, since $R \gg N$
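Summing the four stages makes the last step explicit; the worst-case size of the exponential factor uses the maximum L = 15 reported for the experiments.

```latex
\begin{aligned}
T &= O(R) + O(R \cdot 2^L) + O(N \cdot 2^L) + O(R \cdot 2^L) \\
  &= O\big((R + N) \cdot 2^L\big) = O(R \cdot 2^L) \quad \text{since } R \gg N, \\
2^{L_{\max}} &= 2^{15} = 32\,768 .
\end{aligned}
```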
Evaluation
Goals
◦ Analyze the effect of group meeting frequency and time
◦ Compare GDC and K-Clique (K-Clique uses a time threshold to select graph edges and analyzes the graph for k-cliques)
Experiments
◦ Collect data from mobile phones carried by 100+ volunteer students on campus for one month
◦ Run GDC and K-Clique on the collected data; also tested on Reality Mining data from MIT
◦ Ask users to rank groups on a Likert scale from 1 to 5 (5 is best)
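A K-Clique baseline in the spirit described above can be sketched as thresholding pair-wise meeting time into a graph and reporting cliques. We assume "k-cliques" means maximal cliques with at least k members, found with the Bron-Kerbosch algorithm; clique percolation is another common reading, and `k_clique_groups` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def k_clique_groups(pair_time: Dict[Tuple[str, str], int], threshold: int,
                    k: int = 3) -> List[FrozenSet[str]]:
    """Maximal cliques of size >= k in the thresholded co-location graph."""
    # Keep an edge only when the pair spent at least `threshold` seconds together.
    adj: Dict[str, Set[str]] = {}
    for (a, b), seconds in pair_time.items():
        if seconds >= threshold:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    cliques: List[FrozenSet[str]] = []

    def bron_kerbosch(r: Set[str], p: Set[str], x: Set[str]) -> None:
        # Classic Bron-Kerbosch without pivoting; r is the current clique.
        if not p and not x:
            if len(r) >= k:
                cliques.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    bron_kerbosch(set(), set(adj), set())
    return cliques
```

Because edges come from pairs alone, three users who only ever met two at a time still form a triangle, which is one way a K-Clique group can fail to correspond to an actual meeting.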
Data Collection Details
78 users each contributed less than 24 hours of recorded data
Sparse data: random volunteers; many students are commuters
Demographics: 72% male, 28% female; 25% graduate, 75% undergraduate
[Figure: histogram of the number of users by number of hours of recorded data (bins from 0 to 300 hours, plus "More"), with the cumulative percentage of users]
Effect of Meeting Time and Frequency
Detection accuracy increases significantly with meeting frequency and total meeting time
[Figure: rating frequency (Very Bad, Bad, OK, Good, Very Good) by Minimum Group Time, in bins 2000-3000, 3000-5000, 5000-7000, and >7000 s]
[Figure: rating frequency (Very Bad, Bad, OK, Good, Very Good) by Group Meeting Frequency, in bins 1, 2, 3-4, and 5 and greater]
GDC vs. K-Clique
Overall, GDC groups were rated 30% better than those of the popular K-Clique algorithm
◦ GDC groups are guaranteed to meet
◦ Not all K-Clique groups meet
Some GDC groups are rated poorly because members don’t know each other’s names
[Figure: percentage of total ratings in each rating category (Very Bad, Bad, OK, Good, Very Good) for K-Clique and GDC]
GDC: MGT = 2000 s, MGMF = 2
K-Clique: threshold = 2000 s
GDC Groups: NJIT Dataset vs. Reality Mining Dataset
Group distributions as a function of size are relatively similar, even though Reality Mining is a denser dataset
[Figure: percentage of total groups by group size (3 to 13) for the Reality Mining and NJIT datasets]
NJIT: MGT = 2000 s, MGMF = 1
Reality Mining: MGT = 18000 s, MGMF = 9 (normalized for 9 months)
Distributed GDC (D-GDC)
GDC executed on the phones
Benefits
◦ Better privacy: avoid a “Big Brother” scenario; ability to control message exchange on a per-case basis
◦ Resiliency: no bottleneck and no single point of failure
◦ Flexibility: each user controls how often to run D-GDC
D-GDC Implementation
Collect Bluetooth records locally through message exchange
◦ No global aggregation as in GDC
Control the exchange with heuristic policies
◦ These policies can be specified by users
◦ Allows greater individual privacy control
Run the remainder of GDC locally on each device
Evaluated using replay simulation over our real traces
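A user-specified exchange policy might look like the sketch below. Everything here is hypothetical: the `SharingPolicy` fields, `records_to_send`, and the choice to send a peer only the sightings of that peer (the records the peer would need to build its own merged record table). It only illustrates how per-case control over message exchange could be expressed.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Record = Tuple[str, str, str]  # (time stamp, observer, user seen)

@dataclass
class SharingPolicy:
    """Hypothetical user-specified policy for D-GDC record exchange."""
    min_encounters: int = 2                         # share only with familiar peers
    blocked: Set[str] = field(default_factory=set)  # never share with these users

    def allows(self, peer: str, local: List[Record]) -> bool:
        if peer in self.blocked:
            return False
        encounters = sum(1 for _, _, seen in local if seen == peer)
        return encounters >= self.min_encounters

def records_to_send(me: str, peer: str, local: List[Record],
                    policy: SharingPolicy) -> List[Record]:
    """Sightings of `peer` that `me` is willing to send to `peer`."""
    if not policy.allows(peer, local):
        return []
    return [r for r in local if r[1] == me and r[2] == peer]
```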
Preliminary Results
Overall similarity: compute the similarity of each user's GDC groups to their closest matches in D-GDC and average the results
Compared D-GDC with a version running only on data collected locally by the phones
◦ D-GDC performs significantly better than the local-only version

                              D-GDC    Local only
Average similarity            77.33%   58.24%
Groups with similarity > 90%  59.77%   19.14%
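The overall-similarity metric can be sketched as follows, assuming Jaccard similarity between member sets (the slides do not name the measure). `jaccard`, `similarity_scores`, and `summarize` are hypothetical names; the average of the scores corresponds to the first row of the table and the fraction of scores above 0.9 to the second.

```python
from typing import Dict, FrozenSet, List, Tuple

Group = FrozenSet[str]

def jaccard(a: Group, b: Group) -> float:
    """|a ∩ b| / |a ∪ b| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def similarity_scores(gdc: Dict[str, List[Group]],
                      dgdc: Dict[str, List[Group]]) -> List[float]:
    """Similarity of each user's GDC group to its closest D-GDC group."""
    scores = []
    for user, groups in gdc.items():
        candidates = dgdc.get(user, [])
        for g in groups:
            scores.append(max((jaccard(g, c) for c in candidates), default=0.0))
    return scores

def summarize(scores: List[float]) -> Tuple[float, float]:
    """(average similarity, fraction of groups with similarity > 0.9)."""
    if not scores:
        return 0.0, 0.0
    return sum(scores) / len(scores), sum(s > 0.9 for s in scores) / len(scores)
```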
Conclusion
Physical groups enable new socially aware features in applications
GDC: practical, high accuracy, no location collection
◦ Validated by users; outperforms K-Clique by 30%
◦ Higher accuracy can be achieved by increasing the frequency and time parameters
A decentralized version improves privacy and produces promising results
Thank You!
Mobius project: http://www.cs.njit.edu/~borcea/mobius/
Acknowledgement: NSF grants CNS-0831753 and CNS-0834585