WWW2008 Workshop on Social Web Search and Mining, April 22nd, 2008, Beijing
Time Based Context Cluster Analysis for Automatic Blog Generation
Luca Costabello and Laurent-Walter Goix, Telecom Italia, Italy

Time Based Cluster Analysis for Automatic Blog Generation

DESCRIPTION

Presented at the Social Web Search and Mining Workshop, WWW2008 in Beijing

Page 1: Time Based Cluster Analysis for Automatic Blog Generation

WWW2008 Workshop on Social Web Search and Mining
April 22nd, 2008 - Beijing

Time Based Context Cluster Analysis for Automatic Blog Generation

Luca Costabello and Laurent-Walter Goix
Telecom Italia, Italy

Page 2: Time Based Cluster Analysis for Automatic Blog Generation

Context as Blog Content

User context is gaining importance

Location info

Nearby buddies

The surrounding environment in general

We mine context data to detect daily user actions

User actions are converted into natural text

Blog posts describing the user days enable the detection of a community of users with similar behavioral patterns.

Page 3: Time Based Cluster Analysis for Automatic Blog Generation

Context-Based Blog Generation

1) Raw data gathering

2) Offline cluster analysis

3) Blog post generation

Daily actions

Page 4: Time Based Cluster Analysis for Automatic Blog Generation

System Architecture

Page 5: Time Based Cluster Analysis for Automatic Blog Generation

Cluster Analysis: Detecting User Actions

Timestamp            Cell ID                Cell ID Virtual Place
2007-10-03 11:02:33  222-1-61101-72162201   office,tilab
2007-10-03 10:59:09  222-1-61101-72162201   office,tilab
2007-10-03 10:55:46  222-1-61101-72162201   office,tilab
2007-10-03 10:52:41  222-1-61101-64530928   n/a,n/a
2007-10-03 10:48:59  222-1-61101-72162201   office,tilab
2007-10-03 10:45:34  222-1-61101-72162201   office,tilab
2007-10-03 10:42:11  222-1-61101-64530928   n/a,n/a
2007-10-03 10:38:47  222-1-61101-72162201   office,tilab
2007-10-03 10:37:47  222-1-61101-72162201   office,tilab
2007-10-03 09:27:01  222-1-61101-72157899   office,tilab
2007-10-03 08:58:11  222-1-61104-72386176   n/a,n/a
2007-10-03 08:56:28  222-1-24650-121        n/a,n/a
2007-10-03 08:56:05  222-1-24650-122        n/a,n/a
2007-10-03 08:54:20  222-1-54650-923        n/a,n/a
2007-10-03 08:51:31  222-1-61104-72395762   n/a,n/a
2007-10-03 08:49:16  222-1-61104-72384437   n/a,n/a
2007-10-03 08:48:47  222-1-61104-72395762   n/a,n/a
2007-10-03 08:48:18  222-1-61104-72384437   n/a,n/a
2007-10-03 08:47:50  222-1-61104-72395762   n/a,n/a
2007-10-03 08:47:21  222-1-61104-72395762   n/a,n/a
2007-10-03 08:46:51  222-1-61104-72384437   n/a,n/a
2007-10-03 08:46:20  222-1-61104-72376116   n/a,n/a
2007-10-03 08:45:15  222-1-61104-72395763   n/a,n/a
2007-10-03 08:44:02  222-1-61104-72400263   n/a,n/a
2007-10-03 08:42:33  222-1-61104-72395770   n/a,n/a
2007-10-03 08:42:02  222-1-61104-72400262   n/a,n/a
2007-10-03 08:40:08  222-1-24650-1281       residence,home
2007-10-03 08:36:26  222-1-24650-1281       residence,home
2007-10-03 08:33:02  222-1-24650-1281       residence,home

Cluster 1 (Static)
Start: 08:58
End: 11:02
CGI: 222-1-61101-162201
VP CGI: Office, TILab
VP Bth: Not available

Cluster 2 (Movement)
Start: 08:42
End: 08:56
CGI From: 222-1-24550-1281
CGI To: 222-1-24650-121
VP CGI From: Residence, home
VP CGI To: Office, TILab
VP Bth: Not available
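As an illustration of the record layout shown on this slide, the following minimal Python sketch parses one context-history line into a typed record. The class name `ContextRecord`, the function `parse_record`, and the reading of the virtual place column as a "type,name" pair are assumptions made for this example, not part of the presented system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ContextRecord:
    timestamp: datetime
    cell_id: str                 # full cell identifier as printed in the log
    place_type: Optional[str]    # assumed first half of the virtual place, e.g. "office"
    place_name: Optional[str]    # assumed second half of the virtual place, e.g. "tilab"


def parse_record(line: str) -> ContextRecord:
    """Parse a line such as '2007-10-03 11:02:33 222-1-61101-72162201 office,tilab'."""
    date, time, cell_id, place = line.split()
    place_type, place_name = place.split(",")

    def clean(value: str) -> Optional[str]:
        # The log prints "n/a" when no user-defined label exists for the cell.
        return None if value == "n/a" else value

    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return ContextRecord(timestamp, cell_id, clean(place_type), clean(place_name))


record = parse_record("2007-10-03 11:02:33 222-1-61101-72162201 office,tilab")
print(record.place_type, record.place_name)
```

Mapping "n/a" to `None` keeps unlabeled cells easy to detect in later processing steps.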

Page 6: Time Based Cluster Analysis for Automatic Blog Generation

Clustering Algorithms Dimensions

Location

GSM/UMTS Cell IDs

User-defined Cell ID Labels

Time

Chronological order of actions must be respected

Categorical attributes

Euclidean distance not available

Time must be evaluated according to "temporal distance"

Ad-hoc algorithms had to be designed
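To make the two dimensions concrete, here is a small Python sketch of how categorical positions and timestamps could be compared without a Euclidean metric. The helper names `temporal_distance` and `same_place`, and the rule of preferring user labels over raw cell IDs, are assumptions for illustration; the slides do not specify the actual comparison used.

```python
from datetime import datetime
from typing import Dict


def temporal_distance(a: datetime, b: datetime) -> float:
    """Absolute gap between two timestamps, in seconds."""
    return abs((a - b).total_seconds())


def same_place(cell_a: str, cell_b: str, labels: Dict[str, str]) -> bool:
    """Categorical comparison: positions are either equal or different.

    Assumption: if both cells carry a user-defined label, the labels are
    compared; otherwise the raw cell IDs are compared.
    """
    label_a, label_b = labels.get(cell_a), labels.get(cell_b)
    if label_a is not None and label_b is not None:
        return label_a == label_b
    return cell_a == cell_b


labels = {
    "222-1-61101-72162201": "office,tilab",
    "222-1-61101-72157899": "office,tilab",
}
print(same_place("222-1-61101-72162201", "222-1-61101-72157899", labels))
print(temporal_distance(datetime(2007, 10, 3, 11, 2, 33), datetime(2007, 10, 3, 10, 59, 9)))
```

Two different cells that share a label count as the same place here, which is one way user-defined labels can compensate for the lack of a numeric distance between cell IDs.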

Page 7: Time Based Cluster Analysis for Automatic Blog Generation

Cell-Based Location Data Issues

Context updates occur with variable frequency

Detecting static situations VS detecting movement

Base station concentration affects context data patterns

Frequent cell handovers during static actions
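The issues listed above can be measured directly on a context history. The following Python sketch computes the gaps between consecutive updates and counts cell handovers; the function names `update_gaps` and `count_handovers` are hypothetical, and the sample values are taken from the context log shown in this presentation.

```python
from datetime import datetime
from typing import List


def update_gaps(timestamps: List[datetime]) -> List[float]:
    """Seconds elapsed between consecutive context updates."""
    ordered = sorted(timestamps)
    return [(later - earlier).total_seconds() for earlier, later in zip(ordered, ordered[1:])]


def count_handovers(cell_ids: List[str]) -> int:
    """Number of times the serving cell changes between consecutive updates."""
    return sum(1 for prev, curr in zip(cell_ids, cell_ids[1:]) if prev != curr)


day = (2007, 10, 3)
moving = [datetime(*day, 8, 47, 21), datetime(*day, 8, 47, 50), datetime(*day, 8, 48, 18)]
static = [datetime(*day, 10, 55, 46), datetime(*day, 10, 59, 9), datetime(*day, 11, 2, 33)]
print(update_gaps(moving))   # updates roughly every half minute
print(update_gaps(static))   # updates every few minutes

# Cells observed in chronological order between 10:37:47 and 11:02:33.
office_cells = ["72162201", "72162201", "64530928", "72162201", "72162201",
                "64530928", "72162201", "72162201", "72162201"]
print(count_handovers(office_cells))
```

In this sample the user stays in one place between 10:37 and 11:02, yet the serving cell still changes several times, which is the situation the last bullet describes.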

Page 8: Time Based Cluster Analysis for Automatic Blog Generation

Compare&Merge Algorithm

2007-10-03 11:02:33  222-1-61101-72162201   office,tilab
2007-10-03 10:59:09  222-1-61101-72162201   office,tilab
2007-10-03 10:55:46  222-1-61101-72162201   office,tilab
2007-10-03 10:52:41  222-1-61101-64530928   n/a,n/a
2007-10-03 10:48:59  222-1-61101-72162201   office,tilab
2007-10-03 10:45:34  222-1-61101-72162201   office,tilab
2007-10-03 10:42:11  222-1-61101-64530928   n/a,n/a
2007-10-03 10:38:47  222-1-61101-72162201   office,tilab
2007-10-03 10:37:47  222-1-61101-72162201   office,tilab
2007-10-03 09:27:01  222-1-61101-72157899   office,tilab
2007-10-03 08:58:11  222-1-61104-72386176   n/a,n/a
2007-10-03 08:56:28  222-1-24650-121        n/a,n/a
2007-10-03 08:56:05  222-1-24650-122        n/a,n/a
2007-10-03 08:54:20  222-1-54650-923        n/a,n/a
2007-10-03 08:51:31  222-1-61104-72395762   n/a,n/a
2007-10-03 08:49:16  222-1-61104-72384437   n/a,n/a
2007-10-03 08:48:47  222-1-61104-72395762   n/a,n/a
2007-10-03 08:48:18  222-1-61104-72384437   n/a,n/a

Context History

Preliminary Context Scan

Long Temporary Cluster

Short Temporary Clusters

Temporary Clusters Merge

Static Cluster

Movement Cluster

Static Cluster
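The stages named on this slide (preliminary scan, long and short temporary clusters, merge) can be sketched in Python as follows. This is only one possible reading of the diagram: the duration threshold `min_static`, the rule that long temporary clusters become static clusters, and the rule that runs of short temporary clusters are merged into one movement cluster are all assumptions, as are the names `Cluster`, `preliminary_scan`, and `compare_and_merge`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Record = Tuple[datetime, str, Optional[str]]  # (timestamp, cell ID, user label or None)


@dataclass
class Cluster:
    kind: str            # "temporary", "static" or "movement"
    places: List[str]    # one place for static clusters, the visited sequence for movement
    start: datetime
    end: datetime


def preliminary_scan(records: List[Record]) -> List[Cluster]:
    """Group consecutive records that share a place into temporary clusters."""
    clusters: List[Cluster] = []
    for ts, cell, label in sorted(records, key=lambda r: r[0]):
        place = label or cell  # prefer the user-defined label when present
        if clusters and clusters[-1].places[-1] == place:
            clusters[-1].end = ts
        else:
            clusters.append(Cluster("temporary", [place], ts, ts))
    return clusters


def compare_and_merge(records: List[Record],
                      min_static: timedelta = timedelta(minutes=10)) -> List[Cluster]:
    """Turn temporary clusters into static and movement clusters (assumed rules)."""
    result: List[Cluster] = []
    for tmp in preliminary_scan(records):
        if tmp.end - tmp.start >= min_static:
            # Long temporary cluster: the user stayed in one place.
            result.append(Cluster("static", tmp.places, tmp.start, tmp.end))
        elif result and result[-1].kind == "movement":
            # Short temporary cluster following other short ones: extend the movement.
            result[-1].places.extend(tmp.places)
            result[-1].end = tmp.end
        else:
            result.append(Cluster("movement", list(tmp.places), tmp.start, tmp.end))
    return result
```

Under these assumed rules, a static period interrupted by frequent handovers to unlabeled cells is split into many short temporary clusters and may be reported as movement, so the threshold and merge rule would need tuning on real data.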

Page 9: Time Based Cluster Analysis for Automatic Blog Generation

MultiLevel Sliding Window Algorithm

For each window iteration:

2. Check if any user-defined label is available.

3. Detect user movement

4. Detect the most frequent position

5. Merge window data with previous window iteration (if detected position is the same)
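The listed steps can be sketched as a single-level sliding window in Python. The sketch does not reconstruct the step missing from the list or the multi-level aspect of the algorithm. The function name `sliding_window_clusters`, the fixed window slicing, and the movement heuristic (more than `max_static_positions` distinct positions in one window) are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Record = Tuple[datetime, str, Optional[str]]  # (timestamp, cell ID, user label or None)


def sliding_window_clusters(records: List[Record],
                            window: timedelta = timedelta(minutes=30),
                            max_static_positions: int = 2) -> List[Dict]:
    ordered = sorted(records, key=lambda r: r[0])
    clusters: List[Dict] = []
    if not ordered:
        return clusters
    window_start = ordered[0][0]
    i = 0
    while i < len(ordered):
        window_end = window_start + window
        chunk = []
        while i < len(ordered) and ordered[i][0] < window_end:
            chunk.append(ordered[i])
            i += 1
        window_start = window_end
        if not chunk:
            continue  # no context updates fell inside this window
        # Check whether any user-defined label is available in the window.
        labels = [label for _, _, label in chunk if label]
        # Detect movement (assumed heuristic: many distinct positions).
        positions = [label or cell for _, cell, label in chunk]
        moving = len(set(positions)) > max_static_positions
        # Detect the most frequent position, preferring labels over raw cell IDs.
        candidates = labels or [cell for _, cell, _ in chunk]
        place = Counter(candidates).most_common(1)[0][0]
        kind = "movement" if moving else "static"
        # Merge with the previous window when the detected position is the same.
        prev = clusters[-1] if clusters else None
        if prev and prev["kind"] == kind and prev["place"] == place:
            prev["end"] = chunk[-1][0]
        else:
            clusters.append({"kind": kind, "place": place,
                             "start": chunk[0][0], "end": chunk[-1][0]})
    return clusters
```

Because a window is classified as a whole, a handover between two cells inside one window does not split a static period, at the cost of boundaries that are only as precise as the window length.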

Page 10: Time Based Cluster Analysis for Automatic Blog Generation

Algorithms Comparison

Precision

Compare&Merge: Very high in optimal situations (less than 2-5 minutes)

MultiLevel Sliding Window: Lower precision than C&M (a 30-minute window leads to an error of less than 30 minutes)

Optimal usage

Compare&Merge: Good user labeling; cells with few handover issues

MultiLevel Sliding Window: Non-labeled areas; frequent cell handovers

Critical situations

Compare&Merge: Frequent cell handovers

MultiLevel Sliding Window: None

Page 11: Time Based Cluster Analysis for Automatic Blog Generation

Cluster Analysis Accuracy VS User Perception

Page 12: Time Based Cluster Analysis for Automatic Blog Generation

From Clusters To Blog Post

Context Clusters NLG

Natural Text Generation

Action Detector

User Preferences
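A minimal Python sketch of the last step, turning a context cluster into a sentence, is shown below. The slide names the components but not how text is produced, so the template approach, the function name `describe_cluster`, and the sentence wording are assumptions; the sample values come from the two clusters shown earlier in the presentation.

```python
from datetime import datetime
from typing import Optional


def describe_cluster(kind: str, start: datetime, end: datetime,
                     place: Optional[str] = None,
                     origin: Optional[str] = None,
                     destination: Optional[str] = None) -> str:
    """Render one cluster as a first-person sentence using fixed templates."""
    span = f"from {start:%H:%M} to {end:%H:%M}"
    unknown = "an unlabeled place"
    if kind == "static":
        return f"I was at {place or unknown} {span}."
    return f"I moved from {origin or unknown} to {destination or unknown} {span}."


day = (2007, 10, 3)
print(describe_cluster("movement", datetime(*day, 8, 42), datetime(*day, 8, 56),
                       origin="Residence, home", destination="Office, TILab"))
print(describe_cluster("static", datetime(*day, 8, 58), datetime(*day, 11, 2),
                       place="Office, TILab"))
```

User preferences could be applied at this point, for example by skipping clusters whose place the user has chosen not to publish.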

Page 13: Time Based Cluster Analysis for Automatic Blog Generation

Results

Mining context history leads to user pattern discovery

Daily actions sharing

Detection of user communities, according to daily behaviors

Clustering accuracy VS personal memories perception

Movement detection

Location-labeling importance

Page 14: Time Based Cluster Analysis for Automatic Blog Generation

Any Questions? Thank You!