SCS CMU

Joint Work by

Hanghang Tong, Yasushi Sakurai,

Tina Eliassi-Rad, Christos Faloutsos

Speaker: Hanghang Tong

Oct. 26-30, 2008, Napa, CA CIKM 2008

Fast Mining of Complex Time-Stamped Events

A Motivating Example: Inputs

Time      Event (e.g., Session)   Entity
Oct. 26   Link Analysis           Tom, Bob
          Clustering              Bob, Alan
Oct. 27   Classification          Bob, Alan
          Anomaly Detection       Alan, Beck
Oct. 28   Party                   Beck, Dan
Oct. 29   Web Search              Dan, Jack
          Advertising             Jack, Peter
Oct. 30   Enterprise Search       Jack, Peter
Oct. 31   Q & A                   Peter, Smith

A Motivating Example: Outputs

[Figure: time stamps Oct. 26 to Oct. 30 plotted against the 1st and 2nd eigenvectors]

Time Cluster, Rep. Entities: "Tom", "Bob", "Alan"

Abnormal Time, Rep. Entities: "Beck", "Dan"

Time Cluster, Rep. Entities: "Jack", "Peter", "Smith"

Problem Definitions: (How to Understand Time in Such a Complex Context)

• Given data sets collected at different time stamps;

• Find

– Q1: Time clusters

– Q2: Abnormal time stamps

– Q3: Interpretations

– Q4: Right time granularity

Roadmap

• Motivation

• T3: Single Resolution Analysis

• MT3: Multi Resolution Analysis

• Experimental Evaluations

• Conclusion

T3: Single Resolution Analysis

• Given the data sets collected at different time stamps…

• Find

– (1) Clusters for time stamps

– (2) Abnormal time stamps

– (3) Interpretations

How to represent the data sets?

(Same input table as in the motivating example: Time, Event (e.g., Session), Entity.)

A: Graph Representation!

[Figure: a graph with three kinds of nodes]

• Time nodes: Oct. 26, 2008; Oct. 27, 2008; Oct. 28, 2008; Oct. 29, 2008; Oct. 30, 2008; Oct. 31, 2008

• Event nodes: Link Analysis, Clustering, Classification, Anomaly Detection, Party, Web Search, Advertising, Enterprise Search, Q & A

• Entity nodes: Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith

A: Graph Representation!

[Figure: the same graph of time, event and entity nodes, extended with three additional nodes: Prof., CEO, Stu.]
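As a rough illustration of the graph representation, the sketch below builds an adjacency matrix from the example input table. It assumes, based on the figure, that each time node links to the events held on that day and each event links to its entities; the function name `build_graph` and the 0/1 edge weights are illustrative choices, not taken from the talk.

```python
import numpy as np

# Rows of the example input table: (time stamp, event, entities).
records = [
    ("Oct. 26", "Link Analysis", ["Tom", "Bob"]),
    ("Oct. 26", "Clustering", ["Bob", "Alan"]),
    ("Oct. 27", "Classification", ["Bob", "Alan"]),
    ("Oct. 27", "Anomaly Detection", ["Alan", "Beck"]),
    ("Oct. 28", "Party", ["Beck", "Dan"]),
    ("Oct. 29", "Web Search", ["Dan", "Jack"]),
    ("Oct. 29", "Advertising", ["Jack", "Peter"]),
    ("Oct. 30", "Enterprise Search", ["Jack", "Peter"]),
    ("Oct. 31", "Q & A", ["Peter", "Smith"]),
]

def build_graph(records):
    """Return (node names, symmetric 0/1 adjacency matrix).

    Assumed edge structure: time -- event -- entity.
    """
    times, events, entities = [], [], []
    for t, ev, ents in records:
        if t not in times:
            times.append(t)
        if ev not in events:
            events.append(ev)
        for e in ents:
            if e not in entities:
                entities.append(e)
    nodes = times + events + entities
    index = {name: i for i, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for t, ev, ents in records:
        A[index[t], index[ev]] = A[index[ev], index[t]] = 1.0
        for e in ents:
            A[index[ev], index[e]] = A[index[e], index[ev]] = 1.0
    return nodes, A

nodes, A = build_graph(records)
print(len(nodes), int(A.sum()) // 2)  # number of nodes, number of undirected edges
```

Keeping all three node types in one matrix means a single proximity computation on this graph can yield both time-to-time and time-to-entity scores.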

Qs: Given the graph,

• How to cluster time nodes?

• How to spot abnormal time nodes?

• How to interpret?

Q1: How to cluster time nodes?

• Step 1: Time-To-Time (TT) proximity matrix

[Figure: 6 x 6 TT proximity matrix; rows and columns are Oct. 26 to Oct. 31]

Q1: How to cluster time nodes?

• Step 2: Cluster time nodes by TT matrix

– Spectral clustering algorithm (and a lot of others)

[Figure: 6 x 6 TT proximity matrix with time clusters marked; rows and columns are Oct. 26 to Oct. 31]
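A minimal sketch of Step 2, assuming a symmetric TT matrix is already available. The TT values below are hypothetical (the slide shows the matrix only as a picture), and the particular spectral variant (normalized affinity, top-k eigenvectors, a small deterministic k-means) is one common choice rather than necessarily the one used in the talk.

```python
import numpy as np

def spectral_clusters(S, k, n_iter=20):
    """Cluster the rows of a symmetric proximity matrix S into k groups."""
    d = S.sum(axis=1)
    M = S / np.sqrt(np.outer(d, d))            # D^{-1/2} S D^{-1/2}
    _, vecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    X = vecs[:, -k:]                           # embedding from the top-k eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Deterministic farthest-point seeding, then plain k-means.
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Hypothetical TT values (not from the slides); rows/columns are Oct. 26 to Oct. 31.
TT = np.full((6, 6), 0.01)
TT[:2, :2] = 0.9        # Oct. 26 and Oct. 27 are close to each other
TT[3:, 3:] = 0.9        # Oct. 29, Oct. 30 and Oct. 31 are close to each other
np.fill_diagonal(TT, 1.0)

labels = spectral_clusters(TT, k=3)
print(labels)
```

Any clustering method that accepts a similarity matrix could be substituted here, which is what the slide's "and a lot of others" suggests.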

Q2: How to find abnormal time nodes?

• Abnormal time = a time cluster with a single member (singleton)

[Figure: 6 x 6 TT proximity matrix; rows and columns are Oct. 26 to Oct. 31]

Oct. 28 is abnormal!
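The singleton rule can be sketched in a few lines. The cluster labels below are an assumed assignment chosen to match the slide's conclusion; the function name `abnormal_times` is illustrative.

```python
import numpy as np

def abnormal_times(labels):
    """Indices of time stamps that are the only member of their cluster."""
    labels = np.asarray(labels)
    sizes = np.bincount(labels)            # cluster sizes, indexed by label
    return np.flatnonzero(sizes[labels] == 1)

time_stamps = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]
labels = [0, 0, 1, 2, 2, 2]                # assumed cluster assignment
abnormal = [time_stamps[i] for i in abnormal_times(labels)]
print(abnormal)
```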

Q3: How to interpret?

• Step 1: Time-to-People (TP) proximity matrix

          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .01   .01   .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8

e.g., we want to use people to interpret time cluster/anomaly

Q3: How to interpret?

• Step 2: Time Cluster-to-People (TCP) matrix

          Tom   Bob   Alan  Beck  Dan   Jack  Peter Smith
Oct. 26   .9    .8    .8    .01   .01   .01   .01   .3
Oct. 27   .5    .8    .8    .5    .01   .01   .01   .3
Oct. 28   .2    .2    .2    .9    .9    .2    .2    .3
Oct. 29   .01   .01   .01   .01   .5    .8    .8    .8
Oct. 30   .01   .01   .01   .01   .01   .8    .8    .8
Oct. 31   .01   .01   .01   .01   .01   .8    .8    .8

e.g., we want to use people to interpret time cluster/anomaly

[Figure: TCP matrix without numeric entries; axis labels are Oct. 26 to Oct. 31 and Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith]

Q3: How to interpret?

• Step 3: Find "unique" entity nodes

[Figure: TCP matrix with the entries .9, .8, .8 highlighted; axis labels are Oct. 26 to Oct. 31 and Tom, Bob, Alan, Beck, Dan, Jack, Peter, Smith]

e.g., "Bob is close to the green cluster on average, but far away from both the red & blue clusters"
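The three interpretation steps can be sketched as follows, using the TP values from the Step 1 slide. Two parts are assumptions made for illustration: the time-cluster assignment, and the uniqueness score (a person's average proximity to one cluster minus their highest average proximity to any other cluster), which is one simple way to express "close to one cluster on average, far from the others".

```python
import numpy as np

people = ["Tom", "Bob", "Alan", "Beck", "Dan", "Jack", "Peter", "Smith"]

# TP matrix from the Step 1 slide; rows are Oct. 26 to Oct. 31.
TP = np.array([
    [.9,  .8,  .8,  .01, .01, .01, .01, .3],
    [.5,  .8,  .8,  .5,  .01, .01, .01, .3],
    [.2,  .2,  .2,  .9,  .9,  .01, .01, .3],
    [.01, .01, .01, .01, .5,  .8,  .8,  .8],
    [.01, .01, .01, .01, .01, .8,  .8,  .8],
    [.01, .01, .01, .01, .01, .8,  .8,  .8],
])
labels = np.array([0, 0, 1, 2, 2, 2])   # assumed time clusters

def cluster_to_people(TP, labels):
    """TCP matrix: average the TP rows of each time cluster."""
    return np.array([TP[labels == c].mean(axis=0) for c in np.unique(labels)])

def representatives(TCP, c, top):
    """People with the highest (own cluster - best other cluster) score."""
    others = np.delete(TCP, c, axis=0).max(axis=0)
    score = TCP[c] - others
    return [people[i] for i in np.argsort(-score)[:top]]

TCP = cluster_to_people(TP, labels)
for c, top in [(0, 3), (1, 2), (2, 3)]:
    print(c, representatives(TCP, c, top))
```

Under these assumptions the selected names agree with the representative entities listed on the motivating-example output slide.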

Summary So Far…

• Given the data sets collected at different time stamps,

• We

– Construct a graph representation

– Get two proximity matrices

– Find time clusters/abnormal time stamps

– Provide the interpretations.

Q: How to get proximity matrices?

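How to get proximity matrices? (i.e., TT/TP matrices)

[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and edge weights of 1; the proximity between two nodes is measured]

Node labels in the figure: one endpoint is, e.g., Oct. 28, 2008; the other is, e.g., Oct. 28, 2008, or "John Smith".

Proximity is a.k.a. Relevance, Closeness, "Similarity"…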

What is a "good" Proximity?

[Figure: example graph with nodes A, B, D, E, F, G, H, I, J and edge weights of 1]

• Multiple connections/paths

• Quality of connection

• Direct & indirect connections

• Length, degree, weight…

Random walk with restart

[Figure: example graph with 12 nodes, numbered 1 to 12]

Random walk with restart

[Figure: the 12-node example graph with query node 4; nodes are colored by score]

Ranking vector r_4 (query: Node 4):

Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

More red, more relevant

Nearby nodes, higher scores

Computing RWR

$$r_i = c\,W\,r_i + (1 - c)\,e_i$$

• r_i: ranking vector (n x 1)

• W: adjacency matrix (n x n); in the example each column sums to 1

• e_i: starting vector (n x 1)

• 1 - c: restart probability

Example on the 12-node graph for query node 4, with c = 0.9 (restart probability 0.1):

r_4 = 0.9 x W x r_4 + 0.1 x e_4

r_4 = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)^T

e_4 = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T

W =
0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
0    0    0    0    1/4  0    1/2  0    0    0    0    0
0    0    0    0    1/4  1/2  0    0    0    0    0    0
0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
0    0    0    0    0    0    0    1/4  0    1/3  0    0
0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
0    0    0    0    0    0    0    0    0    1/3  1/3  0

A lot of techniques exist to solve this, e.g., the iterative method.
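The iterative method mentioned on the slide can be sketched as a fixed-point iteration of r = c W r + (1 - c) e. The edge list below is read off the nonzero entries of the 12 x 12 matrix on the slide; the function names, tolerance and iteration cap are illustrative assumptions.

```python
import numpy as np

# Undirected edges of the 12-node example, read off the nonzero entries of W.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6), (5, 7),
         (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11), (10, 12), (11, 12)]

def column_normalized(edges, n):
    """Adjacency matrix with every column scaled to sum to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0)

def rwr(W, i, c=0.9, tol=1e-12, max_iter=10000):
    """Iterate r <- c W r + (1 - c) e_i until the update is below tol."""
    e = np.zeros(W.shape[0])
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

W = column_normalized(edges, 12)
r4 = rwr(W, i=3)                 # query node 4 (0-based index 3)
print(np.round(r4, 2))
```

Because c < 1, the iteration is a contraction and converges from any starting vector; the same scores can also be obtained by solving the linear system (I - cW) r = (1 - c) e directly. The printed scores can be compared with the ranking vector shown on the slide.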

MT3: Multiple Resolution Analysis

• Given

– (1) the data sets collected at different time stamps;

– (2) different time resolutions

• Find

– (1) Clusters for time stamps

– (2) Abnormal time stamps

– (3) Interpretations

• At each of the given resolutions, efficiently.

MT3: an example

• Given [Figure: the example data set]

• We want to …

– (At the finest resolution) Mine & interpret "Oct 26", "Oct 27", "Oct 28", "Oct 29", "Oct 30", "Oct 31"

– (At the coarser resolution) Mine & interpret "Oct 26-27", "Oct 28-29", "Oct 30-31"
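To make the coarser resolution concrete, the sketch below merges consecutive time stamps of the example into two-day windows by summing their rows in a time-by-event incidence matrix. This only illustrates how a coarser graph could be formed (an assumption about the aggregation; the function name `aggregate` is illustrative) and is not the MT3 speed-up itself.

```python
import numpy as np

time_stamps = ["Oct. 26", "Oct. 27", "Oct. 28", "Oct. 29", "Oct. 30", "Oct. 31"]

# Time-by-event incidence from the example input table. Columns: Link Analysis,
# Clustering, Classification, Anomaly Detection, Party, Web Search, Advertising,
# Enterprise Search, Q & A.
TE = np.array([
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
])

def aggregate(TE, time_stamps, width):
    """Merge every `width` consecutive time stamps into one coarser time node."""
    starts = np.arange(0, len(time_stamps), width)
    coarse = np.add.reduceat(TE, starts, axis=0)   # sum the rows in each window
    names = [
        time_stamps[s] + " to " + time_stamps[min(s + width, len(time_stamps)) - 1]
        for s in starts
    ]
    return names, coarse

names, coarse = aggregate(TE, time_stamps, width=2)
print(names)
print(coarse.sum(axis=1))   # number of events attached to each coarser time node
```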

Outputs

[Figure, two panels: "At the finest resolution" and "At the coarser resolution"]

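MT3: How to (Naïve Solution)

[Figure: the same pipeline drawn once per resolution]

• Finest resolution: TT, TP -> Time Cluster & Anomaly -> Annotations/interpretations

• Coarser resolution: TT~, TP~ -> Time Cluster & Anomaly -> Annotations/interpretations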

Challenges

• Given the mining results at the finest resolution,

• How to speed up the analysis at the coarser resolutions?

MT3: Observation

A lot of overlap between the two graphs!

[Figure, two panels: the graph for the finest resolution and the graph for the coarser resolution]

MT3: Solution

[Figure: TT, TP (finest resolution) -> TT~, TP~ (coarser resolution)]

Data Sets

• CIKM: from CIKM proceedings

  • Time: Publication year (1993-2007, 15)

  • Event: Paper-published (952)

  • Entities: Author (1895) & Session (279)

  • Attribute: Keyword (158)

• DeviceScan: from MIT Reality Mining

  • Time: the day scanning happened (1/1/2004-5/5/2005, 294)

  • Event: Bluetooth device scanning person (114,046)

  • Entities: Device (103) & Person (97)

  • Attribute: NA

T3 on "CIKM" Data Set

[Figure: two groups of results, each annotated with representative authors and keywords]

• Rep. Authors: James P. Callan, W. Bruce Croft, James Allan, Philip S. Yu, George Karypis, Charles Clarke

  Rep. Keywords: Web, Cluster, Classification, XML, Language, Stream

• Rep. Authors: Elke Rundensteiner, Daniel Miranker, Andreas Henrich, Il-Yeol Song, Scott B Huffman, Robert J. Hall

  Rep. Keywords: Knowledge, System, Unstructured, Rule, Object-oriented, Deductive

MT3 on "DeviceScan" Data Set

[Figure, two panels]

• Aggregate by Month: Apr. 2004 is an anomaly

• Aggregate by Day: groups labeled "Work day" and "Semester Break & Holiday"

Evaluation on Speed of MT3

[Figure: log time (sec.) vs. aggregation length on the DeviceScan data set, MT3 vs. the naïve solution]

120x speed up

Conclusion

• T3: Single Resolution Analysis

  • Graph representation

  • Using proximity to

    • Find time clusters/anomalies

    • Provide interpretations

• MT3: Multiple Resolution Analysis

  • Redundancy among different resolutions

  • Up to 2 orders of magnitude speedup (same quality)

Thank you!

htong@cs.cmu.edu
