127
Mining Large Dynamic Graphs and Tensors Kijung Shin Ph.D. Candidate ([email protected])

Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large DynamicGraphs and Tensors

Kijung Shin

Ph.D. Candidate

([email protected])

Page 2: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Thesis Committee• Prof. Christos Faloutsos (Chair)

• Prof. Tom M. Mitchell

• Prof. Leman Akoglu

• Prof. Philip S. Yu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 2/106

Page 3: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 3/127

What Do Real Graphs Look Like?• Part 1 (Chapters 3 - 8)

Page 4: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 4/127

How to Spot Anomalies?• Part 2 (Chapters 9 - 13)

Page 5: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

How to Model Behavior?

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 5/127

• Part 3 (Chapters 14 - 15)

Page 6: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Graphs are Everywhere!

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 6/127

Page 7: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Graphs are Large and Dynamic• Large: many nodes, more edges

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 7/127

2B+ active users

500M+ products

•Dynamic: additions/deletions of nodes and edges

40B+ web pages

5M+ articles

Page 8: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

.. and with Rich Side Information•Rich: timestamps, scores, text, etc.

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 8/127

Poor Quality

Satisfied ☺

Page 9: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Simple Graphs are Matrices

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 9/127

0 0

1 1

1 0

0

1 1

Graph Adjacency Matrix

Page 10: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Rich Graphs are Tensors

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 10/127

1

00

0

0

•Tensors: multi-dimensional array

3-order tensor(3-dimensional array)

+ Stars(4-order tensor)

+ Text(5-order tensor)

Satisfied ☺

0

Page 11: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Thesis Goal and Focus•Goal:

•Our Focus: To Develop Scalable Algorithms for

▪T1. Structure Analysis (Part 1)

▪T2. Anomaly Detection (Part 2)

▪T3. Behavior Modeling (Part 3)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 11/127

To Fully Understand and Utilize Large Dynamic Graphs and Tensors

on User Behavior

Page 12: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Tasks and Their Relation

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 12/127

“How do they look like?”

•Given large dynamic graphs or tensors,

“How to spot anomalies?”

“How to model behavior?”

T1.Structure

T2.Anomaly

T3.Behavior

contrast

Page 13: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Our Tools for Scalability •We design (sub) linear algorithms

•Running on big data platforms

• Exploiting empirical patterns in data◦ locality, power-laws, etc.

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 13/127

Approx. Sampling Streaming Out-of-core Parallel

Page 14: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Organization of the Thesis

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 14/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Chapter

Chapters

Page 15: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Focuses of This Presentation

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 15/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Page 16: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1) <<

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

• Future Directions

•Conclusion

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 16/127

Page 17: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T1. Structure Analysis (Part 1)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 17/127

“Given a large graph (or tensor), how can we analyze its structure?

𝑎

𝑏

𝑐

𝑑

𝑒

𝑓

𝑔

Structure measures- density- clustering coefficients- transitivity ratio- triangle connectivity× 4

× 9

× 7

Basic statisticsInput graph

T1-1. Triangle Counting

T1-2. Summarization𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔

Summary graph

Page 18: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T1-1. Triangle Counting (§§ 4-6)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 18/127

“Given a large dynamic graph, how can we track the count of triangles

accurately with sub-linear memory?”

× 4 × 6

, + , , − , , +

Sources Destination

Page 19: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T1-1. Triangle Counting (§§ 4-6)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 19/127

, + , , − , , +

Sources Destination

Created at9:21 AM

Created at9:08 AM

Created at9:02 AM

How can we exploit temporal patterns? (§4)

How can we make good use of multiple machines? (§5)

How can we handle removed edges? (§6)

Page 20: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ T1.1 Triangle Counting

▪Handling Deletions (§6) <<▪…

◦ …

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

• Future Directions

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 20/127

K. Shin, J. Kim B. Hooi, C. Faloutsos, “Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions”, ECML/PKDD 2018

Page 21: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Triangles in a Graph•A triangle is 3 nodes connected to each other

• The count of triangles is an important primitive◦ Applications: ▪community detection, spam detection, link prediction

◦ Structure measures:▪transitivity ratio, clustering coefficients, trussness

21/106

1 2

3

4

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 22: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Remaining Challenge•Counting triangles in real-world graphs

◦ Large: not fitting in main memory

◦ Fully dynamic: both growing and shrinking

22/127

Online social networks

Citation networks Call networks

Web

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 23: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Previous Work

23/127

LargeGraph

Fully dynamic GraphAccurate

Growing Shrinking

MASCOT [LJK18]

Triest-IMPR [DERU17]

WRS [Shi17]

ESD [HS17]

Triest-FD [DERU17]

- Given: a large and fully-dynamic graph- To estimate: the count of triangles accurately

ThinkD (Proposed)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 24: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Our Contribution: ThinkD

24/127

•We develop ThinkD (Think before You Discard):

Fast & Accurate: outperforming competitors

Scalable: linear data scalability

Theoretically Sound: unbiased estimates

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 25: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ T1.1 Triangle Counting

▪Handling Deletions (§6)

◦ Problem Definition <<

◦ Proposed Method: ThinkD

◦ Experiments▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

•…

25/127Mining Large Dynamic Graphs and Tensors (by Kijung Shin)

Page 26: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

26/127

Fully Dynamic Graph Stream•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑎 𝑏

Time 𝑡 1

Change(given)

+(𝑎, 𝑏)

Graph(unmate-rialized)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 27: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Fully Dynamic Graph Stream

27/127

•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑐

𝑎 𝑏 𝑎 𝑏

Time 𝑡 1 2

Change(given)

+(𝑎, 𝑏) +(𝑎, 𝑐)

Graph(unmate-rialized)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 28: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Fully Dynamic Graph Stream

28/127

•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑐

𝑎 𝑏 𝑎 𝑏

Time 𝑡 1 2 3

Change(given)

+(𝑎, 𝑏) +(𝑎, 𝑐) +(𝑏, 𝑐)

Graph(unmate-rialized)

𝑐

𝑎 𝑏

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 29: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Fully Dynamic Graph Stream

29/127

•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑐

𝑎 𝑏 𝑎 𝑏

Time 𝑡 1 2 3 4

Change(given)

+(𝑎, 𝑏) +(𝑎, 𝑐) +(𝑏, 𝑐) −(𝑎, 𝑏)

Graph(unmate-rialized)

𝑐

𝑎 𝑏

𝑐

𝑎 𝑏

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 30: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Fully Dynamic Graph Stream

30/127

•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑐

𝑎 𝑏 𝑎 𝑏

Time 𝑡 1 2 3 4 5 …

Change(given)

+(𝑎, 𝑏) +(𝑎, 𝑐) +(𝑏, 𝑐) −(𝑎, 𝑏) +(𝑏, 𝑑) …

Graph(unmate-rialized)

…𝑐

𝑎 𝑏

𝑐

𝑎 𝑏

𝑐 𝑑

𝑎 𝑏

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 31: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Fully Dynamic Graph Stream

31/127

•Our model for a large and fully-dynamic graph

•Discrete time 𝑡, starting from 1 and ever increasing

•At each time 𝑡, a change in the input graph arrives◦ change: either an insertion or deletion of an edge

𝑐

𝑎 𝑏 𝑎 𝑏

Time 𝑡 1 2 3 4 5 …

Change(given)

+(𝑎, 𝑏) +(𝑎, 𝑐) +(𝑏, 𝑐) −(𝑎, 𝑏) +(𝑏, 𝑑) …

Graph(unmate-rialized)

…𝑐

𝑎 𝑏

𝑐

𝑎 𝑏

𝑐 𝑑

𝑎 𝑏

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Not Materialized

Page 32: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Problem Definition

32/127

•Given: ◦ a fully-dynamic graph stream (possibly infinite)

◦ memory space (finite)

• Estimate: the count of triangles

• To Minimize: estimation error

Time 𝑡 1 2 3 4 5 …

Changes +(𝑎, 𝑏) +(𝑎, 𝑐) +(𝑏, 𝑐) −(𝑎, 𝑏) +(𝑏, 𝑑) …

# Triangles…

Given(input)

Estimate(output)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 33: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ T1.1 Triangle Counting▪Handling Deletions (§6)

◦ Problem Definition◦ Proposed Method: ThinkD <<◦ Experiments▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

• Future Directions

•Conclusions

33/127Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 34: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview of ThinkD•Maintains and updates ∆

◦ Number of (non-deleted) triangles that it has observed

•How it processes an insertion:

34/127

- arrive: an insertion of an edge arrives

- count: count new triangles and increase ∆- test: toss a coin - store: store the edge in memory

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

storearrive

No

(keep?)Yes

testcount:

increase ∆

Page 35: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview of ThinkD (cont.)

35/127

•Maintains and updates ∆◦ Number of (non-deleted) triangles that it has observed

•How it processes an deletion:

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

deletearriveYes

- arrive: a deletion of an edge arrives

- count: count deleted triangles and decrease ∆- test: test whether the edge is stored in memory- delete: delete the edge in memory

No

test(stored?)

count:

decrease ∆

Page 36: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Why is ThinkD Accurate?

36/125

• ThinkD (Think before You Discard):◦ every arrived change is used to update ∆

• Triest-FD [DERU17]:◦ some changes are discarded without being used to update ∆

store /delete

testarrivecount:

update ∆ Yes

No (discard)

store /delete

arriveYes

No (discard) → information loss!

count:update ∆

test

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 37: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Two Versions of ThinkD

37/127

• ThinkD-FAST: simple and fast◦ independent Bernoulli trials with probability 𝑝

• ThinkD-ACC: accurate and parameter-free◦ random pairing [GLH08]

Q1: How to test in the test step

Q2: How to estimate the count of all triangles from ∆

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 38: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Unbiasedness of ThinkD-FAST•∆

𝒑𝟐: estimated count of all triangles

•∆: true count of all triangles

[ Theorem 1 ] At any time 𝒕,

• Proof and a variance of ∆/p2: see the thesis

38/127

𝔼∆

𝒑𝟐= 𝜟

Unbiased estimate of 𝜟

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 39: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

ThinkD-ACC: More Accurate•Disadvantage of ThinkD-FAST:

◦ setting the parameter 𝑝 is not trivial

▪small 𝑝 → underutilize memory

→ inaccurate estimation

▪large 𝑝 → out-of-memory error

• ThinkD-ACC uses Random Pairing [RLH08] ◦ always utilizes memory as fully as possible

◦ gives more accurate estimation

39/127Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 40: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

40/127

Scalability of ThinkD• Let 𝑘 be the size of memory

• For processing 𝑡 changes in the input stream,

[ Theorem 2 ] The time complexity of ThinkD-ACC is

[ Theorem 3 ] If 𝑝 = 𝑂𝑘

𝑡,

the time complexity ThinkD-FAST is

𝑂(𝑘 ⋅ 𝑡) linear in data size

𝑂(𝑘 ⋅ 𝑡)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 41: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

41/127

Advantages of ThinkD

Fast & Accurate: outperforming competitors

Scalable: linear data scalability (Theorems 2 & 3)

Theoretically Sound: unbiased estimates (Theorem 1)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 42: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ T1.1 Triangle Counting

▪Handling Deletions (§6)

◦ Problem Definition

◦ Proposed Method: ThinkD

◦ Experiments <<▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

•…

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 42/127

Page 43: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Experimental Settings•Competitors: Triest-FD [DERU17] & ESD [HS17]

◦ triangle counting in fully-dynamic graph streams

• Implementations:

•Datasets:◦ insertions (edges in graphs) + deletions (random 20%)

43/127

Web(6M+)

Citation(16M+)

Social Networks(1.8B+ edges, …)

Synthetic(100B edges)

Trust(0.7M+)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 44: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP1. Variance Analysis

44/127

ThinkD is accurate with small varianceTriest-FD

ThinkD-ACC

ThinkD-FAST

True Count

Number of Processed Changes

- dataset:

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 45: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP2. Scalability [THM 2 & 3]

45/127

ThinkD-ACC

ThinkD-FAST

Number of Changes

- dataset:

100 billion changes

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

ThinkD is scalable

Page 46: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP3. Space & Accuracy

46/127

ThinkD outperforms its best competitors

ThinkD-FAST

ThinkD-ACC

Memory budget (ratio)

Estim

ation E

rror

(ratio)

Triest-FD

ESD

- dataset:

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 47: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP4. Speed & Accuracy

47/127

Running time (Sec)

Estim

ation E

rror

(ratio)

ThinkD-FAST

ThinkD-ACC

ESD

Triest-FD

- dataset:

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

ThinkD outperforms its best competitors

Page 48: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Advantages of ThinkD

48/127

Fast & Accurate: outperforming competitors

Scalable: linear data scalability

Theoretically Sound: unbiased estimates

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 49: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Summary of §6•We propose ThinkD (Think Before you Discard)

◦ for accurate triangle counting

◦ in large and fully-dynamic graphs

49/127

ThinkDDownload

Fast & Accurate: outperforming competitors

Scalable: linear data scalability

Theoretically Sound: unbiased estimates

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 50: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 50/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Organization of the Thesis (Recall)

Page 51: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T1.2 Summarization

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 51/127

T1-2𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔

𝑎

𝑏

𝑐

𝑑

𝑒

𝑓

𝑔

“Given a web-scale graph or tensor, how can we succinctly represent it?”

Input graph Summary graph

• §7: Summarizing Graphs

• §8: Summarizing Tensors (via Tucker Decomposition)◦ External-memory algorithm with 1,000× improved scalability

Page 52: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ …◦ T1-2. Summarization (§§ 7-8)▪Summarizing Graphs (§ 7)

◦ Problem Definition <<◦ Proposed Method: SWeG◦ Experiments▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

•…

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 52/127K. Shin, A. Ghoting, M. Kim, and H. Raghavan, “SWeG: Lossless and Lossy

Summarization of Web-Scale Graphs”, WWW 2019

Page 53: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Graph Summarization: Example

53/127

− (𝑎, 𝑑)𝑎

𝑏

𝑐

𝑑

𝑒

𝑓

𝑔 𝑎, 𝑏

𝑐

𝑑

𝑒

𝑓

𝑔

𝑎, 𝑏 𝑐, 𝑑, 𝑒

𝑓

𝑔

− (𝑎, 𝑑)− (𝑐, 𝑒)+ 𝑑, 𝑔𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔

− (𝑎, 𝑑)− (𝑐, 𝑒)+ 𝑑, 𝑔

Input Graph (w/ 9 edges)

Output (w/ 6 edges)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 54: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Graph Summarization [NRS08]•Given: an input graph

• Find:◦ a summary graph

◦ positive and negative residual graphs

• To Minimize: the edge count (≈ description length)

54/127

Summary Graph

Residual Graph (Positive)

Residual Graph (Negative)

𝑎

𝑏

𝑑

𝑒

𝑓

𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔− (𝑎, 𝑑)− (𝑐, 𝑒)

+ 𝑑, 𝑔

Input Graph

𝑐

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 55: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Restoration: Example

55/127

𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔

− (𝑎, 𝑑)− (𝑐, 𝑒)+ 𝑑, 𝑔 𝑎, 𝑏 𝑐, 𝑑, 𝑒

𝑓

𝑔

− (𝑎, 𝑑)− (𝑐, 𝑒)+ 𝑑, 𝑔

𝑎, 𝑏

𝑐

𝑑

𝑒

𝑓

𝑔− (𝑎, 𝑑)

𝑎

𝑏

𝑐

𝑑

𝑒

𝑓

𝑔

Restored Graph (w/ 9 edges)

Summarized Graph (w/ 6 edges)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 56: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Why Graph Summarization?• Summarization:

◦ the summary graph is easy to visualize and interpret

•Compression:◦ support efficient neighbor queries

◦ applicable to lossy compression

◦ combinable with other graph compression techniques

▪the outputs are also graphs

56/127

Summary Graph

Residual Graph (Positive)

Residual Graph (Negative)𝑎, 𝑏 𝑐, 𝑑, 𝑒 𝑓, 𝑔

− (𝑎, 𝑑)− (𝑐, 𝑒)

+ 𝑑, 𝑔

discussed in the thesis

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 57: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Challenge: Scalability!

57/127

Maximum Size of Graphs

Compression Performance

Good

Bad

VoG [KKVF14]Greedy [NSR08]

millions 10 millions

Randomized [NSR08]

SAGS [KNL15]

billions

10,000 ×

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

SWeG

Page 58: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Our Contribution: SWeG

58/127

Fast with Concise Outputs

Memory Efficient

Scalable

•We develop SWeG (Summarizing Web-scale Graphs):

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 59: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ …◦ T1-2. Summarization (§§ 7-8)▪Summarizing Graphs (§ 7)

◦ Problem Definition ◦ Proposed Method: SWeG <<◦ Experiments▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

•…

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 59/127

Page 60: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Terminologies

60/127

Summary Graph 𝑺

{𝑎, 𝑏}= 𝐴

𝑐, 𝑑, 𝑒= 𝐵

𝑓, 𝑔= 𝐶

− (𝑎, 𝑑)− (𝑐, 𝑒)

+ 𝑑, 𝑔

super node

Residual Graph 𝑹 PositiveResidual Graph 𝑹+

Negative Residual Graph 𝑹−

𝑺𝒂𝒗𝒊𝒏𝒈 𝑨,𝑩 := 1 −𝐶𝑜𝑠𝑡(𝐴 ∪ 𝐵)

𝐶𝑜𝑠𝑡 𝐴 + 𝐶𝑜𝑠𝑡(𝐵)

Encoding cost when 𝐴 and 𝐵

are merged

Encoding cost of 𝐴 Encoding cost of 𝐵

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 61: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview of SWeG• Inputs: - input graph 𝑮

- number of iterations 𝑻

•Outputs: - summary graph 𝑺

- residual graph 𝑹 (or 𝑹+ and 𝑹−)

•Procedure:

61/127

• S0: Initializing Step• repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step

• S2: Compressing Step (optional)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 62: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview: Initializing Step

62/127

𝐴 = {𝑎}

𝐵 = {𝑏}

𝐷 = {𝑑}

𝐸 = {𝑒}

𝐹 = {𝑓}

𝐺 = {𝑔}

𝐶 = {𝑐}

Summary Graph 𝑺 = 𝑮 Residual Graph 𝑹 = ∅

• S0: Initializing Step <<• repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step

• S2: Compressing Step (optional)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 63: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview: Dividing Step

63/127

• S0: Initializing Step• repeat 𝑇 times

• S1-1: Dividing Step <<• S1-2: Merging Step

• S2: Compressing Step (optional)

𝐴 = {𝑎}

𝐵 = {𝑏}

𝐷 = {𝑑}𝐶 = {𝑐}

𝐸 = {𝑒}

𝐹 = {𝑓}

𝐺 = {𝑔}

•Divides super nodes into groups◦ MinHashing (used), EigenSpoke, Min-Cut, etc.

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 64: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview: Merging Step•Merge some supernodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)

64/127

𝐴 = {𝑎, 𝑏}𝐷 = {𝑑, 𝑒}

𝐶 = {𝑐}𝐹 = {𝑓, 𝑔}

• S0: Initializing Step • repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step <<

• S2: Compressing Step (optional)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 65: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

65/127

• S0: Initializing Step • repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step

• S2: Compressing Step (optional)

𝐴 = {𝑎, 𝑏}

𝐷 = {𝑑, 𝑒}

𝐹 = {𝑓, 𝑔}

𝐶 = {𝑐}

Summary Graph 𝑺 Residual Graph 𝑹

− (𝑎, 𝑑)

+ 𝑑, 𝑔

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Overview: Merging Step (cont.)

Page 66: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview: Dividing Step

66/127

• S0: Initializing Step • repeat 𝑇 times

• S1-1: Dividing Step <<• S1-2: Merging Step

• S2: Compressing Step (optional)

𝐴 = {𝑎, 𝑏} 𝐷 = {𝑑, 𝑒}𝐶 = {𝑐}𝐹 = {𝑓, 𝑔}

•Divides super nodes into groups

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 67: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

67/127

𝐴 = {𝑎, 𝑏} 𝐶 = {𝑐, 𝑑, 𝑒}𝐹 = {𝑓, 𝑔}

• S0: Initializing Step • repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step <<

• S2: Compressing Step (optional)

•Merge some supernodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Overview: Merging Step

Page 68: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

68/127

𝐴 = {𝑎, 𝑏}

𝐵 =𝑐, 𝑑, 𝑒

𝐹 = 𝑓, 𝑔− (𝑎, 𝑑)− (𝑐, 𝑒)

+ 𝑑, 𝑔

Summary Graph 𝑺 Residual Graph 𝑹

• S0: Initializing Step• repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step

• S2: Compressing Step (optional)

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Overview: Merging Step (cont.)

Page 69: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

69/127

• S0: Initializing Step • repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step <<

• S2: Compressing Step (optional)

•Merge some supernodes within each group if 𝑆𝑎𝑣𝑖𝑛𝑔 > 𝜃(𝑡)

•Decreasing 𝜃(𝑡) = 1 + 𝑡 −1

◦ exploration of other groups

◦ exploitation within each group

◦~ 30% better compression than 𝜃(𝑡) = 0

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Overview: Merging Step (cont.)

Page 70: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Overview: Compressing Step•Compress each output graph (𝑺, 𝑹+ and 𝑹−)

•Use any off-the-shelf graph-compression algorithm◦ Boldi-Vigna [BV04]

◦ VNMiner [BC08]

◦ Graph Bisection [DKKO+16]

70/127

• S0: Initializing Step• repeat 𝑇 times

• S1-1: Dividing Step• S1-2: Merging Step

• S2: Compressing Step (optional) <<

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 71: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

No need to load the entire graph in memory!

•Map stage: compute min hashes in parallel

• Shuffle stage: divide super nodes using min hashes

•Reduce stage: process groups independently in parallel

71/127

Parallel & Distributed Processing

𝐴 = {𝑎}

𝐵 = {𝑏}𝐶 = {𝑐}

MinHash = 𝟏

Merge!

𝐷 = {𝑑}

𝐸 = {𝑒}

MinHash = 𝟐

𝐹 = {𝑓}

𝐺 = {𝑔}

MinHash = 𝟑

Merge! Merge!

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 72: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

◦ ...◦ T1-2. Summarization (§§ 7-8)▪Summarizing Graphs (§ 7)

◦ Problem Definition ◦ Proposed Method: SWeG◦ Experiments <<▪…

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

•…

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 72/127

Page 73: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Experimental Settings

73/127

• 13 real-world graphs (10K - 20B edges)

•Graph summarization algorithms: ◦ Greedy [NRS08], Randomized [NSR08], SAGS [KNL15]

• Implementations: &

Social Collaboration Citation Web …

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 74: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP1. Speed and Compression

74/127

SWeG outperforms its competitors

SWeG

- dataset:

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 75: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Advantages of SWeG (Recall)

75/127

Fast with Concise Outputs

Memory Efficient

Scalable

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 76: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Memory Usage

Input Graph

76/127

EXP2. Memory EfficiencySWeG loads ≤0.1−4% of edges

in main memory at once

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

294X1209X

Page 77: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Advantages of SWeG (Recall)

77/127

Fast with Concise Outputs

Memory Efficient

Scalable

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 78: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

78/127

About 20 iterations are enough

EXP3. Effect of Iterations

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 79: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP4. Data Scalability

79/127

SWeG is linear in the number of edges

SWeG(Hadoop)

SWeG(Single machine)

≥ 𝟐𝟎 billion edges

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 80: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

EXP5. Machine Scalability

80/127

SWeG(Hadoop)

SWeG(Single machine)

SWeG scales up

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Page 81: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Advantages of SWeG (Recall)

81/127Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

Fast with Concise Outputs

Memory Efficient

Scalable

Page 82: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Summary of §7•We propose SWeG (Summarizing Web Graphs)

◦ for summarizing large-scale graphs

82/127

Fast with Concise Outputs

Memory Efficient

Scalable

Graph / TensorPart 1 / Part 2 / Part 3 §4 / §5 / §6 / §7

SWeG(Hadoop)

SWeG

Page 83: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Triangle counting algorithms [ICDM17, PKDD18, PAKDD18]

Summarization algorithms [WSDM17, WWW19]

Patent on SWeG: filed by LinkedIn Inc.

Open-source software: downloaded 82 times

83/127

Contributions and Impact (Part 1)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin)

github.com/kijungs

SWeG

ThinkD

Page 84: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 84/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Organization of the Thesis (Recall)

Page 85: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

85/127

T2. Anomaly Detection (Part 2)“How can we detect anomalies or fraudsters

in large dynamic graphs (or tensors)?”

Hint: fraudsterstend to form dense subgraphs

??

benigndense

subgraphs

Graph / Tensor Part 1 / Part 2 / Part 3

Page 86: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T2-1. Utilizing Patterns• T2-1. Patterns and Anomalies in Dense Subgraphs (§ 9)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 86/127

“What are patterns in dense subgraphs?”“What are anomalies deviating from the patterns?”

K. Shin, T. Eliassi-Rad, C. Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with Applications”, KAIS 2018 (formerly, ICDM 2016)

Page 87: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

T2-2. Utilizing Side Information

87/127

Item

s

Accounts

Graph / Tensor Part 1 / Part 2 / Part 3 §11 / §12 / §13 / §14

Page 88: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

88/127

“How can we detect dense subtensors in large dynamic data?”

• T2-2. Detecting Dense Subtensors (§§ 11-13)◦ In-memory Algorithm (§ 11)

◦ Distribute Algorithm for Web-scale Tensors (§ 12)

◦ Incremental Algorithms for Dynamic Tensors (§ 13)

T2-2. Utilizing Side Information

Graph / Tensor Part 1 / Part 2 / Part 3 §11 / §12 / §13 / §14

Page 89: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Contributions and Impact (Part 2)Patterns in dense subgraphs [ICDM16]

◦ Award: best paper candidate at ICDM 2016

◦ Class:

Algorithms for dense subtensors [PKDD16, WSDM17, KDD17]

◦ Real-world usage:

Open-source software: downloaded 257 times

89/127Mining Large Dynamic Graphs and Tensors (by Kijung Shin)

github.com/kijungs

Page 90: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 90/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Organization of the Thesis (Recall)

Page 91: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

“How can we model the behavior of individualsin graph and tensor data?”

91/127

T3. Behavior Modeling (Part 3)

Social Network Behavior Log on Social Media

• T3-1. Modeling Purchase Behavior in a Social Network (§14)

• T3-2. Modeling Progression of Users of Social Media (§15)

“How do users evolve over time on social media?”

Graph Tensor Part 1 / Part 2 / Part 3 §14 / §15

Page 92: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)◦ T3-1. Modeling Purchases (§14) <<◦ …

• Future Directions

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 92/127

K Shin, E Lee, D Eswaran, AD Procaccia, “Why You Should Charge Your Friends for Borrowing Your Stuff”, IJCAI 2017

Page 93: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Sharable Goods: Question

93/127

“What do they have in common?”

Portable crib IKEA toolkit DVDs

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 94: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Sharable Goods: Properties

•Used occasionally

• Share with friends

•Do not share with strangers

94/127Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 95: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Motivation: Social Inefficiency

95/127

Efficiencyof Purchase

Likelihoodof Purchase

High(share with many)

Low(share with few)

can be Low(likely to borrow)

can be High(likely to buy)

Popular Lonely

Q1 “How large can social inefficiency be?”Q2 “How to nudge people towards efficiency?”

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 96: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)◦ T3-1. Modeling Purchases (§14)

▪Toy Example <<

▪Game-theoretic Model

▪Best Rental-fee Search

◦ …

• Future Directions

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 96/127

Page 97: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Social Network•Consider a social network, which is a graph

◦ Nodes: people

◦ Edges: friendship

97/127

Alice

Carol

Bob

“How many people should buy an IKEA toolkit for everyone to use it?”

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 98: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Socially Optimal Decision• The answer is at least 𝟐

• Socially optimal: ◦ everyone uses a toolkit

◦ with minimum purchases (or with minimum cost)

98/127

Bob

“Does everyone want to stick to their current decisions?”

Alice

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 99: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Individually Optimal Decision• The answer is No

• Individually optimal: ◦ everyone best responses to others’ decisions

• Socially inefficient (suboptimal):◦ 4 purchases happen when 2 are optimal

99/127

AliceBob

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 100: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Social Inefficiency• Individually optimal outcome with 6 purchases

100/127

Dan

Carol

“How can we prevent this social inefficiency?”

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 101: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Moving toward Social Optimum

101/127

“How can we make people stick with this socially optimal outcome?”

•Recall the socially optimal outcome

BobAlice

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 102: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Imposing Rental Fee•Renters pay rental fee for getting permanent access

•Rental fee is half the price of a toolkit

102/127

BobAlice

“Does everyone want to stick to their current decisions?”

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 103: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Socially & Individually Optimal• The answer is Yes

◦ Alice & Bob: are paid more than the price

◦ The others: renting is cheaper than buying

• Individually optimal

• Socially optimal with minimum (2) purchases

103/127

BobAlice

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 104: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Socially & Individually Optimal• The answer is Yes

◦ Alice & Bob: are paid more than the price

◦ The others: renting is cheaper than buying

104/127

BobAlice

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Q1 “How do rental fees affect social inefficiency?”Q2 “What are the ‘socially optimal’ rental fees?”

Page 105: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)◦ T3-1. Modeling Purchases (§14)

▪Toy Example

▪Game-theoretic Model <<

▪Best Rental-fee Algorithm

◦ …

• Future Directions

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 105/127

Page 106: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Formal Game-theoretic Model•Players: nodes in a social network

• Strategies:◦ buy a good / rent a good from a friend with a good

•Nash Equilibrium (NE): individually optimal outcome

• Social Optimum: socially optimal outcome

• Inefficiency of an NE:

106/127

# purchases in a social optimum

# purchases in the NE

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 107: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Inefficiency without Rental Fee• [THM 1] Existence of NEs

◦ In every social network, there exists an NE.

• [THM 2] Inefficiency without Rental Fee◦ There exists a social network with 𝒏 nodes

◦ where all NEs have 𝜣(𝒏) inefficiency.

107/127

… …

𝑛 − 1

2

Input graph

… …

Social optimum (𝟐 owners)

Best NE(𝒏/𝟐 owners)

… …

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 108: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Inefficiency with Rental Fee• [THM 3] Inefficiency with Rental Fee

◦ In every social network,

◦ if 𝑝𝑟𝑖𝑐𝑒

3< 𝑟𝑒𝑛𝑡𝑎𝑙 𝑓𝑒𝑒 < 𝑝𝑟𝑖𝑐𝑒,

◦ then, there exists a socially optimal NE,

◦ otherwise …

108/127

… …

𝑛 − 1

2

Input graph… …

Social optimum (𝟐 owners)

→ NE with proper rental fee

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 109: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)◦ T3-1. Modeling Purchases (§14) <<

▪Toy Example

▪Game-theoretic Model

▪Best Rental-fee Search <<

◦ …

• Future Directions

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 109/127

Page 110: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Finding Best Rental Fee•Given:

◦ a social network

◦ a sharable good with price 𝑝

• Find: a range of rental fees

• To Minimize: inefficiencies of NEs

110/127

,Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 111: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Searching NEs (SGG-Nash)• Linear-time algorithm for searching NEs

•Gives different NEs depending on initial strategies

111/127

randomly initialize strategies

repeat until an NE is reached◦ for each node

▪optimize its strategy while fixing the others’

[THM 4] ConvergenceIn every social network, an NE is reached within 𝟑 iterations.

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 112: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Scalability of SGG-Nash• SGG-Nash is linear in the number of edges

112/127Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

- Dataset:

Page 113: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Best Rental Fee in Real Graphs•Datasets:

• Inefficiency is minimized consistently when

113/127

1.5

2.0

2.5

3.0

0.0 0.5 1.0Ave

rage

Ine

ffic

iency

Rental Fee (Relative to Price)

2

3

4

0.0 0.5 1.0Ave

rage

Ine

ffic

iency

Rental Fee (Relative to Price)

𝑝𝑟𝑖𝑐𝑒/3 < 𝑟𝑒𝑛𝑡𝑎𝑙 𝑓𝑒𝑒 < 𝑝𝑟𝑖𝑐𝑒/2

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Theoretically best [THM3]

Theoretically best [THM3]

Page 114: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

1.5

2.0

2.5

3.0

0.0 0.5 1.0Ave

rage

Ine

ffic

iency

Summary of §14•Game-theoretic Model: Sharable good game

• Theoretical Analysis:◦ Existence of NEs

◦ Inefficiency of NEs

•Algorithm: for linear-time NE search

• Suggestion: “socially optimal” rental fees

114/127

Rental Fee (Relative to Price)

Graph TensorPart 1 / Part 2 / Part 3 §14 / §15

Page 115: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Tools for purchase modeling [IJCAI17]

◦ Suggest ‘socially optimal’ rental fees

◦ Media:

Tools for progression modeling [WWW18]

◦ Scale to datasets with a trillion records

◦ Real-world usage:

115/127

Contributions and Impact (Part 3)

Graph Tensor Part 1 / Part 2 / Part 3 §14 / §15

Page 116: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 116/127

Part1.Structure Analysis

Part2. AnomalyDetection

Part3. BehaviorModeling

GraphsTriangle Count

(§§ 3-6) Anomalous Subgraph

(§ 9)

PurchaseBehavior

(§ 14)Summarization(§ 7)

Tensors Summarization(§ 8)

Dense Subtensors(§§ 10-13)

Progression(§ 15)

Organization of the Thesis (Recall)

Page 117: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)

• Future Directions <<

•Conclusions

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 117/127

Page 118: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 118/127

Model & Theory

Scalable Algorithms for Big Data Mining

D3

Vision: Algorithms for “Big Data”

Page 119: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

D1: Distributed Graph Stream Processing“How to analyze large dynamic graphs

on a cluster of machines?”

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 119/127

Current

SourcesDestination

Short Term

More Graph Mining Tasks

Mid Term

Programming Model

(Generalization)

Long Term

System / Platform

Triangle Counting

(§5)

Page 120: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

D2: Detecting Adversarial Anomalies

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 120/127

Anomalies / Fraudsters Detection System

improve

avoid

Profits of Anomalies

Cost to AvoidAlgorithms

Algorithms Costly

to Avoid

Algorithms for Static

Anomalies

Current Short Term Mid Term Long Term

Page 121: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

D3: Co-Evolution of Beliefs and Graphs

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 121/127

“How to model the co-evolution of nodes’ beliefs and edges?” Change of Beliefs

OR

Change of Edges

Game Theory/ Nash

Equilibrium

PredictionAlgorithms

Reducing Polarization

Regression Model

[PKDD18]

Current Short Term Mid Term Long Term

Page 122: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Roadmap• T1. Structure Analysis (Part 1)

• T2. Anomaly Detection (Part 2)

• T3. Behavior Modeling (Part 3)◦ T3-1. Modeling Purchases (§14) <<

◦ T3-2. Modeling Progression (§ 15) (Skip)

• Future Directions

•Conclusions <<

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 122/127

Page 123: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Conclusion•Goal: “To fully understand and utilize …………. large dynamic graphs and tensors”

•Contributions: developing scalable algorithms for

• Impact:

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 123/127

T1. StructureAnalysis (Part 1)

T2. Anomaly Detection (Part 2)

T3. Behavior Modeling (Part 3)

1 23

4

github.com/kijungs

Page 124: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

[1] K Shin, B Hooi, and C Faloutsos, “M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees”, ECML/PKDD 2016 (§11)

[2] K Shin, T Eliassi-Rad, and C Faloutsos, “CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms”, ICDM 2016 (§9)

[3] K Shin, “Mining Large Dynamic Graphs and Tensors for Accurate Triangle Counting in Real Graph Streams”, ICDM 2017 (§4)

[4] K Shin, B Hooi, J Kim, and C Faloutsos, “D-Cube: Dense-Block Detection in Terabyte-Scale Tensors”, WSDM 2017 (§12)

[5] K Shin, E Lee, D Eswaran, and AD. Procaccia, “Why You Should Charge Your Friends for Borrowing Your Stuff”, IJCAI 2017 (§14)

[6] K Shin, B Hooi, J Kim, and C Faloutsos, “DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams”, KDD 2017 (§13)

[7] J Oh, K Shin, EE Papalexakis, C Faloutsos, and Hwanjo Yu, “S-HOT: Scalable High-Order Tucker Decomposition”, WSDM 2017 (§8)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 124/127

References

Page 125: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

References (cont.)

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 125/127

[8] K Shin, B Hooi, and C Faloutsos, “Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining”, TKDD 2018 (§11)

[9] K Shin, M Shafiei, M Kim, A Jain, and H Raghavan, “Discovering Progression Stages in Trillion-Scale Behavior Logs” WWW 2018 (§14)

[10] K Shin, T Eliassi-Rad, and C Faloutsos, “Patterns and Anomalies in k-Cores of Real-world Graphs with Applications”, KAIS 2018 (§9)

[11] K Shin, M Hammoud, E Lee, J Oh, and C Faloutsos. “Tri-fly: Distributed estimation of global and local triangle counts in graph streams” PAKDD 2018 (§5)

[12] K Shin, J Kim, B Hooi, and C Faloutsos, “Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions”, ECML/PKDD 2018 (§6)

[13] K Shin, A Ghoting, M Kim and H Raghavan, “SWeG: Lossless and Lossy Summarization of Web-Scale Graphs”, WWW 2019 (§7)

Page 126: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Thank You!• Sponsors:

•Admins:

•Collaborators:

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 126/127

Page 127: Mining Large Dynamic Graphs and Tensorskijungs/defense/slides.pdf · 2019-02-09 · Thesis Committee •Prof. Christos Faloutsos (Chair) •Prof. Tom M. Mitchell •Prof. Leman Akoglu

Thank You!

Mining Large Dynamic Graphs and Tensors (by Kijung Shin) 127/127

•Homepage (Software & Datasets): http://www.cs.cmu.edu/~kijungs/defense/

ThinkD

-ACC