Pattern Mining: Getting the most out of your log data. Krishna Sridhar Staff Data Scientist, Dato Inc. krishna_srd

Pattern Mining: Extracting Value from Log Data

Page 1: Pattern Mining: Extracting Value from Log Data

Pattern Mining: Getting the most out of your log data.

Krishna Sridhar, Staff Data Scientist, Dato Inc. @krishna_srd

Page 2: Pattern Mining: Extracting Value from Log Data

• Background
  - Machine Learning (ML) research.
  - Ph.D. in Numerical Optimization @Wisconsin

• Now
  - Build ML tools for data scientists & developers @Dato.
  - Help deploy ML algorithms.

@krishna_srd, @DatoInc

About Me!

Page 3: Pattern Mining: Extracting Value from Log Data

45+ and growing fast!

About Us!

Page 4: Pattern Mining: Extracting Value from Log Data

+ =

Questions?
• (Now) We are monitoring the chat window.
• (Later) Email me [email protected].

Webinars

Page 5: Pattern Mining: Extracting Value from Log Data

About you?

Page 6: Pattern Mining: Extracting Value from Log Data

Creating a model pipeline

Ingest Transform Model Deploy Unstructured Data

exploration

data

modeling

Data Science Workflow

Ingest Transform Model Deploy

Page 7: Pattern Mining: Extracting Value from Log Data

GraphLab Create

Train Model

Pipeline

Deploy Models

Serve Requests

(REST API)

Monitor Services

Get Live Feedback

Update Pipelines

Prototype & Develop Model

Pipelines

Update Live Experiment

Deploy New Pipeline

Dato Predictive Services

Dato’s Products

Dato Distributed

We can help!

Page 8: Pattern Mining: Extracting Value from Log Data

Log Journey

Lots of data

Insights Profits

Page 9: Pattern Mining: Extracting Value from Log Data

Log Mining: Pattern Mining

Page 10: Pattern Mining: Extracting Value from Log Data

Logs are everywhere!

Page 11: Pattern Mining: Extracting Value from Log Data

Machine Learning in Logs

Source: Mining Your Logs - Gaining Insight Through Visualization

Page 12: Pattern Mining: Extracting Value from Log Data

Coffee shop

Coffee Shops Menu

Page 13: Pattern Mining: Extracting Value from Log Data

Receipts

Coffee Shops Menu

Page 14: Pattern Mining: Extracting Value from Log Data

Coffee Store Logs

Page 15: Pattern Mining: Extracting Value from Log Data

Frequent Pattern Mining

What sets of items were bought together?

Page 16: Pattern Mining: Extracting Value from Log Data

Real Applications

Page 19: Pattern Mining: Extracting Value from Log Data

Log Mining: Rule Mining

Page 20: Pattern Mining: Extracting Value from Log Data

Can we recommend items?

Rule Mining
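A minimal sketch of how a candidate rule could be scored before it is used for a recommendation. It assumes the usual confidence measure (support of the whole set divided by support of the left-hand side), which the slide does not spell out; the receipts and item names are invented for illustration.

```python
# Hypothetical receipts; the item names are illustrative only.
receipts = [
    {"coffee", "bagel"},
    {"coffee", "bagel", "muffin"},
    {"coffee", "muffin"},
    {"tea", "bagel"},
    {"coffee", "bagel"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(lhs, rhs, transactions):
    """Fraction of the transactions containing `lhs` that also contain `rhs`."""
    lhs_count = support(lhs, transactions)
    return support(lhs | rhs, transactions) / lhs_count if lhs_count else 0.0

# Rule "coffee -> bagel": 3 of the 4 coffee receipts also contain a bagel.
print(confidence({"coffee"}, {"bagel"}, receipts))  # 0.75
```

A rule with high confidence is a natural candidate for "customers who bought X may also want Y".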

Page 21: Pattern Mining: Extracting Value from Log Data

Real Applications

Page 22: Pattern Mining: Extracting Value from Log Data

Log Mining: Feature Extraction

Page 23: Pattern Mining: Extracting Value from Log Data

Feature Extraction

0 1 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 0 0 1 1 1 0

Receipt Space

Features in Menu Space

ML
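One way to read the 0/1 matrix above, as a sketch: each receipt becomes a row of indicators, one per mined pattern, which can then be fed to any ML model. The indicator encoding, the patterns, and the receipts below are assumptions made for illustration.

```python
def pattern_features(transaction, patterns):
    """One 0/1 indicator per pattern: 1 if the transaction contains it."""
    return [1 if p <= transaction else 0 for p in patterns]

# Hypothetical mined patterns and receipts.
patterns = [frozenset({"C", "D"}), frozenset({"B", "D"}), frozenset({"A", "D"})]
receipts = [{"B", "C", "D"}, {"A", "D"}, {"A", "B", "C", "D"}]

features = [pattern_features(r, patterns) for r in receipts]
for row in features:
    print(row)
# [1, 1, 0]
# [0, 0, 1]
# [1, 1, 1]
```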

Page 24: Pattern Mining: Extracting Value from Log Data

3 Useful Data Mining Tasks

Pattern Mining, Rule Mining, Feature Extraction

Page 25: Pattern Mining: Extracting Value from Log Data

Demo

Page 26: Pattern Mining: Extracting Value from Log Data

Transparency: ML is not a black box.

Interpretability: Learning is also about understanding.

Diagnosis: Whatever can go wrong, will go wrong.

Moving on

Page 27: Pattern Mining: Extracting Value from Log Data

Pattern Mining Explained

Page 28: Pattern Mining: Extracting Value from Log Data

Formulating Pattern Mining

N distinct items → 2^N itemsets
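A quick sketch of why the search space is exponential: enumerating every subset of N items yields 2^N itemsets (counting the empty set). The helper name is hypothetical.

```python
from itertools import combinations

def all_itemsets(items):
    """Yield every subset of `items`, including the empty set."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

itemsets = list(all_itemsets({"A", "B", "C", "D"}))
print(len(itemsets))  # 16, i.e. 2**4
```

With a menu of 40 items this is already about 10^12 candidate itemsets, so brute-force counting does not scale.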

Page 29: Pattern Mining: Extracting Value from Log Data

Formulating Pattern Mining

Find the top K most frequent sets of length at least L that occur at least M times.

Page 30: Pattern Mining: Extracting Value from Log Data

Formulating Pattern Mining

Find the top K most frequent sets of length at least L that occur at least M times.

- K: max_patterns
- L: min_length
- M: min_support
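The formulation can be written down directly as a brute-force search. This is only a sketch to pin down what the three parameters mean; the function is hypothetical, borrows the parameter names from the slide, and is not how GraphLab Create implements the search.

```python
from collections import Counter
from itertools import combinations

def top_k_patterns(transactions, max_patterns, min_length, min_support):
    """Brute force: count every sub-itemset of every transaction."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(min_length, len(items) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    frequent = [(p, c) for p, c in counts.items() if c >= min_support]
    # Most frequent first; ties go to the longer pattern, then alphabetically.
    frequent.sort(key=lambda pc: (-pc[1], -len(pc[0]), sorted(pc[0])))
    return frequent[:max_patterns]

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
result = top_k_patterns(db, max_patterns=3, min_length=2, min_support=4)
for pattern, count in result:
    print(sorted(pattern), count)
# ['C', 'D'] 5
# ['B', 'C', 'D'] 4
# ['B', 'C'] 4
```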

Page 31: Pattern Mining: Extracting Value from Log Data

Pattern Mining

N distinct items → 2^N itemsets

Page 32: Pattern Mining: Extracting Value from Log Data

Pattern Mining: Principles

Page 33: Pattern Mining: Extracting Value from Log Data

Principle 1: What is frequent?

A pattern is frequent if it occurs at least M times.

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

M = 4

{C, D}: 5 is frequent

{A, D}: 3 is not frequent

Page 34: Pattern Mining: Extracting Value from Log Data

Principle 1: What is frequent?

A pattern is frequent if it occurs at least M times.

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

M = 4

{C, D}: 5 is frequent

{A, D}: 3 is not frequent

min_support
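The support counts for this example can be checked with a few lines. A sketch; the helper names are hypothetical.

```python
# Transactions from the slide; M = 4.
db = [
    {"B", "C", "D"},
    {"A", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "D"},
    {"B", "C", "D"},
    {"B", "C", "D"},
]

def support(pattern, transactions):
    """Count the transactions that contain every item of `pattern`."""
    return sum(1 for t in transactions if pattern <= t)

def is_frequent(pattern, transactions, min_support):
    return support(pattern, transactions) >= min_support

print(support({"C", "D"}, db), is_frequent({"C", "D"}, db, min_support=4))  # 5 True
print(support({"A", "D"}, db), is_frequent({"A", "D"}, db, min_support=4))  # 3 False
```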

Page 35: Pattern Mining: Extracting Value from Log Data

Principle 2: Apriori principle

A pattern is frequent only if every one of its subsets is frequent.

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

{B, C, D} : 4 is frequent, therefore {C, D} : 5 is frequent

{A} : 3 is not frequent therefore {A, D} : 3 is not frequent

M = 4
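The principle is what makes pruning possible: a candidate whose smaller subsets are not all frequent can be discarded without scanning the data. A sketch on the same transactions; `survives_pruning` is a hypothetical helper name.

```python
from itertools import combinations

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
M = 4

def support(pattern):
    return sum(1 for t in db if set(pattern) <= t)

# Frequent single items at M = 4: {A} has support 3, so it is left out.
frequent_1 = {frozenset([i]) for i in "ABCD" if support([i]) >= M}

def survives_pruning(candidate, frequent_smaller):
    """A candidate can only be frequent if every subset one item smaller is."""
    size = len(candidate) - 1
    return all(frozenset(s) in frequent_smaller
               for s in combinations(sorted(candidate), size))

print(survives_pruning({"C", "D"}, frequent_1))  # True: still needs counting
print(survives_pruning({"A", "D"}, frequent_1))  # False: pruned without a scan
```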

Page 36: Pattern Mining: Extracting Value from Log Data

Two Main Algorithms

• Candidate Generation
  - Apriori
  - Eclat

• Pattern Growth
  - FP-Growth
  - TopK FP-Growth [GLC 1.6]

Page 37: Pattern Mining: Extracting Value from Log Data

Lots of Generalizations

Source: http://www.philippe-fournier-viger.com/spmf/

Page 38: Pattern Mining: Extracting Value from Log Data

Candidate Generation

Two phases
1. Candidate generation.
2. Candidate filtering.

Exploit Apriori Principle!
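The two phases can be sketched as a level-wise loop in the style of Apriori. This is a simplified illustration, not an optimized or library implementation; it rescans the data once per level.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: generate size-k candidates, then filter by support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    # Level 1: count single items.
    level = {}
    for i in items:
        count = sum(1 for t in transactions if i in t)
        if count >= min_support:
            level[frozenset([i])] = count
    frequent = dict(level)
    k = 2
    while level:
        # Phase 1: candidate generation by joining frequent (k-1)-itemsets.
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Phase 2: candidate filtering with one pass over the data.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "D"}, {"B", "C", "D"}, {"B", "C", "D"},
]
result = apriori(db, min_support=4)
print(len(result))  # 7: B, C, D, BC, BD, CD, BCD
```

Because {A} is infrequent at M = 4, no candidate containing A is ever generated or counted.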

Page 39: Pattern Mining: Extracting Value from Log Data

Candidate Generation

{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?

{A} : ? {B} : ? {C} : ? {D} : ?

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

Page 41: Pattern Mining: Extracting Value from Log Data

Candidate Generation

{AB} : ? {AC} : ? {AD} : ? {BC} : ? {BD} : ? {CD} : ?

{A} : 3 {B} : 4 {C} : 5 {D} : 6

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

Page 44: Pattern Mining: Extracting Value from Log Data

Candidate Generation

{AB} : ? {AC} : ? {AD} : ? {BC} : 4 {BD} : 4 {CD} : 5

{A} : 3 {B} : 4 {C} : 5 {D} : 6

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{B, C, D}

{A, C, D}

{A, B, C, D}

{A, D}

{B, C, D}

{B, C, D}

Page 46: Pattern Mining: Extracting Value from Log Data

Pattern Growth

Two phases
1. Candidate filtering.
2. Conditional database construction.

Avoid full scans over the data & large candidate sets!

Page 47: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{AB} : 1 {AC} : 2 {AD} : 3 {BD} : 4 {CD} : 4

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

{ABC} : 0 {ABD} : 1 {ACD} : 2 {BCD} : 2

{BC} : 2

Page 48: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Preprocessing

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

Page 49: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{AB} : ? {AC} : ? {AD} : ? {BD} : ? {CD} : ?

{A} : ? {B} : ? {C} : ? {D} : ?

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{BC} : ?

Page 50: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{AB} : ? {AC} : ? {AD} : ? {BD} : ? {CD} : ?

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{BC} : ?

Page 52: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{AB} : X {AC} : ? {AD} : ? {BD} : 4 {CD} : ?

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{BC} : 2

Page 53: Pattern Mining: Extracting Value from Log Data

Pattern Growth

{B} : 4

{ } : 6

Call: Growth(db = DB{}, item = B, freq = {B,C,D})

DB{}

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

Page 54: Pattern Mining: Extracting Value from Log Data

Pattern Growth

{B} : 4

{ } : 6

Conditional Database Construction: DB{} → DB{B}

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

{C, D}

{D}

{C, D}

{D}
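The construction of DB{B} can be sketched as a small projection step: keep the transactions that contain B, then drop B itself and any infrequent item. The function name is hypothetical; the frequent items {B, C, D} are taken from the Growth call on the previous slide.

```python
def conditional_db(transactions, item, frequent_items):
    """Keep transactions containing `item`; drop `item` and infrequent items."""
    keep = set(frequent_items) - {item}
    return [t & keep for t in transactions if item in t]

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
db_b = conditional_db(db, "B", frequent_items={"B", "C", "D"})
print(len(db_b))  # 4 transactions remain, the ones listed under DB{B}
```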

Page 55: Pattern Mining: Extracting Value from Log Data

Pattern Growth

{B} : 4

{ } : 6

Candidate Filtering

DB{B}

{C, D}

{D}

{C, D}

{D}

{D} : 4

{C} : 2

DB{}

{B, C, D}

{A, C, D}

{B, D}

{A, C, D}

{B, C, D}

{A, B, D}

DB{B}

Add {BD} as frequent

Page 56: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{C, D}

{D}

{C, D}

{D}

{AB} : X {AC} : ? {AD} : ? {BD} : 4 {CD} : ?

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

{ABC} : ? {ABD} : ? {ACD} : ? {BCD} : ?

{BC} : 2

Page 57: Pattern Mining: Extracting Value from Log Data

Pattern Growth

Recurse: Growth(db = DB{B}, item = D, freq = {D})

DB{B}

{C, D}

{D}

{C, D}

{D}

{B} : 4

{ } : 6

{BD} : 4

DB{BD}

Page 58: Pattern Mining: Extracting Value from Log Data

Pattern Growth - Depth First

{AB} : X {AC} : ? {AD} : ? {BD} : 4 {CD} : ?

{A} : 3 {B} : 4 {C} : 4 {D} : 6

{ } : 6

{ABC} : ? {ABD} : X {ACD} : ? {BCD} : X

{BC} : 2
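Putting candidate filtering and conditional database construction together gives a short recursive sketch of the depth-first walk. It works on plain lists of sets rather than an FP-tree, and it assumes an alphabetical item order in which each conditional database keeps only later items, so that no pattern is produced twice; both choices are simplifications made for illustration.

```python
from collections import Counter

def pattern_growth(transactions, min_support, prefix=frozenset(), found=None):
    """Depth-first pattern growth over conditional (projected) databases."""
    if found is None:
        found = {}
    # Candidate filtering: count items in the current conditional database.
    counts = Counter(i for t in transactions for i in t)
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    for idx, item in enumerate(frequent):
        pattern = prefix | {item}
        found[pattern] = counts[item]
        # Conditional database: transactions containing `item`, restricted to
        # frequent items later in the ordering.
        later = set(frequent[idx + 1:])
        projected = [t & later for t in transactions if item in t]
        pattern_growth(projected, min_support, pattern, found)
    return found

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
found = pattern_growth(db, min_support=4)
for pattern, count in sorted(found.items(), key=lambda pc: sorted(pc[0])):
    print(sorted(pattern), count)
# ['B'] 4
# ['B', 'D'] 4
# ['C'] 4
# ['C', 'D'] 4
# ['D'] 6
```

{BC} has support 2 in DB{B}, so the search never descends to {BCD}, matching the X marks in the lattice.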

Page 59: Pattern Mining: Extracting Value from Log Data

Compare & Contrast

• Candidate Generation
  + Better than brute force
  + Filters candidate sets
  - Multiple passes over the data

• Pattern Growth
  + Fewer passes over the data
  + Space efficient

Page 60: Pattern Mining: Extracting Value from Log Data

Compare & Contrast

• Candidate Generation
  + Better than brute force
  + Filters candidate sets
  - Multiple passes over the data

• Pattern Growth
  + Fewer passes over the data
  + Space efficient

Better choice

Page 61: Pattern Mining: Extracting Value from Log Data

FP-Tree Compression

Figures from Florian Verhein’s slides on FP-Growth
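A rough sketch of where the compression comes from: transactions are sorted by item frequency and inserted into a prefix tree, so transactions that share a prefix share nodes. The class and function names are hypothetical, and the header table and node links of a full FP-tree are omitted.

```python
from collections import Counter

class FPNode:
    """One tree node: an item, the number of transactions through it, children."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    counts = Counter(i for t in transactions for i in t)
    root = FPNode(None)
    for t in transactions:
        # Keep frequent items, most frequent first, so shared prefixes merge.
        items = sorted((i for i in t if counts[i] >= min_support),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = FPNode(item)
            node = node.children[item]
            node.count += 1
    return root

def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
tree = build_fp_tree(db, min_support=4)
print(count_nodes(tree))  # 4 nodes hold the 14 frequent-item occurrences
```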

Page 62: Pattern Mining: Extracting Value from Log Data

FP-Growth Algorithm

Figures from Florian Verhein’s slides on FP-Growth

Two phases
1. Candidate filtering.
2. Conditional database construction.

Page 63: Pattern Mining: Extracting Value from Log Data

TopK FP-Growth Algorithm

Similar to FP-Growth
1. Dynamically raise min_support.
2. Estimates of min_support greatly help.
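The idea of raising min_support on the fly can be sketched with a min-heap of the best K patterns seen so far: once the heap is full, its smallest support becomes the new threshold and whole branches below it are skipped. This is an illustrative sketch on plain projected databases, not the GraphLab Create implementation; all names are hypothetical.

```python
import heapq
from collections import Counter

def top_k_growth(transactions, k, min_support=1):
    """Pattern growth that keeps only the k most frequent patterns."""
    heap = []  # min-heap of (support, pattern as a tuple of items)
    state = {"min_support": min_support}

    def grow(db, prefix):
        counts = Counter(i for t in db for i in t)
        # Visit high-support items first so the threshold rises quickly.
        items = sorted(counts, key=lambda i: (-counts[i], i))
        for idx, item in enumerate(items):
            if counts[item] < state["min_support"]:
                break  # remaining items have even lower support
            pattern = prefix + (item,)
            heapq.heappush(heap, (counts[item], pattern))
            if len(heap) > k:
                heapq.heappop(heap)
            if len(heap) == k:
                # Dynamically raise min_support to the k-th best support so far.
                state["min_support"] = max(state["min_support"], heap[0][0])
            later = set(items[idx + 1:])
            grow([t & later for t in db if item in t], pattern)

    grow([set(t) for t in transactions], ())
    return sorted(heap, reverse=True)

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
result = top_k_growth(db, k=3)
print([support for support, _ in result])  # [6, 4, 4]
```

A good initial estimate of min_support helps because the search starts pruning before the heap has filled up.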

Page 64: Pattern Mining: Extracting Value from Log Data

Performance on Website Logs

• 1.5m events
• 84k sessions
• 3k unique ids

Page 65: Pattern Mining: Extracting Value from Log Data

Future Work

Page 66: Pattern Mining: Extracting Value from Log Data

Distributed FP-Growth

Partition database on item-ids.

Database
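One possible reading of "partition on item-ids", offered only as a sketch: every frequent item owns its conditional database, and those databases are spread across workers so each can be mined independently. The round-robin assignment, the alphabetical item order, and all names below are assumptions, not a description of a planned design.

```python
from collections import Counter, defaultdict

def partition_by_item(transactions, min_support, num_workers):
    """Build one conditional database per frequent item and assign it to a worker."""
    counts = Counter(i for t in transactions for i in t)
    order = sorted(i for i in counts if counts[i] >= min_support)
    rank = {item: r for r, item in enumerate(order)}
    shards = defaultdict(lambda: defaultdict(list))
    for t in transactions:
        items = sorted((i for i in t if i in rank), key=rank.get)
        for pos, item in enumerate(items):
            worker = rank[item] % num_workers  # round-robin on item rank
            # Conditional transaction: the frequent items that come after `item`.
            shards[worker][item].append(set(items[pos + 1:]))
    return shards

db = [
    {"B", "C", "D"}, {"A", "C", "D"}, {"B", "D"},
    {"A", "C", "D"}, {"B", "C", "D"}, {"A", "B", "D"},
]
shards = partition_by_item(db, min_support=4, num_workers=2)
print(sorted(shards[0]), sorted(shards[1]))  # ['B', 'D'] ['C']
```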

Page 67: Pattern Mining: Extracting Value from Log Data

Bags + Sequences

× 2

Itemset: {Item}

Bags: {Item: quantity}

Sequences : (item)
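The three representations differ mainly in what "a transaction contains a pattern" means. A sketch of the three containment checks; treating sequence containment as an ordered but not necessarily contiguous subsequence is an assumption, as are the helper names.

```python
from collections import Counter

def contains_itemset(transaction, pattern):
    """Itemset: every item of the pattern is present."""
    return set(pattern) <= set(transaction)

def contains_bag(transaction, pattern):
    """Bag: every item is present with at least the required quantity."""
    have, need = Counter(transaction), Counter(pattern)
    return all(have[item] >= qty for item, qty in need.items())

def contains_sequence(transaction, pattern):
    """Sequence: the pattern's items appear in the same order."""
    remaining = iter(transaction)
    return all(item in remaining for item in pattern)

print(contains_itemset({"latte", "bagel"}, {"latte"}))        # True
print(contains_bag({"latte": 2, "bagel": 1}, {"latte": 3}))   # False
print(contains_sequence(("A", "B", "C"), ("A", "C")))         # True
print(contains_sequence(("A", "B", "C"), ("C", "A")))         # False
```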

Page 68: Pattern Mining: Extracting Value from Log Data

Model built, now what?

Page 69: Pattern Mining: Extracting Value from Log Data

Creating a model pipeline

Ingest Transform Model Deploy Unstructured Data

exploration

data

modeling

Data Science Workflow

Ingest Transform Model Deploy

Page 70: Pattern Mining: Extracting Value from Log Data

Demo

Page 71: Pattern Mining: Extracting Value from Log Data

Summary

Log Data Mining ≠ Rocket Science

• FP-Growth for finding frequent patterns.
• Find rules from patterns to make predictions.
• Extract features for useful ML in pattern space.

Page 72: Pattern Mining: Extracting Value from Log Data

SELECT questions FROM audience WHERE difficulty == “Easy”

Thanks!

Page 73: Pattern Mining: Extracting Value from Log Data

Extra Slides