49
How Machine Learning Detect Anomalies ? Anomaly is when you don't fit the expected norm. Like "wait what is this?! It doesn't belong here." Its what the system never planned for but now has to adapt to. Its Neo in The Matrix. - LeCrae - 2005 Big Data ParisMarch 2017

How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

How Machine

Learning

Detect Anomalies ? Anomaly is when you don't fit

the expected norm. Like "wait

what is this?! It doesn't belong

here." Its what the system

never planned for but now has

to adapt to. Its Neo in The

Matrix.

- LeCrae - 2005

Big Data Paris– March 2017

Page 2: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

WHAT IS

NORMALITY

?

Page 3: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

ORIGINAL

Page 4: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

OUTSTANDING

Page 5: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

SLIGHTLY

ODD

Page 6: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

UGLY

Page 7: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

HACKING CPC

Flo: Hi!

CPC: Hello, how are you?

Flo: My name is Florian

CPC: Nice to meet you.

Flo: I’m learning BASIC

CPC: Do you know GOSUB?

Page 8: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

ODDITY ROBUST?

Flo: Hi !

CPC: Hello, how are you ?

Flo: I'm great, I just ate a dog !

CPC: Oops, did I wake you up?

Flo: No, i was just dining.

CPC: I might be your friend named

Andrew.

Flo: What ?

CPC: What is your favorite name?

Flo: What ???

Page 9: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

TRUE A.I.

WILL

UNDERSTAND

ABNORMAL

Page 10: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

The Counting Way

Edgeworth 1887

Discordant observations may be

defined as those which present the

appearance of differing in respect of

their law of frequency from other

observations with which they are

combined.

Page 11: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

The Empirical Way

Page 12: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

The Statistical Way

Grubbs' test for outliers (1950)An outlying observation, or outlier, is one that

appears to deviate markedly from the other

member of the sample in which it occurs.

Page 13: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

The Machine Learning Way

Supervisedrare class

mining

Semi-Supervisednovelty

detection

UnsupervisedOutlier

detection

Page 14: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

SUPERVISED ANOMALY DETECTION

Page 15: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Fraud @ AMADEUS

Time limit churningby taking advantage of various functionalities, agents are able to

lock the booking of a seat for an unlimited time without issuing

and paying a ticket.

This gives them the possibility to offer an unlimited reflection

period to their customers without the usual price increase.

Frequent flyer abuseabusive use of frequent flyer cards to be granted higher

privileges.

Page 16: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Supervised Anomaly Detection

Data available with good and bad labels

Bad Labels are Rare

”Rare Class Mining”

We assume that any new anomaly will

be similar to some past anomaly

Page 17: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Learning With Unbalanced Data

https://pdfs.semanticscholar.org/239b/2210b3fbc1f4b8246437a88a668bf9a0d2c0.pdf

An overview of classification algorithms for imbalanced datasets, Vaishali Ganganwar

OversamplingGenerate Synthetic

Examples from the

Identified anomaly

UndersamplingSelect a subset

Of the original Data

Cost Sensitive LearningTake Misclassification

Costs to minimize

financial cost

Page 18: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

NOVELTY DETECTION

Page 19: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Detect The Verge Of The Storm

Page 20: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Novelty Detection

Data Available with Only Normal Labels

Detect abnormalities among new

observations

Time

Page 21: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Density Estimates

Gaussian Mixture

Anomaly/Novelty detection with scikit-learn

Alexandre Gramfort

Assumption

Independent and Identically

Distributed Variables

Page 22: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

http://fr.slideshare.net/agramfort/anomalynovelty-detection-with-scikitlearn

Page 23: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

One Class SVM

Page 24: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Detect Abnormal Network Activitytypical proportion of anomalies is 1 − 0.1%

0.5 million data points → 1000 anomalies

Page 25: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Rare Mining: Fraud Detection

STAKE : 13 B$ Per Year (US, 2015)

~0.04% Of Transaction Volume

(compared to 1.60% transaction fee)

Page 26: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Sequence Anomaly DetectionUniversal Probability Assigment

Universal Anomaly Detection: Algorithms and Applications

Page 27: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Markov Chain

First Letter:

Transaction Amount. Low (L) or High (H)

Second Letter:

Time Between Transaction. Low (L) or High(H)

LL : Small Transaction, Shortly After the previous one

LH: Small Transaction, Long after the previous one

Etc..

Learn Transition Probabilities on sample data

Identify Sequence that do not match

Page 28: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

OUTLIER DETECTION

Page 29: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Particle Physics

Typical proportion of anomalies is 10-4 %

2 million data points → 100 anomalies

Page 30: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Outlier Detection

No Labels available whatsoever

The only information whatsoever is that

labels are ”rare” and ”isolated” in a

sense to be determined w.r.t the

remaining of the data

Page 31: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Damage Detection

Energy with ”predictive” preventive maintenance programreduce their pump costs by 30%

https://www.rolandberger.com/publications/publication_pdf/roland_berger_predictive_maintenance_20141215.pdf

Roland Berger Report on Predictive Maintenance – Novembre 2014

Life Cycle CostsClassic Preventive

Maintenance

Predictive Preventive

Maintenance

Initial Cycle Costs $20,600 $20,600

Installation Costs $83,000 $83,000

Pump Maintenance Costs $25,000 $16,000

Other Maintenance Costs $6,000 $2,000

Total Life Cycle Costs $134,600 $121,600

Page 32: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Anomaly DetectionCluster Based Detectors

Page 33: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Anomaly DetectionDensity Based Detectors

Compare Density around a poin t

With the density of its neighbours

Page 34: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Isolation Forest

Page 35: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Isolation Forest

Representative subset selection and outlier detection via isolation forest

Wo-Ruo Chen,a Yong-Huan Yun,a Ming Wen,a Hong-Mei Lu,a Zhi-Min

Zhang*a and Yi-Zeng Liang*

Page 36: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Security

Page 37: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Fake Reviews / Fake News

Opinion Fraud Detection in Online Reviews by Network Effects

Page 38: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Graph Analytics

http://www3.cs.stonybrook.edu/~leman/pubs/14-dami-graphanomalysurvey.pdf

Community Based

Assign nodes into communities

and detect nodes belonging to no

communities

Relational Learning.

Learn Using Neighbor as a feature

Structured Based

Find rare substructure in the graph

Page 39: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Anomaly Detection : In Crowd

http://www.svcl.ucsd.edu/projects/anomaly/

STAKE:

Page 40: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Medical Imaging

http://ots.fh-brandenburg.de/downloads/abschlussarbeiten/2016-10-14%20pl_tobias_meyer.pdf

Hyper Parameter Selection for Anomaly Detection With Stack Autoencoders – a Deep Learning Application

Page 41: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

ElectrocardiogramsElectroencephalograms

Page 42: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Deep Learning: LTSM, Reconstruction Error

Page 43: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Deep Learning : Reconstruction Error

http://radar.oreilly.com/2014/07/new-approaches-to-anomaly-detection.html

Ellen Friedman

Page 44: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

SOLVE BUSINESS CRITICAL PROBLEMS

Solve the XXX Remaining Fraud

Get the next ”9” in product quality

Keep us Safe

Page 45: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

ANOMALY DETECTION BECAME ROBUST

Robust New ”General Purpose” Techniques (Isolation Forest)

Robust Specific Algorithms (Sequence Mining / Graph Mining)

Deep Learning To the Rescue

Page 46: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Machine Learning Challenge #1

ENDURE

Page 47: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Machine Learning Trade-offs

Interpretability Performance Self-Adaptation

Page 48: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Future with Deep LearningLearn when you might be wrong ?

Representation layer

High Reconstruction Error = Anomaly

Classification / Regression

Page 49: How Machine Learning Detect Anomalies ? LeCrae - 2005€¦ · detection Unsupervised Outlier detection. SUPERVISED ANOMALY DETECTION. Fraud @ AMADEUS Time limit churning by taking

Thank you !