Transcript
Page 1: Anomaly Detection in Data Docent Xiao-Zhi Gao xiao-zhi.gao@aalto.fi


Page 2

Outline

• Introduction

• Anomaly detection in data

• Negative Selection Algorithm (NSA)

• Application examples

• Conclusions

Page 3

What are Anomalies?

• An anomaly is a pattern in the data that does not conform to the expected behavior

• Anomalies are also referred to as outliers, exceptions, peculiarities, surprises, etc.

• Anomalies translate to significant (often critical) real-life entities
  – Credit card fraud
  – Cyber intrusions

Page 4

Simple Example

• N1 and N2 are regions of normal behavior

• Points o1 and o2 are anomalies

• Points in region O3 are anomalies

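[Figure: 2-D scatter plot with axes X and Y, showing the normal regions N1 and N2, the isolated anomalies o1 and o2, and the anomalous region O3]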
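A minimal sketch of the idea behind this example, assuming (hypothetically) that each normal region can be summarized by a center and a radius. The coordinates and the helper `is_anomaly` are invented for illustration and are not taken from the slide.

```python
import math

# Hypothetical normal regions as (center_x, center_y, radius); values are illustrative only.
NORMAL_REGIONS = {"N1": (2.0, 8.0, 1.5), "N2": (7.0, 3.0, 2.0)}

def is_anomaly(point, regions=NORMAL_REGIONS):
    """Return True when the point lies outside every normal region."""
    x, y = point
    for cx, cy, r in regions.values():
        if math.hypot(x - cx, y - cy) <= r:
            return False
    return True

# Two points inside the normal regions and two isolated points (like o1 and o2).
points = {
    "inside N1": (2.5, 8.2),
    "inside N2": (6.0, 2.5),
    "o1": (8.0, 9.0),
    "o2": (5.0, 6.0),
}
labels = {name: is_anomaly(p) for name, p in points.items()}
```

Real data rarely offers such clean region boundaries, which is one of the key challenges discussed later in the deck.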

Page 5

Simple Example

Page 6

Simple Example

Page 7

Real World Anomalies

• Credit Card Fraud
  – An abnormally high purchase made on a credit card

• Cyber Intrusions
  – A web server involved in FTP traffic

Page 8

Applications of Anomaly Detection

• Network intrusion detection

• Insurance / Credit card fraud detection

• Healthcare informatics / Medical diagnostics

• Industrial damage detection

• Image processing / Video surveillance

• Novel topic detection in text mining

• …

Page 9

Key Challenges

• Defining a representative normal region is challenging

• The boundary between normal and outlying behavior is often not precise

• The exact notion of an outlier is different for different application domains

• Data might contain noise

• Normal behavior keeps evolving

Page 10

Artificial Immune Systems (AIS)

• Artificial Immune Systems (AIS) are an emerging class of soft computing methods
  – Inspired by natural immune systems
  – Features of pattern recognition, anomaly detection, data analysis, machine learning, etc.

• Negative Selection Algorithm (NSA) is an important member of the AIS family
  – Maturation of T cells and self/nonself discrimination
  – Developed by Forrest in 1994

• NSA is applied to deal with anomaly detection in data

Page 11

Biological Information Processing Systems

• Genetic System

• Endocrine System

• Brain System

• Immune System

Page 12

Natural Immune System

Page 13

Natural Immune System

[Figure: layered defenses against pathogens: skin, biochemical barriers, innate immune response, and adaptive immune response (lymphocytes)]

Page 14

How Does Natural Immune System Work?

Page 15

Negative Selection Algorithm (NSA)

• The immune system (B and T cells) is capable of distinguishing self from nonself
  – Negative censoring of T cells in the thymus

• Negative Selection Algorithm (NSA) mimics this mechanism of the immune system
  – 1. Define self samples (representative samples)
  – 2. Generation of detectors (binary and real-valued)
  – 3. Negative selection of detectors
  – 4. Employment of detectors in anomaly detection
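The four steps above can be sketched for the real-valued case as follows. This is only an illustration under stated assumptions: detectors are points in the unit square, matching means "Euclidean distance within a fixed radius", and the function names, radius, sample counts, and self cluster are hypothetical rather than taken from the slides.

```python
import math
import random

def generate_detectors(self_samples, n_detectors, radius, dim, rng, max_tries=100_000):
    """Steps 2-3: draw random candidates and keep only those that match no self sample."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        # Negative selection: discard a candidate lying within `radius` of any self sample.
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors

def is_anomalous(sample, detectors, radius):
    """Step 4: a sample is flagged when at least one detector matches it."""
    return any(math.dist(sample, d) <= radius for d in detectors)

rng = random.Random(0)
# Step 1: self samples, here a hypothetical cluster around (0.5, 0.5).
self_samples = [
    [0.5 + rng.uniform(-0.1, 0.1), 0.5 + rng.uniform(-0.1, 0.1)] for _ in range(50)
]
detectors = generate_detectors(self_samples, n_detectors=200, radius=0.1, dim=2, rng=rng)
self_flags = [is_anomalous(s, detectors, radius=0.1) for s in self_samples]
```

By construction no self sample used during censoring can activate a detector, so `self_flags` contains only `False`. How well nonself points are covered depends on the number and placement of detectors.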

Page 16

Negative Selection Algorithm (NSA)

Generation of NSA Detectors

Page 17

Negative Selection Algorithm (NSA)

Anomaly Detection Using NSA

Page 18

Self and Nonself Samples

Self Samples

Nonself Samples

Page 19

Self and Nonself Coverage in NSA

Page 20

Self and Nonself Samples

[Figure: self and nonself samples in the unit square; both axes range from 0 to 1]

Page 21

Two Examples of Random Detector Groups

[Figure: two panels, each showing a randomly generated detector group in the unit square; both axes range from 0 to 1]

Page 22

Two Examples of Optimized Detector Groups (Gao, 2004)

[Figure: two panels, each showing an optimized detector group in the unit square; both axes range from 0 to 1]

Page 23

Distribution of Fisher’s iris Data in Sepal Length-Sepal Width Dimensions

[Figure: scatter plot of Sepal Width (2 to 4.5) versus Sepal Length (4 to 8) for setosa, versicolor, and virginica]

Page 24

Distribution of Fisher’s iris Data in Petal Length-Petal Width Dimensions

[Figure: scatter plot of Petal Width (0 to 2.5) versus Petal Length (0 to 7) for setosa, versicolor, and virginica]

Page 25

Detector Generation in NSA (Gao, 2006)

Page 26

Anomaly Detection in Fisher’s iris Data with NSA Detectors

Page 27

Anomaly Detection in Chaotic Time Series

• Mackey-Glass chaotic time series

$$\dot{x}(t) = \frac{a\,x(t-\tau)}{1 + x^{c}(t-\tau)} - b\,x(t)$$

• $\tau$ controls the behavior of the Mackey-Glass time series
  – $\tau = 17$ and $\tau = 30$

• Anomaly detection rate can be improved by neural networks-based NSA
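A minimal sketch of how such a series can be generated numerically, assuming the usual form of the Mackey-Glass delay differential equation, dx/dt = a x(t - τ) / (1 + x(t - τ)^c) - b x(t). The slide does not give a, b, or c; the values a = 0.2, b = 0.1, c = 10 used here are the ones commonly seen in the literature and are an assumption, as are the forward Euler scheme, the step size, and the constant initial history.

```python
def mackey_glass(n_steps, tau=17, a=0.2, b=0.1, c=10, dt=1.0, x0=1.2):
    """Integrate dx/dt = a*x(t-tau)/(1 + x(t-tau)**c) - b*x(t) with forward Euler."""
    delay_steps = int(round(tau / dt))
    # Constant initial history x(t) = x0 for t <= 0 (an assumption of this sketch).
    history = [x0] * (delay_steps + 1)
    for _ in range(n_steps):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = a * x_delayed / (1.0 + x_delayed ** c) - b * x_now
        history.append(x_now + dt * dx)
    # Drop the artificial initial history and return only the simulated samples.
    return history[delay_steps + 1:]

series_17 = mackey_glass(1000, tau=17)
series_30 = mackey_glass(1000, tau=30)
```

A finer step size or a higher-order integrator would give a more accurate trajectory; the coarse Euler step is kept here only for brevity.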

Page 28

Mackey-Glass Time Series

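[Figure: Mackey-Glass time series; the panels are labeled 30 and 17 (the delay values), with a segment marked "Fresh Data"]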

Page 29

Anomaly Detection in Mackey-Glass Time Series Using Adaptive NSA

Before Training: M = 57, L = 9, detection rate 86%

After Training: M = 31, L = 1, detection rate 97%

Page 30

NSA-based Motor Fault Detection

• Monitoring the working conditions of running motors is necessary to keep them in normal operation

• Anomalies in the feature signals acquired from faulty motors are caused by the faults

• Motor fault detection can therefore be cast as a typical anomaly detection problem

Page 31

Normal, Abnormal, and Faulty Feature Signals of Motors

Normal Feature Signal

Abnormal Feature Signal

Faulty Feature Signal

Page 32

NSA in Motor Fault Detection

[Figure: two-phase block diagram]

Detector Generation Phase: Feature Signals from Healthy Motors → Signal Preprocessing → Detector Generation → Detectors

Fault Detection Phase: Feature Signals from Operating Motors → Signal Preprocessing → Anomaly Detection (using the Detectors) → Fault Detection
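The two phases can be sketched end to end on synthetic data. Everything specific below is a hypothetical stand-in: the sine wave playing the healthy feature signal, the scaled copy playing the operating motor, the window size, the radius, and the helper names `windows` and `activated_count`. Preprocessing is reduced to cutting the signal into fixed-size windows.

```python
import math
import random

def windows(signal, size):
    """Split a signal into consecutive non-overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def activated_count(window, detectors, radius):
    """Number of detectors whose Euclidean distance to the window is within `radius`."""
    return sum(1 for d in detectors if math.dist(window, d) <= radius)

rng = random.Random(1)
size, radius = 4, 0.6

# Hypothetical preprocessed feature signal from a healthy motor.
healthy = [math.sin(2 * math.pi * k / 16) for k in range(320)]
self_windows = windows(healthy, size)

# Detector generation phase: random candidates are censored against the healthy windows.
detectors = []
while len(detectors) < 50:
    candidate = [rng.uniform(-2.0, 2.0) for _ in range(size)]
    if all(math.dist(candidate, s) > radius for s in self_windows):
        detectors.append(candidate)

# Fault detection phase: count activated detectors for each window of an operating motor.
operating = [1.5 * h for h in healthy]  # hypothetical signal with an abnormal amplitude
healthy_counts = [activated_count(w, detectors, radius) for w in self_windows]
operating_counts = [activated_count(w, detectors, radius) for w in windows(operating, size)]
```

The healthy windows activate no detectors by construction. A fault would be reported when windows of the operating signal activate detectors, and how often that happens depends on how well the random detectors cover the nonself space.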

Page 33

NSA-based Motor Fault Detection

• Anomaly in feature signals is caused by faults
  – Healthy feature signals: self
  – Faulty feature signals: nonself

• Neural networks are combined with NSA

• NSA detectors are built on the structure of BP neural networks

• The neural network training algorithm is applied

Page 34

Neural Networks-based NSA (Gao, 2010)

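[Figure: three-layer network (Layer 1, Layer 2, Layer 3) with inputs $x_1, x_2, \dots, x_N$, weights $w_1, w_2, \dots, w_N$, activation nodes $f$, weights $v_1, v_2, \dots, v_N$, output $y$, and matching error $E$]

$$d_i = (x_i - w_i)^2$$

$$y = \sum_{i=1}^{N} v_i\, f(d_i) = \sum_{i=1}^{N} v_i\, f\big((x_i - w_i)^2\big)$$

$$E = y$$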
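A minimal sketch of the forward computation of one such detector, assuming the structure E = y = Σ v_i f((x_i - w_i)²). The slide does not say which activation f is used, so `math.tanh` is a stand-in; the assignment of operations to the three layers, the function name `matching_error`, and the weight values are likewise assumptions.

```python
import math

def matching_error(x, w, v, f=math.tanh):
    """Compute E = y = sum_i v_i * f((x_i - w_i)**2) for one detector."""
    if not (len(x) == len(w) == len(v)):
        raise ValueError("x, w and v must have the same length N")
    d = [(xi - wi) ** 2 for xi, wi in zip(x, w)]            # squared differences d_i
    activations = [f(di) for di in d]                       # nonlinearity f (assumed tanh)
    return sum(vi * ai for vi, ai in zip(v, activations))   # weighted sum with v_i

w = [0.2, 0.4, 0.6]  # hypothetical detector weights
v = [1.0, 1.0, 1.0]
e_match = matching_error([0.2, 0.4, 0.6], w, v)  # input identical to w
e_far = matching_error([0.9, 0.1, 0.0], w, v)    # input far from w
```

With this choice of f, an input equal to w gives a matching error of zero, and the error grows as the input moves away from w.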

Page 35

Training of Neural Networks-based NSA

[Figure: Matching Error (E) plotted against Weights (w, v); the curve is labeled E(w, v) and the point (w*, v*) is marked]

Page 36

Margin Training Strategy of NSA Detectors

• Case 1 (for faulty plant feature signals only): if [condition on E garbled in the source: "E0"], detectors are trained using the normal BP learning algorithm to decrease E; otherwise, no training is employed

• Case 2 (for healthy plant feature signals only): if [condition on E garbled in the source: "0 E"], detectors are trained using the 'positive' learning algorithm to increase E; otherwise, no training is employed
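The two cases can be sketched as gradient steps on a detector of the form E = Σ v_i f((x_i - w_i)²). The exact conditions on E did not survive in the slide text, so the thresholds `lam` and `gamma` below are hypothetical placeholders. The activation (tanh), the learning rate, and the function names are also assumptions, not the author's settings.

```python
import math

def forward(x, w, v):
    """Return the matching error E and the squared differences d_i."""
    d = [(xi - wi) ** 2 for xi, wi in zip(x, w)]
    return sum(vi * math.tanh(di) for vi, di in zip(v, d)), d

def train_step(x, w, v, direction, lr=0.05):
    """One gradient step on (w, v); direction=-1 decreases E, direction=+1 increases E."""
    _, d = forward(x, w, v)
    new_w, new_v = [], []
    for xi, wi, vi, di in zip(x, w, v, d):
        dE_dv = math.tanh(di)
        dE_dw = vi * (1.0 - math.tanh(di) ** 2) * (-2.0) * (xi - wi)
        new_w.append(wi + direction * lr * dE_dw)
        new_v.append(vi + direction * lr * dE_dv)
    return new_w, new_v

def margin_train(x, w, v, is_faulty, lam=0.1, gamma=0.5, max_steps=200):
    """Case 1 (faulty): lower E while E > lam. Case 2 (healthy): raise E while E < gamma.

    lam and gamma are hypothetical thresholds chosen for this sketch.
    """
    for _ in range(max_steps):
        e, _ = forward(x, w, v)
        if is_faulty and e > lam:
            w, v = train_step(x, w, v, direction=-1)   # ordinary descent on E
        elif not is_faulty and e < gamma:
            w, v = train_step(x, w, v, direction=+1)   # 'positive' learning: ascent on E
        else:
            break  # inside the margin already: no training is employed
    return w, v

w0, v0 = [0.1, 0.9], [1.0, 1.0]
x_faulty, x_healthy = [0.5, 0.2], [0.12, 0.88]

f_before, _ = forward(x_faulty, w0, v0)
f_after, _ = forward(x_faulty, *margin_train(x_faulty, w0, v0, is_faulty=True))

h_before, _ = forward(x_healthy, w0, v0)
h_after, _ = forward(x_healthy, *margin_train(x_healthy, w0, v0, is_faulty=False))
```

In this sketch a detector is pulled toward faulty samples and pushed away from healthy ones, and samples that already satisfy the margin leave the weights untouched.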

Page 37

Margin Training of NSA Detectors in Fault Detection

[Figure: detector outputs y plotted for the detectors, with Training Region I and Training Region II marked]

Page 38

Inner Raceway Fault Detection of Bearings

• Bearings are important components in rotating machinery

• A defect on the inner raceway is a common and typical fault of bearings

• Fault detection is based on vibration signals of bearings
  – A sensor mounted on eight-ball bearings with a motor rotation speed of 1,782 rpm

Page 39

Inner Raceway Fault of Bearings

Page 40

Feature Signals of Healthy and Faulty Bearings

Healthy Bearings

Faulty Bearings

Page 41

Fault Detection Rate of Neural Networks-based NSA

Before Training: M = 27, L = 7, fault detection rate 79%

After Training: M = 15, L = 0, fault detection rate 100%

Page 42

Motor Fault Detection using NSA (Gao, 2012)

• Two kinds of motor faults are considered here
  – Rotor fault
  – Stator fault

• Stator current signals are used as feature signals

• Both healthy and faulty motors are running with/without varying loads

• Fault detection rate is 100%


Page 43

Feature Signals for Motor Fault Detection

[Figure: Stator Current (-150 to 150) versus Time in Seconds (0.5 to 1) for the Healthy Motor]

Page 44

Feature Signals for Motor Fault Detection

[Figure: Stator Current (-200 to 150) versus Time in Seconds (0.5 to 1) for the Broken Rotor]

Page 45

Feature Signals for Motor Fault Detection

[Figure: Stator Current (-150 to 150) versus Time in Seconds (0.5 to 1) for the Broken Stator]

Page 46

Motor Fault Detection Results

[Figure: Number of Activated Detectors (0 to 8) versus Number of Signal Windows (0 to 500) for the Healthy Motor]

Page 47

Motor Fault Detection Results

[Figure: Number of Activated Detectors (0 to 8) versus Number of Signal Windows (0 to 500) for the Broken Rotor]

Page 48

Motor Fault Detection Result

[Figure: Number of Activated Detectors (0 to 8) versus Number of Signal Windows (0 to 500) for the Broken Stator]

Page 49

Detection Results of Rotor and Stator Faults

Page 50

Current Signals of Healthy Motor with Four Different Loads

[Figure: four panels (a) to (d), one per load, each plotting Rotor Current versus Time in Seconds (axis 0 to 10000)]

Page 51

Current Signals of Faulty Motor with Four Different Loads

[Figure: four panels (a) to (d), one per load, each plotting Rotor Current versus Time in Seconds (axis 0 to 10000)]

Page 52

Numbers of Activated NSA Detectors for Healthy Motor

[Figure: four panels (a) to (d), each plotting Number of Activated Detectors versus Number of Signal Windows (0 to 500) for the Healthy Motor]

Page 53

Numbers of Activated NSA Detectors for Faulty Motor

[Figure: four panels (a) to (d), each plotting Number of Activated Detectors versus Number of Signal Windows (0 to 500) for the Faulty Motor]

Page 54

Fault Detection Rates of Faulty Motors with Different Loads (Gao, 2013)

Page 55

Conclusions

• Anomaly detection in data is an important topic

• NSA can be used for anomaly detection based on only the normal data

• A few application examples have demonstrated the effectiveness of the NSA in anomaly detection

• Performance comparisons need to be made between the NSA and other anomaly detection methods, e.g., Support Vector Machine (SVM)

