
Novelty Detection in Data Streams

Profa. Elaine Faria UFU - 2018

• Slides based on the papers:

– FARIA, ELAINE R.; GONÇALVES, ISABEL J. C. R.; DE CARVALHO, ANDRÉ C. P. L. F.; GAMA, JOÃO. Novelty detection in data streams. Artificial Intelligence Review, v. 45, p. 235-269, 2016.

– FARIA, ELAINE RIBEIRO; PONCE DE LEON FERREIRA CARVALHO, ANDRÉ CARLOS; GAMA, JOÃO. MINAS: multiclass learning algorithm for novelty detection in data streams. Data Mining and Knowledge Discovery, v. 30, p. 640-680, 2016.

– FARIA, ELAINE; GONÇALVES, ISABEL; GAMA, JOÃO; PONCE DE LEON FERREIRA CARVALHO, ANDRÉ. Evaluation of Multiclass Novelty Detection Algorithms for Data Streams. IEEE Transactions on Knowledge and Data Engineering, v. 27, p. 2961-2973, 2015.

Introduction

• Novelty Detection (ND) – Definitions

– Novelty detection is concerned with identifying abnormal system behaviours and abrupt changes from one regime to another (Lee and Roberts 2008)

– The recognition that an input differs in some respect from previous inputs (Perner 2008)

– Novelty detection makes it possible to recognize novel concepts, which may indicate the appearance of a new concept, a change in known concepts or the presence of noise (Gama 2010)

Introduction

• Novelty detection

– is useful in cases where an important class is under-represented in the training set

– is an important task, since, for many problems, we never know whether the currently available training data include all possible object classes

– allows the recognition of novel profiles (concepts) in unlabeled data

Introduction

• Novelty Detection – Challenges

– Concept drift

– Noise and outliers

– Recurring concepts

– Concept evolution
• The number of problem classes increases over time

Introduction

• Data stream applications for ND
– Intrusion detection
– Fraud detection
– Medical diagnosis
– Detection of regions of interest in images
– Fault detection
– Spam filtering
– Text classification
– ...

Introduction

• It is important to distinguish
– Anomaly detection
– Outlier detection
– Novelty detection

Introduction

• Novelty, anomaly and outlier detection are all concerned with finding patterns that differ from the normal (usual) behavior
– Anomaly and outlier detection convey the idea of an undesired pattern
– Novelty indicates an emergent or new concept that needs to be incorporated into the normal pattern

Novelty detection - Formalization of the Problem

Training set (Offline Phase)
Dtr = {(X1, y1), (X2, y2), …, (Xm, ym)}

Xi: vector of input attributes of the i-th example
yi: target attribute, yi ∈ Ytr, with Ytr = {c1, c2, …, cL}

When new data arrive (Online Phase)
Yall = {c1, c2, …, cL, …, cK}, K > L
Goal: classify Xnew into one of the classes in Yall

Novelty detection - Phases

• Offline Phase
– Induces a classifier from a set of labeled examples → known concept of the problem

• Online Phase
– Classifies new unlabeled examples
– Identifies novelty patterns
– Updates the decision model

Offline Phase – Taxonomy (figure)

Offline Phase

• Learning task
– Unsupervised approaches
• Assume that all the examples from the training set belong to the normal concept

– Supervised approaches
• Use the labels of the examples to build the decision model
• The normal concept is composed of a set of different classes

Online Phase

• Tasks
– Classification of new examples
– Detection of novelty patterns
– Adaptation of the decision model
• Some algorithms update the decision model in an offline fashion

Online Phase – Classification (figure)

Online Phase

• Classification
– Verifies whether a new example can be explained by the current decision model
– Approach 1
• Classify new examples only as normal or novelty
– Approach 2
• Treat the problem as a multiclass classification task

Online Phase – Classification taxonomy (figure)

Online Phase

• Classification with unknown label option
– Examples not explained by the current decision model are not immediately classified
• They are assigned an unknown profile
– They are put in a short-term memory for future analysis
• Used to update the decision model: extensions and novelty patterns (see the sketch below)
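A minimal sketch of this classify-or-unknown step, assuming a distance-based decision model in which each known cluster has a centroid, a label and a decision radius; the class and attribute names are illustrative, not taken from any specific algorithm:

```python
import numpy as np

class UnknownAwareClassifier:
    """Classify with an unknown-label option; buffer unexplained examples."""

    def __init__(self, centroids, labels, radii):
        self.centroids = np.asarray(centroids, dtype=float)
        self.labels = list(labels)               # class label per cluster
        self.radii = np.asarray(radii, dtype=float)
        self.short_term_memory = []              # unexplained examples

    def classify(self, x):
        x = np.asarray(x, dtype=float)
        dists = np.linalg.norm(self.centroids - x, axis=1)
        nearest = int(np.argmin(dists))
        if dists[nearest] <= self.radii[nearest]:
            return self.labels[nearest]          # explained by the model
        self.short_term_memory.append(x)         # kept for future analysis
        return "unknown"
```

The buffered examples can later be clustered to mine extensions and novelty patterns, as the following slides discuss.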

Online Phase – Detection of novelty patterns (figure)

Online Phase

• Detection of novelty patterns
– Uses unlabeled examples not explained by the current decision model to identify novelty patterns
– Anomaly detection
• The presence of a single example not explained by the model indicates anomalous behavior
– Novelty
• Composed of a cohesive and representative set of examples not explained by the decision model

Online Phase – Detection of novelty patterns: taxonomy (figure)

Online Phase – Update of the decision model (figure)

Online Phase

• Update of the decision model
– Necessary task to address concept drift and concept evolution
– Can be carried out with or without feedback
– Forgetting mechanisms
• An important strategy used to remove outdated concepts

Online Phase – Update of the decision model: taxonomy (figure)

Online Phase

• Update of the decision model: external feedback
– Approach 1: external feedback
• Assumes that the true label of every example becomes available after a delay
• Unrealistic assumption for data streams
– Approach 2: active learning
• Asks the user for the labels of a subset of the examples in the stream
– Approach 3: without feedback
• The decision model is updated without information about the true label of the examples

Online Phase

• Update of the decision model: forgetting mechanism → important to forget previous, outdated concepts
– Approach 1: based on an ensemble of classifiers
• Train a new classifier and replace an old one
– Approach 2: based on clusters
• Clusters that have not received new examples for a long time are removed (sketched below)
– Approach 3: based on weights
• Reduce the weight of old examples
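A sketch of the cluster-based variant (Approach 2), assuming each cluster records the timestamp of the last example it absorbed; the max_age threshold is an illustrative parameter:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Cluster:
    centroid: np.ndarray
    last_update: int    # timestamp of the last example the cluster absorbed

def forget_outdated(clusters, now, max_age=1000):
    """Drop clusters that received no examples in the last max_age steps."""
    return [c for c in clusters if now - c.last_update <= max_age]
```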

Detection of recurring concepts

• Recurring concepts: definition
– The class definitions may change when previous situations recur, in a periodic or random way, after some period of time (Elwell and Polikar 2011)
– A special type of concept drift where concepts that appeared in the past may recur in the future (Katakis et al. 2010)

Detection of recurring concepts

• Recurring contexts: examples
– Climate change
– Electricity demand
– Buyer habits
– ...

Detection of recurring concepts

• It would be a waste of effort to relearn an old concept from scratch at each recurrence (Widmer and Kubat 1996)
– In recurring contexts
• Instead of forgetting outdated concepts, these concepts should be saved and reexamined at some later time, when they can improve the prediction performance in a cost-effective way

Detection of recurring concepts

• Systems that do not address recurring concepts treat them as novelties
– Undesirable effects
• Increase in the false alarm rate
• Increase in the human effort spent analyzing the false alarms
• Computational effort spent executing a novelty detection task and learning a new class that had already been learned

Detection of recurring concepts

• Approaches
– Approach 1: use an auxiliary ensemble of classifiers that detects recurring classes
– Approach 2: use c ensembles, one per class
• Each ensemble is never deleted, only updated
• c is the number of classes seen so far in the stream
– Approach 3: use a sleep memory to store clusters that have not been used to classify new examples for a long time (sketched below)
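A sketch of the sleep-memory idea (Approach 3), representing clusters as dictionaries with a centroid and a last-update timestamp; the max_age and match_dist thresholds, and the matching rule itself, are illustrative assumptions:

```python
import numpy as np

def split_active(active, now, max_age=1000):
    """Move clusters idle for more than max_age into the sleep memory."""
    awake  = [c for c in active if now - c["last_update"] <= max_age]
    asleep = [c for c in active if now - c["last_update"] >  max_age]
    return awake, asleep

def match_sleeping(candidate_centroid, sleep_memory, match_dist=0.5):
    """Return a sleeping cluster close to a novelty candidate, if any:
    a recurring concept is then reused instead of being relearned."""
    candidate = np.asarray(candidate_centroid, dtype=float)
    for c in sleep_memory:
        if np.linalg.norm(candidate - np.asarray(c["centroid"])) <= match_dist:
            return c
    return None
```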

Treatment of Outliers

• Outliers
– Data that are isolated, sparse and not present in representative numbers
• Novelty detection algorithms
– Look for cohesive and representative sets of examples
– Must address the treatment of noise and outliers, which can be confused with the appearance of a new concept or a change in the known concepts

Treatment of Outliers

• Approach for outlier treatment, used by the MCM, ECSMiner, MINAS and OLINDDA algorithms (sketched below)
– Store the examples not explained by the current model in a temporary memory
– Cluster these examples
– Apply validation criteria to the clusters
• Examples of validation criteria: cohesiveness, representativeness, separability
• Invalid clusters are potential outliers
– MINAS also proposes removing old examples that stay in the temporary memory for a long time
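A sketch of this validation step using scikit-learn's K-Means; the min_size and max_dispersion criteria stand in for representativeness and cohesiveness, and their values are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def valid_clusters(buffer, k=3, min_size=20, max_dispersion=1.0):
    """Cluster buffered unknowns; keep only cohesive, representative groups."""
    X = np.asarray(buffer, dtype=float)
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    accepted = []
    for j in range(k):
        members = X[km.labels_ == j]
        if len(members) < min_size:
            continue                 # not representative: potential outliers
        spread = np.mean(np.linalg.norm(members - km.cluster_centers_[j], axis=1))
        if spread <= max_dispersion:
            accepted.append((km.cluster_centers_[j], members))
    return accepted                  # candidates for novelties or extensions
```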

Examples of Novelty Detection Algorithms for Data Streams

• ECSMiner (Masud et al. 2011)
• OLINDDA (Spinosa et al. 2009)
• MINAS (Faria et al. 2016)
• MCM (Masud et al. 2010)
• CLAM (Al-Khateeb et al. 2012)

ECSMiner

• Supervised algorithm for concept drift and concept evolution
• The decision model is composed of an ensemble of classifiers
– It assumes that all examples will be labeled after a delay
– Each classification model is trained from a chunk of data
– The ensemble is composed of M models
– The ensemble is continuously updated
• The model with the highest prediction error is replaced by a new model

ECSMiner

• Assumptions
– After Tl timestamps, the true label of an example becomes available
– It is possible to wait Tc timestamps before making a decision about the classification of an example

Tc < Tl

ECSMiner

• Offline Phase
– Supervised
– Ensemble of classifiers
• Decision tree or KNN

• Online Phase
– Use the ensemble to classify new examples
– Store the examples not explained by the ensemble (F-outliers)
– Build clusters from the F-outliers using K-Means
– Calculate the q-NSC measure (q-neighbourhood silhouette coefficient), sketched below
– If most of the classifiers report a positive q-NSC → a novelty is detected
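A sketch of the q-NSC computation following the definition in Masud et al. (2011): a positive value means the F-outlier is, on average, closer to its q nearest F-outlier neighbours than to the q nearest examples of the closest known class. Passing the closest class's examples directly is a simplification, and the function names are ours:

```python
import numpy as np

def q_nsc(x, f_outliers, nearest_class_points, q=50):
    """q-neighbourhood silhouette coefficient of an F-outlier x.
    f_outliers should exclude x itself."""
    x = np.asarray(x, dtype=float)

    def mean_q_dist(pool):
        d = np.sort(np.linalg.norm(np.asarray(pool) - x, axis=1))[:q]
        return float(d.mean())

    d_out = mean_q_dist(f_outliers)            # within the F-outlier group
    d_min = mean_q_dist(nearest_class_points)  # to the closest known class
    return (d_min - d_out) / max(d_min, d_out)
```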

OLINDDA

• Offline Phase
– Unsupervised
– Learns a decision model of the normal class
– The decision model is a set of clusters (k hyperspheres)
• Clustering algorithm: K-Means

OLINDDA

• Online Phase
– Unsupervised
– Uses the decision model created in the offline phase to classify new examples as normal
– Examples not explained by the decision model are put in a short-term memory (unknown)
– Valid clusters of unknown examples are used to create the extension and novelty models

OLINDDA – Normal, Extension and Novelty models (figure)

OLINDDA – Classification (figure)

• If a new example falls inside the radius of one of the hyperspheres, it is classified with the label of that hypersphere (see the sketch below)
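A minimal sketch of this rule, treating each model (normal, extension, novelty) as a list of labeled hyperspheres; the data layout is an illustrative assumption:

```python
import numpy as np

def classify(x, hyperspheres):
    """hyperspheres: iterable of (center, radius, label) tuples drawn from
    the normal, extension and novelty models."""
    x = np.asarray(x, dtype=float)
    for center, radius, label in hyperspheres:
        if np.linalg.norm(x - center) <= radius:
            return label      # inside the sphere: take its label
    return "unknown"          # will be sent to the short-term memory
```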

OLINDDA

• If the example is not explained by any of the hyperspheres, it is labeled as unknown and stored in a short-term memory

OLINDDA

• If the number of examples in the short-term memory exceeds a threshold, the examples are clustered using K-Means
• Only valid clusters (cohesive and representative) are considered

OLINDDA

• A new cluster is
– an Extension, if it is in the neighbourhood of the normal model
– a Novelty, if it is distant from the normal model

MINAS – MultIclass learNing Algorithm for data Streams

• Offline Phase
– Learns a decision model based on the known concept of the problem
– Executed once
– Each class is represented by a set of clusters (hyperspheres)

• Online Phase
– Receives new examples and classifies them either as one of the known classes or as unknown
– Cohesive groups of unknown examples are used to detect new classes or extensions

MINAS – Offline Phase (figure)

MINAS – Offline Phase

• Micro-clusters: incremental statistical summary (sketched below)
– N: number of examples
– LS: linear sum of the examples
– SS: squared sum of the examples
– t: timestamp of the arrival of the last example classified by the micro-cluster

• Examples of clustering algorithms used in the training phase
– K-Means
– CluStream
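A sketch of the incremental (N, LS, SS, t) summary described above; the centroid and a dispersion-based radius are derived from the summary alone, so the raw examples need not be stored. The radius rule shown (root of the summed per-dimension variance) is one common choice, not necessarily the exact MINAS formula:

```python
import numpy as np

class MicroCluster:
    """Incremental statistical summary (N, LS, SS, t)."""

    def __init__(self, x, t):
        x = np.asarray(x, dtype=float)
        self.N, self.LS, self.SS, self.t = 1, x.copy(), x * x, t

    def absorb(self, x, t):
        x = np.asarray(x, dtype=float)
        self.N += 1
        self.LS += x         # linear sum, updated in O(d)
        self.SS += x * x     # squared sum, updated in O(d)
        self.t = t           # arrival time of the last absorbed example

    @property
    def centroid(self):
        return self.LS / self.N

    @property
    def radius(self):
        var = self.SS / self.N - self.centroid ** 2
        return float(np.sqrt(np.clip(var, 0, None).sum()))
```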

MINAS – Offline Phase (figure)

MINAS

• Online Phase
– Classify new examples
– Detect novelty patterns
– Update the decision model

MINAS – Classification (figure)

MINAS - Classification

• Classifying an example as unknown means one of the following
– The example is noise or an outlier and cannot be explained by any of the micro-clusters
• The example must be discarded
– The example represents a concept drift
• The example must be used to update the decision model
– The example represents a novelty pattern
• The example must be used to update the decision model

MINAS – Novelty detection and update (figure)

MINAS – Online Phase (figures)

MINAS – Active Learning

• Used when the labels of a reduced set of examples are available
• Uses active learning techniques to select a representative set of examples to be labeled and used to update the decision model
• Main idea (sketched below)
– From time to time, select the centroids of the newly created micro-clusters as the examples to be labeled by the specialist
– Update the decision model with the new labels
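A sketch of this selection strategy; ask_specialist is a hypothetical callback that returns a label for a queried centroid (or None when no answer is given):

```python
def active_learning_step(new_micro_clusters, ask_specialist):
    """Query the specialist with the centroids of newly created
    micro-clusters and fold the returned labels back into the model."""
    for mc in new_micro_clusters:
        label = ask_specialist(mc.centroid)   # one query per micro-cluster
        if label is not None:
            mc.label = label                  # update the decision model
```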

Evaluation in Novelty Detection

• Multiclass novelty detection algorithms for data streams traditionally use binary evaluation measures (Masud et al. 2011); their computation is sketched below
– Mnew: % of novel-class examples wrongly classified as known classes: Mnew = (FN × 100) / Nc
– Fnew: % of known-class examples wrongly classified as novelty: Fnew = (FP × 100) / (N − Nc)
– ERR: % of incorrect classifications overall: ERR = ((FP + FN + FE) × 100) / N

FP: # of examples from the known classes wrongly classified as novelty
FN: # of examples from the novel classes wrongly classified as known classes
FE: # of examples from known classes misclassified (other than FP)
N: # of examples in the stream
Nc: # of examples from the novel classes
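The three measures compute directly from the counts defined above; a minimal sketch:

```python
def m_new(FN, Nc):
    """% of novel-class examples misclassified as known classes."""
    return FN * 100.0 / Nc

def f_new(FP, N, Nc):
    """% of known-class examples wrongly classified as novelty."""
    return FP * 100.0 / (N - Nc)

def err(FP, FN, FE, N):
    """Overall percentage of misclassified examples."""
    return (FP + FN + FE) * 100.0 / N
```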

Evaluation in Novelty Detection

• Binary classification evaluation measures: problems
– They treat novelty detection as a binary classification task
• It is actually a multiclass classification task
– They do not consider the unknown examples separately
– They do not consider that different novelty patterns can appear
– They evaluate only the final confusion matrix


Evaluation in Novelty Detection (Faria et al. 2013)

• Confusion matrix
– Not square (rectangular)
– The number of columns increases over time
– Novelty patterns do not have a direct match with the problem classes
– Presence of unknown examples

Evaluation in Novelty Detection (Faria et al. 2013)

• Rectangular confusion matrix
– Problem
• Difficult to define hits and errors
• The matrix is not square
• Each novelty pattern needs to be assigned to only one class
– One class may be associated with one or more novelties
– Solution (sketched below)
• Representation using a bipartite graph
• Based on the Hungarian method
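A sketch of the assignment step using SciPy's Hungarian-method implementation on a small, made-up confusion matrix (rows: problem classes, columns: novelty patterns); maximizing the number of matched examples is done by negating the matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative counts: entry [i, j] = examples of class i placed in pattern j.
conf = np.array([[90,  5,  0],
                 [ 4, 70, 10],
                 [ 1,  2, 60]])

rows, cols = linear_sum_assignment(-conf)   # maximize total agreement
for r, c in zip(rows, cols):
    print(f"class {r} <-> novelty pattern {c}: {conf[r, c]} matched examples")
```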

Evaluation in Novelty Detection (Faria et al. 2013)

Confusion matrix, corresponding bipartite graph and resulting bipartite subgraph (figure)

Evaluation in Novelty Detection (Faria et al. 2013)

• Unknown examples
– Problem
• How should the unknown examples be counted? Hits or errors?
– Solution
• Neither hits nor errors
• Unknown examples should be computed separately

Evaluation in Novelty Detection (Faria et al. 2013)

• Unknown examples are measured separately by an unknown rate:

UnkR = (1/M) × Σi=1..M (Unki / #ExCi)

• ACCExp + ErrExp = 1, where ACCExp/ErrExp are the accuracy/error computed considering only the examples explained by the model

Unki: # of examples from class Ci classified as unknown
#ExCi: # of examples from class Ci
M: # of classes

Evaluation in Novelty Detection (Faria et al. 2013)

• Uses the evaluation measure CER (Combined Error Rate) to calculate the classification error rate, considering only the examples not classified as unknown (computation sketched below):

CER = (1/2) × Σi=1..M (#Ex′Ci / #Ex′) × (FPRi + FNRi)

#Ex′Ci: number of examples from class Ci, excluding unknowns
#Ex′: total number of examples, excluding unknowns
FPRi: false positive rate of class Ci
FNRi: false negative rate of class Ci
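A sketch computing UnkR and CER from per-class counts, following the definitions above; inputs are illustrative dictionaries keyed by class:

```python
def unk_rate(unk, ex):
    """UnkR = (1/M) * sum_i Unk_i / #ExC_i."""
    return sum(unk[c] / ex[c] for c in ex) / len(ex)

def cer(ex_prime_ci, ex_prime_total, fpr, fnr):
    """CER = 1/2 * sum_i (#Ex'_Ci / #Ex') * (FPR_i + FNR_i)."""
    return 0.5 * sum(ex_prime_ci[c] / ex_prime_total * (fpr[c] + fnr[c])
                     for c in ex_prime_ci)
```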

Evaluation in Novelty Detection (Faria et al. 2013)

• Evaluation over time: problem
– In an evolving data stream, it is not sufficient to extract information only from the final confusion matrix
• Solution
– Plot a 2D graph
• The X axis represents the data timestamps
• The Y axis represents the evaluation measure values
– Plot the information about errors and unknown examples
– Identify the timestamps at which a new concept was detected

References

• Masud MM, Gao J, Khan L, Han J, Thuraisingham BM (2011) Classification and novel class detection in concept-drifting data streams under time constraints. IEEE Transactions on Knowledge and Data Engineering 23(6):859–874

• Spinosa EJ, Carvalho ACPLF, Gama J (2009) Novelty detection with application to data streams. Intelligent Data Analysis 13(3):405–422

• Faria ER, Carvalho ACPLF, Gama J (2016) MINAS: multiclass learning algorithm for novelty detection in data streams. Data Mining and Knowledge Discovery 30:640–680

• Masud MM, Chen Q, Khan L, Aggarwal CC, Gao J, Han J, Thuraisingham BM (2010) Addressing concept evolution in concept-drifting data streams. In: Proceedings of the 10th IEEE International Conference on Data Mining (ICDM'10), pp 929–934

• Al-Khateeb TM, Masud MM, Khan L, Thuraisingham B (2012) Cloud guided stream classification using class-based ensemble. In: Proceedings of the 2012 IEEE 5th International Conference on Cloud Computing (CLOUD'12). IEEE Computer Society, Washington, DC, USA, pp 694–701

• Elwell R, Polikar R (2011) Incremental learning of concept drift in nonstationary environments. IEEE Transactions on Neural Networks 22(10):1517–1531

• Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowledge and Information Systems 22(3):371–391

• Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Machine Learning 23(1):69–101