Noise Resilience in Machine Learning Algorithms


Exploring the Noise Resilience of the Combined Sturges Algorithm

Akrita Agarwal
Advisor: Dr. Anca Ralescu

November 7, 2015


Motivation

A study on Noise?

- Real-world datasets are noisy:
  - recordings under normal environmental conditions
  - equipment measurement error
- Most algorithms ignore noise.
- Not much research has been done on noise.

Aim: explore the robustness of algorithms to noise.

Which algorithm is least affected by noisy datasets?


Classification

Classification: assigning a new observation to one of a set of known categories.

Companies store large amounts of data.

An effective classifier can assist in making good predictions and informed business decisions.

E.g., whether to recommend Prime products to non-Prime customers, based on behavior.


Classification Algorithms

Two broad kinds of classifiers:

- Frequency-based classifiers use the frequency of data points in the dataset to determine the class membership of a given test point.
- Geometry-based classifiers leverage the geometric aspects of a dataset, such as distance.


The Naive Bayes Classifier

- Frequency-based classifier
- Computes the probability that a test data point belongs to each class
- Class probabilities are extracted from the training data

Pros

- Intuitive to understand and build
- Easily trained, even with a small dataset
- Fast

Cons

- Assumes conditional independence of the attributes
- Ignores the underlying geometry of the data
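For orientation, this is what a Naive Bayes baseline looks like with scikit-learn (an illustration on Iris, not the experimental code used in this work):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Fits class priors and per-class Gaussian likelihoods from the training data.
model = GaussianNB().fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # test accuracy
```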


The k Nearest Neighbors Classifier

- Geometry-based classifier
- Assigns a class to the test data point by taking the majority class of its k nearest points

Pros

- Easy to implement and understand
- Classes don't have to be linearly separable

Cons

- Tends to ignore the relative importance of attributes; uses all of them
- Only indirectly takes the frequency of the data into account
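The corresponding kNN baseline (again an illustration; k = 5 is an arbitrary choice here):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Predicts by majority vote among the k nearest training points.
model = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(model.score(X_te, y_te))  # test accuracy
```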


Combined Sturges Classifier


The Combined Sturges (CS) Classifier

- Explicitly uses geometry + frequency
- Data represented as a frequency distribution per class
- A classification score is computed for each class
- Test point assigned to the class with the best score (highest probability or lowest distance, depending on the criterion)

Continuous data values are binned.

No. of bins = ⌈1 + log2 n⌉ (Sturges, 1926, "The Choice of a Class Interval")
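In code, Sturges' rule is a one-liner (a sketch, using the 8-row dummy dataset below as the example):

```python
import math

def sturges_bins(n_samples: int) -> int:
    """Sturges' rule: ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n_samples))

print(sturges_bins(8))  # 4 bins for the 8-row dummy dataset
```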


Dummy Dataset

Table: Dummy Dataset

A1   A2   Class
3    2    1
1    2    1
4    2    0
3    2    1
1    1    0
2    2    1
3    3    0
4    1    0

Table: Frequency Distribution on Classes 0 & 1

Class 0:
A1   f(A1)     A2   f(A2)
1    0.25      1    0.50
3    0.25      2    0.25
4    0.50      3    0.25

Class 1:
A1   f(A1)     A2   f(A2)
1    0.25      2    0.75
2    0.25      3    0.25
3    0.50

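The per-class frequency tables above can be reproduced in a few lines (a sketch; `class_frequencies` is my helper name, and the rows are copied from the dummy dataset). For continuous attributes, values would first be binned with Sturges' rule:

```python
from collections import Counter

# Rows of (A1, A2, class), copied from the dummy dataset above.
data = [(3, 2, 1), (1, 2, 1), (4, 2, 0), (3, 2, 1),
        (1, 1, 0), (2, 2, 1), (3, 3, 0), (4, 1, 0)]

def class_frequencies(rows, cls, attr):
    """Relative frequency of each value of attribute `attr` within class `cls`."""
    values = [r[attr] for r in rows if r[2] == cls]
    return {v: c / len(values) for v, c in sorted(Counter(values).items())}

print(class_frequencies(data, 0, 0))  # A1 | class 0: {1: 0.25, 3: 0.25, 4: 0.5}
print(class_frequencies(data, 1, 0))  # A1 | class 1: {1: 0.25, 2: 0.25, 3: 0.5}
```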


Test Point: T1 (A1 = 3, A2 = 4)


1. Geometric Criterion

Test point T1: A1 = 3, A2 = 4

- Classification criterion: geometric (minimum distance)
- Classification score: highest posterior probability

Table: Nearest distance of T1 to classes (the class frequency distributions above, with f taken at the value nearest to T1)


Classification Score, S(c), c ∈ {0, 1}, where f(Aj) is the class frequency of the attribute value nearest to T1:

S(0)

A1 = P(Class 0) × f(A1), A2 = P(Class 0) × f(A2)
average(A1, A2) = average(0.5 × 0.25, 0.5 × 0.25) = 0.125

S(1)

A1 = P(Class 1) × f(A1), A2 = P(Class 1) × f(A2)
average(A1, A2) = average(0.5 × 0.50, 0.5 × 0.25) = 0.187

S(0) < S(1), so T1 is assigned to Class 1.

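Reading the geometric criterion off this worked example, a minimal sketch (my helper names; the per-class frequency dictionaries are copied from the tables above, and both class priors are 4/8 = 0.5):

```python
# Class-conditional frequencies f(value), copied from the tables above.
f0 = [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}]  # class 0: A1, A2
f1 = [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]           # class 1: A1, A2
T1 = (3, 4)

def geometric_score(test, freqs, prior):
    """Average over attributes of P(class) * f(value nearest to the test point)."""
    terms = []
    for t, f in zip(test, freqs):
        nearest = min(f, key=lambda v: abs(v - t))  # geometrically closest class value
        terms.append(prior * f[nearest])
    return sum(terms) / len(terms)

print(geometric_score(T1, f0, 0.5))  # 0.125
print(geometric_score(T1, f1, 0.5))  # 0.1875 -> highest score: class 1
```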

2. Statistical Criterion

Test point T1: A1 = 3, A2 = 4

- Classification criterion: statistical (maximum frequency)
- Classification score: minimum distance

Table: Maximum frequency in classes (the class frequency distributions above; the score uses each attribute's most frequent value)


Classification Score

Each term is the distance from T1 to the attribute's most frequent value in the class:

S(0)

A1 = (4 − 3) = 1, A2 = (4 − 1) = 3
average(A1, A2) = average(1, 3) = 2

S(1)

A1 = (3 − 3) = 0, A2 = (4 − 2) = 2
average(A1, A2) = average(0, 2) = 1

S(0) > S(1), so T1 is assigned to Class 1 (minimum distance wins).

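The statistical criterion can be sketched the same way, continuing with f0, f1, and T1 from the previous sketch (the score is a distance, so the smaller value wins):

```python
def statistical_score(test, freqs):
    """Average distance from the test point to each attribute's most frequent value."""
    dists = [abs(t - max(f, key=f.get)) for t, f in zip(test, freqs)]
    return sum(dists) / len(dists)

print(statistical_score(T1, f0))  # average(|3-4|, |4-1|) = 2.0
print(statistical_score(T1, f1))  # average(|3-3|, |4-2|) = 1.0 -> minimum: class 1
```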

3. Combined Criterion

Test point T1: A1 = 3, A2 = 4

- Frequency-weighted distance per value: d·f = |T1 − v| × f(v)
- Expected distance per attribute: ED_c(Aj) = Σ d·f
- Aggregate expected distance: ED_c = ED_c(A1) × ED_c(A2); minimum ED wins

Table: Aggregate Expected Distance, ED

Class 0:
A1   f(A1)   d·f       A2   f(A2)   d·f
1    0.25    0.50      1    0.50    1.50
3    0.25    0         2    0.25    0.50
4    0.50    0.50      3    0.25    0.25
ED0(A1) = 1.00         ED0(A2) = 2.25

Class 1:
A1   f(A1)   d·f       A2   f(A2)   d·f
1    0.25    0.50      2    0.75    1.50
2    0.25    0.25      3    0.25    0.25
3    0.50    0
ED1(A1) = 0.75         ED1(A2) = 1.75


Classification Penalty

S(0)

ED = 1.00 × 2.25 = 2.25
S(0) = ED × (1 − P(Class 0)) = 1.125

S(1)

ED = 0.75 × 1.75 = 1.31
S(1) = ED × (1 − P(Class 1)) = 0.655

S(0) > S(1), so T1 is assigned to Class 1.

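Putting the pieces together, the combined criterion reduces to a frequency-weighted expected distance scaled by a class-prior penalty. A minimal sketch, reusing f0, f1, and T1 from the geometric sketch above (lower score wins):

```python
def combined_penalty(test, freqs, prior):
    """Product of per-attribute expected distances, scaled by (1 - class prior)."""
    ed = 1.0
    for t, f in zip(test, freqs):
        ed *= sum(abs(t - v) * fv for v, fv in f.items())  # expected distance ED_c(Aj)
    return ed * (1 - prior)

print(combined_penalty(T1, f0, 0.5))  # 1.00 * 2.25 * 0.5 = 1.125
print(combined_penalty(T1, f1, 0.5))  # 0.75 * 1.75 * 0.5 ≈ 0.656 -> minimum: class 1
```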

The Noise Model


Dealing with Noise

Brodley & Friedl, 1999 - detect and remove noise.

Kubica & Moore, 2003 - identify noise using a probabilistic model and remove it.

Kalapanidas et al., 2003 - developed a noise model based on data properties.


Additive noise: x′ = x + δx

δx(i, j) = σ(x_j) × z(i, j)
  σ(x_j): standard deviation of attribute j
  z(i, j) = CDF(p(i, j))

x(i, j) = x′(i, j) if p(i, j) ≥ n, x(i, j) otherwise    (1)

Based on noise level n ∈ {0, 0.15, 0.30, 0.50, 0.80}
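A minimal NumPy sketch of this noise model. Assumptions on my part: p(i, j) are per-cell uniform draws, z(i, j) is the corresponding standard-normal deviate (the inverse-CDF reading of the z = CDF(p) line), and noise is applied where p < n so that n is the expected fraction of perturbed cells, matching the "40% noisy" example on the next slide:

```python
import numpy as np

def add_noise(X: np.ndarray, n: float, seed: int = 0) -> np.ndarray:
    """Additive noise: perturb roughly a fraction n of cells by sigma_j * z, z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    sigma = X.std(axis=0)                      # standard deviation of each attribute j
    p = rng.random(X.shape)                    # per-cell uniform draws p(i, j)
    z = rng.standard_normal(X.shape)           # normal deviates (inverse-CDF of a uniform)
    return np.where(p < n, X + sigma * z, X)   # noisy where p < n, unchanged otherwise

X = np.array([[3, 2], [1, 2], [4, 2], [3, 2],
              [1, 1], [2, 2], [3, 3], [4, 1]], dtype=float)
print(add_noise(X, n=0.4))  # roughly 40% of the cells perturbed
```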


Attribute-level Noise

Table: Original Dataset

A1   A2   Class
3    2    1
1    2    1
4    2    0
3    2    1
1    1    0
2    2    1
3    3    0
4    1    0

Table: 40% (n = 0.4) Noisy Dataset

A1    A2     Class
8.5   0.55   1
8.9   2      1
4     0.7    0
3     2      1
4.7   1      0
2     2      1
3     3      0
1.6   0.02   0


Datasets


Artificial datasets

Multivariate Normal

x1 = random normal vector, t = random normal vector
x2 = 0.8x1 + 0.6t
x3 = 0.6x1 + 0.8t
x4 = t

Linear Function with Non-normal inputs

x2 = (x1)² + 0.5t
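These two recipes translate directly to NumPy (a sketch; the sample count of 200 matches datasets A1 and A2 in the table below, and class labels and imbalance handling are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200  # matches the sample counts of A1 and A2 in the table below

# Multivariate normal construction
x1 = rng.standard_normal(m)
t = rng.standard_normal(m)
x2 = 0.8 * x1 + 0.6 * t
x3 = 0.6 * x1 + 0.8 * t
x4 = t
X_normal = np.column_stack([x1, x2, x3, x4])

# Linear function with a non-normal input (x1 squared)
x2_nonnormal = x1**2 + 0.5 * t
```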


2. Artificial datasets with different imbalance ratios

3. Real datasets

Table: Comparison of physical properties of datasets.

Dataset         No. of Samples   No. of Classes   No. of Attributes   Attribute Value   Imbalance Ratio
Haberman        306              2                3                   Integer           2.78
A1              200              3                4                   Real              6.66
A2              200              3                4                   Real              39
Iris            150              3                4                   Real              2
Pima-Diabetes   768              2                8                   Integer, Real     1.87


Process Flow

1. Create artificial datasets

2. Implement the noise model on all datasets

3. Apply the three algorithms

4. Compare the results (the loop is sketched below)
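The loop referenced in step 4 might look like this sketch; `datasets` and `run_classifier` are hypothetical placeholders for the dataset list and for the CS, knn, and Naive Bayes implementations:

```python
results = {}
for name, X, y in datasets:                    # hypothetical list of (name, features, labels)
    for n in (0, 0.15, 0.30, 0.50, 0.80):      # noise levels from the noise model
        Xn = add_noise(X, n)                   # sketched in the noise-model section
        for algo in ("CS", "knn", "Naive Bayes"):
            results[(name, n, algo)] = run_classifier(algo, Xn, y)  # hypothetical helper
```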


Results


Performance Measures

Confusion Matrix

Table: Confusion matrix for 2 classes.

                   Predicted Positive   Predicted Negative
Actual Positive    TP                   FN
Actual Negative    FP                   TN

Accuracy: Acc = (TP + TN) / (TP + TN + FP + FN)

Precision: P = TP / (TP + FP)

Recall: R = TP / (TP + FN)

F-measure: Fα = PR / (αP + (1 − α)R)
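The four measures in code (a small helper; the example confusion counts are made up):

```python
def metrics(tp: int, fn: int, fp: int, tn: int, alpha: float = 0.5):
    """Accuracy, precision, recall, and alpha-weighted F-measure from a confusion matrix."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = (p * r) / (alpha * p + (1 - alpha) * r)  # alpha = 0.5 recovers the usual F1
    return acc, p, r, f

print(metrics(tp=40, fn=10, fp=5, tn=45))  # (0.85, 0.889, 0.8, 0.842) on made-up counts
```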


Non-Noisy Datasets

Artificial datasets:

- knn does best: 91.2% and 93.7%
- Good improvement in CS, from 65% to 76%

Table: Non-Noisy Artificial Datasets - Performance of all algorithms

Dataset   Algorithm     Accuracy   Precision   Recall   F-measure
A1        CS            65.0       63.5        70.1     66.6
A1        knn           91.2       92.8        87.4     89.8
A1        Naive Bayes   60.2       61.6        60.14    64.1
A2        CS            76.0       68.4        71.62    69.7
A2        knn           93.7       94.7        91.9     93.2
A2        Naive Bayes   63.1       61.1        65.2     63.5


Real datasets:

- Iris: knn does best, followed by Naive Bayes.
- Haberman: CS does best. Naive Bayes is really bad.
- Pima-Diabetes: CS is best. Naive Bayes follows.

Table: Non-Noisy Real Datasets - Performance of all algorithms

Dataset         Algorithm     Accuracy   Precision   Recall   F-Measure
Iris            CS            94.3       95.1        94.3     94.7
Iris            knn           96.7       96.8        96.7     96.8
Iris            Naive Bayes   96.2       93.7        95       94.3
Haberman        CS            75.2       67.2        61.6     64.2
Haberman        knn           73.4       63.2        54.8     58.5
Haberman        Naive Bayes   0.5        41.9        47.6     47.3
Pima-Diabetes   CS            73.7       74.9        65.1     69.6
Pima-Diabetes   knn           64.5       65.6        66.9     66.3
Pima-Diabetes   Naive Bayes   70.3       59.2        56.7     57.9


Noisy Datasets: A1

- knn does best.
- For both knn and CS, no change with noise.
- Naive Bayes does badly.

Table: Noisy A1 dataset - Performance of all algorithms

Algorithm     Noise %   Accuracy   Precision   Recall   F-Measure
CS            0         65         63.5        70.1     66.6
CS            15        64.8       63.4        96.7     96.8
CS            50        65.5       63.2        95       94.3
knn           0         87.5       87.2        61.6     61.6
knn           15        87.3       88.1        54.8     58.5
knn           50        86.7       88.5        47.6     47.3
Naive Bayes   0         ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   15        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   50        ≈ 0        ≈ 0         ≈ 0      ≈ 0


Noisy Datasets: A2

- knn does best, but drops from 92.6% to 86.3%.
- For CS, no change with noise.
- From A1 to A2, CS improves from 65% to 76%.

Table: Noisy A2 dataset - Performance of all algorithms

Algorithm     Noise %   Accuracy   Precision   Recall   F-Measure
CS            0         76.0       68.4        71.6     69.7
CS            15        76.8       64.7        73.1     68.4
CS            50        76.4       66.9        71.7     68.5
knn           0         92.6       86.9        85.5     86.2
knn           15        91.1       84.2        84.2     83.5
knn           50        86.3       83.0        78.2     77.9
Naive Bayes   0         ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   15        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   50        ≈ 0        ≈ 0         ≈ 0      ≈ 0


Noisy Datasets: Iris

- knn does best at 0% noise (96.7%), then CS (94.5%).
- CS does best at 50% noise (73.1%), then knn (63.8%).

Table: Noisy Iris dataset - Performance of all algorithms

Algorithm     Noise %   Accuracy   Precision   Recall   F-Measure
CS            0         94.5       94.9        94.5     94.7
CS            15        86.2       87.6        86.2     86.9
CS            50        73.1       74.9        73.1     73.9
knn           0         96.7       96.8        96.7     96.8
knn           15        83.6       84.6        83.6     84.1
knn           50        63.8       63.2        63.8     63.5
Naive Bayes   0         93.3       92.3        91.9     92.1
Naive Bayes   15        92.3       91.5        91.2     91.4
Naive Bayes   50        0.7        18.3        0.7      NaN


Noisy Datasets: Haberman

- CS does best, at 74.7%.
- Naive Bayes performs badly, at ≈ 43%.

Table: Noisy Haberman dataset - Performance of all algorithms

Algorithm     Noise %   Accuracy   Precision   Recall   F-Measure
CS            0         74.7       66.7        61.4     63.9
CS            15        66.1       62.2        61.9     62.0
CS            50        74.5       66.6        63       64.7
knn           0         74.1       65.7        55.1     59.7
knn           15        72.0       56.2        52.3     54.0
knn           50        70.5       51.8        50.6     51.0
Naive Bayes   0         41.0       47.1        46.5     46.8
Naive Bayes   15        43.3       46.2        45.3     45.7
Naive Bayes   50        41.4       34.7        32.4     31.8


Noisy Datasets: Pima-Diabetes

- CS does best, followed by knn.
- Naive Bayes degrades badly with noise: 70% to 55.7% to 0%.

Table: Noisy Pima-Diabetes dataset - Performance of all algorithms

Algorithm     Noise %   Accuracy   Precision   Recall   F-Measure
CS            0         72.8       72.8        64.2     68.2
CS            15        70.8       68.3        65.8     67
CS            50        67.0       64.9        55.9     60.0
knn           0         63.5       64.6        65.9     65.2
knn           15        60.8       61.2        62.3     61.7
knn           50        55.0       55.6        56.1     55.8
Naive Bayes   0         70.3       59.2        56.7     57.9
Naive Bayes   15        55.7       49.4        46.0     NaN
Naive Bayes   50        0          0           0        NaN


Results Summary

Table: Best Algorithm for different Noise Levels

Dataset         0% Noise   15% Noise     50% Noise
A1              knn        knn           knn
A2              knn        knn           knn
Haberman        CS         knn           CS
Iris            knn        Naive Bayes   CS
Pima-Diabetes   CS         CS            CS


Conclusion

No single algorithm is best across the board.

In general, knn has better accuracy, but CS is more robust to noise.

Naive Bayes degrades much more under noise than the others.

Also:

CS performs well on imbalanced datasets.


Future Work

Test with more datasets.

Test for performance on imbalanced datasets.

Only the additive noise model was used; try other variations.

Compare with more algorithms.


Questions?
