Exploring the Noise Resilience of the Combined Sturges Algorithm
Akrita Agarwal
Advisor: Dr. Anca Ralescu
November 7, 2015
Motivation
Why study Noise?

Real-world datasets are noisy: recordings under normal environmental conditions, equipment measurement error.
Most algorithms ignore noise, and little research has been done on it.

Aim: explore the robustness of classification algorithms to noise.

Which algorithm is least affected by noisy datasets?
Classification
Classification: assigning a new observation to one of a set of known categories.

Companies store large amounts of data.

An effective classifier can assist in making good predictions and informed business decisions.

E.g., whether to recommend Prime products to non-Prime customers, based on their behavior.
Classification Algorithms
Two broad kinds of classifiers are:

Frequency-based classifiers use the frequency of data points in the dataset to determine the class membership of a given test point.
Geometry-based classifiers leverage geometric aspects of the dataset, such as distance.
The Naive Bayes Classifier
Frequency-based classifier.
Computes the probability that a test data point belongs to each class.
Class probabilities are extracted from the training data.
Pros
Intuitive to understand and build.
Easily trained, even with a small dataset.
Fast.
Cons
Assumes conditional independence of the attributes given the class.
Ignores the underlying geometry of the data.
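For reference, the standard Naive Bayes decision rule implied by the description above (with m attributes x_1, ..., x_m; this is the textbook formulation, not spelled out on the slide):

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{m} P(x_j \mid c)
```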
The k Nearest Neighbors Classifier
Geometry-based classifier.
Assigns a class to the test data point by majority vote among its k nearest points.
Pros
Easy to implement and understand.
Classes don't have to be linearly separable.
Cons
Tends to ignore the relative importance of attributes; uses all of them equally.
Only indirectly takes the frequency of the data into account.
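A minimal sketch of the standard kNN rule described above (not the thesis implementation), using Euclidean distance and a small hypothetical training list:

```python
import math
from collections import Counter

def knn_predict(train, test_point, k=3):
    """Majority vote among the k training points nearest (Euclidean) to test_point."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], test_point))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((3, 2), 1), ((1, 2), 1), ((4, 2), 0), ((3, 2), 1)]
print(knn_predict(train, (3, 4), k=3))  # -> 1
```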
Combined Sturges Classifier
The Combined Sturges (CS) Classifier

Explicitly uses geometry + frequency.
Data are represented as frequency distributions per class.
A classification score is computed for each class.
The test point is assigned to the class with the highest score.

Continuous data values are binned.

No. of bins = ⌈1 + log₂ n⌉ (Sturges, 1926, "The Choice of a Class Interval")
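A small sketch of the bin-count rule, reading the brackets as the ceiling function:

```python
import math

def sturges_bins(n):
    """Number of bins by Sturges' rule: ceil(1 + log2(n))."""
    return math.ceil(1 + math.log2(n))

print(sturges_bins(150))  # 9 bins for a 150-sample dataset such as Iris
```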
Dummy Dataset
Table: Dummy Dataset

A1   A2   Class
 3    2     1
 1    2     1
 4    2     0
 3    2     1
 1    1     0
 2    2     1
 3    3     0
 4    1     0

Table: Frequency Distribution on Classes 0 & 1

Class 0:
A1   f(A1)    A2   f(A2)
 1   0.25      1   0.50
 3   0.25      2   0.25
 4   0.50      3   0.25

Class 1:
A1   f(A1)    A2   f(A2)
 1   0.25      2   0.75
 2   0.25      3   0.25
 3   0.50
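A minimal sketch (not the thesis code) of deriving such per-class, per-attribute frequency tables from the dummy dataset:

```python
from collections import Counter

data = [(3, 2, 1), (1, 2, 1), (4, 2, 0), (3, 2, 1),
        (1, 1, 0), (2, 2, 1), (3, 3, 0), (4, 1, 0)]

freq = {}  # freq[class][attribute_index] -> {value: relative frequency}
for c in {row[-1] for row in data}:
    rows = [row for row in data if row[-1] == c]
    freq[c] = [
        {v: cnt / len(rows) for v, cnt in Counter(col).items()}
        for col in zip(*[r[:-1] for r in rows])
    ]

print(freq[0][0])  # {4: 0.5, 1: 0.25, 3: 0.25} -> f(A1) on class 0
```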
Test Point: T1 = (3, 4)
1. Geometric Criterion

Test Point: T1 = (3, 4)
Classification criterion: Geometric (for each attribute, pick the class value at minimum distance from the test value)
Classification score: Highest posterior probability
Table: Nearest distance of T1 to Classes
Class 0:
A1   f(A1)    A2   f(A2)
 1   0.25      1   0.50
 3   0.25      2   0.25
 4   0.50      3   0.25

Class 1:
A1   f(A1)    A2   f(A2)
 1   0.25      2   0.75
 2   0.25      3   0.25
 3   0.50
Classification Score, S(c), c ∈ {0, 1}

S(0)
A1: P(Class 0) × f(A1) = 0.5 × 0.25
A2: P(Class 0) × f(A2) = 0.5 × 0.25
S(0) = average(0.5 × 0.25, 0.5 × 0.25) = 0.125

S(1)
A1: P(Class 1) × f(A1) = 0.5 × 0.50
A2: P(Class 1) × f(A2) = 0.5 × 0.25
S(1) = average(0.5 × 0.50, 0.5 × 0.25) = 0.1875

S(0) < S(1) ⇒ Class 1
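A minimal Python sketch of this worked example (not the thesis code; the frequency tables and the uniform priors P(Class 0) = P(Class 1) = 0.5 come from the slides above):

```python
# Class-conditional frequency tables and test point from the slides above.
freq = {0: [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}],
        1: [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]}
prior = {0: 0.5, 1: 0.5}
t1 = (3, 4)

def geometric_score(c):
    """Average of prior * frequency of the class value nearest to each test value."""
    terms = []
    for value, table in zip(t1, freq[c]):
        nearest = min(table, key=lambda v: abs(v - value))  # minimum-distance value
        terms.append(prior[c] * table[nearest])
    return sum(terms) / len(terms)

print(geometric_score(0), geometric_score(1))  # 0.125, 0.1875 -> Class 1
```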
2. Statistical Criterion

Test Point: T1 = (3, 4)
Classification criterion: Statistical (for each attribute, pick the class value with maximum frequency)
Classification score: Minimum distance
Table: Maximum Frequency in Classes
Class 0:
A1   f(A1)    A2   f(A2)
 1   0.25      1   0.50
 3   0.25      2   0.25
 4   0.50      3   0.25

Class 1:
A1   f(A1)    A2   f(A2)
 1   0.25      2   0.75
 2   0.25      3   0.25
 3   0.50
Classification Score

S(0)
A1: (4 − 3) = 1
A2: (4 − 1) = 3
S(0) = average(1, 3) = 2

S(1)
A1: (3 − 3) = 0
A2: (4 − 2) = 2
S(1) = average(0, 2) = 1

S(0) > S(1) ⇒ Class 1
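Continuing the example, a sketch of the statistical criterion (same freq and t1 as in the geometric sketch, repeated so the block runs on its own):

```python
freq = {0: [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}],
        1: [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]}
t1 = (3, 4)

def statistical_score(c):
    """Average distance from each test value to the most frequent class value."""
    dists = []
    for value, table in zip(t1, freq[c]):
        mode = max(table, key=table.get)  # maximum-frequency value
        dists.append(abs(value - mode))
    return sum(dists) / len(dists)

print(statistical_score(0), statistical_score(1))  # 2.0, 1.0 -> Class 1 (min distance)
```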
3. Combined Criterion

Test Point: T1 = (3, 4)
For each class value, the distance to T1 is weighted by its frequency: d · f = |T1 − A| · f(A)
Expected distance per class: ED^c = ED^c_A1 × ED^c_A2
Classification score: minimum expected distance, ED
Table: Aggregate Expected Distance, ED
Class 0:
A1   f(A1)   d·f      A2   f(A2)   d·f
 1   0.25    0.50      1   0.50    1.50
 3   0.25    0         2   0.25    0.50
 4   0.50    0.50      3   0.25    0.25
     ED0_A1 = 1.00         ED0_A2 = 2.25

Class 1:
A1   f(A1)   d·f      A2   f(A2)   d·f
 1   0.25    0.50      2   0.75    1.50
 2   0.25    0.25      3   0.25    0.25
 3   0.50    0
     ED1_A1 = 0.75         ED1_A2 = 1.75
Classification Penalty
S(0)
ED = 1.00 × 2.25 = 2.25
S(0) = ED × (1 − P(Class 0)) = 2.25 × 0.5 = 1.125

S(1)
ED = 0.75 × 1.75 ≈ 1.31
S(1) = ED × (1 − P(Class 1)) ≈ 1.31 × 0.5 = 0.655

S(0) > S(1) ⇒ Class 1
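A sketch of the combined expected-distance penalty, with the same inputs as the previous sketches:

```python
freq = {0: [{1: 0.25, 3: 0.25, 4: 0.50}, {1: 0.50, 2: 0.25, 3: 0.25}],
        1: [{1: 0.25, 2: 0.25, 3: 0.50}, {2: 0.75, 3: 0.25}]}
prior = {0: 0.5, 1: 0.5}
t1 = (3, 4)

def combined_penalty(c):
    """ED^c = product over attributes of sum_v |t - v| * f(v), weighted by (1 - prior)."""
    ed = 1.0
    for value, table in zip(t1, freq[c]):
        ed *= sum(abs(value - v) * f for v, f in table.items())
    return ed * (1 - prior[c])

print(combined_penalty(0), combined_penalty(1))  # 1.125, 0.65625 -> Class 1 (min penalty)
```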
The Noise Model
Dealing with Noise
Brodley & Friedl, 1999 – detect and reduce noise.

Kubica & Moore, 2003 – identify noise using a probabilistic model and remove it.

Elias Kalapanidas, 2003 – developed a noise model based on data properties.
Additive noise: x′ = x + δx, where

δx = σ_{x_j} × z_{i,j}
σ_{x_j}: standard deviation of attribute j
z_{i,j} = CDF(p_{i,j})

x_{i,j} = { x′_{i,j}   if p_{i,j} ≥ n
          { x_{i,j}    if p_{i,j} < n        (1)

based on noise level n ∈ {0, 0.15, 0.30, 0.50, 0.80}.
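A minimal sketch of this noise model, with one reading assumption: the slide's z_{i,j} = CDF(p_{i,j}) is taken here as the inverse normal CDF applied to a uniform draw p_{i,j} (turning p into a standard-normal z); the selection rule follows equation (1) as written:

```python
import numpy as np
from scipy.stats import norm

def add_noise(X, n, seed=0):
    """Additive noise per equation (1): cells with p_{i,j} >= n get x + sigma_j * z_{i,j}."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    sigma = X.std(axis=0)          # sigma_{x_j}: std dev of attribute j
    p = rng.uniform(size=X.shape)  # p_{i,j}
    z = norm.ppf(p)                # z_{i,j}: inverse normal CDF of p (assumption)
    return np.where(p >= n, X + sigma * z, X)
```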
Attribute-level Noise
Table: Original Dataset

A1   A2   Class
 3    2     1
 1    2     1
 4    2     0
 3    2     1
 1    1     0
 2    2     1
 3    3     0
 4    1     0

Table: 40% (n = 0.4) Noisy Dataset

A1    A2     Class
8.5   0.55    1
8.9   2       1
4     0.7     0
3     2       1
4.7   1       0
2     2       1
3     3       0
1.6   0.02    0
Datasets
Artificial datasets
Multivariate Normal
x1 = random normal vector, t = random normal vector
x2 = 0.8·x1 + 0.6·t
x3 = 0.6·x1 + 0.8·t
x4 = t
Linear Function with Non-normal inputs
x2 = (x1)² + 0.5·t
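A minimal sketch of the two constructions above. Sample size and class labelling are not specified on this slide, so only the attribute construction is shown; n = 200 is taken from the A1/A2 sample counts reported later:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.standard_normal(n)  # random normal vector
t = rng.standard_normal(n)   # random normal vector

# Multivariate normal: correlated attributes built from x1 and t
x2 = 0.8 * x1 + 0.6 * t
x3 = 0.6 * x1 + 0.8 * t
x4 = t
X_mvn = np.column_stack([x1, x2, x3, x4])

# Non-normal inputs: x2 = x1^2 + 0.5*t
X_nonnormal = np.column_stack([x1, x1**2 + 0.5 * t])
```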
2 artificial datasets (A1, A2), with different imbalance ratios

3 real datasets
Table: Comparison of physical properties of Datasets

Dataset         No. of Samples   No. of Classes   No. of Attributes   Attribute Value   Imbalance Ratio
Haberman        306              2                3                   Integer           2.78
A1              200              3                4                   Real              6.66
A2              200              3                4                   Real              39
Iris            150              3                4                   Real              2
Pima Diabetes   768              2                8                   Integer, Real     1.87
Process Flow
1. Create the artificial datasets
2. Implement the noise model on all datasets
3. Apply the three algorithms
4. Compare the results
Results
Performance Measures
Confusion Matrix
Table: Confusion matrix for 2 classes.
                     Predicted Outcome
                     Positive   Negative
Actual    Positive   TP         FN
values    Negative   FP         TN

Accuracy:   Acc = (TP + TN) / (TP + TN + FP + FN)
Precision:  P = TP / (TP + FP)
Recall:     R = TP / (TP + FN)
F-measure:  F_α = P·R / (α·P + (1 − α)·R)
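A small sketch computing these measures from the confusion-matrix counts; α = 0.5 recovers the familiar balanced F1 under this parameterization:

```python
def metrics(tp, fn, fp, tn, alpha=0.5):
    """Accuracy, precision, recall, and F_alpha from 2-class confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = (p * r) / (alpha * p + (1 - alpha) * r)
    return acc, p, r, f

print(metrics(tp=50, fn=10, fp=5, tn=35))  # (0.85, ~0.909, ~0.833, ~0.870)
```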
Non-Noisy Datasets
Artificial datasets:
knn does best: 91.2% and 93.7%.
Good improvement in CS from 65% (A1) to 76% (A2).
Table: Non-Noisy Artificial Datasets - Performance of all algorithms
Dataset   Algorithm     Accuracy   Precision   Recall   F-measure
A1        CS            65.0       63.5        70.1     66.6
A1        knn           91.2       92.8        87.4     89.8
A1        Naive Bayes   60.2       61.6        60.14    64.1
A2        CS            76.0       68.4        71.62    69.7
A2        knn           93.7       94.7        91.9     93.2
A2        Naive Bayes   63.1       61.1        65.2     63.5
Real datasets:
Iris: knn does best, followed by Naive Bayes.
Haberman: CS does best; Naive Bayes is really bad.
Pima-Diabetes: CS is best; Naive Bayes follows.
Table: Non-Noisy Real Datasets - Performance of all algorithms
Dataset         Algorithm     Accuracy   Precision   Recall   F-measure
Iris            CS            94.3       95.1        94.3     94.7
Iris            knn           96.7       96.8        96.7     96.8
Iris            Naive Bayes   96.2       93.7        95       94.3
Haberman        CS            75.2       67.2        61.6     64.2
Haberman        knn           73.4       63.2        54.8     58.5
Haberman        Naive Bayes   0.5        41.9        47.6     47.3
Pima-Diabetes   CS            73.7       74.9        65.1     69.6
Pima-Diabetes   knn           64.5       65.6        66.9     66.3
Pima-Diabetes   Naive Bayes   70.3       59.2        56.7     57.9
Noisy Datasets: A1
knn does best.
For both knn and CS, no change with noise.
Naive Bayes does badly.
Table: Noisy A1 dataset - Performance of all algorithms
Algorithm     Noise %   Accuracy   Precision   Recall   F-measure
CS            0         65         63.5        70.1     66.6
CS            15        64.8       63.4        96.7     96.8
CS            50        65.5       63.2        95       94.3
knn           0         87.5       87.2        61.6     61.6
knn           15        87.3       88.1        54.8     58.5
knn           50        86.7       88.5        47.6     47.3
Naive Bayes   0         ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   15        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   50        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Noisy Datasets: A2
knn does best, but drops from 92.6% to 86.3%.
For CS, no change with noise.
From A1 to A2, CS improves from 65% to 76%.
Table: Noisy A2 dataset - Performance of all algorithms
Algorithm     Noise %   Accuracy   Precision   Recall   F-measure
CS            0         76.0       68.4        71.6     69.7
CS            15        76.8       64.7        73.1     68.4
CS            50        76.4       66.9        71.7     68.5
knn           0         92.6       86.9        85.5     86.2
knn           15        91.1       84.2        84.2     83.5
knn           50        86.3       83.0        78.2     77.9
Naive Bayes   0         ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   15        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Naive Bayes   50        ≈ 0        ≈ 0         ≈ 0      ≈ 0
Noisy Datasets: Iris
knn does best at 0% noise (96.7%), followed by CS (94.5%).
CS does best at 50% noise (73.1%), followed by knn (63.8%).
Table: Noisy Iris dataset - Performance of all algorithms
Algorithm     Noise %   Accuracy   Precision   Recall   F-measure
CS            0         94.5       94.9        94.5     94.7
CS            15        86.2       87.6        86.2     86.9
CS            50        73.1       74.9        73.1     73.9
knn           0         96.7       96.8        96.7     96.8
knn           15        83.6       84.6        83.6     84.1
knn           50        63.8       63.2        63.8     63.5
Naive Bayes   0         93.3       92.3        91.9     92.1
Naive Bayes   15        92.3       91.5        91.2     91.4
Naive Bayes   50        0.7        18.3        0.7      NaN
Noisy Datasets: Haberman
CS does best, at 74.7%.
Naive Bayes performs badly, at ≈ 43%.
Table: Noisy Haberman dataset - Performance of all algorithms
Algorithm     Noise %   Accuracy   Precision   Recall   F-measure
CS            0         74.7       66.7        61.4     63.9
CS            15        66.1       62.2        61.9     62.0
CS            50        74.5       66.6        63       64.7
knn           0         74.1       65.7        55.1     59.7
knn           15        72.0       56.2        52.3     54.0
knn           50        70.5       51.8        50.6     51.0
Naive Bayes   0         41.0       47.1        46.5     46.8
Naive Bayes   15        43.3       46.2        45.3     45.7
Naive Bayes   50        41.4       34.7        32.4     31.8
Noisy Datasets: Pima-Diabetes
CS does best, followed by knn.
Naive Bayes degrades badly with noise: 70% → 55.7% → 0%.
Table: Noisy Pima-Diabetes dataset - Performance of all algorithms
Algorithm     Noise %   Accuracy   Precision   Recall   F-measure
CS            0         72.8       72.8        64.2     68.2
CS            15        70.8       68.3        65.8     67
CS            50        67.0       64.9        55.9     60.0
knn           0         63.5       64.6        65.9     65.2
knn           15        60.8       61.2        62.3     61.7
knn           50        55.0       55.6        56.1     55.8
Naive Bayes   0         70.3       59.2        56.7     57.9
Naive Bayes   15        55.7       49.4        46.0     NaN
Naive Bayes   50        0          0           0        NaN
Results Summary
Table: Best Algorithm for different Noise Levels
Dataset         0% Noise   15% Noise     50% Noise
A1              knn        knn           knn
A2              knn        knn           knn
Haberman        CS         knn           CS
Iris            knn        Naive Bayes   CS
Pima-Diabetes   CS         CS            CS
Conclusion
No single algorithm is best overall.

In general, knn has better accuracy, but CS is more robust to noise.

Naive Bayes degrades with noise much more than the others.

Also:

CS performs well on imbalanced datasets.
Future Work
Test with more datasets.
Test for performance on imbalanced datasets.
Only the additive noise model was used; try other variations.
Compare with more algorithms.
Questions?