Evaluating Classifiers
Reading:
T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7
(linked from class website)
Evaluating Classifiers
• What we want: classifier that best predicts unseen data
• Assumption: Data is “iid” (independent and identically distributed)
Accuracy and Error Rate
• Accuracy = fraction of correct classifications on unseen data (test set)
• Error rate = 1 − Accuracy
How to use available data to best measure accuracy?
Split data into training and test sets.
But how to split?
Too little training data: we don’t get the best possible classifier
Too little test data: the measured accuracy is unreliable
One solution: “k-fold cross validation”
• Each example is used both as a training instance and as a test instance.
• Split data into k disjoint parts: S1, S2, ..., Sk.
• For i = 1 to k: select Si to be the test set; train on the remaining data and test on Si to obtain accuracy Ai.
• Report the average, (1/k)(A1 + A2 + ... + Ak), as the final accuracy.
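A minimal sketch of this procedure in Python; the train and evaluate functions are hypothetical stand-ins supplied by the caller, not something from the slides:

```python
import random

def k_fold_cross_validation(data, k, train, evaluate):
    """Estimate accuracy with k-fold cross-validation.

    data     : list of examples (e.g., (features, label) pairs)
    train    : function(list of examples) -> classifier        (hypothetical)
    evaluate : function(classifier, list of examples) -> accuracy in [0, 1]
    """
    data = list(data)
    random.shuffle(data)                       # split the data at random
    folds = [data[i::k] for i in range(k)]     # k disjoint parts S1, ..., Sk

    accuracies = []
    for i in range(k):
        test_set = folds[i]                    # Si is the test set
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        classifier = train(train_set)          # train on the remaining data
        accuracies.append(evaluate(classifier, test_set))   # accuracy Ai

    return sum(accuracies) / k                 # average of the Ai = final accuracy
```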
“Confusion matrix” for a given class c
                                     Predicted (or “classified”)
                                     Positive (in class c)     Negative (not in class c)
Actual Positive (in class c)         True Positive             False Negative
Actual Negative (not in class c)     False Positive            True Negative
Evaluating classification algorithms
Example: “A” vs. “B”
Results from Perceptron:
Instance   Class   Perceptron Output
   1        “A”        −1
   2        “A”        +1
   3        “A”        +1
   4        “A”        −1
   5        “B”        +1
   6        “B”        −1
   7        “B”        −1
   8        “B”        −1
Assume “A” is the positive class.

Confusion Matrix:
                     Predicted Positive    Predicted Negative
Actual Positive              2                     2
Actual Negative              1                     3

Accuracy: (2 + 3) / 8 = 0.625
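A small sketch that reproduces these numbers from the table above, assuming an output of +1 means “classified as A” and “A” is the positive class:

```python
# (true class, perceptron output) pairs from the example table
results = [("A", -1), ("A", +1), ("A", +1), ("A", -1),
           ("B", +1), ("B", -1), ("B", -1), ("B", -1)]

tp = sum(1 for c, out in results if c == "A" and out == +1)   # true positives
fn = sum(1 for c, out in results if c == "A" and out == -1)   # false negatives
fp = sum(1 for c, out in results if c == "B" and out == +1)   # false positives
tn = sum(1 for c, out in results if c == "B" and out == -1)   # true negatives

accuracy = (tp + tn) / len(results)
print(tp, fn, fp, tn, accuracy)   # 2 2 1 3 0.625
```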
In the confusion matrix:
• a False Negative (a positive instance classified as negative) is a Type 2 error
• a False Positive (a negative instance classified as positive) is a Type 1 error
Precision and Recall

[Diagram: the test set split into positive and negative instances, overlaid with the sets of instances the classifier labels positive and negative.]

• Precision = TP / (TP + FP): of the instances classified as positive, the fraction that are actually positive
• Recall = TP / (TP + FN): of the actually positive instances, the fraction classified as positive
• Recall = Sensitivity = True Positive Rate
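A sketch of the two measures written in terms of confusion-matrix counts (the guards against division by zero are an assumption, not something from the slides):

```python
def precision(tp, fp):
    """Of everything classified as positive, the fraction that is truly positive."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """Of all truly positive instances, the fraction classified as positive.
    Also called sensitivity or the true positive rate (TPR)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# For the "A" vs. "B" example above: tp = 2, fp = 1, fn = 2
print(precision(2, 1))   # 0.666...
print(recall(2, 2))      # 0.5
```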
Example: “A” vs. “B”
Results from the perceptron (same table as above); assume “A” is the positive class.
Results of classifier (compute accuracy, precision, and recall at each threshold):

Threshold   Accuracy   Precision   Recall
   .9
   .8
   .7
   .6
   .5
   .4
   .3
   .2
   .1
   −∞
Creating a Precision vs. Recall Curve
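A sketch of how the curve could be built, assuming the classifier produces a real-valued score per instance (higher means “more positive”) rather than just a ±1 decision:

```python
def precision_recall_curve(scores, labels, thresholds):
    """labels: True for the positive class; score >= threshold => classified positive."""
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        prec = tp / (tp + fp) if (tp + fp) > 0 else 1.0   # convention when nothing is classified positive
        rec = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        points.append((rec, prec))             # plot precision (y-axis) vs. recall (x-axis)
    return points
```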
A ROC curve plots the True Positive Rate (“sensitivity”) on the y-axis against the False Positive Rate (1 − “specificity”) on the x-axis.
Results of classifier (compute TPR and FPR at each threshold):

Threshold   TPR   FPR
   .9
   .8
   .7
   .6
   .5
   .4
   .3
   .2
   .1
   −∞
Creating a ROC Curve
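Each threshold contributes one point to the ROC curve; a sketch of how a single (FPR, TPR) point could be computed:

```python
def roc_point(scores, labels, threshold):
    """Return (FPR, TPR) at one threshold.
    labels: True for the positive class; score >= threshold => classified positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and not y)
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0.0   # true positive rate (recall)
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0.0   # false positive rate
    return fpr, tpr
```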
Precision/Recall versus ROC curves
http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/
• There are various algorithms for computing AUC (the area under the ROC curve), often built into statistical software; we won’t cover them in this class.
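One common way to compute the area (not necessarily what any particular package does) is the trapezoidal rule over the ROC points; a sketch:

```python
def auc_trapezoid(roc_points):
    """roc_points: (fpr, tpr) pairs obtained from thresholds ranging over all scores."""
    pts = sorted(roc_points)                     # order points by increasing FPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0      # trapezoid between adjacent points
    return area
```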
How to create a ROC curve for a perceptron
• Run your perceptron on each instance in the test data, without doing the sgn step: record the raw score w · x for each instance
• Get the range [min, max] of your scores
• Divide the range into about 200 thresholds, including −∞ and max
• For each threshold, calculate TPR and FPR
• Plot TPR (y-axis) vs. FPR (x-axis)
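Putting the steps above together: a sketch assuming a learned weight vector weights (with any bias folded into the features), reusing the roc_point helper sketched earlier:

```python
def perceptron_roc(weights, test_instances, test_labels, num_thresholds=200):
    """test_labels: True for the positive class."""
    # Raw perceptron score for each test instance, without the sgn step
    scores = [sum(w * x for w, x in zip(weights, xs)) for xs in test_instances]

    # About num_thresholds thresholds spanning [min, max] of the scores, plus -infinity
    lo, hi = min(scores), max(scores)
    step = (hi - lo) / (num_thresholds - 1)
    thresholds = [lo + i * step for i in range(num_thresholds)] + [float("-inf")]

    # One (FPR, TPR) point per threshold; sort so the curve runs left to right
    curve = sorted(roc_point(scores, test_labels, t) for t in thresholds)
    return curve   # plot TPR (y-axis) vs. FPR (x-axis)
```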