Evaluating Hypotheses
Reading: Coursepack: Learning From Examples, Section 4 (pp. 16-21)

Page 1:

Evaluating Hypotheses

Reading: Coursepack: Learning From Examples, Section 4 (pp. 16-21)

Page 2:

Evaluating Hypotheses

• What we want: hypothesis that best predicts unseen data

• Assumption: Data is “iid” (independent and identically distributed)

Page 3:

Accuracy and Error

• Accuracy = fraction of correct classifications on unseen data (test set)

• Error rate = 1 − Accuracy
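As a minimal sketch (not from the coursepack), accuracy and error rate could be computed from test-set predictions like this; the labels below are made up for illustration:

```python
# Minimal sketch (not from the slides): accuracy on a test set, plus error rate.
def accuracy(y_true, y_pred):
    """Fraction of test examples whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels for illustration only.
y_true = ["spam", "ham", "spam", "ham"]
y_pred = ["spam", "spam", "spam", "ham"]

acc = accuracy(y_true, y_pred)   # 0.75
error_rate = 1 - acc             # 0.25
```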

Page 4:

How to use available data to best measure accuracy?

Page 5:

How to use available data to best measure accuracy?

Split data into training and test sets.

Page 6:

How to use available data to best measure accuracy?

Split data into training and test sets.

But how to split?

Page 7:

How to use available data to best measure accuracy?

Split data into training and test sets.

But how to split?

Too little training data:

Too little test data:

Page 8:

How to use available data to best measure accuracy?

Split data into training and test sets.

But how to split?

Too little training data: the learner may not find a good classifier

Too little test data: the measured accuracy is an unreliable estimate
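To make the split concrete, here is a minimal sketch of a random train/test split; the helper name and the 80/20 ratio are illustrative assumptions, not something the slides prescribe:

```python
# Minimal sketch of a random train/test split. The 80/20 ratio is an
# illustrative choice, not prescribed by the slides.
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the labeled examples and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

data = list(range(100))                        # stand-in for 100 labeled examples
train_set, test_set = train_test_split(data)   # 80 for training, 20 for testing
```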

Page 9:

One solution: “k-fold cross validation”

• Each example is used both as a training instance and as a test instance.

• Split data into k disjoint parts: S1, S2, ..., Sk.

• For i = 1 to k:

Select Si to be the test set. Train on the remaining data and test on Si to obtain accuracy Ai.

• Report the average, (A1 + A2 + ... + Ak) / k, as the final accuracy.
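A minimal sketch of this procedure in Python, assuming placeholder train and accuracy functions that the slides do not define:

```python
# Minimal sketch of k-fold cross-validation. `train(examples)` and
# `accuracy(model, examples)` are placeholder functions assumed to exist;
# the slides do not define them.
def k_fold_cross_validation(examples, k, train, accuracy):
    """Each example is used once for testing and k-1 times for training."""
    folds = [examples[i::k] for i in range(k)]               # S1, ..., Sk (disjoint)
    accuracies = []
    for i in range(k):
        test_set = folds[i]                                   # Si is the test set
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train(train_set)                              # train on the rest
        accuracies.append(accuracy(model, test_set))          # Ai
    return sum(accuracies) / k                                # average accuracy
```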

Page 10:

Avoid “peeking” at test data when training

Example from readings:

Split data into training and test sets.

Train a model with one setting of a learning parameter (e.g., “gain” vs. “gain ratio”).

Test on test set.

Repeat with other learning parameter.

Test on test set.

Return accuracy of model with best performance.

What’s wrong with this procedure?

Page 11:

Avoid “peeking” at test data when training

Example from readings:

Split data into training and test sets.

Train a model with one setting of a learning parameter (e.g., “gain” vs. “gain ratio”).

Test on test set.

Repeat with other learning parameter.

Test on test set.

Return accuracy of model with best performance.

Problem: You used the test set to select the best model, but model selection is part of the learning process! Risk of overfitting to a particular test set.

Need to evaluate final learned model on previously unseen data.

Page 12:

• Can also solve this problem by using k-fold cross-validation (on the training data only) to select model parameters, and then evaluating the resulting model on unseen test data that was set aside prior to training.
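A minimal sketch of this procedure, reusing the placeholder helpers sketched earlier (train, accuracy, train_test_split, k_fold_cross_validation are illustrative assumptions, not defined in the slides):

```python
# Minimal sketch: hold out a test set first, choose the learning parameter by
# cross-validated accuracy on the training data only, then evaluate the chosen
# model once on the held-out test set. `train(examples, param)`,
# `accuracy(model, examples)`, `train_test_split`, and `k_fold_cross_validation`
# are the placeholder helpers sketched earlier.
def select_and_evaluate(examples, parameter_values, train, accuracy, k=5):
    train_set, test_set = train_test_split(examples)      # test set is never peeked at

    best_param, best_score = None, float("-inf")
    for param in parameter_values:                         # e.g., "gain" vs. "gain ratio"
        score = k_fold_cross_validation(
            train_set, k, lambda data: train(data, param), accuracy)
        if score > best_score:
            best_param, best_score = param, score

    final_model = train(train_set, best_param)             # retrain on all training data
    return final_model, accuracy(final_model, test_set)    # accuracy on unseen data
```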

Page 13:

Evaluating classification algorithms

“Confusion matrix” for a given class c:

                                   Predicted (or “classified”)
                                   True (in class c)     False (not in class c)
Actual   True (in class c)         True Positive         False Negative
         False (not in class c)    False Positive        True Negative

Page 14:

Evaluating classification algorithms

“Confusion matrix” for a given class c:

                                   Predicted (or “classified”)
                                   True (in class c)               False (not in class c)
Actual   True (in class c)         True Positive                   False Negative (Type 2 error)
         False (not in class c)    False Positive (Type 1 error)   True Negative
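As a minimal sketch (the function name is an illustrative assumption, not from the slides), the four cells of the matrix can be tallied from true and predicted labels like this:

```python
# Minimal sketch (not from the slides): tallying the confusion-matrix cells
# for a single class c from true and predicted labels.
def confusion_matrix_counts(y_true, y_pred, c):
    """Return (TP, FP, FN, TN) for class c."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == c and p == c)   # true positives
    fp = sum(1 for t, p in pairs if t != c and p == c)   # false positives (Type 1 error)
    fn = sum(1 for t, p in pairs if t == c and p != c)   # false negatives (Type 2 error)
    tn = sum(1 for t, p in pairs if t != c and p != c)   # true negatives
    return tp, fp, fn, tn
```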

Page 15:

• Precision: Fraction of true positives out of all predicted positives: TP / (TP + FP)

• Recall: Fraction of true positives out of all actual positives: TP / (TP + FN)
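A minimal sketch of both measures from confusion-matrix counts; the guards against empty denominators are an illustrative choice:

```python
# Minimal sketch of precision and recall from confusion-matrix counts.
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    """Fraction of actual positives that are predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

# Example: 8 true positives, 2 false positives, 4 false negatives.
print(precision(8, 2))   # 0.8
print(recall(8, 4))      # 0.666...
```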

Page 16:

Error vs. Loss

• Loss functions
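The slide only names the topic, so the two losses below (0/1 loss for classification, squared loss for numeric prediction) are common illustrative examples, not necessarily the ones covered in the coursepack:

```python
# Illustrative loss functions (assumption: these match the spirit of the slide).
def zero_one_loss(y_true, y_pred):
    """1 for a wrong prediction, 0 for a correct one; its average is the error rate."""
    return 0 if y_pred == y_true else 1

def squared_loss(y_true, y_pred):
    """Penalizes large numeric errors much more heavily than small ones."""
    return (y_true - y_pred) ** 2
```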

Page 17:

Regularization