Performance Measures in Data Mining
Common Performance Measures Used in Data Mining and Machine Learning Approaches

L. Richter, J.M. Cejuela
Department of Computer Science, Technische Universität München
Master Lab Course Data Mining, SS 2015, Jul 1st
Outline

Item Set and Association Rule Weights
  Simple Measures
  Complex Measures
Classification
  Basic Performance Measures
  Complex Measures – Performance Curves
Regression
Assessment Strategies
Item Set and Association Rule Weights
Simple Measures
Measures for Item Sets
Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive the rule: if c then {i}. Typically there is only one item in the RHS (right-hand side) of the rule.

- Support (of an item set): sup(X → Y) = sup(Y → X) = P(X ∧ Y), i.e. how often the item set is found in the database
- Confidence (of a rule): conf(X → Y) = P(Y | X) = P(X ∧ Y)/P(X) = sup(X → Y)/sup(X) (both are computed in the sketch below)
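As a minimal sketch in plain Python, both measures can be computed directly over transaction data; the `support`/`confidence` helpers and the market-basket transactions below are illustrative, not from the slides:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """conf(X -> Y) = sup(X and Y) / sup(X)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical market-basket data.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter"},
]
print(support(transactions, {"bread", "butter"}))       # 0.5
print(confidence(transactions, {"bread"}, {"butter"}))  # 0.666...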
Item Set and Association Rule Weights
Complex Measures
Measures for Item Sets (cont'd)
These measures express how frequent an item set is, or how interesting a rule is, compared to its expected occurrence under independence:

- Leverage (of an item set): lev(X → Y) = P(X ∧ Y) − P(X)P(Y)
- Lift (of a rule): lift(X → Y) = lift(Y → X) = P(X ∧ Y)/(P(X)P(Y)) = conf(X → Y)/sup(Y) = conf(Y → X)/sup(X)
- Conviction (of a rule): similar to lift, but directed. It compares the probability that X appears without Y, assuming they were independent, with the observed frequency of X without Y: conviction(X → Y) = P(X)P(¬Y)/P(X ∧ ¬Y) = (1 − sup(Y))/(1 − conf(X → Y)) (all three appear in the sketch below)
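Continuing the sketch above and reusing its hypothetical `support` and `confidence` helpers, the three measures could look like this:

```python
def leverage(transactions, lhs, rhs):
    """lev(X -> Y) = P(X and Y) - P(X) * P(Y)."""
    return (support(transactions, set(lhs) | set(rhs))
            - support(transactions, lhs) * support(transactions, rhs))

def lift(transactions, lhs, rhs):
    """lift(X -> Y) = P(X and Y) / (P(X) * P(Y)); symmetric in X and Y."""
    return (support(transactions, set(lhs) | set(rhs))
            / (support(transactions, lhs) * support(transactions, rhs)))

def conviction(transactions, lhs, rhs):
    """conviction(X -> Y) = (1 - sup(Y)) / (1 - conf(X -> Y)).
    Directed; undefined (division by zero) when the rule holds with confidence 1."""
    return ((1 - support(transactions, rhs))
            / (1 - confidence(transactions, lhs, rhs)))
```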
Item Set and Association Rule Weights
Complex Measures
Supplementary Material
J-Measure

- empirically observed accuracy of a rule
- cross-entropy (measuring how well one distribution approximates another) between the binary variables φ and θ, with vs. without conditioning on the event θ:

J(θ ⇒ φ) = p(θ) · [ p(φ|θ) log( p(φ|θ) / p(φ) ) + (1 − p(φ|θ)) log( (1 − p(φ|θ)) / (1 − p(φ)) ) ]
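A direct transcription of the formula into Python (a sketch; it assumes all probabilities lie strictly between 0 and 1 so both logarithms are defined):

```python
from math import log

def j_measure(p_theta, p_phi, p_phi_given_theta):
    """J(theta => phi): cross-entropy between phi with and without
    conditioning on theta, weighted by p(theta)."""
    q = p_phi_given_theta
    return p_theta * (q * log(q / p_phi)
                      + (1 - q) * log((1 - q) / (1 - p_phi)))
```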
Classification
Basic Performance Measures
Basic Building Blocks for Performance Measures
- True Positive (TP): positive instances predicted as positive
- True Negative (TN): negative instances predicted as negative
- False Positive (FP): negative instances predicted as positive
- False Negative (FN): positive instances predicted as negative
- Confusion Matrix (computed in the sketch below):

             Predicted a   Predicted b
  Real a         TP            FN
  Real b         FP            TN
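As a sketch, scikit-learn's `confusion_matrix` computes these counts from label vectors; the labels below are hypothetical, and `labels=[1, 0]` reproduces the row/column order of the table above:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical gold labels (1 = positive)
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # hypothetical predictions

# rows = real class, columns = predicted class -> [[TP, FN], [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
(tp, fn), (fp, tn) = cm
print(tp, fn, fp, tn)  # 3 1 1 3
```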
Classification
Basic Performance Measures
Performance Measures
Accuracy, acc = (TP + TN) / (TP + FN + FP + TN)
             = number of correct predictions / total number of predictions

Error rate, err = (FN + FP) / (TP + FN + FP + TN) = 1 − acc
                = number of wrong predictions / total number of predictions
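Continuing with the hypothetical counts from the confusion-matrix sketch above:

```python
total = tp + fn + fp + tn
acc = (tp + tn) / total   # 0.75
err = (fn + fp) / total   # 0.25, equals 1 - acc
```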
Classification
Basic Performance Measures
Performance Measures cont’d
True Positive Rate, TPR, Sensitivity = TP / (TP + FN)

True Negative Rate, TNR, Specificity = TN / (TN + FP)

False Positive Rate, FPR = FP / (TN + FP)

False Negative Rate, FNR = FN / (TP + FN)
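The four rates, again from the same hypothetical counts:

```python
tpr = tp / (tp + fn)  # sensitivity, 0.75
tnr = tn / (tn + fp)  # specificity, 0.75
fpr = fp / (tn + fp)  # = 1 - tnr
fnr = fn / (tp + fn)  # = 1 - tpr
```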
Classification
Basic Performance Measures
Performance Measures cont’d
Precision, p = TP / (TP + FP)

Recall, r = TP / (TP + FN)

F1 measure = 2rp / (r + p) = 2·TP / (2·TP + FP + FN)
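scikit-learn provides these directly; a sketch with the hypothetical label vectors from the confusion-matrix example:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

p  = precision_score(y_true, y_pred)  # TP / (TP + FP) = 0.75
r  = recall_score(y_true, y_pred)     # TP / (TP + FN) = 0.75
f1 = f1_score(y_true, y_pred)         # 2rp / (r + p)  = 0.75
```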
Classification
Complex Measures – Performance Curves
Performance Curves
I "costs" of different error types are differentI prediction behaviour changes over the test setI performance display in 2DI different domains prefer different chart types
Classification
Complex Measures – Performance Curves
Lift Charts
Figure taken from http://www.dmg.org/rfc/pmml-3.2/ModelExplanation.html

- from the marketing area, to evaluate mailing success
- y-axis: number or percentage of responders
- x-axis: sample
- red diagonal: random lift
- green line: optimum lift
Classification
Complex Measures – Performance Curves
ROC Curves
Figure taken from http://en.wikipedia.org/wiki/File:Roccurves.png

- Receiver Operating Characteristic
- y-axis: TPR
- x-axis: FPR
- diagonal: random guessing (TPR = FPR)
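A minimal sketch of computing a ROC curve with scikit-learn, assuming a classifier that outputs continuous scores (the labels and scores below are hypothetical):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]                   # hypothetical labels
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.5]  # hypothetical scores

fpr, tpr, thresholds = roc_curve(y_true, y_scores)
print(roc_auc_score(y_true, y_scores))  # area under the ROC curve
```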
Classification
Complex Measures – Performance Curves
Sensitivity vs. Specificity
Figure taken from http://i.stack.imgur.com/fnUd2.png

- preferred in medicine
- y-axis: TPR (sensitivity)
- x-axis: TNR (specificity)
- also frequently shown as a ROC curve with 1 − specificity
Classification
Complex Measures – Performance Curves
Recall Precision Curves
Figure taken from http://scikit-learn.github.io/scikit-learn.org/

- preferred in information retrieval
- positives are the documents retrieved in response to a query
- true positives are documents really relevant to the query
- y-axis: precision
- x-axis: recall
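Analogously to the ROC sketch, the curve points come from scikit-learn's `precision_recall_curve` (reusing the hypothetical labels and scores from above):

```python
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
```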
Regression
Error Measures for Regression
Mean squared error, MSE = ((p1 − a1)² + … + (pn − an)²) / n

Root mean squared error, RMSE = √( ((p1 − a1)² + … + (pn − an)²) / n )

Mean absolute error, MAE = (|p1 − a1| + … + |pn − an|) / n

where p1, …, pn are the predicted and a1, …, an the actual values.
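A numpy sketch of the three measures (the prediction and target vectors are hypothetical):

```python
import numpy as np

p = np.array([2.5, 0.0, 2.1, 7.8])   # hypothetical predictions
a = np.array([3.0, -0.5, 2.0, 8.0])  # hypothetical actual values

mse  = np.mean((p - a) ** 2)
rmse = np.sqrt(mse)
mae  = np.mean(np.abs(p - a))
```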
Regression
Relative Error Measures
Relative squared error = ((p1 − a1)² + … + (pn − an)²) / ((a1 − ā)² + … + (an − ā)²)

Root relative squared error = √( ((p1 − a1)² + … + (pn − an)²) / ((a1 − ā)² + … + (an − ā)²) )

Relative absolute error = (|p1 − a1| + … + |pn − an|) / (|a1 − ā| + … + |an − ā|)

where ā is the mean of the actual values.
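Continuing the numpy sketch, with ā computed as the mean of the actual values:

```python
a_mean = a.mean()
rse  = np.sum((p - a) ** 2) / np.sum((a - a_mean) ** 2)
rrse = np.sqrt(rse)
rae  = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a_mean))
```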
Assessment Strategies
General Problem
- each algorithm abstracts from observations (instances)
- the aspects kept and the aspects discarded differ between learning schemes (inductive bias)
- this also means: information about individual instances is contained in the model, too
- individual instance information leads to overfitting
Assessment Strategies
Solution Strategies
- use fresh data, i.e. instances not used for training
- for very large numbers of instances: a simple split into training and test set
- most common: 10-fold cross-validation (see the sketch below)
- LOOCV: leave-one-out cross-validation
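A minimal sketch of 10-fold cross-validation with scikit-learn; the dataset and the learner below are stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)   # stand-in dataset
clf = DecisionTreeClassifier()      # stand-in learner

scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
print(scores.mean(), scores.std())
```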