ilya-kuzovkin
ONE MACHINE LEARNING USE CASE
Can we ask a computer to create those patterns automatically?
Yes.
How?
A data sample (instance) = raw data + class (label)

Raw data: a 28 px × 28 px image of a handwritten digit. Class (label): “7”.

How to represent it in a machine-readable form? Feature extraction.

28 × 28 = 784 pixels in total.
Feature vector: (0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0)

Dataset:
(0, 0, 0, …, 28, 65, 128, 255, 101, 38, … 0, 0, 0) → “7”
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, … 0, 0, 0) → “2”
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, … 0, 7, 0) → “8”
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, … 0, 0, 0) → “2”
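The feature extraction step above can be sketched in a few lines of NumPy: flatten the 2-D pixel grid into a single 784-dimensional feature vector. The stroke pattern below is a made-up stand-in for a real scanned digit.

```python
import numpy as np

# A toy 28x28 grayscale image standing in for one scanned digit.
image = np.zeros((28, 28), dtype=np.uint8)
image[10, 5:20] = 255   # a horizontal stroke
image[11:20, 18] = 128  # a vertical stroke

# Feature extraction: flatten the 2-D pixel grid into a
# 784-dimensional feature vector, one value per pixel.
feature_vector = image.flatten()
label = "7"  # the class assigned by a human annotator

print(feature_vector.shape)  # (784,)
```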
The data is in the right format — what’s next?

• C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning automata
• AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning automata • Learning vector quantization • Logistic model tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo method • SARSA
• Instance-based learning • Nearest neighbor algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting

Pick an algorithm
DECISION TREE

Two classes of samples, one vs. the other:

(0, …, 28, 65, …, 207, 101, 0, 0)
(0, …, 19, 34, …, 254, 54, 0, 0)
(0, …, 87, 59, …, 240, 52, 4, 0)
(0, …, 87, 52, …, 240, 19, 3, 0)

vs.

(0, …, 28, 64, …, 102, 101, 0, 0)
(0, …, 19, 23, …, 105, 54, 0, 0)
(0, …, 87, 74, …, 121, 51, 7, 0)
(0, …, 87, 112, …, 239, 52, 4, 0)

First split on PIXEL #417: samples with value > 200 go one way, samples with value < 200 go the other.
One sample from the second class (the one with 239 at that pixel) still lands on the wrong side, so add a second split on PIXEL #123: < 100 vs. > 100 separates it.
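A decision tree like the one above can be trained with scikit-learn. As a sketch, this uses the built-in `load_digits` dataset (8×8 digit images, 64 features) rather than the 28×28 images from the slides, but the idea is identical: each internal node asks a question like “is pixel #k above some threshold?”

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 8x8 digit images: 64 pixel features per sample instead of 784.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Each internal node of the tree thresholds a single pixel value.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

print("test accuracy:", tree.score(X_test, y_test))
```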
ACCURACY

Confusion matrix: rows show the true class, columns the predicted class.

acc = correctly classified / total number of samples

Beware of an imbalanced dataset!
Consider the following model: “Always predict 2”.
On a dataset where 90% of the samples are 2s, it reaches accuracy 0.9 without learning anything.
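The imbalance trap is easy to demonstrate. On a made-up test set where 9 out of 10 samples are 2s, the degenerate “always predict 2” model scores 0.9 accuracy while its confusion matrix shows it never gets the minority class right:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy imbalanced test set: nine samples of class 2, one of class 7.
y_true = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 7])
# The degenerate model: "always predict 2".
y_pred = np.full_like(y_true, 2)

print(accuracy_score(y_true, y_pred))   # 0.9 despite learning nothing
print(confusion_matrix(y_true, y_pred)) # rows: true class, cols: predicted
```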
DECISION TREE

“You said 100% accurate?! Every 10th digit your system detects is wrong!”
— Angry client

We trained our system on the data the client gave us, but it has never seen the new data the client applied it to. And in real life, it never will…
OVERFITTING

Simulate the real-life situation — split the dataset.

Underfitting (“too stupid”) | OK | Overfitting (“too smart”)

Our current decision tree has too much capacity: it has simply memorized all of the data. Let's make it less complex.

You probably did not notice, but we are overfitting again :(
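The "too smart" failure mode is visible directly in the scores. A sketch, again using scikit-learn's small `load_digits` set in place of the slides' 28×28 data: an unconstrained tree recalls the training set almost perfectly, yet does noticeably worse on data it has never seen.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# An unconstrained tree grows until it has memorized the training set.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train:", full.score(X_train, y_train))  # near-perfect recall of seen data
print("test: ", full.score(X_test, y_test))    # noticeably lower on unseen data
```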
THE WHOLE DATASET is split into:

TRAINING SET (60%): fit various models and parameter combinations on this subset.
VALIDATION SET (20%): evaluate the models created with different parameters; estimate overfitting.
TEST SET (20%): use only once, to get the final performance estimate.
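The 60/20/20 split can be produced with two calls to `train_test_split` (sketched here on scikit-learn's `load_digits` data): first carve off 20% as TEST, then split the remaining 80% so that a quarter of it (20% of the whole) becomes VALIDATION.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# First carve off 20% of the whole dataset as the TEST set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Then split the remaining 80% into TRAINING (60% of the whole)
# and VALIDATION (20% of the whole): 0.25 * 80% = 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train) / len(X), len(X_val) / len(X), len(X_test) / len(X))
```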
CROSS-VALIDATION

What if we got a too-optimistic validation set?

Merge TRAINING and VALIDATION back into one TRAINING SET (80%).
Fix the parameter value you need to evaluate, say msl=15.
Split the training set into TRAINING and VAL parts, each time holding out a different part as VAL. Repeat 10 times.
Take the average validation score over the 10 runs — it is a more stable estimate.
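The 10-fold procedure above is one call in scikit-learn. As a sketch on the `load_digits` data, with the slides' `msl=15` read as the tree's `min_samples_leaf` parameter:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Evaluate one fixed parameter setting (min_samples_leaf=15)
# with 10-fold cross-validation: each fold is held out once as VAL.
tree = DecisionTreeClassifier(min_samples_leaf=15, random_state=0)
scores = cross_val_score(tree, X, y, cv=10)

print(scores.mean())  # the average over the 10 held-out folds
```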
MACHINE LEARNING PIPELINE

Take raw data → extract features → split into TRAINING and TEST.
Pick an algorithm and parameters → train on the TRAINING data → evaluate on the TRAINING data with CV.
Try out different algorithms and parameters → fix the best ones → train on the whole TRAINING set.
Evaluate on TEST → report the final performance to the client.

“So it is ~87%… erm… Could you do better?”

Yes
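The whole pipeline fits in a short script. A sketch using scikit-learn's `GridSearchCV` on the `load_digits` data: the parameter grid and the choice of `min_samples_leaf` values are illustrative, not from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

# Split into TRAINING and TEST; TEST is touched exactly once at the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Try parameter combinations with 10-fold CV on TRAINING only;
# GridSearchCV then refits the best model on the whole TRAINING set.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"min_samples_leaf": [1, 5, 15, 50]},
    cv=10)
search.fit(X_train, y_train)

# Evaluate on TEST once and report the final performance.
print(search.best_params_, search.score(X_test, y_test))
```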
Pick another algorithm
RANDOM FOREST

Decision tree: pick the best split out of all features.
Random forest: each tree picks the best split out of a random subset of features; another tree uses another random subset, and yet another tree uses yet another subset.

To classify an instance, run it through every tree in the forest and take a majority vote over the predicted classes.
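This is exactly the `RandomForestClassifier` from the hands-on link. A sketch, again on the `load_digits` data rather than the slides' 28×28 images:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Each of the 100 trees sees a bootstrap sample of the data and
# considers a random subset of features at every split; the forest
# predicts by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # usually well above a single tree
```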
Happy client
ALL OTHER USE CASES

Raw data → features → class:

Sound: frequency components → genre
Text: bag of words → topic
Image: pixel values → cat or dog
Video: frame pixels → walking or running
Database records: biometric data, census data, average salary, … → dead or alive
HANDS-ON SESSION
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html