Introduction to Machine Learning @ Mooncascade ML Camp

Machine Learning: ESSENTIAL CONCEPTS

by Ilya Kuzovkin [email protected]

Mooncascade ML Camp 2016

ONE MACHINE LEARNING USE CASE

Can we ask a computer to create those patterns automatically?

Yes

How?

A data sample:

Instance (raw data): a 28 px × 28 px image, 784 pixels in total
Class (label): “7”

How to represent it in a machine-readable form? Feature extraction.

Feature vector: (0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)

Dataset:

(0, 0, 0, …, 28, 65, 128, 255, 101, 38, …, 0, 0, 0)
(0, 0, 0, …, 13, 48, 102, 0, 46, 255, …, 0, 0, 0)
(0, 0, 0, …, 17, 34, 12, 43, 122, 70, …, 0, 7, 0)
(0, 0, 0, …, 98, 21, 255, 255, 231, 140, …, 0, 0, 0)

Labels: “7”, “2”, “8”, “2”
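As a rough sketch of the feature-extraction step, a 28 × 28 grayscale image can be flattened row by row into a 784-element feature vector. The helper name `image_to_feature_vector` and the synthetic image below are assumptions made for illustration; they are not taken from the slides.

```python
def image_to_feature_vector(image):
    """Flatten a 2-D grid of pixel intensities (a list of rows) into one flat list."""
    return [pixel for row in image for pixel in row]

# Hypothetical 28x28 image: all black (0) except a short bright stroke in row 10.
image = [[0] * 28 for _ in range(28)]
image[10][5:11] = [28, 65, 128, 255, 101, 38]

features = image_to_feature_vector(image)
label = "7"  # the class (label) that goes with this instance

print(len(features))      # 784
print(features[285:291])  # the stroke: row 10, columns 5..10 (10 * 28 + 5 = 285)
```

A dataset is then simply a list of such `(features, label)` pairs.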

The data is in the right format. What’s next?

• C4.5 • Random forests • Bayesian networks • Hidden Markov models • Artificial neural network • Data clustering • Expectation-maximization algorithm • Self-organizing map • Radial basis function network • Vector Quantization • Generative topographic map • Information bottleneck method • IBSEAD • Apriori algorithm • Eclat algorithm • FP-growth algorithm • Single-linkage clustering • Conceptual clustering • K-means algorithm • Fuzzy clustering • Temporal difference learning • Q-learning • Learning Automata

• AODE • Artificial neural network • Backpropagation • Naive Bayes classifier • Bayesian network • Bayesian knowledge base • Case-based reasoning • Decision trees • Inductive logic programming • Gaussian process regression • Gene expression programming • Group method of data handling (GMDH) • Learning Automata • Learning Vector Quantization • Logistic Model Tree • Decision tree • Decision graphs • Lazy learning • Monte Carlo Method • SARSA

• Instance-based learning • Nearest Neighbor Algorithm • Analogical modeling • Probably approximately correct learning (PACL) • Symbolic machine learning algorithms • Subsymbolic machine learning algorithms • Support vector machines • Random Forest • Ensembles of classifiers • Bootstrap aggregating (bagging) • Boosting (meta-algorithm) • Ordinal classification • Regression analysis • Information fuzzy networks (IFN) • Linear classifiers • Fisher's linear discriminant • Logistic regression • Naive Bayes classifier • Perceptron • Support vector machines • Quadratic classifiers • k-nearest neighbor • Boosting

Pick an algorithm

DECISION TREE

vs.

(0, …, 28, 65, …, 207, 101, 0, 0)
(0, …, 19, 34, …, 254, 54, 0, 0)
(0, …, 87, 59, …, 240, 52, 4, 0)
(0, …, 87, 52, …, 240, 19, 3, 0)
(0, …, 28, 64, …, 102, 101, 0, 0)
(0, …, 19, 23, …, 105, 54, 0, 0)
(0, …, 87, 74, …, 121, 51, 7, 0)
(0, …, 87, 112, …, 239, 52, 4, 0)

PIXEL #417: split at 200 (>200 / <200)

PIXEL #123: split at 100 (<100 / >100)
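A decision tree of this kind is a chain of threshold tests on individual pixels. The sketch below hard-codes the two splits from the slide. It assumes, hypothetically, that pixel #123 is tested on the “>200” branch and that the leaves are called `"A"` and `"B"`; the slides do not say which branch leads to which class.

```python
def classify(features):
    """Walk a tiny hand-written decision tree over a 784-element feature vector."""
    if features[417] > 200:        # first split: PIXEL #417
        if features[123] < 100:    # second split: PIXEL #123 (branch placement assumed)
            return "A"             # hypothetical leaf label
        return "B"                 # hypothetical leaf label
    return "B"

sample = [0] * 784
sample[417] = 207
sample[123] = 65
print(classify(sample))  # A
```

A learning algorithm builds such a tree automatically by choosing, at each node, the pixel and threshold that best separate the classes in the training data.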

ACCURACY

Confusion matrix (axes: true class, predicted class)

acc = correctly classified / total number of samples

Beware of an imbalanced dataset!

Consider the following model: “Always predict 2”

Accuracy 0.9
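The imbalance trap is easy to reproduce. The sketch below assumes a hypothetical dataset of 90 samples labelled "2" and 10 labelled "7" (the slides only give the resulting accuracy of 0.9) and scores the “Always predict 2” model on it.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def confusion_matrix(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))

# Hypothetical imbalanced dataset: 90 samples of "2" and 10 samples of "7".
y_true = ["2"] * 90 + ["7"] * 10
y_pred = ["2"] * len(y_true)  # the "Always predict 2" model

matrix = confusion_matrix(y_true, y_pred)
print(accuracy(y_true, y_pred))  # 0.9
print(matrix[("7", "2")])        # 10 sevens were predicted as "2"
print(matrix[("7", "7")])        # 0 sevens were recognized
```

The accuracy looks good, yet the confusion matrix shows that the model never recognizes a single "7".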

DECISION TREE

“You said 100% accurate?! Every 10th digit your system detects is wrong!”

Angry client

We trained our system on the data the client gave us, but it has never seen the new data the client applied it to.

And in real life it never will…

OVERFITTING

Simulate the real-life situation: split the dataset

Underfitting! (“Too stupid”) / OK / Overfitting! (“Too smart”)

Our current decision tree has too much capacity: it has simply memorized all of the data.

Let’s make it less complex.
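To see what “memorized all of the data” means in practice, here is a minimal sketch: a held-out split plus a deliberately extreme overfitter that is nothing more than a lookup table. The names `split_dataset` and `Memorizer` and the toy data are assumptions for illustration, not the tree from the slides.

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and cut it into (training, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

class Memorizer:
    """An extreme overfitter: a lookup table of the training samples."""
    def fit(self, samples):
        self.table = {tuple(x): y for x, y in samples}

    def predict(self, x):
        return self.table.get(tuple(x), "unknown")

def accuracy(model, samples):
    return sum(model.predict(x) == y for x, y in samples) / len(samples)

# Hypothetical toy data: 100 distinct feature vectors, "big" when the two values sum to > 10.
data = [([a, b], "big" if a + b > 10 else "small") for a in range(10) for b in range(10)]

training, test = split_dataset(data)
model = Memorizer()
model.fit(training)

print(accuracy(model, training))  # 1.0: every training sample is in the table
print(accuracy(model, test))      # 0.0: no unseen sample is in the table
```

A perfect score on the training data says nothing about new data; only the held-out part reveals the problem.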

You probably did not notice, but we are overfitting again :(

THE WHOLE DATASET

TRAINING SET 60%: Fit various models and parameter combinations on this subset

VALIDATION SET 20%:
• Evaluate the models created with different parameters
• Estimate overfitting

TEST SET 20%: Use only once to get the final performance estimate

[Figure: the training/validation cycle, repeated several times]
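The three roles can be sketched as follows. The toy model (majority label per bin of a one-dimensional feature, with the number of bins as its capacity parameter), the synthetic data and all function names are assumptions chosen to keep the example short; they stand in for the decision tree and its parameters.

```python
import random

def three_way_split(samples, seed=0):
    """Shuffle a copy, then cut it into 60% training, 20% validation, 20% test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    a, b = n * 6 // 10, n * 8 // 10
    return shuffled[:a], shuffled[a:b], shuffled[b:]

def bin_index(x, n_bins):
    return min(int(x * n_bins), n_bins - 1)

def fit(training, n_bins):
    """Toy model: majority label (0 or 1) inside each of n_bins equal-width bins."""
    votes = [[0, 0] for _ in range(n_bins)]
    for x, y in training:
        votes[bin_index(x, n_bins)][y] += 1
    return [int(ones > zeros) for zeros, ones in votes]

def accuracy(model, samples):
    hits = sum(model[bin_index(x, len(model))] == y for x, y in samples)
    return hits / len(samples)

# Hypothetical noisy data: x in [0, 1), label 1 when x > 0.5, 15% of labels flipped.
rng = random.Random(42)
data = []
for _ in range(500):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.15:
        y = 1 - y
    data.append((x, y))

training, validation, test = three_way_split(data)

results = {}
for n_bins in (1, 2, 200):        # parameter values to compare
    model = fit(training, n_bins)  # fit on the training set only
    results[n_bins] = (accuracy(model, training), accuracy(model, validation))
    print(n_bins, results[n_bins])  # a large train/validation gap hints at overfitting

best = max(results, key=lambda n: results[n][1])  # choose by validation score
final_model = fit(training, best)
print("test accuracy:", accuracy(final_model, test))  # used once, at the very end
```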

CROSS-VALIDATION

THE WHOLE DATASET

TRAINING SET 60% + VALIDATION SET 20% → TRAINING SET 80%

What if we got an overly optimistic validation set?

Fix the parameter value you need to evaluate, say msl=15

[Figure: three example splits of the 80% TRAINING SET into TRAINING and VAL parts]

Repeat 10 times and take the average validation score over the 10 runs: it is a more stable estimate.
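A minimal sketch of this procedure as 10-fold cross-validation follows. The fold layout (every tenth sample goes to the same fold) and the `majority_baseline` stand-in model are assumptions; any function that trains on one part and returns a score on the other could be plugged in.

```python
def k_fold_splits(samples, n_folds=10):
    """Yield (training, validation) pairs; each fold serves as validation once."""
    folds = [samples[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield training, validation

def cross_val_score(samples, train_and_score, n_folds=10):
    """Average the validation score over all folds."""
    scores = [train_and_score(tr, va) for tr, va in k_fold_splits(samples, n_folds)]
    return sum(scores) / len(scores)

def majority_baseline(training, validation):
    """Hypothetical stand-in model: always predict the most common training label."""
    labels = [y for _, y in training]
    guess = max(set(labels), key=labels.count)
    return sum(y == guess for _, y in validation) / len(validation)

# Hypothetical data: 70 samples of class "a" followed by 30 samples of class "b".
data = [(i, "a" if i < 70 else "b") for i in range(100)]

print(cross_val_score(data, majority_baseline))  # approximately 0.7
```

Because every sample is used for validation exactly once, a single lucky or unlucky split cannot dominate the estimate.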

MACHINE LEARNING PIPELINE

• Take raw data
• Extract features
• Split into TRAINING and TEST
• Pick an algorithm and parameters
• Train on the TRAINING data
• Evaluate on the TRAINING data with CV
• Try out different algorithms and parameters (back to “Pick an algorithm and parameters”)
• Fix the best parameters
• Train on the whole TRAINING
• Evaluate on TEST
• Report final performance to the client

“So it is ~87%…erm… Could you do better?”

Yes
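The whole pipeline fits into a short script. The sketch below is one possible reading of the steps, using a hypothetical one-dimensional dataset and a small k-nearest-neighbor classifier in place of the algorithms discussed in the talk; the parameter being tuned is the number of neighbors `k`.

```python
import random

def knn_predict(training, x, k):
    """Majority label among the k training samples nearest to x (1-D feature)."""
    nearest = sorted(training, key=lambda s: abs(s[0] - x))[:k]
    labels = [y for _, y in nearest]
    return max(set(labels), key=labels.count)

def score(training, samples, k):
    return sum(knn_predict(training, x, k) == y for x, y in samples) / len(samples)

def cv_score(training, k, n_folds=5):
    """Average validation accuracy over n_folds splits of the TRAINING data."""
    folds = [training[i::n_folds] for i in range(n_folds)]
    total = 0.0
    for i, val in enumerate(folds):
        rest = [s for j, fold in enumerate(folds) if j != i for s in fold]
        total += score(rest, val, k)
    return total / n_folds

# Take raw data and extract features (here: hypothetical ready-made 1-D features
# with label 1 when x > 0.5 and about 15% of the labels flipped).
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5) if rng.random() > 0.15 else int(x <= 0.5)
    data.append((x, y))

# Split into TRAINING and TEST.
rng.shuffle(data)
training, test = data[:160], data[160:]

# Pick parameters, evaluate on TRAINING with CV, try out different values.
cv_results = {k: cv_score(training, k) for k in (1, 5, 15)}

# Fix the best parameters; k-NN "trains" on the whole TRAINING by storing it.
best_k = max(cv_results, key=cv_results.get)

# Evaluate on TEST once and report the final performance.
final_score = score(training, test, best_k)
print(cv_results, best_k, final_score)
```

The TEST samples play no part in choosing `best_k`, which is what keeps the final number honest.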

Pick another algorithm

RANDOM FOREST

Decision tree: pick best out of all features

Random forest:
• pick best out of random subset of features
• pick best out of another random subset of features
• pick best out of yet another random subset of features
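The idea can be sketched with a heavily simplified forest. Two assumptions go beyond the slides: each “tree” here is only a single split (a decision stump), and each one is trained on a bootstrap sample of the data, as in the standard random forest algorithm. All names and the synthetic data are hypothetical.

```python
import random
from collections import Counter

def best_stump(samples, feature_ids):
    """Pick the best (feature, threshold) among the given features only."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in samples}):
            left = Counter(y for x, y in samples if x[f] <= threshold)
            right = Counter(y for x, y in samples if x[f] > threshold)
            left_label = left.most_common(1)[0][0]
            right_label = right.most_common(1)[0][0] if right else left_label
            correct = left[left_label] + right[right_label]
            if best is None or correct > best[0]:
                best = (correct, f, threshold, left_label, right_label)
    return best[1:]

def train_forest(samples, n_trees=15, n_features=2, seed=0):
    rng = random.Random(seed)
    all_features = range(len(samples[0][0]))
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        subset = rng.sample(all_features, n_features)       # random subset of features
        forest.append(best_stump(bootstrap, subset))
    return forest

def predict(forest, x):
    """Every stump votes for a class; the majority wins."""
    votes = Counter(left if x[f] <= t else right for f, t, left, right in forest)
    return votes.most_common(1)[0][0]

def make_data(n, rng):
    """Hypothetical data: 4 features, only feature 0 decides the class."""
    rows = [[rng.random() for _ in range(4)] for _ in range(n)]
    return [(x, int(x[0] > 0.5)) for x in rows]

rng = random.Random(1)
training, test = make_data(100, rng), make_data(100, rng)

forest = train_forest(training)
test_accuracy = sum(predict(forest, x) == y for x, y in test) / len(test)
print(len(forest), test_accuracy)
```

Stumps whose random subset misses the informative feature vote almost at random, but the majority vote over many of them tends to cancel that noise out.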

[Figure: random forest diagram with the labels “instance” and “class”]

Happy client

ALL OTHER USE CASES

Raw data → Features → Target

Sound → Frequency components → Genre
Text → Bag of words → Topic
Image → Pixel values → Cat or dog
Video → Frame pixels → Walking or running
Database records → Census data → Average salary
Biometric data → … → Dead or alive
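For text, the feature-extraction step could look like the following bag-of-words sketch, which counts how often each word of a fixed vocabulary occurs. The vocabulary and the sentence are made up for illustration.

```python
from collections import Counter

def bag_of_words(text, vocabulary):
    """Turn a text into a vector of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = ["cat", "dog", "runs", "sleeps"]  # hypothetical fixed vocabulary
vector = bag_of_words("The cat sleeps and the other cat runs", vocabulary)
print(vector)  # [2, 0, 1, 1]
```

Once every text is a vector of the same length, the rest of the pipeline is the same as for the digit images.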

HANDS-ON SESSION

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
