
Introduction to Machine Learning CMU-10701

11. Learning Theory

Barnabás Póczos

We have explored many ways of learning from data. But…

– How good is our classifier, really?

– How much data do we need to make it “good enough”?

Learning Theory

Please ask questions and give us feedback!

Review of what we have learned so far

Notation

This is what the learning algorithm produces.

We will need these definitions, please copy them!

Big Picture

[Figure: decomposition of the excess risk into the estimation error and the approximation error, relative to the Bayes risk.]

Ultimate goal: make the risk of the learned classifier approach the Bayes risk, i.e., drive both the estimation error and the approximation error to zero.
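A standard way to write this decomposition, assuming the usual notation (R(f) is the true risk of a classifier f, R* is the Bayes risk, \mathcal{F} is the model class, and \hat{f}_n is the classifier produced by the learning algorithm):

\[
R(\hat{f}_n) - R^{\ast} \;=\; \underbrace{\Big( R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \Big)}_{\text{estimation error}} \;+\; \underbrace{\Big( \inf_{f \in \mathcal{F}} R(f) - R^{\ast} \Big)}_{\text{approximation error}}
\]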


Big Picture: Illustration of Risks

[Figure: illustration of the risks, with an upper bound on the estimation error.]

Goal of Learning:
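A common way to make the pictured upper bound explicit, assuming \hat{f}_n minimizes the empirical risk \hat{R}_n over \mathcal{F}:

\[
R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \;\le\; 2 \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big|,
\]

so the goal of learning (for the estimation error) reduces to showing that the right-hand side goes to 0 as n grows.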

11. Learning Theory


Outline

These results (see the theorem below) are useless if N is big or infinite (e.g., the class of all possible hyperplanes).

Today we will see how to fix this with the shattering coefficient and VC dimension.

Theorem: from Hoeffding’s inequality, we have seen that
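The result referred to here is presumably the Hoeffding-plus-union-bound theorem for a finite class; one standard form, for N = |\mathcal{F}| classifiers and a loss in [0, 1]:

\[
\Pr\Big( \max_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \varepsilon \Big) \;\le\; 2N e^{-2n\varepsilon^{2}}.
\]

The log N term this produces in sample-size bounds is exactly what blows up when N is huge or infinite.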

After this fix, we will also be able to say something meaningful about the classifier the learning algorithm actually produces and its true risk.

Hoeffding inequality


Theorem:
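The theorem is presumably Hoeffding's inequality; a standard statement, specialized to a single fixed classifier f with 0-1 loss (so the empirical risk is an average of n i.i.d. terms in [0, 1]):

\[
\Pr\big( | \hat{R}_n(f) - R(f) | > \varepsilon \big) \;\le\; 2 e^{-2n\varepsilon^{2}}.
\]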

Observation:

McDiarmid’s Bounded Difference Inequality
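A standard statement of the inequality named in this title: for independent X_1, …, X_n and a function g satisfying the bounded difference condition (changing the i-th argument changes g by at most c_i),

\[
\Pr\big( \big| g(X_1,\dots,X_n) - \mathbb{E}\, g(X_1,\dots,X_n) \big| \ge \varepsilon \big) \;\le\; 2 \exp\!\Big( \frac{-2\varepsilon^{2}}{\sum_{i=1}^{n} c_i^{2}} \Big).
\]

Hoeffding's inequality can be recovered as the special case where g is an average of bounded independent variables.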

It follows that


Bounded Difference Condition

Our main goal is to bound:

Lemma: Let g denote the following function:

Observation:

Proof:

⇒ McDiarmid can be applied to g!
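The function g here is presumably the worst-case deviation between empirical and true risk; a sketch, assuming 0-1 loss and samples Z_i = (X_i, Y_i):

\[
g(Z_1,\dots,Z_n) \;=\; \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big|.
\]

Changing a single sample Z_i changes each \hat{R}_n(f) by at most 1/n, hence changes the supremum by at most 1/n, so the bounded difference condition holds with c_i = 1/n and McDiarmid yields

\[
\Pr\big( | g - \mathbb{E}\, g | \ge \varepsilon \big) \;\le\; 2 e^{-2n\varepsilon^{2}}.
\]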

Bounded Difference Condition

Corollary: g concentrates around its expected value.

Concentration and Expected Value

So what remains is to bound the expected value of g. The Vapnik-Chervonenkis inequality does that with the shatter coefficient (and VC dimension)!

Vapnik-Chervonenkis inequality


We already know:

Vapnik-Chervonenkis inequality:

Corollary: Vapnik-Chervonenkis theorem:

Our main goal is to bound
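The two results named here are classical; hedged common forms (constants differ between textbooks), with S_F(n) denoting the shatter coefficient:

Vapnik-Chervonenkis inequality (expectation form):
\[
\mathbb{E}\, \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| \;\le\; 2 \sqrt{ \frac{2 \log\big( 2 S_{\mathcal{F}}(n) \big)}{n} }.
\]

Vapnik-Chervonenkis theorem (high-probability form):
\[
\Pr\Big( \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \varepsilon \Big) \;\le\; 8\, S_{\mathcal{F}}(n)\, e^{-n\varepsilon^{2}/32}.
\]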

Shattering


How many points can a linear boundary classify exactly in 1D?

[Figure: with 2 points there is a placement such that all labelings can be classified; with 3 points, a labeling such as -, +, - cannot be realized by a single threshold.]

The answer is 2.

How many points can a linear boundary classify exactly in 2D?

[Figure: with 3 points there is a placement such that all labelings can be classified; with 4 points, for every placement some labeling cannot be separated by a line.]

The answer is 3.

How many points can a linear boundary classify exactly in d-dim?

The answer is d+1.

How many points can a linear boundary classify exactly in 3D? The answer is 4.

[Figure: 4 points placed at the vertices of a tetrahedron can be given any labeling by a plane.]

Growth function, Shatter coefficient

Definition

Behaviors of 7 classifiers on 3 sample points (one row per classifier):

0 0 0
0 1 0
1 1 1
1 0 0
0 1 1
0 1 0
1 1 1

(= 5 distinct behaviors in this example)

Growth function, Shatter coefficient: the maximum number of behaviors on n points.
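A standard way to make "maximum number of behaviors" precise, for a class \mathcal{F} of binary classifiers:

\[
S_{\mathcal{F}}(n) \;=\; \max_{x_1, \dots, x_n} \Big| \big\{ \big( f(x_1), \dots, f(x_n) \big) : f \in \mathcal{F} \big\} \Big|,
\]

i.e., the largest number of distinct label vectors the class can produce on any n points; the matrix above shows 5 such behaviors on 3 points.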


Example: Half spaces in 2D

[Figure: different half-space classifiers producing different +/- behaviors on a small set of points.]

VC-dimension

Definition (Growth function, Shatter coefficient): the maximum number of behaviors on n points.

Definition: Shattering

Definition: VC-dimension

[Figure: # behaviors plotted against n.]
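Standard formal versions of the two definitions named above:

Shattering: a point set {x_1, …, x_n} is shattered by \mathcal{F} if the class realizes all 2^n labelings of it, i.e.
\[
\big| \big\{ \big( f(x_1), \dots, f(x_n) \big) : f \in \mathcal{F} \big\} \big| = 2^{n}.
\]

VC-dimension: the size of the largest point set that can be shattered,
\[
d_{VC}(\mathcal{F}) \;=\; \max \{\, n : S_{\mathcal{F}}(n) = 2^{n} \,\}
\]
(infinite if arbitrarily large sets can be shattered).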



Examples


VC dim of decision stumps (axis aligned linear separator) in 2d

What’s the VC dim. of decision stumps in 2d?

[Figure: three points that an axis-aligned decision stump can label with every +/- pattern.]

There is a placement of 3 pts that can be shattered ⇒ VC dim ≥ 3
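The figure itself is not recoverable, but the claim is easy to verify by brute force. A minimal sketch (not from the slides; the three points are a hypothetical placement and the helper name is illustrative): enumerate every behavior an axis-aligned stump can produce on the points and check that all 2^3 = 8 labelings occur.

from itertools import product

# hypothetical placement of 3 points in 2D (an assumption, not from the slides)
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]

def stump_behaviors(pts):
    # Every labeling an axis-aligned decision stump (threshold on one coordinate,
    # either orientation) can produce on pts.
    behaviors = set()
    for axis in (0, 1):
        coords = sorted({p[axis] for p in pts})
        # thresholds strictly below, between, and above the observed coordinates
        thresholds = [coords[0] - 1.0]
        thresholds += [(a + b) / 2.0 for a, b in zip(coords, coords[1:])]
        thresholds += [coords[-1] + 1.0]
        for t in thresholds:
            for sign in (1, -1):
                behaviors.add(tuple(sign if p[axis] > t else -sign for p in pts))
    return behaviors

realized = stump_behaviors(points)
all_labelings = set(product((-1, 1), repeat=len(points)))
print(all_labelings <= realized)  # True: all 8 labelings are realized, so the 3 points are shattered

For 4 points no placement works (some labeling always fails), which is what the next slide argues by cases.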


VC dim of decision stumps (axis aligned linear separator) in 2d

What’s the VC dim. of decision stumps in 2d? If the VC dim is 3, then for every placement of 4 pts there must exist a labeling that can’t be realized.

[Figure: the three cases (3 points collinear; 1 point in the convex hull of the other 3; a quadrilateral), each with a labeling that no stump can produce.]

VC dim. of axis parallel rectangles in 2d

What’s the VC dim. of axis parallel rectangles in 2d?

[Figure: three points shattered by axis-parallel rectangles.]

There is a placement of 3 pts that can be shattered ⇒ VC dim ≥ 3

VC dim. of axis parallel rectangles in 2d

There is a placement of 4 pts that can be shattered ⇒ VC dim ≥ 4 (e.g., four points arranged in a diamond, one extreme in each axis direction).

VC dim. of axis parallel rectangles in 2d

What’s the VC dim. of axis parallel rectangles in 2d?

If the VC dim is 4, then for every placement of 5 pts there must exist a labeling that can’t be realized.

[Figure: the cases (4 points collinear; 2 points in the convex hull; 1 point in the convex hull; a pentagon), each with a labeling that no axis-parallel rectangle can produce.]

(For any 5 points one can, for example, label the topmost, bottommost, leftmost and rightmost points + and a remaining point -: every axis-parallel rectangle containing the four extreme points also contains the fifth.)

Sauer’s Lemma


The VC dimension can be used to upper bound the shattering coefficient.

Sauer’s lemma:

Corollary:

We already know that the shatter coefficient is at most 2^n [exponential in n]. Sauer’s lemma replaces this with a bound that is polynomial in n whenever the VC dimension is finite.
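A standard statement, with d the VC dimension of the class:

Sauer's lemma:
\[
S_{\mathcal{F}}(n) \;\le\; \sum_{i=0}^{d} \binom{n}{i}.
\]

Corollary: the right-hand side is at most (n+1)^d, and at most (en/d)^d once n ≥ d, so the shatter coefficient grows only polynomially in n whenever d is finite.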

Proof of Sauer’s Lemma

Write all different behaviors on a sample (x1,x2,…xn) in a matrix:

0 0 0
0 1 0
1 1 1
1 0 0
0 1 1

(5 distinct behaviors; each row is one behavior on the 3 sample points)

We will prove that the number of rows of this matrix (the number of distinct behaviors) is at most the number of subsets of columns that the matrix shatters, which in turn is at most the Sauer bound. The two steps are Lemma 2 and Lemma 1 below.

Lemma 1: the number of subsets of columns shattered by the matrix is at most the Sauer bound. In this example: 6 ≤ 1 + 3 + 3 = 7.

Lemma 2: the number of rows is at most the number of shattered subsets of columns, for any binary matrix with no repeated rows. In this example: 5 ≤ 6.
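For concreteness, the shattered column subsets of the 5-row example matrix can be listed by hand: the empty set (trivially), each single column {1}, {2}, {3}, and the pairs {1,2} and {1,3}; the pair {2,3} misses the pattern (0,1) and the full set {1,2,3} cannot be shattered with only 5 rows. So

\[
\#\text{rows} = 5 \;\le\; \#\text{shattered subsets} = 6 \;\le\; \binom{3}{0} + \binom{3}{1} + \binom{3}{2} = 7.
\]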

Proof of Lemma 1

By the definition of the VC dimension d, no set of more than d columns (sample points) can be shattered, so every shattered subset of columns has size at most d, and the number of such subsets is at most C(n,0) + C(n,1) + … + C(n,d). In this example: 6 ≤ 1 + 3 + 3 = 7.

Proof of Lemma 2

Lemma 2: the number of rows is at most the number of shattered subsets of columns, for any binary matrix with no repeated rows.

Proof by induction on the number of columns.

Base case: A has one column. There are three cases (the set of rows is {0}, {1}, or {0, 1}) ⇒ 1 ≤ 1, 1 ≤ 1, and 2 ≤ 2 respectively.

Inductive case: A has at least two columns. We have a decomposition of the rows, the inductive hypothesis applies to matrices with fewer columns, and the counts combine because shattered subsets of the smaller matrices give distinct shattered subsets of A (a sketch follows below).
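A sketch of the standard induction step this argument appears to follow, with A' and A'' as hypothetical names for the two sub-matrices: delete the last column of A and keep the distinct rows to get A'; let A'' consist of those shortened rows that occurred with both a 0 and a 1 in the deleted column. Then

\[
\#\text{rows}(A) = \#\text{rows}(A') + \#\text{rows}(A'') \;\le\; \#\text{shat}(A') + \#\text{shat}(A'') \;\le\; \#\text{shat}(A),
\]

where the first inequality is the inductive hypothesis (fewer columns) and the second holds because every subset shattered in A' is also shattered in A, every subset S shattered in A'' yields the shattered subset S ∪ {last column} of A, and these two families are disjoint.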

Vapnik-Chervonenkis inequality


Vapnik-Chervonenkis inequality:

From Sauer’s lemma:

Since

Therefore,

[We don’t prove this]
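A sketch of the chain this slide presumably combines, using the hedged forms stated earlier:

\[
\mathbb{E}\, \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| \;\le\; 2 \sqrt{ \frac{2 \log\big( 2 S_{\mathcal{F}}(n) \big)}{n} } \;\le\; 2 \sqrt{ \frac{2 \log 2 + 2 d \log (n+1)}{n} } \;\longrightarrow\; 0,
\]

since Sauer's lemma gives S_F(n) ≤ (n+1)^d; so for any class with finite VC dimension the expected worst-case deviation vanishes as n → ∞.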

Estimation error

Linear (hyperplane) classifiers


Estimation error

We already know that

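What "we already know" presumably refers to the chain on the previous slide; applied to hyperplane classifiers in d dimensions, whose VC dimension is d+1 (see the shattering slides), a sketch:

\[
S_{\mathcal{F}}(n) \le (n+1)^{d+1} \;\Longrightarrow\; \mathbb{E}\, \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| \;\le\; 2 \sqrt{ \frac{2 \log 2 + 2 (d+1) \log (n+1)}{n} } \;\longrightarrow\; 0,
\]

so the estimation error of empirical risk minimization over hyperplanes goes to 0 as n → ∞ for any fixed d.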

Vapnik-Chervonenkis Theorem


We already know from McDiarmid:

Corollary: Vapnik-Chervonenkis theorem: [We don’t prove them]

Vapnik-Chervonenkis inequality:

Hoeffding + Union bound for finite function class:

PAC Bound for the Estimation Error


VC theorem:

Inversion:

Estimation error
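One way to do the inversion, starting from the hedged VC-theorem form stated earlier: set 8 S_F(n) e^{-n\varepsilon^2/32} = δ and solve for ε. Then with probability at least 1 − δ,

\[
\sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| \;\le\; \sqrt{ \frac{32 \big( \log S_{\mathcal{F}}(n) + \log (8/\delta) \big)}{n} },
\]

and combining with the estimation-error bound from the "Goal of Learning" slide,

\[
R(\hat{f}_n) - \inf_{f \in \mathcal{F}} R(f) \;\le\; 2 \sqrt{ \frac{32 \big( \log S_{\mathcal{F}}(n) + \log (8/\delta) \big)}{n} }.
\]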

Structural Risk Minimization

Ultimate goal: drive both the estimation error and the approximation error to zero (relative to the Bayes risk).

So far we studied when the estimation error → 0, but we also want the approximation error → 0.

Many different variants exist; they penalize too complex models to avoid overfitting.
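A generic form of such a penalized objective, written for nested classes F_1 ⊂ F_2 ⊂ … (the penalty pen(k, n) is a placeholder; concrete choices depend on the variant):

\[
\hat{f}_n \;=\; \arg\min_{k} \; \min_{f \in \mathcal{F}_k} \; \Big( \hat{R}_n(f) + \mathrm{pen}(k, n) \Big),
\]

where pen(k, n) typically grows with the capacity (e.g., the VC dimension) of F_k and shrinks with n, so that a richer class is chosen only when the data support it.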

What you need to know

The complexity of the classifier class depends on the number of points that can be classified exactly (shattered)

Finite case – number of hypotheses

Infinite case – Shattering coefficient, VC dimension

PAC bounds on true error in terms of empirical/training error and complexity of hypothesis space

Empirical and Structural Risk Minimization


Thanks for your attention
