
Page 1:

PAC Learning

adapted from

Tom M. Mitchell

Carnegie Mellon University

Page 2:

Learning Issues

Under what conditions is successful learning

… possible?

… assured for a particular learning algorithm?

Page 3:

Sample Complexity

How many training examples are needed

… for a learner to converge (with high probability) to a successful hypothesis?

Page 4:

Computational Complexity

How much computational effort is needed

… for a learner to converge (with high probability) to a successful hypothesis?

Page 5:

The world

X is the sample space

Example: Two dice: {(1,1), (1,2), …, (6,5), (6,6)}

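As a concrete illustration, here is a minimal sketch (not on the original slide) that enumerates this sample space:

```python
# Enumerate the two-dice sample space X as an explicit list of outcomes.
X = [(i, j) for i in range(1, 7) for j in range(1, 7)]

print(len(X))  # 36 outcomes: (1,1), (1,2), ..., (6,6)
print(X[:3])   # [(1, 1), (1, 2), (1, 3)]
```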

Page 6:

Weighted world

𝒟 is a probability distribution over X

Example: Biased dice: {(1,1; p11), (1,2; p12), …, (6,5; p65), (6,6; p66)}

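A sketch of sampling under such a distribution; the weights p11, …, p66 are unspecified on the slide, so the bias below is made up:

```python
import random

# The same sample space, now with a made-up bias toward high rolls
# standing in for the unspecified weights p11, ..., p66.
X = [(i, j) for i in range(1, 7) for j in range(1, 7)]
weights = [i + j for (i, j) in X]
p = [w / sum(weights) for w in weights]  # normalize so the p's sum to 1

# random.choices draws i.i.d. outcomes according to the given weights.
print(random.choices(X, weights=p, k=5))
```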

Page 7:

An event

E is a subset of X

Example: Two dice: {(1,1), (1,2), …, (6,5), (6,6)}


Page 8:

An event

E is a subset of X

Example: A pair in two dice: {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}

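A minimal sketch (not on the slide) of this event as an explicit subset:

```python
# The event "a pair" as a subset E of the two-dice sample space.
E = {(i, i) for i in range(1, 7)}  # {(1,1), (2,2), ..., (6,6)}

print((3, 3) in E)  # True: a pair
print((3, 4) in E)  # False: not a pair
```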

Page 9:

A Concept

c is the indicator function of an event E

Example: A pair in two dice: c(x, y) := (x == y)
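The same event viewed as an indicator function, transcribing the slide's c(x, y) := (x == y):

```python
# The concept c: 1 on outcomes in the event E ("a pair"), 0 elsewhere.
def c(x, y):
    return int(x == y)

print(c(3, 3), c(3, 4))  # 1 0
```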

Page 10:

A hypothesis

h is an approximation to a concept c

Example: A separating hyperplane

h(x, y) := 0.5 · [1 + sign(a·x + b·y + c)]

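A sketch of this hyperplane hypothesis; the parameters a, b, c are not given on the slide, so the values below are illustrative only:

```python
# h(x, y) := 0.5 * [1 + sign(a*x + b*y + c)], returning 1.0 or 0.0.
def sign(v):
    return 1 if v >= 0 else -1

def h(x, y, a=1.0, b=-1.0, c=0.0):  # a, b, c: illustrative guesses
    return 0.5 * (1 + sign(a * x + b * y + c))

print(h(3, 3), h(2, 5))  # 1.0 0.0
```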

Page 11:

The dataset

D is an i.i.d. sample from (X, 𝒟):

{<x_i, c(x_i)>}, i = 1, …, m

m examples
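A sketch of drawing such a dataset, using a uniform distribution and the "pair" concept for concreteness:

```python
import random

# Draw m labeled examples <x_i, c(x_i)> i.i.d. from (X, D).
def c(x, y):
    return int(x == y)

X = [(i, j) for i in range(1, 7) for j in range(1, 7)]
m = 10
D = [((x, y), c(x, y)) for (x, y) in random.choices(X, k=m)]
print(D)  # e.g. [((2, 5), 0), ((4, 4), 1), ...]
```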

Page 12:

An Inductive learner

L is an algorithm that uses the data D to produce a hypothesis h ∈ H

Example: The Perceptron Algorithm

h(x, y) := 0.5 · [1 + sign(a(D)·x + b(D)·y + c(D))]

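A minimal perceptron sketch for 0/1 labels; the learning rate and epoch count are choices of this sketch, not of the slide:

```python
# Learn hyperplane parameters a(D), b(D), c(D) from a labeled dataset D
# of the form [((x, y), label), ...] with labels in {0, 1}.
def perceptron(D, epochs=100, lr=0.1):
    a, b, c = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x, y), label in D:
            pred = 1 if a * x + b * y + c >= 0 else 0
            err = label - pred      # -1, 0, or +1
            a += lr * err * x       # classic perceptron update
            b += lr * err * y
            c += lr * err
    return a, b, c
```

Calling `a, b, c = perceptron(D)` with the dataset sketch above yields the fitted coefficients a(D), b(D), c(D).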

Page 13:

Error Measures

Training error of hypothesis h

How often h(x) ≠ c(x) over the training instances

True error of hypothesis h

How often h(x) ≠ c(x) over future instances drawn at random from 𝒟
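A sketch contrasting the two measures; since we can resample here, the true error is estimated by Monte Carlo over fresh draws:

```python
import random

# Training error: fraction of the training set D that h gets wrong.
def training_error(h, D):
    return sum(h(x, y) != label for (x, y), label in D) / len(D)

# True error: probability of disagreement with c on fresh instances
# from the distribution (here estimated with n uniform draws from X).
def estimated_true_error(h, c, X, n=10_000):
    draws = random.choices(X, k=n)
    return sum(h(x, y) != c(x, y) for (x, y) in draws) / n
```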

Page 14:

True error
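The body of this slide (an equation image) did not survive extraction; in Mitchell's standard notation, the definition it presents is:

```latex
% True error of h with respect to target concept c and distribution D:
% the probability that h and c disagree on an instance drawn from D.
\mathrm{error}_{\mathcal{D}}(h) \;\equiv\; \Pr_{x \sim \mathcal{D}}\!\left[\, c(x) \neq h(x) \,\right]
```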


Page 16:

Learnability

How should we describe learnability?

One idea: the number of training examples needed to learn a hypothesis h for which error_𝒟(h) = 0.

Infeasible: unless the learner sees every instance in X, some instances it must classify are never observed, and the randomly drawn training examples may themselves be misleading.

Page 17:

PAC Learnability

Weaken the demands on the learner:

bound the true error by an accuracy parameter ε

allow failure with probability at most δ

ε and δ can be made arbitrarily small

Probably Approximately Correct (PAC) Learning

Page 18:

PAC Learnability

C is PAC-learnable by L if:

true error < ε, with probability at least (1 − δ), after a reasonable number of examples and a reasonable time per example

Reasonable: polynomial in 1/ε, 1/δ, n (the size of examples), and the encoding length of the target concept
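For context, a standard instantiation of this "reasonable number of examples" from Mitchell's treatment of finite hypothesis spaces (not stated on this slide): for a learner that outputs a hypothesis consistent with all training examples, it suffices that

```latex
% Sufficient training-set size for a consistent learner over finite H:
m \;\geq\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```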

Page 19:

PAC Learnability

Pr[ error_𝒟(h) ≤ ε ] ≥ 1 − δ

Page 20:

C is PAC-Learnable

each target concept in C can be learned from a polynomial number of training examples

the processing time per example is also polynomially bounded

polynomial in 1/ε, 1/δ, n (the size of examples), and the encoding length of the target concept c
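As a usage sketch of the finite-H bound given earlier, with illustrative numbers (none of these values appear on the slides):

```python
import math

# m >= (1/eps) * (ln|H| + ln(1/delta)) for a consistent learner, finite H.
def sample_bound(eps, delta, h_size):
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

# e.g. |H| = 2**10 hypotheses, 90% accuracy, 95% confidence:
print(sample_bound(eps=0.1, delta=0.05, h_size=2**10))  # 100
```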