

Page 1: VC Dimension - Carnegie Mellon School of Computer Scienceguestrin/Class/15781/slides/learningtheory-bns... · ©2005-2007 Carlos Guestrin 1 VC Dimension Machine Learning – 10701/15781

©2005-2007 Carlos Guestrin 1

VC Dimension

Machine Learning – 10701/15781, Carlos Guestrin, Carnegie Mellon University

October 29th, 2007

©2005-2007 Carlos Guestrin 2

What about continuous hypothesis spaces?

Continuous hypothesis space: |H| = ∞. Infinite variance???

As with decision trees, only care about the maximum number of points that can be classified exactly!


©2005-2007 Carlos Guestrin 3

How many points can a linear boundary classify exactly? (1-D)
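A quick way to see the answer is brute force. The sketch below (hypothetical helper code, not from the slides) enumerates every labeling a 1-D threshold classifier h(x) = s · sign(x − t) can realize: 2 points are shattered, but 3 points on a line are not, since a single threshold can only produce labelings that flip sign at most once.

```python
def threshold_predict(x, s, t):
    # A 1-D "linear boundary": predict s if x is right of the threshold t, else -s
    return s if x > t else -s

def achievable_labelings(points):
    # Only thresholds below, between, and above the sorted points matter
    xs = sorted(points)
    thresholds = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    labelings = set()
    for s in (+1, -1):
        for t in thresholds:
            labelings.add(tuple(threshold_predict(x, s, t) for x in points))
    return labelings

def is_shattered(points):
    return len(achievable_labelings(points)) == 2 ** len(points)

print(is_shattered([0.0, 1.0]))        # True: 2 points are shattered
print(is_shattered([0.0, 1.0, 2.0]))   # False: e.g. (+1, -1, +1) is unreachable
```

So the VC dimension of 1-D linear boundaries (with sign flip) is 2.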

©2005-2007 Carlos Guestrin 4

How many points can a linear boundary classify exactly? (2-D)


©2005-2007 Carlos Guestrin 5

How many points can a linear boundary classify exactly? (d-D)

©2005-2007 Carlos Guestrin 6

PAC bound using VC dimension

Number of training points that can be classified exactly is VC dimension!!! Measures relevant size of hypothesis space, as with decision trees with k leaves


©2005-2007 Carlos Guestrin 7

Shattering a set of points

©2005-2007 Carlos Guestrin 8

VC dimension


©2005-2007 Carlos Guestrin 9

PAC bound using VC dimension

Number of training points that can be classified exactly is VC dimension!!! Measures relevant size of hypothesis space, as with decision trees with k leaves. Bound for infinite-dimensional hypothesis spaces:
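The bound referenced on this slide was lost in extraction; the standard form of the VC-based PAC bound (Vapnik's result, as typically quoted in learning theory courses) is: with probability at least 1 − δ over m training examples,

```latex
\text{error}_{\text{true}}(h) \;\le\; \text{error}_{\text{train}}(h) \;+\; \sqrt{\frac{\operatorname{VC}(H)\left(\ln\frac{2m}{\operatorname{VC}(H)} + 1\right) + \ln\frac{4}{\delta}}{m}}
```

Note that VC(H) plays exactly the role ln |H| played for finite hypothesis spaces.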

©2005-2007 Carlos Guestrin 10

Examples of VC dimension

Linear classifiers: VC(H) = d+1, for d features plus constant term b

Neural networks: VC(H) = #parameters. Local minima mean NNs will probably not find the best parameters

1-Nearest neighbor?
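The slide's question about 1-nearest neighbor has a short answer: every training point is its own nearest neighbor, so 1-NN fits any labeling of any training set perfectly. It therefore shatters every set and VC(H) = ∞. A minimal sketch (hypothetical code, not from the slides):

```python
import random

def one_nn_predict(x, train_x, train_y):
    # Predict the label of the closest training point (1-D distance for simplicity)
    i = min(range(len(train_x)), key=lambda j: abs(x - train_x[j]))
    return train_y[i]

# 1-NN attains zero training error for every labeling, hence shatters every set
random.seed(0)
train_x = [0.0, 1.0, 2.5, 4.0, 7.5]
for _ in range(10):
    train_y = [random.choice([-1, +1]) for _ in train_x]
    assert all(one_nn_predict(x, train_x, train_y) == y
               for x, y in zip(train_x, train_y))
print("1-NN fits every labeling of the training set")
```

This is why the VC bound is vacuous for 1-NN: zero training error tells us nothing about true error.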


©2005-2007 Carlos Guestrin 11

Another VC dim. example - What can we shatter? What’s the VC dim. of decision stumps in 2d?

©2005-2007 Carlos Guestrin 12

Another VC dim. example - What can’t we shatter? What’s the VC dim. of decision stumps in 2d?
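One way to settle the question from these two slides is to enumerate. The sketch below (hypothetical code, not from the slides) lists every labeling an axis-aligned decision stump h(p) = s · sign(p[j] − t) can produce. A well-chosen set of 3 points is shattered; no set of 4 points can be, because each axis yields at most 8 distinct "flip-at-most-once" patterns on 4 ordered points, so two axes give at most 14 < 16 labelings.

```python
from itertools import product

def stump_labelings(points):
    # All labelings achievable by stumps: pick axis j, sign s, threshold t
    labelings = set()
    for j in (0, 1):
        vals = sorted({p[j] for p in points})
        mids = [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        thresholds = [vals[0] - 1] + mids + [vals[-1] + 1]
        for s, t in product((+1, -1), thresholds):
            labelings.add(tuple(s if p[j] > t else -s for p in points))
    return labelings

def is_shattered(points):
    return len(stump_labelings(points)) == 2 ** len(points)

three = [(0, 0), (1, 2), (2, 1)]            # shattered: 8 of 8 labelings
four = [(0, 0), (1, 2), (2, 1), (3, 3)]     # not shattered: at most 14 of 16
print(is_shattered(three))  # True
print(is_shattered(four))   # False
```

So the VC dimension of decision stumps in 2d is 3.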


©2005-2007 Carlos Guestrin 13

What you need to know

Finite hypothesis space: derive results by counting the number of hypotheses and mistakes on training data

Complexity of the classifier depends on the number of points that can be classified exactly: finite case – decision trees; infinite case – VC dimension

Bias-variance tradeoff in learning theory. Remember: will your algorithm find the best classifier?

Questions

Discussion board

Privacy

©2005-2007 Carlos Guestrin 14


©2005-2007 Carlos Guestrin 15

Bayesian Networks – Representation

Machine Learning – 10701/15781, Carlos Guestrin, Carnegie Mellon University

October 29th, 2007

©2005-2007 Carlos Guestrin 16

Handwriting recognition

Character recognition, e.g., kernel SVMs

[Figure: examples of handwritten characters to be recognized]


©2005-2007 Carlos Guestrin 17

Webpage classification

Company home page

vs

Personal home page

vs

University home page

vs

©2005-2007 Carlos Guestrin 18

Handwriting recognition 2


©2005-2007 Carlos Guestrin 19

Webpage classification 2

©2005-2007 Carlos Guestrin 20

Today – Bayesian networks

One of the most exciting advancements in statistical AI in the last 10-15 years

Generalizes naïve Bayes and logistic regression classifiers

Compact representation for exponentially-large probability distributions

Exploit conditional independencies


©2005-2007 Carlos Guestrin 21

Causal structure

Suppose we know the following: the flu causes sinus inflammation; allergies cause sinus inflammation; sinus inflammation causes a runny nose; sinus inflammation causes headaches.

How are these connected?

©2005-2007 Carlos Guestrin 22

Possible queries

[Graph: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]

Inference

Most probable explanation

Active data collection


©2005-2007 Carlos Guestrin 23

Car starts BN

18 binary attributes

Inference P(BatteryAge|Starts=f)

2^16 terms, why so fast? Not impressed?

HailFinder BN – more than 3^54 = 58149737003040059690390169 terms

©2005-2007 Carlos Guestrin 24

Factored joint distribution – Preview

[Graph: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]
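The factorization this slide previews follows from the causal structure stated earlier (flu and allergies cause sinus inflammation, which causes headaches and a runny nose): each variable depends only on its parents in the graph,

```latex
P(F, A, S, H, N) \;=\; P(F)\, P(A)\, P(S \mid F, A)\, P(H \mid S)\, P(N \mid S)
```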


©2005-2007 Carlos Guestrin 25

Number of parameters

[Graph: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]
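The slide's point can be checked by counting. For five binary variables, a full joint table needs 2^5 − 1 = 31 independent numbers, while the factored form needs only one free parameter per parent assignment of each variable. A small sketch (assuming all variables are binary):

```python
# Parents in the flu network: Flu -> Sinus <- Allergy, Sinus -> Headache, Sinus -> Nose
parents = {"Flu": [], "Allergy": [], "Sinus": ["Flu", "Allergy"],
           "Headache": ["Sinus"], "Nose": ["Sinus"]}

# Full joint over n binary variables: 2^n - 1 free parameters
full_joint = 2 ** len(parents) - 1

# Factored: each binary variable needs one free parameter per parent assignment
factored = sum(2 ** len(ps) for ps in parents.values())

print(full_joint, factored)  # 31 10
```

The gap grows exponentially with the number of variables, which is the whole point of the representation.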

©2005-2007 Carlos Guestrin 26

Key: Independence assumptions

[Graph: Flu → Sinus ← Allergy; Sinus → Headache; Sinus → Nose]

Knowing sinus separates the variables from each other


©2005-2007 Carlos Guestrin 27

(Marginal) Independence

Flu and Allergy are (marginally) independent

More Generally:

[Two tables indexed by Flu = t / Flu = f (columns) and Allergy = t / Allergy = f (rows)]

©2005-2007 Carlos Guestrin 28

Marginally independent random variables

Sets of variables X, Y. X is independent of Y if:

P ⊨ (X = x ⊥ Y = y), ∀ x ∈ Val(X), y ∈ Val(Y)

Shorthand: marginal independence: P ⊨ (X ⊥ Y)

Proposition: P satisfies (X ⊥ Y) if and only if P(X, Y) = P(X) P(Y)
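The proposition is easy to check numerically. The sketch below (hypothetical example distributions, not from the slides) builds one joint distribution as a product of marginals and one with dependence, and tests the factorization P(X, Y) = P(X) P(Y) cell by cell:

```python
def marginals(joint):
    # joint[(x, y)] -> P(x, y) over binary x, y; compute P(x) and P(y)
    px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
    py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}
    return px, py

def is_independent(joint, tol=1e-9):
    # Check P(x, y) == P(x) * P(y) for every cell
    px, py = marginals(joint)
    return all(abs(joint[(x, y)] - px[x] * py[y]) < tol
               for x in (0, 1) for y in (0, 1))

# Independent by construction: product of P(X) = (0.9, 0.1) and P(Y) = (0.8, 0.2)
indep = {(x, y): [0.9, 0.1][x] * [0.8, 0.2][y] for x in (0, 1) for y in (0, 1)}
# Dependent: probability mass concentrated on x == y
dep = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}

print(is_independent(indep))  # True
print(is_independent(dep))    # False
```

Note the dependent example has uniform marginals (each 0.5/0.5), so dependence only shows up in the joint, not in either marginal alone.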