
Antoine Cornuéjols
AgroParisTech – INRA MIA 518
[email protected]

Course: Learning theory and advanced Machine Learning

The course

• Documents
  – The book: "L'apprentissage artificiel. Deep Learning, concepts et algorithmes", A. Cornuéjols, L. Miclet & V. Barra, Eyrolles, 3rd ed., 2018
  – Slides and further information at:

Outline of the course

• Induction: how does it work? Which guarantees can we get? The no-free-lunch theorem
• Building an inductive criterion: semi-supervised learning; learning sparse models
• Online learning: theory (new inductive criteria); in practice (heuristic inductive criteria), e.g. early classification of time series and LUPI
• Transfer learning: scenarios; which information to exchange? How to obtain guarantees?
• Ensemble methods: what kinds of algorithms? Which information to exchange? And in the unsupervised case?


Course organization

• 6 courses
• 1 seminar-like session: discussion of papers

• 5 quizzes (5 × 5 = 25%)
• Project: 50%
  – 19/12/2019: description of the chosen project (2 pages)
  – 31/01/2020: mid-term report (5 to 8 pages)
  – 28/02/2020: final report (10 pages strict, ICML paper format)
• Critical review of papers: 25%

A. Cornuéjols
AgroParisTech – INRA MIA 518

Reflections on INDUCTION-S

http://www.agroparistech.fr/ufr-info/membres/cornuejols/Teaching/Master-AIC/M2-AIC-advanced-ML.html

Outline

1. Inductions
2. The statistical theory of learning
3. Other scenarios
4. The no-free-lunch theorem
5. Explanation-Based Learning: what kind of validation?
6. Questions

Supervised induction

Learning by heart

When there are few data points

• Learning a table

  Example | x1 | x2 | x3 | x4 | Label
  1       | 0  | 0  | 1  | 0  | 0
  2       | 0  | 1  | 0  | 0  | 0
  3       | 0  | 0  | 1  | 1  | 1
  4       | 1  | 0  | 0  | 1  | 1
  5       | 0  | 1  | 1  | 0  | 0
  6       | 1  | 1  | 0  | 0  | 0
  7       | 0  | 1  | 0  | 1  | 0
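To make "learning by heart" concrete, here is a minimal sketch (mine, not the slides') of a pure lookup-table learner: it stores the table verbatim, answers perfectly on the memorized rows, and has nothing to say elsewhere.

```python
# "Learning by heart": the table above stored as a dictionary.
table = {
    (0, 0, 1, 0): 0,
    (0, 1, 0, 0): 0,
    (0, 0, 1, 1): 1,
    (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0,
    (1, 1, 0, 0): 0,
    (0, 1, 0, 1): 0,
}

def predict(x):
    """Return the memorized label, or None for an input never seen."""
    return table.get(tuple(x))

print(predict((0, 0, 1, 1)))  # 1 -- memorized
print(predict((1, 1, 1, 1)))  # None -- rote learning cannot generalize
```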


When there is a huge number of data points

• Learning a function f : x → y

But how? Which function?

Supervised learning: simple or not so simple?

One example that tells a lot…

• Examples are described using: number (1 or 2); size (small or large); shape (circle or square); color (red or green)
• They belong either to class '+' or to class '−'

  Description             Your answer   True answer
  1 large red square                    −
  1 large green square                  +
  2 small red squares                   +
  2 large red circles                   −
  1 large green circle                  +
  1 small red circle                    +
  1 small green square                  −
  1 small red square                    +
  2 large green squares                 +

Yet another exercise

• Examples are described using: number (1 or 2); size (small or large); shape (circle or square); color (red or green)
• They belong either to class '+' or to class '−'


One example that tells a lot…

• Examples are described using: number (1 or 2); size (small or large); shape (circle or square); color (red or green)

  Description             Your prediction   True class
  1 large red square                        −
  1 large green square                      +
  2 small red squares                       +
  2 large red circles                       −
  1 large green circle                      +
  1 small red circle                        +

How many possible functions altogether from X to Y? With 4 binary descriptors there are 2^4 = 16 possible inputs, hence $2^{2^4} = 2^{16} = 65{,}536$ functions.

How many functions remain after these 6 training examples? Each example pins down the value on one input, so $2^{16-6} = 2^{10} = 1024$ functions remain.

One example that tells a lot…

• Examples are described using: number (1 or 2); size (small or large); shape (circle or square); color (red or green)

  Description             Your prediction   True class
  1 large red square                        −
  1 large green square                      +
  2 small red squares                       +
  2 large red circles                       −
  1 large green circle                      +
  1 small red circle                        +
  1 small green square                      −
  1 small red square                        +
  2 large green squares                     +
  2 small green squares                     +
  2 small red circles                       +
  1 small green circle                      −
  2 large green circles                     −
  2 small green circles                     +
  1 large red circle                        −
  2 large red squares                       ?

How many remaining functions after these 15 training examples? $2^{16-15} = 2$ — and the two surviving functions disagree precisely on "2 large red squares", whose label therefore remains undetermined.
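These counts can be checked by brute force (a sketch of mine, not from the slides): encode each description as 4 bits, enumerate all $2^{16}$ Boolean functions on the 16 possible inputs, and count those that agree with the observed labels.

```python
from itertools import product

# Hypothetical encoding: (number == 2, size == large, shape == square, color == red).
inputs = list(product([0, 1], repeat=4))       # the 2^4 = 16 possible descriptions

def consistent_count(examples):
    """Count Boolean functions f: {0,1}^4 -> {0,1} agreeing with all examples."""
    count = 0
    for bits in product([0, 1], repeat=16):    # all 2^16 = 65,536 truth tables
        f = dict(zip(inputs, bits))
        if all(f[x] == y for x, y in examples):
            count += 1
    return count

print(consistent_count([]))                    # 65536: no constraint yet
# (1 large red square, -) and (1 large green square, +): each halves the count.
print(consistent_count([((0, 1, 1, 1), 0), ((0, 1, 1, 0), 1)]))   # 2^(16-2) = 16384
```

Each distinct training example halves the number of consistent functions, which is exactly the $2^{16-6} = 1024$ and $2^{16-15} = 2$ computed above.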

One example that tells a lot…

• Now keep only 2 of the 4 descriptors (same training table as above).

How many possible functions with 2 descriptors from X to Y? With 2 binary descriptors there are $2^2 = 4$ possible inputs, hence $2^{2^2} = 2^4 = 16$ functions.

How many functions remain after 3 distinct training examples? $2^{4-3} = 2^1 = 2$.

Induction: an impossible game?

• A bias is needed
• Types of bias
  – Representation bias (declarative)
  – Search bias (procedural)

Interpretation – completion of percepts

[Three slides of images illustrating how perception completes and interprets ambiguous percepts; the following excerpt from an external talk is quoted on them:]

Two consequences of the essentialist assumption in ML

The essentialist attitude has had two major consequences which greatly contributed to shape the pattern recognition field in the past few decades:

• it has led the community to focus mainly on feature-vector representations, where each object is described in terms of a vector of numerical attributes and is therefore mapped to a point in a Euclidean (geometric) vector space;

• it has led researchers to maintain a reductionist position, whereby objects are seen in isolation and which therefore tends to overlook the role of contextual, or relational, information.

Context helps!

Optical illusions

Induction and its illusions: illustration

Clustering

Induction everywhere

The role of induction

• [Leslie Valiant, "Probably Approximately Correct. Nature's Algorithms for Learning and Prospering in a Complex World", Basic Books, 2013]

"From this, we have to conclude that generalization or induction is a pervasive phenomenon (…). It is as routine and reproducible a phenomenon as objects falling under gravity. It is reasonable to expect a quantitative scientific explanation of this highly reproducible phenomenon."

The role of induction

• [Edwin T. Jaynes, "Probability theory. The logic of science", Cambridge U. Press, 2003], p. 3

"We are hardly able to get through one waking hour without facing some situation (e.g. will it rain or won't it?) where we do not have enough information to permit deductive reasoning; but still we must decide immediately. In spite of its familiarity, the formation of plausible conclusions is a very subtle process."

Sequences

• 1 1 2 3 5 8 13 21 …
• 1 2 3 5 …
• 1 1 1 2 1 1 2 1 1 1 1 1 2 2 1 3 1 2 2 1 1 …

  – How?
  – Why should induction be possible at all?
  – Should one additional example increase our confidence in the induced rule?
  – How many examples are needed?
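The third sequence is the start of Conway's "look-and-say" sequence (1, 11, 21, 1211, 111221, 312211, …) written as a stream of digits — a hypothesis easy to test in a few lines (my sketch, not the slides'):

```python
from itertools import groupby

def look_and_say(term: str) -> str:
    """Read the previous term aloud: '1211' -> 'one 1, one 2, two 1s' -> '111221'."""
    return "".join(f"{len(list(group))}{digit}" for digit, group in groupby(term))

term, digits = "1", []
for _ in range(6):
    digits.extend(term)
    term = look_and_say(term)

print(" ".join(digits))
# 1 1 1 2 1 1 2 1 1 1 1 1 2 2 1 3 1 2 2 1 1  -- matches the slide's digits
```

Of course, nothing in the data alone rules out other generating rules — which is precisely the point of the slide.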

Supervised induction

• How to choose the decision function?

[Figure: training points in the (x, y) plane.]

Interrogations

Each time: specific cases ⇒ a general law, or adaptation to a new case.

1. How is this generalization justified?
2. Can we guarantee anything?

What kind of theoretical guarantees on induction can we get?

Analysis of the perceptron

The perceptron

– Rosenblatt (1958-1962)

[Figure: a single neuron with inputs x^(1), …, x^(d), a bias input x^(0) = 1 ("bias neuron"), weights w_{0i}, …, w_{di}, and output y_i.]

The weighted sum computed by neuron i:

$$\sigma(i) = \sum_{j=0}^{d} w_{ji}\, x^{(j)}$$

The perceptron: a linear discriminant

[Figure: a linear separating boundary with normal vector w.]

The perceptron

• Learning the weights
  – Principle (Hebb's rule): in case of success, add to each weight (connection) a value proportional to the input and to the output
  – Perceptron rule: learn only in case of failure (see the sketch below)
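A minimal sketch of the perceptron rule (mine, not the slides'; labels assumed in {−1, +1}): weights move only on misclassified examples, by an amount proportional to the input.

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Perceptron rule: update w only when an example is misclassified."""
    X = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 for the bias weight w0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):               # yi in {-1, +1}
            if yi * (w @ xi) <= 0:             # failure: xi on the wrong side
                w += yi * xi                   # move the hyperplane toward xi
                errors += 1
        if errors == 0:                        # all examples separated: stop
            break
    return w

# Toy linearly separable data.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]])
y = np.array([-1, -1, -1, 1, 1])
print(perceptron(X, y))
```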

Remarkable properties!!

• Convergence in a finite number of steps
  – independently of the number of examples
  – independently of the distribution of examples
  – (quasi) independently of the dimension of the input space

… provided there exists at least one linear separator of the examples!
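For reference, the standard quantitative form of this result (Novikoff, 1962 — not spelled out on the slide): if all examples lie in a ball of radius $R$ and some separator achieves margin $\gamma > 0$, then the number of weight updates is at most

$$\left(\frac{R}{\gamma}\right)^{2},$$

independently of the number of examples m.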

Guarantees about generalization??

• Theorems about the performance with respect to the training sample
• But what about future examples?

The Perceptron

– Rosenblatt (1958-1962)

PAC learning

Probably Approximately Correct

Target class: rectangles in R²

• Sample
  – Positive instances
  – Negative instances

[Figure: positive and negative points in the (x, y) plane, drawn from $P^+_X$ and $P^-_X$.]

Target class: unknown

• What do we want to learn? A decision function (prediction)

• How to learn?

[Figures: labeled points in the (x, y) plane, with a candidate decision boundary.]

Target class: rectangles in R²

• How to learn?
  – If I know that the target concept is a rectangle

[Figure: the most general hypotheses — the largest rectangles still consistent with the examples.]

[Figure: the most specific hypotheses — the smallest rectangles containing the positive examples.]

• Choice of one hypothesis h in the version space (the set of hypotheses consistent with the training sample), as sketched below.
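A sketch of the "most specific hypothesis" for this class (my own illustration, assuming axis-parallel rectangles): the tightest consistent rectangle is simply the bounding box of the positive examples.

```python
import numpy as np

def most_specific_rectangle(X, y):
    """Tightest axis-parallel rectangle containing all positive examples."""
    pos = X[y == 1]
    x_min, y_min = pos.min(axis=0)
    x_max, y_max = pos.max(axis=0)
    return int(x_min), int(x_max), int(y_min), int(y_max)

def inside(rect, point):
    x_min, x_max, y_min, y_max = rect
    x, y = point
    return 1 if (x_min <= x <= x_max and y_min <= y <= y_max) else 0

X = np.array([[1, 1], [2, 3], [4, 2], [0, 5], [5, 5]])
y = np.array([1, 1, 1, 0, 0])        # the rectangle must contain the three positives
rect = most_specific_rectangle(X, y)
print(rect)                           # (1, 4, 1, 3): bounding box of the positives
print(inside(rect, (3, 2)))           # 1: predicted positive
```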

Target class: rectangles in R²

• Learning: choice of h
  – Which performance to expect?

[Figure: a chosen rectangle h in the (x, y) plane.]

The statistical theory of learning

Which performance?

• Cost of a prediction error
  – The loss function $\ell\big(h(x), y\big)$

• Which expected cost if I choose h?
  – The "real risk" (or true risk):

$$R(h) = \int_{X \times Y} \ell\big(h(x), y\big)\, p_{XY}(x, y)\; dx\, dy$$

The statistical theory of learning

• Which expected cost when h is chosen?
  – Assuming that there is no training error on S

[Figure: a hypothesis h consistent with the training sample in the (x, y) plane.]

The "empirical risk":

$$\hat{R}(h) = \frac{1}{m} \sum_{i=1}^{m} \ell\big(h(x_i), y_i\big)$$
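With the 0-1 loss, the empirical risk is just the training error rate — a minimal sketch (mine, not the slides'):

```python
import numpy as np

def empirical_risk(h, X, y):
    """R_hat(h) = (1/m) * sum_i loss(h(x_i), y_i), with the 0-1 loss 1[h(x) != y]."""
    predictions = np.array([h(x) for x in X])
    return float(np.mean(predictions != y))

# E.g., with the rectangle sketch above: empirical_risk(lambda x: inside(rect, x), X, y)
# returns 0.0 for a hypothesis consistent with the training sample.
```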

Statistical theory of learning: the ERM principle

• Learning strategy:
  – Select a hypothesis with null empirical risk (no training error)
  – Which generalization performance to expect for h?

[Figures: a consistent hypothesis h and the unknown target function f in the (x, y) plane.]

Statistical theory of learning: the ERM principle

  – Select a hypothesis with null empirical risk (no training error)
  – Which generalization performance to expect for h?
  – What is the probability of getting error R(h) > ε?

[Figure: the error region h Δ f — the symmetric difference between the hypothesis and the target — in the (x, y) plane.]

Central question: the inductive principle

• The empirical risk minimization (ERM) principle … is it sound?
  – If I choose h such that the empirical risk $\hat{R}(h)$ is minimal,
  – is h good with respect to the real risk $R(h)$?
  – Could I have done much better?

The statistical theory of learning

The 1st step: one hypothesis

Statistical study for ONE hypothesis

  – Choose one hypothesis of null empirical risk (no error on the training set S)
  – Which performance can we expect for h?
  – What is the probability of having R(h) > ε?

[Figure: hypothesis h, target f, and the error region h Δ f.]

Statistical study for ONE hypothesis

• Assume that h is such that $R(h) \ge \varepsilon$ (h is "bad"), where the real risk is the probability mass of the error region: $R(h) = p_X(h \,\Delta\, f)$.

• What is the probability that h has nonetheless been selected, i.e. that no training example "falls" inside $h \,\Delta\, f$?

After one example: $p\big(\hat{R}(h) = 0\big) \le 1 - \varepsilon$

After m examples (i.i.d.): $p^m\big(\hat{R}(h) = 0\big) \le (1 - \varepsilon)^m$

We want: $\forall \varepsilon, \delta \in [0, 1]: \quad p^m\big(R(h) \ge \varepsilon\big) \le \delta$

Statistical study for ONE hypothesis

• We want: $\forall \varepsilon, \delta \in [0, 1]: \quad p^m\big(R(h) \ge \varepsilon\big) \le \delta$

It suffices that: $(1 - \varepsilon)^m \le \delta$

Since $(1 - \varepsilon)^m \le e^{-\varepsilon m}$, it is enough that:

$$e^{-\varepsilon m} \le \delta \;\iff\; -\varepsilon\, m \le \ln(\delta) \;\iff\; m \ge \frac{\ln(1/\delta)}{\varepsilon}$$
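A quick numeric check of this sample-size bound (values chosen here for illustration): for ε = 0.05 and δ = 0.01,

```python
import math

epsilon, delta = 0.05, 0.01
m = math.ceil(math.log(1 / delta) / epsilon)
print(m)   # 93: enough examples for this one fixed hypothesis
```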

The statistical theory of learning

The 2nd step: which hypothesis in the crowd?

Statistical study for |H| hypotheses

• What is the probability that I chose one hypothesis h_err of real risk > ε and that I do not realize it after m examples?
• Probability of survival of h_err after 1 example: $\le (1 - \varepsilon)$
• Probability of survival of h_err after m examples: $\le (1 - \varepsilon)^m$
• Probability of survival of at least one such hypothesis in H:
  – We use the probability of the union (union bound): $\le |H|\,(1 - \varepsilon)^m$
• We want the probability that there remains at least one hypothesis of real risk > ε in the version space to be bounded by δ: $|H|\,(1 - \varepsilon)^m \le \delta$

The "PAC learning" analysis

• We get: with probability at least 1 − δ, every hypothesis h with $\hat{R}(h) = 0$ satisfies

$$R(h) \;\le\; \frac{1}{m}\left(\ln |H| + \ln \frac{1}{\delta}\right)$$

Realizable case: there exists at least one function h of risk 0.

The Empirical Risk Minimization principle is sound only if there are constraints on the hypothesis space.

PAC learning: definition

• Worst-case analysis
  – Against all distributions P
  – For any target hypothesis in a class of hypotheses

• Notion of computational complexity

Given 0 < δ, ε < 1, a concept class C is learnable by a polynomial-time algorithm A if, for any distribution P of samples and any concept c ∈ C, there exists a polynomial p(·, ·, ·) such that A will produce, with probability at least 1 − δ, a hypothesis h ∈ C whose error is at most ε, when given at least p(·, 1/δ, 1/ε) independent random examples drawn according to P.

[Valiant, 1984]

The statistical theory of learning

Uniform convergence bounds (for the unrealizable case)

Generalizing the law of large numbers: uniform convergence

Theorem 1 (Hoeffding's inequality). If the $\theta_i$ are random variables drawn independently from the same distribution and taking their values in the interval $[a, b]$, then:

$$P\left(\left|\frac{1}{m}\sum_{i=1}^{m}\theta_i - E(\theta)\right| \ge \varepsilon\right) \;\le\; 2\exp\left(\frac{-2\, m\, \varepsilon^2}{(b - a)^2}\right)$$

Applied to the empirical risk and the real risk, this inequality gives, if the loss function ℓ is defined on the interval [a, b]:

$$P\big(|R_{emp}(h) - R_{real}(h)| \ge \varepsilon\big) \;\le\; 2\exp\left(\frac{-2\, m\, \varepsilon^2}{(b - a)^2}\right) \qquad (1)$$

For « H finite », the union bound then gives (assuming here that the loss function ℓ takes its values in [0, 1]):

$$P^m\big[\exists h \in H : R_{real}(h) - R_{emp}(h) > \varepsilon\big] \;\le\; \sum_{i=1}^{|H|} P^m\big[R_{real}(h_i) - R_{emp}(h_i) > \varepsilon\big] \;\le\; |H| \exp(-2\, m\, \varepsilon^2) \;=\; \delta$$

Bounding the true risk with the empirical risk + …

• H finite, realizable case:

$$\forall h \in H,\ \forall \delta \le 1: \quad P^m\left[R_{real}(h) \le R_{emp}(h) + \frac{\log |H| + \log \frac{1}{\delta}}{m}\right] > 1 - \delta$$

• H finite, non-realizable case:

$$\forall h \in H,\ \forall \delta \le 1: \quad P^m\left[R_{real}(h) \le R_{emp}(h) + \sqrt{\frac{\log |H| + \log \frac{1}{\delta}}{2\, m}}\right] > 1 - \delta$$
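The two bounds differ in how fast the gap shrinks with m: O(1/m) in the realizable case versus O(1/√m) otherwise. A small calculator (my own sketch) makes the difference tangible:

```python
import math

def gap_realizable(H_size, m, delta):
    return (math.log(H_size) + math.log(1 / delta)) / m

def gap_nonrealizable(H_size, m, delta):
    return math.sqrt((math.log(H_size) + math.log(1 / delta)) / (2 * m))

for m in (100, 1000, 10000):   # |H| = 2^16 as in the counting example above
    print(m, round(gap_realizable(2**16, m, 0.05), 4),
             round(gap_nonrealizable(2**16, m, 0.05), 4))
# The realizable gap shrinks 10x per decade of m; the other only ~3.2x.
```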

To sum up: for |H| finite

• Non-realizable case:

$$\varepsilon = \sqrt{\frac{\log |H| + \log \frac{1}{\delta}}{2\, m}} \qquad \text{and} \qquad m \ge \frac{\log |H| + \log \frac{1}{\delta}}{2\, \varepsilon^2}$$

• Realizable case:

$$\varepsilon = \frac{\log |H| + \log \frac{1}{\delta}}{m} \qquad \text{and} \qquad m \ge \frac{\log |H| + \log \frac{1}{\delta}}{\varepsilon}$$

|H| infinite!!

• Effective dimension of H = the Vapnik-Chervonenkis dimension
  – A combinatorial criterion
  – Size of the largest set of points (in general configuration) that can be labeled in every possible way by hypotheses drawn from H:

$$d_{VC}(H) = \max\big\{ m : \Delta_H(m) = 2^m \big\}$$

• Bound on the true risk:

$$\forall h \in H,\ \forall \delta \le 1: \quad P^m\left[R_{real}(h) \le R_{emp}(h) + \sqrt{\frac{8\, d_{VC}(H) \log \frac{2em}{d_{VC}(H)} + 8 \log \frac{4}{\delta}}{m}}\right] > 1 - \delta$$

VC dimension: illustrations

• $d_{VC}$(linear separators) = ?
  [Figure: +/- configurations of 3 and 4 points, panels (a)-(c).]

• $d_{VC}$(rectangles) = ?
  [Figure: +/- configurations of points, panels (a)-(d).]
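For rectangles the question can be settled by brute force (a sketch of mine, not from the slides): verify that some 4-point set is shattered, i.e. that each of the $2^4$ labelings is realized by an axis-parallel rectangle, which proves $d_{VC}(\text{rectangles}) \ge 4$.

```python
from itertools import product

def rectangle_realizes(points, labels):
    """Is there an axis-parallel rectangle containing exactly the positive points?"""
    pos = [p for p, l in zip(points, labels) if l == 1]
    if not pos:
        return True                            # an empty rectangle works
    x_min = min(x for x, _ in pos); x_max = max(x for x, _ in pos)
    y_min = min(y for _, y in pos); y_max = max(y for _, y in pos)
    # The bounding box of the positives is the only candidate worth checking.
    return all(not (x_min <= x <= x_max and y_min <= y <= y_max)
               for (x, y), l in zip(points, labels) if l == 0)

diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]     # 4 points in general configuration
print(all(rectangle_realizes(diamond, labels)
          for labels in product([0, 1], repeat=4)))   # True: d_VC >= 4
```

(The matching upper bounds — no 4 points for lines, no 5 points for rectangles — follow from similar exhaustive arguments.)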

Lesson

• You cannot guarantee anything about induction
• Even if you assume that the world is stationary and examples are i.i.d.
• Unless there are (severe) constraints on the hypothesis space

But wait…?

The SuperVision network

Image classification with deep convolutional neural networks:
  – 7 hidden "weight" layers
  – 650K neurons
  – 60M parameters
  – 630M connections
  – Rectified Linear Units, overlapping pooling, dropout trick
  – Randomly extracted 224×224 patches for more data

http://image-net.org/challenges/LSVRC/2012/supervision.pdf

GoogleNet

• A "meccano" of neural networks

[Excerpt, translated from French:] 1×1 filters may look trivial since they do not reduce the spatial dimension of the input, but their non-linear activation lets them build richer features and hence detect more complex patterns. Network in Network also introduced networks made entirely of convolutional layers, replacing the classification layers by 1×1 filters (Figure 10: the Network in Network module [33]).

GoogleNet [58] is, together with AlexNet, one of the most widely used architectures owing to its performance. Developed by Google and winner of ILSVRC 2014, the model stands out by its complexity (22 layers versus 8 for AlexNet) and by its use of inception modules (Figure 11: architecture of the GoogleNet network [58]). The inception module (Figure 12) applies several filters of different sizes in parallel; applying multiple filters in parallel lets the network learn several feature-extraction logics, from fine details with 1×1 filters up to larger shapes with 5×5 filters.

Troubling findings

A paper:
  – C. Zhang, S. Bengio, M. Hardt, B. Recht, O. Vinyals (ICLR, May 2017). "Understanding deep learning requires rethinking generalization"

Extensive experiments on the classification of images:
  – AlexNet (>1,000,000 parameters) + 2 other architectures
  – The CIFAR-10 dataset:
    • 60,000 images categorized in 10 classes (50,000 for training and 10,000 for testing)
    • Images: 32×32 pixels in 3 color channels

[Excerpt quoted on the slides:] "Again, on intuitive grounds, we expect that in order to make good predictions we need to select a hypothesis class F that is appropriate for the problem at hand. More precisely, we should use some prior knowledge about the nature of the link between the features x and the target y to choose which functions the class F should possess. For instance if, for any reason, we know that with high probability the relation between x and y is approximately linear, we had better choose F to contain only such functions f_w(x) = w · x. In the most general setting this relationship is encoded in a complicated and unknown probability distribution P on labeled observations (x, y). In many cases all we know is that the relation between x and y has some smoothness properties.

The set of techniques that data scientists use to adapt the hypothesis class F to a specific problem is known as regularization. Some of these are explicit, in the sense that they constrain estimators f in some way. Some are implicit, meaning that it is the dynamics of the algorithm which walks its way through the set F in search of a good f (typically using stochastic gradient descent) that provides the regularization. Some of these regularization techniques actually pertain more to art than to mathematics, as they rely more on experience and intuition than on theorems.

[Figure 1: the architecture of AlexNet, one of the networks used by the authors in [1].]

Deep Learning is a very popular class of machine learning models, roughly inspired by biology, that are particularly well suited for tackling complex, AI-like tasks such as image classification, NLP or automatic translation. Roughly speaking, these models are defined by stacking layers that each combine linear combinations of the input with non-linear activation functions (and perhaps some regularization). Figure 1 shows the architecture of AlexNet, a deep network used in the experiments [1]. For our purpose, which is a discussion of the issue of generalization and regularization, suffice it to say that these Deep Learning problems share the following facts:

• The number n of samples available for training these networks is typically much smaller than the number k of parameters w = (w_1, …, w_k) that define the functions f_w ∈ F. (The number of parameters of a deep network such as AlexNet can be over a hundred million, while it is trained on 'only' a few million images in ImageNet.)

• The probability distribution P(x, y) is impossible to describe in any sensible way in practice. For concreteness, think of x as the pixels of an image."

Troubling findings

Experiments:

1. Original dataset without modification
   • Results?
     – Training accuracy = 100%; test accuracy = 89%
     – Speed of convergence: ~5,000 steps

This is the expected behavior if the capacity of the hypothesis space is limited, i.e. if the system cannot fit any (arbitrary) training data:

$$\forall h \in H,\ \forall \delta \le 1: \quad P^m\left[R(h) \le \hat{R}(h) + 2\,\mathfrak{R}_m(H) + 3\sqrt{\frac{\ln(2/\delta)}{m}}\right] > 1 - \delta$$

where $\mathfrak{R}_m(H)$ denotes the Rademacher complexity of H.
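The capacity term $\mathfrak{R}_m(H)$ can be estimated by Monte Carlo for a finite class: draw random sign vectors σ and measure how well the class correlates with them. A sketch (mine, assuming hypotheses are given by their ±1 predictions on the m sample points):

```python
import numpy as np

def empirical_rademacher(preds, n_draws=1000, seed=0):
    """preds: (n_hypotheses, m) matrix of +/-1 predictions on the sample.
    Estimates E_sigma[ max_h (1/m) * sum_i sigma_i * h(x_i) ]."""
    rng = np.random.default_rng(seed)
    m = preds.shape[1]
    sigma = rng.choice([-1, 1], size=(n_draws, m))
    return float(np.mean(np.max(sigma @ preds.T / m, axis=1)))

rng = np.random.default_rng(1)
small_H = rng.choice([-1, 1], size=(10, 100))    # 10 hypotheses on m = 100 points
large_H = rng.choice([-1, 1], size=(2000, 100))  # 2,000 hypotheses
print(empirical_rademacher(small_H))   # small: the class cannot track random signs
print(empirical_rademacher(large_H))   # larger: a richer class fits noise better
```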

2. Random labels
   – Training accuracy = 100% !!?? Test accuracy = 9.8%
   – Speed of convergence: similar behavior (~10,000 steps)

3. Random pixels
   – Training accuracy = 100% !!?? Test accuracy ~10%
   – Speed of convergence: similar behavior (~10,000 steps)

Now, we are in trouble!!
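A miniature version of the random-label experiment is easy to reproduce with any sufficiently over-parameterized model; here is a sketch using scikit-learn (my choice of library and sizes, not the paper's setup):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 20)), rng.normal(size=(200, 20))
y_train = rng.integers(0, 10, size=200)      # labels are pure noise
y_test = rng.integers(0, 10, size=200)

# Far more parameters than samples, trained until the loss stops improving.
net = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=5000, random_state=0)
net.fit(X_train, y_train)

print(net.score(X_train, y_train))   # ~1.0: the network memorizes random labels
print(net.score(X_test, y_test))     # ~0.1: chance level, nothing generalizes
```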

Troubling findings

• Deep NNs can accommodate ANY training set: the capacity term $\mathfrak{R}_m(H)$ can grow without limit!!

$$\forall h \in H,\ \forall \delta \le 1: \quad P^m\left[R(h) \le \hat{R}(h) + 2\,\mathfrak{R}_m(H) + 3\sqrt{\frac{\ln(2/\delta)}{m}}\right] > 1 - \delta$$

But then, why are deep NNs so good on image classification tasks?