CPSC 340: Machine Learning and Data Mining
Probabilistic Classification (Fall 2020)



Admin
• Waiting list people: everyone should be in!
• Course web page:
  – https://www.cs.ubc.ca/~fwood/CS340/
• Homework 1 due tonight.

Last Time: Training, Testing, and Validation
• Training step:
• Prediction step:
• What we are interested in is the test error:
  – Error made by the prediction step on new data.

Last Time: Fundamental Trade-Off
• We decomposed test error to get a fundamental trade-off:

  Etest = Etrain + Eapprox,

  – where Eapprox = (Etest – Etrain).
• Etrain goes down as the model gets more complicated:
  – Training error goes down as a decision tree gets deeper.
• But Eapprox goes up as the model gets more complicated:
  – Training error becomes a worse approximation of test error.

Last Time: Validation Error
• Golden rule: we can't look at test data during training.
• But we can approximate Etest with a validation error:
  – Error on a set of training examples we "hid" during training.
  – Find the decision tree based on the "train" rows.
  – Validation error is the error of the decision tree on the "validation" rows.
• We typically choose "hyper-parameters" like depth to minimize the validation error (see the sketch below).
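To make the recipe concrete, here is a minimal sketch of choosing depth with a validation set. This is an illustration rather than the course's code: it assumes scikit-learn's DecisionTreeClassifier and a made-up random dataset standing in for (X, y).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up data standing in for (X, y); any n-by-d features and n labels work.
rng = np.random.default_rng(0)
X, y = rng.random((200, 5)), rng.integers(0, 2, 200)

# "Hide" 20% of the training examples as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best_depth, best_err = None, np.inf
for depth in range(1, 11):  # candidate hyper-parameter values
    model = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    val_err = np.mean(model.predict(X_val) != y_val)  # validation error
    if val_err < best_err:
        best_depth, best_err = depth, val_err
print(f"depth {best_depth} has validation error {best_err:.2f}")
```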

Overfitting to the Validation Set?
• Validation error usually has lower optimization bias than training error.
  – Might optimize over 20 values of "depth", instead of millions+ of possible trees.
• But we can still overfit to the validation error (common in practice):
  – Validation error is only an unbiased approximation if you use it once.
  – Once you start optimizing it, you start to overfit to the validation set.
• This is most important when the validation set is "small":
  – The optimization bias decreases as the number of validation examples increases.
• Remember, our goal is still to do well on the test set (new data), not the validation set (where we already know the labels).

Should you trust them?
• Scenario 1:
  – "I built a model based on the data you gave me."
  – "It classified your data with 98% accuracy."
  – "It should get 98% accuracy on the rest of your data."
• Probably not:
  – They are reporting training error.
  – This might have nothing to do with test error.
  – E.g., they could have fit a very deep decision tree.
• Why 'probably'?
  – If they only tried a few very simple models, the 98% might be reliable.
  – E.g., they only considered decision stumps with simple 1-variable rules.

Should you trust them?
• Scenario 2:
  – "I built a model based on half of the data you gave me."
  – "It classified the other half of the data with 98% accuracy."
  – "It should get 98% accuracy on the rest of your data."
• Probably:
  – They computed the validation error once.
  – This is an unbiased approximation of the test error.
  – Trust them if you believe they didn't violate the golden rule.

Should you trust them?
• Scenario 3:
  – "I built 10 models based on half of the data you gave me."
  – "One of them classified the other half of the data with 98% accuracy."
  – "It should get 98% accuracy on the rest of your data."
• Probably:
  – They computed the validation error a small number of times.
  – Maximizing over these errors is a biased approximation of test error.
  – But they only maximized it over 10 models, so the bias is probably small.
  – They probably know about the golden rule.

Should you trust them?
• Scenario 4:
  – "I built 1 billion models based on half of the data you gave me."
  – "One of them classified the other half of the data with 98% accuracy."
  – "It should get 98% accuracy on the rest of your data."
• Probably not:
  – They computed the validation error a huge number of times.
  – They tried so many models, one of them is likely to work by chance.
• Why 'probably'?
  – If the 1 billion models were all extremely simple, 98% might be reliable.

Should you trust them?
• Scenario 5:
  – "I built 1 billion models based on the first third of the data you gave me."
  – "One of them classified the second third of the data with 98% accuracy."
  – "It also classified the last third of the data with 98% accuracy."
  – "It should get 98% accuracy on the rest of your data."
• Probably:
  – They computed the first validation error a huge number of times.
  – But they had a second validation set that they only looked at once.
  – The second validation set gives an unbiased test error approximation.
  – This is ideal, as long as they didn't violate the golden rule on the last third.
  – And assuming you are using IID data in the first place.

Validation Error and Optimization Bias
• Optimization bias is small if you only compare a few models:
  – Best decision tree on the training set among depths 1, 2, 3, …, 10.
  – Risk of overfitting to the validation set is low if we try 10 things.
• Optimization bias is large if you compare a lot of models:
  – All possible decision trees of depth 10 or less.
  – Here we're using the validation set to pick between a billion+ models:
    • Risk of overfitting to the validation set is high: could have low validation error by chance.
  – If you did this, you might want a second validation set to detect overfitting.
• And optimization bias shrinks as you grow the size of the validation set (a small simulation below illustrates this).
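A small simulation (an addition, not from the slides) makes the effect visible: every "model" below guesses labels at random, so its true error is 50%, yet the best validation error among many such models looks deceptively good. Increasing n_val shrinks the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n_val = 100                        # small validation set
y_val = rng.integers(0, 2, n_val)  # true validation labels

for n_models in [1, 10, 100_000]:
    # Every "model" guesses labels at random, so its true error is 50%.
    preds = rng.integers(0, 2, (n_models, n_val), dtype=np.int8)
    best_val_err = (preds != y_val).mean(axis=1).min()
    print(f"{n_models} models: best validation error = {best_val_err:.2f}")
```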

Aside: Optimization Bias Leads to Publication Bias
• Suppose that 20 researchers perform the exact same experiment.
• They each test whether their effect is "significant" (p < 0.05).
  – 19/20 find that it is not significant.
  – But the 1 group finding it's significant publishes a paper about the effect.
• This is again optimization bias, contributing to publication bias.
  – A contributing factor to many reported effects being wrong.

Cross-Validation (CV)
• Isn't it wasteful to only use part of your data?
• 5-fold cross-validation:
  – Train on 80% of the data, validate on the other 20%.
  – Repeat this 4 more times with different splits, and average the score.

Cross-Validation (CV)
• Each round, one fold is hidden as the validation set and the remaining folds are the training set:

           Fold 1      Fold 2      Fold 3      Fold 4      Fold 5
  Round 1  TRAIN       TRAIN       TRAIN       TRAIN       VALIDATION
  Round 2  TRAIN       TRAIN       TRAIN       VALIDATION  TRAIN
  Round 3  TRAIN       TRAIN       VALIDATION  TRAIN       TRAIN
  Round 4  TRAIN       VALIDATION  TRAIN       TRAIN       TRAIN
  Round 5  VALIDATION  TRAIN       TRAIN       TRAIN       TRAIN

Cross-Validation Pseudo-Code
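The original slide showed the pseudo-code as an image; below is a minimal k-fold sketch of the same idea (a reconstruction, assuming a model object with scikit-learn-style fit/predict):

```python
import numpy as np

def cross_validation_error(make_model, X, y, k=5, seed=0):
    """Average validation error over k folds (randomize first, then split)."""
    idx = np.random.default_rng(seed).permutation(len(y))  # shuffle ordered data
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]  # fold i is hidden as the validation set this round
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = make_model().fit(X[train], y[train])
        errors.append(np.mean(model.predict(X[val]) != y[val]))
    return np.mean(errors)

# E.g., choose the depth with the lowest cross-validation score:
# best_depth = min(range(1, 11), key=lambda d: cross_validation_error(
#     lambda: DecisionTreeClassifier(max_depth=d), X, y))
```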

Cross-Validation (CV)
• You can take this idea further ("k-fold cross-validation"):
  – 10-fold cross-validation: train on 90% of the data and validate on 10%.
    • Repeat 10 times and average (test on fold 1, then fold 2, …, then fold 10).
  – Leave-one-out cross-validation: train on all but one training example.
    • Repeat n times and average.
• Gets more accurate but more expensive with more folds.
  – To choose depth we compute the cross-validation score for each depth.
• As before, if the data is ordered then folds should be random splits.
  – Randomize first, then split into fixed folds.


(pause)

The "Best" Machine Learning Model
• Decision trees are not always the most accurate on test error.
• What is the "best" machine learning model?
• An alternative measure of performance is the generalization error:
  – Average error over all xi vectors that are not seen in the training set.
  – "How well we expect to do for a completely unseen feature vector."
• No free lunch theorem (proof in bonus slides):
  – There is no "best" model achieving the best generalization error for every problem.
  – If model A generalizes better to new data than model B on one dataset, there is another dataset where model B works better.
• This question is like asking which is "best" among "rock", "paper", and "scissors".

The "Best" Machine Learning Model
• Implications of the lack of a "best" model:
  – We need to learn about and try out multiple models.
• So which ones to study in CPSC 340?
  – We'll usually motivate each method by a specific application.
  – But we're focusing on models that have been effective in many applications.
• Caveat of the no free lunch (NFL) theorem:
  – The world is very structured.
  – Some datasets are more likely than others.
  – Model A really could be better than model B on every real dataset in practice.
• Machine learning research:
  – Large focus on models that are useful across many applications.

Application: E-mail Spam Filtering
• Want to build a system that detects spam e-mails.
  – Context: spam used to be a big problem.
• Can we formulate this as supervised learning?

Spam Filtering as Supervised Learning
• Collect a large number of e-mails, get users to label them.
• We can use (yi = 1) if e-mail 'i' is spam, (yi = 0) if e-mail is not spam.
• Extract features of each e-mail (like bag of words; sketch below):
  – (xij = 1) if word/phrase 'j' is in e-mail 'i', (xij = 0) if it is not.

  $   Hi   CPSC   340   Vicodin   Offer   …   |   Spam?
  1   1    0      0     1         0       …   |   1
  0   0    0      0     1         1       …   |   1
  0   1    1      1     0         0       …   |   0
  …   …    …      …     …         …       …   |   …
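For illustration, a minimal bag-of-words extraction; the vocabulary and e-mails below are made up, but chosen so the result reproduces the table above:

```python
# Made-up vocabulary and e-mails that reproduce the table above.
vocab = ["$", "hi", "cpsc", "340", "vicodin", "offer"]
emails = ["hi $ vicodin inside",
          "one-time offer on vicodin",
          "cpsc 340 hi waitlist"]
y = [1, 1, 0]  # yi = 1 if e-mail 'i' is spam

# xij = 1 if word j appears in e-mail i, and 0 otherwise.
X = [[1 if word in email.split() else 0 for word in vocab] for email in emails]
# X == [[1, 1, 0, 0, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 1, 0, 0]]
```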

Feature Representation for Spam
• Are there better features than bag of words?
  – We add bigrams (sets of two words):
    • "CPSC 340", "wait list", "special deal".
  – Or trigrams (sets of three words):
    • "Limited time offer", "course registration deadline", "you're a winner".
  – We might include the sender domain:
    • <sender domain == "mail.com">.
  – We might include regular expressions:
    • <your first and last name>.

Review of Supervised Learning Notation
• We have been using the notation 'X' and 'y' for supervised learning:
  – X is the matrix of all features, y is the vector of all labels.
  – We use yi for the label of example 'i' (element 'i' of 'y').
  – We use xij for feature 'j' of example 'i'.
  – We use xi as the list of features of example 'i' (row 'i' of 'X').
• So in the table above, x3 = [0 1 1 1 0 0 …].
• In practice, we only store the list of non-zero features for each xi (small memory requirement).

Probabilistic Classifiers
• For years, the best spam filtering methods used naïve Bayes.
  – A probabilistic classifier based on Bayes rule.
  – It tends to work well with bag of words.
  – Recently shown to improve on state of the art for CRISPR "gene editing" (link).
• Probabilistic classifiers model the conditional probability, p(yi | xi).
  – "If a message has words xi, what is the probability that the message is spam?"
• Classify it as spam if the probability of spam is higher than not spam:
  – If p(yi = "spam" | xi) > p(yi = "not spam" | xi):
    • return "spam".
  – Else:
    • return "not spam".

Spam Filtering with Bayes Rule
• To model the conditional probability, naïve Bayes uses Bayes rule:

  p(yi = "spam" | xi) = p(xi | yi = "spam") p(yi = "spam") / p(xi)

• So we need to figure out three types of terms (a toy numeric example follows below):
  – Marginal probability p(yi) that an e-mail is spam.
  – Marginal probability p(xi) that an e-mail has the set of words xi.
  – Conditional probability p(xi | yi) that a spam e-mail has the words xi.
    • And the same for non-spam e-mails.
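As a toy numeric check (all probabilities made up), the three terms combine like this:

```python
# Made-up values for the three types of terms:
p_spam = 0.4             # p(yi = "spam")
p_x_given_spam = 0.010   # p(xi | yi = "spam")
p_x_given_not = 0.001    # p(xi | yi = "not spam")

# Marginal p(xi) by summing over the two labels:
p_x = p_x_given_spam * p_spam + p_x_given_not * (1 - p_spam)

# Bayes rule:
p_spam_given_x = p_x_given_spam * p_spam / p_x
print(p_spam_given_x)  # ~0.87 > 0.5, so this e-mail would be labelled "spam"
```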

Spam Filtering with Bayes Rule
• What do these terms mean?
  [Figure: the space of ALL E-MAILS (including duplicates).]

Spam Filtering with Bayes Rule
• p(yi = "spam") is the probability that a random e-mail is spam.
  – This is easy to approximate from data: use the proportion in your data:

    p(yi = "spam") ≈ (# of spam e-mails) / (# of e-mails)

  [Figure: ALL E-MAILS (including duplicates), split into SPAM and NOT SPAM.]
  – This is an "estimate" of the true probability. In particular, this formula is a "maximum likelihood estimate" (MLE). We will cover likelihoods and MLEs later in the course.

Spam Filtering with Bayes Rule
• p(xi) is the probability that a random e-mail has features xi:
  – Hard to approximate: with 'd' words we need to collect 2^d "coupons", and that's just to see each word combination once.

Spam Filtering with Bayes Rule
• p(xi) is the probability that a random e-mail has features xi:
  – Hard to approximate: with 'd' words we need to collect 2^d "coupons", but it turns out we can ignore it:
  – p(xi) is the same for both labels, so comparing p(yi = "spam" | xi) to p(yi = "not spam" | xi) only requires comparing p(xi | yi = "spam") p(yi = "spam") to p(xi | yi = "not spam") p(yi = "not spam").

Spam Filtering with Bayes Rule
• p(xi | yi = "spam") is the probability that a spam e-mail has features xi.
  [Figure: ALL E-MAILS (including duplicates), split into NOT SPAM and SPAM.]
• Also hard to approximate.
• And we need it.

Naïve Bayes
• Naïve Bayes makes a big assumption to make things easier:

  p(xi | yi) = p(xi1 | yi) p(xi2 | yi) … p(xid | yi)

• We assume all features xij are conditionally independent given the label yi.
  – Once you know it's spam, the probability of "vicodin" doesn't depend on "340".
  – Definitely not true, but sometimes a good approximation.
• And now we only need easy quantities like p("vicodin" = 0 | yi = "spam").

Naïve Bayes
• p("vicodin" = 1 | "spam" = 1) is the probability of seeing "vicodin" in spam.
  [Figure: ALL POSSIBLE E-MAILS (including duplicates), split into SPAM and NOT SPAM, with the "Vicodin" e-mails highlighted.]
• Easy to estimate (code sketch below):

  p("vicodin" = 1 | "spam" = 1) ≈ (# of spam e-mails with "vicodin") / (# of spam e-mails)

  – Again, this is a "maximum likelihood estimate" (MLE). We will cover how to derive this later.
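Putting the counting estimates together gives a minimal naïve Bayes sketch. This is a reconstruction using NumPy arrays; practical implementations also use Laplace smoothing and log-probabilities to avoid zero counts and numerical underflow.

```python
import numpy as np

def fit_naive_bayes(X, y):
    """MLE by counting: X is an n-by-d binary array, y is 0/1 (1 = spam)."""
    p_spam = np.mean(y == 1)              # p("spam") = (# spam e-mails) / n
    p_word_spam = X[y == 1].mean(axis=0)  # p(word j = 1 | spam), one per word
    p_word_not = X[y == 0].mean(axis=0)   # p(word j = 1 | not spam)
    return p_spam, p_word_spam, p_word_not

def predict_spam(x, p_spam, p_word_spam, p_word_not):
    # Naive assumption: p(x | y) is the product of the p(xj | y) terms.
    like_spam = np.prod(np.where(x == 1, p_word_spam, 1 - p_word_spam))
    like_not = np.prod(np.where(x == 1, p_word_not, 1 - p_word_not))
    # Compare unnormalized posteriors; the common p(x) denominator cancels.
    return like_spam * p_spam > like_not * (1 - p_spam)
```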

Summary
• Optimization bias: using a validation set too much overfits.
• Cross-validation: allows better use of the data to estimate test error.
• No free lunch theorem: there is no "best" ML model.
• Probabilistic classifiers: try to estimate p(yi | xi).
• Naïve Bayes: simple probabilistic classifier based on counting.
  – Uses conditional independence assumptions to make training practical.
• Next time:
  – A "best" machine learning model as 'n' goes to ∞.

Back to Decision Trees
• Instead of a validation set, you can use CV to select tree depth.
• But you can also use these to decide whether to split:
  – Don't split if validation/CV error doesn't improve.
  – Different parts of the tree will have different depths.
• Or fit a deep decision tree and use [cross-]validation to prune:
  – Remove leaf nodes that don't improve CV error.
• Popular implementations have these tricks and others.

Random Subsamples
• Instead of splitting into k folds, consider the "random subsample" method (sketch below):
  – At each "round", choose a random set of size 'm'.
    • Train on all examples except these 'm' examples.
    • Compute the validation error on these 'm' examples.
• Advantages:
  – Still an unbiased estimator of the error.
  – The number of "rounds" does not need to be related to 'n'.
• Disadvantage:
  – Examples that are sampled more often get more "weight".

Cross-Validation Theory
• Does CV give an unbiased estimate of the test error?
  – Yes!
    • Since each data point is only used once in validation, the expected validation error on each data point is the test error.
  – But again, if you use CV to select among models then it is no longer unbiased.
• What about the variance of CV?
  – Hard to characterize.
  – CV variance on 'n' data points is worse than with a validation set of size 'n'.
    • But we believe it is close.
• Does cross-validation remove optimization bias?
  – No, but the bias might be smaller since you have more "test" points.

Handling Data Sparsity
• Do we need to store the full bag-of-words 0/1 variables?
  – No: we only need the list of non-zero features for each e-mail (sketch below).
  – The math/model doesn't change, but storage is more efficient.

  $   Hi   CPSC   340   Vicodin   Offer   …   |   Non-Zeroes
  1   1    0      0     1         0       …   |   {1, 2, 5, …}
  0   0    0      0     1         1       …   |   {5, 6, …}
  0   1    1      1     0         0       …   |   {2, 3, 4, …}
  1   1    0      0     0         1       …   |   {1, 2, 6, …}
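A tiny sketch of the trick (indices are 0-based here, 1-based in the table above):

```python
# Dense 0/1 rows from the table above.
dense = [[1, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 1, 1],
         [0, 1, 1, 1, 0, 0],
         [1, 1, 0, 0, 0, 1]]

# Keep only the indices of the non-zero features for each e-mail.
sparse = [{j for j, v in enumerate(row) if v} for row in dense]
# sparse == [{0, 1, 4}, {4, 5}, {1, 2, 3}, {0, 1, 5}]
```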

Proof of No Free Lunch Theorem
• Let's show the "no free lunch" theorem in a simple setting:
  – The xi and yi are binary, and yi is a deterministic function of xi.
• With 'd' features, each "learning problem" is a map from each of the 2^d feature combinations to 0 or 1: {0,1}^d -> {0,1}.
• Let's pick one of these maps ("learning problems") and:
  – Generate a training set of 'n' IID samples.
  – Fit model A (convolutional neural network) and model B (naïve Bayes).

  Feature 1   Feature 2   Feature 3        Map 1   Map 2   Map 3   …
  0           0           0                0       1       0       …
  0           0           1                0       0       1       …
  0           1           0                0       0       0       …
  …           …           …                …       …       …       …

Proof of No Free Lunch Theorem
• Define the "unseen" examples as the (2^d – n) not seen in training.
  – Assuming no repetitions of xi values, and n < 2^d.
  – Generalization error is the average error on these "unseen" examples.
• Suppose that model A got 1% error and model B got 60% error.
  – We want to show model B beats model A on another "learning problem".
• Among our set of "learning problems", find the one where:
  – The labels yi agree on all training examples.
  – The labels yi disagree on all "unseen" examples.
• On this other "learning problem":
  – Model A gets 99% error and model B gets 40% error.

Proof of No Free Lunch Theorem
• Further, across all "learning problems" with these 'n' examples:
  – The average generalization error of every model is 50% on unseen examples.
    • It's right on each unseen example in exactly half the learning problems.
  – With 'k' classes, the average error is (k–1)/k (random guessing).
• This is kind of depressing:
  – For general problems, no "machine learning" is better than "predict 0".
• But the proof also reveals the problem with the NFL theorem:
  – It assumes every "learning problem" is equally likely.
  – The world encourages patterns like "similar features imply similar labels".