CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning
Theodoros Rekatsinas
Today
1. Unsupervised Learning / Clustering
2. K-Means
3. Ensembles and Gradient Boosting
What is clustering?
What do we need for clustering?
Distance (dissimilarity) measures
Cluster evaluation (a hard problem)
How many clusters?
Clustering techniques
K-means
K-means algorithm
K-means convergence (stopping criterion)
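The algorithm and its stopping criterion can be sketched in a few lines. The sketch below is not from the slides: it assumes NumPy, Euclidean distance, and random-point initialization, and declares convergence when the centroids stop moving.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct random data points (one common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Stopping criterion: centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Note the sensitivity to the initial seeds discussed on the next slides: different `seed` values can yield different final clusterings.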
K-means clustering example: step 1
K-means clustering example: step 2
K-means clustering example: step 3
K-means clustering example
Why use K-means?
Weaknesses of K-means
Outliers
Dealing with outliers
Sensitivity to initial seeds
K-means summary
Tons of clustering techniques
Summary: clustering
Ensemble learning: Fighting the Bias/Variance Tradeoff
Ensemble methods
Combine different models together to:
• Minimize variance
  • Bagging
  • Random Forests
• Minimize bias
  • Functional Gradient Descent
  • Boosting
  • Ensemble Selection
Bagging
Goal: reduce variance

Ideal setting: many independently sampled training sets S'
• Train a model using each S'
• Average the predictions

Expected error decomposes into variance plus squared bias:
E_S[(h(x|S) − y)²] = E_S[(Z − z̄)²] + z̄², where Z = h(x|S) − y and z̄ = E_S[Z]

With independently sampled S', variance reduces linearly; bias is unchanged.

“Bagging Predictors” [Leo Breiman, 1994]
http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf
Training set S' (sampled from P(x,y)):

Person | Age | Male? | Height > 55”
Alice  | 14  | 0     | 1
Bob    | 10  | 1     | 1
Carol  | 13  | 0     | 1
Dave   | 8   | 1     | 0
Erin   | 11  | 0     | 0
Frank  | 9   | 1     | 1
Gena   | 8   | 0     | 0

Population P(x,y):

Person   | Age | Male? | Height > 55”
James    | 11  | 1     | 1
Jessica  | 14  | 0     | 1
Alice    | 14  | 0     | 1
Amy      | 12  | 0     | 1
Bob      | 10  | 1     | 1
Xavier   | 9   | 1     | 0
Cathy    | 9   | 0     | 1
Carol    | 13  | 0     | 1
Eugene   | 13  | 1     | 0
Rafael   | 12  | 1     | 1
Dave     | 8   | 1     | 0
Peter    | 9   | 1     | 0
Henry    | 13  | 1     | 0
Erin     | 11  | 0     | 0
Rose     | 7   | 0     | 0
Iain     | 8   | 1     | 1
Paulo    | 12  | 1     | 0
Margaret | 10  | 0     | 1
Frank    | 9   | 1     | 1
Jill     | 13  | 0     | 0
Leon     | 10  | 1     | 0
Sarah    | 12  | 0     | 0
Gena     | 8   | 0     | 0
Patrick  | 5   | 1     | 1
…

Generalization error: L(h) = E_{(x,y)~P(x,y)}[ f(h(x), y) ], where h(x) is the prediction and y the true label.
Bagging
Goal: reduce variance

In practice: resample S' with replacement from S
• Train a model using each S'
• Average the predictions

Same decomposition as before:
E_S[(h(x|S) − y)²] = E_S[(Z − z̄)²] + z̄², where Z = h(x|S) − y and z̄ = E_S[Z]

Because the S' are correlated, variance reduces only sub-linearly, and bias often increases slightly.
Bagging = Bootstrap Aggregation
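The in-practice recipe above (bootstrap, train, average) can be sketched as follows. This is an illustrative sketch, not the lecture's code; it assumes NumPy and scikit-learn, with a deep decision tree as the high-variance base model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bagging: draw each S' from S with replacement (bootstrap), train one
    high-variance model (a deep tree) per S', and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # resample S' with replacement
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)                 # average the predictions
```

Averaging many deep trees keeps their low bias while smoothing out their individual variance; as the slide notes, the gain is sub-linear because the bootstrapped S' overlap.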
Random Forests
Goal: reduce variance
• Bagging can only do so much: resampling the training data asymptotes

Random Forests: sample data & features!
• Sample S'
• Train a decision tree on each S'
  • At each node, sample a subset of the features (√d of them)
• Average the predictions

Sampling features further de-correlates the trees.

“Random Forests – Random Features” [Leo Breiman, 1997]
http://oz.berkeley.edu/~breiman/random-forests.pdf
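Both ingredients (bootstrapped data, per-node feature subsampling) are available off the shelf. The sketch below assumes scikit-learn and reuses the small Person/Age/Male?/Height table from the earlier slides as toy data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data from the slides: features are (Age, Male?), label is Height > 55".
X = np.array([[14, 0], [10, 1], [13, 0], [8, 1], [11, 0], [9, 1], [8, 0]])
y = np.array([1, 1, 1, 0, 0, 1, 0])

# bootstrap=True resamples the training data (bagging); max_features="sqrt"
# is the per-node feature subsampling that further de-correlates the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)
```

Predictions are the average (majority vote, for classification) over the 100 de-correlated trees.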
Gradient Boosting

h(x) = h1(x) + h2(x) + … + hn(x)

S' = {(x, y)}                        → train h1(x)
S' = {(x, y − h1(x))}                → train h2(x)
…
S' = {(x, y − h1(x) − … − hn−1(x))}  → train hn(x)

Each model is fit to the residuals left by the models before it.
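The residual-fitting loop above can be sketched directly. This is an illustrative sketch (assuming NumPy and scikit-learn), using shallow trees (stumps) as the simple base model from the summary slide.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=60, depth=1):
    """Each round fits a shallow tree to the current residuals y - h(x),
    then adds it to the ensemble: h(x) = h1(x) + h2(x) + ... + hn(x)."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_rounds):
        stump = DecisionTreeRegressor(max_depth=depth)
        stump.fit(X, residual)            # train on S' = {(x, y - h(x))}
        residual -= stump.predict(X)      # what the ensemble still gets wrong
        models.append(stump)
    # The ensemble prediction is the sum of all the fitted trees.
    return lambda X_new: sum(m.predict(X_new) for m in models)
```

For squared loss, fitting residuals is exactly a functional gradient step, which is where the name comes from.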
Boosting (AdaBoost)

h(x) = a1h1(x) + a2h2(x) + … + anhn(x)

S' = {(x, y, u1)} → train h1(x)
S' = {(x, y, u2)} → train h2(x)
…
S' = {(x, y, un)} → train hn(x)

u – weighting on data points
a – weight in the linear combination

Stop when validation performance plateaus (will discuss later).

https://www.cs.princeton.edu/~schapire/papers/explaining-adaboost.pdf
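A minimal sketch of the reweighting scheme, assuming NumPy and scikit-learn (not the lecture's code): mistakes get larger u in the next round, and each model's a comes from its weighted error.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=30):
    """AdaBoost sketch: labels y in {-1, +1}; u are per-point weights,
    a (alpha) are the weights of the linear combination of stumps."""
    n = len(X)
    u = np.full(n, 1.0 / n)                      # u1: uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=u)         # train on S' = {(x, y, u)}
        pred = stump.predict(X)
        err = u[pred != y].sum()                 # weighted training error
        if err == 0:                             # perfect stump: stop early
            stumps.append(stump); alphas.append(1.0)
            break
        alpha = 0.5 * np.log((1.0 - err) / err)  # its weight a in h(x)
        stumps.append(stump); alphas.append(alpha)
        u = u * np.exp(-alpha * y * pred)        # up-weight the mistakes
        u = u / u.sum()
    def h(X_new):
        score = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(score)
    return h
```

In practice one would stop the loop when validation performance plateaus, as the slide says, rather than at a fixed round count.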
Ensemble Selection

“Ensemble Selection from Libraries of Models” [Caruana, Niculescu-Mizil, Crew & Ksikes, ICML 2004]
Split S into a training portion S' and a validation portion V'.
H = {2000 models trained using S'}

Maintain the ensemble model as a combination of models from H:
h(x) = h1(x) + h2(x) + … + hn(x)

Repeat: add the model from H (denote it hn+1) that maximizes performance on V':
h(x) ← h(x) + hn+1(x)

Models are trained on S'; the ensemble is built to optimize V'.
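The greedy loop can be sketched once each library model's validation predictions are cached. This sketch assumes NumPy, 0/1 labels, probability-style predictions, and averaging as the combination rule; the names `preds_V` and `ensemble_selection` are illustrative, not from the paper.

```python
import numpy as np

def ensemble_selection(preds_V, y_V, n_rounds=10):
    """Greedy forward selection from a library H of trained models:
    repeatedly add the model whose inclusion most improves validation
    accuracy of the averaged ensemble.  preds_V[name] holds a model's
    cached predictions on the validation set V'."""
    ensemble = []                                 # selected names; repeats allowed
    for _ in range(n_rounds):
        best_name, best_acc = None, -1.0
        for name, p in preds_V.items():
            # Tentatively add this model and score the averaged ensemble on V'.
            avg = np.mean([preds_V[m] for m in ensemble] + [p], axis=0)
            acc = np.mean((avg > 0.5) == y_V)
            if acc > best_acc:
                best_name, best_acc = name, acc
        ensemble.append(best_name)
    return ensemble
```

Allowing repeats effectively learns a weighting over the library, and selecting on V' rather than S' guards against simply picking the most overfit models.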
Summary

Method                       | Minimize Bias?                  | Minimize Variance?                               | Other Comments
Bagging                      | Complex model class (deep DTs)  | Bootstrap aggregation (resampling training data) | Does not work for simple models.
Random Forests               | Complex model class (deep DTs)  | Bootstrap aggregation + bootstrapping features   | Only for decision trees.
Gradient Boosting (AdaBoost) | Optimize training performance   | Simple model class (shallow DTs)                 | Determines which model to add at run-time.
Ensemble Selection           | Optimize validation performance | Optimize validation performance                  | Pre-specified dictionary of models learned on training set.