CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning
Theodoros Rekatsinas


Page 1: CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning
Theodoros Rekatsinas

Page 2: Today

1. Unsupervised Learning / Clustering
2. K-Means
3. Ensembles and Gradient Boosting

Page 3: What is clustering?

Page 4: What do we need for clustering?

Page 5: Distance (dissimilarity) measures
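(The measures themselves are shown as an image in the original slide; as an illustrative sketch, here are three common dissimilarity measures between feature vectors. The function names are mine, not the slide's.)

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return np.linalg.norm(a - b)

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return np.abs(a - b).sum()

def cosine_dissimilarity(a, b):
    """1 - cosine similarity: compares direction only, ignoring vector length."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```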

Page 6: Cluster evaluation (a hard problem)
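(The slide body is an image; as one concrete example of an internal evaluation measure, offered as an assumption rather than the slide's own choice, the silhouette score rewards tight, well-separated clusters.)

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def evaluate_clustering(X, k):
    """Silhouette score in [-1, 1]: higher means tighter, better-separated clusters."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return silhouette_score(X, labels)
```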

Page 7: How many clusters?
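(The slide content is an image; one standard heuristic for picking the number of clusters, offered here as an assumption rather than the slide's answer, is the elbow method: plot the within-cluster sum of squares against k and pick the k where the curve bends.)

```python
from sklearn.cluster import KMeans

def elbow_curve(X, max_k=10):
    """Within-cluster sum of squares (inertia) for k = 1..max_k; plot it and look for the elbow."""
    return [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in range(1, max_k + 1)]
```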

Page 8: Clustering techniques

Page 9: Clustering techniques

Page 10: Clustering techniques

Page 11: K-means

Page 12: K-means algorithm

Page 13: K-means convergence (stopping criterion)
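(The algorithm and its stopping criterion appear only as images in the original deck; here is a minimal NumPy sketch of the standard Lloyd's iteration, offered as a stand-in rather than the slide's own code. For simplicity it assumes no cluster ever becomes empty.)

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial seeds
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stopping criterion: centroids have (almost) stopped moving.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```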

Page 14: K-means clustering example: step 1

Page 15: K-means clustering example: step 2

Page 16: K-means clustering example: step 3

Page 17: K-means clustering example

Page 18: K-means clustering example

Page 19: K-means clustering example

Page 20: Why use K-means?

Page 21: Weaknesses of K-means

Page 22: Outliers

Page 23: Dealing with outliers

Page 24: Sensitivity to initial seeds
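(The slide's own remedy is not preserved in this text dump; two common fixes, offered as assumptions, are restarting K-means from several random seeds and keeping the best run, and the k-means++ seeding strategy. Both are built into scikit-learn.)

```python
from sklearn.cluster import KMeans

# k-means++ spreads the initial seeds apart; n_init=10 reruns the whole algorithm from
# 10 different initializations and keeps the run with the lowest within-cluster SSE.
km = KMeans(n_clusters=3, init="k-means++", n_init=10)
# km.fit(X)
```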

Page 25: K-means summary

Page 26: Tons of clustering techniques

Page 27: Summary: clustering

Page 28: Ensemble learning: Fighting the Bias/Variance Tradeoff

Page 29: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 30: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 31: Bagging

• Goal: reduce variance
• Ideal setting: many training sets S'
  • Train a model using each S'
  • Average predictions

E_S[(h(x|S) - y)^2] = E_S[(Z - z̄)^2] + z̄^2
    expected error  =     variance    + bias

where Z = h(x|S) - y and z̄ = E_S[Z].

Variance reduces linearly; bias is unchanged.

"Bagging Predictors" [Leo Breiman, 1994]
http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

[Figure: two training sets S' (tables of Person, Age, Male?, Height > 55"), each sampled independently from P(x,y); each is used to fit a model h(x) that predicts y.]

Generalization error: L(h) = E_{(x,y)~P(x,y)}[ f(h(x), y) ]

Page 32: Bagging

• Goal: reduce variance
• In practice: resample S' with replacement from S
  • Train a model using each S'
  • Average predictions

E_S[(h(x|S) - y)^2] = E_S[(Z - z̄)^2] + z̄^2
    expected error  =     variance    + bias

where Z = h(x|S) - y and z̄ = E_S[Z].

Variance reduces sub-linearly (because the S' are correlated); bias often increases slightly.

"Bagging Predictors" [Leo Breiman, 1994]
http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

[Figure: the same illustration as Page 31, except that each S' is now a bootstrap resample (with replacement) of a single training set S rather than an independent draw from P(x,y).]

Bagging = Bootstrap Aggregation
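(To make the bootstrap-and-average recipe concrete, here is a minimal sketch, not the slide's code, that bags scikit-learn decision trees; the function name and hyperparameters are illustrative choices.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bagging: train one model per bootstrap resample of S, then average predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # resample S' with replacement from S
        tree = DecisionTreeRegressor()     # complex (deep) model: low bias, high variance
        tree.fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)          # averaging reduces the variance
```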

Page 33: Random Forests

• Goal: reduce variance
  • Bagging can only do so much
  • Resampling training data asymptotes
• Random Forests: sample data & features!
  • Sample S'
  • Train a decision tree (DT)
    • At each node, sample a subset of the features (typically sqrt of their number)
  • Average predictions
• Further de-correlates the trees

"Random Forests – Random Features" [Leo Breiman, 1997]
http://oz.berkeley.edu/~breiman/random-forests.pdf
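(As a hedged sketch with library support: scikit-learn's RandomForestClassifier exposes both sources of randomness described above; the hyperparameter values here are illustrative.)

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees whose predictions are averaged
    max_features="sqrt",  # at each node, sample sqrt of the features
    bootstrap=True,       # resample the training data with replacement
)
# forest.fit(X_train, y_train); forest.predict(X_test)
```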

Page 34: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 35: Gradient Boosting

Train h1 on S' = {(x, y)}.
Train h2 on the residuals: S' = {(x, y - h1(x))}.
...
Train hn on S' = {(x, y - h1(x) - ... - h_{n-1}(x))}.

Final model: h(x) = h1(x) + h2(x) + ... + hn(x)
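(A minimal sketch of this residual-fitting loop, assuming squared loss and shallow scikit-learn regression trees; these are illustrative choices, not the slide's code.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_models=50):
    """Fit h1..hn, each one on the residual y - (h1 + ... + h_{k-1})(x)."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_models):
        h = DecisionTreeRegressor(max_depth=2)  # simple model class (shallow DT)
        h.fit(X, residual)
        residual -= h.predict(X)                # the next model targets what is still unexplained
        models.append(h)
    return models

def boosted_predict(models, X):
    """h(x) = h1(x) + h2(x) + ... + hn(x)"""
    return np.sum([h.predict(X) for h in models], axis=0)
```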

Page 36: Boosting (AdaBoost)

Train h1 on the weighted data S' = {(x, y, u1)}.
Reweight the data points and train h2 on S' = {(x, y, u2)}.
...
Train hn on S' = {(x, y, un)}.

Final model: h(x) = a1 h1(x) + a2 h2(x) + ... + an hn(x)

u: weighting on the data points; a: weights of the linear combination.
Stop when validation performance plateaus (will discuss later).

https://www.cs.princeton.edu/~schapire/papers/explaining-adaboost.pdf
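(A minimal sketch of the AdaBoost updates, using the standard formulas from the Schapire reference above; decision stumps and labels in {-1, +1} are assumed choices, not the slide's code.)

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_models=50):
    """AdaBoost: reweight the data points (u) and weight the models (a)."""
    n = len(y)
    u = np.full(n, 1.0 / n)                      # u: weighting on data points
    models, alphas = [], []
    for _ in range(n_models):
        h = DecisionTreeClassifier(max_depth=1)  # weak learner (decision stump)
        h.fit(X, y, sample_weight=u)
        pred = h.predict(X)
        err = np.sum(u * (pred != y)) / np.sum(u)
        if err >= 0.5:                           # no better than chance: stop early
            break
        a = 0.5 * np.log((1 - err) / err)        # a: weight in the linear combination
        u *= np.exp(-a * y * pred)               # upweight misclassified points
        u /= u.sum()
        models.append(h)
        alphas.append(a)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """h(x) = sign(a1 h1(x) + ... + an hn(x))"""
    return np.sign(sum(a * h.predict(X) for h, a in zip(models, alphas)))
```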

Page 37: Ensemble Selection

"Ensemble Selection from Libraries of Models"
Caruana, Niculescu-Mizil, Crew & Ksikes, ICML 2004

[Figure: the training set S is split into a training portion S' and a validation portion V' (tables of Person, Age, Male?, Height > 55").]

H = {2000 models trained using S'}

Maintain the ensemble model as a combination of models from H:
  h(x) = h1(x) + h2(x) + ... + hn(x)

Repeat: add the model from H that maximizes performance on V' (denote it h_{n+1}):
  h(x) <- h(x) + h_{n+1}(x)

Models are trained on S'; the ensemble is built to optimize V'.

Page 38: Summary

Method                       | Minimize bias?                   | Minimize variance?                               | Other comments
-----------------------------|----------------------------------|--------------------------------------------------|-------------------------------------------------------------
Bagging                      | Complex model class (deep DTs)   | Bootstrap aggregation (resampling training data) | Does not work for simple models.
Random Forests               | Complex model class (deep DTs)   | Bootstrap aggregation + bootstrapping features   | Only for decision trees.
Gradient Boosting (AdaBoost) | Optimize training performance    | Simple model class (shallow DTs)                 | Determines which model to add at run-time.
Ensemble Selection           | Optimize validation performance  | Optimize validation performance                  | Pre-specified dictionary of models learned on training set.