CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning
Theodoros Rekatsinas


Page 1: CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning
Theodoros Rekatsinas

Page 2: Today

1. Unsupervised Learning / Clustering
2. K-Means
3. Ensembles and Gradient Boosting

Page 3: What is clustering?

Page 4: What do we need for clustering?

Page 5: Distance (dissimilarity) measures
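(The measures themselves are shown as an image in the original slide; as an illustrative sketch, here are three common dissimilarity measures between feature vectors. The function names are mine, not the slide's.)

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return np.linalg.norm(a - b)

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return np.abs(a - b).sum()

def cosine_dissimilarity(a, b):
    """1 - cosine similarity: compares direction only, ignoring vector length."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```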

Page 6: Cluster evaluation (a hard problem)
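(The slide body is an image; as one concrete example of an internal evaluation measure, offered as an assumption rather than the slide's own choice, the silhouette score rewards tight, well-separated clusters.)

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def evaluate_clustering(X, k):
    """Silhouette score in [-1, 1]: higher means tighter, better-separated clusters."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
    return silhouette_score(X, labels)
```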

Page 7: How many clusters?
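(The slide content is an image; one standard heuristic for picking the number of clusters, offered here as an assumption rather than the slide's answer, is the elbow method: plot the within-cluster sum of squares against k and pick the k where the curve bends.)

```python
from sklearn.cluster import KMeans

def elbow_curve(X, max_k=10):
    """Within-cluster sum of squares (inertia) for k = 1..max_k; plot it and look for the elbow."""
    return [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in range(1, max_k + 1)]
```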

Page 8: Clustering techniques

Page 9: Clustering techniques

Page 10: Clustering techniques

Page 11: K-means

Page 12: K-means algorithm

Page 13: K-means convergence (stopping criterion)
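(The algorithm and its stopping criterion appear only as images in the original deck; here is a minimal NumPy sketch of the standard Lloyd's iteration, offered as a stand-in rather than the slide's own code. For simplicity it assumes no cluster ever becomes empty.)

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial seeds
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stopping criterion: centroids have (almost) stopped moving.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```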

Page 14: K-means clustering example: step 1

Page 15: K-means clustering example: step 2

Page 16: K-means clustering example: step 3

Page 17: K-means clustering example

Page 18: K-means clustering example

Page 19: K-means clustering example

Page 20: Why use K-means?

Page 21: Weaknesses of K-means

Page 22: Outliers

Page 23: Dealing with outliers

Page 24: Sensitivity to initial seeds
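(The slide's own remedy is not preserved in this text dump; two common fixes, offered as assumptions, are restarting K-means from several random seeds and keeping the best run, and the k-means++ seeding strategy. Both are built into scikit-learn.)

```python
from sklearn.cluster import KMeans

# k-means++ spreads the initial seeds apart; n_init=10 reruns the whole algorithm from
# 10 different initializations and keeps the run with the lowest within-cluster SSE.
km = KMeans(n_clusters=3, init="k-means++", n_init=10)
# km.fit(X)
```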

Page 25: K-means summary

Page 26: Tons of clustering techniques

Page 27: Summary: clustering

Page 28: Ensemble learning: Fighting the Bias/Variance Tradeoff

Page 29: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 30: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 31: Bagging

• Goal: reduce variance
• Ideal setting: many training sets S'
  • Train a model using each S'
  • Average predictions

E_S[(h(x|S) - y)^2] = E_S[(Z - z̄)^2] + z̄^2
    expected error  =     variance    + bias

where Z = h(x|S) - y and z̄ = E_S[Z].

Variance reduces linearly; bias is unchanged.

"Bagging Predictors" [Leo Breiman, 1994]
http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

[Figure: two training sets S' (tables of Person, Age, Male?, Height > 55"), each sampled independently from P(x,y); each is used to fit a model h(x) that predicts y.]

Generalization error: L(h) = E_{(x,y)~P(x,y)}[ f(h(x), y) ]

Page 32: Bagging

• Goal: reduce variance
• In practice: resample S' with replacement from S
  • Train a model using each S'
  • Average predictions

E_S[(h(x|S) - y)^2] = E_S[(Z - z̄)^2] + z̄^2
    expected error  =     variance    + bias

where Z = h(x|S) - y and z̄ = E_S[Z].

Variance reduces sub-linearly (because the S' are correlated); bias often increases slightly.

"Bagging Predictors" [Leo Breiman, 1994]
http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

[Figure: the same illustration as Page 31, except that each S' is now a bootstrap resample (with replacement) of a single training set S rather than an independent draw from P(x,y).]

Bagging = Bootstrap Aggregation
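(To make the bootstrap-and-average recipe concrete, here is a minimal sketch, not the slide's code, that bags scikit-learn decision trees; the function name and hyperparameters are illustrative choices.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Bagging: train one model per bootstrap resample of S, then average predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)   # resample S' with replacement from S
        tree = DecisionTreeRegressor()     # complex (deep) model: low bias, high variance
        tree.fit(X_train[idx], y_train[idx])
        preds.append(tree.predict(X_test))
    return np.mean(preds, axis=0)          # averaging reduces the variance
```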

Page 33: Random Forests

• Goal: reduce variance
  • Bagging can only do so much
  • Resampling training data asymptotes
• Random Forests: sample data & features!
  • Sample S'
  • Train a decision tree (DT)
    • At each node, sample a subset of the features (typically sqrt of their number)
  • Average predictions
• Further de-correlates the trees

"Random Forests – Random Features" [Leo Breiman, 1997]
http://oz.berkeley.edu/~breiman/random-forests.pdf
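(As a hedged sketch with library support: scikit-learn's RandomForestClassifier exposes both sources of randomness described above; the hyperparameter values here are illustrative.)

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees whose predictions are averaged
    max_features="sqrt",  # at each node, sample sqrt of the features
    bootstrap=True,       # resample the training data with replacement
)
# forest.fit(X_train, y_train); forest.predict(X_test)
```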

Page 34: Ensemble methods

• Combine different models together to
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Page 35: Gradient Boosting

Train h1 on S' = {(x, y)}.
Train h2 on the residuals: S' = {(x, y - h1(x))}.
...
Train hn on S' = {(x, y - h1(x) - ... - h_{n-1}(x))}.

Final model: h(x) = h1(x) + h2(x) + ... + hn(x)
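(A minimal sketch of this residual-fitting loop, assuming squared loss and shallow scikit-learn regression trees; these are illustrative choices, not the slide's code.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_models=50):
    """Fit h1..hn, each one on the residual y - (h1 + ... + h_{k-1})(x)."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_models):
        h = DecisionTreeRegressor(max_depth=2)  # simple model class (shallow DT)
        h.fit(X, residual)
        residual -= h.predict(X)                # the next model targets what is still unexplained
        models.append(h)
    return models

def boosted_predict(models, X):
    """h(x) = h1(x) + h2(x) + ... + hn(x)"""
    return np.sum([h.predict(X) for h in models], axis=0)
```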

Page 36: Boosting (AdaBoost)

Train h1 on the weighted data S' = {(x, y, u1)}.
Reweight the data points and train h2 on S' = {(x, y, u2)}.
...
Train hn on S' = {(x, y, un)}.

Final model: h(x) = a1 h1(x) + a2 h2(x) + ... + an hn(x)

u: weighting on the data points; a: weights of the linear combination.
Stop when validation performance plateaus (will discuss later).

https://www.cs.princeton.edu/~schapire/papers/explaining-adaboost.pdf
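(A minimal sketch of the AdaBoost updates, using the standard formulas from the Schapire reference above; decision stumps and labels in {-1, +1} are assumed choices, not the slide's code.)

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_models=50):
    """AdaBoost: reweight the data points (u) and weight the models (a)."""
    n = len(y)
    u = np.full(n, 1.0 / n)                      # u: weighting on data points
    models, alphas = [], []
    for _ in range(n_models):
        h = DecisionTreeClassifier(max_depth=1)  # weak learner (decision stump)
        h.fit(X, y, sample_weight=u)
        pred = h.predict(X)
        err = np.sum(u * (pred != y)) / np.sum(u)
        if err >= 0.5:                           # no better than chance: stop early
            break
        a = 0.5 * np.log((1 - err) / err)        # a: weight in the linear combination
        u *= np.exp(-a * y * pred)               # upweight misclassified points
        u /= u.sum()
        models.append(h)
        alphas.append(a)
    return models, alphas

def adaboost_predict(models, alphas, X):
    """h(x) = sign(a1 h1(x) + ... + an hn(x))"""
    return np.sign(sum(a * h.predict(X) for h, a in zip(models, alphas)))
```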

Page 37: Ensemble Selection

"Ensemble Selection from Libraries of Models"
Caruana, Niculescu-Mizil, Crew & Ksikes, ICML 2004

[Figure: the training set S is split into a training portion S' and a validation portion V' (tables of Person, Age, Male?, Height > 55").]

H = {2000 models trained using S'}

Maintain the ensemble model as a combination of models from H:
  h(x) = h1(x) + h2(x) + ... + hn(x)

Repeat: add the model from H that maximizes performance on V' (denote it h_{n+1}):
  h(x) <- h(x) + h_{n+1}(x)

Models are trained on S'; the ensemble is built to optimize V'.

Page 38: Summary

Method                       | Minimize bias?                   | Minimize variance?                               | Other comments
-----------------------------|----------------------------------|--------------------------------------------------|-------------------------------------------------------------
Bagging                      | Complex model class (deep DTs)   | Bootstrap aggregation (resampling training data) | Does not work for simple models.
Random Forests               | Complex model class (deep DTs)   | Bootstrap aggregation + bootstrapping features   | Only for decision trees.
Gradient Boosting (AdaBoost) | Optimize training performance    | Simple model class (shallow DTs)                 | Determines which model to add at run-time.
Ensemble Selection           | Optimize validation performance  | Optimize validation performance                  | Pre-specified dictionary of models learned on training set.