
CS639: Data Management for Data Science
Lecture 19: Unsupervised Learning / Ensemble Learning

Theodoros Rekatsinas


Today

1. Unsupervised Learning / Clustering

2. K-Means

3. Ensembles and Gradient Boosting


What is clustering?

What do we need for clustering?

Distance (dissimilarity) measures

Cluster evaluation (a hard problem)

How many clusters?

Clustering techniques

K-means

K-means algorithm

K-means convergence (stopping criterion)
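The slide bodies for these steps did not survive extraction. As a stand-in, here is a minimal NumPy sketch of the standard K-means loop (Lloyd's algorithm) with the usual stopping criterion of centroids no longer moving; the function name, tolerance, and initialization choice are illustrative, not from the slides:

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct random points from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        centroids_new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stopping criterion: centroids have (almost) stopped moving.
        if np.linalg.norm(centroids_new - centroids) < tol:
            break
        centroids = centroids_new
    return labels, centroids
```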

K-means clustering example: step 1

K-means clustering example: step 2

K-means clustering example: step 3

K-means clustering example

Why use K-means?

Weaknesses of K-means

Outliers

Dealing with outliers

Sensitivity to initial seeds

K-means summary

Tons of clustering techniques

Summary: clustering

Ensemble learning: Fighting the Bias/Variance Tradeoff


Ensemble methods

• Combine different models together to:
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection


Bagging

• Goal: reduce variance

• Ideal setting: many training sets S'
  • Train a model using each S'
  • Average predictions

Expected error decomposition, with $Z = h(x|S) - y$ and $\bar{Z} = \mathbb{E}_S[Z]$:

$$\mathbb{E}_S[(h(x|S) - y)^2] \;=\; \underbrace{\mathbb{E}_S[(Z - \bar{Z})^2]}_{\text{variance}} \;+\; \underbrace{\bar{Z}^2}_{\text{bias}^2}$$
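The decomposition follows from one line of algebra: expand the square around $\bar{Z}$ and note that the cross term vanishes because $\mathbb{E}_S[Z - \bar{Z}] = 0$:

```latex
\mathbb{E}_S[Z^2]
  = \mathbb{E}_S\!\left[(Z - \bar{Z} + \bar{Z})^2\right]
  = \mathbb{E}_S[(Z - \bar{Z})^2]
    + 2\bar{Z}\,\underbrace{\mathbb{E}_S[Z - \bar{Z}]}_{=\,0}
    + \bar{Z}^2
```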

"Bagging Predictors" [Leo Breiman, 1994], http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

Variance reduces linearly; bias is unchanged.

[Figure: training sets S' are sampled independently from the population P(x,y); a model h(x) predicting y is trained on each S'. Example training set S':]

Person | Age | Male? | Height > 55"
-------|-----|-------|-------------
Alice  | 14  |   0   | 1
Bob    | 10  |   1   | 1
Carol  | 13  |   0   | 1
Dave   |  8  |   1   | 0
Erin   | 11  |   0   | 0
Frank  |  9  |   1   | 1
Gena   |  8  |   0   | 0

Generalization error: $L(h) = \mathbb{E}_{(x,y)\sim P(x,y)}[\,f(h(x), y)\,]$

Bagging


• Goal: reduce variance

• In practice: resample S' with replacement from S
  • Train a model using each S'
  • Average predictions

Same decomposition as above: $\mathbb{E}_S[(h(x|S) - y)^2] = \mathbb{E}_S[(Z - \bar{Z})^2] + \bar{Z}^2$, with $Z = h(x|S) - y$ and $\bar{Z} = \mathbb{E}_S[Z]$.

"Bagging Predictors" [Leo Breiman, 1994], http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf

Variance reduces sub-linearly (because the S' are correlated); bias often increases slightly.

[Figure: same tables as above, but now each S' is resampled with replacement from the fixed training set S rather than drawn directly from P(x,y).]

Bagging = Bootstrap Aggregation
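A minimal sketch of this procedure in Python, assuming scikit-learn is available; the function name and the choice of deep decision trees as the base model are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_models=50, seed=0):
    """Bagging: train one deep tree per bootstrap sample, then average votes."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_models):
        # Resample S' with replacement from S (the bootstrap).
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeClassifier()  # deep, unpruned tree: low bias, high variance
        tree.fit(X_train[idx], y_train[idx])
        votes += tree.predict(X_test)
    # Averaging the predictions reduces the variance of any single tree.
    return (votes / n_models >= 0.5).astype(int)
```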

Random Forests


• Goal: reduce variance
  • Bagging can only do so much
  • Resampling training data asymptotes

• Random Forests: sample data & features!
  • Sample S'
  • Train a decision tree on each S'
  • At each node, sample a random subset of features (typically sqrt of the total)
  • Average predictions

"Random Forests – Random Features" [Leo Breiman, 1997], http://oz.berkeley.edu/~breiman/random-forests.pdf

Feature sampling further de-correlates the trees.
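With scikit-learn, the per-node feature sampling described above corresponds to the max_features parameter; a brief sketch, reusing the hypothetical X_train/y_train/X_test arrays from the bagging example:

```python
from sklearn.ensemble import RandomForestClassifier

# max_features="sqrt" draws sqrt(d) candidate features at every split,
# which is the node-level feature-sampling trick described above.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt")
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
```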

Ensemble methods (recap)

• Combine different models together to:
  • Minimize variance
    • Bagging
    • Random Forests
  • Minimize bias
    • Functional Gradient Descent
    • Boosting
    • Ensemble Selection

Gradient Boosting


Ensemble: h(x) = h1(x) + h2(x) + ... + hn(x)

Each stage fits the residual left by the stages before it:

  S' = {(x, y)}                          → train h1(x)
  S' = {(x, y - h1(x))}                  → train h2(x)
  ...
  S' = {(x, y - h1(x) - ... - hn-1(x))}  → train hn(x)
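A minimal regression sketch of this residual-fitting loop, assuming scikit-learn; the shrinkage factor (learning_rate) is a standard addition not shown on the slide:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_stages=100, learning_rate=0.1, max_depth=2):
    """Each shallow tree is fit to the residual y minus the current ensemble."""
    models = []
    residual = y.astype(float).copy()
    for _ in range(n_stages):
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        residual -= learning_rate * tree.predict(X)  # update the residual
        models.append(tree)
    # The final predictor is the (scaled) sum of all stages.
    return lambda X_new: learning_rate * sum(m.predict(X_new) for m in models)
```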

Boosting (AdaBoost)


Ensemble: h(x) = a1·h1(x) + a2·h2(x) + ... + an·hn(x)

Each stage reweights the data points:

  S' = {(x, y, u1)} → train h1(x)
  S' = {(x, y, u2)} → train h2(x)
  ...
  S' = {(x, y, un)} → train hn(x)

https://www.cs.princeton.edu/~schapire/papers/explaining-adaboost.pdf

u – weighting on data points; a – weight in the linear combination. Stop when validation performance plateaus (will discuss later).
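A compact AdaBoost sketch with decision stumps, assuming scikit-learn and labels in {-1, +1}; the weight-update formulas are the standard ones from the Schapire reference above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_stages=50):
    """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(X)
    u = np.full(n, 1.0 / n)              # u: weights on data points
    stumps, alphas = [], []
    for _ in range(n_stages):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=u)
        pred = stump.predict(X)
        err = u[pred != y].sum()         # weighted training error
        if err >= 0.5:                   # no better than chance: stop early
            break
        a = 0.5 * np.log((1 - err) / (err + 1e-12))  # a: combination weight
        u *= np.exp(-a * y * pred)       # upweight misclassified points
        u /= u.sum()
        stumps.append(stump)
        alphas.append(a)
    return lambda X_new: np.sign(
        sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
    )
```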

Ensemble Selection


"Ensemble Selection from Libraries of Models", Caruana, Niculescu-Mizil, Crew & Ksikes, ICML 2004

[Figure: same tables as above; the data S is split into a training portion S' and a validation portion V'.]

Split S into a training set S' and a validation set V'.
H = {2000 models trained using S'}

Maintain the ensemble model as a combination of models from H:
  h(x) = h1(x) + h2(x) + ... + hn(x)

Repeat: add the model from H that maximizes performance on V'
(denote it hn+1, giving h(x) + hn+1(x)).

Models are trained on S'; the ensemble is built to optimize V'.
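A minimal sketch of the greedy selection loop for binary labels; library_preds (a hypothetical name) holds each library model's precomputed predictions on V', and models may be selected more than once, as in the Caruana et al. procedure:

```python
import numpy as np

def ensemble_selection(library_preds, y_val, n_rounds=50):
    """Greedy forward selection: repeatedly add whichever library model
    most improves ensemble accuracy on the validation set V'.

    library_preds: array of shape (n_models, n_val) with each model's
    0/1 predictions on V'; y_val: the 0/1 validation labels."""
    ensemble_sum = np.zeros_like(y_val, dtype=float)
    chosen = []
    for t in range(1, n_rounds + 1):
        # Score each candidate as if it were added to the current ensemble.
        scores = [
            np.mean(((ensemble_sum + p) / t >= 0.5) == y_val)
            for p in library_preds
        ]
        best = int(np.argmax(scores))
        ensemble_sum += library_preds[best]
        chosen.append(best)
    return chosen
```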

Summary


Method                       | Minimize Bias?                   | Minimize Variance?                                | Other Comments
-----------------------------|----------------------------------|---------------------------------------------------|------------------------------------------------------------
Bagging                      | Complex model class (deep DTs)   | Bootstrap aggregation (resampling training data)  | Does not work for simple models.
Random Forests               | Complex model class (deep DTs)   | Bootstrap aggregation + bootstrapping features    | Only for decision trees.
Gradient Boosting (AdaBoost) | Optimize training performance    | Simple model class (shallow DTs)                  | Determines which model to add at run-time.
Ensemble Selection           | Optimize validation performance  | Optimize validation performance                   | Pre-specified dictionary of models learned on training set.
