A Hands-On Introduction to Automatic Machine Learning

Lars Kotthoff
University of Wyoming

AutoML Workshop, 28 August 2018, Nanjing
1
Machine Learning
Data → Machine Learning → Predictions
2
Automatic Machine Learning
Hyperparameter Tuning
        ↓
Data → Machine Learning → Predictions
3
Grid and Random Search

▷ evaluate certain points in parameter space

Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
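Random search over a parameter space can be sketched in a few lines. This is an illustrative toy only: the objective function and the parameter ranges below are made up for the example, not taken from the slides.

```python
import random

random.seed(0)

def objective(C, gamma):
    # stand-in for a cross-validated accuracy score; any black-box works
    return -(C - 1.0) ** 2 - (gamma - 0.1) ** 2

# sample hyperparameters uniformly at random instead of on a fixed grid
best_score, best_cfg = float("-inf"), None
for _ in range(100):
    cfg = {"C": random.uniform(0, 2), "gamma": random.uniform(0, 1)}
    score = objective(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, round(best_score, 3))
```

With the same evaluation budget, random samples cover each individual parameter's range more densely than a grid does, which is the key observation of the Bergstra & Bengio paper.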
4
Local Search
▷ start with random configuration
▷ change a single parameter (local search step)
▷ if better, keep the change, else revert
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ restart occasionally with new random configurations
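The steps above can be sketched directly in Python. The parameter space and objective below are hypothetical stand-ins for a real model and its accuracy:

```python
import random

random.seed(0)

# toy discrete parameter space; the objective stands in for model accuracy,
# with its optimum at C = 1, gamma = 0.1
space = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}

def objective(cfg):
    return -abs(cfg["C"] - 1) - abs(cfg["gamma"] - 0.1)

def local_search(budget):
    # start with a random configuration
    cfg = {k: random.choice(v) for k, v in space.items()}
    best = objective(cfg)
    for _ in range(budget):
        # change a single parameter (local search step)
        k = random.choice(list(space))
        old = cfg[k]
        cfg[k] = random.choice(space[k])
        new = objective(cfg)
        if new > best:      # if better, keep the change...
            best = new
        else:               # ...else revert
            cfg[k] = old
    return cfg, best

# restart occasionally with new random configurations
best_cfg, best = max((local_search(50) for _ in range(5)), key=lambda t: t[1])
print(best_cfg, best)
```

Each restart is an independent hill climb; taking the best result across restarts is what lets the search escape local optima.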
5
Local Search Example
Going Beyond Local Optima: Iterated Local Search

[Animation, slides 6–14: Initialisation → Local Search → Perturbation → Local Search → Selection (using Acceptance Criterion)]

Animation credit: Holger Hoos (Hutter & Lindauer, AC Tutorial, AAAI 2016, Phoenix, USA); graphics by Holger Hoos

14
Model-Based Search
▷ evaluate small number of configurations
▷ build model of parameter-performance surface based on the results
▷ use model to predict where to evaluate next
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ allows targeted exploration of promising configurations
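A common choice for the "use model to predict where to evaluate next" step is the expected-improvement (EI) acquisition function. A minimal sketch for minimization, assuming a Gaussian predictive model; mu and sigma here are hypothetical surrogate-model outputs, and best is the best objective value seen so far:

```python
import math

def expected_improvement(mu, sigma, best):
    # EI for minimization under a normal predictive distribution N(mu, sigma^2)
    if sigma == 0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

# a point predicted slightly worse but very uncertain can still score high
print(expected_improvement(mu=0.5, sigma=0.3, best=0.4))
```

EI balances exploitation (low predicted mean) against exploration (high predictive uncertainty), which is what makes the targeted exploration of promising configurations possible.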
15
Model-Based Search Example

[Plots, slides 16–25: one panel per iteration over x ∈ [−1, 1], showing the true function y, the model prediction yhat, the expected improvement ei, and the evaluated points (init = initial design, prop/seq = proposed by the model). Gap to the optimum per iteration: Iter 1–3, Gap = 1.9909e−01; Iter 4–5, Gap = 1.9992e−01; Iter 6, Gap = 1.9996e−01; Iter 7–10, Gap = 2.0000e−01.]

Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. “mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions,” March 9, 2017. http://arxiv.org/abs/1703.03373.

25
Problems
▷ How good are we really?
▷ How much of it is just random chance?
▷ Can we do better?
26
Underlying Issues
▷ true performance landscape unknown
▷ resources allow exploring only a tiny part of the hyperparameter space
▷ results inherently stochastic
27
Potential Solutions
▷ better-understood benchmarks
▷ more comparisons
▷ more runs with different random seeds
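The "more runs with different random seeds" point can be made concrete: report a distribution over seeds rather than a single number. An illustrative sketch, where tune is a stand-in for a full tuning run rather than a real tuner:

```python
import random
import statistics

def tune(seed):
    # stand-in for a full tuning run; returns the best accuracy found
    rng = random.Random(seed)
    return max(rng.uniform(0.8, 1.0) for _ in range(20))

scores = [tune(seed) for seed in range(10)]
print("mean = {:.3f}, sd = {:.3f}".format(statistics.mean(scores),
                                          statistics.stdev(scores)))
```

Reporting mean and standard deviation (or a full boxplot) across seeds shows how much of an apparent improvement is just random chance.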
28
Two-Slide MBO ML

# http://www.cs.uwyo.edu/~larsko/mbo.py
import random
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

params = { 'C': np.logspace(-2, 10, 13),
           'gamma': np.logspace(-9, 3, 13) }
param_grid = [ { 'C': x, 'gamma': y } for x in params['C']
                                      for y in params['gamma'] ]
# [{'C': 0.01, 'gamma': 1e-09}, {'C': 0.01, 'gamma': 1e-08}, ...]

initial_samples = 3
evals = 10
random.seed(1)

# estimate the accuracy of a configuration by cross-validation
def est_acc(pars):
    clf = svm.SVC(**pars)
    return np.median(cross_val_score(clf, iris.data, iris.target, cv = 10))

# evaluate a few random configurations to seed the model
data = []
for pars in random.sample(param_grid, initial_samples):
    acc = est_acc(pars)
    data += [ list(pars.values()) + [ acc ] ]
# [[1.0, 0.1, 1.0],
#  [1000000000.0, 1e-07, 1.0],
#  [0.1, 1e-06, 0.9333333333333333]]
29
Two-Slide MBO ML

from sklearn.ensemble import RandomForestRegressor

# surrogate model of the parameter-performance surface
regr = RandomForestRegressor(random_state = 0)
for it in range(0, evals):
    df = np.array(data)
    regr.fit(df[:,0:2], df[:,2])
    # predict performance for all configurations, evaluate the most promising
    preds = regr.predict([ list(pars.values()) for pars in param_grid ])
    i = preds.argmax()
    acc = est_acc(param_grid[i])
    data += [ list(param_grid[i].values()) + [ acc ] ]
    print("{}: best predicted {} for {}, actual {}"
          .format(it, round(preds[i], 2), param_grid[i], round(acc, 2)))

i = np.array(data)[:,2].argmax()
print("Best accuracy ({}) for parameters {}".format(data[i][2], data[i][0:2]))
30
Two-Slide MBO ML
0: best predicted 0.99 for {'C': 1.0, 'gamma': 1e-09}, actual 0.93
1: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 1e-09}, actual 0.93
2: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 0.1}, actual 0.93
3: best predicted 0.97 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
4: best predicted 0.99 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
5: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
6: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
7: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
8: best predicted 1.0 for {'C': 0.01, 'gamma': 0.1}, actual 0.93
9: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
Best accuracy (1.0) for parameters [1.0, 0.1]
31
Tools and Resources
irace        http://iridia.ulb.ac.be/irace/
TPOT         https://github.com/EpistasisLab/tpot
mlrMBO       https://github.com/mlr-org/mlrMBO
SMAC         http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Spearmint    https://github.com/HIPS/Spearmint
TPE          https://jaberg.github.io/hyperopt/
Auto-WEKA    http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
Auto-sklearn https://github.com/automl/auto-sklearn

Available soon: edited book on automatic machine learning, https://www.automl.org/book/ (Frank Hutter, Lars Kotthoff, Joaquin Vanschoren)
32
I’m hiring!
Several funded graduate/postdoc positions available.
33