A Hands-On Introduction to Automatic Machine Learning

Lars Kotthoff
University of Wyoming

AutoML Workshop, 28 August 2018, Nanjing
1
Machine Learning
Data → Machine Learning → Predictions
2
Automatic Machine Learning
Hyperparameter Tuning
        ↓
Data → Machine Learning → Predictions
3
Grid and Random Search

▷ evaluate certain points in parameter space

Bergstra, James, and Yoshua Bengio. “Random Search for Hyper-Parameter Optimization.” J. Mach. Learn. Res. 13, no. 1 (February 2012): 281–305.
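Random search over a parameter space can be sketched in a few lines. This is an illustrative toy only: the objective function and the parameter ranges below are made up for the example, not taken from the slides.

```python
import random

random.seed(0)

def objective(C, gamma):
    # stand-in for a cross-validated accuracy score; any black-box works
    return -(C - 1.0) ** 2 - (gamma - 0.1) ** 2

# sample hyperparameters uniformly at random instead of on a fixed grid
best_score, best_cfg = float("-inf"), None
for _ in range(100):
    cfg = {"C": random.uniform(0, 2), "gamma": random.uniform(0, 1)}
    score = objective(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, round(best_score, 3))
```

With the same evaluation budget, random samples cover each individual parameter's range more densely than a grid does, which is the key observation of the Bergstra & Bengio paper.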
4
Local Search
▷ start with random configuration
▷ change a single parameter (local search step)
▷ if better, keep the change, else revert
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ restart occasionally with new random configurations
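The steps above can be sketched directly in Python. The parameter space and objective below are hypothetical stand-ins for a real model and its accuracy:

```python
import random

random.seed(0)

# toy discrete parameter space; the objective stands in for model accuracy,
# with its optimum at C = 1, gamma = 0.1
space = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}

def objective(cfg):
    return -abs(cfg["C"] - 1) - abs(cfg["gamma"] - 0.1)

def local_search(budget):
    # start with a random configuration
    cfg = {k: random.choice(v) for k, v in space.items()}
    best = objective(cfg)
    for _ in range(budget):
        # change a single parameter (local search step)
        k = random.choice(list(space))
        old = cfg[k]
        cfg[k] = random.choice(space[k])
        new = objective(cfg)
        if new > best:      # if better, keep the change...
            best = new
        else:               # ...else revert
            cfg[k] = old
    return cfg, best

# restart occasionally with new random configurations
best_cfg, best = max((local_search(50) for _ in range(5)), key=lambda t: t[1])
print(best_cfg, best)
```

Each restart is an independent hill climb; taking the best result across restarts is what lets the search escape local optima.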
5
Local Search Example
Going Beyond Local Optima: Iterated Local Search

[Animation, slides 6–14: Initialisation → Local Search → Perturbation → Local Search → Selection (using Acceptance Criterion)]

Animation credit: Holger Hoos (Hutter & Lindauer, AC Tutorial, AAAI 2016, Phoenix, USA); graphics by Holger Hoos

14
Model-Based Search
▷ evaluate small number of configurations
▷ build model of parameter-performance surface based on the results
▷ use model to predict where to evaluate next
▷ repeat, stop when resources exhausted or desired solution quality achieved
▷ allows targeted exploration of promising configurations
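A common choice for the "use model to predict where to evaluate next" step is the expected-improvement (EI) acquisition function. A minimal sketch for minimization, assuming a Gaussian predictive model; mu and sigma here are hypothetical surrogate-model outputs, and best is the best objective value seen so far:

```python
import math

def expected_improvement(mu, sigma, best):
    # EI for minimization under a normal predictive distribution N(mu, sigma^2)
    if sigma == 0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (best - mu) * cdf + sigma * pdf

# a point predicted slightly worse but very uncertain can still score high
print(expected_improvement(mu=0.5, sigma=0.3, best=0.4))
```

EI balances exploitation (low predicted mean) against exploration (high predictive uncertainty), which is what makes the targeted exploration of promising configurations possible.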
15
Model-Based Search Example

[Plots, slides 16–25: one panel per iteration over x ∈ [−1, 1], showing the true function y, the model prediction yhat, the expected improvement ei, and the evaluated points (init = initial design, prop/seq = proposed by the model). Gap to the optimum per iteration: Iter 1–3, Gap = 1.9909e−01; Iter 4–5, Gap = 1.9992e−01; Iter 6, Gap = 1.9996e−01; Iter 7–10, Gap = 2.0000e−01.]

Bischl, Bernd, Jakob Richter, Jakob Bossek, Daniel Horn, Janek Thomas, and Michel Lang. “mlrMBO: A Modular Framework for Model-Based Optimization of Expensive Black-Box Functions,” March 9, 2017. http://arxiv.org/abs/1703.03373.

25
Problems
▷ How good are we really?
▷ How much of it is just random chance?
▷ Can we do better?
26
Underlying Issues
▷ true performance landscape unknown
▷ resources allow exploring only a tiny part of the hyperparameter space
▷ results inherently stochastic
27
Potential Solutions
▷ better-understood benchmarks
▷ more comparisons
▷ more runs with different random seeds
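The "more runs with different random seeds" point can be made concrete: report a distribution over seeds rather than a single number. An illustrative sketch, where tune is a stand-in for a full tuning run rather than a real tuner:

```python
import random
import statistics

def tune(seed):
    # stand-in for a full tuning run; returns the best accuracy found
    rng = random.Random(seed)
    return max(rng.uniform(0.8, 1.0) for _ in range(20))

scores = [tune(seed) for seed in range(10)]
print("mean = {:.3f}, sd = {:.3f}".format(statistics.mean(scores),
                                          statistics.stdev(scores)))
```

Reporting mean and standard deviation (or a full boxplot) across seeds shows how much of an apparent improvement is just random chance.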
28
Two-Slide MBO ML

# http://www.cs.uwyo.edu/~larsko/mbo.py
import random
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()

params = { 'C': np.logspace(-2, 10, 13),
           'gamma': np.logspace(-9, 3, 13) }
param_grid = [ { 'C': x, 'gamma': y } for x in params['C']
                                      for y in params['gamma'] ]
# [{'C': 0.01, 'gamma': 1e-09}, {'C': 0.01, 'gamma': 1e-08}, ...]

initial_samples = 3
evals = 10
random.seed(1)

# estimate the accuracy of a configuration by cross-validation
def est_acc(pars):
    clf = svm.SVC(**pars)
    return np.median(cross_val_score(clf, iris.data, iris.target, cv = 10))

# evaluate a few random configurations to seed the model
data = []
for pars in random.sample(param_grid, initial_samples):
    acc = est_acc(pars)
    data += [ list(pars.values()) + [ acc ] ]
# [[1.0, 0.1, 1.0],
#  [1000000000.0, 1e-07, 1.0],
#  [0.1, 1e-06, 0.9333333333333333]]
29
Two-Slide MBO ML

from sklearn.ensemble import RandomForestRegressor

# surrogate model of the parameter-performance surface
regr = RandomForestRegressor(random_state = 0)
for it in range(0, evals):
    df = np.array(data)
    regr.fit(df[:,0:2], df[:,2])
    # predict performance for all configurations, evaluate the most promising
    preds = regr.predict([ list(pars.values()) for pars in param_grid ])
    i = preds.argmax()
    acc = est_acc(param_grid[i])
    data += [ list(param_grid[i].values()) + [ acc ] ]
    print("{}: best predicted {} for {}, actual {}"
          .format(it, round(preds[i], 2), param_grid[i], round(acc, 2)))

i = np.array(data)[:,2].argmax()
print("Best accuracy ({}) for parameters {}".format(data[i][2], data[i][0:2]))
30
Two-Slide MBO ML
0: best predicted 0.99 for {'C': 1.0, 'gamma': 1e-09}, actual 0.93
1: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 1e-09}, actual 0.93
2: best predicted 0.99 for {'C': 1000000000.0, 'gamma': 0.1}, actual 0.93
3: best predicted 0.97 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
4: best predicted 0.99 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
5: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
6: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
7: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
8: best predicted 1.0 for {'C': 0.01, 'gamma': 0.1}, actual 0.93
9: best predicted 1.0 for {'C': 1.0, 'gamma': 0.1}, actual 1.0
Best accuracy (1.0) for parameters [1.0, 0.1]
31
Tools and Resources
irace        http://iridia.ulb.ac.be/irace/
TPOT         https://github.com/EpistasisLab/tpot
mlrMBO       https://github.com/mlr-org/mlrMBO
SMAC         http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
Spearmint    https://github.com/HIPS/Spearmint
TPE          https://jaberg.github.io/hyperopt/
Auto-WEKA    http://www.cs.ubc.ca/labs/beta/Projects/autoweka/
Auto-sklearn https://github.com/automl/auto-sklearn

Available soon: edited book on automatic machine learning, https://www.automl.org/book/ (Frank Hutter, Lars Kotthoff, Joaquin Vanschoren)
32
I’m hiring!
Several funded graduate/postdoc positions available.
33