
Multiobjective Optimization and the COCO Platform

MascotNum 2017, Paris, March 24, 2017

Dimo Brockhoff, [email protected]

http://www.cmap.polytechnique.fr/~dimo.brockhoff/

yesterday:

Multiobjective Optimization: The Big Picture

Algorithm Design Principles and Concepts

today:

How to Benchmark (Multiobjective) Optimizers

Given: a blackbox problem $x \in \mathbb{R}^n \mapsto f(x) \in \mathbb{R}^k$

Not clear: which of the many algorithms should I use on my problem?

Practical Blackbox Optimization

Deterministic algorithms

Quasi-Newton with estimation of gradient (BFGS) [Broyden et al. 1970]

Simplex downhill [Nelder & Mead 1965]

Pattern search [Hooke and Jeeves 1961]

Trust-region methods (NEWUOA, BOBYQA) [Powell 2006, 2009]

Stochastic (randomized) search methods

Evolutionary Algorithms (continuous domain)

• Differential Evolution [Storn & Price 1997]

• Particle Swarm Optimization [Kennedy & Eberhart 1995]

• Evolution Strategies, CMA-ES [Rechenberg 1965, Hansen & Ostermeier 2001]

• Estimation of Distribution Algorithms (EDAs) [Larrañaga & Lozano 2002]

• Cross Entropy Method (same as EDA) [Rubinstein, Kroese, 2004]

• Genetic Algorithms [Holland 1975, Goldberg 1989]

Simulated annealing [Kirkpatrick et al. 1983]

Simultaneous perturbation stochastic approx. (SPSA) [Spall 2000]

Numerical Blackbox Optimizers

• understanding of algorithms

• algorithm selection

• putting algorithms to a standardized test

• simplify judgement

• simplify comparison

• regression test under algorithm changes

Almost everybody has to do it (and it is tedious):

• choosing (and implementing) problems, performance measures, visualization, stat. tests, ...

• running a set of algorithms

Need: Benchmarking

that's where COCO comes into play

COCO: the Comparing Continuous Optimizers platform

https://github.com/numbbo/coco

automated benchmarking

How to benchmark algorithms with COCO?

...in 5min

https://github.com/numbbo/coco


requirements & download

installation I & test

installation II (post-processing)

example experiment: example_experiment.c
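A rough command-line sketch of these steps, as recalled from the repository's README at the time (the do.py targets and their names may have changed since, so treat this as an assumption to be checked against the current instructions):

  git clone https://github.com/numbbo/coco.git      # download
  cd coco
  python do.py run-c                                 # build the C interface and run the example experiment
  python do.py install-postprocessing                # install the cocopp post-processing module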

/* Iterate over all problems in the suite */
while ((PROBLEM = coco_suite_get_next_problem(suite, observer)) != NULL) {

  size_t dimension = coco_problem_get_dimension(PROBLEM);

  /* Run the algorithm at least once */
  for (run = 1; run <= 1 + INDEPENDENT_RESTARTS; run++) {

    size_t evaluations_done = coco_problem_get_evaluations(PROBLEM);
    long evaluations_remaining =
        (long) (dimension * BUDGET_MULTIPLIER) - (long) evaluations_done;

    /* stop restarting when the budget is exhausted (or another stopping criterion fires) */
    if (... || (evaluations_remaining <= 0))
      break;

    my_random_search(evaluate_function,
                     dimension,
                     coco_problem_get_number_of_objectives(PROBLEM),
                     coco_problem_get_smallest_values_of_interest(PROBLEM),
                     coco_problem_get_largest_values_of_interest(PROBLEM),
                     (size_t) evaluations_remaining,
                     random_generator);
  }
}
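For completeness, a sketch of the optimizer that this loop calls; it loosely follows the random search in the repository's example_experiment.c. The COCO helpers used here (evaluate_function_t, coco_allocate_vector, coco_random_uniform, coco_free_memory) are recalled from the C API and should be checked against the current headers:

  /* uniform random search within the region of interest */
  void my_random_search(evaluate_function_t evaluate,
                        const size_t dimension,
                        const size_t number_of_objectives,
                        const double *lower_bounds,
                        const double *upper_bounds,
                        const size_t max_budget,
                        coco_random_state_t *random_generator) {

    double *x = coco_allocate_vector(dimension);
    double *y = coco_allocate_vector(number_of_objectives);
    size_t i, j;

    for (i = 0; i < max_budget; ++i) {
      /* sample a point uniformly between the lower and upper bounds ... */
      for (j = 0; j < dimension; ++j) {
        double range = upper_bounds[j] - lower_bounds[j];
        x[j] = lower_bounds[j] + coco_random_uniform(random_generator) * range;
      }
      /* ... and evaluate it; the observer attached to the problem logs the result */
      evaluate(x, y);
    }

    coco_free_memory(x);
    coco_free_memory(y);
  }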

postprocess the result folder with cocopp

automatically generated results
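In practice, this is a single call of the cocopp module on the folder written by the experiment; a minimal sketch, assuming the data landed in a folder named exdata/MY_ALGO (the folder name is hypothetical):

  python -m cocopp exdata/MY_ALGO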

doesn't look too complicated, does it?

[the devil is in the details]

until now (single-objective):

data for about 150 algorithm variants

120+ workshop papers

by 79 authors from 25 countries

new multiobjective test suite in 2016:

so far 15 datasets on noiseless bi-objective test bed

On real-world problems:

• expensive

• comparison typically limited to certain domains

• experts have limited interest to publish

On "artificial" benchmark functions:

• cheap

• controlled

• data acquisition is comparatively easy

• problem of representativeness

Measuring Performance

• define the "scientific question"

the relevance can hardly be overestimated

• should represent "reality"

• are often too simple? (think of separability)

• a number of testbeds are around

• account for invariance properties

prediction of performance is based on “similarity”, ideally equivalence classes of functions

Test Functions

Available Test Suites in COCO

bbob          24 noiseless functions        150+ algorithm data sets

bbob-noisy    30 noisy functions             40+ algorithm data sets

bbob-biobj    55 bi-objective functions      15 algorithm data sets

Under development:

large-scale versions

linearly constrained test suite

Long-term goals:

combining difficulties

almost real-world problems

real-world problems

Meaningful quantitative measure

• quantitative on the ratio scale (highest possible)

"algo A is two times better than algo B" is a meaningful statement

• assume a wide range of values

• meaningful (interpretable) with regard to the real world

possible to transfer from benchmarking to real world

How Do We Measure Performance?

runtime or first hitting time is the prime candidate (we don't have many choices anyway)

Two objectives:

• Find solution with small(est possible) function/indicator value

• With the least possible search costs (number of function evaluations)

For measuring performance: fix one and measure the other

How Do We Measure Performance?

convergence graphs is all we have to start with...

Measuring Performance Empirically

ECDF:

Empirical Cumulative Distribution Function of the Runtime

[aka data profile]

A Convergence Graph

First Hitting Time is Monotone

15 runs (with a fixed target value)

15 runs ≤ 15 runtime data points

[figure: empirical CDF of the 15 runtimes, vertical axis from 0 to 1]

the ECDF of run lengths to reach the target

has for each data point a vertical step of constant size

displays for each x-value (budget) the count of observations to the left (first hitting times)
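As a minimal sketch of this definition (not COCO code; all names are made up), the ECDF value at a given budget is simply the fraction of recorded first hitting times that do not exceed that budget:

  #include <stddef.h>

  /* fraction of the recorded first hitting times (in #evaluations)
     that are <= budget, i.e. the ECDF value at this budget */
  double ecdf_at_budget(const size_t *hitting_times, size_t n_runs, size_t budget) {
    size_t i, count = 0;
    for (i = 0; i < n_runs; ++i)
      if (hitting_times[i] <= budget)
        ++count;
    return (double) count / (double) n_runs;
  }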

Empirical Cumulative Distribution

e.g. 60% of the runs need between 2000 and 4000 evaluations

e.g. 80% of the runs reached the target


Empirical Cumulative Distribution

ECDFs: a standard display (aka data profiles [Moré and Wild 2009])

Reconstructing A Single Run

50 equally spaced targets


the empirical CDF makes a step for each star, is monotone, and displays for each budget the fraction of targets achieved within the budget


the ECDF recovers the monotone graph, discretised and flipped


Aggregation

15 runs, 50 targets each: an ECDF with 750 steps

Aggregation

50 targets from 15 runs integrated in a single graph

area over the ECDF curve = average log runtime (or geometric average runtime) over all targets (difficult and easy) and all runs
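A sketch of why this identity holds (boundary effects of the finite plotting window ignored): for a nonnegative random variable $X$ one has $\mathbb{E}[X] = \int_0^{\infty} (1 - F_X(u))\,du$. Taking $X = \log \mathrm{RT}$, whose empirical CDF is exactly the displayed curve when the budget axis is logarithmic, the area between the ECDF and the top of the plot is

$$\int \big(1 - \widehat{F}_{\log \mathrm{RT}}(u)\big)\, du \;\approx\; \frac{1}{\#\text{runs}\cdot\#\text{targets}} \sum \log \mathrm{RT} \;=\; \log\!\big(\text{geometric average of } \mathrm{RT}\big).$$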

Interpretation

Fixed-target: Measuring Runtime


• Algo Restart A: runtime $\mathrm{RT}_A^r$, success probability $p_s(\text{Algo Restart A}) = 1$

• Algo Restart B: runtime $\mathrm{RT}_B^r$, success probability $p_s(\text{Algo Restart B}) = 1$

Fixed-target: Measuring Runtime

• Expected running time of the restarted algorithm:

$$\mathbb{E}[\mathrm{RT}^r] = \frac{1 - p_s}{p_s}\,\mathbb{E}[\mathrm{RT}_{\text{unsuccessful}}] + \mathbb{E}[\mathrm{RT}_{\text{successful}}]$$

• Estimator: average running time (aRT), with

$$p_s = \frac{\#\text{successes}}{\#\text{runs}}, \qquad \mathrm{RT}_{\text{unsucc}} = \text{average evals of unsuccessful runs}, \qquad \mathrm{RT}_{\text{succ}} = \text{average evals of successful runs},$$

$$\mathrm{aRT} = \frac{\text{total }\#\text{evals}}{\#\text{successes}}$$
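A minimal sketch of the aRT estimator (not COCO code; names are made up): sum all evaluations ever spent and divide by the number of runs that reached the target.

  #include <math.h>    /* INFINITY */
  #include <stddef.h>

  /* evals[i]: #evaluations of run i; hit_target[i]: nonzero iff run i reached the target */
  double average_runtime(const size_t *evals, const int *hit_target, size_t n_runs) {
    size_t i, total_evals = 0, n_success = 0;
    for (i = 0; i < n_runs; ++i) {
      total_evals += evals[i];
      if (hit_target[i])
        ++n_success;
    }
    /* no successful run: the aRT estimate is infinite */
    return n_success > 0 ? (double) total_evals / (double) n_success : INFINITY;
  }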

ECDFs with Simulated Restarts

What we typically plot are ECDFs of the simulated restarted algorithms:

Worth Noting: ECDFs in COCO

In COCO, ECDF graphs

• never aggregate over dimension

• but often over targets and functions

• can show data of more than 1 algorithm at a time

150 algorithms from BBOB-2009 till BBOB-2015 (data: http://coco.gforge.inria.fr/)

a more representative set of 31 algorithms from BBOB-2009, dimension 10

Results of BBOB 2009-2015 (noiseless test suite)

• low dimension (2-D, 3-D), small budget: Nelder Mead (implementation by B. Doerr et al.)

• very small budget (<50D funevals): BBOB-SMAC (an EGO variant)

• larger dimension, small budget (<500D funevals): NEWUOA, MCS

• larger dimension, large budget (>500D funevals): CMA-ES variants

• portfolio algorithms can combine the best of all worlds, e.g. HCMA

Recommendations

Invariance

many algorithms are not invariant under affine transformations of the search space (for example PSO), others like CMA-ES variants are

as a consequence, different types of algorithms perform best on separable and on non-separable problems

Robustness of Algorithms

CMA-ES was tested in several variants, in several programming languages, and with several slightly different settings

nevertheless, it showed consistently good behavior across its variants and can be considered a robust algorithm if the budget is not too low

Further Observations (but no time to explain them here)

More Automated Plots... (but no time to explain them here)

Expensive Scenario

an expensive scenario is available with COCO ('--expensive')

instead of targets uniform on the log scale: budget-relative targets, i.e. choose reference budgets 0.5n, 1.2n, ..., 50n

for each budget, function, and dimension, use the target that is just not reached by a reference algorithm (default reference = best2009)

allows one to "zoom" into the first 1000n function evaluations

One Thing to Mention Here...

Comparing Performance in the Multiobjective Case

Quality indicator approach

reduces each approximation set to a single quality value

applies statistical tests to the quality values

Two Main Approaches for Empirical Studies

see e.g. [Zitzler et al. 2003]

Attainment function approach

applies statistical tests directly to the approximation sets

detailed information about how and where performance differences occur

Comparing Performance in EMO

Method 1: via performance indicators

In principle, same as in single-objective case, but:

choice of performance indicator?

choose indicator that is at least weakly refining the Pareto dominance relation, i.e. does not contradict it:

hypervolume

epsilon indicator

R2 indicator

choice of targets, in particular if Pareto set not known?

typically: choose best known approximation (e.g. from the algorithm data itself)

more recent: choose best set of 𝜇 points from the approximation as reference
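To make the indicator idea concrete, here is a minimal sketch (not COCO's or any reference implementation; all names are made up) of the hypervolume of a set of mutually non-dominated points for two objectives to be minimized, with respect to a reference point: sweep the points in order of the first objective and sum the rectangle each point adds towards the reference point.

  #include <stdlib.h>

  /* compare two points (pairs of doubles) by their first objective, ascending */
  static int cmp_first_obj(const void *a, const void *b) {
    const double *pa = (const double *) a, *pb = (const double *) b;
    return (pa[0] > pb[0]) - (pa[0] < pb[0]);
  }

  /* 2-D hypervolume of n non-dominated points w.r.t. reference point (ref0, ref1);
     points[2*i] and points[2*i+1] hold the two objective values of point i */
  double hypervolume_2d(double *points, size_t n, double ref0, double ref1) {
    double hv = 0.0, prev_f2 = ref1;
    size_t i;
    qsort(points, n, 2 * sizeof(double), cmp_first_obj);
    for (i = 0; i < n; ++i) {
      double f1 = points[2 * i], f2 = points[2 * i + 1];
      if (f1 < ref0 && f2 < prev_f2)   /* only the part better than the reference point counts */
        hv += (ref0 - f1) * (prev_f2 - f2);
      if (f2 < prev_f2)
        prev_f2 = f2;
    }
    return hv;
  }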

Comparing Performance in EMO

Idea:

plot empirical success probability to "attain" (i.e. dominate) each objective vector in objective space at a given budget

Method 2: via Empirical Attainment Functions (EAFs)

© Manuel López-Ibáñez

[López-Ibáñez et al. 2010]


The Empirical Attainment Function

The Empirical Attainment Function $\alpha_T(z)$ "counts" how many solution sets $\mathcal{X}_i$ attain or dominate a vector $z$ at time $T$:

$$\alpha_T(z) = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\{\mathcal{X}_i \trianglelefteq_T z\}$$

with $\trianglelefteq_T$ being the weak dominance relation between a solution set and an objective vector at time $T$.

Note that $\alpha_T(z)$ is the empirical cumulative distribution function of the achieved objective function distribution at time $T$ in the single-objective case ("fixed budget scenario").
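A direct transcription of this definition into code (a sketch, not the implementations referenced below; all names are made up): the attainment value of an objective vector z is the fraction of the N solution sets that contain at least one point weakly dominating z, assuming minimization of all m objectives.

  #include <stddef.h>

  /* 1 iff point a weakly dominates objective vector z (minimization, m objectives) */
  static int weakly_dominates(const double *a, const double *z, size_t m) {
    size_t k;
    for (k = 0; k < m; ++k)
      if (a[k] > z[k])
        return 0;
    return 1;
  }

  /* sets[i]: solution set i, stored as sizes[i] consecutive points of m doubles each */
  double attainment_value(const double **sets, const size_t *sizes, size_t n_sets,
                          const double *z, size_t m) {
    size_t i, j, attained = 0;
    for (i = 0; i < n_sets; ++i)
      for (j = 0; j < sizes[i]; ++j)
        if (weakly_dominates(sets[i] + j * m, z, m)) {
          ++attained;
          break;
        }
    return (double) attained / (double) n_sets;
  }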

latest implementation online at http://eden.dei.uc.pt/~cmfonsec/software.html

R package: http://lopez-ibanez.eu/eaftools

see also [López-Ibáñez et al. 2010, Fonseca et al. 2011]


EAFs "in Practice"

code available at http://github.com/numbbo/coco/

see also [Brockhoff et al. 2017]

success probability can be naturally replaced by aRT:

Plotting Average Runtimes

The bi-objective BBOB results

• test suite with 55 bi-objective functions, combining 10 of the 24 bbob functions

• performance assessment w.r.t. the hypervolume of the entire set of non-dominated solutions found so far

• [ideal, nadir] normalized to [0,1]² with the nadir point as reference point

• simple distance to the region of interest [0,1]² if outside of it

• targets relative to a (pre-chosen) reference set

bbob-biobj @ GECCO 2016 in a nutshell

our baselines: pure random search (in [-4,4]ⁿ, [-5,5]ⁿ, and [-100,100]ⁿ)

SMS-EMOA (1x with DE, 1x with polyn. mutation (PM) and SBX)

NSGA-II as implemented in MATLAB (gamultiobj)

RM-MEDA of Zhang et al.

Differential Evolution for multiobj. opt. (DEMO)

3 MO-DIRECT versions

2 surrogate-based algos as MATSuMoTo extensions

an MO-CMA-ES variant with unbounded popsize, new crossover, and initial 100 parallel runs (UP-MO-CMA-ES)

HMO-CMA-ES: portfolio of single-objective NEWUOA & CMA-ES on scalarizing functions, IPOP-MO-CMA-ES with restarts and a steady state MO-CMA-ES with increasing popsize

Available Algorithms on bbob-biobj

new test suites for two objectives, a large-scale and a constrained test suite on the way

towards more realistic problems: combinations of difficulties, blackbox constraints (?), "almost real-world" problems

online visualization of data

We need your help!

benchmark your favorite (surrogate-assisted?) algorithm

BBOB-2017 deadline in 2 weeks

an 18-month engineer position is available for online visualization

Thank you.....

The Future of COCO