Stochastic Local Search, Multi-objective Optimization, and Automated Configuration of Algorithms

Thomas Stützle

IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
iridia.ulb.ac.be/∼stuetzle
Outline
1. Stochastic local search
2. “Simple SLS method”: IG
3. Multi-objective Optimization
4. Automatic offline configuration: F-race
5. Automatic Configuration of Multi-objective Optimizers
Combinatorial optimisation problems
Examples
I finding minimum cost schedule to deliver goods
I finding optimal sequence of jobs in production line
I finding best allocation of flight crews to airplanes
I finding a best routing for Internet data packets
I . . . and many more
Few facts
I arise in many real-world applications
I many have high computational complexity (NP-hard)
I in research, often abstract versions of real-world problems are treated
Search paradigms
Systematic search
I traverse the search space of an instance in a systematic manner
I complete: guaranteed to find an optimal solution in a finite amount of time (plus a proof of optimality)
Local search
I start at some initial solution
I iteratively move from search position to neighbouring one
I incomplete: not guaranteed to find optimal solutions
Search paradigms
Perturbative (local) search
I search space = complete candidate solutions
I search step = modification of one or more solution components
I example: 2-opt algorithm for TSP
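The 2-opt neighbourhood mentioned above can be sketched in a few lines; a minimal Python illustration (the function names and distance-matrix representation are my assumptions, not from the slides):

```python
def tour_length(tour, dist):
    """Total length of a closed tour given a distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_step(tour, dist):
    """Return the first improving 2-opt neighbour, or None if the tour is 2-opt optimal."""
    n = len(tour)
    for i in range(n - 1):
        # for i == 0, stop at n-2 so the two removed edges never share a city
        for j in range(i + 2, n if i > 0 else n - 1):
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            # reversing tour[i+1..j] replaces edges (a,b) and (c,d) by (a,c) and (b,d)
            if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
    return None
```

Iterating `two_opt_step` until it returns `None` is exactly the perturbative local search the slide names.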
Search paradigms
Constructive (local) search
I search space = partial candidate solutions
I search step = extension with one or more solution components
I example: nearest neighbor heuristic for TSP
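The nearest-neighbour heuristic named above, as a constructive-search sketch (function name and interface are my assumptions):

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedily extend a partial tour with the closest unvisited city."""
    n = len(dist)
    tour, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: dist[last][c])  # greedy extension step
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour
```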
Stochastic local search — global view
I vertices: candidate solutions(search positions)
I edges: connect neighbouringpositions
I s: (optimal) solution
I c: current search position
c
s
Stochastic local search — local view
c
s
I next search position is selected from the local neighbourhood based on local information.
Stochastic local search (SLS)
SLS algorithm defined through
I search space S (set of candidate solutions)
I neighborhood relation
I finite set of memory states
I initialization function
I step function
I termination predicate
I evaluation function
(for a formal definition see SLS:FA, Hoos & Stützle, 2005)
A simple SLS algorithm
Iterative improvement
I start from some initial solution
I iteratively move from the current solution to an improving neighbouring one, as long as one exists
Main problem
I getting stuck in local optima
Solution
I general-purpose SLS methods (aka metaheuristics) that direct the search and allow escapes from local optima
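Iterative improvement as described above, in a small Python sketch (first-improvement variant; the generic `neighbors`/`f` interface is an assumption for illustration):

```python
def iterative_improvement(s, neighbors, f):
    """Move to an improving neighbour until none exists, i.e. a local optimum."""
    improved = True
    while improved:
        improved = False
        for s2 in neighbors(s):
            if f(s2) < f(s):          # first-improvement pivoting rule
                s, improved = s2, True
                break
    return s
```

On a toy landscape such as `f(x) = x*x` with neighbours `x-1, x+1`, the loop descends until it stops at the local (here global) optimum 0 and then terminates, which is exactly the "getting stuck" behaviour the slide points out.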
SLS methods (metaheuristics)
modify neighbourhoods
I variable neighbourhood search
accept occasionally worse neighbours
I simulated annealing
I tabu search
modify evaluation function
I dynamic local search
generate new (starting) solutions (for local search)
I EAs / memetic algorithms
I ant colony optimization
I iterated local search
I iterated greedy
Outline
1. Stochastic local search
2. “Simple SLS method”: IG
3. Multi-objective Optimization
4. Automatic offline configuration: F-race
5. Automatic Configuration of Multi-objective Optimizers
Iterated Greedy
Key Idea: iterate over greedy construction heuristics through destruction and construction phases
Motivation:
I start solution construction from partial solutions to avoid reconstruction from scratch
I keep features of the best solutions to improve solution quality
I if few construction steps are to be executed, greedy heuristics are fast
I adding a subsidiary local search phase may further improve performance
Iterated Greedy (IG):
generate candidate solution s using greedy constructive search
While termination criterion is not satisfied:
    r := s
    apply solution destruction on s
    perform greedy constructive search on s
    perform local search on s
    based on acceptance criterion, keep s or revert to s := r
Note:
I local search after solution reconstruction can substantially improve performance
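The IG pseudocode above maps directly onto a generic loop; a Python sketch with the destruction, construction, local search, and acceptance steps as pluggable functions (all names are my assumptions):

```python
import random

def iterated_greedy(construct, destroy, local_search, accept, cost,
                    iters=100, seed=0):
    """Generic Iterated Greedy loop: destroy, reconstruct, improve, accept."""
    rng = random.Random(seed)
    s = local_search(construct([]))            # initial full construction
    best = s
    for _ in range(iters):
        partial = destroy(s, rng)              # destruction phase
        s2 = local_search(construct(partial))  # construction + subsidiary local search
        if cost(s2) < cost(best):
            best = s2
        s = s2 if accept(s, s2, rng) else s    # acceptance criterion
    return best
```

A concrete instantiation only has to supply a `construct` that completes a partial solution greedily (e.g. NEH-style insertion for the PFSP) and an `accept` such as the Metropolis condition.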
IG—main issues
I destruction phase
I fixed vs. variable size of destruction
I stochastic vs. deterministic destruction
I uniform vs. biased destruction
I construction phase
I not every construction heuristic is necessarily useful
I typically, adaptive construction heuristics are preferable
I speed of the construction heuristic is an issue
I acceptance criterion
I determines tradeoff diversification–intensification of the search
Permutation flow-shop problem (PFSP)
[Gantt chart: five jobs J1–J5 processed on machines M1, M2, M3 over a time axis from 0 to 20]
I n jobs are to be processed on m machines (in canonical order of machines)
I input data: processing times for each job on each machine, and due dates for each job
I otherwise: usual PFSP characteristics
IG for PFSP
Initial solution construction
I NEH heuristic
Destruction heuristic
I randomly remove d jobs from sequence
Construction heuristic
I follow the NEH heuristic considering jobs in random order
Acceptance criterion
I Metropolis condition with fixed temperature
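The NEH insertion principle used above, both for the initial solution and for reconstruction, can be sketched as follows (a straightforward makespan computation; the speed-ups used in practice are omitted, and the function names are mine):

```python
def makespan(seq, p):
    """Cmax of job sequence seq; p[j][k] = processing time of job j on machine k."""
    m = len(p[0])
    comp = [0] * m                  # completion time of the last scheduled job per machine
    for j in seq:
        comp[0] += p[j][0]
        for k in range(1, m):
            comp[k] = max(comp[k], comp[k - 1]) + p[j][k]
    return comp[-1]

def neh(p, order=None):
    """NEH: consider jobs by decreasing total processing time (or a given
    order, e.g. random during IG reconstruction) and insert each job at the
    position minimizing the partial makespan."""
    jobs = order if order is not None else sorted(range(len(p)), key=lambda j: -sum(p[j]))
    seq = []
    for j in jobs:
        seq = min((seq[:k] + [j] + seq[k:] for k in range(len(seq) + 1)),
                  key=lambda s: makespan(s, p))
    return seq
```

Passing a random `order` of the removed jobs gives exactly the construction heuristic of the IG described on this slide.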
Iterative improvement for PFSP
[Figure: three neighbourhoods illustrated on the permutation φ = (A B C D E F):
transpose (swap adjacent jobs, e.g. φ′ = A C B D E F),
exchange (swap any two jobs, e.g. φ′ = A E C D B F),
insert (move one job to another position, e.g. φ′ = A C D B E F)]
I best choice: insert; benefits from speed-ups
IG for PFSP, example
Initial NEH solution, Cmax = 8564

--- DESTRUCTION PHASE ---
Choose d (here 3) jobs at random, giving a partial sequence to reconstruct and the jobs to reinsert.

--- CONSTRUCTION PHASE ---
After reinserting job 5, Cmax = 7589
After reinserting job 1, Cmax = 8243
After reinserting job 4, Cmax = 8366
I when combined with local search, IG is a state-of-the-artalgorithm for permutation flow-shop scheduling
[Figure: means and 95.0 percent LSD intervals of the average relative percentage deviation (RPD) for the algorithms GA_AA, GA_CHEN, GA_MIT, GA_REEV, GA_RMA, HGA_RMA, IG_RS, ILS, IG_RSLS, M-MMAS, NEHT, PACO, SA_OP, SPIRIT]
IG — enhancements
I usage of history information to bias the destructive/constructive phase
I use lower bounds on the completion of a solution in the constructive phase
I combination with local search in the constructive phase
I use local search to improve full solutions; destruction/construction phases can then be seen as a perturbation mechanism (as in ILS)
I exploitation of constraint propagation techniques
I IG has been re-invented several times; names include
I simulated annealing, ruin-and-recreate, iterative flattening, iterative construction search, large neighborhood search, . . .
I close relationship to iterative improvement in large neighbourhoods
I for some applications, excellent results so far
I can lead to effective combinations of tree search and local search heuristics
Outline
1. Stochastic local search
2. “Simple SLS method”: IG
3. Multi-objective Optimization
4. Automatic offline configuration: F-race
5. Automatic Configuration of Multi-objective Optimizers
Multi-objective Optimization
Multiobjective Combinatorial Optimization Problems (MCOPs)
I many real-life problems are multiobjective
I timetabling and scheduling
I logistics and transportation
I telecommunications and computer networks
I ... and many others
I example: objectives in PFSP
I makespan
I sum of flowtimes
I total weighted or unweighted tardiness
Pareto optimization
I multiple objective functions f(x) = (f1(x), . . . , fQ(x))
I no a priori preference knowledge: aim for Pareto optimality
Main SLS approaches to Pareto optimization
SLS algorithms
I based on dominance criterion
I component-wise acceptance criterion
I example: Pareto local search (PLS)
I based on solving scalarizations
I convert MCOPs into single-objective problems
min_{x ∈ X} Σ_{i=1}^{Q} λ_i f_i(x)

I for obtaining many solutions: vary the weight vector λ
I example: two-phase local search (TPLS)
I hybrids of the two search models
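The scalarization approach above is easy to make concrete; a Python sketch in which exhaustive minimization over a candidate set stands in for a real single-objective SLS run (the function names are mine):

```python
def scalarize(fs, lam):
    """Weighted-sum scalarization: x -> sum_i lam_i * f_i(x)."""
    return lambda x: sum(l * f(x) for l, f in zip(lam, fs))

def solve_scalarizations(candidates, fs, weight_vectors):
    """One single-objective solve per weight vector (exhaustive min here)."""
    return [min(candidates, key=scalarize(fs, lam)) for lam in weight_vectors]
```

Varying the weight vector λ over many directions recovers different trade-off solutions, which is the basis of TPLS.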
CWAC Search Model
input: candidate solution x
Add x to Archive
repeat
    Choose x from Archive
    XN = Neighbors(x)
    Add XN to Archive
    Filter Archive
until all x in Archive are visited
return Archive
SAC Search Model
input: weight vectors Λ
for each λ ∈ Λ do
    x is a candidate solution
    x′ = SolveSAC(x, λ)
    Add x′ to Archive
Filter Archive
return Archive
Hybrid Search Model
input: weight vectors Λ
for each λ ∈ Λ do
    x is a candidate solution
    x′ = SolveSAC(x, λ)
    X′ = CW(x′)
    Add X′ to Archive
Filter Archive
return Archive
Our research on MCOPs
I MCOPs tackled in Pareto sense
I main algorithmic approaches followed
I two-phase local search (SAC search model)
I Pareto local search (CWAC search model)
I multi-objective ACO algorithms
I empirical analysis
I empirical attainment functions (EAFs)
I visualization techniques for EAF differences
Hybrid TPLS+PLS for biobjective PFSPs
Engineering an effective TPLS+PLS algorithm
I context: development of effective SLS algorithms for MCOPs
I example problem: bi-objective flow-shop problems (bPFSPs)
I steps followed:
1. knowledge of state-of-the-art
2. development of powerful single-objective algorithms
3. experimental study of TPLS components
4. experimental study of PLS components
5. design of a hybrid algorithm
6. detailed comparison to state of the art
bi-objective permutation flow-shop problem
permutation flow-shop problem
I n jobs are to be processed on m machines (in canonical order of machines)
I input data: processing times for each job on each machine, and due dates for each job
I otherwise: usual PFSP characteristics
objective functions
I makespan
I sum of flowtimes
I total weighted or unweighted tardiness
tackle all bi-objective problems for any combination of objectives
Step 2: IG for other single-objective problems
Recall: we have a state-of-the-art IG algorithm for the PFSP with makespan criterion (part of step 1)
main adaptations for other objectives
I constructive heuristics to provide good initial solutions
I neighborhood operators for local search step
I number of jobs to remove
I acceptance criterion: formula, temperature
Remark: parameters have been fine-tuned using I/F-Race.
Evaluation of IG—Sum of flowtimes
comparison with the state-of-the-art (Tseng and Lin, 2009):
Short runs
instance size   R.D. Best   R.D. Mean
20x5             0.016      -0.142
20x10            0.000      -0.002
20x20            0.000      -0.033
50x5             0.117       0.164
50x10            0.012      -0.065
50x20           -0.196      -0.278
100x5            0.186       0.135
100x10          -0.003      -0.149
100x20          -0.502      -0.625
Average         -0.041      -0.110

Long runs
instance size   R.D. Best   R.D. Mean
50x5            -0.026      -0.044
50x10           -0.034      -0.135
50x20           -0.071       0.201
Average (all sizes)  -0.044  -0.126
I more than 50 new best-known solutions for an instance set of 90 instances
I state-of-the-art results
Evaluation of IG-Weighted tardiness
I few studies on this criterion in the literature
I more than 90% of best-known solutions improved for a benchmark set of 540 instances (for total tardiness), in production mode
Experimental analysis
I Better relation [Jaszkiewicz and Hansen 1998]
[Figure: two example non-dominated fronts in the (cost, time) objective space; left: Blue is incomparable to Red; right: Blue is better than Red]
Experimental analysis
I Attainment functions [Grunert da Fonseca et al. 2001]
AF: probability that an outcome weakly dominates an arbitrary point
EAF: fraction of runs in which an outcome weakly dominates an arbitrary point
Is EAFBlue significantly different from EAFRed?
Permutation tests with Smirnov distance as test statistic
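The empirical attainment function can be computed directly from the definition above; a minimal Python sketch for minimization (the function names are mine):

```python
def attains(front, z):
    """True iff some point of this run's outcome set weakly dominates z."""
    return any(all(p <= q for p, q in zip(point, z)) for point in front)

def eaf(runs, z):
    """Empirical attainment function: fraction of runs attaining point z."""
    return sum(attains(front, z) for front in runs) / len(runs)
```

Evaluating `eaf` on a grid of points and differencing the values of two algorithms gives the EAF-difference plots shown on the following slides.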
Experimental analysis
I Visualization of differences [Paquete 2005]
EAFBlue − EAFRed
positive differences negative differences
Step 3: Effective TPLS algorithm
Two-phase local search
I Phase 1: generate a high-quality solution for a single-objective problem
I Phase 2: solve a sequence of scalarizations
I use the solution found for the previous scalarization as the initial solution for the next one
Studied TPLS components

1. Search strategy
I 2-phase: generate (i) a high-quality solution for f1 and (ii) a sequence of solutions
I Restart: independent runs of SLS algorithms using different weights
2. Number of scalarizations
3. Intensification mechanism

[Figures: search strategies illustrated in the (f1, f2) objective space; higher solution quality is returned by the SLS algorithm]
Adaptive Anytime TPLS
Idea: dynamically generate weights to adapt to the shape of the Pareto front

I focus search on the largest gap in the Pareto front
I seed new scalarizations with solutions from previous, similar scalarizations
I weight generation inspired by the dichotomic scheme
Adaptive Anytime TPLS vs. Restart
[Figure: EAF difference plots in the (ΣCi, ΣTi) objective space comparing the Two-Phase and Restart strategies]
Step 4: Effective PLS algorithm
Pareto local search
I iterative improvement algorithm that directly follows the CWAC search model (dominance-based acceptance criterion)
I studied PLS components
I neighborhood operators
I seeding the algorithm with different quality solutions
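The dominance-based acceptance of PLS can be sketched compactly in Python (a toy interface with hashable solutions and callable objectives; function names are my assumptions):

```python
def dominates(a, b):
    """True iff objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_local_search(seeds, neighbors, objectives):
    """PLS sketch: expand unvisited archive members until the archive is a
    local Pareto set (component-wise acceptance criterion)."""
    ev = lambda s: tuple(f(s) for f in objectives)
    archive, visited = list(seeds), set()
    while True:
        todo = [s for s in archive if s not in visited]
        if not todo:
            return archive
        s = todo[0]
        visited.add(s)
        for s2 in neighbors(s):
            v2 = ev(s2)
            # accept s2 unless it is dominated by, or duplicates, an archive member
            if not any(dominates(ev(a), v2) or ev(a) == v2 for a in archive):
                archive.append(s2)
        # filter: keep only mutually non-dominated members
        archive = [a for a in archive
                   if not any(dominates(ev(b), ev(a)) for b in archive)]
```

Seeding the archive with good solutions (e.g. from TPLS) rather than random ones is exactly the design question studied on the following slides.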
PLS: Neighbourhood operators

[Figure: EAF difference plots in the (ΣCi, ΣwiTi) objective space comparing the Exchange and Exchange+Insertion neighbourhoods]
PLS: Seeding
[Figure: non-dominated sets obtained from random, heuristic, and IG seeds; left panel: objectives Cmax and ΣCj; right panel: objectives ΣCj and ΣwjTj]
Step 5: Hybrid algorithm, TPLS+PLS
1: TPLS
I uses roughly half of the overall time
I each objective and combination of objectives uses a dedicated Iterated Greedy algorithm
2: PLS
I both exchange and insertion operators
I bounded in time
Step 6: Comparison to state-of-the-art
I a recent review (2008) by Minella et al. tests 23 algorithms for three biobjective PFSP problems; they also provide reference sets measured across all 23 algorithms
often the median or even worst attainment surfaces of TPLS+PLS dominate these reference sets!
I two state-of-the-art algorithms identified by Minella et al.
I multi-objective Simulated Annealing (MOSA) by Varadharajan & Rajendran, 2005
I multi-objective Genetic Local Search (MOGLS) by Arroyo & Armentano, 2004

I comparison to re-implementations of both algorithms; 10 runs per instance
Comparison to state-of-the-art: PFSP-(Cmax, SFT)
nxm       TPLS+PLS   MOSA
20x5      4.75       6.11
20x10     3.15       9.57
20x20     0          1.7
50x5      91.99      0
50x10     78.57      0
50x20     82.88      0
100x5     85.06      0
100x10    74.5       0.03
100x20    75.91      0
200x10    26.36      0
200x20    32.84      0

Given: percentage of times an outcome of one algorithm outperforms an outcome of the other.
Comparison to state-of-the-art: PFSP-(Cmax, TT)
nxm       TPLS+PLS   MOSA
20x5      5.74       1.36
20x10     0          0.29
20x20     0.4        1.57
50x5      84.75      0
50x10     67.35      0
50x20     69.55      0
100x5     84.61      0
100x10    73.29      0
100x20    61.99      0
200x10    37.14      0
200x20    33.04      0

Given: percentage of times an outcome of one algorithm outperforms an outcome of the other.
Comparison to state-of-the-art: PFSP-(SFT, TT)
nxm       TPLS+PLS   MOSA
20x5      8.46       28
20x10     0.19       0.4
20x20     3.6        5.19
50x5      98.37      0
50x10     94.88      0
50x20     5.13       0
100x5     98.56      0.6
100x10    98.9       0
100x20    97.25      0
200x10    98.34      0.36
200x20    89.13      0.92

Given: percentage of times an outcome of one algorithm outperforms an outcome of the other.
Comparison to state-of-the-art: PFSP-(Cmax, SFT)
[Figure: EAF difference plots, TP+PLS vs. MOSA, objectives Cmax and ΣCi]
Comparison to state-of-the-art: PFSP-(Cmax, WT)
[Figure: EAF difference plots, TP+PLS vs. MOSA, objectives Cmax and ΣwiTi]
Comparison to state-of-the-art: PFSP-(SFT, WT)
[Figure: EAF difference plots, TP+PLS vs. MOSA, objectives ΣCi and ΣwiTi]
Summary of results for PFSP
I single-objective problems
I new best-known solutions for 50 out of 90 instances from PFSP-flowtime benchmarks
I new best-known solutions for 90% of available benchmarks for PFSP-total-tardiness

I multi-objective problems

I the hybrid algorithm clearly outperforms the two previous state-of-the-art algorithms
I the hybrid algorithm usually outperforms the non-dominated sets obtained from the best results in an extensive computational study of 23 algorithms
Outline
1. Stochastic local search
2. “Simple SLS method”: IG
3. Multi-objective Optimization
4. Automatic offline configuration: F-race
5. Automatic Configuration of Multi-objective Optimizers
Configuration of SLS algorithms
SLS algorithm components
I categorical parameters
I type of construction method in IG
I choice of cross-over operator in evolutionary algorithms
I numerical parameters
I destruction strength
I operator application probability
Configuration/design problem
I given an application scenario, choose categorical and numerical parameters to optimize some performance criterion
I finding a good configuration can be very time-consuming
Main configuration approaches
Offline configuration
I configure algorithm before deploying it
I configuration done on training instances
Online tuning (parameter control)
I adapt parameter setting while solving an instance
I typically limited to a set of known crucial algorithm parameters
We focus on offline tuning
Importance of the configuration problem
I improvement over manual, ad-hoc methods for tuning
I reduction of development time and human intervention
I increases the number of degrees of freedom that can be considered
I empirical studies, comparisons of algorithms
I support for end users of algorithms
Methods for automated algorithm configuration are an important tool for engineering SLS algorithms
Our work
I development of automatic configuration methods
I F-race, iterated F-race
I ParamILS (work with UCB, Vancouver)
I application of configuration tools
I exploitation in development of state-of-the-art algorithms
I integration into SLS algorithm engineering
I (few) applications in industrial contexts
The configuration problem
(Our) Configuration problem
I Given:
I (finite) set of candidate configurations
I set of training instances
I optimization criterion: solution quality, run-time
I Goal:
I find a best configuration (for future instances in production mode)
The racing approach
I start with a set of initial candidates
I consider a stream of instances
I sequentially evaluate candidates
I discard inferior candidates as sufficient evidence is gathered against them
I . . . repeat until a winner is selected or until computation time expires
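The racing loop above can be sketched in a few lines of Python. Real F-race discards candidates via the Friedman test; in this sketch a simplified mean-gap rule stands in for the statistical test (all names and thresholds are my assumptions):

```python
import statistics

def race(candidates, instances, run, gap=0.1, min_instances=3):
    """Racing sketch: evaluate all surviving candidates on a stream of
    instances and discard clearly inferior ones."""
    results = {c: [] for c in candidates}
    alive = list(candidates)
    for i, inst in enumerate(instances):
        for c in alive:
            results[c].append(run(c, inst))   # cost of configuration c on inst
        if i + 1 >= min_instances:            # gather some evidence first
            means = {c: statistics.mean(results[c]) for c in alive}
            best = min(means.values())
            alive = [c for c in alive if means[c] <= best + gap]
        if len(alive) == 1:                   # a winner has been selected
            break
    return alive
```

Because inferior configurations are dropped early, the remaining budget concentrates on the promising ones, which is the key efficiency argument for racing.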
The F-Race algorithm
Statistical testing
1. family-wise test for differences among configurations
I Friedman two-way analysis of variance by ranks
2. if Friedman rejects H0, perform pairwise comparisons to the best configuration
I apply Friedman post-tests
Predecessors
I racing algorithms in model-selection
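The Friedman test underlying F-race works on within-instance ranks; a minimal sketch of the test statistic (tie handling is simplified here, whereas the real test uses mid-ranks):

```python
def friedman_statistic(costs):
    """Friedman chi-square statistic over within-instance ranks.
    costs[i][j] = cost of configuration j on instance i."""
    n, k = len(costs), len(costs[0])
    R = [0.0] * k                                    # rank sums per configuration
    for row in costs:
        for rank, c in enumerate(sorted(range(k), key=lambda c: row[c]), start=1):
            R[c] += rank                             # ties broken arbitrarily (sketch)
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in R) - 3.0 * n * (k + 1)
```

A large statistic (compared against the chi-square distribution with k − 1 degrees of freedom) rejects H0, triggering the pairwise post-tests against the best configuration.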
Sampling configurations
F-race is a method for selecting the best configuration; it is independent of how the set of configurations is sampled
Sampling configurations and F-race
I full factorial design
I random sampling design
I iterative refinement of a sampling model (iterative F-race) (Balaprakash, Birattari, Stützle, 2007; Birattari et al., 2010; López-Ibáñez et al., 2011)
Iterative F-race: an illustration
I sample configurations from initial distribution

While not terminate():
1. apply F-Race
2. modify the distribution
3. sample configurations with selection probability
Iterated F-race: design choices
I one may adopt known sampling techniques
I however, there is a strong limitation on the number of function evaluations
I main design issues for an ad-hoc design
I How many iterations?
I Which computational budget at each iteration?
I How many candidate configurations at each iteration?
I When to terminate F-race at each iteration?
I How to generate candidate configurations?
Example iterated F-race algorithm
this version extends the proposal of [Balaprakash, Birattari, Stützle, 2007] by considering categorical and conditional parameters

I number of iterations L
I L = 2 + round(log2 d), with d the number of parameters
I computational budget at each iteration
I Bl = (B − Bused) / (L − l + 1)
I number of candidate configurations
I Nl = ⌊Bl / µl⌋
I µl = 5 + l (increases with iteration counter)
I termination of F-Race at each iteration
I usual F-race termination criteria
I additional: stop when Nmin = 2 + round(log2 d) configurations remain
Example iterated F-race algorithm (2)
Generation of candidate solutions

I a set of Ns elite solutions is maintained
I a sampling distribution is defined on each elite configuration
I a selection probability for each elite configuration is defined
I choose the distribution for sampling w.r.t. the selection probability

I continuous and pseudo-continuous parameters
I domain Xi ∈ [Xi_min, Xi_max], range vi = Xi_max − Xi_min
I sampling distribution is a normal one, N(x_i^z, σ_i^l), with

σ_i^{l+1} = v_i · (1 / N_{l+1})^{l/d}   for l = 1, . . . , L − 1

I categorical parameters
I assume the parameter takes level f_i^z in the elite solution

P_{l+1}(f_j) = P_l(f_j) · (1 − l/L) + I(j = f_i^z) · l/L   for l = 1, . . . , L − 1

I first iteration: uniform distribution
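The two update rules above translate directly into code; a sketch (function names are mine, and truncating samples at the bounds instead of resampling is a simplification):

```python
import random

def update_sigma(v, N_next, l, d):
    """Shrink the sampling st.dev.: sigma_i^{l+1} = v_i * (1/N_{l+1})^(l/d)."""
    return v * (1.0 / N_next) ** (l / d)

def update_categorical(P, elite_level, l, L):
    """P_{l+1}(f_j) = P_l(f_j) * (1 - l/L) + I(j = elite level) * l/L."""
    return {f: p * (1 - l / L) + (l / L) * (f == elite_level)
            for f, p in P.items()}

def sample_numeric(rng, x_elite, sigma, lo, hi):
    """Sample a numeric parameter around an elite value, truncated to [lo, hi]."""
    return min(hi, max(lo, rng.gauss(x_elite, sigma)))
```

Both updates concentrate probability mass around the elite configurations as the iteration counter l grows, which is the intensification mechanism of iterated F-race.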
Some applications
International time-tabling competition
I winning algorithm configured by F-race
I interactive injection of new configurations
Vehicle routing and scheduling problem
I first industrial application
I improved a commercialized algorithm
F-race in stochastic optimization
I evaluate “neighbours” using F-race (solution cost is a random variable!)
I very good performance if variance of solution cost is high
Current, ongoing work
Main directions
I comparison of configuration methods
I improvements of configuration methods
I automatic configuration of multi-objective algorithms
I multi-objective configuration
I understanding difficulty of configuration problems
Main theme: computer-aided algorithm design
Example: configuration of multi-objective optimizers
Outline
1. Stochastic local search
2. “Simple SLS method”: IG
3. Multi-objective Optimization
4. Automatic offline configuration: F-race
5. Automatic Configuration of Multi-objective Optimizers
Automatic configuration of multi-objective optimizers
I Goal: find the best parameter settings of a multi-objective optimizer to solve unseen instances of a problem, given
I a flexible framework for the multi-objective optimizer
I a set of training instances representative of the same problem
I a maximum budget (number of experiments / time limit)

I automatic configuration tool: I/F-Race
I designed for single-objective optimization
I I/F-Race + hypervolume = multi-objective automatic configuration
Hypervolume measure
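For two objectives, the hypervolume (the area weakly dominated by a front, bounded by a reference point) can be computed by a simple sweep; a sketch assuming a non-dominated minimization front whose points all dominate the reference point (function name is mine):

```python
def hypervolume_2d(points, ref):
    """Area dominated by a non-dominated 2-objective minimization front,
    bounded by reference point ref."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(points):          # ascending f1 implies descending f2
        hv += (ref[0] - f1) * (prev_f2 - f2)   # add the slab this point contributes
        prev_f2 = f2
    return hv
```

Using this scalar value as the cost that I/F-Race minimizes (negated) is what turns a single-objective configurator into a multi-objective one.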
Automatic configuration of TPLS+PLS
TPLS+PLS framework
I multi-objective part is modular and problem-independent
I TPLS+PLS framework can be easily parameterized
Parameter name    Type         Domain
tpls ratio        ordered      {0.1, 0.2, . . . , 0.9, 1}
init scal ratio   ordered      {1, 1.5, 2, 3, 4, 6, 8, 10}
nb scal           integer      [0, 30]
two seeds         categorical  {yes, no}
restart           categorical  {yes, no}
theta             real         [0, 0.5]
pls operator      categorical  {ex, ins, exins}
Hypervolume statistics, size 50x20
              confhand        conftun−rnd     conftun−ic
              mean    sd      mean    sd      mean    sd
(Cmax, SFT)   0.974   0.036   0.982   0.038   0.984   0.034
(Cmax, TT)    0.999   0.039   1.005   0.038   1.002   0.035
(Cmax, WT)    1.037   0.026   1.045   0.024   1.045   0.023
(SFT, TT)     0.954   0.038   0.955   0.039   0.96    0.04
(SFT, WT)     1.022   0.028   1.024   0.03    1.029   0.026
Hypervolume statistics, size 100x20
              confhand        conftun−rnd     conftun−ic
              mean    sd      mean    sd      mean    sd
(Cmax, SFT)   0.943   0.058   0.968   0.056   0.971   0.058
(Cmax, TT)    1.005   0.043   1.008   0.045   1.012   0.038
(Cmax, WT)    1.013   0.043   1.028   0.039   1.025   0.04
(SFT, TT)     0.621   0.129   0.755   0.117   0.761   0.133
(SFT, WT)     0.951   0.037   0.922   0.051   0.962   0.048
Comparison hand-tuned vs. automatically configured
[Figures: EAF difference plots (hand-tuned vs. tuning−1 configurations) in the (Cmax, ΣwiTi) objective space for four instances]
Automatic configuration multi-objective ACO
[Figure: boxplots of hypervolume for MOAQ, BicriterionAnt (1 col), BicriterionAnt (3 col), MACS, COMPETants, PACO, mACO−1 to mACO−4, and five automatically configured MOACO variants on instances euclidAB100.tsp, euclidAB300.tsp, euclidAB500.tsp]
Automatic configuration multi-objective ACO
[Figure: boxplots of hypervolume for BicriterionAnt (3 col), BicriterionAnt−aco (1–5), MOACO (5), MOACO−aco (1–5), and MOACO−full (1–5) on instances euclidAB100.tsp, euclidAB300.tsp, euclidAB500.tsp]
Conclusions
I automatic configuration of multi-objective optimizers is clearly feasible
I new state-of-the-art algorithms for biobjective PFSPs have been obtained
I new state-of-the-art ACO algorithms have been obtained
I significant room for further research
IRIDIA: Metaheuristics unit
I headed by Prof. Dorigo (director of IRIDIA)
I permanent FNRS researchers
I PostDocs
I PhD students
Metaheuristics unit
Projects
I Ongoing: Meta-X, MIBISOC, FRFC, individual fellowships
I Past: Ants, Comp2SYS, Metaheuristics Network
Organization
I conference series (ANTS, SLS engineering)
I new journal (Swarm Intelligence)
Publications
I > 200 publications in last 10 years
Main research areas
Metaheuristic techniques
I ant colony optimization, iterated local search, iterated greedy, particle swarm optimization, local search algorithms
Applications
I routing, scheduling, assignment, bioinformatics problems
I multi-objective, dynamic and stochastic problems
I for many tackled problems, state-of-the-art algorithms were “engineered”
Main research areas
Automatic algorithm configuration / tuning
I development of tools for the automatic configuration / tuning of algorithms; application of automatic configuration tools in algorithm engineering
Others
I parallelization of metaheuristics
I large-scale experimental studies
I experimental methodologies
I continuous optimization problems