Simplicial Global Optimization. Julius Žilinskas, Vilnius University, Lithuania. November 8, 2016. http://web.vu.lt/mii/j.zilinskas


  • Simplicial Global Optimization

    Julius Žilinskas

    Vilnius University, Lithuania

    November 8, 2016

    http://web.vu.lt/mii/j.zilinskas


  • Global optimization

    Find f* = min_{x∈A} f(x) and x* ∈ A, f(x*) = f*, where A ⊆ Rⁿ.

    Example:

    - n = 1;
    - A = [0, 10];
    - objective function f(x) = −Σ_{j=1}^{5} j sin((j+1)x + j);
    - f* = −12.0312, x* = 5.7918.

    [Figure: plot of f(x) on [0, 10].]
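The one-dimensional example above can be checked numerically; this is a minimal brute-force sketch (not one of the algorithms discussed in these slides), scanning a dense grid over A = [0, 10]:

```python
import math

# The 1-D test function from the example: f(x) = -sum_{j=1}^{5} j*sin((j+1)x + j)
def f(x):
    return -sum(j * math.sin((j + 1) * x + j) for j in range(1, 6))

# Dense grid over A = [0, 10]; fine enough to locate the global minimum
xs = [i * 10.0 / 200000 for i in range(200001)]
x_best = min(xs, key=f)
print(x_best, f(x_best))  # close to x* = 5.7918, f* = -12.0312
```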

  • Local optimization

    - A point x* is a local minimum point if f(x*) ≤ f(x) for x ∈ N, where N is a neighborhood of x*.
    - A local minimum point can be found by stepping in the direction of steepest descent.
    - Without additional information one cannot say whether a local minimum is global.
    - How do we know if we are in the deepest hole?

    [Figure: plot of f(x) near a local minimum.]

  • Algorithms for global optimization

    - With guaranteed accuracy:
      - Lipschitz optimization,
      - branch and bound algorithms.
    - Randomized algorithms and heuristics:
      - random search,
      - multi-start,
      - clustering methods,
      - generalized descent methods,
      - simulated annealing,
      - evolutionary algorithms (genetic algorithms, evolution strategies),
      - swarm-based optimization (particle swarm, ant colony).

  • Criteria of performance of global optimization algorithms

    - Speed:
      - time of optimization,
      - number of objective function (and sometimes gradient, bounding, and other function) evaluations.
      Both criteria are equivalent when the objective function is "expensive" – its evaluation takes much more time than the auxiliary computations of the algorithm.
    - Best function value found.
    - Reliability – how often problems are solved with prescribed accuracy.

  • Parallel global optimization

    - Global optimization problems are classified as difficult in the sense of algorithmic complexity theory. Global optimization algorithms are computationally intensive.
    - When the computing power of usual computers is not sufficient to solve a practical global optimization problem, high performance parallel computers and computational grids may be helpful.
    - An algorithm is more applicable when its parallel implementation is available, because larger practical problems may be solved by means of parallel computations.

  • Criteria of performance of parallel algorithms

    - Efficiency of the parallelization can be evaluated using several criteria taking into account the optimization time and the number of processors.
    - A commonly used criterion of parallel algorithms is the speedup:

        s_p = t_1 / t_p,

      where t_p is the time used by the algorithm implemented on p processors.
    - The speedup divided by the number of processors is called the efficiency:

        e_p = s_p / p.
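As a quick illustration of the two measures (the timings below are made up):

```python
def speedup(t1, tp):
    """Speedup s_p = t_1 / t_p of a run on p processors."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Efficiency e_p = s_p / p."""
    return speedup(t1, tp) / p

# Hypothetical timings: 120 s sequentially, 40 s on 4 processors
print(speedup(120.0, 40.0))        # 3.0
print(efficiency(120.0, 40.0, 4))  # 0.75
```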

  • Covering methods

    - Detect and discard non-promising sub-regions using
      - interval arithmetic: {f(x) | x ∈ X} ⊆ F(X), where X is an interval vector (box) and F is an interval extension of f,
      - the Lipschitz condition |f(x) − f(y)| ≤ L‖x − y‖,
      - convex envelopes,
      - statistical estimates,
      - heuristic estimates,
      - ad hoc algorithms.
    - May be implemented using branch and bound algorithms.
    - May guarantee accuracy for some classes of problems: the found value f̃ satisfies f̃ ≤ min_{x∈D} f(x) + ε.

  • Branch and bound

    - An iteration of a classical branch and bound processes a node in the search tree representing a not yet explored subspace.
    - An iteration has three main components: selection of the node to process, branching, and bound calculation.
    - The bound for the objective function over a subset of feasible solutions is evaluated and compared with the best objective function value found.
    - If the evaluated bound is worse than the known function value, the subset cannot contain optimal solutions and the branch describing the subset can be pruned.
    - The fundamental aspect of branch and bound is that the earlier a branch is pruned, the smaller the number of complete solutions that need to be explicitly evaluated.

  • General algorithm for partition and branch-and-bound

    Cover D by L = {L_j | D ⊆ ∪ L_j, j = 1,…,m} using the covering rule.
    while L ≠ ∅ do
        Choose I ∈ L using the selection rule, L ← L \ {I}.
        if the bounding rule is satisfied for I then
            Partition I into p subsets I_j using the subdivision rule.
            for j = 1,…,p do
                if the bounding rule is satisfied for I_j then
                    L ← L ∪ {I_j}
                end if
            end for
        end if
    end while
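A minimal one-dimensional instance of this scheme, using the Lipschitz condition as the bounding rule; the test function and its Lipschitz constant L = 70 come from the introductory example, while the interval-bisection subdivision rule and depth-first selection are arbitrary choices for this sketch:

```python
import math

def f(x):  # the 1-D test function from the introductory example
    return -sum(j * math.sin((j + 1) * x + j) for j in range(1, 6))

def lipschitz_bb(a, b, L, eps=1e-3):
    """Branch and bound over [a, b]: prune intervals whose Lipschitz
    lower bound cannot improve the best found value by more than eps."""
    best_x, best_f = a, f(a)
    stack = [(a, b, f(a), f(b))]           # depth-first selection rule
    while stack:
        lo, hi, flo, fhi = stack.pop()
        for x, fx in ((lo, flo), (hi, fhi)):
            if fx < best_f:
                best_x, best_f = x, fx     # upper bound: best value found
        # lower bound over [lo, hi] from |f(x) - f(y)| <= L|x - y|
        lower = (flo + fhi - L * (hi - lo)) / 2
        if lower >= best_f - eps:
            continue                       # bounding rule: prune this branch
        mid = (lo + hi) / 2                # subdivision rule: bisect
        fmid = f(mid)
        stack.append((lo, mid, flo, fmid))
        stack.append((mid, hi, fmid, fhi))
    return best_x, best_f

x_best, f_best = lipschitz_bb(0.0, 10.0, L=70.0)
print(x_best, f_best)  # close to x* = 5.7918, f* = -12.0312
```

L = 70 bounds |f′(x)| ≤ Σ j(j+1) for j = 1,…,5, so the pruning is rigorous for this function.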

  • Rules of branch and bound algorithms for globaloptimization

    - The rules of covering and subdivision depend on the type of partitions used: hyper-rectangular, simplicial, etc.
    - Selection rules:
      - Best first, statistical – the best candidate: priority queue.
      - Depth first – the youngest candidate: stack, or without storing.
      - Breadth first – the oldest candidate: queue.
    - The bounding rule describes how the bounds for the minimum are found. For the upper bound the best currently found function value may be accepted. The lower bound may be estimated using
      - convex envelopes of the objective function,
      - the Lipschitz condition |f(x) − f(y)| ≤ L‖x − y‖,
      - interval arithmetic: {f(x) | x ∈ X} ⊆ F(X), where X is an interval vector (box) and F is an interval extension of f.

  • Rules of covering rectangular feasible region and branching

    [Figure: coverings of a rectangular feasible region by rectangles, triangles, and circles, and the corresponding branching.]

  • Lipschitz optimization

    - Lipschitz optimization is one of the most deeply investigated subjects of global optimization. It is based on the assumption that the slope of the objective function is bounded.
    - The multivariate function f : D → R, D ⊂ Rⁿ, is said to be Lipschitz if it satisfies the condition

        |f(x) − f(y)| ≤ L‖x − y‖,  ∀x, y ∈ D,

      where L > 0 is a constant called the Lipschitz constant, the domain D is compact, and ‖·‖ denotes a norm.
    - A branch and bound algorithm with Lipschitz bounds may be built: if the evaluated bound is worse than the known function value, the sub-region cannot contain optimal solutions and the branch describing it can be pruned.

  • Lipschitz bounds

    The efficiency of the branch and bound technique depends on the bound calculation.

    - The upper bound for the minimum is the smallest value of the function at the vertices x_v: UB(D) = min_{x_v} f(x_v).
    - The lower bound for the minimum is evaluated exploiting the Lipschitz condition:

        f(x) ≥ f(y) − L‖x − y‖.

    - The lower bound can be derived as

        μ1 = max_{x_v ∈ V(I)} { f(x_v) − L max_{x∈I} ‖x − x_v‖ }.
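A small sketch of the μ1 bound for one simplex, given its vertex list, the function values at the vertices, and an assumed Lipschitz constant L. Since the norm is convex, its maximum over the simplex is attained at a vertex, so the inner maximum only needs to scan the vertices:

```python
import math

def mu1(vertices, fvals, L):
    """Lipschitz lower bound mu1 over a simplex with the given vertices."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    best = -math.inf
    for v, fv in zip(vertices, fvals):
        # max_{x in I} ||x - v|| over a polytope is attained at a vertex
        far = max(dist(v, w) for w in vertices)
        best = max(best, fv - L * far)
    return best

# Unit triangle with made-up function values and L = 2
print(mu1([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], [0.0, 1.0, 1.0], 2.0))
```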

  • Comparison of selection strategies in Lipschitz optimization

    Paulavičius, Žilinskas, Grothey. Optimization Letters, 4(2), 173-183, 2010, doi:10.1007/s11590-009-0156-3

    Average number of function evaluations

    Dimension   Best First   Statistical   Depth First   Breadth First
    n = 2             9064          9100          9716            9068
    n = 3          1217072       1217206       1222060         1217509
    n = 4           879656        879276        880478          879851
    n = 5, 6       3939678       3923552       3925984         3960629

    Average total amount of simplices

    Dimension   Best First   Statistical   Depth First   Breadth First
    n = 2            18126         18199         19430           18133
    n = 3          2434138       2434406        244114         2435013
    n = 4          1759290       1758531       1760934         1759680
    n = 5, 6       7878996       7846744       7851609         7920898

  • Average maximal number of simplices in the list

    Dimension   Best First   Statistical   Depth First   Breadth First
    n = 2             2709           640            12            3135
    n = 3           274960        102734            23          423552
    n = 4           168927        105814            36          249651
    n = 5, 6        704862        136589           374          903555

    Average execution time

    Dimension   Best First   Statistical   Depth First   Breadth First
    n = 2             0.05          0.05          0.05            0.05
    n = 3            17.99         16.00         12.41           13.64
    n = 4            31.65         31.68         26.86           27.53
    n = 5, 6       1592.19       1568.13       1577.68         1555.55

    Average ratio tBestFound/tAllTime

    Dimension   Best First   Statistical   Depth First   Breadth First
    n = 2            0.234         0.236         0.365           0.408
    n = 3            0.140         0.008         0.209           0.306
    n = 4            0.142         0.035         0.240           0.472
    n = 5, 6         0.074         0.000         0.005           0.077

  • Simplex

    - An n-simplex is the convex hull of a set of (n + 1) affinely independent points in n-dimensional Euclidean space.
    - A one-simplex is a line segment, a two-simplex is a triangle, a three-simplex is a tetrahedron.

    [Figure: a triangle and a tetrahedron.]

  • Simplicial vs rectangular partitions

    - A simplex is the polyhedron in n-dimensional space which has the minimal number of vertices.
    - Simplicial partitions are preferable when the values of the objective function at the vertices of partitions are used to evaluate sub-regions.
    - Numbers of function evaluations in Lipschitz global optimization with rectangular and simplicial partitions:

      test function     1     2     3     4     5     6     7
      rectangular     643   167  3531    45    73   969  7969
      simplicial      611   132  2185    70    80   838  3117

      test function     8     9    10    11    12    13
      rectangular     301 13953  1123  2677 12643 15695
      simplicial      244  3773   848  1566  4001  4084

  • Subdivision of simplices

    - Into similar simplices.

      [Figure: a triangle subdivided into four similar triangles.]

    - Into two through the middle of the longest edge.

      [Figure: a triangle bisected through the midpoint of its longest edge.]
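The second rule is easy to state in code; a sketch with vertices as coordinate tuples and Euclidean edge lengths:

```python
from itertools import combinations

def bisect_longest_edge(simplex):
    """Subdivide a simplex (list of vertex tuples) into two children
    through the midpoint of its longest edge."""
    def d2(a, b):  # squared Euclidean distance
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # pick the pair of vertices forming the longest edge
    i, j = max(combinations(range(len(simplex)), 2),
               key=lambda e: d2(simplex[e[0]], simplex[e[1]]))
    mid = tuple((a + b) / 2 for a, b in zip(simplex[i], simplex[j]))
    child1 = [mid if k == i else v for k, v in enumerate(simplex)]
    child2 = [mid if k == j else v for k, v in enumerate(simplex)]
    return child1, child2

c1, c2 = bisect_longest_edge([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(c1, c2)
```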

  • Covering of feasible region

    - Very often a feasible region in global optimization is a hyper-rectangle defined by intervals of variables.
    - A feasible region is face-to-face vertex triangulated: it is partitioned into n-simplices whose vertices are also vertices of the feasible region.
    - A feasible region defined by linear inequality constraints may be vertex triangulated. In this way constraints are handled by the initial covering.

    [Figure: vertex triangulations of a rectangle and of a polytope.]

  • Combinatorial approach for vertex triangulation of ahyper-rectangle

    - General – works for any n.
    - Deterministic, the number of simplices is known in advance – n!.
    - All simplices are of equal hyper-volume – 1/n! of the hyper-volume of the hyper-rectangle.
    - By adding just one point at the middle of the diagonal of the hyper-rectangle, each simplex may be subdivided into two.
    - May be easily parallelized.

  • Examples of combinatorial vertex triangulation of two- andthree-dimensional hyper-rectangles

    [Figure: combinatorial vertex triangulations of a rectangle (2 simplices) and of a three-dimensional hyper-rectangle (6 simplices).]

  • Algorithm for combinatorial vertex triangulation ofhyper-rectangle

    for τ = each permutation of 1,…,n do
        for j = 1,…,n do
            v_1j ← D_j1
        end for
        for i = 1,…,n do
            for j = 1,…,n do
                v_(i+1)j ← v_ij
            end for
            v_(i+1)τi ← D_τi2
        end for
    end for
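The loop above translates directly into Python; a sketch where D is a list of (lower, upper) intervals of the hyper-rectangle and each permutation τ yields one simplex as a list of n + 1 vertices:

```python
from itertools import permutations

def triangulate(D):
    """Combinatorial vertex triangulation: one simplex per permutation tau.
    D[j] = (low_j, high_j) is the j-th interval of the hyper-rectangle."""
    n = len(D)
    simplices = []
    for tau in permutations(range(n)):
        v = [tuple(D[j][0] for j in range(n))]  # start at the "low" corner
        for i in range(n):
            w = list(v[-1])
            w[tau[i]] = D[tau[i]][1]            # raise one coordinate per step
            v.append(tuple(w))
        simplices.append(v)
    return simplices

for s in triangulate([(0, 1)] * 3):  # the six simplices of the unit cube
    print(s)
```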

  • Combinatorial vertex triangulation of a unit cube

    τ = {1, 2, 3}:  v = (0,0,0), (1,0,0), (1,1,0), (1,1,1)
    τ = {1, 3, 2}:  v = (0,0,0), (1,0,0), (1,0,1), (1,1,1)
    τ = {2, 1, 3}:  v = (0,0,0), (0,1,0), (1,1,0), (1,1,1)
    τ = {2, 3, 1}:  v = (0,0,0), (0,1,0), (0,1,1), (1,1,1)
    τ = {3, 1, 2}:  v = (0,0,0), (0,0,1), (1,0,1), (1,1,1)
    τ = {3, 2, 1}:  v = (0,0,0), (0,0,1), (0,1,1), (1,1,1)

  • Parallel algorithm for combinatorial vertex triangulation

    for k = ⌊n!·rank/size⌋ to ⌊n!·(rank + 1)/size⌋ − 1 do
        for j = 1,…,n do τ_j ← j end for
        c ← 1
        for j = 2,…,n do
            c ← c(j − 1)
            swap τ_(j−⌊k/c⌋ mod j) with τ_j
        end for
        for j = 1,…,n do v_1j ← D_j1 end for
        for i = 1,…,n do
            for j = 1,…,n do v_(i+1)j ← v_ij end for
            v_(i+1)τi ← D_τi2
        end for
    end for
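The inner swap loop decodes the k-th permutation directly from the index k (a factorial-number-system construction), so each process can generate its own share of the n! simplices without communication. A 0-indexed sketch of that decoding step:

```python
from math import factorial

def kth_permutation(n, k):
    """Decode index k (0 <= k < n!) into a distinct permutation of 0..n-1,
    mirroring the swap loop of the parallel triangulation algorithm."""
    tau = list(range(n))
    for p in range(1, n):
        # the mixed-radix digit (k // p!) mod (p+1) selects the swap partner
        i = p - (k // factorial(p)) % (p + 1)
        tau[i], tau[p] = tau[p], tau[i]
    return tau

print([kth_permutation(3, k) for k in range(6)])  # all 6 permutations of 0,1,2
```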

  • Vertex triangulation of feasible region defined by linearinequality constraints I

    min f(x)
    s.t. 0 ≤ x_i ≤ 1,
         x1 + x2 ≤ 1,
         x2 − x3 ≤ 0.

    [Figure: vertex-triangulated feasible region.]

    x1 = 0, x2 = 0, x3 = 0
    x1 = 0, x2 = 0, x3 = 1
    x1 = 0, x2 = 1, x3 = 0   (infeasible, crossed out)
    x1 = 0, x2 = 1, x3 = 1
    x1 = 1, x2 = 0, x3 = 0
    x1 = 1, x2 = 0, x3 = 1
    x1 = 1, x2 = 1, x3 = 0   (infeasible, crossed out)
    x1 = 1, x2 = 1, x3 = 1   (infeasible, crossed out)

  • Vertex triangulation of feasible region defined by linearinequality constraints II

    min f(x)
    s.t. 0 ≤ x_i ≤ 1,
         x1 + x2 ≤ 1,
         −x2 + x3 ≤ 0.

    [Figure: vertex-triangulated feasible region.]

    x1 = 0, x2 = 0, x3 = 0
    x1 = 0, x2 = 0, x3 = 1   (infeasible, crossed out)
    x1 = 0, x2 = 1, x3 = 0
    x1 = 0, x2 = 1, x3 = 1
    x1 = 1, x2 = 0, x3 = 0
    x1 = 1, x2 = 0, x3 = 1   (infeasible, crossed out)
    x1 = 1, x2 = 1, x3 = 0   (infeasible, crossed out)
    x1 = 1, x2 = 1, x3 = 1   (infeasible, crossed out)

  • DISIMPL (DIviding SIMPLices) algorithm

    [Figure: DISIMPL partitions of two simplicial search spaces, with the sampled function values shown on the partition elements.]

    - The feasible region defined by linear constraints may be covered by simplices.
    - Symmetries of objective functions may be exploited using linear inequality constraints.
    - Paulavičius, Žilinskas. Optimization Letters, 10(2), 237-246, 2016, doi:10.1007/s11590-014-0772-4

  • DISIMPL-v and DISIMPL-c algorithms

    DISIMPL stands for DIviding SIMPLices.

    - Initial step: scale D → D = {x ∈ Rⁿ : 0 ≤ x_i ≤ 1, i = 1,…,n}.
    - Covering step: cover D by n! simplices of equal hyper-volume and evaluate the function at all the vertices [DISIMPL-v] (at the barycenters [DISIMPL-c]). For symmetric problems there may be one initial simplex.

    [Figure: initial simplicial coverings of the unit square.]

    - Step 1: identify and select potentially optimal simplices.
    - Step 2: sample the function and divide each selected simplex into two by a hyper-plane passing through the middle point of the longest edge and the vertices which do not belong to the longest edge (DISIMPL-c), or into three simplices, dividing the longest edge too.
    - Repeat Step 1 and Step 2 until some stopping criterion is satisfied.

  • Potentially optimal simplices (POS) [DISIMPL-v]

    Let S be the set of all simplices created by DISIMPL-v after k iterations, ε > 0, and fmin the current best function value. A simplex j ∈ S is POS if there exists L̃ > 0 such that

        min_{v∈V(j)} f(v) − L̃ δ_j ≤ min_{v∈V(i)} f(v) − L̃ δ_i,  ∀i ∈ S,
        min_{v∈V(j)} f(v) − L̃ δ_j ≤ fmin − ε|fmin|.

    Here δ_j is the length of the longest edge of simplex j and V(j) is its vertex set.

  • Potentially optimal simplices (POS) [DISIMPL-c]

    Let S be the set of all simplices created by DISIMPL-c after k iterations, ε > 0, and fmin the current best function value. A simplex j ∈ S is POS if there exists L̃ > 0 such that

        f(c_j) − L̃ δ_j ≤ f(c_i) − L̃ δ_i,  ∀i ∈ S,
        f(c_j) − L̃ δ_j ≤ fmin − ε|fmin|.

    Here c_j is the geometric center (centroid) of simplex j and the measure δ_j is the maximal distance from c_j to its vertices.
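The existence of a suitable L̃ can be checked per simplex by intersecting the linear inequalities it induces; a sketch for the DISIMPL-c form, where deltas and fvals hold the δ_i and f(c_i) of the current simplices (the sample data at the end is made up):

```python
def potentially_optimal(deltas, fvals, fmin, eps=1e-4):
    """Indices j for which some L > 0 satisfies both POS inequalities."""
    pos = []
    for j, (dj, fj) in enumerate(zip(deltas, fvals)):
        # L must be at least this to pass the eps-improvement test
        lo = max((fj - fmin + eps * abs(fmin)) / dj, 1e-12)
        hi = float("inf")
        dominated = False
        for di, fi in zip(deltas, fvals):
            if di < dj:                  # smaller simplices bound L from below
                lo = max(lo, (fj - fi) / (dj - di))
            elif di > dj:                # larger simplices bound L from above
                hi = min(hi, (fi - fj) / (di - dj))
            elif fj > fi:                # same size, strictly worse value
                dominated = True
        if not dominated and lo <= hi:
            pos.append(j)
    return pos

# Made-up example: the large simplex with the best value is the only POS
print(potentially_optimal([0.5, 0.5, 0.25], [-1.0, -0.5, -0.9], fmin=-1.0))  # [0]
```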


  • Visualization of DISIMPL-V and DISIMPL-C

    - DISIMPL-v result: fmin = 0.07, iter = 0, f.eval = 3
    - DISIMPL-c result: fmin = −3.90, iter = 0, f.eval = 1

    [Figure: DISIMPL-v and DISIMPL-c partitions after the initial covering, with sampled function values.]

  • Visualization of DISIMPL-V and DISIMPL-C

    - DISIMPL-v result: fmin = 0.07, iter = 1, f.eval = 4
    - DISIMPL-c result: fmin = −9.36, iter = 1, f.eval = 3

    [Figure: DISIMPL-v and DISIMPL-c partitions after iteration 1.]

  • Visualization of DISIMPL-V and DISIMPL-C

    - DISIMPL-v result: fmin = 0.07, iter = 2, f.eval = 5
    - DISIMPL-c result: fmin = −41.48, iter = 2, f.eval = 5

    [Figure: DISIMPL-v and DISIMPL-c partitions after iteration 2.]

  • Visualization of DISIMPL-V and DISIMPL-C

    - DISIMPL-v result: fmin = 0.07, iter = 3, f.eval = 7
    - DISIMPL-c result: fmin = −16.20, iter = 3, f.eval = 9

    [Figure: DISIMPL-v and DISIMPL-c partitions after iteration 3.]

  • Coping with symmetries of objective function

    - If an interchange of the variables x_i and x_j does not change the value of the objective function, it is symmetric over the hyper-plane x_i = x_j.
    - It is possible to avoid such symmetries by setting linear constraints on such variables: x_i ≤ x_j.
    - The resulting constrained search space may be vertex triangulated.
    - The search space and the numbers of local and global minimizers may be reduced by avoiding symmetries.

  • Example of coping with symmetries of objective function

    f(x) = Σ_{i=1}^{n} x_i²/4000 − Π_{i=1}^{n} cos(x_i) + 1,  D = [−500, 700]ⁿ

    - The objective function is symmetric over the hyper-planes x_i = x_j.
    - Constraints for avoiding symmetries: x1 ≤ x2 ≤ … ≤ xn.
    - The resulting simplicial search space:

        D = [ −500 −500 … −500 −500
              −500 −500 … −500  700
                ⋮          ⋱     ⋮
              −500  700 …  700  700
               700  700 …  700  700 ].

    [Figure: the simplicial search space.]
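The rows of D above are easy to generate: vertex k raises the last k coordinates to the upper bound, so every vertex satisfies x1 ≤ … ≤ xn. A sketch:

```python
def symmetry_simplex(n, lo=-500.0, hi=700.0):
    """Vertices of the simplex {lo <= x1 <= x2 <= ... <= xn <= hi}:
    vertex k has its last k coordinates at hi, the rest at lo."""
    return [tuple([lo] * (n - k) + [hi] * k) for k in range(n + 1)]

for v in symmetry_simplex(3):
    print(v)
```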

  • Example of coping with symmetries: optimization of grillage-type foundations

    - A grillage foundation consists of separate beams, which are supported by piles or reside on other beams.
    - The piles should be positioned minimizing the largest difference between reactive forces and the limit magnitudes of reactions.

  • Optimization of grillage-type foundations: formulation

    - Black-box problem. The values of the objective function are evaluated by an independent package which models the reactive forces in the grillage using the finite element method.
    - The gradient may be estimated using sensitivity analysis implemented in the modelling package.
    - The number of piles is n.
    - The position of a pile is given by a real number, which is mapped to a two-dimensional position by the modelling package. Possible values are from zero to the sum of the lengths of all beams, l.
    - The feasible region is [0, l]ⁿ.
    - The characteristics of all piles are equal, so their interchange does not change the value of the objective function.

  • Optimization of grillage-type foundations: simplicial search space I

    - Let us constrain the problem, avoiding symmetries of the objective function: x1 ≤ x2 ≤ … ≤ xn.
    - Simplicial search space:

        D = [ 0 0 … 0 0
              0 0 … 0 l
              ⋮     ⋱  ⋮
              0 l … l l
              l l … l l ].

    - The search space and the numbers of local and global minimizers are reduced n! times with respect to the original feasible region.

  • Optimization of grillage-type foundations: simplicial search space II

    - The distance between adjacent piles cannot be too small due to the specific capacities of a pile driver.
    - Let us constrain the problem, avoiding symmetries and distances between two piles smaller than δ: x1 ≤ x2 − δ, …, x_{n−1} ≤ x_n − δ.
    - Simplicial search space:

        D = [ 0          δ         … (n−2)δ  (n−1)δ
              0          δ         … (n−2)δ  l
              ⋮                    ⋱         ⋮
              0          l−(n−2)δ  … l−δ     l
              l−(n−1)δ   l−(n−2)δ  … l−δ     l ].

  • Least squares nonlinear regression

    - The optimization problem in nonlinear LSR:

        min_{x∈D} Σ_{k=1}^{m} (y_k − φ(x, z_k))² = min_{x∈D} f(x),

      where the measurements y_k at points z_k = (z_1k, z_2k, …, z_pk) should be fitted by the nonlinear function φ(x, ·), e.g.

        φ(x, z) = x1 + x2 exp(−x4 z) + x3 exp(−x5 z).

    [Figure: contour plots of the least squares objective.]
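A sketch of this objective for the exponential model above (the measurement data is made up for illustration):

```python
import math

def phi(x, z):
    """Model phi(x, z) = x1 + x2*exp(-x4*z) + x3*exp(-x5*z); x is 0-indexed."""
    return x[0] + x[1] * math.exp(-x[3] * z) + x[2] * math.exp(-x[4] * z)

def lsr_objective(x, data):
    """Sum of squared residuals over (z_k, y_k) measurement pairs."""
    return sum((y - phi(x, z)) ** 2 for z, y in data)

# Made-up measurements; x = (1, 0, 0, 1, 1) makes phi constant 1, so the fit is exact
data = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(lsr_objective((1.0, 0.0, 0.0, 1.0, 1.0), data))  # 0.0
```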

  • Experimental investigation

    #     DISIMPL-V              DISIMPL-C              SymDIRECT
          pe < 1.0   pe < 0.01   pe < 1.0   pe < 0.01   pe < 1.0   pe < 0.01
    1.    156 (1)    216 (2)     172 (0)    554 (0)     105 (2)    409 (1)
    2.    141 (2)    192 (2)     252 (0)    430 (0)     199 (1)    289 (1)
    3.    935 (1)    1914 (1)    2456 (0)   3534 (0)    705 (2)    955 (2)
    4.    939 (1)    1894 (1)    686 (2)    1718 (2)    1577 (0)   2763 (0)
    5.    288 (1)    1171 (0)    202 (2)    1162 (1)    425 (0)    1025 (2)
    6.    333 (0)    977 (1)     274 (1)    1006 (0)    201 (2)    815 (2)
    7.    612 (0)    2439 (2)    258 (1)    5034 (0)    157 (2)    2701 (1)
    8.    224 (2)    3879 (2)    282 (1)    4050 (1)    407 (0)    5927 (0)
    9.    612 (0)    2439 (2)    258 (1)    4974 (0)    157 (2)    2701 (1)
    10.   224 (2)    3879 (2)    282 (1)    4050 (1)    407 (0)    5897 (0)
    11.   612 (0)    2439 (2)    258 (1)    5034 (0)    157 (2)    2839 (1)
    12.   224 (2)    4041 (1)    282 (1)    4026 (2)    407 (0)    5633 (0)
    TP    (12)       (18)        (11)       (7)         (13)       (11)

  • Motivation of globally-biased Direct-type algorithms

    - It is well known that the Direct algorithm quickly gets close to the optimum but can take much time to achieve a high degree of accuracy.

    [Figure: Disimpl-v and Disimpl-c partitions plotted as (δ_i, function value) points, distinguishing potentially optimal from non-potentially optimal simplices.]

  • The globally-biased Disimpl (Gb-Disimpl)

    - Gb-Disimpl consists of two phases: a usual phase (as in the original Disimpl method) and a global one.
    - The usual phase is performed until a sufficient number of subdivisions of simplices near the current best point has been performed.
    - Once many subdivisions around the current best point have been executed, its neighborhood contains only small simplices and all larger ones are located far from it.
    - Thus, the two-phase approach forces the Gb-Disimpl algorithm to explore larger simplices and to return to the usual phase only when an improved minimal function value is obtained.
    - Each of these phases can consist of several iterations.
    - Paulavičius, Sergeyev, Kvasov, Žilinskas. JOGO, 59(2-3), 545-567, 2014, doi:10.1007/s10898-014-0180-4

      Journal of Global Optimization Best Paper Award 2014.

  • Visualization of Gb-Disimpl-v

    [Figure: simplices grouped by size index l_k ∈ L_k = {0,…,9}, with the threshold fmin − ε|fmin| and the current best f_min^k in simplex S_min^k marked; non-potentially optimal simplices, simplices potentially optimal using Disimpl-v, and simplices potentially optimal using Gb-Disimpl-v are distinguished.]

  • Visualization of Gb-Disimpl

    [Figure: (δ_i, function value) diagrams for Gb-Disimpl-v and Gb-Disimpl-c, distinguishing potentially optimal from non-potentially optimal simplices.]

    Figure: Geometric interpretation of potentially optimal simplices at iteration k = 23 of the Gb-Disimpl-v and Gb-Disimpl-c algorithms on an n = 2 GKLS test problem

  • Description of GKLS classes of test problems

    The standard test problems are too simple for global optimization methods, therefore we use the GKLS generator.

    Class (#)   Difficulty   ∆      n   m    f*     d      r
    1.          Simple       10⁻⁴   2   10   −1.0   0.90   0.20
    2.          Hard         10⁻⁴   2   10   −1.0   0.90   0.10
    3.          Simple       10⁻⁶   3   10   −1.0   0.66   0.20
    4.          Hard         10⁻⁶   3   10   −1.0   0.90   0.20
    5.          Simple       10⁻⁶   4   10   −1.0   0.66   0.20
    6.          Hard         10⁻⁶   4   10   −1.0   0.90   0.20
    7.          Simple       10⁻⁷   5   10   −1.0   0.66   0.30
    8.          Hard         10⁻⁷   5   10   −1.0   0.66   0.20

    Here: n – dimension; m – number of local minima; f* – global minimum value; d – distance from the global minimizer to the vertex; r – radius of the attraction region.

  • Comparison criteria

    1. The number (maximal and average over each class) of function evaluations (f.e.) executed by the methods until the satisfaction of the stopping rule:

    Stopping Rule

    The global minimizer x* ∈ D = [l, u] was considered to be found when an algorithm generated a trial point x_j ∈ S_j such that

        |x_j(i) − x*(i)| ≤ ⁿ√∆ (u(i) − l(i)),  1 ≤ i ≤ n,   (1)

    where 0 < ∆ ≤ 1 is an accuracy coefficient. The algorithms also stopped if the maximal number of function evaluations Tmax = 1 000 000 was reached.

    2. The number of subregions generated until condition (1) is satisfied. This number indirectly reflects the degree of qualitative examination of D during the search for a global minimizer.
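Condition (1) is a per-coordinate check; a sketch where lo and hi are the vectors l and u of the box D:

```python
def stopping_rule(x, xstar, lo, hi, delta):
    """Condition (1): every coordinate of the trial point x is within the
    scaled tolerance of the known global minimizer xstar of D = [lo, hi]."""
    n = len(x)
    tol = delta ** (1.0 / n)  # n-th root of the accuracy coefficient
    return all(abs(xi - si) <= tol * (u - l)
               for xi, si, u, l in zip(x, xstar, hi, lo))

# 2-D check with delta = 1e-4, so the tolerance is 1% of each side
print(stopping_rule((0.505, 0.5), (0.5, 0.5), (0.0, 0.0), (1.0, 1.0), 1e-4))  # True
```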

  • Comparison on 8 classes of GKLS problems

    #   Direct        Directl       Disimpl-c                Disimpl-v
                                    Original      Gb         Original    Gb

    Average number of function evaluations

    1       198.89        292.79        200.80      187.72      192.93     173.09
    2      1063.78       1267.07       1189.60      701.28     1003.56     535.95
    3      1117.70       1785.73       2614.02     2217.08     1061.83     745.07
    4    >42322.65       4858.93       7158.76     4452.86     2598.91    1419.11
    5    >47282.89      18983.55     >46500.90    19593.56    10617.98    5051.86
    6    >95708.25      68754.02     >78794.37   >53716.76    33985.16   12796.86
    7    >16057.46      16758.44     >43428.74   >25306.30    11200.44    8579.89
    8   >217215.58    >269064.35    >255311.84  >164754.26    64750.99   34079.21

  • Comparison on 8 classes of GKLS problems

    #   Direct      Directl     Disimpl-c              Disimpl-v
                                Original    Gb         Original    Gb

    Number of function evaluations (50%)

    1       111        152        132        132        151       151
    2      1062       1328        705        556       1021       521
    3       386        591       1403       1438        787       721
    4      1749       1967       3315       2960       2594      1333
    5      4805       7194       6211       4717       7334      4623
    6     16114      33147      20677      14116      29807     11305
    7      1660       9246       3893       3522       7252      6734
    8     55092     126304      50096      25746      42680     25106

    Number of function evaluations (100%)

    1       1159       2318        960        938        773       418
    2       3201       3414       6696       2624       2683      1745
    3      12507      13309      50166      20568       4740      2636
    4   FAIL (4)      29233     111252      33346       7354      3483
    5   FAIL (4)     118744   FAIL (4)     238964      58764     12960
    6   FAIL (7)     287857  FAIL (10)   FAIL (2)     118482     42246
    7   FAIL (1)     178217   FAIL (2)     288372      48590     32119
    8  FAIL (16)   FAIL (4)  FAIL (21)   FAIL (9)     382593    201365

  • Comparison on 8 classes of GKLS problems

    #   Direct      Directl     Disimpl-c              Disimpl-v
                                Original    Gb         Original    Gb

    Number of subregions (50%)

    1       111        152        132        132        243        243
    2      1062       1328        705        556       1862        921
    3       386        591       1403       1438       2785       2566
    4      1749       1967       3315       2960      10841       5188
    5      4805       7194       6211       4717      63996      42044
    6     16114      33147      20677      14116     301997     108520
    7      1660       9246       3893       3522     140013     171215
    8     55092     126304      50096      25746    1238077     728281

    Number of subregions (100%)

    1        1159       2318        960        938       1419       735
    2        3201       3414       6696       2624       5085      3281
    3       12507      13309      50166      20568      20973     11548
    4    >1000000      29233     111252      33346      33087     15075
    5    >1000000     118744   >1000000     238964     649395    128534
    6    >1000000     287857   >1000000   >1000000    1335106    444923
    7    >1000000     178217   >1000000     288372    1465960   1288262
    8    >1000000   >1000000   >1000000   >1000000   11423212   6719537

  • Operating characteristics for the algorithms

    [Figure: number of solved functions vs. number of function evaluations for Direct, Directl, Disimpl-c, Gb-Disimpl-c, Disimpl-v and Gb-Disimpl-v on class no. 1 (simple) and class no. 2 (hard).]

    Figure: Operating characteristics for the n = 2 test problems

  • Operating characteristics for the algorithms

    [Figure: number of solved functions vs. number of function evaluations for Direct, Directl, Disimpl-c, Gb-Disimpl-c, Disimpl-v and Gb-Disimpl-v on class no. 3 (simple) and class no. 4 (hard).]

    Figure: Operating characteristics for the n = 3 test problems

  • Operating characteristics for the algorithms

    [Figure: number of solved functions vs. number of function evaluations for Direct, Directl, Disimpl-c, Gb-Disimpl-c, Disimpl-v and Gb-Disimpl-v on class no. 5 (simple) and class no. 6 (hard).]

    Figure: Operating characteristics for the n = 4 test problems

  • Operating characteristics for the algorithms

    [Figure: number of solved functions vs. number of function evaluations for Direct, Directl, Disimpl-c, Gb-Disimpl-c, Disimpl-v and Gb-Disimpl-v on class no. 7 (simple) and class no. 8 (hard).]

    Figure: Operating characteristics for the n = 5 test problems

  • Thank you for your attention