Chance-constrained optimization with tight confidence bounds
Mark Cannon, University of Oxford
25 April 2017
1
Outline
1. Problem definition and motivation
2. Confidence bounds for randomised sample discarding
3. Posterior confidence bounds using empirical violation estimates
4. Randomized algorithm with prior and posterior confidence bounds
5. Conclusions and future directions
2
Chance-constrained optimization
Chance constraints limit the probability of constraint violation in optimization problems with stochastic parameters

Applications are numerous and diverse:
. . . building control, chemical process design, finance, portfolio management, production planning, supply chain management, sustainable development, telecommunications networks . . .

Long history of stochastic programming methods
e.g. Charnes & Cooper, Management Sci., 1959
Prekopa, SIAM J. Control, 1966
but exact handling of probabilistic constraints is generally intractable

This talk: approximate methods based on random samples
3
Motivation: Energy management in PHEVs
Battery: charged before trip, empty at end of trip

Electric motor: shifts engine load point to improve fuel efficiency or emissions

Control problem: optimize power flows to maximize performance (efficiency or emissions) over the driving cycle
4
Motivation: Energy management in PHEVs
[Diagram: parallel hybrid powertrain: fuel power P_f → engine (P_eng); battery (P_s) → circuit losses (P_b) → electric motor (P_em); engine and motor combine through the clutch and gearbox to drive the vehicle (P_drv)]

Parallel configuration with common engine and electric motor speed:
P_drv = P_eng + P_em,  ω_eng = ω_em = ω  (if ω ≥ ω_stall)

• Engine fuel consumption: P_f = P_f(P_eng, ω)
5
Motivation: Energy management in PHEVs
[Diagram: parallel hybrid powertrain, as above]

• Electric motor: electrical power P_b ↔ mechanical power P_em:
P_b = P_b(P_em, ω)

• Battery: stored energy flow rate P_s ↔ P_b:
P_s = P_s(P_b)

stored energy E ↔ P_s:
Ė = P_s
5
Motivation: Energy management in PHEVs
Nominal optimization, assuming known P_drv(t), ω(t) for t ∈ [0, T]:

minimize_{P_eng(τ), P_em(τ), τ ∈ [t,T]}  ∫ₜᵀ P_f(P_eng(τ), ω(τ)) dτ
subject to  Ė(τ) = P_s(τ),  τ ∈ [t, T]
            E(t) = E_now
            E(T) ≥ E_end

with  P_drv = P_eng + P_em
      P_b = P_b(P_em, ω)
      P_s = P_s(P_b)
6
Motivation: Energy management in PHEVs
P_drv(t), ω(t) depend on uncertain traffic flow and driver behaviour

[Plot: battery state of energy (between 0.5 and 0.6) vs time over an 1800 s driving cycle, for an ensemble of realisations]
Introduce feedback by using a receding horizon implementation
but how likely is it that we can meet constraints?
7
Motivation: Energy management in PHEVs
P_drv(t), ω(t) depend on uncertain traffic flow and driver behaviour

[Image: real-time traffic flow data]
7
Motivation: Energy management in PHEVs
Reformulate as a chance-constrained optimization, assuming known distributions for P_drv(t), ω(t):

minimize_{P_eng(τ), P_em(τ), τ ∈ [t,T]}  ∫ₜᵀ P_f(P_eng(τ), ω(τ)) dτ
subject to  Ė(τ) = P_s(τ),  τ ∈ [t, T]
            E(t) = E_now
            P{ E(T) < E_end } ≤ ε

for a specified probability ε
8
Chance-constrained optimization
Define the chance-constrained problem

CCP:  minimize_{x ∈ X}  cᵀx
      subject to  P{ f(x, δ) > 0 } ≤ ε

δ ∈ Δ ⊆ Rᵈ: stochastic parameter;  ε ∈ [0, 1]: maximum allowable violation probability

Assume
– X ⊆ Rⁿ compact and convex
– f(·, δ) : Rⁿ → R convex, lower-semicontinuous for any δ ∈ Δ
– CCP is feasible, i.e. P{ f(x, δ) > 0 } ≤ ε for some x ∈ X.
9
Chance-constrained optimization
Computational difficulties with chance constraints:

1. Prohibitive computation to determine feasibility of a given x:
P{ f(x, δ) > 0 } = ∫_Δ 1_{f(x,δ)>0}(δ) F(dδ)

– Closed-form expressions are only available in special cases
e.g. if δ ∼ N(0, I) (Gaussian) and f(x, δ) = bᵀδ − 1 + (d + Dᵀδ)ᵀx (affine), then
P{ f(x, δ) > 0 } ≤ ε  ⟺  Φ_N⁻¹(1 − ε) ‖b + Dx‖ ≤ 1 − dᵀx
(Φ_N = standard normal c.d.f.)

– In general, multidimensional integration is needed (e.g. Monte Carlo methods)
10
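The affine Gaussian case gives a closed form against which a sample-based estimate of V(x) = P{f(x, δ) > 0} can be checked. Below is a minimal Monte Carlo sketch, assuming numpy/scipy; the data b, d, D and the point x are arbitrary illustrative values, not from the talk.

```python
# Monte Carlo estimate of the violation probability for the affine-Gaussian
# constraint f(x, delta) = b'delta - 1 + (d + D'delta)'x, compared with the
# closed form P{f > 0} = 1 - Phi_N((1 - d'x) / ||b + Dx||).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, dim = 3, 4                                   # decision / parameter dimensions
b, d = rng.normal(size=dim), 0.1 * rng.normal(size=n)
D = 0.1 * rng.normal(size=(dim, n))
x = 0.1 * rng.normal(size=n)

# f(x, delta) = (b + D x)' delta - (1 - d' x)
deltas = rng.normal(size=(200_000, dim))        # i.i.d. samples, delta ~ N(0, I)
f = deltas @ (b + D @ x) - (1.0 - d @ x)
V_mc = np.mean(f > 0)                           # empirical violation probability

V_exact = 1.0 - norm.cdf((1.0 - d @ x) / np.linalg.norm(b + D @ x))
print(V_mc, V_exact)                            # should agree to roughly 1e-3
```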
Chance-constrained optimization
Computational difficulties with chance constraints:

2. The feasible set may be nonconvex even if f(·, δ) is convex for all δ ∈ Δ
e.g. Φ_N⁻¹(1 − ε) ‖b + Dx‖ ≤ 1 − dᵀx defines a feasible set that is
convex if ε ≤ 0.5, nonconvex if ε > 0.5

[Sketches: feasible sets of x for ε < 0.5 (convex) and ε > 0.5 (nonconvex)]
10
Sample approximation
Define the multisample ω_m = {δ⁽¹⁾, …, δ⁽ᵐ⁾} for i.i.d. δ⁽ⁱ⁾ ∈ Δ

Define the sample approximation of CCP as

SP:  minimize_{x ∈ X, ω ⊆ ω_m}  cᵀx
     subject to  f(x, δ) ≤ 0 for all δ ∈ ω
                 |ω| ≥ q

(m − q discarded constraints, m ≥ q ≥ n)

x*(ω*): optimal solution for x;  ω*: optimal solution for ω
11
Sample approximation
• Solutions of SP converge to the solution of CCP as m → ∞ if q = ⌈m(1 − ε)⌉

• Approximation accuracy is characterized via bounds on:
– the probability that the solution of SP is feasible for CCP
– the probability that the optimal objective of SP exceeds that of CCP

Luedtke & Ahmed, SIAM J. Optim., 2008
Calafiore, SIAM J. Optim., 2010
Campi & Garatti, J. Optim. Theory Appl., 2011

• SP can be formulated as a mixed integer program with a convex continuous relaxation
12
Example: minimal bounding circle
Smallest circle containing a given probability mass:

CCP:  minimize_{c,R}  R  subject to  P{ ‖c − δ‖ > R } ≤ ε

SP:  minimize_{c,R,ω⊆ω_m}  R  subject to  ‖c − δ‖ ≤ R for all δ ∈ ω,  |ω| ≥ q

e.g. ε = 0.5, m = 1000, q = 500, with SP solved for one realisation ω₁₀₀₀ of δ ∼ N(0, I):

       R      c
CCP    1.386  (0, 0)
SP     1.134  (0.003, 0.002)
13
Example: minimal bounding circle
Bounds on the confidence that the SP solution x*(ω*) with m = 1000, q = 500 is feasible for CCP(ε), from Calafiore, 2010:

[Plot: confidence vs violation probability ε ∈ [0.4, 0.6]; the 0.05 and 0.95 confidence levels are attained at values of ε separated by ε₀.₉₅ − ε₀.₀₅ = 0.11]

⟹ with 90% probability: the violation probability of x*(ω*) lies between 0.47 and 0.58
14
Example: minimal bounding circle
Smallest circle containing a given probability mass – observations:

– SP computation grows rapidly with m
– Bounds on the confidence of feasibility of x*(ω*) for CCP are not tight (unless q = m or the probability distribution is a specific worst case), because the SP solution has poor generalization properties
  e.g. m = 20, q = 10: [sketch]
– Randomized sample discarding improves generalization
15
Outline
1. Problem definition and motivation
2. Confidence bounds for randomised sample discarding
3. Posterior confidence bounds using empirical violation estimates
4. Randomized algorithm with prior and posterior confidence bounds
5. Conclusions and future directions
16
Level-q subsets of a multisample
Define the sampled problem for given ω ⊆ ω_m as

SP(ω):  minimize_{x ∈ X}  cᵀx  subject to  f(x, δ) ≤ 0 for all δ ∈ ω

x*(ω): optimal solution of SP(ω)

Assumptions:
(i) SP(ω) is feasible for any ω ⊆ ω_m with probability (w.p.) 1
(ii) f(x*(ω), δ) ≠ 0 for all δ ∈ ω_m \ ω w.p. 1
17
Level-q subsets of a multisample
Define the set of level-q subsets of ω_m, for q ≤ m, as

Ω_q(ω_m) = { ω ⊆ ω_m : |ω| = q and f(x*(ω), δ) > 0 for all δ ∈ ω_m \ ω }

x*(ω) for ω ∈ Ω_q(ω_m) is called a level-q solution

e.g. minimal bounding circle problem, m = 20, q = 10:
[sketches of candidate subsets ω, several with ω ∈ Ω₁₀(ω₂₀) and a final one with ω ∉ Ω₁₀(ω₂₀)]
18
Essential constraint set
Es(ω) = {δ^(i₁), …, δ^(i_k)} ⊆ ω is an essential set of SP(ω) if
(i) x*({δ^(i₁), …, δ^(i_k)}) = x*(ω),
(ii) x*(ω \ δ) ≠ x*(ω) for all δ ∈ {δ^(i₁), …, δ^(i_k)}.

e.g. minimal bounding circle problem:
[sketch, with the essential constraints (the samples on the circle boundary) marked]
19
Support dimension
Define:
maximum support dimension:  ζ̄ = ess sup_{ω ∈ Δᵐ, m ≥ 1} |Es(ω)|
minimum support dimension:  ζ = ess inf_{ω ∈ Δᵐ, m ≥ ζ̄} |Es(ω)|

then ζ̄ ≤ dim(x) = n and ζ ≥ 1

e.g. minimal bounding circle problem:
[sketches with |Es(ω)| = 2 = ζ (two diametrically opposite support points) and |Es(ω)| = 3 = ζ̄]

20
Confidence bounds for randomly selected level-q subsets
Theorem 1:
For any ε ∈ [0, 1] and q ∈ [ζ̄, m],
  P^m{ V(x*(ω)) ≤ ε | ω ∈ Ω_q(ω_m) } ≥ Φ(q − ζ̄; m, 1 − ε).
For any ε ∈ [0, 1] and q ∈ [ζ, m],
  P^m{ V(x*(ω)) ≤ ε | ω ∈ Ω_q(ω_m) } ≤ Φ(q − ζ; m, 1 − ε).

where
violation probability:  V(x) = P{ f(x, δ) > 0 }
binomial c.d.f.:  Φ(n; N, p) = Σ_{i=0}^{n} C(N, i) pⁱ (1 − p)^{N−i}
(C(N, i) = binomial coefficient)
21
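Theorem 1's bounds are just binomial c.d.f. evaluations, so they are cheap to compute. A small sketch, assuming scipy, with ζ = 2, ζ̄ = 3 as in the bounding-circle example:

```python
# Evaluating the Theorem 1 confidence bounds.
# Phi(n; N, p) is the binomial c.d.f., i.e. scipy.stats.binom.cdf(n, N, p).
from scipy.stats import binom

def lower_bound(eps, m, q, zeta_bar):
    # P^m{ V(x*(w)) <= eps | w in Omega_q(w_m) } >= Phi(q - zeta_bar; m, 1 - eps)
    return binom.cdf(q - zeta_bar, m, 1.0 - eps)

def upper_bound(eps, m, q, zeta):
    # P^m{ V(x*(w)) <= eps | w in Omega_q(w_m) } <= Phi(q - zeta; m, 1 - eps)
    return binom.cdf(q - zeta, m, 1.0 - eps)

m, q, zeta, zeta_bar = 1000, 500, 2, 3   # bounding-circle example values
for eps in (0.48, 0.50, 0.52):
    print(eps, lower_bound(eps, m, q, zeta_bar), upper_bound(eps, m, q, zeta))
```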
Confidence bounds for randomly selected level-q subsets
Proof of Thm. 1 relies on . . .

Lemma 1: For any v ∈ [0, 1] and k ∈ [ζ, ζ̄],
  P^m{ V(x*(ω_k)) ≤ v | |Es(ω_k)| = k } = F(v) = vᵏ
(e.g. Calafiore, 2010)

Proof (sketch): for all q ∈ [k, m],
P^m{ ω_k = Es(ω_q) ∩ V(x*(ω_k)) = v | |Es(ω_k)| = k } = (1 − v)^{q−k} dF(v)
⟹ P^m{ ω_k = Es(ω_q) | |Es(ω_k)| = k } = ∫₀¹ (1 − v)^{q−k} dF(v)

e.g. k = 3, q = 10: [sketch]

hence ∫₀¹ (1 − v)^{q−k} dF(v) = C(q, k)⁻¹ (a Hausdorff moment problem)
22
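The moment identity at the end of the proof sketch can be checked symbolically for particular (q, k); a one-off verification with sympy (illustrative values q = 10, k = 3):

```python
# Check: integral_0^1 (1 - v)^(q-k) dF(v), with F(v) = v^k, equals 1 / C(q, k).
import sympy as sp

v = sp.symbols('v', positive=True)
q, k = 10, 3
lhs = sp.integrate((1 - v)**(q - k) * sp.diff(v**k, v), (v, 0, 1))
print(lhs, sp.Rational(1, sp.binomial(q, k)))   # both print 1/120
```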
Confidence bounds for randomly selected level-q subsets
. . . and

Lemma 2: For any k ∈ [ζ, ζ̄] and q ∈ [k, m],
P^m{ ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) | |Es(ω_k)| = k } = k C(m−k, q−k) B(m−q+k, q−k+1)
(B(·, ·) = beta function)

Note:
– ω_k = {δ⁽¹⁾, …, δ⁽ᵏ⁾} is statistically equivalent to a randomly selected subset of ω_m
– { ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) } given |Es(ω_k)| = k occurs iff:
q − k of the samples in ω_m \ ω_k satisfy f(x*(ω_k), δ) ≤ 0 and the remaining m − q samples satisfy f(x*(ω_k), δ) > 0

hence Lem. 2 follows from Lem. 1 and the law of total probability
23
Confidence bounds for randomly selected level-q subsets
Proof of Thm. 1 (sketch):

• P^m{ ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) ∩ V(x*(ω_k)) ≤ ε | |Es(ω_k)| = k }
  = k C(m−k, q−k) B(ε; m−q+k, q−k+1)
(B(·, ·, ·) = incomplete beta function)

• P^m{ V(x*(ω_k)) ≤ ε | ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) }
  = P^m{ ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) ∩ V(x*(ω_k)) ≤ ε | |Es(ω_k)| = k }
    / P^m{ ω_k = Es(ω) for some ω ∈ Ω_q(ω_m) | |Es(ω_k)| = k }
  = Φ(q − k; m, 1 − ε)   (Bayes' law)

• Hence
P^m{ V(x*(ω)) ≤ ε | ω ∈ Ω_q(ω_m) ∩ |Es(ω)| = k } = Φ(q − k; m, 1 − ε)

Thm. 1 follows from the law of total probability, since, for all k ∈ [ζ, ζ̄],
Φ(q − ζ̄; m, 1 − ε) ≤ Φ(q − k; m, 1 − ε) ≤ Φ(q − ζ; m, 1 − ε)

24
Confidence bounds for randomly selected level-q subsets
Consider the relationship between optimal costs: J°(ε) of CCP, J*(ω) of SP(ω)

Prop. 1:  P^m{ J*(ω) ≤ J°(ε) | ω ∈ Ω_q(ω_m) } ≥ Φ(q − ζ̄; m, 1 − ε)

Proof: J*(ω) ≤ J°(ε) if the solution x*(ω) of SP(ω) is feasible for CCP, so
P^m{ V(x*(ω)) ≤ ε | ω ∈ Ω_q(ω_m) } ≤ P^m{ J*(ω) ≤ J°(ε) | ω ∈ Ω_q(ω_m) }

Prop. 2:  P^m{ J*(ω) > J°(ε) | |ω| = q } ≤ Φ(q − 1; q, 1 − ε)

Proof: J*(ω) ≤ J°(ε) if the solution x°(ε) of CCP is feasible for SP(ω), i.e. if f(x°(ε), δ) ≤ 0 for all δ ∈ ω, so
P^m{ J*(ω) ≤ J°(ε) } ≥ P{ f(x°(ε), δ) ≤ 0 }^q ≥ (1 − ε)^q = 1 − Φ(q − 1; q, 1 − ε)
25
Comparison with deterministic discarding strategies
For optimal sample discarding (i.e. SP) or a greedy heuristic we have
P^m{ V(x*(ω*)) ≤ ε } ≥ Ψ(q, ζ̄; m, ε),
Ψ(q, ζ̄; m, ε) = 1 − C(m−q+ζ̄−1, m−q) Φ(m−q+ζ̄−1; m, ε)
Calafiore (2010), Campi & Garatti (2011)

This bound is looser than the lower bound for random sample discarding (i.e. SP(ω), ω ∈ Ω_q(ω_m)), since
1 − Ψ(q, ζ; m, ε) = C(m−q+ζ−1, m−q) [1 − Φ(q − ζ; m, 1 − ε)]
implies
• Ψ(q, ζ; m, ε) ≤ Φ(q − ζ; m, 1 − ε)
• Ψ(q, ζ; m, ε) = Φ(q − ζ; m, 1 − ε) only if ζ = 1 or q = m (in which case |Ω_q(ω_m)| = 1)

26
Comparison with deterministic discarding strategies
The lower confidence bound for optimal sample discarding corresponds to the worst-case assumption:
P^m{ V(x*(ω)) > ε } = 0 for all ω ∈ Ω_q(ω_m) such that ω ≠ ω*

since

⋆ Thm. 1 implies
(1 / E^m{ |Ω_q(ω_m)| }) E^m{ Σ_{ω ∈ Ω_q(ω_m)} 1{ V(x*(ω)) > ε } } ≥ 1 − Φ(q − ζ; m, 1 − ε)

⋆ Lem. 2 implies
E^m{ |Ω_q(ω_m)| } ≤ C(m − q + ζ − 1, m − q)
27
Comparison with deterministic discarding strategies
Confidence bounds for m = 500 with q = ⌈0.75m⌉ = 375, ζ = 1, ζ̄ = 10

[Plot: confidence vs violation probability ε ∈ [0.15, 0.5], showing the upper bound Φ(q − ζ; m, 1 − ε), the lower bound Φ(q − ζ̄; m, 1 − ε), and the deterministic-discarding bound Ψ(q, ζ̄; m, ε)]
28
Comparison with deterministic discarding strategies
Values of ε lying on the 5% and 95% confidence bounds for varying m, with q = ⌈0.75m⌉, ζ = 1, ζ̄ = 10

[Plots: ε₉₅, ε₅ and ε′₉₅ vs sample size m ∈ [10², 10⁸] (left); the gaps ε₉₅ − ε₅ and ε′₉₅ − ε₅ vs m, log scale (right)]

The probabilities ε₅, ε₉₅ and ε′₉₅ are defined by
0.05 = Φ(q − ζ; m, 1 − ε₅),  0.95 = Φ(q − ζ̄; m, 1 − ε₉₅),  0.95 = Ψ(q, ζ̄; m, ε′₉₅)
29
Example: minimal bounding circle
Empirical distribution of the violation probability for the smallest bounding circle, with δ ∼ N(0, I), q = 80, m = 100, and 500 realisations of ω_m

[Plot: probability of V(x) ≤ ε vs ε, comparing the empirical distributions for x*(ω), ω ∈ Ω_q(ω_m) and for x = x*(ω*) with the bounds Φ(78; 100, 1 − ε), Φ(77; 100, 1 − ε) and Ψ(80, 3; 100, ε)]
30
Example: minimal bounding circle
Empirical cost distributions for the smallest bounding circle, with δ ∼ N(0, I), q = 80, m = 100, and 500 realisations of ω_m

[Plot: probability of J*(ω) ≥ J°(ε) vs ε, comparing the empirical distributions for ω ∈ Ω_q(ω_m) and ω = ω* with the bounds Φ(99; 100, 1 − ε), Φ(77; 100, 1 − ε) and Ψ(80, 3; 100, ε)]
31
Outline
1. Problem definition and motivation
2. Confidence bounds for randomised sample discarding
3. Posterior confidence bounds using empirical violation estimates
4. Randomized algorithm with prior and posterior confidence bounds
5. Conclusions and future directions
32
Random constraint selection strategy
How can we select ω ∈ Ω_q(ω_m) at random?

Summary of approach:
For given ω_m, let ω_r = {δ⁽¹⁾, …, δ⁽ʳ⁾}, and
– solve SP(ω_r) for x*(ω_r)
– count q = |{ δ ∈ ω_m : f(x*(ω_r), δ) ≤ 0 }|

then { δ ∈ ω_m : f(x*(ω_r), δ) ≤ 0 } ∈ Ω_q(ω_m)

Choose r to maximize the probability that q lies in the desired range

e.g. r = 10, m = 50, q = 13 + 10 = 23
[sketch: bounding circle of the first r samples, plus the additional samples it happens to satisfy]
33
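A runnable sketch of this selection step for the bounding-circle example (illustrative code, not from the talk): SP(ω_r) is the smallest enclosing circle of the first r samples, found here by brute-force enumeration of 2- and 3-point candidate circles, and q is the count of satisfied constraints.

```python
import itertools
import numpy as np

def circle_2pts(a, b):
    c = (a + b) / 2.0
    return c, np.linalg.norm(a - c)

def circle_3pts(a, b, c):
    # circumcircle centre from the perpendicular-bisector equations
    A = 2.0 * np.array([b - a, c - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a])
    if abs(np.linalg.det(A)) < 1e-12:
        return None                      # collinear points: no circumcircle
    ctr = np.linalg.solve(A, rhs)
    return ctr, np.linalg.norm(a - ctr)

def min_enclosing_circle(pts):
    # the smallest enclosing circle is determined by 2 or 3 points, so enumerate
    best = None
    for k in (2, 3):
        for sub in itertools.combinations(pts, k):
            res = circle_2pts(*sub) if k == 2 else circle_3pts(*sub)
            if res is None:
                continue
            ctr, R = res
            if np.all(np.linalg.norm(pts - ctr, axis=1) <= R * (1 + 1e-9)):
                if best is None or R < best[1]:
                    best = (ctr, R)
    return best

rng = np.random.default_rng(0)
m, r = 50, 10
w_m = rng.standard_normal((m, 2))        # multisample, delta ~ N(0, I)
ctr, R = min_enclosing_circle(w_m[:r])   # x*(w_r) from the first r samples
q = int(np.sum(np.linalg.norm(w_m - ctr, axis=1) <= R))
print(q)   # the q satisfied samples form a level-q subset of w_m
```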
Posterior confidence bounds
Define  Θ_r(ω_m) = |{ δ ∈ ω_m : f(x*(ω_r), δ) ≤ 0 }|

Theorem 2:
For any ε ∈ [0, 1] and ζ̄ ≤ r ≤ q ≤ m,
  P^m{ V(x*(ω_r)) ≤ ε | Θ_r(ω_m) = q } ≥ Φ(q − ζ̄; m, 1 − ε).
For any ε ∈ [0, 1] and ζ ≤ r ≤ q ≤ m,
  P^m{ V(x*(ω_r)) ≤ ε | Θ_r(ω_m) = q } ≤ Φ(q − ζ; m, 1 − ε).

⋆ Θ_r(ω_m) provides an empirical estimate of the violation probability
⋆ Thm. 2 gives a posteriori confidence bounds (i.e. for given Θ_r(ω_m))
⋆ These bounds are tight for general probability distributions, i.e.
– they cannot be improved if |Es(ω)| can take any value in [ζ, ζ̄]
– they are exact for problems with ζ = ζ̄
34
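Theorem 2 turns an observed count q into a posterior confidence interval on V(x*(ω_r)) by inverting the binomial c.d.f. in ε. A bisection sketch, assuming scipy, with the bounding-circle values m = 100, q = 80, ζ = 2, ζ̄ = 3 used for illustration:

```python
# Posterior confidence interval from the Theorem 2 bounds (reconstructed).
from scipy.stats import binom

def eps_at(target, m, q, z, lo=0.0, hi=1.0, iters=60):
    # smallest eps with Phi(q - z; m, 1 - eps) >= target
    # (the c.d.f. is nondecreasing in eps, so bisection applies)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if binom.cdf(q - z, m, 1.0 - mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

m, q, zeta, zeta_bar = 100, 80, 2, 3
# with ~90% probability V(x*(w_r)) lies between these two values
print(eps_at(0.05, m, q, zeta), eps_at(0.95, m, q, zeta_bar))
```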
Probability of selecting a level-q subset of ω_m

Selection of r is based on

Lemma 3: For any ζ̄ ≤ r ≤ q ≤ m,
P^m{ Θ_r(ω_m) = q } ≥ C(m−r, q−r) min_{k∈[ζ, ζ̄]} B(m−q+k, q−k+1) / B(k, r−k+1)
P^m{ Θ_r(ω_m) = q } ≤ C(m−r, q−r) max_{k∈[ζ, ζ̄]} B(m−q+k, q−k+1) / B(k, r−k+1)

Proof (sketch):
1. From Lem. 1 & Lem. 2, for k ∈ [ζ, ζ̄] we have
P^m{ V(x*(ω_r)) = v | |Es(ω_r)| = k } = (1 − v)^{r−k} v^{k−1} / B(k, r−k+1) dv
2. Θ_r(ω_m) = q iff
q − r of the m − r samples in ω_m \ ω_r satisfy f(x*(ω_r), δ) ≤ 0 and the remaining m − q samples satisfy f(x*(ω_r), δ) > 0

Hence Lem. 3 follows from 1 and the law of total probability

35
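One internal consistency check on the (reconstructed) Lemma 3 expression: when ζ = ζ̄ = k the two bounds coincide and P^m{Θ_r(ω_m) = q} should sum to one over q = r, …, m. A quick numerical check, assuming scipy:

```python
# Lemma 3 sanity check with zeta = zeta_bar = k: probabilities sum to 1.
import numpy as np
from scipy.special import betaln, gammaln

def log_comb(n, k):
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

m, r, k = 200, 20, 3
qs = np.arange(r, m + 1)
logp = log_comb(m - r, qs - r) + betaln(m - qs + k, qs - k + 1) - betaln(k, r - k + 1)
print(np.exp(logp).sum())   # ~1.0
```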
Posterior confidence bounds (contd.)
Proof of Thm. 2 (sketch):

• P^m{ Θ_r(ω_m) = q ∩ V(x*(ω_r)) ≤ ε | |Es(ω_r)| = k }
  = C(m−r, q−r) B(ε; m−q+k, q−k+1) / B(k, r−k+1)

• Hence from Lem. 3 and Bayes' law
P^m{ V(x*(ω_r)) ≤ ε | Θ_r(ω_m) = q ∩ |Es(ω_r)| = k } = Φ(q − k; m, 1 − ε)

so the law of total probability implies
P^m{ V(x*(ω_r)) ≤ ε | Θ_r(ω_m) = q } ≥ Φ(q − ζ̄; m, 1 − ε) Σ_{k=ζ}^{ζ̄} P^m{ |Es(ω_r)| = k }
P^m{ V(x*(ω_r)) ≤ ε | Θ_r(ω_m) = q } ≤ Φ(q − ζ; m, 1 − ε) Σ_{k=ζ}^{min{r, ζ̄}} P^m{ |Es(ω_r)| = k }

36
Outline
1. Problem definition and motivation
2. Confidence bounds for randomised sample discarding
3. Posterior confidence bounds using empirical violation estimates
4. Randomized algorithm with prior and posterior confidence bounds
5. Conclusions and future directions
37
Parameters and target confidence bounds
Consider the design of an algorithm to generate
x̂: an approximate CCP solution
q̂: the number of constraints satisfied out of m sampled constraints

Impose prior confidence bounds:
V(x̂) ∈ (ε̲, ε̄] with probability ≥ p_prior

Impose posterior confidence bounds, given the value of q̂:
|V(x̂) − q̂/m| ≤ β with probability ≥ p_post
38
Parameters and target confidence bounds
q̂ cannot be specified directly, but, for given m and q̲, q̄:

• Lem. 3 gives the probability, p_trial, of Θ_r(ω_m) ∈ [q̲, q̄] for given r
• hence SP(ω_r) must be solved N_trial times in order to ensure that q̂ ∈ [q̲, q̄] with a given prior probability
• choose r = r*, the maximizer of p_trial (possibly subject to r ≤ r_max), in order to minimize N_trial

Selection of m and q̲, q̄:

• choose m so that the posterior confidence bounds are met:
Φ(m(1 − ε) − ζ̄; m, 1 − ε − β/2) ≥ ½(1 + p_post)
Φ(m(1 − ε) − ζ; m, 1 − ε + β/2) ≤ ½(1 − p_post)
• Thm. 2 allows q̲, q̄ to be chosen so that the prior confidence bounds hold
39
Randomized algorithm with tight confidence bounds
Algorithm:
(i) Given bounds ε̲, ε̄, probabilities p_prior < p_post, and sample size m, determine

q̲ = min q subject to Φ(q − ζ̄; m, 1 − ε̄) ≥ ½(1 + p_post)
q̄ = max q subject to Φ(q − ζ; m, 1 − ε̲) ≤ ½(1 − p_post)
r* = arg max_r Σ_{q=q̲}^{q̄} C(m−r, q−r) min_{k∈[ζ, ζ̄]} B(m−q+k, q−k+1) / B(k, r−k+1)
p_trial = Σ_{q=q̲}^{q̄} C(m−r*, q−r*) min_{k∈[ζ, ζ̄]} B(m−q+k, q−k+1) / B(k, r*−k+1)
N_trial = ⌈ ln(1 − p_prior/p_post) / ln(1 − p_trial) ⌉
40
Randomized algorithm with tight confidence bounds
Algorithm (cont.):
(ii) Draw N_trial multisamples ω_m⁽ⁱ⁾, and for each i = 1, …, N_trial:
– compute x*(ω_r*⁽ⁱ⁾)
– count Θ_r*(ω_m⁽ⁱ⁾) = |{ δ ∈ ω_m⁽ⁱ⁾ : f(x*(ω_r*⁽ⁱ⁾), δ) ≤ 0 }|

(iii) Determine i* ∈ {1, …, N_trial}:
i* = arg min_{i ∈ {1,…,N_trial}} | ½(q̲ + q̄) − Θ_r*(ω_m⁽ⁱ⁾) |,
and return
the solution estimate, x̂ = x*(ω_r*⁽ⁱ*⁾)
the number of satisfied constraints, q̂ = Θ_r*(ω_m⁽ⁱ*⁾)
40
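A sketch of step (i) in Python (scipy assumed). The formulas follow the reconstruction above, so the outputs should be treated as indicative; with the Example I data quoted on the later slides (m = 10⁵, (ε̲, ε̄] = (0.19, 0.21], ζ = 2, ζ̄ = 5, p_prior = 0.9, p_post = 0.95), the resulting q̲, q̄ should land at or near the quoted 79257 and 80759, while r* and N_trial may differ slightly from the quoted 15 and 84.

```python
import numpy as np
from scipy.special import betaln, gammaln
from scipy.stats import binom

def log_comb(n, k):
    return gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)

def p_q(m, r, q, zeta, zeta_bar):
    # lower bound on P{ Theta_r(w_m) = q } from Lemma 3
    ks = np.arange(zeta, zeta_bar + 1)
    logs = log_comb(m - r, q - r) + betaln(m - q + ks, q - ks + 1) - betaln(ks, r - ks + 1)
    return np.exp(logs).min()

def design(m, eps_lo, eps_hi, zeta, zeta_bar, p_prior, p_post, r_grid):
    qs = np.arange(zeta_bar, m + 1)
    lo_ok = binom.cdf(qs - zeta_bar, m, 1 - eps_hi) >= 0.5 * (1 + p_post)
    hi_ok = binom.cdf(qs - zeta, m, 1 - eps_lo) <= 0.5 * (1 - p_post)
    q_lo, q_hi = int(qs[lo_ok][0]), int(qs[hi_ok][-1])
    def p_trial(r):
        return sum(p_q(m, r, q, zeta, zeta_bar) for q in range(q_lo, q_hi + 1))
    r_star = max(r_grid, key=p_trial)          # r restricted to r <= q_lo here
    pt = p_trial(r_star)
    n_trial = int(np.ceil(np.log(1 - p_prior / p_post) / np.log(1 - pt)))
    return q_lo, q_hi, r_star, pt, n_trial

# Example I parameters; r searched over a small illustrative grid
print(design(10**5, 0.19, 0.21, 2, 5, 0.9, 0.95, range(5, 31)))
```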
Randomized algorithm with tight confidence bounds
Proposition 3: x̂, q̂ satisfy the prior confidence bounds
P^{N_trial·m}{ V(x̂) ∈ (ε̲, ε̄] } ≥ p_prior
and the posterior confidence bounds
Φ(q − ζ̄; m, 1 − ε) ≤ P^{N_trial·m}{ V(x̂) ≤ ε | q̂ = q } ≤ Φ(q − ζ; m, 1 − ε).

The posterior bounds follow directly from Thm. 2.

The choice of q̲, q̄ gives P^m{ V(x̂) ∈ (ε̲, ε̄] | q̂ = q } ≥ p_post for all q ∈ [q̲, q̄]
⟹ P^{N_trial·m}{ V(x̂) ∈ (ε̲, ε̄] } ≥ p_post · P^{N_trial·m}{ q̂ ∈ [q̲, q̄] }
but the definition of i* implies P^{N_trial·m}{ q̂ ∈ [q̲, q̄] } ≥ 1 − (1 − p_trial)^{N_trial}
⟹ P^{N_trial·m}{ V(x̂) ∈ (ε̲, ε̄] } ≥ p_post (1 − (1 − p_trial)^{N_trial})
so the choice of N_trial ensures the prior bound
41
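A one-line numeric check of the last step, using the trial parameters quoted for Example I on the later slides (p_post = 0.95, p_trial = 0.0365, N_trial = 84):

```python
# p_post * (1 - (1 - p_trial)^N_trial) must be at least p_prior = 0.9
p_post, p_trial, n_trial = 0.95, 0.0365, 84
print(p_post * (1 - (1 - p_trial) ** n_trial))   # ~0.908 >= 0.9
```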
Randomized algorithm with tight confidence bounds
• Large m is computationally cheap (hence small β and p_post ≈ 1)
• p_trial is a tight bound on the probability of q̂ ∈ [q̲, q̄], and is exact if ζ = ζ̄
• The posterior confidence bounds are tight for all distributions, and are exact if ζ = ζ̄
• Repeated trials and empirical violation tests have been used before, e.g. Calafiore (2017), Chamanbaz et al. (2016)
• The lower posterior confidence bound derived here coincides with Calafiore (2017) for fully supported problems (ζ = ζ̄ = n), but our approach:
– shows that this bound is exact for fully supported problems
– provides tight bounds in all other cases
42
Randomized algorithm with tight confidence bounds
Variation of r* and N_trial with ζ, ζ̄, for m = 10⁵, ε̲ = 0.19, ε̄ = 0.21, β = 0.005, and p_post = 0.5(1 + p_prior):

(ζ, ζ̄)            (2, 5)          (7, 10)        (17, 20)      (47, 50)     (97, 100)
r*                 15              40             91            241          492
N_trial (p₁…p₄)    84 109 176 291  37 48 77 128   22 29 46 76   13 16 26 43  8 11 17 29

(ζ, ζ̄)            (1, 2)          (1, 5)            (1, 10)
r*                 5               12                22
N_trial (p₁…p₄)    96 125 200 331  189 246 396 655   1022 1329 2116 3465

where p_prior takes the values p₁ = 0.9, p₂ = 0.95, p₃ = 0.99, p₄ = 0.999
43
Example I: minimal bounding hypersphere
Find the smallest hypersphere in R⁴ containing δ ∼ N(0, I) with probability 0.8

Support dimension: ζ = 2, ζ̄ = 5

Prior bounds: V(x̂) ∈ (0.19, 0.21] with probability p_prior = 0.9
Posterior bounds: |V(x̂) − q̂/m| ≤ 0.005 with probability p_post = 0.95

Parameters: m = 10⁵, q̲ = 79257, q̄ = 80759, r* = 15, p_trial = 0.0365, N_trial = 84

Compare SP solved approximately using greedy discarding (assuming one constraint discarded per iteration), with m = 420, q = 336:
V(x*(ω*)) ∈ (0.173, 0.332] with probability 0.9
44
Example I: minimal bounding hypersphere
Posterior confidence bounds given q̂ ∈ [q̲, q̄]:
P{ V(x̂) ≤ ε | q̂ = q̲ } (blue),  P{ V(x̂) ≤ ε | q̂ = q̄ } (red)

[Plot: the bounds Φ(q − ζ̄; m, 1 − ε) and Φ(q − ζ; m, 1 − ε) vs ε ∈ [0.185, 0.215], for q = q̲ and q = q̄]
45
Example I: minimal bounding hypersphere
[Plot: probability vs proportion of sampled constraints satisfied, q̂/m ∈ [0.785, 0.815]: observed frequency (blue) against the worst-case bound (red)]

– Empirical probability distribution of q̂ (blue) for 1000 repetitions
– Worst-case bound F (red): P{ |q̂/m − 0.8| ≤ t } ≥ F(0.8 + t) − F(0.8 − t)
46
Example II: finite horizon robust control design
Control problem (from Calafiore 2017):

Determine a control sequence to drive the system state to a target over a finite horizon with high probability, despite model uncertainty

Discrete-time uncertain linear system:
z(t+1) = A(δ)z(t) + Bu(t),  A(δ) = A₀ + Σ_{i,j=1}^{n_z} δ_{i,j} e_i e_jᵀ
δ_{i,j} ∈ [−ρ, ρ] uniformly distributed
47
Example II: finite horizon robust control design
Control objective:

CCP:  minimize_{γ, u}  γ  subject to  P{ ‖z_T − z(T, δ)‖² + ‖u‖² > γ } ≤ ε

with  u = (u(0), …, u(T−1)): predicted control sequence
      z(T, δ) = A(δ)^T z(0) + [A(δ)^{T−1}B ⋯ B] u
      z_T: target state

[Plot: a computed control input sequence u(t), t = 0, …, 10]
48
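A minimal simulation sketch of this uncertain model, assuming numpy; A₀, B, z(0), z_T and the candidate input sequence u are illustrative stand-ins rather than the data of Calafiore (2017):

```python
# Sample A(delta), roll the dynamics forward, and evaluate the cost
# ||z_T - z(T, delta)||^2 + ||u||^2 whose gamma-level set defines f <= 0.
import numpy as np

rng = np.random.default_rng(2)
nz, T, rho = 6, 10, 1e-3
A0 = np.diag(0.9 * np.ones(nz)) + np.diag(0.1 * np.ones(nz - 1), 1)
B = np.eye(nz, 1)                      # single input entering the first state
z0 = np.ones(nz)
z_target = np.zeros(nz)
u = 0.1 * rng.standard_normal(T)       # some fixed candidate input sequence

def cost(u, delta):
    A = A0 + delta                     # delta is the nz-by-nz perturbation
    z = z0.copy()
    for t in range(T):
        z = A @ z + (B * u[t]).ravel()
    return np.linalg.norm(z_target - z) ** 2 + np.linalg.norm(u) ** 2

# Monte Carlo: pick a trial gamma with empirical violation ~0.005
deltas = rng.uniform(-rho, rho, size=(5000, nz, nz))
vals = np.array([cost(u, d) for d in deltas])
gamma = np.quantile(vals, 0.995)
print(gamma, np.mean(vals > gamma))
```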
Example II: finite horizon robust control design
Problem dimensions: n_z = 6, T = 10, n = 11, ζ = 1, ζ̄ = 3, with parameter ρ = 10⁻³

Single-sided prior probability bounds with ε̄ = 0.005:
P{ V(x̂) ∈ (0, ε̄] } ≥ p_prior = 0.99

Posterior uncertainty β = 0.003 and probability p_post = 1 − 10⁻¹²  ⟹  m = 63 × 10³

Choose the number of sampled constraints in each trial to maximize p_trial:
r* = 2922, p_trial = 0.482, N_trial = 9
(if r* is limited to 2000, we get p_trial = 0.414, N_trial = 9)
49
Example II: finite horizon robust control design
Compare the number of trials needed for q̂ ∈ [q̲, q̄]:
expected using the bounds of Calafiore (2017): 9.6
expected using the proposed approach (1/p_trial): 2.4
observed: 1.4

Residual conservatism is due to the use of the bounds ζ = 1, ζ̄ = 3 rather than the actual distribution of the support dimension:

support dimension:    1    2    3
observed frequency:   5%   59%  36%
50
Future directions
• Alternative methods for randomly selecting level-q solutions

• Methods of checking feasibility of suboptimal points

• Estimation of violation probability over a set of parameters, with application to computing invariant sets for constrained dynamics
[Figure: a set V computed by optimization, samples drawn from its distribution, and 100 convex hulls of random selections of N = 11 samples, each supporting confidence statements at level 0.999]
• Application to chance-constrained energy management in PHEVs
51
Conclusions
• Solutions of sampled convex optimization problems with randomised sample selection strategies have tighter confidence bounds than optimal sample discarding strategies

• Randomised sample selection can be performed by solving a robust sampled optimization and counting violations of additional sampled constraints

• We propose a randomised algorithm for chance-constrained convex programming with tight a priori and a posteriori confidence bounds
52
Thank you for your attention!
Acknowledgements: James Fleming, Matthias Lorenzen, Johannes Buerger, Rainer Manuel Schaich . . .
53
Research directions
• Optimal control of uncertain systems
– chance-constrained optimization
– stochastic predictive control

• Robust control design
– invariant sets and uncertainty parameterization

• Applications:
– energy management in hybrid electric vehicles (BMW)
– ecosystem-based fisheries management (BAS)
54