Data-Driven Optimization in Power Systems
Andrea Simonetto, IBM Research
DTU Summer School, June 18, 2019



What's in here

- Time-varying optimization
- Optimal power flow problems (that change over time)
- Regularization (of optimization problems)
- Measurement feedback in optimization (for cyber-physical systems)

Motivation: ...

Recent work with a few researchers, e.g., Dr. Emiliano Dall'Anese:
- E. Dall'Anese, A. Simonetto, arXiv: 1601.07263
- A. Simonetto, arXiv: 1807.07032
Our websites, ... [mailto: [email protected]]


Outline

Basics
- Optimization problems, gradient descent, regularization
- Lagrangian formalism, saddle-point method, convergence

Advanced
- Time-varying optimization ideas, online gradient descent, online saddle-point
- Measurement feedback I: idea, derivation, insights
- Optimal power flow pursuit: formulation, linear approximation
- Measurement feedback II: application to OPF, convergence and numerical results

Take-home messages

Optimization problems

We start with convex optimization problems of the form:

  minimize_{x ∈ ℝⁿ} f(x), subject to: g(x) ≤ 0.

Imagine g(x) is not there; then the gradient method is

  x_k = x_{k−1} − α ∇_x f(x_{k−1}).

If f is strongly smooth, then f(x_k) − f* ≤ O(1/k) (and O(1/k²) for the fast gradient method).
If f is strongly convex and strongly smooth, convergence is Q-linear, O(ϱ^k):

  ‖x_k − x*‖ ≤ ϱ ‖x_{k−1} − x*‖, ϱ < 1 (for α < 2/L).

But the world isn't always nice...
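As a minimal numerical sketch of this Q-linear rate, here is gradient descent on a synthetic strongly convex quadratic; the problem data and stepsize are illustrative choices, not from the slides.

```python
# Minimal sketch: gradient descent on f(x) = 0.5 x'Qx - b'x,
# a strongly convex quadratic (illustrative data, not from the deck).
import numpy as np

Q = np.array([[3.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
b = np.array([1.0, -2.0])
L = np.linalg.eigvalsh(Q).max()          # smoothness constant of f
alpha = 1.0 / L                          # any alpha < 2/L gives the Q-linear rate

x = np.zeros(2)
x_star = np.linalg.solve(Q, b)           # exact minimizer, for reference
for k in range(50):
    grad = Q @ x - b                     # gradient of the quadratic
    x = x - alpha * grad
print(np.linalg.norm(x - x_star))        # error contracts like rho**k, rho < 1
```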


Regularization can do a lot for you

Idea: regularize the problem, solve it fast, then tune the regularization, then ...

See, e.g., F. Glineur, Yu. Nesterov, ...; L. Stella, A. Themelis, P. Patrinos, arXiv: 1604.08096

Moreau envelope (for non-smooth f):

  f(x) → inf_{z ∈ ℝⁿ} { f(z) + (1/2γ) ‖z − x‖² }, γ > 0

Plain-vanilla regularization (for non-strongly-convex f):

  f(x) → f(x) + (ε_t/2) ‖x‖²
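As a concrete illustration of the Moreau envelope (the choice f(x) = |x| and all constants are mine, not the slides'): the envelope is (1/γ)-smooth and its gradient is available through the proximal operator, so plain gradient descent applies to a non-smooth objective.

```python
# Minimal sketch: Moreau envelope of f(x) = |x| (illustrative choice).
# The prox of gamma*|.| is soft-thresholding, so the envelope gradient
# has a closed form: (x - prox(x)) / gamma.
import numpy as np

def prox_abs(x, gamma):
    # prox_{gamma f}(x) for f = |.|: soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def moreau_grad(x, gamma):
    # gradient of the (1/gamma)-smooth envelope
    return (x - prox_abs(x, gamma)) / gamma

x, gamma, alpha = 5.0, 0.5, 0.4
for k in range(100):
    x = x - alpha * moreau_grad(x, gamma)   # plain gradient descent on the envelope
print(x)  # approaches 0, the minimizer of |x|
```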


Constrained optimization problems

  minimize_{x ∈ ℝⁿ} f(x), subject to: g(x) ≤ 0.

Lagrangian formalism:

  L(x, µ) := f(x) + µᵀ g(x),  x ∈ ℝⁿ, µ ∈ ℝ₊^q

Saddle-point method:

  x_k = x_{k−1} − α ∇_x L(x_{k−1}, µ_{k−1}) = x_{k−1} − α [∇_x f(x_{k−1}) + ∇_x g(x_{k−1})ᵀ µ_{k−1}]
  µ_k = Π_{ℝ₊^q}[µ_{k−1} + α ∇_µ L(x_{k−1}, µ_{k−1})] = Π_{ℝ₊^q}[µ_{k−1} + α g(x_{k−1})]
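A minimal sketch of this gradient descent-ascent iteration on an illustrative toy problem, min 0.5‖x − c‖² subject to sum(x) − 1 ≤ 0; the data and stepsize are stand-ins, not from the slides.

```python
# Minimal sketch: projected gradient descent-ascent (saddle-point method)
# on min 0.5||x - c||^2 s.t. sum(x) - 1 <= 0 (illustrative problem).
import numpy as np

c = np.array([2.0, 1.0])
alpha = 0.05
x = np.zeros(2)
mu = 0.0                                  # multiplier of the single inequality

for k in range(2000):
    g = x.sum() - 1.0                     # constraint value g(x)
    grad_x = (x - c) + mu * np.ones(2)    # grad f + (grad g)^T mu
    x = x - alpha * grad_x                # primal descent step
    mu = max(0.0, mu + alpha * g)         # dual ascent step, projected onto R_+

print(x, mu)  # approaches the KKT point x = (1, 0), mu = 1
```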


Saddle-point convergence

For the saddle-point method above, convergence is linear (under Slater's condition plus strong convexity/concavity and strong smoothness), for small stepsizes. Specifically, with z := [xᵀ, µᵀ]ᵀ,

  ‖z_k − z*‖ ≤ ϱ ‖z_{k−1} − z*‖, ϱ < 1 (for α < 2m/L²).

In general, these assumptions do not hold, unless... double regularization:

  L(x, µ) → L(x, µ) + (ε_t/2) ‖x‖² − (ν_t/2) ‖µ‖²

What does ν_t imply?
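One way to see what ν_t does, sketched on the same toy problem as above: the −(ν/2)‖µ‖² term makes the dual update contractive (restoring the convergence assumptions), at the price of a small bias in constraint satisfaction that vanishes as ε, ν → 0. A hedged sketch, with illustrative constants:

```python
# Minimal sketch: doubly regularized saddle-point step on the toy problem.
# eps strongly convexifies the primal, nu strongly concavifies the dual;
# nu > 0 shrinks the multiplier, trading exact feasibility for contraction.
import numpy as np

c = np.array([2.0, 1.0])
alpha, eps, nu = 0.05, 0.01, 0.1
x, mu = np.zeros(2), 0.0

for k in range(2000):
    g = x.sum() - 1.0
    x = x - alpha * ((x - c) + mu * np.ones(2) + eps * x)
    mu = max(0.0, mu + alpha * (g - nu * mu))   # -nu*mu is the dual regularization

print(x, mu)  # slightly perturbed KKT point; bias vanishes as eps, nu -> 0
```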


Outline

Basics
- Optimization problems, gradient descent, regularization
- Lagrangian formalism, saddle-point method, convergence

Advanced
- Time-varying optimization ideas, online gradient descent, online saddle-point
- Measurement feedback I: idea, derivation, insights
- Optimal power flow pursuit: formulation, linear approximation
- Measurement feedback II: application to OPF, convergence and numerical results

Take-home messages

Time-varying optimization

We now consider convex optimization problems of the form:

  minimize_{x ∈ ℝⁿ} f(x; t), subject to: g(x; t) ≤ 0.

Imagine g(x; t) is not there; then the online gradient method is

  x_k = x_{k−1} − α ∇_x f(x_{k−1}; t_k).

If f is strongly convex and strongly smooth, and the optimizers drift by at most ‖x*_k − x*_{k−1}‖ ≤ δ, then

  ‖x_k − x*_k‖ ≤ ϱ (‖x_{k−1} − x*_{k−1}‖ + δ), ϱ < 1 (for α < 2/L).

So: convergence to an error ball of size O(δ).
Plenty of works: correction-only, prediction-correction, etc.; see A. Simonetto, arXiv: 1807.07032.
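A minimal sketch of online gradient descent tracking a drifting optimizer; the time-varying cost f(x; t) = 0.5‖x − c(t)‖² and its drift are illustrative stand-ins.

```python
# Minimal sketch: online gradient descent tracking a drifting optimum.
# f(x; t) = 0.5 ||x - c(t)||^2 with c(t) moving slowly (illustrative).
import numpy as np

alpha = 0.5
x = np.zeros(2)
errs = []
for k in range(200):
    t = 0.01 * k
    c_t = np.array([np.cos(t), np.sin(t)])   # drifting optimizer x*_k = c(t_k)
    x = x - alpha * (x - c_t)                # one gradient step per time sample
    errs.append(np.linalg.norm(x - c_t))
print(max(errs[50:]))  # settles into an O(delta) tracking error ball
```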


Constrained case

The time-varying constrained case works similarly, with a doubly regularized Lagrangian.

[Figure: relative error |f_t(z_t) − f*_t| / |f*_t| versus time index (log scale), comparing (η = 0, ν = ε = 0), (η = 0, ν = 0.1, ε = 0.01), (η = 0.5, ν = ε = 0), and (η = 0.5, ν = 0.1, ε = 0.01).]

Physics is not convex

Many cyber-physical systems have the following structure:

  minimize_{x ∈ ℝⁿ, y ∈ ℝˡ}  f(x; t) + h(y; t)
  subject to:  y = M(x)   (Physics)
               x ∈ X      (Engineering)

Can we linearize?
Feed-forward (open loop): y = M(x) ≈ Ax + b
Feedback: y = M(x) + ω

  x_k = Π_X[x_{k−1} − α ∇_x f(x_{k−1}; t) − α (∇_x y ∘ ∇_y h)(x_{k−1})]
      ≈ Π_X[x_{k−1} − α ∇_x f(x_{k−1}; t) − α Aᵀ ∇_y h(A x_{k−1} + b)]
      ≈ Π_X[x_{k−1} − α ∇_x f(x_{k−1}; t) − α Aᵀ ∇_y h(y_{k−1})]

(In the last line, the measured output y_{k−1} replaces the model prediction A x_{k−1} + b.)


Physics is not convex (cont.)

Let the physics do the job for you: close the loop, feeding the measured y back into the first-order algorithm.

[Diagram: the first-order algorithm sends x to the physical system and receives the measurement y in return.]

E. Dall'Anese, A. Simonetto, arXiv: 1601.07263
A. Bernstein, E. Dall'Anese, A. Simonetto, arXiv: 1804.05159
M. Colombino, J. Simpson-Porco, A. Bernstein, arXiv: 1905.07363
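A minimal sketch of the feedback-based step under stated assumptions: the map M below, its linearization A, and the costs are invented stand-ins. The point is that the update uses the measured output y = M(x) while only the model's Aᵀ enters the gradient.

```python
# Minimal sketch: feedback-based gradient step. The true map M is treated
# as unknown at run time; the algorithm only uses the linear model's A^T
# and the *measured* output. All functions/data are illustrative stand-ins.
import numpy as np

A = np.array([[1.0, 0.2], [0.0, 0.8]])       # linearized model of M
def M(x):                                     # "physics": the true nonlinear map
    return A @ x + 0.05 * np.tanh(x)          # model mismatch + mild nonlinearity

y_ref = np.array([0.5, -0.3])
alpha = 0.2
x = np.zeros(2)
for k in range(300):
    y_meas = M(x)                             # measurement from the plant
    grad_h = y_meas - y_ref                   # grad of h(y) = 0.5||y - y_ref||^2
    x = x - alpha * (0.1 * x + A.T @ grad_h)  # f(x) = 0.05||x||^2; here X = R^n
print(x, M(x))  # steers the measured output toward y_ref despite mismatch
```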


Optimal power flow: basics

Setting (high level): a distribution feeder on a graph N with nodes n = 1, ..., N:
- voltages V^k_n ∈ ℂ and currents I^k_n ∈ ℂ
- loads P^k_{ℓ,n} ∈ ℝ, Q^k_{ℓ,n} ∈ ℝ
- v^k := [V^k_1, ..., V^k_N], i^k := [I^k_1, ..., I^k_N]

For the controllable nodes n ∈ G (e.g., PV inverters): setpoints P^k_n ∈ ℝ, Q^k_n ∈ ℝ.

[Diagram: a feeder with nodes 1, ..., n, some hosting PV units.]

The AC OPF problem at time k:

  (OPF^k)  minimize_{v, i, {P_i, Q_i}_{i∈G}}  h^k({V_i}_{i∈N}) + Σ_{i∈G} f^k_i(P_i, Q_i)

  subject to:
    [I^k_0; i^k] = Y^k_net [V^k_0; v^k],   Y^k_net := [ y^k_00  (y^k)ᵀ ; y^k  Y^k ]
    V_i I_i* = P_i − P^k_{ℓ,i} + j(Q_i − Q^k_{ℓ,i}),  ∀ i ∈ G
    V_n I_n* = −P^k_{ℓ,n} − j Q^k_{ℓ,n},  ∀ n ∈ N\G
    V_min ≤ |V_i| ≤ V_max,  ∀ i ∈ M
    (P_i, Q_i) ∈ Y^k_i,  ∀ i ∈ G

Optimal power flow: linearization

The power-flow constraints

  V_i I_i* = P_i − P^k_{ℓ,i} + j(Q_i − Q^k_{ℓ,i}),  ∀ i ∈ G
  V_n I_n* = −P^k_{ℓ,n} − j Q^k_{ℓ,n},  ∀ n ∈ N\G

are nonconvex. Set:
- s: vector collecting the net injected powers, with p = ℜ(s), q = ℑ(s)
- ρ := [|V_1|, ..., |V_N|]ᵀ ∈ ℝᴺ

Linearizing the power-flow relations as

  v ≈ H p + J q + b
  ρ ≈ R p + B q + a

we can write, e.g.: V_min 1_N ≤ R p + B q + a ≤ V_max 1_N   (Measurement)
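A small sketch of the linearized voltage-magnitude model; the sensitivity matrices R, B and the profile a below are random stand-ins, not actual feeder data, just to show how the box constraint is evaluated.

```python
# Minimal sketch: rho ≈ R p + B q + a with invented sensitivities,
# checking the voltage box constraint V_min <= rho <= V_max.
import numpy as np

rng = np.random.default_rng(0)
N = 5
R = 0.005 * rng.random((N, N))      # d|V|/dp sensitivities (stand-ins)
B = 0.005 * rng.random((N, N))      # d|V|/dq sensitivities (stand-ins)
a = np.ones(N)                      # no-injection voltage profile (p.u.)

p = rng.uniform(-1, 1, N)           # net active injections
q = rng.uniform(-0.5, 0.5, N)       # net reactive injections
rho = R @ p + B @ q + a             # linearized voltage magnitudes
print(np.all((0.95 <= rho) & (rho <= 1.05)))  # V_min/V_max check
```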


Optimal power flow: formulation

  (R-OPF^k)  minimize_{{u_i}_{i∈G}}  Σ_{i∈G} f̃^k_i(u_i)

  subject to:
    g̲^k_n({u_i}_{i∈G}) ≤ 0,  ∀ n ∈ M
    ḡ^k_n({u_i}_{i∈G}) ≤ 0,  ∀ n ∈ M
    u_i ∈ Y^k_i,  ∀ i ∈ G

where u_i := [P_i, Q_i]ᵀ, f̃^k_i(u_i) := f^k_i(u_i) + h^k_i(u_i), and

  g̲^k_n({u_i}_{i∈G}) := V_min − c^k_n − Σ_{i∈G} [r^k_{n,i}(P_i − P^k_{ℓ,i}) + b^k_{n,i}(Q_i − Q^k_{ℓ,i})]   (undervoltage)
  ḡ^k_n({u_i}_{i∈G}) := Σ_{i∈G} [r^k_{n,i}(P_i − P^k_{ℓ,i}) + b^k_{n,i}(Q_i − Q^k_{ℓ,i})] + c^k_n − V_max   (overvoltage)
  Y^k := Y^k_1 × ... × Y^k_{N_G}

Optimal power flow: saddle-point problem

  L^k(u, γ, µ) := Σ_{i∈G} [ f̃^k_i(P_i, Q_i) + (P_i − P^k_{ℓ,i})(r^k_i)ᵀ(µ − γ) + (Q_i − Q^k_{ℓ,i})(b^k_i)ᵀ(µ − γ) ]
                  + (c^k)ᵀ(µ − γ) + γᵀ 1_m V_min − µᵀ 1_m V_max

Double smoothing:

  L^k_{ν,ε}(u, γ, µ) := L^k(u, γ, µ) + (ν/2) ‖u‖₂² − (ε/2) (‖γ‖₂² + ‖µ‖₂²)


Saddle-point algorithm

From feed-forward:

  u^{k+1}_i = Π_{Y^k_i}{ u^k_i − α ∇_{u_i} L^k_{ν,ε}(u, γ, µ)|_{u^k_i, γ^k, µ^k} },  ∀ i ∈ G
  γ^{k+1}_n = Π_{ℝ₊}{ γ^k_n + α (g̲^k_n(u^k) − ε γ^k_n) },  ∀ n ∈ M
  µ^{k+1}_n = Π_{ℝ₊}{ µ^k_n + α (ḡ^k_n(u^k) − ε µ^k_n) },  ∀ n ∈ M

To feedback-based:

[S1] Collect voltage measurements {y^k_n}_{n∈M}.
[S2] For all n ∈ M, update the dual variables as
  γ^{k+1}_n = Π_{ℝ₊}{ γ^k_n + α (V_min − y^k_n − ε γ^k_n) }
  µ^{k+1}_n = Π_{ℝ₊}{ µ^k_n + α (y^k_n − V_max − ε µ^k_n) }
[S3] Update the power setpoints at each RES i ∈ G as
  u^{k+1}_i = Π_{Y^k_i}{ u^k_i − α ∇_{u_i} L^k_{ν,ε}(u, γ, µ)|_{u^k_i, γ^k, µ^k} }
and go to [S1].
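A minimal sketch of the measurement-driven dual updates [S1]-[S2]; the voltage "measurements" below are synthetic stand-ins, and only the dual half of the algorithm is shown.

```python
# Minimal sketch of the feedback-based dual update: measured voltages
# replace the modeled constraint values g(u). All data are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
M_nodes = 4
V_min, V_max = 0.95, 1.05
alpha, eps = 0.1, 0.01
gamma = np.zeros(M_nodes)   # multipliers for the undervoltage constraints
mu = np.zeros(M_nodes)      # multipliers for the overvoltage constraints

for k in range(100):
    y = 1.0 + 0.06 * rng.standard_normal(M_nodes)   # measured |V| (stand-in)
    gamma = np.maximum(0.0, gamma + alpha * (V_min - y - eps * gamma))
    mu = np.maximum(0.0, mu + alpha * (y - V_max - eps * mu))
    # [S3] would now take a projected primal step on L^k_{nu,eps} using gamma, mu

print(gamma, mu)  # multipliers activate only where limits are violated
```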


Convergence

Assumptions:
1. Cost function: convex and strongly smooth over Y^k with constant L.
2. Slater's condition holds.
3. There exist constants that upper-bound the variation in time of the problem:

   ‖u^{*,k+1} − u^{*,k}‖ ≤ σ_u,  |g̲^{k+1}_n(u^{*,k+1}) − g̲^k_n(u^{*,k})| ≤ σ_d,  |ḡ^{k+1}_n(u^{*,k+1}) − ḡ^k_n(u^{*,k})| ≤ σ_d,

   which together imply ‖z^{*,k+1} − z^{*,k}‖ ≤ σ_z.
4. There exist constants that upper-bound the linearization error:

   max{‖e^k_γ‖₂, ‖e^k_µ‖₂} ≤ e.


Convergence: result

Theorem. Consider the sequence {z^k} := {u^k, γ^k, µ^k} and let the assumptions hold. For fixed positive scalars ε, ν > 0, if the stepsize α > 0 is chosen such that

  ϱ(α) := √(1 − 2η̃α + α² L²_{ν,ε}) < 1,

that is, 0 < α < 2η̃ / L²_{ν,ε}, then the sequence {z^k} converges Q-linearly to z^{*,k} := {u^{*,k}, γ^{*,k}, µ^{*,k}}, up to the asymptotic error bound

  limsup_{k→∞} ‖z^k − z^{*,k}‖ = (1 / (1 − ϱ(α))) [√2 α e + σ_z],

where L_{ν,ε} := √((L + ν + 2G)² + 2(G + ε)²), G := max ‖∇g‖, and η̃ := min{ν, ε}.
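A small numeric sketch of the theorem's quantities; all constants (L, G, ν, ε, e, σ_z) below are illustrative stand-ins, not values from the deck.

```python
# Minimal sketch: evaluate the contraction factor rho(alpha) and the
# asymptotic error ball of the theorem for illustrative constants.
import math

L_f, G, nu, eps = 2.0, 1.0, 0.1, 0.01
L_ne = math.sqrt((L_f + nu + 2 * G) ** 2 + 2 * (G + eps) ** 2)
eta = min(nu, eps)                    # strong monotonicity from the smoothing

alpha = eta / L_ne**2                 # inside the admissible range (0, 2*eta/L_ne^2)
rho = math.sqrt(1 - 2 * eta * alpha + alpha**2 * L_ne**2)

e, sigma_z = 0.05, 0.02               # linearization error and drift (stand-ins)
bound = (math.sqrt(2) * alpha * e + sigma_z) / (1 - rho)
print(rho, bound)                     # rho < 1; bound = size of the error ball
```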

Simulations

- Real load and solar data from Anatolia, CA
- PQ setpoints of the inverters updated every 1 s
- HVAC controlled every 5 min
- Voltage regulation and power tracking

[Diagram: the 37-node distribution test feeder used in the simulations.]

[Figure: voltage magnitudes |V^t_n| at nodes 2, 28, and 35 over the day (6:00 to 20:00), shown on a 0.98-1.06 p.u. scale.]

Outline

Basics
- Optimization problems, gradient descent, regularization
- Lagrangian formalism, saddle-point method, convergence

Advanced
- Time-varying optimization ideas, online gradient descent, online saddle-point
- Measurement feedback I: idea, derivation, insights
- Optimal power flow pursuit: formulation, linear approximation
- Measurement feedback II: application to OPF, convergence and numerical results

Take-home messages

Take-home messages

- Time-varying optimization rocks!
- Regularizing time-varying problems is not a bad idea.
- Cyber-physical systems have a structure which allows you to use feedback!

Some extra useful literature:
- First-order algorithms I: Adrien Taylor, PhD thesis
- First-order algorithms II: E. K. Ryu, S. Boyd, A Primer on Monotone Operator Methods, 2016

Mailto: [email protected]
