High-Confidence Data-Driven Ambiguity Sets for Time-Varying Linear Systems∗

Dimitris Boskos†, Jorge Cortés‡, and Sonia Martínez‡

Abstract. This paper builds Wasserstein ambiguity sets for the unknown probability distribution of dynamic random variables leveraging noisy partial-state observations. The constructed ambiguity sets contain the true distribution of the data with quantifiable probability and can be exploited to formulate robust stochastic optimization problems with out-of-sample guarantees. We assume the random variable evolves in discrete time under uncertain initial conditions and dynamics, and that noisy partial measurements are available. All random elements have unknown probability distributions and we make inferences about the distribution of the state vector using several output samples from multiple realizations of the process. To this end, we leverage an observer to estimate the state of each independent realization and exploit the outcome to construct the ambiguity sets. We illustrate our results in an economic dispatch problem involving distributed energy resources over which the scheduler has no direct control.

Key words. Distributional uncertainty, Wasserstein ambiguity sets, stochastic systems, state estimation

AMS subject classifications. 62M20, 62G35, 90C15, 93C05, 93E10
1. Introduction. Decisions under uncertainty are ubiquitous in a wide range of engineering applications. Faced with complex systems that include components with probabilistic models, such decisions seek to provide rigorous solutions with quantifiable guarantees in hedging against uncertainty. In practice, the designer makes inferences about uncertain elements based on collected data and exploits them to formulate data-driven stochastic optimization problems. This decision-making paradigm has found applications in finance, communications, control, medicine, and machine learning. Recent research focuses on how to retain high-confidence guarantees for the optimization problems under plausible variations of the data. To this end, distributionally robust optimization (DRO) formulations evaluate the optimal worst-case performance over an ambiguity set of probability distributions that contains the true one with high confidence. Such ambiguity sets are typically constructed under the assumption that data are generated from a static distribution and can be measured in a direct manner. In this paper we significantly expand the class of scenarios for which reliable ambiguity sets can be constructed. We consider scenarios where the random variable is dynamic and partial measurements, corrupted by noise, are progressively collected from its evolving distribution. In our analysis, we exploit the underlying dynamics and study how the probabilistic properties of the noise affect the ambiguity set size while maintaining the same guarantees.

Literature review: Optimal decision problems in the face of uncertainty, like expected-cost minimization and chance-constrained optimization, are the cornerstones of stochastic programming [39]. Distributionally robust versions of stochastic optimization problems [2, 5, 38] carry out a worst-case optimization over all possibilities from an ambiguity set of probability
∗This work was supported by the DARPA Lagrange program through award N66001-18-2-4027. A preliminary version appeared as [7] at the American Control Conference.
†Delft Center for Systems and Control, Delft University of Technology ([email protected]).
‡Department of Mechanical and Aerospace Engineering, University of California, San Diego (cortes,[email protected]).
This manuscript is for review purposes only.
distributions. This is of particular importance in data-driven scenarios where the unknown distributions of the random variables are inferred in an approximate manner using a finite amount of data [3]. To hedge against this uncertainty, optimal transport ambiguity sets have emerged as a promising tool. These sets typically group all distributions up to some distance from the empirical approximation in the Wasserstein metric [42]. There are several reasons that make this metric a popular choice among the distances between probability distributions, particularly for data-driven problems. Most notably, the Wasserstein metric penalizes horizontal dislocations between distributions, provides ambiguity sets with finite-sample guarantees of containing the true distribution, and leads to tractable optimization problems. This has rendered the convergence of empirical measures in the Wasserstein distance an active research area [15, 16, 18, 24, 43, 44]. Towards the exploitation of Wasserstein ambiguity sets for DRO problems, the work [17] introduces tractable reformulations with finite-sample guarantees, further exploited in [11, 23] to deal with distributionally robust chance-constrained programs. The work [13] develops distributed optimization algorithms using Wasserstein balls, while optimal transport ambiguity sets have recently been connected to regularization for machine learning [4, 19, 36]. The paper [28] exploits Wasserstein balls to robustify data-driven online optimization algorithms, and [37] leverages them for the design of distributionally robust Kalman filters. Further applications of Wasserstein ambiguity sets include the synthesis of robust control policies for Markov decision processes [46] and their data-driven extensions [47], and regularization for stochastic predictive control algorithms [14]. Several recent works have also devoted attention to distributionally robust problems in power systems control, including optimal power flow [22, 25] and economic dispatch [31, 35, 45]. Time-varying aspects of Wasserstein ambiguity sets are considered in our previous work: in [26] for dynamic traffic models, in [27] for online learning of unknown dynamical environments, in [8], which constructs ambiguity balls using progressively assimilated dynamic data for processes with random initial conditions that evolve under deterministic dynamics, and in [10], which studies the propagation of ambiguity bands under hyperbolic PDE dynamics. In contrast, in the present work, the state distribution does not evolve deterministically due to the presence of random disturbances, which, together with output measurements corrupted by noise, generate additional stochastic elements that make the quantification of the ambiguity set guarantees challenging.
Statement of contributions: Our contributions revolve around building Wasserstein ambiguity sets with probabilistic guarantees for dynamic random variables when we have no knowledge of the probability distributions of their initial condition, the disturbances in their dynamics, and the measurement noise. To this end, our first contribution estimates the states of several process realizations from output samples and exploits these estimates to build a suitable empirical distribution as the center of an ambiguity ball. Our second contribution is the exploitation of concentration of measure results to quantify the radius of this ambiguity ball so that it provably contains the true state distribution with high probability. To achieve this, we break the radius into nominal and noise components. The nominal component captures the deviation between the true distribution and the empirical distribution formed by the state realizations. The noise component captures the deviation between the empirical distribution and the center of our ambiguity ball. To quantify the latter, we carefully evaluate the impact of the estimation error, which, due to the measurement noise, does not have a compactly
supported distribution like the internal uncertainty and requires a separate analysis. Our third contribution is the extension of these results to obtain simultaneous guarantees for ambiguity sets that are built along finite time horizons, instead of at isolated time instances. The fourth contribution is a generalization of a concentration inequality around the mean of sufficiently light-tailed independent random variables, which enables us to obtain tighter results when analyzing the effect of the estimation error. Our last contribution is the validation of the results in simulation for a distributionally robust economic dispatch problem, for which we further provide a tractable reformulation. We stress that our general objective revolves around the robust uncertainty quantification (i.e., distributional inference) problem at hand, without having DRO as a necessary end goal. Further, our approach is fundamentally different from classical Kalman filtering, where the initial state and dynamics noise distributions are known and Gaussian, and hence the state distribution over time is also a known Gaussian. Here, instead, we are interested in inferring the unknown state distribution from data collected from multiple realizations of the dynamics. For each such realization, we use an observer, since we have no concrete knowledge of the state and noise random models that would allow us to directly invoke optimal filtering techniques. In the online version [9] of this manuscript, we provide explicit constants for several of the presented concentration of measure inequalities which, to the best of our knowledge, have not been delineated in the literature. These results are not essential for keeping the theoretical presentation self-contained and are omitted due to space constraints.
2. Preliminaries. Here we present general notation and concepts from probability theory used throughout the paper.

Notation: We denote by $\|\cdot\|_p$ the $p$th norm in $\mathbb{R}^n$, $p \in [1,\infty]$, using also the notation $\|\cdot\| \equiv \|\cdot\|_2$ for the Euclidean norm. The inner product of two vectors $a, b \in \mathbb{R}^n$ is denoted by $\langle a, b \rangle$, and the Khatri-Rao product [32] of $a \equiv (a_1, \ldots, a_d) \in \mathbb{R}^d$ and $b \equiv (b_1, \ldots, b_d) \in \mathbb{R}^{dn}$, with each $b_i$ belonging to $\mathbb{R}^n$, is $a * b := (a_1 b_1, \ldots, a_d b_d) \in \mathbb{R}^{dn}$. We use the notation $B_p^n(\rho)$ for the ball of center zero and radius $\rho$ in $\mathbb{R}^n$ with the $p$th norm, and $[n_1 : n_2]$ for the set of integers $\{n_1, n_1 + 1, \ldots, n_2\} \subset \mathbb{N} \cup \{0\} =: \mathbb{N}_0$. The interpretation of a vector in $\mathbb{R}^n$ as an $n \times 1$ matrix should be clear from the context (this avoids writing double transposes). The diameter of a set $S \subset \mathbb{R}^n$ with the $p$th norm is defined as $\mathrm{diam}_p(S) := \sup\{\|x - y\|_p \mid x, y \in S\}$, and for $z \in \mathbb{R}^n$, $S + z := \{x + z \mid x \in S\}$. We denote the induced Euclidean norm of a matrix $A \in \mathbb{R}^{m \times n}$ by $\|A\| := \max_{\|x\|=1} \|Ax\|$. Given $B \subset \Omega$, $\mathbf{1}_B$ is the indicator function of $B$ on $\Omega$, with $\mathbf{1}_B(x) = 1$ for $x \in B$ and $\mathbf{1}_B(x) = 0$ for $x \notin B$.
Probability theory: We denote by $\mathcal{B}(\mathbb{R}^d)$ the Borel $\sigma$-algebra on $\mathbb{R}^d$, and by $\mathcal{P}(\mathbb{R}^d)$ the probability measures on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. For any $p \ge 1$, $\mathcal{P}_p(\mathbb{R}^d) := \{\mu \in \mathcal{P}(\mathbb{R}^d) \mid \int_{\mathbb{R}^d} \|x\|^p \, d\mu < \infty\}$ is the set of probability measures in $\mathcal{P}(\mathbb{R}^d)$ with finite $p$th moment. The Wasserstein distance between $\mu, \nu \in \mathcal{P}_p(\mathbb{R}^d)$ is

$$ W_p(\mu, \nu) := \Big( \inf_{\pi \in \mathcal{H}(\mu,\nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^p \, \pi(dx, dy) \Big)^{1/p}, $$

where $\mathcal{H}(\mu,\nu)$ is the set of all couplings between $\mu$ and $\nu$, i.e., probability measures on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$, respectively. For any $\mu \in \mathcal{P}(\mathbb{R}^d)$, its support is the closed set $\mathrm{supp}(\mu) := \{x \in \mathbb{R}^d \mid \mu(U) > 0 \text{ for each neighborhood } U \text{ of } x\}$, or equivalently, the smallest
closed set with measure one. For a random variable $X$ with distribution $\mu$ we also denote $\mathrm{supp}(X) \equiv \mathrm{supp}(\mu)$. We denote the product of the distributions $\mu$ on $\mathbb{R}^d$ and $\nu$ on $\mathbb{R}^r$ by the distribution $\mu \otimes \nu$ on $\mathbb{R}^d \times \mathbb{R}^r$. The convolution $\mu \star \nu$ of the distributions $\mu$ and $\nu$ on $\mathbb{R}^d$ is the image of the measure $\mu \otimes \nu$ on $\mathbb{R}^d \times \mathbb{R}^d$ under the mapping $(x, y) \mapsto x + y$; equivalently, $\mu \star \nu(B) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \mathbf{1}_B(x + y) \, \mu(dx) \, \nu(dy)$ for any $B \in \mathcal{B}(\mathbb{R}^d)$ (cf. [6, pages 207-208]). Given a measurable space $(\Omega, \mathcal{F})$, an exponent $p \ge 1$, the convex function $\mathbb{R} \ni x \mapsto \psi_p(x) := e^{x^p} - 1$, and the linear space of scalar random variables $L_{\psi_p} := \{X \mid \mathbb{E}[\psi_p(|X|/t)] < \infty \text{ for some } t > 0\}$ on $(\Omega, \mathcal{F})$, the $\psi_p$-Orlicz norm (cf. [41, Section 2.7.1]) of $X \in L_{\psi_p}$ is

$$ \|X\|_{\psi_p} := \inf\{t > 0 \mid \mathbb{E}[\psi_p(|X|/t)] \le 1\}. $$
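As a quick numerical illustration of this definition (ours, not part of the paper's development), the sketch below estimates the $\psi_p$-Orlicz norm of a random variable from samples: since $\psi_p(x) = e^{x^p} - 1$, the defining condition $\mathbb{E}[\psi_p(|X|/t)] \le 1$ is equivalent to $\mathbb{E}[\exp((|X|/t)^p)] \le 2$, and the infimum over $t$ can be found by bisection. The function name and sample choices are illustrative assumptions.

```python
import numpy as np

def orlicz_norm(samples, p=2, tol=1e-6):
    """Monte Carlo estimate of the psi_p-Orlicz norm: the smallest t with
    E[exp((|X|/t)^p) - 1] <= 1, approximated over the samples by bisection."""
    x = np.abs(np.asarray(samples, dtype=float))
    def ok(t):  # does t satisfy the (empirical) Orlicz condition?
        return np.mean(np.exp((x / t) ** p)) - 1.0 <= 1.0
    lo, hi = 1e-9, max(x.max(), 1e-9)
    while not ok(hi):            # grow the bracket until feasible
        hi *= 2.0
    while hi - lo > tol * hi:    # monotonicity in t allows bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

# Sanity check: for a constant |X| = c, the condition exp((c/t)^p) <= 2
# gives the exact value t = c / ln(2)^(1/p).
c, p = 1.0, 2
est = orlicz_norm(np.full(1000, c), p=p)
exact = c / np.log(2) ** (1 / p)
```

For $p = 2$ this recovers a sub-Gaussian-type scale parameter, consistent with the remark below that $L_{\psi_2}$ consists of sub-Gaussian variables.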
When $p = 1$ and $p = 2$, each random variable in $L_{\psi_p}$ is sub-exponential and sub-Gaussian, respectively. We also denote by $\|X\|_p \equiv \big(\mathbb{E}[|X|^p]\big)^{\frac{1}{p}}$ the norm of a scalar random variable with finite $p$th moment, i.e., the classical norm in $L_p(\Omega) \equiv L_p(\Omega; P_X)$, where $P_X$ is the distribution of $X$. The interpretation of $\|\cdot\|_p$ as the $p$th norm of a vector in $\mathbb{R}^n$ or a random variable in $L_p$ should be clear from the context throughout the paper. Given a set $\{X_i\}_{i \in I}$ of random variables, we denote by $\sigma(\{X_i\}_{i \in I})$ the $\sigma$-algebra generated by them. We conclude with a useful technical result, which follows from Fubini's theorem [1, Theorem 2.6.5].
Lemma 2.1 (Expectation inequality). Consider the independent random vectors $X$ and $Y$, taking values in $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively, and let $(x, y) \mapsto g(x, y)$ be integrable. Assume that $\mathbb{E}[g(x, Y)] \ge k(x)$ for some integrable function $k$ and all $x \in K$ with $\mathrm{supp}(X) \subset K \subset \mathbb{R}^{n_1}$. Then, $\mathbb{E}[g(X, Y)] \ge \mathbb{E}[k(X)]$.
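Before moving to the problem formulation, the following small sketch (ours, not from the paper) illustrates the Wasserstein distance of this section in the simplest setting: for two equal-size, equal-weight empirical distributions on the real line, the optimal coupling matches sorted atoms, so $W_p$ reduces to an $\ell_p$-type distance between order statistics.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p between two equal-size empirical distributions on the real line.
    In 1-D the optimal coupling pairs sorted atoms, so
    W_p^p = (1/N) * sum_i |x_(i) - y_(i)|^p."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "equal-size supports assumed in this sketch"
    return np.mean(np.abs(x - y) ** p) ** (1 / p)

# Translating an empirical distribution by c moves it exactly c in any W_p,
# illustrating how the metric penalizes horizontal dislocations.
x = np.array([0.0, 1.0, 2.0, 3.0])
dist = wasserstein_1d(x, x + 0.5, p=2)
```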
3. Problem formulation. Consider a stochastic optimization problem where the objective function $x \mapsto f(x, \xi)$ depends on a random variable $\xi$ with an unknown distribution $P_\xi$. To hedge against this uncertainty, rather than using the empirical distribution

$$ P_\xi^N := \frac{1}{N} \sum_{i=1}^N \delta_{\xi^i}, \tag{3.1} $$

formed by $N$ i.i.d. samples $\xi^1, \ldots, \xi^N$ of $P_\xi$ to optimize a sample average approximation of the expected value of $f$, one can instead consider the DRO problem

$$ \inf_{x \in \mathcal{X}} \sup_{P \in \mathcal{P}^N} \mathbb{E}_P[f(x, \xi)], \tag{3.2} $$

of evaluating the worst-case expectation over some ambiguity set $\mathcal{P}^N$ of probability measures.
This helps the designer robustify the decision against plausible variations of the data, which can play a significant role when the number of samples is limited. Different approaches exist to construct the ambiguity set $\mathcal{P}^N$ so that it contains the true distribution $P_\xi$ with high confidence. We are interested in approaches that employ data, and in particular the empirical distribution $P_\xi^N$, to construct them. In the present setup, the data are generated by a dynamical system subject to disturbances, and we only collect partial (instead of full) measurements that are distorted by noise. Therefore, it is no longer obvious how to build a candidate state distribution as in (3.1) from the collected samples. Further, we seek to address
this in a distributionally robust way, i.e., by finding a suitable replacement $\hat{P}_\xi^N$ for (3.1), together with an associated ambiguity set, by exploiting the dynamics of the underlying process.
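To make the role of the ambiguity set in (3.2) concrete, the sketch below (ours, not from the paper) uses a standard result from the Wasserstein DRO literature: for an objective that is $L$-Lipschitz in $\xi$ and a type-1 Wasserstein ball of radius $\varepsilon$ over an unbounded support, the worst-case expectation equals the sample average plus $\varepsilon L$. This is a simplified stand-in for the general formulation; the paper's setting uses general $W_p$ balls.

```python
import numpy as np

# Sketch of the DRO idea in (3.2) for a 1-D W_1 ambiguity ball.
# Known duality result (assumed here): for f L-Lipschitz in xi,
#   sup_{P : W_1(P, P_hat) <= eps} E_P[f(x, xi)]
#     = E_{P_hat}[f(x, xi)] + eps * L,
# i.e., the sample average plus a radius-dependent robustness margin.

def saa_objective(x, xi_samples, f):
    """Sample average approximation of E[f(x, xi)]."""
    return np.mean([f(x, xi) for xi in xi_samples])

def dro_objective_w1(x, xi_samples, f, eps, lip):
    """Worst case over a W_1 ball of radius eps, for f lip-Lipschitz in xi."""
    return saa_objective(x, xi_samples, f) + eps * lip

xi = np.array([0.0, 1.0, 2.0, 3.0])
f = lambda x, z: abs(x - z)                      # 1-Lipschitz in its 2nd arg
sa = saa_objective(1.5, xi, f)                    # plain sample average
wc = dro_objective_w1(1.5, xi, f, eps=0.2, lip=1.0)  # robustified value
```

The gap `wc - sa` is exactly the radius times the Lipschitz constant, which is why quantifying a reliable ambiguity radius, as pursued in this paper, directly controls the conservativeness of the decision.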
To make things precise, consider data generated by the discrete-time system

$$ \xi_{k+1} = A_k \xi_k + G_k w_k, \quad \xi_k \in \mathbb{R}^d, \; w_k \in \mathbb{R}^q, \tag{3.3a} $$

with linear output

$$ \zeta_k = H_k \xi_k + v_k, \quad \zeta_k \in \mathbb{R}^r. \tag{3.3b} $$
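The following minimal sketch simulates one realization of (3.3a)-(3.3b). The specific matrices, horizon, and noise distributions are illustrative stand-ins (not from the paper); they are chosen to respect the structure used later: bounded (compactly supported) initial condition and dynamics noise, and a partial, noisy output.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, r, ell = 2, 1, 1, 20          # state, noise, output dims; horizon

# Illustrative time-varying system matrices (stand-ins).
def A(k): return np.array([[0.9, 0.1 * np.cos(0.1 * k)], [0.0, 0.8]])
def G(k): return np.array([[0.0], [1.0]])
def H(k): return np.array([[1.0, 0.0]])   # only the first state is measured

xi = rng.uniform(-1.0, 1.0, size=d)        # xi_0: compact support, cf. H3
outputs = []
for k in range(ell):
    v = rng.normal(scale=0.1, size=r)      # measurement noise v_k
    outputs.append(H(k) @ xi + v)          # zeta_k = H_k xi_k + v_k   (3.3b)
    w = rng.uniform(-0.5, 0.5, size=q)     # bounded dynamics noise w_k
    xi = A(k) @ xi + G(k) @ w              # xi_{k+1} = A_k xi_k + G_k w_k (3.3a)
```

Only the sequence `outputs` (and knowledge of the matrices) is available to the designer; the trajectory `xi` itself is never observed, which is precisely the difficulty the paper addresses.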
The initial condition $\xi_0$ and the noises $w_k$ and $v_k$, $k \in \mathbb{N}_0$, in the dynamics and the measurements, respectively, are random variables with unknown distributions. We seek to build an ambiguity set for the state distribution at a certain time $\ell \in \mathbb{N}$ by collecting data up to time $\ell$ from multiple independent realizations of the process, denoted $\xi^i$, $i \in [1:N]$. This can occur, for instance, when the same process is executed repeatedly, or in multi-agent scenarios where identical entities are subject to the same dynamics, see, e.g., [48]. The time-dependent matrices in the dynamics (3.3) widen the applicability of the results, since they can capture the linearization of nonlinear systems along trajectories or the sampled-data analogues of continuous-time systems under irregular sampling, even if the latter are linear and time invariant. To formally describe the problem, we consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ containing all random elements from these realizations, and make the following sampling assumption.
Assumption 3.1 (Sampling schedule). For each realization $i$ of system (3.3), output samples $\zeta_0^i, \ldots, \zeta_\ell^i$ are collected over the discrete time instants of the sampling horizon $[0:\ell]$.

According to this assumption, the measurements of all realizations are collected over the same time window $[0:\ell]$. To obtain quantifiable characterizations of the ambiguity sets, we require some further hypotheses on the classes of the distributions $P_{\xi_0}$ of the initial condition, $P_{w_k}$ of the dynamics noise, and $P_{v_k}$ of the measurement errors (cf. Figure 1). These assumptions are made for individual realizations and allow us to consider non-identical observation error distributions; in this way, we allow for the case where each realization is measured by a non-identical sensor of variable precision.
Assumption 3.2 (Distribution classes). Consider a finite sequence of realizations $\xi^i$, $i \in [1:N]$, of (3.3a) with associated outputs given by (3.3b) and noise elements $w_k^i$, $v_k^i$, $k \in \mathbb{N}_0$. We assume the following:

H1: The distributions $P_{\xi_0^i}$, $i \in [1:N]$, are identical; further, $P_{w_k^i}$, $i \in [1:N]$, are identical for all $k \in \mathbb{N}_0$.

H2: The sigma fields $\sigma(\{\xi_0^i\} \cup \{w_k^i\}_{k \in \mathbb{N}_0})$, $\sigma(\{v_k^i\}_{k \in \mathbb{N}_0})$, $i \in [1:N]$, are independent.

H3: The supports of the distributions $P_{\xi_0^i}$ and $P_{w_k^i}$, $k \in \mathbb{N}_0$, are compact, centered at the origin, and have diameters $2\rho_{\xi_0}$ and $2\rho_w$, respectively, for all $i$.

H4: The components of the random vectors $v_k^i$ have uniformly bounded $L_p$ and $\psi_p$-Orlicz norms, as follows:

$$ 0 < m_v \le \|v_{k,l}^i\|_p \le M_v, \qquad \|v_{k,l}^i\|_{\psi_p} \le C_v, $$

for all $k \in \mathbb{N}_0$, $i \in [1:N]$, and $l \in [1:r]$, where $p \ge 1$.
Remark 3.3 (Bounded $\psi_p$-Orlicz/$L_p$-norm ratio). By definition, $\psi_p$-Orlicz norms can become significantly larger than $L_p$ norms for random variables with heavier tails. Thus, over an infinite sequence of random variables $X_k$, the ratio $\|X_k\|_{\psi_p}/\|X_k\|_p$ may grow unbounded. We exclude this by assuming that $C_v$ and $m_v$ are either both positive or both zero; in the latter case, we set $C_v/m_v := 0$.
Figure 1. Illustration of the probabilistic models for the random variables in the dynamics and observations according to Assumption 3.2.
A direct approach that builds the ambiguity set using only the measurements of the trajectories at time $\ell$ would be severely limited, since the output map is in general not invertible. In that case, the inverse image of each measurement is the translation of a subspace, whose location is further obscured by the measurement noise. As a consequence, candidate states for a generated output sample may lie arbitrarily far apart, which could only be bounded by making additional hypotheses about the support of the state distribution. Instead, despite the lack of full-state information, we aim to leverage the system dynamics to estimate the state from the whole assimilated output trajectory. To guarantee some boundedness notion for the state estimation errors over arbitrary evolution horizons, we make the following assumption.
Assumption 3.4 (Detectability/uniform observability). System (3.3) satisfies one of the following properties:

(i) It is time invariant and the pair $(A, H)$ (with $A \equiv A_k$ and $H \equiv H_k$) is detectable.

(ii) It is uniformly observable, i.e., for some $t \in \mathbb{N}$, the observability Gramian

$$ O_{k+t,k} := \sum_{i=k}^{k+t} \Phi_{i,k}^\top H_i^\top H_i \Phi_{i,k} $$

satisfies $O_{k+t,k} \succeq b I$ for certain $b > 0$ and all $k \in \mathbb{N}_0$, where we denote $\Phi_{k+s,k} := A_{k+s-1} \cdots A_{k+1} A_k$. Further, all system matrices are uniformly bounded, and the singular values of $A_k$ and the norms $\|H_k\|$ are uniformly bounded below.
Problem statement: Under Assumptions 3.1 and 3.2 on the measurements and distributions of $N$ realizations of system (3.3), we seek to construct an estimator $\hat{\xi}_\ell^i(\zeta_0^i, \ldots, \zeta_\ell^i)$ for the state of each realization and build an ambiguity set for the state distribution at time $\ell$ with
probabilistic guarantees. Further, under Assumption 3.4 on the system's detectability/uniform observability properties, we aim to characterize the effect of the estimation precision on the accuracy of the ambiguity sets.
We proceed to address the problem in Section 4 by exploiting a Luenberger observer to estimate the states from the collected data and using these estimates to replace the classical empirical distribution (3.1) in the construction of the ambiguity set. To obtain the probabilistic guarantees, we leverage concentration inequalities to bound the distance between the updated empirical distribution and the true state distribution with high confidence. To this end, we further quantify the increase of the ambiguity radius due to the noise. In Section 5, we also study the beneficial effect of detectability/uniform observability on the ambiguity radius for arbitrarily long evolution horizons.
4. State-estimator based ambiguity sets. We address here the question of how to construct an ambiguity set at a certain time instant $\ell$ when samples are collected from (3.3) according to Assumption 3.1. If we had access to $N$ independent full-state samples $\xi_\ell^1, \ldots, \xi_\ell^N$ from the distribution of $\xi$ at $\ell$, we could construct an ambiguity ball in the Wasserstein metric $W_p$, centered at the empirical distribution (3.1) with $\xi^i \equiv \xi_\ell^i$, that contains the true distribution with high confidence. In particular, for any confidence $1 - \beta > 0$, it is possible, cf. [17, Theorem 3.5], to specify an ambiguity ball radius $\varepsilon_N(\beta)$ so that the true distribution of $\xi_\ell$ is in this ball with confidence $1 - \beta$, i.e.,

$$ \mathbb{P}(W_p(P_{\xi_\ell}^N, P_{\xi_\ell}) \le \varepsilon_N(\beta)) \ge 1 - \beta. $$

Instead, since we can only collect noisy partial measurements of the state, we use a Luenberger observer to estimate $\xi$ at time $\ell$. The dynamics of the observer, initialized at zero, is given by

$$ \hat{\xi}_{k+1} = A_k \hat{\xi}_k + K_k (H_k \hat{\xi}_k - \zeta_k), \quad \hat{\xi}_0 = 0, \tag{4.1} $$
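The sketch below implements the observer update (4.1) for an illustrative LTI instance of (3.3); the matrices $A$, $H$, and the gain $K$ are stand-in choices (ours), with $K$ picked so that $F = A + KH$ is Schur stable. Noise is omitted so that the noise-free error dynamics $e_{k+1} = F e_k$ visibly drives the estimate to the true state.

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])
H = np.array([[1.0, 0.0]])
K = np.array([[-0.8], [-0.1]])           # stand-in gain: F = A + K H stable
F = A + K @ H
assert max(abs(np.linalg.eigvals(F))) < 1.0   # Schur stability check

rng = np.random.default_rng(1)
xi = rng.uniform(-1.0, 1.0, size=2)      # true state, unknown to the observer
xi_hat = np.zeros(2)                     # observer initialized at zero, cf. (4.1)
for k in range(200):
    zeta = H @ xi                        # noise-free output for clarity
    xi_hat = A @ xi_hat + K @ (H @ xi_hat - zeta)   # observer update (4.1)
    xi = A @ xi                          # noise-free dynamics (3.3a)

err = np.linalg.norm(xi - xi_hat)        # e_k = F^k e_0 -> 0
```

With the disturbances $w_k$ and $v_k$ present, the error no longer vanishes; its decomposition (4.3) below is exactly what the ambiguity-radius analysis quantifies.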
where each $K_k$ is a nonzero gain matrix. Using the corresponding estimates from system (4.1) for the independent realizations of (3.3a), we define the (dynamic) estimator-based empirical distribution

$$ \hat{P}_{\xi_k}^N := \frac{1}{N} \sum_{i=1}^N \delta_{\hat{\xi}_k^i}. \tag{4.2} $$

Denoting by $e_k := \xi_k - \hat{\xi}_k$ the error between (3.3a) and the observer (4.1), the error dynamics is $e_{k+1} = F_k e_k + G_k w_k + K_k v_k$, $e_0 = \xi_0$, where $F_k := A_k + K_k H_k$ and $\xi_0$ is the initial condition of (3.3a). In particular,

$$ e_k = \Psi_k \xi_0 + \sum_{\kappa=1}^{k} \big( \Psi_{k,k-\kappa+1} G_{k-\kappa} w_{k-\kappa} + \Psi_{k,k-\kappa+1} K_{k-\kappa} v_{k-\kappa} \big) \tag{4.3} $$
for all $k \ge 1$, where $\Psi_{k+s,k} := F_{k+s-1} \cdots F_{k+1} F_k$, $\Psi_{k,k} := I$, and $\Psi_k := \Psi_{k,0}$. To build the ambiguity set at time $\ell$, we set its center at the estimator-based empirical distribution $\hat{P}_{\xi_\ell}^N$ given by (4.2). In what follows, we leverage concentration of measure results to identify an ambiguity radius $\psi_N(\beta)$ so that the resulting Wasserstein ball contains the true distribution
with a given confidence $1 - \beta$. Note that even if a distributionally robust framework is not employed, replacing the empirical distribution by the estimator-based empirical distribution (4.2) no longer guarantees consistency, in the sense that the estimator-based empirical distribution does not necessarily converge (weakly) to the true distribution. Hence, there is also no indication that the solution to the associated estimator Sample Average Approximation (SAA) problem, i.e., to

$$ \inf_{x \in \mathcal{X}} \frac{1}{N} \sum_{i=1}^N f(x, \xi_\ell^i) $$

with $\xi_\ell^i$ replaced by $\hat{\xi}_\ell^i$, will be a consistent estimator of the solution to the nominal stochastic optimization problem. This is a fundamental limitation, justified by the fact that, in general, the estimation error depends on the state realization, i.e., it has a variable distribution when conditioned on the state and the internal noise, and so its effect cannot be easily reversed (this may only be possible in rather degenerate cases, e.g., when one has access to full-state samples and the measurement noise distribution is known).
Note that the random variable $\xi_k^i$ of a system realization at time $k$ is a function $\xi_k^i(\xi_0^i, \mathbf{w}_k^i)$ of the random initial condition $\xi_0^i$ and the dynamics noise $\mathbf{w}_k^i \equiv (w_0^i, \ldots, w_{k-1}^i)$. Analogously, the estimated state $\hat{\xi}_k^i$ of each observer realization is a random variable $\hat{\xi}_k^i(\xi_0^i, \mathbf{w}_k^i, \mathbf{v}_k^i)$ with additional randomness induced by the output noise $\mathbf{v}_k^i \equiv (v_0^i, \ldots, v_{k-1}^i)$. Using the compact notation $\boldsymbol{\xi}_0 \equiv (\xi_0^1, \ldots, \xi_0^N)$, $\mathbf{w}_k \equiv (\mathbf{w}_k^1, \ldots, \mathbf{w}_k^N)$, and $\mathbf{v}_k \equiv (\mathbf{v}_k^1, \ldots, \mathbf{v}_k^N)$ for the corresponding initial conditions, dynamics noise, and output noise of all realizations, respectively, we can denote the empirical and estimator-based empirical distributions at time $\ell$ as $P_{\xi_\ell}^N(\boldsymbol{\xi}_0, \mathbf{w}_\ell)$ and $\hat{P}_{\xi_\ell}^N(\boldsymbol{\xi}_0, \mathbf{w}_\ell, \mathbf{v}_\ell)$. If we view the initial conditions and the corresponding internal noise of the realizations $\xi^i$ over the whole time horizon as deterministic quantities, we use the alternative notation $P_{\xi_\ell}^N(z, \omega)$ and $\hat{P}_{\xi_\ell}^N(z, \omega, \mathbf{v}_\ell)$ for the corresponding distributions, where $z = (z^1, \ldots, z^N)$, $z^1 \equiv \xi_0^1, \ldots, z^N \equiv \xi_0^N$, and $\omega = (\omega^1, \ldots, \omega^N)$, $\omega^1 \equiv \mathbf{w}_\ell^1, \ldots, \omega^N \equiv \mathbf{w}_\ell^N$. We also denote by $P_{\xi_\ell}$ the true distribution of the data at discrete time $\ell$, where, from (3.3a),

$$ \xi_\ell = \Phi_\ell \xi_0 + \sum_{k=1}^{\ell} \Phi_{\ell,\ell-k+1} G_{\ell-k} w_{\ell-k}, \tag{4.4} $$
where $\Phi_\ell := \Phi_{\ell,0}$ and $\Phi_{\ell,\ell} := I$ (and with $\Phi_{k+s,k}$ defined in Assumption 3.4). Then, it follows from H1 and H2 in Assumption 3.2 that the random states $\xi_\ell^i$ of the system realizations are independent and identically distributed. Leveraging this, our goal is to associate to each confidence $1 - \beta$ an ambiguity radius $\psi_N(\beta)$ so that

$$ \mathbb{P}(W_p(\hat{P}_{\xi_\ell}^N, P_{\xi_\ell}) \le \psi_N(\beta)) \ge 1 - \beta. \tag{4.5} $$

To achieve this, we decompose the confidence as the product of two factors:

$$ 1 - \beta = (1 - \beta_{\mathrm{nom}})(1 - \beta_{\mathrm{ns}}). \tag{4.6} $$
The first factor (the nominal component "nom") is exploited to control the Wasserstein distance between the empirical distribution and the true state distribution $P_{\xi_\ell}$. The purpose of
the second factor (the noise component "ns") is to bound the Wasserstein distance between the true and the estimator-based empirical distributions, which is affected by the measurement noise. Using this decomposition, our strategy to obtain (4.5) builds on further breaking the ambiguity radius as

$$ \psi_N(\beta) := \varepsilon_N(\beta_{\mathrm{nom}}) + \varepsilon_N(\beta_{\mathrm{ns}}). \tag{4.7} $$

We exploit what is known [8] for the noise-free case to bound the nominal ambiguity radius $\varepsilon_N(\beta_{\mathrm{nom}})$ with confidence $1 - \beta_{\mathrm{nom}}$. Moreover, we bound the noise ambiguity radius $\varepsilon_N(\beta_{\mathrm{ns}})$ with confidence $1 - \beta_{\mathrm{ns}}$. This latter radius corresponds to the impact of the internal and measurement noise on the distributional uncertainty. In the next sections we present the precise individual bounds for these terms and then combine them to obtain the overall ambiguity radius.
4.1. Nominal ambiguity radius. According to Assumption 3.2, the initial condition and internal noise distributions are compactly supported, and hence the same holds for the state distribution over time. We will therefore use the following result, which focuses on compactly supported distributions and bounds the distance between the true and empirical distributions for any fixed confidence level.
Proposition 4.1 (Nominal ambiguity radius [8, Corollary 3.3]). Consider a sequence $\{X_i\}_{i \in \mathbb{N}}$ of i.i.d. $\mathbb{R}^d$-valued random variables with a compactly supported distribution $\mu$. Then for any $p \ge 1$, $N \ge 1$, and confidence $1 - \beta$ with $\beta \in (0,1)$, we have $\mathbb{P}(W_p(\mu^N, \mu) \le \varepsilon_N(\beta, \rho)) \ge 1 - \beta$, where

$$ \varepsilon_N(\beta, \rho) := \begin{cases} \Big( \dfrac{\ln(C\beta^{-1})}{c} \Big)^{\frac{1}{2p}} \dfrac{\rho}{N^{\frac{1}{2p}}}, & \text{if } p > d/2, \\[6pt] h^{-1}\Big( \dfrac{\ln(C\beta^{-1})}{cN} \Big)^{\frac{1}{p}} \rho, & \text{if } p = d/2, \\[6pt] \Big( \dfrac{\ln(C\beta^{-1})}{c} \Big)^{\frac{1}{d}} \dfrac{\rho}{N^{\frac{1}{d}}}, & \text{if } p < d/2, \end{cases} \tag{4.8} $$

$\mu^N := \frac{1}{N} \sum_{i=1}^N \delta_{X_i}$, $\rho := \frac{1}{2} \mathrm{diam}_\infty(\mathrm{supp}(\mu))$, $h(x) := \frac{x^2}{(\ln(2 + 1/x))^2}$, $x > 0$, and the constants $C$ and $c$ depend only on $p$ and $d$.
This result shows how the nominal ambiguity radius depends on the size of the distribution's support, the confidence level, and the number of samples, and is based on recent concentration of measure inequalities from [18]. The determination of the constants $C$ and $c$ in (4.8) for the whole spectrum of data dimensions $d$ and Wasserstein exponents $p$ is a particularly cumbersome task. Nevertheless, in the online version [9, Section 8.2], we provide some alternative concentration of measure results and use them to obtain explicit formulas for these constants when $d > 2p$.
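A small sketch (ours) of the branch $p > d/2$ of (4.8) illustrates the qualitative behavior of the nominal radius. Since the constants $C$ and $c$ are not given explicitly here, the values below are placeholders for illustration only.

```python
import numpy as np

def nominal_radius(N, beta, rho, p, d, C=1.0, c=1.0):
    """epsilon_N(beta, rho) from (4.8), branch p > d/2.
    C and c depend only on p and d; here they are illustrative placeholders."""
    assert p > d / 2, "this branch of (4.8) requires p > d/2"
    return (np.log(C / beta) / c) ** (1 / (2 * p)) * rho / N ** (1 / (2 * p))

# The radius shrinks with the sample size N and grows with the confidence
# 1 - beta and the support radius rho.
r_small_N = nominal_radius(N=100, beta=0.05, rho=1.0, p=2, d=2)
r_large_N = nominal_radius(N=10000, beta=0.05, rho=1.0, p=2, d=2)
```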
4.2. Noise ambiguity radius. In this section, we quantify the noise ambiguity radius $\varepsilon_N(\beta_{\mathrm{ns}})$ for any prescribed confidence $1 - \beta_{\mathrm{ns}}$. We first give a result that uniformly bounds the distance between the true and estimator-based empirical distributions with prescribed confidence for all values of the initial condition and the internal noise from the set $B_\infty^{Nd}(\rho_{\xi_0}) \times B_\infty^{N\ell q}(\rho_w)$, which contains the support of their joint distribution (and hence all their possible
realizations). For the results of this section, the initial condition and the internal noise are interpreted as deterministic quantities, as discussed above.
Lemma 4.2 (Distance between true and estimator-based empirical distributions). Let $(z, \omega) \in B_\infty^{Nd}(\rho_{\xi_0}) \times B_\infty^{N\ell q}(\rho_w)$ and consider the discrete distribution $P_{\xi_\ell}^N \equiv P_{\xi_\ell}^N(z, \omega)$ and the estimator-based empirical distribution $\hat{P}_{\xi_\ell}^N \equiv \hat{P}_{\xi_\ell}^N(z, \omega, \mathbf{v}_\ell)$, where $\mathbf{v}_\ell$ is the measurement noise of the realizations. Then,

$$ W_p(P_{\xi_\ell}^N, \hat{P}_{\xi_\ell}^N) \le 2^{\frac{p-1}{p}} M_w + 2^{\frac{p-1}{p}} \Big( \frac{1}{N} \sum_{i=1}^N (E^i)^p \Big)^{\frac{1}{p}}, \quad \text{where} \tag{4.9a} $$

$$ M_w := \sqrt{d}\, \|\Psi_\ell\| \rho_{\xi_0} + \sqrt{q} \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} G_{\ell-k}\| \rho_w, \tag{4.9b} $$

$$ E^i \equiv E(\mathbf{v}^i) := \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \, \|v_{\ell-k}^i\|_1. \tag{4.9c} $$
The next result gives bounds for the norms of the random variables $E^i$ in Lemma 4.2 (we write $\overline{M}_v$, $\overline{C}_v$, $\overline{m}_v$ to distinguish these aggregate bounds from the componentwise constants of H4).

Lemma 4.3 (Orlicz- and $L_p$-norm bounds for $E^i$). The random variables $E^i$ in (4.9c) satisfy

$$ \|E^i\|_p \le \overline{M}_v := M_v \, r \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|, \tag{4.10a} $$

$$ \|E^i\|_{\psi_p} \le \overline{C}_v := C_v \, r \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|, \tag{4.10b} $$

$$ \|E^i\|_p \ge \overline{m}_v := m_v \, r^{\frac{1}{p}} \Big( \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|^p \Big)^{\frac{1}{p}}, \tag{4.10c} $$

with $m_v$, $M_v$, and $C_v$ as given in H4.
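The constants in (4.9b) and (4.10) are computable from the system and observer matrices. The sketch below (ours) evaluates them for an illustrative LTI instance of (3.3), where $\Psi_{\ell,\ell-k+1} = F^{k-1}$ with $F = A + KH$; all matrix and constant values are stand-ins.

```python
import numpy as np

A = np.array([[0.9, 0.2], [0.0, 0.8]])
G = np.array([[0.0], [1.0]])
H = np.array([[1.0, 0.0]])
K = np.array([[-0.8], [-0.1]])
F = A + K @ H
d, q, r, ell = 2, 1, 1, 10
rho_xi0, rho_w = 1.0, 0.5                # support radii from H3 (stand-ins)
m_v, M_v, C_v = 0.05, 0.1, 0.2           # norm bounds from H4 (stand-ins)
p = 2

def Psi(a, b):                           # Psi_{a,b} = F^(a-b) in the LTI case
    return np.linalg.matrix_power(F, a - b)

spec = lambda M: np.linalg.norm(M, 2)    # induced Euclidean (spectral) norm

S_G = sum(spec(Psi(ell, ell - k + 1) @ G) for k in range(1, ell + 1))
S_K = [spec(Psi(ell, ell - k + 1) @ K) for k in range(1, ell + 1)]

Mw = np.sqrt(d) * spec(Psi(ell, 0)) * rho_xi0 + np.sqrt(q) * S_G * rho_w  # (4.9b)
Mv_bar = M_v * r * sum(S_K)                                               # (4.10a)
Cv_bar = C_v * r * sum(S_K)                                               # (4.10b)
mv_bar = m_v * r ** (1 / p) * sum(s ** p for s in S_K) ** (1 / p)         # (4.10c)
```

Note that $\overline{m}_v \le \overline{M}_v$ always holds, since $m_v \le M_v$, $r^{1/p} \le r$, and $(\sum_k s_k^p)^{1/p} \le \sum_k s_k$ for $p \ge 1$.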
The proofs of both results above are given in the Appendix. To bound the term $\big( \frac{1}{N} \sum_{i=1}^N (E^i)^p \big)^{\frac{1}{p}}$ and control the Wasserstein distance between the true and the estimator-based empirical distributions, we further rely on the following concentration of measure result around the mean of nonnegative independent random variables, whose proof is also in the Appendix.
Proposition 4.4 (Concentration around the $p$th mean). Let $X_1, \ldots, X_N$ be scalar, nonnegative, independent random variables with finite $\psi_p$ norm and $\mathbb{E}[X_i^p] = 1$. Then,

$$ \mathbb{P}\Big( \Big( \frac{1}{N} \sum_{i=1}^N X_i^p \Big)^{\frac{1}{p}} - 1 \ge t \Big) \le 2 \exp\Big( -\frac{c' N}{R^2} \alpha_p(t) \Big), \tag{4.11} $$

for every $t \ge 0$, with $c' = 1/10$, $R := \max_{i \in [1:N]} \|X_i\|_{\psi_p} + 1/\ln 2$, and

$$ \alpha_p(s) := \begin{cases} s^2, & \text{if } s \in [0,1], \\ s^p, & \text{if } s \in (1,\infty). \end{cases} \tag{4.12} $$
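Since $\alpha_p$ in (4.12) is increasing with $\alpha_p(1) = 1$ at the breakpoint, its inverse, which appears in the noise radius (4.13) below, has the same piecewise form. A small sketch (ours) of the pair:

```python
import numpy as np

def alpha_p(s, p):
    """alpha_p from (4.12): quadratic up to the breakpoint s = 1, then s^p."""
    return s ** 2 if s <= 1 else s ** p

def alpha_p_inv(u, p):
    """Inverse of alpha_p; well defined since alpha_p is increasing
    and alpha_p(1) = 1 at the breakpoint."""
    return np.sqrt(u) if u <= 1 else u ** (1 / p)

# Round-trip check over both branches and several exponents.
for p in (1, 2, 3):
    for s in (0.0, 0.3, 1.0, 2.5):
        assert abs(alpha_p_inv(alpha_p(s, p), p) - s) < 1e-9
```

In (4.11), small deviations ($t \le 1$) thus incur a sub-Gaussian-type tail $\exp(-c'Nt^2/R^2)$, while large deviations decay at the faster rate $\exp(-c'Nt^p/R^2)$ when $p > 2$.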
Combining the results above, we obtain the main result of this section regarding the difference between the ambiguity centers.

Proposition 4.5 (Distance guarantee between true and estimator-based empirical distributions). Consider a confidence $1 - \beta_{\mathrm{ns}}$ and let

$$ \varepsilon_N(\beta_{\mathrm{ns}}) := 2^{\frac{p-1}{p}} \Big( M_w + \overline{M}_v + \overline{M}_v \, \alpha_p^{-1}\Big( \frac{R^2}{c' N} \ln \frac{2}{\beta_{\mathrm{ns}}} \Big) \Big), \tag{4.13} $$

with $M_w$, $\overline{M}_v$ given by (4.9b), (4.10a),

$$ R := \overline{C}_v / \overline{m}_v + 1/\ln 2, \tag{4.14} $$

and $\overline{C}_v$, $\overline{m}_v$ as in (4.10b), (4.10c). Then, for all $(z, \omega) \in B_\infty^{Nd}(\rho_{\xi_0}) \times B_\infty^{N\ell q}(\rho_w)$, we have

$$ \mathbb{P}\big( W_p(\hat{P}_{\xi_\ell}^N(z, \omega, \mathbf{v}_\ell), P_{\xi_\ell}^N(z, \omega)) \le \varepsilon_N(\beta_{\mathrm{ns}}) \big) \ge 1 - \beta_{\mathrm{ns}}. \tag{4.15} $$
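The noise radius (4.13) is directly computable once the constants of (4.9b) and (4.10) are in hand. The sketch below (ours) assembles it; the constant values are illustrative placeholders, and $c' = 1/10$ as in Proposition 4.4.

```python
import numpy as np

def alpha_p_inv(u, p):
    # inverse of (4.12): alpha_p is increasing with breakpoint value 1
    return np.sqrt(u) if u <= 1 else u ** (1 / p)

def noise_radius(N, beta_ns, p, Mw, Mv_bar, Cv_bar, mv_bar, c_prime=0.1):
    """epsilon_N(beta_ns) from (4.13), with R from (4.14)."""
    R = Cv_bar / mv_bar + 1.0 / np.log(2)                       # (4.14)
    t = alpha_p_inv(R ** 2 / (c_prime * N) * np.log(2 / beta_ns), p)
    return 2 ** ((p - 1) / p) * (Mw + Mv_bar + Mv_bar * t)      # (4.13)

# The radius decreases with the number of realizations N ...
r_small_N = noise_radius(100, 0.05, 2, Mw=0.2, Mv_bar=0.3, Cv_bar=0.6, mv_bar=0.1)
r_large_N = noise_radius(10000, 0.05, 2, Mw=0.2, Mv_bar=0.3, Cv_bar=0.6, mv_bar=0.1)
```

Note the floor $2^{(p-1)/p}(M_w + \overline{M}_v)$: unlike the nominal radius (4.8), this component does not vanish as $N \to \infty$, reflecting the persistent bias introduced by the internal and measurement noise.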
Proof. For each $i$, the random variable $X_i := E^i/\|E^i\|_p$ satisfies $\|X_i\|_p = 1$. Thus, we obtain from Proposition 4.4 that
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N\Big(\frac{E^i}{\|E^i\|_p}\Big)^p\Big)^{\frac1p} - 1 \ge t\Big) \le 2\exp\Big(-\frac{c'N}{\widehat R^2}\alpha_p(t)\Big),$$
where $\widehat R = \max_{i\in[1:N]}\big\|E^i/\|E^i\|_p\big\|_{\psi_p} + 1/\ln 2$. From (4.10b), (4.10c), and (4.14), we deduce $R \ge \widehat R$, and thus,
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N\Big(\frac{E^i}{\|E^i\|_p}\Big)^p\Big)^{\frac1p} - 1 \ge t\Big) \le 2\exp\Big(-\frac{c'N}{R^2}\alpha_p(t)\Big).$$
Now, it follows from (4.10a) that
$$M_v\Big(\frac1N\sum_{i=1}^N\Big(\frac{E^i}{\|E^i\|_p}\Big)^p\Big)^{\frac1p} - M_v \ge \Big(\frac1N\sum_{i=1}^N (E^i)^p\Big)^{\frac1p} - M_v.$$
Thus, we deduce
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N(E^i)^p\Big)^{\frac1p} - M_v \ge M_v t\Big) \le 2\exp\Big(-\frac{c'N}{R^2}\alpha_p(t)\Big),$$
or, equivalently, that
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N(E^i)^p\Big)^{\frac1p} \ge M_v + s\Big) \le 2\exp\Big(-\frac{c'N}{R^2}\alpha_p\Big(\frac{s}{M_v}\Big)\Big). \tag{4.16}$$
To establish (4.15), it suffices by Lemma 4.2 to show that
$$\mathbb P\Big(2^{\frac{p-1}{p}}M_w + 2^{\frac{p-1}{p}}\Big(\frac1N\sum_{i=1}^N(E^i)^p\Big)^{\frac1p} \le \varepsilon_N(\beta_{ns})\Big) \ge 1-\beta_{ns}.$$
By the definition of $\varepsilon_N$ and exploiting that it is strictly decreasing in $\beta_{ns}$, it suffices to prove
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N(E^i)^p\Big)^{\frac1p} < M_v + M_v\,\alpha_p^{-1}\Big(\frac{R^2}{c'N}\ln\frac{2}{\beta_{ns}}\Big)\Big) \ge 1-\beta_{ns}.$$
Setting $\tau = \alpha_p^{-1}\big(\frac{R^2}{c'N}\ln\frac{2}{\beta_{ns}}\big)$, we equivalently need to show
$$\mathbb P\Big(\Big(\frac1N\sum_{i=1}^N(E^i)^p\Big)^{\frac1p} \ge M_v + \tau M_v\Big) \le \beta_{ns},$$
which follows from (4.16) with $s = \tau M_v$.
4.3. Overall ambiguity set. Here we combine the results from Sections 4.1 and 4.2 to obtain the ambiguity set of the state distribution in the following result.

Theorem 4.6 (Ambiguity set under noisy dynamics & observations). Consider data collected from $N$ realizations of system (3.3) in accordance with Assumptions 3.1 and 3.2, a confidence $1-\beta$, and let $\beta_{nom},\beta_{ns}\in(0,1)$ satisfy (4.6). Then, the guarantee (4.5) holds, where $\psi_N(\beta)$ is given in (4.7) and its components $\varepsilon_N(\beta_{nom}) \equiv \varepsilon_N(\beta_{nom},\rho_{\xi_\ell})$ and $\varepsilon_N(\beta_{ns})$ are given by (4.8) and (4.13), respectively, with
$$\rho_{\xi_\ell} := \sqrt d\,\|\Phi_\ell\|\rho_{\xi_0} + \sqrt q\sum_{k=1}^{\ell}\|\Phi_{\ell,\ell-k+1}G_{\ell-k}\|\rho_w. \tag{4.17}$$
Proof. Due to (4.7) and the triangle inequality for $W_p$,
$$\big\{W_p(\hat P^N_{\xi_\ell}, P_{\xi_\ell}) \le \psi_N(\beta)\big\} \supset \big\{W_p(\hat P^N_{\xi_\ell}, P^N_{\xi_\ell}) \le \varepsilon_N(\beta_{ns})\big\} \cap \big\{W_p(P^N_{\xi_\ell}, P_{\xi_\ell}) \le \varepsilon_N(\beta_{nom},\rho_{\xi_\ell})\big\}.$$
Thus, to show (4.5), it suffices to show that
$$\mathbb E\big[\mathbf 1_{\{W_p(\hat P^N_{\xi_\ell},P^N_{\xi_\ell}) - \varepsilon_N(\beta_{ns}) \le 0\}} \times \mathbf 1_{\{W_p(P^N_{\xi_\ell},P_{\xi_\ell}) - \varepsilon_N(\beta_{nom},\rho_{\xi_\ell}) \le 0\}}\big] \ge 1-\beta. \tag{4.18}$$
We therefore exploit Lemma 2.1 with the random variable $X \equiv (\xi_0, w_\ell)$, taking values in the compact set $K \equiv B^{Nd}_\infty(\rho_{\xi_0})\times B^{N\ell q}_\infty(\rho_w)$, the random variable $Y \equiv v_\ell \in \mathbb R^{N\ell r}$, and $g(X,Y) \equiv g(\xi_0,w_\ell,v_\ell)$, where
$$g(\xi_0,w_\ell,v_\ell) := \mathbf 1_{\{W_p(P^N_{\xi_\ell}(\xi_0,w_\ell),P_{\xi_\ell}) - \varepsilon_N(\beta_{nom},\rho_{\xi_\ell}) \le 0\}} \times \mathbf 1_{\{W_p(\hat P^N_{\xi_\ell}(\xi_0,w_\ell,v_\ell),P^N_{\xi_\ell}(\xi_0,w_\ell)) - \varepsilon_N(\beta_{ns}) \le 0\}}.$$
Due to (4.15), $\mathbb E\big[\mathbf 1_{\{W_p(\hat P^N_{\xi_\ell}(z,\omega,v_\ell),P^N_{\xi_\ell}(z,\omega)) - \varepsilon_N(\beta_{ns}) \le 0\}}\big] \ge 1-\beta_{ns}$ for any $x = (z,\omega)\in K$, and thus $\mathbb E[g(x,Y)] \ge \mathbf 1_{\{W_p(P^N_{\xi_\ell}(x),P_{\xi_\ell}) - \varepsilon_N(\beta_{nom},\rho_{\xi_\ell}) \le 0\}}\,(1-\beta_{ns}) =: k(x)$ for all $x\in K$. Hence, since $X \equiv (\xi_0,w_\ell)$ and $Y \equiv v_\ell$ are independent by H2, we deduce from Lemma 2.1 that
$$\mathbb E[g(X,Y)] \ge \mathbb E\big[\mathbf 1_{\{W_p(P^N_{\xi_\ell}(\xi_0,w_\ell),P_{\xi_\ell}) - \varepsilon_N(\beta_{nom},\rho_{\xi_\ell}) \le 0\}}(1-\beta_{ns})\big] = (1-\beta_{ns})\,\mathbb P\big(W_p(P^N_{\xi_\ell}(\xi_0,w_\ell), P_{\xi_\ell}) \le \varepsilon_N(\beta_{nom},\rho_{\xi_\ell})\big).$$
From (4.4) and H3 in Assumption 3.2, it follows that $P_{\xi_\ell}$ is supported on the compact set $B^d_\infty(\rho_{\xi_\ell})$ with $\mathrm{diam}_\infty(B^d_\infty(\rho_{\xi_\ell})) = 2\rho_{\xi_\ell}$ and $\rho_{\xi_\ell}$ given in (4.17). In addition, due to H1 and H2 in Assumption 3.2, the random states $\xi^i_\ell$ in the empirical distribution $P^N_{\xi_\ell}(\xi_0,w_\ell) = \frac1N\sum_{i=1}^N \delta_{\xi^i_\ell}$ are i.i.d. Thus, we get from Proposition 4.1 that $\mathbb P\big(W_p(P^N_{\xi_\ell}(\xi_0,w_\ell), P_{\xi_\ell}) \le \varepsilon_N(\beta_{nom},\rho_{\xi_\ell})\big) \ge 1-\beta_{nom}$, which implies $\mathbb E[g(X,Y)] \ge (1-\beta_{ns})(1-\beta_{nom}) = 1-\beta$. Finally, (4.18) follows from this and the definition of $g$.
With this result at hand, we deduce from the expressions (4.8) and (4.13) for the components of the ambiguity radius that it decreases as we exploit a larger number $N$ of independent trajectories and relax our confidence choices, i.e., reduce $1-\beta_{nom}$ and $1-\beta_{ns}$. Notice further that, no matter how many trajectories we use, the noise ambiguity radius decreases only to a strictly positive value. It is also worth observing that $\psi_N$ generalizes the nominal ambiguity radius $\varepsilon_N$ in the DRO literature (even when dynamic random variables are considered [8]) and reduces to it in the noise-free case, where $\varepsilon_N(\beta_{ns}) = 0$.
Drawing conclusions about how the ambiguity radius behaves as we simultaneously let the horizon $[0:\ell]$ and the number $N$ of sampled trajectories grow is a more delicate matter. The value of the nominal component depends essentially on $N$ and on the support of the distribution at $\ell$, with the latter in turn depending on the system's stability properties and on the supports of the initial condition and internal noise distributions. On the other hand, the noise component depends on $N$ and on the quality of the estimation error. In the next section we quantify how the estimation error guarantees uniform boundedness of the noise radius under detectability-type assumptions.
Remark 4.7 (Positive lower bound of the noise radius). The positive lower bound $2^{\frac{p-1}{p}}(M_w + M_v)$ on the noise radius in (4.13) represents, in general, a fundamental limitation for the ambiguity set accuracy, which is independent of the number $N$ of estimated state samples. This is because the bound is related to the size of the state estimation error, which persists in the presence of noise and may further grow in time if the system is not detectable.
Remark 4.8 (Optimal radius selection). Once a desired confidence level $1-\beta$ and the number of independent trajectories $N$ are fixed, we can optimally select the ambiguity radius by minimizing the function
$$\beta_{nom} \mapsto \psi_N(\beta_{nom}) \equiv \varepsilon_N(\beta_{nom}) + \varepsilon_N\big((\beta-\beta_{nom})/(1-\beta_{nom})\big),$$
where we have taken into account the constraint (4.6) between the nominal and the noise confidence. This function is non-convex but one-dimensional, and its minimizer lies in the interior of the interval $(0,\beta)$, so its optimal value can be approximated with high accuracy.
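A sketch of this one-dimensional minimization, using the closed-form nominal-radius expression computed for the dispatch example in Section 6.5 and a hypothetical surrogate for the noise component (4.13):

```python
import numpy as np

beta, N = 0.05, 200
# Nominal radius: closed-form expression from Section 6.5 (p = 2, d = 6).
eps_nom = lambda b: 4.02 * N**(-1/6) + 1.31 * np.log(1/b)**0.25 * N**(-1/4)
# Hypothetical surrogate for the noise radius (4.13), decreasing in beta_ns.
eps_ns = lambda b: 0.05 + 0.2 * np.sqrt(np.log(2/b) / N)

def psi(beta_nom):
    beta_ns = (beta - beta_nom) / (1 - beta_nom)   # constraint (4.6)
    return eps_nom(beta_nom) + eps_ns(beta_ns)

grid = np.linspace(1e-6, beta - 1e-6, 20_000)      # interior of (0, beta)
best = grid[np.argmin([psi(b) for b in grid])]
print("optimal beta_nom:", best)
```

A grid search suffices here because the map is one-dimensional; any scalar minimizer over $(0,\beta)$ would do equally well.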
4.4. Uncertainty quantification over bounded time horizons. In this section we discuss how the guarantees can be extended to scenarios where an ambiguity set is built over a finite time horizon instead of a single instant $\ell$. In this case, we assume that samples are collected over the time window $[0:\ell_2]$ and we seek to build an ambiguity set about the state distribution along $[\ell_1:\ell_2]$, with $0\le\ell_1\le\ell_2$. We distinguish between two ambiguity set descriptions, depending on the way the associated probabilistic guarantees are obtained. In the first, we directly build an ambiguity set for the probability distribution of the random vector $\xi_{\boldsymbol\ell} := (\xi_{\ell_1},\dots,\xi_{\ell_2}) \in \mathbb R^{\tilde\ell d}$, with $\boldsymbol\ell := (\ell_1,\dots,\ell_2)$ and $\tilde\ell := \ell_2-\ell_1+1$, comprising all states over the interval of interest, and use the concentration of measure result of Proposition 4.1 for $\tilde\ell d$-dimensional random variables. This has the drawback that the ambiguity radius decays slowly with the number of trajectories due to the high dimension of $\xi_{\boldsymbol\ell}$. The other description derives an ambiguity set about the state distribution $P_{\xi_{\ell_1}}$ at time $\ell_1$ with prescribed confidence, and propagates it under the dynamics while taking into account the possible values of the internal noise. We also present sharper results for the case when the internal noise distribution is known.
The first ambiguity set description is provided by the following analogue of Theorem 4.6.

Theorem 4.9 (Ambiguity set over a bounded time horizon). Consider output data collected from $N$ realizations of system (3.3) over the interval $[0:\ell_2]$ and let Assumption 3.1 hold. Pick a confidence $1-\beta$, let $\beta_{nom},\beta_{ns}\in(0,1)$ satisfy (4.6), and consider the bounded-horizon estimator empirical distribution
$$\hat P^N_{\xi_{\boldsymbol\ell}} := \frac1N\sum_{i=1}^N \delta_{\hat\xi^i_{\boldsymbol\ell}}$$
over the horizon $[\ell_1:\ell_2]$, where $\hat\xi^i_{\boldsymbol\ell} := (\hat\xi^i_{\ell_1},\dots,\hat\xi^i_{\ell_2}) \in \mathbb R^{\tilde\ell d}$ and each $\hat\xi^i_\ell$ is given by the observer (4.1). Then,
$$\mathbb P\big(W_p(\hat P^N_{\xi_{\boldsymbol\ell}}, P_{\xi_{\boldsymbol\ell}}) \le \psi_N(\beta)\big) \ge 1-\beta \tag{4.19}$$
holds, where $\xi_{\boldsymbol\ell} := (\xi_{\ell_1},\dots,\xi_{\ell_2})$ and $\psi_N(\beta)$ is given in (4.7). The nominal component $\varepsilon_N(\beta_{nom}) \equiv \varepsilon_N(\beta_{nom},\rho_{\xi_{\boldsymbol\ell}})$ is given by (4.8) (with $d$ in the expression substituted by $\tilde\ell d$) and
$$\rho_{\xi_{\boldsymbol\ell}} := \max_{\ell\in[\ell_1:\ell_2]}\Big\{\sqrt d\,\|\Phi_\ell\|\rho_{\xi_0} + \sqrt q\sum_{k=1}^{\ell}\|\Phi_{\ell,\ell-k+1}G_{\ell-k}\|\rho_w\Big\}, \tag{4.20}$$
whereas $\varepsilon_N(\beta_{ns})$ is given as
$$\varepsilon_N(\beta_{ns}) := 2^{\frac{p-1}{p}}\Big(\widetilde M_w + \widetilde M_v + \widetilde M_v\,\alpha_p^{-1}\Big(\frac{\widetilde R^2}{c'N}\ln\frac{2}{\beta_{ns}}\Big)\Big), \quad\text{with}$$
$$\widetilde M_w := \sum_{\ell=\ell_1}^{\ell_2} M_w(\ell), \quad \widetilde M_v := \sum_{\ell=\ell_1}^{\ell_2} M_v(\ell), \quad \widetilde R := \frac{\widetilde C_v}{\widetilde m_v} + \frac1{\ln 2}, \quad \widetilde C_v := \sum_{\ell=\ell_1}^{\ell_2} C_v(\ell), \quad \widetilde m_v := \sum_{\ell=\ell_1}^{\ell_2} m_v(\ell),$$
and $M_w(\ell) \equiv M_w$, $M_v(\ell) \equiv M_v$, $C_v(\ell) \equiv C_v$, and $m_v(\ell) \equiv m_v$, as given by (4.9b), (4.10a), (4.10b), and (4.10c), respectively.
The proof of this result follows the argumentation employed for the proof of Theorem 4.6 (a sketch can be found in the online version [9]). For the second ambiguity set description we use a pointwise-in-time approach. To this end, we build a family of ambiguity balls so that, under the same confidence level, the state distribution at each time instant of the horizon lies in the associated ball, i.e.,
$$\mathbb P\big(P_{\xi_\ell} \in B_{\psi_{N,\ell}}(\hat P^N_{\xi_\ell})\ \forall\ell\in[\ell_1:\ell_2]\big) \ge 1-\beta, \tag{4.21}$$
where $B_{\psi_{N,\ell}}(\hat P^N_{\xi_\ell}) := \{P \in \mathcal P_p(\mathbb R^d) \mid W_p(P, \hat P^N_{\xi_\ell}) \le \psi_{N,\ell}\}$ and $\hat P^N_{\xi_\ell}$ is the center of the ball. This is well suited for stochastic optimization problems that have a separable structure with respect to the stochastic argument across different time instances, i.e., problems of the form
$$\inf_{x\in X}\ \mathbb E\big[f_1(x,\xi_{\ell_1}) + \dots + f_{\tilde\ell}(x,\xi_{\ell_2})\big].$$
The pointwise ambiguity sets are quantified in the following result.
Theorem 4.10 (Pointwise ambiguity sets over a bounded time horizon). Let the assumptions of Theorem 4.9 hold, assume that the internal noise sequence $w_\ell$ is independent (also of the initial state), with $P_{w_\ell} \in \mathcal P_p(\mathbb R^d)$ for $\ell\in[\ell_1:\ell_2]$, i.e., it is not necessarily compactly supported, and consider either of the following two cases for its distribution when $\ell\in[\ell_1:\ell_2]$:
(i) $P_{w_\ell}$ is not known and $\mathbb E\big[\|w_\ell\|^p\big]^{\frac1p} \le q_w$.
(ii) $P_{w_\ell}$ is known.
Then, for any confidence $1-\beta$, and $\beta_{nom},\beta_{ns}\in(0,1)$ satisfying (4.6), (4.21) holds, with the center $\hat P^N_{\xi_{\ell_1}}$ and radius $\psi_{N,\ell_1}$ as given by Theorem 4.6 (for $\ell\equiv\ell_1$), and $\hat P^N_{\xi_\ell}$, $\psi_{N,\ell}$, $\ell\in[\ell_1+1:\ell_2]$, defined as follows for the respective two cases above:
(i) The ambiguity set center is $\hat P^N_{\xi_\ell} := \frac1N\sum_{i=1}^N \delta_{\hat\xi^i_\ell}$ with $\hat\xi^i_\ell := \Phi_{\ell,\ell_1}\hat\xi^i_{\ell_1}$, and the radius is given recursively by $\psi_{N,\ell} := \|A_{\ell-1}\|\psi_{N,\ell-1} + q_w$.
(ii) The ambiguity set center is $\hat P^N_{\xi_\ell} := \big((A_{\ell-1})_\#\hat P^N_{\xi_{\ell-1}}\big)\star P_{w_{\ell-1}}$ and the radius is $\psi_{N,\ell} := \|A_{\ell-1}\|\cdots\|A_{\ell_1}\|\psi_{N,\ell_1}$.
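The radius recursion of case (i) is immediate to implement; a sketch with hypothetical dynamics matrices $A_\ell$ and moment bound $q_w$:

```python
import numpy as np

# Hypothetical time-varying dynamics matrices over [l1 : l2] and moment bound q_w.
A_seq = [np.array([[0.9, 0.1], [0.0, 0.8]]) for _ in range(5)]
psi_l1, q_w = 0.4, 0.02    # radius at l1 (from Theorem 4.6) and E[||w||^p]^(1/p) bound

radii = [psi_l1]
for A in A_seq:            # case (i): psi_{N, l+1} = ||A_l|| psi_{N, l} + q_w
    radii.append(np.linalg.norm(A, 2) * radii[-1] + q_w)
print(radii)
```

For a contractive system ($\|A_\ell\| < 1$) the recursion settles near $q_w/(1-\|A\|)$ instead of growing, which is the pointwise analogue of the uniform boundedness discussed in Section 5.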
In our technical approach, we use the next result, whose proof is given in the Appendix. The result examines what happens to the Wasserstein distance between the distributions of two random variables when a common independent random variable is added to each.

Lemma 4.11 (Wasserstein distance under convolution). Given $p\ge1$ and distributions $P_1, P_2, Q \in \mathcal P_p(\mathbb R^d)$, it holds that $W_p(P_1\star Q, P_2\star Q) \le W_p(P_1, P_2)$. Also, if $\big(\int_{\mathbb R^d}\|x\|^p Q(dx)\big)^{\frac1p} \le q$, then $W_p(P_1, P_2\star Q) \le W_p(P_1, P_2) + q$.
Proof of Theorem 4.10. The proof is carried out by induction on $\ell\in[\ell_1:\ell_2]$. In particular, it suffices to establish that
$$W_p\big(P_{\xi_{\ell_1}}, \hat P^N_{\xi_{\ell_1}}\big) \le \psi_{N,\ell_1} \implies W_p\big(P_{\xi_{\ell'}}, \hat P^N_{\xi_{\ell'}}\big) \le \psi_{N,\ell'}\ \ \forall\ell'\in[\ell_1:\ell]. \tag{4.22}$$
Note that from Theorem 4.6, $\mathbb P\big(P_{\xi_{\ell_1}} \in B_{\psi_{N,\ell_1}}(\hat P^N_{\xi_{\ell_1}})\big) \ge 1-\beta$. From (4.22), this also implies that $\mathbb P\big(P_{\xi_{\ell'}} \in B_{\psi_{N,\ell'}}(\hat P^N_{\xi_{\ell'}})\ \forall\ell'\in[\ell_1:\ell]\big) \ge 1-\beta$, establishing validity of the result for $\ell\equiv\ell_2$.

For $\ell\equiv\ell_1$, the induction hypothesis (4.22) is a tautology. Next, assuming that it is true for a certain $\ell\in[\ell_1:\ell_2-1]$, we show that it also holds for $\ell+1$. Hence, it suffices to show that if $W_p\big(P_{\xi_\ell}, \hat P^N_{\xi_\ell}\big) \le \psi_{N,\ell}$, then also $W_p\big(P_{\xi_{\ell+1}}, \hat P^N_{\xi_{\ell+1}}\big) \le \psi_{N,\ell+1}$, for both cases (i) and (ii). Consider first (i) and note that then the ambiguity set center at $\ell+1$ satisfies $\hat P^N_{\xi_{\ell+1}} = \frac1N\sum_{i=1}^N \delta_{\hat\xi^i_{\ell+1}} = \frac1N\sum_{i=1}^N \delta_{A_\ell\hat\xi^i_\ell} = (A_\ell)_\#\hat P^N_{\xi_\ell}$, where we have exploited that $\hat\xi^i_k \equiv \Phi_{k,\ell_1}\hat\xi^i_{\ell_1}$ and the definition of $\Phi_{k,\ell_1}$ (for $k = \ell, \ell+1$) to derive the second equality. Using also the fact that $P_{\xi_{\ell+1}} = \big((A_\ell)_\# P_{\xi_\ell}\big)\star P_{w_\ell}$, we get from the second result of Lemma 4.11 that
$$W_p(P_{\xi_{\ell+1}}, \hat P^N_{\xi_{\ell+1}}) = W_p\big(\big((A_\ell)_\# P_{\xi_\ell}\big)\star P_{w_\ell}, (A_\ell)_\#\hat P^N_{\xi_\ell}\big) \le W_p\big((A_\ell)_\# P_{\xi_\ell}, (A_\ell)_\#\hat P^N_{\xi_\ell}\big) + q_w \le \|A_\ell\|W_p\big(P_{\xi_\ell}, \hat P^N_{\xi_\ell}\big) + q_w \le \|A_\ell\|\psi_{N,\ell} + q_w = \psi_{N,\ell+1}.$$
Here we also used the fact that $W_p(f_\# P, f_\# Q) \le L\,W_p(P, Q)$ for any globally Lipschitz function $f:\mathbb R^d\to\mathbb R^r$ with Lipschitz constant $L$ in the second-to-last inequality (see, e.g., [42, Proposition 7.16]).

Next, we prove the induction hypothesis for (ii). Using Lemma 4.11 and the definition of the ambiguity set center and radius,
$$W_p(P_{\xi_{\ell+1}}, \hat P^N_{\xi_{\ell+1}}) = W_p\big(\big((A_\ell)_\# P_{\xi_\ell}\big)\star P_{w_\ell}, \big((A_\ell)_\#\hat P^N_{\xi_\ell}\big)\star P_{w_\ell}\big) \le W_p\big((A_\ell)_\# P_{\xi_\ell}, (A_\ell)_\#\hat P^N_{\xi_\ell}\big) \le \|A_\ell\|W_p\big(P_{\xi_\ell}, \hat P^N_{\xi_\ell}\big) \le \|A_\ell\|\psi_{N,\ell} = \|A_\ell\|\|A_{\ell-1}\|\cdots\|A_{\ell_1}\|\psi_{N,\ell_1} = \psi_{N,\ell+1},$$
completing the proof.
5. Sufficient conditions for uniformly bounded noise ambiguity radii. In this section we leverage Assumption 3.4 to establish that the noise ambiguity radius remains uniformly bounded as the sampling horizon increases. We first provide uniform bounds for the matrices involved in the system and observer error dynamics.

Proposition 5.1 (Bounds on system/observer matrices). Under Assumption 3.4, the gain matrices $K_k$ can be selected so that the following properties hold:
(i) There exist $K_\star, K^\star, G^\star > 0$ and $\Psi^\star_s > 0$, $s\in\mathbb N_0$, so that $\|G_k\| \le G^\star$, $K_\star \le \|K_k\| \le K^\star$, and $\|\Psi_{k+s,k}\| \le \Psi^\star_s$ for all $k\in\mathbb N_0$.
(ii) There exists $s_0\in\mathbb N$ so that $\|\Psi_{k+s,k}\| \le \frac12$ for all $k\in\mathbb N_0$ and $s \ge s_0$.

Proof. Note that we only need to verify part (i) for the time-varying case. Since all $G_k$ are uniformly bounded, we directly obtain the bound $G^\star$. Let
$$K_k := -A_k\Phi_{k,k-t-1}O^{-1}_{k,k-t-1}\Phi^\top_{k,k-t-1}H^\top_k$$
(for $k > t+1$), as selected in [33, page 574] (but with a minus sign in front, to get the plus sign in $F_k = A_k + K_kH_k$) and with the observability Gramian $O_{k,k-t-1}$ as defined in Assumption 3.4(ii). Then, the upper bound $K^\star$ follows from the fact that the system matrices are uniformly bounded, combined with the uniform observability property of Assumption 3.4, which implies that all $O^{-1}_{k,k-t-1}$ are also uniformly bounded. On the other hand, the lower bound $K_\star$ follows from the assumption that the system matrices are uniformly bounded, which imposes a uniform lower bound on the smallest singular value of $O^{-1}_{k,k-t-1}$, the uniform lower bound on the smallest singular value of $A_k$, hence also on that of $\Phi_{k,k-t-1}$ and $\Phi^\top_{k,k-t-1}$, and the uniform lower bound on $\|H_k\|$ (all found in Assumption 3.4). Finally, the bounds $\Psi^\star_s$ follow from the uniform bounds for all $A_k$ and $H_k$ and the derived bound $K^\star$ for all $K_k$.
To show part (ii), assume first that Assumption 3.4(i) holds, i.e., the system is time invariant and $(A,H)$ is detectable. Then, we can choose a nonzero gain matrix $K$ so that $F = A + KH$ is convergent (cf. [40, Theorem 31]), namely $\lim_{s\to\infty}\|F^s\| = 0$. Consequently, there is $s_0\in\mathbb N$ with $\|F^s\| \le \frac12$ for all $s\ge s_0$, and the result follows by taking into account that $\Psi_{k+s,k} = F^s$. In case Assumption 3.4(ii) holds, let
$$e_{k+1} = F_ke_k \tag{5.1}$$
be the recursive noise-free version of the error equation (4.3). Then, from [33, page 577], there exists a quadratic time-varying Lyapunov function $V(k,e) := e^\top Q_ke$, with each $Q_k$ positive definite, $a_1, a_2 > 0$, $a_3\in(0,1)$, and $m\in\mathbb N$, so that
$$a_1 \le \lambda_{\min}(Q_k) \le \lambda_{\max}(Q_k) \le a_2, \tag{5.2a}$$
$$V(k+m, e_{k+m}) - V(k, e_k) \le -a_3V(k, e_k) \tag{5.2b}$$
for any $k$ and any solution of (5.1) with state $e_k$ at time $k$. Thus, $\Psi^\top_{k+m,k}Q_{k+m}\Psi_{k+m,k} \preceq (1-a_3)Q_k$, and hence, by induction, $\Psi^\top_{k+\nu m,k}Q_{k+\nu m}\Psi_{k+\nu m,k} \preceq (1-a_3)^\nu Q_k$, since
$$\Psi^\top_{k+(\nu+1)m,k}Q_{k+(\nu+1)m}\Psi_{k+(\nu+1)m,k} = \Psi^\top_{k+m,k}\Psi^\top_{k+(\nu+1)m,k+m}Q_{k+(\nu+1)m}\Psi_{k+(\nu+1)m,k+m}\Psi_{k+m,k} \preceq (1-a_3)^\nu\Psi^\top_{k+m,k}Q_{k+m}\Psi_{k+m,k} \preceq (1-a_3)^{\nu+1}Q_k.$$
Next, pick $\bar e$ with $\|\bar e\| = 1$ and $\|\Psi_{k+\nu m,k}\bar e\| = \|\Psi_{k+\nu m,k}\|$. Taking into account that $\bar e^\top\Psi^\top_{k+\nu m,k}Q_{k+\nu m}\Psi_{k+\nu m,k}\bar e \le (1-a_3)^\nu\bar e^\top Q_k\bar e$, we get $\lambda_{\min}(Q_{k+\nu m})\|\Psi_{k+\nu m,k}\bar e\|^2 \le (1-a_3)^\nu\lambda_{\max}(Q_k)$. Using (5.2a),
$$\|\Psi_{k+\nu m,k}\| \le (1-a_3)^{\frac\nu2}\Big(\frac{a_2}{a_1}\Big)^{\frac12}. \tag{5.3}$$
Now, select $\nu$ so that $(1-a_3)^{\frac{\nu'}2}(a_2/a_1)^{\frac12} \le 1/(2\max_{s\in[1:m]}\Psi^\star_s)$ for all $\nu'\ge\nu$. Let $s_0 := \nu m$ and pick $s\ge s_0$. Then, $s = s_0' + m'$ for some $s_0' = \nu'm$, $\nu'\ge\nu$, and $m'\in[0:m-1]$, and we get from (5.3), part (i), and the selection of $\nu$ that
$$\|\Psi_{k+s,k}\| = \|\Psi_{k+s_0'+m',k+s_0'}\Psi_{k+s_0',k}\| \le \|\Psi_{k+s_0'+m',k+s_0'}\|\,\|\Psi_{k+\nu'm,k}\| \le \Psi^\star_{m'}\,\frac1{2\max_{s\in[1:m]}\Psi^\star_s} \le \frac12,$$
which establishes the result.
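For the time-invariant case of part (ii), once a stabilizing output-injection gain $K$ is chosen, the time $s_0$ can be found by direct computation; a sketch with a hypothetical detectable pair $(A, H)$ and a hand-picked gain:

```python
import numpy as np

# Hypothetical detectable pair: one unstable mode observed through H.
A = np.array([[1.02, 0.1], [0.0, 0.5]])
H = np.array([[1.0, 0.0]])
K = np.array([[-0.6], [0.0]])        # output-injection gain, chosen by hand here

F = A + K @ H                        # error dynamics matrix; spectral radius < 1
assert max(abs(np.linalg.eigvals(F))) < 1.0

s = 1
while np.linalg.norm(np.linalg.matrix_power(F, s), 2) > 0.5:
    s += 1
print("s0 =", s)                     # smallest s with ||F^s|| <= 1/2
```

Note that $s_0$ can exceed 1 even when all eigenvalues of $F$ lie well inside the unit disk, since $\|F^s\|$ need not be monotone for non-normal $F$.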
Based on this result and Assumption 3.4 about the system's detectability/uniform observability properties, we proceed to provide a uniform bound on the size of the noise radius for arbitrarily long evolution horizons.

Proposition 5.2 (Uniform bounds for the noise ambiguity radius). Consider data collected from $N$ realizations of system (3.3), a confidence $1-\beta$ as in (4.6), and let Assumptions 3.1, 3.2, and 3.4 hold. Then, there exist observer gain matrices $K_k$ so that the noise ambiguity radius $\varepsilon_N$ in (4.13) is uniformly bounded with respect to the sampling horizon size. In particular, there exists $\ell_0\in\mathbb N$ so that, for each $\ell\ge\ell_0$, the quantities $M_w\equiv M_w(\ell)$, $M_v\equiv M_v(\ell)$, and $R\equiv R(\ell)$, given by (4.9b), (4.10a), and (4.14), are uniformly upper bounded as
$$M_w \le \frac12\sqrt d\,\rho_{\xi_0} + 3\sqrt q\sum_{j=0}^{\ell_0-1}\Psi^\star_jG^\star\rho_w, \qquad M_v \le 3\,\overline M_v\,r\sum_{j=0}^{\ell_0-1}\Psi^\star_jK^\star, \qquad R \le 3\,\frac{\overline C_v}{\underline m_v}\,r^{\frac{p-1}p}\,\frac{\sum_{j=0}^{\ell_0-1}\Psi^\star_jK^\star}{K_\star}.$$
Proof. Consider gain matrices $K_k$ and the time $s_0$ as given in Proposition 5.1, and let $\ell_0 := s_0$. Then, any $\ell\ge\ell_0$ can be written as $\ell = n\ell_0 + r'$ with $0\le r' < \ell_0$, and we have
$$\begin{aligned}
\sum_{k=1}^{\ell}\|\Psi_{\ell,\ell-k+1}G_{\ell-k}\| &\le \sum_{k=1}^{\ell}\|\Psi_{\ell,\ell-k+1}\|\,G^\star = \Big(\sum_{k=1}^{r'}\|\Psi_{\ell,\ell-k+1}\| + \sum_{k=r'+1}^{\ell}\|\Psi_{\ell,\ell-k+1}\|\Big)G^\star\\
&\le \Big(\sum_{s=0}^{r'-1}\Psi^\star_s + \sum_{k=r'+1}^{n\ell_0+r'}\|\Psi_{n\ell_0+r',\,n\ell_0+r'-k+1}\|\Big)G^\star && (k\mapsto(\nu-1)\ell_0+j+r')\\
&= \Big(\sum_{s=0}^{r'-1}\Psi^\star_s + \sum_{\nu=1}^{n}\sum_{j=1}^{\ell_0}\|\Psi_{n\ell_0+r',\,(n-\nu)\ell_0+r'+\ell_0-j+1}\|\Big)G^\star && (\ell_0+1-j\mapsto j)\\
&\le \Big(\sum_{s=0}^{r'-1}\Psi^\star_s + \sum_{\nu=1}^{n}\sum_{j=1}^{\ell_0}\|\Psi_{n\ell_0+r',\,(n-\nu+1)\ell_0+r'}\|\,\|\Psi_{(n-\nu)\ell_0+r'+\ell_0,\,(n-\nu)\ell_0+r'+j}\|\Big)G^\star\\
&= \Big(\sum_{s=0}^{r'-1}\Psi^\star_s + \sum_{\nu=1}^{n}\|\Psi_{n\ell_0+r',\,(n-\nu+1)\ell_0+r'}\|\sum_{j=1}^{\ell_0}\|\Psi_{(n-\nu)\ell_0+r'+\ell_0,\,(n-\nu)\ell_0+r'+j}\|\Big)G^\star\\
&\le \Big(\sum_{s=0}^{r'-1}\Psi^\star_s + \sum_{\nu=1}^{n}\Big(\prod_{\kappa=1}^{\nu-1}\|\Psi_{(n+1-\kappa)\ell_0+r',\,(n-\kappa)\ell_0+r'}\|\Big)\sum_{j=1}^{\ell_0}\Psi^\star_{\ell_0-j}\Big)G^\star\\
&\le \Big(\sum_{s=0}^{\ell_0-1}\Psi^\star_s + \sum_{\nu=1}^{n}\Big(\frac12\Big)^{\nu-1}\sum_{j=0}^{\ell_0-1}\Psi^\star_j\Big)G^\star \le 3\sum_{j=0}^{\ell_0-1}\Psi^\star_jG^\star,
\end{aligned}$$
where we have used the conventions $\sum_{\kappa=0}^{-1}\equiv\sum_{\kappa=1}^{0}\equiv 0$ and $\prod_{\kappa=1}^{0}\equiv 1$. From this and the fact that, by Proposition 5.1, $\|\Psi_\ell\| \le \frac12$ for all $\ell\ge\ell_0$, we get the upper bound for $M_w$. The one for $M_v$ is obtained analogously. Finally, for $R$, we obtain the same type of upper bound for $C_v$ as for $M_w$, and exploit Proposition 5.1(i) to get the lower bound $m_v = \underline m_v r^{\frac1p}\big(\sum_{k=1}^{\ell}\|\Psi_{\ell,\ell-k+1}K_{\ell-k}\|^p\big)^{\frac1p} \ge \underline m_v r^{\frac1p}\|\Psi_{\ell,\ell}K_{\ell-1}\| \ge \underline m_v r^{\frac1p}K_\star$, which is independent of $\ell$.
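In the time-invariant case, where $\Psi_{k+s,k} = F^s$, the factor-of-3 bound derived above is easy to check numerically; a sketch with a hypothetical convergent $F$:

```python
import numpy as np

# Hypothetical convergent error matrix F (time-invariant case, Psi_s = F^s).
F = np.array([[0.6, 0.3], [0.0, 0.7]])
norms = [np.linalg.norm(np.linalg.matrix_power(F, s), 2) for s in range(200)]

l0 = next(s for s in range(1, 200) if norms[s] <= 0.5)  # first s with ||F^s|| <= 1/2
head = sum(norms[:l0])     # the finitely many leading terms
total = sum(norms)         # full sum over the (long) horizon
print(l0, total <= 3 * head)
```

The blocks of $\ell_0$ consecutive norms shrink at least geometrically with ratio $\frac12$, so the whole sum is at most $(1 + 2)$ times the leading block, exactly as in the proof.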
Remark 5.3 (Noise ambiguity radius for time-invariant systems). For time-invariant systems, it is possible to improve the bounds of Proposition 5.2 for $M_w$, $M_v$, and $R$ by exploiting the fact that the system and observer gain matrices are constant. The precise bounds in this case (see also [7, Proposition 5.5]) are
$$M_w \le \frac12\sqrt d\,\rho_{\xi_0} + 2\sqrt q\sum_{k=0}^{\ell_0-1}\|\Psi_kG\|\rho_w, \qquad M_v \le 2\,\overline M_v\,r\sum_{k=0}^{\ell_0-1}\|\Psi_kK\|, \qquad R \le 2\,\frac{\overline C_v}{\underline m_v}\,r^{\frac{p-1}p}\,\frac{\sum_{k=0}^{\ell_0-1}\|\Psi_kK\|}{\big(\sum_{k=0}^{\ell_0-1}\|\Psi_kK\|^p\big)^{\frac1p}},$$
with $\ell_0$ as in the time-invariant case of Proposition 5.2, and where $G$ and $K$ denote the constant values of the internal noise and observer gain matrices, respectively. The superiority of these bounds can be checked using the matrix bounds in Proposition 5.1(i), and their derivation is based on a simplified version of the arguments employed in the proof of Proposition 5.2.
6. Application to economic dispatch with distributed energy resources. In this section, we take advantage of the ambiguity sets constructed from noisy partial measurements, cf. Theorem 4.6, to hedge against the uncertainty in an optimal economic dispatch problem. Uncertainty arises naturally in this problem due to (dynamic) energy resources, such as storage or renewable energy elements, which the scheduler has no direct access to control or measure. The financial implications of the associated decisions are of utmost importance for the electricity market and justify the use of a reliable decision framework that accounts for the variability of the uncertain factors.
6.1. Network model and optimization objective. Consider a network with distributed energy resources [12] comprising $n_1$ generator units and $n_2$ storage (battery) units. The network needs to operate as close as possible to a prescribed power demand $D$ at the end of the time horizon $[0:\ell]$, corresponding to a uniform discretization, with step size $\delta t$, of the continuous-time domain. To this end, each generator and storage unit supplies the network with positive power $P^j$ and $S^\iota$, respectively, at time $\ell$. We assume we can control the power of the generators, which additionally needs to lie between the lower and upper thresholds $P^j_{\min}$ and $P^j_{\max}$, respectively. Each battery is modeled as an uncertain dynamic element with an unknown initial state distribution, and we can decide whether it is connected ($\eta_\iota = 1$) or not ($\eta_\iota = 0$) to the network at time $\ell$. Our goal is to minimize the energy cost while remaining as close as possible to the prescribed power demand. Thus, we minimize the overall cost
$$C(P,\eta) := \sum_{j=1}^{n_1}g_j(P^j) + \sum_{\iota=1}^{n_2}\eta_\iota h_\iota(S^\iota) + c\Big(\sum_{j=1}^{n_1}P^j + \sum_{\iota=1}^{n_2}\eta_\iota S^\iota - D\Big)^2, \tag{6.1}$$
where $P := (P^1,\dots,P^{n_1})$, $\eta := (\eta_1,\dots,\eta_{n_2})$, and $g_j$ and $h_\iota$ are cost functions for the power provided by generator $j$ and storage unit $\iota$, respectively. We treat the deviation of the injected power from its prescribed demand as a soft constraint, by assigning it a quadratic cost with weight $c$ that augments the overall cost function (6.1). Due to the uncertainty about the batteries' state and their injected powers $S^\iota$, the minimization of (6.1) is a stochastic problem.
6.2. Battery dynamics and observation model. Each battery is modeled as a single-cell dynamic element, and we consider its current $I^\iota$, discharging over the operation interval (if the battery is connected to the network), as a fixed and a priori known function of time. Its dynamics are conveniently approximated by the equivalent circuit in Figure 2(a) (see, e.g., [29, 30]), where $z^\iota$ is the state of charge (SoC) of the cell and $O_{cv}(z^\iota)$ is its corresponding open-circuit voltage, which we approximate by the affine function $\alpha^\iota z^\iota + \beta^\iota$ in Figure 2(b). The associated discrete-time cell model is
$$\chi^\iota_{k+1} \equiv \begin{pmatrix} I^{\iota,2}_{k+1} \\ z^\iota_{k+1} \end{pmatrix} = \begin{pmatrix} a^\iota & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} I^{\iota,2}_{k} \\ z^\iota_{k} \end{pmatrix} + \begin{pmatrix} 1-a^\iota \\ -\delta t/Q^\iota \end{pmatrix}I^\iota_k,$$
$$\theta^\iota_k \equiv V^\iota_k = \alpha^\iota z^\iota_k + \beta^\iota - I^\iota_kR^{\iota,1} - I^{\iota,2}_kR^{\iota,2},$$
Figure 2. (a) shows the equivalent circuit model of a lithium-ion battery cell in discharging mode (cf. [30, Figure 2], [29, Figure 1]). (b) is taken from [29, Figure 3] and shows the nonlinear dependence of the open-circuit voltage on the state of charge and its affine approximation.
where $a^\iota := e^{-\delta t/(R^{\iota,2}C^\iota)}$, $\delta t$ is the time discretization step, and $Q^\iota$ is the cell capacity. Here, we assume that for all $k\in[0:\ell]$ the cell is neither fully charged nor discharged (e.g., by requiring that $0 < z^\iota_0 - \sum_{k'=0}^{k-1}\delta t\,I^\iota_{k'}/Q^\iota < 1$ for all $k$ and any candidate initial conditions and input currents), and so the evolution of its voltage is accurately represented by the above difference equation. The initial condition, comprising the SoC $z^\iota_0$ and the current $I^{\iota,2}_0$ through $R^{\iota,2}$, is random with an unknown probability distribution. We also consider additive measurement noise with an unknown distribution, namely, we measure
$$\theta^\iota_k = \alpha^\iota z^\iota_k + \beta^\iota - I^\iota_kR^{\iota,1} - I^{\iota,2}_kR^{\iota,2} + v_k.$$
To track the evolution of each random element through a linear system of the form (3.3), we consider for each battery a nominal state trajectory $\chi^{\iota,\star}_k = (I^{\iota,2,\star}_k, z^{\iota,\star}_k)$ initiated from the center of the support of its initial-state distribution. Setting $\xi^\iota_k = \chi^\iota_k - \chi^{\iota,\star}_k$ and $\zeta^\iota_k = \theta_k(\chi^\iota_k) - \theta_k(\chi^{\iota,\star}_k)$, we obtain
$$\xi^\iota_{k+1} = A^\iota_k\xi^\iota_k, \qquad \zeta^\iota_k = H^\iota_k\xi^\iota_k + v_k,$$
where $A^\iota_k := \mathrm{diag}(a^\iota, 1)$ and $H^\iota_k := (-R^{\iota,2}, \alpha^\iota)$. Denoting $\xi := (\xi^1,\dots,\xi^{n_2})$ and $\zeta := (\zeta^1,\dots,\zeta^{n_2})$, we obtain a system of the form (3.3) for the dynamic random variable $\xi$. Although the state distribution $\xi_k$ of the batteries across time is unknown, we assume access to output data from $N$ independent realizations of their dynamics over the horizon $[0:\ell]$. Using these samples, we exploit the results of the paper to build an ambiguity ball $\mathcal P^N$ of radius $\varepsilon_N$ in the 2-Wasserstein distance (i.e., with $p = 2$) that contains the batteries' state distribution $P_{\xi_\ell}$ at time $\ell$ with prescribed probability $1-\beta$. In particular, we take the samples from each realization $i\in[1:N]$ and use an observer to estimate its state $\hat\xi^i_\ell$ at time $\ell$. The ambiguity set is centered at the estimator-based empirical distribution $\hat P^N_{\xi_\ell} = \frac1N\sum_{i=1}^N\delta_{\hat\xi^i_\ell}$ and its radius can be determined using Theorem 4.6 and Proposition 4.5.
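The cell model above can be simulated directly to generate the independent realizations used later; a sketch with hypothetical parameter values (the initial-state supports match those assumed in Section 6.5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cell parameters (for illustration only; not taken from the paper).
dt, Q, C = 1.0, 3600.0, 2000.0          # step, capacity, RC capacitance
R1, R2 = 0.01, 0.015                    # series and RC-branch resistances
alpha, beta = 0.7, 3.0                  # affine open-circuit voltage O_cv(z) = alpha z + beta
a = np.exp(-dt / (R2 * C))

A = np.array([[a, 0.0], [0.0, 1.0]])    # state chi = (I2, z): RC-branch current and SoC
B = np.array([1.0 - a, -dt / Q])        # driven by the known discharge current I_k
ell, I = 50, 0.5 * np.ones(50)          # horizon and assumed current profile

def realization():
    # Random initial state on the supports [1.5, 1.7] x [0.45, 0.9].
    chi = np.array([rng.uniform(1.5, 1.7), rng.uniform(0.45, 0.9)])
    for k in range(ell):
        chi = A @ chi + B * I[k]
    V = alpha * chi[1] + beta - I[-1] * R1 - chi[0] * R2   # terminal voltage
    return chi, V + rng.normal(0.0, 0.01)                  # additive measurement noise

states, volts = zip(*(realization() for _ in range(200)))
print("mean SoC at time ell:", np.mean([chi[1] for chi in states]))
```

With these parameters the SoC drifts down by only $\ell\,\delta t\,I/Q^\iota \approx 0.007$ over the horizon, so the never-fully-charged/discharged assumption above holds for every realization.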
6.3. Decision problem as a distributionally robust optimization (DRO) problem. To solve the decision problem of whether or not to connect the batteries for economic dispatch, we formulate a distributionally robust optimization problem for the cost (6.1) using the ambiguity set $\mathcal P^N$. To do this, we derive an explicit expression of how the cost function $C$ depends on the stochastic argument $\xi_\ell$. Notice first that the power injected by each battery at time $\ell$ is
$$S^\iota = I^\iota_\ell V^\iota_\ell = I^\iota_\ell\big(\alpha^\iota z^\iota_\ell + \beta^\iota - I^\iota_\ell R^{\iota,1} - I^{\iota,2}_\ell R^{\iota,2}\big) = \langle(-I^\iota_\ell R^{\iota,2},\,\alpha^\iota I^\iota_\ell), \chi^\iota_\ell\rangle + \beta^\iota I^\iota_\ell - (I^\iota_\ell)^2R^{\iota,1} = \langle\boldsymbol\alpha^\iota, \xi^\iota_\ell\rangle + \boldsymbol\beta^\iota \equiv (\boldsymbol\alpha^\iota)^\top\xi^\iota_\ell + \boldsymbol\beta^\iota,$$
with $\boldsymbol\alpha^\iota := (-I^\iota_\ell R^{\iota,2},\,\alpha^\iota I^\iota_\ell)$ and
$$\boldsymbol\beta^\iota := \langle\boldsymbol\alpha^\iota, \chi^{\iota,\star}_\ell\rangle + I^\iota_\ell\beta^\iota - (I^\iota_\ell)^2R^{\iota,1} = -I^\iota_\ell I^{\iota,2,\star}_\ell R^{\iota,2} + \alpha^\iota I^\iota_\ell z^{\iota,\star}_\ell + I^\iota_\ell\beta^\iota - (I^\iota_\ell)^2R^{\iota,1}.$$
Considering further affine costs $h_\iota(S) := \hat\alpha_\iota S + \hat\beta_\iota$ for the power provided by the batteries, the overall cost $C$ becomes
$$C(P,\eta) = g(P) + (\eta*\check\alpha)^\top\xi_\ell + \eta^\top\check\beta + c\big(\mathbf 1^\top P + (\eta*\boldsymbol\alpha)^\top\xi_\ell + \eta^\top\boldsymbol\beta - D\big)^2, \tag{6.2}$$
where $*$ denotes the Khatri-Rao product (cf. Section 2) and
$$g(P) := \sum_{j=1}^{n_1}g_j(P^j), \qquad \boldsymbol\alpha := (\boldsymbol\alpha^1,\dots,\boldsymbol\alpha^{n_2}), \qquad \boldsymbol\beta := (\boldsymbol\beta^1,\dots,\boldsymbol\beta^{n_2}),$$
$$\check\alpha := (\hat\alpha_1\boldsymbol\alpha^1,\dots,\hat\alpha_{n_2}\boldsymbol\alpha^{n_2}), \qquad \check\beta := (\hat\alpha_1\boldsymbol\beta^1 + \hat\beta_1,\dots,\hat\alpha_{n_2}\boldsymbol\beta^{n_2} + \hat\beta_{n_2}).$$
Using the equivalent description (6.2) for $C$ and recalling the lower and upper bounds $P^j_{\min}$ and $P^j_{\max}$ for each generator's power, we formulate the DRO power dispatch problem
$$\inf_{\eta,P}\ f_\eta(P) + \sup_{P_{\xi_\ell}\in\mathcal P^N}\mathbb E_{P_{\xi_\ell}}\big[h_\eta(P,\xi_\ell)\big] \tag{6.3a}$$
$$\text{s.t.}\quad P^j_{\min}\le P^j\le P^j_{\max}\ \ \forall j\in[1:n_1], \tag{6.3b}$$
with the ambiguity set $\mathcal P^N$ introduced above and
$$f_\eta(P) := g(P) + cP^\top\mathbf 1\mathbf 1^\top P + 2c(\eta^\top\boldsymbol\beta - D)\mathbf 1^\top P + c(\eta^\top\boldsymbol\beta - D)^2 + \eta^\top\check\beta,$$
$$h_\eta(P,\xi_\ell) := c\,\xi_\ell^\top(\eta*\boldsymbol\alpha)(\eta*\boldsymbol\alpha)^\top\xi_\ell + \big(2c(\mathbf 1^\top P + \eta^\top\boldsymbol\beta - D)(\eta*\boldsymbol\alpha)^\top + (\eta*\check\alpha)^\top\big)\xi_\ell.$$
This formulation minimizes the worst-case expected cost over the plausible distributions of $\xi_\ell$ at time $\ell$.
6.4. Tractable reformulation of the DRO problem. Our next goal is to obtain a tractable reformulation of the optimization problem (6.3). To this end, we first provide an equivalent description for the inner maximization in (6.3), which is carried out over a space of probability measures. Exploiting strong duality (see [20, Corollary 2(i)] or [5, Remark 1]) and recalling that our ambiguity set $\mathcal P^N$ is based on the 2-Wasserstein distance, we equivalently write the inner maximization problem $\sup_{P_{\xi_\ell}\in\mathcal P^N}\mathbb E_{P_{\xi_\ell}}\big[h_\eta(P,\xi_\ell)\big]$ as
$$\inf_{\lambda\ge0}\Big\{\lambda\psi_N^2 + \frac1N\sum_{i=1}^N\sup_{\xi_\ell\in\Xi}\big\{h_\eta(P,\xi_\ell) - \lambda\|\xi_\ell - \hat\xi^i_\ell\|^2\big\}\Big\}, \tag{6.4}$$
where $\psi_N\equiv\psi_N(\beta)$ is the radius of the ambiguity ball, $\Xi\subset\mathbb R^{2n_2}$ is the support of the batteries' unknown state distribution, and the $\hat\xi^i_\ell$ are the estimated states of their realizations. We slightly relax the problem by allowing the ambiguity ball to contain all distributions within distance $\psi_N$ of $\hat P^N_{\xi_\ell}$ that are supported on $\mathbb R^{2n_2}$, and not necessarily on $\Xi$. Thus, we first look to solve, for each estimated state $\hat\xi^i_\ell$, the optimization problem
$$\sup_{\xi_\ell\in\mathbb R^{2n_2}}\big\{h_\eta(P,\xi_\ell) - \lambda\|\xi_\ell - \hat\xi^i_\ell\|^2\big\},$$
which is written as
$$\begin{aligned}
&\sup_{\xi_\ell\in\mathbb R^{2n_2}}\Big\{\xi_\ell^\top A\xi_\ell + \big(2c(\mathbf 1^\top P + \eta^\top\boldsymbol\beta - D)(\eta*\boldsymbol\alpha)^\top + (\eta*\check\alpha)^\top\big)\xi_\ell - \lambda(\xi_\ell - \hat\xi^i_\ell)^\top(\xi_\ell - \hat\xi^i_\ell)\Big\}\\
&= -\lambda(\hat\xi^i_\ell)^\top\hat\xi^i_\ell + \sup_{\xi_\ell\in\mathbb R^{2n_2}}\Big\{\xi_\ell^\top(A - \lambda I_{2n_2})\xi_\ell + \big(2c(\mathbf 1^\top P + \eta^\top\boldsymbol\beta - D)(\eta*\boldsymbol\alpha)^\top + (\eta*\check\alpha)^\top + 2\lambda(\hat\xi^i_\ell)^\top\big)\xi_\ell\Big\}\\
&= -\lambda(\hat\xi^i_\ell)^\top\hat\xi^i_\ell + \sup_{\xi_\ell\in\mathbb R^{2n_2}}\big\{\xi_\ell^\top(A - \lambda I_{2n_2})\xi_\ell + (r^i)^\top\xi_\ell\big\},
\end{aligned}$$
where $r^i\equiv r^i_\eta(P,\lambda) := 2c(\mathbf 1^\top P + \eta^\top\boldsymbol\beta - D)(\eta*\boldsymbol\alpha) + \eta*\check\alpha + 2\lambda\hat\xi^i_\ell$ and $A\equiv A_\eta := c(\eta*\boldsymbol\alpha)(\eta*\boldsymbol\alpha)^\top$ is a symmetric positive semi-definite matrix with diagonalization $A = Q^\top DQ$, where the eigenvalues decrease along the diagonal. Hence, we get
$$\sup_{\xi_\ell\in\mathbb R^{2n_2}}\big\{\xi_\ell^\top(A - \lambda I_{2n_2})\xi_\ell + (r^i)^\top\xi_\ell\big\} = \sup_{\xi_\ell\in\mathbb R^{2n_2}}\big\{\xi_\ell^\top(Q^\top DQ - Q^\top\lambda I_{2n_2}Q)\xi_\ell + (r^i)^\top\xi_\ell\big\} = \sup_{\xi\in\mathbb R^{2n_2}}\big\{\xi^\top(D - \lambda I_{2n_2})\xi + (\bar r^i)^\top\xi\big\},$$
with $\bar r^i := Qr^i$, and, denoting by $\lambda_{\max}(A)$ the maximum eigenvalue of $A$, we have
$$\sup_{\xi\in\mathbb R^{2n_2}}\big\{\xi^\top(D - \lambda I_{2n_2})\xi + (\bar r^i)^\top\xi\big\} = \begin{cases}\infty & \text{if } 0\le\lambda<\lambda_{\max}(A),\\ \frac14(\bar r^i)^\top(\lambda I_{2n_2}-D)^{-1}\bar r^i & \text{if } \lambda>\lambda_{\max}(A).\end{cases} \tag{6.5}$$
To obtain this, we exploited that $\mathcal Q(\xi) := \xi^\top(D - \lambda I_{2n_2})\xi + (\bar r^i)^\top\xi$ is maximized (for $\lambda > \lambda_{\max}(A)$) when
$$\nabla\mathcal Q(\xi^\star) = 0 \iff 2(D - \lambda I_{2n_2})\xi^\star + \bar r^i = 0 \iff \xi^\star = \tfrac12(\lambda I_{2n_2}-D)^{-1}\bar r^i,$$
which gives the optimal value $\mathcal Q(\xi^\star) = \frac14(\bar r^i)^\top(\lambda I_{2n_2}-D)^{-1}\bar r^i$. Note that we do not need to specify the value of the expression in (6.5) for $\lambda = \lambda_{\max}(A)$. In particular, since the function we
minimize in (6.4) is convex in $\lambda$, the inner part of the DRO problem is equivalently written as
$$\inf_{\lambda>\lambda_{\max}(A)}\Big\{\lambda\Big(\psi_N^2 - \frac1N\sum_{i=1}^N(\hat\xi^i_\ell)^\top\hat\xi^i_\ell\Big) + \frac1{4N}\sum_{i=1}^N\bar r^i_\eta(P,\lambda)^\top(\lambda I_{2n_2}-D)^{-1}\bar r^i_\eta(P,\lambda)\Big\}.$$
Taking further into account that
$$(\lambda I_{2n_2}-D)^{-1} = \mathrm{diag}\Big(\frac1{\lambda-\lambda_{\max}(A)},\dots,\frac1{\lambda-\lambda_{\min}(A)}\Big),$$
as well as the constraints (6.3b) on the decision variable $P$, the overall DRO problem is reformulated as
$$\min_\eta\ \inf_{P,\lambda}\ f_\eta(P) + \lambda\Big(\psi_N^2 - \frac1N\sum_{i=1}^N(\hat\xi^i_\ell)^\top\hat\xi^i_\ell\Big) + \frac1{4N}\sum_{i=1}^N\bar r^i_\eta(P,\lambda)^\top\mathrm{diag}\Big(\frac1{\lambda-\lambda_{\max}(A)},\dots,\frac1{\lambda-\lambda_{\min}(A)}\Big)\bar r^i_\eta(P,\lambda) \tag{6.6a}$$
$$\text{subject to}\quad P^j_{\min}\le P^j\le P^j_{\max}\ \ \forall j\in[1:n_1], \qquad \lambda > \lambda_{\max}(A). \tag{6.6b}$$
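For fixed $\eta$ and $P$, the reformulated inner problem is a one-dimensional convex minimization over $\lambda > \lambda_{\max}(A)$ once $A$ is diagonalized; a sketch with hypothetical problem data standing in for the dispatch quantities:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data for the inner dual (6.4)/(6.6) with fixed eta and P:
# h(xi) = xi^T A xi + r_lin^T xi, A rank-one PSD as c(eta*alpha)(eta*alpha)^T.
n = 6                                    # 2*n_2 state dimension
u = rng.normal(size=(n, 1))
A = 0.5 * (u @ u.T)                      # symmetric positive semi-definite
r_lin = rng.normal(size=n)               # linear coefficient of h_eta
xi_hat = rng.normal(size=(20, n))        # estimated states \hat{xi}^i_ell
psi = 0.3                                # ambiguity radius

eigval, Qmat = np.linalg.eigh(A)         # A = Qmat diag(eigval) Qmat^T
lam_max = eigval.max()

def dual_objective(lam):
    # lam * psi^2 + (1/N) sum_i sup_xi { h(xi) - lam ||xi - xi_hat_i||^2 }
    total = lam * psi**2
    for x in xi_hat:
        r = r_lin + 2.0 * lam * x                     # r^i_eta(P, lam)
        rb = Qmat.T @ r                               # rotate into the eigenbasis
        total += (-lam * x @ x
                  + 0.25 * np.sum(rb**2 / (lam - eigval))) / len(xi_hat)
    return total

lams = lam_max + np.logspace(-2, 2, 400)  # grid over (lam_max, infinity)
vals = [dual_objective(l) for l in lams]
print("inner worst-case value:", min(vals))
```

The objective blows up at both ends of the interval (the $1/(\lambda-\lambda_{\max})$ term near the left endpoint, the $\lambda\psi_N^2$ term at infinity), so its minimizer lies in the interior, as the convexity argument above guarantees.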
6.5. Simulation results. For the simulations we consider n1 = 4 generators and n2 = 3796
batteries with the same characteristics. We assume that the distributions of each initial797
SoC zι0 and current Iι,20 are known to be supported on the intervals [0.45, 0.9] and [1.5, 1.7],798
respectively. The true SoC distribution for batteries 2 and 3 at time zero is Pz20 = Pz30 =799
U [0.45, 0.65] (U denotes uniform distribution). On the other hand, the provider of battery 1800
has access to the distinct batteries 1A and 1B and selects randomly one among them with801
probabilities 0.9 and 0.1, respectively. The SoC distribution of battery 1A at time zero is802
Pz1A0= U [0.45, 0.65], whereas that of battery 1B is Pz1B0
= U [0.84, 0.86]. Thus, we get the803
bimodal distribution Pz10 = 0.9U [0.45, 0.65] + 0.1U [0.84, 0.86], which is responsible for non-804
negligible empirical distribution variations, since for small numbers of samples, it can fairly805
frequently occur that the relative percentage of samples from 1B deviates significantly from its806
expected one. On the other hand, we assume that the true initial currents Iι,20 of all batteries807
are fixed to 1.6308, namely, PI1,20
= PI2,20
= PI3,20
= δ1.6308. For the measurements, we consider808
the Gaussian mixture noise model Pvk = 0.5N (0.01, 0.012)+0.5N (−0.01, 0.012) with N (µ, σ2)809
denoting the normal distribution with mean µ and variance σ2.810
To compute the ambiguity radius for the reformulated DRO problem (6.6), we specify its nominal and noise components $\varepsilon_N(\beta_{\mathrm{nom}}, \rho_{\xi_\ell})$ and $\varepsilon_N(\beta_{\mathrm{ns}})$, where, due to Proposition 4.1, $\rho_{\xi_\ell}$ can be selected as half the diameter of any set containing the support of $P_{\xi_\ell}$ in the infinity norm. It follows directly from the specific dynamics of the batteries that $\rho_{\xi_\ell}$ does not exceed half the diameter of the initial conditions' distribution support, which is isometric to $[0.45, 0.9]^3 \times [1.5, 1.7]^3 \subset \mathbb{R}^6$. Hence, using Proposition 19 in the online version [9] with $p = 2$, $d = 6$, and $\rho_{\xi_\ell} = 0.225$, we obtain
\[
\varepsilon_N(\beta_{\mathrm{nom}}, \rho_{\xi_\ell}) = 4.02\, N^{-1/6} + 1.31\, (\ln \beta_{\mathrm{nom}}^{-1})^{1/4} N^{-1/4}.
\]
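The displayed expression is straightforward to evaluate numerically; the sketch below (the confidence level $\beta_{\mathrm{nom}} = 0.1$ is an illustrative choice of ours, not from the text) shows the slow decay driven by the $N^{-1/6}$ term:

```python
import math

def eps_nominal(N, beta_nom):
    """Nominal ambiguity-radius component from the closed-form expression
    above (dimension d = 6, Wasserstein order p = 2, rho = 0.225)."""
    return 4.02 * N ** (-1 / 6) + 1.31 * (math.log(1 / beta_nom)) ** 0.25 * N ** (-1 / 4)

# The radius shrinks slowly with the sample size N.
for N in (10, 40, 160):
    print(N, round(eps_nominal(N, beta_nom=0.1), 3))
```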
This manuscript is for review purposes only.
24 D. BOSKOS, J. CORTES, AND S. MARTINEZ
Figure 3. Results from 100 realizations of the power dispatch problem with N = 10 independent samples used for each realization. We compute the optimizers of the SAA and DRO problems, plot their corresponding optimal values (termed "SAA cost" and "DRO cost"), and also evaluate their performance with respect to the true distribution ("true cost with SAA optimizer" and "true cost with DRO optimizer"). With the exception of two realizations (whose DRO cost and true cost with the DRO optimizer are framed inside black boxes), the DRO cost is above the true cost of the DRO optimizer, namely, this happens with high probability. From the plot, it is also clear that the SAA solution tends to over-promise, since its value is most frequently below the true cost of the SAA optimizer.
To determine the noise radius, we first compute lower and upper bounds $m_v$ and $M_v$ for the $L^2$ norm of the Gaussian mixture noise $v_k$ and an upper bound $C_v$ for its $\psi_2$ norm. Denoting by $\mathbb{E}_P$ the integral with respect to the distribution $P$, we have for $P_{v_k} = 0.5\,\mathcal{N}(\mu_1, \sigma_1^2) + 0.5\,\mathcal{N}(\mu_2, \sigma_2^2)$ that
\[
\|v_k\|_2^2 = \mathbb{E}_{\frac{1}{2}(P_1 + P_2)}\big[v_k^2\big] = \tfrac{1}{2}\big(\mu_1^2 + \sigma_1^2 + \mu_2^2 + \sigma_2^2\big),
\]
where $P_1 = \mathcal{N}(\mu_1, \sigma_1^2)$, $P_2 = \mathcal{N}(\mu_2, \sigma_2^2)$, and we used the fact that $\mathbb{E}_{P_i}\big[v_k^2\big] = \mu_i^2 + \mathbb{E}_{P_i}\big[(v_k - \mu_i)^2\big] = \mu_i^2 + \sigma_i^2$. Hence, in our case, where $\mu_i = \sigma_i = 0.01$, we can pick $m_v = M_v = 0.01\sqrt{2}$. Further, using Proposition 21 from the online version [9], we can select $C_v = 0.01(\sqrt{8/3} + \sqrt{\ln 2})$. To perform the state estimation from the output samples we used a Kalman filter. Its initial condition covariance matrix corresponds to independent Gaussian distributions for each SoC $z_0^\iota$ and current $I_0^{\iota,2}$ with a standard deviation of the order of their assumed support. We also select the same covariance as in the components of the Gaussian mixture noise to model the measurement noise of the Kalman filter. Using the dynamics of the filter and the values of $m_v$, $M_v$, and $C_v$ above, we obtain from (4.9b), (4.10a)-(4.10c), and (4.14) the constants $M_w = 0.325$, $M_v = 0.008$, and $R = 2.72$ for the expression of the noise radius. In particular, we have from Proposition 4.5 that $\varepsilon_N(\beta_{\mathrm{ns}}) = 0.47 + 0.0113\sqrt{(74.98/N)\ln(2/\beta_{\mathrm{ns}})}$, and the overall radius is
\[
\psi_N(\beta) = 0.47 + 4.02\, N^{-1/6} + 1.31\, (\ln \beta_{\mathrm{nom}}^{-1})^{1/4} N^{-1/4} + 0.0973\, \big(\ln(2 \beta_{\mathrm{ns}}^{-1})\big)^{1/2} N^{-1/2}. \tag{6.7}
\]
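A small script (the confidence levels are illustrative choices of ours) can evaluate (6.7) and verify the moment computation behind $m_v = M_v$:

```python
import math

def psi_N(N, beta_nom, beta_ns):
    """Overall ambiguity radius (6.7): a constant noise floor plus
    decaying nominal and noise sampling terms."""
    return (0.47
            + 4.02 * N ** (-1 / 6)
            + 1.31 * (math.log(1 / beta_nom)) ** 0.25 * N ** (-1 / 4)
            + 0.0973 * (math.log(2 / beta_ns)) ** 0.5 * N ** (-1 / 2))

# L2 norm of the Gaussian mixture noise: with mu_i = sigma_i = 0.01,
# ||v_k||_2 = sqrt(0.5 * (mu1^2 + s1^2 + mu2^2 + s2^2)) = 0.01 * sqrt(2).
mv = math.sqrt(0.5 * (0.01 ** 2 + 0.01 ** 2 + 0.01 ** 2 + 0.01 ** 2))

for N in (10, 40, 160):
    print(N, round(psi_N(N, beta_nom=0.1, beta_ns=0.1), 3))
```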
We assume that the energy cost of the generators is lower than that of the batteries and select the quadratic power generation cost $g(P) = 0.25 \sum_{j=1}^{4} (P_j - 0.1)^2$ and the same
Figure 4. Analogous results to those of Figure 3, from 100 realizations with (a) N = 40 and (b) N = 160 independent samples, and the ambiguity radius tuned so that the same confidence level is preserved. In both cases, the DRO cost is above the true cost of the DRO optimizer with high probability (in fact, always). Furthermore, the cost of the DRO optimizer (red star) is strictly better than the true cost of the SAA one (green circle) for a considerable number of realizations (highlighted in the illustrated boxes).
lower/upper power thresholds $P_j^{\min} = 0.2$ and $P_j^{\max} = 0.5$ for all generators. For the batteries, we pick the same resistances $R_{\iota,1} = 0.34$ and $R_{\iota,2} = 0.17$, and we take $a_\iota = 0.945$ and $I_k^\iota = 8$ for all times. We nevertheless use different linear costs $h_\iota(S) = \alpha_\iota S$ for their injected powers, with $\alpha_1 = 1$ and $\alpha_2 = \alpha_3 = 1.3$, since battery 1 is less reliable due to the large SoC fluctuation between its two modes.
We solve 100 independent realizations of the overall economic dispatch problem. For each of them, we generate independent samples from the batteries' initial condition distributions and solve the associated sample average approximation (SAA) and DRO problems for N = 10, N = 40, and N = 160 samples, respectively, using CVX [21]. It is worth noting that the radius $\psi_N$ given by (6.7) is rather conservative. The main reasons for this are 1) conservativeness of the concentration of measure results used for the derivation of the nominal radius, 2) lack of homogeneity of the distribution's support (the a priori known support of the $I_0^{\iota,2}$ components is much smaller than that of the $z_0^\iota$ ones), 3) independence of the batteries' individual distributions, which we have not exploited, and 4) conservative upper bounds for
the estimation error. Although there is room to sharpen all these aspects, doing so requires multiple additional contributions and lies beyond the scope of the paper. Nevertheless, formula (6.7) gives qualitative intuition about the decay rates of the ambiguity radius. In particular, it indicates that, under the same confidence level and for small sample sizes, an ambiguity radius proportional to $N^{-1/4}$ is a reasonable choice. Based on this, we selected the ambiguity radii 0.05, 0.0354, and 0.025 for N = 10, N = 40, and N = 160, respectively. The associated simulation results are shown in Figures 3, 4(a), and 4(b), respectively. There we plot the optimal values of the SAA and DRO problems (termed "SAA cost" and "DRO cost") and provide the expected performance of their respective decisions with respect to the true distribution ("true cost with SAA optimizer" and "true cost with DRO optimizer"). We observe that in all three cases the DRO value is above the true cost of the DRO optimizer for nearly all realizations (and for all of them when N is 40 or 160), which verifies the out-of-sample guarantees that we seek in DRO formulations [17, Theorem 3.5]. In addition, when solving the problem with 40 or 160 samples, we witness a clear superiority of the DRO decision over that of the non-robust SAA, as it considerably improves the true cost for a significant number of realizations (cf. Figure 4).
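The selected radii indeed follow the $N^{-1/4}$ scaling: starting from 0.05 at N = 10, the two larger radii are recovered (up to rounding) by a simple rescaling:

```python
# Scale an initial radius of 0.05 at N0 = 10 by (N0/N)^(1/4); this reproduces
# the radii 0.0354 and 0.025 used for N = 40 and N = 160.
def radius(N, r0=0.05, N0=10):
    return r0 * (N0 / N) ** 0.25

print(radius(40), radius(160))
```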
6.6. Discussion. The SAA solution tends to consistently promise a better outcome than what the true distribution reveals for the same decision (e.g., the magenta circle usually lies below the green circle in all figures). This rarely happens for the DRO solution, and when it does, it is only by a small margin. This makes the DRO approach preferable to the SAA one in the context of power systems operations, where honoring commitments at a much higher cost than anticipated may result in significant losses, and not fulfilling commitments may lead to penalties from the system operator.
7. Conclusions. We have constructed high-confidence ambiguity sets for dynamic random variables using partial-state measurements from independent realizations of their evolution. In our model, both the dynamics and the measurements are subject to disturbances with unknown probability distributions. The ambiguity sets are built using an observer to estimate the full state of each realization and leveraging concentration of measure inequalities. For systems that are either time-invariant and detectable, or uniformly observable, we have established uniform boundedness of the ambiguity radius. To aid the associated probabilistic guarantees, we also provided auxiliary concentration of measure results. Future research will include the consideration of robust state estimation criteria to mitigate the effect of noise on the ambiguity radius, the extension of the results to nonlinear dynamics, and the construction of ambiguity sets with information about the moments.
8. Appendix. Here we give proofs of various results of the paper.
Proof of Lemma 2.1. Since $X$ and $Y$ are independent, their joint distribution $P_{(X,Y)}$ is the product measure $P_X \otimes P_Y$ of the individual distributions $P_X$ and $P_Y$. Thus, from the Fubini theorem [1, Theorem 2.6.5] and integrability of $g$, $k$ we get
\[
\mathbb{E}[g(X,Y)] = \int_{\mathbb{R}^{n_1} \times \mathbb{R}^{n_2}} g(x,y)\, dP_{(X,Y)} = \int_{\mathbb{R}^{n_1}} \int_{\mathbb{R}^{n_2}} g(x,y)\, dP_Y\, dP_X = \int_{\mathbb{R}^{n_1}} \mathbb{E}[g(x,Y)]\, dP_X
\]
\[
= \int_{K} \mathbb{E}[g(x,Y)]\, dP_X \ge \int_{K} k(x)\, dP_X = \int_{\mathbb{R}^{n_1}} k(x)\, dP_X = \mathbb{E}[k(X)],
\]
which concludes the proof.
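The Fubini step can be checked numerically on small discrete distributions (the atoms, probabilities, and test function below are arbitrary choices of ours):

```python
from itertools import product

# Two small independent discrete distributions: {atom: probability}.
X = {0.0: 0.3, 1.0: 0.7}
Y = {-1.0: 0.5, 2.0: 0.5}
g = lambda x, y: (x + y) ** 2

# Expectation over the joint product measure ...
joint = sum(px * py * g(x, y)
            for (x, px), (y, py) in product(X.items(), Y.items()))

# ... equals the iterated (Fubini) expectation E_X[ E_Y[ g(x, Y) ] ].
iterated = sum(px * sum(py * g(x, y) for y, py in Y.items())
               for x, px in X.items())
```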
Proof of Lemma 4.2. Using [8, Lemma A.2] to bound the Wasserstein distance of two discrete distributions, we get $W_p(P_{\xi_\ell}^N, \widehat{P}_{\xi_\ell}^N) \le \big(\frac{1}{N} \sum_{i=1}^N \|\xi_\ell^i - \widehat{\xi}_\ell^i\|^p\big)^{1/p} = \big(\frac{1}{N} \sum_{i=1}^N \|e_\ell^i\|^p\big)^{1/p}$. From (4.3), we have
\[
\|e_\ell^i\| = \Big\| \Psi_\ell z^i + \sum_{k=1}^{\ell} \big( \Psi_{\ell,\ell-k+1} G_{\ell-k} \omega_{\ell-k}^i + \Psi_{\ell,\ell-k+1} K_{\ell-k} v_{\ell-k}^i \big) \Big\|
\]
\[
\le \|\Psi_\ell\| \|z^i\| + \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} G_{\ell-k}\| \|\omega_{\ell-k}^i\| + \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \|v_{\ell-k}^i\|_1 =: M(z^i, \omega^i) + E(v^i),
\]
with $E(v^i) \equiv E^i$ given in the statement. Since $(a+b)^p \le 2^{p-1}(a^p + b^p)$ for $a, b \ge 0$ and $p \ge 1$,
\[
W_p(P_{\xi_\ell}^N, \widehat{P}_{\xi_\ell}^N) \le \Big( \frac{2^{p-1}}{N} \sum_{i=1}^N \big( M(z^i, \omega^i)^p + (E^i)^p \big) \Big)^{1/p}.
\]
Next, using $(a+b)^{1/p} \le a^{1/p} + b^{1/p}$ for $a, b \ge 0$ and $p \ge 1$, we have
\[
W_p(P_{\xi_\ell}^N, \widehat{P}_{\xi_\ell}^N) \le \Big( \frac{2^{p-1}}{N} \sum_{i=1}^N M(z^i, \omega^i)^p \Big)^{1/p} + \Big( \frac{2^{p-1}}{N} \sum_{i=1}^N (E^i)^p \Big)^{1/p}. \tag{8.1}
\]
Finally, since $(\mathbf{z}, \boldsymbol{\omega}) \in B_\infty^{Nd}(\rho_{\xi_0}) \times B_\infty^{N \ell q}(\rho_w)$, we get $M(z^i, \omega^i) \le \|\Psi_\ell\| \sqrt{d}\, \|z^i\|_\infty + \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} G_{\ell-k}\| \sqrt{q}\, \|\omega_{\ell-k}^i\|_\infty \le M_w$. This combined with (8.1) yields (4.9a).
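The two elementary inequalities invoked above are easy to verify on a grid (the grid values below are arbitrary):

```python
# Grid check of:
#   (a + b)^p     <= 2^(p-1) * (a^p + b^p), and
#   (a + b)^(1/p) <= a^(1/p) + b^(1/p),
# for a, b >= 0 and p >= 1.
vals = [0.0, 0.1, 0.5, 1.0, 2.0, 10.0]
ps = [1.0, 1.5, 2.0, 3.0]
ok = all(
    (a + b) ** p <= 2 ** (p - 1) * (a ** p + b ** p) + 1e-12
    and (a + b) ** (1 / p) <= a ** (1 / p) + b ** (1 / p) + 1e-12
    for a in vals for b in vals for p in ps
)
```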
Proof of Lemma 4.3. From H4 in Assumption 3.2, we obtain for each summand in (4.9c)
\[
\big\| \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \|v_{\ell-k}^i\|_1 \big\|_{\psi_p} \le \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \big( \|v_{\ell-k,1}^i\|_{\psi_p} + \cdots + \|v_{\ell-k,r}^i\|_{\psi_p} \big) \le C_v r \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|.
\]
Hence, we deduce that
\[
\|E^i\|_{\psi_p} \le \sum_{k=1}^{\ell} \big\| \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \|v_{\ell-k}^i\|_1 \big\|_{\psi_p} \le C_v r \sum_{k=1}^{\ell} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|.
\]
For the $L^p$ bounds, note that $\|E^i\|_p = \big\| \sum_{k \in [1:\ell],\, l \in [1:r]} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| |v_{\ell-k,l}^i| \big\|_p$. Thus, from the inequality $\|\sum_i c_i X_i\|_p \le \sum_i c_i \|X_i\|_p$, which holds for any nonnegative $c_i$ and $X_i$ in $L^p$,
\[
\|E^i\|_p \le \sum_{k \in [1:\ell],\, l \in [1:r]} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\| \|v_{\ell-k,l}^i\|_p,
\]
which, by the upper bound in H4 of Assumption 3.2, implies (4.10a). For the other bound, we exploit linearity of the expectation and the inequality $\big(\sum_i c_i\big)^p \ge \sum_i c_i^p$, which holds for any nonnegative $c_i$, to get
\[
\big( \mathbb{E}\big[(E^i)^p\big] \big)^{1/p} \ge \Big( \sum_{k \in [1:\ell],\, l \in [1:r]} \|\Psi_{\ell,\ell-k+1} K_{\ell-k}\|^p\, \mathbb{E}\big[ |v_{\ell-k,l}^i| \big]^p \Big)^{1/p}.
\]
Thus, from the lower bound in H4 of Assumption 3.2 we also obtain (4.10c).
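Both inequalities used in this proof can be sanity-checked numerically (the weights, samples, and distributions below are arbitrary):

```python
import random

random.seed(1)

# 1) Power-sum inequality: (sum c_i)^p >= sum c_i^p for nonnegative c_i, p >= 1.
cs = [0.3, 1.2, 0.0, 2.5]
ok_power = all(sum(cs) ** p >= sum(c ** p for c in cs) - 1e-12
               for p in (1, 1.5, 2, 3))

# 2) Weighted triangle inequality in L^p on an empirical measure:
#    || c1*X1 + c2*X2 ||_p <= c1*||X1||_p + c2*||X2||_p.
def lp(samples, p):
    """Empirical L^p norm: (mean of |s|^p)^(1/p)."""
    return (sum(abs(s) ** p for s in samples) / len(samples)) ** (1 / p)

X1 = [random.gauss(0, 1) for _ in range(2000)]
X2 = [random.gauss(0.5, 2) for _ in range(2000)]
c1, c2, p = 0.7, 1.3, 2
lhs = lp([c1 * a + c2 * b for a, b in zip(X1, X2)], p)
rhs = c1 * lp(X1, p) + c2 * lp(X2, p)
```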
We next prove Proposition 4.4, along the lines of the proof of [41, Theorem 3.1.1], which considers the special case of sub-Gaussian distributions. We rely on the following concentration inequality [41, Corollary 2.8.3].

Proposition 8.1 (Bernstein inequality). Let $X_1, \ldots, X_N$ be scalar, mean-zero, sub-exponential, independent random variables. Then, for every $t \ge 0$ we have
\[
\mathbb{P}\Big( \Big| \frac{1}{N} \sum_{i=1}^N X_i \Big| \ge t \Big) \le 2 \exp\Big( -c' \min\Big\{ \frac{t^2}{R^2}, \frac{t}{R} \Big\} N \Big),
\]
where $c' = 1/10$ and $R := \max_{i \in [1:N]} \|X_i\|_{\psi_1}$.

The precise constant $c'$ above is not specified in [41], but we provide an independent proof of this result in the online version [9, Section 8.2].
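A Monte Carlo sanity check of the stated bound, using centered exponential variables (a standard sub-exponential family; the value R = 3 is a crude upper bound on their $\psi_1$ norm chosen by us, not computed from the definition):

```python
import math
import random

random.seed(0)

cprime, R, N, trials = 0.1, 3.0, 100, 5000

def bernstein_bound(t):
    """Right-hand side of the Bernstein inequality above."""
    return 2 * math.exp(-cprime * min(t ** 2 / R ** 2, t / R) * N)

# Compare the empirical tail frequency of the sample mean of centered
# Exp(1) variables against the bound for a few thresholds t.
for t in (0.3, 0.5, 1.0):
    hits = sum(
        abs(sum(random.expovariate(1.0) - 1.0 for _ in range(N)) / N) >= t
        for _ in range(trials)
    )
    freq = hits / trials
    assert freq <= bernstein_bound(t)
```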
Proof of Proposition 4.4. Note that each random variable $X_i^p - 1$ is mean zero by assumption. Additionally, we have that $\|X_i^p - 1\|_{\psi_1} \le \|X_i^p\|_{\psi_1} + \|1\|_{\psi_1} = \|X_i\|_{\psi_p}^p + 1/\ln 2 \le R$, where we took into account that
\[
\mathbb{E}\big[\psi_1(X_i^p / t^p)\big] = \mathbb{E}\big[\psi_p(X_i / t)\big] \;\Rightarrow\; \|X_i^p\|_{\psi_1} = \|X_i\|_{\psi_p}^p,
\]
and the following fact, shown after the proof.

Fact I. For any constant random variable $X = \mu \in \mathbb{R}$, it holds that $\|X\|_{\psi_p} = |\mu| / (\ln 2)^{1/p}$.

Thus, we get from Proposition 8.1 that
\[
\mathbb{P}\Big( \Big| \frac{1}{N} \sum_{i=1}^N X_i^p - 1 \Big| \ge t \Big) \le 2 \exp\Big( -\frac{c' N}{R^2} \min\{t^2, t\} \Big), \tag{8.2}
\]
where we used the fact that $R > 1$. We will further leverage the following facts, shown after the proof of the proposition.

Fact II. For all $p \ge 1$ and $z \ge 0$, it holds that $|z - 1| \ge \delta \Rightarrow |z^p - 1| \ge \max\{\delta, \delta^p\}$.

Fact III. For any $\delta \ge 0$, if $u = \max\{\delta, \delta^p\}$, then $\min\{u, u^2\} = \alpha_p(\delta)$, with $\alpha_p$ as given by (4.12).
By exploiting Fact II, we get
\[
\mathbb{P}\Big( \Big| \Big( \frac{1}{N} \sum_{i=1}^N X_i^p \Big)^{1/p} - 1 \Big| \ge t \Big) \le \mathbb{P}\Big( \Big| \frac{1}{N} \sum_{i=1}^N X_i^p - 1 \Big| \ge \max\{t, t^p\} \Big)
\]
\[
\le 2 \exp\Big( -\frac{c' N}{R^2} \min\big\{ \max\{t, t^p\}^2, \max\{t, t^p\} \big\} \Big).
\]
Thus, since $\mathbb{P}(|Y| \ge t) \ge \mathbb{P}(Y \ge t)$ for any random variable $Y$, we obtain (4.11) from Fact III and conclude the proof.
Proof of Fact I. From the $\psi_p$ norm definition, $\|X\|_{\psi_p} = \inf\big\{ t > 0 \,\big|\, \mathbb{E}\big[e^{(|X|/t)^p}\big] \le 2 \big\} = \inf\big\{ t > 0 \,\big|\, t \ge |\mu| / (\ln 2)^{1/p} \big\} = |\mu| / (\ln 2)^{1/p}$, which establishes the result.

Proof of Fact II. Assume first that $z < 1$. Then, we have that $|z^p - 1| = 1 - z^p \ge 1 - z \ge \delta \ge \delta^p$. Next, let $z \ge 1$. Then, we get $|z^p - 1| = z^p - 1 \ge z - 1 \ge \delta$. In addition, when $\delta^p \ge \delta$, namely, when $\delta \ge 1$, we have that $z^p - (z-1)^p \ge 1$, and hence, $|z^p - 1| = z^p - 1 \ge (z-1)^p \ge \delta^p$.

Proof of Fact III. We consider two cases. Case (i): $0 \le \delta \le 1 \Rightarrow \delta \ge \delta^p \Rightarrow u = \max\{\delta, \delta^p\} = \delta$. Then $\min\{u, u^2\} = \min\{\delta, \delta^2\} = \delta^2$. Case (ii): $\delta > 1 \Rightarrow \delta \le \delta^p \Rightarrow u = \max\{\delta, \delta^p\} = \delta^p$. Then $\min\{u, u^2\} = \min\{\delta^p, \delta^{2p}\} = \delta^p$. Thus, we get that $\min\{u, u^2\} = \alpha_p(\delta)$ for all $\delta \ge 0$.
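The three facts can be verified on a numerical grid (all grid values are arbitrary choices):

```python
import math

# Fact I: for constant X = mu, ||X||_{psi_p} solves exp((|mu|/t)^p) = 2,
# i.e. t = |mu| / (ln 2)^(1/p).
mu, p = 0.7, 2.0
t = mu / math.log(2) ** (1 / p)
fact1 = abs(math.exp((mu / t) ** p) - 2.0) < 1e-9

# Fact II: |z - 1| >= delta implies |z^p - 1| >= max(delta, delta^p).
# Fact III: with u = max(delta, delta^p), min(u, u^2) equals delta^2
# for delta <= 1 and delta^p for delta > 1.
fact2 = fact3 = True
for p in (1.0, 1.5, 2.0, 3.0):
    for delta in (0.0, 0.2, 0.8, 1.0, 1.3, 2.0):
        u = max(delta, delta ** p)
        alpha = delta ** 2 if delta <= 1 else delta ** p
        fact3 &= abs(min(u, u * u) - alpha) < 1e-12
        for z in (x / 50 for x in range(0, 201)):  # z in [0, 4]
            if abs(z - 1) >= delta:
                fact2 &= abs(z ** p - 1) >= max(delta, delta ** p) - 1e-12
```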
Proof of Lemma 4.11. The proof of the first part is found inside the proof of [34, Lemma 1]. To prove the second part, let $\pi$ be an optimal coupling of $P_1$ and $P_2$ for the Wasserstein distance $W_p(P_1, P_2)$ and define
\[
\bar{\pi}(B) := \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \mathbf{1}_B(x, y + z)\, Q(dz)\, \pi(dx, dy)
\]
for any measurable subset $B$ of $\mathbb{R}^d \times \mathbb{R}^d$. Then its marginals satisfy
\[
\bar{\pi}(\mathbb{R}^d \times B_y) = \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \mathbf{1}_{B_y}(y + z)\, Q(dz)\, \pi(dx, dy) = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \mathbf{1}_{B_y}(y + z)\, Q(dz)\, P_2(dy) = P_2 \star Q(B_y)
\]
and $\bar{\pi}(B_x \times \mathbb{R}^d) = P_1(B_x)$, which is derived by analogous arguments. In addition,
\[
\Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^p\, \bar{\pi}(dx, dy) \Big)^{1/p} = \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \|x - (y + z)\|^p\, Q(dz)\, \pi(dx, dy) \Big)^{1/p}
\]
\[
= \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \|(x - y) - z\|^p\, Q(dz)\, \pi(dx, dy) \Big)^{1/p}
\]
\[
\le \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \|x - y\|^p\, Q(dz)\, \pi(dx, dy) \Big)^{1/p} + \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} \int_{\mathbb{R}^d} \|z\|^p\, Q(dz)\, \pi(dx, dy) \Big)^{1/p}
\]
\[
\le W_p(P_1, P_2) + \Big( \int_{\mathbb{R}^d \times \mathbb{R}^d} q^p\, \pi(dx, dy) \Big)^{1/p} = W_p(P_1, P_2) + q,
\]
where we used the definition of $\bar{\pi}$ in the first equality, the triangle inequality for the space $L^p(\mathbb{R}^d \times \mathbb{R}^{2d}; Q \otimes \pi)$ in the first inequality, and that $\big( \int_{\mathbb{R}^d} \|x\|^p\, Q(dx) \big)^{1/p} \le q$ in the last inequality. This concludes also the second part of the proof.
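On the real line (with p = 1), the resulting bound $W_1(P_1, P_2 \star Q) \le W_1(P_1, P_2) + q$ can be checked on empirical distributions, using the exact sorted-sample formula for $W_1$ (all distributions below are arbitrary choices):

```python
import random

random.seed(2)

def w1(a, b):
    """Exact 1-Wasserstein distance between two equal-size empirical
    distributions on the real line (optimal coupling matches sorted samples)."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)

n, q = 500, 0.3
P1 = [random.gauss(0, 1) for _ in range(n)]
P2 = [random.gauss(1, 2) for _ in range(n)]
Z = [random.uniform(-q, q) for _ in range(n)]   # Q supported in [-q, q]
P2_conv = [y + z for y, z in zip(P2, Z)]        # empirical sample of P2 * Q

lhs = w1(P1, P2_conv)
rhs = w1(P1, P2) + q
```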
This manuscript is for review purposes only.
30 D. BOSKOS, J. CORTES, AND S. MARTINEZ
REFERENCES

[1] R. B. Ash, Real Analysis and Probability, Academic Press, 1972.
[2] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust optimization, Princeton University Press, 2009.
[3] D. Bertsimas, V. Gupta, and N. Kallus, Robust sample average approximation, Mathematical Programming, 171 (2018), pp. 217–282.
[4] J. Blanchet, Y. Kang, and K. Murthy, Robust Wasserstein profile inference and applications to machine learning, Journal of Applied Probability, 56 (2019), pp. 830–857.
[5] J. Blanchet and K. Murthy, Quantifying distributional model risk via optimal transport, Mathematics of Operations Research, 44 (2019), pp. 565–600.
[6] V. Bogachev, Measure theory, vol. 1, Springer, 2007.
[7] D. Boskos, J. Cortés, and S. Martínez, Data-driven ambiguity sets for linear systems under disturbances and noisy observations, in American Control Conference, Denver, CO, July 2020, pp. 4491–4496.
[8] D. Boskos, J. Cortés, and S. Martínez, Data-driven ambiguity sets with probabilistic guarantees for dynamic processes, IEEE Transactions on Automatic Control, 66 (2021), pp. 2991–3006.
[9] D. Boskos, J. Cortés, and S. Martínez, High-confidence data-driven ambiguity sets for time-varying linear systems, https://arxiv.org/abs/2102.01142, (2021).
[10] F. Boso, D. Boskos, J. Cortés, S. Martínez, and D. M. Tartakovsky, Dynamics of data-driven ambiguity sets for hyperbolic conservation laws with uncertain inputs, (2021). To appear.
[11] Z. Chen, D. Kuhn, and W. Wiesemann, Data-driven chance constrained programs over Wasserstein balls, arXiv preprint arXiv:1809.00210, (2018).
[12] A. Cherukuri and J. Cortés, Distributed coordination of DERs with storage for dynamic economic dispatch, IEEE Transactions on Automatic Control, 63 (2018), pp. 835–842.
[13] A. Cherukuri and J. Cortés, Cooperative data-driven distributionally robust optimization, IEEE Transactions on Automatic Control, 65 (2020), pp. 4400–4407.
[14] J. Coulson, J. Lygeros, and F. Dörfler, Data-enabled predictive control: In the shallows of the DeePC, in European Control Conference, 2019, pp. 307–312.
[15] J. Dedecker and F. Merlevède, Behavior of the empirical Wasserstein distance in R^d under moment conditions, Electronic Journal of Probability, 24 (2019).
[16] S. Dereich, M. Scheutzow, and R. Schottstedt, Constructive quantization: Approximation by empirical measures, Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 49 (2013), pp. 1183–1203.
[17] P. M. Esfahani and D. Kuhn, Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations, Mathematical Programming, 171 (2018), pp. 115–166.
[18] N. Fournier and A. Guillin, On the rate of convergence in Wasserstein distance of the empirical measure, Probability Theory and Related Fields, 162 (2015), pp. 707–738.
[19] R. Gao, X. Chen, and A. J. Kleywegt, Wasserstein distributional robustness and regularization in statistical learning, arXiv preprint arXiv:1712.06050, (2017).
[20] R. Gao and A. Kleywegt, Distributionally robust stochastic optimization with Wasserstein distance, arXiv preprint arXiv:1604.02199, (2016).
[21] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx, Mar. 2014.
[22] Y. Guo, K. Baker, E. Dall'Anese, Z. Hu, and T. H. Summers, Data-based distributionally robust stochastic optimal power flow – Part I: Methodologies, IEEE Transactions on Power Systems, 34 (2018), pp. 1483–1492.
[23] A. Hota, A. Cherukuri, and J. Lygeros, Data-driven chance constrained optimization under Wasserstein ambiguity sets, in American Control Conference, Philadelphia, PA, USA, 2019, pp. 1501–1506.
[24] B. Kloeckner, Empirical measures: regularity is a counter-curse to dimensionality, arXiv preprint arXiv:1802.04038, (2019), https://doi.org/10.1051/ps/2019025.
[25] B. Li, J. Mathieu, and R. Jiang, Distributionally robust chance constrained optimal power flow assuming log-concave distributions, in Power Systems Computation Conference, 2018, pp. 1–7.
[26] D. Li, D. Fooladivanda, and S. Martínez, Data-driven variable speed limit design for highways via distributionally robust optimization, in European Control Conference, Napoli, Italy, June 2019, pp. 1055–1061.
[27] D. Li, D. Fooladivanda, and S. Martínez, Online learning of parameterized uncertain dynamical environments with finite-sample guarantees, IEEE Control Systems Letters, (2020).
[28] D. Li and S. Martínez, Online data assimilation in distributionally robust optimization, in IEEE Int. Conf. on Decision and Control, Miami, FL, USA, December 2018, pp. 1961–1966.
[29] M. Li, Li-ion dynamics and state of charge estimation, Renewable Energy, 100 (2017), pp. 44–52.
[30] B. Y. Liaw, G. Nagasubramanian, R. G. Jungst, and D. H. Doughty, Modeling of lithium ion cells – A simple equivalent-circuit model approach, Solid State Ionics, 175 (2004), pp. 835–839.
[31] J. Liu, Y. Chen, C. Duan, J. Lin, and J. Lyu, Distributionally robust optimal reactive power dispatch with Wasserstein distance in active distribution network, Journal of Modern Power Systems and Clean Energy, 8 (2020), pp. 426–436.
[32] S. Liu, Matrix results on the Khatri-Rao and Tracy-Singh products, Linear Algebra and its Applications, 289 (1999), pp. 267–277.
[33] J. B. Moore and B. D. O. Anderson, Coping with singular transition matrices in estimation and control stability theory, International Journal of Control, 31 (1980), pp. 571–586.
[34] G. C. Pflug and A. Pichler, From empirical observations to tree models for stochastic optimization: convergence properties, SIAM Journal on Optimization, 26 (2016), pp. 1715–1740.
[35] B. K. Poolla, A. R. Hota, S. Bolognani, D. S. Callaway, and A. Cherukuri, Wasserstein distributionally robust look-ahead economic dispatch, arXiv preprint arXiv:2003.04874, (2020).
[36] S. Shafieezadeh-Abadeh, D. Kuhn, and P. M. Esfahani, Regularization via mass transportation, Journal of Machine Learning Research, 20 (2019), pp. 1–68.
[37] S. Shafieezadeh-Abadeh, V. A. Nguyen, D. Kuhn, and P. M. Esfahani, Wasserstein distributionally robust Kalman filtering, in Advances in Neural Information Processing Systems, 2018, pp. 8474–8483.
[38] A. Shapiro, Distributionally robust stochastic programming, SIAM Journal on Optimization, 27 (2017), pp. 2258–2275.
[39] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, vol. 16, SIAM, Philadelphia, PA, 2014.
[40] E. D. Sontag, Mathematical control theory: deterministic finite dimensional systems, Springer, 1998.
[41] R. Vershynin, High-dimensional probability: An introduction with applications in data science, vol. 47, Cambridge University Press, 2018.
[42] C. Villani, Topics in optimal transportation, no. 58 in Graduate Studies in Mathematics, American Mathematical Society, 2003.
[43] J. Weed and F. Bach, Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance, Bernoulli, 25 (2019), pp. 2620–2648.
[44] J. Weed and Q. Berthet, Estimation of smooth densities in Wasserstein distance, arXiv preprint arXiv:1902.01778, (2019).
[45] F. Xin, B.-M. Hodge, L. Fangxing, D. Ershun, and K. Chongqing, Adjustable and distributionally robust chance-constrained economic dispatch considering wind power uncertainty, Journal of Modern Power Systems and Clean Energy, 7 (2019), pp. 658–664.
[46] I. Yang, A convex optimization approach to distributionally robust Markov decision processes with Wasserstein distance, IEEE Control Systems Letters, 1 (2017), pp. 164–169.
[47] I. Yang, Wasserstein distributionally robust stochastic control: A data-driven approach, arXiv preprint arXiv:1812.09808, (2018).
[48] S. Zeng, Sample-based population observers, Automatica, 101 (2019), pp. 166–174.