
Stochastic methods for stochastic variational inequalities

Philip Thompson

Center for Mathematical Modeling (CMM) - Universidad de Chile

Workshop on Variational and Stochastic Analysis, Santiago, March 16th, 2017

Joint work with: A. Iusem (IMPA), A. Jofre (CMM-DIM) and R. Oliveira (IMPA).


Expected value formulation of SVI

Assume:

Random operator: F : Ξ × R^n → R^n is a Carathéodory map on a measurable space (Ξ, G),
Randomness: ξ : Ω → Ξ is a random variable on the probability space (Ω, F, P),
Feasible set: X ⊂ R^n is closed and convex.

Definition (SVI)

Assuming T : R^n → R^n is given by T = E[F(ξ, ·)], find x* ∈ X such that

⟨T(x*), x − x*⟩ ≥ 0 for all x ∈ X.

(Solution set: X*.)

Under some conditions, variational equilibria of stochastic generalized Nash games with expected-value constraints can be reduced to the above setting, with X = K × R^m_+ (unbounded).
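As a quick illustration (not part of the original slides), the expected-value operator T = E[F(ξ, ·)] can be approximated by averaging a random operator over Monte Carlo draws of ξ; a minimal Python sketch, where the specific map F and the Gaussian law of ξ are hypothetical choices of mine:

import numpy as np

rng = np.random.default_rng(0)

def F(xi, x):
    # Hypothetical Caratheodory map: affine in x, with randomness entering through xi.
    A = np.outer(xi, xi) + np.eye(x.size)   # random positive semidefinite matrix
    return A @ x + xi

def T(x, num_samples=10_000):
    # Monte Carlo approximation of T(x) = E[F(xi, x)], with xi ~ N(0, I) as a stand-in distribution.
    draws = rng.standard_normal((num_samples, x.size))
    return np.mean([F(xi, x) for xi in draws], axis=0)

x = np.ones(3)
print(T(x))   # for this F, the exact value is (E[xi xi^T] + I) x = 2 x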

Expected value formulation of SVI

Definition (SVI)

Assuming T : R^n → R^n is given by T = E[F(ξ, ·)], solve

0 ∈ T(x) + N_X(x).

Dates back to King and Rockafellar (1993) and Gurkan, Ozge and Robinson (1999).
Unavailability of T: only samples of F(ξ, ·) are available.
Stochastic optimization and SVIs need to be studied in terms of the perturbation theory of the mean objective function or operator with respect to the probability measure.
Two complexity metrics: optimization error and sample size.

Sample Average Approximation (SAA)

Framework:

Unavailability of T.

SAA problem: choose a sample {ξ_j}_{j=1}^N and solve

0 ∈ F_N(x) + N_X(x),

where

F_N(·) := (1/N) ∑_{j=1}^N F(ξ_j, ·)

is the empirical sample-mean operator.

Choose a deterministic algorithm to solve the SAA problem.
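A minimal sketch (mine, with a hypothetical random operator and a box feasible set standing in for X) of the SAA approach: the sample is drawn once, the deterministic empirical operator F_N is formed, and any deterministic method is then applied to it:

import numpy as np

def make_saa_operator(F, sample):
    """Empirical sample-mean operator F_N(x) = (1/N) * sum_j F(xi_j, x) for a fixed sample."""
    def F_N(x):
        return np.mean([F(xi, x) for xi in sample], axis=0)
    return F_N

def project_box(x, lo=-1.0, hi=1.0):
    # Projection onto the box X = [lo, hi]^n (a simple stand-in for Pi_X).
    return np.clip(x, lo, hi)

def F(xi, x):
    # Hypothetical random operator with mean T(x) = E[1 + xi^2] * x - E[xi] = 2x.
    return (1.0 + xi**2) * x - xi

rng = np.random.default_rng(1)
sample = rng.standard_normal(500)          # xi_1, ..., xi_N drawn once and then fixed
F_N = make_saa_operator(F, sample)

# Deterministic projection method applied to the SAA problem 0 in F_N(x) + N_X(x).
x = np.array([0.9])
for _ in range(200):
    x = project_box(x - 0.1 * F_N(x))
print(x)   # close to the SAA solution, here mean(xi) / mean(1 + xi^2)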

Stochastic Approximation (SA)

Framework:

Unavailability of T.
Stochastic oracle: given x^k ∈ R^n and a sample ξ^k of ξ, F(ξ^k, x^k) is available.
Stochastic approximation (SA): construct an explicit method using online samples, with T(x^k) replaced by F(ξ^k, x^k).
Oracle error: ε(ξ^k, x^k) := F(ξ^k, x^k) − T(x^k).
Monotonicity: exploit this property in the stochastic setting, significantly pushing forward previously known convergence properties.
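For contrast, a minimal sketch of the classical single-sample SA iteration x^{k+1} = Π_X[x^k − α_k F(ξ^k, x^k)], again with a hypothetical operator and a box projection of my own choosing; the variance-based methods presented below replace the single sample by growing mini-batches:

import numpy as np

rng = np.random.default_rng(2)

def F(xi, x):
    # Hypothetical random operator with mean T(x) = 2x - 1.
    return (2.0 + xi) * x - 1.0 - xi

def project_box(x, lo=0.0, hi=5.0):
    return np.clip(x, lo, hi)   # projection onto X = [0, 5]

x = np.array([4.0])
for k in range(1, 5001):
    xi = rng.standard_normal()          # online sample xi^k from the stochastic oracle
    alpha = 1.0 / (k + 10)              # diminishing stepsizes, as in classical SA
    x = project_box(x - alpha * F(xi, x))
print(x)   # approaches x* = 0.5, the solution of 0 in T(x) + N_X(x)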

Examples: Stochastic Nash Equilibria and Simulation Optimization

Stochastic Nash equilibria: find x* ∈ Π_{i=1}^m X^i such that

x*_i ∈ argmin_{x_i ∈ X^i} E[f_i(ξ, x_i, x*_{−i})] for all i = 1, …, m.

Assuming agent-wise smooth convex pay-offs, this is equivalent to an SVI with

X := X^1 × … × X^m,
F := (∇_{x_1} f_1, …, ∇_{x_m} f_m).

Simulation optimization: Handbook of Simulation Optimization (2014).
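To make the reduction concrete, here is a small sketch (with hypothetical quadratic pay-offs of my own choosing) of the stacked game map F = (∇_{x_1} f_1, ∇_{x_2} f_2) for a two-player game:

import numpy as np

def grad_f1(xi, x1, x2):
    # Partial gradient in x1 of the hypothetical pay-off f1(xi, x1, x2) = 0.5*(1 + xi^2)*x1^2 + x1*x2.
    return (1.0 + xi**2) * x1 + x2

def grad_f2(xi, x1, x2):
    # Partial gradient in x2 of the hypothetical pay-off f2(xi, x1, x2) = 0.5*x2^2 - x1*x2.
    return x2 - x1

def F(xi, x):
    """Stacked game map F = (grad_{x1} f1, grad_{x2} f2) on X = X^1 x X^2."""
    x1, x2 = x
    return np.array([grad_f1(xi, x1, x2), grad_f2(xi, x1, x2)])

print(F(0.0, np.array([1.0, 2.0])))   # [3., 1.]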

Examples: Statistical Machine Learning

(Linear) empirical risk minimization (ERM): given sample data {(X_j, Y_j)}_{j=1}^N and a loss ℓ(·), the proposed estimator is

β_N ∈ argmin_{β ∈ Θ} (1/N) ∑_{j=1}^N ℓ(Y_j − β^T X_j).

Equivalently, this is the SAA problem associated with F(X, Y; β) := ℓ(Y − β^T X), where N ≫ 1.

Stochastic gradient descent (SGD) methods are SA-type methods for the ERM problem (over a discrete distribution), also known as random incremental methods. Other incremental variants act on other discrete high-dimensional parameters (coordinates, constraints, number of agents, etc.).
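A minimal SGD sketch for linear ERM with the squared loss ℓ(r) = r²/2 (the loss, the data-generating model and the unconstrained Θ are illustrative assumptions, not from the talk); each step draws one data point from the empirical distribution, which is exactly the SA oracle for this problem:

import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: Y = X @ beta_true + noise.
N, d = 1000, 5
X = rng.standard_normal((N, d))
beta_true = np.arange(1.0, d + 1.0)
Y = X @ beta_true + 0.1 * rng.standard_normal(N)

beta = np.zeros(d)
for k in range(1, 20001):
    j = rng.integers(N)                          # draw one (X_j, Y_j) from the empirical distribution
    grad = -(Y[j] - X[j] @ beta) * X[j]          # gradient of 0.5 * (Y_j - beta^T X_j)^2 in beta
    beta -= (1.0 / (k + 100)) * grad             # diminishing stepsize
print(np.round(beta, 2))                         # close to beta_true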

Applications

Monotonicity in:

mechanical problems,
partial economic equilibrium,
transportation: Wardrop equilibrium,
wireline and wireless communication networks (Koshal, Nedic and Shanbhag),
bandwidth allocation and design of cognitive radio systems (Scutari, Facchinei and Pang),
optical networks (Pan and Pavel),
zero-sum Nash games, potential games,
some structured Nash equilibria (e.g. a class of Nash-Cournot problems with nonlinear prices: Kannan and Shanbhag).

NOTE: Local monotonicity is satisfied by a large class of non-monotone equilibrium problems.

A variance-based stochastic extragradient method

Notation: given a sample ξ^N := {ξ_1, …, ξ_N}, F(ξ^N, x) := (1/N) ∑_{j=1}^N F(ξ_j, x).

Algorithm (A variance-based stochastic extragradient method)

z^k = Π_X[x^k − α_k F(ξ^k, x^k)],
x^{k+1} = Π_X[x^k − α_k F(η^k, z^k)],

where ξ^k := {ξ^k_j : j = 1, …, N_k} and η^k := {η^k_j : j = 1, …, N_k}.
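A minimal sketch of the iteration above, under assumptions of my own (a monotone affine random operator, a box feasible set, and illustrative stepsize and batch-size choices); two independent mini-batches are averaged per iteration:

import numpy as np

rng = np.random.default_rng(4)

def F(xi, x):
    # Hypothetical random operator with mean T(x) = 2x - 1.
    return (2.0 + xi) * x - 1.0 - xi

def F_batch(batch, x):
    # Empirical average F(xi^k, x) over a mini-batch, as in the algorithm's notation.
    return np.mean([F(xi, x) for xi in batch], axis=0)

def project_box(x, lo=0.0, hi=5.0):
    return np.clip(x, lo, hi)   # stand-in for Pi_X

x = np.array([4.0])
alpha = 0.15                      # constant stepsize, kept below 1/(sqrt(6) L) for this operator
for k in range(1, 51):
    N_k = 4 * k * k               # growing batch sizes, so that sum_k 1/N_k < infinity
    xi_batch = rng.standard_normal(N_k)    # batch xi^k
    eta_batch = rng.standard_normal(N_k)   # independent batch eta^k
    z = project_box(x - alpha * F_batch(xi_batch, x))
    x = project_box(x - alpha * F_batch(eta_batch, z))
print(x)   # approaches x* = 0.5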

A variance-based stochastic extragradient method

Assumptions:

T is pseudo-monotone: ⟨T(y), x − y⟩ ≥ 0 ⟹ ⟨T(x), x − y⟩ ≥ 0, for all x, y ∈ R^n.
Lipschitz continuity: there exists a measurable L : Ξ → R_+ with finite p-moment, p ≥ 2, such that ‖F(ξ, x) − F(ξ, y)‖ ≤ L(ξ)‖x − y‖ for all x, y ∈ R^n.
Sampling rate: ∑_k 1/N_k < ∞.
Stepsizes bounded away from zero: 0 < inf_k α_k ≤ sup_k α_k < 1/(√6 L).

Previous assumptions not required:

bounded T or X,
regularization,
uniformly bounded oracle variance: sup_{x∈X} E[‖F(ξ, x) − T(x)‖²] ≤ σ².

Stochastic extragradient method with line search

OBJECTIVE: absence of knowledge of L.

Notation: given a sample ξ^N := {ξ_1, …, ξ_N}, F(ξ^N, x) := (1/N) ∑_{j=1}^N F(ξ_j, x).

Algorithm (A variance-based stochastic extragradient method with line search)

Choose any α > 0. If x^k = Π_X[x^k − α F(ξ^k, x^k)], stop. Otherwise:

Line-search rule: define α_k as the largest element α' of {Θ^j α : j ∈ N_0} such that

α' ‖F(ξ^k, z^k(α')) − F(ξ^k, x^k)‖ ≤ λ ‖z^k(α') − x^k‖,

where z^k(α') := Π_X[x^k − α' F(ξ^k, x^k)] for all α' > 0. Set

z^k = Π_X[x^k − α_k F(ξ^k, x^k)],
x^{k+1} = Π_X[x^k − α_k F(η^k, z^k)].
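A minimal sketch (mine) of a backtracking implementation of this rule: starting from α and shrinking by the factor Θ, accept the first stepsize satisfying the inequality, which is exactly the maximal element of {Θ^j α}; the operator and projection are the hypothetical ones from the previous sketch:

import numpy as np

def line_search(F_k, x, project, alpha0=1.0, Theta=0.5, lam=0.5, max_backtracks=50):
    """Backtracking line search.

    Returns (alpha_k, z_k, j_k): the accepted stepsize, the corresponding point
    z^k(alpha_k), and the number of oracle evaluations spent in the search."""
    Fx = F_k(x)                           # F(xi^k, x^k), evaluated once
    calls = 1
    alpha = alpha0
    for _ in range(max_backtracks):
        z = project(x - alpha * Fx)       # z^k(alpha)
        Fz = F_k(z)                       # F(xi^k, z^k(alpha))
        calls += 1
        if alpha * np.linalg.norm(Fz - Fx) <= lam * np.linalg.norm(z - x):
            return alpha, z, calls        # first success = maximal alpha in {Theta^j * alpha0}
        alpha *= Theta
    return alpha, z, calls                # fallback after max_backtracks (not expected in theory)

# Usage, with the hypothetical F_batch and project_box from the previous sketch:
# alpha_k, z_k, j_k = line_search(lambda y: F_batch(xi_batch, y), x, project_box)
# x = project_box(x - alpha_k * F_batch(eta_batch, z_k))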

Results

Theorem (Asymptotic convergence)

For both methods, almost surely the sequence {x^k} is bounded,

lim_{k→∞} d(x^k, X*) = 0,

and r_{α_k}(x^k) → 0 almost surely and in L² as k → ∞.

Natural residual: r_α(x) := ‖x − Π_X[x − α T(x)]‖.
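The natural residual is straightforward to evaluate once T (or an empirical average standing in for it) and the projection are available; a minimal sketch with the hypothetical T and box projection used above:

import numpy as np

def natural_residual(T, project, x, alpha):
    """r_alpha(x) = ||x - Pi_X[x - alpha * T(x)]||, which vanishes exactly at solutions of the VI."""
    return np.linalg.norm(x - project(x - alpha * T(x)))

# Example with the hypothetical T(x) = 2x - 1 and X = [0, 5]:
T = lambda x: 2.0 * x - 1.0
project_box = lambda x: np.clip(x, 0.0, 5.0)
print(natural_residual(T, project_box, np.array([0.5]), alpha=0.2))   # 0.0 at the solution x* = 0.5
print(natural_residual(T, project_box, np.array([2.0]), alpha=0.2))   # positive away from the solution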

Results

Proposition (Uniform boundedness of p-moments)

For both methods, given x* ∈ X*, there exist c_p(x*) ≥ 1 and k_0 := k_0(x*) ∈ N such that

sup_{k≥k_0} |‖x^k − x*‖|_p² ≤ c_p(x*) [1 + |‖x^{k_0} − x*‖|_p²].

NOTE: c_p(x*) and k_0(x*) are explicitly estimated.
NOTE: boundedness is not assumed a priori!

Results

Theorem (Convergence rate and oracle complexity: known L)

Take α_k ≡ α ∈ (0, 1/(√6 L)) and N_k as

N_k = ⌈θ (k + µ) (ln(k + µ))^{1+b}⌉.

Then a.s. convergence holds and, for every x* ∈ X*, there are non-negative constants Q(x*), P(x*) and I(x*) such that for all ε > 0 there exists K := K_ε ∈ N with

E[r_α(x^K)²] ≤ ε ≤ max{1, θ^{-2}} Q(x*) / K,

∑_{k=1}^K 2 N_k ≤ max{1, θ^{-4}} max{1, θ} I(x*) ( [ln(P(x*) ε^{-1})]^{1+b} + 1/µ ) / ε².
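A small sketch of the sampling schedule in the theorem, N_k = ⌈θ(k + µ)(ln(k + µ))^{1+b}⌉, and the implied total number of oracle calls ∑_k 2N_k after K iterations (the parameter values θ, µ, b below are illustrative assumptions, not from the talk):

import math

def batch_size(k, theta=0.1, mu=2.0, b=1.0):
    """N_k = ceil(theta * (k + mu) * (ln(k + mu))^(1 + b)), the schedule from the theorem."""
    return math.ceil(theta * (k + mu) * math.log(k + mu) ** (1.0 + b))

K = 1000
print([batch_size(k) for k in (1, 10, 100, 1000)])
total_oracle_calls = sum(2 * batch_size(k) for k in range(1, K + 1))   # two mini-batches per iteration
print(total_oracle_calls)   # grows like K^2 (ln K)^(1+b), i.e. O(eps^-2 [ln(eps^-1)]^(1+b)) when eps ~ 1/K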

Results

Theorem (Convergence rate and oracle complexity: unknown L)

Take any α > 0 and N_k as

N_k = ⌈θ (k + µ) (ln(k + µ))^{1+b}⌉.

Then a.s. convergence holds and, for every x* ∈ X* and every ε > 0, there exists K := K_ε ∈ N such that

E[r_α(x^K)²] ≤ ε ≲_{x*} max{1, θ^{-2}} / K,

∑_{k=1}^K j_k · 2 N_k ≲_{x*} ln_{1/Θ}(L_k) · max{1, θ^{-4}} max{1, θ} ( [ln(ε^{-1})]^{1+b} + 1/µ ) / ε²,

where L_k = (1/N_k) ∑_{j=1}^{N_k} L(ξ^k_j) and j_k is the number of oracle calls at the k-th line search.

Contributions: stochastic extragradient methods

This work:

Unbounded T or X,
Non-uniform variance: for all x ∈ X, E[‖F(ξ, x) − T(x)‖²] < ∞,
No regularization,
Faster rate O(1/K) with constant stepsize,
Oracle complexity O(ε^{-2}) · [ln(ε^{-1})]^{1+b} (efficient),
Uniform boundedness of p-moments.

Before:

Boundedness of T or X,
Uniform variance: sup_{x∈X} E[‖F(ξ, x) − T(x)‖²] ≤ σ²,
Regularization (additional coordination, sub-optimality),
Rate O(1/√K) with small stepsizes,
Oracle complexity O(ε^{-2}).

Contributions: stochastic extragradient methods

Absence of L:

First stochastic approximation method with line search for SVIs.
Essentially the same performance as with knowledge of L, up to a factor of O(ln_{1/Θ} L) in the number of projections per iteration and in the oracle complexity.

To the best of our knowledge, we obtain the best error bound known for SA of monotone SVIs, improving on the works of Juditsky-Nemirovski-Tauvel, Lan et al. and others.

The previous results are not shared by the SAA method or the classical SA method:

possibility of using local and distributed empirical averages,
no need for a uniform Central Limit Theorem,
robust sampling.

Empirical evidence in Nemirovski-Juditsky-Lan-Shapiro (2009) shows that SA can outperform SAA in a large class of convex structured problems.

Final comments

Non-uniform variance of the oracle:

Affine variational inequalities, LCPs or systems of equations:

σ(x)² := E[‖F(ξ, x) − T(x)‖²] ∼ ‖x‖².

The performance of our methods depends on the variance at points of the trajectory and on X* (but not on the whole of X):

Q := inf_{x*∈X*} { ‖x^0 − x*‖² + σ(x*)⁴ · max_{0≤k≤k_0(x*)} E[‖x^k − x*‖²] }.

Better performance also in the case of a compact X or uniform variance whenever

σ(x*)² ≪ σ²

(e.g. affine variational inequalities over a compact X).

Final comments: some new proof techniques

Known L:

Martingale methods,
Oracle error control: Burkholder-Davis-Gundy inequality in Hilbert spaces.

Unknown L:

Endogenous random stepsize: we must control an oracle error which is not a martingale difference.
Nevertheless, we can condition on a ball centred at the previous k-th iterate and control the empirical process

sup_{x∈B(x^k)} (1/N_k) ‖ ∑_{j=1}^{N_k} [ε(ξ^k_j, x) − ε(ξ^k_j, x*)] ‖,

where ε(ξ, x) := F(ξ, x) − T(x).
Use local moment and concentration inequalities from empirical process theory.

References

[1] A. Iusem, A. Jofre, R. I. Oliveira and P. Thompson, Extragradient method with variance reduction for stochastic variational inequalities. To appear in SIAM Journal on Optimization.

[2] A. Iusem, A. Jofre, R. I. Oliveira and P. Thompson, Variance-based extragradient method with line search for stochastic variational inequalities. Submitted.

[3] A. Iusem, A. Jofre and P. Thompson, Incremental constraint projection methods for monotone stochastic variational inequalities. Under second-round revision.

THANK YOU VERY MUCH!
