
Event-Based Stochastic

Learning and Optimization

Xi-Ren Cao

Hong Kong University of Science and Technology

Example 1: Admission Control in Communication Networks

[Figure: an open queueing network with arrival rate λ; an arriving customer is admitted with probability b(n) and rejected with probability 1−b(n); admitted customers enter server i with probability q_{0i} and route between servers with probabilities q_{ij}.]

n: total number of customers in the network
n_i: number of customers at server i
n = (n_1, …, n_M): the state
N: capacity

How do we choose the admission probability b(n)?

The Problem

- Non-standard: not a Markov Decision Process (MDP)
  - The action depends only on the population n; different states n with the same population must take the same action
- Not a Partially Observable MDP (POMDP): when a customer arrives, we know something about the next state (the population will not decrease)
- Additional new features
  - Control is applied only when a customer arrives
  - Relatively small policy space: population n → a rather than state n → a
- New formulation and approach
  - Event-based and sensitivity view, not the traditional MDP-with-constraints formulation
  - Generally applicable to many other problems

[Overview diagram: a sensitivity view of learning and optimization connects perturbation analysis (i. queueing systems, ii. Markov systems), Markov decision processes (policy iteration), and event-based optimization (i. PA, ii. policy iteration), leading to applications.]

Table of Contents

- Formulation of event-based optimization
  - Formulating events
  - Event-based optimization
- Introduction to a sensitivity-based view of learning and optimization
  - The problem
  - Two fundamental sensitivity-based approaches
  - Limitations of state-based approaches
- Solutions to event-based optimization
- Examples
- Summary and discussion

Event-Based Approaches

- X = {X_n, n = 1, 2, …}, ergodic, X_n in S = {1, 2, …, M}
- Transition probability matrix P = [p(i,j)], i, j = 1, …, M
- Steady-state probability vector π = (π(1), π(2), …, π(M))
- Performance function f = (f(1), …, f(M))^T
- Steady-state performance: η = πf = Σ_i π(i)f(i)

Pe = e, π(I − P) = 0, πe = 1, where e = (1, 1, …, 1)^T and I is the identity matrix. (A small numerical sketch follows.)
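As a quick numerical illustration (not from the slides), here is a minimal Python sketch that computes π and η = πf for a hypothetical 3-state chain like the one in the figure below; the transition probabilities and f are made up.

```python
# Minimal sketch (hypothetical numbers): steady-state probability pi and
# average performance eta = pi f for a small ergodic Markov chain.
import numpy as np

P = np.array([[0.0, 0.5, 0.5],      # assumed 3-state transition matrix
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])
f = np.array([1.0, 2.0, 3.0])        # assumed performance function

# Solve pi(I - P) = 0 with pi e = 1: take the left eigenvector of P for
# eigenvalue 1 and normalize it.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()

eta = pi @ f                          # eta = sum_i pi(i) f(i)
print("pi =", pi, " eta =", eta)
```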

The Markov Model

[Figure: a three-state Markov chain with transition probabilities p(1,2), p(2,1), p(2,3), p(3,2), p(3,1), p(1,3).]

Formulating Events


An event may contain information about both the current state and the next state after the transition.

A customer arrival: the population is not reduced after the transition.

A transition: <i, j>. An event: a set of transitions (see the sketch below).

For n = (n_1, …, n_M), write n_{+i} = (n_1, …, n_i + 1, …, n_M) and n_{-i} = (n_1, …, n_i − 1, …, n_M).

An arriving customer is accepted: a_+ = {<n, n_{+i}>, all n and i}
A customer departs: d_- = {<n, n_{-i}>, all n and i}
An arriving customer is rejected: a_- = {<n, n>, all n}
A customer arrival: a = a_+ ∪ a_-
An arriving customer finds population n: the event a_n (defined formally below)
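To make the "event = set of transitions" idea concrete, here is a small illustrative Python sketch (not from the slides) that enumerates the events a_+, a_-, and d_- for a hypothetical network with M = 2 servers and capacity N = 2.

```python
# Illustrative sketch: events as sets of transitions <n, n'> for a tiny
# hypothetical network (M = 2 servers, capacity N = 2).
from itertools import product

M, N = 2, 2
states = [n for n in product(range(N + 1), repeat=M) if sum(n) <= N]

def plus(n, i):    # n_{+i}: one more customer at server i
    return tuple(c + 1 if k == i else c for k, c in enumerate(n))

def minus(n, i):   # n_{-i}: one less customer at server i
    return tuple(c - 1 if k == i else c for k, c in enumerate(n))

# a_plus: an arriving customer is accepted and joins some server i
a_plus  = {(n, plus(n, i))  for n in states for i in range(M) if sum(n) < N}
# a_minus: an arriving customer is rejected (the state does not change)
a_minus = {(n, n) for n in states}
# d_minus: a customer departs from some server i
d_minus = {(n, minus(n, i)) for n in states for i in range(M) if n[i] > 0}
# a: a customer arrival (accepted or rejected)
a = a_plus | a_minus
```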

Three Types of Events

Observable Event:

1. We may observe the event
2. It contains some information

[Figure: state-transition diagram of a network with 3 servers and 2 customers; the states n = (n_1, n_2, n_3) are grouped by population n = 0, 1, 2; arrows mark arrival events a_1 (an arrival finds population 1) and departure events d.]

Controllable event: an arriving customer is accepted: a_{n,+} = {<n, n_{+i}>, all i and all n in S_n}, where S_n = {n : Σ_i n_i = n}.

An arriving customer is rejected: a_{n,-} = {<n, n>, n in S_n}.

We can control the probabilities of these events: b(n) = p(a_{n,+} | a_n).

[Figure: the same state-transition diagram with the arrival event a_1 split into the controllable events a_{1,+} (accept) and a_{1,-} (reject); a_1 = a_{1,+} ∪ a_{1,-}.]

Natural transition event: a customer enters server i: a_{n,+i} = {<n, n_{+i}>, n in S_n}.

Nature determines this transition randomly.

[Figure: the same diagram with the accepted-arrival event a_{1,+} further split into the natural transition events a_{1,+1}, a_{1,+2}, a_{1,+3} (the customer enters server 1, 2, or 3).]

Logical Relations in Events

The three events, observable, controllable, and natural transition, happen simultaneously in time but have a logical order:

[Flowchart: observable event: is this an arrival? If no, do nothing. If yes, observe the number of customers n and determine b(n). Controllable event (the action): accept with probability b(n) or reject with probability 1−b(n). If rejected, the customer leaves; if accepted, the natural transition event follows and the customer joins some server j.]

→ The state-dependent Markov model does not fit this structure well.

Event-Based Optimization

Underlying Markov process {X_n, n = 0, 1, …}.

At time n, we observe an observable event, e.g., a_n (not the state X_n).

We determine an action L(a_n), which controls the probabilities of the controllable events.

A controllable event occurs: a_{n,+} or a_{n,-}.

A natural transition event follows, which determines the state transition.

Goal: find a policy L that optimizes the performance. (A simulation sketch follows.)
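The sequence above can be read directly as a simulation loop. The following is a minimal sketch under simplifying assumptions (a uniformized discrete-time model, internal routing q_{ij} omitted for brevity); the rates, routing probabilities q_{0i}, and policy b(n) below are hypothetical.

```python
# Minimal event-driven sketch (hypothetical parameters, uniformized in
# discrete time; internal routing between servers is omitted for brevity).
import random

lam, mu = 0.3, 0.2                 # arrival prob. and per-server service prob.
M, N = 3, 5                        # number of servers and network capacity
q0 = [0.5, 0.3, 0.2]               # routing probabilities q_{0i}
b = [1.0, 1.0, 0.8, 0.5, 0.2]      # event-based policy: b(n), n = 0, ..., N-1

n = [0] * M                        # n_i: number of customers at server i
for _ in range(100_000):
    if random.random() < lam:                      # observable event: an arrival...
        pop = sum(n)                               # ...finds population pop
        if pop < N and random.random() < b[pop]:   # controllable event: accept
            i = random.choices(range(M), weights=q0)[0]
            n[i] += 1                              # natural transition: join server i
        # otherwise the arriving customer is rejected and leaves
    for i in range(M):                             # each busy server may finish a job
        if n[i] > 0 and random.random() < mu:
            n[i] -= 1
```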

Partially Observable MDP (POMDP)

Underlying Markov process {Xn , n=0,1,…}

At time n, we observe a random variable Yn (not Xn )

Determine an action L(Y_0, …, Y_n), which controls the state transition probabilities.

The state transition follows.

Goal: find a policy L to optimize the performance.

How to Find a Solution?

A Sensitivity-Based View of Learning and Optimization

The System

[Diagram: a dynamic system with internal state x, inputs a_l, l = 0, 1, … (actions), outputs y_l, l = 0, 1, … (observations), and system performance η.]

Policy: action = function of information, a = L(y).

L may depend on parameters θ.

Optimization: determine a policy L that maximizes η^L.

Learning and Optimization

Learning:

1. Observing and analyzing sample paths (with or without estimating system parameters)
2. Studying the behavior of a system under one policy to learn how to improve system performance

Difficulties:

1. The policy space is too large (# states: N, # actions: M, # policies: M^N; see the example below)
2. The system is too complicated; its structure and parameters are unknown
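To give a rough sense of scale (a hypothetical illustration): even a modest system with N = 100 states and only M = 2 actions per state has M^N = 2^100 ≈ 1.3 × 10^30 deterministic policies, far too many to search exhaustively.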

A Philosophical Boundary

- We can study or observe only one policy at a time.
- By studying or observing the behavior of one policy, in general we cannot obtain the performance of other policies.
- So we can do very little!
- Two settings: continuous parameters vs. a discrete policy space.

Two Basic Approaches

(policy gradient) θ → θ + Δθ: dη/dθ = πQg (for P_θ = P + θQ)

(policy iteration) P → P': η' − η = π'Qg (with Q = P' − P)

We can only start with these two sensitivity formulas!

Performance Difference Formula

For two Markov chains (P, f) and (P', f) with performance η, η' and steady-state probabilities π, π', let Q = P' − P.

Poisson equation and performance potentials g:

    (I − P + eπ)g = f

Multiplying by π: πg = πf = η. For the other chain, η' = π'f.

Multiplying by π': η' − η = π'(P' − P)g = π'Qg.

The potentials g can be estimated from a sample path of (P, f):

    g(i) = E{ Σ_{k=0}^{∞} [f(X_k) − η] | X_0 = i }

(A sample-path estimator sketch follows.)
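The potential formula above suggests a straightforward sample-path estimator. Below is a minimal Python sketch (not from the slides) that truncates the infinite sum at a hypothetical horizon L; the function and argument names are illustrative.

```python
# Minimal sketch: estimate the potentials g(i) from one sample path of (P, f)
# by truncating the sum at a horizon L (a hypothetical choice).
import numpy as np

def estimate_potentials(path, f, L=50):
    """path: visited states X_0, X_1, ...; f: reward per state (array)."""
    path = np.asarray(path)
    rewards = np.asarray([f[x] for x in path], dtype=float)
    eta = rewards.mean()                        # estimate of eta = pi f
    n_states = int(path.max()) + 1
    sums = np.zeros(n_states)
    counts = np.zeros(n_states)
    for t in range(len(path) - L):
        i = path[t]
        sums[i] += np.sum(rewards[t:t + L] - eta)   # sum_{k=0}^{L-1} [f(X_k) - eta]
        counts[i] += 1
    return sums / np.maximum(counts, 1.0)       # average over visits to each state i
```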

Policy Iteration

η' − η = π'Qg = π'(P' − P)g

1. η' > η if P'g > Pg (componentwise)

2. Policy iteration: at every state, find a policy P' with P'g > Pg (a sketch follows)

3. Multi-chain (average performance): a simple, intuitive approach without discounting
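A minimal sketch of potential-based policy iteration (illustrative, not the slides' algorithm verbatim): each iteration solves the Poisson equation for the current policy and then improves greedily. With action-dependent rewards the comparison is f + Pg; if f is the same for all actions this reduces to comparing P'g with Pg as above. All inputs are hypothetical.

```python
# Illustrative sketch: policy iteration for a small average-reward MDP using
# performance potentials g. P_a[a][i] is the transition row at state i under
# action a; f_a[a][i] is the corresponding reward.
import numpy as np

def evaluate(P, f):
    """Return (pi, eta, g) for one policy: pi(I-P)=0, eta=pi f, (I-P+e pi)g=f."""
    m = len(f)
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()
    g = np.linalg.solve(np.eye(m) - P + np.outer(np.ones(m), pi), f)
    return pi, pi @ f, g

def policy_iteration(P_a, f_a, max_iter=50):
    n_actions, m = len(P_a), len(f_a[0])
    policy = np.zeros(m, dtype=int)
    eta = None
    for _ in range(max_iter):
        P = np.array([P_a[policy[i]][i] for i in range(m)])
        f = np.array([f_a[policy[i]][i] for i in range(m)])
        _, eta, g = evaluate(P, f)
        # Improvement: at each state pick the action maximizing f(i,a) + P_a(i,.) g
        new = np.array([max(range(n_actions),
                            key=lambda a: f_a[a][i] + P_a[a][i] @ g)
                        for i in range(m)], dtype=int)
        if np.array_equal(new, policy):
            break
        policy = new
    return policy, eta
```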

Limitations of State-Based Approaches

- Curse of dimensionality: the state space is too large
- Independent actions: actions at different states must be chosen independently
- Structural properties are lost

Event-Based Optimization

Event-based policy: a_n → b(n)

Problem: find a policy that maximizes the steady-state performance.

Method: start with the performance difference formula (PDF).

How to derive the PDF? By a construction method with the performance potentials as building blocks.

Two policies: b(n) and b'(n).

Potential aggregation, where π(n) is the probability of the event a_n (an arrival finds population n):

    d(n) = Σ_{n in S_n} p(n | a_n) { Σ_{i=1}^{M} q_{0i} [ g(n_{+i}) − g(n) ] }

where p(n | a_n) is the conditional steady-state probability of state n given that an arrival finds population n.

1. Performance difference formula (PDF):

    η' − η = Σ_{n=0}^{N−1} π'(n) [ b'(n) − b(n) ] d(n)

2. Performance derivative (for a policy b_θ(n) depending on a parameter θ):

    dη/dθ = Σ_{n=0}^{N−1} π(n) [ d b_θ(n)/dθ ] d(n)

1 → policy iteration: reduced computation, only N aggregated potentials d(n) (linear in system size), even though the same action must be applied at different states.

2 → gradient-based optimization (perturbation analysis).


3. Learning and on-line implementation: d(n) can be estimated with the same amount of computation as g(n)! (A sketch follows.)
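As a sketch of how items 1 and 3 might be implemented (illustrative only; the function and variable names are hypothetical, and the potentials g, conditional probabilities p(n | a_n), and routing probabilities q_{0i} are assumed to be available from estimation or computation):

```python
# Illustrative sketch: aggregate the potentials into d(n) and improve the
# event-based policy b(n) using the performance difference formula.
def plus(n, i):                     # n_{+i}: one more customer at server i
    return tuple(c + 1 if k == i else c for k, c in enumerate(n))

def aggregate_potential(pop, states_with_pop, g, cond_prob, q0):
    """d(pop) = sum_{n in S_pop} p(n | a_pop) * sum_i q0[i] [g(n_{+i}) - g(n)]."""
    d = 0.0
    for n in states_with_pop:       # all states n with population pop
        inner = sum(q0[i] * (g[plus(n, i)] - g[n]) for i in range(len(q0)))
        d += cond_prob[n] * inner
    return d

def improve_policy(b, d, b_min=0.0, b_max=1.0):
    # eta' - eta = sum_n pi'(n) [b'(n) - b(n)] d(n): making every term
    # non-negative (raise b(n) where d(n) > 0, lower it where d(n) < 0)
    # cannot decrease the performance.
    return [b_max if d[n] > 0 else b_min for n in range(len(b))]
```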

Summary:

Why?
- The action is taken when an event happens
- The same event may happen at different states, i.e., the actions may be the same for different states

How?
- Event-based policy
- Performance difference formula for event-based policies
- Utilize the system structure (aggregation)

Benefit?
- Event-based policy iteration becomes possible
- Reduced computation (e.g., linear in system size)
- Intuitive, with a clear physical meaning

Other Examples

Example 1: Admission Control in Communication Networks

[Figure: the open queueing network from Example 1, with arrival rate λ, admission probability b(n), rejection probability 1−b(n), and routing probabilities q_{0i}, q_{ij}, q_{i0}.]

n: total number of customers in the network
n_i: number of customers at server i
n = (n_1, …, n_M): the state
N: capacity

Event: a customer arrives and finds population n

How do we choose the admission probability b(n)?


Example 2: Manufacturing

M products: i = 1, 2, …, M. The operation on product i consists of N_i phases, j = 1, …, N_i.

State: (i, j)

Two-level hierarchical control: when a product i is finished, how do we choose the next product i' (with probability c(i, i'))?

Event: product i is finished

Example 3: Control with Lebesgue Sampling

[Figure: Riemann sampling samples the state x(t), with performance f(x), at fixed time instants; Lebesgue sampling samples when x(t) crosses one of the levels x_1, x_2, x_3, x_4.]

LSC is more efficient than RSC (Åström et al., for X = {x_1}).

Riemann sampling: X = (−∞, +∞); Lebesgue sampling: X = {x_1, x_2, x_3, x_4}.

How do we develop an optimal control policy for LSC with X = {x_1, …}?

Event: the system reaches a state in X = {x_1, …}. (A small sampling sketch follows.)
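A small illustrative sketch of the two sampling schemes (the signal and levels are hypothetical): Riemann sampling records x(t) at fixed time steps, while Lebesgue sampling generates an event only when x(t) crosses a level in X.

```python
# Illustrative sketch: Riemann sampling vs. Lebesgue sampling of a signal.
import numpy as np

t = np.linspace(0.0, 10.0, 2001)
x = np.sin(t) + 0.1 * t                          # hypothetical state trajectory

riemann_samples = list(zip(t[::200], x[::200]))  # fixed sampling period

levels = [-0.5, 0.0, 0.5, 1.0]                   # X = {x1, x2, x3, x4}
lebesgue_events = []
for k in range(1, len(x)):
    for lv in levels:
        if (x[k - 1] - lv) * (x[k] - lv) < 0:    # x(t) crossed the level lv
            lebesgue_events.append((t[k], lv))   # event: state reaches a point of X
```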

Example 4: Large Communication Networks

[Figure: a large communication network decomposed into Subnet 1, Subnet 2, Subnet 3, and Subnet 4.]

Conclusion and Discussion

- Learning and optimization are based on two performance sensitivity formulas
- They lead to two fundamental approaches: gradient-based (PA) and policy-iteration-based
- The state-based model has limitations: the state space is too large, actions must be independent across states, and structure is not exploited
- Event-based optimization:
  - An event is a set of transitions, an extension of a set of states
  - Three types of events, logically related, carrying structural properties
  - The solution is based on the performance difference formula (PDF)
  - Computation is reduced, possibly to linear in system size
  - On-line implementation
  - Widely applicable (examples: some do not fit the standard Markov model, others utilize special features)

Thank You!
