
Dynamic programming using radial basis functions


Page 1: Dynamic programming using radial basis functions

HAL Id: hal-01024655
https://hal.inria.fr/hal-01024655
Submitted on 16 Jul 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version: Oliver Junge, Alex Schreiber. Dynamic programming using radial basis functions. NETCO 2014, 2014, Tours, France. hal-01024655

Page 2: Dynamic programming using radial basis functions

Dynamic programming

using radial basis functions

Oliver Junge

Fakultät für Mathematik

Technische Universität München

joint work with Alex Schreiber

Page 3: Dynamic programming using radial basis functions

Problem

discrete-time control system

x_{k+1} = f(x_k, u_k),  k = 0, 1, 2, ...

f : Ω × U → Ω continuous

Ω ⊂ R^d and U ⊂ R^m compact

target set T ⊂ Ω, compact

goal: construct a feedback F : S → U, S ⊂ Ω, such that for the closed-loop system

x_{k+1} = f(x_k, F(x_k)),  x_k ∈ S,

the target T is asymptotically stable.
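For illustration, a minimal closed-loop simulation in Matlab (the dynamics f and the feedback F below are toy placeholders, not from the talk):

f = @(x,u) 0.5*x + u;     % toy dynamics (assumption)
F = @(x) -0.4*x;          % toy feedback (assumption)
x = 1;                    % initial state x_0
for k = 1:20
  x = f(x, F(x));         % closed-loop step x_{k+1} = f(x_k, F(x_k))
end                       % x is driven towards the target T = {0}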


Page 4: Dynamic programming using radial basis functions

Optimal control

cost function c : Ω × U → [0, ∞) continuous,

c(x, u) ≥ δ > 0 for x ∉ T and any u ∈ U.

accumulated cost

J(x_0, (u_k)_k) = ∑_{k=0}^∞ c(x_k, u_k),

with (x_k)_k the trajectory associated to x_0 ∈ Ω and (u_k)_k ∈ U^ℕ.

optimal value function

V(x) = inf_{(u_k)_k} J(x, (u_k)_k)


Page 5: Dynamic programming using radial basis functions

The Bellman equation

V fulfills the Bellman equation

V(x) = inf_{u∈U} { c(x, u) + V(f(x, u)) } =: L[V](x)

with boundary condition V (T ) = 0.

optimal feedback

F(x) = argmin_{u∈U} { c(x, u) + V(f(x, u)) }

(whenever the min exists)
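Once an (approximate) value function is available, the feedback is evaluated by minimizing over a finite set of control candidates; a minimal Matlab sketch (V, f, c and the control grid are placeholders, not the talk's data):

V = @(x) abs(x);                      % some approximate value function (assumption)
f = @(x,u) 0.5*x + u;                 % some dynamics (assumption)
c = @(x,u) x.^2 + 0.1*u.^2;           % some cost (assumption)
U = linspace(-1,1,21)';               % finite set of control candidates
x = 0.7;                              % current state
[~,j] = min(c(x,U) + V(f(x,U)));      % minimize u -> c(x,u) + V(f(x,u)) over U
Fx = U(j);                            % feedback value F(x)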


Page 6: Dynamic programming using radial basis functions

Numerical treatment

assume V ∈ F

approximation space A ⊂ F, dim(A) < ∞

projection Π : F → A

discretized Bellman operator

Π ∘ L : A → A

value iteration: choose V^(0) ∈ A with V^(0)(T) = 0,

V^(n+1) := Π ∘ L[V^(n)], n = 0, 1, ...

typical A: finite differences, finite elements (order p)

problem: dim(A) ∼ O(n^d) for error O(n^{−p}) (n grid points per coordinate direction)


Page 7: Dynamic programming using radial basis functions

Nonlinear approximation

Theorem [Girosi, Anzellotti, ’92]

If f ∈ H^{s,2}(R^d), s > d/2, we can find

n coefficients c_i ∈ R,

n centers x_i ∈ R^d,

and n variances σ_i > 0 such that

‖ f − ∑_{i=1}^n c_i e^{−‖x−x_i‖²/(2σ_i²)} ‖_2 = O(n^{−1}).


Page 8: Dynamic programming using radial basis functions

Scattered data interpolation

Problem

Given

sites X = {x_1, ..., x_N} ⊂ Ω ⊂ R^d,

data f_1, ..., f_N ∈ R,

find a function a ∈ A such that

a(x_i) = f_i,  i = 1, ..., N.

For A = span{a_1, ..., a_N} we get

Ac = f,  with A_ij = a_j(x_i).


Page 9: Dynamic programming using radial basis functions

Radial basis functions

radial basis functions a(·, x_j) = ϕ(‖· − x_j‖_2)

examples: Gaussian ϕ(r) = exp(−r²),  Wendland function ϕ(r) = (1 − r)_+^4 · (4r + 1)

scaling: a_j = a_j^ε = ϕ(ε‖· − x_j‖)

[Figure: the scaled basis function ϕ(ε‖x‖) on [−1, 1] for ε = 5 and ε = 1.]
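A short Matlab sketch reproducing the scaling effect shown in the figure (plotting setup assumed):

wendland = @(r) max(1-r,0).^4.*(4*r+1);    % compactly supported Wendland function
x = linspace(-1,1,401); xj = 0;            % evaluation points, one center x_j = 0
plot(x, wendland(5*abs(x-xj)), x, wendland(1*abs(x-xj)))   % eps = 5 (narrow) vs. eps = 1 (wide)
legend('\epsilon = 5','\epsilon = 1')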


Page 10: Dynamic programming using radial basis functions

The Kruzkov transform

problem: V(x) is increasing, but ϕ(x) is decreasing as ‖x‖ → ∞

Kruzkov transform: V ↦ e^{−V(·)} (the transformed function is again denoted by V)

Kruzkov-Bellman equation

V(x) = sup_{u∈U} e^{−c(x,u)} · V(f(x, u)) =: L[V](x),  x ∈ Ω \ T

with boundary condition V(T) = 1.

under the assumption c(x, u) ≥ δ > 0 for x ∉ T, the

Kruzkov-Bellman operator L is a contraction on L^∞.
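A sketch of the estimate behind this statement (using only c(x, u) ≥ δ outside T and |sup_u a_u − sup_u b_u| ≤ sup_u |a_u − b_u|; on T the boundary condition fixes the values):

\[
|L[V_1](x) - L[V_2](x)| \le \sup_{u \in U} e^{-c(x,u)}\,\bigl|V_1(f(x,u)) - V_2(f(x,u))\bigr| \le e^{-\delta}\,\|V_1 - V_2\|_\infty , \qquad x \notin T,
\]

so ‖L[V_1] − L[V_2]‖_∞ ≤ e^{−δ} ‖V_1 − V_2‖_∞, i.e. L is a contraction with constant K = e^{−δ} < 1.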


Page 11: Dynamic programming using radial basis functions

Dynamic programming using radial basis functions

approximation space

A = A_{X,ε} = span{ϕ(ε‖· − x‖_2) : x ∈ X}

interpolation operator on X

Π : F → A

discretized Kruzkov-Bellman operator

Π ∘ L : A → A

value iteration: choose V^(0) ∈ A with V^(0)(0) = 1,

V^(n+1) := Π ∘ L[V^(n)], n = 0, 1, ...


Page 12: Dynamic programming using radial basis functions

Weighted least squares

Problem

Given

sites X = {x_1, ..., x_N} ⊂ Ω ⊂ R^d,

data f_1, ..., f_N ∈ R,

approximation space A = span{a_1, ..., a_m}, m < N,

weight function w : Ω → R with associated scalar product

〈f, g〉_w := ∑_{k=1}^N f(x_k) g(x_k) w(x_k) and induced norm ‖·‖_w,

find a function a ∈ A such that

‖f − a‖_w is minimal.

Optimal coefficient vector c:

Gc = f_A

with Gram matrix G = (〈a_i, a_j〉_w)_{ij} and f_A = (〈f, a_j〉_w)_j.
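A small numerical sketch of the Gram system in Matlab (the polynomial basis, the constant weights and the data are assumptions for illustration):

X  = linspace(0,1,50)';          % sites x_1,...,x_N
fX = exp(-X).*sin(4*X);          % data f_1,...,f_N
w  = ones(size(X));              % weights w(x_k)
B  = [ones(size(X)) X X.^2];     % basis a_1,...,a_m evaluated at the sites (m = 3 < N)
G  = B'*(w.*B);                  % Gram matrix G_ij = <a_i, a_j>_w
fA = B'*(w.*fX);                 % right-hand side (f_A)_j = <f, a_j>_w
co = G\fA;                       % coefficients of the best approximation a = sum_j c_j a_j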

Page 13: Dynamic programming using radial basis functions

Moving least squares

Idea

In computing an approximation to the function f : Ω → R at

x ∈ Ω, only the values at sites x_j ∈ X close to x should play a role.

moving weight function w : Ω × Ω → R

w(x, y) small for ‖x − y‖_2 large

inner product: 〈f, g〉_{w(·,x)} := ∑_{k=1}^N f(x_k) g(x_k) w(x_k, x)

the moving least squares approximation a of the data f is

a(x) = a^x(x),

where a^x ∈ A minimizes ‖f − a^x‖_{w(·,x)},

given by solving the Gram system G^x c^x = f_A^x.
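A minimal Matlab sketch of this pointwise construction (1D toy data; the Gaussian moving weight and the local basis {1, y} are assumptions for illustration):

X  = linspace(0,1,30)';                    % sites
fX = sin(2*pi*X);                          % data
w  = @(y,x) exp(-(10*(y-x)).^2);           % moving weight, small for |y - x| large
P  = @(y) [ones(size(y,1),1) y];           % local basis A = span{1, y}
mls = @(x) P(x)*((P(X)'*(w(X,x).*P(X))) \ (P(X)'*(w(X,x).*fX)));   % a(x) = a^x(x) via G^x c^x = f_A^x
mls(0.3)                                   % evaluate the approximation at a single point x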


Page 14: Dynamic programming using radial basis functions

Shepard's method

D. Shepard, A two-dimensional interpolation function for irregularly-spaced data, Proc. 23rd Nat. Conf. ACM, 1968.

simply choose A = span{1}

Gram matrix G^x = 〈1, 1〉_{w(·,x)} = ∑_{i=1}^N w(x_i, x)

right-hand side f_A^x = 〈f, 1〉_{w(·,x)} = ∑_{i=1}^N f(x_i) w(x_i, x)

thus we get

c^x = f_A^x / G^x = ∑_{i=1}^N f(x_i) a_i(x),  where a_i(x) := w(x_i, x) / ∑_{j=1}^N w(x_j, x),

and so the Shepard approximant is

Sf(x) = c^x · 1 = ∑_{i=1}^N f(x_i) a_i(x)

advantage: Shepard approximation requires no matrix solve
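A minimal Matlab sketch (1D toy data; the inverse-square-distance weight is the classical choice used here for illustration, the slides use a Wendland function as weight instead):

X  = linspace(0,1,30)';               % sites x_1,...,x_N
fX = cos(3*X);                        % data f(x_i)
w  = @(y,x) 1./((y-x).^2 + 1e-12);    % Shepard weight
Sf = @(x) (w(X,x)'*fX)/sum(w(X,x));   % Sf(x) = sum_i f(x_i) w(x_i,x) / sum_j w(x_j,x)
Sf(0.5)                               % evaluate at a single point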

Page 15: Dynamic programming using radial basis functions

Shepard discretization of the Bellman equation

approximation space

A = span{ w(x_i, ·) / ∑_{j=1}^N w(x_j, ·) : x_i ∈ X }

Shepard approximation operator

S : F → A

discretized Kruzkov-Bellman operator

S ∘ L : A → A

value iteration as usual (cf. the Matlab code template at the end)

Page 16: Dynamic programming using radial basis functions

Convergence of the value iteration

f ↦ Sf is linear,

for each x ∈ Ω, Sf(x) is a convex combination of the values f(x_1), ..., f(x_N), therefore

the Shepard operator S : (L^∞, ‖·‖_∞) → (A, ‖·‖_∞) has norm 1,

thus we get

Lemma

Value iteration with the discretized Kruzkov-Bellman operator

S ∘ L : (A, ‖·‖_∞) → (A, ‖·‖_∞) converges to the unique fixed point of S ∘ L.
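Spelled out as an estimate (with K = e^{−δ} the contraction constant of the Kruzkov-Bellman operator L):

\[
\|S L[V_1] - S L[V_2]\|_\infty \le \|L[V_1] - L[V_2]\|_\infty \le K\,\|V_1 - V_2\|_\infty ,
\]

since S has norm 1; Banach's fixed point theorem then gives the unique fixed point and the convergence of the iteration.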


Page 17: Dynamic programming using radial basis functions

Convergence for fill distance → 0

fill distance of X ⊂ Ω

h = h(X, Ω) = sup_{x∈Ω} min_{x_j∈X} ‖x − x_j‖_2

If f : Ω → R is Lipschitz continuous with constant L, then

‖f − Sf‖_∞ ≤ C L h

for some constant C > 0.


Page 18: Dynamic programming using radial basis functions

Convergence for fill distance → 0

sequence (X_n)_n of node sets, X_n ⊂ Ω, fill distances h_n,

Shepard operators S_n,

K < 1 contraction constant of L,

V fixed point of L, V_n fixed point of S_n ∘ L

Theorem

If V is Lipschitz continuous, then

‖V − V_n‖_∞ ≤ (C L / (1 − K)) · h_n
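Sketch of the argument (a standard fixed-point perturbation estimate; as on the slides, L denotes both the Kruzkov-Bellman operator and the Lipschitz constant of V): using V = L[V], V_n = S_n L[V_n], ‖S_n‖ = 1 and the bound ‖V − S_n V‖_∞ ≤ C L h_n from the previous slide,

\[
\|V - V_n\|_\infty \le \|V - S_n V\|_\infty + \|S_n L[V] - S_n L[V_n]\|_\infty \le C L h_n + K\,\|V - V_n\|_\infty ,
\]

and rearranging gives ‖V − V_n‖_∞ ≤ C L h_n / (1 − K).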


Page 19: Dynamic programming using radial basis functions

Example 1: A simple 1D example

f(x, u) = a u x,  c(x, u) = a x,  x ∈ [0, 1], u ∈ [−1, 1]

optimal feedback u(x) = −1

optimal value function V(x) = x

nodes X_k equidistant with spacing 1/k

T = [0, 1/(2k)]

U = −1 : 0.1 : 1

φ_σ : Wendland function of order 4, σ = k/5


Page 20: Dynamic programming using radial basis functions

Example 1: A simple 1D example

[Figure: log-log plot of the L^∞-error of the approximate value function versus the fill distance h_k = 1/k; both axes range from 10^{−3} to 10^{−1}.]


Page 21: Dynamic programming using radial basis functions

Example 2: shortest path, geometrically complicated state constraints

We consider a boat in the Mediterranean Sea around Greece

which moves with constant speed 1, i.e.

f(x, u) = x + h u,  c(x, u) ≡ 1,  x ∈ neighborhood of Greece

with time step h = 0.1 and u ∈ {u ∈ R^2 : ‖u‖ = 1}

T = neighborhood of the harbour of Athens

X = equidistant nodes in the sea on a 275 × 257 grid (50301 nodes)

U = {exp(2πi j/20) : j = 0, ..., 19} (20 unit directions; see the sketch below)

φσ : Wendland function of order 4, σ = 10

CPU time: 6 secs
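A small Matlab sketch of this setup (variable names are illustrative, not the authors' code):

h = 0.1;                        % time step
U = exp(2i*pi*(0:19)'/20);      % 20 controls on the unit circle, as complex numbers
U = [real(U) imag(U)];          % the same controls as unit vectors in R^2
f = @(x,u) x + h*u;             % boat dynamics: constant speed 1 in direction u
c = @(x,u) 1;                   % unit cost per step, so the accumulated cost counts steps to T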


Page 22: Dynamic programming using radial basis functions

Example 2: shortest path, geometrically complicated state constraints

[Figure: contour plot of the computed value function (travel time to the harbour of Athens) over the sea around Greece, with contour levels labelled from about 2 up to 22.]


Page 23: Dynamic programming using radial basis functions

Example 3: inverted pendulum, highly nonlinear dynamics

[Figure: schematic of the pendulum on a cart: cart mass M, pendulum mass m, length ℓ, force u on the cart, pendulum angle ϕ.]

f : equations of the forced pendulum

c : quadratic deviation from the origin + quadratic in control (see the sketch below)

T : neighborhood of the origin

X : equidistant grid of 100 × 100 nodes

U = −128 : 8 : 128

φσ : Wendland function of order 4, σ = 2.22

CPU time: 7 secs
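One possible reading of this cost in Matlab (the weights are assumptions, not the values used in the talk):

Q = diag([1 0.1]); R = 0.01;             % assumed weights
c = @(x,u) sum((x*Q).*x,2) + R*u.^2;     % quadratic in the state deviation (rows of x) and in the control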

Page 24: Dynamic programming using radial basis functions

Example 3: inverted pendulum, highly nonlinear dynamics

[Figure: the computed value function of the inverted pendulum over the (ϕ, ϕ̇) state space (color-coded).]


Page 25: Dynamic programming using radial basis functions

Example 3: inverted pendulum, highly nonlinear dynamics

[Figure: relative error ‖v − v_k‖_∞ / ‖v‖_∞ of the computed value function, plotted against k (horizontal axis from 0.05 to 0.1).]


Page 26: Dynamic programming using radial basis functions

Example 4: magnetic wheel, 3D example

[Figure: schematic of the magnetic wheel (track and magnet) with gap s, current J, voltage U, resistance R and inductances L_N, L_s.]


Page 27: Dynamic programming using radial basis functions

Example 4: magnetic wheel, 3D example

ṡ = v,

v̇ = C J² / (4 m_m s²) − µ g,

J̇ = (1 / (L_s + C/(2s))) · (−R J + (C/(2s²)) J v + U),

cost function c quadratic in s, v and u

Ω: suitably chosen box

U = {6 · 10³ u³ | u ∈ {−1, −0.99, ..., 0.99, 1}}

T : neighborhood of the equilibrium (0.01, 0, 17.155)

X : equidistant grid of 30× 30× 30 nodes

φσ : Wendland function of order 4, σ = 11.2

CPU time: 60 secs

Page 28: Dynamic programming using radial basis functions

Example 4: magnetic wheel, 3D example

[Figure: two 3D views of the computed value function over the (s, v, J) state space.]


Page 29: Dynamic programming using radial basis functions

Matlab code template

f = @(x,u) ...
c = @(x,u) ...
phi = @(r) max(spones(r)-r,0).^4.*(4*r+spones(r));   % Wendland function (sparse-friendly)
T = [0 0]; v_T = 1;                                  % target point and its value (Kruzkov boundary condition)
shepard = @(A) spdiags(1./sum(A')',0,size(A,1),size(A,1))*A;   % row normalization -> Shepard weights
S = [8,10];                                          % half-widths of the state space box
L = 33; U = linspace(-128,128,L)';                   % discrete control set
N = 100; X1 = linspace(-1,1,N);
[XX,YY] = meshgrid(X1*S(1),X1*S(2)); X = [XX(:) YY(:)];   % grid of nodes
ep = 1/sqrt((4*prod(S)*20/N^2)/pi);                  % shape parameter
A = shepard(phi(ep*sdistm(f(X,U),[T;X],1/ep)));      % sdistm: pairwise distance matrix with cutoff 1/ep (helper, not built-in)
C = c(X,U);
v = zeros(N^2+1,1); v0 = ones(size(v)); TOL = 1e-12;
while norm(v-v0,inf)/norm(v,inf) > TOL               % value iteration for the Shepard-discretized operator
  v0 = v;
  v = [v_T; max(reshape(C.*(A*v),L,N^2))'];
end
contour(...


Page 30: Dynamic programming using radial basis functions

Conclusion

Pros

simple convergence theory

simple implementation, independent of state dimension

easy to incorporate complicated state constraints

Cons

delicate choice of the shape parameter

does not solve the curse of dimension ;-)
