HAL Id: hal-01024655, https://hal.inria.fr/hal-01024655
Submitted on 16 Jul 2014
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Dynamic programming using radial basis functions. Oliver Junge, Alex Schreiber.
To cite this version: Oliver Junge, Alex Schreiber. Dynamic programming using radial basis functions. NETCO 2014, 2014, Tours, France. hal-01024655.
Dynamic programming using radial basis functions
Oliver Junge
Fakultät für Mathematik
Technische Universität München
joint work with Alex Schreiber
Problem
discrete-time control system
$$x_{k+1} = f(x_k, u_k), \qquad k = 0, 1, 2, \ldots,$$
$f : \Omega \times U \to \Omega$ continuous, $\Omega \subset \mathbb{R}^d$ and $U \subset \mathbb{R}^m$ compact
target set $T \subset \Omega$, compact
goal: construct a feedback $F : S \to U$, $S \subset \Omega$, such that for the closed-loop system
$$x_{k+1} = f(x_k, F(x_k)), \qquad x_k \in S,$$
the target $T$ is asymptotically stable.
Optimal control
cost function $c : \Omega \times U \to [0, \infty)$ continuous, with $c(x, u) \ge \delta > 0$ for $x \notin T$ and any $u \in U$
accumulated cost
$$J(x_0, (u_k)_k) = \sum_{k=0}^{\infty} c(x_k, u_k),$$
with $(x_k)_k$ the trajectory associated to $x_0 \in \Omega$ and $(u_k)_k \in U^{\mathbb{N}}$
optimal value function
$$V(x) = \inf_{(u_k)_k} J(x, (u_k)_k)$$
The Bellman equation
$V$ fulfills the Bellman equation
$$V(x) = \inf_{u \in U} \{ c(x, u) + V(f(x, u)) \} =: L[V](x)$$
with boundary condition $V(T) = 0$.
optimal feedback
$$F(x) = \operatorname*{argmin}_{u \in U} \{ c(x, u) + V(f(x, u)) \}$$
(whenever the min exists)
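For illustration, a minimal MATLAB sketch of evaluating this feedback at one state over a finite control set; the handles f, c, V below are hypothetical stand-ins, not the examples from these slides:

f = @(x,u) 0.9*u.*x;              % stand-in dynamics
c = @(x,u) abs(x) + 0*u;          % stand-in cost
V = @(x) abs(x);                  % stand-in value function
U = linspace(-1,1,21)';           % discretized control set
x = 0.5;                          % state at which to evaluate F
[~,j] = min(c(x,U) + V(f(x,U)));  % minimize over the control samples
Fx = U(j);                        % approximately optimal control at x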
Numerical treatment
assume $V \in \mathcal{F}$
approximation space $\mathcal{A} \subset \mathcal{F}$, $\dim(\mathcal{A}) < \infty$
projection $\Pi : \mathcal{F} \to \mathcal{A}$
discretized Bellman operator
$$\Pi \circ L : \mathcal{A} \to \mathcal{A}$$
value iteration: choose $V^{(0)} \in \mathcal{A}$ with $V^{(0)}(T) = 0$,
$$V^{(n+1)} := \Pi \circ L[V^{(n)}], \qquad n = 0, 1, \ldots$$
typical $\mathcal{A}$: finite differences, finite elements (order $p$)
problem: $\dim(\mathcal{A}) \sim O(n^d)$ for error $O(n^{-p})$
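As a concrete (hypothetical) instance, a minimal sketch of such a value iteration on a 1D grid, with $\Pi$ realized by piecewise linear interpolation between the nodes; the dynamics f, cost c, and grid are stand-ins:

f = @(x,u) x + 0.1*u;                 % stand-in dynamics
c = @(x,u) 0.1*abs(x) + 0*u;          % stand-in cost, positive off T = {0}
X = linspace(-1,1,101)';              % grid nodes
U = linspace(-1,1,21);                % discretized control set
V = zeros(size(X));                   % V^(0)
for n = 1:200                         % fixed number of sweeps for simplicity
  Q = zeros(numel(X),numel(U));
  for j = 1:numel(U)
    xn = min(max(f(X,U(j)),-1),1);    % clip image points to the domain
    Q(:,j) = c(X,U(j)) + interp1(X,V,xn);   % c + Pi[V] at f(x,u)
  end
  V = min(Q,[],2);                    % V^(n+1) = Pi L[V^(n)]
end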
Nonlinear approximation
Theorem [Girosi, Anzellotti, ’92]
If $f \in H^{s,2}(\mathbb{R}^d)$, $s > d/2$, we can find $n$ coefficients $c_i \in \mathbb{R}$, $n$ centers $x_i \in \mathbb{R}^d$, and $n$ variances $\sigma_i > 0$ such that
$$\left\| f - \sum_{i=1}^{n} c_i \, e^{-\|x - x_i\|^2 / (2\sigma_i^2)} \right\|_\infty^2 = O(n^{-1}).$$
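A simplified numerical illustration in MATLAB (weaker than the theorem: the centers and variances are fixed here, only the coefficients $c_i$ are fitted by least squares; the target function is a hypothetical stand-in):

ftarget = @(x) sin(pi*x);             % stand-in target function
n = 20;                               % number of Gaussians
xi = linspace(-1,1,n);                % fixed centers x_i
sg = 2/n;                             % fixed common width sigma_i
x = linspace(-1,1,400)';              % evaluation points
G = exp(-(x - xi).^2/(2*sg^2));       % basis matrix (implicit expansion, R2016b+)
cc = G \ ftarget(x);                  % least-squares coefficients c_i
err = max(abs(ftarget(x) - G*cc));    % sup-norm error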
Scattered data interpolation
Problem
Given
sites $X = \{x_1, \ldots, x_N\} \subset \Omega \subset \mathbb{R}^d$
data $f_1, \ldots, f_N \in \mathbb{R}$,
find a function $a \in \mathcal{A}$ such that
$$a(x_i) = f_i, \qquad i = 1, \ldots, N.$$
For $\mathcal{A} = \operatorname{span}\{a_1, \ldots, a_N\}$ we get the linear system
$$Ac = f, \qquad \text{with } A_{ij} = a_j(x_i).$$
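In MATLAB, for instance (hypothetical 1D data, Gaussian basis):

phi = @(r) exp(-r.^2);            % Gaussian radial basis function
X = sort(rand(15,1));             % sites x_1,...,x_N
fX = cos(3*X);                    % data values f_i (hypothetical)
A = phi(abs(X - X'));             % interpolation matrix A_ij = a_j(x_i)
c = A \ fX;                       % solve Ac = f
a = @(x) phi(abs(x - X'))*c;      % the interpolant a(x), x a column vector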
Radial basis functions
radial basis functions $a(\cdot, x_j) = \varphi(\|\cdot - x_j\|_2)$
examples: Gaussian $\varphi(r) = \exp(-r^2)$; Wendland function $\varphi(r) = (1-r)_+^4 \, (4r+1)$
scaling: $a_j = a_j^\varepsilon = \varphi(\varepsilon \|\cdot - x_j\|)$
[Figure: the scaled basis function $a_j^\varepsilon$ on $[-1, 1]$ for $\varepsilon = 5$ and $\varepsilon = 1$.]
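A short MATLAB sketch reproducing such a plot with the Wendland function:

phi = @(r) max(1-r,0).^4.*(4*r+1);    % Wendland function
xj = 0; x = linspace(-1,1,401);       % center and evaluation points
plot(x, phi(5*abs(x-xj)), x, phi(1*abs(x-xj)));
legend('\epsilon = 5','\epsilon = 1');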
The Kruzkov transform
problem: $V(x)$ is increasing, but $\varphi(\|x\|)$ is decreasing as $\|x\| \to \infty$
Kruzkov transform: $V \mapsto e^{-V(\cdot)}$ (the transformed function is again denoted by $V$)
Kruzkov-Bellman equation
$$V(x) = \sup_{u \in U} \left\{ e^{-c(x,u)} \cdot V(f(x, u)) \right\} =: L[V](x), \qquad x \in \Omega \setminus T,$$
with boundary condition $V(T) = 1$.
under the assumption $c(x, u) \ge \delta > 0$ for $x \notin T$, the Kruzkov-Bellman operator $L$ is a contraction on $L^\infty$.
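This follows from the standard estimate $|\sup_u a_u - \sup_u b_u| \le \sup_u |a_u - b_u|$ together with $e^{-c(x,u)} \le e^{-\delta}$ for $x \notin T$:
$$|L[V](x) - L[W](x)| \le \sup_{u \in U} e^{-c(x,u)} \, |V(f(x,u)) - W(f(x,u))| \le e^{-\delta} \, \|V - W\|_\infty,$$
so $L$ contracts with constant at most $e^{-\delta} < 1$.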
Dynamic programming using radial basis functions
approximation space
$$\mathcal{A} = \mathcal{A}_{X,\varepsilon} = \operatorname{span}\{\varphi(\varepsilon \|\cdot - x\|_2) : x \in X\}$$
interpolation operator on $X$
$$\Pi : \mathcal{F} \to \mathcal{A}$$
discretized Kruzkov-Bellman operator
$$\Pi \circ L : \mathcal{A} \to \mathcal{A}$$
value iteration: choose $V^{(0)} \in \mathcal{A}$ with $V^{(0)}(0) = 1$,
$$V^{(n+1)} := \Pi \circ L[V^{(n)}], \qquad n = 0, 1, \ldots$$
Weighted least squares
Problem
Given
sites $X = \{x_1, \ldots, x_N\} \subset \Omega \subset \mathbb{R}^d$,
data $f_1, \ldots, f_N \in \mathbb{R}$,
approximation space $\mathcal{A} = \operatorname{span}\{a_1, \ldots, a_m\}$, $m < N$,
weight function $w : \Omega \to \mathbb{R}$ with associated scalar product $\langle f, g \rangle_w := \sum_{k=1}^{N} f(x_k) g(x_k) w(x_k)$ and induced norm $\|\cdot\|_w$,
find a function $a \in \mathcal{A}$ such that
$$\|f - a\|_w \overset{!}{=} \min$$
Optimal coefficient vector $c$:
$$Gc = f_{\mathcal{A}}$$
with Gram matrix $G = (\langle a_i, a_j \rangle_w)_{ij}$ and $f_{\mathcal{A}} = (\langle f, a_j \rangle_w)_j$.
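A minimal MATLAB sketch of assembling and solving this Gram system, for a hypothetical polynomial basis on 1D sites:

X = linspace(0,1,50)';            % sites x_k
fX = exp(-X) + 0.05*randn(50,1);  % noisy data (hypothetical)
w = ones(50,1);                   % weights w(x_k)
B = [ones(50,1), X, X.^2];        % basis a_1,...,a_m evaluated at the sites
G = B'*(w.*B);                    % Gram matrix G_ij = <a_i,a_j>_w
fA = B'*(w.*fX);                  % right-hand side (f_A)_j = <f,a_j>_w
cc = G \ fA;                      % optimal coefficients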
Moving least squares
Idea
In computing an approximation to the function $f : \Omega \to \mathbb{R}$ at $x \in \Omega$, only the values at sites $x_j \in X$ close to $x$ should play a role.
moving weight function $w : \Omega \times \Omega \to \mathbb{R}$, with $w(x, y)$ small for $\|x - y\|_2$ large
inner product: $\langle f, g \rangle_{w(\cdot,x)} := \sum_{k=1}^{N} f(x_k) g(x_k) w(x_k, x)$
the moving least squares approximation $a$ of the data $f$ is
$$a(x) = a_x(x),$$
where $a_x \in \mathcal{A}$ minimizes $\|f - a_x\|_{w(\cdot,x)}$, given by solving the Gram system $G^x c^x = f_{\mathcal{A}}^x$
Shepard's method [D. Shepard, A two-dimensional interpolation function for irregularly-spaced data, Proc. 23rd Nat. Conf. ACM, 1968]
simply choose $\mathcal{A} = \operatorname{span}\{1\}$
Gram matrix $G^x = \langle 1, 1 \rangle_{w(\cdot,x)} = \sum_{i=1}^{N} w(x_i, x)$
right-hand side $f_{\mathcal{A}}^x = \langle f, 1 \rangle_{w(\cdot,x)} = \sum_{i=1}^{N} f(x_i) w(x_i, x)$
thus we get
$$c^x = f_{\mathcal{A}}^x / G^x = \sum_{i=1}^{N} f(x_i) \underbrace{\frac{w(x_i, x)}{\sum_{j=1}^{N} w(x_j, x)}}_{=: a_i(x)}$$
and so the Shepard approximant is
$$Sf(x) = c^x \cdot 1 = \sum_{i=1}^{N} f(x_i) a_i(x)$$
advantage: Shepard approximation requires no matrix solve
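A minimal MATLAB sketch (hypothetical 1D data, Wendland weights):

phi = @(r) max(1-r,0).^4.*(4*r+1);    % Wendland weight function
ep = 10;                              % shape parameter
X = linspace(0,1,30);                 % sites x_i
fX = sin(2*pi*X)';                    % data f(x_i) (hypothetical)
x = linspace(0,1,200)';               % evaluation points
W = phi(ep*abs(x - X));               % weights w(x_i, x)
a = W./sum(W,2);                      % basis a_i(x); each row sums to 1
Sf = a*fX;                            % Shepard approximant Sf(x); no linear solve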
Shepard discretization of the Bellman equation
approximation space
$$\mathcal{A} = \operatorname{span}\left\{ \frac{w(x_i, \cdot)}{\sum_{j=1}^{N} w(x_j, \cdot)} : x_i \in X \right\}$$
Shepard approximation operator
$$S : \mathcal{F} \to \mathcal{A}$$
discretized Kruzkov-Bellman operator
$$S \circ L : \mathcal{A} \to \mathcal{A}$$
value iteration as usual
Convergence of the value iteration
$f \mapsto Sf$ is linear,
for each $x \in \Omega$, $Sf(x)$ is a convex combination of the values $f(x_1), \ldots, f(x_N)$, therefore
the Shepard operator $S : (L^\infty, \|\cdot\|_\infty) \to (\mathcal{A}, \|\cdot\|_\infty)$ has norm 1,
thus we get
Lemma
Value iteration with the discretized Kruzkov-Bellman operator $S \circ L : (\mathcal{A}, \|\cdot\|_\infty) \to (\mathcal{A}, \|\cdot\|_\infty)$ converges to the unique fixed point of $S \circ L$.
Convergence for fill distance → 0
fill distance of $X \subset \Omega$:
$$h = h(X, \Omega) = \sup_{x \in \Omega} \min_{x_j \in X} \|x - x_j\|_2$$
If $f : \Omega \to \mathbb{R}$ is Lipschitz continuous with constant $L$, then
$$\|f - Sf\|_\infty \le C L h$$
for some constant $C > 0$.
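The fill distance can be estimated numerically by sampling $\Omega$ on a fine grid; a MATLAB sketch for $\Omega = [0,1]^2$ and a hypothetical random node set:

X = rand(200,2);                          % node set (hypothetical)
[gx,gy] = meshgrid(linspace(0,1,101));    % fine sample of Omega
G = [gx(:) gy(:)];
D = sqrt((G(:,1)-X(:,1)').^2 + (G(:,2)-X(:,2)').^2);   % all pairwise distances
h = max(min(D,[],2));                     % sup over Omega of min over the nodes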
Convergence for fill distance → 0
sequence $(X_n)_n$ of node sets, $X_n \subset \Omega$, fill distances $h_n$,
Shepard operators $S_n$,
$K < 1$ contraction constant of $L$,
$V$ fixed point of $L$, $V_n$ fixed point of $S_n \circ L$
Theorem
If $V$ is Lipschitz continuous, then
$$\|V - V_n\|_\infty \le \frac{CL}{1 - K} \, h_n$$
Example 1: a simple 1D example
$f(x, u) = a u x$, $c(x, u) = a x$, $x \in [0, 1]$, $u \in [-1, 1]$
optimal feedback $u(x) = -1$
optimal value function $V(x) = x$
nodes $X_k$ equidistant with spacing $1/k$
$T = [0, 1/(2k)]$
$U = -1:0.1:1$
$\varphi_\sigma$: Wendland function of order 4, $\sigma = k/5$
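A self-contained MATLAB sketch of the Shepard value iteration for a problem of this 1D type; f and c below are hypothetical stand-ins for the slide's example, and the way $\sigma$ scales the distances is an assumption:

k = 100;                                  % grid parameter
X = (0:k)'/k;                             % equidistant nodes, spacing 1/k
U = -1:0.1:1;                             % control set
f = @(x,u) min(max(x.*(1+0.5*u),0),1);    % stand-in dynamics, clipped to [0,1]
c = @(x,u) 0.5*x + 0*u;                   % stand-in cost
phi = @(r) max(1-r,0).^4.*(4*r+1);        % Wendland function
sg = k/5;                                 % shape parameter (assumed distance scaling)
v = ones(size(X));                        % Kruzkov-transformed V^(0)
for n = 1:500
  vn = -inf(size(X));
  for j = 1:numel(U)
    W = phi(sg*abs(f(X,U(j)) - X'));      % Shepard weights at the image points
    Sv = (W*v)./sum(W,2);                 % Shepard interpolation of v
    vn = max(vn, exp(-c(X,U(j))).*Sv);    % Kruzkov-Bellman sup over u
  end
  vn(X <= 1/(2*k)) = 1;                   % boundary condition on T
  v = vn;
end
V = -log(v);                              % back-transformed value function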
Example 1: a simple 1D example
[Figure: log-log plot of the $L^\infty$-error against the fill distance $h_k = 1/k$, both axes ranging from $10^{-3}$ to $10^{-1}$.]
Example 2: shortest path, geometrically complicated state constraints
We consider a boat in the Mediterranean Sea around Greece which moves with constant speed 1, i.e.
$$f(x, u) = x + hu, \qquad c(x, u) \equiv 1, \qquad x \in \text{a neighborhood of Greece},$$
with time step $h = 0.1$ and $u \in \{u \in \mathbb{R}^2 : \|u\| = 1\}$
$T$ = a neighborhood of the harbour of Athens
$X$ = equidistant nodes in the sea on a $275 \times 257$ grid (50301 nodes)
$U = \{\exp(2\pi i j / 20) : j = 0, \ldots, 19\}$
$\varphi_\sigma$: Wendland function of order 4, $\sigma = 10$
CPU time: 6 secs
Example 2: shortest path, geometrically complicated state constraints
[Figure: contour plot of the computed value function (travel time to the harbour of Athens) over the sea around Greece; contour levels range from about 2 to 22.]
Example 3: inverted pendulum, highly nonlinear dynamics
[Figure: cart-pendulum diagram with cart mass $M$, pendulum mass $m$, length $\ell$, angle $\varphi$, and control force $u$.]
$f$: equations of the forced pendulum
$c$: quadratic deviation from the origin plus a quadratic penalty on the control
$T$: a neighborhood of the origin
$X$: equidistant grid of $100 \times 100$ nodes
$U = -128:8:128$
$\varphi_\sigma$: Wendland function of order 4, $\sigma = 2.22$
CPU time: 7 secs
Example 3: inverted pendulum, highly nonlinear dynamics
[Figure: the computed value function over the $(\varphi, \dot{\varphi})$ phase space.]
Example 3: inverted pendulum, highly nonlinear dynamics
[Figure: relative error $\|v - v_k\|_\infty / \|v\|_\infty$ of the computed value function, plotted over $k$.]
Example 4: magnetic wheel, 3D example
[Figure: schematic of the magnetic wheel: magnet and track at gap $s$, electrical circuit with current $J$, voltage $U$, resistance $R$, and inductances $L_N$, $L_s$.]
Example 4: magnetic wheel, 3D example
$$\dot{s} = v, \qquad \dot{v} = \frac{C J^2}{4 m_m s^2} - \mu g, \qquad \dot{J} = \frac{1}{L_s + \frac{C}{2s}} \left( -RJ + \frac{C}{2s^2} J v + U \right)$$
cost function $c$ quadratic in $s$, $v$, and $u$
$\Omega$: a suitably chosen box
$U = \{6 \cdot 10^3 u^3 \mid u \in \{-1, -0.99, \ldots, 0.99, 1\}\}$
$T$: a neighborhood of the equilibrium $(0.01, 0, 17.155)$
$X$: equidistant grid of $30 \times 30 \times 30$ nodes
$\varphi_\sigma$: Wendland function of order 4, $\sigma = 11.2$
CPU time: 60 secs
Example 4: magnetic wheel, 3D example
[Figure: the computed value function over the three-dimensional $(s, v, J)$ state space, shown from two viewpoints.]
Matlab code template
f = @(x,u) ...                        % dynamics (problem specific)
c = @(x,u) ...                        % cost; the iteration below expects c to return exp(-cost)
phi = @(r) max(spones(r)-r,0).^4.*(4*r+spones(r));   % Wendland function, sparse-safe
T = [0 0]; v_T = 1;                   % target node and its boundary value
shepard = @(A) spdiags(1./sum(A')',0,size(A,1),size(A,1))*A;   % row-normalization yields Shepard weights
S = [8,10];                           % half-widths of the state space box
L = 33; U = linspace(-128,128,L)';    % discretized control set
N = 100; X1 = linspace(-1,1,N);
[XX,YY] = meshgrid(X1*S(1),X1*S(2)); X = [XX(:) YY(:)];   % equidistant nodes
ep = 1/sqrt((4*prod(S)*20/N^2)/pi);   % shape parameter
A = shepard(phi(ep*sdistm(f(X,U),[T;X],1/ep)));   % sdistm: sparse distance matrix (authors' helper)
C = c(X,U);
v = zeros(N^2+1,1); v0 = ones(size(v)); TOL = 1e-12;
while norm(v-v0,inf)/norm(v,inf) > TOL
  v0 = v;
  v = [v_T; max(reshape(C.*(A*v),L,N^2))'];   % one Kruzkov-Bellman/Shepard step
end
contour(...
Conclusion
Pros
simple convergence theory
simple implementation, independent of state dimension
easy to incorporate complicated state constraints
Cons
delicate choice of the shape parameter
does not solve the curse of dimension ;-)