Function approximation
Algorithmic Learning 64-360, Part 3d

Jianwei Zhang

University of Hamburg, MIN Faculty, Department of Informatics
Vogt-Kölln-Str. 30, D-22527 Hamburg
zhang@informatik.uni-hamburg.de

27/06/2012


Approximation

Approximation of the relation between x and y (curve, plane, hyperplane) by another function, given a limited number of data points D = \{(x_i, y_i)\}_{i=1}^{l}.


Approximation vs. Interpolation

A special case of approximation is interpolation: the model exactly matches all data points.

If many data points are given or the measurement data is affected by noise, approximation is preferred.


Approximation without Overfitting


Interpolation with Polynomials

Polynomial interpolation:

- Lagrange polynomials,
- Newton polynomials,
- Bernstein polynomials,
- Basis splines (B-splines).


Lagrange interpolation

To match l + 1 data points (x_i, y_i), i = 0, 1, ..., l, with a polynomial of degree l, the following approach of LAGRANGE can be used:

p_l(x) = \sum_{i=0}^{l} y_i L_i(x)

The interpolation polynomial in the Lagrange form is built from the basis polynomials

L_i(x) = \frac{(x - x_0)(x - x_1) \cdots (x - x_{i-1})(x - x_{i+1}) \cdots (x - x_l)}{(x_i - x_0)(x_i - x_1) \cdots (x_i - x_{i-1})(x_i - x_{i+1}) \cdots (x_i - x_l)}
       = \begin{cases} 1 & \text{if } x = x_i \\ 0 & \text{if } x = x_j,\ j \neq i \end{cases}
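As an illustration, here is a minimal NumPy sketch of the Lagrange form above; the data points in the example are made up.

```python
import numpy as np

def lagrange_eval(x, xs, ys):
    """Evaluate the Lagrange interpolation polynomial p_l at x.

    xs, ys: the l+1 data points (x_i, y_i); x may be a scalar or an array.
    """
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        Li = np.ones_like(x)
        for j, xj in enumerate(xs):
            if j != i:
                Li *= (x - xj) / (xi - xj)
        result += yi * Li
    return result

# Hypothetical data: interpolate four points with a cubic polynomial
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 0.0, 5.0]
print(lagrange_eval(1.5, xs, ys))           # value between the nodes
print(lagrange_eval(np.array(xs), xs, ys))  # reproduces ys exactly
```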


Newton Interpolation

The Newton form of the interpolation polynomial of degree l is constructed as follows:

p_l(x) = a_0 + a_1(x - x_0) + a_2(x - x_0)(x - x_1) + \cdots + a_l(x - x_0)(x - x_1) \cdots (x - x_{l-1})

This approach enables us to calculate the coefficients easily. For l = 2 the following triangular system of equations is obtained:

p_2(x_0) = a_0 = y_0
p_2(x_1) = a_0 + a_1(x_1 - x_0) = y_1
p_2(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1) = y_2
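A minimal sketch of how the coefficients a_0, ..., a_l can be computed with divided differences and the polynomial evaluated in nested (Horner-like) form; the sample points are hypothetical.

```python
import numpy as np

def newton_coefficients(xs, ys):
    """Coefficients a_0..a_l of the Newton form via divided differences."""
    xs = np.asarray(xs, dtype=float)
    a = np.array(ys, dtype=float)
    for k in range(1, len(xs)):
        # a[k:] currently holds the divided differences of order k-1
        a[k:] = (a[k:] - a[k - 1:-1]) / (xs[k:] - xs[:-k])
    return a

def newton_eval(x, xs, a):
    """Evaluate p_l(x) = a_0 + a_1(x - x_0) + ... in nested form."""
    result = a[-1]
    for xi, ai in zip(xs[-2::-1], a[-2::-1]):
        result = result * (x - xi) + ai
    return result

xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 2.0]
a = newton_coefficients(xs, ys)   # a_0 = y_0, then a_1, a_2 as in the system above
print(newton_eval(1.5, xs, a))
```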


Interpolation with Bernstein polynomials - I

Interpolation of two points with Bernstein polynomials:

y = x_0 B_{0,1}(t) + x_1 B_{1,1}(t) = x_0 (1 - t) + x_1 t


Interpolation with Bernstein polynomials - II

Interpolation of three points with Bernstein polynomials:

y = x_0 B_{0,2}(t) + x_1 B_{1,2}(t) + x_2 B_{2,2}(t) = x_0 (1 - t)^2 + x_1 \, 2t(1 - t) + x_2 t^2


Interpolation with Bernstein polynomials - III

Interpolation of four points with Bernstein polynomials:

y = x_0 B_{0,3}(t) + x_1 B_{1,3}(t) + x_2 B_{2,3}(t) + x_3 B_{3,3}(t)
  = x_0 (1 - t)^3 + x_1 \, 3t(1 - t)^2 + x_2 \, 3t^2(1 - t) + x_3 t^3


Interpolation with Bernstein polynomials - IV

The Bernstein polynomials of degree k (order k + 1) are defined as follows:

B_{i,k}(t) = \binom{k}{i} (1 - t)^{k-i} t^i, \qquad i = 0, 1, \ldots, k

Interpolation with the Bernstein polynomials B_{i,k}:

y = x_0 B_{0,k}(t) + x_1 B_{1,k}(t) + \cdots + x_k B_{k,k}(t)
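A short sketch of blending k + 1 points with the Bernstein basis defined above; the points are made up, and the same code works for vector-valued (e.g. 2D) points.

```python
import numpy as np
from math import comb

def bernstein(i, k, t):
    """Bernstein basis polynomial B_{i,k}(t) = C(k,i) * (1-t)^(k-i) * t^i."""
    return comb(k, i) * (1.0 - t) ** (k - i) * t ** i

def bernstein_blend(points, t):
    """Blend the k+1 points x_0..x_k with the Bernstein basis at parameter t."""
    k = len(points) - 1
    return sum(bernstein(i, k, t) * np.asarray(p, dtype=float)
               for i, p in enumerate(points))

# Hypothetical points for the cubic case (four points, k = 3)
pts = [0.0, 1.0, 3.0, 2.0]
print(bernstein_blend(pts, 0.0))   # = x_0
print(bernstein_blend(pts, 0.5))   # blended value in between
print(bernstein_blend(pts, 1.0))   # = x_3
```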


B-Splines

A normalized B-spline N_{i,k} of degree k is defined as follows. For k = 1:

N_{i,1}(t) = \begin{cases} 1 & \text{for } t_i \le t < t_{i+1} \\ 0 & \text{else} \end{cases}

and for k > 1, the recursive definition:

N_{i,k}(t) = \frac{t - t_i}{t_{i+k-1} - t_i} N_{i,k-1}(t) + \frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} N_{i+1,k-1}(t)

with i = 0, \ldots, m.
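A direct transcription of this recursion into Python, assuming the usual convention that 0/0 terms are treated as 0; the knot vector and evaluation point are made up.

```python
def bspline_basis(i, k, t, knots):
    """Normalized B-spline N_{i,k}(t) in the document's convention:
    k = 1 is the piecewise-constant case, k > 1 uses the recursion above."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] > knots[i]:
        left = (t - knots[i]) / (knots[i + k - 1] - knots[i]) \
               * bspline_basis(i, k - 1, t, knots)
    if knots[i + k] > knots[i + 1]:
        right = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) \
                * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

# Example: quadratic B-splines (k = 3 in this convention) on uniform knots.
# The three splines overlapping t = 2.5 sum to 1 (partition of unity).
knots = [0, 1, 2, 3, 4, 5]
print([round(bspline_basis(i, 3, 2.5, knots), 3) for i in range(3)])
```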


B-Spline-Curve

A B-spline curve of degree k is a composite function built piecewise from basis B-splines. Each piece is a polynomial of degree (k - 1), and the curve is (k - 2)-times continuously differentiable (class C^{k-2}) at the borders of the segments. The curve is constructed from polynomials that are defined by the following knot vector:

t = (t_0, t_1, t_2, \ldots, t_m, t_{m+1}, \ldots, t_{m+k}),

where

- m: depends on the number of data points
- k: the fixed degree of the B-spline curve


Examples of B-Splines

B-Splines with degree 1, 2, 3 and 4:

(Figure: the basis functions N_{j,1}, N_{j,2}, N_{j,3} and N_{j,4} plotted over the knots t_j, t_{j+1}, t_{j+2}, t_{j+3}, t_{j+4}.)

Within each knot interval, k B-splines overlap.


Examples of cubic B-Splines


B-Splines of degree k - I

The recursive definition procedure of a B-spline basis function N_{i,k}(t):


B-Splines of degree k - II

Current segments of the B-spline basis functions of degree 2, 3 and 4 for t_i \le t < t_{i+1}:


Uniform B-spline of degree 1 to 4

(Figure: uniform B-splines for k = 1, k = 2, k = 3 and k = 4.)


Non-uniform B-Splines

Degree 3:


Properties of B-splines

- Partition of unity: \sum_{i} N_{i,k}(t) = 1.
- Positivity: N_{i,k}(t) \ge 0.
- Local support: N_{i,k}(t) = 0 for t \notin [t_i, t_{i+k}].
- C^{k-2} continuity: if the knots \{t_i\} are pairwise distinct, then N_{i,k}(t) \in C^{k-2}, i.e. N_{i,k}(t) is (k - 2)-times continuously differentiable.


Construction of B-spline curves

A B-spline curve can be constructed by blending a number of predefined values (data points) with B-splines:

r(t) = \sum_{j=0}^{m} v_j \cdot N_{j,k}(t)

where the v_j are called control points (de Boor points).

Let t be a given parameter; then r(t) is a point on the B-spline curve.

If t varies from t_{k-1} to t_{m+1}, then r(t) is a (k - 2)-times continuously differentiable function (class C^{k-2}).
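A sketch of evaluating r(t) with SciPy. Note that scipy.interpolate.BSpline expects the polynomial degree, i.e. k - 1 in the document's convention; the control points and the clamped knot vector below are made-up example values.

```python
import numpy as np
from scipy.interpolate import BSpline

# Document convention: degree k (order) => polynomial degree k - 1.
k = 4                                   # cubic pieces in the document's convention
ctrl = np.array([[0, 0], [1, 2], [2, 3], [4, 3], [5, 0], [6, 1]], float)  # v_j (made up)
m = len(ctrl) - 1

# Clamped knot vector with m + k + 1 entries t_0..t_{m+k}
knots = np.concatenate(([0.0] * (k - 1),
                        np.linspace(0.0, 1.0, m - k + 3),
                        [1.0] * (k - 1)))

curve = BSpline(knots, ctrl, k - 1)     # r(t) = sum_j v_j N_{j,k}(t)
ts = np.linspace(0.0, 1.0, 5)
print(curve(ts))                        # points on the B-spline curve
```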


Calculation of control points from data points

The points v_j are identical with the data points only if k = 2 (interpolation; otherwise approximation). The control points form a convex hull of the interpolation curve. Two methods for the calculation of control points from data points:


Calculation of control points from data points

1. Solving the following system of equations (Böhm 84):

q_j = \sum_{i=0}^{m} v_i \cdot N_{i,k}(t_j), \qquad j = 0, \ldots, m

where the q_j are the data points for interpolation/approximation.

2. Learning based on gradient descent (Zhang 98).
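A sketch of method 1: set up the collocation matrix of basis values N_{i,k}(t_j) and solve the linear system for the control points. The knots, parameters and data values are made up, chosen so that the system is solvable.

```python
import numpy as np

def bspline_basis(i, k, t, knots):
    # Cox-de Boor recursion (document's convention: k = 1 is piecewise constant)
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k - 1] > knots[i]:
        left = (t - knots[i]) / (knots[i + k - 1] - knots[i]) \
               * bspline_basis(i, k - 1, t, knots)
    if knots[i + k] > knots[i + 1]:
        right = (knots[i + k] - t) / (knots[i + k] - knots[i + 1]) \
                * bspline_basis(i + 1, k - 1, t, knots)
    return left + right

# Data points q_j and the parameter values t_j where they should be matched
q = np.array([1.0, 2.0, 0.0, 3.0])
params = np.array([0.0, 1.0, 2.5, 3.9])
k = 3                                  # quadratic pieces (document convention)
knots = [0, 0, 0, 2, 4, 4, 4]          # m + k + 1 = 7 knots for m + 1 = 4 control points

# Collocation matrix A[j, i] = N_{i,k}(t_j); solve A v = q for the control points v
A = np.array([[bspline_basis(i, k, t, knots) for i in range(len(q))] for t in params])
v = np.linalg.solve(A, q)
print(v)
```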


Lattice - I


Lattice - II


Tensor 2D-NURBS


Real-world Problems

- modeling: learning from examples, self-optimized formation, prediction, ...
- control: perception-action cycle, state control, identification of dynamic systems, ...

Function approximation as a benchmark for the choice of a model


Function approximation - 1D example

An example function f(x) = 8 \sin(10x^2 + 5x + 1) with -1 < x < 1 and the correctly distributed B-splines:
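One possible way to reproduce such a 1D approximation is with SciPy's smoothing B-splines; the number of samples and the smoothing factor s below are ad-hoc choices.

```python
import numpy as np
from scipy.interpolate import splrep, splev

def f(x):
    return 8.0 * np.sin(10.0 * x**2 + 5.0 * x + 1.0)

x = np.linspace(-1.0, 1.0, 200)      # sample the function on (-1, 1)
y = f(x)

# Cubic smoothing B-spline; a smaller s places more knots and fits more closely
tck = splrep(x, y, k=3, s=0.5)
y_hat = splev(x, tck)
print("max abs error:", np.max(np.abs(y_hat - y)))
```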


Lattice

The B-spline model – a two-dimensional illustration.


Lattice (cont.)

Every n-dimensional cell (n > 1) is covered by the j-th multivariate B-spline N^j_k(\mathbf{x}). N^j_k(\mathbf{x}) is defined by the tensor product of n univariate B-splines:

N^j_k(\mathbf{x}) = \prod_{j=1}^{n} N^j_{i_j,k_j}(x_j) \qquad (1)

Therefore the shape of each B-spline, and thus the shape of the multivariate ones (Figure 2), is implicitly set by their order and their given knot distribution on each input interval.
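A tiny sketch of Eq. (1) for the bivariate case: given the univariate basis values at one input point (the numbers are made up), the multivariate B-splines are their outer product.

```python
import numpy as np

# Univariate basis values N^1_{i_1,k_1}(x_1) and N^2_{i_2,k_2}(x_2) at one input point
# (assumed precomputed, e.g. with the Cox-de Boor recursion sketched earlier).
N1 = np.array([0.10, 0.65, 0.25])   # three active B-splines along x_1
N2 = np.array([0.25, 0.50, 0.25])   # three active B-splines along x_2

# Bivariate B-splines: products of the univariate ones, as in Eq. (1)
N_multi = np.outer(N1, N2)
print(N_multi)
print(N_multi.sum())                # still a partition of unity: sums to 1
```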


Lattice (cont.)

(a) Tensor of two order-2 univariate B-splines.

(b) Tensor of one order-3 and one order-2 univariate B-spline.

(c) Tensor of two univariate B-splines of order 3.

Bivariate B-splines formed by taking the tensor of two univariate B-splines.


General requirements for an approximator

- Universality: approximation of arbitrary functions
- Generalization: good approximation without overfitting
- Adaptivity: on the basis of new data
- Parallelism: computing based on biological models
- Interpretability: at least "grey box" instead of "black box"


Importance of the Interpretability of a Model

Richard P. Feynman: "the way we have to describe nature is generally incomprehensible to us".

Albert Einstein: "it should be possible to explain the laws of physics to a barmaid".


Importance of the Interpretability of a Model (cont.)

Important reasons for the symbolic interpretability of an approximator:

- Linguistic modeling is a basis of skill transfer from an expert to a computer or robot.
- Automated learning of a transparent model facilitates the analysis, validation and monitoring in the development cycle of a model or a controller.
- Transparent models provide diverse applications in decision-support systems.


B-Spline ANFIS

In a B-spline ANFIS with n inputs x_1, x_2, ..., x_n, the rules have the following form:

Rule(i_1, i_2, ..., i_n): IF (x_1 IS N^1_{i_1,k_1}) AND (x_2 IS N^2_{i_2,k_2}) AND ... AND (x_n IS N^n_{i_n,k_n}) THEN y IS Y_{i_1 i_2 ... i_n},

where

- x_j: input j (j = 1, ..., n),
- k_j: degree of the B-spline basis functions for x_j,
- N^j_{i_j,k_j}: the i_j-th linguistic term (B-spline basis function) associated with x_j,
- i_j = 0, ..., m_j: partitioning of input j,
- Y_{i_1 i_2 ... i_n}: control point for Rule(i_1, i_2, ..., i_n),
- the "AND" operator: product.


B-Spline ANFIS (cont.)

Then the output y of the MISO control system is:

y = \sum_{i_1=1}^{m_1} \cdots \sum_{i_n=1}^{m_n} \left( Y_{i_1,\ldots,i_n} \prod_{j=1}^{n} N^j_{i_j,k_j}(x_j) \right)

This is a general B-spline model that represents the hyperplane (NUBS, nonuniform B-spline).
http://equipe.nce.ufrj.br/adriano/fuzzy/transparencias/anfis/anfis.pdf
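One way to compute this output for the two-input case: the double sum is just a bilinear form in the univariate basis vectors. The control-point matrix and basis values below are made up for illustration.

```python
import numpy as np

# Two-input case (n = 2): control points Y[i1, i2] and the active univariate
# B-spline/membership values at the current inputs x1, x2 (values made up).
Y = np.array([[1.0, 2.0, 0.5],
              [0.0, 3.0, 1.0],
              [2.0, 1.0, 4.0]])        # Y_{i1,i2}
N1 = np.array([0.2, 0.7, 0.1])         # N^1_{i1,k1}(x1)
N2 = np.array([0.5, 0.4, 0.1])         # N^2_{i2,k2}(x2)

# y = sum_{i1} sum_{i2} Y_{i1,i2} * N^1_{i1,k1}(x1) * N^2_{i2,k2}(x2)
y = np.einsum('i,ij,j->', N1, Y, N2)
print(y)
```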


Architecture of B-Spline ANFIS


MF (membership function) formulation - Tensor

Tensor of 2D-Splines:


The activation of MF by the inputs


B-Spline ANFIS: example

An example with two input variables (x and y) and one output z.

The parameters of the THEN-clauses are Z_1, Z_2, Z_3, Z_4.


B-Spline ANFIS: example (cont.)

The linguistic terms of inputs (IF-clauses):


B-Spline ANFIS: example (cont.)

The parameters of the THEN-clauses:


Example: control basis

The sample control basis consists of four rules:

Rule 1) IF x is X1 and y is Y1 THEN z is Z1
Rule 2) IF x is X1 and y is Y2 THEN z is Z2
Rule 3) IF x is X2 and y is Y1 THEN z is Z2
Rule 4) IF x is X2 and y is Y2 THEN z is Z4


Illustration of the fuzzy inference


Illustration of the fuzzy inference (2)


Illustration of the fuzzy inference (3)


Illustration of the fuzzy inference (4)


Illustration of the fuzzy inference (5)


Algorithms for Supervised Learning - I

Let \{(X, y_d)\} be a set of training data, where

- X = (x_1, x_2, ..., x_n): the vector of input data,
- y_d: the desired output for X.

The LSE is:

E = \frac{1}{2}(y_r - y_d)^2, \qquad (2)

where y_r is the current real output value during the training cycle. The goal is to find the parameters Y_{i_1,i_2,\ldots,i_n} that minimize the error in (2):

E = \frac{1}{2}(y_r - y_d)^2 \equiv \text{MIN}. \qquad (3)


Algorithms for Supervised Learning - II

Each control point Y_{i_1,\ldots,i_n} can be improved with the following gradient descent algorithm:

\Delta Y_{i_1,\ldots,i_n} = -\varepsilon \frac{\partial E}{\partial Y_{i_1,\ldots,i_n}} \qquad (4)
                          = \varepsilon \, (y_d - y_r) \prod_{j=1}^{n} N^j_{i_j,k_j}(x_j) \qquad (5)

where 0 < ε ≤ 1.
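A sketch of the update (4)-(5) for a two-input model; the basis activations, desired output and learning rate are made-up example values.

```python
import numpy as np

def anfis_output(Y, N1, N2):
    # y_r = sum_{i1,i2} Y_{i1,i2} * N^1_{i1}(x1) * N^2_{i2}(x2)
    return np.einsum('i,ij,j->', N1, Y, N2)

def train_step(Y, N1, N2, y_d, eps=0.5):
    """One gradient-descent update of all control points, as in (4)-(5)."""
    y_r = anfis_output(Y, N1, N2)
    # dE/dY_{i1,i2} = (y_r - y_d) * N^1_{i1}(x1) * N^2_{i2}(x2)
    grad = (y_r - y_d) * np.outer(N1, N2)
    return Y - eps * grad

# Made-up training situation: fixed basis activations and a desired output
Y = np.zeros((3, 3))
N1 = np.array([0.2, 0.7, 0.1])
N2 = np.array([0.5, 0.4, 0.1])
for _ in range(50):
    Y = train_step(Y, N1, N2, y_d=2.0)
print(anfis_output(Y, N1, N2))   # approaches y_d = 2.0
```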


Algorithms for Supervised Learning - III

The gradient descent algorithm ensures that the learning converges to the global minimum of the LSE function, because the second partial derivative with respect to Y_{i_1,\ldots,i_n} is constant:

\frac{\partial^2 E}{\partial Y_{i_1,\ldots,i_n}^2} = \left( \prod_{j=1}^{n} N^j_{i_j,k_j}(x_j) \right)^2 \ge 0. \qquad (6)

This means that the LSE function (2) is convex in Y_{i_1,\ldots,i_n} and therefore has only one (global) minimum.


Symbol Transformation of the Core Functions

Positive, convex core functions can be considered as fuzzy sets, for example:

\mu_B(x) = \frac{1}{1 + \left( \frac{x - 50}{10} \right)^2}


Membership-functions

(Figure: eight example membership functions (a)-(h), each plotted over the range [-5, 5] with membership values between 0 and 1.)


Introduction to fuzzy sets

- fuzzy natural-language gradations of terms like "big", "beautiful", "strong", ...
- human thought and behavior models using one-step logic:
  Driving: "IF-THEN" clauses
  Car parking: with millimeter accuracy?


Introduction to Fuzzy sets

- Use of fuzzy language instead of numerical description:
  "brake 2.52 m before the curve" → only in machine systems
  "brake shortly before the curve" → in natural language


Definitions

Fuzzy: indistinct, vague, unclear.

Fuzzy sets / fuzzy logic as a mechanism for

- fuzzy natural-language gradations of terms like "big", "beautiful", "strong", ...
- usage of fuzzy language instead of numerical description,
- abstraction of unnecessary / too complex details,
- human thought and behavior models using one-step logic.


Characteristic function vs. Membership function

For a fuzzy set A we use a generalized characteristic function \mu_A that assigns a real number from [0, 1] to each member x \in X, the "degree" of membership of x in the fuzzy set A:

\mu_A : X \to [0, 1]

\mu_A is called the membership function.

A = \{(x, \mu_A(x)) \mid x \in X\}


Membership function

Characteristics of continuous membership functions:

- positive, convex functions (some important core functions),
- subjective perception,
- not probabilistic functions.


Membership function types - I

Triangle: \mathrm{trimf}(x; a, b, c) = \max\left( \min\left( \frac{x - a}{b - a}, \frac{c - x}{c - b} \right), 0 \right)

Trapezoid: \mathrm{trapmf}(x; a, b, c, d) = \max\left( \min\left( \frac{x - a}{b - a}, 1, \frac{d - x}{d - c} \right), 0 \right)


Membership function types - II

Gaussian: \mathrm{gaussmf}(x; c, \sigma) = e^{-\frac{1}{2} \left( \frac{x - c}{\sigma} \right)^2}

B-splines: \mathrm{bsplinemf}(x; x_i, x_{i+1}, \ldots, x_{i+k})
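The three closed-form membership functions above, written directly as NumPy functions; the parameter values in the example are arbitrary.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapmf(x, a, b, c, d):
    """Trapezoidal membership function."""
    return np.maximum(
        np.minimum(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 1.0), 0.0)

def gaussmf(x, c, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

x = np.linspace(-5.0, 5.0, 11)
print(trimf(x, -2.0, 0.0, 2.0))
print(trapmf(x, -4.0, -2.0, 2.0, 4.0))
print(gaussmf(x, 0.0, 1.5))
```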


Linguistic variables

A numeric variable has numerical values:

age = 25

A linguistic variable has linguistic values (terms): age = young

A linguistic value is a fuzzy set.


Fuzzy-Partition

Fuzzy partition of the linguistic values "young", "average" and "old":


Fuzzy Logic: inference mechanisms

A fuzzy rule is formulated as follows: "IF A THEN B"

with fuzzy sets A, B and the universes X, Y.

One of the most important inference mechanisms is the generalized Modus Ponens (GMP):

Implication: IF x is A THEN y is B
Premise: x is A'
Conclusion: y is B'


Fuzzy systems for function approximation

Basic idea:

- Description of the desired control behavior through natural-language, qualitative rules.
- Quantification of linguistic values by fuzzy sets.
- Evaluation by methods of fuzzy logic or interpolation.


Fuzzy systems for function approximation

Fuzzy-rules:

"IF (a set of conditions is met) THEN (a set of consequences can be determined)"

In the premises (antecedents) of the IF-part: linguistic variables from the domain of process states;

in the conclusions (consequences) of the THEN-part: linguistic variables from the system domain.


Adaptive networks

Architecture:

Feedforward networks with different node functions


Rule Extraction

The Fuzzy-Patches (Kosko):


Rule Extraction

A Fuzzy-Rule-Patch:


Additive Systems

An additive fuzzy controller adds the "THEN"-parts of the fired rules.

Fuzzy approximation theorem:

An additive fuzzy controller can approximate any continuous function f: X → Y if X is compact.


Optimal Fuzzy-Rule-Patches

Optimal fuzzy rule patches cover the extrema of a function:


Optimal Fuzzy-Rule-Patches

Projection of the ellipsoids on the input and output axis:


Optimal Fuzzy-Rule-Patches

The size of an ellipsoid depends on the training data.


Optimal Fuzzy-Rule-Patches

Visualization of the input-output space:


Optimal Fuzzy-Rule-Patches

An example for interpolation:


Optimal Fuzzy-Rule-Patches

Data cluster along the function:


Optimal Fuzzy-Rule-Patches


Optimal Fuzzy-Rule-Patches

Approximation with Fuzzy-sets using the projections of extremes:


Approximation of a 2D-function

z = f(x, y) = 3(1 - x)^2 e^{-x^2 - (y+1)^2} - 10\left( \frac{x}{5} - x^3 - y^5 \right) e^{-x^2 - y^2} - \frac{1}{3} e^{-(x+1)^2 - y^2}
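The test function can be evaluated directly with NumPy, e.g. to generate training data for an approximator; the grid below is an arbitrary choice.

```python
import numpy as np

def f(x, y):
    """The 2D test function z = f(x, y) from the slide."""
    return (3.0 * (1.0 - x) ** 2 * np.exp(-x**2 - (y + 1.0) ** 2)
            - 10.0 * (x / 5.0 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - (1.0 / 3.0) * np.exp(-(x + 1.0) ** 2 - y**2))

# Evaluate on a coarse grid
xs, ys = np.meshgrid(np.linspace(-3, 3, 5), np.linspace(-3, 3, 5))
print(f(xs, ys))
```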


Approximation of a 2D-function

Derivatives of the function:

\frac{dz}{dx} = -6(1 - x) e^{-x^2 - (y+1)^2} - 6(1 - x)^2 x \, e^{-x^2 - (y+1)^2}
              - 10\left( \frac{1}{5} - 3x^2 \right) e^{-x^2 - y^2}
              + 20\left( \frac{1}{5}x - x^3 - y^5 \right) x \, e^{-x^2 - y^2}
              - \frac{1}{3}(-2x - 2) \, e^{-(x+1)^2 - y^2}

\frac{dz}{dy} = 3(1 - x)^2 (-2y - 2) \, e^{-x^2 - (y+1)^2}
              + 50 y^4 e^{-x^2 - y^2}
              + 20\left( \frac{1}{5}x - x^3 - y^5 \right) y \, e^{-x^2 - y^2}
              + \frac{2}{3} y \, e^{-(x+1)^2 - y^2}


Approximation of a 2D-function

\frac{d}{dx}\left( \frac{dz}{dx} \right) = 36x \, e^{-x^2 - (y+1)^2} - 18x^2 e^{-x^2 - (y+1)^2} - 24x^3 e^{-x^2 - (y+1)^2} + 12x^4 e^{-x^2 - (y+1)^2}
    + 72x \, e^{-x^2 - y^2} - 148x^3 e^{-x^2 - y^2} - 20y^5 e^{-x^2 - y^2} + 40x^5 e^{-x^2 - y^2} + 40x^2 y^5 e^{-x^2 - y^2}
    - \frac{2}{3} e^{-(x+1)^2 - y^2} - \frac{4}{3} x^2 e^{-(x+1)^2 - y^2} - \frac{8}{3} x \, e^{-(x+1)^2 - y^2}


Approximation of a 2D-function

\frac{d}{dy}\left( \frac{dz}{dy} \right) = -6(1 - x)^2 e^{-x^2 - (y+1)^2} + 3(1 - x)^2 (-2y - 2)^2 e^{-x^2 - (y+1)^2}
    + 200y^3 e^{-x^2 - y^2} - 200y^5 e^{-x^2 - y^2}
    + 20\left( \frac{1}{5}x - x^3 - y^5 \right) e^{-x^2 - y^2} - 40\left( \frac{1}{5}x - x^3 - y^5 \right) y^2 e^{-x^2 - y^2}
    + \frac{2}{3} e^{-(x+1)^2 - y^2} - \frac{4}{3} y^2 e^{-(x+1)^2 - y^2}


Global Overview of the Statistical Learning Theory - I

Let \{(x_i, y_i)\}_{i=1}^{l} be a set of data points/examples. We are searching for a function f that minimizes the following functional:

H[f] = \frac{1}{l} \sum_{i=1}^{l} V(y_i, f(x_i)) + \lambda \|f\|_K^2

where V(\cdot, \cdot) is a loss function, \|f\|_K^2 is a norm in the Hilbert space H defined by a positive kernel K, and \lambda is the regularization parameter.

The problems in modeling, data regression and pattern classification are each based on a particular kind of V(\cdot, \cdot).
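For the square-loss case (regularization networks, see the next slide), the minimizer has the form f(x) = \sum_i c_i K(x, x_i). A sketch with a Gaussian kernel on toy data follows; the coefficient formula c = (K + \lambda l I)^{-1} y assumes the 1/l-averaged loss as written above, and all data and parameter values are made up.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1D data set {(x_i, y_i)}
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(30)

lam = 1e-2                                  # regularization parameter lambda
K = gaussian_kernel(X, X, gamma=5.0)
# f(x) = sum_i c_i K(x, x_i) with c = (K + lam * l * I)^{-1} y
l = len(y)
c = np.linalg.solve(K + lam * l * np.eye(l), y)

X_test = np.linspace(-1, 1, 5).reshape(-1, 1)
f_test = gaussian_kernel(X_test, X, gamma=5.0) @ c
print(f_test)
```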


Global Overview of the Statistical Learning Theory - II

1. V(y_i, f(x_i)) = (y_i - f(x_i))^2
   (y_i: a real number, V: square error function)
   ⇒ Regularization Networks, RN

2. V(y_i, f(x_i)) = |y_i - f(x_i)|_\varepsilon
   (y_i: a real number, |\cdot|_\varepsilon: an ε-insensitive norm)
   ⇒ Support Vector Machines Regression, SVMR

3. V(y_i, f(x_i)) = |1 - y_i f(x_i)|_+
   (y_i: -1 or 1, |x|_+ = x for x ≥ 0, otherwise |x|_+ = 0)
   ⇒ Support Vector Machines Classification, SVMC

For modeling and control tasks, the first definition is the most important.


Universal Function Approximation - I

A control network can approximate all smooth functions with arbitrary precision. The general solution of this problem is:

f(x) = \sum_{i=1}^{l} c_i K(x; x_i)

where the c_i are the coefficients.


Universal Function Approximation - II

Proposition of the approximation:

For any continuous function Y defined on a compact subset of R^n and a core function K, there is a function y^*(x) = \sum_{i=1}^{l} c_i K(x; x_i) that fulfills for all x and any ε > 0:

|Y(x) - y^*(x)| < ε


Universal Function Approximation - II

Using different kernel functions leads to different models:

K(x - x_i) = \exp(-\|x - x_i\|^2)                              Gaussian RBF
K(x - x_i) = (\|x - x_i\|^2 + c^2)^{-1/2}                      Inverse multiquadratic functions
K(x - x_i) = \tanh(x \cdot x_i - \theta)                       Multilayer perceptron
K(x - x_i) = (1 + x \cdot x_i)^d                               Polynomial of degree d
K(x - x_i) = B_{2n+1}(x - x_i)                                 B-splines
K(x - x_i) = \frac{\sin\left( (d + \frac{1}{2})(x - x_i) \right)}{\sin\frac{x - x_i}{2}}   Trigonometric polynomial
... ...
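A few of these kernels written as plain Python callables; the parameter defaults (c, θ, d) and the test vectors are arbitrary example values.

```python
import numpy as np

# Some of the kernels from the table above as functions of x and x_i
kernels = {
    "gaussian_rbf":     lambda x, xi: np.exp(-np.sum((x - xi) ** 2)),
    "inv_multiquadric": lambda x, xi, c=1.0: (np.sum((x - xi) ** 2) + c**2) ** -0.5,
    "mlp_sigmoid":      lambda x, xi, theta=0.0: np.tanh(np.dot(x, xi) - theta),
    "polynomial":       lambda x, xi, d=3: (1.0 + np.dot(x, xi)) ** d,
}

x = np.array([0.2, -0.5])
xi = np.array([0.0, 0.1])
for name, k in kernels.items():
    print(name, k(x, xi))
```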


Problems

1. "Curse of dimensionality" because of the exponential dependency between the memory requirements and the dimension of the input space.

2. Aliasing within the feature extraction.

3. Unavailable target data (y).

4. Unavailable input factors.


Learning from DJ-Data

Dow-Jones-Index: can the function be modeled?


Example: Image Processing in Local Observation scenarios

A sequence of gray scale images of an object is acquired by the movement along a fixed location:


Example: Extraction of Eigenvectors

The first 6 Eigenvectors:


Example: Eigenvectors and Eigenvalues


Combination of Dimension Reduction with B-Spline Model

Eigenvectors can be partitioned by linguistic terms.

Such a combination of PCA and the B-spline model can be considered as a Neuro-Fuzzy model.


The Neuro-Fuzzy Model


The Training and Application Phases
