An Introduction to Orthogonal Polynomials
by Marek Rychlik
Copyright © Marek Rychlik, 2009. All rights reserved.
March 14, 2009
Orthogonal polynomials in function spaces
We tend to think of scientific data as having some sort of continuity. This allows us to approximate these data by special functions, such as polynomials or finite trigonometric series. A quantitative measure of the quality of these approximations is necessary; it is typically given by a norm.
Definition 1. The space of square-integrable functions on the interval [−1, 1] is a vector space consisting of all measurable functions f: [−1, 1] → ℝ such that

∫_{−1}^{1} f(x)² dx < ∞.

The integral is in the sense of Lebesgue.

This space is denoted by L²([−1, 1]). The space is endowed with an inner product:

⟨f, g⟩ = ∫_{−1}^{1} f(x) g(x) dx.
Remarks on convergence
Definition 2. A Cauchy sequence in a metric space (V, d), where d: V × V → ℝ is a metric, is a sequence (x_n)_{n=1}^{∞} with x_n ∈ V such that for every ε > 0 there is N such that for all m, n ≥ N we have:

d(x_m, x_n) < ε.
Definition 3. A Banach space is a normed space (V , ‖ · ‖) which is complete as
a metric space, i.e. in which every Cauchy sequence converges. The metric is
given by d(u, v) = ‖u− v‖.
Definition 4. A Hilbert space is an inner product space (V, ⟨·, ·⟩) which is a Banach space as a normed space with the norm ‖u‖ = √⟨u, u⟩.
Theorem 5. L²([−1, 1]) is a Hilbert space.
Orthogonal sets in L²([−1, 1])
Studying orthogonality in L²([−1, 1]) has been one of the most fruitful human endeavors, as it led to the advent of Fourier theory and its modern continuation, wavelet theory.
It is a standard result in the theory of Fourier series that the following set is orthogonal (and every element of it, except the constant function 1, is also normalized):

{1} ∪ {cos(nπx)}_{n=1}^{∞} ∪ {sin(nπx)}_{n=1}^{∞}.
This is equivalent to the vanishing of certain integrals of trigonometric functions:

∫_{−1}^{1} cos(nπx) dx = 0,

∫_{−1}^{1} cos(nπx) cos(mπx) dx = 0 for m ≠ n,

∫_{−1}^{1} cos(nπx) sin(mπx) dx = 0,

∫_{−1}^{1} sin(nπx) sin(mπx) dx = 0 for m ≠ n,

∫_{−1}^{1} cos²(nπx) dx = 1,

∫_{−1}^{1} sin²(nπx) dx = 1.
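These identities can be spot-checked numerically. The following Python sketch is our illustration, not part of the original notes; the helper `simpson` is ours, and it approximates three of the integrals with the composite Simpson rule:

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule with n (even) subintervals.
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# Spot-check the orthogonality relations on [-1, 1] for n = 2, m = 3.
n, m = 2, 3
i_cc = simpson(lambda x: math.cos(n*math.pi*x) * math.cos(m*math.pi*x), -1, 1)
i_ss = simpson(lambda x: math.sin(n*math.pi*x) * math.sin(m*math.pi*x), -1, 1)
i_c2 = simpson(lambda x: math.cos(n*math.pi*x) ** 2, -1, 1)
print(i_cc, i_ss, i_c2)   # approximately 0, 0 and 1
```

With 2000 subintervals the quadrature error is far below 10⁻⁸, so the printed values agree with the exact integrals 0, 0 and 1 to many digits.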
Theorem 6. Every function f ∈ L²([−1, 1]) admits a Fourier series representation:

f = a_0/2 + ∑_{n=1}^{∞} (a_n cos(nπx) + b_n sin(nπx))

where:

a_n = ⟨f, cos(nπx)⟩ = ∫_{−1}^{1} f(x) cos(nπx) dx for n = 0, 1, …,

b_n = ⟨f, sin(nπx)⟩ = ∫_{−1}^{1} f(x) sin(nπx) dx for n = 1, 2, ….
The equality in the representation means that:

lim_{N→∞} ‖ f − ( a_0/2 + ∑_{n=1}^{N} (a_n cos(nπx) + b_n sin(nπx)) ) ‖ = 0.

It does not mean pointwise convergence of the right-hand side to the value of f(x).
Hilbert bases
Definition 7. A Hilbert basis is an orthogonal subset {e_1, e_2, …} of a Hilbert space (V, ⟨·, ·⟩) such that for every f ∈ V and every ε > 0 there exists a sequence of numbers α_i, only finitely many of which are ≠ 0, such that:

‖f − ∑_{i=1}^{∞} α_i e_i‖ < ε.
Remark 8. Thus, we assume that every element of V can be approximated by finite linear combinations of the elements of the orthogonal set.
Theorem 9. If {e_1, e_2, …} is a Hilbert basis in a Hilbert space (V, ⟨·, ·⟩) then:

lim_{N→∞} ‖ f − ∑_{i=1}^{N} (⟨f, e_i⟩ / ⟨e_i, e_i⟩) e_i ‖ = 0.
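The projection formula of Theorem 9 is easy to illustrate in a finite-dimensional inner product space. Here is a minimal Python sketch (ours, not part of the notes) with a made-up orthogonal, but not orthonormal, set in ℝ³:

```python
from fractions import Fraction as F

# A made-up orthogonal (not orthonormal) set spanning R^3, and a target f.
e = [(F(1), F(1), F(0)), (F(1), F(-1), F(0)), (F(0), F(0), F(2))]
f = (F(3), F(1), F(5))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Accumulate the partial sums <f, e_i>/<e_i, e_i> * e_i, as in Theorem 9.
proj = [F(0)] * 3
for ei in e:
    c = dot(f, ei) / dot(ei, ei)
    proj = [p + c * x for p, x in zip(proj, ei)]

print(proj)   # recovers f = (3, 1, 5)
```

Since the set spans ℝ³, the final partial sum recovers f exactly; in an infinite-dimensional Hilbert space one only obtains convergence in norm, as the theorem states.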
In other words, the sequence of projections of f onto span{e_1, e_2, …, e_N} converges to f as N → ∞.
Legendre polynomials
In many applications, polynomials are preferred to trigonometric functions for many reasons, e.g. the cost of numerical evaluation.
We have already examined the Gram-Schmidt process for converting any linearly independent set to an orthogonal set. We may apply the Gram-Schmidt process to the sequence of powers {1, x, x², …} to obtain an infinite orthogonal set. The polynomials we obtain are:
Q_0(x) = 1,

Q_1(x) = x − (⟨x, 1⟩/⟨1, 1⟩)·1 = x, because ⟨x, 1⟩ = 0,

Q_2(x) = x² − (⟨x², 1⟩/⟨1, 1⟩)·1 − (⟨x², x⟩/⟨x, x⟩)·x = x² − 1/3.
In a similar fashion, we can obtain additional Legendre polynomials. The theory of Legendre polynomials yields the following expression (the Rodrigues formula):
P_n(x) = (1/(2^n n!)) dⁿ/dxⁿ (x² − 1)ⁿ
which is equivalent to ours, up to a normalizing constant. We can see that each of our polynomials Q_n(x) has coefficient 1 at the power xⁿ. The Rodrigues formula yields the coefficient at xⁿ equal to:
(2n)(2n − 1)⋯(n + 1) / (2^n n!) = (2n)! / (2^n (n!)²) = (1/2^n) C(2n, n),

where C(2n, n) denotes the binomial coefficient. Hence, the formula:

P_n(x) = (1/2^n) C(2n, n) Q_n(x).
The Legendre polynomials are orthogonal, and their normalizing constants are obtained from the formula:

⟨P_n, P_n⟩ = ∫_{−1}^{1} P_n(x)² dx = 2/(2n + 1).
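Both the Rodrigues formula and this normalization constant can be cross-checked with exact rational arithmetic. The Python sketch below is our own illustration (the helpers `rodrigues` and `inner` are ours); it represents a polynomial as a list of coefficients indexed by the power of x:

```python
from fractions import Fraction as F
from math import comb, factorial

def rodrigues(n):
    # Coefficients of P_n: expand (x^2 - 1)^n by the binomial theorem,
    # differentiate n times, then divide by 2^n n!.
    c = [F(0)] * (2 * n + 1)
    for k in range(n + 1):
        c[2 * k] = F(comb(n, k) * (-1) ** (n - k))
    for _ in range(n):                      # one differentiation per pass
        c = [F(j) * c[j] for j in range(1, len(c))]
    return [ci / (2 ** n * factorial(n)) for ci in c]

def inner(p, q):
    # <p, q> = integral over [-1, 1], via the moments of x^m.
    moment = lambda m: F(0) if m % 2 else F(2, m + 1)
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

P2 = rodrigues(2)
print(P2)                # [-1/2, 0, 3/2], i.e. 3x^2/2 - 1/2
print(inner(P2, P2))     # 2/5, matching 2/(2n+1) for n = 2
```

Exact `Fraction` coefficients avoid any floating-point doubt about the equality ⟨P_n, P_n⟩ = 2/(2n + 1).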
Computing the first few Legendre polynomials
We use an open-source Computer Algebra System (CAS) called Maxima to compute the first few Legendre polynomials:
(%i1) Q[0](x):= 1;
(%o16) Q_0(x) := 1
(%i17) Q[n](x):= expand(x^n-sum(integrate(x^n*Q[k](x), x, -1, 1)
       /integrate(Q[k](x)*Q[k](x),x,-1,1)*Q[k](x), k, 0, n-1));
(%o17) Q_n(x) := expand(x^n - sum(integrate(x^n Q_k(x), x, -1, 1)
       /integrate(Q_k(x) Q_k(x), x, -1, 1) * Q_k(x), k, 0, n-1))
(%i18) for k from 0 thru 7 do ( display(Q[k](x)) );
Q_0(x) = 1
Q_1(x) = x
Q_2(x) = x^2 - 1/3
Q_3(x) = x^3 - 3x/5
Q_4(x) = x^4 - 6x^2/7 + 3/35
Q_5(x) = x^5 - 10x^3/9 + 5x/21
Q_6(x) = x^6 - 15x^4/11 + 5x^2/11 - 5/231
Q_7(x) = x^7 - 21x^5/13 + 105x^3/143 - 35x/429
(%o18) done
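The same Gram-Schmidt computation can be reproduced outside a CAS. The Python sketch below is our illustration (the helpers `inner` and `gram_schmidt` are ours); it orthogonalizes the monomials with the L²([−1, 1]) inner product using exact rationals:

```python
from fractions import Fraction as F

def inner(p, q):
    # <p, q> on [-1, 1] for coefficient lists (index = power of x).
    moment = lambda m: F(0) if m % 2 else F(2, m + 1)
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt(n_max):
    # Orthogonalize 1, x, x^2, ..., keeping the leading coefficient 1.
    Q = []
    for n in range(n_max + 1):
        p = [F(0)] * n + [F(1)]               # the monomial x^n
        for q in Q:
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else F(0))
                 for i, a in enumerate(p)]
        Q.append(p)
    return Q

Q = gram_schmidt(3)
print(Q[2])   # coefficients of x^2 - 1/3
print(Q[3])   # coefficients of x^3 - 3x/5
```

The output agrees with the Maxima session above.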
(%i19) plot2d(makelist(Q[k](x), k, 0, 7), [x, -1, 1], [psfile, "/tmp/Q.ps"]);
(%o19)
[Plot omitted: the curves Q_0(x), …, Q_7(x) on [−1, 1].]
Figure 1. The plot of the first 8 Q_k's.
If we use Rodrigues' formula, we obtain a slightly different plot. Here is the calculation:
(%i12) P[n](x):=expand((1/(2^n*n!))*diff((x^2-1)^n,x,n));
(%o9) P_n(x) := expand(1/(2^n n!) * diff((x^2 - 1)^n, x, n))
(%i10) for k from 0 thru 5 do ( display(P[k](x)) );
P_0(x) = 1
P_1(x) = x
P_2(x) = 3x^2/2 - 1/2
P_3(x) = 5x^3/2 - 3x/2
P_4(x) = 35x^4/8 - 15x^2/4 + 3/8
P_5(x) = 63x^5/8 - 35x^3/4 + 15x/8
(%o10) done
(%i11) plot2d(makelist(P[k](x), k, 0, 7), [x, -1, 1], [psfile, "/tmp/P.ps"])$
[Plot omitted: the curves P_0(x), …, P_7(x) on [−1, 1].]
Figure 2. The plot of the first 8 P_k's.
We can see that the P_k's are scaled so that the value at 1 is +1.
Orthogonal polynomials in Statistics
The polynomials commonly used as orthogonal contrasts for quantitative factors are discrete analogues of Legendre polynomials. One way to understand them is to consider the discretization of the inner product of L²([a, b]):
⟨f, g⟩ = ∑_{i=0}^{t−1} f(x_i) g(x_i)
where the x_i form an increasing sequence of points in [a, b]. The most common case is that of equally spaced points:

x_i = a + i·d

where d = (b − a)/t. We may replace the variable x (the factor) with i by performing the mapping:

x ↦ x̃ = (x − x̄)/d

where x̄ is the mean of the x_i. We can see that the values assumed by the transformed variable are x̃_i = i − (t − 1)/2, i = 0, 1, …, t − 1.
Because we are using only a finite number of points, the bilinear form just defined is degenerate, i.e. it is possible that ⟨f, g⟩ = 0 for all g and still f ≠ 0. However, if we restrict this form to the set of polynomials of degree < t, the form becomes non-degenerate. Hence, we consider the vector space V_t of all polynomials of degree < t:
f(x) = ∑_{i=0}^{t−1} β_i x^i.
Lemma 10. The space (V_t, ⟨·, ·⟩) is an inner product space.

Proof. We need to study the Vandermonde matrix. □
The Vandermonde matrix
Definition 11. The Vandermonde matrix is defined for a sequence of points (x_1, x_2, …, x_t) as follows:

M =
⎡ 1  x_1  x_1²  ⋯  x_1^{t−1} ⎤
⎢ 1  x_2  x_2²  ⋯  x_2^{t−1} ⎥
⎢ ⋮   ⋮    ⋮         ⋮      ⎥
⎣ 1  x_t  x_t²  ⋯  x_t^{t−1} ⎦
Lemma 12. The determinant of M is:

det(M) = ∏_{i<j} (x_j − x_i).

In particular, if the x_i are all distinct then det(M) ≠ 0.
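Lemma 12 is easy to test numerically. The Python sketch below is our own illustration (the points and the helper `det` are ours); it compares a directly computed determinant with the product formula:

```python
from fractions import Fraction as F

def det(M):
    # Determinant by Gaussian elimination over the rationals.
    M = [row[:] for row in M]
    t = len(M)
    d = F(1)
    for k in range(t):
        piv = next((r for r in range(k, t) if M[r][k] != 0), None)
        if piv is None:
            return F(0)
        if piv != k:                      # row swap flips the sign
            M[k], M[piv] = M[piv], M[k]
            d = -d
        d *= M[k][k]
        for r in range(k + 1, t):
            f = M[r][k] / M[k][k]
            M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    return d

xs = [F(0), F(1), F(2), F(4)]                        # distinct points
M = [[x ** j for j in range(len(xs))] for x in xs]   # Vandermonde matrix
prod = F(1)
for j in range(len(xs)):
    for i in range(j):
        prod *= xs[j] - xs[i]
print(det(M), prod)   # both equal 48
```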
Lemma 13. The mapping F: V_t → ℝ^t which maps a polynomial to the vector of its values:

f(x) ↦ (f(x_1), f(x_2), …, f(x_t))

has the Vandermonde matrix as its matrix, if the basis {1, x, …, x^{t−1}} is used as a basis of V_t and the standard basis is used as a basis of ℝ^t. If the x_i are all distinct then F is non-singular, and thus an isomorphism.
Lagrange interpolating polynomials

The problem of finding a polynomial f ∈ V_t which assumes given values (b_1, b_2, …, b_t) at given points x_1, x_2, …, x_t is the Lagrange interpolation problem, and it is equivalent to finding the inverse of F. The solution takes the form of the Lagrange interpolating polynomial. There are many ways to derive the formula for the Lagrange interpolating polynomial; one of them is to use Cramer's rule to solve the linear system Mβ = b, which is obtained by rewriting the equation F(f) = b in the aforementioned bases of V_t and ℝ^t. The solution is:
f(x) = ∑_{i=1}^{t} b_i L_{t,i}(x), where

L_{t,i}(x) = ∏_{j≠i} (x − x_j) / ∏_{j≠i} (x_i − x_j) for i = 1, 2, …, t.

The polynomials L_{t,i}(x) are the unique polynomials of degree t − 1 such that

L_{t,i}(x_j) = δ_{ij}

where δ_{ij} is the Kronecker delta.
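A direct Python transcription of these formulas (ours; the sample points and values are made up for the illustration) checks both the Kronecker delta property and the interpolation itself:

```python
from fractions import Fraction as F

def lagrange_basis(xs, i, x):
    # L_{t,i}(x) = prod_{j != i} (x - x_j) / prod_{j != i} (x_i - x_j)
    num = den = F(1)
    for j, xj in enumerate(xs):
        if j != i:
            num *= x - xj
            den *= xs[i] - xj
    return num / den

def interpolate(xs, bs, x):
    # f(x) = sum_i b_i L_{t,i}(x)
    return sum(b * lagrange_basis(xs, i, x) for i, b in enumerate(bs))

xs = [F(0), F(1), F(2), F(3)]
bs = [F(1), F(2), F(5), F(10)]    # values of x^2 + 1 at the points
print([lagrange_basis(xs, 1, xj) for xj in xs])  # [0, 1, 0, 0]: delta property
print(interpolate(xs, bs, F(5)))                 # 26, the value of x^2 + 1 at 5
```

Since x² + 1 has degree 2 < t = 4, the degree-3 interpolant reproduces it exactly, which is why the evaluation at x = 5 gives 26.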
Gram-Schmidt orthogonalization in Vt
We perform the Gram-Schmidt process on the basis {1, x, …, x^{t−1}} in V_t.
We assume that we have transformed the variable, so that x_i = i − (t − 1)/2 for i = 0, 1, …, t − 1. Here are the first few polynomials computed by hand:
P_0(x) = 1,

P_1(x) = x − (⟨x, 1⟩/⟨1, 1⟩)·1 = x,

P_2(x) = x² − (⟨x², 1⟩/⟨1, 1⟩)·1 − (⟨x², x⟩/⟨x, x⟩)·x
       = x² − (∑_{i=0}^{t−1} x_i²)/(∑_{i=0}^{t−1} 1) − (∑_{i=0}^{t−1} x_i³)/(∑_{i=0}^{t−1} x_i²)·x
       = x² − (t² − 1)/12.
We can see that in the process of calculation, we need to use closed-form formulas for the sums of the powers of integers:
b_k(t) = ∑_{i=0}^{t−1} x_i^k.
The theory leading to these closed-form formulas is well known and has to do with Bernoulli polynomials. We will only note that by induction it is easy to prove that b_k(t) is a polynomial of degree k + 1 in t. We can use Maxima to produce closed formulas for the sums used in the above calculation. We produce b_k(t) for even k only, since for odd k we have b_k(t) = 0 for reasons of parity.
(%i77) b[k](t):=nusum((i-(t-1)/2)^k,i,0,t-1);
(%o86) b_k(t) := nusum((i - (t-1)/2)^k, i, 0, t-1)
(%i87) for k from 0 thru 6 step 2 do ( display(b[k](t)) );
b_0(t) = t
b_2(t) = (t - 1) t (t + 1)/12
b_4(t) = (t - 1) t (t + 1) (3t^2 - 7)/240
b_6(t) = (t - 1) t (t + 1) (3t^4 - 18t^2 + 31)/1344
(%o87) done
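The closed forms above can be verified by brute-force summation. The Python sketch below (our own cross-check; the helper `b` is ours) compares b_2, b_3 and b_4 against direct sums over the centered points, exactly in rational arithmetic:

```python
from fractions import Fraction as F

def b(k, t):
    # b_k(t) = sum_{i=0}^{t-1} (i - (t-1)/2)^k, computed exactly.
    return sum((F(i) - F(t - 1, 2)) ** k for i in range(t))

for t in range(2, 12):
    assert b(2, t) == F((t - 1) * t * (t + 1), 12)
    assert b(4, t) == F((t - 1) * t * (t + 1) * (3 * t * t - 7), 240)
    assert b(3, t) == 0            # odd k vanishes by parity
print(b(2, 7), b(4, 7))            # 28 and 196 for t = 7
```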
Computing orthogonal polynomials with a CAS
We once again employ Maxima to compute the orthogonal polynomials used as contrasts in statistics. For simplicity, we fix the order at the beginning.
(%i88) kill(t,offset,p,inner,norm,B)
(%o115) done
(%i116) t:7;
(%o120) 7
(%i121) offset:nusum(i,i,0,t-1)/t;
(%o121) 3
(%i122) p[i]:=i-offset;
(%o122) p_i := i - offset
(%i123) inner(f,g):=sum(f(p[i])*g(p[i]),i,0,t-1);
(%o123) inner(f, g) := sum(f(p_i) g(p_i), i, 0, t-1)
(%i124) norm(f):=sqrt(inner(f,f));
(%o125) norm(f) := sqrt(inner(f, f))
(%i126) B[0](x):=1;
(%o126) B_0(x) := 1
(%i127) B[n](x):=expand(x^n-sum(inner(lambda([x],x^n),B[k])/inner(B[k],B[k])
        *B[k](x),k,0,n-1));
(%o127) B_n(x) := expand(x^n - sum(inner(lambda([x], x^n), B_k)/inner(B_k, B_k)
        * B_k(x), k, 0, n-1))
(%i128) for k from 0 thru t-1 do( display(B[k](x)) )$
B_0(x) = 1
B_1(x) = x
B_2(x) = x^2 - 4
B_3(x) = x^3 - 7x
B_4(x) = x^4 - 67x^2/7 + 72/7
B_5(x) = x^5 - 35x^3/3 + 524x/21
B_6(x) = x^6 - 145x^4/11 + 434x^2/11 - 1200/77
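The Maxima session can be mirrored in Python with exact rationals (our own sketch; the helper names are ours). It runs Gram-Schmidt with the discrete inner product over the centered points for t = 7:

```python
from fractions import Fraction as F

t = 7
pts = [F(i) - F(t - 1, 2) for i in range(t)]      # -3, -2, ..., 3

def evalp(p, x):
    # Evaluate a coefficient list at x (Horner's scheme).
    r = F(0)
    for c in reversed(p):
        r = r * x + c
    return r

def inner(p, q):
    # Discrete inner product: sum of products of values over the points.
    return sum(evalp(p, x) * evalp(q, x) for x in pts)

B = []
for n in range(t):
    p = [F(0)] * n + [F(1)]                        # the monomial x^n
    for q in B:
        c = inner(p, q) / inner(q, q)
        p = [a - c * (q[i] if i < len(q) else F(0))
             for i, a in enumerate(p)]
    B.append(p)

print(B[2])   # coefficients of x^2 - 4
print(B[3])   # coefficients of x^3 - 7x
```

The results reproduce the contrast polynomials B_2, B_3, … displayed in the session above.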
(%i129) plot2d(makelist(B[k](x),k,0,t-1), [x,p[0],p[t-1]], [psfile,"/tmp/B.ps"])$
[Plot omitted: the curves B_0(x), …, B_6(x) on [−3, 3].]
Figure 3. A plot of the B_k's for t = 7.