An Introduction to Orthogonal Polynomials
by Marek Rychlik
Copyright © Marek Rychlik, 2009. All rights reserved.
March 14, 2009
Orthogonal polynomials in function spaces
We tend to think of scientific data as having some sort of continuity. This allows us to approximate these data by special functions, such as polynomials or finite trigonometric series. A quantitative measure of the quality of these approximations is necessary; it is typically given by a norm.
Definition 1. The space of square-integrable functions on the interval [−1, 1] is a vector space consisting of all measurable functions f: [−1, 1] → ℝ such that

∫_{−1}^{1} f(x)² dx < ∞.

The integral is in the sense of Lebesgue.

This space is denoted by L²([−1, 1]). The space is endowed with an inner product:

⟨f, g⟩ = ∫_{−1}^{1} f(x) g(x) dx.
Remarks on convergence
Definition 2. A Cauchy sequence in a metric space (V, d), where d: V × V → ℝ is a metric, is a sequence (x_n)_{n=1}^{∞} with x_n ∈ V such that for every ε > 0 there is N such that for all m, n ≥ N we have:

d(x_m, x_n) < ε.
Definition 3. A Banach space is a normed space (V , ‖ · ‖) which is complete as
a metric space, i.e. in which every Cauchy sequence converges. The metric is
given by d(u, v) = ‖u− v‖.
Definition 4. A Hilbert space is an inner product space (V, ⟨·, ·⟩) which is a Banach space as a normed space with the norm ‖u‖ = √⟨u, u⟩.
Theorem 5. L²([−1, 1]) is a Hilbert space.
Orthogonal sets in L²([−1, 1])
Studying orthogonality in L²([−1, 1]) has been one of the most fruitful human endeavors, as it led to the advent of Fourier theory and its modern continuation, wavelet theory.
It is a standard result in the theory of Fourier series that the following set is orthogonal (and every element of it, except the constant function 1, is also normalized):

{1} ∪ {cos(nπx)}_{n=1}^{∞} ∪ {sin(nπx)}_{n=1}^{∞}.
This is equivalent to the vanishing of certain integrals of trigonometric functions:

∫_{−1}^{1} cos(nπx) dx = 0,

∫_{−1}^{1} cos(nπx) cos(mπx) dx = 0 for m ≠ n,

∫_{−1}^{1} cos(nπx) sin(mπx) dx = 0,

∫_{−1}^{1} sin(nπx) sin(mπx) dx = 0 for m ≠ n,

∫_{−1}^{1} cos²(nπx) dx = 1,

∫_{−1}^{1} sin²(nπx) dx = 1.
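These identities can be spot-checked numerically. The following Python sketch is our illustration, not part of the original notes; the helper `simpson` is ours, and it approximates three of the integrals with the composite Simpson rule:

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson's rule with n (even) subintervals.
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# Spot-check the orthogonality relations on [-1, 1] for n = 2, m = 3.
n, m = 2, 3
i_cc = simpson(lambda x: math.cos(n*math.pi*x) * math.cos(m*math.pi*x), -1, 1)
i_ss = simpson(lambda x: math.sin(n*math.pi*x) * math.sin(m*math.pi*x), -1, 1)
i_c2 = simpson(lambda x: math.cos(n*math.pi*x) ** 2, -1, 1)
print(i_cc, i_ss, i_c2)   # approximately 0, 0 and 1
```

With 2000 subintervals the quadrature error is far below 10⁻⁸, so the printed values agree with the exact integrals 0, 0 and 1 to many digits.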
Theorem 6. Every function f ∈ L²([−1, 1]) admits a Fourier series representation:

f = a_0/2 + ∑_{n=1}^{∞} (a_n cos(nπx) + b_n sin(nπx))

where:

a_n = ⟨f, cos(nπx)⟩ = ∫_{−1}^{1} f(x) cos(nπx) dx for n = 0, 1, …,

b_n = ⟨f, sin(nπx)⟩ = ∫_{−1}^{1} f(x) sin(nπx) dx for n = 1, 2, ….
The equality in the representation means that:

lim_{N→∞} ‖ f − ( a_0/2 + ∑_{n=1}^{N} (a_n cos(nπx) + b_n sin(nπx)) ) ‖ = 0.

It does not mean pointwise convergence of the right-hand side to the value of f(x).
Hilbert bases
Definition 7. A Hilbert basis is an orthogonal subset {e_1, e_2, …} of a Hilbert space (V, ⟨·, ·⟩) such that for every f ∈ V and every ε > 0 there exists a sequence of numbers α_i, only finitely many of which are ≠ 0, such that:

‖f − ∑_{i=1}^{∞} α_i e_i‖ < ε.
Remark 8. Thus, we assume that every element of V can be approximated by finite linear combinations of the elements of the orthogonal set.
Theorem 9. If {e_1, e_2, …} is a Hilbert basis in a Hilbert space (V, ⟨·, ·⟩) then:

lim_{N→∞} ‖ f − ∑_{i=1}^{N} (⟨f, e_i⟩ / ⟨e_i, e_i⟩) e_i ‖ = 0.
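The projection formula of Theorem 9 is easy to illustrate in a finite-dimensional inner product space. Here is a minimal Python sketch (ours, not part of the notes) with a made-up orthogonal, but not orthonormal, set in ℝ³:

```python
from fractions import Fraction as F

# A made-up orthogonal (not orthonormal) set spanning R^3, and a target f.
e = [(F(1), F(1), F(0)), (F(1), F(-1), F(0)), (F(0), F(0), F(2))]
f = (F(3), F(1), F(5))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Accumulate the partial sums <f, e_i>/<e_i, e_i> * e_i, as in Theorem 9.
proj = [F(0)] * 3
for ei in e:
    c = dot(f, ei) / dot(ei, ei)
    proj = [p + c * x for p, x in zip(proj, ei)]

print(proj)   # recovers f = (3, 1, 5)
```

Since the set spans ℝ³, the final partial sum recovers f exactly; in an infinite-dimensional Hilbert space one only obtains convergence in norm, as the theorem states.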
In other words, the sequence of projections of f onto span{e_1, e_2, …, e_N} converges to f as N → ∞.
Legendre polynomials
In many applications, polynomials are preferred to trigonometric functions for many reasons, e.g. the cost of numerical evaluation.
We have already examined the Gram-Schmidt process for converting any linearly independent set to an orthogonal set. We may apply the Gram-Schmidt process to the sequence of powers {1, x, x², …} to obtain an infinite orthogonal set. The polynomials we obtain are:
Q_0(x) = 1,

Q_1(x) = x − (⟨x, 1⟩/⟨1, 1⟩)·1 = x, because ⟨x, 1⟩ = 0,

Q_2(x) = x² − (⟨x², 1⟩/⟨1, 1⟩)·1 − (⟨x², x⟩/⟨x, x⟩)·x = x² − 1/3.
In a similar fashion, we can obtain additional Legendre polynomials. The theory of Legendre polynomials yields the following expression (the Rodrigues formula):
P_n(x) = (1/(2^n n!)) dⁿ/dxⁿ (x² − 1)ⁿ
which is equivalent to ours, up to a normalizing constant. We can see that each of our polynomials Q_n(x) has coefficient 1 at the power xⁿ. The Rodrigues formula yields the coefficient at xⁿ equal to:
(2n)(2n − 1)⋯(n + 1) / (2^n n!) = (2n)! / (2^n (n!)²) = (1/2^n) C(2n, n),

where C(2n, n) denotes the binomial coefficient. Hence, the formula:

P_n(x) = (1/2^n) C(2n, n) Q_n(x).
The Legendre polynomials are orthogonal, and their normalizing constants are obtained from the formula:

⟨P_n, P_n⟩ = ∫_{−1}^{1} P_n(x)² dx = 2/(2n + 1).
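Both the Rodrigues formula and this normalization constant can be cross-checked with exact rational arithmetic. The Python sketch below is our own illustration (the helpers `rodrigues` and `inner` are ours); it represents a polynomial as a list of coefficients indexed by the power of x:

```python
from fractions import Fraction as F
from math import comb, factorial

def rodrigues(n):
    # Coefficients of P_n: expand (x^2 - 1)^n by the binomial theorem,
    # differentiate n times, then divide by 2^n n!.
    c = [F(0)] * (2 * n + 1)
    for k in range(n + 1):
        c[2 * k] = F(comb(n, k) * (-1) ** (n - k))
    for _ in range(n):                      # one differentiation per pass
        c = [F(j) * c[j] for j in range(1, len(c))]
    return [ci / (2 ** n * factorial(n)) for ci in c]

def inner(p, q):
    # <p, q> = integral over [-1, 1], via the moments of x^m.
    moment = lambda m: F(0) if m % 2 else F(2, m + 1)
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

P2 = rodrigues(2)
print(P2)                # [-1/2, 0, 3/2], i.e. 3x^2/2 - 1/2
print(inner(P2, P2))     # 2/5, matching 2/(2n+1) for n = 2
```

Exact `Fraction` coefficients avoid any floating-point doubt about the equality ⟨P_n, P_n⟩ = 2/(2n + 1).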
Computing the first few Legendre polynomials
We use an open-source Computer Algebra System (CAS) called Maxima to compute the first few Legendre polynomials:
(%i1) Q[0](x):= 1;
(%o16) Q_0(x) := 1
(%i17) Q[n](x):= expand(x^n-sum(integrate(x^n*Q[k](x), x, -1, 1)
       /integrate(Q[k](x)*Q[k](x),x,-1,1)*Q[k](x), k, 0, n-1));
(%o17) Q_n(x) := expand(x^n - sum(integrate(x^n Q_k(x), x, -1, 1)
       /integrate(Q_k(x) Q_k(x), x, -1, 1) * Q_k(x), k, 0, n-1))
(%i18) for k from 0 thru 7 do ( display(Q[k](x)) );
Q_0(x) = 1
Q_1(x) = x
Q_2(x) = x^2 - 1/3
Q_3(x) = x^3 - 3x/5
Q_4(x) = x^4 - 6x^2/7 + 3/35
Q_5(x) = x^5 - 10x^3/9 + 5x/21
Q_6(x) = x^6 - 15x^4/11 + 5x^2/11 - 5/231
Q_7(x) = x^7 - 21x^5/13 + 105x^3/143 - 35x/429
(%o18) done
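The same Gram-Schmidt computation can be reproduced outside a CAS. The Python sketch below is our illustration (the helpers `inner` and `gram_schmidt` are ours); it orthogonalizes the monomials with the L²([−1, 1]) inner product using exact rationals:

```python
from fractions import Fraction as F

def inner(p, q):
    # <p, q> on [-1, 1] for coefficient lists (index = power of x).
    moment = lambda m: F(0) if m % 2 else F(2, m + 1)
    return sum(a * b * moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

def gram_schmidt(n_max):
    # Orthogonalize 1, x, x^2, ..., keeping the leading coefficient 1.
    Q = []
    for n in range(n_max + 1):
        p = [F(0)] * n + [F(1)]               # the monomial x^n
        for q in Q:
            c = inner(p, q) / inner(q, q)
            p = [a - c * (q[i] if i < len(q) else F(0))
                 for i, a in enumerate(p)]
        Q.append(p)
    return Q

Q = gram_schmidt(3)
print(Q[2])   # coefficients of x^2 - 1/3
print(Q[3])   # coefficients of x^3 - 3x/5
```

The output agrees with the Maxima session above.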
(%i19) plot2d(makelist(Q[k](x), k, 0, 7), [x, -1, 1], [psfile, "/tmp/Q.ps"]);
(%o19)
[Plot omitted: the curves Q_0(x), …, Q_7(x) on [−1, 1].]
Figure 1. The plot of the first 8 Q_k's.
If we use Rodrigues' formula, we obtain a slightly different plot. Here is the calculation:
(%i12) P[n](x):=expand((1/(2^n*n!))*diff((x^2-1)^n,x,n));
(%o9) P_n(x) := expand(1/(2^n n!) * diff((x^2 - 1)^n, x, n))
(%i10) for k from 0 thru 5 do ( display(P[k](x)) );
P_0(x) = 1
P_1(x) = x
P_2(x) = 3x^2/2 - 1/2
P_3(x) = 5x^3/2 - 3x/2
P_4(x) = 35x^4/8 - 15x^2/4 + 3/8
P_5(x) = 63x^5/8 - 35x^3/4 + 15x/8
(%o10) done
(%i11) plot2d(makelist(P[k](x), k, 0, 7), [x, -1, 1], [psfile, "/tmp/P.ps"])$
[Plot omitted: the curves P_0(x), …, P_7(x) on [−1, 1].]
Figure 2. The plot of the first 8 P_k's.
We can see that the P_k's are scaled so that the value at 1 is +1.
Orthogonal polynomials in Statistics
The polynomials commonly used as orthogonal contrasts for quantitative factors are discrete analogues of Legendre polynomials. One way to understand them is to consider the discretization of the inner product of L²([a, b]):
⟨f, g⟩ = ∑_{i=0}^{t−1} f(x_i) g(x_i)
where the x_i form an increasing sequence of points in [a, b]. The most common case is that of equally spaced points:

x_i = a + i·d

where d = (b − a)/t. We may replace the variable x (the factor) with i by performing the mapping:

x ↦ x̃ = (x − x̄)/d

where x̄ is the mean of the x_i. We can see that the values assumed by the transformed variable are x̃_i = i − (t − 1)/2, i = 0, 1, …, t − 1.
Because we are using only a finite number of points, the bilinear form just defined is degenerate, i.e. it is possible that ⟨f, g⟩ = 0 for all g and still f ≠ 0. However, if we restrict this form to the set of polynomials of degree < t, the form becomes non-degenerate. Hence, we consider the vector space V_t of all polynomials of degree < t:
f(x) = ∑_{i=0}^{t−1} β_i x^i.
Lemma 10. The space (V_t, ⟨·, ·⟩) is an inner product space.

Proof. We need to study the Vandermonde matrix. □
The Vandermonde matrix
Definition 11. The Vandermonde matrix is defined for a sequence of points (x_1, x_2, …, x_t) as follows:

M =
⎡ 1  x_1  x_1²  ⋯  x_1^{t−1} ⎤
⎢ 1  x_2  x_2²  ⋯  x_2^{t−1} ⎥
⎢ ⋮   ⋮    ⋮         ⋮      ⎥
⎣ 1  x_t  x_t²  ⋯  x_t^{t−1} ⎦
Lemma 12. The determinant of M is:

det(M) = ∏_{i<j} (x_j − x_i).

In particular, if the x_i are all distinct then det(M) ≠ 0.
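Lemma 12 is easy to test numerically. The Python sketch below is our own illustration (the points and the helper `det` are ours); it compares a directly computed determinant with the product formula:

```python
from fractions import Fraction as F

def det(M):
    # Determinant by Gaussian elimination over the rationals.
    M = [row[:] for row in M]
    t = len(M)
    d = F(1)
    for k in range(t):
        piv = next((r for r in range(k, t) if M[r][k] != 0), None)
        if piv is None:
            return F(0)
        if piv != k:                      # row swap flips the sign
            M[k], M[piv] = M[piv], M[k]
            d = -d
        d *= M[k][k]
        for r in range(k + 1, t):
            f = M[r][k] / M[k][k]
            M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    return d

xs = [F(0), F(1), F(2), F(4)]                        # distinct points
M = [[x ** j for j in range(len(xs))] for x in xs]   # Vandermonde matrix
prod = F(1)
for j in range(len(xs)):
    for i in range(j):
        prod *= xs[j] - xs[i]
print(det(M), prod)   # both equal 48
```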
Lemma 13. The mapping F: V_t → ℝ^t which maps a polynomial to the vector of its values:

f(x) ↦ (f(x_1), f(x_2), …, f(x_t))

has the Vandermonde matrix as its matrix, if the basis {1, x, …, x^{t−1}} is used as a basis of V_t and the standard basis is used as a basis of ℝ^t. If the x_i are all distinct then F is non-singular, and thus an isomorphism.
Lagrange interpolating polynomials

The problem of finding a polynomial f ∈ V_t which assumes given values (b_1, b_2, …, b_t) at given points x_1, x_2, …, x_t is the Lagrange interpolation problem, and it is equivalent to finding the inverse of F. The solution takes the form of the Lagrange interpolating polynomial. There are many ways to derive the formula for the Lagrange interpolating polynomial; one of them is to use Cramer's rule to solve the linear system Mβ = b, which is obtained by rewriting the equation F(f) = b in the aforementioned bases of V_t and ℝ^t. The solution is:
f(x) = ∑_{i=1}^{t} b_i L_{t,i}(x), where

L_{t,i}(x) = ∏_{j≠i} (x − x_j) / ∏_{j≠i} (x_i − x_j) for i = 1, 2, …, t.

The polynomials L_{t,i}(x) are the unique polynomials of degree t − 1 such that

L_{t,i}(x_j) = δ_{ij}

where δ_{ij} is the Kronecker delta.
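A direct Python transcription of these formulas (ours; the sample points and values are made up for the illustration) checks both the Kronecker delta property and the interpolation itself:

```python
from fractions import Fraction as F

def lagrange_basis(xs, i, x):
    # L_{t,i}(x) = prod_{j != i} (x - x_j) / prod_{j != i} (x_i - x_j)
    num = den = F(1)
    for j, xj in enumerate(xs):
        if j != i:
            num *= x - xj
            den *= xs[i] - xj
    return num / den

def interpolate(xs, bs, x):
    # f(x) = sum_i b_i L_{t,i}(x)
    return sum(b * lagrange_basis(xs, i, x) for i, b in enumerate(bs))

xs = [F(0), F(1), F(2), F(3)]
bs = [F(1), F(2), F(5), F(10)]    # values of x^2 + 1 at the points
print([lagrange_basis(xs, 1, xj) for xj in xs])  # [0, 1, 0, 0]: delta property
print(interpolate(xs, bs, F(5)))                 # 26, the value of x^2 + 1 at 5
```

Since x² + 1 has degree 2 < t = 4, the degree-3 interpolant reproduces it exactly, which is why the evaluation at x = 5 gives 26.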
Gram-Schmidt orthogonalization in Vt
We perform the Gram-Schmidt process on the basis {1, x, …, x^{t−1}} in V_t.
We assume that we have transformed the variable, so that x_i = i − (t − 1)/2 for i = 0, 1, …, t − 1. Here are the first few polynomials computed by hand:
P_0(x) = 1,

P_1(x) = x − (⟨x, 1⟩/⟨1, 1⟩)·1 = x,

P_2(x) = x² − (⟨x², 1⟩/⟨1, 1⟩)·1 − (⟨x², x⟩/⟨x, x⟩)·x
       = x² − (∑_{i=0}^{t−1} x_i²)/(∑_{i=0}^{t−1} 1) − (∑_{i=0}^{t−1} x_i³)/(∑_{i=0}^{t−1} x_i²)·x
       = x² − (t² − 1)/12.
We can see that in the process of calculation, we need to use closed-form formulas for the sums of the powers of integers:
b_k(t) = ∑_{i=0}^{t−1} x_i^k.
The theory leading to these closed-form formulas is well known and has to do with Bernoulli polynomials. We will only note that by induction it is easy to prove that b_k(t) is a polynomial of degree k + 1 in t. We can use Maxima to produce closed formulas for the sums used in the above calculation. We produce b_k(t) for even k only, since for odd k we have b_k(t) = 0 for reasons of parity.
(%i77) b[k](t):=nusum((i-(t-1)/2)^k,i,0,t-1);
(%o86) b_k(t) := nusum((i - (t-1)/2)^k, i, 0, t-1)
(%i87) for k from 0 thru 6 step 2 do ( display(b[k](t)) );
b_0(t) = t
b_2(t) = (t - 1) t (t + 1)/12
b_4(t) = (t - 1) t (t + 1) (3t^2 - 7)/240
b_6(t) = (t - 1) t (t + 1) (3t^4 - 18t^2 + 31)/1344
(%o87) done
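The closed forms above can be verified by brute-force summation. The Python sketch below (our own cross-check; the helper `b` is ours) compares b_2, b_3 and b_4 against direct sums over the centered points, exactly in rational arithmetic:

```python
from fractions import Fraction as F

def b(k, t):
    # b_k(t) = sum_{i=0}^{t-1} (i - (t-1)/2)^k, computed exactly.
    return sum((F(i) - F(t - 1, 2)) ** k for i in range(t))

for t in range(2, 12):
    assert b(2, t) == F((t - 1) * t * (t + 1), 12)
    assert b(4, t) == F((t - 1) * t * (t + 1) * (3 * t * t - 7), 240)
    assert b(3, t) == 0            # odd k vanishes by parity
print(b(2, 7), b(4, 7))            # 28 and 196 for t = 7
```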
Computing orthogonal polynomials with a CAS
We once again employ Maxima to compute the orthogonal polynomials used as contrasts in statistics. For simplicity, we fix the order at the beginning.
(%i88) kill(t,offset,p,inner,norm,B)
(%o115) done
(%i116) t:7;
(%o120) 7
(%i121) offset:nusum(i,i,0,t-1)/t;
(%o121) 3
(%i122) p[i]:=i-offset;
(%o122) p_i := i - offset
(%i123) inner(f,g):=sum(f(p[i])*g(p[i]),i,0,t-1);
(%o123) inner(f, g) := sum(f(p_i) g(p_i), i, 0, t-1)
(%i124) norm(f):=sqrt(inner(f,f));
(%o125) norm(f) := sqrt(inner(f, f))
(%i126) B[0](x):=1;
(%o126) B_0(x) := 1
(%i127) B[n](x):=expand(x^n-sum(inner(lambda([x],x^n),B[k])/inner(B[k],B[k])
        *B[k](x),k,0,n-1));
(%o127) B_n(x) := expand(x^n - sum(inner(lambda([x], x^n), B_k)/inner(B_k, B_k)
        * B_k(x), k, 0, n-1))
(%i128) for k from 0 thru t-1 do( display(B[k](x)) )$
B_0(x) = 1
B_1(x) = x
B_2(x) = x^2 - 4
B_3(x) = x^3 - 7x
B_4(x) = x^4 - 67x^2/7 + 72/7
B_5(x) = x^5 - 35x^3/3 + 524x/21
B_6(x) = x^6 - 145x^4/11 + 434x^2/11 - 1200/77
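The Maxima session can be mirrored in Python with exact rationals (our own sketch; the helper names are ours). It runs Gram-Schmidt with the discrete inner product over the centered points for t = 7:

```python
from fractions import Fraction as F

t = 7
pts = [F(i) - F(t - 1, 2) for i in range(t)]      # -3, -2, ..., 3

def evalp(p, x):
    # Evaluate a coefficient list at x (Horner's scheme).
    r = F(0)
    for c in reversed(p):
        r = r * x + c
    return r

def inner(p, q):
    # Discrete inner product: sum of products of values over the points.
    return sum(evalp(p, x) * evalp(q, x) for x in pts)

B = []
for n in range(t):
    p = [F(0)] * n + [F(1)]                        # the monomial x^n
    for q in B:
        c = inner(p, q) / inner(q, q)
        p = [a - c * (q[i] if i < len(q) else F(0))
             for i, a in enumerate(p)]
    B.append(p)

print(B[2])   # coefficients of x^2 - 4
print(B[3])   # coefficients of x^3 - 7x
```

The results reproduce the contrast polynomials B_2, B_3, … displayed in the session above.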
(%i129) plot2d(makelist(B[k](x),k,0,t-1), [x,p[0],p[t-1]], [psfile,"/tmp/B.ps"])$
[Plot omitted: the curves B_0(x), …, B_6(x) on [−3, 3].]
Figure 3. A plot of the B_k's for t = 7.