Notes: Introduction to Numerical Methods

J.C. Chrispell
Department of Mathematics
Indiana University of Pennsylvania
Indiana, PA 15705, USA

E-mail: [email protected]
http://www.math.iup.edu/~jchrispe

May 5, 2017


Page 2: Notes: Introduction to Numerical Methodsjchrispe/MATH_250/NotesFull3.pdf · Notes: Introduction to Numerical Methods J.C. Chrispell Department of Mathematics Indiana University of

Numerical Methods Notes Draft: May 5, 2017


Preface

These notes will serve as an introduction to numerical methods for scientific computing. From the IUP course catalog the course will contain:

Algorithmic methods for function evaluation, roots of equations, solutions to systems of linear equations, function interpolation, numerical differentiation; and use of spline functions for curve fitting. Focus on managing and measuring errors in computation. Also offered as COSC 250; either COSC 250 or MATH 250 may be substituted for the other and may be used interchangeably for D or F repeats but may not be counted for duplicate credit.

Material presented in the course will tend to follow the presentation of Cheney and Kincaid in their text: Numerical Mathematics and Computing (seventh edition) [2]. Relevant course material will start in chapter 1 of the text and selected chapters will be covered as time in the course permits. I will supplement the Cheney and Kincaid text with additional material from other popular books on numerical methods:

• Scientific Computing: An Introductory Survey by Heath [3]

• Numerical Analysis by Burden and Faires [1]

My apologies in advance for any typographical errors or mistakes that are present in this document. That said, I will do my very best to update and correct the document if I am made aware of these inaccuracies.

-John Chrispell


Contents

1 Introduction and Review
  1.1 Errors
    1.1.1 Accurate and Precise
    1.1.2 Horner's Algorithm
  1.2 Floating Point Representation
  1.3 Activities
  1.4 Taylor's Theorem
    1.4.1 Taylor's Theorem using h
  1.5 Gaussian Elimination
    1.5.1 Assessment of Algorithm
  1.6 Improving Gaussian Elimination

2 Methods for Finding Zeros
    2.0.1 Bisection Algorithm
    2.0.2 Newton's Method

3 Numerical Integration
  3.1 Trapezoid Rule
    3.1.1 Newton-Cotes Quadrature
    3.1.2 Gaussian Quadrature

4 Polynomial Interpolation
    4.0.1 Error in Polynomial Interpolation
    4.0.2 Highlights
  4.1 Cubic Splines

5 Initial Value Problems
  5.1 Second Order Runge-Kutta Method
  5.2 Second Order Runge-Kutta Method
  5.3 Fourth Order Runge-Kutta Method

6 The Heat Equation
  6.1 Numerical Solution
    6.1.1 Taylor's Theorem For Approximations
    6.1.2 Discretizing
  6.2 Implicit Time Stepping
    6.2.1 Tri-Diagonal Systems
  6.3 Order of Accuracy

7 Appendices

Bibliography


Chapter 1

Introduction and Review

“I have never listened to anyone who criticized my taste in space travel, sideshows or

gorillas. When this occurs, I pack up my dinosaurs and leave the room.”

−Ray Bradbury, Zen in the Art of Writing

What is Scientific Computing?

The major theme of this class will be solving scientific problems using computers. Many of the examples considered will be smaller parts that can be thought of as tools for implementing or examining larger computational problems of interest.

We will take advantage of replacing a difficult mathematical problem with simpler problems that are easier to handle. Using the smaller parts, insight will be gained into the larger problem of interest. In this class the methods and algorithms underlying computational tools you already use will be examined.

Scientific Computing: Deals with computing continuous quantities in science and engineering (time, distance, velocity, temperature, density, pressure, stress) that cannot be solved exactly or analytically in a finite number of steps. Typically we are numerically solving problems that involve integrals, derivatives, and nonlinearities.

Numerical Analysis: An area of mathematics where concern is placed on the design and implementation of algorithms to solve scientific problems.

In general for solving a problem you will:

• Develop a model (expressed by equations) for a phenomenon or system of interest.

• Find/Develop an algorithm to solve the system.

• Develop a computational implementation.


• Run your implementation.

• Post process your results (graphs, tables, charts).

• Interpret and validate your results.

Problems are well posed provided:

1. A solution to the problem of interest exists.

2. The solution is unique.

3. The solution depends continuously on the data.

The last item here is important, as problems that are ill conditioned have large changes in output with small changes in the initial conditions or data. This can be troubling for numerical methods, and is not always avoidable.

In general we will use some standard techniques to attack the problems presented: replacing an unsolvable problem by a problem that is “close to it” in some sense, and then looking at the closely related solution.

• Replace infinite dimensional spaces with finite ones.

• Infinite processes with finite processes:

  – Integrals with sums

  – Derivatives with finite differences

• Nonlinear Problems with Linear Ones

• Complicated Functions with Simple Ones (polynomials).

• General Matrices with Simpler Matrices.

With all of this replacement and simplification, the sources of error and approximation need to be accounted for. How good is the approximated solution?

Significant Digits

The significant digits in a computation start with the leftmost nonzero digit in a computation, and end with the rightmost correct digit (including final zeros that are correct).

Example: Let's consider calculating the surface area of the Earth.

• The surface area of a sphere is A = 4πr^2.


• The radius of the Earth (r ≈ 6370 km).

• The value for π ≈ 3.141592653 rounded at some point.

• The numerical computation will be rounded at some point.

• All of these assumptions will come into play at some point.

• How many digits are significant?

Figure 1.0.1: Here the intersection of two nearly parallel lines is compared with an error range of size ε. Note the closer the two lines are to parallel, the more ill conditioned finding the intersection will become.

www.math.iup.edu/~jchrispe/MATH_250/eps_error.html

Example: Consider solving the following system of equations.

0.1243x + 0.2345y = 0.8723
0.3237x + 0.5431y = 0.9321

Suppose, however, that you can only keep three significant digits. Keeping only three significant digits in all computations gives an answer of

x ≈ −29.0 and y ≈ 19.0

Solving the problem using sage:

x ≈ −30.3760666260334 and y ≈ 19.8210877680851

Note that the example in the Cheney text is far more dramatic, and the potential for error when truncating grows quickly as the two lines of interest become nearly parallel.
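To make the effect of limited precision concrete, here is a small Python sketch (not from the text) that solves the same 2 × 2 system by Cramer's rule twice: once in full double precision and once rounding every intermediate value to three significant digits. The rounding scheme here differs from the hand computation above, so the truncated answer differs from −29.0, but the distortion is just as visible.

```python
def sig3(v):
    # Round v to three significant digits (one possible rounding scheme).
    return float(f"{v:.3g}")

def cramer(a11, a12, a21, a22, b1, b2, r=lambda v: v):
    # Cramer's rule for a 2x2 system; r() is applied to every
    # intermediate product, difference, and quotient.
    det = r(r(a11 * a22) - r(a12 * a21))
    x = r(r(r(b1 * a22) - r(a12 * b2)) / det)
    y = r(r(r(a11 * b2) - r(b1 * a21)) / det)
    return x, y

coeffs = (0.1243, 0.2345, 0.3237, 0.5431, 0.8723, 0.9321)
exact = cramer(*coeffs)
truncated = cramer(*coeffs, r=sig3)
print(exact)      # close to (-30.376..., 19.821...)
print(truncated)  # noticeably different
```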


1.1 Errors

If two values are considered, one taken to be true and the other an approximation, then the Error is given by:

Error = True − Approximation

The Absolute Error of using the approximation is

Absolute Error = |True − Approximation|

and we denote

Relative Error = |True − Approximation| / |True|

• The Relative Error is usually more useful than the Absolute Error.

• The Relative Error is not defined if the true value we are looking for is zero.

Example: Consider the case where we are approximating and have:

True = 12.34 and Approximation = 12.35

Here we have the following:

Error = −0.01

Absolute Error = 0.01

Relative Error = 0.0008103727714748612

Note that the approximation has 4 significant digits.

Example: Consider the case where we are approximating and have:

True = 0.001 and Approximation = 0.002

Here we have the following:

Error = −0.001

Absolute Error = 0.001

Relative Error = 1

Here relative error is a much better indicator of how well the approximation fits the true value.
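Both examples are easy to check directly; a short Python sketch of the definitions above:

```python
def errors(true, approx):
    # Error, Absolute Error, and Relative Error as defined above.
    err = true - approx
    abs_err = abs(err)
    rel_err = abs_err / abs(true)   # undefined when the true value is zero
    return err, abs_err, rel_err

print(errors(12.34, 12.35))   # relative error about 0.00081
print(errors(0.001, 0.002))   # relative error 1.0
```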


1.1.1 Accurate and Precise

When a computation is accurate to n decimal places, then we can trust n digits to the right of the decimal place. Similarly, when a computation is said to be accurate to n significant digits, then the computation is meaningful for n places beginning with the leftmost nonzero digit given.

• The classic example here is a meter stick. The user can consider it accurate to the level of graduation on the meter stick.

• A second example would be the mileage on your car. It usually displays in tenth of a mile increments. You could use your car to measure distances accurate to within two tenths of a mile.

Precision is a different game. Consider adding the following values:

3.4 + 5.67 = 9.07

The second digit in 3.4 could be from rounding any of the following:

3.41, 3.4256, 3.44, 3.36, 3.399, 3.38

to two significant digits. So there can only be two significant digits in the answer. The results from multiplication and division can be even more misleading.

Computers will in some cases allow a user to decide if they would like to use rounding or chopping. Note there may be several different schemes for rounding values (especially when it comes to rounding values ending with a 5).

1.1.2 Horner’s Algorithm

In general it is a good idea to complete most computations using a minimum number of floating point operations. Consider evaluating polynomials. For example, given

f(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{n−1} x^{n−1} + a_n x^n

it would not be wise to compute x^2, then x^3, and so on. Writing the polynomial as

f(x) = a_0 + x(a_1 + x(a_2 + x(· · · + x(a_{n−1} + x a_n) · · · )))

will efficiently evaluate the polynomial without ever having to use exponentiation. This efficient evaluation of polynomials is Horner's Algorithm, and it is accomplished using synthetic division.
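A minimal Python sketch of Horner's scheme (function and variable names are mine, not the text's):

```python
def horner(coeffs, x):
    # Evaluate a0 + a1*x + ... + an*x^n with coefficients given
    # lowest degree first: one multiply and one add per coefficient.
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

# p(x) = 1 + 2x + 3x^2 at x = 2: 1 + 4 + 12 = 17
print(horner([1.0, 2.0, 3.0], 2.0))   # 17.0
```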


1.2 Floating Point Representation

Numbers when entered into a computational machine are typically broken into two parts:

• An integer portion.

• A fractional portion.

with these two parts being separated by a decimal point.

123.456, 0.0000123

A second form that is used is normalized scientific notation, or normalized floating-point representation. Here the decimal point is shifted so the number is written as a fraction multiplied by some power of 10, with the leading digit of the fraction nonzero.

0.0000123 ⟹ 0.123 × 10^{−4}

Any decimal in the floating point system may be written in this manner:

x = ±0.d_1 d_2 d_3 . . . × 10^n

with d_1 not equal to zero.

More generally we write

x = ±r × 10^n with 1/10 ≤ r < 1.

Here r is the mantissa and n is the exponent. If we are looking at numbers in a binary system then

x = ±q × 2^n with 1/2 ≤ q < 1.

Computers work exactly like this; however, on a computer we have the issue of needing to use a finite word length (no more “. . .”).

This means a couple of things:

• No representation for irrational numbers.

• No representation for numbers that do not fit into a finite format.
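Python's floats are binary, and the standard library exposes exactly this normalized form: math.frexp returns a pair (q, n) with x = q · 2^n and 1/2 ≤ |q| < 1 (a quick sketch using the built-in, not anything from the text):

```python
import math

q, n = math.frexp(6.0)     # 6 = 0.75 * 2^3
print(q, n)                # 0.75 3
assert q * 2 ** n == 6.0
assert 0.5 <= abs(q) < 1.0
```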


Activity

Numbers that can be expressed on a computer are called its Machine Numbers, and they vary depending on the computational system being used. If we consider a binary computational system where numbers must be expressed using ‘normalized’ scientific notation in the form

x = ±(0.b_1 b_2 b_3)_2 × 2^{±k}

where the values of b_1, b_2, b_3, and k ∈ {0, 1},

What additional observations can be made about the system?

We shall consider here only the positive numbers:

(0.100)_2 × 2^{−1} = 1/4      (0.100)_2 × 2^0 = 1/2      (0.100)_2 × 2^1 = 1
(0.101)_2 × 2^{−1} = 5/16     (0.101)_2 × 2^0 = 5/8      (0.101)_2 × 2^1 = 5/4
(0.110)_2 × 2^{−1} = 3/8      (0.110)_2 × 2^0 = 3/4      (0.110)_2 × 2^1 = 3/2
(0.111)_2 × 2^{−1} = 7/16     (0.111)_2 × 2^0 = 7/8      (0.111)_2 × 2^1 = 7/4

• Note there is a hole in the number system near zero.

• Note there is also uneven spacing of the numbers we do have.

• Numbers smaller than the smallest representable number are considered underflow and are typically treated as zero.

• Numbers larger than the largest representable number are considered overflow and will typically throw an error.
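The whole positive half of this toy system can be enumerated mechanically; a Python sketch using exact rationals (the set-comprehension approach is mine):

```python
from fractions import Fraction

# Normalized mantissas (0.1 b2 b3)_2 times exponents 2^{-1}, 2^0, 2^1.
numbers = sorted({
    (Fraction(1, 2) + Fraction(b2, 4) + Fraction(b3, 8)) * Fraction(2) ** e
    for b2 in (0, 1) for b3 in (0, 1) for e in (-1, 0, 1)
})
print(len(numbers))             # 12 distinct positive machine numbers
print(numbers[0], numbers[-1])  # 1/4 7/4
print(numbers)                  # the uneven spacing is visible in the list
```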


For number representation on computers, the IEEE-754 standard has been adopted.

Precision      Bits   Sign   Exponent   Mantissa
Single         32     1      8          23
Double         64     1      11         52
Long Double    80     1      15         64

Note that

2^{−23} ≈ 1 × 10^{−7}
2^{−52} ≈ 2 × 10^{−16}
2^{−64} ≈ 5 × 10^{−20}

gives us the ballpark for machine precision when a computation is done using a given number of bits.
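Python floats are IEEE-754 doubles, so the 52-bit mantissa row can be checked directly (a small sketch using the standard library):

```python
import sys

# Machine epsilon for a double is 2^-52, about 2.2e-16.
print(sys.float_info.epsilon)            # 2.220446049250313e-16
assert sys.float_info.epsilon == 2.0 ** -52

# Anything below that relative size is lost in addition:
print(1.0 + 2.0 ** -53 == 1.0)           # True
```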


1.3 Activities

The limit

e = lim_{n→∞} (1 + 1/n)^n

defines the number e in calculus. Estimate e by taking the value of this expression for n = 8, 8^2, 8^3, . . . , 8^10. Compare with e obtained from the exponential function on your machine. Interpret the results.
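One way to carry out this activity in Python (the loop and formatting are my choices):

```python
import math

for p in range(1, 11):
    n = 8 ** p
    approx = (1.0 + 1.0 / n) ** n
    # Error shrinks roughly like e/(2n) until roundoff in 1 + 1/n interferes.
    print(f"n = 8^{p:<2d}  approx = {approx:.15f}  error = {abs(approx - math.e):.2e}")
```

Analytically the truncation error behaves like e/(2n), so it should shrink by roughly a factor of 8 per row; for the largest n, the rounding of 1 + 1/n itself starts to contaminate the result, which is the effect the activity asks you to interpret.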


1.4 Taylor’s Theorem

There are several useful forms of Taylor's Theorem, and it can be argued that it is the most important theorem for the study of numerical methods.

Theorem 1.4.1 If the function f possesses continuous derivatives of orders 0, 1, 2, . . . , (n+1) in a closed interval I = [a, b], then for any c and x in I,

f(x) = Σ_{k=0}^{n} [f^{(k)}(c)/k!] (x − c)^k + E_{n+1}

where the error term E_{n+1} can be given in the form

E_{n+1} = [f^{(n+1)}(η)/(n+1)!] (x − c)^{n+1}.

Here η is a point that lies between c and x and depends on both.

Note we can use Taylor’s Theorem to come up with useful series expansions.

Example: Use Taylor’s Theorem to find a series expansion for ex.

Here we need to evaluate the nth derivative of ex. We also need to pick a point of expansionor value for c.

We will choose c to be zero, and recall that the derivative of ex is such that

d

dxex = ex.

Thus, for Taylor’s Theorem we need:

f(0) = e0 = 1

f ′(0) = e0 = 1

f ′′(0) = e0 = 1

I see a pattern!

So we then have:

e^x = f(0)/0! x^0 + f′(0)/1! x^1 + f′′(0)/2! x^2 + f′′′(0)/3! x^3 + . . .

    = 1/0! + x/1! + x^2/2! + x^3/3! + . . .

    = Σ_{k=0}^{∞} x^k/k!   for |x| < ∞.

Note we should be a little more careful here, and prove that the series truly does converge to e^x by using the full definition given in Taylor's Theorem.


In this case we have:

e^x = Σ_{k=0}^{n} x^k/k! + [e^η/(n+1)!] x^{n+1}    (1.4.1)

which incorporates the error term. We now look at values of x in some interval around the origin; consider −a ≤ x ≤ a. Then |η| ≤ a and we know

e^η ≤ e^a.

Then the remainder or error term is such that:

lim_{n→∞} | [e^η/(n+1)!] x^{n+1} | ≤ lim_{n→∞} | [e^a/(n+1)!] a^{n+1} | = 0.

Then when the limit is taken of both sides of (1.4.1) it can be seen that:

e^x = Σ_{k=0}^{∞} x^k/k!.

Taylor’s theorem can be useful to find approximations to hard to compute values:

Example: Use the first five terms in a Taylor’s series expansion to approximate the valueof e.

e ≈ 1 + 1 +1

2+

1

6+

1

24= 2.70833333333
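This partial sum is exactly 65/24, which rational arithmetic confirms (a quick Python check, not part of the text):

```python
from fractions import Fraction
from math import e, factorial

s = sum(Fraction(1, factorial(k)) for k in range(5))
print(s, float(s))     # 65/24 2.708333...
print(e - float(s))    # truncation error, roughly 0.0099
```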

Example: In the special case of n = 0, Taylor's theorem is known as the Mean Value Theorem.

Theorem 1.4.2 If f is a continuous function on the closed interval [a, b] and possesses a derivative at each point in the open interval (a, b), then

f(b) = f(a) + (b − a)f′(η)

for some η in (a, b).

Notice that this can be rearranged so that:

f′(η) = (f(b) − f(a)) / (b − a)

The right hand side here is an approximation of the derivative for any x ∈ (a, b).


1.4.1 Taylor’s Theorem using h

There is a more useful form of Taylor's Theorem:

Theorem 1.4.3 If the function f possesses continuous derivatives of orders 0, 1, 2, . . . , (n+1) in a closed interval I = [a, b], then for any x in I,

f(x + h) = f(x) + f′(x)h + (1/2)f′′(x)h^2 + (1/6)f′′′(x)h^3 + . . . + E_{n+1}

         = Σ_{k=0}^{n} [f^{(k)}(x)/k!] h^k + E_{n+1}

where h is any value such that x + h is in I and where

E_{n+1} = [f^{(n+1)}(η)/(n+1)!] h^{n+1}

for some η between x and x + h.

Note that the error term E_{n+1} will depend on h in two ways.

• Explicitly on h through the h^{n+1} factor.

• The point η generally depends on h.

Note as h converges to zero, we see the error term converges to zero at a rate proportional to h^{n+1}. Thus, we typically write:

E_{n+1} = O(h^{n+1})

as h goes to zero. This is shorthand for:

|E_{n+1}| ≤ C|h|^{n+1}

where C is an upper bounding constant.

We additionally note that Taylor's Theorem in terms of h may be written down specifically for any value of n, and thus represents a family of theorems, each with a specific order of h approximation.

f(x + h) = f(x) + O(h)

f(x + h) = f(x) + f′(x)h + O(h^2)

f(x + h) = f(x) + f′(x)h + (1/2)f′′(x)h^2 + O(h^3)
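These orders are easy to observe numerically. A sketch (mine, not from the text) checking the O(h^2) remainder of the first-order expansion for f = sin at x = 1: halving h should cut the remainder by about a factor of 4.

```python
import math

x = 1.0

def remainder(h):
    # | f(x+h) - f(x) - f'(x) h |  for f = sin
    return abs(math.sin(x + h) - math.sin(x) - math.cos(x) * h)

r1, r2 = remainder(1e-3), remainder(5e-4)
print(r1, r2, r1 / r2)   # ratio close to 4
```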


1.5 Gaussian Elimination

In the previous section we considered the numbers that are available for our use on a computer. We made note that there are many numbers (especially near zero) that are not machine numbers; when these numbers are used in a computation, numerical roundoff error results, as the computation will use the closest available machine number.

Let's now look at how this roundoff error can come into play when we are solving the familiar linear equation system:

Ax = b

The normal approach would be to compute A^{−1} and then use that to find x. However, there are other questions that can come into play:

• How do we store a large system of this form on a computer?

• How do we know that the answer we receive is correct?

• Can the algorithm we use fail?

• How long will it take to compute the answer?

• What is the operation count for computing the answer?

• Will the algorithm be unstable for certain systems of equations?

• Can we modify the algorithm to control instabilities?

• What is the best algorithm for the task at hand?

• Matrix Conditioning Issues?

Let's start by considering the system of equations:

Ax = b

with

A =

    [ 1    2        4          . . .    2^{n−1}       ]
    [ 1    3        9          . . .    3^{n−1}       ]
    [ 1    4        16         . . .    4^{n−1}       ]
    [ .    .        .          . . .    .             ]
    [ 1    n+1      (n+1)^2    . . .    (n+1)^{n−1}   ]

and let the right hand side be such that

b_i = Σ_{j=1}^{n} A_{i,j}

is the sum of any given row. Note then here the solution to the system will trivially be a column of ones. Here A is a well known and ‘poorly conditioned’ Vandermonde matrix.


It may be useful to use the sum of a geometric series when coding this, so that any row i would look like:

Σ_{j=1}^{n} (1 + i)^{j−1} x_j = ((1 + i)^n − 1) / i
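Before coding the elimination itself, the construction of A and b (and the geometric-series shortcut for the row sums) can be checked exactly in integer arithmetic; this Python sketch is one way to do it:

```python
n = 5

# Row i has entries (1 + i)^(j-1) for j = 1, ..., n.
A = [[(1 + i) ** j for j in range(n)] for i in range(1, n + 1)]
b = [sum(row) for row in A]

# Each row sum is a geometric series: ((1 + i)^n - 1) / i, exactly.
for i in range(1, n + 1):
    assert b[i - 1] == ((1 + i) ** n - 1) // i

print(b)   # [31, 121, 341, 781, 1555]
```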

The following is pseudocode for a Gaussian elimination procedure. Much like you would do by hand, our goal will be to implement and test this in MATLAB.

Listing 1.1: Straight Gaussian Elimination

% Forward elimination.
for k = 1:(n-1)
    for i = (k+1):n
        xmult = A(i,k)/A(k,k);
        A(i,k) = xmult;
        for j = (k+1):n
            A(i,j) = A(i,j) - xmult*A(k,j);
        end
        b(i,1) = b(i,1) - xmult*b(k,1);
    end
end

% Backward substitution.
x(n,1) = b(n,1)/A(n,n);
for i = (n-1):-1:1
    s = b(i,1);   % (s avoids shadowing the built-in sum)
    for j = (i+1):n
        s = s - A(i,j)*x(j,1);
    end
    x(i,1) = s/A(i,i);
end

Write a piece of code that implements this algorithm.
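One possible answer, transcribed into Python rather than MATLAB (the names are mine), and tried on the Vandermonde system above, whose exact solution is a column of ones:

```python
def gauss_solve(A, b):
    # Straight Gaussian elimination, no pivoting, as in Listing 1.1.
    n = len(A)
    A = [row[:] for row in A]      # work on copies
    b = b[:]
    # Forward elimination.
    for k in range(n - 1):
        for i in range(k + 1, n):
            xmult = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= xmult * A[k][j]
            b[i] -= xmult * b[k]
    # Backward substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i]
        for j in range(i + 1, n):
            s -= A[i][j] * x[j]
        x[i] = s / A[i][i]
    return x

n = 6
A = [[float((1 + i) ** j) for j in range(n)] for i in range(1, n + 1)]
b = [sum(row) for row in A]
print(gauss_solve(A, b))   # near [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```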


1.5.1 Assessment of Algorithm

In order to see how well our algorithm is performing, the error can be considered. There are several ways of computing the error of a vector solution. The first is to consider a straightforward vector of the difference between the computed solution and the true solution:

e = x_h − x.

A second method, used when the true solution to a given problem is unknown, is to consider a residual vector:

r = Ax_h − b

Note the residual vector will be all zeros when the true solution is obtained. In order to get a handle on the size of either the residual vector or the error vector, norms are often used.

A vector norm is any mapping from R^n to R that satisfies the following properties:

• ‖x‖ > 0 if x ≠ 0.

• ‖αx‖ = |α| ‖x‖.

• ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

where x and y are vectors in R^n, and α ∈ R.

Examples of vector norms include:

• The l_1 vector norm:

  ‖x‖_1 = Σ_{i=1}^{n} |x_i|

• The Euclidean / l_2 vector norm:

  ‖x‖_2 = ( Σ_{i=1}^{n} x_i^2 )^{1/2}

• The l_p vector norm:

  ‖x‖_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}

Note there are also norms for matrices; more on this when the condition number for matrices is discussed. Different norms of the residual and error vectors allow for a single value to be assessed rather than an entire vector.
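Sketch implementations of these norms in Python (the helper names are mine):

```python
def norm_1(x):
    return sum(abs(v) for v in x)

def norm_2(x):
    return sum(v * v for v in x) ** 0.5

def norm_p(x, p):
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

v = [3.0, -4.0]
print(norm_1(v), norm_2(v), norm_p(v, 2))   # 7.0 5.0 5.0
```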


1.6 Improving Gaussian Elimination

For notes here we will follow Cheney's presentation. The algorithm that we have implemented will not always work! To see this consider the following example:

0·x_1 + x_2 = 1
  x_1 + x_2 = 2

The solution to this system is clearly x_1 = 1 and x_2 = 1; however, our Gaussian Elimination algorithm will fail! (Division by zero.) When algorithms fail, this tells us to be skeptical of the results for values near the failure.

If we apply the Gaussian Elimination algorithm to the following system, what happens?

εx_1 + x_2 = 1
 x_1 + x_2 = 2

After step one:

εx_1 + x_2 = 1
(1 − ε^{−1})x_2 = 2 − ε^{−1}

Doing the back solve yields:

x_2 = (2 − ε^{−1}) / (1 − ε^{−1})

However we make note that the value of ε is very small, and thus ε^{−1} is very large, so

x_2 = (2 − ε^{−1}) / (1 − ε^{−1}) ≈ 1

and

x_1 = ε^{−1}(1 − x_2) ≈ 0.

These values are not correct, as we would expect in the real world to obtain values of

x_1 = 1/(1 − ε) ≈ 1 and x_2 = (1 − 2ε)/(1 − ε) ≈ 1

How could we fix the system/algorithm?

• Note that if we had attacked the problem considering the second equation first, there would have been no difficulty with division by zero.

• A second issue comes from the coefficient ε being very small compared with the other coefficients in the row.


• At the kth step in the Gaussian elimination process, the entry a_kk is known as the pivot element or pivot. The process of interchanging rows or columns of a matrix is known as pivoting, and it alters the pivot element.

We aim to improve the numerical stability of the algorithm. Many different operations may be algebraically equivalent, yet produce different results when implemented numerically.

The idea becomes to swap the rows of the system matrix so that the entry with the largest value is used to zero out the entries in the column associated with that variable during Gaussian Elimination. This is known as partial pivoting and is accomplished by interchanging two rows in the system.

Gaussian Elimination with full pivoting or complete pivoting would select the pivot entry to be the largest entry in the sub-matrix of the system, and reorder both rows and columns to make that element the pivot element.

    Seeking the largest value possible hopes to make the pivot element as numerically stable as possible. This makes the process less susceptible to roundoff errors. However, the large amount of work is usually not seen as worth the extra effort when compared with partial pivoting.

An even more sophisticated method would be scaled partial pivoting. Here the largest entry s_i in each row is used when picking the initial pivot equation. The pivot entry is selected by dividing the current column entries (for the current variable) by the scaling value s_i for each row, and taking the largest as the pivot row (see the Cheney text for an example and the pseudocode).

    This simulates full pivoting by using an index vector containing information about the relative sizes of elements in each row.

• The idea here is that these changes to the Gaussian Elimination algorithm allow zero pivots and small pivots to be avoided.

• Gaussian Elimination is numerically stable for diagonally dominant matrices or matrices that are symmetric positive definite.

• The MATLAB backslash operator attempts to use the best or most numerically stable algorithm available.
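A minimal sketch (mine, not Cheney's pseudocode) of partial pivoting on the 2 × 2 ε-system above, showing that the row swap recovers the lost solution:

```python
def solve2(a, b, pivot):
    # Gaussian elimination on a 2x2 system; with pivot=True the rows are
    # swapped so the larger first-column entry becomes the pivot.
    if pivot and abs(a[1][0]) > abs(a[0][0]):
        a[0], a[1] = a[1], a[0]
        b[0], b[1] = b[1], b[0]
    m = a[1][0] / a[0][0]
    a[1][1] -= m * a[0][1]
    b[1] -= m * b[0]
    x2 = b[1] / a[1][1]
    x1 = (b[0] - a[0][1] * x2) / a[0][0]
    return x1, x2

eps = 1e-17
print(solve2([[eps, 1.0], [1.0, 1.0]], [1.0, 2.0], pivot=False))  # (0.0, 1.0): x1 is lost
print(solve2([[eps, 1.0], [1.0, 1.0]], [1.0, 2.0], pivot=True))   # (1.0, 1.0)
```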


Chapter 2

Methods for Finding Zeros

“Four quiet hours is a resource that I can put to good use. Two slabs of time, each two hours long, might add up to the same four hours, but are not nearly as productive as an unbroken four. If I know that I am going to be interrupted, I can’t concentrate, and if I suspect that I might be interrupted, I can’t do anything at all.”

− Neal Stephenson, Why I’m a Bad Correspondent

There are lots of different methods for going about finding the roots or zeros of a function; more than could probably be listed in a reasonable space. The importance of finding zeros of functions can be seen by considering that any equation may be written in an equivalent form with a zero on one side of the equal sign.

In general, methods for finding the roots of a function make a couple of assumptions. We will assume that, on the domain of the function over which the root is to be found:

• The function is continuous.

• The function is also differentiable on the domain considered.

With these assumptions we can now look at several methods to find the roots of functions numerically, which are especially useful when analytic methods for finding roots are not possible.

In order to find a zero of a function, most root finding methods make use of the Intermediate Value Theorem: for a continuous function f and real values a < b such that

f(a)f(b) < 0

there will be a root in the interval (a, b).


2.0.1 Bisection Algorithm

The bisection method looks for a root between the end points of the search interval, a and b, by:

1. Looking at the midpoint c = (a + b)/2.

2. Computing f(c).

3. Seeing if f(a)f(c) < 0 and if so looking in the interval (a, c).

4. Else seeing if f(b)f(c) < 0 and if so looking in the interval (c, b).

Class coding Exercise: Write a piece of code that can be used to find the root of a specified function on a given interval in SAGE, Python, or MATLAB.
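As one possible sketch of the exercise in Python (the function x^2 − 2 and all names are my own choices, not from the text), the four steps above map directly onto a loop:

```python
def bisect(f, a, b, tol=1e-10, max_iter=200):
    """Find a root of f in [a, b] by repeated interval halving.

    Assumes f is continuous and f(a)*f(b) < 0, so the intermediate
    value theorem guarantees a root in (a, b).
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2.0          # step 1: the midpoint
        if f(a) * f(c) < 0:        # step 3: root lies in (a, c)
            b = c
        else:                      # step 4: root lies in (c, b)
            a = c
        if (b - a) / 2.0 < tol:
            break
    return (a + b) / 2.0

# Example: the root of f(x) = x**2 - 2 on [0, 2] is sqrt(2).
root = bisect(lambda x: x**2 - 2.0, 0.0, 2.0)
```

Each pass halves the bracketing interval, which is exactly the convergence behavior analyzed next.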

Convergence Analysis

At this point it would be a good idea to take stock of how well the bisection algorithm is performing. After the nth iteration of the algorithm the distance from the root r to the center of the interval considered will be:

|r − cn| < (bn − an)/2 ≤ (b − a)/2^(n+1) < εtol.    (2.0.1)

• The denominator in (2.0.1) has a factor of 2^(n+1) as the guess for the root will be at the center of the new interval (a, b).

• How many iterations will it take for the error to be less than a given tolerance?

(b − a)/2^(n+1) < εtol  =⇒  (b − a)/(2εtol) < 2^n

=⇒  ln((b − a)/(2εtol)) < n ln(2)

=⇒  n > ln((b − a)/(2εtol)) / ln(2)

The bisection method works in the same manner as the binary search algorithm that some may have seen in a data structures course.
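The iteration bound above is easy to evaluate; a small sketch (interval and tolerance chosen for illustration):

```python
import math

def bisection_iterations(a, b, eps_tol):
    """Smallest integer n satisfying (b - a)/2**(n+1) < eps_tol,
    from the bound derived above."""
    return math.ceil(math.log((b - a) / (2.0 * eps_tol)) / math.log(2.0))

# On the unit interval with a tolerance of 1e-8, 26 halvings suffice.
n = bisection_iterations(0.0, 1.0, 1e-8)
```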


False Position Method

A modification of the bisection method that can be used to find the zeros of a function is the false position method (regula falsi). Here, instead of using the midpoint of the interval (a, b) as the new end point of the search interval, a secant line between (a, f(a)) and (b, f(b)) is constructed, and the point at which the secant line crosses the x-axis is considered the new decision point.


Using the slope of the line segments it can be seen that:

c = (a f(b) − b f(a)) / (f(b) − f(a))

and the algorithm would carry on in the same manner as the bisection method did.
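A sketch of the method in Python (the test function x^3 − 9 and all names are illustrative, not from the text); only the choice of c differs from bisection:

```python
def false_position(f, a, b, tol=1e-12, max_iter=200):
    """Root finding by the false position (regula falsi) method.

    Assumes f is continuous with f(a)*f(b) < 0.  The x-intercept of
    the secant through (a, f(a)) and (b, f(b)) replaces the midpoint.
    """
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    c = a
    for _ in range(max_iter):
        c = (a * f(b) - b * f(a)) / (f(b) - f(a))  # secant x-intercept
        if abs(f(c)) < tol:
            break
        if f(a) * f(c) < 0:    # root lies in (a, c)
            b = c
        else:                  # root lies in (c, b)
            a = c
    return c

# Example: the root of f(x) = x**3 - 9 on [1, 3] is the cube root of 9.
root = false_position(lambda x: x**3 - 9.0, 1.0, 3.0)
```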

2.0.2 Newton’s Method

Newton’s method, or Newton-Raphson iteration, is a second way to find the root of a function. Note that presented here is Newton’s method for a single variable function; however, more general versions of Newton’s method may be used to solve systems of equations.

As with the bisection method, Newton’s method assumes that our function f is continuous. Additionally it is assumed that the function f is differentiable. Using the fact that the function is differentiable allows for use of the tangent line at a given point to find an approximate value for the root of the function. Consider the following figure:

The initial guess for the root, x0, of function f is updated to x1 using the zero of the tangent line of f at the point x0.

Using point slope form of a line gives

y = f′(x0)(x − x0) + f(x0)    (2.0.2)

as the equation of the tangent line of the function f at x0. Solving (2.0.2) for its root gives x1, a hopefully better approximation for the root of f.

0 = f′(x0)(x1 − x0) + f(x0)  =⇒  −f(x0) = f′(x0)x1 − x0f′(x0)

=⇒  −f(x0) + x0f′(x0) = f′(x0)x1

=⇒  x1 = x0 − f(x0)/f′(x0)


Extending this to successive values allows for a sequence of approximations to the root of f(x) to be found, where xn+1 is found from xn as:

xn+1 = xn − f(xn)/f′(xn)

The algorithm should terminate when successive approximating values come within a defined tolerance of one another. We should examine whether or not

lim_{n→∞} xn = r

for r the root of f.

Coding Exercise

Use Newton’s Method to find the root of

f(x) = sin(x)

between 2 and 4. Note this will approximate π.
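A minimal Python sketch of the exercise (the stopping rule on successive iterates and the starting guess x0 = 3 are my own choices):

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson iteration x_{n+1} = x_n - f(x_n)/f'(x_n).

    Terminates when successive iterates are within tol of one another.
    """
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# f(x) = sin(x) has a root at pi; starting at 3.0 (inside (2, 4))
# the iteration converges to it quadratically.
root = newton(math.sin, math.cos, 3.0)
```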


Chapter 3

Numerical Integration

“On Monday in math class Mrs. Fibonacci says, ‘You know, you can think of almost everything as a math problem.’ On Tuesday I start having problems.”

− Jon Scieszka and Lane Smith, MATH CURSE

In Calculus one of the fundamental topics discussed is integration.

• The indefinite integral of a function is itself a function (really a class of functions, due to the arbitrary constant).

• The definite integral of a function over a fixed interval is a number.

Example: Consider the function f(x) = x^3.

• Indefinite Integral:

F(x) = ∫ x^3 dx = x^4/4 + C

• Definite Integral:

∫_0^3 x^3 dx = x^4/4 |_0^3 = 81/4

Example: Consider finding the indefinite integral of

f(x) = e^(x^2).

That is,

∫ e^(x^2) dx = ?


Using u-substitution doesn’t work, and computer algebra systems like SAGE give answers such as:

∫ e^(x^2) dx = (−1/2) i √π erf(ix)

as no elementary function of x has a derivative that is simply e^(x^2).

The definite integral

∫_a^b f(x) dx

is a representation of the area under the f(x) curve between a and b. There should be a way to get a handle on this value for f(x) = e^(x^2). Consider the interval of interest to be between 0 and 1. Then,

∫_0^1 e^(x^2) dx = Area

How do we find the ‘Area’ when we don’t know the function F needed in the Fundamental Theorem of Calculus?

Theorem 3.0.1 (Fundamental Theorem of Calculus) If f is continuous on the interval [a, b] and F is an antiderivative of f, then

∫_a^b f(x) dx = F(b) − F(a)


3.1 Trapezoid Rule

Consider dividing the domain of interest [a, b] into sections such that:

a = x0 ≤ x1 ≤ x2 ≤ · · · ≤ xn = b

Then the area under the curve f on each of the sub-intervals [xi, xi+1] is approximated using a trapezoid with a base of xi+1 − xi and average height of

(1/2)(f(xi) + f(xi+1))

Thus, ∫ xi+1

xi

f(x) dx ≈ 1

2(xi+1 − xi) (f(xi) + f(xi+1))

and the full definite integral is approximated as:∫ b

a

f(x) dx ≈ 1

2

n−1∑i=0

(xi+1 − xi) (f(xi) + f(xi+1))

and several computations may be saved if the definite integral is written as:

∫_a^b f(x) dx ≈ (h/2)(f(x0) + f(xn)) + h Σ_{i=1}^{n−1} f(xi)

Thus,

∫_{xi}^{xi+1} f(x) dx ≈ (1/2)(xi+1 − xi)(f(xi) + f(xi+1))

Computational Exercise

Using the Trapezoid Rule and a uniformly spaced set of points of distance h apart, estimate the following definite integral:

∫_0^1 (sin(x)/x) dx

Assuming that the true solution to the definite integral is 0.946083070367, compute an estimate for the convergence rate of the Trapezoid Rule with respect to refinement of the ‘mesh spacing’ h.


The following listing gives some SAGE code for the trapezoid rule that can be used to numerically find the value of the desired definite integral.

Listing 3.1: SAGE Code for the Trapezoid Rule

var('x');
f(x) = (sin(x)/x);
true = round(integrate(f(x), x, 0.0, 1.0), 20);
print true

a = 0.0
b = 1.0
eps = 1e-14;

n = 2**8;  # Number of Nodes

h = (b - (a+eps))/(n-1);

Int = (h/2.0)*(f(a+eps) + f(b));

for i in range(1, n-1):
    xi = a + (h*i)
    Int = Int + h*f(xi);

error = abs(true - Int);

print Int;
print error;

To establish the rate of convergence for the trapezoid rule, the above code needs to be run several times under regular refinement of the grid spacing.

Consider the error as it relates to the mesh spacing h,

h = (b − a)/(n − 1)

where n is the number of nodes used in the approximation. Then

error_i = |true − approximation| = C h^α

Then for regularly refined mesh spacings h, h/2, h/4, ... the value of the convergence rate α may be found via

α = ln(error_i / error_{i+1}) / ln(2)

For the example case given:


Nodes   4            8            16           32           64           128          256
Value   0.943291429  0.945570776  0.945971522  0.946056954  0.946076747  0.946081514  0.946082684
Error   0.002791641  0.000512294  0.000111549  2.61E-05     6.32E-06     1.56E-06     3.86E-07
Rate    —            2.446069334  2.19929713   2.094659561  2.046178376  2.022812206  2.011338136

This shows the convergence rate approaching 2.0.
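The convergence study above can be reproduced in plain Python (names are my own; the removable singularity at x = 0 is handled by its limit value 1 rather than the eps offset used in the SAGE listing):

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n uniformly spaced nodes on [a, b]."""
    h = (b - a) / (n - 1)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n - 1):
        total += f(a + i * h)
    return h * total

# sin(x)/x with the removable singularity at x = 0 patched by its limit, 1.
f = lambda x: math.sin(x) / x if x != 0.0 else 1.0
true_value = 0.946083070367

rates, prev_err = [], None
for n in (4, 8, 16, 32, 64, 128, 256):
    err = abs(true_value - trapezoid(f, 0.0, 1.0, n))
    if prev_err is not None:
        rates.append(math.log(prev_err / err) / math.log(2.0))
    prev_err = err
# The computed rates drift down toward the theoretical value of 2.
```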

3.1.1 Newton-Cotes Quadrature

Newton-Cotes quadrature routines can be derived using the method of undetermined coefficients. The idea is to exactly integrate “numerically” the first n polynomial basis functions by appropriately assigning weights

w1, w2, . . . , wn

and choosing evaluation points

x1, x2, . . . , xn

such that the first n polynomial basis functions are integrated over the interval [a, b] exactly by

Σ_{i=1}^{n} wi f(xi).

Polynomial basis functions are defined to be of the form:

Polynomial basis function i := x^(i−1), for i = 1, 2, 3, . . .

Thus, the first three polynomial basis functions are:

1, x, x^2

The evaluation points x1, x2, . . . , xn on a given interval [a, b] are chosen in an open or closed fashion.

• For a set of open points we choose points that are equally spaced on the interior of the interval [a, b].

• Choosing points in a closed fashion the end points of the interval are used.

Note that choosing points in a closed fashion requires using at least two points, while choosing points in an open fashion may be done using at least one point. Figure 3.1.1 illustrates the points that would be used as quadrature inputs for a three point closed, and three point open, quadrature scheme over the interval [a, b].

Once the evaluation points (the xi’s) have been picked, the idea becomes to solve for the appropriate weights w1, w2, . . . , wn.


Figure 3.1.1: Quadrature points for a three point open (left) and closed (right) Newton-Cotes quadrature method on the interval [a, b].

Example: One Point Newton-Cotes Quadrature

Deriving a one point open interval Newton-Cotes quadrature method will exactly integrate the first polynomial basis function on a sub-interval [a, b]. Here it is expected that

∫_a^b f(x) dx ≈ w1 x1^0 = ∫_a^b x^0 dx = b − a

where the quadrature point x1 used is picked to be the center of the interval [a, b]. Thus,

x1 = (b + a)/2,

and

w1(1) = b − a  =⇒  w1 = b − a

and the Midpoint Quadrature rule has been established with

∫_a^b f(x) dx ≈ w1 f(x1) = (b − a) f((b + a)/2).

Example: Two Point Newton-Cotes Quadrature

A two point closed end Newton-Cotes quadrature may be established by choosing the quadrature points:

x1 = a and x2 = b.

The goal is to have the quadrature rule integrate the first two polynomial basis functions

1 and x.

Using the two quadrature points x1 = a and x2 = b the goal is to exactly integrate the two polynomial basis functions. This establishes the following two equations

w1 x1^0 + w2 x2^0 = ∫_a^b 1 dx = b − a

w1 x1^1 + w2 x2^1 = ∫_a^b x dx = (1/2)(b^2 − a^2)


with two unknowns w1 and w2. Writing the system in a simplified form yields:

w1 + w2 = b − a   and   w1 a + w2 b = (1/2)(b^2 − a^2)

Solving for w1 and w2 yields:

w1 = w2 = (b − a)/2.

Note this quadrature technique is exactly the same as the trapezoid quadrature rule for [a, b] that had been previously established:

∫_a^b f(x) dx ≈ w1 f(x1) + w2 f(x2) = ((b − a)/2) f(a) + ((b − a)/2) f(b) = (b − a) (f(a) + f(b))/2.

Continuing in this manner allows for Simpson’s Rule to be established.
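The same undetermined-coefficients recipe can be carried out numerically. The sketch below (my own illustration, not from the text) uses the three closed points −1, 0, 1 on [−1, 1] and recovers the Simpson weights by solving the small moment system:

```python
def gauss_solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for small systems."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

# Closed three-point rule on [-1, 1]: nodes -1, 0, 1.  Demanding exact
# integration of 1, x, x^2 (moments 2, 0, 2/3) fixes the weights.
nodes = [-1.0, 0.0, 1.0]
A = [[x ** p for x in nodes] for p in range(3)]
w = gauss_solve(A, [2.0, 0.0, 2.0 / 3.0])
# w comes out as [1/3, 4/3, 1/3]: Simpson's rule on [-1, 1].
```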


3.1.2 Gaussian Quadrature

Gaussian Quadrature is similar to Newton-Cotes quadrature rules, with the exception that the quadrature points are treated as a degree of freedom and determined to obtain higher orders of accuracy. For any n point quadrature rule the quadrature points (the xi’s) and weights (the wi’s) are used such that:

∫_a^b f(x) dx ≈ Σ_{i=1}^{n} wi f(xi)

For a Gaussian Quadrature rule the weights and quadrature points are determined so that the first 2n monomial basis functions are integrated exactly. This means that an n point Gaussian Quadrature rule can integrate polynomials of the form

p(x) = a_{2n−1} x^(2n−1) + a_{2n−2} x^(2n−2) + · · · + a2 x^2 + a1 x + a0

exactly.

Example

Consider deriving a two point Gaussian Quadrature Rule for the interval [−1, 1]. Note that the resulting rule can be linearly mapped to any interval [a, b]. Here the first four monomial basis functions

x^0, x^1, x^2, and x^3

will be integrated exactly. Thus,

w1 x1^0 + w2 x2^0 = ∫_{−1}^{1} x^0 dx = 2        (3.1.1)

w1 x1^1 + w2 x2^1 = ∫_{−1}^{1} x^1 dx = 0        (3.1.2)

w1 x1^2 + w2 x2^2 = ∫_{−1}^{1} x^2 dx = 2/3      (3.1.3)

w1 x1^3 + w2 x2^3 = ∫_{−1}^{1} x^3 dx = 0        (3.1.4)


Solving a system of two equations in SAGE for the two unknowns x1, x2 may be accomplished using:

Listing 3.2: SAGE Code for solving a two equation Non-Linear System

var('x1 x2')
eqA = x1 + x2 == 0
eqB = x1**2 + x2**2 == 2/3
show(solve([eqA, eqB], x1, x2))

One possible solution found using SAGE yields:

x1 = √3/3, x2 = −√3/3, w1 = 1, and w2 = 1.

Try using these points and weights to approximate the definite integral

∫_{−1}^{1} f(x) dx

where f(x) is defined to be:

• f(x) = 3.0

• f(x) = −2.5x+ 3.0

• f(x) = 4.6x2 − 2.5x+ 3.0

• f(x) = −5.1x3 + 4.6x2 − 2.5x+ 3.0

• f(x) = (x − 0.2)(x + 0.2)(x − 0.9)

• f(x) = 6.3x4 − 5.1x3 + 4.6x2 − 2.5x+ 3.0

What is the error in the approximation using the quadrature rule when compared with integrating each of the functions exactly? Plot each of the functions.

Once you have the two-point Gaussian Quadrature rule mastered, how would you derive a three-point Gaussian Quadrature rule? What is the highest degree polynomial that your new quadrature rule will integrate exactly?

When I used SAGE to compute the points and weights I received:

x1 = 0, x2 = −√15/5, x3 = √15/5, w1 = 8/9, w2 = 5/9, and w3 = 5/9
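Both rules are easy to check numerically. A sketch (the cubic and quartic below are from the exercise list; the exact values follow from integrating term by term on [−1, 1], where odd powers vanish):

```python
import math

def gauss2(f):
    """Two-point Gauss rule on [-1, 1]: nodes +/- sqrt(3)/3, weights 1."""
    x = math.sqrt(3.0) / 3.0
    return f(-x) + f(x)

def gauss3(f):
    """Three-point Gauss rule on [-1, 1]: node 0 with weight 8/9,
    nodes +/- sqrt(15)/5 with weight 5/9."""
    x = math.sqrt(15.0) / 5.0
    return (8.0 / 9.0) * f(0.0) + (5.0 / 9.0) * (f(-x) + f(x))

# The cubic from the list is integrated exactly by the two-point rule.
cubic = lambda x: -5.1 * x**3 + 4.6 * x**2 - 2.5 * x + 3.0
exact_cubic = 2.0 * 3.0 + 2.0 * 4.6 / 3.0      # odd terms vanish
err2 = abs(gauss2(cubic) - exact_cubic)

# The quartic needs the three-point rule (exact up to degree 5).
quartic = lambda x: 6.3 * x**4 + cubic(x)
exact_quartic = exact_cubic + 2.0 * 6.3 / 5.0
err3 = abs(gauss3(quartic) - exact_quartic)
```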


Chapter 4

Polynomial Interpolation

“When you are wrestling for possession of a sword, the man with the handle always wins.”

− Neal Stephenson, Snow Crash

The goal of interpolation is to fit a function exactly through a series of points: the interpolating function goes through the required collection of points exactly. Usually we build interpolating functions using a collection of basis functions, multiplying the given basis functions by a collection of coefficients such that the required set of points is satisfied.

(gap in notes)

The idea of our collection of basis functions was to allow them to have a Kronecker Delta property:

δi,j = { 0 if i ≠ j;  1 if i = j }

For a given set of data points we have seen that if li(x) is a polynomial with the Kronecker Delta property at the given data set:

x | x0  x1  · · ·  xn
y | y0  y1  · · ·  yn

The interpolating polynomial of the set is:

P(x) = Σ_{i=0}^{n} li(x) yi,    (4.0.1)

which can be seen by observing

P(xj) = Σ_{i=0}^{n} li(xj) yi = lj(xj) yj = yj


We can build a set of basis functions li(x) for our specified data set using a product of factors. Consider

li(x) = ((x − x0)/(xi − x0)) ((x − x1)/(xi − x1)) · · · ((x − xi−1)/(xi − xi−1)) ((x − xi+1)/(xi − xi+1)) · · · ((x − xn)/(xi − xn))

Notice that the ith factor in the product is skipped and that the li(x) expressions have the Kronecker Delta property. Then

li(x) = ∏_{j=0, j≠i}^{n} (x − xj)/(xi − xj),   ∀ i ∈ {0, 1, . . . , n}.

Here P(x) in (4.0.1) is the Lagrange form of the interpolating polynomial, and the basis functions li(x) are often called cardinal polynomials.

Class Activity

Using SAGE, find the first five Lagrange cardinal polynomials for evenly spaced nodes on the interval [−1, 1], then modify your code to find the Lagrange interpolating polynomial for the data set listed below:

x | 0  2  3  4
y | 7  11  28  63

The following sage code is used:

Listing 4.1: Code to Create 5 Lagrange Cardinal Polynomials on [−1, 1]

xnodes = [-1.0, -0.5, 0.0, 0.5, 1.0];
ynodes = [0.0, 0.0, 0.0, 0.0, 0.0];

L0(x) = 1.0;
L1(x) = 1.0;
L2(x) = 1.0;
L3(x) = 1.0;
L4(x) = 1.0;

for i in range(len(xnodes)):
    if (i != 0):
        L0(x) = L0(x)*((x - xnodes[i])/(xnodes[0] - xnodes[i]));
    if (i != 1):
        L1(x) = L1(x)*((x - xnodes[i])/(xnodes[1] - xnodes[i]));
    if (i != 2):
        L2(x) = L2(x)*((x - xnodes[i])/(xnodes[2] - xnodes[i]));
    if (i != 3):
        L3(x) = L3(x)*((x - xnodes[i])/(xnodes[3] - xnodes[i]));
    if (i != 4):
        L4(x) = L4(x)*((x - xnodes[i])/(xnodes[4] - xnodes[i]));


To modify the code for the second portion of the activity, note that we need to change the node vectors.
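For the second portion, the Lagrange form can also be sketched in plain Python (a generic illustration, not the course's SAGE code); each cardinal polynomial skips its own factor, giving the Kronecker delta property:

```python
def lagrange(xs, ys):
    """Return the Lagrange-form interpolating polynomial P for (xs[i], ys[i])."""
    def P(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            li = 1.0                  # cardinal polynomial l_i evaluated at x
            for j, xj in enumerate(xs):
                if j != i:            # skip the i-th factor
                    li *= (x - xj) / (xi - xj)
            total += li * yi
        return total
    return P

# The data set from the activity.
P = lagrange([0.0, 2.0, 3.0, 4.0], [7.0, 11.0, 28.0, 63.0])
```

Evaluating P at each node should return the tabulated y value exactly.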



Figure 4.0.1: Plot of the first five Lagrange cardinal polynomials on [−1, 1].



Figure 4.0.2: The Lagrange Interpolating polynomial on [0, 4].


4.0.1 Error in Polynomial Interpolation

Let’s consider the error in polynomial interpolation. Given the following sets of points, find a polynomial using SAGE that interpolates the data:

• Two Points:

x | 0  10
y | 10  0

• Three Points:

x | 0  4  10
y | 10  5  0

• Four Points:

x | 0  4  7  10
y | 10  5  2  0

• Five Points:

x | 0  4  7  9  10
y | 10  5  2  3  0

Typically the thought is:

The larger the number of nodes used, the better the interpolation is between the nodes of a data set.

Is this correct?

Consider the Runge function on the interval [−5, 5],

R(x) = 1/(x^2 + 1)

What would the interpolating polynomial look like if we used 11 equally spaced points to create a polynomial interpolating function that approximates the Runge function?
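The question can be explored numerically. The sketch below (my own illustration) interpolates the Runge function at 11 equispaced nodes and samples the error on a fine grid; the interpolant matches at the nodes but oscillates badly near the ends of the interval (the Runge phenomenon):

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial of (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)
        total += li * yi
    return total

runge = lambda x: 1.0 / (x * x + 1.0)

# 11 equally spaced nodes on [-5, 5].
n = 11
xs = [-5.0 + 10.0 * i / (n - 1) for i in range(n)]
ys = [runge(x) for x in xs]

# Sample the error on a fine grid: it is small near the center but
# grows large near the ends of the interval.
grid = [-5.0 + 10.0 * k / 1000.0 for k in range(1001)]
max_err = max(abs(lagrange_eval(xs, ys, g) - runge(g)) for g in grid)
```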


4.0.2 Highlights

Two main methods for creating the interpolating polynomial for a data set.

• Lagrange method.

– Establish basis functions that are one at a specific node and zero at all other nodes.

– Multiply by the interpolating coefficients at the given nodes.

• Newton’s method for creating the interpolating polynomial.

– Recursive

• Chebyshev Points used to redistribute the points about an interval.


4.1 Cubic Splines

The idea behind cubic splines is to fit a set of n + 1 points:

t | t0  t1  t2  . . .  tn−1  tn
y | y0  y1  y2  . . .  yn−1  yn

with a piecewise continuous function. The function domain is broken up such that:

S(t) = { φ0(t),    t ∈ [t0, t1)
         φ1(t),    t ∈ [t1, t2)
         ...
         φn−1(t),  t ∈ [tn−1, tn]

where the φi(t) functions have been defined by:

φi(t) = αi + βi t + γi t^2 + δi t^3   for i ∈ {0, 1, 2, . . . , n − 1}

The traditional conditions on a cubic spline function are such that the function is piecewise continuous, with additional constraints that the first and second derivatives of the spline match at the data points (nodes) used to create the spline function.

Ensuring that the spline function passes through the constructing nodes for each piecewise component yields constraints of the form:

φi(ti) = αi + βi ti + γi (ti)^2 + δi (ti)^3 = yi   and   φi(ti+1) = αi + βi ti+1 + γi (ti+1)^2 + δi (ti+1)^3 = yi+1

On interior nodes of the spline (excluding the end nodes (t0, y0) and (tn, yn)) matching the first and second derivatives yields constraints:

φ′i(ti+1) = φ′i+1(ti+1)

=⇒ βi + 2γi(ti+1) + 3δi(ti+1)^2 = βi+1 + 2γi+1(ti+1) + 3δi+1(ti+1)^2

and

φ″i(ti+1) = φ″i+1(ti+1)

=⇒ 2γi + 6δi(ti+1) = 2γi+1 + 6δi+1(ti+1)

=⇒ γi + 3δi(ti+1) = γi+1 + 3δi+1(ti+1).

In the case of a periodic cubic spline all conditions are met, and the system of expressions that define the spline is complete. In the case of non-periodic splines there are two equations left to our discretion to make the system complete. Traditionally these are handled by augmenting conditions on the end nodes of the spline. A natural spline has the second derivative set to zero at the end nodes. The reader is also encouraged to look up the proper condition on the end nodes for the not-a-knot spline.


For a natural spline the system matrix created to solve for the coefficients α, β, γ, and δ is:

Ax = y

where

x^T = (α0, β0, γ0, δ0, α1, β1, γ1, δ1, . . . , αn−1, βn−1, γn−1, δn−1),

y^T = (y0, y1, 0, 0, y1, y2, 0, 0, . . . , yn−1, yn, 0, 0),

and A is the 4n × 4n matrix built from 4 × 4 blocks, one block column per spline piece. For each piece i = 0, . . . , n − 1 the block column (αi, βi, γi, δi) receives two interpolation rows,

(1, ti, ti^2, ti^3)   and   (1, ti+1, ti+1^2, ti+1^3),

and at each interior node ti+1 a pair of continuity rows couples blocks i and i + 1:

(0, 1, 2ti+1, 3ti+1^2 | 0, −1, −2ti+1, −3ti+1^2)
(0, 0, 1, 3ti+1 | 0, 0, −1, −3ti+1).

The final two rows impose the natural boundary conditions:

(0, 0, 2, 6t0) acting on the first block and (0, 0, 2, 6tn) acting on the last block.


Class Activity

Recall that the arc length of a curve represented by y = f(x) can be calculated by:

L = ∫_a^b √(1 + (dy/dx)^2) dx

Using the points:

(−3, −2), (−1, 1), (5, −3), (9, 7)

• Fit a natural cubic-spline to the data set.

• Compute the arc length of the natural cubic spline you have just found.
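A plain-Python sketch of the whole activity (assembling the block system described above and approximating the arc length integral piece by piece with the trapezoid rule; the solver and all names are my own, not the course's SAGE approach):

```python
import math

def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

def natural_cubic_spline(ts, ys):
    """Coefficients (alpha_i, beta_i, gamma_i, delta_i) of each piece
    phi_i(t) = alpha + beta*t + gamma*t^2 + delta*t^3, natural ends."""
    n = len(ts) - 1                       # number of pieces
    N = 4 * n
    A = [[0.0] * N for _ in range(N)]
    b = [0.0] * N
    row = 0
    for i in range(n):                    # interpolation at both ends of piece i
        for t, y in ((ts[i], ys[i]), (ts[i + 1], ys[i + 1])):
            A[row][4 * i:4 * i + 4] = [1.0, t, t * t, t ** 3]
            b[row] = y
            row += 1
    for i in range(n - 1):                # C1 and C2 continuity at interior nodes
        t = ts[i + 1]
        A[row][4 * i:4 * i + 4] = [0.0, 1.0, 2 * t, 3 * t * t]
        A[row][4 * (i + 1):4 * (i + 2)] = [0.0, -1.0, -2 * t, -3 * t * t]
        row += 1
        A[row][4 * i:4 * i + 4] = [0.0, 0.0, 2.0, 6 * t]
        A[row][4 * (i + 1):4 * (i + 2)] = [0.0, 0.0, -2.0, -6 * t]
        row += 1
    A[row][0:4] = [0.0, 0.0, 2.0, 6 * ts[0]]                     # natural at t0
    A[row + 1][4 * (n - 1):4 * n] = [0.0, 0.0, 2.0, 6 * ts[-1]]  # natural at tn
    return gauss_solve(A, b)

ts = [-3.0, -1.0, 5.0, 9.0]
ys = [-2.0, 1.0, -3.0, 7.0]
coef = natural_cubic_spline(ts, ys)

# Arc length: trapezoid rule on sqrt(1 + phi_i'(t)^2), piece by piece.
L = 0.0
for i in range(len(ts) - 1):
    a4 = coef[4 * i:4 * i + 4]
    dphi = lambda t, a=a4: a[1] + 2.0 * a[2] * t + 3.0 * a[3] * t * t
    m = 2000
    h = (ts[i + 1] - ts[i]) / m
    vals = [math.sqrt(1.0 + dphi(ts[i] + k * h) ** 2) for k in range(m + 1)]
    L += h * (0.5 * vals[0] + sum(vals[1:m]) + 0.5 * vals[m])
```

Since each piece passes through its end points, the arc length of the spline cannot be shorter than the sum of the chord lengths between consecutive data points (about 21.59 here).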


Chapter 5

Initial Value Problems

“Fear is the path to the dark side. Fear leads to anger. Anger leads to hate. Hate leads to suffering. I sense much fear in you.”

− Yoda, The Phantom Menace

Let’s start by thinking about the following:

If there were only two types of creature on the planet (zombies and their food), how would you go about modeling the population of each type?

Oftentimes the problem of interest has a value that is changing with respect to some other value (possibly time). Problems of this nature have the form:

∂u/∂t = f(u)   and   u(0) = specified value    (5.0.1)

Predator-Prey models are of this form. Let u = (u(t), v(t)); then:

u′ = f(u)

where

u′ = (du/dt, dv/dt)^T = (a(v + b)u, c(u + d)v)^T = f(u)

where u and v represent the populations of prey and predators (zombies) respectively over time, and the values of a, b, c, and d are fixed model parameters describing the interaction of the species. Note in order to determine a unique solution to the given system of ODEs described above we also need some information about the system at a specific time:

u(t0) = u0.

Note many systems can be written as a first order system, just like the Predator-Prey problem above.


Example:

Write the following system as an equivalent first-order system of ODE’s:

y‴ = y″ + t y.

Here we let

y1 = y′
y2 = y′1 = y″

and the system described above may be transformed to:

y′ = y1
y′1 = y2
y′2 = y2 + t y

A first order system!

The goal now is to find solutions to problems of the form stated above. That is, can we find the solution to a system of first order ordinary differential equations? Equations of the form of (5.0.1) are known as initial value problems.

You already know the basics!

Note that the system we are looking to solve is:

u′ = f(u)

with a given initial condition. Then

uk+1 − uk = ∫_{tk}^{tk+1} f(t, u(t)) dt

and we have

uk+1 = uk + ∫_{tk}^{tk+1} f(t, u(t)) dt

and we use a numerical quadrature rule to evaluate the integral.

5.1 Second Order Runge-Kutta Method

Derivation of the second order method described below may be found in the class text. The classic Runge-Kutta method of order 2 for solving (5.0.1) is given by:

K1 = h f(t, u)    (5.1.2)
K2 = h f(t + h, u + K1)    (5.1.3)
u(t + h) = u(t) + (1/2)(K1 + K2)    (5.1.4)

This is sometimes known as Heun’s method and is derived using trapezoid quadrature.


Class Activity

Consider using this 2nd order Runge-Kutta method to solve the following initial value problem:

du/dt = (2 − t)u,   u(2) = 1,

and compare your numerical answer with the exact solution:

u(t) = e^(−(1/2)(t−2)^2)

Note it may be helpful to think of the problem in two parts: one starting from time t = 2 and progressing forward in time, and a second part starting from the given initial condition and moving backward in time.
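The forward half of the activity can be sketched directly from (5.1.2)–(5.1.4) (step size and names are my own choices; the backward half works the same way with step −h):

```python
import math

def rk2_step(f, t, u, h):
    """One step of the classic second order Runge-Kutta (Heun) method."""
    K1 = h * f(t, u)
    K2 = h * f(t + h, u + K1)
    return u + 0.5 * (K1 + K2)

f = lambda t, u: (2.0 - t) * u                    # du/dt = (2 - t)u
exact = lambda t: math.exp(-0.5 * (t - 2.0) ** 2)

# March forward from the initial condition u(2) = 1 to t = 4.
h = 0.01
t, u = 2.0, 1.0
for _ in range(200):                              # 200 steps of size 0.01
    u = rk2_step(f, t, u, h)
    t += h
err = abs(u - exact(4.0))
# Marching backward to t = 0 is the same loop with step -h.
```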


Solving the Zombie Problem:

Let u = (u(t), v(t)) hold the populations of our predators, v (zombies), and their prey u (whatever zombies eat)1.

u′ = f(u)

where

u′ = (du/dt, dv/dt)^T = ((α1 − β1 v)u, (−α2 + β2 u)v)^T = f(u)

The parameters in the model are:

α1 = natural birth rate of prey in isolation
α2 = natural birth rate of predator in isolation
β1 = effect of interactions between the two populations
β2 = effect of interactions between the two populations

Note we need an initial value for each population to start the modeling process.

We can solve problems of the above type using Runge-Kutta methods.

uk+1 = uk + ∫_{tk}^{tk+1} f(t, u(t)) dt

Last class we looked at a classic second order Runge-Kutta method. Here we will consider a fourth order method.

Fourth Order Runge-Kutta Method

A more accurate Runge-Kutta method is the fourth order scheme, where the initial value problem of the form (5.0.1) is solved in the following manner:

u(t + h) = u(t) + (h/6)(K1 + 2K2 + 2K3 + K4)

where

K1 = f(t, u)
K2 = f(t + h/2, u + (h/2)K1)
K3 = f(t + h/2, u + (h/2)K2)
K4 = f(t + h, u + hK3)

1 This is the classic Lotka-Volterra model.


This scheme has error of order O(h^5) per step (O(h^4) globally), and requires only 4 function evaluations of f(t, u) per step, making it a very popular method for numerically solving ODEs.

Coding

Before we start coding, consider what is expected by the method:

• We are using a quadrature rule to evaluate the integral. Thus we need to evaluate a function. In this instance the function we have is a vector function with two unknowns, u and v.

• We also need a starting population value for both u and v.

• We need a time step size h.

• We need a stopping time in the future.

• We need information about the birth rates and interaction effects.

α1 = 1, α2 = 0.5, β1 = 0.1, and β2 = 0.02
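A sketch of the RK4 scheme applied to the system (all names are my own; the predator equation is taken as (−α2 + β2 u)v so that both listed birth rates appear). Starting the populations at the non-trivial stationary point u = α2/β2 = 25, v = α1/β1 = 10 makes f vanish, so the populations should not change:

```python
def rk4_step(f, t, u, h):
    """One fourth order Runge-Kutta step for a system; u is a list."""
    add = lambda a, b, s: [ai + s * bi for ai, bi in zip(a, b)]
    K1 = f(t, u)
    K2 = f(t + h / 2.0, add(u, K1, h / 2.0))
    K3 = f(t + h / 2.0, add(u, K2, h / 2.0))
    K4 = f(t + h, add(u, K3, h))
    return [ui + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
            for ui, k1, k2, k3, k4 in zip(u, K1, K2, K3, K4)]

# Suggested birth and interaction rates.
alpha1, alpha2, beta1, beta2 = 1.0, 0.5, 0.1, 0.02

def zombies(t, uv):
    u, v = uv  # prey u, predators (zombies) v
    return [(alpha1 - beta1 * v) * u, (-alpha2 + beta2 * u) * v]

# Non-trivial stationary point: u = alpha2/beta2 = 25, v = alpha1/beta1 = 10.
h, state = 0.01, [25.0, 10.0]
for k in range(1000):
    state = rk4_step(zombies, k * h, state, h)
# Starting at the stationary point, the populations do not change.
```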

Once the model is working:

By now you should all have a working code; let’s play with it for a moment.

• Using the suggested values for the birth and interaction rates, can you find any (non-trivial) points where the two populations are stationary? That is, setting the two populations to these initial values, the populations do not change. Note you do not need to run your code to examine this.

Try 25 prey and 10 predators.
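That suggestion can be checked by hand: setting du/dt = 0 and dv/dt = 0 with u, v ≠ 0 gives v = α1/β1 = 1/0.1 = 10 and u = α2/β2 = 0.5/0.02 = 25. A quick numerical confirmation, assuming the standard predator equation dv/dt = (−α2 + β2 u)v:

```python
a1, a2, b1, b2 = 1.0, 0.5, 0.1, 0.02
u, v = 25.0, 10.0                  # suggested stationary populations

du_dt = (a1 - b1 * v) * u          # prey growth rate at (25, 10)
dv_dt = (-a2 + b2 * u) * v         # predator growth rate at (25, 10)
print(du_dt, dv_dt)                # both are (numerically) zero
```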

• What happens if we change the model slightly so that the modeling system is given by²:

du/dt = (α1 − β1 v) u
dv/dt = (α2 − β2 (v/u)) v

• How do you interpret the phase-portrait for this second model?

² This is the Leslie-Gower model.


Listing 5.1: MATLAB Driver For Runge-Kutta Methods

% This is a simple Runge-Kutta driver for the test function:
%   du/dt = (2 - t)u
% with: u(2) = 1;
%
% Note the true solution is given as
%   u(t) = exp(-0.5*(t-2)^2)

clear all;
close all;

% To solve the ODE numerically we start with:
T = 0:0.1:4;
True = exp(-0.5*(T-2.0).^2);

plot(T,True);
hold;

% Compute an approximation using the Runge-Kutta method.
h = 0.01;
Start = 2.0;        % Comes from the initial condition.
EndValue = 4.0;
BEndValue = 0.0;
uold = 1.0;         % Comes from the initial condition.

% Moving forward
tcurrent = Start;
index = 1;

while (tcurrent < EndValue)
    % Advance the solution.
    u(index) = RungeKutta2(uold, tcurrent, h);
    % Update all the solutions and save index.
    uold = u(index);
    t(index) = tcurrent;
    tcurrent = tcurrent + h;
    index = index + 1;
end

% Moving backward
tcurrent = Start;
index = 1;
uold = 1.0;

while (tcurrent > BEndValue)
    % Advance the solution.
    uB(index) = RungeKutta2(uold, tcurrent, -h);
    % Update all the solutions and save index.
    uold = uB(index);
    tB(index) = tcurrent;
    tcurrent = tcurrent - h;
    index = index + 1;
end

plot(t,u,'r');
plot(tB,uB,'r');

Listing 5.2: MATLAB Runge-Kutta Method Order 2

function [unew] = RungeKutta2(uold, t, h)
%RungeKutta2 is a simple second order Runge-Kutta solver.
%   Solves the initial value problem
%   with du/dt = (2 - t)u

K1 = h*(2 - t)*uold;
K2 = h*(2 - (t+h))*(uold + K1);
unew = uold + 0.5*(K1 + K2);
end


Oftentimes the problem of interest has a value that is changing with respect to some other value (possibly time). Problems of this nature have the form:

∂u/∂t = f(t, u) and u(0) = specified value    (5.1.5)

Predator-prey models are of this form:

du/dt = a(v + b)u
dv/dt = c(u + d)v

where u and v represent the populations of a predator and a prey species respectively, and the values of a, b, c, and d are fixed model parameters describing the interaction of the species.

The goal with problems of the form stated above is to find the solution to a first order ordinary differential equation. Thus, we are looking for a function that satisfies the given differential relationship (potentially more than one simultaneously). Equations of the form of (5.1.5) are known as initial value problems. To solve problems of this type numerically, routines based on Taylor series expansions have been developed to propagate a solution forward from a known starting point. Derivation of the second order method described below may be found in the class text.

5.2 Second Order Runge-Kutta Method

The classic Runge-Kutta Method of Order 2 for solving (5.1.5) is given by:

K1 = h f(t, u)                        (5.2.6)
K2 = h f(t + h, u + K1)               (5.2.7)
u(t + h) = u(t) + (1/2)(K1 + K2)      (5.2.8)

Example

Consider using this second order Runge-Kutta method to solve the following initial value problem:

du/dt = (2 − t)u,  u(2) = 1,

and compare your numerical answer with the exact solution:

u(t) = e^{−(1/2)(t−2)²}
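A compact Python sketch of this example (the notes' own driver is written in MATLAB; the names here are illustrative). It marches (5.2.6)–(5.2.8) from u(2) = 1 to t = 4 and compares against the exact solution:

```python
import math

def f(t, u):
    return (2.0 - t) * u              # right-hand side of the test problem

def rk2_step(t, u, h):
    """One second-order Runge-Kutta step, eqs. (5.2.6)-(5.2.8)."""
    K1 = h * f(t, u)
    K2 = h * f(t + h, u + K1)
    return u + 0.5 * (K1 + K2)

h, n = 0.01, 200                      # 200 steps of size 0.01 from t = 2 to t = 4
u = 1.0                               # initial condition u(2) = 1
for k in range(n):
    u = rk2_step(2.0 + k * h, u, h)

exact = math.exp(-0.5 * (4.0 - 2.0) ** 2)   # true solution at t = 4
print(abs(u - exact))                        # small, consistent with O(h^2)
```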


5.3 Fourth Order Runge-Kutta Method

A more accurate Runge-Kutta method is the fourth order scheme, where an initial value problem of the form (5.1.5) is solved in the following manner:

u(t + h) = u(t) + (1/6)(K1 + 2K2 + 2K3 + K4)

where

K1 = h f(t, u)
K2 = h f(t + h/2, u + K1/2)
K3 = h f(t + h/2, u + K2/2)
K4 = h f(t + h, u + K3)

This scheme has a local truncation error of O(h^5) (global error O(h^4)), and requires only four function evaluations of f(t, u) per step, making it a very popular method for numerically solving ODEs.
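Repeating the previous example with the fourth-order step gives a feel for the accuracy gain; a Python sketch on the same test problem (illustrative names, not from the notes):

```python
import math

def f(t, u):
    return (2.0 - t) * u                  # du/dt = (2 - t)u, with u(2) = 1

def rk4_step(t, u, h):
    """One classic fourth-order Runge-Kutta step."""
    K1 = h * f(t, u)
    K2 = h * f(t + h / 2, u + K1 / 2)
    K3 = h * f(t + h / 2, u + K2 / 2)
    K4 = h * f(t + h, u + K3)
    return u + (K1 + 2 * K2 + 2 * K3 + K4) / 6

h, n = 0.01, 200                          # 200 steps from t = 2 to t = 4
u = 1.0
for k in range(n):
    u = rk4_step(2.0 + k * h, u, h)

exact = math.exp(-0.5 * (4.0 - 2.0) ** 2)
print(abs(u - exact))                     # far below the second-order error
```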


Chapter 6

The Heat Equation

“But then, Cap’n Crunch in a flake form would be suicidal madness; it would last about as long, when immersed in milk, as snowflakes sifting down into a deep fryer. No, the cereal engineers at General Mills had to find a shape that would minimize surface area, and, as some sort of compromise between the sphere that is dictated by Euclidean geometry and whatever sunken-treasure-related shapes that the cereal aestheticians were probably clamoring for, they came up with this hard-to-pin-down striated pillow formation.”

− Neal Stephenson, Cryptonomicon

Basic Notation

The heat equation is a fundamental partial differential equation that is used to describe how the temperature of a defined domain changes over time.

In order to correctly write the heat equation, note that the value of the temperature of the object at any specified point will depend not only on the observed location, but also on the time of the observation. Denoting a position in space by x where:

x = (x, y, z)^T

and time using the variable t we have the temperature function

u(t,x) for x ∈ Ω

where Ω is the domain of interest. This could be a beam, a room, or a part for an engine (I'll leave this to the reader's imagination for now).


In order to describe how the temperature u is changing with respect to space and time, we need a notation to describe changes in the temperature with respect to these different quantities (that is, to take derivatives). The mathematical notation for taking the derivative of a multi-variable function with respect to a specified variable is:

∂u(t,x)/∂x := derivative of u with respect to x

Here the derivative of u is taken with respect to the spatial variable x, treating all other variables (t, y, z) as constants. The partial derivative of u may be taken with respect to t, x, y, and z, and would be denoted as:

∂u/∂t, ∂u/∂x, ∂u/∂y, and ∂u/∂z

respectively. Note that since it has been established that u is a function of time and space, we may write u(t,x) simply as u.

In order to simplify notation, operators that combine the different spatial derivatives are often used. The gradient operator is defined as:

∇ := (∂/∂x, ∂/∂y, ∂/∂z)^T

This is the multi-dimensional equivalent of a first derivative, and is a vector.

Recalling the dot product vector operation, where two vectors

w = (w1, w2, w3)^T and v = (v1, v2, v3)^T

are ‘dotted’ with each other:

w · v = w1 v1 + w2 v2 + w3 v3

allows for the definition of other operators based on the gradient operator.

In order to define the ‘Heat Equation’, the divergence of a vector needs to be considered. The divergence operator, denoted by ‘div’ or ‘∇·’, is defined such that:

div w = ∇ · w = ∂w1/∂x + ∂w2/∂y + ∂w3/∂z.

Taking the divergence of the gradient yields the Laplace or Laplacian operator, denoted by ∆ or ∇ · ∇:

∆ = ∇ · ∇ = (∂/∂x, ∂/∂y, ∂/∂z)^T · (∂/∂x, ∂/∂y, ∂/∂z)^T = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²


Some texts use the notation ∇² for the Laplacian operator as well.

Armed with a bunch of new notation, we can now write down the Heat Equation, which models the change of temperature on a given domain with respect to time and space. Defining the temperature function u(t,x) we have:

∂u/∂t − c∆u = 0 on Ω

where c ∈ R is a diffusion coefficient. To complete the problem definition an initial conditionshould be given as well as a description of the boundary conditions.

Specifically, let's consider the problem in a single dimension. For instance, we may desire to model the temperature of a beam or wire that has an initial heat profile along its length. We can consider submerging the ends of the wire into an ice bath, which will keep them consistently at a temperature of 0°C. Let's also consider our wire to be of length 2π units, with an initial profile of sin(x).

The Heat Equation under these assumptions reduces to:

∂u/∂t − c ∂²u/∂x² = 0 for x ∈ [0, 2π]    governing PDE
u(t, 0) = 0                               boundary condition
u(t, 2π) = 0                              boundary condition
u(0, x) = sin(x)                          initial condition

6.1 Numerical Solution

The goal now becomes to model the PDE description of the Heat Equation using a numerical implementation. To do this we will need a way to approximate the different differential operators, as well as a discrete approximation to the problem not only in space but also with respect to time!

6.1.1 Taylor's Theorem For Approximations

In order to get an approximation for the different differential operators that are involved in the PDE, we turn to Taylor's theorem. Consider the following two Taylor series expansions:

f(x + h) = f(x) + f′(x)h + (1/2)f′′(x)h² + (1/6)f′′′(x)h³ + O(h⁴)    (6.1.1)
f(x − h) = f(x) − f′(x)h + (1/2)f′′(x)h² − (1/6)f′′′(x)h³ + O(h⁴)    (6.1.2)


By adding expression (6.1.1) to (6.1.2) we obtain:

f(x + h) + f(x − h) = 2f(x) + f′′(x)h² + O(h⁴)    (6.1.3)

Solving (6.1.3) for f′′(x), an approximation of the second derivative of f(x) is found using nearby points a distance h on either side of x. Specifically,

f′′(x) = [f(x − h) − 2f(x) + f(x + h)]/h² + O(h²).    (6.1.4)
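Expression (6.1.4) is easy to sanity-check numerically: since the error is O(h²), halving h should cut it by roughly a factor of four. A small Python check using f(x) = sin(x) (an illustrative choice):

```python
import math

def second_derivative(f, x, h):
    """Second-order centered difference approximation of f''(x), eq. (6.1.4)."""
    return (f(x - h) - 2.0 * f(x) + f(x + h)) / h**2

x = 1.0
exact = -math.sin(x)                       # f''(x) for f = sin

err_h  = abs(second_derivative(math.sin, x, 0.1)  - exact)
err_h2 = abs(second_derivative(math.sin, x, 0.05) - exact)
print(err_h / err_h2)                      # close to 4, confirming O(h^2)
```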

Expression (6.1.4) is called a second order centered difference approximation of f′′(x). The approximation uses only values of the function f to approximate the second derivative. This is especially useful, as an analytic formula for f may not always be known.

6.1.2 Discretizing

In order to approximate the true solution u(t, x) for the temperature of the wire, a discrete solution is considered. A grid of points is placed on the problem domain Ω, allowing an approximate solution to the value of u(t, x) to be considered at these discrete points. Specifically we can consider:

[Figure: a one-dimensional grid of m points x1, x2, x3, ..., xi−1, xi, xi+1, ..., xm with uniform spacing h.]

The domain has been divided into m − 1 equal length segments using m discrete points. The length of these segments will be denoted h, with

h = (2π − 0)/(m − 1)

and the approximate solution will be obtained at the m points in the spatial domain

xi = h × (i − 1) for i ∈ {1, 2, . . . , m}.

This gives x1 = 0 and xm = 2π.

Let's denote our discrete approximation to the temperature as:

uh(t, x).

Note we will also need to look at the problem using a discrete time step too. If we consider some final time of interest T, a discrete time step can be defined as:

∆t = T/k


where ‘k’ is the total number of time steps to be taken during the simulation. By defining

tn := n × ∆t

we can use the following notation to describe the approximation of the temperature:

uh(tn, xi) = u^n_i.

Explicit Time Stepping

Using a finite difference for the temporal derivative and temporally lagging the derived centered difference formula for the second derivative yields the following discrete expression for the heat equation:

(u^{n+1}_i − u^n_i)/∆t − c (u^n_{i−1} − 2u^n_i + u^n_{i+1})/h² = 0

Note if we solve the given expression for u^{n+1}_i we have:

u^{n+1}_i = (c∆t/h²) u^n_{i−1} + (1 − 2c∆t/h²) u^n_i + (c∆t/h²) u^n_{i+1}.

Considering this for all values of i allows a matrix system to be written that advances the solution from tn to time tn+1. Here we have:

A u^n_h = u^{n+1}_h

where the entries in A are given by

a_{i,i} = 1 − 2c∆t/h², and a_{i,i−1} = a_{i,i+1} = c∆t/h²  for all i ∈ {2, 3, . . . , m − 1}

Leaving the non-temporal derivative terms evaluated at time n in the method described above is known as a Forward Euler or explicit time-stepping technique. The Forward Euler scheme for the heat equation has a time step restriction (for diffusion coefficient c) such that:

∆t ≤ h²/(2c)

This is known as the CFL condition or stability condition. The explicit Forward Euler time stepping scheme is unstable when this condition is not satisfied. Larger time steps may be taken provided the numerical method is modified.
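A minimal Python sketch of the explicit scheme on the model problem above, assuming c = 1 and u(0,x) = sin(x) so that the exact solution u(t,x) = e^{−t} sin(x) is available for checking (the grid size and step count here are illustrative):

```python
import numpy as np

c = 1.0
m = 41                                   # grid points on [0, 2*pi]
h = 2.0 * np.pi / (m - 1)
x = np.linspace(0.0, 2.0 * np.pi, m)

dt = 0.4 * h**2 / c                      # respects the restriction dt <= h^2/(2c)
steps = 50
u = np.sin(x)                            # initial condition

for n in range(steps):
    unew = u.copy()
    # Interior update: Forward Euler in time, centered difference in space.
    unew[1:-1] = u[1:-1] + c * dt / h**2 * (u[:-2] - 2.0 * u[1:-1] + u[2:])
    unew[0] = unew[-1] = 0.0             # Dirichlet boundary conditions
    u = unew

T = steps * dt
error = np.max(np.abs(u - np.exp(-T) * np.sin(x)))
print(error)                             # small: the scheme tracks e^{-t} sin(x)
```

Pushing dt above h²/(2c) in this sketch makes the computed values blow up, which is the instability the CFL condition warns about.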


Boundary Conditions

For the first and last rows we set the values of a1,1 and am,m equal to 1, as they are determined by our boundary condition. The values of u are known on the boundary for all time t ∈ [0, T]. Boundary conditions for PDE systems where the values are set to known or specified values on the boundary are called Dirichlet Boundary Conditions.

• Note that the values in the system form a tri-diagonal matrix system.

• Note that to advance the solution from one discrete time step to the next we only need to do a matrix multiply. This ease of solving the system comes at a cost! The solution's advancement suffers from small time step restrictions. This is typical for explicit time stepping schemes like the Forward Euler technique described here.

6.2 Implicit Time Stepping

To improve on the time-step restriction of our numerical method we can consider discretizing the Heat Equation as:

(u^{n+1}_i − u^n_i)/∆t − c (u^{n+1}_{i−1} − 2u^{n+1}_i + u^{n+1}_{i+1})/h² = 0    (6.2.5)

The finite difference approximation of the second derivative in (6.2.5) is now considered at the current time-step instead of at the known lagged time step. Finding the value of u^{n+1}_h now requires solving a system instead of doing a matrix multiply. The scheme in (6.2.5) is a Backward Euler or implicit time stepping scheme and has a larger range of stable time steps, with

∆t ≈ h

Rearranging the terms in (6.2.5) we can set up the following matrix system for advancing the solution temporally:

u^{n+1}_i − (c∆t/h²)(u^{n+1}_{i−1} − 2u^{n+1}_i + u^{n+1}_{i+1}) = u^n_i

Here we can define:

γ = c∆t/h²

This gives:

−γ u^{n+1}_{i−1} + (1 + 2γ) u^{n+1}_i − γ u^{n+1}_{i+1} = u^n_i

Thus,

A u^{n+1}_h = u^n_h

with

a_{i,i−1} = −γ, a_{i,i} = 1 + 2γ, and a_{i,i+1} = −γ.


Make note that the Dirichlet boundary conditions here may be set in the same manner as with the explicit time stepping method already discussed. Specifically,

a1,1 = 1 and am,m = 1.

Dirichlet boundary conditions, being known, may also be taken out of the system completely. This is done by adjusting the values on the right hand side vector (u^n_h in our current example).

6.2.1 Tri-Diagonal Systems

The matrix system created in order to solve (6.2.5) is a tri-diagonal or banded system, and can be solved readily using a tridiagonal solver.

Consider the system:

[ d1  c1                 ] [ x1 ]   [ b1 ]
[ a2  d2  c2             ] [ x2 ]   [ b2 ]
[     a3  d3  c3         ] [ x3 ] = [ b3 ]    (6.2.6)
[         ..  ..  ..     ] [ .. ]   [ .. ]
[             am  dm     ] [ xm ]   [ bm ]

The two general steps in solving the system can be thought of as:

1. Forward elimination (step 1): subtract a2/d1 times row 1 from row 2, creating a 0 in the a2 position. In general this modifies the system values as follows (for i ∈ {2, 3, . . . , m}):

   di = di − (ai/di−1) ci−1
   bi = bi − (ai/di−1) bi−1

2. Back substitution (step 2): in this portion of the solver the values of xi are computed, starting with xm = bm/dm, and in general as:

   xi = (bi − ci xi+1)/di
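The two steps above translate directly into code; here is a Python sketch (the notes' own MATLAB TriSolve follows below, and the helper name `tri_solve` is illustrative):

```python
import numpy as np

def tri_solve(a, d, c, b):
    """Solve a tridiagonal system via forward elimination and back substitution.
    a: sub-diagonal (a[0] unused), d: diagonal, c: super-diagonal
    (c[-1] unused), b: right-hand side. Works on copies; returns x."""
    d = d.astype(float)                    # astype makes copies, so the
    b = b.astype(float)                    # caller's arrays are untouched
    m = len(d)
    # Forward elimination (step 1)
    for i in range(1, m):
        mult = a[i] / d[i - 1]
        d[i] = d[i] - mult * c[i - 1]
        b[i] = b[i] - mult * b[i - 1]
    # Back substitution (step 2)
    x = np.zeros(m)
    x[-1] = b[-1] / d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = (b[i] - c[i] * x[i + 1]) / d[i]
    return x
```

With γ = 0.5, a diagonal of 1 + 2γ = 2 and off-diagonals of −γ = −0.5 reproduce the implicit heat matrix from the previous section.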

The following listing is a simple Tri-Diagonal solver.

Listing 6.1: A Tri-Diagonal Solver

function [x] = TriSolve( A, b )
%TriSolve A simple Tri-Diagonal Solver
% (John Chrispell 2/27/2013)
% This simple function takes a tridiagonal system
% and returns a solution vector x
% that satisfies Ax = b.

n = length(b(:,1));
% Forward Elimination
for i = 2:n
    xmult = A(i,i-1)/A(i-1,i-1);
    A(i,i) = A(i,i) - xmult*A(i-1,i);
    b(i,1) = b(i,1) - xmult*b(i-1,1);
end

% Back Substitution
x(n,1) = b(n,1)/A(n,n);
for i = (n-1):(-1):1
    x(i,1) = (b(i,1) - A(i,i+1)*x(i+1))/A(i,i);
end

Use the code above for a Tri-Diagonal solver to implement a Backward Euler time stepping scheme for the Heat Equation.
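One possible Python sketch of this exercise, assuming c = 1 and the model problem from earlier, and using a dense `numpy.linalg.solve` for brevity where the tridiagonal solver above would be used in practice (grid and step parameters are illustrative):

```python
import numpy as np

c = 1.0
m = 41
h = 2.0 * np.pi / (m - 1)
x = np.linspace(0.0, 2.0 * np.pi, m)

dt = h                                    # far larger than the explicit limit h^2/2
steps = 10
gamma = c * dt / h**2

# Assemble A with a_{i,i} = 1 + 2*gamma and a_{i,i+-1} = -gamma,
# then set the first and last rows to the identity for the Dirichlet conditions.
A = np.diag((1.0 + 2.0 * gamma) * np.ones(m)) \
  + np.diag(-gamma * np.ones(m - 1), -1) \
  + np.diag(-gamma * np.ones(m - 1), 1)
A[0, :] = 0.0;  A[0, 0] = 1.0
A[-1, :] = 0.0; A[-1, -1] = 1.0

u = np.sin(x)                             # initial condition
for n in range(steps):
    rhs = u.copy()
    rhs[0] = rhs[-1] = 0.0                # boundary values stay at zero
    u = np.linalg.solve(A, rhs)           # one implicit step: A u^{n+1} = u^n

T = steps * dt
error = np.max(np.abs(u - np.exp(-T) * np.sin(x)))
print(error)                              # stable, with O(dt) accuracy, despite dt >> h^2/2
```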

6.3 Order of Accuracy

For the different methods we have discussed, it is often necessary to confirm that the methods have been implemented correctly.

Consider adding a forcing function to the Heat Equation so that we can drive the solution to a known function. For example, if we desire the true solution to be:

u(t, x) = e^{−t} sin(x) cos(x)

then setting

f(t, x) = 4c e^{−t} sin(x) cos(x) − e^{−t} sin(x) cos(x)

and adding f as a right hand side forcing function in our governing PDE gives the system:

∂u/∂t − c ∂²u/∂x² = f for x ∈ [0, 2π]    governing PDE
u(t, 0) = 0                               boundary condition
u(t, 2π) = 0                              boundary condition
u(0, x) = sin(x) cos(x)                   initial condition

This allows us to know the true solution, and the error can be examined. Specifically, we can look at the norm of the error between the computed solution and the true solution for any computational grid. If the approximated solution were continuous in space and time, then the error between the true solution, denoted by u(t, x), and the approximated solution, uh(t, x),


would be written as:

error = ∫₀^T ∫₀^{2π} |u(t, x) − uh(t, x)| dx dt    (6.3.7)

Since the solution u(t, x) is being approximated using m discrete points with separation h, and advanced in time using k time steps of size ∆t, we can compute a discrete approximation to (6.3.7) as:

error_{h,∆t} = ( Σ_{j=1}^{k} ( Σ_{i=1}^{m} (u(tj, xi) − uh(tj, xi))² h ) ∆t )^{1/2}

where

tj := j × ∆t, and xi = (i − 1) × h.

Note that this discrete norm mimics the continuous error given by the integral expression in (6.3.7). With a method of computing the error in our approximation at hand, the order of accuracy of a computational method may be examined discretely.
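The discrete norm itself is a one-liner, and it has an easy sanity check: if the pointwise error is a constant δ, the double sum collapses to δ·sqrt(m·h·k·∆t). A Python sketch with illustrative values:

```python
import numpy as np

def discrete_error_norm(U, Uh, h, dt):
    """Discrete space-time norm of U - Uh for k-by-m arrays of samples,
    mimicking the continuous integral error expression."""
    diff = U - Uh
    # Inner sum over space (weight h), outer sum over time (weight dt).
    return np.sqrt(np.sum(np.sum(diff**2, axis=1) * h) * dt)

# Sanity check with a constant pointwise error delta:
k, m, h, dt, delta = 20, 50, 0.1, 0.05, 1e-3
U = np.random.rand(k, m)                       # stand-in "true" samples
Uh = U - delta                                 # approximation off by delta
norm = discrete_error_norm(U, Uh, h, dt)
print(norm, delta * np.sqrt(m * h * k * dt))   # the two values agree
```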


Chapter 7

Appendices


Bibliography

[1] R. Burden and J. Faires. Numerical Analysis. Brooks/Cole, Boston, ninth edition, 2011.

[2] W. Cheney and D. Kincaid. Numerical Mathematics and Computing. Brooks/Cole,Boston, seventh edition, 2012.

[3] M.T. Heath. Scientific Computing: An Introductory Survey, 2nd Edition. McGraw-Hill,New York, 2002.

69

Page 76: Notes: Introduction to Numerical Methodsjchrispe/MATH_250/NotesFull3.pdf · Notes: Introduction to Numerical Methods J.C. Chrispell Department of Mathematics Indiana University of

Index

Dirichlet Boundary Conditions, 24

Forward Euler, 24
full pivoting, 17

intermediate value theorem, 29

Laplace Operator, 20

partial pivoting, 17
predator-prey problems, 47

scientific notation, 10
