COSC 3361 Numerical Analysis I
Solution of nonlinear equations (II)
Edgar Gabriel, Fall 2005

COSC 3361 Numerical Analysis I - University of Houston


Summary of the last lecture

• Three methods presented for finding zeros of non-linear equations
  – Bisection method: linear convergence
  – Newton’s method: quadratic convergence
  – Secant method: convergence of order ~1.62

• All methods have their advantages and disadvantages
  – Number of iterations required until convergence
  – Number of function evaluations required per iteration (= speed)
  – Stability
  – Function derivatives required/not required

Summary of the last lecture (II)

• None of the methods took into account the structure of the provided functions (e.g. polynomials, trigonometric functions, exponential functions etc.)

Computing Roots of Polynomials

• A polynomial of the form

  $p(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_2 z^2 + a_1 z + a_0$    (1)

  – has degree n if $a_n \neq 0$
  – A polynomial of degree n has exactly n roots in the complex plane, it being agreed that each root shall be counted a number of times equal to its multiplicity

• Complex roots: $f(x) = x^2 + 1$ has no real roots; its two roots are $x = \pm i$

Polynomials

• If a polynomial p of degree n is divided by a linear factor $z - c$, the result is a quotient q and a remainder e

  $p(z) = (z - c)\,q(z) + e$

  – q is a polynomial of degree n-1
  – If c = r is a zero of p, then e = 0

• Applying such a division repeatedly leads to

  $p(z) = (z - r_1)(z - r_2) \dots (z - r_n)\,q_n$

  – with $q_n$ being a constant

Horner’s Algorithm

• also called nested multiplication or synthetic division

• If a polynomial p and a (complex) number $z_0$ are given, Horner’s algorithm will produce the number $p(z_0)$ and the polynomial

  $q(z) = \frac{p(z) - p(z_0)}{z - z_0}$    (2)

• From (2) we can derive

  $p(z) = (z - z_0)\,q(z) + p(z_0)$    (3)

• Let the unknown polynomial q be represented by

  $q(z) = b_0 + b_1 z + \dots + b_{n-1} z^{n-1}$    (4)

Horner’s Algorithm (II)

• Substitute (4) and (1) in (3)

  $p(z) = (z - z_0)\,q(z) + p(z_0)$

  $a_n z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0 = (z - z_0)(b_{n-1} z^{n-1} + \dots + b_1 z + b_0) + p(z_0)$

• Comparison of the coefficients of like powers of z leads to

  $b_{n-1} = a_n$
  $b_{n-2} = a_{n-1} + z_0 b_{n-1}$
  …
  $b_0 = a_1 + z_0 b_1$
  $p(z_0) = a_0 + z_0 b_0$

Horner’s Algorithm (III)

• The algorithm in compact form

input n, a(k) for 0 <= k <= n, z0
b(n-1) = a(n)
for k = n-1 : 0 step -1
    b(k-1) = a(k) + z0*b(k)
end

On exit, b(0), …, b(n-1) are the coefficients of q and b(-1) holds p(z0).
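The recurrence for the coefficients b_k can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `horner_divide` and the convention of storing coefficients lowest power first (`a[k]` is $a_k$) are assumptions made for the example.

```python
def horner_divide(a, z0):
    """Divide p(z) = a[0] + a[1]*z + ... + a[n]*z**n by (z - z0).

    Returns (b, r): b holds the quotient coefficients b[0..n-1]
    (lowest power first) and r = p(z0) is the remainder.
    """
    n = len(a) - 1
    if n == 0:                      # constant polynomial: empty quotient
        return [], a[0]
    b = [0] * n
    b[n - 1] = a[n]                 # b_{n-1} = a_n
    for k in range(n - 1, 0, -1):   # b_{k-1} = a_k + z0 * b_k
        b[k - 1] = a[k] + z0 * b[k]
    r = a[0] + z0 * b[0]            # p(z0) = a_0 + z0 * b_0
    return b, r


# p(z) = z^4 - 4z^3 + 7z^2 - 5z - 2, coefficients a_0 .. a_4
quotient, remainder = horner_divide([-2, -5, 7, -4, 1], 3)
print(quotient, remainder)
```

The remainder is kept in a separate variable instead of an entry b(-1), which avoids a negative index.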

Example

• Use Horner’s algorithm to evaluate p(3) with p being

  $p(z) = z^4 - 4z^3 + 7z^2 - 5z - 2$

         1   -4    7   -5   -2
    3         3   -3   12   21
         ----------------------
         1   -1    4    7   19

• Thus p(3) = 19 and

  $p(z) = (z - 3)(z^3 - z^2 + 4z + 7) + 19$

Taylor expansion with Horner’s algorithm

• Suppose we are looking for the coefficients $c_k$ of

  $p(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_2 z^2 + a_1 z + a_0$
  $\phantom{p(z)} = c_n (z - z_0)^n + c_{n-1} (z - z_0)^{n-1} + \dots + c_0$

• Elements could be calculated by the usual, inefficient formula

  $c_k = \frac{p^{(k)}(z_0)}{k!}$

Taylor expansion with Horner’s algorithm

but:

  $c_0 = p(z_0)$

and because of

  $q(z) = \frac{p(z) - p(z_0)}{z - z_0} = c_n (z - z_0)^{n-1} + c_{n-1} (z - z_0)^{n-2} + \dots + c_1$

you have

  $c_1 = q(z_0)$

• This process is repeated until all coefficients $c_k$ are found

Example

• Find the Taylor expansion at $z_0 = 3$ for the polynomial of the previous example

         1   -4    7   -5   -2
    3         3   -3   12   21
         1   -1    4    7 | 19
    3         3    6   30
         1    2   10 | 37
    3         3   15
         1    5 | 25
    3         3
         1 |  8

  $p(z) = (z - 3)^4 + 8(z - 3)^3 + 25(z - 3)^2 + 37(z - 3) + 19$
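The table can be reproduced by applying synthetic division repeatedly to the successive quotients. The following Python sketch is an illustration only; the function name `taylor_coefficients` and the lowest-power-first input convention are assumptions, not part of the slides.

```python
def taylor_coefficients(a, z0):
    """Return [c_0, ..., c_n] with p(z) = sum of c_k * (z - z0)**k,
    given a = [a_0, ..., a_n] (lowest power first)."""
    work = list(a)[::-1]            # highest power first, as in the table
    c = []
    while work:
        # one row of the table: synthetic division by (z - z0)
        for k in range(1, len(work)):
            work[k] += z0 * work[k - 1]
        # the last entry is the remainder, i.e. the next coefficient c_k;
        # what is left in work is the quotient for the next round
        c.append(work.pop())
    return c


print(taylor_coefficients([-2, -5, 7, -4, 1], 3))
```

For the example polynomial the returned list corresponds to the boxed values of the table, read from top to bottom, followed by the leading coefficient.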

Taylor expansion with polynomials

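• Please note: $c_0 = p(z_0)$ and $c_1 = q(z_0)$; by the formula for $c_k$, $c_1 = p'(z_0)$, so a second Horner recursion delivers the derivative as well

function horner(n, a(k) for 0 <= k <= n, z0)
    alpha = a(n)
    beta = 0
    for k = n-1 : 0 step -1
        beta = alpha + z0*beta
        alpha = a(k) + z0*alpha
    end
    return alpha, beta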
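A direct Python transcription of the pseudocode above might look like this. The only assumption beyond the slide is the calling convention: the degree n is taken from the length of the coefficient list, with `a[k]` holding $a_k$.

```python
def horner(a, z0):
    """Return (p(z0), p'(z0)) for p(z) = a[0] + a[1]*z + ... + a[n]*z**n."""
    alpha = a[-1]                   # alpha_n = a_n
    beta = 0                        # beta_n = 0
    for k in range(len(a) - 2, -1, -1):
        beta = alpha + z0 * beta    # must use the old alpha, so update beta first
        alpha = a[k] + z0 * alpha
    return alpha, beta


# p(z) = z^4 - 4z^3 + 7z^2 - 5z - 2 at z0 = 3
print(horner([-2, -5, 7, -4, 1], 3))
```

The order of the two updates matters: swapping them would feed the new alpha into beta and give a wrong derivative.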

Theorem on Horner’s Method

• Let $p(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_2 z^2 + a_1 z + a_0$. Define pairs $(\alpha_i, \beta_i)$ for i = n, n-1, …, 0 by the algorithm

  $(\alpha_n, \beta_n) = (a_n, 0)$
  $(\alpha_j, \beta_j) = (a_j + z\,\alpha_{j+1},\ \alpha_{j+1} + z\,\beta_{j+1})$    $(n-1 \ge j \ge 0)$

  Then $\alpha_0 = p(z)$ and $\beta_0 = p'(z)$

• Easy to verify that the theorem holds for n=0, since $p(z) = a_0$ and $p'(z) = 0$

• Easy to verify for n=1: $p(z) = a_1 z + a_0$

  $(\alpha_0, \beta_0) = (a_0 + z\,\alpha_1,\ \alpha_1 + z\,\beta_1) = (a_0 + z\,a_1,\ a_1) = (p(z), p'(z))$

Theorem on Horner’s Method (II)

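$p(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_2 z^2 + a_1 z + a_0$
$\phantom{p(z)} = a_0 + z\,(a_n z^{n-1} + a_{n-1} z^{n-2} + \dots + a_2 z + a_1)$
$\phantom{p(z)} = a_0 + z\,q(z)$

$p'(z) = z\,q'(z) + q(z)$

Suppose the theorem holds for polynomials of degree n-1, in particular for q, whose coefficients are $a_1, \dots, a_n$. The recursion then yields $(\alpha_1, \beta_1) = (q(z), q'(z))$, and for degree n you have to add one more step:

$(\alpha_0, \beta_0) = (a_0 + z\,\alpha_1,\ \alpha_1 + z\,\beta_1)$
$\phantom{(\alpha_0, \beta_0)} = (a_0 + z\,q(z),\ q(z) + z\,q'(z))$
$\phantom{(\alpha_0, \beta_0)} = (p(z), p'(z))$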

Newton iterations

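For a general function f:

  $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

For a polynomial p, with $c_0$ and $c_1$ the first two Taylor coefficients of p at $x_n$:

  $x_{n+1} = x_n - \frac{p(x_n)}{p'(x_n)} = x_n - \frac{c_0}{c_1}$

input n, a(k) for 0 <= k <= n, z0, maxit, precision

for j = 1 : maxit
    [alpha, beta] = horner(n, a, z0)
    z1 = z0 - alpha/beta
    if ( |z1-z0| < precision ) stop
    z0 = z1
end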
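The iteration above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `newton_poly`, `maxit` and `precision` follow the pseudocode, while the default values, the guard against a vanishing derivative and the error raised on non-convergence are additions that the slide does not specify.

```python
def horner(a, z0):
    """Return (p(z0), p'(z0)) for coefficients a[0..n], lowest power first."""
    alpha, beta = a[-1], 0
    for k in range(len(a) - 2, -1, -1):
        beta = alpha + z0 * beta
        alpha = a[k] + z0 * alpha
    return alpha, beta


def newton_poly(a, z0, maxit=50, precision=1e-12):
    """Newton iteration for a root of p, starting at z0 (real or complex)."""
    for _ in range(maxit):
        alpha, beta = horner(a, z0)
        if beta == 0:               # added guard, not in the pseudocode
            raise ZeroDivisionError("p'(z) vanished; choose another start value")
        z1 = z0 - alpha / beta
        if abs(z1 - z0) < precision:
            return z1
        z0 = z1
    raise RuntimeError("no convergence within maxit iterations")


# p(z) = z^2 - 2, started at 1.0: the iteration approaches sqrt(2)
print(newton_poly([-2, 0, 1], 1.0))
```

Because Horner's algorithm also works for complex arguments, a complex start value such as `0.5 + 0.5j` lets the same function find complex roots, for example of $z^2 + 1$.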

Hardware issues

• The performance of a numerical algorithm depends not only on the algorithm itself, but also on its implementation

• Some hardware issues have to be understood in order to design and efficiently implement numerical algorithms
  – pipelining concept
  – caches

Pipelining

• Splits a single (expensive) operation into several (cheap) sub-operations

• Each of the sub-operations can be executed in parallel

[Figure: two timelines over t with ticks at 4, 8 and 12 clocks. Top: non-pipelined execution of an operation consisting of four sub-operations for three elements. Bottom: pipelined execution of the same operation for three elements.]
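The two timelines can be checked with a small calculation. The sketch below is an illustration in Python and assumes that every sub-operation takes exactly one clock and that the pipeline never stalls; the function names are made up for this example.

```python
def clocks_non_pipelined(m, n):
    """Each of the n elements passes through all m sub-operations in turn."""
    return m * n


def clocks_pipelined(m, n):
    """The first result needs m clocks, every further one a single clock."""
    return m + (n - 1)


# four sub-operations, three elements, as in the figure
print(clocks_non_pipelined(4, 3), clocks_pipelined(4, 3))
```

With four sub-operations and three elements this gives 12 clocks without pipelining and 6 clocks with it.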

Pipelining (II)

• Example of a (fictitious) pipelined addition

[Figure: the pipeline stages in order: comparison of leading sign, comparison of exponents, alignment shift, addition, normalization. Data: $10^8 + 1$, i.e. .100000e9 + .10000e1; result with six mantissa digits: .100000e9 = $10^8$.]

Pipelining – Metrics (I)

$T_c$: clock time, time to finish one segment/sub-operation
$m$: number of stages of the pipeline
$n$: length of the vector
$S$: startup time in clocks, time after which the first result is available, $S = m$
$N_{1/2}$: length of the loop to achieve half of the maximum speed

Assuming a simple loop like:

for (i=0; i<n; i++) {
    a[i] = b[i] + c[i];
}

Pipelining – Metrics (II)

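$op$: number of operations per loop iteration
$op_{total}$: total number of operations for the loop, with $op_{total} = op \cdot n$

Speed of the loop is

$F = \frac{op_{total}}{time} = \frac{op \cdot n}{T_c\,(m + (n - 1))} = \frac{op}{T_c} \cdot \frac{1}{\frac{m-1}{n} + 1}$

For $n \to \infty$ we get

$F_{max} = \frac{op}{T_c}$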

Pipelining – Metrics (III)

Because of the definition of $N_{1/2}$ we now get

$\frac{op}{T_c} \cdot \frac{1}{\frac{m-1}{N_{1/2}} + 1} = \frac{1}{2} F_{max} = \frac{1}{2} \cdot \frac{op}{T_c}$

or

$\frac{m-1}{N_{1/2}} + 1 = 2$

and

$N_{1/2} = m - 1$

→ the length of the loop required to achieve half of the theoretical peak performance of a pipeline is roughly equal to the number of segments (stages) of the pipeline

Pipelining – Metrics (IV)

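More general: $N_\alpha$ is defined through

$\frac{op}{T_c} \cdot \frac{1}{\frac{m-1}{N_\alpha} + 1} = \alpha \cdot \frac{op}{T_c}$

and leads to

$N_\alpha = \frac{m - 1}{\frac{1}{\alpha} - 1} \approx m \cdot \frac{1}{\frac{1}{\alpha} - 1}$

E.g. for $\alpha = \frac{3}{4}$ you get

$N_{3/4} \approx 3\,m$

→ the closer you would like to get to the maximum performance of your pipeline, the larger the iteration count of your loop has to be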
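The pipeline metrics can be evaluated numerically with a few lines of Python. This is a sketch for illustration only; the function names `loop_speed` and `n_alpha`, the default values op = 1 and T_c = 1, and the five-stage pipeline in the usage lines are assumptions chosen for the example.

```python
def loop_speed(n, m, op=1, t_c=1.0):
    """F = op * n / (T_c * (m + n - 1)) for a loop of length n."""
    return op * n / (t_c * (m + n - 1))


def n_alpha(alpha, m):
    """Loop length needed to reach the fraction alpha of F_max.

    (m - 1) / (1/alpha - 1), rewritten as (m - 1) * alpha / (1 - alpha).
    """
    return (m - 1) * alpha / (1.0 - alpha)


m = 5                                   # hypothetical five-stage pipeline
for alpha in (0.5, 0.75, 0.9):
    n = n_alpha(alpha, m)
    print(alpha, n, loop_speed(n, m))   # the speed equals alpha * F_max
```

Here F_max = op / T_c = 1, so the printed speed should reproduce alpha itself, which is a quick consistency check on the formula for $N_\alpha$.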

The memory bottleneck (I)

for (i=0; i<n; i++ ) {
    c[i] = a[i] + b[i];
}

• Every loop iteration requires 3 memory operations
  – 2 loads
  – 1 store

• For a micro-processor having a frequency of 2 GHz this would require

  $3 \cdot 4\ \text{Bytes} \cdot 2 \cdot 10^9\ \text{s}^{-1} = 24\ \text{GBytes/s}$

  to satisfy one Floating Point Unit (FPU)

• Most modern processors have 2 FPUs and 2 IUs which can work in parallel
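The bandwidth estimate is plain arithmetic and can be written as a small Python helper. The sketch assumes, as the slide implicitly does, that the FPU completes one loop iteration per clock and that each operand is 4 bytes wide; the function name is made up for this example.

```python
def required_bandwidth(mem_ops_per_iter, bytes_per_word, frequency_hz):
    """Bytes per second needed if one loop iteration completes every clock."""
    return mem_ops_per_iter * bytes_per_word * frequency_hz


# 2 loads + 1 store of 4-byte values on a 2 GHz processor
print(required_bandwidth(3, 4, 2e9) / 1e9, "GBytes/s")
```

Changing `bytes_per_word` to 8 shows how the requirement doubles for double-precision data.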

Memory technology (www.kingston.com/newtech)

• DDR: Double Data Rate SDRAM

• Bandwidth of a memory module

  $SB_{max} = SB_{BUS} \cdot f_{BUS} \cdot \text{Op/Cycle}$

  with
  $SB_{max}$: max. memory bandwidth
  $SB_{BUS}$: bandwidth of the memory bus (64 Bit = 8 Bytes)
  $f_{BUS}$: frequency of the memory bus
  Op/Cycle: transfers per bus cycle (1 for SDRAM, 2 for DDR)

Memory bandwidth

Name           Frequency of memory bus (MHz)   max. bandwidth
PC100 SDRAM    100                             800 MB/s
PC133 SDRAM    133                             1.1 GB/s
PC1600 DDR     100                             1.6 GB/s
PC2100 DDR     133                             2.1 GB/s
PC2700 DDR     166                             2.7 GB/s
PC3200 DDR     200                             3.2 GB/s
PC3700 DDR     233                             3.7 GB/s
PC4200 DDR     266                             4.2 GB/s
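The table entries follow from the product of bus width, bus frequency and operations per cycle. The Python sketch below recomputes a few of them; it assumes an 8-byte bus and one transfer per cycle for SDRAM versus two for DDR, which is what the table values suggest. The function name is illustrative.

```python
def max_bandwidth_mb_s(f_bus_mhz, ops_per_cycle, bus_bytes=8):
    """Maximum bandwidth in MB/s: bus width * bus frequency * ops per cycle."""
    return bus_bytes * f_bus_mhz * ops_per_cycle


modules = [
    ("PC100 SDRAM", 100, 1),   # single data rate: 1 transfer per cycle
    ("PC1600 DDR", 100, 2),    # double data rate: 2 transfers per cycle
    ("PC3200 DDR", 200, 2),
]
for name, f_mhz, ops in modules:
    print(name, max_bandwidth_mb_s(f_mhz, ops), "MB/s")
```

For the DDR modules the computed MB/s value matches the number in the module name.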

Memory modules (cont.)

• Dual Channel Memory: 2 I/O channels between memory controller and memory module

• DDR2: further evolution of the DDR technology
  – uses 1.8 Volts vs. 2.5 Volts technology
  – larger capacity of the chips
  – higher frequency

Name       Frequency of memory bus   Bandwidth of a module   Dual Channel DDR2 bandwidth
PC2-3200   400 MHz                   3.2 GB/s                6.4 GB/s
PC2-4200   533 MHz                   4.2 GB/s                8.4 GB/s
PC2-5300   667 MHz                   5.3 GB/s                10.6 GB/s
PC2-6400   800 MHz                   6.4 GB/s                12.8 GB/s

Memory interleaving

• Split the main memory into several physical areas (banks)

• Each area can serve a memory request without blocking the other ones
  – several memory requests can be interleaved as long as they are using different memory banks

• A PC has between 1 and 4 memory banks

• (A vector supercomputer, e.g. NEC SX-8, has 8192 memory banks)

Memory hierarchies

Level                          Size           Access time [cycles]
Register                       < 256 Words    1 - 2
Caches                         ~ 1 MB         2 - 50
Main memory                    ~ 1 GB         100 - 1000
Primary data storage (disk)    ~ 100 GB       > 10^6
Backup (tape)                  TB, PB

Caches

• Fast memory which is closer to the CPU/FPU
• significantly smaller than the main memory
• often organized in several hierarchy levels
  – level 1 cache
  – level 2 cache
  – …

• The closer a level is to the CPU, the faster and smaller it is

• Reason for not having only fast memory (=cache): money, money, money…

Caches (II)

• Caches are organized in cache-lines
  – e.g. on a PC a cache-line is typically 64 bytes

• cache hit: the data which the processor is asking for is already in the cache

• cache miss: the data which the processor is asking for is not in the cache yet
  – a well-performing code needs a high ratio of cache hits to cache misses
  – compilers/processors try to circumvent cache misses through techniques like pre-fetching etc.

Caches (III)

• If a data element has to be loaded into the cache first, the whole cache-line is loaded
  – more than the processor asked for
  – the processor better uses this data, else the load of the whole cache-line has been wasted!

• Caches are organized internally either as
  – direct mapped cache
  – n-way associative cache
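How much of each loaded cache-line is actually used depends on the access pattern. The Python sketch below counts the cache-lines touched when an array is traversed with a given stride. It is an illustration only and assumes 8-byte elements, 64-byte cache-lines and an array that starts on a cache-line boundary; the function name is made up.

```python
def cache_lines_touched(n_elements, stride, element_size=8, line_size=64):
    """Number of distinct cache-lines touched by n_elements strided accesses."""
    lines = {(i * stride * element_size) // line_size
             for i in range(n_elements)}
    return len(lines)


# 64 accesses each: contiguous versus one element per cache-line
print(cache_lines_touched(64, 1), cache_lines_touched(64, 8))
```

With stride 1, eight consecutive elements share one cache-line; with stride 8, every access pulls in a new line of which only one element is used.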

Direct mapped cache

• The cache is split into chunks of the length of a cache-line

• Each address in the memory can uniquely be mapped into a block of the cache

• Problem with direct mapped cache: two addresses which map onto the same cache-block can cause consecutive cache misses

[Figure: cache blocks starting at addresses 0xffff8e10, 0xffff8e50, …, 0xffff8f90]
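The conflict problem can be made visible with a tiny simulation. The following Python sketch models a direct mapped cache with 8 blocks of 64 bytes; both sizes, the function name and the choice of test addresses are assumptions for illustration, not values from the slides.

```python
def direct_mapped_misses(addresses, n_blocks=8, line_size=64):
    """Count cache misses for a sequence of byte addresses."""
    cache = [None] * n_blocks        # tag currently stored in each block
    misses = 0
    for addr in addresses:
        line = addr // line_size     # which cache-line of memory
        index = line % n_blocks      # the one block this line may occupy
        tag = line // n_blocks       # distinguishes lines sharing a block
        if cache[index] != tag:
            misses += 1
            cache[index] = tag       # evict whatever was there
    return misses


addr_a = 0xffff8e10
addr_b = addr_a + 8 * 64             # same block index, different tag
print(direct_mapped_misses([addr_a, addr_b] * 4))       # alternating conflict
print(direct_mapped_misses([addr_a, addr_a + 64] * 4))  # neighbouring lines
```

In the first call the two addresses keep evicting each other, so every access misses; in the second call only the first access to each line misses.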

n-way associative cache

• The cache is split into chunks of the length of a cache-line
• Each chunk is replicated n times
  – n is typically 2 or 4

• The problem of the direct mapped cache is solved

• Algorithms for determining which entry of a cache-block has to be replaced
  – random replacement
  – least used replacement (least frequently used)
  – longest not touched replacement (least recently used, LRU)

[Figure: cache blocks starting at addresses 0xffff8e10, 0xffff8e50, …, 0xffff8f90, each replicated n times]
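A small simulation shows how associativity removes the conflict between two addresses that map to the same block. The Python sketch below uses the "longest not touched" (LRU) replacement policy; the cache geometry (8 sets, 64-byte lines), the function name and the test addresses are assumptions for illustration.

```python
from collections import OrderedDict


def set_associative_misses(addresses, n_sets=8, ways=2, line_size=64):
    """Count misses of an n-way associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(n_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        entries = sets[line % n_sets]
        tag = line // n_sets
        if tag in entries:
            entries.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(entries) == ways:
                entries.popitem(last=False)  # evict the least recently used
            entries[tag] = True
    return misses


addr_a = 0xffff8e10
addr_b = addr_a + 8 * 64                     # maps to the same set as addr_a
print(set_associative_misses([addr_a, addr_b] * 4, ways=1))  # direct mapped
print(set_associative_misses([addr_a, addr_b] * 4, ways=2))  # 2-way
```

With `ways=1` the model degenerates to a direct mapped cache and every access misses; with `ways=2` both lines fit into the same set and only the first access to each one misses.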

Cache coherence protocols

• Problem: what happens if two processors share some data
  – if one processor modifies the data in its cache, the copy of the same element in the other cache has to be invalidated

[Figure: CPU 1 and CPU 2, each with its own cache, both attached to a shared memory]
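The invalidation rule can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not a real protocol: the class name `Cache`, the write-through policy and the direct notification of peer caches are all assumptions, and real coherence protocols such as MESI track more states.

```python
class Cache:
    """Toy private cache in front of a shared memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}      # address -> cached value
        self.peers = []      # other caches sharing the same memory

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        for peer in self.peers:               # invalidate the other copies
            peer.lines.pop(addr, None)
        self.lines[addr] = value
        self.memory[addr] = value             # write-through, for simplicity


memory = {0x10: 1}
cache1, cache2 = Cache(memory), Cache(memory)
cache1.peers, cache2.peers = [cache2], [cache1]

cache1.read(0x10)
cache2.read(0x10)            # both caches now hold a copy
cache1.write(0x10, 42)       # the copy in cache2 is invalidated
print(cache2.read(0x10))     # cache2 misses and re-reads the new value
```

Without the invalidation loop in `write`, the second cache would keep returning its stale copy.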