1a-1.1 Parallel Computing Demand for High Performance ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson Dec 27, 2012 slides1a-1




1a-1.2

Parallel Computing

• Using more than one computer, or a computer with more than one processor, to solve a problem.

Motives

• Usually faster computation.

• Very simple idea: n computers operating simultaneously can achieve the result faster.
  – It will not be n times faster, for various reasons (for example, parts of the problem that cannot be done in parallel, and time spent communicating between computers).

• Other motives include fault tolerance, a larger amount of available memory, ...
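One standard way to see why n computers do not give an n-fold speedup is Amdahl's law. It is not on this slide and is brought in here only as an illustration: if a fraction f of the work must run sequentially, the speedup on n computers is $S(n) = 1 / (f + (1 - f)/n)$. A minimal Python sketch, assuming a fixed serial fraction and ignoring communication overhead; `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup on n computers when serial_fraction of the work cannot be parallelized.

    Amdahl's law; communication and other overheads are ignored (an assumption).
    """
    if not 0.0 <= serial_fraction <= 1.0 or n < 1:
        raise ValueError("need 0 <= serial_fraction <= 1 and n >= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


if __name__ == "__main__":
    # With 10% serial work the speedup can never exceed 1 / 0.1 = 10.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.1, n), 2))
```

With f = 0.1 the speedup approaches, but never reaches, 10 however many computers are added.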


1a-1.3

Parallel programming has been around for more than 50 years. Gill writes in 1958*:

“... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...”

* Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.


1a-1.4

Some problems needing a large number of computers

Computer animation
• 1995: Toy Story processed on 200 processors (cluster of 100 dual-processor machines).
• 1999: Toy Story 2, 1400-processor system.
• 2001: Monsters, Inc., 3500 processors (250 servers, each containing 14 processors).

Frame rate relatively constant, but increasing computing power provides more realistic animation.

Sequencing the human genome
• Celera Corporation uses 150 four-processor servers plus a server with 16 processors.

Facts from: http://www.eng.utah.edu/~cs5955/material/mattson12.pdf


1a-1.5

“Grand Challenge” Problems

Ones that cannot be solved in a “reasonable” time with today’s computers.

Obviously, an execution time of 10 years is always unreasonable.

Examples

• Global weather forecasting
• Modeling motion of astronomical bodies
• Modeling large DNA structures …


1a-1.6

Weather Forecasting

• Atmosphere modeled by dividing it into 3-dimensional cells.

• Calculations of each cell repeated many times to model passage of time.

• Each cell holds values such as temperature, pressure, humidity, etc.
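The cell-and-time-step structure can be sketched in Python as follows. This is a hypothetical illustration, not a weather model: each cell holds a single value (say, temperature), and the update rule (the mean of a cell and its six face neighbours, with boundary cells held fixed) is an assumption chosen only to show the pattern of repeating a per-cell calculation over many time steps.

```python
from copy import deepcopy
from typing import List

# grid[i][j][k] holds one quantity (e.g. temperature) for the 3-D cell (i, j, k)
Grid = List[List[List[float]]]


def step(grid: Grid) -> Grid:
    """Advance one time step using a hypothetical averaging rule.

    Each interior cell becomes the mean of itself and its 6 face neighbours;
    boundary cells keep their old values. New values are computed only from
    the previous grid, so all cells within one step are independent.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    new = deepcopy(grid)  # start from a copy so boundary cells are unchanged
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                total = (grid[i][j][k]
                         + grid[i - 1][j][k] + grid[i + 1][j][k]
                         + grid[i][j - 1][k] + grid[i][j + 1][k]
                         + grid[i][j][k - 1] + grid[i][j][k + 1])
                new[i][j][k] = total / 7.0
    return new


def simulate(grid: Grid, steps: int) -> Grid:
    """Repeat the per-cell calculation `steps` times to model the passage of time."""
    for _ in range(steps):
        grid = step(grid)
    return grid
```

Because each step reads only the previous grid, the cells in one step could be divided among many processors, which is what makes this kind of model a natural fit for parallel computing.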


1a-1.7

Global Weather Forecasting

• Suppose the global atmosphere is divided into 1 mile × 1 mile × 1 mile cells to a height of 10 miles, about $5 \times 10^{8}$ cells.

• Suppose the calculation in each cell requires 200 floating-point operations. In one time step, $5 \times 10^{8} \times 200 = 10^{11}$ floating-point operations are necessary in total.

• To forecast the weather over 7 days using 1-minute intervals ($7 \times 24 \times 60 \approx 10^{4}$ time steps, so about $10^{15}$ operations), a computer operating at 1 Gflops ($10^{9}$ floating-point operations/s) takes $10^{6}$ seconds, or over 10 days. Obviously that will not work.

• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops ($3.4 \times 10^{12}$ floating-point operations/s).

In 2012, a typical Intel processor operates in the 20-100 Gflops region and a GPU at up to 500 Gflops, so we will have to use multiple computers/cores.

3.4 Tflops is achievable with a parallel cluster, but note that 200 floating-point operations per cell and 1 × 1 × 1 mile cells are just guesses.
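The arithmetic can be checked with a few lines of Python. The inputs are the slide's guesses (about $5 \times 10^{8}$ cells, 200 floating-point operations per cell, 1-minute steps over 7 days); the function names are illustrative only.

```python
# Back-of-envelope estimate; every input below is one of the slide's guesses.
CELLS = 5e8            # 1 x 1 x 1 mile cells up to a height of 10 miles
FLOPS_PER_CELL = 200   # floating-point operations per cell per time step
STEPS = 7 * 24 * 60    # 7 days at 1-minute intervals = 10,080 time steps


def total_flops(cells=CELLS, flops_per_cell=FLOPS_PER_CELL, steps=STEPS):
    """Floating-point operations for the whole forecast."""
    return cells * flops_per_cell * steps


def runtime_seconds(machine_flops, **kw):
    """Time taken by a machine sustaining machine_flops operations per second."""
    return total_flops(**kw) / machine_flops


def required_flops(deadline_seconds, **kw):
    """Sustained rate needed to finish within deadline_seconds."""
    return total_flops(**kw) / deadline_seconds


if __name__ == "__main__":
    t = runtime_seconds(1e9)
    print(f"1 Gflops machine: {t:.3g} s = {t / 86400:.1f} days")
    print(f"needed for 5 minutes: {required_flops(5 * 60) / 1e12:.2f} Tflops")
```

Passing different values (for example `cells=` or `flops_per_cell=`) shows how sensitive the required rate is to those guesses.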


“Erik P. DeBenedictis of Sandia National Laboratories theorizes that a zettaFLOPS (ZFLOPS) computer is required to accomplish full weather modeling of two week time span.[1] Such systems might be built around 2030.[2]”

Wikipedia “FLOPS,” http://en.wikipedia.org/wiki/FLOPS

[1] DeBenedictis, Erik P. (2005). "Reversible logic for supercomputing". Proc. 2nd conference on Computing frontiers. New York, NY: ACM Press. pp. 391–402.

[2] "IDF: Intel says Moore's Law holds until 2029". Heise Online. April 4, 2008.

1 ZFLOPS = $10^{21}$ FLOPS

1a-1.8


1a-1.9

Modeling Motion of Astronomical Bodies

Each body is attracted to every other body by gravitational forces.

The movement of each body is predicted by calculating the total force on it and applying Newton’s laws (in the simple case).
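A minimal Python sketch of this "simple case", assuming point masses, Newton's law of gravitation $F = G m_i m_j / r^2$, and a semi-implicit Euler time step (velocity updated first, then position from the new velocity). The integration method is an assumption, as the slide does not specify one, and `total_forces` and `step` are hypothetical names.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2


def total_forces(pos, mass):
    """Total gravitational force on each body by direct summation over the N-1 others.

    pos: list of (x, y, z) positions in metres; mass: list of masses in kg.
    Assumes no two bodies share a position (r would be 0).
    """
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][d] - pos[i][d] for d in range(3)]  # vector from body i to body j
            r = math.sqrt(dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2)
            f = G * mass[i] * mass[j] / (r * r)  # magnitude from Newton's law of gravitation
            for d in range(3):
                forces[i][d] += f * dx[d] / r  # component along the unit vector toward j
    return forces


def step(pos, vel, mass, dt):
    """Advance one time step: a = F/m updates velocity, the new velocity updates position."""
    forces = total_forces(pos, mass)
    n = len(pos)
    new_vel = [[vel[i][d] + forces[i][d] / mass[i] * dt for d in range(3)] for i in range(n)]
    new_pos = [[pos[i][d] + new_vel[i][d] * dt for d in range(3)] for i in range(n)]
    return new_pos, new_vel
```

Calling `step` repeatedly, feeding its output back in, gives the sequence of positions over time.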


1a-1.10

Modeling Motion of Astronomical Bodies

• Each body has N-1 forces on it from the N-1 other bodies: an $O(N)$ calculation to determine the force on one body (three-dimensional).

• With N bodies, approx. $N^2$ calculations, i.e. $O(N^2)$ *

• After determining new positions of bodies, calculations are repeated, i.e. $N^2 \times T$ calculations where T is the number of time steps, so the total complexity is $O(N^2 T)$.

* There is an $O(N \log_2 N)$ algorithm, which we will cover in the course.
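The $N(N-1)$ count per step and the roughly $N^2 \times T$ total can be checked by instrumenting the basic double loop. A minimal Python sketch; `count_force_calculations` is a hypothetical helper that only counts force evaluations and does not compute them.

```python
def count_force_calculations(n_bodies: int, n_steps: int) -> int:
    """Count pairwise force evaluations in the basic algorithm.

    Every time step, each of the N bodies receives a force from each of the N-1 others.
    """
    count = 0
    for _ in range(n_steps):           # T time steps
        for i in range(n_bodies):      # N bodies
            for j in range(n_bodies):  # ... and the N-1 other bodies
                if i != j:
                    count += 1         # one force calculation
    return count


if __name__ == "__main__":
    for n in (10, 100, 1000):
        c = count_force_calculations(n, 1)
        print(n, c, c / n ** 2)  # ratio tends to 1: O(N^2) work per time step
```

The count is exactly $N(N-1)T$, which grows as $O(N^2 T)$.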


1a-1.11

• A galaxy might have, say, $10^{11}$ stars.

• Even if each calculation is done in 1 μs (an extremely optimistic figure), it takes:

• $10^{9}$ years for one iteration using the $N^2$ algorithm

or

• Almost a year for one iteration using the $N \log_2 N$ algorithm, assuming the calculations take the same time (which may not be true).

• Then multiply the time by the number of time periods!

We may set the N-body problem as one assignment using the basic $O(N^2)$ algorithm. However, you do not have $10^{9}$ years to get the solution and turn it in.
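The estimate is easy to reproduce in Python. This is a minimal sketch under stated assumptions: the time per calculation is a parameter (both 1 μs and 1 ms are tried), a year is taken as 365.25 days, and the $N \log_2 N$ count assumes a constant factor of 1. The results are order-of-magnitude only; what separates the two algorithms is the ratio $N / \log_2 N \approx 2.7 \times 10^{9}$ for $N = 10^{11}$.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def years_for_one_iteration(n_bodies: float, seconds_per_calc: float, algorithm: str) -> float:
    """Years needed for one time step if every calculation takes seconds_per_calc.

    algorithm: "n2" for the basic O(N^2) method, "nlogn" for an O(N log2 N) method
    (constant factor assumed to be 1, which real implementations will not match).
    """
    if algorithm == "n2":
        calcs = n_bodies ** 2
    elif algorithm == "nlogn":
        calcs = n_bodies * math.log2(n_bodies)
    else:
        raise ValueError("algorithm must be 'n2' or 'nlogn'")
    return calcs * seconds_per_calc / SECONDS_PER_YEAR


if __name__ == "__main__":
    N = 1e11  # stars in the galaxy (slide's figure)
    for t in (1e-6, 1e-3):  # 1 microsecond and 1 millisecond per calculation
        print(f"{t:g} s/calc: N^2 -> {years_for_one_iteration(N, t, 'n2'):.2g} years, "
              f"N log2 N -> {years_for_one_iteration(N, t, 'nlogn'):.2g} years")
```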


My code with graphics, running on the coit-grid server and forwarded to a client PC. How to do this will be explained later with the assignment.

1a-1.12

[Video: Nbodies.wmv]


Questions

1a-1.13