8/16/2019 PP16 Lec2 Intro2&Arch1
http://slidepdf.com/reader/full/pp16-lec2-intro2arch1 1/18
Parallel Processing sp2016
Lec #2
Dr M Shamim Baig
Why Use Parallel Computing?
• Limits to serial computing:
– Both physical & practical reasons pose significant constraints to simply building ever faster serial computers (Moore's Law):
• Limits to miniaturization
• Transmission speeds
• Power dissipation
• Energy consumption
• Economic limitations
Why Use Parallel Computing?
• Save time and/or money:
– Parallel clusters can be built from cheap, commodity components.
– Throwing more resources at a task shortens its time to completion.
– Solving problems in a shorter time results in saving big money in many practical situations.
Why Use Parallel Computing?
• Provide a concurrent working environment:
– A single compute resource can only do one thing at a time; multiple computing resources can do many things simultaneously.
– For example, Access Grid (www.accessgrid.org) provides a global collaboration network where people around the world can meet & conduct work "virtually".
Why Use Parallel Computing?
• Integrate remote resource usage:
– Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce.
– For example:
• SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power of over 528 TeraFLOPS.
• Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS.
Example: Grand Challenge Problems
• Ones that cannot be solved in a reasonable amount of time with today's computers. Obviously, an execution time of 10 years is always unreasonable, e.g.:
• Global weather forecasting
• Modeling motion of astronomical bodies
• Cryptography: encrypted code breaking
Modeling Global Weather Forecast
• Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high): about 5 × 10^8 cells.
• Suppose each calculation requires 200 floating-point operations. In one time step, 10^11 floating-point operations are necessary.
• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating-point operations/s) takes 10^6 seconds, or over 10 days.
• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating-point operations/sec).
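The arithmetic in the bullets above can be checked with a short script (all figures are the ones stated on the slide: 5 × 10^8 cells, 200 ops per cell, 7 days of 1-minute steps):

```python
# Sanity-check of the weather-forecast arithmetic above (illustrative only).
cells = 5e8                          # 1-mile cells up to 10 miles high
ops_per_cell = 200                   # floating-point ops per cell per time step
ops_per_step = cells * ops_per_cell  # = 1e11 ops per time step
steps = 7 * 24 * 60                  # 7 days at 1-minute intervals = 10080 steps
total_ops = ops_per_step * steps     # ~1e15 ops for the whole forecast

serial_days = total_ops / 1e9 / 86400   # at 1 Gflops
print(serial_days)                      # ~11.7 days: over 10 days, as stated

needed_flops = total_ops / (5 * 60)     # rate needed to finish in 5 minutes
print(needed_flops)                     # ~3.4e12, i.e. 3.4 Tflops
```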
Modeling Astronomical Bodies' Motion
• Each body is attracted to each other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body.
• With N bodies, N − 1 forces must be calculated for each body, so we require approx. N^2 calculations.
• After determining the new positions of the bodies, the calculations are repeated.
• A galaxy might have, say, 10^11 stars.
• Even if each calculation is done in 1 ms (an extremely optimistic figure), it takes 10^9 years for one iteration using the N^2 algorithm & almost a year for one iteration using an efficient N log2 N approximate algorithm.
• Can't do without faster parallel processing!
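A minimal sketch of the N^2 direct-sum force calculation described above (units, the softening constant, and the 2-D setup are simplifications for illustration, not part of the slide):

```python
# Minimal O(N^2) direct-sum sketch of the N-body force calculation:
# every body sums the gravitational pull of the N-1 other bodies.
def total_forces(pos, mass, G=1.0, eps=1e-9):
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):            # N-1 partners per body -> ~N^2 work total
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps   # softening avoids divide-by-zero
            inv_r3 = r2 ** -1.5
            forces[i][0] += G * mass[i] * mass[j] * dx * inv_r3
            forces[i][1] += G * mass[i] * mass[j] * dy * inv_r3
    return forces

# Two equal unit masses one unit apart attract equally and oppositely.
f = total_forces([[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
print(f)
```

The doubly nested loop is exactly why the cost scales as N^2; the N log2 N figure on the slide refers to tree-based approximations (e.g. Barnes-Hut) that group distant bodies.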
Encrypted Code Breaking
• In a brute-force attack, simply try every key
• The most basic attack; cost is proportional to key size
• Assume we either know or can recognise the plaintext

Key Size (bits)               Number of Alternative Keys    Time required at 1 decryption/µs    Time required at 10^6 decryptions/µs
32                            2^32 = 4.3 × 10^9             2^31 µs = 35.8 minutes              2.15 milliseconds
56 (DES)                      2^56 = 7.2 × 10^16            2^55 µs = 1142 years                10.01 hours
128 (AES)                     2^128 = 3.4 × 10^38           2^127 µs = 5.4 × 10^24 years        5.4 × 10^18 years
168 (Triple DES)              2^168 = 3.7 × 10^50           2^167 µs = 5.9 × 10^36 years        5.9 × 10^30 years
26 characters (permutation)   26! = 4 × 10^26               2 × 10^26 µs = 6.4 × 10^12 years    6.4 × 10^6 years
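The table's middle columns follow from one simple model: on average the attacker searches half the keyspace before hitting the right key. A short script reproduces the DES row:

```python
# Average brute-force time: half the keyspace of 2^bits keys must be
# searched, at a given trial-decryption rate (per microsecond).
def avg_crack_years(key_bits, decryptions_per_us):
    tries = 2 ** (key_bits - 1)                 # expected tries = 2^(bits-1)
    seconds = tries / (decryptions_per_us * 1e6)
    return seconds / (365 * 24 * 3600)

print(avg_crack_years(56, 1))     # ~1142 years at 1 decryption/us (DES row)
print(avg_crack_years(56, 1e6))   # ~0.00114 years, i.e. ~10 hours, with
                                  # a million-way parallel attacker
```

The last line is the whole point of the slide: parallelism turns the 10^6-decryptions/µs column from fantasy into an engineering budget.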
The Future
• During the past 20 years, the trends indicated by ever faster networks, distributed systems, & multiprocessor computer architectures (even at the desktop level) clearly show that parallelism is the future of computing.
Implicitly Parallel Processor Architectures (or ILP)
Scope of Parallelism
• Conventional architectures coarsely comprise a processor, memory & a datapath.
• Each of these components presents significant performance bottlenecks.
• It is important to understand each of these performance bottlenecks.
• Parallelism addresses each of these components in significant ways.
• We start next with the processor-level parallel architectures.
Implicit Parallelism: Trends in Microprocessor Architectures
• Microprocessor clock speeds have posted impressive gains over the past two decades (two to three orders of magnitude).
• Higher levels of device integration have made available a large number of transistors.
• The question of how best to utilize these resources effectively is an important one.
• Current processors use these resources by executing multiple instructions in the same cycle using multiple pipelines / functional units.
• The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.
Implicit Parallel Architectures: ILP Microprocessors
• Pipelined Processors
• Superscalar Processors
• VLIW Processors
Pipelining
• This is akin to an assembly line for the manufacture of cars.
• Pipelining overlaps various stages of activity (instruction execution or an arithmetic operation) to achieve a performance gain.
• For example, in an instruction pipeline one instruction can be executed while the next one is being decoded & the one after that is being fetched.
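The overlap described above can be sketched with a toy cycle count (the 5-stage pipe and one-cycle stages are illustrative assumptions, not from the slide):

```python
# Toy illustration of instruction-pipeline overlap: with k one-cycle
# stages, n instructions finish in k + n - 1 cycles instead of k * n,
# because after the pipe fills, one instruction completes every cycle.
def pipelined_cycles(n_instructions, n_stages):
    return n_stages + n_instructions - 1

def serial_cycles(n_instructions, n_stages):
    return n_stages * n_instructions   # each instruction runs start-to-finish

n, k = 100, 5   # e.g. fetch / decode / execute / memory / write-back
print(serial_cycles(n, k), pipelined_cycles(n, k))   # 500 vs 104
```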
Pipeline Performance
• Instruction & arithmetic-unit pipelines
• Ideal pipeline speedup: calculation & limits
• Chained pipeline performance
• The speed-up of a pipeline is eventually limited by the number of stages & the time of the slowest stage.
• For this reason, conventional processors rely on very deep pipelines (a 20-stage pipeline is an example of a deep pipeline, compared to a normal pipeline of about 5 stages).
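The two limits stated above can be made concrete. For stage times t_i and n instructions, a pipelined run takes sum(t) + (n − 1) · max(t) against a serial n · sum(t), so the asymptotic speed-up is sum(t)/max(t): at best the number of stages, and dragged down by the slowest stage. The stage times below are made-up illustrations:

```python
# Speed-up of an n-instruction run through a pipeline with given stage times.
def speedup(stage_times, n):
    serial = n * sum(stage_times)
    pipelined = sum(stage_times) + (n - 1) * max(stage_times)
    return serial / pipelined

balanced = [1, 1, 1, 1, 1]     # 5 equal stages -> speed-up approaches 5
unbalanced = [1, 1, 3, 1, 1]   # one slow stage caps speed-up at 7/3 ~ 2.33
print(round(speedup(balanced, 10_000), 2))
print(round(speedup(unbalanced, 10_000), 2))
```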
Pipeline Performance Bottlenecks
• A pipeline has the following performance bottlenecks:
– Resource constraints
– Data dependency
– Branch prediction
• Approx. every 5th–6th instruction is a conditional jump! This requires very accurate branch prediction.
• The penalty of a prediction error grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
• Hence the need for better architectures!!!!
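A rough cost model (the misprediction rate and flush cost are assumed numbers for illustration) shows how the flush penalty scales with depth:

```python
# Effective cycles-per-instruction under the assumption that each
# mispredicted branch flushes roughly one pipeline's worth of work.
def effective_cpi(depth, branch_freq=1 / 6, mispredict_rate=0.05):
    # base CPI of 1, plus ~`depth` flushed cycles per mispredicted branch
    return 1 + branch_freq * mispredict_rate * depth

for depth in (5, 10, 20):          # shallow vs deep pipelines
    print(depth, round(effective_cpi(depth), 3))
```

Doubling the depth doubles the misprediction tax, which is the slide's point about deep pipelines needing very accurate predictors.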