8/16/2019 PP16 Lec2 Intro2&Arch1
http://slidepdf.com/reader/full/pp16-lec2-intro2arch1 1/18
Parallel Processing sp2016
Lec #2
Dr M Shamim Baig
Why Use Parallel Computing?
• Limits to serial computing:
– Both physical & practical reasons pose significant constraints to simply building ever faster serial computers (Moore's Law):
• Limits to miniaturization
• Transmission speeds
• Power dissipation
• Energy consumption
• Economic limitations
Why Use Parallel Computing?
• Save time and/or money:
– Parallel clusters can be built from cheap, commodity components.
– Throwing more resources at a task shortens its time to completion.
– Solving problems in a shorter time results in saving big money in many practical situations.
Why Use Parallel Computing?
• Provide a concurrent working environment:
– A single compute resource can only do one thing at a time; multiple computing resources can do many things simultaneously.
– For example, Access Grid (www.accessgrid.org) provides a global collaboration network where people around the world can meet & conduct work "virtually".
Why Use Parallel Computing?
• Integrate remote resource usage:
– Using compute resources on a wide area network, or even the Internet, when local compute resources are scarce.
– For example:
• SETI@home (setiathome.berkeley.edu) uses over 330,000 computers for a compute power of over 528 TeraFLOPS.
• Folding@home (folding.stanford.edu) uses over 340,000 computers for a compute power of 4.2 PetaFLOPS.
Example: Grand Challenge Problems
• Ones that cannot be solved in a reasonable amount of time with today's computers. Obviously, an execution time of 10 years is always unreasonable, e.g.:
• Global weather forecasting
• Modeling motion of astronomical bodies
• Cryptography: encrypted code breaking
Modeling Global Weather Forecast
• Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high): about 5 × 10^8 cells.
• Suppose each calculation requires 200 floating-point operations. In one time step, 10^11 floating-point operations are necessary.
• To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating-point operations/s) takes 10^6 seconds, or over 10 days.
• To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating-point operations/sec).
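The arithmetic in the bullets above can be checked with a short script (all figures are the ones stated on the slide: 5 × 10^8 cells, 200 ops per cell, 7 days of 1-minute steps):

```python
# Sanity-check of the weather-forecast arithmetic above (illustrative only).
cells = 5e8                          # 1-mile cells up to 10 miles high
ops_per_cell = 200                   # floating-point ops per cell per time step
ops_per_step = cells * ops_per_cell  # = 1e11 ops per time step
steps = 7 * 24 * 60                  # 7 days at 1-minute intervals = 10080 steps
total_ops = ops_per_step * steps     # ~1e15 ops for the whole forecast

serial_days = total_ops / 1e9 / 86400   # at 1 Gflops
print(serial_days)                      # ~11.7 days: over 10 days, as stated

needed_flops = total_ops / (5 * 60)     # rate needed to finish in 5 minutes
print(needed_flops)                     # ~3.4e12, i.e. 3.4 Tflops
```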
Modeling Astronomical Bodies' Motion
• Each body is attracted to each other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body.
• With N bodies, N − 1 forces must be calculated for each body, so we require approx. N^2 calculations.
• After determining the new positions of the bodies, the calculations are repeated.
• A galaxy might have, say, 10^11 stars.
• Even if each calculation is done in 1 ms (an extremely optimistic figure), it takes 10^9 years for one iteration using the N^2 algorithm & almost a year for one iteration using an efficient N log2 N approximate algorithm.
• Can't do without faster parallel processing!
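A minimal sketch of the N^2 direct-sum force calculation described above (units, the softening constant, and the 2-D setup are simplifications for illustration, not part of the slide):

```python
# Minimal O(N^2) direct-sum sketch of the N-body force calculation:
# every body sums the gravitational pull of the N-1 other bodies.
def total_forces(pos, mass, G=1.0, eps=1e-9):
    n = len(pos)
    forces = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):            # N-1 partners per body -> ~N^2 work total
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            r2 = dx * dx + dy * dy + eps   # softening avoids divide-by-zero
            inv_r3 = r2 ** -1.5
            forces[i][0] += G * mass[i] * mass[j] * dx * inv_r3
            forces[i][1] += G * mass[i] * mass[j] * dy * inv_r3
    return forces

# Two equal unit masses one unit apart attract equally and oppositely.
f = total_forces([[0.0, 0.0], [1.0, 0.0]], [1.0, 1.0])
print(f)
```

The doubly nested loop is exactly why the cost scales as N^2; the N log2 N figure on the slide refers to tree-based approximations (e.g. Barnes-Hut) that group distant bodies.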
Encrypted Code Breaking
• In a brute-force attack, simply try every key
• The most basic attack; cost is proportional to key size
• Assume we either know or can recognise the plaintext

Key Size (bits)               Number of Alternative Keys    Time required at 1 decryption/µs    Time required at 10^6 decryptions/µs
32                            2^32 = 4.3 × 10^9             2^31 µs = 35.8 minutes              2.15 milliseconds
56 (DES)                      2^56 = 7.2 × 10^16            2^55 µs = 1142 years                10.01 hours
128 (AES)                     2^128 = 3.4 × 10^38           2^127 µs = 5.4 × 10^24 years        5.4 × 10^18 years
168 (Triple DES)              2^168 = 3.7 × 10^50           2^167 µs = 5.9 × 10^36 years        5.9 × 10^30 years
26 characters (permutation)   26! = 4 × 10^26               2 × 10^26 µs = 6.4 × 10^12 years    6.4 × 10^6 years
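The table's middle columns follow from one simple model: on average the attacker searches half the keyspace before hitting the right key. A short script reproduces the DES row:

```python
# Average brute-force time: half the keyspace of 2^bits keys must be
# searched, at a given trial-decryption rate (per microsecond).
def avg_crack_years(key_bits, decryptions_per_us):
    tries = 2 ** (key_bits - 1)                 # expected tries = 2^(bits-1)
    seconds = tries / (decryptions_per_us * 1e6)
    return seconds / (365 * 24 * 3600)

print(avg_crack_years(56, 1))     # ~1142 years at 1 decryption/us (DES row)
print(avg_crack_years(56, 1e6))   # ~0.00114 years, i.e. ~10 hours, with
                                  # a million-way parallel attacker
```

The last line is the whole point of the slide: parallelism turns the 10^6-decryptions/µs column from fantasy into an engineering budget.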
The Future
• During the past 20 years, the trends indicated by ever faster networks, distributed systems, & multiprocessor computer architectures (even at the desktop level) clearly show that parallelism is the future of computing.
Implicitly Parallel Processor Architectures (or ILP)
Scope of Parallelism
• Conventional architectures coarsely comprise a processor, memory & a datapath.
• Each of these components presents significant performance bottlenecks.
• It is important to understand each of these performance bottlenecks.
• Parallelism addresses each of these components in significant ways.
• We start next with the processor-level parallel architectures.
Implicit Parallelism: Trends in Microprocessor Architectures
• Microprocessor clock speeds have posted impressive gains over the past two decades (two to three orders of magnitude).
• Higher levels of device integration have made available a large number of transistors.
• The question of how best to utilize these resources effectively is an important one.
• Current processors use these resources by executing multiple instructions in the same cycle using multiple pipelines / functional units.
• The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.
Implicit Parallel Architectures: ILP Microprocessors
• Pipelined Processors
• Superscalar Processors
• VLIW Processors
Pipelining
• This is akin to an assembly line for the manufacture of cars.
• Pipelining overlaps various stages of activity (instruction execution or an arithmetic operation) to achieve a performance gain.
• For example, in an instruction pipeline one instruction can be executed while the next one is being decoded & the one after that is being fetched.
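The overlap described above can be sketched with a toy cycle count (the 5-stage pipe and one-cycle stages are illustrative assumptions, not from the slide):

```python
# Toy illustration of instruction-pipeline overlap: with k one-cycle
# stages, n instructions finish in k + n - 1 cycles instead of k * n,
# because after the pipe fills, one instruction completes every cycle.
def pipelined_cycles(n_instructions, n_stages):
    return n_stages + n_instructions - 1

def serial_cycles(n_instructions, n_stages):
    return n_stages * n_instructions   # each instruction runs start-to-finish

n, k = 100, 5   # e.g. fetch / decode / execute / memory / write-back
print(serial_cycles(n, k), pipelined_cycles(n, k))   # 500 vs 104
```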
Pipeline Performance
• Instruction & arithmetic-unit pipelines
• Ideal pipeline speedup: calculation & limits
• Chained pipeline performance
• The speed-up of a pipeline is eventually limited by the number of stages & the time of the slowest stage.
• For this reason, conventional processors rely on very deep pipelines (a 20-stage pipeline is an example of a deep pipeline, compared to a normal pipeline of about 5 stages).
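The two limits stated above can be made concrete. For stage times t_i and n instructions, a pipelined run takes sum(t) + (n − 1) · max(t) against a serial n · sum(t), so the asymptotic speed-up is sum(t)/max(t): at best the number of stages, and dragged down by the slowest stage. The stage times below are made-up illustrations:

```python
# Speed-up of an n-instruction run through a pipeline with given stage times.
def speedup(stage_times, n):
    serial = n * sum(stage_times)
    pipelined = sum(stage_times) + (n - 1) * max(stage_times)
    return serial / pipelined

balanced = [1, 1, 1, 1, 1]     # 5 equal stages -> speed-up approaches 5
unbalanced = [1, 1, 3, 1, 1]   # one slow stage caps speed-up at 7/3 ~ 2.33
print(round(speedup(balanced, 10_000), 2))
print(round(speedup(unbalanced, 10_000), 2))
```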
Pipeline Performance Bottlenecks
• A pipeline has the following performance bottlenecks:
– Resource constraints
– Data dependency
– Branch prediction
• Approx. every 5th–6th instruction is a conditional jump! This requires very accurate branch prediction.
• The penalty of a prediction error grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
• Hence the need for better architectures!!!!
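A rough cost model (the misprediction rate and flush cost are assumed numbers for illustration) shows how the flush penalty scales with depth:

```python
# Effective cycles-per-instruction under the assumption that each
# mispredicted branch flushes roughly one pipeline's worth of work.
def effective_cpi(depth, branch_freq=1 / 6, mispredict_rate=0.05):
    # base CPI of 1, plus ~`depth` flushed cycles per mispredicted branch
    return 1 + branch_freq * mispredict_rate * depth

for depth in (5, 10, 20):          # shallow vs deep pipelines
    print(depth, round(effective_cpi(depth), 3))
```

Doubling the depth doubles the misprediction tax, which is the slide's point about deep pipelines needing very accurate predictors.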