
Parallel Computing


Summary: Parallel processing has emerged as the only effective tool for meeting the everlasting quest for higher computing power to solve complex problems in Science and Engineering. Parallel architectures possess features such as built-in fault tolerance, redundancy, increased reliability, cost effectiveness, expandability, etc., and at the same time provide performance comparable with the Von Neumann architecture based computers at lower costs. Furthermore, the computers based on parallel processing architectures can be designed, by and large, using off-the-shelf components, reducing design time. The best characteristic of parallel architectures is that the upper limit in performance, i.e. speed of computation, is not likely to be reached in the next few decades, whereas the performance achievable in von Neumann architectures is currently reaching the theoretical limit. Parallel Computing, thus a sheer necessity, has made a tremendous impact in recent times and is likely to attain the status of "state-of-the-art" computer shortly and may eventually replace the conventional computers. Nevertheless, designing an optimized parallel computer involves careful selection of processing elements, memory, methodology for interconnection between them, methodology for handling instructions and data, handling of processes, design of suitable operating system, compilers, algorithms etc. If not designed carefully, the effect of adding more processors in a parallel computer may not provide any visible speed up and in worst cases it can turn out to be counter productive! In this paper, various aspects of design of parallel architectures are discussed briefly.

ELECTRONICS INFORMATION & PLANNING, MARCH 1989

I. INTRODUCTION

Computing power is crucial for the development of nations since any advancement in Science and Engineering is feasible only through application of sophisticated computers for simulation and related problem solving. Computers, high performance computers in particular, are essential to push back the current technological barriers and move forward in the frontiers of Science and Engineering. The computers can be broadly classified into the following categories depending upon the performance, features, cost etc.:

(i) Supercomputers
(ii) Minisupercomputers
(iii) Mainframe computers
(iv) Supermini computers
(v) Minicomputers
(vi) Microcomputers
(vii) Personal computers

Supercomputers

The classification of "supercomputer" is attributable to those computers which are the fastest, most powerful computers available at any time. Presently, the computers with computing speed in the range of 200 to 1000 Million (mega) Floating point Operations per Second (megaflops) and in the price range of $ 10 millions are called supercomputers.

Department of Electronics, New Delhi-110003

The supercomputers are primarily used for carrying out research in the area of Science and Engineering. Although, during the last four decades, the speed of supercomputers has increased by four orders of magnitude, the quest for increased computing power is also increasing as a sheer necessity for solving more and more complex problems [1].

In fact, the demand for higher computing power exists in a number of real time applications, such as visual processing for detection of fast moving objects such as missiles, where the job cannot be taken up at all without the required power. In real time vision processing, a computing speed of 2500 megaflops is essential to detect the object in time.

Another example is seismic data processing, where exploratory wells to detect the presence of oil are dug after collecting data about the earth's surface through various mechanisms and analysing the data to explore the presence of hydrocarbons which may be oil bearing. Fig. 1 indicates the broad steps involved in oil exploration. Massive computing power is required for converting raw data into usable "Quality data". Table 1 indicates how the usage of high performance computers would increase the productivity and quicken the process of identification of oil reservoirs through seismic data processing [2], particularly in view of the large amount of data to be handled, poor accuracy levels in prediction and high costs involved.

Subburaj

Table 1. Seismic Data Processing

- Each oil exploration generates 5 x 10**10 samples of raw data.
- Each sample of raw data requires 100 to 1000 flop to convert into a usable form.
- 10**14 flop (100 Teraflop) are to be performed per exploration.
- If computing power is 1 mflop (Mainframe), time taken for producing Quality Data is 25000 hrs or 1000 days or 3 years.
- Using 1 gigaflops (sustained), time reduced to 1 day.
- Success rate in 1983 was 17.5%.
- In a week, one successful oil well can be located with 1 gigaflop computer (Super-Computer) used.
- Cost of Seismic Survey: US $ 500,000.
- Cost of Wildcat (exploratory) well: US $ 7 million.
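The figures in Table 1 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the variable and function names are invented here, and the inputs are the rounded values quoted in the table, so the results agree with the table's "25000 hrs" and "1 day" only approximately.

```python
# Worked arithmetic behind Table 1; all inputs are the table's own rounded figures.
SAMPLES_PER_EXPLORATION = 5e10   # raw samples generated per oil exploration
FLOP_PER_SAMPLE_MAX = 1000       # upper end of the 100 to 1000 flop range
TOTAL_FLOP = 1e14                # workload quoted in the table (100 Teraflop)


def processing_time_seconds(total_flop, flops_per_second):
    """Time needed to process one exploration at a given sustained speed."""
    return total_flop / flops_per_second


# Workload implied by the sample count, using the upper end of the range.
upper_bound = SAMPLES_PER_EXPLORATION * FLOP_PER_SAMPLE_MAX

mainframe_hours = processing_time_seconds(TOTAL_FLOP, 1e6) / 3600
super_hours = processing_time_seconds(TOTAL_FLOP, 1e9) / 3600

print(f"workload from sample count: {upper_bound:.1e} flop")
print(f"1 megaflops mainframe: {mainframe_hours:,.0f} hours "
      f"({mainframe_hours / 24 / 365:.1f} years)")
print(f"1 gigaflops sustained: {super_hours:.1f} hours")
```

At 1 megaflops the job takes on the order of three years, while a sustained gigaflop brings it down to roughly a day, which is the thousand-fold reduction the table describes.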

More such examples could be cited where the demand for increased computing power is evident. The number crunching function, where hundreds or thousands of millions of computations can be performed in a second in a supercomputer, is useful not only in science, engineering and military applications, but also in areas such as commercial product design, business and many more.

Fig. 1 Seismic Data Processing (Onshore & Offshore)

Need for Parallel Processing

It is estimated that future scientific and engineering problems as well as problems of commercial nature, in such diverse fields as remote sensing, aerodynamics, VHSIC simulation, weather forecasting, motor car design, film production etc., require processing speeds far in excess of three gigaflops. 3 gigaflops is the theoretical upper boundary for the speed which could be obtained by using a single processor computer. The speed of a uniprocessor computer is limited by the length of time taken for electrical signals to travel through the wires at half the speed of light. To overcome this barrier in achieving higher computing speeds in conventional Von Neumann architecture based serial computers, parallel processing technology is being pursued, wherein a number of relatively slow processors are harnessed to work as an aggregate in parallel to result in high performance.
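The signal-propagation limit can be made concrete with a short calculation. This is a minimal sketch that takes the "half the speed of light" figure from the text at face value; the clock rates other than 40 MHz are arbitrary illustrative values, and the function name is invented for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # defined SI value
SIGNAL_FRACTION = 0.5                 # text: signals travel at half the speed of light


def distance_per_cycle_cm(clock_hz):
    """Distance in centimetres an electrical signal covers in one clock period."""
    return SIGNAL_FRACTION * SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


# Illustrative clock rates in MHz (assumed values, chosen only for the example).
for clock_mhz in (40, 250, 1000):
    cm = distance_per_cycle_cm(clock_mhz * 1e6)
    print(f"{clock_mhz} MHz clock: signal covers {cm:.1f} cm per cycle")
```

As the clock rate rises, the distance a signal can cover within one cycle shrinks to a few centimetres, so the physical size of a single processor and its wiring begins to bound its speed.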

The parallel processing architectures possess the following advantages over uni-processor machines:

(i) Performance: For very high speed systems such as supercomputers, parallelism is the only way to go about. As soon as a high performance board with a single processor is designed, the same can be replicated to obtain the desired power rather easily.

(ii) Price/Performance ratio: Parallel computers are designed with off-the-shelf microprocessors, whereas a uniprocessor supercomputer uses a dedicated single processor of very high complexity based on Gallium Arsenide technology, ECL or similar high speed devices, with cryogenic cooling or special cooling methods to operate the devices at much higher speeds. Thus the system cost of a parallel computer will turn out to be lower than that of conventional computers. For instance, at a cost of about $ 1 Million, users can now buy a parallel computer with the same performance as that of the fastest supercomputers of the previous generation. A study [3] reveals that not only in the case of supercomputers, but in all ranges beyond the largest PCs, parallel processing is more cost effective. This is one reason why parallel processing is adopted in all ranges of computers.

(iii) Modularity: In general, the users' need for increased computing power increases with time. Accordingly, it would be wise to invest in a machine which would satisfy the current requirements and add more modules as and when the requirements grow. This concept is known as 'Modular upgradation'. Parallel processing provides the required modular architecture. In fact, CRAY XMP (experimental Multi-Processor) is a known example of modular architecture.

(iv) Fault tolerance: Complex embedded controllers for military and industrial applications demand built-in fault tolerance and graceful degradation. Parallel computing is the only way to provide this environment easily.

However, the problem with parallel computing is the non-availability of system software and compilers which provide an abstract programming environment for the users. The success of parallel computing depends entirely on availability of this feature so that the user can develop programmes easily.

[Fig. 1 labels: reflected waves; raw data; maths operations and analysis (statistical, FFT, deconvolution, wave equation techniques, sparse matrix solution)]

With more companies entering the supercomputer manufacturing arena and with the proliferation of supercomputers, all machines designed for high-speed floating point performance (irrespective of whether it is a single-user or multi-user system, possesses 64-bit vector capability or not, is a huge machine with a lot of I/O or a desk-top machine) have come to be known as supercomputers [4]. An illustrative list of supercomputer vendors in the world (which fit in this definition) along with their current range of products is given in Table 2 below.

Table 2. Illustrative List of Supercomputer Vendors and Products

Vendor                    Product(s)           Speed in       No. of         Available
                                               Megaflops      PEs            applications

High-end Supercomputers
Cray Research             X-MP                 1,200          4              500
                          Cray-2               1,800          4              150
                          Y-MP                 2,400-3,600                   150
ETA Systems               ETA 10-Q             947            2              16
                          ETA 10-E             6,857                         16
                          ETA 10-G             10,286(e)                     16
Fujitsu (Amdahl in US)    VP-100E              425            1              20
                          VP-200               850            1              20
                          VP-400               1,700          1              20
Hitachi                   S-820/60             1,500          1              21
                          S-820/80             3,000          1              na
NEC (HNSX in USA)         SX-1                 665            1              100
                          SX-2                 1,300          1              100

Mid-range Supercomputers
Alliant                   FX/80                189            8              110
Convex                    C 240                200            4              over 200
Cydrome/Prime             Cydra/MXCL 5         50             1
Elxsi                     6420                 120            12
Floating Point Systems    FPS M 64/60          38             1
Gould                     NP1                  320            8
Saxpy                     Matrix 1             1,000          32

Highly Parallel Computers
Ametek                    Series 2010          80             512
BB&N                      GP 1000              125            256
Flexible                  Flex/32              80             20
Intel                     iPSC/2 VX            1,028          64
Meiko                     Computing Surface    1,000 or more  1,000 and up
N Cube                    N Cube 10            500            1,024
Parsytec                  Model 256            384            256
Thinking Machines         Connection Machine   2,500

Single-user Supercomputers
Apollo                    Series 10000         140            4
Ardent                    Titan                64             4

Source: Electronics, March 3, 1988

The above listings include some computers of a little lower range of performance as compared to our definition of supercomputers. In fact, the new classification of minisupercomputer could be used to represent such computers. Of course, there would be an overlap between supercomputers and minisupers and it is difficult to have a clear demarcation between the two.

It is interesting to note from Table 2 that, out of the large list of manufacturers of a class of supercomputers in the world, mainly the Japanese manufacturers (NEC, Fujitsu and Hitachi) make single processor versions and all others, including the leaders in the field such as Cray and ETA Systems, make multi-processor versions with various types of parallel architectures [4]. It could be seen that while Cray and ETA are based on minimally parallel architecture, the Connection Machine falls in the other extreme, namely massively parallel computers.

Vector Processors

A scalar processor performs an arithmetic or logic operation on one set of operands, whereas a vector processor performs one operation on an array of operands. A vector processor uses pipeline processing, i.e. the operands are initially loaded in vector pipes and, after the pipes are full, results of operations come out one per clock cycle and there is a continuous feed of data to the vector Arithmetic/Logic units and a continuous flow of results till the pipe is empty. By using more vector functional units the speed can be further increased.
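The pipelining behaviour described above can be sketched as a simple cycle count. This is an idealised model written for illustration only: the function names, the six-stage pipe and the 64-element vector are assumptions, and real machines add start-up and memory costs that are ignored here.

```python
def scalar_cycles(n_elements, stages):
    """Unpipelined case: each operation passes through every stage before the next starts."""
    return n_elements * stages


def pipelined_cycles(n_elements, stages):
    """Pipelined case: fill the pipe once, then one result emerges per clock cycle."""
    if n_elements == 0:
        return 0
    return stages + (n_elements - 1)


def pipelined_cycles_multi(n_elements, stages, units):
    """Elements shared out evenly over several vector functional units working in parallel."""
    per_unit = -(-n_elements // units)  # ceiling division
    return pipelined_cycles(per_unit, stages)


# Hypothetical example: a 64-element vector through a 6-stage pipe.
n, stages = 64, 6
print("scalar:          ", scalar_cycles(n, stages), "cycles")
print("one vector unit: ", pipelined_cycles(n, stages), "cycles")
print("two vector units:", pipelined_cycles_multi(n, stages, 2), "cycles")
```

Once the pipe is full the cost per result falls to one cycle, and adding a second functional unit roughly halves the remaining time, which is the effect the text attributes to extra vector units.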

CDC STAR-100 was the first vector computer and it was not successful. Similarly, another vector machine, the Texas Instruments Advanced Scientific Computer (ASC), also could not succeed in the market. However, these developments gave rise to the addition of dedicated vector processing units in general purpose supercomputers such as Cray 1, Cyber 205, Cray XMP series, Cray 2 etc. to speed up the execution of vector operations. Vector functional units are well suited for problems such as sorting and simulation problems such as weather forecasting, where the entire universe could be broken into grids and the weather parameters at each grid are evaluated repeatedly.

Supercomputer in the Indian Context

A primary question for India is: which approach should a developing country like India adopt for indigenous development of supercomputers: uniprocessor based systems, minimally parallel systems or massively parallel systems?

Uniprocessor system

Uniprocessor based systems are giving way to parallel processing systems (minimally parallel or massively parallel) in view of the upper limit in performance of the former and the cost effectiveness of the latter. India, having a weak component base and almost no base in the area of VLSI, cannot think of building a very high performance uniprocessor based supercomputer (since the processors in such machines are proprietary) till such time as India's component base, including microelectronics, reaches a higher degree of self-sufficiency.

Massively Parallel Systems

It may not be difficult at all to get off-the-shelf microprocessors. Hence one may be tempted to use a large number of standard microprocessors and build a massively parallel processing system. There is a catch! Will such a system be able to give near linear speed-ups for a host of applications? What is the chance of successfully putting them together against the background of the design and infrastructural support industries in India, such as PCB manufacture, engineering etc.? How easy will it be to develop application programmes? The answer to these questions may not be positive.
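Why near linear speed-up is hard to obtain can be illustrated with Amdahl's law. The article does not cite this law; it is brought in here purely as an illustration, and the parallel fractions and processor counts below are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speed-up over one processor when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Hypothetical workloads: fraction of the run time that can be parallelised.
for fraction in (0.50, 0.95, 1.00):
    for processors in (16, 1024):
        print(f"parallel fraction {fraction:.2f}, {processors} processors: "
              f"speed-up {amdahl_speedup(fraction, processors):.1f}")
```

Under this model only a fully parallel workload scales linearly; with even 5 per cent serial work, a thousand processors deliver a speed-up of less than twenty.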

Minimally Parallel Systems

Although the country cannot get access to the proprietary processors used in CRAY like computers, access to powerful RISC chips released recently by companies like Intel, Motorola, AMD, Sun etc. may not be as difficult, since these are commercial chips. Hence it is advisable to go in for development of supercomputers based upon such processors and adopt a CRAY XMP like approach. As time passes, the power of these processors will also increase and the power of the supercomputer in turn will increase. It would also be easier to provide abstract programming capability in this type of machine.

In the following, we will consider these issues in more detail.

II. PARALLEL ARCHITECTURE

Many different types of architectures have been developed by different vendors of parallel computers and many are being evolved. The variations in parallel architectures can be due to the following aspects, where there are various ways of designing the computer, any of which may suit some application or other:
- Degree of parallelism
- Degree of coordination between processors
- Instruction and Data handling
- Topology of interconnection

We would now see each aspect in detail.

Degree of Parallelism

Depending upon the degree of parallelism built into the architecture, the architectures can be classified either as fine grained or coarse grained.

Fine Grained

An architecture is said to be fine grained if there are very many small processors which communicate relatively fast. Each Processing Element (PE) executes very few instructions before communicating with other processors or memory elements. Examples are the Connection Machine with 65,536 processors and many new products using off-the-shelf microprocessors such as the INTEL family 8086/87 and 80386/87, Motorola's 68020/68881, INMOS Transputers etc.

Coar:se {}raiuecl ' ,. : ,,In Lhis type of nrachines,'the size ancl'capabilities bf

each unit is much larger; the speed of each processortvould be of the order of atleast 20 megaflops or more,

l ) ,

I

D

Et

II

I

I

II

' f '

275

S I M Dl l ' a s inglc s t r .canl o l i r rs t rLrct ions arc cxccutet l by a l l

l ) rocessols c()ncrurrenI ly rv i th d i f ferent s t rcanls of ia taf lorv to t l tc p locessors thel r the archi tcctur .e is known asSI I \ ' lD. S() lne t i l l lcs ih is is a lso krrown as Georrrc t r ic orl )ata [ )ara l lc l jsrn. I 'he f i rs i pract ica l SIMD rnaih inervas lL .L. lACl lV of the 70s.

' l 'hc SIMD archi tecture

cons i s t s o f . . a s i ng le con t ro l un i t coo rd ina t i ug t he ,

<tpcrat iou of 'a l l t l re proccssols ancl conta in ing necessary

: ,b l : :1 . cor les oI thc ; : rograrnnres for propogatron toi 'c l iv ic l t ra l [ ) r 'occssors ' t pre-deterr r r inc i t i rne sy 'cr r r . -nously. A t l ,p ic i r l conf igurat ion of Sl l r4D archi tccture isshown i n l r i g . 2 [ 6 ] .

il{ lt lt r l t 989 | n l , nn t r r : l ( ( ) l \ l t , r I lN ( ;t a - _ - _

! lt : i : l ] |

' l . hc p ro . " ss ( ) r s : r r c g r .nc r - l l l 1 , j o i r r c t l l r y r t . l ; r l i v c t y s to i v

l - up r f con rn ru ' i c . t i o ' l i r r ks - I i xa rn j r l cs r r r c [ r ' r 'A l 0 w i t h r . r

i . l o i

I processors, CI{Ay Xlv l l ) i rva i l r r l r lc rv i r l r [ .2 or_ 4

l ] 1 .o , I P rc l cess , r s , ( c l r c l t P r ( ) ( . css ( ) r r v i t l r l r Pc i r l i P t . r [ r , r . r r r i r ' t . c . 1 .1) i l1a, | ,

40U L 'eg.0.ps) c tc . - I ' r rc rcce '1 ly a l l 'o ' l rccr l r r r rc l

f j ty | 80,860 I ) recessor re por tcc l to bc conta in i r rg I r r r i l l ior r

i r , , : I t ra 'srstors, t tPcrat i 'g wi th a c l 'ck ipec<J of 4 i ) I ' l I lz a, r l

i | . IJrovtdrng. a _1 ' rcr . [or .nrancc of ,g0 ntcgal - l t l1 t pcr ch ip i rnt l

l ' ,

I Motoro la 's 88000 r i , i i l { ' .c i l i r i i tc bui r i l ing^uP , f 'a l r .s t . I

i t r , . I coa rsc g r i t i r t ec l t t r i r ch i t l c s a I t ' uc l t l owc r c ' s t s .

b " , | . 1 c ( ) i r r sc g r ; r i l r c r r J ap l i i i ca t i on ' c l r r bc i l i v i t l cd i r r t o

iUy I l og ' ca l l l a t t s . r t t ade r rp r . r f . l o r r , g ' i nc l cpc r r c l c r r l ' r ' c css i ' g

h rv I scqucnccs r v i t h l i t t l c co r t r ' u r r i c : i r t i . ' s w l i i l s i ' l r f i r r cipr . I

g tarncd appl ica l ior r I 'c rvcr ins(r r rc t ions ar .c cxcculct l

l 'o f | " 0 . i , I : . ' l cot r ln tu i l ic : r l ion stcPs [ ,5 ] . I r r t l ic i i r rc gr l inet l

i p , | ' a r cJ l l t cc tu ' c , . u ' h i l c

t hc c r s t ' f c l t ch P r . ccss i r rg c l c ' r t . r r tl ; . I

wl l l bc low, thc cdnrplcx i ty oI in tcrcronncctror t u ,oul< l l rco f : I

t r cn l c l l dous ; ( ) l l t l l c o t l l c l l t : r r t t l , i t c ( ) i * sc l l . . i r r c r li ip f . l t : l , l : . . lur

cs [hc cosr of cach Proccssor rvorr lc t i rc h igh

l i , , I and l hc , t cch r to log i ca l i s sucs o f bL r i l c l i ng cac l r p r c , ccss i l r

f , , I yo,Y11", ro Lrc too cornPl icatet l . ldcal ly r l rc l rcst way to] , . , 10, ,1-r l , l '

Iastcs[ , tuo,s l powerfu l cr . r r r r l r t r tcr .s is to bui lc l Lrs ing

i , l | ; l l " lastcsI pos-s ib lc . pr ,eccssiors uncl c<)r rncct i rs rnarry o l

I I ' t he ' as ' oss ib l c . ' r . r r r l , c vc r , t r r c r i r ' i t a t i . l r s ' f i l r i s

i 1 , , a l lPrr rach woi lk l bc duc to cost anr l d i l f icu l t ics in

I l j , l r r , r ,g , ra l t rnur lE i ppl i r :a t ions t t j l t i rkc advarr tage of t l tc

' { l '

" 'u t t " ' tcctufc '. it l ; i{D:gl:. of Coorrtination ltcrrvccn l,roctrsrjor,s

p f Ci r ; 1 'he p*r 'a l le l ar .c l r i tccturc c i r . ' ls ' bc c luss i l icc l i rsrv

lJr i ; t rg l t i ly coupled a ld Iooscly couplcc l del tending ot r thcOl

l i ; l ' i ,9 .gt"" .oI coordinat ion bctrvccrr rhc prdcessors, t l re

i f i : l iundcr t f ing cor tccPt of rvh ich is expln i i rcc l [ r 'e low:

; i t i ,Tight ly Couplcd

1{ "L ins t ruc t i o r r and D l r t a . I l and l i r r g, ; { ' .

' . Thc paral lc l co ln l )utc ts ( . i l l l ii j , . ; Tt t " para l lc l conrputers can a lso bc categor i

r i i i j i S ins le Instnrc l ion Mrr l f in le f ) l r r /Ql t \ , , r l ) r - , , . r r i /

r 5l ; i g . 2 S ing l c I us tn r c t i on Mu l t i p l c I ) a ta .

lvl l I\.{I)I f tJ i f fcrcnt inst ruct iorr s t rcanrs arc cxcculed us ins

clif l 'ercnt sets of clata {ncl/or each proccssor executes fl ) rop,ranr ' in iso lat ion f rorr l a l l the other-precessors thent l rc i r rch i tccturc is kn<; lvn as MiMD or evcnt para l ie l -isnr . ln t l r is case, cach proccssing e lenrent is associatedr ' i , i t l r i ts owl t coutro l Lrr r i t ancl inst ruct ions in eacl rprocessor arc executecl asyrrchronously. I i ig . 3 i l lus_t ra tes p r iuc ip lc o f MIMD.

l 5 I

F ig. 3 lv lu l t ip le Instruct ion Mul t ip le,Dara

Note: Cu-Control Uni tPE-Proccssing Uni tiS -- Instruct ion StreanrDS- Data Strearn

HEIIONY

nt_l, i i l : I I eacl t proccssor has a corrsic lcratr lc pr ivate lner l lory

#"i l unO calt [unct iorr as indcpcnrlent l l , is 1r9ssi f ie anrl

;;+ipaSSes or reccrvcs nlcssagcs toilronr other processors/

renloryw l

l b i

I l (

ru

ba lI l e l

r Y l

r v ; r t c i r r r d c t i r r , r p l c ( c l y g l r -rn ing to val r ish and nranyrnelltory anil local ntetr.lor f bo th .

) ry { ) l llDegfnarcU rtgcs ()

ete lt re i I

shhntar

rely ,e;be6rhhrc itagcs

sys I cu lscxplo i t

gorised as

Lp le

s a rt h svan

ornplllns tbothadva

,0tn

elnlCoste,(

syslusethe

; : r ; "S ing l c I ns t ruc t i o r r Mu l t iP l c ' Da ta (S tMD) r r r r c l N {u l t iP l c: ' . ; , i l l n s t r u c t i o r r M i l l l i r r l e f ) r r r / h l l t \ t I - \ \ . , - ^ r , : ri i i . Ins t ruc t io r r Mu l l ip le na ta . ln ,4 ln lD) a rch i rcc lu rcs .

I

6

e'

While in the SIMD type the host processor or the front end has to communicate and coordinate with a single control unit synchronously, in MIMD this has to be done with a number of control units asynchronously, increasing the complexity of the design. However, SIMD and MIMD machines suit different types of applications. While SIMD machines are ideally suited for solving problems possessing data parallelism, i.e. huge amounts of data have to be processed by the same instructions, MIMD machines are suitable for problems exhibiting control parallelism, i.e. tasks which can be subdivided into a number of small tasks where the data to be handled for each task is different. A comparison of SIMD and MIMD machines is given in Table 3 [7].
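The distinction between data parallelism and control parallelism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`scale`, `data_parallel`, `control_parallel`) and the sample tasks are hypothetical, and a thread pool merely mimics the structure of the two styles, not the behaviour of SIMD or MIMD hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(x):
    """The single operation applied to every data element (SIMD style)."""
    return 2 * x


def data_parallel(values, workers=4):
    """Data parallelism: the same instruction is applied to many data items."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input values.
        return list(pool.map(scale, values))


def control_parallel(tasks, workers=4):
    """Control parallelism: independent sub-tasks, each with its own
    instruction stream and its own data (MIMD style)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(data_parallel([1, 2, 3, 4]))
    print(control_parallel([(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])]))
```

In the first function every worker runs the same code on a different element; in the second, each worker runs a different function on a different data set, which is the situation the text describes for MIMD machines.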

Table 3: Comparison of SIMD and MIMD machines


Fig. 4 "Bus" Architecture

Shared memory communication is typically non-blocking. Synchronisation does not depend on data transfer, and so a process need not be suspended until the communication steps are completed. To read or write data, a process uses either a mutual exclusion lock (a variable that indicates whether another process is using the data) or some other signal for deciding whether the data is ready. When a lock is used, a processor waits only if the data it is looking for is not available. The shared memory system can be implemented using a global "bus" or through a switching network.
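The use of a mutual exclusion lock on shared data can be sketched as follows. This is a minimal illustration, assuming Python threads stand in for the processes and a dictionary stands in for the shared memory; the function name `run_shared_counter` and its parameters are hypothetical.

```python
import threading


def run_shared_counter(n_workers=4, increments=1000):
    """Several workers update one shared value under a mutual exclusion lock."""
    shared = {"value": 0}        # data held in the common memory
    lock = threading.Lock()      # indicates whether another worker is using the data

    def worker():
        for _ in range(increments):
            # A worker waits here only if another worker currently holds the lock.
            with lock:
                shared["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


if __name__ == "__main__":
    print(run_shared_counter())
```

Without the lock, two workers could read the same old value and one update would be lost; with it, every increment is applied exactly once.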

In a "Bus" based system (a block diagram is shown in Fig. 4), the processors, memory modules and I/O devices are connected by one or more high speed buses to a very large global memory. In this system, one processor does not interrupt another processor for communication. However, there could be delays due to memory contention, i.e. two processors needing access to read the same memory module simultaneously. The band-width of the common bus limits the number of processors which can be connected to the bus without contention. Designers can improve the performance of the bus-based system by adding high speed local memories or caches to each processor. The local memory can respond to most memory references, reducing contention for the global bus. Local memory may be a problem because each processor can modify a copy of data independently. One solution is to add hardware that synchronises the contents of all caches, but this may add to the cost of each processor in the system. M/s Sequent and M/s Encore have adopted high-speed buses in their parallel computer designs.
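The way bus band-width and local caches bound the number of processors can be illustrated with a back-of-the-envelope model. The model is an assumption made here for illustration, not one given in the article: every processor issues `refs_per_sec` memory references, a fraction `hit_ratio` of them is served by its local cache, and the bus can carry `bus_refs_per_sec` references in total. All figures are hypothetical.

```python
import math


def max_processors(bus_refs_per_sec, refs_per_sec, hit_ratio):
    """Largest number of processors the bus can serve without saturating,
    under the simple model described above (an assumption, not a measurement)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    # Only references that miss the local cache reach the global bus.
    bus_traffic_per_processor = refs_per_sec * (1.0 - hit_ratio)
    if bus_traffic_per_processor == 0:
        return math.inf          # no bus traffic at all
    return math.floor(bus_refs_per_sec / bus_traffic_per_processor)


if __name__ == "__main__":
    for hit in (0.0, 0.5, 0.75):
        print(hit, max_processors(10_000_000, 1_000_000, hit))
```

With these hypothetical figures the bus saturates at 10 processors when there are no caches, and at 40 processors when three out of four references are served locally, which is the effect the paragraph above attributes to local memories.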

Sl. No.  Parameters                 SIMD                                 MIMD
1.       Data parallelism           Inherent                             Can be simulated
2.       Control parallelism        Not possible                         Possible
3.       Memory contention          Very little                          Very high
4.       Memory access pattern      Predictable and deterministic        Random and non-deterministic
5.       Programme constructs       Direct extensions of conventional    New programming techniques and
                                    programmes                           environment are needed
6.       Ease of debugging          Similar to conventional programmes   Difficult to visualise concurrent
         programme                  and hence easier                     processes
7.       Algorithm development      A number of existing parallel        Many new algorithms have to be
                                    algorithms are based on SIMD         developed ab initio
8.       Abstract programming       Possible                             Difficult
9.       Upward compatibility of    Can be easily extended to            To be developed afresh
         existing application       Fortran 8X
         software

A new method of instruction and data handling is emerging, known as Algorithmic Parallelism, in which each processor is responsible for part of the algorithm and the entire data passes through each PE. The choice of SIMD, MIMD or Algorithmic Parallelism has to depend upon the application on hand.
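Algorithmic parallelism, as described above, resembles a pipeline: each stage performs one part of the algorithm and every data item flows through all stages. The sketch below models this with Python generators. The three stages are hypothetical, and the generators only model the data flow; they do not run concurrently as separate PEs would.

```python
def stage_scale(stream):
    """First PE: doubles every item passing through."""
    for x in stream:
        yield x * 2


def stage_offset(stream):
    """Second PE: adds one to every item passing through."""
    for x in stream:
        yield x + 1


def stage_square(stream):
    """Third PE: squares every item passing through."""
    for x in stream:
        yield x * x


def run_pipeline(data, stages):
    """Feed the entire data set through each stage in turn."""
    stream = iter(data)
    for stage in stages:
        stream = stage(stream)   # output of one PE becomes input of the next
    return list(stream)


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [stage_scale, stage_offset, stage_square]))
```

On real hardware the benefit comes from the stages working on different items at the same time, so that once the pipeline is full one result emerges per step.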

Topology of Interconnection

One of the major areas of research is the topology used to interconnect the processors and memory elements. As of now, three distinct topologies have emerged, namely bus based architecture using shared memory, switching networks for shared memory or private memory, and the message passing scheme.

"Bus" Architecture

In a shared memory computer, all processes access a common memory and communicate with each other through messages left in the memory. All processors have access to any information in the memory.

Switching Networks

The other topology in the shared memory model is the use of switching networks, which work like telephone switch boards to interconnect processors and memory modules. In this scheme, one to one interconnection between N processors would require a cross-bar switch of size of order N*N for complete connectivity. In such a case, there would not be any memory contention.


However, the switch size would increase depending upon the number of processors. The Butterfly multiprocessor from BBN Advanced Computers uses a packet switched network, and 256 processors can be connected in the architecture. A typical interconnection is shown in Fig. 5.

Fig. 5 "Network" Architecture

rcubcs-l t {cssagc Passiug . ,scherirc

277

[...] nodes, each connected by dedicated communication channels to the 6 neighbouring nodes. A node can, however, communicate with other nodes by passing messages through intermediate nodes [8]. Fig. 6 illustrates a hypercube.
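The neighbour structure of a hypercube, and the passing of messages through intermediate nodes, can be sketched as follows. Two assumptions are made that the article does not state: nodes are labelled with binary numbers so that neighbours differ in exactly one bit, and messages are forwarded by correcting the differing bits in a fixed order (dimension-order routing). Both are common conventions, used here only for illustration.

```python
def neighbours(node, dim):
    """Nodes directly connected to `node` in a hypercube of dimension `dim`."""
    return [node ^ (1 << bit) for bit in range(dim)]


def route(src, dst, dim):
    """Path from src to dst, flipping one differing bit per hop
    (dimension-order routing, an assumed scheme)."""
    size = 1 << dim
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node label out of range")
    path = [src]
    current = src
    for bit in range(dim):
        if (current ^ dst) & (1 << bit):
            current ^= 1 << bit      # hop to the neighbour across this dimension
            path.append(current)
    return path


if __name__ == "__main__":
    print(len(neighbours(0, 6)))     # a 64-node hypercube has dimension 6
    print(route(0b000, 0b101, 3))
```

Under this labelling a node in a 64-node machine has 6 neighbours, and a message needs at most 6 hops, one per dimension, to reach any other node.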

The hypercube's communication system and the individual memory at each node are key characteristics that allow expansion of these computers beyond most parallel architectures. Because hypercubes support efficient communication between neighbours, they work well in applications involving simulation of inherently concurrent phenomena.

Message passing as followed in the hypercube architecture is a blocking method of communication that synchronises processes implicitly. In other words, the process requesting data must suspend its operation until the communication process is completed. Implicit synchronisation simplifies the programmer's job, but it increases the scheduling overheads because a process must be stopped and restarted each time it requests data.
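Blocking message passing can be sketched with two Python threads and a queue acting as the communication channel. This is a minimal illustration, assuming threads stand in for the processes at two nodes; the function name `exchange` and the use of `None` as an end-of-stream marker are hypothetical choices.

```python
import queue
import threading


def exchange(values):
    """A sender passes messages to a receiver that blocks on every request."""
    channel = queue.Queue()      # stands in for the communication channel
    results = []

    def sender():
        for v in values:
            channel.put(v)
        channel.put(None)        # end-of-stream marker (a convention chosen here)

    def receiver():
        while True:
            item = channel.get()     # suspends until a message has arrived
            if item is None:
                break
            results.append(item * item)

    threads = [threading.Thread(target=receiver), threading.Thread(target=sender)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(exchange([1, 2, 3]))
```

The receiver needs no explicit lock or flag: it simply cannot proceed until the data is there, which is the implicit synchronisation described above, paid for by suspending and resuming the receiver on each request.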

The message passing schemes employing switching networks are becoming popular in view of their non-blocking nature and because they allow different configurations of processors depending on the application in hand. Configurations such as hypercube, array and tree (binary, ternary, double ternary) are easy to realise using switching networks. Dynamic reconfiguration is an area of research which, when successful, would make parallel computing more popular.

III. SOFTWARE

Computer hardware is of no use but for the software. In truly parallel architectures, software issues are very complex since conceptualising, formulating or visualising the programme steps has to be done in parallel. The truly parallel machines employing MIMD architectures pose a great challenge in the area of software, particularly the application software.

Hypercubes run multiple programmes that operate on multiple sets of data. Within the machine each processing element, called a node, is independent. Each node has its own memory, floating point hardware, communication processor and a copy of the operating system and application programme. The Hypercube architecture was demonstrated for the first time by the California Institute of Technology (CALTECH) during the year 1983 by constructing a 64-node machine with 8086/8087 processors. Hypercubes can be thought of as a cube of any dimension with a node at each corner: for example, a two dimensional hypercube contains four nodes connected by communication lines to form a square. In the three dimensional architecture, 8 nodes are connected into a cube. The number of processors in the hypercube architecture is always a power of two [...]


The butterfly switch is a group of cross bar switches. Assuming that 16 processors have to be connected to 16 memory banks using 4X4 cross bar switches, the PEs can be connected to the 16 inputs of the 4 switches on the PE side, and the 16 outputs of these switches will go to the 16 inputs of 4 such cross bar switches on the memory side, in such a way that one output of each PE side switch goes to one input of each memory side switch. Each output of the memory side switches will go to one memory bank. This can also be extended to connect more processors using an intermediate switching stage, which is known as a multi-stage switching network scheme. This scheme can also be used for connecting processors to processors in a message passing topology.
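The 16 by 16 arrangement described above can be checked with a short sketch. The numbering is an assumption made here: processors and memory banks are numbered 0 to 15, processor `p` enters PE side switch `p // 4`, and bank `b` hangs off memory side switch `b // 4`. The function names are hypothetical.

```python
K = 4            # ports per cross bar switch (4X4), as in the text
N = K * K        # 16 processors and 16 memory banks


def route(pe, bank):
    """Path taken from processor `pe` to memory bank `bank`."""
    if not (0 <= pe < N and 0 <= bank < N):
        raise ValueError("pe and bank must be in the range 0..15")
    pe_switch = pe // K          # PE side switch fed by this processor
    mem_switch = bank // K       # memory side switch that owns the bank
    return {
        "pe_switch": pe_switch,
        "pe_switch_output": mem_switch,   # the single link towards mem_switch
        "mem_switch": mem_switch,
        "mem_switch_input": pe_switch,
        "mem_switch_output": bank % K,    # port wired to the bank
    }


def share_link(request_a, request_b):
    """True if two (pe, bank) requests need the same inter-stage link."""
    a, b = route(*request_a), route(*request_b)
    return (a["pe_switch"], a["mem_switch"]) == (b["pe_switch"], b["mem_switch"])


def crosspoints():
    """Crosspoint counts: two-stage network versus one full cross bar."""
    two_stage = 2 * K * (K * K)  # 8 switches with 16 crosspoints each
    full_crossbar = N * N
    return two_stage, full_crossbar


if __name__ == "__main__":
    print(route(5, 14))
    print(crosspoints())
```

Every processor can reach every bank, and the two-stage network needs 128 crosspoints against 256 for a single 16X16 cross bar. The saving has a price in this model: two requests from the same PE side switch to banks on the same memory side switch compete for one link, which `share_link` reports.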

Fig. 6 "Hypercube" Architecture

The parallel software [...] would waste a large amount of hardware performance. The real performance of the parallel computer depends to a great extent on the software of the system.

Operating Systems

As is well known, the operating system in any computer manages hardware resources, delivers resources to the programme, schedules and coordinates users and jobs, and maximises the utilisation of the processing elements.

In the parallel computer arena, proprietary operating systems were used by a number of vendors till recently. UNIX is emerging as an industry standard operating system after a considerable amount of work has been carried out to add parallel software extensions to UNIX. Sequent and BBN Advanced Computers have already announced parallel processing systems with the UNIX operating system. Cray Research Inc. has released two versions of UNIX for its machines, and most of the supercomputer manufacturers have UNIX or UNIX like operating systems running on their machines. A number of leading companies such as Cray, ETA, Hewlett-Packard and other organisations are working together to arrive at a standard UNIX operating system and for interfacing FORTRAN with the standard UNIX [4]. This is mainly due to the users' demand to have a known operating system.

Compilers

The compilers convert the source code of the application programme into machine code for execution. The efficiency of the compiler determines the efficiency of exploiting parallelism in the architecture. Sometimes, the efficiency of the compiler may be lost because of bad programming. Good programmes can also be spoiled by inefficient compilers. The ideal situation would be to have efficient programmes and efficient compilers. Extensive and successful vectorising, optimising and parallelising compilers have been developed by Convex, Alliant and Multiflow. [...] has even licensed compilers to other companies. The Japanese have developed automatic vectorising compilers, using which it is relatively easy to run existing FORTRAN programmes with a supercomputer's vector processor.

Application Software

The key to the success of parallel processing lies in the speedy development of application software. Although more than 100 application packages are reported to have been released by Alliant and Convex, whose architectures are coarse grained, there are very few application packages available for the massively parallel systems. Thus a lot needs to be done in developing applications before the massively parallel systems become popular.

IV. ALGORITHMS

R&D efforts in parallel algorithms date back a decade or more, when vector supercomputers such as the Cray-1 and Cyber 205 and array processors like ICL's Distributed Array Processor created the impetus for an enormous amount of work on parallel algorithms. A number of mathematicians and scientists from other disciplines are devoting a substantial amount of their time to designing parallel algorithms to solve problems which were hitherto solved serially, and to achieve linear speed ups. To do this, the algorithm developer and programmer have to think in parallel.
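What a linear speed up means can be made concrete with the usual definitions, which the article does not spell out and which are assumed here: speed up is the serial run time divided by the parallel run time, and efficiency is the speed up divided by the number of processors. The timings in the example are hypothetical.

```python
def speed_up(serial_time, parallel_time):
    """Ratio of serial run time to parallel run time."""
    if serial_time <= 0 or parallel_time <= 0:
        raise ValueError("run times must be positive")
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, processors):
    """Speed up per processor; 1.0 corresponds to a linear speed up."""
    if processors < 1:
        raise ValueError("at least one processor is required")
    return speed_up(serial_time, parallel_time) / processors


if __name__ == "__main__":
    # Hypothetical timings in seconds for a job run on 4 processors.
    print(speed_up(120.0, 30.0), efficiency(120.0, 30.0, 4))   # linear case
    print(speed_up(120.0, 40.0), efficiency(120.0, 40.0, 4))   # sub-linear case
```

A job that takes 120 seconds serially and 30 seconds on 4 processors shows a linear speed up; if it takes 40 seconds instead, a quarter of the added hardware is in effect wasted.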

V. PERFORMANCE EVALUATION

The power of a scientific computer is measured by its speed in calculating floating point (FLOP) numbers, its storage capacity (real memory/virtual memory), disc capacity, I/O speed, the precision and range of numbers that it calculates, and the transport rate of calculated numbers (band-width) to and from various elements of the system.

At the same time, performance evaluation is not necessarily an isolated task involving the computer alone, but a complex interplay among the following [9]:

- Applications
- Size of the problem
- Algorithm to solve the problem
- Quantum of human effort needed to optimise the programme
- Ability of the compiler to optimise the problems
- The integrity of the operating system and, of course, the architecture.

The MIPS or megaflops rating quoted by most of the manufacturers of computers is the peak performance. The system performance is not a direct unit of measurement since it may be specific to a particular application. For better evaluation, bench mark programmes should be run on the systems. The best bench mark is to run one's own application. It is difficult and costly to do that since, at the time of the selection of the computer, the application programme may not be ready, the development of the application programme being dependent on the computer architecture.

The second best method is to use a bench mark programme which closely resembles the application mix of the buyer. The third best way, which is most [...] by the computer community, is to evaluate the system by running standard industry bench marks. Most often the Whetstone bench marks, which are a standard collection of Fortran routines for carrying out computations, are used. The other series meant for evaluating supercomputer performance are [...]

[...] ways to discover the inherent parallelism within a problem and arrive at a method of solution to create [...] parallelism. This is not a trivial task, and only a solution to this problem will enable widespread adoption of parallel processing.

REFERENCES

[1] Christopher Lazou, Supercomputers and their Use, Clarendon Press, Oxford, 1986.

[2] Les Hatton et al., "The Seismic Kernel System - A Large-Scale Exercise in Fortran 77 Portability", Software-Practice and Experience, Vol. 18 (4), pp 301-329, April 1988.

[3] Johna Till, "Computer System Architecture", Electronics Design, January 12, 1988, pp 50-56.

[4] Special issue, "SUPERCOMPUTERS", Electronics, March 3, 1988, pp 51-56.

[5] Carl D. Howe et al., "How to Programme Parallel Processors", IEEE Spectrum, September 1987.

[6] S.Y. Kung, VLSI Array Processors, Prentice Hall, Englewood Cliffs, N.J., pp 632, 1988.

[7] Mohan Tambe, [...] and K. Senthil Kumar, "Virtual Array Processor", Report presented at the [...] Conference on Supercomputers and Applications held at Pune on April 25-26, [...].

[8] P. Wiley, "A Parallel Architecture comes of Age at Last", IEEE Spectrum, June 1987, pp 46-50.

[9] Jack Dongarra et al., "Computer Benchmarking: Paths and Pitfalls", IEEE Spectrum, July 1987, pp 38-43.

[...] for over two decades, the first commercial machines started [...] a dozen suppliers in the United States of America such as ENCORE, NCUBE, [...], Teradata, etc., and [...]

meant f0rcalled tlte[-iverrnore

i i it i nto.I loops devcloped at th'e Lawr"atrce

il[bor"toti*t

i,,ffr" widell, uscd Dench ma,-rk is tlie l-inpack.prc-

id iJ ; ; - . ;evelopc ' l ,uv ' l ' . i )u"gorro : " Argot t t te

iiilii;;;i Laboratoriei, illi 'rt' ' is' l'ncse arc a sclics o[r l<tr 'ru'or

rolving l i t tcar ccluati ' t rs altd l 'uuEoitr'an Progranllnes s

frilii;tJ"' ra'nge ot cornputer sy.stenrs ft",'ii ,:lll]"::ilI;;;;r';""ict' Linpack '"n''ilt' are publislrcd as

t / i r ,thly rePorts
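How a megaflops figure is derived from a bench mark run can be sketched as follows. The operation count used, 2n^3/3 + 2n^2 floating point operations for solving n linear equations, is the count conventionally credited in Linpack reporting; it is not stated in the article and is included here as an assumption, as are the function names and the sample figures.

```python
def linpack_flops(n):
    """Floating point operations conventionally credited for solving
    an n x n system of linear equations (assumed count, see lead-in)."""
    return (2.0 * n ** 3) / 3.0 + 2.0 * n ** 2


def megaflops(flop_count, seconds):
    """Sustained rate in millions of floating point operations per second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / (seconds * 1.0e6)


def fraction_of_peak(sustained_mflops, peak_mflops):
    """Share of the manufacturer's peak rating actually achieved."""
    if peak_mflops <= 0:
        raise ValueError("peak rating must be positive")
    return sustained_mflops / peak_mflops


if __name__ == "__main__":
    # Hypothetical run: a 100 x 100 system solved in 0.05 seconds
    # on a machine with a quoted peak of 200 megaflops.
    rate = megaflops(linpack_flops(100), 0.05)
    print(rate, fraction_of_peak(rate, 200.0))
```

The gap between the sustained rate measured this way and the peak rating quoted by the manufacturer is the reason the text recommends running bench marks rather than relying on the quoted figure.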

VI. CONCLUSION

The objective of every computer designer is to design [...] combination of a large [...] linked [...] with various interleaving features and [...]. The success of each machine [...] how well it achieves [...] the degree of parallelism achieved in solving real problems [...] the reliability of the [...] packaging, quality of software etc.

A major challenge in parallel processing is to find [...]

Editor's Note

Although research on parallel computers has been [...]. [...] Europe, Siemens and [...] entered the fray. Indian efforts at [...]. However, it is significant that the giants in the computer [...] developing parallel processing systems commenced in [...], the first in 1987 with a processor for fluid dynamics [...] crystallography and weather modelling [...]. [...] Group announced a 256 processor global memory system optimised for weather [...] using transputers as the processing elements. Others working [...] CMC and DRDO. The most comprehensive and ambitious [...] project was [...] system with power ranging from [...] mega-flops to tens of giga-flops [...] considering the [...] calibre of manpower [...] to be funded to the tune of [...] million, equivalent to US $ [...] million.