Matrix Multiplication
Instructor: Dr Sushil K Prasad
Presented By: R. Jayampathi Sampath







Outline

Introduction
Hypercube Interconnection Network
The Parallel Algorithm
Matrix Transposition
Communication Efficient Matrix Multiplication on Hypercubes (the paper)


Introduction
Matrix multiplication is an important algorithm design problem in parallel computation.
Matrix multiplication on a hypercube:
- Diameter is smaller: log(p)
- Degree = log(p)
The straightforward RAM algorithm for matrix multiplication requires O(n^3) time.
- Sequential algorithm:
    for (i = 0; i < n; i++) {
      for (j = 0; j < n; j++) {
        t = 0;
        for (k = 0; k < n; k++) {
          t = t + a[i][k] * b[k][j];
        }
        c[i][j] = t;
      }
    }


Hypercube Interconnection Network
(Figure: a 4-dimensional hypercube; each node is labelled with its 4-bit address, 0000 through 1111.)


Hypercube Interconnection Network (contd.)
The formal specification of a Hypercube Interconnection Network:
- Let N = 2^g processors p_0, p_1, ..., p_{N-1} be available.
- Let i and i^(b) be two integers, 0 <= i, i^(b) <= N-1, whose binary representations differ only in position b, 0 <= b < g.
- Specifically, if i_{g-1} i_{g-2} ... i_{b+1} i_b i_{b-1} ... i_1 i_0 is the binary representation of i, then i_{g-1} i_{g-2} ... i_{b+1} i_b' i_{b-1} ... i_1 i_0 is the binary representation of i^(b), where i_b' is the complement of bit i_b.
- A g-dimensional hypercube interconnection network is formed by connecting each processor p_i, 0 <= i <= N-1, to p_{i^(b)} by a two-way link, for all 0 <= b < g.
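As a concrete illustration (a minimal C sketch, not part of the original slides; the function name neighbour is only illustrative), the index i^(b) is obtained by complementing bit b of i, which is a single XOR:

#include <stdio.h>

/* Return the neighbour of processor i across dimension b of the hypercube,
   i.e. i with bit b complemented. */
int neighbour(int i, int b)
{
    return i ^ (1 << b);   /* flip bit b: this is i^(b) */
}

int main(void)
{
    int g = 4;   /* 4-dimensional hypercube, N = 2^g = 16 processors */
    int i = 5;   /* processor 0101 */
    for (int b = 0; b < g; b++)
        printf("neighbour of %d across dimension %d: %d\n", i, b, neighbour(i, b));
    return 0;
}

Processor 5 (0101) is thus linked to processors 4 (0100), 7 (0111), 1 (0001) and 13 (1101).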


The Parallel Algorithm
Example (parallel algorithm):
- n = 2 = 2^1, so two 2 x 2 matrices are multiplied on N = n^3 = 2^3 = 8 processors, indexed by the one-bit fields (i, j, k).
- A = [1 2; 3 4], B = [-1 -2; -3 -4]
- Initial step: processor (0,j,k) holds a_{jk} and b_{jk}.
- Step 1.1: A(0,j,k) and B(0,j,k) are sent to processors (i,j,k), where 1 <= i <= n-1.
- Step 1.2: A(i,j,i) is sent to processors (i,j,k), where 0 <= k <= n-1.
- Step 1.3: B(i,i,k) is sent to processors (i,j,k), where 0 <= j <= n-1.
(Figure: the hypercube nodes 000-111 and the A and B values they hold after the initial step and after each sub-step of Step 1.)


The Parallel Algorithm (contd.)
Example (contd.):
- Step 2: each processor (i,j,k) computes its local product A(i,j,k) * B(i,j,k), giving the values -1, -2, -3, -6, -6, -8, -12, -16.
- Step 3: summing along i gives C(0,j,k): -7, -10, -15, -22, i.e. C = A * B = [-7 -10; -15 -22].
(Figure: the products and sums held by the hypercube nodes after Steps 2 and 3.)


The Parallel Algorithm (contd.)
Implementation of the straightforward RAM algorithm on the hypercube:
- The multiplication of two n x n matrices A, B where n = 2^q.
- Use a hypercube with N = n^3 = 2^{3q} processors.
- Each processor P_r occupies position (i,j,k), where r = i*n^2 + j*n + k, for 0 <= i,j,k <= n-1.
- If the binary representation of r is r_{3q-1} r_{3q-2} ... r_{2q} r_{2q-1} ... r_q r_{q-1} ... r_0, then the binary representations of i, j, k are r_{3q-1} ... r_{2q}, r_{2q-1} ... r_q, and r_{q-1} ... r_0 respectively.
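As an illustration (a small C sketch, not from the slides; the function name position is arbitrary), the coordinates (i, j, k) can be recovered from r with shifts and masks, since i, j and k occupy the three q-bit fields of r:

#include <stdio.h>

/* Decompose a processor index r into its position (i, j, k),
   where r = i*n*n + j*n + k and n = 2^q. */
void position(int r, int q, int *i, int *j, int *k)
{
    int mask = (1 << q) - 1;    /* q low-order bits */
    *k = r & mask;              /* bits q-1 .. 0    */
    *j = (r >> q) & mask;       /* bits 2q-1 .. q   */
    *i = r >> (2 * q);          /* bits 3q-1 .. 2q  */
}

int main(void)
{
    int q = 1, i, j, k;   /* n = 2, N = 8, as in the positioning example below */
    for (int r = 0; r < 8; r++) {
        position(r, q, &i, &j, &k);
        printf("P%d -> (%d,%d,%d)\n", r, i, j, k);
    }
    return 0;
}

For example, P5 = 101 in binary maps to position (1,0,1).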


The Parallel Algorithm (contd.)
Example (positioning):
- The multiplication of two 2 x 2 matrices A, B where n = 2 = 2^1, q = 1.
- Use a hypercube with N = n^3 = 2^{3q} = 8 processors.
- Each processor P_r occupies position (i,j,k), where r = i*2^2 + j*2 + k, for 0 <= i,j,k <= 1.
- If the binary representation of r is r_2 r_1 r_0, then the binary representations of i, j, k are r_2, r_1, r_0 respectively.


The Parallel Algorithm (contd.)
- All processors with the same index value in one of the i, j, k coordinates form a hypercube with n^2 processors.
- All processors with the same index value in two of the coordinates form a hypercube with n processors.
- Each processor has three registers A_r, B_r and C_r, also denoted A(i,j,k), B(i,j,k) and C(i,j,k).
(Figure: the eight processors 000-111; node 101 is shown with its registers A_r, B_r and C_r.)


The Parallel Algorithm (contd.)
Step 1: The elements of A and B are distributed to the n^3 processors so that the processor in position (i,j,k) will contain a_{ji} and b_{ik}.
- 1.1 Copies of the data initially in A(0,j,k) and B(0,j,k) are sent to the processors in positions (i,j,k), where 1 <= i <= n-1, resulting in A(i,j,k) = a_{jk} and B(i,j,k) = b_{jk} for 0 <= i <= n-1.
- 1.2 Copies of the data in A(i,j,i) are sent to the processors in positions (i,j,k), where 0 <= k <= n-1, resulting in A(i,j,k) = a_{ji} for 0 <= k <= n-1.
- 1.3 Copies of the data in B(i,i,k) are sent to the processors in positions (i,j,k), where 0 <= j <= n-1, resulting in B(i,j,k) = b_{ik} for 0 <= j <= n-1.
Step 2: Each processor in position (i,j,k) computes the product C(i,j,k) = A(i,j,k) * B(i,j,k).
Step 3: The sum C(0,j,k) = ∑ C(i,j,k) over 0 <= i <= n-1 is computed for 0 <= j,k <= n-1.
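To make the data movement concrete, here is a minimal sequential C sketch (an illustration only, not from the slides: the n^3 processors are simulated by three-dimensional arrays, and the 2 x 2 matrices of the earlier example are used; a real implementation would exchange these values over the hypercube links):

#include <stdio.h>
#define N 2                       /* n = 2, so n^3 = 8 simulated processors */

int main(void)
{
    double a[N][N] = {{1, 2}, {3, 4}};
    double b[N][N] = {{-1, -2}, {-3, -4}};
    double A[N][N][N], B[N][N][N], C[N][N][N];

    /* Initially, processor (0,j,k) holds a[j][k] and b[j][k]. */
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++) {
            A[0][j][k] = a[j][k];
            B[0][j][k] = b[j][k];
        }

    /* Step 1.1: broadcast the i = 0 plane to all i. */
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++) {
                A[i][j][k] = A[0][j][k];
                B[i][j][k] = B[0][j][k];
            }

    /* Step 1.2: A(i,j,k) = A(i,j,i), i.e. a[j][i], for all k. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double v = A[i][j][i];
            for (int k = 0; k < N; k++) A[i][j][k] = v;
        }

    /* Step 1.3: B(i,j,k) = B(i,i,k), i.e. b[i][k], for all j. */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            double v = B[i][i][k];
            for (int j = 0; j < N; j++) B[i][j][k] = v;
        }

    /* Step 2: local products. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j][k] = A[i][j][k] * B[i][j][k];

    /* Step 3: sum along i into C(0,j,k); prints C = A*B = [-7 -10; -15 -22]. */
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++) {
            double s = 0;
            for (int i = 0; i < N; i++) s += C[i][j][k];
            printf("c[%d][%d] = %g\n", j, k, s);
        }
    return 0;
}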


The Parallel Algorithm (contd.)
Analysis:
- Steps 1.1, 1.2, 1.3 and 3 each consist of q constant-time iterations.
- Step 2 requires constant time.
- So T(n^3) = O(q) = O(log n).
- Cost = p * T(p) = O(n^3 log n).
- Not cost optimal, since the sequential algorithm already runs in O(n^3) time.


Example (n = 4 = 2^2): the number of processors is N = n^3 = 4^3 = 64, indexed by the two-bit fields (i, j, k), i.e. XX,XX,XX.
(Figure: copies of the data initially in A(0,j,k) and B(0,j,k).)


(Figure, n = 4 example: A(0,j,k) and B(0,j,k) are sent to processors in positions (i,j,k), where 1 <= i <= n-1.)




(Figure, n = 4 example: the senders of A. Copies of the data in A(i,j,i) are sent to processors in positions (i,j,k), where 0 <= k <= n-1.)


(Figure, n = 4 example: the senders of B. Copies of the data in B(i,i,k) are sent to processors in positions (i,j,k), where 0 <= j <= n-1.)


Matrix Transposition
- The number of processors used is N = n^2 = 2^{2q}, and processor P_r occupies position (i,j), where r = i*n + j and 0 <= i,j <= n-1.
- Initially, processor P_r holds element a_{ij} of matrix A, where r = i*n + j.
- Upon termination, processor P_s holds element a_{ij}, where s = j*n + i.
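A small observation (a C sketch, not from the slides; the function name transpose_dest is illustrative): since r = i*n + j packs i into the high q bits and j into the low q bits (n = 2^q), the destination index s = j*n + i is obtained simply by swapping the two q-bit halves of r:

/* Destination processor of the element held by processor r after the
   transpose: r = i*n + j moves to s = j*n + i, i.e. the high and low
   q-bit halves of r are exchanged. */
int transpose_dest(int r, int q)
{
    int mask = (1 << q) - 1;
    int i = r >> q;        /* high q bits */
    int j = r & mask;      /* low q bits  */
    return (j << q) | i;
}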


Matrix Transposition (contd.)
A recursive interpretation of the algorithm:
- Divide the matrix into four (n/2) x (n/2) sub-matrices.
- At the first level of recursion:
  - The elements of the bottom-left sub-matrix are swapped with the corresponding elements of the top-right sub-matrix.
  - The elements of the other two sub-matrices are untouched.
- The same step is then applied to each of the four (n/2) x (n/2) sub-matrices.
- This continues until 2 x 2 matrices are transposed.
Analysis:
- The algorithm consists of q constant-time iterations.
- T(n) = O(log n)
- Cost = n^2 log n: not cost optimal (the transpose takes only n(n-1)/2 operations on an n x n matrix on the RAM, by swapping a_{ij} with a_{ji} for all i < j).
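A minimal sequential C sketch of this recursive scheme (an illustration only, not from the slides; on the hypercube each element swap would be a parallel exchange between the corresponding processors, and the function name transpose is arbitrary; a numeric 4 x 4 matrix is used in place of the lettered example):

#include <stdio.h>

/* Recursively transpose the s x s sub-matrix whose top-left corner is
   (r0, c0): swap the bottom-left quadrant with the top-right quadrant,
   then recurse on each of the four quadrants. */
void transpose(double *m, int n, int r0, int c0, int s)
{
    if (s < 2) return;
    int h = s / 2;
    for (int i = 0; i < h; i++)
        for (int j = 0; j < h; j++) {
            double tmp = m[(r0 + h + i) * n + (c0 + j)];          /* bottom-left */
            m[(r0 + h + i) * n + (c0 + j)] =
                m[(r0 + i) * n + (c0 + h + j)];                    /* top-right   */
            m[(r0 + i) * n + (c0 + h + j)] = tmp;
        }
    transpose(m, n, r0,     c0,     h);
    transpose(m, n, r0,     c0 + h, h);
    transpose(m, n, r0 + h, c0,     h);
    transpose(m, n, r0 + h, c0 + h, h);
}

int main(void)
{
    double m[4 * 4] = { 1,  2,  3,  4,
                        5,  6,  7,  8,
                        9, 10, 11, 12,
                       13, 14, 15, 16 };
    transpose(m, 4, 0, 0, 4);
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) printf("%4g", m[i * 4 + j]);
        printf("\n");
    }
    return 0;
}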


Matrix Transposition (contd.)
Example:
A =
[ 1  b  h  v ]
[ e  2  x  y ]
[ c  d  3  w ]
[ f  g  z  4 ]
(Figure: the matrix shown at successive stages of the recursion, panels 1.0, 1.1, 1.2 and 1.3, as the quadrant swaps are applied.)


Outline

2D Diagonal Algorithm
The 3-D Diagonal Algorithm


2D Diagonal Algorithm
Step 1: A (4 x 4) is partitioned into column blocks A*0 .. A*3 and B (4 x 4) into row blocks B0* .. B3*; the pair (A*i, Bi*) is placed on the i-th diagonal processor of the 4 x 4 processor grid.
(Figure: the initial placement of A*i and Bi* along the diagonal.)


2D Diagonal Algorithm (Contd.)
Step 2: A*i is broadcast to all processors in row i (one-to-all broadcast along the row), so row i holds A*i everywhere, with Bi* still on the diagonal.
Step 3: Bi* is distributed along row i by a one-to-all personalized broadcast, so processor (i,j) receives Bij; each processor then multiplies its blocks A*i and Bij.
Step 4: the partial products are summed down each column, producing the blocks C0j, C1j, C2j, C3j of column j of C.
(Figure: the contents of the 4 x 4 processor grid after Steps 2, 3 and 4.)
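Expressed as a formula (a restatement of Steps 2-4 above in the block notation of the figure, not from the slides): after the broadcasts, processor (i, j) holds A*i and Bij and forms the partial product A*i * Bij, and the reduction down column j then yields the j-th column block of C:

    C*j = A*0 * B0j + A*1 * B1j + A*2 * B2j + A*3 * B3j,   j = 0, 1, 2, 3.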


2D Diagonal Algorithm (Contd.)
- The above algorithm can be extended to a 3-D mesh embedded in a hypercube, with A*i and Bi* initially distributed along the third dimension z.
- Processor p_{iik} holds the sub-blocks of A_{ki} and B_{ik}.
- The one-to-all personalized broadcast of Bi* is then replaced by a point-to-point communication of B_{ik} from p_{iik} to p_{kik}.
- This is followed by a one-to-all broadcast of B_{ik} from p_{kik} along the z direction.


The 3-D Diagonal Algorithm
- A hypercube consisting of p processors can be visualized as a 3-D mesh of size ∛p x ∛p x ∛p.
- Matrices A and B are partitioned into p^(2/3) blocks, with ∛p blocks along each dimension.
- Initially, it is assumed that A and B are mapped onto the 2-D plane x = y.
- Processor p_{iik} contains the blocks of A_{ki} and B_{ki}.

The 3-D Diagonal Algorithm (contd.)
The algorithm consists of 3 phases:
- Phase 1: Point-to-point communication of B_{ki} by p_{iik} to p_{ikk}.
- Phase 2: One-to-all broadcast of the blocks of A along the x direction and of the newly acquired blocks of B along the z direction. Now processor p_{ijk} has the blocks of A_{kj} and B_{ji}, and each processor calculates the product of its blocks of A and B.
- Phase 3: Reduction by adding the result sub-matrices along the z direction.


The 3-D Diagonal Algorithm (contd.)
Analysis:
- Phase 1: Passing messages of size n^2/p^(2/3) requires log(∛p) * (t_s + t_w * (n^2/p^(2/3))) time, where t_s is the time it takes to start up a message and t_w is the time it takes to send a word from one processor to its neighbour.
- Phase 2: Takes twice as much time as Phase 1.
- Phase 3: Can be completed in the same amount of time as Phase 1.
- Overall, the algorithm takes (4/3) log p message start-ups and transfers (n^2/p^(2/3)) * (4/3) log p words.
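Putting the three phases together (a restatement of the analysis above, not from the slides; Phase 2 counts twice and Phases 1 and 3 once each), the total communication time is approximately

    T_comm ≈ 4 * log(∛p) * (t_s + t_w * n^2/p^(2/3))
           = (4/3) log p * t_s + (4/3) log p * (n^2/p^(2/3)) * t_w,

which is the pair of (4/3) log p start-ups and (n^2/p^(2/3)) * (4/3) log p words quoted above.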


Bibliography
- Akl, S.G., Parallel Computation: Models and Methods, Prentice Hall, 1997.
- Gupta, H. and Sadayappan, P., "Communication Efficient Matrix Multiplication on Hypercubes", Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, August 1994, pp. 320-329.
- Quinn, M.J., Parallel Computing: Theory and Practice, McGraw-Hill, 1997.