
Parallel Programming using the Iteration Space Visualizer


Yijun Yu and Erik H. D'Hollander

University of Ghent, Belgium
http://www.elis.rug.ac.be/paris/ppt

Introduction

  • Overview of the approach: interactive vs automatic
  • Loop dependence
  • Iteration Space Dependence Graph (ISDG)
  • Instrumentation and ISDG construction
  • Visualization of …
      • Dependence
      • Transformations
  • Applications and Results
  • Conclusion and Future work

Overview of the approach

[Figure: flowchart leading from Program to Code Generation along two paths. Automatic path (Parallel Compiler): Dependence Analysis, Dataflow Analysis, Program Transformation. Interactive path (Iteration Space Visualizer): Instrument the program, Construct the ISDG, Visualize dependence, Visualize transformation. Annotations on the figure: "exact?" and "why?".]


Loop Dependence

  • Nested loops are the focus of parallel programming
  • A data dependence occurs when the same memory location is accessed more than once and at least one of the accesses is a WRITE
  • A data dependence is classified as flow (first WRITE, then READ), anti (first READ, then WRITE) or output (WRITE after WRITE)
  • A loop dependence is the ordering between data-dependent loop iterations
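The classification above can be written down as a small lookup. This is only an illustrative sketch: the function name `classify_dependence` and the "R"/"W" encoding are assumptions made for the example, not part of the Iteration Space Visualizer.

```python
def classify_dependence(first_access, second_access):
    """Classify the dependence between two accesses to the same location.

    Each access is "R" (READ) or "W" (WRITE), given in execution order.
    Returns "flow", "anti" or "output"; returns None for READ after READ,
    which creates no dependence.
    """
    kinds = {
        ("W", "R"): "flow",    # first WRITE, then READ
        ("R", "W"): "anti",    # first READ, then WRITE
        ("W", "W"): "output",  # WRITE after WRITE
    }
    return kinds.get((first_access, second_access))
```

For example, `classify_dependence("W", "R")` gives `"flow"`, while two reads give `None`.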

The Iteration Space Dependence Graph (ISDG)

  • The object to be visualized is the ISDG: ISDG = Iteration Space + Loop Dependence
  • An iteration I=(i1..im) is a point in the m-D iteration space, which is mapped to the 3D space
  • Two dependent iterations I and J are linked by an arrow I -> J

An example of ISDG

      do i=1,n
        do j=1,n
          do k=1,2
            if (k.eq.1) then
              a(i,j,k)=(a(i-1,j,k)+a(i+1,j,k))/2
            else
              a(i,j,k)=(a(i,j-1,k)+a(i,j+1,k))/2
            endif
          enddo
        enddo
      enddo

[Figure: ISDG of this loop with axes i, j, k. The iteration points are labeled (1,1,1) to (4,4,1) in the plane k=1 and (1,1,2) to (4,4,2) in the plane k=2.]

Instrumentation and the ISDG construction

  • Program instrumentation
      • Loop iteration: id + indices
      • Array reference: id + name + Read | Write + subscripts
  • ISDG construction
      1. Create the iteration points from the indices
      2. Set up a reference list for every accessed location
      3. Mark the flow-, anti- and output-dependence arrows
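The three construction steps can be sketched as follows. This is a minimal illustration, not the tool's implementation: the trace format (one tuple per iteration), the names `build_isdg` and `example_trace`, and the choice to keep only the last writer and the readers since the last write for each location are all assumptions. The demo trace mimics the accesses of the three-deep example loop with a(i,j,k).

```python
from collections import defaultdict


def build_isdg(trace):
    """Build dependence arrows from an execution trace.

    trace: list of (iteration, reads, writes) in sequential order, where
    iteration is an index tuple and reads/writes are lists of locations.
    Returns a set of (source_iteration, sink_iteration, kind) arrows.
    """
    last_writer = {}              # location -> iteration that wrote it last
    readers = defaultdict(list)   # location -> readers since the last write
    arrows = set()
    for it, reads, writes in trace:
        for loc in reads:
            w = last_writer.get(loc)
            if w is not None and w != it:
                arrows.add((w, it, "flow"))
            readers[loc].append(it)
        for loc in writes:
            for r in readers[loc]:
                if r != it:
                    arrows.add((r, it, "anti"))
            w = last_writer.get(loc)
            if w is not None and w != it:
                arrows.add((w, it, "output"))
            last_writer[loc] = it
            readers[loc] = []
    return arrows


def example_trace(n):
    """Accesses of the example loop over a(i,j,k), k = 1, 2 (hypothetical helper)."""
    trace = []
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in (1, 2):
                if k == 1:
                    reads = [("a", i - 1, j, k), ("a", i + 1, j, k)]
                else:
                    reads = [("a", i, j - 1, k), ("a", i, j + 1, k)]
                trace.append(((i, j, k), reads, [("a", i, j, k)]))
    return trace


arrows = build_isdg(example_trace(4))
```

With n = 4 the arrows run along i in the plane k=1 and along j in the plane k=2, for instance a flow arrow from (1,1,1) to (2,1,1).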


Dependence Visualization

  • Loop visualization
      • 3D view-port of the iteration space
      • Graphical operations
  • Detecting and enhancing parallelism
      • Automatic parallelization
      • Maximal parallelism detection
      • Parallelization by plane execution

Loop Visualization

  • Visualization of the ISDG
      • Points + Arrows + Colors + Labels + Axes
  • 3D view-port of the iteration space
      • =3D, >3D and <3D
      • projection (condensed points and arrows)
      • expansion (dummy index dimension)
  • ISDG operations
      • Graphical operations: rotate, move and animate
      • Query dialogs: selection, variable zooming, dependence type filtering, etc.

Automatic Parallelization

  • Sequential execution
      • Traverse the iteration space in lexicographical order and count the iterations: Tseq
  • Parallel execution
      • Traverse the iterations of a marked loop in parallel and count the steps: Tpar
      • Report the speedup Spara = Tseq / Tpar
  • Automatic parallelization
      • Test whether the dependence ordering is kept for every combination of parallelized loops: DOALL i1,i2,i3? DOALL i1,i2? DOALL i1,i3? DOALL i2,i3? DOALL i1? DOALL i2? DOALL i3?
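One way to read this test is sketched below. It assumes a simple execution model that the slides do not spell out: iterations that agree on all sequential (unmarked) indices run in the same step, steps are ordered lexicographically, and a combination is valid only if every arrow goes to a strictly later step. The names `doall_speedup` and `valid_doalls` are hypothetical.

```python
from itertools import combinations


def doall_speedup(points, arrows, parallel):
    """Return Tseq / Tpar if the loops in `parallel` can be DOALL, else None.

    points: list of index tuples; arrows: list of (source, sink) pairs;
    parallel: set of loop positions (0-based) marked as DOALL.
    """
    m = len(points[0])
    seq_dims = [d for d in range(m) if d not in parallel]

    def step(p):
        # Iterations with equal sequential indices execute in the same step.
        return tuple(p[d] for d in seq_dims)

    if any(step(src) >= step(dst) for src, dst in arrows):
        return None  # some dependence would no longer be ordered
    t_par = len({step(p) for p in points})
    return len(points) / t_par


def valid_doalls(points, arrows):
    """Try every non-empty combination of loops, largest combinations first."""
    m = len(points[0])
    result = {}
    for size in range(m, 0, -1):
        for parallel in combinations(range(m), size):
            s = doall_speedup(points, arrows, set(parallel))
            if s is not None:
                result[parallel] = s
    return result


# Toy 3x3 iteration space whose dependences run along the first index only.
points = [(i, j) for i in range(1, 4) for j in range(1, 4)]
arrows = [((i, j), (i + 1, j)) for i in range(1, 3) for j in range(1, 4)]
found = valid_doalls(points, arrows)
```

In this toy case only the second loop can be marked DOALL, giving three steps for nine iterations.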

Maximal Parallelism Detection

  • Data-flow order
      • An iteration is executed as soon as its data are ready, i.e. after all the iterations it depends on have been carried out
      • Iterations with the same delay are executed at the same time, i.e. in parallel
      • Dependent iterations are executed sequentially; count the steps: Tdf
  • Minimal execution time = maximal parallelism
  • Maximal speedup Smax = Tseq / Tdf (written Sdf on later slides)
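The data-flow order amounts to a longest-path computation over the ISDG. The sketch below assumes the points are listed in sequential execution order and that every arrow points forward in that order; the name `dataflow_delays` is made up for the illustration.

```python
from collections import defaultdict


def dataflow_delays(points, arrows):
    """Assign each iteration the earliest step at which it can execute.

    points: iterations in sequential order; arrows: (source, sink) pairs
    that always point forward in that order.
    """
    preds = defaultdict(list)
    for src, dst in arrows:
        preds[dst].append(src)
    delay = {}
    for p in points:
        # All predecessors were visited earlier in sequential order.
        delay[p] = 1 + max((delay[q] for q in preds[p]), default=0)
    return delay


# Toy 3x3 wavefront: each iteration depends on its left and upper neighbour.
points = [(i, j) for i in range(1, 4) for j in range(1, 4)]
arrows = [((i, j), (i + 1, j)) for i in range(1, 3) for j in range(1, 4)]
arrows += [((i, j), (i, j + 1)) for i in range(1, 4) for j in range(1, 3)]

delay = dataflow_delays(points, arrows)
t_seq = len(points)
t_df = max(delay.values())
s_max = t_seq / t_df
```

Here the nine iterations need five data-flow steps, so the maximal speedup is 9/5.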

Plane Parallelization

  • Define a cutting plane Ax+By+Cz=D
      • by clicking at three points, or
      • by giving the parameters A, B, C, D
  • Plane execution
      • Traverse the planes d0 ≤ Ax+By+Cz < d0+Td along the normal vector (A,B,C)
  • Plane parallelization
      • Matching the dataflow execution may enhance the speedup Splane = Tseq / Td
      • Verified by cross-plane dependence checking or by 3D->2D projection checking
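Cross-plane dependence checking can be sketched as below, assuming integer index points and planes one unit apart, with Td taken as the number of occupied planes. The function name `plane_schedule` and the toy data are assumptions for the example.

```python
def plane_schedule(points, arrows, normal):
    """Group iterations by the value of A*x + B*y + C*z for normal (A, B, C).

    Returns (number_of_planes, valid). The plane execution is valid when
    every dependence arrow leads to a strictly later plane.
    """
    def level(p):
        return sum(a * x for a, x in zip(normal, p))

    valid = all(level(src) < level(dst) for src, dst in arrows)
    planes = {level(p) for p in points}
    return len(planes), valid


# Toy 3x3 wavefront in the plane z = 1, with dependences along x and y.
points = [(i, j, 1) for i in range(1, 4) for j in range(1, 4)]
arrows = [((i, j, 1), (i + 1, j, 1)) for i in range(1, 3) for j in range(1, 4)]
arrows += [((i, j, 1), (i, j + 1, 1)) for i in range(1, 4) for j in range(1, 3)]

t_d, ok = plane_schedule(points, arrows, (1, 1, 0))
s_plane = len(points) / t_d
```

The normal (1,1,0) yields five valid planes, whereas (1,0,0) fails the check because the dependences along y stay inside one plane.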

Dependence Visualization: procedural summary

[Figure: flowchart from Start to End. Steps: Maximal parallelism detection (Sdf); Automatic parallelization (Spara); Plane parallelization (Splane); Program transformation; Prune false dependences. Decisions with Yes/No branches: "Spara = Sdf?" and "Splane > Spara?".]

Program Transformations

  • When Sdf > Spara, loop transformations may enhance the parallelism of the target loop…
  • Unimodular loop transformations
      • Why? 3D -> 3D, 1-to-1, etc.
  • Loop projections and expansions
      • Loop projection: >3D -> 3D
      • Loop expansion: <3D -> 3D

Unimodular Transformations

[Figure: a 3x3 transformation matrix whose entries are initially unknown ("?"). The normal vector (A,B,C) of the cutting plane fills one row; the remaining entries are then determined ("!") subject to two conditions: the matrix is unimodular and the transformation is legal.]

  • Look for a suitable transformation
      • Interactive way
      • Automatic way
          • Possible when the array index expressions are linear and all the distance vectors lie in a plane
          • Extract the largest base vectors of the dependence distances and construct the transformation (pseudo distance matrix approach)
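The two conditions named on this slide can be checked mechanically. The sketch below uses the usual textbook criteria as an assumption: an integer matrix is unimodular when its determinant is +1 or -1, and a transformation is legal when every transformed distance vector stays lexicographically positive. The matrices and distance vectors are made-up illustrations, not data from the slides.

```python
def det3(m):
    """Determinant of a 3x3 integer matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def apply(m, v):
    """Matrix-vector product m * v."""
    return tuple(sum(row[k] * v[k] for k in range(3)) for row in m)


def lex_positive(v):
    """True if the first non-zero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_unimodular(m):
    return abs(det3(m)) == 1


def is_legal(m, distances):
    return all(lex_positive(apply(m, d)) for d in distances)


# Hypothetical dependence distance vectors and candidate transformations.
distances = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
skew = [[1, 1, 1], [0, 1, 0], [0, 0, 1]]     # first row = normal vector (1,1,1)
bad = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]     # unimodular but reverses (0,1,0)

skew_ok = is_unimodular(skew) and is_legal(skew, distances)
bad_ok = is_unimodular(bad) and is_legal(bad, distances)
```

With `skew`, every transformed distance has a positive first component, so all dependences are carried by the new outermost loop and the two inner loops can run in parallel.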

Loop Expansion

  • Non-perfectly vs perfectly nested loop
  • Statement vs iteration-level parallelism
  • Statement reordering; affine remapping
  • Loop expansion
      • Use an additional dimension to index the statements in the loop body
      • Unimodular loop transformations are still applicable at the statement level


Application and Results

  • Gauss-Jordan: linear system solver
  • Lim's example: statement-level parallelism
  • Cholesky kernel: loop projection
  • CFD application: unimodular transformation

Gauss-Jordan elimination

The original loop, with the target loop marked by C$doisv:

      do i=1,n
        do j=1,n
          if (i.ne.j) then
            f=a(j,i)/a(i,i)
C$doisv
            do k=i+1,n+1
              a(j,k)=a(j,k)-f*a(i,k)
            enddo
          endif
        enddo
      enddo

The instrumented loop (the k loop runs to n+1, as in the original):

      id=0
      do i = 1,n
        do j = 1,n
          if (i.ne.j) then
C           references made before the k loop are charged to
C           the next iteration id
            write(11,*) id+1," r ","a",2,j,i
            write(11,*) id+1," r ","a",2,i,i
            write(11,*) id+1," w ","f"," 1 0 "
            f=a(j,i)/a(i,i)
            do k = i+1,n+1
              id=id+1
C             loop iteration record: id + indices
              write(11,*) id,i,j,k
C             array reference records: id + r/w + name + subscripts
              write(11,*) id," r ","a",2,j,k
              write(11,*) id," r ","f"," 1 0 "
              write(11,*) id," r ","a",2,i,k
              write(11,*) id," w ","a",2,j,k
              a(j,k)=a(j,k)-f*a(i,k)
            enddo
          endif
        enddo
      enddo

Gauss-Jordan elimination

[Figure: ISDG of the Gauss-Jordan loop with axes I, J, K. The iteration points carry (I,J,K) labels such as (1,2,2) and (4,3,5). Highlighted plane: I = 1.]

  • DOALL J, K valid
  • Seq. time: 30
  • Dataflow: 4, Speedup: 7.5
  • Loop time: 4, Speedup: 7.5

Lim's Example

The original program:

      do l1=1,n
        do l2=1,n
          a(l1,l2)=a(l1,l2)+b(l1-1,l2)
          b(l1,l2)=a(l1,l2-1)*b(l1,l2)
        enddo
      enddo

After loop expansion:

      do l1=1,n
        do l2=1,n
c$doisv
          do l3=0,1
            if(l3.eq.0) a(l1,l2)=a(l1,l2)+b(l1-1,l2)
            if(l3.eq.1) b(l1,l2)=a(l1,l2-1)*b(l1,l2)
          enddo
        enddo
      enddo

Lim's example: unimodular transformation

Before the transformation [Figure: ISDG with axes l1, l2, l3]:

  • Plane: L1-L2+L3=0
  • DOALL L3 valid
  • Seq. time: 32
  • Dataflow: 7, Speedup: 4.57
  • Loop time: 16, Speedup: 2.00

Transformation matrix:

    |  1  -1   1 |
    |  1   0   0 |
    |  0   1   0 |

After the transformation [Figure: ISDG with axes i1, i2, i3]:

  • Plane: i1 = 0
  • DOALL i1 valid
  • Seq. time: 32
  • Dataflow: 7, Speedup: 4.57
  • Loop time: 7, Speedup: 4.57

Lim's example: code generation

C     The unimodular transformed code
      doall i1 = 1-n, n
        do i2 = max(i1,1), min(n,i1+n)
          do i3 = max(-i1+i2,1), min(-i1+i2+1,n)
            l1 = i2
            l2 = i3
            l3 = i1 - i2 + i3
            if (l3.eq.0) a(l1,l2)=a(l1,l2)+b(l1-1,l2)
            if (l3.eq.1) b(l1,l2)=a(l1,l2-1)*b(l1,l2)
          enddo
        enddo
      enddoall

The loop bounds are obtained by Fourier-Motzkin elimination from the transformation matrix

    |  1  -1   1 |
    |  1   0   0 |
    |  0   1   0 |

and the original indices l1, l2, l3 by its inversion:

    |  0   1   0 |
    |  0   0   1 |
    |  1  -1   1 |
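The generated bounds can be checked by brute force. The sketch below re-enumerates the transformed loop nest in Python for a small n and compares the recovered (l1, l2, l3) with the iterations of the loop-expanded program, taking the statement index l3 to range over 0 and 1 as in the expanded loop. The function names are hypothetical and the check is only illustrative.

```python
from collections import Counter


def transformed_iterations(n):
    """Enumerate (i1, (l1, l2, l3)) in the order of the transformed nest."""
    result = []
    for i1 in range(1 - n, n + 1):
        for i2 in range(max(i1, 1), min(n, i1 + n) + 1):
            for i3 in range(max(-i1 + i2, 1), min(-i1 + i2 + 1, n) + 1):
                # Inverse mapping: l1 = i2, l2 = i3, l3 = i1 - i2 + i3
                result.append((i1, (i2, i3, i1 - i2 + i3)))
    return result


def original_iterations(n):
    """Iterations of the loop-expanded program, with l3 in {0, 1}."""
    return [(l1, l2, l3)
            for l1 in range(1, n + 1)
            for l2 in range(1, n + 1)
            for l3 in (0, 1)]


n = 4
visited = transformed_iterations(n)
same_set = sorted(p for _, p in visited) == sorted(original_iterations(n))
largest_partition = max(Counter(i1 for i1, _ in visited).values())
```

For n = 4 the transformed nest visits each of the 32 original iterations exactly once, and the largest i1 partition holds 7 iterations, which agrees with the loop time of 7 reported for DOALL i1.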

Lim's example: code generation

Input to the Omega calculator:

    symbolic n;
    IS1:={[i,j,k]:1<=i,j<=n && k=0};
    IS2:={[i,j,k]:1<=i,j<=n && k=1};
    T1:={[i,j,k]->[i-j+k,i,j]};
    T2:={[i,j,k]->[i-j+k,i,j]};
    codegen 0 T1:IS1,T2:IS2;

Transformation matrix:

    |  1  -1   1 |
    |  1   0   0 |
    |  0   1   0 |

    I' = I - J + K
    J' = I
    K' = J

Lim's example: code generation

Transformation matrix:

    |  1  -1   1 |
    |  1   0   0 |
    |  0   1   0 |

C     the optimized code by Omega calculator
      doall p = 1-n, n
        if (p.ge.1) b(p,1) = a(p,0) * b(p,1)
        do l1 = max(p+1,1), min(p+n-1,n)
          a(l1,l1-p)   = a(l1,l1-p)+b(l1-1,l1-p)
          b(l1,l1-p+1) = a(l1,l1-p)*b(l1,l1-p+1)
        enddo
        if (p.le.0) a(p+n,n)=a(p+n,n)+b(p+n-1,n)
      enddoall

Cholesky kernel

C     THE ORIGINAL KERNEL
      DO 6 I = 0, NRHS
        DO 7 K = 0, N
          DO 8 L = 0, NMAT
    8       B(I,L,K) = B(I,L,K) * A(L,0,K)
          DO 7 J = 1, MIN (M, N-K)
            DO 7 L = 0, NMAT
    7         B(I,L,K+J) = B(I,L,K+J) - A(L,-J,K+J) * B(I,L,K)
        DO 6 K = N, 0, -1
          DO 9 L = 0, NMAT
    9       B(I,L,K) = B(I,L,K) * A(L,0,K)
          DO 6 J = 1, MIN (M, K)
            DO 6 L = 0, NMAT
    6         B(I,L,K-J) = B(I,L,K-J) - A(L,-J,K) * B(I,L,K)

After loop fusion, as a 4D loop (I,K,J,L):

      DO 1 I = 0,NRHS
      DO 1 K = 0,2*N+1
        IF (K.LE.N) THEN
          I0 = MIN(M,N-K)
        ELSE
          I0 = MIN(M,2*N-K+1)
        ENDIF
        DO 1 J = 0,I0
C$DOISV
        DO 1 L = 0,NMAT
          IF (K.LE.N) THEN
            IF (J.EQ.0) THEN
    8         B(I,L,K)=B(I,L,K)*A(L,0,K)
            ELSE
    7         B(I,L,K+J)=B(I,L,K+J)-A(L,-J,K+J)*B(I,L,K)
            ENDIF
          ELSE
            IF (J.EQ.0) THEN
    9         B(I,L,K)=B(I,L,K)*A(L,0,K)
            ELSE
    6         B(I,L,K-J)=B(I,L,K-J)-A(L,-J,K)*B(I,L,K)
            ENDIF
          ENDIF
    1 CONTINUE

Cholesky Kernel

[Figure: loop projections of the 4D loop (I,K,J,L) into 3D views, one with axes I, K, J and one with axes I, K, L. Highlighted plane: L=0.]

Cholesky kernel (I,K,J,L), with the L loop parallelized (L,I,K,J)

      DO 1 I = 0,NRHS
      DO 1 K = 0,2*N+1
        IF (K.LE.N) THEN
          I0 = MIN(M,N-K)
        ELSE
          I0 = MIN(M,2*N-K+1)
        ENDIF
        DO 1 J = 0,I0
C$DOISV
        DOALL 1 L = 0,NMAT
          IF (K.LE.N) THEN
            IF (J.EQ.0) THEN
    8         B(I,L,K)=B(I,L,K)*A(L,0,K)
            ELSE
    7         B(I,L,K+J)=B(I,L,K+J)-A(L,-J,K+J)*B(I,L,K)
            ENDIF
          ELSE
            IF (J.EQ.0) THEN
    9         B(I,L,K)=B(I,L,K)*A(L,0,K)
            ELSE
    6         B(I,L,K-J)=B(I,L,K-J)-A(L,-J,K)*B(I,L,K)
            ENDIF
          ENDIF
    1 CONTINUE

CFD application

  • Computational Fluid Dynamics (CFD)
      • Navier-Stokes equations
      • Successive Over-Relaxation (SOR) kernel
  • 3D loop: difficult to analyze
      • 172 array references/iteration
      • 33 if-branches/iteration
  • Unimodular transformation found!

CFD Application

Before the transformation [Figure: ISDG with axes i1, i2, i3; labeled points (2,1,1), (1,2,2), (1,1,4)]:

  • Range: i1 = 1,4; i2 = 1,4; i3 = 1,4
  • Plane: 3 i1 + 2 i2 + i3 = 9
  • Seq. time: 64
  • Dataflow: 19, Speedup: 3.37
  • Loop time: 64, Speedup: 1.00

Transformation matrix:

    |  3   2   1 |
    |  0   1   0 |
    |  1   0   0 |

After the transformation [Figure: ISDG with axes I1', I2', I3'; labeled points (9,1,1), (9,2,1), (9,1,2)]:

  • Range: I1' = 6,24; I2' = 1,4; I3' = 1,4
  • Plane: i1' = 9
  • DOALL i2', i3'
  • Dataflow: 19, Speedup: 3.37
  • Loop time: 19, Speedup: 3.37
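The reported ranges can be reproduced from the matrix alone. The sketch below assumes the nine matrix entries read row by row as (3,2,1), (0,1,0), (1,0,0) and applies that matrix to the 4x4x4 iteration space; it does not model the dependences of the SOR kernel, so it only confirms the index ranges and the number of planes.

```python
from itertools import product

# Transformation matrix, assuming the entries are listed row by row.
T = [[3, 2, 1],
     [0, 1, 0],
     [1, 0, 0]]


def apply(m, v):
    """Matrix-vector product m * v."""
    return tuple(sum(row[k] * v[k] for k in range(3)) for row in m)


def det3(m):
    """Determinant of a 3x3 integer matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


points = list(product(range(1, 5), repeat=3))   # i1, i2, i3 = 1..4
image = [apply(T, p) for p in points]

i1_values = sorted({q[0] for q in image})       # one sequential step per plane
t_seq = len(points)
t_plane = len(i1_values)
speedup = t_seq / t_plane
```

The new outer index runs from 6 to 24, giving 19 planes for 64 iterations and a speedup of about 3.37, in line with the figures on the slide; the labeled point (2,1,1) maps to (9,1,2).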

Conclusion and Future work

  • Allows the exact visualization of real program loops
  • Assistance with detecting parallel loops
  • Estimation of the maximal speedup using dataflow execution
  • Assistance with finding suitable loop transformations
  • Future work:
      • Seamless integration into PPT (parallel programming environment)

Thanks for your attention!

Any questions?