Juergen Ributzka, David Stephenson, Timothy Kong, Dee Lee, Fred Chow, Guang R. Gao



Overview

• SiCortex Multiprocessor

• Software Pipelining Framework

• Results

• Future Work

3/23/2009 Copyright © 2009 - Juergen Ributzka. All rights reserved.


SiCortex Multiprocessor

• RISC

• 6 cores (MIPS 5Kf)

• 500–700 MHz

• In-order execution

• Limited dual issue

• Low power (12–16 W)


Software Pipelining Framework


Figure: compiler flow. Front-End (FORTRAN, C/C++, Java) → Middle-End (Loop Nest Optimizer, Global Optimizer) → Back-End (Code Generator). Software pipelining phases: DDG → MII → MS → MVE → RA → CG.


Data Dependence Graph (DDG)

• Cyclic Data Dependence Graph

• No register anti- or output-dependences

• Only single basic blocks (BBs)


Figure: example cyclic dependence graph over statements S1 and S2.
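A minimal Python sketch of a cyclic DDG for a single basic block, assuming each edge carries a latency and an iteration distance (loop-carried edges have distance 1 or more). Like the slide, it records only true dependences. The loop body, the latencies, and the `DDG`/`Edge` names are hypothetical. A node lies on a recurrence exactly when it can reach itself:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    latency: int   # cycles dst must wait after src issues
    distance: int  # iteration distance: 0 = same iteration, 1 = next iteration


@dataclass
class DDG:
    edges: list[Edge] = field(default_factory=list)

    def add(self, src: str, dst: str, latency: int, distance: int = 0) -> None:
        self.edges.append(Edge(src, dst, latency, distance))

    def successors(self, node: str) -> list[str]:
        return [e.dst for e in self.edges if e.src == node]

    def on_recurrence(self, node: str) -> bool:
        """True if node can reach itself, i.e. it lies on a dependence cycle."""
        seen: set[str] = set()
        stack = self.successors(node)
        while stack:
            n = stack.pop()
            if n == node:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(self.successors(n))
        return False


# Hypothetical loop body (latencies are made up):
#   S1: x[i] = y[i-1] + 1
#   S2: y[i] = x[i] * 2
#   S3: z[i] = y[i]
g = DDG()
g.add("S1", "S2", latency=2)              # x[i] flows S1 -> S2, same iteration
g.add("S2", "S1", latency=3, distance=1)  # y[i] feeds S1 in iteration i+1
g.add("S2", "S3", latency=3)              # y[i] flows S2 -> S3, same iteration
```

Here S1 and S2 form a recurrence through the loop-carried edge, while S3 does not.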


Minimum Initiation Interval (MII)

• MII = max(ResMII, RecMII), limited by:

– Resource requirements (ResMII)

– Recurrence requirements (RecMII)
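The two bounds can be computed in the usual way: ResMII is, for the most heavily used resource, ceil(uses per iteration / available units), and RecMII is, over all dependence cycles, ceil(sum of latencies / sum of iteration distances). A Python sketch with a hypothetical machine model (not the actual 5Kf resource table):

```python
from __future__ import annotations

import math


def res_mii(op_resources: dict[str, list[str]], units: dict[str, int]) -> int:
    """Resource bound: busiest resource, ceil(uses per iteration / available units)."""
    uses: dict[str, int] = {}
    for resources in op_resources.values():
        for r in resources:
            uses[r] = uses.get(r, 0) + 1
    return max(math.ceil(n / units[r]) for r, n in uses.items())


def rec_mii(cycles: list[list[tuple[int, int]]]) -> int:
    """Recurrence bound: for each dependence cycle, given as (latency, distance)
    edges, ceil(total latency / total iteration distance)."""
    bound = 0
    for cycle in cycles:
        latency = sum(lat for lat, _ in cycle)
        distance = sum(dist for _, dist in cycle)
        bound = max(bound, math.ceil(latency / distance))
    return bound


def mii(op_resources, units, cycles) -> int:
    return max(res_mii(op_resources, units), rec_mii(cycles))


# Hypothetical machine: one memory unit and one floating-point unit.
units = {"mem": 1, "fpu": 1}
ops = {"ld": ["mem"], "mul": ["fpu"], "add": ["fpu"], "st": ["mem"]}
# One recurrence: an accumulating add -> add with latency 4, distance 1.
cycles = [[(4, 1)]]
```

With these numbers ResMII is 2 (two ops on each unit) and RecMII is 4, so the recurrence sets the MII.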


Modulo Scheduling (MS)

• Huff Modulo Scheduling

– Lifetime sensitive

– Uses backtracking

• Hypernode Reduction Modulo Scheduling (HRMS)

– Lifetime sensitive

– No backtracking
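For illustration only, a simple iterative modulo scheduler in Python. It places ops in a fixed priority order into a modulo reservation table at a candidate II and, when an op does not fit, gives up and retries with II + 1. It is neither Huff's algorithm nor HRMS (no lifetime-sensitive placement, no backtracking), and the ops, latencies, and unit counts are hypothetical:

```python
def modulo_schedule(ops, edges, ii, units):
    """Place ops at one II without backtracking.

    ops:   list of (name, resource) in priority order
    edges: list of (src, dst, latency, distance)
    Returns {name: cycle}, or None if some op cannot be placed at this II.
    """
    mrt = {}   # modulo reservation table: (resource, cycle % ii) -> ops using it
    time = {}
    for name, res in ops:
        earliest, latest = 0, None
        for src, dst, lat, dist in edges:
            if src == dst == name and lat > dist * ii:
                return None  # self-recurrence cannot be met at this II
            if dst == name and src in time:    # already scheduled predecessor
                earliest = max(earliest, time[src] + lat - dist * ii)
            if src == name and dst in time:    # already scheduled successor
                bound = time[dst] - lat + dist * ii
                latest = bound if latest is None else min(latest, bound)
        # ii consecutive cycles cover every row of the reservation table.
        for t in range(earliest, earliest + ii):
            if latest is not None and t > latest:
                return None
            row = (res, t % ii)
            if mrt.get(row, 0) < units[res]:
                mrt[row] = mrt.get(row, 0) + 1
                time[name] = t
                break
        else:
            return None
    return time


def iterative_schedule(ops, edges, units, mii, max_ii=64):
    """Try II = MII, MII + 1, ... until a schedule is found."""
    for ii in range(mii, max_ii + 1):
        sched = modulo_schedule(ops, edges, ii, units)
        if sched is not None:
            return ii, sched
    return None


units = {"mem": 1, "fpu": 1}
ops = [("ld", "mem"), ("mul", "fpu"), ("add", "fpu"), ("st", "mem")]
edges = [("ld", "mul", 3, 0), ("mul", "add", 4, 0),
         ("add", "st", 4, 0), ("add", "add", 2, 1)]   # add -> add: accumulator
result = iterative_schedule(ops, edges, units, mii=2)
# result == (2, {"ld": 0, "mul": 3, "add": 8, "st": 13})
```

At II = 1 the accumulator recurrence (latency 2, distance 1) cannot be met, so the first successful II is 2. Huff's scheduler would instead unschedule conflicting ops and retry at the same II.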


Modulo Variable Expansion (MVE)

• Separate TN (temporary name) set for each overlapping iteration

• The number of kernel unrollings depends on the longest lifetime


Figure: overlapping iterations over time (cycles); consecutive iterations alternate between TN sets:

  TN10 op1 TNXX…TN20 op2 TN10 TNYY

  TN11 op1 TNXX…TN21 op2 TN11 TNYY

  TN10 op1 TNXX…TN20 op2 TN10 TNYY
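A Python sketch of modulo variable expansion under simplifying assumptions: the unroll factor is the longest lifetime divided by II, rounded up, and every listed TN is renamed in each kernel copy. The `.k` suffix stands in for the figure's TN10/TN11 numbering. Lifetimes and II are hypothetical, and remainder iterations are not handled:

```python
import math


def unroll_factor(lifetimes, ii):
    """Kernel copies needed so that no value is redefined before its last use:
    the longest lifetime (in cycles) divided by II, rounded up."""
    return max(math.ceil(lt / ii) for lt in lifetimes.values())


def expand(kernel, renamed, factor):
    """Return `factor` copies of the kernel. Copy k renames every TN t in
    `renamed` to f"{t}.{k}", giving each in-flight iteration its own TN set."""
    copies = []
    for k in range(factor):
        def ren(t, k=k):
            return f"{t}.{k}" if t in renamed else t
        copies.append([(ren(dst), op, tuple(ren(s) for s in srcs))
                       for dst, op, srcs in kernel])
    return copies


# Kernel instructions as (dest, opcode, sources), mirroring the figure.
kernel = [("TN10", "op1", ("TNXX",)),
          ("TN20", "op2", ("TN10", "TNYY"))]
# Hypothetical numbers: II = 2, TN10 lives 3 cycles, TN20 lives 1 cycle.
factor = unroll_factor({"TN10": 3, "TN20": 1}, ii=2)
copies = expand(kernel, {"TN10", "TN20"}, factor)
```

Because TN10 lives longer than one II, two kernel copies are needed; copy 1 plays the role of the TN11/TN21 row in the figure.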


Register Allocation (RA)

• Only kernels are fully register allocated

• No regions

– SWP kernels are no longer a black box for the global and local register allocators (GRA and LRA)

• Currently no register spilling support
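A greedy sketch of kernel register assignment in Python, assuming lifetimes are given in kernel cycles after MVE and may wrap around the kernel's back edge, and that a register can be reused in the cycle where the previous value's last use issues. Matching the current lack of spilling, it simply fails when registers run out. Value names, register names, and lifetimes are hypothetical; this is not the allocator described on the slide:

```python
def allocate_kernel(lifetimes, kernel_len, registers):
    """Greedy register assignment for values live in a software-pipelined kernel.

    lifetimes:  {value: (start, end)} in kernel cycles, end exclusive; end may
                exceed kernel_len when a value stays live across the back edge.
    Returns {value: register}, or None when registers run out (no spilling).
    """
    occupied = {r: set() for r in registers}   # kernel cycles each register is busy
    assignment = {}
    for value, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        slots = {t % kernel_len for t in range(start, end)}   # wrap around the kernel
        for r in registers:
            if not occupied[r] & slots:
                occupied[r] |= slots
                assignment[value] = r
                break
        else:
            return None   # would need a spill: give up on SWP for this loop
    return assignment


lifetimes = {"a": (0, 3), "b": (2, 5), "c": (3, 4)}
two_regs = allocate_kernel(lifetimes, kernel_len=4, registers=["r1", "r2"])
one_reg = allocate_kernel(lifetimes, kernel_len=4, registers=["r1"])
```

Value b wraps from cycle 2 past the end of the 4-cycle kernel back to cycle 0, so it conflicts with a; with a single register the sketch fails instead of spilling.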


Code Generation (CG)

• Prologues and epilogues are only partially register allocated

• Several epilogues are needed because the unrolled kernel copies use different register sets


Figure: prologue, kernel and epilogue blocks P, P0, K0, K1, E0, E1, E2 and E.
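How prologue, kernel, and epilogue can be derived from stage numbers (stage = issue cycle // II), sketched in Python with hypothetical ops. With S stages, the prologue starts S - 1 iterations, the kernel runs all stages, and the epilogue drains the rest. The sketch emits a single epilogue and ignores the per-copy register sets that, per the slide, require several epilogues:

```python
def pipeline_blocks(stage_of, n_stages):
    """Split a modulo schedule into prologue, kernel and epilogue.

    stage_of: {op: stage}, where stage = issue cycle // II.
    Each block is a list of steps (one step = II cycles); each step lists
    (op, age) pairs, where age s means the op belongs to the iteration that
    entered the pipeline s steps earlier.
    """
    by_stage = {}
    for op, s in stage_of.items():
        by_stage.setdefault(s, []).append(op)

    def step(stages):
        return [(op, s) for s in stages for op in by_stage.get(s, [])]

    prologue = [step(range(0, j + 1)) for j in range(n_stages - 1)]   # ramp up
    kernel = [step(range(n_stages))]                                   # steady state
    epilogue = [step(range(j, n_stages)) for j in range(1, n_stages)]  # drain
    return prologue, kernel, epilogue


pro, ker, epi = pipeline_blocks({"ld": 0, "mul": 1, "st": 2}, n_stages=3)
```

For three stages the prologue has two steps (ld, then ld + mul), the kernel issues all three ops, and the epilogue finishes the last two iterations (mul + st, then st).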


Benchmark Results


Figure: speedup on the NAS Parallel Benchmarks (BT, CG, EP, FT, IS, LU, MG, SP, UA) for ABI n32 and ABI 64; y-axis from 0.9 to 1.2.


Benchmark Results


Figure: speedup on SPEC2006 for ABI n32 and ABI 64; y-axis from 0.95 to 1.05.

Floating-point: 410.bwaves, 433.milc, 434.zeusmp, 435.gromacs, 436.cactusADM, 437.leslie3d, 444.namd, 447.dealII, 450.soplex, 453.povray, 454.calculix, 459.GemsFDTD, 465.tonto, 470.lbm, 481.wrf, 482.sphinx3.

Integer: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar.


Conclusion and Future Work

• Register spilling support is needed so that more loops can be software pipelined

• Screen out loops with small trip counts at run time to reduce SWP overhead

• Hit-under-miss support to overcome current hardware limitations
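One way to screen out short loops at run time is a trip-count guard in front of the pipelined code. A Python sketch, assuming a prologue/kernel/epilogue layout that starts n_stages - 1 iterations in the prologue, so the kernel runs trip_count - (n_stages - 1) times. The `choose_version` name and the `threshold` parameter are hypothetical tuning knobs, not part of the described compiler:

```python
def choose_version(trip_count, n_stages, threshold=None):
    """Pick the loop version to run for a given trip count.

    Returns ("swp", kernel_trips) for the software-pipelined version or
    ("plain", trip_count) for the original loop. The pipelined version needs
    at least n_stages iterations so that the kernel runs at least once;
    `threshold` (hypothetical) can raise that cut-off to amortize the
    prologue/epilogue overhead.
    """
    if threshold is None:
        threshold = n_stages
    if trip_count < max(threshold, n_stages):
        return ("plain", trip_count)
    return ("swp", trip_count - (n_stages - 1))
```

For example, with three stages a 10-iteration loop runs the kernel 8 times, while a 2-iteration loop falls back to the original code.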


Questions?
