AMNESIAC: Amnesic Automatic Computer€¦ · AMNESIAC: Amnesic Automatic Computer Trading...

Preview:

Citation preview

AMNESIAC: Amnesic Automatic ComputerTrading Computation for Communication for Energy Efficiency

Ismail Akturk and Ulya R. KarpuzcuUniversity of Minnesota, Twin Cities

{aktur002, ukarpuzc}@umn.edu

Motivation

Energy of on chip communication normalized to computation

Technology Node 40nm

1.55x 5.77x

10nm

AMNESIAC: Amnesic Automatic Computer

Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).

2

Motivation

(Re)computing becomes cheaper than memorizing and retrieving data

AMNESIAC: Amnesic Automatic Computer

Keckler, S. W., Dally, W. J., Khailany, B., Garland, M., and Glasco, D. “GPUs and the Future of Parallel Computing”, IEEE Micro 31, 5 (2011).

3

Energy of on chip communication normalized to computation

Technology Node 40nm

1.55x 5.77x

10nm

Amnesic* Automatic Computer• Idea: Replace energy-hungry load with sequence of instructions

4

*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)

AMNESIAC: Amnesic Automatic Computer

Amnesic* Automatic Computer• Idea: Replace energy-hungry load with sequence of instructions• Scope

• How to determine which data values to recompute• How to identify instructions to recompute data values• How to protect architectural state during recomputation• How to orchestrate recomputation at runtime

5AMNESIAC: Amnesic Automatic Computer

*amnesia: noun a partial or total loss of memory (origin late 18th cent.: from Greek amnēsia ‘forgetfulness’)

How to determine which data values to recompute

6

core

L1

L2

memory

load A

core

L1

L2

memory

load B

core

L1

L2

memory

load C

A

B

C

Eld,B ? Erc,B Eld,C ? Erc,CEld,A ? Erc,A

AMNESIAC: Amnesic Automatic Computer

Recomputation Slice (RSlice)

7AMNESIAC: Amnesic Automatic Computer

Overview• Amnesic Compiler

• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN

8AMNESIAC: Amnesic Automatic Computer

Overview• Amnesic Compiler

• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN

• Amnesic Scheduler• Decides when to recompute• Can use various policies

• Compiler• First-Level Cache Miss - FLC• Last-Level Cache Miss - LLC

9AMNESIAC: Amnesic Automatic Computer

Overview• Amnesic Compiler

• Identifies independent RSlices• Annotates RSlices: RCMP, REC, RTN

• Amnesic Scheduler• Decides when to recompute• Can use various policies

• Compiler• First-Level Cache Miss - FLC• Last-Level Cache Miss - LLC

• Amnesic Microarchitecture• Prevents corruption of architectural state• Makes RSlice inputs available at the time of recomputation

10AMNESIAC: Amnesic Automatic Computer

Amnesic Microarchitecture

11

Source and Destination Registers

SFile

AMNESIAC: Amnesic Automatic Computer

Amnesic Microarchitecture

12

Source and Destination Registers

RegisterFile[#] -> SFile[#]

Renamer SFile

AMNESIAC: Amnesic Automatic Computer

Amnesic Microarchitecture

13

Leaf Address Non-Recomputable Inputs

v

Source and Destination Registers

RegisterFile[#] -> SFile[#]

Hist

Renamer SFile

v …

AMNESIAC: Amnesic Automatic Computer

Amnesic Microarchitecture

14

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: Default Execution

15

0

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: Default Execution

16

01

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: During Recomputation

17

01

2

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: During Recomputation

18

01

2 3

4Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

ExecutionUnits

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: During Recomputation

19

01

2 3

4

5

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

ExecutionUnits

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: During Recomputation

20

01

2 3

4

5

6

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

ExecutionUnits

AMNESIAC: Amnesic Automatic Computer

Putting It All Together: During Recomputation

21

01

2 3

4

5

6

7

8

Leaf Address Non-Recomputable Inputs

v

RSlice Instructions

Source and Destination Registers

RegisterFile[#] -> SFile[#]

IBuff Hist

Renamer SFile

v …

MemoryorRegisterFile

Fetch Logic

ExecutionUnits

AMNESIAC: Amnesic Automatic Computer

Evaluation Setup

22

• Benchmarks: SPEC2006, NAS, Parsec, Rodinia• 33 benchmarks evaluated, 11 of them show > 10% EDP gain

• Amnesic compiler pass mimicked by binary generator Pintool• Microarchitecture and scheduler implemented in Snipersim

AMNESIAC: Amnesic Automatic Computer

Energy-Delay Product

23AMNESIAC: Amnesic Automatic Computer

mcf sx cg is ca fs fe rt bp bfs sr

Energy-Delay Product

24AMNESIAC: Amnesic Automatic Computer

mcf sx cg is ca fs fe rt bp bfs sr

Energy-Delay Product

25AMNESIAC: Amnesic Automatic Computer

mcf sx cg is ca fs fe rt bp bfs sr

Energy-Delay Product

26AMNESIAC: Amnesic Automatic Computer

mcf sx cg is ca fs fe rt bp bfs sr

mcf sx cg is ca fs fe rt bp bfs sr

Energy-Delay Product

27AMNESIAC: Amnesic Automatic Computer

up to 87% (24.92% on average ) EDP gain obtained

Instruction Mix

28

Benchmark% increase in

instruction count% decrease in load

count

mcf 4.47 6.19

sx 4.55 6.68

cg 3.97 2.11

is 17.97 49.99

ca 7.38 7.95

fs 1.83 3.08

fe 3.55 1.75

rt 1.97 6.08

bp 31.89 55.55

bfs 1.20 60.93

sr 20.02 23.33

AMNESIAC: Amnesic Automatic Computer

Benchmark% increase in

instruction count% decrease in load

count

mcf 4.47 6.19

sx 4.55 6.68

cg 3.97 2.11

is 17.97 49.99

ca 7.38 7.95

fs 1.83 3.08

fe 3.55 1.75

rt 1.97 6.08

bp 31.89 55.55

bfs 1.20 60.93

sr 20.02 23.33

Instruction Mix

29AMNESIAC: Amnesic Automatic Computer

49.99% reduction in loads, at the expense of 17.97% more instructions

Benchmark% increase in

instruction count% decrease in load

count

mcf 4.47 6.19

sx 4.55 6.68

cg 3.97 2.11

is 17.97 49.99

ca 7.38 7.95

fs 1.83 3.08

fe 3.55 1.75

rt 1.97 6.08

bp 31.89 55.55

bfs 1.20 60.93

sr 20.02 23.33

Instruction Mix

30AMNESIAC: Amnesic Automatic Computer

translates into 74.68% reduction in (load) energy

49.99% reduction in loads, at the expense of 17.97% more instructions

RSlice Length

31

sx bfs

AMNESIAC: Amnesic Automatic Computer

RSlice Length

32

sx bfs

AMNESIAC: Amnesic Automatic Computer

majority of RSlices (78.32%) have less than 10 instructions

Summary• Amnesic execution can minimize power and performance overhead of

• Data retrieval• Data communication

• Amnesic execution makes the workload more compute-intensive• Better use of classic architectures

• Up to 87% improvement in energy-delay product (avg. 24%)

33AMNESIAC: Amnesic Automatic Computer

Questions and Comments

Please send your questions and comments to

aktur002@umn.edu

34AMNESIAC: Amnesic Automatic Computer

AMNESIAC: Amnesic Automatic ComputerTrading Computation for Communication for Energy Efficiency

Ismail Akturk and Ulya R. KarpuzcuUniversity of Minnesota, Twin Cities

{aktur002, ukarpuzc}@umn.edu

Recommended