30
MEMORY SYSTEMS, ETC. Bruce Jacob University of Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering University of Maryland at College Park [email protected] 1

Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

The Memory System and YouA Love/Hate Relationship

Bruce JacobElectrical & Computer EngineeringUniversity of Maryland at College [email protected]

1

Page 2: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Good Quotes from Tues.

Al Geist: Apps get bigger and more complex

Thomas Schulthess: Memory BW primary limiter to solving superconductivity equations

Karl-Heinz Winkler: Memory = 50% power(cf. Zia spec: proc=214W, proc+mem=230W)

Chuck Moore: image/video as data types => larger working sets, larger data types

Bill Camp: “DRAM sucks—that’s the real problem”

2

Page 3: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Trends

Storage per CPU socket has been flat:

(the capacity problem, Brinda Ganesh’s thesis)

Per-core capacity decr. as # cores/CPU incr.

3

(Gb

it)

Page 4: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Trends

Required BW per core (~1 GB/s):

• Thread-based load (SPECjbb),memory set to 52GB/s sustained

• Saturates around 64 cores/threads (~1GB/s per core)

• cf. 32-core Sun Niagara: saturates at 25.6 GB/s

4

Page 5: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Trends

Commodity Systems:

• Low double-digit GB per CPU socket

• $10–100 per DIMM

High End:

• Higher (but still not high)double-digit GB per CPU socket

• ~ $1000 per DIMM

Fully-Buffered DIMM:

• (largely failed) attempt to bridge the gap …

5

Page 6: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Trends: FB-DIMM

• JEDEC DDRx: ~1W/DIMM, ~10W total

• FB-DIMM: 5–10W/DIMM, ~350W total

6

MC MC

Page 7: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Perspective

Cost of access is high; requires significant effort to amortize this over the (increasingly short) payoff.

ColumnRead

7

tRP = 15ns tRCD = 15ns, tRAS = 37.5ns

CL = 8

Bank Precharge

Row Activate (15ns)and Data Restore (another 22ns)

DATA (on bus)

BL = 8TIME

Page 8: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Bottom Line

DRAM is performance limiter: HPC apps will always exceed your cache (if you could run it on a laptop, you would)

Memory system determines, to large extent:

• Your performance

• Your power dissipation

• Your system cost

PROBLEM:

• Nobody models it accurately

8

Page 9: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

What It Looks Like

CPU/$CPU/$

Outgoing bus request

MC

read data

read data

Read B

Write X, data

Read Z

Write Q, data

Read A

Write A, data

Read W

Read Z

Read Y

AC

T

RD

PR

E

RD

RD

PR

E

PR

E

AC

T

WR

WR

AC

TR

D

PRERDread data

bea

t

cmd

9

Page 10: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

How It Is Represented

if (cache_miss(addr)) {

cycle_count += DRAM_LATENCY;

}

… even in simulators with “cycle accurate” memory systems—no lie

10

Page 11: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Cases In Point

Prefetching, Multicore, etc.

11

!"

#$%%

&'((

!"#$%&'()*()'+&%,-%./01&'1-.2/%"3-0'-,'4/%"-$3'.&.-%5'.-6&73',-%'889:*;<<)'

&)*+,-.,/0+0)12+3)*+4/55*6*73+789:*6+15+;16*0+.74+<-.,/0+0)12+3)*+='(+4/55*6*7;*+516

>.6/180+9*916<+914*?0+7169.?/@*4+31+.;;86.3*+9*916<+;17361??*6A+B*+;.7+1:0*6>*+3).3

3)*+ 4/55*6*7;*+ /7;6*.0*0+ 2/3)+ 3)*+ 789:*6+ 15+ ;16*0+ .?:*/3+ ?*00*6+ 3).7+ CCD-EFF

;175/G86.3/17A

Page 12: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Cases In Point

Prefetching, Multicore, etc.

12

!"

!"#$%&'()*+)',&%-.%/012&'2./30%"4.1'.-'50%".$4'/&/.%6'/.7&84'9":;'3%&-&:2;"1#)'

#$%&' ()*+$&' &$,-&' .$/' 012' 3%44/)/56/' 5,)7*8%9/3' .,' *66:)*./' 7/7,);' 6,5.),88/)' 4,)

+)/4/.6$%5(' &6$/7/' <&.)/*7' +)/4/.6$/)'-%.$' *' 3/+.$' ,4' =>?' #$/' +/)4,)7*56/' 3%44/)/56/

@/.-//5'.$/'&%7+8%&.%6'7,3/8&'*53'AB2'%56)/*&/&'-%.$'.$/'6,)/&'*53'%&'()/*./)'.$*5'.$/

5,C+)/4/.6$%5('&6$/7/?

DEFF

#122

Page 13: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Some Cases In Point

Prefetching, Multicore, etc.

13

Page 14: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Your Choice

Today, DRAM is primary limiter. If you really want to predict system behavior

• performance

• power

• cost

you have to model the memory system, accurately.

… so what do you get if you do?

14

Page 15: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

15

Latest version via Sandia & NSA

Page 16: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

SystemParameters

16

Page 17: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

17

Page 18: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

DeviceParameters

18

Page 19: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

19

Page 20: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim

Overlay Graphs from Multiple Runs

20

Page 21: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

21

bandwidth, open page

bandwidth, closed page

Page 22: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

22

latency, closed page

latency, open page

Page 23: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

23

power, closed page power, open page

Page 24: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

24

BANDWIDTH, addr2

Page 25: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

25

BANDWIDTH, addr4

Page 26: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

26

POWER, addr2

Page 27: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Epetra

27

POWER, addr4

Page 28: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

DRAMsim Results: Flash

28

0

1

2

3

4

1 x

8 x

25

1 x

8 x

50

2 x

8 x

25

1 x

16 x

50

1 x

8 x

100

2 x

8 x

50

1 x

16 x

50

4 x

8 x

25

2 x

16 x

25

1 x

32 x

25

2 x

8 x

100

1 x

16 x

100

4 x

8 x

50

2 x

16 x

50

1 x

32 x

50

4 x

16 x

25

2 x

32 x

25

4 x

8 x

100

2 x

16 x

100

1 x

32 x

100

4 x

16 x

50

2 x

32 x

50

4 x

32 x

25

4 x

16 x

100

2 x

32 x

100

4 x

32 x

50

4 x

32 x

100

Read Response Time vs. Organization

Resp

on

se T

ime (

msec)

Configuration

disk-interface speeds are scaling up with serial interface and fiber channel, SSD’s performance is

expected to be limited by the media transfer rate. We have measured the effect of media transfer rate on

the performance of NAND Flash SSD by scaling I/O bus bandwidth from 25 MB/s (8-bit wide bus at 25

MHz) up to 400 MB/s (32-bit wide bus at 100 MHz). As shown in Figure 7, performance does not

improve significantly beyond 100 MB/s.

However, note that, even when performance saturates at high bandwidths, it is still possible to

achieve significant performance gains by increasing the level of concurrency by either banking or

implementing superblocks. Performance saturates at 100MB/s because the real limitation to NAND Flash

memory performance is the device’s core interface—the requirement to read and write the flash storage

array through what is effectively a single port (the read/cache registers)—and this is a limitation that

concurrency overcomes.

5.4. Increasing the Degree of Concurrency

As shown previously, flash memory performance can be improved significantly if request latency is

reduced by dividing the flash array into independent banks and utilizing concurrency. The flash controller

can support these concurrent requests through multiple flash memory banks via the same channel or

through multiple independent channels to different banks, or through a combination of two. To get a

better idea of the shape of the design space, we have focused on changing the degree of concurrency one

I/O bandwidth at a time. Figure 8 shows example configurations modeled in our simulations with

bandwidths ranging from 25 MB/s to 400 MB/s. This is equivalent to saying, “I have 4 50 MHz 8-bit I/O

channels... what should I do? Gang them together, use them as independent channels, or a combination of

the two?”

The performance results are shown in Figure 9. Though increasing the number of available

concurrency in the storage sub-system (number of banks x number of channels) typically increases

16

pro

Fit T

RIA

L v

ers

ion

Ho

st

Ho

st I

/F

Fla

sh

Co

ntr

oll

er

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

k

Ban

kHo

st

Ho

st I

/F

Fla

sh

Co

ntr

oll

er

Ho

st

Ho

st I

/F

Fla

sh

Co

ntr

oll

er

(a) Single channel (b) Dedicated channel for each bank (c) Multiple shared channels

Figure 8: Flash SSD Organizations. (a) Single I/O bus is shared - 1, 2, or 4 banks; (b) dedicated I/O bus: 1, 2, or 4

buses and single bank per bus; (c) multiple shared I/O channels - 2 or 4 channels with 2 or 4 banks per channel.

Page 29: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

… and tons more

Address-mapping policies (2–10x)

Scheduling policies (2–10x)(e.g., improved Cray MC by 20% sustained BW)

Paging policies (2x)

System organization (10+x)

Blocking, cache interactions, OS interactions (e.g. virtual memory), file system interactions, queueing mechanisms, device architectures, …

If you want to reduce power, this is where to lookIf you want to increase performance, this is where to look

29

Page 30: Maryland and You - Sandia National Laboratories...Maryland SOS-13 March 2009 SLIDE The Memory System and You A Love/Hate Relationship Bruce Jacob Electrical & Computer Engineering

MEMORY SYSTEMS,ETC.

Bruce Jacob

University of Maryland

SOS-13March 2009

SLIDE

Shameless Plug

30