CENG3420 Lecture 09: Memory Organization
Bei Yu, byu@cse.cuhk.edu.hk
(Latest update: March 23, 2017)
2017 Spring


Overview

Introduction

Random Access Memory (RAM)

Interleaving

Secondary Memory

Conclusion


Review: Major Components of a Computer

[Figure: block diagram of a computer. Processor (Control, Datapath); Memory (Cache, Main Memory, Secondary Memory (Disk)); Devices (Input, Output).]


Why Do We Need Memory?

Combinational circuit:
• Always gives the same output for a given set of inputs
• E.g., adders

Sequential circuit:
• Stores information
• Output depends on stored information
• E.g., counters
• Needs a storage element


Who Cares About the Memory Hierarchy?

[Figure: Processor-DRAM memory performance gap. Axes: performance (log scale, 1 to 1000) versus time (1980 to 2000). CPU curve ("Moore's Law" processor growth curve): 60%/yr (2x per 1.5 years). DRAM curve: 9%/yr (2x per 10 years). The processor-memory performance gap grows about 50% per year.]
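The growth figures quoted for this plot can be sanity-checked with a little arithmetic. The following is only a sketch: it assumes the annual improvement rates (60% for processors, 9% for DRAM) stay constant, and the function names are illustrative rather than part of the lecture.

```python
import math

def doubling_time_years(annual_growth):
    """Years needed to double at a constant annual growth rate (0.60 means 60%/yr)."""
    return math.log(2) / math.log(1 + annual_growth)

def gap_growth_per_year(cpu_growth=0.60, dram_growth=0.09):
    """Annual growth of the ratio between processor and DRAM performance."""
    return (1 + cpu_growth) / (1 + dram_growth) - 1

print(round(doubling_time_years(0.60), 2))  # processor doubling time, in years
print(round(gap_growth_per_year(), 3))      # yearly growth of the performance gap
```

Under these assumptions the processor doubles roughly every year and a half, and the gap grows by a little under 50% per year, which is in line with the rounded numbers on the slide.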


Memory System Revisited

• Maximum size of memory is determined by the addressing scheme
  E.g., 16-bit addresses can only address 2^16 = 65536 memory locations
• Most machines are byte-addressable
  • Each memory address refers to a byte
• Most machines retrieve/store data in words
• Common abbreviations
  • 1k ≈ 2^10 (kilo)
  • 1M ≈ 2^20 (Mega)
  • 1G ≈ 2^30 (Giga)
  • 1T ≈ 2^40 (Tera)
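The relation between address width and memory size can be illustrated with a short sketch. It assumes a byte-addressable machine and uses the power-of-two abbreviations listed above; `addressable_locations` and `human_size` are hypothetical helper names.

```python
UNITS = {"k": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def addressable_locations(address_bits):
    """Number of distinct locations reachable with the given address width."""
    return 2 ** address_bits

def human_size(count):
    """Express a count using the largest abbreviation that fits (k, M, G, T)."""
    for suffix in ("T", "G", "M", "k"):
        if count >= UNITS[suffix]:
            return f"{count / UNITS[suffix]:g}{suffix}"
    return str(count)

print(addressable_locations(16))              # 65536
print(human_size(addressable_locations(16)))  # 64k
print(human_size(addressable_locations(32)))  # 4G
```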


Simplified View

Data transfer takes place through
• MAR: memory address register
• MDR: memory data register

[Figure: the processor (with MAR and MDR) is connected to the memory by a k-bit address bus, an n-bit data bus, and control lines (R/W, MFC, etc.). The memory has up to 2^k addressable locations; word length = n bits.]

Several addressable locations (bytes) are grouped into a word.


Big Picture

Processor usually runs much faster than main memory:
• Small memories are fast, large memories are slow.
• Use a cache memory to store data in the processor that is likely to be used.

Main memory is limited:
• Use virtual memory to increase the apparent size of physical memory by moving unused sections of memory to disk (automatically).
• A translation between virtual and physical addresses is done by a memory management unit (MMU).
• To be discussed in later lectures


Characteristics of the Memory Hierarchy

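[Figure: Processor, L1$, L2$, Main Memory, Secondary Memory, ordered by increasing distance from the processor in access time and by increasing (relative) size of the memory at each level. Transfer units between adjacent levels: 4-8 bytes (word) between the processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 to 4 blocks between L2$ and main memory; 1,024+ bytes (disk sector = page) between main memory and secondary memory.]

Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory (MM), which is a subset of what is in secondary memory (SM).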


Memory Hierarchy: Why Does it Work?

Temporal Locality (locality in time)

If a memory location is referenced, then it will tend to be referenced again soon.
• Keep the most recently accessed data items closer to the processor

Spatial Locality (locality in space)

If a memory location is referenced, the locations with nearby addresses will tend to be referenced soon.
• Move blocks consisting of contiguous words closer to the processor
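The effect of locality can be made concrete with a toy experiment. The sketch below models a tiny direct-mapped cache, which is an assumption made purely for illustration (the lecture has not introduced a specific cache organization at this point); the sizes and names are hypothetical.

```python
import random

def hit_rate(addresses, num_lines=64, block_words=8):
    """Hit rate of a toy direct-mapped cache holding num_lines blocks."""
    cache = [None] * num_lines           # block number currently stored in each line
    hits = 0
    for addr in addresses:
        block = addr // block_words      # neighbouring words share one block
        line = block % num_lines
        if cache[line] == block:
            hits += 1                    # the block is already close to the processor
        else:
            cache[line] = block          # miss: bring the whole block in
    return hits / len(addresses)

sequential = list(range(4096))                          # strong spatial locality
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(4096)]  # little locality

print(hit_rate(sequential))
print(hit_rate(scattered))
```

With 8-word blocks, a sequential sweep misses once per block and hits on the remaining seven words, while scattered accesses rarely find their block already present.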


Memory Hierarchy

Taking advantage of the principle of locality:
• Present the user with as much memory as is available in the cheapest technology.
• Provide access at the speed offered by the fastest technology.

[Figure: levels of the hierarchy, from fastest to slowest: processor (control, datapath, registers, on-chip cache), second-level cache (SRAM), main memory (DRAM), secondary storage (disk), tertiary storage (tape). Speed labels, in increasing order: ~1 ns, tens of ns, hundreds of ns to 1 µs, tens of ms, tens of sec. Size labels (bytes), in increasing order: hundreds, mega's, giga's, tera's.]


Terminology

Random Access Memory (RAM)

Property: comparable access time for any memory location

Block (or line)

The minimum unit of information that is present (or not) in a cache


Terminology (cont.)

• Hit Rate: the fraction of memory accesses found in a level of the memory hierarchy
• Miss Rate: the fraction of memory accesses not found in a level of the memory hierarchy, i.e., 1 - (Hit Rate)

Hit Time

Time to access the block + time to determine hit/miss

Miss Penalty

Time to replace a block in that level with the corresponding block from a lower level

Hit Time << Miss Penalty
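These terms are commonly combined into an average memory access time, hit time + miss rate × miss penalty. That formula is a standard textbook relation rather than something stated on this slide, and the numbers below are made-up inputs chosen only to show why a small miss rate still matters when Hit Time << Miss Penalty.

```python
def miss_rate(hit_rate):
    """Miss rate is the complement of the hit rate."""
    return 1.0 - hit_rate

def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty (same time unit)."""
    return hit_time + miss_rate(hit_rate) * miss_penalty

# Hypothetical values: 1-cycle hit, 100-cycle miss penalty.
for rate in (0.99, 0.95, 0.90):
    print(rate, average_access_time(hit_time=1, hit_rate=rate, miss_penalty=100))
```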


Bandwidth vs. Latency

Example
• Mary acts FAST but she’s always LATE.
• Peter is always PUNCTUAL but he is SLOW.

Bandwidth:
• The number of bits/bytes per second when transferring a block of data steadily.

Latency:
• The amount of time to transfer the first word of a block after issuing the access signal.
• Usually measured in number of clock cycles or in ns/µs.


Question: Suppose the clock rate is 500 MHz. What is the latency and what is the bandwidth, assuming that each data item is 64 bits?

[Timing diagram: Clock, Row Access Strobe, and Data (d0, d1, d2).]
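The arithmetic behind this kind of question can be sketched as follows. The timing diagram did not survive in this copy, so the latency cycle count below is a placeholder and the one-item-per-cycle transfer rate is an assumption; both should be replaced by the values read off the actual diagram.

```python
def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

def latency_ns(latency_cycles, clock_hz):
    """Latency in nanoseconds for a given number of clock cycles."""
    return latency_cycles * cycle_time_ns(clock_hz)

def bandwidth_bytes_per_s(bits_per_item, clock_hz, cycles_per_item=1):
    """Steady-state bandwidth when one item is delivered every cycles_per_item cycles."""
    return (bits_per_item / 8) * clock_hz / cycles_per_item

CLOCK_HZ = 500e6
ASSUMED_LATENCY_CYCLES = 5   # placeholder, NOT taken from the slide

print(cycle_time_ns(CLOCK_HZ))                       # length of one cycle in ns
print(latency_ns(ASSUMED_LATENCY_CYCLES, CLOCK_HZ))  # latency under the placeholder
print(bandwidth_bytes_per_s(64, CLOCK_HZ))           # bytes/s if one item per cycle
```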


Storage based on Feedback

• What if we add feedback to a pair of inverters?
• Usually drawn as a ring of cross-coupled inverters
• Stable way to store one bit of information (while powered)


How to change the value stored?

• Replace each inverter with a NOR gate
• SR-Latch


Question: What is the value of Q for each combination of the R and S inputs?

• R=S=1:
• S=0, R=1:
• S=1, R=0:
• R=S=0:
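A small simulation can be used to check answers to this question. It is a sketch that assumes the usual NOR-based SR latch wiring (R feeds the gate that produces Q, S feeds the gate that produces the complement of Q), which may differ in labeling from the figure on the original slide; the function names are illustrative.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))

def sr_latch(s, r, q=0, q_bar=1):
    """Settle a cross-coupled NOR latch from state (q, q_bar); returns (Q, Q_bar)."""
    for _ in range(4):                    # a few passes are enough for these inputs
        q_new = nor(r, q_bar)             # gate driving Q
        q_bar_new = nor(s, q)             # gate driving the complement of Q
        if (q_new, q_bar_new) == (q, q_bar):
            break                         # outputs are stable
        q, q_bar = q_new, q_bar_new
    return q, q_bar

print(sr_latch(1, 0))                  # set
print(sr_latch(0, 1, q=1, q_bar=0))    # reset
print(sr_latch(0, 0, q=1, q_bar=0))    # hold the stored 1
print(sr_latch(1, 1))                  # both outputs forced low
```

In this model S=1, R=0 sets Q to 1, S=0, R=1 resets Q to 0, R=S=0 keeps the stored value, and R=S=1 drives both outputs to 0, which is why that combination is normally avoided.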


SRAM Cell

• At least 6 transistors (6T)
• Used in most commercial chips
• A pair of weak cross-coupled inverters
• Data stored in the cross-coupled inverters

[Figure: 6T SRAM cell with a word line and complementary bit lines (bit, bit_b).]


Classical SRAM Organization

[Figure: RAM cell array. The row decoder, driven by the row address, selects a word (row) line; the bit (data) lines feed the column selector & I/O circuits, driven by the column address, which output the data bit or word. Each intersection represents a 6-T SRAM cell.]

One memory row holds a block of data, so the column address selects the requested bit or word from that block.


Classical SRAM Organization: Latch-Based Memory

[Figure: memory array built from gated D-latches (inputs D and WE, output Q). A write address decoder (n-bit memory write address) drives the write word lines, and a read address decoder (n-bit memory read address) drives the read word lines. The m-bit memory data in arrives on the write bit lines; the m-bit memory data out leaves on the read bit lines.]


DRAM Cell

• 1 transistor (1T)
• Requires an extra capacitor
• Requires modifications to the manufacturing process
• Higher density
• Write: charge or discharge the capacitor (slow)
• Read: charge redistribution takes place between the bit line and the storage capacitance


Classical DRAM Organization

[Figure: RAM cell array organized in planes, with one data bit per plane. The row decoder, driven by the row address, selects a word (row) line; the bit (data) lines feed the column selector & I/O circuits, driven by the column address. Each intersection represents a 1-T DRAM cell.]

The column address selects the requested bit from the row in each plane.


Synchronous DRAM (SDRAM)

• The common type used today, as it uses a clock to synchronize the operation.
• The refresh operation becomes transparent to the users.
• All control signals needed are generated inside the chip.
• The initial commercial SDRAMs in the 1990s were designed for clock speeds of up to 133 MHz.
• Today's SDRAM chips operate with clock speeds exceeding 1 GHz.

Memory modules hold several SDRAM chips and are the standard type used on a computer's motherboard, with sizes of 4 GB or more.


Double Data Rate (DDR) SDRAM

• Normal SDRAMs transfer data only once per clock cycle.
• Double Data Rate (DDR) SDRAM transfers data on both clock edges.
• DDR-2 (4x basic memory clock) and DDR-3 (8x basic memory clock) are on the market.
• They offer increased storage capacity, lower power, and faster clock speeds.
• For example, DDR2 can operate at clock frequencies of 400 and 800 MHz. Therefore, it can transfer data at effective clock speeds of 800 and 1600 MHz.


Performance of SDRAM

1 Hertz = 1 cycle per second

RAM Type                          Theoretical Maximum Bandwidth
SDRAM 100 MHz (PC100)             100 MHz x 64 bit/cycle = 800 MByte/sec
SDRAM 133 MHz (PC133)             133 MHz x 64 bit/cycle = 1064 MByte/sec
DDR SDRAM 200 MHz (PC1600)        2 x 100 MHz x 64 bit/cycle ≈ 1600 MByte/sec
DDR SDRAM 266 MHz (PC2100)        2 x 133 MHz x 64 bit/cycle ≈ 2100 MByte/sec
DDR SDRAM 333 MHz (PC2700)        2 x 166 MHz x 64 bit/cycle ≈ 2700 MByte/sec
DDR-2 SDRAM 667 MHz (PC2-5400)    2 x 2 x 166 MHz x 64 bit/cycle ≈ 5400 MByte/sec
DDR-2 SDRAM 800 MHz (PC2-6400)    2 x 2 x 200 MHz x 64 bit/cycle ≈ 6400 MByte/sec

Bandwidth comparison. However, due to latencies, SDRAM does not perform as well as the figures shown.
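The peak figures in this comparison follow one pattern: base clock × transfers per base clock × bus width in bytes. A minimal sketch of that calculation is given below, assuming a 64-bit bus throughout; the function name and the row labels are illustrative.

```python
def peak_bandwidth_mbytes(base_clock_mhz, transfers_per_clock=1, bus_bits=64):
    """Theoretical peak bandwidth in MByte/sec for a given base clock in MHz."""
    return base_clock_mhz * transfers_per_clock * bus_bits // 8

rows = [
    ("SDRAM 100 MHz",       100, 1),
    ("SDRAM 133 MHz",       133, 1),
    ("DDR SDRAM 200 MHz",   100, 2),
    ("DDR SDRAM 266 MHz",   133, 2),
    ("DDR-2 SDRAM 800 MHz", 200, 4),   # 2 x 2 transfers per base clock
]
for name, clock, multiplier in rows:
    print(f"{name:22s} {peak_bandwidth_mbytes(clock, multiplier)} MByte/sec")
```

The products of the rounded clock values differ slightly from the module names for some rows, which is why the comparison marks those entries as approximate.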


SRAM vs. DRAM

Static RAM (SRAM)
• Capable of retaining its state as long as power is applied.
• Fast and low power (current flows only when accessing the cells) but costly (requires several transistors per cell), so the capacity is small.
• Used for the Level 1 and Level 2 caches inside a processor, of size 3 MB or more.

Dynamic RAM (DRAM)
• Stores data as electric charge on a capacitor.
• Charge leaks away with time, so DRAMs must be refreshed.
• In return for this trouble, much higher density (simpler cells).


Memory Hierarchy

• Aim: to produce fast, big, and cheap memory
• L1 and L2 caches are usually SRAM
• Main memory is DRAM
• Relies on locality of reference

[Figure: memory hierarchy pyramid. From top to bottom: processor registers, primary cache (L1), secondary cache (L2), main memory, magnetic disk (secondary memory). Size and latency increase toward the bottom; speed and cost per bit increase toward the top.]


Mix-and-Match: Best of Both

By taking advantage of the principle of locality:
• Present the user with as much memory as is available in the cheapest technology.
• Provide access at the speed offered by the fastest technology.

DRAM is slow but cheap and dense:
• Good choice for presenting the user with a BIG memory system: main memory

SRAM is fast but expensive and not very dense:
• Good choice for providing the user with FAST access time: L1 and L2 cache


Memory Controller

• A memory controller is normally used to interface between the memory and the processor.
• DRAMs have a slightly more complex interface, as they need refreshing and usually use time-multiplexed signals to reduce the pin count.
• SRAM interfaces are simpler and may not need a memory controller.

[Figure: the processor sends Address, R/W, Request, and Clock to the memory controller; the controller drives the memory with the row/column address, RAS, CAS, R/W, CS, and Clock. Data lines connect the processor and the memory.]

RAS (CAS) = Row (Column) Address Strobe; CS = Chip Select


Memory Controller (cont.)

• The memory controller accepts a complete address and the R/W signal from the processor.
• The controller generates the RAS (Row Address Strobe) and CAS (Column Address Strobe) signals.
• The high-order address bits, which select a row in the cell array, are provided first under the control of the RAS signal.
• Then the low-order address bits, which select a column, are provided on the same address pins under the control of the CAS signal.
• The right memory module is selected based on the address. Data lines are connected directly between the processor and the memory.
• SDRAM needs refresh, but the refresh overhead is less than 1 percent of the total time available to access the memory.
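The row/column multiplexing described above can be sketched in a few lines. The field widths used here are hypothetical (real DRAM chips have their own row and column sizes), and `split_address` is an illustrative name, not a controller API.

```python
def split_address(address, row_bits, col_bits):
    """Split a full address into (row, column): high-order bits select the row."""
    col = address & ((1 << col_bits) - 1)                 # low-order bits, sent with CAS
    row = (address >> col_bits) & ((1 << row_bits) - 1)   # high-order bits, sent with RAS
    return row, col

# Hypothetical 8-bit address with a 4-bit row and a 4-bit column field.
row, col = split_address(0xA3, row_bits=4, col_bits=4)
print(f"row sent first (RAS): {row:#x}")
print(f"column sent next (CAS): {col:#x}")
```

Because both halves travel over the same pins at different times, the chip needs only max(row_bits, col_bits) address pins instead of row_bits + col_bits.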


Memory Module Interleaving

• Processor and cache are fast, main memory is slow.
• Try to hide access latency by interleaving memory accesses across several memory modules.
• Each memory module has its own Address Buffer Register (ABR) and Data Buffer Register (DBR).

Which scheme below can be better interleaved?

(a) Consecutive words in a module
[Figure: the main memory (MM) address is split into a high-order k-bit module field and a low-order m-bit address-in-module field. Modules 0, ..., i, ..., n-1 each have an ABR and a DBR.]

(b) Consecutive words in different modules
[Figure: the MM address is split into a high-order m-bit address-in-module field and a low-order k-bit module field. Modules 0, ..., i, ..., 2^k - 1 each have an ABR and a DBR.]
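The two address layouts can be compared with a short sketch. It assumes a k-bit module field and an m-bit address-in-module field as in the figures; the function names and the small field widths are chosen only for illustration.

```python
def consecutive_in_module(address, k, m):
    """Scheme (a): the high-order k bits select the module."""
    module = (address >> m) & ((1 << k) - 1)
    offset = address & ((1 << m) - 1)
    return module, offset

def consecutive_across_modules(address, k, m):
    """Scheme (b): the low-order k bits select the module."""
    module = address & ((1 << k) - 1)
    offset = (address >> k) & ((1 << m) - 1)
    return module, offset

K, M = 2, 4   # hypothetical: 4 modules of 16 words each
print([consecutive_in_module(a, K, M)[0] for a in range(8)])
print([consecutive_across_modules(a, K, M)[0] for a in range(8)])
```

With scheme (a), a run of consecutive addresses stays in one module, so the accesses queue up behind each other; with scheme (b), consecutive addresses rotate through the modules, which is what lets their accesses overlap.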


Memory Module Interleaving (cont.)

• Two or more compatible (ideally identical) memory modules are used.
• Within a memory module, several chips are used in “parallel”.
• E.g., 8 modules, and within each module 8 chips are used in “parallel”, to achieve an 8 × 8 = 64-bit memory bus.
• Memory interleaving can be realized in technologies such as “Dual Channel Memory Architecture”.

[Figure: classical DRAM organization, repeated: a RAM cell array with row decoder, column selector & I/O circuits, and one data bit per plane. Each intersection represents a 1-T DRAM cell; the column address selects the requested bit from the row in each plane.]


Non-Interleaving vs. Interleaving

(a) Non-Interleaving
[Figure: on-chip CPU and cache connected over a bus to a single DRAM memory.]

(b) Interleaving
[Figure: on-chip CPU and cache connected over a bus to four DRAM memory banks (bank 0 to bank 3).]


Example

• Suppose we have a cache read miss and need to load from main memory.
• Assume a cache with 8-word blocks, i.e., cache line size = 8 words (bytes).
• Assume it takes one clock to send the address to DRAM memory and one clock to send data back.
• In addition, DRAM has a 6-cycle latency for the first word.
• Fortunately, each subsequent word in the same row takes only 4 cycles.

Single Memory Read: 1 + 6 + 1 = 8 Cycles

[Timing diagram: 1 cycle (address), 6 cycles (DRAM access), 1 cycle (data).]


Example: Non-Interleaving

[Timing diagram: 1 cycle (address), 6 cycles (first DRAM access), 1 cycle (data).]

• The first word needs 6 DRAM cycles (same as a single memory read).
• Each subsequent word needs 4 DRAM cycles.
• No overlapping of cache accesses.
• Assumption: all words are in the same row.

Non-Interleaving Cycle#

1 + 1 × 6 + 7 × 4 + 1 = 36


Example: Four Module Interleaving

[Timing diagram: overlapped 1 + 6 + 1 cycle accesses to the four modules.]

Interleaving Cycle#

1 + 6 + 1 × 8 = 15
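The cycle counts in these examples can be reproduced with a small timing model. This is a sketch under stated assumptions rather than the lecture's own derivation: one address cycle, 6 cycles for a module's first access, 4 cycles for each later access in the same row, a shared bus that returns one word per cycle, words assigned to modules round-robin, and each module starting its next access as soon as the previous one finishes. The function and parameter names are illustrative.

```python
def block_read_cycles(words=8, modules=1, addr=1, first=6, subsequent=4, xfer=1):
    """Cycles needed to read a block of `words` words from `modules` interleaved modules."""
    done = 0                                  # cycle at which the latest word has been returned
    for w in range(words):
        round_no = w // modules               # how many accesses this word's module did before
        ready = addr + first + round_no * subsequent   # cycle at which the word is available
        done = max(ready, done) + xfer        # wait for the data and for the shared bus
    return done

print(block_read_cycles(words=1))             # single memory read
print(block_read_cycles(words=8, modules=1))  # non-interleaved block read
print(block_read_cycles(words=8, modules=4))  # four-module interleaving
```

Under these assumptions the model returns 8, 36, and 15 cycles, matching the three formulas in the examples. The same function can be called with other module counts to explore variations, although the result then depends on the overlap assumptions listed above.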


Question: To transfer 8 bytes, what is the cycle count if only TWO-module interleaving is used?


Magnetic Disk

• Long-term, nonvolatile storage
• Lowest-level memory: slow, large, inexpensive
• A rotating platter coated with a magnetic surface
• A movable read/write head to access the information

[Figure: disk structure showing platters, tracks, sectors, cylinders, heads, and the controller + cache.]


Magnetic Disk (cont.)

• Latency: average seek time plus the rotational latency
• Bandwidth: peak transfer rate of formatted data from the media (not from the cache)

[Figure: disk bandwidth (MB/s) and latency (msec) versus year of introduction.]

• In the time the bandwidth doubles, latency improves by a factor of only around 1.2


Read-Only Memory (ROM)

• Memory content is fixed and cannot be changed easily.
• Useful to bootstrap a computer, since RAM is volatile (i.e., it loses its contents when power is removed).
• We need to store a small program in such a memory, to be used to start the process of loading the OS from a hard disk into the main memory.

PROM/EPROM/EEPROM


Flash Storage

• First credible challenger to disks
• Nonvolatile, and 100× to 1000× faster than disks
• Wear leveling to overcome the wear-out problem


FLASH Memory

• Flash devices have greater density, higher capacity, and lower cost per bit.
• Can be read and written
• Normally used for non-volatile storage
• Typical applications include cell phones, digital cameras, MP3 players, etc.


FLASH Cards

• Flash cards are made from FLASH chips.
• Flash cards with a standard interface are usable in a variety of products.
• Flash cards with a USB interface are widely used (memory keys).
• Larger cards may hold 32 GB. A minute of music can be stored in about 1 MB of memory, hence 32 GB can hold about 500 hours of music.


Conclusion

• Processor usually runs much faster than main memory
• Common RAM types: SRAM, DRAM, SDRAM, DDR SDRAM
• Principle of locality: temporal and spatial
  • Present the user with as much memory as is available in the cheapest technology.
  • Provide access at the speed offered by the fastest technology.
• Memory hierarchy:
  • Register → Cache → Main Memory → Disk → Tape
