55:035 Computer Architecture and Organization Lecture 6 1


55:035 Computer Architecture and Organization

Lecture 6


Outline

- Memory Arrays and Hierarchy
- SRAM Architecture
  - SRAM Cell
  - Decoders
  - Column Circuitry
  - Multiple Ports
- Serial Access Memories
- Flash
- DRAM

Memory Arrays

- Random Access Memory
  - Read/Write Memory (RAM) (Volatile)
    - Static RAM (SRAM)
    - Dynamic RAM (DRAM)
  - Read Only Memory (ROM) (Nonvolatile)
    - Mask ROM
    - Programmable ROM (PROM)
    - Erasable Programmable ROM (EPROM)
    - Electrically Erasable Programmable ROM (EEPROM)
    - Flash ROM
- Serial Access Memory
  - Shift Registers
    - Serial In Parallel Out (SIPO)
    - Parallel In Serial Out (PISO)
  - Queues
    - First In First Out (FIFO)
    - Last In First Out (LIFO)
- Content Addressable Memory (CAM)

Levels of the Memory Hierarchy

- CPU
- Registers: part of the on-chip CPU datapath; ISA 16-128 registers
- Cache level(s): one or more levels (static RAM)
  - Level 1: on-chip, 16-64K
  - Level 2: on-chip, 256K-2M
  - Level 3: on- or off-chip, 1M-16M
- Main memory: dynamic RAM (DRAM), 256M-16G
- Magnetic disc: interface SCSI, RAID, IDE, 1394; 80G-300G
- Optical disk or magnetic tape

Farther away from the CPU:

- Lower cost/bit
- Higher capacity
- Increased access time/latency
- Lower throughput/bandwidth

Memory Hierarchy Comparisons

Level          Capacity     Access time              Cost
CPU registers  100s bytes   <10s ns
Cache          KBytes       10-100 ns                1-0.1 cents/bit
Main memory    MBytes       200-500 ns               $.0001-.00001 cents/bit
Disk           GBytes       10 ms (10,000,000 ns)    10^-5 - 10^-6 cents/bit
Tape           infinite     sec-min                  10^-8

Staging between levels:

Between               Transfer unit     Managed by       Size
Registers and cache   Instr. operands   prog./compiler   1-8 bytes
Cache and memory      Blocks            cache cntl       8-128 bytes
Memory and disk       Pages             OS               4K-16K bytes
Disk and tape         Files             user/operator    Mbytes

Levels toward the registers are faster; levels toward tape are larger.

Connecting Memory

- The processor holds the address in MAR and the data in MDR
- Processor and memory are connected by:
  - a k-bit address bus
  - an n-bit data bus
  - control lines (R/W, MFC, etc.)
- Memory has up to 2^k addressable locations
- Word length = n bits

Array Architecture

- 2^n words of 2^m bits each
- If n >> m, fold by 2^k into fewer rows of more columns
- Good regularity: easy to design
- Very high density if good cells are used

[Figure: array floorplan. The row decoder takes n-k address bits and drives the wordlines; the column decoder takes k address bits. Bitline conditioning sits above the memory cells (2^(n-k) rows x 2^(m+k) columns); column circuitry below the bitlines delivers 2^m bits.]
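The folding arithmetic on this slide can be sketched in a few lines of Python. The function names are hypothetical, and the choice to use the low k address bits as the column select is an assumption; a real design may order the fields differently.

```python
def fold_array(n, m, k):
    """Return (rows, columns) for 2**n words of 2**m bits folded by 2**k."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    rows = 2 ** (n - k)        # fewer rows
    columns = 2 ** (m + k)     # more columns
    return rows, columns


def split_address(addr, n, k):
    """Split an n-bit word address into (row, column-select) fields.

    Assumption: the low k bits go to the column decoder and the
    remaining n-k high bits go to the row decoder.
    """
    if not 0 <= addr < 2 ** n:
        raise ValueError("address out of range")
    return addr >> k, addr & ((1 << k) - 1)


# 2 kword x 16 bits (n = 11, m = 4) folded by 2**3
print(fold_array(n=11, m=4, k=3))      # (256, 128)
print(split_address(1437, n=11, k=3))  # (179, 5)
```

Folding leaves the total number of cells unchanged (2^n x 2^m); it only trades rows for columns to get a more square array.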

6T SRAM Cell

- Cell size accounts for most of array size
  - Reduce cell size at expense of complexity
- 6T SRAM cell
  - Used in most commercial chips
  - Data stored in cross-coupled inverters
- Read:
  - Precharge bit, bit_b
  - Raise wordline
- Write:
  - Drive data onto bit, bit_b
  - Raise wordline

[Figure: 6T SRAM cell connected to bit, bit_b, and word.]

SRAM Read

- Precharge both bitlines high
- Then turn on wordline
- One of the two bitlines will be pulled down by the cell
- Ex: A = 0, A_b = 1
  - bit discharges, bit_b stays high

[Figure: 6T cell schematic with transistors N1-N4, P1, P2, internal nodes A and A_b, bitlines bit and bit_b, and wordline word.]

SRAM Write

- Drive one bitline high, the other low
- Then turn on wordline
- Bitlines overpower cell with new value
- Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  - Force A_b low

[Figure: 6T cell schematic with transistors N1-N4, P1, P2, internal nodes A and A_b, bitlines bit and bit_b, and wordline word.]

SRAM Column Example

[Figure: two column schematics, one for read and one for write. Each shows bitline conditioning, an SRAM cell on wordline word_q1, more cells on the same bitlines bit_v1f and bit_b_v1f. The read column produces out_v1r and out_b_v1r; the write column is driven by data_s1 and write_q1.]

Decoders

- n:2^n decoder consists of 2^n n-input AND gates
  - One needed for each row of memory
  - Build AND from NAND or NOR gates

[Figure: 2:4 decoder with inputs A0, A1 and outputs word0-word3.]

Large Decoders

- For n > 4, NAND gates become slow
  - Break large gates into multiple smaller gates

[Figure: 4:16 decoder with inputs A0-A3 and outputs word0-word15.]
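As a rough behavioral sketch of the two decoder slides, the Python below builds each wordline as an AND of true or complemented address bits, either as one wide gate or as a tree of smaller gates. The names `decode` and `and_tree` and the `max_fan_in` parameter are hypothetical; this models logic only, not gate delay.

```python
def and_tree(inputs, max_fan_in=2):
    """AND a list of bits using only gates with at most max_fan_in inputs."""
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    level = list(inputs)
    while len(level) > 1:
        # Each pass is one level of small AND gates.
        level = [int(all(level[i:i + max_fan_in]))
                 for i in range(0, len(level), max_fan_in)]
    return level[0]


def decode(address_bits, max_fan_in=None):
    """n:2**n decoder. address_bits[i] is A_i; returns [word0, word1, ...]."""
    n = len(address_bits)
    words = []
    for row in range(2 ** n):
        # Each wordline sees every address bit, true or complemented.
        literals = [a if (row >> i) & 1 else 1 - a
                    for i, a in enumerate(address_bits)]
        if max_fan_in is None:
            words.append(int(all(literals)))              # one n-input AND
        else:
            words.append(and_tree(literals, max_fan_in))  # smaller gates
    return words


print(decode([1, 0]))  # A0 = 1, A1 = 0 selects word1: [0, 1, 0, 0]
```

Both forms produce the same one-hot output; the tree version only illustrates that a wide AND can be split into several narrow ones.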

Column Circuitry

- Some circuitry is required for each column
  - Bitline conditioning
  - Sense amplifiers
  - Column multiplexing

Bitline Conditioning

- Precharge bitlines high before reads
- Equalize bitlines to minimize voltage difference when using sense amplifiers

[Figure: two bitline conditioning circuits on bit and bit_b.]

Differential Pair Amp

- Differential pair requires no clock
- But always dissipates static power

[Figure: differential pair sense amplifier with inputs bit and bit_b, outputs sense_b and sense, and transistors N1, N2, N3, P1, P2.]

Column Multiplexing

- Recall that array may be folded for good aspect ratio
- Ex: 2 kword x 16 folded into 256 rows x 128 columns
  - Must select 16 output bits from the 128 columns
  - Requires 16 8:1 column multiplexers
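To make the 2 kword x 16 example concrete, here is a small Python model of reading one word through sixteen 8:1 column multiplexers. The column layout (the eight columns feeding one mux placed next to each other) and the helper names are assumptions for illustration, not something the slide specifies.

```python
WORD_BITS = 16      # bits per word
WORDS_PER_ROW = 8   # fold factor 2**3, so each output bit needs an 8:1 mux
ROWS = 256


def write_word(array, addr, value):
    row, col_sel = divmod(addr, WORDS_PER_ROW)
    for bit in range(WORD_BITS):
        # Assumed layout: mux `bit` owns columns bit*8 .. bit*8 + 7.
        array[row][bit * WORDS_PER_ROW + col_sel] = (value >> bit) & 1


def read_word(array, addr):
    row, col_sel = divmod(addr, WORDS_PER_ROW)
    selected_row = array[row]            # the wordline activates 128 cells
    value = 0
    for bit in range(WORD_BITS):         # one 8:1 mux per output bit
        start = bit * WORDS_PER_ROW
        mux_inputs = selected_row[start:start + WORDS_PER_ROW]
        value |= mux_inputs[col_sel] << bit
    return value


array = [[0] * (WORD_BITS * WORDS_PER_ROW) for _ in range(ROWS)]
write_word(array, 1437, 0xBEEF)
print(hex(read_word(array, 1437)))  # 0xbeef
```

All sixteen multiplexers share the same 3-bit select, which is why a single column decoder is enough.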

Multiple Ports

- We have considered single-ported SRAM
  - One read or one write on each cycle
- Multiported SRAMs are needed for register files
- Examples:
  - Multicycle MIPS must read two sources or write a result on some cycles
  - Pipelined MIPS must read two sources and write a third result each cycle
  - Superscalar MIPS must read and write many sources and results each cycle

Dual-Ported SRAM

- Simple dual-ported SRAM
  - Two independent single-ended reads
  - Or one differential write
- Do two reads and one write by time multiplexing
  - Read during ph1, write during ph2

[Figure: dual-ported cell with wordlines wordA and wordB and bitlines bit and bit_b.]

Multi-Ported SRAM

- Adding more access transistors hurts read stability
- Multiported SRAM isolates reads from state node
- Single-ended design minimizes number of bitlines

[Figure: multiported cell with wordlines wordA-wordG, bitlines bA-bG, write circuits, and read circuits.]

Serial Access Memories

- Serial access memories do not use an address
  - Shift Registers
  - Tapped Delay Lines
  - Serial In Parallel Out (SIPO)
  - Parallel In Serial Out (PISO)
  - Queues (FIFO, LIFO)

Shift Register

- Shift registers store and delay data
- Simple design: cascade of registers
  - Watch your hold times!

[Figure: cascade of 8 registers clocked by clk, from Din to Dout.]

Denser Shift Registers

- Flip-flops aren't very area-efficient
- For large shift registers, keep data in SRAM instead
- Move read/write pointers to RAM rather than data
  - Initialize read address to first entry, write to last
  - Increment address on each cycle

[Figure: dual-ported SRAM with Din, Dout, and clk. Two counters generate readaddr (reset to 00...00) and writeaddr (reset to 11...11).]
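The pointer scheme above can be sketched as a small behavioral model. The class name is hypothetical, and the model assumes the read and the write happen in the same cycle at different addresses, with the read returning the stored value; under that assumption the delay works out to depth - 1 cycles.

```python
class SramShiftRegister:
    """Shift register built from a dual-ported memory and two counters."""

    def __init__(self, depth):
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.mem = [0] * depth
        self.read_addr = 0            # reset to first entry (00...00)
        self.write_addr = depth - 1   # reset to last entry (11...11)

    def clock(self, din):
        dout = self.mem[self.read_addr]    # read port
        self.mem[self.write_addr] = din    # write port, same cycle
        depth = len(self.mem)
        # Both counters increment every cycle and wrap around.
        self.read_addr = (self.read_addr + 1) % depth
        self.write_addr = (self.write_addr + 1) % depth
        return dout


sr = SramShiftRegister(depth=4)
print([sr.clock(d) for d in range(1, 9)])  # [0, 0, 0, 1, 2, 3, 4, 5]
```

Only the two addresses change each cycle; the stored data never moves, which is the point of the denser design.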

Tapped Delay Line

- A tapped delay line is a shift register with a programmable number of stages
- Set number of stages with delay controls to mux
  - Ex: 0-63 stages of delay

[Figure: cascade from Din to Dout of shift registers SR 32, SR 16, SR 8, SR 4, SR 2, SR 1, all clocked by clk. A mux after each shift register, controlled by delay5 down to delay0, either includes or bypasses it.]
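A minimal Python sketch of the 0-63 stage example follows. It assumes each control bit delay5 ... delay0 selects or bypasses a block of 32, 16, 8, 4, 2, or 1 registers, as the figure labels suggest; the class name and the integer `delay` argument (the six control bits packed into one number) are hypothetical.

```python
from collections import deque

STAGE_SIZES = [32, 16, 8, 4, 2, 1]  # controlled by delay5 ... delay0


class TappedDelayLine:
    def __init__(self):
        # One fixed-length shift register per block, initially all zeros.
        self.stages = [deque([0] * size, maxlen=size) for size in STAGE_SIZES]

    def clock(self, din, delay):
        """Advance one clock cycle and return Dout for this cycle."""
        if not 0 <= delay <= 63:
            raise ValueError("delay must be in 0..63")
        x = din
        for sr, size in zip(self.stages, STAGE_SIZES):
            delayed = sr[-1]     # oldest value in this block
            sr.appendleft(x)     # shift in; maxlen drops the oldest value
            if delay & size:     # mux: take the delayed path or bypass
                x = delayed
        return x


tdl = TappedDelayLine()
print([tdl.clock(d, delay=5) for d in range(1, 11)])
# [0, 0, 0, 0, 0, 1, 2, 3, 4, 5]
```

Because the block sizes are powers of two, the binary value on the control lines equals the total delay in cycles.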

Serial In Parallel Out

- 1-bit shift register reads in serial data
  - After N steps, presents N-bit parallel output

[Figure: 4-stage shift register clocked by clk with serial input Sin and parallel outputs P0-P3.]

Parallel In Serial Out

- Load all N bits in parallel when shift = 0
  - Then shift one bit out per cycle

[Figure: 4-stage register with clk, shift/load control, parallel inputs P0-P3, and serial output Sout.]
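The two converters can be modeled together as a round trip. This is only a sketch: the class names are hypothetical, and the bit ordering (Sin enters at P0 and moves toward P3; Sout is taken from the P3 end) is an assumption read off the figure labels.

```python
class SIPO:
    """Serial in, parallel out: regs[0] is P0, regs[-1] is the last stage."""

    def __init__(self, n):
        self.regs = [0] * n

    def clock(self, sin):
        self.regs = [sin] + self.regs[:-1]   # shift toward higher index
        return list(self.regs)


class PISO:
    """Parallel in, serial out: loads when shift = 0, shifts when shift = 1."""

    def __init__(self, n):
        self.regs = [0] * n

    def clock(self, shift, parallel_in=None):
        if shift:
            self.regs = [0] + self.regs[:-1]
        else:
            self.regs = list(parallel_in)     # load all N bits at once

    @property
    def sout(self):
        return self.regs[-1]                  # assumed: last stage drives Sout


piso, sipo = PISO(4), SIPO(4)
piso.clock(shift=0, parallel_in=[1, 0, 1, 1])
for _ in range(4):               # N steps move all N bits across
    sipo.clock(piso.sout)
    piso.clock(shift=1)
print(sipo.regs)  # [1, 0, 1, 1]
```

After N shift cycles the SIPO holds the word that was loaded into the PISO, which is the basic behavior of a serial link between two parallel buses.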

Queues

- Queues allow data to be read and written at different rates
- Read and write each use their own clock, data
- Queue indicates whether it is full or empty
- Build with SRAM and read/write counters (pointers)

[Figure: Queue block with WriteClk, WriteData, and FULL on the write side, and ReadClk, ReadData, and EMPTY on the read side.]

FIFO, LIFO Queues

- First In First Out (FIFO)
  - Initialize read and write pointers to first element
  - Queue is EMPTY
  - On write, increment write pointer
  - If write almost catches read, Queue is FULL
  - On read, increment read pointer
- Last In First Out (LIFO)
  - Also called a stack
  - Use a single stack pointer for read and write
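The pointer rules listed above translate fairly directly into code. The sketch below is a simplified single-threaded model (it ignores the separate read and write clocks), and the class and method names are hypothetical. "Write almost catches read" is taken to mean that the write pointer is one slot behind the read pointer, so a memory of N entries holds at most N - 1 items.

```python
class Fifo:
    def __init__(self, size):
        self.mem = [None] * size
        self.rd = 0   # both pointers start at the first element
        self.wr = 0

    @property
    def empty(self):
        return self.rd == self.wr

    @property
    def full(self):
        # Write pointer has almost caught the read pointer.
        return (self.wr + 1) % len(self.mem) == self.rd

    def write(self, data):
        if self.full:
            raise OverflowError("FIFO is FULL")
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % len(self.mem)

    def read(self):
        if self.empty:
            raise IndexError("FIFO is EMPTY")
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)
        return data


class Lifo:
    def __init__(self, size):
        self.mem = [None] * size
        self.sp = 0   # single stack pointer: next free slot

    def push(self, data):
        if self.sp == len(self.mem):
            raise OverflowError("stack is full")
        self.mem[self.sp] = data
        self.sp += 1

    def pop(self):
        if self.sp == 0:
            raise IndexError("stack is empty")
        self.sp -= 1
        return self.mem[self.sp]


fifo, lifo = Fifo(4), Lifo(4)
for item in "abc":
    fifo.write(item)
    lifo.push(item)
print(fifo.full, fifo.read(), lifo.pop())  # True a c
```

Keeping one slot unused is one common way to tell FULL from EMPTY using only the two pointers; other designs add a count or an extra wrap bit instead.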

Memory Timing: Approaches

- DRAM timing: multiplexed addressing
  - The address bus carries the row address, then the column address
  - RAS-CAS timing
- SRAM timing: self-timed
  - Address transition initiates memory operation

[Figure: two timing diagrams. DRAM: address bus showing row address then column address, with RAS and CAS strobes. SRAM: address bus showing a single address.]
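Multiplexed addressing can be illustrated with a small behavioral model in which the same address pins carry the row address (latched on RAS) and then the column address (used on CAS). The class, method names, and the choice of the high bits as the row address are assumptions for illustration; no timing is modeled.

```python
class MultiplexedDram:
    def __init__(self, row_bits, col_bits):
        self.cells = [[0] * (2 ** col_bits) for _ in range(2 ** row_bits)]
        self.row_latch = None

    def ras(self, address_bus):
        """RAS strobe: latch the row address from the shared bus."""
        self.row_latch = address_bus

    def cas_write(self, address_bus, value):
        """CAS strobe: the bus now carries the column address."""
        self.cells[self.row_latch][address_bus] = value

    def cas_read(self, address_bus):
        return self.cells[self.row_latch][address_bus]


def split(addr, col_bits):
    """Assumed mapping: high bits are the row, low bits are the column."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


dram = MultiplexedDram(row_bits=4, col_bits=4)   # 8-bit address on 4 pins
row, col = split(0xA7, col_bits=4)
dram.ras(row)
dram.cas_write(col, 1)
dram.ras(row)
print(row, col, dram.cas_read(col))  # 10 7 1
```

The payoff is in pin count: an 8-bit address here needs only 4 address pins, at the cost of a two-step access.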

Non-Volatile Memories

- Floating-gate transistor

[Figure: device cross-section showing source, drain, gate, and floating gate over n+ regions in a p substrate, with oxide thickness tox above and below the floating gate; schematic symbol with terminals G, S, D.]

NOR Flash Operations: Erase

[Figure: erase bias conditions. Cell: G at 0 V, 12 V on the source side, drain open. Cell array: WL0 = 0 V, WL1 = 0 V, BL0 and BL1 open.]

NOR Flash Operations: Program

[Figure: program bias conditions. Cell: 12 V on the gate, 6 V on the drain. Cell array: WL0 = 12 V, WL1 = 0 V, BL0 = 6 V, BL1 = 0 V.]

NOR Flash Operations: Read

[Figure: read bias conditions. Cell: 5 V on the gate, 1 V on the drain. Cell array: WL0 = 5 V, WL1 = 0 V, BL0 = 1 V, BL1 = 0 V.]

NAND Flash Memory

[Figure: NAND flash layout showing the unit cell, word line (poly), and source line (diff. layer). Courtesy Toshiba.]

Read-Write Memories (RAM)

- Static (SRAM)
  - Data stored as long as supply is applied
  - Large (6 transistors/cell)
  - Fast
  - Differential
- Dynamic (DRAM)
  - Periodic refresh required
  - Small (1-3 transistors/cell)
  - Slower
  - Single ended

1-Transistor DRAM Cell

- Write: Cs is charged or discharged by asserting WL and BL
- Read: Charge redistribution takes place between bit line and storage capacitance
- Voltage swing is small; typically around 250 mV
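The size of the read swing follows from charge conservation between the storage capacitor and the bit line: the change in bit-line voltage is (V_cell - V_pre) x Cs / (Cs + C_BL). The short Python sketch below evaluates that expression. The capacitance and voltage values are illustrative assumptions, not figures from the lecture, and the function name is hypothetical.

```python
def bitline_swing(v_cell, v_pre, c_s, c_bl):
    """Bit-line voltage change after charge sharing with the storage cell.

    Charge conservation: c_s*v_cell + c_bl*v_pre = (c_s + c_bl)*v_final,
    so v_final - v_pre = (v_cell - v_pre) * c_s / (c_s + c_bl).
    """
    return (v_cell - v_pre) * c_s / (c_s + c_bl)


# Illustrative values only (assumptions, not taken from the slides).
C_S = 30e-15    # storage capacitance, farads
C_BL = 300e-15  # bit-line capacitance, farads
V_PRE = 1.25    # bit-line precharge voltage, volts

for stored in (2.5, 0.0):   # cell holding a "1" or a "0"
    dv_mv = bitline_swing(stored, V_PRE, C_S, C_BL) * 1000
    print(f"cell at {stored} V -> swing {dv_mv:+.1f} mV")
```

Because the bit-line capacitance is much larger than Cs, the swing is only a small fraction of the stored voltage, which is why a sense amplifier is needed on every bit line.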

DRAM Cell Observations

- 1T DRAM requires a sense amplifier for each bit line, due to charge redistribution read-out.
- DRAM memory cells are single ended in contrast to SRAM cells.
- The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
- 1T cell requires presence of an extra capacitance that must be explicitly included in the design.
- When writing a "1" into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Sense Amp Operation

[Figure: bit-line voltage VBL versus time t, starting at VPRE. After the word line is activated the bit line shifts by ΔV(1); after the sense amp is activated it is driven to V(1) or V(0).]

DRAM Timing

[Figure: DRAM timing diagram.]