IT253: Computer Organization Lecture 11: Memory Tonga Institute of Higher Education


Page 1:

IT253: Computer Organization

Lecture 11: Memory

Tonga Institute of Higher Education

Page 2:

The Big Picture

Page 3:

What is Memory? (Review)

• A large, linear array of bytes
– Each byte has its own address in memory

• Most ISAs have commands that do byte addressing (an address starts every 8 bits)

• Data is aligned on word boundaries
– Things like integers and instructions are 32 bits (1 word) long; see the sketch below
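A minimal C sketch of byte addressing and word alignment (the array name and values are illustrative): on a byte-addressed machine, the addresses of consecutive 32-bit words differ by 4.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t words[4] = {10, 20, 30, 40}; /* each element is 1 word (32 bits) */

    for (int i = 0; i < 4; i++) {
        /* consecutive word addresses differ by 4 bytes on a byte-addressed machine */
        printf("words[%d] at address %p\n", i, (void *)&words[i]);
    }
    return 0;
}
```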

Page 4:

How we think of memory now

• When we built our processor, we pretended memory worked very simply so that we could fetch instructions and data from it

Page 5:

What do we really need for memory?

• We need four parts for our memory:
– The cache, the fastest memory, which the processor uses directly
– The memory bus and I/O bus
– Main memory (RAM)
– Hard disks

Page 6:

Part I: Inside the Processor

• The processor will use an internal cache (inside the processor) and an external cache that is nearby

• This is called a two-level cache

• If something can’t be kept in the cache, the processor goes to main memory for it

Page 7:

Part II: Main Memory

• Main memory is the RAM in the computer. It is usually DRAM (dynamic random access memory)

Page 8:

Memory Types Explained

• RAM – Random Access Memory
– Random – you can access any location at any time
– DRAM – dynamic RAM
• High density, cheap, slow, low power usage
• Dynamic means it needs to be “refreshed”
• This is the main memory
– SRAM – static RAM
• Low density, high power, expensive, fast
• Static – the memory lasts as long as the power stays on
• Caches are made out of this

• Non-Random Access Memory
– Some memory technology is sequential (like a tape). You need to go through a lot of memory to find the spot you want.

Page 9:

RAM

• What's important to know about RAM?
– Latency – the time it takes for a word to be read from memory
– Bandwidth – the average number of words read per second

• If a programmer can fit a whole program inside the cache, it will run much faster.

• Every time the CPU goes to the RAM, it must wait a long time to get the data.

• We can make our programs faster if all the instructions stay inside the cache

Page 10:

SRAM
• We can make an SRAM circuit (one that does not need to be refreshed) with 6 transistors
• Then we can put SRAM cells together to make a bigger SRAM

This is a 16-word SRAM diagram. It can be addressed with 4 bits: 2^4 = 16

Each SRAM cell will hold 8 bits

Page 11:

The SRAM diagram

• Like everything else, we can draw one simple box to describe an SRAM

• WE_L – Write Enable
• OE_L – Output Enable

• We need Output Enable and Write Enable because we use the same D bus for both input and output
• This saves space inside the processor
• A is the address that we are either writing to or reading from
• The number of address bits depends on how many words are inside the SRAM

Page 12:

DRAM

• What we know about DRAM
– Needs to be refreshed regularly
– Holds a lot of data in a small space
– Uses very little power
– Has Output Enable
– Has Write Enable

Page 13:

The 1-transistor DRAM memory

• To save a single bit, we need just 1 transistor

• To write:
– Select the row, put the bit on the bit line

• To read:
– Select the row, read what comes out on the bit line (only a very small charge)
– Then rewrite the value, because the stored charge leaked out during the read

• To refresh:
– Just do a read, which rewrites the value

Page 14:

Simple DRAM Grouping

The DRAM cells are put together in an array, where it is possible to access one bit at a time

Page 15:

Complicated DRAM grouping

• The real way DRAM is put together is in layers.

• Usually, 8 layers will be put together and the row and column numbers will go to all the layers and will return 8 bits (1 byte) at a time

• Example:
– A 2 Mbit DRAM = 256K x 8 layers
– 512 rows x 512 columns x 8 planes
– 512 x 512 = 262,144 (256K)

Page 16:

Diagram for RAM

RAS_L = when this is asserted, A contains the row address
CAS_L = when this is asserted, A contains the column address
WE_L = write enable
OE_L = output enable
D = the data that will be either input or output (to save space, we use the same line for input and output)

Page 17:

DRAMs through History
• Fast Page DRAM – this type of DRAM allowed selecting memory through rows and columns and could automatically fetch the next byte, saving time. It was introduced in 1992 for PCs.

• Synchronous DRAM (SDRAM) – gives a clock signal to the RAM so that it can "pipeline" data, meaning it can send more than one piece of data at a time. Introduced in 1997 and is very common.

• Double Data Rate RAM (DDR RAM) – can transfer data twice during a clock cycle. Introduced in 2000 and is used in all new computers.

• Rambus DRAM (RDRAM) – uses a special signalling method that allows faster clock speeds, but is made only by the Rambus company. Introduced in 2001, it was popular for a short time before Intel stopped supporting it.

Page 18:

Summary of DRAM and SRAM

• DRAM
– Slow, cheap, low power
– Good for giving the user a lot of memory at a low price
– Uses 1 transistor to store one bit

• SRAM
– Fast, expensive, uses more power
– Good for those who need speed
– Uses 6 transistors to store one bit

Page 19:

Caches

• Why do we want a cache?

• If DRAM is slow and SRAM is fast, then we can make the average access time to memory very small if most of the accesses are in SRAM

• We can use SRAM to make a memory that works very quickly (the cache)

Page 20:

Different Levels of Memory

THE MEMORY HIERARCHY

Page 21:

Cache Ideas: Locality

• Locality – the idea that most of the things you need are close by

• 90 percent of the time, you will be using 10 percent of the code

• Two types of locality:
– Temporal – locality in time – if something is used, it will be used again in the near future
– Spatial – locality in space – if something is used, then things that are near it will probably be used as well

Page 22:

How the levels work together

• The levels of memory are always working together to move data closer to the fastest level (the cache)

• The levels copy data between themselves
• Block – the smallest piece of data that will be copied between levels

Page 23:

The Memory Hierarchy
• Hit – the data we want is in the memory level we are searching
– (the example in the picture is Block X)
– Hit Rate – the fraction of accesses that find the data in this memory level
– Hit Time – the time it takes to get a piece of data from the higher level into the processor
• Miss – the data is not in the higher level and needs to come from the lower level
– Miss Rate = 1 – Hit Rate
– Miss Penalty – the time it takes to load data from the lower level into the higher level and send it to the processor
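These definitions combine into the standard average memory access time formula, AMAT = Hit Time + Miss Rate x Miss Penalty (the formula itself is not on this slide, but follows directly from the terms above). A minimal C sketch with illustrative numbers:

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;  /* cycles for a hit in the cache (assumed) */
    double miss_rate    = 0.10; /* fraction of accesses that miss (assumed) */
    double miss_penalty = 20.0; /* extra cycles to go to the lower level (assumed) */

    /* AMAT = hit time + miss rate * miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f cycles\n", amat); /* 1 + 0.1 * 20 = 3.0 cycles */
    return 0;
}
```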

Page 24:

A simple cache: Direct Mapped

The first spot in the cache index will be from the beginning of a word. The next 4 cache indexes will automatically be the next 4 bytes from main memory. Thus we are using 1-byte blocks in the cache index.

Page 25:

Direct Mapped Cache
• A direct mapped cache – a cache of fixed-size blocks, where each block holds data from main memory
• Parts of a direct mapped cache:
– Data – the actual data
– Tag – a special number for each block
– Index – the spot in the cache that holds the data

• Parts of a direct mapped cache address:
– Tag Array – the list of tags that identify what's in the cache
• A tag will tell us if the data we are looking for is in the cache
• Each cache entry will have a special, unique tag. If that tag is not in the cache, then we know it is a miss and we need to get the data from main memory
– Cache Index – the location of a block in the cache
– Block Offset – the byte location within the cache block

Page 26:

Direct Mapped Caches
• The processor uses addresses that map into the cache
• The address has special parts, just like instruction formats. With the different pieces of the address we can figure out where to find the data in the cache

• If the cache is 2^M bytes in size and the block size is 2^L bytes, then there are 2^(M-L) blocks
– If we use 32-bit addresses, then:
– The lowest L bits are the block offset
– The next (M-L) bits are the cache index
– The last (32-M) bits are the tag bits (the tag identifies the address of the data in the cache); see the sketch below
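A minimal C sketch of this address split (the concrete M = 10, L = 5 match the 1 KB cache with 32-byte blocks on the next page; the address value is illustrative):

```c
#include <stdint.h>
#include <stdio.h>

enum { M = 10, L = 5 }; /* cache size = 2^M bytes, block size = 2^L bytes */

int main(void) {
    uint32_t addr = 0x12345678; /* illustrative address */

    uint32_t offset = addr & ((1u << L) - 1);              /* lowest L bits   */
    uint32_t index  = (addr >> L) & ((1u << (M - L)) - 1); /* next (M-L) bits */
    uint32_t tag    = addr >> M;                           /* top (32-M) bits */

    printf("tag=0x%x index=%u offset=%u\n", tag, index, offset);
    return 0;
}
```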

Page 27:

Direct Mapped Cache Example
• Example: 1 KB cache with 32-byte blocks
– Cache Index = (Address % 1024) / 32
– Block Offset = Address % 32
– Tag = Address / 1024 (the tag identifies the address of the data in the cache)
– Valid Bit – says whether the data in the cache entry is good or bad

32 cache blocks * 32 bytes per block = 1024 bytes = 1 KB cache

Page 28:

Direct Mapped Cache Example

The cache tag is checked to see if the entry is actually in the cache or not. If it is not, we get the data from RAM.

Page 29:

Direct Mapped Cache Example

• Example of a Cache Miss

Page 30:

Direct Mapped Cache Example

• A Cache Hit

Page 31:

The Block Size Decision
• The goal is to find the right block size so that you get mostly cache hits, but also so that when you do miss, the penalty is not too bad

• Larger block size – better spatial locality
– But it takes longer to bring a new block into the cache
– If the block size is too big, there are too few blocks in the cache and you will get many misses again

Page 32:

A Better Cache: Associative Cache

• An N-Way Set Associative Cache works differently from the direct mapped cache.

• In the N-Way Set, there are N entries for each cache index, so it is like N direct mapped caches at the same time

• All the entries in one set are selected and then only the one with the correct Cache Tag is chosen
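A minimal C sketch of the lookup in an N-way set (sizes and structure names are illustrative, not the lecture's):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 64 /* illustrative */
#define N_WAYS   4  /* illustrative */

struct line {
    bool     valid;
    uint32_t tag;
};

static struct line cache[NUM_SETS][N_WAYS];

/* Check every way of one set; a hit needs a valid line with a matching tag. */
bool lookup(uint32_t set_index, uint32_t tag) {
    for (int way = 0; way < N_WAYS; way++) {
        struct line *l = &cache[set_index][way];
        if (l->valid && l->tag == tag)
            return true; /* cache hit */
    }
    return false;        /* cache miss: fetch the block from the lower level */
}

int main(void) {
    cache[3][1].valid = true;
    cache[3][1].tag   = 0xAB;
    printf("%s\n", lookup(3, 0xAB) ? "hit" : "miss"); /* prints "hit" */
    return 0;
}
```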

Page 33:

Pros and Cons: Set Associative Cache

• The set associative cache gives us many benefits
– Higher hit rate for the same size cache
– Fewer conflict misses
– Can have a larger cache without changing the number of bits used for the cache index

• But there are also drawbacks
– You need to compare N tags to choose the right piece of data (so we get a time delay from a MUX)
– The data is only available to use after we decide whether it's a hit or a miss
• (With direct mapped, we can assume it's a hit and, if it's not, fix the mistake)

Page 34:

Cache Questions
• Draw a 32 KB cache with 4-byte blocks that is 2-way set associative

• If you have a 256-byte direct mapped cache with 16-byte blocks, and you have the following tags in your tag array, choose which address will result in a hit in the cache: Tag array: Index 0 = 0xEF4021, Index 1 = 0xEF4022, Index 2 = 0x430322, Index 3 = 0x320933, Index 4 = 0xA34E44

1. 0x43032263
2. 0x43032202
3. 0xEF402114
4. 0xA34E4441
5. 0x32093301

Page 35:

Sources for Cache Misses
• What can cause a cache miss?
– Compulsory: when you start a computer, all the data in the cache is no good (also called a "cold start"). Nothing we can do about it
– Conflict: multiple memory locations map to the same cache spot
• You can increase the cache size, or increase the associativity
– Capacity: the cache cannot contain all the blocks needed by the program
• Increase the cache size
– Invalidation: something else changes the data (like some sort of input)

Page 36:

A Simple Chart for Cache misses

Page 37:

Replacing Blocks in Cache
• We need a way to decide how to replace blocks in the cache
– For a direct mapped cache, there is no policy, because we just throw away the block already in that spot

• For an N-way set associative cache, we have N blocks to choose from to throw away, because we'll need to make room for the new block

• This is called the Cache Block Replacement Policy

Page 38:

Cache Block Replacement Policy

• Random Replacement - hardware randomly selects a block to throw out

• First in, First Out (FIFO) – Hardware keeps a list of what came into the cache in what order. It will then throw out what came first

• Least Recently Used (LRU) – Hardware keeps track of when each block was used. The one that has not been used for the longest is deleted
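As a rough illustration of LRU, here is a minimal C sketch that tracks a per-line timestamp (real hardware uses cheaper approximations; all names and sizes are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

#define N_WAYS 4 /* illustrative */

static uint64_t last_used[N_WAYS]; /* "time" of each way's most recent use */
static uint64_t now;               /* incremented on every access */

/* Record that a way was just used. */
void touch(int way) {
    last_used[way] = ++now;
}

/* On a miss, evict the way that has gone longest without being used. */
int victim_lru(void) {
    int oldest = 0;
    for (int way = 1; way < N_WAYS; way++)
        if (last_used[way] < last_used[oldest])
            oldest = way;
    return oldest;
}

int main(void) {
    touch(2); touch(0); touch(1);
    printf("evict way %d\n", victim_lru()); /* way 3 was never used */
    return 0;
}
```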

Page 39:

Cache Write Policy

• There are a few ways we can write data to the cache as well

• Our problem is that we need to keep data in the memory and the cache the same

• Two options to do this:
– Write Back: store data only in the cache. When the cache block is replaced, write it back to memory. There is only one copy, so we must use special controls to make sure we don't make mistakes

– Write Through: write to memory and to the cache at the same time. We use a small buffer that holds copies of things before they get written to main memory, because it may take longer to write to main memory than to the cache.

Page 40:

Questions for the memory hierarchy

• Designers of memory systems need to know the answers to these questions before they start building

1. Where is a block placed in the upper level of memory?– (Block Placement)

2. How is a block found if it’s in the upper level?– (Block Identification)

3. Which block should be replaced on a miss?– (Block Replacement)

4. What happens on a write?– (Write Strategy)

Page 41:

Cache Performance
• CPU time = (CPU execution clock cycles + Memory stall clock cycles) x Clock cycle time
• Memory stall clock cycles = Memory accesses x Miss rate x Miss penalty
• We can figure out how well our cache will work with formulas like these
– Example:
• 1 instruction takes one clock cycle
• Miss penalty = 20 cycles
• Miss rate = 10%
• There are 1,000 instructions and 300 memory accesses
– Then:
• Memory stall clock cycles = 300 * 0.10 * 20 = 600 cycles
• CPU time = (1000 + 600) * 1 = 1,600 cycles to do 1,000 instructions
• This means we are spending 600 / 1,600 = 37.5% of our time on memory stalls!
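The same arithmetic as a runnable check (the numbers are the slide's example values):

```c
#include <stdio.h>

int main(void) {
    double instructions = 1000; /* each taking 1 clock cycle */
    double accesses     = 300;  /* memory accesses */
    double miss_rate    = 0.10;
    double miss_penalty = 20;   /* cycles */

    double stall_cycles = accesses * miss_rate * miss_penalty; /* 600  */
    double total_cycles = instructions + stall_cycles;         /* 1600 */

    printf("stalls = %.0f of %.0f cycles (%.1f%%)\n",
           stall_cycles, total_cycles, 100.0 * stall_cycles / total_cycles);
    return 0;
}
```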

Page 42:

How to improve cache performance

• Reduce the miss rate
– Remember the 4 sources of misses:
• Compulsory (at first, there is nothing useful in the cache)
• Capacity (can't fit everything inside the cache)
• Conflict (the stuff in the cache is not the stuff we want)
• Invalidation (nothing we can do about this)

• Reduce the miss penalty
• Reduce the time for a hit in the cache

• So can we improve cache performance with our programming? Yes!

Page 43:

Ways to Improve Cache Performance with Programming
• With instructions
– Loop interchange – change the nesting of loops to access data in an order that uses the cache wisely
– Combining loops – combine two loops that touch much of the same data and share some of the same variables

• With data in memory
– Merging arrays – put arrays together: use 1 array of a structure that holds two types of data instead of two arrays, each holding a different type of data
– Pointers – use pointers to access memory; they are small, not big blocks that need to be copied in and out of the cache

Page 44:

Loop Interchange Example
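The original slide's code is not preserved here; a minimal C sketch of the idea, with illustrative array names and sizes. The inner loop should walk consecutive addresses so it benefits from spatial locality:

```c
#define N 1024
static int x[N][N];

/* Bad: column-major traversal of a row-major array; each access
   jumps N * sizeof(int) bytes, so spatial locality is wasted. */
void sum_cols(void) {
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            x[i][j] += 1;
}

/* Good: interchange the loops so the inner loop walks consecutive
   addresses within one row. */
void sum_rows(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] += 1;
}

int main(void) {
    sum_cols();
    sum_rows();
    return 0;
}
```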

Page 45:

Loop Combining Example
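Again, the slide's code is not preserved; a minimal C sketch of combining two loops that touch the same arrays (names are illustrative):

```c
#define N 1024
static double a[N], b[N];

/* Before: two separate loops; by the time the second loop runs,
   the early elements of a[] and b[] may have left the cache. */
void separate(void) {
    for (int i = 0; i < N; i++)
        a[i] = a[i] + b[i];
    for (int i = 0; i < N; i++)
        b[i] = a[i] * 2.0;
}

/* After: one combined loop reuses a[i] and b[i] while they are still cached. */
void combined(void) {
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
        b[i] = a[i] * 2.0;
    }
}

int main(void) {
    separate();
    combined();
    return 0;
}
```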

Page 46:

Merging Arrays Example
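The slide's code is not preserved; a minimal C sketch of merging two parallel arrays into one array of structures (names are illustrative):

```c
#define N 1024

/* Before: two parallel arrays; key[i] and value[i] live in different
   memory regions, so a loop touching both streams two sets of blocks. */
static int key[N];
static int value[N];

/* After: one array of a structure, so the key and value for the same i
   sit next to each other and usually share a cache block. */
struct record {
    int key;
    int value;
};
static struct record records[N];

int main(void) {
    long sum = 0;
    for (int i = 0; i < N; i++)
        sum += records[i].key + records[i].value; /* one stream of blocks */
    (void)key; (void)value; (void)sum;
    return 0;
}
```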

Page 47:

Changing code

• A lot of the time, the compiler will change your code into a more optimized version using transformations like these. It will try hard to make sure cache misses do not happen often.

• The compiler will reorder some instructions and look at memory for possible conflicts and try to fix them

Page 48:

Summary

• The chapter about memory covers a great deal, from the way memory is built to the way it works

• There are different levels of memory that work together

• The cache is the fastest and most important memory, so we have special rules about how to make it work

• We can affect memory speed ourselves through better coding