Review EE138 – SJSU


Memory organization

Memory chips are organized into a number of locations within the IC. Each location can hold 1 bit, 4 bits, 8 bits, or even 16 bits, depending on how it is designed internally. The number of bits that each location within the memory chip can hold is always equal to the number of data pins on the chip.

How many locations exist inside a memory chip? That depends on the number of address pins. The number of locations within a memory IC always equals 2 to the power of the number of address pins. Therefore, the total number of bits that a memory chip can store is equal to the number of locations times the number of data bits per location.

To summarize:
1. A memory chip contains 2^x locations, where x is the number of address pins.
2. Each location contains y bits, where y is the number of data pins on the chip.
3. The entire chip will contain 2^x × y bits, where x is the number of address pins and y is the number of data pins on the chip.
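The summary above can be sketched as two small C helpers. This is only an illustration: the function names mem_locations and mem_capacity_bits are hypothetical, and the sketch assumes fewer than 64 address pins so the shift stays in range.

```c
#include <stdint.h>

/* Number of locations in a chip with `address_pins` address pins: 2^x.
   Assumes address_pins < 64 (hypothetical helper, not from the slides). */
uint64_t mem_locations(unsigned address_pins)
{
    return (uint64_t)1 << address_pins;
}

/* Total capacity in bits: 2^x locations times y data bits per location. */
uint64_t mem_capacity_bits(unsigned address_pins, unsigned data_pins)
{
    return mem_locations(address_pins) * data_pins;
}
```

For instance, a chip with 12 address pins and 4 data pins gives mem_locations(12) = 4096 and mem_capacity_bits(12, 4) = 16384 bits (16K bits).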


EXAMPLES:

1) A given memory chip has 12 address pins and 4 data pins. Find: (a) the organization, and (b) the capacity.

Solution:
(a) This memory chip has 4,096 locations (2^12 = 4,096), and each location can hold 4 bits of data. This gives an organization of 4,096 × 4, often represented as 4K × 4.
(b) The capacity is equal to 16K bits since there is a total of 4K locations and each location can hold 4 bits of data.

2) A 512K memory chip has 8 pins for data. Find: (a) the organization, and (b) the number of address pins for this memory chip.

Solution:
(a) A memory chip with 8 data pins means that each location within the chip can hold 8 bits of data. To find the number of locations within this memory chip, divide the capacity by the number of data pins. 512K/8 = 64K; therefore, the organization for this memory chip is 64K × 8.
(b) The chip has 16 address lines since 2^16 = 64K.


Packaging issue in DRAM

Using the conventional method of data access, a 64K-bit chip (64K × 1) must have 16 address lines and 1 data line.

To reduce the number of pins needed for addresses, multiplexing/demultiplexing is used. The method used is to split the address in half and send in each half of the address through the same pins, thereby requiring fewer address pins. Internally, the DRAM structure is divided into a square of rows and columns. The first half of the address is called the row and the second half is called the column.

For example, in the case of DRAM of 64K × 1 organization, the first half of the address is sent in through the 8 pins A0–A7, and by activating RAS (row address strobe), the internal latches inside DRAM grab the first half of the address. After that, the second half of the address is sent in through the same pins, and by activating CAS (column address strobe), the internal latches inside DRAM latch the second half of the address. This results in using 8 pins for addresses plus RAS and CAS, for a total of 10 pins, instead of the 16 pins that would be required without multiplexing.
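The row/column split described for the 64K × 1 DRAM can be sketched in C as follows. The helper names are hypothetical, and treating the upper 8 address bits as the row (the half sent first) is an assumption made for illustration; the slides only say the address is split in half.

```c
#include <stdint.h>

/* Row half of a 16-bit address (assumed here to be the upper 8 bits),
   the half latched when RAS is activated. */
uint8_t dram_row(uint16_t addr)
{
    return (uint8_t)(addr >> 8);
}

/* Column half (assumed here to be the lower 8 bits), latched on CAS. */
uint8_t dram_col(uint16_t addr)
{
    return (uint8_t)(addr & 0xFF);
}

/* Multiplexed address pins needed for a given number of address bits:
   half the bits, rounded up. RAS and CAS are counted separately. */
unsigned dram_address_pins(unsigned address_bits)
{
    return (address_bits + 1) / 2;
}
```

With 16 address bits, dram_address_pins(16) returns 8, which together with RAS and CAS gives the 10 pins mentioned above.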


EXAMPLES:

Discuss the number of pins set aside for addresses in each of the following memory chips.
(a) 16K × 4 DRAM
(b) 16K × 4 SRAM

Solution:
Since 2^14 = 16K:

(a) For DRAM we have 7 pins (A0–A6) for the address pins and 2 pins for RAS and CAS. Total 9 pins for address.

(b) For SRAM we have 14 pins for address and no pins for RAS and CAS since they are associated only with DRAM. In both cases we have 4 pins for the data bus.


Inside CPU

A program stored in memory provides instructions to the CPU to perform an action. The function of the CPU is to fetch these instructions from memory and execute them. To perform the actions of fetch and execute, all CPUs are equipped with resources such as the following:

1. Registers: The CPU uses registers to store information temporarily. Registers inside the CPU can be 8-bit, 16-bit, 32-bit, or even 64-bit registers, depending on the CPU.

2. ALU (arithmetic/logic unit): The ALU is responsible for performing arithmetic functions such as add, subtract, multiply, and divide, and logic functions such as AND, OR, and NOT.

3. Program Counter: The program counter points to the address of the next instruction to be executed. As each instruction is executed, the program counter is incremented to point to the address of the next instruction.

4. Instruction Decoder: The instruction decoder interprets the instruction fetched into the CPU. One can think of the instruction decoder as a kind of dictionary, storing the meaning of each instruction and what steps the CPU should take upon receiving a given instruction.


Harvard and von Neumann architectures

Every microprocessor must have memory space to store program (code) and data. While code provides instructions to the CPU, the data provides the information to be processed. The CPU uses buses (wire traces) to access the code ROM and data RAM memory spaces.

• von Neumann (Princeton) architecture uses the same bus for accessing both the code and data. Accessing code and data over one bus can cause them to get in each other’s way and slow down the CPU, because each has to wait for the other to finish fetching.

• Harvard architecture speeds up the process of program execution by using separate buses for the code and data memory. This requires four sets of buses:
  • A set of data buses for carrying data into and out of the CPU.
  • A set of address buses for accessing the data.
  • A set of data buses for carrying code into the CPU.
  • An address bus for accessing the code.

This is easy to implement inside an IC chip such as a microcontroller where both ROM code and data RAM are internal (on-chip) and distances are on the micron and millimeter scale.


Mega AVR (ATmegaxxxx) Family

These are powerful microcontrollers with more than 120 instructions and lots of different peripheral capabilities, which can be used in different designs. See Table 1-3. Some of their characteristics are as follows:
• Program memory: 4K to 256K bytes
• Package: 28 to 100 pins
• Extensive peripheral set
• Extended instruction set: They have rich instruction sets.


THE AVR DATA MEMORY

In AVR microcontrollers there are two kinds of memory space: code memory space and data memory space. Our program is stored in code memory space, whereas the data memory stores data. The data memory is composed of three parts: GPRs (general purpose registers), I/O memory, and internal data SRAM.


C Programming Example

Write an AVR C program to send values 00–FF to Port B.

Solution:

    #include <avr/io.h>             //standard AVR header
    int main(void)
    {
        unsigned char z;
        DDRB = 0xFF;                //PORTB is output
        for (z = 0; z <= 255; z++)
            PORTB = z;
        return 0;
    }
    //Notice that the program never exits the for loop because if you
    //increment an unsigned char variable when it is 0xFF, it will
    //become zero.


Write an AVR C program to get a byte of data from Port B, and then send it to Port C.

Solution:

    #include <avr/io.h>             //standard AVR header
    int main(void)
    {
        unsigned char temp;
        DDRB = 0x00;                //Port B is input
        DDRC = 0xFF;                //Port C is output
        while (1)
        {
            temp = PINB;
            PORTC = temp;
        }
        return 0;
    }


Write an AVR C program to toggle only bit 4 of Port B continuously without disturbing the rest of the pins of Port B.

Solution:

    #include <avr/io.h>                     //standard AVR header
    int main(void)
    {
        DDRB = 0xFF;                        //PORTB is output
        while (1)
        {
            PORTB = PORTB | 0b00010000;     //set bit 4 (5th bit) of PORTB
            PORTB = PORTB & 0b11101111;     //clear bit 4 (5th bit) of PORTB
        }
        return 0;
    }


Write an AVR C program to monitor bit 5 of Port C. If it is HIGH, send 55H to Port B; otherwise, send AAH to Port B.

Solution:

    #include <avr/io.h>             //standard AVR header
    int main(void)
    {
        DDRB = 0xFF;                //PORTB is output
        DDRC = 0x00;                //PORTC is input
        DDRD = 0xFF;                //PORTD is output (not used in this example)
        while (1)
        {
            if (PINC & 0b00100000)  //check bit 5 (6th bit) of PINC
                PORTB = 0x55;
            else
                PORTB = 0xAA;
        }
        return 0;
    }


Find the contents of PORTC after execution of the following code:

    PORTC = 0;
    PORTC = PORTC | 0x99;
    PORTC = ~PORTC;

Solution: 66H

Find the contents of PORTC after execution of the following code:

    PORTC = ~(0<<3);

Solution: FFH
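Both answers can be checked on a PC with a small sketch that stands in a uint8_t variable for the 8-bit PORTC register. The variable and function names are hypothetical, and this only models the arithmetic, not the real I/O register.

```c
#include <stdint.h>

/* Models: PORTC = 0; PORTC = PORTC | 0x99; PORTC = ~PORTC; */
uint8_t portc_or_then_invert(void)
{
    uint8_t portc = 0;          /* stand-in for the 8-bit PORTC register */
    portc = portc | 0x99;       /* 0x00 | 0x99 = 0x99 */
    portc = (uint8_t)~portc;    /* ~0x99, truncated to 8 bits */
    return portc;
}

/* Models: PORTC = ~(0<<3); */
uint8_t portc_invert_shifted_zero(void)
{
    /* 0 shifted left is still 0; inverting gives all ones, truncated to 8 bits */
    uint8_t portc = (uint8_t)~(0 << 3);
    return portc;
}
```

The casts make the truncation to 8 bits explicit: in C the ~ operator works on int, so the upper bits are discarded only when the result is stored back into an 8-bit location.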


TIMER IN AVR

Normal mode

In this mode, the content of the timer/counter increments with each clock. It counts up until it reaches its max of 0xFF. When it rolls over from 0xFF to 0x00, it sets high a flag bit called TOV0 (Timer Overflow). This timer flag can be monitored.


CTC Mode

The OCR0 register is used with CTC mode. As with the Normal mode, in the CTC mode, the timer is incremented with a clock. But it counts up until the content of the TCNT0 register becomes equal to the content of OCR0 (compare match occurs); then, the timer will be cleared and the OCF0 flag will be set when the next clock occurs. The OCF0 flag is located in the TIFR register.


EXAMPLES:

1) In Normal mode, when the counter rolls over it goes from ____ to ____.

2) In CTC mode, the counter rolls over when the counter reaches ____.

3) To get a 5-ms delay, what numbers should be loaded into TCNT1H and TCNT1L using Normal mode and the TOV1 flag? Assume that XTAL = 8 MHz.

4) To get a 20-μs delay, what number should be loaded into the TCNT0 register using Normal mode and the TOV0 flag? Assume that XTAL = 1 MHz.

Solutions:

1) Max ($FFFF for 16-bit timers and $FF for 8-bit timers), 0000

2) OCR1A (for Timer1; OCR0 for Timer0)

3) $10000 - (5000 × 8) = 65536 - 40000 = 25536 = $63C0, so TCNT1H = 0x63 and TCNT1L = 0xC0

4) XTAL = 1 MHz, so T(machine cycle) = 1/1 MHz = 1 μs, and 20 μs / 1 μs = 20 clocks. -20 = $100 - 20 = 256 - 20 = 236 = 0xEC
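The preload arithmetic in answers 3 and 4 can be sketched as a C helper. The name timer_preload is hypothetical, and the sketch assumes no prescaler (one timer tick per clock), a timer width of 8 or 16 bits, and a delay short enough to fit in one overflow period.

```c
#include <stdint.h>

/* Value to load into the timer so that it overflows after `delay_us`
   microseconds in Normal mode. Assumes no prescaler and that the number
   of ticks does not exceed 2^timer_bits. */
uint32_t timer_preload(uint32_t delay_us, uint32_t clk_mhz, unsigned timer_bits)
{
    uint32_t ticks = delay_us * clk_mhz;            /* clocks needed for the delay */
    uint32_t modulus = (uint32_t)1 << timer_bits;   /* 0x100 or 0x10000 */
    return modulus - ticks;
}
```

For the 16-bit case, the high and low bytes for TCNT1H and TCNT1L are the result shifted right by 8 and the result masked with 0xFF, respectively.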


TIMER INTERRUPTS


BASICS OF SERIAL COMMUNICATION

Serial data communication uses two methods, asynchronous and synchronous. The synchronous method transfers a block of data (characters) at a time, whereas the asynchronous method transfers a single byte at a time.


Asynchronous Serial Communication

In the asynchronous method, each data character is placed between start and stop bits. This is called framing. The start bit is always a single bit and is always 0 (low), but the stop bit is 1 (high) and can be one or two bits long. When there is no transfer, the signal is 1 (high), which is referred to as mark or idle. Data bit D0 (LSB) goes first, followed by the rest of the bits up to the MSB (D7). Example: the ASCII character “A” (8-bit binary 0100 0001) is framed between the start bit and a single stop bit.


Example:
a) Find the overhead due to framing when transmitting the ASCII letter “A” (01000001).
b) Calculate the time it takes to transfer 10,000 characters as in question a) if we use 9600 bps. What percentage of time is wasted due to overhead?

Solutions:
a) 2 bits (one for the start bit and one for the stop bit). Therefore, for each 8-bit character, a total of 10 bits is transferred.
b) 10,000 × 10 = 100,000 total bits transmitted. 100,000 / 9600 = 10.4 seconds; 2 / 10 = 20%.
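The framing arithmetic above can be sketched in C. The function names are hypothetical; the sketch assumes one start bit, no parity bit, and a configurable number of data and stop bits.

```c
/* Total bits on the line for `chars` characters, each framed by
   one start bit and `stop_bits` stop bits (no parity assumed). */
unsigned long framed_bits(unsigned long chars, unsigned data_bits, unsigned stop_bits)
{
    return chars * (1UL + data_bits + stop_bits);
}

/* Time in seconds to send `total_bits` at `bps` bits per second. */
double transfer_seconds(unsigned long total_bits, unsigned long bps)
{
    return (double)total_bits / (double)bps;
}

/* Fraction of each frame spent on start and stop bits. */
double framing_overhead(unsigned data_bits, unsigned stop_bits)
{
    return (double)(1 + stop_bits) / (double)(1 + data_bits + stop_bits);
}
```

With 8 data bits and 1 stop bit, framed_bits(10000, 8, 1) gives 100,000 bits, and framing_overhead(8, 1) gives 0.2, matching the 20% figure.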


Baud Rate in the AVR

In the AVR microcontroller five registers are associated with the USART. They are UDR (USART Data Register), UCSRA, UCSRB, UCSRC (USART Control Status Register), and UBRR (USART Baud Rate Register).

Desired Baud Rate = Fosc/ (16(X + 1))

where X is the value we load into the UBRR register. To get the X value for different baud rates we can solve the equation as follows:

X = (Fosc/ (16(Desired Baud Rate))) – 1

Assuming that Fosc = 8 MHz, we have the following:

Desired Baud Rate = Fosc/ (16(X + 1)) = 8 MHz/ (16(X + 1)) = 500 kHz/ (X + 1)

X = (500 kHz/ Desired Baud Rate) – 1


Examples:

1) Find the baud rate if UBRR = 67H = 103.

Solution:
Desired Baud Rate = Fosc/ (16(X + 1)) = 8 MHz/ (16(103 + 1)) = 4807 bps

2) Find the UBRR value needed to have the following baud rates: (a) 9600 (b) 1200 for Fosc = 8 MHz.

Solution:
Fosc = 8 MHz => X = (8 MHz/ (16(Desired Baud Rate))) - 1 => X = (500 kHz/ (Desired Baud Rate)) - 1
(a) (500 kHz/ 9600) - 1 = 52.08 - 1 = 51.08 = 51 = 33 (hex) is loaded into UBRR
(b) (500 kHz/ 1200) - 1 = 416.66 - 1 = 415.66 = 415 = 19F (hex) is loaded into UBRR
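The two directions of the baud rate formula can be sketched with integer arithmetic in C. The names ubrr_value and baud_from_ubrr are hypothetical helpers; the sketch uses the divide-by-16 formula given above, and integer division drops the fractional part just as the worked examples do.

```c
#include <stdint.h>

/* X = Fosc / (16 * baud) - 1, with the fractional part dropped. */
uint32_t ubrr_value(uint32_t fosc_hz, uint32_t baud)
{
    return fosc_hz / (16u * baud) - 1u;
}

/* Baud rate = Fosc / (16 * (X + 1)), truncated to a whole number. */
uint32_t baud_from_ubrr(uint32_t fosc_hz, uint32_t ubrr)
{
    return fosc_hz / (16u * (ubrr + 1u));
}
```

For Fosc = 8 MHz, ubrr_value(8000000, 9600) gives 51 (0x33) and ubrr_value(8000000, 1200) gives 415 (0x19F).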


Baud Rate Generation Block Diagram

Doubling the baud rate in the AVR

There are two ways to increase the baud rate of data transfer in the AVR:
1. Use a higher-frequency crystal (not feasible in many cases).
2. Change a bit in the UCSRA register (U2X = 1).

Desired Baud Rate = Fosc / (8 (X + 1)) when U2X = 1


Baud Rate Error Calculation

In calculating the baud rate we have used the integer number for the UBRR register values because AVR microcontrollers can only use integer values. By dropping the decimal portion of the calculated values we run the risk of introducing error into the baud rate. One way to calculate this error is as follows:

Error = (Calculated value for the UBRR - Integer part) / Integer part

For example, with XTAL = 8 MHz and U2X = 0 we have the following for the 9600 baud rate:

UBRR value = (500,000/ 9600) - 1 = 52.08 - 1 = 51.08 = 51
=> Error = (51.08 - 51)/ 51 = 0.16%
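The error formula above can be sketched as a C function. The name ubrr_error_percent is hypothetical; the sketch assumes the truncated UBRR value is nonzero and uses a divisor of 16 for U2X = 0 or 8 for U2X = 1.

```c
/* Error (in percent) from dropping the fractional part of the UBRR value:
   (calculated - integer part) / integer part * 100.
   Assumes the integer part is not zero. */
double ubrr_error_percent(double fosc_hz, double baud, int u2x)
{
    double divisor = u2x ? 8.0 : 16.0;
    double calculated = fosc_hz / (divisor * baud) - 1.0;
    double integer_part = (double)(long)calculated;   /* truncate toward zero */
    return (calculated - integer_part) / integer_part * 100.0;
}
```

For Fosc = 8 MHz, 9600 baud, and U2X = 0, this gives roughly 0.16%. A crystal such as 7.3728 MHz divides evenly and gives 0%.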


Examples:

Given: XTAL = 7.3728 MHz.
a) What value should be loaded into UBRR to have a 9600 baud rate for U2X = 0, 1? Give the answers in both decimal and hex.
b) What are the baud rate errors in a)?

Solutions:
a) U2X = 0: (Fosc/ (16(baud rate))) - 1 = (7372800/ (16(9600))) - 1 = 48 - 1 = 47 or 2FH
U2X = 1: (Fosc/ (8(baud rate))) - 1 = (7372800/ (8(9600))) - 1 = 96 - 1 = 95 or 5FH

b) 0% (both calculated values are exact integers, so nothing is dropped)


Memory and I/O Systems

Computer system performance depends on the memory system as well as the processor microarchitecture. Early processors were relatively slow, so memory was able to keep up. But processor speed has increased at a faster rate than memory speeds. DRAM memories are currently 10 to 100 times slower than processors. The increasing gap between processor and DRAM memory speeds demands increasingly ingenious memory systems to try to approximate a memory that is as fast as the processor.


Diverging processor and memory performance
Adapted with permission from Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 5th ed., Morgan Kaufmann, 2012.


Cache Memory

To counteract this trend, computers store the most commonly used instructions and data in a faster but smaller memory, called a cache. The cache is usually built out of SRAM on the same chip as the processor. The cache speed is comparable to the processor speed, because SRAM is inherently faster than DRAM, and because the on-chip memory eliminates lengthy delays caused by traveling to and from a separate chip.

Cache Hit and Cache Miss

If the processor requests data that is available in the cache, it is returned quickly. This is called a cache hit. Otherwise, the processor retrieves the data from main memory (DRAM). This is called a cache miss. If the cache hits most of the time, then the processor seldom has to wait for the slow main memory, and the average access time is low.


Memory Hierarchy Computer System

The processor first seeks data in a small but fast cache that is usually located on the same chip. If the data is not available in the cache, the processor then looks in main memory. If the data is not there either, the processor fetches the data from virtual memory on the large but slow hard disk.


Memory Hierarchy Components with typical characteristics in 2012


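MEMORY SYSTEM PERFORMANCE ANALYSIS

Memory system performance metrics are miss rate or hit rate and average memory access time.

Miss and Hit rate calculation:

Miss Rate = Number of misses / Number of total memory accesses = 1 - Hit Rate
Hit Rate = Number of hits / Number of total memory accesses = 1 - Miss Rate

Average memory access time (AMAT) is the average time a processor must wait for memory per load or store instruction.

AMAT calculation:

AMAT = t_cache + MR_cache (t_MM + MR_MM × t_VM)

where t_cache, t_MM, and t_VM are the access times of the cache, main memory, and virtual memory, and MR_cache and MR_MM are the cache and main memory miss rates.

Note: In the typical computer system, the processor first looks for the data in the cache. If the cache misses, the processor then looks in main memory. If the main memory misses, the processor accesses virtual memory on the hard disk.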


CALCULATING CACHE PERFORMANCE

1) Suppose a program has 2000 data access instructions (loads or stores), and 1250 of these requested data values are found in the cache. The other 750 data values are supplied to the processor by main memory or disk memory. What are the miss and hit rates for the cache?

Solution: The miss rate is 750/2000 = 0.375 = 37.5%. The hit rate is 1250/2000 = 0.625 = 1 - 0.375 = 62.5%.

CALCULATING AVERAGE MEMORY ACCESS TIME

2) Suppose a computer system has a memory organization with only two levels of hierarchy, a cache and main memory. What is the average memory access time given the access times and miss rates below?

Memory Level    Access Time (Cycles)    Miss Rate
Cache           1                       10%
Main Memory     100                     0%

Solution: The average memory access time is 1 + 0.1(100) = 11 cycles.
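Both calculations above can be sketched as small C helpers. The function names are hypothetical, and amat_cache_mm covers only the two-level case in example 2 (a cache backed by a main memory that never misses).

```c
/* Miss rate = misses / total accesses. */
double miss_rate(unsigned long misses, unsigned long accesses)
{
    return (double)misses / (double)accesses;
}

/* Hit rate = 1 - miss rate. */
double hit_rate(unsigned long misses, unsigned long accesses)
{
    return 1.0 - miss_rate(misses, accesses);
}

/* AMAT for a cache backed by main memory:
   cache access time plus the miss rate times the main memory access time. */
double amat_cache_mm(double t_cache, double mr_cache, double t_mm)
{
    return t_cache + mr_cache * t_mm;
}
```

With 750 misses out of 2000 accesses, miss_rate gives 0.375 and hit_rate gives 0.625; amat_cache_mm(1, 0.10, 100) gives about 11 cycles.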


Data Held in the Cache

The cache exploits temporal and spatial locality to achieve a low miss rate.

Temporal locality means that the processor is likely to access a piece of data again soon if it has accessed that data recently. Therefore, when the processor loads or stores data that is not in the cache, the data is copied from main memory into the cache. Subsequent requests for that data hit in the cache.

Spatial locality means that, when the processor accesses a piece of data, it is also likely to access data in nearby memory locations. Therefore, when the cache fetches one word from memory, it may also fetch several adjacent words. This group of words is called a cache block or cache line. The number of words in the cache block, b, is called the block size. A cache of capacity C contains B = C/b blocks.


The principles of temporal and spatial locality have been experimentally verified in real programs.

If a variable is used in a program, the same variable is likely to be used again, creating temporal locality.

If an element in an array is used, other elements in the same array are also likely to be used, creating spatial locality.


Advanced Cache Design

Modern systems use multiple levels of caches to decrease memory access time and thereby improve system performance.

Multiple-Level Caches

Large caches are beneficial because they are more likely to hold data of interest and therefore have lower miss rates. However, large caches tend to be slower than small ones. Modern systems often use at least two levels of caches. The first-level (L1) cache is small enough to provide a one- or two-cycle access time. The second-level (L2) cache is also built from SRAM but is larger, and therefore slower, than the L1 cache. The processor first looks for the data in the L1 cache. If the L1 cache misses, the processor looks in the L2 cache. If the L2 cache misses, the processor fetches the data from main memory. Many modern systems add even more levels of cache to the memory hierarchy, because accessing main memory is so slow.


Memory Hierarchy with Two Levels of Cache


SYSTEM WITH AN L2 CACHE

Given a system using two levels of cache, what is the average memory access time (AMAT) for the access times and miss rates below?

Memory Level    Access Time (Cycles)    Miss Rate
Cache L1        1                       5%
Cache L2        10                      20%
Main Memory     100                     0%

Solution: Each memory access checks the L1 cache. When the L1 cache misses (5% of the time), the processor checks the L2 cache. When the L2 cache misses (20% of the time), the processor fetches the data from main memory.

AMAT = t_cache + MR_cache (t_L2cache + MR_L2cache × t_MM)

AMAT = 1 cycle + 0.05[10 cycles + 0.2(100 cycles)] = 2.5 cycles

The L2 miss rate is high because it receives only the “hard” memory accesses, those that miss in the L1 cache. If all accesses went directly to the L2 cache, the L2 miss rate would be about 1%.
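The nested AMAT formula extends to any number of levels by evaluating it from the slowest level back toward the L1 cache. The following is a minimal sketch; the function name amat_levels and the array layout (index 0 is the level closest to the processor) are assumptions for illustration.

```c
#include <stddef.h>

/* Average memory access time for a hierarchy with `levels` levels.
   access_time[i] and miss_rate[i] describe level i, with level 0 being
   the L1 cache and the last level assumed to have a miss rate of 0. */
double amat_levels(const double *access_time, const double *miss_rate, size_t levels)
{
    double total = 0.0;
    /* Start at the last level and fold each faster level in front of it:
       total = t[i] + MR[i] * (cost of everything below level i). */
    for (size_t i = levels; i-- > 0; )
        total = access_time[i] + miss_rate[i] * total;
    return total;
}
```

For the table above, calling amat_levels with access times {1, 10, 100} and miss rates {0.05, 0.20, 0.0} evaluates 1 + 0.05(10 + 0.2(100)), about 2.5 cycles.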


Endianness

• Big-endian: The most significant byte of the word is stored at the smallest address and the least significant byte is stored at the largest.

• Little-endian: The least significant byte is stored at the smallest address.

Today, big-endian is generally used in computer networks, and little-endian in microprocessors. Example: Intel processors use the little-endian system and IBM computer networks use the big-endian system.
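The two byte orders can be illustrated by storing a 32-bit word into a byte array explicitly, where index 0 plays the role of the smallest address. The function names below are hypothetical, and host_is_little_endian is a common probing idiom whose result depends on the machine running it.

```c
#include <stdint.h>

/* Big-endian: most significant byte at the smallest address (index 0). */
void store_big_endian(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)(word >> 24);
    out[1] = (uint8_t)(word >> 16);
    out[2] = (uint8_t)(word >> 8);
    out[3] = (uint8_t)word;
}

/* Little-endian: least significant byte at the smallest address (index 0). */
void store_little_endian(uint32_t word, uint8_t out[4])
{
    out[0] = (uint8_t)word;
    out[1] = (uint8_t)(word >> 8);
    out[2] = (uint8_t)(word >> 16);
    out[3] = (uint8_t)(word >> 24);
}

/* Returns 1 if the machine running this code stores the least
   significant byte first, 0 otherwise. */
int host_is_little_endian(void)
{
    uint32_t probe = 1;
    return *(const uint8_t *)&probe == 1;
}
```

Storing 0x12345678 big-endian yields the bytes 12 34 56 78 in increasing address order, and little-endian yields 78 56 34 12.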


There are several possible methods for determining where memory blocks are placed in the cache. Data is usually stored in cache in one of three schemes, each of which divides the address into different fields:

direct mapped: Tag | Block/Line | Offset

associative: Tag | Offset

set associative: Tag | Set | Offset


Cache and Memory Working Together
http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html

Let's try to put together some examples of simultaneous TLB and L1 cache lookups. For example, let's look at the simplest case: we'll make both the TLB and the L1 cache direct-mapped. Let's assume the following specifications:

Virtual Memory
  Address width: 32 bits
  Page size: 1 K bytes
  Single level page table

Physical Memory
  32 bit physical address space

Cache
  Block size: 16 bytes
  Cache size: 1 K bytes
  Associativity: Direct mapped

Translation Lookaside Buffer
  Number of translations: 64
  Associativity: Direct mapped


Cache
◦ The block size is 16 bytes, so the byte offset field is 4 bits.
◦ The total size of the cache is 1K, so there are 1K/16 = 64 blocks. Since it's direct mapped we've got a six bit index field.
◦ We've used up 10 bits; since the physical address is 32 bits that tells us that we've got a 22 bit tag.
◦ So this looks like the following:

Tag      Cache Index    Byte Offset
31-10    9-4            3-0

Virtual Memory
◦ The page size is 1K, so the byte offset field is 10 bits.
◦ That leaves us a 22 bit virtual page number.

Virtual Page Number    Byte Offset
31-10                  9-0

TLB
◦ We get the field breakdown for the TLB by further dividing the VPN.
◦ Since we've got 64 translations and a direct-mapped organization, the 22 bit VPN gets divided into:

TLB Tag    TLB Index
31-16      15-10


TLB Tag    TLB Index    Cache Index    Byte Offset
31-16      15-10        9-4            3-0

Let's put specific numbers on this: we'll try to read one byte from virtual address 0x1234abcd.

1. The byte offset field contains d (bits 3-0 of the address).
2. The cache index field contains 3c (bits 9-4 of the address).
3. The TLB index field contains 2a (bits 15-10 of the address).
4. The TLB tag field contains 1234 (bits 31-16 of the address).
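The field breakdown above can be reproduced with shifts and masks. This is a minimal sketch for the specific layout in this example (4-bit byte offset, 6-bit cache index, 6-bit TLB index, 16-bit TLB tag); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a 32-bit virtual address for the example configuration. */
typedef struct {
    uint32_t tlb_tag;      /* bits 31-16 */
    uint32_t tlb_index;    /* bits 15-10 */
    uint32_t cache_index;  /* bits 9-4   */
    uint32_t byte_offset;  /* bits 3-0   */
} vaddr_fields;

/* Split a virtual address into the four fields by shifting each field
   down to bit 0 and masking off its width. */
vaddr_fields split_vaddr(uint32_t va)
{
    vaddr_fields f;
    f.byte_offset = va & 0xFu;            /* 4 bits */
    f.cache_index = (va >> 4) & 0x3Fu;    /* 6 bits */
    f.tlb_index   = (va >> 10) & 0x3Fu;   /* 6 bits */
    f.tlb_tag     = va >> 16;             /* remaining 16 bits */
    return f;
}
```

Calling split_vaddr(0x1234abcd) yields byte offset 0xd, cache index 0x3c, TLB index 0x2a, and TLB tag 0x1234, matching the values listed above.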

So now we go through the following steps:
1. We look up translation 2a in the TLB and cache line 3c in the cache.
2. We obtain the TLB tag from the TLB and the cache tag from the cache.
3. We ask whether:
   1. The TLB entry is valid.
   2. The TLB tag is 1234 (that's the TLB tag from our virtual address).
   3. We have permissions to perform the requested access.
   4. The cache entry is valid.
   5. The cache tag from the cache entry matches the cache tag from the TLB entry.
4. If the answer to all of the questions in Step 3 was "yes", we've both got a valid translation and a cache hit. We can either obtain our data from the cache or write our value to the cache.


Transfers Between Cache and Memory
http://www.cs.nmsu.edu/~pfeiffer/classes/473/notes/cache-vm.html

We have two competing requirements:
1. we'd like to bring an entire cache line in from memory in one transfer (for bandwidth), but
2. we want to have as few data lines as possible (for cost).

There are really three feasible solutions here:
1. The fastest (but most expensive) approach is to use a memory bus that's as wide as a cache line. Now, any time you have a miss, you can just do a single memory transfer.
2. The cheapest (but slowest) approach is to use a memory bus that's narrower than a cache line; then, on a miss, we take several memory transfers to bring the whole line in.
3. The third approach is a compromise between the first two: use the narrower bus from the second approach, but find a way to overlap the memory accesses. The traditional way to implement this approach was to have several distinct memory modules: you'd start a read from each of them in turn, and the data would arrive from them on consecutive cycles.

The current solution to this problem is to use fast page DRAM or synchronous DRAM. With both of these technologies, we can make a transfer from the internal DRAM cells (comparatively slow) into some substantially faster static memory on the memory chip, and then transfer the data from the static memory much more quickly than we could from DRAM.

PC100 and PC133 SDRAM uses four transfers of 64 bits each to fill a cache line on a system with a 32 byte cache line.


References:

The AVR Microcontroller and Embedded Systems: Using Assembly and C. Muhammad Ali Mazidi; Sarmad Naimi; Sepehr Naimi

Digital Design and Computer Architecture, 2nd Edition. David Harris; Sarah Harris

Computer Organization and Embedded Systems, 6th Edition. Carl Hamacher; Zvonko Vranesic; Safwat Zaky; Naraig Manjikian