Computer Organization Rabie A. Ramadan Lecture 9


Page 1: Computer Orgnization

Computer Organization

Rabie A. Ramadan

Lecture 9

Page 2: Computer Orgnization

Cache Mapping Schemes

Page 3: Computer Orgnization

Cache Mapping Schemes

Cache memory is smaller than the main memory

Only a few blocks can be loaded into the cache at a time

The cache does not use the same memory addresses

Which block in the cache is equivalent to which block in the memory?

• The processor uses the Memory Management Unit (MMU) to convert the requested memory address to a cache address

Page 4: Computer Orgnization

Direct Mapping

Assigns cache mappings using a modular approach

j = i mod n

• j: cache block number

• i: memory block number

• n: number of cache blocks

[Figure: memory blocks mapped onto cache blocks]

Page 5: Computer Orgnization

Example

Given M memory blocks to be mapped to 10 cache blocks, show the direct mapping scheme.

How do you know which block is currently in the cache?
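The mapping in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the choice of 30 memory blocks is arbitrary because M is not fixed in the example. The quotient i // n is used here as the tag that tells which memory block currently occupies a cache block.

```python
NUM_CACHE_BLOCKS = 10  # n, taken from the example


def cache_block(i, n=NUM_CACHE_BLOCKS):
    """Direct mapping: memory block i goes to cache block j = i mod n."""
    return i % n


def tag(i, n=NUM_CACHE_BLOCKS):
    """Quotient that distinguishes the memory blocks sharing one cache block."""
    return i // n


# Group the first 30 memory blocks (arbitrary, for illustration only)
# by the cache block they map to.
mapping = {}
for i in range(30):
    mapping.setdefault(cache_block(i), []).append(i)

for j in sorted(mapping):
    print(f"cache block {j}: memory blocks {mapping[j]}")
```

Memory blocks 3, 13, and 23 all compete for cache block 3, and only the stored tag (0, 1, or 2) says which of them is present.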

Page 6: Computer Orgnization

Direct Mapping (Cont.)

Bits in the main memory address are divided into three fields:

• Word: identifies the specific word within the block

• Block: identifies a unique block in the cache

• Tag: identifies which main memory block is currently in the cache

Page 7: Computer Orgnization

Example

Consider a main memory consisting of 4K blocks, a cache memory consisting of 128 blocks, and a block size of 16 words. Show the direct mapping and the main memory address format.

[Figure: main memory address format with Tag, Block, and Word fields]

Page 8: Computer Orgnization

Example (Cont.)
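The field widths for the example with 4K memory blocks, 128 cache blocks, and 16 words per block can be worked out as below. This is a minimal sketch rather than the slide's own figure: the function names are hypothetical, and it assumes a word-addressable memory whose sizes are all powers of two.

```python
def direct_mapped_fields(memory_blocks, cache_blocks, words_per_block):
    """Return the bit widths of the Tag, Block, and Word fields.

    Assumes every size is a power of two.
    """
    word_bits = (words_per_block - 1).bit_length()
    block_bits = (cache_blocks - 1).bit_length()
    # The memory block number needs log2(memory_blocks) bits in total;
    # the low block_bits of it select the cache block, the rest is the tag.
    tag_bits = (memory_blocks - 1).bit_length() - block_bits
    return {"tag": tag_bits, "block": block_bits, "word": word_bits}


def split_address(addr, block_bits, word_bits):
    """Split a word address into (tag, block, word)."""
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = addr >> (word_bits + block_bits)
    return tag, block, word


fields = direct_mapped_fields(4096, 128, 16)
print(fields)                      # tag 5, block 7, word 4: a 16-bit address
print(split_address(6233, 7, 4))   # arbitrary address chosen for illustration
```

With these widths the address is 5 + 7 + 4 = 16 bits, which matches 4K blocks of 16 words (64K words).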

Page 9: Computer Orgnization

Direct Mapping

Advantages

• Easy to implement

• Does not require any search technique to find a block in the cache

• Replacement is straightforward

Disadvantages

• Many blocks in main memory (MM) are mapped to the same cache block

• Other cache blocks may remain empty at the same time

• Poor cache utilization

Page 10: Computer Orgnization

Group Activity

Consider the case of a main memory consisting of 4K blocks, a cache memory consisting of 8 blocks, and a block size of 4 words. Show the direct mapping and the main memory address format.

Page 11: Computer Orgnization

Given the following direct mapping chart, what are the cache and memory locations required by the following addresses?

31 126 3

4 20 2

Page 12: Computer Orgnization

Fully Associative Mapping

Allows any memory block to be placed anywhere in the cache

A search technique is required to find the block number in the tag field

Page 13: Computer Orgnization

Example

We have a main memory with 2^14 words, a cache with 16 blocks, and a block size of 8 words. How many bits do the tag and word fields need?

Word field requires 3 bits (8 = 2^3 words per block)

Tag field requires 11 bits (2^14 / 8 = 2048 = 2^11 blocks)
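The same arithmetic can be checked with a short Python sketch. The constant and function names are hypothetical, and the sketch assumes a word-addressable memory, so the full address has 14 bits.

```python
ADDRESS_BITS = 14      # main memory holds 2**14 words
WORDS_PER_BLOCK = 8

WORD_BITS = (WORDS_PER_BLOCK - 1).bit_length()   # bits to pick a word in a block
TAG_BITS = ADDRESS_BITS - WORD_BITS              # the rest names the memory block


def split_address(addr):
    """Return (tag, word) for a 14-bit word address.

    In fully associative mapping there is no block field: the tag is the
    whole memory block number.
    """
    return addr >> WORD_BITS, addr & (WORDS_PER_BLOCK - 1)


print(WORD_BITS, TAG_BITS)    # 3 and 11
print(split_address(8197))    # word 5 of memory block 1024
```

Note that the number of cache blocks (16) does not affect the field widths here, because any memory block can go to any cache block.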

Page 14: Computer Orgnization

Which MM block is in the cache?

Naïve Method:

• Tag fields are associated with each cache block

• Compare the tag field with the tag entries in the cache to check for a hit

CAM (Content Addressable Memory)

• Words can be fetched on the basis of their contents, rather than on the basis of their addresses or locations

• For example:

• Find the addresses of all “Smiths” in Dallas.
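Both ideas can be imitated in software, as in the hedged sketch below. The function names and the sample data are made up for illustration. A real CAM compares all entries in parallel in hardware, whereas these Python loops examine entries one at a time, so the sketch shows only what is being matched, not how fast.

```python
def naive_lookup(cache_tags, tag):
    """Compare the requested tag with the tag stored for each cache block."""
    for block_index, stored_tag in enumerate(cache_tags):
        if stored_tag == tag:
            return block_index   # hit: the block is in this cache block
    return None                  # miss


def content_lookup(records, **wanted):
    """CAM-style query: return the locations of all records whose content matches."""
    return [loc for loc, rec in enumerate(records)
            if all(rec.get(key) == value for key, value in wanted.items())]


# Hypothetical sample data.
cache_tags = [7, None, 1024, 3]          # None marks an empty cache block
records = [
    {"name": "Smith", "city": "Dallas"},
    {"name": "Jones", "city": "Dallas"},
    {"name": "Smith", "city": "Austin"},
    {"name": "Smith", "city": "Dallas"},
]

print(naive_lookup(cache_tags, 1024))
print(content_lookup(records, name="Smith", city="Dallas"))
```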

Page 15: Computer Orgnization

Fully Associative Mapping

Advantages

• Flexibility

• Better cache utilization

Disadvantages

• Requires a tag search

• Associative search (parallel search)

• Might require an extra hardware unit to do the search

• Requires a replacement strategy if the cache is full

• Expensive

Page 16: Computer Orgnization

N-way Set Associative Mapping

Combines direct and fully associative mapping

The cache is divided into sets of blocks

All sets are the same size

Main memory blocks are mapped to a specific set based on:

s = i mod S

• s: the set to which block i is mapped

• S: total number of sets

An incoming block can be placed in any cache block inside its set
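A minimal Python sketch of this placement rule follows. The cache size (4 sets of 2 blocks) and the function name are illustrative assumptions, and the sketch simply takes the first free block in the set, which is one arbitrary choice among the allowed ones.

```python
def place(cache, i):
    """Place memory block i in any free block of set s = i mod S.

    Returns (set, way), or None if the set is full, in which case a
    replacement policy would have to pick a victim.
    """
    s = i % len(cache)            # S is the number of sets
    ways = cache[s]
    for w, occupant in enumerate(ways):
        if occupant is None:
            ways[w] = i
            return s, w
    return None


# Illustrative cache: S = 4 sets, 2 blocks per set (2-way).
cache = [[None] * 2 for _ in range(4)]

first = place(cache, 5)     # set 1, first free block
second = place(cache, 9)    # also set 1, next free block
third = place(cache, 13)    # set 1 is now full
print(first, second, third)
```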

Page 17: Computer Orgnization

N-way Set Associative Mapping

Tag field uniquely identifies the targeted block within the determined set.

Word field identifies the element (word) within the block that is requested by the processor.

Set field identifies the set

Page 18: Computer Orgnization

N-way Set Associative Mapping

Page 19: Computer Orgnization

Group Activity

Compute the three parameters (Word, Set, and Tag) for a memory system having the following specification:

• Size of the main memory is 4K blocks,

• Size of the cache is 128 blocks,

• The block size is 16 words.

Assume that the system uses 4-way set-associative mapping.

Page 20: Computer Orgnization

Answer
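The content of the answer slide is not reproduced in this transcript. As a hedged sketch, the three field widths can be derived from the definitions given earlier; the function name is hypothetical, and the sizes are assumed to be powers of two in a word-addressable memory.

```python
def set_associative_fields(memory_blocks, cache_blocks, words_per_block, ways):
    """Return the bit widths of the Tag, Set, and Word fields."""
    num_sets = cache_blocks // ways                 # S = cache blocks / N
    word_bits = (words_per_block - 1).bit_length()
    set_bits = (num_sets - 1).bit_length()
    # The memory block number has log2(memory_blocks) bits; its low
    # set_bits select the set (s = i mod S) and the rest form the tag.
    tag_bits = (memory_blocks - 1).bit_length() - set_bits
    return {"tag": tag_bits, "set": set_bits, "word": word_bits}


fields = set_associative_fields(4096, 128, 16, ways=4)
print(fields)   # 128 / 4 = 32 sets, so the set field needs 5 bits
```

Under these assumptions the address splits into a 7-bit tag, a 5-bit set field, and a 4-bit word field, 16 bits in total.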

Page 21: Computer Orgnization

N-way Set Associative Mapping

Advantage

• Moderate cache utilization

Disadvantage

• Still needs a tag search inside the set

Page 22: Computer Orgnization

If the cache is full and a block must be replaced, which one should be replaced?

Page 23: Computer Orgnization

Cache Replacement Policies

Random

• Simple

• Requires a random number generator

First In First Out (FIFO)

• Replace the block that has been in the cache the longest

• Requires keeping track of the block lifetime

Least Recently Used (LRU)

• Replace the block that has gone unused for the longest time

• Requires keeping track of the block history

Page 24: Computer Orgnization

Cache Replacement Policies (Cont.)

Most Recently Used (MRU)

• Replace the block that was used most recently

• Requires keeping track of the block history

Optimal

• Hypothetical

• Must know the future
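To make the difference between FIFO and LRU concrete, here is a small Python sketch. It assumes a fully associative cache, and the request sequence and capacity are arbitrary values chosen for illustration; the function name is hypothetical.

```python
from collections import OrderedDict


def simulate(requests, capacity, policy="LRU"):
    """Count (hits, misses) for a fully associative cache of `capacity` blocks."""
    cache = OrderedDict()   # first entry is the next victim
    hits = misses = 0
    for block in requests:
        if block in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(block)    # a hit refreshes the block's history
            # FIFO ignores hits: only the load order matters
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the oldest / least recently used
            cache[block] = True
    return hits, misses


requests = [1, 2, 3, 1, 4, 1, 2]   # arbitrary block numbers
print("LRU :", simulate(requests, 3, "LRU"))
print("FIFO:", simulate(requests, 3, "FIFO"))
```

On this sequence LRU keeps block 1 in the cache because it was just used, while FIFO evicts it as the oldest block and then misses on it.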

Page 25: Computer Orgnization

Example

Consider the case of a 4×8 two-dimensional array of numbers, A. Assume that each number in the array occupies one word and that the array elements are stored in column-major order in the main memory from location 1000 to location 1031. The cache consists of eight blocks, each consisting of just two words. Assume also that, whenever needed, the LRU replacement policy is used. We would like to examine the changes in the cache if the direct mapping technique is used as the following sequence of requests for the array elements is made by the processor:

Page 26: Computer Orgnization

Array elements in the main memory

Page 27: Computer Orgnization
Page 28: Computer Orgnization

Conclusion

• 16 cache misses

• Not a single hit

• 12 replacements

• Only 4 cache blocks are used
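The request sequence itself is not reproduced in this transcript. As a hedged sketch, the simulation below ASSUMES the processor requests the first two rows of A in order, A(0,0) to A(0,7) followed by A(1,0) to A(1,7), and that memory block numbers are the word address divided by the block size. Under those assumptions the counts agree with the conclusion above.

```python
BASE = 1000            # address of A(0,0)
ROWS, COLS = 4, 8
WORDS_PER_BLOCK = 2
CACHE_BLOCKS = 8


def address(i, j):
    """Column-major address of A(i, j)."""
    return BASE + j * ROWS + i


def simulate_direct(requests):
    """Return (hits, misses, replacements, cache blocks used) for direct mapping."""
    cache = [None] * CACHE_BLOCKS   # memory block held by each cache block
    hits = misses = replacements = 0
    for i, j in requests:
        mem_block = address(i, j) // WORDS_PER_BLOCK   # assumed block numbering
        slot = mem_block % CACHE_BLOCKS                # j = i mod n
        if cache[slot] == mem_block:
            hits += 1
        else:
            misses += 1
            if cache[slot] is not None:
                replacements += 1
            cache[slot] = mem_block
    used = sum(block is not None for block in cache)
    return hits, misses, replacements, used


# ASSUMED request sequence (not given in the text): row 0, then row 1.
requests = [(0, j) for j in range(COLS)] + [(1, j) for j in range(COLS)]
result = simulate_direct(requests)
print(result)
```

Consecutive elements of a row are four words apart, so they fall in every second memory block and only the even-numbered cache blocks are ever used.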

Page 29: Computer Orgnization

Group Activity

Do the same for the fully associative and 4-way set-associative mappings.

Page 30: Computer Orgnization

Pipelining

Page 31: Computer Orgnization

Basic Idea

Assembly Line

Divide the execution of a task among a number of stages

A task is divided into subtasks to be executed in sequence

Performance improvement compared to sequential execution

Page 32: Computer Orgnization

Pipeline

[Figure: a job split into a stream of tasks 1, 2, ..., m entering a pipeline of stages 1, 2, ..., n]

Page 33: Computer Orgnization

5 Tasks on a 4-Stage Pipeline

[Figure: space-time chart of Task 1 to Task 5 on a 4-stage pipeline; time axis from 1 to 8]
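A chart of this kind can be generated with a short Python sketch. It assumes that every stage takes exactly one time unit and that task k enters the first stage at time k; the function name and the S1..S4 stage labels are hypothetical.

```python
def space_time_chart(num_tasks, num_stages):
    """Return one row per task; entry k is the stage active in time slot k + 1."""
    total_time = num_stages + num_tasks - 1
    chart = []
    for task in range(num_tasks):
        row = [None] * total_time
        for stage in range(num_stages):
            row[task + stage] = stage + 1   # task enters stage 1 at slot `task`
        chart.append(row)
    return chart


chart = space_time_chart(num_tasks=5, num_stages=4)
for task, row in enumerate(chart, start=1):
    cells = " ".join(f"S{stage}" if stage else "--" for stage in row)
    print(f"Task {task}: {cells}")
```

Each row is shifted one slot to the right of the previous one, and the last task leaves the pipeline at time 4 + 5 - 1 = 8.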

Page 34: Computer Orgnization

Speedup

[Figure: a stream of m tasks entering a pipeline of n stages, each stage taking time t]

T(Seq) = n * m * t

T(Pipe) = n * t + (m - 1) * t

Speedup = T(Seq) / T(Pipe) = (n * m) / (n + m - 1)

Page 35: Computer Orgnization

Efficiency

For a stream of m tasks on a pipeline of n stages, each stage taking time t:

T(Seq) = n * m * t

T(Pipe) = n * t + (m - 1) * t

Efficiency = Speedup / n = m / (n + m - 1)

Page 36: Computer Orgnization

Throughput

For a stream of m tasks on a pipeline of n stages, each stage taking time t:

T(Seq) = n * m * t

T(Pipe) = n * t + (m - 1) * t

Throughput = number of tasks executed per unit of time = m / ((n + m - 1) * t)
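The three measures can be evaluated together with a few lines of Python. The function name is hypothetical; the sample values n = 4 and m = 5 are taken from the earlier slide on 5 tasks in a 4-stage pipeline, and t = 1 is an assumed unit stage time.

```python
def pipeline_metrics(n, m, t=1.0):
    """Return (speedup, efficiency, throughput) for m tasks on n stages of time t."""
    t_seq = n * m * t                 # every task passes all n stages, one at a time
    t_pipe = n * t + (m - 1) * t      # first task takes n*t, then one task per t
    speedup = t_seq / t_pipe
    efficiency = speedup / n
    throughput = m / t_pipe
    return speedup, efficiency, throughput


speedup, efficiency, throughput = pipeline_metrics(n=4, m=5)
print(speedup, efficiency, throughput)
```

As m grows, the speedup approaches n and the efficiency approaches 1, since the n - 1 slots spent filling the pipeline matter less and less.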

Page 37: Computer Orgnization

Instruction Pipeline

Pipeline stall: some stages might need more time than others to perform their function.

E.g. the pipeline stalls after I2

This is called a “bubble” or “pipeline hazard”

Page 38: Computer Orgnization

Example

Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS).

Assume that the fetching of I5 depends on the results of the I4 evaluation.

Page 39: Computer Orgnization

Answer
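The chart from the answer slide is not reproduced in this transcript. The sketch below shows one possible reading of the example, under these assumptions: each stage takes one time unit, "evaluation" means the IE stage, and I5 cannot start IF until the cycle after I4 has completed IE. The function name and parameters are hypothetical.

```python
STAGES = ["IF", "ID", "IE", "IS"]


def schedule(num_instr, stalled=5, producer_stage="IE"):
    """Return the cycle (1-based) in which each instruction starts IF."""
    starts = []
    for k in range(1, num_instr + 1):
        start = 1 if k == 1 else starts[-1] + 1
        if k == stalled:
            # Cycle in which the previous instruction occupies producer_stage.
            producer_cycle = starts[-1] + STAGES.index(producer_stage)
            start = max(start, producer_cycle + 1)
        starts.append(start)
    return starts


starts = schedule(10)
total_time = starts[-1] + len(STAGES) - 1
for k, start in enumerate(starts, start=1):
    row = "   " * (start - 1) + " ".join(STAGES)
    print(f"I{k:<2} {row}")
print("total time:", total_time)
```

Under this reading, I5 starts two cycles late, so the ten instructions take 15 time units instead of the 13 that an unstalled four-stage pipeline would need.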

Page 40: Computer Orgnization

Example

Delay due to branch

Page 41: Computer Orgnization

Pipeline and Instruction Dependency

Instruction Dependency

• The operation performed by a stage depends on the operation(s) performed by other stage(s).

E.g. Conditional Branch

Instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored.

The branch takes 3 units of time

Page 42: Computer Orgnization

Pipeline and Data Dependency

Data Dependency: a source operand of instruction Ii depends on the results of executing a preceding instruction Ij, where i > j.

E.g. Ii cannot be fetched unless the results of Ij are saved.

Page 43: Computer Orgnization

Example

ADD R1, R2, R3    R3 ← R1 + R2    (Ii)

SL R3             R3 ← SL(R3)     (Ii+1), where SL is Shift Left

SUB R5, R6, R4    R4 ← R5 – R6    (Ii+2)

Assume that we have five stages in the pipeline:

IF (Instruction Fetch)

ID (Instruction Decode)

OF (Operand Fetch)

IE (Instruction Execute)

IS (Instruction Store)

Show a Gantt chart for this code.

Page 44: Computer Orgnization

Answer

R3 needs to be written in both Ii and Ii+1

Therefore, the problem is a Write after Write data dependency

Page 45: Computer Orgnization

Stalls Due to Data Dependency

Write after write

Read after write

Write after read

Read after read does not cause a stall
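The categories above can be detected mechanically from the registers each instruction reads and writes. The sketch below is an illustration only: the function name is hypothetical, and the read and write sets are my reading of the ADD, SL, and SUB instructions in the earlier example.

```python
def dependencies(first, second):
    """Classify the data dependencies of `second` on the earlier `first`.

    Each instruction is a dict with 'reads' and 'writes' register sets.
    Read after read is not reported because it does not cause a stall.
    """
    found = set()
    if first["writes"] & second["reads"]:
        found.add("read after write")
    if first["reads"] & second["writes"]:
        found.add("write after read")
    if first["writes"] & second["writes"]:
        found.add("write after write")
    return found


# Assumed register usage for the earlier example.
add = {"reads": {"R1", "R2"}, "writes": {"R3"}}   # ADD R1, R2, R3
sl = {"reads": {"R3"}, "writes": {"R3"}}          # SL R3
sub = {"reads": {"R5", "R6"}, "writes": {"R4"}}   # SUB R5, R6, R4

print(dependencies(add, sl))
print(dependencies(sl, sub))
```

With these register sets the ADD and SL pair is flagged as write after write, as stated in the answer, and also as read after write, because SL reads the R3 value that ADD produces. SUB shares no registers with SL, so no dependency is reported for that pair.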

Page 46: Computer Orgnization

Read after write

Page 47: Computer Orgnization

Example

Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. It is required to show the succession of these instructions in the pipeline. Show all types of data dependency. Show the speedup and efficiency.

Page 48: Computer Orgnization

Answer