ECE 456 Computer Architecture Lecture #14 – CPU (III) Instruction Cycle & Pipelining Instructor: Dr. Honggang Wang Fall 2013

Dr. Wang Lecture #13 2

Administrative Issues (Wednesday, Dec 4)

• Project

– Report due Dec 9

– Presentation due 2:00 pm, Dec 9

– Order: Group 1, Group 2, Group 3, Group 4

• Exam 2 review


Review of Lecture #12 & 13

• Machine instruction characteristics

– Constituent elements, instruction representation, instruction types, and number of addresses

• Instruction set design

– Types of operands

– Operation repertoire

– Addressing modes (how is the operand address specified?): immediate, direct, indirect, register, register indirect, displacement (relative, base-register, indexing), stack

– Instruction formats

• Little-, big-, and bi-endian (byte ordering, bit ordering)


Topics

• Instruction cycle

• Instruction pipelining

– Principle

– Performance

– Problems (Lecture #15)

– Examples (Lecture #15)


Instruction Cycle

+ Indirect Cycle (for indirect addressing operands)


Instruction Cycle with Indirect Sub-Cycle


Instruction Cycle State Diagram


Data Flow in Each Cycle


Data Flow (1: Fetch Cycle)

– PC contains address of next instruction

– Address moved to MAR

– Address placed on address bus

– Control unit requests a memory read

– Result placed on data bus, copied to MBR, then to IR

– Meanwhile PC incremented by 1
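The fetch-cycle steps above can be traced with a minimal Python sketch. The register names follow the slide; the dictionary-based memory, the addresses, and the instruction strings are illustrative assumptions, not part of the lecture.

```python
# Minimal sketch of the fetch-cycle data flow (register names follow the slide).
# The word-addressed memory and its contents are illustrative assumptions.

def fetch_cycle(regs, memory):
    """Perform one fetch cycle, updating regs in place."""
    regs["MAR"] = regs["PC"]          # address of next instruction moved to MAR
    address_bus = regs["MAR"]         # MAR places the address on the address bus
    data_bus = memory[address_bus]    # control unit requests a memory read
    regs["MBR"] = data_bus            # result copied from data bus to MBR
    regs["IR"] = regs["MBR"]          # ...then to IR
    regs["PC"] += 1                   # meanwhile PC is incremented by 1
    return regs

memory = {100: "LOAD 940", 101: "ADD 941"}   # hypothetical program
regs = {"PC": 100, "MAR": None, "MBR": None, "IR": None}
fetch_cycle(regs, memory)
print(regs)
```

After one call, IR holds the instruction that was at address 100 and PC points at address 101.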


Data Flow (2: Indirect Cycle)

• IR is examined

• If indirect addressing, indirect cycle is performed

– Rightmost N bits of MBR transferred to MAR

– Control unit requests a memory read

– Result (address of operand) moved to MBR
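A short sketch of the indirect cycle, assuming a 16-bit instruction word whose rightmost N = 12 bits are the address field. The word format, the value of N, and the memory contents are hypothetical choices made only for illustration.

```python
# Sketch of the indirect cycle. Assumes a 16-bit instruction word whose
# rightmost N = 12 bits hold the address field (an illustrative format).

N = 12
ADDRESS_MASK = (1 << N) - 1

def indirect_cycle(regs, memory):
    """Replace the address reference in MBR with the operand's address."""
    regs["MAR"] = regs["MBR"] & ADDRESS_MASK   # rightmost N bits of MBR -> MAR
    regs["MBR"] = memory[regs["MAR"]]          # memory read: operand address -> MBR
    return regs

memory = {0x300: 0x0940}               # location 0x300 holds the operand's address
regs = {"MAR": None, "MBR": 0x1300}    # fetched word: opcode 0x1, address field 0x300
indirect_cycle(regs, memory)
print(hex(regs["MAR"]), hex(regs["MBR"]))
```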


Data Flow (3: Execute Cycle)

• May take many forms

• Depends on instruction being executed

• May include

– Register transfers

– Memory read/write

– Input/Output

– ALU operations


Data Flow (4: Interrupt Cycle)

• Simple & predictable

• Current PC saved to allow resumption after interrupt

– Contents of PC copied to MBR

– Address of a special memory location (e.g., from the stack pointer) loaded into MAR

– MBR written to memory

• PC loaded with address of ISR

• Next instruction (first of ISR) can be fetched
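The interrupt-cycle data flow can be sketched the same way. This is a minimal illustration that assumes the save location comes from a stack pointer (SP) and that the stack grows downward; the register values and the ISR address are hypothetical.

```python
# Sketch of the interrupt cycle. Assumptions (not from the slide): the save
# location is taken from a stack pointer SP, and the stack grows downward.

def interrupt_cycle(regs, memory, isr_address):
    """Save the current PC and redirect execution to the ISR."""
    regs["MBR"] = regs["PC"]            # contents of PC copied to MBR
    regs["SP"] -= 1                     # assumed: reserve one word on the stack
    regs["MAR"] = regs["SP"]            # save location (from stack pointer) -> MAR
    memory[regs["MAR"]] = regs["MBR"]   # MBR written to memory
    regs["PC"] = isr_address            # PC loaded with address of the ISR
    return regs

memory = {}
regs = {"PC": 205, "SP": 1000, "MAR": None, "MBR": None}
interrupt_cycle(regs, memory, isr_address=500)
print(regs["PC"], memory)
```

The next fetch cycle then reads the first ISR instruction from address 500, while the saved value 205 allows the interrupted program to resume later.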


Agenda

• Instruction cycle

– Fetch, indirect, execute, interrupt cycle

– Data flow

• Instruction pipelining

– Principle

– Performance

– Problems

– Examples


A Laundry Example

• Let us assume there are four steps to the weekly (monthly) laundry: 4 loads


Do the Laundry

• Sequential, 4 loads: 16 cycles

• Pipelined, 4 loads: 7 cycles
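For the four loads in this example, the counts of 16 and 7 cycles can be checked by hand, assuming each of the four steps takes one cycle and a new load can enter the pipeline every cycle:

$$\text{sequential: } 4 \text{ loads} \times 4 \text{ steps} = 16 \text{ cycles}, \qquad \text{pipelined: } 4 + (4 - 1) = 7 \text{ cycles}$$

In the pipelined case the first load needs all 4 cycles, and each of the remaining 3 loads finishes one cycle after the previous one.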


Principles of Pipelining

• Tasks are subdivided into successive subtasks

• A pipeline stage is associated with each subtask

• The same amount of time is allocated to each subtask

• All pipeline stages operate like an assembly line; the 1st stage accepts input, the last stage delivers the output

• Basic pipeline is synchronous


Instruction Pipelining

• A key, powerful technique for making CPUs fast

• An ‘assembly line’ in computing used for instruction processing; 6 stages of (nearly) equal duration

– Fetch instruction (FI)

– Decode instruction (DI)

– Calculate operands, i.e., effective addresses (CO)

– Fetch operands (FO)

– Execute instruction (EI)

– Write operand / result (WO)

• Multiple instructions are overlapped in execution


Timing of Instruction Pipeline (1)

• Nine instructions: 54 cycles without pipelining vs. 14 cycles with the 6-stage pipeline



• Time progresses vertically down the figure

• Each row shows the state of the pipeline at a given point in time

• Pipeline is full at time 6 through 9 with different instructions in different stages
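The timing layout described here (time running down, one row per time unit) can be reproduced with a small sketch. It assumes an ideal, hazard-free pipeline with the six stages FI, DI, CO, FO, EI, WO and nine instructions; the function name `pipeline_schedule` is made up for this illustration.

```python
# Sketch: ideal 6-stage pipeline schedule for 9 instructions, no hazards.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def pipeline_schedule(n_instructions, stages=STAGES):
    """Return {time: {stage: instruction}} for an ideal, hazard-free pipeline."""
    k = len(stages)
    total_cycles = k + (n_instructions - 1)
    schedule = {}
    for t in range(1, total_cycles + 1):
        row = {}
        for s, stage in enumerate(stages):
            instr = t - s          # instruction i occupies stage s at time i + s
            if 1 <= instr <= n_instructions:
                row[stage] = f"I{instr}"
        schedule[t] = row
    return schedule

schedule = pipeline_schedule(9)
for t, row in schedule.items():
    print(f"t={t:2d}  " + "  ".join(f"{row.get(s, '--'):>3}" for s in STAGES))

# Times at which every stage holds an instruction.
full_times = [t for t, row in schedule.items() if len(row) == len(STAGES)]
print("cycles:", len(schedule), "full at:", full_times)
```

Under these assumptions the schedule spans 14 time units and every stage is occupied at times 6 through 9.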


Comments (1)

• Each instruction is assumed to go through all 6 stages of the pipeline

– Not always the case, e.g., no WO stage for ‘LOAD’

– Timing is set up this way to simplify the pipeline hardware

• Assume no potential hazard

– Data dependency, branch, interrupt


Comments (2)

• Assumes no memory conflicts

– Most memory systems don’t permit simultaneous accesses

– The desired value may be in cache, the FO or WO stage may be null, or separate instruction and data memories may be used, so the pipeline is not slowed down much of the time

Pipeline Performance (1)

• Cycle time: the time available for each stage to accomplish the required operations

– Determined by the worst-case processing time of the longest stage

– Current pipelined processors: 2-20 ns
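As a minimal sketch of how the slowest stage sets the cycle time, the snippet below uses made-up worst-case stage delays; the numbers are assumptions chosen for illustration, not measurements from the lecture.

```python
# Hypothetical worst-case delay of each stage in ns (illustrative values only).
stage_delay_ns = {"FI": 8, "DI": 5, "CO": 6, "FO": 10, "EI": 9, "WO": 7}

# The slowest stage determines the clock; every other stage waits for it.
cycle_time_ns = max(stage_delay_ns.values())
idle_ns = {stage: cycle_time_ns - d for stage, d in stage_delay_ns.items()}

print("cycle time:", cycle_time_ns, "ns")
print("idle time per stage:", idle_ns)
```

With these values the FO stage fixes the cycle at 10 ns, and the DI stage sits idle for half of every cycle.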



• Total time to execute n instructions

– k: number of stages in the pipeline; τ: cycle time

– To complete the execution of the 1st instruction: k cycles

– The remaining n-1 instructions require n-1 cycles

$$T_k = [k + (n-1)]\,\tau$$



• Speedup factor

– Compared to execution without pipeline (n instructions, k stages, cycle time τ):

$$S_k = \frac{T_1}{T_k} = \frac{nk\,\tau}{[k + (n-1)]\,\tau} = \frac{nk}{k + (n-1)}$$

– The larger the # of pipeline stages, the larger the potential for speedup
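A small sketch of the speedup factor for n instructions on a k-stage pipeline, assuming the ideal hazard-free case in which pipelined execution takes k + (n-1) cycles and unpipelined execution takes nk cycles. The function names are hypothetical.

```python
def total_time(n, k, tau):
    """Time for n instructions on a k-stage pipeline with cycle time tau."""
    return (k + (n - 1)) * tau

def speedup(n, k):
    """Ratio of unpipelined to pipelined time; the cycle time cancels out."""
    return (n * k) / (k + (n - 1))

# Speedup for a 6-stage pipeline as the instruction count grows.
for n in (1, 9, 100, 10_000):
    print(n, round(speedup(n, k=6), 3))
```

For a single instruction there is no gain, and as n grows the speedup approaches k but never reaches it, which is why more stages raise the potential speedup.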


Speedup Factor Illustration




• Throughput

– Also called “repetition rate”

– The shortest possible time interval between subsequent independent instructions in the pipeline

– When the basic pipeline is full, this interval is 1 cycle, i.e., one instruction completes per cycle


Hands-On Problem

Suppose a simple 6-stage pipeline executes a basic code block containing 10 instructions. Assume the pipeline clock cycle time is 10 ns and there is no potential hazard (data / branch / interrupt).

1. What is the total time to execute this block of code?

2. What is the repetition rate of this pipeline for this basic block?

3. What is the speedup factor?

Difficulties with Pipelining

• The stages are not of equal duration

– Use the worst-case processing time of the longest stage

– Faster stages must wait for the slowest one

• Data hazard due to Read-After-Write dependency

• Conditional branch instructions could invalidate the fetched instructions behind them

• Interrupt could invalidate the fetched instructions
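To make the Read-After-Write case concrete, here is a minimal sketch that flags an instruction reading a register written by the instruction just before it. The three-address `Instr` format, the register names, and the one-instruction window are assumptions for illustration; real pipelines check dependencies across several in-flight instructions.

```python
from collections import namedtuple

# Hypothetical three-address instruction format: one destination, two sources.
Instr = namedtuple("Instr", "op dest src1 src2")

def raw_hazards(program, window=1):
    """Return (producer_index, consumer_index, register) for each
    read-after-write dependency within `window` following instructions."""
    hazards = []
    for i, producer in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            consumer = program[j]
            if producer.dest in (consumer.src1, consumer.src2):
                hazards.append((i, j, producer.dest))
    return hazards

program = [
    Instr("ADD", "R1", "R2", "R3"),   # writes R1
    Instr("SUB", "R4", "R1", "R5"),   # reads R1 before ADD has written it back
    Instr("MUL", "R6", "R7", "R8"),   # independent of the others
]
print(raw_hazards(program))
```

Here the SUB would fetch its operand R1 while the ADD is still in flight, so the pipeline must wait or otherwise resolve the dependency.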


Summary of Lecture #14

• Instruction cycle (elaborated version)

– Fetch, indirect, execute, interrupt cycle

– Data flow

• Instruction pipelining

– Principle: assembly line

– Performance measures

– Problems / difficulties introduction


Things To Do

• Work on the project

• Check the class website for lecture notes


Solution

• T = (k + (n-1)) * c, where k = 6 is the number of stages in the pipeline, n = 10 is the number of instructions to be executed, and c = 10 ns is the clock cycle time. The total time to execute the code is (6 + 9) * 10 ns = 150 ns.

• Repetition rate is also known as throughput; for this pipeline, the throughput is 1 cycle (10 ns) per instruction once the pipeline is full.

• Speedup factor is the ratio of total execution time without pipelining to total execution time with pipelining. The total time without pipelining is n*k*c = 600 ns, so the speedup factor is S = 600/150 = 4.
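The numbers in this solution can be checked with a few lines of Python; the variable names are chosen only for this sketch.

```python
# Hands-on problem: 6 stages, 10 instructions, 10 ns cycle, no hazards.
k, n, c_ns = 6, 10, 10

pipelined_ns = (k + (n - 1)) * c_ns    # T = (k + (n-1)) * c
unpipelined_ns = n * k * c_ns          # each instruction uses all k stages in turn
speedup_factor = unpipelined_ns / pipelined_ns

print(pipelined_ns, unpipelined_ns, speedup_factor)
```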
