ECE 486/586 Computer Architecture Lecture # 9

ECE 486/586

Computer Architecture

Lecture # 9

Spring 2019

Portland State University

Lecture Topics

• Basic Processor Hardware

• Pipelining

Reference:

• Appendix C: Section C.1

Overview

• A typical computing task consists of a sequence of instructionsthat constitute a program

• Processing Unit (processor) carries out these instructions

• An instruction is processed by performing a sequence of more rudimentary operations, such as:– Fetching the instruction from memory

– Decoding the instruction to understand its semantics

– Executing the instruction

• Different processor architectures differ in the way they perform the above operations

Processor Organization

Processor

Memory

PC

IR

MDR

Control

ALUR

N-1

R1

R0

MAR

N general-purpose

registers

Read/Write

Processor Registers

• Instruction register (IR)– Holds the instruction that is currently being executed

• Program counter (PC)– Contains address of the next instruction to be fetched/executed

• General-purpose register (R0 – Rn-1)– Hold operands that are currently being (or soon to be) processed

Instruction Processing Steps

• Initial State: PC is set to point to the first instruction

• For each instruction, perform the following 5 steps:

– Instruction Fetch (IF)

– Instruction Decode/Register Read (ID)

– Execution/Effective Address Calculation (EX)

– Memory Access (MEM)

– Write Back (WB)

5 Processing Steps

• Instruction Fetch:– Send PC to memory, assert MemRead signal

– Instruction read out from memory

– Place instruction in IR

– Update PC to next instruction: PC [PC] + 4

• Instruction Decode:– Instruction in IR decoded by control logic, instruction type and

operands determined

– Source operand registers read from general purpose register file

5 Processing Steps (cont.)

• Execute:– ALU operates on operands prepared in previous cycle

– One of four functions depending upon opcode• Memory Reference

– Form effective address from base register and immediate offset

– ALUOutput A + Imm

• Register-Register ALU Instruction– ALU Output A func B

• Register-Immediate ALU Instruction– ALU Output A func Imm

• Branch– Compute branch target by adding Imm to PC

– ALU Output PC + (Imm << 2)

– Evaluate the branch condition

5 Processing Steps (cont.)

• Memory Access:– For load instructions, read data from memory

– For store instructions, write data to memory

• Writeback: – Results written to destination register

Basic Processing Hardware

Register stage A Register stage B

Clock period must be longer than the delay of the combinational logic circuit

Multi-stage Processing Hardware

Register stage A Register stage B

• Combinational circuit broken down into multiple stages, each taking one clock cycle • If there are n stages, clock period is 1/n of the clock period on previous slide• Inter-stage registers hold a stage’s result, so they can be used as next stage’s inputs

5-stage MIPS Processor

FETCH: Fetch an instruction and increment the PC

DECODE: Decode the instruction (determine instruction type, source & destination registers) and read source registers from register file

EXECUTE: Perform an ALU operation

MEMORY ACCESS: Read or write memory data if the instruction involves a memory operation

WRITEBACK: Write the result into the destination register, if needed

Fetch Decode ExecuteMemory Access

Writeback

Arithmetic and Logic Instructions

Example

• Add R3, R4, R5 R3[R4]+[R5]

Source registers: R4, R5 Destination register: R3

Instruction steps:

• Fetch: Fetch the instruction and increment the program counter

• Decode: Decode the instruction in IR to determine the operation to be performed (add). Read the contents of registers R4 and R5

• Execute: Compute the sum [R4] + [R5]

• Memory Access: No action, since there are no memory operands

• Writeback: Write the result into register R3

Load Instructions

Example

• Load R5, X(R7) R5[[R7]+X]

Source register: R7 Destination register: R5

The immediate value X is given in the instruction word

Instruction steps:


• Decode: Decode the instruction in IR to determine the operation to be performed (load). Read the contents of register R7.

• Execute: Add the immediate value X to the contents of R7

• Memory Access: Use the sum X+[R7] as the effective address of the source operand, read the contents of that location from memory

• Writeback: Write the data received from memory into register R5

Store Instructions

Example

• Store R6, X(R8) X+[R8][R6]

Source registers: R6, R8 Destination register: None

The immediate value X is given in the instruction word

Instruction steps:


• Decode: Decode the instruction in IR to determine the operation to be performed (store). Read the contents of registers R6 and R8.

• Execute: Compute the effective address X + [R8]

• Memory Access: Store the contents of register R6 into memory location X + [R8]

• Writeback: No action

How to Improve Performance?

• So far, we have assumed that only one instruction is being processed by the multi-stage hardware at any point of time

• How do we decrease the execution time of a program?

• One possibility is to use faster circuits to implement the processor

– This approach will decrease the execution time of each instruction

• Another possibility is to arrange the processor hardware in such a way that multiple instructions can be processed at the same time. This approach is called pipelining

Pipelining does not change the time needed to perform a single instruction, but it increases the number of instructions performed

per second (instruction completion rate or throughput)

Pipelining: Example from Daily Life

• Laundry Example

• Four loads of laundry need

to be washed, dried and folded

• Washer takes 30 minutes

• Dryer takes 40 minutes

• “Folder” takes 20 minutes

A B C D

Pipelining: Example from Daily Life (cont.)

• Sequential laundry takes 6 hours for 4 loads

• If pipelining is used, how long would laundry take?

A

B

C

D

30 40 20 30 40 20 30 40 20 30 40 20

6 PM 7 8 9 10 11 Midnight

Time

• Pipelined laundry takes 3.5 hours for 4 loads

A

B

C

D

6 PM 7 8 9 10 11 Midnight

T

a

s

k

O

r

d

e

r

Time

30 40 40 40 40 20


• Pipelining doesn’t reduce the time taken by an individual task, it improves the throughput of entire workload

• Task completion rate limited by slowest pipeline stage

• Potential speedup = Number of pipeline stages

• Unbalanced lengths of pipeline stages reduces speedup

• Time to “fill” pipeline and time to “drain” it reduces speedup

A

B

C

D

6 PM 7 8 9

T

a

s

k

O

r

d

e

r

Time

30 40 40 40 40 20


Pipelining in Processors

• Exploit parallelism in sequential instruction stream

– Resources (e.g. ALU, memory, register file) can be used concurrently by different instructions

– Multiple instructions processed in parallel More instructions completed per unit time Higher throughput (performance)

Pipelining and ISA

• What is desirable in instruction sets for pipelining

– Variable length instructions vs. fixed length instructions?

– Memory operands supported for any instruction or memory operands only in loads/stores?

– Register operands in different places in each instruction format vs. in the same place in different instruction formats?

MIPS Choices for Pipelining

• 32-bit fixed instruction format

• Memory access only via load/store instruction

• Opcode field for all instruction formats in the same place

• Source registers for R-type and I-type instruction formats in the same place

• Single addressing mode for load/store instructions

• Simple branch conditions

5-Stage MIPS Pipeline

At any given time, each pipeline stage processes a different instructionRegister file appears in two stages: read in stage-2 and written in stage-5

IF

IF

ID

ID

EX

EX

MEM

MEM WB

WB

5-Stage MIPS Pipeline

How do we ensure that each stage has the correct inputs for the instruction that is being processed by that stage?

IF

IF

ID

ID

EX

EX

MEM

MEM WB

WB

Pipeline Registers

Information needed by an instruction is carried through the pipeline (via pipeline registers) as the instruction proceeds from one stage to the next

Representing Pipelined Execution

Time

IF ID EX MEM WB

Clock Cycle 1 2 3 4 5 6 7

Ij

Ij+1

Ij+2

IF

IF

ID

ID

EX

EX

MEM

MEM

WB

WB

Documents

ECE 486/586 Computer Architecture Lecture # 9