Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
ECE 486/586
Computer Architecture
Lecture # 9
Spring 2019
Portland State University
Lecture Topics
• Basic Processor Hardware
• Pipelining
Reference:
• Appendix C: Section C.1
Overview
• A typical computing task consists of a sequence of instructionsthat constitute a program
• Processing Unit (processor) carries out these instructions
• An instruction is processed by performing a sequence of more rudimentary operations, such as:– Fetching the instruction from memory
– Decoding the instruction to understand its semantics
– Executing the instruction
• Different processor architectures differ in the way they perform the above operations
Processor Organization
Processor
Memory
PC
IR
MDR
Control
ALUR
N-1
R1
R0
MAR
N general-purpose
registers
Read/Write
Processor Registers
• Instruction register (IR)– Holds the instruction that is currently being executed
• Program counter (PC)– Contains address of the next instruction to be fetched/executed
• General-purpose register (R0 – Rn-1)– Hold operands that are currently being (or soon to be) processed
Instruction Processing Steps
• Initial State: PC is set to point to the first instruction
• For each instruction, perform the following 5 steps:
– Instruction Fetch (IF)
– Instruction Decode/Register Read (ID)
– Execution/Effective Address Calculation (EX)
– Memory Access (MEM)
– Write Back (WB)
5 Processing Steps
• Instruction Fetch:– Send PC to memory, assert MemRead signal
– Instruction read out from memory
– Place instruction in IR
– Update PC to next instruction: PC [PC] + 4
• Instruction Decode:– Instruction in IR decoded by control logic, instruction type and
operands determined
– Source operand registers read from general purpose register file
5 Processing Steps (cont.)
• Execute:– ALU operates on operands prepared in previous cycle
– One of four functions depending upon opcode• Memory Reference
– Form effective address from base register and immediate offset
– ALUOutput A + Imm
• Register-Register ALU Instruction– ALU Output A func B
• Register-Immediate ALU Instruction– ALU Output A func Imm
• Branch– Compute branch target by adding Imm to PC
– ALU Output PC + (Imm << 2)
– Evaluate the branch condition
5 Processing Steps (cont.)
• Memory Access:– For load instructions, read data from memory
– For store instructions, write data to memory
• Writeback: – Results written to destination register
Basic Processing Hardware
Register stage A Register stage B
Clock period must be longer than the delay of the combinational logic circuit
Multi-stage Processing Hardware
Register stage A Register stage B
• Combinational circuit broken down into multiple stages, each taking one clock cycle • If there are n stages, clock period is 1/n of the clock period on previous slide• Inter-stage registers hold a stage’s result, so they can be used as next stage’s inputs
5-stage MIPS Processor
FETCH: Fetch an instruction and increment the PC
DECODE: Decode the instruction (determine instruction type, source & destination registers) and read source registers from register file
EXECUTE: Perform an ALU operation
MEMORY ACCESS: Read or write memory data if the instruction involves a memory operation
WRITEBACK: Write the result into the destination register, if needed
Fetch Decode ExecuteMemory Access
Writeback
Arithmetic and Logic Instructions
Example
• Add R3, R4, R5 R3[R4]+[R5]
Source registers: R4, R5 Destination register: R3
Instruction steps:
• Fetch: Fetch the instruction and increment the program counter
• Decode: Decode the instruction in IR to determine the operation to be performed (add). Read the contents of registers R4 and R5
• Execute: Compute the sum [R4] + [R5]
• Memory Access: No action, since there are no memory operands
• Writeback: Write the result into register R3
Load Instructions
Example
• Load R5, X(R7) R5[[R7]+X]
Source register: R7 Destination register: R5
The immediate value X is given in the instruction word
Instruction steps:
• Fetch: Fetch the instruction and increment the program counter
• Decode: Decode the instruction in IR to determine the operation to be performed (load). Read the contents of register R7.
• Execute: Add the immediate value X to the contents of R7
• Memory Access: Use the sum X+[R7] as the effective address of the source operand, read the contents of that location from memory
• Writeback: Write the data received from memory into register R5
Store Instructions
Example
• Store R6, X(R8) X+[R8][R6]
Source registers: R6, R8 Destination register: None
The immediate value X is given in the instruction word
Instruction steps:
• Fetch: Fetch the instruction and increment the program counter
• Decode: Decode the instruction in IR to determine the operation to be performed (store). Read the contents of registers R6 and R8.
• Execute: Compute the effective address X + [R8]
• Memory Access: Store the contents of register R6 into memory location X + [R8]
• Writeback: No action
How to Improve Performance?
• So far, we have assumed that only one instruction is being processed by the multi-stage hardware at any point of time
• How do we decrease the execution time of a program?
• One possibility is to use faster circuits to implement the processor
– This approach will decrease the execution time of each instruction
• Another possibility is to arrange the processor hardware in such a way that multiple instructions can be processed at the same time. This approach is called pipelining
Pipelining does not change the time needed to perform a single instruction, but it increases the number of instructions performed
per second (instruction completion rate or throughput)
Pipelining: Example from Daily Life
• Laundry Example
• Four loads of laundry need
to be washed, dried and folded
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
A B C D
Pipelining: Example from Daily Life (cont.)
• Sequential laundry takes 6 hours for 4 loads
• If pipelining is used, how long would laundry take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
Time
• Pipelined laundry takes 3.5 hours for 4 loads
A
B
C
D
6 PM 7 8 9 10 11 Midnight
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
Pipelining: Example from Daily Life (cont.)
• Pipelining doesn’t reduce the time taken by an individual task, it improves the throughput of entire workload
• Task completion rate limited by slowest pipeline stage
• Potential speedup = Number of pipeline stages
• Unbalanced lengths of pipeline stages reduces speedup
• Time to “fill” pipeline and time to “drain” it reduces speedup
A
B
C
D
6 PM 7 8 9
T
a
s
k
O
r
d
e
r
Time
30 40 40 40 40 20
Pipelining: Example from Daily Life (cont.)
Pipelining in Processors
• Exploit parallelism in sequential instruction stream
– Resources (e.g. ALU, memory, register file) can be used concurrently by different instructions
– Multiple instructions processed in parallel More instructions completed per unit time Higher throughput (performance)
Pipelining and ISA
• What is desirable in instruction sets for pipelining
– Variable length instructions vs. fixed length instructions?
– Memory operands supported for any instruction or memory operands only in loads/stores?
– Register operands in different places in each instruction format vs. in the same place in different instruction formats?
MIPS Choices for Pipelining
• 32-bit fixed instruction format
• Memory access only via load/store instruction
• Opcode field for all instruction formats in the same place
• Source registers for R-type and I-type instruction formats in the same place
• Single addressing mode for load/store instructions
• Simple branch conditions
5-Stage MIPS Pipeline
At any given time, each pipeline stage processes a different instructionRegister file appears in two stages: read in stage-2 and written in stage-5
IF
IF
ID
ID
EX
EX
MEM
MEM WB
WB
5-Stage MIPS Pipeline
How do we ensure that each stage has the correct inputs for the instruction that is being processed by that stage?
IF
IF
ID
ID
EX
EX
MEM
MEM WB
WB
Pipeline Registers
Information needed by an instruction is carried through the pipeline (via pipeline registers) as the instruction proceeds from one stage to the next
Representing Pipelined Execution
Time
IF ID EX MEM WB
Clock Cycle 1 2 3 4 5 6 7
Ij
Ij+1
Ij+2
IF
IF
ID
ID
EX
EX
MEM
MEM
WB
WB