The Processor Data Path & Control Chapter 5 Part 2 - Multi-Clock Cycle Design N. Guydosh 2/29/04


The Processor Data Path & Control

Chapter 5 Part 2 - Multi-Clock Cycle Design

N. Guydosh

2/29/04

A Multicycle Design

• As pointed out earlier, a single clock cycle design has a performance bottleneck: the instruction requiring the longest time determines the time for all other instructions, even simple instructions such as jump (j).

• The offending instruction is the load word memory instruction (lw). lw uses five functional units in series:

  – Instruction memory (fetch)
  – The register file (read)
  – The ALU (compute address)
  – Data memory (read)
  – The register file (write)

• Several instruction classes could fit into a shorter clock cycle, so overall performance is compromised: memory instructions may not be frequently used, yet they determine the timing for the “simpler” instructions.

• In addition, a single cycle design increases the number of hardware units, because units cannot be multiplexed in time.

• See performance example on pp. 373-375
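The bottleneck can be illustrated with a small sketch. The 200 ps delay per functional unit is an assumed, illustrative number (it is not taken from the slides or from the book's example), and the unit lists for the classes other than lw follow the usual textbook breakdown and are likewise assumptions here.

```python
# Assumed delay per functional unit in picoseconds (illustrative only).
UNIT_DELAY_PS = {
    "instr_mem": 200, "reg_read": 200, "alu": 200, "data_mem": 200, "reg_write": 200,
}

# Functional units each instruction class uses in series.
# Only the lw path is listed on the slide; the others are assumptions.
PATHS = {
    "lw":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sw":     ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq":    ["instr_mem", "reg_read", "alu"],
    "j":      ["instr_mem"],
}

def path_delay(instr):
    """Time the instruction actually needs: sum of the unit delays on its path."""
    return sum(UNIT_DELAY_PS[unit] for unit in PATHS[instr])

def single_cycle_period():
    """A single-cycle clock must be long enough for the slowest instruction."""
    return max(path_delay(instr) for instr in PATHS)
```

Under these assumed delays, `single_cycle_period()` is set by lw, and a jump, which needs only `path_delay("j")`, still occupies the full period.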

Multiple Clock Cycle Design ... Overview

• Allows a shorter clock cycle

  – The clock cycle is derived from the longest functional unit delay, not the longest total data path delay

• Multiple clock pulses per instruction

• Use a clock pulse for each functional unit (memory, register file, ALU, ...): “multiplex” (share) in time, not in space (as with single clock)

• Instead of a single long clock pulse, use a sequence of short pulses. Average instruction time will be shorter: short instructions will not have to wait idle for the clock to time out.

• Another advantage: hardware is reduced.

  – Instructions and data are stored in the same memory
  – A single ALU does all the arithmetic
  – Functional units are time shared

• This approach is easily extended to pipelining, which allows multiple instructions to be executed at one time and which will further enhance performance (chapter 6).
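A rough sketch of the timing argument follows. The 200 ps clock (one assumed functional unit delay) and the instruction mix are hypothetical values chosen only for illustration; the cycle counts per instruction class match the phases described later in these notes. How much is gained in practice depends on how evenly the functional unit delays are balanced.

```python
CLOCK_PS = 200  # assumed: clock period equals the longest functional unit delay

# Clock cycles per instruction class in the multicycle design.
CYCLES = {"lw": 5, "sw": 4, "r_type": 4, "beq": 3, "j": 3}

# Hypothetical instruction mix (fractions sum to 1).
MIX = {"lw": 0.25, "sw": 0.10, "r_type": 0.45, "beq": 0.15, "j": 0.05}

def average_cpi(mix):
    """Weighted average number of clock cycles per instruction."""
    return sum(fraction * CYCLES[instr] for instr, fraction in mix.items())

def average_time_ps(mix):
    """Average instruction time for the multicycle design."""
    return average_cpi(mix) * CLOCK_PS

def single_cycle_time_ps():
    """Single-cycle design: every instruction takes as long as lw (5 units in series)."""
    return CYCLES["lw"] * CLOCK_PS
```

With this assumed mix the average instruction needs fewer than five short cycles, so `average_time_ps(MIX)` comes out below `single_cycle_time_ps()`.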

Multiple Clock Cycle Design
Design Details

• We will use the single clock design as a starting point (fig. 5.29, p. 372).

• This design will be “compressed” into what you see in fig. 5.30, p. 378: a single memory and one ALU for all (see next).

  – Single memory unit for both instructions and data
  – Single ALU instead of an ALU and two adders
  – One or more “buffer” registers added after every major functional unit to hold its output until the next clock cycle

• Because functional units are now shared among the various phases of execution, we must add MUXes and extend some existing MUXes. See fig 5.31, page 380 (see later).

• A few additional hardware elements will also be added to resolve situations where stored data may change in one execution phase before a downstream phase gets to use it:

  – For now: we add an instruction register (IR). It holds the instruction for the later phases of execution, after the PC has been changed during the fetch stage and the memory has perhaps been reused during execution.

MultiClock Cycle Design
Design Details – High Level View

[Fig 5.30: high-level view of the multicycle data path. Blocks shown: PC, Memory (Address input, Data input, Instruction or data output), Instruction register, Memory data register, Registers (three Register # inputs and a Data input), A, B, ALU, ALUOut.]

Add register buffers between functional “stages”.
Single memory holds instructions & data.
One ALU does all.

MultiClock Cycle Design
Support for Basic Instructions

[Fig 5.31: multicycle data path for the basic instructions. Blocks shown: PC, Memory (Address, Write data, MemData), Instruction register (fields Instruction[25–21], Instruction[20–16], Instruction[15–11], Instruction[15–0]), Memory data register, Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), A, B, Sign extend (16 to 32 bits), Shift left 2, the constant 4, two-input and four-input MUXes, ALU (Zero, ALU result), ALUOut.]

Multiple Clock Cycle Design
Design Details (continued)

• We will also need more control lines, and will modify existing ones. Control is now sequential and dynamic:

  – Time and opcode dependent

  – Things start to get hairy when we design the controller ... But “Don’t Panic!” as Douglas Adams would say.

• The first cut at the data path design with control lines shown is fig 5.32, page 381. The storage elements that are not rewritten on every clock cycle (PC, memory, register file, IR) need a separate write signal, and a read signal is needed for memory. The old ALU controller from the single clock cycle design will be reused.

• See fig 5.34 p. 384 for a complete summary of the control line functions.

MultiClock Cycle Design
Control Lines Shown

[Fig 5.32: the multicycle data path of fig 5.31 with control lines added. Control lines shown: IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, ALUOp. An ALU control block, fed by Instruction[5–0], is also shown.]

Add and extend MUXes for hardware sharing.
Add control lines which must be generated.

MultiClock Cycle Design
Control Units To Be Designed are Depicted

[Fig 5.33: the complete multicycle data path with its control unit. The Control block takes Op[5–0] = Instruction[31–26] as input and outputs PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst. A three-input MUX controlled by PCSource feeds the PC; one of its inputs is the jump address [31–0], formed from Instruction[25–0] shifted left 2 (26 to 28 bits) together with PC[31–28]. The ALU control block is fed by Instruction[5–0].]

Fig 5.33 shows the control units to be designed. Compare to fig. 5.29 for single clock cycle.

Comments written on the figure (blue is for comments):

  – Instruction fields: op, rs, rt, rd, funct field
  – memory addr. for sw data write or register data write for R inst.
  – beq address
  – PC+4
  – Write PC if (beq & ALU is 0) or j inst or PC+4
  – for beq
  – For sw or lw
  – Data to memory for sw
  – sw addr reg
  – data write inst

MultiClock Cycle Design
Control Lines Defined – Part 1

Fig 5.34 – part 1. Notes:

• lw uses rt as destination
• For de-asserting RegWrite: the register file is read by default
• MemtoReg chooses between memory or ALU as a source
• IorD chooses between instruction fetch or data access addresses
• Although IRWrite causes the memory output to go to the IR, it also benignly goes to the MDR
• PCWrite changes the PC only for PC+4 or j instructions; it is de-asserted for beq
• PCWriteCond is for the beq instruction

MultiClock Cycle Design
Control Lines Defined – Part 2

Fig 5.34 – part 2

[Annotations on the figure: beq, j inst]

MultiClock Cycle Design
Splitting The Instruction Execution Into Clock Cycle “Phases”

• Goal: break up execution into phases in such a way as to balance the amount of work done in each phase.

• Each phase will take one clock cycle and correspond to one of the “functional units” in the single clock cycle design.

• Restrict each phase to contain at most one ALU operation, or one register file (or other register) access, or one memory access.

• All operations for one phase occur in parallel within one clock cycle.

• This strategy will also put us in a good position for pipelining later.

• NOTE: In the control signal descriptions which follow, we assume that if a control signal is not mentioned or assigned a value, it is de-asserted by default.

MultiClock Cycle Design
Generic Phases – Phase 1

• Phase 1: Instruction Fetch

  – IR = memory[PC];
  – PC += 4;
  – Assert IRWrite and MemRead, set IorD to 0 (select PC), and bump PC by 4.
  – ==> ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00 (add).
  – Store the incremented address back to the PC by setting PCSource = 00 and PCWrite = 1.

Note that the incremented PC is also stored in ALUOut, which is redundant and “benign”. Note: setting PCSource = 00 is not explicitly mentioned in the book.

  – Comment: accessing memory based on the PC while also incrementing the PC is allowed because of our edge triggering assumption; see examples in slides 3 and 4 of the 1st set of Chapter 5 PPT notes (single clock cycle). We assume that the PC value is captured by the memory unit before it gets updated, a consequence of edge triggering. This value is available at the very beginning of the clock cycle, and because of delays, the PC is updated a little later in the cycle.

  – Comment: it is also assumed that the memory can be read and its output set into the IR within this same clock cycle. Assume a faster point-to-point memory to IR connection rather than a bus. See “Elaboration” on page 382.

  – Comment: it appears that the MDR benignly also gets the instruction, because it is unconditionally set. If the instruction is lw, this value in the MDR will get correctly overwritten in a later phase (the memory access). This is not mentioned in the book.
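The fetch phase and the edge triggering assumption can be sketched as a function that reads only the old state and returns a new state. The dictionary layout (`PC`, `IR`, `MDR`, `ALUOut`, `memory` keyed by byte address) is a hypothetical modeling choice, not something defined in the slides.

```python
def fetch(state):
    """Phase 1: IR = memory[PC]; PC = PC + 4, both taking effect at the clock edge."""
    pc = state["PC"]                      # old PC, captured at the start of the cycle
    word = state["memory"][pc]            # MemRead = 1, IorD = 0 (address comes from PC)
    incremented = (pc + 4) & 0xFFFFFFFF   # ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00 (add)

    nxt = dict(state)                     # writes land only in the next-state copy
    nxt["IR"] = word                      # IRWrite = 1
    nxt["MDR"] = word                     # MDR is written unconditionally (benign)
    nxt["ALUOut"] = incremented           # redundant but harmless
    nxt["PC"] = incremented               # PCSource = 00, PCWrite = 1
    return nxt
```

Because every read uses `state` and every write goes to `nxt`, the memory sees the old PC even though the PC is updated in the same cycle, which is the behavior the comments above rely on.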

MultiClock Cycle Design
Generic Phases – Phase 2

• Phase 2: Decode & Register Fetch

  – “Optimistic (maybe premature) actions” are done here: not all results may be used downstream, but this is faster and causes no problem if they are not used.
  – A = register[ IR[25-21] ]; /* rs field */
    B = register[ IR[20-16] ]; /* rt field */
    ALUOut = Target_addr = PC + ( sign_ext(IR[15-0]) << 2 ); /* target_addr calc is “optimistic” */
  – Target_addr may get used (if the instruction is a conditional branch); otherwise it is harmlessly discarded. It is easier to calculate it early and throw it away if not needed than to have to calculate it later.
  – ALUSrcA set to 0 (choose PC), ALUSrcB set to 11 (choose the offset field, sign extended and shifted left 2 to form a byte offset), and ALUOp = 00 (add)
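A minimal sketch of the decode phase, assuming the register file is modeled as a Python list of 32 integers and that `pc` is the already incremented PC from phase 1. The helper names `field`, `sign_extend16`, and `decode` are hypothetical.

```python
def field(ir, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (ir >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer (may be negative)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(pc, ir, registers):
    """Phase 2: read rs and rt, and optimistically compute the branch target."""
    a = registers[field(ir, 25, 21)]                  # A = register[rs]
    b = registers[field(ir, 20, 16)]                  # B = register[rt]
    offset = sign_extend16(field(ir, 15, 0)) << 2     # word offset to byte offset
    alu_out = (pc + offset) & 0xFFFFFFFF              # ALUSrcA = 0, ALUSrcB = 11, add
    return a, b, alu_out
```

The target is computed for every instruction, whether or not it turns out to be a branch; a non-branch simply never uses `alu_out` from this phase.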

MultiClock Cycle Design
Instruction Content Dependent Phases – Phase 3

Memory address computation & arithmetic R-type execution

• Memory Reference For Data

  – ALUOut = A + sign_extend( IR[15-0] ); /* ALU operation */
  – ALUSrcA set to 1
    ALUSrcB set to 10 (use sign extended value)
    ALUOp set to 00 (add)

• Arithmetic-logical R-type instruction

  – ALUOut = A op B
  – ALUSrcA set to 1
    ALUSrcB set to 00
    ALUOp set to 10: the funct field is used to determine the ALU control settings
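The role of ALUOp in this phase can be sketched as follows. The funct codes are the standard MIPS encodings for add, sub, and, or, and slt; the function names and the string operation labels are hypothetical, and only this small subset of R-type operations is modeled.

```python
FUNCT_TO_OP = {0x20: "add", 0x22: "sub", 0x24: "and", 0x25: "or", 0x2A: "slt"}

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def alu_control(alu_op, funct):
    """ALUOp 00 forces add, 01 forces subtract, 10 defers to the funct field."""
    if alu_op == 0b00:
        return "add"
    if alu_op == 0b01:
        return "sub"
    return FUNCT_TO_OP[funct]

def alu(op, a, b):
    """32-bit ALU for the operations above."""
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    else:  # slt: signed comparison
        result = int(to_signed32(a) < to_signed32(b))
    return result & 0xFFFFFFFF

def memory_address(a, ir):
    """lw/sw: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00."""
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm |= 0xFFFF0000          # sign extend to 32 bits
    return alu(alu_control(0b00, 0), a, imm)

def r_type_execute(a, b, ir):
    """R-type: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10."""
    return alu(alu_control(0b10, ir & 0x3F), a, b)
```

Both paths write their result to ALUOut; only the operand selection and the ALUOp setting differ between the two instruction classes.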

MultiClock Cycle Design
Instruction Content Dependent Phases – Phase 3 (continued)

Branch completion

• Branch (beq)

  – if (A == B) PC = ALUOut;
  – “Equal” compare of registers A and B: if they are equal, the Zero output of the ALU is set, which means the branch is successful.
  – ALUSrcA set to 1
    ALUSrcB set to 00
    ALUOp set to 01 (subtract)
    PCWriteCond asserted
    PCSource set to 01 /* PC taken from ALUOut */
    PCWrite de-asserted (by default): this prevents the beq address from being written on an unsuccessful branch. The ALU Zero output must be set for the beq address to be used.
  – For successful beq’s, we write the PC twice: once with PC+4 from the direct ALU output during instruction fetch, and once from ALUOut in this step, as above. The last one is used.

• Jump (j)

  – PC = PC[31-28] || (IR[25-0] << 2); /* concatenate the 26 bit offset, shifted to 28 bits, to the PC high 4 bits */
  – Set PCSource = 10 /* selects jump address */
    PCWrite asserted to overwrite the current contents of the PC with the jump address.
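The PC update logic for branch and jump completion can be sketched like this. The function names and argument order are hypothetical; the PCSource encoding (00 = ALU result, 01 = ALUOut, 10 = jump address) follows the state diagrams later in these notes.

```python
def jump_address(pc, ir):
    """PC[31-28] || (IR[25-0] << 2): 26-bit field shifted to 28 bits, plus the PC's top 4 bits."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

def next_pc(pc, alu_result, alu_out, ir, zero, pc_write, pc_write_cond, pc_source):
    """Return the PC value after the clock edge for the given control settings."""
    candidates = {
        0b00: alu_result,            # PC + 4 straight from the ALU (fetch)
        0b01: alu_out,               # branch target saved in ALUOut (beq)
        0b10: jump_address(pc, ir),  # jump target (j)
    }
    # The PC is written unconditionally (PCWrite) or only when the ALU reports A == B.
    write_enable = pc_write or (pc_write_cond and zero)
    return candidates[pc_source] if write_enable else pc
```

For an unsuccessful beq, `zero` is false and PCWrite is de-asserted, so the function returns the PC unchanged, which still holds PC+4 from the fetch phase.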

MultiClock Cycle Design
Instruction Content Dependent Phases – Phase 4

Memory access

• MDR = Memory[ALUOut]; /* for lw */
  Memory[ALUOut] = B; /* sw, source operand saved in B */

  – MemRead (for lw) or MemWrite (for sw) is asserted
    IorD set to 1 to get the data address instead of the PC

Arithmetic-logical R-type instruction completion

• Reg[IR[15-11]] = ALUOut; /* set rd from ALUOut */

  – RegDst set to 1 to pick up rd and not rt
    Assert RegWrite
    Set MemtoReg to 0 to write from the ALU and not from memory

MultiClock Cycle Design
Instruction Content Dependent Phases – Phase 5

Memory read completion step (for lw, the longest instruction)

• Reg[IR[20-16]] = MDR; /* write back to reg from memory for lw */ /* writes to rt register */

  – Set MemtoReg to 1 to write from memory and not from the ALU
    Assert RegWrite to cause a write to the register file
    Set RegDst to 0 to choose the rt register and not rd
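The last two phases (memory access, then register write-back) can be sketched with two small helpers. Modeling memory as a dictionary keyed by byte address and the register file as a list of 32 integers is an assumption made for illustration, as are the function names.

```python
def memory_access(memory, alu_out, b, mem_read=False, mem_write=False):
    """Phase 4 for lw/sw, with IorD = 1 so the address comes from ALUOut.

    Returns the value loaded into the MDR for a read, otherwise None.
    """
    if mem_write:
        memory[alu_out] = b          # sw: Memory[ALUOut] = B
    return memory[alu_out] if mem_read else None

def write_back(registers, ir, alu_out, mdr, reg_dst, mem_to_reg):
    """Register write (RegWrite asserted) for R-type completion or lw completion."""
    # RegDst = 1 selects rd (IR[15-11]); RegDst = 0 selects rt (IR[20-16]).
    dest = (ir >> 11) & 0x1F if reg_dst else (ir >> 16) & 0x1F
    # MemtoReg = 1 writes the MDR; MemtoReg = 0 writes ALUOut.
    registers[dest] = mdr if mem_to_reg else alu_out
```

An lw uses `memory_access(..., mem_read=True)` in phase 4 and `write_back(..., reg_dst=0, mem_to_reg=1)` in phase 5, while an R-type instruction finishes in phase 4 with `write_back(..., reg_dst=1, mem_to_reg=0)`.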


MultiClock Cycle Design Summary of steps in each phase

Fig. 5.35

High Level View of Finite State Machine Control

[Fig. 5.36: from Start, control enters instruction fetch/decode and register fetch (Figure 5.37), then branches to one of four sub-machines, each of which returns to instruction fetch:]

  – Memory access instructions (Figure 5.38)
  – R-type instructions (Figure 5.39)
  – Branch instruction (Figure 5.40)
  – Jump instruction (Figure 5.41)

Instruction Fetch & Decode

[Fig. 5.37: states 0 and 1 of the finite state machine.]

  – State 0, Instruction fetch (entered from Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  – State 1, Instruction decode / Register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00

Transitions out of state 1:

  – (Op = 'LW') or (Op = 'SW'): Memory reference FSM (Figure 5.38)
  – (Op = R-type): R-type FSM (Figure 5.39)
  – (Op = 'BEQ'): Branch FSM (Figure 5.40)
  – (Op = 'JMP'): Jump FSM (Figure 5.41)

Memory Reference Instructions

[Fig. 5.38: states 2 through 5 of the finite state machine.]

  – State 2, Memory address computation (from state 1 when (Op = 'LW') or (Op = 'SW')): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  – State 3, Memory access (Op = 'LW'): MemRead, IorD = 1
  – State 4, Write-back step (after state 3): RegWrite, MemtoReg = 1, RegDst = 0
  – State 5, Memory access (Op = 'SW'): MemWrite, IorD = 1

States 4 and 5 return to state 0 (Figure 5.37).

R-type Instruction

[Fig. 5.39: states 6 and 7 of the finite state machine.]

  – State 6, Execution (from state 1 when (Op = R-type)): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  – State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0

State 7 returns to state 0 (Figure 5.37).

Branch and Jump Instruction

[Fig. 5.40: branch (beq) instruction.]

  – State 8, Branch completion (from state 1 when (Op = 'BEQ')): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01. Returns to state 0 (Figure 5.37).

[Fig. 5.41: jump instruction.]

  – State 9, Jump completion (from state 1 when (Op = 'J')): PCWrite, PCSource = 10. Returns to state 0 (Figure 5.37).

Complete State Machine for Multi-cycle Controller (fig 5.33)

[Fig. 5.42: the complete finite state machine, combining figures 5.37 through 5.41.]

  – State 0, Instruction fetch (Start): MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
  – State 1, Instruction decode / register fetch: ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
  – State 2, Memory address computation: ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  – State 3, Memory access (Op = 'LW'): MemRead, IorD = 1
  – State 4, Write-back step: RegDst = 0, RegWrite, MemtoReg = 1
  – State 5, Memory access (Op = 'SW'): MemWrite, IorD = 1
  – State 6, Execution: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
  – State 7, R-type completion: RegDst = 1, RegWrite, MemtoReg = 0
  – State 8, Branch completion: ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
  – State 9, Jump completion (Op = 'J'): PCWrite, PCSource = 10

Fig. 5.42
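The state diagram above can be written down as a table of outputs per state plus a next-state function. This is only a sketch: opcodes are represented by hypothetical string labels rather than the real 6-bit opcode field, and signals missing from a state's entry are taken to be de-asserted, following the note on the phases slide.

```python
# Control outputs asserted in each state (a Moore machine: outputs depend on state only).
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegDst": 0, "RegWrite": 1, "MemtoReg": 1},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Dispatch out of state 1, keyed by a hypothetical opcode label.
DISPATCH = {"lw": 2, "sw": 2, "r_type": 6, "beq": 8, "j": 9}

def next_state(state, op):
    """Combinational next-state logic: depends on the current state and the opcode."""
    if state == 0:
        return 1
    if state == 1:
        return DISPATCH[op]
    if state == 2:
        return 3 if op == "lw" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, 9 all return to instruction fetch

def trace(op):
    """List the states visited while executing one instruction of class op."""
    states = [0]
    while True:
        state = next_state(states[-1], op)
        if state == 0:
            return states
        states.append(state)
```

`len(trace(op))` gives the number of clock cycles the instruction class needs, and `OUTPUTS[s]` gives the control settings to drive in each of those cycles.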

A Possible Implementation of the Multi-cycle Control Unit

[Fig. 5.43: a block of combinational control logic plus a state register. The inputs to the combinational logic are the opcode field of the instruction register and the current state (from the state register). Its outputs are the datapath control outputs and the next state, which is fed back into the state register.]

See appendix C for implementation details.