1 2004 Morgan Kaufmann Publishers Chapter Five The Processor : Datapath and Control



Chapter Five

The Processor : Datapath and Control


Outline

• 5.1 Introduction

• 5.2 Logic Design Conventions

• 5.3 Building a Datapath

• 5.4 A Simple Implementation Scheme

• 5.5 A Multicycle Implementation

• 5.6 Exceptions

• 5.9 Real Stuff: The Organization of Recent Pentium Implementations

• 5.10 Fallacies and Pitfalls

• 5.11 Concluding Remarks

• 5.12 Historical Perspective and Further Reading


5.1 Introduction


• We're ready to look at an implementation of the MIPS

• Simplified to contain only:

– memory-reference instructions: lw, sw

– arithmetic-logical instructions: add, sub, and, or, slt

– control flow instructions: beq, j

• Generic Implementation:

– use the program counter (PC) to supply instruction address

– get the instruction from memory

– read registers

– use the instruction to decide exactly what to do

• All instructions use the ALU after reading the registers

Why? memory-reference? arithmetic? control flow?

The Processor: Datapath & Control
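One way to approach the "Why?" question is to write down what the ALU computes for each class. The sketch below is illustrative only: the helper names `alu_use` and `to_signed` are made up for this example, and for control flow it covers beq only.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu_use(mnemonic: str, a: int, b: int) -> int:
    """Hypothetical helper: the value the ALU produces for each instruction."""
    if mnemonic in ("lw", "sw", "add"):
        # memory-reference: base register + offset gives the address
        return (a + b) & MASK32
    if mnemonic in ("sub", "beq"):
        # beq subtracts and then checks whether the result is zero
        return (a - b) & MASK32
    if mnemonic == "and":
        return a & b
    if mnemonic == "or":
        return a | b
    if mnemonic == "slt":
        return int(to_signed(a) < to_signed(b))
    raise ValueError(f"not in the subset: {mnemonic}")
```

For example, `alu_use("lw", 0x1000, 100)` yields the effective address 0x1064, and `alu_use("beq", 7, 7)` yields 0, which signals equality.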


• Abstract / Simplified View:

• Two types of functional units:

– elements that operate on data values (combinational)

– elements that contain state (sequential)

More Implementation Details


FIGURE 3.14 MIPS architecture revealed thus far: MIPS assembly language

Instruction | Example | Meaning | Comments

Arithmetic
add | add $s1, $s2, $s3 | $s1 = $s2 + $s3 | Three operands; overflow detected
subtract | sub $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; overflow detected
add immediate | addi $s1, $s2, 100 | $s1 = $s2 + 100 | + constant; overflow detected
add unsigned | addu $s1, $s2, $s3 | $s1 = $s2 + $s3 | Three operands; overflow undetected
subtract unsigned | subu $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; overflow undetected
add immediate unsigned | addiu $s1, $s2, 100 | $s1 = $s2 + 100 | + constant; overflow undetected
move from coprocessor register | mfc0 $s1, $epc | $s1 = $epc | Copy Exception PC + special regs
multiply | mult $s2, $s3 | Hi, Lo = $s2 x $s3 | 64-bit signed product in Hi, Lo
multiply unsigned | multu $s2, $s3 | Hi, Lo = $s2 x $s3 | 64-bit unsigned product in Hi, Lo
divide | div $s2, $s3 | Lo = $s2 / $s3, Hi = $s2 mod $s3 | Lo = quotient, Hi = remainder
divide unsigned | divu $s2, $s3 | Lo = $s2 / $s3, Hi = $s2 mod $s3 | Unsigned quotient and remainder
move from Hi | mfhi $s1 | $s1 = Hi | Used to get copy of Hi
move from Lo | mflo $s1 | $s1 = Lo | Used to get copy of Lo

Data transfer
load word | lw $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Word from memory to register
store word | sw $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Word from register to memory
load half unsigned | lhu $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Halfword from memory to register
store half | sh $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Halfword from register to memory
load byte unsigned | lbu $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Byte from memory to register
store byte | sb $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Byte from register to memory
load upper immed. | lui $s1, 100 | $s1 = 100 * 2^16 | Loads constant in upper 16 bits


FIGURE 3.14 (continued)

Instruction | Example | Meaning | Comments

Logical
and | and $s1, $s2, $s3 | $s1 = $s2 & $s3 | Three reg. operands; bit-by-bit AND
or | or $s1, $s2, $s3 | $s1 = $s2 | $s3 | Three reg. operands; bit-by-bit OR
nor | nor $s1, $s2, $s3 | $s1 = ~($s2 | $s3) | Three reg. operands; bit-by-bit NOR
and immediate | andi $s1, $s2, 100 | $s1 = $s2 & 100 | Bit-by-bit AND reg with constant
or immediate | ori $s1, $s2, 100 | $s1 = $s2 | 100 | Bit-by-bit OR reg with constant
shift left logical | sll $s1, $s2, 10 | $s1 = $s2 << 10 | Shift left by constant
shift right logical | srl $s1, $s2, 10 | $s1 = $s2 >> 10 | Shift right by constant

Conditional branch
branch on equal | beq $s1, $s2, 25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
branch on not equal | bne $s1, $s2, 25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
set on less than | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; two's complement
set on less than immediate | slti $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare < constant; two's complement
set less than unsigned | sltu $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; natural numbers
set less than immediate unsigned | sltiu $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare < constant; natural numbers

Unconditional jump
jump | j 2500 | go to 10000 | Jump to target address
jump register | jr $ra | go to $ra | For switch, procedure return
jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call


MIPS floating-point machine language

Name | Format | Example (field values) | Comments
add.s | R | 17 16 6 4 2 0 | add.s $f2, $f4, $f6
sub.s | R | 17 16 6 4 2 1 | sub.s $f2, $f4, $f6
mul.s | R | 17 16 6 4 2 2 | mul.s $f2, $f4, $f6
div.s | R | 17 16 6 4 2 3 | div.s $f2, $f4, $f6
add.d | R | 17 17 6 4 2 0 | add.d $f2, $f4, $f6
sub.d | R | 17 17 6 4 2 1 | sub.d $f2, $f4, $f6
mul.d | R | 17 17 6 4 2 2 | mul.d $f2, $f4, $f6
div.d | R | 17 17 6 4 2 3 | div.d $f2, $f4, $f6
lwc1 | I | 49 20 2 100 | lwc1 $f2, 100($s4)
swc1 | I | 57 20 2 100 | swc1 $f2, 100($s4)
bc1t | I | 17 8 1 25 | bc1t 25
bc1f | I | 17 8 0 25 | bc1f 25
c.lt.s | R | 17 16 4 2 0 60 | c.lt.s $f2, $f4
c.lt.d | R | 17 17 4 2 0 60 | c.lt.d $f2, $f4
Field size | | 6 bits, 5 bits, 5 bits, 5 bits, 5 bits, 6 bits | All MIPS instructions are 32 bits


Figure 5.2 The basic implementation of the MIPS subset including the necessary multiplexers and control lines.


5.2 Logic Design Conventions


Keywords

• Clocking methodology The approach used to determine when data is valid and stable relative to the clock.

• Edge-triggered clocking A clocking scheme in which all state changes occur on a clock edge.

• Control signal A signal used for multiplexer selection or for directing the operation of a functional unit; contrasts with a data signal, which contains information that is operated on by a functional unit.


• Unclocked vs. Clocked

• Clocks used in synchronous logic

– when should an element that contains state be updated?

State Elements

[Figure: clock waveform marking the rising edge, the falling edge, and the clock period (cycle time).]


• The set-reset latch

– output depends on present inputs and also on past inputs

An unclocked state element

[Figure: set-reset latch with inputs R and S and outputs Q and its complement.]
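A behavioral sketch of the set-reset latch, showing how the output depends on past inputs as well as present ones. The function name `sr_latch` is hypothetical, and rejecting S = R = 1 is a modeling choice of this sketch rather than something the slide states.

```python
def sr_latch(q: int, s: int, r: int) -> int:
    """Return the next value of Q given the current Q and inputs S and R."""
    if s and r:
        # Assumption of this sketch: treat S = R = 1 as an illegal input.
        raise ValueError("S and R must not be asserted together")
    if s:
        return 1  # set
    if r:
        return 0  # reset
    return q      # neither asserted: the latch remembers its old value
```

With S = R = 0 the result is whatever was stored earlier, which is what makes this a state element.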


• Output is equal to the stored value inside the element (no need to ask for permission to look at the value)

• Change of state (value) is based on the clock

• Latches: state changes whenever the inputs change and the clock is asserted

• Flip-flop: state changes only on a clock edge (edge-triggered methodology)

Note: "asserted" means "logically true", which could mean electrically low.

A clocking methodology defines when signals can be read and written; we wouldn't want to read a signal at the same time it was being written.

Latches and Flip-flops


• Two inputs:

– the data value to be stored (D)

– the clock signal (C) indicating when to read & store D

• Two outputs:

– the value of the internal state (Q) and its complement

D-latch

[Figure: D latch with inputs D and C and outputs Q and its complement, plus a timing diagram of D, C, and Q.]


D flip-flop

• Output changes only on the clock edge

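[Figure: D flip-flop constructed from two D latches connected in series, with inputs D and C and outputs Q and its complement, plus a timing diagram of D, C, and Q.]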
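The difference between a latch and a flip-flop can be sketched behaviorally. The class names below are hypothetical, and the choice of a falling-edge flip-flop built from two latches is an assumption of this sketch; the slides only say the output changes on a clock edge.

```python
class DLatch:
    """Level-sensitive: Q follows D while the clock C is asserted."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, c: int) -> int:
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: two latches in series, the second on the inverted clock."""

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d: int, c: int) -> int:
        m = self.master.update(d, c)         # master is open while C = 1
        return self.slave.update(m, 1 - c)   # slave is open while C = 0
```

Feeding D = 1 while C = 1 changes the latch output immediately, while the flip-flop output only changes once C falls to 0.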


Our Implementation

• An edge triggered methodology

• Typical execution:

– read contents of some state elements,

– send values through some combinational logic

– write results to one or more state elements

[Figure: State element 1 feeds Combinational logic, which feeds State element 2, all within one clock cycle.]


Figure 5.4 An edge-triggered methodology allows a state element to be read and written in the same clock cycle without creating a race that could lead to indeterminate data values.


5.3 Building a Datapath


Keywords

• Datapath element A functional unit used to operate on or hold data within a processor. In the MIPS implementation the datapath elements include the instruction and data memories, the register file, the arithmetic logic unit (ALU), and adders.

• Program counter (PC) The register containing the address of the instruction in the program being executed.

• Register file A state element that consists of a set of registers that can be read and written by supplying a register number to be accessed.

• Sign-extend To increase the size of a data item by replicating the high-order sign bit of the original data item in the high-order bits of the larger, destination data item.
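A minimal sketch of sign extension as defined above. The function name `sign_extend` and its default widths (16 to 32 bits, matching the sign-extension unit in this chapter) are illustrative choices.

```python
def sign_extend(value: int, from_bits: int = 16, to_bits: int = 32) -> int:
    """Replicate the high-order sign bit of a from_bits-wide value into the upper bits."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if (value >> (from_bits - 1)) & 1:
        # Sign bit is 1: fill every bit above from_bits with 1s.
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value
```

For example, the 16-bit pattern 0xFFFC (that is, -4) becomes 0xFFFFFFFC, while 0x0064 stays 0x00000064.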


Keywords

• Branch target address The address specified in a branch, which becomes the new program counter (PC) if the branch is taken. In the MIPS architecture the branch target is given by the sum of the offset field of the instruction and the address of the instruction following the branch.

• Branch taken A branch where the branch condition is satisfied and the program counter (PC) becomes the branch target. All unconditional branches are taken branches.

• Branch not taken A branch where the branch condition is false and the program counter (PC) becomes the address of the instruction that sequentially follows the branch.

• Delayed branch A type of branch where the instruction immediately following the branch is always executed, independent of whether the branch condition is true or false.
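The definitions of branch target, branch taken, and branch not taken can be sketched for beq as follows. The function names are hypothetical; the sketch assumes the 16-bit offset is a word offset that is sign-extended and shifted left 2 bits before being added to PC + 4, and it does not model delayed branches.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, offset16: int) -> int:
    """Address of the instruction after the branch plus the word offset times 4."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # negative offset in two's complement
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & MASK32


def next_pc_beq(pc: int, offset16: int, a: int, b: int) -> int:
    """Taken: PC becomes the branch target. Not taken: PC becomes PC + 4."""
    if a == b:
        return branch_target(pc, offset16)
    return (pc + 4) & MASK32
```

With an offset field of 25, a taken branch at address 0x1000 goes to 0x1000 + 4 + 100.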


• Built using D flip-flops

Register File

[Figure: register file. Block view: inputs Read register number 1, Read register number 2, Write register, Write data, and Write; outputs Read data 1 and Read data 2. Read ports: each read register number drives the select input of a Mux that picks one of Register 0 through Register n – 1 as Read data 1 or Read data 2.]

Do you understand? What is the “Mux” above?


Abstraction

• Make sure you understand the abstractions!

• Sometimes it is easy to think you do, when you don’t

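[Figure: a 32-bit-wide 2-to-1 Mux (inputs A and B, output C, one Select line) shown as an abstraction of 32 one-bit Muxes, one per bit position (A31/B31/C31 down to A0/B0/C0), all sharing the same Select.]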


Register File

• Note: we still use the real clock to determine when to write

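[Figure: write port of the register file. The Register number feeds an n-to-2^n decoder; each decoder output is combined with the Write signal to drive the C input of one register (Register 0 through Register n – 1), and Register data feeds the D input of every register.]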
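A small behavioral sketch of the register file: reads are combinational (two Muxes selecting a register), and a write takes effect only at a clock edge when Write is asserted. The class and method names are hypothetical, and the sketch does not model MIPS register $zero being hardwired to 0.

```python
class RegisterFile:
    def __init__(self, n: int = 32) -> None:
        self.regs = [0] * n

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        """Two read ports: each register number selects one register's value."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write: int, write_reg: int, write_data: int) -> None:
        """Write port: only the decoded register is updated, and only if Write is 1."""
        if write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF
```

Calling `clock_edge(0, 8, 5)` leaves register 8 unchanged, while `clock_edge(1, 8, 5)` stores 5 in it.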


Simple Implementation

• Include the functional units we need for each instruction

[Figure: a. Instruction memory (input: Instruction address; output: Instruction). b. Program counter (PC). c. Adder (output: Sum).]


[Figure: a. Data memory unit (inputs: Address, Write data; output: Read data; controls: MemRead, MemWrite). b. Sign-extension unit (16-bit input, 32-bit output).]


[Figure: a. Registers (5-bit register numbers: Read register 1, Read register 2, Write register; data input: Write data; data outputs: Read data 1, Read data 2; control: RegWrite). b. ALU (4-bit ALU operation control; outputs: Zero, ALU result).]

Why do we need this stuff?


Figure 5.10 The datapath for the memory instructions and the R-type instructions.


Building the Datapath

• Use multiplexors to stitch them together

[Figure: combined datapath. Units: PC, Instruction memory (Read address, Instruction), Add (PC + 4), Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), Sign extend (16 to 32), Shift left 2, Add (branch target), ALU (Zero, ALU result), Data memory (Address, Write data, Read data), and three Muxes. Control lines: RegWrite, ALUSrc, 4-bit ALU operation, MemRead, MemWrite, MemtoReg, PCSrc.]


5.4 A Simple Implementation Scheme


Keywords

• Don’t-care term An element of a logic function in which the output does not depend on the values of all the inputs. Don’t-care terms may be specified in different ways.

• Opcode The field that denotes the operation and format of an instruction.

• Single-cycle implementation Also called single clock cycle implementation. An implementation in which an instruction is executed in one clock cycle.


Control

• Selecting the operations to perform (ALU, read/write, etc.)

• Controlling the flow of data (multiplexor inputs)

• Information comes from the 32 bits of the instruction

• Example:

add $8, $17, $18

Instruction Format:

op | rs | rt | rd | shamt | funct
000000 | 10001 | 10010 | 01000 | 00000 | 100000

• ALU's operation based on instruction type and function code
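The field layout in the example can be checked with a short decoding sketch. The function name `decode_r_format` is hypothetical; the bit positions are the standard R-format ones (op 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0).

```python
def decode_r_format(word: int) -> dict[str, int]:
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $8, $17, $18 written as the bit fields from the slide
word = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
fields = decode_r_format(word)
```

Here `fields` gives rs = 17, rt = 18, rd = 8, and funct = 32 (binary 100000, the code for add).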


• e.g., what should the ALU do with this instruction?

• Example: lw $1, 100($2)

op | rs | rt | 16-bit offset
35 | 2 | 1 | 100

• ALU control input

0000 AND
0001 OR
0010 add
0110 subtract
0111 set-on-less-than
1100 NOR

• Why is the code for subtract 0110 and not 0011?

Control


Figure 5.12 How the ALU control bits are set depends on the ALUOp control bits and the different function codes for the R-type instruction.

Instruction opcode | ALUOp | Instruction operation | Funct field | Desired ALU action | ALU control input
LW | 00 | Load word | XXXXXX | Add | 0010
SW | 00 | Store word | XXXXXX | Add | 0010
Branch equal | 01 | Branch equal | XXXXXX | Subtract | 0110
R-type | 10 | Add | 100000 | Add | 0010
R-type | 10 | Subtract | 100010 | Subtract | 0110
R-type | 10 | AND | 100100 | And | 0000
R-type | 10 | OR | 100101 | Or | 0001
R-type | 10 | Set on less than | 101010 | Set on less than | 0111
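The mapping in Figure 5.12 can be written as a small lookup. This is a sketch of the table only, not of the gate-level logic; the names `FUNCT_TO_ALU` and `alu_control` are hypothetical.

```python
# funct field -> ALU control input, used only when ALUOp = 10 (R-type)
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control input for a given ALUOp and funct field."""
    if alu_op == 0b00:   # lw, sw: funct is a don't-care, ALU adds
        return 0b0010
    if alu_op == 0b01:   # beq: funct is a don't-care, ALU subtracts
        return 0b0110
    if alu_op == 0b10:   # R-type: funct decides
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unused ALUOp value: {alu_op:02b}")
```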


• Must describe hardware to compute 4-bit ALU control input

– given instruction type: 00 = lw, sw; 01 = beq; 10 = arithmetic

– function code for arithmetic

• Describe it using a truth table (can turn into gates):

ALUOp computed from instruction type

Control


Figure B.5.9 A 1-bit ALU that performs AND, OR, and addition on a and b or ā and b̄.


FIGURE B.5.10 (Top) A 1-bit ALU that performs AND, OR, and addition on a and b or b̄.


FIGURE B.5.10 (bottom) a 1-bit ALU for the most significant bit.


FIGURE B.5.11 A 32-bit ALU constructed from the 31 copies of the 1-bit ALU in the top of Figure B.5.10 and one 1-bit ALU in the bottom of that figure.


FIGURE B.5.12 The final 32-bit ALU. This adds a Zero detector to Figure B.5.11.


FIGURE B.5.13 The values of the three ALU control lines, Bnegate and Operation, and the corresponding ALU operations.

ALU control lines Function

0000 AND

0001 OR

0010 add

0110 subtract

0111 set-on-less-than

1100 NOR
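The codes in the table make more sense when read as individual control lines. The sketch below assumes the 4 bits are, from the left, Ainvert, Bnegate, and a 2-bit Operation (00 = AND, 01 = OR, 10 = add, 11 = set on less than); the caption names only Bnegate and Operation, so the leftmost bit's name is an assumption, as is the function name `alu`. Overflow is ignored for set on less than.

```python
MASK32 = 0xFFFFFFFF


def alu(control: int, a: int, b: int) -> tuple[int, int]:
    """Return (ALU result, Zero) for 32-bit operands a and b."""
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    if ainvert:
        a = ~a & MASK32
    if bnegate:
        b = ~b & MASK32          # invert b; the +1 comes in as the carry-in
    total = (a + b + bnegate) & MASK32
    if operation == 0b00:
        result = a & b
    elif operation == 0b01:
        result = a | b
    elif operation == 0b10:
        result = total           # add, or subtract when Bnegate = 1
    else:
        result = (total >> 31) & 1   # slt: sign bit of a - b (overflow ignored)
    return result, int(result == 0)
```

Under this reading, subtract is 0110 because it is add (Operation = 10) with Bnegate = 1, and NOR is 1100 because it is AND applied to the inverted inputs.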


FIGURE B.5.14 The symbol commonly used to represent an ALU, as shown in Figure B.5.12.


Figure 5.14 The three instruction classes (R-type, load and store, and branch) use two different instruction formats.

a. R-type instruction
Field | 0 | rs | rt | rd | shamt | funct
Bit positions | 31:26 | 25:21 | 20:16 | 15:11 | 10:6 | 5:0

b. Load or store instruction
Field | 35 or 43 | rs | rt | address
Bit positions | 31:26 | 25:21 | 20:16 | 15:0

c. Branch instruction
Field | 4 | rs | rt | address
Bit positions | 31:26 | 25:21 | 20:16 | 15:0


Figure 5.15 The datapath of Figure 5.12 with all necessary multiplexors and all control lines identified


Control

• Simple combinational logic (truth tables)

[Figure: ALU control block. Inputs: ALUOp (ALUOp1, ALUOp0) and funct bits F3 through F0 of F(5–0). Outputs: Operation2, Operation1, Operation0.]

[Figure: main control logic. Inputs: Op5 through Op0, decoded into R-format, lw, sw, and beq. Outputs: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0.]


• All of the logic is combinational

• We wait for everything to settle down, and the right thing to be done

– ALU might not produce “right answer” right away

– we use write signals along with clock to determine when to write

• Cycle time determined by length of the longest path

Our Simple Control Structure

We are ignoring some details like setup and hold times

[Figure: State element 1 feeds Combinational logic, which feeds State element 2, all within one clock cycle.]


Single Cycle Implementation

• Calculate cycle time assuming negligible delays except:

– memory (200ps), ALU and adders (100ps), register file access (50ps)

[Figure: the single-cycle datapath: PC, Instruction memory, Add (PC + 4), Registers, Sign extend (16 to 32), Shift left 2, Add (branch target), ALU, Data memory, and three Muxes, with control lines RegWrite, ALUSrc, 4-bit ALU operation, MemRead, MemWrite, MemtoReg, and PCSrc.]


Figure 5.16 The effect of each of the seven control signals.

Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number for the Write register comes from the rt field (bits 20:16). | The register destination number for the Write register comes from the rd field (bits 15:11).
RegWrite | None. | The register on the Write register input is written with the value on the Write data input.
ALUSrc | The second ALU operand comes from the second register file output (Read data 2). | The second ALU operand is the sign-extended, lower 16 bits of the instruction.
PCSrc | The PC is replaced by the output of the adder that computes the value of PC + 4. | The PC is replaced by the output of the adder that computes the branch target.
MemRead | None. | Data memory contents designated by the address input are put on the Read data output.
MemWrite | None. | Data memory contents designated by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register Write data input comes from the ALU. | The value fed to the register Write data input comes from the data memory.


Figure 5.17 The simple datapath with the control unit.


Figure 5.18 The setting of the control lines is completely determined by the opcode fields of the instruction.

Instruction | RegDst | ALUSrc | MemtoReg | RegWrite | MemRead | MemWrite | Branch | ALUOp1 | ALUOp0
R-format | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0
lw | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0
sw | X | 1 | X | 0 | 0 | 1 | 0 | 0 | 0
beq | X | 0 | X | 0 | 0 | 0 | 1 | 0 | 1


Figure 5.19 The datapath in operation for an R-type instruction such as add $t1, $t2, $t3.


Figure 5.20 The datapath in operation for a load instruction.


Figure 5.21 The datapath in operation for a branch equal instruction.


Figure 5.22 The control function for the simple single-cycle implementation is completely specified by this truth table.

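Input or output | Signal name | R-format | lw | sw | beq
Inputs | Op5 | 0 | 1 | 1 | 0
Inputs | Op4 | 0 | 0 | 0 | 0
Inputs | Op3 | 0 | 0 | 1 | 0
Inputs | Op2 | 0 | 0 | 0 | 1
Inputs | Op1 | 0 | 1 | 1 | 0
Inputs | Op0 | 0 | 1 | 1 | 0
Outputs | RegDst | 1 | 0 | X | X
Outputs | ALUSrc | 0 | 1 | 1 | 0
Outputs | MemtoReg | 0 | 1 | X | X
Outputs | RegWrite | 1 | 1 | 0 | 0
Outputs | MemRead | 0 | 1 | 0 | 0
Outputs | MemWrite | 0 | 0 | 1 | 0
Outputs | Branch | 0 | 0 | 0 | 1
Outputs | ALUOp1 | 1 | 0 | 0 | 0
Outputs | ALUOp0 | 0 | 0 | 0 | 1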
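Because the control function depends only on the opcode, the truth table can be sketched as a lookup keyed by Op5 through Op0. The names `CONTROL` and `main_control` are hypothetical, and `None` is used here to stand for a don't-care (X) entry.

```python
# opcode (Op5..Op0) -> control outputs, transcribed from the truth table
CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),     # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),     # lw
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),     # sw
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),     # beq
}


def main_control(opcode: int) -> dict:
    """Return the control line settings for one of the four supported opcodes."""
    return CONTROL[opcode]
```

For instance, `main_control(35)` (lw) asserts ALUSrc, MemtoReg, RegWrite, and MemRead, and leaves the rest deasserted.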


Figure 5.23 Instruction format for the jump instruction (opcode = 2).

Field | 000010 | address
Bit positions | 31:26 | 25:0


Figure 5.24 The simple control and datapath are extended to handle the jump instruction.


Problem: Performance of Single-Cycle Machines (p.315)

Assume that the operation times for the major functional units in this implementation are the following:

Memory units: 200 picoseconds (ps)
ALU and adders: 100 ps
Register file (read or write): 50 ps

Assuming that the multiplexors, control unit, PC accesses, sign extension unit, and wires have no delay, which of the following implementations would be faster and by how much?

1. An implementation in which every instruction operates in 1 clock cycle of a fixed length.

2. An implementation where every instruction executes in 1 clock cycle using a variable-length clock, which for each instruction is only as long as it needs to be.

To compare the performance, assume the following instruction mix: 25% loads, 10% stores, 45% ALU instructions, 15% branches, and 5% jumps.


• Let’s start by comparing the CPU execution times.

\[ \text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time} \]

Since CPI must be 1, we can simplify this to

\[ \text{CPU execution time} = \text{Instruction count} \times \text{Clock cycle time} \]

• The critical path for the different instruction classes is as follows:

Instruction class | Functional units used by the instruction class
R-type | Instruction fetch, Register access, ALU, Register access
Load word | Instruction fetch, Register access, ALU, Memory access, Register access
Store word | Instruction fetch, Register access, ALU, Memory access
Branch | Instruction fetch, Register access, ALU
Jump | Instruction fetch


• Using these critical paths, we can compute the required length for each instruction class:

Instruction class | Instruction memory | Register read | ALU operation | Data memory | Register write | Total
R-type | 200 | 50 | 100 | 0 | 50 | 400 ps
Load word | 200 | 50 | 100 | 200 | 50 | 600 ps
Store word | 200 | 50 | 100 | 200 | | 550 ps
Branch | 200 | 50 | 100 | 0 | | 350 ps
Jump | 200 | | | | | 200 ps

• Thus, the average time per instruction with a variable clock is

\[ \text{CPU clock cycle} = 600 \times 25\% + 550 \times 10\% + 400 \times 45\% + 350 \times 15\% + 200 \times 5\% = 447.5\ \text{ps} \]


• Since the variable clock implementation has a shorter average clock cycle, it is clearly faster. Let’s find the performance ratio:

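\[
\frac{\text{CPU performance}_{\text{variable clock}}}{\text{CPU performance}_{\text{single clock}}}
= \frac{\text{CPU execution time}_{\text{single clock}}}{\text{CPU execution time}_{\text{variable clock}}}
= \frac{\text{IC} \times \text{CPU clock cycle}_{\text{single clock}}}{\text{IC} \times \text{CPU clock cycle}_{\text{variable clock}}}
= \frac{\text{CPU clock cycle}_{\text{single clock}}}{\text{CPU clock cycle}_{\text{variable clock}}}
= \frac{600}{447.5} = 1.34
\]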
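The arithmetic of this problem can be reproduced in a few lines. The unit delays and instruction mix are taken from the problem statement; the variable and dictionary names are illustrative.

```python
# Unit delays in picoseconds, from the problem statement
MEM, ALU, REG = 200, 100, 50

# Critical path per instruction class (sum of the units it uses)
latency_ps = {
    "load": MEM + REG + ALU + MEM + REG,   # fetch, read, ALU, data memory, write
    "store": MEM + REG + ALU + MEM,
    "alu": MEM + REG + ALU + REG,          # R-type
    "branch": MEM + REG + ALU,
    "jump": MEM,
}

# Instruction mix from the problem statement
mix = {"load": 0.25, "store": 0.10, "alu": 0.45, "branch": 0.15, "jump": 0.05}

fixed_cycle = max(latency_ps.values())                     # set by the slowest class
variable_cycle = sum(latency_ps[k] * mix[k] for k in mix)  # weighted average
ratio = fixed_cycle / variable_cycle
```

The fixed clock must accommodate the load (600 ps), the weighted average comes to 447.5 ps, and the ratio rounds to 1.34.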


5.5 A Multicycle Implementation


Keywords

• Multicycle implementation Also called multiple clock cycle implementation. An implementation in which an instruction is executed in multiple clock cycles.

• Microprogramming A symbolic representation of control in the form of instructions, called microinstructions, that are executed on a simple micromachine.

• Finite state machine A sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the input to a set of asserted outputs.

• Next-state function A combinational function that, given the inputs and the current state, determines the next state of a finite state machine.


Where we are headed

• Single Cycle Problems:

– what if we had a more complicated instruction like floating point?

– wasteful of area

• One Solution:

– use a “smaller” cycle time

– have different instructions take different numbers of cycles

– a “multicycle” datapath:


• We will be reusing functional units

– ALU used to compute address and to increment PC

– Memory used for instruction and data

• Our control signals will not be determined directly by instruction

– e.g., what should the ALU do for a “subtract” instruction?

• We’ll use a finite state machine for control

Multicycle Approach


• Break up the instructions into steps, each step takes a cycle

– balance the amount of work to be done

– restrict each cycle to use only one major functional unit

• At the end of a cycle

– store values for use in later cycles (easiest thing to do)

– introduce additional “internal” registers

Multicycle Approach


Figure 5.27 The multicycle datapath from Figure 5.26 with the control lines shown.


Figure 5.28 The complete datapath for the multicycle implementation together with the necessary control lines.


Figure 5.29 The action caused by the setting of each control signal in Figure 5.28 on page 323.

Actions of the 1-bit control signals

Signal name | Effect when deasserted | Effect when asserted
RegDst | The register file destination number for the Write register comes from the rt field. | The register file destination number for the Write register comes from the rd field.
RegWrite | None. | The general-purpose register selected by the Write register number is written with the value of the Write data input.
ALUSrcA | The first ALU operand is the PC. | The first ALU operand comes from the A register.
MemRead | None. | Content of memory at the location specified by the address input is put on the Memory data output.
MemWrite | None. | Memory contents at the location specified by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register file Write data input comes from ALUOut. | The value fed to the register file Write data input comes from the MDR.
IorD | The PC is used to supply the address to the memory unit. | ALUOut is used to supply the address to the memory unit.
IRWrite | None. | The output of the memory is written into the IR.
PCWrite | None. | The PC is written; the source is controlled by PCSource.
PCWriteCond | None. | The PC is written if the Zero output from the ALU is also active.


Continue…

Actions of the 2-bit control signals

Signal name | Value (binary) | Effect
ALUOp | 00 | The ALU performs an add operation.
ALUOp | 01 | The ALU performs a subtract operation.
ALUOp | 10 | The funct field of the instruction determines the ALU operation.
ALUSrcB | 00 | The second input to the ALU comes from the B register.
ALUSrcB | 01 | The second input to the ALU is the constant 4.
ALUSrcB | 10 | The second input to the ALU is the sign-extended lower 16 bits of the IR.
ALUSrcB | 11 | The second input to the ALU is the sign-extended lower 16 bits of the IR shifted left 2 bits.
PCSource | 00 | Output of the ALU (PC + 4) is sent to the PC for writing.
PCSource | 01 | The contents of ALUOut (the branch target address) are sent to the PC for writing.
PCSource | 10 | The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC + 4[31:28]) is sent to the PC for writing.
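The control-signal tables above can be read as a set of multiplexor selections. The following is a minimal Python sketch of that reading, not the book's implementation: the function and argument names (`alu_operands`, `next_pc`, `alu_result`) are hypothetical, and only the selection rules come from the tables.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_operands(alu_src_a, alu_src_b, pc, a, b, ir):
    """Return the two ALU inputs selected by ALUSrcA and ALUSrcB."""
    first = a if alu_src_a else pc           # ALUSrcA: 0 = PC, 1 = A register
    imm = sign_extend16(ir)                  # sign-extended IR[15:0]
    second = {
        0b00: b,                             # B register
        0b01: 4,                             # constant 4
        0b10: imm & MASK32,                  # sign-extended immediate
        0b11: (imm << 2) & MASK32,           # immediate shifted left 2 bits
    }[alu_src_b]
    return first, second

def next_pc(pc_source, alu_result, alu_out, pc, ir):
    """Return the value offered to the PC for each PCSource setting.

    pc is assumed to already hold PC + 4 (it was incremented during fetch).
    """
    if pc_source == 0b00:
        return alu_result                    # ALU output (PC + 4 during fetch)
    if pc_source == 0b01:
        return alu_out                       # branch target held in ALUOut
    if pc_source == 0b10:
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)  # jump target
    raise ValueError("PCSource = 11 is not defined in the table")
```

For instance, `alu_operands(0, 0b01, pc, a, b, ir)` selects the PC and the constant 4, which is the pair of operands used to increment the PC.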


Instructions from ISA perspective

• Consider each instruction from perspective of ISA.

• Example:

– The add instruction changes a register.

– Register specified by bits 15:11 of instruction.

– Instruction specified by the PC.

– New value is the sum (“op”) of two registers.

– Registers specified by bits 25:21 and 20:16 of the instruction.

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

– In order to accomplish this we must break up the instruction (kind of like introducing variables when programming).


Breaking down an instruction

• ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

• Could break down to:

– IR <= Memory[PC]
– A <= Reg[IR[25:21]]
– B <= Reg[IR[20:16]]
– ALUOut <= A op B
– Reg[IR[15:11]] <= ALUOut

• We forgot an important part of the definition of arithmetic!

– PC <= PC + 4
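The breakdown can be traced in software. The sketch below is an illustration under stated assumptions, not the book's code: memory is a Python dict keyed by byte address, the register file is a list of 32 integers, and the helper names (`bits`, `execute_rtype`) are hypothetical. The destination register is taken from bits 15:11, as in the ISA definition of arithmetic.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def execute_rtype(pc, memory, reg, op):
    """Run one R-type instruction as a sequence of register transfers."""
    ir = memory[pc]                      # IR <= Memory[PC]
    pc = pc + 4                          # PC <= PC + 4
    a = reg[bits(ir, 25, 21)]            # A <= Reg[IR[25:21]]
    b = reg[bits(ir, 20, 16)]            # B <= Reg[IR[20:16]]
    alu_out = op(a, b) & 0xFFFFFFFF      # ALUOut <= A op B
    reg[bits(ir, 15, 11)] = alu_out      # Reg[IR[15:11]] <= ALUOut
    return pc

# add $t5, $t2, $t3  (rs = 10, rt = 11, rd = 13, funct = 0x20)
ADD_T5_T2_T3 = (10 << 21) | (11 << 16) | (13 << 11) | 0x20

memory = {0x400000: ADD_T5_T2_T3}
reg = [0] * 32
reg[10], reg[11] = 5, 7
new_pc = execute_rtype(0x400000, memory, reg, lambda x, y: x + y)
```

Each local variable plays the role of one of the internal registers (IR, A, B, ALUOut), which is the sense in which breaking up the instruction resembles introducing variables.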


Idea behind multicycle approach

• We define each instruction from the ISA perspective (do this!)

• Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)

• Introduce new registers as needed (e.g., A, B, ALUOut, MDR, etc.)

• Finally, try to pack as much work as possible into each step (avoid unnecessary cycles) while also trying to share steps where possible (minimizes control, helps to simplify the solution)

• Result: Our book’s multicycle Implementation!


• Instruction Fetch

• Instruction Decode and Register Fetch

• Execution, Memory Address Computation, or Branch Completion

• Memory Access or R-type instruction completion

• Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Five Execution Steps


• Use PC to get instruction and put it in the Instruction Register.

• Increment the PC by 4 and put the result back in the PC.

• Can be described succinctly using RTL "Register-Transfer Language"

IR <= Memory[PC];
PC <= PC + 4;

Can we figure out the values of the control signals?

What is the advantage of updating the PC now?

Step 1: Instruction Fetch


• Read registers rs and rt in case we need them

• Compute the branch address in case the instruction is a branch

• RTL:

A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);

• We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
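The decode step can be sketched in Python as follows. This is an illustration only: `decode_step` and `sign_extend16` are hypothetical names, the register file is modeled as a list, and `pc` is assumed to hold the already-incremented value (PC + 4) left by the fetch step.

```python
def sign_extend16(value):
    """Sign-extend the low 16 bits of value to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode_step(ir, pc, reg):
    """Step 2: read rs and rt, and compute the branch target in case it is needed."""
    a = reg[(ir >> 21) & 0x1F]                                # A <= Reg[IR[25:21]]
    b = reg[(ir >> 16) & 0x1F]                                # B <= Reg[IR[20:16]]
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF    # ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
    return a, b, alu_out

# beq $t2, $t3, +3 instructions (opcode 4, rs = 10, rt = 11, offset = 3)
BEQ = (4 << 26) | (10 << 21) | (11 << 16) | 3
reg = [0] * 32
reg[10], reg[11] = 1, 2
a, b, target = decode_step(BEQ, 0x400004, reg)
```

Nothing here depends on the opcode, which matches the point that no control lines are set by instruction type in this step; the work is done for every instruction and simply ignored when it turns out not to be needed.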

Step 2: Instruction Decode and Register Fetch


• ALU is performing one of three functions, based on instruction type

• Memory Reference:

ALUOut <= A + sign-extend(IR[15:0]);

• R-type:

ALUOut <= A op B;

• Branch:

if (A==B) PC <= ALUOut;

Step 3 (instruction dependent)


• Loads and stores access memory

MDR <= Memory[ALUOut];

or

Memory[ALUOut] <= B;

• R-type instructions finish

Reg[IR[15:11]] <= ALUOut;

The write actually takes place at the end of the cycle, on the clock edge

Step 4 (R-type or memory-access)


• Reg[IR[20:16]] <= MDR;

Which instruction needs this?

Write-back step


Summary:


• How many cycles will it take to execute this code?

    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label    # assume not taken
    add $t5, $t2, $t3
    sw  $t5, 8($t3)

Label: ...

• What is going on during the 8th cycle of execution?

• In what cycle does the actual addition of $t2 and $t3 take place?
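One way to check answers to these questions is to lay the instructions out cycle by cycle. The sketch below assumes the per-class cycle counts used in this chapter (loads 5, stores 4, ALU instructions 4, branches 3, jumps 3) and that each instruction starts only after the previous one finishes; the names `CYCLES` and `schedule` are hypothetical.

```python
# Clock cycles per instruction in the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3, "j": 3}

program = ["lw", "lw", "beq", "add", "sw"]

def schedule(program):
    """Return one (instruction index, mnemonic, step number) entry per clock cycle."""
    timeline = []
    for index, name in enumerate(program):
        for step in range(1, CYCLES[name] + 1):
            timeline.append((index, name, step))
    return timeline

timeline = schedule(program)
total_cycles = len(timeline)                        # 5 + 5 + 3 + 4 + 4 = 21
eighth_cycle = timeline[7]                          # (1, "lw", 3)
add_alu_cycle = timeline.index((3, "add", 3)) + 1   # 16
```

Under these assumptions the code takes 21 cycles, the 8th cycle is step 3 of the second lw (the memory address computation), and the add performs its ALU operation in cycle 16.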

Simple Questions


Problem: CPI in a multicycle CPU

• Using the SPECINT2000 instruction mix shown in Figure 3.26, what is the CPI, assuming that each state in the multicycle CPU requires 1 clock cycle?

Answer: The mix is 25% loads (1% load byte+24% load word), 10% stores (1% store byte+9% store word), 11% branches (6% beq, 5% bne), 2% jumps (1% jal+1% jr), and 52% ALU (all the rest of the mix, which we assume to be ALU instructions). From Figure 5.30 on page 329, the number of clock cycles for each instruction class is the following:

Loads: 5; Stores: 4; ALU instructions: 4; Branches: 3; Jumps: 3.

The CPI is given by the following:

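$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$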


• The ratio

$$\frac{\text{Instruction count}_i}{\text{Instruction count}}$$

is simply the instruction frequency for the instruction class i. We can therefore substitute to obtain

$$\text{CPI} = 0.25 \times 5 + 0.10 \times 4 + 0.52 \times 4 + 0.11 \times 3 + 0.02 \times 3 = 4.12$$

This CPI is better than the worst-case CPI of 5.0 when all the instructions take the same number of clock cycles.
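The weighted sum is easy to check numerically. The short Python sketch below uses the frequencies and cycle counts stated in the answer; the dictionary name `mix` is just an illustrative choice.

```python
# Instruction class -> (frequency in the mix, clock cycles per instruction)
mix = {
    "loads":    (0.25, 5),
    "stores":   (0.10, 4),
    "alu":      (0.52, 4),
    "branches": (0.11, 3),
    "jumps":    (0.02, 3),
}

# CPI is the frequency-weighted average of the per-class cycle counts.
cpi = sum(freq * cycles for freq, cycles in mix.values())
print(round(cpi, 2))
```

Changing a frequency or a cycle count in `mix` shows how sensitive the overall CPI is to each instruction class.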


• Finite state machines:

– a set of states and

– next state function (determined by current state and the input)

– output function (determined by current state and possibly input)

– We’ll use a Moore machine (output based only on current state)

Review: finite state machines

[Figure: finite state machine block diagram with labels Inputs, Current state, Clock, Next-state function, Next state, Output function, Outputs.]


Review: finite state machines

• Example:

B.37 A friend would like you to build an “electronic eye” for use as a fake security device. The device consists of three lights lined up in a row, controlled by the outputs Left, Middle, and Right, which, if asserted, indicate that a light should be on. Only one light is on at a time, and the light “moves” from left to right and then from right to left, thus scaring away thieves who believe that the device is monitoring their activity. Draw the graphical representation for the finite state machine used to specify the electronic eye. Note that the rate of the eye’s movement will be controlled by the clock speed (which should not be too great) and that there are essentially no inputs.
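One possible answer can be written down as a table-driven Moore machine. The sketch below is only one reasonable reading of the exercise, not the book's solution: it assumes four states (the middle light needs two states so the machine remembers its direction of travel), and the state names and the `run` helper are hypothetical.

```python
# state -> (next state, (Left, Middle, Right) outputs)
# Outputs depend only on the current state, so this is a Moore machine.
EYE = {
    "left":      ("mid_right", (1, 0, 0)),
    "mid_right": ("right",     (0, 1, 0)),
    "right":     ("mid_left",  (0, 0, 1)),
    "mid_left":  ("left",      (0, 1, 0)),
}

def run(start, ticks):
    """Return the light outputs for the given number of clock ticks."""
    state = start
    outputs = []
    for _ in range(ticks):
        next_state, lights = EYE[state]
        outputs.append(lights)
        state = next_state      # state register updates on each clock tick
    return outputs

pattern = run("left", 6)
```

There are no inputs, so the next-state function depends only on the current state, and two state bits are enough to encode the four states.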


• Value of control signals is dependent upon:

– what instruction is being executed

– which step is being performed

• Use the information we’ve accumulated to specify a finite state machine

– specify the finite state machine graphically, or

– use microprogramming

• Implementation can be derived from specification

Implementing the Control


Figure 5.31 The high-level view of the finite state machine control.


Figure 5.32 The instruction fetch and decode portion of every instruction is identical.


Figure 5.33 The finite state machine for controlling memory-reference instructions has four states.


Figure 5.34 R-type instructions can be implemented with a simple two-state finite state machine.


Figure 5.35 The branch instruction requires a single state.


Figure 5.36 The jump instruction requires a single state that asserts two control signals to write the PC with the lower 26 bits of the instruction register shifted left 2 bits and concatenated to the upper 4 bits of the PC of this instruction.


• Implementation:

Finite State Machine for Control

[Figure: control logic block. Inputs: Op5 to Op0 (instruction register opcode field) and S3 to S0 (state register). Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next-state bits NS3 to NS0.]


• Note:

– don’t care if not mentioned
– asserted if name only
– otherwise exact value

• How many state bits will we need?

Graphical Specification of FSM


PLA Implementation (Section C.3 & Appendix B)

• If I picked a horizontal or vertical line could you explain it?

[Figure: PLA with inputs Op5 to Op0 and S3 to S0, and outputs PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource1, PCSource0, ALUOp1, ALUOp0, ALUSrcB1, ALUSrcB0, ALUSrcA, RegWrite, RegDst, NS3, NS2, NS1, NS0.]


• ROM = "Read Only Memory"

– values of memory locations are fixed ahead of time

• A ROM can be used to implement a truth table

– if the address is m bits, we can address 2^m entries in the ROM.
– our outputs are the bits of data that the address points to.

m is the "height", and n is the "width"

ROM Implementation (Section C.3 & Appendix B)

Address (m = 3 bits) | Data (n = 4 bits)
000 | 0011
001 | 1100
010 | 1100
011 | 1000
100 | 0000
101 | 0001
110 | 0110
111 | 0111
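A ROM used this way behaves like an indexed lookup. As a small illustration, the truth table above can be modeled in Python as a list with 2^m entries of n bits each; the names `ROM` and `rom_read` are hypothetical.

```python
# Eight entries (m = 3 address bits), each 4 bits wide (n = 4).
ROM = [0b0011, 0b1100, 0b1100, 0b1000,
       0b0000, 0b0001, 0b0110, 0b0111]

def rom_read(address):
    """Return the n-bit word stored at the given m-bit address."""
    return ROM[address]

word = rom_read(0b110)   # the entry for address 110
```

Every address has an entry whether or not its output is interesting, which is why a ROM grows as 2^m regardless of how regular the truth table is.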


• How many inputs are there?
6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)

• How many outputs are there?
16 datapath-control outputs, 4 state bits = 20 outputs

• ROM is 2^10 x 20 = 20K bits (and a rather unusual size)

• Rather wasteful, since for lots of the entries, the outputs are the same

— i.e., opcode is often ignored

ROM Implementation


• Break up the table into two parts

– 4 state bits tell you the 16 outputs, 2^4 x 16 bits of ROM

– 10 bits tell you the 4 next state bits, 2^10 x 4 bits of ROM

– Total: 4.3K bits of ROM

• PLA is much smaller

— can share product terms

— only need entries that produce an active output

— can take into account don't cares

• Size is (#inputs x #product-terms) + (#outputs x #product-terms)

For this example = (10x17)+(20x17) = 510 PLA cells

• A PLA cell is usually about the size of a ROM cell (slightly bigger)
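The size comparison can be reproduced with a few lines of arithmetic. This Python sketch uses only the counts given in these slides (6 opcode bits, 4 state bits, 16 datapath-control outputs, 17 product terms); the variable names are illustrative.

```python
opcode_bits, state_bits, control_outputs = 6, 4, 16
product_terms = 17

inputs = opcode_bits + state_bits            # 10 address lines
outputs = control_outputs + state_bits       # 20 outputs

# One ROM indexed by opcode and state.
single_rom_bits = 2 ** inputs * outputs      # 1024 x 20

# Two ROMs: control outputs from the state alone, next state from all inputs.
split_rom_bits = 2 ** state_bits * control_outputs + 2 ** inputs * state_bits

# PLA: an AND plane (inputs x terms) plus an OR plane (outputs x terms).
pla_cells = inputs * product_terms + outputs * product_terms

print(single_rom_bits, split_rom_bits, pla_cells)
```

The split ROM works because the datapath control outputs depend only on the state (a Moore machine), so only the next-state function needs the opcode bits.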

ROM vs PLA


• Complex instructions: the "next state" is often current state + 1

Another Implementation Style

[Figure: control unit built from a PLA or ROM, a State register, an Adder that adds 1 to the state, and Address select logic driven by AddrCtl and by Op[5–0] from the instruction register opcode field. Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]


Details

Dispatch ROM 1

Op | Opcode name | Value
000000 | R-format | 0110
000010 | jmp | 1001
000100 | beq | 1000
100011 | lw | 0010
101011 | sw | 0010

Dispatch ROM 2

Op | Opcode name | Value
100011 | lw | 0011
101011 | sw | 0101

State number | Address-control action | Value of AddrCtl
0 | Use incremented state | 3
1 | Use dispatch ROM 1 | 1
2 | Use dispatch ROM 2 | 2
3 | Use incremented state | 3
4 | Replace state number by 0 | 0
5 | Replace state number by 0 | 0
6 | Use incremented state | 3
7 | Replace state number by 0 | 0
8 | Replace state number by 0 | 0
9 | Replace state number by 0 | 0

[Figure: address select logic. A 4-input Mux controlled by AddrCtl chooses among 0, Dispatch ROM 1, Dispatch ROM 2, and the output of the Adder (State + 1); the dispatch ROMs are indexed by the instruction register opcode field, and the selected value is the next State fed to the PLA or ROM.]
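The dispatch tables and the AddrCtl values are enough to trace the state sequence of each instruction. The Python sketch below transcribes those tables; the function names `next_state` and `states_for` are hypothetical, and the model covers only the five opcodes listed.

```python
# Opcode -> next state, transcribed from the two dispatch ROMs.
DISPATCH_ROM_1 = {0b000000: 6, 0b000010: 9, 0b000100: 8, 0b100011: 2, 0b101011: 2}
DISPATCH_ROM_2 = {0b100011: 3, 0b101011: 5}

# AddrCtl value for states 0 through 9.
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]

def next_state(state, opcode):
    """Address select logic: choose the next state from AddrCtl."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                          # replace state number by 0
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]     # dispatch on opcode, ROM 1
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]     # dispatch on opcode, ROM 2
    return state + 1                      # use incremented state

def states_for(opcode):
    """Return the states visited by one instruction, starting at state 0."""
    path = [0]
    while True:
        following = next_state(path[-1], opcode)
        if following == 0:
            return path
        path.append(following)

lw_path = states_for(0b100011)   # [0, 1, 2, 3, 4]
```

The length of each path equals the cycle count of the instruction class: five states for lw, four for sw and R-format, and three for beq and jmp.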


5.6 Exceptions


Keywords

• Exception Also called interrupt. An unscheduled event that disrupts program execution; used to detect overflow.

• Interrupt An exception that comes from outside of the processor. (Some architectures use the term interrupt for all exceptions.)

Type of event | From where? | MIPS terminology
I/O device request | External | Interrupt
Invoke the operating system from user program | Internal | Exception
Arithmetic overflow | Internal | Exception
Using an undefined instruction | Internal | Exception
Hardware malfunctions | External | Exception or interrupt


Keywords

• Vectored interrupt An interrupt for which the address to which control is transferred is determined by the cause of the exception.

Exception type | Exception vector address (in hex)
Undefined instruction | C000 0000 hex
Arithmetic overflow | C000 0020 hex
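In software terms, a vectored interrupt is a table lookup keyed by the cause. The sketch below illustrates only that lookup, using the two vector addresses listed for this chapter (C000 0000 hex and C000 0020 hex); the names `EXCEPTION_VECTORS` and `exception_target` are hypothetical, and saving the interrupted PC is deliberately left out.

```python
# Cause of the exception -> address to which control is transferred.
EXCEPTION_VECTORS = {
    "undefined instruction": 0xC0000000,
    "arithmetic overflow":   0xC0000020,
}

def exception_target(cause):
    """Return the handler address selected by the cause of the exception."""
    return EXCEPTION_VECTORS[cause]

handler = exception_target("arithmetic overflow")
```

Because the cause selects the address, each handler can start work immediately without first decoding a status register.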


How Control Checks for Exceptions

• Undefined instruction

• Arithmetic overflow


Figure 5.39 The multicycle datapath with the addition needed to implement exceptions.


Figure 5.40 This shows the finite state machine with the additions to handle exception detection.


5.9 Real Stuff: The Organization of Recent Pentium


Keywords

• Microprogrammed control A method of specifying control that uses microcode rather than a finite state representation.

• Hardwired control An implementation of finite state machine control typically using programmable logic arrays (PLAs) or collections of PLAs and random logic.

• Microcode The set of microinstructions that control a processor.

• Superscalar An advanced pipelining technique that enables the processor to execute more than one instruction per clock cycle.

• Microinstruction A representation of control using low-level instructions, each of which asserts a set of control signals that are active on a given clock cycle as well as specifying what microinstruction to execute next.


Keywords

• Micro-operations The RISC-like instructions directly executed by the hardware in recent Pentium implementations.

• Trace cache An instruction cache that holds a sequence of instructions with a given starting address; in recent Pentium implementations the trace cache holds microoperations rather than IA-32 instructions.

• Dispatch An operation in a microprogrammed control unit in which the next microinstruction is selected on the basis of one or more fields of a macroinstruction, usually by creating a table containing the addresses of the target microinstructions and indexing the table using a field of the macroinstruction. The dispatch tables are typically implemented in ROM or programmable logic array (PLA). The term dispatch is also used in dynamically scheduled processors to refer to the process of sending an instruction to a queue.


Microprogramming

• What are the “microinstructions” ?

[Figure: microprogrammed control unit. A Microprogram counter addresses the Microcode memory; an Adder adds 1 to the counter; Address select logic is driven by AddrCtl and by the instruction register opcode field. Outputs to the Datapath: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, BWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst.]


• A specification methodology

– appropriate if hundreds of opcodes, modes, cycles, etc.
– signals specified symbolically using microinstructions

• Will two implementations of the same architecture have the same microcode?

• What would a microassembler do?

Microprogramming

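Label | ALU control | SRC1 | SRC2 | Register control | Memory | PCWrite control | Sequencing
Fetch | Add | PC | 4 | | Read PC | ALU | Seq
 | Add | PC | Extshft | Read | | | Dispatch 1
Mem1 | Add | A | Extend | | | | Dispatch 2
LW2 | | | | | Read ALU | | Seq
 | | | | Write MDR | | | Fetch
SW2 | | | | | Write ALU | | Fetch
Rformat1 | Func code | A | B | | | | Seq
 | | | | Write ALU | | | Fetch
BEQ1 | Subt | A | B | | | ALUOut-cond | Fetch
JUMP1 | | | | | | Jump address | Fetch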


Microinstruction format

Field name | Value | Signals active | Comment
ALU control | Add | ALUOp = 00 | Cause the ALU to add.
ALU control | Subt | ALUOp = 01 | Cause the ALU to subtract; this implements the compare for branches.
ALU control | Func code | ALUOp = 10 | Use the instruction's function code to determine ALU control.
SRC1 | PC | ALUSrcA = 0 | Use the PC as the first ALU input.
SRC1 | A | ALUSrcA = 1 | Register A is the first ALU input.
SRC2 | B | ALUSrcB = 00 | Register B is the second ALU input.
SRC2 | 4 | ALUSrcB = 01 | Use 4 as the second ALU input.
SRC2 | Extend | ALUSrcB = 10 | Use output of the sign extension unit as the second ALU input.
SRC2 | Extshft | ALUSrcB = 11 | Use the output of the shift-by-two unit as the second ALU input.
Register control | Read | | Read two registers using the rs and rt fields of the IR as the register numbers and putting the data into registers A and B.
Register control | Write ALU | RegWrite, RegDst = 1, MemtoReg = 0 | Write a register using the rd field of the IR as the register number and the contents of the ALUOut as the data.
Register control | Write MDR | RegWrite, RegDst = 0, MemtoReg = 1 | Write a register using the rt field of the IR as the register number and the contents of the MDR as the data.
Memory | Read PC | MemRead, IorD = 0 | Read memory using the PC as address; write result into IR (and the MDR).
Memory | Read ALU | MemRead, IorD = 1 | Read memory using the ALUOut as address; write result into MDR.
Memory | Write ALU | MemWrite, IorD = 1 | Write memory using the ALUOut as address, contents of B as the data.
PC write control | ALU | PCSource = 00, PCWrite | Write the output of the ALU into the PC.
PC write control | ALUOut-cond | PCSource = 01, PCWriteCond | If the Zero output of the ALU is active, write the PC with the contents of the register ALUOut.
PC write control | Jump address | PCSource = 10, PCWrite | Write the PC with the jump address from the instruction.
Sequencing | Seq | AddrCtl = 11 | Choose the next microinstruction sequentially.
Sequencing | Fetch | AddrCtl = 00 | Go to the first microinstruction to begin a new instruction.
Sequencing | Dispatch 1 | AddrCtl = 01 | Dispatch using the ROM 1.
Sequencing | Dispatch 2 | AddrCtl = 10 | Dispatch using the ROM 2.
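A microassembler essentially performs the translation described by this format table: it turns symbolic field values into control-signal settings. The sketch below is a minimal illustration of that idea, not a real tool. The names `FIELD_SIGNALS`, `assemble`, and `FETCH` are hypothetical, and only the signals listed in the table are modeled.

```python
# (field name, symbolic value) -> control signals, transcribed from the table.
FIELD_SIGNALS = {
    ("ALU control", "Add"): {"ALUOp": 0b00},
    ("ALU control", "Subt"): {"ALUOp": 0b01},
    ("ALU control", "Func code"): {"ALUOp": 0b10},
    ("SRC1", "PC"): {"ALUSrcA": 0},
    ("SRC1", "A"): {"ALUSrcA": 1},
    ("SRC2", "B"): {"ALUSrcB": 0b00},
    ("SRC2", "4"): {"ALUSrcB": 0b01},
    ("SRC2", "Extend"): {"ALUSrcB": 0b10},
    ("SRC2", "Extshft"): {"ALUSrcB": 0b11},
    ("Register control", "Read"): {},
    ("Register control", "Write ALU"): {"RegWrite": 1, "RegDst": 1, "MemtoReg": 0},
    ("Register control", "Write MDR"): {"RegWrite": 1, "RegDst": 0, "MemtoReg": 1},
    ("Memory", "Read PC"): {"MemRead": 1, "IorD": 0},
    ("Memory", "Read ALU"): {"MemRead": 1, "IorD": 1},
    ("Memory", "Write ALU"): {"MemWrite": 1, "IorD": 1},
    ("PC write control", "ALU"): {"PCSource": 0b00, "PCWrite": 1},
    ("PC write control", "ALUOut-cond"): {"PCSource": 0b01, "PCWriteCond": 1},
    ("PC write control", "Jump address"): {"PCSource": 0b10, "PCWrite": 1},
    ("Sequencing", "Seq"): {"AddrCtl": 0b11},
    ("Sequencing", "Fetch"): {"AddrCtl": 0b00},
    ("Sequencing", "Dispatch 1"): {"AddrCtl": 0b01},
    ("Sequencing", "Dispatch 2"): {"AddrCtl": 0b10},
}

def assemble(microinstruction):
    """Translate one symbolic microinstruction into its control signals."""
    signals = {}
    for field, value in microinstruction.items():
        signals.update(FIELD_SIGNALS[(field, value)])
    return signals

# The first microinstruction of the microprogram (label Fetch).
FETCH = {
    "ALU control": "Add", "SRC1": "PC", "SRC2": "4",
    "Memory": "Read PC", "PC write control": "ALU", "Sequencing": "Seq",
}
fetch_signals = assemble(FETCH)
```

Fields left blank in a microinstruction contribute no signals, which corresponds to leaving those control lines deasserted or as don't cares.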


• No encoding:

– 1 bit for each datapath operation

– faster, requires more memory (logic)

– used for Vax 780 — an astonishing 400K of memory!

• Lots of encoding:

– send the microinstructions through logic to get control signals

– uses less memory, slower

• Historical context of CISC:

– Too much logic to put on a single chip with everything else

– Use a ROM (or even RAM) to hold the microcode

– It’s easy to add new instructions

Maximally vs. Minimally Encoded


Microcode: Trade-offs

• Distinction between specification and implementation is sometimes blurred

• Specification Advantages:

– Easy to design and write

– Design architecture and microcode in parallel

• Implementation (off-chip ROM) Advantages

– Easy to change since values are in memory

– Can emulate other architectures

– Can make use of internal registers

• Implementation Disadvantages, SLOWER now that:

– Control is implemented on same chip as processor

– ROM is no longer faster than RAM

– No need to go back and make changes


5.10 Fallacies and Pitfalls


• Pitfall: Adding a complex instruction implemented with microprogramming may not be faster than a sequence using simpler instructions.

• Fallacy: If there is space in the control store, new instructions are free of cost.


5.11 Concluding Remarks


Figure 5.41 Alternative methods for specifying and implementing control.


5.12 Historical Perspective and Further Reading


Historical Perspective

• In the ‘60s and ‘70s microprogramming was very important for implementing machines

• This led to more sophisticated ISAs and the VAX

• In the ‘80s RISC processors based on pipelining became popular

• Pipelining the microinstructions is also possible!

• Implementations of IA-32 architecture processors since 486 use:

– “hardwired control” for simpler instructions (few cycles, FSM control implemented using PLA or random logic)

– “microcoded control” for more complex instructions (large numbers of cycles, central control store)

• The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store


Pentium 4

• Pipelining is important (last IA-32 without it was 80386 in 1985)

• Pipelining is used for the simple instructions favored by compilers

“Simply put, a high performance implementation needs to ensure that the simple instructions execute quickly, and that the burden of the complexities of the instruction set penalize the complex, less frequently used, instructions”

[Figure: Pentium 4 with labeled regions: Control (four regions), Enhanced floating point and multimedia, I/O interface, Instruction cache, Integer datapath, Data cache, Secondary cache and memory interface, Advanced pipelining and hyperthreading support. Annotations point to Chapter 6 and Chapter 7.]


Pentium 4

• Somewhere in all that “control” we must handle complex instructions

• Processor executes simple microinstructions, 70 bits wide (hardwired)

• 120 control lines for integer datapath (400 for floating point)

• If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)

• It’s complicated!

[Figure: the same labeled Pentium 4 figure as on the previous slide.]


Chapter 5 Summary

• If we understand the instructions…

We can build a simple processor!

• If instructions take different amounts of time, multi-cycle is better

• Datapath implemented using:

– Combinational logic for arithmetic

– State holding elements to remember bits

• Control implemented using:

– Combinational logic for single-cycle implementation

– Finite state machine for multi-cycle implementation