
EEC 581 Computer Architecture

Datapath

Department of Electrical Engineering and Computer Science
Cleveland State University
9/4/2018

Architects write the checks that the design engineers have to cash. If the amount is too high, the whole project goes bankrupt.

Design engineers must constantly juggle many conflicting demands: schedule, performance, power dissipation, features, testing, documentation, training and hiring.

The Pentium Chronicles, Colwell, pg. 64 & 63


Review: MIPS Organization

[Figure: MIPS organization. Processor: register file of 32 registers ($zero - $ra), 32 bits each, with 5-bit src1, src2, and dst addresses, src1 data, src2 data, and write data; PC; ALU; one adder computing PC = PC+4 and one adding the branch offset. Memory: 2^30 words of 32 bits with read/write address, read data, and write data; word addresses 0…0000, 0…0100, 0…1000, 0…1100, …, 1…1100 (binary); byte addresses in big-endian order. Instruction cycle: Fetch, PC = PC+4, Decode, Exec.]

The Processor: Datapath and Control

We're ready to look at an implementation of the MIPS instruction set, simplified to contain only:
  memory-reference instructions: lw, sw
  arithmetic-logical instructions: add, sub, and, or, slt
  control-flow instructions: beq, j

Generic implementation:
  use the program counter (PC) to supply the instruction address
  get the instruction from memory
  read registers
  use the instruction to decide exactly what to do

[Figure: instruction cycle: Fetch, PC = PC+4, Decode, Exec.]


Ada Lovelace, 1815-1852

Wrote what is often called the first computer program: an algorithm for calculating the Bernoulli numbers on Charles Babbage's Analytical Engine.
She was the only legitimate child of the poet Lord Byron.

Charles Babbage's Analytical Engine, 1871.
It was designed as the first fully automatic calculating machine, although it was never completed.

More Implementation Details

Abstract / simplified view:
Two types of functional units:
  elements that operate on data values (combinational)
  elements that contain state (sequential)

[Figure: abstract view of the datapath: PC, instruction memory (Address, Instruction), registers (Register #, Data), ALU, and data memory (Address, Data).]


State Elements

Unclocked vs. clocked
Clocks are used in synchronous logic
  when should an element that contains state be updated?

[Figure: clock waveform marking the cycle time, the rising edge, and the falling edge.]

Combinational Logic Review

Combinational logic circuits are memoryless
  there is no feedback in combinational logic circuits
The output is the function implemented by the logic network, once the switching transients have settled
  outputs can make multiple logical transitions before settling to the correct value

[Figure: Input → Combinational Circuit → Output]


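Sequential Logic Circuits

Sequential circuits consist of
  combinational logic circuits
  state information (stored in memory)
The output is a function of the inputs and the present state
Sequential circuits can be synchronous or asynchronous

[Figure: the inputs and the present state feed the combinational circuits, which produce the outputs and the next state; a storage element (delay) turns the next state into the present state. The storage element is controlled by a periodic clock or an event trigger.]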

The Set-Reset Latch

The output depends on the present inputs and also on past inputs.
It consists of two cross-coupled NOR gates, with two inputs, S and R, and two outputs, Q and QN.
It is similar to a pair of cross-coupled inverters, but its state can be controlled by S and R, which set and reset the output Q.
It is an unclocked state element.

[Figure: SR latch built from two cross-coupled NOR gates, with inputs S and R and outputs Q and QN.]

  S  R  |  Q  QN  |
  0  0  |  Q  QN  |  No change
  0  1  |  0  1   |  Reset
  1  0  |  1  0   |  Set
  1  1  |  0  0   |  Undefined
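A behavioral sketch can make the NOR-gate feedback concrete. The Python below is an illustration, not part of the original slides: it assumes the wiring Q = NOR(R, QN) and QN = NOR(S, Q) and re-evaluates the two gates until their outputs stop changing. It evaluates the gates one after the other rather than modeling real gate delays, so it cannot show the races that make the S = R = 1 row troublesome in hardware.

```python
from typing import Tuple


def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(s: int, r: int, q: int, qn: int) -> Tuple[int, int]:
    """Apply inputs S and R to a NOR latch currently holding (Q, QN).

    Re-evaluates Q = NOR(R, QN) and QN = NOR(S, Q) until the outputs settle.
    """
    for _ in range(4):  # a few passes are enough for 0/1 inputs to settle
        new_q = nor(r, qn)
        new_qn = nor(s, new_q)
        if (new_q, new_qn) == (q, qn):
            break
        q, qn = new_q, new_qn
    return q, qn


state = (0, 1)                   # start in the reset state: Q = 0, QN = 1
state = sr_latch(1, 0, *state)   # set
print(state)                     # (1, 0)
state = sr_latch(0, 0, *state)   # no change: the latch remembers the set
print(state)                     # (1, 0)
state = sr_latch(0, 1, *state)   # reset
print(state)                     # (0, 1)
print(sr_latch(1, 1, *state))    # (0, 0): the "undefined" row
```

The S = R = 0 call is where the memory shows up: the same inputs give (1, 0) or (0, 1) depending on the stored state passed in.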


Latches and Flip-flops

The output is equal to the value stored inside the element (we don't need to ask for permission to look at the value).
A change of state (value) is based on the clock:
  latches: the state changes whenever the inputs change and the clock is asserted (level-sensitive latch)
  flip-flops: the state changes only on a clock edge (edge-triggered methodology)
"Asserted" means "logically true", which could mean electrically low.
A clocking methodology defines when signals can be read and written; we wouldn't want to read a signal at the same time it was being written.

Master-slave and edge-triggered flip-flops: "flip-flop" refers to a bistable element, and edge-triggered registers are also called flip-flops.

D-latch

Two inputs:
  the data value to be stored (D)
  the clock signal (C), indicating when to read and store D
Two outputs:
  the value of the internal state (Q) and its complement (_Q)

[Figure: D-latch with inputs D and C and outputs Q and _Q, with a timing diagram of D, C, and Q.]

The latch is transparent while the clock is high: it copies the input to the output.
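To see transparency in a few lines, here is a behavioral Python sketch of a level-sensitive D latch. It is not a gate-level model, and the class and method names are illustrative. While C = 1 the output follows D; while C = 0 it holds the last value it saw.

```python
class DLatch:
    """Behavioral level-sensitive D latch (a sketch, not a gate-level model)."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # stored state; the complement _Q is 1 - q

    def step(self, d: int, c: int) -> int:
        """Apply inputs D and C and return Q."""
        if c:           # clock high: the latch is transparent, Q follows D
            self.q = d
        return self.q   # clock low: Q holds the last value seen while C was high


latch = DLatch()
trace = [latch.step(d, c) for d, c in [(1, 1), (0, 1), (1, 1), (0, 0), (1, 0)]]
print(trace)  # [1, 0, 1, 1, 1]
```

The first three steps show Q copying every change of D while C is high; the last two show D being ignored once C is low.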


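10T D Latch with Transmission Gates

[Figure: D latch built from transmission gates controlled by C and its complement, followed by two inverters driving Q.]

The circuit consists of a transmission-gate-based multiplexer and a non-inverting buffer (built as a cascade of two inverters).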

Transmission Gates

Pass transistors produce degraded outputs.
Transmission gates pass both 0 and 1 well.

[Figure: CMOS transmission gate used as a switch between terminals a and b, controlled by g and its complement gb. With g = 0, gb = 1 the switch is open; with g = 1, gb = 0 it is closed, and an input 0 passes as a strong 0 and an input 1 as a strong 1.]


10T D Latch with Transmission Gates: Writing Data (C = 1)

[Figure: with C = 1, the input transmission gate is on and D drives the two inverters.]

When the clock input is high, the current value of the data input (D) propagates through the left transmission gate and through the inverters to the output Q.

10T D Latch with Transmission Gates: Holding Data (C = 0)

[Figure: with C = 0, the input transmission gate is off, so a new input D_new is blocked, and the feedback path closes around the inverters.]

The output value Q of the latch is fed back into the input of the first-stage inverter. Therefore, the latch stores whatever value it held when the clock signal changed to low.


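Problem of Transparency

[Figure: a transparent D-latch (enable En = C) whose output Q is fed back through inverting logic to its D input; while C = 1 the loop has no stable value, so Q oscillates and is unstable.]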

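D Flip-flop

The output changes only on the clock edge.

[Figure: master-slave D flip-flop built from two D latches in series, clocked on opposite phases of C, with a timing diagram of D, C, and Q.]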
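The master-slave construction can be sketched by chaining two copies of a behavioral D latch. This Python model is illustrative only. It assumes the master is transparent while C = 0 and the slave while C = 1, which makes Q change at the rising edge; the flip-flop in the figure may be triggered on the other edge. Running the same inputs through a lone latch shows the transparency problem side by side.

```python
from typing import List, Tuple


class DLatch:
    """Level-sensitive D latch: transparent while its enable input is 1."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, d: int, enable: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveDFF:
    """D flip-flop from two D latches enabled on opposite clock phases.

    Assumption: master transparent while C = 0, slave while C = 1, so Q can
    change only when C rises.
    """

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def step(self, d: int, c: int) -> int:
        m = self.master.step(d, 1 - c)   # master tracks D while the clock is low
        return self.slave.step(m, c)     # slave copies the master while the clock is high


inputs: List[Tuple[int, int]] = [(1, 0), (1, 1), (0, 1), (0, 0), (1, 0), (0, 0), (0, 1)]
latch, ff = DLatch(), MasterSlaveDFF()
latch_q = [latch.step(d, c) for d, c in inputs]
ff_q = [ff.step(d, c) for d, c in inputs]
print(latch_q)   # [0, 1, 0, 0, 0, 0, 0]  follows D while C = 1
print(ff_q)      # [0, 1, 1, 1, 1, 1, 0]  changes only at the two rising edges
```

At the third step D falls while C is still high: the latch output falls with it, but the flip-flop keeps the value captured at the edge.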


Our Implementation

An edge-triggered methodology
Typical execution:
  read the contents of some state elements,
  send the values through some combinational logic,
  write the results to one or more state elements

[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds state element 2.]
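The read-compute-write pattern can be mimicked in software by computing every next-state value from a snapshot of the present state and then committing all of them at once, as if at a single clock edge. In this Python sketch, `clock_edge` and `swap_and_count` are hypothetical names, and r1, r2 and pc are just example state elements.

```python
from typing import Callable, Dict

State = Dict[str, int]


def clock_edge(state: State, logic: Callable[[State], State]) -> State:
    """Advance one cycle: read present state, compute, then write all results at once."""
    next_values = logic(dict(state))   # combinational logic sees a snapshot of the present state
    updated = dict(state)
    updated.update(next_values)        # every state element is written at the same edge
    return updated


def swap_and_count(s: State) -> State:
    """Hypothetical combinational logic: swap r1 and r2, and advance pc by 4."""
    return {"r1": s["r2"], "r2": s["r1"], "pc": s["pc"] + 4}


state = {"r1": 7, "r2": 9, "pc": 0}
state = clock_edge(state, swap_and_count)
print(state)   # {'r1': 9, 'r2': 7, 'pc': 4}
```

Because every write happens at the edge, the swap works even though each register is both read and written in the same cycle; updating r1 in place before reading it for r2 would leave both registers holding 9.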

Register File

Built using D flip-flops.

[Figure: read ports. Registers 0 through n feed two multiplexors; Read register number 1 selects Read data 1 and Read register number 2 selects Read data 2. Register file symbol with inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2.]

* Register files can also be implemented as fast static RAMs (SRAMs) with multiple ports.


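Register File

Note: we still use the real clock to determine when to write.

[Figure: write port. A decoder driven by the register number asserts one of its outputs 0, 1, …, n – 1, n; each output is combined with Write to form the clock input C of the corresponding register (Register 0 through Register n), and Register data drives the D input of every register.]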
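A behavioral sketch of the two register file slides: reads are combinational lookups, and a write happens only at a clock edge when Write is asserted, mirroring the decoder plus Write gating. The Python class and its method names are hypothetical. The default of 32 registers and the rule that register 0 ($zero) always reads as zero come from the MIPS architecture rather than from these slides.

```python
from typing import List, Tuple


class RegisterFile:
    """Behavioral register file: two combinational read ports, one clocked write port."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs: List[int] = [0] * num_regs

    def read(self, read_reg1: int, read_reg2: int) -> Tuple[int, int]:
        """Read data 1 and Read data 2 (the two multiplexors select by register number)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write: int, write_reg: int, write_data: int) -> None:
        """At the clock edge, update only the decoded register, and only if Write is 1."""
        if write and write_reg != 0:   # MIPS assumption: register 0 ($zero) stays 0
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write=1, write_reg=8, write_data=42)
rf.clock_edge(write=0, write_reg=9, write_data=99)   # Write deasserted: no change
rf.clock_edge(write=1, write_reg=0, write_data=5)    # writes to $zero are discarded
print(rf.read(8, 9))   # (42, 0)
print(rf.read(0, 8))   # (0, 42)
```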

Simple Implementation

Include the functional units we need for each instruction.

[Figure: functional units. (a) Instruction memory: Instruction address in, Instruction out. (b) Program counter (PC). (c) Adder: Add, Sum. Sign-extension unit: 16 bits in, 32 bits out. Data memory unit: Address, Write data, Read data, with MemRead and MemWrite. Registers: 5-bit Read register 1, Read register 2, and Write register numbers, Write data, Read data 1, Read data 2, with RegWrite. ALU: 3-bit ALU control, ALU result, and Zero.]


Fetching Instructions: Memory

Fetching instructions involves
  reading the instruction from the Instruction Memory
  updating the PC to hold the address of the next instruction

[Figure: the PC supplies the Read Address of the Instruction Memory, which outputs the Instruction; an adder computes PC + 4 for the next PC.]
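As a minimal Python sketch of fetch (the function name, the list-based instruction memory, and the sample program are assumptions for illustration), the instruction memory is read at the PC while the dedicated adder produces PC + 4. Because MIPS addresses bytes but each instruction occupies one 4-byte word, the byte address is divided by 4 to index the word list.

```python
from typing import List, Tuple


def fetch(pc: int, imem: List[int]) -> Tuple[int, int]:
    """Return (instruction, next_pc) for a word-organized instruction memory."""
    instruction = imem[pc >> 2]        # byte address -> word index (PC is a multiple of 4)
    next_pc = (pc + 4) & 0xFFFFFFFF    # the adder computes PC + 4 (32-bit wraparound)
    return instruction, next_pc


# Hypothetical program: add $8, $17, $18 followed by lw $1, 100($2)
imem = [0x02324020, 0x8C410064]
pc = 0
inst, pc = fetch(pc, imem)
print(hex(inst), pc)   # 0x2324020 4
inst, pc = fetch(pc, imem)
print(hex(inst), pc)   # 0x8c410064 8
```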

Decoding Instructions: Register

Decoding instructions involves
  sending the fetched instruction's opcode and function field bits to the control unit
  reading two values from the Register File (the Register File addresses are contained in the instruction)

[Figure: the Instruction drives the Control Unit and the Register File's Read Addr 1, Read Addr 2, and Write Addr inputs; the Register File also has a Write Data input and outputs Read Data 1 and Read Data 2.]


Executing R Format Operations: ALU

R format operations (add, sub, slt, and, or)
  perform the operation specified by op and funct on the values in registers rs and rt
  store the result back into the Register File (into location rd)

[Figure: Read Data 1 and Read Data 2 from the Register File feed the ALU, which is driven by ALU control and outputs the result (to Write Data), overflow, and zero; RegWrite enables the Register File write.]

R-type format:
  op     rs     rt     rd     shamt  funct
  31-26  25-21  20-16  15-11  10-6   5-0

The Register File is not written every cycle (e.g., by sw), so we need an explicit write control signal for the Register File.

Executing Load and Store Operations

Load and store operations involve
  computing the memory address by adding the base register (read from the Register File during decode) to the 16-bit sign-extended offset field in the instruction
  store: the value (read from the Register File during decode) is written to the Data Memory
  load: the value read from the Data Memory is written to the Register File

[Figure: the ALU adds Read Data 1 and the offset, sign-extended from 16 to 32 bits, to form the Data Memory Address; Read Data 2 feeds the Data Memory's Write Data; the Data Memory's Read Data returns to the Register File's Write Data. Control signals: ALU control, RegWrite, MemWrite, MemRead.]
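The address arithmetic is easy to check numerically. In this Python sketch (the function names are illustrative), the Sign Extend unit copies bit 15 of the offset into the upper 16 bits, and the ALU's add wraps modulo 2^32. The base register value 0x1000 is a made-up example.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field: int) -> int:
    """Sign Extend unit: copy bit 15 of a 16-bit field into bits 31-16."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field


def effective_address(base: int, offset16: int) -> int:
    """ALU add: base register + sign-extended offset, wrapping modulo 2**32."""
    return (base + sign_extend16(offset16)) & MASK32


print(hex(effective_address(0x1000, 100)))      # 0x1064  lw $1, 100($2) with $2 = 0x1000
print(hex(effective_address(0x1000, 0xFFFC)))   # 0xffc   offset -4 is encoded as 0xFFFC
```

Without sign extension the second case would add 0xFFFC (65532) instead of subtracting 4, which is why the offset passes through the Sign Extend unit before reaching the ALU.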


Executing Branch Operations

Branch operations involve
  comparing the operands read from the Register File during decode for equality (the ALU subtracts them, and its zero output signals equality)
  computing the branch target address by adding the updated PC (PC + 4) to the 16-bit sign-extended offset field in the instruction, shifted left by 2 bits

[Figure: Read Data 1 and Read Data 2 feed the ALU, whose zero output goes to the branch control logic; Instruction[15–0] is sign-extended from 16 to 32 bits, shifted left 2, and added to PC + 4 to form the branch target address.]
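The two branch computations can be written out directly. In this Python sketch (names are illustrative), equality is tested the way the datapath does it, by checking whether rs - rt is zero, and the target is PC + 4 plus the sign-extended offset multiplied by 4 (the shift left 2), since the offset counts words rather than bytes.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(field: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31-16."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field


def beq_next_pc(pc: int, rs_val: int, rt_val: int, offset16: int) -> int:
    """Next PC for beq: the branch target if rs == rt, otherwise PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32   # offset counts words
    zero = ((rs_val - rt_val) & MASK32) == 0                          # ALU subtract, Zero output
    return target if zero else pc_plus_4


print(hex(beq_next_pc(0x40, 5, 5, 3)))        # 0x50  taken: 0x44 + 3*4
print(hex(beq_next_pc(0x40, 5, 6, 3)))        # 0x44  not taken
print(hex(beq_next_pc(0x40, 5, 5, 0xFFFE)))   # 0x3c  taken, offset of -2 words
```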

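Executing Jump Operations

A jump operation involves
  replacing the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits; the upper 4 bits come from PC + 4

[Figure: the 26-bit target field of the fetched instruction is shifted left 2 to form 28 bits, which are combined with the upper 4 bits of PC + 4 to form the jump address.]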
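A Python sketch of the jump-address formation (the function name is hypothetical). The encoding of the sample instruction uses the MIPS opcode 2 for j, which these slides do not state; the second call shows that the upper 4 bits really do come from PC + 4.

```python
def jump_target(pc: int, instruction: int) -> int:
    """j: upper 4 bits of PC + 4, followed by the 26-bit target field shifted left 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target26 = instruction & 0x03FFFFFF          # lower 26 bits of the instruction
    return (pc_plus_4 & 0xF0000000) | (target26 << 2)


j_word = (2 << 26) | (0x00400040 >> 2)      # j 0x00400040 (MIPS opcode 2 assumed for j)
print(hex(j_word))                           # 0x8100010
print(hex(jump_target(0x00400018, j_word)))  # 0x400040
print(hex(jump_target(0x90000000, j_word)))  # 0x90400040
```

Because only 28 bits come from the instruction, j can reach only targets within the same 256 MB region as PC + 4.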


Creating a Single Datapath from the Parts

Assemble the datapath segments and add control lines and multiplexors as needed.
Single-cycle design: fetch, decode and execute each instruction in one clock cycle
  no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders)
  multiplexors are needed at the inputs of shared elements, with control lines to do the selection
  write signals control writing to the Register File and Data Memory
The cycle time is determined by the length of the longest path.
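To see why the longest path sets the clock period, the Python sketch below adds up delays along the path each instruction class uses. The picosecond values are hypothetical numbers chosen only for illustration, and the model ignores multiplexors, the PC adders, and register setup time.

```python
# Hypothetical delays in picoseconds, for illustration only.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Functional units each instruction class uses, in order, in the single-cycle datapath.
PATHS = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "lw": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "sw": ["imem", "reg_read", "alu", "dmem"],
    "beq": ["imem", "reg_read", "alu"],
}


def path_delay(units):
    """Delay of a path = sum of the delays of the units it passes through."""
    return sum(DELAY_PS[u] for u in units)


delays = {name: path_delay(units) for name, units in PATHS.items()}
cycle_time_ps = max(delays.values())   # every instruction must fit in one cycle
print(delays)          # {'R-format': 600, 'lw': 800, 'sw': 700, 'beq': 500}
print(cycle_time_ps)   # 800
```

Under these made-up numbers lw sets the clock period, so an R-format instruction leaves 200 ps of every 800 ps cycle unused.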

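Building the Datapath

Use multiplexers to stitch the parts together.

[Figure: complete single-cycle datapath. PC → Instruction memory (Read address, Instruction [31–0]). Instruction [25–21] → Read register 1; Instruction [20–16] → Read register 2; a mux controlled by RegDst selects Instruction [20–16] or Instruction [15–11] as the Write register. Instruction [15–0] is sign-extended from 16 to 32 bits; a mux controlled by ALUSrc selects Read data 2 or the extended value as the second ALU input. Instruction [5–0] and ALUOp drive the ALU control. Data memory (Address, Write data, Read data) is controlled by MemRead and MemWrite; a mux controlled by MemtoReg selects the ALU result or Read data as the register Write data; RegWrite enables the write. One adder computes PC + 4; a second adds the offset (after Shift left 2) to form the branch target; a mux controlled by PCSrc selects the next PC.]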


R-Type Instructions (e.g. add $2, $3, $4; not JR/JALR)

[Figure: the single-cycle datapath as used by an R-type instruction.]

I-Type Instructions (e.g. lw $4, 1000($15))

[Figure: the single-cycle datapath as used by a load instruction.]


I-Type Instructions for Branches (e.g. beq $4, $5, Label7)

[Figure: the single-cycle datapath as used by a branch instruction.]

Control

Selecting the operations to perform (ALU, read/write, etc.)
Controlling the flow of data (multiplexer inputs)
Information comes from the 32 bits of the instruction

Example: add $8, $17, $18
Instruction format:
  000000  10001  10010  01000  00000  100000
  op      rs     rt     rd     shamt  funct

The ALU's operation is based on the instruction type and the function code.
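Pulling the fields out of an instruction word is just shifting and masking. This Python sketch (the function names are hypothetical) encodes the add $8, $17, $18 example and decodes it again, using the R-type bit positions (op 31-26, rs 25-21, rt 20-16, rd 15-11, shamt 10-6, funct 5-0). It also decodes lw $1, 100($2), whose 16-bit offset occupies bits 15-0.

```python
from typing import Dict


def encode_r(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack an R-format instruction (op = 0) into a 32-bit word."""
    op = 0
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def decode(word: int) -> Dict[str, int]:
    """Split a 32-bit instruction word into its fields by shifting and masking."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11 (R-format)
        "shamt": (word >> 6) & 0x1F,   # bits 10-6 (R-format)
        "funct": word & 0x3F,          # bits 5-0 (R-format)
        "imm16": word & 0xFFFF,        # bits 15-0 (lw, sw, beq offset)
    }


word = encode_r(rs=17, rt=18, rd=8, funct=0b100000)   # add $8, $17, $18
print(f"{word:032b}")   # 00000010001100100100000000100000
f = decode(word)
print(f["op"], f["rs"], f["rt"], f["rd"], f["shamt"], f["funct"])   # 0 17 18 8 0 32

g = decode(0x8C410064)                                 # lw $1, 100($2)
print(g["op"], g["rs"], g["rt"], g["imm16"])           # 35 2 1 100
```

The control unit only needs `op` (and `funct` for R-format); the other fields go straight to the register file and the sign-extension unit.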


ALU Control

e.g., what should the ALU do with this instruction?
Example: lw $1, 100($2)
  35   2    1    100
  op   rs   rt   16-bit offset

ALU control input:
  000  AND
  001  OR
  010  add
  110  subtract
  111  set-on-less-than

Why is the code for subtract 110 and not 011? What do you need for the slt instruction?

[Figure: the 3-bit ALU control input drives the ALU, which outputs ALU result and Zero.]

The main control unit generates the ALUOp bits for the ALU control unit, and the ALU control unit generates the ALU control input; splitting the work this way reduces the size of the main control unit.

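Supporting slt

[Figure: 1-bit ALU with inputs a, b, CarryIn, Binvert, and Less. A 4-input mux selected by Operation chooses among AND (0), OR (1), the adder sum (2), and Less (3) to produce Result, with CarryOut from the adder. The MSB logic block is the same except that it also outputs Set (the sign bit of the adder result) and Overflow from its overflow-detection logic.]

Overflow logic depends on whether the ALU is doing an addition or a subtraction. Let a and b be the sign bits of the two operands, and call the sign bit of the result (for addition or subtraction) Nf. Then

  if (addition) overflow = (a and b and (not Nf)) or ((not a) and (not b) and Nf)

i.e., for addition, if the sign bits of the operands are the same but the result sign bit is different, then OVERFLOW has occurred.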
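A word-level Python sketch ties these pieces together; it illustrates the ideas rather than reproducing the slides' gate design. It assumes the 3-bit ALU control input is read as Binvert (bit 2, which also supplies the carry-in for subtraction) followed by the 2-bit Operation mux select, which is one way to see why subtract is 110 (add, 010, with b inverted) rather than 011. For slt it returns the Set bit, the sign of a - b; like a Set taken straight from the adder, this ignores overflow. For subtraction, the overflow rule above is applied to the adder's actual inputs (a and the inverted b), which is equivalent to the usual subtraction rule.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def alu(control: int, a: int, b: int) -> Tuple[int, int, int]:
    """32-bit ALU sketch returning (result, zero, overflow).

    Assumed encoding: control bit 2 = Binvert (also the carry-in for subtraction),
    control bits 1-0 = Operation select: 0 AND, 1 OR, 2 adder, 3 Less (slt).
    """
    a &= MASK32
    b &= MASK32
    binvert = (control >> 2) & 1
    operation = control & 0b11
    b_in = (~b & MASK32) if binvert else b          # invert b for subtraction
    sum32 = (a + b_in + binvert) & MASK32           # a + ~b + 1 == a - b
    a_sign, b_sign, nf = a >> 31, b_in >> 31, sum32 >> 31
    # Slide's rule applied to the adder inputs: same operand signs, different result sign
    overflow = (a_sign & b_sign & (1 - nf)) | ((1 - a_sign) & (1 - b_sign) & nf)
    if operation == 0:
        result = a & b
    elif operation == 1:
        result = a | b
    elif operation == 2:
        result = sum32
    else:
        result = nf                                 # Set (sign of a - b) routed to Less of bit 0
    return result, int(result == 0), overflow


print(alu(0b010, 5, 7))            # (12, 0, 0)  add
print(alu(0b110, 7, 7))            # (0, 1, 0)   sub: Zero = 1, as beq uses
print(alu(0b111, 3, 9))            # (1, 0, 0)   slt: 3 < 9
print(alu(0b010, 0x7FFFFFFF, 1))   # (2147483648, 0, 1)  add overflows
```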


Control the ALU

We must describe hardware to compute the 3-bit ALU control input, given:
  the instruction type, encoded as ALUOp (computed from the instruction type):
    00 = lw, sw
    01 = beq
    10 = arithmetic (incl. slt)
  the function code, for arithmetic instructions
Describe it using a truth table (which can be turned into gates):

  ALUOp1 ALUOp0 | F5 F4 F3 F2 F1 F0 | ALU control | Instruction
  0      0      | X  X  X  X  X  X  | 010 (add)   | lw, sw
  X      1      | X  X  X  X  X  X  | 110 (sub)   | beq
  1      X      | X  X  0  0  0  0  | 010 (add)   | add
  1      X      | X  X  0  0  1  0  | 110 (sub)   | sub
  1      X      | X  X  0  1  0  0  | 000 (and)   | and
  1      X      | X  X  0  1  0  1  | 001 (or)    | or
  1      X      | X  X  1  0  1  0  | 111 (slt)   | slt

[Figure: ALUOp, generated by decoding inst[31:26], and funct = inst[5:0] feed the ALU control, which drives the 3-bit control input of the ALU (outputs: ALU result, Zero).]
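The truth table maps directly onto a lookup. This Python sketch (the function name is an assumption) follows it row by row: ALUOp alone decides for lw/sw and beq, and for R-format instructions only F3 to F0 matter because F5 and F4 are don't-cares. ALUOp = 11 never occurs; the sketch treats it like beq.

```python
def alu_control(alu_op: int, funct: int) -> int:
    """Return the 3-bit ALU control input for a 2-bit ALUOp and a 6-bit funct field."""
    if alu_op == 0b00:            # lw, sw: add to form the address
        return 0b010
    if alu_op & 0b01:             # X1 (beq): subtract to compare
        return 0b110
    # 1X: R-format, decode F3..F0 (F5 and F4 are don't-cares)
    by_funct = {0b0000: 0b010,    # add
                0b0010: 0b110,    # sub
                0b0100: 0b000,    # and
                0b0101: 0b001,    # or
                0b1010: 0b111}    # slt
    low = funct & 0b1111
    if low not in by_funct:
        raise ValueError(f"funct {funct:06b} is not in the truth table")
    return by_funct[low]


print(format(alu_control(0b10, 0b101010), "03b"))   # 111 (slt)
print(format(alu_control(0b00, 0b000000), "03b"))   # 010 (lw/sw)
```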

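Two-Level Implementation

[Figure: instruction register bits 31–26 (the 6-bit opcode) feed Control 1, the main control, which produces the 2-bit ALUOp (00: lw, sw; 01: beq; 10: add, sub, and, or, slt). ALUOp and bits 5–0 (the 6-bit funct field) feed Control 2, the ALU control, which produces the 3-bit ALU control input (000: and; 001: or; 010: add; 110: sub; 111: set on less than).]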


ALU Control

Simple combinational logic (truth tables)

[Figure: ALU control block. ALUOp1, ALUOp0, and function-field bits F3, F2, F1, F0 (from F (5–0)) drive gates producing Operation2, Operation1, and Operation0, which form the 3-bit Operation.]
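One way to turn the truth table into gates is to read a minimal sum-of-products expression off each output column, treating the X entries as don't-cares. The equations below were derived from the table for this sketch and checked against every row; they may not match the gate drawing on the slide exactly:

  Operation2 = ALUOp0 OR (ALUOp1 AND F1)
  Operation1 = (NOT ALUOp1) OR (NOT F2)
  Operation0 = ALUOp1 AND (F3 OR F0)

```python
def alu_control_gates(alu_op1: int, alu_op0: int, f3: int, f2: int, f1: int, f0: int) -> int:
    """Gate-level ALU control: returns Operation2..Operation0 packed as a 3-bit integer."""
    op2 = alu_op0 | (alu_op1 & f1)
    op1 = (1 - alu_op1) | (1 - f2)
    op0 = alu_op1 & (f3 | f0)
    return (op2 << 2) | (op1 << 1) | op0


# Check every row of the truth table: (ALUOp1, ALUOp0, F3, F2, F1, F0) -> ALU control
rows = [
    ((0, 0, 0, 0, 0, 0), 0b010),   # lw, sw (funct bits are don't-cares)
    ((0, 1, 0, 0, 0, 0), 0b110),   # beq
    ((1, 0, 0, 0, 0, 0), 0b010),   # add
    ((1, 0, 0, 0, 1, 0), 0b110),   # sub
    ((1, 0, 0, 1, 0, 0), 0b000),   # and
    ((1, 0, 0, 1, 0, 1), 0b001),   # or
    ((1, 0, 1, 0, 1, 0), 0b111),   # slt
]
for inputs, expected in rows:
    assert alu_control_gates(*inputs) == expected, inputs
print("all rows match")
```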

Three Instruction Classes: Formats

R-type instructions all have an opcode of 0.
  R-type instructions have three register operands: fields rs and rt are sources, and rd is the destination.
Load (opcode = 35) and store (opcode = 43) instructions:
  Register rs is the base register that is added to the 16-bit address field to form the memory address.
  For loads, rt is the destination register for the loaded value.
  For stores, rt is the source register whose value should be stored into memory.
Branch equal (opcode = 4):
  Registers rs and rt are the source registers that are compared for equality.
  The 16-bit address field is sign-extended, shifted left by 2, and added to the updated PC (PC + 4) to compute the branch target address.


  Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
  R-format     1       0       0         1         0        0         0       1       0
  lw           0       1       1         1         1        0         0       0       0
  sw           X       1       X         0         0        1         0       0       0
  beq          X       0       X         0         0        0         1       0       1

[Figure: single-cycle datapath with the main control unit. Instruction [31–26] feeds Control, which drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; Instruction [5–0] and ALUOp feed the ALU control. Annotation: use rt, not rd, as the write register for lw.]

The Effect of the Seven Control Signals

Textbook Figure 5.16, p. 306.
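The effect of the multiplexor control signals can be written as selections. In this Python sketch (the function and argument names are hypothetical), the other datapath values are passed in as already-settled inputs, and PCSrc is formed as Branch AND Zero, the usual single-cycle arrangement. RegWrite, MemRead, and MemWrite act as write/read enables rather than selects, so they are not modeled here.

```python
from typing import Dict


def mux_outputs(ctrl: Dict[str, int], rt: int, rd: int, read_data2: int,
                imm_sext: int, alu_result: int, mem_read_data: int,
                zero: int, pc_plus_4: int, branch_target: int) -> Dict[str, int]:
    """Return what each multiplexor passes on, given the control signals."""
    pc_src = ctrl["Branch"] & zero                                      # PCSrc = Branch AND Zero
    return {
        "write_register": rd if ctrl["RegDst"] else rt,                # 1: rd [15-11], 0: rt [20-16]
        "alu_input_b": imm_sext if ctrl["ALUSrc"] else read_data2,     # 1: offset, 0: Read data 2
        "write_data": mem_read_data if ctrl["MemtoReg"] else alu_result,  # 1: memory, 0: ALU
        "next_pc": branch_target if pc_src else pc_plus_4,
    }


# lw $1, 100($2): control row RegDst=0, ALUSrc=1, MemtoReg=1, Branch=0 (data values made up)
lw_ctrl = {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "Branch": 0}
out = mux_outputs(lw_ctrl, rt=1, rd=0, read_data2=77, imm_sext=100,
                  alu_result=0x1064, mem_read_data=42, zero=0,
                  pc_plus_4=4, branch_target=0x194)
print(out)  # {'write_register': 1, 'alu_input_b': 100, 'write_data': 42, 'next_pc': 4}
```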


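Control Unit Signals

[Figure: implementation of the main control unit. The inputs Op5–Op0 (Inst[31:26]) are decoded into the instruction classes R-format, lw, sw, and beq, which are combined to produce the outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, and ALUOp0 that harness the datapath.]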
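The same two-stage structure, recognizing the instruction class from the opcode and then OR-ing classes into each output, can be sketched in Python. The opcodes (0, 35, 43, 4) come from the instruction-format slide; setting every X in the control table to 0 is an assumption of this sketch, since any value is acceptable for a don't-care.

```python
from typing import Dict


def main_control(op: int) -> Dict[str, int]:
    """Main control unit sketch: decode Inst[31:26], then OR the class lines together."""
    # First stage: recognize each instruction class from the 6 opcode bits
    r_format = int(op == 0b000000)   # opcode 0
    lw = int(op == 0b100011)         # opcode 35
    sw = int(op == 0b101011)         # opcode 43
    beq = int(op == 0b000100)        # opcode 4
    # Second stage: each output is asserted by the classes that need it (X entries set to 0)
    return {
        "RegDst": r_format,
        "ALUSrc": lw | sw,
        "MemtoReg": lw,
        "RegWrite": r_format | lw,
        "MemRead": lw,
        "MemWrite": sw,
        "Branch": beq,
        "ALUOp1": r_format,
        "ALUOp0": beq,
    }


print([name for name, value in main_control(35).items() if value])
# ['ALUSrc', 'MemtoReg', 'RegWrite', 'MemRead']
```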


Our Simple Control Structure

All of the logic is combinational
  we wait for everything to settle down and for the right thing to be done
  the ALU might not produce the "right answer" right away
  we use write signals along with the clock to determine when to write
The cycle time is determined by the length of the longest path

[Figure: within one clock cycle, state element 1 feeds combinational logic, which feeds state element 2.]

We are ignoring some details like setup and hold times.