Upload
thomas-phelps
View
39
Download
3
Embed Size (px)
DESCRIPTION
CPE 626: Advanced VLSI Design L02. Department of Electrical and Computer Engineering University of Alabama in Huntsville. Outline. Simple Processor – MU0 Datapath Design Control Logic ALU Design Pipeline Processor – DLX ISA Registers Addressing Modes and Data Types Instruction Format - PowerPoint PPT Presentation
Citation preview
CPE 626: Advanced VLSI DesignL02
Department of Electrical and Computer Engineering
University of Alabama in Huntsville
19/04/23 UAH-CPE631 2
Outline
• Simple Processor – MU0– Datapath Design– Control Logic– ALU Design
• Pipeline Processor – DLX– ISA
• Registers• Addressing Modes and Data Types• Instruction Format• Instruction Set
– Non-pipeline Implementation– Pipeline Implementation
19/04/23 UAH-CPE631 3
MU0 – A Simple Processor
• Instruction format• Instruction set opcode S
12 bits4 bits
Instruction Opcode Effect
LDA S 0000 ACC := mem16[S]
STO S 0001 mem16[S] := ACC
ADD S 0010 ACC := ACC + mem16[S]
SUB S 0011 ACC := ACC - mem16[S]
JMP S 0100 PC := S
JGE S 0101 if ACC >= 0 PC := S
JNE S 0110 if ACC !=0 PC := S
STP 0111 stop
19/04/23 UAH-CPE631 4
MU0 Logic Design
• Follow an approach to separate the design into two components– Datapath – all the components carrying, storing or
processing bits including the accumulator, program counter, ALU, and instruction register
– Control logic – everything that does not fit comfortably into datapath
• Datapath design: many ways to do this– Assume that memory access is limiting factor, and
assume that memory access will take exactly one clock cycle
19/04/23 UAH-CPE631 5
MU0 Datapath Example
• Program Counter – PC• Accumulator - ACC• Arithmetic-Logic Unit –
ALU
• Instruction Register• Instruction Decode and
Control Logic
IRPC
ACCALU
memory
control
address bus
data bus
Follow the principle that the memory will be limiting factor in design: each instruction takes exactly the number of clock cycles defined by the number of memory accesses it must take.
Note: We do not have a dedicated PC incrementer! Why?
19/04/23 UAH-CPE631 6
MU0 Datapath Design
• Assume that each instruction starts when it has arrived in the IR
• Step 1: EX (execute)– LDA S: ACC <- Mem[S]– STO S: Mem[S] <- ACC– ADD S: ACC <- ACC + Mem[S]– SUB S: ACC <- ACC - Mem[S]– JMP S: PC <- S– JGE S: if (ACC >= 0) PC <- S– JNE S: if (ACC != 0) PC <- S
• Step 2: IF (fetch the next instruction)– Either PC or the address in the IR
is issued to fetch the next instruction
– address is incremented in the ALU and value saved into the PC
• Initialization– Reset input to start
executing instructions from a known address; here it is 000hex
• provide zero at the ALU output and then load it into the PC register
19/04/23 UAH-CPE631 7
MU0 RTL Organization
• Control Logic– Asel– Bsel– ACCce (ACC change
enable) – PCce (PC change enable)– IRce (IR change enable)– ACCoe (ACC output enable)– ALUfs (ALU function select)– MEMrq (memory request)– RnW (read/write)– Ex/ft (execute/fetch)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
19/04/23 UAH-CPE631 8
MU0 control logic
Inputs Outputs Opco de Ex / f t ACC1 5 Bs e l PCce ACCo e MEMrq Ex / f tIns truct i o n Res et ACCz As el ACCce IRce ALUfs RnWReset xxxx 1 x x x 0 0 1 1 1 0 = 0 1 1 0LDA S 0000
000000
01
xx
xx
10
10
10
01
01
00
= BB+1
11
11
10
STO S 00010001
00
01
xx
xx
10
x0
00
01
01
10
xB+1
11
01
10
ADD S 00100010
00
01
xx
xx
10
10
10
01
01
00
A+BB+1
11
11
10
SUB S 00110011
00
01
xx
xx
10
10
10
01
01
00
A-BB+1
11
11
10
JMP S 0100 0 x x x 1 0 0 1 1 0 B+1 1 1 0JGE S 0101
010100
xx
xx
01
10
00
00
11
11
00
B+1B+1
11
11
00
JNE S 01100110
00
xx
01
xx
10
00
00
11
11
00
B+1B+1
11
11
00
STOP 0111 0 x x x 1 x 0 0 0 0 x 0 1 0
19/04/23 UAH-CPE631 9
LDA S (0000)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B
Ex/ft = 1 Ex/ft = 0
B+1
19/04/23 UAH-CPE631 10
STO S (0001)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
x
Ex/ft = 1 Ex/ft = 0
B+1
19/04/23 UAH-CPE631 11
ADD S (0010)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
A+B
Ex/ft = 1 Ex/ft = 0
B+1
19/04/23 UAH-CPE631 12
SUB S (0011)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
A-B
Ex/ft = 1 Ex/ft = 0
B+1
19/04/23 UAH-CPE631 13
JMP S (0100)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B+1
Ex/ft = 0
19/04/23 UAH-CPE631 14
JGE S (0101)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B+1
Ex/ft = 0, ACC15 = 0 memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B+1
Ex/ft = 0, ACC15 = 1
19/04/23 UAH-CPE631 15
JNE S (0110)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B+1
Ex/ft = 0, ACCz = 0 memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
B+1
Ex/ft = 0, ACCz = 1
19/04/23 UAH-CPE631 16
STP (001)
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
x
Ex/ft = 0
19/04/23 UAH-CPE631 17
Reset
memory
ACC
IRce
PCce
ALUfs
Bsel
ACCce
ACCoe
MEMrq RnW
mux0 1
Asel
ALUAB
PC
ACC[15]
ACCz
IR
opcode
MU0
0
Ex/ft = 0
19/04/23 UAH-CPE631 18
MU0 ALU Design
• ALU functions: A+B, A-B, B, B+1, 0 (used only when reset is active) => 4 functions
• Aen (enable operand A)• Binv (invert operand B)
Cin
Cout
sum
reset
B
A
Aen
Binv
19/04/23 UAH-CPE631 20
DLX Registers
• GPRs with load-store architecture• GPR: 32 32-bit named R0, R1,... R31, R0=0• FPR (floating point registers):
– single precision:32 32-bit named F0, F1,... F31 (accessed independently)
– double precision:16 64-bit named F0, F2,... F30 (accessed in pairs)
• Instructions which support transfers between GPRs and FPRs
• Other status registers, e.g., floating-point status register (hold information about the results of FP ops)
19/04/23 UAH-CPE631 21
Addressing Modes and Data Types
• Immediate with a 16-bit value field• Displacement with a 16-bit displacement
– register deferred derived when disp=0– absolute derived from displacement with R0
• Byte addressable in big-endian with 32-bit address • All memory references are load/store through GPR
or FPR and must be aligned • Data types
– 8-bit bytes, 16-bit half words (loaded into registers with either zeros or the sign bit replicated to fill 32 bits)
– 32-bit integers– 32-bit single precision and 64-bit double-precision for FP
19/04/23 UAH-CPE631 22
Instruction Formats
• I-type: load, store, arithmetic, logic, relational, shift, branch
• R-type: arithmetic, logic, relational
• J-type: jump, jump and link, trap, return from exception
I-type instruction
Encodes: Loads and stores of bytes, words, half wordsAll immediates (rdrs1 op immediate)Conditional branch instructions (rs1 is register, rd is unused)Jump register, jump and link register (rd=0, rs=destination, imm.=0)
Opcode rs1 rd immediate
16556
R-type instruction
Reg-reg ALU operations: (rdrs1 func rs2); func={add, sub,...}Read/write special registers and moves
Opcode rs1 rs2 func
16556
rd
J-type instruction
Jump and jump and link; Trap and return from exception
Opcode Offset added to PC
266
19/04/23 UAH-CPE631 23
Instructions for Data Transfers
Example Instruction Meaning
LW R1, 30(R2) Regs[R1] 32 Mem[30 + Regs[R2]]
LW R1, 1000(R0) Regs[R1] 32 Mem[1000 + 0]
LB R1, 40(R3) Regs[R1] 32 (Mem[40 + Regs[R3]]0)24 ## Mem[40 + Regs[R3]]
LBU R1, 40(R3) Regs[R1] 32 (0)24 ## Mem[40 + Regs[R3]]
LH R1, 40(R3) Regs[R1] 32 (Mem[40 + Regs[R3]]0)16 ## Mem[40 + Regs[R3]] ## Mem[41+Regs[R3]]
LF F0, 50(R3) Regs[F0] 32 Mem[50 + Regs[R3]]
LD F0, 50(R2) Regs[F0] ## Regs[F1] 32 Mem[50 + Regs[R2]]
Instruction Opcode Instruction Meaning
LB, LBU, SB Load byte, load byte unsigned, store byte
LH, LHU, SH Load half word, load half word unsigned, store half word
LW, SW Load word, store word (to/from integer registers)
LF, LD, SF, SDLoad SP float, load DP float, store SP float, store DP float (SP - single precision, DP - double precision)
MOVI2S, MOVS2I Move from/to GPR to/from a special register
MOVF, MOVD Copy one floating-point register or a DP pair to another register or pair
MOVFP2I, MOVI2FP Move 32 bits from/to FP register to/from integer registers
19/04/23 UAH-CPE631 24
Arithmetic/logical instructions
• All ALU instructions are register-register – add, sub, and, or, xor, shift – Immediate forms also available – LHI loads immediate value into most significant 16 bits
• R0 used to synthesise other operations – Loading constant is an immediate =>
add with R0 as one source – Register-register move is an add with R0 as one source
• Compare operations put 1 ("true") in destination if condition is met
19/04/23 UAH-CPE631 25
Arithmetic/logical instructions (cont’d)
Example Instruction MeaningADD R1, R2, R3 Regs[R1] Regs[R2] + Regs[R3]ADDI R1, R2, #3 Regs[R1] Regs[R2] + 3LHI R1, #42 Regs[R1] 42##016
SLLI R1, R2, #5 Regs[R1] Regs[R2] << 5SLT R1, R2, R3 if (Regs[R2] < Regs[R3]) Regs[R1] 1 else Regs[R1] 0
Instruction Opcode Instruction Meaning
ADD, ADDI, ADDU, ADDUI Add, add immediate (all immediates are 16-bits); signed and unsigned
SUB, SUBI, SUBU, SUBUI Subtract, subtract immediate; signed and unsigned
MULT, MULTU, DIV, DIVU Multiply and divide, signed and unsigned; operands must be floating-point registers; all operations take and yield 32-bit values
AND, ANDI And, and immediate
OR, ORI, XOR, XORI Or, or immediate, exclusive or, exclusive or immediate
LHI Load high immediate - loads upper half of register with immediate
SLL, SRL, SRA, SLLI, SRLI, SRAI
Shifts: both immediate(S__I) and variable form(S__); shifts are shift left logical, right logical, right arithmetic
S__, S__I Set conditional: "__"may be LT, GT, LE, GE, EQ, NE
19/04/23 UAH-CPE631 26
Control-flow instructions
• Jump can use 26-bit signed offset from PC or contents of register
• Jump-and-link saves PC in R31 • Conditional branches test source for zero/non-zero
and use 16-bit signed offset
Instruction Opcode Instruction Meaning
BEQZ, BNEZ Branch GPR equal/not equal to zero; 16-bit offset from PC
BFPT, BFPF Test comparison bit in the FP status register and branch; 16-bit offset from PC
J, JR Jumps: 26-bit offset from PC(J) or target in register(JR)
TRAP Transfer to operating system at a vectored address
RFE Return to user code from an exception; restore user code
19/04/23 UAH-CPE631 27
Floating-point instructions in DLX
• Moves between floating point (32-bit) and double-precision (64-bit) registers
• Operations: add, subtract, multiply, divide
• Also, integer multiply/divide on floating point regs
Instruction Opcode Instruction Meaning
ADDD, ADDF Add DP, SP numbers
SUBD, SUBF Subtract DP, SP numbers
MULTD, MULTF Multiply DP, SP floating point
DIVD, DIVF Divide DP, SP floating point
CVTF2D, CVTF2I, CVTD2F, CVTD2I, CVTI2F, CVTI2D
Convert instructions: CVTx2y converts from type x to type y, where x and y are one of I(Integer), D(Double precision), or F(Single precision). Both operands are in the FP registers.
__D, __F DP and SP compares: "__" may be LT, GT, LE, GE, EQ, NE; set comparison bit in FP status register.
19/04/23 UAH-CPE631 29
Instruction Execution
• Process of “instruction execution” is usually broken up into stages (“divide and conquer”)– smaller stages are easier to design– easy to optimize (change) one stage
without touching the others
• 5 main stages for DLX; each stage takes one clock cycle– Instruction Fetch (IF)– Instruction Decode / Register fetch cycle (ID)– Execution / Effective address cycle (EX)– Memory access / Branch completion cycle (MEM)– Write-back cycle (WB)
19/04/23 UAH-CPE631 30
Instruction Fetch (IF)
• Send out PC and fetch the instruction from the memory into instruction register (IR)– IR is used to hold the instruction
• Increment the PC by 4 to address the next sequential instruction– NPC is used to hold the next sequential address
IR Mem[PC]NPC PC + 4
19/04/23 UAH-CPE631 31
Instruction Decode (ID)
• Decode the instruction to determine instruction type (Opcode field - 6 ms bits of the instruction)
• Read in data from all necessary registers– temporary registers A, B hold outputs of GPR– Imm is used to hold sign-extended lower 16-bits of the IR– decoding is done in parallel with reading registers
since these fields are at fixed locations– a register may be read even we do not use it
A Regs[IR6..10]B Regs[IR11..15]Imm (IR16)16##IR16..31
19/04/23 UAH-CPE631 32
Execution [EX] (1/2)
• Register-register ALU instruction– ALU performs the operation specified by the opcode
on the values in registers A and B;the result is placed in the temporary register ALUOutput
• Register-immediate ALU instruction– ALU performs the operation specified by the opcode on
the value in register A and on the value in register Imm;the result is placed in the temporary register ALUOutput
ALUOutput A op B
ALUOutput A op Imm
19/04/23 UAH-CPE631 33
Execution [EX] (2/2)
• Memory reference– ALU adds the operands to form effective address and
places the result into the temporary register ALUOutput
• Branch– ALU adds the NPC to the Imm to compute
the address of the branch target– Register A is checked to determine whether the branch
is taken (for BEQZ op is “==“; for BNEZ op is “!=“;)– Cond is 1-bit register (1 - branch is taken, 0 - not taken)
ALUOutput A + Imm
ALUOutput NPC + ImmCond (A op 0)
19/04/23 UAH-CPE631 34
Memory access (MEM)
• Memory reference– load– store
• Branch– if the instruction branches, the PC is replaced with the
branch destination; otherwise, it is replaced with NPC
LMD Mem[ALUOutput]
if (cond) PC ALUOutput else PC NPC
Mem[ALUOutput] B
19/04/23 UAH-CPE631 35
Write-back (WB)
• Register-register ALU
• Register-immediate ALU
• Load instruction
Regs[IR16..20] ALUOutput
Regs[IR11..15] ALUOutput
Regs[IR11..15] LMD
19/04/23 UAH-CPE631 36
Datapath
MemoryAccess
WriteBack
InstructionFetch
Instr. DecodeReg. Fetch
ExecuteAddr. Calc
InstructionMemory
SignExtend
4 Zero?
Next SEQ PC
Next PC
WB Data
PCRD
RS1
RS2
Imm
Add
MUX
MUX
MUX
IR
MUX
DataMemory
ALU
LMD
Reg. File
A
B
NPC
Imm
ALU
output
19/04/23 UAH-CPE631 37
Sequential Execution
Time [clocks]
MemIF ID Ex WB
MemIF ID Ex WB
MemIF ID Ex WB
Instructions
Ii
Ii+1
Ii+2
Sequential execution for these 3 instructions (Ii, Ii+1, Ii+2)takes 15 clock cycles
5 10
19/04/23 UAH-CPE631 38
Pipelined Execution
Time [clocks]
MemIF ID Ex WB
MemIF ID Ex WB
MemIF ID Ex WB
Instructions
Ii
Ii+1
Ii+2
Pipelined execution for instructions Ii, Ii+1, and Ii+2
takes 7 clock cycles
5 10
Ii+3 MemIF ID Ex WB
MemIF ID Ex WBIi+4
Analogy with automobile assembly line- many steps,
each contributing something to the construction of the car
- each step operates in parallel with other steps, though on a different car
Pipe stages (segments)
19/04/23 UAH-CPE631 39
Pipelining Lessons
Time [clocks]
MemIF ID Ex WB
MemIF ID Ex WB
MemIF ID Ex WB
Instructions
Ii
Ii+1
Ii+2
5• Pipelining does not help
latency of single instruction, it helps throughput of entire workload
• Multiple instructions operating simultaneously using different resources
• Potential speedup = Number pipe stages
• Time to “fill” pipeline and time to “drain” reduce speedup: 2.15X vs. 5X in this example
Latency & Throughput:• Latency: ...how long it takes to
execute an instruction• Throughput: ...how often an
instruction exits the pipeline
19/04/23 UAH-CPE631 40
Pipelining Lessons (cont’d)
Time [clocks]
MemIF ID Ex WB
MemIF ID Ex WB
MemIF ID Ex WB
Instructions
Ii
Ii+1
Ii+2
5 • Pipeline stages are hooked together => all stages must be ready to proceed at the same time
• Machine cycle – the time required between moving an instruction one step down the pipeline (usually one clock cycle)
• The length of a machine cycle is determined by the time required for the slowest stage
• Unbalanced lengths of pipe stages also reduces speedup
19/04/23 UAH-CPE631 41
Visualizing Pipeline
Reg AL
U
DMIM Reg
Reg AL
U
DMIM Reg
Reg AL
U
DMIM Reg
Instr.
Order
Time (clock cycles)
Reg AL
U
DMIM Reg
CC 2 CC 3 CC 4 CC 6 CC 7CC 5CC 1
19/04/23 UAH-CPE631 42
Pipeline Datapath
MemoryAccess
WriteBack
InstructionFetch
Instr. DecodeReg. Fetch
ExecuteAddr. Calc
InstructionMemory
SignExtend
4 Zero?
Next SEQ PC
Next PC
WB Data
PC
IR6..10
Imm
Add
MUX
MUX
MUX
MUX
DataMemory
ALU
Reg. File
IR11..15
ID/EXIF/ID EX/MEM MEM/WB
MEM/WB.IR11..15 or MEM/WB.IR16..20
IR
19/04/23 UAH-CPE631 43
Instruction Flow through Pipeline Regs
Reg
ALU
DM
IMR
eg
CC 2 CC 3CC 1
Reg
ALU
DM
IMR
eg
Reg
ALU
DM
IMR
eg
Add R1,R2,R3
Nop
Nop
Nop
Add R1,R2,R3
Nop
Nop
Lw R4,0(R2)
Add R1,R2,R3
Nop
Lw R4,0(R2)
Sub R6,R5,R7
Reg
ALU
DM
IMR
eg
Add R1,R2,R3
Lw R4,0(R2)
Sub R6,R5,R7
CC 4
Xor R9,R8,R1
Time (clock cycles)
19/04/23 UAH-CPE631 44
DLX Pipeline Definition: IF, ID
• Stage IF– IF/ID.IR Mem[PC];– if EX/MEM.cond {IF/ID.NPC, PC EX/MEM.ALUOUT}
else {IF/ID.NPC, PC PC + 4};
• Stage ID– ID/EX.A Regs[IF/ID.IR6…10];
ID/EX.B Regs[IF/ID.IR11…15];
– ID/EX.Imm (IF/ID.IR16)16 ## IF/ID.IR16…31;
– ID/EX.NPC IF/ID.NPC; ID/EX.IR IF/ID.IR;
19/04/23 UAH-CPE631 45
DLX Pipeline Definition: IE
• ALU – EX/MEM.IR ID/EX.IR;– EX/MEM.ALUOUT ID/EX.A func ID/EX.B; or
EX/MEM.ALUOUT ID/EX.A func ID/EX.Imm;– EX/MEM.cond 0;
• load/store– EX/MEM.IR ID/EX.IR;
EX/MEM.B ID/EX.B;– EX/MEM.ALUOUT ID/EX.A ID/EX.Imm; – EX/MEM.cond 0;
• branch – EX/MEM.NPC ID/EX.A ID/EX.Imm;– EX/MEM.cond (ID/EX.A func 0);
19/04/23 UAH-CPE631 46
DLX Pipeline Definition: MEM, WB
• Stage MEM– ALU
• MEM/WB.IR EX/MEM.IR;• MEM/WB.ALUOUT EX/MEM.ALUOUT;
– load/store• MEM/WB.IR EX/MEM.IR;• MEM/WB.LMD Mem[EX/MEM.ALUOUT] or
Mem[EX/MEM.ALUOUT] EX/MEM.B;
• Stage WB– ALU
• Regs[MEM/WB.IR16…20] MEM/WB.ALUOUT; orRegs[MEM/WB.IR11…15] MEM/WB.ALUOUT;
– load• Regs[MEM/WB.IR11…15] MEM/WB.LMD;