32-bit Pipelined RISC Processor
Group 1, aka “Go Us”
Alice Wang, Ann Ho, Jason Fong
CS m152b, TA: Young Cho, Lab section 1
General Review of a Pipelined Processor
[Figure: pipelined datapath with PC Control, Instruction Memory, Register File, ALU, and Data Memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
Memory Controller Design
Design challenge: 32-bit processor with 16-bit memory interface
On every memory access, need to get two words from memory
Solution: Clock memory controller twice as fast as rest of processor
Results in a memory access on the rising and falling edge of the processor’s clock cycle
[Figure: the processor issues one 32-bit request; the memory controller turns it into two 16-bit accesses over the 16-bit memory interface and returns the two 16-bit words as one 32-bit word]
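The two-access scheme above can be modeled in a few lines of Python. This is only a behavioral sketch, not the VHDL used in the project: the function name `read_word32`, the list-based memory, and the `high_first` word-order flag are all assumptions made for illustration. The slides do not say which half arrives first.

```python
def read_word32(mem16, addr, high_first=True):
    """Assemble one 32-bit word from two consecutive 16-bit memory words.

    mem16      -- list of 16-bit values (hypothetical memory model)
    addr       -- index of the first 16-bit word
    high_first -- assumed word order: True if the first access
                  returns the upper half of the 32-bit word
    """
    first = mem16[addr] & 0xFFFF       # first access (one clock edge)
    second = mem16[addr + 1] & 0xFFFF  # second access (the other edge)
    if high_first:
        return (first << 16) | second
    return (second << 16) | first


mem16 = [0x1234, 0xABCD]
word = read_word32(mem16, 0)
print(hex(word))
```

Getting the word order wrong swaps the two halves of every instruction and operand, which is one way an "order of bits from memory" problem can show up.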
Instruction Format
General instruction format
Bits 31-28: opcode
Bits 27-0: vary according to instruction type (register fields are 3 bits each)
R-type instruction
Bits 31-28: opcode
Bits 27-25: rs
Bits 24-22: rt
Bits 21-19: rd
Bits 18-9: unused
Bits 8-4: shamt
Bits 3-0: funct
I-type instruction
Bits 31-28: opcode
Bits 27-25: rs
Bits 24-22: rt
Bits 21-16: unused
Bits 15-0: imm16
J-type instruction
Bits 31-28: opcode
Bits 27-16: unused
Bits 15-0: imm16
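As a rough illustration of the field layout, the sketch below extracts every field from a 32-bit instruction word. The helper names `bits` and `decode` are hypothetical, and the example opcode and funct values are made up, since the slides do not list the actual encodings.

```python
def bits(word, hi, lo):
    """Return bits hi..lo (inclusive) of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    """Split a 32-bit instruction into the fields named on the slide.

    All fields are extracted regardless of type; which ones are
    meaningful depends on whether the opcode is R-, I-, or J-type.
    """
    return {
        "opcode": bits(word, 31, 28),
        "rs": bits(word, 27, 25),
        "rt": bits(word, 24, 22),
        "rd": bits(word, 21, 19),
        "shamt": bits(word, 8, 4),
        "funct": bits(word, 3, 0),
        "imm16": bits(word, 15, 0),
    }


# Made-up R-type word: opcode=1, rs=2, rt=3, rd=5, shamt=1, funct=6
example = (1 << 28) | (2 << 25) | (3 << 22) | (5 << 19) | (1 << 4) | 6
fields = decode(example)
print(fields)
```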
Our Arithmetic Logic Unit
[Figure: ALU block diagram, including the Our_mult multiplier block]
Multiplier
Uses a series of shifts and additions
Example: 13 x 11 = 01101 x 01011
multiplicand: 01101 (13)
multiplier: 01011 (11)
[Figure: shift-and-add steps, with the product accumulated in HI and LO]
Result: 0010001111 = 143
multiplier (more efficient, but more hardware)
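The shift-and-add idea can be sketched in software as follows. This is a behavioral model for unsigned operands only, with a hypothetical function name and a `width` parameter chosen to match the 5-bit example; it does not model the HI/LO register pair or the actual Our_mult hardware.

```python
def shift_add_multiply(multiplicand, multiplier, width=5):
    """Multiply two unsigned integers using only shifts and additions.

    For each 1 bit in the multiplier, add the multiplicand shifted
    left by that bit's position to the running product.
    """
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product


result = shift_add_multiply(0b01101, 0b01011)
print(format(result, "010b"), result)  # prints "0010001111 143"
```

For 13 x 11, the multiplier bits 0, 1, and 3 are set, so the product is 13 + 26 + 104 = 143.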
Data Forwarding
[Figure: forwarding paths across the ID/EX, EX/MEM, and MEM/WB pipeline registers: one from the ALU output and one from the memory output]
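The selection between the two forwarding paths might look like the sketch below. It follows the usual textbook conditions rather than the group's actual VHDL: the dictionary keys `reg_write` and `rd`, the function name, and the returned labels are all assumptions, and any special handling of a hardwired zero register is left out because the slides do not mention one.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Choose where an ALU operand should come from.

    src_reg -- register number the instruction in EX wants to read
    ex_mem  -- state of the EX/MEM pipeline register (assumed fields)
    mem_wb  -- state of the MEM/WB pipeline register (assumed fields)
    """
    # The most recent producer wins, so check EX/MEM first.
    if ex_mem["reg_write"] and ex_mem["rd"] == src_reg:
        return "EX/MEM"   # forward from ALU output
    if mem_wb["reg_write"] and mem_wb["rd"] == src_reg:
        return "MEM/WB"   # forward from memory output
    return "REGFILE"      # no hazard: use the register file value


ex_mem = {"reg_write": True, "rd": 3}
mem_wb = {"reg_write": True, "rd": 4}
print(forward_select(3, ex_mem, mem_wb))
print(forward_select(4, ex_mem, mem_wb))
print(forward_select(5, ex_mem, mem_wb))
```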
Hardware NOP Insertion
[Figure: hardware NOP insertion logic, with labels PC Adder, IF/ID, Hold PC value, and Insert NOP]
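A minimal sketch of the hold-and-insert behavior follows. The slides do not state which hazards trigger a NOP, so this assumes the textbook load-use case (a load followed immediately by an instruction that reads the loaded register). The field names `mem_read`, `rs`, and `rt`, and the word-addressed PC that advances by 1, are also assumptions.

```python
def needs_stall(id_ex, if_id):
    """Assumed load-use check: the instruction in EX is a load whose
    destination (rt) is a source of the instruction in ID."""
    return id_ex["mem_read"] and id_ex["rt"] in (if_id["rs"], if_id["rt"])


def step_front_end(pc, id_ex, if_id):
    """Return (next_pc, insert_nop) for one clock cycle."""
    if needs_stall(id_ex, if_id):
        return pc, True       # hold PC value, insert NOP
    return pc + 1, False      # normal fetch of the next instruction


load_in_ex = {"mem_read": True, "rt": 2}
dependent = {"rs": 2, "rt": 5}
print(step_front_end(10, load_in_ex, dependent))
```

With forwarding in place, a stall like this is only needed when the value is not yet available anywhere in the pipeline, which is why the two mechanisms are usually built together.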
Data Forwarding and Stall Insertions: Observed Results
Sample program: Bubble-sort 6 numbers
Assembler insertion of NOPs:
• Machine code size: 66 words of memory
• Execution time: ~750 clock cycles
Hardware data forwarding and NOP insertion:
• Machine code size: 35 words of memory
• Execution time: ~400 clock cycles
• Savings in memory and execution time
• Much simpler assembler
• But hardware is now more complex
• Tradeoff between hardware complexity and software complexity
• Also demonstrates benefits of understanding the underlying architecture when designing an assembler
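To put the savings in relative terms, the short calculation below uses the figures reported for the bubble-sort program. The helper name is hypothetical, and the cycle counts are the approximate values from the slide, so the second percentage is only a rough estimate.

```python
def percent_saved(before, after):
    """Relative reduction from before to after, as a percentage."""
    return 100.0 * (before - after) / before


size_saving = percent_saved(66, 35)     # machine code size, in words
cycle_saving = percent_saved(750, 400)  # approximate clock cycles

print(f"code size: {size_saving:.0f}% smaller")
print(f"execution time: roughly {cycle_saving:.0f}% fewer cycles")
```

Both measures drop by a little under half when the hardware handles forwarding and NOP insertion.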
Conclusion
Some problems we encountered:
• Off by one stage in pipeline
• Lack of experience with VHDL
• Order of bits from memory
In Conclusion...
• Knowledge from previous courses
• Further research
• Simple RISC processor
• Pipelining
• Multiplier
• Data Forwarding and Hardware NOPs
References
• Patterson and Hennessy, Computer Organization and Design (2nd Ed.), 1998, pages 476-495
• Donaldson, John L., “Pipeline Hazards”, http://occs.cs.oberlin.edu/faculty/jdonalds/317/lecture08.html
• Ercegovac, Intro To Digital Systems