Out-of-Order OpenRISC: 2 semesters project. Semester B: OR1200 ISA Extension, Final B Presentation. By: Vova Menis-Lurie, Sonia Gershkovich. Advisor: Mony Orbach. 10.3.14. Spring 2013.

Page 1: Out-of-Order  OpenRISC 2 semesters project

Out-of-Order OpenRISC: 2 semesters project

Semester B: OR1200 ISA Extension Final B Presentation

By: Vova Menis-Lurie, Sonia Gershkovich

Advisor: Mony Orbach

10.3.14

Spring 2013

Page 2: Out-of-Order  OpenRISC 2 semesters project

Content:

1. Project Overview
    a. Background
    b. Goals
2. The System: OR1200
3. Project Flow
    a. Simulation Environment
    b. Out-of-Order Implementation
    c. Super Scalar Implementation
    d. ISA Extension
4. Conclusions

Page 3: Out-of-Order  OpenRISC 2 semesters project

Project Overview

Background

• OpenRISC 1200 (OR1200) is an open-source Verilog implementation of the OR1000 ISA.

• In part A of the project, we created a basic working environment on the XUPV5 board and an SoC with the OR1200 CPU.

Page 4: Out-of-Order  OpenRISC 2 semesters project

Project Overview

Project Goal

• Initial goal: Out-of-Order execution processor implementation based on the OR1200 implementation

• Changed goal: Super Scalar processor implementation based on the OR1200 implementation

• Final goal: ISA Extension implementation for the OR1200

Page 5: Out-of-Order  OpenRISC 2 semesters project

CPU

Page 6: Out-of-Order  OpenRISC 2 semesters project

[Figure: OR1200 top-level block diagram. Blocks: CPU, QMEM, MMU (IMMU, DMMU), Cache (ICache, DCache), Store Buffer, and WBI (Instruction WBIU, Data WBIU), linked by 32-bit buses and connected to the WB bus.]

Page 7: Out-of-Order  OpenRISC 2 semesters project

Project Flow

1. Cache initialization function in assembly to enable the cache. (The WB interface protocol requires 3 cycles for each transaction, which is not effective for RTL analysis and for implementing improvements.)

2. Simulation environment creation (testbench)

3. Out-of-Order implementation (attempt)

4. Super-Scalar implementation (attempt)

5. ISA extension of the current implementation

Page 8: Out-of-Order  OpenRISC 2 semesters project

Simulation Environment

Environment features:

• UART interface emulation
• Waveform generation
• One Makefile for:
    • RTL compilation
    • Testbench instantiation
    • C program compilation
    • Running the simulation
    • Assembly code file creation
    • XILINX RAM initialization file

Page 9: Out-of-Order  OpenRISC 2 semesters project

Simulation Environment

Environment features:

• Advanced monitor:
    • Monitors all data and control transactions of the SoC
    • Monitors states and SPR values
    • Creates log files with the desired information:
        • State of the register file after each command
        • Execution time analysis

Page 10: Out-of-Order  OpenRISC 2 semesters project

Out-of-Order implementation: attempt

Fundamental statements (based on Tomasulo's algorithm):

• Execution parallelism should be implemented.
• Non-architectural shadow registers should be implemented.
• In-order commitment (SW executes in order).

[Figure: OR1200 CPU block diagram. Blocks: OR1200 IF, GenPC (PC, Next PC), CTRL, Except, Freeze, RF, Operand MUX, ALU, MAC, LSU, FPU, SPRS, CFGR, WB MUX.]

• LSU instruction parallelism requires multi-port memory and a wider bus: a multi-port Cache, QMEM and MMU.

• Branch prediction is not necessary: the delay slot is handled at the compiler level.

• Multiple ALUs are not an effective solution: ALU instructions are executed in one cycle.

Page 11: Out-of-Order  OpenRISC 2 semesters project

Super Scalar implementation: attempt

Fundamental statements:

• Still in-order commitment. Multiple execution should not affect SW in-order execution.
• Non-parallel Fetch and Decode to avoid instruction dependencies.

• The Fetch and Decode units should be completely rewritten based on the current implementation.

• The exception engine should support 2 pipes, which requires a complete redesign of the exception unit.

• Not all dependencies can be seen at the fetch/decode stage: LSU results may be required.

• A multi-port SPRS should be implemented.

• Parallel LSU instruction execution in 2 pipes requires multi-port memories and a wider bus.

Page 12: Out-of-Order  OpenRISC 2 semesters project

ISA Extension: final goal

• The gcc OR1000 compiler and assembler support empty slots for custom ISA extensions.

• 7 non-parameterized commands:
    • l.cust1
    • l.cust2
    • l.cust3
    • l.cust4
    • l.cust6
    • l.cust7
    • l.cust8

• 1 highly parameterized command:
    • l.cust5 Rd, Ra, Rb, L immediate[5:0], K immediate[4:0]
    • Allows 2048 commands (6 bits of L and 5 bits of K give 2^11 combinations), each operating on 3 registers.

• The ISA extension will not be used by the compiler to generate assembly code from given C code, but gcc allows assembly commands to be used alongside C code.
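The l.cust5 operand list above can be illustrated by packing an instruction word by hand. This is a minimal sketch, not the project's code: the major opcode 0x3C and the field positions (opcode[31:26], Rd[25:21], Ra[20:16], Rb[15:11], L[10:5], K[4:0]) are assumptions recalled from the OpenRISC 1000 architecture manual, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: major opcode of l.cust5, recalled from the OR1000 manual. */
#define LCUST5_OPCODE 0x3Cu

/* Pack one l.cust5 instruction word.
 * ASSUMED layout: opcode[31:26] Rd[25:21] Ra[20:16] Rb[15:11] L[10:5] K[4:0]. */
uint32_t encode_lcust5(unsigned rd, unsigned ra, unsigned rb,
                       unsigned l_imm, unsigned k_imm)
{
    assert(rd < 32 && ra < 32 && rb < 32); /* 5-bit register numbers */
    assert(l_imm < 64);                    /* L immediate is 6 bits  */
    assert(k_imm < 32);                    /* K immediate is 5 bits  */
    return (LCUST5_OPCODE << 26) |
           ((uint32_t)rd << 21) | ((uint32_t)ra << 16) |
           ((uint32_t)rb << 11) | ((uint32_t)l_imm << 5) |
           (uint32_t)k_imm;
}

/* Number of distinct (L, K) combinations: 2^6 * 2^5. */
unsigned lcust5_variant_count(void)
{
    return (1u << 6) * (1u << 5);
}
```

The variant count is where the figure of 2048 commands comes from: every (L, K) pair selects a separate operation on the same three registers.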

Page 13: Out-of-Order  OpenRISC 2 semesters project

l.cust Commands Implementation

4 non-parameterized commands:

• l.cust1: Set flag (unconditional)
• l.cust2: Unset flag (unconditional)
• l.cust3: Set carry (unconditional)
• l.cust4: Unset carry (unconditional)
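A small software model of the four commands above, as a sketch only. It assumes the flag and carry live in the OR1000 supervision register (SR) at bit 9 (F) and bit 10 (CY); these positions are recalled from the architecture manual rather than taken from the slides, and the function names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED SR bit positions: F (flag) = bit 9, CY (carry) = bit 10. */
#define SR_F  (1u << 9)
#define SR_CY (1u << 10)

/* Each model takes the current SR value and returns the updated one. */
uint32_t model_cust1(uint32_t sr) { return sr | SR_F;   } /* set flag    */
uint32_t model_cust2(uint32_t sr) { return sr & ~SR_F;  } /* unset flag  */
uint32_t model_cust3(uint32_t sr) { return sr | SR_CY;  } /* set carry   */
uint32_t model_cust4(uint32_t sr) { return sr & ~SR_CY; } /* unset carry */
```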

Page 14: Out-of-Order  OpenRISC 2 semesters project

l.cust Commands Implementation

l.cust5 parameterized command: the K immediate defines the command, the L immediate defines the options.

• K=0x1: Replace A[L_byte] with B[0_byte] and put the result in D
• K=0x2: SET bit A[L] (result in D)
• K=0x3: UNSET bit A[L] (result in D)
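The three operations above can be sketched as a C reference model, for example to check the hardware results in a test program. This is an illustration under assumptions: byte 0 is taken to be the least significant byte, L is restricted to valid byte and bit indices, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* K=0x1: D = A with byte L replaced by byte 0 of B.
 * ASSUMPTION: byte 0 is the least significant byte. */
uint32_t cust5_replace_byte(uint32_t a, uint32_t b, unsigned l)
{
    assert(l < 4);
    unsigned shift = 8u * l;
    uint32_t mask = (uint32_t)0xFFu << shift;
    return (a & ~mask) | ((b & 0xFFu) << shift);
}

/* K=0x2: D = A with bit L set. */
uint32_t cust5_set_bit(uint32_t a, unsigned l)
{
    assert(l < 32);
    return a | ((uint32_t)1 << l);
}

/* K=0x3: D = A with bit L cleared. */
uint32_t cust5_unset_bit(uint32_t a, unsigned l)
{
    assert(l < 32);
    return a & ~((uint32_t)1 << l);
}
```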

Page 15: Out-of-Order  OpenRISC 2 semesters project

l.cust Commands Implementation

l.cust5 parameterized command: the K immediate defines the command, the L immediate defines the options.

• K=0x4: Slice A (MSBs) and B (LSBs) and put the result in D: D = {A[31:L], B[L-1:0]}
• K=0x5: Slice B (MSBs) and A (LSBs) and put the result in D: D = {B[31:L], A[L-1:0]}
• K=0x6: Rotate A (bit order reversed): D = A[0:31]
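A C reference sketch of these three operations. It reads the slice as "the upper 32-L bits of A above the lower L bits of B" and reads D = A[0:31] as a reversal of the bit order; both readings are assumptions, L is restricted to 0..31, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* K=0x4: upper bits from A, lower L bits from B (ASSUMED reading). */
uint32_t cust5_slice_ab(uint32_t a, uint32_t b, unsigned l)
{
    assert(l < 32);
    uint32_t low_mask = ((uint32_t)1 << l) - 1u; /* L low bits set */
    return (a & ~low_mask) | (b & low_mask);
}

/* K=0x5: same operation with the roles of A and B exchanged. */
uint32_t cust5_slice_ba(uint32_t a, uint32_t b, unsigned l)
{
    return cust5_slice_ab(b, a, l);
}

/* K=0x6: D[31-i] = A[i] for every bit i (ASSUMED reading of A[0:31]). */
uint32_t cust5_reverse_bits(uint32_t a)
{
    uint32_t d = 0;
    for (unsigned i = 0; i < 32; i++) {
        d = (d << 1) | (a & 1u); /* shift the lowest bit of a into d */
        a >>= 1;
    }
    return d;
}
```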

Page 16: Out-of-Order  OpenRISC 2 semesters project

l.cust Commands Implementation

l.cust5 parameterized command: the K immediate defines the command, the L immediate defines the options.

• K=0x7: Rotate A by bit, halfword-wise: D = {A[16:31], A[0:15]}
• K=0x8: Rotate A by bit, byte-wise: D = {A[24:31], A[16:23], A[8:15], A[0:7]}
• K=0xa: Check if A is even. If true, D=1 and the flag is set; else D=0
• K=0xb: Check if A is odd. If true, D=1 and the flag is set; else D=0
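These operations can be modelled in C as follows. The sketch reads the concatenations as bit-order reversal inside each halfword (K=0x7) or each byte (K=0x8). The slides do not say what happens to the flag when the even/odd check fails, so the model leaves it unchanged in that case; that choice and all function names are assumptions.

```c
#include <stdint.h>

/* Reverse the bit order inside each `width`-bit group of a (width 8 or 16). */
static uint32_t reverse_within_groups(uint32_t a, unsigned width)
{
    uint32_t d = 0;
    for (unsigned i = 0; i < 32; i++) {
        unsigned base = i - (i % width);                 /* lowest bit of the group */
        unsigned mirrored = base + (width - 1u - (i % width));
        if (a & ((uint32_t)1 << i))
            d |= (uint32_t)1 << mirrored;
    }
    return d;
}

/* K=0x7: D = {A[16:31], A[0:15]} */
uint32_t cust5_reverse_in_halfwords(uint32_t a) { return reverse_within_groups(a, 16); }

/* K=0x8: D = {A[24:31], A[16:23], A[8:15], A[0:7]} */
uint32_t cust5_reverse_in_bytes(uint32_t a) { return reverse_within_groups(a, 8); }

/* K=0xa: returns D; sets *flag when A is even, otherwise leaves it (ASSUMPTION). */
uint32_t cust5_is_even(uint32_t a, int *flag)
{
    if ((a & 1u) == 0) { *flag = 1; return 1; }
    return 0;
}

/* K=0xb: returns D; sets *flag when A is odd, otherwise leaves it (ASSUMPTION). */
uint32_t cust5_is_odd(uint32_t a, int *flag)
{
    if ((a & 1u) != 0) { *flag = 1; return 1; }
    return 0;
}
```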

Page 17: Out-of-Order  OpenRISC 2 semesters project

l.cust Commands Implementation

l.cust5 parameterized command: the K immediate defines the command, the L immediate defines the options.

• K=0xe:
    • L=2: Rotate A, 2 MSB bytes with 2 LSB bytes: D = {A[15:0], A[31:16]}
    • L=4: Rotate A byte-wise: D = {A[7:0], A[15:8], A[23:16], A[31:24]}
    • L=8: Rotate A half-byte-wise: D = {A[3:0], A[7:4], A[11:8], A[15:12], A[19:16], A[23:20], A[27:24], A[31:28]}

• K=0xf:
    • L=0: Mirror LSBs: D = {A[0:15], A[15:0]}
    • L=1: Mirror MSBs: D = {A[31:16], A[16:31]}
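A C sketch of the K=0xe and K=0xf variants, following the concatenations on this slide. A reversed range such as A[0:15] is read as the same bits in reversed order, which is an assumption about the notation; the function names are hypothetical.

```c
#include <stdint.h>

/* K=0xe, L=2: exchange the two halfwords. */
uint32_t cust5_swap_halfwords(uint32_t a)
{
    return (a << 16) | (a >> 16);
}

/* K=0xe, L=4: reverse the order of the four bytes. */
uint32_t cust5_swap_bytes(uint32_t a)
{
    return (a << 24) | ((a & 0x0000FF00u) << 8) |
           ((a >> 8) & 0x0000FF00u) | (a >> 24);
}

/* K=0xe, L=8: reverse the order of the eight half-bytes (nibbles). */
uint32_t cust5_swap_nibbles(uint32_t a)
{
    uint32_t d = 0;
    for (unsigned i = 0; i < 8; i++) {
        d = (d << 4) | (a & 0xFu); /* lowest nibble of a becomes next nibble of d */
        a >>= 4;
    }
    return d;
}

/* Reverse the bit order of a 16-bit value held in the low half of h. */
static uint32_t reverse16(uint32_t h)
{
    uint32_t d = 0;
    for (unsigned i = 0; i < 16; i++) {
        d = (d << 1) | (h & 1u);
        h >>= 1;
    }
    return d;
}

/* K=0xf, L=0: D = {A[0:15], A[15:0]} */
uint32_t cust5_mirror_lsbs(uint32_t a)
{
    uint32_t low = a & 0xFFFFu;
    return (reverse16(low) << 16) | low;
}

/* K=0xf, L=1: D = {A[31:16], A[16:31]} */
uint32_t cust5_mirror_msbs(uint32_t a)
{
    uint32_t high = a >> 16;
    return (high << 16) | reverse16(high);
}
```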

Page 18: Out-of-Order  OpenRISC 2 semesters project

ISA Extension: FPGA proven

Test C program

Page 19: Out-of-Order  OpenRISC 2 semesters project

ISA Extension: FPGA proven

UART output

Page 20: Out-of-Order  OpenRISC 2 semesters project

FPGA Utilization

Old RTL vs. New RTL: ~1% change

Page 21: Out-of-Order  OpenRISC 2 semesters project

Conclusions

• The given implementation is not suitable for any significant micro-architecture improvements.

• Out-of-Order / Super-Scalar OR1200 implementations are possible but should be done from scratch.

• Software written in assembly can easily be optimized for a specific application using the l.cust instructions (2048 instructions with 5 operands).

Page 22: Out-of-Order  OpenRISC 2 semesters project

Thank you!