RISC PROCESSOR IMPLEMENTATION USING BLUESPEC
PART 2 - FINAL PRESENTATION

Performed By: Yahel Ben-Avraham and Yaron Rimmer
Instructor: Mony Orbach
Bi-semesterial, 2012 - 2014
30/3/2014

Project goals

Goal: Implementing and analyzing a RISC processor using Bluespec SystemVerilog (BSV)

Part A:
• Studying the working environment, the BSV language and the basic processor implementation.
• Implementing a simple RISC processor.
• Running a simple test bench on the FPGA system.

Project goals

Goal: Implementing and analyzing a RISC processor using Bluespec SystemVerilog (BSV)

Part B:
• Ramping up the design:
  • Wider instruction set
  • Branch prediction (and flushing)
  • Hazard detection unit and extended data forwarding
  • Performance counters
• Running the design on the FPGA system

Pipeline Datapath

[Figure: pipeline datapath with the stages FETCH, DEC, EXE, MEM1, MEM2 and WB, and the Instruction Memory, Register File, data Memory, Forwarding and Branch Predictor units]

Fetch
• Tag the instruction’s metadata (PC, cycle)
• Fetch the requested instruction from the instruction memory
• Update the next PC:
  • Get the next PC’s branch prediction and branch address
  • Check for a Jump command
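The next-PC update in Fetch can be sketched as a small selection function. This is a hedged C++ behavioral model, not the project's BSV rule: the names `NextPcInputs` and `selectNextPc` are hypothetical, and it assumes a word-addressed instruction memory (sequential fetch is `pc + 1`) with a Jump taking priority over a predicted-taken branch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs to the Fetch stage's next-PC logic.
struct NextPcInputs {
    bool isJump;               // Fetch recognized a Jump command
    uint32_t jumpTarget;       // target of that jump
    bool predictTaken;         // branch predictor's guess for this PC
    uint32_t predictedTarget;  // branch address supplied with the prediction
};

// Assumes a word-addressed instruction memory, so sequential fetch is pc + 1.
uint32_t selectNextPc(uint32_t pc, const NextPcInputs& in) {
    if (in.isJump) return in.jumpTarget;             // jumps are unconditional
    if (in.predictTaken) return in.predictedTarget;  // follow the predicted branch
    return pc + 1;                                   // fall through
}
```

If the prediction later proves wrong, Exec supplies the correction PC; this function only covers the speculative choice made in Fetch.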

Decode
• Fully parse the received instruction
• Pre-fetch data from the registers the instruction may use

Execute
• According to the instruction’s opcode:
  • ALU instruction: compute the result
  • Memory instruction: calculate the memory address to read from / write to
  • Branch instruction: check whether the branch is taken and update the branch resolution
• Data forwarding
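A minimal C++ model of the Exec stage's opcode dispatch, including the branch resolution that is later compared with the Fetch-stage prediction. The `Op` enumeration, the operand layout and base + offset addressing are assumptions for illustration; the project's actual instruction encoding is not shown in these slides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode set; the real ISA encoding is not given in the deck.
enum class Op { Add, Sub, And, Or, Addi, Load, Store, Beq, Bne };

struct ExecResult {
    int32_t value = 0;          // ALU result, or effective address for Load/Store
    bool isBranch = false;
    bool taken = false;         // branch resolution
    bool mispredicted = false;  // resolution differs from the Fetch-stage prediction
};

ExecResult execute(Op op, int32_t a, int32_t b, int32_t imm, bool predictedTaken) {
    ExecResult r;
    switch (op) {
        case Op::Add:  r.value = a + b; break;
        case Op::Sub:  r.value = a - b; break;
        case Op::And:  r.value = a & b; break;
        case Op::Or:   r.value = a | b; break;
        case Op::Addi: r.value = a + imm; break;
        case Op::Load:
        case Op::Store: r.value = a + imm; break;  // base register + offset
        case Op::Beq:  r.isBranch = true; r.taken = (a == b); break;
        case Op::Bne:  r.isBranch = true; r.taken = (a != b); break;
    }
    if (r.isBranch) r.mispredicted = (r.taken != predictedTaken);
    return r;
}
```

The `mispredicted` flag is what would trigger the PC correction and the Dec & Exe flush described on the Branch Prediction slide.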

Memory 1
• Send a read / write request to the BRAM:
  • Write: the data is stored immediately
  • Read: wait for the response in the next cycle
• Otherwise, pass the incoming data on

Memory 2 (mem / skipmem)
Implemented in two rules:
• For a memory read: get the BRAM response
• Otherwise: pass the incoming struct on
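The MEM1/MEM2 split follows from the BRAM's one-cycle read latency. A hypothetical C++17 model (`BramModel`) makes that timing explicit: a write takes effect in the request cycle, while a read issued in MEM1 is collected in MEM2. It is a simplified behavioral assumption, not the Xilinx block RAM primitive, and it ignores the BRAM's read-during-write modes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Behavioral single-port RAM: writes land immediately, reads answer one cycle later.
class BramModel {
public:
    explicit BramModel(std::size_t words) : mem_(words, 0) {}

    // MEM1: issue at most one request per cycle.
    void request(bool write, uint32_t addr, int32_t data = 0) {
        assert(addr < mem_.size());
        if (write) {
            mem_[addr] = data;      // stored immediately
        } else {
            pending_ = mem_[addr];  // delivered on the next response() call
        }
    }

    // MEM2: collect the read response issued in the previous cycle, if any.
    std::optional<int32_t> response() {
        std::optional<int32_t> r = pending_;
        pending_.reset();
        return r;
    }

private:
    std::vector<int32_t> mem_;
    std::optional<int32_t> pending_;
};
```

In this model the "mem" rule corresponds to calling `response`, while the "skipmem" rule never touches the RAM and just forwards its struct.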

Writeback
• Save the needed data to the register file
• Register 0 is read-only
• Communication with the wrapper: data and statistics
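A sketch of the writeback target, assuming 32 registers and that register 0, being read-only, always reads as zero (as in MIPS-like ISAs). The register count, the zero value and the `RegisterFile` name are assumptions, not details confirmed by the slides.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 32-entry register file; writes to register 0 are dropped.
class RegisterFile {
public:
    int32_t read(unsigned idx) const {
        assert(idx < regs_.size());
        return regs_[idx];
    }
    void write(unsigned idx, int32_t value) {
        assert(idx < regs_.size());
        if (idx != 0) regs_[idx] = value;  // register 0 is read-only
    }
private:
    std::array<int32_t, 32> regs_{};       // all registers start at zero
};
```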

Branch Prediction
• 2-bit saturating local counters (initialized to WNT, weakly not taken)
• The prediction is acquired in the Fetch stage, then stored and passed along the pipeline
• The branch resolution is determined in the Exec stage, and the BP is updated accordingly
• Wrong prediction?
  • Correct the PC
  • Flush Dec & Exe
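The 2-bit saturating counter scheme above can be sketched in C++ as follows. The counter states follow the standard convention (0 = strongly not taken up to 3 = strongly taken, with WNT = 1 as the initial value); the 64-entry table and indexing by `pc % 64` are assumptions, since the table size is not given here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local 2-bit saturating counters indexed by low PC bits (table size assumed).
class TwoBitPredictor {
public:
    TwoBitPredictor() { table_.fill(1); }  // every counter starts at WNT

    // Fetch: predict taken in the two "taken" states (2 and 3).
    bool predict(uint32_t pc) const { return table_[index(pc)] >= 2; }

    // Exec: update with the resolved outcome, saturating at 0 and 3.
    void update(uint32_t pc, bool taken) {
        uint8_t& c = table_[index(pc)];
        if (taken && c < 3) ++c;
        else if (!taken && c > 0) --c;
    }

private:
    static constexpr std::size_t kEntries = 64;
    static std::size_t index(uint32_t pc) { return pc % kEntries; }
    std::array<uint8_t, kEntries> table_{};
};
```

The two-step hysteresis means a single anomalous outcome (for example, the last iteration of a loop) does not flip a strongly biased prediction.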

Forwarding
• 4 global forwarding registers, each containing (when valid) an address, a value and a cycle
• Writing: at the end of the Exec stage
• Reading: at the beginning of the Exec stage
• Invalidating: by aging, after the Exec stage

[Figure: pipeline datapath diagram]
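A behavioral sketch of the four global forwarding registers: each entry holds a valid bit, a register address, a value and the producing cycle; Exec reads the youngest matching entry, and entries expire by aging. The lifetime `kMaxAge = 4` cycles and round-robin replacement are assumptions, not values taken from the design.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

struct FwdEntry {
    bool valid = false;
    unsigned reg = 0;     // destination register address
    int32_t value = 0;
    uint64_t cycle = 0;   // cycle in which Exec produced the value
};

class ForwardingUnit {
public:
    static constexpr uint64_t kMaxAge = 4;  // hypothetical lifetime in cycles

    // End of Exec: record a result; round-robin overwrites the oldest slot.
    void write(unsigned reg, int32_t value, uint64_t cycle) {
        slots_[next_] = FwdEntry{true, reg, value, cycle};
        next_ = (next_ + 1) % slots_.size();
    }

    // Beginning of Exec: youngest valid producer of `reg`, if any.
    std::optional<int32_t> read(unsigned reg) const {
        const FwdEntry* best = nullptr;
        for (const FwdEntry& e : slots_)
            if (e.valid && e.reg == reg && (!best || e.cycle > best->cycle))
                best = &e;
        if (!best) return std::nullopt;
        return best->value;
    }

    // After Exec: invalidate entries old enough to be in the register file.
    void age(uint64_t now) {
        for (FwdEntry& e : slots_)
            if (e.valid && now - e.cycle >= kMaxAge) e.valid = false;
    }

private:
    std::array<FwdEntry, 4> slots_{};
    std::size_t next_ = 0;
};
```

Tagging each entry with its cycle is what lets the reader pick the newest of several writes to the same register.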

Forwarding – cont.
Special case: register read after a memory load
• Stalling registers hold the address of the register the load writes to
• If needed, stall the Exec stage by keeping the current command in the dec/exec FIFO

[Figure: pipeline datapath diagram]
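The load-use special case can be modeled as a stalling register plus a FIFO whose head is not dequeued while a hazard exists. In this hypothetical C++17 sketch, `LoadUseStall` tracks only one outstanding load, register 0 is assumed never to cause a hazard, and the caller is assumed to call `loadCompleted()` once the BRAM response can be forwarded.

```cpp
#include <cassert>
#include <deque>
#include <optional>

struct Instr {
    unsigned dest;        // destination register
    unsigned srcA, srcB;  // source registers
    bool isLoad;
};

// Hypothetical stalling register: destination of a load still waiting for its data.
class LoadUseStall {
public:
    void loadIssued(unsigned dest) { if (dest != 0) pending_ = dest; }
    void loadCompleted() { pending_.reset(); }
    bool mustStall(const Instr& next) const {
        return pending_ && (next.srcA == *pending_ || next.srcB == *pending_);
    }
private:
    std::optional<unsigned> pending_;
};

// One Exec-stage step: the dec/exec FIFO head is dequeued only when no stall
// is needed; otherwise it stays queued. Returns true if an instruction entered Exec.
bool execStep(std::deque<Instr>& decExec, LoadUseStall& hazard) {
    if (decExec.empty()) return false;
    const Instr head = decExec.front();  // copy before pop_front
    if (hazard.mustStall(head)) return false;
    if (head.isLoad) hazard.loadIssued(head.dest);
    decExec.pop_front();
    return true;
}
```

Leaving the command in the FIFO, rather than copying it into a separate hold register, means Decode naturally backs up behind it without extra control logic.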

The working environment
• Xilinx FPGA development board of the Virtex-5 family
• Programming the FPGA using JTAG
• Communication with the DUT over PCIE

The platform enables:
• Synthesis of the design to the FPGA
• Reading from and writing to memories
• Performance counters

The platform

SCEMI’s working methods
“Standard Co-Emulation Modeling Interface”
Two working methods:
• TCP/IP simulation
• FPGA emulation
SCEMI establishes communication between ports on the SW end and FIFOs on the HW end.
Parcels (data structs) are delivered in both directions.

System layers – PCIE emulation

[Figure: system layers. PC: Linux O.S., C++ Executable: TB, input files; link: PCIE; FPGA: SCEMI – DUT to PCIE, DUT: Wrapper, Datapath]

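System layers – TCP/IP simulation

[Figure: system layers. PC: Linux O.S., C++ Executable: TB, input files, connected over TCP/IP to DUT: Bsim_dut; the FPGA-side layers of the PCIE slide (SCEMI – DUT to PCIE, DUT: Wrapper, Datapath, PCIE) also appear]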

Our SCEMI platform – SW side
• A compiled C++ program (TB) is loaded with the input files
• It sends messages to and receives messages from the DUT using incoming / outgoing ports
• We chose to use a “Stop & Wait” protocol
• It performs the following actions:
  • Loads the DUT’s instruction memory
  • Loads the DUT’s register file
  • Signals the DUT to run
  • When done, collects the relevant information: the register file and the run statistics
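The TB's command sequence under a Stop & Wait protocol can be sketched as below: each parcel is sent only after the reply to the previous one has arrived. The real SCEMI port classes are not used; the `Port` callback, the `Cmd`/`Msg`/`Reply` formats and `StopAndWaitTb` are hypothetical stand-ins, and collecting run statistics is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical parcel formats exchanged with the DUT.
enum class Cmd { WriteImem, WriteReg, Run, ReadReg };
struct Msg { Cmd cmd; uint32_t addr; int32_t data; };
struct Reply { bool ack; int32_t data; };

class StopAndWaitTb {
public:
    using Port = std::function<Reply(const Msg&)>;  // send one parcel, block for the reply
    explicit StopAndWaitTb(Port port) : port_(std::move(port)) {}

    std::vector<int32_t> runProgram(const std::vector<uint32_t>& program,
                                    const std::vector<int32_t>& initRegs,
                                    unsigned regsToRead) {
        for (uint32_t a = 0; a < program.size(); ++a)
            send({Cmd::WriteImem, a, static_cast<int32_t>(program[a])});
        for (uint32_t r = 0; r < initRegs.size(); ++r)
            send({Cmd::WriteReg, r, initRegs[r]});
        send({Cmd::Run, 0, 0});  // the reply is assumed to arrive when the run is done
        std::vector<int32_t> regs;
        for (uint32_t r = 0; r < regsToRead; ++r)
            regs.push_back(send({Cmd::ReadReg, r, 0}).data);
        return regs;
    }

private:
    // Stop & Wait: never issue the next parcel before this one is acknowledged.
    Reply send(const Msg& m) {
        Reply rep = port_(m);
        assert(rep.ack);
        return rep;
    }
    Port port_;
};
```

In a unit test the `Port` can be a lambda acting as a fake DUT; on the real platform it would wrap a send on the outgoing port followed by a blocking receive on the incoming port. Stop & Wait trades throughput for simplicity: no sequence numbers or buffering are needed on either side.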

Our SCEMI platform – HW side
• Our top-level module (Wrapper, which is our DUT)
• Receives messages from and sends messages to the TB using FIFOs
• Contains the Datapath itself as a black box
• Performs commands from the TB:
  • Loads the instruction memory and the register file
  • Initializes all the registers and starts / stops the run of the datapath
  • Receives data from the datapath (from the WB stage) and relays it back to the TB

Putting the design to the test
As a concluding test, we wrote a Bubble Sort in assembly: it loads 10 unsorted numbers into memory, sorts them using bubble sort and displays them in the register file.
The code uses almost all of the instruction set, and practically every feature in the design.

    for (i = 0; i < length - 1; ++i) {
        for (j = 0; j < length - i - 1; ++j) {
            if (array[j] > array[j + 1]) {
                int tmp = array[j];
                array[j] = array[j + 1];
                array[j + 1] = tmp;
            }
        }
    }

Critical example – Bubble sort
• The program works correctly in the BSV simulation and in the TCP/IP simulation.
• The results are incorrect in the PCIE emulation.

Critical example – Bubble sort

[Figure: FPGA result vs. expected result (TCP/IP simulation)]
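When comparing the FPGA output with the expected result, a host-side golden model helps locate the first wrong element. This C++ sketch sorts the input with `std::sort`, which yields the same ascending order as the bubble sort above, and compares it with a register-file dump. `firstReg`, the register where the program is assumed to leave the smallest value, is a hypothetical parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first element that differs from the sorted input
// (or is missing from the dump), or -1 if the dump matches completely.
long firstMismatch(std::vector<int32_t> input,
                   const std::vector<int32_t>& regDump,
                   std::size_t firstReg) {
    std::sort(input.begin(), input.end());  // expected ascending order
    for (std::size_t i = 0; i < input.size(); ++i) {
        std::size_t r = firstReg + i;
        if (r >= regDump.size() || regDump[r] != input[i])
            return static_cast<long>(i);
    }
    return -1;
}
```

Reporting the first mismatching index, rather than a pass/fail bit, makes it easier to see whether errors cluster around consecutive stores.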

Isolating the problem
• Trying to isolate the problem: store 4 numbers, then read them back into the register file (4 ADDI, 4 STORE, 4 LOAD)
• Encountered unexplained, yet repeating, results
• This is only one of many debugging attempts

Isolating the problem

[Figure: expected result (consistent with simulation) vs. FPGA result, and the FPGA results when padding with 1 NOP between ADDI and ST, and with 2 or more NOPs]

Further investigation
Dismissing possible issues:
• Design fault: the design works flawlessly in simulation
• Clearing the design between runs
• Investigating the Xilinx compilation files: place and route margins are positive, with no noteworthy warnings
• Consulting with Danny Hofshi, Mony Orbach and Yuval H. Nacson

We were unable to solve the problem.

Problem characterization
• The FPGA differs in behavior from both the BSV and the TCP/IP simulations
• Related to the Store command (storing into the BRAM memory)
• Occurs when performing multiple stores in a row
• Xilinx reports show no timing warnings

Project usage and integration
• The project is designed modularly, so that it can easily be modified and enhanced in the future
• “Black Box” design
• Integration-oriented information and a step-by-step walkthrough for using the system are provided in a designated section of the project’s final report

Summary and conclusions
• Fine line between high- and low-level implementation
• Easy to write, modify and understand
• Excellent simulation environment
• Differences between simulation and FPGA
• Automatic optimization: good and bad

THANK YOU!