Spring 2018, EC513 Computer Architecture
Adaptive and Secure Computing Systems (ASCS) Laboratory
Department of Electrical and Computer Engineering, Boston University
Prof. Michel A. Kinsy
http://ascslab.org/courses/ec513/index.html
Class Project: BRISC-V© Extensions
Assigned March 28th, 2018
Milestones and due dates:
a. Project Proposal: April 5th, 2018
b. Mid-project (Phase I) Report: April 19th, 2018
c. Project Presentations and Final Project Report: May 1st, 2018
I. Introduction
The RISC-V instruction set was recently proposed by a group of researchers in the EECS Department of the University of California, Berkeley. The main purposes of the RISC-V ISA are summarized below:
• To have a completely free-access ISA for both academic and industrial activities.
• To support different variations in processor design, including 1) processor widths of 32 and 64 bits, 2) single-, multi-, and many-core designs, and 3) FPGA and ASIC implementations.
• To have a core set of base integer instructions that can be extended with other categories of instructions, allowing architects to include only the needed features.
• To support both user and supervisor modes of operation for the processor.
• To support variable-length instructions.
• To support custom instructions based on the specific tasks the processor is intended to run in specific fields.
The RISC-V instruction set has been used by researchers to test architectures relating to memory and cache sub-systems and to power and performance improvement, among others. There are also several open-source CPU designs, including the 64-bit Berkeley Out-of-Order Machine (BOOM), the 64-bit Rocket, five 32-bit Sodor CPUs from Berkeley, picorv32 by Clifford Wolf, scr1 from Syntacore, and our own open-source RISC-V processor implementation, called BRISC-V©. The parameterized BRISC-V© implementation, developed at the Adaptive and Secure Computing Systems laboratory (ASCS Lab) of Boston University, uses the RV32I version of the ISA.
II. Project Overview
In lecture, we have covered the key and time-tested concepts in computer architecture: pipelining, complex pipelining (Superscalar, Out-of-Order Execution, VLIW, Vector, Hardware Multi-threading, Branch Prediction, Speculative Execution, Caching, Memory Virtualization, Multi-core, etc.). All these concepts exploit one or more of these parallelism modalities: Instruction Level Parallelism (ILP), Data Level Parallelism (DLP) and Task Level Parallelism (TLP) to improve the cycles per instruction (CPI), a core processor performance metric. In this project, you will select one of these concepts, implement it, and optimize it to gain hands-on experience with architecture design trade-offs.
III. Design Base
To start off your design, we are providing you with the BRISC-V© single cycle processor base.
Figure 1: Canonical view of the single-cycle, non-pipelined BRISC-V© CPU.
BRISC-V© is part of the Heracles©¹ design platform, an open-source, functional, parameterized, synthesizable research and teaching tool for architectural exploration and hardware-software co-design. The provided BRISC-V© single cycle processor base comprises the soft hardware (HDL) modules and their testbeds, and an application compiler. It is designed with a high degree of modularity to support fast exploration of different architectural features and memory system organizations. It is a component-based framework with parameterized interfaces and a strong emphasis on module reusability. The compiler toolchain is used to map C-based applications onto the processor.
¹ http://ascslab.org/research/heracles/index.html
IV. Project Outline
Phase I: Pipeline the base version of the BRISC-V© processor and add a multi-level direct-mapped cache to the design.
Task 1: Pipelining the base version of the BRISC-V© processor
• You can select the number of pipeline stages (ideally between 2 and 10). In your report, explain clearly why you selected a given number of stages.
• We observed in lecture that the main difficulty in pipelining is pipeline hazards, in particular data dependencies between instructions in the pipeline. We discussed two possibilities for dealing with pipeline hazards: stalling and data forwarding. Stalling means that we stop issuing instructions when we detect that the next instruction to issue depends on the result of an in-flight instruction. An alternative to stalling is data forwarding (bypassing). Rather than waiting for the instruction to complete and commit its data to the register file, we forward the data from an earlier instruction in the pipeline directly to later instructions. You can elect to implement stalling or bypassing. Again, make sure to highlight the design choice in your report.
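To get a feel for the stall versus forwarding trade-off before writing any Verilog, the two policies can be compared with a small behavioral model. The sketch below is only an illustration under stated assumptions: a classic in-order five-stage pipeline (IF, ID, EX, MEM, WB), a register file that can be written and read in the same cycle, and two hypothetical instruction classes ("alu" and "load"). The names Instr and count_stalls are made up for this example and are not part of the BRISC-V© base.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    op: str                      # "alu" or "load" (hypothetical instruction classes)
    rd: Optional[int]            # destination register, or None
    rs1: Optional[int] = None    # first source register, or None
    rs2: Optional[int] = None    # second source register, or None


def count_stalls(program, forwarding):
    """Count bubble cycles for an assumed in-order IF/ID/EX/MEM/WB pipeline."""
    ready = {}      # register -> earliest cycle in which a consumer may sit in ID
    id_cycle = 0    # cycle in which the previous instruction was in ID
    stalls = 0
    for ins in program:
        earliest = id_cycle + 1          # without hazards, one instruction per cycle
        need = earliest
        for rs in (ins.rs1, ins.rs2):
            # x0 is hardwired to zero in RISC-V, so it never creates a dependency
            if rs is not None and rs != 0 and rs in ready:
                need = max(need, ready[rs])
        stalls += need - earliest
        id_cycle = need
        if ins.rd is not None and ins.rd != 0:
            if forwarding:
                # ALU results are bypassed into the next instruction's EX stage;
                # load data arrives one cycle later (one load-use bubble).
                ready[ins.rd] = id_cycle + (2 if ins.op == "load" else 1)
            else:
                # Producer writes back three cycles after its ID stage (assumed
                # same-cycle write-then-read register file).
                ready[ins.rd] = id_cycle + 3
    return stalls


# A short dependent sequence: each instruction uses the previous result.
program = [
    Instr("alu", 1, 2, 3),
    Instr("alu", 4, 1, 5),
    Instr("load", 6, 4),
    Instr("alu", 7, 6, 1),
]
print("stall only:", count_stalls(program, forwarding=False))
print("forwarding:", count_stalls(program, forwarding=True))
```

Under these assumptions the dependent chain costs six bubble cycles with stalling alone and one (the load-use case) with forwarding. A different stage count or register file timing changes the constants, which is the kind of reasoning the report should make explicit.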
Task 2: Multi-level direct-mapped cache design
• You can implement two or three levels of caches (L1, L2 and Main Memory or L1, L2, L3 and Main Memory). You should have two L1 caches (one for instructions and one for data), but a single inclusive L2 or L3.
• Describe your caching policies: Write-Back or Write-Through, and Write Allocate or No Write Allocate.
• You do not have to implement any cache coherence protocol.
• For this phase of the project, you are not required to have it working with the processor. But you should have a testbed implemented to test it.
• You can also use the Heracles© design base for inspiration on how you may want to implement your caches.
Phase II: Add one complex architecture feature to the design from Phase I.
Task 1: The memory system with the caches should be integrated and tested with the processor.
Task 2: Implement the additional feature.
Task 3: Evaluate your processor design performance (e.g., CPI, cache miss rates).
Project Timeline
1. Apr 5: Project Proposals due.
2. Apr 19: Mid-project (Phase I) reports due.
3. May 1: Project Presentations and Final Project Report due.
Deliverables
One group member uploads a .zip file titled [groupname]_project.zip (example: brisc_bros_project.zip) to the Blackboard assignment posting. For both the mid-project and final submissions, each group will submit (1) Verilog source code, (2) Verilog testbeds for each new or modified module, and (3) a 2-3 page report.
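For the direct-mapped cache of Phase I (Task 2) and the evaluation metrics of Phase II (Task 3), a software reference model can help when writing the Verilog testbed and the report. The following is a minimal sketch, not the required design: it tracks tags only (no data, no write policy, a single level), and the class name DirectMappedCache, its parameters, and the helper cpi are hypothetical names chosen for this example.

```python
class DirectMappedCache:
    """Tag-only model of a direct-mapped cache (hypothetical reference model)."""

    def __init__(self, num_lines, line_bytes):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # one tag per cache line, None = invalid
        self.accesses = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.line_bytes      # strip the byte offset
        index = block % self.num_lines       # line selected by the index bits
        tag = block // self.num_lines        # remaining upper bits
        self.accesses += 1
        hit = self.tags[index] == tag
        if not hit:
            self.misses += 1
            self.tags[index] = tag           # fill the line on a miss
        return hit

    def miss_rate(self):
        return self.misses / self.accesses if self.accesses else 0.0


def cpi(cycles, instructions):
    """Cycles per instruction for a completed run."""
    return cycles / instructions


# 4 lines of 16 bytes: addresses 0 and 64 map to the same line and conflict.
cache = DirectMappedCache(num_lines=4, line_bytes=16)
for addr in [0, 4, 64, 0, 16, 20]:
    cache.access(addr)
print("misses:", cache.misses, "of", cache.accesses)
print("CPI for 150 cycles and 100 instructions:", cpi(150, 100))
```

Feeding the same address trace to this model and to the Verilog testbed gives an expected hit/miss sequence to compare against. A multi-level design could be approximated by sending the misses of one instance to a second, larger instance.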
V. BRISC-V© Base
The provided BRISC-V© base is split into hardware and software directories. In the hardware folder there are two sub-folders named testbeds and src. The testbeds directory contains an individual testbed for each module of the BRISC-V© processor. The src sub-folder contains the Verilog files for each module of BRISC-V©.
In the software directory there are three folders called applications, compiler-scripts and riscv-compiler. There is also a file named compile.sh, which generates all the binary files in the binaries directory from the C programs in applications/src/, using the riscv-compiler. Editing compile.sh allows binaries to be generated for programs that are not provided. In the applications folder there are two sub-folders, binaries and src. In the src folder there are 12 sample C programs, and in the binaries folder there are the sample programs' .asm, .dump, .mem, .s, and .vmh versions. In compiler-scripts there are Perl scripts used to arrange the BRISC-V© program kernel. The folder riscv-compiler contains two folders named bin and libexec. libexec contains a number of empty sub-folders, while bin contains the RISC-V gcc tools used to compile programs and generate binaries for the RISC-V rv32 ISA.
Figure 2: Illustrative schematic describing the organization of the hardware and software folders.
Program compilation flow
To assist in developing software for the BRISC-V© processor, it is accompanied by a GCC RISC-V cross-compiler. Figure 3 depicts the software flow for compiling a C program into BRISC-V©-compatible instruction code that can be executed on the processor. The compilation process consists of a series of seven steps.
1. First, the user invokes riscv32-unknown-elf-gcc to translate the C code into assembly language (e.g., ./riscv32-unknown-elf-gcc -S fibonacci.c).
2. In step 2, the assembly code is run through the linker script to set up the stack pointer and return value registers (e.g., ./link.pl fibonacci.s). Its output is a .asm file.
3. In step 3, the user assembles the assembly file into an object file using the cross-compiler. This is accomplished by executing riscv32-unknown-elf-as on the .asm file (e.g., ./riscv32-unknown-elf-as fibonacci.asm -o fibonacci.o).
4. In step 4, all the jump addresses are properly linked with ./riscv32-unknown-elf-ld -N -Ttext 0x0004 --unresolved-symbols=ignore-all fibonacci.o -o fibonacci.
5. In step 5, the object file is disassembled using the riscv32-unknown-elf-objdump command (e.g., ./riscv32-unknown-elf-objdump fibonacci.o). Its output is a .dump file.
6. In step 6, the constructor script is called to transform the dump file into the Verilog memory .vmh file format (e.g., ./dump2vmh fibonacci.dump).
7. Finally, a second constructor script is called to transform the dump file into another Verilog memory file format, .mem (e.g., ./dump2mem fibonacci.dump). Different Verilog simulation or FPGA synthesis tools use different formats, i.e., .vmh or .mem. They contain the same data. Programs/applications that have some initial values/data stored in memory will also have a data file generated for them (e.g., data_fibonacci.vmh/mem).
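The seven steps can be summarized as a small helper that only builds the command strings for a given program name, without running anything. This is a sketch for orientation, not a replacement for compile.sh: the function name compile_commands is hypothetical, the tool names and arguments are copied from the steps, and the constructor names used for steps 6 and 7 (dump2vmh and dump2mem) follow the labels in Figure 3.

```python
def compile_commands(name):
    """Return the seven toolchain commands for the program `name`.c, in order."""
    return [
        # 1. C source -> assembly (.s)
        f"./riscv32-unknown-elf-gcc -S {name}.c",
        # 2. set up stack pointer and return value registers -> .asm
        f"./link.pl {name}.s",
        # 3. assembly -> object file (.o)
        f"./riscv32-unknown-elf-as {name}.asm -o {name}.o",
        # 4. resolve jump addresses -> linked object file
        f"./riscv32-unknown-elf-ld -N -Ttext 0x0004 "
        f"--unresolved-symbols=ignore-all {name}.o -o {name}",
        # 5. disassemble -> .dump
        f"./riscv32-unknown-elf-objdump {name}.o",
        # 6. dump -> Verilog memory file (.vmh)
        f"./dump2vmh {name}.dump",
        # 7. dump -> Verilog memory file (.mem)
        f"./dump2mem {name}.dump",
    ]


for step, command in enumerate(compile_commands("fibonacci"), start=1):
    print(step, command)
```

Printing the list for a program such as fibonacci gives a checklist to compare against what compile.sh actually does on your machine.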
Figure 3: Application compilation toolchain. Source code file (e.g., fibonacci.c) -> Compiler […elf-gcc] -> assembly code file (e.g., fibonacci.s) -> Linker Operation [link.pl] -> program start & result output assembly code (e.g., fibonacci.asm) -> Assembler […elf-as] -> object code file (e.g., fibonacci.o) -> Jump Linking […elf-ld] -> linked object code file (e.g., fibonacci) -> Disassembler […elf-objdump] -> dump file (e.g., fibonacci.dump) -> Constructor [dump2vmh] -> Verilog hex memory file (e.g., fibonacci.vmh), and Constructor2 [dump2mem] -> Verilog hex memory file (e.g., fibonacci.mem).
For script-based compilation, if you run ./compile.sh, it will take a set of predefined C applications/programs in the applications/src folder and compile all of them. If you would like to compile your own application (e.g., albert_s_beautiful_code.c) with your own stack pointer size (albert_s_stack, a decimal number), you can execute ./compile.sh albert_s_beautiful_code.c albert_s_stack (e.g., ./compile.sh foo.c 128).
Project Ideas
These are just suggested project ideas. Please feel free to be creative.
1. Branch Predictor
[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with a BTB that supplies a predicted target and predicted direction, a mux selecting between PC+4 (not-taken target) and the taken target, and a Mispredict Detection Unit with Flush and BTB alloc/update signals.]
2. Hardware Multi-threading (two hardware threads)
3. Memory Virtualization (TLB implementation)
4. Out-of-Order Execution
5. Dynamic Register Renaming Unit
6. Vector Extension
7. Superscalar (add additional execution units to the processor, e.g., a multiplication unit with the correct logic modification)
8. Very long instruction word (VLIW): two- or three-instruction bundles
9. Special co-processor (e.g., a Neural Network Accelerator directly connected to the processor)
General Notes: Although you can develop your processor design on any machine, we suggest you use the PHO307 machines that we provided, and strongly suggest that you test your design on them. Do not use the VM, just the native computer.
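[Figure: hardware multi-threading datapath with a thread-select signal, per-thread PCs and GPR files, and shared I$ and D$.]
[Figure: pipeline with multiple execution units (X1, X2, X3 stages, data memory, Fadd, Fmul, and an unpipelined FDiv divider) and separate GPRs and FPRs.]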
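As an illustration of project idea 1 (Branch Predictor), the behavior of a branch target buffer can be prototyped in software before committing to a hardware design. The sketch below is one possible scheme, not the expected solution: a direct-mapped BTB whose entries hold a tag, a target, and a 2-bit saturating counter, allocated when a branch is first taken. The class name BranchPredictor, the table size, and the example addresses are all hypothetical.

```python
class BranchPredictor:
    """Direct-mapped BTB with a 2-bit saturating counter per entry (a sketch)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries    # each slot: [tag, target, counter]

    def _slot(self, pc):
        word = pc >> 2                   # RV32I instructions are 4-byte aligned
        return word % self.entries, word // self.entries

    def predict(self, pc):
        """Return the predicted next PC for the instruction at `pc`."""
        index, tag = self._slot(pc)
        entry = self.table[index]
        if entry is not None and entry[0] == tag and entry[2] >= 2:
            return entry[1]              # predicted taken: use the stored target
        return pc + 4                    # predicted not taken

    def update(self, pc, taken, target):
        """Train the predictor with the resolved branch outcome."""
        index, tag = self._slot(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            if taken:                    # allocate only for taken branches
                self.table[index] = [tag, target, 2]
            return
        if taken:
            entry[1] = target
            entry[2] = min(3, entry[2] + 1)
        else:
            entry[2] = max(0, entry[2] - 1)


# A loop branch at 0x40 that jumps back to 0x20 nine times, then falls through.
bp = BranchPredictor()
mispredicts = 0
for taken in [True] * 9 + [False]:
    actual_next = 0x20 if taken else 0x44
    if bp.predict(0x40) != actual_next:
        mispredicts += 1
    bp.update(0x40, taken, 0x20)
print("mispredictions:", mispredicts)
```

In this example the predictor misses only the first iteration (no entry yet) and the loop exit. Each misprediction corresponds to a pipeline flush in hardware, so the count feeds directly into a CPI estimate.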
Setting up: To obtain the materials for the project, download the file at: https://bit.ly/2GhJIww
a. Compilation
You can extract the project by running:
$ tar -xzf ec513-project.tar.gz
Now you have the project base containing both the BRISC-V© processor written in Verilog (located in ec513-project/hardware) and the cross-compiler (located in ec513-project/software) that takes your C code and provides you with raw memory files to run on the bare processor. Along with the compiler, you will find some simple applications (located in ec513-project/software/applications) and associated scripts. Once you have extracted your project, compile the applications for BRISC-V©:
$ cd ec513-project/software
$ ./compile.sh
If everything went correctly, you should see a message:
COMPILATION SUCCESSFUL!
In the software/applications/binaries/ directory you should be able to find the *.vmh and *.mem files that you will use in your simulations.
b. Simulation
Step 1: Start ModelSim simulation environment. To run ModelSim, type the following in the terminal:
$ /ad/eng/opt/mentor/modelsim/modeltech/bin/vsim
A new window should appear, and it should display a popup. You can safely close the popup.
Step 2: Setting up the design library in ModelSim. In the main menu, click on the file menu item and on the change directory sub-item. A file explorer will appear and you will want to navigate to your_project_location/ec513-project/simulation/.
Step 3: Compiling Verilog sources. In the main menu, click on compile, and then compile. A popup should open, and you should navigate to your Verilog sources in ec513-project/hardware/src/. Select all of the Verilog sources, and click compile.
You may think that the compile button did not work, but a “create library” popup should have appeared, possibly behind the “compile source files” popup. When prompted on whether you want to create a library, click yes. Click compile, and then done.
Step 4: Load simulation. In the library pane, you should see multiple libraries. Expand the first one, work, and you should see all your Verilog files. Right click on the testbench tb_RISC_V_Core, and select the simulate menu item.
After clicking on simulate, a new simulation window should open.
Step 5: Load the binaries. Before we can run our processor, we need to populate the instruction memory with our binaries and, for some applications, the data memory as well. Assuming you have run the compile script, in the ec513-project/software/applications/binaries directory you should find some *.mem files. In the new window, click on the View menu item, and select the “memory list (w)” sub-menu item.
You will see three different memories: the fetch stage instruction memory, the decode stage register file, and the data memory. We want to populate the first one, IF/i_mem_interface/RAM/sram. If you right click on the first item and select “view contents”, a pane will open on the right-hand side. Notice that it is currently undefined: it is filled with x-es. Right click on the new pane, and select “Import Data Patterns”. A popup will appear. In the “Load Type” segment select “File only”, and in the “File format” segment select “Verilog hex”. Next, click on browse, navigate to ec513-project/software/applications/binaries/, and from there select the right binary, for example gcd.mem. Click OK, and in the memory pane you should see the contents of your .mem file in the instruction memory.
Step 6: Run the simulation. In the Transcript pane, go ahead and type “run 10000”. This command will run the simulation for 10000 ns.
We see that register 9 contains the value 16, which is the greatest common divisor of 64 and 48.
End Note: Make sure to manage your time properly and distribute the workload fairly.