1

Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

  • Upload
    others

  • View
    44

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Flexible Timing Simulation of RISC-V Processors with Sniper

Neethu Bal Mallya1, Cecilia Gonzalez-Alvarez2, Trevor E. Carlson1

1National University of Singapore, Singapore2Ghent University, Belgium

Page 2: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Outline

• Need for Simulation• Sniper Simulator Overview• Our enhancements to Sniper• Initial Processor Performance Analysis• Conclusion

2/6/2018 2

Page 3: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Why do we need Simulation?

2/6/2018 3

Performance analysis of next-generation systems

Pre-silicon software optimizations

Architecture design space exploration

Page 4: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Verilog/RTL High-level

Trade-offs in Simulation

2/6/2018 4

Page 5: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Verilog/RTL High-level

Trade-offs in Simulation

2/6/2018 5

Efficient MLP processor

Fetch Decode

Instr.Queue

BypassQueue

CommitWriteback

ALUsand

Caches

LearnSlice

Source: T.E.Carlson, et al., “Load Slice Core” [ISCA2015]

Page 6: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper Simulator – An Overview

• Parallel simulator based on Interval Simulation

2/6/2018 6

1Currently not supported for RISCV

• Models multi-/many-cores running multithreaded1

and multi-program workloads

• Hardware validated for x86

• Flexible simulation options

Page 7: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper – Beyond Traditional Simulation• Strong adoption in industry and academia

• 550+ citations• 800+ researcher downloads• 64+ countries

2/6/2018 7

• Actively used since 2011• Belgium-based team• Supports next generation Xeon Phi (KNL++)• HiPEAC TechTransfer Award

Page 8: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper – Key Differentiators

• Fast development time

• Enables Limit Studies• Branch Prediction• Memory Dependence Prediction• Shared Multi-level Cache Hierarchy

2/6/2018 8

Almost 10 MIPS

1000 cores

Average error of just 11%

with HW • High Performance and Scalability

Page 9: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper - Interacting with the Simulator

• Python interfaces

• SimAPI• Magic Instructions• SimROIStart() - SimROIEnd()

2/6/2018 9

$SN

IPER

_HO

ME/

incl

ude/

sim_a

pi.h

Page 10: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper - Interacting with the Simulator

• Energy Stats

2/6/2018 10

Run McPAT

Update the statistics

Page 11: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper - Interacting with the Simulator

• Loop Tracer

2/6/2018 11

cycles

inst

ruct

ions

Page 12: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper + RISC-V ecosystem

• RISC-V • Open, Extensible ISA• Collection of related software tools

2/6/2018 12

• Existing Architecture-level Software implementations• Functional simulators

• Many additional things

Spike rv8

Page 13: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Comparison with existing solutions

2/6/2018 13

Sniper + RISC-V gem5 (RISC5) FireSim / Chisel / Verilog

Development Methodology

C++ based (SW) C++ based (HW) RTL based (HW)

Dev-time +++ ++ +

Sim-time +++ ++ ++++/+/+

Simulationmodel

Cycle-level + Cycle-approximate

Cycle-level Cycle-exact + Cycle-approximate

Flexibility Ease-of-use / modification Requires RTL/ abstract models

Fidelity Sophisticated models require hardware validation

Cycle-exact models derived from synthesizable RTL

Page 14: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Simulation Flow

2/6/2018 14

RTL

Validation Only

Snip

er Final Check

Short TestingDev

Final Check

Short TestingDevelopmentRT

L

Page 15: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper Architecture

2/6/2018 15

Sniper Backend

L1

L1L2

L1

L1L2

Cache Performance

Models

Core Performance

Models

Core Model 1

Core Model 2

Core Model M-1

Core Model M

NoC

Thre

ad S

ched

uler

Decoder Library

SIFT 1

SIFT 2

SIFT M

SIFT M-1

SIFT pipesEmulation/ Binary Instrumentation

events thread 1

events thread 2

events thread M

events thread M-1

Sniper Frontend

Page 16: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How did we enhance Sniper?

2/6/2018 16

Sniper Backend

L1

L1L2

L1

L1L2

Cache Performance

Models

Core Performance

Models

Core Model 1

Core Model 2

Core Model M-1

Core Model M

NoC

Thre

ad S

ched

uler

Decoder Library

SIFT 1

SIFT 2

SIFT M

SIFT M-1

SIFT pipesEmulation/ Binary Instrumentation

events thread 1

events thread 2

events thread M

events thread M-1

Sniper Frontend

Page 17: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How did we enhance Sniper?

2/6/2018 17

Configuration filesto resemble a BOOM processor

4

RISC-V functional simulators - rv8 / Spike were updated to support SIFT generation

1

3 Core ModelParameters like description of ports/ functional units, latencies, etc. were updated

2 Decoder LibraryArchitectural agnostic methods

were added to implement the decoding phase of the processor

Backend……

SIFT pipes

Frontend

12

34

Page 18: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Sniper Instruction Trace File Format (SIFT)

2/6/2018 18

• Dynamic Instruction stream generated by the Frontend

Instruction Execution Order

Memory Addresses for Loads and Stores

Branch Directions (taken/not taken)

Executed/masked info for Predicated instructions

Dynam

ic

Page 19: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How to add new Frontend?

2/6/2018 19

Sift::Writer::InstructionCount()Sift::Writer::CacheOnly()Sift::Writer::Instruction()

// addresses, branch direction, etc.

Instruction Instrumentation

Control

Sift::Writer::Magic()

Page 20: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How to add new Frontend?

2/6/2018 20

rv8 / Spike

Page 21: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How to add new Frontend?

2/6/2018 21

Backend……

SIFT pipes

Frontend

SIFTrv8 / Spikerv8 / Spike

Sift::Writer

Page 22: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How to update Backend?

2/6/2018 22

$SNIPER_HOME/decoder_lib• Decoder Library• 2 classes

• Decoder• InstructionDecoded

$SNIPER_HOME/config• Config Files

$SNIPER_HOME/common/performance_model• Core Model

Page 23: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

How to run Sniper ?./run-sniper --frontend=[pin|dr|spike|rv8|legacy] --config

2/6/2018 23

[SNIPER] Start [SNIPER] --------------------------------------------------------------------------------[SNIPER] Sniper using SIFT/trace-driven frontend[SNIPER] Running full application in DETAILED mode[SNIPER] --------------------------------------------------------------------------------[SNIPER] Enabling performance models[SNIPER] Setting instrumentation mode to DETAILEDTrace Monitor Started[TRACE:0] -- DONE --[SNIPER] Disabling performance models[SNIPER] Leaving ROI after 18.26 secondsOUT: RUN: TraceThread[SNIPER] Simulated 5.0M instructions, 11.2M cycles, 0.45 IPC[SNIPER] Simulation speed 273.4 KIPS (273.4 KIPS / target core - 3657.1ns/instr)[SNIPER] Setting instrumentation mode to FAST_FORWARD[SNIPER] End[SNIPER] Elapsed time: 18.41 seconds

Page 24: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Experimental Setup

• Sniper multi-core simulator• Similar to BOOM v1 DefaultConfig

• Dispatch width:2, Issue Width:3, ROB:80• 32KB L1s, 1MB L2• 2.0GHz

• SPEC CPU2006 benchmarks• First 5M instructions

2/6/2018 24

Page 25: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Initial Processor Performance AnalysisTestcase IPC KIPS

470.lbm 0.15 97.899

444.namd 1 304.719

450.soplex 1.52 343.668

456.hmmer 2.71 523.41

462.libquantum 2.65 611.968

2/6/2018 25

130.

8713

0.96

Source: Tuan Ta, et. al, “Simulating Multi-Core RISC-V Systems in gem5”, [CARRV 2018]

Page 26: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Conclusion• An infrastructure extension of Sniper

• Sniper + RISC-V is now available

• Next steps• Improve the simulator features to allow for a detailed comparison with

cycle-level processor implementations

2/6/2018 26

Page 27: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

• Thank you• Download Today!

• http://snipersim.org/w/Download• Questions?

• http://groups.google.com/group/snipersim

2/6/2018 27

Page 28: Flexible Timing Simulation of RISC-V Processors with Sniper[SNIPER] Disabling performance models [SNIPER] Leaving ROI after 18.26 seconds OUT: RUN: TraceThread [SNIPER] Simulated 5.0M

Flexible Timing Simulation of RISC-V Processors with Sniper

Neethu Bal Mallya1, Cecilia Gonzalez-Alvarez2, Trevor E. Carlson1

1National University of Singapore, Singapore2Ghent University, Belgium