Transforming a FAST simulator into RTL implementation
Nikhil A. Patil & Derek Chiou
FAST Research group, University of Texas at Austin


Outline
• Research Goal
• Motivation
• Quick introduction to FAST
• Going from FAST to RTL
  – Data-path
  – Microcode Compiler
  – Golden Models
  – Optimizing to single-cycle
• Benefits
• Conclusions

Research Goal

• Simplify the design, development, and verification of computer systems

• Significantly reduce overall architecture, RTL, verification, software effort

• Eliminate wasted work; enable code-reuse


Motivation

Information duplication in traditional design flow

[Figure: traditional design flow. Boxes: Architectural Simulator; RTL; Verification; Low Accuracy Software Simulator; Compiler; Synthesis Flow; Software]

Pre-silicon S-RTL Bugs in Pentium 4

Bob Bentley, “Validating the Intel® Pentium® 4 Microprocessor”, DAC 2001

Vision of an ideal design flow

[Figure: ideal design flow. Boxes: Architectural & Micro-architectural Specification; Architectural Simulator; RTL; Verification; Software]

Shared specification reduces information duplication

Vision of an ideal design flow

• Single central source (“code-base”) for all of the following:
  – Architectural studies
  – Micro-architectural tuning
  – RTL implementation
  – RTL-level power modeling
  – RTL verification
  – Software development
• Note: For now, we don’t address anything beyond synthesizable RTL (physical design, etc.)

Overview of FAST


Points to note about FAST
• The FM (functional model) is ISA-specific, but micro-architecture agnostic
  – The trace sent from the FM to the TM (timing model) is ISA-specific, not micro-architecture specific; e.g., x86 opcode, not x86 microcode
• The TM implements a (potentially inaccurate) microcode table to “decode” the meaning of the trace
  – For a simpler ISA, the table is an identity mapping
• Currently, our FM can model x86 and PowerPC targets
• The TM is written in Bluespec SystemVerilog
• The TM is composed of modules connected with FAST Connectors, which manage latency, throughput and buffering (built upon the theory of Asim A-Ports)
• The FAST methodology itself does not introduce any inherent inaccuracies; all inaccuracies are due to lower-fidelity models (or bugs)
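The role of a connector that manages latency, throughput and buffering between two TM modules can be sketched in software. The following is a minimal Python sketch, not the actual FAST Connector interface (which is written in Bluespec); the class name, method names and parameters are all hypothetical and chosen only for illustration.

```python
from collections import deque


class Connector:
    """Hypothetical connector between two timing-model modules.

    This is an illustration only, not the FAST Connector API.
    """

    def __init__(self, latency, throughput, capacity):
        self.latency = latency        # target cycles before an item becomes visible
        self.throughput = throughput  # max items sent (and received) per target cycle
        self.capacity = capacity      # max items buffered inside the connector
        self.cycle = 0                # current target cycle
        self.queue = deque()          # entries are (ready_cycle, item)
        self.sent_this_cycle = 0
        self.received_this_cycle = 0

    def can_send(self):
        # Sending is blocked by either the per-cycle throughput or a full buffer.
        return (self.sent_this_cycle < self.throughput
                and len(self.queue) < self.capacity)

    def send(self, item):
        if not self.can_send():
            return False
        self.queue.append((self.cycle + self.latency, item))
        self.sent_this_cycle += 1
        return True

    def receive(self):
        # An item is delivered only once its latency has elapsed.
        if (self.received_this_cycle < self.throughput
                and self.queue
                and self.queue[0][0] <= self.cycle):
            self.received_this_cycle += 1
            return self.queue.popleft()[1]
        return None

    def tick(self):
        # Advance one target cycle and reset the per-cycle counters.
        self.cycle += 1
        self.sent_this_cycle = 0
        self.received_this_cycle = 0
```

With `Connector(latency=2, throughput=1, capacity=2)`, an item sent in target cycle 0 can be received no earlier than target cycle 2, and a second send in the same cycle is refused. The producing and consuming modules never need to know how many host cycles elapse between calls to `tick()`.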


Vision for FAST

• The single central codebase will consist of the following three sub-modules:
  – ISA simulator (C/C++)
  – Micro-op definition (C/C++)
  – Micro-architectural definition (Bluespec/C)
• Note that the information contained in each is mutually exclusive
  – Eliminates the possibility of inconsistency

From FAST to RTL

• Add data-paths to the timing model
  – ALU, cache data-stores, forwarding paths
• Magically move the ISA from the FM to the TM
• Detach trace-buffers; use the internal data-path
• TM modules: improve fidelity
  – At 100% fidelity, we have a Golden model
• TM modules: improve host/target-cycle ratio
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to the FAST Connector

Caveats

• Fidelity of the simulation models is transferred to the implementation

• Depending on the model fidelity, it may or may not be possible to run actual software on the implementation

• Use software that relies only on the subset of features supported with 100% fidelity; e.g., the software may need to avoid:
  – Self-modifying code
  – Unaligned accesses

From FAST to RTL

• Add Data-path
• Add Functionality
• Detach trace-buffers
• Improve fidelity
• Improve host performance

Data-path
• Assuming a sufficiently high-fidelity model:
  – Adding the data-path does not change the module interfaces significantly
  – It is simple enough to do manually (TASK)
• This process can sometimes unearth fidelity bugs in the simulator; e.g., not accounting for the limited number of ports on a register file
• The data-path can be trivially removed for simulation flows
• The data-path is also needed for power modeling of certain modules

`ifdef DATA_PATH
typedef Bit#(32) Data_t;   // data-path present: requests carry real data
`else
typedef Bit#(0) Data_t;    // simulation flow: zero-width data field
`endif

typedef struct {
    Bool   write;
    Addr_t addr;
    Data_t data;
} DCacheReq_t deriving (Bits, Eq);
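The idea of a data field whose width collapses to zero in simulation flows can be mimicked in software. This is a rough Python analogue, assuming a hypothetical `make_dcache_req` helper and a `data_path` flag; it only illustrates that the request shape (the "interface") stays the same whether or not data is carried.

```python
DATA_PATH = True  # hypothetical build-time switch, analogous to the macro above


def make_dcache_req(write, addr, data, data_path=DATA_PATH):
    """Build a data-cache request; the data field is zero bits wide when
    the data-path is disabled, so it always holds 0 in that case."""
    width = 32 if data_path else 0
    mask = (1 << width) - 1          # 0xFFFFFFFF with data-path, 0 without
    return {"write": write, "addr": addr, "data": data & mask}
```

Both variants return a request with the same three fields, so a module consuming the request does not need a different interface for the simulation-only flow.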


Functionality

• ISA simulation (in the FM) can be summarized as:
  – Fetch: fetches instructions, advancing the PC
    • Modeled in the TM already (with very high fidelity)
  – Decode: identifies an instruction with a function
    • Not modeled in the TM at all
    • Can be written manually or auto-generated (TASK)
  – Execute: calls the function
    • Corresponds to target microcode and data-path
    • Microcode needs to be made 100% accurate (TASK)
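The fetch/decode/execute summary above can be illustrated with a toy ISA simulator loop. This is a minimal Python sketch with an invented two-instruction ISA (`addi`, `add`); the function names and the state layout are assumptions for illustration and are unrelated to the actual FM.

```python
def op_addi(state, rd, rs, imm):
    # rd <- rs + immediate, wrapped to 32 bits
    state["regs"][rd] = (state["regs"][rs] + imm) & 0xFFFFFFFF


def op_add(state, rd, rs, rt):
    # rd <- rs + rt, wrapped to 32 bits
    state["regs"][rd] = (state["regs"][rs] + state["regs"][rt]) & 0xFFFFFFFF


# Decode table: identifies each (hypothetical) opcode with a function.
DECODE_TABLE = {"addi": op_addi, "add": op_add}


def fetch(state, program):
    """Fetch the instruction at the PC and advance the PC."""
    inst = program[state["pc"]]
    state["pc"] += 1
    return inst


def decode(inst):
    """Map the instruction to the function that implements it."""
    opcode, *operands = inst
    return DECODE_TABLE[opcode], operands


def execute(state, func, operands):
    """Call the function identified by decode."""
    func(state, *operands)


def run(program):
    state = {"pc": 0, "regs": [0] * 4}
    while state["pc"] < len(program):
        func, operands = decode(fetch(state, program))
        execute(state, func, operands)
    return state
```

For example, `run([("addi", 1, 0, 5), ("addi", 2, 0, 7), ("add", 3, 1, 2)])` leaves the registers as `[0, 5, 7, 12]`. In the transformation described here, the `execute` step is the part that would correspond to target microcode plus data-path.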


Microcode Compiler

• The Microcode Compiler (MCC) maps each instruction onto one or more micro-ops
• Takes two software (C/C++) simulators as its input:
  – ISA simulator (currently, Bochs)
  – Micro-op simulator
• Compiles the specification of each instruction/micro-op into a data-flow graph
• Uses exhaustive search to statically map instruction execution onto one or more micro-ops, based on a cost table
• In case of a failure, reports why a mapping is not possible
• Work in progress
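The exhaustive, cost-driven mapping step can be sketched as follows. This is a simplified Python illustration, not the MCC itself: it assumes each instruction has already been reduced to a linear sequence of primitive operations rather than a full data-flow graph, and the micro-op names, patterns and costs in the table are invented.

```python
# Hypothetical micro-op table: name -> (primitive operations performed, cost).
MICRO_OPS = {
    "uLOAD":    (("addr", "load"), 3),
    "uADD":     (("add",), 1),
    "uSTORE":   (("addr", "store"), 3),
    "uLOADADD": (("addr", "load", "add"), 5),
}


def map_instruction(prims, micro_ops=MICRO_OPS):
    """Exhaustively search for the cheapest micro-op sequence covering prims.

    Returns (total_cost, [micro-op names]). Raises ValueError naming the
    first primitive that no micro-op pattern can start at.
    """
    prims = tuple(prims)
    furthest = 0  # longest prefix that any attempted mapping managed to cover

    def search(pos):
        nonlocal furthest
        furthest = max(furthest, pos)
        if pos == len(prims):
            return (0, [])
        best = None
        for name, (pattern, cost) in micro_ops.items():
            # Try every micro-op whose pattern matches at this position.
            if prims[pos:pos + len(pattern)] == pattern:
                rest = search(pos + len(pattern))
                if rest is not None:
                    candidate = (cost + rest[0], [name] + rest[1])
                    if best is None or candidate[0] < best[0]:
                        best = candidate
        return best

    result = search(0)
    if result is None:
        raise ValueError(
            f"no micro-op pattern starts at position {furthest} "
            f"(primitive {prims[furthest]!r})"
        )
    return result
```

For an add-to-memory style instruction expressed as `("addr", "load", "add", "addr", "store")`, the search compares `uLOAD, uADD, uSTORE` (cost 7) against `uLOADADD, uSTORE` (cost 8) and returns the former. For a sequence containing an unsupported primitive, the raised error identifies where the mapping breaks down, mirroring the failure reporting described above.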

From FAST to RTL

• Add Data-path √
• Add Functionality √
• Detach trace-buffers
• TM modules: improve fidelity
  – At 100% fidelity, we have a Golden model
• TM modules: improve host/target-cycle ratio
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to the FAST Connector

Golden models
• A 100% cycle-accurate model
• May still take multiple FPGA cycles to model a single target cycle
• It is in fact a legitimate implementation
• Serves as a golden reference model for the next step (optimization) as well as for writing and debugging verification suites
• Traditionally, verification teams have written golden models from the architectural specs
• Likely to use FPGA structures efficiently
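A model can be fully cycle-accurate in target time while spending several host cycles per target cycle. The Python sketch below assumes a hypothetical target register file with two read ports that is modeled on a memory able to serve only one read per host cycle; the class and its counters are illustrative only.

```python
class TwoPortRegFileModel:
    """Models a 2-read-port target register file using a memory that can
    serve only one read per host cycle (hypothetical example)."""

    def __init__(self, values):
        self.mem = list(values)
        self.host_cycles = 0
        self.target_cycles = 0

    def _host_read(self, addr):
        # One physical read consumes one host cycle.
        self.host_cycles += 1
        return self.mem[addr]

    def target_cycle(self, addr_a, addr_b):
        # Both reads belong to the same target cycle, but they are
        # time-multiplexed over two host cycles.
        a = self._host_read(addr_a)
        b = self._host_read(addr_b)
        self.target_cycles += 1
        return a, b
```

After two calls to `target_cycle`, the model has consumed four host cycles for two target cycles, a 2:1 host/target-cycle ratio, yet the values returned in each target cycle are exactly what a true two-port register file would return.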


Optimizing to single-cycle

• Automatic transformation may be possible for some simple modules, using algorithms to
  – Unroll a “loop” in hardware
  – Collapse a multi-state FSM into a single state
• Can Bluespec help here?
• Manual optimization is certainly feasible
• Currently, FAST Connectors don’t allow this optimization (TASK)
  – The Connector interface cannot support modules that take exactly 1 host cycle for every target cycle
  – Work in progress
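Collapsing a multi-state FSM into a single state can be illustrated with a small software analogy. In this Python sketch (hypothetical names, not generated by any tool), `MultiCycleAdder` consumes one operand per host cycle, while `single_cycle_adder` evaluates the whole "loop" in one step, the way a statically unrolled chain of adders would in hardware.

```python
class MultiCycleAdder:
    """FSM that sums its operands one per host cycle (states 0..n-1)."""

    def __init__(self, operands):
        self.operands = list(operands)
        self.state = 0
        self.acc = 0
        self.done = len(self.operands) == 0

    def host_cycle(self):
        if self.done:
            return
        self.acc += self.operands[self.state]
        self.state += 1
        self.done = self.state == len(self.operands)


def single_cycle_adder(operands):
    """Collapsed version: all FSM states evaluated in one host cycle.

    The Python loop stands in for logic that would be unrolled statically
    into a combinational adder chain.
    """
    acc = 0
    for x in operands:
        acc += x
    return acc
```

Running the FSM on `[1, 2, 3, 4]` takes four host cycles and yields 10; the collapsed version yields the same result in a single step. Checking this kind of equivalence against the multi-cycle model is where a golden model would serve as the reference.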


From FAST to RTL

• Add Data-path √
• Add Functionality √
• Detach trace-buffers √
• TM modules: improve fidelity √
  – At 100% fidelity, we have a Golden model
• TM modules: improve host/target-cycle ratio √
  – At a 1:1 host/target-cycle ratio, we have RTL
  – Will need changes to the FAST Connector

Alternative path

• Design the original TM modules as 1-host-cycle implementations

• Automatically convert to n-host-cycle implementations for the simulator
  – Using Bluespec?
• Without automatic conversion, we would end up with RTL before the FAST simulator!
  – Almost like prototyping

Potential benefits
• Provides a way to verify FAST simulators
• Golden models can be generated for the verification teams
  – Verify the resulting implementation
• Provides a working implementation to RTL designers
  – Replace one component at a time
  – Provides a test-rig
  – Runs software
• Improves communication between teams
• Eliminates SIM-RTL calibration
• Potentially faster than the simulator
  – Early versions can be made available to the software team

Conclusions

• This technology provides a way to use a “single codebase” to meet a variety of needs from Simulation to Implementation to Verification.

• The single central codebase will consist of the following three sub-modules:
  – ISA simulator (C/C++)
  – Micro-op definition (C/C++)
  – Micro-architectural definition (Bluespec/C)