
Synthesis of Transaction-Level Models to FPGAs

Page 1: Synthesis of Transaction-Level Models to FPGAs

Synthesis of Transaction-Level Models to FPGAs

Prof. Jason Cong
Yiping Fan, Guoling Han, Wei Jiang, Zhiru Zhang

VLSI CAD Lab
Computer Science Department
University of California, Los Angeles

Page 2: Synthesis of Transaction-Level Models to FPGAs

Outline
• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • RDR/MCAS: our existing architectural synthesis approach
  • xPilot: ongoing synthesis infrastructure for TLM

Page 3: Synthesis of Transaction-Level Models to FPGAs

Outline
• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • RDR/MCAS: our existing architectural synthesis approach
  • xPilot: ongoing synthesis infrastructure for TLM

Page 4: Synthesis of Transaction-Level Models to FPGAs

SystemC Framework
• SystemC history
  • OO system/HW modeling and simulation
  • SystemC under development by CAD vendors/researchers
    • Synopsys
    • Frontier Design
    • CoWare (Belgium)
  • Released to public Sept. ’99
    • Open source distribution @ www.systemc.org
    • Version 2 out July ’01

Page 5: Synthesis of Transaction-Level Models to FPGAs

Channels and Modules
• Basic building blocks: module (class) instances, communicating via channel (class) instances
• Modules’ functionality coded as concurrent processes
  • Processes communicate via channels or events

Page 6: Synthesis of Transaction-Level Models to FPGAs

Communication Modeling in SystemC

Page 7: Synthesis of Transaction-Level Models to FPGAs

Primitive Channels in SystemC Library
• Ordinary signal (wire) of type <T>
  • Fill in data type T when instantiated
  • Point-to-point or multi-point (1 writer, n readers)
• Signal bus (arbitrary width)
• FIFO, for producer/consumer connection
• Pseudo-channels
  • Mutex & semaphore, for interprocess sync
  • Accessed using channel syntax
• Complex “hierarchical” channels composed of primitive channels, processes, modules
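The producer/consumer use of a FIFO channel can be sketched in plain C++ without the SystemC library. Everything here is hypothetical and for illustration only: `Fifo`, `try_write`, `try_read`, `Producer`, `Consumer`, and `run_producer_consumer` are assumed names, not SystemC classes, and a simple round-robin loop stands in for the simulation kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical bounded FIFO channel (an assumption, not the SystemC FIFO).
template <typename T>
class Fifo {
public:
    explicit Fifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }

    // Non-blocking write: returns false when the channel is full.
    bool try_write(const T& v) {
        if (buf_.size() == depth_) return false;
        buf_.push_back(v);
        return true;
    }

    // Non-blocking read: returns false when the channel is empty.
    bool try_read(T& out) {
        if (buf_.empty()) return false;
        out = buf_.front();
        buf_.pop_front();
        return true;
    }

private:
    std::size_t depth_;
    std::deque<T> buf_;
};

// "Module" whose single process writes 0, 1, ..., count-1 into its channel.
struct Producer {
    Fifo<int>& out;
    int next;
    int count;
    void step() {
        if (next < count && out.try_write(next)) ++next;
    }
};

// "Module" whose single process drains the channel one item per activation.
struct Consumer {
    Fifo<int>& in;
    std::vector<int> received;
    void step() {
        int v;
        if (in.try_read(v)) received.push_back(v);
    }
};

// Round-robin stand-in for a simulation kernel: alternate the two processes
// until the consumer has received every item.
std::vector<int> run_producer_consumer(int count, std::size_t depth) {
    Fifo<int> channel(depth);
    Producer p{channel, 0, count};
    Consumer c{channel, {}};
    while (static_cast<int>(c.received.size()) < count) {
        p.step();
        c.step();
    }
    return c.received;
}
```

Calling `run_producer_consumer(5, 2)` moves five integers through a depth-2 channel. The two modules never reference each other, only the channel, which is the separation of computation from communication that the slide describes.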

Page 8: Synthesis of Transaction-Level Models to FPGAs

Events and Processes
• Events: abstract occurrences used for
  • Process triggering (like VHDL sensitivity list)
  • Channel communication
  • Interprocess synchronization
• Process can call wait() to block on event
• Event occurrence tells simulator to schedule simulation of relevant process
• Process execution
  • Not called directly from your code
  • Triggered for simulation by events on ports, channels, or explicit named events
  • Registered in constructor of enclosing module (associate method with events)
• Thread process → infinite loop
  • Must call wait() to give up control
• Method process → runs to completion
  • Less scheduling overhead
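A minimal sketch of event-triggered, run-to-completion processes, written in plain C++ rather than SystemC. The `Kernel` and `Event` classes, `add_sensitive`, and `demo_event_trace` are hypothetical names introduced for illustration; a real simulator also handles time, delta cycles, and thread processes, which are left out here.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scheduler: holds processes that an event has made ready.
class Kernel {
public:
    void schedule(const std::function<void()>& proc) {
        assert(proc && "process must be callable");
        ready_.push(proc);
    }

    // Run until no process is ready; returns the number of activations.
    int run() {
        int activations = 0;
        while (!ready_.empty()) {
            std::function<void()> proc = ready_.front();
            ready_.pop();
            proc();  // method-style process: runs to completion
            ++activations;
        }
        return activations;
    }

private:
    std::queue<std::function<void()>> ready_;
};

// Hypothetical event: remembers which processes are sensitive to it.
class Event {
public:
    explicit Event(Kernel& k) : kernel_(k) {}

    // Registration step, comparable to associating a method with an event.
    void add_sensitive(std::function<void()> proc) {
        procs_.push_back(std::move(proc));
    }

    // An occurrence of the event schedules every sensitive process.
    void notify() {
        for (const auto& p : procs_) kernel_.schedule(p);
    }

private:
    Kernel& kernel_;
    std::vector<std::function<void()>> procs_;
};

// Two processes chained by events; user code only notifies the first event.
std::vector<std::string> demo_event_trace() {
    Kernel kernel;
    Event request(kernel), done(kernel);
    std::vector<std::string> trace;
    request.add_sensitive([&] { trace.push_back("worker"); done.notify(); });
    done.add_sensitive([&] { trace.push_back("monitor"); });
    request.notify();
    kernel.run();
    return trace;
}
```

Neither process is called directly by `demo_event_trace`; both are run by the kernel loop after an event makes them ready, mirroring the "not called directly from your code" point above.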

Page 9: Synthesis of Transaction-Level Models to FPGAs

Data Types in SystemC
• SystemC supports
  • Native C/C++ types
  • SystemC types
• SystemC types: data types for system modeling
  • 2-value (‘0’, ‘1’) logic/logic vector
  • 4-value (‘0’, ‘1’, ‘Z’, ‘X’) logic/logic vector
  • Arbitrary-sized integer (signed/unsigned)
  • Fixed-point types (templated/untemplated)
• Objective: to reflect HW registers & ALU operations
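To make the hardware-oriented types concrete, here is a small plain C++ sketch of two of the ideas: 4-value logic and fixed-width register behavior. The names `Logic`, `logic_and`, and `truncate_to_width` are assumptions for illustration and are not the SystemC type names.

```cpp
#include <cassert>
#include <cstdint>

// Four logic values: 0, 1, high impedance (Z), unknown (X).
enum class Logic { Zero, One, Z, X };

// AND over 4-value logic: a 0 input forces 0, two 1 inputs give 1,
// and any other combination (involving Z or X) is unknown.
Logic logic_and(Logic a, Logic b) {
    if (a == Logic::Zero || b == Logic::Zero) return Logic::Zero;
    if (a == Logic::One && b == Logic::One) return Logic::One;
    return Logic::X;
}

// Keep only the low `width` bits of a value, the way a register of that
// width drops the upper bits of a wider result.
std::uint64_t truncate_to_width(std::uint64_t value, unsigned width) {
    assert(width >= 1 && width <= 64);
    if (width == 64) return value;
    return value & ((std::uint64_t{1} << width) - 1);
}
```

For example, `truncate_to_width(300, 8)` yields 44, since an 8-bit register keeps 300 modulo 256, and `logic_and(Logic::One, Logic::Z)` yields `Logic::X`.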

Page 10: Synthesis of Transaction-Level Models to FPGAs

Functional Level and RTL Modeling in SystemC
• Functional level
  • Sequential, algorithmic, software-like
  • Explore HW/SW architectures, proof of algorithms, performance modeling & analysis
• Register transfer level
  • Complete detailed functional description of hardware
    • Every register, bus, bit for every clock cycle
    • Use C++ switch/case for FSM implementation
  • At this point, can switch to HDL, but staying in SystemC leverages test benches
  • Prepare for HW synthesis step by using only synthesizable constructs
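The switch/case FSM style mentioned above might look like the following plain C++ sketch. The three-state controller, the `Fsm` struct, and `cycles_until_done` are hypothetical examples, not taken from the slides; each call to `clock` models one clock edge.

```cpp
#include <cassert>

enum class State { Idle, Busy, Done };

// Hypothetical controller: waits for start, works for a fixed number of
// cycles, then raises done for one cycle.
struct Fsm {
    State state = State::Idle;
    int counter = 0;     // remaining work cycles
    bool done = false;   // registered output

    // One call models one rising clock edge.
    void clock(bool start, int work_cycles) {
        assert(work_cycles > 0);
        switch (state) {
        case State::Idle:
            done = false;
            if (start) {
                counter = work_cycles;
                state = State::Busy;
            }
            break;
        case State::Busy:
            --counter;
            if (counter == 0) state = State::Done;
            break;
        case State::Done:
            done = true;
            state = State::Idle;
            break;
        }
    }
};

// Count clock edges from asserting start until done is observed.
int cycles_until_done(int work_cycles) {
    Fsm fsm;
    int cycles = 0;
    bool start = true;
    while (!fsm.done) {
        fsm.clock(start, work_cycles);
        start = false;  // start is a one-cycle pulse
        ++cycles;
    }
    return cycles;
}
```

With this state encoding, `cycles_until_done(3)` takes five clock edges: one to leave Idle, three in Busy, and one in Done. Every register update is explicit per cycle, which is what distinguishes this level from the functional level.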

Page 11: Synthesis of Transaction-Level Models to FPGAs

Transaction Level Modeling in SystemC
• Transaction level
  • Model includes architectural components
  • Maintain component interface accuracy
    • E.g., buses modeled as channels (read/write operations)
  • Behavioral style inside a component
  • Simulates 100-10,000x faster than RTL
  • Provide execution platform for SW development

Page 12: Synthesis of Transaction-Level Models to FPGAs

TLM – Raise the Level of Architectural Modeling
• What is TLM?
  • Communication uses function calls
    • burst_read(char* buf, int addr, int len);
• Why is TLM interesting?
  • Simulation: fast and compact
  • Integrate HW and SW models
  • Early platform for SW development
  • Early system exploration and verification
  • Verification reuse
  • Synthesis …
• Reference: www.systemc.org
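As a rough illustration of communication by function call, the sketch below puts the `burst_read(char* buf, int addr, int len)` signature from the slide behind an abstract interface. The rest is assumed for the example: `BusIf`, `Memory`, `burst_write`, `transactions`, and `round_trip` are hypothetical names, and the code uses plain C++ rather than the SystemC TLM library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical transaction-level bus interface. Only burst_read comes from
// the slide; burst_write is an assumed counterpart.
struct BusIf {
    virtual ~BusIf() = default;
    virtual void burst_read(char* buf, int addr, int len) = 0;
    virtual void burst_write(const char* buf, int addr, int len) = 0;
};

// Target model: a flat memory that serves each burst as one function call,
// with no pin-level or cycle-level detail.
class Memory : public BusIf {
public:
    explicit Memory(std::size_t size) : data_(size, 0) {}

    void burst_read(char* buf, int addr, int len) override {
        check(addr, len);
        std::memcpy(buf, data_.data() + addr, static_cast<std::size_t>(len));
        ++transactions_;
    }

    void burst_write(const char* buf, int addr, int len) override {
        check(addr, len);
        std::memcpy(data_.data() + addr, buf, static_cast<std::size_t>(len));
        ++transactions_;
    }

    int transactions() const { return transactions_; }

private:
    void check(int addr, int len) const {
        assert(addr >= 0 && len >= 0);
        assert(static_cast<std::size_t>(addr) + static_cast<std::size_t>(len)
               <= data_.size());
    }

    std::vector<char> data_;
    int transactions_ = 0;
};

// Initiator: writes a message through the interface and reads it back.
std::string round_trip(BusIf& bus, const std::string& msg, int addr) {
    bus.burst_write(msg.data(), addr, static_cast<int>(msg.size()));
    std::string out(msg.size(), '\0');
    bus.burst_read(&out[0], addr, static_cast<int>(out.size()));
    return out;
}
```

The initiator depends only on `BusIf`, so the same `round_trip` code could later talk to a more detailed bus model. A whole burst costs one call here, which is one reason such models simulate much faster than RTL.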

Page 13: Synthesis of Transaction-Level Models to FPGAs

Typical Design Flow Using TLM
• Functional model
  • Captures system behaviour
• TLM, Transaction Level Model
  • Bus transactions
  • Accurate interaction with SW portion
  • Simulates rapidly
• Can create TLM model initially

Page 14: Synthesis of Transaction-Level Models to FPGAs

Introduction of Metropolis
• A UCB and GSRC project, http://www.gigascale.org/metropolis/
• Platform-based design [ASV]
  • Platforms have sufficient flexibility to support a series of applications/products
  • Choose a platform by design space exploration
  • Above two require models to be reusable
• Orthogonalization of concerns
  • Computation vs. Communication
  • Behavior vs. Coordination
  • Behavior vs. Architecture
  • Capability vs. Cost

Page 15: Synthesis of Transaction-Level Models to FPGAs

Metropolis Meta Model
• A combination of imperative program and declarative constraints
• Imperative program:
  • objects (process, media, quantity, statemedia)
  • netlist
  • await block and label
  • interface function call
  • quantity annotation
• Declarative constraints
  • Linear Temporal Logic (LTL) (synch)
  • Logic of Constraints (LOC)

Page 16: Synthesis of Transaction-Level Models to FPGAs

A Metropolis Design Tutorial

[Figure: block diagram with labels MyMapNetlist, MyFncNetlist, M, P1, P2, Env1, Env2]

Page 17: Synthesis of Transaction-Level Models to FPGAs

A Metropolis Design Tutorial

[Figure: block diagram with labels MyMapNetlist, MyFncNetlist (M, P1, P2, Env1, Env2), MyArchNetlist (mP1, mP2, Cpu, OsSched, Bus, Arbiter, Mem), Y2Twrite(), T2Yread(), Th,Wk]

B(P1, M.write) <=> B(mP1, mP1.writeCpu); E(P1, M.write) <=> E(mP1, mP1.writeCpu);
B(P1, P1.f) <=> B(mP1, mP1.mapf); E(P1, P1.f) <=> E(mP1, mP1.mapf);
B(P2, M.read) <=> B(mP2, mP2.readCpu); E(P2, M.read) <=> E(mP2, mP2.readCpu);
B(P2, P2.f) <=> B(mP2, mP2.mapf); E(P2, P2.f) <=> E(mP2, mP2.mapf);

Page 18: Synthesis of Transaction-Level Models to FPGAs

Outlook of the First Metropolis Release

[Figure: Metropolis release components, with labels: meta model infrastructure, meta model language, front end, abstract syntax trees, back end 1 to back end N, SystemC simulation, SPIN interface, LOC checking, meta model debugger]

• Sample architectural libraries:
  • coarse-simple cpu, bus, memory, arbiters
  • time quantity
• Sample MoC:
  • multi-media (Yapi, TTL)
  • Synchronous
• A design tutorial
• http://www.gigascale.org/metropolis/

Page 19: Synthesis of Transaction-Level Models to FPGAs

TLM Conclusions
• SystemC is the de facto system-level-design standard
  • Pushed by many CAD tool vendors
  • Used widely in industry and academia
    • E.g., Intel handheld system project [ICCAD’04]
  • Unified language to model a system at different levels
  • Improving path to HW synthesis from SystemC source code
  • Fits with trend to take system design to higher level
• Metropolis is a novel academic framework of model of computation
  • Capable of representing TLM as well
  • Provides a comprehensive starting point of synthesis

Page 20: Synthesis of Transaction-Level Models to FPGAs

Outline
• Transaction-level model (TLM)
  • SystemC TLM
  • Metropolis Meta Model
• Synthesis from TLM
  • xPilot: our ongoing synthesis infrastructure for TLM
  • RDR/MCAS: our existing architectural synthesis approach

Page 21: Synthesis of Transaction-Level Models to FPGAs

xPilot: TLM to RTL Synthesis Flow

[Figure: flow diagram with labels: TLM in SystemC/Metropolis, Frontend, SSDM, RTL, FPGAs]

• Arch-independent passes
  • SSDM checking
  • Loop unrolling/pipelining
  • Strength reduction/bitwidth analysis
  • Speculative-execution transformation …
• Arch-dependent passes
  • Scheduling/binding
  • Memory analysis/allocation
  • Register/port binding
  • Traditional/low power/RDR-pipe or placement driven …
• Arch-generation passes: RTL/constraints generation
  • Verilog/VHDL/SystemC
  • Altera/Xilinx
  • General/Synopsys/Magma …

Page 22: Synthesis of Transaction-Level Models to FPGAs

Integration of xPilot with Metropolis

[Figure: block diagram with labels: meta model infrastructure, meta model language, front end, abstract syntax trees, SystemC simulation, LOC checking, SPIN interface, synthesis, xPilot/SSDM, HW implementation (FPGA, ASICs, …), IP assembly, predictable RTL synthesis, RTL handoff (RTL, timing constraints, physical constraints), latency insensitive design, GALS, RDR/MCAS, IP library, compilation for RP, simulation, extended instruction, reconfigurable interconnect, reconfigurable coprocessor]

Page 23: Synthesis of Transaction-Level Models to FPGAs

SSDM Zoomed In – CDFG

if (cond1) bb1();
else bb2();
bb3();
switch (test1) {
case c1: bb4(); break;
case c2: bb5(); break;
case c3: bb6(); break;
}
bb7();

[Figure: control flow graph of the code above. cond1 branches on T/F to bb1() or bb2(), which join at bb3(); test1 selects bb4(), bb5(), or bb6() through c1, c2, c3, which join at bb7()]

• 2-level CDFG representation
  • 1st level: control flow graph
  • 2nd level: data flow graph
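One possible in-memory shape for a two-level CDFG is sketched below in plain C++. This is not the SSDM data structure: `DfgOp`, `CfgEdge`, `BasicBlock`, `Cdfg`, `build_slide_cdfg`, and `count_paths` are hypothetical names, and the data-flow operations placed in bb1 are invented purely to show the second level.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Second level: one operation in a basic block's data flow graph.
// `operands` holds indices of earlier operations in the same block.
struct DfgOp {
    std::string opcode;
    std::vector<int> operands;
};

// First level: a control-flow edge, labelled with its branch condition.
struct CfgEdge {
    int target;
    std::string label;
};

struct BasicBlock {
    std::string name;
    std::vector<DfgOp> dfg;      // data flow graph inside the block
    std::vector<CfgEdge> succs;  // control flow out of the block
};

struct Cdfg {
    std::vector<BasicBlock> blocks;

    int add_block(const std::string& name) {
        blocks.push_back(BasicBlock{name, {}, {}});
        return static_cast<int>(blocks.size()) - 1;
    }

    void add_edge(int from, int to, const std::string& label = "") {
        assert(from >= 0 && static_cast<std::size_t>(from) < blocks.size());
        assert(to >= 0 && static_cast<std::size_t>(to) < blocks.size());
        blocks[static_cast<std::size_t>(from)].succs.push_back(CfgEdge{to, label});
    }
};

// Control flow of the slide's example. As in the slide figure, no default
// edge is modelled for the switch.
Cdfg build_slide_cdfg() {
    Cdfg g;
    const int cond1 = g.add_block("cond1");
    const int bb1 = g.add_block("bb1");
    const int bb2 = g.add_block("bb2");
    const int bb3 = g.add_block("bb3");  // ends with the switch on test1
    const int bb4 = g.add_block("bb4");
    const int bb5 = g.add_block("bb5");
    const int bb6 = g.add_block("bb6");
    const int bb7 = g.add_block("bb7");
    g.add_edge(cond1, bb1, "T");
    g.add_edge(cond1, bb2, "F");
    g.add_edge(bb1, bb3);
    g.add_edge(bb2, bb3);
    g.add_edge(bb3, bb4, "c1");
    g.add_edge(bb3, bb5, "c2");
    g.add_edge(bb3, bb6, "c3");
    g.add_edge(bb4, bb7);
    g.add_edge(bb5, bb7);
    g.add_edge(bb6, bb7);
    // Invented data flow inside bb1: two loads feeding an add.
    g.blocks[static_cast<std::size_t>(bb1)].dfg = {
        {"load", {}}, {"load", {}}, {"add", {0, 1}}};
    return g;
}

// Number of distinct control paths between two blocks (acyclic graphs only).
int count_paths(const Cdfg& g, int from, int to) {
    if (from == to) return 1;
    int total = 0;
    for (const CfgEdge& e : g.blocks[static_cast<std::size_t>(from)].succs)
        total += count_paths(g, e.target, to);
    return total;
}
```

For the slide's example, `count_paths(g, 0, 7)` gives 6: two ways through the if/else times three ways through the switch. Keeping the data flow graph inside each block lets block-local analyses run without touching the control structure.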

Page 24: Synthesis of Transaction-Level Models to FPGAs

SSDM Features Different from Software IR
• Top-level: netlist of concurrent processes
• Process port/interface semantics
  • FIFO: FifoRead() / FifoWrite()
  • BUFF: BuffRead() / BuffWrite()
  • Memory: MemRead() / MemWrite()
• Bit vector manipulation
  • Bit extraction / concatenation / insertion
  • Bit-width property for every value
• Cycle-level notation
  • Scheduling / binding information / delay
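The three bit-vector operations named above can be illustrated on 64-bit words in plain C++. The function names `mask`, `bit_extract`, `bit_concat`, and `bit_insert` are assumptions for this sketch and are not SSDM operation names; bit positions are numbered from 0 at the least significant bit.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `width` bits set.
std::uint64_t mask(unsigned width) {
    assert(width <= 64);
    return width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
}

// Extraction: return bits hi..lo (inclusive) of v, right-aligned.
std::uint64_t bit_extract(std::uint64_t v, unsigned hi, unsigned lo) {
    assert(lo <= hi && hi < 64);
    return (v >> lo) & mask(hi - lo + 1);
}

// Concatenation: place `high` above a `low_width`-bit wide `low` value.
std::uint64_t bit_concat(std::uint64_t high, std::uint64_t low, unsigned low_width) {
    assert(low_width >= 1 && low_width < 64);
    return (high << low_width) | (low & mask(low_width));
}

// Insertion: overwrite bits hi..lo of v with the low bits of `field`.
std::uint64_t bit_insert(std::uint64_t v, unsigned hi, unsigned lo, std::uint64_t field) {
    assert(lo <= hi && hi < 64);
    const std::uint64_t m = mask(hi - lo + 1) << lo;
    return (v & ~m) | ((field << lo) & m);
}
```

For instance, `bit_extract(0xABCD, 11, 4)` returns `0xBC`, and `bit_concat(0xA, 0x5, 4)` returns `0xA5`. In a hardware IR these operations are wiring only, which is why tracking a bit-width for every value matters more than in a software IR.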

Page 25: Synthesis of Transaction-Level Models to FPGAs

Our Architectural Synthesis Approaches – RDR / MCAS
• Consideration of multi-cycle communication during architectural (or behavioral) synthesis
  • Regular Distributed Register (RDR) micro-architecture [Cong et al, ISPD’03]
    • Highly regular
    • Direct support of multi-cycle on-chip communication
  • MCAS: Architectural Synthesis for Multi-cycle Communication
    • Efficiently maps the behavioral descriptions to RDR uArch
    • Integrates architectural synthesis (e.g. resource binding, scheduling) with physical planning

Page 26: Synthesis of Transaction-Level Models to FPGAs

RDR/MCAS: Support for Heterogeneous Integration with Multi-cycle Communication & Automatic Interconnect Pipelining
• Distribute registers to each “island”
• Choose the island size such that
  • Single cycle for intra-island computation and communication
  • Multi-cycle communication between islands
• Support interconnect pipelining
  • Inter-island pipeline register station (PRS) for global communications
  • PRS performs autonomous store-and-forward
• MCAS: Multi-cycle architectural synthesis integrated with global placement
• Experimental results
  • MCAS vs. conventional flow:
    • 36% reduction in clock period and
    • 30% reduction in total latency
  • MCAS-Pipe vs. MCAS:
    • 28.8% long global wirelength reduction
    • 19.3% total wirelength reduction
• Can also support IP integration using latency insensitive technique [Carloni, ICCAD’99]

[Figure: RDR island grid with labels: islands 1 to 4, LCC, FSM, Reg. File, H channel, V channel, Pipeline Register Station (PRS), IP Library, Adaptor]

Page 27: Synthesis of Transaction-Level Models to FPGAs

Synthesis Flow: MCAS-Pipe System

[Figure: flow diagram with labels: C / VHDL, CDFG generation, CDFG, resource allocation & functional unit binding, ICG, scheduling-driven placement, locations, placement-driven rescheduling & rebinding, global interconnect sharing, register and port binding, datapath & FSM generation, RTL VHDL & floorplan constraints]

• Global interconnect sharing
  • Enable multiple data communications to share one physical link (a wire with pipeline registers)

Page 28: Synthesis of Transaction-Level Models to FPGAs

Related Publications
• Regular distributed register (RDR) architecture and MCAS synthesis algorithms
  • ISPD’03, ICCAD’03
• RDR-Pipe and MCAS-Pipe synthesis algorithms
  • DAC’04
• Lopass: high-level synthesis for low-power FPGAs
  • ISLPED’03
• Multiplexor optimization through register/port binding
  • ASPDAC’04
• Bitwidth-aware scheduling and binding algorithms
  • ASPDAC’05

Page 29: Synthesis of Transaction-Level Models to FPGAs

Conclusions
• Higher-level abstraction is needed in the current SO(P)C design flow
• SystemC is becoming the SLD standard; TLM in particular is widely used
• Metropolis is a platform-based design framework
• It is time to build a new generation of behavioral synthesis systems from TLM
• xPilot:
  • Ongoing project
  • An architectural synthesis infrastructure from TLM to RTL (FPGAs)