25
1 Copyright © Bluespec Inc. 2005-2008 Bluespec SystemVerilog™ Training Lecture 01: Introduction L01 - 2 Copyright © Bluespec Inc. 2005-2008 Because of its expressive power (comparable to the most advanced programming languages), full synthesizability, and quality of synthesis, Bluespec is fundamentally changing long-held assumptions about design flow: What must be done in SW vs. in HW How to achieve early, fast simulation for spec development, early SW development, architectural exploration and verification How to more closely and formally link specs to implementations How to make testbenches, models and designs more maintainable and reusable ... and more

Bluespec SystemVerilog™ Training Lecture 01: Introduction

  • Upload
    vuxuyen

  • View
    223

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Bluespec SystemVerilog™ Training Lecture 01: Introduction

1

Copyright © Bluespec Inc. 2005-2008

Bluespec SystemVerilog™ Training

Lecture 01: Introduction

L01 - 2Copyright © Bluespec Inc. 2005-2008

Because of its expressive power (comparable to the most advancedprogramming languages), full synthesizability, and quality of synthesis, Bluespec is fundamentally changing long-held assumptions about design flow:

• What must be done in SW vs. in HW

• How to achieve early, fast simulation for spec development, early SW development, architectural exploration and verification

• How to more closely and formally link specs to implementations

• How to make testbenches, models and designs more maintainable and reusable

• ... and more

Page 2: Bluespec SystemVerilog™ Training Lecture 01: Introduction

2

L01 - 3Copyright © Bluespec Inc. 2005-2008

BSV’s unique advantages, at a glance

“BSV” = Bluespec SystemVerilog

(1) Familiar module hierarchy from Verilog �precise, transparent control over

architecture, unlike C, C++, SystemC.

(2) Atomic Transactions (rules) instead of Verilog ‘always blocks’and SystemC ‘sc_threads’� automatically generate control logic for complex, concurrent access to shared resources

(3)Atomic Methods instead of Verilog ‘port lists’ and SystemC ‘methods’� reduces/ eliminates interface-related bugs/race conditions and integration issues, promoting reuse

(6) Rich, parameterized libraries of

interconnect IP (TLM 2.0, OCP, AXI, AHB, ...)� Very quick development of models, designs, and testbenches

(4)More powerful Types and

Strong Type-checking than Verilog or SystemC� statically eliminates representational errors, enforces clock domain discipline.

(5) More powerful Parameterization

and Generates than Verilog or SystemC. E.g., switch modulecan be parameterized by:

routing function,‘arbitrate/merge’ module.

� Range of microarchitectures can be parameterized� Control-adaptivity automatically resynthesizes control logic for each parameterization� very succinct� easily changeable� highly reusable

(7) Fully synthesizable at all levels of abstraction� early architectural feedback� early and very fast simulation on off-the-shelf FPGA boards for architectural exploration, verification and SW development

L01 - 4Copyright © Bluespec Inc. 2005-2008

Some implications of BSV’s unique advantages

High-levelTransactional

I/F

Design IP/SystemVerification IPSoftware

voi d

Tes tBe nc h::

Resul t( v oi d* tbP oin ter

){

Tes tBe nc h* tb = reint erpret _ca st< Te stBe nc h* >( t bP oint er );

un sig ne d d ata [4] ;

doR es p( t b- >drv0, &(t b- >o ut[ 0]), da ta ) ;

Initial Testbench(approx)

Initial IP(approx)

Final Bus/Interconnect

I/F

voi d

Tes tBe nc h::

Resul t( v oi d* tbP oin ter

)

{

Tes tBe nc h* tb = reint erpret _ca st< Te stBe nc h* >( t bP oint er );

un sig ne d d ata [4] ;

doR es p( t b- >drv0, &(t b- >o ut[ 0]), da ta ) ;

co ut < < "d ata i s: " << hex << dat a[0 ] << da ta[ 1] < < d ata [2] <<

dat a[3 ] << en dl;

Tcl_ Obj* cmd = T cl_N ewStr in gOb j( "rep ortRe sul ts ", -1 ) ;for( i nt i= 0; i< 4; i+ + )

voi d

Tes tBe nc h::

Resul t( v oi d* tbP oin ter

)

{

Tes tBe nc h* tb = reint erpret _ca st< Te stBe nc h* >( t bP oint er );

un sig ne d d ata [4] ;

doR es p( t b- >drv0, &(t b- >o ut[ 0]), da ta ) ;

co ut < < "d ata i s: " << hex << dat a[0 ] << da ta[ 1] < < d ata [2] <<

dat a[3 ] << en dl;

Tcl_ Obj* cmd = T cl_N ewStr in gOb j( "rep ortRe sul ts ", -1 ) ;

for( i nt i= 0; i< 4; i+ + )

voi d

Tes tBe nc h::

Resul t( v oi d* tbP oin ter

)

{

Tes tBe nc h* tb = reint erpret _ca st< Te stBe nc h* >( t bP oint er );

un sig ne d d ata [4] ;

doR es p( t b- >drv0, &(t b- >o ut[ 0]), da ta ) ;

co ut < < "d ata i s: " << hex << dat a[0 ] << da ta[ 1] < < d ata [2] <<

dat a[3 ] << en dl;

Tcl_ Obj* cmd = T cl_N ewStr in gOb j( "rep ortRe sul ts ", -1 ) ;

for( i nt i= 0; i< 4; i+ + )

Add Functionality

Add Architectural

Detail

Final Testbench

ExploreArchitecture

Final IP

AddFunctionality

AddFunctionality

Refine/Replace

Strong formal theoretical basis�

� Composition:modules � IPs � Systems

� Reuse (IP and Testbenches)

� Path to formal verification

Early,fast SW platform

Executable spec

Synthesis &

Execution at

every step

Verify/Analyze/

Debug

Bluespec Tools

FPGA/Emulation(>>100X speed)

Bluesim(10X speed)

RTL simulation(1X speed)

BluespecSynthesis

or

or

RTLsynthesis

PowerEstimation

Verilog

Synthesizable testbenches,

for fast verification

Much higher level

than RTL, with same

QoR (quality of results)

Page 3: Bluespec SystemVerilog™ Training Lecture 01: Introduction

3

L01 - 5Copyright © Bluespec Inc. 2005-2008

"I think we ultimately will see atomic transactions in

most, if not all, languages. That's a bit of a guess,

but I think it's a good bet.“

Burton Smith,

Technical Fellow,

Parallel Computing

Recently: for software for multi-core/multi-threaded architectures

Very recently: HW support for“Transactional Memory” in processors

For decades: in Operating Systems, Databases, Distributed Systems

Atomic transactions are the best known tool to tame complex concurrency

Tx

L01 - 6Copyright © Bluespec Inc. 2005-2008

BSV: well positioned for formal verification

Many formal specification languages use the same computation model as BSV, because it is parallel, and because atomicity of rules enables reasoning about correctness with invariants

� UNITY (Chandy&Mishra), TLA+ (Lamport), Event B (Abrial), ...� Historical note: BSV’s technology roots are in fact in formal verification—verification of complex processor pipelines and complex cache coherence protocols (ask for references)

The Rules computation model can be used from high levels of abstraction (executable specs) to lower levels (implementations)

� There is a vast literature on provably correct refinement� This is the most promising approach for formal verification

Page 4: Bluespec SystemVerilog™ Training Lecture 01: Introduction

4

L01 - 7Copyright © Bluespec Inc. 2005-2008

How customers/partners are using BSV:

Modeling for SW development

Modeling for Architecture Exploration

Verification

and, of course, IP creation

Initially, BSV was only

used for IP creation

L01 - 8Copyright © Bluespec Inc. 2005-2008

Example: modeling for SW devel.(@ a major IP company)

Cycle-accurate model of a production LPDDR memory controller

� Will be shipped to each customer of the IP company

� Developed in < 3 man months

Why BSV?

• Very quick model creation

• Parameterization

• Fast sim (Bluesim)

• Potential for much faster sim (FPGA)

• Potential to replace IP creation as well

Page 5: Bluespec SystemVerilog™ Training Lecture 01: Introduction

5

L01 - 9Copyright © Bluespec Inc. 2005-2008

How customers/partners are using BSV:

Modeling for SW development

Modeling for Architecture Exploration

Verification

and, of course, IP creation

L01 - 10Copyright © Bluespec Inc. 2005-2008

Example: CPU architecture exploration(processor microarchitectures

@ a major microprocessor company)

Rewriting 1000s of lines of legacy C++ code for CPU pipeline architecture modeling in BSV, so that it can be synthesized and run on FPGAs

� Expected performance advantage > 1000x

Background: existing microarchitect’s workbench for cycle-accurate exploration of alternative microarchitectures for future CPUs

� Written in C++, developed over a decade

� Highly parameterized and configurable, to facilitate experimentation on alternatives

� Heavily used, but running out of gas for simulation speed (multicore, multithreading)

Status: Demonstrated for 5-stage pipeline model and pipelined out-of-order model;development continuesWhy BSV?

• Only tool with expressive power to replace C++

• and synthesizable to FPGA, for speed

Page 6: Bluespec SystemVerilog™ Training Lecture 01: Introduction

6

L01 - 11Copyright © Bluespec Inc. 2005-2008

Example: CPU and cache architecture exploration(@ a major microprocessor/systems company)

Creating a flexible platform for fast exploration of cache microarchitectures for future multithreaded + multicore CPUs

Status: executing significant prefix of Linux boot sequence, on an FPGA platform, within 6 months of start of project

Why BSV?

• Very quick creation of high level, architecturally

(cycle) accurate models

• and synthesizable to FPGA, for speed

L01 - 12Copyright © Bluespec Inc. 2005-2008

Example: CPU architecture exploration(processor microarchitectureProf. Derek Chiou @ UT Austin)

Goal: flexible platform for fast exploration of pipeline microarchitectures for multithreaded + multicore CPUs

“FAST” system (functional model + predictive timing model)

� Predictive timing model written in BSV, synthesized and running on single FPGA (DRC/Xilinx in Opteron socket)

� Boots unmodified Windows XP and x86 Linux with unmodified x86 apps

� 100% cycle accurate x86 model, at 1.2 MIPS

� Expecting at least 10x more, with simple optimizations

� Compare 1-200 KIPS for software-based cycle-accurate sims

� Ref [Chiou et. al. ICCAD 2007 and MICRO 2007]

Why BSV?

• Very quick creation of high level, architecturally

(cycle) accurate models

• and synthesizable to FPGA, for speed

Page 7: Bluespec SystemVerilog™ Training Lecture 01: Introduction

7

L01 - 13Copyright © Bluespec Inc. 2005-2008

How customers/partners are using BSV:

Modeling for SW development

Modeling for Architecture Exploration

Verification

and, of course, IP creation

L01 - 14Copyright © Bluespec Inc. 2005-2008

BSV/RTLBSV

Transactions

PINS

orEVE

Software API

libZebu.so

Example: IP verification@ Qualcomm (from publicly available information)

BSV for complex transactors on EVE platform

� TLM interfaces

� Transactor functions: data transformation, clock management, timestamp management, statistics management

� and, ... moving testbench functionality to EVE side

Productivity improvements in transactor development:

� 6 months � 3 months (EVE) � 3 weeks (BSV)

Why BSV?• Quick creation of complex test infrastructure

• and synthesizable to FPGA, for speed

Page 8: Bluespec SystemVerilog™ Training Lecture 01: Introduction

8

L01 - 15Copyright © Bluespec Inc. 2005-2008

Example: IP verification(@ major semi company)

Multi channel DMA controller

� ~350K gates (11% smaller than reference design)

� 227 MHz (35% faster than 166 MHz spec)

Verification results:

� Single channel verification detected ~12 bugs

50% typos, rest logic errors

� Multi-channel verification detected ~9 bugs

Several of which were in test environment

� Most bugs found and fixed in 1-2 minutes

None longer than 15 minutes

Why BSV?

• Fewer bugs with atomic transactions, both within a

module and surrounding it (correct-by-construction

control logic)

• Remaining bugs (few) are easier to find and fix

L01 - 16Copyright © Bluespec Inc. 2005-2008

Example: IP verification(AXI demo on FPGA @ Bluespec)

AMBA AXI

TrafficGenerator

M

TrafficGenerator

M

TrafficGenerator

M

TrafficGenerator

M

RAM

S

RAM

S

Transactors

HostC++testbench/host SW/applications

ModelsHW/SW Transactors

Page 9: Bluespec SystemVerilog™ Training Lecture 01: Introduction

9

L01 - 17Copyright © Bluespec Inc. 2005-2008

Example: IP verification(AXI demo on FPGA @ Bluespec)

4,723 (10% utilization)Virtex-4 FX100 slices:

125KASIC gates:

2,000 (including comments)Bluespec lines of code:

22 day Verilog sim → 1 day Bluesim → 53 sec on FPGA

Verilog 1X (1.4K cycles/sec)

Bluesim 22X (31K cycles/sec)

Emulation 35,714X (50M cycles/sec)

C/SystemC/Bluesim

Verilog Simulation

C/SystemC/Bluesim

Bluesim Simulation

C/SystemC/Bluesim

FPGA Emulation

L01 - 18Copyright © Bluespec Inc. 2005-2008

Workstation

Avnet $3.5K FPGA board

USB cable

Example: IP verification(AXI demo on FPGA @ Bluespec)

Why BSV?

•Inexpensive em

ulation platform (low cost FPGA

board)•

Easy setup (BSV transactors)

•Quick availabilty (synthesizability of early

approximation)

Page 10: Bluespec SystemVerilog™ Training Lecture 01: Introduction

10

L01 - 19Copyright © Bluespec Inc. 2005-2008

How customers/partners are using BSV:

Modeling for SW development

Modeling for Architecture Exploration

Verification

and, of course, IP creation

Initially, BSV was only

used for IP creation

L01 - 20Copyright © Bluespec Inc. 2005-2008

Example: FPGA IP creation(MEMOCODE 2008 codesign contest)

Goal: Speed up a software reference application running on the PowerPC on Xilinx XUP reference board using SW/HW codesign

Application: Decrypt/sort/re-encrypt list of records. Optimalsolutions must solve bothsorting and crypto problems with large databases

27 teams started the four week contest. 8 teams delivered 9 solutions.

Xilinx XUP

http://rijndael.ece.vt.edu/memocontest08/

Page 11: Bluespec SystemVerilog™ Training Lecture 01: Introduction

11

L01 - 21Copyright © Bluespec Inc. 2005-2008

Example: FPGA IP creation(MEMOCODE 2008 codesign contest)

Contest Results

C + Impulse C

C + HDL

C Bluespec

Reference: http://rijndael.ece.vt.edu/memocontest08/everybodywins/ 11Xof runner-up(5x in 2007)

Why BSV?• Expressive power enabled all-BSV solution• and fully synthesizable to FPGA, for speed

(3 people, 3 weeks, direct to FPGA)

L01 - 22Copyright © Bluespec Inc. 2005-2008

Example: ASIC IP creation(@ three major semi companies)

Company A:

� High performance video data mover for video subsystem� In silicon; derivatives in progress

Company B:

� System DMA for wireless handset platform� In silicon; derivatives in progress

� Image DMA� Verified, synthesized, meeting PPA. Designer reports 50% fewer bugs compared to Verilog flow

� LCD controller� In progress

Company C:

� Wireless Turbo Decoder� 15% area (ASIC) than original VHDL

� 5X faster verification in Bluesim

� Now running on FPGA, expecting > 1000x faster verification runs

Why BSV?

• Complex concurrency

• Parameterization

Page 12: Bluespec SystemVerilog™ Training Lecture 01: Introduction

12

L01 - 23Copyright © Bluespec Inc. 2005-2008

Example: ASIC IP creation(OFDM @ MIT and Tampere U. Technolgy)

Highly parameterized source code, from which one can instantiate three different wireless protocols:

� 802.11a (WiFi) (MIT)� 802.16 (WiMax) (MIT)� 802.15.3 (WUSB) (Tampere)

� Starting with MIT code for WiFi and WiMax, two Tampere students were able to extend it to WUSB in 6 weeks, learning BSV and WUSB at the same time!

Powerful block-level parameterization� E.g., IFFT (85% of 802.11a transmitter) is parameterized to allow a wide range of micro-architectures

� Variable number of “butterfly-4s”—from 1 (full reuse) to 16 (no reuse) per stage

� Variable number of stages—from 1 (full reuse) to 3 (no reuse)with, of course, different area, performance and power

Note; parts of this IP are considered the “sweet spot” for C-based synthesis

Why BSV?

• Complex concurrency

• Parameterization

This BSV code is open sourced at http://csg.csail.mit.edu/oshd

L01 - 24Copyright © Bluespec Inc. 2005-2008

Example: ASIC IP creation(H.264 decoder @ MIT)

Complete decoder, not just kernel blocksRange of implementations from a single, parameterized source code:

� from QCIF: 176x144 @ 15 frames/sec� to 1080p: (1280x1080)p @ 60 frames/sec

Synthesized at 180 nm

< 10K lines of BSV source code. Compare:� Original reference code: > 80K lines of C� H.264 slice of FFMPEG: 20K lines of C(of course, the C codes are not synthesizable)

Why BSV?

• Complex concurrency

• Parameterization

This BSV code is open sourced at http://csg.csail.mit.edu/oshd

Note; parts of this IP are considered the “sweet spot” for C-based synthesis

Page 13: Bluespec SystemVerilog™ Training Lecture 01: Introduction

13

L01 - 25Copyright © Bluespec Inc. 2005-2008

Models of:

“RISC” processorMIPSItaniumPowerPCARM

L2 cache ctlrDistributed cache coherence

Bus converters

DMA ctlr

Network procQueuing enginesSorting queueArbiterIP lookupDebug controller

PCI Express, USBI2C, MIPI HSI,MIPI Unipro

Pixel processorWaveform generatorPong

IDCT, IFFTDES, AESCIECAM, Motion compensatorMPEG-4, H.264,802.11xx, OFDM, MIMO

LPDDR ctlrDDR2 ctlrSRAM ctlr

FIR filter,

OCPAXIAHB

Example: ASIC & FPGA IP Creation(@ misc. customers and partners)

APB

ComplexDatapaths

(e.g.processor/controller)

ComplexDatapaths

(e.g.processor/controller)

ControlControl Loop/arrayAlgorithms

Loop/arrayAlgorithms

C-basedsynthesis

Bluespec SystemVerilog (BSV)

System Bus

Peripheral Bus

BusBridge

MemoryControllerProcessor

DMAController

DSP/accelerators

PowerManagement

Arbitration

ApplicationSpecific

DRAM

SRAM

L2Cache

SerialController

Audio VideoFlash/Mem

I/FBus

Controller

L01 - 26Copyright © Bluespec Inc. 2005-2008

A few small examples, for flavor

What follows are a few small examples to illustrate what is possible in BSV

Please hold off on detailed technical questions until later; don’t worry if these are not fully understandable now

� This is just the “overture music”, to bathe you in the leitmotifs, the idioms, the themes, as you settle comfortably into your seats leaving behind the stresses of the street

� The plot of the opera will be revealed to you as we progress through the tutorial!

� Patience: the soprano will sing; Baron Scarpia will die!

Page 14: Bluespec SystemVerilog™ Training Lecture 01: Introduction

14

L01 - 27Copyright © Bluespec Inc. 2005-2008

Multiplier Example

Simple binary multiplication:

100101011001

00001001

00000101101

// d = 4’d9// r = 4’d5// d << 0 (since r[0] == 1)// 0 << 1 (since r[1] == 0)// d << 2 (since r[2] == 1)// 0 << 3 (since r[3] == 0)// product (sum of above) = 45

x

What does it look like in Bluespec?

L01 - 28Copyright © Bluespec Inc. 2005-2008

interface Mult_ifc;method Action start (Tin x, Tin y);method Tout result ();

endinterface: Mult_ifc

module mkTest ();

Reg#(int) state <- mkReg(0);Mult_ifc m <- mkMult1();

rule go (state == 0);m.start (9, 5);state <= 1;

endrule

rule finish (state == 1);$display (“Product = %d”,m.result());state <= 2;

endrule

endmodule: mkTest

module mkMult1 (Mult_ifc);Reg#(Tout) product <- mkReg(0);Reg#(Tout) d <- mkReg(0);Reg#(Tin) r <- mkReg(0);

rule cycle (r != 0);if (r[0] == 1) product <= product + d;d <= d << 1;r <= r >> 1;endrule

method Action start (x,y) if (r == 0);d <= x; r <= y; product <= 0;endmethod

method result () if (r == 0);return product;endmethod

endmodule: mkMult1

Multiplier Example

Page 15: Bluespec SystemVerilog™ Training Lecture 01: Introduction

15

L01 - 29Copyright © Bluespec Inc. 2005-2008

Concurrency and Shared Resources

Process 0 increments register x under some condition cond0Process 1 transfers a unit from register x to register y under cond1Process 2 decrements register y under some condition cond2

Each register can only be updated by one process on each clock.� Priority: 2 > 1 > 0

This is an abstraction of some real applications:� Bank account: 0=deposit, 1=transfer to savings, 2 = withdraw from savings� Packet processor: 0 = packet arrives, 1 = packet is processed, 2 = packet

departs� …

Priority: 2 > 1 > 00 1 2

x y

+1 -1 +1 -1

cond0 cond2cond1

L01 - 30Copyright © Bluespec Inc. 2005-2008

Fundamentally, we are scheduling three potentially concurrent atomic transactions that share resources.

What if the priorities changed: cond1 > cond2 > cond0?What if the processes are in different modules?

0 1 2

x y

+1 -1 +1 -1 Process priority: 2 > 1 >

0

cond0 cond1

cond2

always @(posedge CLK) begin

if (cond2)

y <= y – 1;

else if (cond1) begin

y <= y + 1; x <= x – 1;

end

if (cond0 && !cond1)

x <= x + 1;

end

* There are other ways to write this RTL, but all suffer from same analysis

Resource-access scheduling logic i.e., control logic

always @(posedge CLK) begin

if (cond2)

y <= y – 1;

else if (cond1) begin

y <= y + 1; x <= x – 1;

end

if (cond0 && (!cond1 || cond2)

)

x <= x + 1;

end

Better scheduling

Page 16: Bluespec SystemVerilog™ Training Lecture 01: Introduction

16

L01 - 31Copyright © Bluespec Inc. 2005-2008

process(CLK)begin

if (CLK = ‘1’ and CLK’event) then if (cond2 and cond1)

x <= x – 1;else if (cond0) then

x <= x + 1;end if;

if (cond2) theny <= y – 1;

else if (cond1) theny <= y + 1;

end if;end if;

end process;

Concurrency and Shared Resources (VHDL)

process(CLK) begin // process 0if (CLK = ‘1’ and CLK’event) then

if ((not cond1) and cond0) then x <= x + 1;end if;

end if;endprocess;

process(CLK) begin // process 1if (CLK = ‘1’ and CLK’event) then

if ((not cond2) and cond1) then x <= x – 1;y <= y + 1;

end if;end if;

endprocess;

process(CLK) begin // process 2if (CLK = ‘1’ and CLK’event) then

if (cond2) then y <= y - 1;end if;

end if;endprocess;

if (((not cond1) or cond2) && cond0) then

Priority: 2 > 1 > 00 1 2

x y

+1 -1 +1 -1

cond0 cond2cond1

L01 - 32Copyright © Bluespec Inc. 2005-2008

With Bluespec, the design is direct

(* descending_urgency = “proc2, proc1, proc0” *)

rule proc0 (cond0);

x <= x + 1;

endrule

rule proc1 (cond1);

y <= y + 1;

x <= x – 1;

endrule

rule proc2 (cond2);

y <= y – 1;

endrule

Hand-written RTL:Explicit scheduling���� Complex clutter, unmaintainable

BSV:Functional correctness follows directly from rule semantics (atomicity)

Executable spec (operation-centric)

Automatic handling of shared resource control logic

Same hardware as the RTL

0 1 2

x y

+1 -1 +1 -1

Process priority: 2 > 1 >

0

cond0 cond1

cond2

Page 17: Bluespec SystemVerilog™ Training Lecture 01: Introduction

17

L01 - 33Copyright © Bluespec Inc. 2005-2008

Now, let’s make a small change: add a new process and insert its priority

0

1

2

x y

+1

-1 +1

-1

Process priority: 2 > 3 > 1 > 0

cond0 cond1

cond2

3+2 -2

cond3

L01 - 34Copyright © Bluespec Inc. 2005-2008

Process priority: 2 > 3 > 1 > 0

Changing the Bluespec design

0

1

2

x y

+1

-1 +1

-1

cond0 cond1

cond2

3+2 -2

cond3

(* descending_urgency = “proc2, proc1, proc0” *)

rule proc0 (cond0);

x <= x + 1;

endrule

rule proc1 (cond1);

y <= y + 1;

x <= x – 1;

endrule

rule proc2 (cond2);

y <= y – 1;

endrule

(* descending_urgency = "proc2, proc3, proc1, proc0" *)

rule proc0 (cond0);

x <= x + 1;

endrule

rule proc1 (cond1);

y <= y + 1;

x <= x - 1;

endrule

rule proc2 (cond2);

y <= y - 1;

x <= x + 1;

endrule

rule proc3 (cond3);

y <= y - 2;

x <= x + 2;

endrule

Pre-Change

?

Page 18: Bluespec SystemVerilog™ Training Lecture 01: Introduction

18

L01 - 35Copyright © Bluespec Inc. 2005-2008

Process priority: 2 > 3 > 1 > 0

Changing the Verilog design

0

1

2

x y

+1

-1 +1

-1

cond0 cond1

cond2

3+2 -2

cond3

always @(posedge CLK) begin

if (!cond2 && cond1)

x <= x – 1;

else if (cond0)

x <= x + 1;

if (cond2)

y <= y – 1;

else if (cond1)

y <= y + 1;

end

always @(posedge CLK) begin

if ((cond2 && cond0) || (cond0 && !cond1 && !cond3))

x <= x + 1;

else if (cond3 && !cond2)

x <= x + 2;

else if (cond1 && !cond2)

x <= x - 1

if (cond2)

y <= y - 1;

else if (cond3)

y <= y - 2;

else if (cond1)

y <= y + 1;

end

Pre-Change

?

L01 - 36Copyright © Bluespec Inc. 2005-2008

Alternate RTL style (more common)

Combinatorial explosion

Case 3’b111 is subtle

Many repetitions of update actions (� cut-paste errors)

� cf. “WTO Principle” (Write Things Once—Gerard Berry)

Difficult to maintain/extend

Difficult to modularize

0 1 2

x y

+1 -1 +1 -1 Process priority: 2 > 1 >

0

cond0 cond1

cond2

always @ (posedge clk)

case ({cond0, cond1, cond2})

3'b000: begin // nothing happens

x <= x; y <= y;

end

3'b001: begin //proc2 fires

y <= y-1;

end

3'b010: begin //proc1

x <= x-1; y <= y+1;

end

3'b011: begin //proc2 fires

(2>1)

y <= y-1;

end

3'b100: begin //proc0

x <= x+1;

end

3'b101: begin //proc2 + proc0

x <= x+1; y <= y-1;

end

3'b110: begin //proc1 (1>0)

x <= x-1; y <= y+1;

end

3'b111: begin //proc2 + proc0

x <= x+1; // NOTE – subtle!

y <= y-1;

end

endcase

Page 19: Bluespec SystemVerilog™ Training Lecture 01: Introduction

19

L01 - 37Copyright © Bluespec Inc. 2005-2008

The concurrency complexities illustrated in the simple bank account example are greatly magnified in real designs

L01 - 38Copyright © Bluespec Inc. 2005-2008

A more complex example, from CPU design

Speculative, out-of-order

Many, many concurrent activities

Branch

RegisterFile

ALUUnitRe-

OrderBuffer(ROB) MEM

Unit

DataMemory

InstructionMemory

Fetch Decode

FIFO

FIFO FIFO FIFO FIFO

FIFO

FIFOFIFO

FIFOFIFORe-

OrderBuffer(ROB)

Branch

RegisterFile

ALUUnit

MEMUnit

DataMemory

InstructionMemory

Fetch Decode

Page 20: Bluespec SystemVerilog™ Training Lecture 01: Introduction

20

L01 - 39Copyright © Bluespec Inc. 2005-2008

Many concurrent actions on common state: nightmare to manage explicitly

EmptyWaiting

EW

Head

Tail

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V - -Instr - V -

V 0 -Instr B V 0W

V 0 -Instr C V 0W

-Instr D V 0W

V 0 -Instr A V 0W

V - -Instr - V -

V - -Instr - V -E

E

E

E

E

E

E

E

E

E

E

E

V 0

Re-Order Buffer

Put aninstr intoROB

DecodeUnit

RegisterFile

Get operandsfor instr

Writebackresults

Get a readyALU instr

Get a readyMEM instr

Put ALU instrresults in ROB

Put MEM instrresults in ROB

ALUUnit(s)

MEMUnit(s)Resolve

branches

Operand 1 ResultInstruction Operand 2State

L01 - 40Copyright © Bluespec Inc. 2005-2008

Branch Resolution

• …

• …

• …Commit Instr

• Write results to registerfile (or allow memorywrite for store)

• Set to Empty

• Increment head pointer

Write Back Results to ROB

• Write back results toinstr result

• Write back to all waitingtags

• Set to done

Dispatch Instr

• Mark instructiondispatched

• Forward to appropriateunit

But in BSV…

..you can code each operation in isolation, as a rule

..the tool guarantees that operations are INTERLOCKED (i.e. each runs to completion without external interference)

Insert Instr in ROB

• Put instruction in firstavailable slot

• Increment tail pointer

• Get source operands

- RF <or> prev instr

Page 21: Bluespec SystemVerilog™ Training Lecture 01: Introduction

21

L01 - 41Copyright © Bluespec Inc. 2005-2008

Example: a butterfly switch (crossbar)

Basic building blocks:

Recursive construction: 1x1 � 2x2 � 4x4 … � NxN

00

01

10

11

L01 - 42Copyright © Bluespec Inc. 2005-2008

Butterfly switch: interface

Parameterized by #ports (n), packet type (t)

Intuitive high-level structures for the interface:

� Hierarchical sub-interfaces

� Aggregation (vectors of interfaces, lists, ...)

interface XBar #(numeric type n, type t);

interface Vector#(n, Put#(t)) input_ports;

interface Vector#(n, Get#(t)) output_ports;

endinterface

Page 22: Bluespec SystemVerilog™ Training Lecture 01: Introduction

22

L01 - 43Copyright © Bluespec Inc. 2005-2008

Butterfly switch: module(< 60 lines, fully synthesizable)

module mkXBar #(function UInt #(32) destinationOf (t x),

module #(Merge2x1 #(t)) mkMerge2x1)

( XBar #(n, t) );

if (logn == 0) … // BASE CASE

FIFO#(t) f <- mkFIFO;

else … // RECURSIVE CASE

XBar#(nhalf, t) upper <- mkXBar ( …);

XBar#(nhalf, t) lower <- mkXBar ( …);

for (Integer j = 0; j < n; j = j + 1) …

rule route; …

if (! flip) merges [j] .iport0.put (x);

else merges [jFlipped].iport1.put (x);

endrule

endmodule: mkXBar

Packet routing function

Recursive module instantiation

Generate behavior (rules)

with loops

Parameterization with ANY type

Turing-complete “generate”over arbitrary design elements

High level interfaces

2�1 merge submodule

L01 - 44Copyright © Bluespec Inc. 2005-2008

(see also whitepaper for full source code)

Key Implementation Notes:

- The switch module: < 60 lines of BSV code

- First working (tested) prototype: < 1 day(including simple testbench)

- Fully synthesizable:Synthesized to layout (Magma, TSMC 0.18u, 550 MHz)

Butterfly switch

Page 23: Bluespec SystemVerilog™ Training Lecture 01: Introduction

23

L01 - 45Copyright © Bluespec Inc. 2005-2008

Elevating Design above RTL

Fully synthesizable

VHDL/Verilog/SystemVerilog/SystemC

Bluespec SystemVerilog

Correct: Concurrency and Communications

Atomic Transactions using Rules and interface methods for

complex control and concurrency:

• Across multiple shared resources

• Across module boundaries

Correct: Construction and Configurability

• High-level types closer to spec

• Much more powerful “generate”

• Powerful parameterization

• Powerful static checking & formal semantics

• Advanced clock management

Behavioral Structural

L01 - 46Copyright © Bluespec Inc. 2005-2008

Setting your expectations about what you will learn from a few days of training

Mapping an idea or specification into BSV hardware, and using abstraction well, takes training and practice: one can’t become an expert overnight!

However, even after these few days of training, you will be able to design using BSV, and still expect at least a 2x improvement in your productivity

� The basics are easy!� Even easier than Verilog/VHDL, we believe, because of the direct,

natural way to describe HW that is HW-centric and not simulation-centric. Many people have learned HW design for the first time using BSV.

� Experience in previous training sessions has borne this out time and again!

� “Aha! I get it now! Wow! Why would I ever go back to RTL?”

“Earn your Ph.D. from home in just 6 months! Call toll-free now!”(spam )

Page 24: Bluespec SystemVerilog™ Training Lecture 01: Introduction

24

L01 - 47Copyright © Bluespec Inc. 2005-2008

Bluespec Flow

Bluespec SystemVerilogTM source

Verilog 95 RTL

Verilog sim

VCD output

Visualization

(e.g., Debussy)

Bluespec Synthesis

files

Bluespec tools

3rd party tools

Legend

RTL synthesis

gates

Bluesim CycleAccurate

Bluespec

Development

Workstation

Other

Verilog/VHDL

Copyright © Bluespec Inc. 2005-2008

End of Lecture

Page 25: Bluespec SystemVerilog™ Training Lecture 01: Introduction

25

L01 - 49Copyright © Bluespec Inc. 2005-2008

Prerequisites

Basic digital design

Exposure to basic microarchitectures: pipelining, FSMs, data and control paths,

Done some coding in RTL (Verilog or VHDL)

Done some verification of RTL

Aware of typical flow: architecture exploration, design, verification, physical design

Not assuming any prior exposure to SystemVerilog