Part III

Logic Emulation

What is a Logic Emulation System?

1. Programmable hardware built from programmable logic devices (FPGAs) and programmable interconnect devices (PIDs).

2. Software that automatically programs the hardware according to the circuit under design.

3. Control hardware and software that support operating the emulated design as a hardware component running in real time.

Typical Logic Emulation Environment

[Figure: block diagram. Labels: workstation (compiler, runtime software); logic emulator (logic module, probe module; stimulus generator, logic analyzer); in-circuit interface; target system.]

Why we need Logic Emulation?

Design verification issues.

Real-time operation.

System-level testing.

Rapid prototyping.

Design Verification Issues

Simulation-based verification methods run out of steam as chip complexity grows.

Emulation is a verification technology that grows along with design size.

Real-Time Operation

Simulation requires test vector development, which is costly and difficult.

Verification depends on test vector correctness.

Certain applications must be verified in real time because they depend on human perception, e.g., audio and video.

Emulation connected to actual hardware can run: real diagnostic code, operating systems, and applications.

System-Level Testing

Often the chip meets its specifications but fails in the system.

We have to verify the system-level interactions between the chip and other components. They are hard to formalize.

Internal probing is impossible once the chip is fabricated and placed in a system, but it is possible using emulation.

Rapid Prototyping

Once the emulated design is debugged, it is available for immediate use by software developers for software debugging.

The emulated design is also available for demos and for architecture experiments on real applications and data.

Programmable Hardware includes programmable interconnect

[Figure: programmable interconnect linking memory elements, a VLSI core, interface logic elements, and logic elements.]

Considerations for programmable interconnect

The capacity of logic and interconnect depends on package constraints.

This forces a hierarchical system: chips => boards => boxes => system.

The interconnect structure must:
1. Provide successful connectivity.
2. Maximize FPGA utilization.
3. Minimize delay and skew.

Rent's rule can be used to predict the interconnect needs.
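Rent's rule is usually written T = t * g^p, where T is the number of external terminals of a block, g the number of gates inside it, t the average number of terminals per gate, and p the Rent exponent. The sketch below evaluates the rule in both directions. The default values t = 2.5 and p = 0.6 are illustrative assumptions, not figures from these slides.

```python
def rent_terminals(num_gates: float, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate external terminals T = t * g**p for a block of g gates.

    t and p are assumed, illustrative values; real designs must be measured.
    """
    return t * num_gates ** p


def max_gates_for_pins(pins: float, t: float = 2.5, p: float = 0.6) -> float:
    """Invert Rent's rule: the largest block that fits behind a pin budget."""
    return (pins / t) ** (1.0 / p)


if __name__ == "__main__":
    for gates in (100, 1_000, 10_000):
        print(gates, "gates need about", round(rent_terminals(gates)), "pins")
    print("200 pins support about", round(max_gates_for_pins(200)), "gates")
```

Because p is below 1, pin demand grows more slowly than gate count, but package pin counts grow more slowly still, which is why FPGAs in an emulator are often pin-limited rather than logic-limited.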

Structures of Multi-FPGA Systems

Topologies:
- Mesh: nearest neighbor.
- Crossbar: full and partial.

Interconnect scheme:
- Circuit switched.
- Time multiplexed.

Nearest Neighbor Interconnection

[Figure: 3 x 3 array of FPGAs, each connected to its nearest neighbors.]

Advantages and Disadvantages of Nearest Neighbor Interconnection

Advantages:
- Uniform: all chips the same.
- Easy to lay out on PCB.

Disadvantages:
- Routing is easily blocked.
- The “through pins” limit the logic utilization of FPGAs.
- Long and unpredictable delays.
- No natural hierarchical extension.

Nearest Neighbor Extensions

[Figure: 3 x 3 array of FPGAs with extra connections.]

- Add more neighbors.
- Connect to non-neighbors.

Advantages and Disadvantages of nearest-neighbor extended architectures

Advantages:
- More choices for the router by adding diagonal lines and skip lines.

Disadvantages:
- More complex PCB.
- More complex routing software.

Partial Crossbar Interconnect

[Figure: logic blocks whose I/O pins are split into subsets A, B, C, and D; each subset connects to its own crossbar (A pins, B pins, C pins, D pins). Second-level crossbars are also shown.]

Partial Crossbar Interconnect

A partial crossbar consists of a set of small full crossbars, connected to the logic blocks but not to each other.

I/O pins of each FPGA are divided into subsets. Each subset is connected by a full crossbar circuit switch.

Partial crossbar is a potentially blocking network.
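The pin-subset structure can be sketched as a small routing model. In this simplified, assumed model, subset k of every FPGA is wired to crossbar k, and a net is routable through crossbar k only if every FPGA on the net still has a free pin in subset k. The greedy first-fit policy and the name `route_partial_crossbar` are illustrative choices, not the algorithm of any particular emulator.

```python
def route_partial_crossbar(nets, num_fpgas, num_crossbars, pins_per_subset):
    """Greedy first-fit routing of inter-FPGA nets on a one-level partial crossbar.

    nets: dict mapping net name -> iterable of FPGA indices the net touches.
    Returns a dict mapping net name -> chosen crossbar index, or None if blocked.
    """
    # free[f][k] = unused pins of FPGA f that are wired to crossbar k
    free = [[pins_per_subset] * num_crossbars for _ in range(num_fpgas)]
    assignment = {}
    for name, fpgas in nets.items():
        fpgas = list(fpgas)
        for k in range(num_crossbars):
            if all(free[f][k] > 0 for f in fpgas):
                for f in fpgas:
                    free[f][k] -= 1  # consume one pin per FPGA on this crossbar
                assignment[name] = k
                break
        else:
            assignment[name] = None  # no crossbar has a free pin on every FPGA
    return assignment


if __name__ == "__main__":
    nets = {"a": (0, 1), "b": (1, 2), "c": (0, 1, 2)}
    print(route_partial_crossbar(nets, num_fpgas=3, num_crossbars=2, pins_per_subset=1))
```

With one pin per subset, nets "a" and "b" consume FPGA 1's pins on both crossbars, so net "c" cannot be routed. In this model a net is blocked when the pin subsets it needs are used up, even if other pins remain free.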

Characteristics of “Partial Crossbar Architecture”

The size of a partial crossbar is proportional to the number of FPGA pins.

Every interconnection goes through one crossbar chip in a one-level partial crossbar, or three crossbar chips in a two-level one, so delays are uniform and bounded.

Mixed Full and Partial Crossbar

[Figure labels: FPGAs, local FPICs, global FPICs, external connections, partial crossbar, full crossbar.]

Circuit Switched versus Time Multiplexed Interconnect Schemes

The choice is a trade-off between operating speed and hardware cost.

Time-multiplexing method:
- Can greatly expand the available interconnect.
- Allows lower-cost IC packages and PCBs.
- Makes partitioning easier.

BUT:
- System power increases due to frequent signal switching (higher hardware cost).
- Complex scheduling software.
- Slow operating speed.

Virtual Wires

[Figure: two FPGAs connected by physical wires; on the sending FPGA a mux selects among the logical outputs, and on the receiving FPGA a demux restores them as logical inputs.]

Exchange space for time.
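The mux/demux idea can be sketched as follows: N logical signals share P physical wires over ceil(N / P) time slots. The static round-robin schedule and the function names `schedule` and `transmit` are assumptions made for illustration; a real virtual-wires compiler schedules signals according to their dependencies.

```python
from math import ceil


def schedule(num_logical, num_physical):
    """Assign each logical signal a (time_slot, physical_wire) pair, round-robin."""
    return [(i // num_physical, i % num_physical) for i in range(num_logical)]


def transmit(values, num_physical):
    """Send logical values over fewer physical wires, one group per time slot.

    Returns the values seen by the receiver and the number of slots used.
    """
    slots = ceil(len(values) / num_physical)
    received = [None] * len(values)
    for slot in range(slots):
        # Mux side: drive the physical wires with this slot's group of signals.
        wires = values[slot * num_physical:(slot + 1) * num_physical]
        # Demux side: latch each wire into the matching logical input.
        for wire, value in enumerate(wires):
            received[slot * num_physical + wire] = value
    return received, slots


if __name__ == "__main__":
    print(schedule(5, 2))
    print(transmit([1, 0, 1, 1, 0], num_physical=2))
```

The slot count is the price: each emulated clock cycle needs that many interconnect cycles, which is the speed penalty listed for time multiplexing.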

Logic Emulation Systems and their interconnection schemes

System with mesh topology - Quickturn’s RPM and Virtual Machine Works (IKOS).

System with partial crossbar - Quickturn’s Enterprise, Mars, and System Realizer.

System with mixed full and partial crossbar - Aptix Prototyping System.

System using time-multiplexed interconnect - Virtual Machine Works (IKOS), CoBALT and Arkos (Quickturn).

Memory Solutions in Emulators and future devices/systems

Goal: programmable memories with different width/depth/port combinations.

FPGA-based memories:
- Use logic resources inefficiently.
- Timing correctness is difficult to ensure.
- Large or highly multi-ported memories must be partitioned across several FPGAs.

SRAMs with dedicated or programmable controllers.

Logic Emulation Design Flow

[Figure: flow chart with three phases: pre-configuration preparation, full-chip configuration, and in-circuit emulation. Steps shown: HDL synthesis, synthesis, partitioning, system mapping, P & R, design downloading, emulators.]

Logic Emulation Design Compiler and its components

A logic emulation design compiler is a large, complex EDA tool that includes:

Front-end design importer.

HDL-based synthesizer.

Clock and timing analyzer.

Partitioner.

System-level placer and router.

FPGA-based placer and router.

Objectives of logic emulation compiler

Fast compilation time.

Fast emulation clock.

Timing correctness.

Easy ECO (Engineering Change Order).

Minimize circuit size.

Design Considerations for Logic Emulators

HDL synthesis: trade-off between run time and quality; CLB-based vs. gate-based designs.

Clock and timing analysis: timing correctness (free of hold-time violations); clock skew minimization.

Partitioning: run time; timing and area.

System placement and routing: timing; completeness of routing.

FPGA-based placement and routing: fast run time; parallel compilation.

Design Considerations for Logic Emulators

Remember: the logic you emulate is not the same as the logic of your design.

Hold-Time Violation

A hold-time violation occurs when routing delay > LUT delay!

[Figure: two D flip-flops (D, Q, CK) with a LUT inside a CLB and a routing delay between them.]

This is a clock distribution problem (skew)!

Timing Correctness

[Figure: two D flip-flops (D, Q, CK) with a LUT inside a CLB, a routing delay, and an added delay element.]

Fix: delay insertion.
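A minimal sketch of the hold check behind delay insertion, using the standard textbook condition rather than anything stated on the slides: data launched by the first flip-flop must not reach the second one before its (skewed) clock edge plus its hold time. The sketch assumes the routing delay in the figure acts as clock skew toward the second flip-flop. All parameter names are hypothetical.

```python
def hold_slack(clk_to_q, data_delay, clock_skew, hold_time):
    """Hold slack in the same time unit as the inputs (negative means violation).

    clk_to_q:   clock-to-output delay of the launching flip-flop
    data_delay: LUT plus data routing delay between the two flip-flops
    clock_skew: extra clock arrival delay at the capturing flip-flop
    hold_time:  hold requirement of the capturing flip-flop
    """
    return (clk_to_q + data_delay) - (clock_skew + hold_time)


def delay_to_insert(clk_to_q, data_delay, clock_skew, hold_time):
    """Minimum extra data-path delay that removes a hold violation."""
    return max(0.0, -hold_slack(clk_to_q, data_delay, clock_skew, hold_time))


if __name__ == "__main__":
    # Clock routing (5.0) is slower than the data path (1.0 + 2.0): violation.
    print(hold_slack(1.0, 2.0, 5.0, 0.5))
    print(delay_to_insert(1.0, 2.0, 5.0, 0.5))
```

Inserted delay lengthens the data path, so it also reduces setup margin; an emulator can usually afford this because the emulation clock is slow.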

Timing Correctness

[Figure: two D flip-flops (D, Q, CK) with a LUT inside a CLB; the clock path carries the primary clock on a low-skew net, and a clock enable (CE) input is used.]

Use clock enables for gated clocks.

Methodology and components of Logic Emulator System

Pre-configuration preparation - prepare netlists and control files for configuration.

Testbed preparation - prepare emulation-based operation environment.

Full-chip configuration - download design to the emulator.

In-circuit emulation - test the design.

Pre-Configuration in Emulator System

Translate the leaf-cell libraries into emulation primitives.

Translated libraries must be verified for functional equivalence to original.

Modify and redesign some components to attain compatibility with emulation techniques, such as precharge logic circuits.

Assemble all the gate-level netlists for the entire design.

Testbed in Logic Emulator

Design and implement the target ICE board combining the emulated design with real hardware.

Slow down the testbed to emulation speed.

Assemble the testbed and emulation equipment.

Full-Chip Configuration & In-Circuit Emulation

Full-chip configuration:
- Prepare control files.
- Partition the design to fit into the emulation system.
- Download the design into the system.
- Verify that the emulation model faithfully implements the design as specified by the RTL.

In-circuit emulation

Part IV

Reconfigurable Computing and Systems

General-Purpose Computing vs. Custom Computing

General-purpose computing - running applications on a general-purpose computer.

Custom computing - running applications on custom-made, application-specific hardware.

Field-programmable devices make this a reality.

Goals of Reconfigurable Computing

Tailor the architecture to the application.

Minimize or eliminate instruction interpretation.

Exploit fine-grained parallelism.

Map software to hardware.

Applications of reconfigurable computing

- Database search and analysis.
- Image processing and machine vision.
- Data compression.
- Signal processing.
- Neural networks.
- Biology computing.
- Medical computing.
- Design automation (PSU).
- Many more.

Multi-Mode Systems map various applications to a reconfigurable system

[Figure labels: ROM, Application 1, Application 2, reconfigurable system.]

• Different configurations for read & write operations of a tape driver (Honeywell).

• Different configurations for different printer controllers (Tektronix).

Run-Time Reconfiguration in military image recognition system

[Figure: image data enters through I/O and is tested against candidate targets: Jeep? Tank? Truck?]

• Break a single computation into multiple pieces.

• Page in components as needed (virtual hardware), e.g., automatic target recognition.

Custom Computing

Application-specific systems.

Numerous applications for similar reconfigurable systems.

Offers hardware performance, flexibility to handle numerous algorithms.

Multi-FPGA systems can be viewed as hardware supercomputers.

Tell about DEC Perle

Reconfigurable Co-processors

[Figure: a processor with a coprocessor; Program 1 uses instruction Inst1 and Program 2 uses instruction Inst2.]

- Provide custom instructions on a per-application basis.

Types of Reprogrammable Systems

[Figure: three ways to attach custom computing units to a CPU with memory caches and an I/O interface: coprocessor, attached processing unit, and standalone PU. PU = processing unit.]

Types of Reprogrammable Systems

Attached and standalone processing units are reprogrammable systems on computer add-on cards and in separate reprogrammable cabinets.

Considerations: large communication overhead may overshadow the speed gain.

Application-specific coprocessors can achieve significant improvement over a wide range of applications.

Types of Reprogrammable Systems

Integrate the reprogrammable logic into the processor itself.

A reprogrammable functional unit can be configured on a per-algorithm basis.

Providing some special-purpose instructions tailored to the needs of a given application.

Architectures of Multi-FPGA (Reconfigurable) Systems

The most commonly used topologies: Mesh: 1D (linear array), 2D, and 3D.

Crossbar: full, partial, mixed, and hierarchical.

Hybrid between mesh and crossbar.

Application-specific architecture.

Hybrid Topology of a reconfigurable system

Splash 2: augments a linear array of FPGAs with a crossbar switch.

Goal: supporting systolic circuits.

[Figure: 16 FPGAs, each paired with a RAM, with external interfaces.]

Hybrid Topology

[Figure: FPGAs, each paired with a RAM, and a host interface.]

Anyboard: a linear array of FPGAs augmented by global buses.

Hybrid Topology

[Figure: 4 x 4 mesh of FPGAs with four RAMs and a host interface.]

DECPeRLe-1: a 4 x 4 mesh of FPGAs augmented with shared global buses.

Application-Specific Topology of MARC-1, one subsystem

[Figure: the Marc-1, subsystem 1: FPGAs with an FPU and memory; numbered ports (1 to 5) mark connections to other FPGAs.]

Application-Specific Topology of Marc-1, cont.

[Figure: the Marc-1, built from copies of subsystem 1 joined through connections numbered 1 to 5.]

• Application in circuit simulation, where the program to be executed can be optimized on a per-run basis.

• The optimization exploits values that are constant within a run but may vary from dataset to dataset.

Application-Specific Topology

[Figure: FPGAs, each paired with a RAM.]

The RM-nc system: neural network.

Architecture for Computer Prototyping

[Figure: FPGAs together with cache memory, a register file, an ALU, an FPU, and a VME bus.]

The Mushroom processor prototyping system.

Expandable Topologies

Hierarchical crossbar topology: can be expanded by adding an extra level. - Quickturn systems.

Expandable mesh topology: can be expanded by connecting individual boards to form a large mesh. - The Virtual Wires Emulation System (IKOS).

Topology for Adapting Other Components

Many multi-FPGA systems include non-FPGA resources to provide more general-purpose solutions.

The MORRPH system - sockets next to the FPGAs allow arbitrary devices to be added to the array.

The G800 board - contains two FPGAs and four sockets.

Topology for Adapting Other Components

The COBRA system contains: based modules (expanding to 2D mesh), RAM modules, I/O modules, and bus modules.

The Springbok system: a pre-made daughter board that can hold an arbitrary device (on the top) and an FPGA (on the bottom). Daughter boards are mounted on a baseplate.

Topology for Adapting Other Components

The Quickturn systems - external component adapters.

The Aptix FPCB - a reprogrammable PCB.

Design Methodology for general-purpose configurable systems

[Figure: applications are mapped onto a host computer and a reprogrammable system.]

Typical Software Methodology for general-purpose configurable systems

[Figure: flow from the application spec. through analysis and system-level synthesis to a software spec. (code generation, object code) and a hardware spec. (hardware synthesis).]

Typical Software Methodology for general-purpose configurable systems

[Figure: hardware flow: hardware spec. => synthesis => partitioning & placement => pin assignment & routing => FPGA P & R => bit-stream files.]

Considerations for such complex software systems

Architectural-specific design tasks.

Design automation process.

The mapping time dominates the setup time for operating the system.

Run-time reconfigurability.

Design Specification and Languages for reconfigurable software systems

Standard software programming languages, e.g., C, C++, FORTRAN, and assembly language, vs. HDLs.

Standard software programming languages - a sequential execution model.

HDLs - a parallel execution model.

Who will use it, and which one is more suitable for system description?

Compilation Issues

Translate code from software languages into hardware without losing the inherent concurrency of hardware.

Compiler techniques for parallelizing code.

Straight-line code, control flow, and loops.

Transmogrifier C compiler.

System-level and High-level Synthesis

System-level design evaluation and analysis.

Design estimation.

Hardware-software partitioning.

Interface synthesis.

RTL synthesis.

Logic synthesis and technology mapping.

Partitioning and Placement

Topology-aware partitioning methods.

Partitioning onto a multi-FPGA system is equivalent to a placement problem.

Logic utilization and timing.

Pin Assignment and Routing

Pin assignment - the process of determining which I/O pins are used for each inter-FPGA signal.

Pin assignment for a pre-fabricated multi-FPGA system is equivalent to the global routing problem.

Pin assignment greatly affects the FPGAs' logic utilization and routability.

Run-Time Reconfigurability

Virtual hardware <=> virtual memory. What are their relations? Artificial intelligence, robotics, vision.

Hardware on demand.

What is the initial unconfigured structure? What are the reconfiguring methods?

Software supporting time-varying mapping.

Many open problems need to be solved in the forthcoming years.

This is a new issue in system design: how much of the processor is virtual, and when to reconfigure?
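The virtual hardware / virtual memory analogy can be made concrete with a small sketch: hardware configurations are treated like memory pages, and a fixed number of reconfigurable regions act as page frames. The least-recently-used replacement policy and the class name `ConfigCache` are assumptions chosen for illustration; the slides leave the reconfiguration policy as an open problem.

```python
from collections import OrderedDict


class ConfigCache:
    """Toy model of hardware on demand with LRU replacement of configurations."""

    def __init__(self, slots):
        self.slots = slots              # number of regions that can be loaded at once
        self.loaded = OrderedDict()     # configuration name -> True, oldest first
        self.reconfigurations = 0       # how many times a bitstream had to be loaded

    def request(self, name):
        """Ensure `name` is loaded; return True if a reconfiguration was needed."""
        if name in self.loaded:
            self.loaded.move_to_end(name)     # mark as most recently used
            return False
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)   # evict the least recently used
        self.loaded[name] = True
        self.reconfigurations += 1
        return True


if __name__ == "__main__":
    cache = ConfigCache(slots=2)
    for target in ["jeep", "tank", "jeep", "truck", "tank"]:
        cache.request(target)
    print(cache.reconfigurations, list(cache.loaded))
```

Each miss stands for a reconfiguration delay, so the question "when to reconfigure?" becomes the familiar one of keeping the miss count low for the expected request pattern.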

Applications: Splash 2

Stream-oriented systolic and SIMD applications.

Scalable linear array of 16 to 256 processing elements (each one XC4010 with 1/2 Mbyte of memory).

VHDL based.

Sequence comparison - 2300M:0.75M cell updates/sec (Splash 2:Sparc 10).

Edge detection - 10M:242K pixels/sec (Splash 2:Sparc 10).
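The two ratios above translate into speedup factors as follows. The sketch assumes M means 10^6 and K means 10^3, and that both figures of each pair use the same unit.

```python
def speedup(fast_rate, slow_rate):
    """Ratio of two throughput figures given in the same unit."""
    return fast_rate / slow_rate


# Sequence comparison: 2300M vs. 0.75M cell updates/sec (Splash 2 vs. Sparc 10).
sequence_speedup = speedup(2300e6, 0.75e6)   # about 3067x

# Edge detection: 10M vs. 242K pixels/sec (Splash 2 vs. Sparc 10).
edge_speedup = speedup(10e6, 242e3)          # about 41x

if __name__ == "__main__":
    print(round(sequence_speedup), round(edge_speedup, 1))
```

The gap between the two factors shows how strongly the benefit depends on the application.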

Applications: PAM (DEC)

Programmable Active Memory (PAM).

C++ based and mesh arrays of XC3090 (DECPeRLe-1).

Applications:
- Multiple precision arithmetic.
- RSA encryption.
- Video compression (JPEG, MPEG, DCT).
- High energy physics.
- Telecommunications.

Sources of some slides: Peter Alfke, Xilinx, Inc., peter.alfke@xilinx.com
