Part III
Logic Emulation
What is a Logic Emulation System?
1. Programmable hardware built from programmable logic devices (FPGAs) and programmable interconnect devices (PIDs).
2. Software that automatically programs the hardware according to the circuit under design.
3. Control HW/SW that supports operating the emulated design as a hardware component running in real time.
Target System
Typical Logic Emulation Environment
Workstation
Logic Emulator
Logic Module
Probe Module
In-circuit interface
Compiler, runtime software
Stimulus generator, logic analyzer
Why do we need Logic Emulation?
Design verification issues.
Real-time operation.
System-level testing.
Rapid prototyping.
Design Verification Issues
Simulation-based verification methods run out of steam as chip complexity grows.
Emulation is a verification technology that grows along with design size.
Real-Time Operation
Simulation requires test vector development, which is costly and difficult.
Verification depends on test vector correctness.
Certain applications must be verified in real time because they involve human perception: audio and video.
Emulation connected to actual hardware can run: real diagnostic code, operating systems, and applications.
System-Level Testing
Often the chip meets its specifications but fails in the system.
We have to verify the system-level interactions between the chip and other components, which are hard to formalize.
Internal probing is impossible once the chip is fabricated and placed in a system, but it is possible using emulation.
Rapid Prototyping
Once the emulated design is debugged, it is immediately available to software developers for software debugging.
The emulated design is available for demos and for experiments with the architecture on real applications and data.
Programmable Hardware includes programmable interconnect
[Figure labels: Programmable interconnect, Memory element, VLSI core, Interface, Logic element (x2)]
Considerations for programmable interconnect
The capacity of logic and interconnection depends on package constraints.
This forces a hierarchical system. Chips => boards => boxes => system
The interconnect structure must: 1. Provide successful connectivity, 2. Maximize FPGA utilization, and 3. Minimize delay and skew.
Rent’s rule applies to predict the interconnect needs.
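As a rough illustration of how Rent's rule predicts interconnect needs, the sketch below uses the usual form T = t * g^p, where T is the number of external terminals of a block, g the number of gates in it, t the average number of terminals per gate, and p the Rent exponent. The default values of t and p are illustrative assumptions, not figures from these slides.

```python
def rent_terminals(gates: int, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate the external terminals of a block: T = t * g**p (Rent's rule).

    t and p are assumed, illustrative values; real designs must be measured.
    """
    if gates <= 0:
        raise ValueError("gates must be positive")
    return t * gates ** p


def max_gates_for_pins(pins: int, t: float = 2.5, p: float = 0.6) -> float:
    """Invert Rent's rule: the gate count whose estimated pin demand equals `pins`."""
    if pins <= 0:
        raise ValueError("pins must be positive")
    return (pins / t) ** (1.0 / p)


if __name__ == "__main__":
    # Pin demand grows more slowly than gate count, but package pins grow slower still,
    # so the pin limit, not the logic capacity, often bounds what fits in one FPGA.
    for g in (1_000, 10_000, 100_000):
        print(g, round(rent_terminals(g)))
```

Under these assumptions, an FPGA is pin-limited whenever `max_gates_for_pins(available_pins)` is smaller than its logic capacity.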
Structures of Multi-FPGA Systems
Topologies:
- Mesh: nearest neighbor.
- Crossbar: full and partial.
Interconnect scheme:
- Circuit switched.
- Time multiplexed.
Nearest Neighbor Interconnection
FPGA FPGA FPGA
FPGA FPGA FPGA
FPGA FPGA FPGA
Advantages and Disadvantages of Nearest Neighbor Interconnection
Advantages:
- Uniform: all chips are the same.
- Easy to lay out on a PCB.
Disadvantages:
- Routing is easily blocked.
- The “through pins” limit the logic utilization of the FPGAs.
- Long and unpredictable delays.
- No natural hierarchical extension.
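The cost of through pins and the distance-dependent delay can be made concrete with a small sketch. It assumes a 2D mesh with (row, column) coordinates, shortest-path routing, and that every FPGA a signal merely passes through spends two I/O pins (one in, one out); the function name is hypothetical.

```python
from typing import Tuple


def mesh_route_cost(src: Tuple[int, int], dst: Tuple[int, int]) -> Tuple[int, int]:
    """Return (hops, through_pins) for a shortest path between two mesh positions.

    Assumption: each pass-through FPGA loses two I/O pins to the routed signal.
    """
    hops = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    pass_through_chips = max(hops - 1, 0)
    return hops, 2 * pass_through_chips


if __name__ == "__main__":
    print(mesh_route_cost((0, 0), (0, 1)))  # neighbors: (1, 0)
    print(mesh_route_cost((0, 0), (2, 2)))  # opposite corners of a 3 x 3 mesh: (4, 6)
```

The hop count varies with placement, which is one way to see why delays in a mesh are long and hard to predict.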
Nearest Neighbor Extensions
FPGA FPGA FPGA
FPGA FPGA FPGA
FPGA FPGA FPGA
Add more neighbors
Connect to non-neighbors
Advantages and Disadvantages of nearest-neighbor extended architectures
Advantages:
- More choices for the router, thanks to added diagonal lines & skip lines.
Disadvantages:
- More complex PCB.
- More complex routing software.
Partial Crossbar Interconnect
A B C D A B C D A B C D A B C D
A pins B pins C pins D pins
Logic blocks
Crossbars
Second-level crossbars
Partial Crossbar Interconnect
A partial crossbar consists of a set of small full crossbars, connected to the logic blocks but not to each other.
The I/O pins of each FPGA are divided into subsets. Each subset, taken across all FPGAs, is connected by one full crossbar circuit switch.
A partial crossbar is a potentially blocking network.
Characteristics of “Partial Crossbar Architecture”
The size of a partial crossbar is proportional to the number of FPGA pins.
All interconnections go through one crossbar chip in a one-level partial crossbar interconnect, or three in a two-level one, so delays are uniform and bounded.
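A toy model helps show both properties of the partial crossbar: every routed net passes through exactly one crossbar, and routing can still block. The class below is a minimal sketch, assuming a one-level partial crossbar in which crossbar x is wired to pin subset x of every FPGA and a simple first-fit router; the class and method names are hypothetical.

```python
from typing import Optional, Set


class PartialCrossbar:
    """Toy one-level partial crossbar: crossbar x sees pin subset x of every FPGA."""

    def __init__(self, num_fpgas: int, num_crossbars: int, pins_per_subset: int) -> None:
        # free[f][x] = unused pins of FPGA f that are wired to crossbar x
        self.free = [[pins_per_subset] * num_crossbars for _ in range(num_fpgas)]

    def route(self, fpgas: Set[int]) -> Optional[int]:
        """Route one net among `fpgas`; return the crossbar used, or None if blocked."""
        for x in range(len(self.free[0])):
            if all(self.free[f][x] > 0 for f in fpgas):
                for f in fpgas:
                    self.free[f][x] -= 1  # consume one pin per FPGA on this crossbar
                return x
        return None


if __name__ == "__main__":
    pc = PartialCrossbar(num_fpgas=3, num_crossbars=2, pins_per_subset=1)
    print(pc.route({0, 1}))  # 0
    print(pc.route({1, 2}))  # 1
    print(pc.route({0, 2}))  # None: both FPGAs have a free pin, but on different crossbars
```

In the last call each FPGA still has a spare pin, yet no single crossbar reaches both spare pins, which is the sense in which the network is potentially blocking.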
Mixed Full and Partial Crossbar
[Figure labels: FPGA (x6), Local FPIC (x3), Global FPIC (x2), External connections, Partial crossbar, Full crossbar]
Circuit Switched versus Time Multiplexed Interconnect Schemes
Trade-offs between the operating speed and the hardware cost.
Time-multiplexing method:
- Can greatly expand the available interconnect.
- Allows lower-cost IC packages and PCBs.
- Makes partitioning easier.
BUT:
- System power increases due to frequent signal switching (higher hardware cost).
- Complex scheduling software.
- Slow operating speed.
Virtual Wires
[Figure labels: FPGA, Physical wires, Logical outputs, Logical inputs, Mux, DeMux]
Virtual wires change space to time.
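The idea of trading space for time can be sketched in a few lines: several logical signals share one physical wire by taking turns in fixed time slots, with a mux on the sending FPGA and a demux on the receiving one. The function names, the schedule representation, and the clock figures below are illustrative assumptions, not details of any particular emulator.

```python
from typing import Dict, List


def mux(logical_outputs: Dict[str, int], schedule: List[str]) -> List[int]:
    """Sender side: drive the physical wire with one logical signal per time slot."""
    return [logical_outputs[name] for name in schedule]


def demux(wire_values: List[int], schedule: List[str]) -> Dict[str, int]:
    """Receiver side: latch each time slot into the matching logical input."""
    return dict(zip(schedule, wire_values))


def emulation_clock_mhz(wire_clock_mhz: float, slots: int) -> float:
    """One emulated cycle needs `slots` wire cycles, so the design clock is divided."""
    return wire_clock_mhz / slots


if __name__ == "__main__":
    schedule = ["a", "b", "c"]            # static slot order agreed by both FPGAs
    outputs = {"a": 1, "b": 0, "c": 1}
    on_wire = mux(outputs, schedule)      # [1, 0, 1] sent over a single pin
    print(demux(on_wire, schedule))       # {'a': 1, 'b': 0, 'c': 1}
    print(emulation_clock_mhz(30.0, 3))   # 10.0
```

One pin carries three signals here, at the price of a threefold slower emulation clock and the need for a schedule computed at compile time.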
Logic Emulation Systems and their interconnection schemes
System with mesh topology - Quickturn’s RPM and Virtual Machine Works (IKOS).
System with partial crossbar - Quickturn’s Enterprise, Mars, and System Realizer.
System with mixed full and partial crossbar - Aptix Prototyping System.
System using time-multiplexed interconnect - Virtual Machine Works (IKOS), CoBALT and Arkos (Quickturn).
Memory Solutions in Emulators and future devices/systems
Goal: programmable memories with different width/depth/port combinations.
FPGA-based memories:
- Inefficient use of logic resources.
- Timing correctness is difficult to ensure.
- Large or highly multi-ported memories must be partitioned across several FPGAs.
SRAMs with dedicated or programmable controllers.
Logic Emulation Design Flow
Pre-configuration preparation
Full-chip configuration
In-circuit emulation
HDL synthesis
Synthesis
Partitioning
System mapping
P & R
Design downloading
Emulators
Logic Emulation Design Compiler and its components
A logic emulation design compiler is a large and complex EDA tool that includes:
Front-end design importer.
HDL-based synthesizer.
Clock and timing analyzer.
Partitioner.
System-level placer and router.
FPGA-based placer and router.
Objectives of logic emulation compiler
Fast compilation time.
Fast emulation clock.
Timing correctness.
Easy ECO (Engineering Change Order).
Minimize circuit size.
Design Considerations for Logic Emulators
HDL synthesis: Trade-off between run time and quality. CLB-based vs. gate-based designs.
Clock and timing analysis: Timing correctness, hold-time violation free. Clock skew minimization.
Partitioning: Run time. Timing and area.
System placement and routing: Timing. Completeness of routing.
FPGA-based placement and routing: Fast run time. Parallel compilation.
Design Considerations for Logic Emulators
Remember: the logic you emulate is not the same as the logic of your design.
Hold-Time Violation
A hold-time violation occurs when the routing delay on the clock path exceeds the LUT delay on the data path.
D Q
CK
D Q
CK
LUT
CLB
Routing delay
Clock distribution problem (Skew)!!!
Timing Correctness
D Q
CK
D Q
CK
LUT
CLB
Routing delay
Delay element
Delay insertion
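A minimal sketch of the check behind delay insertion, assuming the standard hold condition for two flip-flops: data launched by the source flip-flop must reach the destination no earlier than the (skewed) clock edge plus the hold time. The parameter names and the numbers in the example are illustrative assumptions.

```python
def hold_slack(t_clk_to_q: float, t_data_path: float,
               clock_skew: float, t_hold: float) -> float:
    """Hold slack in ns; negative means a hold-time violation.

    clock_skew is how much later the clock reaches the destination flip-flop
    than the source flip-flop (for example, extra routing delay on the clock net).
    """
    return t_clk_to_q + t_data_path - clock_skew - t_hold


def delay_to_insert(t_clk_to_q: float, t_data_path: float,
                    clock_skew: float, t_hold: float) -> float:
    """Smallest extra data-path delay that removes the violation (0 if none)."""
    return max(0.0, -hold_slack(t_clk_to_q, t_data_path, clock_skew, t_hold))


if __name__ == "__main__":
    # Data path: 1.0 ns clock-to-Q plus 2.0 ns through the LUT; clock arrives 5.0 ns late.
    print(hold_slack(1.0, 2.0, 5.0, 0.5))       # -2.5: violation
    print(delay_to_insert(1.0, 2.0, 5.0, 0.5))  # 2.5 ns delay element on the data path
```

Inserted delay fixes hold time but lengthens the data path, so it trades against the emulation clock speed.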
Timing Correctness
D Q
CK
D Q
CK
LUT
CLB
Clock path
CE
Primary clock Low-skew net
Use clock enables for gated clocks
Methodology and components of Logic Emulator System
Pre-configuration preparation - prepare netlists and control files for configuration.
Testbed preparation - prepare emulation-based operation environment.
Full-chip configuration - download design to the emulator.
In-circuit emulation - test the design.
Pre-Configuration in Emulator System
Translate the leaf-cell libraries into emulation primitives.
Translated libraries must be verified for functional equivalence to original.
Modify and redesign some components to attain compatibility with emulation techniques, such as precharge logic circuits.
Assemble all the gate-level netlists for the entire design.
Testbed in Logic Emulator
Design and implement the target ICE board combining the emulated design with real hardware.
Slow down the testbed to emulation speed.
Assemble the testbed and emulation equipment.
Full-Chip Configuration & In-Circuit Emulation
Full-chip configuration: Prepare control files.
Partition the design to fit into the emulation system.
Download design into the system.
Verify that the emulation model faithfully implements the design as specified by RTL.
In-circuit emulation
Part IV
Reconfigurable Computing and Systems
General-Purpose Computing vs. Custom Computing
General-purpose computing - running applications on a general-purpose computer.
Custom computing - running applications on custom-made, application-specific hardware.
Field-programmable devices make this a reality.
Goals of Reconfigurable Computing
Tailor the architecture to the application.
Minimize or eliminate instruction interpretation.
Exploit fine-grained parallelism.
Map software to hardware.
Applications of reconfigurable computing
Database search and analysis.
Image processing and machine vision.
Data compression.
Signal processing.
Neural networks.
Biology computing.
Medical computing.
Design Automation (PSU).
Many more.
Multi-Mode Systems map various applications to a reconfigurable system
[Figure labels: ROM, Application 1, Application 2, Reconfigurable system]
• Different configurations for read & write operations of a tape driver (Honeywell).
• Different configurations for different printer controllers (Tektronix).
Run-Time Reconfiguration in military image recognition system
[Figure labels: Image data, I/O, Jeep?, Tank?, Truck?]
• Break single computation into multiple pieces.
• Page in components as needed (virtual hardware), e.g., automatic target recognition.
Custom Computing
Application-specific systems.
Numerous applications for similar reconfigurable systems.
Offers hardware performance with the flexibility to handle numerous algorithms.
Multi-FPGA systems can be viewed as hardware supercomputers.
Tell about DEC PeRLe
Reconfigurable Co-processors
[Figure labels: Processor, Coprocessor, Program 1, Inst1, Program 2, Inst2]
- Provide custom instructions on a per-application basis.
Types of Reprogrammable Systems
[Figure labels: CPU, Memory caches, I/O interface, Coprocessor, Attached processing unit, Standalone PU]
PU = processing unit
Three ways to attach custom computing units
Types of Reprogrammable Systems
Attached and standalone processing units are reprogrammable systems on computer add-on cards and in separate reprogrammable cabinets.
Consideration: a large communication overhead may overshadow the speed gain.
Application-specific coprocessors can achieve significant improvement over a wide range of applications.
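How communication overhead can cancel the speed gain is easy to see with a small Amdahl-style estimate. The sketch below assumes that a fraction of the software run time is moved to the reprogrammable unit, that this part runs `kernel_speedup` times faster there, and that data transfer adds a fixed time; all names and numbers are illustrative assumptions.

```python
def effective_speedup(t_software: float, kernel_fraction: float,
                      kernel_speedup: float, t_transfer: float) -> float:
    """Overall speedup when part of a program moves to reconfigurable hardware.

    t_software      : total run time on the host alone
    kernel_fraction : share of that time spent in the accelerated kernel (0..1)
    kernel_speedup  : how much faster the kernel runs in hardware
    t_transfer      : extra time spent moving data to and from the hardware
    """
    t_rest = t_software * (1.0 - kernel_fraction)
    t_kernel_hw = t_software * kernel_fraction / kernel_speedup
    return t_software / (t_rest + t_kernel_hw + t_transfer)


if __name__ == "__main__":
    # Tightly coupled coprocessor: negligible transfer time, large overall gain.
    print(round(effective_speedup(10.0, 0.9, 100.0, 0.0), 2))
    # Attached unit behind a slow bus: transfer time dominates, net slowdown.
    print(round(effective_speedup(10.0, 0.9, 100.0, 10.0), 2))
```

Under these assumptions the same 100x kernel yields roughly a 9x gain with free communication and a net slowdown once transfers take as long as the original program.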
Types of Reprogrammable Systems
Integrate the reprogrammable logic into the processor itself.
A reprogrammable functional unit can be configured on a per-algorithm basis.
This provides special-purpose instructions tailored to the needs of a given application.
Architectures of Multi-FPGA (Reconfigurable) Systems
The most commonly used topologies:
Mesh: 1D (linear array), 2D, and 3D.
Crossbar: full, partial, mixed, and hierarchical.
Hybrid between mesh and crossbar.
Application-specific architecture.
Hybrid Topology of a reconfigurable system
Splash 2: augments a linear array of FPGAs with a crossbar switch.
Goal: Supporting systolic circuits.
[Figure labels: FPGA with RAM (repeated), FPGA, 16 FPGAs, Ext. Interface (x2)]
Hybrid Topology
[Figure labels: FPGA and RAM (repeated), Host interface]
Anyboard: A linear array of FPGAs augmented by global buses.
Hybrid Topology
[Figure labels: 4 X 4 mesh of FPGAs, RAM (x4), Host interface]
DECPeRLe-1: a 4 X 4 mesh of FPGAs augmented with shared global buses.
Application-Specific Topology of MARC-1, one subsystem
[Figure labels: FPGA (x9), FPU, Memory, Connections to other FPGAs (numbered 1 to 5)]
The Marc-1: subsystem 1.
Application-Specific Topology of Marc-1, cont.
[Figure labels: 1, 2, 3, 4, 5, Subsystem 1 (x2)]
The Marc-1
• Application in circuit simulation, where the program to be executed can be optimized on a per-run basis.
• This is done for values that are constant within a run but may vary from dataset to dataset.
Application-Specific Topology
[Figure labels: FPGA and RAM (x9)]
The RM-nc system: neural network.
Architecture for Computer Prototyping
[Figure labels: FPGA (repeated), Cache memory, Register file, ALU, FPU, VME bus]
The Mushroom processor prototyping system.
Expandable Topologies
Hierarchical crossbar topology: can be expanded by adding an extra level. - Quickturn systems.
Expandable mesh topology: can be expanded by connecting individual boards to form a large mesh. - The Virtual Wires Emulation System (IKOS).
Topology for Adapting Other Components
Many multi-FPGA systems include non-FPGA resources to provide more general-purpose solutions.
The MORRPH system - has sockets next to the FPGAs that allow arbitrary devices to be added to the array.
The G800 board - contains two FPGAs and four sockets.
Topology for Adapting Other Components
The COBRA system contains:
based modules (expanding to 2D mesh), RAM modules, I/O modules, and bus modules.
The Springbok system: a pre-made daughter board that can hold an arbitrary device (on the top) and an FPGA (on the bottom). Daughter boards are mounted on a baseplate.
Topology for Adapting Other Components
The Quickturn systems - external component adapters.
The Aptix FPCB - a reprogrammable PCB.
Design Methodology for general-purpose configurable systems
Applications
Host computer
Reprogrammable system
Mapping
Typical Software Methodology for general-purpose configurable systems
Application spec.
Analysis System-level synthesis
Software spec.
Code generation
Object code
Hardware synthesis
Hardware spec.
Typical Software Methodology for general-purpose configurable systems
Hardware spec.
Synthesis
Partitioning & placement
Pin assignment & routing
FPGA P & R
Bit-stream files
Considerations for such complex software systems
Architecture-specific design tasks.
Design automation process.
The mapping time dominates the setup time for operating the system.
Run-time reconfigurability.
Design Specification and Languages for reconfigurable software systems
Standard software programming languages, e.g., C, C++, FORTRAN, and assembly language, vs. HDLs.
Standard software programming languages - a sequential execution model.
HDLs - a parallel execution model.
Who will use it, and which one is more suitable for system description?
Compilation Issues
Translate code from software languages into hardware without losing the inherent concurrency of hardware.
Compiler techniques for parallelizing code.
Straight-line code, control flow, and loops.
Transmogrifier C compiler.
System-level and High-level Synthesis
System-level design evaluation and analysis.
Design estimation.
Hardware-software partitioning.
Interface synthesis.
RTL synthesis.
Logic synthesis and technology mapping.
Partitioning and Placement
Topology-aware partitioning methods.
Partitioning onto a multi-FPGA system is equivalent to a placement problem.
Logic utilization and timing.
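One quantity any topology-aware partitioner has to track is how many I/O pins each FPGA needs for a given assignment of logic to chips. The sketch below assumes a netlist given as sets of cell names and a partition given as a cell-to-FPGA map, and charges one pin to every FPGA touched by a net that crosses a chip boundary; the function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def pins_per_fpga(nets: Iterable[Set[str]], part: Dict[str, int]) -> Counter:
    """Count the I/O pins each FPGA needs: one pin per cut net that touches it."""
    pins: Counter = Counter()
    for net in nets:
        fpgas = {part[cell] for cell in net}
        if len(fpgas) > 1:  # the net crosses a chip boundary
            for f in fpgas:
                pins[f] += 1
    return pins


if __name__ == "__main__":
    nets = [{"a", "b"}, {"b", "c"}, {"c", "d"}]
    part = {"a": 0, "b": 0, "c": 1, "d": 1}
    print(pins_per_fpga(nets, part))  # only net {b, c} is cut: one pin on each FPGA
```

A partition is feasible only if these counts stay within each FPGA's pin budget, which is why pin limits rather than logic capacity often decide the partition.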
Pin Assignment and Routing
Pin assignment - the process of determining which I/O pins are used for each inter-FPGA signal.
Pin assignment for a pre-fabricated multi-FPGA system is equivalent to the global routing problem.
Pin assignment greatly affects the FPGAs' logic utilization and routability.
Run-Time Reconfigurability
Virtual hardware <=> virtual memory. What are their relations? Artificial Intelligence, robotics. Vision.
Hardware on demand.
What is the initial unconfigured structure?
What are the reconfiguring methods?
Software supporting time-varying mapping.
Many open problems need to be solved in the forthcoming years.
This is a new issue in system design: how much of the processor is virtual, when to reconfigure?
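The analogy between virtual hardware and virtual memory can be illustrated with a toy manager that pages configurations in on demand. This is a sketch only: it assumes the device holds a fixed number of configurations at once and borrows a least-recently-used replacement policy from virtual memory; the class name, method names, and configuration names are hypothetical.

```python
from collections import OrderedDict


class ConfigurationCache:
    """Toy virtual-hardware manager: keeps at most `slots` configurations loaded."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.loaded: "OrderedDict[str, None]" = OrderedDict()
        self.reconfigurations = 0

    def use(self, config: str) -> bool:
        """Request a configuration; return True if a reconfiguration was needed."""
        if config in self.loaded:
            self.loaded.move_to_end(config)   # mark as most recently used
            return False
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)   # evict the least recently used one
        self.loaded[config] = None
        self.reconfigurations += 1
        return True


if __name__ == "__main__":
    cache = ConfigurationCache(slots=2)
    for name in ["jeep", "tank", "jeep", "truck", "tank"]:
        print(name, "reconfigure" if cache.use(name) else "already loaded")
    print(cache.reconfigurations)  # 4
```

Each miss costs a reconfiguration, so the question of when to reconfigure becomes a replacement-policy question much like page replacement.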
Applications: Splash 2
Stream-oriented systolic and SIMD applications.
Scalable linear array of 16 to 256 processing elements (1 XC4010 with 1/2 Mbyte).
VHDL based.
Sequence comparison - 2300M:0.75M cell updates/sec (Splash 2:Sparc 10).
Edge detection - 10M:242K pixels/sec (Splash 2:Sparc 10).
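The speedups implied by the two rate pairs above follow from simple division. The short calculation below assumes M means 10^6 and K means 10^3; the function name is hypothetical.

```python
def speedup(accelerated_rate: float, baseline_rate: float) -> float:
    """Ratio of two throughput figures expressed in the same unit."""
    return accelerated_rate / baseline_rate


# Sequence comparison: 2300M vs. 0.75M cell updates/sec (Splash 2 vs. Sparc 10).
sequence_speedup = speedup(2300e6, 0.75e6)
# Edge detection: 10M vs. 242K pixels/sec (Splash 2 vs. Sparc 10).
edge_speedup = speedup(10e6, 242e3)

if __name__ == "__main__":
    print(round(sequence_speedup))   # 3067
    print(round(edge_speedup, 1))    # 41.3
```

So the quoted figures correspond to roughly a 3000-fold gain for sequence comparison and about a 41-fold gain for edge detection.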
Applications: PAM (DEC)
Programmable Active Memory (PAM).
C++ based; mesh arrays of XC3090 (DECPeRLe-1).
Applications: Multiple precision arithmetic. RSA encryption. Video compression (JPEG, MPEG, DCT). High energy physics. Telecommunications.
Sources of some slides: Peter Alfke, Xilinx, [email protected]