
Logic Emulation - Yonsei


YONSEI UNIVERSITY

Logic Emulation

Sungho Kang

Yonsei University

Computer Systems Lab. YONSEI UNIVERSITY

Outline

• Introduction
• Anatomy
• FPGA Architecture
• Emulation System
• Case Study
• Conclusion

Introduction: Challenges for CAD

• EDA industry
  - 0.35% of the electronics industry
  - It is a vital enabling technology
• CAD in service
  - Not a contest in inventing new algorithms
  - Co-think with designers
• Technology trend
  - Short product lifetime (time-to-market pressure)
  - Submicron processing
  - High degree of system complexity
  - Merging of CPU, DSP, communications, and consumer electronics
  - Embedded software: complicated even to specify
  - Reuse of hardware
  - Increased field programmability

Introduction: Logic Emulator

[Figure: Emulator (programmed through software) and Chip (manufactured in a fab), labeled REPLACEABLE]

Introduction: What is Emulation?

Turnkey rapid prototyping systems

• Read the user's design and automatically partition and map it to an array of FPGAs
• Enable the user to run at system level and verify with application software
• Full internal visibility for debug: thousands of probes
• Modify the design in minutes

[Figure labels: Logic Design, Compiler, Design Mapping, Emulator, Target system, CPU]

Anatomy: Design Flow for Emulation System

[Figure: flow chart] Design netlist → Design Import → Compile → System Setup → Download → Emulation (Logic Analyzer, Debugger)

Anatomy: Anatomy of an Emulator

[Figure callouts:]
• Emulation modules with FPGAs and crosspoint chip
• Special memory cards for mapping very complex or deep memory
• Instrumentation cards for debug
• Inter-card connection crossbar backplane to allow modular capacity addition
• Specialized add-on cards for cores
• Target interface hardware
• Cable for target system interface or debug

Anatomy: Emulator Architecture

• Hierarchical multiplexed architecture simplifies the design mapping process

[Figure labels: Automated Design Mapping, Mux Backplane, Emulation Module]

Anatomy: Definition

• A logic emulator is a system of
  - Programmable hardware with capacity much greater than one FPGA
  - Software which automatically programs the hardware according to a gate-level design representation
  - Software and hardware to support operation and analysis of the emulated design as a component in real hardware

Anatomy: System Overview: SW Components

• Design compiler
  - Netlist reader and parser: reads and parses gate-level design netlists
  - Technology mapper: maps design components into optimal emulator equivalents
  - System-level partitioner and placer: partitions the mapped design into boxes, boards, and ultimately into FPGA netlists
  - System-level interconnect router: determines the programming of interconnect hardware to complete nets cut by the partitioner
  - FPGA compiler: reads each FPGA netlist, then maps, partitions, places and routes the FPGA
  - Timing analysis (optional): analyzes the compiled design on emulation hardware for speed and hold violations
• Runtime download and analysis controller
• Graphical user interface
• Hardware diagnostics
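The hand-off between the partitioner and the interconnect router can be illustrated with a small sketch. It finds the nets that a partition cuts, that is, the nets whose gates land on more than one FPGA and which the system-level router would therefore have to complete. The netlist representation (`nets`, `assignment`) and the function name `cut_nets` are assumptions made for this example; they do not describe any particular emulator's software.

```python
def cut_nets(nets, assignment):
    """Return {net: sorted FPGA ids} for every net spanning more than one FPGA.

    nets:       hypothetical mapping of net name -> list of gate names on the net
    assignment: hypothetical mapping of gate name -> FPGA index chosen by the partitioner
    """
    cut = {}
    for net, gates in nets.items():
        fpgas = sorted({assignment[gate] for gate in gates})
        if len(fpgas) > 1:
            cut[net] = fpgas
    return cut


# Hypothetical four-gate design partitioned across two FPGAs.
nets = {"n1": ["g1", "g2"], "n2": ["g2", "g3"], "n3": ["g3", "g4"]}
assignment = {"g1": 0, "g2": 0, "g3": 1, "g4": 1}
crossing = cut_nets(nets, assignment)
print(crossing)
```

Only the nets returned here consume FPGA pins and interconnect resources; nets that stay inside one FPGA are handled by that FPGA's own place and route.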

Anatomy: System Overview: HW Components

• Logic emulation boards: FPGAs and interconnect chips
• Memory emulation boards: RAMs, FPGAs and interconnect
• System interconnect board: chips which interconnect emulation boards
• I/O connectors and pods: connect to in-circuit interfaces and external components
• Instrumentation: stimulus generator, logic analyzer, vector interface
• Controller: downloads configurations, operates instruments
• Interface: to host computer

Anatomy: Why Emulate?

• Concurrent design verification: faster time to market
• Higher predictability of schedule: reduced project risk
• Fewer design changes in final phases: improved quality
• Fewer silicon iterations: lower cost
• For mission-critical designs: high quality
• Simulation-only methodologies are limited by processing power in verifying complex designs: emulation gives better verification
• Emulation is the only verification methodology which is keeping up with system complexity

Anatomy: Need for System Level Emulation

• Emulation combines the flexibility of simulation and the realism of a prototype
  - Simulation: limited by the availability of software models
  - Custom prototyping: increased time to build and debug
• Leverages synthesis technology to optimize design differentiation and uniqueness
• Enables fast, incremental design changes that shorten design iteration cycles and improve quality
• Avoids costly respins of silicon and saves months of redesign
• Increases confidence in your entire project schedule and your ability to meet requirements

Anatomy: Design Verification Methodologies

• Simulation: allows observation of a fraction of real-world interactions (sequential)
• Emulation: enables the designer to explore alternatives (parallel)
  - Verification takes place in a hardware environment
  - The design is retargeted to a programmable hardware environment
• Rapid prototyping: allows operating speeds such that all interfaces to target applications can operate in real time
  - If the system runs in real time, the quality of the algorithm can be evaluated on the fly and DSP design time can be greatly reduced

[Figure labels: Stimuli → software model, software algorithm, workstation or accelerator]

Anatomy: Advantages of Emulation

• Emulation is the only verification methodology which is keeping up with system complexity

Time          Speed-up factor
1 second      10^7    ← Actual Hardware
10 seconds    10^6    ← Logic Emulation
2 minutes     10^5
16 minutes    10^4
3 hours       10^3
1 day         10^2
12 days       10^1
3 months      1       ← Software Simulations
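The rows of the table follow from dividing one baseline run time by the speed-up factor. The sketch below assumes a baseline of 10^7 seconds at speed-up factor 1 (roughly the "3 months" row); that baseline and the helper names are assumptions chosen because they approximately reproduce the rounded values on the slide.

```python
BASELINE_SECONDS = 10**7  # assumed run time at speed-up factor 1 (on the order of 3 months)


def run_time_seconds(speedup, baseline=BASELINE_SECONDS):
    """Wall-clock time of the same workload at a given speed-up factor."""
    return baseline / speedup


def humanize(seconds):
    """Render a duration in the largest unit that fits (days, hours, minutes, seconds)."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} seconds"


for exponent in range(7, -1, -1):
    print(f"speed-up 10^{exponent}: {humanize(run_time_seconds(10**exponent))}")
```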

Anatomy: Advantages of Emulation

• Emulation performance is not a function of design size

[Figure: "Time to Verify Design" curves for Simulator, Accelerator, Cycle-based Simulator, and Emulation, with the "Point of Emulation" and the "Deep Sub-micron Zone" marked]

Anatomy: Motivation: Verification Cycles

• Simulation is the most compute-intensive problem in EDA

Design sizes are growing

  - 2X design size → 2X work per vector
  - 2X design size → 2X vectors
  ⇒ 2X design size → 4X workload
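The 4X figure is the product of the two effects listed above. A minimal sketch, assuming that both work per vector and vector count scale linearly with design size (the function name is hypothetical):

```python
def relative_workload(size_factor):
    """Simulation workload relative to the original design, for a design size_factor times larger."""
    work_per_vector = size_factor  # each vector exercises a proportionally larger design
    vector_count = size_factor     # proportionally more vectors are needed to cover it
    return work_per_vector * vector_count


for factor in (1, 2, 4, 10):
    print(f"{factor}X design size -> {relative_workload(factor)}X workload")
```

Under these assumptions the workload grows with the square of design size, which is why simulation time becomes the bottleneck as designs grow.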

It’s inside the design cycle

[Figure: design-cycle flow chart with Design Capture, Simulate, Correct? (Yes / No), Edit Design, More Tests!]

Anatomy: Motivation: Verification Realism

• Often the chip meets its spec, but does not work in the system:
  - Spec errors, misunderstandings:
    * “I thought your chip was handling that…”
  - The real system puts designs into unanticipated situations:
    * Interaction between components across time and function: combinatorial explosion (e.g. the Ethernet driver interrupts a page fault which is servicing a floating point exception)
    * Other parts of the system don’t adhere to their specs, or their specs aren’t known: undocumented behavior in other devices, such as the CPU; peripherals from other projects or other vendors

Anatomy: Motivation: Verification Realism (cont.)

• Some applications need real-time operation for verification:
  - Displays are far easier to verify by actual observation
  - Closed-loop operation with analog hardware
  - Electro-mechanical controllers
  - Human perception: audio, video compression, processing
• Simulation generally requires test vector development:
  - Costly and difficult, on the critical path in the schedule
  - Verification depends on test vector correctness
  - Test vectors may have to be based on assumptions
  - Test vectors are intrinsically open-loop

Anatomy: Motivation: Verification Realism

• Only when the real design is running its real application in its real environment is correctness assured.
• An emulated design connected to actual hardware can run:
  - actual diagnostic code, compatibility tests
  - actual operating systems
  - actual applications
  - receiving real data from storage, sensors, devices
  - sending real data to storage, devices, displays

Anatomy: Motivation: Visibility

• Once a chip is fabricated, placed in a system, and fails:
  - Internal probing is impossible
  - It may be difficult or impossible to put the simulation into the failing state for analysis
• An emulated design can have internal probes programmed in, for direct connection to instrumentation
• An emulated design may be used to generate test vectors for fabrication

Anatomy: Rapid Prototyping

• Once the emulated design is debugged, it is available for immediate use by software developers.
  - This can directly reduce the project's critical path and time-to-market.
• The emulated design is available for demonstration to customers, users, management.
  - Proof of concept, proof of progress.
  - Find out early whether the result will be satisfactory.
• Architectural workbench:
  - Drive emulation with RTL-level synthesis.
  - Experiment with architectural features on real code and data:
    * structures, sizes, algorithms of caches, busses, buffers
    * quantity and design of functional units
    * novel architectures, representations, algorithms, etc.

Anatomy: Disadvantage of Logic Emulation

• A hardware emulation system is required
• Speed is 5-10X slower than real design speed
  - System emulation speeds of 1 to 4 MHz are common today
  - Target system must be slowed down for emulation
• Delays do not match those of the real design
  - Timing-induced errors are possible, that is, hold-time violations
  - Delay-dependent functionality may not operate correctly

FPGA Architecture: FPGA Architectures for Emulators

• Partitioning, placement and routing optimized for emulator performance
• FPGA-efficient interconnection technology (currently uses ~25% of FPGA logic gates)
• Interconnection of logic blocks, of multiple FPGAs, of multiple emulator modules
• Incremental design change
• Observability and controllability of the debug process
• Memory resource (separate memory or FPGA RAM)
• Clock lines (low skew, no setup and hold time violations)
• Lower cost (than silicon)

FPGA Architecture: Interconnect Problem

• It is critical to maximize gate capacity and speed by packing as much logic into each FPGA as possible.
• Interconnect hardware architecture must:
  - provide successful connectivity in all cases,
  - permit maximum logic utilization of the FPGAs,
  - with minimum added delay and skew,
  - at minimum hardware cost.
• Rent’s Rule applies:
  - Observation of Rent at IBM in the 1960s:
    * the pincount of an arbitrary subpart of a digital system is proportional to a fractional power of the gatecount:
      P = K * G ^ r
  - Example in high-performance systems:
      pins = 2.5 * gates ^ 0.56
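Rent's Rule as stated above can be evaluated directly. The sketch below uses the slide's example constants (K = 2.5, r = 0.56) as defaults; the function name and the sample gate counts are illustrative choices only.

```python
def rent_pins(gates, k=2.5, r=0.56):
    """Estimated pincount P = K * G^r for a subpart containing `gates` gates."""
    return k * gates ** r


for gates in (100, 1_000, 10_000):
    print(f"{gates:>6} gates -> about {rent_pins(gates):.0f} pins")
```

Because r is less than 1, doubling the gate count multiplies the pin demand by 2^0.56 rather than by 2, so pins grow more slowly than gates but never stop growing.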

FPGA Architecture: Interconnect Problem

• Commercial FPGAs are sized for engineered applications:
  - The designer is designing for FPGA structures.
  - The designer architects the system so that subparts fit into FPGA pincounts.
  - Vendors design FPGAs accordingly.
• Emulated designs are completely arbitrary:
  - Structures are not optimum for FPGAs.
  - FPGA size and pincount is arbitrary.
  - FPGA subparts are automatically extracted by the partitioner.
  - Result:
    * FPGAs are pin-limited, not gate-limited.
    * A logic emulator gets 20-30% as much gate utilization as ordinary FPGA applications.
  - There is a challenge for interconnect architecture and software to maximize gate utilization for a given FPGA pincount.
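One way to see why an FPGA becomes pin-limited is to invert Rent's Rule (P = K * G^r) and ask how many gates of an arbitrary partition fit behind a fixed pin budget. This is a sketch under stated assumptions: the Rent constants are the high-performance example values quoted earlier in the deck, and the FPGA gate and pin counts are hypothetical, not data for any real device.

```python
def max_gates_for_pins(pins, k=2.5, r=0.56):
    """Invert P = K * G^r: the largest partition whose pin demand fits in `pins`."""
    return (pins / k) ** (1.0 / r)


def gate_utilization(fpga_gates, fpga_pins, k=2.5, r=0.56):
    """Fraction of the FPGA's gates usable before the pin budget runs out."""
    usable = min(fpga_gates, max_gates_for_pins(fpga_pins, k, r))
    return usable / fpga_gates


# Hypothetical device: 10,000 gates and 200 user pins.
utilization = gate_utilization(fpga_gates=10_000, fpga_pins=200)
print(f"usable fraction of gates: {utilization:.2f}")
```

When the result is below 1, the partition runs out of pins before it runs out of gates, which is the pin-limited situation described above.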

FPGA Architecture: Field Programmable Interconnects

• Aptix FPIC
  - A place and route architecture (not a crossbar)
  - Routing delay not controllable
  - 940 user programmable I/Os
• IQ 160
  - 176 x 176 crossbar
  - Every port can be configured to connect to any port
  - Routing delay is predictable
• FPGAs
  - A place and route architecture
  - Routing delay not entirely controllable
  - Typically < 300 programmable I/Os

FPGA Architecture: Virtual Wires

• Increase bandwidth by multiplexing
  - > 80% gate utilization, but decreased emulation speed

[Figure: two ways of connecting FPGA #1 to FPGA #2. In one, each logical output drives a logical input over its own physical wire. In the other, the logical outputs pass through a mux and shift logic onto a shared physical wire, and shift logic in FPGA #2 recovers the logical inputs.]

FPGA Architecture: Virtual Wires

• An emulation clock period consists of multiple evaluation phases to evaluate combinational logic partitioned into FPGAs
• A phase is divided into two parts:
  - Evaluation portion
  - Communication portion: clocked with a pipeline clock

[Figure: timing diagram of the pipeline clock and the emulation clock; one emulation clock period spans Phase 1, Phase 2 and Phase 3, each containing an Eval portion]
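The communication portion of a phase can be pictured as shift logic that serializes several logical signals over one physical wire. The sketch below is a behavioral toy model, not a description of any vendor's implementation; the function names and the one-bit-per-pipeline-clock assumption are made up for illustration.

```python
import math


def pipeline_cycles(num_signals, num_wires):
    """Pipeline clocks needed to move all logical signals over the physical wires."""
    return math.ceil(num_signals / num_wires)


def send_over_wire(logical_outputs):
    """Sender-side mux and shift logic: place one logical output on the wire per pipeline clock."""
    for bit in logical_outputs:
        yield bit


def receive_from_wire(wire, num_signals):
    """Receiver-side shift logic: collect bits from the wire back into logical inputs."""
    shift_register = []
    for _ in range(num_signals):
        shift_register.append(next(wire))
    return shift_register


outputs = [1, 0, 1, 1]  # logical outputs of FPGA #1 after its evaluation portion
inputs = receive_from_wire(send_over_wire(outputs), len(outputs))
print(inputs, pipeline_cycles(len(outputs), num_wires=1))
```

The trade-off named on the slides is visible here: fewer physical wires are needed, but each phase spends extra pipeline clocks on communication, which lowers the emulation clock rate.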

FPGA Architecture: Partial Crossbar Interconnect

• Useful with the ability of FPGAs to freely assign pins
• Each small full crossbar chip is connected to the same subset of pins on each logic chip
• To configure the system, logic partitioning, placement, and routing are performed
• Placement is insignificant, since the interconnection is symmetrical
• The interconnect router is a simple, repetitive, table-driven task

Emulation System: MARS

• MARS system from PIE Design
  - Based on Xilinx XC4005 and XC4003
  - Partial crossbar system
  - 500K gates in one box
  - Emulated a 2-million-gate design

[Figure labels: Logic Block Module, DebugWare, Standalone Unit, CAE Workstations, Software, Target Board, Pods]

Emulation System: Realizer

• Automatically configures a network of FPGAs to implement large digital logic designs
• Logic and interconnections are separated to achieve optimum FPGA utilization
• Interconnection by partial crossbars reduces system-level complexity
  - Achieves bounded interconnection delay
  - Scales linearly with pin count
  - Allows hierarchical expansion in a fast and uniform way
• Applications
  - Prototyping for real-time verification and operation
  - High-speed simulation accelerator
  - VHDL-driven architecture workbench

Emulation System: Enterprise Emulation System

Enterprise system: 330K gates + 64 MB memory

• Precision Emulation Software™
• Electronically programmable backplane bus
• Debug environment and software
• Built-in logic analyzer and stimulus generator
• Memory debug software with Memory Emulation Modules
• Built-in workstation interface for co-simulation

[Figure: completely integrated and electronically connected modular system. Labels: Workstation (IBM RS6000, Sun, HP700), Keyboard, Pods, Plug Adapter, Target System]

Emulation System: Enterprise Emulation System

• Expandable system supports up to 11 independent Logic Emulation Modules (30,000 to 330,000 logic gates)
• System supports up to 32 Memory Emulation Modules (total 64 Mbytes, including multiport RAMs)
• Precision Emulation Software accurately synthesizes even the most complex design
• EDA system integration tools smooth the transition to emulation
• Fast, incremental design change capability shortens design iteration cycles

Emulation System: Enterprise Emulation System

[Figure: compilation flow. Labels: Gate-level Netlist Files; Libraries; Netlist Reader and Parser; Technology Mapper; System-Level Partitioner and Placer; System-Level Interconnect Router; FPGA Compiler (three instances); Interconnect Compiler; Timing Analysis (optional); Design Database; Configuration Files for Each Logic and Interconnect Chip]

Emulation System: System Realizer

• M6000:
  - Expandable up to 6 million emulation gates (24 * 250K-gate modules)
  - 4 ~ 8 MHz typical (M250); 1 ~ 4 MHz (M3000, M6000)
  - Modular package: M250 for 250K gates, M3000, M6000 (Xilinx 4013, custom interconnect chip)
  - Integrated multiplexing backplane, logic analyzer, and diagnostic processor
  - Direct RTL (Verilog and VHDL) to FPGA mapping
  - Fast design change

Emulation System: Aptix

• Rapid prototyping is made possible by
  - FPIC/FPCB
  - FPGA
  - Synthesis

[Figure: RAPID PROTOTYPE, combining Programmable Hardware (FPIC, FPCB), High Density FPGA, and Synthesis. Callouts:]
  - High-speed system operation; standard component integration
  - Re-configurable, automation; architectural target definition
  - Design style enforcement; large logic handling; software automation; library retargeting
  - Large logic capacity (20K); high speed feasible (30 MHz); re-configurable, automation

Emulation System: COBRA Prototyping Environment

• Connector: 90 pins/side, 75-bit link to two neighbors
• 4 base modules (one with RAM module)
• Supporting software
  - Partitioning, synthesis, hardware debugging software

[Figure: two groups of four Xilinx 4025 FPGAs, each group with a ROM CTRL block]

Emulation System: Prism

• Goal is to improve compute-intensive performance by
  - Extracting information during a program’s compilation
  - Synthesizing new operations to add to the instruction set
• PRISM 1: 10 MHz M68010 + 4 Xilinx 3090s
• PRISM 2: 33 MHz AM29050 + 3 Xilinx 4010s

[Figure: C constructs grouped by region]
• PRISM-I: int, char, short, long, if-else, for (fixed loop count)
• PRISM-II: for (variable loop count), while, do-while, switch-case, break, continue
• Correct C Syntax: double, float, struct, union, goto

Emulation System: Prism

[Figure: PRISM compilation flow. Labels: Program Source (C); GCC Front End: Parsing and Standard Optimization; Register Transfer Language; Program Flow Graph Generation; VHDL Generation; VHDL Designer; X-BLOCK Netlist Generation; Xilinx Tools (PPR etc.); Simulation; Hardware Images]

Emulation System: Virtual Computer

• Virtual Computing
• Virtual Computer
  - 52 Xilinx 4010
  - 24 ICUBE Field Programmable Interconnect Devices
  - 3 64-bit I/O ports (one configuration bus)
  - 8 Mbytes of 25 ns SRAM
  - 8K by 16-bit 25 ns dual-port RAM
• Hyper-scalability
  - X area of logic → throughput Y
  - 2X area of logic → throughput >= 2Y

Emulation System: Splash 2

• Based on Xilinx XC4010
• Scalable from 16 to 256 processing elements (1 ~ 16 boards)
• Higher I/O bandwidth (DMA from Sun SBus)
• Increased connectivity (linear data path + crossbar)
• Application programs in behavioral VHDL
• Array board:
  - 16 processing elements
  - 16 * 16 crossbar switch
  - 0.5 Mbyte memory
  - 36-bit bi-directional data paths

Case Study

[Figure: Equivalent Real-Time System Cycles (0.1 sec to 100,000 sec) versus Computer System Integration Stages (Chip, Chip Set, System Hardware, System Hardware & Software). Milestones: BIOS Self Test, Boot Operating System, System Level Diagnostics, Run Applications. Regions: Current Simulation Capability, Emulation Domain.]

Case Study: UltraSPARC Emulation

• Low-skew clock distribution
  - Vendor-supplied clock-tree analysis procedure
  - Gated clocks are redesigned to either remove the gating or pull the gating to the root to allow use of low-skew nets
• Possible problem areas
  - Gated clocks (e.g. scan control)
  - Feed-thrus (designed to reflect the silicon floorplanning)
    * Only actual direct connections are important
  - Repeaters: emulation does not need repeaters
  - Precharge logic and pass-thru latches must be redesigned into a static representation (careful verification required)

Case Study: UltraSPARC Emulation

• All reimplementations of megacells (due to internal clocking or memory) must be carefully verified for cycle-accurate functionality
• Takes 36 wall-clock hours using 5 fast +70 computers
  - 50 system-to-system cables
• Over 5000 nets routinely probed (128k depth)
• Return on emulation investment
  - Post-tapeout to pre-silicon: 25% (stress testing)

Case Study: PowerPC

• Demands:
  - PowerPC 603/604™ bus speed:
    * 2~4 times the speed of the external bus
  - Pipeline controller to run at 75 MHz
• CPLD is a better choice than FPGA
  - CPLD provides a higher number of pterms in each cell
  - The state bits are visible
  - CPLDs boast a very predictable delay path

Case Study: Design for Emulation

• Synchronous design
• Pipeline design techniques
• Short arithmetic functions (minimize logic levels)
• Minimize bit width
• Use of I/O FFs where possible (latches)
• Careful mapping between functions and FPGAs
• Minimize high-fanout nets or use available buses
• Use of small blocks (20-40K gates)
• Care when gating the clock tree (e.g. low power)
• Limit module I/O count

Case Study: Synthesis to Support Emulation

[Figure: flow chart. Labels: Model (LDS, proprietary HDL; subset of VHDL); Specification (detailed RTL); LOGIC SYNTHESIS; NetList; PLACEMENT, ROUTING; Layout; FUNCTIONAL ABSTRACTION; Extracted function; FORMAL VERIFICATION; TIMING ANALYSIS; CLOSE: Synthesis for Emulation; CLB-netlist; META Systems processing chain; EMULATOR; EMULATION; SYNTHESIS FOR EMULATION; MODEL DEBUGGING; PROTOTYPING]

Case Study: Application of Synthesis

• Efficient models to handle specific logic styles
  - Multiple clocks
  - Multiple-input latches
  - Tristate and precharged signals
• Efficient logic optimization
• Module generators for specific operators
• Control of visibility and controllability

Case Study: Setup-time Violation

• Too much delay in the data path: the receiving flip-flop is clocked too soon.
  - Remedy: slow down the clock speed

[Figure: two D flip-flops connected through slow logic and driven by a common clock, with a timing diagram (FF1 clock, FF2 Q out, FF2 D in, FF2 clock) marking the setup-time violation]
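The setup condition and its remedy can be written as a simple slack calculation. This sketch uses the standard textbook check (clock period must cover clock-to-Q delay, data-path logic delay, and setup time) and ignores clock skew; the parameter names and the nanosecond values are hypothetical.

```python
def setup_slack(clock_period, clk_to_q, logic_delay, setup_time):
    """Positive or zero slack means the data arrives in time; negative means a setup violation."""
    return clock_period - (clk_to_q + logic_delay + setup_time)


def min_clock_period(clk_to_q, logic_delay, setup_time):
    """Slowest-path remedy: the shortest clock period that removes the violation."""
    return clk_to_q + logic_delay + setup_time


# Hypothetical delays in nanoseconds for a slow logic path between FF1 and FF2.
slack = setup_slack(clock_period=100, clk_to_q=5, logic_delay=110, setup_time=3)
safe_period = min_clock_period(clk_to_q=5, logic_delay=110, setup_time=3)
print(slack, safe_period)
```

The clock period appears in the slack expression, which is why slowing the clock is enough to fix a setup-time violation.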

Case Study: Hold-time Violation

• Too much delay in the clock path with respect to the source’s clock: the receiving flip-flop is clocked too late. Slowing down the clock will not help.
• Delay must be added to the data path, or removed from the clock path

[Figure: two D flip-flops connected through fast logic, with a delay in the clock path to the receiving flip-flop, and a timing diagram (FF1 clock, FF2 Q out, FF2 D in, FF2 clock) marking the hold-time violation]
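A hold check compares the data-path delay against the hold time plus the extra clock-path delay at the receiving flip-flop. The sketch below uses the standard textbook form of that check; the parameter names and nanosecond values are hypothetical.

```python
def hold_slack(clk_to_q, data_delay, hold_time, clock_skew):
    """Negative slack means new data reaches FF2 before FF2 has finished capturing the old data.

    clock_skew: clock arrival at the receiving flip-flop minus arrival at the source flip-flop
    """
    return clk_to_q + data_delay - (hold_time + clock_skew)


def data_delay_to_add(clk_to_q, data_delay, hold_time, clock_skew):
    """Extra data-path delay needed to remove a hold violation (0 if there is none)."""
    return max(0, -hold_slack(clk_to_q, data_delay, hold_time, clock_skew))


# Hypothetical delays in nanoseconds: fast logic, late clock at the receiving flip-flop.
slack = hold_slack(clk_to_q=2, data_delay=1, hold_time=1, clock_skew=5)
padding = data_delay_to_add(clk_to_q=2, data_delay=1, hold_time=1, clock_skew=5)
print(slack, padding)
```

The clock period does not appear anywhere in the expression, which is why slowing down the clock cannot cure a hold-time violation; only the data delay or the clock-path delay can.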

Case Study: Gated Clocks

• Gated clocks are a major source of clock skew and hold-time violations.
• FPGA flip-flops often offer clock enables:
  - The clock enable control input controls whether a clock edge affects the flip-flop.
• FPGAs often offer low- or zero-skew clock distribution networks:
  - When a clock enable is used, the clock remains connected to the low-skew network.
  - Logic emulators have system-wide low-skew clock paths connecting FPGA low-skew clocks.
• The technology mapper should transform gated clocks into clock enables.
  - Hold-time violation is avoided by removing delay from the clock path.

Case Study: Gated Clock Optimization Method

• Special technique to eliminate hold-time violation possibilities, developed by Wei-Jin Dai, Lou Galbiati, and Dam Van Bui of Quickturn, used in Enterprise.

[Figure: a circuit with a gated clock and the transformed circuit. Labels: ff_1, ff_2, ff_n, combinational logic F, ff_d, CLOCK, ce]

Case Study: Local Clock Optimization May Exist

[Figure: A. Original circuit (CLOCK, dff1 with output q, dff2 with inputs D and CK and output Q) and B. Transformed circuit (dff2 with clock enable ce), with timing diagrams of Clock, q, D, CK and Q for the original and transformed circuits]

Case Study: Local Clock Optimization May Not Exist

[Figure: original circuit (CLOCK, dff1 with output q, dff2 with inputs D and CK and output Q) and B. Transformed circuit (non-equivalent; dff2 with clock enable ce), with timing diagrams of Clock, q, D, CK and Q for the original and transformed circuits]

Case Study: Transformation Condition Checking

[Figure: Original Circuit and Optimized Circuit. Labels: flip-flops with outputs q1 to q5; signals a, b, c, d, e; enables en1 and en2; data; CLOCK; CK; DFFd; qout]

Conclusion: Ideal Emulation

• Cheaper: $1.25/gate in 1994
• Easier to use (reduce time-to-emulation)
  - Automatic design transformation (synchronous design: setup/hold time, gated clock, wired logic, precharge logic)
  - Compiler (partitioning and mapping)
  - Hierarchical modular architecture
  - Target interface for complex multi-chip systems
• Powerful debug environment
  - Debugging environment for short debug turn-around time
  - Modular compilation for incremental change
  - Powerful logic analysis
• Scalability to handle increasing complexity
  - Modular and flexible packaging