
Addressing Exascale Emulation Debug Complexity

The case for a system-level approach

Luca Rastello
Application Engineer, Verification Group

© 2019 Synopsys, Inc.

Agenda

• Exascale Debug Challenges

• ZeBu Exascale Debug


Exascale Debug Challenges


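Emergence of Exascale Debug Complexity

[Figure: gates (up to 10^9) plotted against cycles (up to 10^9). Design scope grows from IP/subsystems to GPU, server, networking, mobile, and AI designs, where 10^9 gates × 10^9 cycles = 10^18 (exa) marks exascale debug complexity.]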


Exascale Debug Requires Higher Levels of Abstraction

Scope of Emulation Debug Abstraction

• Waveform-Level Debug: IP / Subsystem
• System-Level Debug: SoC + SW
• Application-Level Debug: SoC + SW + System

[Figure: the scope of debug grows from millions of gates and millions of cycles to billions of gates and billions of cycles. Two of the levels are marked "Today's Presentation".]


Real World Example for Exascale Debug Challenge

[Figure: boot timeline measured in CPU clocks (few millions, 1 billion, 2 billion)]

• Reset
• Boot OS (Bootloader, Kernel, Init, VM, etc.)
• System Services (Pwr, Batt, NetStat, Bluetooth, etc.)
• User Applications (Internet Browser)

• Root cause: cache coherency error
• Failure: OS hangs
• Traditional waveform depth: 1-2M samples, against a run of 2+ billion clocks

Problem: Failure happens far later than bug; root cause cannot be traced

Iterative waveform-level debug leads to long, unpredictable time to root cause


Why Waveform-level Debug is No Longer Enough

A parallel drawn from graph theory

• Iterative waveform-level debug is similar to Depth First Search
– In-depth (detailed) analysis of a node: waveform-based debug of a small window (1-2M cycles) of interest
– Node-to-node traversal: waveform-based debug of a different small window of interest
– Repeat "n" times for graph traversal, until the issue is root caused

• The number of iterations becomes too large for exascale debug problems
– Time to root cause increases exponentially with iterative waveform-level debug

Exascale debug requires higher level debug abstraction
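
To get a rough sense of scale, the numbers quoted in this deck (a run of 2+ billion clocks, a traditional waveform depth of 1-2M samples) can be turned into a window count. This is only a back-of-the-envelope sketch: it assumes one sample per clock and a worst case in which every window would have to be inspected. The function name is hypothetical and not part of any tool.

```python
def windows_to_cover(total_cycles: int, window_cycles: int) -> int:
    """Number of fixed-size waveform windows needed to cover a run end to end."""
    if window_cycles <= 0:
        raise ValueError("window_cycles must be positive")
    # Integer ceiling division, avoids floating point rounding.
    return -(-total_cycles // window_cycles)


RUN_CYCLES = 2_000_000_000  # "2+ billion clocks" from the example

for depth in (1_000_000, 2_000_000):  # "1-2M samples" waveform depth
    print(f"window of {depth:,} cycles -> {windows_to_cover(RUN_CYCLES, depth):,} windows")
```

Under these assumptions a single run spans on the order of 1,000 to 2,000 windows, which is why picking the window by trial and error does not scale.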


Why High Throughput Emulation Can Cause Non-Determinism

[Figure: typical emulation with multiple testbench (TB) interfaces, showing TB and DUT activity over time]

• Low throughput emulation
– DUT clock stops
– Throughput reduction by up to 5X

• High throughput emulation
– DUT clock doesn't stop
– Speedup by parallel TB and DUT execution

• Non-determinism in high throughput emulation
– TB transactions arrive at different times in subsequent runs
– Nearly impossible to reproduce failure!

[Figure: Regression Run: Failure. Debug Run #1: No Failure. Debug Run #2: No Failure.]

Deterministic failure reproduction is a key debug requirement for high throughput emulation
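
The effect can be illustrated with a toy model. It is a sketch under stated assumptions, not a description of how any emulator schedules transactions: two hypothetical testbench interfaces, A and B, send transactions at fixed cycles, and a host-side latency (made-up numbers) shifts when each one reaches a free-running DUT.

```python
from typing import List


def dut_arrival_cycles(send_cycles: List[int], host_latency: List[int]) -> List[int]:
    """Cycle at which each transaction reaches a DUT whose clock never stops."""
    return [sent + delay for sent, delay in zip(send_cycles, host_latency)]


def interleaving(arrivals_a: List[int], arrivals_b: List[int]) -> str:
    """Order in which the DUT sees transactions from interfaces A and B."""
    events = [(cycle, "A") for cycle in arrivals_a] + [(cycle, "B") for cycle in arrivals_b]
    return "".join(name for _, name in sorted(events))


SEND_A = [100, 200]  # hypothetical send cycles for interface A
SEND_B = [150, 250]  # hypothetical send cycles for interface B

# Regression run: both interfaces see the same small host latency.
regression = interleaving(dut_arrival_cycles(SEND_A, [5, 5]),
                          dut_arrival_cycles(SEND_B, [5, 5]))

# Debug run: the first transaction on A is delayed on the host side.
debug = interleaving(dut_arrival_cycles(SEND_A, [60, 5]),
                     dut_arrival_cycles(SEND_B, [5, 5]))

print(regression, debug)  # prints "ABAB BAAB"
```

The stimulus is identical in both runs, yet the DUT observes a different ordering, so a failure that depends on the ordering may not reappear.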


Why BGate SoCs Dramatically Slow Waveform-level Debug

Flow: Emulation → data dump → Raw Debug Data → expansion → Expanded Debug Data → load → Waveform Debugger

Scope            Raw debug data   Expansion   Expanded debug data   Load
IP / Subsystem   MB               mins        GB                    min
SoC + SW         GB               hour        GBs                   mins ➔ hr

SoC + SW flow: too slow

Fast waveform expansion and load time essential for rapid root cause of billion gate SoCs


Three Requirements for Exascale Debug

Challenge: Reduce time to root cause by eliminating iterative waveform-level debug
Requirement 1: Higher level analysis of entire design over entire billion cycle run
– Abstract analysis of entire graph to identify node (Breadth First Search)

Challenge: Deterministic bug reproduction in high throughput emulation
Requirement 2: Ability to reproduce failure deterministically in subsequent runs
– Consistent reproductions of the graph

Challenge: Scalability of waveform based debug for billion gate SoC
Requirement 3: Next generation tools for waveform-based analysis
– Detailed analysis of specific node


ZeBu Exascale Debug

System-level Abstraction

Deterministic Replay

Waveform Scalability


System-level Abstraction Debug with ZeBu

High level analysis of entire design over entire billion cycle run

• ZeBu offers streaming capability to extract system-level data
– Abstract logs (monitors/checkers) of all key interfaces
– Dump key signals
– Language features like assertions, system tasks, DPI etc.
– At-speed execution
– Infinite depth covering entire test run
– No throughput impact: emulation clock doesn't stop

• System-level data analysis to identify failure window
– Checkers for coarse grain search
– Monitors, key signal waveform for refining the window

[Figure: ZeBu Server 4 streams monitors, checkers, and key events into a system log. Checkers search at BCycles scale; monitors and key signals refine the window to MCycles scale.]

ZeBu system-level debug enables failure window identification in a single pass
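
The coarse-grain step can be sketched as a single scan over a streamed checker log. This is a minimal illustration, not the ZeBu log format or API: the `CheckerEvent` record, its fields, and the checker names are assumptions made for the example.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class CheckerEvent(NamedTuple):
    """Hypothetical system-log entry: one checker result at a given cycle."""
    cycle: int
    checker: str
    passed: bool


def failure_window(events: Iterable[CheckerEvent]) -> Optional[Tuple[int, int]]:
    """Bracket the first checker failure in one pass over the log.

    Returns (last cycle at which the same checker passed, failing cycle),
    or None when no checker fails. The start is 0 if that checker never passed.
    """
    last_pass: Dict[str, int] = {}
    for event in sorted(events, key=lambda e: e.cycle):
        if event.passed:
            last_pass[event.checker] = event.cycle
        else:
            return (last_pass.get(event.checker, 0), event.cycle)
    return None


log = [
    CheckerEvent(1_000_000, "coherency", True),
    CheckerEvent(900_000_000, "coherency", True),
    CheckerEvent(950_000_000, "bus", True),
    CheckerEvent(1_000_000_000, "coherency", False),
    CheckerEvent(2_000_000_000, "os_alive", False),
]
print(failure_window(log))  # prints (900000000, 1000000000)
```

In this made-up log the OS hang at 2 billion cycles is only the symptom; the earliest failing checker points at a window ending at 1 billion cycles, which monitors and key signals would then narrow further.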


Deterministic Error Reproduction in ZeBu

• ZeBu Record/Replay
– Applicable for testbench
– Eliminate testbench non-determinism
– In subsequent run, testbench is replayed

• ZeBu Save/Restore
– Applicable for DUT
– Eliminate the need to restart from time 0
– Run can start close to failure point

• Application
– Main run with stimuli recording
– DUT save during main run
– Restore, deterministic replay and debug data dump

[Figure: ZeBu Server 4 with testbench and DUT. Main run: stimuli recording (#0 ... #N ... #M) with DUT state saves at Time 0, Time N, and Time M, running past the root cause to test end. Debug run: DUT state restore at Time N, stimuli replay #N, and debug data dump.]

ZeBu record/replay and save/restore enable deterministic generation of failure debug data
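
The application flow above can be sketched as two small selection steps: pick the latest saved DUT state that precedes the failure window, then replay only the recorded stimuli from that point to the end of the window. The helper names, the `(cycle, payload)` stimulus format, and the inclusive boundaries are assumptions for illustration; they do not represent ZeBu commands.

```python
import bisect
from typing import List, Tuple


def pick_restore_point(save_cycles: List[int], window_start: int) -> int:
    """Latest saved DUT state at or before the start of the failure window."""
    saves = sorted(save_cycles)
    index = bisect.bisect_right(saves, window_start)
    if index == 0:
        raise ValueError("no saved DUT state at or before the window start")
    return saves[index - 1]


def stimuli_to_replay(recorded: List[Tuple[int, str]],
                      restore_cycle: int,
                      window_end: int) -> List[Tuple[int, str]]:
    """Recorded (cycle, payload) stimuli to re-apply after the restore."""
    return [(cycle, payload) for cycle, payload in recorded
            if restore_cycle <= cycle <= window_end]


# Hypothetical main run: periodic DUT saves and a recorded stimulus stream.
saves = [0, 500_000_000, 1_000_000_000, 1_500_000_000]
recorded = [(100, "a"), (600_000_000, "b"), (950_000_000, "c"), (1_200_000_000, "d")]

restore = pick_restore_point(saves, window_start=900_000_000)
replay = stimuli_to_replay(recorded, restore, window_end=1_000_000_000)
print(restore, replay)
```

With these example values the debug run restarts from the save at 500 million cycles instead of time 0 and replays two recorded stimuli, so the DUT sees exactly the same inputs as in the main run.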


10X Faster Expansion and Load with ZeBu and Verdi

Next generation tools for waveform-level debug

[Figure: ZeBu Server 4 sends raw debug signals in native ZeBu format over a high bandwidth interface (2TB/s) to Verdi, which produces expanded debug signals.]

• High performance parallel expansion: 750MGates, 1M cycles in <10min
• High performance interactive expansion: 2.5BGates, 500k cycles, each signal drop <1sec

Scalable solution for complex billion gate SoC waveform-level debug


ZeBu Exascale Debug

• Run billion cycle workloads at MHz speed
• Stream system-level data, record stimuli and save DUT states
• Analyze system-level data to identify failure window
• Deterministic rerun of failure window to dump million cycles of debug data
• High performance waveform expansion and debug in Verdi

[Figure: ZeBu emulation flow. Run: stream system-level data, record testbench stimuli, save DUT state. Analyze: select the DUT state and testbench stimuli. Restore and replay, then waveform expansion and Verdi debug.]

Fastest way to root cause bugs in complex BG SoCs with BCycle Workloads


Thank You