From Recovering Time to Timing Recovery: Some Challenges
for the TAU Community
Andrew B. Kahng
Depts. of CSE and ECE, UC San Diego
[email protected]
http://vlsicad.ucsd.edu/~abk
TAU-2016 Keynote: “In Search of Lost Time”
• “Recovering Time”: machine learning, optimization, margin reduction, …
Agenda
• Motivations
Design Crises: Cost, Expertise, Unpredictability
• Quality: also not scaling
• Design Capability Gap
  • Available density: 2x/node
  • Realizable density: 1.6x/node
  • Figure: UCSD / 2013 ITRS
• Design cost: not scaling
  • Design, process roadmaps not coupled
  • Figure: Andreas Olofsson, DARPA, ISPD-2018 keynote
Design is Too Difficult!
• Tools and flows have steadily increased in complexity
• Modern P&R tool: 10000+ commands/options
• Hard to design with latest tools in latest technologies
• Even harder to predict quality, schedule
• Expert users required
• Increased cost and risk: not good for industry!
• Still have “CAD” mindset more than “DA” mindset
  • Again: assumes expert users
How do we escape this “local minimum”?
IDEA: No-Humans, 24-Hours
A. Olofsson, DARPA, ISPD-2018 keynote
• Part of DARPA Electronics Resurgence Initiative
• Traditional focus: ultimate quality
• New focus: ultimate ease of use
• No humans, 24-hour TAT = “equivalent scaling”
• Overarching goal: designer access to silicon
DARPA IDEA and POSH Programs, 2018-2022
https://vlsicad.ucsd.edu/NEWS18/dac_v5_DISTAR.pdf
theopenroadproject.org
OpenROAD: A New Design Paradigm
Quality Schedule Cost
Mindsets
• Achieve predictability from the user’s POV
• Use cloud/parallel to recover solution quality
• Focus on reducing time and effort = schedule, cost
Machine Learning is CENTRAL to this
24 hours, no humans – no PPA loss
Design Complexity
• Extreme partitioning
• Parallel optimization
• Machine learning of tools, flows
• Restricted layout
The OpenROAD Project
• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
• Limited “knobs”, restricted field of use
• Must replace intelligent humans (partition, floorplan, …)
Agenda
• Motivations
• OpenROAD + Initial Target
Initial Target: RTL-to-GDS Layout Generation
Logic Synthesis
Floorplan/PDN
Placement
Clock Tree Synthesis
Global and Detailed Routing
Layout Finishing
Verilog + .lib, .sdc, .lef
GDSII
• Inputs: .v, .sdc, .lib, .lef
  • .def, .spef in point tools
  • Config files required
  • Pre-characterizations required
• Outputs: post-route .def, timing/power estimates
• V1.0 release: June 2020
Placement https://github.com/abk-openroad/RePlAce
• RePlAce features
  • Timing-driven (OpenSTA timer integrated)
  • Mixed-size (macros + cells)
  • Electrostatics analogy in analytic placement
• RePlAce used in:
  • Physical synthesis
  • Floorplanning
  • Clock tree synthesis
  • Traditional standard-cell placement
• BSD-3 License
[Flow: .def from FP/PDN (+ .v, .sdc, .lef, .lib) → Placement → placed .def]
RePlAce: Routability-Driven Placement • Global routing during routability-driven global placement
[Figure: routability-driven loop]
Static Timing Analysis https://github.com/abk-openroad/OpenSTA
• OpenSTA: open-source static timing analysis tool
• Developer: James Cherry (Parallax Software)
• Tested with ASAP7, GF14, TSMC16, ST28, etc.
• GPLv3 license
Slack, WNS, TNS (28nm)

aes_cipher_top (28nm, 12T, clkp=1000ps)

                    WNS (ps)   TNS (ps)   #viol.
Signoff STA            -61       -289        7
OpenSTA (Arnoldi)      -57       -314        9

[Figure: Reg-to-Reg and Reg-to-Out / In-to-Reg slack histograms]
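For reference, the WNS/TNS/#viol. columns in tables like this follow directly from per-endpoint slacks. A minimal sketch (the slack values below are made up, not the aes_cipher_top data):

```python
def summarize_slacks(slacks_ps):
    """WNS = worst (most negative) endpoint slack; TNS = sum of negative
    slacks; #viol. = number of endpoints with negative slack."""
    violations = [s for s in slacks_ps if s < 0]
    return {"WNS": min(slacks_ps),
            "TNS": sum(violations),
            "num_viol": len(violations)}

# Made-up endpoint slacks in ps
print(summarize_slacks([-61, -57, 12, -40, 5, -131]))
# → {'WNS': -131, 'TNS': -289, 'num_viol': 4}
```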
Slack, WNS, TNS (16nm)

Coyote (16nm, 9T, clkp=2000ps)

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
Challenges for the TAU Community
• #1. Help improve open-source STA engine
  • In particular: OpenSTA
  • Delay calculation, SI analysis, advanced timing models, MCMM, …
  • Priorities = ?
• Will revisit:

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
The OpenROAD Project
• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
• Limited “knobs”, restricted field of use
• Must replace intelligent humans (partition, floorplan, …)
Agenda
• Motivations
• OpenROAD + Initial Target
• Machine Learning
ML in IC Design: Not Like Chess or Cat Pics
• Getting to self-driving IC design: not so obvious
  • Do recent ML successes transfer well?
  • A 3-week SP&R&Opt run is NOT like playing chess!
• Design lives in a {servers, licenses, schedule} box
  • Distributions of outcomes matter → cloud, parallel
• A “stack of models” is mandatory: predictions of downstream outcomes are also optimization objectives
• Still an uncharted road to self-driving tools and flows
  • How do we overcome “small, expensive data” challenges?
  • Standards: learning comes from {design + tool + technology}, all of which are highly proprietary
  • Need mechanisms for IP-preserving sharing of data and models
4 Stages of ML to Recover Time, Effort
Four Stages of Machine Learning
1. Mechanization and Automation
2. Orchestration of Search and Optimization
3. Pruning via Predictors and Models
4. From Reinforcement Learning through Intelligence
Huge space of tool, command, option trajectories through design flow
Stage 3. Modeling and Prediction
• Prediction of tool- and design-specific outcomes over longer and longer subflows
  • “Wiggling of longer and longer ropes”
• Enables pruning and termination → avoid wasted design resources
  • Simple way to think about it: “identify doomed X”
  • Doomed floorplan, Opt run, DRoute run, …
  • Allocate resources elsewhere
  • Better outcome within given resource budget
• Complementary dream: new heuristics and tools that are inherently more predictable and modelable → lessen chaos
  • Ensembles might be modeled/predicted
  • Prediction requirement might be relaxed → “get user into a ballpark”?
Generic Need: Predicting Doomed Runs
• NOTE: “doomed” is often with respect to timing, or due to fear of timing!
• Picture: progressions of #DR violations in a commercial router
• Simple approach: track and project metrics as time series
• Can use a Markov decision process (MDP): “GO” vs. “STOP” strategy card to terminate “doomed” runs early
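The “track and project metrics as time series” idea can be sketched with simple linear extrapolation. This is a toy stand-in for the MDP-based GO/STOP policy, and the violation counts below are hypothetical:

```python
def project_violations(history, horizon):
    """Linearly extrapolate a violation-count time series `horizon` steps ahead."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)  # average change per step
    return history[-1] + slope * horizon

def go_or_stop(history, steps_remaining, target=0):
    """'GO' if the run is projected to reach `target` violations in budget, else 'STOP'."""
    return "GO" if project_violations(history, steps_remaining) <= target else "STOP"

# Hypothetical #violations per router iteration
print(go_or_stop([900, 700, 500, 300], steps_remaining=2))  # → GO (still converging)
print(go_or_stop([900, 880, 870, 865], steps_remaining=2))  # → STOP (plateaued: doomed)
```

A real policy would also model run-to-run variance and the value of reallocating the freed resources, but the shape of the decision is the same.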
Obtaining Golden From Non-Golden
ML shifts the Accuracy-Cost Tradeoff Curve (for free)!
(Old) Example: ML-based Timer Correlation
[Flow diagram: artificial circuits drive one-time train/validate/test of models (path slack, setup time, stage, cell, wire delays); real and new designs feed incremental model updates when error > threshold (outlier data points).]

[Scatter plots: T2 path slack (ns) vs. T1 path slack (ns), BEFORE and AFTER ML modeling; timer divergence reduced from 123 ps to 31 ps (~4x reduction). DATE14, SLIP15]
Lately: Predicting PBA from GBA (ICCD18)
• PBA (path-based analysis) is less pessimistic than GBA (graph-based analysis)
• But PBA can have MUCH more expensive runtime!
• ML task: predict PBA timing from GBA timing
  • Improved quality of results in P&R, optimization
  • Less-expensive timing analysis usable earlier in flow
[Figure: GBA mode vs. PBA mode]
Bigram- and CART-based Modeling
• Bigram-based path modeling
• Classification and regression tree (CART) approach
• Model based on 13 bigram parameters
• Reduced GBA pessimism vs. PBA
https://vlsicad.ucsd.edu/Publications/Conferences/361/c361.pdf
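The CART idea can be illustrated with a one-level regression tree (a “stump”) on a single feature; the actual model uses 13 bigram parameters and deeper trees, and all training pairs below are invented:

```python
def fit_stump(xs, ys):
    """One-level CART regressor: pick the threshold on a single feature
    that minimizes total squared error of the two leaf means."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

# Invented (GBA slack, PBA slack) training pairs, in ns
gba = [-0.60, -0.55, -0.50, -0.20, -0.15, -0.10]
pba = [-0.45, -0.42, -0.40, -0.12, -0.10, -0.08]
predict = fit_stump(gba, pba)
print(round(predict(-0.58), 3))  # → -0.423
```

A full CART recursively applies this split to each leaf until a depth or sample-count limit is hit.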
Lately: Reduce #Corners in STA and Opt (DATE19)
• Want all the benefits of STA at N corners, but want to pay for analysis at only M << N corners
• “Missing corner prediction” (“matrix completion”) saves runtime, licenses
• “Primary corners” methodology → errors caught at signoff → cause iteration
“Missing Corners” = Matrix Completion
• STA at relatively few known corners → reasonably accurate prediction of timing at all unknown corners
• PCA: low-dimensional modeling problem
• Predicting missing delay values = matrix completion problem
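The matrix-completion view can be sketched with a rank-1 alternating-least-squares fit over a (paths x corners) delay matrix; the actual methodology (PCA-based, per the slide) is richer, and the delay values here are hypothetical:

```python
def complete_rank1(M, iters=300):
    """Fill missing entries (None) of matrix M with a rank-1 model
    M[i][j] ~ u[i] * v[j], fit by alternating least squares over the
    observed entries only."""
    n, m = len(M), len(M[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        for i in range(n):
            obs = [j for j in range(m) if M[i][j] is not None]
            u[i] = sum(v[j] * M[i][j] for j in obs) / sum(v[j] ** 2 for j in obs)
        for j in range(m):
            obs = [i for i in range(n) if M[i][j] is not None]
            v[j] = sum(u[i] * M[i][j] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(m)] for i in range(n)]

# Hypothetical path delays (ns): rows = paths, columns = corners; None = corner not run
delays = [[1.0, 2.0, 3.0],
          [2.0, 4.0, None],
          [3.0, None, 9.0]]
filled = complete_rank1(delays)
print(round(filled[1][2], 2))  # → 6.0
```

Real corner-to-corner delay structure is close to low-rank but not exactly rank-1, which is why a PCA-style low-dimensional model plus a few “anchor” corners is used in practice.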
Recent: Strong Design-Independent Models
[Plot: error vs. #corners on megaboom (990K instances, 350K FF); a model trained using richer artificial testcases shows a 10x improvement over one trained using initial artificial testcases]
Recent: “ML-LEAK” (leakage recovery predictor)
• ML to predict how much leakage will be recovered if user runs {Tweaker, Tempus ECO, PTSI ECO, homegrown script, …}
  • Gives expectation of post-recovery power
  • Beneficial to methodology team when trying out various DOEs
  • Saves time for implementation team: skip leakage recovery if it won’t help
• Blended model of design- and instance-level predictions gives best results
[Plot: actual vs. predicted percentage change in leakage power after recovery. Power recovered in this design was 0.076%; the model predicts 1% power recovery for this graph.]
Recent: STA Modeling Project Optimization
• TAU16 keynote: “pack tapeouts into design center” (ACM TODAES ’17)
• Today: “pack signoff STA runs into compute”
  • Peak memory mismatch: job dies, tapeout schedule compromised
  • Runtimes poorly estimated: tapeout schedule compromised
  • Poor packing: tapeout schedule compromised
• Two optimizations
  • ML to predict runtime, memory as function of resources (server, cores, cache, RAM, contentiousness, timer knobs, design, corner, …)
  • Scheduling/packing optimization (robust, incremental, …)
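The packing half of the problem can be sketched as greedy bin-packing over predicted (runtime, memory) pairs; in practice the pairs come from the trained predictors, and everything below (job names, capacities, policy) is illustrative:

```python
def pack_jobs(jobs, server_ram_gb):
    """Greedy packing of signoff-STA jobs onto servers.

    Each job is (name, predicted_runtime_h, predicted_peak_mem_gb).
    Jobs on one server run concurrently, so RAM adds up and the server
    finishes when its longest job does. Longest job first, onto the
    least-loaded server whose RAM still fits it; open a new server otherwise.
    """
    servers = []
    for name, runtime, mem in sorted(jobs, key=lambda j: -j[1]):
        fits = [s for s in servers if s["mem"] + mem <= server_ram_gb]
        if fits:
            s = min(fits, key=lambda s: s["load"])
        else:
            s = {"load": 0.0, "mem": 0.0, "jobs": []}
            servers.append(s)
        s["load"] = max(s["load"], runtime)  # concurrent: finish time = longest job
        s["mem"] += mem
        s["jobs"].append(name)
    return servers

# Hypothetical per-corner predictions for one design (names invented)
jobs = [("ss_0p72v", 6.0, 120.0), ("ff_0p88v", 2.0, 60.0),
        ("tt_0p80v", 3.0, 80.0), ("ss_0p72v_cworst", 7.0, 150.0)]
for s in pack_jobs(jobs, server_ram_gb=256):
    print(s["jobs"], s["load"], s["mem"])
```

A robust version would pad the memory prediction by its error bar, since an under-predicted peak kills the job rather than just slowing it down.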
Runtime, Memory Predictors: Not Trivial (!)
• Extensive DOEs ongoing (e.g., tool phases, contentiousness, run-to-run variation, …); interest/guidance from industry
[Plots: runtime and memory prediction results]
“Challenges for the TAU Community”
• #2. “TAU in service to …” a world of needed models
  • Timing analysis is a means to an end!
  • One stage’s model is another stage’s optimization objective
  • Compact LLE derates: diffusion breaks, gate cuts, coloring/mask order, … → ASP-DAC19 SDB-DDB: https://vlsicad.ucsd.edu/Publications/Conferences/366/c366.pdf
  • Compact dynamic IR drop impacts → DATE19 M1 power stapling
• #3. TAU introspection
  • “Features that ML models would want to use, provided by domain experts”
  • Optimization trajectories, timing graph topology, switching windows
  • (+ when layout info/costs available: congestion, legalization, etc.)
  • Contexts: leakage reduction, DVD fix, … (during next runs of block)
  • Customers want more: “Timing opt tools typically stop and report reasons why they can’t make further fixes or optimizations. It would be helpful if tools can continue to try out other options and present what-if results, i.e., automatically explore the solution space w.r.t. power, performance, runtime (e.g., cell displacement and additional ECO cycles).”
Agenda
• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS
ML in IC Design Requires Infrastructure!
• Support for ML in IC design
  • Standards for model encapsulation, model application, and IP preservation when models are shared
• Standard ML platform for EDA modeling
  • Design metrics collection, (design-specific) modeling, prediction of tool/flow outcomes
  • This recalls “METRICS” http://vlsicad.ucsd.edu/GSRC/metrics
• Datasets to support ML
  • Real designs, artificial designs and “eyecharts”
  • Shared training data, e.g., analysis correlation, post-route DRV prediction, sizer move trajectories and outcomes, …
  • Challenges and incentives: “Kaggle for ML in IC design”
“METRICS” [DAC00, ISQED01]
• METRICS (1999; DAC00, ISQED01): “Measure to Improve”
• Goal #1: Predict outcome
• Goal #2: Find sweet spot (field of use) of tool, flow
• Goal #3: Dial in design-specific tool, flow knobs
http://vlsicad.ucsd.edu/GSRC/metrics
Original METRICS Architecture
• Instrumentation of design tools:
  • Wrapper scripts to extract data from outputs and logfiles
  • Callable API codes that allow direct interaction from within the design tools
• METRICS server: central data collection (Oracle8i)
• Data mining process: analyzes existing data to improve existing design flow (CUBIST, etc.)
A Proposed METRICS 2.0 Architecture
White paper, WOSET-2018 (woset.org)
METRICS 2.0 Dictionary: Standard Naming
• JSON & MongoDB enable learning across the flow through cross-referencing
• Currently: sharing draft privately
  • https://github.com/The-OpenROAD-Project/METRICS-2.0
• Collaboration welcome! Email [email protected]
[Diagram: tool1 emits {"net_name": "n123", "length": 45}, tool2 emits {"net_name": "n123", "parasitics": 5}; both land in MongoDB]
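The cross-referencing idea on this slide (joining tool1's and tool2's JSON records on net_name) can be sketched without MongoDB as a plain dictionary merge:

```python
import json
from collections import defaultdict

def merge_metrics(*streams):
    """Join per-net metric records from different tools on the shared
    net_name key. METRICS 2.0 does this cross-referencing in MongoDB;
    here it is a plain in-memory merge."""
    merged = defaultdict(dict)
    for stream in streams:
        for line in stream:
            record = json.loads(line)
            merged[record["net_name"]].update(record)
    return dict(merged)

# The two records shown on the slide, as emitted by tool1 and tool2
tool1 = ['{"net_name": "n123", "length": 45}']
tool2 = ['{"net_name": "n123", "parasitics": 5}']
print(merge_metrics(tool1, tool2)["n123"])
# → {'net_name': 'n123', 'length': 45, 'parasitics': 5}
```

The standard dictionary of metric names is what makes the join meaningful: both tools must agree on the key and on what "length" and "parasitics" denote.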
METRICS 2.0++ (Grid, Federated, …)
• METRICS 2.0 can open entirely new worlds
  • METRICS + Grid Computing
  • Privacy-preserving Federated ML
Idea: Federated Learning (with METRICS)!
• Centralized
  • Storage and computation burden on the server
  • Exposure of METRICS data to the public domain
• Federated
  • Light server; distributed, spare-cycle-aware training
  • Data remains private
[Diagram: centralized vs. federated client-server training]
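A FedAvg-style round (clients fit locally on private data; only model weights, never the data, return to the server for averaging) can be sketched on a toy 1-D linear model. Everything below is synthetic:

```python
def local_step(weight, data, lr=0.1):
    """One local gradient step of a 1-D linear model y = w*x on private data."""
    grad = sum(2 * x * (weight * x - y) for x, y in data) / len(data)
    return weight - lr * grad

def federated_round(global_w, client_datasets):
    """One FedAvg-style round: each client trains locally, the server
    averages the returned weights; raw data never leaves a client."""
    local_ws = [local_step(global_w, data) for data in client_datasets]
    return sum(local_ws) / len(local_ws)

# Two 'design houses' whose private data is consistent with y = 2x
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(100):
    w = federated_round(w, clients)
print(round(w, 3))  # → 2.0
```

Production FedAvg weights the average by client dataset size and runs multiple local epochs per round; this sketch keeps one step per round for clarity.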
“Challenges for the TAU Community”
• #4. Contribute to METRICS 2.0 names, semantics in the timing and optimization space (see #3)
• #5. Contribute to development of standard methods to generate data for machine learning in/around IC design tools: artificial data, eyechart data, mutant data, obfuscated data, …
  • E.g., with provable privacy-preserving attributes, industry concurrence, …
• #6. Get out of comfort zone (= out of silo)
  • Sorry, but incremental/ECO for leakage, IR is still in the comfort zone
  • Must understand layout (detailed placement, especially) better
  • P&R tool should really NOT say this has zero violations:

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
Agenda
• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS
• Conclusions

Remember: (1) timing is now central to everything; (2) where there’s smoke, there’s fire (ML)
• Two sides of same coin: slack, margin, schedule all tied together
• What’s changed over the years?
  • Machine learning “inside and outside” (to reduce errors and margins, avoid runs, reduce iterations, …) on the way
  • Open source on the way
  • Stronger interactions (spatial, topological, temporal contexts) demand “going outside comfort zone” in a very broad sense
• Challenges for the TAU Community
  • Improve open-source STA engine
  • “TAU in service to X” models: LLE derates, dynamic IR impact, …
  • TAU introspection (features for ML modeling) (+ what-ifs)
  • Contribute to METRICS 2.0 names in timing, opt spaces
  • Standardized data generation (artificial, obfuscated, …)
  • Get out of comfort zone!
(Always happy to discuss, collaborate…)
“From Recovering Time to Timing Recovery”
THANK YOU !