From Recovering Time to Timing Recovery: Some Challenges for the TAU Community Andrew B. Kahng Depts. of CSE and ECE UC San Diego [email protected] http://vlsicad.ucsd.edu/~abk

From Recovering Time to Timing Recovery: Some Challenges for … · 2019-03-27 · From Recovering Time to Timing Recovery: Some Challenges for the TAU Community Andrew B. Kahng Depts


From Recovering Time to Timing Recovery: Some Challenges for the TAU Community

Andrew B. Kahng
Depts. of CSE and ECE, UC San Diego
[email protected]
http://vlsicad.ucsd.edu/~abk


TAU-2016 Keynote: “In Search of Lost Time”

• “Recovering Time”: machine learning, optimization, margin reduction, …


Agenda

• Motivations


Design Crises: Cost, Expertise, Unpredictability

• Quality: also not scaling
  • Design Capability Gap
  • Available density: 2x/node
  • Realizable density: 1.6x/node
  • Figure: UCSD / 2013 ITRS
• Design cost: not scaling
  • Design, process roadmaps not coupled
  • Figure: Andreas Olofsson, DARPA, ISPD-2018 keynote


Design is Too Difficult!

• Tools and flows have steadily increased in complexity
  • Modern P&R tool: 10000+ commands/options
• Hard to design with latest tools in latest technologies
  • Even harder to predict quality, schedule
  • Expert users required
  • Increased cost and risk: not good for industry!
• Still have “CAD” mindset more than “DA” mindset
  • Again: assumes expert users

How do we escape this “local minimum”?


IDEA: No-Humans, 24-Hours

A. Olofsson, DARPA, ISPD-2018 keynote

• Part of DARPA Electronics Resurgence Initiative
• Traditional focus: ultimate quality
• New focus = ultimate ease of use
• No humans, 24-hour TAT = “equivalent scaling”
• Overarching goal: designer access to silicon


DARPA IDEA and POSH Programs, 2018-2022

https://vlsicad.ucsd.edu/NEWS18/dac_v5_DISTAR.pdf


theopenroadproject.org


OpenROAD: A New Design Paradigm

Quality Schedule Cost

Mindsets
• Achieve predictability from the user’s POV
• Use cloud/parallel to recover solution quality
• Focus on reducing time and effort = schedule, cost

Machine Learning is CENTRAL to this

24 hours, no humans – no PPA loss

[Figure labels: Design Complexity; Extreme partitioning; Parallel optimization; Machine Learning of tools, flows; Restricted layout]


The OpenROAD Project

• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
  • Limited “knobs”, restricted field of use
  • Must replace intelligent humans (partition, floorplan, …)


Agenda

• Motivations
• OpenROAD + Initial Target


Initial Target: RTL-to-GDS Layout Generation

Flow (input: Verilog + .lib, .sdc, .lef; output: GDSII):

1. Logic Synthesis
2. Floorplan/PDN
3. Placement
4. Clock Tree Synthesis
5. Global and Detailed Routing
6. Layout Finishing

• Inputs: .v, .sdc, .lib, .lef
  • .def, .spef in point tools
  • config files required
  • pre-characterizations required
• Outputs: post-route .def, timing/power estimates
• V1.0 release: June 2020


Placement https://github.com/abk-openroad/RePlAce

• RePlAce features
  • Timing-driven (OpenSTA timer integrated)
  • Mixed-size (macros + cells)
  • Electrostatics analogy in analytic placement
• RePlAce used in:
  • Physical synthesis
  • Floorplanning
  • Clock tree synthesis
  • Traditional standard-cell placement
• BSD-3 License

[Figure: .def from FP/PDN (+ .v, .sdc, .lef, .lib) → Placement → Placed .def]


RePlAce: Routability-Driven Placement

• Global routing during routability-driven global placement

[Figure: Routability-driven loop]


Static Timing Analysis https://github.com/abk-openroad/OpenSTA

• OpenSTA: open-sourced static timing analysis tool
• Developer: James Cherry (Parallax Software)
• Tested with ASAP7, GF14, TSMC16, ST28, etc.
• GPLv3 license


Slack, WNS, TNS 28nm

aes_cipher_top (28nm, 12T, clkp=1000ps)

aes_cipher_top       WNS (ps)   TNS (ps)   #viol.
Signoff STA          -61        -289       7
OpenSTA (arnoldi)    -57        -314       9

[Figure panels: Reg-to-Reg; Reg-to-Out / In-to-Reg]


Slack, WNS, TNS 16nm

Coyote (16nm, 9T, clkp=2000ps)

            Signoff STA   OpenSTA
WNS (ns)    -0.660        -0.603
TNS (ns)    -1758.004     -1219.239
#viol.      8096          6926
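The WNS, TNS and #viol. columns in the comparisons above are all summaries of the same per-endpoint slack list, so a small sketch may help when comparing two timers. The function name and the example slacks are hypothetical, and the convention assumed here (WNS clipped at zero, TNS as the sum of negative endpoint slacks) is a common one, not something the slides state.

```python
from typing import Iterable, Tuple


def summarize_slacks(endpoint_slacks: Iterable[float]) -> Tuple[float, float, int]:
    """Return (WNS, TNS, number of violating endpoints).

    Assumed convention: WNS is the worst endpoint slack clipped at zero,
    TNS is the sum of all negative endpoint slacks.
    """
    negative = [s for s in endpoint_slacks if s < 0]
    wns = min(negative) if negative else 0
    tns = sum(negative) if negative else 0
    return wns, tns, len(negative)


# Hypothetical endpoint slacks in ps (not data from the slides).
example_slacks_ps = [-61, -20, 15, -5, 100]
print(summarize_slacks(example_slacks_ps))  # prints (-61, -86, 3)
```

Two timers can agree closely on WNS while diverging on TNS and #viol., because the latter two depend on every near-critical endpoint rather than on the single worst one.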


Challenges for the TAU Community

• #1. Help improve open-source STA engine
  • In particular: OpenSTA
  • Delay calculation, SI analysis, advanced timing models, MCMM, …
  • Priorities = ?
• Will revisit:

            Signoff STA   OpenSTA
WNS (ns)    -0.660        -0.603
TNS (ns)    -1758.004     -1219.239
#viol.      8096          6926


The OpenROAD Project

• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
  • Limited “knobs”, restricted field of use
  • Must replace intelligent humans (partition, floorplan, …)


Agenda

• Motivations
• OpenROAD + Initial Target
• Machine Learning


ML in IC Design: Not Like Chess or Cat Pics

• Getting to self-driving IC design: not so obvious
  • Do recent ML successes transfer well?
  • 3-week SP&R&Opt run is NOT like playing chess!
  • Design lives in a {servers, licenses, schedule} box
  • Distributions of outcomes matter → cloud, parallel
  • A “stack of models” is mandatory: predictions of downstream outcomes are also optimization objectives
• Still uncharted road to self-driving tools and flows
  • How do we overcome “small, expensive data” challenges?
  • Standards: learning comes from {design + tool + technology}, all of which are highly proprietary
  • Need mechanisms for IP-preserving sharing of data and models


4 Stages of ML to Recover Time, Effort

Four Stages of Machine Learning

1. Mechanization and Automation

2. Orchestration of Search and Optimization

3. Pruning via Predictors and Models

4. From Reinforcement Learning through Intelligence

Huge space of tool, command, option trajectories through design flow


Stage 3. Modeling and Prediction

• Prediction of tool- and design-specific outcomes over longer and longer subflows
  • Wiggling of longer and longer ropes
• Enables pruning and termination → avoid wasted design resources
  • Simple way to think about it: “identify doomed X”
  • Doomed floorplan, Opt run, DRoute run, …
  • Allocate resources elsewhere
  • Better outcome within given resource budget
• Complementary dream: new heuristics and tools that are inherently more predictable and modelable → lessen chaos
  • Ensembles might be modeled/predicted
  • Prediction requirement might be relaxed → “get user into a ballpark”?


Generic Need: Predicting Doomed Runs

• NOTE: “Doomed” often wrt timing, or due to fear of timing!!!
• Picture: progressions of #DR violations in commercial router
• Simple approach: track and project metrics as time series
• Can use Markov decision process (MDP): “GO” vs. “STOP” strategy card to terminate “doomed runs” early
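A minimal sketch of the "track and project metrics as time series" idea follows. It is not the MDP strategy card: it simply fits a straight line to log(1 + #DRVs) over the most recent router iterations and extrapolates to the last iteration. The function names, the window size, the DRV budget and the example histories are all assumptions made for illustration.

```python
import math
from typing import Sequence


def project_final_drvs(drv_history: Sequence[int], total_iterations: int,
                       window: int = 4) -> float:
    """Extrapolate log(1 + #DRVs) linearly from the last `window` iterations."""
    recent = list(drv_history[-window:])
    n = len(recent)
    first_index = len(drv_history) - n
    xs = [first_index + i for i in range(n)]
    ys = [math.log1p(v) for v in recent]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    spread = sum((x - x_mean) ** 2 for x in xs)
    if spread == 0:
        slope = 0.0  # a single sample gives no trend
    else:
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / spread
    projected_log = y_mean + slope * ((total_iterations - 1) - x_mean)
    return max(0.0, math.expm1(projected_log))


def decide(drv_history: Sequence[int], total_iterations: int, drv_budget: int) -> str:
    """Return "GO" if the projected final DRV count fits the budget, else "STOP"."""
    projected = project_final_drvs(drv_history, total_iterations)
    return "GO" if projected <= drv_budget else "STOP"


# Hypothetical DRV counts after each of the first four router iterations.
print(decide([10000, 5000, 2500, 1250], total_iterations=20, drv_budget=100))  # GO
print(decide([10000, 9000, 8800, 8750], total_iterations=20, drv_budget=100))  # STOP
```

A real strategy card would also weigh the cost of a false "STOP" against the compute saved, which is where the MDP formulation comes in.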


Obtaining Golden From Non-Golden

ML shifts the Accuracy-Cost Tradeoff Curve (for free) !


(Old) Example: ML-based Timer Correlation

[Flow diagram labels: Artificial Circuits; Train, Validate, Test; New Designs; Real Designs; MODELS (path slack, setup time, stage, cell, wire delays); if error > threshold: outliers (data points); ONE-TIME; INCREMENTAL]

[Scatter plots: T2 Path Slack (ns) vs. T1 Path Slack (ns), both axes from -0.6 to 0.1. BEFORE ML modeling: 123 ps; AFTER ML modeling: 31 ps (~4× reduction)]

DATE14, SLIP15
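The simplest possible stand-in for timer correlation is a one-variable least-squares fit that maps timer T1's path slack onto timer T2's. The cited work models path slack, setup time, stage, cell and wire delays with richer learners; the sketch below is only meant to show the before/after measurement, and its data and function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a * x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_mean - a * x_mean


def max_abs_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Largest absolute disagreement between two slack lists."""
    return max(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical path slacks (ns) for the same four paths in two timers.
t1_slack = [-0.50, -0.30, -0.10, 0.10]
t2_slack = [-0.57, -0.35, -0.13, 0.09]

a, b = fit_linear(t1_slack, t2_slack)
before = max_abs_error(t1_slack, t2_slack)
after = max_abs_error([a * s + b for s in t1_slack], t2_slack)
print(f"max divergence before: {before:.3f} ns, after: {after:.3f} ns")
```

In practice the model would be trained on one set of paths and evaluated on another, following the train, validate, test split shown in the flow diagram.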


Lately: Predicting PBA from GBA (ICCD18)

• PBA (Path-Based Analysis) is less pessimistic than GBA (Graph-Based Analysis)
  • But, can have MUCH more expensive runtime!
• ML task: Predict PBA timing from GBA timing
  • Improved quality of results in P&R, optimization
  • Less-expensive timing analysis usable earlier in flow

[Figure panels: GBA Mode; PBA Mode]


Bigram- and CART-based Modeling

• Bigram-based path modeling
• Classification and regression tree (CART) approach
• Model based on 13 bigram parameters

[Figure: Reduced GBA pessimism vs. PBA]

https://vlsicad.ucsd.edu/Publications/Conferences/361/c361.pdf
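To make "bigram-based path modeling" concrete, the sketch below treats a timing path as a sequence of stages and counts each pair of consecutive stages. The cell names, the vocabulary and the idea of feeding raw bigram counts to a regression tree are illustrative assumptions; the 13 bigram parameters of the actual model are defined in the linked paper, not here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Bigram = Tuple[str, str]


def path_bigrams(stages: Sequence[str]) -> Counter:
    """Count each pair of consecutive stages along one timing path."""
    return Counter(zip(stages, stages[1:]))


def bigram_features(stages: Sequence[str], vocabulary: Sequence[Bigram]) -> List[int]:
    """Fixed-length feature vector: one count per bigram in the vocabulary."""
    counts = path_bigrams(stages)
    return [counts[bigram] for bigram in vocabulary]


# Hypothetical launch-to-capture path, written as cell types.
path = ["DFF", "NAND2", "INV", "NAND2", "INV", "DFF"]
vocabulary = [("DFF", "NAND2"), ("NAND2", "INV"), ("INV", "NAND2"), ("INV", "INV")]
print(bigram_features(path, vocabulary))  # prints [1, 2, 1, 0]
```

A vector like this, together with the GBA slack of the path, could then be handed to any regression-tree learner to predict the PBA slack.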


Lately: Reduce #Corners in STA and Opt (DATE19)

• Want all the benefits of STA at N corners, but want to pay for analysis at only M << N corners
• “Missing Corner Prediction” (“matrix completion”) saves runtime, licenses
• “Primary corners” methodology: errors caught at signoff cause iteration


“Missing Corners” = Matrix Completion

STA at relatively few known corners → reasonably accurate prediction of timing at all unknown corners

PCA: low-dimensional modeling problem

Predicting missing delay values = matrix completion problem
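As a toy illustration of the matrix-completion view, the sketch below stores delays in a paths × corners matrix with `None` for corners that were not analyzed, fits a rank-1 model d[i][j] ≈ u[i]·v[j] by alternating least squares over the observed entries, and fills in the rest. Rank 1 is the simplest low-dimensional model and is an assumption made for brevity; the delay values are synthetic, and real data would need a higher rank and regularization.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def complete_rank1(delays: Matrix, iterations: int = 200) -> List[List[float]]:
    """Fill missing entries with a rank-1 fit found by alternating least squares."""
    rows, cols = len(delays), len(delays[0])
    u = [1.0] * rows  # one factor per path
    v = [1.0] * cols  # one factor per corner
    for _ in range(iterations):
        for j in range(cols):  # update corner factors with path factors fixed
            seen = [i for i in range(rows) if delays[i][j] is not None]
            norm = sum(u[i] ** 2 for i in seen)
            if norm > 0:
                v[j] = sum(u[i] * delays[i][j] for i in seen) / norm
        for i in range(rows):  # update path factors with corner factors fixed
            seen = [j for j in range(cols) if delays[i][j] is not None]
            norm = sum(v[j] ** 2 for j in seen)
            if norm > 0:
                u[i] = sum(v[j] * delays[i][j] for j in seen) / norm
    return [[delays[i][j] if delays[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]


# Synthetic delays (ns): path 0 analyzed at all four corners, paths 1 and 2 at two.
observed: Matrix = [
    [1.0, 1.2, 1.5, 0.8],
    [2.0, 2.4, None, None],
    [3.0, 3.6, None, None],
]
completed = complete_rank1(observed)
print([round(x, 3) for x in completed[2]])
```

Because the synthetic matrix is exactly rank 1, the missing entries are recovered essentially exactly here; on real timing data the interesting question is how the error grows as fewer corners are analyzed.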


Recent: Strong Design-Independent Models

[Plot: Error vs. # Corners for megaboom (990K instances, 350K FF); curves: trained using initial artificial testcases, trained using richer artificial testcases]

10X improvement!!


Recent: “ML-LEAK” (leakage recovery predictor)

• ML to predict how much leakage will be recovered if user runs {Tweaker, Tempus ECO, PTSI ECO, homegrown script, …}
  • Gives expectation of post-recovery power
  • Beneficial to methodology team when trying out various DOEs
  • Saves time for implementation team: skip leakage recovery if it won’t help
• Blended model of design- and instance-level predictions gives best results

[Plot: actual vs. predicted percentage change in leakage power after recovery. Power recovered in this design was 0.076%. Our model predicts 1% power recovery for this graph.]


Recent: STA Modeling Project Optimization

• TAU16 keynote: “pack tapeouts into design center” (ACM TODAES ’17)
• Today: “pack signoff STA runs into compute”
  • Peak memory mismatch: job dies, tapeout schedule compromised
  • Runtimes poorly estimated: tapeout schedule compromised
  • Poor packing: tapeout schedule compromised
• Two optimizations
  • ML to predict runtime, memory as function of resource (server, cores, cache, RAM, contentiousness, timer knobs, design, corner …)
  • Scheduling/packing optimization (robust, incremental, …)
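Once peak memory has been predicted per STA run, the packing half of the problem resembles bin packing. The sketch below uses first-fit decreasing, a standard heuristic chosen here only for illustration; the job names, memory figures and server size are hypothetical, and it ignores runtime, cores and robustness to prediction error.

```python
from typing import Dict, List


def pack_sta_jobs(predicted_peak_gb: Dict[str, int], server_ram_gb: int) -> List[List[str]]:
    """First-fit decreasing: put each job on the first server with enough free RAM."""
    free_gb: List[int] = []            # remaining RAM on each opened server
    assignment: List[List[str]] = []   # job names on each opened server
    ordered = sorted(predicted_peak_gb.items(), key=lambda item: item[1], reverse=True)
    for job, peak in ordered:
        if peak > server_ram_gb:
            raise ValueError(f"{job} needs {peak} GB, more than one server provides")
        for index, free in enumerate(free_gb):
            if peak <= free:
                free_gb[index] -= peak
                assignment[index].append(job)
                break
        else:  # no opened server has room, so open a new one
            free_gb.append(server_ram_gb - peak)
            assignment.append([job])
    return assignment


# Hypothetical predicted peak memory (GB) for five corner runs on 100 GB servers.
jobs = {"ss_125c": 60, "ff_m40c": 50, "tt_25c": 40, "ss_m40c": 30, "ff_125c": 20}
print(pack_sta_jobs(jobs, server_ram_gb=100))
# prints [['ss_125c', 'tt_25c'], ['ff_m40c', 'ss_m40c', 'ff_125c']]
```

If a prediction is too low, the co-located jobs can exceed server RAM and die, which is the "peak memory mismatch" failure listed above; a robust variant would pack against the prediction plus a margin.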


Runtime, Memory Predictors: Not Trivial (!)

• Extensive DOEs ongoing (e.g., tool phases, contentiousness, run-to-run variation, …); interest/guidance from industry

[Plots: Runtime; Memory]


“Challenges for the TAU Community”

• #2. “TAU in service to …” a world of needed models
  • Timing analysis is a means to an end!
  • One stage’s model is another stage’s optimization objective
  • Compact LLE derates: diffusion breaks, gate cuts, coloring/mask order, ... (ASP-DAC19 SDB-DDB: https://vlsicad.ucsd.edu/Publications/Conferences/366/c366.pdf)
  • Compact dynamic IR drop impacts (DATE19 M1 power stapling)
• #3. TAU introspection
  • “Features that ML models would want to use, provided by domain experts”
  • Optimization trajectories, timing graph topology, switching windows
  • (+ when layout info/costs available: congestion, legalization, etc.)
  • Contexts: leakage reduction, DVD fix, … (during next runs of block)
  • Customers want more: “Timing opt tools typically stop and report reasons why they can’t make further fixes or optimizations. It would be helpful if tools can continue to try out other options and present what-if results, i.e., automatically explore solution space w.r.t. power, performance, runtime (e.g., cell displacement and additional ECO cycles).”


Agenda

• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS


ML in IC Design Requires Infrastructure!

• Support for ML in IC design
  • Standards for model encapsulation, model application, and IP preservation when models are shared
• Standard ML platform for EDA modeling
  • Design metrics collection, (design-specific) modeling, prediction of tool/flow outcomes
  • This recalls “METRICS” http://vlsicad.ucsd.edu/GSRC/metrics
• Datasets to support ML
  • Real designs, artificial designs and “Eyecharts”
  • Shared training data – e.g., analysis correlation, post-route DRV prediction, sizer move trajectories and outcomes, …
  • Challenges and incentives: “Kaggle for ML in IC design”


“METRICS” [DAC00, ISQED01]

• METRICS (1999; DAC00, ISQED01): “Measure to Improve”
  • Goal #1: Predict outcome
  • Goal #2: Find sweet spot (field of use) of tool, flow
  • Goal #3: Dial in design-specific tool, flow knobs

http://vlsicad.ucsd.edu/GSRC/metrics


Original METRICS Architecture

• Instrumentation of design tools:
  • Wrapper scripts to extract data from outputs and logfiles
  • Callable API codes that allow direct interaction from within the design tools
• METRICS server: central data collection (Oracle8i)
• Data mining process: analyzes existing data to improve existing design flow (CUBIST, etc.)


A Proposed METRICS 2.0 Architecture

White paper, WOSET-2018, woset.org


METRICS 2.0 Dictionary Standard Naming

• JSON & MongoDB enable learning across the flow through cross referencing
• Currently: sharing draft privately
  • https://github.com/The-OpenROAD-Project/METRICS-2.0
  • Collaboration welcome! email to [email protected]

[Figure: tool1 emits {"net_name":"n123", "length":45} …; tool2 emits {"net_name":"n123", "parasitics":5} …; both feed MongoDB]
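The cross-referencing idea can be sketched without a database: records from different tools that share a standard key are merged into one view per net. The snippet uses only Python's `json` module and an in-memory dictionary as a stand-in for MongoDB; the one-record-per-line layout, the second net `n7` and the function name are assumptions, while the `net_name`, `length` and `parasitics` fields come from the slide.

```python
import json
from collections import defaultdict
from typing import Dict


def cross_reference(*tool_outputs: str) -> Dict[str, dict]:
    """Merge JSON records (one per line) from several tools, keyed by net_name."""
    merged: Dict[str, dict] = defaultdict(dict)
    for output in tool_outputs:
        for line in output.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            merged[record["net_name"]].update(record)
    return dict(merged)


tool1_output = """
{"net_name": "n123", "length": 45}
{"net_name": "n7", "length": 12}
"""
tool2_output = """
{"net_name": "n123", "parasitics": 5}
"""

nets = cross_reference(tool1_output, tool2_output)
print(nets["n123"])  # prints {'net_name': 'n123', 'length': 45, 'parasitics': 5}
```

The merge only works because both tools agree on the key name and its meaning, which is the point of a standard naming dictionary.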


METRICS 2.0++ (Grid, Federated, …)

• METRICS2.0 can open entirely new worlds
  • METRICS + Grid Computing
  • Privacy-preserving Federated ML


Idea: Federated Learning (with METRICS)!!!

• Centralized
  • Storage and computation needed on server
  • Exposure of METRICS to public domain
• Federated
  • Light server, distributed, spare cycle-aware training
  • Data remains private

[Figure: Client/Server diagrams for Centralized and Federated]
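A minimal federated-averaging sketch shows why the data can stay private: each client takes a few gradient steps on its own data, and only the resulting model weight travels to the server, which averages the weights in proportion to sample counts. The one-parameter model y = w·x, the learning rate, the round count and the client data are all assumptions chosen to keep the example short.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (feature, target)


def local_update(w: float, data: Sequence[Sample], lr: float = 0.01, steps: int = 5) -> float:
    """Client side: a few gradient steps on mean squared error for y = w * x."""
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(clients: List[Sequence[Sample]], rounds: int = 50) -> float:
    """Server side: average client weights, weighted by each client's sample count."""
    w = 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in clients]
        w = sum(weight * count for weight, count in updates) / total
    return w


# Hypothetical private datasets, both generated from y = 3 * x.
client_a = [(1.0, 3.0), (2.0, 6.0)]
client_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]
print(round(federated_average([client_a, client_b]), 6))  # prints 3.0
```

Sharing weights rather than data does not by itself guarantee privacy, since model updates can leak information, so provable privacy-preserving attributes would need additional mechanisms.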


“Challenges for the TAU Community”

• #4. Contribute to METRICS2.0 names, semantics in timing and optimization space (see #3)
• #5. Contribute to development of standard methods to generate data for machine learning in/around IC design tools: artificial data, eyechart data, mutant data, obfuscated data …
  • E.g., with provable privacy-preserving attributes, industry concurrence, …
• #6. Get out of comfort zone (= out of silo)
  • Sorry, but incremental/ECO for leakage, IR is still in comfort zone
  • Must understand layout (detailed placement, especially) better
  • P&R tool should really NOT say this has zero violations:

            Signoff STA   OpenSTA
WNS (ns)    -0.660        -0.603
TNS (ns)    -1758.004     -1219.239
#viol.      8096          6926


Agenda

• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS
• Conclusions

Remember: (1) Timing is now central to everything; (2) where there’s smoke, there’s fire (ML)


“From Recovering Time to Timing Recovery”

• Two sides of same coin
  • Slack, margin, schedule all tied together
• What’s changed over the years?
  • Machine learning “inside and outside” (to reduce errors and margins, avoid runs, reduce iterations, …) on the way
  • Open-source on the way
  • Stronger interactions (spatial, topological, temporal contexts) demand “going outside comfort zone” in very broad sense
• Challenges for the TAU Community
  • Improve open-source STA engine
  • “TAU in service to X” models: LLE derates, dynIR impact…
  • TAU introspection (features for ML modeling) (+ what-ifs)
  • Contribute to METRICS 2.0 names in timing, opt spaces
  • Standardized data generation (artificial, obfuscated…)
  • Get out of comfort zone!

(always happy to discuss, collaborate…)


THANK YOU !