From Recovering Time to Timing Recovery: Some Challenges
for the TAU Community
Andrew B. Kahng
Depts. of CSE and ECE, UC San Diego
[email protected]
http://vlsicad.ucsd.edu/~abk
TAU-2016 Keynote: “In Search of Lost Time”
• “Recovering Time”: machine learning, optimization, margin reduction, …
Agenda
• Motivations
Design Crises: Cost, Expertise, Unpredictability
• Quality: also not scaling
• Design Capability Gap
  • Available density: 2x/node
  • Realizable density: 1.6x/node
  • Figure: UCSD / 2013 ITRS
• Design cost: not scaling
  • Design, process roadmaps not coupled
  • Figure: Andreas Olofsson, DARPA, ISPD-2018 keynote
Design is Too Difficult!
• Tools and flows have steadily increased in complexity
• Modern P&R tool: 10000+ commands/options
• Hard to design with latest tools in latest technologies
• Even harder to predict quality, schedule
• Expert users required
• Increased cost and risk: not good for industry!
• Still have “CAD” mindset more than “DA” mindset
  • Again: assumes expert users
How do we escape this “local minimum”?
IDEA: No-Humans, 24-Hours
A. Olofsson, DARPA, ISPD-2018 keynote
• Part of DARPA Electronics Resurgence Initiative
• Traditional focus: ultimate quality
• New focus: ultimate ease of use
• No humans, 24-hour TAT = “equivalent scaling”
• Overarching goal: designer access to silicon
DARPA IDEA and POSH Programs, 2018-2022
https://vlsicad.ucsd.edu/NEWS18/dac_v5_DISTAR.pdf
theopenroadproject.org
OpenROAD: A New Design Paradigm
Quality Schedule Cost
Mindsets
• Achieve predictability from the user’s POV
• Use cloud/parallel to recover solution quality
• Focus on reducing time and effort = schedule, cost
Machine Learning is CENTRAL to this
24 hours, no humans – no PPA loss
Design Complexity
• Extreme partitioning
• Parallel optimization
• Machine learning of tools, flows
• Restricted layout
The OpenROAD Project
• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
• Limited “knobs”, restricted field of use
• Must replace intelligent humans (partition, floorplan, …)
Agenda
• Motivations
• OpenROAD + Initial Target
Initial Target: RTL-to-GDS Layout Generation
Logic Synthesis
Floorplan/PDN
Placement
Clock Tree Synthesis
Global and Detailed Routing
Layout Finishing
Verilog + .lib, .sdc, .lef
GDSII
• Inputs: .v, .sdc, .lib, .lef
  • .def, .spef in point tools
  • Config files required
  • Pre-characterizations required
• Outputs: post-route .def, timing/power estimates
• V1.0 release: June 2020
Placement https://github.com/abk-openroad/RePlAce
• RePlAce features
  • Timing-driven (OpenSTA timer integrated)
  • Mixed-size (macros + cells)
  • Electrostatics analogy in analytic placement
• RePlAce used in:
  • Physical synthesis
  • Floorplanning
  • Clock tree synthesis
  • Traditional standard-cell placement
• BSD-3 License
[Flow: .def from FP/PDN (+ .v, .sdc, .lef, .lib) → Placement → placed .def]
RePlAce: Routability-Driven Placement • Global routing during routability-driven global placement
[Figure: routability-driven loop]
Static Timing Analysis https://github.com/abk-openroad/OpenSTA
• OpenSTA: open-source static timing analysis tool
• Developer: James Cherry (Parallax Software)
• Tested with ASAP7, GF14, TSMC16, ST28, etc.
• GPLv3 license
Slack, WNS, TNS (28nm)

aes_cipher_top (28nm, 12T, clkp=1000ps)

                    WNS (ps)   TNS (ps)   #viol.
Signoff STA            -61       -289        7
OpenSTA (Arnoldi)      -57       -314        9

[Figure: Reg-to-Reg and Reg-to-Out / In-to-Reg slack histograms]
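For reference, the WNS/TNS/#viol. columns in tables like this follow directly from per-endpoint slacks. A minimal sketch (the slack values below are made up, not the aes_cipher_top data):

```python
def summarize_slacks(slacks_ps):
    """WNS = worst (most negative) endpoint slack; TNS = sum of negative
    slacks; #viol. = number of endpoints with negative slack."""
    violations = [s for s in slacks_ps if s < 0]
    return {"WNS": min(slacks_ps),
            "TNS": sum(violations),
            "num_viol": len(violations)}

# Made-up endpoint slacks in ps
print(summarize_slacks([-61, -57, 12, -40, 5, -131]))
# → {'WNS': -131, 'TNS': -289, 'num_viol': 4}
```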
Slack, WNS, TNS (16nm)

Coyote (16nm, 9T, clkp=2000ps)

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
Challenges for the TAU Community
• #1. Help improve open-source STA engine
  • In particular: OpenSTA
  • Delay calculation, SI analysis, advanced timing models, MCMM, …
  • Priorities = ?
• Will revisit:

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
The OpenROAD Project
• Initial target: digital IC flow “RTL to GDS”
• Open source
• No-human-in-loop
• Limited “knobs”, restricted field of use
• Must replace intelligent humans (partition, floorplan, …)
Agenda
• Motivations
• OpenROAD + Initial Target
• Machine Learning
ML in IC Design: Not Like Chess or Cat Pics
• Getting to self-driving IC design: not so obvious
  • Do recent ML successes transfer well?
  • A 3-week SP&R&Opt run is NOT like playing chess!
• Design lives in a {servers, licenses, schedule} box
  • Distributions of outcomes matter → cloud, parallel
• A “stack of models” is mandatory: predictions of downstream outcomes are also optimization objectives
• Still an uncharted road to self-driving tools and flows
  • How do we overcome “small, expensive data” challenges?
  • Standards: learning comes from {design + tool + technology}, all of which are highly proprietary
  • Need mechanisms for IP-preserving sharing of data and models
4 Stages of ML to Recover Time, Effort
Four Stages of Machine Learning
1. Mechanization and Automation
2. Orchestration of Search and Optimization
3. Pruning via Predictors and Models
4. From Reinforcement Learning through Intelligence
Huge space of tool, command, option trajectories through design flow
Stage 3. Modeling and Prediction
• Prediction of tool- and design-specific outcomes over longer and longer subflows
  • “Wiggling of longer and longer ropes”
• Enables pruning and termination → avoid wasted design resources
  • Simple way to think about it: “identify doomed X”
  • Doomed floorplan, Opt run, DRoute run, …
  • Allocate resources elsewhere
  • Better outcome within given resource budget
• Complementary dream: new heuristics and tools that are inherently more predictable and modelable → lessen chaos
  • Ensembles might be modeled/predicted
  • Prediction requirement might be relaxed → “get user into a ballpark”?
Generic Need: Predicting Doomed Runs
• NOTE: “doomed” is often with respect to timing, or due to fear of timing!
• Picture: progressions of #DR violations in a commercial router
• Simple approach: track and project metrics as time series
• Can use a Markov decision process (MDP): “GO” vs. “STOP” strategy card to terminate “doomed” runs early
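The “track and project metrics as time series” idea can be sketched with simple linear extrapolation. This is a toy stand-in for the MDP-based GO/STOP policy, and the violation counts below are hypothetical:

```python
def project_violations(history, horizon):
    """Linearly extrapolate a violation-count time series `horizon` steps ahead."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)  # average change per step
    return history[-1] + slope * horizon

def go_or_stop(history, steps_remaining, target=0):
    """'GO' if the run is projected to reach `target` violations in budget, else 'STOP'."""
    return "GO" if project_violations(history, steps_remaining) <= target else "STOP"

# Hypothetical #violations per router iteration
print(go_or_stop([900, 700, 500, 300], steps_remaining=2))  # → GO (still converging)
print(go_or_stop([900, 880, 870, 865], steps_remaining=2))  # → STOP (plateaued: doomed)
```

A real policy would also model run-to-run variance and the value of reallocating the freed resources, but the shape of the decision is the same.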
Obtaining Golden From Non-Golden
ML shifts the Accuracy-Cost Tradeoff Curve (for free)!
(Old) Example: ML-based Timer Correlation
[Flow diagram: artificial circuits drive one-time train/validate/test of models (path slack, setup time, stage, cell, wire delays); real and new designs feed incremental model updates when error > threshold (outlier data points).]

[Scatter plots: T2 path slack (ns) vs. T1 path slack (ns), BEFORE and AFTER ML modeling; timer divergence reduced from 123 ps to 31 ps (~4x reduction). DATE14, SLIP15]
Lately: Predicting PBA from GBA (ICCD18)
• PBA (path-based analysis) is less pessimistic than GBA (graph-based analysis)
• But PBA can have MUCH more expensive runtime!
• ML task: predict PBA timing from GBA timing
  • Improved quality of results in P&R, optimization
  • Less-expensive timing analysis usable earlier in flow
[Figure: GBA mode vs. PBA mode]
Bigram- and CART-based Modeling
• Bigram-based path modeling
• Classification and regression tree (CART) approach
• Model based on 13 bigram parameters
• Reduced GBA pessimism vs. PBA
https://vlsicad.ucsd.edu/Publications/Conferences/361/c361.pdf
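The CART idea can be illustrated with a one-level regression tree (a “stump”) on a single feature; the actual model uses 13 bigram parameters and deeper trees, and all training pairs below are invented:

```python
def fit_stump(xs, ys):
    """One-level CART regressor: pick the threshold on a single feature
    that minimizes total squared error of the two leaf means."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

# Invented (GBA slack, PBA slack) training pairs, in ns
gba = [-0.60, -0.55, -0.50, -0.20, -0.15, -0.10]
pba = [-0.45, -0.42, -0.40, -0.12, -0.10, -0.08]
predict = fit_stump(gba, pba)
print(round(predict(-0.58), 3))  # → -0.423
```

A full CART recursively applies this split to each leaf until a depth or sample-count limit is hit.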
Lately: Reduce #Corners in STA and Opt (DATE19)
• Want all the benefits of STA at N corners, but want to pay for analysis at only M << N corners
• “Missing corner prediction” (“matrix completion”) saves runtime, licenses
• “Primary corners” methodology → errors caught at signoff → cause iteration
“Missing Corners” = Matrix Completion
• STA at relatively few known corners → reasonably accurate prediction of timing at all unknown corners
• PCA: low-dimensional modeling problem
• Predicting missing delay values = matrix completion problem
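The matrix-completion view can be sketched with a rank-1 alternating-least-squares fit over a (paths x corners) delay matrix; the actual methodology (PCA-based, per the slide) is richer, and the delay values here are hypothetical:

```python
def complete_rank1(M, iters=300):
    """Fill missing entries (None) of matrix M with a rank-1 model
    M[i][j] ~ u[i] * v[j], fit by alternating least squares over the
    observed entries only."""
    n, m = len(M), len(M[0])
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        for i in range(n):
            obs = [j for j in range(m) if M[i][j] is not None]
            u[i] = sum(v[j] * M[i][j] for j in obs) / sum(v[j] ** 2 for j in obs)
        for j in range(m):
            obs = [i for i in range(n) if M[i][j] is not None]
            v[j] = sum(u[i] * M[i][j] for i in obs) / sum(u[i] ** 2 for i in obs)
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(m)] for i in range(n)]

# Hypothetical path delays (ns): rows = paths, columns = corners; None = corner not run
delays = [[1.0, 2.0, 3.0],
          [2.0, 4.0, None],
          [3.0, None, 9.0]]
filled = complete_rank1(delays)
print(round(filled[1][2], 2))  # → 6.0
```

Real corner-to-corner delay structure is close to low-rank but not exactly rank-1, which is why a PCA-style low-dimensional model plus a few “anchor” corners is used in practice.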
Recent: Strong Design-Independent Models
[Plot: error vs. #corners on megaboom (990K instances, 350K FF); a model trained using richer artificial testcases shows a 10x improvement over one trained using initial artificial testcases]
Recent: “ML-LEAK” (leakage recovery predictor)
• ML to predict how much leakage will be recovered if user runs {Tweaker, Tempus ECO, PTSI ECO, homegrown script, …}
  • Gives expectation of post-recovery power
  • Beneficial to methodology team when trying out various DOEs
  • Saves time for implementation team: skip leakage recovery if it won’t help
• Blended model of design- and instance-level predictions gives best results
[Plot: actual vs. predicted percentage change in leakage power after recovery. Power recovered in this design was 0.076%; the model predicts 1% power recovery for this graph.]
Recent: STA Modeling Project Optimization
• TAU16 keynote: “pack tapeouts into design center” (ACM TODAES ’17)
• Today: “pack signoff STA runs into compute”
  • Peak memory mismatch: job dies, tapeout schedule compromised
  • Runtimes poorly estimated: tapeout schedule compromised
  • Poor packing: tapeout schedule compromised
• Two optimizations
  • ML to predict runtime, memory as function of resources (server, cores, cache, RAM, contentiousness, timer knobs, design, corner, …)
  • Scheduling/packing optimization (robust, incremental, …)
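The packing half of the problem can be sketched as greedy bin-packing over predicted (runtime, memory) pairs; in practice the pairs come from the trained predictors, and everything below (job names, capacities, policy) is illustrative:

```python
def pack_jobs(jobs, server_ram_gb):
    """Greedy packing of signoff-STA jobs onto servers.

    Each job is (name, predicted_runtime_h, predicted_peak_mem_gb).
    Jobs on one server run concurrently, so RAM adds up and the server
    finishes when its longest job does. Longest job first, onto the
    least-loaded server whose RAM still fits it; open a new server otherwise.
    """
    servers = []
    for name, runtime, mem in sorted(jobs, key=lambda j: -j[1]):
        fits = [s for s in servers if s["mem"] + mem <= server_ram_gb]
        if fits:
            s = min(fits, key=lambda s: s["load"])
        else:
            s = {"load": 0.0, "mem": 0.0, "jobs": []}
            servers.append(s)
        s["load"] = max(s["load"], runtime)  # concurrent: finish time = longest job
        s["mem"] += mem
        s["jobs"].append(name)
    return servers

# Hypothetical per-corner predictions for one design (names invented)
jobs = [("ss_0p72v", 6.0, 120.0), ("ff_0p88v", 2.0, 60.0),
        ("tt_0p80v", 3.0, 80.0), ("ss_0p72v_cworst", 7.0, 150.0)]
for s in pack_jobs(jobs, server_ram_gb=256):
    print(s["jobs"], s["load"], s["mem"])
```

A robust version would pad the memory prediction by its error bar, since an under-predicted peak kills the job rather than just slowing it down.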
Runtime, Memory Predictors: Not Trivial (!)
• Extensive DOEs ongoing (e.g., tool phases, contentiousness, run-to-run variation, …); interest/guidance from industry
[Plots: runtime and memory prediction results]
“Challenges for the TAU Community”
• #2. “TAU in service to …” a world of needed models
  • Timing analysis is a means to an end!
  • One stage’s model is another stage’s optimization objective
  • Compact LLE derates: diffusion breaks, gate cuts, coloring/mask order, … → ASP-DAC19 SDB-DDB: https://vlsicad.ucsd.edu/Publications/Conferences/366/c366.pdf
  • Compact dynamic IR drop impacts → DATE19 M1 power stapling
• #3. TAU introspection
  • “Features that ML models would want to use, provided by domain experts”
  • Optimization trajectories, timing graph topology, switching windows
  • (+ when layout info/costs available: congestion, legalization, etc.)
  • Contexts: leakage reduction, DVD fix, … (during next runs of block)
  • Customers want more: “Timing opt tools typically stop and report reasons why they can’t make further fixes or optimizations. It would be helpful if tools can continue to try out other options and present what-if results, i.e., automatically explore the solution space w.r.t. power, performance, runtime (e.g., cell displacement and additional ECO cycles).”
Agenda
• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS
ML in IC Design Requires Infrastructure!
• Support for ML in IC design
  • Standards for model encapsulation, model application, and IP preservation when models are shared
• Standard ML platform for EDA modeling
  • Design metrics collection, (design-specific) modeling, prediction of tool/flow outcomes
  • This recalls “METRICS” http://vlsicad.ucsd.edu/GSRC/metrics
• Datasets to support ML
  • Real designs, artificial designs and “eyecharts”
  • Shared training data, e.g., analysis correlation, post-route DRV prediction, sizer move trajectories and outcomes, …
  • Challenges and incentives: “Kaggle for ML in IC design”
“METRICS” [DAC00, ISQED01]
• METRICS (1999; DAC00, ISQED01): “Measure to Improve”
• Goal #1: Predict outcome
• Goal #2: Find sweet spot (field of use) of tool, flow
• Goal #3: Dial in design-specific tool, flow knobs
http://vlsicad.ucsd.edu/GSRC/metrics
Original METRICS Architecture
• Instrumentation of design tools:
  • Wrapper scripts to extract data from outputs and logfiles
  • Callable API codes that allow direct interaction from within the design tools
• METRICS server: central data collection (Oracle8i)
• Data mining process: analyzes existing data to improve existing design flow (CUBIST, etc.)
A Proposed METRICS 2.0 Architecture
White paper, WOSET-2018 (woset.org)
METRICS 2.0 Dictionary: Standard Naming
• JSON & MongoDB enable learning across the flow through cross-referencing
• Currently: sharing draft privately
  • https://github.com/The-OpenROAD-Project/METRICS-2.0
• Collaboration welcome! Email [email protected]
[Diagram: tool1 emits {"net_name": "n123", "length": 45}, tool2 emits {"net_name": "n123", "parasitics": 5}; both land in MongoDB]
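The cross-referencing idea on this slide (joining tool1's and tool2's JSON records on net_name) can be sketched without MongoDB as a plain dictionary merge:

```python
import json
from collections import defaultdict

def merge_metrics(*streams):
    """Join per-net metric records from different tools on the shared
    net_name key. METRICS 2.0 does this cross-referencing in MongoDB;
    here it is a plain in-memory merge."""
    merged = defaultdict(dict)
    for stream in streams:
        for line in stream:
            record = json.loads(line)
            merged[record["net_name"]].update(record)
    return dict(merged)

# The two records shown on the slide, as emitted by tool1 and tool2
tool1 = ['{"net_name": "n123", "length": 45}']
tool2 = ['{"net_name": "n123", "parasitics": 5}']
print(merge_metrics(tool1, tool2)["n123"])
# → {'net_name': 'n123', 'length': 45, 'parasitics': 5}
```

The standard dictionary of metric names is what makes the join meaningful: both tools must agree on the key and on what "length" and "parasitics" denote.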
METRICS 2.0++ (Grid, Federated, …)
• METRICS 2.0 can open entirely new worlds
  • METRICS + Grid Computing
  • Privacy-preserving Federated ML
Idea: Federated Learning (with METRICS)!
• Centralized
  • Storage and computation burden on the server
  • Exposure of METRICS data to the public domain
• Federated
  • Light server; distributed, spare-cycle-aware training
  • Data remains private
[Diagram: centralized vs. federated client-server training]
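A FedAvg-style round (clients fit locally on private data; only model weights, never the data, return to the server for averaging) can be sketched on a toy 1-D linear model. Everything below is synthetic:

```python
def local_step(weight, data, lr=0.1):
    """One local gradient step of a 1-D linear model y = w*x on private data."""
    grad = sum(2 * x * (weight * x - y) for x, y in data) / len(data)
    return weight - lr * grad

def federated_round(global_w, client_datasets):
    """One FedAvg-style round: each client trains locally, the server
    averages the returned weights; raw data never leaves a client."""
    local_ws = [local_step(global_w, data) for data in client_datasets]
    return sum(local_ws) / len(local_ws)

# Two 'design houses' whose private data is consistent with y = 2x
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
w = 0.0
for _ in range(100):
    w = federated_round(w, clients)
print(round(w, 3))  # → 2.0
```

Production FedAvg weights the average by client dataset size and runs multiple local epochs per round; this sketch keeps one step per round for clarity.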
“Challenges for the TAU Community”
• #4. Contribute to METRICS 2.0 names, semantics in the timing and optimization space (see #3)
• #5. Contribute to development of standard methods to generate data for machine learning in/around IC design tools: artificial data, eyechart data, mutant data, obfuscated data, …
  • E.g., with provable privacy-preserving attributes, industry concurrence, …
• #6. Get out of comfort zone (= out of silo)
  • Sorry, but incremental/ECO for leakage, IR is still in the comfort zone
  • Must understand layout (detailed placement, especially) better
  • P&R tool should really NOT say this has zero violations:

           Signoff STA    OpenSTA
WNS (ns)       -0.660      -0.603
TNS (ns)    -1758.004   -1219.239
#viol.           8096        6926
Agenda
• Motivations
• Initial Target
• Machine Learning
• Infrastructure for ML: METRICS
• Conclusions

Remember: (1) timing is now central to everything; (2) where there’s smoke, there’s fire (ML)
• Two sides of same coin: slack, margin, schedule all tied together
• What’s changed over the years?
  • Machine learning “inside and outside” (to reduce errors and margins, avoid runs, reduce iterations, …) on the way
  • Open source on the way
  • Stronger interactions (spatial, topological, temporal contexts) demand “going outside comfort zone” in a very broad sense
• Challenges for the TAU Community
  • Improve open-source STA engine
  • “TAU in service to X” models: LLE derates, dynamic IR impact, …
  • TAU introspection (features for ML modeling) (+ what-ifs)
  • Contribute to METRICS 2.0 names in timing, opt spaces
  • Standardized data generation (artificial, obfuscated, …)
  • Get out of comfort zone!
(Always happy to discuss, collaborate…)
“From Recovering Time to Timing Recovery”
THANK YOU !