Delivering Leading Edge Solutions
Defining Signoff amidst the EDA-Foundry-Design Vortex
Richard Trihy
Director Design Methodology
Leading Edge Technology Solutions
Comprehensive at 28nm, leadership at 20nm and beyond
[Roadmap figure. Timeline: Available, 2013, 2014, 2015. Market segments: High Performance Computing; Wired Applications, Networking; Consumer, Wireless, Mobile Computing.]
• 28SLP: Super Low Power
• 28LPS: Low Power PolySi
• 28HPP: High Perf Plus
• 20LPM: Low Power Mobile
• 14XM: eXtreme Mobility
• 10XM: eXtreme Mobility
28SLP Libraries and IPs - Available Today
Foundation IP / IP Supplier
• Std cells: 9T, 12T, PMK, ECO
• Memory compilers: SP, DP, 1PRF, 2PRF, ROM compilers
• GPIO: bi-directional, Analog, OSC, corner, filler cells
• GPIO: 1.5V/1.8V Prog. GPIO, FT & analog
• GPIO: 1.8V/2.5V/3.3V Prog. GPIO, FT, analog & RFIO-ESD
Basic IP / IP Supplier
• Specialty IO Library: SSTL15 & SSTL15/18
• Specialty IO Library: OSC, LVDS, Sub LVDS, PCI, I2C & RGMII
• PLL Library: Wide Range, Low Power, Low Area, Spread Spectrum
• PLL: Wide Range
• PLL
• WiFi/WiMax Analog Front-end (AFE)
Processor IP / IP Supplier
• ARM POP
• High performance kit
• Fast cache instance
Complex IP / IP Supplier
• USB2.0 PHY: picoPHY
• USB3.0 PHY: 5.0 Gbps SuperSpeed
• 3G SerDes PHY
• 6G SerDes PHY
• MIPI M-PHY
• MIPI D-PHY
• LPDDR3+ PHY
• LPDDR2+ PHY
• DDR3+ PHY
• MHL/HDMI 2.0 Tx PHY
• HDMI 2.0 Rx PHY
• HSIC PHY
• OTP
* Similar IP enablement available for 28HPP
14/20/28nm Digital and Analog/Mixed Signal Design Flows
Double Pattern-Aware Place & Route
Double Pattern-Aware Extraction & Timing
Synthesis
Mask Decomposition & Physical Verification
Libraries, Tool Scripts, Techfiles, Designs
methodology proven on multiple tapeouts
GLOBALFOUNDRIES Downloadable Reference Flows
AMS Design Methodology Overview
LDE Aware Flow Functional Design
DPT aware Custom Layout
Post Layout Design Validation
Physical Verification and Decomposition
Collaboration: ARM, Cadence, GLOBALFOUNDRIES
28SLP Implementation of A12 Core
• Implement Cortex-A12 in SoC chip
– AMBA bridge
– Interrupt controller
– System memory
– High-speed PLL
• Standard Cell Libraries
– ARM SC12MC Base: SLVT C30, LVT C30, LVT C38 (RVT C30)
– ARM SC12MC High Performance Kit (HPK): SLVT C30, LVT C30, LVT C34 (RVT C30)
• Fast Cache Instances
– 11 FCI memory macros for CPU and non-CPU
– 1 compiled memory macro for on-chip system memory
• PLL
– Low jitter GLOBALFOUNDRIES PLL
– Extensive test and analog monitor interface
Cortex-A12 Quad Core Macro
[Block diagram. Blocks shown:]
• CPU: Cortex-A12 CPU Core 0 to Core 3, each with 32KB I$ / 32KB D$ and NEON; SCU; L2 Cache Controller; 2MB L2 Cache
• Trace and debug: PTM0 to PTM3, Funnel, TPIU, ATB, TRACEPORT, RTCK, DEBUG APB, APB-AP, JTAG-DP, ROM Table, JTAG
• Interconnect: AXI Synchronisation, AXI Bus Interconnect NIC400, AXI Slave, AXI Master, AHB Master (x3), APB Master, Interrupt Controller GIC400
• Memory: AXI RAM Ctrl BP140, AXI RAM 512 KB, Upper SRAM, ROM, Burn-in ROM
• Peripherals and test: RTC PL031, GPIO PL061, GPIO, Trick Box, Wait for INT, Test Structures, DfT/MBIST Ctrl
• Clocking and configuration: Config, PLL, CFGCLK, CFGDATA, REFCLK
Donar Quad-core Cortex-A12 Cadence Flow
• Synthesis & Physical Synthesis: RTL Compiler® Physical
• P&R (GigaOpt, CCOpt, Nanoroute): Encounter® Digital Implementation
• Parasitics Extraction: QRC®
• Static Timing Analysis: Tempus®
• Logic Equivalence Check: Conformal® LEC
• Physical Verification: PVS®
• Power Analysis: EPS®
[Flow diagram. Data handoffs: RTL, FP, SDC; Netlist, Placement, SDC; Netlist, Layout, SDC, Parasitics. Three steps are marked "signoff".]
28HPP Delivers 3GHz on Dual Core Cortex-A9
Data as measured in lab
Lab test setup
Actual measured values
20LPM: Leading Edge Planar Platform for Mobile & Consumer
PPA Relative to 28SLP
• Performance: Faster
• Power: Lower
• Area: 2x higher gate density
[Chart: Total Power vs. Frequency, 28SLP vs. 20LPM. X axis: Frequency (AU). Y axis: Relative Total Power.]
• 42% higher speed at same power
• 61% lower power at same speed
PVT Conditions: Process = TT, VDD = sweep, Temp. = 85C
14XM FinFET for Power-sensitive Applications
Compute, Connect, Storage Market Applications
Mobile and Wireless Market Applications
• 40% less power than 20nm, 60% less power than 28nm (at comparable performance)
• 20% higher performance than 20nm, 60% higher performance than 28nm (at comparable power)
Multicore GPU Solutions
Power/perf Optimized CPU Solutions
Leading Edge Technology Solutions
Addressing Design Challenges
[Roadmap figure repeated from the earlier slide, annotated with design challenges:]
• Layout Dependent Effects
• Double Patterning
• New MEOL
• FinFETs
Collaborative Development
• Design/Technology co-optimization
• Rapid iteration based on close collaboration
ARM IP + GF Design Expertise + EDA Vendor = Performance/Power Breakthrough
[Chart: 14XM Cortex™-A9 Dual-Core, Relative Performance vs. Relative Total Power, 14XM-9T vs. 28SLP-12T.]
• 62% power reduction
• 61% performance improvement
PDK
EDA Flows
IP
Design Rules
One Drawn Level Two Masks
Decomposition
Mask Shift Methodology
Double Pattern Corner Methodology
Double Patterning Impact on Extraction and Timing Signoff
Must account for modeling of mask overlap
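One way to see why a mask shift matters for extraction is a toy coupling model. The sketch below is an illustration only, not the foundry's extraction model: it assumes coupling capacitance scales as 1/spacing (a parallel-plate approximation), and the 64 nm spacing and 8 nm shift are made-up numbers. A victim wire on one mask sits between two neighbours on the other mask, so a relative shift narrows one gap and widens the other.

```python
def coupling_cap(spacing_nm: float, k: float = 1.0) -> float:
    """Coupling capacitance in arbitrary units, assuming C is proportional to 1/spacing."""
    if spacing_nm <= 0:
        raise ValueError("spacing must be positive")
    return k / spacing_nm


def victim_coupling(nominal_spacing_nm: float, shift_nm: float) -> float:
    """Total coupling seen by a victim between two neighbours on the other mask.

    A positive shift moves the other mask toward the left gap:
    the left gap shrinks by shift_nm and the right gap grows by shift_nm.
    """
    left = coupling_cap(nominal_spacing_nm - shift_nm)
    right = coupling_cap(nominal_spacing_nm + shift_nm)
    return left + right


# Hypothetical numbers chosen only to show the trend.
nominal = victim_coupling(64.0, 0.0)
shifted = victim_coupling(64.0, 8.0)
print(f"coupling increase with shift: {shifted / nominal - 1.0:.2%}")
```

In this simplified model the total coupling rises for a shift in either direction, which is why a single nominal extraction can understate the capacitance a double-patterned net sees.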
P&R Implementation Flows
In Design fixing of DPT odd cycles is a key productivity feature
• An odd-cycle violation occurs when decomposition cannot assign mask colors without a color conflict
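The odd-cycle condition can be illustrated with a small two-coloring check. This is a minimal sketch, not the algorithm used by Encounter or PVS: it assumes shapes are nodes, that each pair of shapes spaced too closely to share a mask is an edge, and it uses a breadth-first search to try to assign two colors. The function name `two_color` and the example shape names are hypothetical.

```python
from collections import deque


def two_color(conflicts, shapes):
    """Try to assign each shape to mask 0 or 1 so no conflicting pair shares a mask.

    Returns (coloring, None) on success, or (None, (a, b)) where (a, b) is a
    conflicting pair that closes an odd cycle.
    """
    adjacency = {shape: [] for shape in shapes}
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    color = {}
    for start in shapes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # neighbour goes on the other mask
                    queue.append(v)
                elif color[v] == color[u]:
                    return None, (u, v)       # same mask forced: odd cycle
    return color, None


# Three mutually conflicting shapes form an odd cycle; four in a ring do not.
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
tri_coloring, tri_conflict = two_color(triangle, ["A", "B", "C"])
ring_coloring, ring_conflict = two_color(ring, ["A", "B", "C", "D"])
print("triangle decomposable:", tri_coloring is not None)
print("ring decomposable:", ring_coloring is not None)
```

A conflict graph is two-colorable exactly when it has no odd cycle, so fixing a violation means removing one conflict edge from the cycle, typically by moving or resizing a shape.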
[Screenshots: violation due to odd-cycle; pull-down menu from Encounter; In Design Odd-Cycle Fixing with PVS]
Advanced Node Variability and Margining Considerations
Traditional De-rating Inadequate at Advanced Nodes
• Traditional On-Chip Variation (OCV) derating breaks down below 65nm
• Optimistic on short paths and pessimistic on long paths
• Derate depends on path depth, location, PVT, cell type
• AOCV provides more accurate margining methodology for 65nm and below, but …
[Chart: Percent Delay Change vs. Logic Path Depth. X axis: Path Depth (1 to 20). Y axis: Percent Delay Change From Median (0 to 22). Series: Small, Medium, and Large INV, each at SS, TT, and FF. Annotations: "Single derate of 10%", "optimistic here", "pessimistic here".]
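The shape of that curve follows from simple statistics. The sketch below is an illustration under stated assumptions, not a characterized AOCV table: it assumes a path of identical stages with independent random variation, so the relative variation of the whole path falls as 1/sqrt(depth). The per-stage sigma of 5% and the 3-sigma target are made-up values; only the 10% flat derate comes from the chart.

```python
import math

FLAT_DERATE = 0.10   # the single 10% derate called out on the chart
STAGE_SIGMA = 0.05   # assumed per-stage relative sigma (illustrative only)
N_SIGMA = 3.0        # assumed sign-off confidence level


def depth_derate(depth: int) -> float:
    """Relative margin needed for a path of `depth` identical, independent stages."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return N_SIGMA * STAGE_SIGMA / math.sqrt(depth)


def flat_derate_is(depth: int) -> str:
    """Say whether the flat derate under-margins or over-margins this depth."""
    return "optimistic" if depth_derate(depth) > FLAT_DERATE else "pessimistic"


for depth in (1, 2, 4, 10, 20):
    print(f"depth {depth:2d}: needed {depth_derate(depth):.1%}, "
          f"flat 10% is {flat_derate_is(depth)}")
```

Under these assumptions the flat derate is too small for the shortest paths and too large for deep ones, which is the behavior a depth-indexed derate table is meant to capture.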
Modeling and Margining for Random Device Variability
– Design and technology trends
• Variability increasing as gate area scales down
• Fmax increasing with technology scaling
– Much more accurate variability modeling and margining methodology required
– Industry has progressively moved to more accurate modeling of variability
• SOCV addresses AOCV shortcomings in graph-based STA
• Liberty Variation Format to address shortcomings in modeling the impact of input slew and output load on cell delay variation:
• LVF models variation as a first order effect: variation is dependent upon same factors as baseline cell delays (arc/slew/load/cell/PVT)
• Statistical Hold method under standardization on Liberty TAB
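The difference between a flat derate and a statistical treatment can be sketched in a few lines. This is a simplified illustration, not the Tempus SOCV implementation or the LVF data format: it assumes each stage carries a nominal delay and a sigma, that stage variations are independent, and that sigmas therefore combine in quadrature along a path. The stage values are made up.

```python
import math


def late_delay_flat(stages, derate=0.10):
    """Flat OCV style: scale the summed nominal delay by one factor."""
    return sum(mean for mean, _ in stages) * (1.0 + derate)


def late_delay_statistical(stages, n_sigma=3.0):
    """Statistical style: add the means, combine the sigmas in quadrature."""
    mean = sum(m for m, _ in stages)
    sigma = math.sqrt(sum(s * s for _, s in stages))
    return mean + n_sigma * sigma


# (nominal delay in ps, sigma in ps) per stage; hypothetical values.
path = [(100.0, 3.0), (100.0, 4.0)]
flat = late_delay_flat(path)
stat = late_delay_statistical(path)
print(f"flat derate: {flat:.1f} ps, statistical: {stat:.1f} ps")
```

In an LVF-style library the per-stage sigma would itself be looked up by arc, input slew, and output load rather than supplied as a constant, which is the dependence the slide says AOCV leaves out.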
AOCV does not model variation on Slew/load, nor Hold Variation
Delay variability as a function of input slew and output load
Cell     min slew, min load   min slew, max load   mid slew, mid load   max slew, max load   max slew, min load
Cell A   1.22X                1.56X                1.0X                 1.11X                6.78X
Cell B   1.89X                1.33X                1.67X                1.44X                4.33X
Cell C   1.44X                1.78X                1.22X                1.33X                2.56X
Cell D   2.11X                1.67X                1.44X                1.44X                2.78X
Variation dependence on input slew/output load can be significant and must be modeled
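To put a number on "significant", the table values can be reduced to a per-cell spread. The sketch below simply copies the values from the table and computes, for each cell, the ratio between its largest and smallest variability across the five slew/load points; the `spread` helper and the corner labels are names introduced here for illustration.

```python
# Normalised variability values copied from the table.
# Column order: (slew, load) = min/min, min/max, mid/mid, max/max, max/min.
CORNERS = ("min/min", "min/max", "mid/mid", "max/max", "max/min")
VARIABILITY = {
    "Cell A": (1.22, 1.56, 1.00, 1.11, 6.78),
    "Cell B": (1.89, 1.33, 1.67, 1.44, 4.33),
    "Cell C": (1.44, 1.78, 1.22, 1.33, 2.56),
    "Cell D": (2.11, 1.67, 1.44, 1.44, 2.78),
}


def spread(values):
    """Ratio of the largest to the smallest variability for one cell."""
    return max(values) / min(values)


spreads = {cell: spread(values) for cell, values in VARIABILITY.items()}
worst_cell = max(spreads, key=spreads.get)

for cell, values in VARIABILITY.items():
    worst_corner = CORNERS[values.index(max(values))]
    print(f"{cell}: spread {spreads[cell]:.2f}x, worst at {worst_corner}")
```

Every cell in the table peaks at the max slew, min load point, so a single per-cell variation number chosen at any other point would understate that case.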
EDA Foundry Collaboration essential for Advanced Nodes
• Open Collaboration part of GLOBALFOUNDRIES DNA
• Technology Design-Flow Co-development
• New Challenges to tackle
– FinFETs: Will Miller Effect swamp our .lib models?
– FinFETs: Will EM/IR solutions hold up?
– SADP Decomposition: New sources of variation
– Margins: LVF rollout
Trademark Attribution
GLOBALFOUNDRIES®, the GLOBALFOUNDRIES logo and combinations thereof, and GLOBALFOUNDRIES’ other trademarks and service marks are owned by GLOBALFOUNDRIES Inc. in the United States and/or other jurisdictions. All other brand names, product names, or trademarks belong to their respective owners and are used herein solely to identify the products and/or services offered by those trademark owners.
© 2013 GLOBALFOUNDRIES Inc. All rights reserved.
Thank you
Email: [email protected]