Page 1

Increasing Your Processor Performance with ARM Advantage Memories and Standard Cells

ARM Advantage Physical IP Technology: Increasing Your Processor Performance

常骊波

ARM China, December 2007

Page 2

A wide range of ARM processors to choose from

[Figure: ARM processor roadmap for 2005 and 2006, plotted by performance in DMIPS (axis ticks 100, 250, 300, 500, 600, 1000+, 2000+). Processors shown: ARM7TDMI®, ARM7TDMI-S™, ARM7EJ-S™, ARM946E-S™, ARM966E-S™, ARM968E-S™, ARM926EJ-S™, ARM1026EJ-S™, ARM1136EJ-S™, ARM1156T2F-S™, ARM1176JZF-S™, ARM11™ MPCore™ x4, Cortex-M3, Cortex-R4, Cortex-A8 and Cortex-A9™. Group label: ARM® Cortex™ “Intelligent Computing”.]

Page 3

But…

Are you getting the optimum benefit?

Page 4

Or is this what you are facing?

A fast processor with slow memory is like driving a sports car in heavy traffic…

Page 5

Choose the right processor, and then…

[Figure: ARM1176JZF-S (the right ARM core) + optimized ARM Physical IP, with the combination labeled WINNER.]

Page 6

ARM Processor Performance Package

• The ‘Processor Performance Package’ (PPP) is ARM Artisan Physical IP that is optimized for use with high-performance ARM processors.
  - Specially designed and optimized memory instances for processor memory
  - High-performance Advantage-HS 12-track standard cell library
  - Floor planning guidelines and other configuration files for “out of the box” implementation

Page 7

Why choose PPP?

• Physical implementation of the processor determines system throughput.
  - Choice of cell library affects power and area numbers.
  - Processor memory performance impacts system performance.
• PPP provides up to a 20% performance increase over mainstream Advantage memories
  - With minimal impact on dynamic power
  - With very little area impact
• Floor planning guidelines and other ARM documentation make implementation simple.

Page 8

Design flow

Step 1: ARM1176JZ[F]-S Configuration

Page 9

Design flow - Processor Configuration

• ARM1176JZ[F]-S configuration
  - Use the Verilog memory wrappers to connect the processor signals to the fast memory instances.
  - Use the clock gating cell provided in the PPP to implement high-level architectural clock gating.
  - Validate the connections between the processor and the memory instances using the test bench provided.

Page 10

Design flow

Step 1: ARM1176JZ[F]-S Configuration
Step 2: Prepare Libraries for EDA Flow

Page 11

Design flow - Prepare EDA Libraries

• ARM1176JZ[F]-S configuration
  - Use the Verilog memory wrappers to connect the processor signals to the fast memory instances.
  - Use the clock gating cell provided in the PPP to implement high-level architectural clock gating.
  - Validate the connections between the processor and the memory instances using the test bench provided.
• Prepare EDA libraries
  - Use the scripts provided to generate the Synopsys Milkyway views, Cadence VoltageStorm views and Magma Volcano views.

Page 12

Design flow

Step 1: ARM1176JZ[F]-S Configuration
Step 2: Prepare Libraries for EDA Flow
Step 3: Perform Implementation

Page 13

Design flow - Perform Implementation

• ARM1176JZ[F]-S configuration
  - Use the Verilog memory wrappers to connect the processor signals to the fast memory instances.
  - Use the clock gating cell provided in the PPP to implement high-level architectural clock gating.
  - Validate the connections between the processor and the memory instances using the test bench provided.
• Prepare EDA libraries
  - Use the scripts provided to generate the Synopsys Milkyway views, Cadence VoltageStorm views and Magma Volcano views.
• Perform implementation
  - Request back-end views for GDSII stream-out and transistor-level DRC/LVS analysis from ARM.

Page 14

Library Preparation

[Figure: library preparation tree. Standard Cell Library Preparation and Memory Library Preparation each branch into three paths: for Synopsys flow, for Cadence flow, and for Magma flow.]

Page 15

Library Preparation

• Standard Cell Library Preparation
  - For a Synopsys flow, Milkyway libraries of the standard cells are provided as part of the Advantage-HS standard cell library.
  - For a Cadence flow, VoltageStorm views can be generated using the scripts provided.
  - For a Magma flow, scripts are provided for generating the views for both standard cells and memories.
• Memory Library Preparation
  - Scripts are provided for generating Synopsys Milkyway, Cadence VoltageStorm and Magma Volcano views.

Page 16

Implementation - Synopsys Flow

Page 17

Implementation - Cadence Flow

Page 18

Implementation - Magma Flow

Inputs shown in the flow diagram:
• ARM Processor IP: ARM1176JZ(F)-S
• Technology file (65LP from TSMC)
• RC rules (65LP from TSMC)
• Tool-specific ATPG libraries for memories
• Verilog libraries for standard cells (tsmc65lp_rvt_sc_adv12.v)
• Volcano of standard cells and memories

Flow stages and tools:
• Logical/physical synthesis: Blast Create, Talus Design
• Place and route: Blast Fusion, Talus Vortex
• Signoff timing and noise analysis: Quartz Time
• ATPG: Talus ATPG
• Signoff parasitic extraction: Quartz RC

Page 19

ARM Reference Methodology (iRM)

• ARM Reference Methodologies are designed to provide ARM Partners with a simple, deterministic and rapid route from RTL to GDSII.
• The iRM takes a configured RTL representation of an ARM core and performs implementation to a cell-level DRC/LVS-clean representation.
• It provides an accompanying set of models for specific characteristics (timing, test, physical) of the final implementation.
• The Processor Performance Package can be easily integrated into an iRM if higher achievable performance or cache configuration changes are required.

Page 20

ARM1176JZ[F]-S Performance Package for TSMC65LP

Static Power     86.70 µW
Dynamic Power    0.363 mW/MHz
Area             1.80 mm²
Frequency        506 MHz

Notes:
• Nominal Vt only
• Frequency data from PrimeTime-SI @ ss, 1.08V, 125C (un-margined)
• Power results: Dhrystone @ tt, 1.2V, 25C
• Area includes RAM @ 84% utilization
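As a rough worked example of how the figures above combine, the sketch below multiplies the quoted dynamic power per MHz by a clock frequency and adds the quoted static power. The helper name and the assumption that dynamic power scales linearly with frequency are illustrative only and do not come from the slides. Because frequency is quoted at the ss corner and power at the tt corner, the result is an indicative number rather than a characterized value.

```python
# Figures quoted on the slide for the ARM1176JZ[F]-S package on TSMC 65LP.
DYNAMIC_MW_PER_MHZ = 0.363  # Dhrystone power result, tt, 1.2V, 25C
STATIC_UW = 86.70           # static power as quoted on the slide
MAX_FREQ_MHZ = 506          # PrimeTime-SI, ss, 1.08V, 125C (un-margined)


def estimate_power_mw(freq_mhz: float) -> float:
    """Return dynamic + static power in mW at a given clock (hypothetical helper).

    Assumes dynamic power scales linearly with frequency, which is an
    approximation made for this sketch, not a claim from the slides.
    """
    if freq_mhz < 0:
        raise ValueError("frequency must be non-negative")
    dynamic_mw = DYNAMIC_MW_PER_MHZ * freq_mhz
    static_mw = STATIC_UW / 1000.0  # convert microwatts to milliwatts
    return dynamic_mw + static_mw


if __name__ == "__main__":
    for f in (250, 400, MAX_FREQ_MHZ):
        print(f"{f} MHz: {estimate_power_mw(f):.1f} mW")
```

At these figures the static term is a small fraction of a milliwatt, so the estimate is dominated by the dynamic term.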

Page 21

Performance without Penalty

Feature and benefit:
• Feature: standard cell architecture and memory access timing are critical to CPU speed. Optimized memories improve access timing without compromising area, and the Advantage-HS 12-track standard cell architecture is designed for high performance. Benefit: 20% performance increase.
• Feature: automem configuration script for synthesis, supporting cache sizes 8K/8K, 16K/16K and 32K/32K. Benefit: reduced time to market.
• Feature: using Lvt to achieve equivalent speed can add up to 5% wafer cost plus additional mask cost. Benefit: save $.
• Feature: ARM validated deliverables. Benefit: reduced risk.
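To make the Lvt cost statement concrete, here is a minimal sketch of the arithmetic. Only the 5% ceiling comes from the slide. The function name, the wafer price, the wafer count and the mask cost are hypothetical values chosen to show the calculation.

```python
LVT_WAFER_COST_UPLIFT_MAX = 0.05  # "up to 5% wafer cost", from the slide


def max_lvt_extra_cost(wafer_cost: float, wafers: int, extra_mask_cost: float) -> float:
    """Upper bound on extra spend if Lvt cells are added to reach the same speed.

    Hypothetical helper: per-wafer uplift at the 5% ceiling times the number
    of wafers, plus a one-off additional mask cost.
    """
    if wafer_cost < 0 or wafers < 0 or extra_mask_cost < 0:
        raise ValueError("inputs must be non-negative")
    return wafer_cost * LVT_WAFER_COST_UPLIFT_MAX * wafers + extra_mask_cost


if __name__ == "__main__":
    # All three inputs below are made-up numbers, not data from the slides.
    bound = max_lvt_extra_cost(wafer_cost=3000.0, wafers=1000, extra_mask_cost=50000.0)
    print(f"Upper bound on extra cost: {bound:.0f}")
```

Since the slides report the package's performance numbers with Rvt only, this bound represents cost that an Rvt-only implementation would avoid under the stated assumptions.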

Page 22

ARM1176 Performance Package deliverables

• ARM Advantage-HS standard cell library (CLN65LP)
  - 12-track-high cell architecture for high performance
  - Large cell set with over 900 cells and fine drive strength granularity
  - Multiple beta ratios for often-used cells, enabling power/performance optimization
  - Robust power rail architecture to support high-performance designs
• Pre-configured RAM instances for all cache configurations
  - Performance numbers achieved using Rvt only
  - DFT views provided for FastScan and TetraMAX
• Documentation includes:
  - Automatic memory configuration for L1 cache instances (8K/8K, 16K/16K and 32K/32K only)
  - Guidelines on the integration of TCM memories
  - Library preparation for Synopsys, Cadence and Magma EDA tool flows
  - Floor planning guidelines and references to other ARM documentation
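The restriction on automatic memory configuration can be expressed as a simple membership check. The sketch below is not the automem script itself, and its names are hypothetical. It also assumes that a pair such as 8K/8K denotes instruction and data cache sizes, which the slides do not state explicitly.

```python
# L1 cache size pairs in KB that the slides list as supported by automatic
# memory configuration. Reading each pair as (instruction, data) is an
# assumption made for this sketch.
SUPPORTED_L1_CONFIGS_KB = {(8, 8), (16, 16), (32, 32)}


def is_auto_configurable(icache_kb: int, dcache_kb: int) -> bool:
    """Return True if the cache pair is one of the listed configurations."""
    return (icache_kb, dcache_kb) in SUPPORTED_L1_CONFIGS_KB


if __name__ == "__main__":
    for pair in [(16, 16), (16, 32), (64, 64)]:
        status = "listed" if is_auto_configurable(*pair) else "not listed"
        print(f"{pair[0]}K/{pair[1]}K: {status}")
```

A configuration outside this set would need manual handling; the slides do not describe that path.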

Page 23

Supported ARM processors

• ARM926
• Cortex-R4
• Cortex-A8
• Cortex-A9

Page 24

Challenge - Implementation Ranges

[Figure: implementation ranges around nominal performance. Axes: frequency in MHz (ticks 200 to 400) and power in mW (ticks 150 to 350). Labels: “WANTED: Higher performance”, “WANTED: Lower power”, “Higher area density”, “Nominal performance”.]

You can accomplish all of these with the Processor Performance Package and other ARM Physical IP.

Page 25

Mobile applications

• High speed required for embedded processor (~650MHz)
• High density for rest of the SoC (~300MHz)
• Aggressive power management
  - Low leakage “LP/LL” processes
  - Multi-VT designs
  - Low voltage operation
  - Retention and shutdown modes
• Processor Performance Package is the best choice for the higher-speed ARM processors
• Advantage memory is the most appropriate choice for the high-speed section
• Metro memory is the most appropriate choice for the high-density section

Page 26

Office or enterprise applications

• High speed required over the entire chip (>750MHz)
• Typically use G or high-speed processes
• Speed is the key criterion
  - Processor Performance Package offers the ideal solution
  - Setup time + access time
  - Memories need to support pipelined outputs for better timing
• High-capacity memories are also required
  - 2-4Mbits of contiguous SRAM
• Advantage & Advantage-HS memory with pipelined outputs is the most appropriate choice
• In some cases, low VT devices may be used in the periphery to further improve access time
• Large SRAMs greater than 1Mbit are also required

Page 27

High-performance consumer applications

• High speed required for embedded processor (~650MHz)
• High density for rest of the SoC (~300MHz)
• Moderate power management
  - G or low leakage “LP/LL” processes
  - Multi-VT designs
  - Voltage islands
• Large memories may be required
  - Up to 4Mbits of single-port SRAM
• Advantage memory with mixed VT periphery is the most appropriate choice for the high-speed section
• Metro memory with mixed VT periphery is the most appropriate choice for the high-density section
• SRAMs larger than 1Mbit are available as instances

Page 28

Low-cost consumer applications

• Moderate speed required over entire SoC (<300MHz)
• High density required for entire SoC
• Moderate power management
  - Low leakage “LP/LL” processes
  - Multi-VT designs
  - Voltage islands
• Low-speed subsegment (<100MHz)
  - Very low leakage requirements
  - Low voltage operation
• Metro memory with mixed VT periphery is the most appropriate choice for the moderate-speed segment
• Metro memory with all high VT periphery is the most appropriate choice for the low-speed segment
• Memory power management should be used across the chip
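The memory recommendations across the four application slides can be collected into one lookup table. The sketch below is only a summary aid. The segment and section key strings are hypothetical labels, while the recommendation text follows the slides.

```python
from typing import Dict, Tuple

# (application segment, chip section) -> memory recommendation from the slides.
# The key strings are hypothetical labels chosen for this sketch.
MEMORY_CHOICE: Dict[Tuple[str, str], str] = {
    ("mobile", "high-speed"): "Advantage memory",
    ("mobile", "high-density"): "Metro memory",
    ("enterprise", "high-speed"): "Advantage & Advantage-HS memory with pipelined outputs",
    ("high-performance consumer", "high-speed"): "Advantage memory with mixed VT periphery",
    ("high-performance consumer", "high-density"): "Metro memory with mixed VT periphery",
    ("low-cost consumer", "moderate-speed"): "Metro memory with mixed VT periphery",
    ("low-cost consumer", "low-speed"): "Metro memory with all high VT periphery",
}


def recommend_memory(segment: str, section: str) -> str:
    """Look up the slide recommendation; raise ValueError if none is listed."""
    key = (segment.lower(), section.lower())
    if key not in MEMORY_CHOICE:
        raise ValueError(f"no recommendation on the slides for {segment!r} / {section!r}")
    return MEMORY_CHOICE[key]


if __name__ == "__main__":
    print(recommend_memory("mobile", "high-density"))
    print(recommend_memory("low-cost consumer", "low-speed"))
```

Laid out this way, the pattern in the slides is easier to see: Advantage memory is recommended for the high-speed sections and Metro memory for the high-density or lower-speed ones, with the VT mix of the periphery varying by segment.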

Page 29

65nm High Performance Platform

• All of the options needed to give the optimum PPA trade-off
• Available at multiple Vt
• PMK for low power at nominal Vt (RVt)
• Advantage-HS (LVt) with Cortex-A8 for maximum performance in consumer devices
• 65nm platforms available for TSMC and Common Platform

Products:

Standard Cells
• Advantage SC 10T: RVt, HVt, LVt
• Advantage PMK 10T: RVt
• Metro SC 8T: RVt, HVt
• Metro PMK 8T: RVt
• Advantage SC 12T: RVt, HVt, LVt
• Advantage PMK 12T: RVt

Memory Generators
• Advantage SRAM-SP: 64 rows/bank
• Advantage SRAM-DP
• Advantage RF-SP
• Advantage RF-2P
• Advantage ROM-VIA
• Metro SRAM-SP: 128 rows/bank
• emBISTRx

I/O Products
• LVDS: 850 MHz, 2.5V
• HSTL Class I/II: 2.5V
• DDR1/2 flip-chip
• DDR1/2 wire-bond: 2.5V - CUP

High Speed Serial PHYs
• PCI Express 1.1
• PCI Express 2.0
• XAUI 3.125Gbps
• CEI Short-Reach 6.4Gbps
• 10G

Page 30

45nm Low Power Mobile Platform

A 45nm platform based on IBM CMOS11LP and a TSMC 45GS platform are also available for licensing today.

• Manufacturability is becoming a major issue
  - Yield, variability, test/repair
• Increased investment will pay off as it reduces cost for high-volume devices
• Cortex-A9 with PMK delivers high performance and low power for Connected Mobile Computers

Standard Cells
• Metro SC 9T: RVt, HVt, LVt
• Metro PMK 9T: RVt, HVt, LVt
• Advantage SC 12T: RVt, HVt, LVt
• Advantage PMK 12T: RVt, HVt, LVt

Memory Generators (R/B = rows per bank)
• Advantage SRAM-SP (large bit cell): 64 / 128 R/B
• Advantage SRAM-SP (small bit cell): 64 / 128 R/B
• Advantage SRAM-DP: 64 / 128 R/B
• Advantage RF-SP: 128 R/B
• Advantage RF-2P: 128 R/B
• Advantage ROM-VIA: 64 R/B

Memory Self-Test and Repair
• emBISTRx

I/O Products - Inline/Staggered
GPIO Programmable LVDS SSTL_18 SSTL_2 USB 1.1 PCI-X HSTL Class I/II

DDR Products
• MDDR

Page 31

Conclusion

• ARM cell libraries and memories give you a predictable route to silicon with an industry-standard methodology.
• The ARM Processor Performance Package helps you get the best PPA out of your ARM processor.
• The reference methodology and other ARM documents make implementation an easy task.
• You can target a variety of applications using the Processor Performance Package combined with other ARM Physical IP.

Page 32

Thank you!

For more information

www.arm.com or [email protected]