Upload
patricia-trower
View
225
Download
1
Tags:
Embed Size (px)
Citation preview
Intelligent Interconnects for Multicore SoC’sIntelligent Interconnects for Multicore SoC’sDrew WingardDrew Wingard, , CTO, Sonics, Inc.CTO, Sonics, Inc.
OCIN06: December 6, 2006OCIN06: December 6, 2006
OCIN06: Intelligent Interconnects for Multicore SoC’s 2Dec. 6, 2006
Agenda
• SoC Background• Interconnect Architecture• Application Areas
OCIN06: Intelligent Interconnects for Multicore SoC’s 3Dec. 6, 2006
The Goal
Create a system-on-a-chip, comprising ten million gates, that satisfies rapidly-evolving market requirements for:
– Speed– Power– Area– Application Performance– Time to Market
Using the minimum resourceswith maximum predictability
OCIN06: Intelligent Interconnects for Multicore SoC’s 4Dec. 6, 2006
SoC Architecture Trends• Massive feature integration
– Driven largely by Moore’s Law (supply) and convergence (demand)
• Continued movement of complexity to software
• Distributed architectures– Higher scalability (and independence?)
• Multiple processors– CPU– DSP– Special purpose (MPEG, packet, …)
• Distributed DMA– Removes centralized DMA bottleneck– Simplifies driver software integration
CPUCPU CPUCPU MPEG MPEG MPEG MPEG DSP DSP DSP DSP
3D GFX3D GFX 3D GFX3D GFX MAC MAC MAC MAC
Video Video I/OI/O
Video Video I/OI/O
Comm Comm I/O I/O
Comm Comm I/O I/O
DRAM DRAM
ControllerController DRAM DRAM
ControllerController
System On Chip
OCIN06: Intelligent Interconnects for Multicore SoC’s 5Dec. 6, 2006
Why SoC’s Are Difficult• Improvement by feature integration has been practiced for
decades– Why is SoC any different?
• Benefits of an SoC– Higher performance– Smaller footprint– Lower power– Lower cost
• Size, power, and cost benefits derive from sharing– Control software on CPU (interrupts, etc.)– Memory resources (on-chip and off-chip) – single external DRAM
• Building predictable systems with so much sharing is hard– Dozens to hundreds of interrupt sources– Memory bandwidth bottlenecks + lots of real-time traffic stress
Unified Memory Architecture (UMA)
OCIN06: Intelligent Interconnects for Multicore SoC’s 6Dec. 6, 2006
Limits of Tightly Coupled Design
• Tightly coupled design has been dominant– Assumes largely synchronous, instantaneous and
free communication– Widely practiced, and supported in design flows
• BUT– Delivering clocks is problematic– Wire delay is dominant– Routing area can cost more than gates– Too many constraints, from too many blocks
• Cannot afford lowest common denominator design
OCIN06: Intelligent Interconnects for Multicore SoC’s 7Dec. 6, 2006
NetworkNetwork
DRAM ControllerDRAM ControllerCPUCPU DMADMADMADMA DRAM ControllerDRAM ControllerCPUCPU
Our Approach: Active Decoupling
• Separation• Abstraction• Optimization• Independence
MasterMaster
SlaveSlave
SlaveSlave
MasterMaster
Socket
MasterMaster SlaveSlave
MasterMasterSlaveSlave
16 128
BusAdapter
BusAdapter
BusAdapter
BusAdapter
BusAdapter
BusAdapter
BusAdapter
BusAdapter
Core Function
Communication
Core Function
Communication
Core Function
Communication
AgentAgent AgentAgent AgentAgent AgentAgent
Internal FabricInternal Fabric
SMART Interconnect
OCIN06: Intelligent Interconnects for Multicore SoC’s 8Dec. 6, 2006
Layers of Decoupling
• Degree of blocking
• QoS
• Addressing
• Source identification
• Bursting, pipelining
• Threading
• Data, address, event widths
• Handshaking / flow control
• Signal timing and capacitance
• Clock frequency
Performance
Identification
Transfer Protocol
Signaling
Electrical
Hig
her
A
bst
ract
ion
OCIN06: Intelligent Interconnects for Multicore SoC’s 9Dec. 6, 2006
Evolution of Design Abstractions
Abstraction minimizes the number of objectsthat the designer must manage
GatesGates WiresWires
Level ofAbstraction
TIME
BlocksBlocks BusesBusesn
TilesTiles NetworksNetworks
InterconnectCores
InterconnectCores
FunctionalCores
FunctionalCores
Need to be here!
Most designs are here
OCIN06: Intelligent Interconnects for Multicore SoC’s 10Dec. 6, 2006
Abstraction Enables Higher Complexity
p
100M+Gates
Technology Push
THE PAST
BlocksBlocks BusesBuses
PRESENT
CoresCores InterconnectsInterconnects
5M+Gates
DEVELOPMENT TIME & COST
FUTURE
TilesTiles NetworksNetworks
Lo
g(G
AT
E C
OU
NT
)
OCIN06: Intelligent Interconnects for Multicore SoC’s 11Dec. 6, 2006
The OCP Socket
• Owned/advanced by OCP-IP (www.ocpip.org)– Over 150 member companies
• Interconnect-neutral• Defines data flow, control flow, test signaling• Highly configurable to match core needs• Simple Master/Slave request/response protocols
– With many options
• Fully synchronous, point-to-point• Optional pipelining, bursting, threading• Flexible handshaking and other flow control
OCIN06: Intelligent Interconnects for Multicore SoC’s 12Dec. 6, 2006
Agenda
• SoC Background• Interconnect Architecture
– Advanced Fabrics– Intelligent Agents
• Application Areas
OCIN06: Intelligent Interconnects for Multicore SoC’s 13Dec. 6, 2006
SoC Data Flow Requirements• Connect dozens of initiators to several memories
– While meeting a wide range of latency and throughput requirements
• Connect processors and DMA to control ports and peripherals– With predictably low latencies
• Example SoC requirements (interconnect view):
Metric Mode Type Requirement
CPU – peripheral interrupt service All Worst-case < 10 cycles
CPU – DDR cache miss latency All Average As low as feasible
H.264 decode – DDR bandwidth HD vid. Worst-case 1.2 GBytes/sec
H.264 decode – service jitter HD vid. Worst-case < 6 macro-blocks
Video refresh – DDR bandwidth SD/NI Worst-case 56 MBytes/sec
… … … …
DDR data bus bandwidth HD vid. Sustained > 80%
OCIN06: Intelligent Interconnects for Multicore SoC’s 14Dec. 6, 2006
Interconnect Fabric Options
• SoC data flow requirements must be satisfied by internal interconnect fabric– Big challenge in current SoC designs!
• Choices in interconnect fabric design– Unified vs. split transactions– Shared vs. separate physical links– Combinational vs. pipelined– Single vs. multiple outstanding transactions
(transaction pipelining)– In-order vs. out-of-order completion and response– Blocking vs. non-blocking flow control
OCIN06: Intelligent Interconnects for Multicore SoC’s 15Dec. 6, 2006
Blocking vs. Non-blocking Flow Control
• Sharing in SoC’s creates many opportunities for contention– Arbitration determines who wins– Flow control determines when the winner gets to go
• Blocking flow control systems allow resource shortages along some paths to prevent other paths from progressing
• Non-blocking flow control systems ensure that points of sharing never stall if any data flow could progress
• Locally non-blocking flow control (aka virtual channels) improves efficiency and allows more resource sharing
• End-to-end non-blocking flow control offers greater predictability– Provides basis for QoS guarantees Our Approach
OCIN06: Intelligent Interconnects for Multicore SoC’s 16Dec. 6, 2006
SMX
SonicsMX Basic Architecture• Hybrid topologies
– Full / partial cross-bar
– Shared bus
• Pipelined, multi-threaded, non-blocking fabric
• Fully split (dual) request / response
• Distributed QoS arbiter– Spans cycle, frequency, and
data width boundaries
– Supports flexible thread merging tree topologies I
SMX
CPU
DSP
GFX
ROM
SRAM
FlashCtl.
DRAMCtl.
I I I I
T
T
OCIN06: Intelligent Interconnects for Multicore SoC’s 17Dec. 6, 2006
Global Interconnect Responsibilities• Routing
– Getting requests, responses and data to the desired destination
• Access control– Managing contention for shared resources (ensuring QoS)– Ensuring requested access is allowed (security and protection)
• Error management– Detection, reporting, and SW recovery support
• Power management– Activity detection, clock and voltage removal support
• Connectivity– Protocol conversion– Data width / clock frequency conversion
• Spanning distance– Connecting endpoints at required frequency and latency
OCIN06: Intelligent Interconnects for Multicore SoC’s 18Dec. 6, 2006
Fabric
The Intelligence is in the Agents
• Agents provide…– Protocol conversion
• Agent adapts to IP core– Decoupling of IP cores from fabric
• Provide local, isolated environment– Layered services
• Proven technology– Over 100 million IC’s shipped so far
• Agent services– Power management– Security management– Error management– QoS– Burst, width, and command
conversion
I
T
I
T
I
T
I
T
I
T
INITIATOR SOCKETS
TARGET SOCKETS
Target Agents (TA)
Initiator Agents (IA)
OCIN06: Intelligent Interconnects for Multicore SoC’s 19Dec. 6, 2006
Sonics Interconnect Products
Product SiliconBackplane Sonics3220 SonicsMX
Introduction 1999 2002 2004
Connects Processors, memories
Peripherals, registers
Processors, memories
Socket(s) OCP 1 OCP 1/2, APB OCP 1/2, AHB, AXI
Decoupling High Medium Very High
Fabric Distributed, shared bus
Multi-branch shared bus
Hybrid cross-bar / shared bus
Applications WLAN, HDTV, PVR, …
Handsets, consumer, …
Handsets, HDTV, DSC, OA, …
Volume Production
Over 100 million IC’s shipped
2005 2005
OCIN06: Intelligent Interconnects for Multicore SoC’s 20Dec. 6, 2006
Interconnect Performance
• Many standard peak measurements pretty meaningless when considering SoC interconnects– Bandwidth is cheap (wires are cheaper than pins)– Usable bandwidth mostly dependent on attached cores…– … and application scenario
• Sonics’ products engineered to span very wide range of operating points– Latencies: 0-10 clock cycles– Bandwidth: limited only by targets– Frequency: keep up with DRAM– IP core count: 10-100+– No process-specific limitations– Standard ASIC-style design flow
OCIN06: Intelligent Interconnects for Multicore SoC’s 21Dec. 6, 2006
Product Deliverables
• Configuration tools– Aid in capture, checking, generation, and verification of Sonics’ IP
• Configured Register Transfer Language (RTL) code– “Virtual hardware” input to logic synthesis and place & route tools
• Configured verification environment– Tests customer configuration of Sonics’ IP
• Logic synthesis scripts & floorplan interface– Bridges to physical design domain
• Electronic System Level (ESL) models (in SystemC)– Helps architectural exploration and firmware development
• Analysis tools and models– Aids customer in building and analyzing prototypes of SoC
OCIN06: Intelligent Interconnects for Multicore SoC’s 22Dec. 6, 2006
Design Issues Addressed By Our Approach
Methodology &
Automation
ScalableFabrics
Intelligent
Agents
Complex Memory Hierarchies
Signal Integrity
Distributed Processing
High Peripheral Count
Perf. Verification
Arch. Modeling
SW Development
Virtual Prototyping
Parallel IP Creation
Design Re-use
Timing Closure
Variable Clock Freq.
Voltage Isolation
Power Management
Error Management
Access Security
Data Width Conversion
Mixed Endianness
Protocol ConversionGuaranteed BW QoS Pipelining
OCIN06: Intelligent Interconnects for Multicore SoC’s 23Dec. 6, 2006
Design Timeline Benefits of Our Approach
Emulator?
Time
IPCore
SOC
IPCore
SOC
Time
DesignFunctional Verif.
Synth./Timing Verif.Performance Verif.
Integration“Hello World”
Re-verified
Core Interaction
Tig
htly C
ou
pled
So
nics
StartFirst Integration
“Hello World”Re-verified
Core Interaction Tapeout!
Start Tapeout!
OCIN06: Intelligent Interconnects for Multicore SoC’s 24Dec. 6, 2006
Agenda
• SoC Background• Interconnect Architecture• Application Areas
OCIN06: Intelligent Interconnects for Multicore SoC’s 25Dec. 6, 2006
Inside a Tile-based Multicore SoC
• Example: TI OMAP 2420• Heterogeneous processors
– General-purpose CPU (e.g. ARM 11)– DSP (e.g. TI C5x)– 2D/3D graphics (e.g. Imagination PowerVR MBX)– Video accelerator (e.g. MPEG4 codec)
• Multiple levels of shared memory– External DRAM and Flash– Internal SRAM and ROM
• Very complex peripheral subsystems• Security management (content, service, stability)• Aggressive power management
OCIN06: Intelligent Interconnects for Multicore SoC’s 26Dec. 6, 2006Sources: www.ti.com,
www.arm.com,www.powervr.com
(from OMAP 5910)
IVA
OCIN06: Intelligent Interconnects for Multicore SoC’s 27Dec. 6, 2006
Multicore Architecture Advantages
Avner GorenTI
EPF 2004
What isneeded
OCIN06: Intelligent Interconnects for Multicore SoC’s 28Dec. 6, 2006
Application Processor Interconnect Example
S3220
CPU TileCPU Tile
PP PP
PP T
PP T
PP T
PP T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PP
T
PPT
PPT
PPT
PPT
PPT
PPT
PPT
PP
T
PP
T
PP
T
PP
T
Flas
h C
on
trolle
rF
lash
Co
ntro
ller
SDRAM ControllerSDRAM Controller
2D/3D GraphicsTile
2D/3D GraphicsTile
EmbeddedSRAM
EmbeddedSRAM
DSP TileDSP Tile
SMXI
TT
TI
I
SMX
MPEG4 CodecTile
MPEG4 CodecTile
MP
3M
P3
US
B 2.0
US
B 2.0
LC
DC
on
troller
LC
DC
on
troller
Cam
eraIn
terfaceC
amera
Interface
T T
TT
T
PP
T
I I
I I
DMADMA
I I
IT
IT
TI
ComplexSocket
128DSPCoreDSPCore
Inst.CacheInst.
CacheX
RAMX
RAM
YRAM
YRAM
MMUMMU
DMADMA
DataCacheData
Cache
Shared Bus Fabric
T
PP
T
PP
T
PP
T
SimpleSocket
16
AgentR
egs
Socket I/F
Fabric I/F
DecouplingBuffer
SM
SM
PP
T
PP
T
PP T
TTTT
I
Partial XBarFabric
OCIN06: Intelligent Interconnects for Multicore SoC’s 29Dec. 6, 2006
SoC Application Requirements (1/2)
MarketDRAM
EfficiencyReal-time
Feature-driven
Power-optimized
Flexible Platform
Application Processor
DTV/STB
Gaming
Multi-function Printers
WLAN
OCIN06: Intelligent Interconnects for Multicore SoC’s 30Dec. 6, 2006
SoC Application Requirements (2/2)
MarketDRAM
EfficiencyReal-time
Feature-driven
Power-optimized
Flexible Platform
Cellular Baseband
DSC
PC
Automotive
Storage
OCIN06: Intelligent Interconnects for Multicore SoC’s 31Dec. 6, 2006
Challenges With Textbook NoC
• SoC applications do not offer lots of network-level concurrency– Many are dominated by a single DRAM target
• NoC packetization and serialization overhead too high– Communication (wires) vs. computation (router, NI)
cost trade-offs different for SoC
• NoC latency is unacceptable for current processors– Should improve as reality sets in…
OCIN06: Intelligent Interconnects for Multicore SoC’s 32Dec. 6, 2006
Research Opportunities
• Sonics is interested in facilitating research in the following areas:– NoC benchmarking (via OCP-IP)– Distributed, heterogeneous cache and I/O coherence– Performance constraint capture– Static performance analysis (SPA)– Implications of 3D packaging on SoC architecture
• Interested?– Please contact me: [email protected]
OCIN06: Intelligent Interconnects for Multicore SoC’s 33Dec. 6, 2006
Summary
• Active decoupling is essential for managing heterogeneous designs
• Non-blocking interconnect fabrics offer higher efficiency and better QoS
• Intelligent agents centralize key services• High volume multicore SoC applications need
advanced on-chip interconnects now
• Please contact me for references