
Intelligent Interconnects for Multicore SoCs Drew Wingard, CTO, Sonics, Inc. OCIN06: December 6, 2006



Agenda

• SoC Background
• Interconnect Architecture
• Application Areas


The Goal

Create a system-on-a-chip, comprising ten million gates, that satisfies rapidly-evolving market requirements for:

– Speed
– Power
– Area
– Application Performance
– Time to Market

Using the minimum resources with maximum predictability


SoC Architecture Trends

• Massive feature integration
  – Driven largely by Moore’s Law (supply) and convergence (demand)
• Continued movement of complexity to software
• Distributed architectures
  – Higher scalability (and independence?)
• Multiple processors
  – CPU
  – DSP
  – Special purpose (MPEG, packet, …)
• Distributed DMA
  – Removes centralized DMA bottleneck
  – Simplifies driver software integration

[Figure: System On Chip block diagram. Blocks shown: CPU, MPEG, DSP, 3D GFX, MAC, Video I/O, Comm I/O, DRAM Controller]


Why SoCs Are Difficult

• Improvement by feature integration has been practiced for decades
  – Why is SoC any different?
• Benefits of an SoC
  – Higher performance
  – Smaller footprint
  – Lower power
  – Lower cost
• Size, power, and cost benefits derive from sharing
  – Control software on CPU (interrupts, etc.)
  – Memory resources (on-chip and off-chip): single external DRAM
• Building predictable systems with so much sharing is hard
  – Dozens to hundreds of interrupt sources
  – Memory bandwidth bottlenecks + lots of real-time traffic stress

Unified Memory Architecture (UMA)


Limits of Tightly Coupled Design

• Tightly coupled design has been dominant
  – Assumes largely synchronous, instantaneous and free communication
  – Widely practiced, and supported in design flows
• BUT
  – Delivering clocks is problematic
  – Wire delay is dominant
  – Routing area can cost more than gates
  – Too many constraints, from too many blocks

• Cannot afford lowest common denominator design


Our Approach: Active Decoupling

• Separation
• Abstraction
• Optimization
• Independence

[Figure: SMART Interconnect diagram. Labels shown: Network, Bus Adapter, CPU, DMA, DRAM Controller, Master, Slave, Socket (widths 16 and 128), Core Function, Communication, Agent, Internal Fabric]


Layers of Decoupling

• Degree of blocking

• QoS

• Addressing

• Source identification

• Bursting, pipelining

• Threading

• Data, address, event widths

• Handshaking / flow control

• Signal timing and capacitance

• Clock frequency

Layers, from higher to lower abstraction: Performance, Identification, Transfer Protocol, Signaling, Electrical


Evolution of Design Abstractions

Abstraction minimizes the number of objects that the designer must manage

[Figure: level of abstraction plotted against time. Stages shown: Gates and Wires; Blocks and Buses; Functional Cores and Interconnect Cores; Tiles and Networks. Annotations: "Most designs are here" and "Need to be here!"]


Abstraction Enables Higher Complexity

[Figure: Log(gate count) plotted against development time and cost, driven by technology push. Stages shown: THE PAST (Blocks, Buses); PRESENT (Cores, Interconnects); FUTURE (Tiles, Networks). Gate-count markers: 5M+ gates and 100M+ gates]


The OCP Socket

• Owned/advanced by OCP-IP (www.ocpip.org)
  – Over 150 member companies
• Interconnect-neutral
• Defines data flow, control flow, test signaling
• Highly configurable to match core needs
• Simple Master/Slave request/response protocols
  – With many options
• Fully synchronous, point-to-point
• Optional pipelining, bursting, threading
• Flexible handshaking and other flow control


Agenda

• SoC Background
• Interconnect Architecture
  – Advanced Fabrics
  – Intelligent Agents
• Application Areas


SoC Data Flow Requirements

• Connect dozens of initiators to several memories
  – While meeting a wide range of latency and throughput requirements
• Connect processors and DMA to control ports and peripherals
  – With predictably low latencies
• Example SoC requirements (interconnect view):

Metric | Mode | Type | Requirement
CPU – peripheral interrupt service | All | Worst-case | < 10 cycles
CPU – DDR cache miss latency | All | Average | As low as feasible
H.264 decode – DDR bandwidth | HD vid. | Worst-case | 1.2 GBytes/sec
H.264 decode – service jitter | HD vid. | Worst-case | < 6 macro-blocks
Video refresh – DDR bandwidth | SD/NI | Worst-case | 56 MBytes/sec
… | … | … | …
DDR data bus bandwidth | HD vid. | Sustained | > 80%
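Requirements like these become checkable once they are captured as data and compared against measured or simulated figures. The following is a minimal sketch under stated assumptions: the `Requirement` class and the `check` helper are hypothetical names, the bandwidth rows are read as "at least" limits (the table gives no operator for them), and any measured values passed in are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    metric: str
    mode: str
    kind: str     # "Worst-case", "Average", or "Sustained"
    op: str       # comparison the measured value must satisfy
    limit: float
    unit: str

    def is_met(self, measured: float) -> bool:
        if self.op == "<":
            return measured < self.limit
        if self.op == ">":
            return measured > self.limit
        if self.op == ">=":
            return measured >= self.limit
        raise ValueError(f"unsupported comparison: {self.op}")


# Limits taken from the table; ">=" on the bandwidth rows is an assumption.
REQUIREMENTS = [
    Requirement("CPU - peripheral interrupt service", "All", "Worst-case", "<", 10, "cycles"),
    Requirement("H.264 decode - DDR bandwidth", "HD vid.", "Worst-case", ">=", 1.2, "GBytes/sec"),
    Requirement("H.264 decode - service jitter", "HD vid.", "Worst-case", "<", 6, "macro-blocks"),
    Requirement("Video refresh - DDR bandwidth", "SD/NI", "Worst-case", ">=", 56, "MBytes/sec"),
    Requirement("DDR data bus bandwidth", "HD vid.", "Sustained", ">", 80, "%"),
]


def check(measurements):
    """Return the metrics whose measured value violates its requirement."""
    return [
        r.metric
        for r in REQUIREMENTS
        if r.metric in measurements and not r.is_met(measurements[r.metric])
    ]
```

Calling `check` with a dictionary of metric names and measured values returns only the violated metrics, so an empty list means every supplied measurement is within its limit. The "as low as feasible" row is left out because it has no numeric limit to test against.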


Interconnect Fabric Options

• SoC data flow requirements must be satisfied by internal interconnect fabric
  – Big challenge in current SoC designs!
• Choices in interconnect fabric design
  – Unified vs. split transactions
  – Shared vs. separate physical links
  – Combinational vs. pipelined
  – Single vs. multiple outstanding transactions (transaction pipelining)
  – In-order vs. out-of-order completion and response
  – Blocking vs. non-blocking flow control
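The single vs. multiple outstanding transactions choice can be illustrated with a back-of-envelope bound: with a fixed number of transactions in flight, sustained throughput cannot exceed in-flight bytes divided by round-trip latency (Little's law), nor the peak rate of the link. The sketch below is not from the slides; the function name and all numbers in the usage note are assumptions chosen for illustration.

```python
def sustained_bandwidth(outstanding, bytes_per_txn, latency_cycles, peak_bytes_per_cycle):
    """Upper bound on bytes per cycle with `outstanding` transactions in flight."""
    if outstanding < 1 or latency_cycles <= 0:
        raise ValueError("need at least one transaction and a positive latency")
    # Little's law: throughput = work in flight / round-trip latency.
    in_flight_limit = outstanding * bytes_per_txn / latency_cycles
    # The physical link caps whatever the pipelining could otherwise deliver.
    return min(in_flight_limit, peak_bytes_per_cycle)
```

For a hypothetical 32-byte transaction with a 20-cycle round trip on an 8-byte-per-cycle link, one outstanding transaction is bounded at 1.6 bytes per cycle, four at 6.4, and eight or more hit the 8-byte link limit. This is why transaction pipelining matters when the target, such as DRAM, has long latency.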


Blocking vs. Non-blocking Flow Control

• Sharing in SoCs creates many opportunities for contention
  – Arbitration determines who wins
  – Flow control determines when the winner gets to go
• Blocking flow control systems allow resource shortages along some paths to prevent other paths from progressing
• Non-blocking flow control systems ensure that points of sharing never stall if any data flow could progress
• Locally non-blocking flow control (aka virtual channels) improves efficiency and allows more resource sharing
• End-to-end non-blocking flow control offers greater predictability
  – Provides basis for QoS guarantees [callout: Our Approach]
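The difference between blocking and non-blocking flow control can be seen in a toy model of one shared link. The sketch below is an illustration under simplifying assumptions, not a model of any Sonics product: all requests are queued at cycle 0, the link moves at most one item per cycle, a stalled target simply refuses data until a given cycle, and the non-blocking case uses fixed-priority selection among per-target queues. The function and parameter names are hypothetical.

```python
from collections import deque


def simulate(requests, stalled_until, cycles, per_thread_queues):
    """Count deliveries per target over `cycles` cycles on one shared link.

    requests: target names in arrival order (all queued at cycle 0).
    stalled_until: dict mapping a target to the first cycle it accepts data.
    per_thread_queues: False models a single shared FIFO (blocking);
                       True models one queue per target (non-blocking).
    """
    if per_thread_queues:
        queues = {}
        for target in requests:
            queues.setdefault(target, deque()).append(target)
    else:
        queues = {"shared": deque(requests)}

    delivered = {target: 0 for target in requests}
    for cycle in range(cycles):
        # Pick the first queue whose head can be accepted right now.
        for queue in queues.values():
            if queue and cycle >= stalled_until.get(queue[0], 0):
                delivered[queue.popleft()] += 1
                break  # the link carries one item per cycle
    return delivered
```

With requests `["dram", "sram", "sram", "sram"]` and the DRAM target stalled until cycle 3, the shared FIFO delivers nothing in the first three cycles because the stalled DRAM request sits at its head, while the per-target queues deliver all three SRAM requests in the same time.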


SonicsMX Basic Architecture

• Hybrid topologies
  – Full / partial cross-bar
  – Shared bus
• Pipelined, multi-threaded, non-blocking fabric
• Fully split (dual) request / response
• Distributed QoS arbiter
  – Spans cycle, frequency, and data width boundaries
  – Supports flexible thread merging tree topologies

[Figure: SMX fabric with initiator (I) and target (T) agents. Blocks shown: CPU, DSP, GFX, ROM, SRAM, Flash Ctl., DRAM Ctl.]


Global Interconnect Responsibilities

• Routing
  – Getting requests, responses and data to the desired destination
• Access control
  – Managing contention for shared resources (ensuring QoS)
  – Ensuring requested access is allowed (security and protection)
• Error management
  – Detection, reporting, and SW recovery support
• Power management
  – Activity detection, clock and voltage removal support
• Connectivity
  – Protocol conversion
  – Data width / clock frequency conversion
• Spanning distance
  – Connecting endpoints at required frequency and latency
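Routing, access control, and error detection from the list above can be sketched together as an address decode with a permission check. This is a minimal illustration, not the mechanism of any particular interconnect: the `Region` type, the address map, the initiator names, and the status strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    allowed: frozenset  # initiators permitted to access this region


# Hypothetical address map; names, addresses, and permissions are examples.
ADDRESS_MAP = [
    Region("ROM", 0x0000_0000, 0x0001_0000, frozenset({"CPU"})),
    Region("SRAM", 0x1000_0000, 0x0004_0000, frozenset({"CPU", "DSP"})),
    Region("DRAM", 0x8000_0000, 0x1000_0000, frozenset({"CPU", "DSP", "GFX"})),
]


def route(initiator, address):
    """Return (target_name, status) for a request from `initiator`."""
    for region in ADDRESS_MAP:
        if region.base <= address < region.base + region.size:
            # Routing succeeded; now check that the access is allowed.
            if initiator in region.allowed:
                return region.name, "OK"
            return region.name, "PROTECTION_ERROR"
    # No target claims this address: report it instead of hanging.
    return None, "ADDRESS_ERROR"
```

A request that decodes to a region but comes from an initiator outside `allowed` is reported as a protection error, and an address that matches no region is reported as an address error, which gives software something to recover from.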


The Intelligence is in the Agents

• Agents provide…
  – Protocol conversion
• Agent adapts to IP core
  – Decoupling of IP cores from fabric
• Provide local, isolated environment
  – Layered services
• Proven technology
  – Over 100 million ICs shipped so far
• Agent services
  – Power management
  – Security management
  – Error management
  – QoS
  – Burst, width, and command conversion

[Figure: fabric surrounded by initiator agents (IA) on initiator sockets and target agents (TA) on target sockets, marked I and T]
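Width conversion, one of the agent services listed above, amounts to splitting a wide data word into narrow beats on one side and reassembling it on the other. The sketch below is illustrative only and does not describe how the agents implement it: the function names are hypothetical, the 128-bit and 16-bit defaults are example widths, and least-significant-beat-first ordering is an assumption.

```python
def split_word(word, wide_bits=128, narrow_bits=16):
    """Split a wide data word into narrow beats, least significant beat first."""
    if wide_bits % narrow_bits != 0:
        raise ValueError("wide width must be a multiple of the narrow width")
    if not 0 <= word < (1 << wide_bits):
        raise ValueError("word does not fit in the wide width")
    mask = (1 << narrow_bits) - 1
    return [(word >> shift) & mask for shift in range(0, wide_bits, narrow_bits)]


def merge_beats(beats, narrow_bits=16):
    """Reassemble beats produced by split_word into the original wide word."""
    word = 0
    for index, beat in enumerate(beats):
        word |= beat << (index * narrow_bits)
    return word
```

Splitting and then merging returns the original word, and the number of beats equals the ratio of the two widths, so a 128-bit word crossing a 16-bit socket takes eight beats.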


Sonics Interconnect Products

Product | SiliconBackplane | Sonics3220 | SonicsMX
Introduction | 1999 | 2002 | 2004
Connects | Processors, memories | Peripherals, registers | Processors, memories
Socket(s) | OCP 1 | OCP 1/2, APB | OCP 1/2, AHB, AXI
Decoupling | High | Medium | Very High
Fabric | Distributed, shared bus | Multi-branch shared bus | Hybrid cross-bar / shared bus
Applications | WLAN, HDTV, PVR, … | Handsets, consumer, … | Handsets, HDTV, DSC, OA, …
Volume Production | Over 100 million IC’s shipped | 2005 | 2005


Interconnect Performance

• Many standard peak measurements pretty meaningless when considering SoC interconnects
  – Bandwidth is cheap (wires are cheaper than pins)
  – Usable bandwidth mostly dependent on attached cores…
  – … and application scenario
• Sonics’ products engineered to span very wide range of operating points
  – Latencies: 0-10 clock cycles
  – Bandwidth: limited only by targets
  – Frequency: keep up with DRAM
  – IP core count: 10-100+
  – No process-specific limitations
  – Standard ASIC-style design flow


Product Deliverables

• Configuration tools
  – Aid in capture, checking, generation, and verification of Sonics’ IP
• Configured Register Transfer Language (RTL) code
  – “Virtual hardware” input to logic synthesis and place & route tools
• Configured verification environment
  – Tests customer configuration of Sonics’ IP
• Logic synthesis scripts & floorplan interface
  – Bridges to physical design domain
• Electronic System Level (ESL) models (in SystemC)
  – Helps architectural exploration and firmware development
• Analysis tools and models
  – Aids customer in building and analyzing prototypes of SoC


Design Issues Addressed By Our Approach

[Figure: design issues mapped onto three elements of the approach: Methodology & Automation, Scalable Fabrics, and Intelligent Agents. Issues shown: Complex Memory Hierarchies, Signal Integrity, Distributed Processing, High Peripheral Count, Perf. Verification, Arch. Modeling, SW Development, Virtual Prototyping, Parallel IP Creation, Design Re-use, Timing Closure, Variable Clock Freq., Voltage Isolation, Power Management, Error Management, Access Security, Data Width Conversion, Mixed Endianness, Protocol Conversion, Guaranteed BW QoS, Pipelining]


Design Timeline Benefits of Our Approach

[Figure: design timelines for the tightly coupled flow and the Sonics flow, each with IP Core and SOC tracks plotted against time. Phases shown: Design, Functional Verif., Synth./Timing Verif., Performance Verif., Integration. Milestones shown: Start, First Integration, “Hello World”, Re-verified Core Interaction, Tapeout! Annotation: “Emulator?”]


Agenda

• SoC Background
• Interconnect Architecture
• Application Areas


Inside a Tile-based Multicore SoC

• Example: TI OMAP 2420
• Heterogeneous processors
  – General-purpose CPU (e.g. ARM 11)
  – DSP (e.g. TI C5x)
  – 2D/3D graphics (e.g. Imagination PowerVR MBX)
  – Video accelerator (e.g. MPEG4 codec)
• Multiple levels of shared memory
  – External DRAM and Flash
  – Internal SRAM and ROM
• Very complex peripheral subsystems
• Security management (content, service, stability)
• Aggressive power management


[Figure: block diagrams of the example SoC components. Labels shown: IVA, “(from OMAP 5910)”. Sources: www.ti.com, www.arm.com, www.powervr.com]


Multicore Architecture Advantages

[Figure: slide credited to Avner Goren, TI, EPF 2004, annotated “What is needed”]


Application Processor Interconnect Example

[Figure: application processor built from SMX and S3220 interconnects with initiator (I) and target (T) agents. Tiles and blocks shown: CPU Tile, DSP Tile (DSP Core, Inst. Cache, Data Cache, X RAM, Y RAM, MMU, DMA), 2D/3D Graphics Tile, MPEG4 Codec Tile, Embedded SRAM, SDRAM Controller, Flash Controller, DMA, MP3, USB 2.0, LCD Controller, Camera Interface, and many peripherals (P). Insets: Complex Socket (128) and Simple Socket (16); Agent (Regs, Socket I/F, Fabric I/F, Decoupling Buffer); Shared Bus Fabric; Partial XBar Fabric]


SoC Application Requirements (1/2)

[Table: markets rated against five requirement columns: DRAM Efficiency, Real-time, Feature-driven, Power-optimized, Flexible Platform. Markets: Application Processor, DTV/STB, Gaming, Multi-function Printers, WLAN]


SoC Application Requirements (2/2)

[Table: markets rated against five requirement columns: DRAM Efficiency, Real-time, Feature-driven, Power-optimized, Flexible Platform. Markets: Cellular Baseband, DSC, PC, Automotive, Storage]


Challenges With Textbook NoC

• SoC applications do not offer lots of network-level concurrency
  – Many are dominated by a single DRAM target
• NoC packetization and serialization overhead too high
  – Communication (wires) vs. computation (router, NI) cost trade-offs different for SoC
• NoC latency is unacceptable for current processors
  – Should improve as reality sets in…


Research Opportunities

• Sonics is interested in facilitating research in the following areas:
  – NoC benchmarking (via OCP-IP)
  – Distributed, heterogeneous cache and I/O coherence
  – Performance constraint capture
  – Static performance analysis (SPA)
  – Implications of 3D packaging on SoC architecture
• Interested?
  – Please contact me: [email protected]


Summary

• Active decoupling is essential for managing heterogeneous designs

• Non-blocking interconnect fabrics offer higher efficiency and better QoS

• Intelligent agents centralize key services

• High volume multicore SoC applications need advanced on-chip interconnects now

• Please contact me for references