May 6, 2015
© Synopsys, Inc. All rights reserved
Gert Goossens
Sr. Director R&D, Synopsys
Adding C Programmability to Data Path Design
Smart Products Drive SoC Developments
• Need for reusable SoC platforms
• SoC platforms must become software programmable, without compromising PPA (performance, power, area)
[Figure: smart-product characteristics: feature-rich, multi-sensing, wirelessly connected, green, always-on, multi-output]
Programmable Processors in SoCs
[Figure: example SoC with control processors, signal-processing datapaths, embedded memories (SRAM, ROM, NVM), wireless modem with radio front end, audio codecs, video/imaging front end, ADC/DAC, AMBA 3 AXI / AMBA 2.0 AHB and AMBA APB buses, UART, GPIO, I2C, and controller/PHY pairs for SD/MMC, SATA, DDR, USB, PCIe, HDMI, 10G Ethernet, MIPI M-PHY (UniPro, UFS, CSI-3, DigRFv4) and MIPI D-PHY (CSI-2, DSI)]
Processor Solutions Spectrum
Microprocessor | Extensible Processor | Application-Specific µP / DSP | Programmable Datapath | Hardwired Datapath
Maximize performance, minimize power consumption:
• Architectural specialization
• Parallelism: instruction-level, data-level, task-level
• Power-optimized RTL generation
• Power-gating of cores
Programmability:
• Support changing requirements, product differentiation, new features… without SoC respin!
• Quick algorithm mapping from C to silicon, with easy debugging
ASIP = Application-Specific Instruction-set Processor
ASIP Architectural Optimization Space
• Architectural space beyond configurable templates
• Can be captured in a processor description language
• Architectural exploration enabled by retargetable ASIP design tools
Parallelism:
– Instruction-level parallelism: orthogonal instruction set (VLIW) or encoded instruction set
– Data-level parallelism: vector processing (SIMD)
– Task-level parallelism: multi-core
Specialization:
– App.-specific data types: integer, fractional, floating-point, bits, complex, vector…
– App.-specific instructions: any exotic operator, single- or multi-cycle
– Connectivity & storage matching the application’s data flow: distributed registers, multiple memories, sub-ranges
– App.-specific data processing
– App.-specific memory addressing: direct, indirect, post-modification, indexed, stack indirect…
– App.-specific control processing: jumps, subroutines, interrupts, HW do-loops, residual control, predication; relative or absolute addressing, address range, delay slots
Pipeline: pipeline depth; hazards handled by HW/SW stall or bypass
Multi-threading
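To make “application-specific data types” and “application-specific instructions” concrete: on a general-purpose core, a fractional (Q15) multiply with rounding and saturation takes several C operations, which an ASIP could offer as a single instruction. The sketch below is plain, portable C; the names `q15_t` and `q15_mul_sat` are hypothetical and are not part of any ASIP Designer library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q15 fractional type: value = raw / 32768, range [-1, 1). */
typedef int16_t q15_t;
static_assert(sizeof(q15_t) == 2, "Q15 values are 16 bits");

/* Fractional multiply with round-to-nearest and saturation.
   Assumes arithmetic right shift of negative values, as on mainstream compilers. */
static q15_t q15_mul_sat(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 product */
    p = (p + (1 << 14)) >> 15;             /* round, rescale to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only (-1) * (-1) overflows */
    if (p < INT16_MIN) p = INT16_MIN;
    return (q15_t)p;
}
```

For example, `q15_mul_sat(16384, 16384)` (0.5 × 0.5) yields 8192 (0.25), while (-1) × (-1) saturates to 32767, the largest representable Q15 value.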
ASIP Designer – Retargetable ASIP Design Tool
Typical users: ASIC/SoC design teams
ASIP Designer – History
ASIP Designer
• Processor description language: nML
• Consolidated product, combining strengths of IP Designer and Processor Designer
• Stepwise deployment in 2015-2016 time frame
• Legacy products remain available
IP Designer
• Processor description language: nML
• Roots in architectural exploration and retargetable compilation
Processor Designer
• Processor description language: LISA
• Roots in modeling and fast simulation
Adding C Programmability to SoC Design
• Graph-based compilation technology combines retargetability with high code efficiency
• Instruction-set graph (ISG)
– Graph-based optimization algorithms operate on (any) ISG
– Closer to HW than other compilers’ machine models
– Captures HW resources, data types, connectivity, instruction encoding, instruction-level parallelism and the instruction pipeline
– Supports “irregular” architectures
• Enables rapid architectural exploration with the compiler in the loop
• Enables algorithm development in C, even for highly specialized ASIPs
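[Figure: compilation flow. The application (C) enters through the C front-end and the processor model (nML) through the nML front-end; the resulting CDFG and ISG feed the compilation engine (phase coupling): source-level transformations, code selection, register allocation, scheduling and code emission, producing machine code (ELF/DWARF). An inset shows a small CDFG with mul, add and sub operations on operands X[2], Y[2] and A[2].]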
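The ISG tells the compiler which operation patterns the datapath implements directly. As a much-simplified illustration of retargetable code selection (not the ASIP Designer algorithm, which works on graphs rather than trees and couples code selection with register allocation and scheduling), the C sketch below covers an expression tree greedily and uses a fused multiply-accumulate only when a hypothetical target flag says the datapath has one. All type and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified data-flow node: a tree, not a full CDFG. */
typedef enum { OP_VAR, OP_MUL, OP_ADD, OP_SUB } op_t;

typedef struct node {
    op_t op;
    const struct node *l, *r;   /* operands; NULL for OP_VAR leaves */
} node;

/* Hypothetical target description. A real retargetable compiler derives
   this from the processor model rather than from a single flag. */
typedef struct {
    bool has_mac;   /* datapath offers d = a*b + c as one instruction */
} target;

/* Greedy top-down tree covering ("maximal munch"): prefer the largest
   pattern the target offers. Prints one mnemonic per selected instruction
   (operands first) and returns the instruction count. Leaves are assumed
   to be in registers already. */
static int select_code(const node *n, const target *t)
{
    assert(n != NULL);
    if (n->op == OP_VAR)
        return 0;

    /* Pattern (a * b) + c, in either operand order, covered by one MAC. */
    if (n->op == OP_ADD && t->has_mac &&
        (n->l->op == OP_MUL || n->r->op == OP_MUL)) {
        const node *mul = (n->l->op == OP_MUL) ? n->l : n->r;
        const node *acc = (n->l->op == OP_MUL) ? n->r : n->l;
        int count = select_code(mul->l, t);
        count += select_code(mul->r, t);
        count += select_code(acc, t);
        puts("mac");
        return count + 1;
    }

    /* Otherwise: one instruction per operator node. */
    int count = select_code(n->l, t);
    count += select_code(n->r, t);
    puts(n->op == OP_MUL ? "mul" : n->op == OP_ADD ? "add" : "sub");
    return count + 1;
}
```

For `((x * y) + a) - b`, a target without MAC gets `mul`, `add`, `sub` (3 instructions), whereas a MAC-capable target gets `mac`, `sub` (2). The same C source maps to different instruction sequences purely because the target description changed, which is the essence of retargetability.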
Applicable to “Any” Application Domain
Audio
Video & imaging
Wireless
Wireline
Medical
Network processing
Automotive
High-perf. computing
Crypto & identification
Industrial
Graphics
• Publicly announced IP Designer and Processor Designer customers
Examples: Wireless Communication
Microprocessor | Domain-Specific Processor | Application-Specific Processor | Programmable Datapath | Hardwired Datapath
• “BoT” [1]: configurable inner-modem processor (LTE(A) + 11ac + 11ad + WPAN + GPS + DVBT...)
• “FlexFEC” [2]: 3-standard FEC engine (LDPC + Turbo + Viterbi)
• “BLOX” [1]: single-function sliced accelerators (FFT, LDPC, matrix inversion)
[1] L. Van der Perre, “Radios in need of (Multi-)ASIP - wanted: flexibility and energy efficiency”, Synopsys User Group, Munich, May 2013.
[2] F. Naessens, “Unified C-programmable ASIP architecture for multi-standard Viterbi, Turbo and LDPC decoding”, IP-SoC Conference, Dec. 2011.
BoT: Configurable Inner-Modem Processor [1]
• Mixed scalar/vector processor
– 10-slot VLIW: 3 scalar, 2 vector load/store, 3 vector compute, 2 pack/unpack
– Vector compute units with increasing specialization:
  – VU1: alu, mul, shift
  – VU2: alu, cabs, interleave, shift
  – VU3: alu, recip, sqrt, tan, cexp, slope, interleave, softdemap
– Vector packing/unpacking
– Low power: clock gating exploits the low duty cycle
– C programmable
[Figure: BoT block diagram with vector register file (Vector RF), and BoT power profile: average 45 mW (40 nm @ 400 MHz)]
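The quoted average power can be converted into an average energy per clock cycle. A back-of-envelope sketch, assuming the 45 mW average is taken over continuous operation at the stated 400 MHz clock; because clock gating suppresses switching during idle periods, this figure averages active and idle cycles rather than describing a fully active cycle:

```latex
E_{\text{cycle}} = \frac{P_{\text{avg}}}{f_{\text{clk}}}
= \frac{45 \times 10^{-3}\,\text{J/s}}{400 \times 10^{6}\,\text{s}^{-1}}
= 1.125 \times 10^{-10}\,\text{J} = 112.5\,\text{pJ}
```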
FlexFEC 3-Standard Forward Error-Correction (FEC) Engine [2]
• Application-specific mixed scalar/vector processor
– SIMD: n-way × 8-bit
– VLIW: 1 scalar and 5 vector issue slots
– App.-specific primitive functions:
  – LDPC decode, Turbo decode, Viterbi decode (e.g. add-compare-select), special addressing modes
– App.-specific complex instructions:
  – “abs() + abs()”, element-wise vector shift, cross-correlation with a programmable spreading code
– Transparent background memory access through a lookup address generator
– C programmable
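Add-compare-select (ACS) is the inner kernel of Viterbi decoding. A minimal C sketch, assuming the common convention that a smaller path metric is better and using 16-bit metrics for simplicity (an 8-bit implementation, matching the n-way × 8-bit SIMD, would need metric normalization). The function name and signature are hypothetical; the loop body is the per-lane operation that a vector ASIP could execute across all trellis states at once.

```c
#include <assert.h>
#include <stdint.h>

/* One ACS step applied to n trellis states (one state per SIMD lane).
   For state i:
     pm0[i], pm1[i]  path metrics of its two predecessor states
     bm0[i], bm1[i]  branch metrics of the two incoming transitions
   Outputs: surviving path metric in out[i], decision bit in dec[i]
   (0 = predecessor 0 survived, 1 = predecessor 1), kept for traceback. */
static void acs_step(const uint16_t *pm0, const uint16_t *bm0,
                     const uint16_t *pm1, const uint16_t *bm1,
                     uint16_t *out, uint8_t *dec, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        uint32_t m0 = (uint32_t)pm0[i] + bm0[i];   /* add */
        uint32_t m1 = (uint32_t)pm1[i] + bm1[i];
        dec[i] = (uint8_t)(m1 < m0);                /* compare */
        uint32_t best = dec[i] ? m1 : m0;           /* select */
        out[i] = best > UINT16_MAX ? UINT16_MAX : (uint16_t)best; /* saturate */
    }
}
```

Because every lane does identical, branch-free work, the loop is a natural candidate for a single vector instruction on a SIMD datapath.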
• Specialization, e.g. the LDPC decode function “vq()”:
Architecture | Optimization step | Cycles
Standard 32-bit RISC | (baseline) | 3,040
32-bit RISC with predicated add/sub | (mild) specialization | 2,707
96-lane, 16-bit SIMD with vector-predicated add/sub | data-level parallelism | 32
96-lane, 16-bit SIMD with LDPC decode instruction (synthesized from C code) | specialization | 1
Note: cycle counts obtained for randomized input data
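Reading the cycle counts above as successive speed-ups (simple ratios of the reported counts) shows that data-level parallelism and the dedicated instruction contribute most of the gain:

```latex
\frac{3040}{2707} \approx 1.12\times, \qquad
\frac{2707}{32} \approx 84.6\times, \qquad
\frac{32}{1} = 32\times, \qquad
\text{overall: } \frac{3040}{1} = 3040\times
```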
• Instruction-level parallelism: 1 scalar + 5 vector (SIMD) slots – C compiler efficiently exploits VLIW issue slots
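How a compiler can fill 1 scalar + 5 vector issue slots per cycle can be sketched with greedy list scheduling. This is a simplified illustration under stated assumptions (every operation has a 1-cycle latency and at most two operands, and producers are listed before consumers), not the scheduler used in ASIP Designer; the `op` type and `schedule` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPS      64
#define SCALAR_SLOTS 1   /* issue slots per bundle, as stated for FlexFEC */
#define VECTOR_SLOTS 5

/* Hypothetical operation record for this sketch. */
typedef struct {
    bool vector;   /* true: needs a vector slot; false: needs the scalar slot */
    int dep[2];    /* indices of producing ops (earlier in the list), or -1 */
} op;

/* Greedy list scheduling into VLIW bundles. An op may issue once all its
   producers issued in an earlier cycle (1-cycle latency assumed). Writes
   each op's issue cycle to cycle_of[] and returns the number of bundles. */
static int schedule(const op *ops, int n, int *cycle_of)
{
    assert(n >= 0 && n <= MAX_OPS);
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 2; k++)
            assert(ops[i].dep[k] < i);   /* producers precede consumers */
        cycle_of[i] = -1;                /* -1 = not yet issued */
    }

    int issued = 0, cycle = 0;
    while (issued < n) {
        int scalar_free = SCALAR_SLOTS, vector_free = VECTOR_SLOTS;
        for (int i = 0; i < n; i++) {
            if (cycle_of[i] >= 0)
                continue;                /* already in an earlier bundle */
            bool ready = true;
            for (int k = 0; k < 2; k++) {
                int d = ops[i].dep[k];
                if (d >= 0 && (cycle_of[d] < 0 || cycle_of[d] == cycle))
                    ready = false;       /* producer not finished yet */
            }
            int *slots = ops[i].vector ? &vector_free : &scalar_free;
            if (ready && *slots > 0) {
                --*slots;
                cycle_of[i] = cycle;
                issued++;
            }
        }
        cycle++;
    }
    return cycle;
}
```

With six independent vector operations plus one scalar operation that consumes the first result, the sketch needs two bundles: five vector operations fill cycle 0, and the sixth shares cycle 1 with the dependent scalar operation.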
BLOX: Single-Function Sliced Accelerators [1]
• Highly regular vector processors
– In each SIMD lane, stack elementary operators, with limited HW multiplexing
– Low power, thanks to:
  – Short active wires and modularity
  – Simple operators
  – Very wide register files (asymmetric access)
– Examples:
  – FFT for 11ac
  – LDPC for 11ac
  – Matrix ops
  – FFT for 11ad
  – LDPC for 11ad
– C programmable (but requires C code refactoring)
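The remark that BLOX is C programmable but needs C code refactoring can be illustrated with an FFT butterfly. A sketch, assuming floating-point data for readability (a hardware implementation would more likely use fixed-point) and a hypothetical lane count: the loop is written so that every iteration performs identical, independent work, the form a lane-sliced vector processor can map one iteration per SIMD lane. The names `cplx`, `LANES` and `butterfly_lanes` are illustrative, not from the source.

```c
#include <assert.h>

#define LANES 8   /* hypothetical number of SIMD lanes */

typedef struct { double re, im; } cplx;

/* One radix-2 decimation-in-time butterfly per lane:
     a' = a + w*b,   b' = a - w*b
   Identical, branch-free work in every iteration, so each iteration
   can be assigned to its own lane. */
static void butterfly_lanes(cplx *a, cplx *b, const cplx *w, int n)
{
    assert(n >= 0 && n <= LANES);
    for (int i = 0; i < n; i++) {
        cplx t = { w[i].re * b[i].re - w[i].im * b[i].im,    /* t = w * b */
                   w[i].re * b[i].im + w[i].im * b[i].re };
        b[i].re = a[i].re - t.re;   /* b' uses the old a, so update b first */
        b[i].im = a[i].im - t.im;
        a[i].re = a[i].re + t.re;
        a[i].im = a[i].im + t.im;
    }
}
```

With w = 1, inputs a = 1 + 2j and b = 3 + 4j give a' = 4 + 6j and b' = -2 - 2j; with w = -j the same inputs give a' = 5 - j and b' = -3 + 5j.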
Conclusions
• ASIP design tools introduce C programmability in SoC design
– Better design reuse
– Functional enhancements even after tapeout
– Productivity increase by raising abstraction from RTL to C
• “Compiler-in-the-Loop” concept
– Rapid architectural exploration
– Highly differentiating architectures
• Full control over PPA (performance, power, area)
• Software development kit for end users is available automatically
• Same tool supports wide range of IP needs
• Royalty-free solutions