Copyright©2014 NTT corp. All Rights Reserved.
Accelerating SDN/NFV with transparent offloading architecture
Koji Yamazaki*, Takeshi Osaka†, Sadayuki Yasuda*, Shoko Ohteru* and Akihiko Miyazaki*
*NTT Microsystem Integration Laboratories, Japan
†NTT Network Service Systems Laboratories, Japan
Open Networking Summit, Mar. 3-5, 2014, Santa Clara, CA, USA
Outline
• Challenge
• Our approach
• Experimental results
• Conclusion
Challenge
How can we enhance the performance of virtual network functions without increasing CAPEX or OPEX?
• Background
  - Lots of COTS accelerators and SDKs (FPGAs, NPUs, and GPUs)
  - Framework for saving energy in future networks (ITU-T Y.3021)
• Two objectives
  - To reduce programming effort
  - To enable high-performance, energy-efficient operations
Our approach
Goal: To accelerate required functions easily and efficiently
1. Transparent offloading architecture
  - New programmable accelerator (ASIP*)
  - Harmonization with x86 environments
2. Design of application-specific instruction set
  - Optimal instructions for coarse DPI
  - Implementation of a simple data structure (Bloom filter)
*ASIP: Application-Specific Instruction-set Processor
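As a software reference for what the coarse-DPI instructions accelerate, a Bloom filter using the three hash functions named later in the deck (SAX, sdbm, Bernstein) might be sketched in plain C as follows. This is a sketch under stated assumptions, not the ASIP implementation: the filter size, the hash seeds, and every function name here (`bloom_init`, `bloom_add`, `bloom_maybe_contains`) are hypothetical, and the hashes are the commonly published textbook variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 4096u  /* assumed filter size in bits (not from the slides) */
#define KEY_BYTES  8u     /* 64-bit fixed field, as in the evaluation setup */

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* Shift-add-xor (SAX) hash, common textbook form. */
static uint32_t hash_sax(const uint8_t *key, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h ^= (h << 5) + (h >> 2) + key[i];
    return h;
}

/* sdbm hash: h = c + (h << 6) + (h << 16) - h. */
static uint32_t hash_sdbm(const uint8_t *key, size_t len) {
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = key[i] + (h << 6) + (h << 16) - h;
    return h;
}

/* Bernstein hash (djb2 variant): h = h * 33 + c, seeded with 5381. */
static uint32_t hash_bernstein(const uint8_t *key, size_t len) {
    uint32_t h = 5381u;
    for (size_t i = 0; i < len; i++)
        h = h * 33u + key[i];
    return h;
}

/* Set the filter bit selected by hash value h. */
static void set_bit(bloom_t *bf, uint32_t h) {
    uint32_t i = h % BLOOM_BITS;
    bf->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

/* Return 1 if the filter bit selected by hash value h is set, else 0. */
static int test_bit(const bloom_t *bf, uint32_t h) {
    uint32_t i = h % BLOOM_BITS;
    return (bf->bits[i / 8] >> (i % 8)) & 1u;
}

/* Clear all bits: the filter then reports every key as absent. */
void bloom_init(bloom_t *bf) {
    assert(bf != NULL);
    memset(bf->bits, 0, sizeof bf->bits);
}

/* Insert a key by setting one bit per hash function. */
void bloom_add(bloom_t *bf, const uint8_t key[KEY_BYTES]) {
    set_bit(bf, hash_sax(key, KEY_BYTES));
    set_bit(bf, hash_sdbm(key, KEY_BYTES));
    set_bit(bf, hash_bernstein(key, KEY_BYTES));
}

/* Return 1 if the key may be present (false positives are possible),
 * 0 if it is definitely absent. */
int bloom_maybe_contains(const bloom_t *bf, const uint8_t key[KEY_BYTES]) {
    return test_bit(bf, hash_sax(key, KEY_BYTES)) &&
           test_bit(bf, hash_sdbm(key, KEY_BYTES)) &&
           test_bit(bf, hash_bernstein(key, KEY_BYTES));
}
```

A filter of this kind never misses an inserted key but can report false positives, which fits a coarse first-pass match that forwards candidate packets for closer inspection.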
Overview of ASIP architecture
[Figure: ASIP block diagram. Pipeline stages: Fetch, Decode, Execute, Memory Access, Writeback. Blocks: ALUs, MUXes, GPR, State Register, Load/Store, SRAM, GPIO, SDRAM, plus an Extension block.]
• Embedded RISC CPUs (MIPS, Cadence, Synopsys, etc.)
• Tunable architecture
• ISA* extension with tailored compiler allowed
• Customize HW resources
*ISA: Instruction Set Architecture
ASIP for packet stream processing
Configure fast U-plane
[Figure: ASIP block diagram as on the previous slide (Fetch, Decode, Execute, Memory Access, Writeback; ALUs, MUXes, GPR, State Register, Load/Store, SRAM, GPIO, SDRAM), with FIFOs added for the ingress stream and the egress stream.]
Transparent offloading
Concept: Control ASIP functions from x86 environment
[Figure: C-plane configuration. x86 host (RAM) connected over PCIe to the ASIP (Core, I-RAM, D-RAM, DMAC). Annotations: "Issue DMA instructions", "DMA Transfers".]
Transparent offloading
[Figure: x86 host (RAM, holding an I-RAM image in its memory map) connected over PCIe to the ASIP (Core, I-RAM, D-RAM, DMAC). Annotations: "Search ASIP section", "Memory map", "DMA Transfers".]
Transparent offloading
[Figure: x86 host (RAM, with I-RAM images in its memory map) connected over PCIe to the ASIP (Core, I-RAM, D-RAM, DMAC). Annotations: "Forward functions", "Memory map", "DMA Transfers".]
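The offloading flow on these slides (search the ASIP section in the memory map, then forward the functions to I-RAM by DMA) could be modeled in plain C roughly as below. Everything here is a simulation under assumptions: the `.asip` section name, the `section_t` and `asip_sim_t` types, the I-RAM size, and the use of `memcpy` as a stand-in for a DMA transfer are all hypothetical, since the slides do not describe the real driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IRAM_BYTES 1024u  /* assumed size of the simulated ASIP I-RAM */

/* One entry of a hypothetical host-side memory map. */
typedef struct {
    const char    *name;  /* section name, e.g. ".asip" (assumed) */
    const uint8_t *data;  /* section contents in x86 RAM */
    size_t         size;  /* section length in bytes */
} section_t;

/* Simulated ASIP instruction RAM; real hardware would sit behind PCIe. */
typedef struct {
    uint8_t iram[IRAM_BYTES];
    size_t  used;
} asip_sim_t;

/* Step 1: search the memory map for the section holding ASIP code.
 * Returns NULL when no section has the requested name. */
const section_t *find_section(const section_t *map, size_t n,
                              const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].name, name) == 0)
            return &map[i];
    return NULL;
}

/* Step 2: stand-in for a DMA transfer (a bounds-checked memcpy here).
 * Returns 0 on success, -1 if the data does not fit in I-RAM. */
int dma_to_iram(asip_sim_t *asip, const uint8_t *src, size_t len) {
    assert(asip != NULL && src != NULL);
    if (len > IRAM_BYTES - asip->used)
        return -1;
    memcpy(asip->iram + asip->used, src, len);
    asip->used += len;
    return 0;
}

/* Step 3: forward the ASIP functions found in the map to I-RAM.
 * Returns -1 if the map has no ".asip" section or the copy fails. */
int offload_functions(asip_sim_t *asip, const section_t *map, size_t n) {
    const section_t *s = find_section(map, n, ".asip");
    if (s == NULL)
        return -1;
    return dma_to_iram(asip, s->data, s->size);
}
```

The point of the model is that the host program stays ordinary x86 code; only the section lookup and the transfer step know that an accelerator exists.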
Invoke coarse DPI function
[Figure: x86/ASIP offloading diagram as on the previous slide, with main() running on the x86 side.]
main():

    ...
    // Test data
    #define QUEUE_CHECK 50000
    ...
    int main() {
        ...
        // scan 50000 packets
        loop = 0;
        do {
            bloom_scan();
            loop++;
        } while (loop < QUEUE_CHECK);
        bloom_destroy();
        return EXIT_SUCCESS;
    }
Invoke coarse DPI function
[Figure: x86/ASIP offloading diagram as on the previous slide, with bloom_scan() forwarded to the ASIP.]
bloom_scan():

    ...
    void bloom_scan() {
        // Invoke instructions as intrinsics
        if (queue_vacancy_check()) {
            pop_queue();
            sax_hash_match();
            sdbm_hash_match();
            bernstein_hash_match();
            forward_data();
        }
    }
22 instructions and 14 registers were added for coarse DPI
Disassembly of bloom filter matching
[Figure: x86/ASIP offloading diagram as on the previous slide.]

# of cycles  Profiled disassembly
3            entry a1, 32
1            queue_vacancy_check a2
2            beqz.n a2, 60000465 <bloom_scan+0x19>
1            pop_queue
2            sdbm_hash_match
1            bernstein_hash_match
1            sax_hash_match
1            forward_data
1            retw.n
             bloom_scan+0x19:
0            retw.n
13 cycles per bloom_scan() call
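The 13-cycle total can be checked by summing the per-instruction counts from the profiled disassembly. The small C sketch below does only that; the `profile_row_t` type and the function names are hypothetical helpers, and the cycle values are copied from the slide.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the profiled disassembly: cycle count and mnemonic. */
typedef struct {
    unsigned    cycles;
    const char *insn;
} profile_row_t;

/* Cycle counts as listed on the slide, in disassembly order. */
static const profile_row_t BLOOM_SCAN_PROFILE[] = {
    {3, "entry"},
    {1, "queue_vacancy_check"},
    {2, "beqz.n"},
    {1, "pop_queue"},
    {2, "sdbm_hash_match"},
    {1, "bernstein_hash_match"},
    {1, "sax_hash_match"},
    {1, "forward_data"},
    {1, "retw.n"},
    {0, "retw.n (branch target, bloom_scan+0x19)"},
};

/* Sum the cycle column over n rows. */
unsigned total_cycles(const profile_row_t *rows, size_t n) {
    assert(rows != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += rows[i].cycles;
    return sum;
}

/* Total cycles for one profiled bloom_scan() call. */
unsigned bloom_scan_cycles(void) {
    return total_cycles(BLOOM_SCAN_PROFILE,
                        sizeof BLOOM_SCAN_PROFILE / sizeof BLOOM_SCAN_PROFILE[0]);
}
```

Each of the five custom instructions costs one or two cycles, so the function prologue, branch, and return account for 7 of the 13 cycles.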
Experimental results
Evaluation items                                     w/o acceleration  w/ our instructions
Run-time (mean # of cycles)*      hash(sax)          116               1
                                  hash(sdbm)         115               2
                                  hash(bernstein)    98                1
                                  bloom_scan         678               13
Hardware size (logic gate count)  core and SRAM      75 KGates         79 KGates
Power dissipation (mW)            core and SRAM      < 100 mW          < 100 mW
Performance (64 bytes)            pps (packets/s)    1 Mpps            57 Mpps
                                  bps (bits/s)       723 Mbps          38 Gbps
*50000 packets, 64-bit fixed field, 45-nm sim library.
• Run-time: down 98%
• Power dissipation: extremely low power
• Performance: 50x faster
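The headline figures can be cross-checked against the table with a few lines of arithmetic, sketched here in C. Two assumptions are not stated on the slides: that one packet is handled per bloom_scan() call, and that the bit rates count 20 bytes of per-frame Ethernet overhead on the wire (preamble, start-of-frame delimiter, and inter-frame gap) in addition to the 64-byte frame. The function names are hypothetical.

```c
#include <assert.h>

#define FRAME_BYTES         64.0  /* packet size used in the evaluation */
#define WIRE_OVERHEAD_BYTES 20.0  /* assumed: 7 preamble + 1 SFD + 12 IFG */

/* Ratio of cycle counts before and after acceleration. */
double speedup(double cycles_before, double cycles_after) {
    assert(cycles_after > 0.0);
    return cycles_before / cycles_after;
}

/* Run-time reduction as a percentage of the original cycle count. */
double reduction_percent(double cycles_before, double cycles_after) {
    assert(cycles_before > 0.0);
    return 100.0 * (cycles_before - cycles_after) / cycles_before;
}

/* Wire bit rate for a given packet rate, including the assumed overhead. */
double wire_bps(double pps) {
    return pps * (FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8.0;
}

/* Clock frequency implied by a packet rate at a fixed cycles-per-packet
 * cost; an inference, since the slides do not state the clock. */
double implied_clock_hz(double pps, double cycles_per_packet) {
    return pps * cycles_per_packet;
}
```

With the table's values, `speedup(678, 13)` is about 52 and `reduction_percent(678, 13)` is about 98.1, in line with the "50x faster" and "Down 98%" callouts. Under the overhead assumption, `wire_bps(57e6)` gives about 38.3 Gbps, which matches the 38 Gbps entry, and `implied_clock_hz(57e6, 13)` suggests a clock of roughly 740 MHz.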
Conclusion
Designing an optimal instruction set that harmonizes with x86 environments will reduce the cost of acceleration.
More challenging issues
Can open ISA transform the ecosystem of accelerators?
My assumption: Common, open ISA-based APIs further reduce programming costs.
Accelerators have "the Force"
• Dark side: Proprietary architecture
  - Intel's AVX ISA Extension
  - Other black box SDKs of COTS (ASSPs, NPUs)
• Light side: White box architecture
  - Berkeley RISC-V open ISA
  - Emerging trends of open source SDKs (e.g. Centec's Lantern)
Thank you! Questions?