Accelerating SDN/NFV with transparent offloading architecture

Copyright©2014 NTT corp. All Rights Reserved.

Koji Yamazaki*, Takeshi Osaka†, Sadayuki Yasuda*, Shoko Ohteru*, and Akihiko Miyazaki*

*NTT Microsystem Integration Laboratories, Japan
†NTT Network Service Systems Laboratories, Japan

Open Networking Summit, Mar. 3-5, 2014, Santa Clara, CA, USA


Outline

• Challenge

• Our approach

• Experimental results

• Conclusion


Challenge

How can we enhance the performance of virtual network functions without increasing CAPEX or OPEX?

• Background

Lots of COTS accelerators and SDKs (FPGAs, NPUs, and GPUs)

Framework for saving energy in future networks (ITU-T Y.3021)

• Two objectives

To reduce programming effort

To enable high-performance, energy-efficient operations


Our approach

Goal: To accelerate required functions easily and efficiently

*ASIP: Application Specific Instruction-set Processor

1. Transparent offloading architecture

New programmable accelerator (ASIP*)

Harmonization among x86 environments

2. Design of application-specific instruction set

Optimal instructions for coarse DPI

Implementation of simple data structure (Bloom filter)
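The simple data structure named above is a Bloom filter keyed by three hash functions (the sax, sdbm, and bernstein hashes that appear later in the deck). A minimal software sketch of the idea — the filter size and string-keyed interface are assumptions for illustration, not the authors' implementation:

```c
#include <stdint.h>

#define BLOOM_BITS 1024              /* assumed filter size (k = 3 hashes) */

static uint8_t bloom[BLOOM_BITS / 8];

/* Three classic string hashes, matching the names used on the ASIP. */
static uint32_t sax_hash(const char *k)
{
    uint32_t h = 0;
    while (*k) h ^= (h << 5) + (h >> 2) + (uint8_t)*k++;
    return h;
}

static uint32_t sdbm_hash(const char *k)
{
    uint32_t h = 0;
    while (*k) h = (uint8_t)*k++ + (h << 6) + (h << 16) - h;
    return h;
}

static uint32_t bernstein_hash(const char *k)
{
    uint32_t h = 5381;
    while (*k) h = h * 33 + (uint8_t)*k++;
    return h;
}

static void bloom_set(uint32_t h)
{
    uint32_t i = h % BLOOM_BITS;
    bloom[i / 8] |= (uint8_t)(1u << (i % 8));
}

static int bloom_get(uint32_t h)
{
    uint32_t i = h % BLOOM_BITS;
    return (bloom[i / 8] >> (i % 8)) & 1;
}

void bloom_add(const char *key)
{
    bloom_set(sax_hash(key));
    bloom_set(sdbm_hash(key));
    bloom_set(bernstein_hash(key));
}

/* May report a false positive, but never a false negative. */
int bloom_may_contain(const char *key)
{
    return bloom_get(sax_hash(key)) &&
           bloom_get(sdbm_hash(key)) &&
           bloom_get(bernstein_hash(key));
}
```

One membership test costs three hashes and three bit reads, which is the small, fixed amount of work the ASIP instructions later collapse into a handful of cycles.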


Overview of ASIP architecture

[Diagram: five-stage pipeline (Fetch, Decode, Execute, Memory Access, Writeback) with ALUs, MUXes, a general-purpose register file (GPR), state registers, and load/store paths to SRAM, GPIO, and SDRAM]

*ISA: Instruction Set Architecture


• Embedded RISC CPUs (e.g., MIPS, Cadence, Synopsys)

• Tunable architecture

• ISA* extension with tailored-compiler support

• Customize HW resources
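The profiled disassembly later in the deck (entry, retw.n, beqz.n) suggests an Xtensa-style core, where designer-defined instructions surface in C as intrinsics. A minimal sketch of how application code can stay portable between the ASIP build and a plain-C x86 build — the header path, intrinsic name, and step semantics here are assumptions, not the authors' actual extension definitions:

```c
#include <stdint.h>

#ifdef __XTENSA__
/* On the ASIP, the toolchain maps the intrinsic to the custom instruction.
 * (Hypothetical extension header and intrinsic name.) */
#include <xtensa/tie/coarse_dpi.h>
#else
/* Plain-C fallback with the same semantics, for x86 development. */
static inline uint32_t sax_hash_step(uint32_t h, uint8_t c)
{
    /* One step of the shift-add-xor (sax) string hash. */
    return h ^ ((h << 5) + (h >> 2) + c);
}
#endif

/* Hash a key one byte at a time; on the ASIP each step is one instruction. */
uint32_t sax_hash(const uint8_t *key, unsigned len)
{
    uint32_t h = 0;
    while (len--)
        h = sax_hash_step(h, *key++);
    return h;
}
```

The same source then compiles for both targets, which is one way the "harmonization among x86 environments" goal can be met at the source-code level.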


ASIP for packet stream processing

[Diagram: the same ASIP pipeline extended with ingress and egress stream FIFOs to configure a fast U-plane]


Transparent offloading

C-plane configuration

[Diagram: an x86 host (RAM) and the ASIP (core, I-RAM, D-RAM, DMAC) connected over PCIe; the host issues DMA instructions and the DMAC performs the transfers]

Concept: Control ASIP functions from x86 environment
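The host-side control path can be modeled in software. The sketch below is illustrative only: the register layout, field names, and start-bit convention are assumptions, not NTT's actual DMAC interface (in a real driver the struct would be mapped over a PCIe BAR rather than living in ordinary memory):

```c
#include <stdint.h>

/* Software model of a PCIe-attached DMA controller's register file. */
struct dmac_regs {
    uint64_t src;    /* x86 RAM source address          */
    uint64_t dst;    /* ASIP I-RAM/D-RAM destination    */
    uint32_t len;    /* transfer length in bytes        */
    uint32_t ctrl;   /* bit 0 = start                   */
};

#define DMAC_CTRL_START 0x1u

/* Issue one DMA transfer: program the descriptor fields first,
 * then set the start bit last so the device sees a complete request. */
void dmac_issue(volatile struct dmac_regs *d,
                uint64_t src, uint64_t dst, uint32_t len)
{
    d->src  = src;
    d->dst  = dst;
    d->len  = len;
    d->ctrl = DMAC_CTRL_START;
}
```

Writing the start bit last is the usual convention for memory-mapped DMA engines; it prevents the device from latching a half-programmed descriptor.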


Transparent offloading

Search ASIP section

[Diagram: the x86 memory map is scanned for the ASIP section, which the DMAC transfers over PCIe toward the ASIP's I-RAM]


Transparent offloading

Forward functions

[Diagram: the located functions are DMA-transferred from x86 RAM into the ASIP's I-RAM over PCIe]


Invoke coarse DPI function


... // Test data
#define QUEUE_CHECK 50000
...

int main() {
    ... // scan 50000 packets
    loop = 0;
    do {
        bloom_scan();
        loop++;
    } while (loop < QUEUE_CHECK);
    bloom_destroy();
    return EXIT_SUCCESS;
}

main()


Invoke coarse DPI function


void bloom_scan() {
    // Invoke instructions as intrinsics
    if (queue_vacancy_check()) {
        pop_queue();
        sax_hash_match();
        sdbm_hash_match();
        bernstein_hash_match();
        forward_data();
    }
}

bloom_scan()

22 instructions and 14 registers were added for coarse DPI


Disassembly of bloom filter matching


# of cycles   Profiled disassembly
3             entry a1, 32
1             queue_vacancy_check a2
2             beqz.n a2, 60000465 <bloom_scan+0x19>
1             pop_queue
2             sdbm_hash_match
1             bernstein_hash_match
1             sax_hash_match
1             forward_data
1             retw.n
              <bloom_scan+0x19>:
0             retw.n

13 cycles per bloom_scan()


Experimental results

Evaluation item                               w/o acceleration   w/ our instructions
Run-time (mean # of cycles)*
  hash(sax)                                   116                1
  hash(sdbm)                                  115                2
  hash(bernstein)                             98                 1
  bloom_scan                                  678                13   (down 98%)
Hardware size (logic gate count), core+SRAM   75 KGates          79 KGates
Power dissipation, core+SRAM                  < 100 mW           < 100 mW   (extremely low power)
Performance (64-byte packets)
  pps (packets/s)                             1 Mpps             57 Mpps   (50x faster)
  bps (bits/s)                                723 Mbps           38 Gbps

*50000 packets, 64-bit fixed field, 45-nm sim library.
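The table's figures are mutually consistent: 13 cycles per bloom_scan() yields ~57 Mpps if the ASIP clock is around 740 MHz (the clock is an inference; the slides do not state it), and 57 Mpps reaches ~38 Gbps once the 20 bytes of Ethernet preamble and inter-frame gap are counted on the wire alongside each 64-byte frame. A quick arithmetic check:

```c
/* Packets per second implied by the cycle count, under an assumed clock. */
double scan_pps(double clock_hz, double cycles_per_scan)
{
    return clock_hz / cycles_per_scan;
}

/* Wire-level bit rate, including 20 bytes of preamble + inter-frame gap. */
double wire_bps(double pps, double frame_bytes)
{
    return pps * (frame_bytes + 20.0) * 8.0;
}
```

With clock_hz = 741e6 and cycles_per_scan = 13, scan_pps gives ~57 Mpps, and wire_bps at 64-byte frames gives ~38 Gbps — matching the last two table rows.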


Conclusion

Designing an optimal instruction set that harmonizes with x86 environments will reduce the costs required for acceleration.


More challenging issues

Can an open ISA transform the ecosystem of accelerators?

My assumption: Common, open ISA-based APIs will reduce programming costs further.

Accelerators have "the Force":
• Dark side (proprietary architecture): Intel's AVX ISA extension; other black-box SDKs of COTS (ASSPs, NPUs)
• Light side (white-box architecture): Berkeley RISC-V open ISA; emerging open-source SDKs (e.g., Centec's Lantern)

Thank you! Questions?
