April 2020 Elena Agostini and Joseph Boccuzzi
BUILDING O-RAN BASED HIGH PERFORMANCE 5G RAN SYSTEMS WITH NVIDIA GPU AND MELLANOX NIC
2
In this webinar, we walk through the NVIDIA Aerial solution and its O-RAN implementation. NVIDIA Aerial is a set of SDKs that enables GPU-accelerated, software-defined 5G wireless RANs. Today, NVIDIA Aerial provides two critical SDKs: cuVNF and cuBB. These SDKs can be combined to implement a software-accelerated physical layer on the O-DU that communicates, through a Fronthaul I/O interface, with a set of radio heads to send, receive, and process 5G packets using GPUs. We'll show our implementation of the Fronthaul I/O interface that enables an O-RAN dialog with a radio unit, and give an overview of the most challenging issues we faced in differentiating between hardware- and software-accelerated features.
Elena Agostini, Software Engineer, NVIDIA
Joseph Boccuzzi, Principal 5G Architect, NVIDIA
3
AERIAL: 5G vRAN BASEBAND PROCESSING
4
THREE REVOLUTIONS HAPPENING
5G: 5G will deliver 1000X better bandwidth and 10X lower latency than 4G
AI: by 2025, AI at the edge has a potential total economic impact of up to $11T/year
IoT: IoT devices are projected to grow to >150B by 2025 and >1T by 2035
A Flexible and Scalable Network is needed:
SW Defined, Open & Standards based solutions, VNF/CNF based deployments
5
OPEN 5G vRAN DEPLOYMENTS
5G + AI + Edge Compute
[Figure: a GPU based, SW defined 5G platform spanning the access network and the core network. RUs connect over fronthaul to vDUs, vDUs connect over midhaul to vCUs, and backhaul links the Edge Cloud, Regional Cloud and Core Cloud (data center), which host vUPF, vAMF, MEC and AI workloads. Marked functions are virtualized/containerized.]
Edge Compute Benefits:
Lower BW
Reduced Latency
Improved Reliability
Increased Privacy
Extends Application Space
6
5G vRAN AND EDGE COMPUTING ARCHITECTURE
Open & Standards based solution enables Edge Computing
[Figure: DU, CU, UPF and MEC placement across Edge Cloud, Regional Cloud and Core Cloud. RUs connect over fronthaul to DUs; DU and CU are either co-located or linked over F1; CUs reach UPFs over N3, each UPF serves a local MEC over N6, and the AMF and SMF in the Core Cloud connect over N2 and N4. A Near-RT RIC connects to the RAN nodes over E2.]
DU = Distributed Unit
CU = Centralized Unit
UPF = User Plane Function
AMF = Access & Mobility Mgmt. Function
SMF = Session Mgmt. Function
MEC = Multi-Access Edge Compute
RU = Radio Unit
RIC = RAN Intelligent Controller
CUPS = Control and User Plane Separation
Impact of CUPS
7
FLEXIBLE AND SCALABLE 5G NETWORK
SW Defined 5G solution enables Network Slicing Service
[Figure: the same edge/regional/core cloud architecture running on a GPU based solution, partitioned into network slices: mMTC (Smart City), eMBB (HD Video/Gaming) and uR-LLC (Remote Control).]
mMTC = Massive Machine Type Comm.
eMBB = Enhanced Mobile Broadband
uR-LLC = Ultra Reliable – Low Latency Comm.
8
NVIDIA AERIAL SDK
Highest Performance 5G Software-Defined Radio
[Figure: Aerial (cuVNF + cuBB) running on a CPU, an NVIDIA GPU and a Mellanox NIC, using DPDK and GPU Direct RDMA.]
Rich CUDA programmable environment
High performance & energy efficient
Scalable to mmWave & massive MIMO
100% SW Defined
One architecture from edge to cloud
Commercially off the shelf (COTS)
Open & Standards based
cuBB = CUDA Baseband
cuVNF = CUDA Virtualized Network Function
CUDA = Compute Unified Device Architecture
9
NVIDIA'S NEXT GENERATION COTS SOLUTION
[Figure: COTS server in which the CPU (with its DDR) hosts L2+ functionality (CU, DU) and the GPU (with its DDR) hosts L1 functionality, connected over PCIe; backhaul leads toward the CN and fronthaul toward the RU.]
5G demands a Flexible & Scalable SW Defined Network
The GPU handles computationally intensive workloads very well.
Inline functionality eliminates the need to move data back and forth.
GPU SW acceleration scales with 5G deployment complexity.
Enables re-use across a wide variety of applications.
Supports "speed-of-light" development & innovation (Specification Releases, ML-based algorithms).
10
NVIDIA AERIAL STACK
cuBB: FH I/O lib (Data placement, O-RAN formatting), cuPHY
cuVNF platform features: GPU DPDK, GPU Direct RDMA, Header/Data Split, O-RAN flow identification, Accurate Packet Scheduling
Toolkit & Drivers: CUDA Toolkit, Mellanox OFED, nv_peer_mem
HW: CPU, GPU, Mellanox NIC
AERIAL includes two SDKs:
cuVNF delivers low-latency GPU Direct packet I/O to GPU memory.
cuBB offers GPU-accelerated 5G baseband processing, highly tuned for NVIDIA GPUs.
11
AERIAL BASED 5G O-RAN DEPLOYMENT
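[Figure: end-to-end path UE <--> RU <--> CU/DU (5G gNB) <--> CN <--> App over N2/N3 and N6, with RU and DU connected by O-RAN fronthaul. The DU runs L2+ and the upper PHY (PHY-U, provided by Aerial); the RU runs the lower PHY (PHY-L) and RF.]
PDSCH (DU, PHY-U): TrBLK CRC --> CB Seg + CRC --> LDPC Encode --> Rate Match --> Scram --> Mod --> Layer Map --> PreCode --> RE Map --> IQ Comp --> FH
PUSCH (DU, PHY-U): FH --> IQ De-Comp --> RE De-Map --> Chan Est --> EQ --> De-Mod --> De-Scram --> De-Rate Match --> LDPC Decode --> CB Con + CRC --> TrBLK CRC
RU downlink (PHY-L + RF): FH --> IQ De-Comp --> PreCode + BF --> iFFT --> CP+ --> DAC --> ABF
RU uplink (PHY-L + RF): ABF --> ADC --> CP- --> FFT --> DBF --> IQ Comp --> FH
O-RAN FH Split Option 7.2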
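The IQ Comp and IQ De-Comp blocks compress IQ samples before they cross the fronthaul. The slide does not say which method Aerial uses; block floating point (BFP), one of the compression methods in the O-RAN fronthaul specification, is assumed here. The minimal C sketch below compresses one PRB (12 subcarriers, i.e. 24 interleaved int16 I/Q values) into a shared exponent plus signed mantissas of a configurable width. The function names are hypothetical, and packing the mantissas into the wire format is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PRB_SUBCARRIERS 12
#define PRB_VALUES (2 * PRB_SUBCARRIERS) /* interleaved I, Q */

/* Hypothetical helpers, not an Aerial API. Smallest shared exponent e such
 * that every value >> e fits a signed mantissa of mant_bits bits. */
static int bfp_exponent(const int16_t in[PRB_VALUES], int mant_bits)
{
    const int32_t max_m = (1 << (mant_bits - 1)) - 1;
    const int32_t min_m = -(1 << (mant_bits - 1));
    for (int e = 0; e < 15; e++) {
        int fits = 1;
        for (int i = 0; i < PRB_VALUES && fits; i++) {
            /* >> on a negative int is arithmetic on gcc/clang (impl.-defined) */
            int32_t m = (int32_t)in[i] >> e;
            fits = (m <= max_m && m >= min_m);
        }
        if (fits)
            return e;
    }
    return 15; /* int16 >> 15 is always -1 or 0 */
}

/* Compress one PRB: returns the shared exponent and writes the mantissas. */
int bfp_compress_prb(const int16_t in[PRB_VALUES], int mant_bits,
                     int16_t mant[PRB_VALUES])
{
    assert(mant_bits >= 2 && mant_bits <= 16);
    int e = bfp_exponent(in, mant_bits);
    for (int i = 0; i < PRB_VALUES; i++)
        mant[i] = (int16_t)((int32_t)in[i] >> e);
    return e;
}

/* Decompress: multiply by 2^e (left-shifting a negative value is UB in C). */
void bfp_decompress_prb(const int16_t mant[PRB_VALUES], int e,
                        int16_t out[PRB_VALUES])
{
    for (int i = 0; i < PRB_VALUES; i++)
        out[i] = (int16_t)(mant[i] * (1 << e));
}
```

With 9-bit mantissas, each 16-bit value travels as 9 bits plus one exponent per PRB; the reconstruction error of each value stays below 2^e, so loud PRBs lose more absolute precision than quiet ones.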
12
AERIAL INTEGRATION: END-TO-END NSA SYSTEM
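E2E Integration:
[Figure: end-to-end NSA system. App <--> EPC (SGi) <--> S1 <--> 5G gNB <--> O-RAN fronthaul <--> UE-EM <--> UE <--> App. Inside the gNB, the CU runs PDCP and the DU runs the L2+ stack (RLC, MAC), connected over FAPI to cuPHY-U; C/U-plane, sync and management traffic flows between DU and RU. A PTP GrandMaster behind an IP switch provides PTP timing, with PTP4L and PHC2SYS on the DU side; backhaul links the gNB to the core network.]
UE-EM = RU + UE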
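The setup relies on PTP timing: PTP4L disciplines the NIC hardware clock to the PTP GrandMaster, and PHC2SYS then aligns the system clock with the NIC clock. The estimate at the heart of the IEEE 1588 delay request-response exchange is arithmetic on four timestamps: the master sends Sync at t1, the slave receives it at t2, the slave sends Delay_Req at t3, and the master receives it at t4. Assuming a symmetric path, offset = ((t2 - t1) - (t4 - t3)) / 2 and mean path delay = ((t2 - t1) + (t4 - t3)) / 2. The C sketch below assumes nanosecond integer timestamps; the struct and function names are hypothetical, and this is not linuxptp code, whose servo also filters these estimates over time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of one delay request-response exchange (ns). */
struct ptp_exchange {
    int64_t t1; /* Sync sent by master (master clock) */
    int64_t t2; /* Sync received by slave (slave clock) */
    int64_t t3; /* Delay_Req sent by slave (slave clock) */
    int64_t t4; /* Delay_Req received by master (master clock) */
};

/* Slave clock offset from the master, assuming a symmetric path delay. */
int64_t ptp_offset_ns(const struct ptp_exchange *x)
{
    assert(x->t4 >= x->t1); /* master timestamps are ordered */
    return ((x->t2 - x->t1) - (x->t4 - x->t3)) / 2;
}

/* Mean one-way path delay between master and slave. */
int64_t ptp_mean_delay_ns(const struct ptp_exchange *x)
{
    assert(x->t4 >= x->t1);
    return ((x->t2 - x->t1) + (x->t4 - x->t3)) / 2;
}
```

For example, with a slave 500 ns ahead of the master and a 1000 ns one-way delay, the timestamps t1 = 0, t2 = 1500, t3 = 2000, t4 = 2500 give an offset of 500 ns and a delay of 1000 ns.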
13
NVIDIA AERIAL cuPHY
Aerial SDK (Alpha Release)
Location within 5G gNB
[Figure: cuPHY runs on the GPU inside the 5G gNB, between L2+ on the CPU and the O-RAN fronthaul handled by the NIC.]
DL (Tx):
PDSCH: TrBLK CRC, CB Seg + CRC, LDPC Encode, Rate Match, Scram, Mod, Layer Map, PreCode, RE Map
PDCCH (Polar Encode, Modulation, DMRS Gen)
SS Block: P/S-SS, PBCH (P/S-SS, Polar Encode, Scrambling, DMRS Gen, Modulation)
UL (Rx):
PUSCH: RE De-Map, Chan Est, EQ, De-Mod, De-Scram, De-Rate Match, LDPC Decode, CB Con + CRC, TrBLK CRC
PUCCH (UL ChanEst, CC removal, Matched Filter, Detector)
14
AERIAL SDK: ALPHA RELEASE
PDSCH
o Layers supported = 8 SU-MIMO, 16 MU-MIMO
PDCCH
o DCI 0_0 & 1_1, DMRS generation, time-frequency mapping
SS-Block (PBCH + PSS + SSS)
o PSS/SSS generation, DMRS generation, time-frequency mapping
PUSCH
o Layers supported = 4 SU-MIMO, 8 MU-MIMO
PUCCH
o Format 1, Multiplexing
HARQ support
Key Feature listing
Carrier BW = 20 MHz & 100 MHz
TDD/FDD
SCS supported = 15 kHz & 30 kHz
Multi-user Support
O-RAN Front Haul (split 7.2) support
All DL/UL modulations supported
5G FEC Processing
o LDPC encoding/decoding
o Rate Matching, De-Rate Matching
o Scrambling, Descrambling
o CB/TB CRC
o Polar Encoding
https://developer.nvidia.com/aerial-sdk
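For the CB/TB CRC item: in 5G NR (3GPP TS 38.212), a transport block longer than 3824 bits gets a 24-bit CRC with generator gCRC24A, polynomial 0x864CFB, while each code block of a segmented transport block gets CRC24B. A minimal bitwise C sketch of CRC24A follows (MSB first, zero initial value, no final XOR). It illustrates the math only; it is not cuPHY's GPU implementation, which the slides do not describe, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* D^24 + D^23 + D^18 + D^17 + D^14 + D^11 + D^10 + D^7 + D^6 + D^5 + D^4
 * + D^3 + D + 1, with the D^24 term implicit. */
#define CRC24A_POLY 0x864CFBu

/* Illustrative CRC24A (not cuPHY code): MSB first, init 0, no final XOR. */
uint32_t crc24a(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    uint32_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 16; /* align byte with register top */
        for (int b = 0; b < 8; b++) {
            if (crc & 0x800000u)
                crc = ((crc << 1) ^ CRC24A_POLY) & 0xFFFFFFu;
            else
                crc = (crc << 1) & 0xFFFFFFu;
        }
    }
    return crc;
}

/* Append the 24 parity bits after len bytes; buf must hold len + 3 bytes. */
void crc24a_attach(uint8_t *buf, size_t len)
{
    uint32_t crc = crc24a(buf, len);
    buf[len] = (uint8_t)(crc >> 16);
    buf[len + 1] = (uint8_t)(crc >> 8);
    buf[len + 2] = (uint8_t)crc;
}
```

Recomputing the CRC over the data plus its appended parity yields 0, which is how a receiver checks a decoded block; a single flipped bit makes the result non-zero.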
15
AERIAL SDK CUSTOMER CAPABILITIES
Uplink and Downlink test cases are provided.
Users can collect performance benchmarks such as block latency and uplink BLER.
Example CUDA implementations of PHY signal processing blocks are provided.
What can you do with the Aerial SDK?
https://developer.nvidia.com/aerial-sdk
16
Poll Question #1
Which area do you expect the application of AI/ML to impact most significantly?
Physical Layer
Layer 2+ (ex. MAC, RLC, PDCP, SDAP)
Network Management
17
cuVNF: TECHNOLOGY, LIBRARY, FEATURES
18
cuVNF
• The NVIDIA cuVNF SDK provides a set of network libraries and features that send and receive packets directly to and from GPU memory using GPUDirect-capable network interface cards (NICs), such as Mellanox NICs.
• The SDK package is based on DPDK 19.11 with:
• NVIDIA API extensions to send/receive packets using GPU memory (GPU DPDK)
• GDRCopy: required to let the CPU access any GPU memory area
• Testpmd app with NVIDIA extensions to benchmark traffic forwarding with GPU memory
• l2fwd app with NVIDIA extensions as an example of:
• How to use the NVIDIA API to send/receive packets to and from GPU memory
• Different techniques to interact with packets in GPU memory
Overview
19
NVIDIA AERIAL STACK
cuBB & cuVNF
20
GPUDIRECT RDMA
In a nutshell
• 3rd party PCIe devices can directly read/write GPU memory
• e.g. network card
• GPU and external device must be under the same PCIe root complex
• No unnecessary system memory copies and CPU overhead
• e.g. cudaMalloc(gpu_buffer) + MPI_Send(gpu_buffer): a CUDA-aware MPI sends straight from GPU memory
• https://docs.nvidia.com/cuda/gpudirect-rdma/index.html
21
GPUDIRECT RDMA
System topology
[Figure: two supported GPUDirect RDMA system topologies, rated Best and Good.]
Mellanox module: https://github.com/Mellanox/nv_peer_memory
22
DPDK
Data Plane Development Kit:
A set of data plane libraries and network interface controller drivers for fast packet processing
From user space, an application can communicate directly with the NIC, bypassing the OS kernel (and its latencies)
• Mempool: a contiguous system memory area that holds a pool of mbufs (packet buffers)
Overview
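The mempool idea can be sketched as follows: all packet buffers are allocated once, up front, and the fast path only pops and pushes pointers, so no allocation happens per packet. This minimal C sketch uses a plain LIFO free stack; the types and functions are hypothetical and are not the rte_mempool/rte_mbuf API, which adds per-core caches and a lock-free ring. Buffers here live in host memory; in GPU DPDK the slides say mbuf payloads can live in GPU memory instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packet buffer (stand-in for an mbuf). */
struct pkt_buf {
    uint16_t len;
    uint8_t data[2048];
};

/* Hypothetical pool: one contiguous allocation plus a stack of free slots. */
struct pkt_pool {
    struct pkt_buf *bufs;   /* contiguous array of n buffers */
    struct pkt_buf **stack; /* pointers to currently free buffers */
    size_t n, nfree;
};

int pool_create(struct pkt_pool *p, size_t n)
{
    p->bufs = calloc(n, sizeof *p->bufs);
    p->stack = malloc(n * sizeof *p->stack);
    if (!p->bufs || !p->stack) {
        free(p->bufs);
        free(p->stack);
        return -1;
    }
    p->n = p->nfree = n;
    for (size_t i = 0; i < n; i++)
        p->stack[i] = &p->bufs[i];
    return 0;
}

/* O(1) get: pop a free buffer, or NULL when the pool is exhausted. */
struct pkt_buf *pool_get(struct pkt_pool *p)
{
    return p->nfree ? p->stack[--p->nfree] : NULL;
}

/* O(1) put: push the buffer back; nothing is freed. */
void pool_put(struct pkt_pool *p, struct pkt_buf *b)
{
    assert(b >= p->bufs && b < p->bufs + p->n && p->nfree < p->n);
    b->len = 0;
    p->stack[p->nfree++] = b;
}

void pool_destroy(struct pkt_pool *p)
{
    free(p->bufs);
    free(p->stack);
}
```

The LIFO order hands back the most recently released buffer first, which tends to keep hot buffers in cache.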
23
GPU DPDK
GPUDirect RDMA: NVIDIA GPU + Mellanox NIC
+ NVIDIA API to allocate mbuf data buffers in GPU memory
+ DPDK 19.11
= GPU DPDK
Works with both GPUDirect RDMA HW topologies
Header/Data split feature:
• A single network packet is split into two mbufs from different mempools (the first A bytes go to the first mempool, the remaining B bytes to the second)
• Useful to receive the packet header in CPU memory and the packet payload in GPU memory
Recipe: DPDK & NVIDIA & Mellanox
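The header/data split can be illustrated as a scatter of one frame into two buffers. In GPU DPDK the NIC performs this in hardware and the second buffer sits in GPU memory; in this minimal C sketch both buffers are host arrays and the copy is done in software. The struct and function are hypothetical; hdr_len plays the role of A from the slide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-segment packet: seg[0] from the "header" pool (CPU memory
 * in GPU DPDK), seg[1] from the "payload" pool (GPU memory in GPU DPDK,
 * plain host memory here). */
struct split_pkt {
    uint8_t *seg[2];
    size_t seg_len[2];
};

/* Scatter one frame: the first hdr_len bytes (A) go to hdr_buf, the rest (B)
 * to payload_buf. Returns 0 on success, -1 if a segment does not fit. */
int split_rx(const uint8_t *frame, size_t frame_len, size_t hdr_len,
             uint8_t *hdr_buf, size_t hdr_cap,
             uint8_t *payload_buf, size_t payload_cap,
             struct split_pkt *out)
{
    assert(frame != NULL && out != NULL);
    size_t a = frame_len < hdr_len ? frame_len : hdr_len;
    size_t b = frame_len - a;
    if (a > hdr_cap || b > payload_cap)
        return -1;
    memcpy(hdr_buf, frame, a);
    memcpy(payload_buf, frame + a, b);
    out->seg[0] = hdr_buf;
    out->seg_len[0] = a;
    out->seg[1] = payload_buf;
    out->seg_len[1] = b;
    return 0;
}
```

With this layout, CPU code parses seg[0] (e.g. to pick the flow) while GPU code reads only seg[1], so the payload never needs to be copied between the two memories.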
24
GPU DPDK
Ordinary CUDA kernel: launch a CUDA kernel after receiving packets
• Pros:
• GPU resources used only when needed
• No GPU memory consistency problem
• Possible overlap between GPU processing and network activity
• Cons:
• High response latency
• CUDA kernel launch latency for each new set of packets
Dealing with GPU memory: ordinary CUDA kernel
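The control flow of the ordinary-kernel approach can be sketched on the host side: the CPU polls the NIC, and a kernel is launched only when a burst actually arrived. This C sketch runs without DPDK or CUDA by using hypothetical stand-ins: rx_burst() plays the role of rte_eth_rx_burst() and launch_kernel() the role of an asynchronous CUDA kernel launch. It counts launches, each of which would pay the launch latency listed as a con.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scripted NIC: bursts[i] packets become available at poll i. */
struct fake_nic {
    const size_t *bursts;
    size_t nbursts, next;
};

/* Stand-in for rte_eth_rx_burst(): number of packets received this poll. */
static size_t rx_burst(struct fake_nic *nic)
{
    return nic->next < nic->nbursts ? nic->bursts[nic->next++] : 0;
}

struct gpu_stats {
    size_t launches; /* each one would pay the kernel launch latency */
    size_t packets;  /* packets handed to the GPU */
};

/* Stand-in for an asynchronous CUDA kernel launch over n packets. */
static void launch_kernel(struct gpu_stats *g, size_t n)
{
    g->launches++;
    g->packets += n;
}

/* Ordinary-kernel pattern: launch only for non-empty bursts, so the GPU
 * stays idle when no traffic arrives. */
struct gpu_stats rx_loop_ordinary(struct fake_nic *nic, size_t polls)
{
    struct gpu_stats g = {0, 0};
    assert(nic != NULL);
    for (size_t i = 0; i < polls; i++) {
        size_t n = rx_burst(nic);
        if (n > 0)
            launch_kernel(&g, n);
    }
    return g;
}
```

Since launches are asynchronous, the CPU can keep polling while the previous kernel runs, which is the overlap between GPU processing and network activity listed as a pro.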
25
GPU DPDK
Persistent CUDA kernel: pre-launch a CUDA kernel that polls a memory area, waiting for new packets
• Pros:
• Low response latency
• Avoids the CUDA kernel launch latency for each Rx burst of packets
• Possible overlap between GPU processing and network activity
• Cons:
• GPU resources held by the persistent kernel during polling
• CPU-GPU communication mechanism via polling/flags update
• GPU memory consistency problem
Dealing with GPU memory: persistent kernel
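The persistent-kernel handshake and its memory consistency problem can be illustrated with a CPU-only analogy: a long-running consumer polls per-slot flags, and the producer must make the packet data visible before it sets the flag. In this minimal C sketch, two threads and C11 release/acquire atomics stand in for the NIC/CPU side and the persistent kernel; the ring layout and function names are hypothetical. On a real GPU the same ordering has to be enforced with CUDA memory fences and volatile or atomic accesses, which this sketch does not show.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NSLOTS 4    /* ring of burst slots shared by producer and consumer */
#define SLOT_PKTS 8 /* packets per burst (ints stand in for packets) */

struct ring {
    int pkts[NSLOTS][SLOT_PKTS];
    atomic_int ready[NSLOTS]; /* 1 = slot holds a published burst */
    atomic_int quit;
    long sum; /* consumer's result, read after join */
};

/* Stand-in for the persistent kernel: started once, it spins on the ready
 * flags instead of being relaunched for every burst. */
static void *persistent_consumer(void *arg)
{
    struct ring *r = arg;
    int slot = 0;
    for (;;) {
        /* acquire: seeing ready == 1 guarantees the packet data is visible */
        if (atomic_load_explicit(&r->ready[slot], memory_order_acquire)) {
            for (int i = 0; i < SLOT_PKTS; i++)
                r->sum += r->pkts[slot][i];
            /* release: our reads finish before the producer reuses the slot */
            atomic_store_explicit(&r->ready[slot], 0, memory_order_release);
            slot = (slot + 1) % NSLOTS;
        } else if (atomic_load_explicit(&r->quit, memory_order_acquire)) {
            /* quit is set after the last publish, so re-check once */
            if (!atomic_load_explicit(&r->ready[slot], memory_order_acquire))
                break;
        }
    }
    return NULL;
}

/* Producer side (the NIC/CPU role): fill a slot, then publish it.
 * Returns the consumer's checksum, or -1 if the thread cannot start. */
long run_persistent(int bursts)
{
    struct ring r;
    pthread_t t;
    assert(bursts >= 0);
    r.sum = 0;
    for (int s = 0; s < NSLOTS; s++)
        atomic_init(&r.ready[s], 0);
    atomic_init(&r.quit, 0);
    if (pthread_create(&t, NULL, persistent_consumer, &r) != 0)
        return -1;
    for (int b = 0; b < bursts; b++) {
        int slot = b % NSLOTS;
        while (atomic_load_explicit(&r.ready[slot], memory_order_acquire))
            ; /* slot still owned by the consumer */
        for (int i = 0; i < SLOT_PKTS; i++)
            r.pkts[slot][i] = b; /* "receive" a burst */
        /* release: packet writes are ordered before the flag update */
        atomic_store_explicit(&r.ready[slot], 1, memory_order_release);
    }
    atomic_store_explicit(&r.quit, 1, memory_order_release);
    pthread_join(t, NULL);
    return r.sum;
}
```

The consumer spins even when no traffic arrives, which mirrors the con that GPU resources stay held by the persistent kernel during polling.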
26
GPU DPDK – L2FWD
Overview
L2fwd-nv:
Basic l2fwd example powered with NVIDIA extensions
Showcase of interaction with GPU packets (ordinary vs persistent CUDA kernel)
Trivial workload: swap MAC addresses of each Rx packet
Testpmd:
default DPDK application for network benchmarks
Used as packet generator
Tx throughput: 100 Gbps
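The l2fwd-nv workload described above (swap the MAC addresses of each Rx packet) is small enough to show in full. This plain C sketch assumes the standard Ethernet II layout (destination MAC in bytes 0-5, source MAC in bytes 6-11); in l2fwd-nv the same per-packet operation runs in a CUDA kernel on packets in GPU memory, which is not shown here, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Swap destination (bytes 0-5) and source (bytes 6-11) MAC in place. */
void swap_mac(uint8_t *frame, size_t len)
{
    uint8_t tmp[MAC_LEN];
    assert(len >= 2 * MAC_LEN);
    memcpy(tmp, frame, MAC_LEN);
    memcpy(frame, frame + MAC_LEN, MAC_LEN);
    memcpy(frame + MAC_LEN, tmp, MAC_LEN);
}

/* Apply the workload to a burst; on the GPU, one thread per packet would
 * do the same work in parallel. */
void swap_mac_burst(uint8_t **frames, const size_t *lens, size_t n)
{
    for (size_t i = 0; i < n; i++)
        swap_mac(frames[i], lens[i]);
}
```

The per-packet work is a few bytes of copying, which is why L2FWD mostly measures I/O and launch overheads rather than compute.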
27
GPU DPDK – L2FWD
• GPU memory vs CPU memory
• GPU processing vs CPU processing
• The persistent kernel shows 10% better performance, but:
• L2FWD has trivial compute
• The persistent kernel is significantly more complex to use
• Regular kernels are flexible and can give similar performance
• Launch latencies get overlapped with larger workloads
• System HW:
• Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz (Skylake)
• NVIDIA V100 GPU
• Mellanox ConnectX-5, 100 Gbps
• PCIe bridge: Broadcom PLX Technology 9797
Performance comparison
28
AERIAL SDK 5G vRAN
29
E2E INTEGRATION
Overview
O-RAN: a standard that defines the communication protocol between the Distributed Unit and the Radio Unit
• Distributed Unit (O-DU): a logical node hosting RLC/MAC/High-PHY layers based on a lower layer functional split
• Radio Unit (O-RU): a logical node hosting the Low-PHY layer and RF processing based on a lower layer functional split
• Central Unit (O-CU): a logical node hosting PDCP, RRC and other control functions. At present, NVIDIA uses an external third-party stack vendor for this component
Two types of O-DU <--> O-RU interactions:
• Uplink: O-RU --> O-DU
• Downlink: O-DU --> O-RU
Communication planes:
• C-plane: configures how the data packets of the next time slot are processed
• U-plane: data packets
• M-plane: network setup and management
• S-plane: network synchronization
[Figure: the end-to-end NSA setup annotated with L1, L2 and L3: EPC (S1-U, SGi), CU (PDCP), DU (MAC, RLC, FAPI, cuPHY-U), and RU (PHY-L) within the UE-EM, with PTP synchronization from a GrandMaster through an IP switch (PTP4L, PHC2SYS).]
30
O-RAN FH COMPLIANT 5G L1 INTERFACE
Components:
cuVNF to send/receive packets from/into GPU memory
O-RAN flow identification: GPU DPDK + Mellanox firmware steer O-RAN packets into different Rx queues based on header values
cuBB for L1 PHY processing on the GPU (cuPHY)
O-RAN: C-plane + U-plane
Result: O-DU (or 5G gNB) able to communicate with O-RAN compliant O-RU(s)
Aerial SDK components
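The slides do not say which header fields drive flow identification. A natural candidate is the eCPRI header that O-RAN fronthaul packets carry: EtherType 0xAEFE, eCPRI message type 0 (IQ data, used for the U-plane) or 2 (real-time control, used for the C-plane), and the 16-bit PC_ID/RTC_ID field that carries the eAxC ID. The C sketch below classifies a frame into a queue in software under those assumptions, for an untagged Ethernet frame (real deployments often add a VLAN tag). The queue layout and function name are hypothetical; in Aerial this steering is done by the NIC, not by CPU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14        /* untagged Ethernet II header (assumption) */
#define ETHERTYPE_ECPRI 0xAEFE
#define ECPRI_MSG_IQ_DATA 0   /* U-plane */
#define ECPRI_MSG_RT_CTRL 2   /* C-plane */

/* Hypothetical queue layout: [0, n_u) for U-plane, [n_u, n_u + n_c) for
 * C-plane. Returns the queue index, or -1 for non-O-RAN traffic. */
int classify_oran(const uint8_t *frame, size_t len, int n_u, int n_c)
{
    assert(n_u > 0 && n_c > 0);
    if (len < ETH_HDR_LEN + 8) /* 4-byte eCPRI header + PC_ID/RTC_ID + SEQ_ID */
        return -1;
    uint16_t ethertype = (uint16_t)(frame[12] << 8 | frame[13]);
    if (ethertype != ETHERTYPE_ECPRI)
        return -1;
    const uint8_t *ecpri = frame + ETH_HDR_LEN;
    uint8_t msg_type = ecpri[1];
    /* PC_ID (IQ data) or RTC_ID (real-time control): the eAxC ID */
    uint16_t eaxc = (uint16_t)(ecpri[4] << 8 | ecpri[5]);
    if (msg_type == ECPRI_MSG_IQ_DATA)
        return eaxc % n_u;
    if (msg_type == ECPRI_MSG_RT_CTRL)
        return n_u + eaxc % n_c;
    return -1;
}
```

Keeping C-plane and U-plane in separate queues lets each be consumed by a different component, and spreading U-plane flows by eAxC ID keeps every antenna stream in a single queue.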
31
O-RAN FH COMPLIANT 5G L1 INTERFACE
Uplink procedure:
O-DU sends configuration to O-RU (C-plane packets)
O-RU replies with the uplink user data (U-plane packets)
cuBB data placement: orders the received U-plane PRBs into a single buffer for cuPHY
O-DU L1 processing: PUSCH on the GPU
O-DU forwards PUSCH output to upper layers
Uplink
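The data placement step can be sketched as copying each U-plane section to a fixed offset computed from its PRB position, so the order in which packets arrive stops mattering. O-RAN U-plane sections identify their PRBs with a start PRB and a PRB count (startPrbu and numPrbu in the O-RAN specification). This C sketch assumes uncompressed 16-bit I/Q, a hypothetical [antenna][symbol][subcarrier] buffer layout, and 273 PRBs (a 100 MHz carrier at 30 kHz SCS); the types and function are hypothetical, not cuBB's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SC_PER_PRB 12
#define N_PRB 273 /* 100 MHz at 30 kHz SCS */
#define N_SYM 14
#define N_ANT 4

/* One complex sample, uncompressed 16-bit I/Q (assumption). */
struct iq16 { int16_t i, q; };

/* Hypothetical cuPHY input buffer for one slot. */
struct ul_slot_buf {
    struct iq16 re[N_ANT][N_SYM][N_PRB * SC_PER_PRB];
};

/* Hypothetical parsed U-plane section. */
struct uplane_section {
    int ant, sym, start_prb, num_prb;
    const struct iq16 *iq; /* num_prb * 12 samples */
};

/* Copy one section to its position; independent of arrival order. */
int place_section(struct ul_slot_buf *buf, const struct uplane_section *s)
{
    assert(buf != NULL && s != NULL);
    if (s->ant < 0 || s->ant >= N_ANT || s->sym < 0 || s->sym >= N_SYM ||
        s->start_prb < 0 || s->num_prb <= 0 ||
        s->start_prb + s->num_prb > N_PRB)
        return -1;
    memcpy(&buf->re[s->ant][s->sym][s->start_prb * SC_PER_PRB], s->iq,
           (size_t)s->num_prb * SC_PER_PRB * sizeof *s->iq);
    return 0;
}
```

The buffer is about 0.7 MB per slot for these dimensions, so it is better allocated statically or on the heap than on the stack.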
32
O-RAN FH COMPLIANT 5G L1 INTERFACE
Downlink procedure:
O-DU sends configuration to O-RU (C-plane packets)
O-DU sets configuration parameters to cuBB
O-DU L1 processing on the GPU: PDSCH, PDCCH, PBCH
O-DU sends data (U-plane)
O-RU receives the data
Downlink
33
NVIDIA’s AERIAL Solution
World’s First Fully SW Defined BBU
Industry can innovate at a faster pace.
Highest Performing & Scalable Cloud-Native Architecture
Significant Capacity and Power Efficiency gains.
Fastest PHY Processing
Efficient COTS based Platform for Edge Cloud
Improves utilization.
AERIAL delivers the industry's highest-performance software-defined 5G vRAN.
34
Poll Question #2
Are you familiar with GPUDirect RDMA?
Yes
No
Thank You