32
Extending/Optimizing the USRP/RFNOC Framework for Implementing Latency-Sensitive Radio Systems Joshua Monson*, Zhongren Cao^, Pei Liu~, Travis Haroldsen*, Matthew French* 11/14/2018 *USC Information Sciences Institute Arlington, VA ^C3-COMM Systems Vienna, VA ~New York University 4676 Admiralty Way Marina del Rey, CA 3811 N Fairfax Drive Arlington, VA

Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Extending/Optimizing the USRP/RFNOC Framework for Implementing Latency-Sensitive

Radio Systems

Joshua Monson*, Zhongren Cao^, Pei Liu~, Travis Haroldsen*, Matthew French*

11/14/2018*USC Information Sciences Institute Arlington, VA

^C3-COMM Systems Vienna, VA~New York University

4676 Admiralty WayMarina del Rey, CA

3811 N Fairfax DriveArlington, VA

Page 2: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

USC Information Sciences Institute• Reconfigurable Computing Group

@ USC/ISI – Over 20 years performing cutting edge

FPGA research– > 100 Journal and Conference publications– Focused on FPGA and ASIC, System-Level

Design, Productivity, TRUST and Security– Custom ASIC/FPGA CAD Tool (TORC)

• ISI: A Large, vibrant, path-breaking research Institute– Part of USC’s Viterbi School of Engineering

located in “Marina Tech Campus” (Marina del Rey) and in Arlington, VA

– >$80M per year in funding from a diversified base of sponsors

– ~300 people mostly research staff– Facilities to conduct ITAR, classified, and

unclassified research

Page 3: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Overview

• Discuss Extensions/Optimizations to the UHD/RFNOC that allowed us to meet stringent latency requirements of the transceiver

• Experiences, lessons learned, and development efforts to implement a broadband CSMA/CA based OFDM transceiver on an Ettus E310 USRP Software Defined Radio (SDR) platform.

• Supports link layer latency-cooperative transmissions

Page 4: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

COMBAT Project

•Over 95% of casualties in Operations Enduring Freedom, Iraqi Freedom, and New Dawn occurred after operations transitioned from linear, conventional fights to the nonlinear, nonconventional stabilization phase

–A majority of missions during the nonconventional stabilization phase are carried out by dismount squads at the tactical edge

–Sharing situational awareness among soldiers is vital to mission successes

–Multicast plays an increasingly important role in edge networks

•Goal: Improve the throughput of wireless multicast and broadcast in dismounted squad networks in order to significantly enhance the situational awareness at the tactical edge

1. C. Thielenhaus, P. Traeger, E. Roles, “Reaching forward in the war against the Islamic State,” PRISM, Nation Defense University 12/2016

2. E. Roles, Presentation on RAA in DARPA Industry Day, Jan. 2017Sources:

DISTRIBUTION A. Approved for public release: distribution unlimited.

Page 5: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

COMBAT Innovations

• COMBAT system works to support and improve IP multicast schemes such as the SMF in RFC 6621 – IP multicast to provide connections among clusters – Local mobility within clusters is handled by COMBAT in the link layer

Higher Data Rate

Transparent to IP Layer

Responsive to Channel

Reduced Complexity

Efficient Channel Utilization

7

Page 6: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

System Level Simulation

• Simulation Setup– Topology: Mobile distributed on a disc R=100 meter– Radio Propagation: Path loss + AWGN– Traffic: Multicast only from a random node.– Random Waypoint Model: Velocity (0.1-4 m/s), Pause duration (0-60 s).

DISTRIBUTION A

Page 7: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

COMBAT Objectives

• Demonstrate Improvements in Relay Throughput on Prototype OFDM Transceiver over Worst Link Scenario

• Requirements:– 10 MHz Bandwidth, 40 MSPS– Utilize CSMA/CA

Page 8: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

COMBAT Latency Requirements• Contention-free relay with PHY-assisted

ACK/NACK• CSMA/CA deadlines

– Enable Compatibility with Existing Systems

• Throughput– RX-to-TX Latency is critical for throughput

Page 9: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Software-Defined Architecture

• USRPs are latency-insensitive peripherals• Latency is low, but SDRs not intended to meet

stringent latency CSMA/CA deadlines

Waveform in the Air

Antenna(s)

ADC

DAC

N

N

H(n)Decimation

H(n)Interpolation

AnalogFilter

Banks/Mixers

FPGA Discrete RF DevicesHost

USB/Ethernet

Page 10: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Ethernet/USB

Connection

Enabling Advances in SDR Architecture Block Diagram of N210

FPGA RF-FrontHost

Block Diagram of E310

RF-FrontFPGAHost

Embedded SDR Architecture• Single Package• Host/FPGA Latency Smaller• Larger FPGA

Standard SDR Architecture

ADC

DAC

N

N

H(n)Decimation

H(n)Interpolation

AnalogFilter Banks

ADC

DAC

N

N

H(n)Decimation

H(n)Interpolation

AnalogFilter Banks

ARM CORTEXA9

Page 11: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Radio Frequency Network on Chip (RFNOC)

RF-F

ront

end

ADC/DAC IF

RFNOC

Custom Accelerator0

CustomAccelerator1

CustomAccelerator2

HostFPGAUSB/Ethernet

Discrete RF Devices

• Dynamically Programmable Network-on-Chip• Provides a design entry-point into the FPGA

Page 12: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

E310 RFNOC Base Design (FPGA Only)

ADC/DAC IF

LOOPBACKNOC IFNOC IF

NOC IFNOC IF

NOC IF

NOC IF

NOC IF

NOC IF

RFNOC

ADC/DAC IF

ZYNQ FIFO16 Channel Host-to-RFNOC

DMA

FOSPHORFIR FILTER

FFT

SIGNALGEN

RFNOC Initial Configuration on E310

Page 13: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

E310-RFNOC Base Design (FPGA Only)

ADC/DAC IF

LOOPBACKNOC IFNOC IF

NOC IFNOC IF

NOC IF

NOC IF

NOC IF

NOC IF

RFNOC

ADC/DAC IF

ZYNQ FIFO16 Channel Host-to-RFNOC

DMA

FOSPHORFIR FILTER

FFT

SIGNALGEN

X

X

XX

XXXX

XX

NOC IF are ~30% Resource Utilization

Page 14: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

E310 RFNOC Base Design (FPGA Only)

ADC/DAC IF

LOOPBACKNOC IFNOC IF

NOC IFNOC IF

NOC IF

NOC IF

NOC IF

NOC IF

RFNOC

ADC/DAC IF

ZYNQ FIFO16 Channel Host-to-RFNOC

DMA

FOSPHORFIR FILTER

FFT

SIGNALGEN

X

X

XX

XX

8 Channel

*Required Extension/Update UHD

Page 15: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

E310 RFNOC Base Design (FPGA Only)

ADC/DAC IFNOC IF NOC IFZYNQ FIFO

16 Channel Host-to-RFNOC DMA8 Channel

LUT FF BRAM DSP

Total ZYNQ 7020 Device

53,200 106,400 140 220

RFNOC(Initial)

41,247(77%)

55,783(52.4%)

116(82.8%)

146(66.3%)

RFNOC baseline (RFNOC/Radio)

12,546(23.5%)

15,840(14.8%)

26(18.6%)

0(0%)

COMBAT optimized

45,319(85.1%)

52,540(47%)

104.5 (74.6%)

120 (52%)

Freed 53% of the FPGA Resources!

Page 16: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

RFNOC Latency

ADC/DAC IF

NOC IFNOC IF

NOC IF

ZYNQ FIFO16 Channel Host-to-

RFNOC DMA8 Channel

RF-F

ront

endRX Block

2.5 usFixed

Both Directions

~200 nsFixed

Both Directions

RFNOC CLOCK = 50 MHz (400 MB/Sec) CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)

~.625 us +RFNOC Packet

Buffering Delayeach way

B

B

• Packet Buffering into RFNOC is Primary Cause of Delay.

• Dependent on Programmable Packet Size.

• Static Delay through RFNOC .625 us

Page 17: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

RFNOC Latency

ADC/DAC IF

NOC IFNOC IF

NOC IF

ZYNQ FIFO16 Channel Host-to-

RFNOC DMA8 Channel

RF-F

ront

endRFNOC Block

RFNOC CLOCK = 50 MHz (400 MB/Sec) CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)0 50 100 150 200 250 300

Samples Per Packet

0

10

20

30

40

50

60

70

80

MSP

S

10 6 RFNOC Throughput

Required Thruput

16 Samples

Minimum RX –to-TX DelayRFNOC 3.0 usADC/DAC IF ~250 nsRF-Frontend 5.0 usTotal 8.25 us

B

B

Page 18: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Low-Latency RFNOC Extension

ADC/DAC IF

NOC IFNOC IF

NOC IF

ZYNQ FIFO16 Channel Host-to-

RFNOC DMA8 Channel

RF-F

ront

end

RFNOC Block

RFNOC CLOCK = 50 MHz (400 MB/Sec) CE CLOCK = 40 MHz (40 MSPS, 160 MB/Sec)

Minimum RX –to-TX DelayRFNOC 3.0 usADC/DAC IF ~250 nsRF-Frontend 5.0 usTotal 5.25 us

LL

LL

Low-Latency RFNOC Block Benefits:• Minimum Latency Limited by RF

configuration and a couple cycles• Selectable – Select LL for TX, RX, or

TX/RX• Maintains Compatibility with

RFNOC/UHD; however needs additional work to support GNURadio

Page 19: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Other Mods/Extensions to E310/UHD

• Updates to REPO– Added Block Diagram Build Flow– Construct Vivado GUI Project for Block Diagrams– Smoother Integration Vivado IP

• Exposed AD9361 Control– Exposed SPI Interface to Write AD9361 Control

• Debugging– Built Custom Cable and Implemented Virtual JTAG– Integrated Virtual JTAG Server/Driver

Page 20: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

OFDM Transceiver Architecture1. Upper MAC

a) Wraps Payloads in Packets

b) Manages COMBATProtocols

c) Moves Packets toand from Packet Buffers

2. Lower MACa) Configures and Manages

RX and TXb) Handles latency sensitive

protocol (e.g. relay forwarding, etc.)

c) Inform Upper MAC of received packets.

3. Packet Buffersa) TX: Hold packets that

have been staged from Transmission

b) RX: Hold packets that have been received and decoded

4. OFDM RX/TXa) TX: Transform Bits to

OFDM baseband samples.

b) RX: Transforms OFDM baseband Samples to bits

TX BitsTX SamplesRX BitsRX SamplesControl BUS

ARM (Hard IP)

Upper MAC

AD93

61

ADC/DAC IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 25

• 8 Rate Settings (3-27 Mb/s)• 40 MSPS, 10 MHz BW

Performance Statistics

Transmit Directly from RX Buffer to Minimize Latency

ZYNQ 7020 FPGA

Page 21: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC AD

9361

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 26

ZYNQ 7020 FPGA

Page 22: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC

AD93

61

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 27

ZYNQ 7020 FPGA

2.5.2

Page 23: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC

AD93

61

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 28

ZYNQ 7020 FPGA

2.5.217

Page 24: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC

AD93

61

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 29

ZYNQ 7020 FPGA

2.5.217

4.1

Page 25: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC

AD93

61

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 30

ZYNQ 7020 FPGA

2.5.217

4.1

.1

Page 26: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Latency Analysis

ARM (Hard IP)

Upper MAC

AD93

61

AD9361 IF

OFDM RX RX

RFNOC

OFDM TX TX

Packet Buffers

MircoBlaze

Lower MACMailbox

DISTRIBUTION A. Approved for public release: distribution unlimited. 31

ZYNQ 7020 FPGA

2.5.217

4.1

.1

.2

29.3 us

Page 27: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

RFNOC Latency Measurements

58 us

End of Packet Start of RelayRF RFNOC NOCRX MAC

2.5 2.510.4 7.617.0 20.0TX.1

NOC Delays Primarily Related to RFNOC Packet Size & Buffering

Opportunities to further optimize MAC Code can be optimized

RX-to-TX Latency exceeds Requirement (34 us)!Best Case Relay of 10.98 Mb/S

Page 28: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

3.88 3.9 3.92 3.94 3.96 3.98 4 4.02 4.04 4.06

10 4

-3

-2.5

-2

-1.5

-1

-0.5

0

0.5

1

1.5

2

10 4

Low-Latency IF Measurements

~26.6 us

End of Packet Start of Relay

RF RFRX MAC2.7 2.717.0 4.1

TX

.1

• Reduced RX-to-TX Latency by 50%!

• Optimizations:• RFNOC

• Adjusted FC Buffers• Adjusted FC ACKs

• MAC Code• Added –O3 (50%)• Adjusted Algorithms• Overlapped More relay

processing with RX• Added B. Shifter, Mult

- Meets Latency Requirement+4.7% Achievable Relay Throughput!- 2.5% Packet loss due to TX Underrun

- Likely due to Flow Control ACK

Page 29: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Low-Latency IF Measurements

0 1 2 3 4 5 6 7

10 4

-4

-3

-2

-1

0

1

2

3

410 4

Relay Latency: 22.6 us

OFDM Relay 18 Mb/s Rate

Relay1st Packet

Page 30: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

RFNOC Review

Benefits Drawbacks

Simple FPGA Design Entry Point into Ettus Stack

Not Intended for ultra-low latency processing

Excellent for High-level Design Tools like GNURadio/RFNOC

No Abstraction for Low Latency Signals communicating between blocks (e.g. state-inputs and outputs)

Excellent for Modular Design

Flexible

Stock IP to Connect To RFNOC

• We will be releasing our Low-Latency RFNOC Block and NOC Radio Extension as open-source shortly on https://github.com/ISI-RCG (likely location)

• Recommended using these cores for ultra-low latency processing

Page 31: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Conclusion and Summary

• Implemented Latency-Sensitive CSMA/CA Radio System on USRP E310/E312

• Made Several Extensions and Improvements to Permit Latency Sensitivity– Created Low-Latency Radio Block and Low Latency RFNOC Block

Shell– UHD Updates that Permitted RX->TX without Host– Enabled Vivado Block Diagram Build Flow– Implemented Lower MAC In Microblaze Processor– Enabled TX to Read from RX Packet Buffer– Transfer Packets to/from FPGA Rather than Samples

• Future Work:– Push Abstraction Levels Higher (Mapping to GNURadio)

Page 32: Extending/Optimizing the USRP/RFNOC Framework for ... · FPGA research – > 100 Journal and Conference publications – Focused on FPGA and ASIC, System-Level Design, Productivity,

Questions

41