Embedded Electronics for Telecom DSP


Aldebaro Klautau

Embedded Systems Lab (LASSE) @ Federal Univ. of Pará (UFPA)

V International Workshop on Trends in Optical Technologies (WTON)

CPqD – Campinas – Brazil - May 19, 2016


Goal and Agenda

Goal: discuss options for prototyping new physical layers (PHY) of DSP-based telecommunication systems

From the perspective of a digital signal processing R&D group that (furiously) targets the highest possible bit rates

No ASICs, but discrete components & development boards

Agenda

Motivation: demand for increased bit rates

Options for prototyping: emphasis on DSP processor and FPGA

Examples of prototypes that get the most out of the available hardware

May 19, 2016 Aldebaro Klautau 2

Bit-rate hungry applications

Optical transmission with flexible transceivers

Software-defined radios and 5G

Architecture: small cells and centralized RAN

PHY: spectrum aggregation, massive MIMO, mmWaves

Example of 4G traffic: 4 signals with BW = 20 MHz require ~3.7 Gbps

In newer versions of LTE, the number of antennas can be 16 or 32, giving bit rates of ~15 Gbps or ~30 Gbps
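These figures are consistent with a simple I/Q sample-rate estimate. As a hedged sketch, assuming each 20 MHz LTE signal is carried as 30.72 MSa/s with 15-bit I and 15-bit Q samples, and ignoring control words and line coding (all assumptions, not stated on the slide), the bit rate scales linearly with the number of antennas:

```python
# Hypothetical estimate: assumes 30.72 MSa/s per 20 MHz LTE signal and
# 15-bit I + 15-bit Q samples; control words and line coding are ignored.
SAMPLE_RATE_SA_S = 30.72e6    # assumed sampling rate for a 20 MHz LTE signal
BITS_PER_IQ_SAMPLE = 2 * 15   # assumed 15 bits for I plus 15 bits for Q

def iq_bit_rate_bps(num_antennas: int) -> float:
    """Raw I/Q bit rate for a given number of antenna signals."""
    return num_antennas * SAMPLE_RATE_SA_S * BITS_PER_IQ_SAMPLE

for n in (4, 16, 32):
    print(f"{n:2d} antennas: {iq_bit_rate_bps(n) / 1e9:.2f} Gbps")
```

Under these assumptions, 4, 16 and 32 antennas give about 3.69, 14.75 and 29.49 Gbps, matching the ~3.7, 15 and 30 Gbps quoted above.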


Electronic components and associated development boards for prototyping


[Figure: prototyping and implementation options: GPU, DSP, ASSP, FPGA and ASIC (standard cells or full-custom IC)]

GPU: graphics processing unit; ASSP: application-specific standard product

Complete DMT transceiver development

FFT-based discrete multitone (DMT) with bit loading of up to 10 bits per tone (1024-QAM)


[Figure: bit loading (bits per tone)]
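A common way to decide how many bits each tone carries is the SNR-gap approximation, b = floor(log2(1 + SNR/Γ)). The sketch below assumes a gap Γ = 9.8 dB (a value often quoted for uncoded QAM at a symbol error rate near 1e-7), caps each tone at the 10 bits mentioned above, and uses hypothetical per-tone SNRs; it is an illustration, not necessarily the bit-loading algorithm used in this transceiver.

```python
import math

MAX_BITS = 10     # cap from the slide: 10 bits per tone (1024-QAM = 2**10 points)
GAP_DB = 9.8      # assumed SNR gap for uncoded QAM; design-dependent

def bits_per_tone(snr_db: list[float], gap_db: float = GAP_DB) -> list[int]:
    """Gap-approximation bit loading: b = floor(log2(1 + SNR / gap)), capped."""
    gap = 10 ** (gap_db / 10)
    loading = []
    for s_db in snr_db:
        snr = 10 ** (s_db / 10)
        b = math.floor(math.log2(1 + snr / gap))
        loading.append(max(0, min(MAX_BITS, b)))
    return loading

# Hypothetical per-tone SNRs in dB, e.g. obtained from channel estimation.
print(bits_per_tone([5.0, 15.0, 25.0, 35.0, 45.0, 55.0]))
```

With these inputs the loading is [0, 2, 5, 8, 10, 10]: tones with very high SNR saturate at the 1024-QAM limit.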

For the DMT task, a DSP processor (SoC) was chosen as the platform


4 cores

FFT coprocessors

Network coprocessor

Viterbi coprocessors

C language programming

Our main motivation: programming in C

Besides, free open-source routines are available, e.g., for forward error correction (FEC)

But good performance required heavy optimization

Comparison of Reed-Solomon (RS) implementations, per codeword
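To illustrate the kind of FEC routine being compared, here is a minimal, unoptimized Reed-Solomon systematic encoder over GF(2^8). The field polynomial 0x11D, the generator roots α^0 to α^(nsym−1), and the RS(255, 239) parameters are assumptions chosen for illustration, not necessarily those used in the transceiver. A valid codeword has all-zero syndromes, which the sketch checks.

```python
# Minimal Reed-Solomon encoder sketch over GF(2^8).
# Assumptions: primitive polynomial 0x11D, generator roots alpha^0..alpha^(nsym-1).
PRIM = 0x11D

# Exp/log tables for GF(256) with alpha = 2 (EXP doubled to avoid a modulo).
EXP = [0] * 512
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def poly_mul(p: list[int], q: list[int]) -> list[int]:
    """Multiply polynomials (coefficients listed highest degree first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

def generator_poly(nsym: int) -> list[int]:
    """g(x) = (x - alpha^0)(x - alpha^1)...(x - alpha^(nsym-1)); '-' is XOR here."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g

def rs_encode(msg: list[int], nsym: int) -> list[int]:
    """Systematic encoding: append the remainder of msg(x) * x^nsym divided by g(x)."""
    gen = generator_poly(nsym)
    rem = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = rem[i]
        if coef != 0:
            for j in range(1, len(gen)):
                rem[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + rem[len(msg):]

def syndromes(codeword: list[int], nsym: int) -> list[int]:
    """Evaluate the codeword polynomial at each generator root (Horner's rule)."""
    out = []
    for i in range(nsym):
        s = 0
        for c in codeword:
            s = gf_mul(s, EXP[i]) ^ c
        out.append(s)
    return out

msg = list(range(1, 240))        # 239 hypothetical data bytes
cw = rs_encode(msg, nsym=16)     # RS(255, 239) codeword
print(len(cw), all(s == 0 for s in syndromes(cw, 16)))
```

This straightforward version does one table lookup per multiplication; optimized implementations for a given DSP typically restructure exactly these inner loops.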


Many routines to split among cores

Issues related to concurrency and parallelism


Architectural split of functionalities among DSP cores


Significant effort to optimize code for the platform


Level 1 - Compiler optimizations

Level 2 - Code organization/refactoring

Level 3 - Architecture optimization

From “programmable logic” to the “platform FPGA”


[Lyke, 2015]


[Figure: evolution from programmable logic to the platform FPGA]

FPGA boards support several interfaces and peripherals

Several FMC (FPGA mezzanine card) boards

PC interface: PCIe to FPGA (up to 30 Gbps), commonly present in FPGA evaluation boards


[Figure: example FMC boards: high-speed ADC/DAC cards, 8x SFP expansion card, general purpose]

Prototyping with FPGAs

HDL (VHDL, Verilog, etc.) is more difficult than C, and most engineers are exposed to “programmable” logic (digital electronics) but not to digital signal processing on FPGAs or to parallel programming

Go for DSP “general-purpose” chips?

Note that multicore alternatives also require good skills in concurrent and parallel programming, and often a deep knowledge of the chip architecture

Changing the DSP chip manufacturer requires studying the new architecture, while FPGAs are more “generic”

FPGAs are a more natural step towards silicon (ASIC) than DSP chips


ADC trends

Photonic ADCs

Undersampling: signals sampled below their Nyquist rates

Compressive sampling, e.g., Bayesian approach


[Khilo, 2012]

Limits on ENOB (effective number of bits) due to jitter

[Figure legend: ADCs up to 2007; darker blue: ADCs later than 2007]
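The jitter limit follows from a standard relation for a full-scale sinusoidal input at frequency f_in sampled with RMS aperture jitter σ_j: SNR = −20·log10(2π·f_in·σ_j), and ENOB = (SNR − 1.76)/6.02. A minimal sketch; the 10 GHz input and 100 fs jitter are hypothetical values chosen only for illustration:

```python
import math

def jitter_limited_snr_db(f_in_hz: float, jitter_rms_s: float) -> float:
    """SNR limit (dB) for a full-scale sine sampled with RMS clock jitter."""
    return -20 * math.log10(2 * math.pi * f_in_hz * jitter_rms_s)

def enob_from_snr_db(snr_db: float) -> float:
    """Effective number of bits corresponding to a given SNR in dB."""
    return (snr_db - 1.76) / 6.02

snr = jitter_limited_snr_db(10e9, 100e-15)   # hypothetical: 10 GHz input, 100 fs jitter
print(f"SNR = {snr:.1f} dB, ENOB = {enob_from_snr_db(snr):.2f} bits")
```

This gives about 44.0 dB, i.e., roughly 7 effective bits. Each doubling of f_in (or of σ_j) costs about 6 dB, i.e., one effective bit, which is why jitter dominates at high input frequencies.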

Some DAC performance numbers

Summary: DACs and AWGs (arbitrary waveform generators), as well as ADCs and DSOs (digital storage oscilloscopes), operate at ~100 GSa/s

Hence, the computing platform (DSP, FPGA, ASSP, etc.) may be the bottleneck!

Device / reference    Bits   BW (GHz)   Fs (GSa/s)   ENOB
Micram DAC-4          6      42         100          -
Micram DAC-3          6      23.8       72           4.5
Micram DACII          6      20         34           4
[Nagatani, 2011]      6      -          60           -
[Huang, 2014]         8      10         100          5.3

“Design gap” does not help those aiming at bit-rate records

“Gap”: FPGAs have enough capacity to accommodate most ASIC designs

But achieving symbol rates of tens of Gbauds is hard for a real-time transmitter implementation and often impossible for a receiver


[Trimberger, 2015]


Architectures for PHY testbeds and demonstrations

Offline processing: both transmitter (Tx) and receiver (Rx) processing are performed offline

Often FPGA-based

Transmitter: samples are pre-computed, stored in, e.g., FPGA memory, and sent to the channel via a fast DAC

Receiver: a fast digital storage oscilloscope (DSO) digitizes the received signal

Real-time receiver processing: often based on ASICs or ASSPs

Real-time transmitter processing: may use an FPGA with internal PRBS (pseudorandom binary sequence) generation to avoid the “slow” interface to the PC
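As an illustration of internal PRBS generation, a PRBS7 sequence (polynomial x^7 + x^6 + 1, period 127) can be produced by a 7-bit linear-feedback shift register. The choice of PRBS7 is an assumption; longer patterns such as PRBS31 are also common, and an FPGA would typically unroll the same recurrence to emit many bits per clock cycle.

```python
from itertools import islice

def prbs7(seed: int = 0x7F):
    """Yield PRBS7 bits (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:                      # checked when the first bit is requested
        raise ValueError("LFSR seed must be nonzero")
    while True:
        # XOR of register stages 7 and 6 (bits 6 and 5) is fed back into bit 0.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit

bits = list(islice(prbs7(), 2 * 127))
# A maximal-length sequence repeats every 2**7 - 1 = 127 bits, with 64 ones per period.
print(bits[:127] == bits[127:], sum(bits[:127]))
```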


State-of-the-art offline processing example

1.125 Tb/s 15-carrier super-channel

Two DACs at 32 GSa/s (oversampling of 4 samples/symbol)

DSO with 62.5 GSa/s using two interleaved 33 GSa/s ADCs


[Maher, 2016]

State-of-the-art Tx + Rx real-time processing example

[Eiselt, 2016] “First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm”

ASIC chips

Extra info: 8 × 25.78125 GBd signals, PAM-4, 100 km, λ = 1550 nm
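A quick check of the aggregate line rate: PAM-4 carries log2(4) = 2 bits per symbol, so eight lanes at 25.78125 GBd give 412.5 Gb/s. The ratio to the 400G payload is 33/32 (= 66/64), which would be consistent with 64b/66b line coding; that interpretation is an assumption, not something stated on the slide.

```python
import math
from fractions import Fraction

LANES = 8
BAUD_RATE = Fraction("25.78125e9")      # symbols per second per lane (exact)
BITS_PER_SYMBOL = int(math.log2(4))     # PAM-4 carries 2 bits per symbol

line_rate = LANES * BAUD_RATE * BITS_PER_SYMBOL
overhead = line_rate / Fraction(400 * 10**9)
print(f"{float(line_rate) / 1e9} Gb/s, line rate / 400G = {overhead}")
```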


Real-time transmitter processing example

Implementation by Ilan Sousa (UFPA), joint work with CPqD. IMOC 2015 Second Best Student Paper Award

Example of reaching the limit of the available hardware via DSP

Real-time fractional oversampling of high order modulation signals with Nyquist pulse shaping

Issues: fractional sampling rate conversion (interpolate by L and decimate by M)

FPGA clock is slow and parallelism is required

Need to minimize the number of multipliers


DAC with Fs = 25 GSa/s and FPGA with 156.25 MHz clock

Parallelism level: 160 (= 25 GSa/s / 156.25 MHz)

Hardware limitation required parallelism
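Since the DAC consumes 25e9 / 156.25e6 = 160 samples per FPGA clock cycle, the filter must produce 160 outputs per cycle. A common way to do this is a block-parallel FIR that, in each cycle, combines the P new inputs with the last N−1 inputs kept in registers. The sketch below is a behavioral model with a hypothetical small P and a hypothetical 3-tap filter; it only checks that the block-parallel form matches ordinary sample-by-sample filtering and does not model the multiplier savings of the actual design.

```python
def fir_serial(x: list[float], h: list[float]) -> list[float]:
    """Reference FIR: y[n] = sum_k h[k] x[n-k], computed one output at a time."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def fir_block_parallel(x: list[float], h: list[float], p: int) -> list[float]:
    """Consume p inputs and produce p outputs per 'clock cycle'.

    The last len(h) - 1 inputs are carried between cycles, as hardware
    would keep them in registers.
    """
    n_taps = len(h)
    state = [0.0] * (n_taps - 1)          # previous inputs, oldest first
    y = []
    for start in range(0, len(x), p):
        window = state + x[start:start + p]
        # In hardware, each of these p outputs has its own bank of multipliers.
        for i in range(n_taps - 1, len(window)):
            y.append(sum(h[k] * window[i - k] for k in range(n_taps)))
        state = window[len(window) - (n_taps - 1):]
    return y

P = 25e9 / 156.25e6                       # DAC rate / FPGA clock = 160 samples per cycle
h = [0.25, 0.5, 0.25]                     # hypothetical 3-tap filter
x = [float(v) for v in range(1, 11)]
print(P, fir_block_parallel(x, h, p=4) == fir_serial(x, h))
```

The cost of this direct form is P × N multipliers per cycle, which is why minimizing multipliers becomes the central design constraint.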


Real-time Nyquist pulse shaping

Input symbols at a given rate Rsym (e.g., 12.5 GBd) must be converted to samples at Fs (e.g., 25 GSa/s) to feed the DAC

Often the oversampling factor L = Fs/Rsym is an integer. Then “shaping” is equivalent to interpolation: upsampling followed by an FIR filter h[n] (the Nyquist pulse) with N coefficients
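With an integer L, pulse shaping is zero insertion followed by an FIR filter. The sketch assumes a raised-cosine pulse with roll-off 0.2 spanning 8 symbols (both hypothetical; the slides do not fix them) and L = 2 as in the 12.5 GBd / 25 GSa/s example. It checks the Nyquist property: after the filter delay, every L-th output sample equals the transmitted symbol, i.e., there is no intersymbol interference.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x) / (pi x)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def raised_cosine(L: int, beta: float, span: int) -> list[float]:
    """Raised-cosine taps at L samples/symbol over `span` symbols (span*L + 1 taps)."""
    half = span * L // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / L                                   # time in symbol periods
        if beta > 0 and abs(abs(2 * beta * t) - 1.0) < 1e-12:
            taps.append(math.pi / 4 * sinc(1 / (2 * beta)))   # limit at |t| = 1/(2 beta)
        else:
            taps.append(sinc(t) * math.cos(math.pi * beta * t) / (1 - (2 * beta * t) ** 2))
    return taps

def pulse_shape(symbols: list[float], L: int, h: list[float]) -> list[float]:
    """Upsample by L (insert L-1 zeros) and convolve with h."""
    up = [0.0] * (len(symbols) * L)
    up[::L] = symbols
    out = [0.0] * (len(up) + len(h) - 1)
    for i, u in enumerate(up):
        if u != 0.0:                                # inserted zeros contribute nothing
            for k, hk in enumerate(h):
                out[i + k] += u * hk
    return out

L = 2                                  # e.g. Fs = 25 GSa/s and Rsym = 12.5 GBd
h = raised_cosine(L, beta=0.2, span=8)
symbols = [1.0, -1.0, 3.0, -3.0, 1.0, 1.0, -3.0, 3.0]   # hypothetical PAM-4 symbols
y = pulse_shape(symbols, L, h)
delay = (len(h) - 1) // 2              # group delay of the symmetric filter
recovered = [y[delay + m * L] for m in range(len(symbols))]
print(all(abs(r - s) < 1e-9 for r, s in zip(recovered, symbols)))
```

Skipping the inserted zeros already hints at the polyphase idea: only about N/L of the N coefficients touch nonzero inputs for each output.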


Fractional sampling rate conversion (FSRC)

Fractional oversampling factor L/M

Example 1: L = 3 and M = 2 imply L/M = 1.5 samples/symbol and Fs = 1.5 Rsym

Example 2: L = 10 and M = 9 imply L/M ≈ 1.11 samples/symbol and Fs ≈ 1.11 Rsym

This gives flexibility for Nyquist pulse shaping regarding the relation between the symbol rate Rsym and the sampling frequency Fs
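Given Fs and Rsym, the smallest integer pair (L, M) follows from reducing Fs/Rsym to lowest terms. A sketch using Python's `fractions` module; the helper name `fsrc_factors` and the rates in the second and third calls are hypothetical. The first call reproduces the 16/15 factor of the validation tests (Fs = 30 GSa/s, Rsym = 28.125 GBd).

```python
from fractions import Fraction

def fsrc_factors(fs: str, rsym: str) -> tuple[int, int]:
    """Return (L, M) with L/M = Fs/Rsym in lowest terms.

    Rates are given as decimal strings so the ratio is computed exactly.
    """
    ratio = Fraction(fs) / Fraction(rsym)
    return ratio.numerator, ratio.denominator

print(fsrc_factors("30e9", "28.125e9"))   # (16, 15)
print(fsrc_factors("15e9", "10e9"))       # (3, 2): Fs = 1.5 Rsym
print(fsrc_factors("10e9", "9e9"))        # (10, 9): Fs ≈ 1.11 Rsym
```

Passing exact decimal strings avoids the rounding that float division could introduce before the fraction is reduced.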


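[Figure: FSRC as an interpolator (upsampling by L, then LPF with gain L and ωc = π/L; signals x[m′], q[m], z[m]) followed by a decimator (LPF with gain 1 and ωc = π/M, then downsampling by M; signals z′[m], y[n])]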

Nyquist pulse shaping implementations


Resampling = interpolation + decimation

[Figure: single-filter resampler: upsampling by L, LPF with gain L and ωc = min{π/L, π/M}, downsampling by M; signals x[m′], q[m], z[m], y[n]]


Combine the filters

Polyphase efficient implementation

Minimum number of multipliers and efficient use of memory

Example: L=3, M=5, parallelism P=15, V=5 stacked FSRCs
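The polyphase idea can be sketched directly from the single-filter resampler: after zero insertion, only every L-th filter input is nonzero, and only every M-th filter output is kept, so each output needs only y[n] = Σ_j h[n·M − j·L]·x[j]. The sketch below is a behavioral model with a hypothetical Hamming-windowed sinc lowpass and L = 3, M = 5 as in the slide example; it does not model the parallel, stacked FPGA architecture. It compares against the naive upsample-filter-downsample chain and counts multiplications, whose ratio comes out close to L·M.

```python
import math

def lowpass(L: int, M: int, n_taps: int) -> list[float]:
    """Hamming-windowed sinc LPF (hypothetical design): gain L, cutoff pi/max(L, M)."""
    fc = 0.5 / max(L, M)                  # cutoff in cycles/sample at the upsampled rate
    mid = (n_taps - 1) / 2
    h = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        h.append(L * ideal * window)
    return h

def resample_naive(x, h, L, M):
    """Zero-insert by L, filter every upsampled sample, keep every M-th output."""
    up = [0.0] * (len(x) * L)
    up[::L] = x
    filtered, mults = [], 0
    for m in range(len(up)):
        acc = 0.0
        for k in range(min(len(h), m + 1)):
            acc += h[k] * up[m - k]       # most of these products involve zeros
            mults += 1
        filtered.append(acc)
    return filtered[::M], mults

def resample_polyphase(x, h, L, M):
    """Compute only the kept outputs, y[n] = sum_j h[n*M - j*L] * x[j]."""
    n_out = math.ceil(len(x) * L / M)
    y, mults = [], 0
    for n in range(n_out):
        acc = 0.0
        j = (n * M) // L                  # newest input sample that contributes
        while j >= 0 and n * M - j * L < len(h):
            acc += h[n * M - j * L] * x[j]
            mults += 1
            j -= 1
        y.append(acc)
    return y, mults

L, M = 3, 5                               # factors from the slide example
h = lowpass(L, M, n_taps=20 * L)
x = [math.sin(2 * math.pi * 0.02 * i) for i in range(200)]   # hypothetical input
y_ref, cost_ref = resample_naive(x, h, L, M)
y_pp, cost_pp = resample_polyphase(x, h, L, M)
print(max(abs(a - b) for a, b in zip(y_ref, y_pp)) < 1e-9)
print(f"multiplications: naive {cost_ref}, polyphase {cost_pp}, ratio {cost_ref / cost_pp:.1f}")
```

Both versions produce the same samples; the polyphase form simply never computes products with inserted zeros or outputs that the decimator would discard.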


Proposed Parallel FSRC

Results with parallel FSRC

Decreases the computational cost by a factor of LM (e.g., L = 16 and M = 15 give LM = 240, about 2 orders of magnitude)

FPGA resource usage for L = 5 and M = 4, with filter lengths N = 51 or 101, using V = 32 stacked FSRCs (XC5 and XC7 denote boards with Virtex-5 and Virtex-7, respectively)


[Table: FPGA resource usage (look-up tables and multipliers)]

Validation results

Constellations for back-to-back (B2B), first set of tests

Sampling rate Fs = 30 GSa/s

Oversampling = 16/15 ≈ 1.0667 samples per symbol

Symbol rate Rsym = 28.125 GBd


[Figure: constellations for the X and Y polarizations]

Channelization for FDM over fiber

An example in which smart (polyphase) filtering is not enough:


Channelization: Digital signal processing

Mux signal transformations via DSP

[Figure: mux DSP chain per channel p: resampling (interpolation by I_p), carrier mixing, and complex-to-real conversion]

Demux signal transformations via DSP

[Figure: demux DSP chain per channel p: carrier mixing, resampling (decimation by D_p), and complex/real conversion]
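The mux and demux chains can be illustrated with a small behavioral model: each complex baseband channel is upsampled, shifted to its carrier, summed, and the real part is transmitted; the demux mixes one carrier back to DC, lowpass filters, and decimates. Everything here (two channels, the carriers, R = I_p = D_p = 8, and the windowed-sinc filters) is a hypothetical choice for illustration; these filters are exactly the part that becomes expensive in real time.

```python
import cmath
import math

R = 8                      # hypothetical resampling factor (I_p = D_p = 8)
N = 129                    # filter length; N - 1 = 128 is a multiple of R
CARRIERS = (0.10, 0.25)    # hypothetical carriers, cycles/sample at the high rate

def lowpass(n_taps: int, fc: float, gain: float) -> list[float]:
    """Hamming-windowed sinc LPF with cutoff fc (cycles/sample) and DC gain `gain`."""
    mid = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))
        taps.append(gain * ideal * window)
    return taps

def fir(x: list, h: list[float]) -> list:
    """Causal FIR filtering, output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
            for n in range(len(x))]

H_INT = lowpass(N, 0.5 / R, gain=R)      # interpolation filter (gain R)
H_DEC = lowpass(N, 0.5 / R, gain=1.0)    # anti-aliasing filter before decimation

def mux(channels: list[list[complex]]) -> list[float]:
    """Upsample each channel by R, shift it to its carrier, sum, keep the real part."""
    total = [0j] * (len(channels[0]) * R)
    for ch, fc in zip(channels, CARRIERS):
        up = [0j] * (len(ch) * R)
        up[::R] = ch
        for m, v in enumerate(fir(up, H_INT)):
            total[m] += v * cmath.exp(2j * math.pi * fc * m)
    return [v.real for v in total]

def demux(signal: list[float], fc: float) -> list[complex]:
    """Mix carrier fc to DC (x2 undoes the real-part halving), filter, decimate by R."""
    mixed = [2 * v * cmath.exp(-2j * math.pi * fc * m) for m, v in enumerate(signal)]
    return fir(mixed, H_DEC)[::R]

n_sym = 120
ch1 = [cmath.exp(2j * math.pi * 0.05 * n) for n in range(n_sym)]
ch2 = [0.5 * cmath.exp(-2j * math.pi * 0.1 * n) for n in range(n_sym)]
tx = mux([ch1, ch2])
rx1 = demux(tx, CARRIERS[0])
delay = (N - 1) // R       # two filters of (N-1)/2 high-rate samples each -> 16 low-rate samples
err = max(abs(rx1[n] - ch1[n - delay]) for n in range(40, n_sym))
print(f"max error on channel 1 after start-up: {err:.4f}")
```

With well-separated carriers a 129-tap filter suffices here; when neighboring channels are closer, as in the adjacent-channel case discussed next, much longer filters are needed.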

Strong adjacent-channel interference

Classical filtering result

Filter length may not be enough

Problem: the FPGA does not support real-time operation with more than 3k multipliers

[Figure: test setup: signal generator, DEMUX, analyzer]

Demux with improved filtering

[Figure: demux DSP chain with improved filtering: carrier mixing, resampling (decimation by D_p), and complex/real conversion]

Effect of improved filtering on received signal


FIR filters with lengths 90, 150 and 200, with significant improvement in distortion, etc.

Conclusions

“Platform FPGAs” have been chosen for cutting-edge research testbeds due to their price and reconfigurability. There are wonderful EDA flows to simplify design for FPGAs (e.g., Matlab → VHDL → FPGA), but for cutting-edge implementations a skilled developer is often required, with:

the capability to write custom and efficient VHDL code

a good understanding of the corresponding IPs

training to explore parallelism

Along with microelectronics and photonics, telecom algorithms will also evolve towards parallel implementations to cope with the increase in information processing rates

Benefit of increased degrees of freedom (e.g. spatial multiplexing in wireless and optical fibers)

Virtuous cycle: We develop better algorithms when evaluating their real-time implementation on hardware


Academia needs to update DSP courses!

Thanks!

LASSE @ Espaço Inovação – Parque Ciência e Tecnologia Guamá

aldebaro@ufpa.br - www.lasse.ufpa.br


References

[Khilo, 2012] Photonic ADC: overcoming the bottleneck of electronic jitter

[Huang, 2014] An 8-bit 100-GS/s distributed DAC in 28-nm CMOS

[Wong, 2014] Quantifying the Gap Between FPGA and Custom CMOS to Aid Microarchitectural Design

[Trimberger, 2015] Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology

[Lyke, 2015] An Introduction to Reconfigurable Systems

[Shannon, 2015] Technology Scaling in FPGAs: Trends in Applications and Architectures

[Maher, 2016] Increasing the information rates of optical communications via coded modulation: a study of transceiver performance

[Nagatani, 2011] A 60-GS/s 6-Bit DAC in 0.5-µm InP HBT Technology for Optical Communications Systems


[Eiselt, 2016] First Real-Time 400G PAM-4 Demonstration for Inter-Data Center Transmission over 100 km of SSMF at 1550 nm

[Ilan, 2015] Parallel Polyphase Filtering for Pulse Shaping on High-Speed Optical Communication Systems

[Kuon, 2007] Measuring the Gap Between FPGAs and ASICs

[Jamieson, 2005] Mapping multiplexers onto hard multipliers in FPGAs
