Forward Error Correction for THz Communication: Challenges from an Implementation Point of View
PIMRC 2021
Workshop on Enabling Technologies for Terahertz Communications
Norbert Wehn
▪ EPIC: Enabling Practical Wireless Tb/s Communications with Next Generation Channel Coding
▪ EPIC (2017-2020) was a Pan-European collaborative project funded by the EC H2020 framework program (ICT-09-2017 “Networking research beyond-5G”)
▪ The EPIC consortium was composed of 8 leading industry & academic institutions
− Technikon
− Interdigital, Ericsson, Creonic, Polaran
− TU Kaiserslautern, IMT Atlantique, IMEC
▪ Along with 6 other projects, EPIC is a founding member of the European ICT Beyond 5G Cluster.
Background
THz Throughput Requirements
B5G applications such as wireless backhaul, extended reality, and BCI require Tbit/s throughput
This talk: focus on digital baseband processing
Source: G. Fettweis, MPSoC 2018
5G Terminal Baseband Computational Requirements
Source: G. Fettweis, MPSoC 2018
Figure labels: modulator, equalizer, iterative MIMO detector, channel decoder
Baseband Processing for Tb/s Applications
Advanced FEC (Turbo-, LDPC-, Polar-Codes) is one of the most complex modules in the baseband chain
▪ Computational complexity
▪ Throughput and latency
▪ Power consumption
In the past FEC implementations have greatly benefited from Moore’s Law
1995 – 2020
▪ Moore’s Law: ~3 orders of magnitude
▪ Throughput: ~5 orders of magnitude
Increasing Gaps
Example Turbo-Code decoder
▪ Decoder 1 (2004): UMTS compliant
− 180nm technology, 166MHz, 80Mbit/s, 30mm2
▪ Decoder 2 (2011): LTE compliant
− 65nm technology, 450MHz, 2.15Gbit/s, 7.7mm2
▪ Comparison
− Technology nodes in between: 180nm, 130nm, 90nm, 65nm
− Throughput 27x but frequency only 3x (code, algorithm, architecture)
− Throughput/area 100x (Moore’s law)
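The ratios in the comparison can be re-derived from the two decoder data points. A minimal Python sketch (the dictionary keys and variable names are illustrative and not from the talk):

```python
# Figures taken from the two Turbo-Code decoders listed above.
dec1 = {"freq_mhz": 166, "tput_mbps": 80, "area_mm2": 30}      # 2004, UMTS, 180nm
dec2 = {"freq_mhz": 450, "tput_mbps": 2150, "area_mm2": 7.7}   # 2011, LTE, 65nm

# Plain ratios between the 2011 and the 2004 decoder.
tput_gain = dec2["tput_mbps"] / dec1["tput_mbps"]
freq_gain = dec2["freq_mhz"] / dec1["freq_mhz"]

# Throughput per area, then the ratio of the two.
area_eff1 = dec1["tput_mbps"] / dec1["area_mm2"]
area_eff2 = dec2["tput_mbps"] / dec2["area_mm2"]
area_eff_gain = area_eff2 / area_eff1

print(f"throughput gain:      {tput_gain:.1f}x")
print(f"frequency gain:       {freq_gain:.1f}x")
print(f"throughput/area gain: {area_eff_gain:.0f}x")
```

The frequency ratio accounts for only a small part of the throughput ratio; the rest has to come from code, algorithm and architecture, as the slide notes.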
B5G - The era of increasing gaps [A. Bahai, ISSCC 2017 Plenary Talk]
▪ Mobile data traffic doubles every ~16 months
▪ Bandwidth efficiency doubles every ~30 months
▪ Transistor scaling doubles every ~48 months
▪ Battery energy density doubles every ~120 months
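To make the gaps concrete, the doubling periods can be turned into growth factors over a common horizon. This is a sketch that assumes idealized exponential growth with constant doubling periods; the 120-month horizon is chosen only for illustration:

```python
# Doubling periods in months, as quoted on the slide.
doubling_months = {
    "mobile data traffic": 16,
    "bandwidth efficiency": 30,
    "transistor scaling": 48,
    "battery energy density": 120,
}

horizon_months = 120  # illustrative horizon (assumption)

# Growth factor after `horizon_months` for a quantity doubling every m months.
growth = {name: 2 ** (horizon_months / m) for name, m in doubling_months.items()}

for name, factor in growth.items():
    print(f"{name:24s} x{factor:7.1f}")
```

Under these assumptions, traffic grows by roughly two orders of magnitude in the time battery energy density doubles once.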
Moore’s Law & Cost
Source Qualcomm
Source IBM
Source TSMC
Power & Power Density
Transistor integration density per unit area increases faster than the power per transistor decreases
▪ Power per unit area increases
▪ For a fixed power density limited by TDP, the number of transistors that can switch simultaneously decreases
→ Dark silicon
Source Wikipedia
Source ARM 2009
FEC Key Performance Indicators
Communication KPIs
▪ BER/FER: strongly depends on application scenario (10^-6 … 10^-12)
▪ Code flexibility: e.g. code length, code rate
Implementation KPIs
▪ For fixed FEC power, improving the energy efficiency is equivalent to increasing the throughput
▪ Minimizing the power density is a trade-off between area efficiency and energy efficiency
B5G FEC Decoder Requirements
▪ Baseband SoC < 100mm2: 10% reserved for FEC decoder
▪ FECpower ~ 1 Watt
▪ Throughput ~ 1 Tb/s
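The requirements above translate into target values for the implementation KPIs. The sketch below assumes the common definitions: area efficiency = throughput / area, energy efficiency = power / throughput (energy per bit), and power density = power / area. Variable names are illustrative.

```python
# Requirements quoted above; the area is an upper bound (10% of < 100 mm2).
soc_area_mm2 = 100.0
fec_share = 0.10
fec_power_w = 1.0
throughput_bps = 1e12

fec_area_mm2 = soc_area_mm2 * fec_share

# Assumed KPI definitions (see lead-in).
area_eff_gbps_mm2 = throughput_bps / fec_area_mm2 / 1e9      # Gb/s per mm2
energy_eff_pj_bit = fec_power_w / throughput_bps * 1e12      # pJ per bit
power_density_w_mm2 = fec_power_w / fec_area_mm2             # W per mm2

print(f"area efficiency:   {area_eff_gbps_mm2:.0f} Gb/s/mm2")
print(f"energy efficiency: {energy_eff_pj_bit:.1f} pJ/bit")
print(f"power density:     {power_density_w_mm2:.2f} W/mm2")

# With these definitions, power density is the product of the other two KPIs.
product = (area_eff_gbps_mm2 * 1e9) * (energy_eff_pj_bit * 1e-12)
```

Because power density is the product of area efficiency and energy per bit under these definitions, packing more throughput into less area raises power density unless the energy per bit falls at the same rate.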
Source Philip Wong
▪ FEC solutions > 100Gbit/s already exist e.g. optical communication
− Mostly restricted to algebraic hard decision decoding techniques
▪ Here we consider the most advanced soft-information-based FEC schemes
− Turbo-, LDPC-, Polar-Codes
1 Tb/s FEC Decoder Challenges
$T_{inf} \sim N \cdot R \cdot \pi \cdot \frac{1}{I} \cdot f \quad [\text{bit/s}]$
Tinf : Information throughput
N : code block size, R: code rate
I : number of iterations for iterative decoding algorithm else I=1
π : operations performed per cycle / total number of operations for one decoding iteration
f : frequency (≤ 1GHz)
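The throughput relation can be evaluated directly. The following sketch uses hypothetical parameter values chosen only to show how the factors interact; the function name is illustrative.

```python
def info_throughput(n, r, pi, iterations, f_hz):
    """Information throughput in bit/s: N * R * pi * (1/I) * f."""
    return n * r * pi * f_hz / iterations

# Hypothetical example: N = 2000, R = 0.5, all operations of one
# iteration done in a single cycle (pi = 1), f = 1 GHz.
t_single = info_throughput(n=2000, r=0.5, pi=1.0, iterations=1, f_hz=1e9)

# Same decoder, but 10 iterations executed on the same hardware.
t_iter10 = info_throughput(n=2000, r=0.5, pi=1.0, iterations=10, f_hz=1e9)

# Information bits that must complete per clock cycle for 1 Tb/s at 1 GHz.
bits_per_cycle = 1e12 / 1e9

print(f"I = 1:  {t_single / 1e9:.0f} Gb/s")
print(f"I = 10: {t_iter10 / 1e9:.0f} Gb/s")
print(f"bits per cycle for 1 Tb/s at 1 GHz: {bits_per_cycle:.0f}")
```

With the frequency capped near 1 GHz, the remaining levers are N, R, π and I, which is why the number of iterations and the degree of parallelism dominate the design space.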
▪ Throughput (1 Tb/s): ↑N ↑π ↑f ↑R ↓I
▪ Area ~ N, π: ↓N ↓π
▪ Power ~ f, N, π: ↓N ↓π ↓f
▪ Energy efficiency ~ I, 1/R: ↑R ↓I
▪ Area efficiency ~ R, f: ↑R ↑f
▪ Error correction performance ~ N, 1/R, I: ↓R ↑N ↑I
Implementation Efficiency
Communications Performance
Efficient high throughput architectures
▪ Low complexity decoding algorithms
▪ Large locality, regularity and parallelism
Information theory
▪ Complex decoding algorithms
▪ Irregularity, Iterative/sequential algorithms
fmax ~ 1GHz: 1Tb/s → 1000 information bits have to be processed in parallel
Design Spaces and Challenges
▪ Increase parallelism
▪ Interleaver
▪ Low complexity decoding algorithms
▪ Inherent Parallelism
▪ Manage data transfers/routing congestion
▪ Serial decoding of SC/SCL
▪ Managing storage
Turbo-Code Decoding [1]
▪ Turbo-Code decoder: inherently serial at the MAP level and at the decoder level
▪ PMAP: spatial parallelism
▪ FMAP: functional parallelism
FrameFlexible, AfterBurner TD
▪ 28nm FD-SOI technology
▪ Blocksize 128
▪ Max. 4 iterations
▪ 102.4 Gbit/s
▪ Area 14mm2
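The quoted figures allow two quick sanity checks. The sketch below assumes that the block size of 128 counts information bits and that the decoder completes one block per clock cycle; neither is stated on the slide, so the implied clock frequency is only an estimate under those assumptions.

```python
throughput_bps = 102.4e9   # from the slide
block_size_bits = 128      # from the slide (assumed to be information bits)
area_mm2 = 14.0            # from the slide

# Assumption: one block leaves the decoder every clock cycle.
implied_clock_hz = throughput_bps / block_size_bits

# Area efficiency, defined here as throughput per area.
area_eff_gbps_mm2 = throughput_bps / 1e9 / area_mm2

print(f"implied clock:   {implied_clock_hz / 1e6:.0f} MHz")
print(f"area efficiency: {area_eff_gbps_mm2:.1f} Gb/s/mm2")
```

A 1 Tb/s decoder within 10 mm2 would need about 100 Gb/s/mm2, so this area efficiency is more than an order of magnitude short of that target in 28nm.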
[1] „Advanced Hardware Architectures for Turbo Code Decoding Beyond 100 Gb/s“, S. Weithoffer, O. Griebel, R. Klaimi, C. A. Nour, N. Wehn. IEEE Wireless Communications and Networking Conference, April 2020, South Korea.
LDPC-Code Decoding
▪ BP (Min-Sum) LDPC decoding: inherently parallel at the variable/check node level and at the decoder level
▪ Further increase in throughput: iteration unrolling, multi-core
[1] R. Ghanaatian et al., "A 588-Gb/s LDPC Decoder Based on Finite-Alphabet Message Passing," in IEEE Transactions on VLSI Systems, Feb. 2018.
[2] M. Li et al., "High-Speed LDPC Decoders Towards 1 Tb/s," in IEEE Transactions on Circuits and Systems I: Regular Papers, May 2021.
Decoder hardware: 8 cores, 1GHz, 1.48 mm2
Plot labels: 1.5dB; N=64800; N=51456; Min-Sum, float, 200 iterations
LDPC-Code Decoding [1]
Unrolled decoder architecture versus multi-core architecture (single core: full node level parallelism)
▪ (684, 540) WiFi code, Min-Sum 2-Phase, 22nm FD-SOI, same communication performance
Unrolled 2-phase decoder architecture versus unrolled layered decoder architecture
▪ (1032, 860) EPIC code, Min-Sum 2-Phase/layered, 22nm FD-SOI, same communication performance
[1] „Forward-Error-Correction for Beyond-5G Ultra-high Throughput Communications“, N. Wehn, O. Sahin, M. Herrmann, accepted for publication, International Symposium on Topics in Coding 2021 (ISTC 2021), September, 2021, Montréal, Canada.
SC-LDPC Code Decoding
Row-Layered Pipelined Decoding (RPD)
▪ State-of-the-art decoding algorithm
▪ Multiple layers processed in parallel
▪ #processors ~ #iterations
▪ Large decoding window / high decoding latency
Full-Parallel Window Decoding (FPWD) [1]
▪ Only sketch of decoding algorithm in literature
▪ Multiple overlapping decoding windows processed in parallel
▪ Processors exchange extrinsic messages in the overlapping regions
▪ Large number of iterations: lower latency, less memory, higher throughput than RPD
[1] „A 336 Gbit/s Full-Parallel Window Decoder for Spatially Coupled LDPC Codes“, M. Herrmann, N. Wehn, M. Thalmaier, M. Fehrenz, T. Lehnigk-Emden, M. Alles. Joint European Conference on Networks and Communications & 6G Summit, June, 2021, Porto, Portugal.
SC-LDPC Code Decoding
SC-LDPC (FPWD) versus unrolled block LDPC codes [1]
▪ Code: same (sub)block size ~640, same code rate ~0.8
▪ 22nm FD-SOI technology
[1] „Forward-Error-Correction for Beyond-5G Ultra-high Throughput Communications“, N. Wehn, O. Sahin, M. Herrmann, accepted for publication, International Symposium on Topics in Coding 2021 (ISTC 2021), September, 2021, Montréal, Canada.
Polar-Code Decoding
Erdal Arikan (2009), Norbert Stolte (2002)
▪ Proven to achieve channel capacity for binary-input symmetric memoryless channels
▪ Many different decoding algorithms: Successive Cancellation (SC), SCL, BP…
Reduction of tree size by different optimizations, e.g.
▪ Replace repetition codes and parity-check codes by single nodes
▪ Merge rate-0 and rate-1 nodes into parent nodes
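The tree-size reduction can be illustrated by counting the nodes a decoder has to visit before and after collapsing special subtrees. This is a minimal sketch, not the decoder from the talk: the function names are hypothetical, and the (8, 4) frozen-bit pattern is an illustrative assumption.

```python
def classify(frozen):
    """Return a special-node label for a subtree, or None.

    frozen: sequence of booleans, one per leaf (True = frozen bit).
    """
    n = len(frozen)
    if all(frozen):
        return "rate-0"          # no information bits at all
    if not any(frozen):
        return "rate-1"          # only information bits
    if n >= 2 and all(frozen[:-1]):
        return "repetition"      # only the last bit carries information
    if n >= 2 and not any(frozen[1:]):
        return "parity-check"    # only the first bit is frozen
    return None


def count_nodes(frozen, simplify=True):
    """Count nodes of the decoding tree, collapsing special subtrees if asked."""
    if len(frozen) == 1 or (simplify and classify(frozen) is not None):
        return 1
    half = len(frozen) // 2
    return (1 + count_nodes(frozen[:half], simplify)
              + count_nodes(frozen[half:], simplify))


# Illustrative (8, 4) pattern: information bits at positions 3, 5, 6, 7.
frozen = [True, True, True, False, True, False, False, False]

full_tree = count_nodes(frozen, simplify=False)
pruned_tree = count_nodes(frozen, simplify=True)
print(f"full tree: {full_tree} nodes, pruned tree: {pruned_tree} nodes")
```

For this pattern the left half collapses into one repetition node and the right half into one parity-check node, so only the root and two special nodes remain.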
Polar-Code Decoder
SC, SCL… → sequential behavior
▪ Depth-first (SC, SCL) or breadth-first (BP) traversal on optimized polar factor tree
(1024, 512) SC Polar Code decoder
▪ 16nm FinFET technology
▪ Coded throughput 1229 Gb/s
SC Decoder (1024, 512) Code
▪ 28nm technology
▪ Logic stages 385
▪ Optimized pipeline stages 105 (f ~ 600MHz)
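The figures for the two (1024, 512) SC decoders can be related to each other with simple arithmetic. The sketch assumes a fully pipelined decoder that accepts one codeword per clock cycle; the slide does not state this, so the derived clock, throughput and latency values are estimates under that assumption.

```python
n, k = 1024, 512                      # code parameters from the slide

# 16nm decoder: coded throughput is given.
coded_16nm_gbps = 1229.0
info_16nm_gbps = coded_16nm_gbps * k / n
implied_clock_16nm_ghz = coded_16nm_gbps / n   # assumes one codeword per cycle

# 28nm decoder: clock and pipeline depth are given.
f_28nm_hz = 600e6
pipeline_stages = 105
coded_28nm_gbps = n * f_28nm_hz / 1e9          # assumes one codeword per cycle
latency_28nm_ns = pipeline_stages / f_28nm_hz * 1e9

print(f"16nm: info throughput {info_16nm_gbps:.1f} Gb/s, "
      f"implied clock {implied_clock_16nm_ghz:.2f} GHz")
print(f"28nm: coded throughput {coded_28nm_gbps:.1f} Gb/s, "
      f"latency {latency_28nm_ns:.0f} ns")
```

Reducing 385 logic stages to 105 pipeline stages trades clock frequency against register count and latency.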
Polar-Code Decoder
SC suffers in communications performance: SC vs SC-List(x) [1]
▪ 28nm FD-SOI technology
[1] „A 506 Gbit/s Polar Successive Cancellation List Decoder with CRC“, C. Kestel, L. Johannsen, O. Griebel, J. Jimenez, T. Vogt, T. Lehnigk-Emden, N. Wehn. IEEE 31st PIMRC'20
Design Space Exploration Frameworks
C++ based Design Space Exploration frameworks for high throughput LDPC-Code and Polar-Code decoders
▪ (SC-)LDPC decoder: 2-phase, layered BP, min-sum, information bottleneck, single-/multi-core
▪ Polar-Code decoder: SC, SCLx
Summary
Implementation KPIs for advanced technology nodes
▪ Tb/s throughput is feasible
▪ Area and area efficiency are feasible
▪ Energy efficiency is feasible
▪ Power density is still a major challenge
Communication KPIs
▪ Limited to smaller block sizes and small number of iterations
▪ SC-LDPC is a promising approach to process large blocks
▪ Limited flexibility
− Could exploit dark silicon
▪ FEC decoders
− Customizable building blocks
− Code concatenation
Thank you for your attention!
For more information please visit
http://ems.eit.uni-kl.de