Real-World Considerations for Deep Learning in Spectrum Sensing · 2020. 1. 22. · Real-World Considerations for Deep Learning in Spectrum Sensing Steven C. Hauser (ABSTRACT) Recently,

Real-World Considerations for Deep Learning in Spectrum Sensing

Steven C. Hauser

Thesis submitted to the Faculty of the

Virginia Polytechnic Institute and State University

in partial fulfillment of the requirements for the degree of

Master of Science

in

Electrical Engineering

Alan J. Michaels, Co-chair

A. A. (Louis) Beex, Co-chair

William C. Headley

Ryan K. Williams

May 7, 2018

Blacksburg, Virginia

Keywords: Machine Learning, Spectrum Sensing, Neural Networks, Automatic Modulation

Classification, Communication Systems

Copyright 2018, Steven C. Hauser


Steven C. Hauser

(ABSTRACT)

Recently, automatic modulation classification techniques using deep neural networks on raw

IQ samples have been investigated and show promise when compared to more traditional

likelihood-based or feature-based techniques. While likelihood-based and feature-based tech-

niques are effective, making classification decisions directly on the raw IQ samples removes

the need for expertly crafted transformations and feature extractions. In practice, RF en-

vironments are typically very dense, and a receiver must first detect and isolate each signal

of interest before classification can be performed. The errors introduced by this detection

and isolation process will affect the accuracy of deep neural networks making automatic

modulation classification decisions directly on raw IQ samples. The importance of defining

upper limits on estimation errors in a detector is highlighted, and the negative effects of

over-estimating or under-estimating these limits is explored. Additionally, to date, most of

the published research has focused on synthetically generated data. While large amounts of

synthetically generated data is generally much easier to obtain than real-world signal data,

it requires expert knowledge and accurate models of the real world, which may not always

be realistic. The experiments conducted in this work show how augmented real-world signal

captures can be successfully used for training neural networks used in automatic modulation

classification on raw IQ samples. It is shown that the quality and duration of real world

signal captures is extremely important when creating training datasets, and that signal cap-

tures made from a single transmitter with one receiver can be broadly applicable to other

radios through dataset augmentation.


Steven C. Hauser

(GENERAL AUDIENCE ABSTRACT)

With the increasing prevalence of wireless devices in every day life, communicating between

them can become more difficult because the devices must contend with each other to send

and receive information. Being able to communicate in a variety of environments can be chal-

lenging and, while devices can be pre-configured for certain situations, devices that are able

to automatically adjust how they communicate are more reliable and robust. The research

presented in this thesis will contribute to solving this challenge by considering machine-

learning based, radio frequency signal processing algorithms that are able to automatically

group different communication signals. Being able to automatically group different signals is

helpful because it can provide information about the wireless environment, allowing a device

to make intelligent decisions based on what it detects is happening around it. However,

before these algorithms can be successfully used in wireless devices, their limitations must

be better understood. To this end, the work in this thesis will show how sensitive these al-

gorithms are to imperfections in wireless devices. This work will also show how information

from new environments can be captured and manipulated to allow these algorithms to scale

for unseen environments and communication signals.

Contents

List of Figures vii

List of Tables xvi

1 Introduction 1

2 Background 6

2.1 Deep Learning for Wireless Communications . . . . . . . . . . . . . . . . . . 7

2.1.1 Deep Learning Architectures . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.2 Deep Learning Applied to Communication Systems . . . . . . . . . . 15

3 Receiver Effects 19

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Convolutional Neural Network Architecture . . . . . . . . . . . . . . . . . . 21

3.3 Neural Network Training and Simulations . . . . . . . . . . . . . . . . . . . 22

3.4 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

iv

3.4.1 Frequency Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.4.2 Sample Rate Offset . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.4.3 Sample Rate Offset and Frequency Offset . . . . . . . . . . . . . . . . 37

4 Augmenting Over-the-Air Signal Captures 39

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.2 Neural Network Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . 41

4.3 Datasets and Augmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.3.1 System Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.3.2 Training Dataset Generation . . . . . . . . . . . . . . . . . . . . . . . 44

4.3.3 Test Dataset Generation . . . . . . . . . . . . . . . . . . . . . . . . . 45

4.3.4 Augmenting Existing Data . . . . . . . . . . . . . . . . . . . . . . . . 46

4.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.4.1 Synthetic Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.4.2 Augmenting Dataset 1: 10,000 Examples at 15 dB SNR . . . . . . . . 50

4.4.3 Augmenting Dataset 2: 10,000 Examples at 7.5 dB SNR . . . . . . . 54

4.4.4 Augmenting Dataset 3: 100,000 Examples at 15 dB SNR . . . . . . . 59

4.4.5 Augmenting Dataset 4: 100,000 Examples at 7.5 dB SNR . . . . . . . 62

4.4.6 Improving Performance with Dataset 2: 10,000 Examples at 7.5 dB SNR 64

4.5 Applying to New Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

v

4.5.1 Overall Classification Accuracy . . . . . . . . . . . . . . . . . . . . . 71

5 Conclusions and Future Work 78

Bibliography 84

vi

List of Figures

2.1 Example of a simple multilayer perceptron neural network. Each dimension of

the input vector is fully connected to every hidden node, and then the outputs

of the hidden nodes are combined to get the desired output y of the system. 9

2.2 Nonlinear activation functions used in this work. . . . . . . . . . . . . . . . . 10

2.3 Example of a simple convolutional neural network. The hidden nodes only

connect to a subset of the input at once, and convolve across the input to

calculate the output of the layer, which acts as the input to the next layer. . 12

2.4 Example of a multi-layer convolutional neural network. As highlighted in

orange, the input view of the deeper layers is larger than that of earlier layers. 13

2.5 Example of a simple recurrent neural network. The input is sequentially

processed by the hidden layer, and the previous state of the hidden layer is

also used as an input to the next state for the next point in the input. The

output is calculated at the end of the sequence. As shown, the hidden layer

consists of a single node, and that node is updated through time via the new

time input and the previous state of the hidden node. . . . . . . . . . . . . . 14

vii

3.1 Imperfections introduced by the detection and isolation stage of a receiver.

On the top is a wideband snapshot of a dense spectral environment, and on

the bottom is a baseband signal after detection and isolation. Frequency error

(left) and sample rate mismatch (right) are shown in red. . . . . . . . . . . . 20

3.2 Diagram of a representative convolutional neural network architecture de-

signed for AMC using only raw IQ samples. . . . . . . . . . . . . . . . . . . 22

3.3 Neural network testing accuracies using 256 raw IQ input samples averaged

over all considered SNRs. As the offsets in the training set increase, the overall

performance of the CNN decreases. The decrease is most noticeable for CNNs

trained to generalize over frequency offsets. . . . . . . . . . . . . . . . . . . . 24

3.4 Classification accuracy of Ideal CNN trained assuming no frequency offset.

Accuracy quickly drops as the tested frequency offset deviates from 0. . . . . 26

3.5 Classification accuracy of Freq 2.5 CNN trained assuming frequency offsets

from ±2.5% of the sample rate. Accuracy is relatively flat within the training

range, but quickly drops as the tested frequency offset increases beyond 2.5%. 27

3.6 Classification accuracy of Freq 5 CNN trained assuming frequency offsets from

±5% of the sample rate. Accuracy is relatively flat within the training range,

but quickly drops as the tested frequency offset increases beyond 5%. . . . . 28

3.7 Classification accuracy vs. SNR for Ideal, Freq 2.5, and Freq 5 CNNs tested

with samples at no frequency offset. For high SNRs, the CNN is able to gener-

alize over small frequency offsets without significant degradation in accuracy.

However, accuracy is significantly degraded at lower SNRs. . . . . . . . . . . 29

viii

3.8 Classification accuracy vs. frequency offset for Ideal, Freq 2.5, and Freq 5

CNNs tested at an SNR of 10 dB. Maximum classification accuracy decreases

significantly as the CNN is trained for broader ranges of frequency offsets. . 30

3.9 Classification accuracy vs. SNR for Freq 5 CNN trained with varying input

size. Increasing the number of inputs to the CNN improves classification

accuracy across all SNRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.10 Classification accuracy of Ideal CNN trained assuming a sample rate of twice

the bandwidth. Accuracy quickly drops as the tested sample rate deviates

from twice the bandwidth. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.11 Classification accuracy of Samp 2.5 CNN trained assuming sample rates be-

tween 1.5 and 2.5 times the bandwidth. Accuracy is relatively flat within the

training range, but quickly drops as the tested sample rate decreases below

1.5 or increases above 2.5 times the bandwidth. . . . . . . . . . . . . . . . . 33

3.12 Classification accuracy of Samp 4 CNN trained assuming sample rates from

the signal bandwidth to 4 times the bandwidth. Accuracy is relatively flat

within the training range, but quickly drops as the tested sample rate increases

beyond 4 times the bandwidth. . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.13 Classification accuracy vs. SNR for Ideal, Samp 2.5, and Samp 4 CNNs tested

with samples at twice the bandwidth. All CNNs converge to the same accuracy

at high SNRs, but noticeable degradation occurs at SNRs below approximately

11 dB. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.14 Classification accuracy vs. sample rate for Ideal, Samp 2.5, and Samp 4 CNNs

tested at an SNR of 10 dB. Maximum classification accuracy does not decrease

significantly as the CNN is trained for broader ranges of input sample rates. 36

ix

3.15 Classification accuracy vs. SNR for Samp 4 CNN trained with varying input

size. Increasing the number of input samples improves performance, but the

improvement is limited for higher SNRs. . . . . . . . . . . . . . . . . . . . . 37

3.16 Classification accuracy vs. SNR for Ideal, Freq 2.5, Samp 2.5, and Samp Freq

CNNs tested with signals at no frequency offset and sampled at twice the signal

bandwidth. The maximum accuracy is measurably degraded when the neural

network is trained to handle both frequency and sample rate offsets. . . . . . 38

4.1 GNURadio flowgraph used for collecting over-the-air signal captures. . . . . 43

4.2 Classification accuracy of CNN architecture for various training scenarios us-

ing synthetic datasets. Synthetic training datasets that do not accurately

model the real world result in poor neural network performance when de-

ployed in real-world systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.3 Classification accuracy of LSTM architecture for various training scenarios

using synthetic datasets. Synthetic training datasets that do not accurately

model the real world result in poor neural network performance when deployed

in real-world systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.4 Classification Accuracy of CNN architecture for various training scenarios

using Dataset 1 and a single augmentation. None of the augmentations show

improvements over the CNN trained with synthetic data, and only augmenting

with noise is capable of doing better than random guessing at lower SNRs. . 50

x

4.5 Classification Accuracy of LSTM architecture for various training scenarios

using Dataset 1 and a single augmentation. None of the augmentations show

improvements over the LSTM trained with synthetic data, and only augment-

ing with noise is capable of doing better than random guessing at lower SNRs. 51


using Dataset 1 and multiple augmentations. All three augmentations must

be used to maximize the effectiveness of the dataset. . . . . . . . . . . . . . 53


using Dataset 1 and multiple augmentations. All three augmentations must

be used to maximize the effectiveness of the dataset. . . . . . . . . . . . . . 54

4.8 Classification Accuracy of CNN architecture for various training scenarios us-

ing Dataset 2 and a single augmentation. None of the augmentations show

improvements over the CNN trained with synthetic data, once again demon-

strating that multiple augmentation techniques are required. All but the

resampling augmentation show almost no improvement throughout the entire

SNR range. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55


using Dataset 2 and a single augmentation. No scenario is better than the

LSTM trained with synthetic data, once again demonstrating that multiple

augmentation techniques are required. . . . . . . . . . . . . . . . . . . . . . 56

xi


using Dataset 2 and multiple augmentations. Augmenting the training data

by applying noise, resampling, and a frequency shift is critical to improving

classification accuracy, but accuracy is still relatively poor throughout the

entire SNR range when compared to the network trained with synthetic data. 57



by applying noise, frequency shifts, and resampling is critical to maximizing

classification accuracy across the entire SNR range, but still does not match

the performance of the network trained with synthetic data. . . . . . . . . . 58


ing Dataset 3. Augmenting the training data is critical to improving classifica-

tion accuracy, and the larger initial capture size helps to improve classification

accuracy when compared to using Dataset 1. The CNN trained with synthetic

data still has the best performance over the entire SNR range. . . . . . . . . 60


using Dataset 3. Augmenting the training data is critical to improving clas-

sification accuracy, and the larger initial capture size does not provide any

benefit over using the smaller captures from Dataset 1. The LSTM trained

with synthetic data still has the best performance over the entire SNR range. 61

xii


ing Dataset 4 and multiple augmentations. Augmenting the training data is

critical to improving classification accuracy at lower SNRs, but the classifi-

cation accuracy plateaus at higher SNRs. With a larger initial capture, it is

possible, through augmentation, to match the classification accuracy of the

CNN trained with synthetic data at lower SNRs. . . . . . . . . . . . . . . . . 63



is critical to improving classification accuracy at lower SNRs, but the classi-

fication accuracy plateaus at higher SNRs. With a larger initial capture, it

is possible, through augmentation, to match the classification accuracy of the

LSTM trained with synthetic data at lower SNRs. . . . . . . . . . . . . . . . 64

4.16 Training and validation loss for the CNN architecture augmenting Dataset

2 with noise, frequency shifts, and resampling. The CNN begins to overfit

immediately. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.17 Training and validation loss for the LSTM architecture augmenting Dataset 2

with noise, frequency shifts, and resampling. The validation loss of the LSTM

network tracks the training loss for the first 5 epochs, but then quickly begins

to overfit the training data. . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

4.18 Training and validation loss for the CNN architecture augmenting Dataset 2

with a non-repeating number of noise, frequency shift, and resampling aug-

mentations. The CNN tracks the training loss for the duration of the training,

although begins to overfit around epoch 70. . . . . . . . . . . . . . . . . . . 67

xiii

4.19 Training and validation loss for the LSTM architecture augmenting Dataset

2 with a non-repeating number of noise, frequency shift, and resampling aug-

mentations. The validation loss of the LSTM network is consistently better

than the training loss, but plateaus around epoch 300. . . . . . . . . . . . . 68

4.20 Classification accuracy comparison for CNN architecture across four different

training scenarios. Training with non-repeated augmentations results in im-

proved accuracy, but is still not as good as the CNN trained with a larger

original data capture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.21 Classification accuracy comparison for LSTM architecture across four different

training scenarios. Training with non-repeated augmentations is clearly the

best technique when augmenting over-the-air signal captures. . . . . . . . . . 70

4.22 Overall classification accuracy for different receiver and transmitter pairs with

neural networks trained using synthetic data. The overall classification ac-

curacy varies fairly significantly, as much as about 8% across the different

receiver/transmitter pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72


neural networks trained using Dataset 3 augmented with noise, frequency

shifts, and resampling. The receiver and transmitter pair used to create the

initial training dataset is highlighted in red, and the overall accuracy varies

fairly significantly, as much as about 8% across the different receiver/trans-

mitter pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

xiv


neural networks trained using Dataset 4 augmented with noise, frequency

shifts, and resampling. The receiver and transmitter pair used to create the

initial training dataset is highlighted in red, and the overall accuracy is re-

markably consistent for all hardware combinations. . . . . . . . . . . . . . . 74

4.25 Classification accuracy of the CNN architecture with Radio B as the trans-

mitter and Radio C as the receiver under various training scenarios. Training

on Dataset 4 augmented with noise, frequency shifts, and resampling results

in the best performance at lower SNRs. . . . . . . . . . . . . . . . . . . . . . 75

4.26 Classification accuracy of the LSTM architecture with Radio B as the trans-

mitter and Radio C as the receiver under various training scenarios. Training

on Dataset 4 augmented with noise, frequency shifts, and resampling results

in the best performance at lower SNRs. . . . . . . . . . . . . . . . . . . . . . 76

xv

List of Tables

3.1 Considered Neural Network Training Scenarios . . . . . . . . . . . . . . . . . 23

4.1 Summary of Over-the-Air Capture Parameters . . . . . . . . . . . . . . . . . 45

xvi

Abbreviations

AGC Automatic Gain Control

AMC Automatic Modulation Classification

AWGN Additive White Gaussian Noise

CNN Convolutional Neural Network

DSP Digital Signal Processing

FIR Finite Impulse Response

FSK Frequency Shift Keying

GPU Graphical Processing Unit

IIR Infinite Impulse Response

IQ In-phase/Quadrature

LSTM Long Short-Term Memory

MLP Multilayer Perceptron

PSK Phase Shift Keying

QAM Quadrature Amplitude Modulation

ReLU Rectified Linear Unit

RF Radio Frequency

RNN Recurrent Neural Network

SNR Signal-to-Noise Ratio

xvii

Chapter 1

Introduction

With the recent success of machine learning, and specifically deep neural networks, in fields

such as image processing [1], machine translation [2], and speech transcription [3], there

has been a surge of interest in how deep neural networks can be applied to communication

systems, particularly using only raw in-phase and quadrature (IQ) samples. For example,

in [4], a convolutional neural network (CNN) is used to estimate IQ imbalance using only

IQ samples; in [5], a CNN is used for timing and center frequency offset estimation using

only raw IQ samples; and in [6], a CNN is used for interference detection using only raw

IQ samples. For automatic modulation classification (AMC) specifically, designs such as

stacked sparse auto-encoders [7], CNNs [8], and recurrent neural networks (RNNs) [9], as

well as more complicated implementations such as those explored in [10], have been used

to autonomously extract features for AMC. In particular, [8] and [9] use a CNN and RNN

respectively for AMC using only raw IQ samples with promising results.

Automatic modulation classification has been a topic of interest for many years in wireless

communications, enabling blind characterization of a signals’ modulation with little, to no,

a priori knowledge. AMC is of particular interest in cognitive radio, spectrum sensing/dy-

1

2 Chapter 1. Introduction

namic spectrum access, and military applications. In cognitive radio, one use of AMC is

that it allows for dynamically changing modulation formats without increasing the commu-

nication overhead. Similarly, being able to easily identify primary users and secondary users

based strictly on modulation format can be highly desirable not only in cognitive radios, but

in broader spectrum sensing/dynamic spectrum access scenarios, where it is important to

find open channels without interfering with primary users. Finally, for military applications,

AMC can enable easier characterization and information gathering of both adversary and

friendly radios, as well as enabling intelligent jamming of specific types of signals.

As explored in works such as [11, 12, 13], AMC has traditionally been accomplished using

likelihood functions or by extracting modulation specific expert features from the received

IQ samples. Works such as [14, 15, 16, 17, 18, 19] have demonstrated that these features

can be used as inputs to neural networks to achieve high classification accuracy. While

both likelihood-based and feature-based methods are effective, they require expert design or

knowledge of the signal, require large numbers of samples, and/or can add complexity to a

receiver given that the raw IQ samples must be processed before classification. Techniques

that can autonomously classify modulations directly from the raw IQ samples can not only

remove the need for expert feature design, but also potentially free resources in the receiver

for other tasks. Additionally, the likelihood-based and feature-based approaches to AMC

offer no one-size-fits-all solution, requiring different techniques depending on environmental

variables and the modulations at hand [20].

The use of deep neural networks for AMC directly on raw IQ samples offers several benefits

over traditional likelihood-based and feature-based methods. First, these networks offer a

potential universal solution, in that rather than having to hand pick or expertly design AMC

features/algorithms for every type of modulation, deep neural networks can autonomously

extract and classify on useful features by simply adding the modulations to a training dataset.

3

Not only is this helpful for existing modulations, but it also allows for easy scaling to new

modulation formats, making deep neural network based AMC on raw IQ a potentially future-

proof technology. Second, by processing the raw IQ samples directly, receiver complexity can

potentially be reduced, freeing up processing resources for other tasks. Finally, traditional

techniques can require large numbers of samples to be effective [21], whereas neural networks

may be able to make more accurate classification decisions using fewer samples [5, 8].

Much of the published literature uses datasets either directly from, or derived from [22]. As

outlined in [22], these datasets attempt to model the real world by incorporating effects such

as sample rate offset, center frequency offset, channel fading, and additive white Gaussian

noise. However, the sample rate and center frequency offsets considered are largely intended

to model small inaccuracies such as sampling errors from imperfect clocks, and carrier fre-

quency drift due to local oscillator drift. While considering small inaccuracies is important,

in modern military, dynamic spectrum access, and cognitive radio applications, the exact

center frequency and bandwidth of signals of interest will not generally be known, but will

instead need to be estimated by a receiver. These estimates can create significantly larger

offsets than simple errors due to oscillator drift and digital sampling.

As the research into deep neural networks on raw IQ for modern AMC applications pro-

gresses, and in particular as the research transitions to real-world deployments, it will be-

come increasingly important to understand how deep neural networks will fit into the system

design, and what real-world effects need to be taken into consideration, particularly in blind

receiver applications. To that end, two primary contributions are added by the work pre-

sented in this thesis. First, the importance of blind receiver characteristics on deep neural

networks trained for AMC on raw IQ is discussed [23]. Next, an approach for augmenting

real, over-the-air signal captures is discussed. This approach will show how augmenting

signal captures can improve deep neural network based AMC on raw IQ in blind receiver ap-

4 Chapter 1. Introduction

plications, and what trade-offs exist between signal captures and future performance. More

specifically, this work is organized as follows:

In Chapter 2, a background on neural networks and AMC is presented. This chapter provides

a brief overview of neural networks, with a focus on the types of networks used throughout

the work presented in this thesis. Additionally, traditional AMC techniques and how neural

networks may revolutionize the field is also explored.

Chapter 3 provides an in-depth study on the effect that imperfections in the detection stage

of a receiver can have on neural networks designed for AMC on raw IQ. Specifically, this

chapter introduces two primary types of common receiver imperfections: frequency offset

and sample rate offset. Through simulation, it is shown that there is a close relationship

between the frequency and sample rate estimates in a receiver and the overall classification

accuracy of neural network based AMC on raw IQ. This study provides two major takeaways.

First, taking into account receiver imperfections such as frequency offset and sample rate

mismatch are essential if neural network based AMC on raw IQ is to be realized in real-world

systems. Second, training a neural network to generalize over a broad range of frequency

and sample rate offsets has a negative effect on classification accuracy, even if subsequent

signals have 100% accurate frequency and sample rate estimates. Training for the specific

error bounds of a receiver is critical to maximizing classification accuracy.

Chapter 4 then explores how real, over-the-air signal captures can be used to train neural

networks designed for AMC on raw IQ. Using real, over-the-air signal captures for training

offers two primary benefits over the synthetic datasets from Chapter 3. First, while receiver

imperfections such as frequency offset and sample rate offset are relatively easy to account

for in synthetic datasets, other imperfections such as emitter non-idealities and complex

channels may not be as easy to model through simulation. Second, creating synthetic train-

ing datasets inherently requires expert knowledge of all modulations of interest. Being able

5

to capture and train with previously unknown modulations can allow deep neural networks

to classify arbitrary modulations, without the need for expertly created synthetic datasets.

Through real-world simulation, it is shown that, in the absence of synthetic training data,

augmenting over-the-air signal captures with simple techniques such as adding noise, ap-

plying frequency shifts, and resampling is essential to maximizing classification accuracy of

neural network based AMC on raw IQ. It is also shown that classification accuracy can be

limited by the quality of the original data capture used for training. Although presented

through the framework of AMC, the results of this chapter can readily apply to other fields

such as deep neural network based emitter identification, where the presented augmenta-

tion techniques could enable creating generalized training datasets from emitter specific,

real-world transmissions.

Finally, in Chapter 5, conclusions and future work are provided. Here major takeaways

will be discussed, reaffirming the applicability and limitations of using both synthetic and

augmented datasets for neural network based AMC on raw IQ. Additionally, this chapter

highlights current hurdles that will need to be addressed such as neural network scalability,

generalizability, and hardware sensitivity before neural network based AMC on raw IQ can

be realized in communication systems.

The following publications were a direct result of the work presented in this thesis.

1. S. C. Hauser, W. C. Headley, and A. J. Michaels, “Augmentation of Real-World Data

for Neural Network Based Automatic Modulation Classification,” in IEEE Transactions

on Cognitive Communications and Networking, 2018 [submitted].

2. S. C. Hauser, W. C. Headley, and A. J. Michaels, “Signal Detection Effects on Deep

Neural Networks Utilizing Raw IQ for Modulation Classification,” in MILCOM 2017.

IEEE Military Communications. Conference Proceedings, vol. 1, 2017.

Chapter 2

Background

In recent years, deep neural networks have seen an explosion in use. While neural networks

have been around for decades, modern advances in hardware like Graphical Processing Units

(GPUs), as well as improved algorithms have enabled training of larger, deeper neural net-

works capable of learning more complicated tasks [24]. Additionally, open source software

libraries such as Keras [25] and TensorFlow [26] have significantly lowered the barrier to

entry for building, training, and testing neural networks. This has helped to open the door

for deep learning in new domains by enabling rapid design, prototyping, and testing of

advanced neural network architectures with just a few lines of code. In combination with

ever-improving hardware, this makes new applications for neural networks readily accessible,

and has provided a framework for state-of-the-art algorithms and widespread use in multiple

fields [1, 2, 3].

Taking a cue from rapid advances in areas such as image processing, in communication

systems this has inspired moving away from more traditional, expertly designed solutions, to

adaptable solutions that are ‘learned’ by deep neural networks. In the context of AMC, deep

neural networks are causing a shift from simply using neural networks to classify modulation

6

2.1. Deep Learning for Wireless Communications 7

formats based on expertly designed features [14, 15, 16, 17, 18, 19] to directly applying deep

neural network architectures for autonomous feature learning using only raw IQ samples

[8, 9, 10, 27]. While initial results for deep learning applied to communication systems appear

to be promising, the purpose of this work is to outline practical, real-world considerations

that need to be taken into account if modern deep learning trends are to be widely used in

communication systems.

This background section will introduce neural networks at a high level, and discuss the

primary types of neural networks used in this work: namely convolutional neural networks

and recurrent neural networks. It will also provide a more thorough introduction to neural

networks in communication systems through the lens of AMC, exploring the impact that

deep neural networks have on the field.

2.1 Deep Learning for Wireless Communications

2.1.1 Deep Learning Architectures

In this work, two primary types of deep neural networks are used, namely feedforward neural

networks and recurrent neural networks. Both types of networks are very similar in that they

attempt to learn a function that maps a set of inputs to a set of outputs. More specifically,

they attempt to learn an output y given a set of inputs x, using a set of learnable parameters

defined by θ. The key distinction between feedforward and recurrent neural networks being

that recurrent neural networks incorporate feedback connections as highlighted later in this

section [28].

To get a basic idea of how, in general, deep neural networks work, the multilayer perceptron

(MLP) is introduced. As described in [28], neural networks largely consist of the following

8 Chapter 2. Background

mathematical constructs. Formally, the neural network attempts to learn a function f such

that

y = f(x|θ). (2.1)

Typically θ consists of a series of weights and biases, and therefore Equation (2.1) can be

re-written as

y = f(x|w, b). (2.2)

Drawing from [29], Figure 2.1 shows a simple example of a MLP network with a single hidden

layer, where a hidden layer is any layer within the structure of the neural network that is

not an input or an output layer. Each line represents a learnable weight in the network. The

term deep learning really just means that the neural network has multiple hidden layers,

each with its own set of weights and biases so that

f(x|θ) = f (3)(f (2)

(f (1) (x|θ1) |θ2

)|θ3). (2.3)

For simplicity only a single hidden layer is shown in Figure 2.1.

In order to map to more complex functions, such as non-linear functions, the weights and

biases are typically passed through an activation function before the output of a particular

layer is calculated. From [28], letting an activation function be defined by the function g(x),

and the hidden layer output being h, the whole layer can be written as

h = g(W>x+ b

)(2.4)


Inputx

Hidden Layerh

Outputy

Figure 2.1: Example of a simple multilayer perceptron neural network. Each dimension ofthe input vector is fully connected to every hidden node, and then the outputs of the hiddennodes are combined to get the desired output y of the system.

where W is a learnable weight matrix, > is the matrix transpose, and b is an optional set

of learned biases. In the simple case as shown in Equation (2.2), h from Equation (2.4) is

equal to y, and if there are multiple hidden layers Equation (2.3) can be re-written as

f(x|θ) = g(3)(W>

3 g(2)(W>

2 g(1)(W>

1 x+ b1)

+ b2)

+ b3). (2.5)

In this work, three primary activation functions are used throughout all neural network

architectures considered: the rectified linear unit (ReLU), the hyperbolic tangent, and the

logistic sigmoid function. The ReLU was used because of its widespread success [1, 30, 31, 32],

while the hyperbolic tangent and logistic sigmoid functions are widely used in long short-


−1

1

z

g(z)

Nonlinear Activation Functions

relutanh

sigmoid

Figure 2.2: Nonlinear activation functions used in this work.

term memory recurrent neural networks [33, 34, 35]. For reference, these three activation

functions are shown in Figure 2.2. Also, because all simulations presented in this work are in

the context of classification, the neural networks constructed are designed to choose the most

likely class. This is accomplished by increasing the number of output nodes y in Figure 2.1

to be equal to the number of desired classes, applying the softmax function to this output

vector, and then choosing the most likely class from the distribution. Mathematically, the

chosen class H is determined by

H = arg maxc

eyc

M−1∑j=0

eyj

, (2.6)

where yj is a single element from the output vector andM−1∑j=0

is the sum over all possible

classes M .


Convolutional Neural Networks

While effective, the MLP architecture suffers from scaling issues due to the number of pa-

rameters in the densely connected layers [28], and as networks become larger and learning

tasks more difficult, approximating complicated functions with only a MLP architecture can

be challenging. One particular architecture that can drastically reduce the number of param-

eters within the neural network is a convolutional neural network [36]. Unlike MLP neural

networks, each node is only connected to a subset of nodes from the previous layer, and

the learned weights act as filters that are convolved across the previous layer. In a machine

learning context, the filters created by the weights are known as kernels [28]. Mathematically,

Equation (2.4) becomes

h = g((W> ∗ x) + b

), (2.7)

where ∗ is the convolution operator, defined as

y[n] = x[n] ∗ w[n] =∞∑

m=−∞

x[m]w[n−m] =∞∑

m=−∞

x[n−m]w[m] (2.8)

for a given input x and a given weight vector w. Note that in the context of machine learning,

convolution is often implemented as a cross-correlation, defined as

y[n] = x[n] ∗ w[n] =∞∑

m=−∞

x[n + m]w[m] (2.9)

and used interchangeably with convolution [28].

A simple diagram of a convolutional neural network is shown in Figure 2.3. Unlike the

MLP, here the weights are only connected to a subset of the input for any given hidden

node. Additionally, the learned weights of each node are the same; the key difference being


Inputx

Hidden Layerh

Outputy

Figure 2.3: Example of a simple convolutional neural network. The hidden nodes onlyconnect to a subset of the input at once, and convolve across the input to calculate theoutput of the layer, which acts as the input to the next layer.

that the weights are being applied to different parts of the input vector. It is important to

note that, as pictured in Figure 2.3, there is only a single kernel being learned. While not

shown, this diagram could easily extend along a third dimension, where each dimension is

a new, learned kernel. At the output all kernels are connected as they would be in a MLP

architecture.

Convolutional neural networks offer three primary advantages over MLP neural networks:

sparse interactions, parameter sharing, and equivariant representations [28]. As illustrated

in Figure 2.3, each node in a convolutional neural network is only connected to a subset

of the nodes in the previous layer, resulting in sparse interactions. Because each node is

not connected to every other node, this structure requires fewer parameters, decreasing the


Inputx

Hidden Layer 1h1

Hidden Layer 2h2

Outputy

Figure 2.4: Example of a multi-layer convolutional neural network. As highlighted in orange,the input view of the deeper layers is larger than that of earlier layers.

memory footprint and increasing the efficiency of the neural network estimator [28]. Simi-

larly, convolutional neural networks take advantage of parameter sharing in that each kernel

has fixed weights applied everywhere on the previous layer. This is helpful because rather

than having to learn the same set of weights for any feature, anywhere that said feature

could occur in the previous layer, only a single set of weights is learned, and every part of

every kernel is efficiently used throughout the previous layer [28]. Finally, convolutional neu-

ral networks offer equivariant representations, effectively preserving changes in the previous

layers. Note that due to the sparse interactions in CNN architectures, each learned kernel

can only operate on n inputs at a time, where n is the size of the kernel. For example, in

Figure 2.3 n would correspond to 4 inputs. This view of the input can be increased with

depth as shown by the orange connections in Figure 2.4 [28].

Recurrent Neural Networks

Recurrent neural networks [37] are similar to one-dimensional convolutional neural networks,

but, through the use of recurrent connections, RNNs are specifically designed for sequential


Inputx

Hidden Layerh

Outputy

xt-2 ht-2

xt-1 ht-1

xt ht

Figure 2.5: Example of a simple recurrent neural network. The input is sequentially pro-cessed by the hidden layer, and the previous state of the hidden layer is also used as an inputto the next state for the next point in the input. The output is calculated at the end of thesequence. As shown, the hidden layer consists of a single node, and that node is updatedthrough time via the new time input and the previous state of the hidden node.

data [28], and have a broad range of uses from stock market prediction [38] to natural

language processing [39]. While the input view of a CNN can be expanded with depth, the

recurrent connections in a RNN allow RNNs to model significantly longer sequences, well

beyond the immediate timestep of any particular node [28]. There are many ways to build

RNNs, but a typical construct, and the one used in this work, is shown in Figure 2.5 [28].

The weights learned by the hidden recurrent layer are the same from timestep to timestep,

but in addition to the input layer, the hidden state is also passed from timestep to timestep.

Note that while the output of the recurrent layer can be calculated at every timestep, only

the output at the end of the sequence is considered in this particular implementation. Similar

to the CNN architecture, Figure 2.5 could easily extend along a third dimension, allowing

for separate sets of weights (hidden states) to be learned at each timestep.

In this work, the recurrent neural network that was used follows the structure of Figure 2.5,


but leverages long short-term memory (LSTM) cells [40] as the recurrent units. LSTM cells

are used because LSTM networks are still considered to be one of the best RNN architectures

for sequence learning [35] and have shown to be exceptionally effective at many sequence

based tasks such as video to text translation [34], image captioning [41], sequence to sequence

learning [2], and image generation [42]. At a high level, LSTM networks are similar to

the generic RNN networks previously described, except that they allow for the network to

more easily learn long term dependencies by incorporating gated internal recurrent loops, in

addition to the recurrent connections described above [40, 43].

2.1.2 Deep Learning Applied to Communication Systems

The use of CNNs and RNNs are a natural fit for communication systems. Convolution is

fundamental to the field of communications and digital signal processing (DSP), particularly

in filtering to manipulate IQ data and extract useful information. The use of CNNs on the

raw IQ samples extends naturally from these signal processing concepts, where the neural

network is, by design, convolving over the inputs in an attempt to autonomously extract

high level features [28]. Each kernel in the CNN can effectively be thought of as a non-

linear, finite impulse response (FIR) filter [44]. While the RNN is not explicitly performing

a convolution in the same sense as the CNN, it is designed to learn sequential information

from the data, and the learned hidden units can effectively be thought of as non-linear,

infinite impulse response (IIR) filters [45, 46]. Both FIR and IIR filters are used extensively

in modern communication systems, so the CNN and RNN architectures have the potential to

offer unique improvements over traditional systems, with the added benefit of not requiring

expert filter design.

As previously discussed, given the recent success of deep learning in fields such as image


processing, where deep neural networks use only pixels as input values [1, 47, 48], there has

been a surge of interest looking into applying deep learning directly to raw IQ in commu-

nication systems [4, 5, 6, 8, 9, 10]. One particular application that may benefit from deep

learning is AMC. Traditional AMC techniques can broadly be grouped into two categories:

likelihood-based and feature-based [11, 12, 13]. Likelihood-based methods offer theoretically

optimum AMC performance, but suffer from computational complexity and the need for a

priori knowledge such as channel gain, noise variance, and phase offset [21]. This can make

the likelihood-based algorithms very difficult to use in practical scenarios [11]. Feature-based

techniques offer an alternative to likelihood-based techniques, trading simplified complexity

for sub-optimal performance [11].

The literature on likelihood-based AMC techniques extends back decades [49, 50, 51, 52,

53, 54, 55, 56]. In general, likelihood-based techniques center around making modulation

classification decisions by looking at the probability of the received samples conditioned on

the candidate modulations, and choosing the most likely distribution [11, 21]. While useful

from a theoretical point of view, two of the biggest drawbacks to likelihood-based AMC are

the computational complexity and the assumptions that need to be made. Additionally,

poor estimation of assumed parameters such as the channel state can negatively affect the

classification accuracy of the likelihood-based AMC technique [21].

On the other hand, feature-based techniques [14, 15, 16, 17, 18, 19, 57, 58, 59] can decrease

the computational complexity of classification algorithms by first extracting expertly de-

signed features such as statistical moments of phase [60, 61, 62], higher order cumulants

and cyclic cumulants [63, 64, 65, 66, 67], and zero-crossing variance of the signal [68, 69].

Classification decisions can then be made from the relevant features. This classification de-

cision can be performed with different machine learning algorithms such as support vector

machines [70, 71, 72], decision trees [73], or neural networks [14, 15, 16, 17, 18, 19, 58, 59].


While effective, feature-based approaches require expert feature design, which can become

particularly difficult as new modulation formats and channel assumptions are incorporated

into communication systems. Therefore, techniques that can autonomously learn relevant

features can not only simplify system design, but also allow for a universal approach to AMC

that scales as new modulation formats are created.

Some of the early results with neural networks designed for AMC using only raw IQ samples

were first published in [8]. In [8], the authors show improved performance over machine

learning algorithms applied to expertly designed cyclic-moment features [74] across 11 dif-

ferent modulations. Since [8], other works have expanded the results to include different

neural network architectures [9, 10], as well as different modulations and performance with

over-the-air captures [27] using only raw IQ samples. While promising, most of the results in

the work to date focus on training with synthetic datasets from [22], or derivatives thereof.

While real-world effects such as sample rate offset, center frequency offset, channel fading,

and additive white Gaussian noise are implicitly modeled in the synthetic datasets, none

of the work attempts to understand exactly how important each effect may or may not be

on classification tasks with raw IQ signal data. Additionally, the sample rate and center

frequency offsets considered are largely intended to model small inaccuracies from hardware

such as local oscillators, not larger inaccuracies as would be typical in blind receiver sce-

narios, where true bandwidths and center frequencies of received signals are unknown and

must be estimated. The work in Chapter 3 will demonstrate not only how important it

is to account for larger inaccuracies due to blind estimation, but also that there is a close

relationship between expected classification accuracies and the quality of initial bandwidth

and center frequency estimates.

A natural extension to the synthetic datasets from [22] is to use over-the-air signal captures

for training and evaluation. [27] explores this extension with encouraging results. However,


the authors again assume perfectly tuning to signals of interest, with a signal-to-noise ratio

(SNR) of about 10 dB, and only briefly consider training with over-the-air signal captures.

Considering that one of the primary benefits to using deep learning based AMC on raw IQ is

that the approach can autonomously learn features directly from the raw IQ, and therefore be

extended for arbitrary modulations, being able to effectively train deep neural networks from

real-world signal captures will be vital toward realizing this benefit. Chapter 4 will reinforce

the results of Chapter 3, while demonstrating that if synthetic data correctly models the

real-world it can be an effective tool, but that in the absence of synthetic data, real-world

signal captures can be just as effective provided augmentations such as noise, frequency

shifts, and resampling are applied.

Chapter 3

Receiver Effects

3.1 Introduction

As part of the investigation into real-world effects that must be considered when using deep

neural networks for AMC with raw IQ data, the signal imperfections that can be introduced

by a blind receiver are first considered. Real-world RF environments can be very dense, such

as in dynamic spectrum access applications, often with numerous signals being observed in

any particular band of interest. For most applications, signals must first be detected and

isolated before any further processing. In real systems, the detection and isolation stage is not

perfect. While receivers can directly tune to pre-defined channels with specific bandwidths

and center frequencies, there will always be a small source of error in center frequency and

sample rate due to hardware imperfections such as oscillator drift, non-ideal analog to digital

converters, and filtering. If pre-tuning is not possible, the center frequency and sample rate

errors can be exacerbated because the signal detection stage must estimate where each signal

is within the wideband spectrum, and the isolation stage must filter and resample the signal

to baseband. This entire process introduces two primary forms of error, namely a center

19

20 Chapter 3. Receiver Effects

Figure 3.1: Imperfections introduced by the detection and isolation stage of a receiver.On the top is a wideband snapshot of a dense spectral environment, and on the bottomis a baseband signal after detection and isolation. Frequency error (left) and sample ratemismatch (right) are shown in red.

frequency offset and a sample rate offset. A diagram of the detection problem and how these

imperfections manifest is shown in Figure 3.1.

While prior works like [8] include datasets with real world signal effects such as center

frequency and sample rate offsets, the authors only consider small effects due to non-ideal

hardware and do not attempt to quantify the impact, whether positive or negative, that

these effects have on classification performance. Through a thorough study of the effect of

sample rate offset and frequency offset in baseband signals, the simulations in this chapter

will show that not only must these effects be accounted for in the training datasets, but also

that defining error bounds in the detection and isolation stage of a receiver can be critical

to maximizing performance of downstream neural network processing. While only one CNN

is considered in this chapter, the CNN analyzed is of sufficient size to perform generally

well across all simulation datasets, without optimizing for any specific dataset, and thus the

results are representative of what can be expected for similar neural network architectures.

The rest of this chapter is outlined as follows. In Section 3.2, the exact CNN architecture

3.2. Convolutional Neural Network Architecture 21

used is described. In Section 3.3, information about how the neural network is trained and

how data is generated is presented. Finally, in Section 3.4, the simulations and results are

presented. Conclusions and future work will be summarized in Chapter 5.

3.2 Convolutional Neural Network Architecture

Leveraging the convolutional neural network architecture proposed in [8] for AMC and the

more finely tuned parameters explored in [10], here a representative CNN is developed to

classify an input vector of raw IQ samples into one of eight different modulation formats:

BPSK, QPSK, 8PSK, 2FSK, 4FSK, 8FSK, 16QAM, and 64QAM. While not an exhaustive

search of all modulation formats, these eight representative classes span many of the most

common ways to modulate signals, i.e. by amplitude, phase, or frequency, while including a

mix of higher and lower order modulations. Note that while works such as [6] show potential

improvements in certain tasks if the raw IQ samples are represented as magnitude/phase

or frequency domain samples, only raw IQ is considered in this work, since a hypothesized

advantage of using deep learning frameworks in AMC applications is the ability to eliminate

expertly designed features and upfront data transformations.

The considered neural network architecture given 256 input samples is shown in Figure 3.2.

The input vector consists of IQ samples divided into separate I and Q dimensions, and the

first convolutional layer (Conv1) uses a filter size of 1x16, with 32 distinct filters. The second

convolutional layer (Conv2) uses a filter size of 2x16 and again has 32 distinct filters. To

avoid overfitting, dropout [75] is applied at the output of Conv2 before being fed to a fully

connected layer with 256 nodes. Conv1, Conv2, and the first fully connected layer use a

rectified linear unit for their activation function. The final layer is another fully connected

layer with the softmax function applied to the output. Again, the architecture and results


Figure 3.2: Diagram of a representative convolutional neural network architecture designedfor AMC using only raw IQ samples.

are intended to be representative, and the results presented in this work are the takeaways,

not the exact performance of the considered neural network.

3.3 Neural Network Training and Simulations

In order to determine how estimation errors in the detection and isolation stage of a receiver

impact classification accuracy, multiple neural networks are trained with datasets that sim-

ulate varying degrees of estimation error. These neural networks are outlined in Table 3.1.

Four primary scenarios are considered: an ideal detector with perfect frequency and sample

rate estimates (Ideal), a detector with varying maximum center frequency offset (Freq 2.5

and Freq 5), a detector with varying sample rate offset (Samp 2.5 and Samp 4), and a de-

tector with both center frequency offset and sample rate offset (Samp Freq). The nominal

sample rate is twice the bandwidth of a detected signal, where, without loss of generality,

the bandwidth is defined as the null to null bandwidth of the signal.

To generate data for training and testing the neural networks, signals are created using GNU

3.3. Neural Network Training and Simulations 23

Table 3.1: Considered Neural Network Training Scenarios

Network Max Frequency Sample Rate RangeName Offset (Multiple of Bandwidth)

Ideal no offset no mismatchFreq 2.5 2.5% of sample rate no mismatchFreq 5 5% of sample rate no mismatch

Samp 2.5 no offset 1.5-2.5Samp 4 no offset 1-4

Samp Freq 2.5% of sample rate 1.5-2.5

Radio and offsets are drawn from a uniform distribution bound by the ranges defined in Table

3.1. For each scenario, all signals assume an additive white Gaussian noise (AWGN) channel

and a signal-to-noise ratio (SNR) that varies uniformly between 0 and 20 dB. The neural

networks are built and trained using Keras [25] and TensorFlow [26], and optimized using

the ADADELTA [76] algorithm, with early stopping to avoid overfitting. Approximately

240,000 signal captures are used for training, and approximately 100,000 signal captures are

used for testing.

Overall testing accuracies of these neural networks are shown in Figure 3.3. Here the overall

classification accuracy for each neural network is calculated with signals whose frequency and

sample rate offsets vary over the entire training range of the neural network, as specified in

Table 3.1. Observing Figure 3.3, it is clear that the overall classification accuracy decreases

as the training data includes larger offsets. Note that while increasing the size of the neural

network or increasing the number of training samples might improve the classification accu-

racy of the training sets with larger offsets, the size of the neural network and training set

is kept fixed throughout all of the simulations. This is done so that direct comparisons can

be made across the various detector assumptions. Specific breakdown points and trade-offs

are explored more fully in Section 3.4.


Figure 3.3: Neural network testing accuracies using 256 raw IQ input samples averaged overall considered SNRs. As the offsets in the training set increase, the overall performance ofthe CNN decreases. The decrease is most noticeable for CNNs trained to generalize overfrequency offsets.

3.4 Simulations

For all results presented in this section, classification accuracy is determined by averaging

4,000 signal captures (500 per modulation) for each data point. The signals were created with

GNU Radio as outlined in Section 3.3. If not otherwise stated, trained neural networks use

256 input IQ samples. Simulations demonstrating how accuracy is impacted by a frequency

offset are presented first, followed by simulations demonstrating how accuracy is impacted

by a sample rate offset. Finally, simulations demonstrating the combined effect of frequency

offset and sample rate offset are presented.

3.4. Simulations 25

3.4.1 Frequency Offset

To evaluate the effect of center frequency offset on modulation classification accuracy, the

frequency offset is swept from -10% to 10% of the sample rate. Classification accuracy is

then evaluated for the following three neural networks: Ideal, Freq 2.5, and Freq 5. Plots

of each neural network’s classification accuracy over the entire sample space are shown in

Figures 3.4–3.6. As can be seen, classification accuracy of each CNN decreases quickly for

frequency offsets outside of the training range, and this degradation in accuracy is consistent

across all SNRs and all three neural networks. As the frequency offset in the training data is

increased, peak classification accuracy also noticeably decreases, even at a frequency offset

of 0. This demonstrates that there is a penalty for arbitrarily training a CNN on a wide

range of frequency offsets.


Figure 3.4: Classification accuracy of Ideal CNN trained assuming no frequency offset. Ac-curacy quickly drops as the tested frequency offset deviates from 0.

3.4. Simulations 27

Figure 3.5: Classification accuracy of Freq 2.5 CNN trained assuming frequency offsets from±2.5% of the sample rate. Accuracy is relatively flat within the training range, but quicklydrops as the tested frequency offset increases beyond 2.5%.


Figure 3.6: Classification accuracy of Freq 5 CNN trained assuming frequency offsets from±5% of the sample rate. Accuracy is relatively flat within the training range, but quicklydrops as the tested frequency offset increases beyond 5%.

3.4. Simulations 29

Figure 3.7: Classification accuracy vs. SNR for Ideal, Freq 2.5, and Freq 5 CNNs testedwith samples at no frequency offset. For high SNRs, the CNN is able to generalize oversmall frequency offsets without significant degradation in accuracy. However, accuracy issignificantly degraded at lower SNRs.

A cross-sectional view of classification accuracy as a function of SNR, with frequency offset

fixed at 0, is shown in Figure 3.7. Similarly, a cross-sectional view of classification accuracy

as a function of the frequency offset, with SNR fixed at 10 dB, is shown in Figure 3.8. These

figures help to further illustrate the trends observed in Figures 3.4–3.6. For example, in

Figure 3.7 the Freq 2.5 CNN approaches the accuracy of the Ideal CNN as SNR increases,

while the accuracy of the Freq 5 CNN remains about 15% worse.


Figure 3.8: Classification accuracy vs. frequency offset for Ideal, Freq 2.5, and Freq 5 CNNstested at an SNR of 10 dB. Maximum classification accuracy decreases significantly as theCNN is trained for broader ranges of frequency offsets.

Finally, to evaluate if classification accuracy can be improved by changing the number of

input samples, the neural network with the poorest accuracy, Freq 5, is trained and tested

varying the number of input samples from 128 to 512. An SNR sweep with samples at no

frequency offset for three different input sizes is shown in Figure 3.9. Each increase in the

number of input samples improves classification accuracy across all SNRs, illustrating that

classification accuracy degradations due to errors in frequency offset can be mitigated by

increasing the number of input samples. Detailed analysis of this trend is left as future

work.

3.4. Simulations 31

Figure 3.9: Classification accuracy vs. SNR for Freq 5 CNN trained with varying inputsize. Increasing the number of inputs to the CNN improves classification accuracy across allSNRs.

3.4.2 Sample Rate Offset

To evaluate the effect of sample rate offset on modulation classification accuracy, the sample

rate is swept from the signal bandwidth to approximately eight times the bandwidth. Classi-

fication accuracy is then evaluated for the following three neural networks: Ideal, Samp 2.5,

and Samp 4. Plots of each neural network’s classification accuracy over the entire sample

space are shown in Figures 3.10–3.12. Note that for all three neural networks, the nominal,

ideal sample rate is twice the bandwidth. As can be seen, the classification accuracy of each

CNN decreases quickly for sample rates outside of the training range, and this decrease is

consistent across all SNRs and all neural networks. However, unlike for frequency offset, a

noticeable difference cannot easily be observed between peak classification accuracies as the

sample rate range in the training data is increased.


Figure 3.10: Classification accuracy of Ideal CNN trained assuming a sample rate of twicethe bandwidth. Accuracy quickly drops as the tested sample rate deviates from twice thebandwidth.

3.4. Simulations 33

Figure 3.11: Classification accuracy of Samp 2.5 CNN trained assuming sample rates between1.5 and 2.5 times the bandwidth. Accuracy is relatively flat within the training range, butquickly drops as the tested sample rate decreases below 1.5 or increases above 2.5 times thebandwidth.


Figure 3.12: Classification accuracy of Samp 4 CNN trained assuming sample rates from thesignal bandwidth to 4 times the bandwidth. Accuracy is relatively flat within the trainingrange, but quickly drops as the tested sample rate increases beyond 4 times the bandwidth.

3.4. Simulations 35

Figure 3.13: Classification accuracy vs. SNR for Ideal, Samp 2.5, and Samp 4 CNNs testedwith samples at twice the bandwidth. All CNNs converge to the same accuracy at highSNRs, but noticeable degradation occurs at SNRs below approximately 11 dB.

A cross-sectional view of classification accuracy as a function of SNR, with sample rate

fixed at 2 times the bandwidth, is shown in Figure 3.13. Similarly, a cross-sectional view

of classification accuracy as a function of the sample rate, with SNR fixed at 10 dB, is

shown in Figure 3.14. These plots help to further illustrate the trends observed in Figures

3.10–3.12. All three neural networks converge to the same classification accuracy for SNRs

above approximately 11 dB, and the decrease in classification accuracy for lower SNRs is

not as pronounced as it is for frequency offset. This suggests that when designing a detector

and accounting for estimation errors in training, an accurate sample rate estimation is less

important than an accurate center frequency estimation.


Figure 3.14: Classification accuracy vs. sample rate for Ideal, Samp 2.5, and Samp 4 CNNstested at an SNR of 10 dB. Maximum classification accuracy does not decrease significantlyas the CNN is trained for broader ranges of input sample rates.

Finally, as was done for frequency offset, the number of input samples is varied to evaluate

if classification accuracy can be improved by changing the number of inputs to the neural

network. The neural network with the poorest accuracy, Samp 4, is trained and tested

varying the number of input samples from 128 to 512. An SNR sweep with samples at

twice the signal bandwidth for three different input sizes is shown in Figure 3.15. While the

frequency offset simulation showed notable improvement across all SNRs as the number of

input samples increased, the CNN trained to generalize over sample rate offset only achieves

a measurable improvement at lower SNRs.

3.4. Simulations 37

Figure 3.15: Classification accuracy vs. SNR for Samp 4 CNN trained with varying inputsize. Increasing the number of input samples improves performance, but the improvementis limited for higher SNRs.

3.4.3 Sample Rate Offset and Frequency Offset

To analyze how a neural network that must account for both frequency offset and sample

rate offset performs under ideal conditions, the Samp Freq neural network is compared to

the Ideal, Freq 2.5, and Samp 2.5 neural networks over the SNR range of 0 to 20 dB. Once

again, the data used has a frequency offset of zero, and is sampled at the ideal rate of twice

the bandwidth. The results are shown in Figure 3.16. Apart from a dip in accuracy for

the Freq 2.5 neural network around 6 dB, the Samp Freq neural network is consistently less

accurate than the other networks across the entire SNR range. The drop in accuracy shows

that arbitrarily training a neural network to generalize over frequency and sample rate offsets

can negatively impact overall performance, even with an ideal detector. Additionally, while

the Ideal, Freq 2.5, and Samp 2.5 neural networks all converge to roughly the same peak


Figure 3.16: Classification accuracy vs. SNR for Ideal, Freq 2.5, Samp 2.5, and Samp FreqCNNs tested with signals at no frequency offset and sampled at twice the signal bandwidth.The maximum accuracy is measurably degraded when the neural network is trained to handleboth frequency and sample rate offsets.

classification accuracy (around 90%), the Samp Freq neural network converges to a lower

peak classification accuracy just above 80%. Over the given offset ranges, generalizing for a

single offset only has a marginal effect at high SNRs. However, the lower peak classification

accuracy of the Samp Freq neural network, which is the most realistic of the given scenarios,

is something that will need to be accounted for when designing real world systems.

Chapter 4

Augmenting Over-the-Air Signal

Captures

4.1 Introduction

One of the important advantages of deep learning AMC on raw IQ is the ability of deep

neural networks to autonomously learn relevant features directly from the raw IQ samples

in order to make classification decisions. This advantage allows for scaling to arbitrary

modulations, without the need for expert feature design. However, in order to create the

synthetic datasets from Chapter 3, expert knowledge of modulation types and channel models

had to be known, and examples had to be carefully created in order to accurately model the

real world. In this chapter, the feasibility of training deep neural networks for AMC using

only over-the-air signal captures is explored, and the importance of expanding over-the-air

signal capture datasets through augmentations that directly manipulate the raw IQ samples

is shown. By using over-the-air signal captures, prior knowledge of channels and modulations

can be eliminated, removing the need for expertly designed synthetic datasets and allowing

39

40 Chapter 4. Augmenting Over-the-Air Signal Captures

deep neural networks used for AMC to more easily scale to arbitrary scenarios through

directly training on signals captured in those scenarios. However, as shown in Chapter 3,

it will still be crucial to account for inaccuracies in the detection and isolation stage of the

receiver, which may be very difficult to do with a single signal capture.

One of the primary downsides to using real-world data is that labeled training datasets can

be difficult and/or expensive to obtain [28]. This is a familiar problem in other domains such

as image processing, where labeled training datasets are augmented to expand the size of

the datasets and improve neural network generalization performance [48, 77, 78]. However,

augmenting datasets has not been well studied in the radio frequency (RF) domain. This

chapter will show how real-world, over-the-air signal captures can be augmented to not only

address the receiver effects considered in Chapter 3, but also effectively expand the size of

the training datasets to improve generalization performance of the neural network. Specifi-

cally, this chapter demonstrates how real, over-the-air signal captures can be augmented by

applying noise, frequency shifts, and resampling to improve classification accuracy between

the eight modulation classes from Chapter 3 using deep neural networks trained on raw

IQ samples. These techniques are extensible to permit generalized labeled training dataset

collection without acquiring or storing massive amounts of streaming signal captures.

The rest of this chapter is outlined as follows. In Section 4.2, a description of the two neural

networks used in the experiments from this chapter are outlined. In Section 4.3, a high level

description of the over-the-air capture environment is described, and the considered datasets

and augmentation techniques are introduced. In Section 4.4, the primary augmentation

simulation results are presented. Finally, in Section 4.5, applicability to other hardware and

transmitter/receiver pairs is considered. Conclusions and future work will be summarized in

Chapter 5.

4.2. Neural Network Architectures 41

4.2 Neural Network Architectures

For all simulations presented in this chapter, two different neural network architectures were

used for analysis. One is the convolutional neural network architecture used in Chapter 3

and shown in Figure 3.2, and the other is a simple recurrent neural network. The reader is

referred to Section 3.2 for details on the CNN architecture. The RNN consisted of a sin-

gle layer of 128 LSTM cells [40]. Each LSTM cell was configured with hyperbolic tangent

activation functions on the forward connections, and hard sigmoid functions on the recur-

rent connections. Following the layer of 128 LSTM cells was a single fully connected layer

which, like the CNN, used a softmax function on the output for choosing between one of

eight possible modulation classes. Unlike the CNN architecture which uses two-dimensional

convolutions over the I and Q values, the inputs into the LSTM architecture are formatted so

that each sample, or timestep, contains two dimensions, namely the I and Q values for each

particular sample. Also, similar to the CNN architecture, this particular RNN architecture

was chosen because it is relatively simple and quick to train, and generally performed better

than the CNN architecture on the streaming IQ input. All training and evaluation of both

neural networks was performed using the Keras neural network API [25].

It is worth noting that in terms of number of trainable parameters, and therefore the capac-

ity/complexity of the model, the LSTM architecture is significantly smaller than the CNN

architecture. While the CNN architecture has about 2,000,000 trainable parameters, the

LSTM architecture only has about 70,000. Despite the smaller number of parameters, in

general the LSTM architecture shows better performance than the CNN architecture, and

this improvement will be highlighted in the coming sections. The improved performance

of the LSTM architecture is likely due to the architecture being explicitly designed for se-

quential data and learning only temporally relevant features, whereas the comparative size

and design of the CNN architecture may make it prone to learning insignificant and/or un-


helpful features. A more thorough investigation of the relationship between network size,

architecture, and performance is left as future work.

4.3 Datasets and Augmentation

4.3.1 System Setup

Data captures were performed using over-the-air, line-of-sight channels in a lab environment,

with radios about 1-2 meters apart. For the particular environment, a center frequency of

2.4 GHz was found to be relatively quiet and was used as the nominal center frequency for all

signal captures. Both the transmitter and receiver were Ettus Research USRP B205minis,

capable of tuning between 70 MHz and 6 GHz while supporting an instantaneous bandwidth

of up to 56 MHz [79].

The GNU Radio flowgraph developed to automate signal captures is shown in Figure 4.1.

This flowgraph is made up of three primary components: the transmit chain, the receive

chain, and the SNR estimation chain. For the transmit chain, the random signal block from

the open source gr-signal exciter [80] is connected to a USRP sink block. The sink block

is configured to transmit using a USRP B205mini at an arbitrary center frequency of 2.4

GHz. Similarly, the receive chain combines a USRP source block with automatic gain control

(AGC), followed by a skip head block, a head block, and a file sink block. Similar to the way

the USRP sink block was set up, the USRP source block is configured for a USRP B205mini

tuned to 2.4 GHz. The AGC is used to help stabilize the input signal to have relatively

constant amplitude, while the skip head block removes any transients that may be present

during startup of the flowgraph. Finally a head block and file sink block are connected to

save a specific number of raw IQ samples to disk.

4.3. Datasets and Augmentation 43

Figure 4.1: GNURadio flowgraph used for collecting over-the-air signal captures.

While not strictly necessary for the data capture, the flowgraph also incorporates an estimate

of the SNR by connecting a polyphase clock sync block, a Costas loop block, and an MPSK

SNR estimation block to the output of the AGC block. The MPSK SNR estimation block

estimates the SNR of PSK signals by using the 2nd and 4th order moments, but only works

well on the symbols (not the raw IQ) of the PSK signal. Therefore, the polyphase clock

sync is used to time-align and match filter the signal, while the Costas loop accounts for any

errors in carrier frequency. Note that while modulations beyond PSK were captured, SNR

estimates were only being performed on a QPSK signal, and the SNR was assumed to be

approximately the same for the other modulation formats.


4.3.2 Training Dataset Generation

A summary of the data capture parameters is shown in Table 4.1. To evaluate the effect

of the noise, frequency shift, and resample augmentation techniques on neural networks

used for AMC, as well as how important a good data capture is, four primary datasets

were generated using the flowgraph shown in Figure 4.1. All datasets consisted of an equal

number of examples for each class in the set BPSK, QPSK, 8PSK, 2FSK, 4FSK, 8FSK,

16QAM, and 64QAM. Two of the datasets contained 10,000 examples of each modulation,

while the other two contained 100,000 examples each for total dataset sizes of 80,000 and

800,000, respectively. All datasets were created using a continuous data capture per class,

and individual examples were formed by breaking up the continuous data capture into chunks

of 256 raw IQ samples, which directly became the inputs to the neural networks. In addition

to the sizes of the datasets, the SNR of the captures also varied. For two of the datasets the

nominal SNR at the receiver was 15 dB, while for the other two the nominal SNR was 7.5

dB.

At the transmitter, signals were transmitted at a center frequency of 2.4 GHz and a sample

rate of 1 MHz, with an approximate bandwidth of 500 kHz. Similarly, at the receiver raw IQ

streams were captured at a sample rate of 1 MHz and 2.4 GHz. This was done in an attempt

to minimize any of the potential receiver effects explored in Chapter 3, such as sample rate

mismatch and center frequency offset. While not strictly necessary to perfectly tune to the

signal, the goal was to get as pristine examples as possible in order to ease the analysis of the

feasibility of data augmentation when using over-the-air signal captures for neural network

based AMC. Note that at these sample rates, each class in the smaller dataset consisted of

a single 2.56 second continuous capture, while the larger dataset was formed using a 25.6

second continuous capture for each class.

4.3. Datasets and Augmentation 45

Table 4.1: Summary of Over-the-Air Capture Parameters

Parameter Dataset 1 Dataset 2 Dataset 3 Dataset 4

Examples per Class 10,000 10,000 100,000 100,000SNR at Receiver 15 dB 7.5 dB 15 dB 7.5 dBTransmitter Fc 2.4 GHz 2.4 GHz 2.4 GHz 2.4 GHz

Receiver Fc 2.4 GHz 2.4 GHz 2.4 GHz 2.4 GHzSample Rate 1 MHz 1 MHz 1 MHz 1 MHz

Samples per Symbol 2 2 2 2Capture Duration 2.56 s 2.56 s 25.6 s 25.6 s

4.3.3 Test Dataset Generation

Unlike the training datasets, where the center frequency and sample rates were held fixed in

an attempt to simulate a single data capture, the over-the-air test dataset was designed to

model errors in a detection and isolation stage of a blind receiver as outlined in Chapter 3,

where future signal acquisitions will include signals with varying power, as well as signals with

less accurate center frequency and bandwidth estimates. To this end, an evaluation dataset

was made with signals varying from about 0 dB to 15 dB SNR, sample rates that varied

between approximately 2 and 2.5 samples per symbol, and frequency offsets that ranged

from ±2.5% of the sample rate. These ranges were leveraged from Chapter 3, allowing for a

reasonable error range in the detection and isolation stage of a receiver without unnecessarily

complicating the dataset. As an example, such a ±2.5% frequency uncertainty translates to

±25 kHz center frequency error for a 1 MHz signal, which is an easily achievable bound for

most practical hardware platforms. Additionally, due to the augmenting constraints in this

work, i.e. not having enough input samples if the signal captures were downsampled, data

below 2 samples per symbol was not used in creating the test dataset. For each SNR point

shown, the resulting classification accuracy was calculated using 50,000 underlying signals

per signal class.


4.3.4 Augmenting Existing Data

As mentioned, dataset augmentation has been heavily employed in other machine learning

applications such as image processing, but it has yet to be explored extensively in the RF

domain. To expand the training dataset and evaluate generalization ability of the neural

network, three types of data augmentation were explored: adding noise, introducing a fre-

quency shift, and resampling. For all fixed size training datasets using augmented data, the

original data was always kept as part of the training dataset, and any augmented examples

were simply appended to the dataset, resulting in exponential growth of the training dataset

for each augmentation applied. When adding noise, white Gaussian noise with zero mean

and random variance was added so that the SNR of the augmented signals, rather than

being fixed around 15 or 7.5 dB, varied from approximately 0 to 15 dB, or 0 to 7.5 dB

depending on the dataset. Note that after the signal had been captured, it is only possible

to decrease the SNR by adding noise; the noise cannot realistically be subtracted from the

capture. For frequency shift augmentations, examples in the training dataset were randomly

frequency shifted between ±2.5% of the sample rate. Similarly, resampling augmentations

were applied so that rather than examples consisting of approximately 2 samples per symbol,

examples were allowed to have up to 2.5 samples per symbol. These augmentation ranges

were used because Chapter 3 clearly showed that classification accuracy is maximized when

the training dataset is tuned to the expected estimation error ranges in a receiver, and these

were the assumed estimation error ranges as described in Section 4.3.3.

4.4 Simulation Results

For all of the plots in this chapter, the CNN and LSTM architectures were evaluated with

the same data, captured according to Section 4.3.3. Where appropriate, to get an idea of

4.4. Simulation Results 47

baseline expectations for the neural networks, the performance of networks trained with

synthetic data is shown as well, where both the CNN and LSTM networks are trained on

256 synthetically generated raw IQ samples, with 100,000 examples per class. The SNR

range of the synthetically generated examples was 0 to 20 dB, the sample rate varied from

approximately 1.5 to 2.5 samples per symbol, and the frequency shift ranged uniformly over

±2.5% of the sample rate, as described in Chapter 3. Note that for a fair baseline, the

dataset from Chapter 3 was modified slightly so that modulation specific parameters such

as pulse shape, rolloff factor, pulse length, and modulation index were fixed, in an attempt

to be consistent with the over-the-air captures.

4.4.1 Synthetic Datasets

To demonstrate both the upside and downside of synthetic training datasets, both the CNN

and the LSTM architectures are trained using synthetic data that represents the real world

in different ways. In the over-the-air lab environment described in Section 4.3, the primary

real-world effects that will need to be accurately modeled in the synthetic training dataset

are AWGN, frequency offsets, and sample rate mismatches. Figure 4.2 and Figure 4.3 show

the overall classification accuracy of the CNN and LSTM architectures, respectively, when

only a portion of the real world is accurately modeled in the synthetic training dataset. If the

model of the real world is incomplete, overall classification accuracy is poor when deployed

in real-world scenarios. However, if the synthetic training dataset does accurately model the

real world, as is the case with the ‘Noise, freq shift, resamp’ curves, neural networks can

perform very well when used in real-world scenarios. Given that the lab environment is well

defined, generating a synthetic training dataset that appropriately models the environment

is relatively simple and can clearly act as an “oracle” for baseline comparison with the

augmentation scenarios presented in the rest of this chapter.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0

Classifica

tion Accuracy

CNN Classification Accuracy vs. SNRSynthetic Training Datasets

No augmentationNoiseResampleFreq shiftNoise, freq shift, resamp

Figure 4.2: Classification accuracy of CNN architecture for various training scenarios usingsynthetic datasets. Synthetic training datasets that do not accurately model the real worldresult in poor neural network performance when deployed in real-world systems.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0

Classifica

tion Accuracy

LSTM Classification Accuracy vs. SNRSynthetic Training Datasets

No augmentationNoiseResampleFreq shiftNoise, freq shift, resamp

Figure 4.3: Classification accuracy of LSTM architecture for various training scenarios usingsynthetic datasets. Synthetic training datasets that do not accurately model the real worldresult in poor neural network performance when deployed in real-world systems.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Classifica

tion Accuracy

CNN Classification Accuracy vs. SNR: Dataset 110,000 Examples at 15 dB SNR, Single Augmentation

No augmentationNoiseResampleFreq shiftSynthetic

Figure 4.4: Classification Accuracy of CNN architecture for various training scenarios usingDataset 1 and a single augmentation. None of the augmentations show improvements overthe CNN trained with synthetic data, and only augmenting with noise is capable of doingbetter than random guessing at lower SNRs.

4.4.2 Augmenting Dataset 1: 10,000 Examples at 15 dB SNR

As a first test of augmenting datasets for neural network based AMC, both the CNN and

LSTM architectures were trained by adding a single augmentation to the data from Dataset

1 in Table 4.1. Specifically, as described in Section 4.3.4, the augmentations were adding

noise, applying a frequency shift, or resampling the data. Figure 4.4 shows the classification

accuracy versus SNR of the CNN architecture when trained on synthetic data, over-the-air

data that has not been augmented, and over-the-air data that has a single augmentation.

Similarly, Figure 4.5 shows classification accuracies for the same scenarios, but for the LSTM

architecture instead of the CNN architecture.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Cl

assif

icatio

n Ac

cura

cy

LSTM Classification Accuracy vs. SNR: Dataset 110,000 Examples at 15 dB SNR, Single Augmentation


Figure 4.5: Classification Accuracy of LSTM architecture for various training scenarios usingDataset 1 and a single augmentation. None of the augmentations show improvements overthe LSTM trained with synthetic data, and only augmenting with noise is capable of doingbetter than random guessing at lower SNRs.

In both cases, no single augmentation technique is adequate for significantly improving per-

formance vs. the non-augmented dataset. As shown in Chapter 3, accounting for estimation

errors in a receiver is important for maximizing classification accuracy, and of all the sce-

narios considered in Figure 4.4 and Figure 4.5, only the networks trained with synthetic

data adequately account for these imperfections. Augmenting with noise is clearly necessary

for improving performance at lower SNRs, however this augmentation comes at the expense

of decreasing classification accuracy at higher SNRs because the neural networks must now

generalize over a broader range of SNRs. Given that the evaluation dataset includes different

SNRs, sample rate offsets, and frequency offsets, the results from Chapter 3 show that it is

critical that all three of these must be accounted for in the training dataset. Figure 4.4 and


Figure 4.5 support this finding, and indeed it is not surprising that no single augmentation

technique provides significant improvement. While the original, non-augmented capture will

have some sample rate and frequency offset due to hardware inaccuracies, clearly augmenting

the capture further will be vital to future performance on unseen data. Note that for this

particular set of hardware, the largest gain in classification accuracy occurs when the resam-

pling augmentation is applied to the original data capture. This is believed to be because

the local oscillators in the hardware were poorly disciplined, resulting in the original data

capture containing frequency offsets that covered a wider range when compared to sample

rate errors. Also note that when compared to the neural networks trained with synthetic

datasets in Figure 4.2 and Figure 4.3, the augmented datasets consistently improve over their

synthetic counterparts, further emphasizing the usefulness of augmenting real-world signal

captures, especially when it is difficult to accurately model the real world.

Figure 4.6 and Figure 4.7 show the overall performance of the CNN and LSTM architectures

when multiple augmentations are applied to the training datasets. With both architectures,

training with augmented data where only a frequency shift and a resample were applied is

able to match the performance of the network trained with synthetic data at high SNRs.

This is because the training set at higher SNRs is most representative of the test dataset,

but performance quickly degrades once the SNR expands below the range that was used in

training. As was shown in Figure 4.4 and Figure 4.5, augmenting with noise is once again

crucial to maximizing accuracy of both networks at lower SNRs, emphasizing the fact that

including lower SNR examples during training is vital to future performance on unseen data.

Augmenting with all three techniques is clearly the best scenario, and the one that also most

closely matches the unseen test dataset. Note that while the versions trained with synthetic

datasets are still the best overall performing networks, augmenting with noise, frequency

shifts, and resampling allows the performance of the networks to begin to approach their


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Classification Accuracy

CNN Classification Accuracy vs. SNR: Dataset 110,000 Examples at 15 dB SNR, Combined Augmentation

Noise, resampNoise, freq shiftFreq shift, resampNoise, freq shift, resampSynthetic

Figure 4.6: Classification Accuracy of CNN architecture for various training scenarios usingDataset 1 and multiple augmentations. All three augmentations must be used to maximizethe effectiveness of the dataset.

synthetic equivalents, and almost match performance in the case of the LSTM architecture.

The discrepancy is believed to be due to a combination of the size of the training datasets, the

size of the neural networks, and the underlying fidelity of the initial signal capture. Where

the networks trained with synthetic data had 100,000 independent examples of each class,

the networks trained with all three augmentations have 80,000 correlated examples per class.

While the smaller LSTM architecture appears to be able to generalize well with this limited

sample size, the CNN likely needs more training data to more closely match the performance

when trained with synthetic data. Additionally, the 15 dB capture was near the maximum

achievable SNR, and as a result there are likely hardware non-linearities/anomalies that are

slightly corrupting the signal capture, affecting classification accuracy at lower SNRs.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Clas

sifica

tion Ac

curacy

LSTM Classification Accuracy vs. SNR: Dataset 110,000 Examples at 15 dB SNR, Combined Augmentation


Figure 4.7: Classification Accuracy of LSTM architecture for various training scenarios usingDataset 1 and multiple augmentations. All three augmentations must be used to maximizethe effectiveness of the dataset.

4.4.3 Augmenting Dataset 2: 10,000 Examples at 7.5 dB SNR

While noise, frequency shift, and resampling augmentations on real world training datasets

captured at high SNRs can approach performance of networks trained with synthetic datasets,

achieving similar results with lower SNR signal captures is less straightforward. If the origi-

nal capture is at a lower SNR, such as in Dataset 2 where the nominal SNR at the receiver is

7.5 dB, the same combination of augmentations from Section 4.4.2 appear to be inadequate

to match or exceed the performance of the neural networks trained with synthetic datasets

across the entire 0-15 dB SNR range. Note that while not explicitly explored, lower quality

signal captures that initially contain frequency offsets or sample rate mismatches are likely

not to have a significant impact when compared to SNR. This is because both frequency


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8


CNN Classification Accuracy vs. SNR: Dataset 210,000 Examples at 7.5 dB SNR, Single Augmentation


Figure 4.8: Classification Accuracy of CNN architecture for various training scenarios usingDataset 2 and a single augmentation. None of the augmentations show improvements overthe CNN trained with synthetic data, once again demonstrating that multiple augmentationtechniques are required. All but the resampling augmentation show almost no improvementthroughout the entire SNR range.

offsets and sample rate mismatches can be mitigated after the fact with multiplication by

complex exponentials or resampling. However, noise cannot be readily removed from a signal,

and therefore the captured SNR corresponds to the best case SNR example in the training

datasets.

Figure 4.8 shows the performance of the CNN architecture for single augmentations, and

Figure 4.9 shows the equivalent for the LSTM architecture. As with the higher SNR data

capture, no single augmentation technique is able to come close to the neural networks trained

with synthetic data. The important take-away from these plots is that classification accuracy

is roughly capped at the maximally trained SNR. Even when future test points have higher


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Clas

sifica

tion Ac

curacy

LSTM Classification Accuracy vs. SNR: Dataset 210,000 Examples at 7.5 dB SNR, Single Augmentation


Figure 4.9: Classification Accuracy of LSTM architecture for various training scenarios us-ing Dataset 2 and a single augmentation. No scenario is better than the LSTM trainedwith synthetic data, once again demonstrating that multiple augmentation techniques arerequired.

SNRs, the networks are not able to classify them with any higher amount of reliability. Said

another way, the features that the networks are using to make classification decisions are not

features that become easier to recognize as the SNR improves. Also, while the classification

accuracy trends of the LSTM architecture are very similar to those shown in Figure 4.5, just

scaled and shifted due to the degraded SNR of the initial data capture, training the CNN

architecture with non-augmented and frequency shift augmentation datasets results in poor

generalization ability of the neural network, effectively never being able to improve beyond

random guessing. This trend is likely only observed with the CNN architecture due to the

significantly larger number of parameters when compared to the LSTM architecture.

Combining augmentation techniques when training the CNN and LSTM architectures with


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Cl

assif

icatio

n Ac

cura

cy

CNN Classification Accuracy vs. SNR: Dataset 210,000 Examples at 7.5 dB SNR, Combined Augmentation


Figure 4.10: Classification Accuracy of CNN architecture for various training scenarios usingDataset 2 and multiple augmentations. Augmenting the training data by applying noise,resampling, and a frequency shift is critical to improving classification accuracy, but accuracyis still relatively poor throughout the entire SNR range when compared to the network trainedwith synthetic data.

Dataset 2 improves classification accuracy of both networks at lower SNRs, but still does

not match the performance of the “oracles” trained with synthetic data, particularly at

higher SNRs. Figure 4.10 shows the classification accuracy of the CNN architecture and

Figure 4.11 shows the classification accuracy of the LSTM architecture for these combined

training augmentation scenarios. Similar to the results from Section 4.4.2, adding noise,

frequency shift, and resample augmentations is crucial for maximizing classification accuracy

on future over-the-air scenarios. Comparing the results of Figures 4.8–4.11 to the results in

Figures 4.4–4.7, it is clear that while the augmentations are helpful to an extent, the SNR of

the initial data capture is crucial if the data is going to be used as training datasets for neural


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Clas

sifica

tion Ac

curacy

LSTM Classification Accuracy vs. SNR: Dataset 210,000 Examples at 7.5 dB SNR, Combined Augmentation


Figure 4.11: Classification Accuracy of LSTM architecture for various training scenariosusing Dataset 2 and multiple augmentations. Augmenting the training data by applyingnoise, frequency shifts, and resampling is critical to maximizing classification accuracy acrossthe entire SNR range, but still does not match the performance of the network trained withsynthetic data.

network based AMC on raw IQ samples, especially for the one-shot augmentation scenarios

considered here. Furthermore, an important takeaway from all of the results in Figures

4.8–4.11 is that regardless of architecture, the classification accuracy effectively peaks and

remains relatively constant for all SNRs above the maximally trained SNR, in this case 7.5

dB. Expanding these training datasets even further to help bolster classification accuracy is

explored more thoroughly in Section 4.4.6.


4.4.4 Augmenting Dataset 3: 100,000 Examples at 15 dB SNR

As a third simulation, Dataset 3 is used for training, where the original capture size is

increased by an order of magnitude so that rather than 10,000 examples per class as was

the case in Datasets 1 and 2, now there is enough data to make 100,000 examples per class.

An order of magnitude was chosen so that, in the non-augmented case, the over-the-air

captured dataset has the same number of unique examples as the synthetically generated

dataset. Note that while the number of unique examples between the synthetic dataset and

the over-the-air dataset are now the same, the over-the-air dataset will have examples that

are still slightly correlated due to effects such as aliasing and pulse shaping.

Figure 4.12 shows the classification accuracy of the CNN architecture, and Figure 4.13 shows

the classification accuracy of the LSTM architecture when training data is augmented with

noise, frequency shifts, and resampling, as well as how this combination of augmentations

compare to training without augmentation and training with synthetic data. As before,

in order to generalize for lower SNRs and improve classification accuracy at higher SNRs,

augmentation is crucial during training. However, while the augmented CNN architecture

improves over its counterpart in Figure 4.6, the LSTM architecture shows almost no im-

provement over its counterpart in Figure 4.7. This trend reinforces the conclusions from

Section 4.4.2, and helps to illustrate that, for higher SNR signal captures, satisfactory per-

formance can be achieved with augmentations of significantly smaller initial data captures,

particularly if the neural network architecture is optimized for the task at hand.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0

Classifica

tion Accuracy

CNN Classification Accuracy vs. SNR: Dataset 3100,000 Examples at 15 dB SNR

No augmentationNoise, freq shift, resampSynthetic

Figure 4.12: Classification Accuracy of CNN architecture for various training scenarios usingDataset 3. Augmenting the training data is critical to improving classification accuracy, andthe larger initial capture size helps to improve classification accuracy when compared tousing Dataset 1. The CNN trained with synthetic data still has the best performance overthe entire SNR range.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0

Clas

sifica

tion

Accu

racy

LSTM Classification Accuracy vs. SNR: Dataset 3100,000 Examples at 15 dB SNR


Figure 4.13: Classification Accuracy of LSTM architecture for various training scenariosusing Dataset 3. Augmenting the training data is critical to improving classification accuracy,and the larger initial capture size does not provide any benefit over using the smaller capturesfrom Dataset 1. The LSTM trained with synthetic data still has the best performance overthe entire SNR range.


4.4.5 Augmenting Dataset 4: 100,000 Examples at 7.5 dB SNR

Depending on neural network architecture, the results from Section 4.4.2 and Section 4.4.4

show that if initial data captures contain signals at higher SNRs there is not much benefit

to simply capturing more data. However, having a high SNR signal capture may not always

be practical in real-world scenarios. Therefore, investigating how large, lower SNR captures

such as Dataset 4, can be augmented for neural network based AMC on raw IQ is essential

when determining the viability of the presented techniques, particularly if improvements

might be had over the results shown in Figure 4.10 and Figure 4.11.

Figure 4.14 and Figure 4.15 show classification accuracy when Dataset 4, containing 100,000

signal examples per class captured at 7.5 dB, is augmented with noise, frequency shifts, and

resampling, and is then used to train the CNN and LSTM architectures, respectively. The

plots also show how this training scenario compares to training with non-augmented and

synthetic datasets. Similar to the trends shown in Section 4.4.3, both the CNN and LSTM

architectures approximately reach peak classification accuracy around the maximum training

SNR of 7.5 dB, and remain relatively constant in performance for all higher SNRs, regardless

of augmentation. Note that augmenting is once again essential for accurately classifying

lower SNR signals, and that while the peak classification accuracy at high SNRs is lower

than the equivalent from Section 4.4.4, training with the lower SNR dataset does improve

classification accuracy for lower SNRs. This is likely due to the increased concentration

of samples at lower SNRs because the range only goes from 0-7.5 dB instead of 0-15 dB,

as well as the lower SNR signal captures occurring well within the linear operating range

of the USRPs. By using a larger dataset captured at a lower SNR, all of the sub-optimal

classification accuracies seen in Section 4.4.3 can be overcome, and classification accuracies

of the neural networks when trained with augmented data are now capable of matching the

performance when trained with synthetic data at lower SNRs. These results show that when


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Classifica

tion Accuracy

CNN Classification Accuracy vs. SNR: Dataset 4100,000 Examples at 7.5 dB SNR


Figure 4.14: Classification Accuracy of CNN architecture for various training scenarios usingDataset 4 and multiple augmentations. Augmenting the training data is critical to improvingclassification accuracy at lower SNRs, but the classification accuracy plateaus at higherSNRs. With a larger initial capture, it is possible, through augmentation, to match theclassification accuracy of the CNN trained with synthetic data at lower SNRs.

the initial signal capture is degraded, i.e. at a lower SNR, it is extremely important that

more training examples be used in order to maximize future performance and match the

accuracy of a “oracle” trained with synthetic data. However, regardless of the number of

training examples, matching the accuracy of a “oracle” that has been trained with synthetic

data can only be achieved for the SNR range seen during training, and fails to generalize

well to higher SNRs.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Cl

assif

icatio

n Ac

cura

cy

LSTM Classification Accuracy vs. SNR: Dataset 4100,000 Examples at 7.5 dB SNR


Figure 4.15: Classification Accuracy of LSTM architecture for various training scenariosusing Dataset 4 and multiple augmentations. Augmenting the training data is critical toimproving classification accuracy at lower SNRs, but the classification accuracy plateaus athigher SNRs. With a larger initial capture, it is possible, through augmentation, to matchthe classification accuracy of the LSTM trained with synthetic data at lower SNRs.

4.4.6 Improving Performance with Dataset 2: 10,000 Examples at

7.5 dB SNR

While the results for training on larger captures at any SNR and smaller captures at high

SNR are encouraging, obtaining large captures can be difficult in real-world scenarios, and

a receiver may not be able to easily control the SNR of captured signals. Therefore, this

section explores more robustly how small data captures at lower SNRs might be able to

achieve classification accuracies on par with the other scenarios, by continually augmenting

the data so that the same example is unlikely to be repeated during training.


0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0Epoch

0

1

2

3

4

Loss

CNN Training and Validation Loss: Dataset 2Noise/Freq Shift/Resamp Augmentation

Training LossValidation Loss

Figure 4.16: Training and validation loss for the CNN architecture augmenting Dataset 2with noise, frequency shifts, and resampling. The CNN begins to overfit immediately.

Both the LSTM and CNN architectures used to generate most of the plots thus far exhibit

significant amounts of overfitting with the smaller datasets (Datasets 1 and 2), particularly

with the lower SNR dataset. Figure 4.16 shows the validation loss in a typical training

run of the CNN for Dataset 2 using noise, frequency shifts, and resampling augmentations,

and Figure 4.17 shows the same thing for the LSTM architecture. While the validation

loss on the LSTM network follows the training loss for the first 5 epochs, it then begins

to significantly overfit the training data. Overfitting on the CNN is even worse, as the

validation loss immediately begins to increase. To address overfitting, and hopefully improve

the performance of the neural networks, rather than adjust the neural network architectures

or apply regularization techniques, one can simply continue to augment the training data,

effectively creating a non-repeating number of training examples [81]. Doing this for the


0 5 10 15 20 25Epoch

0.5

1.0

1.5

2.0

2.5Loss

LSTM Training and Validation Loss: Dataset 2Noise/Freq Shift/Resamp Augmentation


Figure 4.17: Training and validation loss for the LSTM architecture augmenting Dataset2 with noise, frequency shifts, and resampling. The validation loss of the LSTM networktracks the training loss for the first 5 epochs, but then quickly begins to overfit the trainingdata.

combined noise, frequency shift, and resampling augmentations yields training runs such as

Figure 4.18 for the CNN, and Figure 4.19 for the LSTM. By augmenting so that examples

seen during training are not repeated, overfitting is drastically reduced for both networks.


0 20 40 60 80Epoch

0.6

0.8

1.0

1.2

1.4

Loss

CNN Training and Validation Loss: Dataset 2Non-Repeating Noise/Freq Shift/Resamp Augmentation


Figure 4.18: Training and validation loss for the CNN architecture augmenting Dataset 2with a non-repeating number of noise, frequency shift, and resampling augmentations. TheCNN tracks the training loss for the duration of the training, although begins to overfitaround epoch 70.


0 50 100 150 200 250 300Epoch

0.25

0.50

0.75

1.00

1.25

1.50

1.75

2.00

Loss

LSTM Training and Validation Loss: Dataset 2Non-Repeating Noise/Freq Shift/Resamp Augmentation


Figure 4.19: Training and validation loss for the LSTM architecture augmenting Dataset2 with a non-repeating number of noise, frequency shift, and resampling augmentations.The validation loss of the LSTM network is consistently better than the training loss, butplateaus around epoch 300.


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8


CNN Classification Accuracy vs. SNR: Dataset 2,4Comparison of Methods, 7.5 dB SNR

Noise, freq shift, resamp, Dataset 2Noise, freq shift, resamp, Dataset 4Non-repeating noise, freq shift, resamp, Dataset 2Synthetic

Figure 4.20: Classification accuracy comparison for CNN architecture across four differenttraining scenarios. Training with non-repeated augmentations results in improved accuracy,but is still not as good as the CNN trained with a larger original data capture.

By reducing overfitting to the training data, the overall performance of the network is easily

improved as shown in Figure 4.20 and Figure 4.21 for the CNN and LSTM architectures,

respectively. In the case of the LSTM architecture, non-repeated augmentations with the

smaller dataset show that even with this smaller, lower quality data capture, it is possible

to match the performance of the network trained with synthetic data at lower SNRs. In the

case of the CNN architecture, non-repeated augmentations show some improvement, but are

still unable to match the performance of the CNN trained with synthetic data or the CNN

trained on augmenting Dataset 4. The discrepancy is believed to be due to the underlying

architectures of the neural networks, in that the CNN seems to, in general, be more inclined

to learn unhelpful features in the training dataset and may not be the optimal network design


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Cl

assif

icatio

n Ac

cura

cy

LSTM Classification Accuracy vs. SNR: Dataset 2,4Comparison of Methods, 7.5 dB SNR

Noise, freq shift, resamp, Dataset 2Noise, freq shift, resamp, Dataset 4Non-repeating noise, freq shift, resamp, Dataset 2Synthetic

Figure 4.21: Classification accuracy comparison for LSTM architecture across four differenttraining scenarios. Training with non-repeated augmentations is clearly the best techniquewhen augmenting over-the-air signal captures.

for this particular classification problem. Whether this is simply related to the relative size of

the CNN, or the architecture itself, as well as a more thorough investigation into the possible

dependence between architecture choice and training on over-the-air captures, is left as future

work. Note that even with non-repeated data, both the CNN and LSTM architectures still

effectively plateau for all SNRs above the maximum SNR seen during training, showing that

the initial SNR of the captured data is potentially a physical limit for how well the neural

network classifier will perform on future, higher SNR examples.

4.5. Applying to New Hardware 71

4.5 Applying to New Hardware

As a test of the feasibility and general applicability of augmenting over-the-air captures

for training neural networks used for AMC on raw IQ data, some of the best performing

networks are tested with different combinations of hardware to see what, if any, hardware

specific impairments the networks might not be able to generalize over with the augmentation

techniques presented in this thesis. All channels were once again over-the-air, line-of-sight

in a lab environment, with USRPs 1-2 meters apart. The dataset capture was performed in

exactly the same way as described in Section 4.3, with the addition of different combinations

of receiver/transmitter pairs. The available hardware consisted of two USRP B205minis and

one USRP B200. For purposes of discussion and simulation, the radios are labeled as follows.

1. Radio A: USRP B205 mini 1

2. Radio B: USRP B205 mini 2

3. Radio C: USRP B200

All training from Section 4.4, and therefore the only data used in training the CNN and

LSTM networks, was performed with captures using Radio A as the transmitter, and Radio

B as the receiver. Also in this section, any reference to “overall classification accuracy” or

“overall performance” is the average accuracy of the neural network across the entire 0-15

dB SNR range.

4.5.1 Overall Classification Accuracy

The performance of the neural networks trained with synthetic datasets across the different

receiver and transmitter pairs is shown in Figure 4.22. Additionally, the performance of the


Tx A, Rx B

Tx A, Rx C

Tx B, Rx C

Tx B, Rx A

Tx C, Rx A

Tx C, Rx B

Combination of Hardware

0.600

0.625

0.650

0.675

0.700

0.725

0.750

0.775

0.800Overall Classification Accuracy

Hardware PerformanceTraining on Synthetic Data

CNNLSTM

Figure 4.22: Overall classification accuracy for different receiver and transmitter pairs withneural networks trained using synthetic data. The overall classification accuracy varies fairlysignificantly, as much as about 8% across the different receiver/transmitter pairs.

neural networks trained with noise, frequency shift, and resampling augmentations on the

larger datasets (Datasets 3 and 4) across the different receiver/transmitter pairs is shown in

Figure 4.23 and Figure 4.24 for Dataset 3 and Dataset 4, respectively. Note that in these

figures, the receiver and transmitter pair that was used to create the original training dataset

is highlighted in red.

While the overall performances shown in Figure 4.24 are remarkably consistent across dif-

ferent hardware scenarios, the same cannot be said of the overall performances shown in

Figure 4.22 and Figure 4.23. The most noticeable degradation occurs when Radio B is con-

figured as the transmitter, and Radio C is configured as the receiver. Here, when training


Tx A, Rx B

Tx A, Rx C

Tx B, Rx C

Tx B, Rx A

Tx C, Rx A

Tx C, Rx B


0.600

0.625

0.650

0.675

0.700

0.725

0.750

0.775

0.800Ov

erall C

lassificatio

n Ac

curacy

Hardware Performance: Training on Dataset 3Noise, Frequency Shift, and Resampling Augmentation

CNNLSTM

Figure 4.23: Overall classification accuracy for different receiver and transmitter pairs withneural networks trained using Dataset 3 augmented with noise, frequency shifts, and re-sampling. The receiver and transmitter pair used to create the initial training dataset ishighlighted in red, and the overall accuracy varies fairly significantly, as much as about 8%across the different receiver/transmitter pairs.

with augmentations on Dataset 3 and synthetic data, the overall accuracy drops by about

8% for both networks. The discrepancies for the networks trained with synthetic data are

likely due to missing assumptions in the synthetic dataset that do not completely model all

possible hardware imperfections across the different receiver/transmitter setups. However,

the consistency in performance when augmentations of Dataset 4 are used vs. the inconsis-

tencies when augmentations of Dataset 3 are used is slightly unexpected and can be better

analyzed by looking at overall classification accuracy vs. SNR.


Tx A, Rx B

Tx A, Rx C

Tx B, Rx C

Tx B, Rx A

Tx C, Rx A

Tx C, Rx B


0.600

0.625

0.650

0.675

0.700

0.725

0.750

0.775

0.800Ov

erall C

lassificatio

n Ac

curacy

Hardware Performance: Training on Dataset 4Noise, Frequency Shift, and Resampling Augmentation

CNNLSTM

Figure 4.24: Overall classification accuracy for different receiver and transmitter pairs withneural networks trained using Dataset 4 augmented with noise, frequency shifts, and re-sampling. The receiver and transmitter pair used to create the initial training dataset ishighlighted in red, and the overall accuracy is remarkably consistent for all hardware com-binations.

In all of the presented test scenarios, the worst overall classification accuracy across hard-

ware configurations occurs when Radio B is the transmitter and Radio C is the receiver.

Therefore, using this configuration as a representative ‘worst-case’ scenario, the overall clas-

sification accuracy over the entire SNR range of 0-15 dB is analyzed. Figure 4.25 shows

the classification accuracy vs. SNR of the CNN architecture when trained with synthetic

data, as well as Datasets 3 and 4 after they have been augmented by adding noise, frequency

shifts, and resampling. Similarly Figure 4.26 shows the same plots, but for the LSTM ar-

chitecture. Note that in prior simulations, training on augmented over-the-air captures only


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Classifica

tion Accuracy

CNN Classification Accuracy vs. SNR for Tx B, Rx CComparison of Best Augmented Training DatasetsNoise, freq shift, resamp, Dataset 3Noise, freq shift, resamp, Dataset 4Synthetic

Figure 4.25: Classification accuracy of the CNN architecture with Radio B as the trans-mitter and Radio C as the receiver under various training scenarios. Training on Dataset 4augmented with noise, frequency shifts, and resampling results in the best performance atlower SNRs.

ever matched the performance of the models trained with synthetic data. However, while

the classification accuracy of both networks trained with noise, frequency shift, and resam-

ple augmentations on Dataset 4 still roughly plateaus after 7.5 dB, the performance is now

significantly better than both the synthetic and augmented Dataset 3 training scenarios at

lower SNRs. This improvement reaffirms the importance of augmenting over-the-air signal

captures and shows the benefit that they can have over synthetic datasets by being able to

more accurately model real-world environments. As mentioned, the drop in accuracy with

the network trained on synthetic data is likely due to an incomplete model of the environ-

ment, but the networks trained with augmented Dataset 3 also drop well below the accuracy

of the networks trained with augmented Dataset 4 at lower SNRs. This is most likely due to


0 2 4 6 8 10 12 14SNR (dB)

0.0

0.2

0.4

0.6

0.8

1.0Classifica

tion Accuracy

LSTM Classification Accuracy vs. SNR for Tx B, Rx CComparison of Best Augmented Training DatasetsNoise, freq shift, resamp, Dataset 3Noise, freq shift, resamp, Dataset 4Synthetic

Figure 4.26: Classification accuracy of the LSTM architecture with Radio B as the trans-mitter and Radio C as the receiver under various training scenarios. Training on Dataset 4augmented with noise, frequency shifts, and resampling results in the best performance atlower SNRs.

hardware specific non-linearities/anomalies being introduced into the training set at higher

SNRs, decreasing the quality of the initial signal capture and forcing the networks to account

for these hardware specific imperfections, therefore hurting the generalization ability of the

networks across different hardware configurations.

In summary, Figures 4.22–4.26 help to illustrate that any hardware specific imperfections

in the training data can slightly degrade performance for other receiver/transmitter pairs,

but that these imperfections can be mitigated by ensuring that initial data captures are well

within the linear operating regions of the hardware. Similarly, the consistency in overall

classification accuracy shown in Figure 4.24 is likely due to the plateau at higher SNRs, and

any hardware specific non-linearities that are being introduced at higher SNRs no longer


being incorporated into the training dataset. These results help to reinforce the importance

of an initial, good quality, signal capture, as well as the effectiveness and limitations of the

proposed augmentation techniques.

Chapter 5

Conclusions and Future Work

Using deep neural networks directly on raw IQ samples is a new and powerful tool for

modulation classification applications. However, before these neural networks are realizable,

it is important to understand how they will perform in practical scenarios. In most realistic

scenarios, a receiver has no a priori knowledge about the spectrum it is observing and must

use a generic, blind, detection and isolation algorithm before classification can be performed.

To address this fact, Chapter 3 investigated the relationship between convolutional neural

network classification accuracy and detector quality. It was shown that as a convolutional

neural network is trained to generalize over detection estimation parameters in the form

of frequency offset and sample rate offset, overall classification accuracy is degraded, even

for test signals with no offsets. For the offsets explored in Chapter 3, the degradation in

classification accuracy is more pronounced for frequency offset than for sample rate offset.

Classification accuracy is further degraded when training for both frequency offset and sample

rate offset.

The presented results highlight the importance of accurate center frequency and sample rate

estimates in a detector when using similar neural network architectures for AMC on raw

78

79

IQ samples. Furthermore, clearly defining expected estimation error ranges, and limiting

neural network training to within these ranges, can maximize classification accuracy while

minimizing complexity. In other words, it was shown that designing a neural network for

an arbitrarily large range not only increases complexity, but negatively impacts the overall

classification accuracy, even for signals with no center frequency or sample rate estimation

errors.

In Chapter 4 the practicality of using real world, over-the-air signal captures when training

neural networks for AMC on raw IQ was investigated and shows promising results, but

only if over-the-air captures are augmented during training. While synthetically created

datasets are typically much easier to create than over-the-air captures, and allow for an

almost endless amount of training data, it can be very difficult to account for all types of

future modulations and RF environments. Being able to effectively train neural networks

with over-the-air captures will become exceedingly important as AMC with neural networks

on raw IQ continues to develop.

Through simulation, it was shown that there is a reasonable trade-off between the fidelity

of a signal capture and the classification accuracy of the neural networks, and that sim-

ple augmentations such as applying noise, applying a frequency shift, and resampling the

training examples are essential to maximizing classification accuracy. In particular, all three

augmentations are critical to include for the best performance on future signal captures.

Additionally, it was shown that the SNR and duration of the original data capture is ex-

tremely important. For a given classification accuracy, and for the CNN considered in this

work, the lower the SNR of the signal capture the longer the capture must be. The LSTM

architecture was less sensitive to the length of the initial data capture at lower SNRs, but

for both architectures the peak classification accuracy plateaus at the SNR of the original

signal capture, and while augmenting can increase the classification accuracy at which the

80 Chapter 5. Conclusions and Future Work

networks plateau, none of the presented augmentation techniques allow the classification

accuracy to continue to improve for SNRs above the training range.

The CNN and LSTM architectures considered showed similar trends for all simulations,

although the LSTM architecture consistently out-performed the CNN architecture. Addi-

tionally, training with augmented over-the-air signal captures for a specific transmitter and

receiver pair was able to generalize to other combinations of hardware, particularly when

the signal captures were well within the linear operating ranges of the radios.

In summary, the key takeaways from this work are the following.

• Chapter 3: Receiver Effects

– Errors in a detection and isolation stage of a blind receiver must be accounted for

in training datasets, and the larger these errors the worse the performance of the

neural network classifier.

– Training for and generalizing across a range of frequency and sample rate offsets

permanently and negatively affects the peak classification accuracy of neural net-

work classifiers, even if future signals are perfectly tuned and have no frequency

or sample rate offsets.

– CNNs designed for AMC on raw IQ appear to be more robust to sample rate

offsets than frequency offsets.

– Classification accuracy of neural network classifiers can be improved as the number

of input samples increases, helping to mitigate degradations from blind receiver

estimation errors.

• Chapter 4: Augmenting Over-the-Air Signal Captures

– When creating synthetic training datasets, inaccurate models of the real world

81

result in neural networks that are unable to accurately classify signals when used

in real systems.

– Accurate models of the real world can create good training datasets that allow

neural networks designed for AMC on raw IQ to perform very well when used in

real systems.

– In the absence of accurate synthetic models, over-the-air signal captures can be

used to create training datasets that allow neural network classifiers to achieve

similar performance to networks trained with accurate synthetic datasets, pro-

vided the captures are appropriately augmented.

– Simple augmentations such as adding noise, applying a frequency shift, and re-

sampling can not only increase the size of the over-the-air training dataset, but

are required for general applicability to future blind receiver scenarios.

– With the presented augmentation techniques, the underlying fidelity of the initial

signal capture used for training is crucial to maximizing classification accuracy,

and the better the initial signal capture, the shorter the initial capture can be.

∗ The SNR of the initial signal capture appears to be a limiting factor in how

well neural networks are able to classify unseen, higher SNR signals.

∗ If undesired, hardware specific effects are present in the initial signal cap-

ture, future performance of the neural network classifiers can be negatively

affected. However, this effect is not detrimental and appears to only result

in slightly worse classification accuracies when compared to networks trained

with synthetic datasets.

– LSTM neural netwoks appear to be better suited to AMC on raw IQ than CNNs.

– Augmenting training datasets captured with a particular transmitter/receiver pair

still allows for generalization to other hardware combinations, reinforcing that

82 Chapter 5. Conclusions and Future Work

through augmentation, any emitter specific features that are potentially being

learned are likely not detrimental to the broad applicability of the neural network

classifier.

To extend the results of Chapter 3, in future work it will be important to investigate different

neural network architectures beyond just the CNN architecture described by Figure 3.2 in

order to more broadly define and quantify blind receiver effects on the performance of neural

network classifiers. Additionally, design choices such as increasing/decreasing the size of the

neural network, increasing the number of training samples, or increasing the number of raw

IQ input samples to the neural network may produce different trends and are all parameters

that will need to be more thoroughly explored and quantified. Finally, determining the effect

of estimation errors for an expanded dataset that includes other types of modulation will be

crucial to further generalizing the conclusions of this chapter, beyond just the 8 modulations

considered.

Future work to extend Chapter 4 will need to consider moving beyond just a line-of-sight lab

environment to more complicated scenarios such as multipath environments. Similar to the

future work from Chapter 3, future work for Chapter 4 will also need to evaluate how neural

networks trained with augmented over-the-air captures extend to other types of modulation

such as higher order QAM or orthogonal frequency-division multiplexing. Additionally, all

over-the-air datasets in this work were captured at roughly the same center frequency, and

it will be important to evaluate if there are any frequency dependent receiver/transmitter

effects that can degrade the neural network classifier, and may require more novel forms of

augmentation. While multiple transmitter and receiver pairs were considered in this work,

all of the radios were similar models, and it will be important to more thoroughly evaluate

hardware dependencies that may or may not be learned by raw IQ neural networks trained

with over-the-air captures, particularly if these techniques are to scale to other applications

83

such as emitter identification. Using other radio specific augmentation techniques such as

IQ imbalance, combining augmented over-the-air captures with synthetically generated data,

or conditioning classification decisions based on SNR may show even better results than

the ones presented in this work. Furthermore, one of the dominant trends in Chapter 4

was the plateau seen for the networks trained by augmenting lower SNR signal captures.

Continuing the analysis of this plateau and looking at trends beyond 15 dB will be vital

towards determining if this plateau is in fact a limiting effect for any SNR, or if it is only

a trend observed at lower SNRs. Finally, exploring how increasing the number of raw IQ

input samples into the neural network can improve or degrade performance could be useful,

particularly as more complicated/higher order modulations are explored.

Bibliography

[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In

2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages

770–778, June 2016. doi: 10.1109/CVPR.2016.90.

[2] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural

networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q.

Weinberger, editors, Advances in Neural Information Processing Systems 27, pages

3104–3112. Curran Associates, Inc., 2014. URL http://papers.nips.cc/paper/

5346-sequence-to-sequence-learning-with-neural-networks.pdf.

[3] A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural

networks. In Proceedings of the 31st International Conference on International Confer-

ence on Machine Learning - Volume 32, ICML’14, pages II–1764–II–1772. JMLR.org,

2014. URL http://dl.acm.org/citation.cfm?id=3044805.3045089.

[4] L. J. Wong, W. C. Headley, and A. J. Michaels. Estimation of transmitter I/Q imbal-

ance using convolutional neural networks. In 2018 IEEE 8th Annual Computing and

Communication Workshop and Conference (CCWC), pages 948–955, Jan 2018. doi:

10.1109/CCWC.2018.8301715.

[5] T. O’Shea, K. Karra, and T. C. Clancy. Learning approximate neural estimators for

84

http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf

http://dl.acm.org/citation.cfm?id=3044805.3045089

BIBLIOGRAPHY 85

wireless channel state information. In 2017 IEEE 27th International Workshop on

Machine Learning for Signal Processing (MLSP), pages 1–7, Sept 2017. doi: 10.1109/

MLSP.2017.8168144.

[6] M. Kulin, T. Kazaz, I. Moerman, and E. de Poorter. End-to-end learning from spectrum

data: A deep learning approach for wireless signal identification in spectrum monitoring

applications. IEEE Access, PP(99):1–1, 2018. doi: 10.1109/ACCESS.2018.2818794.

[7] A. Dai, H. Zhang, and H. Sun. Automatic modulation classification using stacked

sparse auto-encoders. In 2016 IEEE 13th International Conference on Signal Processing

(ICSP), pages 248–252, Nov 2016. doi: 10.1109/ICSP.2016.7877834.

[8] T. O’Shea, J. Corgan, and T. C. Clancy. Convolutional radio modulation recognition

networks. In Chrisina Jayne and Lazaros” Iliadis, editors, Engineering Applications

of Neural Networks, EANN 2016, volume 629 of Communications in Computer and

Information Science, pages 213–226. Springer International Publishing, Cham, 2016.

ISBN 978-3-319-44188-7. doi: 10.1007/978-3-319-44188-7 16.

[9] D. Hong, Z. Zhang, and X. Xu. Automatic modulation classification using recurrent

neural networks. In 2017 3rd IEEE International Conference on Computer and Commu-

nications (ICCC), pages 695–700, Dec 2017. doi: 10.1109/CompComm.2017.8322633.

[10] N. E. West and T. O’Shea. Deep architectures for modulation recognition. In 2017 IEEE

International Symposium on Dynamic Spectrum Access Networks (DySPAN), pages 1–6,

March 2017. doi: 10.1109/DySPAN.2017.7920754.

[11] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su. Survey of automatic modulation

classification techniques: classical approaches and new trends. IET Communications, 1

(2):137–156, April 2007. ISSN 1751-8628. doi: 10.1049/iet-com:20050176.

86 BIBLIOGRAPHY

[12] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su. Blind modulation classification: a

concept whose time has come. In IEEE/Sarnoff Symposium on Advances in Wired and

Wireless Communication, 2005., pages 223–228, April 2005. doi: 10.1109/SARNOF.

2005.1426550.

[13] W. C. Headley and C. R. C. M. da Silva. Asynchronous classification of digital

amplitude-phase modulated signals in flat-fading channels. IEEE Transactions on Com-

munications, 59(1):7–12, 2011.

[14] W. C. Headley, J. D. Reed, and C. R. C. M. da Silva. Distributed cyclic spectrum

feature-based modulation classification. In 2008 IEEE Wireless Communications and

Networking Conference, pages 1200–1204, March 2008. doi: 10.1109/WCNC.2008.216.

[15] B. Kim, J. Kim, H. Chae, D. Yoon, and J. W. Choi. Deep neural network-based

automatic modulation classification technique. In 2016 International Conference on

Information and Communication Technology Convergence (ICTC), pages 579–582, Oct

2016. doi: 10.1109/ICTC.2016.7763537.

[16] M. M. T. Abdelreheem and M. O. Helmi. Digital modulation classification through

time and frequency domain features using neural networks. In 2012 IX International

Symposium on Telecommunications (BIHTEL), pages 1–5, Oct 2012. doi: 10.1109/

BIHTEL.2012.6412073.

[17] N. Ghani and R. Lamontagne. Neural networks applied to the classification of spectral

features for automatic modulation recognition. In Military Communications Confer-

ence, 1993. MILCOM ’93. Conference record. Communications on the Move., IEEE,

volume 1, pages 111–115 vol.1, Oct 1993. doi: 10.1109/MILCOM.1993.408536.

[18] G. Arulampalam, V. Ramakonar, A. Bouzerdoum, and D. Habibi. Classification of digi-

tal modulation schemes using neural networks. In Signal Processing and Its Applications,

BIBLIOGRAPHY 87

1999. ISSPA ’99. Proceedings of the Fifth International Symposium on, volume 2, pages

649–652 vol.2, 1999. doi: 10.1109/ISSPA.1999.815756.

[19] L. Han, F. Gao, Z. Li, and O. A. Dobre. Low complexity automatic modulation classi-

fication based on order-statistics. IEEE Transactions on Wireless Communications, 16

(1):400–411, Jan 2017. ISSN 1536-1276. doi: 10.1109/TWC.2016.2623716.

[20] B. Ramkumar. Automatic modulation classification for cognitive radios using cyclic

feature detection. IEEE Circuits and Systems Magazine, 9(2):27–45, Second 2009. ISSN

1531-636X. doi: 10.1109/MCAS.2008.931739.

[21] Z. Zhu and A. K. Nandi. Automatic Modulation Classification: Principles, Algo-

rithms and Applications. Wiley Publishing, 1st edition, 2015. ISBN 1118906497,

9781118906491.

[22] T. O’Shea and N. West. Radio machine learning dataset generation with gnu radio.

Proceedings of the GNU Radio Conference, 1(1), 2016. URL https://pubs.gnuradio.

org/index.php/grcon/article/view/11.

[23] S. C. Hauser, W. C. Headley, and A. J. Michaels. Signal detection effects on deep

neural networks utilizing raw IQ for modulation classification. In MILCOM 2017 - 2017

IEEE Military Communications Conference (MILCOM), pages 121–127, Oct 2017. doi:

10.1109/MILCOM.2017.8170853.

[24] W. Rawat and Z. Wang. Deep convolutional neural networks for image classification: A

comprehensive review. Neural Comput., 29(9):2352–2449, September 2017. ISSN 0899-

7667. doi: 10.1162/neco a 00990. URL https://doi.org/10.1162/neco_a_00990.

[25] F. Chollet. Keras. https://github.com/fchollet/keras, 2015.

https://pubs.gnuradio.org/index.php/grcon/article/view/11

https://pubs.gnuradio.org/index.php/grcon/article/view/11

https://doi.org/10.1162/neco_a_00990

https://github.com/fchollet/keras

88 BIBLIOGRAPHY

[26] M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems,

2015. URL http://tensorflow.org/. Software available from tensorflow.org.

[27] T. OShea, T. Roy, and T. C. Clancy. Over-the-air deep learning based radio signal

classification. IEEE Journal of Selected Topics in Signal Processing, 12(1):168–179, Feb

2018. ISSN 1932-4553. doi: 10.1109/JSTSP.2018.2797022.

[28] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http:

//www.deeplearningbook.org.

[29] T. G. Martins. How to draw neural network diagrams using Graphviz. https://

tgmstat.wordpress.com/2013/06/12/draw-neural-network-diagrams-graphviz/,

2013. [Online; accessed 20-May-2018].

[30] A. L. Maas, A. Y. Hannun, and A. Y. Ng. Rectifier nonlinearities improve neural

network acoustic models. In ICML Workshop on Deep Learning for Audio, Speech, and

Language Processing, 2013.

[31] V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines.

In Proceedings of the 27th International Conference on International Conference on

Machine Learning, ICML’10, pages 807–814, USA, 2010. Omnipress. ISBN 978-1-60558-

907-7. URL http://dl.acm.org/citation.cfm?id=3104322.3104425.

[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke,

and A. Rabinovich. Going deeper with convolutions. In Computer Vision and Pattern

Recognition (CVPR), 2015. URL http://arxiv.org/abs/1409.4842.

[33] A. Graves, A. r. Mohamed, and G. Hinton. Speech recognition with deep recurrent

neural networks. In 2013 IEEE International Conference on Acoustics, Speech and

Signal Processing, pages 6645–6649, May 2013. doi: 10.1109/ICASSP.2013.6638947.

http://tensorflow.org/

http://www.deeplearningbook.org

http://www.deeplearningbook.org

https://tgmstat.wordpress.com/2013/06/12/draw-neural-network-diagrams-graphviz/

https://tgmstat.wordpress.com/2013/06/12/draw-neural-network-diagrams-graphviz/

http://dl.acm.org/citation.cfm?id=3104322.3104425

http://arxiv.org/abs/1409.4842

BIBLIOGRAPHY 89

[34] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko.

Sequence to sequence – video to text. In Proceedings of the 2015 IEEE International

Conference on Computer Vision (ICCV), ICCV ’15, pages 4534–4542, Washington, DC,

USA, 2015. IEEE Computer Society. ISBN 978-1-4673-8391-2. doi: 10.1109/ICCV.2015.

515. URL http://dx.doi.org/10.1109/ICCV.2015.515.

[35] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber. Lstm:

A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems,

28(10):2222–2232, Oct 2017. ISSN 2162-237X. doi: 10.1109/TNNLS.2016.2582924.

[36] Y. Lecun. Generalization and network design strategies. Elsevier, 1989.

[37] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179–211. doi: 10.1207/

s15516709cog1402\ 1. URL https://onlinelibrary.wiley.com/doi/abs/10.1207/

s15516709cog1402_1.

[38] A. Yoshihara, K. Fujikawa, K. Seki, and K. Uehara. Predicting stock market trends

by recurrent deep neural networks. In Duc-Nghia Pham and Seong-Bae Park, editors,

PRICAI 2014: Trends in Artificial Intelligence, pages 759–769, Cham, 2014. Springer

International Publishing. ISBN 978-3-319-13560-1.

[39] S. Lawrence, C. L. Giles, and S. Fong. Natural language grammatical inference with

recurrent neural networks. IEEE Transactions on Knowledge and Data Engineering, 12

(1):126–140, Jan 2000. ISSN 1041-4347. doi: 10.1109/69.842255.

[40] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):

1735–1780, November 1997. ISSN 0899-7667. doi: 10.1162/neco.1997.9.8.1735. URL

http://dx.doi.org/10.1162/neco.1997.9.8.1735.

[41] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image

http://dx.doi.org/10.1109/ICCV.2015.515

https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1

https://onlinelibrary.wiley.com/doi/abs/10.1207/s15516709cog1402_1

http://dx.doi.org/10.1162/neco.1997.9.8.1735

90 BIBLIOGRAPHY

caption generator. 2015 IEEE Conference on Computer Vision and Pattern Recognition

(CVPR), pages 3156–3164, 2015.

[42] K. Gregor, I. Danihelka, A. Graves, D. Rezende, and D. Wierstra. Draw: A recur-

rent neural network for image generation. In Francis Bach and David Blei, editors,

Proceedings of the 32nd International Conference on Machine Learning, volume 37 of

Proceedings of Machine Learning Research, pages 1462–1471, Lille, France, 07–09 Jul

2015. PMLR. URL http://proceedings.mlr.press/v37/gregor15.html.

[43] F. A. Gers, J. A. Schmidhuber, and F. A. Cummins. Learning to forget: Contin-

ual prediction with lstm. Neural Comput., 12(10):2451–2471, October 2000. ISSN

0899-7667. doi: 10.1162/089976600300015015. URL http://dx.doi.org/10.1162/

089976600300015015.

[44] P. Golik, Z. Tuske, R. Schluter, and H. Ney. Convolutional neural networks for acoustic

modeling of raw time signal in lvcsr. In INTERSPEECH, 2015.

[45] G. Kechriotis, E. Zervas, and E. S. Manolakos. Using recurrent neural networks for

adaptive communication channel equalization. IEEE Transactions on Neural Networks,

5(2):267–278, Mar 1994. ISSN 1045-9227. doi: 10.1109/72.279190.

[46] T. W. S. Chow and Siu-Yeung Cho. An accelerated recurrent network training algorithm

using iir filter model and recursive least squares method. IEEE Transactions on Circuits

and Systems I: Fundamental Theory and Applications, 44(11):1082–1086, Nov 1997.

ISSN 1057-7122. doi: 10.1109/81.641774.

[47] D. Ciregan, U. Meier, and J. Schmidhuber. Multi-column deep neural networks for image

classification. In 2012 IEEE Conference on Computer Vision and Pattern Recognition,

pages 3642–3649, June 2012. doi: 10.1109/CVPR.2012.6248110.

http://proceedings.mlr.press/v37/gregor15.html

http://dx.doi.org/10.1162/089976600300015015

http://dx.doi.org/10.1162/089976600300015015

BIBLIOGRAPHY 91

[48] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep

convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q.

Weinberger, editors, Advances in Neural Information Processing Systems 25, pages

1097–1105. Curran Associates, Inc., 2012. URL http://papers.nips.cc/paper/

4824-imagenet-classification-with-deep-convolutional-neural-networks.

pdf.

[49] J. A. Sills. Maximum-likelihood modulation classification for psk/qam. In MILCOM

1999. IEEE Military Communications. Conference Proceedings (Cat. No.99CH36341),

volume 1, pages 217–220 vol.1, 1999. doi: 10.1109/MILCOM.1999.822675.

[50] W. Wei and J. M. Mendel. Maximum-likelihood classification for digital amplitude-

phase modulations. IEEE Transactions on Communications, 48(2):189–193, Feb 2000.

ISSN 0090-6778. doi: 10.1109/26.823550.

[51] K. Kim and A. Polydoros. Digital modulation classification: the bpsk versus qpsk case.

In Military Communications Conference, 1988. MILCOM 88, Conference record. 21st

Century Military Communications - What’s Possible? 1988 IEEE, pages 431–436 vol.2,

Oct 1988. doi: 10.1109/MILCOM.1988.13427.

[52] A. Polydoros and K. Kim. On the detection and classification of quadrature digital

modulations in broad-band noise. IEEE Transactions on Communications, 38(8):1199–

1211, Aug 1990. ISSN 0090-6778. doi: 10.1109/26.58753.

[53] Chung-Yu Huan and A. Polydoros. Likelihood methods for mpsk modulation classifi-

cation. IEEE Transactions on Communications, 43(2/3/4):1493–1504, Feb 1995. ISSN

0090-6778. doi: 10.1109/26.380199.

[54] F. Hameed, O. A. Dobre, and D. C. Popescu. On the likelihood-based approach to

http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf



92 BIBLIOGRAPHY

modulation classification. IEEE Transactions on Wireless Communications, 8(12):5884–

5892, December 2009. ISSN 1536-1276. doi: 10.1109/TWC.2009.12.080883.

[55] A. Ramezani-Kebrya, I. M. Kim, D. I. Kim, F. Chan, and R. Inkol. Likelihood-based

modulation classification for multiple-antenna receiver. IEEE Transactions on Commu-

nications, 61(9):3816–3829, September 2013. ISSN 0090-6778. doi: 10.1109/TCOMM.

2013.073113.121001.

[56] C. S. Long, K. M. Chugg, and A. Polydoros. Further results in likelihood classification of

qam signals. In Military Communications Conference, 1994. MILCOM ’94. Conference

Record, 1994 IEEE, pages 57–61 vol.1, Oct 1994. doi: 10.1109/MILCOM.1994.473837.

[57] E. E. Azzouz and A. K. Nandi. Automatic Modulation Recognition of Communication

Signals. Kluwer Academic Publishers, Norwell, MA, USA, 1996. ISBN 0792397967.

[58] M. L. D. Wong and A. K. Nandi. Automatic digital modulation recognition using

spectral and statistical features with multi-layer perceptrons. In Proceedings of the Sixth

International Symposium on Signal Processing and its Applications (Cat.No.01EX467),

volume 2, pages 390–393 vol.2, 2001. doi: 10.1109/ISSPA.2001.950162.

[59] A. K. Nandi and E. E. Azzouz. Modulation recognition using artificial neural net-

works. Signal Processing, 56(2):165 – 175, 1997. ISSN 0165-1684. doi: https:

//doi.org/10.1016/S0165-1684(96)00165-X. URL http://www.sciencedirect.com/

science/article/pii/S016516849600165X.

[60] Y. Yang and S. S. Soliman. Statistical moments based classifier for mpsk signals. In

Global Telecommunications Conference, 1991. GLOBECOM ’91. ’Countdown to the

New Millennium. Featuring a Mini-Theme on: Personal Communications Services,

pages 72–76 vol.1, Dec 1991. doi: 10.1109/GLOCOM.1991.188358.

http://www.sciencedirect.com/science/article/pii/S016516849600165X

http://www.sciencedirect.com/science/article/pii/S016516849600165X

BIBLIOGRAPHY 93

[61] S. S. Soliman and S. Z. Hsue. Signal classification using statistical moments. IEEE

Transactions on Communications, 40(5):908–916, May 1992. ISSN 0090-6778. doi:

10.1109/26.141456.

[62] Y. Yang and S. S. Soliman. An improved moment-based algorithm for signal classifi-

cation. Signal Processing, 43(3):231 – 244, 1995. ISSN 0165-1684. doi: https://doi.

org/10.1016/0165-1684(95)00002-U. URL http://www.sciencedirect.com/science/

article/pii/016516849500002U.

[63] A. Swami and B. M. Sadler. Hierarchical digital modulation classification using cu-

mulants. IEEE Transactions on Communications, 48(3):416–429, Mar 2000. ISSN

0090-6778. doi: 10.1109/26.837045.

[64] A. Swami, S. Barbarossa, and B. M. Sadler. Blind source separation and signal clas-

sification. In Conference Record of the Thirty-Fourth Asilomar Conference on Signals,

Systems and Computers (Cat. No.00CH37154), volume 2, pages 1187–1191 vol.2, Oct

2000. doi: 10.1109/ACSSC.2000.910750.

[65] P. Marchand, C. Le Martret, and J. L. Lacoume. Classification of linear modulations

by a combination of different orders cyclic cumulants. In Proceedings of the IEEE

Signal Processing Workshop on Higher-Order Statistics, pages 47–51, Jul 1997. doi:

10.1109/HOST.1997.613485.

[66] O. A. Dobre, Y. Bar-Ness, and W. Su. Higher-order cyclic cumulants for high order mod-

ulation classification. In IEEE Military Communications Conference, 2003. MILCOM

2003., volume 1, pages 112–117 Vol.1, Oct 2003. doi: 10.1109/MILCOM.2003.1290087.

[67] O. A. Dobre, Y. Bar-Ness, and W. Su. Robust qam modulation classification algorithm

using cyclic cumulants. In 2004 IEEE Wireless Communications and Networking Con-

http://www.sciencedirect.com/science/article/pii/016516849500002U

http://www.sciencedirect.com/science/article/pii/016516849500002U

94 BIBLIOGRAPHY

ference (IEEE Cat. No.04TH8733), volume 2, pages 745–748 Vol.2, March 2004. doi:

10.1109/WCNC.2004.1311279.

[68] S. Z. Hsue and S. S. Soliman. Automatic modulation recognition of digitally modu-

lated signals. In Military Communications Conference, 1989. MILCOM ’89. Conference

Record. Bridging the Gap. Interoperability, Survivability, Security., 1989 IEEE, pages

645–649 vol.3, Oct 1989. doi: 10.1109/MILCOM.1989.104004.

[69] S. Z. Hsue and S. S. Soliman. Automatic modulation classification using zero crossing.

IEE Proceedings F - Radar and Signal Processing, 137(6):459–464, Dec 1990. ISSN

0956-375X. doi: 10.1049/ip-f-2.1990.0066.

[70] C. S. Park, J. H. Choi, S. P. Nah, W. Jang, and D. Y. Kim. Automatic modulation

recognition of digital signals using wavelet features and svm. In 2008 10th International

Conference on Advanced Communication Technology, volume 1, pages 387–390, Feb

2008. doi: 10.1109/ICACT.2008.4493784.

[71] X. Gong, Li Bian, and Q. Zhu. Collaborative modulation recognition based on svm. In

2010 Sixth International Conference on Natural Computation, volume 2, pages 871–874,

Aug 2010. doi: 10.1109/ICNC.2010.5583916.

[72] D. Boutte and B. Santhanam. A hybrid ICA-SVM approach to continuous phase mod-

ulation recognition. IEEE Signal Processing Letters, 16(5):402–405, May 2009. ISSN

1070-9908. doi: 10.1109/LSP.2009.2016444.

[73] A. Hazza, M. Shoaib, A. Saleh, and A. Fahd. A novel approach for automatic classifi-

cation of digitally modulated signals in hf communications. In The 10th IEEE Inter-

national Symposium on Signal Processing and Information Technology, pages 271–276,

Dec 2010. doi: 10.1109/ISSPIT.2010.5711793.

BIBLIOGRAPHY 95

[74] W. A. Gardner and C. M. Spooner. Signal interception: performance advantages of

cyclic-feature detectors. IEEE Transactions on Communications, 40(1):149–159, Jan

1992. ISSN 0090-6778. doi: 10.1109/26.126716.

[75] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout:

A simple way to prevent neural networks from overfitting. Journal of Machine Learning

Research, 15:1929 – 1958, 2014. ISSN 15324435.

[76] M. D. Zeiler. ADADELTA: an adaptive learning rate method. CoRR, abs/1212.5701,

2012. URL http://arxiv.org/abs/1212.5701.

[77] P. Y. Simard, D. Steinkraus, and J. C. Platt. Best practices for convolutional neural

networks applied to visual document analysis. In Seventh International Conference on

Document Analysis and Recognition, 2003. Proceedings., pages 958–963, Aug 2003. doi:

10.1109/ICDAR.2003.1227801.

[78] L. S. Yaeger, R. F. Lyon, and B. J. Webb. Effective training of a neural network

character classifier for word recognition. In M. C. Mozer, M. I. Jordan, and T. Petsche,

editors, Advances in Neural Information Processing Systems 9, pages 807–816. MIT

Press, 1997.

[79] Ettus Research. USRP B205mini-i Product Page. https://www.ettus.com/product/

details/USRP-B205mini-i, 2017. [Online; accessed 18-December-2017].

[80] SaikWolf. gr-signal exciter. https://github.com/gr-vt/gr-signal_exciter, 2017.

[Online; accessed 18-December-2017].

[81] D. C. Cirean, U. Meier, L. M. Gambardella, and J. Schmidhuber. Deep, big, sim-

ple neural nets for handwritten digit recognition. Neural Computation, 22(12):3207–

http://arxiv.org/abs/1212.5701

https://www.ettus.com/product/details/USRP-B205mini-i

https://www.ettus.com/product/details/USRP-B205mini-i

https://github.com/gr-vt/gr-signal_exciter

96 BIBLIOGRAPHY

3220, 2010. doi: 10.1162/NECO\ a\ 00052. URL https://doi.org/10.1162/NECO_

a_00052. PMID: 20858131.

https://doi.org/10.1162/NECO_a_00052

https://doi.org/10.1162/NECO_a_00052

Documents

Real-World Considerations for Deep Learning in Spectrum Sensing · 2020. 1. 22. · Real-World Considerations for Deep Learning in Spectrum Sensing Steven C. Hauser (ABSTRACT) Recently,