Compressed Level Crossing Sampling for Ultra-Low Power IoT Devices
Jun Zhou, Amir Tofighi Zavareh, Robin Gupta, Liang Liu, Zhongfeng Wang, Fellow, IEEE, Brian M. Sadler, Fellow, IEEE, Jose Silva-Martinez, Fellow, IEEE, and Sebastian Hoyos, Senior Member, IEEE
Abstract— Level crossing sampling (LCS) is a power-efficient analog-to-digital conversion scheme for spike-like signals that arise in many Internet of Things (IoT) enabled automotive and environmental monitoring applications. However, the LCS scheme requires a dedicated time-to-digital converter (TDC) with large dynamic range specifications. In this paper, we present a compressed LCS that exploits the signal sparsity in the time domain. At the compressed sampling stage, a continuous-time ternary encoding scheme converts the amplitude variations into a ternary timing signal that is captured in a digital random sampler. At the reconstruction stage, a low-complexity split-projection least squares (SPLS) signal reconstruction algorithm is presented. The SPLS splits random projections and utilizes a standard least squares approach that exploits the ternary-valued amplitude distribution. The SPLS algorithm is hardware friendly, can be run in parallel, and incorporates a low-cost k-term approximation scheme for matrix inversion. The SPLS hardware is analyzed, designed, and implemented in FPGA, achieving the highest data throughput and power efficiency compared to prior art. Simulations of the proposed sampler in an automotive collision warning system demonstrate that the proposed compressed LCS can be very power efficient and robust to wireless interference, while achieving an approximately 8-fold data volume compression when compared with Nyquist sampling approaches.
Index Terms— level crossing sampling, compressive sensing, Internet of Things, least squares, sparse signal reconstruction.
I. INTRODUCTION
WITH the development of low-cost sensors and low-power wireless communication systems, hundreds of
thousands of electronic devices are connected together for
real-time and intelligent processing. This has promoted rapid
growth of many Internet of Things (IoT) enabled automotive
and environment monitoring applications, such as lane-keeping
driver assistance, collision warning, automated parking assistance, smart home, and connected health. Although they are
equipped with hundreds of sensor elements, many potential IoT applications
rely on sensors that are event driven with
spike-like signals, and a large portion of these sensors remain idle most
of the time [1], [2]. Conventional Nyquist analog-to-digital
converters (ADCs) carry out uniform sampling, which results
in a very high accumulated data rate and correspondingly very
high power at the ADC interface and digital post-processing.
Jun Zhou is with Broadband Communication, Broadcom Limited, CA 92619, U.S.A. (e-mail: [email protected]). Amir Zavareh, Robin Gupta, Jose Silva-Martinez, and Sebastian Hoyos are with the Department of Electrical & Computer Engineering, Texas A&M University, TX 77843, U.S.A. (e-mail: {amirtofighi, robin7}@tamu.edu; {jsilva, hoyos}@ece.tamu.edu). Liang Liu is with the Electrical and Information Technology Department, Lund University, Sweden (e-mail: [email protected]). Zhongfeng Wang is with the School of Electronic Science and Engineering, Nanjing University, China (e-mail: [email protected]). Brian M. Sadler is with the Army Research Laboratory, Adelphi, MD, U.S.A. (e-mail: [email protected]).
Fig. 1 (a) illustrates a band-limited analog signal and its
discrete and quantized ADC output. This classic sampling
approach is agnostic to the characteristics of input signals so
long as the bandwidth requirement is satisfied. This motivates
the study of a non-uniform sampler that exploits the signal
sparsity in time for better power efficiency [3].
LCS has high efficiency in capturing signals that are sparse
in the time domain [4]–[6]. LCS schemes usually consist of
a level-crossing detector and a TDC. As shown in Fig. 1 (b), a
level-crossing detector records the time instants at which the signal
crosses the predefined amplitude levels [4]. A dedicated
TDC measures the delay between two time stamps. Typically
operating asynchronously without a clock, a level-crossing
detector is able to adaptively respond to the signal variations.
The power consumption of the LCS scheme scales linearly
with time occupancy, which offers significant power savings
for time-sparse signals [7]–[10].
A drawback of LCS is that a large dynamic range TDC is
required to handle quiescent periods when a long time interval
between two successive level crossing events occurs, and time
intervals may vary over orders of magnitude depending on the
signal characteristics, e.g., from 1ns to 1ms [11]. A high-speed
coarse counter with dynamic range (DR) of $10^6$ requires 20-bit
resolution, which may have false counts in an asynchronous
design or require a very large decoding module for a one-
to-one mapping in a synchronous design. Although a coarse
counter allows a large DR, the hardware is sensitive to long-
term clock jitter [11]. A ramp interpolator [12] yields sub-
cycle resolution with a moderate clock rate via a stretching
technique. However, this architecture is restricted to short time
intervals only and has a dead time limit. A hybrid method
combining a coarse counter and a ramp interpolator [11] can
measure both short and long time intervals, but the inevitable
finite switching time between different schemes as well as the
associated hardware cost both motivate a more power efficient
TDC design.
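The 20-bit figure quoted above follows directly from the ratio between the longest and shortest intervals. The short sketch below reproduces that arithmetic; the function name `counter_bits` is ours and is only illustrative.

```python
import math

def counter_bits(t_min_s: float, t_max_s: float) -> int:
    """Bits a coarse counter needs to cover t_max_s with a resolution of t_min_s."""
    dynamic_range = t_max_s / t_min_s        # e.g. 1 ms / 1 ns = 1e6
    return math.ceil(math.log2(dynamic_range))

# Interval range quoted in the text: 1 ns to 1 ms.
bits = counter_bits(1e-9, 1e-3)
print(bits)
```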
Recent work on compressed LCS
has produced a low-power sampling scheme
optimized for pulse-like signals [13]. The works of [14]–[17]
studied the compressed LCS scheme in ultrasound imaging
and radar with echoes in sparsely scattering environments
driven by impulsive transmit signals. Leveraging a novel
continuous-time ternary encoding (CT-TE) block [13], return
echoes are converted to a ternary sparse timing signal. CT-
TE encodes the relative level changes between two successive
time stamps via ternary symbols, e.g., {+1, 0, −1}. As echoes
are transient, the ternary timing signal generated by the CT-TE
is also sparse. This enables a sub-Nyquist rate sampling via
Digital Object Identifier: 10.1109/TCSI.2017.2707481
1558-0806 c© 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
compressive sensing (CS) [18] with a moderate DR. The works in
[19], [53], [54] explored the CT-TE in low-power biomedical
applications.
In this paper, we present a hardware design of the compressed
LCS. In compressed sampling, we study the circuit architec-
ture and key modules in the CT-TE block. The relationship
between the conversion resolution and signal slew rate is
investigated. In signal reconstruction, a low-complexity noise-
robust algorithm called SPLS is studied. SPLS splits random
projections into a few independent sections and adopts an LS
approach for recovery of ternary-valued timing signals. We
present a fixed-point hardware design for the SPLS. A k-term
approximation to matrix inversion is proposed to reduce hard-
ware complexity, while still maintaining a bounded probability
of signal recovery. The non-adaptive SPLS algorithm enables
a parallel hardware architecture that is flexible in trading off
circuit complexity and system throughput. The SPLS outper-
forms conventional CS signal reconstruction algorithms, such
as orthogonal matching pursuit (OMP) [20], and iteratively
reweighted least squares (IRLS) [21]. The SPLS hardware has
been designed in RTL and tested in FPGA. Simulations of the
proposed scheme in an automotive collision warning system
confirm the compressed LCS and the SPLS reconstruction
algorithm can be robust to wireless interference, achieving
about 8-fold data compression in a wireless or wireline trans-
ceiver when compared to Nyquist sampling approaches.
This paper is organized as follows. Section II gives a
brief review of the CS technique. In Section III, the CT-TE
hardware design is presented and compared to prior art. In
Section IV, the SPLS hardware design is presented and a
low-complexity k-term approximation to matrix inversion is
discussed. Design results and comparisons with state-of-the-
art are also included. Section V shows analysis and numerical
results of the proposed CS system in an automotive collision
warning application with in-band wireless interference. Sec-
tion VI concludes the paper.
II. COMPRESSIVE SENSING BACKGROUND
The CS technique enables sub-Nyquist rate sampling of sparse signals at the analog-to-digital interface [18]. A signal $x \in \mathbb{R}^N$ is called $K$-sparse in a representation basis if $K \ll N$ of the coefficients are significant and the rest are zero. We define sparsity as the ratio $K/N$. Let $\Phi$ denote an $M \times N$ random sensing matrix, where $M \ll N$. The compressed measurements $y \in \mathbb{R}^M$ are generated by $y = \Phi x + w$, where $w$ is zero-mean Gaussian noise with distribution denoted by $\mathcal{N}(0, \sigma^2)$.
The columns of $\Phi$, denoted $\{\varphi_i\}_{i=1}^{N}$, are often referred to as atoms. The incoherent sampling principle requires the maximal absolute value of correlation of any two different columns to be upper-bounded by a small value $\varepsilon$ [22], [23]:
$$\max_{i,j \in [1,N],\ i \neq j} \left| \langle \varphi_i, \varphi_j \rangle \right| < \varepsilon. \qquad (1)$$
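As an illustration of the incoherence condition in Eq. (1), the following sketch draws a random sign matrix and measures its largest pairwise column correlation. The scaling of the entries to $\pm 1/\sqrt{M}$ (so that every column has unit norm), the matrix size, and the function names are assumptions made for this example only.

```python
import math
import random

def bernoulli_matrix(m, n, rng):
    """M x N matrix with entries +/- 1/sqrt(M), so every column has unit norm."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]

def mutual_coherence(phi):
    """Largest |<phi_i, phi_j>| over all pairs of distinct columns (left side of Eq. (1))."""
    m, n = len(phi), len(phi[0])
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(phi[r][i] * phi[r][j] for r in range(m))
            worst = max(worst, abs(inner))
    return worst

rng = random.Random(0)                 # fixed seed, for repeatability only
phi = bernoulli_matrix(64, 128, rng)   # illustrative size, not from the paper
mu = mutual_coherence(phi)
print(f"mutual coherence: {mu:.3f}")
```

A smaller value indicates that a smaller $\varepsilon$ satisfies Eq. (1) for that particular draw.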
State-of-the-art mixed-signal CS front-end designs include
a random demodulator (RD) [24]–[26], parallel segmented
CS (PSCS) [27], and modulated wideband converter (MWC)
Fig. 1. Illustration of an input band-limited analog signal and output discrete samples from different sampling devices: (a) uniform sampling ADC; (b) level crossing sampling scheme.
[28]. For a detailed analysis on hardware limitations of these
designs, we refer the reader to our previous works [13], [14].
Given incoherent sampling, the signal recovery problem is
$$\min \|x\|_1 \quad \text{subject to} \quad y = \Phi x. \qquad (2)$$
Many algorithms are available for solving (2). They may be
roughly categorized into two groups. The first group follows
basis pursuit using linear programming [29]. The second
uses a greedy method, such as orthogonal matching pursuit
(OMP) [20] and stage-wise OMP (StOMP) [30]. For a detailed
analysis on the limitations of these algorithms, please see [13],
[14].
III. CT-TE HARDWARE IMPLEMENTATION
The CT-TE scheme introduced in [13] delivers an efficient
asynchronous sampling for pulse-like signals that are sparse
in time. The CT-TE only fires upon level crossing. A Q-bit
DAC divides the dynamic range and provides a threshold pair
(Vth,L , Vth,H ) to two comparators that form an input tracking
window with one least significant bit (LSB) span.
Fig. 2 shows a hardware block diagram of the CT-TE
scheme, which consists of a pair of differential amplifiers,
comparators, a fixed width pulse generator, and a charge
pump. The differential amplifier takes a difference of the input
signal and threshold generated by the charge pump circuit. Its
intrinsic high gain characteristics increase sensitivity of the
comparator. The comparator consists of four cascaded CMOS
inverters. The comparison level is set in the first stage which
can be controlled by changing the voltage transfer curve via
transistor sizing. The threshold of an inverter based comparator
is given by Eq. (3),
$$V_{th} = \frac{V_{dd} - |V_{tp}| + V_{tn}\sqrt{K_n/K_p}}{1 + \sqrt{K_n/K_p}}, \qquad (3)$$
where $V_{th}$ is the comparator threshold, $V_{dd}$ is the power supply rail voltage, $V_{tn}$ and $V_{tp}$ are the NMOS and PMOS device thresholds, respectively, and $K_n$ and $K_p$ are the products of device capacitance, mobility and dimensions for NMOS and PMOS, respectively.
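To make the dependence on transistor sizing in Eq. (3) concrete, the sketch below evaluates the threshold for two strength ratios. All device values are hypothetical placeholders chosen for illustration, not parameters of the reported design.

```python
import math

def inverter_threshold(vdd, vtn, vtp, kn, kp):
    """Switching threshold of an inverter based comparator, following Eq. (3)."""
    ratio = math.sqrt(kn / kp)
    return (vdd - abs(vtp) + vtn * ratio) / (1.0 + ratio)

# Hypothetical devices: 1.0 V supply, +/-0.3 V device thresholds.
balanced = inverter_threshold(vdd=1.0, vtn=0.3, vtp=-0.3, kn=1.0, kp=1.0)
strong_nmos = inverter_threshold(vdd=1.0, vtn=0.3, vtp=-0.3, kn=4.0, kp=1.0)
print(balanced, strong_nmos)
```

With matched devices the threshold sits at mid-supply, and a stronger NMOS pulls it lower, which is the sizing knob the text refers to.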
Three inverter stages are used for increased gain and logic
level inversion so that the circuit behaves as an internally set
comparator circuit. The key in the second stage design is that
it must be the same as the first stage to maintain the same DC
threshold. This helps keep the linearity balanced between the
rising and falling intervals of high frequency input signals.
The inverter based comparator is a simple design with very
high speed (mainly controlled by the gate delay) compared
to counterparts such as differential amplifier based high
gain comparators. The major disadvantage of the inverter
based comparator is twofold. First, the threshold $V_{th}$ is
sensitive to temperature variation. In Eq. (3), the mobility and
transistor threshold voltage are both a function of temperature.
Therefore, the comparator threshold is also a function of
temperature. This needs to be calibrated at power up so
that impact of variations can be compensated. Second, the
comparator is sensitive to power supply noise due to its single
ended design. A bypassing capacitor is often employed to
reduce high frequency current. The gate delay is controlled by device size. Since we do not want to place a large load on the differential amplifier stage, a minimum width is a good choice for the first inverter stage. Assuming a minimum width device is selected for the first inverter stage, the gate delay of each subsequent stage becomes independent of device width and is a function of device length $L$ and power supply $V_{dd}$, as given in Eq. (4). The total gate delay of the cascaded inverters in this design is approximately 45 ps.
$$\text{Propagation Delay} \propto \frac{L^2}{V_{dd}}. \qquad (4)$$
A fixed width pulse is generated and fed to the charge pump
for pumping up or down the current into the capacitor at each
time the comparator trips, which increases or decreases the
threshold, respectively. The fixed width pulse generator has a
set/reset flip flop with a fixed delay in the feedback path as
shown in Fig. 2. The delay element “D” in Fig. 2 is built from an array of series inverters. The inverters and the charge pump can be calibrated with a combination of a delay locked loop (DLL) and a phase locked loop (PLL). First, the charge pump P/N mismatch can be calibrated by adjusting the pull-up/pull-down currents with a fixed width pulse generator using a PLL. This calibration does not need to run all the time; it can be done at power on or in the background, so the PLL power consumption is not included in the total dissipated power. Second, the inverters (delay element) can be calibrated by adjusting the control voltage of the voltage controlled delay lines in a DLL. Reference [31] illustrates a replica based technique that provides a ps-accuracy delay line. The idea is to tune each delay cell by a control voltage across it. Once the voltage is found and held, the inverters maintain the same delay. Similarly, the DLL can run either at start-up or in the background to obtain the calibration parameters. Therefore, the DLL power is not included in the total dissipated power.
The charge pump circuit is shown in Fig. 3. Switches MP3 and
MN3 control the current flow to the charge-pump output. To
mitigate switching errors propagating to the output node, these
switches are on the source side of the current source devices
Fig. 2. Block diagram of CT-TE hardware implementation.
Fig. 3. Circuit schematic of charge pump design.
Fig. 4. The relationship between the maximum input slew rate versus the step size of charge pump output in the proposed CT-TE hardware design.
MP1 and MN1. Dummy devices MP2, MP4, MN2, MN4 reduce
both charge injection and clock-feedthrough noise, while MP5
and MN5 ensure a fast turn off of MP3 and MN3.
The charge pump switching frequency is a function of
charge pump current, charging/discharging capacitor size, and
switch on/off pulse widths. A higher charge pump current
results in a higher switching frequency but the power con-
sumption of the charge pump also goes higher. A small size
Fig. 5. Power consumption of the CT-TE hardware with different DAC resolution.
Fig. 6. Block diagram of digital random sampler (DRS) in the compressed LCS sampling scheme.
charging capacitor requires a short time to charge or discharge, but the integrated noise (kT/C) on the reference signal is larger. Weighing these constraints in our design, we selected a capacitor size of 50 fF; a pulse width of 50 ps then translates to an average current of 100 µA for a 10 mV change. Note that the front end differential amplifier is a macro model in our schematic simulation, used only to distinguish the input and the threshold voltages; it is the only macro model used in our simulation. The total delay in the signal path in simulation is 130 ps, including 45 ps in the inverters at the input, 35 ps of CLK-Q delay, and 50 ps in the series inverters (delay element). This translates to an asynchronous data conversion rate of up to 7.7 GHz, which can be further increased by increasing the charge pump switching current at the cost of higher power consumption.
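The delay budget and the kT/C trade-off above can be checked with a few lines of arithmetic. In the sketch below, the three stage delays and the 50 fF capacitor come from the text, while the 300 K temperature and the function names are assumptions of this example.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def max_conversion_rate(stage_delays_s):
    """Upper bound on the asynchronous conversion rate: one event per loop delay."""
    return 1.0 / sum(stage_delays_s)

def ktc_noise_rms(cap_f, temp_k=300.0):
    """RMS kT/C noise voltage on a capacitor; temp_k = 300 K is assumed here."""
    return math.sqrt(BOLTZMANN * temp_k / cap_f)

# Input inverters, CLK-Q delay, and delay element, as listed in the text.
rate_hz = max_conversion_rate([45e-12, 35e-12, 50e-12])
noise_v = ktc_noise_rms(50e-15)
print(f"{rate_hz / 1e9:.2f} GHz, {noise_v * 1e3:.3f} mV rms")
```

Under these assumptions the loop delay of 130 ps gives roughly 7.7 GHz, and the kT/C noise of the 50 fF capacitor is a small fraction of a 10 mV step.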
The step size of charge pump output is controlled by pulling
up/down current sources and the pulse width. A small step size
provides high resolution to input variation which minimizes
the granular noise. However, a small step size in the charge
pump also restricts the maximum input slew rate (SR) given a
maximum CT-TE conversion rate $f_s$. Let $U$ denote the input swing; with $Q$-bit quantization, we have
$$SR = \frac{U}{2^Q} \cdot f_s = \mathrm{LSB} \cdot f_s. \qquad (5)$$
In this work, the CT-TE conversion rate is 4GHz. Fig. 4
illustrates the step size of the charge pump output in our CT-
TE design versus the maximum input slew rate. In Fig. 4, we
test different step sizes of charge pump output (equivalent to
the LSB of the CT-TE), and sweep the input slew rate until
the CT-TE loopback fails to track the input variation. We can
see that the maximum slew rate allowed in the CT-TE design
is linearly proportional to the step size of charge pump output,
which matches Eq. (5) very well.
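The linear relationship in Eq. (5) is easy to tabulate. The sketch below uses a hypothetical 1 V input swing purely for illustration; only the 10-bit resolution and the 4 GHz conversion rate are taken from the text.

```python
def lsb(swing_v, q_bits):
    """Step size of a Q-bit quantizer over an input swing U."""
    return swing_v / (2 ** q_bits)

def max_slew_rate(swing_v, q_bits, fs_hz):
    """Maximum trackable input slew rate in V/s, following Eq. (5)."""
    return lsb(swing_v, q_bits) * fs_hz

# Hypothetical 1 V swing; 10-bit resolution at a 4 GHz conversion rate.
sr = max_slew_rate(swing_v=1.0, q_bits=10, fs_hz=4e9)
print(f"{sr / 1e6:.3f} V/us")
```

Doubling the step size (for example by dropping one bit) doubles the trackable slew rate, which is the trend reported in Fig. 4.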
Fig. 7. Block diagram of the SPLS signal reconstruction algorithm.
Table I compares the power consumption of this work with
recent low-power LCS and successive approximation (SAR)
ADC designs. We list the most important parameters in recent
IoT related sensor front end design like the power consumption
and data conversion rate. We define a figure of merit (FOM)
as
$$\mathrm{FOM} = \frac{\text{Power}}{2^{\mathrm{ENOB}} \times 2 \times BW}, \qquad (6)$$
where ENOB stands for effective number of bits. The FOM
indicates the amount of power dissipated for every single
conversion to produce an output. A ramp with a slew rate of 1 V/µs and a DAC resolution of 10 bits was chosen, which leads to a conversion rate of 4 GHz. The ramp was chosen to expose the worst-case power performance of the circuit, because it excites the CT-TE at its maximum rate. Simulation results suggest our design can potentially achieve very low power consumption. The power scales with the DAC resolution: Fig. 5 shows that the higher the bit number, the more transitions occur and consequently the more power is used. Table I also lists the best known low-power SAR ADC, which reports equivalent 10-bit 10 MSps performance, comparable to our scheme in terms of speed, resolution, and power consumption. We
would like to point out the proposed compressed LCS scheme
has sub-Nyquist rate in analog to digital conversion. This
unique feature allows reduced data volume in the wireless or
wireline transceiver. Section V illustrates an example in an
automotive collision warning system. The reduced data volume
means additional power saving can be expected when sending
compressed data, at the cost of an extra signal reconstruction
stage, which is implemented in a power-insensitive host.
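For readers who wish to reproduce the comparison metric, Eq. (6) can be evaluated as follows. The numbers in the example are placeholders and do not correspond to any entry of Table I.

```python
def figure_of_merit(power_w, enob, bandwidth_hz):
    """Energy per conversion step in joules, following Eq. (6)."""
    return power_w / (2 ** enob * 2 * bandwidth_hz)

# Placeholder values: 1 mW, 10 effective bits, 5 MHz bandwidth.
fom = figure_of_merit(power_w=1e-3, enob=10, bandwidth_hz=5e6)
print(f"{fom * 1e15:.1f} fJ per conversion step")
```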
A pulsed signal pair (Vup, Vdown) from the CT-TE hardware
results in a sparse analog ternary output, enabling a time-
delay measurement using CS technique without explicitly
counting the clock periods. The ternary state is input to a
low complexity digital random sampler (DRS) (Fig. 6) with
only one MUX and one accumulator for sub-Nyquist sampling.
The DRS exhibits many advantages compared with RD, PSCS
and MWC for time sparse signals, e.g. low complexity, ease
of massive integration, excellent linearity, and digital technology scaling. In addition, the analog integrator that performs
the inner product calculation suffers from a saturation issue
TABLE I
COMPARISON ON LOW-POWER LEVEL CROSSING SAMPLER (LCS) AND SUCCESSIVE APPROXIMATION (SAR) DESIGNS
inevitably, due to the central limit theorem according to [45], [46], which significantly degrades the signal reconstruction from compressed samples. Though the stability issue can be resolved by moving the pole to a higher frequency with a resistor divider in the feedback loop, this complicates signal reconstruction. However, the saturation issue does not arise in the proposed compressed LCS scheme, because the CT-TE converts the signal’s amplitude variations into ternary symbols. The compressed samples are generated in the digital domain by the digital random sampler shown in Fig. 6, which is designed to be saturation-proof at accumulation through word length extension per stage. Therefore, the proposed scheme differs from conventional compressive sensing front-end hardware in terms of saturation nonlinearity. As for thermal noise, “Vin” in Fig. 2 comes from a standard RF receiver. The RF design is outside the scope of this paper. We assume the out-of-band noise is suppressed by the RF chain, and the in-band noise can be handled by the embedded Schmitt trigger design in threshold generation. More details are given in [13].
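A behavioral view of the digital random sampler may help here. The sketch below is our own simplified model, assuming each compressed sample is the accumulation of the ternary stream multiplied by a random ±1 chip sequence; the function name and the chip generation are illustrative and are not taken from the hardware in Fig. 6.

```python
import random

def digital_random_sampler(ternary, num_measurements, rng):
    """Accumulate a ternary stream against random +/-1 chip sequences.

    Returns (phi, y), where phi[m][n] is the chip applied to symbol n in
    measurement m and y[m] = sum_n phi[m][n] * ternary[n].
    """
    phi, y = [], []
    for _ in range(num_measurements):
        chips = [rng.choice((-1, 1)) for _ in ternary]
        acc = 0  # Python integers never overflow, standing in for word-length extension
        for chip, symbol in zip(chips, ternary):
            # MUX: pass the symbol or its negation, depending on the chip.
            acc += symbol if chip == 1 else -symbol
        phi.append(chips)
        y.append(acc)
    return phi, y

stream = [0, 0, 1, 0, -1, 0, 0, 0]   # sparse ternary timing signal (illustrative)
phi, y = digital_random_sampler(stream, 4, random.Random(1))
print(y)
```

Because the accumulator only ever adds −1, 0, or +1, its magnitude is bounded by the number of nonzero symbols, which is why a modest word length suffices for sparse inputs.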
State-of-the-art TDC designs [47], [48] have difficulty supporting the 60 dB dynamic range (from 1 ms to 1 ns) of time delay between two successive pulses found in many pulsed IoT applications. Recent work [47] reported a 0.4 mW TDC design fabricated in a 0.11 µm CMOS process, which supports an input signal bandwidth of up to 50 kHz only. Work [48] showed a 29.4 Msample/s, 2.8 mW TDC design fabricated in a 65 nm CMOS process. However, the designs in [47] and [48] can only support a dynamic range of 40 dB and 35 dB, respectively, in time delay measurement. Neither meets the desired 60 dB performance.
IV. SPLS HARDWARE DESIGN
Denote $x \in \mathbb{R}^N$ to be the $K$-sparse ternary signal, with entries in $\{-1, 0, 1\}$, and $y \in \mathbb{R}^M$ to be the CS measurements. A window mask $\Lambda$ of length $L$ slides along the columns of matrix $\Phi$. At each time increment, a subset of consecutive columns is selected, and $\Phi$ can be split into $\lceil N/L \rceil$ mutually exclusive pieces. For the $i$th piece, $\Phi$ is represented by two matrices $\Phi_{\Lambda}^{(i)}$ and $\Phi_{\Lambda^c}^{(i)}$, where $\Lambda^c$ is the complementary set of $\Lambda$. Dropping $(i)$, $y$ is denoted as
$$y = \Phi_\Lambda x_\Lambda + \Phi_{\Lambda^c} x_{\Lambda^c} + w. \qquad (7)$$
The SPLS algorithm was first introduced in [14] for signal
reconstruction from compressed samples generated by the
compressed LCS scheme. It uses conventional LS, followed
by thresholding to yield a ternary signal reconstruction. The
derivation of the SPLS statistics and optimal threshold for
ternary detection was studied in [14]. The SPLS estimate of
each column window is,
$$\hat{x}_\Lambda = \left(\Phi_\Lambda^T \Phi_\Lambda\right)^{-1} \Phi_\Lambda^T y. \qquad (8)$$
The thresholds TH for (1 − α) hypothesis confidence are
shown below, where Tα is a constant. Fig. 7 illustrates a block
diagram of the proposed SPLS scheme, where multiple parallel
LS estimators are employed for real-time signal reconstruction.
We use standard theory to find the thresholds to be [14],
$$TH = \bar{x}_p \pm T_\alpha \cdot \frac{\mathrm{Std}(x_p)}{\sqrt{p}}, \qquad (9)$$
where $\bar{x}_p$ is the mean of $p$ estimates of $x$, and $\Pr(-T_\alpha < t < T_\alpha) = 1 - \alpha$.
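The interval in Eq. (9) can be computed from a handful of repeated estimates. The sketch below assumes the sample standard deviation (as returned by `statistics.stdev`) for Std(·) and treats $T_\alpha$ as a constant supplied by the user; both choices are assumptions of this example.

```python
import math
import statistics

def ternary_thresholds(estimates, t_alpha):
    """Lower and upper thresholds of Eq. (9) from p repeated estimates."""
    p = len(estimates)
    mean = statistics.mean(estimates)
    half_width = t_alpha * statistics.stdev(estimates) / math.sqrt(p)
    return mean - half_width, mean + half_width

# Four illustrative estimates of one entry whose true value is +1.
low, high = ternary_thresholds([0.9, 1.0, 1.1, 1.0], t_alpha=2.0)
print(low, high)
```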
The SPLS yields an over-determined LS estimation in each
random projection section. In addition, signal recovery of each
section is statistically independent, which is a unique advan-
tage compared with general compressive sensing optimization
for reconstruction such as OMP [20], and IRLS [21]. These
conventional CS signal reconstruction algorithms are iterative
optimization routines that are not readily parallelizable. The
independent recovery of each section allows a fully parallel
design. Note that the recovered sections can be employed
to prune the interference from nonzero elements outside of
the projection window when processing the remaining section
[14].
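The window-by-window recovery described above can be sketched in a few lines. The code below is a simplified software model, not the hardware algorithm: it solves each window's normal equations exactly by Gauss-Jordan elimination and replaces the statistical thresholds of Eq. (9) with a fixed threshold of 0.5, which is our simplification. All function names are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def spls(phi, y, window, threshold=0.5):
    """Per-window LS estimate (Eq. (8)) followed by ternary thresholding."""
    m, n = len(phi), len(phi[0])
    x_hat = []
    for start in range(0, n, window):
        cols = range(start, min(start + window, n))
        # A = Phi_L^T Phi_L and B = Phi_L^T y for the current window.
        a = [[sum(phi[r][i] * phi[r][j] for r in range(m)) for j in cols] for i in cols]
        b = [sum(phi[r][i] * y[r] for r in range(m)) for i in cols]
        for value in solve(a, b):
            x_hat.append(1 if value > threshold else -1 if value < -threshold else 0)
    return x_hat

# Sanity check with an identity sensing matrix, where recovery is exact.
identity = [[1 if r == c else 0 for c in range(6)] for r in range(6)]
x_true = [0, 1, 0, 0, -1, 0]
print(spls(identity, x_true, window=2))
```

Each loop iteration touches only its own window, so the iterations could run on separate estimators, which mirrors the parallelism argued for in the text.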
In the SPLS scheme, the most complex hardware component
is the LS estimator, which includes matrix multiplication and
inversion, with matrix inversion the most computationally
intensive. Conventional VLSI design for general Hermitian
and positive-definite matrix inversion includes QR decomposi-
tion, Cholesky decomposition, and others. For a problem size
of $L$, Cholesky decomposition requires a total of $L$ divisions,
$\tfrac{1}{3}L(L^2 - 1)$ multiplications and $\tfrac{1}{3}L(L^2 - 1)$ additions.
The number of multiplications grows cubically with $L$.
For our case note that the matrix $A = \Phi_\Lambda^T \Phi_\Lambda$ has very
small off-diagonal elements due to the incoherent sampling
kernel. Define $A = D + E$, where matrices $D$ and $E$ contain the
diagonal and off-diagonal elements of $A$, respectively. Noting
TABLE II
COMPUTATIONAL COMPLEXITY OF MATRIX INVERSION SCHEMES, FOR
HERMITIAN AND POSITIVE-DEFINITE MATRIX INPUT WITH DIMENSION L-BY-L
Fig. 8. Mean squared error (MSE) of least squares estimate parameterized by the number of quantization bits Q at 10dB signal to noise ratio (SNR).
Fig. 9. MSE of LS estimate versus SNR when using 10-bit word length in fixed-point representation.
that $E^n \to 0$ when $n \to \infty$, the inverse of matrix $A$ can be rewritten as
$$A^{-1} = (D + E)^{-1} = \left(I + D^{-1}E\right)^{-1} D^{-1} = \sum_{n=0}^{\infty} \left(-D^{-1}E\right)^n D^{-1}. \qquad (10)$$
Now consider the k-term approximation of the matrix inverse,
$$A_k^{-1} = \sum_{n=0}^{k-1} \left(-D^{-1}E\right)^n D^{-1}. \qquad (11)$$
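Now, a 1-term approximation is $A_1^{-1} = D^{-1}$, while a 2-term approximation is $A_2^{-1} = D^{-1} - D^{-1}ED^{-1}$. In general, the residual error with a $k$-term approximation is
$$\Delta_k = \sum_{n=k}^{\infty} \left(-D^{-1}E\right)^n D^{-1} = \left(-D^{-1}E\right)^k \left(I + D^{-1}E\right)^{-1} D^{-1} = \left(-D^{-1}E\right)^k A^{-1}. \qquad (12)$$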
Therefore, the LS estimate of $x_\Lambda$ with a $k$-term matrix inverse approximation can be written as
$$\tilde{x}_\Lambda = A_k^{-1}\Phi_\Lambda^T y = (I - \Delta_k A)\,A^{-1}\Phi_\Lambda^T y = (I - \Delta_k A)\,\hat{x}_\Lambda. \qquad (13)$$
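The truncated series of Eq. (11) is straightforward to evaluate numerically. The following floating-point sketch (not the fixed-point hardware) builds the $k$-term approximation for a small, diagonally dominant matrix chosen only for illustration; the helper names are ours.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def k_term_inverse(a, k):
    """Approximate inverse sum_{n=0}^{k-1} (-D^{-1} E)^n D^{-1}, as in Eq. (11)."""
    size = len(a)
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(size)] for i in range(size)]
    # -D^{-1} E, where E holds the off-diagonal elements of A.
    neg_d_inv_e = [[0.0 if i == j else -a[i][j] / a[i][i] for j in range(size)]
                   for i in range(size)]
    term = d_inv                         # n = 0 term
    total = [row[:] for row in d_inv]
    for _ in range(1, k):
        term = matmul(neg_d_inv_e, term)  # next power of (-D^{-1} E), times D^{-1}
        total = [[t + u for t, u in zip(tr, ur)] for tr, ur in zip(total, term)]
    return total

a = [[1.0, 0.1], [0.1, 1.0]]   # illustrative matrix with small off-diagonal terms
for k in (1, 2, 3):
    print(k, k_term_inverse(a, k))
```

For this matrix the exact inverse has diagonal entries 1/0.99 and off-diagonal entries −0.1/0.99, and each added term brings the approximation closer to those values.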
Equation (13) shows that the approximation error in the LS estimate depends on the residual error term $\Delta_k A$. The following
theorem characterizes the probability that the residual error of
a $k$-term approximation exceeds a given threshold $\alpha > 0$.
Theorem 1. Let $\Phi_\Lambda \in \mathbb{R}^{M \times L}$ come from a modified Bernoulli random process, taking values from $\{1, -1\}$ with equal probability. Then, for an integer $k > 0$ and any $\alpha > 0$, we have
$$\Pr\left(\|\Delta_k A\|_F^2 > \alpha\right) \le \frac{2L^2}{M^2 \alpha^{2/k}}, \qquad (14)$$
where $\|\cdot\|_F$ denotes the Frobenius norm.
The proof of Theorem 1 is given in the Appendix. Theorem 1 suggests that by increasing the number of samples $M$, we can constrain the probability of a large residual error caused by the finite term approximation to be arbitrarily small. The bound on this probability decays as $1/M^2$. Since $L$ is less than $M$ by default in the SPLS scheme, the $k$-term approximation approach has a well-bounded residual error probability.
The proposed $k$-term approximation to the matrix inverse has scalable hardware complexity depending on $k$. For example, a 2-term approximation has only $L$ divisions and $2(L^2 - L)$ multiplications. Table II compares the computational complexity of different matrix inversion schemes. Specializing to our case, the proposed $k$-term approximation method offers considerable hardware savings over general inversion algorithms.
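To give a feel for the savings, the multiplication counts quoted in the text for Cholesky decomposition and for the 2-term approximation can be compared directly. The window length $L = 100$ below is only an example value.

```python
def cholesky_multiplications(size):
    """Multiplications for Cholesky based inversion: L(L^2 - 1)/3."""
    return size * (size ** 2 - 1) // 3   # always an exact integer

def two_term_multiplications(size):
    """Multiplications for the 2-term approximation: 2(L^2 - L)."""
    return 2 * (size ** 2 - size)

L = 100  # example window length
print(cholesky_multiplications(L), two_term_multiplications(L))
```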
We now consider quantization effects in the proposed $k$-term matrix inversion in a stand-alone LS problem. Let the elements of $\Phi_\Lambda$ have a standard normal distribution, with dimension 10,000×100. We consider an LS approach to the linear problem $y = \Phi_\Lambda x + w$, where $x$ is a random ternary signal (not necessarily sparse), and $w$ is from the Gaussian distribution $\mathcal{N}(0, \sigma^2)$.
Define the mean squared error (MSE) and signal-to-noise ratio (SNR) in Eqs. (15) and (16), respectively, as
$$\mathrm{MSE} = \frac{\|\hat{x} - x\|_2^2}{\|x\|_2^2}, \qquad (15)$$
TABLE III
COMPARISON OF IMPLEMENTATION RESULTS OF THE PROPOSED SPLS AND OMP RECONSTRUCTION ALGORITHMS USING CMOS TECHNOLOGY
TABLE IV
COMPARISON OF IMPLEMENTATION RESULTS OF THE PROPOSED SPLS AND OMP RECONSTRUCTION ALGORITHMS USING XILINX FPGA
Fig. 10. Timing flow diagram of the LS calculation in the proposed SPLS estimator hardware.
$$\mathrm{SNR} = 10 \log_{10} \frac{P_{\mathrm{signal}}}{P_{\mathrm{noise}}} \ \text{(dB)}. \qquad (16)$$
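The two metrics of Eqs. (15) and (16) translate directly into code. The vectors and powers in the example are arbitrary and serve only to exercise the functions.

```python
import math

def mse(x_hat, x):
    """Normalized mean squared error of Eq. (15)."""
    err = sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return err / sum(b ** 2 for b in x)

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in dB, Eq. (16)."""
    return 10.0 * math.log10(signal_power / noise_power)

print(mse([1, 0, -1, 0], [1, 0, -1, 1]))   # one wrong entry out of three nonzeros
print(snr_db(10.0, 1.0))
```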
Fig. 8 shows the MSE of the LS estimate parameterized by
finite word-length Q at 10dB SNR. Fixed-point and floating-
point Cholesky decomposition are compared. As shown in Fig.
8, the fixed-point Cholesky decomposition achieves floating
point performance when Q is larger than 13 at 10dB SNR. An
MSE floor appears in the k-term approximation method when
Q is greater than 10, because the finite term approximation in
Eq. (13) introduces a small residual error. A larger k results
in less residual error and hence smaller MSE. However, note
that the MSE improvement becomes marginal when k > 2.
Fig. 9 shows the MSE of the LS estimate versus SNR with a
10-bit word-length in fixed-point representation. At relatively
high SNR (SNR ≥ 15dB), 10-bit quantization is insufficient to
preserve MSE performance as both the approximation scheme
and Cholesky decomposition have MSE error floors. However,
at lower SNR (SNR ≤ 10dB), the 2-term approximation has
a similar MSE compared to Cholesky decomposition.
Based on Figs. 8 and 9, the 2-term approximation is a good
alternative to Cholesky decomposition at low to medium SNR,
with quadratic (rather than cubic) multiplication growth with
the problem size, and no additions. Consequently, we adopt the
10-bit 2-term matrix inversion approximation in the following.
The LS estimation can be decomposed into two parts: three matrix multiplications and matrix inversion. The three matrix multiplications are $A = \Phi_\Lambda^T \Phi_\Lambda$, $B = \Phi_\Lambda^T y$ and $A^{-1}B$. Each matrix multiplication involves a collection of independent inner product calculations. To optimize hardware reuse, the matrix multiplications are carried out by one generic inner product module with time multiplexing. A finite state machine (FSM) is designed to select the correct input vectors and feed them to the inner product module. For example, the element $a_{i,j}$ in matrix $A$ is calculated from the $i$th row of $\Phi_\Lambda^T$ and the $j$th column of $\Phi_\Lambda$. Note that only the upper triangle of $A$ needs calculation due to matrix symmetry. Also, the diagonal elements of $A$ are all equal to 1 if $\Phi_\Lambda$ comes from a normalized Bernoulli random matrix. In this case, the first two matrices $A$ and $B$ can be computed together with a MUX at the input to minimize reconstruction time, and the 2-term approximation to the inversion of matrix $A$ can be simplified to element-wise operations. Fig. 10 is a timing flow diagram of the LS estimator. Because the calculations of $A = \Phi_\Lambda^T \Phi_\Lambda$ and $B = \Phi_\Lambda^T y$ are independent of each other, they are grouped in Phase 1, followed by $A^{-1}B$ in Phase 2.
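The two-phase flow can be mimicked in software. The sketch below assumes unit-norm columns, so that $D = I$ and the 2-term approximation reduces to $A^{-1}B \approx (I - E)B$; a single `inner` function plays the role of the shared inner product module. The function names and the small test window are illustrative only.

```python
def inner(u, v):
    """Generic inner product, reused for every matrix element."""
    return sum(p * q for p, q in zip(u, v))

def ls_two_term_unit_diag(phi_win, y):
    """2-term LS estimate for a window whose columns have unit norm (D = I)."""
    m, width = len(phi_win), len(phi_win[0])
    cols = [[phi_win[r][c] for r in range(m)] for c in range(width)]
    # Phase 1: B = Phi^T y and the upper triangle of A = Phi^T Phi.
    b = [inner(col, y) for col in cols]
    e = [[0.0] * width for _ in range(width)]
    for i in range(width):
        for j in range(i + 1, width):
            e[i][j] = e[j][i] = inner(cols[i], cols[j])  # symmetry: compute once
    # Phase 2: A^{-1} B is approximated by (I - E) B.
    return [b[i] - inner(e[i], b) for i in range(width)]

# Two unit-norm columns with correlation 0.5; the measurement is the first column.
phi_win = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, -0.5]]
y = [0.5, 0.5, 0.5, 0.5]
print(ls_two_term_unit_diag(phi_win, y))
```

In this deliberately correlated example the estimate of the active entry is biased toward zero, but it remains on the correct side of a mid-level threshold.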
The SPLS reconstruction algorithm was designed in Verilog
at 65nm CMOS technology. Table III compares the hardware
results of the proposed SPLS scheme and existing designs
using the OMP algorithm, where problem size describes the
dimension of the matrix $\Phi$. Table III shows that the SPLS has
the shortest reconstruction time due to the parallel architecture.
In the simulated SPLS design, two parallel LS estimators
were implemented. Additional parallelization can be readily
implemented. Table III shows the SPLS has the smallest area
compared with methods from [32] and [33], due to the low-
cost k-term matrix inversion approximation.
Power is estimated from Power Artist simulation. Table III
shows the SPLS scheme can potentially achieve one fifth the
power of the OMP matrix inversion design in [33]. As a further
comparison, FPGA-based implementations are listed in Table
IV, comparing the proposed SPLS design with schemes from
[32], [34] and [35]. The SPLS exhibits the best data throughput
performance with the least hardware cost.
V. RADAR SYSTEM DESIGN EXAMPLE
In this section, we evaluate the compressed LCS scheme
and the SPLS signal reconstruction algorithm in an automotive
collision warning system with in-band wireless interference.
Please note that the compressed LCS scheme is not limited to the automotive collision warning application. Instead, the proposed compressed LCS scheme serves as a low-power analog-to-digital sampling scheme and can be applied to many other IoT enabled applications that are driven by event-based spike-like signals, such as biomedical and environment monitoring. Analysis of these other IoT applications is not included in this paper.
The growing use of radar and communications motivates the
development of a system that allows co-existence of both radar
and wireless communications in the same spectrum allocation.
Conventional interference-robust receiver solutions include
parallel architectures such as analog filter banks to notch out
the undesired interference [36], and advanced spatial signal
processing using MIMO [37]. Both solutions face challenges
related to the power and complexity of the front-end hardware
due to the multiple parallel processing channels. A
high-resolution radar requires broadband transmit signals,
and the Shannon sampling theorem requires a sampling
rate of at least twice the signal's bandwidth. Conventional
Nyquist analog-to-digital converters (ADCs) therefore produce
an excessive data volume, which challenges not only the
integration of a large number of high-speed, high-resolution
ADCs but also the high-speed data links to the subsequent
digital processing.
Recently, a few range finder designs based on impulse-radio
ultra-wideband (IR-UWB) have been reported [51], [52]. The work in
[51] uses a time-extension approach in quantization to relax the
ADC speed constraint, but the sampling stage still runs at the
Nyquist rate. In addition, leading-edge detection is required in a
multipath environment; otherwise, the actual echo from the direct
path may be missed during the dead time. The work in [52] adopts a
two-step TDC with scalable delay cells. The finite switching time
between different delay cells increases the ranging error, and
the power cost of this architecture is generally higher than
that of other approaches. These limitations motivate a more
power-efficient TDC design.
Some CS-based compressed sampling schemes have been
reported for radar [38], [39]. However, the RD approach in
[38] and the MWC approach in [39] are challenged by
power and complexity, as discussed earlier. Moreover, these designs
rely on random-basis compressed sensing and inevitably suffer
an SNR loss proportional to the compression ratio.
Consider an automotive collision warning system with a
maximum unambiguous range of 90 m to support a safe
braking and stopping distance at a speed of 100 km/h, which
is roughly 80 m. We choose a range resolution of 0.02 m
for urban applications, which requires a linear frequency
modulated (LFM) transmit pulse with 8 GHz bandwidth. We simulate
the radar with a detection probability (Pd) of 0.9 and a false
alarm probability (Pfa) of $10^{-6}$. Some key parameters of
the simulated collision avoidance radar system are listed in
Table V. A 6 GHz carrier frequency places the radar pulse in
the 2 GHz to 10 GHz spectrum, where wireless communication
systems (WLAN and cellular) are also present as interference
with an assumed power level from −50 dBm to −70 dBm.
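The link between these system numbers can be checked with the standard pulsed-radar relations $\Delta R = c/(2B)$ for range resolution and $\mathrm{PRF} \le c/(2R_{max})$ for unambiguous range. The short sketch below is only a sanity check under those textbook relations; the function names are hypothetical, and the PRF limit is derived here for illustration rather than quoted from Table V.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(bandwidth_hz):
    """Range resolution of a pulse-compression radar with the given bandwidth."""
    return C / (2.0 * bandwidth_hz)


def max_prf_hz(unambiguous_range_m):
    """Highest pulse repetition frequency that keeps the given range unambiguous."""
    return C / (2.0 * unambiguous_range_m)


def occupied_band_hz(carrier_hz, bandwidth_hz):
    """Lower and upper edge of the spectrum occupied by the transmit pulse."""
    return carrier_hz - bandwidth_hz / 2.0, carrier_hz + bandwidth_hz / 2.0


resolution = range_resolution_m(8e9)   # roughly 0.019 m, i.e. about 0.02 m
prf_limit = max_prf_hz(90.0)           # roughly 1.67 MHz
band = occupied_band_hz(6e9, 8e9)      # 2 GHz to 10 GHz
```

An 8 GHz bandwidth therefore gives slightly better than the 0.02 m target, and a 6 GHz carrier with that bandwidth spans the 2 GHz to 10 GHz band stated above.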
A filter bank is employed to notch out wireless interference
while providing multiple channels for parallel processing. The
filter bank consists of 20 second-order filters that divide the
signal bandwidth into 20 equally spaced sub-bands of 400 MHz
each. Fig. 11 illustrates the filter bank's spectral response
(left) and the output waveform of each channel (right) when
radar signals and wireless interference are simultaneously
present. It shows that a tone interferer contaminates
the 19th channel and its neighboring channels with a power
much stronger than the radar signal, whereas channels far away
from the interference, e.g., the 1st to 17th channels, have high
interference attenuation. Zooming in on the pulse area of one
channel reveals a waveform similar to that shown in Fig. 1(b).
The level crossing points and the corresponding time instants
are also marked.
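As a rough illustration of the channelization, the sketch below lists the sub-band edges and finds which channel contains a given tone. It assumes that the channels tile the 2 GHz to 10 GHz band and are numbered from the lowest frequency upward; the 9.4 GHz tone is a hypothetical interferer chosen so that it falls in the 19th channel, and the function names are illustrative only.

```python
def subband_edges_hz(f_low_hz, f_high_hz, num_channels):
    """Split [f_low, f_high) into equally wide sub-bands, lowest frequency first."""
    width = (f_high_hz - f_low_hz) / num_channels
    return [(f_low_hz + k * width, f_low_hz + (k + 1) * width)
            for k in range(num_channels)]


def channel_of(freq_hz, edges):
    """Return the 1-based index of the sub-band containing freq_hz, or None."""
    for index, (low, high) in enumerate(edges, start=1):
        if low <= freq_hz < high:
            return index
    return None


edges = subband_edges_hz(2e9, 10e9, 20)
width_hz = edges[0][1] - edges[0][0]      # 400 MHz per channel
tone_channel = channel_of(9.4e9, edges)   # hypothetical tone at 9.4 GHz
```

Under these assumptions the 19th channel covers 9.2 GHz to 9.6 GHz, so a tone there leaks mainly into channels 18 and 20 through the finite roll-off of the second-order filters.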
Fig. 12 shows the proposed radar receiver with emphasis
on the mixed-signal interface. z(t) and y[n] are the input
and output of the compressed LCS scheme, respectively. It is
worth noting that in some cases clutter returns from trees or
buildings can be strong and challenge desired target detection
and tracking. In such a case the return may become less
sparse, but clutter mitigation schemes such as the displaced
phase center antenna (DPCA) pulse canceller [40] and adaptive
DPCA can be employed.
We first evaluate the SPLS performance using synthetic signals,
which allow the SPLS estimate to be examined in a standard,
controllable setup before moving to actual radar signals.
We believe this part of the analysis also applies
to other pulse-like signals, such as ultrasound, when they have
similar signal properties after the CT-TE scheme. The ternary
3-bit CT-TE output model is adopted to examine the statistics of
the SPLS estimator. We simulate the ternary timing signal with
a sparsity of 0.2%. Fig. 13 compares the theoretical analysis
of the bias and variance of the SPLS estimator with simulations at
0 dB SNR. The simulation results illustrate the inverse relationship
between the bias and variance of the SPLS estimator and the
normalized sub-Nyquist sampling ratio (SSR, defined as the
ratio of the actual sampling rate $f_s$ to the Nyquist rate
$f_{Nyq}$): both decrease as the SSR increases. Fig. 13 indicates
a small bias and variance even at low SNR,
because the SPLS scheme converts each section recovery into
a locally over-determined LS problem that is noise robust.
Although interference from outside the projection window
affects the LS estimate, it is small compared with the unit
amplitude of the ternary signal due to the use of an incoherent
sensing matrix.
A clustering parameter (CP) is defined as the minimum
number of nonzero elements in a cluster. By controlling
CP, ternary-valued signals with dense or scattered distributions
are emulated for the CT-TE with different quantization bits in
the DAC, which simulates the density of the scatterers in the
scene. The signal dimension is set to 10,000 with a sparsity
of 0.2% and an SSR of 0.08, and three normalized windowing
sizes, L = 0.04, 0.08 and 0.12, are used with 100,000 Monte
Carlo simulation trials.
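A test signal of this kind might be generated as in the sketch below. The placement rule is an assumption made for illustration: the nonzero symbols are split into as many clusters as the CP allows, and each cluster is dropped at a random position inside its own slot. The function name `clustered_ternary` and the choice CP = 5 are hypothetical; the text does not specify how the clusters were drawn.

```python
import random


def clustered_ternary(n, sparsity, cp, rng):
    """Length-n list over {-1, 0, +1} whose nonzeros form clusters of at least cp."""
    k = round(n * sparsity)            # total number of nonzero symbols
    if not 1 <= cp <= k:
        raise ValueError("cp must be between 1 and the number of nonzeros")
    num_clusters = k // cp             # as many clusters as the minimum size allows
    base, extra = divmod(k, num_clusters)
    slot = n // num_clusters           # each cluster stays inside its own slot
    x = [0] * n
    for c in range(num_clusters):
        size = base + (1 if c < extra else 0)   # every size is at least cp
        start = c * slot + rng.randrange(slot - size + 1)
        for i in range(start, start + size):
            x[i] = rng.choice((-1, 1))
    return x


N = 10_000
signal = clustered_ternary(N, 0.002, cp=5, rng=random.Random(1))
num_nonzero = sum(1 for v in signal if v != 0)   # 0.2% of 10,000 symbols
num_measurements = round(0.08 * N)               # SSR of 0.08
```

With the stated parameters, each trial carries 20 nonzero symbols and is compressed to 800 measurements.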
Fig. 11. Response of a 20-channel analog filter bank when pulsed radar signals and interference from wireless communication are simultaneously present.
Fig. 12. Architecture of the proposed interference-robust receiver with emphasis on the proposed compressed LCS sampling scheme.
Fig. 13. Analysis and simulation bias and variance comparison for the proposed SPLS scheme at 0dB SNR.
Error rate (ER) is defined as the number of error symbols
after ternary detection divided by the total number of received
symbols. Fig. 14 shows the ER performance versus normalized
CP, parameterized by different normalized windowing sizes in
the split projection. ER improves slightly when CP increases
in the noise-free and 5 dB SNR cases. The improvement arises
because a larger CP results in a more concentrated
distribution of nonzero elements. This reduces the possibility
of a large number of nonzero elements falling outside of a split
projection window, and hence reduces the probability of large
interference in the LS estimate. The ER also improves as the
projection window size increases. OMP [20] is included for
comparison. The CP has only a marginal influence on ER, as shown
in Fig. 14. The ER remains small over a wide range of
CP, which indicates that the SPLS works well when the ternary
signal is clustered in time, as can occur in radar with dense
scattering.
Fig. 14. Error rate (ER) performance parameterized by normalized clustering parameters and normalized windowing sizes in the proposed SPLS scheme.
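The ER definition can be made concrete with a small sketch. The symmetric ±0.5 decision threshold is used here only for illustration (the SPLS estimator uses the optimal thresholds of (9)), and the estimate values are made-up numbers rather than simulation output.

```python
def detect_ternary(estimates, threshold=0.5):
    """Map real-valued estimates to {-1, 0, +1} with a symmetric threshold."""
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in estimates]


def error_rate(detected, transmitted):
    """Number of wrongly detected symbols divided by the total number of symbols."""
    if len(detected) != len(transmitted):
        raise ValueError("sequences must have the same length")
    errors = sum(1 for d, t in zip(detected, transmitted) if d != t)
    return errors / len(transmitted)


x_true = [0, 1, 0, 0, -1, 0, 0, 1, 0, 0]
x_hat = [0.1, 0.9, -0.2, 0.6, -0.8, 0.0, 0.3, 0.4, -0.1, 0.05]  # illustrative estimates
detected = detect_ternary(x_hat)
er = error_rate(detected, x_true)  # one false symbol and one missed symbol out of ten
```

Here the fourth estimate (0.6) is detected as +1 although a zero was sent, and the eighth (0.4) is detected as zero although +1 was sent, so two of the ten symbols are in error.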
The window size of the split projection has a relatively
larger impact on ER compared with CP, especially for the
noisy case. This is because both noise and interference affect
the error rate. Consider each column of the measurement
matrix as a signal representation atom. Larger L provides a
larger dictionary to interpret the compressed measurements
during each section recovery, and reduces the possibility of
fitting noise and interference that degrades the ER at lower
SNR. A larger L naturally increases the computational cost;
recall that the computational complexity of the SPLS scheme
is $O(NML)$. This provides a useful system tradeoff depending
on the particular application.
Actual radar signals are now used to investigate radar
system metrics when the SPLS is used, for example,
error rate, probability of detection (Pd) and probability of
false alarm (Pfa). Fig. 15 compares the ER of ternary timing
signal recovery via state-of-the-art CS reconstruction algorithms
(basis pursuit [29], OMP [20], stage-wise OMP (StOMP) [30],
Bayesian CS [41] and smoothed-L0 [42]) at 0 dB to 5 dB SNR.
The ternary timing signal output from the CT-TE scheme has
an average sparsity of 0.38%, which suggests an SSR of 0.13
in the simulations, or an approximately 8-fold data volume
reduction compared with Nyquist-rate sampling. In Fig. 15, the
SPLS estimator has the best ER performance over the entire
SNR range. This important performance advantage comes
from the use of a “tall” matrix in each recovery that leads
to a locally over-determined LS problem for each section,
yielding good noise robustness. The optimal thresholds in (9)
are selected for the SPLS estimator. We set the threshold to
±0.5 for the conventional CS recovery algorithms because
these CS estimators yield an exact signal recovery with high
probability, so they are unbiased with high probability. It has
been demonstrated in [43], both theoretically and numerically,
that basis pursuit de-noising (BPDN), the Dantzig selector,
and OMP all have an MSE proportional to the oracle estimator
MSE, which coincides with the unbiased Cramér-Rao lower
bound (CRLB) for exactly $K$-sparse deterministic signals. The
oracle estimator performs optimal LS estimation by including
prior knowledge of the sparse signal support [44]. However,
this interpretation must be applied with caution in the
present case, because we exploit prior knowledge of the signal's
ternary structure.
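The data volume reduction quoted for the radar simulation follows directly from the SSR definition, assuming that the data volume scales with the sampling rate at an equal number of bits per sample:

```latex
\text{data volume reduction}
  = \frac{f_{Nyq}}{f_s}
  = \frac{1}{\mathrm{SSR}}
  = \frac{1}{0.13}
  \approx 7.7
```

This is the approximately 8-fold reduction relative to Nyquist-rate sampling.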
Fig. 16 shows the probability of detection (Pd) versus the
probability of false alarm (Pfa), parameterized by the power ratio
of the radar signal to the communication interference (SIR). The
figure shows that the proposed scheme can achieve a Pd of 0.9 and
a Pfa of less than 0.04 with an approximately 8-fold data volume
reduction compared with Nyquist-rate sampling.
As a generic sub-Nyquist ADC, the compressed LCS
scheme is applicable to both filter bank based interference-
robust single antenna radar and MIMO radars that have an
inherent multi-channel architecture. Some extensive studies on
MIMO radars, such as [45], reveal that Pfa can be improved
by two orders of magnitude in a 4-by-4 MIMO configuration.
The compressed LCS and the SPLS scheme would also reduce
the sampling data volume of such a system.
VI. CONCLUSION
We developed a sub-Nyquist sampling scheme for signals that are
sparse in time, especially in low-power IoT applications, that
combines a low-power compressed LCS scheme with a low-complexity
SPLS algorithm. Many sensors envisioned for IoT applications
are event driven with spike-like time signal outputs. Our
design, analysis, and simulation study predict that the FOM
of the proposed LCS scheme is potentially very favorable with
respect to existing designs. We also presented the SPLS sparse
signal reconstruction algorithm, paired with the compressed
LCS, that exploits the ternary-valued CT-TE output. The SPLS
splits the random projections with an $\ell_2$-norm approach
followed by threshold detection. To lower hardware cost, we
presented a k-term approximation for matrix inversion in the
LS estimator. The SPLS has fixed complexity and each section
reconstruction is independent, which enables a fully parallel
hardware architecture. The VLSI design study shows that
the SPLS algorithm can be very power efficient and could
provide higher throughput than conventional CS sparse signal
reconstruction algorithms such as OMP by taking advantage
of the sparse ternary sampling scheme.
APPENDIX
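Theorem 1. Let $\Phi_\Lambda \in \mathbb{R}^{M \times L}$ have i.i.d. real standard
normal entries. Then, for an integer $k > 0$ and any
$\alpha > 0$, we have
$$\Pr\left( \|\Delta_A^k\|_F^2 > \alpha \right) \le \sqrt{\frac{3L^2 (M + 2)}{M^2 \alpha^{1/k}}}, \tag{14}$$
where $\|\cdot\|_F$ denotes the Frobenius norm.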
Proof: We have
$$\mathrm{E}\left( \|\Delta_A^k\|_F^2 \right) = \mathrm{E}\left( \left\| (-D^{-1}E)^k \right\|_F^2 \right) \le \mathrm{E}\left( \left\| D^{-1}E \right\|_F^{2k} \right). \tag{17}$$
TABLE V
RADAR SYSTEM SPECIFICATIONS
Fig. 15. Error rate of the recovered signal after amplitude detection at a sub-Nyquist sampling ratio of 13% in an automotive collision warning system.
Fig. 16. Probability of Detection (Pd) vs. Probability of False Alarm (Pfa) parameterized by communication radar signal to interference power ratio (SIR).
The last inequality follows from the sub-multiplicative
property of the matrix Frobenius norm. $D$ is a real diagonal matrix
with diagonal elements
$$d_{ii} = \varphi_i^T \varphi_i, \tag{18}$$
and $E$ is a real matrix with each element given by
$$e_{ij} = \varphi_i^T \varphi_j, \quad i \ne j. \tag{19}$$
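Therefore, Eq. (17) can be rewritten as
$$
\begin{aligned}
\mathrm{E}\left( \|\Delta_A^k\|_F^2 \right)
&\le \mathrm{E}\left( \left\| D^{-1}E \right\|_F^{2k} \right)
= \mathrm{E}\left( \sum_{i=1}^{L} \sum_{j=1}^{L} \left| \frac{e_{ij}}{d_{ii}} \right|^{2k} \right)
= \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{E}\left( \left| \frac{e_{ij}}{d_{ii}} \right|^{2k} \right) \\
&= \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{E}\left( \left| \frac{\varphi_i^T \varphi_j}{\varphi_i^T \varphi_i} \right|^{2k} \right)
\le \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{E}\left( \frac{1}{\left| \varphi_i^T \varphi_i \right|^{2k}} \right) \mathrm{E}\left( \left| \varphi_i^T \varphi_j \right|^{2k} \right).
\end{aligned}
\tag{20}
$$
The last step follows from the Cauchy-Schwarz inequality.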
If every entry of $\Phi$ is drawn from a modified Bernoulli
random process that takes the values 1 and −1 with equal probability, then
$$\mathrm{E}\left( |\varphi_i^T \varphi_i|^4 \right) = M^4,$$
$$\mathrm{E}\left( |\varphi_i^T \varphi_j|^4 \right) = M^2 + (M - 1) M = 2M^2 - M.$$
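Hence, for a 2-term approximation, Eq. (20) becomes
$$
\mathrm{E}\left( \|\Delta_A^2\|_F^2 \right)
\le \mathrm{E}\left( \left\| D^{-1}E \right\|_F^{4} \right)
\le \sum_{i=1}^{L} \sum_{j=1}^{L} \mathrm{E}\left( \frac{1}{\left| \varphi_i^T \varphi_i \right|^{4}} \right) \mathrm{E}\left( \left| \varphi_i^T \varphi_j \right|^{4} \right)
= \frac{2M^2 - M}{M^4} L^2 \le \frac{2L^2}{M^2}. \tag{21}
$$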
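According to Markov’s inequality, for any $\alpha > 0$, we have
$$
\Pr\left( \|\Delta_A^k\|_F^2 > \alpha \right)
\le \Pr\left( \left\| D^{-1}E \right\|_F^{2k} > \alpha \right)
= \Pr\left( \left\| D^{-1}E \right\|_F^{4} > \alpha^{2/k} \right)
\le \frac{\mathrm{E}\left( \left\| D^{-1}E \right\|_F^{4} \right)}{\alpha^{2/k}}
\le \frac{2L^2}{M^2 \alpha^{2/k}}. \tag{22}
$$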
ACKNOWLEDGEMENTS
The authors acknowledge the contributions of the students,
faculty, and sponsors of the Analog and Mixed-Signal Center
at Texas A&M University. This research project was
partially funded under NSF Award Number 1547436.
REFERENCES
[1] J. A. Stankovic, “Research Directions for the Internet of Things,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 3–9, Feb. 2014.
[2] B. Martinez, M. Monton, I. Vilajosana, and J. Prades, “The Power of Models: Modeling Power Consumption for IoT Devices,” IEEE Sensors Journal, vol. 15, no. 10, pp. 5777–5789, Oct. 2015.
[3] F. Marvasti, “Nonuniform Sampling: Theory and Practice,” Springer, New York, United States, 2001.
[4] J. W. Mark and T. Todd, “A Nonuniform Sampling Approach to Data Compression,” IEEE Trans. on Communications, vol. 29, no. 1, pp. 24–32, Jan. 1981.
[5] Y. Tsividis, “Mixed-Domain Systems and Signal Processing Based on Input Decomposition,” IEEE Trans. on Circuits and Systems I, vol. 53, no. 10, pp. 2145–2156, Oct. 2006.
[6] Y. Tsividis, “Event-Driven Data Acquisition and Digital Signal Processing: A Tutorial,” IEEE Trans. on Circuits and Systems II, vol. 57, no. 8, pp. 577–581, Aug. 2010.
[7] A. Ogweno, P. Degenaar, V. Khomenko, and A. Yakovlev, “A fixed window Level Crossing ADC with activity dependent power dissipation,” 14th IEEE International New Circuits and Systems Conference (NEWCAS), 2016.
[8] W. Tang, A. Osman, D. Kim, B. Goldstein, C. Huang, B. Martini, V. A. Pieribone, E. Culurciello, “Continuous Time Level Crossing Sampling ADC for Bio-Potential Recording Systems,” IEEE Trans. on Circuits and Systems I, vol. 60, no. 6, pp. 1407–1418, Jan. 2013.
[9] L. Yongjia, D. Zhao, W. A. Serdijn, “A Sub-Microwatt Asynchronous Level-Crossing ADC for Biomedical Applications,” IEEE Transactions on Biomedical Circuits and Systems, vol. 7, no. 2, pp. 149–157, April 2013.
[10] B. Schell and Y. Tsividis, “A Continuous-Time ADC/DSP/DAC System With No Clock and With Activity-Dependent Power Dissipation,” IEEE Journal of Solid-State Circuits, vol. 43, no. 11, pp. 2472–2481, Nov. 2008.
[11] Józef Kalisz, “Review of methods for time interval measurements with picosecond resolution,” Metrologia, vol. 41, no. 1, 2004.
[12] G. D. Sasaki, R. C. Jensen, “Automatic Measurements with a High Performance Universal Counter,” Hewlett-Packard Journal (Hewlett-Packard), vol. 31, no. 9, pp. 21–31, Sep. 1980.
[13] J. Zhou, M. Ramirez, S. Palermo, and S. Hoyos, “Digital-Assisted Asynchronous Compressive Sensing Front-end,” IEEE J. on Emerging and Selected Topics in Cir. and Sys., vol. 2, no. 3, pp. 482–492, Sep. 2012.
[14] J. Zhou, S. Hoyos, and B. M. Sadler, “Asynchronous Compressed Beamformer for Portable Diagnostic Ultrasound Systems,” IEEE Trans. on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 61, no. 11, pp. 1791–1801, Nov. 2014.
[15] J. Zhou, Y. He, B. M. Sadler, and S. Hoyos, “Compressed Digital Beamformer with Asynchronous Sampling for Ultrasound Imaging,” IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, May 2013.
[16] J. Zhou, S. Palermo, B. M. Sadler, and S. Hoyos, “Asynchronous Compressive Sensing in Radar Systems,” IEEE Texas Symposium on Wireless and Microwave Circuits and Systems, pp. 1–4, Apr. 2013.
[17] J. Zhou, S. Palermo, José S. Martínez, B. M. Sadler, and S. Hoyos, “Asynchronous Compressive Radar,” the 39th Annual Government Microcircuit Applications & Critical Technology Conference, Mar. 2014.
[18] E. J. Candès, M. B. Wakin, “An Introduction to Compressive Sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, Mar. 2008.
[19] A. Tofighi Zavareh, O. Barajas, M. Serafino, J. Jo, B. Applegate, B. Sadler, and S. Hoyos, “A Novel Continuous Time Ternary Encoding Based SS-OCT Calibration,” in IEEE Biomedical Circuits and Systems Conference, 2016.
[20] J. A. Tropp and A. C. Gilbert, “Signal Recovery from Random Measurements via Orthogonal Matching Pursuit,” IEEE Trans. Inform. Theory, vol. 53, no. 12, pp. 4655–4666, Dec. 2007.
[21] B. Wohlberg and P. Rodriguez, “An iteratively reweighted norm algorithm for minimization of total variation functionals,” IEEE Signal Process. Letter, vol. 14, no. 12, pp. 948–951, Dec. 2007.
[22] E. Candès, J. Romberg and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[23] J. A. Tropp, J. N. Laska, M. F. Duarte, J. K. Romberg and R. G. Baraniuk, “Beyond Nyquist: Efficient Sampling of Sparse Bandlimited Signals,” IEEE Trans. on Information Theory, vol. 56, no. 1, pp. 520–544, Jan. 2010.
[24] D. Gangopadhyay, E. Allstot, A. Dixon, K. Natarajan, S. Gupta, and D. Allstot, “Compressed Sensing Analog Front-End for Bio-Sensor Applications,” IEEE J. Solid-State Circuits, vol. 49, no. 2, pp. 426–438, Feb. 2014.
[25] F. Pareschi, P. Albertini, G. Frattini, M. Mangia, R. Rovatti and G. Setti, “Hardware-Algorithm Co-Design and Implementation of an Analog-to-Information Converter for Biosignals Based on Compressed Sensing,” IEEE Trans. Biomedical Circuits and Systems, vol. 10, no. 1, pp. 149–162, Feb. 2016.
[26] J. N. Laska, S. Kirolos, M. F. Duarte, T. S. Ragheb, R. G. Baraniuk, and Y. Massoud, “Theory and implementation of an analog-to-information converter using random demodulation,” in Proc. of IEEE Int. Symposium on Circuits and Systems, pp. 1959–1962, May 2007.
[27] Z. Yu, S. Hoyos, and B. M. Sadler, “Mixed-signal parallel compressed sensing and reception for cognitive radio,” in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 3861–3864, Mar. 2008.
[28] Y. C. Eldar, P. Kuppinger and H. Bolcskei, “Compressed Sensing of Block-Sparse Signals: Uncertainty Relations and Efficient Recovery,” IEEE Trans. on Signal Processing, vol. 58, no. 6, pp. 3042–3054, 2010.
[29] W. Lu and N. Vaswani, “Modified Basis Pursuit Denoising for noisy compressive sensing with partially known support,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 3926–3929, 2010.
[30] D. Donoho, Y. Tsaig, I. Drori, and J. Starck, “Sparse solution for underdetermined linear equations by stagewise orthogonal matching pursuit,” IEEE Trans. on Inform. Theory, vol. 58, no. 2, pp. 1094–1121, Feb. 2012.
[31] M. Kim, S. Choi, T. Seong, and J. Choi, “A Low-Jitter and Fractional-Resolution Injection-Locked Clock Multiplier Using a DLL-Based Real-Time PVT Calibrator With Replica-Delay Cells,” IEEE J. of Solid-State Circuits, vol. 51, no. 2, pp. 401–411, Feb. 2016.
[32] G. Huang and L. Wang, “High-speed Signal Reconstruction with Orthogonal Matching Pursuit via Matrix Inversion Bypass,” IEEE Workshop on Signal Processing Systems, pp. 191–196, Oct. 2012.
[33] F. Ren, W. Xu, and D. Markovic, “Scalable and parameterised VLSI architecture for efficient sparse approximation in FPGAs and SoCs,” Electronics Letters, vol. 49, no. 23, pp. 1440–1441, Nov. 2013.
[34] H. Rabah, A. Amira, B. K. Mohanty, S. Almaadeed, and P. K. Meher, “FPGA Implementation of Orthogonal Matching Pursuit for Compressive Sensing Reconstruction,” IEEE Trans. VLSI, vol. 23, no. 10, pp. 2209–2220, Oct. 2015.
[35] M. Xing, Q. Wang, G. Wang, and Z. Bao, “A Matched-Filter-Bank-Based 3-D Imaging Algorithm for Rapidly Spinning Targets,” IEEE Trans. on Geoscience and Remote Sensing, vol. 47, no. 7, pp. 2106–2113, 2009.
[36] D. Hai, B. Himed, “Interference Mitigation Processing for Spectrum-Sharing Between Radar and Wireless Communications Systems,” IEEE Trans. on Aerospace and Electronic Systems, vol. 49, no. 3, pp. 1911–1919, Jul. 2013.
[37] M. A. Herman and T. Strohmer, “High-resolution radar via compressed sensing,” IEEE Trans. on Signal Processing, vol. 57, no. 6, pp. 2275–2284, Feb. 2009.
[38] W. Bajwa, K. Gedalyahu, and Y. Eldar, “Identification of Parametric Underspread Linear Systems and Super-Resolution Radar,” IEEE Trans. on Signal Processing, vol. 59, no. 6, pp. 2548–2561, Feb. 2011.
[39] J. R. Guerci, Space-Time Adaptive Processing for Radar. Boston: Artech House, 2003.
[40] S. Ji, Y. Xue, and L. Carin, “Bayesian Compressive Sensing,” IEEE Trans. Inform. Theory, vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
[41] H. Mohimani, M. B. Zadeh, and C. Jutten, “A fast approach for overcomplete sparse decomposition based on smoothed L0 norm,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 289–301, Jan. 2009.
[42] Z. Ben-Haim, Y. Eldar, and M. Elad, “Coherence-based performance guarantees for estimating a sparse vector under random noise,” IEEE Trans. on Signal Processing, vol. 58, no. 10, pp. 5030–5043, Oct. 2010.
[43] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when p is much larger than n,” The Annals of Statistics, vol. 35, no. 6, pp. 2313–2351, 2007.
[44] O’D. Nicholas, and M. F. M. Jose, “On the Product of Independent Complex Gaussians,” IEEE Trans. on Signal Processing, vol. 60, no. 3, pp. 1050–1063, Mar. 2012.
[45] J. Haboba, M. Mangia, F. Pareschi, R. Rovatti, G. Setti, “A pragmatic look at some compressive sensing architectures with saturation and quantization,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 2, no. 3, pp. 443–459, Sept. 2012.
[46] D. Bellasi, L. Benini, “Energy-Efficiency Analysis of Analog and Digital Compressive Sensing in Wireless Sensors,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 62, no. 11, pp. 2718–2729, Nov. 2015.
[47] C. K. Kwon, H. Kim, and J. Park, “A 0.4-mW, 4.7-ps Resolution Single-Loop ΔΣ TDC Using a Half-Delay Time Integrator,” IEEE Trans. on VLSI Systems, vol. 24, no. 3, pp. 1184–1188, Mar. 2016.
[48] K. O. Ragab, H. Mostafa, and A. Eladawy, “A Novel 10-Bit 2.8-mW TDC Design Using SAR With Continuous Disassembly Algorithm,” IEEE Trans. on Circuits and Systems II, vol. 63, no. 10, pp. 909–913, Oct. 2016.
[49] Y. J. Chen, K. H. Chang, and C. C. Hsieh, “A 2.02–5.16 fJ/Conversion Step 10 Bit Hybrid Coarse-Fine SAR ADC With Time-Domain Quantizer in 90 nm CMOS,” IEEE Journal of Solid-State Circuits, vol. 51, no. 2, pp. 357–364, Feb. 2016.
[50] C. C. Liu, “27.4 A 0.35mW 12b 100MS/s SAR-assisted digital slope ADC in 28nm CMOS,” IEEE International Solid-State Circuits Conf. (ISSCC), Feb. 2016.
[51] H. G. Han, B. G. Yu, and T. W. Kim, “19.6 A 1.9mm-precision 20GS/S real-time sampling receiver using time-extension method for indoor localization,” IEEE International Solid-State Circuits Conf. (ISSCC), Feb. 2015.
[52] T. H. Jin, H. G. Han, and T. W. Kim, “Time-of-Arrival Measurement Using Adaptive CMOS IR-UWB Range Finder With Scalable Resolution,” IEEE Trans. on Circuits and Systems I, vol. 63, no. 10, pp. 1605–1615, Sept. 2016.
[53] A. T. Zavareh, O. Barajas and S. Hoyos, “An Efficient Estimation Algorithm for the Calibration of Low-Cost SS-OCT Systems,” IEEE 14th International Symposium on Biomedical Imaging (ISBI), 2017.
[54] O. Barajas, A. T. Zavareh, S. Hoyos, “Towards an On-chip Signal Processing Solution for the Online Calibration of SS-OCT Systems,” IEEE International Symposium on Circuits and Systems (ISCAS), 2017.
Jun Zhou (S’10) received the B.S. degree in electrical engineering in 2007 from Fudan University, Shanghai, China. He received dual M.S. degrees in electrical engineering from Fudan University, Shanghai, China and Royal Institute of Technology (KTH), Stockholm, Sweden in 2010. He received the Ph.D. degree in electrical engineering from Texas A&M University, College Station, TX, USA in 2014. He has one student best paper in IEEE International Symposium on Circuits and Systems (ISCAS). His main research interests are mixed signal processing and low-power VLSI design. He is with Broadcom Limited, focusing on low-power high-performance WPAN system design.
Amir Tofighi Zavareh was born in Tehran, Iran, in 1992. He received his B.S. degree with the Electronic Group in the Electrical Engineering Department from Sharif University of Technology, Tehran, Iran, in 2013. He was with Qualcomm as an interim engineering intern from August 2015 to December 2015. He was a recipient of the Graduate Teaching Fellowship with the Dwight Look College of Engineering at Texas A&M University in spring 2017. He is currently a Ph.D. candidate at Texas A&M University. His research interests include analog/mixed-signal circuit and system design and biomedical imaging.
Robin Gupta is now pursuing his Master of Science degree in Electrical Engineering at Texas A&M University with specialization in Analog/Mixed signals and RF circuits & systems. He was with Texas Instruments, Bangalore as an Analog Application Engineer for High Speed & RF Products for three years. His passion lies in exploring rudimentary inventions and developing refined techniques, which can overcome the fundamental limitations of High Speed Analog Devices (Data Converters, PLLs, Jitter Cleaners, Amplifiers, CDRs, TIAs, Equalizers) and RF (LNA, Mixers, PA) Circuits and Systems.
Liang Liu is an Assistant Professor in the Department of Electrical and Information Technology at Lund University, Sweden. He received his B.S. and Ph.D. degrees in the Department of Electronics Engineering (2005) and Micro-electronics (2010) from Fudan University, China. In 2010, he was with the Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute (New York, USA) as a visiting researcher. He joined Lund University as a Post-doc in 2010. His research interests include wireless communication systems and digital integrated circuit design. Liang has served as a reviewer for over 30 IEEE journals, transactions and international conferences. Liang is a member of the Technical Committee for Circuits and Systems for Communications and VLSI Systems and Applications of the IEEE Circuit and Systems Society and a TC member of the Asia-Pacific Signal and Information Processing Association (APSIPA). He has also served as a technical program committee member of IEEE WCNC, ICCVE, ReConFig, and Norchip. Liang is a board member of the Swedish Chapter of the IEEE Joint Solid-State Circuits and Circuits and Systems Society.
Zhongfeng Wang received both B.E. and M.S. degrees from Tsinghua University, Beijing, China. He obtained the Ph.D. degree from the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis in 2000. He joined Nanjing University in 2016 as a Distinguished Professor through the state’s 1000-talent plan after serving Broadcom Corporation as a leading VLSI architect for nearly nine years. Prior to that, he was an Assistant Professor in the School of EECS at Oregon State University, Corvallis. Even earlier, he worked for National Semiconductor Corporation, Santa Clara, USA.
Dr. Wang is a world-recognized expert on VLSI for Signal Processing Systems. He has published over one hundred and fifty technical papers, edited one book (“VLSI”) and filed tens of U.S. patent applications and disclosures. He was the recipient of the IEEE Circuits and Systems Society VLSI Transactions Best Paper Award in 2007. In the current record (2007-present), he has had five papers ranked among the top twenty most downloaded manuscripts in IEEE Trans. on VLSI Systems. During his tenure at Broadcom, he contributed significantly to 10Gbps and beyond high-speed networking products. Additionally, he has made critical contributions in designing FEC coding schemes for 100Gbps and 400Gbps Ethernet standards. So far, his technical proposals have been adopted by many international networking standards.
Since 2004 Dr. Wang has served as Associate Editor for the IEEE Trans. on Circuits and Systems-I (TCAS-I), TCAS-II, and IEEE Trans. on VLSI Systems for numerous terms. He is currently a Guest Editor for a special issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems. Furthermore, he has served as Technical Program Committee Member (or Co-Chair), Session (or Track) Chair, and Review Committee Member for tens of international conferences. In 2013, he served in the Best Paper Award selection committee for the IEEE Circuits and System Society. His current research interests are in the area of Digital Communications, Machine Learning, and Efficient VLSI Implementation. He has been a Fellow of IEEE since 2016.
Brian M. Sadler (S’81-M’81-SM’02-F’07) received the B.S. and M.S. degrees from the University of Maryland, College Park, and the Ph.D. degree from the University of Virginia, Charlottesville, all in electrical engineering. He is a Fellow of the Army Research Laboratory (ARL) in Adelphi, MD. Dr. Sadler is an associate editor for the IEEE Transactions on Signal Processing and EURASIP Signal Processing, was an associate editor for IEEE Signal Processing Letters, and has been a guest editor for several journals including IEEE JSTSP, IEEE JSAC, the IEEE SP Magazine, and the International Journal of Robotics Research. He is a member of the IEEE Signal Processing Society Sensor Array and Multi-channel Technical Committee, and Co-Chair of the IEEE Robotics and Automation Society Technical Committee on Networked Robotics. He received Best Paper Awards from the Signal Processing Society in 2006 and 2010. His research interests include information science, networked and autonomous systems, sensing, and mixed-signal integrated circuit architectures.
Jose Silva-Martinez (SM’98–F’10) received the M.Sc. degree from Instituto Nacional de Astrofísica Optica y Electrónica (INAOE), Puebla, and the Ph.D. degree from Katholieke Universiteit Leuven, Leuven, Belgium, in 1981 and 1992, respectively. He is currently with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA. He has published more than 105 journal and 160 conference papers, 2 books and 12 book chapters, and 1 patent. His current research interests include the design and fabrication of integrated circuits for communication and biomedical applications. Dr. Silva-Martinez is serving as an Editor-in-Chief of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, PART II: REGULAR PAPERS, and as the Conference Co-Chair of the IEEE International Midwest Symposium on Circuits and Systems (MWCAS) in 2014. He was co-author of the papers that received the RF-IC and MWCAS Best Student Paper Awards in 2005 and 2011, and a co-recipient of the European Solid-State Circuits Conference Best Paper Award in 1990.
Sebastian Hoyos received the B.S. degree in electrical engineering from Pontificia Universidad Javeriana (PUJ), Bogota, Colombia, in 2000, and the M.S. and Ph.D. degrees in electrical engineering from the University of Delaware, Newark, in 2002 and 2004. He was a Postdoctoral Researcher (2004-2006) with the Berkeley Wireless Research Center, Department of Electrical Engineering and Computer Sciences. He joined Texas A&M University, College Station, TX in 2006 where he is currently an Associate Professor with the Department of Electrical and Computer Engineering. His research interests include telecommunication systems, digital signal processing, and analog and mixed-signal processing and circuit design.