DSP Using LabVIEW FPGA

T. J. Moir

AUT University

School of Engineering

Auckland

New Zealand

Limitations of a basic processor

Despite all of the advancements we've made in the world of processors, they still have one major limitation: an individual processor core can only execute one instruction at a time.

For example, suppose you have MS Office, Firefox and Skype all open at once. You feel like you're multitasking, but in processor terms you're not. The processor core handling these programs executes one instruction at a time, but because it is so quick you don't notice any delay.

Field-Programmable Gate Arrays

Unlike processors, FPGAs use dedicated, hard-wired logic for processing and do not have an operating system.

Because the processing paths are parallel, different operations do not have to compete for the same processing resources. That means speeds can be very fast, and the design is reconfigurable.

A striking fact is that an FPGA running at a clock frequency an order of magnitude lower than that of CPUs and GPUs (graphics processing units) can still outperform them.

In power efficiency (performance per watt), both CPUs and GPUs also lag significantly behind FPGAs.

FPGA designs get rid of the instruction fetch-and-decode pipeline, which is a limiting factor in the von Neumann architecture.

Application 1: Basic Signal Processing Problem

Given an acoustic mixture of two voices (or a voice plus noise), how do we build a computer system which can

separate them?

Applications

Hearing-aids

Mobile phones

Speech recognition

Algorithm solutions

There are many ways of solving this problem which have been developed over the past 40 years or so. Very few of them have ever reached a final product. We need an approach which is successful in simulation yet computationally simple enough to work in real-time.

Crosstalk-resistant adaptive noise-canceller (CRANC) without a VAD (voice activity detector).

A good engineering solution that is simple enough to implement in real-time. Not optimal but does it matter?

Unsupervised adaptive filtering (Blind-Source Separation).

This is a possibility, but the algorithms can be computationally intensive for real-time implementation.

Uses the principle of independent component analysis (ICA).

In a reverberant environment the filter orders can be quite large.

We must restrict our solution as far as possible.

Why use an FPGA device and not a DSP processor?

The FPGA has inherent parallelism in its architecture, which we wish to exploit.

Low power consumption, but relatively slow clock speed (40 MHz).

We have two LMS algorithms, which we will run in parallel. A DSP processor must run the algorithms serially.

Disadvantage: an FPGA is difficult to program compared with an ordinary C-language program for a DSP processor. The FPGA uses fixed-point arithmetic.

On a single processor, multithreading generally occurs by time-division multiplexing

(as in multitasking): the processor switches between different threads.

This switching happens frequently enough that the user perceives the threads or

tasks as running at the same time.

On a multiprocessor or multi-core system, threads can be truly concurrent, with every

processor or core executing a separate thread simultaneously.

$w^1_{k+1} = w^1_k + \mu_1 e^1_k X^1_k$

$w^2_{k+1} = w^2_k + \mu_2 e^2_k X^2_k$

$e^i_k = s^i_k - (X^i_k)^T w^i_k, \quad i = 1, 2$

$X^1_k = [e^2_{k-1}, e^2_{k-2}, \ldots, e^2_{k-n}]^T$

$X^2_k = [e^1_{k-1}, e^1_{k-2}, \ldots, e^1_{k-n}]^T$

Cross-coupled LMS equations

We must implement the above cross-coupled equations in real time using fixed-point arithmetic. The number of weights n should be as large as possible. Also known as the SAD algorithm (Symmetric Adaptive Decorrelation).
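As a rough illustration of the cross-coupled recursion, here is a minimal floating-point sketch in plain Python. It assumes that each filter's regressor holds the n most recent outputs of the other channel and that each channel has its own step size. The function name, the default step sizes and the use of floating point are assumptions for illustration only; the real-time design on the FPGA uses fixed-point arithmetic and is built in LabVIEW.

```python
def cross_coupled_lms(s1, s2, n=8, mu1=0.01, mu2=0.01):
    """Floating-point sketch of two cross-coupled LMS filters.

    s1, s2   : the two microphone signals (sequences of equal length)
    n        : number of weights per filter
    mu1, mu2 : step sizes (illustrative values, not taken from the slides)
    """
    w1 = [0.0] * n   # weights of filter 1
    w2 = [0.0] * n   # weights of filter 2
    x1 = [0.0] * n   # regressor of filter 1: past outputs of channel 2
    x2 = [0.0] * n   # regressor of filter 2: past outputs of channel 1
    e1, e2 = [], []
    for a, b in zip(s1, s2):
        # Each output is its microphone signal minus the filtered
        # past outputs of the other channel.
        err1 = a - sum(x * w for x, w in zip(x1, w1))
        err2 = b - sum(x * w for x, w in zip(x2, w2))
        # LMS weight updates, one per channel.
        w1 = [w + mu1 * err1 * x for w, x in zip(w1, x1)]
        w2 = [w + mu2 * err2 * x for w, x in zip(w2, x2)]
        # Shift the newest outputs into the opposite regressors.
        x1 = [err2] + x1[:-1]
        x2 = [err1] + x2[:-1]
        e1.append(err1)
        e2.append(err2)
    return e1, e2, w1, w2
```

Within one sample the two error computations and the two weight updates do not depend on each other, which is the property that lets the two filters run side by side in hardware.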

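[Figure: block diagram. Microphone 1 and Microphone 2 feed two summing junctions whose outputs are $e^1_k$ and $e^2_k$; the adaptive filters W1 and W2 cross-couple the two channels.]

Crosstalk-resistant noise-canceller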

[Figure: Host PC connected over Ethernet to the cRIO, which has A/D and D/A modules: Analogue In (from filters) and Analogue Out (to filters).]

The CompactRIO runs as a stand-alone unit and is programmed in the high-level data-flow language LabVIEW.

NI cRIO-9082 RT

1.33 GHz dual-core Intel Core i7 processor, 32 GB nonvolatile storage, 2 GB DDR3 800 MHz RAM

LabVIEW Real-Time for determinism and continuous operation reliability

1 MXI-Express, 4 USB Hi-Speed, 2 Gigabit Ethernet, and 2 serial ports for connectivity, expansion

8-slot Spartan-6 LX150 FPGA chassis for custom I/O timing, control, and processing

$X_k^T w_k = (x_k^1)^T w_k^1 + (x_k^2)^T w_k^2$

Dot product of two vectors can exploit parallelism
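The idea that a dot product splits into independent partial sums can be sketched as follows. This is a plain Python illustration, assuming the vectors are simply cut into contiguous blocks; the function name `partitioned_dot` and the `parts` parameter are hypothetical. Python evaluates the partial sums one after another, whereas on an FPGA each block could have its own multiply-accumulate logic working at the same time.

```python
def partitioned_dot(x, w, parts=2):
    """Dot product computed as a sum of independent partial dot products.

    Returns (total, partials), where partials[i] is the dot product of
    the i-th block of x with the i-th block of w.
    """
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if not x:
        return 0, []
    size = -(-len(x) // parts)  # block length, rounded up
    partials = [
        sum(a * b for a, b in zip(x[i:i + size], w[i:i + size]))
        for i in range(0, len(x), size)
    ]
    return sum(partials), partials


total, partials = partitioned_dot([1, 2, 3, 4], [5, 6, 7, 8], parts=2)
# partials == [17, 53], total == 70
```

No partial sum needs a result from any other, so the only serial step left is the final addition of the partial results.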

Pipelining two CRANC algorithms. Pipelining is the use of feedback nodes or shift registers so that operations that would normally execute serially can execute in parallel.

[Figure: block diagram. Inputs from the ADC feed ANC1; the ANC1 output is stored in registers; ANC2 retrieves the ANC1 output from the registers; the ANC2 output goes to the DAC.]

Pipelining the two decorrelators (at the expense of a time delay)
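The register-based pipelining idea can be sketched in software as below. This is a simplified model, assuming two stateless stages and a single register between them; the names `run_serial`, `run_pipelined`, `stage1` and `stage2` are hypothetical, and the real ANC stages are adaptive filters with internal state. Python still runs the two stages one after the other; the point is only that, inside one tick, neither stage waits for the other.

```python
def run_serial(samples, stage1, stage2):
    """Stage 2 waits for stage 1 on every sample."""
    return [stage2(stage1(x)) for x in samples]


def run_pipelined(samples, stage1, stage2, initial=0.0):
    """Stage 2 works on the registered output from the previous tick."""
    register = initial            # stage 1 output stored on the last tick
    out = []
    for x in samples:
        y2 = stage2(register)     # uses the stored value, not this tick's
        y1 = stage1(x)            # independent of y2 within this tick
        register = y1             # store for the next tick
        out.append(y2)
    return out


serial = run_serial([1, 2, 3], lambda x: x + 1, lambda x: 2 * x)
piped = run_pipelined([1, 2, 3], lambda x: x + 1, lambda x: 2 * x)
# serial == [4, 6, 8]; piped == [0.0, 4, 6], the same values one sample late
```

The pipelined output matches the serial output shifted by one sample, which is the time delay the slide refers to.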

Testing was performed by playing recordings from a two

channel digital audio recorder and recording the FPGA Rio

output on a second digital recorder.

The CRANC was switched on and off (bypassed in the software)

for comparison purposes.

Example: CRANC initially on and then turned off.

A 300-weight adaptive filter (100 weights per CRANC, x3 pipelined) was achieved at a sampling rate of 33 kHz with 16-bit samples.

Fixed-point format: 3 integer bits and 13 fraction bits, giving a range of -4 to +3.999 volts with a resolution of delta ≈ 0.0001 (2^-13). Often termed Q3.13.

Took 2.5 hours to compile.

The RAM method slows down the sampling rate; the array method uses more space on the FPGA.

Hardware: Spartan-6 LX150, 40 MHz external clock, 147 K logic cells.
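To make the 16-bit format with 13 fraction bits concrete, here is a small Python sketch of quantisation and multiplication in that format. It assumes the 3 integer bits include the sign bit (consistent with the -4 to +3.999 range) and that overflow saturates; the helper names and the saturation and rounding choices are assumptions, not a description of how LabVIEW FPGA was configured for this design.

```python
FRAC_BITS = 13
WORD_BITS = 16
SCALE = 1 << FRAC_BITS                 # 8192 counts per unit
Q_MIN = -(1 << (WORD_BITS - 1))        # -32768, represents -4.0
Q_MAX = (1 << (WORD_BITS - 1)) - 1     # 32767, represents about +3.99988


def saturate(raw):
    """Clamp an integer to the signed 16-bit range."""
    return max(Q_MIN, min(Q_MAX, raw))


def to_q3_13(value):
    """Quantise a float to a saturated 16-bit integer with 13 fraction bits."""
    return saturate(int(round(value * SCALE)))


def from_q3_13(raw):
    """Convert the stored integer back to a float."""
    return raw / SCALE


def q_mul(a, b):
    """Multiply two stored values.

    The full product has 26 fraction bits, so it is shifted right by 13
    (rounding toward minus infinity) and then saturated.
    """
    return saturate((a * b) >> FRAC_BITS)


print(from_q3_13(Q_MIN), from_q3_13(Q_MAX), 1 / SCALE)
```

A product such as 1.5 x 2.0 stays in range and returns exactly 3.0, while anything at or above 4.0 is clamped to the largest representable value.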

Demo wav file of real-time operation.

Example 2: The myRIO

Processor

Processor type: Xilinx Z-7010

Processor speed: 667 MHz

Processor cores: 2

Memory

Nonvolatile memory: 256 MB

DDR3 memory: 512 MB

DDR3 clock frequency: 533 MHz

DDR3 data bus width: 16 bits

For information about the lifespan of the nonvolatile memory and about best practices for using nonvolatile memory, go to ni.com/info and enter the Info Code SSDBP.

FPGA

FPGA type: Xilinx Z-7010

The myRIO and the Microsoft Surface Pro.

Use as a wireless spectral analyser
