DSP using Labview FPGA T.J.Moir AUT University School of Engineering Auckland New-Zealand


  • DSP using LabVIEW FPGA

    T. J. Moir

    AUT University, School of Engineering

    Auckland, New Zealand

  • Limitations of a basic processor

    Despite all of the advancements we've made in the world of processors, they still have one major limitation: an individual processor core can only execute one instruction at a time.

    For example, suppose that you have MS Office, Firefox and Skype all open at once. You feel like you're multi-tasking, but in processor terms you're not. The processor core running these programs executes one instruction at a time, but because it is so quick you don't notice any delay.

  • Field Programmable Gate Arrays

    Unlike processors, FPGAs use dedicated, directly hard-wired logic for processing and do not have an operating system.

    Because the processing paths are parallel, different operations do not have to compete for the same processing resources. That means speeds can be very fast, and the design is re-configurable.

    Perhaps the most surprising fact is that an FPGA running at a clock frequency an order of magnitude lower than that of CPUs and GPUs (graphics processing units) can still outperform them on highly parallel tasks.

    When it comes to power efficiency (performance per watt), both CPUs and GPUs also lag significantly behind FPGAs.

    FPGA designs get rid of the fetch-and-decode pipeline, which is the limiting factor in the von Neumann architecture.

  • Application 1: Basic Signal Processing Problem

    Given an acoustic mixture of two voices (or a voice plus noise), how do we build a computer system which can separate them?

  • Applications

    Hearing-aids

    Mobile phones

    Speech recognition

  • Algorithm solutions

    There are many ways of solving this problem which have been developed over the past 40 years or so. Very few of them have ever reached a final product. We need an approach which is successful in simulation yet computationally simple enough to work in real-time.

  • Crosstalk-resistant adaptive noise-canceller (CRANC) without a voice activity detector (VAD)

    A good engineering solution that is simple enough to implement in real time. Not optimal, but does it matter?

    Unsupervised adaptive filtering (blind source separation).

    This is a possibility, but the algorithms can be too computationally intensive for real-time implementation.

    Uses the principle of independent component analysis (ICA).

    In a reverberant environment the filter orders can be quite large.

    We must restrict our solution as far as possible.

  • Why use an FPGA device and not a DSP processor?

    The FPGA has inherent parallelism in its architecture, which we wish to exploit.

    Low power consumption, but a relatively slow clock speed (40 MHz).

    We have two LMS (least mean squares) algorithms, which we will run in parallel. A DSP processor must run them serially.

    The disadvantage is that an FPGA is difficult to program compared with writing an ordinary C-language program for a DSP processor. The FPGA uses fixed-point arithmetic.

    On a single processor, multithreading generally occurs by time-division multiplexing (as in multitasking): the processor switches between different threads. This switching happens frequently enough that the user perceives the threads or tasks as running at the same time.

    On a multiprocessor or multi-core system, threads can be truly concurrent, with every processor or core executing a separate thread simultaneously.

  • Cross-coupled LMS equations

    $$w^1_{k+1} = w^1_k + \mu_1 X^1_k e^1_k$$

    $$w^2_{k+1} = w^2_k + \mu_2 X^2_k e^2_k$$

    $$e^i_k = s^i_k - (X^i_k)^T w^i_k, \quad i = 1, 2$$

    $$X^1_k = [e^2_{k-1}, e^2_{k-2}, \ldots, e^2_{k-n}]^T$$

    $$X^2_k = [e^1_{k-1}, e^1_{k-2}, \ldots, e^1_{k-n}]^T$$

    We must implement the above cross-coupled equations in real time using fixed-point arithmetic. The number of weights n should be as large as possible. Also known as the SAD algorithm (Symmetric Adaptive Decorrelation)!
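    The cross-coupled equations above can be sketched in floating point as follows. This is a minimal illustration, not the FPGA implementation: it assumes the standard LMS weight update with a single step size `mu` shared by both filters, and the function name `cranc` and its parameters are hypothetical.

```python
def cranc(s1, s2, n=8, mu=0.01):
    """Floating-point sketch of two cross-coupled LMS filters.

    s1, s2: equal-length sequences of samples from microphones 1 and 2.
    n:      number of weights per filter (assumed value).
    mu:     LMS step size, shared by both filters (assumption).
    Returns the two error (output) sequences e1 and e2.
    """
    w1 = [0.0] * n  # weights of filter 1
    w2 = [0.0] * n  # weights of filter 2
    x1 = [0.0] * n  # regressor of filter 1: past samples of e2
    x2 = [0.0] * n  # regressor of filter 2: past samples of e1
    e1_out, e2_out = [], []
    for a, b in zip(s1, s2):
        # Error: input sample minus the filter output (a dot product).
        e1 = a - sum(x * w for x, w in zip(x1, w1))
        e2 = b - sum(x * w for x, w in zip(x2, w2))
        # LMS update: each weight moves by mu * regressor * error.
        w1 = [w + mu * x * e1 for w, x in zip(w1, x1)]
        w2 = [w + mu * x * e2 for w, x in zip(w2, x2)]
        # Cross-coupling: each new error feeds the other filter's regressor.
        x1 = [e2] + x1[:-1]
        x2 = [e1] + x2[:-1]
        e1_out.append(e1)
        e2_out.append(e2)
    return e1_out, e2_out
```

    Within one sample the two error computations do not depend on each other, and neither do the two weight updates, which is what allows the two filters to run side by side in hardware.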

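  • [Figure: Crosstalk-resistant noise-canceller. Block diagram with inputs Microphone 1 and Microphone 2, adaptive filters W1 and W2, summing junctions (+/-), and outputs e^1_k and e^2_k.]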

  • [Figure: system block diagram. Labels: Host PC, Ethernet, cRIO, A/D, D/A, Analogue In (from filters), Analogue Out (to filters).]

    The CompactRIO runs as a stand-alone unit and is programmed in the high-level data-flow language LabVIEW.

  • NI cRIO-9082 RT

    1.33 GHz dual-core Intel Core i7 processor, 32 GB nonvolatile storage, 2 GB DDR3 800 MHz RAM

    LabVIEW Real-Time for determinism and continuous operation reliability

    1 MXI-Express, 4 USB Hi-Speed, 2 Gigabit Ethernet, and 2 serial ports for connectivity and expansion

    8-slot Spartan-6 LX150 FPGA chassis for custom I/O timing, control, and processing

  • Dot product of two vectors can exploit parallelism

    $$X_k^T w_k = (x^1_k)^T w^1_k + (x^2_k)^T w^2_k$$
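    As a rough illustration of why the dot product parallelises, the sketch below splits both vectors into blocks and sums the partial dot products. The function names and the `parts` parameter are hypothetical. In Python the partial sums are still computed one after another; on an FPGA each block could be given its own multiply-accumulate logic.

```python
def dot(x, w):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, w))


def partitioned_dot(x, w, parts=2):
    """Dot product computed as a sum of independent partial dot products.

    Assumes x and w are non-empty and of equal length.
    """
    size = (len(x) + parts - 1) // parts  # block length, rounded up
    partials = [
        dot(x[i:i + size], w[i:i + size])  # each block is independent
        for i in range(0, len(x), size)
    ]
    return sum(partials)
```

    Because no block depends on another, the only serial step left is the final addition of the partial sums.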

  • Pipelining two CRANC algorithms

    Pipelining is the use of feedback nodes or shift registers to allow items that would normally execute serially to execute in parallel.

  • [Figure: pipelined noise-cancellers. Block labels: ANC1, ANC2, "Inputs from ADC", "Store values in registers", "Retrieve ANC1 out from registers", "ANC1 out", "ANC2 out to DAC", with summing junctions (+/-).]

  • Pipelining the two decorrelators (at the expense of a time delay)
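    The trade-off between pipelining and delay can be sketched with two generic stages. This is a software model under stated assumptions, not LabVIEW code: `stage1` and `stage2` stand in for the two cancellers, and the variable `reg` plays the role of the register (feedback node) between them.

```python
def run_serial(samples, stage1, stage2):
    """Stage 2 waits for stage 1 on every sample."""
    return [stage2(stage1(x)) for x in samples]


def run_pipelined(samples, stage1, stage2, initial=0):
    """Stage 2 works on the stage-1 result stored on the previous tick."""
    reg = initial  # register between the stages (initial value assumed)
    out = []
    for x in samples:
        # Neither call below needs the other's result from this tick,
        # so in hardware both stages could run at the same time.
        y2 = stage2(reg)   # uses the stage-1 result from the previous tick
        reg = stage1(x)    # stored for use on the next tick
        out.append(y2)
    return out
```

    The pipelined output equals the serial output delayed by one sample, with the first output determined by the register's initial value.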

  • Testing was performed by playing recordings from a two-channel digital audio recorder and recording the FPGA RIO output on a second digital recorder.

    The CRANC was switched on and off (bypassed in software) for comparison purposes.

  • Example: CRANC initially on and then turned off.

    100 weights per CRANC (x3 pipelined, giving a 300-weight adaptive filter) was achieved at a sampling rate of 33 kHz with 16-bit samples.

    Fixed-point format: 3 integer bits and 13 fraction bits, covering -4 to 3.999 volts with a resolution of approximately 0.0001 volts. Often termed Q3.13.

    Took 2.5 hours to compile.

    The RAM method slows down the sampling rate; the array method uses more space on the FPGA.

    FPGA: Spartan-6 LX150, 40 MHz external clock, 147 K logic cells.

    Demo WAV file of real-time operation.
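    The 16-bit format with 3 integer bits and 13 fraction bits can be modelled in a few lines. This is a sketch assuming the 3 integer bits include the sign bit (which gives the stated range of -4 to 3.999) and that out-of-range values saturate; the helper names are hypothetical and do not correspond to any LabVIEW function.

```python
FRAC_BITS = 13
SCALE = 1 << FRAC_BITS        # 8192 steps per unit
Q_MIN = -(1 << 15)            # most negative 16-bit signed value
Q_MAX = (1 << 15) - 1         # most positive 16-bit signed value


def saturate(q):
    """Clamp an integer to the 16-bit signed range."""
    return max(Q_MIN, min(Q_MAX, q))


def to_q313(v):
    """Quantise a float to a 16-bit fixed-point integer (13 fraction bits)."""
    return saturate(int(round(v * SCALE)))


def from_q313(q):
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def q_mul(a, b):
    """Multiply two fixed-point integers and rescale the product.

    The right shift floors the result (rounds toward minus infinity).
    """
    return saturate((a * b) >> FRAC_BITS)
```

    The smallest step is 1/8192, about 0.000122, and the largest representable value is 32767/8192, about 3.99988.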

  • Example 2: The myRIO.

  • Processor

    Processor type: Xilinx Z-7010
    Processor speed: 667 MHz
    Processor cores: 2

    Memory

    Nonvolatile memory: 256 MB
    DDR3 memory: 512 MB
    DDR3 clock frequency: 533 MHz
    DDR3 data bus width: 16 bits

    For information about the lifespan of the nonvolatile memory and about best practices for using nonvolatile memory, go to ni.com/info and enter the Info Code SSDBP.

    FPGA

    FPGA type: Xilinx Z-7010

  • The myRIO and the Microsoft Surface Pro.

    Use as a wireless spectral analyser.