University of Toronto

ECE532: Final Report

Musical Note Sensing on an FPGA

Nathan Van Woudenberg

Michael Manning

Jad Knayzeh

Submitted on April 10th, 2014

1. Overview

This report explains the design and functionality of our project Audio_Decode_v1_00. It will go

over the following points:

● Intended final result

● Progress to date

● Theory behind the design

● Implementation - how to recreate the system from scratch

● Explanation of the directory structure and HDL

All explanations will be with reference to documents available in the Appendices and all hardware

or software designs will be explained and referenced to the original designs.

a. Project Description

The project was intended to be similar in nature to the popular video game, Guitar Hero, by

Activision. The idea was to make a more sophisticated game where the input is not just a plastic

guitar but instead a real guitar. Ideally it would have also been able to detect other instruments

and voice.

Due to time limitations, the scope was reduced drastically: the project now only processes the
audio signals and has the board output the detected notes to the screen. Rather than outputting
a single note, we opted to have the system distinguish several notes at once. This would allow
future iterations of the project to detect chords played on a guitar, making the video game much
more interesting to the user and a much better learning tool.

b. Goals and Motivation

The idea was to create a video game that could help in teaching instruments. There are very few
services or tools for self-learning musical instruments that can provide feedback on whether the
user is hitting the right notes. While such a game would not be able to teach correct technique,
it would help train the user's ear to hear the correct notes.

The closest tool out there is a video game called Rocksmith which takes an electric guitar as an

input. The difference between Rocksmith and our intended design is that our design would not

be limited to an electric guitar because the input would be through a microphone. Of course, this

presents several challenges with noise interference and signal quality.

Ubisoft, the publisher of Rocksmith, has sold close to a million units since the release of the
game, with 59% of sales going to individuals who had never played the guitar. According to a
national study conducted by Research Strategy Group Inc., 1.4 million people have learned to
play the guitar using Rocksmith. Further, 95% of users claim to have improved their guitar skills
and 90% say that Rocksmith was more effective than traditional methods of learning.

Similarly, there’s no reason why a system that works just as well and can provide real-time

feedback for any instrument would not be just as successful in gaining a market share.

Due to time constraints, the final goal shifted to a design that identifies the notes played by
the user.

c. Block Diagram

The design involves several hardware blocks and an interface with a Microblaze Soft Processor

Core. The hardware blocks are used to interface with the microphone and the Microblaze. The

Microblaze then sends the samples to another sequence of hardware blocks which perform a

Fast Fourier Transform (FFT), get the magnitude of each output, and send the frequency peaks

back to the Microblaze to identify the notes being played.

For each note that has been identified, the Microblaze writes the specific frequency of the note
to an HDMI frame stored in local BRAM. Rather than explicitly writing out the frequency, the
Microblaze uses musical notation to encode each frequency as three separate characters: a
letter (A-G), a symbol (sharp, flat or natural) and a number (1-9). Each three-character group
is written into BRAM on a pixel-by-pixel basis until all notes have been represented in the
local frame. This frame is then written into DDR memory, while the HDMI OUT core reads from
the DDR and displays the frame.

d. IP’s Used

Several IPs were used from various sources: Digilent's main website [1], peer IP project 3, and
Xilinx Coregen.

Coregen is a tool that Xilinx provides to generate IP blocks that can be parameterized to suit
the user's needs. Most blocks support the AXI protocol used by the Microblaze. Further
information can be found in the AXI Reference Guide [2].

AC97 - audio in/out controller

The AC97 audio in/out controller available from Digilent comes with a sample EDK project to

highlight how to use the Microblaze to interface with the on-board AC97 codec. Using the

“ac97.c/h” files in the sample project provides function calls that allow the user to

● Configure the codec for various modes of operation (e.g. - configure, send/receive)

● Read data from either LINE IN or MIC IN

● Send out data through LINE OUT.

Data transmission is handled by XIO function calls by writing to or reading from set addresses

for left and right channels. Note that the user is required to block on requests for new frames

from the AC97.

HDMI OUT - video out controller

To handle the video out for the design, an HDMI Out video controller is used that acts as both an

AXI slave to the microblaze and a master for a DDR instance. The user can interface with this

core by writing HDMI frames into the DDR directly and then pointing the controller at a specific

address to display on the HDMI out. Once the frame has been written into memory the core

handles generation of HSYNC and VSYNC as required to properly display the frame, and

switching the frame can be achieved by changing the address pointer into DDR.

From XPS the user can control the color format and the screen horizontal and vertical resolution.

There are two color formats (RGB565, and RGB888X) and three screen resolutions (1280x720,

800x600, and 640x480) supported. The current parameters used are:

● RGB888X: 8 red, 8 green, 8 blue, 8 unused for 32bpp (bits/pixel)

● 1280x720 resolution

Note that if the resolution is dropped to 640x480 or 800x600 then the input clock needs to be

changed to 25MHz while 1280x720 requires 75MHz. Changing either the color format or the

resolution requires software updates as well since these parameters are used in the code as

defines.

Coregen Fast Fourier Transform - FFT DSP block

The FFT block was generated using the Fast Fourier Transform generator from Coregen. The FFT
was designed with a transform size of 4096 samples and an input resolution of 16 bits. When
setting up the FFT, the user should choose a 32-bit input width: 16 bits carry the real part and
the other 16 bits carry the imaginary part, since the input can be complex. The output is also
32 bits, 16 for imaginary and 16 for real. Screenshots of the exact settings used are provided
below:

Peak Detector

The peak detector is an IP block built around a FIFO generated by Coregen. It supports AXI
Stream and AXI4LITE, and generates an interrupt to the Microblaze once 4096 samples have been
sent from the FFT block.

The FIFO is generated using the Coregen FIFO Generator. It is 4096 words deep so that it can
hold the maximum number of peaks possible. Realistically, one should never get that many peaks,
but it is possible to fill the FIFO during the calibration process before the correct threshold
is known.

The peak detector is a simple comparator. The threshold is user adjustable from the Microblaze
through an AXI4LITE slave register. The input from the FFT and the output of the FIFO to the
Microblaze use the AXI Stream protocol.

2. Outcome

As previously mentioned, the original goal for this system was to create a video game that
instructs users in how to play an instrument. The original system that would perform this task
had three main components: the audio input, used for processing what the user was playing; the
video output, used to provide visual cues for the user; and the audio output, used to provide
aural cues for the user. Each of these three components was further divided into the
functionalities listed below.

Audio Input

○ Capture audio input from the instrument the user is playing

○ Decode the audio input to determine which notes are being played

○ Report these notes to the processor for use with the Video Output

Video Output

○ Display the results of the audio decoding synchronized to the pre-recorded track

○ Provide a visual cue for the user corresponding to which note to play at each

instance of the pre-recorded track (e.g. - see below for a sample from Rocksmith

as to how this is done).

Audio Output

○ Play a pre-recorded track for the user to play along with synchronized with the

video output

Due to time constraints, the project was ultimately re-scoped to take audio input and display
the notes from the audio input on the screen in real-time. Based on this new scoping, the audio
input functionalities were left unchanged from above and the video out functionalities were
revised to remove the requirement of providing visual cues to prompt the user. The audio output
functionality listed above was removed as it was no longer required; instead, the recorded audio
input was played through the LINE OUT in real time. Accounting for the updated scoping, the
revised functionality list for the project is as follows:

● Capture audio input from the instrument the user is playing

● Decode the audio input to determine which notes are being played

● Report these notes to the processor for use with the Video Output

● Display the results of the audio decoding in real-time

The system that we designed was able to meet the first three requirements for audio input
processing. During testing, the input notes were recognized with a good deal of accuracy;
however, on occasion the system would also report directly adjacent notes that were not being
played, or assign a note to the wrong octave. This occurs because the frequency response cannot
be perfectly sharp around the peaks: the roll-off is slow enough that some adjacent bins have
amplitudes higher than the threshold value used for the test. To adjust for this, the software
was coded to disregard any note above the threshold that occurs within one frequency bin of a
previously recorded note. However, this currently means that for some peaks, the bin directly
below the peak is selected first and prevents the desired note from being measured. The output
note values were determined by first converting the measured indices to frequency [Hz] using the
following equation:

$$f = \frac{\mathrm{index} \times F_s}{N} \quad [\mathrm{Hz}]$$

where $F_s = 44100$ Hz is the sampling rate and $N = 4096$ is the FFT size. These frequencies
were then converted to their corresponding MIDI numbers, which map one-to-one onto musical
notes, each comprising a pitch and an octave.
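The conversion from bin index to note can be sketched in C as below. This is an illustration rather than the project's actual code: the function names are hypothetical, and it assumes standard tuning (MIDI note 69 = 440 Hz) and the octave numbering in which MIDI note 60 is C4. The nearest note is found by walking up the equal-tempered scale, which avoids a call to log2() on a soft processor.

```c
#include <assert.h>
#include <stdio.h>

#define SAMPLE_RATE_HZ 44100.0  /* F_s from the report */
#define FFT_SIZE       4096     /* N from the report */

/* Convert an FFT bin index to its centre frequency in Hz: f = index * F_s / N. */
double bin_to_hz(unsigned index)
{
    assert(index < FFT_SIZE);
    return (double)index * SAMPLE_RATE_HZ / FFT_SIZE;
}

/* Return the MIDI number (0..127) nearest to freq_hz on a logarithmic scale.
 * Assumes MIDI 69 = 440 Hz, so MIDI 0 is about 8.176 Hz. */
int hz_to_midi(double freq_hz)
{
    const double semitone = 1.0594630943592953;      /* 2^(1/12) */
    const double half_semitone = 1.0293022366434921; /* 2^(1/24) */
    double note_hz = 8.175798915643707;              /* MIDI note 0 */
    int midi;

    for (midi = 0; midi < 127; midi++) {
        /* Upper boundary of this note lies half a semitone above it. */
        if (freq_hz < note_hz * half_semitone)
            return midi;
        note_hz *= semitone;
    }
    return 127;
}

/* Format a MIDI number as pitch plus octave, e.g. 69 -> "A4".
 * The octave convention (MIDI 60 = C4) is an assumption. */
void midi_to_name(int midi, char *out, size_t out_len)
{
    static const char *const names[12] = {
        "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
    };
    assert(midi >= 0 && midi <= 127 && out != NULL);
    snprintf(out, out_len, "%s%d", names[midi % 12], midi / 12 - 1);
}
```

With these parameters the bin spacing is 44100 / 4096, or about 10.77 Hz, so bin 41 (about 441.4 Hz) maps to MIDI note 69.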

On the video out side, the requirement for displaying the results of the audio processing in
real-time was partially met: the results displayed correctly but introduced noticeable lag in
the system. The reason is that the video display IP requires each display frame to be written
into the DDR2 memory on a pixel-by-pixel basis. As a result, any change in the video output
requires a significant number of DDR accesses, which greatly slows down the system. To mitigate
this effect, we minimized the amount of information that changes for a note: each time a new
note is present, a black box appears next to the corresponding note on screen and any black
boxes from previous time intervals are cleared. With this change the video output introduced
noticeably less lag, but it was still not sufficient to ensure real-time operation.

Based on the results of testing our system, there are two main areas for improvement:

1. Find a method either in software or hardware to account for false detection of notes in a

+/-1 frequency bin from the peak

2. Create custom hardware to write display frames directly to memory to avoid the

bottleneck of software access to the off-chip DDR2.
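One possible software approach to the first item is sketched below: instead of keeping whichever bin of a cluster arrives first, each run of consecutive above-threshold bins is collapsed into its strongest bin. This is not the project's implementation. The peak_t type and function name are hypothetical, and the sketch assumes that each peak is available with both its bin index and its magnitude, which the report does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One above-threshold FFT bin. Pairing an index with its magnitude is an
 * assumption; the report does not say what the FIFO stores. */
typedef struct {
    uint16_t index;      /* FFT bin number, 0..4095 */
    uint32_t magnitude;  /* squared magnitude from the multiplier block */
} peak_t;

/* Collapse each run of consecutive bin indices into its strongest bin.
 * 'peaks' must be sorted by ascending index; it is compacted in place
 * and the number of surviving peaks is returned. */
size_t merge_adjacent_peaks(peak_t *peaks, size_t count)
{
    size_t kept = 0;
    size_t i;
    uint16_t prev_index = 0;

    assert(peaks != NULL || count == 0);
    for (i = 0; i < count; i++) {
        /* A bin continues the current run if it directly follows the last one. */
        int same_run = (i > 0) && (peaks[i].index == prev_index + 1);
        prev_index = peaks[i].index;
        if (same_run) {
            if (peaks[i].magnitude > peaks[kept - 1].magnitude)
                peaks[kept - 1] = peaks[i];  /* stronger bin replaces the run's peak */
        } else {
            peaks[kept++] = peaks[i];        /* start of a new run */
        }
    }
    return kept;
}
```

A trade-off of this approach is that two genuinely different notes falling in neighbouring bins would be merged into one.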

3. Project Schedule

Skip this section if your goal is to recreate the project. The following section provides insight on

the time commitment required to achieve the results presented.

The original schedule was as follows. Note the schedule drastically changed due to special

circumstances. The revised schedule is also presented.

1. Preliminary Schedule

a. Week 1 (Feb 26th)

- Simulate the system in Matlab

- Test simulation with multiple types of instruments

b. Week 2 (March 5th)

- Construct simplified DSP block from library pcores

- Test the system by feeding input streams to the buffer

- Integrate video-out pcore into design

c. Week 3 (March 12th)

- Customize hardware configuration

- Connect microphone and ADC

- Display notes on the monitor

d. Week 4 (March 19th)

- Program the game and run it on the MicroBlaze system

- Synchronize hardware with software

- Noise filtering

e. Week 5 (March 26th)

- Test game thoroughly

- Ensure accurate peak detection in the presence of noise and multipath

2. Revised Schedule

a. Week 1 (Feb 26th)

- Simulate the system in Matlab

- Test simulation with multiple types of instruments

b. Week 2 (March 5th)

- Make or acquire audio in core

- Test audio in core

c. Week 3 (March 12th)

- Generate FFT core and connect to Microblaze

- Simulate and verify design

d. Week 4 (March 19th)

- Create custom multiplier blocks to find FFT amplitude

- Create peak detector block and output FIFO

- Test FFT->multiplier setup and FFT->multiplier->peak detector setup

e. Week 5 (March 26th)

- Integrate and test interrupt generation using C program

- Integrate video out and design output characters

f. Week 6 (April 2nd)

- Finish integration by mapping the FFT peaks to notes

- Send those notes to the video out module and output them

As explained above, we were forced to rescope our project due to time constraints. The majority
of the functionality that we decided to remove was in the game aspect of the design. We wanted
to keep all the signal processing, however, because we found it to be more fundamental to the
project. Doing so also provides a more complete foundation for future groups interested in
digital signal processing to build on.

4. Description of Blocks

Digilent_AC97_Cntlr

The following system calls are provided in order for the microblaze to interface with the AC97

controller:

● AC97_Link_Is_Ready(AC97_BASEADDR): returns ‘0’ while waiting for AC97 codec to

come online. Should wait for return of ‘1’ before using device further

● AC97_Select_Input(AC97_BASEADDR,<channels>,<input>): to select whether one or

both input channels are used on either LINE IN or MIC IN

● AC97_Set_Tag_And_Id(AC97_BASEADDR, <code>): sets tag for AC97 codec to be

placed into configuration mode (0xF800) or send/receive mode (0x9800)

● AC97_Set_Volume(AC97_BASEADDR,<device>): change playback volume for codec

● AC97_Unmute(AC97_BASEADDR, <device>): enable audio output (device = master

volume, headphone volume, pcm volume)

● AC97_Wait_For_New_Frame(AC97_BASEADDR): block program execution until a new

frame is ready

○ Frame read via XIo_In32 from standard location in DDR2 memory

○ Similarly, XIo_Out32 used to write data out via the codec

fft_16bit - Coregen Fast Fourier Transform

The FFT block uses the AXI Stream protocol on both sides: it receives its input from the
Microblaze and sends its output to the multiplier, which finds the magnitude of each FFT output.

The signals are as follows:

input ACLK;

input ARESETN;

input S_AXIS_TVALID;

input [31 : 0] S_AXIS_TDATA;

input S_AXIS_TLAST;

input M_AXIS_TREADY;

output M_AXIS_TVALID;

output [31 : 0] M_AXIS_TDATA;

output M_AXIS_TLAST;

output S_AXIS_TREADY;

These signals are directly attached to the generated core which is instantiated inside the

wrapper called fft_16bit. There are other signals that are left unconnected in the instantiated

core. To learn more about them, refer to the datasheet [3] in the Datasheets directory.

hdmi_out

The user can interface with the hdmi_out core through a configurable status register and by

writing the frames to be displayed to the DDR2 memory.

Configuration Register:

● volatile int* hdmi_reg = (int*) XPAR_HDMI_OUT_0_BASEADDR

● hdmi_reg[0] = HRES (set by user)

● hdmi_reg[1] = DDR address for display frame

● hdmi_reg[2] = “GO” signal for the core

○ Set GO = 1 to display the frame, GO = 0 cuts the display

DDR2 Access:

● volatile u32* ddr_addr = (volatile u32*) XPAR_S6DDR_0_S0_AXI_BASEADDR

● ddr_addr[index] = pixel color in format specified by HDMI OUT block

Writing pixel-by-pixel into the DDR2 memory can be accomplished directly in code without
requiring any additional function calls. However, this is slow since it requires multiple DDR
accesses for every pixel.
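A minimal sketch of pixel-by-pixel frame writing is shown below. It assumes a row-major frame at the 1280x720 resolution used in this design, and it assumes that RGB888X places red in the most significant byte with the unused byte last; the byte order should be checked against the HDMI OUT core. The helper names are hypothetical, and the frame pointer stands in for ddr_addr so the code can also be exercised against an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

#define HRES 1280  /* horizontal resolution used in the report */
#define VRES 720   /* vertical resolution used in the report */

/* Pack a colour as RGB888X. Placing red in the top byte and the unused
 * byte at the bottom is an assumption about the core's byte order. */
uint32_t rgb888x(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8);
}

/* Write one pixel into a row-major frame buffer (one 32-bit word per pixel). */
void put_pixel(volatile uint32_t *frame, unsigned x, unsigned y, uint32_t colour)
{
    assert(x < HRES && y < VRES);
    frame[(uint32_t)y * HRES + x] = colour;
}

/* Fill a rectangle, e.g. the marker box drawn next to a detected note. */
void fill_rect(volatile uint32_t *frame, unsigned x0, unsigned y0,
               unsigned w, unsigned h, uint32_t colour)
{
    unsigned x, y;

    for (y = y0; y < y0 + h; y++)
        for (x = x0; x < x0 + w; x++)
            put_pixel(frame, x, y, colour);
}
```

Every call to put_pixel is one 32-bit store, so redrawing only small boxes rather than whole characters keeps the number of stores down.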

multiplier

The multiplier has the same pinout as the FFT, presented here again for convenience:

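input ACLK;

input ARESETN;

input S_AXIS_TVALID;

input [31 : 0] S_AXIS_TDATA;

input S_AXIS_TLAST;

input M_AXIS_TREADY;

output M_AXIS_TVALID;

output [31 : 0] M_AXIS_TDATA;

output M_AXIS_TLAST;

output S_AXIS_TREADY;
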
The internal structure is slightly different. Inside you will find two complex_multiplier
instances, both generated using Coregen. A complex_multiplier was used instead of a regular
multiplier because the supplied complex multiplier supports the AXI STREAM interface. Since the
complex multiplier was used as a real number multiplier only, the imaginary inputs were set to
zero and only the real inputs were used to calculate the product.

One instance of the multiplier takes the real output from the FFT and squares it, and the other
instance takes the imaginary output from the FFT and squares it. The outputs of the multipliers
are then added. Strictly, obtaining the magnitude would also require taking the square root of
this sum, but since we are only interested in the relative magnitudes of the FFT outputs, we
opted not to perform the expensive square root and simply compare the squared values as they
are.

An important distinction: the FFT outputs 16 bit real and 16 bit imaginary values. The multiplier

takes two 32 bit inputs, each of which takes 16 bit real and 16 bit imaginary numbers. The FFT

output is not fed directly into the multiplier. Instead, both inputs to one instance of the

complex_multiplier get the 16 bit real value (with 0 padding) and both inputs of the other instance

get the 16 bit imaginary value (with 0 padding).

i.e. FFT output: {16’bimaginary_value, 16’breal_value}

complex_multiplier_real input A: {16’b0, 16’breal_value}

complex_multiplier_real input B: {16’b0, 16’breal_value}

complex_multiplier_imag input A: {16’b0, 16’bimaginary_value}

complex_multiplier_imag input B: {16’b0, 16’bimaginary_value}

To learn more about the complex_multiplier, refer to datasheet [4] in the Datasheets directory.

peak_detector

The peak detector has an AXI-Stream interface as well as an IPIF interface. The pinout is shown
here:

// AXI-Stream signals:



input S_AXIS_TLAST;






output Interrupt;

// Bus protocol ports

input Bus2IP_Clk;

input Bus2IP_Resetn;

input [31 : 0] Bus2IP_Data;

input [3 : 0] Bus2IP_BE;

input [3 : 0] Bus2IP_RdCE;

input [3 : 0] Bus2IP_WrCE;

output [31 : 0] IP2Bus_Data;

output IP2Bus_RdAck;

output IP2Bus_WrAck;

output IP2Bus_Error;

This core also has two configuration registers, declared here alongside two related signals:

reg [31 : 0] threshold_reg0;

reg [31 : 0] int_rst_reg0;

wire [12 : 0] data_count;

reg [31 : 0] slv_ip2bus_data;

These registers can be used as follows:

● threshold_reg0: holds the value of the threshold to compare against

● int_rst_reg0: allows microblaze to exit reset state

● slv_ip2bus_data: various axi and fifo IO wires for debugging purposes

● data_count: output from FIFO specifying the number of items in the FIFO

These registers are read and written through the AXI4LITE interface in compliance with the
standard IPIF bus protocol.

The peak detector accepts input from the AXI-Stream exiting the multiplier block. The multiplier
streams in 4096 32-bit samples, each of which is compared against the threshold specified by
writing to threshold_reg0. Any peaks larger than the threshold value are piped into a FIFO
(generated with CoreGen [5]), the contents of which are read by the MicroBlaze on an interrupt.
The Interrupt signal is asserted once, when the peak detector receives the final sample from
the multiplier. This is detected through the use of TLAST, which is piped from the FFT, through
the complex multiplier, to the peak detector.

To learn more about the FIFO, refer to datasheet [5] in the Datasheets directory.
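The comparator behaviour described above can be modelled in software as shown below. This sketch is for illustration only: the function name is hypothetical, and storing the bin index in the FIFO (rather than the magnitude) is an assumption, since the report does not state the FIFO word contents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FFT_SIZE 4096  /* samples per transform, also the FIFO depth */

/* Software model of the comparator: record the index of every sample whose
 * squared magnitude is larger than 'threshold' and return how many were
 * recorded. Storing the bin index is an assumption about the FIFO contents. */
size_t detect_peaks(const uint32_t *mag_sq, size_t n, uint32_t threshold,
                    uint16_t *fifo, size_t fifo_depth)
{
    size_t i;
    size_t used = 0;

    assert(mag_sq != NULL && fifo != NULL && n <= FFT_SIZE);
    for (i = 0; i < n && used < fifo_depth; i++) {
        if (mag_sq[i] > threshold)
            fifo[used++] = (uint16_t)i;
    }
    return used;
}
```

With a threshold of zero every non-zero sample is recorded, which illustrates why the FIFO can fill completely during calibration.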

5. Description of Design Tree

./doc/ - project documentation

./fft_test_0/ - C-code for running the project

● “Debug” mode has errors when programming the FPGA so build to “Release”

● ac97_demo.c contains the main method for the code

● ac97.c code from Digilent website to interface with the AC97 codec

● pixel_draw.c code written to program characters into DDR on a pixel-by-pixel basis for

use with HDMI OUT

./project/ - Contains XPS project defining the Audio_Decode_v1_00 system

./pcores/ - Contains all the custom pcores that are used in the design

● d_ac97_axi_v1_00_a

● fft_16bit_v1_00_b

● hdmi_out_v1_00_a

● multipler_v1_00_a

● peak_detector_v1_00_b

6. References

[1] Digilent AC97 pcore

http://www.digilentinc.com/Data/Products/ATLYS/Atlys_AC97_EDK_demo.zip

[2] AXI Reference Guide UG761 (v13.4)

http://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/v13_4/ug761_axi_reference_guide.pdf

[3] Xilinx DS808 LogiCORE IP Fast Fourier Transform v8.0

http://www.xilinx.com/support/documentation/ip_documentation/ds808_xfft.pdf

[4] Xilinx PG104 LogiCORE IP Complex Multiplier v6.0

http://www.xilinx.com/support/documentation/ip_documentation/cmpy/v6_0/pg104-cmpy.pdf

[5] Xilinx DS317 LogiCORE IP FIFO Generator v8.1

http://www.xilinx.com/support/documentation/ip_documentation/fifo_generator_ds317.pdf
