ECE532: Final Report
Musical Note Sensing on an FPGA
Nathan Van Woudenberg
Michael Manning
Jad Knayzeh
Submitted on April 10th, 2014
1. Overview
This report explains the design and functionality of our project Audio_Decode_v1_00. It will go
over the following points:
● Intended final result
● Progress to date
● Theory behind the design
● Implementation - how to recreate the system from scratch
● Explanation of the directory structure and HDL
All explanations will be with reference to documents available in the Appendices and all hardware
or software designs will be explained and referenced to the original designs.
a. Project Description
The project was intended to be similar in nature to the popular video game, Guitar Hero, by
Activision. The idea was to make a more sophisticated game where the input is not just a plastic
guitar but instead a real guitar. Ideally it would have also been able to detect other instruments
and voice.
Due to time limitations, the scope was reduced drastically: the project now only processes the
audio signals and outputs the detected notes to the screen. Rather than outputting a single note,
we opted to have the system distinguish several notes at once. This would allow future iterations
of the project to detect chords played on a guitar, making the video game much more interesting
to the user and a much better learning tool.
b. Goals and Motivation
The idea was to create a video game that could help in teaching instruments. There are very few
services or tools for self-learning musical instruments that can provide feedback on whether the
user is hitting the right notes. While it would not be able to teach correct technique, it would
help train the user's ear to hear the correct notes.
The closest tool out there is a video game called Rocksmith which takes an electric guitar as an
input. The difference between Rocksmith and our intended design is that our design would not
be limited to an electric guitar because the input would be through a microphone. Of course, this
presents several challenges with noise interference and signal quality.
Ubisoft, the publisher of Rocksmith, has sold close to a million units since the release of the
game, with 59% of sales going to individuals who have never played the guitar. According to a
national study conducted by Research Strategy Group Inc., 1.4 million people have learned to
play the guitar using Rocksmith. Further, 95% of users claim to have improved their guitar skills
and 90% say that Rocksmith was more effective than traditional methods of learning.
Similarly, there’s no reason why a system that works just as well and can provide real-time
feedback for any instrument would not be just as successful in gaining a market share.
The final goal shifted due to time constraints, so a design that identifies the notes played by
the user became the final goal.
c. Block Diagram
The design involves several hardware blocks and an interface with a Microblaze Soft Processor
Core. The hardware blocks are used to interface with the microphone and the Microblaze. The
Microblaze then sends the samples to another sequence of hardware blocks which perform a
Fast Fourier Transform (FFT), get the magnitude of each output, and send the frequency peaks
back to the Microblaze to identify the notes being played.
For each note that has been identified, the Microblaze writes the note to an HDMI frame stored in
local BRAM. Rather than explicitly writing out the frequency, the Microblaze uses musical
notation to encode each frequency as three separate characters: a letter (A-G), a symbol (sharp,
flat or natural) and a number (1-9). Each of these three-character triples is written into BRAM
on a pixel-by-pixel basis until all notes have been represented in the local frame. This frame is
then written into DDR memory, while the HDMI OUT core reads from the DDR and displays the frame.
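The three-character encoding described above can be sketched in C as follows. This is only an illustration under stated assumptions: it assumes each note is identified by its MIDI number, spells every black key as a sharp, and uses 'n' to stand for the natural symbol. The `note_triple` type and the `midi_to_triple` function are hypothetical names, not taken from the project sources.

```c
/* Hypothetical three-character note label: letter, accidental, octave digit. */
typedef struct {
    char letter;      /* 'A'..'G' */
    char accidental;  /* '#' for sharp, 'n' for natural (assumed spelling) */
    char octave;      /* '1'..'9' */
} note_triple;

/* Convert a MIDI note number to a triple.
 * Returns 1 on success, 0 if the note falls outside octaves 1-9. */
static int midi_to_triple(int midi, note_triple *out)
{
    static const char letters[12] =
        { 'C', 'C', 'D', 'D', 'E', 'F', 'F', 'G', 'G', 'A', 'A', 'B' };
    static const char accidentals[12] =
        { 'n', '#', 'n', '#', 'n', 'n', '#', 'n', '#', 'n', '#', 'n' };
    int octave;

    if (midi < 0) {
        return 0;
    }
    octave = midi / 12 - 1;   /* MIDI 60 is C4, MIDI 69 is A4 */
    if (octave < 1 || octave > 9) {
        return 0;
    }
    out->letter = letters[midi % 12];
    out->accidental = accidentals[midi % 12];
    out->octave = (char)('0' + octave);
    return 1;
}
```

A caller would pass each detected note through `midi_to_triple` and then draw the three resulting characters into the frame.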
d. IP’s Used
Several IPs were used from various sources: Digilent’s main website [1], peer IP project 3, and
Xilinx Coregen.
Coregen is a tool that Xilinx provides to generate IP blocks that can be parameterized to suit
the user’s needs. Most blocks support the Microblaze’s AXI protocol. Further information
can be found in the AXI Reference Guide found at [2].
AC97 - audio in/out controller
The AC97 audio in/out controller available from Digilent comes with a sample EDK project that
shows how to use the Microblaze to interface with the on-board AC97 codec. The
“ac97.c/h” files in the sample project provide function calls that allow the user to:
● Configure the codec for various modes of operation (e.g. - configure, send/receive)
● Read data from either LINE IN or MIC IN
● Send out data through LINE OUT.
Data transmission is handled by XIO function calls by writing to or reading from set addresses
for left and right channels. Note that the user is required to block on requests for new frames
from the AC97.
HDMI OUT - video out controller
To handle the video out for the design, an HDMI Out video controller is used that acts as both an
AXI slave to the microblaze and a master for a DDR instance. The user can interface with this
core by writing HDMI frames into the DDR directly and then pointing the controller at a specific
address to display on the HDMI out. Once the frame has been written into memory the core
handles generation of HSYNC and VSYNC as required to properly display the frame, and
switching the frame can be achieved by changing the address pointer into DDR.
From XPS the user can control the color format and the screen horizontal and vertical resolution.
There are two color formats (RGB565, and RGB888X) and three screen resolutions (1280x720,
800x600, and 640x480) supported. The current parameters used are:
● RGB888X: 8 red, 8 green, 8 blue, 8 unused for 32bpp (bits/pixel)
● 1280x720 resolution
Note that if the resolution is dropped to 640x480 or 800x600 then the input clock needs to be
changed to 25MHz while 1280x720 requires 75MHz. Changing either the color format or the
resolution requires software updates as well since these parameters are used in the code as
defines.
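To make the cost of these parameters concrete, the sketch below computes the size of one frame and packs a pixel for the 32bpp format. The byte order inside the 32-bit word is an assumption (red in the top byte, the unused byte at the bottom) and should be checked against the HDMI OUT core; `frame_bytes` and `pack_rgb888x` are hypothetical helper names.

```c
#include <stdint.h>

/* Size in bytes of one display frame. */
static uint32_t frame_bytes(uint32_t hres, uint32_t vres, uint32_t bytes_per_pixel)
{
    return hres * vres * bytes_per_pixel;
}

/* Pack 8-bit colour components into one 32-bit pixel.
 * ASSUMPTION: layout is R[31:24] G[23:16] B[15:8] unused[7:0]. */
static uint32_t pack_rgb888x(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8);
}
```

With the current settings, `frame_bytes(1280, 720, 4)` gives 3,686,400 bytes per frame, compared with 1,228,800 bytes at 640x480.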
Coregen Fast Fourier Transform - FFT DSP block
The FFT block used was generated using the Fast Fourier Transform generator from Coregen.
The FFT was designed with a transform length of 4096 samples and an input resolution of 16 bits.
In setting up the FFT, the user should choose a 32-bit input word, because the input can be
complex: 16 of those bits carry the real part and 16 carry the imaginary part. The output is also
32 bits, 16 for imaginary and 16 for real. Screenshots are provided for the exact settings used
below:
Peak Detector
The peak detector is a custom IP block built around a FIFO generated by Coregen. The
Peak Detector supports AXI Stream and AXI4LITE, and also generates an interrupt to the Microblaze
once all 4096 samples of an FFT frame have been received.
The FIFO is generated using the Coregen FIFO Generator. The FIFO is 4096 words deep so that it
can hold the maximum number of peaks possible. Realistically, one should never get that many
peaks, but it is possible to fill the FIFO during the calibration process before the correct
threshold is known.
The peak detector is a simple comparator. The threshold is user adjustable via the Microblaze
using the AXI4LITE slave register. The input from the FFT and the output of the FIFO to the
Microblaze use AXI Stream protocol.
2. Outcome
As previously mentioned, the original goal for this system was to create a video game that
instructs users in how to play an instrument. The original system that would perform this
task had three main components: the audio input used for processing what the user was playing,
the video output used to provide visual cues for the user, and the audio output used to provide
aural cues for the user. Each of these three components was further divided into the
functionalities listed below.
Audio Input
○ Capture audio input from the instrument the user is playing
○ Decode the audio input to determine which notes are being played
○ Report these notes to the processor for use with the Video Output
Video Output
○ Display the results of the audio decoding synchronized to the pre-recorded track
○ Provide a visual cue for the user corresponding to which note to play at each
instance of the pre-recorded track (e.g. - see below for a sample from Rocksmith
as to how this is done).
Audio Output
○ Play a pre-recorded track for the user to play along with synchronized with the
video output
Due to time constraints, the project was ultimately re-scoped to take audio input and display
the notes from the audio input on the screen in real time. Based on this new scoping, the audio
input functionalities were left unchanged from above and the video out functionalities were
revised to remove the requirement of providing visual cues to prompt the user. The audio output
functionality listed above was removed as it was no longer required; instead, the recorded audio
input was played through the LINE OUT in real time. Accounting for the updated scoping, the
revised functionality list for the project is as follows:
● Capture audio input from the instrument the user is playing
● Decode the audio input to determine which notes are being played
● Report these notes to the processor for use with the Video Output
● Display the results of the audio decoding in real-time
The system that we designed was able to meet the first three requirements for audio input
processing. During testing, the input notes were recognized with a good deal of accuracy;
however, on occasion the system would also report directly adjacent notes that were not being
played, or assign a note to the wrong octave. This occurs because the frequency response cannot
be perfectly sharp around the peaks: the roll-off is slow enough that some adjacent bins have
amplitudes higher than the threshold value used for the test. To adjust for this, the software
was coded to disregard any notes above the threshold that occur within one frequency bin of a
previously recorded note. However, this currently means that for some peaks, the bin directly
below the peak is selected first, which prevents the desired note from being measured. The output note
values were determined by first converting the measured indices to frequency [Hz] using the
following equation:
$f = \text{index} \times \frac{F_s}{N}$ [Hz]
where $F_s = 44100$ Hz is the sampling rate, and $N = 4096$ is the FFT size. These frequencies
were then converted to their corresponding MIDI numbers, which map one-to-one to musical
notes, each made up of a pitch and an octave.
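A minimal C sketch of this conversion is given below. It assumes the usual equal-temperament convention in which A4 (440 Hz) is MIDI note 69; the closed form is m = 69 + 12·log2(f / 440), rounded to the nearest integer. To avoid depending on a math library, the sketch steps one semitone at a time instead of calling a logarithm. The function names are hypothetical and are not the ones used in the project code.

```c
#define SAMPLE_RATE_HZ 44100.0  /* F_s from the report */
#define FFT_SIZE       4096.0   /* N from the report */

/* Centre frequency in Hz of an FFT bin. */
static double bin_to_freq(unsigned index)
{
    return index * SAMPLE_RATE_HZ / FFT_SIZE;
}

/* Nearest MIDI note number for a frequency, or -1 for non-positive input.
 * ASSUMPTION: A4 = 440 Hz = MIDI 69, equal temperament. */
static int freq_to_midi(double freq_hz)
{
    const double semitone  = 1.0594630943592953;  /* 2^(1/12) */
    const double half_step = 1.0293022366434921;  /* 2^(1/24) */
    double ref = 440.0;
    int midi = 69;

    if (freq_hz <= 0.0) {
        return -1;
    }
    /* Walk up while the input lies above the upper edge of the current note. */
    while (freq_hz >= ref * half_step) {
        ref *= semitone;
        midi++;
    }
    /* Walk down while the input lies below the lower edge of the current note. */
    while (freq_hz < ref / half_step) {
        ref /= semitone;
        midi--;
    }
    return midi;
}
```

With these values each bin spans about 10.77 Hz. Neighbouring semitones are closer together than that below roughly 180 Hz, so low notes cannot be separated by bin index alone.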
On the video out side, the requirement for displaying the results of the audio processing in
real time was partially met: the results displayed correctly but introduced noticeable lag in
the system. The reason for this is that the IP we are using for video display requires each
display frame to be written into the DDR2 memory on a pixel-by-pixel basis. As a result, any
change in the video output requires a significant number of DDR accesses, which greatly slows
down the system. To mitigate this effect, we minimized the amount of information that
changes for a note: each time a new note is present, a black box appears next to the
corresponding note on screen and any black boxes from previous time intervals are cleared.
With this change the video output introduced clearly less lag on the system; however, it was
still not sufficient to ensure real-time operation.
Based on the results of testing our system, there are two main areas for improvement:
1. Find a method either in software or hardware to account for false detection of notes in a
+/-1 frequency bin from the peak
2. Create custom hardware to write display frames directly to memory to avoid the
bottleneck of software access to the off-chip DDR2.
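For the first improvement, one possible software approach is to keep a bin only when it is a local maximum among its immediate neighbours, rather than keeping whichever bin above the threshold arrives first. The sketch below is an illustration, not the project's implementation. It assumes the squared magnitudes of all bins are available to software in an array, which is not how the current peak detector delivers data. The name `pick_local_maxima` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect indices of bins that exceed the threshold and are local maxima.
 * Ties between equal neighbours are resolved in favour of the lower bin.
 * Returns the number of indices written to out (at most out_cap). */
static size_t pick_local_maxima(const uint32_t *mag, size_t n,
                                uint32_t threshold,
                                size_t *out, size_t out_cap)
{
    size_t found = 0;
    size_t i;

    for (i = 0; i < n && found < out_cap; i++) {
        uint32_t left, right;

        if (mag[i] <= threshold) {
            continue;
        }
        left  = (i > 0)     ? mag[i - 1] : 0;
        right = (i + 1 < n) ? mag[i + 1] : 0;
        if (mag[i] > left && mag[i] >= right) {
            out[found++] = i;
        }
    }
    return found;
}
```

In this scheme a bin on the slope of a peak is rejected even when it is above the threshold, so the bin directly below a peak no longer masks the peak itself.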
3. Project Schedule
Skip this section if your goal is to recreate the project. The following section provides insight on
the time commitment required to achieve the results presented.
The original schedule was as follows. Note the schedule drastically changed due to special
circumstances. The revised schedule is also presented.
1. Preliminary Schedule
a. Week 1 (Feb 26th)
- Simulate the system in Matlab
- Test simulation with multiple types of instruments
b. Week 2 (March 5th)
- Construct simplified DSP block from library pcores
- Test the system by feeding input streams to the buffer
- Integrate video-out pcore into design
c. Week 3 (March 12th)
- Customize hardware configuration
- Connect microphone and ADC
- Display notes on the monitor
d. Week 4 (March 19th)
- Program the game and run it on the MicroBlaze system
- Synchronize hardware with software
- Noise filtering
e. Week 5 (March 26th)
- Test game thoroughly
- Ensure accurate peak detection in the presence of noise and multipath
2. Revised Schedule
a. Week 1 (Feb 26th)
- Simulate the system in Matlab
- Test simulation with multiple types of instruments
b. Week 2 (March 5th)
- Make or acquire audio in core
- Test audio in core
c. Week 3 (March 12th)
- Generate FFT core and connect to Microblaze
- Simulate and verify design
d. Week 4 (March 19th)
- Create custom multiplier blocks to find FFT amplitude
- Create peak detector block and output FIFO
- Test FFT->multiplier setup and FFT->multiplier->peak detector setup
e. Week 5 (March 26th)
- Integrate and test interrupt generation using C program
- Integrate video out and design output characters
f. Week 6 (April 2nd)
- Finish integration by mapping the FFT peaks to notes
- Send those notes to the video out module and display them
As explained above, we were forced to rescope our project due to time constraints. The majority
of the functionality that we decided to remove was in the game aspect of the design. We
wanted to keep all the signal processing, however, because we found it to be more fundamental
to the project. Doing so also provides a more complete foundation for future groups interested in
digital signal processing to build on.
4. Description of Blocks
Digilent_AC97_Cntlr
The following system calls are provided in order for the microblaze to interface with the AC97
controller:
● AC97_Link_Is_Ready(AC97_BASEADDR): returns ‘0’ while waiting for AC97 codec to
come online. Should wait for return of ‘1’ before using device further
● AC97_Select_Input(AC97_BASEADDR,<channels>,<input>): to select whether one or
both input channels are used on either LINE IN or MIC IN
● AC97_Set_Tag_And_Id(AC97_BASEADDR, <code>): sets tag for AC97 codec to be
placed into configuration mode (0xF800) or send/receive mode (0x9800)
● AC97_Set_Volume(AC97_BASEADDR,<device>): change playback volume for codec
● AC97_Unmute(AC97_BASEADDR, <device>): enable audio output (device = master
volume, headphone volume, pcm volume)
● AC97_Wait_For_New_Frame(AC97_BASEADDR): block program execution until a new
frame is ready
○ Frame read via XIo_In32 from standard location in DDR2 memory
○ Similarly, XIo_Out32 used to write data out via the codec
fft_16bit - Coregen Fast Fourier Transform
The FFT block uses the AXI Stream protocol on both sides: the Microblaze sends it the input
samples, and the multiplier receives each output of the FFT and finds its magnitude.
The signals are as follows:
input ACLK;
input ARESETN;
input S_AXIS_TVALID;
input [31 : 0] S_AXIS_TDATA;
input S_AXIS_TLAST;
input M_AXIS_TREADY;
output M_AXIS_TVALID;
output [31 : 0] M_AXIS_TDATA;
output M_AXIS_TLAST;
output S_AXIS_TREADY;
These signals are directly attached to the generated core which is instantiated inside the
wrapper called fft_16bit. There are other signals that are left unconnected in the instantiated
core. To learn more about them, refer to the datasheet [3] in the Datasheets directory.
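As an illustration of how a 16-bit audio sample might be placed on the 32-bit S_AXIS_TDATA bus, the sketch below packs a real and an imaginary part into one word. It assumes the input word uses the same layout this report gives for the FFT output (imaginary part in the upper 16 bits, real part in the lower 16 bits); this should be confirmed against datasheet [3]. `pack_fft_input` is a hypothetical helper name.

```c
#include <stdint.h>

/* Pack one complex sample into a 32-bit AXI Stream word.
 * ASSUMPTION: layout is {imaginary[15:0], real[15:0]}, as for the FFT output. */
static uint32_t pack_fft_input(int16_t re, int16_t im)
{
    /* Cast through uint16_t so negative values are not sign-extended
     * into the other half of the word. */
    return ((uint32_t)(uint16_t)im << 16) | (uint32_t)(uint16_t)re;
}
```

For purely real audio samples the imaginary part would simply be passed as zero, for example `pack_fft_input(sample, 0)`.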
hdmi_out
The user can interface with the hdmi_out core through a configurable status register and by
writing the frames to be displayed to the DDR2 memory.
Configuration Register:
● volatile int* hdmi_reg = (int*) XPAR_HDMI_OUT_0_BASEADDR
● hdmi_reg[0] = HRES (set by user)
● hdmi_reg[1] = DDR address for display frame
● hdmi_reg[2] = “GO” signal for the core
○ Set GO = 1 to display the frame, GO = 0 cuts the display
DDR2 Access:
● volatile u32* ddr_addr = (volatile u32*) XPAR_S6DDR_0_S0_AXI_BASEADDR
● ddr_addr[index] = pixel color in format specified by HDMI OUT block
Writing pixel-by-pixel into the DDR2 memory can be accomplished directly in code without
requiring any additional function calls, however this is slow since it requires multiple DDR
accesses for every pixel.
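A minimal sketch of such a pixel-by-pixel write is shown below: it fills a rectangle in a row-major frame, which is the kind of operation needed to draw or clear a box next to a note. The row-major indexing (`y * hres + x`) and the function name `fill_rect` are assumptions made for illustration; in the real system the `frame` pointer would be the DDR2 base address and `hres` would match the value written to the HRES register.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a w-by-h rectangle whose top-left corner is (x0, y0).
 * ASSUMPTION: the frame is stored row-major, one 32-bit word per pixel. */
static void fill_rect(volatile uint32_t *frame, uint32_t hres,
                      uint32_t x0, uint32_t y0,
                      uint32_t w, uint32_t h, uint32_t color)
{
    uint32_t x, y;

    for (y = y0; y < y0 + h; y++) {
        for (x = x0; x < x0 + w; x++) {
            frame[(size_t)y * hres + x] = color;  /* one memory write per pixel */
        }
    }
}
```

Every pixel costs one write, so a w-by-h box costs w × h memory accesses. This is why limiting updates to small boxes reduces the lag.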
multiplier
The multiplier has the same pinout as the FFT, presented here again for convenience:
input ACLK;
input ARESETN;
input S_AXIS_TVALID;
input [31 : 0] S_AXIS_TDATA;
input S_AXIS_TLAST;
input M_AXIS_TREADY;
output M_AXIS_TVALID;
output [31 : 0] M_AXIS_TDATA;
output M_AXIS_TLAST;
output S_AXIS_TREADY;
The structure inside is slightly different. Inside you will find two complex_multiplier instances.
Both instances are generated using Coregen. The reason for using a complex_multiplier
instead of a regular multiplier is that the supplied complex multiplier supports the AXI STREAM
interface. Since the complex multiplier was used as a real number multiplier only, the imaginary
inputs were set to zero and only the real inputs were used to calculate the product.
One instance of the multiplier takes the real output from the FFT and squares it and the other
instance takes the imaginary output from the FFT and squares it. The outputs of the multipliers
are then added. Strictly, to get the magnitude, the square root of this sum should be taken.
However, the square root is monotonic, so comparing squared magnitudes gives the same ordering
as comparing magnitudes. Since we are only interested in the relative magnitudes between the FFT
outputs, we opted to not perform the expensive square root function and simply compare the
values as they are.
An important distinction: the FFT outputs 16 bit real and 16 bit imaginary values. The multiplier
takes two 32 bit inputs, each of which takes 16 bit real and 16 bit imaginary numbers. The FFT
output is not fed directly into the multiplier. Instead, both inputs to one instance of the
complex_multiplier get the 16 bit real value (with 0 padding) and both inputs of the other instance
get the 16 bit imaginary value (with 0 padding).
i.e. FFT output: {16’bimaginary_value, 16’breal_value}
complex_multiplier_real input A: {16’b0, 16’breal_value}
complex_multiplier_real input B: {16’b0, 16’breal_value}
complex_multiplier_imag input A: {16’b0, 16’bimaginary_value}
complex_multiplier_imag input B: {16’b0, 16’bimaginary_value}
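The arithmetic performed by the two multiplier instances and the adder can be modelled in software as below. This is only a behavioural sketch for checking values, not the hardware implementation; it assumes the FFT output halves are signed 16-bit two's-complement values, and the function names are hypothetical.

```c
#include <stdint.h>

/* Split an FFT output word laid out as {imaginary[15:0], real[15:0]}.
 * The casts rely on the usual two's-complement conversion behaviour. */
static int16_t fft_real(uint32_t word)
{
    return (int16_t)(word & 0xFFFFu);
}

static int16_t fft_imag(uint32_t word)
{
    return (int16_t)(word >> 16);
}

/* Squared magnitude re*re + im*im; the square root is deliberately skipped.
 * The worst case, 2 * 32768^2 = 2^31, still fits in 32 unsigned bits. */
static uint32_t magnitude_squared(uint32_t word)
{
    int32_t re = fft_real(word);
    int32_t im = fft_imag(word);

    return (uint32_t)(re * re) + (uint32_t)(im * im);
}
```

For example, an output word with imaginary part 3 and real part 4 gives 25 rather than 5, which is sufficient for comparison against a threshold expressed in the same squared units.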
To learn more about the complex_multiplier, refer to datasheet [4] in the Datasheets directory.
peak_detector
The peak detector has an AXI-Stream interface as well as an IPIF interface. The pinout is shown
here:
// AXI-Stream signals:
input S_AXIS_TVALID;
input [31 : 0] S_AXIS_TDATA;
input S_AXIS_TLAST;
input M_AXIS_TREADY;
output M_AXIS_TVALID;
output [31 : 0] M_AXIS_TDATA;
output M_AXIS_TLAST;
output S_AXIS_TREADY;
output Interrupt;
// Bus protocol ports
input Bus2IP_Clk;
input Bus2IP_Resetn;
input [31 : 0] Bus2IP_Data;
input [3 : 0] Bus2IP_BE;
input [3 : 0] Bus2IP_RdCE;
input [3 : 0] Bus2IP_WrCE;
output [31 : 0] IP2Bus_Data;
output IP2Bus_RdAck;
output IP2Bus_WrAck;
output IP2Bus_Error;
This core also has two configuration registers, declared alongside a FIFO count wire and a
read-back data register:
reg [31 : 0] threshold_reg0;
reg [31 : 0] int_rst_reg0;
wire [12 : 0] data_count;
reg [31 : 0] slv_ip2bus_data;
These registers can be used as follows:
● threshold_reg0: holds the value of the threshold to compare against
● int_rst_reg0: allows microblaze to exit reset state
● slv_ip2bus_data: various axi and fifo IO wires for debugging purposes
● data_count: output from FIFO specifying the number of items in the FIFO
The reading and writing of these registers is done through the AXI4LITE interface in compliance
with the standard IPIF bus protocol.
The peak detector accepts input from the AXI-Stream exiting the multiplier block. The multiplier
streams in 4096 32-bit samples, each of which is compared against the threshold written to
threshold_reg0. Any samples larger than the threshold value are piped into a FIFO (generated
with CoreGen [5]), the contents of which are read by the MicroBlaze on an interrupt. The
Interrupt signal is asserted once, when the peak detector receives the final sample from the
multiplier. This is detected through the use of TLAST, which is piped from the
FFT, through the complex multiplier, to the peak detector.
To learn more about the FIFO, refer to datasheet [5] in the Datasheets directory.
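The behaviour described above can be approximated with the following software model, which may be useful for checking thresholds offline. It is a sketch under assumptions: it stores the bin index of each sample that exceeds the threshold (the report does not state whether the FIFO holds indices or values), and it models TLAST as the last element of the input array. The `peak_fifo` type and the `peak_detect` function are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 4096u  /* matches the FIFO depth given in the report */

typedef struct {
    uint32_t data[FIFO_DEPTH];  /* ASSUMPTION: holds bin indices */
    size_t   count;             /* plays the role of data_count */
    int      interrupt;         /* set when the last sample has been seen */
} peak_fifo;

/* Stream n squared-magnitude samples through a threshold comparator. */
static void peak_detect(const uint32_t *mag, size_t n,
                        uint32_t threshold, peak_fifo *fifo)
{
    size_t i;

    fifo->count = 0;
    fifo->interrupt = 0;
    for (i = 0; i < n; i++) {
        if (mag[i] > threshold && fifo->count < FIFO_DEPTH) {
            fifo->data[fifo->count++] = (uint32_t)i;
        }
        if (i + 1 == n) {
            fifo->interrupt = 1;  /* models TLAST on the final sample */
        }
    }
}
```

Setting the threshold too low in this model fills the FIFO in the same way the report describes for the calibration phase.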
5. Description of Design Tree
./doc/ - project documentation
./fft_test_0/ - C-code for running the project
● “Debug” mode has errors when programming the FPGA so build to “Release”
● ac97_demo.c contains the main method for the code
● ac97.c code from Digilent website to interface with the AC97 codec
● pixel_draw.c code written to program characters into DDR on a pixel-by-pixel basis for
use with HDMI OUT
./project/ - Contains XPS project defining the Audio_Decode_v1_00 system
./pcores/ - Contains all the custom pcores that are used in the design
● d_ac97_axi_v1_00_a
● fft_16bit_v1_00_b
● hdmi_out_v1_00_a
● multipler_v1_00_a
● peak_detector_v1_00_b
6. References
[1] Digilent AC97 pcore
http://www.digilentinc.com/Data/Products/ATLYS/Atlys_AC97_EDK_demo.zip
[2] AXI Reference Guide UG761 (v13.4)
http://www.xilinx.com/support/documentation/ip_documentation/axi_ref_guide/v13_4/ug761_axi_reference_guide.pdf
[3] Xilinx DS808 LogiCORE IP Fast Fourier Transform v8.0
http://www.xilinx.com/support/documentation/ip_documentation/ds808_xfft.pdf
[4] Xilinx PG104 LogiCORE IP Complex Multiplier v6.0
http://www.xilinx.com/support/documentation/ip_documentation/cmpy/v6_0/pg104-cmpy.pdf
[5] Xilinx DS317 LogiCORE IP FIFO Generator v8.1
http://www.xilinx.com/support/documentation/ip_documentation/fifo_generator_ds317.pdf