AQuA DLL library - develop your own voice and audio quality software

AQuA DLL Library

Purpose of the library
Library initialization
Library information
Data structure containing audio quality measurements
Audio quality estimation functions
Settings of audio quality algorithm
Checking file formats
Displaying audio quality testing results
Analysis of possible reasons for audio quality loss
    Duration distortion
    Delay/Advancing of audio signal activity
    Audio signal activity mistiming
    Corrupted signal spectrum
AQuA software benefits
Simple example of using AQuA DLL

Purpose of the library

AQuA DLL provides intrusive audio signal quality measurement by comparing an original and a test sound signal. The original signal is treated as the reference (etalon) of audio quality: the more the test signal differs from it, the lower the quality estimation. The software uses a set of objective and subjective characteristics for perceptual audio quality measurement.

The library gives developers access to the settings of the audio quality estimation algorithm, to the results of audio quality testing, to helpers that prepare these results for display, and to information about the library itself. The AQuA library contains the following functions:

Library initialization

Library initialization includes two functions:

SSA_SDK_API bool SSA_InitLib(void);

- performs library initialization and checks the library copyrights and the software license period and validity. If initialization is successful the function returns “true”, otherwise the result is “false”.

SSA_SDK_API void SSA_ReleaseLib(void);

- this function is called when work with the library is finished. Invoking it is important for further work with the library.
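Because SSA_ReleaseLib must be called once work is finished, the two calls can be paired with a small scope guard. The following is a minimal sketch, not part of the library: the class name ScopedLibrary is hypothetical, and the init and release functions are passed in as pointers so that the sketch compiles without AQuAdll.h. It assumes, as the example at the end of this document does, that the release call is only needed after a successful initialization.

```cpp
// Hypothetical scope guard (not part of the AQuA API).
// Calls the release function on scope exit, but only if init succeeded.
class ScopedLibrary {
public:
    using InitFn = bool (*)();
    using ReleaseFn = void (*)();

    ScopedLibrary(InitFn init, ReleaseFn release)
        : release_(release), ok_(init()) {}

    ~ScopedLibrary() {
        if (ok_) release_();
    }

    // Copying would lead to a double release, so it is disabled.
    ScopedLibrary(const ScopedLibrary&) = delete;
    ScopedLibrary& operator=(const ScopedLibrary&) = delete;

    // True if the init function reported success.
    bool ok() const { return ok_; }

private:
    ReleaseFn release_;
    bool ok_;
};
```

In real code the guard would be constructed as ScopedLibrary lib(SSA_InitLib, SSA_ReleaseLib); and all further library calls made only while lib.ok() is true.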

Library information

This function returns information about the library:

SSA_SDK_API TSSA_AQuA_Info * SSA_GetPAQuAInfo(void);

- returns a pointer to a data structure containing information about the AQuA library. Function SSA_GetPAQuAInfo() may be called before the library is initialized by function SSA_InitLib. The returned data structure has the following format:

struct TSSA_AQuA_Info
{
    int dStructSize;                       // Structure size
    wchar_t * dCopyrightString;            // Copyright string
    wchar_t * dVersionString;              // Product name and version number string
    int dSampleRateLimit;                  // Maximal sampling frequency
                                           // supported by the library
    int dChannelsLimit;                    // Maximal amount of channels
                                           // supported by the library
    bool isDifferentFFmtCheckingEnabled;   // If comparison of audio files in different formats
                                           // is allowed
    wchar_t * pSupportedBitsPerSampleList; // List of supported sample bits
    wchar_t * pSupportedCodecsList;        // List of supported audio compression algorithms
};

Data structure containing audio quality measurements

The structure contains the quality estimations produced by a measurement. If an estimation (Percentage, MOS or PESQ) was not requested, its corresponding field contains -1.

struct TSSA_AQuA_Results
{
    double dPercent;   // Percentage
    double dMOSLike;   // MOS-like
    double dPESQLike;  // PESQ-like
};

The structure is filled with the help of the following function:

SSA_SDK_API int SSA_FillQualityResultsStruct(void * anSSA_ID, TSSA_AQuA_Results * aPQResults);

- fills the structure with the results of audio quality measurements.
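A caller will usually want to skip the estimations that were not produced. The sketch below shows one way to do that. It assumes that -1 marks an estimation that was not requested, and it declares a local copy of TSSA_AQuA_Results so that it compiles without AQuAdll.h (remove that copy when the real header is included). The helper names WasRequested and DescribeResults are hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Local copy of the structure described above, only so the sketch is
// self-contained. Remove it when AQuAdll.h is included.
struct TSSA_AQuA_Results {
    double dPercent;
    double dMOSLike;
    double dPESQLike;
};

// Assumption: -1 marks an estimation that was not requested.
inline bool WasRequested(double value) { return value != -1.0; }

// Builds a one-line summary that lists only the requested estimations.
inline std::wstring DescribeResults(const TSSA_AQuA_Results& r) {
    std::wostringstream out;
    out << std::fixed << std::setprecision(2);
    const wchar_t* separator = L"";
    auto add = [&](const wchar_t* name, double value) {
        if (!WasRequested(value)) return;
        out << separator << name << L"=" << value;
        separator = L", ";
    };
    add(L"Percent", r.dPercent);
    add(L"MOS-like", r.dMOSLike);
    add(L"PESQ-like", r.dPESQLike);
    return out.str();
}
```

The structure would be filled by SSA_FillQualityResultsStruct before being passed to DescribeResults.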

Audio quality estimation functions

Audio quality estimation is represented by three functions that create, release and start the quality analyzer.

SSA_SDK_API void * SSA_CreateAudioQualityAnalyzer(void);

- creates an audio analyzer. If successful, the function returns the analyzer’s handle; otherwise it returns NULL.

SSA_SDK_API void SSA_ReleaseAudioQualityAnalyzer(void * anSSA_ID);

- finishes work with the analyzer and deletes all variables used during its work.

SSA_SDK_API int SSA_OnTestAudioFiles(void * anSSA_ID);

- performs quality estimation for the audio files passed to the analyzer through the setting functions. The example at the end of this document treats a return value of 0 as success.
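Since every analyzer created must also be released, the handle can be owned by a std::unique_ptr with a custom deleter. This is only a sketch: the names AnalyzerHandle and MakeAnalyzer are hypothetical, and the create and release functions are passed as pointers so the code compiles without the library header.

```cpp
#include <memory>

// Hypothetical owning handle: the deleter is the analyzer release function.
using AnalyzerReleaseFn = void (*)(void*);
using AnalyzerHandle = std::unique_ptr<void, AnalyzerReleaseFn>;

// Creates an analyzer and ties its lifetime to the returned handle.
// If creation fails (NULL), the handle is empty and the deleter never runs.
inline AnalyzerHandle MakeAnalyzer(void* (*create)(), AnalyzerReleaseFn release) {
    return AnalyzerHandle(create(), release);
}
```

With the real library this would read MakeAnalyzer(SSA_CreateAudioQualityAnalyzer, SSA_ReleaseAudioQualityAnalyzer), and handle.get() would be passed wherever the API expects anSSA_ID. The handle must go out of scope before SSA_ReleaseLib is called.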

Settings of audio quality algorithm

All parameters of the analyzer are set with a single function:

SSA_SDK_API bool SSA_SetAny(void * anSSA_ID, wchar_t * aPParName, void * aPParValue);

There are three input parameters:

anSSA_ID – identifier of the analyzer;
aPParName – parameter name;
aPParValue – pointer to the value assigned to the parameter.

The table below lists the parameters and, where applicable, the range of their values.

Name                      Type      Range
SrcAudioFileName          wchar_t
TstAudioFileName          wchar_t
FaultsReportFileName      wchar_t
CoefficientsType          int       0, 1, 2
EnergyNormalizationFlag   bool
NumberOfLinkPoints        int       1..10
EnvelopeSmoothingLevel    int       1..10
OutputEstimations         wchar_t   %, m, p
QualityMode               int       0, 1
SAPrecisionDegree         int       8..16
DeltaCorrectionFlag       bool
IntegrationMode           int       0, 1, 2
MusicalPriority           bool
TstStartDelay             long      ms
BOFBandwidth              double    Hz
EOFBandwidth              double    Hz

L"SrcAudioFileName" - name of the file containing the original audio;

L"TstAudioFileName" - name of the file under test (degraded);

L"FaultsReportFileName" - name of the file in which the reasons for audio quality loss are stored;

L"CoefficientsType" - type of weight coefficients for frequency groups. These coefficients control how much each frequency band contributes to the overall audio signal quality. There are three types of weight coefficients:

o 0 (uniform) – equal, uniform contribution of all frequency bands
o 1 (linear) – the contribution of a frequency band is inversely proportional to its energy
o 2 (logarithmic) – the contribution of a frequency band is inversely proportional to its loudness

We recommend using linear or logarithmic coefficients when signal quality is especially important in the high frequency bands of the signal.

L"EnergyNormalizationFlag" - energy normalization flag. Normalizing energy may be useful if it is known in advance that signal processing changes the amplitude of the signal under test uniformly. In other cases an attempt to normalize energy may make the comparison procedure unstable.

L"NumberOfLinkPoints" - amount of link points. If the original signal and the signal under test contain reasonably long pauses, the compared files can be split virtually. In that case a quality estimation is obtained for each pair of virtual files and these estimations are then processed further to obtain the overall quality score. There is an option that detects the amount of virtual files automatically (mode auto). Setting the amount of link points manually must be done very carefully, because it may desynchronize the original signal and the signal under test.

L"EnvelopeSmoothingLevel" - envelope smoothing level. This option controls how smoothly the audio activity detector (or voice activity detector, VAD) works. The higher the value of this parameter, the smoother the change from one state of the detector to another.

L"OutputEstimations" - list of quality estimation values for output. The software can return three kinds of audio quality estimation:

o % – audio quality estimation in percent
o M – MOS-like estimation
o P – PESQ-like estimation

Obtaining MOS-like and PESQ-like estimations does not require any additional signal processing.

L"QualityMode" - mode of audio quality measurement. The software allows obtaining two types of quality measurements:

o 0 (quality) – typical audio quality estimation
o 1 (naturalness) – detecting how natural the audio sounds

In most cases the first type of estimation (0, quality) is optimal; testing how natural the audio sounds is an experimental estimation characterizing audio quality.

L"SAPrecisionDegree" - precision degree of the spectrum analyzer. This parameter controls the trade-off between the speed and the precision of the quality estimation. There is an option to define the precision degree automatically from the sampling frequency (mode auto), which gives an optimal ratio between speed and accuracy.

L"DeltaCorrectionFlag" - delta correction flag. Turning delta correction on makes the algorithm consider additional factors that may cause quality loss in the audio signal. When it is enabled, the quality score will be lower if these factors are present in the audio.

L"IntegrationMode" - sets the working mode of the software integrator. There are three modes of integration:

o 0 (linear)
o 1 (log)
o 2 (10log)

The integration mode controls the work of the quality estimation algorithm. The most “sensitive” is the linear mode of integration.

L"MusicalPriority" - sets the type of the compared signals. When enabled, the software treats the input signals as music; when disabled, as speech.

L"TstStartDelay" - time shift from the beginning of the signal in milliseconds. This option allows the user to exclude the starting fragment of the test file from analysis, thus tuning the algorithm for the user’s tasks.

L"BOFBandwidth", L"EOFBandwidth" - beginning and end of the bandwidth under test. These parameters allow the user to restrict the analysis to specific frequency bands and tune the quality estimation algorithm for the user’s tasks.
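Since SSA_SetAny accepts every parameter through an untyped pointer, it can help to check integer values against the ranges in the parameter table before passing them on. The following is a minimal sketch of such a check. The names IntParamRange, kIntParamRanges and IsIntParamInRange are hypothetical, and the ranges are copied from the table. The "auto" modes mentioned for NumberOfLinkPoints and SAPrecisionDegree are not covered, because this document does not state how they are selected.

```cpp
#include <string>

// Hypothetical helper type: an integer parameter and its allowed range.
struct IntParamRange {
    const wchar_t* name;
    int min;
    int max;
};

// Ranges copied from the parameter table in this document.
static const IntParamRange kIntParamRanges[] = {
    {L"CoefficientsType", 0, 2},
    {L"NumberOfLinkPoints", 1, 10},
    {L"EnvelopeSmoothingLevel", 1, 10},
    {L"QualityMode", 0, 1},
    {L"SAPrecisionDegree", 8, 16},
    {L"IntegrationMode", 0, 2},
};

// Returns true if `name` is a known integer parameter and `value`
// lies inside its documented range (bounds included).
inline bool IsIntParamInRange(const std::wstring& name, int value) {
    for (const IntParamRange& p : kIntParamRanges) {
        if (name == p.name) return value >= p.min && value <= p.max;
    }
    return false;  // unknown parameter name
}
```

A caller would run this check first and call SSA_SetAny with a pointer to the value only when it returns true.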

Checking file formats

File format checking is represented by the following two functions:

SSA_SDK_API bool SSA_IsFileFormatSupportable(void * anSSA_ID, wchar_t * aPFName);

- checks whether the format of the file is supported by the library. If the format is supported the function returns “true”, otherwise the return value is “false”.

SSA_SDK_API bool SSA_AreFilesComparable(void * anSSA_ID, wchar_t * aPSrcFName, wchar_t * aPTstFName);

- checks whether the files can be compared. If the file formats are supported and comparison is supported for these formats, the return value is “true”, otherwise the function returns “false”.

Displaying audio quality testing results

SSA_SDK_API int SSA_GetQualityStringSize(void * anSSA_ID);

- returns the length of the string containing the test results as text.

SSA_SDK_API int SSA_FillQualityString(void * anSSA_ID, wchar_t * aPString);

- fills the string with the text of the test result. The caller must allocate memory for the string. The amount of memory required can be found with function SSA_GetQualityStringSize.

SSA_SDK_API int SSA_GetSrcSignalSpecSize(void * anSSA_ID);

- returns the size of the array for the integral energy spectrum of the original signal.

SSA_SDK_API int SSA_GetTstSignalSpecSize(void * anSSA_ID);

- returns the size of the array for the integral energy spectrum of the signal under test.

SSA_SDK_API int SSA_FillSrcSignalSpecArray(void * anSSA_ID, float * aPSpecArray);

- fills the array with the integral energy spectrum of the original signal.

SSA_SDK_API int SSA_FillTstSignalSpecArray(void * anSSA_ID, float * aPSpecArray);

- fills the array with the integral energy spectrum of the signal under test.

For all four spectrum functions, note that the signal spectrum is available only after quality estimation has been performed and only in the mode “QualityMode” = 0. If the signal spectrum was not calculated the function returns 0; in case of error the function returns -1.

SSA_SDK_API int SSA_GetFaultsAnalysisStringSize(void * anSSA_ID);

- returns the size of the string containing the reasons for quality loss. The string size does not include the 0 symbol at the end of the string.

SSA_SDK_API int SSA_FillFaultsAnalysisString(void * anSSA_ID, wchar_t * aPString);

- fills the string with the reasons for audio quality loss. String aPString contains only meaningful symbols and does not contain the 0 symbol at the end.
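The size and fill functions are meant to be used as a pair: ask for the size, allocate, then fill. Because the faults string is returned without a terminating 0 symbol, the buffer should be zero-filled and one element longer than the reported size. The sketch below wraps this pattern. The name FetchString is hypothetical, and the two library functions are passed in as callables so that the code compiles without AQuAdll.h. The return value of the fill function is ignored because its meaning is not specified in this document.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for the "get size, then fill" pattern.
// getSize(analyzer) returns the string length, fill(analyzer, buffer) writes it.
template <typename SizeFn, typename FillFn>
std::wstring FetchString(void* analyzer, SizeFn getSize, FillFn fill) {
    const int length = getSize(analyzer);
    if (length <= 0) return std::wstring();
    // One extra zero-initialized element keeps the text terminated even
    // when the library writes no trailing 0 symbol.
    std::vector<wchar_t> buffer(static_cast<std::size_t>(length) + 1, L'\0');
    fill(analyzer, buffer.data());
    return std::wstring(buffer.data());
}
```

With the real library the call would look like FetchString(id, SSA_GetFaultsAnalysisStringSize, SSA_FillFaultsAnalysisString), and likewise for the quality string pair.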

Analysis of possible reasons for audio quality loss

Besides the audio quality score, AQuA makes it possible to analyze and determine possible reasons for audio signal degradation. The software automatically prepares analysis results, which can be returned as a string or stored in a log file, depending on the chosen option. The additional audio quality metrics returned by the system are not trivial to understand, so this chapter describes the main principles of how these metrics are built and how they can be interpreted. AQuA returns additional metrics only when they are outside the range of their “typical values”. When all metrics are within range the system returns “Cannot determine the major reason for audio quality loss”.

Duration distortion

This metric represents the continuity of the compared audio files. Ideally the amount of audio data in the original signal and in the file under test should be the same. During audio processing or transfer over communication channels, audio fragments may be lost or inserted into the audio. If such degradation took place, the value of this metric differs from 100. The bigger the difference, the stronger the degradation; however, this metric does not consider possible starting pauses.

When the value is less than 100%, audio data was lost and the analysis result will be:

Audio shrinking corresponds to XX.XX percent.

where XX.XX corresponds to the deviation from 100%. When the actual value is more than 100%, data was inserted and the analysis result will be:

Audio stretching corresponds to XX.XX percent.

where XX.XX corresponds to the deviation from 100%. The tolerance range for this value is set to 100% ± 1%.
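The rule above can be written down as a small function. This sketch only illustrates the documented thresholds and message templates. It is not the library's implementation, the function name DescribeDuration is hypothetical, and treating the bounds of the tolerance range as "within tolerance" is an assumption.

```cpp
#include <cmath>
#include <iomanip>
#include <sstream>
#include <string>

// durationPercent: value of the duration metric, where 100 means that the
// test signal carries the same amount of audio data as the original.
// Returns an empty string when the value is within 100% +/- 1%.
inline std::wstring DescribeDuration(double durationPercent) {
    const double deviation = std::fabs(durationPercent - 100.0);
    if (deviation <= 1.0) return std::wstring();  // assumption: bounds included
    std::wostringstream out;
    out << L"Audio " << (durationPercent < 100.0 ? L"shrinking" : L"stretching")
        << L" corresponds to " << std::fixed << std::setprecision(2)
        << deviation << L" percent.";
    return out.str();
}
```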

Delay/Advancing of audio signal activity

This metric represents the signal shift in the test file compared to the original. It determines by how much the active level of the test signal is delayed relative to, or runs ahead of, the active level of the original (etalon) signal. When the test signal is delayed the analysis returns the following:

Signal delayed by XX.XX ms.

where XX.XX is the delay time in milliseconds. Correspondingly, when the signal runs ahead of the original the returned string is:

Signal advances the original by -XX.XX ms.

where XX.XX is the advancing time. The tolerance range for this value is the interval of ±50 ms.
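A program that consumes the analysis text may want the shift as a number. The sketch below extracts it with a regular expression. It assumes that the report lines follow the two templates above exactly and use a dot as the decimal separator. The function name ParseShiftMs and the sign convention (positive for delay, negative for advance) are choices made for this sketch.

```cpp
#include <cmath>
#include <regex>
#include <string>

// Extracts the shift in milliseconds from one line of the analysis text.
// Positive result: test signal delayed. Negative result: test signal ahead.
// Returns false if the line is not a delay/advance message.
inline bool ParseShiftMs(const std::wstring& line, double& shiftMs) {
    static const std::wregex delayed(
        LR"(Signal delayed by (-?\d+(?:\.\d+)?) ms\.)");
    static const std::wregex advanced(
        LR"(Signal advances the original by (-?\d+(?:\.\d+)?) ms\.)");
    std::wsmatch match;
    if (std::regex_search(line, match, delayed)) {
        shiftMs = std::stod(match[1].str());
        return true;
    }
    if (std::regex_search(line, match, advanced)) {
        // The template already carries a minus sign; normalize to negative.
        shiftMs = -std::fabs(std::stod(match[1].str()));
        return true;
    }
    return false;
}
```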

Audio signal activity mistiming

This metric represents the desynchronization of active levels in the original (etalon) signal and the signal under test. The two signals are processed together to determine the characteristics of audio activity, and whenever the current characteristics of audio activity do not match, the system increases a desynchronization counter. After processing, the final value is presented as the percentage of cases in which desynchronization was detected.

If the metric value is not zero, the analysis result represents it as:

Audio signal activity mistiming (unsynchronization) is XX.XX percent.

where XX.XX is the percentage of desynchronization. The value is not considered if it is less than 1%.
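The counting described above can be illustrated as follows. This is a simplified sketch and not the library's algorithm: it assumes that audio activity has already been reduced to one active/inactive flag per frame for each signal, and that the frames of both signals are aligned one to one. The function name MistimingPercent is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Percentage of frames in which the activity flags of the original and
// the test signal disagree. Only the common length is compared.
inline double MistimingPercent(const std::vector<bool>& srcActive,
                               const std::vector<bool>& tstActive) {
    const std::size_t frames = std::min(srcActive.size(), tstActive.size());
    if (frames == 0) return 0.0;
    std::size_t mismatches = 0;  // the desynchronization counter
    for (std::size_t i = 0; i < frames; ++i) {
        if (srcActive[i] != tstActive[i]) ++mismatches;
    }
    return 100.0 * static_cast<double>(mismatches) / static_cast<double>(frames);
}
```

With the 1% threshold from the text, a caller would report the metric only when the returned value is 1.0 or more.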

Corrupted signal spectrum

This is a set of metrics reflecting differences between the integral energy spectra of the original signal and of the audio under test. If the overall difference between the spectra is more than 15%, the analysis returns the following string:

Corrupted signal spectrum.

If difference in spectrums is multidirectional (goes both into positive and negative zones) analysis returns the following string:

Vibration along the whole spectrum [-XX.XX, YY.YY] %

where XX.XX and YY.YY are the deviations into the negative and positive zones respectively. The tolerance range of the deviation is ±5%.

If spectrum distortions are unidirectional (only negative or only positive) analysis returns this string:

Amplification approaches YY.YY %

when distortions are positive, or

Attenuation approaches XX.XX %

when distortions are negative.

Other metrics returned by the analysis correspond to distortions that occurred in different frequency groups. The analysis of the individual frequency bands is performed in a manner similar to the analysis of the whole spectrum. The frequency bands in question are:

Low frequencies – below 1000 Hz

Medium frequencies – from 1000 Hz to 3000 Hz

High frequencies – above 3000 Hz

Different tolerance ranges are used when analyzing the frequency bands. Distortion is taken into account when it is greater than 5% in low frequencies, 10% in medium frequencies and 30% in high frequencies.

Multidirectional spectrum changes (vibration) are taken into account when they are greater than 2.5% in low frequencies, 7% in medium frequencies and 15% in high frequencies.

Unidirectional distortions (no matter whether positive or negative) are taken into account when they are greater than 5% in low frequencies, 10% in medium frequencies and 25% in high frequencies.
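For reference, the band limits and per-band thresholds can be collected in one lookup. The sketch below simply restates the figures from this section. The type and function names are hypothetical, and placing the boundary values of exactly 1000 Hz and 3000 Hz in the medium band follows the wording "from 1000 Hz to 3000 Hz".

```cpp
// Frequency bands as defined in this section.
enum class Band { Low, Medium, High };

// Maps a frequency in Hz to its band.
inline Band BandOf(double hz) {
    if (hz < 1000.0) return Band::Low;
    if (hz <= 3000.0) return Band::Medium;
    return Band::High;
}

// Thresholds in percent above which a distortion is taken into account.
struct BandThresholds {
    double distortion;      // overall distortion in the band
    double vibration;       // multidirectional spectrum changes
    double unidirectional;  // amplification or attenuation only
};

// Returns the thresholds listed in this section for the given band.
inline BandThresholds ThresholdsFor(Band band) {
    switch (band) {
        case Band::Low:    return {5.0, 2.5, 5.0};
        case Band::Medium: return {10.0, 7.0, 10.0};
        case Band::High:   return {30.0, 15.0, 25.0};
    }
    return {0.0, 0.0, 0.0};  // not reached for valid enum values
}
```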

AQuA software benefits

Among AQuA benefits one will definitely appreciate that:

- AQuA is suitable to develop server solutions and does not involve any “per channel” limitations
- AQuA license does not have any annual royalty fee
- AQuA is suitable both for 32 and 64 bit systems
- AQuA is easy to deploy and use for software products development
- AQuA provides perceptual estimation of audio quality and can be utilized in VoIP, PSTN, ISDN, GSM, CDMA networks and combinations of those

Simple example of using AQuA DLL

#include "stdafx.h"
#include <stdio.h>
#include "AQuAdll.h"

int wmain(int argc, wchar_t* argv[])
{
    if (argc < 3)
    {
        wprintf(L"usage\nVQDLLTest <srcfilename> <codedfilename>");
        return 0;
    }

    if (SSA_InitLib())
    {
        // Library information.
        TSSA_AQuA_Info * iPAQuAInfo = SSA_GetPAQuAInfo();
        wprintf(L"CopyrightString = \n%ls\n", iPAQuAInfo->dCopyrightString);
        wprintf(L"VersionString = \n%ls\n", iPAQuAInfo->dVersionString);

        void * iAnalyzerID = SSA_CreateAudioQualityAnalyzer();
        if (iAnalyzerID)
        {
            wprintf(L"SSA_IsFileFormatSupportable() = %i\n",
                    (int)SSA_IsFileFormatSupportable(iAnalyzerID, argv[1]));
            wprintf(L"SSA_AreFilesComparable() = %i\n",
                    (int)SSA_AreFilesComparable(iAnalyzerID, argv[1], argv[2]));

            // Analyzer settings.
            int iTmpI = 0;
            bool iTmpB = false;
            SSA_SetAny(iAnalyzerID, L"SrcAudioFileName", argv[1]);
            SSA_SetAny(iAnalyzerID, L"TstAudioFileName", argv[2]);
            iTmpB = false;
            SSA_SetAny(iAnalyzerID, L"EnergyNormalizationFlag", &iTmpB);
            iTmpI = 5;
            SSA_SetAny(iAnalyzerID, L"NumberOfLinkPoints", &iTmpI);
            iTmpI = 5;
            SSA_SetAny(iAnalyzerID, L"EnvelopeSmoothingLevel", &iTmpI);
            SSA_SetAny(iAnalyzerID, L"OutputEstimations", L"%%mp");
            SSA_SetAny(iAnalyzerID, L"FaultsReportFileName", L".\\report.txt");
            iTmpI = 8;
            SSA_SetAny(iAnalyzerID, L"SAPrecisionDegree", &iTmpI);

            if (SSA_OnTestAudioFiles(iAnalyzerID) == 0)
            {
                wprintf(L"SSA_OnTestAudioFiles() --> Ok!\n");

                // Test results as text. The buffer is zero-filled so that
                // the string is always terminated.
                int iResLen = SSA_GetQualityStringSize(iAnalyzerID);
                wchar_t * iResStr = new wchar_t[iResLen + 10]();
                SSA_FillQualityString(iAnalyzerID, iResStr);
                wprintf(L"iResStr = \n%ls\n", iResStr);
                delete[] iResStr;

                // Spectrum of the original signal
                // (size 0: not calculated, -1: error).
                iTmpI = SSA_GetSrcSignalSpecSize(iAnalyzerID);
                wprintf(L"SrcSpecSize = %i\n", iTmpI);
                if (iTmpI > 0)
                {
                    float * iPSpecArr = new float[iTmpI];
                    SSA_FillSrcSignalSpecArray(iAnalyzerID, iPSpecArr);
                    for (int i = 0; (i < 16) && (i < iTmpI); i++)
                        wprintf(L"%f\t", iPSpecArr[i]);
                    wprintf(L"\n");
                    delete[] iPSpecArr;
                }

                // Spectrum of the signal under test.
                iTmpI = SSA_GetTstSignalSpecSize(iAnalyzerID);
                wprintf(L"TstSpecSize = %i\n", iTmpI);
                if (iTmpI > 0)
                {
                    float * iPSpecArr = new float[iTmpI];
                    SSA_FillTstSignalSpecArray(iAnalyzerID, iPSpecArr);
                    for (int i = 0; (i < 16) && (i < iTmpI); i++)
                        wprintf(L"%f\t", iPSpecArr[i]);
                    wprintf(L"\n");
                    delete[] iPSpecArr;
                }

                // Numeric results.
                TSSA_AQuA_Results iResults;
                SSA_FillQualityResultsStruct(iAnalyzerID, &iResults);
                wprintf(L"dPercent = %f\n", iResults.dPercent);
                wprintf(L"dMOSLike = %f\n", iResults.dMOSLike);
                wprintf(L"dPESQLike = %f\n", iResults.dPESQLike);

                // Reasons for quality loss. The library does not append the
                // 0 symbol, so the zero-filled buffer provides termination.
                iResLen = SSA_GetFaultsAnalysisStringSize(iAnalyzerID);
                iResStr = new wchar_t[iResLen + 10]();
                SSA_FillFaultsAnalysisString(iAnalyzerID, iResStr);
                wprintf(L"iResStr = \n%ls\n", iResStr);
                delete[] iResStr;
            }
            else
            {
                wprintf(L"SSA_OnTestAudioFiles() --> failed!\n");
            }
            SSA_ReleaseAudioQualityAnalyzer(iAnalyzerID);
        }
        SSA_ReleaseLib();
    }
    return 0;
}
