47
Audio Watermarking Tools 2 (AWT2) of www.audiowatermarking.info User Guide AWT2 version 0.11.00 May 2015 © Alex Radzishevsky, 2006-2015 http://audiowatermarking.info

Audio Watermarking Tools 2 (AWT2) of User Guide

Embed Size (px)

Citation preview

Audio Watermarking Tools 2 (AWT2) of www.audiowatermarking.info

User Guide

AWT2 version 0.11.00

May 2015

© Alex Radzishevsky, 2006-2015 http://audiowatermarking.info

Audio Watermarking Tools 2 (AWT2) - User Guide

2

Contents

1 Introduction ................................................................................................................................................................... 3

2 General information ....................................................................................................................................................... 4

2.1 AWT2 at a glance .................................................................................................................................................... 4

2.2 AWT2 applications ................................................................................................................................................. 6

2.3 Watermark robustness and aural transparency .................................................................................................... 7

2.4 How the algorithm works ....................................................................................................................................... 9

3 GUI front-end utility ..................................................................................................................................................... 11

4 Encoder ........................................................................................................................................................................ 13

4.1 Recommended encoding parameters for different types of audio content ........................................................ 16

5 Decoder ........................................................................................................................................................................ 19

5.1 Recommended audio fragment (slice) duration for decoding ............................................................................. 24

6 AWT2 Software Development Kit (SDK) ....................................................................................................................... 27

7 Choosing optimal watermark payload form ................................................................................................................ 31

7.1 text2hex utility ..................................................................................................................................................... 33

8 Useful tips ..................................................................................................................................................................... 35

8.1 Converting audio or video file into file format suitable for AWT2....................................................................... 35

8.2 Watermarking long audio files using AWT2 ......................................................................................................... 35

8.3 Extracting watermarks from broadcasts and long recordings ............................................................................. 37

8.4 Watermarking audio track of a video file ............................................................................................................. 37

8.5 Watermarking multiple files ................................................................................................................................. 37

8.6 Analyzing live audio stream ................................................................................................................................. 38

9 Licensing terms ............................................................................................................................................................. 41

10 FAQ ........................................................................................................................................................................... 42

11 Contact information ................................................................................................................................................. 44

12 Appendix: AWT2 error codes ................................................................................................................................... 45

13 Appendix: ASCII printable characters ....................................................................................................................... 46

Audio Watermarking Tools 2 (AWT2) - User Guide

3

1 Introduction

Audio Watermarking Tools 2 (AWT2) of www.audiowatermarking.info is a set of software utilities for embedding (and

retrieving) digital watermarks carrying arbitrary digital data within (from) audio files and streams. The watermarks are

imperceptibly embedded directly into the audio content so that they cannot be removed without damaging the original

audio quality. The tools are distributed as a package of GUI and console utilities running on Microsoft Windows

7/Vista/XP/NT/2000/Server, Apple Mac OS X and Linux/*nix systems; both x86 and x64 versions are available.

AWT2 package consists of encoder (watermark embedding tool), decoder (watermark extraction tool), GUI front-end

utility and this document. The package does not require any installation and can be used right after unpacking.

Encoder and decoder are console (terminal) applications controlled by parameters in command line. The encoder

embeds watermark payload into audio stream (file). The decoder detects whether a given audio stream (or its fragment)

is watermarked and extracts watermark if it was found.

AWT2 GUI is a front-end utility for encoder and decoder with intuitive graphical user interface, conveniently visualizing

command-line functionality of the console utilities.

AWT2 SDK (Software Development Kit) for all major desktop and mobile platforms is available upon request. AWT2 SDK

allows watermarking of live (real-time) audio streams, and provides disk files and in-memory processing functionality,

including live streams operations.

AWT2 technology is patented1.

1 U.S. Patent No. 8,116,514.

Audio Watermarking Tools 2 (AWT2) - User Guide

4

2 General information

2.1 AWT2 at a glance

AWT2 algorithm allows embedding watermarks with arbitrary digital data (usually, few bytes long binary identifiers) into

audio stream and retrieve them accordingly. The algorithm implements so called "blind watermarking" approach, in the

sense that the original (non-watermarked) audio stream is not needed to extract the watermark from the watermarked

stream. The watermark is extracted directly from watermarked audio stream or its fragment.

AWT2 encoder and decoder operate with wave PCM (.wav) audio files of almost any format ― with sampling rates from

8 to 192 KHz, amplitude resolutions of 8/16/24/32/64 bits and any number of channels2. Additional audio formats (such

as MP3, OGG, AMR, etc.) are supported by the decoder by means of automatic conversion using external tool, FFmpeg

(http://ffmpeg.org).

AWT2 can operate in 'normal' and 'high capacity' mode. Normal mode provides the highest watermark robustness with

moderate watermarking data rate (see below). In 'high capacity' mode AWT2 uses extended frequency range to embed

watermarks that results in 3 times higher watermarking data rate, but leads to somewhat reduced robustness (e.g.

against very low sample rate conversion such as 8 KHz and lower which is typical sampling rate for phone networks).

Supported watermark payload size is from 1 to 120 bytes (subject to limitations in different AWT2 packages).

Recommended watermark payload size ensuring uncompromising robustness is up to 20 bytes in ‘normal mode’ and up

to 50 bytes in ‘high capacity’ mode.

With default parameters in 'normal' mode, the watermarking data rate is approximately 8 bps for 1-byte payload, 11

bps for 2-bytes payload, 15 bps for 4-bytes payload, 17 bps for 8-bytes payload, and 20 bps for 20-bytes payload. In

'high capacity' mode the data rate is exactly 3 times higher (i.e. 60 bps for 20-bytes payload, etc.). The watermarking

rate increases with increase of watermark payload size due to some constant overhead for each watermark copy. A

special parameter of the encoder allows adjusting the data rate making it higher or lower than default. In terms of

AWT2 technology, the data rate means how many copies of the same watermark are embedded into audio fragment of

a specific duration (refer to 2.4). For example, with 4-bytes watermarks, the data rate is 15 bps (in the “normal” mode),

which leads to about 5 watermark copies in 10-seconds long audio fragment ( [10sec * 15bps] / [4 bytes * 8 bit] = 4.7 ).

In ‘high_capacity’ mode AWT2 allows performing “dual-layer watermarking” (optional). Two independent watermarks

can be embedded into the same audio stream at different stages of audio content distribution (controlled by “–layer”

2 Supported wave files: basic and WAVE_FORMAT_EXTENSIBLE, sampled at 192000, 176400, 96000, 88200, 48000, 44100, 32000,

22050, 16000, 11025, 8000 Hz, signed 16-/24-/32-bit little-endian PCM or unsigned 8-bit PCM, 32-/64-bit IEEE float.

Audio Watermarking Tools 2 (AWT2) - User Guide

5

command line switch). When layered watermarking is used, the watermarking data rate on each layer corresponds to

the data rate in ‘normal mode’.

The watermarking algorithm works in the time domain in several frequency sub-bands. The overall idea behind the

algorithm consists in embedding of a binary watermark payload into carrier audio signal in the time domain by time-

shifting the carrier signal blocks.

This watermarking approach is patented3.

The algorithm can be applied to practically all kinds of audio data. Typical examples: music (pop, jazz, classics, rock),

speech recordings, musical instrument samples, etc.

Each particular copy of AWT2 binaries with particular Serial Number (SN) contains a unique numeric identifier that is

used during encoding to scramble watermark payload. This security feature prevents one AWT2 user (with one SN) from

extracting watermarks from watermarked files created by another AWT2 user (with another SN number).

If you run AWT2 on x64 machine, you might want to use AWT2 x64 binaries because of the ability of x64 machine to

operate with greater amount of RAM.

3 U.S. Patent No. 8,116,514.

Audio Watermarking Tools 2 (AWT2) - User Guide

6

2.2 AWT2 applications

AWT2 is a universal, scalable and flexible solution addressing different application and use-cases. AWT2 targets

individual musicians, composers, music authors and music owners, audio recording, mastering and production studios,

on-line music stores, broadcasting companies and other small and medium sized businesses. Here are some of the most

common AWT2 applications:

content signing and ownership ascertainment

- audio content is marked with a special digital signature stating ownership

[ relevant for audio content authors and owners such as musicians, recording labels and studios ]

copy protection and monitoring

- each recipient gets his own copy of an audio material watermarked with unique signature

[ relevant for music web-stores, broadcasting companies, FM radio stations, content distributors ]

signaling and triggering via audio

- varying binary identifiers (triggers) are embedded into audio stream by a transmitter (live or by pre-processing)

to trigger events on the receiving side

[ relevant for broadcasting companies, monitoring and license ascertaining services ]

Refer to page 31 for more information and examples of typical AWT2 use cases.

With AWT2 you can protect your rights by ensuring that your watermark is extractable (only by you!) from any copy of

your audio materials transmitted or broadcasted anywhere.

Please read licensing terms in section 9.

Audio Watermarking Tools 2 (AWT2) - User Guide

7

2.3 Watermark robustness and aural transparency

On robustness

The proposed watermarking scheme demonstrates very high robustness to almost all kinds of audio conversions. Here

are some typical examples:

lossy transcoding using MP3, Ogg Vorbis and other audio codecs (including multiple transcoding at very low

bitrates)

acoustic coupling (i.e. traveling of sound from D/A to loudspeaker, then to the microphone via air and then to

A/D)

time-stretching (time scaling) up to 50%

mixing with other signals, noise addition

signal cropping, cutting

sample rate conversion (even down to low sample rates such as 8 KHz that is typical for phone networks);

amplitude re-quantization

effect processing, from a simple EQ to an extreme dynamic range compression, reverberation, echo, spectral

effects, etc.

waveform distortions such as limiting, clipping, slope manipulation, gain control

A/D - D/A conversion

transition over radio waves

A quick example: the watermark survives even transcoding with low-bitrate MP3 codec and then transducing of the

transcoded signal via air (i.e. reproducing it with a loudspeaker and recording with a microphone).

In 'high capacity' operating mode the algorithm uses extended frequency range to embed watermarks. This results in 3

times higher data rate, but leads to reduced robustness against conversions with sample rates lower than 16 KHz or

extreme lossy transcoding.

It is worth underlining the ability of the decoder to detect and successfully extract watermarks even from aggressively

time-stretched audio streams. It’s not a secret that time-stretching of 1-2% is often used on radio stations in order to

free more air time for advertisements. AWT2 decoder can cope with time-stretching of up to 50%.

Audio Watermarking Tools 2 (AWT2) - User Guide

8

On transparency

With default parameters, the proposed watermarking algorithm demonstrates practically undistinguishable

watermarking which is transparent to an average listener with audio equipment of any quality on majority of audio

content. For the sake of truth it should be noted that (like with any other real world technology) there are examples of

very specific audio samples that may reveal some watermarking artifacts compared to original non-watermarked audio,

however in such specific cases these artifacts are rather minor and may be noticeable only to experienced listener.

Depending on the target needs, the user may adjust encoding parameters (namely, watermarking “density”,

“aggressiveness” and “modulation gain”) to achieve optimal balance between aural transparency and robustness.

Audio Watermarking Tools 2 (AWT2) - User Guide

9

2.4 How the algorithm works

Below is a brief high-level description of the watermarking algorithm of AWT2:

Watermark payload is converted into a watermarking data packet containing encrypted watermark payload and

error-correction code

The source audio stream is decomposed into several frequency sub-band signals (carrier signals); in 'normal'

operating mode a few bands in narrow frequency range are used, in 'high capacity' operating mode more bands

are utilized and therefore wider frequency range is involved

Each sub-band signal is then divided into blocks of a certain length

Each block of the carrier sub-band signal is associated with corresponding symbol (bit) of the watermark

payload data packet

Each block of the carrier signal is then time-shifted (forwards or backwards in time) to a degree determined

from inter-band signal analysis and associated with the corresponding watermark payload data packet symbol

(bit) value

The watermark is repeated throughout the entire audio stream as many times as the stream duration permits

The output (watermarked) audio stream is then synthesized by combining modified (time-shifted) carrier sub-

band signals back into full band signal

On the last stage the output (watermarked) stream is sent to the output (stored on disk in form of audio file)

For increased reliability, different statistical, security, error-correction and signal processing mechanisms are applied.

Important outcomes:

number of copies of the watermark payload embedded into audio stream is proportional to its duration

maximal degree of the time-shifts (which is a parameter called “aggressiveness”) impacts on robustness and

(im-)perceptibility of the watermark: lower degrees result in less robust and less perceptible watermark, and

vice versa.

block size (which is a parameter called “density”) impacts on robustness and imperceptibility: greater block size

results in less payload copies embedded into the audio stream

Due to a constant overhead for each copy of the watermark payload, watermarking rate increases with increase of the

payload size.

Audio Watermarking Tools 2 (AWT2) - User Guide

10

On the decoding stage, the decoder performs the same frequency band decomposition steps as for the encoding. It then

collects “statistics” over the analyzed fragment and looks for watermark copies in it by performing self-synchronization

attempts using sophisticated logic and heuristics based on inter-band signal analysis. The more watermark copies are

contained in the analyzed fragment, the higher the probability of finding proper synchronization and successful

watermark extraction. Since the whole process is based on collecting “statistics”, at least several watermark copies must

be contained in the analyzed fragment to allow the extraction. The algorithm is unable to extract watermark from audio

fragment containing only 1-2 copies of the watermark.

Audio Watermarking Tools 2 (AWT2) - User Guide

11

3 GUI front-end utility

AWT2 GUI is a convenient front-end utility for encoder and decoder. AWT2 GUI allows the end-user to encode (embed

watermarks) and decode (extract watermarks from) audio files without the need to manually operate with the

command-line utilities. The GUI tool is available for Windows, Mac OS X and Linux platforms.

AWT2 GUI has two operating modes - encoding and decoding, and therefore has two main screens (tabs):

All essential and optional parameters that are available in AWT2 GUI tool exactly correspond to those of the encoder

(page 13) and decoder (page 19) command line tools. Please refer to the corresponding sections of this document for

more information. Along with this, the GUI tools do not offer some advanced controls available in the command-line

tools.

AWT2 GUI application supports drag-n-drop technique. You can specify input and output file names by dragging and

dropping them into corresponding fields.

Audio Watermarking Tools 2 (AWT2) - User Guide

12

Please note that for proper operation, the GUI application and the console utilities must be located in the same folder

on disk.

Audio Watermarking Tools 2 (AWT2) - User Guide

13

4 Encoder

The encoder embeds watermark payload into an audio stream (audio file). The usage screen is shown (below) when the

encoder is run from the command line without parameters:

In order to encode a file, you need to specify input and output file names, and watermark payload (refer to page 31 for

more information about watermark payload form). Other encoding parameters are optional.

Audio Watermarking Tools 2 (AWT2) - User Guide

14

The encoder operates with wave PCM files of almost any format - with sampling rates from 8 to 192 KHz, and amplitude

resolution of 8/16/24/32/64 bits and any number of channels4.

The watermark payload should be specified in a hexadecimal form, i.e. using “0”-“9” and “A”-“F” characters (for

example, “0xFE21” which is 2 bytes). Supported length of watermarks is from 1 to 120 bytes. Recommended watermark

payload sizes are from 1 to 20 bytes in ‘normal’ (default) operating mode and up to 50 bytes in ‘high capacity’ operating

mode. For more information about watermark payload form, please refer to page 31.

Please note: Maximal watermark payload size that is supported by particular copy of the encoder depends on the

version of AWT2 package that you purchase. This can be: 1 byte max, 2 bytes max, 4 bytes max or unlimited.

If you are using demo version of the AWT2 package, the only watermarks that can be embedded are: 0x12, 0x56 (1

byte), 0x7890, 0x9876 (2 bytes), 0xABCDEF12, 0x12345678 (4 bytes), 0x112233445566 (6 bytes), 0x1122334455667788

(8 bytes) and 0x00112233445566778899001122…778899 (40 bytes). Despite this limitation, the demo package offers

enough flexibility to perform a variety of performance tests so that you can decide whether AWT2 suits your needs

before buying a fully functional version.

Here is an example of running the encoder. The input file is “music.wav”, the output (watermarked) file is “music-

wtr.wav”, and the watermark payload is “0xABCDEF12”:

awt2_enc music.wav music-wtr.wav 0xABCDEF12

In this example the operating mode, “density”, “aggressiveness” and “modulation gain” settings remain defaults.

Depending on your needs, you may consider customizing these settings.

Watermarking density is controlled by a parameter −density. Denser watermarking leads to higher watermarking data

rate. For a certain payload size this means embedding more copies of the watermark by decreasing the block size

associated with one watermark payload symbol. The more copies of the payload are embedded, the higher is the

statistical probability of successful watermark extraction from a signal of certain duration. Denser watermarking can be

achieved by specifying lower values for the density parameter. For example, −density=0.75. The default value of density

is 1.0.

Watermarking aggressiveness is controlled by a parameter −aggressiveness. More aggressive watermarking allows

larger time-shifts during embedding of the watermark thus increasing its robustness (refer to page 9). More aggressive

watermarking can be achieved by specifying higher values for the aggressiveness parameter. For example,

−aggressiveness=1.9. The default value of aggressiveness is 1.5 in ‘normal’ operating mode which provides over-air

transducing robust watermarking. Its default value in ‘high capacity’ mode is 1.0.

4 Supported wave files: basic and WAVE_FORMAT_EXTENSIBLE, sampled at 192000, 176400, 96000, 88200, 48000, 44100, 32000,

22050, 16000, 11025, 8000 Hz, signed 16-/24-/32-bit little-endian PCM or unsigned 8-bit PCM, 32-/64-bit IEEE float.

Audio Watermarking Tools 2 (AWT2) - User Guide

15

Please note that denser and more aggressive watermarking increases the robustness of the watermark, but may lead to

decreased audio transparency. You may need to experiment with different settings before using them. The default

settings are believed to provide transparent and robust watermarking.

There is an additional −modulation_gain setting, which triggers a modulation mechanism of the encoder aimed to

increase watermark robustness. With default setting (0 dB) the modulation of the signal is not performed. Lower values

(e.g. -6 dB) enable special modulation of sub-band signals that helps the watermarks to withstand audio transformations

and corruptions. In addition, the use of the modulation improves watermarking reliability of monotonous signals. “Safe”

(imperceptible) modulation gain values are from 0 to -12 dB. Note: the use of the modulation functionality affects

encoding speed.

If you require operating with long watermark payloads (up to 120 bytes), you may need to use ‘high capacity’ operating

mode (-high_capacity switch) which enables 3 times greater data rate (at expense of some loss of watermark

robustness against resampling with very low frequency).

In the ‘high_capacity’ mode AWT2 allows performing dual-layer watermarking. Two independent watermarks can be

embedded into the same audio stream at different stages of audio content distribution. The dual-layer watermarking is

controlled by the –layer switch. When layered watermarking is used, the watermarking data rate on each layer

corresponds to the data rate in the ‘normal mode’. The watermarks contained in the two layers are fully independent.

Setting –layer=1 forces the encoder to place the watermark at the 1st layer, –layer=2 corresponds to the 2nd watermark

layer. The –layer switch can be used only together with –high_capacity.

Here are additional examples of running the encoder with customized parameters:

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –density=0.8

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –high_capacity

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –high_capacity –aggressiveness=1.1 –density=0.8

awt2_enc music.wav music-wtr.wav 0x12345678 –high_capacity –layer=1

Useful tips on file formats conversion and some examples of AWT2 use can be found in section 8.

The encoder has several additional parameters controlling its performance and output:

−skip_first= / −skip_last= specifies a number of seconds at the start/end of the input audio that should be

skipped by the encoder (i.e. left untouched, not watermarked). This feature can be useful to allow smooth

cross-fades between two or more consequent fragments of a continuous audio stream. For example, one way to

watermark very long audio stream is to divide it into shorter fragments, then watermark each fragment

separately, and then construct the output from the watermarked fragments by combining them together with

cross-fades. If specified −skip_last value is negative, it forces the encoder to watermark only specified duration

of audio (instead of skipping the same duration at the end).

Audio Watermarking Tools 2 (AWT2) - User Guide

16

−silent flag can be used if you do not want any console output. Only error messages are displayed in this mode

(if any).

−noise_level defines a level of artificially generated noise mixed into the processed audio signal. This feature

resolves a problem with watermarking of speech recordings. Typical speech recordings usually contain many

regions of silence (pauses between words and phrases). Due to the lack of any audible information in these

regions, the watermarks cannot be added into them, that reduces overall watermark robustness in the output

signal. To overcome this problem, artificially generated low-level “comfort noise” can be mixed into the source

audio signal to fill the silent gaps and allow carrying watermarks even in originally silent regions. It is

recommended to use this feature to watermark speech recordings. Recommended noise level setting is around

-50 dB (optimal setting can vary depending on input signal level and content).

−output_gain defines attenuation level of the output (watermarked) signal. As a result of encoder’s signal

processing, the output signal amplitude can become higher than the input. To prevent possible

saturation/clipping of the output signal, dynamic range limiter is applied on the very last stage of the

processing. However, in some cases use of the limiter may be not desirable. The user can specify a desired

output attenuation gain to avoid automatic limitation of the output signal.

−limiter_speed defines how fast the dynamic range limiter reacts on amplitude changes. Higher settings lead to

slower limiter release and longer look-ahead times. Lower settings lead to shorter times and faster reaction. The

exact formulae for the limiter [attack, release] times is: [ceil(N/2), N], ms. E.g., for –limiter_speed=6, the attack

(look-ahead) time is 3 ms, release time is 6 ms.

−8bit, −16bit, −24bit, −32bit, 64bit control output files data format.

−nosplash prevents showing program information on the screen

−show_time forces the encoder to show processing time

4.1 Recommended encoding parameters for different types of audio content

The present section provides several examples of “recommended” encoding parameters for different possible types of

audio content.

1. Studio quality music intended for playback using “good” quality audio equipment

In most of the cases, the default encoding settings (such as –density=1.0, –aggressiveness=1.5 and –

modulation_gain=0) provide optimal balance between imperceptibility of watermarks and watermark

robustness. With these settings the watermarks are robust enough to withstand even multiple transcodings

using low-bitrate lossy audio codecs. The use of ‘high capacity’ mode is optional, depending on requirements on

watermark payload length.

Examples:

Audio Watermarking Tools 2 (AWT2) - User Guide

17

awt2_enc music.wav music-wtr.wav 0xABCDEF12

(all default settings)

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –density=1.0 –aggressiveness=1.5 –modulation_gain=0

(which is effectively the same as above)

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –density=1.0 –aggressiveness=1.0 –modulation_gain=0 –high_capacity

(effectively the same as defaults in “high capacity” mode)

2. Music intended for broadcasting, distribution, publishing and playback over average-quality audio

channels/equipment (such as Youtube, FM radio, etc.)

Recommended settings are a little more aggressive than in (1), namely:

–density=0.75 .. 1.0, –aggressiveness=1.5 .. 2.0, –modulation_gain=-6 .. -12.

The use of ‘high capacity’ mode is optional, depending on requirements on watermark payload length.

Examples:

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –density=0.9 –aggressiveness=2.0 –modulation_gain=-6

awt2_enc music.wav music-wtr.wav 0xABCDEF12 –density=0.9 –aggressiveness=1.0 –modulation_gain=-12 –high_capacity

3. High quality speech recordings (such as audio books)

Recommended settings in this case are similar to (2) with addition of artificial noise generation of low level to fill

possible silent gaps (pauses) between words/phrases/sentences: -noise_level=-50. The use of “high capacity”

mode is optional.

Example:

awt2_enc speech.wav speech-wtr.wav 0xABCDEF12 –density=0.9 –aggressiveness=1.5 –modulation_gain=-12 –noise_level=-50.

4. Low quality speech recordings (such as phone conversation recordings).

Recommended settings in this case are similar to (3) except for a little denser/aggressive watermarking and

avoidance of using the “high capacity” mode due to narrow signal spectrum (typical phone recording does not

contain audible information above 4 KHz).

Audio Watermarking Tools 2 (AWT2) - User Guide

18

Example:

awt2_enc speech.wav speech-wtr.wav 0xABCDEF12 –density=0.85 –aggressiveness=2.0 –modulation_gain=-12 –noise_level=-45.

Audio Watermarking Tools 2 (AWT2) - User Guide

19

5 Decoder

The decoder extracts watermark from an audio file. The usage screen is shown (below) when the decoder is run from

the command line without parameters:

Audio Watermarking Tools 2 (AWT2) - User Guide

20

IMPORTANT: To successfully detect and extract watermark payload from an audio file, you must specify expected

watermark payload size and exactly the same “density”, “aggressiveness” and operating mode parameters that were

used at the encoding stage.

The requirement to specify decoding settings such as expected watermark payload size, density, aggressiveness and

operating mode (‘normal’ or ‘high capacity’) is generally not an issue. You can choose suitable payload form/size (refer

to page 31) and watermarking parameters for a specific application/use-case once, and then keep using the same

settings to encode all your files. For example, you can decide once and for all to use 12-bytes long watermark payload

and some certain aggressiveness and density parameters. This way you will always know how the parameters for the

decoder should be specified.

For example, if the following was the command line used for encoding:

awt2_enc music.wav music-wtr.wav 0xABCDEF12

then you need to run the decoder as follows:

awt2_dec music-wtr.wav 4

‘4’ at the end tells to decoder that the expected watermark payload is 4 bytes long.

Another example:

awt2_enc music.wav music-wtr.wav 0x112233445566 –aggressiveness=1.6 –density=0.8

awt2_dec music-wtr.wav 6 –aggressiveness=1.6 –density=0.8

Additional example:

awt2_enc music.wav music-wtr.wav 0x1234 –high_capacity –density=1.2

awt2_dec music-wtr.wav 2 –high_capacity –density=1.2

Dual-layer watermarking example:

awt2_enc music.wav music-wtr.wav 0x1234 –high_capacity –layer=1

awt2_dec music-wtr.wav 2 –high_capacity –layer=1

Audio Watermarking Tools 2 (AWT2) - User Guide

21

Since the watermarking data rate provided by AWT2 is high, it is generally enough to use only a portion of the

watermarked stream to detect and extract the watermark from it (typically 15-60 seconds, depending on watermark

payload size), especially when the audio is marginally distorted compared to the source file. On the other hand, if the

audio stream is significantly distorted or if the watermark payload is long, you might need to use longer portion or the

entire available watermarked recording because analysis of longer signals containing more copies of the watermark

statistically improves extraction reliability.

Watermark extraction reliability depends on degree of quality degradation of the analyzed audio stream (distortions,

noises, extraneous sounds, etc.) relative to the original watermarked stream. Additionally, the reliability depends on

“aural relevance” of the analyzed (decoded) audio stream. For example, an attempt to extract a watermark from a

stream containing many silent parts naturally leads to less reliable results than when using a stream with no silent parts.

Recommendations on audio fragment durations optimal for decoding are provided in section 5.1.

False positive, i.e. “successful” extraction of incorrect watermarks is possible, but is practically rare. One possible way to

get false positive is to try extracting watermark from a watermarked stream by specifying expected watermark payload

incorrectly. This confusion should be avoided.

Current version of decoder accepts almost any audio file formats via FFmpeg (http://ffmpeg.org/) while the native

format for AWT2 decoder is wave (.wav) PCM file. It means that if specified file is not wave PCM, the decoder will

automatically attempt to convert it into wave by running FFmpeg. FFmpeg executable should be located in current

directory (or pointed by -ffmpeg_path command line switch). FFmpeg is distributed as a part of AWT2 package. You can

also download FFmpeg binaries for Windows and Linux separately here.

Useful tips on manual file formats conversion can be found in section 8.

AWT2 decoder allows analyzing long audio recordings by slices. This feature is especially useful in cases when

watermarks search should be performed within recordings of broadcasts in which not the whole recording is

watermarked, but only parts of it (e.g. particular songs in radio broadcast). There are two parameters controlling this

mechanism: –slice_length and –slice_step. They define respectively length (duration) of audio stream slices taken for

analysis, and analysis step; both should be specified in seconds.

For example, this AWT2 decoder call:

awt2_dec long-broadcast-rec.wav 1 –slice_length=10 –slice_step=4 –fast_scan

results in the following output:

Reading input file 'long-broadcast-rec.wav'... Done [16000 Hz, 16 bit, 1 chan, 151 sec] Analyzing audio by slices... [00:00.00 - 00:00.10] [00:00.04 - 00:00.14] Found watermark: 0x12 (71%) [00:00.08 - 00:00.18] Found watermark: 0x12 (98%) [00:00.12 - 00:00.22] Found watermark: 0x12 (98%) [00:00.16 - 00:00.26] Found watermark: 0x12 (77%) [00:00.20 - 00:00.30] Found watermark: 0x12 (41%) [00:00.24 - 00:00.34]

Audio Watermarking Tools 2 (AWT2) - User Guide

22

[00:00.28 - 00:00.38] [00:00.32 - 00:00.42] [00:00.36 - 00:00.46] [00:00.40 - 00:00.50] [00:00.44 - 00:00.54] Found watermark: 0x56 (48%) [00:00.48 - 00:00.58] Found watermark: 0x56 (79%) [00:00.52 - 00:01.02] Found watermark: 0x56 (97%) [00:00.56 - 00:01.06] Found watermark: 0x56 (96%) [00:01.00 - 00:01.10] Found watermark: 0x56 (96%) [00:01.04 - 00:01.14] Found watermark: 0x56 (97%) [00:01.08 - 00:01.18] Found watermark: 0x56 (59%) [00:01.12 - 00:01.22] [00:01.16 - 00:01.26] [00:01.20 - 00:01.30] [00:01.24 - 00:01.34] [00:01.28 - 00:01.38] Found watermark: 0x12 (43%) [00:01.32 - 00:01.42] Found watermark: 0x12 (78%) [00:01.36 - 00:01.46] Found watermark: 0x12 (71%) [00:01.40 - 00:01.50] [00:01.44 - 00:01.54] [00:01.48 - 00:01.58] [00:01.52 - 00:02.02] [00:01.56 - 00:02.06] Found watermark: 0x12 (49%) [00:02.00 - 00:02.10] Found watermark: 0x12 (80%) [00:02.04 - 00:02.14] Found watermark: 0x12 (98%) [00:02.08 - 00:02.18] Found watermark: 0x12 (62%) [00:02.12 - 00:02.22] [00:02.16 - 00:02.26] [00:02.20 - 00:02.30] Done.

Recommendations on slice durations optimal for decoding are provided in section 5.1.

The decoder has additional parameters controlling its performance and output:

–time_stretch= parameter allows controlling the time-stretch extent in which the processed audio stream is

analyzed. Higher values lead to proportionally longer analysis but allow detecting watermark in audio with

changed playback speed.

–fast_tempo_only switch allows to speed up synchronization stage of watermark decoding process by limiting

time-stretched audio analysis to analysis of increased audio playback speed only. In this case any possible

slowed down playback speed is not considered.

–slow_tempo_only switch allows to speed up synchronization stage of watermark decoding process by limiting

time-stretched audio analysis to analysis of decreased audio playback speed only. In this case any possible

increased playback speed is not considered.

Note: using both –fast_tempo_only and –slow_tempo_only switches simultaneously disables any time-stretch

analysis (similar to setting –time_stretch=0)

Audio Watermarking Tools 2 (AWT2) - User Guide

23

–fast_scan switch forces the decoder to perform simpler and thus faster synchronization scan (including time-

stretch analysis) at expense of some loss of reliability. Use of this switch is reliable enough on not heavily

distorted audio streams. With heavily distorted audio (such as air recordings) it is recommended to avoid using

this switch.

–fast_calcs switch forces the decoder to perform simpler and thus significantly faster calculations at expense of

some minor loss of reliability. Though this switch impacts both syncing and extraction stages, syncing stage

speed is increased more than extraction speed. Use of this switch is reliable enough on not heavily distorted

audio streams.

Note: While both –fast-scan and –fast_calcs switches can be used simultaneously, the use of –fast_scan switch

has more perceptible impact on detection reliability than the use of –fast_calcs switch. Experiments show that –

fast_calcs switch is safe to use in most of the cases unless you deal with extremely distorted audio or have

especially high detection requirements.

–slice_length, –slice_step - AWT2 decoder allows analyzing long audio recordings by slices. This feature is

especially useful in cases when watermarks search should be performed within recordings of broadcasts in

which not the whole recording is watermarked, but only parts of it (e.g. particular songs in radio broadcast).

These parameters define respectively length (duration) of audio stream slices taken for analysis, and analysis

step; both should be specified in seconds. Refer to sections 8.3 and 5.1.

–stop_on_found switch is relevant for audio analysis by slices only. It forces to stop analysis on the first found

watermark. The switch can also be used as –stop_on_found=N, where N is integer > 0. In this case the sliced

analysis stops after N found watermarks.

–allow_not_reliable switch force the decoder to list watermarks that have low reliability (that are normally

skipped and not considered during the decoding process). Use this feature with caution.

–silent switch can be used if you do not want any console output. Only error messages are displayed in this

mode (if any).

–ffmpeg_path= allows to define a full path to FFmpeg executable including file name. AWT2 decoder will run

FFmpeg using the specified string.

–tmp_file= allows to define full path and name of temporary file that is created by FFmpeg if conversion of the

input file is necessary.

−nosplash switch prevents showing program information on the screen

−show_time switch forces the encoder to show processing time

Audio Watermarking Tools 2 (AWT2) - User Guide

24

5.1 Recommended audio fragment (slice) duration for decoding

This section provides very rough estimation of optimal audio fragment (slice) duration recommended for watermark

extraction (decoding).

Since watermark decoding is based on statistics collected throughout the analyzed audio fragment, the latter should

contain enough watermark copies in it to allow reliable watermark detection. Amount of copies in a given piece of audio

depends on two parameters: expected watermarking density and expected watermark payload size (both as was used

on encoding stage).

Two tables below show optimal audio fragment duration (in seconds) as a function of payload size (in bytes),

watermarking density and operating mode. The recommended fragment duration is shown as a range. The first number

in the range corresponds to 10 watermark copies per fragment; the second number corresponds to 30 watermark

copies per fragment.

For example, in ‘normal mode’, recommended audio fragment size for finding 4-bytes watermark with expected

density=0.8 is from 17 to 52 seconds (where 17 seconds corresponds to 10 watermark copies, and 52 seconds to 30

watermark copies).

The fragment durations presented below should be considered as rough recommendation only. The actual duration

chosen by the user should take into account audio content type and its “aural relevancy”, quality degradation (relative

to the original watermarked audio) and other relevant factors.

The tables below are for watermark payload sizes from 1 to 20 bytes, with some density values excluded to save space

and simplify table reading.

‘Normal’ mode

Density

0.5 0.6 0.8 0.9 1.0 1.1 1.2 1.4 1.5

Wat

erm

ark

pay

load

siz

e, b

yte

s

1 [6 - 17] [7 - 20] [9 - 26] [10 - 30] [11 - 33] [12 - 36] [13 - 39] [15 - 46] [16 - 49]

2 [8 - 25] [10 - 29] [13 - 39] [15 - 44] [16 - 49] [18 - 54] [19 - 58] [22 - 68] [24 - 73]

3 [10 - 29] [11 - 34] [15 - 46] [17 - 51] [19 - 57] [21 - 62] [22 - 68] [26 - 79] [28 - 85]

4 [11 - 33] [13 - 39] [17 - 52] [19 - 58] [21 - 65] [23 - 71] [25 - 78] [30 - 91] [32 - 97]

5 [14 - 41] [16 - 49] [21 - 65] [24 - 73] [26 - 81] [29 - 89] [32 - 97] [37 - 113] [39 - 121]

6 [15 - 45] [18 - 54] [23 - 71] [26 - 80] [29 - 89] [32 - 98] [35 - 107] [41 - 124] [43 - 133]

7 [16 - 49] [19 - 58] . [25 - 78] [29 - 87] [32 - 97] [35 - 107] [38 - 116] . [44 - 136] [47 - 145]

Audio Watermarking Tools 2 (AWT2) - User Guide

25

8 [19 - 57] [22 - 68] . [30 - 91] [33 - 102] [37 - 113] [40 - 124] [44 - 136] . [51 - 158] [55 - 169]

9 [20 - 61] [24 - 73] . [32 - 97] [36 - 109] [39 - 121] [43 - 133] [47 - 145] . [55 - 169] [59 - 181]

10 [21 - 65] [25 - 78] [34 - 103] [38 - 116] [42 - 129] [46 - 142] [50 - 155] [59 - 181] [63 - 193]

11 [24 - 73] [29 - 87] [38 - 116] [43 - 131] [47 - 145] [52 - 160] [57 - 174] [66 - 203] [71 - 217]

12 [25 - 77] [30 - 92] [40 - 123] [45 - 138] [50 - 153] [55 - 168] [60 - 184] [69 - 214] [74 - 229]

13 [26 - 81] [32 - 97] [42 - 129] [47 - 145] [52 - 161] [58 - 177] [63 - 193] [73 - 225] [78 - 242]

14 [29 - 89] [35 - 107] [46 - 142] [52 - 160] [58 - 177] [63 - 195] [69 - 213] [80 - 248] [86 - 266]

15 [30 - 93] [36 - 111] [48 - 148] [54 - 167] [60 - 185] [66 - 204] [72 - 222] [84 - 259] [90 - 278]

16 [32 - 97] [38 - 116] [50 - 155] [56 - 174] [63 - 193] [69 - 213] [75 - 232] [88 - 270] [94 - 290]

17 [34 - 105] [41 - 126] [54 - 168] [61 - 188] [68 - 209] [75 - 230] [81 - 251] [95 - 293] [102 - 314]

18 [35 - 109] [42 - 131] [56 - 174] [63 - 196] [70 - 217] [77 - 239] [84 - 261] [98 - 304] [105 - 326]

19 [37 - 113] [44 - 135] [59 - 180] [66 - 203] [73 - 225] [80 - 248] [88 - 270] [102 - 315] [109 - 338]

20 [39 - 121] [47 - 145] [63 - 193] [70 - 217] [78 - 241] [86 - 266] [94 - 290] [109 - 338] [117 - 362]

‘High capacity’ mode

Density

0.5 0.6 0.8 0.9 1.0 1.1 1.2 1.4 1.5

Wat

erm

ark

pay

load

siz

e, b

yte

s 1 [2 - 6] [3 - 7] [3 - 9] [4 - 10] [4 - 11] [5 - 12] [5 - 14] [6 - 16] [6 - 17]

2 [3 - 9] [4 - 10] [5 - 13] [5 - 15] [6 - 17] [6 - 18] [7 - 20] [8 - 23] [8 - 25]

3 [4 - 10] [4 - 12] [6 - 16] [6 - 18] [7 - 19] [7 - 21] [8 - 23] [9 - 27] [10 - 29]

4 [4 - 11] [5 - 13] [6 - 18] [7 - 20] [8 - 22] [8 - 24] [9 - 26] [10 - 31] [11 - 33]

5 [5 - 14] [6 - 17] [8 - 22] [8 - 25] [9 - 27] [10 - 30] [11 - 33] [13 - 38] [14 - 41]

Audio Watermarking Tools 2 (AWT2) - User Guide

26

6 [5 - 15] [6 - 18] [8 - 24] [9 - 27] [10 - 30] [11 - 33] [12 - 36] [14 - 42] [15 - 45]

7 [6 - 17] [7 - 20] . [9 - 26] [10 - 30] [11 - 33] [12 - 36] [13 - 39] . [15 - 46] [16 - 49]

8 [7 - 19] [8 - 23] . [10 - 31] [12 - 34] [13 - 38] [14 - 42] [15 - 46] . [18 - 53] [19 - 57]

9 [7 - 21] [8 - 25] . [11 - 33] [12 - 37] [14 - 41] [15 - 45] [16 - 49] . [19 - 57] [20 - 61]

10 [8 - 22] [9 - 26] [12 - 35] [13 - 39] [15 - 44] [16 - 48] [17 - 52] [20 - 61] [22 - 65]

11 [8 - 25] [10 - 29] [13 - 39] [15 - 44] [16 - 49] [18 - 54] [19 - 58] [22 - 68] [24 - 73]

12 [9 - 26] [10 - 31] [14 - 41] [15 - 46] [17 - 52] [19 - 57] [20 - 62] [24 - 72] [25 - 77]

13 [9 - 27] [11 - 33] [14 - 44] [16 - 49] [18 - 54] [20 - 60] [21 - 65] [25 - 76] [27 - 81]

14 [10 - 30] [12 - 36] [16 - 48] [18 - 54] [20 - 60] [22 - 65] [24 - 71] [27 - 83] [29 - 89]

15 [11 - 31] [13 - 38] [17 - 50] [19 - 56] [21 - 62] [23 - 68] [25 - 75] [29 - 87] [31 - 93]

16 [11 - 33] [13 - 39] [17 - 52] [19 - 58] [21 - 65] [23 - 71] [25 - 78] [30 - 91] [32 - 97]

17 [12 - 35] [14 - 42] [19 - 56] [21 - 63] [23 - 70] [25 - 77] [28 - 84] [32 - 98] [34 - 105]

18 [12 - 37] [15 - 44] [19 - 58] [22 - 66] [24 - 73] [26 - 80] [29 - 87] [33 - 102] [36 - 109]

19 [13 - 38] [15 - 46] [20 - 61] [22 - 68] [25 - 76] [27 - 83] [30 - 91] [35 - 106] [37 - 113]

20 [14 - 41] [16 - 49] [21 - 65] [24 - 73] [26 - 81] [29 - 89] [32 - 97] [37 - 113] [39 - 121]

Audio Watermarking Tools 2 (AWT2) - User Guide

27

6 AWT2 Software Development Kit (SDK)

AWT2 can be easily integrated into software applications using AWT2 SDK. AWT2 SDK consists of a C library file (static or

shared), header file and descriptive integration example. AWT2 SDK is provided upon request and can be compiled for

any OS.

AWT2 API is very simple. It provides watermark encoding and decoding functionality, including operations with wave

files on disk, with PCM audio data located in memory and with live/real-time audio streams (buffer-by-buffer

processing).

AWT2 header file listing is presented below. Examples of AWT2 SDK usage are supplied as part of the SDK package.

// =================================================== // AWT2 SDK header file // =================================================== #if __cplusplus extern "C" { #endif #ifndef AWT2_SDK_H #define AWT2_SDK_H // -------------------------------------------------------------------------- // Returns AWT2 Serial Number information (text) // -------------------------------------------------------------------------- void awt2_sn (char **serial_number); // ============================================================================ // // Operations with wave audio files on disk // // ============================================================================ // ---------------------------------------------------------------------------- // AWT2 encoder, operates with wave (.wav) file. // The wave file must be RIFF wave PCM // returns: 0 if no error, >0 otherwise (see AWT2 doc for error codes) // ---------------------------------------------------------------------------- int awt2_encode_disk_file (char *input_file_name, // INPUT: input audio wave file name char *output_file_name, // INPUT: output audio wave file name char *hex_payload, // INPUT: watermarking payload in hex form (e.g. "0xABCDEF") unsigned high_capacity, // INPUT: normal (0) or high capacity (1) mode unsigned layer, // INPUT: can be 0 (non-layered watermarking, 1 for layer1, 2 for layer2 (applicable only with high_capacity=1) float aggressiveness, // INPUT: aggressiveness (float, 0.7 .. 3.0) float density, // INPUT: density (float, 0.5 - 1.5) int modulation_gain, // INPUT: modulation gain (dBFs, 0 .. -96 dBFs, 0 is default) int noise_level, // INPUT: level of artificially generated noise (dBFs, 0 .. -96 dBFs, -96 is default) int output_gain, // INPUT: gain applied to the output signal (dBFs, 0 .. -96 dBFs, 0 is default)

Audio Watermarking Tools 2 (AWT2) - User Guide

28

float skip_first, // INPUT: skip (do not encode) first N seconds of audio (float) float skip_last, // INPUT: if >0 - skip (do not encode) last N seconds of audio (float); if <0 - encode only abs(N) seconds of audio and skip the rest unsigned limiter_speed); // INPUT: set output limiter speed (limiter [attack, release] times, ms: [ceil(val/2), val]) // ---------------------------------------------------------------------------- // AWT2 decoder, operates with wave (.wav) file. // The wave file must be RIFF wave PCM // returns: 0 if no error, >0 otherwise (see AWT2 doc for error codes) // ---------------------------------------------------------------------------- int awt2_decode_disk_file (char *input_file_name, // INPUT: input audio wave file name unsigned wm_key_length, // INPUT: expected watermarking payload length char **watermark, // OUTPUT: detected watermark (hex) unsigned int *watermark_reliability, // OUTPUT: watermark reliability (%) unsigned high_capacity, // INPUT: normal (0) or high capacity (1) mode unsigned layer, // INPUT: can be 0 (non-layered watermarking, 1 for layer1, 2 for layer2 (applicable only with high_capacity=1) float time_stretch, // INPUT: time stretch factor (float) unsigned fast_tempo_only, // INPUT: allow only fast tempo time-stretch analysis (faster) unsigned slow_tempo_only, // INPUT: allow only slow tempo time-stretch analysis (faster) unsigned fast_scan, // INPUT: fast scan mode (0 or 1) unsigned fast_calcs, // INPUT: fast calcs mode (0 or 1) unsigned allow_not_reliable, // INPUT: allow detecting even not reliable watermarks (0 or 1) float aggressiveness, // INPUT: aggressiveness (float, 0.7 .. 3.0) float density); // INPUT: density (float, 0.5 - 1.5) // ============================================================================ // // Operations with PCM audio data in memory // // ============================================================================ // -------------------------------------------------------------------------- // AWT2 encoder, operates with sampled wave PCM data in memory, 1 (mono) channel. // Several channels should be processed separately // The audio data must be PCM in floating point format, sample values in range: -1.0 .. +1.0 // returns: 0 if no error, >0 otherwise (see AWT2 doc for error codes) // -------------------------------------------------------------------------- int awt2_encode_memory_file (float *input_data, // INPUT: input PCM audio data, floats -1.0 .. +1.0 unsigned input_data_size, // INPUT: data size, samples float **output_data, // OUPUT: output PCM audio data, floats -1.0 .. +1.0 unsigned fs, // INPUT: data sampling frequency, Hz char *hex_payload, // INPUT: watermarking payload in hex form (e.g. "0xABCDEF") unsigned high_capacity, // INPUT: normal (0) or high capacity (1) mode unsigned layer, // INPUT: can be 0 (non-layered watermarking, 1 for layer1, 2 for layer2 (applicable only with high_capacity=1) float aggressiveness, // INPUT: aggressiveness (float, 0.7 .. 3.0) float density, // INPUT: density (float, 0.5 - 1.5) int modulation_gain_db, // INPUT: modulation gain (dBFs, 0 .. -96 dBFs, 0 is default) int noise_level_db, // INPUT: level of artificially generated noise (dBFs, 0 .. -96 dBFs, -96 is default) int output_gain_db, // INPUT: gain applied to the output signal (dBFs, 0 .. -96 dBFs, 0 is default) float skip_first, // INPUT: skip (do not encode) first N seconds of audio (float) float skip_last, // INPUT: if >0 - skip (do not encode) last N seconds of audio (float); if <0 - encode only abs(N) seconds of audio and skip the rest

Audio Watermarking Tools 2 (AWT2) - User Guide

29

unsigned limiter_speed); // INPUT: set output limiter speed (limiter [attack, release] times, ms: [ceil(val/2), val]) // -------------------------------------------------------------------------- // AWT2 decoder, operates with sampled wave PCM data in memory, 1 (mono) channel. // Several channels should be processed separately // The audio data must be PCM in floating point format, sample values in range: -1.0 .. +1.0 // returns: 0 if no error, >0 otherwise (see AWT2 doc for error codes) // -------------------------------------------------------------------------- int awt2_decode_memory (float *float_pcm_audio_data, // INPUT: audio data samples array unsigned total_samples, // INPUT: total number of samples unsigned Fs, // INPUT: audio sampling rate (allowed values: 192000, 176400, 96000, 88200, 48000, 44100, 32000, 22050, 16000, 11025, 8000 Hz) unsigned wm_key_length, // INPUT: expected watermarking payload length char **watermark, // OUTPUT: detected watermark (hex) unsigned int *watermark_reliability, // OUTPUT: watermark reliability (%) unsigned high_capacity, // INPUT: normal (0) or high capacity (1) mode unsigned layer, // INPUT: can be 0 (non-layered watermarking, 1 for layer1, 2 for layer2 (applicable only with high_capacity=1) float time_stretch, // INPUT: time stretch factor (float) unsigned fast_tempo_only, // INPUT: allow only fast tempo time-stretch analysis (faster) unsigned slow_tempo_only, // INPUT: allow only slow tempo time-stretch analysis (faster) unsigned fast_scan, // INPUT: fast scan mode (0 or 1) unsigned fast_calcs, // INPUT: fast calcs mode (0 or 1) unsigned allow_not_reliable, // INPUT: allow detecting even not reliable watermarks (0 or 1) float aggressiveness, // INPUT: aggressiveness (float, 1.0 .. 3.0) float density); // INPUT: density (float, 0.5 - 1.5) // ============================================================================ // // "Live" (real-time) audio stream encoding // // ============================================================================ // AWT2 Encoder object declaration typedef struct _AWT2_ENC_OBJ AWT2_ENC_OBJ, *PAWT2_ENC_OBJ; // -------------------------------------------------------------------------- // Allocates AWT2 stream encoder object and initializes it. // Returns (via arguments): // - pointer on enc_obj_ptr (or NULL on error). // - algorithmic delay - i.e. how many samples must be feed to the encoder input // in order to get first sample on the output // (depends on sampling frequency, aggressiveness, density, high_capacity) // - size of one audio buffer expected by the encoding function // (depends on sampling frequency, aggressiveness, density, high_capacity) // Returns: // 0 on success, >0 on error (see AWT2 doc for error codes) // Notes: // * hex_payload is expected to end with \0 (nil) // * high_capacity >1 is treated as =1 // * the object is for ONE channel of audio. The audio buffers must be feed to the encoding // function in the sequential order of the audio stream. // * multiple channels require using multiple/separate encoder objects // -------------------------------------------------------------------------- unsigned awt2_encode_stream_init(PAWT2_ENC_OBJ *enc_obj_ptr, // OUTPUT: pointer on AWT2 Encoder object unsigned *algorithmic_delay, // OUTPUT: algorithmic delay, samples unsigned *buffer_size, // OUTPUT: input/output buffer size, samples unsigned pre_fs, // INPUT: sampling frequency, Hz (e.g. 8000, 16000, 44100, etc.) char *hex_payload, // INPUT: watermarking payload in hex form (e.g. "0xABCDEF") unsigned high_capacity, // INPUT: normal (0) or high capacity (1) mode

Audio Watermarking Tools 2 (AWT2) - User Guide

30

unsigned layer, // INPUT: can be 0 (non-layered watermarking, 1 for layer1, 2 for layer2 (applicable only with high_capacity=1) float aggressiveness, // INPUT: aggressiveness (float, 0.7 .. 3.0) float density, // INPUT: density (float, 0.5 - 1.5) int modulation_gain_db, // INPUT: modulation gain, dBFs (0 .. -96 dB, 0 is default) int noise_level_db, // INPUT: level of artificially generated noise, dBFs (0 .. -96 dB, -96 is default) int output_gain_db, // INPUT: gain applied to the output signal, dBFs (0 .. -96 dB, 0 is default) unsigned limiter_speed); // INPUT: output limiter speed (limiter [attack, release] times, ms: [ceil(val/2), val]) // -------------------------------------------------------------------------- // Processes one audio buffer. The size of the audio buffer is assumed // as returned by the initialization function. // AWT2 stream encoder object must be initialized prior to this call. // Returns: // 0 on success, >0 on error (see AWT2 doc for error codes) // -------------------------------------------------------------------------- int awt2_encode_stream_buffer(PAWT2_ENC_OBJ enc_obj, float *input_buffer, float *output_buffer); // -------------------------------------------------------------------------- // Changes/sets watermark payload in already initialized encoder object. // Allows to change watermark payload on the fly during live stream watermarking // Returns 0 on no error, or non-zero on error // Notes: // * new_hex_payload must end with ASCII 0x00 ("\0"), up to 255 symbols in length // -------------------------------------------------------------------------- unsigned awt2_encode_stream_set_payload(PAWT2_ENC_OBJ enc_obj_ptr, char *new_hex_payload); // -------------------------------------------------------------------------- // Enables/disables processing (watermarking) bypass. // If bypass is disabled (bypass=0), awt2_encode_stream_buffer() call outputs watermarked output (normal working state). // If bypass is enabled (bypass=1), awt2_encode_stream_buffer() call outputs not watermarked audio data. // Encoder object must be initialized. // Returns 0 on no error, or non-zero on error // -------------------------------------------------------------------------- unsigned awt2_encode_stream_set_bypass(PAWT2_ENC_OBJ enc_obj_ptr, unsigned bypass); // -------------------------------------------------------------------------- // Kills AWT2 stream encoder object and frees all allocated memory // -------------------------------------------------------------------------- void awt2_encode_stream_kill(PAWT2_ENC_OBJ enc_obj); #endif #if __cplusplus } // Extern C #endif

Audio Watermarking Tools 2 (AWT2) - User Guide

31

7 Choosing optimal watermark payload form

AWT2 operates with watermark payload in the hexadecimal form (e.g. 0xABCDEF12). Therefore any information that

you want to include into your watermark must be represented in the hexadecimal form, i.e. using digits from 0 to 9 and

letters from A to F.

If you want your watermarks to contain textual data, it should be represented in the hexadecimal form first. For this

purpose it is convenient to use ASCII code that introduces numerical representation for text symbols. You can find ASCII

table of printable symbols in the Appendix: ASCII printable characters (page 46). On the other hand, standard ASCII

representation of the text symbols is not compact. AWT2 package contains a small console utility called ‘text2hex’ which

can be used to convert text into hexadecimal form (and back) using more compact representations (refer to page 33).

Choosing optimal watermark payload form has two aspects. On one hand, chosen watermark payload form and its

length should be suitable for including all necessary information, i.e. all the information that you want to embed using

watermarks. On the other hand, chosen watermark payload form should not compromise the robustness of the

watermarks, it should be as compact (short) as possible to allow embedding more watermark copies into certain

duration of audio. More watermark copies lead to better statistical measures and therefore better ability of the decoder

to detect/extract the watermark even from heavily distorted piece of audio.

Additionally, the chosen watermark payload form should be of fixed length. This is due to the fact that for decoding you

need to specify expected watermark payload size to the decoder. It means that for a specific application, you need to

choose the length of the watermark payload size once and for all.

When choosing the payload form please keep in mind that it is only you (the owner of a specific AWT2 copy with unique

Serial Number) who is able to detect/decode your watermarks, therefore you should not worry too much about

absolutely unambiguous meaning of the form or representation of the embedded data.

Let us consider three examples.

Example 1. You’re a composer or music owner wishing to embed your ID or company name and release date into your

audio files. For the purpose of this example let your company be called “Music Universe”.

Taking into account the aspects described above, it is better to keep your watermark payload as short as possible, but

still allowing to clearly recognizing the data contained in it. The recommendation is to embed the company name in

abbreviated form, like “MusUni” (6 bytes). Writing the full name as “Music Universe” (14 bytes in ASCII representation)

is a waste of payload size. The date can be stored in its pure form as year number (2 bytes), month (1 byte) and day (1

byte). However, this full date form is not optimal. Instead, you can write it as a number of days since first day of 2010.

In this case 2 bytes (that is 65535 days covering 179 years) should be enough.

In conclusion, the suggested payload form for this case is:

“MusUni” (6 bytes) + date (2 bytes)

Audio Watermarking Tools 2 (AWT2) - User Guide

32

This watermark payload form is quite compact (8 bytes = 64 bits) and at the same time is very informative as it contains

well recognizable abbreviation of the company name and the date. With watermarking rate of 15 bits per second (in

‘normal’ operating mode) this watermark form allows embedding approximately 14 copies of the payload in one minute

of audio, which is good watermarking density providing very good statistical measure on watermark detection stage.

For example, for 2010, February 1st (that is 32th day of 2010) the watermark payload of the discussed form will look as

follows:

0x4D7573556E690020

where 0x4D is “M”, 0x75 is “u”, 0x73 is “s”, 0x55 is “U”, 0x6E is “n”, 0x69 is “i”, and 0x0020 is decimal 32.

Example 2. You are running a music web-store and wish to embed unique watermarks into every distributed file. The

embedded watermark should contain an identifier of the recipient user (from the user database) and your company

identifier.

In this case you should decide about the form of the user’s identifiers. If your web-store user database assigns unique

numerical ID to each user, then you can consider embedding this numerical ID directly into the watermark payload. For

example, 4-bytes ID allows identifying more than 4 billions of different users. In this case the chosen payload form is:

“MusUni” (6 bytes) + userID (4 bytes)

This watermark payload form is very compact and at the same time very informative.

For example:

0x4D7573556E690000303A

where 0x4D7573556E69 is “MusUni” (like in the previous example) and 0x0000303A is numerical userID of the user with

decimal number 12346.

In ‘normal’ operating mode, with watermarking rate of 19 bits per second this payload (10 bytes = 80 bits) allows to

embed approximately 14 copies of the watermark in one minute of audio.

Example 3. You want to embed textual data containing written name of the file recipient.

Using ASCII codes to encode text in this case is not the best idea because one text symbol will consume 8 bits (one byte).

You can use representation that is much more compact by enumerating text symbols in your own way instead. For

example, you can decide using only small letters (from a to z) and some more characters such as space, underscore, etc:

0 – a, 1 – b, 2 – c, …, 25 – z, 26 – space, 27 – underscore (_), 28 – dot, 29 – comma, 30 - @, 31 - &

Audio Watermarking Tools 2 (AWT2) - User Guide

33

With this encoding scheme each symbol consumes only 5 bits. So, encoding of 20 text symbols (name) consumes only

100 bits (12 bytes) instead of 160 bits (20 bytes) when using standard ASCII code.

The described idea of compact representation of textual data is implemented in a small command line utility called

‘text2hex’ which comes as a part of AWT2 package and described in the next section. This functionality is also

implemented in AWT2 GUI tool.

7.1 text2hex utility

text2hex is a console application that comes as a part of AWT2 package. This application allows converting text data into

hexadecimal form and back. While standard ASCII representation uses 8 bit (1 byte) to represent one text symbol,

text2hex utility allows using more compact 5-, 6- and 7-bit representation forms. It means that using text2hex you can

store more text symbols in the same amount of bytes.

Depending on your needs you may choose one of the available encoding schemes. Below is the list of symbols available

in each translation table.

It is believed that 6-bit conversion table contains enough symbols to represent any textual information in a meaningful

way. This table contains all digits and all letters as well as special symbols. By using 6-bit representation you save 2 bits

on every text symbol compared to standard 8-bit ASCII representation. It means that you can store, for example, 20 text

symbols in 15 bytes.

text2hex has two modes: encoding and decoding. To convert text into hexadecimal form use ‘encode’ mode. To convert

hex string back into text use ‘decode’ mode.

Command line usage format of the utility is:

text2hex <work mode> <bits> [payload size (bytes)] “string”,

Audio Watermarking Tools 2 (AWT2) - User Guide

34

where:

<work mode> is the word ‘encode’ or ‘decode’ (without the quotes)

<bits> is number of bits per one symbol (5, 6, or 7) according to one of the tables above

[payload size] (mandatory only for encoding mode) is a desired length of the output hex string. If the actual

data converted into the hex form is shorter, the output is padded with zeros. This is very useful in AWT2 for

keeping the same hex payload length for any text string length

“string” is the input text string in quotes “” (in encoding mode) or hex string (in decoding mode).

Example:

text2hex.exe encode 6 40 "hey! don't even think to touch my files! (c)john doe"

This command converts 52 symbols long string “hey! Don’t even think to touch my files! (c)john doe” into 40 bytes long

hexadecimal form using 6-bit representation. The output hex string is:

0xD453D08394E313B8059C58E0EF5363760ED383B4C76758033E094DF456D0824654B53D638394C500

This string can now be used with AWT2 encoder.

To convert the hex string back into text run:

text2hex.exe decode 6 0xD453D08394E313B8059C58E0EF5363760ED383B4C76758033E094DF456D0824654B53D638394C500

The output will be:

hey! don't even think to touch my files! (c)john doe

Audio Watermarking Tools 2 (AWT2) - User Guide

35

8 Useful tips

In this section you can find some useful tips, recommendations and ready solutions for applications utilizing AWT2.

8.1 Converting audio or video file into file format suitable for AWT2

Whereas AWT2 decoder supports almost all feasible file formats via FFmpeg5 (utilizing it for files conversion into wave

PCM format), AWT2 encoder accepts only wave files. However, some applications (such us embedding watermarks into

video content) may require operating with already compressed audio files (e.g. MP3, AAC, AC3, MP4, etc.). In this case it

can be useful to know how to convert (or decode) any audio or video file into wave PCM sound file using FFmpeg. Below

is a command line for conversion of ‘input_file’ into ‘output_file.wav’ (wave PCM).

ffmpeg –i input_file –y –ar 44100 –vn output_file.wav

The ‘input_file’ can be either audio or video file containing audio track.

Non-standard WAV files (having erroneous or non-standard header and thus not supported by AWT2) possibly can be

corrected and converted into standard waves using a third-party utility called SoX6:

sox -D source.wav --encoding signed-integer -t wavpcm converted.wav

8.2 Watermarking long audio files using AWT2

In case the audio file that should be watermarked is very long (so much long, that AWT2 encoder cannot process it as a

whole piece due to computer memory limitations), it is still possible to watermark it in pieces. To do so it is convenient

to use a third-party utility called SoX.

Here is a Windows batch file which takes a file ‘long-in.wav’, slices it into shorter pieces using SoX, encodes every piece

using AWT2 encoder and combines the output by merging encoded pieces together back into one long output file.

5 FFmpeg web-site: http://ffmpeg.org/

6 SoX web-site: http://sox.sourceforge.net/

Audio Watermarking Tools 2 (AWT2) - User Guide

36

REM === Splitting the long input file 'long-in.wav' into shorter REM === slices (600 seconds long each) sox long-in.wav slice.wav trim 0 600 : newfile : restart REM === Watermarking every slice separately (note: the first and the last REM === second of the slice are skipped for smooth transition between merged parts) for %%X in ("slice*.wav") do ( awt2_enc %%X wtr_%%X 0x12 -skip_first=1 -skip_last=1 ) REM === Combining the watermarked slices together back into one long watermarked output sox wtr_slice*.wav long-out.wav

Using this trick, it is possible to encode hours-long audio files even on computer with very limited RAM size.

In case there is a need to watermark very long MP3 file, the watermarking process together with preparatory decoding

of the MP3 file into wave and following backward encoding of the watermarked file into MP3 may be extremely time

consuming. A better option in this case is to watermark only selected parts of the MP3. This can be done by splitting the

MP3 file into pieces without its transcoding as a whole. The following Windows batch file demonstrates the idea. The

following third-party freeware cross-platform open source tools are involved: mp3splt7, Lame MP3 encoder8.

REM splitting input MP3 file into two pieces: first piece (4 minutes), second piece - the rest mp3splt.exe in.mp3 0.0 4.0 EOF REM decoding the first piece into WAV lame.exe --decode in_000m_00s__004m_00s.mp3 tmp.wav del in_000m_00s__004m_00s.mp3 REM embedding watermark into the first (decoded) piece awt2_enc.exe tmp.wav out.wav 0xABCDEF12 -skip_first=1 -skip_last=1 del tmp.wav REM encoding watermarked piece back to MP3 lame -b 256 out.wav in_000m_00s__004m_00s.mp3 del out.wav REM merging all pieces back into one long MP3 mp3wrap.exe merged.mp3 in_*.mp3 del in_*.mp3

7 mp3splt homepage: http://mp3splt.sourceforge.net

8 Lame encoder homepage: http://lame.sourceforge.net/

Audio Watermarking Tools 2 (AWT2) - User Guide

37

8.3 Extracting watermarks from broadcasts and long recordings

AWT2 decoder allows analyzing long audio recordings by slices. This feature is especially useful in cases when

watermarks search should be performed within recordings of broadcasts in which not the whole recording is

watermarked, but only parts of it (e.g. particular songs in radio broadcast). There are two parameters controlling this

mechanism: –slice_length and –slice_step. These parameters define respectively length (duration) of audio stream slices

taken for analysis, and analysis step; both should be specified in seconds.

For example, the following AWT2 decoder call:

awt2_dec long-broadcast-rec.wav 1 –slice_length=10 –slice_step=4 –fast_scan

forces the decoder to analyze the input file “long-broadcast-rec.wav” by slices 10 seconds long each; the slices are taken

every 4 seconds.

Using the –stop_on_found switch It forces the decoder to stop analysis on first found watermark.

Recommendations on slice durations optimal for decoding are provided in section 5.1.

8.4 Watermarking audio track of a video file

In order to embed watermarked audio track into a video file, it is needed first to extract original audio track from the

video file, then encode the extracted audio track with AWT2 and then merge the watermarked audio track with the

video track back.

To extract audio track from video file, use FFmpeg:

ffmpeg -i video.m4v -y audio-track.wav

Extracted ‘audio-track.wav’ is a wave PCM file that can be encoded with AWT2 directly or in pieces (as described section

8.2).

To replace the original audio track in the video file with the watermarked audio track, run:

ffmpeg -i video.m4v -i watermarked-audio-track.wav -map 0:0 -map 1:0 -vcodec copy -acodec libfaac -ab 256k out.m4v

This command forces FFmpeg to take video from the file ‘video.m4v’, take audio from ‘watermarked-audio-track.wav’,

encode the later with FAAC encoder and then merge audio and video into the output file ‘out.m4v’.

8.5 Watermarking multiple files

Here is a simple way to watermark all files located in a folder (in Microsoft Windows).

Audio Watermarking Tools 2 (AWT2) - User Guide

38

Create a batch file called ‘encode-all.bat’. Put the following contents into it:

mkdir outputs @for %%X in (*.wav) do ( awt2_enc "%%~nxX" "outputs\%%~nxX" 0x12 )

Place this file in the folder with the wave files to be watermarked.

Place ‘awt2_enc.exe’ (AWT2 command line tool) in the same folder.

Run the batch file. It will create a sub-folder called ‘outputs’, watermark all wave files (.WAV) located in the current

folder and place all watermarked files inside the ‘outputs’ sub-folder.

8.6 Analyzing live audio stream

If you need to analyze live audio stream using AWT2 watermark decoder, here is an efficient recipe on establishing live

analysis of audio stream from sound card (mic or line-in) using all freeware tools on Microsoft Windows OS-based

machine.

Third-party tools involved:

SoX (Sound eXchange) – http://sox.sourceforge.net (used for live recording of audio from sound card)

Blat – http://blat.net (command line utility that sends e-mail)

Create a folder, extract SoX and Blat packages into it. Put AWT2 decoder into the folder. You should have approximately

the following files structure:

blat.dll pthreadgc2.dll zlib1.dll awt2_dec.exe blat.exe sox.exe

Create a batch-file, call it ‘AWT2Monitor-runme.bat’ (without quotes). This batch continuously monitors (listens to)

audio stream by invoking sound recording and watermark detection every N seconds. The contents of the batch:

@REM === Audio stream monitoring definitions @SET slicelen= 16 @SET slicestep= 8 @REM === Entering infinite loop @echo start > result.txt

Audio Watermarking Tools 2 (AWT2) - User Guide

39

@SET /a waitsec= 1 + slicestep :Start @REM Get current timestamp in a form: @REM DD-MM-YYYY-HH-MM-SS-ms (e.g. 10-02-2013-21-04-46-48) @set SAVESTAMP=%DATE:/=-%-%TIME::=-% @set SAVESTAMP=%SAVESTAMP: =% @set SAVESTAMP=%SAVESTAMP:,=-% @set SAVESTAMP=%SAVESTAMP:.=-% @REM Start recording audio slice from the 'default' sound device and analyze it using AWT2 decoder @start /SEPARATE /MIN call _rec_and_analyze.bat %SAVESTAMP%.wav %slicelen% @REM Wait 8 seconds before the next slice @PING 127.0.0.1 -n %waitsec% -w 1000 > nul @more result.txt @goto Start

The batch file runs continuously in the infinite loop. You will need to close its window to stop the listening process.

You can change time delay (step) between recorded audio slices by changing the value of ‘slicestep’ variable (in the

example above the recording is performed every 8 seconds). You can change the length of the recorded audio stream

slice y changing the value of ‘slicelen’ variable (in the example above the slice length is set to 16 seconds). With these

recording settings, the recorded slices overlap, that provides reliable monitoring of the audio stream.

Create another batch file, call it ‘_rec_and_analyze.bat’ (without quotes). Here are the contents of this batch:

@REM Running SoX tool to record N sec long slice from the 'default' sound card @sox.exe --channels 1 --rate 44100 --bits 16 -d "%1" trim 0 %2 @REM Running AWT2 decoder to analyze the recorded slice @awt2_dec.exe "%1" 4 -silent >"%1".txt @IF NOT EXIST "%1" goto panic @REM storing result as text to show in the main listener loop @SET /p res= < "%1".txt @echo %1 - %res% > result.txt @REM Check whether a watemark was found more "%1".txt | find /i "No watermark" @IF ERRORLEVEL 1 goto watermark @IF NOT ERRORLEVEL 1 goto nowatermark :watermark @ECHO !Watermark detected! @mkdir Watermarked @copy "%1" Watermarked\ @copy "%1".txt Watermarked\

Audio Watermarking Tools 2 (AWT2) - User Guide

40

@REM Send e-mail with attached WAVE file and watermark in the subject blat -SaveSettings -f [email protected] -server mail.com -port 26 -u user -pw pswrd blat "%1".txt -attach "%1" -to [email protected] -i "AWT2_Analyzer" -subject "watermark found" @goto ext :nowatermark @ECHO !No watermark! @goto ext :ext @del "%1" @del "%1".txt @exit :panic @ECHO Could not record audio! Panic! @echo Could not record audio > result.txt @exit

This batch performs recording from the sound card input into a wave file (audio slice) and then performs watermark

detection in the recorded wave using AWT2 decoder. If a watermark is detected, the recorded wave file is sent by e-mail

in the attachment together with the information about detected watermark in the message body. You need to put

appropriate information into mail addresses and mail server parameters in the Blat tool calling string. All watermarked

waves are stored in ‘watermarked’ sub-folder. In case of no watermark, the recorded audio slice is deleted.

You should have the following files structure:

_rec_and_analyze.bat AWT2Monitor-runme.bat blat.dll pthreadgc2.dll zlib1.dll awt2_dec.exe blat.exe sox.exe

Before running this simple monitoring system, make the desired sound card input to be the default recording audio

source in your Windows OS audio mixer (in Windows 7, go to “Recording devices” tab, right-click with mouse on the

desired sound card input and chose “Set as Default Device”). In the audio device properties (or in the volume mixer) set

appropriate recording level to avoid audio signal clipping.

Run ‘AWT2Monitor-runme.bat’ to start live audio stream listening and analysis.

Audio Watermarking Tools 2 (AWT2) - User Guide

41

9 Licensing terms

AWT2 is offered under two different license types:

“Private content creator” type

for private audio content creators (musicians, composers, DJs, etc.) – private individuals and entities who

originally compose and generate audio content

“Commercial service provider” type

for commercial businesses, service providers (music/audio web-stores, promotion, broadcasting companies,

distribution services, etc.) and software companies – those who make profit on re-selling audio content, and/or

provide paid services around it, and/or develop commercial software applications or services that involve audio

watermarking.

AWT2 package, which is available for purchasing at http://audiowatermarking.info, is provided strictly under the

“Private content creator” license type and is offered for personal/internal use only. AWT2 End-User License Agreement

(EULA), that the user must agree with upon purchasing AWT2 package, does not allow leasing AWT2 directly or

indirectly to third parties. In particular, this license does not allow providing watermarking services of any kind to third

parties, does not permit integration of AWT2 into commercial products or services, and/or products and services that

are intended to be used by third parties. The end-user is allowed to use AWT2 to watermark only his/its own audio

materials (having or not having commercial value) with the only purpose to protect them from piracy.

Any other use of AWT2 including its use in commercial applications or services that involve other end-users and/or

third-party audio material owners (e.g. providing watermarking services to third parties, using it in broadcast monitoring

applications that involve third party audio materials, using AWT2 as a part of stand-alone or web-based software

applications such as music production software or web-store applications, etc.) is not permitted by the license provided

at http://audiowatermarking.info. Any such use requires purchasing “commercial service provider” license, which terms

and conditions are to be negotiated with AWT2 developer.

For detailed licensing information, please contact AWT2 developer.

Audio Watermarking Tools 2 (AWT2) - User Guide

42

10 FAQ

Q: Why AWT2 is so pr icy / so cheap? What advantages or disadvantages AWT2 has compared to

competitive solutions?

A: AWT2 watermarks are very robust (read above), probably more robust than competitive solutions available around

the web. The watermarks even survive time-stretching and transducing via air. Plus, the data rate is extremely high (up

to 125 bps). That is why AWT2 is possibly a little more expensive compared to some competitive solutions. On the other

hand, AWT2 is a product of a single developer that is provided on "as is" basis. That is why AWT2 is not as expensive as

it could be.

Q: Can special “watermarking attacks” compromise watermark robustness of AWT2? In other

words, are there methods able to destroy AWT2 watermarks?

A: Yes, of course. Damaging watermarks is always possible at least by developing a special watermarking attack targeted

on specific watermarking technique. Therefore the watermarking technique used in AWT2 is not attack agnostic too.

However, the author believes that the proposed technique is strong enough to survive most of standard watermarking

attacks.

Q: Does the use of this watermarking solut ion compromise the performance of automatic audio

recognition services (such as TrackID, Echonest, etc) being applied to the watermarked fi les?

A: The answer depends on every particular audio recognition technology. Since AWT2 watermark is practically

transparent to the human listener, it should be transparent to any “good” audio recognition engine as well. Therefore,

AWT2 watermarking should not harm recognition results of the most popular audio recognition technologies, at least

theoretically.

Q: Can one AWT2 user extract watermarks created by another AWT2 user?

A: No. Each particular copy of AWT2 binaries with particular Serial Number (SN) contains a unique numeric identifier

that is used during encoding to scramble watermark payload. This security feature prevents one AWT2 user (with one

SN) from extracting watermarks from watermarked files created by another AWT2 user (with another SN number).

Q: Can I extract watermark from a small fragment of the watermarked stream?

A: Yes, of course. Since the watermarking rate provided by AWT2 is quite high, it is generally enough to use only a

portion of the watermarked stream to extract the watermark from it (15-50 seconds, depending on watermark payload

size), especially when the audio is only slightly distorted compared to the source file. On the other hand, if the audio

Audio Watermarking Tools 2 (AWT2) - User Guide

43

stream is significantly distorted, you might need to use longer portion or the entire available watermarked recording

because statistically analysis of longer signals improves extraction reliability. Refer to section 5.1 for detailed

information about recommended audio fragment durations.

Q: Can I watermark MP3/OGG/… fi les?

A: This procedure requires transcoding. In other words, you need to decode MP3-files back to wave format before

watermarking them. AWT2 encoder operates with wave files only, and is unable to embed watermarks directly into MP3

bit stream. Therefore, the only way to watermark MP3 files is to decode MP3 file to wave file, then watermark the

obtained wave file, and then encode the watermarked wave file back to MP3.

Q: Why extracted watermark rel iabil ity is not 100% when decoding from encoded fi le directly?

A: Because watermarks are embedded by modification of aural information of the carrier audio signal. Therefore,

watermark extraction reliability depends on two main factors: degree of quality degradation of the analyzed audio

stream (distortions, noises, etc.) relative to the original watermarked stream, and “aural relevance” of the carrier audio

stream. It is obvious, that embedding watermark into totally silent audio results in zero reliability because there is no

acoustic information in silence. Therefore, the reliability depends on contents of the carrier audio. Fast changing rich

sound is better carrier than smooth sound with silent parts. For this reason, some signals provide almost 100% reliability

when extracting from encoded file directly, while others may result in lower reliability values.

Q: Can I use AWT2 to detect watermarked content in broadcast recordings?

A: Yes. AWT2 decoder allows analyzing long audio recordings by slices. This feature is especially useful for watermarks

search in recordings of broadcasts in which not the whole recording is watermarked, but only parts of it (e.g. particular

songs in radio broadcast).

Audio Watermarking Tools 2 (AWT2) - User Guide

44

11 Contact information

You can contact AWT2 author Alex Radzishevsky by e-mail: [email protected]

Audio Watermarking Tools 2 (AWT2) - User Guide

45

12 Appendix: AWT2 error codes

1: could not allocate memory 2: input file (or stream) cannot be read or does not exist 3: unrecognized command line parameter; run the program without parameters for the exact syntax 4: specified watermark payload size exceeds limitations of this program copy 5: input file conversion error: could not run FFmpeg, or the file cannot be converted into wave, or error occurred during the file conversion 6: could not read converted file (created by FFmpeg) 7: input audio file is too short 8: input wave should be sampled at 192000, 176400, 96000, 88200, 48000, 44100, 32000, 22050, 16000, 11025, 8000 Hz 9: the audio stream is too short: less than 5 copies of the watermark fit into 10: could not write the output file 11: specified watermark payload size is too long/unsupported. 12: uninitialized AWT2 encoder object 13: cannot extract watermark from audio stream with sampling rate lower than 16 KHz in 'high capacity' mode 14: cannot embed watermark into audio stream with sampling rate lower than 16 KHz in 'high capacity' mode 15: expected payload size should be specified in bytes 16: specified payload size exceeds supported range 17: specified time stretching factor value is not supported 18: specified aggressiveness value is not supported 19: specified density value is not supported 20: name of temporary audio file for FFmpeg cannot be empty 21: specified hexadecimal watermark payload is invalid 22: source and watermarked files cannot be the same 23: new watermark payload should have the same size in bytes; changing watermark payload size requires encoder object re-initialization 24: watermark payload should be specified; should be of integer number of bytes 25: cannot operate with watermark payloads exceeding 30 bytes in 'normal' mode. Use 'high capacity' mode 26: you are using demo version of the encoder. The only allowed payloads are: 0x12, 0x56 (1 byte), 0x7890, 0x9876 (2 bytes), 0xABCDEF12, 0x12345678 (4 bytes), 0x112233445566 (6 bytes), 0x1122334455667788 (8 bytes), 0x001122334455667788990011223344...778899 (40 bytes) 27: input file should be standard RIFF wave PCM 28: incorrect slicing parameter value 29: cannot search watermarks with chosen slice length (too short) 30: specified modulation gain value is not supported 31: specified noise level value is not supported 32: specified output gain value is not supported 33: specified limiter speed value is not supported 34: encoder object is busy with signal processing 35: sampling rate not supported 36: high capacity mode is not supported 37: wrong watermark layer 38: dual-layer watermarking requires high capacity mode 39: dual-layer watermarking is not supported 40: this software requires hardware protection USB dongle to be attached to this computer. No dongle found. 41: proper hardware protection USB dongle is not detected. 42: wrong '-stop_on_found' parameter use.

Audio Watermarking Tools 2 (AWT2) - User Guide

46

13 Appendix: ASCII printable characters

Binary Dec Hex Glyph Binary Dec Hex Glyph

100000 32 20 1010000 80 50 P

100001 33 21 ! 1010001 81 51 Q

100010 34 22 " 1010010 82 52 R

100011 35 23 # 1010011 83 53 S

100100 36 24 $ 1010100 84 54 T

100101 37 25 % 1010101 85 55 U

100110 38 26 & 1010110 86 56 V

100111 39 27 1010111 87 57 W

101000 40 28 ( 1011000 88 58 X

101001 41 29 ) 1011001 89 59 Y

101010 42 2A * 1011010 90 5A Z

101011 43 2B + 1011011 91 5B [

101100 44 2C , 1011100 92 5C \

101101 45 2D - 1011101 93 5D ]

101110 46 2E . 1011110 94 5E ^

101111 47 2F / 1011111 95 5F _

110000 48 30 0 1100000 96 60 `

110001 49 31 1 1100001 97 61 a

110010 50 32 2 1100010 98 62 b

110011 51 33 3 1100011 99 63 c

110100 52 34 4 1100100 100 64 d

110101 53 35 5 1100101 101 65 e

110110 54 36 6 1100110 102 66 f

110111 55 37 7 1100111 103 67 g

111000 56 38 8 1101000 104 68 h

111001 57 39 9 1101001 105 69 i

111010 58 3A : 1101010 106 6A j

111011 59 3B ; 1101011 107 6B k

111100 60 3C < 1101100 108 6C l

111101 61 3D = 1101101 109 6D m

111110 62 3E > 1101110 110 6E n

111111 63 3F ? 1101111 111 6F o

1000000 64 40 @ 1110000 112 70 p

1000001 65 41 A 1110001 113 71 q

1000010 66 42 B 1110010 114 72 r

Audio Watermarking Tools 2 (AWT2) - User Guide

47

1000011 67 43 C 1110011 115 73 s

1000100 68 44 D 1110100 116 74 t

1000101 69 45 E 1110101 117 75 u

1000110 70 46 F 1110110 118 76 v

1000111 71 47 G 1110111 119 77 w

1001000 72 48 H 1111000 120 78 x

1001001 73 49 I 1111001 121 79 y

1001010 74 4A J 1111010 122 7A z

1001011 75 4B K 1111011 123 7B {

1001100 76 4C L 1111100 124 7C |

1001101 77 4D M 1111101 125 7D }

1001110 78 4E N 1111110 126 7E ~

1001111 79 4F O