Audio design for interactive systems


Camille Goudeseune

Integrated Systems Laboratory, Beckman Institute

Outline

• Sound libraries – music instruments

• Digital Audio – music theory

• Roles of sound – composition

High-level multiplatform sound libraries (Linux + Windows at least)

• Java Sound API

• VSS

• FMOD

• SDL

• SEAL

• Housemarque

Java Sound API

• Painless for Java, but very basic. Avoid it if you're ambitious.

• http://java.sun.com/products/java-media/sound/

VSS (Virtual Sound Server)

• www.isl.uiuc.edu/software/software.html

• The 500-pound gorilla if you need features like 8-channel output or Linux/IRIX/Windows compatibility

• Steep learning curve for advanced features

FMOD

• Syzygy (the Cube) uses it for sound.

• CPU-miserly

• finite but featureful API

• www.fmod.org

SDL

• large user community

• open source.

• Good if you want to integrate sound and graphics tightly.

• www.libsdl.org

SEAL (Synthetic Audio Library)

• many C compilers supported

• Wide support for hardware acceleration

• www.sonicspot.com/sealsdk/sealsdk.html

Housemarque

• Multichannel with certain PC soundcards

• www.s2.org/hmqaudio/

Low-level sound libraries: Linux

• OSS, www.linux.org.uk/OSS/
– Most distros include the free basic version
– ~$30 for fancy multichannel soundcard drivers

• ALSA, www.alsa-project.org
– Religiously open-source alternative
– 0.5.10 is the stable version
– 0.9.0 development version is fine too (part of the upcoming 2.5 Linux kernel)

Low-level sound libraries: Windows

• MMIO
– Simple, universal
– waveOutGetDevCaps(), waveOutWrite(), ...

• DirectSound
– Part of DirectX
– LPDIRECTSOUNDBUFFER, CreateSoundBuffer(), Lock(), Unlock(), ...
– “Faster” but more awkward; use a wrapper

Basics of Digital Audio

• 16-bit or 8-bit?
– These days, 8-bit is embarrassing. Pro gear uses 32!

• Stereo or mono?
– Panning a mono sound is faster and simpler than panning a stereo one.

• Sample rate: 8 kHz? 44.1 kHz?

• File formats
– .WAV, .AIFF, .AU, .MP3
– Converters: “sox” and dozens of others
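These choices multiply together. As a rough sizing aid, here is a minimal Python sketch of the uncompressed (PCM) data rate, assuming plain interleaved PCM and ignoring file headers; the function name is illustrative, not from any library.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM data rate: samples/s x bytes per sample x channels."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

# CD quality: 44.1 kHz, 16-bit, stereo
cd = pcm_bytes_per_second(44100, 16, 2)      # 176400 bytes/s
# Telephone quality: 8 kHz, 8-bit, mono
phone = pcm_bytes_per_second(8000, 8, 1)     # 8000 bytes/s
print(f"CD: {cd} B/s ({cd * 60 / 1e6:.1f} MB/min), phone: {phone} B/s")
```

The 22:1 gap between those two settings is why compression, and its CPU cost, enters the picture.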


• Two tradeoffs
– Compression ratio vs. CPU load vs. sound quality
– Latency vs. CPU load
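The latency vs. CPU load tradeoff comes mostly from buffer size. A minimal sketch, assuming the device plays a queue of `num_blocks` fixed-size blocks (a simplification; real drivers add buffering of their own); the function and parameter names are hypothetical.

```python
def buffer_tradeoff(block_frames, sample_rate_hz, num_blocks):
    """Return (output latency in ms, audio-thread wakeups per second).

    Assumes a freshly rendered sample waits behind num_blocks queued
    blocks of block_frames each before it reaches the speaker.
    """
    latency_ms = 1000.0 * block_frames * num_blocks / sample_rate_hz
    wakeups_per_s = sample_rate_hz / block_frames
    return latency_ms, wakeups_per_s

for frames in (64, 256, 1024):
    lat, wps = buffer_tradeoff(frames, 44100, num_blocks=2)
    print(f"{frames:5d} frames: {lat:6.1f} ms latency, {wps:6.1f} wakeups/s")
```

Smaller blocks cut latency, but the audio thread must wake up more often, and each wakeup carries fixed overhead.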


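• Debugging

• Gaps in sound point to excessive CPU use.

• Stuttering points to CPU starvation (the CPU is fast enough but poorly scheduled).

• Different from graphics! Graphics can drop its frames per second when the CPU is taxed, but audio cannot: the sample rate is fixed, so an overloaded CPU produces audible gaps.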
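The two symptoms can be told apart in a toy model. This Python sketch (hypothetical names; a deliberately simplified pull model, not how any particular driver works) counts audio blocks that finish rendering after the device needed them.

```python
def count_late_blocks(render_ms, block_ms, queue_depth):
    """Count blocks that miss their playback deadline (toy model).

    Assumptions: the device consumes one block every block_ms;
    queue_depth blocks are already queued when playback starts; the
    renderer starts block i no earlier than i * block_ms and renders
    one block at a time. Each late block would be heard as a glitch.
    """
    late = 0
    finish = 0.0
    for i, r in enumerate(render_ms):
        start = max(finish, i * block_ms)         # previous block done, and block i requested
        finish = start + r
        deadline = (i + queue_depth) * block_ms   # when the device needs block i
        if finish > deadline:
            late += 1
    return late

steady = [5, 5, 5, 5]      # renderer comfortably fast
hiccup = [5, 25, 5, 5]     # one block delayed by poor scheduling
overload = [12] * 20       # renderer always slower than real time
print(count_late_blocks(steady, 10, 1),
      count_late_blocks(hiccup, 10, 1),
      count_late_blocks(hiccup, 10, 3),
      count_late_blocks(overload, 10, 3))   # prints: 0 3 0 10
```

A deeper queue absorbs a one-off scheduling hiccup (stuttering) at the cost of latency, but no amount of buffering rescues a renderer that is simply too slow (gaps). Real drivers usually resynchronize after an underrun rather than letting lateness accumulate as this model does.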


• Debugging

• Electric-guitar distortion points to clipping. Too quiet: hiss. Too loud: clipping. Just right: almost clipping.
– Check this for every stage in the audio pipeline, both software and hardware: every place you can set the volume level!
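Level checks can be automated at each stage. A minimal Python sketch, assuming float samples with full scale at 1.0 for the peak meter and 16-bit integer samples for the gain stage; both helper names are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Peak level of float samples (full scale = 1.0), in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)

def apply_gain_int16(samples, gain):
    """Scale 16-bit samples, saturating at the format limits.

    Returns (scaled_samples, clipped_count); a nonzero count means this
    stage's gain is set too high.
    """
    out, clipped = [], 0
    for s in samples:
        v = round(s * gain)
        if v > 32767 or v < -32768:
            clipped += 1
            v = max(-32768, min(32767, v))   # hard clip, heard as distortion
        out.append(v)
    return out, clipped

print(f"{peak_dbfs([0.5, -0.25]):.1f} dBFS")        # -6.0 dBFS: peak at half of full scale
print(apply_gain_int16([10000, -20000, 30000], 2.0))  # ([20000, -32768, 32767], 2)
```

Running a check like this after every volume control shows which stage is clipping, even when the final output level looks reasonable.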

Where to get sounds

• Buy: fx + music libraries

• Build: record it yourself

• Build: synthesize it yourself
– Adjust an existing synth patch, a little or a lot

• Steal: web search
– 8-bit might still suffice while prototyping

What to do with the sounds, once you have them

• Common roles
– Alerts; acknowledgements; ambience

• sound vs. image

• speech vs. non-speech

• synchronization with visual events

• combining sounds

• tips for spatializing

Short and subtle is best!

• “Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.”
(Edward Tufte, The Visual Display of Quantitative Information)

• This applies to alerts, acks, and ambience.

• Clutter is worse with sound: there are no earlids!

Sound or image for a message?

• Use sound if the message is:
– Simple
– Short
– Standalone
– Temporal
– Demanding immediate action

• Use visuals if the message is:
– Complex
– Long
– Referred to later
– Spatial
– Something the user can deal with later

Speech vs. non-speech

• Main advantage: precise, simple! But...

• Carries extra connotations

• Tone of voice or imprecise word choice can confuse or distract

• Non-speech doesn’t interfere with conversation

Synchronizing to other things

• When passively observing a sonic and a visual event, 100 msec of offset is often accurate enough.

• Sync with an input action might need to be tighter: 30 or even only 4 msec.
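A sync tolerance translates directly into a buffer-size budget. A minimal Python sketch, assuming a single block of buffering is the only delay (driver and hardware latency come on top, so treat the result as an upper bound); the function name is hypothetical.

```python
def max_block_frames(tolerance_ms, sample_rate_hz):
    """Largest audio block, in frames, whose duration fits inside a sync tolerance.

    Assumes one block of buffering is the only delay, so the real
    budget is smaller once driver and hardware latency are added.
    """
    return int(tolerance_ms * sample_rate_hz // 1000)

for tol_ms in (100, 30, 4):
    print(f"{tol_ms:3d} msec -> at most {max_block_frames(tol_ms, 44100)} frames per block at 44.1 kHz")
```

At 44.1 kHz a 4 msec budget leaves room for only about 176 frames of buffering in total, which is why tight input-to-sound sync is hard on a general-purpose OS.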

Moving a probe through a dataset

• Trigger sounds on button-click, or when crossing thresholds

• Or, play a continuous sound.

• Vector-valued data is trickier.

• Nominal data: one sound per name.

• Scope of probe: point, ball, shell.
– Auto-size based on probe speed.
– Granulation
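Threshold-triggered sounds need edge detection, so a probe resting on a threshold does not retrigger every frame. A minimal Python sketch (hypothetical helper; thresholds assumed sorted ascending) that reports which thresholds a probe reading crossed since the previous frame.

```python
import bisect

def crossed_thresholds(prev_value, value, thresholds):
    """Thresholds crossed moving from prev_value to value, in crossing order.

    thresholds must be sorted ascending. A value equal to a threshold
    counts as being at or above it, so each crossing fires exactly once.
    """
    if value > prev_value:   # moving up: prev_value < t <= value
        return thresholds[bisect.bisect_right(thresholds, prev_value):
                          bisect.bisect_right(thresholds, value)]
    if value < prev_value:   # moving down: value < t <= prev_value, reported high to low
        return thresholds[bisect.bisect_right(thresholds, value):
                          bisect.bisect_right(thresholds, prev_value)][::-1]
    return []

# Illustrative probe readings along a path, with thresholds every 10 units.
levels = [10, 20, 30]
readings = [5, 12, 27, 31, 18]
for prev, cur in zip(readings, readings[1:]):
    for t in crossed_thresholds(prev, cur, levels):
        print(f"crossed {t} going {'up' if cur > prev else 'down'}")   # trigger one short sound here
```

Playing a different sound for up and down crossings tells the user which way the data is trending without looking.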

Navigating a VR world

• Dense world
– Azimuth: click rate varies with turning rate
– Altitude: high/low beeps (rate varies with climb rate)
– Speed: vehicle-engine metaphor

• Large world: quiet continuous ambiences localized to individual parts of the world.
– Play only the nearest two or three.
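Choosing which ambiences to play is a nearest-neighbor query. A minimal Python sketch, assuming point sources in 3-D and a simple inverse-distance gain model (both illustrative choices); the function names and the example world are hypothetical.

```python
import heapq
import math

def nearest_ambiences(listener, sources, k=3):
    """Return the k sources nearest the listener as (name, distance), nearest first.

    listener: (x, y, z) position; sources: dict mapping name -> (x, y, z).
    """
    distances = ((name, math.dist(listener, pos)) for name, pos in sources.items())
    return heapq.nsmallest(k, distances, key=lambda item: item[1])

def distance_gain(distance, ref_distance=1.0):
    """Inverse-distance attenuation, held at 1.0 inside ref_distance."""
    return 1.0 if distance <= ref_distance else ref_distance / distance

# Hypothetical world with four localized ambiences.
world = {"waterfall": (10, 0, 0), "forest": (3, 4, 0),
         "market": (0, 0, 20), "cave": (1, 0, 0)}
for name, d in nearest_ambiences((0, 0, 0), world, k=3):
    print(f"{name}: distance {d:.1f}, gain {distance_gain(d):.2f}")
```

Re-running the query as the listener moves, and crossfading when the set of nearest sources changes, avoids abrupt cutoffs.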

Combining sounds

• It can get only so loud before clipping.

• Spatial separation (“panning”).

• Temporal separation (one sound at a time).
– A party of drunks vs. a good dinner conversation.

• Frequency separation.

• Each layer gets its own tempo.
– Heavily layered techno or orchestral music.
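Spatial separation and the clipping limit interact: panning places each sound, but the mix is still a plain sum. A minimal Python sketch using a constant-power pan law (a common choice, swapped in here for illustration) on float samples with full scale at 1.0; the helper names are hypothetical.

```python
import math

def pan_mono(samples, pan):
    """Constant-power pan of a mono signal to (left, right).

    pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    For every sample, left**2 + right**2 equals the input sample squared.
    """
    angle = (pan + 1.0) * math.pi / 4          # map [-1, 1] onto [0, pi/2]
    gain_l, gain_r = math.cos(angle), math.sin(angle)
    return [s * gain_l for s in samples], [s * gain_r for s in samples]

def mix(*signals):
    """Sum equal-length signals sample by sample. No limiting is applied."""
    return [sum(values) for values in zip(*signals)]

# Two sounds, each below full scale on its own, panned apart.
voice_l, voice_r = pan_mono([0.8, 0.8], -0.5)
engine_l, engine_r = pan_mono([0.8, 0.8], +0.5)
left = mix(voice_l, engine_l)
peak = max(abs(s) for s in left)
print(f"left-channel peak: {peak:.2f}", "(clips)" if peak > 1.0 else "(ok)")
```

Even well-separated sounds still share each output channel, so the per-source levels must leave headroom for the sum.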

Spatialized sound

• Steady tones are worst. Bird chirps are best (they should know).
– Wide frequency band, complex attack

• Loudspeaker array

• Headphones with HRTF

• Motion-tracked headphones
– In the real world you move your head slightly to tell where a sound comes from.
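For a loudspeaker array, one simple approach is pairwise constant-power panning between the two speakers on either side of the source direction, similar in spirit to 2-D vector-base amplitude panning. A minimal Python sketch, assuming a horizontal ring of equally spaced speakers; it does not model HRTFs or distance, and the function name is hypothetical.

```python
import math

def ring_gains(azimuth_deg, n_speakers):
    """Pairwise constant-power gains for a horizontal ring of loudspeakers.

    Assumes n_speakers >= 2, equally spaced, speaker 0 at 0 degrees, and
    speaker indices increasing in the same direction as azimuth.
    """
    if n_speakers < 2:
        raise ValueError("need at least two speakers")
    spacing = 360.0 / n_speakers
    pos = (azimuth_deg % 360.0) / spacing     # fractional speaker index
    lo = int(pos) % n_speakers                # speaker just below the source
    hi = (lo + 1) % n_speakers                # next speaker around the ring
    frac = pos - int(pos)                     # 0 = on lo, approaching 1 = on hi
    gains = [0.0] * n_speakers
    gains[lo] = math.cos(frac * math.pi / 2)
    gains[hi] = math.sin(frac * math.pi / 2)
    return gains

for az in (0, 22.5, 90, 300):
    print(az, [round(g, 2) for g in ring_gains(az, 8)])
```

Only two speakers sound at once, so localization stays sharp for broadband sources like chirps; steady tones will still be hard to place.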
