
1

KIPA Game Engine Seminars

Jonathan Blow

Ajou University

December 10, 2002

Day 13

2

Audio Overview

• Systems tend to have a separate audio processor
  – PCs: an audio card like a SoundBlaster, or on-board audio
  – Xbox: Nvidia’s sound chip
  – GameCube: an audio processor with ARAM

3

Analogy to Graphics Hardware

• Off-CPU graphics processing
  – Texture memory
  – Upload things to texture memory and tell the processor to use them

• Some sound hardware supports this model
  – Creative’s cards with SoundFont

• Some APIs built around this model
  – OpenAL

4

However…

• Support for ‘SoundFont’-style hardware is not widespread

• OpenAL is not widely adopted

• I suggest that this is because this method introduces unneeded complexity while not solving real problems
  – I don’t like OpenAL

5

Let’s think about audio

• CD-quality audio (44100 Hz), stereo, 16-bit per channel, is pretty good.

• That’s 176400 bytes per second

• For a 60fps game, that’s 2940 bytes per frame (not much!)
  – If outputting Dolby 5.1, of course there will be more data

• PCI bus bandwidth is 133MB/sec – 1066MB/sec
  – Audio is 0.2% of bus bandwidth or less
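As a quick sanity check of the arithmetic above (a throwaway sketch; the variable names are mine):

```python
# Recompute the bandwidth figures from the slide.
rate = 44100          # samples per second (CD quality)
channels = 2          # stereo
bytes_per_sample = 2  # 16 bits per channel

bytes_per_second = rate * channels * bytes_per_sample
print(bytes_per_second)                  # 176400

bytes_per_frame = bytes_per_second / 60  # one frame of a 60fps game
print(bytes_per_frame)                   # 2940.0

pci_low_end = 133_000_000                # 133 MB/sec, low end quoted above
print(round(100.0 * bytes_per_second / pci_low_end, 3))  # 0.133 (percent)
```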

6

Let’s think about graphics

• 32MB of textures at 60fps = almost 2 gigabytes of texture data per second
  – So it’s obvious why we don’t stream texture data to the hardware every frame

• But nobody likes having to upload textures to the hardware on PCs
  – It only creates code complications and bugs

• So we should avoid those complications in audio, where we don’t need them

7

The problem with OpenAL

• They want to be too much like OpenGL

• They’re extending the “texture on video hardware” analogy to a place where it’s not appropriate

• This makes their API harder to use than it ought to be

8

GameCube with ARAM

• Means you basically must follow the “upload data to graphics hardware” analogy

• Sound is harder to code here than on the PC

• You need to make a system that streams from the DVD drive into ARAM, and that is annoying

9

Audio Processing

10

3D Sound Processing

• Doesn’t seem to work very well on consumer hardware

• Overview of HRTF (Head-Related Transfer Function)

• General overview of distance-based processing methods

• So far, 3D sound has been mostly marketing hype
  – Though maybe sometime in the future it will be done well
  – Techniques like Dolby 5.1 have much more concrete benefit

11

FIR and IIR filters

• “Finite Impulse Response”
• “Infinite Impulse Response”
  – Diagram on whiteboard of what these mean

• FIR filters can produce linear phase
  – If the filter kernel is symmetric

• IIR filters do not produce linear phase

• Since linear phase is not so important in audio, IIR filters find much more use than in graphics
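A minimal sketch of the distinction (function names and coefficients are mine, not from the talk):

```python
def fir_filter(x, kernel):
    # "Finite Impulse Response": output depends on a finite window of
    # input, so the response to an impulse dies out after len(kernel) taps.
    n = len(kernel)
    return [sum(kernel[k] * x[i - k] for k in range(n) if 0 <= i - k < len(x))
            for i in range(len(x))]

def iir_lowpass(x, a=0.9):
    # "Infinite Impulse Response": y[i] = (1-a)*x[i] + a*y[i-1].
    # Feedback means the impulse response decays forever, never reaching zero.
    y, prev = [], 0.0
    for s in x:
        prev = (1.0 - a) * s + a * prev
        y.append(prev)
    return y

impulse = [1.0] + [0.0] * 7
print(fir_filter(impulse, [0.25, 0.5, 0.25]))  # zero after the third sample
print(iir_lowpass(impulse))                    # keeps shrinking, never zero
```

Note the FIR kernel here is symmetric, which is what gives it linear phase.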

12

Specific Effects

13

Low-Pass Filter, High-Pass Filter

• Convolve sound with symmetric filter kernel (FIR)

• Kernel can be fairly small

• Since kernel is symmetric we can save some CPU time (show on whiteboard)
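The symmetry savings can be sketched like this (a toy direct-form convolution; names and kernel are mine): since kernel[j] == kernel[N-1-j], mirrored input samples can be added first so each weight is multiplied only once.

```python
def convolve_symmetric(x, kernel):
    # Assumes an odd-length, symmetric kernel. Roughly halves the
    # multiplies: one per tap *pair* instead of one per tap.
    n = len(kernel)
    half = n // 2
    out = []
    for i in range(half, len(x) - half):
        acc = kernel[half] * x[i]  # center tap
        for j in range(1, half + 1):
            # One multiply covers the matching pair of taps.
            acc += kernel[half + j] * (x[i - j] + x[i + j])
        out.append(acc)
    return out

sig = [0.0, 0.0, 1.0, 0.0, 0.0]
print(convolve_symmetric(sig, [0.25, 0.5, 0.25]))  # [0.25, 0.5, 0.25]
```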

14

Slow down, Speed up sound

• This is a resampling problem

• (Discuss what happens with low-quality resampling)

• To speed up, low-pass filter and then downsample

• To slow down by an even multiple, fill the signal with zeroes, then low-pass filter

• To slow down by some other multiple, you can write a generalized resampler
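The two integer-ratio cases might look like this (a toy sketch; the 3-tap kernel is far too crude for production resampling):

```python
def lowpass(x):
    # Crude 3-tap low-pass; stands in for a proper anti-alias filter.
    padded = [0.0] + list(x) + [0.0]
    return [0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
            for i in range(len(x))]

def speed_up_2x(x):
    # Low-pass first (to limit aliasing), then keep every other sample.
    return lowpass(x)[::2]

def slow_down_2x(x):
    # Insert zeros between samples, then low-pass to interpolate them.
    stuffed = []
    for s in x:
        stuffed += [s, 0.0]
    # The factor of 2 restores the amplitude lost to the inserted zeros.
    return [2.0 * v for v in lowpass(stuffed)]

print(speed_up_2x([1.0, 1.0, 1.0, 1.0]))  # [0.75, 1.0]
print(slow_down_2x([1.0, 1.0]))           # [1.0, 1.0, 1.0, 0.5]
```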

15

Delay sound (like for HRTFs)

• Integer sample delay is easy, you just change the offset you use to mix

• For non-integer delay, you need some kind of filter
  – Either explicitly resample, or use an “all-pass filter”
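One way to realize the all-pass option is a first-order all-pass with the standard Thiran coefficient; this specific formula is my addition, not from the talk, and it is only accurate at low frequencies for delays of roughly 0 to 1 samples:

```python
def fractional_delay(x, d):
    # First-order all-pass; phase delay is approximately d samples near DC.
    a = (1.0 - d) / (1.0 + d)
    y, x_prev, y_prev = [], 0.0, 0.0
    for s in x:
        out = a * s + x_prev - a * y_prev
        y.append(out)
        x_prev, y_prev = s, out
    return y

def integer_delay(x, n):
    # Integer delay needs no filter: just offset where you start mixing.
    return [0.0] * n + list(x)

print(integer_delay([1.0, 2.0], 2))                     # [0.0, 0.0, 1.0, 2.0]
print(round(fractional_delay([1.0] * 20, 0.5)[-1], 6))  # settles to 1.0 (unity gain)
```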

16

Per-Frequency Pitch Shifting (Doppler, etc)

• Fourier version (move the spectrum)

• Time-domain version (low-pass filter, multiply by complex exponential)
  – But our signal is no longer real-valued!

• What exactly does that mean?
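The complex-exponential multiply can be demonstrated in a few lines (the shift amount and test signal are my own toy choices, and the low-pass step is omitted):

```python
import math, cmath

def shift_frequency(x, shift_cycles_per_sample):
    # Multiply each sample by a complex exponential: this translates the
    # spectrum, but the product is complex even when the input is real.
    return [s * cmath.exp(2j * math.pi * shift_cycles_per_sample * n)
            for n, s in enumerate(x)]

# A real-valued cosine at 1/8 cycle per sample...
x = [math.cos(2 * math.pi * n / 8) for n in range(8)]
y = shift_frequency(x, 1.0 / 16)
print(any(abs(v.imag) > 1e-9 for v in y))  # True: no longer real-valued
```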

17

Sound effect shifting

• Shifting a sound effect is more complicated than shifting individual frequencies

• Natural sounds contain harmonics
  – Multiples of a fundamental frequency

• If the fundamental frequency is shifted by ω, the harmonics need to be shifted by 2ω, 3ω, etc…
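A tiny arithmetic check of why this is a per-harmonic shift (the numbers are made up):

```python
# Fundamental at f; shifting it by w moves harmonic k by k*w, so a
# single translation of the whole spectrum cannot do the job.
f, w = 100.0, 10.0
original = [k * f for k in range(1, 4)]        # harmonics: 100, 200, 300
shifted = [k * (f + w) for k in range(1, 4)]   # harmonics: 110, 220, 330
offsets = [s - o for s, o in zip(shifted, original)]
print(offsets)  # [10.0, 20.0, 30.0] -> w, 2w, 3w
```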

18

My preferred sound architecture

• Central mixer that streams audio to the hardware

• Declare streaming channels
  – None of this static circular buffer stuff

• Samples can be added to these channels at any time

• Chain filters onto channels to add various effects
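A hypothetical sketch of this architecture (every class and method name here is mine; the real engine's API surely differs):

```python
class Channel:
    def __init__(self):
        self.pending = []  # samples pushed, waiting to be mixed
        self.filters = []  # filter chain, applied in order on push

    def push_samples(self, samples):
        # Samples can be added at any time; filters run as they arrive.
        for f in self.filters:
            samples = f(samples)
        self.pending.extend(samples)

class Mixer:
    def __init__(self):
        self.channels = []

    def mix(self, count):
        # Central mixer: produce `count` samples by summing all channels;
        # the result would then be streamed to the hardware.
        out = [0.0] * count
        for ch in self.channels:
            block, ch.pending = ch.pending[:count], ch.pending[count:]
            for i, s in enumerate(block):
                out[i] += s
        return out

mixer = Mixer()
ch = Channel()
ch.filters.append(lambda block: [0.5 * s for s in block])  # a volume filter
mixer.channels.append(ch)
ch.push_samples([1.0, 1.0, 1.0])
print(mixer.mix(4))  # [0.5, 0.5, 0.5, 0.0]
```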

19

Preferred Architecture (2)

• Sound channels are “immediate mode” as much as possible
  – You don’t give them pointers to objects in the game; you have to “push” position and orientation data onto them in order to spatialize; they don’t “pull”
  – Two reasons for this:
    • Orthogonality
    • Control over when changes happen

20

Coding Issues: Dealing with volume, pan, etc

• Can’t let them change discontinuously
  – This would cause a pop!

• Need to interpolate them over time

• This adds latency! (whiteboard discussion)
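A sketch of interpolating a volume change across a block instead of jumping instantly (names are mine; assumes blocks longer than one sample):

```python
def apply_volume_ramp(samples, vol_from, vol_to):
    # Ramp the gain linearly across the block; an instant jump would put
    # a step discontinuity (a pop) into the output. Note the change only
    # completes at the end of the block: that is the added latency.
    n = len(samples)  # assumed > 1
    return [s * (vol_from + (vol_to - vol_from) * i / (n - 1))
            for i, s in enumerate(samples)]

block = [1.0] * 5
print(apply_volume_ramp(block, 1.0, 0.0))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```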

21

Coding Issues: Sending sound to the hardware

• Usually need to pre-fill far enough ahead on a circular buffer

• This adds latency!

22

Coding Issues: Sound system responsiveness

• Don’t want the audio to skip when the frame rate drops

• Audio probably needs to be in its own thread
  – Unfortunate, since I don’t like multiple threads

• Synchronization between audio and main thread

23

Sound system responsiveness (2)

• Also when we are spooling sound from disk, we want that to be responsive

• Should be in its own thread also

• Unclear whether it’s a good idea to pack this into the same thread as the low-level audio (application-dependent)

24

Audio Mixing

• Even if the output is 16-bit, you want to use 32-bit integers while mixing the inputs
  – This reduces the amount of added noise
  – For maximum accuracy, we want sounds stored so that they use the whole 16 bits

• Example of overflow that is saved by 32-bit numbers
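A minimal illustration of the overflow point (Python integers don't overflow, so the 32-bit accumulator is implicit here; in C it would be an explicit `int32_t`, and the final clamp is the part that matters):

```python
def mix16(a, b):
    # Mix two streams of 16-bit samples in a wide accumulator, then
    # saturate once at the end. Summing directly in 16 bits would wrap
    # around and produce a loud, ugly artifact instead of mere clipping.
    out = []
    for s1, s2 in zip(a, b):
        acc = s1 + s2                       # may exceed the 16-bit range
        acc = max(-32768, min(32767, acc))  # saturate to [-32768, 32767]
        out.append(acc)
    return out

print(mix16([30000, 100], [10000, 200]))  # [32767, 300] -- clamped, not wrapped
```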

25

How sound volume is controlled

• 1/r² attenuation

• “Min volume” / “Max volume” distances
  – Paradox of infinite loudness

• Modeling sound as pressure (in pascals), not loudness (in decibels)
  – Emitter of finite size radiating a certain pressure density
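A sketch of the clamped 1/r² model (function and default distances are mine; the finite-size pressure-emitter model from the last bullet is not shown): clamping at the min distance avoids the "infinite loudness" paradox as r goes to 0.

```python
def attenuate(distance, min_dist=1.0, max_dist=100.0):
    # Volume factor in [0, 1]. Inside min_dist the volume stops growing
    # (no divide-by-zero blowup); beyond max_dist the sound is inaudible.
    if distance >= max_dist:
        return 0.0
    r = max(distance, min_dist)
    return (min_dist / r) ** 2  # 1/r^2 falloff, normalized to min_dist

print(attenuate(0.0))    # 1.0 (clamped at the min distance)
print(attenuate(2.0))    # 0.25
print(attenuate(500.0))  # 0.0
```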

26

Overview of engine sound system implementation

• Class hierarchy

• Win32-specific piece does the talking to DirectSound

• Cross-platform piece does most of the control

27

Audio Control Flow

• No callbacks
  – Buffer notifications, etc

• Absolutely everything is a “push” model
  – If you want sound B to happen after sound A, you are responsible for noticing that A is almost done and that it’s time to play B

• Or else you can decide far ahead of time to just append B and forget about it

28

In-Engine Audio Format Support

• .wav loading

• .ogg loading / streaming
  – Discussion of .ogg vs .mp3

• Can use compressed audio for even small samples, to reduce download time
  – Though maybe you uncompress them to a simple format the first time the game runs

29

Supported sampling rates

• 44100 Hz only

• This is unlike most engines / audio APIs (which will try to resample for you if you provide things at a different rate)

• I wanted things as simple as possible

• Sometimes this might cause development bottlenecks, since you might have to resample any acquired audio (like off the internet) before using it
  – So I might change this decision in the future
