
Sound and Music for Games: Prokofiev to Pac-Man to Guitar Hero

By Francis Rumsey, Staff Technical Writer


J. Audio Eng. Soc., Vol. 57, No. 6, 2009 June 455

The recent AES 35th International Conference, Audio for Games, held in London in February, illustrated the way in which sounds and music for games are evolving in response to user expectations and changing game genres. The traditional film score, such as that written by Prokofiev for Alexander Nevsky, was an essentially static creation designed to match the screen action. While the composer might have worked closely with the director to create a combined music/film entity that satisfied the viewer on a number of aesthetic levels, the sound track was fixed for all time in a single form. Games have many things in common with films but differ in the essential fact that they are unpredictable to some degree, and yet still have music and effects as an accompaniment. For this reason the sounds that accompany games cannot be entirely fixed and need to be allowed to change or evolve according to the current game state. How this has been dealt with to date, and what might be achieved in future, are tackled in a number of interesting papers from the conference.

REAL-TIME ADAPTIVE MUSIC IN PERSPECTIVE
McAlpine et al. provide a comprehensive review of the issues surrounding the creation of real-time adaptive music in their paper "Approaches to Creating Real-Time Adaptive Music in Interactive Entertainment: A Musical Perspective." They remind us that the marriage between music and film was successful because both had linear structures, and the film provided a narrative framework on which a musical score could be built. It becomes clear very quickly that the number of possible states, players, contexts, and cultures across which a game might be played makes it very difficult to score appropriate musical sequences in advance. The composer is to some extent blind to the possible combinations of circumstances that might arise in a complicated game.

The authors briefly review the history of game music, showing how it evolved from the early video game, in which the most common music was synthesized "on the fly," consisting mainly of monophonic melodies and synthetic effects. This was the sound of Space Invaders and Pac-Man. Things moved on rapidly through games that used the potential of prerecorded music tracks from CD-ROM, such as Quake, through dedicated audio engines such as FMOD that facilitate the rapid development of game audio for a number of platforms, to fully interactive music-making using game controllers in the shape of musical instruments, as one finds in modern games such as the popular Guitar Hero. An important distinction is made between passive musical components and those that can be considered interactive. The authors explain that what is meant by the latter is actually real-time adaptive music—music that is composed in response to inputs from the player and which provides feedback that may bring about behavioral change.

Different game music elements are distinguished. These include the following: title music, which may set the overall mood of a game at the outset; menu or hi-score music, which usually consists of short loops that fill the silence while the player goes through menus; cut-scene music, which typically accompanies action sequences that may be from films or prerendered gameplay, usually used between levels of a game; event-driven cues; and finally in-game music, which is the primary accompaniment for the player's activities. It is in the latter case that the main challenges of interactive music arise. The three main methods currently used to generate interactive music have been simple event-driven music cues, horizontal resequencing, and vertical reorchestration. The latter two are illustrated graphically in Fig. 1. Horizontal resequencing involves the insertion of a new musical loop when a particular gameplay parameter exceeds a certain level, whereas vertical reorchestration builds up the complexity of the score by introducing new components as the gameplay parameters change.


Fig. 1. Top, horizontal resequencing introduces different loops depending on gameplay parameters; bottom, vertical reorchestration introduces new score elements depending on gameplay parameters (courtesy McAlpine et al.)
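To make the two schemes concrete, the selection logic behind Fig. 1 might be sketched as follows. This is only an illustration of the idea described by McAlpine et al., not code from the paper; the loop files, stem names, and thresholds are invented, and actual playback would be left to the game's audio engine (FMOD or similar).

```python
# Illustrative selection logic only; playback itself would be handled by the
# game's audio engine. Loop files and stem names are hypothetical.

LOOPS = {"calm": "explore_loop.wav",
         "tense": "stealth_loop.wav",
         "combat": "battle_loop.wav"}

def next_loop(threat: float) -> str:
    """Horizontal resequencing: at each loop boundary, pick the next loop
    from the current value of a 0..1 gameplay parameter."""
    if threat < 0.3:
        return LOOPS["calm"]
    if threat < 0.7:
        return LOOPS["tense"]
    return LOOPS["combat"]

STEMS = ["pad", "bass", "percussion", "brass"]

def active_stems(threat: float) -> list[str]:
    """Vertical reorchestration: all stems run in sync; intensity decides
    how many layers are audible, thickening or thinning the score."""
    count = 1 + int(threat * (len(STEMS) - 1))
    return STEMS[:count]
```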


The authors show how the traditional musical problems of "engineering" tension and release patterns in a score can be hard to control in games because of the difficulty of predicting the rate at which action is going to take place. This introduces the problem of the correct pacing of musical material. They suggest that solutions to such problems are likely to be complex, involving sophisticated musical intelligence engines that can resequence and reorchestrate content appropriately. Two approaches are suggested. The first treats gameplay as a dynamic control system for music. However, this is hard to use in practice because there is no apparent reason why emergent gameplay parameters should provide a suitable structure for the creation of appropriate music. The second option, using music as a dynamic control system for the game, seems more promising and might suit some types of games that could be led by the music. One of the further options they suggest involves building on the musical concept of improvisation. An improvisation engine might be able to embellish or adapt basic musical lines in terms of melody, rhythm, or harmony to complement the gameplay. The skilled composer is still important because good basic themes and style rules will be crucial to the success of such an approach.

GAME AUDIO LAB
Huiberts et al. describe a system that enables rapid prototyping of nonlinear sound for games in "Game Audio Lab—An Architectural Framework for Nonlinear Audio in Games." They explain that innovation in this field requires a flexible platform, free of commercial secrecy and constraints, that allows access to the individual game parameters so they can be converted into more meaningful and useful music and sound controllers. They give the example of "level of threat" as a composite game variable that is not easily accessible in most games. It has to be constructed by bringing together other variables that are available, such as the number of enemies within a certain distance, the distance to the enemies, the health of the avatar, and so forth. Existing audio integration tools such as GameCODA, ISACT, and FMOD do not allow the individual game variables to be intercepted and adapted in this way when they arrive from the game engine.
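A composite variable of this kind could be assembled from raw game-state values roughly as follows. The ingredients (enemy distances, avatar health) come from the paper's example, but the weighting, radius, and function names are hypothetical.

```python
import math

def level_of_threat(enemy_distances: list[float], avatar_health: float,
                    radius: float = 30.0) -> float:
    """Fold raw game variables into a single 0..1 'level of threat' value.

    Hypothetical weighting: more (and closer) enemies within the radius and
    lower avatar health both raise the threat.
    """
    nearby = [d for d in enemy_distances if d < radius]
    if not nearby:
        return 0.0
    proximity = sum(1.0 - d / radius for d in nearby)   # closer enemies count more
    danger = 1.0 - max(0.0, min(1.0, avatar_health))    # health assumed normalized 0..1
    return min(1.0, 0.7 * math.tanh(proximity) + 0.3 * danger)
```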

In order to implement a version of this system, the authors adapted a game called Half-Life 2, which has an open-source software development kit that offers a number of tools to customize the game. By modifying the game so as to enable it to use the Open Sound Control (OSC) protocol, they were able to interface it with the music/sound software application Max/MSP via a so-called bridge patcher (see Fig. 2). OSC is a protocol that enables networked control of music and sound devices, rather like a sophisticated version of MIDI, and the bridge software developed by the authors enabled them to combine the individual game variables so as to create composite variables such as that mentioned above. The GSToolkit shown in the diagram is a Max/MSP application that acts as a dynamic sample player with real-time DSP, enabling the designer to alter the mapping of composite control information to sound-processing algorithms. Certain sound instances in the game are routed via the external workstation containing the Max/MSP software, to handle those components that will be separately processed, while the remaining sounds are handled normally by the game platform's audio engine. In the case of the authors' example, only the weapon sounds and ambience layers of the game were externally handled. Two composite game variables, namely level of threat and level of success, were used to control selected sounds. For example, level of success changed the gun sounds to make them more or less satisfying. Various parameters of the ambient sounds such as volume, filtering, and reverberation were also modified using these two game variables, in order to change the intensity of the ambience effect. The authors suggest that the principles of this example also lend themselves to music-compositional applications involving generative and procedural approaches.

Fig. 2. Software framework for Game Audio Lab (courtesy Huiberts et al.)
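The data path from game to external sound engine can be sketched in a few lines. The snippet below assumes the third-party python-osc package rather than the authors' actual bridge code, and the port number and OSC address names are invented; it simply shows the kind of message a Max/MSP patch would receive.

```python
# Hypothetical OSC sender; python-osc is assumed, and the port and address
# names are invented. A Max/MSP patch could pick these up with udpreceive.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)

def publish_game_state(threat: float, success: float) -> None:
    """Forward the two composite variables once per game tick."""
    client.send_message("/game/level_of_threat", threat)
    client.send_message("/game/level_of_success", success)

publish_game_state(0.42, 0.80)
```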

BRIDGING THE GAP BETWEEN RECORDINGS AND SYNTHESIS
The problem of how to render the sound of complex contact interactions between animated objects was addressed by Picard et al. in "Retargeting Example Sound to Interactive Physics-Driven Animations." Such sounds include sliding and scraping, as well as individual impacts or breaking sounds. They refer in particular to animation using physics engines, extracting parameters from the engine to control the audio process. (A physics engine is a computer program that simulates physical objects and their motion according to the laws of physics.) The reason for considering a new method is that conventional approaches tend to use prerecorded samples synchronized with the contact events arising from the animation engine, but this leads to repetitive-sounding audio, and good matching between animation and sound is hard to achieve. It is also necessary to arrange for specific sounds to be associated with each contact event, which is time-consuming for the game author.


In the approach adopted by the authors, original audio recordings are analysed and classified as impulsive or continuous sounds, the latter being subdivided into sinusoidal and transient components using spectral modelling synthesis (SMS). From this is compiled a dictionary of short segments known as audio grains, which are subsequently used in a resynthesis method targeted to the animation parameters of the object derived from the physics engine. This process is shown in Fig. 3. Audio grains are typically between 0.01 and 0.1 seconds long, corresponding to the sort of time period that might represent the individual impacts in a more complex sound. A measure of spectral flux, which indicates the rate at which the power spectrum of the signal is changing, is used to estimate the grain boundaries. Grains with energy below a certain threshold are discarded in order to limit their number. Following this the original audio recordings are correlated with the collection of grains to identify moments of maximum correlation between them, in order to select an appropriate number of grains that best represent each audio recording over a period of time. This leads to a compact database with a smaller memory footprint than the original recordings. During resynthesis the most relevant grains are concatenated using overlap-add blending.

Fig. 3. Overview of the approach used by Picard et al. for retargeting the grains of recorded sounds according to animation object parameters (Figs. 3 and 4 courtesy Picard et al.)

Fig. 4. An example of the segmentation of impulse-like sound events according to a spectral flux estimation (left) and the resulting audio grain (right)
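As an illustration of the segmentation idea shown in Fig. 4, spectral flux can be computed frame by frame and its peaks taken as candidate grain boundaries. The frame size, hop, and threshold below are illustrative and not taken from Picard et al.

```python
import numpy as np

def spectral_flux(signal: np.ndarray, frame: int = 1024, hop: int = 512) -> np.ndarray:
    """Half-wave-rectified spectral flux per frame: how fast the spectrum changes."""
    window = np.hanning(frame)
    frames = np.array([signal[i:i + frame] * window
                       for i in range(0, len(signal) - frame, hop)])
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    diff = np.diff(spectra, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

def grain_boundaries(signal: np.ndarray, threshold_ratio: float = 0.3,
                     hop: int = 512) -> np.ndarray:
    """Sample positions where the flux exceeds a fraction of its maximum,
    taken as candidate grain onsets."""
    flux = spectral_flux(signal, hop=hop)
    peaks = np.where(flux > threshold_ratio * flux.max())[0]
    return peaks * hop
```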

In order to match the resynthesized sounds to animated pictures involving object contacts, physical parameters from the animation engine, such as penetration forces and relative velocities against time, are used to determine whether the contact is impulsive or continuous. Impulsive grains are then aligned with penetration force peaks. Continuous contacts are detected where the penetration force and relative velocity are constant. For example, rolling is found by looking for situations in which the relative velocity is equal or close to zero. Using SMS, sounds can be modified spectrally according, say, to the object velocity. The authors looked at the effect of surface roughness and were able to identify and control the spectral changes resulting from rubbing against different surfaces. Time-stretching is also facilitated with reasonable ease using this method of resynthesis. Some examples of the results of this work, with audio files, can be found at Cécile Picard's website, http://evasion.inrialpes.fr/Membres/Cecile.Picard/SupplementalAES/
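The contact-classification step might be sketched as follows, using short histories of penetration force and relative velocity from the physics engine. The thresholds and labels are invented for the example; the paper only states the criteria in general terms.

```python
import numpy as np

def classify_contact(force: np.ndarray, velocity: np.ndarray) -> str:
    """Label a contact event from short time histories of penetration force
    and relative velocity supplied by the physics engine."""
    eps = 1e-9
    impulsive = force.max() > 5.0 * (force.mean() + eps)   # sharp force peak
    steady = (force.std() < 0.1 * (force.mean() + eps) and
              velocity.std() < 0.05)                       # arbitrary tolerance, game units/s
    if impulsive:
        return "impulsive"   # align impact grains with the force peak
    if steady and abs(velocity.mean()) < 1e-3:
        return "rolling"     # continuous contact, relative velocity near zero
    if steady:
        return "sliding"     # continuous contact with sustained velocity
    return "mixed"
```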

INTERACTIVE SOUND SYNTHESIS
In "Design and Evaluation of Physically Inspired Models of Sound Effects in Computer Games," Böttcher and Serafin consider various synthesis models for use with interactive sound effects. They concentrated on simulating swordlike sounds, using a Nintendo Wii remote as a controller. They point out that most game manufacturers have tended to concentrate on multilayer prerecorded soundscapes, whereas there is great potential for physically-inspired sound models that could lead to a greater variety of sounds during game interaction. Physically-inspired models, as opposed to physical models, they say, have a basis in physical phenomena but are concerned more closely with perceived sound quality than with accurate physical simulation. Purely physical models do not currently seem to deliver as high a sound quality as samples, which limits their applicability in game engines.

In order to investigate some different approaches to this problem, they interfaced a Wii controller to the Max/MSP synthesis software application, using intermediary software known as OSCulator to convert the control data from the Wii into Open Sound Control format. Four different sound-generation approaches were compared, consisting of sampled sound, subtractive synthesis, granular synthesis, and a physically-inspired model. For the sampled sound they experimented with a number of potentially suitable swords and sticks, settling on a recording derived from a long thin bamboo stick at two different levels of impact (surprisingly, this appeared to create the most convincing sound). One or the other sample was replayed depending on the acceleration of the Wii remote, in similar fashion to current game engines. The subtractive synthesis method was based on bandpass-filtered noise, with the cutoff and bandwidth of the filter being mapped to the acceleration of the Wii in a perceptually appropriate way. Granular synthesis was based on grains of the high-impact sample mentioned above, using the Wii acceleration to govern the speed, amplitude, and duration of the grain replay. For the physically-inspired model, the authors used an approach based on modal synthesis with an exciter and resonator, using a filtered version of the above sample as the exciter. The resonator simulated five resonant peaks that had been previously removed from the sample, and the Wii remote acceleration was used to control the amplitude and frequency of the peaks. The resulting sounds were panned according to the horizontal orientation of the Wii remote.
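Of the four approaches, the subtractive one is the easiest to sketch: bandpass-filtered noise whose center frequency and bandwidth track the controller acceleration. The mapping below is invented (the paper says only that the mapping was chosen to be perceptually appropriate) and uses SciPy for the filter.

```python
import numpy as np
from scipy.signal import butter, lfilter

SR = 44100  # sample rate in Hz

def swoosh(acceleration: float, duration: float = 0.25) -> np.ndarray:
    """Generate one sword-like swoosh from a normalized 0..1 acceleration value."""
    a = min(1.0, max(0.0, acceleration))
    center = 400.0 + 4000.0 * a           # faster swing -> brighter sound
    bandwidth = 200.0 + 1500.0 * a        # ...and a wider noise band
    low, high = center - bandwidth / 2.0, center + bandwidth / 2.0
    b, a_coeff = butter(2, [low, high], btype="bandpass", fs=SR)
    noise = np.random.randn(int(SR * duration))
    envelope = np.hanning(noise.size)     # simple attack/decay shape
    return lfilter(b, a_coeff, noise) * envelope * a
```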


An audio-only game was devised in which the player was supposed to strike an elusive opponent with the sword. This occurred within a background soundscape including the occasional voice of the opponent, as well as mood sounds, footsteps, and crackling branches.

A user test was set up involving a number of staff and students who had familiarity with the Wii controller, asking them to rate a number of aspects including the sound quality, realism, interaction, and gesture control. They included additional questions related to preference, difference in nuance of interaction, and entertainment value. The method of synthesis was randomly chosen before each test. The sound quality of the granular synthesis approach was judged to be the highest, followed closely by the subtractive method. The modal synthesis implementation scored below average. As far as realism was concerned, the granular method also came out on top, which it also did in relation to the connection between gesture control and the sound produced. The authors commented that they felt the sound quality influenced the way people perceived the other test factors, and in some cases this seemed to cause the subjects to get irritated or stop playing more quickly. As shown in Fig. 5, the granular synthesis approach also won the day in terms of overall preference, with users saying that they found it more realistic and varied, as well as being more fun to play. It seemed that there was little difference in perceived nuance of interaction between the synthesis approaches. However, they noticed that body gestures differed substantially depending on which type of sound was involved. The subtractive synthesis implementation, for example, caused users to move in more detailed and diverse ways, which the authors regarded as one of the most interesting outcomes of their research. This type of synthesis was also regarded as the most entertaining of the four tested.

Fig. 5. Proportion of test subjects preferring different methods of synthesis for sword sounds (courtesy Böttcher and Serafin)

The authors concluded that in order to get a better idea about the possibilities of the physically-inspired modal synthesis method, the sound quality would have to be improved in future tests. One possibility would be to use a continuous input such as noise for the excitation, instead of a static sample, in order to enable a more subtle response to body gestures. An interesting comment at the end of the paper concerns the degree to which users actually want to hear physically correct sounds. In fact the subtractive synthesis approach was more closely based on creating something that sounded appropriate, and the authors wondered if users might prefer "cartoonish" sounds to be used when playing games. Visual feedback might also change the user response.
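That suggestion of a continuously excited, physically-inspired model might look something like the sketch below: a small bank of two-pole resonators driven by noise whose level follows the controller acceleration. The mode frequencies, pole radii, and gain choices are invented and are not the five resonances used by the authors.

```python
import numpy as np
from scipy.signal import lfilter

SR = 44100
MODES = [(320.0, 0.9995), (870.0, 0.999), (1500.0, 0.998),
         (2300.0, 0.997), (3100.0, 0.996)]   # (frequency in Hz, pole radius)

def modal_swish(acceleration: float, duration: float = 0.3) -> np.ndarray:
    """Noise-excited modal model: higher acceleration drives the resonators harder."""
    n = int(SR * duration)
    excitation = np.random.randn(n) * min(1.0, max(0.0, acceleration))
    out = np.zeros(n)
    for freq, r in MODES:
        w = 2.0 * np.pi * freq / SR
        # Two-pole resonator: y[i] = (1-r)x[i] + 2r*cos(w)*y[i-1] - r^2*y[i-2]
        out += lfilter([1.0 - r], [1.0, -2.0 * r * np.cos(w), r * r], excitation)
    return out / len(MODES)
```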

GEOMETRY-BASED REVERBERATION
The simulation of spatial acoustics, including early reflections and reverberation, is another important feature of game audio. Tsingos considers how geometrical-acoustical modelling can be used in games to improve the quality and flexibility of rendering. In his paper "Precomputing Geometry-Based Reverberation Effects for Games," he explains that the complexity of these approaches has typically been too high for them to be usable on games platforms so far.


Traditional artificial reverberators used in games tend to impose a single-room model, so they do not deal very well with outdoor spaces or detailed surface-proximity effects. In order to limit the complexity of implementation with geometrical simulation, Tsingos proposes to precompute the reflection patterns at key locations in a 3-D model of the space. Image-source gradients and hybrid directional-diffuse decay profiles are stored in compact forms so that they can be accessed at runtime without directly accessing the geometrical data of the modeled environment. The overall principle is depicted in Fig. 6.

Fig. 6. Overview of the approach adopted by Tsingos for preprocessing and rendering reverberation (courtesy Tsingos)

In order to speed up rendering of multiple sources and make the run-time engine more efficient, Tsingos shows how tens of thousands of concurrent blocks of audio, representing direct sound, image sources, and reverberation decays, can be mixed using frequency-domain processing. This enables a masking model to be employed, so that only audible blocks are sent down the processing pipeline for further action. A two-stage sorting process is used, whereby the level of broadband energy is used to obtain a conservative estimate of masking, followed by a finer-grained decision for those blocks deemed audible by the first stage.
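The two-stage audibility sort could be caricatured as follows: a cheap broadband energy test gives a conservative first cut, and only the surviving blocks get a finer per-band comparison. This is an illustration of the idea rather than Tsingos's actual masking model; the thresholds and the per-band criterion are invented.

```python
import numpy as np

def audible_blocks(blocks: np.ndarray, coarse_db: float = -60.0,
                   fine_db: float = -40.0) -> list[int]:
    """Return indices of frequency-domain blocks worth mixing for one frame.

    blocks: shape (n_blocks, n_bins), magnitude spectra of direct sound,
    image sources, and reverberation decay blocks.
    """
    eps = 1e-12
    energy = (blocks ** 2).sum(axis=1)
    ref = energy.max() + eps
    # Stage 1: conservative broadband cut against the loudest block.
    candidates = np.where(10.0 * np.log10(energy / ref + eps) > coarse_db)[0]
    # Stage 2: finer per-band check against the strongest surviving spectrum.
    strongest = blocks[candidates].max(axis=0) + eps
    return [int(i) for i in candidates
            if np.any(20.0 * np.log10(blocks[i] / strongest + eps) > fine_db)]
```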

Giving an example of the computation requirements, he shows that a 14-room environment takes about 114 minutes to compute with one key location per room. After compression the data structures occupy only 300 Kbytes, because late-decay blocks can be shared between decay profiles. Combining decay profiles at runtime using convolution can be used to simulate coupling between spaces, such as opening or closing doors.

POSTSCRIPT
Game audio is clearly getting smarter, thanks to developments in real-time music and sound generation. In the near future, music and sound effects, as well as environments, may be able to be synthesized in real time based on the parameters of game states and objects, leading to more varied, compelling, and interactive user experiences.

Editor's note: The papers reviewed in this article, and all AES papers, can be purchased online at <www.aes.org/publications/preprints/search.cfm> and <www.aes.org/journal/search.cfm>. AES members also have free access to past technical review articles such as this one and other tutorials from AES conventions and conferences at <www.aes.org/tutorials/>.

THE PROCEEDINGS OF THE AES 35th INTERNATIONAL CONFERENCE
Conference chair: Michael Kelly
ISBN: 978-0-937803-66-0

First Ever Conference Dedicated to Audio for Games, held 11–13 February 2009 at the Royal Academy of Engineering, London, UK. Current and next generation gaming platforms offer unprecedented levels of processing power specifically for audio. These changes facilitate a range of complex real-time audio architectures and DSP effects, offering creative possibilities never before seen in the areas of interactive sound design and music composition. Such developments in technology also present new challenges in audio programming and engineering, drawing on areas of multidisciplinary expertise that were not previously relevant. The areas of research addressed by the 34 conference papers are sound recording and Foley, spatial audio, game music systems, audio codecs, advanced spatial audio and reverb, speech processing and analysis, and real-time synthesis.

Available only on CD-ROM. Purchase online at www.aes.org/publications/conferences. For more information email Donna Vivero at [email protected] or call +1 212 661 8528, ext. 42.
