



376 J. Audio Eng. Soc., Vol. 63, No. 5, 2015 May

CONFERENCE REPORT

Several dramatic changes are occurring in cinema exhibition, with immersive sound the new buzzword. Ultra High Definition Television (UHDTV) looks to be the next step in home entertainment, and consumers are increasing their use of personal media devices, with headphones looking to be a major segment of the future, reported Brian McCarty, conference chair, in his introduction to the 57th International Conference, “The Future of Audio Entertainment Technology—Cinema, Television and the Internet,” which took place at the TCL Chinese Theaters in Hollywood, California, March 6–8. Welcoming delegates to the conference, McCarty explained that the conference would cover important topics such as dialog intelligibility, loudness, headphone rendering, and immersive audio, as well as the acoustics of cinemas, bringing together some of the world’s leading experts on these topics.


57th International Conference

The Future of Audio Entertainment Technology

Cinema, Television, and the Internet

Hollywood, CA, USA

6–8 March 2015



Brian McCarty, conference chair, opens the event.



CINEMA REPRODUCTION AND ACOUSTICS

Kicking off the proceedings with a keynote address entitled “Acoustics for the Theater and Home—Moving Forward on a Foundation of Common Acoustical Science,” Floyd Toole explained that we listen through rooms. It’s absurd, he suggested, to integrate a complex room response into a simple measurement such as an overall frequency response or one-third-octave plot. No single “curve” can represent all of the relevant information, nor could it relate properly to what the listener hears. One important rule is emerging: early reflected energy from the loudspeakers needs to be similar to the direct sound in order to have a high-quality listening experience. In cinemas the reverberation time rises at low frequencies because the amount of absorption is not very great. This tends to lead to bass levels being turned down. At higher frequencies in cinemas, the X-curve is often achieved simply as a result of screen loss and not as a result of equalization. Why, though, he asked, do we roll off high frequencies in the cinema, as we don’t do it anywhere else in audio reproduction? It seems that in practice, measured cinemas deviate from the X-curve in that the bass levels are higher and the treble lower than the curve.

Toole introduced Linda Gedemer, who explained her work on predicting in-room responses from anechoic loudspeaker data. A known constant directivity index, she said, is important for predicting this. The details of this work can be found in the related conference paper contained in the proceedings. Toole suggested that two ears and a brain are more useful than an omnidirectional microphone in evaluating the performance of loudspeakers in rooms. Designing loudspeakers for a flat direct response is desirable. If you add an EQ boost to counteract a dip in the loudspeaker-room response you will effectively create a loudspeaker resonance, he said.

In his paper on electroacoustic measurements of cinema B-chains in Australia, David Murphy confirmed Toole’s observation that low frequencies are relatively uncontrolled in most cinemas. Perceived quality, he said, is improved by measurement and EQ of high frequencies so as to have an extended response and consequently greater clarity. It seems important, though, to acclimate audiences to this gradually, as it’s not been the norm in movie theaters. It’s also difficult to achieve this aim when the loudspeakers are obscured by a screen.

Keith Holland reported on the audibility of comb filtering due to cinema screens. Reflections of sound from the loudspeakers off the back of the screen cause this effect, but it turns out that it is barely audible in practice, partly because there is no comparison for the listener with a reference condition and partly because the magnitude of the effect is relatively small.

Glenn Leembruggen reported similar results to others in his paper on equalizing the effects of perforated movie screens. The X-curve at HF is mainly the result of low-pass filtering by the screen losses. If you attempt to introduce an inverse X-curve to compensate for this it can result in increased stress on the loudspeakers. This can amount to as much as 3.5 dB more power requirement for movies, depending on the crossover frequency of the loudspeakers, but this is within the realms of possibility for most real program material. The crest factor of movie sound tracks, though, is very large and can be as much as 25 dB, so some limiting might be required if one were to successfully equalize the effects of a perforated screen. This was supported by a question that cited examples of recent driver failure in a movie theater because of excessive HF transients on a particular “worst case” movie sound track.
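Those figures are easy to sanity-check, since dB level changes map to power ratios as 10^(dB/10). A minimal sketch (function names are ours, not from the paper):

```python
def extra_power_ratio(boost_db: float) -> float:
    """Convert an equalization boost in dB into the multiplicative
    increase in amplifier/driver power it demands: 10^(dB/10)."""
    return 10 ** (boost_db / 10.0)

def crest_headroom_ratio(crest_db: float) -> float:
    """Peak-to-average power ratio implied by a crest factor in dB."""
    return 10 ** (crest_db / 10.0)

# The 3.5 dB inverse-X-curve boost quoted above more than doubles HF power demand:
boost = extra_power_ratio(3.5)        # ~2.24x
# A 25 dB crest factor means peaks carry roughly 316x the average power,
# which is why unlimited HF boosting can fail drivers on worst-case tracks.
headroom = crest_headroom_ratio(25.0)
```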

MANAGING LOW FREQUENCIES

A workshop on low-frequency issues in cinemas started with Leembruggen looking at what happens at low frequencies in a range of different theaters for different audio channels. He had undertaken a very large number of measurements that gave clear insights into the behavior of current systems. He was followed by Manny La Carrubba, who considered the challenge of “how to create really great bass.” In subjective terms, he proposed, people use terms such as punchy, tight, extended, balanced, clean, and so forth. Why, though, is it so hard to achieve? It turns out there are many relatively uncontrolled factors. Manny looked at subwoofer response in different room locations, which was variable as a result of room modes. By introducing more subs in different locations and a bit of delay the response could be evened out quite a lot, and he suggested that parametric EQ could be applied to flatten out the response, averaging it at a number of positions. If we used bass management in cinemas, he proposed, we could probably achieve better results than we do now.
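The room modes behind this seat-to-seat variability follow the standard rectangular-room formula, f = (c/2)·sqrt((nx/Lx)² + (ny/Ly)² + (nz/Lz)²). A sketch with hypothetical room dimensions (not from any paper at the conference):

```python
import itertools
import math

def room_mode_frequencies(lx, ly, lz, max_order=2, c=343.0):
    """Rectangular-room mode frequencies (Hz) up to a given order per axis,
    using f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)."""
    modes = []
    for nx, ny, nz in itertools.product(range(max_order + 1), repeat=3):
        if (nx, ny, nz) == (0, 0, 0):
            continue  # skip the trivial zero mode
        f = (c / 2.0) * math.sqrt((nx / lx) ** 2 + (ny / ly) ** 2 + (nz / lz) ** 2)
        modes.append(((nx, ny, nz), round(f, 1)))
    return sorted(modes, key=lambda m: m[1])

# Hypothetical 7 x 5 x 3 m room: the first axial mode along the length
# lands at 343/(2*7) = 24.5 Hz, squarely in the subwoofer band.
modes = room_mode_frequencies(7.0, 5.0, 3.0)
```

Moving subwoofers relative to these mode shapes, as La Carrubba described, is what evens out the response.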

“Everyone loves bass,” enthused Anthony Grimani. It gets you further than anything else when persuading people of sound quality, but it is hard to do in a small room. Putting an excellent sub (one that measures well in the near field) in a small room in different locations gives rise to very different results. There can be as much as 20 dB peak-to-peak variation, so proper bass management is desirable. Multiple subs are needed, as is modal distribution if the room dimensions are controllable.

Floyd Toole explains how we listen “through rooms.”

David Murphy on electroacoustic measurements in cinemas.

Glenn Leembruggen talks about the effects of cinema screens.

Brian Long of Skywalker poses a question in the cinema acoustics session.

Manny La Carrubba describes how to get really good bass.

According to Keith Holland and Philip Newell, two bass loudspeakers give rise to a more even response below 70 Hz, whereas one is better above that (because of the interaction between multiple drivers at higher frequencies). A lot depends on the spacing, though. Concerning spatially averaged responses, they rightly pointed out that no listener actually hears a spatially averaged response; each just hears the response at their own location, so the value of these averages is questionable and they do not remove response variability with position. Calibration measurements should be of the source itself, and not done in the room.
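The point about spatial averages can be illustrated numerically. In this toy model (made-up reflection delays and gain, one reflection per seat) the averaged curve sits where no individual seat actually is:

```python
import math

def comb_magnitude_db(freq_hz, delay_ms, reflection_gain=0.7):
    """Magnitude (dB) of direct sound plus one delayed reflection."""
    w = 2 * math.pi * freq_hz * delay_ms / 1000.0
    re = 1.0 + reflection_gain * math.cos(w)
    im = reflection_gain * math.sin(w)
    return 20 * math.log10(math.hypot(re, im))

seats = [1.1, 2.3, 3.7, 5.2]   # hypothetical reflection delays (ms) at four seats
freq = 80.0
per_seat = [comb_magnitude_db(freq, d) for d in seats]
spatial_avg = sum(per_seat) / len(per_seat)
spread = max(per_seat) - min(per_seat)
# The seats differ by many dB at this frequency, yet the average looks benign;
# equalizing to the average therefore corrects no single listening position.
```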

In his paper on enhanced LF reproduction in cinemas, Adam Hill explained that stereo subwoofers may make very little difference to the spatial image, but they can decorrelate the LF so that interactions and comb filtering are less of a problem. Confirming the previous speaker’s point that spatially averaged equalization does not improve seat-to-seat variation in response, he introduced an optimization strategy with multiple subs that can help instead. Diffuse signal processing decorrelates the LF speaker outputs, and this needs to be done in such a way that local technicians can manage the process easily.

DIALOG INTELLIGIBILITY

In a tour de force presentation covering virtually everything that is known about speech and dialog intelligibility in audio entertainment, Peter Mapp explained that speech is so robust that you have to work quite hard to mess it up. Why then, he asked, do we have so many problems in this area? He reviewed the reasons for poor intelligibility, showing that a listener’s understanding of speech is the key factor. He looked at the various types of hearing loss in relation to phoneme level, referring to the so-called “speech banana” known by audiologists, which is a curved region of the hearing curve most critical for speech intelligibility. Most of the critical frequencies for intelligibility lie between 1 and 4 kHz. Visual contact between talker and listener can give as much as a 15 dB improvement, it seems. The average syllable rate in spoken communications has increased over the last ten years for some reason, Peter said, which doesn’t help matters. Also, in movies, directorial decisions are often made for reasons other than dialog intelligibility.

There is no standard that says how intelligible sound in entertainment venues such as theaters and cinemas should be, despite the standards that exist in other walks of life, covering aircraft, stations, stadiums, and the like. STI (speech transmission index) values of 0.65–0.7 are desirable, and it has been found in a recent project that complaints cease when the value is above 0.6. There is a reasonably linear relationship between signal-to-noise ratio and intelligibility, which is convenient, but raising levels can result in poorer intelligibility unless it is done to overcome noise.
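That near-linear SNR relationship is baked into the STI method itself: in IEC 60268-16, each band’s transmission index is a clipped linear map of the effective SNR. A simplified sketch, ignoring band weighting and the modulation-transfer details of the full standard:

```python
def transmission_index(snr_db: float) -> float:
    """Band transmission index as used in STI calculations: linear in
    effective SNR over a 30 dB window (IEC 60268-16), clamped so that
    -15 dB SNR maps to 0 and +15 dB SNR maps to 1."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))

# Raising the effective SNR from 0 dB to 6 dB lifts the index from 0.5 to 0.7,
# reaching the top of the 0.65-0.7 range quoted as desirable in the session.
```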

People with hearing impairment mostly seem to prefer mono to spatial versions of a mix when dialog and effects are combined. With normal-hearing listeners the results are the same when it comes to intelligibility, but their overall enjoyment is lower for mono versions. When we realize that only about half the world’s population has “normal” hearing, this becomes quite striking. The home environment gives rise to a different experience from the cinema, as attention is differently focused, so we need to do things to help. Custom remixing of films can be done to enhance dialog intelligibility, with limiting, EQ, and compression used to carve a clearer window for dialog to cut through. Expert mixing is needed, though, and it does not seem that this can be left to a machine. The HIPFlix process is said to be the audio equivalent of large-print books, presenting clear dialog in the presence of competing sounds.

David Smith of Bose pointed out that flat-screen TVs have poorer loudspeakers than former designs, often mounted at the back, which makes intelligibility poorer. Sound bars, too, seem to be less good than discrete loudspeakers, some not having a true center channel, although multi-axis beam steering is possible using multiple drivers. Considering the differences between home and cinema listening, David said that dialog is lower in level at home. Facilities like Dolby Night Mode can help with intelligibility, and object-based audio control will increasingly enable flexible handling of dialog content in such situations.

Looking into the psychoacoustics of dialog intelligibility, Kurt Graffy discussed Bregman’s ideas on auditory scene analysis and reviewed the principles of critical-band filtering and spatial unmasking. It may be possible to get better intelligibility, he said, simply by locating a sound object in a different place rather than needing to use level.

KEYNOTE

At the end of the first day, a keynote presentation by Louis Hernandez, chairman and CEO of Avid Technology, dealt with the need to change business models so as to make workflow more efficient. The current model is broken, he said, and the business is changing rapidly, requiring that content creators connect more efficiently to the consumer. Using an all-digital chain one can make sound quality better and the economics of working more viable.

Processing speeds and connectivity are accelerating at a remarkable rate, Hernandez reminded the audience, and distribution of content has changed, having previously been a huge cost taken away from the creative process. Captain America, for example, has more than 350 distribution masters, mainly because of audio compatibility issues, so better standards would help here. We need to be better focused, more responsible, and better organized, he said, with security being an important issue that needs to be addressed in collaborative systems. Data security attacks, he said, are becoming the third world war, so this has to be taken seriously.

Peter Mapp gets excited about dialog intelligibility.

By the light of his laptop, Kurt Graffy discusses perceptual matters.

TCL Chinese Theaters seen from Hollywood Boulevard.

While media consumption has risen dramatically in terms of hours per week, unless we enable people to be fairly rewarded for their work we will lose the creative talent that generates the content. Hernandez therefore outlined a new Avid project that amounts to a collaborative ecosystem, a marketplace with free entry at the outset and the option of subscription or outright purchase of the tools. In terms of standards, he said, Avid will launch its own, as someone needs to step forward and make this happen.

RECEPTION

An enjoyable evening reception, held in the historic Hollywood Roosevelt Hotel, enabled the many delegates to the conference to meet each other and unwind after a packed day. Further amusement could be found outside the front door, watching the passing parade and sidewalk circus that is Hollywood Boulevard.

LOUDNESS WARS AT THE CINEMA

Although eminent otolaryngologist Dr. Robert Sataloff was unable to be present in person to deliver his tutorial on hearing loss, he had prepared a narrated presentation that everyone was able to absorb at the start of the second day. Sataloff reviewed the hearing system with pictures of the cochlea under different degrees of damage due to noise exposure. He explained the basics of occupational hearing loss and the different terms of impairment, handicap, and disability that are important for U.S. compensation laws. Under OSHA regulations, levels higher than 115 dBA are not permitted, which is significant for movie theaters. Levels between 85 and 89 dBA require monitoring of employee sound exposure, and 90 dBA for eight hours is regarded as 100% of the allowed noise dose, above which hearing protection should be employed.
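The OSHA dose rule Sataloff described can be computed directly: permissible exposure time halves for every 5 dB above the 90 dBA criterion, and the daily dose sums each exposure against its permissible time. A sketch (function names are ours):

```python
def osha_permissible_hours(level_dba: float) -> float:
    """OSHA permissible exposure time: 8 h at 90 dBA, halving every 5 dB
    (the 5 dB exchange rate): T = 8 / 2^((L - 90) / 5)."""
    return 8.0 / (2.0 ** ((level_dba - 90.0) / 5.0))

def noise_dose_percent(exposures) -> float:
    """exposures: list of (level_dba, hours) pairs.
    100% is the full allowed daily dose: D = 100 * sum(C_i / T_i)."""
    return 100.0 * sum(hours / osha_permissible_hours(level)
                       for level, hours in exposures)

# Eight hours at 90 dBA is exactly a 100% dose; at 95 dBA the allowed
# time drops to 4 h. (115 dBA is capped regardless of duration,
# as quoted in the talk.)
```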

A substantial project designed to evaluate loudness issues in cinemas had been undertaken by Eelco Grimm. Measurements made during a showing of the Transformers movie (which is aimed mainly at children) exhibited peak levels of 134 dB (slow), and a 50-minute averaged level was greater than 100 dBA. Seventeen movies in a survey peaked at more than 115 dBA (which had been quoted by Sataloff earlier as the legal maximum). Belgian law now limits Dolby fader settings in cinemas to 4.5, way below the standard setting of 7 that was originally intended. Grimm discussed adapting the ITU-R BS.1770 loudness standard for the cinema, proposing a measure called PLcin for the average loudness of a movie that used the K-weighting curve and was adapted to Dolby fader scaling. This would be encoded in metadata and the film replay level set accordingly by the projectionist. An average loudness of –27 LUFS was proposed, with a distinction between dialog movies and action movies.
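At the core of BS.1770, on which Grimm’s PLcin proposal builds, the mean-square power of the K-weighted signal maps to loudness as L = −0.691 + 10·log10(z). The sketch below shows only that mapping; the K pre-filter and the gating stages of the full standard are deliberately omitted, and the samples are assumed to be already K-weighted:

```python
import math

def integrated_loudness_lufs(samples) -> float:
    """Core BS.1770 mapping from mean-square power to loudness (LUFS):
    L = -0.691 + 10 * log10(z), where z is the mean square of the
    (already K-weighted) samples. Pre-filtering and gating omitted."""
    z = sum(s * s for s in samples) / len(samples)
    return -0.691 + 10.0 * math.log10(z)

# A full-scale square wave (all samples +/-1) has z = 1 and so measures
# -0.691 LUFS; quieter material sits far below, e.g. around the proposed
# -27 LUFS average for movies.
```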

In a subsequent paper on loudness and dynamic range in popular music, Michael Oehler and his colleagues showed that songs have become sadder and slower in recent years, for some reason. Loudness has increased and dynamic range decreased, with more LF content in recent songs. This did not seem to be dependent on chart rankings.

Louis Hernandez Jr., chairman and CEO of Avid, gives the keynote address at the end of the first day.

During the reception on the first evening: from left, Schuyler Quackenbush, Francis Rumsey, Brian McCarty, Peter Mapp, and Tom Ammerman.

Eelco Grimm discussed loudness proposals for cinema reproduction.

OPPORTUNITIES AND CHALLENGES IN THE TRANSITION TO STREAMED DELIVERY OF AUDIO CONTENT

Everything is moving in the direction of streaming and downloads, said Roger Charlesworth of the DTV Audio Group, at the start of a workshop on the above topic. Members of the panel included Sean Richardson of Starz Entertainment, Jeff Riedmiller of Dolby Laboratories, Jeff Dean of Meridian Audio, and Tim Carroll of Telos Alliance. Physical media are still important for music, though, particularly in Japan and Germany. Despite the prevalence of downloads as the means of delivery, no one on the creative side is getting rich from them, although there is potential for financial reward on the side of the labels. New services will move forward if the technical solution is compelling enough, without waiting years for ratification of standards. It’s still the “Wild West” in this field, however, it was suggested. Proprietary market chains are not interoperable and music delivery is highly fragmented. In general one should avoid preconditioning content for streaming, it was recommended. This allows audio to be encoded once and scaled for different media devices.

There followed three research papers on diverse topics: one by Andrew Mason on adaptive audio reproduction using personalized compression, a second by Gavin Kearney on auditory distance perception with static and dynamic binaural rendering, and a final one on using a binaural spatialization tool in current sound postproduction and broadcast workflows by Pierre Bezard and his colleagues. These can all be found in the formal conference proceedings and AES E-Library.

HEADPHONES: DESIGN, PERFORMANCE AND ENTERTAINMENT CONTENT PRODUCTION

During the afternoon of the second day there was an extensive session on headphones, as these are fast becoming the predominant mode of listening for many people. Bob Schulein chaired the session and introduced the basics of binaural reproduction, emphasizing the importance of visual cues in overcoming some of its limitations (what he called cognitive dissonance). Sennheiser had generously sponsored the session, providing wireless headphones for the entire audience to be able to hear the demonstrations.

Tom Ammerman then described the headphone virtualization of spatial audio formats, with a demonstration of a spatial audio design package capable of authoring for any loudspeaker layout and for sound objects. He pointed out that 85% of people listen on headphones, so we have the chance to create something new for them in 3D. Ammerman described immersive audio for computer games, using a system in which audio objects can be mapped to any speaker layout. He gave a demo of Doom 3 in 50.1 surround on headphones.

Making repeatable headphone measurements was the topic of Tad Rollow’s presentation, in which he explained that there are a lot of different standards and a number of methods using a head and torso simulator. Todd Welti then talked about preferred alignment curves and headphone equalization. One might want to equalize headphones to flatten the response or to aim at a certain target curve that is preferred by the majority of listeners. He had found that leakage around headphone surrounds caused large variations at LF on human subjects, so he had made a modified softer artificial pinna and done lots of repeated measurements to confirm stability. With regard to target curves, Welti had found that people tend to prefer a slight bass rise and about a 1 dB roll-off at HF.

Martin Walsh introduced DTS Headphone:X, designed to make high-end audio more accessible over headphones. A reference equalization is employed to deal with the massive differences that exist between consumer headphones, tailoring them so as to meet the reference response. There are various other specifications in the certification program, such as distortion performance, and content mixed for headphones is also certified.

OBJECT-BASED BROADCASTING

According to Frank Melchior, accessibility is one of the main aims of developments in object-based broadcasting. Content could be adjusted to users’ needs, but the current infrastructure is not set up for this. We are missing the distribution and rendering technology. Immersion is another important feature of object-based technology, in that it can help to future-proof content so that one version can be reproduced over different reproduction layouts. Binaural delivery is also enabled this way, allowing dedicated mixing for headphones, but first it’s necessary to prove that the audience wants this. To that end a survey had been done, and headphone listeners showed a strong preference for a binaural pilot production over conventional stereo. Interactive features are enabled too, with users getting control over what they hear and see; the BBC is using a web API to test some of these offerings. Frank went on to describe a project known as Responsive Radio, in which objects are delivered to a browser that assembles a program according to users’ wishes. It can be made as long or as short as desired, not by changing the speed but by actively editing the content. In order to facilitate object-based content production, the EBU’s open-source Audio Definition Model (ADM) was released on March 12.
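The layout-independence idea at the heart of object-based audio can be sketched in a few lines. This is a hypothetical toy renderer, not the ADM, MPEG-H, or any broadcaster’s implementation: real renderers use proper panning laws, whereas here simple inverse-distance weighting stands in for them.

```python
import math

def render_gains(obj_pos, speakers, power=2.0):
    """Map one audio object at obj_pos (x, y, z) to per-speaker gains for
    an arbitrary speaker layout, via inverse-distance weighting,
    normalized to constant total power."""
    weights = []
    for spk in speakers:
        d = math.dist(obj_pos, spk)
        weights.append(1.0 / max(d, 1e-6) ** power)
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

stereo = [(-1.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
quad = stereo + [(-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]

# The same object position renders to either layout without remixing,
# which is exactly the future-proofing argument made above.
g2 = render_gains((0.5, 1.0, 0.0), stereo)
g4 = render_gains((0.5, 1.0, 0.0), quad)
```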

The audience enjoys listening to the headphone demonstrations, courtesy of Sennheiser’s wireless sets.

Roger Charlesworth chairs the session on streaming audio.

Headphone reproduction was Bob Schulein’s topic on Day 2.

Frank Melchior discusses object-based broadcasting.

Tom Ammerman explains authoring of 3D content for headphones.


A compelling demonstration of particle-based handling of audio signals was then given by Nuno Fonseca. This is a concept widely used in graphics systems, where content is broken up into tiny units for mapping to different spatial locations. It is ideally suited to content such as fire, smoke, dust, rain, and so forth. Instead of controlling each particle directly, a limited set of parameters is used to govern the behavior of the particle system at a macro level. Nuno demonstrated a software prototype that enabled sound particle motion to be captured with different virtual microphone setups, from mono to immersive formats. He addressed the question of how particles could be handled as audio objects, as there could be many thousands of such particles. The solution proposed involves a hybrid channel/object model in which each channel is treated as an object. He showed three impressive examples of immersive audio DCPs (digital cinema packages) made up of particle sources, and explained that a public beta of the system is available for people to try.
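The two ideas in that paragraph, macro-level control of many particles and the hybrid channel/object fold-down, can be illustrated with a sketch. This is our own illustration, not Fonseca’s actual software; every name and parameter here is invented.

```python
import random

def spawn_particles(count, center, spread, gain_db_range, seed=0):
    """Generate per-particle positions and gains from a handful of
    macro-level parameters, rather than authoring each particle by hand."""
    rng = random.Random(seed)
    particles = []
    for _ in range(count):
        pos = tuple(c + rng.uniform(-spread, spread) for c in center)
        gain_db = rng.uniform(*gain_db_range)
        particles.append({"pos": pos, "gain_db": gain_db})
    return particles

def fold_to_channels(particles, n_channels):
    """Hybrid channel/object model: bucket thousands of particles into a
    manageable number of channels, each of which is then treated as a
    single object downstream."""
    buckets = [[] for _ in range(n_channels)]
    for i, p in enumerate(particles):
        buckets[i % n_channels].append(p)
    return buckets

# A 'rain' effect: 1000 particles from four macro parameters, folded
# into 12 channels that a delivery format can carry as objects.
rain = spawn_particles(1000, center=(0.0, 0.0, 3.0), spread=5.0,
                       gain_db_range=(-30.0, -10.0))
bed = fold_to_channels(rain, 12)
```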

There followed two papers on object-based media production, the first from Alejandro Gasull Ruiz and his colleagues from Fraunhofer, the second from Ulli Scuda and colleagues on using audio objects and spatial audio in sports broadcasting.

TOUR OF UNIVERSAL STUDIOS

In the evening of the second day there was an optional tour of Universal Studios, not far from the conference venue, during which delegates got the chance to visit the sound department. The tour included a demo screening of the movie Unbroken in Dolby Atmos, presented by seven-time Academy Award-nominated mixer Frank Montaño. The screening took place in the historic Alfred Hitchcock Theater, equipped with the largest available Atmos system (91 speakers), Auro 3D, and IMAX. The tour also included a look at the machine rooms and TV mix stages.

IMMERSIVE AUDIO DAY

The last day of the conference was dedicated to immersive audio and chaired by Francis Rumsey. During the first hour, Rumsey’s keynote charted the history of immersive audio and the various abortive attempts there have been to get it off the ground. As with surround sound before it, it seems that movies are the key to successful introduction of a new immersive audio format, with audio alone not being a sufficient driver. Rumsey reviewed the various standards efforts that are taking place in the ITU, EBU, ATSC, and SMPTE, among others, aimed at coordinating delivery and interoperability, and he highlighted some of the key challenges that face the industry if it is to move forward successfully in this area. As was repeated numerous times during the rest of the day, one of the most important and yet relatively ignored parts of the signal chain is the increasingly important rendering device that converts immersive audio content and metadata into a suitable spatial reproduction over whatever loudspeaker array is available. There is no point specifying the rest of the chain and the delivery format very carefully if we don’t know what the rendering device will do, or what its performance is.

Following the keynote, three research papers were presented on aspects of immersive and spatial audio. Jens Ahrens looked into the properties of large-scale sound field synthesis, Hervé Desjardin and Edwidge Ronciere spoke on how Radio France is making immersive audio available to the public using a website and binaural audio, and Matthias Frank discussed the production of 3D audio using ambisonic tools.

CINEMA DELIVERY STANDARDS

Brian Vessa, chair of the SMPTE working group on the topic, chaired a workshop on delivery standards for immersive audio in the cinema, with Ton Kalker of DTS, Bert Van Daele of Auro Technologies, Charles Robinson of Dolby Laboratories, and Brian Claypool of Barco. It’s a good period, Vessa said, as there is increased interest in quality and the public is beginning to notice. He reviewed immersive audio terminology and some basic psychoacoustics, moving on to talk about workflows for immersive audio. These are complicated, requiring separate workflows for Atmos, Auro, and MDA (DTS’s open standard for immersive audio description); one also needs different home and cinema exhibition versions. We need standards because of a lack of interoperability, which currently makes workflows inefficient and costs high. The National Association of Theater Owners and the Digital Cinema Initiative have asked for a common standard. We also need to be able to take a cinema mix and repurpose it for the home in a simpler fashion.

The lucky delegates booked on the tour arrive at Universal Studios.

Explaining the super-wide mixing desk at the Universal dubbing stage.

Francis Rumsey gave a keynote talk on immersive audio.

In the cinema the immersive sound has to be rendered through whatever loudspeaker system is available, but one SMPTE DCP should be usable in any theater. It seems that agreement has been reached on using a Cartesian coordinate system for describing the locations of audio objects, but the metadata and file formats still need to be defined precisely. There’s also the question of the data security of these DCP files.

As Ton Kalker pointed out, when loudspeaker layouts and acoustics are not the same, and there is a renderer in the way, we cannot expect the same results between the mixing stage and the cinema. They will be similar but not the same, so trust has to be developed if a universal delivery format is to be introduced. We have already achieved this to some extent with 5.1, although the issues are less complex. Charles Robinson hoped that the engineer–listener link would not be broken in the quest for a universal format, saying that his company would do all it could to preserve this for cinema applications. Bert Van Daele suggested that any standard should allow companies to preserve the quality they want to guarantee with proprietary systems, while Brian Claypool suggested that most novice listeners find little difference between 5.1 and immersive audio. We therefore need mixes to be more persuasive, he argued, and training is really important.

Some of the discussion in this session centered on the need for mixing rooms with hybrid or different replay systems so that mixers can hear the results, as well as the need for a more clearly specified renderer; otherwise no one knows what they are aiming at. It’s almost impossible to define the metadata without knowing the characteristics of the renderer that will use it to generate the immersive result over loudspeaker arrays. There was also some discussion of how far such a standard should go, whether we need to wait for everything to be fully determined before fixing a description and file format, and whether it will be possible to objectively qualify renderer performance. We will end up making a compatible mix, suggested Vessa, so it will be the mixer’s responsibility to decide what is acceptable. According to Kalker, we cannot possibly cover every configuration, so we need to come up with some representative profiles that exercise all the metadata and use listening tests with a reference renderer to check the results, assuming various different loudspeaker layouts.

In summary, it was suggested that the quicker we can get immersive content to the cinema the better for the market, elevating the experience for everyone. If we standardize too much we leave no room for innovation, and "we can't bolt everything down too early." Getting the most important things done first and then extending the standard seemed to be a way forward.

IMMERSIVE AUDIO DEMOS

Just before lunch, the audience was presented with a demonstration of some movie clips in the Atmos format, over the system installed in the theater, and Matthias Frank's demonstration of the ambisonic 3D production tools, prerendered for the Atmos array in the theater.

SPATIAL AUDIO PAPERS

Three further research papers followed after lunch. Etienne Hendrickx asked "Does Stereoscopy Change our Perception of Soundtracks?" Thomas Sporer looked into the localization of audio objects in multichannel reproduction systems, and Mark Vinton and his colleagues described next-generation surround decoding and upmixing for consumer and professional applications. These papers can all be studied in the conference proceedings and E-Library.

INTEGRATING OBJECT-, SCENE-, AND CHANNEL-BASED IMMERSIVE AUDIO FOR DELIVERY TO THE HOME

The first of the afternoon's workshops dealt with the question of whether and how object-, scene-, and channel-based immersive audio can be integrated for delivery to the home. In the home there is likely to be a reduced reproduction setup compared with the cinema, and people could be using headphones, sound bars, and various other transducers. Audio can also be delivered to the home using a range of technologies such as streaming, broadcasting, download, and physical media. Panelists here were Schuyler Quackenbush representing MPEG-H, Jean-Marc Jot of DTS, Brett Crockett of Dolby Labs, and Wilfried van Baelen of Auro Technologies. There are many ways of describing immersive audio content, and the panel looked at how these can be integrated in various ways for delivery to the consumer, for replay on a wide range of devices. While object-based audio is not synonymous with immersive audio, it is often talked about as if it is. In fact it is just one tool for enabling multichannel audio to be delivered in a flexible way that allows for a degree of remixing and remapping at the rendering stage. There is an ATSC 3.0 call that invites technology proposals for dealing with these issues in a U.S. broadcast context, where interactive facilities are combined with immersive delivery capability.

A SMPTE-sponsored cinema delivery standards panel was chaired by Brian Vessa.

Thomas Sporer explains how audio objects are localized.

Wilfried van Baelen describes the delivery of Auro 3D to the home.

Schuyler Quackenbush discusses the principles of MPEG-H.

Juha Backman asks a tricky question.

Jens Ahrens discusses sound field synthesis.

HOW TO MAKE BIG SMALL: CAN WE REALLY BRING IMMERSIVE SOUND TO THE HOME?

The final session of the day considered the question of "how to make big small," asking whether and how immersive mixes designed for the large screen can be translated for consumer contexts, and examining the challenges of designing rendering devices that can give a convincing impression over reduced loudspeaker setups, sound bars, headphones, etc.

Frank Melchior of the BBC argued strongly for a "baseline renderer" that would provide a reference point specifying the minimum functionality and performance that upstream systems could assume. There was general agreement that we are placing a lot of expectations, both in the cinema and in the home, on the "magic" rendering device that is given the job of converting the immersive content and metadata into a convincing sonic result, given the replay system and layout in question. Unless we can know more about the performance of such renderers, and specify them more closely, it is very difficult for anyone working upstream to know what the end result will sound like.

Brian Vessa discussed translating immersive cinema mixes to the home, including the challenges and compromises for the studio. He pointed out that a well-mixed track usually works surprisingly well in the home and that mixers don't seem to be "cranking up the high end to compensate for the X-curve." The aim is to translate the mixing intent into a smaller space with lower playback volume and a smaller spatial image; this cannot be automated, it was proposed. Adjustment of the dynamic range is the most common change made to home mixes, but overall EQ is hardly ever adjusted.
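The dynamic-range adjustment mentioned above is, in its simplest static form, a gain curve that leaves quiet material alone and scales down levels above a threshold. The sketch below is a hypothetical illustration of such a curve; the threshold and ratio values are purely illustrative and are not drawn from any mix discussed at the conference.

```python
# Hypothetical static gain curve of the kind used when reducing the dynamic
# range of a cinema track for home playback. Threshold/ratio are illustrative.
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Above the threshold, output level rises only 1/ratio dB per input dB."""
    if level_db <= threshold_db:
        return level_db  # quiet passages pass through unchanged
    return threshold_db + (level_db - threshold_db) / ratio

for lvl in (-40.0, -20.0, -8.0, 0.0):
    print(lvl, "->", round(compress_db(lvl), 1))
```

With these example settings, a 0 dB peak is reduced to -15 dB while a -40 dB passage is untouched, shrinking the overall dynamic range without altering spectral balance, which matches the observation that EQ is rarely changed.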

Arnaud Laborie described the engineering of a home cinema processor to handle immersive audio. This involved loudspeaker remapping as well as room and loudspeaker optimization. It is necessary to measure where each loudspeaker is in the room and adapt accordingly, he said; the results should then be accurate no matter where the speakers are placed. This process uses a sophisticated 3D microphone to measure the loudspeaker locations based on a test signal. The system he described works with Dolby Atmos, DTS:X, Auro 13.1, and MPEG-H formats.
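Measuring loudspeaker positions with a compact microphone array, as described above, can be done by converting test-signal times of flight into distances and trilaterating. This is only a minimal sketch of the general technique, not Laborie's actual algorithm; the mic geometry, function names, and the assumption of a known emission time are all illustrative.

```python
# Minimal trilateration sketch (not the actual commercial algorithm):
# estimate a loudspeaker's position from test-signal arrival times at a
# small microphone array with known geometry and a known emission time.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def locate_speaker(mic_positions, arrival_times, emit_time=0.0):
    """Least-squares trilateration from time-of-flight distances."""
    d = SPEED_OF_SOUND * (np.asarray(arrival_times) - emit_time)
    p = np.asarray(mic_positions, dtype=float)
    # Subtracting the reference-mic equation linearizes ||x - p_i||^2 = d_i^2:
    #   2 (p_i - p_0) . x = (||p_i||^2 - ||p_0||^2) - (d_i^2 - d_0^2)
    A = 2.0 * (p[1:] - p[0])
    b = (np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)) - (d[1:] ** 2 - d[0] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Self-check: a 4-mic tetrahedral array and a speaker at a known position.
mics = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0, 0, 0.1)]
true_pos = np.array([2.0, 1.5, 1.0])
times = [np.linalg.norm(true_pos - np.array(m)) / SPEED_OF_SOUND for m in mics]
print(np.round(locate_speaker(mics, times), 3))
```

Four non-coplanar microphones give three independent linearized equations, enough to recover a unique 3D position; real systems additionally have to estimate the arrival times robustly from the recorded test signal (e.g. by correlating against a swept sine).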

WRAPPING UP

Brian McCarty and Sean Olive closed the conference with a summary of the action points and key conclusions reached during the conference. They also thanked the dedicated conference committee for its efforts in bringing about such a successful event: Keith Holland and Sunil Bharitkar, papers chairs; Bob Shulein, headphone and binaural sound chair; Francis Rumsey, immersive and spatial audio chair; Peter Mapp, dialog intelligibility chair; Neil Shaw, facilities; Linda Gedemer, student volunteers coordinator; and Garry Margolis, treasurer.



Linda Gedemer, student volunteers coordinator


Editor's note: 57th Conference proceedings are available at http://www.aes.org/publications/conferences/. For individual papers go to http://www.aes.org/e-lib/