J Multimodal User Interfaces (2011) 4:147–156. DOI 10.1007/s12193-011-0069-1
ORIGINAL PAPER
Multi-modal musical environments for mixed-reality performance
Robert Hamilton · Juan-Pablo Caceres · Chryssie Nanou · Chris Platz
Received: 18 January 2011 / Accepted: 11 October 2011 / Published online: 5 November 2011. © OpenInterface Association 2011
Abstract This article describes a series of multi-modal networked musical performance environments designed and implemented for concert presentation at the Torino-Milano (MiTo) Festival (Settembre musica, 2009, http://www.mitosettembremusica.it/en/home.html) between 2009 and 2010. Musical works controlled by motion and gestures generated by in-engine performer avatars are discussed, with specific consideration given to the multi-modal presentation of mixed-reality works combining both software-based and real-world traditional musical instruments.
Keywords Music · Virtual environments · Mixed-reality · Multi-modal
1 Introduction

Musical performance and the ways in which musicians interact across physical space have traditionally been constrained by physicality: both the innate physicality of sound's motion through air as well as the ability to attend to and communicate with motion and gesture by performer and audience alike. Time and physical space are naturally intertwined, as latencies tied to the motion of sound and light comprise a very real component of musical experience. As such, musical performance practices have evolved throughout history taking advantage of our natural understanding of the physical world, with the evolution of specific meta-languages of musical gesture and subtle communication as the end result.

R. Hamilton (✉) · J.-P. Caceres · C. Nanou · C. Platz
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, Stanford, CA 94305, USA
e-mail: email@example.com

J.-P. Caceres
e-mail: firstname.lastname@example.org

C. Nanou
e-mail: email@example.com

C. Platz
e-mail: firstname.lastname@example.org
Flashing forward to a Twenty-First century culture awash in connectivity and instant global communication, musical performance can be liberated from the constraints of static physical location, moving beyond uni-directional musical broadcasting and streaming with fully-interactive poly-directional networked streams of audio, video and control data. Networked musical performance has become a reality thanks to high-quality and low-latency multi-channel audio streaming solutions across research-grade and commodity networks. The physicality of distance, still constrained by the laws of physics and the unwavering speeds of sound and light, has been tamed to an extent that acceptable latencies for the transmission of sound and music are routinely achieved. As such, distanced networked performance of traditional analog instruments is not only possible but becoming increasingly common [1, 31].
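A back-of-envelope comparison makes this claim concrete: one-way propagation over long-haul fiber is of the same order as the acoustic delay performers already tolerate across a large stage. The distances and propagation speeds below are illustrative assumptions, not figures from this article.

```python
# Assumed constants for a rough latency comparison.
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C
SPEED_IN_FIBER = 2.0e8   # m/s, roughly two-thirds the speed of light

stage_ms = 10 / SPEED_OF_SOUND * 1000         # sound crossing a 10 m stage
fiber_ms = 4_000_000 / SPEED_IN_FIBER * 1000  # one-way over 4000 km of fiber

print(f"10 m of air:   {stage_ms:.1f} ms")   # ~29.2 ms
print(f"4000 km fiber: {fiber_ms:.1f} ms")   # ~20.0 ms one-way
```

In other words, purely physical propagation over a continental span can undercut the acoustic latency of a single large ensemble room; in practice, buffering, packetization and routing add further delay on top of this lower bound.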
The use of computer-based three-dimensional rendered multi-user environments and the implementation of enactive or gesture-based musical control systems within such environments offers a multi-modal approach for the connection of and interplay between distanced spaces. Streaming media technologies allow networked or telematic musical performances to span multiple disparate physical locations, conceptually creating an additive musical space existing as a sum of each location. Multi-user rendered environments allow users to interact and communicate in the context of an additional dimensionality, combining the additive musical space of telematic performance with a shared and perceivable space simultaneously belonging to all participants and audience members alike.
By combining virtual performers with live musicians, a hybrid or mixed-reality musical environment is created, allowing performers and audience members to combine, perceive and attend to musical stimuli and visual events occurring across both physical and rendered space. Through the use of immersive audio presentation and spatialization techniques, audience members can aurally attend to sonic events in the rendered space, transposing their attentional location into the shared space. We believe it is this border, bridging the multi-modal intersection of visual and auditory presentation systems and physical space, that represents the true nature of mixed-reality performance, allowing performers and audiences the opportunity to consciously and intentionally shift their attention to performance sound and action in one reality or the other.
2 A history of musical performance networks
While the transmission of music across communication systems has only become truly ubiquitous within the last few decades, the history of network-based musical transmission dates back to the end of the nineteenth century with the introduction of Gray's Musical Telegraph and Cahill's Telharmonium. While both instruments were designed to generate electronic sound and to transmit that sound over telephone wires, the Telharmonium was established as perhaps the world's first networked musical subscription service, selling live musical performances transmitted in real-time across telephone wires to paying establishments. The ability to create sound in one location and broadcast it for performance in another location is a uni-directional method of communication; there is no musical interplay or bi-directionality involved, constraining any audience member to the role of passive listener.
Musical networks of particular interest can be defined as necessarily bi- or poly-directional connected musical streams, each representing one or more voices of live musical performance data; networks can exist within a single physical location on a local network or in discrete and disparate physical locations and spaces, making use of both commercial and research-grade internet access. There additionally exists an increasingly well-populated community of network musicians and an evolving performance practice for networked ensemble performance.
Early networked performances by The Hub stand out as rich examples of the complex musical constructs formed through communal composition, improvisation and performance. Stanford's SoundWIRE group utilizes multiple channels of uncompressed streaming audio over its JackTrip software to superimpose performance ensembles and spaces alike, with performances of note including networked concert performances with the ensembles Tintinnabulate at RPI (NY) and VistaMuse at UCSD, as well as with performers at Beijing University in 2008's Pacific Rim of Wire concert. Both the Princeton Soundlab's Princeton Laptop Orchestra (PLOrk) and the Stanford Laptop Orchestra (SLOrk) have displayed the powerful possibilities of collaborative networked compositional form, using local area networks for synchronization and communication and distributed point-source spatialization.
3 Game-based musical systems
The use of networked/multi-user video game engines for music and sound generation has become increasingly common as generations of musicians who have grown up with readily accessible home video game systems, internet access and personal computers seek to bring together visually immersive graphical game-worlds, wide-spanning networks and interactive control systems with musical systems. Though its graphical display is rendered in 2-dimensions, Small_Fish by Kiyoshi Furukawa, Masaki Fujihata and Wolfgang Münch is a game-like musical interface which allows performers/players to create rich musical tapestries using a variety of control methods. Auracle allows networked users to collaborate and improvise using vocal gesture. Fijuu2, a fully rendered three-dimensional audio/visual installation controlled with a game-pad, tightly marries the videogame and musical worlds through the use of immersive graphics and familiar game control systems.
4 Previous work in virtual worlds
q3apd was an early modification of the open-source 1.32 Quake 3 gaming engine which outputs user coordinate, view and player data to Pure Data (PD) using PD's internal FUDI protocol, designed as the networking protocol used by PD to link its GUI and DSP systems. Describing the work, Oliver states: "In the installation the movement, position, health, view angle and item status of 4 software agents in combat was sent to the synthesis environment Pure Data and used to make an auralisation of activity in the arena: a musical re-presentation of the flows and gestures of artificial life in combat."
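FUDI is a deliberately simple text protocol: each message is a sequence of whitespace-separated atoms terminated by a semicolon, typically sent over TCP to a patch listening with a [netreceive] object. The sketch below shows, under those assumptions, how avatar state like that exported by q3apd could be framed and sent; the message contents and port are hypothetical, not taken from q3apd's source.

```python
import socket

def fudi_message(*atoms):
    """Format one FUDI message: whitespace-separated atoms ending in ';'."""
    return " ".join(str(a) for a in atoms) + ";\n"

def send_fudi(sock, *atoms):
    """Send a FUDI message over an open TCP socket."""
    sock.sendall(fudi_message(*atoms).encode("ascii"))

# Hypothetical usage: stream avatar state to a Pure Data patch
# listening with [netreceive 9001] on the local machine.
# sock = socket.create_connection(("127.0.0.1", 9001))
# send_fudi(sock, "player1", "pos", 512.0, 128.0, 24.0)
# send_fudi(sock, "player1", "viewangle", 87.5)
```

On the PD side, such messages arrive as lists that can be routed by their first atoms and mapped onto synthesis parameters.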
For the work maps and legends, q3apd was used to track user motion around specially-designed compositional maps, triggering modular components of a multi-channel composition and spatializing each one based on individual user positions. Eight speaker representations were arranged around the periphery of the compositional map for maps and legends, giving visual cues as to the relative panning location of the currently playing sounds in the eight-channel surround sound field. By correlating avatar location with
Fig. 1 The distance between an avatar's coordinate position and eight virtual speaker positions is constantly measured in maps and legends
Fig. 2 Virtual performer and projectiles in nous sommes tous fernando
respect to each of eight speaker representations to the amplitude of triggered musical events in an eight-channel surround soundfield, an overlay of the compositional map was created in physical space, bringing
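The distance-to-amplitude mapping described above can be sketched as follows. This is a minimal illustration under assumed conventions (speakers on a unit circle, gains falling off inversely with distance and normalized to sum to one); the actual curve and scaling used in maps and legends are not specified here.

```python
import math

# Eight virtual speaker positions on a unit circle (assumed layout).
SPEAKERS = [(math.cos(i * math.pi / 4), math.sin(i * math.pi / 4))
            for i in range(8)]

def speaker_gains(avatar_xy, rolloff=1.0):
    """Return one gain per speaker, inversely related to the distance
    between the avatar and that speaker, normalized to sum to 1."""
    ax, ay = avatar_xy
    raw = [1.0 / (1.0 + rolloff * math.hypot(ax - sx, ay - sy))
           for sx, sy in SPEAKERS]
    total = sum(raw)
    return [g / total for g in raw]

gains = speaker_gains((1.0, 0.0))
# The speaker at (1, 0) is nearest the avatar, so it receives the
# largest share of the triggered event's amplitude.
```

Recomputing these gains as the avatar moves yields the continuous panning behavior measured in Fig. 1, projecting in-game position onto the physical eight-channel soundfield.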