
A High-Fidelity Orchestra Simulator for Individual Musicians’ Practice

Adriana Olmos, Nicolas Bouillot, Trevor Knight, Nordhal Mabire

Computer Music Journal, Volume 36, Number 2, Summer 2012, pp. 55–73 (Article)

Published by The MIT Press

For additional information about this article


http://muse.jhu.edu/journals/cmj/summary/v036/36.2.olmos.html


A High-Fidelity Orchestra Simulator for Individual Musicians’ Practice

Adriana Olmos, Nicolas Bouillot, Trevor Knight, Nordhal Mabire, Josh Redel, and Jeremy R. Cooperstock
McGill University
Department of Electrical and Computer Engineering
3480 University Street
Montreal, QC H3A 2A7, Canada
{aolmos, nicolas, jer}@cim.mcgill.ca
{trevorknight, nordhal.mabire}@[email protected]

Abstract: We developed the Open Orchestra system to provide individual musicians with a high-fidelity experience of ensemble rehearsal or performance, combined with the convenience and flexibility of solo study. This builds on the theme of an immersive orchestral simulator that also supports the pedagogical objective of instructor feedback on individual recordings, as needed to improve one’s performance. Unlike previous systems intended for musical rehearsal, Open Orchestra attempts to offer both an auditory and a visual representation of the rest of the orchestra, spatially rendered from the perspective of the practicing musician. We review the objectives and architecture of our system, describe the functions of our digital music stand, discuss the challenges of generating the media content needed for this system, and describe provisions for offering feedback during rehearsal.

Ensemble study allows musicians to practice within the context of a group and refine their listening and interaction with other instrumentalists. This requires ample time, availability of the other performers, and large spaces. Individual study, on the other hand, allows the freedom to practice on any schedule, focusing effort on the specific areas of importance to the performer. Ensemble performance requires skills beyond those that can be fully trained in solo practice, however. As part of a whole, a musician must constantly “fit” with the ensemble, as defined through the conductor’s instruction or vision; this fit is not fully specified in musical scores (Foote 1979; Progler 1995). Broadly speaking, these factors can affect the sound and temporal characteristics of the musician’s performance. The former can be described in terms of inflection, intonation, and balance, and the latter by articulation and accuracy of fingering, as examples.

Several technological solutions have been proposed to overcome the constraints of ensemble study. For instance, the Performance Learning System (Goldmark 1976) and Music Minus One (musicminusone.com) consist of prepared recordings of a musical program from which an instrument or voice is missing. A musician may practice and learn the omitted performance while being accompanied by a recording of the ensemble, much like karaoke systems. Existing technologies of this form suffer from an absence of visual cues, navigation difficulties, and limited sound control of the ensemble.

Computer Music Journal, 36:2, pp. 55–73, Summer 2012. © 2012 Massachusetts Institute of Technology.

This article describes Open Orchestra, a system that combines the high-fidelity experience of ensemble rehearsal or performance with the convenience and flexibility of solo study, overcoming the limitations noted earlier. Open Orchestra builds on the theme of an immersive orchestral simulator, and also supports the pedagogical objective of instructor feedback on individual recordings, as needed to improve musicians’ performance.

The remainder of this article is organized as follows. First, we review relevant literature in the field, then introduce the objectives of our system, followed by a summary of use cases and a description of the environment in which the musicians use the system. Next, we discuss the requirements and technologies needed to construct the repertoire, as well as the overall system architecture. Finally, we conclude with some comments on implementation challenges and future work. (A video of the Open Orchestra system in use is available from openorchestra.cim.mcgill.ca.)


Related Work

Many musical practice systems focus on improvement of instrument skills and are thus best suited for amateur players. For instance, the digital violin tutor (Yin, Wang, and Hsu 2005) uses audio analysis to provide feedback to the student. It can detect fingering mistakes and then display the correct position using an avatar within a controllable 3D environment. SmartMusic (www.smartmusic.com) provides similar feedback, but does so through a pop-up display of fingering positions. Similarly, the piano tutor project (Dannenberg et al. 2003) uses score-following software to analyze the student’s performance, and then provides feedback where mistakes are made. The system then emulates a teacher by finding the most appropriate lessons according to the student’s mistakes. Such systems are generally targeted to amateur musicians wanting to practice their instrument and enhance their basic playing skills. Experienced musicians, however, develop exceptional listening abilities (Koivunen 2002), and are therefore likely to detect most of their own playing mistakes.

IMUTUS (Schoonderwaldt, Hansen, and Askenfelt 2004) was designed to be a complete, autonomous tutor, used without human teachers and providing feedback to the student after each performance. The authors, however, suggest that its benefit would be maximized if used in conjunction with an actual teacher. The system defines a priority of mistakes based on discussions with over 40 teachers. For example, mistakes in articulation are less important for beginning players than control of air flow. The system only informs students of a few mistakes at a time in order to avoid overwhelming them. Students may also request hints in the form of additional annotations made by teachers. This work has been extended and specialized to wind instruments in the VEMUS project (Fober, Letz, and Orlarey 2007).

The i-Maestro (Ng 2008) project targets string instruments, and provides interactive environments for self-learning, private lessons, and classroom use for pedagogy of both theory and performance. The project specifically addresses training support for string instruments with a particular interest in linking music practice and theory training. The resulting i-Maestro framework for technology-enhanced music learning is designed to support the creation of flexible and customizable e-learning courses, including gesture following and exercise generation.

Other systems offer the experience of performing with an orchestra, supporting real-time accompaniment (Dannenberg 1984; Vercoe 1984; Raphael 2006) by synthesis according to anticipation of the performer’s trajectory through a non-improvised musical piece. Although potentially useful to obtain a more responsive accompaniment than Music Minus One or its derivatives, such accompaniment systems are not necessarily intended for instructional purposes, and do not provide the critical feedback to help improve one’s performance. In their review of various musical tutoring systems, Percival, Wang, and Tzanetakis (2007) discuss recommendations regarding the design of such systems for young and amateur musicians. They conclude that visualization techniques may be useful to confirm or correct a student’s judgment, but caution that such feedback should not be used to replace self-assessment by the student.

Virtual environments for orchestral conducting (Borchers, Lee, and Muhlhauser 2004) have also been developed, allowing the user to control tempo and dynamics. In response to natural conducting gestures made with an infrared baton, the system plays back a real-time audiovisual recording of an actual orchestra. In contrast, the virtual conductor by Reidsma, Nijholt, and Bos (2008) provides visual feedback in the form of a conductor avatar, driven by real-time audio analysis, to help communicate tempo mistakes to the musician. A similar system, UBS Virtual Maestro (Nakra et al. 2009), reads the accelerometer data from a hand-held Wii Remote, which the user moves like a conducting baton to affect the playback tempo and dynamics of an audiovisual orchestral recording.

Networked musical performance approached the development of a virtual orchestral environment by allowing musicians to collaborate in real time at a distance (Woszczyk et al. 2005; Shober 2006; Carot, Rebelo, and Renaud 2007; Zimmermann et al. 2008). These systems focus mainly on small ensembles, however, and the quality often suffers as a result of network latency. Moreover, the ensemble performance experience remains dependent on the simultaneous availability of all performers, in addition to an instructor for feedback and critique.

Open Orchestra Objectives

To address the shortcomings of previous systems, the Open Orchestra project aims to create a high-fidelity, immersive, network-enabled platform that provides individual musicians a rich audiovisual simulation of ensemble performance.

The system is intended to support the full range of training activities for collaboration between a musician and a conductor within the context of a rehearsal scenario. As such, our design was informed by the use cases arising from a series of field studies carried out with a large jazz orchestra. Each of our objectives, resulting from this exercise, is described briefly herein.

Preparatory Activities

At the beginning of a rehearsal, musicians prepare by warming up their lips and fingers, tuning their instruments, and comparing notes with their peers, while awaiting a starting cue from the conductor. During this time, the conductor might also review the score and briefly compare notes with certain members of the orchestra. To support some of these aspects of real-world rehearsal, Open Orchestra keeps track of the instrument(s) the musician plays, personalizing the options presented, such as selections of musical parts and sound settings from previous sessions. At login, Open Orchestra uses the introductory period to display any pertinent text messages to the student. For instance, an instructor might have commented to the saxophonist on the need to apply better intonation to particular notes in the music part. In the case of a first-time user, the system uses this period to recommend default sound settings according to the student’s instrument(s) of choice, and encourages the musician to record practice sessions for sharing with one of the listed instructors or mentors.

Performance Guidance

When an ensemble was in the early stages of learning a new piece of music, we observed the conductor counting the beat with his fingers and providing verbal guidance as to what was expected in certain sections of the music. Similarly, Open Orchestra provides the musician with an optional digital metronome for tempo guidance and access to the conductor’s guidelines, recorded as audiovisual content, for how he or she would like certain sections to be played. Because this is most likely relevant only in the early stages of rehearsal, the student can deselect playback of these general comments at any time. During regular (non-virtual) rehearsal sessions, we observed that musicians take advantage of small breaks or pauses to practice a specific bar that might be hard to play, just as they might repeatedly practice a specific complicated section at home. To support this functionality in Open Orchestra, the musician can directly set the starting and the ending points of a section using the written music, and then play that section repeatedly in a loop. This offers greater flexibility than is available in a traditional rehearsal. As one student commented, during the rehearsal he “cannot pause the conductor” to repeat a music section with the orchestra, but with our system, “here I can.”

Collaborative Tool

Ensemble rehearsal is more than refining one’s ability to play a piece of music on a given instrument. Equally important is that it entails performing as a group, in coordination with the full orchestra. To support training of these aspects, the system includes recording and playback features so that the musicians can review their own performance. These features are augmented by a synchronized display of the music part and optional performance evaluation tools that provide feedback concerning the quality of the student’s recording, based on signal-analysis metrics and comparison with professional recordings. The analysis is integrated into the display of the written music part for visualization purposes, and can be accessed by the instructor as well, if requested. The musician can write comments or questions on the recording, and share it with an instructor who, in turn, can also provide feedback, either through written comments or through recorded audio. These interactions occur asynchronously. When either the musician or the conductor logs into the system, all of their messages are displayed, allowing them to continue the conversation or interaction at their convenience.

Feedback and Student–Instructor Interaction

During a rehearsal, the conductor uses gestures and singing to describe the temporal variation of note dynamics, inflection, and intonation, and also clapping to indicate tempo, among other forms of feedback. These expressions can be conveyed by the audio and video of the conductor within Open Orchestra. In addition, to help students better understand the subtleties of performance, Open Orchestra attempts to offer feedback through a visual representation of performance parameters, including articulation, timing, intonation, inflection, and dynamic contrast, in relation to similar parameters from a reference track of a professional musician. Although feedback is typically the prerogative of the conductor, students often paraphrase the conductor’s comments, or ask questions related to the interpretation of a certain section of the music part. This capability is supported in Open Orchestra through a feature for posting questions directly (but asynchronously) to the conductor. Musicians can also record their performance and include this with their question(s) to the conductor. This allows the conductor to “zoom” in to the performance, listening either just to the instrument on its own, or in the context of the full orchestra. Because of time constraints, this capability is normally not feasible in an actual orchestral rehearsal setting.

High-Definition Immersive Experience

Open Orchestra provides the musician with the experience of sitting within the ensemble; but, unlike previous systems, it does so with both high-definition video and high-resolution audio, rendered from the perspective of the instrumentalist. From previous studies in virtual reality (Psotka 1995), we believe that this perspective is critical to achieving a compelling sense of immersion in the simulated environment. To increase the quality of this experience, the musicians see the conductor and the relevant part of the orchestra on a panoramic video display, and hear the rest of the orchestra with their own part removed. In addition, the system displays the written music part and system controls on a digital music stand that can be operated by touch. Among other capabilities, this offers the musicians access to comments made by the conductor at certain bars, emphasizing certain aspects of the piece to which the orchestra should be paying attention.

Network-Enabled Platform

The potential storage requirements for the collection of professional audio and video recordings of multiple selections of ensemble performance, recorded or rendered from a variety of instrumentalist perspectives, could easily overwhelm the capacity of typical servers. To ensure content delivery in the highest-quality format possible, it is more effective to adopt a network-enabled platform in which one high-capacity server is dedicated to streaming the high-definition content, on demand, over a high-speed network. Subsequent interactions, in the form of recordings, comments, and messages, are then stored in the cloud. This architecture serves the need to communicate self-recorded samples of the student’s performance to a remote instructor (or conductor) for asynchronous review and comment. Further details are provided under the Software Architecture section.

System Overview

Figure 1. Early prototype of the Open Orchestra hardware.

We now provide an overview of the system components necessary to achieve our objectives, including the audiovisual display, the various functions of the digital music stand, and the feedback capabilities of Open Orchestra. At the time of writing, we have completed the design and implementation of a preliminary prototype (see Figure 1) that supports most of the functionality described subsequently, apart from visualization of the performance feedback. Currently, the workstation is being tested at five music schools across Canada.

Video and Audio Display

The panoramic view of the orchestra is delivered through three 32-in NEC MultiSync V321 screens (see Figure 2); the audio is delivered through Bose QC3 noise-canceling stereo headphones. The decision to use headphones instead of loudspeakers was motivated by the demand for an immersive audio experience that could be deployed in a typical university space, without the need to construct a dedicated room with special acoustic properties. This provides the further benefit of isolating the performer’s own audio from the accompanying orchestral tracks when the student captures recordings for instructor feedback.

Figure 2. The Open Orchestra system as currently deployed. The touch screen, seen in the middle, serves as the music stand for display of the music part and controls.

Control over audio rendering is necessary to select between an idealized audio playback and a more realistic rendering of the performance from the instrumentalist’s position within the orchestra, with the sound of distant instruments attenuated. As one of the drum players in our studies commented, “When you are playing in a big band as a drummer you are so far back that all the sound is going out and you have nothing.” From this student’s perspective, the normal orchestral listening conditions make it very difficult to play. Similarly, a trombonist commented, “All you can hear is the first trumpet in your ear and whoever is beside you; everybody else is a big wash. I never hear any piano or anything, at least from where I am sitting.” Although studies are required to demonstrate the pedagogical utility, we are interested in supporting initial learning from a more idealized rendering of the ensemble sound, and then gradually moving to more realistic listening conditions. This prompted our studies of audio rendering perspective, summarized under the Preliminary Evaluation section.

Digital Music Stand

Written musical scores can be regarded as formal symbolic objects that provide methods and procedures to communicate common processes and goals to various entities within an orchestra. Scores also support teamwork and collaboration within an orchestra, with annotations marking the location where clarification, augmentation, or modification of the written instructions is needed (Winget 2008). Given the importance of annotation functionality, and the desire to integrate it in a manner that supports its storage and communication, we incorporated an electronic tablet as a digital music stand for display of the score and the user-interface elements. The stand serves as the student’s interface to the system, providing a display of the music part and performance feedback, control of playback and audio levels through finger-touch input, and music annotations through stylus input. Given its centrality to operation of the Open Orchestra system, the stand must be easily adjustable to accommodate different player heights as well as the player’s preference of sitting or standing positions. This is accomplished by mounting the tablet on an adjustable arm that allows the stand to lift, tilt, pan, and rotate (see Figure 2).

Actions that musicians perform while interacting with their music part include selection of music pieces, play/pause control, flipping through pages, and specifying start points from which to rehearse. Given their high frequency of expected use, we want to ensure that these operations do not become cumbersome. This motivates the use of a touch-sensitive display, allowing efficient control by finger touch or gesture, rather than requiring interaction with a dedicated input device, whether mouse or stylus. To support annotation in a natural manner during rehearsal (for example, marking up the music with feedback provided by the conductor), however, stylus input is desirable, to ensure a similar level of writing efficiency and control as with a pen. Moreover, finger-touch sensing alone is likely inadequate to indicate with accuracy a specific note, which in most cases occupies only a small number of pixels on the display. These factors motivated our adoption of a dual-input technology, supporting both finger and stylus input, with high-accuracy position detection.

Automatic Performance Feedback

Machine-based performance evaluation can offer musicians an objective assessment of their performance, independent of an instructor, and can be used by the instructor as one measure of a student’s progress. The latter aspect is important when considering scalability to large groups, as this not only offloads some of the responsibilities of the instructor, but helps identify sections of the music or aspects of the performance that are proving most difficult for the ensemble as a whole.

Figure 3. Visualization of articulation with the reference (top) and student (bottom) performance.

Analysis of the audio signal is used to extract fundamental frequency, harmonic amplitudes, and root-mean-square (RMS) amplitude, from which a number of musical features, including pitch and intonation, timing and rhythm, articulation, timbre, and dynamics, can be inferred, similar to the approach used in previous literature (Geslin and Lefevre 2004; Schoonderwaldt, Askenfelt, and Hansen 2005; Daudin et al. 2007). These features can then be compared with those from a reference audio track, through an appropriate visualization technique for articulation and dynamics.

Visualization of a student’s performance requires a model against which comparisons can be made to pinpoint areas in need of improvement. Unfortunately, an objective model, such as one generated from symbolic data (e.g., the music score), lacks the subtlety and nuance of interpretation naturally contained in an audio recording. For this reason, an audio recording is preferred as source data for the visualizations, representing the changes of both articulation and dynamics over time. We performed feature extraction of the RMS values of the recordings with Sonic Visualiser (Cannam et al. 2006) and used Processing (processing.org) to generate the visualizations.
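
The underlying computation is simple enough to sketch. The following fragment is our own illustration, not code from the project (which used Sonic Visualiser for extraction); the frame and hop sizes are arbitrary assumptions. It computes the framewise RMS envelope on which both displays are built, along with the moving-average smoothing used for the dynamics view described subsequently.

```python
import numpy as np

def rms_envelope(samples, frame_length=2048, hop_length=512):
    """Framewise RMS amplitude of a mono signal scaled to [-1, 1]."""
    n_frames = 1 + max(0, (len(samples) - frame_length) // hop_length)
    env = np.empty(n_frames)
    for i in range(n_frames):
        frame = samples[i * hop_length : i * hop_length + frame_length]
        env[i] = np.sqrt(np.mean(frame ** 2))  # RMS of one analysis frame
    return env

def smooth(env, width=9):
    """Moving average that suppresses fine detail, keeping the dynamic contour."""
    return np.convolve(env, np.ones(width) / width, mode="same")
```

Plotting the envelope of the reference recording above that of the student gives the kind of paired display shown in Figure 3; smoothing both envelopes first yields the contour comparison of Figure 4.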

Figure 3 shows an example articulation visualization of student and reference musicians playing the same nine-note phrase, with the reference performance (top) and the student’s performance (bottom). In the reference performance, the musician has articulated the last two pairs of notes more clearly, whereas the student musician slurs them together. Timing differences between the performances can also be seen on the corresponding two pairs, with the student playing late on the first and early on the second.

Figure 4. Visualization of dynamics with the reference (top) and student (bottom) performances.

Figure 4 shows the same phrase with a different visualization, this time mapping amplitude to both height and color, from yellow to red (shown as light to dark gray in this publication). The data have also been smoothed with a moving average to focus on the overall contour rather than the fine-grained detail. In this view, the crescendo-decrescendo contour of the first five notes of the reference phrase (top) contrasts with the flatter student performance (bottom).

Unlike most music tutoring systems, the evaluation provided by Open Orchestra is intended to be oriented toward characteristics of ensemble performance, rather than independent, technical proficiency of the musician. Although it may prove valuable to include the latter as part of the feedback provided by our system, we are consulting with several conductors to ensure that we retain an initial focus on the most appropriate metrics of ensemble integration.

Asynchronous Expert Feedback

To maximize the benefit to the student, it will usually be necessary to complement the automatic performance feedback, described earlier, with expert feedback and technical instruction that only a professional music teacher or conductor can provide. Open Orchestra supports this capability by making selected student recordings accessible to an instructor or conductor, who may review these at any time and offer feedback as appropriate. The system thus establishes a collaborative platform where musicians, conductors, and instructors can interchange ideas regarding the performance and interpretation of a specific musical piece.

Building the Repertoire

For music schools and students, access to a wide musical repertoire through our platform represents a significant opportunity, not only for ensemble practice, but also for discovery and/or consideration of new pieces. The process of building such a repertoire requires considerable effort for each piece, however. In addition, the content must be correlated with a digital representation of each of the music parts, allowing for temporal and content association between the practicing musician and the full performance.

Audiovisual Content Generation

Two options exist to obtain perspective-specific audiovisual rendering with independent volume control for the multiple channels, the first involving simulation of the entire performance, the second based on live content acquisition.

Simulated Content Acquisition

The synthetic-reproduction approach offers the advantage of being able to generate a large number of audio pieces without the need for an expensive and complex recording process, nor even the presence of an instrumentalist. There is, no doubt, a significant continuum of quality that can be obtained from a synthetic reproduction, ranging from a direct, “naive” Musical Instrument Digital Interface (MIDI) output, to one that attempts to reproduce the various annotations and descriptions provided by the composer, as well as the deliberate shaping by performers of expressive parameters such as tempo, dynamics, and articulation (Widmer and Goebl 2004). In this manner, highly complex scores can be “performed” without the substantial costs of hiring professional musicians. Various computational approaches have been investigated, for example, using rule-based (Friberg 2006), structure-level (McAngus Todd 1992), and mathematical (Mazzola and Goller 2002) models to formulate hypotheses of expressive music performance. Widmer and Goebl (2004) provide a review of such methods, including their own combination of note-level rules with structure-level expressive patterns induced by machine learning (Widmer 2002).

A significant benefit of these models is the automation of the generative process that would otherwise require a highly detailed description of each part, including precise intonation, inflection, timing, and other features. Such models, however, focus on common principles of performance and, accordingly, permit limited control over personal artistic aspects of performance style (Widmer and Goebl 2004). For those implementations that support some degree of expressive control over the generated music, the complexity of such control may require expertise beyond the musical skills of the target users.

Another important aspect of synthetic musical reproduction is the synthesizer itself; most do not yet model a natural-sounding transition between notes (Lindemann 2007). This results, for example, in synthesized violin and trumpet sounds that are perceived more as a concatenation of separate sounds than as a continuous musical phrase. Synful (synful.com) attempts to preserve the realism and complexity of natural sound, employing reconstructive phrase modeling, which is an analysis-synthesis system related to both additive and concatenative synthesis, and includes a database of rapidly varying components derived from original recordings of human performers.

Although it is perhaps only a matter of time before the subtleties of human performance can be modeled and controlled adequately, current systems rely on information provided in the score, for which the details of performance are typically not specified completely. For example, as frequently observed in jazz scores, a piano staff may be left as a sequence of chord names, or a drum staff might only designate particular instants to emphasize. Nevertheless, we are encouraged to consider the possibilities of synthetic reproduction for Open Orchestra, at least for specific repertoires, under the assumption that advances in this regard are only a matter of time.

Figure 5. The three-camera recording apparatus inserted into the position of the lead trumpet player. The cameras can be seen in close-up.

Similar synthesis of a video representation of the performance is, perhaps, as challenging. This might be achieved using computer-generated video avatars of the ensemble members in place of actual musicians, such as is done in the Harmonix games series (e.g., Guitar Hero and Rock Band; www.harmonixmusic.com/). In these games, audio is acquired from a live band and motion capture is used to assist the animation process of the characters (Lord 2009). Other music-driven avatar motion production includes motion and rhythm synchronization for dance (Kim, Park, and Shin 2003) and facial emotion animation generated from music (DiPaola and Arya 2006). These techniques still involve acquisition of human gesture, however, which would, for our purposes, require the involvement of actual musicians.

High-Fidelity Content Acquisition

The second approach, which we have recently employed, uses multiple audiovisual recordings to capture the performance from each spatial perspective associated with an instrument location intended for the simulator. This allows for rendering of an audiovisual experience for the practicing musicians that is reasonably consistent with their perception in a real, physical ensemble. To satisfy the requirements of our three-screen panoramic display, an assembly comprising three Panasonic AK-HC900P cameras, as shown in Figure 5, was positioned appropriately at each seat from which the video perspective was to be captured, with the corresponding musician removed from the recording. For multiple recording positions, this requires several iterations, with the remaining performers repeating the performance for each capture.

All video was acquired at 1280 × 720 resolution at 60 frames per second (HD 720p format), and recorded at AVC-Intra 100 using three Panasonic AJ-HPM110 recorders, then edited with Final Cut Pro and re-encoded in H.264, at various bitrates, as required to accommodate the capabilities of heterogeneous clients.

We had initially considered the possibility of recording the audio of each musician (or a set of the same instruments) independently to ensure complete isolation of the associated tracks, and then combining these to create the desired mix during playback. This approach would have required that the musicians play along in precise synchronization with one of the previous recordings. Early trials demonstrated the difficulty of this approach, however, resulting in imperfect alignment, which would manifest most visibly in asynchrony between the audio and video tracks. For instance, the visual recording of a percussionist hitting the drum might lead or lag the recorded audio on a number of beats. Such misalignments would likely lead to students deliberately ignoring the video display, thus undermining the value of our technology and depriving students of the intended immersive experience. Instead, we decided to carry out the acquisition of audio and video tracks in parallel, as illustrated in Figure 6.

An Avid Pro Tools HD system was used as the main recording platform, capturing a total of 28 audio tracks at 24-bit, 96-kHz resolution in Broadcast Wave Format. A synchronization signal was provided by a Brainstorm DCD-8 Distripalyzer, which also generated tri-level sync signals for the three HD cameras. Single- or multi-directional microphones were placed for each musical instrument at the most appropriate positions for the instrument’s specific musical acoustics. To ensure the highest quality of audio tracks, Grace Design microphone pre-amplifiers and Prism Sound analog-to-digital converters were used for the recording session. In parallel, a binaural recording was made from the position of each target musician (see Figure 5b) to provide a reference of the auditory balance from that perspective. The audio editing and mixing were completed in the Critical Listening room at the Centre for Interdisciplinary Research in Music Media and Technology of McGill University. This room was equipped with a Pro Tools HD system and B&W 802D speakers powered by a Classe Audio power amplifier.

Figure 6. The recording process for live content acquisition: (1) The ensemble performance is recorded, removing one instrument position at a time. (2) Individual musicians or sections of the orchestra re-record their audio in a studio, filling in the position removed from the recordings obtained in the previous step. (3) Performer-specific listening mixes are generated, using spatialized rendering of the individual tracks from step 2 to recreate the appropriate experience from each perspective. The binaural recording from the position of the player is used as a reference in this process.

At a later stage, solo audio recordings (of the removed musicians) can be carried out, with each musician listening through headphones to the previous recording of the rest of the orchestra. We found that the musicians were quite capable of playing along in this manner, much as the student musicians will be doing when they use the resulting Open Orchestra system for rehearsal. Acquisition of the solo recordings ensures a pristine audio track, with the signal quality necessary for subsequent analysis and comparison with the student’s performance, but not one that is used for rendering the actual playback mix.


Enabling Score-Centric Control

In rehearsal, the music part serves as a dominant focus of attention, motivating its use, in digital format, as the control interface. At present, scores and music parts are mostly available on paper; these must be converted into a digital format suitable for use by our system. For representation of the score and music parts, we chose MusicXML (www.recordare.com/xml.html), a de facto standard for musical score interchange that is semantically more powerful than the older MIDI format.
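
As a point of reference for readers unfamiliar with the format (the fragment below is our own minimal illustration, not material from the Open Orchestra repertoire), MusicXML represents each note symbolically, with explicit pitch spelling, duration, and measure context, which is what makes a score-centric interface possible:

```python
import xml.etree.ElementTree as ET

# A hand-written MusicXML fragment: one measure holding a single quarter-note C5.
FRAGMENT = """
<part id="P1">
  <measure number="1">
    <note>
      <pitch><step>C</step><octave>5</octave></pitch>
      <duration>1</duration>
      <type>quarter</type>
    </note>
  </measure>
</part>
"""

root = ET.fromstring(FRAGMENT)
for measure in root.iter("measure"):
    for note in measure.iter("note"):
        pitch = note.findtext("pitch/step") + note.findtext("pitch/octave")
        print(f"measure {measure.get('number')}: {pitch} ({note.findtext('type')})")
        # prints: measure 1: C5 (quarter)
```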

Figure 7 presents an overview of the process of converting acquired content, including both the music part itself and the performance recording of professional musicians, into a format suitable for the simulation environment of the Open Orchestra system. This involves translation of the sheet into a digital representation, followed by alignment of the music part and its associated audio.

Sheet/Digital Score Translation

The first step is done through optical music recognition (OMR) software (Choudhury et al. 2000), which scans the paper sheets to extract a machine-readable representation that can be used by our system. We initially used SharpEye (www.visiv.co.uk; Fremerey et al. 2008), a MusicXML-compliant software application, to extract both the notes themselves and their spatial information on the page. In theory, this would allow us to determine which note the students select based solely on the position they “touch” on a digital display of the scanned music part. Although the typical performance of commercial OMR tools allows for reliable detection of bars, OMR tools in general remain substantially error-prone (despite significant progress in the last few years), resulting in systematic errors that require subsequent (manual) correction (Byrd and Schindele 2006; Fremerey et al. 2009). Unfortunately, manual correction upsets the spatial relationship between the original music part and its digital representation. As a result, our software must reconstruct the display of the music part in order to maintain a correspondence between on-screen locations and their associated musical elements.

Digital Score/Audio Alignment

During the acquisition of the high-fidelity audio and video that serve as our reference media for the Open Orchestra system, the ensemble plays the music parts, but interpretation leads to possible variations of tempo and/or note placement among different performances. This precludes a straightforward association between the musical elements of the score and their associated timing in the actual recording. Short of recording the piece with a metronome guiding the ensemble (a possibility that was quickly discarded because of its negative impact on interpretation), a computational approach to temporal alignment is required.

As an immediate solution for our initial deployment, we used a manual tagging system to obtain the timestamp for each bar. As a longer-term, automated solution, however, we considered two different approaches. The first, score following (Dannenberg 1984; Vercoe 1984), is used as an online algorithm for real-time alignment, which enables the computer to follow a particular musician, often for accompaniment purposes (Dannenberg and Raphael 2006). Probability models and a training step are commonly used in order to improve the performance of the algorithm, although more-recent systems (Cont 2010) adaptively update their internal state during performance, and thus avoid the need for offline training or parameter tweaking.

Figure 7. Alignment of the music part with audio (and video) data identifies the associated time instants between media, allowing for simulator playback control from a score-centric user interface.

The second approach computes the alignment between an audio recording and its score representation after the fact. This has been investigated for several applications, including query-by-humming music retrieval systems and media player synchronization of animations with recorded audio (Kurth et al. 2004). One of the most popular strategies, dynamic time warping (DTW), is based on non-linear warping of two data sequences to find similarities between them. These sequences are composed either of audio samples or extracted features, such as onset, pitch, or chroma (Hu, Dannenberg, and Tzanetakis 2003), as appropriate for the given instrument (Devaney, Mandel, and Ellis 2009). (Chroma corresponds to the musical notes of the equal-tempered scale, and does not consider the register [octave] of a pitch [Shepard 1964].) For instance, wind instruments are better analyzed by chroma, whereas percussion and instruments with percussive attacks having a well-defined energy increase can be aligned using note-onset features (Muller 2007).
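
To make the chroma feature concrete (a sketch of the standard construction, not code from the project), each spectral component is folded into one of twelve pitch classes, discarding the octave:

```python
import numpy as np

def chroma_from_spectrum(magnitudes, freqs, a4=440.0):
    """Fold an FFT magnitude spectrum into a 12-bin chroma vector.
    Pitch class 0 corresponds to C; the octave of each component is ignored."""
    chroma = np.zeros(12)
    for mag, f in zip(magnitudes, freqs):
        if f > 0:
            midi = 69 + 12 * np.log2(f / a4)      # fractional MIDI pitch
            chroma[int(round(midi)) % 12] += mag  # drop the octave
    total = chroma.sum()
    return chroma / total if total > 0 else chroma
```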

When alignment has to be performed on ensemble recordings, however, the challenges posed by the presence of multiple instruments (i.e., polyphony) can render these features irrelevant. In this case, it is preferable to apply DTW to a version of the score that is synthesized into a sequence of audio samples. For the alignment to succeed, the synthesized audio interpretation must be sufficiently realistic, respecting accurate timing parameters of note onsets and offsets. Notation software, such as Finale and Sibelius, or composition tools, such as OpenMusic (Agon, Stroppa, and Assayag 2000), can be used for this work. Some systems, such as OpenMusic and MaxScore (Didkovsky and Hajdu 2008), support this process in real time, but require manual steps in their use, which thus precludes automation of the alignment process.

Our implementation of these steps is based on the Vamp plug-in architecture (www.vamp-plugins.org). This software takes advantage of MATCH (Dixon and Widmer 2005), an existing audio-to-audio alignment tool, to find the optimal alignment between two audio sequences. The algorithm applies a spectral analysis and matches similar increases in energy along frequency bins. Once audio frames are aligned, our system selects interesting instants, computed during initialization, to extract the related positions in the audio performance file. As output, the software produces a table that translates logical score time, expressed in units of a configurable quarter-note division, into its related timestamp in the audio file, expressed in milliseconds.
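
The sketch below illustrates the general shape of this computation with a textbook DTW over per-frame feature vectors; it stands in for the MATCH plug-in, and the hop size and table format are illustrative assumptions rather than details of our implementation.

```python
import numpy as np

def dtw_path(ref_feats, perf_feats):
    """Textbook DTW between two feature sequences (frames x dimensions).
    Returns the warping path as (ref_frame, perf_frame) pairs."""
    n, m = len(ref_feats), len(perf_feats)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref_feats[i - 1] - perf_feats[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:  # backtrack along the cheapest predecessors
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return path[::-1]

def score_time_table(path, beat_frames, hop_s=0.01):
    """Translate logical score times into performance timestamps (ms).
    beat_frames maps a score position (e.g., quarter-note index) to its
    frame in the synthesized reference audio."""
    perf_for_ref = dict(path)  # perf frame matched to each reference frame
    return {beat: int(perf_for_ref.get(frame, 0) * hop_s * 1000)
            for beat, frame in beat_frames.items()}
```

In our setting, the reference sequence would come from the synthesized score audio and the score positions from the known note onsets of that synthesis, yielding a quarter-note-to-milliseconds lookup of the kind described above.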

Figure 8. In this example, a trombone player could select a given measure and set a marker (represented by an arrow) to indicate that the desired playback position in the part should start from 0:23.

Current practical use of DTW with audio sequences provides reasonably accurate results, but variation occurs according to the parameter settings (window and step size) for the spectral analysis. Currently, the synthesized audio is generated using external software, but we hope to automate this step in the near future. This requires further investigation into the tradeoff between the quality of the generated audio file (e.g., with respect to its sound bank or the tempo) and the resulting alignment accuracy. Our initial tests indicate that alignment suffers from an under-specified part in the score—for instance, in improvised solo sections, and also in some rhythm-only sections where piano chords are cued without specification of rhythm or chord inversion. The latter results in simplistic audio synthesis, which causes significant degradation in alignment accuracy. We are also considering the integration of other strategies to cope with these problematic situations, thereby providing a more accurate alignment throughout the score. Although this alignment process results in an average temporal accuracy of one quarter note, this is sufficient to ensure that a selected position on the music part always corresponds to the desired measure. This technique can thus be used in the future to align the music piece with the actual performance.

Synthesis of the Digital Score

The digital score can then be synthesized, building on the extracted musical elements from the OMR step and the temporal associations computed with the recorded audio. The corresponding graphics contents of the score are rendered from the MusicXML file, and then arranged based on a set of heuristics encoded in the software. As intended, this allows for playback control using a given music part as an interface, as shown in Figure 8. For example, the spatial position within the music part serves as an index into the recorded audio file to select a section for looped playback.
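
A minimal sketch of that lookup (the names and timestamps below are hypothetical, not from the Open Orchestra code base): a measure range selected on the touch screen resolves to a start and end timestamp for looped playback.

```python
# Hypothetical output of the alignment stage:
# measure number -> start time of that measure in the recording (ms).
measure_start_ms = {1: 0, 2: 2350, 3: 4710, 4: 7080, 5: 9440}

def loop_region(first_measure, last_measure, table=measure_start_ms):
    """Translate a measure range selected on the part into (start_ms, end_ms)."""
    start = table[first_measure]
    # The loop ends where the following measure begins, or at the last
    # known timestamp when the selection reaches the end of the table.
    end = table.get(last_measure + 1, max(table.values()))
    return start, end

print(loop_region(2, 4))  # a player loops measures 2-4: (2350, 9440)
```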

Software Architecture

The Open Orchestra system deals with two main types of data, each best suited to a different approach for storage and serving:

1. Large, persistent data: high-definition audio and video files that are provided to the user for immersion within the simulated orchestra.

2. Dynamic data: users’ own recordings and annotations, comments, and feedback provided by instructors.


We designed our architecture around the premise of heterogeneous client machines, especially anticipating potential access not only from music schools, but also from students’ home machines, albeit with a (likely) less-immersive rendering capability. This mandated that client storage and processing requirements be kept minimal, and in turn, any demanding computation, for example, audio spatialization, be performed on the server side, streaming the requested content to clients.

For the large, immutable data of audio and video recordings of the orchestra, a dedicated physical server appears to be the most cost-effective solution. This machine must provide processing capacity for audio perspective rendering, connected to clients over a high-speed data network for real-time media streaming. For dynamic, user-generated content, however, including student recordings and instructor feedback, the requirements will vary significantly depending on student activity; storage duration is shorter term; and real-time serving capability is unnecessary. In addition, this data only needs to be shared between the student and a few individuals, such as instructors and peers. As such, cloud storage and processing may be more appropriate, as this route offers more flexible scaling options as required for an expanded client base.

Integration and synchronization of user-generated content stored in the cloud is being investigated. Approaches may be client-oriented, in which the ensemble rendering will be provided by the server, with user tracks overlaid locally, or server-driven, in which the server simply includes the user-generated content as a regular audio channel when synthesizing the playback (e.g., for the instructor to evaluate). In either case, synchronization between the orchestral recordings and the user-generated ones requires attention to any delays induced by the local recording process.
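
A common way to measure such a delay (our sketch of a standard technique, not necessarily the approach taken in Open Orchestra) is to cross-correlate the locally captured audio against the reference playback and read off the lag of the correlation peak:

```python
import numpy as np

def recording_delay_ms(reference, captured, sample_rate=48000):
    """Estimate how far the captured track lags the reference playback.
    Returns the lag of the cross-correlation peak in milliseconds;
    positive values mean the local capture is late."""
    corr = np.correlate(captured, reference, mode="full")
    lag_samples = int(np.argmax(corr)) - (len(reference) - 1)
    return 1000.0 * lag_samples / sample_rate
```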

Accessing the Orchestral Simulator

Figure 9. Audiovisual server architecture allowing for real-time control of audio parameters.

Operation of the system is illustrated in Figure 9. The user specifies parameters of the requested streams through the score-centric interface. A table lookup obtains the corresponding timestamp that indexes into the recorded audio for the requested instruments. User commands are then conveyed to a server-side audio/video sequencer, which selects the appropriate media contents and performs multichannel volume control and mixing in real time. The server, which hosts the pre-recorded media, consists of a custom streaming engine and the session managers, described subsequently. Because the musician hears the rest of the band through headphones, results are transmitted live to the client as a stereo audio stream synchronized with a single video channel (Bouillot, Tomiyoshi, and Cooperstock 2011), typically a “stitched together,” triple-width video that is split at the client for display over three screens. It is equally possible to render the experience through a surround-sound system, but this risks audio feedback during recording.

Our custom streaming engine is dedicated to content processing and stream serving. Its features include play/pause/seek controls in addition to volume control over individual channels of audio. Internally, it handles simultaneous reading of multiple files, mixing of the various audio channels into a stereo stream, synchronization of audio with video, AAC encoding of audio at 320 kbps, stream marshaling, and transmission. The engine is implemented as a pipeline of components, building on the GStreamer open-source multimedia framework (Kost 2010). To maintain synchronization, the system clock is used to slave all elements in the pipeline, including file reading and marshaling of network protocols. GStreamer derives several times from the system clock and the playback state (e.g., the running time of the pipeline and the current position in the media). This “stream time” is handled by the data payloader that determines header timestamps, allowing the marshaling element to specify synchronization information between the audio and video streams. The protocol translator receives the streams, decodes synchronization information, and converts it into the format handled by clients (e.g., Adobe RTMP), maintaining inter-channel synchronization for rendering. Interaction between these server components is described in detail in a related publication (Bouillot, Tomiyoshi, and Cooperstock 2011).
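
We do not publish the pipeline itself here, so the following is a deliberately reduced sketch in Python (PyGObject, using GStreamer 1.x element names such as audiomixer and volume, which postdate the GStreamer generation the system was built on). It shows only the mixing and per-channel volume portion; the real engine adds file reading, video, AAC encoding, and network marshaling.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
# Two test tones stand in for instrument tracks; each passes through its
# own volume element before reaching the shared mixer.
pipeline = Gst.parse_launch(
    "audiotestsrc wave=sine freq=440 ! volume name=vol_trumpet ! mix. "
    "audiotestsrc wave=sine freq=660 ! volume name=vol_sax ! mix. "
    "audiomixer name=mix ! audioconvert ! autoaudiosink"
)
# Per-channel volume control of the kind driven by the music stand:
pipeline.get_by_name("vol_trumpet").set_property("volume", 0.8)
pipeline.get_by_name("vol_sax").set_property("volume", 0.3)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()  # keep the pipeline running
```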

The session manager is responsible for translating streaming protocols and maintaining sessions with multiple, possibly simultaneous, clients. Accordingly, each client is assigned a dedicated session, allowing for content serving and control forwarding to the streaming engine without affecting other client sessions. Our Java implementation of this component extends the default functionality of the Wowza Media Server 2 (www.wowzamedia.com).

We have not yet evaluated control response time, a critical factor in ensuring an effective experience with the system, especially for continuous control such as volume adjustment. Fortunately, the latency characteristics of each system component are generally controllable, but tuning of buffer and audio frame sizes is anticipated, to balance reliability against delay.

Automated Performance Feedback

The automated performance tools are intended to highlight relevant elements of ensemble performance where the student needs to improve, hopefully enhancing the student’s understanding of their part in the orchestra. This might be achieved through a combination of comparative audio playback and data visualization approaches; the latter can exaggerate nuances as well as tell the student what to listen for and where.

The display is based on automated comparison between the recording of the student musician’s performance and a reference “expert” audio track (see Figures 3 and 4), which may be a recorded example of the same part or of other instruments from the same section or ensemble. The former has the advantage of being the exact same musical content as what the student is intended to play, and serves well as a reference to evaluate the student’s fit with the rest of the ensemble. There are benefits as well, however, to visualizing performance in relation to different instruments, as, for example, in seeing one’s entrance in relation to the timing of other instruments.

An important question for investigation is the choice of which features to analyze. Section- and ensemble-relative dynamic levels, timing, and intonation are likely candidates because they all indicate whether the instrument is playing cleanly within the ensemble. In contrast, other features, such as instrumental tuning, fingering, placement, and duration of notes relative to the score, articulation, and timbre, are more indicators of technical instrumental mastery. A ranking of the importance for feedback purposes of these and other possible features can be obtained through an experiment or survey of musical professionals, such as was done with the IMUTUS system (Schoonderwaldt, Askenfelt, and Hansen 2005). Similarly, the algorithms used to extract musical features, find dissimilarities with the reference track, and determine the importance of the different musical features will need to be validated by comparison to the values assigned by an independent human expert.

The analysis and comparison of musical features will no doubt generate considerable data, for which both judicious selection and due attention to the limitations of information visualization will be critical for usability. Categorizing and selectively displaying feedback will prevent overwhelming the user. Because a written music part is a familiar representation of both the time progression and content of the piece of music for a given instrument, we intend to base as much of the feedback visualization and navigation around this representation as can be supported effectively. Such feedback will likely take the form of highlighting, annotations, or glyphs. The rich literature on visualization techniques provides ample guidelines (Hiraga and Matsuda 2004; Daudin et al. 2007; Robine, Percival, and Lagrange 2007; Aggelos, Kostas, and Stefanos 2008) for display of parametric data, but given the variety of representational choices available, the actual appearance of the visualizations will have to be thoroughly investigated. Early testing with musicians has indicated that the use of standard notation, when possible, helps convey clear and contextualized meaning. For example, an easy contrast can be made between the dynamic markings on the music part and dynamic markings generated from the students’ playing. Some features, however, such as timbre, lack such easy notation methods. Similarly, describing subtle variations in timing may require a different method from standard notation. Moreover, different features will require visualization at different time-scales: from the note- or measure-level to the macro-level, showing the entire piece.

Preliminary Evaluation

The Open Orchestra system has so far been evaluated with respect to audio perspective and the performance visualization. For the former, we conducted a preliminary evaluation of the respective benefits of different audio rendering perspectives, one from the musician’s perspective, and another from that of the audience, the latter being similar to Music Minus One. This involved an experiment with eight jazz musicians, taking on their respective orchestral positions of lead trumpet, lead alto sax, lead trombone, and drums. Although the results indicated a slight preference for the audio experience rendered from the musician’s perspective, this difference was significant only for the trumpet, suggesting that the value of a dedicated audio image is instrument-dependent. Our ongoing work is examining the customization of these audio parameters for a given musician based on recommendations from a mentor or conductor. Further details of this study are available in a related publication (Olmos et al. 2011). The ability of Open Orchestra to tailor the learning environment in this manner may prove valuable to allow musicians to practice and improve their skills in a pedagogically optimal orchestral context.

Our study of automatic performance visualization (Knight, Bouillot, and Cooperstock 2012) considered support for musical interpretation. This involved subjective judgment of a student’s performance compared to a reference “expert” performance for particular aspects of musical performance: articulation and dynamics. The experiment presented the two samples by either audio only, visualization only, or both together. Assessment of the effectiveness of the feedback condition was based on the consistency of judgments made by the participants (all music students) regarding how well the student musician matches the reference musician, the time taken to evaluate each pair of samples, and the subjective opinion about the feedback’s perceived utility.

For articulation, differences in the mean scores assigned by the participants to the reference-versus-student performance were not statistically significant for each modality. This suggests that, whereas the visualization strategy did not offer any advantage over presentation of the samples by audio playback alone, visualization nevertheless provided sufficient information to make similar ratings. For dynamics, four of our six participants categorized the visualizations as helpful. The means of their ratings for the visualization-only and both-together conditions were not statistically different from each other, but were statistically different from the audio-only treatment, indicating a dominance of the visualizations when presented together with audio. Moreover, the ratings of dynamics under the visualization-only condition were significantly more consistent than under the other conditions.

Conclusions

Open Orchestra was motivated by the desire to provide an immersive, computer-assisted learning environment that enables an individual musician to practice with a simulated ensemble. In approaching the task of its design, many existing technologies were investigated, in particular, computer-assisted pedagogical musical systems that focus on improvement of instrument skills. These systems do not address the needs of ensemble listening or interaction with other instrumentalists, however. Conversely, present-day ensemble rehearsal systems of the Music Minus One format suffer from critical drawbacks, including an absence of visual cues, limited navigation possibilities, and lack of ensemble sound control. The development of our high-fidelity orchestral simulator required us to solve many challenges that had not been addressed previously.

Access to both the dynamically rendered Open Orchestra repertoire and to user-generated data requires a network-enabled platform. Our approach is based on the integration of several existing networking technologies, including cloud computing and widely used streaming protocols such as RTMP and RTP. Our resulting design is centered around a digital music-stand interface that controls, among other features, the playback of stereo audio and panoramic video from the musician's position within the orchestra. This interface also allows volume adjustment of each instrument section, as well as annotation of the score. The initial repertoire made available through the Open Orchestra system is being acquired from high-fidelity audio and video recordings of semi-professional musicians, to ensure that nuances of interpretation are preserved. The process of extracting a digital representation of the score, aligning this with the audio and video recordings, and rendering the results in a manner that reproduces the ensemble experience has been described, along with the associated challenges for each of these steps.
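
As an illustration of the per-section volume control mentioned above, the following sketch mixes separately recorded section stems under user-adjustable gains. It is a simplified sketch under our own assumptions (in-memory NumPy arrays and hypothetical section names), not the actual streaming implementation, which delivers the audio over RTMP/RTP.

```python
import numpy as np

def mix_sections(stems, gains):
    """Mix per-section stems (equal-length arrays of float samples)
    into one ensemble track, applying a user-set gain per section."""
    mix = np.zeros_like(next(iter(stems.values())), dtype=np.float64)
    for section, samples in stems.items():
        mix += gains.get(section, 1.0) * samples
    peak = np.max(np.abs(mix))
    # Normalize only if the sum clips, preserving the relative balance.
    return mix / peak if peak > 1.0 else mix

# e.g., bring the saxophones down while the student practices that part:
# ensemble = mix_sections(stems, {"saxes": 0.3, "trumpets": 1.0})
```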

Open Orchestra enables rehearsal in a simulated ensemble with the freedom to control and tune the playback. One of the most important aspects of the learning experience relates to feedback, however. This is provided in two forms. First, students gain direct access to automated analysis tools that visualize several audio features. The selection of appropriate features to analyze is currently under investigation, with our emphasis on those that relate to ensemble integration rather than instrument playing skills. Second, students can request external feedback from the conductor. Supporting the latter requires that Open Orchestra serve as a collaborative tool, achieved by offering students the ability to record their performances and annotate the written music part for later review. Despite the unusual asynchronous nature of the resulting interaction between musicians and conductor, this offers several benefits. Notably, the ability to obtain the conductor's comments directly on recordings of a student's rehearsals offers the possibility of individualized, private discussions of specific relevance to each musician. Furthermore, the Open Orchestra system may facilitate the task of the conductor, allowing for a rapid focus on specific elements that require attention, since the student's recording can be played back either in isolation or in conjunction with any or all of the master tracks from the professional recording.
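
One plausible shape for the time-anchored comments that support this asynchronous review loop is sketched below. The record type and its fields are our own illustration of the concept, not the schema that Open Orchestra actually stores.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    """A comment attached to one recorded take (illustrative fields)."""
    take_id: str      # which recorded performance is being discussed
    measure: int      # anchor in the written music part
    time_sec: float   # anchor in the audio recording
    author: str       # "student" or "conductor"
    text: str         # the comment itself

# A conductor's asynchronous comment on a student's recording:
note = Annotation(take_id="take-042", measure=17, time_sec=31.5,
                  author="conductor",
                  text="Lay back here; you are ahead of the rhythm section.")
```

Because each comment is anchored both in the score and in the recording, playback can jump directly to the passage under discussion, in isolation or against the master tracks.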

Although the current prototype is functional, we are, at present, investigating issues of control latency in distributed real-time media delivery, obviously a critical factor in the user experience. Considerable work remains to evaluate the prototype system, from both the students' and the conductor's perspectives, and to make refinements to its interface, music-analysis strategies, and visualization techniques. In addition, our system presently has no means of coping with improvised parts, as are commonly found in jazz performance; these, naturally, will not align precisely with reference samples from previous recordings. This raises obvious challenges for future development.
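
One simple way to begin quantifying such latency is to timestamp small probe packets against an echo service, as in the sketch below. This measures only network round-trip time, under our own assumptions (a UDP echo endpoint standing in for the media server); end-to-end control latency would additionally include buffering, decoding, and rendering delays.

```python
import socket
import struct
import time

def measure_rtt_ms(host, port, trials=20):
    """Average round-trip time, in ms, to a UDP echo service."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    samples = []
    for seq in range(trials):
        sent = time.monotonic()
        sock.sendto(struct.pack("!Id", seq, sent), (host, port))
        try:
            data, _ = sock.recvfrom(64)
        except socket.timeout:
            continue  # a lost probe simply contributes no sample
        echoed_seq, _ = struct.unpack("!Id", data[:12])
        if echoed_seq == seq:
            samples.append((time.monotonic() - sent) * 1000.0)
    sock.close()
    return sum(samples) / len(samples) if samples else None
```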

Overall, simplicity and system reliability remain our principal goals, as the distraction of technical sophistication should not, and cannot, impede musical performance, which requires a great deal of attention and concentration on the part of the performer. We look forward to evaluating the success of the Open Orchestra system in achieving these objectives in actual testing with music students.

Acknowledgments

The authors would like to thank Canada's Advanced Research and Innovation Network (CANARIE) for funding of this work, conducted under a Network Enabled Platforms (NEP-2) program research contract. Many of the design decisions described here were informed by our highly valuable sessions with Gordon Foote, Director of McGill University's Jazz Band I, and his students. The overall project described here is developed in collaboration with colleagues at the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) at McGill University. Specifically, the authors are most indebted to the efforts of John Roston, Wieslaw Woszczyk, Doyuen Ko, and Jake Woszczyk, as well as the CIRMMT technical staff.

References

Aggelos, G., P. Kostas, and N. Stefanos. 2008. "Real-Time Detection and Visualization of Clarinet Bad Sounds." Proceedings of the 11th International Conference on Digital Audio Effects, pp. 59–62.

Agon, C., M. Stroppa, and G. Assayag. 2000. "High Level Musical Control of Sound Synthesis in OpenMusic." Proceedings of the International Computer Music Conference, pp. 332–335.

Borchers, J., E. Lee, and M. Mühlhäuser. 2004. "Personal Orchestra: A Real-Time Audio/Video System for Interactive Conducting." Multimedia Systems 9(5):465–468.

Bouillot, N., M. Tomiyoshi, and J. R. Cooperstock. 2011. "Extended User Control over Multichannel Content Delivered over the Web." Proceedings of the 44th Conference on Audio Networking (pages unnumbered). Available online at www.aes.org/e-lib/browse.cfm?elib=16148.

Byrd, D., and M. Schindele. 2006. "Prospects for Improving Optical Music Recognition with Multiple Recognizers." Proceedings of the 7th International Conference on Music Information Retrieval, pp. 41–46.

Cannam, C., et al. 2006. "The Sonic Visualiser: A Visualisation Platform for Semantic Descriptors from Musical Signals." Proceedings of the 7th International Conference on Music Information Retrieval, pp. 324–327.

Carot, A., P. Rebelo, and A. Renaud. 2007. "Networked Music Performance: State of the Art." Proceedings of the 30th International Conference of the Audio Engineering Society (pages unnumbered). Available online at www.aes.org/e-lib/browse.cfm?elib=13914.

Choudhury, G. S., et al. 2000. "Optical Music Recognition System within a Large-Scale Digitization Project." Proceedings of the International Society for Music Information Retrieval (pages unnumbered). Available online at ciir.cs.umass.edu/music2000/papers/choudhurypaper.pdf.

Cont, A. 2010. "A Coupled Duration-Focused Architecture for Realtime Music to Score Alignment." IEEE Transactions on Pattern Analysis and Machine Intelligence 32(6):974–987.

Dannenberg, R. B. 1984. "An On-Line Algorithm for Real-Time Accompaniment." Proceedings of the International Computer Music Conference, pp. 193–198.

Dannenberg, R. B., et al. 2003. "Results from the Piano Tutor Project." Proceedings of the Fourth Biennial Arts and Technology Symposium, pp. 143–150.

Dannenberg, R. B., and C. Raphael. 2006. "Music Score Alignment and Computer Accompaniment." Communications of the ACM 49(8):38–43.

Daudin, C., et al. 2007. "Visualisation du Jeu Instrumental." Actes des Journées d'Informatique Musicale, pp. 64–72.

Devaney, J., M. I. Mandel, and D. P. W. Ellis. 2009. "Improving MIDI-Audio Alignment with Acoustic Features." Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 45–48.

Didkovsky, N., and G. Hajdu. 2008. "MaxScore: Music Notation in Max/MSP." Proceedings of the International Computer Music Conference (pages unnumbered). Available online at quod.lib.umich.edu/cgi/p/pod/dod-idx?c=icmc;idno=bbp2372.2008.034.

DiPaola, S., and A. Arya. 2006. "Emotional Remapping of Music to Facial Animation." SIGGRAPH Sandbox Symposium on Videogames, pp. 143–149.

Dixon, S., and G. Widmer. 2005. "MATCH: A Music Alignment Tool Chest." Proceedings of the 6th International Conference on Music Information Retrieval, pp. 492–497.

Fober, D., S. Letz, and Y. Orlarey. 2007. "VEMUS—Feedback and Groupware Technologies for Music Instrument Learning." Proceedings of the Sound and Music Computing Conference, pp. 1–8.

Foote, G. C. 1979. "A Rhythmic Analysis of Swing Music from the Count Basie Library." Master's thesis, School of Music, University of Minnesota.

Fremerey, C., et al. 2008. "Automatic Mapping of Scanned Sheet Music to Audio Recordings." Proceedings of the 9th International Conference on Music Information Retrieval, pp. 413–418.

Fremerey, C., et al. 2009. "Sheet Music-Audio Identification." Proceedings of the International Conference on Music Information Retrieval, pp. 645–650.

Friberg, A. 2006. "pDM: An Expressive Sequencer with Real-Time Control of the KTH Music-Performance Rules." Computer Music Journal 30(1):37–48.


Geslin, Y., and A. Lefevre. 2004. "Sound and Musical Representation: The Acousmographe Software." Proceedings of the International Computer Music Conference (pages unnumbered). Available online at quod.lib.umich.edu/cgi/p/pod/dod-idx?c=icmc;idno=bbp2372.2004.138.

Goldmark, P. C. 1976. "Performance Learning System. US Patent 3,955,466." Available online at www.patentlens.net/patentlens/patent/US3955466/en.

Hiraga, R., and N. Matsuda. 2004. "Visualization of Music Performance as an Aid to Listener's Comprehension." Proceedings of the Working Conference on Advanced Visual Interfaces, pp. 103–106.

Hu, N., R. B. Dannenberg, and G. Tzanetakis. 2003. "Polyphonic Audio Matching and Alignment for Music Retrieval." Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 185–188.

Kim, T.-H., S. I. Park, and S. Y. Shin. 2003. "Rhythmic-Motion Synthesis Based on Motion-Beat Analysis." SIGGRAPH, pp. 392–401.

Knight, T., N. Bouillot, and J. R. Cooperstock. 2012. "Visualization Feedback for Musical Ensemble Practice: A Case Study on Phrase Articulation and Dynamics." Proceedings of the Conference on Visualization and Data Analysis (pages unnumbered).

Koivunen, N. 2002. "Organizational Music, the Role of Listening in Interaction Processes." Consumption, Markets and Culture 5(1):99–106.

Kost, S. 2010. "Writing Audio Applications using GStreamer." Proceedings of the Linux Audio Conference, pp. 15–20.

Kurth, F., et al. 2004. "SyncPlayer: An Advanced System for Multimodal Music Access." Proceedings of the International Conference on Music Information Retrieval, pp. 381–388.

Lindemann, E. 2007. "Music Synthesis with Reconstructive Phrase Modeling." IEEE Signal Processing Magazine 24(2):80–91.

Lord, A. 2009. "Harmonix 'Rock Band II'." SIGGRAPH Computer Animation Festival, pp. 62–63.

Mazzola, G., and S. Goller. 2002. "Performance and Interpretation." Journal of New Music Research 31(3):221–232.

McAngus Todd, N. P. 1992. "The Dynamics of Dynamics: A Model of Musical Expression." Journal of the Acoustical Society of America 91(6):3540.

Müller, M. 2007. Information Retrieval for Music and Motion. Secaucus, New Jersey: Springer-Verlag.

Nakra, T., et al. 2009. "The UBS Virtual Maestro: An Interactive Conducting System." Proceedings of the International Conference on New Interfaces for Musical Expression, pp. 250–255. Available online at www.cs.illinois.edu/~paris/pubs/nakra-nime09.pdf.

Ng, K. 2008. "An Overview of the i-Maestro Project and an Introduction to the 4th i-Maestro Workshop." Proceedings of the 4th i-Maestro Workshop on Technology Enhanced Music Education (pages unnumbered). Available online at www.i-maestro.org/workshop.

Olmos, A., et al. 2011. "Where Do You Want Your Ears? Comparing Performance Quality as a Function of Listening Position in a Virtual Jazz Band." Proceedings of the 8th Sound and Music Computing Conference (pages unnumbered). Available online at www.padovauniversitypress.it/proceedings-of-the-smc-2011-8th-sound-and-music-computing-conference.

Percival, G., Y. Wang, and G. Tzanetakis. 2007. "Effective Use of Multimedia for Computer-Assisted Musical Instrument Tutoring." Proceedings of the International Workshop on Educational Multimedia and Multimedia Education, pp. 67–76.

Progler, J. A. 1995. "Searching for Swing: Participatory Discrepancies in the Jazz Rhythm Section." Ethnomusicology 39(1):21–54.

Psotka, J. 1995. "Immersive Training Systems: Virtual Reality and Education and Training." Instructional Science 23(5):405–431.

Raphael, C. 2006. "Demonstration of Music Plus One: A Real-Time System for Automatic Orchestral Accompaniment." Proceedings of the 21st National Conference on Artificial Intelligence, pp. 1951–1952.

Reidsma, D., A. Nijholt, and P. Bos. 2008. "Temporal Interaction between an Artificial Orchestra Conductor and Human Musicians." Computers in Entertainment 6(4):1–22.

Robine, M., G. Percival, and M. Lagrange. 2007. "Analysis of Saxophone Performance for Computer-Assisted Tutoring." Proceedings of the International Computer Music Conference, vol. 2, pp. 381–384.

Schoonderwaldt, E., A. Askenfelt, and K. F. Hansen. 2005. "Design and Implementation of Automatic Evaluation of Recorder Performance in IMUTUS." Proceedings of the International Computer Music Conference, pp. 97–103.

Schoonderwaldt, E., K. Hansen, and A. Askenfelt. 2004. "IMUTUS—An Interactive System for Learning to Play a Musical Instrument." Proceedings of the International Conference of Interactive Computer Aided Learning, pp. 143–150.


Schober, M. F. 2006. "Virtual Environment for Creative Work in Collaborative Music-Making." Virtual Reality 10(2):85–94.

Shepard, R. N. 1964. "Circularity in Judgments of Relative Pitch." Journal of the Acoustical Society of America 36(12):2346–2353.

Vercoe, B. 1984. "The Synthetic Performer in the Context of Live Performance." Proceedings of the International Computer Music Conference, pp. 199–200.

Widmer, G. 2002. "Machine Discoveries: A Few Simple, Robust Local Expression Principles." Journal of New Music Research 31(1):37–50.

Widmer, G., and W. Goebl. 2004. "Computational Models of Expressive Music Performance: The State of the Art." Journal of New Music Research 33(3):203–216.

Winget, M. A. 2008. "Annotations on Musical Scores by Performing Musicians: Collaborative Models, Interactive Methods, and Music Digital Library Tool Development." Journal of the American Society for Information Science and Technology 59(12):1878–1897.

Woszczyk, W., et al. 2005. "Shake, Rattle, and Roll: Getting Immersed in Multisensory, Interactive Music via Broadband Networks." Journal of the Audio Engineering Society 53(4):336–344.

Yin, J., Y. Wang, and D. Hsu. 2005. "Digital Violin Tutor: An Integrated System for Beginning Violin Learners." Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 976–985.

Zimmermann, R., et al. 2008. "Distributed Musical Performances: Architecture and Stream Management." ACM Transactions on Multimedia Computing, Communications and Applications 4(2):1–23.
