
Multilayer Formats and the Semantic Web: a Music Case Study

Adriano Baratè
Laboratorio di Informatica Musicale
Dipartimento di Informatica
Università degli Studi di Milano
Via Comelico, 39
Milano, Italy
[email protected]

Goffredo Haus
Laboratorio di Informatica Musicale
Dipartimento di Informatica
Università degli Studi di Milano
Via Comelico, 39
Milano, Italy
[email protected]

Luca A. Ludovico
Laboratorio di Informatica Musicale
Dipartimento di Informatica
Università degli Studi di Milano
Via Comelico, 39
Milano, Italy
[email protected]

ABSTRACT
The advent of the so-called Semantic Web led to the transformation of the World Wide Web into an environment where documents are associated with data and metadata. The latter kind of information specifies the semantic context of data in a format that can be queried and interpreted automatically. Extensible Markup Language (XML) is extensively used in the Semantic Web, since this format supports not only human-readable but also machine-readable tags. On the one hand, the Semantic Web aims to create a set of automatically detectable relationships among data, thus providing users with a number of non-trivial paths to navigate information in a geographically distributed framework. On the other hand, multilayer formats typically operate in a similar way, but at a "local" level: information is contained, hierarchically structured and interconnected within a single document. In this context, too, XML is extensively adopted. The goal of the present work is to discuss the possibilities emerging from a combined approach, namely adopting multilayer formats in the Semantic Web, addressing in particular augmented-reality applications. From this point of view, an XML-based international standard known as IEEE 1599 will be employed to show a number of innovative applications in music.

Categories and Subject Descriptors
H.5.3 [Information Interfaces and Presentation]: Group and Organization Interfaces—Web-based Interaction; H.5.5 [Information Interfaces and Presentation]: Sound and Music Computing; J.5 [Arts and Humanities]: Performing Arts

Keywords
Semantic Web, Multilayer formats, Multimedia, Music, XML, IEEE 1599

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

AM15, October 07–09, 2015, Thessaloniki, Greece
© 2015 ACM. ISBN 978-1-4503-3896-7/15/10...$15.00

DOI: http://dx.doi.org/10.1145/2814895.2814910

1. INTRODUCTION
The concept of a semantic network as a model to represent structured knowledge dates back to the 1960s [6, 9, 10]. In the era of computing, the term "Semantic Web" was coined by Tim Berners-Lee [4], the inventor of the World Wide Web and director of the World Wide Web Consortium (W3C). The Semantic Web is meant to extend the network of hyperlinked, human-readable Web pages by inserting machine-readable metadata about contents and how they are related to each other. As a consequence, ad-hoc automated agents can access the Web more intelligently and perform specific tasks on behalf of users.

In this context, linked data is a method of publishing structured data so that they can be interlinked and retrieved through semantic queries. Web technologies and standards are employed to share information that is automatically interpretable by computers, rather than merely to provide contents readable by humans. This enables data from different sources to be connected and queried within the framework of the Semantic Web [5].

Currently, a strong interest is emerging in linked open datasets. The concept of openness implies that a piece of data is open if anyone is free to use, reuse, and redistribute it, subject only – at most – to the requirement to attribute and/or share alike. The goal of the Linking Open Data community project, launched by the W3C Semantic Web Education and Outreach group, is to extend the Web with a data commons by publishing various open datasets on the Web and allowing a cloud-based interactive visualization of the linked datasets.

Many datasets currently adhere to the cloud of linked open datasets, as shown in Figure 1. For our goals, the interconnections among media- and music-related initiatives (shown in the upper-left part of the diagram) are particularly relevant. These concepts will be reviewed in the next section.

As regards the structure of this work, Section 2 presents related work concerning multilayer formats and Semantic Web applications respectively. Section 3 provides an overview of the key characteristics of the IEEE 1599 standard. Section 4 then shows how a combined use of the Semantic Web on one side and a suitable multilayer format on the other may encourage innovative applications. Finally, the proposed results are generalized and discussed in Section 5.


Figure 1: Linked Open Data Cloud in August 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. Licensed under CC BY-SA 3.0 via Wikimedia Commons.


2. RELATED WORK
As mentioned above, the goal of this paper is to show the potential of a combined approach of local and global semantic networks, represented by multilayer formats and the Semantic Web respectively. In this section we review the most significant initiatives in these two fields.

2.1 Multilayer Formats for Music
The word "multilayer" means relating to, or consisting of, several layers. Multilayer file formats are commonly used in information technology. For instance, nowadays most image-editing software supports objects on independent overlying canvases, and produces multilayer files to keep them arranged in layers with appropriate offsets. Besides proprietary file formats, it is worth citing an extension of the Tagged Image File Format (TIFF). Similarly, Scalable Vector Graphics (SVG) is a royalty-free format developed and maintained by the W3C which is capable of emulating layers thanks to the more powerful and generic concept of group.

In this context, conversely, we are interested in a specific meaning of the word "multilayer". From our point of view, such a concept is well suited to the description of complex entities presenting a number of facets: multimedia in general – and particularly music – can provide relevant examples in this sense.

As regards the former field, it is worth citing the Moving Picture Experts Group (MPEG) formats. For example, MPEG-4 deals with the actual encoding of moving pictures and audio, presenting additional features such as extended VRML support for 3D rendering, object-oriented composite files, support for externally specified Digital Rights Management (DRM), and various types of interactivity [14]. By contrast, MPEG-7 – formally called Multimedia Content Description Interface – is a standard that allows fast and efficient searching for material that is of interest to the user. It uses XML to store metadata, and can be attached to timecode in order to tag or synchronize particular events, such as the lyrics of a song [8]. The combination of MPEG-4 and MPEG-7 provides a multilayer environment for the comprehensive description of multimedia.

As regards music, the applicability of multilayer formats was postulated in a number of scientific works, such as [7], [12] and [17]. Thanks to its intrinsic characteristics, XML is generally considered the best language to encode music data and metadata. In this context it is worth mentioning the Music Encoding Initiative (MEI), namely a markup language for representing the structural, renditional, and conceptual features of notated music [16]. This format has reached full maturity as regards notation, thanks both to the MEI community's efforts and to its intrinsic extensibility. As reported in [11], thanks to the MEI 2011 Schema, extension and customization can easily be applied to the core set of elements to produce custom encoding systems that extend support to new types of musical documents.

In our opinion, a comprehensive description of music can go beyond the kind of information supported by MEI and other similar initiatives. For this reason we adopt IEEE 1599, a music-oriented XML format internationally standardized by the IEEE Computer Society in 2008. It explicitly introduces the concept of layer as a way to keep heterogeneous information organized, and the idea of spine as a way to interconnect different descriptions of the same information entity within the same layer or across multiple layers. The guidelines and key characteristics of this format are outlined in Section 3.

2.2 Music Datasets in the Semantic Web
A relevant experience regarding the automatic interlinking of music datasets through the Semantic Web is described in [15]. This paper presents an algorithm which takes into account both the similarities of Web resources and those of their neighbors. The algorithm is tested in two different contexts: i) linking a Creative Commons music dataset to an editorial one, and ii) linking a personal music collection to the corresponding Web identifiers.

In the context of open data, it is worth citing the DBpedia project, a community effort to extract structured information from Wikipedia and to make this information available on the Web [1]. The main goal of DBpedia is to serve as a nucleus for an emerging Web of open data by allowing sophisticated queries against datasets derived from Wikipedia, by linking Wikipedia data to other datasets available on the Web, and by publishing data for both human and machine consumption. With reference to Figure 1, DBpedia sits at the center of the diagram and is strongly interconnected with many other open-data initiatives.

This project has attracted (and still attracts) interest that goes beyond the scope of pure scientific research. As regards the relationship between media and the Semantic Web, DBpedia inspired the data-integration and document-linking activities performed by the BBC [13]. The idea was to use Semantic Web technology – not only DBpedia but also Linked Data, MusicBrainz, etc. – to move across different information domains.

In an early stage of the Semantic Web, when the term had just begun to circulate, experts immediately saw the possibility of applying a semantic approach to music cataloging. A pioneering project was the aforementioned MusicBrainz, a large database of user-contributed music metadata [18].

3. IEEE 1599 KEY FEATURES
IEEE 1599 is an international standard sponsored by the Computer Society Standards Activity Board, designed by the Technical Committee on Computer Generated Music (IEEE CS TC on CGM), and officially recognized by the IEEE in 2008. IEEE 1599 adopts XML (eXtensible Markup Language) to describe a music piece in all its aspects [2]; this result is achieved thanks to the multilayer structure described below. We adopt this language since it locally provides a semantic network of information. In this context we use the word "local" to distinguish the information patterns available within a single XML document from those retrievable from the network. In the following, we explore the possibilities offered by an integration of IEEE 1599 with the Semantic Web.

The main goal of the format is to provide a comprehensive description of music and music-related material within a single framework. The descriptions of a music piece are multiple and heterogeneous: its symbolic content, intended here as a sequence of music symbols; all its graphical and audio reifications (e.g. scores and recordings); additional metadata (e.g. catalogue metadata, lyrics, etc.) and materials (e.g. photos, playbills, etc.); and so on.

Comprehensiveness in music description is supported by IEEE 1599 thanks to a multilayer environment based on XML hierarchical structures. In particular, music and music-related contents are placed within six layers:

• General - catalogue information and other metadata;

• Logic - the logical description of the score in terms of music symbols;

• Structural - identification of music objects and their mutual relationships;

• Notational - graphical score representations;

• Performance - computer-based descriptions and performances of the piece;

• Audio - digital or ripped recordings.

Music events are uniquely identified in the encoding, so that they can be described in different layers (e.g. the graphical aspect of a chord and its audio performance) and multiple times within a single layer (e.g. different performances of the same music event). This is the role of a common data structure known as the spine, which marks all music events through unique identifiers and intrinsically allows the establishment of a complex network of relationships among descriptions and reifications of such music events.
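To make the role of the spine more concrete, the following sketch builds a deliberately simplified XML fragment inspired by the layer/spine organization described above and shows how spine identifiers let an application resolve the different descriptions of the same music event. The element and attribute names are illustrative assumptions, not the normative ones defined by the IEEE 1599 schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment inspired by the IEEE 1599 layer/spine idea.
# Element and attribute names are illustrative, not those of the real schema.
DOC = """
<piece>
  <logic>
    <spine>
      <event id="e1"/>
      <event id="e2"/>
    </spine>
  </logic>
  <notational>
    <graphic event_ref="e1" page="1" x="120" y="340"/>
    <graphic event_ref="e2" page="1" x="180" y="340"/>
  </notational>
  <audio>
    <track file="recording.mp3">
      <sync event_ref="e1" time="12.40"/>
      <sync event_ref="e2" time="13.10"/>
    </track>
  </audio>
</piece>
"""

root = ET.fromstring(DOC)

# For each spine event, collect every description that references it,
# regardless of the layer it belongs to.
for event in root.iter("event"):
    eid = event.get("id")
    descriptions = [el.tag for el in root.iter() if el.get("event_ref") == eid]
    print(eid, "->", descriptions)
# Prints:
# e1 -> ['graphic', 'sync']
# e2 -> ['graphic', 'sync']
```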

In particular, the IEEE 1599 multilayer environment supports two categories of relationships:¹

1. Inter-layer relationships, which occur among contents described in different layers. Different layers store – by definition – heterogeneous information. An example is automatic score following, an activity permitted by the relationships between Notational and Audio layer contents;

2. Intra-layer relationships, which occur among the contents of a single layer. Each layer contains – by definition – homogeneous information. An example is comparing how two singers perform the same piece, a result that can be achieved by moving across different instances of the Audio layer. A small sketch illustrating both categories follows.
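As a minimal illustration of these two categories, the sketch below assumes that the spine-based synchronization has already been extracted into plain Python dictionaries (the event identifiers and timings are hypothetical): an inter-layer lookup maps a playback time in one recording to the score position of the current event, while an intra-layer lookup jumps to the corresponding time in a second recording of the same piece.

```python
# Hypothetical synchronization data extracted from the Notational and Audio layers:
# spine event id -> onset time (seconds) in each recording, and score position.
AUDIO_A = {"e1": 12.4, "e2": 13.1, "e3": 13.9}      # e.g. one recording
AUDIO_B = {"e1": 0.8,  "e2": 1.6,  "e3": 2.5}       # e.g. a different performer
SCORE   = {"e1": (1, 120), "e2": (1, 180), "e3": (1, 240)}  # (page, x offset)

def current_event(sync, t):
    """Return the spine event whose onset is the latest one not after time t."""
    started = [(onset, eid) for eid, onset in sync.items() if onset <= t]
    return max(started)[1] if started else None

# Inter-layer relationship: score following while recording A plays.
eid = current_event(AUDIO_A, 13.3)
print("highlight score position", SCORE[eid])          # -> (1, 180)

# Intra-layer relationship: switch to recording B at the same music event.
print("seek recording B to", AUDIO_B[eid], "seconds")  # -> 1.6 seconds
```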

By combining these two aspects, it is possible to design and implement advanced frameworks for music. From a semantic point of view, the strong point of the format is the possibility to create – thanks to the spine – a network of interconnected descriptions of music events. In addition, layers contain other metadata and tags that can be used as pointers towards external data sources, as explained in Section 1 and exemplified in Section 4.

For further details about IEEE 1599, please refer either to the official IEEE repository or to Reference [3], a book that discusses many aspects of the standard in detail. In addition, an official Web site containing documentation and examples is the EMIPIU portal.²

¹ When a relationship is time-based or somehow implies the time dimension, we can consider it as a form of synchronization.
² EMIPIU Web site: http://emipiu.di.unimi.it

4. CASE STUDIES
In the following we present and discuss some music-related case studies where IEEE 1599 and the Semantic Web could significantly improve user experience. These applications have not been implemented yet; nevertheless, they could be realized with currently available technologies and devices. In most cases, the obstacles are mainly logistical and cost-related, since semantic tagging and open-data collections are not yet sufficiently pervasive, and ad-hoc actions would be needed to make these proposals effective.

4.1 Interactive Street Posters
Posters are a common form of billboard advertising, located along roads to be viewed mainly by residents, pedestrians and commuter traffic. The goal of advertising is to catch attention in order to persuade an audience about a commercial offering, an idea to convey, or the availability of a service.

Besides well-known street posters, digital billboards capable of displaying running text, graphics and even audio are available too. Clearly, the presence of an underlying computer system may greatly enhance user experience and content customization. An example is the Spotify Powered Interactive Music Poster released in April 2012, an apparently traditional poster embedded with a knock sensor to detect vibrations, which in turn is hooked up to a micro Arduino board and connected to Spotify.

Instead of exploring advanced evolutions of advertising, we will concentrate on standard posters to show their possible revivification in a Web 3.0 framework. In particular, we take into account a poster advertising a jazz music event. In order to support advanced features, we simply equip it with a machine-readable two-dimensional barcode, known as a Quick Response (QR) Code. In this way, common devices such as smartphones can automatically detect and interpret the small amount of information (e.g. a Web address) sufficient to enable a number of advanced features, as detailed below.
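As a starting point for this scenario, the sketch below shows how a client application might extract the embedded Web address from a photo of the poster. It assumes the third-party pyzbar and Pillow packages and a hypothetical image file name; it is only one of many possible ways to read a QR Code.

```python
from PIL import Image             # Pillow, for loading the photo
from pyzbar.pyzbar import decode  # pyzbar, for barcode/QR Code detection

# Hypothetical snapshot of the poster taken with a smartphone camera.
photo = Image.open("jazz_poster.jpg")

# decode() returns one result per detected barcode; a QR Code carrying a URL
# exposes its payload as bytes in the .data attribute.
for symbol in decode(photo):
    if symbol.type == "QRCODE":
        url = symbol.data.decode("utf-8")
        print("Poster links to:", url)
        # From here the application can fetch a plain Web page or,
        # as discussed below, an IEEE 1599 document.
```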

Let us consider a jazz event dedicated to Billie Holiday. During the concerts, a number of tribute bands will perform the greatest hits of the American singer and songwriter. The poster advertising the event, shown in Figure 2, provides only basic details about the date and place of the festival. On the one hand, the attention of jazz lovers will certainly be caught by this announcement; on the other, many people passing by will wonder about the schedule of the concerts, the names of the performers, or even the nature of the event. Thanks to the availability of a QR Code, they could simply frame the poster with a suitable smartphone application and obtain a number of metadata and services. In an augmented-reality context, this could even happen automatically thanks to wearable technology such as Google Glass.

Let us now produce and discuss a non-exhaustive list of possible applications. First, the QR Code could redirect to the Web site of the event, thus providing a lot of additional information and offering the possibility to buy tickets online. Even though we have turned a traditional poster into an access point to Web contents, in the mentioned cases we are still far from exploiting the Semantic Web.

Now let us link the QR Code to an IEEE 1599 document containing multiple descriptions of a jazz piece, for instance Billie Holiday's 1936 performance of Summertime, the aria composed in 1934 by George Gershwin for the opera Porgy and Bess. Provided that the device in use can parse IEEE 1599 contents through a suitable application, it is now possible to exploit the large amount of data and metadata contained in the mentioned XML document. The use cases and services that can be implemented include: letting the user enjoy the complete original recording or other free audio tracks, retrieving information about all the singers who performed Summertime, implementing a simple score following for the main theme, and so forth.

If the described IEEE 1599-based access is embedded in a network environment, or even better in the Semantic Web, the range of supported applications widens considerably (see Figure 3). For instance, it is possible:

• to provide a preview of the concert, including audio and video contents recorded during a rehearsal session;

• to get historical photos of Billie Holiday from a publicly available repository, as well as the list of all movies containing Summertime in their soundtrack;

• to follow the career of one of the involved artists and to reserve a ticket for an upcoming event.

In this context, IEEE 1599 is important not only as local data storage (providing scores, audio tracks, synchronization information, etc.), but also as an authoritative source to query the Semantic Web thanks to tagged metadata. These concepts will be discussed and generalized in Section 5.
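As an example of this outward-facing role, the following sketch takes a composer name, as it might be read from the General layer of the IEEE 1599 document, and uses it to query the public DBpedia SPARQL endpoint for other works by the same composer. The dbo:composer property and the resource name are assumptions about the current DBpedia vocabulary and may need adjusting.

```python
import json
import urllib.parse
import urllib.request

# Composer name as it might be read from the General layer of the document.
composer = "George_Gershwin"

# Assumed DBpedia vocabulary: dbo:composer links a work to its composer.
query = f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?work WHERE {{ ?work dbo:composer dbr:{composer} }} LIMIT 10
"""

params = urllib.parse.urlencode({
    "query": query,
    "format": "application/sparql-results+json",
})
with urllib.request.urlopen("https://dbpedia.org/sparql?" + params) as response:
    results = json.load(response)

# Print the URIs of the retrieved works.
for binding in results["results"]["bindings"]:
    print(binding["work"]["value"])
```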

In conclusion, a typically passive way of communicating with an audience can become interactive, entertaining and even addictive. Keeping in mind the original purpose of an advertising poster, its power in terms of communication efficacy and its ability to arouse interest are greatly increased.

4.2 Augmented Opera
Augmented Opera is a proposal to apply augmented reality to the live opera experience. This example involves both networking – in order to support user interaction with remote contents – and a multilayer format to encode local information. In addition, tags and other semantic identifiers must be placed accordingly.

For instance, let us consider an IEEE 1599 document encoding a complete opera, e.g. Turandot by Giacomo Puccini. This implies the availability of:

• Catalogue metadata that can be used in a Semantic Web context to link other contents. For instance, it is possible to query an open-data repository to get information about Puccini's last years of life or about other works by Giuseppe Adami, marked in the document as one of the librettists;

• Symbolic information that virtually allows one to attach alternative descriptions and renditions to music events with the desired granularity, even note by note. In this way, it is possible to display the libretto together with the music, provided that there is a synchronization source (this process could be automated thanks to pitch-tracking techniques or left to human control);

• A number of score versions that can be experienced in a score-following environment during the show;

• Alternative audio and video contents, which will probably not be launched during the live performance but will provide other links in terms of metadata (e.g. historical performances, great interpreters, etc.) as well as online purchase suggestions;

• Character identification on the stage through movable video tags;

• Additional graphic contents (such as related artwork) to improve user experience.

After building a semantic network of information, all these data can be suitably presented thanks to an augmented-reality device such as Google Glass. A use case is shown in Figure 4. The interface contains a number of advanced features, such as dynamic identification of characters, automatic score and libretto following with user-defined translation, links to external contents, and so on. Of course, the interface must be designed to enhance user experience rather than produce information overload, a typical risk of augmented-reality initiatives.

As mentioned above, some additional work could be required to configure the environment accordingly. For instance, the automatic identification of the artists on the stage – regarding not only their presence but also their current position – would enable a number of advanced features: video captions could be placed near the corresponding singer, a feature particularly relevant when independent voices sing simultaneously in operatic ensemble pieces. Unfortunately, currently available technology does not allow a reliable face-recognition algorithm to be embedded in a wearable device, due to both limited computational resources and environmental characteristics (darkness, distance, etc.). Nevertheless, alternative technological solutions are available, such as Wi-Fi-based positioning systems (WPS), which provide a way to attach tracking tags to moving objects.

A similar situation occurs for automatic score (and libretto) following. In fact, even if the relationships among music symbols, graphic scores and libretto are made explicit by the IEEE 1599 document (i.e. they are locally synchronized), these "packets" should be synchronized with the current live performance, whose timing cannot be determined a priori. In this context, too, an automatic approach – e.g. pitch/beat-tracking algorithms – would be helpful, but the required level of reliability discourages its adoption. For instance, some opera houses (e.g. Teatro alla Scala in Milan) entrust the advancing of the lyrics display to a manual process performed by a dedicated expert during the live show.
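The sketch below illustrates how such a manually driven process could still exploit the local synchronization encoded in the IEEE 1599 document: an operator marks a cue whenever a known spine event occurs on stage, and the application advances libretto and score accordingly. The data structures and event identifiers are hypothetical; this is a sketch of the idea, not part of the standard.

```python
import time

# Hypothetical local synchronization taken from the document: the libretto line
# and score position attached to each spine event.
LOCAL_SYNC = {
    "e1": {"libretto": "Nessun dorma! Nessun dorma!", "score_page": 312},
    "e2": {"libretto": "Tu pure, o Principessa...",   "score_page": 313},
}

# Cues registered during the live show: (wall-clock time, spine event id).
# In practice these would be entered by a dedicated operator.
live_cues = []

def mark_cue(event_id):
    """Called by the operator when the given spine event occurs on stage."""
    live_cues.append((time.time(), event_id))

def current_display():
    """Return the libretto line and score page for the most recent cue."""
    if not live_cues:
        return None
    _, event_id = live_cues[-1]
    return LOCAL_SYNC[event_id]

# Simulated usage during the performance.
mark_cue("e1")
print(current_display())   # -> libretto and score page for event e1
mark_cue("e2")
print(current_display())   # -> now advanced to event e2
```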

A positive side effect of this approach is the possibility to create a user-tailored environment that does not interfere with the rest of the audience. For example, the debate on installing libretto/lyrics displays in concert halls and opera houses mainly depends on the possible inconvenience caused to the part of the audience not interested in the scrolling text. By contrast, this kind of solution is personal and can be customized in many respects: language, font size, position of the information on the screen, etc.

Figure 2: An example of an interactive street poster.

Figure 3: Interaction with street-poster contents.

Figure 4: An example of an augmented-reality application for an interactive opera experience.

5. GENERALIZATION AND DISCUSSION
A matter of debate could be whether what is described in Section 4 could be implemented by adopting either a suitable multilayer format or the Semantic Web alone. In fact, a number of advanced applications for multimedia consumption, music education, cultural heritage revivification, etc. have been designed without building complex and distributed semantic structures. In response to this concern, we recall that both approaches – corresponding to local vs. global semantic relationships – have their own characterization and typical use. An in-depth analysis of the applications described above clearly shows that only a subset of their features could be designed and implemented without this integration; the most advanced services and use cases depend on the availability of strong interconnections between local and global information.

Such case studies can be generalized and extended to non-music fields. What emerges from their analysis can be summarized as follows:

• A multilayer format – here intended as a way to describe the multiple facets of a given information entity – provides a network of interrelated data having its own semantics. An application capable of managing such a format should provide an interface between local semantics and users, in order to make relationships emerge and let local data be semantically queryable;

• If the mentioned multilayer format contains authoritative tagging, as in most XML-based languages, this kind of local information can be used to link and explore external semantic datasets;

• The Semantic Web and open-data initiatives can benefit from the integration with multilayer formats from a number of perspectives, both as content and service providers responding to locally-driven semantic queries, and as potential receivers of multilayer contents contained in local resources, to be embedded into a wider semantic network.

6. CONCLUSIONS
In this work we discussed the advantages of employing a multilayer format in a Semantic Web context in order to foster advanced music experiences and implement innovative services. Music, with its multiple facets, represented a domain in which to test the suitability of multilayer formats and their integration with network-based and open-data scenarios.

When wearable technologies become widespread among users, augmented reality finally leaves the stage of theoretical research to enter the exploitation phase, and semantic tagging is performed extensively, the approach of the user community towards information interconnection and data exchange will be revolutionized, and many proposals similar to those outlined above will change our lifestyle.

7. REFERENCES
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A Nucleus for a Web of Open Data. Springer, 2007.

[2] D. Baggi and G. Haus. IEEE 1599: Music encoding and interaction. Computer, 42(3):84–87, 2009.

[3] D. L. Baggi and G. M. Haus. Music Navigation with Symbols and Layers: Toward Content Browsing with IEEE 1599 XML Encoding. John Wiley & Sons, 2013.

[4] T. Berners-Lee, J. Hendler, O. Lassila, et al. The Semantic Web. Scientific American, 284(5):28–37, 2001.

[5] C. Bizer, T. Heath, and T. Berners-Lee. Linked data – the story so far. In Semantic Services, Interoperability and Web Applications: Emerging Concepts, pages 205–227, 2009.

[6] R. J. Brachman. What's in a concept: Structural foundations for semantic networks. International Journal of Man-Machine Studies, 9(2):127–152, 1977.

[7] G. Castan, M. Good, and P. Roland. Extensible Markup Language (XML) for music applications: An introduction. In Computing in Musicology, volume 12, pages 95–102. MIT Press, Cambridge, MA, 2001.

[8] S.-F. Chang, T. Sikora, and A. Puri. Overview of the MPEG-7 standard. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):688–695, 2001.

[9] A. M. Collins and E. F. Loftus. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407, 1975.

[10] A. M. Collins and M. R. Quillian. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8(2):240–247, 1969.

[11] A. Hankinson, P. Roland, and I. Fujinaga. The Music Encoding Initiative as a document-encoding framework. In ISMIR, pages 293–298, 2011.

[12] G. Haus and M. Longari. A multi-layered, time-based music description approach based on XML. Computer Music Journal, 29(1):70–85, 2005.

[13] G. Kobilarov, T. Scott, Y. Raimond, S. Oliver, C. Sizemore, M. Smethurst, C. Bizer, and R. Lee. Media meets Semantic Web – how the BBC uses DBpedia and linked data to make connections. In The Semantic Web: Research and Applications, pages 723–737. Springer, 2009.

[14] R. Koenen, F. Pereira, and L. Chiariglione. MPEG-4: Context and objectives. Signal Processing: Image Communication, 9(4):295–304, 1997.

[15] Y. Raimond, C. Sutton, and M. Sandler. Automatic interlinking of music datasets on the Semantic Web. In Proceedings of the Linked Data on the Web Workshop (Beijing, China, April 22, 2008), pages 28–37. CEUR Workshop Proceedings, 2008.

[16] P. Roland. The Music Encoding Initiative (MEI). In Proceedings of the First International Conference on Musical Applications Using XML, pages 55–59. IEEE, 2002.

[17] J. Steyn. Framework for a music markup language. In Proceedings of the First International IEEE Conference on Musical Applications Using XML (MAX2002), pages 22–29, 2002.

[18] A. Swartz. MusicBrainz: A semantic Web service. IEEE Intelligent Systems, 17(1):76–77, January 2002.