
Creating Visual Music in Jitter: Approaches and Techniques


Randy Jones* and Ben Nevile†
*2045 13th Avenue W, Seattle, Washington 98119 USA, [email protected]
†University of Victoria / Cycling '74, 304-3122 Quebec Street, Vancouver, British Columbia V5T 3B4 Canada, [email protected]


"Visual music" is a term used to refer to a broad range of artistic practices, far-flung temporally and geographically yet united by a common idea: that visual art can aspire to the dynamic and nonobjec- tive qualities of music (Mattis 2005). From paint- ings to films-and now to computer programs-the manifestations of visual music have evolved along with the technology available to artists. Today's in- teractive, computer-based tools offer a variety of possibilities for relating the worlds of sound and image; as such, they demand new conceptual ap- proaches as well as a new level of technical compe- tence on the part of the artist.

Jitter, a software package first made available in 2002 by Cycling '74, enables the manipulation of multidimensional data in the context of the Max programming environment. An image can be conveniently represented by a multidimensional data matrix, and indeed Jitter has seen widespread adoption as a format for manipulating video, both in non-real-time production and improvisational contexts. However, the general nature of the Jitter architecture is well suited to specifying interrelationships among different types of media data, including audio, particle systems, and the geometrical representations of three-dimensional scenes.

This article is intended to serve as a starting point and tutorial for the computer musician interested in exploring the world of visual music with Jitter. To understand what follows, no prior experience with Jitter is necessary, but we do assume a familiarity with the Max/MSP environment. We begin by briefly discussing strategies for the mapping of sound to image; influences here include culturally learned and physiologically inherent cross-modal associations, different domains of association, and musical style. We then introduce Jitter, the format of image matrices, and the software's capabilities for drawing hardware-accelerated graphics using the OpenGL standard. This is followed by a survey of techniques for acquiring event and signal data from musical processes. Finally, a thorough treatment of Jitter's variable frame-rate architecture and the Max/MSP/Jitter threading implementation is presented, because a good understanding of these mechanisms is critical when designing a visualization and/or sonification network.

Visualizing Through Mappings

Let us consider the compositional problem of creating a dynamic visual counterpart to a given musical work. Jitter's dataflow paradigm, inherited from the Max environment, lends itself to use in composition as a means of defining relationships between changing quantities. Mappings are transformations used to convert input parameters to outputs in a different domain. Bevilacqua et al. (2005) present an overview of recent work on mappings in the context of gestural control. Classes of mappings based on the number of inputs and outputs include many-to-one, one-to-many, and one-to-one. In discussing visual music, the choice of parameters themselves will be our main concern. Given the large number of parameters that can be generated from musical data, and by which images can be specified, even the number of possible one-to-one mappings is too large to be explored fruitfully without some guiding principles. Significant works of visual music from artists including John Whitney, Norman McLaren, and Oskar Fischinger have shown the possibility of composing with sound and image to create a whole that is larger than the sum of the parts (iotaCenter 2000). By analyzing the mappings that underlie such works and considering some results from the study of human audiovisual perception, we can point to several avenues for exploration.


Computer Music Journal, 29:4, pp. 55-70, Winter 2005

© 2005 Massachusetts Institute of Technology.


Figure 1. The "Kiki and Bouba" experiment, proposed by Köhler (1929) and refined by Werner (1934).

Synaesthetic Mappings

Synaesthesia is a psychological term that refers to a mixing of the senses that occurs in certain individuals. It is said to occur when a person perceives something via one sense modality and experiences it through an additional sense. Though synaesthesia does occur between sounds and colors, its most common form combines graphemes (written characters) with colors. Grapheme/color synaesthetes perceive each letter with an accompanying color, linked by a consistent internal logic. The letter "T," for example, might always appear green to such a person, while "O" is always blue. Based on psychophysical experiments, Ramachandran and Hubbard (2001) have demonstrated that this form of synaesthesia is a true sensory phenomenon rather than a conceptual one. At first blush, this consistent pairing of different sense modes might thereby seem to be a good foundation for audio/visual composition. However, the logic of synaesthesia is entirely subjective. The mappings derived by interviewing a given person with synaesthesia, for example, are not likely to be meaningful to another synaesthete, let alone any given viewer.

Some audio-to-visual mappings do exist that are likely to reveal their logic to a variety of listener/viewers, owing to a basis in human perception or physics. For example, multiple concurrent voices in a work of music can be perceived in two complementary ways: reductionistically and holistically. Trained listeners can choose to devote more of their attention to one or more voices or to the whole. Likewise, viewers of an animation with simultaneous discrete visual elements can focus their attention on one or more elements in the same way. In both the audio and visual domains, as more elements are added, the listener/viewer becomes less able to devote attention to them all simultaneously. This isomorphism between sensory domains indicates that mapping single musical voices in a composition to single graphical elements is a good approach for making visual experiences that can accompany music in a meaningful way.

To make a graphical voice map to a musical voice, we can consider the musical voice's fundamental frequency, amplitude, and timbre. For frequency and amplitude, certain mappings to graphical parameters can be shown to have a basis in the physical world. The smaller a physical object is, the higher the frequencies it tends to produce when resonating. Therefore, we can say that when mapping the frequency of a voice to the size of a corresponding visual object, a scale factor that maps from high notes to small shapes and from low notes to large shapes is more natural than the reverse, because it is consistent with our experience of the physical world. Likewise, amplitude of sound tends to map naturally to brightness of image, because amplitude and brightness are measurements of the same physical concept (intensity of the stimulus) in the audio and visual domains, respectively.
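As a concrete illustration, here is a minimal C sketch of these two mappings. The reference pitch, the 1/(1 + octaves) size curve, and the square-root brightness curve are our own arbitrary choices, not values taken from any of the works discussed.

    #include <math.h>

    /* Map a fundamental frequency in Hz to a drawing size, so that low
       notes give large shapes and high notes give small ones.  The A0
       reference (27.5 Hz) and the shape of the curve are assumptions. */
    double freq_to_size(double hz)
    {
        double octaves = log2(hz / 27.5);   /* octaves above A0 */
        return 1.0 / (1.0 + octaves);       /* size shrinks as pitch rises */
    }

    /* Map linear amplitude (0..1) to brightness (0..1).  Both measure
       stimulus intensity, so a monotonic map suffices; the square root
       is an arbitrary perceptual softening. */
    double amp_to_brightness(double amp)
    {
        return sqrt(amp);
    }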

Timbre is too complex a phenomenon to be represented by a small number of changing values. Characterizing musical timbre in terms of perceptually salient quantities is a complex task to which there are many approaches (Donnadieu, McAdams, and Winsberg 1994). Some musically useful approaches for mapping timbre to imagery might be suggested by results from the study of human perception. In an experiment designed by the Gestalt psychologist Wolfgang Köhler (1929) and further refined by Werner (1934), two drawings like the ones in Figure 1 were shown to a variety of people. When asked, "Which of these is a 'bouba' and which is a 'kiki'?," over 90 percent of people decided that the round shape is the "bouba" and the pointy one is the "kiki." The overwhelming verdict is striking.

In their work on synaesthesia, Ramachandran and Hubbard (2001) present strong circumstantial evidence that the bouba/kiki effect is created by cross-modal connectivity in the brain's angular gyrus, and that an excess of such connectivity is present in individuals with synaesthesia. In other words, we are all synaesthetes to a degree. This supports the potential of Köhler's earlier findings as a model for mappings of timbre to form in visual music. As with the mappings of frequency and amplitude proposed above, we see an isomorphism between audio frequencies and frequencies in another domain, this time of curvature of form. Thus, analysis in the frequency domain provides a way to generate a meaningful mapping from timbres to shapes.

Mapping Domains

We can imagine using the observations in the preceding paragraphs to construct a single audiovisual voice in a work of visual music. Its shape would change from smooth to spiky as its timbre gained additional high harmonics, its size would shrink as its fundamental frequency rose, and its brightness would rise and fall with the amplitude of its sounds, creating discrete notes that we would perceive simultaneously through our eyes and ears. From a musical point of view, such strict mappings on their own can be banal, merely reinforcing correspondences we already understand. The isomorphisms they manifest, however, are vital as a ground from which meaning can be derived. The degree to which the imagery and sound correlate through natural mappings in visual music is analogous to consonance between voices in tonal music. Each is an expression of the agreement between the parts of a composition. Each can be perceived to increase or decrease and can be measured approximately, though not quantified exactly except through arbitrary metrics. In visual music as with harmony, vitality comes not from total agreement but from movement over time between concord and discord.

Figure 2. Excerpt from Synchromy (1971) by Norman McLaren. Courtesy of the National Film Board of Canada.

The temporal scale at which a mapping operates is another degree of freedom available to the composer. Sound and image may be interrelated with a granularity ranging from milliseconds to minutes. Two examples of visual music by the Canadian artist Norman McLaren illustrate two points along this spectrum. Figure 2 shows an excerpt about 1.5 sec in length from McLaren's 1971 work Synchromy, with images from the film above the transcribed score. To realize Synchromy, McLaren composed music that he drew directly on the optical soundtrack of 35-mm film as blocks of different vertical and horizontal sizes, which are audible as square waves of different frequencies and amplitudes, respectively. The visual component of the film was created by manipulating the soundtrack on an optical printer to create multiple copies in different foreground and background colors. In this way, McLaren used the technology of film to associate sound to image millisecond by millisecond. Each note, chord, and rest is clearly visible; the close correlation produces a strong synaesthetic effect.

Figure 3. Two excerpts from Lines Horizontal (1961) by Norman McLaren and Evelyn Lambart, with score by Pete Seeger. Courtesy of the National Film Board of Canada.

Figure 3 shows two excerpts from the 1961 film Lines Horizontal by Norman McLaren and Evelyn Lambart. The flowing, emotive score by Pete Seeger correlates clearly with this minimalist study in motion if time intervals on the order of 10-20 sec are considered. The flute melody on the top staff begins the piece, accompanying the measured introduction of the basic transformation by which the film's visual complexity is generated: one line gives birth to another as it changes direction. The score for flute, banjo, and guitar modulates through several sections as the film gains visual complexity, each modulation coinciding with a change in the background color. At the film's climax, as the accumulation of lines dazzles the eye by creating an ambiguity between figure and ground, the music builds to its most harmonically adventurous point as shown by the guitar melody in the bottom staff, which is followed by a restatement of the audio and visual themes above.

A computer-mediated mapping that operates on a time scale like that of Lines Horizontal would coordinate groups of audio and visual events statistically, rather than single events. Even when distinct voices do exist, looser mappings may be more salient, either alone or in concert with more fine-grained mappings. It may also make sense to shift the conceptual framework of the visualization to match that in which the audio is composed. For example, musique concrète, which bases its meaning partly on source material from within a specific culture, may best be served by visual materials that reference that same culture.

Mapping and Musical Style

Figure 4. (a-c) Three visualizations for the Suite from David Jaffe's Seven Wonders of the Ancient World (1996).

Culture often dictates more basic isomorphisms as well. Figure 4 shows still frames from a visualization made with Jitter that Randy Jones created to accompany David Jaffe's Suite from the Seven Wonders of the Ancient World (1996), as performed by Andrew Schloss on the Radio Drum (see www.jaffe.com/7w.html). In the original Seven Wonders, Mr. Schloss controlled a Yamaha Disklavier with the Radio Drum; the moving keys of the Disklavier provided a visual accompaniment to the performance. A projected computer-visual accompaniment was created for performances of the Suite for which a Disklavier was not available. First realized using Onadime Composer (www.onadime.com/products/composer.html) and later using Jitter, the accompaniment maps musical notes to visual ones using variations of form for each section of the Suite. Each variation is presented on a grid with one octave of "keys" per row. The rows are arranged bottom to top, from low frequencies to high frequencies. This arrangement has a loose physical basis in that smaller things (higher frequencies) are generally found higher in the air than bigger things (lower frequencies). However, the left-to-right order of increasing pitches within an octave makes sense mainly because it mirrors the piano keyboard, a culturally determined arrangement.

Color is a topic to which we may already have called attention by avoiding it so studiously. Like timbre, color is a complex phenomenon that can be quantified in different ways requiring multiple variables. Thinkers since Aristotle have proposed particular connections between musical notes and the colors of the spectrum (iotaCenter 2000). However, as shown by Fred Collopy's collection of "color scales" from 1700 to the present (rhythmiclight.com/archives/ideas/colorscales.html), there is no basis for the universality of any one such mapping. Color can also be mapped to sonic parameters besides pitch, such as aspects of timbre. Nevertheless, whatever mapping is chosen, a consistent internal logic (as with mappings to shape and size) is vital to create the expectations that Meyer (1956) has shown are necessary for a work of music to convey meaning. In general, the choices of cross-modal mappings made by the composer, and their relationships to physically motivated or culturally defined mappings, help define the style of a work of visual music.


An Introduction to Jitter

The original version of Max was created to manipulate MIDI data (Puckette 1988). "Patcher," as it was called, provided composers working with MIDI tone modules a graphical environment in which to create graphs of data flow between input and output, radically simplifying the process of implementing new control systems. In its modern form, Max objects are designed to respond to messages that can be composed of multiple atoms. An atom can be a 32-bit integer, a 32-bit floating-point number, or an alphanumeric symbol. When MSP was introduced in 1998 (Zicarelli 1998), it added a new data type: the signal. Signals are streams of 32-bit floating-point samples. Unlike messages, which can occur at varying intervals and lack a precise relationship to time, signals are sampled at a fixed frequency, usually controlled by an audio output device's master clock. In the interest of computational efficiency, objects that manipulate signals perform their calculations on a group of samples each time MSP runs their code, rather than on one sample at a time. This group is known as a signal vector. With a signal vector size of 64 samples and a sampling frequency of 44.1 kHz, an MSP object's perform routine will be called roughly once every 1.45 msec.

In 2002, the introduction of the Jitter object set, primarily created by Joshua Kit Clayton, brought another new data type to the Max world. A matrix is a multidimensional container without any explicit time-dependency. The spatial dimensions of a matrix define the number of cells within the matrix. A matrix may also have more than one plane, which affords data-parallelism in a spatial sense: each cell stores one scalar value for each plane of the matrix. Matrices can store one of four scalar data types: char (8-bit unsigned bytes), int (32-bit integers), float32 (32-bit floating-point numbers), and float64 (64-bit floating-point numbers). For example, a 320 x 240 matrix with four planes of char data would have 76,800 cells and contain 307,200 unsigned bytes. Each matrix is associated with a unique name, and matrices are passed between objects by name in messages composed of the symbol jit_matrix followed by the matrix's name.

Video Matrices and Raster Possibilities

It is important to note that what we have called the spatial dimensions of a matrix need not be interpreted spatially. For instance, as we will see later, it is possible to transcode audio signals into one-dimensional matrices for Jitter-based processing, or to represent the vertices of an OpenGL geometric model as a multi-plane, one-dimensional matrix. For an image matrix, the spatial interpretation of the coordinates is correct, because the values of the cells represent the colors of a two-dimensional grid of pixels. The ARGB representation separates a color into three components (red, green, and blue) along with an additional opacity component known as the alpha channel. Accordingly, the most common Jitter image matrix represents a two-dimensional grid with four planes. Because 256^3 is equal to the roughly 16 million colors that a basic consumer video card can display, a typical Jitter network operates on image matrices of 8-bit chars. Image data may also be represented by any of the other primitive types supported by the matrix format, or as grayscale images if only one plane is present. The high-resolution image format OpenEXR (www.openexr.com) is also supported.
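For illustration, the cell arithmetic for such a matrix can be written out in C. The plane-interleaved layout assumed here is a convention for the sketch, not a statement about Jitter's internal storage.

    #include <stddef.h>

    /* A 320 x 240 matrix with 4 planes of char data:
       320 * 240            = 76,800 cells
       76,800 * 4 * 1 byte  = 307,200 bytes                       */
    unsigned char cell_value(const unsigned char *data,
                             size_t width,      /* 320  */
                             size_t x, size_t y,
                             int plane)         /* 0..3 */
    {
        /* one cell = 4 adjacent chars, one per plane (A, R, G, B) */
        return data[(y * width + x) * 4 + plane];
    }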

There are several different ways to import image data into the Max/MSP/Jitter environment. A common starting point is to play back a QuickTime movie. This can be accomplished with the jit.qt.movie object, which loads the frames of a movie from a file on disk and provides them as data in the image matrix format discussed above. Alternatively, live video can be routed into the Jitter environment using the jit.qt.grab or jit.dx.grab objects. One can also synthesize two-dimensional imagery with the drawing tools of an object like jit.lcd, with a noise generator like jit.noise, or by manipulating the cells of a matrix directly by sending setcell messages to the jit.matrix object.

Of course, there are many objects that do not generate matrices but rather accept input matrices and operate on their data. Examples of such filters are jit.brcosa and jit.traffic, which operate across planes in the color space of the image, and jit.tiffany and jit.repos, which operate on the data in the spatial dimensions. Figure 5 shows some of these Jitter objects operating on the image data produced from a jit.qt.movie object.

Figure 5. Example of a simple Jitter network. At left, the image from the jit.qt.movie object; in the middle, jit.brcosa operates on the color data of the image; at right, jit.tiffany resamples in the spatial dimension.

The World of OpenGL

OpenGL is a cross-platform standard for drawing two- and three-dimensional computer graphics, designed to provide a common interface for different types of graphics hardware. It is used in a variety of applications from video games to the Mac OS X Window Manager. It consists of two interdependent parts: a state machine with a complex, fixed internal structure for processing graphical data, and an API (application programming interface) in the C programming language for interacting with the state machine. The state machine defines a sequence of steps by which image textures can be applied to the faces of geometric primitives and rendered as seen by a virtual camera to create a final image. Many parameters that control the final image are defined, including a complex lighting model and extra processing steps that can be applied to the final rendering.

Some or all of the operations defined by the OpenGL state machine may be implemented by hardware GPUs (graphics processing units). Owing to the high degree of parallelism that can be applied to the task of drawing graphics, the past decade has seen affordable GPUs increase in speed dramatically faster than CPU speeds. This has prompted software developers to move more and more drawing tasks to the GPU, and even some non-drawing tasks such as audio processing. (See gpgpu.org and graphics.stanford.edu/projects/brookgpu for examples of this.)

Generating and Manipulating OpenGL Data in Jitter

OpenGL accepts images in a variety of formats, including the four-plane ARGB format used by Jitter. Input from live video, recorded movies, or synthesizing objects such as jit.noise can be used directly in OpenGL.

Geometries are defined in OpenGL by applying low-level geometric primitives to lists of vertices in three-dimensional space. The primitives define how a given sequence of vertices should be drawn. Examples of primitives are GL_LINE_STRIP, which connects all the vertices in the list with a line from first to last; GL_TRIANGLES, which connects each triad of vertices with a triangle; and GL_TRIANGLE_STRIP, which draws a series of triangles between the vertices to form a strip (see Figure 6).

Figure 6. Examples of OpenGL primitives: GL_LINE_STRIP, GL_TRIANGLES, and GL_TRIANGLE_STRIP.
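The following fragment sketches how such a vertex list is drawn with the fixed-function C API of that era; the vertex values are invented for the example, and the header path varies by platform (OpenGL/gl.h on Mac OS X).

    #include <GL/gl.h>

    /* Draw six vertices as a triangle strip (compare Figure 6); swapping
       the argument of glBegin() for GL_LINE_STRIP or GL_TRIANGLES
       reinterprets the same list as one of the other primitives. */
    void draw_primitive_demo(void)
    {
        const float v[6][3] = {
            {0,0,0}, {1,0,0}, {0,1,0}, {1,1,0}, {0,2,0}, {1,2,0}
        };
        glBegin(GL_TRIANGLE_STRIP);
        for (int i = 0; i < 6; ++i)
            glVertex3fv(v[i]);
        glEnd();
    }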

OpenGL geometries, like images, are stored in Jitter using matrices. Each vertex of the geometry is stored as one cell of a float32 matrix with one or two dimensions and a number of planes that can range from three to thirteen. If the matrix has three planes, the planes specify the x, y, and z components of the location of each vertex. Additional data are specified at each vertex if groups of additional planes are present, as described in Table 1.

When a geometry matrix is sent to a jit.gl.render object, the symbol specifying which OpenGL primitive to use can be appended to the jit_matrix message as an additional atom, or it can be communicated separately to the jit.gl.render object. In addition to the primitives defined by OpenGL, Jitter has two of its own primitives, TRI_GRID and QUAD_GRID, which were added to accommodate two-dimensional geometry matrices. Rendering using TRI_GRID and QUAD_GRID creates connections from each vertex to other vertices in its local neighborhood in the matrix grid. If TRI_GRID or QUAD_GRID is not used with a two-dimensional geometry matrix, the matrix is divided into rows or columns depending on the value of the geom_rows attribute of the jit.gl.render object, and the resulting one-dimensional matrices are drawn directly in OpenGL (see Figure 7).

Geometries can come from a variety of sources.

Table 1. Planes in the Jitter Geometry Matrix Format

Planes   Data group    Description
0-2      x, y, z       Position of the vertex in 3-space
3-4      s, t          Coordinates on an applied texture that will be mapped to the vertex
5-7      nx, ny, nz    Normal vector: describes the orientation of the vertex with respect to light sources
8-11     r, g, b, a    Vertex color as RGBA components
12       e             Edge flag: determines whether the following edge is considered an outer edge of a polygon

Figure 7. Effect of the geom_rows attribute. (Panels: vertices in a 2-D matrix; GL_LINE_STRIP with geom_rows = 1; GL_LINE_STRIP with geom_rows = 0.)

Models are definitions of geometry used to draw the characters and objects in video games and computer-animated movies. A variety of data may be stored with a model, including coordinates and parameters for mapping multiple surface textures onto the geometry. Models are stored in files in a variety of different formats. One common, open format is the .obj file, created by Wavefront Technologies (Rule 1996). Jitter's jit.gl.model object can read and draw .obj files into an OpenGL scene. Other objects provide different ways of creating geometries. The jit.gl.gridshape object defines a variety of geometric shapes that can be used to define collections of vertices at different spatial resolutions. The jit.gl.text3d object can render text using TrueType fonts to produce geometries. The jit.gl.plato object can create geometries of the five Platonic solids (tetrahedron, hexahedron, octahedron, dodecahedron, and icosahedron).

Finally, geometries can also be created directly by patching in the Max environment. Any matrix can be interpreted as an OpenGL geometry, provided it consists of floating-point data and has the necessary three or more planes. By using Jitter's matrix operators and built-in type conversions, a variety of shapes can be defined mathematically. Data such as moving images, the results of sound analysis, or gestures from controllers can be converted into geometry data. The jit.pack and jit.unpack objects can split the geometry matrix for processing of the desired planes, recombining it to send to the jit.gl.render object after processing. Most of the Jitter objects that are commonly used as video filters, such as jit.tiffany, jit.repos, and jit.glop, operate on data of arbitrary type and plane count. This enables their use as geometry filters as well, often with surprising and interesting results. A jit.matrixset object can be used to store animated geometries in the same way it is used to store animated images.
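As a sketch of this idea, the following C routine fills the kind of one-dimensional, three-plane float32 vertex list described above with a circle; in a patch, the same data could be produced with matrix operators and handed to jit.gl.render as a GL_LINE_STRIP. The function and parameter names are ours.

    #include <math.h>

    /* Fill n vertices (x, y, z per vertex, i.e., three "planes") with a
       circle of the given radius in the z = 0 plane. */
    void make_circle(float *xyz, int n, float radius)
    {
        const float two_pi = 6.2831853f;
        for (int i = 0; i < n; ++i) {
            float theta = two_pi * i / (n - 1);        /* last vertex closes the loop */
            xyz[3*i + 0] = radius * (float)cos(theta); /* plane 0: x */
            xyz[3*i + 1] = radius * (float)sin(theta); /* plane 1: y */
            xyz[3*i + 2] = 0.0f;                       /* plane 2: z */
        }
    }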

Input from Musical Processes

We must now discuss two different types of musical input: discrete data, which can be linked to logical musical events such as notes being played on a piano; and parametric input, which we might think of as smooth, continuous changes in a stream of data.

Of course, since we are in the digital world, we must discretely sample a continuous parameter to make it available for analysis, so an argument can be made that distinguishing between a discrete event and a continuous parameter is only valid in theory. After all, one can treat each sample in a signal as a separate event, and conversely, one can construct a signal out of an aperiodic series of events through any number of heuristics, such as simply incorporating values from new events into some sort of integration filter. Regardless, it is helpful to segment the discussion into the two types of input when thinking in terms of what one may want the visualization algorithm to do.

Events

In a live performance setting, one or more of the musical performers may be using a device capable of sending discrete messages directly to the computer. The most common format for this type of input is of course the venerable MIDI protocol; despite the protocol's not having changed much in more than 20 years, modern off-the-shelf USB and FireWire MIDI interfaces still provide a convenient port of entry into the computer. In addition, some performers have designed standalone or extended instruments with circuitry to communicate with a computer via some other means, for example, using direct serial communication.

Alternatively, one can analyze and extract events from a signal that represents some aspect of the performance. This signal could be in the form of an audio feed from a live microphone or a mixing board, or it could be a stream of gesture data, sampled at high rates with a device such as the Teabox (Allison and Place 2004). Multiple channels of signal input that represent different dimensions of a performance can be analyzed simultaneously, potentially with cross-correlations taken into account for non-orthogonal dimensions. Extraction of events from a signal is a complicated topic that has been thoroughly researched by electrical engineers. The simplest detector is a threshold test, which compares the current value of a signal to a threshold value. When the status of the comparison changes (a transition from below the threshold to above, or vice versa), an event can be triggered by the comparator. But even a simple comparator like this becomes complicated when we consider that noise corrupts every signal and obscures the true value of the parameter. It may be possible to construct a filter to minimize the effects of the noise, but doing so necessarily introduces latency into the detection. It is therefore impossible to optimize a system for both latency and accuracy simultaneously, so one must settle for a compromise.
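A sketch of such a comparator in C follows. The hysteresis (two thresholds rather than one) is our own addition, a common way to keep boundary noise from retriggering events; it is not a remedy proposed in the text.

    /* Threshold event detector with hysteresis: fires once per upward
       crossing of `hi`, and re-arms only after the signal falls below `lo`. */
    typedef struct {
        double hi, lo;   /* rising and falling thresholds, hi > lo */
        int above;       /* comparator state */
    } detector_t;

    int detect(detector_t *d, double x)
    {
        if (!d->above && x > d->hi) { d->above = 1; return 1; } /* event  */
        if (d->above && x < d->lo)  { d->above = 0; }           /* re-arm */
        return 0;
    }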

Indeed, the threshold test is often the building block for more complicated tests that rely on specific types of filters. For instance, if one wishes to detect a known variation in the signal (a particular sound in an audio signal, perhaps, or a defined gesture), it has been shown that the optimum detection process involves running the signal through a matched filter with an impulse response that mirrors the desired amplitude variation, and then performing a comparison with a fixed threshold whose value can be determined by the desired statistics of the detection process (Schwarz and Shaw 1975). Similar techniques exist for matching in the frequency domain after a transformation from the time domain using FFTs, wavelet-based methods, or other filtering operations. Miller Puckette's bonk~ object (Puckette 1997a) for his signal-processing environment pd (Puckette 1997b), which Ted Apel has ported to MSP (crca.ucsd.edu/~tapel/software.html), was designed to be a percussion detector. It tracks the envelopes of all 21 bands of a constant-Q filterbank to detect a sufficiently rapid rise in energy above a certain threshold, and then it matches the detected energy spectrum to one of a series of known spectra.

Parameters

Unlike events, which arrive sporadically and may trigger new processes in the algorithm, a value for each parameter of a visualization algorithm is needed for every frame rendered. When a request is made to render a frame, the algorithm often need only know the current value of the parameter; in other words, values that the parameter passed through en route to the final destination may not be important. However, if a parameter is rapidly changing, it is important to consider the aliasing effects that can take place in the large "downsampling" between the audio and video domains. If the effective frequency of a parameter change is higher than half the visual sampling rate, or frame rate, of the visualization algorithm, the aliased parameter values in the resulting video may not represent the parameter accurately. Movie film in most common formats has a sampling rate of 24 frames per second; this is fast enough to represent fluid moving imagery, but only if proper filtering is done. Fast motion must be low-pass filtered with respect to time to produce the appearance of motion continuity. In analog film, this filtering is provided by the camera's shutter, which integrates the moving image during each frame, approximating a box filter. This integration appears to us as motion blur. To ensure smooth motion in digital work, then, it is sufficient to low-pass filter our parameter signals with a cutoff frequency of approximately 12 Hz if the visual effects controlled by those parameters are also filtered properly. However, owing to the computational expense of rendering motion blur digitally, proper filtering is not normally present in real-time work; instead, faster frame rates and parameter sampling rates are typically used.
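A one-pole low-pass filter is an inexpensive way to apply such smoothing to a parameter stream. The sketch below uses the standard one-pole coefficient design, with the 12-Hz cutoff from the discussion above; the struct and function names are our own.

    #include <math.h>

    typedef struct { double a, y; } onepole_t;

    /* fc = cutoff in Hz (e.g., 12.0); fs = parameter update rate in Hz. */
    void onepole_init(onepole_t *f, double fc, double fs)
    {
        f->a = exp(-2.0 * 3.141592653589793 * fc / fs);  /* pole location */
        f->y = 0.0;
    }

    double onepole_tick(onepole_t *f, double x)
    {
        f->y = (1.0 - f->a) * x + f->a * f->y;   /* smoothed output */
        return f->y;
    }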

MIDI continuous control (CC) messages are common communicators of parameters in the world of music. However, the inherent resolution of 128 values that a generic control allows is coarse, the protocol cannot simultaneously transmit multiple dimensions of data (because it is serial), and it is common to overwhelm a MIDI port by sending too much control data. Despite these shortcomings, MIDI CCs are adequate for many purposes. The most recently received value can be cached and used to generate a parameter value when needed. Often, in the audio domain, a separate slew parameter controls how quickly the signal slides toward the target parameter value as a strategy to prevent the "zipper noise" caused by large jumps in a signal value, which manifest themselves as impulses in the audio path. This can also be an effective strategy in the visual domain, especially in algorithms that employ recursive feedback, where an impulse can have a long-term effect.
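One simple form of slew limiting, sketched in C under our own naming: the output is allowed to move toward the target by at most a fixed step per update, so a sudden jump in a control value cannot inject an impulse into a feedback network.

    /* Move `current` toward `target` by at most `step` per call. */
    double slew(double current, double target, double step)
    {
        double d = target - current;
        if (d >  step) d =  step;
        if (d < -step) d = -step;
        return current + d;
    }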

In many situations, signals are the natural method for communicating parameters. Sending a bang message to the snapshot~ object provides a convenient way to transform the value of a signal into a message for use by a Jitter object. If more precision is required, the event~ object allows one to synchronously sample multiple signals with a signal-based trigger mechanism.

Raw audio data can be analyzed in a variety of ways to produce meaningful musical parameters. An amplitude metric can be calculated by squaring the sample values of the input signal, which for an audio channel is in most cases limited to the range between -1 and 1. It is also common to scale the value logarithmically to approximate our perception of loudness. Then, because the signal oscillates between large and small values, it is not enough simply to take a single value from the signal and use that as the estimate; to estimate the overall amplitude of the signal accurately, we must employ some kind of filter that examines the signal over a certain amount of time. Figure 8 illustrates two different methods of handling this problem: the first uses the peakamp~ object, which keeps track of the largest value received since the last parameter request. The second uses the +=~ accumulator object to add all the values received since the last parameter request. Note that the peakamp~ and accumulating methods produce numbers with very different orders of magnitude, so the scaling factors that follow the networks must be different for the same expected range of output values. Alternatively, Tristan Jehan's loudness~ external (available online at web.media.mit.edu/~tristan) employs spectral methods to estimate the time-domain energy of a signal.

Figure 8. Two ways of transferring amplitude data from the signal to event domains. The network on the left uses the peakamp~ object to compute the largest value received since the last parameter update. The network on the right uses an accumulator to sum the values received since the last update.
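The two block-wise estimates of Figure 8 reduce to a few lines of C; as noted above, their outputs differ by orders of magnitude and need different scaling.

    /* Peak of a block of samples, as a peak tracker reports
       (range 0..1 for typical audio). */
    double block_peak(const float *x, int n)
    {
        double peak = 0.0;
        for (int i = 0; i < n; ++i) {
            double a = x[i] < 0 ? -x[i] : x[i];
            if (a > peak) peak = a;
        }
        return peak;
    }

    /* Accumulated energy of the block: a sum of squared samples,
       which grows with the block length n. */
    double block_energy(const float *x, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; ++i)
            sum += (double)x[i] * x[i];
        return sum;
    }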

Parameters can also be calculated from a frequency analysis of the performed audio. FFT-based schemes are common, although the latency they introduce can be problematic: the finer the frequency resolution desired, the longer the latency. Such an analysis can be accomplished with an algorithm implemented using MSP's built-in FFT tools, or the artist can use the variety of custom objects made for the job. Ted Apel's centroid~ external takes the output of an fft~ object and estimates the spectral centroid, the spectral "average" weighted by amplitude. Miller Puckette's fiddle~ object is a polyphonic pitch tracker that produces the detected pitches both as messages for stabilized pitches and as signals for a "continuously" updated estimate. Tristan Jehan's brightness~ and noisiness~ are estimators of spectral centroid and spectral flatness, respectively, and his bark~ object provides a spectral analysis with bands chosen according to an auditory model rather than the linear spacing between frequencies that an FFT provides.
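In standard form, the amplitude-weighted average that the spectral centroid computes is

    C = \frac{\sum_k f_k \, |X_k|}{\sum_k |X_k|}

where f_k is the center frequency of FFT bin k and |X_k| is the magnitude of the analyzed spectrum in that bin.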

The jit.catch~ object provides the ability to transcode data from a signal into a Jitter matrix in a variety of ways. For instance, it is possible to request only the most recent N samples of data, or every sample since the last request, or the most recent frame of data centered around some sort of threshold (like the trigger feature on an oscilloscope). The jit.catch~ object allows one to introduce signal data directly into the matrix world, where further analysis can take place, or where direct synthesis can be the result. The jit.graph object (see Figure 9) renders one-dimensional audio matrices as two-dimensional waveform displays. In addition to displaying these directly, one can use them as further fodder for synthesis, for instance as keyframes for compositing, or as geometric manipulators.

Figure 9. A visualization network using jit.catch~ and jit.graph. The jit.catch~ object transcodes three audio channels from the signal to the matrix domain. The jit.unpack object separates the three planes of the output, and the jit.graph objects render the data in a different semi-transparent color to the same output matrix, which is finally displayed in the window.

In terms of analysis, moving audio into the matrix world affords some tricks that can be taken advantage of if the results for intermediary samples are not needed. For example, if an FIR filter is used, perhaps only the final sample must be calculated. Because rendered video needs less than a tenth of a percent as many frames as rendered audio, calculating the results of the FIR filtering in the matrix domain can be a considerable gain in efficiency and may afford expensive analysis that could not otherwise be accommodated in MSP's signal domain. Similar efficiencies can be exploited with FFT analysis, but not with recursive IIR filtering. However, the results of IIR filters used for analysis are not produced in the audio domain, and so the unpleasant sound of a highly nonlinear phase response is no longer an issue. This allows the use of more efficient filter types, such as Chebyshev and elliptical filters (Antoniou 2000).
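For instance, an order-n FIR filter evaluated only at the single sample a video frame needs is just one dot product per frame, instead of one per audio sample. A C sketch, with names of our choosing:

    /* Evaluate an n-tap FIR at one point only.  `x` holds at least n past
       samples ending at index `last`; `h` holds the filter coefficients. */
    double fir_at(const float *x, int last, const double *h, int n)
    {
        double y = 0.0;
        for (int k = 0; k < n; ++k)
            y += h[k] * x[last - k];
        return y;   /* one dot product per rendered frame */
    }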

Max/MSP Threads and Jitter's Variable Frame-Rate Architecture

Some humans perceive frequencies as high as 20,000 Hz, whereas our eyes refresh the image that is sent to our brain at a rate as much as three orders of magnitude lower. Correspondingly, MSP operates with very strict timing, and Jitter does not. In fact, Jitter was designed so that its frame rate would adapt to the processing power available after audio and high-priority event processing was finished. The operation of this variable frame-rate architecture involves a system of event queues and priorities that the remainder of this article attempts to explain.

The internal Max/MSP engine changed a great deal during the transition from the cooperative multitasking world of Macintosh OS 9 to the pre-emptive multitasking environments of Mac OS X and Windows XP. In version 4.5 of Max/MSP, there are three primary threads of execution. The first is called the main thread. This low-priority thread is responsible for short, expensive operations, such as the servicing of the user-interface objects. The main thread calculates user-interface actions and redraws the screen when one clicks on a button object or changes the value in a number box. The high-priority scheduler thread operates on time-sensitive data, such as incoming data from MIDI interfaces, or bangs emanating from a clocked object like metro. Finally, the very high-priority thread that operates on the MSP signal vectors is called the perform thread.

Although the perform thread is given a higher priority in the operating system's thread-scheduling mechanism than the scheduler thread, and the scheduler is given a higher priority than the main thread, any thread can interrupt any other at any time in a pre-emptive multitasking operating system. Indeed, on a computer with more than one CPU, more than one thread can even execute simultaneously. This can lead to confusing situations for Max programmers when a patch has elements that operate in different threads, occasionally interrupting one another in the midst of a calculation.

The organization of the scheduler thread is further complicated by two options that can be set in Max's DSP Status dialog box. If "overdrive" is disabled, all scheduler data is processed in the main thread. If both "overdrive" and "scheduler in audio interrupt" are enabled, all scheduler data is processed in the perform thread immediately prior to the calculation of the signal vector. If "overdrive" is enabled but "scheduler in audio interrupt" is not enabled, the scheduler thread exists normally. These three configurations allow Max programmers to tailor the execution of high-priority events to their needs. Disabling overdrive removes the special status of scheduler messages but increases efficiency through the elimination of the additional thread, whereas executing the scheduler in the perform thread ensures the most accurate timing but reduces the amount of time the perform thread has available to calculate the signal vector. With the latter option, one must be careful to keep the scheduler operations very short. A lengthy scheduler process increases the risk of exceeding the amount of time available to process the signal vector, which may result in audible glitches.

Regardless of the configuration of the scheduler, if the processing resulting from a clocked action is expensive, it is usually wise to transfer the execution of the processing to the main thread, where a lengthy operation will only delay the execution of other time-insensitive actions. This can be accomplished using the defer object or jit.qball. Conversely, execution can be transferred from the main thread to the scheduler thread using the delay object with an argument of 0. Some objects operate in both threads; qmetro, for instance, functions as a metronome internally by using the scheduler thread to clock when bangs should be produced, but instead of sending the bangs in the scheduler thread, it defers their output to the main thread. This deferral is done using a usurp mechanism: if a message is waiting in the main thread's queue and has not yet been produced, any new message coming from the same object will replace the old message in the queue.

Figure 10. (a) A patch with two counters banged in the scheduler thread by the output of metro objects. The output of counter A on the left is deferred to the main thread using a usurp mechanism, whereas the output of counter B is deferred as usual (see the text). (b) A schedule of some executing messages from the patch illustrated in (a). (In the patch, counter A's output passes through jit.qball @mode usurp and counter B's through jit.qball @mode defer.)

Figure 10 provides an illustration of the usurp mechanism in action. The two counter objects in Figure 10a are deferred to the main thread; the counter on the left ("counter A") uses a usurp mechanism, and the counter on the right ("counter B") does not. Figure 10b illustrates a schedule of executing messages as they are passed from the scheduler thread to the main thread. On the first line, counter A has sent out a 1, and this message is placed in the main thread's queue. The second line sees counter B send out a 1, which is also placed on the queue. On the third line, counter A produces a 2, which, owing to the usurp mechanism, replaces the 1 from counter A that was waiting to be produced. The fourth line illustrates the main thread's processing of the first event in the queue, as well as the output of 2 from counter B. The fifth line shows the output of 3 from counter A, which is added to the front of the queue because no other messages from counter A are waiting to be produced. On the sixth line, the "3" message from counter B is placed at the front of the queue. The seventh line shows the processing and removal of counter B's 3, as well as the replacement of the 3 from counter A with the new output of 4, owing to the usurping mechanism.

In the case of qmetro, the usurp mechanism ensures that only a single bang from the qmetro is ever waiting to be sent out in the main thread's queue. In a situation where a clocked object is connected to a network of Max objects that perform some expensive computations, and bangs are output more quickly than the network can perform the computations, the usurping mechanism prevents stack overflow.
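The usurp rule itself is easy to model. The toy C queue below replaces a still-pending message from a given source rather than appending a second one; it is an illustration of the behavior described, not Max's implementation.

    #define QMAX 64

    typedef struct { const void *src; int value; } msg_t;
    typedef struct { msg_t q[QMAX]; int n; } queue_t;

    /* Post with usurp semantics: overwrite a pending message from the
       same source, otherwise append. */
    void post_usurp(queue_t *qu, const void *src, int value)
    {
        for (int i = 0; i < qu->n; ++i) {
            if (qu->q[i].src == src) {
                qu->q[i].value = value;   /* usurped in place */
                return;
            }
        }
        if (qu->n < QMAX) {
            qu->q[qu->n].src = src;
            qu->q[qu->n].value = value;
            qu->n++;
        }
    }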

Networks of Jitter objects that operate on video matrices iterate over many thousands of pixels. Accordingly, these demanding calculations typically take place in the main thread. In fact, on multiprocessor machines, Jitter maintains a pool of threads for its own use in iterating over large matrices. Figure 11 shows the execution flow in the matrix_calc method of a multiprocessor-capable Jitter object. The Max programmer need not think about these extra threads, other than to know that they can significantly speed up iteration over large matrices.

Figure 11. In a multiprocessor environment, the calculation of some expensive operations is divided between the main thread and one or more worker threads. This division of labor does not affect the relationship between the main thread and the scheduler and perform threads.

It is common to drive a Jitter network with a qmetro object set to a very short period between bangs. Owing to the usurping mechanism discussed above, the result is that the Jitter network calculates new frames as quickly as possible given the computational resources available. Because the scheduler thread and perform thread execute concurrently, processing of audio and time-dependent events is not affected. The frame rate of the video output is dependent on the available computational resources. Because a modern operating system typically operates several dozen processes in the background, each of which requires a varying amount of time to be serviced, the available computational resources are constantly fluctuating. Therefore, the frame rate of the video output also fluctuates. Fortunately, the eye does not have the same stringent periodic requirements as the ear.

This variable frame-rate architecture requires a different mindset from that required by MSP when evaluating the computational load of a patch. Because it has a fixed rate at which it must process samples, with MSP the computational load can be defined as the ratio of the time taken to calculate a single sample to the period of the audio signal. On the other hand, driving a Jitter network with a quickly resetting qmetro as described above effectively means that the Jitter processing will consume all available processing power after the perform and scheduler threads have done their work. The best way to estimate the different computational loads of different Jitter networks is therefore to compare their frame rates, something that is easily done with the fpsgui object, a graphical object that can be connected anywhere in the network to provide valuable feedback about the frame rate, the data type of the matrix, and other information. It is worth noting that the frame rate of the computer's monitor or video projector is a practical upper limit on the rate at which output imagery can be updated. It is a waste of computational power to synthesize images more quickly than the output medium can display them.

Conclusions

Max/MSP is a widely used system for creating audio works. With the addition of Jitter, new visual dimensions are available for electronic artists to explore. We have discussed mappings between sound and image in light of various considerations, from the theoretical treatment of human psychology to the practical demands of the programming environment. Certain mappings make intuitive sense because they have a basis in physics or human perception, or because they are learned in a cultural context. The mappings chosen for a particular work of visual music help define the work's style: an internal logic that creates a ground for meaning. It is our hope that Jitter will prove to be an effective tool for the implementation of novel mappings, and that the resulting instruments will help to communicate new styles of visual music.

References

Allison, J. T., and T. A. Place. 2004. "Teabox: A Sensor Data Interface System." Proceedings of the 2004 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 699-701.

Antoniou, A. 2000. Digital Filters: Analysis, Design and Applications, 2nd ed. New York: McGraw-Hill.

Bevilacqua, F., R. Müller, and N. Schnell. 2005. "MnM: A Max/MSP Mapping Toolbox." Proceedings of the International Conference on New Interfaces for Musical Expression (NIME). Vancouver, Canada.

Donnadieu, S., S. McAdams, and S. Winsberg. 1994. "Context Effects in 'Timbre Space.'" Proceedings of the 3rd International Conference on Music Perception and Cognition. Liège, Belgium: ESCOM, pp. 311-312.

iotaCenter. 2000. Kinetica 2 Exhibition Catalog. Los Angeles: iotaCenter.

Köhler, W. 1929. Gestalt Psychology. New York: Liveright.

Mattis, O. 2005. Visual Music. London: Thames and Hudson.

Meyer, L. B. 1956. Emotion and Meaning in Music. Chicago: University of Chicago Press.

Puckette, M. 1988. "The Patcher." Proceedings of the 1988 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 420-429.

Puckette, M. S. 1997a. "Pure Data: Recent Progress." Proceedings of the Third Intercollege Computer Music Festival. Tokyo: Keio University, pp. 1-4.

Puckette, M. S. 1997b. "Pure Data: Another Integrated Computer Music Environment." Proceedings of the Second Intercollege Computer Music Festival. Tachikawa: Kunitachi College of Music, pp. 37-41.

Ramachandran, V. S., and E. M. Hubbard. 2001. "Synaesthesia: A Window Into Perception, Thought and Language." Journal of Consciousness Studies 8(12):3-34.

Rule, K. 1996. 3D Graphics File Formats: A Programmer's Reference. Boston: Addison-Wesley.

Schwarz, M., and L. Shaw. 1975. Signal Processing: Discrete Spectral Analysis, Detection, and Estimation. New York: McGraw-Hill.

Werner, H. 1934. "L'unité des sens." Journal de Psychologie Normale et Pathologique 31:190-205.

Zicarelli, D. 1998. "An Extensible Real-Time Signal Processing Environment for Max." Proceedings of the 1998 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 463-466.
