
MUSICAL IMAGERY


STUDIES ON NEW MUSIC RESEARCH

Series Editor:

Marc Leman, Institute for Psychoacoustics and Electronic Music, University of Ghent, Belgium.


MUSICAL IMAGERY

Edited by

Rolf Inge Godøy and Harald Jørgensen

Taylor & Francis, Taylor & Francis Group

NEW YORK AND LONDON


Library of Congress Cataloging-in-Publication Data

Applied for ...

Published by Taylor & Francis, 270 Madison Ave, New York, NY 10016, and 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

Transferred to Digital Printing 2009

Series cover design: Ivar Hameling.

© 2001 Taylor & Francis

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, by photocopying, recording or otherwise, without the prior written permission of the publishers.

ISBN 90 265 1831 5 (hardback)

Publisher's Note

The publisher has gone to great lengths to ensure the quality of this reprint but points out that some imperfections in the original may be apparent.


Contents

Contributors vii

Editors' Preface ix

Rolf Inge Godøy and Harald Jørgensen

Part I Theoretical Perspectives

Overview
Rolf Inge Godøy and Harald Jørgensen

1 Perspectives and Challenges of Musical Imagery ... 5
Albrecht Schneider and Rolf Inge Godøy

2 Neuropsychological Mechanisms Underlying Auditory Image Formation in Music ... 27
Petr Janata

3 Musical Imagery and Working Memory ... 43
Virpi Kalakoski

4 Modeling Musical Imagery in a Framework of Perceptually Constrained Spatio-Temporal Representations ... 57
Marc Leman

5 Mental Images of Musical Scales: A Cross-cultural ERP Study ... 77
Christiane Neuhaus

6 Complex Inharmonic Sounds, Perceptual Ambiguity, and Musical Imagery ... 95
Albrecht Schneider

7 Musical Imagery between Sensory Processing and Ideomotor Simulation ... 117
Mark Reybrouck

8 Musical Imagery as Related to Schemata of Emotional Expression in Music and on the Prosodic Level of Speech ... 137
Dalia Cohen and Edna Inbar

9 Imaging Soundscapes: Identifying Cognitive Associations between Auditory and Visual Dimensions ... 161
Kostas Giannakis and Matt Smith

Part II Performance and Composition

Overview ... 181
Rolf Inge Godøy and Harald Jørgensen

10 Expressive Timing in the Mind's Ear ... 185
Bruno H. Repp

11 Control of Timbre by Musicians - A Preliminary Report ... 201
Wolfgang Auhagen and Viktor Schoner

12 Images of Form: An Example from Norwegian Hardingfiddle Music ... 219
Tellef Kvifte

13 Imagined Action, Excitation, and Resonance ... 237
Rolf Inge Godøy

14 The Keyboard as Basis for Imagery of Pitch Relations ... 251
James M. Baker

15 Composers and Imagery: Myths and Realities ... 271
Rosemary Mountain

16 The Musical Imagery of India ... 289
Lewis Rowell

Name Index ... 303
Subject Index ... 311


Contributors

Wolfgang Auhagen and Viktor Schoner, Musikwissenschaftliches Seminar der Humboldt-Universität zu Berlin, Am Kupfergraben 5, D-10099 Berlin, Germany, [email protected]

James M. Baker, Brown University, Box 1924, Providence, RI 02912, USA, [email protected]

Dalia Cohen and Edna Inbar, Department of Musicology, The Hebrew University of Jerusalem, 13 Rashba Street, 92264 Jerusalem, Israel, [email protected]

Kostas Giannakis and Matt Smith, School of Computing Science, Middlesex University, Bounds Green Rd, London N11 2NQ, UK, [email protected]

Rolf Inge Godøy, Department of Music and Theatre, University of Oslo, P.O. Box 1017 Blindern, 0315 Oslo, Norway, [email protected]

Petr Janata, Psychological and Brain Sciences, 6207 Moore Hall, Dartmouth College, Hanover, NH 03755, USA, [email protected]

Harald Jørgensen, Norwegian State Academy of Music, P.O. Box 5190 Majorstua, 0302 Oslo, Norway, [email protected]

Virpi Kalakoski, Department of Psychology, P.O. Box 13, 00014 University of Helsinki, Finland, [email protected]

Tellef Kvifte, Department of Music and Theatre, University of Oslo, P.O. Box 1017 Blindern, 0315 Oslo, Norway, [email protected]

Marc Leman, Institute for Psychoacoustics and Electronic Music, University of Ghent, Blandijnberg 2, B-9000 Ghent, Belgium, [email protected]

Rosemary S. Mountain, Department of Music, Concordia University, 7141 Sherbrooke Street West, Room RF 322, Montreal, QC H4B 1R6, Canada, [email protected]

Christiane Neuhaus, Institute of Musicology, University of Hamburg, Neue Rabenstr. 13, D-20354 Hamburg, Germany, [email protected]

Bruno Repp, Haskins Laboratories, 270 Crown Street, New Haven, CT 06511-6695, USA, [email protected]

Mark Reybrouck, Katholieke Universiteit Leuven, Eikendreef 21, B-8490 Varsenare, Belgium, [email protected]

Lewis Rowell, School of Music, Indiana University, 1201 East Third Street, Bloomington, IN 47405-7006, USA, [email protected]

Albrecht Schneider, Institute of Musicology, University of Hamburg, Neue Rabenstr. 13, D-20354 Hamburg, Germany, [email protected]


Preface

This is a book about images of musical sound in our minds. Our belief is that musical imagery is at the very core of music as a phenomenon, because, after all, what would music be if we did not have images of sound in our minds?

Yet, a review of outstanding writings on music in our culture from antiquity to the present shows that there is little material dealing directly with our mental images of musical sound. The study of imagery in other domains, and in particular visual imagery, has made important advances in the past couple of decades, raising the important debate about 'mental imagery' in the 1980s, and developing ever more innovative approaches to research. The past three decades have, however, also witnessed a growing interest in research into music cognition. This research has covered a broad range of issues of perception and has also contributed to the formation of explanatory models of musical behaviour. Accordingly, we felt that it was high time musical imagery was put on the agenda of an international conference. When the opportunity arose to choose a topic for the VI. International Conference on Systematic and Comparative Musicology, to be held at the Section for Musicology of the University of Oslo in June 1999, we suggested musical imagery as the topic.

In the call for papers prior to the conference we tentatively defined musical imagery as 'our mental capacity for imagining musical sound in the absence of a directly audible sound source, meaning that we can recall and re-experience or even invent new musical sound through our "inner ear"'. In the reading of abstract proposals submitted for the conference, as well as in the process of selecting and editing the chapters for this book (mostly written by authors present at the Oslo conference), we realised that we would have to reconsider our definition of musical imagery.

For one thing, the distinction between perception and imagery is at best one of principle. Trying to distinguish between actually listening to music (be that in a live performance situation or with a phonograph source), and imagining music without such a source, is a matter of distinguishing two different situations. However, it is also clear that in an ordinary listening situation, the memory of what has been heard as well as the expectations of what is to come play an integral role in the process of any 'primary' perception of musical sound. It could thus be argued that contextual images, as well as more general schemata at work in music perception, are in fact elements of musical imagery 'present' in an ordinary listening or performance situation (e.g. the improviser has an image of what he/she just played and what to play next).



Another point is that studies of musical imagery are to a large extent indirect, in the sense that we do not have an 'observer' situated in the mind, capable of giving accounts of what is going on when we are experiencing images of musical sound. Although the use of non-invasive methods (cf. chapters 2 and 5 in this book) may indeed give us useful observations of what is going on in the brain when we imagine musical sound, this does not tell us much about the qualities involved in musical imagery. One consequence of this indirect access to mental images of musical sound is that we have to deduce, assume or simply guess a number of things from other sources, in particular from what can be observed in the more 'primary' perception of music. Another consequence is that we have to rely on introspective accounts of our mental images of musical sound. We should also add that musical sound 'in itself' may be considered 'impure' in the sense that musical imagery seems in many situations to be accompanied by, or even inseparable from, images of source, of sound-generation, of the environment, as well as various images of 'meaning', such as emotional content or highly extra-musical associations (e.g. the sound of a drum roll could for many people be inseparable from the image of someone ferociously beating the drum membrane with a couple of drum sticks).

Given the complex and multifaceted nature of our topic, we believed that there had to be a plurality of approaches in this book: no single domain of research can claim to possess the most appropriate approach to musical imagery. We believe that people from such diverse domains as neurology, cognitive psychology, philosophy, music theory, ethnomusicology, music education, composition and performance may all make valuable contributions to the study of musical imagery. We emphasise this because the reader of this book will encounter a pluralism of scientific paradigms and terminology, and it is our hope that the various contributions will be appreciated on their own terms.

We have organized the chapters of this book into two parts. The first part deals with theoretical perspectives and presents what we see as some of the main issues in musical imagery today, including a historical overview and an overview of the neurophysiological bases of musical imagery, as well as some other theoretical issues. The second part focuses on issues related to performance and composition and presents some more practical applications of musical imagery. Both parts are preceded by an overview section where the main issues of the chapters are presented and briefly discussed in the perspective of the domain of musical imagery as a whole.

We believe that the subject of musical imagery concerns everyone working with music. Our belief that musical imagery is at the very core of musical experience, perhaps even being the very content of musical thought, makes us hope that this book will be read not only by music psychologists, but also by performers, composers, arrangers, music theorists, musicologists, music educators, and of course, by any person interested in music.

This book is the result of the efforts of our contributors, and we thank them all for sharing their work with us and the readers of this book. We sincerely thank the reading committee, consisting of Marc Leman, Bruno Repp and Albrecht Schneider, for their effort. Without their meticulous reading, commenting, and extensive knowledge, this book would hardly have been possible. Our thanks go to the publisher, Swets and Zeitlinger, for receiving and endorsing our project and for patient and helpful support throughout the production process, to the Norwegian State Academy of Music, to the Department of Musicology, University of Oslo, and to The International Society for Systematic and Comparative Musicology for supporting this project. Last, but not least, we thank Gisela Attinger for doing the production work on the manuscripts. Her efforts, her reliable attention to detail and her ingenuity have made a major contribution to the quality of the book.

May 2001

Rolf Inge Godøy and Harald Jørgensen


I

Theoretical Perspectives

Rolf Inge Godøy and Harald Jørgensen

The first part of this book deals with some fundamental issues in musical imagery. Although there seems to be a broad consensus about the term 'musical imagery' as denoting images of musical sound in our minds, there are obviously many opinions on the nature of such images and on their relationship to perception and memory, and there are many explanations of how they work. Also, there are many, and rather different, approaches to investigating these images of musical sound in our minds.

Such a plurality of notions of what musical imagery is supposed to be, as well as of paradigms for exploration, may perhaps seem confusing. There is of course a considerable distance, not only in method but also in fundamental attitudes, between for instance 'arm chair' style introspective approaches and various methods of brain activity measurement. As we know, such differences in approaches and fundamental attitudes are often encountered in other domains of the cognitive sciences as well. On the positive side, we can see these differences in approaches and attitudes as an expression of the composite and multifaceted nature of our topic. With this understanding, the plurality of notions and approaches becomes an asset rather than a liability, and we should read this book as a frame-by-frame exposure, from various angles, of this rich and essential topic that we think musical imagery is.

In line with this idea of a pluralism of approaches, the contribution of Albrecht Schneider and Rolf Inge Godøy (chapter 1) gives an overview of some important notions of imagery and musical imagery in the past, as well as a brief assessment of some challenges confronting us now and in the immediate future. The authors believe that some ideas manifest in phenomenological and Gestalt theoretical works of the nineteenth and twentieth centuries still have relevance. As an example, we can look at the issue of context in both perception and imagery. This was elegantly depicted by Brentano and later by Husserl in his tripartite model of retentions, primal impressions and protentions, meaning that there are always images of past experience and future expectancies exerting influence on what we perceive and imagine at any given moment. These and similar 'introspective' insights are actually also epistemological questions, questions which reappear in experimental approaches to musical imagery.

Interestingly, expectancies and/or violations of expectancies are important ingredients in investigating the neurophysiological bases of musical imagery, as we can see in Petr Janata's contribution (chapter 2). Another central issue is the relationship between imagery and more 'primary' perception, and Janata gives us a model for structuring our understanding of this as well as other central issues of the neurophysiological basis for musical imagery. Various methods for collecting data on brain activity during tasks of musical imagery, as well as various proposed models for the workings of musical imagery, are presented in this chapter. In addition, the author presents some of his own observational findings.

Although there will always be a neurophysiological basis for musical imagery, the focus of Virpi Kalakoski (chapter 3) is on various experiments and models of human memory and faculties of imagery. The topic of memory is indissociable from that of imagery, and a number of 'classical' theories of memory are presented here, as well as some of the author's own experimental findings. As a conclusion, the author suggests that musical imagery is multimodular as well as multimodal, ideas that we find in several other contributions in this book.

We hope that these first three chapters will give the reader an overview of some basic epistemological-philosophical, neurophysiological, and cognitive issues of musical imagery, thereby preparing the ground for the following chapters. In the next three chapters we move on to consider in more detail what kinds of constraints and/or schemata might be at work in musical imagery. Is it possible to imagine the unimaginable, or is that a contradiction in terms? Or rather: Is everything that we can possibly imagine based upon bits and pieces of what we have already experienced in our lives, so that imagery is mostly a matter of making new combinations? And furthermore: Does whatever we imagine follow schemata that we have all learned during the long process of acculturation? And: Are there effects of context, both short-term and long-term, at work in imagery, so that whatever we may imagine at a given moment, in a 'now', is actually conditioned by what we imagined a little while ago as well as by what we are expecting to imagine in the immediate future?

Posing these and similar questions shifts our attention towards the very content of our images of musical sound. Given what is known about constraints and schemata at work in other cognitive domains, it is not unreasonable to guess that such constraints and schemata are also at work in musical imagery. In the contribution of Marc Leman (chapter 4), the point of departure is what can be termed an ecological view of musical sound, perception and cognition. Here, the basis is the continuous, sub-symbolic acoustic substrate and the neurophysiological workings of audition, including the self-organizing behaviour of neurons. The chapter presents a model of a 'low-level' basis for musical imagery, with higher-level brain processes linked to this basis.



From a neurophysiological point of view, constraints on musical imagery should be detectable, in the sense that there should be some kind of observable trace in cases where there is a 'violation of expectancies', i.e. where musical sound does not conform to schemata in our minds. This is the topic of the contribution of Christiane Neuhaus (chapter 5). She sets out to measure and interpret patterns of brain activity across different cultural groups. What may seem 'right' to one group, i.e. in accordance with learned schemata, may seem 'wrong' to another group, i.e. as a violation of learned schemata. The research of Neuhaus shows that neurophysiological responses to expected versus non-expected patterns in musical scales are different in groups of people from different cultures, and this makes it plausible to suggest that such culture-specific schemata are also present in musical imagery.

Another aspect is how our learned, internal schemata work when we are confronted with acoustically highly complex and even ambiguous sounds in terms of pitch, as is the case with many inharmonic sounds. This is discussed in the contribution of Albrecht Schneider (chapter 6), and exemplified with inharmonic sounds from carillons. These sounds are analysed in view of acoustic properties such as spectral content and the possibility or impossibility of having an unambiguous pitch. It seems that beyond a certain point of complexity, listening has to rely on some kind of simplification of the sound material. That is, we have to 'overrule' the acoustic material and make an 'idealised' or 'stylised' image of the musical sound by filtering out features which would otherwise lead to ambiguous images. This concerns the general question of the ecological versus the more schematic nature of our images of musical sound, something that can also be understood as a dichotomy between the abstract and the concrete in music (to borrow an expression from Pierre Schaeffer), ultimately related to a distinction between the 'pure' and the 'impure' in musical thought.

Speaking of the 'pure' versus the 'impure', the last three chapters of Part I deal with various schemata in modalities other than the auditory. There are various properties in the acoustic signal which over a certain stretch of time can lead to the formation of schemata by principles of self-organisation (which can be seen as a kind of learning). On the other hand, there are associations formed which link musical sound to events in other modalities, such as to vision, various motor-related sensations or to more general emotional images. The questions in our context are to what extent, as well as how, the images, schemata and/or constraints in other modalities can engender, enhance and influence images of musical sound, and conversely, to what extent images of musical sound may trigger images in other modalities.

For instance, is it possible to imagine highly emotionally charged music, e.g. Schönberg's Erwartung, without also arousing emotions, and conversely, can various emotions evoke certain images of musical sound? Does imagining certain kinds of music also engender images of certain kinds of movement, e.g. does imagining a juicy tango also evoke sensations of dance movements, or does the image of a frenetic drum sequence evoke images of equally frenetic mallet, hand, and body movements? Can images of colours be associated with certain images of musical sound? The list of such questions can be very long indeed, but from what has emerged from cross-modality research in recent years, there seems to be an increasing amount of research supporting the idea that the simultaneous presence in our minds of elements from different modalities can enhance images in many cases (as well as, of course, inhibit or weaken images in some cases). One very good reason for posing such questions is to try to develop systematic methods for generating and enhancing images of musical sound, in other words, to have a better understanding of what triggers images of musical sound in our minds, or what is the 'engine' of musical imagery.

The contribution of Mark Reybrouck (chapter 7) argues that there are many indications of motor components involved in perception and cognition, and in imagery as well, and that traditional notions of 'passive' sensory input coupled with abstract symbol cognition are now rejected by several neuroscientists. There is a close functional resemblance between perception and action on the one hand, and imagery of the same perceptions and actions on the other.

A similarly close relationship between emotional schemata in the imagery of music and speech is suggested in the contribution of Dalia Cohen and Edna Inbar (chapter 8). We may assume that for most people in our culture, images of emotions are integral to much of the music we experience. An exploration of emotional images and schemata could then be useful in further explorations of the nature of musical imagery, in particular as possible agents for the recall and enhancement of musical images.

At the end of this first part of the book, the contribution of Kostas Giannakis and Matt Smith (chapter 9) explores the link between colour and images of musical sound. The idea is that there could be a correlation here, not only for people who claim to have had synaesthetic experiences. A correlation between colour and images of musical sound can be exploited in evoking images of musical sound in the mind, as well as be useful in machine representations of musical sound.

In summary, the first part of this book is a spreading out of the topic of musical imagery, progressively exposing several aspects. Is it possible then to have a coherent, or even unitary, perspective on what musical imagery is supposed to be? Probably yes, if by musical imagery we understand images of musical sound in our mind. Yet, if we think that this is too loose and all-embracing, the problem is not musical imagery, but rather music as a phenomenon. If we accept that music is infinitely complex and diversified, there is really no reason to think that our images of musical sound should be less complex and diversified.


1

Perspectives and Challenges of Musical Imagery

Albrecht Schneider and Rolf Inge Godøy

Introduction

The study of musical imagery, as well as imagery in other domains, is not a recent invention. Although the so-called 'cognitive revolution' of the last quarter of the twentieth century has given the study of imagery the status of scientific investigation, we believe it is important to be aware of some historical elements here. There is sometimes an embarrassing lack of historical knowledge in contemporary cognitive science, sometimes giving us a feeling of witnessing a 'reinvention of the wheel'. But in a more positive sense, we believe several of the questions posed today have actually been given much attention, as well as creative answers, in the past. Because of this, we shall in this chapter give a brief, and necessarily selective, survey of some main points in the history of musical imagery and imagery in general. To complete this overview, we shall also at the end of this chapter try to give a summary of what we see as the main challenges for the study of musical imagery now and in the immediate future.

As will become clear when reading the various contributions in this book, musical imagery is a complex and multifaceted topic, allowing for a multitude of approaches and scientific paradigms. Also, the term imagery has been, and still is, used to denote a manifold of phenomena as well as conceptual objects. It could be useful then to start by giving a survey of the most prevalent meanings of the term, a term which plays a central role in philosophical treatises on epistemology, and is of similar importance in works on cognition and 'inner perception' (often approached as apperception in the philosophical and psychological literature from Descartes to Wundt). Further, mental imagery plays an important role in clinical psychology and psychiatry (see contributions in, e.g., Shorr et al., 1980; Klinger, 1981). More recently, the term imagery has been widely used also in the field of neuroscience, where a number of experimental techniques have been developed to study the anatomy and actual function of the living brain, such as computerized tomography, magnetic resonance imaging, positron emission tomography (PET), and magnetoencephalography (MEG; see Clarke, 1994). These techniques have also been employed in the study of audition and music perception (see Näätänen & Winkler, 1999; Marin & Perry, 1999).

Philosophical approaches

The following notes cover, roughly speaking, philosophical commentaries on mental imagery from Aristotle and Descartes to Husserl and the phenomenological movement of the twentieth century. Given that epistemology includes aspects of perception and cognition, on the one hand, and that psychology cannot do without epistemology, on the other, a clear distinction between 'philosophical' and 'psychological' concepts of imagery is hardly possible. It should be clear, in this connection, that any discussion of concepts of imagery (see, e.g., Segal, 1971; Block, 1981; McDaniel & Pressley, 1987) at some point will enter a much larger and still more complex field which is generally labelled philosophy of mind (for an overview, see Carrier, 1995). There are quite a number of works in philosophy and related fields (nowadays often subsumed under the term cognitive science) which deal with the nature of mental and psychic phenomena, mind-body relations as well as with the brain-mind problem (e.g., Carrier & Mittelstrass, 1991; Dennett, 1991; Scheerer, 1993). Even though imagery may comprise a limited area of mental phenomena, understanding it seems to be especially intricate, since mental images are said to be notoriously subjective, on the one hand, and difficult to investigate by experimental procedures, on the other. It cannot be denied, however, that imagery is essential in artistic production as well as in the apprehension of works of art.

The English term imagery, which denotes, in particular, mental images as products of imagination, is obviously related to the Latin terms imaginatio and imago. The word imago in Latin has a variety of meanings which range from picture, portrait, mask, and guise to scheme, vision, phantom, and also to echo, metaphor, and allegory (Georges, 1869, pp. 2293-2295). Basically, the different meanings can be grouped as follows:

imago:
1. picture, portrait (imago ficta; if painted: imago picta);
2. death-mask, portrait of ancestors;
3. image, effigy (copy, duplicate, exact likeness);
4. shadow, scheme, vision, phantom, echo (also of a true voice);
5. metaphor, allegory, figure of speech;
6. phantom, illusion, delusion, mirage, apparition;
7. view, sight, phenomenon, appearance;
8. idea, conception.



This rather large semantic field of imago allows us perhaps to extract some basic features. First, imago relates to objects existing in the real world which are pictured in a realistic manner. Second, subjects conceive of objects that may exist or did exist once in the real world, so that these objects are somehow mirrored as images. Third, subjects may create mental images of things that never did exist in the real world, and are thus the products of imagination (e.g., witches riding on broom-sticks, or, more musically interesting, a set of planets which rotate in space whereby they produce a complex sound which is a chord of perfect harmony). Fourth, subjects are capable of producing some kind of immaterial 'copies' of objects of the material world to be stored in one's mind.

This in fact is also the central meaning of imaginatio, which in turn is the scholastic translation of the Greek term phantasia as used by Aristotle (Arist. Met., 980b, 1010b, 1024b). According to Aristotle, perception has to be distinguished from imagination. Humans have a capacity to produce immaterial pictures of objects which they have perceived previously. This capacity can also be used to recall objects which are stored in memory, as well as to create visions of real or imaginary objects.

For Descartes, the vis imaginandi (German: Einbildungskraft; English: power of imagination) is a source of a multitude of imaginations, which include the creation of new objects not learned by previous experience (Descartes, 1642, med. sec.). For example, such new objects may arise in dreams, which are not 'triggered' by actual sensations because we have our eyes closed, yet we can 'see' such objects. These imaginations, however, are not very reliable with respect to truth, and are prone to error and illusion. From the point of view of epistemology, they are thus inferior to cogitationes, which are the only reason that we are able to obtain reliable knowledge: Ego cogito, ergo sum, sive existo. Or, as Descartes expressed himself in the French text of his Discours de la méthode (written earlier than the Meditationes): je pense, donc je suis (Descartes, 1637, ch. 4).

In the Discours de la méthode, Descartes had elaborated on the problem of true knowledge, and had come to the conclusion that only such things as we comprehend very clearly, and very distinctly, are true. This expression, which in the French original reads fort clairement et fort distinctement, and in the Latin translation valde dilucide et distincte, is of prime importance since in effect it relates to the difference between perception and apperception. 'Very clear' (fort clairement), in this respect, are objects that we perceive by eye or ear, and of which we have a precise and unambiguous impression. 'Very distinct' (fort distinctement), however, are conceptions of things that bear on their very nature. If, for example, one imagines a triangle, we see three lines, and also recognize a geometric figure with certain features, which is formed by these. If we now extend, as Descartes in fact did, this little example to a polygon of a thousand lines, it is not possible to have an 'inner picture' of such a configuration any more; however, it is possible to understand the principles of such a construction at once (Descartes, 1642, med. sexta).

What he wants to distinguish, then, is mental imagery as related to perception from cognition as a form of conscious knowledge. We should add that Descartes' thoughts on 'very clear' and 'very distinct' played a significant role in a key work on the theory of perception, namely, Franz von Brentano's Psychologie vom empirischen Standpunkt (Brentano, 1874, 1924, 1928). We will return to this work below, but if we now only sum up briefly some of Descartes' terminology and considerations in the Meditationes we find:

cogitatio   = thinking (and also consciousness)
idea        = idea, conception
imaginatio  = imagination (and also power of imagination)
imaginari   = to imagine (to conceive of, and to understand things by means of 'images', and in a sensuous mode)
intelligere = to comprehend (to understand the nature of things)

It is not possible here to go much into David Hume's Enquiry concerning human understanding (Hume, 1758/1951), and to discuss the relationship of impressions to thoughts and ideas he explains there. Basically, 'ideas' are but weak aftereffects of impressions based on sensory data. The mind then has the task to order and to combine data acquired by the senses, and by experience in general. Hume, 'the hero of the atomistic theory' (James, 1981, p. 691), advances an associationist view of the mind which makes use of the principle of 'similarity'. Specific 'ideas' are thereby associated because of objective similarity of content (see also Wilbanks, 1968).

An elaborate concept of mental synthesis is found in the philosophy of Immanuel Kant. The basic question Kant put forward in his Critique of Pure Reason (Kritik der reinen Vernunft, KdRV B 19) is: how can synthetic propositions (Urteile, 'judicii') a priori be derived? And in particular: how is pure mathematics possible? How is pure natural science possible? In answer to these questions he developed a systematic approach of epistemology called Transcendental-Philosophie, whereby transcendental means transcending common experience, yet not the limits of possible human knowledge (the latter condition, in Kant's epistemology [KdRV B 352ff], is labelled transcendent, whereas reasoning that takes place within the limits of possible experience is labelled immanent).

The foundation of the Critique of Pure Reason is the transcendental aesthetics, where Kant elaborates such concepts as sensation (Empfindung), perception (Anschauung), phenomenon (Erscheinung), sensuousness (Sinnlichkeit), and conception (Vorstellung). His reasoning is that, in dealing with the world around us, we strive for knowledge, which requires both notions and perceptions. Perceptions of things are 'given' (gegeben), in a first stage, by sensory data. These are, however, processed mentally by making reference to a system of categories such as space and time, as well as certain basic concepts (Raum, Zeit, Stammbegriffe), to result in perception. Perception is regarded as a conception accompanied by sensations (mit Empfindung begleitete Vorstellung, KdRV B 147). It might be added that both space and time are considered by Kant (KdRV B 38ff, 116ff) as categories and as 'forms of perception' (Formen der Anschauung).

Objects which are perceived can also be constituted by rational thought, which then results in notions (Begriffe). Since knowledge in many cases implies abstraction and generalization, it requires such notions, and a framework of categories (reine Verstandesbegriffe) which are needed as the foundation stones for pure reasoning.

Objects which 'affect' (affizieren) us cause sensations which are empirical (KdRV B 34). In our mind, the object of empirical sensations is registered as a phaenomenon (Erscheinung). Phaenomena in general call for further mental treatment to become perceptions in a strict sense. Opposed to phaenomena, Kant (KdRV B 307) defines noumena as objects of a type of perception which is not dependent on sensory 'input' (nichtsinnliche Anschauung). One might think of, for example, moral norms such as honesty or dignity which can be conceived of independently of actual sensations and/or perceptions. Noumena would actually be objects of a pure intellectual mode of perception, or rather, apperception. Kant (KdRV B 310-311) admits that his notion of noumenon is a problematic issue that he had introduced, in the first place, to delimit the range of the notion of phaenomenon, and of things covered by this concept.

The basic problem now is: how can coherent knowledge be derived by one person? Obviously, each of us has a single, unique, and in itself coherent experience which in sum makes up one's life as well as one's personality. The capacity to perform an ongoing synthesis of the manifold of sensations, perceptions, as well as of intellectual and moral understanding into one configuration of theoretical and practical knowledge, is defined by Kant as pure apperception (KdRV B 131ff). This is a spontaneous activity of the mind and the primeval force of our self-consciousness which first of all gives rise to the conception of 'cogito ergo sum'. Pure apperception (as distinguished from empirical apperception) is thus the condition a priori which enables identity of the self as well as enables us to acquire coherent knowledge. It is prior to actual experience and the conditio sine qua non to 'assemble' many perceptions and ideas which relate to each other, so as to make up our consciousness and our personal experience.

In dealing with the necessary conditions for synthesis of the manifold of things which are perceived (synthesis speciosa), and the manifold of thoughts (synthesis intellectualis), Kant introduces another concept labelled Einbildungskraft (called facultas imaginandi by Leibniz), which is defined as the capacity to perceive an object without its actual presence. In particular, Kant employs this concept to mediate between perception as based on sensory data, and apperception which is purely intellectual. He distinguishes between two types of Einbildungskraft, one which he labels 'productive', and one which he labels 'reproductive'. Synthesis guided by reason (Verstand) and its categories is achieved by the productive Einbildungskraft, which is assigned to the spontaneous activity of our mind, whereas reproductive Einbildungskraft is a process driven merely by association, and thus by rules established by empirical psychology.

The concept of Einbildungskraft plays a major role also in Kant's theory of aesthetics. In the Critique of Judgement (Kritik der Urteilskraft [KdU], 2nd ed. 1793), both the productive and the reproductive types are considered with respect to artistic expression. While the reproductive type of Einbildungskraft again is discussed with reference to association and memory, the productive type is regarded as a major factor which accounts for aesthetic creation as well as aesthetic experience (KdU 68ff, 192ff). In particular, productive Einbildungskraft leads to artistic 'ideas' which tend to transcend everyday experience (e.g., surrealistic paintings, arrangements of sound that bring about auditory illusions). Aesthetic ideas (transformed into works of art) cause us, in Kant's opinion, to think a lot even though it may be impossible to explain such ideas by specific concepts or notions (KdU 193). In this respect, because of a wealth of intrinsic meaning, works of art may withstand complete description and explanation by means of language.

Productive Einbildungskraft, however, is also used by the viewer or listener who perceives works of art. Because of its formal relation to reason, Einbildungskraft accounts for validity of aesthetic judgements even though such judgements are also a matter of taste, and of subjective pleasure (Wohlgefallen). As Kant elaborates throughout this work, the main reason for such pleasure is that works of art exhibit expediency without a formal purpose (Zweckmäßigkeit ohne Zweck). The high degree of coherence and expediency which we register when looking at Dürer's Melancholia, or listening to Bach's Art of the Fugue, can be attributed to our productive Einbildungskraft which is spontaneous and free on the one hand, yet also rational in certain ways, on the other hand. It has recently been argued again that without imagination, hearing music as music would be impossible because of the metaphorical nature of many, if not most compositions (cf. Scruton, 1997, ch. 3; see below).

Psychological approaches

We shall now turn to Brentano's Psychologie vom empirischen Standpunkt as well as to other writings of this scholar who investigated philosophical foundations of psychology. The very center of Brentano's approach is the analysis of human consciousness and the principle of intentionality. It is characteristic of all acts of our consciousness (Bewusstseinsakte) that these are directed towards objects which cover much more than things found in the world around us. The main reason for this is that we are capable of what Brentano calls inner perception, distinguished from outer perception which is based on sensation (Brentano, 1974, pp. 40ff and 104ff; as to the principles of Brentano's psychology of 'acts', which can be distinguished from various approaches to psychology of 'content', see Boring [1950, chs. 17, 18, and 19]).

Objects of inner perception may for instance be judgements and decisions. However, also objects usually understood as fictional, such as the flying horse Pegasus, or angels singing with a 'crystal voice', can become the actual 'content' of our consciousness, and will in this respect be regarded as 'real'. This point is of great relevance when we consider, for example, artistic invention. We might point to the surrealistic movement, and to Salvador Dalí in particular who propagated a concept labelled 'critical paranoia' which is based on both rational analysis of phenomena and seemingly irrational images which are however totally coherent and meaningful (see Gómez de la Serna, 1977). In music, works of the composer Bernd Alois Zimmermann (1918-1970), such as Monologe (for two pianos) and the Musique pour les soupers du Roi Ubu, come to mind, works which are the offspring of what Zimmermann called pluralistic composition.1 This approach incorporates techniques of montage and collage whereby spontaneous ideas and the momentary content of consciousness contribute to the emergence of a new arrangement together with various resources stored in memory. Pluralistic composition means that many small musical elements (themes, motifs from well-known works plus tunes from jazz standards etc.) which come to the composer's mind are used as building blocks for a type of work which also makes use of different textures, changes in dynamics, etc. Since composition implies meaningful arrangements of musical elements (which may be heterogeneous with respect to historical and cultural criteria, as in this case), composers quite often start their work with sketches which reflect basic ideas and other 'early' conceptualizations.

In his psychology, Brentano grouped all intentional activity of the mind into three classes, namely (1) conceiving (Vorstellen), (2) judgements (Urteilen), and (3) emotions and feelings (Gemütstätigkeit). The German term Vorstellen could be translated also by imagining as well as by imaging, since in English to form a conception means to imagine (to form a mental image of something not present) and/or to image (to call up a mental picture of something not present; see also Casey [1976]). Psychic phenomena are related to imagining, which in turn typically will be combined with judgements. Imaging, to be sure, does not in the first place refer to the objects which are imagined, but to the psychic act of imaging. For example, listening to music leads to imaging, the object of which is the listening process itself as a psychic phenomenon, whereas Brentano defines the sound that we listen to as a physical phenomenon, and thus the object of outer perception (Brentano, 1924, pp. 170ff).

If we imagine a certain sound or chord as part of inner perception and are aware of this, so that we have an image of us as imaging, this process would according to Brentano take place in but one psychic act of our mind. It would include, however, two different objects, one being a physical phenomenon (the sound), the other a psychic phenomenon (the act of listening). In accordance with Aristotle, Brentano regards the physical phenomenon of sound which we listen to as the primary object of listening, and the psychic phenomenon of listening as the secondary object which is 'perceived'. (The notion of perception in a phenomenological perspective deserves a more detailed discussion; cf. Chisholm [1956].)

This type of inner perception also relates closely to apperception, a term which was used by Descartes, Leibniz, Kant and others (see above). To understand fully something which enters our mind either by way of sensations and outer perception, or by imaging, requires mental activity such as vigorous attention, concentration, memory, and forms of judgement, such as comparison or distinction. Only this will result in valde clare ac distincte percipere, as was stated already by Descartes.

Brentano argues that there are two different modes relevant to how things may come to our minds. The first he describes as explicit and distinct, the second as implicit and indistinct (Brentano, 1928, pp. 33ff). If, for example, someone listens to a chord and is able to distinguish the different notes which make up this chord, he or she will be able to identify the chord as being of the 'Tristan'-type (cf. Vogel, 1993, pp. 478-481), and will also be aware of listening to this specific chord. This awareness would qualify as explicit and distinct, whereas someone who perceives the chord as one entity without further analysis would only reach the level of implicit and indistinct knowledge.

It is this type of explicit and distinct perception accompanied by conscious awareness of the act of perceiving which yields what phenomenologists from Brentano to Husserl have accepted as 'evidence'. The epistemological concept of Evidenz, and the term itself, especially in Brentano's philosophical (but still empirical) psychology, is fundamentally connected with inner perception. Husserl, who was a student of Brentano and Stumpf, argued that such Evidenz could stem also from outer perception, and in this respect has stressed the importance of apperception which takes place regardless of whether we deal with sensory input or with things we only conceive of (Husserl, 1901, pp. 222ff). In particular, Husserl points to the fact that since apperception involves mental acts, it will result in a conception of what has been perceived. This conception is of course not just a copy of things in the world around us but the phenomenon (Erscheinung) we conceive of. Phenomena can be understood as the intentional correlates of acts of perceiving. Several phenomena, for example musical tones, may appear to be similar in such a way that they allow abstraction of invariant features, and to conclude from here as to the essence (Wesen, 'Eidos') of all musical tones (Husserl, 1976, pp. 410ff).

It is from this point of view that Husserl also developed his theory of categorical perception (Husserl, 1901, pp. 40ff). Categorical perception means transcending perception based on actual sensation, and is defined by perceiving what Husserl labels ideal objects, objects distinguished from real objects ('ideale' versus 'reale' Gegenstände). Ideal objects are constituted by way of mental acts such as comparison, judgement, and, in general, abstraction, whereas real objects are sensed in direct access. Perception of real objects is more simple since it is achieved in but one step. By contrast, categorical perception may involve several acts to constitute ideal objects. Thereby, Husserl's approach to categorical perception is quite different from that of contemporary psychology, linguistics and musicology where 'categorical perception' often is understood as perceiving stimuli which subjects order and assign to a limited number of learned 'categories' (cf. Schneider, 1997a). In this perspective, 'categorical' means 'classificatory' in the first place. Husserl (1976) has pointed out that this type of classification is basic to human modes of rational thought which, however, comprise also more abstract procedures.

Listening to music: the constitution of ideal objects in the 'time domain'

Listening to a work of music (or music which is improvised according to some preconceived scheme) is a task which involves constitution of ideal objects, namely grasping the compositional structure as well as principles of musical form such as symmetry and contrast, repetition and variation. Constitution of ideal objects is achieved, most of all, by way of abstracting formal properties of such an object from actual sensations as well as by making reference to knowledge acquired earlier. Regarding works of music, this knowledge will relate to the syntactic, pragmatic and semantic levels, respectively, and is needed to grasp the 'meaning' inherent in both formal structures as well as in particular textures or even in the composition as a whole (cf. Stoffer, 1996).

Since music is an art which is centered in the 'time domain', the ongoing process of constitution in actual listening has to be worked out along the time axis by many consecutive acts of consciousness. Brentano had devoted much labour to investigating continua, and how continua such as space and time could be perceived. He discusses the concept of Proterästhese which has to be regarded as eine Reihe von kontinuierlich sich folgenden Wahrnehmungen, that is, perceiving is an act which is repeated again and again along the time axis, and results in a sequence of 'frames' of perception from some point in the past to the present.2 To perceive objects such as melodies which unfold in time, it is necessary, however, that the parts of the object which have been perceived already still be present while the next parts follow. Edmund Husserl (who was a student of Brentano and, for a short period, also of Stumpf) discusses such problems in his theory of inner time consciousness (Husserl, 1928), where he elaborates on such principles as protention (anticipation and expectancy, Erwartung or Protention in German) and retention (that is, Erinnerung vergangener 'Jetztpunkte', recollection of such 'points' of the present which have just passed, yet have not ceased to exist in memory), as well as on the idea of the 'present' (Jetzt, Gegenwart). Husserl (1928, p. 385) explicitly relates to Brentano in that a sequence of acts of perceiving forms an Aktkontinuum which with respect to temporal objects includes retention, actual perception, and protention. Since the temporal object, e.g., a melody or even a work of music, extends over a certain stretch of time, retention, perception, and expectancy cover certain parts of this Zeitstrecke:

past                        actual perception           future
(retention in memory)       ('now')                     (expectancy)
------------------------------------|------------------------------------>

Husserl's considerations are of interest especially regarding the perception of configurations organized in time such as melodies.3 His views have been influential and are found in writings on perception and music of, among others, the social scientist Alfred Schütz (1976), and the musicologist Thomas Clifton (1976, 1983) who also drew on ideas of the French phenomenologist Maurice Merleau-Ponty (Merleau-Ponty, 1945). Clifton (who sadly passed away much too young) applied Husserl's concepts of protention, retention etc., to the analysis of compositions, for instance, to Webern's Bagatelle No. 1 for String Quartet, Op. 9. Jean-Paul Sartre (1940), who also adopted elements of Husserl's philosophy, argued that the constitution of a musical work - as an example, he pointed to Beethoven's 7th symphony - by way of listening can only result in an analogue of the musical object created in an actual performance. Through the process of listening, the work as a whole is in the end imagined rather than perceived. This is the reason, Sartre concludes, that we have difficulties returning to the 'reality' of everyday life after attending a concert.

In a systematic treatise on the ontology of musical works (which are neither identical with their score nor with the manifold of their realizations by way of performance), the Polish philosopher Roman Ingarden (1893-1970), also a student of Husserl, has elaborated on the conceptualization of musical works (Ingarden, 1962). He argues that in one respect, the constitution of the work of music as an organized whole by an experienced listener is achieved because the parts of the work are structured in time according to hierarchical levels. Thus, when listening to music, we register both the constituents of individual compositions as well as the hierarchies inherent in a sequence of different parts which in total make up the temporal Gestalt experienced as a work of music. This experience also means that different parts are perceived to be of different 'weight' with respect to the overall configuration, and also of different quality in terms of aesthetic criteria. Consequently, the time of musical experience is not homogeneous, since the 'transfer rate' of musical information and meaning is uneven due to changes in complexity and intensity of a work which is performed, or played from some recording, to a listener. Detailed considerations of these issues are found in the Musikpsychologie of Ernst Kurth (Kurth, 1947) who - like Ingarden - was influenced by the concept of time consciousness and subjective experience of time put forward by Henri Bergson.

Perhaps one of the most extensive applications of phenomenological ideas in music theory can be found in the work of Pierre Schaeffer. One aim of his monumental Traité des objets musicaux (Schaeffer, 1966) was to give a foundation for characterizing sound features in a general way, that is, sound not restricted to that of traditional musical instruments or the human voice. On the way to establishing a multidimensional matrix for characterizing any sound object based exclusively on the subjective listening experience, Schaeffer goes through a number of exclusions of what the sound object is not (Schaeffer, 1966, pp. 95-98; here quoted from a slightly different summary in Chion [1983, pp. 34-35]):

The sound object is not the sounding body.
The sound object is not the physical signal.
The sound object is not a fragment of a recording.
The sound object is not a symbol notated in a score.
The sound object is not a state of the soul.

This is a progressive ontological differentiation running from the source of the sound to the intentional constitution of the sound object in the 'listening consciousness' (Schaeffer, 1966, p. 147). In fact, the 'ultimate' reality for Schaeffer is clearly the intentional constitution of the sound object, and later on (with the addition of certain criteria), of the musical object, as an intentional unit in our minds, hence as a rich and vivid instance of musical imagery. His entire typo-morphological matrix ('Tableau récapitulatif du solfège des objets musicaux' in Schaeffer [1966, pp. 584-587]) can in fact be seen as a mental technique for guiding our scrutiny of internal images of musical objects by establishing progressively finer differentiations of the various feature dimensions in the musical object through shifts in intentional focus.

The question then is if musical experience ruled by individual intentionalities is similar for a number of subjects. In general, experienced listeners can be expected to perceive formal structures in works of music in similar ways. However, subjects have different thoughts and different imagery of what they hear, something which may be understood as 'unasserted thought' (cf. Scruton, 1997, ch. 3). Also, musical structure and processes are often conceived of, and described, in a rather metaphorical way. For example, the changes in instrumentation, tempo and dynamics which are found in many symphonic works have been interpreted in terms of forces, energy, and matter, the interplay of which brings about movement, as well as shape, in music (see, e.g., Kurth, 1947). But musical imagery is by no means restricted to subjective 'associations' or 'connotations' (cf. Meyer, 1956, ch. VIII). Rather, seemingly metaphorical conceptions of music may reflect features of the temporal organization of a work (or improvisation) fairly well. The reason for metaphorical conceptualizations, as well as imagery, has been explained thus: In hearing music (which is based on, yet not restricted to, organized sound), we develop a kind of double intentionality: 'one and the same experience takes sound as its object, and also something that is not and cannot be sound - the life and movement that is music. We hear this life and movement in the sound, and situate it in an imagined space...' (Scruton, 1997, p. 96). Music in the imagined space thereby appears to move up and down, melodies are felt to be 'rising and falling', etc. Since the impression of 'rising' and 'falling' of course is connected with physical parameters such as 'high' and 'low' frequencies, the space of musical imagery would basically comprise the same dimensions as does the phenomenal space of normal experience (see Schneider [1992] for a discussion of 'tonal space' and phenomenal attributes of sound).

Stumpf and Riemann

We now turn to Carl Stumpf (1848-1936), another pupil of Brentano, and one of the founders of systematic musicology. Stumpf has addressed fundamental issues in perception and cognition in several of his books as well as in other publications. Besides the two volumes of the Tonpsychologie (1883, 1890), there are two philosophical treatises which are of relevance, namely Erscheinungen und psychische Functionen (1907) and Empfindung und Vorstellung (1918); also, there are chapters dealing with perception in Stumpf's book Die Sprachlaute (1926) and especially in Vol. I of his Erkenntnislehre (published posthumously in 1939). In particular in his Tonpsychologie he considers various aspects of auditory perception as related to mental acts (labelled by Stumpf, and also by Külpe, psychische Funktionen, psychical functions; cf. Stumpf [1907, 4: Akte, Zustände, Erlebnisse]), such as judgement, comparison, recognition, conceiving, imagination, emotion, desiring, and intentions. In his treatise Empfindung und Vorstellung (1918), Stumpf gives a systematic account of how sensation and imagination differ in intensity yet also in quality.4 Stumpf further argues that imaginations can be modified any time at will, whereas sensations are more resistant to subjective interpretation. Also of interest are Stumpf's remarks on how actual sensations (of, e.g., musical tones) might be complemented and reinforced by recollections of the same objects.

Of the many problems discussed in detail in the Tonpsychologie, one might mention Stumpf's approach to scaling by way of subjective estimation of stimulus differences cognitively turned into distances on one or several dimensions. This of course implies that subjects conceive of the perceptual difference as a spatial distance. One interesting case Stumpf investigated is how we try to estimate the relative distance of two tone complexes (clusters) with respect to a single dimension of pitch (Tonpsychologie II, 1890, pp. 406ff). Since such clusters cannot be easily analyzed into their components (they appear to be 'complex wholes'), subjects are forced to make judgements based on a phenomenal difference which then is translated into a distance. The distance itself, however, is not perceived, yet is a spatial model employed in cognition.

Cognitive analysis of complex sounds, as well as musical structures, typically involves images because musically trained subjects can be expected to mentally project the sounds they hear, and the notes they apprehend in terms of a musical syntax, on a spatial model. If we listen, for example, to the chord c-e-g-b♭-d' made up of complex tones (played by, e.g., five saxophones), we might conceive of the resulting sound structure in terms of a two- or three-dimensional spectral representation. Also, the notes could be projected onto a two- or three-dimensional structure known to music theorists as 'tone net' or 'tone lattice' (see Riemann, 1914/15; Fokker, 1945; Vogel, 1993). The spectral representation would be helpful in understanding acoustical properties of the sound, for example, the coincidence of partials which belong to different notes. Projecting notes on a tone net which is made up of constituent musical intervals (fifths, major thirds, 'natural' seventh) can be useful if we want to understand the harmonic structure of chords as well as of textures comprising several notes played simultaneously (see below).
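The coincidence of partials among the notes of such a chord can be made concrete with a small computation. The sketch below is an illustration added by the editor, not part of the original text: it assumes just-intonation ratios relative to c (1/1, 5/4, 3/2, the 'natural' seventh 7/4 for b♭, and 9/4 for d'), treats each note as a complex tone with eight harmonic partials, and lists which partials of different notes fall on exactly the same frequency:

```python
from fractions import Fraction
from itertools import combinations

# Assumed just-intonation ratios relative to c (for illustration only);
# the 'natural' seventh 7/4 stands for b-flat, as mentioned in the text.
chord = {
    "c":  Fraction(1, 1),
    "e":  Fraction(5, 4),
    "g":  Fraction(3, 2),
    "bb": Fraction(7, 4),
    "d'": Fraction(9, 4),
}

N_PARTIALS = 8  # harmonic partials 1..8 of each complex tone

def partials(ratio):
    """Frequencies (as ratios to c) of the first N_PARTIALS harmonics."""
    return {k: ratio * k for k in range(1, N_PARTIALS + 1)}

spectra = {note: partials(r) for note, r in chord.items()}

# Collect partials of different notes that coincide exactly.
coincidences = []
for (n1, s1), (n2, s2) in combinations(spectra.items(), 2):
    for k1, f1 in s1.items():
        for k2, f2 in s2.items():
            if f1 == f2:
                coincidences.append((n1, k1, n2, k2, f1))

for n1, k1, n2, k2, f in coincidences:
    print(f"partial {k1} of {n1} coincides with partial {k2} of {n2} (ratio {f})")
```

With these assumed ratios, for instance, the 3rd partial of c coincides with the 2nd partial of g, and the 7th partial of c with the 4th partial of b♭ - exactly the kind of spectral overlap a two- or three-dimensional spectral representation would make visible.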

Stumpf's theory of consonance is based on the principle of Verschmelzung, which is not to be understood simply as a sensation of fusion or coalescence of a set of tones (or chords). Rather, Verschmelzung is a special case of Gestalt perception of several sounds (e.g., musical notes) played simultaneously (see Gurwitsch, 1975, pp. 66ff; Schneider, 1997b). As we hear such a mixture of several complex harmonic sounds which (in just intonation) blend perfectly, we employ acts of both analysis and integration so that we may 'switch' between apperception of several or even all the constituents of the harmonic complex, on the one hand, and perception of a highly integrated whole, on the other. Perceiving configurations such as chords thus means that subjects may concentrate, at one instant in time, on the constituents in their harmonic relations (e.g., the notes as well as the intervals they form), and on the resulting sonorous object in the very next moment. Stumpf investigated this process of focusing attention on either the constituents or the resultant whole experimentally by listening again and again to complex chords played with mixture stops on a pipe organ tuned to just intonation. As to the specific experience of Verschmelzung, Stumpf argued that this quality is valid also if one only imagines two tones, c and g, to be played simultaneously (Tonpsychologie II, pp. 138ff). In this respect, imagery would not differ from listening to real notes.

There are other phenomena, though, where sensation and imagery are no longer the same. For example, Stumpf points to two notes, c and c-sharp, which if played on an instrument simultaneously will cause roughness and/or beats. He says that he could imagine those two notes either with no beats at all, or with beats, whereby the number and intensity of such beats could be freely modified. On the other hand, Stumpf admits that hearing in certain cases exceeds the capabilities of imagery. From experiments he conducted on himself (Tonpsychologie I, p. 179), he found that even though one is able to perceive a very high note (at the very limit of human hearing), it is impossible to clearly imagine such a stimulus if it was not sensed just prior to imagining.

Notwithstanding his keen interest in cognition, Stumpf took a nativist stand in many issues of auditory perception. He admitted that consonance is in the first place a matter of sensation, whereas harmony is much more dependent on apperception and relational thinking (cf. Stumpf, 1898, 1911). He therefore distinguished between consonance and concordance, the latter being a cognitive principle: if we listen to music composed in tonal harmony which is played with poor intonation, the actual sound structure can hamper the sensation of consonance. Poor intonation, however, cannot prevent us from conceiving the musical structure in terms of tonal harmony, as far as musical syntax is concerned, and in terms of just intonation, if we imagine how the music should have been played correctly.

With respect to this distinction, Stumpf was thinking in particular of tonal music played on a piano tuned in equal temperament, where we have only twelve keys per octave to realize many more notes. Stumpf recognized that given such circumstances, all chords, and in particular all chords and sonorities comprising more than three simultaneous notes, will be ambiguous when played on a piano or other keyboard tuned in equal temperament, since they can have different functions within a harmonic texture, and thus different 'meanings'. A fact which further complicates the issue is that notation, too, is often equivocal in the sense that c-sharp and d-flat, g-sharp and a-flat, etc. are taken to represent the 'same' note. With respect to conceptualizations of tonal music, Stumpf, and also Hugo Riemann (in Riemann, 1914-16), argued that expert listeners will not take a g-sharp for an a-flat, yet will conceive of notes with respect to the harmonic context notwithstanding actual intonation, which might be poor or even explicitly wrong.

The problem addressed by Stumpf and Riemann is in fact that of the widespread use of equal temperament. Tonal relations which can be conceived of, and found in compositions, to be complex and diversified will be levelled if works rich in harmony are played on a piano or organ. To illustrate the case, consider a simple cadence comprising the chords C-d-F-D-G. To realize this cadence in just intonation would necessitate that we have the following notes (and pitches) at hand:

d - a - e - b - f♯
    |   |   |   |
    f - c - g - d - a

(horizontal rows = perfect fifths 3/2)
(vertical columns = major thirds 5/4)

It is easy to see that the 'a' of the F-major chord is not identical with the 'a' of the D-major chord, as these are in fact two different pitches with respect to intonation (which differ by a syntonic comma of 22 cents). Also, the 'd' in the d-minor chord is not identical with the 'd' of the D-major chord. Since on a piano or other keyboards, however, there is only one 'a' as well as only one 'd' per octave available, these have to be regarded as 'compromise pitches' in equal temperament which are used to realize various chords which differ with respect to interval structure and tonal relations. It is the expert listener who has to find out which 'a' and which 'd' is needed to realize certain chords. Therefore, he or she has to conceive of the harmonic structure, which, according to Stumpf (1911, 1926) and also Riemann (1914-16), expert listeners always do on the basis of pure intervals such as 2/1, 3/2, 4/3, 5/4, 6/5, etc. Thus, even though music as actually played on a piano or other conventional keyboard deviates in intonation from the tonal relations the composer had in mind when writing chords, chord progressions and modulations, the expert listener is capable of the appropriate conceptualization of the harmonic structure.
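The size of these discrepancies can be checked with elementary ratio arithmetic. The following sketch (our own illustration, not part of Stumpf's or Riemann's argument; the variable names are ours) derives the two 'a's and the two 'd's from chains of just fifths (3/2) and major thirds (5/4) and measures their differences in cents:

```python
import math

def cents(ratio):
    """Interval size in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

# Pitches as frequency ratios relative to c = 1/1, octave-reduced.
a_in_F_major = (4/3) * (5/4)         # major third above f (4/3) -> 5/3
a_in_D_major = (3/2)**3 / 2          # three fifths c-g-d-a, down an octave -> 27/16
d_in_D_major = (3/2)**2 / 2          # two fifths c-g-d, down an octave -> 9/8
d_in_d_minor = a_in_F_major / (3/2)  # a just fifth below the 'a' of F major -> 10/9

# Both pairs differ by the syntonic comma 81/80, about 21.5 cents
# (rounded to 22 cents in the text above).
print(round(cents(a_in_D_major / a_in_F_major), 1))
print(round(cents(d_in_D_major / d_in_d_minor), 1))
```

The same arithmetic also shows why a twelve-key octave must 'compromise': equal temperament collapses each of these pairs onto a single key.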

The tertium comparationis which enables relating the actual performance of a given work to the rule system of tonal harmony would be to take the harmonic texture 'as if' it was written to be played in just intonation. Thereby, it could also be perceived in an unambiguous way. The expert listener, according to Riemann, is someone capable of analysing complex sequences of chords in terms of correct Tonvorstellungen (Riemann gives some examples in the Ideen). In practice, this means that the listener would have to abstract the correct harmonic scheme of a given piece from the notation and the intonation, both of which can be quite ambiguous. Regarding conceptualization, Riemann tentatively discussed a number of principles such as what he calls 'economy of listening'. That is, in actual listening, simple tonal relations based on small integer ratios should in general be preferred against more complex ones.5 Riemann's Ideen certainly were a serious attempt at formulating a theory of harmonic imagery (this being a central part of musical imagery). Unfortunately, Riemann died soon after he had published his Ideen, which he regarded as a preliminary work.
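Riemann's 'economy of listening' can be given a toy formalization: map a heard interval onto the nearest simple integer ratio. The sketch below is our own illustration (the candidate list and the nearest-in-cents rule are assumptions of ours, not Riemann's formulation):

```python
from fractions import Fraction
import math

def cents(ratio):
    """Interval size in cents: 1200 * log2(ratio)."""
    return 1200 * math.log2(ratio)

# Candidate 'simple' ratios: small-integer ratios within one octave.
SIMPLE_RATIOS = [Fraction(n, d)
                 for n in range(1, 9) for d in range(1, 9)
                 if 1 <= Fraction(n, d) <= 2 and math.gcd(n, d) == 1]

def economize(played_ratio):
    """Return the simple ratio closest (in cents) to the played interval."""
    return min(SIMPLE_RATIOS, key=lambda r: abs(cents(played_ratio / r)))

# A Pythagorean major third (81/64) is heard 'as' the just third 5/4:
print(economize(Fraction(81, 64)))  # 5/4
```

This reproduces the example in note 5: the played 81/64 is perceptually 'simplified' to 5/4, from which it differs by only a syntonic comma.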

So far, there have been very few attempts at putting Riemann's assertions to the test. He had claimed that, by systematically studying the tonal relations inherent in certain works of music, one might be able to take these as 'tracks' (or traces) of the composer's imagination, which thereby would become accessible. Riemann himself believed he had understood, most of all, works of the late Ludwig van Beethoven (piano sonatas and string quartets), of which he had made extensive analyses. As interesting as his theory of Tonvorstellungen is from the point of view of cognitive musicology, to validate this concept will necessitate closer examination as well as experimental research.

Imagery, which Riemann had addressed regarding harmony, apparently plays a role also in timbre perception. Stumpf found that, in order to identify different timbres, as well as to distinguish instruments and voices by their respective 'tone colours', listeners try to single out characteristics like brightness, sharpness, density, fullness etc. from complex sound entities. Thereby, perceptual analysis of sound qualities results in distinguishing timbres on the basis of phenomenal attributes, attributes which with respect to complex sounds can be considered dimensions of what Stumpf (and also his student, Wolfgang Köhler) describes as Tonfarbe (tone colour; see Stumpf, 1926, especially ch. 15). Conceptualizations of timbre again involve spatial characteristics which are helpful in the classification of sounds. Of course, certain qualities we perceive in complex sounds must have correlates in the time function and spectrum of the physical stimulus as well. However, different timbres seem to be perceived, analyzed, and compared to each other with reference to such dimensions which stem from, and are bound to, the phenomenal appearance of sounds. To subjects perceiving sounds produced by various instruments, some appear to be 'fat' or 'voluminous', others are regarded as 'thin', 'sharp', 'dense' or 'hollow'. Subjects tend to imagine such specific sound qualities along with certain instruments. It should be noted that Stumpf's findings and thoughts on timbre have been acknowledged as outstanding in a more recent publication on sound colour (Slawson, 1985).
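One such phenomenal attribute, brightness, does have a well-known spectral correlate: the amplitude-weighted mean frequency of the partials (the spectral centroid). A minimal sketch of this correlate (our illustration; Stumpf, of course, worked without such measures, and the roll-off figures below are arbitrary):

```python
def spectral_centroid(partials):
    """Amplitude-weighted mean frequency of (frequency, amplitude) partials,
    a common physical correlate of perceived brightness."""
    total = sum(a for _, a in partials)
    return sum(f * a for f, a in partials) / total

# Two harmonic tones on the same fundamental; the second rolls off faster
bright = [(220 * k, 1 / k) for k in range(1, 9)]     # 1/k amplitude roll-off
dull = [(220 * k, 1 / k**2) for k in range(1, 9)]    # 1/k^2 roll-off
print(spectral_centroid(bright) > spectral_centroid(dull))  # True
```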

Aspects of research on musical imagery

Besides Stumpf's observations on Verschmelzung and Tonfarbe, which were based on several experiments he had carried out alone or with co-workers and students, some interesting research on musical imagery was started by other psychologists, and from somewhat different angles. As early as 1885, Hermann Ebbinghaus had published a book on memory which deals with aspects of learning as well as memory for, and reproduction of, learned verbal items. The findings reported by Ebbinghaus (1885) were the result of extensive experimental work which includes statistical analyses (see Boring, 1950, pp. 386ff). With the publication of Ebbinghaus' study, memory became a topic which attracted many psychologists. To be sure, memory with respect to tones and other musical objects is extensively covered also by Stumpf (1883, 1890), but one of the most remarkable works from this period is the essay 'On "Gestalt Qualities"' by Christian von Ehrenfels from 1890 (Ehrenfels, 1988). In this essay, von Ehrenfels describes a holistic approach to musical imagery, meaning that imagery is enhanced when as many as possible of the senses and contextual elements are mobilized; e.g., in order to imagine an orchestral work, we should imagine a scene with the entire orchestra in front of us, the conductor's movements, the concert hall, the lighting in the concert hall, the ambiance, etc.

Another area of research which became influential within the phenomenological movement, on the one hand, and Gestalt psychology, on the other, was that of eidetic imagery (for an introduction to the field as it was developing around 1920, see Jaensch [1927]). In particular, this research centred on subjects' ability to recollect objects (which had been perceived earlier) and to reproduce them as visual or acoustical images as precisely as possible. Eidetic images were regarded as intermediate between perception - which, according to a broad and well-known definition, is the experience of objects and events which are here now (Newman, 1948, p. 216) - and imagery, which deals with mental images being the product of 'free' or 'pure' imagination (see section Philosophical approaches (pp. 6ff) above).

Regarding eidetic memory in music, there are some famous cases reported in the literature: Mozart is said to have been able to write down Allegri's Miserere in full after hearing it only once or twice.6 Also, he is said to have stored hundreds, if not more, compositions in memory (Knepler, 1991, ch. 2). Further, there are some sources - unfortunately dubious as far as philology is concerned - relating to Mozart's imagery as being a central part of his creative activity (see Duchesneau, 1986, pp. 103ff). Another case of interest is Beethoven, who - after his sense of hearing became impaired - wrote some of his greatest works, something which could of course not have been achieved without an unusual strength of imagery. Quite a few composers have said that they had 'visions' of a complete work before they put anything to paper (see Duchesneau, 1986, pp. 103-109). We may also point to reports from neurologists who have observed patients suffering from serious brain dysfunctions who still seem able to recollect a large number of melodies, or even complete works of music. Perhaps one of the most unusual cases is that reported by Sacks (1985, ch. 22).

In German psychology of the 1920s, there were attempts to classify eidetic images with respect to memory. Shortly after a (visual) stimulus has been presented, an after-image (Nachbild) will remain accessible to the subject, who, after some time (in general, several minutes), will also be able to form an Anschauungsbild from recollection which correctly gives the features of the original stimulus. Finally, after more time has elapsed, it is possible for many subjects to recover what they had perceived earlier as an image (Vorstellungsbild). Apparently, these images are formed by means of 'retrieval' of information stored in long-term memory.

To check whether this tripartite classification would hold also for musical imagery, Rudolf Kochmann (1923) carried out experiments with school children, because it had been found by some researchers that eidetic memory of the Anschauungsbilder type develops to a maximum during childhood, and seems to degenerate later on. Kochmann played sequences of up to ten notes, sequences which did not correspond to any popular melodies, to individuals (boys of age 10 to 17 who attended different types of schools, and also differed with respect to musical training and abilities) who were asked to recollect and sing these sequences after a break in which the subjects were set to do other musical tasks. The reproductive task (singing the tone sequence which had to be recollected) started five minutes after the sequence had been presented first, and was repeated after another five minutes as well as after twenty minutes. The results of the experiments showed that subjects varied substantially with respect to the extent and the precision of recollection (as well as reproduction) of quasi-melodic tone sequences. Kochmann concluded on the basis of his observations that there is no clear boundary between recollection (Anschauungsbild) and imagery (Vorstellungsbild) of musical objects such as tone sequences.

Not so many experimental studies on musical imagery were published subsequent to Kochmann's.7 One investigation published by Mainwaring (1933) explores kinaesthetic imagery, as for example in the case of piano players who can be expected to recollect musical phrases by imagining the action of their hands and fingers playing a certain melody or piece. (As to imagery and motor behaviour, see also Reybrouck [this volume].) More recently, experiments were carried out which checked the influence of imaged pictures and sounds on the detection of visual and auditory signals (Segal & Fusella, 1970). It was found, not surprisingly, that mental imagery can hamper actual perception of both visual and acoustical stimuli if perception has to be achieved at the same time as the subjects are occupied with imagery. Also, it was found that whereas intra-modal (visual or auditory) imagery did affect perception, cross-modal imagery did not. This indicates, besides other evidence (see also Kosslyn, 1980, 1994), that imagery basically employs the same neural 'channels' as well as cognitive mechanisms as are needed for perception.

Among the experiments related to imagery for melodies and melody-like phrases, there were some which by their design are not too far away from Kochmann's investigations. In the study of Weber & Brown (1986), subjects were asked to designate the contour of simple melodies (all in 4/4 meter) which they heard sung from tape. This was done by writing horizontal lines of equal length, one for each quarter note, whereby the positions of the lines relative to each other represent the relative pitches of the melody (as well as steps of a diatonic scale). The melodies were either sung with the proper words ('song condition'), or only with the syllable 'ba' ('melody condition') on each note. Subjects in one trial were allowed to sing the melodies or songs aloud while drawing the contour, and in a second trial had to designate the contour while auditorily imagining the musical phrase. The parameters measured were processing time and errors made by subjects. No significant effect as to the overt versus the imagined condition was observed. However, songs were processed faster than were melodies alone. The results suggest that, with respect to this specific task, overt singing does not give better results in recognition of simple melodic phrases than does imagined singing.
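The line-drawing response used by Weber & Brown is essentially a contour code: only the relative positions of successive pitches matter, not their absolute values. A minimal sketch of such a reduction (our illustration, not their actual procedure):

```python
def contour(pitches):
    """Reduce a pitch sequence (e.g., MIDI numbers) to its up/down/same contour."""
    return ''.join(
        'U' if b > a else 'D' if b < a else 'S'
        for a, b in zip(pitches, pitches[1:])
    )

# First phrase of 'Frere Jacques': C D E C C D E C
print(contour([60, 62, 64, 60, 60, 62, 64, 60]))  # UUDSUUD
```

Under such a code, a transposed melody and its original yield the same string, which is one reason contour tasks are well suited to comparing overt and imagined singing.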

Other studies considered imagery of (complex) tones and chords (Hubbard & Stoeckig, 1988) as well as imagery for timbres (Crowder, 1989; Pitt & Crowder, 1992). Hubbard and Stoeckig confirmed some of the observations already made by Stumpf (1883, 1890, 1907, 1911, 1918), namely that images in certain cases can substitute for perceptions, and that the time needed to form images of, for example, chords or other musical objects in general increases with complexity. Crowder had subjects judge whether or not two tones of a pair were equal in pitch. The timbre of the two tones could be either the same or different. From the reaction time analysis, Crowder (1989, p. 474 and p. 477) concluded that people are faster to judge that two tones have the same pitch if they also have the same timbre than otherwise. This result remained virtually the same when the first of the two tones was an internally generated image of a timbre rather than a true tone. In experimental work related to that just mentioned, Pitt and Crowder (1992) found that for imagery of timbre, spectral characteristics are most important for subjects. This is of interest since in many investigations of timbre perception, the transient and onset portion of sounds (plucked, bowed, blown etc.), and thus temporal and dynamic features, turned out to be a cue for identification of timbres (see, e.g., Iverson & Krumhansl [1993] and references there), which however may sometimes have been overestimated (see findings in Reuter [1995]).

Musical imagery is not only closely connected to perception and memory; it relates also to research in synaesthesia and to semiotics of music. As to the latter, it is interesting to see, for example, what musically trained subjects conceive of works they either only hear, or read as a score, and which 'connotations' arise from either hearing or reading a work which evidently calls for formation of images (for a case study, see Schneider, 1995).

Challenges for musical imagery research

One conclusion to our brief overview here is that musical imagery is a composite or 'impure' phenomenon in the sense that it comprises many things at the same time. In line with this, and in looking forward, it could be useful to make a short assessment of what we see as some important challenges for the study of musical imagery.

Obviously, there is a need for more knowledge about the neurological bases of imagery. An overview of this is presented in the next chapter (Janata, this volume), but from what is already known today, there seems to be a 'functional equivalence' between perception and imagery, meaning that many of the same neurological substrates involved in 'primary' perception are also involved in 'pure' imagery. The distinction between perception and imagery seems, then, not so clear-cut even on the neurological level, something which may help us understand better the complex interactions of more 'primary' perception and more 'pure' imagery, a topic which, as we saw in the previous sections of this chapter, has been recurrent in the history of musical imagery. Related to this is the issue of the neurological bases of contextual images, i.e. the workings of memory images of the recent past and expectancy images of what is to come next, what in the above-mentioned terminology of Husserl was called, respectively, 'retentions' and 'protentions'. Also, advances in knowledge of the neurological bases of cross-modality (Stein & Meredith, 1993) could hopefully help us to better understand what triggers images of musical sound in our minds. Various studies referred to above seem to suggest that there are in many cases strong links between visual imagery, motor imagery, and musical imagery.

There are many questions concerning the relationship between imagery and various schemata in perception and cognition (see for instance chapters 4, 5, and 8 in this book). Does musical imagery follow learned schemata, and are various schemata for musical sound (such as categories for pitch relationships, for harmonic, melodic and formal elements, rhythmic patterns, etc.) really instances of more long-term or 'slower' kinds of musical imagery? Related to this are questions about the ecological content of musical imagery, meaning the 'concrete' and particular qualia of the images as opposed to the more 'abstract' structural features, e.g. the distinction between a detailed, salient image of a particular vocal performance of a well-known tune and a more indistinct and 'generalized' image of that tune. This distinction between 'concrete' and 'abstract', adopted from Pierre Schaeffer (Schaeffer, 1966), could also be seen as a distinction between particular and general, or between low-level and high-level features in images, something which has been studied in visual imagery (Rouw, Kosslyn & Hamel, 1997). This is again related to the ecological constraints at work in musical imagery, meaning the question of whether all the material in musical imagery is derived from experienced sound and is in accordance with principles of sonorous behaviour in the 'real world', e.g. that when we imagine a tune, we also imagine some kind of 'carrier' or performance of this tune, be that our own sub-vocalizations, our own imagined fingers moving along a keyboard, imagining someone else singing or playing, etc.

It could be tempting to use the term 'dynamics of musical imagery' to denote not only the possible shifts between different qualia of images as just mentioned (e.g. shifting between different timbres in the imagery of a well-known tune), but also a number of other, apparently not well explored, aspects of musical imagery. For one thing, there will probably be highly variable degrees of salience or acuity in our images of musical sound; e.g., sometimes we may have very clear and intense images of sound, at other times images may seem pale and distant. This possibly has to do with priming and 'recency' effects, but it could be very useful (in particular for practical applications of musical imagery) to have a better understanding of such shifts in acuity. Related to this is the apparent possibility of variable resolution in musical imagery, meaning that we are probably all capable of zooming in on detail, re-playing some fragment again and again or in slow motion, zooming out, playing 'fast forward', etc. Even the vague, macroscopic, retrospective, cumulative images of long works of music, e.g. the sense of recollecting an entire concert 'in a now', could be included in this dynamics of musical imagery.

And of course, musical imagery has some very practical applications, and may be considered integral to musical craftsmanship. For this reason, 'ear-training' (or solfège) has traditionally been a part of the curriculum of most schools of music. There is much to be said about the efficiency of the pedagogical methods used in ear-training, and it seems quite clear that this subject could profit from advances in the understanding of musical imagery, in particular when we consider the demands placed on composers and arrangers to make reasonable predictions of how their compositions or arrangements are going to sound. The same goes of course for conductors and for other performers as well, and will here be closely linked with mental practice of the motor components of performance. Notably, this is not only relevant for the professional musician, but also for music education at all levels. We can for instance think of string instrument education for children, where imagining sound is a crucial element and has in fact been implemented in some methods of teaching.

Such practical applications of musical imagery, i.e. the capability of generating salient images of musical sound more or less at will, seem to concern one of the least studied, yet in our opinion most crucial, aspects of musical imagery: What is it that triggers images of musical sound in our minds, or what is the 'engine' or the driving force? There are some studies which suggest that the triggering of musical imagery is closely linked with motor imagery (Mikumo, 1994, 1998), meaning that imagining sound-producing actions will also trigger mental images of the resultant sounds. This could be understood as related to the idea of 'motor theory' in perception and cognition, a theory which has been controversial but which now seems to gain support from the application of brain observation techniques, producing data which suggest that motor areas of the brain are indeed involved in musical imagery tasks.

From these last remarks, we think it is fitting to conclude this introductory chapter by situating musical imagery at the intersection of musicianship and several scientific disciplines. This means that musical imagery will be the meeting place for subjective images and more universally observable phenomena, something which in turn means that the study of musical imagery will have to draw both on personal introspections and on various inter-subjective methods of research.

Notes

1. Monologe (für zwei Klaviere), Mainz: Schott 1964; Musique pour les soupers du Roi Ubu. Ballet noir en sept parties et une entrée, Kassel and Basel: Bärenreiter 1966. Recorded versions of both works will be found in the anthology Zeitgenössische Musik in der Bundesrepublik Deutschland, Vol. 5, edited by the Deutscher Musikrat, Deutsche Harmonia Mundi (EMI) DMR 1013-15 (1983).

2. Most of the relevant material (including papers and sketches previously unpublished) is contained in Brentano (1976). Some of Brentano's investigations pertaining to perception, including musical issues, are found in Brentano (1979). As to Brentano's epistemology, and especially with respect to his concepts of time and time perception, see Bergmann (1967, pp. 320ff).

3. See Gurwitsch (1975, pp. 60ff). Gurwitsch was a student of Husserl and the Gestalt psychologist Wolfgang Köhler who discussed Husserl's (and Stumpf's) works with respect to Gestalt theory. In his book, he offers a concise introduction to the phenomenological concepts of perception and cognition.

4. Another systematic account is of course offered by William James in his Principles of psychology (James, 1890/1981, chapters XVII [sensation] and XVIII [imagination]).

5. This implies that, for example, a Pythagorean major third 81/64 actually played would be 'simplified' perceptually to the just major third 5/4. With respect to Riemann's Ideen as well as to Stumpf's Konkordanz, there are a number of unsolved factual and methodological problems (see Schneider, 1986, pp. 182ff).

6. It has been argued by Sloboda (1985, p. 192) that because of a rather simple structure of the Miserere, Mozart's 'memorization... does not involve inexplicable processes which set him apart from ordinary musicians.'

7. There are several studies, however, which have to do with recall of (familiar or new) melodies from memory, and which basically explore related problems; see e.g. Davies (1978, ch. 5), Sloboda (1985, pp. 183ff) and Crowder (1983). In this respect as well as in others, it is not always easy to separate perception from imagery and other aspects of conceptualization and memory.

References

Bergmann, G. (1967). Realism: A critique of Brentano and Meinong. Madison: University of Wisconsin Press.
Block, N. (Ed.) (1981). Imagery. Cambridge, Mass. and London: The MIT Press.
Boring, E. (1950). A History of experimental psychology (2nd ed.). Englewood Cliffs: Prentice Hall.
Brentano, F. v. (1874, 1924, 1928). Psychologie vom empirischen Standpunkt, Vols. 1-3. Leipzig: F. Meiner.
Brentano, F. v. (1976). Philosophische Untersuchungen zu Raum, Zeit und Kontinuum (edited by St. Körner and R. Chisholm). Hamburg: Meiner.
Brentano, F. v. (1979). Untersuchungen zur Sinnespsychologie (2nd ed., edited by R.M. Chisholm and R. Fabian). Hamburg: Meiner.
Carrier, M. (1995). Philosophy of mind. In J. Mittelstrass (Ed.), Enzyklopädie Philosophie und Wissenschaftstheorie (Vol. 3, pp. 220-226). Stuttgart and Weimar: Metzler.
Carrier, M. and Mittelstrass, J. (1991). Mind, brain, behavior. The mind-body problem and the philosophy of psychology. Berlin and New York: de Gruyter.
Casey, E.S. (1976). Imagining. A phenomenological study. Bloomington: Indiana University Press.
Chion, M. (1983). Guide des objets sonores. Paris: Editions Buchet/Chastel.
Chisholm, R. (1956). Perceiving. Ithaca, N.Y.: Cornell University Press.
Clarke, J.M. (1994). Neuroanatomy: brain structure and function. In D.W. Zaidel (Ed.), Neuropsychology (pp. 31-51). San Diego, London: Academic Press.
Clifton, T. (1976). Music as constituted object. In F.J. Smith (Ed.), In Search of musical method (pp. 73-98). London and N.Y.: Gordon & Breach.
Clifton, T. (1983). Music as heard. A Study in applied phenomenology. New Haven and London: Yale University Press.
Crowder, R.G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15, 472-478.
Crowder, R. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in sound. The cognitive psychology of human audition (pp. 113-145). Oxford: Clarendon Press.
Davies, J.B. (1978). The Psychology of music. London: Hutchinson.
Dennett, D.C. (1991). Consciousness explained. Boston: Little, Brown & Co.
Descartes, R. (1637). Discours de la méthode pour bien conduire sa raison et chercher la vérité dans les sciences. Paris: M. Soly.
Descartes, R. (1642). Meditationes de prima philosophia (ed. alt.). Amsterdam: L. Elzevir.
Duchesneau, L. (1986). The Voice of the muse: A study of the role of inspiration in musical composition. Frankfurt: P. Lang.
Ebbinghaus, H. (1885). Über das Gedächtnis. Untersuchungen zur experimentellen Psychologie. Leipzig: J.A. Barth.
Ehrenfels, C. v. (1988). On 'Gestalt Qualities'. In B. Smith (Ed.), Foundations of Gestalt Theory (pp. 82-117). München/Wien: Philosophia Verlag.
Fokker, A.D. (1945). Rekenkundige Bespiegeling der muziek (A mathematical approach to music). Gorinchem: J. Noorduijn en Zoon.
Georges, K.E. (1869). Ausführliches Lateinisch-deutsches Handwörterbuch, Vol. 1 (6th edition). Leipzig: Hahn'sche Verlagsbuchhandlung.
Gómez de la Serna, R. (1977). Dalí. Madrid: Espasa & Calpe.
Gurwitsch, A. (1975). Das Bewusstseinsfeld (edited by W. Fröhlich). Berlin and N.Y.: de Gruyter.
Hubbard, T.L., & Stoeckig, K. (1988). Musical imagery: Generation of tones and chords. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 656-667.
Hume, D. (1758/1951). Enquiry concerning human understanding (2nd ed., reprint 1951). Oxford: Clarendon Press.
Husserl, E. (1901). Logische Untersuchungen. Elemente einer phänomenologischen Aufklärung der Erkenntnis, Vols. 1-3. Halle: Niemeyer.
Husserl, E. (1928). Husserls Vorlesungen zur Phänomenologie des inneren Zeitbewusstseins (edited by M. Heidegger). Jahrbuch für Philosophie und phänomenologische Forschung, 9, 367-496.
Husserl, E. (1976). Erfahrung und Urteil. Untersuchungen zur Genealogie der Logik (5th ed.). Hamburg: Meiner.
Ingarden, R. (1962). Untersuchungen zur Ontologie der Kunst. Tübingen: Niemeyer.
Iverson, P. and Krumhansl, C.L. (1993). Isolating the dynamic attributes of musical timbre. Journal of the Acoustical Society of America, 94, 2595-2603.
Jaensch, E. (1927). Die Eidetik und die typologische Forschungsmethode (2nd ed.). Leipzig: Quelle & Meyer. (English translation by ?? (1930): Eidetic imagery and typological methods of investigation. London: K. Paul, Trench, Trubner & Co.; New York: Harcourt, Brace & Co.)
James, W. (1890/1981). The Principles of psychology, Vols. 1-3 (edited by F. Bowers and I.K. Skrupskelis). Cambridge, MA/London: Harvard University Press.
Kant, I. (1787). Kritik der reinen Vernunft (2nd ed.). Riga: Hartknoch.
Kant, I. (1792). Kritik der Urteilskraft. Königsberg and Riga: Hartknoch.
Klinger, E. (Ed.) (1981). Imagery [2]: Concepts, results, and applications. London, New York: Plenum Press.
Knepler, G. (1991). Wolfgang Amadé Mozart. Annäherungen. Berlin: Henschel-Verlag.
Kochmann, R. (1923). Über musikalische Gedächtnisbilder. Zeitschrift für angewandte Psychologie, 22, 329-351.
Kosslyn, S.M. (1980). Image and mind. Cambridge, MA: Harvard University Press.
Kosslyn, S.M. (1994). Image and Brain: The Resolution of the Image Debate. Cambridge, Mass. and London: The MIT Press.
Kurth, E. (1947). Musikpsychologie (2nd ed.). Bern: Krompholz.
Mainwaring, J. (1933). Kinaesthetic factors in the recall of musical experience. British Journal of Psychology, 23, 284-307.
Marin, O.S.M. & Perry, D.W. (1999). Neurological Aspects of music perception and performance. In D. Deutsch (Ed.), The Psychology of music (2nd ed., pp. 653-724). San Diego etc.: Academic Press.
McDaniel, M.A. & Pressley, M. (Eds.) (1987). Imagery and related mnemonic processes. Theories, individual differences, and applications. New York, Berlin etc.: Springer.
Meyer, L.B. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Mikumo, M. (1994). Motor encoding strategy for pitches of melodies. Music Perception, 12, 175-197.
Mikumo, M. (1998). Encoding strategies for pitch information. Japanese Psychological Monographs, No. 27.
Näätänen, R., & Winkler, I. (1999). The Concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826-859.
Newman, E.B. (1948). Perception. In E.G. Boring, H.S. Langfeld & H.P. Weld (Eds.), Foundations of psychology (pp. 215-249). New York: Wiley (London: Chapman & Hall).
Pitt, M.A. & Crowder, R. (1992). The role of spectral and dynamic cues for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 18, 728-738.
Reuter, C. (1995). Der Einschwingvorgang nichtperkussiver Musikinstrumente. Frankfurt, etc.: Lang.
Riemann, H. (1914-16). Ideen zu einer 'Lehre von den Tonvorstellungen'. Jahrbuch Peters, 21/22 (1914/15), 1-26; 23 (1916), 1-15.
Rouw, R., Kosslyn, S.M., & Hamel, R. (1997). Detecting high-level and low-level properties in visual images and visual percepts. Cognition, 63, 209-226.
Sacks, O. (1985). The Man who mistook his wife for a hat. New York: Summit Books.
Sartre, J.P. (1940). L'Imaginaire. Psychologie phénoménologique de l'imagination. Paris: Gallimard.
Schaeffer, P. (1966). Traité des objets musicaux. Paris: Editions du Seuil.
Scheerer, E. (1993). Mentale Repräsentation in interdisziplinärer Perspektive. Zeitschrift für Psychologie, 201, 136-166.
Schneider, A. (1992). On Concepts of 'tonal space' and the dimensions of sound. In R. Spintge & R. Droh (Eds.), MusicMedicine (pp. 102-127). St. Louis: MMB Music.
Schneider, A. (1995). Musik sehen - Musik hören. Über Konkurrenz und Komplementarität von Auge und Ohr. Hamburger Jahrbuch der Musikwissenschaft, 13, 123-150.
Schneider, A. (1997a). On categorical perception of pitch and the recognition of intonation variants. In P. Pylkkänen, P. Pylkkö, A. Hautamäki (Eds.), Brain, mind and physics (pp. 250-261). Amsterdam and Tokyo: IOS Press and Ohmsha.
Schneider, A. (1997b). 'Verschmelzung', tonal fusion, and consonance: Carl Stumpf revisited. In M. Leman (Ed.), Music, Gestalt, and Computing (pp. 117-143). Berlin, New York etc.: Springer.
Schütz, A. (1976). Fragments on the phenomenology of music (edited by F. Kersten). In F.J. Smith (Ed.), In Search of musical method (pp. 5-71). London: Gordon & Breach.
Scruton, R. (1997). The Aesthetics of music. Oxford: Clarendon Press.
Segal, S.J. (Ed.) (1971). Imagery: Current cognitive approaches. New York, London: Academic Press.
Segal, S.J. & Fusella, V. (1970). Influence of imagined pictures and sounds on the detection of visual and

auditory signals. Journal ofExperimental Psychology, 83, 458-464.


26 PERSPECTIVES AND CHALLENGES OF MUSICAL IMAGERY

Shorr, J.E. et al. (Eds.) (1980). Imagery [1]: Its many dimensions and applications. New York, London: Plenum Press.
Slawson, W. (1985). Sound color. Berkeley and Los Angeles: University of California Press.
Sloboda, J. (1985). The musical mind: The cognitive psychology of music. Oxford: Clarendon Press.
Stein, B.E. & Meredith, M.A. (1993). The merging of the senses. Cambridge, MA: The MIT Press.
Stoffer, T. (1996). Mentale Repräsentation musikalischer Strukturen. Zeitschrift für Semiotik, 18, 213-234.
Stumpf, C. (1883, 1890). Tonpsychologie, Vols. 1 & 2. Leipzig: Hirzel.
Stumpf, C. (1898). Konsonanz und Dissonanz. Leipzig: J. Barth.
Stumpf, C. (1907). Erscheinungen und psychische Funktionen. Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, Jahrgang 1906, Phil.-hist. Klasse Nr. 4. Berlin: Akademie der Wissenschaften.
Stumpf, C. (1911). Konsonanz und Konkordanz. Zeitschrift für Psychologie, 58, 321-355.
Stumpf, C. (1918). Empfindung und Vorstellung. Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, Jahrgang 1918, Phil.-hist. Klasse Nr. 1. Berlin: Akademie der Wissenschaften.
Stumpf, C. (1926). Die Sprachlaute. Berlin: Springer.
Stumpf, C. (1939). Erkenntnislehre, Vols. 1 & 2. Leipzig: J. Barth.
Vogel, M. (1993). On the relations of tone. Bonn: Verlag für Systematische Musikwiss.
Weber, R.J. & Brown, S. (1986). Musical imagery. Music Perception, 3, 411-426.
Wilbanks, J. (1968). Hume's theory of imagination. The Hague: Nijhoff.


2

Neurophysiological Mechanisms Underlying Auditory Image Formation in Music

Petr Janata

Introduction

The formation of contextually dependent expectancies is an important feature of music cognition. Both explicit and implicit knowledge about the structure of a piece of music serve to establish highly specific expectations about the pitch, timbre, and other features of ensuing musical information. Musical expectancies represent a specific type of musical imagery. On the one hand, musical imagery might be thought of as a mental process that occurs over an extended period as a person imagines hearing or performing a piece of music. This type of imagery differs from expectancy formation in that it may transpire in the absence of sensory input. Active expectancy formation, on the other hand, generally requires that specific images for subsequent sensory input are based on preceding sensory input and established knowledge of what type of sensory input to expect.

A neuroscientific way of framing the general question is, 'What are the brain areas and mechanisms that support the formation of such images and the interaction of these images with incoming sensory information?' Electrophysiological measures of brain activity provide a description of how the human brain implements these types of processes, and a variety of different experimental designs can be used to address various components of these processes. Over the past 30 years, studies of auditory evoked potentials have provided support for models of how voluntarily maintained


images (expectancies) of single tones interact with sensory input consisting of multiple auditory stimuli. Stimuli based on musical considerations 1) extend the types of designs that can be used to study mechanisms of auditory image formation, 2) provide important tests of the existing models, and 3) provide a framework, rooted in the neuroethological tradition (Pflüger & Menzel, 1999), for understanding the neural underpinnings of human musical behavior.

Forms of musical imagery

Perhaps the first step in studying musical imagery is to place musical imagery in the broader context of auditory imagery and the general domain of mental imagery, if for no other reason than to borrow from definitions of mental imagery derived primarily from considerations of the form and formation of visual images. Finke (1989) defines mental imagery as 'the mental invention or recreation of an experience that in at least some respects resembles the experience of actually perceiving an object or an event, either in conjunction with, or in the absence of, direct sensory stimulation.'

In order to pinpoint and characterize specific neural mechanisms underlying musical imagery, it is necessary to define what a musical image is and what the processes of forming such an image or series of images are. Mirroring Finke's definition, I consider two contexts in which musical imagery occurs. In the first context, musical imagery is purely a mental act: an endogenous phenomenon in which the content of the images is internally generated from long-term memory stores of musical knowledge and is uninfluenced by any concurrent sensory input. In the second context, formation of musical images depends on an interaction of memory-dependent processes (expectancies) with representations of incoming auditory input.

The relationship between perception and mental imagery has been elaborated and tested extensively with visual material by Kosslyn (1980, 1994). In Kosslyn's view (1994, p. 287), 'images are formed by the same processes that allow one to anticipate what one would see if a particular object or scene were present.' Thus, postulating two contexts for musical imagery is in keeping with other theories of mental imagery.

Figure 1 on the facing page shows a theoretical framework for thinking about how imagery processes within these two different contexts might be instantiated in a set of brain structures. Because a complete theory of musical imagery should also include imagery for musical performance and the interaction of sensory and motor information, the diagram in Figure 1 is restricted, in the interest of relative simplicity, to representing processes that may be involved in 'sensory' imagery rather than 'motor' imagery. Following a brief description of the framework, I summarize the physiological methods and experiments in support of it.

The arrows represent the flow of information through time across the different general brain areas listed at the right. Those brain areas involved more immediately with sensory processing are listed at the bottom, while those involved in abstract reasoning and memory storage/retrieval are listed toward the top. The first type of auditory imagery unfolds over longer time periods (seconds or minutes) and is generally unconstrained by sensory input. I call it 'non-expectant' because we neither expect to hear anything as we are imagining, nor are we forming expectations of what we will


[Figure 1: 'Forms of imagery in relation to brain structures (a sensory perspective)'. Arrows trace 'non-expectant' imagery (in 'abstract' and 'eidetic' modes) and 'expectant' imagery over time across prefrontal cortex, association cortex, sensory cortex, and subcortical structures.]

Figure 1. Schematic view of different types of auditory imagery and how these might be instantiated in the human brain (see text for details).

hear in the immediate future. This is the type of imagery we engage in when we imagine a melody in our mind. Similarly, we might mentally improvise melodies that we have never heard before but are able to compose based on knowledge, either explicit or implicit, of tonal sequences, etc. Thus, this type of imagery relies on long-term memories we have of specific musical material, or on a more abstract knowledge of musical structure, e.g. the tonal relationships in western tonal music.

Non-expectant imagery may be differentiated further into two modes of imagery that I call 'abstract' and 'eidetic'. In the abstract mode, the sequence of pitches in a melody might be imagined without the sense of 'hearing' the melody being played by any particular instrument. In the eidetic mode, a strong component of the image is the impression that an instrument or group of instruments is performing the imagined piece of music, and the image has a strong sensory quality to it.

To the extent that this 'non-expectant' imagery depends on retrieving information from long-term memory stores, it may rely heavily on areas of prefrontal cortex which have been implicated in general memory functions (Goldman-Rakic, 1996). One prediction of the 'abstract/eidetic' distinction is that the eidetic qualities of images engage brain regions more immediately involved in processing sensory information. Increased eidetic qualities of the images are represented by increases in the depths of the arcs of the solid arrows in Figure 1. For example, a strong impression of hearing an orchestra in our minds might be indicative of auditory cortex involvement in the imagery process. The relationship between the vividness of the mental image and the brain areas that are activated by the image remains to be tested. A modicum of support for the notion that more 'true-to-life' images activate those brain areas more immediately involved in the formation of sensory representations comes from neuroimaging studies of schizophrenic patients experiencing auditory hallucinations, which show that auditory cortical areas, including primary auditory cortex, are activated during hallucinations (Dierks et al., 1999; Griffiths, Jackson, Spillane, Friston, & Frackowiak, 1997).


'Expectant' imagery refers to the process of forming mental images when listening, attentively, to music or sounds in general. In addition to relying on long-term memory for musical structure or a specific piece of music, the mental images are additionally constrained by the interactions of contemporaneous sensory information with the memorized information. In other words, as we listen to the notes of an ascending major scale we can form a very specific image/expectation of the next note in the scale. The specificity arises in part from our knowledge of the intervallic relationships between successive notes in a major scale, as well as the exact frequencies of the notes being used to create this particular instance of the scale. If the playing of the notes in the scale were to stop, we could continue forming images of the remaining notes. Similarly, in listening to a chamber ensemble playing a familiar piece of music, our expectancies are formed from familiarity with the piece of music as well as the sensory information we are receiving about the tone of these particular instruments, or the expressive tendencies of the particular ensemble. In Figure 1, the merging of the straight arrows represents the interaction of 'top-down' expectancies with 'bottom-up' sensory input, and a subsequent influence of this interaction on the expectancy/image-forming processes in a continuous, iterative process. The extensive literature on the formation of auditory representations (reviewed in Näätänen & Winkler, 1999) and the interaction of these representations with top-down influences such as selective attention (Näätänen, 1992) implicates areas such as the secondary auditory cortex as a neural substrate for these interactions.

Methods for probing human brain activity

Inferences about brain function are made by measuring changes in one or several dependent variable(s) as a subject performs specific cognitive tasks. The dependent variables can range from behavioral measures such as reaction time and accuracy to physiological measures of regional cerebral blood flow or field potentials generated by neurons. I will focus on the physiological measures, first providing a brief description of the signals being measured, along with the measurement and data analysis techniques, and then summarizing results of studies that are of particular relevance to musical imagery.

Studies of the brain's physiological responses typically strive to 1) identify brain areas that are responsible for performing specific tasks or computations, and 2) describe the neural mechanisms by which stimuli are represented and cognitive tasks are performed. Functional imaging techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) are well suited to address the first goal. Both of these methods monitor blood flow changes in the brain. As large populations of neurons in those brain areas that perform a cognitive task become active and increase their metabolism, the blood supply to those areas increases in order to meet the increased demand for oxygen (Villringer, 1999). Note that neuronal activity is modulated more quickly than is the associated blood flow, resulting in a temporal resolution limit (peak response is 2-4 seconds from onset of event) of PET and fMRI. While PET and fMRI provide only an indirect measure of neural activity, they do so with much better spatial resolution than do direct, non-invasive measurements


of neural activity. Thus, these methods are invaluable tools for localizing cognitive functions, and their application to issues of auditory imagery is described below.

The temporal properties of neural responses in cognitive tasks are best captured by direct measures of the neural activity. The electrical fields generated by large populations of neurons comprise the electroencephalogram (EEG), and the magnetic counterpart to the electrical fields forms the magnetoencephalogram (MEG). Given the distance of EEG recording electrodes positioned at the scalp from the cortical surface, large populations of similarly oriented neurons must be active synchronously in order for them to create a sufficiently strong electrical field that can be detected at the scalp surface (Nunez, 1981). The superposition of many electrical fields from many neuronal populations, along with the low-pass filtering characteristics of the skull, makes the problem of unambiguously localizing the neural sources based on EEG data a difficult one.

The nature of the experimental situation generally dictates how the EEG is analyzed. When the experiment consists of short, discrete, and clearly defined stimuli, such as single tones embedded in longer sequences of tones, the stimuli are presented many (30-1000) times while the EEG is recorded. The EEG responses to each presentation are then averaged in order to extract the mean waveform. This waveform is interpreted as the unique response to the particular type of stimulus, and is generally referred to as an event-related potential (ERP) waveform. The ERP waveform is analyzed in terms of the amplitudes, latencies, and areas of the peaks and troughs. The nomenclature reflects the polarity and typical latency of the deflection in the ERP waveform. For example, the auditory N100 is a negative peak (as measured from an electrode at the vertex of the head relative to a mastoid, ear, nose, or non-cephalic reference electrode) which occurs approximately 100 ms following the onset of the stimulus. Features of the ERP waveform, such as the N100 or P300, are commonly called 'components' to indicate their dependence on the perceptual/cognitive factors that modulate their presence, size, and latency. Individual ERP components, particularly those occurring in the hundreds of milliseconds, do not necessarily reflect unitary cognitive phenomena. For example, the amplitude of the N100 is modulated by both physical features of the stimulus as well as the attentional state of the subject. The N100 represents a conglomerate of brain processes associated with processing an auditory stimulus (Näätänen & Picton, 1987), and it may overlap with other ERP components such as the mismatch negativity (MMN) (Näätänen, 1992).
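The epoch-averaging and peak-measurement procedure described above can be sketched in a few lines. This is an illustrative example, not code from the chapter: the sampling rate, trial count, and synthetic 'N100'-like waveform are my assumptions, and real ERP pipelines add filtering, artifact rejection, and baseline correction.

```python
# Sketch of ERP extraction: average many single-trial epochs, then measure a
# component's peak latency and amplitude within a time window. Synthetic data.
import numpy as np

def average_erp(epochs: np.ndarray) -> np.ndarray:
    """Average single-trial epochs (n_trials x n_samples) into one ERP waveform."""
    return epochs.mean(axis=0)

def peak_in_window(erp: np.ndarray, fs: float, t0: float, t1: float, polarity: int):
    """Return (latency_s, amplitude) of the extreme of given polarity in [t0, t1]."""
    i0, i1 = int(t0 * fs), int(t1 * fs)
    window = erp[i0:i1] * polarity          # flip sign so we always search a maximum
    i_peak = i0 + int(np.argmax(window))
    return i_peak / fs, erp[i_peak]

# Synthetic example: 200 noisy trials of a negative deflection near 100 ms
fs = 250.0                                  # assumed sampling rate (samples/s)
t = np.arange(int(0.5 * fs)) / fs           # 0-500 ms epoch
signal = -2.0 * np.exp(-((t - 0.1) ** 2) / (2 * 0.02 ** 2))  # "N100"-like dip
rng = np.random.default_rng(0)
epochs = signal + rng.normal(0, 5.0, size=(200, t.size))     # single trials are noisy

erp = average_erp(epochs)                   # noise shrinks roughly as 1/sqrt(n_trials)
latency, amplitude = peak_in_window(erp, fs, 0.05, 0.15, polarity=-1)
print(f"N100-like peak at {latency * 1000:.0f} ms, {amplitude:.2f} uV")
```

Averaging works because the stimulus-locked response is (assumed) identical across trials while the background EEG is not, so the residual noise in the mean waveform shrinks with the number of presentations.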

In tasks employing continuous stimulus situations in which individual discrete events cannot be identified, such as listening to a recorded piece of music, the EEG is typically analyzed in the frequency domain. Here, the assumption is that the performance of any given cognitive task will be associated with a sustained and stable pattern of neural activity involving some number of brain areas. The field potential arising from the neural activity pattern is then described by its frequency spectrum. Typically, successive 2 s EEG epochs are converted into the frequency domain and the average power spectrum is computed. The magnitude of the power spectrum indicates the strength with which oscillations at a particular frequency are represented in the EEG during a cognitive state. Brain activity associated with any given cognitive task can be isolated by subtracting the average power spectrum during the task from the average power spectrum during rest, and the synchronization of different brain regions


can be assessed through the frequency-specific coherence between pairs of electrodes (Rappelsberger & Petsche, 1988; Srinivasan, Nunez, & Silberstein, 1998). Overall, the spectral analysis of the EEG tends to provide information about global brain dynamics associated with a particular cognitive task, whereas ERPs are used to elucidate the sequence of processing steps associated with discrete stimulus events.
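As a sketch of this frequency-domain approach, with synthetic signals standing in for EEG and SciPy's Welch and coherence estimators as one conventional choice of tooling, the task-minus-rest power subtraction and electrode-pair coherence might look like:

```python
# Sketch of spectral EEG analysis: average power over successive 2-s epochs,
# task-minus-rest power difference, and coherence between an electrode pair.
# The signals below are synthetic stand-ins, not recordings.
import numpy as np
from scipy.signal import welch, coherence

fs = 250.0
n = int(60 * fs)                            # one minute of "EEG" per condition
rng = np.random.default_rng(1)
t = np.arange(n) / fs

alpha = np.sin(2 * np.pi * 10 * t)          # shared 10 Hz "alpha" oscillation
rest_a = alpha + rng.normal(0, 1, n)        # electrode A at rest
task_a = 0.3 * alpha + rng.normal(0, 1, n)  # alpha suppressed during the task
task_b = 0.3 * alpha + rng.normal(0, 1, n)  # electrode B, sharing the oscillation

nper = int(2 * fs)                          # 2-s analysis epochs, as in the text
f, p_rest = welch(rest_a, fs=fs, nperseg=nper)   # power averaged over epochs
_, p_task = welch(task_a, fs=fs, nperseg=nper)
task_effect = p_task - p_rest               # task-minus-rest power difference

_, coh = coherence(task_a, task_b, fs=fs, nperseg=nper)
i10 = int(np.argmin(np.abs(f - 10)))        # index of the 10 Hz bin
print(f"10 Hz power change: {task_effect[i10]:.2f}; coherence at 10 Hz: {coh[i10]:.2f}")
```

In this toy case the task-minus-rest difference at 10 Hz is negative (the oscillation was suppressed), while the two "electrodes" show high 10 Hz coherence because they share the same underlying oscillation.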

Physiological measures of mental images

'Non-expectant' imagery

What is the manifestation of different types of auditory/musical imagery in the brain, and what is the evidence supporting the functional architecture described above? To the extent that 'non-expectant' imagery establishes a set of stationary processes in the brain, i.e. a stable pattern of activity within specific neural circuits over the duration that subjects perform an imagery task, it should be possible to capture signatures of these processes in EEG recordings. Petsche and colleagues have found wide-spread coordination, as manifested in the coherence of the EEG, of brain areas as subjects imagine hearing or composing musical material (Petsche, Richter, von Stein, Etlinger, & Filz, 1993) or mentally play an instrument (Petsche, von Stein, & Filz, 1996). Imagining a piece of music leads to an increase in the number of observed coherence changes compared to listening to the same piece of music. The exact patterns of changes differ appreciably among subjects, however. For example, in one subject imagery is associated with theta and alpha band decreases and beta band increases, whereas in another subject there are coherence increases across all frequency bands.

So far, the best evidence for those brain areas involved in auditory imagery in the absence of acoustic input comes from PET studies by Zatorre and colleagues (Halpern & Zatorre, 1999; Zatorre, Evans, & Meyer, 1994; Zatorre, Halpern, Perry, Meyer, & Evans, 1996). In their tasks, mentally scanning through a melody results in superior temporal gyrus (STG) and right frontal lobe activations. The auditory cortex lies along the STG. The frontal lobes are widely implicated in memory retrieval processes (Goldman-Rakic, 1996). The observation that the auditory cortex is activated during these tasks is extremely important because it indicates that those structures responsible for the processing of auditory stimuli are also activated under certain conditions of musical/auditory imagery when no sensory stimuli are present. Indirectly, these results suggest that discrete, mentally generated auditory images might be compared against incoming sensory information in the auditory cortex.

Formation of specific musical expectancies

Although spectral analysis of the EEG is typically applied to sustained tasks rather than to event-related tasks, it has been applied to analyzing the build-up and resolution of harmonic expectancies (Janata & Petsche, 1993), implicating right frontal and temporal areas in the processing of cadences and their resolutions. ERPs analyzed as time-domain averages have been used by several researchers to probe musical expectancy. Although the results of the studies differ slightly, the general finding is that unexpected notes and chords elicit larger positive potentials from 300 to 600 ms following the onset of the stimulus than do highly expected notes and chords (Besson


[Figure 2: averaged ERP waveforms at a single electrode, with N100, P200, P3a, and P3b components marked; separate traces for the Tonic, Minor, and Dissonant resolutions; amplitude (µV) plotted against time (0-5000 ms).]

Figure 2. Averaged event-related potentials (ERPs) recorded during a harmonic priming task in which subjects heard a I, IV, V cadence (at 0 ms, 1000 ms, 2000 ms, respectively), imagined the best possible resolution (3000-4000 ms), and heard one of three possible resolutions at 4000 ms, whereupon they had to decide if it was the resolution they had imagined. The waveform components labeled P3a and P3b typically vary in amplitude as a function of expectancy. Adapted from Janata, 1995.

& Faïta, 1995; Besson & Macar, 1987; Hantz, Kreilick, Kananen, & Swartz, 1997; Janata, 1995; see also Janata & Petsche, 1993 for a frequency domain analysis of event-related EEG data; Patel, Gibson, Ratner, Besson, & Holcomb, 1998). The general names in the ERP literature for the large, late positive waves are 'P300' and 'late positive complex' (LPC), and their amplitude is inversely proportional to the subjective probability of the eliciting event (for a review, see Donchin & Coles, 1988; Verleger, 1988). More recently, researchers have focussed on contributions of frontal brain areas to the processing of harmonic expectancies, showing negative shifts in the waveforms for harmonically deviant chords, compared to contextually consonant chords (Koelsch, Gunter, Friederici, & Schröger, 2000; Patel et al., 1998).

Figure 2 shows ERP waveforms in response to chords in a priming (I, IV, V) cadence and three resolutions of the cadence. In this experiment (Janata, 1995), chords in the priming cadences were presented in numerous inversions and several different keys, using a sampled grand piano sound. Each chord was presented for 1 s. For 1 s between the offset of the final chord and the onset of the resolution (3000-4000 ms), subjects imagined the best possible resolution. At 4000 ms, one of three possible resolutions was heard: the expected resolution to the I (thick solid line), a harmonically plausible resolution to the tonic of the relative minor (dashed line), or a harmonically implausible resolution to a triad based on the tritone (thin solid line). The large negative (N100) and positive peaks (P200) characterize the auditory evoked potential. The


response to the harmonically incongruous ending elicited the largest amplitudes in the two P300 components. The brain circuitry that gives rise to these late potentials is not well understood. Intracerebral recordings indicate multiple sites that exhibit P300-like activations (Halgren, Marinkovic, & Chauvel, 1998). Thus, the details of how activity in auditory cortical regions is coordinated with activity in other brain areas necessarily remain murky.

Interestingly, Janata (1995) observed a large waveform in the period when subjects were asked to imagine the best possible resolution, suggesting that this was a measurable brain response to the act of imagining the resolution to the tonic. Unfortunately, the evoked potential may have been caused by the offset of the previous chord, rather than voluntary image formation. Further studies have been performed to investigate evoked potentials elicited by imagined events, and some preliminary results are presented below.

Measures of expectancies in the absence of sensory input

In ERP studies of auditory expectancy, brain potentials resulting from the expectancy-forming process and potentials arising in response to sensory stimulation are combined. Although it is possible to measure the outcome of the interaction of the expectancy (mental image) with the sensory information, determining the electrophysiological signature of each information stream poses a greater challenge. One way of studying the expectation is to simply omit an expected stimulus, thereby removing all sensory components from the ERP response (Besson & Faïta, 1995; Besson, Faïta, Czternasty, & Kutas, 1997; Ruchkin, Sutton, & Tueting, 1975; Simson, Vaughan, & Ritter, 1976; Sutton, Tueting, Zubin, & John, 1967; Weinberg, Walter, & Crow, 1970). Such omissions generate a large P300, typical of unexpected stimuli. In some cases, earlier components reminiscent of the auditory evoked potential are also present (Besson et al., 1997). The relationship between potentials generated in response to unexpected stimulus omissions and voluntarily generated images has not been explored in more detail, however.

These earlier studies raise several questions about the neural processes involved in forming auditory expectancies and the interactions of these expectancies with sensory input. Specifically, can the mental processes involved in mental image formation be dissociated further from the process of expectation? In other words, is it possible to measure emitted potentials associated with the formation of a mental image in the complete absence of an expectation that a stimulus will occur? If the answer is yes, is there any evidence that such emitted potentials arise from the auditory cortex where expectancies and sensory input are believed to interact?

To begin investigating these questions, I performed a study in which musically trained subjects were asked to first listen to and then imagine completions of simple eight-note melodic phrases (Janata, in press). Examples of the melodic fragments and the various experimental conditions are schematized in Figure 3 on the facing page. On each 'imagery' trial (Figure 3B), subjects first heard all eight notes of the melody, and made a key-press synchronously with the last note ('All Heard' condition). Next, they heard the first five notes of the same melody and continued imagining the remaining three, pressing a key at the time they thought the last note would have occurred ('3 Imagined' condition). They then heard the first three notes of the melody, imag-


[Figure 3: panel A shows the two melodies in musical notation (quarter note = 120); panel B schematizes the trial types — All Heard (AH), 3 Imagined (3I), 5 Imagined (5I), and No Imagery (NI) — marking heard notes, imagined notes, and key presses along a 0-4000 ms time axis.]

Figure 3. Melodies and experimental conditions used in the auditory imagery experiments. A) The two simple melodies in musical notation. B) Schematic diagram of imagery trials. In the 'All Heard' condition subjects heard the entire melodic fragment. In the '3 Imagined' condition, the initial five notes were heard and the remaining three imagined. In the '5 Imagined' condition, subjects heard the initial three notes and imagined the remaining five. On each trial, these three conditions appeared in immediate succession. In separate blocks of 'No Imagery' trials, subjects heard the initial five notes but did not imagine the remaining three.

ined the remaining five, once again making a key-press synchronously with the last imagined note ('5 Imagined' condition). In a separate control block of 'no-imagery' trials, subjects heard the first five notes of the melody but did not continue imagining the remaining notes and made no key presses ('No Imagery' condition). They were explicitly instructed not to imagine a continuation of the melody and to try to hear the five notes as a complete phrase. In order to achieve an appropriate signal-to-noise ratio in the ERP waveform, subjects performed 100 imagery trials and 100 no-imagery trials.
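For concreteness, the trial timing implied by the design above — eight notes at quarter note = 120, i.e. 500 ms per note over a 0-4000 ms trial, with the trailing notes imagined rather than heard — can be sketched as follows. The function and constant names are mine, not from the study.

```python
# Hypothetical sketch of the trial schedule: eight isochronous notes at 120 bpm,
# with the last n notes imagined and a key press at the final (real or imagined)
# note onset. Illustrative only; names and structure are assumptions.
NOTE_MS = 60000 // 120   # 500 ms per quarter note at 120 bpm
N_NOTES = 8

def trial_schedule(n_imagined: int):
    """Return (heard_onsets_ms, imagined_onsets_ms, key_press_ms) for one condition."""
    onsets = [i * NOTE_MS for i in range(N_NOTES)]
    heard = onsets[: N_NOTES - n_imagined]
    imagined = onsets[N_NOTES - n_imagined:]
    key_press = onsets[-1]   # synchronous with the last note, heard or imagined
    return heard, imagined, key_press

for name, n in [("All Heard", 0), ("3 Imagined", 3), ("5 Imagined", 5)]:
    heard, imagined, press = trial_schedule(n)
    print(f"{name}: {len(heard)} heard, {len(imagined)} imagined, key press at {press} ms")
```

Under these assumptions every condition shares the same target key-press time (the eighth note onset at 3500 ms), which is what makes the key-press latencies behaviorally comparable across heard and imagined completions.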

Subjects' brain electrical activity was recorded throughout each trial using a geodesic array of 129 electrodes (Electrical Geodesics Inc., Eugene, OR) distributed across the scalp. Such dense sampling of the electrical potential at the scalp surface allows one to construct an accurate topographical map of the voltage at each sampled time point.

Figure 4A (see page 36) illustrates a typical auditory N100 topography in which


[Figure 4: scalp voltage topography maps (µV) for the conditions 6th Heard, 7th Heard, 1st of 3 Imagined, 2nd Imagined, and 1st of 5 Imagined, shown in three time windows: 96-192 ms ('N100'), 376-480 ms ('P300'), and 704-808 ms.]

Figure 4. Summary topographical maps of the activation elicited by different tasks (each of the four rows) in the auditory imagery experiment. Each circle represents a view down onto the top of the head. The nose would be at the top of the circle. Plotted are the average voltage values in time windows that encompass the N100 (left column), the P300 (center column), and the N100/P200 for the next note. Maps with negative values at centro-frontal sites and positive values around the perimeter (A, D, G) are typical of the auditory N100 response. The large parietal positivities in the imagery conditions (E, H) are characteristic of a P300 response. Each map is the activation averaged across seven subjects and 60-90 trials/subject.


there is a large negative focus across centro-frontal electrode sites on top of the head, and a ring of positive voltage at electrodes at more inferior positions of the scalp around the perimeter. Within the same time window, imagining the first of a sequence of notes elicited a topographical pattern in the brain electrical activity that resembled the topographical pattern elicited by the corresponding heard note in the original phrase (Figure 4D, G). No such pattern was elicited in the condition in which subjects were asked to abstain from imagining the continuation of the phrase (Figure 4J). In the imagery conditions, another stable topographical pattern was assumed from 375-480 ms after the time at which the first note was to be imagined (Figure 4E, H). The positive peak above centro-parietal electrodes is characteristic of the P300 component mentioned earlier. Because subjects were expecting the sounds to cease, the presence of the P300 does not indicate an expectancy violation response to an unexpected cessation of input. This interpretation is further supported by the absence of a P300 response in the no-imagery condition (Figure 4K). The presence of the P300 is associated specifically with the imagery task, though it is difficult to assign a further functional role at this time.

Given the relatively recent advent of dense-EEG methods, statistical techniques for quantitative assessment of the similarity or dissimilarity of different topographies have not been well established. Nonetheless, one way to compare topographical maps is to calculate the correlation between them. The similarity of the topographical states across the different conditions was assessed by correlating the topographies in the 3 Imagined condition with corresponding topographies in the other conditions. The temporal evolution of correlations among voltage topographies elicited in the four experimental conditions is depicted in Figure 5. Average topographies were computed for successive 100 ms epochs. A 100 ms window size reduces the amount of data as much as possible while preserving the most prominent and stable topographical distributions. In order to compare the 3 Imagined and 5 Imagined conditions, the topographies from the two conditions had to be aligned with respect to the onsets of the first imagined events. Figure 5A shows the correlations of topographies among the conditions during the first two notes of the melodies. During these epochs, the acoustical parameters were identical across the conditions. Although the degree of correlation between the 3 Imagined condition and the other conditions varied from time-window to time-window, as expected, there were no differences among any of the conditions. Figure 5B shows the correlations when the tasks and acoustic stimulation diverged. The two imagery conditions were most highly correlated, whereas the topographies in the 3 Imagined and no-imagery conditions were uncorrelated. Note that the acoustic input in the latter pair of conditions was identical. The correlation between the 3 Imagined and All Heard conditions assumed intermediate values, particularly during the first portion of the first imagined note epoch. As a point of reference, the topographies shown in the leftmost column of Figure 4 correspond to the second 100 ms window in Figure 5B.
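The windowed correlation analysis described above can be sketched as follows (illustrative Python; the function names, a 64-channel montage, and the sampling rate are assumptions, not the published analysis pipeline):

```python
import numpy as np

def window_topographies(eeg, sfreq, win_ms=100):
    """Reduce an (n_channels, n_samples) epoch to successive average
    scalp topographies, one (n_channels,) vector per win_ms window."""
    win = int(sfreq * win_ms / 1000)
    n_win = eeg.shape[1] // win
    return [eeg[:, i * win:(i + 1) * win].mean(axis=1) for i in range(n_win)]

def topo_correlations(eeg_a, eeg_b, sfreq):
    """Pearson correlation between corresponding 100 ms topographies
    of two conditions, the quantity plotted in Figure 5."""
    maps_a = window_topographies(eeg_a, sfreq)
    maps_b = window_topographies(eeg_b, sfreq)
    return [float(np.corrcoef(a, b)[0, 1]) for a, b in zip(maps_a, maps_b)]
```

A correlation near 1 means the two conditions share the same spatial pattern of voltage across the scalp in that window, regardless of overall amplitude.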

While the instruction to imagine the continuation of the melody resulted in a clear emitted potential in response to the first imagined note, and the topographical pattern was similar to the N100 elicited by the corresponding heard note, the same pattern was not observed for the subsequent imagined notes. Rather, the topographical activation pattern shifted to a frontal-positive peak around the time that the second note was to


[Figure 5 (two panels): time-series of correlation coefficients (y-axis, roughly -0.4 to 0.8) across successive 100 ms windows (x-axis). Panel A spans the first and second heard notes; panel B the first and second imagined notes. Legend: 3 imagined vs 5 imagined; 3 imagined vs all heard; 3 imagined vs no imagined. Asterisks: * p<0.05, ** p<0.01.]

Figure 5. Time-series of the correlations among the experimental conditions. For each subject, the average scalp-topographies in successive 100 ms windows from the 3 Imagined (3I) condition were correlated with corresponding topographies in the other task conditions. The correlations between conditions (averaged across subjects) are plotted in A & B. Error bars indicate the standard error of the mean. A) The solid line shows the correlation of evoked responses during the first and second heard notes in the 3I and 5 Imagined (5I) task conditions. The dashed line shows the correlation between the 3I and All Heard (AH) conditions during the first and second heard notes. The dashed-dotted line shows the correlation between the 3I and No-Imagery (NI) conditions during the first and second heard notes. The acoustic stimulation was identical across task conditions during this 1 second epoch. B) The solid line shows the correlation between the 3I and 5I conditions during the first and second imagined notes in each condition. The dashed line shows the comparison between the first and second imagined notes in the 3I condition and the sixth and seventh heard notes in the AH condition. The dashed-dotted line shows the comparison of the first and second imagined notes in the 3I condition and corresponding silence in the NI condition. Asterisks indicate windows in which there were significant differences in the correlation coefficients between conditions.


be imagined (Figure 4F, I). When no imagery task was performed, no topographical transformations were observed (Figure 4J-L). The presence of a sequence of distinct topographies that was unique to the imagery conditions indicates that the instructions to imagine successive notes result in a measurable series of brain states. The similarity of the initial component during imagery and the auditory evoked potential N100 component suggests that auditory areas may be activated as subjects begin to imagine the continuation of the melody.

If the set of brain areas involved in forming successive mental images is engaged in an iterative manner, one might expect to detect repetitions of voltage topographies at the scalp. The failure to record the same emitted potential to each note in a sequence of imagined notes may stem from technical limitations of the event-related potential (ERP) methodology, e.g. the need to average together many responses in which the phase-jitter of the components one is interested in observing is minimized. Alternatively, the switch from listening to a sequence of notes to imagining a sequence of notes may reorganize the functional relationship of brain areas involved in the task. The pattern of brain activity evoked at the onset of imagery may differ from that generated during more sustained imagery. For example, the response to the first imagined note may represent the interaction of the image with a decaying sensory memory trace of the preceding note. Cowan (1995) reviews evidence for two types of auditory sensory memory. The shorter form, lasting several hundred (~250) ms from the stimulus, may be perceived as a continuation of the stimulus, whereas the other lasts 10-20 s and takes the form of a vivid recollection of the sound. The process of image generation during the first note might interact with either form of the auditory sensory memory. Näätänen and Winkler (1999) have recently argued that representations of auditory stimuli that are accessible to top-down processes are formed in the auditory cortex and indexed by the MMN. A correspondence has been proposed between the MMN and the longer lasting form of auditory sensory memory (Cowan, 1995).
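The two decay profiles reviewed by Cowan (1995) can be caricatured with simple exponentials (illustrative only; the exponential form and the exact time constants are my assumptions, chosen to match the ~250 ms and 10-20 s figures cited above):

```python
import math

def trace_strength(t, tau):
    """Relative strength of a memory trace t seconds after stimulus
    offset, under a simple (illustrative) exponential-decay assumption."""
    return math.exp(-t / tau)

SHORT_TAU = 0.25   # s; short auditory store, a few hundred ms (Cowan, 1995)
LONG_TAU = 15.0    # s; longer store, roughly 10-20 s

# One second after a note, the short trace is nearly gone while the
# long trace is largely intact, so only the latter could plausibly
# interact with images formed for later notes in the sequence.
```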

If, in the auditory imagery tasks described above, the N100-like topography is due to the interaction of the image generation process with the short-duration sensory component, the same topography should be absent for subsequent imagined notes, as the sensory memory has decayed. If the N100-like topography to the imagined note is the result of an interaction with the longer-duration sensory memory store, one might expect its presence for the remaining imagined notes also. It is possible, however, that either sensory memory store is perturbed by forming an image of the next, different, note in the melody, in which case the N100-like topography may no longer be possible. A slightly different explanation for the absence of an N100-like topography for each imagined note is that the processes at the onset of auditory imagery and those in continued imagery differ and are therefore associated with different voltage topographies. In the case of visual mental imagery, Kosslyn (1994) makes a strong case for dissociating between processes of 'image generation' and 'image maintenance', arguing that the two processes have different brain activation signatures, the latter being closely associated with working memory processes. Clearly, additional experiments are needed to disentangle the pattern of activations observed in the auditory imagery experiment described above.


The neural circuitry of musical imagery


What are the brain circuits that facilitate the formation of musical images, either in the form of expectancies in musical contexts or sequences of imagined notes of a melody? Recent functional neuroimaging studies using PET by Zatorre and colleagues found that regions of the STG, home to the auditory cortex, and the right frontal lobes are activated when subjects mentally scan a familiar melody in order to compare the pitch height of two notes (Zatorre et al., 1996), or continue imagining a melody (Halpern & Zatorre, 1999). While directly implicating the auditory cortex and frontal areas in musical imagery, the PET data do not have the temporal resolution to specify how these areas interact in time.

The temporal resolution of EEG and MEG methods can potentially address the issue of temporal interactions of different brain areas, but these methods are hampered by the need to average many trials due to signal-to-noise ratio considerations and by the difficulty of inferring intracranial sources of the potentials measured at the scalp. Inferences about the loci of brain activation that give rise to the electrical field measured at the scalp are based on analyses of the topographical distribution of the voltage across the scalp. For instance, the hallmark topography of the N100 is a polarity inversion between negative values at the vertex of the head and positive values at sites around the lower perimeter of the head. Models, based on MEG recordings, of equivalent current dipoles that account for such a polarity inversion imply activation of auditory cortical regions along the superior temporal gyrus (Pantev et al., 1995; Verkindt, Bertrand, Perrin, Echallier, & Pernier, 1995). MEG and EEG data recorded during scanning of auditory memory (Kaufman, Curtis, Wang, & Williamson, 1992), and auditory selective attention tasks (reviewed in Näätänen, 1992) all implicate the STG as an area where stored images are compared with sensory input. Thus, the similarity in the topography during the first imagined note and the N100 component of the auditory evoked potential to the corresponding heard note is consistent with the notion that the process of generating an image of the next note in a melody activates the auditory cortex.

Combined neurophysiological and functional neuroimaging work may provide a way of determining whether focal expectancies and sustained imagery of auditory sequences are simply different facets of the same process, i.e. dependent on the same neural architecture. Currently, the available physiological data are too sparse to describe in detail the mechanisms of auditory and musical imagery in the brain. The spatiotemporal activity pattern in the averaged ERPs described above is consistent with suggestions from the functional neuroimaging literature that a circuit subserving musical imagery may consist of auditory cortex (N100 activation in the ERPs), frontal cortex (positive anterior focus, e.g. Figure 4G, 3I), and posterior parietal cortex (P300 activation). However, the process of auditory imagery in a musical context requires a more precise theoretical specification. This specification must account for ways in which domain-general functions, such as the numerous forms of memory (long-term, working, echoic), attention, expectancy, and perception, interact to imbue mental images with their content.


Acknowledgements


The research was supported by NIH grants GM07257, NS10395, and P50 NS17778-18, and the McDonnell/Pew Center for the Cognitive Neuroscience of Attention at the University of Oregon. I thank several anonymous reviewers for their helpful critiques of an earlier version of this manuscript.

References

Besson, M., & Faïta, F. (1995). An event-related potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception & Performance, 21, 1278-1296.

Besson, M., Faïta, F., Czternasty, C., & Kutas, M. (1997). What's in a pause: Event-related potential analysis of temporal disruptions in written and spoken sentences. Biological Psychology, 46(1), 3-23.

Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24(1), 14-25.

Cowan, N. (1995). Attention and Memory. New York: Oxford University Press.

Dierks, T., Linden, D. E., Jandl, M., Formisano, E., Goebel, R., Lanfermann, H., & Singer, W. (1999). Activation of Heschl's gyrus during auditory hallucinations. Neuron, 22(3), 615-621.

Donchin, E., & Coles, M. G. H. (1988). Is the P300 component a manifestation of context updating? Behavioral and Brain Sciences, 11(3), 357-374.

Finke, R. A. (1989). Principles of Mental Imagery. Cambridge: The MIT Press.

Goldman-Rakic, P. S. (1996). The prefrontal landscape: implications of functional architecture for understanding human mentation and the central executive. Philosophical Transactions of the Royal Society of London Series B: Biological Sciences, 351(1346), 1445-1453.

Griffiths, T. D., Jackson, M. C., Spillane, J. A., Friston, K. J., & Frackowiak, R. S. J. (1997). A neural substrate for musical hallucinosis. Neurocase, 3(3), 167-172.

Halgren, E., Marinkovic, K., & Chauvel, P. (1998). Generators of the late cognitive potentials in auditory and visual oddball tasks. Electroencephalography and Clinical Neurophysiology, 106(2), 156-164.

Halpern, A. R., & Zatorre, R. J. (1999). When that tune runs through your head: A PET investigation of auditory imagery for familiar melodies. Cerebral Cortex, 9(7), 697-704.

Hantz, E. C., Kreilick, K. G., Kananen, W., & Swartz, K. P. (1997). Neural responses to melodic and harmonic closure: An event-related potential study. Music Perception, 15(1), 69-98.

Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic contexts in music. Journal of Cognitive Neuroscience, 7, 153-164.

Janata, P. (in press). Brain electrical activity evoked by mental formation of auditory expectations and images. Brain Topography.

Janata, P., & Petsche, H. (1993). Spectral analysis of the EEG as a tool for evaluating expectancy violations of musical contexts. Music Perception, 10, 281-304.

Kaufman, L., Curtis, S., Wang, J. Z., & Williamson, S. J. (1992). Changes in cortical activity when subjects scan memory for tones. Electroencephalography and Clinical Neurophysiology, 82(4), 266-284.

Koelsch, S., Gunter, T., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: 'Nonmusicians' are musical. Journal of Cognitive Neuroscience, 12(3), 520-541.

Kosslyn, S. M. (1980). Image and Mind. Cambridge: Harvard University Press.

Kosslyn, S. M. (1994). Image and Brain. Cambridge: The MIT Press.

Näätänen, R. (1992). Attention and Brain Function. Hillsdale: Lawrence Erlbaum Associates.

Näätänen, R., & Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: A review and an analysis of the component structure. Psychophysiology, 24(4), 375-425.

Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125(6), 826-859.

Nunez, P. L. (1981). Electric Fields of the Brain. New York: Oxford University Press.

Pantev, C., Bertrand, O., Eulitz, C., Verkindt, C., Hampson, S., Schuierer, G., & Elbert, T. (1995). Specific tonotopic organizations of different areas of the human auditory cortex revealed by simultaneous magnetic and electric recordings. Electroencephalography and Clinical Neurophysiology, 94(1), 26-40.

Patel, A. D., Gibson, E., Ratner, J., Besson, M., & Holcomb, P. J. (1998). Processing syntactic relations in language and music: An event-related potential study. Journal of Cognitive Neuroscience, 10(6), 717-733.

Petsche, H., Richter, P., von Stein, A., Etlinger, S. C., & Filz, O. (1993). EEG coherence and musical thinking. Music Perception, 11(2), 117-151.

Petsche, H., von Stein, A., & Filz, O. (1996). EEG aspects of mentally playing an instrument. Cognitive Brain Research, 3(2), 115-123.

Pflüger, H. J., & Menzel, R. (1999). Neuroethology, its roots and future. Journal of Comparative Physiology A: Sensory Neural and Behavioral Physiology, 185(4), 389-392.

Rappelsberger, P., & Petsche, H. (1988). Probability mapping: power and coherence analyses of cognitive processes. Brain Topography, 1(1), 46-54.

Ruchkin, D. S., Sutton, S., & Tueting, P. (1975). Emitted and evoked P300 potentials and variation in stimulus probability. Psychophysiology, 12(5), 591-595.

Simson, R., Vaughan, H. G., & Ritter, W. (1976). The scalp topography of potentials associated with missing visual or auditory stimuli. Electroencephalography and Clinical Neurophysiology, 40(1), 33-42.

Srinivasan, R., Nunez, P. L., & Silberstein, R. B. (1998). Spatial filtering and neocortical dynamics: estimates of EEG coherence. IEEE Transactions on Biomedical Engineering, 45(7), 814-826.

Sutton, S., Tueting, P., Zubin, J., & John, E. R. (1967). Information delivery and the sensory evoked potential. Science, 155(768), 1436-1439.

Verkindt, C., Bertrand, O., Perrin, F., Echallier, J. F., & Pernier, J. (1995). Tonotopic organization of the human auditory cortex: N100 topography and multiple dipole model analysis. Electroencephalography and Clinical Neurophysiology, 96(2), 143-156.

Verleger, R. (1988). Event-related potentials and memory: a critique of the context updating hypothesis and an alternative interpretation of P3. Behavioral and Brain Sciences, 11(3), 343-356.

Villringer, A. (1999). Physiological changes during brain activation. In C. T. W. Moonen & P. A. Bandettini (Eds.), Functional MRI (pp. 3-13). Berlin: Springer-Verlag.

Weinberg, H., Walter, W. G., & Crow, H. J. (1970). Intracerebral events in humans related to real and imaginary stimuli. Electroencephalography and Clinical Neurophysiology, 29(1), 1-9.

Zatorre, R. J., Evans, A. C., & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. Journal of Neuroscience, 14(4), 1908-1919.

Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind's ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29-46.


3

Musical Imagery and Working Memory

Virpi Kalakoski

It is a familiar experience to most of us that we can rehearse musical pieces in our minds although there is no auditory musical stimulus. This subjective experience is an example of musical imagery. Research into cognitive psychology has shown that mental imagery is not only a subjective experience, but a measurable cognitive phenomenon as well. Neisser clarifies the nature of mental imagery as a cognitive phenomenon: 'If memory and perception are the two key branches of cognitive psychology, the study of imagery stands precisely at their intersection' (Neisser, 1972, p. 233). This statement is confirmed by the fact that most research into mental imagery has focused on the similarities between mental imagery and perception, but today an increasing number of studies also concentrate on mental imagery and cognition. The latter approach studies the role of long-term memory, interpretation, learning and conceptual knowledge in mental imagery, or investigates mental images as activated short-term memory or working memory representations. In this chapter, I first discuss the nature of musical images, and then ask to what extent the concept of working memory applies to musical imagery.


Musical imagery and perceptual processes

The essence of mental imagery is its similarity with perceptual processes. In music we can have for example an auditory image of our national anthem, a visual image for the note pattern of the anthem, or a motor image as to how to play the anthem on a violin. Musical imagery has mostly been studied as a special example of auditory imagery, and in this chapter, too, we will focus on auditory musical imagery.

Similarity with perceptual processes means, first, that mental imagery is a medium for simulating perceptual properties of the external world. Theories taking this approach to mental imagery are called functional theories, and they attempt to explain how mental imagery contributes to e.g. the process of comparing one object with another (Finke, 1985). Crowder (1989) investigated the effect of match and mismatch of the imagined timbre of a first tone on reaction times to make same/different judgements concerning the timbre of a second tone. In his Experiment 2, subjects were presented a sine wave tone, and they were asked to imagine how it would sound if played with a guitar, flute or trumpet. Thereafter followed the second tone, whose timbre was a recording of one of the three instruments. The results showed that when the imagined timbre of the first tone matched the timbre of the second tone, the reaction times were faster than when the timbres of the first and second tone were mismatched. This result seems to suggest that it takes time to transform the timbre of the imagined tone, and that mental imagery contributes to the process of timbre comparison.

Second, structural theories of mental imagery represent a stronger view than functional theories by stressing that imagery shares structural similarities with perception, and that there are some similarities between real and imagined objects (Finke, 1985). According to this view, in music such attributes as timbre, pitch and tempo can be represented both in real objects and in auditory imagery (Baddeley & Logie, 1992; Halpern, 1988; Hubbard & Stoeckig, 1988; Zatorre, 1996). Halpern (1988) applied this approach to study temporal extent in auditory imagery. She used a mental scanning procedure which was introduced by Kosslyn, Ball, and Reiser (1978) in the visual modality. In Halpern's Experiment 3, subjects were presented with the name of a song, followed by a one-word lyric from the song on a monitor. After 500 ms subjects were presented with a second lyric, and their task was to compare whether the pitch of the second lyric was higher or lower than the pitch of the first lyric. In the nonimagery condition subjects were only asked to respond as quickly and accurately as possible, whereas in the imagery condition they were asked to 'begin with the first lyric and play through the song in your mind until you reach the second lyric' (Halpern, 1988, p. 439). The results showed an identical reaction-time pattern in both conditions: reaction times increased with greater distance (number of steps) between the first and second song lyric. These results suggest that auditory imagery for songs represents temporal-like characteristics (Halpern, 1988).
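The scanning signature in Halpern's data, reaction time growing roughly linearly with the number of steps between lyrics, can be illustrated with a simple line fit (the reaction times below are invented for illustration and are not Halpern's data):

```python
import numpy as np

# Hypothetical reaction times (ms) for increasing lyric distance (beats).
distance = np.array([1, 2, 4, 6, 8])
rt_ms = np.array([820, 910, 1100, 1290, 1480])

slope, intercept = np.polyfit(distance, rt_ms, 1)
# A positive slope is the signature of temporal scanning: each extra
# step through the imagined song adds roughly `slope` ms (about 95 ms
# for these invented numbers) to the response time.
```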

Third, interactive models of mental imagery claim that imagery is mediated by the cognitive and neuronal mechanisms involved in perception, and this has been shown in several experimental and brain research studies (Farah & Smith, 1983; Halpern, 1988a; Hubbard & Stoeckig, 1988; Zatorre et al., 1996). These studies represent the interactive approach as they study how mental imagery influences ongoing perceptual processes (Finke, 1985). Farah and Smith (1983), for example, studied whether an


Figure 1. A simplified presentation of the memory system according to the multi-store models (e.g. Atkinson & Shiffrin, 1968). Perceptual information is received by modality-specific sensory stores, which hold information very briefly. Some of the attended information is further processed by the short-term store, which has a limited duration and capacity. A limited amount of information is transferred from the short-term store to the long-term store, which can hold a great amount of information over long periods of time.

auditory image interfered with or facilitated an auditory signal detection task. In their study participants imagined pure tones before or during a signal detection task. The results showed that the thresholds for detecting auditory signals were lower when the frequency of the image was the same as the frequency of the auditory signal to be detected. Thus, the results showed that auditory images facilitate detection of same-frequency auditory signals, suggesting that imagery and perception share some common underlying mechanisms.

The theoretical approaches introduced above have increased our knowledge concerning what can be represented in auditory musical imagery. The subsequent question is what is the underlying cognitive system that accounts for the maintenance and processing of these auditory and musical representations. The basic claim in cognitive psychology is that the cognitive system includes a sensory memory for brief storage of perceived stimuli, a short-term working memory for rehearsal and processing, and a long-term memory for learned information (Fig. 1). Baddeley and Logie (1992) suggest that echoic memory, which is a brief temporary storage of auditory material, cannot be the seat of auditory imagery, because it operates only in the presence of auditory stimuli. While auditory musical images can be evoked internally without external stimuli, the relevant concept here is that of working memory, which focuses on activated mental images.
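The multi-store architecture of Fig. 1 can be caricatured in a few lines of code (a toy sketch; the class, the fixed capacity, and the attention flag are illustrative simplifications of the Atkinson & Shiffrin model, not part of it):

```python
from collections import deque

class MultiStoreMemory:
    """Toy sketch of the multi-store model in Fig. 1: attended items
    pass from a sensory store into a capacity-limited short-term store;
    rehearsed items are copied to the long-term store."""

    def __init__(self, stm_capacity=7):
        self.short_term = deque(maxlen=stm_capacity)  # oldest items drop out
        self.long_term = set()                        # durable, large capacity

    def perceive(self, item, attended=True):
        # Unattended input decays in the sensory store and never reaches STM.
        if attended:
            self.short_term.append(item)

    def rehearse(self, item):
        # Rehearsal maintains an item in STM and transfers it to LTM.
        if item in self.short_term:
            self.long_term.add(item)
```

With a capacity of three, perceiving the notes C, D, E, F leaves only D, E, F in the short-term store; only rehearsed items survive into the long-term store.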


Figure 2. A schematic figure of the working memory model (Baddeley & Hitch, 1974; Baddeley, 1986). Information accessed from the sensory stores and the long-term store is processed by working memory, in which the central executive functions as a supervisory controlling system. It controls the visuo-spatial sketch pad, dealing with visually and spatially coded information, and the phonological loop, specialized for processing language material.

Working memory

The concept of working memory, introduced by Baddeley and Hitch (1974), refers to an active memory coding and rehearsal system, which is used in complex cognitive tasks such as language comprehension, reading, visual imagery, and problem solving. The working memory model includes a central executive, which is an attentional control device. It cooperates with two slave systems specialized in visuo-spatial and phonological processing. The visuo-spatial sub-component is proposed to have, among other functions, the activation of visual mental imagery (Logie, 1995). The concept of the phonological loop refers to a system which is needed in memorizing verbal material, and in language processing (Baddeley, 1986) (see Fig. 2).

A general method for studying the role of working memory in a certain cognitive task has been the dual-task paradigm (Baddeley, 1986). The rationale behind the paradigm is that working memory subsystems are capacity limited. Thus, if two tasks load the same working memory subsystem, concurrent processing interferes with performance. However, if the tasks are processed in different subsystems, concurrent performance of one task does not impair the other. When considering auditory imagery, the interesting question is to what extent musical imagery and phonological working memory overlap.
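The dual-task rationale reduces to a simple overlap test (a toy sketch; the function and the subsystem labels are illustrative, not part of Baddeley's formulation):

```python
def predict_interference(primary_subsystems, secondary_subsystems):
    """Dual-task logic sketch: performance on the primary task should
    drop only if the two concurrent tasks load a shared, capacity-limited
    working memory subsystem."""
    return bool(set(primary_subsystems) & set(secondary_subsystems))

# A pitch image held in the phonological store vs. homophone judgement
# (assumed here to load the same store) -> interference expected:
predict_interference({"phonological store"}, {"phonological store"})        # True
# vs. a visual matching task loading the visuo-spatial sketch pad:
predict_interference({"phonological store"}, {"visuo-spatial sketch pad"})  # False
```

The experiments reviewed below test exactly this logic: which secondary tasks disrupt musical primary tasks reveals which subsystems the musical tasks occupy.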


The phonological loop


The phonological loop is the most studied component of working memory. One of its functions is passive short-term storage of phonologically encoded material, which provides an acoustic image of phonological information. The other function of the system is active subvocal articulatory rehearsal, which is capable of maintaining a phonological code (Baddeley, 1986).

This kind of division of short-term retention of auditory stimuli into two separate components is not only seen in the working memory literature, but also concerns the concept of short-term (verbal) memory (e.g. Penney, 1989). Reisberg, Wilson and Smith (1991) apply this division to musical imagery as follows: 'one rehearses material in working memory by talking to oneself, and then listening to what one has said' (Reisberg et al., 1991, p. 72). They use the term inner voice for active subvocalization and the term inner ear for the passive acoustic image.

Besides the division between passive and active functions, research into the nature of the phonological loop has found some important effects, called the phonological similarity effect, the word-length effect, the irrelevant speech effect, and articulatory suppression (Baddeley, 1986). The phonological similarity effect refers to the finding that the short-term recall of phonologically similar items (e.g., mad, map, cat, etc.) is poorer than the recall of phonologically dissimilar items (e.g., day, cow, pen, etc.). This is assumed to show that a phonological code is used in rehearsal. Second, the word-length effect refers to the finding that the length of items to be retained affects short-term recall, which is poorer for sequences comprising long words (e.g. musicality, orchestra, composer, etc.) than for short words (e.g. horn, staff, tune, etc.). The interpretation of this effect is that subvocal rehearsal operates in real time, and the trace underlying the phonological code decays within about two seconds if not actively rehearsed (Baddeley, 1986). These two effects have not yet been studied in the context of musical imagery.
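The two-second rehearsal loop implies a back-of-the-envelope estimate of memory span (illustrative arithmetic; the word durations are invented and the integer division is a deliberate oversimplification of the real-time rehearsal account):

```python
def estimated_span(word_duration_s, loop_duration_s=2.0):
    """Word-length-effect sketch: if the phonological trace decays in
    about 2 s unless refreshed, span is roughly the number of words
    that can be articulated within one rehearsal loop."""
    return int(loop_duration_s // word_duration_s)

# Short words (~0.3 s each) yield a larger estimated span than long
# words (~0.9 s each): estimated_span(0.3) -> 6, estimated_span(0.9) -> 2.
```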

Third, the irrelevant speech effect means that presentation of spoken material, and this includes an unfamiliar foreign language, impairs memory for visually presented verbal items. However, this effect does not occur with environmental sounds, and the interpretation is that spoken material has obligatory access to the phonological store. The fourth effect is articulatory suppression: concurrent vocalization of an irrelevant sound reduces memory span for verbal material. The effect suggests that visually presented verbal material has to be recoded through the articulatory process, but this is not possible under articulatory suppression (Baddeley, 1986). These two effects and their application to musical imagery are described below.

The phonological loop and musical imagery

Salamé and Baddeley (1989) studied the overlap of verbal and musical processing using immediate verbal recall as a primary task. They studied whether serial recall of visually presented digits was disrupted by unattended vocal or instrumental music. The results showed that instrumental music had significantly less effect on recall of digit sequences than vocal music, which also impaired performance. One of their theoretical interpretations was that there is a peripheral filter, which is passed by speech and music, but not by white noise. This would explain why instrumental and vocal music disrupt memory performance while white noise has no disruptive effect. Furthermore, they suggested that the difference between vocal and instrumental music lies in their capacity to disrupt the phonological store: vocal music has more acoustic features in common with subvocal speech, and is therefore more likely to disrupt it (Salamé & Baddeley, 1989).

From the point of view of musical imagery, an interesting question is whether processing musical imagery as a primary task is differently affected by presentation of irrelevant verbal and musical material. A dissimilar effect would indicate that processing musical and verbal material does not overlap in working memory, whereas a similar effect of irrelevant speech and music on musical imagery would indicate that processing verbal and musical material does overlap. There are a few experiments focusing on music as a primary task. Below we discuss two kinds of tasks that have been studied: pitch comparison tasks and memory for melodies. These tasks can be interpreted as musical imagery tasks, because they concern representations that are produced after perception of the stimuli, or from memory.

Effects of tonal and verbal secondary tasks on pitch comparison

Logie and Edworthy (1986) used pitch discrimination as a primary task. In their study participants were presented with a pair of pitches, and they had to store the first item of the pair for a few seconds, after which the second item was presented. The participant's task was to decide whether or not the two pitches were identical.

Concurrently with the primary task participants performed a secondary task. In the homophone judgement secondary task they had to decide whether visually presented word-nonword pairs (e.g. cloak-kloke) sounded identical. This task has been shown to block acoustic imagery. In the articulatory suppression task they repeated the word 'the' at a barely audible level, which blocks subvocal rehearsal. The third secondary task was visual matching of strings of non-alphabetic symbols, to interfere with the visuo-spatial sub-component of working memory. The results showed that only homophone judgement had an effect on pitch discrimination, which seems to suggest that only acoustic imagery, not subvocal rehearsal, is involved in pitch discrimination (Logie & Edworthy, 1986).

A similar kind of study was conducted by Pechmann and Mohr (1992), who used additional tones in their secondary task. They used a method employed by Deutsch (1975), in which participants first received a test tone, followed by a series of intervening tones, monosyllabic words, or visual 4 x 4 matrices. After the intervening items, participants heard a second test tone, and their task was to indicate verbally whether this tone was the same as or different from the first test tone. The study was conducted with musically trained and untrained participants in two conditions. In the unattended condition the interfering items were supposed to be ignored, and in the attended condition participants were asked to indicate whether the last two intervening words rhymed, or whether the last two visual matrices were identical.

The results showed that the tonal condition decreased performance to the level of chance, whereas the visual and auditory conditions caused only weak interference for nonmusicians and no interference for musicians (Pechmann & Mohr, 1992). This result is in line with Deutsch's (1970) finding that concurrent speech does not affect tonal memory, whereas interpolated tonal items cause an interference effect. Pechmann and Mohr give two interpretations for their results. First, they suggest that in addition to the articulatory loop, there is a tonal loop consisting of a tonal-storage component and a tonal-rehearsal component, the efficacy of which is affected by the level of musical training. Alternatively, they suggest that both speech and tonal information are processed in a common acoustic store (Pechmann & Mohr, 1992).

A pitch comparison task was also applied by Keller, Cowan and Saults (1995), who used imagery, not a perceived stimulus, in the secondary condition. They studied whether the delayed comparison of two pitches is affected by the type of imagined distractor task performed during the intertone interval (ITI) between the pitches. In the verbal distractor task participants were first presented with four digits on a screen, and during the ITI they were asked to remember the order in which the digits were presented, and to indicate it by using a computer mouse to click at a vertical rectangle which represented the digits. In the auditory distractor condition, they were first presented with the pitches of four tones, and then asked to reproduce the contour of the series of tones by clicking a mouse at the response grid. The results showed that with an ITI of 0.5 seconds, the distractor task did not affect pitch comparison, whereas with a 10-second ITI both verbal and auditory distraction impaired performance. Keller et al. (1995) suggest that verbal and auditory imagery disrupt the rehearsal process which is needed for pitch discriminations.

This result shows that an imagined distractor task also affects pitch comparison, although the effect was smaller than in the Pechmann and Mohr (1992) study in which actual interfering stimuli were used. However, both studies showed the effect of verbal and tonal interfering tasks in the group of musically untrained participants. Furthermore, Pechmann and Mohr also had a musically trained group, in which the verbal distractor task did not appear to have any effect.

To summarize, research into the effect of secondary tasks on pitch comparison shows that homophone judgement, attended and unattended tones, attended and unattended words, and visual matrices (only in the nonmusician group) impair performance in the pitch comparison task. These results suggest that acoustic imagery is involved in pitch comparison. It is unclear why visual matrices also had an effect on nonmusicians' performance. The studies reviewed above further show that articulatory suppression and concurrent speech do not interfere with pitch comparison. This suggests that subvocal rehearsal is not involved. Furthermore, reproducing the contour of a series of tones and reproducing the order of imagined digits caused impairment, but it is unclear to what extent acoustic imagery is involved in this task, or whether it mainly indicates the role of the attention-demanding central executive.

Effects of tonal and verbal secondary tasks on recognition and recall of melodies

Logie and Edworthy (1986) also studied the effect of homophone judgement, articulatory suppression, and visual matching of strings on the memory of melodic sequences. Their results showed that articulatory suppression and homophone judgement disrupted recognition of melodic sequences. They interpret this result to show that both subvocal rehearsal and acoustic imagery are involved in the short-term memory of melodies. Logie and Edworthy suggest that in tasks with musical material, two separate mechanisms are involved. Both processing and storage are involved in the melody task, whereas pitch discrimination mainly requires processing. Another suggestion regarding the difference between pitch comparison and the melody task is that the first requires only acoustic imagery, but the latter also demands an articulatory mechanism (subvocal speech) (Logie & Edworthy, 1986).

Accordingly, Reisberg et al. (1991) conceptualise two mechanisms required in auditory imagery. The first is the inner ear, the use of which can be blocked by presenting irrelevant sounds. The other mechanism is the inner voice, which can be blocked by asking subjects to do concurrent articulation tasks. Reisberg et al. studied the effect of one or both of these tasks on a primary task, which was to estimate whether the melody of familiar tunes rose or fell from the second note to the third. The number of correct judgements in this task decreased from over 80% to less than 70% when subjects were presented with irrelevant sounds, when they performed concurrent articulation, or when they had to perform both secondary tasks. Reisberg et al. (1991) interpret the result to show that both the inner ear and the inner voice were needed in this task. This interpretation is comparable with the distinction between acoustic and articulatory processing.

The two studies described above used recognition memory or recall of familiar tunes as a primary task. The primary task in the series of studies we are working on is the immediate recall of visually presented notes. The rationale of the task is that musicians can remember visually presented sequences of musical notes via musical and auditory imagery, while nonmusicians are not able to transform visual notation into a musical image. The primary task in our experiments consists of a series of individual notes presented visually on a monitor. Only a single note appears at a time, for 2 seconds, in successive places on a staff. Thus, the participants are not given any auditory stimulus, and the note patterns are not presented visually in their entirety. The rationale behind this method is that participants have to construct a large internal representation, and once the whole note pattern has been presented, they have to recall it by writing it down.

We used the dual-task paradigm to study the role of working memory slave systems in this kind of imagery construction task (Kalakoski, 1997). Eight professional musicians and eight students of psychology with less than four years of musical training were presented with 11-13 notes. In addition, there were two dual-task conditions in the experiment. First, participants were asked to repeat a pseudoword aloud. Second, they were asked to mentally scan a capital letter, and to indicate by pressing a corresponding computer key whether each corner of the letter turned left or right (see also Brooks, 1968).

The hypothesis was that articulatory suppression would only affect musicians, since they are able to transform visual note patterns into auditory representations. The results showed that participants recalled more notes in the control than in the articulatory secondary task condition, and more in the articulatory than in the visual secondary task condition. The effect of the articulatory secondary task is in line with the Logie and Edworthy (1986) study, in which articulatory suppression had an effect on rehearsing musical patterns. However, contrary to our hypothesis, performance was impaired due to articulatory suppression in both skill groups, although nonmusicians were unable to transform visual note patterns into an auditory representation. When interviewed after the experiment, participants claimed that they had used verbal rehearsal in order to memorize the note patterns by silently repeating verbal strings such as 'one step up black, two step down white'.

The effect of a visuo-spatial secondary task is more difficult to interpret. The first interpretation is that the construction of musical imagery from visually presented notes also requires visual imagery. Furthermore, the visuo-spatial secondary task also required a motor response, and the impairment in performance may have been the result of using a motor subsystem in a representation condition. However, it is more likely that the visuo-spatial secondary task also burdens the central executive component of working memory, and the general cognitive load explains the effect of the visuo-spatial task on the results (Logie, 1995).

A second experiment by Kalakoski (1999) used a primary task similar to the previous one. Ten students of music and 7 control participants were presented with note patterns consisting of 4, 8 or 16 black notes (dots). In the secondary task conditions, participants were first presented with an auditory melody, an auditory pseudoword or a visual matrix. Then followed the notes, one by one. Thereafter a second melody, pseudoword or visual matrix was presented, and the participants' task was to indicate, by pressing a computer key, whether that item was the same as or different from the first item of the secondary stimulus. Finally, they were asked to recall the notes presented in the primary task.
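For concreteness, the ordering of events in such a trial can be written out in code. Everything below (the function and variable names, the staff-position encoding of notes, the 50% same/different split) is an illustrative assumption on my part, not the actual experimental software:

```python
import random

def make_trial(pattern_length, secondary_kind, rng):
    """One dual-task trial, sketched from the verbal description:
    secondary item -> notes shown one by one -> same/different probe -> note recall."""
    assert pattern_length in (4, 8, 16)               # pattern sizes used in the study
    assert secondary_kind in ("melody", "pseudoword", "matrix")
    first_item = f"{secondary_kind}_A"
    same = rng.random() < 0.5                         # assumed 50/50 same/different split
    probe = first_item if same else f"{secondary_kind}_B"
    # notes encoded as staff positions relative to the middle line (an assumption)
    notes = [rng.randrange(-8, 9) for _ in range(pattern_length)]
    return {
        "phases": ["secondary_item", "note_sequence", "probe", "note_recall"],
        "secondary_item": first_item,
        "notes": notes,
        "probe": probe,
        "correct_probe_answer": "same" if same else "different",
    }

rng = random.Random(0)
trial = make_trial(16, "melody", rng)
```

The point of the sketch is only to make the event ordering explicit: the secondary item must be held across the whole note sequence, so whatever rehearsal loop it occupies is blocked for the duration of the primary task.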

The tentative results showed that an effect of the secondary task could be found on recall of visually presented note patterns consisting of 16 items. In the control group the pseudoword secondary task impaired recall of notes, whereas in the musicians' group, presentation of an auditory melody impaired performance in the primary task. Thus, the performance of the musicians was most disrupted by attended melodies, whereas the performance of the nonmusicians was mainly disrupted by attended pseudowords. The results were not quite statistically significant, but the pattern suggests that musicians store visually presented notes via a music rehearsal loop, but not by subvocal speech. Furthermore, since the performance of the nonmusicians was not affected by attended melodies but only by attended pseudowords, this indicates that the different effect of attended melodies and pseudowords is not caused by a difference in the difficulty of the two tasks. We are now conducting new studies with a greater number of participants to further investigate the hypothesis that the rehearsal process underlying musical imagery for melodies does not necessarily overlap with the articulatory mechanism used in subvocal speech rehearsal.

Conclusions: Does the concept of working memory apply to musical imagery?

The aim of this chapter was to approach musical imagery from the concept of working memory. First, it was noted that in this context the phonological loop is the relevant subcomponent of working memory. The role of the phonological loop in musical imagery was studied via the effects of articulatory suppression and irrelevant speech. The studies reviewed above applied a dual-task method in order to study the effects of speech, melodies and articulatory suppression on musical imagery. Musical imagery was measured by pitch comparison and melody recognition/recall tasks.

The results reviewed above seem to show that homophone judgement, as well as attended/unattended tones and words, impair performance in pitch comparison, melody recognition, and melody recall tasks. However, articulatory suppression and concurrent speech interfere only with melody tasks. These results suggest that acoustic imagery is involved in pitch comparison and melody tasks, while subvocal rehearsal is only involved in melody tasks.

However, the pattern of results was not always the same for musicians and nonmusicians, for in our experiments (Kalakoski, 1999) recall of visual note patterns seemed to be disrupted either by a concurrent melody comparison task or by a pseudoword comparison task. Two possible ways of approaching this tentative result are, first, to conduct research into experts' cognition, and second, to study more carefully the features of auditory secondary tasks that disrupt musical imagery.

The first approach focuses on the fact that music is a task environment in which skill level matters. Intons-Peterson (1992) concludes from several studies and some reviews that imaginal and perceptual performance seem to be more similar when the tasks are highly unusual than when they are based on real-world knowledge. This indicates that the experimental effects which have been taken to demonstrate the similarity of mental imagery with perceptual processes may apply mainly to unusual tasks, and not to task environments which require learning and skill. In order to broaden our understanding of mental imagery, Intons-Peterson wants to add a knowledge-weighted model to the approaches relating mental imagery to perception. The model stresses that cognition can facilitate mental imagery (Intons-Peterson & Roskos-Ewoldsen, 1989; Intons-Peterson & McDaniel, 1991; Intons-Peterson, 1992).

A good example of the role of cognition in mental imagery is experts' images in visual tasks, such as mental calculation and blindfold chess. This research suggests that long-term memory conceptual knowledge affects the information we choose to include in our mental imagery representation, and the way in which this information is organised (Hatta et al., 1989; Hishitani, 1990; Saariluoma, 1991; Saariluoma & Kalakoski, 1997, 1998). The idea that it is not possible to separate an image and its comprehension was also raised in auditory imagery by Reisberg, Smith, Baxter and Sonenshine (1989). They studied whether auditory images of words which can be segmented in more than one way when repeated aloud over and over again (e.g. kiss the sky becomes kiss this guy) are ambiguous, or whether it is possible to find only one interpretation in the auditory image of the word/words. Their studies showed that if subvocalization is eliminated, auditory images are unambiguous, and it is impossible to find the other interpretation or segmentation in the word/words. This result suggests that 'images are inherently meaningful', and we cannot only investigate their similarities with perception if we want to broaden our understanding of mental imagery.

The idea that cognition has a role in mental imagery is in line with the research into skill effects in other cognitive processes such as perception, categorization, memory, and problem solving. For example, several studies into experts' memory have demonstrated superior memory in visual and auditory tasks in which experts can use pre-learned knowledge. In memory research the concept of long-term working memory has been helpful in explaining experts' superior memory performance in their own task environment (Ericsson & Kintsch, 1995). Long-term working memory refers to skilled use of storage in long-term memory, which enables individuals to greatly expand the capacity and duration of short-term working memory. Applying this concept in mental imagery research could increase our knowledge concerning the role of working memory in skilled musical imagery (Ericsson & Kintsch, 1995; Saariluoma & Kalakoski, 1997).

The second point is that there are multiple auditory components in music. For example, temporal information such as rhythm, rate and total duration is in some cases encoded jointly with nontemporal information like a sequence of pitch intervals. However, if the two dimensions are structurally incompatible or the individual's degree of musical experience is low, temporal and nontemporal dimensions may require independent processing (Boltz, 1998). A distinction between processing systems required for rhythm and pitch has also been proposed by Peretz and Kolinsky (1993) and Carroll-Phelan and Hampson (1996). An interesting question for research into musical imagery and working memory is whether pitch and rhythm are processed in different working memory sub-components. A counter-question would be whether phonological coding and the articulatory loop of working memory are relevant concepts at all when considering the multiple attributes of musical imagery. It has been claimed that even short-term verbal memory is not, after all, primarily auditory or articulatory, but that the inner voice and the inner ear, as well as visual codes, are functionally equivalent (Macken & Jones, 1995). This approach does not differentiate between working memory subcomponents for processing speech and nonspeech sounds, but suggests that processing is based on acoustic parameters, like changing and steady states of pitch and rhythm, rather than on the phonological code (Jones, Beaman, & Macken, 1999).

Finally, when considering musical imagery we should not forget the multi-modular nature of musical representations. Studies by Zatorre and Beckett (1989) have shown that recall of note names, when tested with participants with absolute pitch, was not affected by a verbal interference task or by humming. They suggest, based also on reports of their subjects, that there are several possible strategies by which to rehearse note sequences mentally. Besides verbal strategies, their subjects visualized the location of note names on their instruments, e.g. a keyboard, or on the staff. Furthermore, they used the auditory image of tones or the kinesthetic image of how the note sequences would be performed on their instruments. Likewise, Mikumo (1994) showed that finger-tapping melodies as if playing a piano improves recall of musical patterns, which suggests that motor imagery plays a role in musical representations. Musicians participating in our imagery experiments reported similar comments. Likewise, some empirical findings and theories concerning language processing suggest that motor control structures are involved in e.g. speech perception (Kerzel & Bekkering, 2000). Thus, if the cognition of music and language overlap, it is reasonable to suggest that the role of motoric coding is relevant in both.

Although imagery representations seem to be sense-specific, it is also possible to use several modalities at the same time, as is the case with the perceptual world (Intons-Peterson, 1992). Then, constructing an auditory musical image may follow visual, kinesthetic, and motoric images. From this point of view, the imagery of musical performance may also involve some undefined working memory sub-components in addition to those proposed by the Baddeley and Hitch (1974) model. To conclude, musical imagery not only stands at the intersection of memory and perception, but also at the intersection of several sense modalities.

References

Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation (Vol. 2, pp. 89-105). London: Academic Press.
Baddeley, A. (1986). Working memory. Oxford: Clarendon Press.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89). New York: Academic Press.
Baddeley, A., & Logie, R. (1992). Auditory imagery and working memory. In D. Reisberg (Ed.), Auditory imagery (pp. 179-197). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Boltz, M. G. (1998). The processing of temporal and nontemporal information in the remembering of event durations and musical structure. Journal of Experimental Psychology: Human Perception and Performance, 24(4), 1087-1104.
Brooks, L. (1968). Spatial and verbal components in the act of recall. Canadian Journal of Psychology, 22, 349-368.
Carroll-Phelan, B., & Hampson, P. J. (1996). Multiple components of the perception of musical sequences: A cognitive neuroscience analysis and some implications for auditory imagery. Music Perception, 13(4), 517-561.
Crowder, R. G. (1989). Imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 15(3), 472-478.
Deutsch, D. (1970). Tones and numbers: Specificity of interference in short-term memory. Science, 168, 1604-1605.
Deutsch, D. (1975). The organization of short-term memory for a single acoustic attribute. In D. Deutsch & J. A. Deutsch (Eds.), Short term memory (pp. 107-151). New York: Academic Press.
Ericsson, K. A., & Kintsch, W. (1995). Long-term working memory. Psychological Review, 102, 211-245.
Farah, M. J., & Smith, A. F. (1983). Perceptual interference and facilitation with auditory imagery. Perception & Psychophysics, 33, 475-478.
Finke, R. (1985). Theories relating mental imagery to perception. Psychological Bulletin, 98, 236-259.
Halpern, A. R. (1988). Mental scanning in auditory imagery for songs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 434-443.
Hatta, T., Hirose, T., Ikeda, K., & Fukuhara, H. (1989). Digit memory of soroban experts: Evidence of utilization of mental imagery. Applied Cognitive Psychology, 3, 23-33.
Hishitani, S. (1990). Imagery experts: How do expert abacus operators process imagery? Applied Cognitive Psychology, 4, 33-46.
Hubbard, T. L., & Stoeckig, K. (1988). Musical imagery: Generation of tones and chords. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 656-667.
Intons-Peterson, M. J. (1992). Components of auditory imagery. In D. Reisberg (Ed.), Auditory imagery (pp. 45-71). Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Intons-Peterson, M. J., & McDaniel, M. A. (1991). Symmetries and asymmetries between imagery and perception. In C. Cornoldi & M. A. McDaniel (Eds.), Imagery and cognition (pp. 47-77). New York: Springer-Verlag.
Intons-Peterson, M. J., & Roskos-Ewoldsen, B. B. (1989). Sensory-perceptual qualities of images. Journal of Experimental Psychology: Learning, Memory and Cognition, 15, 188-199.
Jones, D. M., Beaman, & Macken, W. J. (1999). The object-oriented episodic record model. In S. E. Gathercole (Ed.), Models of short-term memory (pp. 209-237). Erlbaum, UK: Psychology Press.
Kalakoski, V. (1997, August 9-13). Multi-modal imagery and long-term working memory in constructing representations from visually presented musical notes. Paper presented at the Sixth European Workshop on Imagery and Cognition, Oslo, Norway.
Kalakoski, V. (1999, June 17-20 & July 12-15). Musical imagery and working memory. Paper presented at the Conference on Musical Imagery, Sixth International Conference on Systematic and Comparative Musicology, Oslo, Norway, and at the Seventh European Workshop on Imagery and Cognition, London, United Kingdom.
Keller, T. A., Cowan, N., & Saults, J. S. (1995). Can auditory memory for tone pitch be rehearsed? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 635-645.
Kerzel, D., & Bekkering, H. (2000). Motor activation from visible speech: Evidence from stimulus response compatibility. Journal of Experimental Psychology: Human Perception and Performance, 26(2), 634-647.
Kosslyn, S. M., Ball, T. M., & Reiser, B. J. (1978). Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology: Human Perception and Performance, 4(1), 47-60.
Logie, R. H., & Edworthy, J. (1986). Shared mechanisms in the processing of verbal and musical material. In D. G. Russell, D. F. Marks, & J. T. E. Richardson (Eds.), Imagery 2 (Vol. 2, pp. 33-37). New Zealand: Human Performance Associates.
Logie, R. H. (1995). Visuo-spatial working memory. Hove, UK: Lawrence Erlbaum Associates.
Macken, W. J., & Jones, D. M. (1995). Functional characteristics of the inner voice and the inner ear: Single or double agency? Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(2), 436-448.
Mikumo, M. (1994). Motor encoding strategy for pitches of melodies. Music Perception, 12, 175-197.
Neisser, U. (1972). Changing conceptions of imagery. In P. W. Sheehan (Ed.), The functions and nature of imagery (pp. 233-251). New York: Academic Press.
Pechmann, T., & Mohr, G. (1992). Interference in memory for tonal pitch: Implications for a working-memory model. Memory & Cognition, 20(3), 314-320.
Penney, C. G. (1989). Modality effects and the structure of short-term verbal memory. Memory & Cognition, 17(4), 398-422.
Peretz, I., & Kolinsky, R. (1993). Boundaries of separability between melody and rhythm in music discrimination: A neuropsychological perspective. Quarterly Journal of Experimental Psychology, 46A(2), 301-325.
Reisberg, D., Smith, J. D., Baxter, D. A., & Sonenshine, M. (1989). "Enacted" auditory images are ambiguous; "pure" auditory images are not. The Quarterly Journal of Experimental Psychology, 41A(3), 619-641.
Reisberg, D., Wilson, M., & Smith, J. D. (1991). Auditory imagery and inner speech. In R. H. Logie & M. Denis (Eds.), Mental images in human cognition (pp. 59-81). Amsterdam: Elsevier Science Publishers B.V.
Saariluoma, P. (1991). Aspects of skilled imagery in blindfold chess. Acta Psychologica, 77, 65-89.
Saariluoma, P., & Kalakoski, V. (1997). Skilled imagery and long-term working memory. American Journal of Psychology, 110, 177-201.
Saariluoma, P., & Kalakoski, V. (1998). Apperception and imagery in blindfold chess. Memory, 6(1), 67-90.
Salame, P., & Baddeley, A. (1989). Effects of background music on phonological short-term memory. Quarterly Journal of Experimental Psychology, 41A, 107-122.
Zatorre, R. J., & Beckett, C. (1989). Multiple coding strategies in the retention of musical tones by possessors of absolute pitch. Memory & Cognition, 17(5), 582-589.
Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the mind's ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29-46.


4

Modeling Musical Imagery in a Framework of Perceptually Constrained Spatio-Temporal Representations

Marc Leman

Introduction

Musical imagery can be defined as the capacity for the mental representation of musical sound in the absence of a direct audible and corresponding sound source. Musical imagery is not a hallucination or the experience of hearing real music when none is there. It rather denotes the mental re-experience, remembrance, recall, or mental creation of a sonoric object or a musical process. Reference can be made to an inner ear or to a vivid representation that is close to the actual perception of music. Most people, moreover, will agree that the representation of a sonoric object in mind may occur in different degrees of vividness and abstraction. Such a mental given may be associated with non-sonoric objects and generated on the basis of metaphors. Metaphors obviously entail the projection of an image or process in a certain domain (e.g., the visual or tactile domain) onto the sonoric domain.

Musical imagery may appear as vivid and its capacity may be exceptional, as in the case of Ludwig van Beethoven who, suffering from otosclerosis and being completely deaf, could imagine and compose symphonies and string quartets. Musical imagery can become a dramatic experience, as in the case of Maurice Ravel who, suffering from an injury to the left hemisphere, claimed to be able to hear the music he composed but could no longer write it down (Sergent, 1993). But musical imagery is certainly not restricted to composers, nor is it necessarily exceptional or dramatic. Although we may assume that composers constitute the most talented and often best trained population in exploring musical imagery as an instrument of creation, at more modest levels, musical imagery may be as common as the capacity to recall a heard melody in mind, similar to the way in which visual objects can be recalled or imagined.

Mental imagery can be studied from different points of view. The phenomenological viewpoint, long popular in musical esthetics, probes musical imagery by means of introspection. The approach provides a description of our common imagery experience in terms of a verbal report of imagined objects and associated strategies. Such an enquiry can be stimulating and revealing, but it lacks the testable and measurable results which are needed for the development of a systematic theory of musical imagery. Apart from its epistemological value, a theory of how musical imagery works, rather than how it is experienced, would be valuable in view of the growing role of machines in musical creation processes. Musical imagery is indeed a prerequisite for genuine machine intelligence and machine creation, just as it is a prerequisite for human musical intelligence, creation, planning, and man-machine interaction.

A systematic approach should provide an explanation of how musical imagery, in the forms described by introspection, is at all possible in terms of a representational system. It should go beyond the reports of personal experiences and programmatic ideas, and provide a quantitative model that generates a testable hypothesis. Experimental psychology may help in determining the constraints of musical imagery in terms of memory capacity and mental representation, and neuromusicology may provide data about the neuronal carriers of imagery, using modern techniques of brain research (Tervaniemi & Leman, 1999).

Given the complex nature of higher-level brain processes, theoretical (or 'speculative') modeling may provide additional help in generating suitable working hypotheses. Our approach, therefore, is based on computer modeling, which can be considered as a method to develop an operational hypothesis of musical imagery as representation.

Hence, the purpose of this chapter is to develop a framework that may be useful in exploring musical imagery in terms of memory, image representations, and information processing. Modeling aims at defining the parameters and constraints necessary to develop a plausible hypothesis about musical imagery which, in turn, may guide data gathering. In the first part, the basic requirements and assumptions for an image-based spatio-temporal representational framework for musical imagery are discussed. In the second part, two examples of representational systems for musical imagery are given. The proposed framework is based on the ecological semiotics put forward in Leman (1999b). The signal processing models discussed below are functional equivalence models, which perform in ways similar to how neurons probably do.

Musical imagery and perception

Musical imagery is often assumed to be intrinsically related to perception. The main argument comes from the experience of imagery as a kind of perception, as well as from scientific evidence.

Recent developments in neuromusicology indeed suggest that the sensation of musical imagery may result from the stimulation of brain areas that are active in sensory information processing. Using the paired-image subtraction method (Zatorre, 1997), in which two different conditions are compared directly to one another, thus providing a difference image which reflects areas of cerebral activity for a studied task, Zatorre and collaborators (1994, 1996) have provided evidence that perception and imagery may share, at least partially, the same cortical neural substrate. A central hypothesis of their representational concept of musical imagery is that the awareness of an inner perception is achieved owing to the fact that imagery makes use of the same neuronal carriers as perception. In particular, Zatorre et al. (1996) suggest that the secondary auditory cortex underlies both perception and imagery.¹ Yet, the hypothesis of a common ground for perception and imagery demands further analysis:

• Could imagery also involve peripheral components of the auditory system? And if there were no evidence for this, then what could be the reason? Consider a real listening situation, where the vibrations of the air molecules are picked up by the ear and cause the stimulation of neurons (and, ultimately, the sensation of a sound). It is known that temporal information is accurately transmitted to higher information centers, but the average frequency range of temporal variations represented in the time patterns of neural responses of the auditory nerve is higher than in the inferior colliculus, where it is still substantially higher than in any of several cortical fields. Rates of 1000 Hz are found in the auditory nerve, about 100 Hz in the inferior colliculus, and about 10 Hz in the cortex (Schreiner & Langner, 1988). One may wonder how these findings relate to musical imagery. Does a vivid imagery of the beginning of Beethoven's Fifth Symphony generate the very precise neural temporal patterns similar to the ones generated in the auditory periphery during a real listening situation? If yes, this would be a strong argument in favor of a low-level perceptual basis for imagery. If not, one may wonder in what other format imagery may connect to perception.

A possible explanation, and a basic supposition of the modeling proposed here, is that the brain applies some kind of time-to-place mapping. Evidence that higher-level brain functions are spatial pattern processors is found in the topographical ordering of collicular and cortical maps (Eggermont, 1997). The mapping of a neural time-code to a neural place-code could be a central mechanism for the formation of a perceptually constrained spatio-temporal representational system, useful for imagery as well. If musical imagery cannot produce the accurate time-code for the soundwave of a particular performance of the beginning of Beethoven's Fifth, then perhaps more abstract (and less time-dependent) perceptually constrained spatial representations may still form a reference frame for imagery. Langner (1997) provides particular evidence that modulations in neuronal discharge patterns representing pitch are transformed into place representations for pitch. The periodicity pitch detection is done sub-cortically and then mapped onto the spatial dimension in the inferior colliculus, so that higher cortical areas represent pitch in the spatial dimension.

Others argue in favor of a temporal representational system based on cortical delay-loops which have a spatial extension as loop patterns (Cariani, 1999). Yet whether such a delay-loop memory can generate the fine-grained temporal code of stimulus-induced activity in the case of an imagined sound is unknown, and perhaps questionable in view of the temporal resolution in the cortex. The delay-loop memories are nevertheless quite interesting candidates for the imagery of rhythm patterns and expressive gestures. The latter are conceived as temporal patterns which have a coarse resolution and therefore could possibly exist as temporal patterns in the higher cortical levels (see also Eggermont, 1997).

For both pitch and rhythm it is reasonable to assume that perceptually constrained spatio-temporal representations at higher cortical levels provide a suitable framework for musical imagery, given the fact that the generation of precise time-code at lower levels of the auditory system may be less likely.

• A second problem concerns the dynamics of imagery. The hypothesis that spatio-temporal representations for musical imagery are perceptually constrained leaves a lot of room for creative processing. Although this is not the real topic of this paper, it may be instructive to mention that the creative processes that generate and guide musical imagery, for example in cases of novel compositions, may be constrained as well. The nature of these constraints may be related to the nature of higher-level cortical ongoing autonomous processing which, although not stimulus-induced, may be strongly connected to the existing perceptually constrained spatio-temporal representations. Cellular automata provide examples of systems that have simple constraints (between units called 'neurons' or 'cells') at a local level. The interaction of excitatory and inhibitory constraints, however, may define complex transformation principles which define global spatio-temporal behavior. Shaw (2000) has developed a model of cortical creative processing in terms of a cellular automaton that simulates some properties of cortical columns. The automaton develops global spatio-temporal patterns and can be constrained by input. Its dynamics shows that physical connections between neurons may define a limited spatio-temporal structure in which a set of image transformations, based on symmetry operations in space and time, may occur. The system, in other words, defines an inherent coherence, or logic, of image transformations that implement operations like the inversion, or retrograde, of a given melody.

Hence, in addition to the hypothesis of perceptually constrained spatio-temporal representations, we assume that imagery also relies on a logic of musical image transformations. This logic is assumed to be the emergent outcome of ongoing autonomous activity based on local constraints between representational units in the system. In what follows we avoid the discussion of ongoing autonomous activity and focus on the representational system and easier-to-grasp principles of image transformation.
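The symmetry operations just mentioned (inversion and retrograde of a melody) can be made concrete as transformations on a pitch sequence. The sketch below is a minimal illustration of such coherent image transformations, not part of Shaw's cellular-automaton model; the motif and the MIDI pitch encoding are hypothetical.

```python
# Symmetry operations on a melody, encoded as a list of MIDI pitch numbers.
# Retrograde mirrors the pattern in time; inversion mirrors it in pitch space.

def retrograde(melody):
    """Mirror the melody in time."""
    return melody[::-1]

def inversion(melody, axis=None):
    """Mirror the melody in pitch space around an axis (default: first note)."""
    if axis is None:
        axis = melody[0]
    return [2 * axis - p for p in melody]

# Hypothetical motif: the first four notes of a C major scale.
motif = [60, 62, 64, 65]                 # C, D, E, F
print(retrograde(motif))                 # [65, 64, 62, 60]
print(inversion(motif))                  # [60, 58, 56, 55]
print(retrograde(inversion(motif)))      # [55, 56, 58, 60]
```

Because both operations are mirrorings, each is its own inverse, which is one simple sense in which such a transformation system is internally coherent.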

Representational description system and ecological assumptions

What kind of representational description system could we adopt as the basis for modeling? First consider a symbol-based representational model in which images would be addressed in terms of symbolic entities and a kind of rule system that governs the relationships between these entities. Such a description system may be a useful meta-level description provided that the basic entities can somehow be directly related to imagery.²

Figure 1. Basic framework for musical imagery: the spatio-temporal embedding of musical imagery is basically constrained by the environment, although image transformation processes subsume a higher-level brain-dependent logic. The logical account is based on a grounding of symbolic representations in images. (The diagram shows a Symbolic Framework, supporting logical inferences, grounded in a Spatio-Temporal Framework, supporting image processing, which is coupled to the Environment through environmental interactions.)

A basic idea is that musical imagery may involve abstract concepts which may be instantiated as images in a representational system spanning spatio-temporal spaces and constrained by perception and inherent higher-level brain processing. The underlying logic somehow follows from perceptually constrained processing in this spatio-temporal space. What is needed, therefore, is (i) a definition of musical images at different levels of abstraction, and (ii) a relation of the representational framework to a possible inherent logic of musical imagery.

Associated with a proper choice of the description system is an ecological theory of representation. It starts from the straightforward observation that the constraints of the outer environment are somehow captured by the human brain and that neuronal activity, stimulated by processing activities involving filtering and extraction of outer-environmental features, provides the ultimate foundation for imagery. The ecological approach is compatible with different levels of information processing. The images that best mirror the constraints of the outer environment are situated at the level of sensory information processing. At higher perceptual and cognitive levels, images may be completed and deformed into more abstract spatio-temporal representations. The latter typically involve image transformation processes in connection with learning and long-term memory. In this view, imagery may start from more general and abstract representations, allowing abstract combinations of a symbolic nature, towards a more concrete instantiation of imagined objects onto which rotations, symmetry transformations, and any kind of image transformation may be applied. The spatio-temporal structures thus realize images through constrained top-down processes in connection with the internal (neuronal or brain-dependent) image transformation principles. The ecological basis of musical imagery anyhow implies the connection to perception by means of the constrained spatio-temporal framework.

The ecological approach to musical imagery thus assumes that musical imagery occurs in a tight interaction between the musical environment and a representational system. It entails the view that musical imagery cannot be dissociated from the environment in which the representational system is embedded, both as a perception system and as an action system. This concept is depicted in Figure 1 on the preceding page.

Representational concepts

A more detailed account of the basic assumptions involves a definition of image and subsequent image processing:

• Definition of (Auditory) Images³

Low-level (or early) images are conceived of as the result of causal feature extraction processes that ultimately rely upon sound waves. A distinction can be made between different kinds of brain codes, such as (Ehret & Romand, 1997):

– discharge rate, or spike train code;
– rate code, as the amount of discharges during a time unit;
– phase-locking synchrony code, which represents the amount of synchronization to a particular frequency.

A useful further distinction can be made between the time-code, or the signal encoding in the time domain of a neuronal channel, and the place-code, which is how information is processed by a group or assembly of neurons at a certain time instance. An image can thus be expressed in a rate-time code, which would imply the representation of a temporal pattern in units that represent the amount of neuronal discharges, or in a rate-place code, which would imply the representation of a spatial pattern in units that represent the amount of neuronal discharges (examples are given below, see Fig. 3c on page 67 and Fig. 4 on page 68). From a technical point of view, the activation of neurons can be registered as numerical values representing aspects of the image.

Musical features, such as loudness, pitch, rhythm, and timbre, are not fixed categories of this representational system. Instead, they are to be understood as emerging aspects of the underlying auditory processing. Once beyond the auditory periphery, several new encoding formats may emerge due to image transformation processes. Starting from images that accurately capture aspects of the temporal properties of sounds in the auditory periphery, spatio-temporal images may gradually appear at higher levels. The above-mentioned transformation of time-code into place-code provides an opportunity for musical imagery in that spatio-temporal representations require a less time-critical imagery and hence may rely on a higher degree of abstraction and independence from low-level perception. The present model thus assumes that musical imagery explores the space spanned by the spatio-temporal structures developed during different stages of perception. The spatio-temporal structures may entail temporal (at a low resolution) and even motoric-temporal information. A main difference of the proposed representational system from a pure time-code, however, is that the temporal aspects are encoded using a spatial format. The time-delay connections between neurons provide an example (see page 69).

• Causality and Coherence
A particularly intriguing question concerns the notion of coherence in imagery. In perceptual studies, at least, a basic constraint of image processing is rooted in the idea that the operations on images are based on causal processes that guarantee coherence. A sound can thus be transformed into an image (or several images) provided that there is a causal pathway from the sound to the image. Causal pathways may be formulated in terms of auditory information processing, and they guarantee the coherence of image transformations. But how does it work with imagery, when transformation is not constrained by data-driven processes? Coherence will imply that images generate other images subsuming an inherent logic. The hypothesis is that coherence may be guaranteed provided that the image transformations have a causal basis, which they have if we assume that the spatio-temporal structures carrying images are the spaces where these operations are carried out. Physical constraints between neurons may define a limited spatio-temporal structure in which a set of image transformations, based on operations in space and time, may occur. The representational system is thus assumed to define an inherent coherence, or logic, of musical image transformations.

• Different Time-Scales and Memories
Memories are considered representational structures or registers which hold the images as their content. The memories instantiate the features discussed above and, in previous studies, we have explored different types of memories in music perception. A distinction can be made between:

– Immediate (short-term) memory. This relates to the immediate activation of an array of neurons.
– Echoic (short-term) memory. This relates to the fact that images may be built up into a short-term memory lasting no longer than a few seconds. The echoic (short-term) memory is a very powerful representational device in music perception. In Leman (2000) we show that two echoic memories for pitch, one operating with a half-time decay of 0.1 seconds and the other one with a half-time decay of about 1.5 seconds, may account for the famous probe-tone rating experiments of Krumhansl and Kessler (1982). They had assumed that the task, involving the comparison of a tone with a previous tonal context, involves a long-term memory of tonal relationships. Our study does not exclude the existence of a long-term memory for tonal relationships (see next paragraph), but it questions the need of this memory in the probe-tone task in particular. The results raise questions whether expectancy in probe-tone studies may be considered a genuine effect of imagery. It seems that expectation is largely a stimulus-induced activity in which echoic memory plays an important role.
– Statistical (long-term) memory. This relates to the statistical storage of invariant features in image streams. In a number of studies we have shown that a statistical (long-term) memory for tonal relationships may develop a spatial structure (based on the circle of fifths) by mere exposure to music. Below we give a more detailed analysis of this model and discuss its relevance to musical imagery.
– Episodic (long-term) memory. This relates to the storage of temporal dependencies in music which account for the memory of particular pieces and recognition of particular performances. Below, we give a more detailed example of a memory system that captures rhythmical patterns. The example would also apply to motoric patterns, and an interesting link can here be made to motoric theories of imagery (1997).
– Working memory. This relates to an activated structure which, in the case of musical imagery, may be conceived of in terms of spatio-temporal image properties. The concept of a working memory may be applied to both the statistical and episodic memories. The latter appear as image reservoirs that can be activated. Working memory applies to activated memories at a certain moment in time.

Having specified a few basic assumptions of our modeling approach to musical imagery, we now turn to a more detailed exploration of two image-related representation models.

Perceptually constrained spatio-temporal spaces of representation

In this section, I present two models of spatio-temporal representation. The first is a statistical long-term memory, in which topology defines relationships between images. The second is an episodic long-term memory, in which looping patterns define repeating rhythms. Both types may provide an operational reference framework for musical imagery.

A Statistical Long-Term Memory (SLTM)
Computer simulations show that a statistical long-term memory (or schema) for distributions of tone centers (as well as of chords) may be built up through self-organization by mere exposure to music (Leman, 1995b; Leman & Carreras, 1997). The representational structure (see Fig. 2 on the facing page) holds the perceived objects in an ordered two-dimensional spatial representation, called a topology or map, resembling the mental representations proposed by Krumhansl and Kessler (1982).

The topology provides a perceptually constrained spatio-temporal representation system carried by a two-dimensional array of neurons. A top-down induced activation of a certain region on the map may activate neighboring regions, which implies relationships between the concepts on the basis of imagery. The representational structure should be conceived of as a predisposed container for imagery instantiation, similar to the instantiation of Figure 2b, where the class boundaries of the structure are shown, into Figure 2a, where the instantiation of the imagery takes place.

A formal description may clarify the above idea in more detail. The Statistical Long-Term Memory (SLTM) works in two modes. One mode (called SLTM-learning) transforms running images from a data-driven echoic memory into long-term stable images of the SLTM. This accounts for the formation of a perceptually constrained topology of images. Another mode of operation (called SLTM-resonance) uses the representational structure as a resonance system. The latter mode has two parts: one part is basically steered by bottom-up processing and suitable for recognition, the other part is top-down and suitable for imagery. We first describe how SLTM is built up and then how it can be used in an imagery task.

Figure 2. SLTM of learned tonal distributions. The top panel (a) represents the output of SLTM to a cadence in C major. The neuronal carrier contains 10000 neurons ordered on a two-dimensional grid of 100 by 100 neurons. The structure is a torus (top and bottom are connected, as well as left and right). The black dots represent activation of neurons in response to the input. The labels are put on the places where the network gives a maximal response. In this case, the response region is centered on the black spot (which covers the label C). SLTM has been trained with 72 different cadences which got organized into 24 classes along circles of fifths. The lower panel (b) gives an idea of the internal boundaries which define the class separations. The lower figure is obtained by visualization of the neuronal boundaries that develop by self-organization into the schema. The boundaries show the contours of the distinct key templates (see Leman & Carreras, 1997).


SLTM-learning
SLTM-learning can be described as:

SLTM-learning: p_E(t) → P = ⟨P_k⟩, for k = 1…K   (1)

where p_E(t) denotes a pitch image at time t, and P_k a stable image in SLTM. The container of stable images P_k is denoted P, and there are K such images (also called classes). The pitch image is time-integrated using a half-decay value, or echo, specified by E. The echo, which is assumed to be determined by neuronal integration mechanisms, accounts for the pitch context, which may be short in the case of chords and long in the case of tone centers. Different echoes thus reflect the categorization of different objects. The echo for chords can be 0.1 s, while the one for tone centers can be 1.5 s (Leman, 2000).

The pitch patterns are spatio-temporal patterns derived from auditory nerve patterns. The latter are closely connected to the sound patterns in the environment. The causal chain leading to the pitch images involves an Auditory Peripheral Module (APM):

APM: s(t) → d(t) = ⟨d_c(t)⟩, for c = 1…C   (2)

where s(t) represents the musical sound, and d_c(t) the discharge pattern along a particular channel c of the auditory nerve. There are C such channels. An example is given in Figure 3c on page 67. The horizontal axis represents time, and the vertical axis represents the auditory channels in terms of their frequency along a critical band scale.

The pitch images can be described in terms of the transformation PCM (Pitch Completion Module), which implies a mapping from time-code to place-code. Thus d_c(t) is a time-encoded image, while p(t) is a place-encoded image. Both types encode the rate of neuronal discharges (see Leman, 2000; Leman, Lesaffre, & Tanghe, 2000, for more details).

PCM: d(t) → p(t) = Σ_{c=1…C} p_c(t)   (3)
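The time-code to place-code mapping described above can be illustrated with a periodicity analysis: the temporal fine structure of a channel's discharge pattern is converted into activation values along a spatial 'period' axis. The sketch below is an illustrative stand-in using plain normalized autocorrelation, not Leman's actual Pitch Completion Module; the signal, sample rate, and lag range are all hypothetical.

```python
import math

def periodicity_image(signal, min_lag, max_lag):
    """Map a time-encoded pattern onto a place-encoded 'period' axis:
    one activation value per candidate period (normalized autocorrelation)."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n)) / (n - lag)
            for lag in range(min_lag, max_lag + 1)]

# Hypothetical time-code: a 200 Hz periodicity sampled at 8000 Hz.
sr = 8000
sig = [math.sin(2 * math.pi * 200 * i / sr) for i in range(800)]
place_code = periodicity_image(sig, 20, 60)

# The peak position along the 'place' axis encodes the period (and thus pitch).
best = 20 + place_code.index(max(place_code))
print(best, sr / best)                   # 40 200.0
```

The point of the sketch is that after the mapping, pitch is read off from *where* the activation peaks, not from the temporal fine structure itself.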

The echoic pitch images follow from a leaky integration of these images:

EMM: p(t) → p_E(t)   (4)
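The leaky integration above can be sketched as a first-order recursive filter whose decay factor follows from the half-decay time. This is a schematic reconstruction assuming frame-based processing; the frame rate and the test input are hypothetical, while the two half-decay values (0.1 s for chords, 1.5 s for tone centers) follow the text.

```python
def echoic_memory(frames, half_decay_s, frame_rate_hz):
    """Leaky integration of a sequence of pitch-image frames (lists of floats).
    half_decay_s: time after which the echo of a past input is halved."""
    # Per-frame decay factor: after half_decay_s seconds the echo is halved.
    decay = 0.5 ** (1.0 / (half_decay_s * frame_rate_hz))
    echo = [0.0] * len(frames[0])
    out = []
    for frame in frames:
        echo = [decay * e + (1.0 - decay) * x for e, x in zip(echo, frame)]
        out.append(echo)
    return out

# Hypothetical input: one impulse in a 3-unit pitch image, at 10 frames/s.
frames = [[1.0, 0.0, 0.0]] + [[0.0, 0.0, 0.0]] * 20
chord_echo = echoic_memory(frames, 0.1, 10)    # 0.1 s half-decay (chords)
center_echo = echoic_memory(frames, 1.5, 10)   # 1.5 s half-decay (tone centers)

# Fraction of the initial activation retained one second later.
retained_chord = chord_echo[10][0] / chord_echo[0][0]
retained_center = center_echo[10][0] / center_echo[0][0]
print(retained_chord < retained_center)        # True: the long echo keeps context
```

The longer half-decay retains context over seconds, which is why it suits tone centers, whereas the short one tracks chord-scale events.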

The pitch images and echoic pitch images are shown in Figures 4a-c on page 68. In a number of previous studies (Leman, 1995b), a statistical memory has thus been trained with pitch images of this type. The training process entails a transformation of time-variant patterns into stable or invariant images in the SLTM.⁴ In the tone center experiments, the stable images P_k denote the k = 1…24 major and minor keys. SLTM-learning thus represents a categorization process where instances of invariance, called classes or stable images, are extracted from instances of variance, called running pitch images. In Leman and Carreras (1997), we used the preludes and fugues of Bach's Well-Tempered Clavier as input. Another study was based on the use of 72 tonal cadences.
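The self-organizing categorization described above can be sketched with a toy Kohonen map. This is not the trained 100 × 100 toroidal network of Figure 2: the flat 4 × 4 grid, the learning schedule, and the random 'pitch image' prototypes below are all hypothetical, and serve only to show how repeated exposure separates input classes into map regions.

```python
import random

def train_som(data, grid_w, grid_h, epochs=30, lr0=0.5, radius0=2.0):
    """Toy Kohonen self-organizing map on a flat (non-toroidal) grid.
    data: list of input vectors; returns the grid of weight vectors."""
    dim = len(data[0])
    rng = random.Random(0)
    grid = [[[rng.random() for _ in range(dim)]
             for _ in range(grid_w)] for _ in range(grid_h)]
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)
        radius = max(radius0 * (1.0 - epoch / epochs), 0.5)
        for x in data:
            # Find the best-matching unit (smallest squared distance).
            by, bx = min(((r, c) for r in range(grid_h) for c in range(grid_w)),
                         key=lambda rc: sum((w - v) ** 2 for w, v in
                                            zip(grid[rc[0]][rc[1]], x)))
            # Move the BMU and its grid neighbors toward the input.
            for r in range(grid_h):
                for c in range(grid_w):
                    if (r - by) ** 2 + (c - bx) ** 2 <= radius ** 2:
                        grid[r][c] = [w + lr * (v - w)
                                      for w, v in zip(grid[r][c], x)]
    return grid

def best_unit(grid, x):
    return min(((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
               key=lambda rc: sum((w - v) ** 2
                                  for w, v in zip(grid[rc[0]][rc[1]], x)))

# Two hypothetical 'pitch image' classes as noisy prototypes.
rng = random.Random(1)
proto_a, proto_b = [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]
data = [[p + 0.1 * rng.random() for p in proto]
        for _ in range(40) for proto in (proto_a, proto_b)]
grid = train_som(data, 4, 4)
print(best_unit(grid, proto_a) != best_unit(grid, proto_b))
```

After training, the two classes occupy different map positions, which is the toy analogue of the 24 key regions emerging along circles of fifths.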


Figure 3. Transformation of a sound into auditory nerve images, an essential first step in music perception. (a) The score, (b) the sound represented as a waveform, (c) the auditory nerve images (the vertical axis represents the auditory channels in terms of their frequency along a critical band scale).

SLTM-resonance
Musical listening can be described as a trajectory in SLTM. In a similar way, imagery may use the space to perform an imagined trajectory which would be both free and constrained: free because it is not driven by the outer environment, constrained because it is (i) embedded in a space that was first moulded by the outer environment, and (ii) subject to autonomous processing.

First consider how recognition is modeled. SLTM-resonance transforms the stimulus-induced images into an activated response. This process can be described as:

Figure 4. Pitch images: (a) periodicity pitch images, (b) echoic images using a half-decay value of 0.1 s, (c) echoic images using a half-decay value of 1.5 s. Observe that due to smearing of the pitch images, the depicted time scale is slightly different.


SLTM-resonance: p_E(t) ⊗ P → A(t) = ⟨A_k(t)⟩, for k = 1…K   (5)

The operator ⊗ denotes some matching process such as correlation, or the retroactive dynamics as described (for key recognition) in Leman (1995a, 1995b). The resulting pattern A(t) contains the degree of activation of each stable image P_k in the schema. Each A_k at time point t represents a value that results from the matching of p_E at time point t with P_k. The recognition trajectory is thus described by A(t), as shown in Figure 5 on the next page.

Once the SLTM has been developed by learning, it thus reacts as a resonator to data-driven input. The activation of SLTM then shows the degree to which particular input is recognized. Owing to the structural representation of the SLTM, activation comes up with additional information regarding the relationships between the represented objects. This is illustrated in Figure 2 on page 65, where the activation of a key center also activates other related key centers, thus providing an associated field of activations that is related to different represented objects. The resonating schema as such, however, can be seen as a (short-term) activation of regions in a long-term memory.
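The resonance step can be sketched under the assumption that the matching operator is a normalized correlation: a probe pitch image is compared against each stored stable image, yielding one activation per class. The chroma-like key templates and the probe below are hypothetical toy data, not the trained SLTM templates.

```python
import math

def resonance(probe, templates):
    """Activation per class: normalized correlation of the probe image
    with each stable image (assuming the matching operator is correlation)."""
    def norm_corr(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return {k: norm_corr(probe, p) for k, p in templates.items()}

# Hypothetical 12-dimensional chroma-like templates for two keys.
c_major = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # C, E, G
g_major = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]   # D, G, B
templates = {'C': c_major, 'G': g_major}

# A C-major-ish probe context with some spill-over activation.
probe = [0.9, 0, 0.1, 0, 0.8, 0, 0, 1.0, 0, 0, 0, 0.1]
activations = resonance(probe, templates)
print(max(activations, key=activations.get))      # 'C'
```

Note that the non-winning class still receives a graded activation (here via the shared tone G), which is the toy analogue of related key centers being co-activated.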

Of interest to musical imagery is the idea that the top-down activation of a particular region, say C, immediately reveals the connections to other regions (F, G, a, c, etc.), such that the transition from one concept (in this case a tonal center C) to another concept (tonal center G) can be imagined in a spatial structure, thus providing the coherence needed for imagery. In a similar way, this model would work for a spatial structure that represents chords. The imagery of a chord sequence would imply the instantiation of a trajectory on the chord-map. It is important to note that the imagery related to the associated concepts has been constrained by environmental adaptation and self-organization. The virtual trajectory may first be thought of in terms of a sequence of symbols and then realized by means of the trajectory in the appropriate mental space. The model predicts that listeners, subjected to an auditory training with piano chords only, would use this spatio-temporal structure and hence imagine chord sequences basically as chords having piano timbres. The timbre of a sound indeed has an effect on the pitch images. Although this top-down model has not been implemented, we believe that an implementation would imply some direct connection between a symbolic representation and a neural activation which, when activated, may trigger a structural response.

An Episodical Long-Term Memory (ELTM)
The second example deals with episodic memory, in particular the question of how musical sequences can be stored in a memory and subsequently recalled and imagined. Cariani (1999) has explored the idea of time-delay memories capable of storing and holding repeating patterns in time-delay loops. The time-delay memories are related to the periodicity transform technique proposed by Sethares and Staley (1999). In what follows, we present a related model based on the notion of multiple pattern scanning (Leman, Tanghe, Moelants, & Carreras, 1999; Leman & Verbeke, 2000).

Figure 5. Example of a trajectory represented as a function of the invariant or stable images (panel title: 'Context Images Compared to Stable Tone Center Images'; the vertical axis lists the 24 major and minor keys). The schema was learned from 72 different cadences (using Shepard tones), as shown in Figure 2 on page 65. The input consisted of the pitch images shown in Figure 4c on page 68 (Schumann piece). The activations describe a trajectory on tone center images with respect to the topology.

Consider the following transformation:

s(t) → ℓ_T   (6)

where s(t) denotes a repetitive sound pattern such that s(t + T) = s(t). This pattern is called T-periodic (see Sethares & Staley, 1999) and is transformed into a loop-pattern ℓ of length T, which we denote as ℓ_T. The pattern s(t) is now highly reduced and can be economically stored in ELTM. Looping through the pattern ℓ_T gives a kind of reconstruction of the perceived original pattern s(t). It can be considered an instance of musical imagery.
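The reduction of a T-periodic pattern to a stored loop can be sketched by scanning candidate periods and folding the signal at the one that reconstructs it best. This follows the spirit of the periodicity-transform idea but is a simplified stand-in for the multiple-pattern-scanning model; the rhythm envelope and the residual scoring are hypothetical.

```python
def best_loop(pattern, max_period):
    """Fold a (roughly) repetitive pattern at the candidate period that
    minimizes the residual between the pattern and its repeated loop."""
    n = len(pattern)
    best = None
    for period in range(1, max_period + 1):
        # Average the pattern over repetitions of this candidate period.
        sums = [0.0] * period
        counts = [0] * period
        for i, v in enumerate(pattern):
            sums[i % period] += v
            counts[i % period] += 1
        loop = [s / c for s, c in zip(sums, counts)]
        # Residual: how badly looping 'loop' reconstructs the pattern.
        residual = sum((v - loop[i % period]) ** 2
                       for i, v in enumerate(pattern))
        if best is None or residual < best[0]:
            best = (residual, period, loop)
    return best[1], best[2]

# Hypothetical rhythm envelope: a 4-frame pattern repeated 5 times.
pattern = [1.0, 0.0, 0.5, 0.0] * 5
period, loop = best_loop(pattern, 8)
print(period, loop)                      # 4 [1.0, 0.0, 0.5, 0.0]
```

Storing only the short loop (plus its length) in place of the full pattern is the data reduction the text describes; replaying the loop reconstructs the pattern.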

In what follows, a formal description of a particular ELTM model is given. The model is applied to a piece of music with the aim of finding the repeating rhythmical pattern. Whether the model actually works was tested by resynthesis of the rhythmic pattern, using the stored rhythm pattern to amplitude-modulate band-limited noises.

The build-up of ELTM involves the Auditory Peripheral Module (Expression 2 on page 66), which transforms a given sound s(t) into neuronal patterns d_c(t) at the level of the auditory nerve. This step is similar to the first step in SLTM-learning. It basically provides an analysis of the sound in different frequency bands, such that a possible polyrhythmical pattern may be dealt with, provided that the different rhythms are in different frequency layers (which they often are in music). The next step, called Envelope Extraction (EE), extracts the energy envelope from the patterns d_c(t) into e_c(t). It entails a reduction in temporal resolution (i.e. a lower sampling rate):

EE: d_c(t) → e_c(t)   (7)

A Looping Analysis (LA) then transforms the energy envelopes into a set of looping-patterns:

LA: e_c(t) → l_c(t) = ⟨ℓ_{c,τ}(t)⟩, for τ = 1…θ   (8)

where l_c(t) denotes a set of loops found at time instance t. Each loop ℓ_{c,τ} has a length τ (from 1 to θ) at this time instance t. The next step builds up ELTM by choosing the best loop-pattern and storing it into a memory:

l_c(t) → m_c(t)   (9)

where m_c(t) holds the memory with the best loop-pattern at time t.

Figure 6a on page 72 is an example of a waveform representing an excerpt of the Presto energico from the Musica Ricercata per pianoforte (1951-53) by G. Ligeti (BIS-CD-53). The piece is characterized by a repeating rhythmic pattern which is indicated on the figure. Figure 6b shows the synthesized waveform based on the loop-patterns contained in ELTM after 2.5 seconds. A resynthesis of the original rhythmical pattern can be obtained by using band-limited noises covering the original frequency range of the auditory model, and then amplitude-modulating the noises with the found rhythm patterns. Thus we have r_c(t), representing a band-limited noise signal with center frequency equal to that of the auditory channel c, and a specific pattern m_c taken at 2.5 seconds, which is repeated and which modulates the noise signal. The final signal s_f(t) is obtained by summing the amplitude-modulated noises over all channels, such that s_f(t) = Σ_c r_c(t) M_c(t), where M_c(t) is the repeated pattern m_c at 2.5 seconds.⁵
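The resynthesis step can be sketched as follows: each channel's repeated loop-pattern amplitude-modulates a carrier at that channel's center frequency, and the channels are summed. A sinusoid with random amplitude jitter stands in here for true band-limited noise (which would require a filter design); the loop-patterns, frame rate, and center frequencies are all hypothetical.

```python
import math
import random

def resynthesize(loops, frame_rate, sample_rate, duration_s, center_freqs):
    """Amplitude-modulate one carrier per channel with its repeated
    loop-pattern, then sum over channels. A jittered sinusoid stands in
    for band-limited noise at each channel's center frequency."""
    rng = random.Random(0)
    n = int(duration_s * sample_rate)
    out = [0.0] * n
    for loop, cf in zip(loops, center_freqs):
        for i in range(n):
            # Repeat the loop-pattern: frame index modulo loop length.
            frame = int(i * frame_rate / sample_rate) % len(loop)
            carrier = math.sin(2 * math.pi * cf * i / sample_rate)
            out[i] += loop[frame] * carrier * (0.5 + 0.5 * rng.random())
    return out

# Hypothetical loop-patterns for two channels (as stored in an ELTM), 8 frames/s.
loops = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.0]]
signal = resynthesize(loops, 8, 8000, 1.0, [500.0, 1111.0])
print(len(signal))                       # 8000 samples (1 s at 8 kHz)
```

The audible result is a rhythm: each channel's energy rises and falls with its stored loop, even though all spectral detail of the original has been discarded.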

The sequence of loop-patterns m_c(t), for c corresponding to a center frequency of 1111 Hz, is shown in Figure 7 on page 73. The figure displays the patterns detected in auditory channel 10, which has a center frequency of 1111 Hz. The pattern at 2.5 seconds is shown in the small frame. This pattern has been selected for modulation of a band-pass noise with the same center frequency. The re-synthesis accumulates the results of all auditory channels.

Although this has not been implemented thus far, the memory unit m can further be connected to a spatial memory, like the one considered in the previous section. This would allow an additional reduction of the data, with an additional classification and ordering.

As with LTSM, an ELTM can be conceived of as a resonance system, allowing stimulus-driven recognition of sequences or particular gestures, as well as top-down driven imagery. In the case of musical imagery, the top-down connection may activate a loop pattern providing a temporal unfolding of a musical idea (a performance, gesture, rhythmic pattern etc.). A musical structure can be conceived as a nested collection of such episodic memories. Musical imagery may reason about the memory


72 MODELING MUSICAL IMAGERY


Figure 6. Original soundwave and resynthesised soundwave using ELTM. (a) Waveform representing an excerpt of the Presto energico from the Musica ricercata per pianoforte (1951-53) by G. Ligeti. (b) Synthesised waveform based on the loop-patterns contained in ELTM after 2.5 seconds. The loop-patterns of the different auditory channels are repeated. The resulting modulation patterns are then multiplied with band-pass noises having the center frequency of the corresponding auditory channels.

units and activate them within the spatio-temporal memory system, hence producing a working memory for spatio-temporal images. Imagery may perform several operations on these units, which are constrained by the principles of coherent transformations.

Implementation

The symbol-based description system which is adopted here is actually implemented in terms of a mathematical functional equivalence model which accounts for signal (and image) processing. It is assumed that image processing in the auditory system can be described independently of the physical carriers (the neurons), provided that the basic properties of information processing in neurons are somehow retained.


MARC LEMAN


Figure 7. The patterns detected in auditory channel 10, which has a center frequency of 1111 Hz. The pattern at 2.5 seconds is shown in the small frame. This pattern has been selected for modulation of a band-pass noise with the same center frequency. The resynthesis accumulates the results of all auditory channels. The diagonal lines in this figure result from a phase correction such that stable periods are seen as horizontal lines.

Signal processing concepts such as filtering, integration, correlation and other related concepts provide an appropriate description tool.

The modules described in this paper form part of an ongoing project at the Institute for Psychoacoustics and Electronic Music, Ghent University, which aims at constructing a toolbox for perception-based analysis of music. The IPEM Toolbox6 contains a set of Matlab functions for auditory-based musical signal processing. The functions can be accessed at three different levels: functional-logical, signal processing, and implementation.

• The functional-logical level provides a description of the Matlab functions in terms of the functional formal logics adopted in this paper. It allows a concise description which resembles the way in which the IPEM Toolbox modules are to be used while programming in Matlab.

• The signal processing level provides a description of the Matlab functions in terms of the mathematics of signal processing. This level of detail is not dealt with in this paper.

• The implementation level concerns the way in which the functions are implemented in Matlab. Signal processing functions of the Matlab Signal Processing Toolbox have been used where possible.

Conclusions

A modeling framework for musical imagery is based on the assumption that the precise time-code of neuronal activation in the auditory periphery is somehow transformed into a spatio-temporal representation forming a perceptually constrained space for musical imagery. This paper has focused on the constrained representational system, yet we believe that the full power of musical imagery is also related to a processing engine based on an inherent, representationally constrained logic. Cellular automata provide models to deal with the complexities of this inherent higher-level brain logic, but at this stage it may be more convenient to explore simpler and better understood image processing models.

The focus on representational issues was restricted to two examples in the auditory domain. One model is based on a statistical long-term memory; imagery is here defined in terms of a trajectory in a perceptually constrained mental space. A second model is based on an episodic long-term memory; imagery is here defined in terms of an ongoing looping of perceptually constrained patterns. The latter model provides ways of conceiving imagery in terms of sequential patterns, not necessarily restricted to auditory patterns.

The modeling approach to musical imagery, apart from being a method for specific theory construction related to experimental research and empirical (behavioral and brain) data, finds its most straightforward application in the domain of multi-modal musical performance environments (Camurri, 1999), where systems interact in a natural and expressive way with human performers. Future work in modeling musical imagery will concentrate on the application of the spatio-temporal, perception-constrained representations to the recognition and imagery of expressive musical gestures. The episodic long-term memory explored in this chapter offers promising performance opportunities in this domain.

Notes

1. Studies in physiological acoustics and related brain research furthermore offer rather detailed descriptions of auditory representation and processing, from the cochlear system to the cochlear nucleus and higher brain centers (Ehret & Romand, 1997; Zenner, 1994; Pickles, 1982). These studies, together with the more recent developments in brain imaging (Tervaniemi & Leman, 1999), suggest that a theory of musical imagery could ultimately be based on neuron-like structures rather than high-level and abstract cognitive constructs (Leman, 1999a).



2. A pure symbol-based account at the object-level remains too abstract and arbitrary, serving as a hotbed for epistemological problems. Note that a symbol-based system cannot capture the compelling causal constraints that exist between images. Constraints between representations would need to be explicitly defined, and this approach would therefore be in conflict with the proposed ecological basis of musical imagery, in which the constraints are assumed to result from interactions with the environment (Leman, 1999b).

3. In what follows, the account is basically focused on auditory patterns.
4. These invariant or stable images, which are the predisposed images in a representational structure (and learned by self-organization), have no running time index (see Expression 1 on page 66).
5. The sound examples are available at http://www.ipem.rug.ac.be/staff/marc/marc.html.
6. See http://www.ipem.rug.ac.be/research.html.

References

Camurri, A. (1999). Music content processing and multimedia: Case studies and emerging applications of intelligent interactive systems. Journal of New Music Research, 28 (4), 351-363.

Cariani, P. (1999). Timing nets for rhythm perception. In M. Leman (Ed.), Proceedings of the Tenth Meeting of the FWO Research Society on Foundations of Music Research - Music and Timing Networks (pp. 28-37). Ghent University: IPEM - Dept. of Musicology.

Eggermont, J.-J. (1997). Representation of amplitude modulated sounds in two fields in auditory cortex of the cat. In J. Syka (Ed.), Acoustical signal processing in the central auditory system (the language of science) (pp. 303-319). New York, NY: Plenum Press.

Ehret, G., & Romand, R. (Eds.). (1997). The central auditory system. New York, Oxford: Oxford University Press.

Godøy, R. (1997). Knowledge in music theory by shapes of musical objects and sound-producing actions. In M. Leman (Ed.), Music, Gestalt, and computing: Studies in cognitive and systematic musicology (pp. 89-102). Berlin, Heidelberg: Springer-Verlag.

Krumhansl, C., & Kessler, E. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.

Langner, G. (1997). Temporal processing of pitch in the auditory system. Journal of New Music Research, 26, 116-132.

Leman, M. (1995a). A model of retroactive tone center perception. Music Perception, 12, 439-471.

Leman, M. (1995b). Music and schema theory: Cognitive foundations of systematic musicology. Berlin, Heidelberg: Springer-Verlag.

Leman, M. (1999a). Adequacy criteria and models of musical cognition. In J. Tabor (Ed.), Otto Laske: Navigating new musical horizons (pp. 93-120). Westport, CT: Greenwood Publ. Comp.

Leman, M. (1999b). Naturalistic approaches to musical semiotics and the study of causal musical signification. In I. Zannos (Ed.), Music and signs - semiotic and cognitive studies in music (pp. 11-38). Bratislava: ASKO Art & Science.

Leman, M. (2000). An auditory model of the role of short-term memory in probe-tone ratings. Music Perception, 17 (4), 481-509.

Leman, M., & Carreras, F. (1997). Schema and Gestalt: Testing the hypothesis of psychoneural isomorphism by computer simulation. In M. Leman (Ed.), Music, Gestalt, and computing: Studies in cognitive and systematic musicology (pp. 144-168). Berlin, Heidelberg: Springer-Verlag.

Leman, M., Lesaffre, M., & Tanghe, K. (2000). An Auditory Toolbox for Perception-Based Music Analysis. Ghent University: IPEM - Dept. of Musicology, Ghent (manuscript).

Leman, M., Tanghe, K., Moelants, D., & Carreras, F. (1999). Analysis of music using timing networks with memory: Implementation and preliminary results. In M. Leman (Ed.), Proceedings of the Tenth Meeting of the FWO Research Society on Foundations of Music Research - Music and Timing Networks (pp. 53-59). Ghent University: IPEM - Dept. of Musicology.

Leman, M., & Verbeke, B. (2000). The concept of minimal 'energy' change (MEC) in relation to Fourier transform, auto-correlation, wavelets, AMDF, and brain-like timing networks - Application to the recognition of repetitive rhythmical patterns in acoustical musical signals. In K. Jokinen, D. Heylen, & A. Nijholt (Eds.), CELE-Twente Workshop on Language Technology, Workshop 11: Internalizing Knowledge (pp. 191-200). Ieper, Belgium: CELE-Twente.

Pickles, J. (1982). An introduction to the physiology of hearing. London: Academic Press.

Schreiner, C., & Langner, G. (1988). Coding of temporal patterns in the central auditory nervous system. In G. Edelman, W. Gall, & W. Cowan (Eds.), Auditory function: Neurobiological bases of hearing (pp. 337-361). New York, NY: John Wiley and Sons.

Sergent, J. (1993). Music, the brain and Ravel. Trends in Neurosciences, 16 (5), 168-172.

Sethares, W., & Staley, T. (2001). Meter and periodicity in musical performance. Journal of New Music Research, 30 (in press).

Shaw, G. (2000). Keeping Mozart in mind. San Diego, CA: Academic Press.

Tervaniemi, M., & Leman, M. (Eds.). (1999). Cognitive neuromusicology. Lisse, The Netherlands: Swets & Zeitlinger. (Special issue of Journal of New Music Research).

Zatorre, R. (1997). Cerebral correlates of human auditory processing: Perception of speech and musical sounds. In J. Syka (Ed.), Acoustical signal processing in the central auditory system (the language of science) (pp. 453-468). New York, NY: Plenum Press.

Zatorre, R., Evans, A., & Meyer, E. (1994). Neural mechanisms underlying melodic perception and memory for pitch. The Journal of Neuroscience, 14, 1908-1919.

Zatorre, R., Halpern, A., Perry, D., Meyer, E., & Evans, A. (1996). Hearing in the mind's ear: A PET investigation of musical imagery and perception. Journal of Cognitive Neuroscience, 8, 29-46.

Zenner, H.-P. (1994). Hören - Physiologie, Biochemie, Zell- und Neurobiologie. Stuttgart: Georg Thieme Verlag.


5

Mental Images of Musical Scales: A Cross-cultural ERP Study

Christiane Neuhaus

Introduction

Are there different constraints on musical imagery in different cultures? Are there culture-specific schemata which govern the formation of images of musical sound in our minds? It would be tempting to guess that this is the case when we observe how most other expressions of music seem to be influenced by culture. However, as with so many other aspects of musical imagery, our information about the nature of such images is mostly indirect, i.e. we can access images of musical sound only through various accounts, experiments and observations. One such observational approach to musical imagery is through what can be deduced from research on the perception and cognition of musical sound by means of non-invasive neurometrical methods. These methods have led to the establishment of an interdisciplinary field of study which has developed rapidly during the past fifteen years, a field which is now generally known as cognitive neuromusicology.

Many of the results achieved so far in neuromusicology are based on research which uses event-related potentials (ERPs) as the method of study. ERPs are stimulus-related recordings of brain currents. They result from averaging data which have been recorded in a (typically multi-channel) EEG set-up. This recording of brain-electrical activity is performed synchronously with the presentation of a stimulus which is repeated several times without changing. During the subsequent process of averaging


78 MENTAL IMAGES OF MUSICAL SCALES

this data, all responses of equivalent epochs of registration recorded from several individuals are cleared of artifacts and combined so as to yield a single graph, plotting voltage against time. This resultant graph, extracted from raw EEG data, is called an event-related potential and can be regarded as a 'trace' of cerebral activity. In this approach, which can be called cognitive information processing, characteristic positive or negative deflections within the potential trace (so-called components) are generally taken as indicators, or correlates, of psychic processes. ERP patterns therefore reflect psychological states and mental functions (see, e.g., Fabiani et al., 1987; Altenmüller, 1993).

Experimental designs in ERP-measurements investigating the processing of acoustic stimuli often refer to situations where subjects are inactive and just have to listen to stimuli. Such a design was also employed in the experiment reported here. Tasks given to test persons, such as counting stimuli, detecting signals or solving some puzzle, are used primarily to make the subjects maintain a high degree of awareness. This is required, for instance, for recordings of the so-called P300-component. On the other hand, there are also several kinds of experiments during which subjects are actually required to behave unattentively, for instance in measurements of the so-called mismatch negativity (MMN).

Stimuli used in most published music-related ERP-studies fall into three classes (DC potentials are not considered here):

a. Single stimuli such as isolated harmonic intervals (Cohen et al., 1993) or chords (Taub et al., 1976)

b. Pairs of stimuli with one stimulus as a standard or reference and the other one as a deviant (usually presented at unequal rates), as well as permutations of two stimuli within a five-tone pitch pattern (cf. Klein et al., 1984; Tervaniemi et al., 1993; Cohen et al., 1993; Cohen & Erez, 1991; Tervaniemi et al., 1999)

c. Melodic and/or harmonic deviations which violate musical expectancies, following a preceding (usually short) musical context (e.g., Besson & Macar, 1987; Paller et al., 1992; Hantz et al., 1997; Janata, 1995)

With regard to the selection of subjects, most ERP-studies divide the subjects into classes, such as musicians versus non-musicians, as well as subjects who possess absolute pitch versus subjects who have relative pitch.

The approach chosen by the present author and reflected in this article (which is only a preliminary report of a more detailed study to be published later) differs from other ERP-studies published up to now in that event-related potentials have been used as a tool for investigating cross-cultural aspects of music psychology. The purpose of the present ERP-study has been to see what influence cultural factors may possibly have on the perception and the apperception of musical scales. Therefore, subjects of different cultural origin and background took part in the experiment. The basic design follows ideas expressed by Dalia Cohen (Cohen & Erez, 1991), who has suggested that the (ERP-)responses of listeners from different cultures to various scale structures should be examined more closely. Because the grouping of subjects as well as the selection of stimuli was determined by this intercultural point of view, Indian, Turkish and German musicians were chosen to take part in the ERP-experiment.


CHRISTIANE NEUHAUS 79

In line with this multi-cultural selection of test subjects, the stimulus material consisted of heptatonic scales which bear characteristics of the German (Western) musical culture on the one hand, and of the Thai (Siamese) and Turkish (Arab) musical cultures on the other. My idea was that scales, i.e. sequences of five or seven tones arranged according to pitch, are found in very many musical cultures and may thus be considered a 'universal' feature of music. (Ornamental tones on the microtonal level, e.g. gamakas, which are found in Turkish as well as North Indian or South Indian music, are not considered here.)

The ERP-waveform investigated in this experiment has been the so-called P300-component, which has played a major role in many psychophysiological studies. The P300 can be described by means of ERP-parameters: it (a) shows a positive polarity, and (b) has an amplitude maximum at the centro-parietal scalp locations, peaking about 300 msec after stimulus onset. In several experiments, including my own, the P300-component is evoked within a design usually described as the 'classic oddball paradigm' (see, e.g., Fabiani et al., 1987, Appendix A). The term oddball refers to sequences of stimuli which consist of two distinct classes (A and B), which are complementary in probability. Whereas members of one class of stimuli (A) are presented frequently, members of the other (B) are scarce. Thus, the total series consists of stimuli (A) interspersed with stimuli (B), and the two classes differ considerably in their respective physical features. In the experiments, (B) is used as a target to which subjects are asked to attend.
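As a minimal sketch, an oddball block of the kind used in this study (60 trials, 75% standards, 25% deviants; see the Method section) could be generated as follows. The randomization procedure of the actual experiment is not reported, so the shuffle here is purely illustrative:

```python
import random

def oddball_block(n_trials=60, p_deviant=0.25, seed=42):
    """Return a pseudo-random sequence of frequent standards ('A') with
    rare deviant targets ('B') interspersed, complementary in probability."""
    n_deviant = round(n_trials * p_deviant)
    trials = ['A'] * (n_trials - n_deviant) + ['B'] * n_deviant
    random.Random(seed).shuffle(trials)
    return trials

block = oddball_block()   # 45 standards and 15 deviants in random order
```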

Although the cognitive relevance of the P300-component has been debated (see, e.g., Verleger, 1988), there is a general consensus that this waveform, from the psychological point of view, should be considered an indicator of information processing. In particular, the functional significance of the P300-component has been explained according to the so-called 'context updating' model, which has also been used for the interpretation of some of the data presented here. Given that what we perceive engenders so-called 'mental representations' (or 'internal representations') stored in short-term memory (STM), 'context updating' is in general described as a process of revision and actualization of these mental images of the outer world. Unexpected input, or rapid changes in the structure (or order) of the stimuli presented, will necessitate an 'update' of the current contents of the STM (Donchin, 1981; Altenmüller, 1993).

In the present study, the hypothesis tested by means of the 'oddball paradigm' is as follows:

H1: Musical scales with various structures will be perceived differently by Indian, Turkish and German musicians. P300-reactions as recorded from members of these cultural groups are expected to differ accordingly.

With this H1, the H0 can be formulated simply as:

H0: In listening to musical scales with various structures, Indian, Turkish and German musicians will employ basically the same mechanisms of perception, so that P300-recordings will not yield results which are statistically significant.



Table 1. Interval sizes (in cents) of makam Hicaz and the Thai scale.

                I    II     III    IV     V      VI    VII     VIII
makam Hicaz     0    114    385    499    703    883   997     1201
Thai scale      0    171.4  342.8  514.2  685.6  857   1028.4  1199.8

Method

Subjects

Five German, five Turkish and five Indian musicians between the ages of 20 and 54, all male with normal hearing, participated in the ERP-experiment carried out at the audiotechnical laboratory of the Institute of Musicology at the University of Hamburg. The German group consisted mainly of young conductors who were studying at the Academy of Music in Hamburg, and none of them had listening experience of non-Western music. (Because the skill of absolute pitch (AP) and its neurophysiological correlates were not a matter of investigation here, subjects were not additionally divided into groups with and without AP.)

The group of Turkish musicians, most of whom were members of an amateur saz ensemble, had been living in Germany for at least nine years, and were thus liable to have become acculturated. They were accustomed to listening to Turkish folk music as well as international rock and pop music. The group of Indian subjects was chiefly made up of trained musicians who played the tabla, sarod and mridangam professionally and were staying in Germany for only a short time. They were in the habit of listening to Western classical music and international rock and pop music occasionally.

Stimuli

The stimulus material used in this experiment consisted of four heptatonic scales, each of them stored on Digital Audio Tape: the European major and harmonic minor scales, the Thai scale made of equal steps, and the makam Hicaz of Turkish art music. The stimulus material was presented binaurally through earphones; each scale had the starting pitch of 493.88 Hz (= scale-tone b), and the level of sound pressure was 75 dB(A), constant for each scale-tone. The Thai scale and makam Hicaz are characterized by the interval sizes (cent values summed up) shown in Table 1.

The tone material was generated on a programmable synthesizer (Roland JD-800), and the sustain segment of each sound envelope was shortened using the Sound Designer II software on a Macintosh computer. Every single tone was based on a pulse wave (Synpulse 2 of the JD-800 sound catalogue). The stimulus duration was 200 msec (attack time 15 msec, release time 30 msec), and the interstimulus interval (stimulus onset to stimulus onset) ran up to 540 msec. Each trial, consisting of eight scale-tones, started after a two-second rest.
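The cumulative cent values of Table 1, together with the common starting pitch of 493.88 Hz, fully determine the stimulus frequencies via f = f0 · 2^(cents/1200). A small sketch (the variable names are mine, not from the study):

```python
BASE_HZ = 493.88  # starting pitch of every scale (scale-tone b)

# Cumulative cent values from Table 1
MAKAM_HICAZ = [0, 114, 385, 499, 703, 883, 997, 1201]
THAI_SCALE = [0, 171.4, 342.8, 514.2, 685.6, 857, 1028.4, 1199.8]

def scale_frequencies(cents, base_hz=BASE_HZ):
    """Convert cumulative cent values into tone frequencies in Hz."""
    return [base_hz * 2.0 ** (c / 1200.0) for c in cents]

hicaz_hz = scale_frequencies(MAKAM_HICAZ)
thai_hz = scale_frequencies(THAI_SCALE)
```

The Thai scale comes out with seven (near-)identical frequency ratios of 2^(171.4/1200) ≈ 1.104 between neighboring tones, confirming its equal-step structure.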

From this tone material, five blocks of scales, combined into pairs, were put together for the experiment. Each block consisted of 60 scales. According to the classic oddball paradigm, they were divided into standard and deviant scales, occurring 75% and 25% of the time, respectively. The arrangements of blocks were made as shown in Table 2 on the facing page.



Table 2. Arrangement of scale blocks used in the experiment.

           standard scale   deviant scale
block 1    major            Thai
block 2    major            harmonic minor
block 3    makam Hicaz      Thai
block 4    major            makam Hicaz
block 5    Thai             major

Procedure

At the beginning of an experimental session, the subject was informed in detail about the EEG method in general, the procedure of measurement, and the task at hand, but did not get any information about scale structure or the cross-cultural issue in the experiment. The actual task consisted of listening to 60 trials of each block of scales with a high degree of awareness. An additional task had the aim of keeping the subject attentive, and this included (a) the silent counting of the deviant scales according to the oddball paradigm, as well as (b) the notation of the internal structure of each standard and deviant scale.

Apparatus and recordings

Recording and off-line processing of the EEG signals were carried out by means of a paperless multi-channel EEG apparatus (PL-EEG), which had special EP-software installed and was delivered by Walter Graphtek GmbH of Lübeck (a company trading in neurotechnological equipment). The time constant was 0.3 sec, the high cut-off frequency 140 Hz, the degree of amplification 20000 for all electrode inputs, and the sampling rate was 667 Hz for each channel. Bioelectrical signals of the brain (i.e. event-related signals superimposed by spontaneous activity) were recorded with three sintered Ag/AgCl scalp electrodes attached to the midline at scalp locations Fz, Cz and Pz, according to the international 10-20 system. In addition, a vertical electro-oculogram (VEOG) was registered for the control of eyelid and blinking artifacts.

Data analysis

Given that five equivalent trials are the minimum necessary for reliably collecting the data, artifact-free trials were averaged separately for each subject, block, scale-tone, electrode site, and kind of scale. Thus, the product of averaging was single ERP-potentials, which were baseline-corrected afterwards. Grand averages, i.e. curves summed up as a result of an averaging procedure performed for subject groups, have been generated externally at the Centre of Electronic Data Processing of the University of Hamburg by means of the SPSS software. These grand averages, drawn up for each sample of the subjects, were the basis for subsequent visual analysis. However, since the required equipment was not available, it was impossible to smooth the curves in question, i.e. to remove residual parts of the so-called background noise after the procedure of averaging. For statistical analysis, baseline-to-peak measurements have



additionally been carried out by hand. They yielded maximum amplitudes within two latency ranges of data input: range I between 270 msec and 430 msec, and range II between 430 msec and 540 msec after stimulus onset.
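The averaging pipeline described above (single-subject averages, baseline correction, grand averages) can be summarized schematically. Artifact rejection and real multi-channel data handling are omitted, and the function names are mine:

```python
def average_trials(trials):
    """Average equivalent, artifact-free epochs (equal-length lists of
    samples) into a single ERP curve, sample by sample."""
    n = len(trials)
    return [sum(trial[i] for trial in trials) / n
            for i in range(len(trials[0]))]

def baseline_correct(erp, n_baseline):
    """Subtract the mean of the first n_baseline (pre-stimulus) samples."""
    offset = sum(erp[:n_baseline]) / n_baseline
    return [v - offset for v in erp]

def grand_average(subject_erps):
    """Average baseline-corrected single-subject ERPs across a group."""
    return average_trials(subject_erps)
```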

Step 1 of the computation consisted of a one-factor analysis of variance called up with the SPSS command ONEWAY, where the dependent variable was M1T1P1 to M2T8P3 (M for 'mode', i.e. standard status or deviant status, T for 'scale-tone', and P for 'electrode placement'), and the independent variable was CULTURE (with three levels, i.e. the German, Turkish, and Indian groups). This analysis of variance design was developed to investigate the influence of the factor 'culture' during the perception of musical scales by means of a posteriori 'pair-wise' comparisons after having computed the omnibus F-test. (For paired comparisons, the Scheffé-test was applied; results were considered significant at p<.05 and p<.01.) Also, a t-test for paired samples was employed to investigate whether the arithmetic mean of a deviant scale-tone differs significantly from that of its corresponding standard scale-tone at one of the three scalp locations (alpha = .05, n = 15 subjects).

Step 2 of the computation was a four-factor repeated-measures analysis of variance. It was performed with the SPSS command MANOVA and was limited to the omnibus F-test; repeated-measures factors were 'mode' (standard status and deviant status), 'scale-tone', and 'electrode site', and the between-subjects factor was 'culture'. (The correctional procedure was done after Greenhouse & Geisser; computations were considered significant at the p<.05 level.)
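The analyses were run in SPSS, but the paired-samples t statistic reported in the results below (df = n - 1 = 14 for the 15 subjects) is easy to state explicitly. This is a generic sketch of the test, not the SPSS computation:

```python
import math

def paired_t(x, y):
    """Paired-samples t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    where d holds the pairwise differences and df = n - 1."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # sample variance
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1
```

Applied to 15 subjects' deviant vs. standard amplitudes, this yields the t(14) values quoted in the results.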

Some results

Unexpectedly, the graphs of many scale potentials revealed negative shifts visible at an onset latency of at least 430 msec. This negativity was present either as an additional reaction to the P300 wave or as the only endogenous deflection. Thus, in opposition to the original experimental concept, a second latency range between 430 msec and 540 msec was established. Within this time interval, all significant results were taken into account for the visual and statistical analyses.

Block 1 (major standard-scale versus Thai deviant-scale)

Scale-tone 2

For all subjects, Thai-tone number 2 evokes a 'trace' of cerebral activity, having a negative deflection with a local amplitude maximum in the second latency range (see Figs. 1 and 2). The height of the peak differs clearly from that in the corresponding graph of the major scale (mean amplitudes are listed in Table 3 on the next page, standard deviations in parentheses). German subjects also show a negativity for the major scale-tone at all three scalp sites (see Fig. 1). This difference in amplitude values has been verified by the t-test for paired samples, showing three significant results: Fz: t(14) = 2.49, p<.05; Cz: t(14) = 3.73, p<.01; Pz: t(14) = 3.88, p<.01.

The grand averages arranged according to cultural grouping can clearly be distinguished from each other with regard to negative shifts appearing with the major standard scale-tone (see Fig. 3 on page 85). These findings have been substantiated by means of the F-test, which yields significant values for electrode placements Fz and Cz:



Table 3. Mean amplitudes, scale-tone 2.

block 1,           Fz                        Cz                        Pz
tone 2       S       D      MD        S       D      MD        S        D      MD
German     -6.25   -6.34    0.09    -6.94   -9.35    2.41    -5.81   -10.24    4.43
           (2.55)  (2.6)            (2.48)  (4.37)           (2.62)  (4.88)
Turkish    -2.26   -5.62    3.36    -2.71   -7.44    4.73
           (2.94)  (3.27)           (2.06)  (3.76)
Indian     -0.78   -4.87    4.09    -1.64   -5.75    4.11    -2.45    -5.74    3.29
           (2.78)  (5.24)           (2.74)  (6.52)           (3.18)  (7.45)

Note. Mean amplitudes in μV (baseline-to-peak measures) and standard deviations (in parentheses). Block 1, tone 2 of the major standard-scale (S) versus tone 2 of the Thai deviant-scale (D), mean differences (MD), latency range II (430 msec - 540 msec), Fz, Cz, Pz; German, Turkish and Indian subjects.

Fz: F(2/12) = 5.26, p<.05, and Cz: F(2/12) = 6.57, p<.05. The Scheffé-test specifies this general result and indicates a significant difference between the amplitude values of Indian and German subjects (alpha = .05).

Scale-tone 7

German musicians responded to the seventh deviant Thai-tone with a well-marked P300-deflection at Fz, Cz and Pz scalp locations (see Fig. 4 on page 86). Indian subjects also produced a P300-component, though with smaller amplitudes at Cz (see Fig. 5 on page 86) and Pz, in this case superimposed by alpha activity.

For electrode placements Cz and Pz, results of the t-test were significant: Cz: t(14) = -2.35, p<.05; Pz: t(14) = -2.73, p<.05. The visual data can be confirmed by the plotting of grand averages according to cultural grouping of the subjects; however, the application of the F-test (with the Scheffé-test) does not reveal any significant results (see Fig. 6 on page 87).

Scale-tone 8

At all three scalp sites, German musicians showed a so-called 'long-lasting positivity' after the presentation of the eighth deviant Thai-tone (see Fig. 7 on page 88). As a variant, Turkish subjects developed a P300-peak at Fz and Cz scalp locations, how-

Table 4. Mean amplitudes, scale-tone 7.

block 1,           Fz                        Cz                        Pz
tone 7       S       D      MD        S       D      MD        S        D      MD
German      1.83    7.12   -5.29     1.51    6.21   -4.7      1.35     4.76   -3.41
           (1.85)  (4.92)           (1.14)  (6.8)            (1.12)   (5.61)
Turkish     3.5     3.77   -0.27     2.99    4.27   -1.28
           (2.93)  (2.97)           (2.74)  (3.09)
Indian      2.06    3.51   -1.45     3.32    6.04   -2.72     3.01     6.1    -3.09
           (2.75)  (4.13)           (2.22)  (3.57)           (3.74)   (1.22)

Note. Mean amplitudes in μV and standard deviations (in parentheses). Block 1, tone 7 of the major standard-scale (S) versus tone 7 of the Thai deviant-scale (D), mean differences (MD), latency range I (270 msec - 430 msec), Fz, Cz, Pz; German, Turkish and Indian subjects.




Figure 1. Grand average ERPs of German musicians (n = 5 subjects). Electrophysiologicalreactions to tone 2 of the major standard-scale (solid line) and tone 2 of the Thaideviant-scale (broken line), scalp site Cz. Though the diagram shows the wholerange of averaging (0 msec - 540 msec), only the course of potential in latencyrange II is analyzed (range: 430 msec - 540 msec, negativity is up). Horizontalaxis indicates time in msec, vertical axis indicates average Cz in JlV.

"\/\/"" \I \I \

/

./I11I,

-6

-4

.2

"/ \

°2

4

8

1°° 60,2 120,5 180,7. 241,0 301,2 361,4 421,7 481,9 542,230,1 90,4 150,6 210,8 271,1 331,3 391,6 451,8 5J2,0

6

Figure 2. Grand average ERPs of Indian musicians (n =5 subjects). Electrophysiologicalreactions to tone 2 of the major standard-scale (solid line) and tone 2 of the Thaideviant-scale (broken line), scalp site Cz. Though the diagram shows the wholerange of averaging (0 msec - 540 msec), only the course of potential in latencyrange II is analyzed (range: 430 msec - 540 msec, negativity is up). Horizontalaxis indicates time in msec, vertical axis indicates average Cz in JlV.

Page 98: Musical Imagery

CHRISTIANE NEUHAUS

.8..--------------------------,-6

85

.2

2

6

8

1o 60,2 120,5 180,7 241,0 301,2 361,4 421,7 481,9 542,230,1 90,4 150,6 210,8 271,1 331,3 391,6 451,8 512,0

Figure 3. Synoptic diagram of the electrophysiological reactions to tone 2 of the majorstandard-scale for all three cultural groups. Grand average ERPs of Gennan sub-jects (solid line), Turkish subjects (broken line) and Indian subjects (dotted line),recordings from scalp site Cz. Horizontal axis indicates time in msec, vertical axisindicates average Cz in JlV.

ever, this kind of component reaction was not developed by Indian subjects. The t-testof paired samples yields the following significant results: Fz: t( 14) = -2,86 p<O,05 Cz:t(14) = -3,38 p<O,01 und Pz: t(14) = -3,7 p<O,01.

Discussion

Each actually perceived tone of a musical scale should be considered as an incoming stimulus in relationship to short-term memory. The perceived tone will be compared with the temporarily stored mental representation (internal image) of the standard or deviant pitch information established in the STM by the running through of 45 standard and 15 deviant scales per block.

Table 5. Mean amplitudes, scale tone 8.

block 1         Fz                      Cz                      Pz
tone 8      S      D      MD        S      D      MD        S      D      MD
German    1.92   8.23   -6.31     1.37   7.25   -5.88     0.68   6.30   -5.62
         (4.46) (5.50)           (2.61) (4.56)           (2.71) (4.99)
Turkish   3.21   6.76   -3.55     3.09   6.42   -3.33
         (1.68) (1.21)           (1.60) (2.15)
Indian    1.44   3.91   -2.47
         (4.09) (5.54)

Note. Mean amplitudes in µV and standard deviations (in parentheses). Block 1, tone 8 major standard-scale (S) versus tone 8 Thai deviant-scale (D), mean differences (MD), latency range I (270 msec - 430 msec); scalp sites Fz, Cz, Pz; all cultural groups.

Figure 4. Grand average ERPs of German musicians (n = 5 subjects). Bioelectrical reactions to tone 7 of the major standard-scale (solid line) and tone 7 of the Thai deviant-scale (broken line), scalp site Fz. Presentation of the whole range of averaging (0 msec - 540 msec); only the course of potential in latency range I is discussed (range: 270 msec - 430 msec, negativity is up). Horizontal axis indicates time in msec, vertical axis indicates average Fz in µV.

Figure 5. Grand average ERPs of Indian musicians (n = 5 subjects). Bioelectrical reactions to tone 7 of the major standard-scale (solid line) and tone 7 of the Thai deviant-scale (broken line), scalp site Cz. Presentation of the whole range of averaging (0 msec - 540 msec); only the course of potential in latency range I is discussed (range: 270 msec - 430 msec, negativity is up). Horizontal axis indicates time in msec, vertical axis indicates average Cz in µV.

Figure 6. Synoptic diagram of the electrophysiological reactions to tone 7 of the Thai deviant-scale for all three cultural groups. Grand average ERPs from German subjects (solid line), Turkish subjects (broken line) and Indian subjects (dotted line), recordings from scalp site Cz. Horizontal axis indicates time in msec, vertical axis indicates average Cz in µV.

Block 1, scale-tone 2
The negative shifts at the Fz, Cz, and Pz scalp locations described above point to the effort required for cognitive processing after the sensorial perception of the second Thai scale-tone and the second major scale-tone, respectively. Thus, this deflection is labelled processing negativity. However, this should not be confused with the similar term 'processing negativity' used by the Finnish neurophysiologist Risto Näätänen, because experimental designs and latency ranges differ considerably here. It is not the negativity per se which is of interest in the present study, but rather the difference between the amplitudes of the Thai scale potentials and the major scale potentials, i.e. the relationship between the major potential trace and the Thai potential trace.

In my opinion, this difference between amplitudes reflects the cognitive process of interval judgement based on the concept of 'categorical perception', a concept which explains the processing of musical pitch in the group of trained musicians. In the wording of A. Schneider, this concept 'originally deals with discrimination processes and sensation (rather than perception and identification) in the first place, and has been put into a more generalized approach later' (Schneider, 1994, p. 227). The basic elements of this concept are the so-called 'perceptual categories'. Each of these perceptual categories consists of both a 'tonal centre' and a certain width, allowing some amount of variation in pitch (in German, Klassenbreite and Klangbreite, respectively). For German professional musicians, it may in general be assumed that there is a width in such perceptual categories of 50 cents above or below the centre pitch of a given semitone. For trained musicians from India or Turkey, there probably exist categories which are narrower, because in their musical practice these musicians make use of the octave interval divided into smaller units than the tempered semitone. (Basic division of the octave is 22 srutis in Indian music and 24 scale-steps in Turkish music.)

Figure 7. a) Grand average ERPs of German musicians (n = 5 subjects). Bioelectrical reactions to tone 8 of the major standard-scale (solid line) and tone 8 of the Thai deviant-scale (broken line), scalp site Cz. Plotting is of the whole range of averaging (0 msec - 540 msec), but only the course of potential in latency range I is analyzed (270 msec - 430 msec, negativity is up). b) For comparison, the ERPs of Indian subjects, tone 8, block 5. Bioelectrical reactions to tone 8 of the Thai standard-scale (solid line) and tone 8 of the major deviant-scale (broken line), scalp site Cz. Horizontal axis indicates time in msec, vertical axis indicates average Cz in µV.

Given that the difference between the amplitudes of the potential traces caused by Thai-tone 2 and major scale-tone 2 within latency range II, i.e. between 430 msec and 540 msec, has the function of an electrophysiological indicator concerning 'categorical perception', the a priori defined Klangbreiten can be verified by means of the single standard and deviation curves for all three cultures. Thus, Turkish and Indian subjects show clear distinctions between amplitudes even though the difference between the pitches of Thai-tone 2 and major scale-tone 2 was only 28.6 cents. This fact points to the processing of the Thai deviant stimulus and the major reference tone in two perceptual categories. For German subjects, the pitch difference of 28.6 cents is apparently too small to evoke sensations of two separate perceptual categories. Probably, German musicians process Thai-tone 2 and major scale-tone 2 in one category, according to the principle of the so-called Zurechthören ('adaptive listening'). Thus, the graph shows a considerably smaller difference between the amplitudes of the Thai-tone and major scale-tone potential traces.
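The cent arithmetic behind these category-width judgements can be sketched as follows. The 28.6-cent deviation and the ±50-cent semitone category come from the text; the helper names, the 440 Hz reference, and the alternative 20-cent width are illustrative assumptions:

```python
import math

def cents(f_ref, f):
    """Signed distance of f from f_ref in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f / f_ref)

def same_category(f_ref, f, width=50.0):
    """True if f falls inside the +/-width band around the category centre f_ref."""
    return abs(cents(f_ref, f)) <= width

# The Thai-tone-2 vs. major-tone-2 case: a 28.6-cent deviation.
f_ref = 440.0                                  # arbitrary reference frequency
f_dev = f_ref * 2 ** (28.6 / 1200.0)           # 28.6 cents higher

print(round(cents(f_ref, f_dev), 1))           # 28.6
print(same_category(f_ref, f_dev))             # True: inside a 50-cent-wide category
print(same_category(f_ref, f_dev, width=20.0)) # False: a narrower category splits them
```

With the wide (German) category both tones fall into one class; a hypothetical narrower (Indian/Turkish) category separates them, mirroring the interpretation given above.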

Scale-tone 7
In general, step 7 of the Western major scale is characterized by its function as a so-called leading tone. In music theory, the leading tone is described as a note sensible which has the tendency to resolve into an adjacent, harmonically important target tone a minor second above, i.e. the tonic. When a tone of the Thai scale was the neuronal input, this incoming stimulus did not correspond with the German musicians' expectancy of a leading tone, a tone which could be taken for granted because of the overlearned usage of the major-minor scales and the half-tone/whole-tone structure, respectively.

In this experiment, violations of the structure-dependent 'melodic anchoring' principle and the 'offence' against the leading-tone idea were indicated through the P300-component as a reaction to the seventh, deviant Thai-tone; in other words, the Thai-tone presentation means a non-fulfilment of the leading-tone expectancy. Moreover, this bioelectrical result points to a relational, step-wise way of listening instead of listening in a punctual manner. This means that during perception, attention is focused on the scale-step between tones 7 and 8 rather than on the actual seventh pitch point. Also, the P300-component of the German subjects may be understood as a correlate of the 'context updating model' (see above). According to this kind of explanation, Thai-tone number 7 would be the conspicuous stimulus information which requires actualization, correction, and revision of mental images. Actually, it causes a rupture of the culture-intrinsic, overlearned template of the major scale and engenders the P300-waveform as described above.

As for the Indian subjects, the small difference in amplitudes to be noticed between the Thai P300-component and the corresponding major scale potential indicates only a minimal context updating process. Possibly, the ERP-result here could be explained by the many variations in scale-structure which are found in Indian musical practice, hence the Indian subjects were accustomed to such variations. In fact, half of the so-called thāt scales outlined in Jairazbhoy (1971) show a semitone between scale-steps 7 and 8, equivalent to the leading tone of the major-minor system, whereas the other half of the thāt scales are characterized by a whole tone between scale-steps 7 and 8.


Scale-tone 8
In order to explain the P300-response of all participants to scale-tone 8, a hypothesis of the German neurophysiologist R. Verleger could be useful (Verleger, 1986, pp. 60-72). According to this hypothesis, the P300-component indicates the 'closure of a cognitive epoch', i.e., the P300-waveform is normally found at the end of a cognitive unit which is composed of several stimuli. Verleger describes this cognitive epoch as a variable segment, consisting of a modifiable sequence of standard stimuli and being confined by a deviant stimulus which evokes the component in question. In line with Verleger's idea, this cognitive epoch could also be interpreted as a 'Gestalt in time', i.e. in this case a 'shape' of a scale made up of eight scale-steps. Following this, the P300-component and the so-called 'long-lasting positivity' can be interpreted as electrophysiological correlates of pattern perception and pattern processing.

Thus, German subjects responded with a large P300-amplitude to the termination of the scale-template, whereas Turkish musicians responded with a smaller amplitude after the closure of the tone-sequence. The missing reaction of the Indian subjects (after a single, upward tone-series) can probably be traced back to the traditional practice of playing scales and melodic phrases of a raga in combinations of upward and downward movements (termed aroh and avroh, or arohana and avarohana).

General discussion
It seems that three general elements can be induced from the ERP-results in the cases of scale-tones 2, 7 and 8 (block 1):

• Processing negativity (cf. scale-tone number 2), understood from a psychophysiological perspective.
• Listening concepts and listening strategies of the participants, understood in relationship to the probable influence of culture (see H1 above), and:
• Event-related-potential measuring per se as a possibly useful tool for empirical investigations of cross-cultural issues, including constraints and schemata in musical imagery.

Processing negativity
Assuming a sequential (or serial) order of information processing, the unusual (and at first sight apparently illogical) chronological sequence of the underlying cognitive processes indicated by the ERP-activity is striking, especially in those traces which include the P300-waveform as well as processing negativity. This means that, first of all, the revision of current representations in STM (the context updating process) is indicated by the P300-component, and that afterwards, the concept of categorical perception is reflected by the processing negativity, something which actually should have been visible much earlier (onset latency about 430 msec).

In trying to explain this, we can refer to an idea put forward by A. Schneider concerning categorical perception. Schneider points to a division of this concept, one part being the process of pitch discrimination, and the other one being that of pitch identification: '... categorical perception ... is based on absolute judgements and identification of stimuli as well as of a discrimination function ...' (Schneider, 1994, p. 227). Following this idea, processing negativity could perhaps be regarded not as an indicator of pitch discrimination, but rather of pitch identification. Thus, processing negativity points to the apperceptive part of categorical perception rather than to the sensory-perceptive part. This could then be a tentative interpretation for explaining the late onset-latency of the processing-negativity shift.

Previously, M. Besson and F. Faïta have found similar late negativities. In their study concerning 'violation of musical expectancy', they suppose that

negative components that developed in the latency band of 200 to 600 ms ... may be similar to the N200 component, typically reflecting categorical mismatch ... However ... the negative components reported here were ... extended under passive listening conditions. Thus, the negative components cannot easily be equated with N200-like components ... Further experiments are clearly needed to specify the functional significance of the negativities reported here. (Besson & Faïta, 1995, p. 1293)

It is suggested that sensory-perceptive processes of discrimination (i.e. that which is included in the first part of Schneider's categorical perception concept) have their bioelectrical correlate in an ERP-waveform named Mismatch Negativity (MMN), peaking at about 200 msec. Regarding this MMN-component, M. Tervaniemi et al. state the following:

MMN is elicited by physically deviant auditory stimuli presented among repetitive 'standard' stimuli ... The MMN amplitude is known to correlate with pitch-discrimination performance ... it might be concluded that pitch discrimination and identification are based on different brain mechanisms. (Tervaniemi et al., 1993, p. 305)

Clearly, further ERP-experiments concerning this problem would be welcome.

Listening concepts and listening strategies
Subjects of two, or even all three, cultural groups in my study make use of similar, or nearly identical, strategies of listening and processing, e.g. 'categorical perception' (cf. scale-tone 2), 'anticipatory thinking' (cf. scale-tone 7) as well as 'pattern perception' (cf. scale-tone 8). This conclusion is based on the electrophysiological results of block 1. As a consequence, it is not possible to maintain the strict, culture-relativistic viewpoint which was formulated in hypothesis H1. Thus, if on the one hand strategies of listening can be understood as basic cognitive processes, i.e. as universal and beyond culture-specific criteria, the influence of culture, on the other hand, still has to be taken into account as a result of the daily practising of various culture-specific scale-schemata by trained musicians. During the laboratory experiments, that is, when perceiving the stimulus material, it is this previous practising with overlearned scale-patterns which causes modification of the (in principle) similar processing strategies.

In fact, Indian subjects did not develop a P300-component after the termination of the upward scale-schema, probably because of the missing avarohana movement they were accustomed to (cf. scale-tone 8). A further example is the underlying processing mechanisms of the Turkish and Indian subjects, who (contrary to the German subjects) probably assigned pitch sensations to perceptual categories of smaller width (cf. scale-tone 2).

As to the perception and processing of scale-tones 2, 7 and 8 (block 1), one could


speak of an interaction between culture-specific stimulus material and universal mechanisms and/or principles of perception. This interpretation is in line with ideas of the German psychologist L. Eckensberger, as in the following:

The assumption of culture-specific manifestations of theoretical concepts suggests a distinction of constructs into structures and processes ... The argument for the premise of transcultural validity of psychological constructs refers for the most part to processes, e.g. learning operations ... on the contrary, the formulation of its culture-specific manifestations relates much more to structures, i.e. contents. (Eckensberger, 1970, p. 15. Translation by Christiane Neuhaus.)

A similar understanding was advanced by D.L. Harwood:

A search for universals cannot usefully focus on musical content ... Rather, we should direct our attention to how music is made - how it is performed, heard, understood, and learned. The process of understanding and participating in the musical behaviour of one's community may be more universal than what is to be understood or performed. (Harwood, 1979, p. 51)

Event-related-potential measuring and cross-cultural issues
Electrophysiological measuring procedures and other non-invasive methods of brain research have the advantage of allowing the study of perception and information processing on the basis of brain activity. This means that collecting data is not dependent upon various verbal accounts by the subjects. It should follow, then, that ERP-methods per se are suited to yield culture-equivalent data, in line with the intentions of cross-cultural psychological research, research which has had the ambition of developing so-called 'culture fair tests' (Wassmann, 1988; Thomas, 1993). However, the methodological constraints of ERP-methods, such as that of presenting simple, often repetitive and synthetic stimuli in an artificial laboratory situation, could of course raise objections of a Eurocentric bias in this kind of investigation.

Setting methodological reservations aside and returning to the question posed at the beginning of this chapter concerning culture-specific constraints and schemata at work in musical imagery, the information presented above does seem to suggest that there are indeed culture-specific elements at work in the expectancies of musical sound. From the data in this study, it would not seem too far-fetched to assume that these expectancies would be similarly at work in imagining learned or entirely novel musical material in the mind. However, the experimental data reported here is limited to scales. It would indeed be very interesting to see what the methods used here (or similar methods) could reveal about brain activity in the imagery of other, and more complex, musical elements in different cultures, advancing our knowledge in the field of 'comparative musical imagery'.

References

Altenmüller, E. O. (1993). Psychophysiology and EEG. In E. Niedermeyer & F. L. da Silva (Eds.), Electroencephalography: Basic principles, clinical applications, and related fields (3rd ed., pp. 597-613). Baltimore: Williams & Wilkins.

Besson, M. & Faïta, F. (1995). An Event-Related Potential (ERP) study of musical expectancy: Comparison of musicians with nonmusicians. Journal of Experimental Psychology: Human Perception and Performance, 21, 1278-1296.

Besson, M. & Macar, F. (1987). An Event-Related Potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24, 14-25.

Cohen, D. & Erez, A. (1991). Event-Related-Potential measurements of cognitive components in response to pitch patterns. Music Perception, 8, 405-430.

Cohen, D., Granot, R., Pratt, H. & Barneah, A. (1993). Cognitive meanings of musical elements as disclosed by Event-Related-Potential (ERP) and verbal experiments. Music Perception, 11, 153-184.

Donchin, E. (1981). Surprise! ... Surprise? Psychophysiology, 18, 493-513.

Eckensberger, L. (1970). Methodenprobleme der kulturvergleichenden Psychologie. Schriften des sozialwissenschaftlichen Studienkreises für internationale Probleme (SSIP) e.V., Heft 8. Saarbrücken.

Fabiani, M., Gratton, G., Karis, D. & Donchin, E. (1987). Definition, identification, and reliability of measurement of the P300 component of the Event-Related brain Potential. In P. Ackles & J. R. Jennings (Eds.), Advances in Psychophysiology (Vol. 2, pp. 1-78). Greenwich: JAI Press.

Hantz, E. C., Kreilick, K. G., Kananen, W. & Swartz, K. P. (1997). Neural responses to melodic and harmonic closure: An Event-Related-Potential study. Music Perception, 15, 69-98.

Harwood, D. L. (1979). Contributions from psychology to musical universals. The World of Music, 21, 48-61.

Jairazbhoy, N. A. (1971). The rāgs of North Indian music: Their structure and evolution. London: Faber & Faber.

Janata, P. (1995). ERP measures assay the degree of expectancy violation of harmonic contexts in music. Journal of Cognitive Neuroscience, 7, 153-164.

Klein, M., Coles, M. G. H. & Donchin, E. (1984). People with absolute pitch process tones without producing a P300. Science, 223, 1306-1308.

Paller, K. A., McCarthy, G. & Wood, C. C. (1992). Event-Related Potentials elicited by deviant endings to melodies. Psychophysiology, 29, 202-206.

Schneider, A. (1994). Tone system, intonation, aesthetic experience: theoretical norms and empirical findings. Systematische Musikwissenschaft, 2, 221-254.

Taub, J. M., Tanguay, P. E., Doubleday, C. N., Clarkson, D. & Remington, R. (1976). Hemisphere and ear asymmetry in the Auditory Evoked Response to musical chord stimuli. Physiological Psychology, 4, 11-17.

Tervaniemi, M. (1999). Pre-attentive processing of musical information in the human brain. Journal of New Music Research, 28, 237-245.

Tervaniemi, M., Alho, K., Paavilainen, P., Sams, M. & Näätänen, R. (1993). Absolute pitch and Event-Related brain Potentials. Music Perception, 10, 305-316.

Thomas, A. (1993). Kulturvergleichende Psychologie: eine Einführung. Göttingen: Hogrefe.

Verleger, R. (1986). Die P3-Komponente im EEG: Literaturübersicht, Diskussion von Hypothesen, Untersuchung ihres Zusammenhangs mit langsamen Potentialen. München: Profil.

Verleger, R. (1988). Event-related potentials and cognition: A critique of the context updating hypothesis and an alternative interpretation of P3. Behavioral and Brain Sciences, 11, 343-427.

Wassmann, J. (1988). Methodische Probleme kulturvergleichender Untersuchungen im Rahmen von Piagets Theorie der kognitiven Entwicklung - aus der Sicht eines Ethnologen. Zeitschrift für Ethnologie, 113, 21-66.


6

Complex Inharmonic Sounds, Perceptual Ambiguity, and Musical Imagery

Albrecht Schneider

Introduction

The notions of image and imagery are intricate regarding the many meanings which have been assigned to them in disciplines such as philosophy and psychology (cf. Schneider & Godøy, this volume). Also, there are writings on literature and the arts in general which address 'imagery' in many ways (e.g., Leppert, 1996). In a more restricted perspective developed in areas of cognitive psychology, imagery is still defined quite loosely as the experience of 'seeing with the mind's eye' or 'hearing with the mind's ear' (cf. Kosslyn, 1990, p. 177). It is believed, however, that the principal elusiveness of imagery can be overcome by more objective methods, most of all by computational approaches. Such have been taken in, among other fields, psychoacoustics and hearing research, where various models have been developed which produce, at one stage of processing signals such as speech or music, a plot of features extracted from the signal which is labelled an auditory image (see, e.g., Patterson et al., 1992, 1995). Regarding the peripheral transduction process in hearing, acoustical stimuli which cause a certain activity pattern in the auditory nerve are thereby transformed into auditory images which maintain basic signal features such as, for example, periodicity. This of course has implications for the perception of pitch. There are a number of hypotheses as to how such images could finally be 'represented' at the level of the auditory midbrain as well as the cortices by means of tonotopically organized maps or other spatiotemporal mechanisms.


Whereas the concept of the auditory image is closely related to computer models of peripheral processing of stimuli, the concept of auditory imagery involves central brain activity, because it addresses, among other issues, the retrieval of knowledge stored in some form in long-term memory.

In a definition which emphasizes earlier ideas, it is said that 'auditory imagery is the introspective persistence of an auditory experience, including one constructed from components drawn from long-term memory, in the absence of direct sensory instigation of that experience' (Intons-Peterson, 1990, p. 46). The author thereby wishes to exclude auditory aftereffects. Though imagery as a special mode of 'inner' experience has often been separated from actual perception, it was already argued by Stumpf (1890, 1907, 1918) that there is no strict boundary between perception and imagery, and that both involve basically the same mental functions. This point of view has been reinforced more recently with respect to vision. In particular, it has been expressed 'that imagery plays an essential role in normal perception' (Kosslyn, 1994, p. 145).

It is everyday experience that perception of music in many instances also involves musical imagery. To comprehend a given musical structure, listeners may find it helpful to form an image which, in this context, can be understood as a simplified model of an actual stimulus perceived. The image, which is a mental construct, 'represents' the stimulus, yet with some abstraction. As to the specifics of this representation, there are different opinions, since the debates about 'propositional' versus 'pictorial' and 'declarative' versus 'procedural', which have been issued in psychology and linguistics, are also found in cognitive music theory (see Seifert, 1993, pp. 305 ff.). As to the propositional vs. pictorial controversy, one could argue that fully comprehending a given musical structure indeed calls for a propositional treatment, that is, an abstract description which relates the structure perceived to a musical syntax in order to judge whether the piece as heard is 'correct' with respect to certain rules. This doesn't preclude some form of pictorial, and in particular geometrical, representation which can be useful, to be sure, especially in listening situations where complex musical structures need to be processed more or less in 'real time'. It has been stressed that human understanding to a considerable extent is based on cognitive strategies which can be called geometrization (relating sensory input to forms and shapes which have been learned earlier; see Godøy [1997] and the sections Sound structure and apperception and Relation of perception and apperception to musical imagery below).

For example, when listening to an ordered sequence of major and minor chords such as found in measures 31-35 of Bach's Fantasia in g-minor for organ (BWV 542), one may be inclined to mentally 'map' each chord on a geometrical structure such as the two- or even three-dimensional tone net (cf. Riemann, 1914/15; Vogel, 1993). Thereby, the stimulus as heard would be transformed into a sequence of different geometrical shapes which represent minor and major chords. This transformation would support categorization, because a number of chords will be found to have the same shape when projected on the tone net, and progression from one chord to the next can then be understood in terms of geometrical operations (shifts, transformations). Thereby, recognition of invariant features is facilitated, as is comprehension of the compositional idea Bach had in this instance, namely exploring a wide range of keys along the cycle of fifths plus adding chromaticism in the bass line as well as by 'exchange operations' (from minor to major chords: the sequence in fact is d-D-c-C-f-F-b♭-B♭-e♭-E♭-a♭-A♭-d♭-D♭, the last one being the boundary because this key was difficult to realize on a keyboard in a non-equal-temperament tuning).
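The 'exchange operation' from a minor to a major chord on the same root can be illustrated as a pitch-class operation: the third is raised by one semitone while root and fifth stay fixed. The encoding and helper names below are illustrative, not taken from the chapter:

```python
# Pitch classes modulo 12, C = 0 (an arbitrary but common convention).
NOTE = {'C': 0, 'Db': 1, 'D': 2, 'Eb': 3, 'E': 4, 'F': 5,
        'Gb': 6, 'G': 7, 'Ab': 8, 'A': 9, 'Bb': 10, 'B': 11}

def triad(root, quality):
    """Pitch classes of a triad: root, third (3 or 4 semitones up), fifth."""
    r = NOTE[root]
    third = 3 if quality == 'minor' else 4
    return {r % 12, (r + third) % 12, (r + 7) % 12}

def exchange(root):
    """Minor -> major exchange on one root: only the third changes."""
    return triad(root, 'minor') ^ triad(root, 'major')  # symmetric difference

# d minor -> D major: the only tones exchanged are F and F#
# (pitch classes 5 and 6); root D and fifth A are invariant.
print(sorted(exchange('D')))   # [5, 6]
```

On the tone net, this single-semitone substitution is exactly why parallel minor and major chords occupy nearly identical shapes, which supports the categorization argument made above.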

It has been underpinned that relating musical structures as heard to the tone net has indeed cognitive relevance since, for example, progressions from one chord to the next will call for distance estimates on the side of the listener which include judgements on the relative consonance or dissonance found in chord progressions (cf. Zannos, 1995). I will return to the concepts of geometrization and 'musical shapes' below with respect to the perception of music realized with complex inharmonic bell sounds.

Complex inharmonic sounds

Complex inharmonic sounds are such where the frequencies fn [Hz] of the spectral components n (n = 1, 2, 3, ...) are not integer multiples of the lowest component. Thus, higher spectral components of such sounds cannot be considered as partials that bear frequency relations of 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 ... : n to the fundamental frequency (n = 1), as is the case in sounds with harmonic spectra found in instruments belonging to the classes of aerophones (wind instruments, including the singing voice) and chordophones (string instruments).
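This harmonicity criterion can be sketched as a simple test: is every component a (near) integer multiple of the lowest one? The tolerance value is an arbitrary choice for illustration, and the inharmonic example uses the approximate mode ratios of an ideal bar free at both ends (1 : 2.76 : 5.40, the last figure appearing in the text below):

```python
def is_harmonic(freqs, tol=0.01):
    """True if every component is an integer multiple of freqs[0],
    within a relative tolerance tol."""
    f1 = freqs[0]
    for f in freqs:
        ratio = f / f1
        if abs(ratio - round(ratio)) > tol * ratio:
            return False
    return True

harmonic_partials = [100, 200, 300, 400, 500, 600]   # string/wind-like spectrum
free_free_bar = [100, 276, 540]                      # xylophone-bar-like modes

print(is_harmonic(harmonic_partials))   # True
print(is_harmonic(free_free_bar))       # False: 2.76 and 5.40 are not integers
```

The same test applied to the slightly detuned series discussed later in this chapter (96, 202, 299, ... Hz) also returns False, even though that series deviates from harmonicity only mildly.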

Different from aerophones and chordophones, instruments belonging to the class of idiophones, such as xylophones, gongs and gong chimes, bells and carillons, typically produce complex inharmonic sounds which can be described by their time function as well as by spectral composition and temporal changes in energy distribution (for details and examples, see Fletcher & Rossing [1991] and Schneider [1997b, 1998, 1999]). In particular, parameters such as spectral density, spectral centroid, frequency shifts of spectral components as well as other modulation effects are of interest. Inharmonicity in sounds produced by idiophones basically results from the fact that in solids (e.g., bars, slabs, plates, shells) frequency dispersion occurs. The phase velocity cB for bending waves, which are the most important type of waves in solids as regards actual sound production, can roughly be given as cB ~ √f. Thus, wave propagation for each mode of vibration is dependent on its frequency. For a bar free at both ends, as found in xylophones, this means that the phase velocity cB for the third mode is already 2.32 times that of the first vibrational mode, since the frequency relation of these modes is 1 : 5.4.
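The dispersion relation quoted here (cB ~ √f) directly yields the 2.32 figure: the velocity ratio of two bending-wave modes is the square root of their frequency ratio. A minimal check:

```python
import math

def phase_velocity_ratio(f_ratio):
    """c_B(mode n) / c_B(mode 1) for bending waves, given f_n / f_1,
    using the approximation c_B ~ sqrt(f)."""
    return math.sqrt(f_ratio)

# Free-free bar (xylophone type): third mode at 5.4 times the first.
print(round(phase_velocity_ratio(5.4), 2))   # 2.32, as stated in the text
```

The same relation explains why higher modes of a bar drift progressively further from a harmonic series: equal velocity for all frequencies, the condition for harmonic mode ratios on a string, does not hold in a dispersive solid.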

Inharmonic sounds can of course also be produced by synthesizers (especially those which employ frequency modulation for sound generation) as well as by means of computers, which allow the generation and control of arbitrary signals. The present article will be concerned only with sounds produced by instruments such as bells. It should be noted, however, that with regard to perception and cognition of complex inharmonic sounds, there is not much difference between 'natural' and 'artificial' sources.

Page 111: Musical Imagery

98 COMPLEX INHARMONIC SOUND

Perceptual ambiguity of pitch and timbre in inharmonic sounds

Inharmonic complex sounds as a rule cause ambiguity of the pitch, or rather pitches, perceived. One reason why multiple pitches can be elicited is that in many such sounds the spacing of spectral components is quite irregular, and the components lowest in frequency need by no means be the strongest in amplitude. Quite to the contrary, many sounds recorded from metallophones such as the Javanese gender and saron, as well as from gongs and bells, contain components with strong amplitudes that are located in a frequency range one or several octaves above the lowest component which, to be sure, in inharmonic sounds cannot simply be taken as the 'fundamental' of the spectrum. In spectra of sounds comprising inharmonic components, the frequency of the period of the complex waveshape need not match that of the lowest component. By contrast, in sounds with harmonic spectra, the frequency of the period of the common waveshape equals that of the fundamental. The periodicity pitch of the waveshape thereby reinforces that of the fundamental, so that no ambiguity of pitch will be experienced. This holds true even if the amplitudes of all harmonics, which normally correspond roughly to a scheme like A = 1/n (A = amplitude, n = number of harmonic), are reversed in such a way that amplitudes increase with the number of harmonics (see Schneider, 1997a, pp. 128-130). If the inharmonicity of all components is slight, that is, frequencies deviate from a harmonic series of, for example, 100, 200, 300, 400, 500, 600 Hz to yield frequencies of 96, 202, 299, 408, 498, 615 Hz instead (with amplitudes of all components remaining unchanged), it is still possible to fit a single pure tone to the components that, as a 'pseudo-fundamental', represents the average period length of a set of inharmonic components (for examples, see Schneider [1997b, pp. 145-148; 2000]). This technique is similar to that chosen by Terhardt (1979) in an approach called subharmonic matching.
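As an illustrative sketch (not Terhardt's actual algorithm), a crude 'pseudo-fundamental' for the slightly inharmonic series quoted above can be estimated by averaging the fundamentals implied by the individual components:

```python
# Frequencies (Hz) of the mildly inharmonic complex quoted in the text,
# derived from a 100 Hz harmonic series (100, 200, ..., 600 Hz).
freqs = [96, 202, 299, 408, 498, 615]

# Treat component n as the n-th 'pseudo-harmonic' and average the
# implied fundamentals f_n / n (a crude stand-in for a least-squares fit).
implied = [f / n for n, f in enumerate(freqs, start=1)]
pseudo_f0 = sum(implied) / len(implied)

print(round(pseudo_f0, 1))  # close to the 100 Hz origin of the series
```

The estimate lands very close to 100 Hz, the harmonic series from which the detuned components were derived, which is why a single matched pure tone can still represent such a quasi-periodic sound.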

In such cases where a 'pseudo-fundamental' can be matched to a complex waveshape of inharmonic components so that both share the same zero-crossings (or nearly so; it is implied that both have the same period length T [ms]), one can classify the time function y(t) of the sound in question as quasi-periodic. It is in general also possible for subjects to judge the pitch of such inharmonic complex sounds, and to assign a musical note to it. Even moderately inharmonic sounds will already undergo amplitude modulation, and may bring about a sensation of beats or roughness, depending on the degree of inharmonicity as well as on the actual frequency and amplitude values.

If inharmonicity is increased steadily, the periodicity of the complex waveshape decreases. Consequently, matching a 'pseudo-fundamental' to such waveshapes becomes more and more difficult. In experiments carried out at our institute with such sounds (generated by additive Fourier synthesis with the aid of Mathematica) as well as with many similar stimuli (sounds of real instruments as well as computer-generated inharmonic complex sounds, see Schneider [1997b, pp. 444ff]), musically trained subjects were asked to match a pure tone produced by a sine-wave generator against the complex sound in such a way as to yield the 'same pitch'. In all such experiments, various subjects selected different spectral components of each sound as a target, and sometimes chose a virtual pitch resulting from an inharmonic complex as the pitch to which the pure tone was matched. This indicates that perception of inharmonic complex sounds is ambiguous, and typically gives rise to the perception of


ALBRECHT SCHNEIDER 99

several pitches rather than one pitch.

Such effects have been observed, among others, in experiments with sounds of carillon bells serving as stimuli. These experiments were carried out with basically the same experimental design as we have used (see Fleischer, 1996). As a general result it can be said that especially musically trained subjects find it difficult to match a single pure tone to an inharmonic complex because subjects who master analytic listening are, as a rule, capable of singling out more than one component of a complex inharmonic sound. These components are in fact heard as 'spectral pitches' (see Terhardt, 1979, 1998), so that for each sound several such pitches can be distinguished and, depending on stimulus characteristics, experimental conditions, and the expertise of subjects, also identified. Even though subjects try to focus on but one spectral pitch when trying to match a single pure tone to one of the components of the stimulus, many subjects regard several spectral pitches as possible alternatives that lack a clear hierarchy. In such instances perceptual ambiguity increases with the number of spectral (and sometimes also virtual) pitches.

Inharmonic complex sounds pose another perceptual problem, namely that the separation of 'pitch' and 'timbre' is much more difficult (if possible at all) than in sounds with harmonic spectra. In these, to be sure, pitch is determined by both the frequency of the fundamental and the (typically identical) frequency with which the complex waveshape repeats per second. Timbre in such sounds, as Helmholtz (1863) had already shown, can mainly be attributed to the number and relative amplitude of partials. If transients as well as changes in spectral energy distribution due to (amplitude and/or frequency) modulation are neglected, the 'colour' of the steady-state portion of a given sound is thus largely dependent on the spectral envelope (see Stumpf, 1926; Slawson, 1985).

In complex inharmonic sounds, no such distinction between pitch and timbre is possible. As has been noted above, in complex inharmonic sounds the lowest spectral component very often isn't the strongest in amplitude, so that other components are more prominent, and are clearly audible as such. Since these components are not partials, they appear in many instances as disparate parts that 'resist' integration into one coherent perceptual entity (see also Cohen, 1984). Single inharmonic sounds can thereby give rise to an impression of tension and discordance, whereas in harmonic sounds partials 'fuse' into one coherent percept (see Stumpf, 1926; Schneider, 1997a).

Further, interaction of inharmonic components often causes amplitude modulation as well as regular or irregular shifts in spectral energy distribution, so that sounds, in particular of metallophones (idiophones made of bronze or brass bars, plates or shells), can fluctuate considerably with respect to spectral density and the location of the spectral centroid. These modulation processes are sensed by subjects and contribute to the overall impression of flux and dissonance. Finally, since spectral components with strong amplitudes may be found in a relatively high frequency band (for examples, see Schneider [1997b, in press]), the spectral centroid is much higher than the lowest component of a given sound. This, as might be expected, can have two effects: First, the higher the spectral centroid frequency, the more the sensation of brightness grows. Second, if the lowest spectral component and the centroid frequency are far removed from each other, it is likely that subjects will hear two or more different pitches, one possibly attributed to the component lowest in frequency, the other to a strong component in a high register or to a group of higher components acting together. These components may in addition produce a virtual low pitch identical with, or different from, a low spectral pitch (see Terhardt, 1998). For example, in bells the so-called 'strike note' (German Schlagton, Dutch slagtoon) is such a virtual pitch (Schouten & 't Hart, 1965) that very often is located by listeners at the same frequency as that of a spectral pitch (namely, the prime; see Bruhn [1980]). In sum, the occurrence of several possible (spectral plus virtual) pitches, as well as spectral inharmonicity, spectral density and modulation processes, make it difficult even for trained listeners to separate 'timbre' from pitch in inharmonic sounds. With respect to perception, complex inharmonic sounds that contain many components work in a fashion similar to clusters known from 20th-century art music. Such textures are meant to evoke sensations of continua that have a lower and upper boundary, rather than to give rise to distinct pitches which can be regarded as 'points' on a single dimension (as to such models of pitch, see Schneider [1997b, pp. 404ff]).

Inharmonic sounds and pitch perception

In studies on the principles of pitch perception, typically two approaches are discussed, one being directed to frequency analysis, the other to periodicity detection (see de Boer, 1976; Moore, 1993; Houtsma, 1995). It is held that perception of pitch, on the one hand, depends on components of a complex (harmonic or inharmonic) sound that are resolved in the cochlea, which is regarded as a spectrum analyzer. The cochlea can thereby be modeled as a chain of (overlapping) bandpass filters operating in parallel. Views differ, however, as far as the actual shape, bandwidth and other parameters of such auditory filters are concerned (see Moore, 1993; Delgutte, 1996; Hartmann, 1998, ch. 10).

Since higher harmonics cannot be resolved independently, and because of auditory phenomena such as the pitch of the 'missing fundamental' resulting from a group of consecutive higher harmonics (e.g., the 7th, 8th, and 9th partial of a harmonic complex), the periodicity contained in the compound waveshape of a signal has been introduced as a means of explanation to account for, among other things, perception of the missing fundamental (Schouten, 1940). The principle of periodicity being relevant for pitch perception had already been pioneered by A. Seebeck in 1843 (for an overview of research, see Hesse [1972]; de Boer [1976]).

From all that is known today from physiological and psychoacoustic experiments as well as studies in computer simulation, it can be concluded that pitch perception in speech and music is closely connected to the detection of periodicity inherent in signals as well as in neural spike trains and interspike intervals (for details, see Hesse [1972]; de Boer [1976]; Houtsma [1995]; Cariani & Delgutte [1996]; Delgutte [1996]; Schneider [1997a, 1997b, 2000]). It has of course to be noted that periodicity detection by no means contradicts frequency analysis. First, the frequency f (Hz) and period T (sec) of a musical or speech signal relate to each other as f = 1/T, and T = 1/f, respectively. Second, harmonics that have been resolved in the cochlear filter bank will then also result in periodicities found in spike trains and interspike intervals (ISI; see Sachs & Young, 1979; Young & Sachs, 1979; Greenberg & Rhode, 1987; Javel et al.,


1987). There are several models to account for how spectral information is encoded and represented in single fibers as well as across bundles of such fibers of the auditory nerve. Also, the stages and mechanisms at which the integration of neurally coded information into coherent pitch percepts is achieved along the auditory pathway are still a matter of debate (see Keidel, 1992; Delgutte, 1996; Cariani & Delgutte, 1996).

It is clear, however, that the periodicities of all the resolved harmonics as encoded in spike trains and ISIs have to be integrated at higher stages in order to yield a single, unitary and unambiguous pitch. In addition, neural information pertaining to groups of consecutive higher harmonics (which have passed the auditory filters as groups) has to be processed. To derive pitch in the 'time domain', several mechanisms, including all-order, multi-channel ('pooled') autocorrelation of the interspike interval information available in the auditory nerve (see Cariani & Delgutte, 1996) and coincidence detection networks based on clock cells in the midbrain (Keidel et al., 1975), have been proposed. The validity of autocorrelation theory to explain temporal integration has been questioned recently on the basis of experimental findings (Kaernbach & Demany, 1998). It should be noted, though, that the stimuli in these experiments hardly bore any relationship to musically relevant stimuli.

The difficulties with inharmonic sounds are at least twofold: First, even if inharmonic components have been resolved in the cochlear filter bank, the periodicities in spike trains and ISIs corresponding to these components typically will not integrate into a common periodicity. Whereas in pooled autocorrelation functions (Cariani & Delgutte, 1996) for harmonic sounds comprising resolved and unresolved partials the predominant interval has its maximum at the fundamental period T = 1/f1 (this is, to be sure, also the period of the fundamental frequency f1), such a clear maximum cannot be expected for inharmonic sounds. Depending on the degree of inharmonicity and the frequency relations between inharmonic components, the autocorrelation function may yield several peaks indicative of 'pseudo-periods', or may, in cases of extreme inharmonicity, degenerate altogether (see Schneider, 2000). A factor possibly 'confusing' autocorrelation further could be inharmonic sounds containing much spectral energy in frequency regions where components cannot be resolved individually, that is, above ca. 4-5 kHz. Such sounds from musical instruments do indeed exist (see Schneider 1997b, 1999).
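The loss of a clear autocorrelation maximum can be illustrated with a minimal sketch. The signals, sample rate, and lag search range below are illustrative choices of mine, not the stimuli or analysis parameters of the experiments cited.

```python
import math

SR = 8000   # sample rate (Hz); illustrative choice
N = 1600    # 0.2 s of signal

def synth(freqs):
    """Additive synthesis of equal-amplitude cosine components."""
    return [sum(math.cos(2 * math.pi * f * i / SR) for f in freqs)
            for i in range(N)]

def acf_peak(x, lo=40, hi=120):
    """Lag (in samples) of the autocorrelation maximum within [lo, hi)."""
    best_lag, best_r = lo, float('-inf')
    for lag in range(lo, hi):
        r = sum(x[i] * x[i + lag] for i in range(N - hi))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# A harmonic complex on a 100 Hz series has a common period of 10 ms,
# i.e. 80 samples at 8 kHz; a bar-like inharmonic complex has none.
harmonic   = synth([100, 200, 300, 400, 500])
inharmonic = synth([100, 276, 540, 893, 1320])

print(acf_peak(harmonic))    # 80 samples = 10 ms, the fundamental period
print(acf_peak(inharmonic))  # a different lag: at best a 'pseudo-period'
```

For the harmonic complex the autocorrelation peak sits exactly at the fundamental period; for the inharmonic complex the strongest lag no longer corresponds to any component's period, mirroring the 'pseudo-periods' described above.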

Thus, whereas integration of the neural response to harmonic sounds of different spectral composition again yields what can be viewed as a simple periodic function, for inharmonic sounds the neural response patterns are more complex, so that in many cases it will not be easy, or even possible, to determine common periodicities inherent in the spike trains and ISIs corresponding to inharmonic components.

Second, with respect to auditory models that propose harmonic templates or 'sieves' for pitch estimation, with increasing inharmonicity of components it will be more and more difficult to derive a clear 'central spectrum' (Srulovicz & Goldstein, 1983) from the overall neural response pattern of inharmonic sounds. Since the 'central spectrum' again implies that a fundamental (frequency and/or period) can be derived from the distribution of (more or less harmonic) components, this approach is likely to fail in the case of strongly inharmonic sounds. In this respect, inharmonic sounds or even textures of such sounds are much different from the stimuli that have been used in many experiments in psychoacoustics, namely those where one partial in a harmonic


complex has been mistuned, and thus 'stands out' as a single component against an otherwise harmonic sound.

Auditory images

The concept of auditory images was basically developed in connection with models of the cochlea, and in particular regarding travelling waves that reach a maximum after a certain time, and at a certain place, depending mainly on the frequency and amplitude of the stimulus (Békésy, 1960; Keidel et al., 1975; Keidel, 1992; Zenner, 1994). Such travelling waves cause periodic vibration of membranes and related cell structures in the inner ear, which in turn are transformed into electrical pulse trains by way of the hair-cell transduction process. In more recent computer models of peripheral auditory processing based in the time domain (e.g., Meddis & Hewitt, 1991; Van Immerseel & Martens, 1991; Patterson et al., 1992, 1995), the spectrum-analyzer capabilities of the cochlea are often simulated by a filter bank comprising n bandpass filters. In several models, the filters implemented have a characteristic known as gammatone, which is defined by its impulse response (see Hartmann, 1998, ch. 10). The output of the filter bank that performs spectral analysis can be taken as equivalent to the distributed motion of the basilar membrane, which is then transformed into a neural activity pattern (NAP) by a hair-cell model. Finally, the NAP undergoes strobed temporal integration, which yields an auditory image (see Patterson et al., 1995).
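For illustration, a minimal sketch of the gammatone impulse response that defines such filters. A fourth-order filter is assumed here, and the centre frequency and bandwidth parameter are arbitrary example values, not those of any model cited in the text.

```python
import math

def gammatone_ir(fc, b, order=4, sr=16000, dur=0.025):
    """Sampled gammatone impulse response:
    g(t) = t**(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    out = []
    for i in range(int(sr * dur)):
        t = i / sr
        out.append(t ** (order - 1)
                   * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * fc * t))
    return out

ir = gammatone_ir(fc=1000.0, b=125.0)

# The envelope t**(order-1) * exp(-2*pi*b*t) has its maximum where its
# derivative vanishes, i.e. at t = (order - 1) / (2 * pi * b).
t_peak = (4 - 1) / (2 * math.pi * 125.0)
print(round(t_peak * 1000, 2))  # envelope maximum in milliseconds
```

The gamma-shaped envelope (rising, then exponentially decaying) is what gives the filter its name; the cosine term places the filter's response at the centre frequency fc.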

The point of interest here is simply that if the input to the model is a steady-state harmonic signal such as a vowel in speech, the time/frequency representation of the signal as output of the gammatone filter bank is very regular, and clearly reflects the periodicities inherent in the time function and spectral composition of the vowel. Consequently, the NAP corresponding to the motion of the basilar membrane again shows the same regularity and periodicity (see Patterson et al., 1995, figs. 2 and 3). As can be expected, the auditory image derived from the neural activity pattern once more reflects the periodicities, which are even reinforced in periodic signals because of temporal integration. Thus, auditory images of sounds with a periodic time function (so that F(t) = F(t + τ), τ = period length, τ = 1/f, f = frequency), and which are free from modulation (or nearly so), are stable in shape, and have features which clearly indicate the periodicities of the complex harmonic sound. Since the NAP contains the same periodicities as does the acoustic signal (e.g., a vowel), these can also be reconstructed from spike trains recorded from fibers of the auditory nerve (see Sachs & Young, 1979; Young & Sachs, 1979).

With inharmonic sounds, which lack such clear-cut periodicities (or can even be aperiodic, depending on inharmonicity and modulation, see Schneider [1997b, 1998]), no simple response patterns can be expected. Time/frequency response patterns of a gammatone filter bank analyzing a complex inharmonic signal are notably irregular (as is the signal itself, see below). From the filter bank analysis, taken to represent basilar membrane motion, it can be inferred that detection of periodicity, and thereby prediction of pitches, will be very difficult if not impossible.

Figure 1. J.S. Bach, Ich bin's, ich sollte büßen... (BWV 244, four-voice transcript for piano).

For the purpose of illustration, a sound example that we also used in the experiments to be described below was fed into a special gammatone filter bank developed by Solbach et al. (1998), whereby the gammatone filter realizes a wavelet transform (see also Solbach, 1998). This filter bank, which uses, among other features, logarithmic spacing of kernel functions, comes closer to actual auditory function than many other models.

The sound example consists of polyphonic music (the choral Ich bin's, ich sollte büßen from J.S. Bach's Matthäuspassion [BWV 244] in a four-voice transcript for piano, see Fig. 1) that was played from a sampling keyboard (EMAX II stereo) by means of a MIDI sequencer. The basic sound sample, recorded on DAT at 48 kHz/16 bit, was from bell no. 2 of the famous carillon of Brugge in Flanders, manufactured by Joris DuMery in the 1740s.

The spectrum for 16384 sample points of the sound as measured from the onset is given in Figure 2 on the following page, where the frequencies and relative amplitudes of the first nine major spectral components are also listed. It is evident that component no. 8, which is the third octave above the hum note of roughly 100 Hz, is considerably 'out of tune' at 843.5 Hz, yet of all the spectral components it is the strongest in amplitude, and thus well audible. This and other inharmonic components are suited to impede pitch perception of this sound. Since components 1, 2, and 4 are almost perfect partials of one harmonic series, one might think these components alone would suffice to bring about a stable pitch at about G (G2). In fact, the sound gives rise to at least two pitches, one being a low pitch equivalent to the hum note, the other located a fourth above this component.

The gammatone filter bank was set to 6 octaves, each having 12 filters, so that 72 bandpass filters were used for the analysis, with an upper frequency limit of 3 kHz, which seems appropriate with regard to the sound material in question (cf. Fig. 3 on page 105). The relative bandwidth (df/f0, f0 = center frequency) of the filters was 0.02, and the filter order k = 3.
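A sketch of how such a logarithmically spaced set of centre frequencies might be laid out. The exact alignment of the channels to the 3 kHz limit is an assumption of mine, not taken from the Solbach et al. implementation.

```python
# Log-spaced centre frequencies for a 72-channel bandpass filter bank:
# 12 filters per octave over 6 octaves, topmost filter at 3 kHz.
F_TOP = 3000.0
N_FILTERS = 72

centres = [F_TOP * 2 ** (-(N_FILTERS - 1 - k) / 12) for k in range(N_FILTERS)]
bandwidths = [0.02 * fc for fc in centres]   # relative bandwidth df/f0 = 0.02

print(round(centres[0], 1), round(centres[-1], 1))  # lowest and highest centre frequency
```

With constant relative bandwidth, each filter's absolute bandwidth in Hz grows in proportion to its centre frequency, which is what makes the analysis wavelet-like rather than a constant-bandwidth Fourier analysis.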

As can be seen from Figure 3 on page 105, which plots the output of the gammatone filter bank (ordinate: log frequency, abscissa: time; the relative amount of spectral energy found in the filter bands is indicated by greyscales) for about the first 5.2 seconds



Figure 3. J.S. Bach, Ich bin's, ich sollte büßen..., ca. 5.3 seconds of the bell version processed by a complex-valued gammatone filter bank (upper frequency limit: 3 kHz, 72 filters, relative bandwidth: 0.02, filter order: 3): spectral energy distribution.

of the signal, the music, consisting of four voices, each played with a complex inharmonic sound (namely, that of bell no. 2), leads to quite dense layers of spectral energy. As a matter of fact, many of the 72 filters are activated by the multitude of signal components. Further, regarding temporal changes, there is a lot of fluctuation in spectral energy distribution, resulting mainly from amplitude modulation. These changes are well documented in Figure 4, which plots the energy contained in each of the 72 filters against time.

Both Figures 3 and 4 can be regarded as 'first stage' auditory images in that they show features of interacting complex sounds with respect to spectral analysis as performed by the inner ear. The output of the filter bank will, of course, then have to be processed further by a hair-cell module, etc.

Experiment

The music example described above (see Fig. 1 on page 103) was used as stimulus in an experiment carried out in October 1998 at our institute. 25 students in musicology, all 'freshmen' and 'freshwomen' respectively, volunteered as subjects. The task was to judge two versions of the choral Ich bin's, ich sollte büßen on four scales,



Figure 4. J.S. Bach, Ich bin's, ich sollte büßen..., ca. 5.3 seconds of the bell version processed by a complex-valued gammatone filter bank (upper frequency limit: 3 kHz, 72 filters, relative bandwidth: 0.02, filter order: 3): readout of the energy contained in the 72 filters.

two of which relate closely to psychoacoustical qualities and perceptual dimensions, whereas the other two are more cognitively relevant. The scales, running from 1 (min) to 7 (max), are (1) sensory consonance, (2) correctness of chord structure including voice-leading (in German: harmonische Stimmigkeit), (3) auditory roughness, (4) aesthetic pleasantness. Apparently, scales (1, 3) refer to perceptual characteristics, scale (2) to musical as well as cognitive features, while (4) is aesthetic and cognitive in direction, and calls for a final overall judgement. The variables thus interrelate:

max. <---------------------- roughness ----------------------> min.

min. <--------------------- consonance ----------------------> max.

min. <---- correctness of chord structure and voice leading ----> max.

min. <------------------ aesthetic pleasantness -------------------> max.


Table 1. Basic statistics for the experiment.

                            (a) Sampled bell         (b) Synth. organ stop
Scale/Dimension             aM     SD     Median     aM     SD     Median
consonance                  2.88   1.27   3          5.6    1      6
chord/voice                 3.92   2      4          6.04   1.06   6
roughness                   5.16   1.37   6          2.64   1.18   2
aesthetic pleasantness      3.32   2.32   3          4.84   1.37   5

Note. aM = arithmetic mean; SD = standard deviation.

Of course, if roughness is maximum, the sensation of consonance is minimum (et vice versa; see Schneider [1997a]). Also, it can be hypothesized that an increase in sensory roughness will affect the two more cognitive and aesthetic variables, so that judgement values for these should decrease with roughness going up in the stimuli.

The design of this experiment had been explored in a similar experiment (Schneider & Müllensiefen, 1999), except that different stimuli had been used. For the present experiment, the two versions of the choral were identical in musical structure, and were played at the same tempo (MM = 80 b.p.m.) and at the same level of loudness. The versions differed, however, with respect to the timbre and harmonicity of the sounds employed, in that version (a) uses the sampled sound of bell no. 2 from the Brugge carillon, and version (b) a synthesized pipe organ stop comprising partials 1, 2, 8, 16, available on the TX 81 Z synthesizer. Since the Brugge carillon is said to be tuned in meantone temperament, the same tuning was chosen also for the synthesized organ stop.

The basic statistics for the experiment are summed up in Table 1.

Homogeneity of variance was checked by means of the Bartlett test. The data for consonance and roughness were homogeneous (χ² not significant), whereas this was not the case for correctness of chord structure and voice leading (harmonische Stimmigkeit), for which the observed χ² of 8.88 exceeds the critical value 6.63 (p = 0.01). Also, the data in the two files pertaining to aesthetic pleasantness are not homogeneous in variance, with χ² = 6.187 exceeding the critical value 3.84 (p = 0.05), though not 6.63 (p = 0.01).

A t-test of paired data is thus possible for the dimensions of consonance and roughness, yet should be taken with care for the other two dimensions.
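As a sketch of the paired t statistic referred to here, computed from first principles; the ratings below are hypothetical illustrations only, since the subjects' raw data are not reproduced in the chapter.

```python
import math

def paired_t(x, y):
    """t statistic for paired samples: t = mean(d) / (s_d / sqrt(n)),
    with d the pairwise differences and s_d their sample standard deviation."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((v - mean_d) ** 2 for v in d) / (n - 1)  # unbiased variance
    return mean_d / math.sqrt(var_d / n)

# Hypothetical ratings on a 1-7 scale, for illustration only.
organ = [6, 5, 6, 7, 5, 6, 6, 5]
bell  = [3, 2, 4, 3, 2, 3, 4, 2]

t_stat = paired_t(organ, bell)
print(round(t_stat, 2))  # compare against the critical t value for df = n - 1
```

The observed t is then compared against the critical value for the chosen significance level and n - 1 degrees of freedom, as done in Table 2.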

As an alternative to the t-tests of paired data, a multiple mean test known to be robust (the Scheffé test; see Bortz, 1985, pp. 339-343) was executed; the following contrasts D i/j (of pairs of mean values corresponding to each other according to the four scales and two versions = eight data files) are significant at the p = 0.01 level: 1/5 (consonance), 2/6 (correctness of chord structure and voice leading), 3/7 (roughness). Some contrasts D i/j are significant also at the p = 0.001 level, namely 1/5 (consonance) and

Table 2. t-Test (paired data).

Data files 1/5 (consonance):           t(3.745, p = 0.001) = 9.148, highly significant
Data files 2/6 (chord/voice):          t(3.475, p = 0.001) = 5.229, highly significant
Data files 3/7 (roughness):            t(3.475, p = 0.001) = 6.647, highly significant
Data files 4/8 (aesth. pleasantness):  t(2.797, p = 0.01) = 3.058, very significant


3/7 (roughness). The contrast for correctness of chord structure and voice leading is almost at the p = 0.001 level, with D 2/6 (2.173, p = 0.001) = 2.12.

Finally, a MANOVA (one factor, four dependent variables) was carried out, whereby the two versions of the choral served as the two steps of the factor, and the four dependent variables are those described above. The H0 for the MANOVA is that the vectors of the mean values for the steps of the factor have the same direction, whereas H1 claims that they differ significantly in this respect (see Bortz, 1985, pp. 713ff). First, Wilks' Λ has to be calculated, which in this case gives Λ = 0.335. Then the F-test is performed, which yields F(3.961, p = 0.001) = 6.19. Consequently, differences between the two versions of the music stimulus (two steps of the factor), as judged by 25 subjects on four scales/dimensions, are highly significant in terms of statistics.

Sound structure and apperception

The two versions of Ich bin's, ich sollte büßen are judged quite differently, whereby the cause for these differences must rest in the sounds with which the two versions of the otherwise identical music stimulus were played to the 25 subjects. Evidently, the music played with the bell sound leads to a sensation of considerable roughness because, with four voices in most of the chords, the same inharmonic sound sample is activated in four different frequency regions. Due to the sampling technique, the spectrum remains constant, yet with all the frequencies of components being shifted up and down 'in parallel'. Modulation frequencies of components are thereby dependent on pitch. In effect, a dense layer of spectral components, most of which are inharmonic in relation to each other, is produced. This yields low mean and median values in judgements of consonance, and high values for roughness. As can be expected, these values are opposite for the synthesized organ stop, where consonance is high and roughness low.

For the organ sound, however, the respective values for harmonische Stimmigkeit (correctness of chord structure) are also much higher, even though the notes that were played are identical, and so are the lowest frequencies in the spectra of both sounds for each note. Thus, for each note played, the two sounds differ only in spectral composition, and not in fundamental frequencies.

This is of importance because it has been argued in works on the psychology of music and music theory that listening to music basically means following the movement of notes as defined by their respective fundamental frequencies in tonal space (Albersheim, 1979; Cogan & Escot, 1976). Thereby, the motion of voices is indeed imagined by listeners as quasi-continuous lines in the two-dimensional pitch/time space. Since each note in principle is discrete in both frequency (fundamental plus partials) and time, it is the listener who, especially in multi-part music, makes decisions as to which note belongs to which line (or voice), and also has expectations as to the direction of motions with regard to 'good continuation', etc. These issues, which had been addressed already by Stumpf (1890, pp. 29ff, 411ff) and by scholars in the formative years of Gestalt psychology, have been of interest more recently in research directed towards pitch streaming (Bregman, 1990; Grossberg, 1999).

With respect to the experiment reported above, motion of voices, as well as the


Figure 5. J.S. Bach, Ich bin's, ich sollte büßen..., trajectories of the fundamental frequencies of the four voices (measures 1-7 from Fig. 1 on page 103) plotted on semilogarithmic graph paper.

intervals resulting from simultaneous notes, is shown in Figure 5, which follows a technique employed by, among others, Cogan & Escot (1976). Fundamental frequencies of the notes for measures 1-7 are plotted on semilogarithmic graph paper, whereby the ordinate (log) is frequency (Hz), and the abscissa (lin) is time (expressed as 1 crotchet = 10 mm). As has been pointed out by Rossing (1982, pp. 134-135), musical staff notation approximates this type of semilogarithmic representation fairly well, which in turn is a basic model of tonal space where pitch (scale) is organized logarithmically to account for octave equivalence, and time linearly to allow for 'quasi-continuity' of tone sequences.
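The logarithmic pitch axis described here can be illustrated with the standard equal-temperament conversion from note number to fundamental frequency; the choice of notes below is arbitrary.

```python
import math

A4 = 440.0  # reference frequency (Hz)

def midi_to_hz(m):
    """Equal-tempered fundamental frequency of MIDI note m (A4 = note 69)."""
    return A4 * 2 ** ((m - 69) / 12)

# On a logarithmic frequency axis every octave spans the same distance,
# which is why staff notation approximates a semilog pitch/time plot.
octaves = [midi_to_hz(m) for m in (45, 57, 69, 81)]        # A2, A3, A4, A5
log_positions = [math.log2(f / octaves[0]) for f in octaves]

print([round(f, 1) for f in octaves])  # 110.0, 220.0, 440.0, 880.0
print(log_positions)                   # equally spaced: 0.0, 1.0, 2.0, 3.0
```

Each doubling of frequency maps to one unit on the log axis, so melodic intervals of equal musical size occupy equal vertical distances in a plot such as Figure 5, regardless of register.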

If we assume that listeners indeed try to follow the motion of voices as well as to identify the sequence of chords played, the results obtained for the variable harmonische Stimmigkeit can be interpreted in such a way that subjects listening to the 'bell


version' of the choral apparently have difficulties in recognizing the contour of each voice as well as the chords resulting from the four voices. A melodic contour, to be sure, is one-dimensional with respect to pitch in that complex sounds are taken so as to represent the pitch of each sound by its fundamental frequency only. Thus, the note a' when played on a piano is referred to by the frequency value of 440 Hz even though the actual sound also contains a considerable number of partials. Since, in a piano sound, these are harmonics of the fundamental frequency, such a 'reductionist' perspective is possible because the pitch perceived of such a sound is mainly determined by the fundamental as well as by the period of the complex waveshape, which is of the same frequency (see the sections Complex inharmonic sounds and Perceptual ambiguity of pitch and timbre in inharmonic sounds above). Consequently, complex harmonic sounds can in many cases be reduced, with respect to pitch, to a one-dimensional representation without dramatic loss in information and meaning, as is the case in conventional staff notation (Fig. 1), in graphs of melodic contours and intervals (Fig. 5), as well as in similar graphs derived from, for example, pitch-tracker hardware or algorithms known as melograph, music mapper, etc. Neither conventional notation nor melography would make sense to us if melodic lines, each made up of a sequence of notes that is also understood as a sequence of pitches, could not be represented by the respective (fundamental) frequency values.

This view is corroborated by experimental findings according to which listenersconceive of melodic lines as one-dimensional contours, and are able to draw graphsrepresenting melodic contours from memory (for an overview of much of the relevantliterature, see Watkins & Dyson [1985]). Since even musically untrained subjectsare quite good at drawing such contours, it follows that the one-dimensional 'melodycurve' or 'melograph' representation can be considered as a standard coding formatfor melodic processes.

With inharmonic complex sounds, the task is much more difficult. To apprehend, 'in real time', for example, a melodic line played on a rather inharmonic carillon or gamelan instrument implies that a one-dimensional cognitive representation has to be derived from multi-dimensional stimuli (see also Schneider, 1997b, chs. III, IV). This necessitates that subjects listening to actual sound patterns must be able to extract, first of all, a single pitch from each inharmonic complex, which can be expected to yield a likewise complex auditory image. Pitch estimation includes abstraction from those sound characteristics which cause ambiguity (e.g., warble or other modulation effects). This abstraction process has in fact to be executed for each note and/or sound, whereby the analytical burden as a rule increases with the number of complex inharmonic sounds played at a time. One has to note, in this respect, that processes of feature extraction and Gestalt recognition require that the signal is present for a certain time, and should not undergo major changes within this span. There are several temporal integration levels relevant for perception that can be related to neural processing and psychophysical observations (for details, see Schneider [1997b, pp. 100-105]; Näätänen & Winkler [1999]). For example, stable sensations of pitch and timbre require that the stimulus is present for 100-200 ms, loudness integration needs ca. 200 ms, etc. These time intervals, however, concern sensation and sensory-based perception in the first place, whereas apprehension (or apperception, a concept developed in cognitive psychology from Kant to Wundt, cf. Schneider [1997a, pp. 117f; 1997b, pp. 71ff, pp. 150ff,


pp. 431f]) seems to be even slower. As William Stern (1897) had already shown, subjects typically need about 0.5 s to fully apprehend a complex stimulus. Thus apperception, which implies that subjects are consciously aware of what they perceive, as well as of the act of perceiving, is relatively slow. Regarding music played with complex inharmonic sounds, the problem then obviously is that the rate of change in the sensory input (cf. Figs. 3 and 4) is high even in tonal music such as was used in our experiments. The integration necessary to apprehend the melodic and harmonic structure seems to interfere with the constant flux, especially in the spectral energy distribution of the stimuli.
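The arithmetic behind this bottleneck can be made concrete. The tempo below is an assumed illustration (the chapter does not specify one); only the integration spans come from the text:

```python
# Integration spans cited above (approximate values from the text):
PITCH_TIMBRE_MS = 200   # stable pitch/timbre sensation: ~100-200 ms
LOUDNESS_MS = 200       # loudness integration: ~200 ms
APPERCEPTION_MS = 500   # full apprehension of a complex stimulus (Stern)

def crotchet_ms(bpm):
    # Duration of one crotchet (quarter note) at a given tempo.
    return 60_000 / bpm

# At an assumed tempo of crotchet = 80, a chord lasting one crotchet
# (750 ms) leaves room for sensory integration, but only 1.5 of
# Stern's ~0.5 s apperception spans; faster movement pushes conscious
# apprehension below that limit.
chord_ms = crotchet_ms(80)
```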

Compared to the 'bell version', recognition of melodic lines, intervals and chords is much easier with the synthesized organ sound, which is harmonic and in which the fundamental is the strongest component of the spectrum. Thus, pitch estimation poses no problem here, either for single complex tones or for the several notes contained in the chords. Further, the subjects who took part in the experiment (1st-semester students in musicology) can be assumed to be more familiar with this type of harmonic sound than with carillons.

With the bell version, recognition of the actual motion of a voice, marked most of all by the intervals between the fundamental frequencies of successive notes, is hampered by three factors. The first is the spectral envelope of the bell sound (Fig. 2), which clearly shows that the lowest component is not the strongest in amplitude. The second is the inharmonicity of the spectrum, which produces the more roughness the more notes are played simultaneously. In fact, a dense layer of inharmonic components results from the four-voice texture of the chorale (Fig. 1), whereby this layer 'masks' the four components which, as 'pseudo-fundamentals' of the respective sounds, refer to the musical notes that constitute the musical structure. The third obstacle simply is that, with the given bell sound, the main pitch perceived need not be equivalent to the frequency of the lowest spectral component (which might be labelled the 'pseudo-fundamental'). Accordingly, four inharmonic sounds played at a time (as is the case with most of the chords) need by no means result in the perception of four (and only four) pitches equivalent to the fundamental frequencies of the respective notes (compare Figs. 3 and 5).
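The partial structure behind this ambiguity can be sketched numerically. The ratios below are textbook values for a classical minor-third bell (cf. Rossing; Fletcher & Rossing), not measurements of the bells used in the experiment:

```python
# Approximate lower partials of a traditional minor-third bell,
# as ratios relative to the prime (textbook values, assumed here
# for illustration): hum, prime, tierce, quint, nominal.
BELL_PARTIALS = {
    "hum": 0.5,
    "prime": 1.0,
    "tierce (minor third)": 1.2,
    "quint": 1.5,
    "nominal": 2.0,
}

def sounding_frequencies(prime_hz):
    # Frequencies of the lower partials for a bell whose prime is
    # tuned to prime_hz.
    return {name: ratio * prime_hz for name, ratio in BELL_PARTIALS.items()}

# The tierce is what clashes with major-key harmony: relative to the
# prime it is a minor third (6:5 = 1.2), not the major third (5:4 = 1.25)
# implied by a harmonic spectrum.
f = sounding_frequencies(440.0)
```

Note also that the strongest components need not include the lowest one, so the perceived main pitch can detach from the 'pseudo-fundamental', as discussed above.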

Consequently, even musically trained subjects have difficulties in following the motion of the four voices as well as in apprehending the chord structure. The ambiguity that results when tonal music in a major key (in this case, A-flat major) is played with bells famous for their minor third (see Fig. 2), plus the inharmonic components contained in the spectrum, obviously affects not only perceptual qualities such as consonance and roughness, but also the apperception of the musical structure of a given piece. It would require special training even for music students to adapt to polyphonic tonal music in a major key played on a carillon with minor-third bells. Even then, some ambiguity resulting from the sound characteristics of the bells will remain. Experiments have shown that only carillon experts appreciate the tension that results from music in a major key played with minor-third bells, whereas musically trained subjects unfamiliar with bells and carillons accept such stimuli only if musical and acoustical characteristics can be matched, that is, if music in a minor key is played with minor-third bells, and music in a major key is realized with major-third bells (Houtsma & Tholen, 1987).

Of course, perceptual qualities (consonance vs. roughness) and the musical apprehension necessary to judge harmonische Stimmigkeit also affect, finally, aesthetic pleasantness, which is judged considerably lower for the bell version than for the organ version of the chorale. By way of explanation, one should think of this variable (aesthetic pleasantness) as depending on both sensory qualities and cognitive evaluation of stimuli, since a change from harmonic to inharmonic sound also means an increase in the complexity of the musical stimulus. Experimental studies have found that the relation of aesthetic pleasantness to stimulus complexity can be described by a function known as the 'inverted U'. That is, with increasing complexity of, for example, music, aesthetic pleasantness in listening first grows up to a maximum. If complexity is increased further, even educated subjects are less and less able to apprehend structural and other features of the stimulus 'in real time', so that they feel 'outstripped'. Such disappointment typically goes along with lower judgements of aesthetic pleasantness for stimuli that are too complex with respect to the number of perceptual and cognitive dimensions involved.
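A toy model can make the shape of this relation explicit. The function below is purely illustrative (nothing of the sort is fitted in the chapter); it merely encodes a peak at moderate complexity:

```python
def pleasantness(complexity, optimum=0.5):
    # Toy inverted-U ('Wundt curve') model: hedonic value peaks at a
    # moderate level of stimulus complexity and falls off on either
    # side. Hypothetical: 'complexity' is an abstract value in [0, 1]
    # and 'optimum' an assumed preferred level, not empirical data.
    return 1.0 - ((complexity - optimum) / optimum) ** 2

# A moderately complex stimulus is preferred both to a very simple
# and to a very complex one.
simple, moderate, complex_ = (pleasantness(c) for c in (0.1, 0.5, 0.9))
```

On this reading, replacing the harmonic organ sound with inharmonic bells shifts the chorale rightward along the complexity axis, past the listener's optimum.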

Relation of perception and apperception to musical imagery

As Carl Stumpf (1883, 1890, 1918) pointed out long ago, perception based on sensory data and imagination of stimuli operate on the same psychic principles and functions, so that we may experience a difference in intensity, yet not one in quality, whether we listen to, or merely imagine, a piece of music. It seems likely that perception and imagery are interrelated, since apprehension (apperception) in music can hardly be achieved without recall of musical knowledge, including schemata, features characteristic of a certain genre and/or style, and even scores once studied or used in performance, whenever we try to analyze and understand a piece of music in actual listening.

A simple strategy for understanding something that is new and (completely or partially) unknown is to detect features of the stimulus and to find out whether these can be matched to a learned schema or not. One basic psychic function in this case (as in many other instances) is thus comparison; another is recollection; yet another is expectancy, which, to be sure, means expectancy guided by knowledge already acquired. Thus, perception cannot be separated from previous experiences, and from the knowledge derived therefrom. From this it can be concluded that, in actual listening, knowledge stored in memory is activated to support the formation of images which 'represent' musical objects. For example, in order to apprehend the musical structure, listeners of the bell version of the chorale (see above) face the task of deriving, from the processing of sensory input, a simplified model which retains the melodic contours as well as the harmonic relations of the sequence of chords. Consequently, listeners will try to form an image which is a mental construct yet based on actual sensations. The plot shown in Figure 5 is one candidate for how a musical image derived from a perceptual (auditory) image could look. In this respect, the notion of image contains elements both of an Abbild (copy) and of a mental scheme, that is, a Vorstellung. Auditory imagery (mental concepts of what has been heard) as well as musical imagery (concepts of musical structures either actually heard or just imagined) are likely to make use of geometric shapes such as melodic contours, since geometrization and representation


of objects in a Euclidean space evidently supports both perception and apperception. If melodies have distinctive 'corners' and 'slopes', it is sufficient to store these features in memory and to retrieve the information in actual listening when it comes to identifying stimuli which, to be sure, are thus re-cognized (the reflective nature of cognition has been stressed from John Locke to Stumpf and Husserl).

In a more general perspective involving epistemological issues, geometrization seems to be unavoidable, given physical constraints (for a more detailed discussion, see Shepard [1989]; as to constructs of 'tonal space', see Shepard [1982]; Schneider [1992]). Geometrization can thus be interpreted as a universal cognitive strategy for dealing with especially complex (multi-dimensional and/or unknown) stimuli.

It has been argued, on the basis of experimental findings, 'that auditory imagery is based on sensory (auditory) processing but not on motor (articulatory) processing' (Crowder, 1993, p. 135). The findings on the perception of music realized with complex inharmonic sounds are well in line with this view. In order to comprehend the structure of the Bach chorale as realized with bells, motor approaches (such as have been advanced in speech studies, psychology and music psychology) would not be of much help. Rather, what is needed is a decomposition of the stimulus, which is complex in its dimensionality, so as to obtain an image that, as a model, reduces dimensionality yet retains all the salient features necessary to comprehend the piece with respect to compositional structure and musical syntax. The condition which perhaps most necessitates reduction in dimensionality is time: apprehending music while listening requires subjects to concentrate on 'essentials' and to neglect accessory features.

Regarding the experiment reported above (section Experiment), it seems that subjects had difficulties both with the dense texture resulting from chords of inharmonic bell sounds and with recognizing the work that was played. Subjects who might even have known the chorale Ich bin's, ich sollte büßen had certainly never listened to a version played on a carillon, which, consequently, was difficult to recognize. Further, this version is much more complex in sound structure than the version played with the synthesized organ stop, so that the bell version cannot be as easily decomposed 'in real time' along dimensions which, ideally, should be separable and independent of each other (see Garner, 1974; Ashby & Townsend, 1986). In bells and other inharmonic sounds, however, pitch and timbre are so closely interrelated that the apprehension of syntactic structures can be difficult, especially in polyphonic music played on carillons, or in styles of similar complexity such as the change-ringing of a peal of bells still practiced in many English communities.

Acknowledgements

The bell sounds used for experiments in this and other papers were recorded in Brugge (Flanders), with the help of Aimé Lombaert and Marc Leman, in an ongoing research project. For discussions of issues in signal processing relevant to this paper, I would like to thank Ludger Solbach and Rolf Wöhrmann. Finally, thanks to Daniel Müllensiefen, who assisted in the statistical analysis of the experimental data.


References


Albersheim, G. (1979). Zur Musikpsychologie (2nd ed.). Wilhelmshaven: Heinrichshofen.
Ashby, F.G., & Townsend, J.T. (1986). Varieties of perceptual independence. Psychological Review, 93, 154-179.
Békésy, G. v. (1960). Experiments in hearing. New York: McGraw-Hill.
Boer, E. de (1976). On the 'residue' and auditory pitch perception. In W.D. Keidel & W.D. Neff (Eds.), Handbook of sensory physiology (Vol. V/3, pp. 479-583). Berlin and New York: Springer.
Bortz, J. (1985). Lehrbuch der Statistik (2nd ed.). Berlin: Springer.
Bregman, A. (1990). Auditory scene analysis. Cambridge, Mass. and London: The MIT Press.
Bruhn, G. (1980). Über die Hörbarkeit von Glockenschlagtönen. Regensburg: Bosse.
Cariani, P.A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I: Pitch and pitch salience; II: Pitch shift, pitch ambiguity, phase invariance, pitch circularity, rate pitch, and the dominance region of pitch. Journal of Neurophysiology, 76, 1698-1716, 1717-1734.
Cogan, R., & Escot, P. (1976). Sonic design: The nature of sound and music. Englewood Cliffs, N.J.: Prentice Hall.
Cohen, E.A. (1984). Some effects of inharmonic partials on interval perception. Music Perception, 1, 323-349.
Crowder, R.G. (1993). Auditory memory. In S. McAdams & E. Bigand (Eds.), Thinking in sound: The cognitive psychology of human audition (pp. 113-145). Oxford: Clarendon Press.
Delgutte, B. (1996). Physiological models for basic auditory percepts. In H.L. Hawkins, T.A. McMullen, A.N. Popper, & R.R. Fay (Eds.), Auditory computation (pp. 157-220). New York: Springer.
Fleischer, H. (1996). Schwingung und Tonhöhe von Glockenspielglocken. Forschungs- und Seminarberichte aus dem Gebiet Technische Mechanik und Flächentragwerke 1/96. München: Universität der Bundeswehr.
Fletcher, N.H., & Rossing, Th.D. (1991). The physics of musical instruments. New York: Springer.
Garner, W.R. (1974). The processing of information and structure. Potomac, Md.: Erlbaum.
Godøy, R.I. (1997). Knowledge in music theory by shapes of musical objects and sound-producing actions. In M. Leman (Ed.), Music, Gestalt, and computing (pp. 89-102). Berlin etc.: Springer.
Greenberg, S., & Rhode, W.S. (1987). Periodicity coding in cochlear nerve and ventral cochlear nucleus. In W.A. Yost & C.S. Watson (Eds.), Auditory processing of complex sounds (pp. 225-236). Hillsdale, N.J. and London: L. Erlbaum.
Grossberg, S. (1999). Pitch-based streaming in auditory perception. In N. Griffith & P. Todd (Eds.), Musical networks: Parallel distributed perception and performance (pp. 117-140). Cambridge, Mass. and London: The MIT Press.
Hartmann, W.M. (1998). Signals, sound, and sensation. New York etc.: Springer.
Helmholtz, H. v. (1863). Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik. Braunschweig: Vieweg.
Hesse, H.-P. (1972). Die Wahrnehmung von Tonhöhe und Klangfarbe als Probleme der Hörtheorie. Köln: A. Volk.
Houtsma, A.J.M. (1995). Pitch perception. In B.C.J. Moore (Ed.), Hearing (pp. 267-295). San Diego and London: Academic Press.
Houtsma, A., & Tholen, H. (1987). A carillon of major-third bells, II: Perceptual evaluation. Music Perception, 4, 255-266.
Immerseel, L.M. van, & Martens, J.P. (1991). Pitch and voiced/unvoiced determination with an auditory model. Journal of the Acoustical Society of America, 89, 3511-3526.
Intons-Peterson, M.J. (1992). Components of auditory imagery. In D. Reisberg (Ed.), Auditory imagery (pp. 45-71). Hillsdale, N.J.: Erlbaum.
Javel, E., Horst, J.W., & Farley, G.R. (1987). Coding of complex tones in temporal response patterns of auditory nerve fibers. In W.A. Yost & C.S. Watson (Eds.), Auditory processing of complex sounds (pp. 237-246). Hillsdale, N.J. and London: L. Erlbaum.
Kaernbach, C., & Demany, L. (1998). Psychophysical evidence against the autocorrelation theory of auditory temporal processing. Journal of the Acoustical Society of America, 104, 2298-2306.
Keidel, W.D. (Ed.) (1975). Physiologie des Gehörs: Akustische Informationsverarbeitung. Stuttgart and New York: Thieme.
Keidel, W.D. (1992). Das Phänomen des Hörens: Ein Diskurs. Naturwissenschaften, 79, 300-310, 347-357.
Kosslyn, S. (1990). Imagery, computational theory of. In M. Eysenck (Ed.), The Blackwell dictionary of cognitive psychology (pp. 177-181). Oxford: Blackwell.
Kosslyn, S. (1994). Image and brain: The resolution of the imagery debate. Cambridge, Mass.: Harvard University Press.
Leppert, R. (1996). Art and the committed eye: The cultural functions of imagery. Boulder, Col.: Westview Press.
Meddis, R., & Hewitt, M. (1991). Virtual pitch and phase sensitivity of a computer model of the auditory periphery. I: Pitch identification; II: Phase sensitivity. Journal of the Acoustical Society of America, 89, 2866-2894.
Moore, B.C.J. (1993). Frequency analysis and pitch perception. In W.A. Yost, A.N. Popper, & R.R. Fay (Eds.), Human psychophysics (pp. 56-115). New York: Springer.
Näätänen, R., & Winkler, I. (1999). The concept of auditory stimulus representation in cognitive neuroscience. Psychological Bulletin, 125, 826-859.
Patterson, R.D., Robinson, K., Holdsworth, J., McKeown, D., Zhang, C., & Allerhand, M. (1992). Complex sounds and auditory images. Advances in the Biosciences, 83, 429-443.
Patterson, R.D., Allerhand, M., & Giguère, C. (1995). Time-domain modeling of peripheral auditory processing: A modular architecture and a software platform. Journal of the Acoustical Society of America, 98, 1890-1894.
Riemann, H. (1914/15). Ideen zu einer "Lehre von den Tonvorstellungen". Jahrbuch Peters, 21/22, 1-26.
Rossing, Th. (1982). The science of sound. Menlo Park, CA and London: Addison-Wesley.
Sachs, M.B., & Young, E.D. (1979). Encoding of steady-state vowels in the auditory nerve: Representation in terms of discharge rate. Journal of the Acoustical Society of America, 66, 470-479.
Schneider, A. (1992). On concepts of 'tonal space' and the dimensions of sound. In R. Spintge & R. Droh (Eds.), MusicMedicine (pp. 102-127). St. Louis: MMB Music.
Schneider, A. (1997a). 'Verschmelzung', tonal fusion, and consonance: Carl Stumpf revisited. In M. Leman (Ed.), Music, Gestalt, and computing: Studies in cognitive and systematic musicology (pp. 117-143). Berlin/Heidelberg/New York: Springer.
Schneider, A. (1997b). Skala - Tonhöhe - Klang: Akustische, tonometrische und psychoakustische Studien auf vergleichender Grundlage. Bonn: Verlag für Systematische Musikwissenschaft.
Schneider, A. (1998). Notes on the analysis and resynthesis of musical sounds and on nonlinear instrument behaviour. Systematische Musikwissenschaft/Systematic Musicology/Musicologie systématique, 5, 29-47.
Schneider, A. (1999). Acoustical research into idiophone sounds by means of autoregressive spectral analysis and wavelet gammatone filtering: Implications for pitch perception. In I. Zannos (Ed.), Music and signs: Proceedings of the Vth International Symposium on Systematic and Comparative Musicology (pp. 99-116). Bratislava: Asco.
Schneider, A. (2000). Inharmonic sounds: Implications as to pitch, timbre, and consonance. Journal of New Music Research, 29.
Schneider, A., & Müllensiefen, D. (1999). Systematische und Vergleichende Musikwissenschaft in Hamburg. In P. Petersen & H. Rösing (Eds.), 50 Jahre Musikwissenschaftliches Institut in Hamburg (pp. 43-63). Hamburger Jahrbuch der Musikwissenschaft, Bd. 16. Frankfurt/M.: P. Lang.
Schouten, J.F. (1940). The perception of pitch. Philips Technical Review, 5, 286-294.
Schouten, J.F., & 't Hart, J. (1965). De slagtoon van klokken. Publikatie no. 7, Nederlands Akoestisch Genootschap, 9-19. English translation (1984) in T.D. Rossing (Ed.), Acoustics of bells (pp. 245-255). Stroudsburg, PA: Van Nostrand-Reinhold.
Seifert, U. (1993). Systematische Musiktheorie und Kognitionswissenschaft. Bonn: Verlag für Systematische Musikwissenschaft.
Shepard, R.N. (1982). Structural representations of musical pitch. In D. Deutsch (Ed.), Psychology of music (pp. 343-390). San Diego: Academic Press.
Shepard, R.N. (1989). Internal representation of universal regularities: A challenge for connectionism. In L. Nadel, A. Cooper, P. Culicover, & R.M. Harnish (Eds.), Neural connections, mental computation (pp. 104-134). Cambridge, Mass. and London: The MIT Press.
Slawson, W. (1985). Sound color. Berkeley: University of California Press.
Solbach, L. (1998). An architecture for robust partial tracking and onset localization in single channel audio signal mixes. Dr.-Ing. thesis, Technical University Hamburg-Harburg.
Solbach, L., Wöhrmann, R., & Kliewer, J. (1998). The complex-valued continuous wavelet transform as a preprocessor for auditory scene analysis. In D.F. Rosenthal & H.G. Okuno (Eds.), Computational auditory scene analysis (pp. 273-291). Mahwah, N.J.: L. Erlbaum.
Srulovicz, P., & Goldstein, J. (1983). A central spectrum model: A synthesis of auditory-nerve timing and place cues in monaural communication of frequency spectrum. Journal of the Acoustical Society of America, 73, 1266-1276.
Stern, W. (1897). Psychische Präsenzzeit. Zeitschrift für Psychologie, 13, 325-349.
Stumpf, C. (1883, 1890). Tonpsychologie (Vols. 1 & 2). Leipzig: Hirzel.
Stumpf, C. (1907). Erscheinungen und psychische Funktionen. Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, Jahrgang 1906, Phil.-hist. Klasse Nr. 4. Berlin: Akademie der Wissenschaften.
Stumpf, C. (1918). Empfindung und Vorstellung. Abhandlungen der Königlich Preussischen Akademie der Wissenschaften, Jahrgang 1918, Phil.-hist. Klasse Nr. 1. Berlin: Akademie der Wissenschaften.
Stumpf, C. (1926). Die Sprachlaute. Berlin: Springer.
Terhardt, E. (1979). Calculating virtual pitch. Hearing Research, 1, 155-182.
Terhardt, E. (1998). Akustische Kommunikation. Berlin and New York: Springer.
Vogel, M. (1993). On the relations of tone. Bonn: Verlag für Systematische Musikwissenschaft.
Watkins, A.J., & Dyson, M.C. (1985). On the perceptual organisation of tone sequences and melodies. In P. Howell, I. Cross, & R. West (Eds.), Musical structure and cognition (pp. 71-120). Orlando etc.: Academic Press.
Young, E.D., & Sachs, M.B. (1979). Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers. Journal of the Acoustical Society of America, 66, 1381-1403.
Zannos, I. (1995). The 'Tone Net': A paradigm for music representation and music generation. Systematische Musikwissenschaft/Systematic Musicology/Musicologie systématique, 5, 17-30.
Zenner, H.P. (1994). Hören: Physiologie, Biochemie, Zell- und Neurobiologie. Stuttgart and New York: Thieme.


7

Musical Imagery between Sensory Processing and Ideomotor Simulation

Mark Reybrouck

Introduction

Music can be handled in two different ways. One is dealing with the music as it unfolds through time, as a kind of sonorous articulation, with actual sounds that can be perceived in an objective manner; the other is dealing with music at the level of imagination, with the sounds sounding only at a virtual level. Imagery, in fact, is usually defined as the occurrence of a perceptual sensation in the absence of the corresponding perceptual input (Kosslyn, 1980; Le Ny, 1994). It is possible, however, to have imaginative projections in the presence of perceptual input as well. Imagery, then, is coperceptual rather than purely autonomous. As such, it holds a position between the epistemological paradigms of realism and nominalism, which stress either the sensory 'realia' or the imaginative reconstructions of these realia in the listener's mind. As I will try to show, this distinction, which is traditionally conceived of as a dichotomy, can be weakened in favour of an approach that combines sensory processing with the imaginative reconstructions of the mind.


The problem of imagery

What is imagery? And what is musical imagery? There is no unambiguous answer to these questions. Most studies of imagery have been done in the field of visual perception, and the psychology of perception and cognition has traditionally taken static visual forms as the paradigm of imagery. But images, according to Hoffman & Honeck (1987), are more tied to event perception than to the 'scanning' of image forms that possess static properties. The images that observers experience are in fact dynamic, and the same holds true for music as a temporal art. What is needed, therefore, is a conceptual framework that does justice to the dynamic, ongoing characteristics of the sonorous articulation through time and to the organization of mental representations into meaningful units. A somewhat analogous way of thinking was already advocated by Kant, who claimed that imagination generates much of the connecting structure by which we have coherent, significant experience (Kant, 1790; see also Johnson, 1987). I will not enter into his complex treatment of imagination and his distinction between reproductive, productive, schematizing and creative functions. I mention only the reproductive and schematizing functions of imagination, which I consider to be important epistemological tools for making sense of music. As Johnson (1987, p. 165) puts it, the reproductive function of imagination gives us a unified, coherent experience over time, allowing us to grasp a series of perceptual inputs as connected. We can thus experience objects that persist through time. The schematizing function, on the other hand, mediates between the more abstract concepts and the contents of sensation, making it possible for us to conceptualize what we receive through sense perception.

This approach immediately outlines a major problem in music cognition. Is musical imagery merely the representation, at a virtual level, of an objective musical reality that exists 'out there' - music as an 'ontological category' - or is it a deliberate and conscious construction and reconstruction of a reality that has multiple readings? The problem is related to Lakoff's (1987, 1988) philosophical distinction between objectivist and non-objectivist cognition. The former states that the mind can achieve real knowledge of the external world only if it can represent what is really in the world. Hence the concern in the objectivist tradition with the cognitive representation of external reality - at least at an early stage of processing - rather than with the nature of the beings doing the cognizing. Two things, however, are lacking in this picture: 'the role of the body in characterising meaningful concepts, and the human imaginative capacity for creating meaningful concepts and modes of rationality that go well beyond any mind-free, external reality' (Lakoff, 1988, p. 119). What is needed, therefore, is a 'cognitive semantics' that accounts for what meaning is to human beings, rather than claiming a reality that is external to human experience.

What matters here is the epistemological duality that distinguishes between ob-server and observed things (Berthoz, 1997; Bouveresse, 1995). The problem wasalready stated by William James:

As 'subjective' we say that the experience represents; as 'objective' it is represented. What represents and what is represented is here numerically the same; but we must remember that no dualism of being represented and representing resides in the experience per se. In its pure state, or when isolated, there is no


self-splitting of it into consciousness and what the consciousness is 'of'. Its subjectivity and objectivity are functional attributes solely.... The instant field of the present is at all times what I call the 'pure' experience. It is only virtually or potentially either object or subject as yet. For the time being, it is plain, unqualified actuality, or existence. (McDermott, 1968, p. 177)

This subjective/objective dichotomy is a key problem in dealing with music. Music as an artefact can be described in an objective way. The musical experience, however, is highly subjective and can be described in terms of embodied and enactive listening (for a description of the terms, see Johnson, 1987; Lakoff, 1987), which take the human body and its actions as a reference as well. The body, according to Lidov, can be regarded as a privileged context of external reference that divides the total universe of our discourse into mutually exclusive 'objective-exosomatic' and 'subjective-endosomatic' realms:

The endosomatic world is one we feel; the exosomatic world, one we see.The exosomatic realm, preeminently visual, is stabilized and articulated bythe physiology of Gestalt perception. Its objects commute without apparentdistortion; that is, we can move ourselves and many other things around in itwithout altering them. The endosomatic realm is largely unarticulated. To besure, it has its distinctive entities just as the other space has its fogs and clouds.Hunger is as definite a thing as a tea cup. But as a general rule the conditionsof the body fade into each other and effect each other, and there is only a littleroom to maneuver when it comes to reordering them. The endosomatic spaceis chiefly a realm of flux and influence. Its chief contents are hanging statesrather than fixed objects. (Lidov, 1987, p. 75)

Music as experience: experiential cognition

Dealing with music can be described in epistemological terms. Rather than stating that music, as an artefact, is out there - as an ontological category - I claim that music cognition is a tool for adaptation to the sonic world (Reybrouck, 2000, 2001). What we call knowledge, according to von Glasersfeld, is the result of our own construction and of how we make the world we experience. This claim is the central dogma of radical constructivism (Glasersfeld, 1995). It rejects the 'realist dogma', which considers knowledge a representation of an independent reality with objectively existing categorical structures, in favour of a conception of knowledge as conceptually driven and relative to human understanding and intentionality. This non-objectivist approach calls up the mediating role of human subjectivity. It is closely related to the experiential and conceptual or cognitive approach to cognition, which claims a semantics of understanding (Fillmore, 1984) rather than a semantics of truth (see also Johnson, 1987, p. 174). Central to this approach is the construction of the external world as the result of an interaction between external input and the means available to represent it internally. What matters here, according to Jackendoff (1988), is the priority of Conceptual Semantics over Real Semantics. Or, as he puts it, the characteristics of matter in the physical world must be regarded in terms of how humans structure the world,


and this is determined by our capacity for mental representation and the properties of the ontological categories available in our conceptual structure (1987, pp. 151-152). Meaning, here, is not objectively given, or 'real', but is constructed in continuous interaction with the world. Conceptual semantics, therefore, is closely related to the concept of 'experiential cognition' (Johnson, 1987; Lakoff, 1987). To quote Lakoff:

"Experiential" is to be taken in the broad sense, including basic sensory-motor, emotional, social, and other experiences of a sort available to all normal human beings - and especially including innate capacities that shape such experience and make it possible... "Experiential" should definitely NOT be taken in the empiricist sense as mere sense impressions that give form to the passive tabula rasa of the empiricists. We take experience as active functioning as part of a natural and social environment. We take common human experience - given our bodies and innate capacities and our way of functioning as part of a real world - as motivating what is meaningful in human thought. (Lakoff, 1988, p. 120)

Meaning, thus, is characterized in terms of our collective biological capacities and our physical and social experiences as beings functioning in our environment. Varela et al. argued on similar lines in suggesting a change in the nature of reflection from an abstract, disembodied activity to an embodied (mindful), open-ended reflection: 'cognition depends upon the kinds of experience that come from having a body with various sensorimotor capacities, and ... these individual sensorimotor capacities are themselves embedded in a more encompassing biological, psychological and cultural context' (Varela et al., 1991, p. 173). Embodied, then, means reflection in which body and mind have been brought together (1991, p. 27), as is the case in practicing a basic skill such as playing a flute. As one practices, the connection between intention and action becomes closer, until eventually the feeling of difference between them is almost entirely gone. One achieves a certain condition that phenomenologically feels neither purely mental nor purely physical; it is rather a specific kind of mind-body unity (Varela et al., 1991, p. 29).

Imagery between sensorium and motorium: an operational approach

In what follows I will try to develop the concept of enactive listening as a kind of dealing with music that leans upon our having a body with various sensorimotor capacities. Listening, in fact, involves action as well as perception. This is obvious in playing music (see Molino, 1988), but also in pure listening there is a coupling with action, be it at an internalized level. As Berthoz (1997, p. 233) puts it, perceiving an object is to imagine the actions that are implied in using it. This coupling of action and perception is evidenced by modern research on the relationships between perception, imagery and motor preparation. Perception, in fact, involves the same neural substrates as action (the supplementary motor area) and the same holds true for imagined action (Berthoz, 1996, 1997; Di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992; Jeannerod, 1994; Annett, 1996; Decety, 1996). Perception, therefore, can be considered as simulated action, as imagining the actions that are implied in using the


MARK REYBROUCK 121

perceived objects. This echoes, in a way, Gibson's claims that animals perceive environmental objects in terms of what they 'afford' for the consummation of behavior (their affordances) (1979, p. 127), but Piaget argued on similar lines. He stated that knowledge arises from the subject's activity, either physical or mental, and that it is goal-directed activity that gives knowledge its organization (1967, pp. 14-15). The same idea was advocated by Jakob von Uexküll, who said that, in order to arrive at an understanding of what things mean, it is necessary to study the possibility to act upon them. He introduced the concept of functional cycle as an operational tool for describing the basic structure of the interactions between human and animal organisms and the objects of their surrounding worlds:

Figuratively speaking, every animal grasps its object with two arms of a forceps, receptor and effector. With the one it invests the object with a receptor cue or perceptual meaning, with the other, an effector cue or operational meaning. But since all of the traits of an object are structurally interconnected, the traits given operational meaning must affect those bearing perceptual meaning through the object, and so change the object itself. (von Uexküll, 1957/1934, p. 10)

The concept of functional cycle is concerned with sensorimotor integration. It brings together action and perception, and enables us to define the subjective worlds or Umwelten of living beings. Subjects, in fact, can be related to the same objects by several functional cycles, and in doing so, they construct their Umwelt.

'Umwelt'-research, further, is highly informative in determining the perceptual and functional triggers of the existing stimuli. Every subject builds up relations with the external environment, selecting some of them to give them special meanings, and to construct his/her specific Umwelt. The same applies to music and the way the listener constructs his/her musical Umwelt (Reybrouck, 2000). The qualities listeners can attribute to the sounds are not merely reducible to their objective qualities, but impinge upon a whole domain of imaginative projections that emerge from recurrent patterns of sensorimotor and ideomotor activity. Schooled listening, therefore, involves a kind of active processing of the sound that involves both the construction of an internal model of the sonic Umwelt and a kind of circularity that couples perception with action. Sensorimotor strategies are really important here, but they are conservative in keeping step with the real unfolding of the sonorous articulation. They call upon a mode of operating that acts as a 'controller'. To quote Berthoz:

One mode [of operating] consists of sensory-motor loops linking the sensors to the central nervous system and the effectors. These sensory-motor loops work as conservative processes like cybernetic loops. They are continuous, have properties similar to servomechanisms, deal with sensory signals which are transformed into motor commands through the estimation of motor errors and are regulated by feed-back or feed-forward mechanisms. They operate on a repertoire of motor synergies which generate a set of motor primitives. (Berthoz, 1996, p. 102)

Real enactive listening, however, is likely to be proactive rather than merely conservative (see Jones, 1987; Jones & Boltz, 1989; Narmour, 1990), in the sense that the


listener can make anticipations as to the evolution of the sound, and this on the basis of an internal model. This fits well with the newer paradigms in neuroscience that transcend the conception of the brain as a reactive machinery (Paillard, 1977, 1990, 1994b; Berthoz, 1997) that is able to respond in an automatic way to the solicitations of the environment. This reductionist approach has been relinquished by the introduction of intermediary hypothetical variables that refer to mental operations that are interposed between the perception of the stimulus and the triggering of the action, and this on the basis of the formation of an internal model of reality (Paillard, 1994b). What matters here is the transition from mere sensory-motor integrations, as kinds of wired-in programs that unfold in a quasi-automatic way, to a kind of cognitive mediation that is not conservative but anticipatory. The latter, according to Berthoz, involves higher central loops that operate as a 'projective process', where signals are processed in internal loops having no direct links with sensors. This mode produces predictions of future states, it preselects strategies and selects reference frames for the control of movement and posture. In this mode the brain works as a simulator (Berthoz, 1996, p. 102).

Perception, further, is constrained by action and by the implicit knowledge concerning the movements that can be produced (Viviani & Stucchi, 1992). These claims are somewhat related to what is commonly known as the motor theory of perception (Liberman & Mattingly, 1985, see also Viviani, 1990). What is meant here is a broad framework that deals with many aspects of the perception/action domain. Its main claim is that mental states such as perception or imagery may arise from movement or, more precisely, from 'innervation' (centripetal as well as centrifugal) that is associated with movements. The classical theory is known as the muscle- or EMG-feedback hypothesis. It states that motor images are generated by peripheral mechanisms that feed back to the central levels. This 'peripheral' variant of the theory would hold that, in the same way as the perceived image of an object arises from the muscular discharges produced during the movements for exploring it, the mental image of that object is produced by covert muscular discharges in the related muscles (James, 1901/1890; Jacobson, 1930; for a review of the historical and epistemological roots of the motor theory, see Scheerer, 1984). This version of the theory, however, has been rejected conclusively, as there is no bottom-up influence of muscle discharges during mental states of imagery (Mackay, 1992). A 'central version' of the motor theory, however, seems to be valid, as stated by Jeannerod (1994). The key phenomenon in this conception is the motor intention, which is thought to be largely endogenous. Motor imagery would then represent the result of conscious access to the content of these intentions. The central version of the theory thus relies on central programming and doesn't need feedback. Sensory feedback, however, does have an effect on movement. Much of the grace and subtlety of movement, according to Rosenbaum (1991, p. 108), is present when feedback is available, but disappears when feedback is withdrawn.
This suggests that although a motor program may allow a movement sequence to be carried out uninfluenced by peripheral feedback, it does not require movement sequences to be uninfluenced by peripheral feedback. The reactive machinery and the sensorimotor integrations, therefore, are not to be cut off from the merely central simulations.


Musical representation between percept and concept


As I will state further, the topic of imagery is closely linked to the problem of perception. There are in fact imaginal-perceptual-cognitive relationships (Intons-Peterson, 1992). This was already advocated by Bateson, who stated that all perception has image characteristics (1985), but recent neurophysiological and psychological research has come to the same conclusions: images and percepts share the same format (Deecke, 1996; Rollins, 1989, and for auditory imagery: Zatorre & Halpern, 1993; Carroll-Phelan & Hampson, 1996). There are, however, distinctions as well. The problem at hand is the difference between two distinct perceptive modes. One mode 'objectifies' actual things under the guise of presentational immediacy or sense-perception, the other 'objectifies' them under the guise of a kind of synthetic activity with the mind intervening with its conceptual analysis (Whitehead, 1927, p. 21).

There are several ways to deal with this distinction. An interesting contribution is Langacker's distinction between autonomous and peripherally connected cognitive events: 'The sensation directly induced by stimulating a sense organ is an instance of a peripherally connected event; the corresponding sensory image, evoked in the absence of such stimulation, is an autonomous but equivalent event.' (1987, p. 12). Somewhat related is Jackendoff's distinction between 'lower' or 'more peripheral' levels of structure and 'higher' or 'more central' levels. Lower levels of structure interface most directly with the physical world. Higher levels represent a greater degree of abstraction, integration, and generalization vis-a-vis sensory input (1987, p. XX). Both levels, however, do not exclude each other. There is a gradation from lower to higher levels, and both can operate simultaneously, allowing a kind of listening strategy that combines both generality and particular precision in decoding the acoustic information. Dealing with music, in fact, is an experiential as well as a conceptual affair (Reybrouck, 1998). It implies time-bound reactivity (a kind of wired-in reaction to standardized stimuli) as well as higher-level cognitive processes that are the outcome of mediation between stimulus and reaction (Reybrouck, 1999). As such it is possible to gradually shift from presentations or eidetic images that have all the characteristics of a percept (things known by the senses) to a kind of representation that is largely autonomous (senses are closed). What matters here is the difference between the economy of abstraction and the subtlety of experience, or to state it in another way: the difference between an analogue image system and a language-like or propositional system (Bideaud & Houde, 1991). Or, as Dretske puts it: 'In passing from the sensory to the cognitive representation ..., there is a systematic stripping away of components of information ...
which makes the experience of [something] ... the phenomenally rich thing we know it to be, in order to feature one component of this information.' (Dretske, 1985, p. 183). This is in fact a process of 'digitalization' or 'conceptualization' whereby a piece of information is taken from a richer matrix of information in the sensory-analog representation and featured to the exclusion of all else.

The idea of 'economy' of rules, however, is in conflict with many of the subtleties of musical experience. The problem is closely related with one of the central claims of James' doctrine of radical empiricism (James, 1976; McDermott, 1968), in which he was intent on showing that the role of percepts - 'knowledge-by-acquaintance' - is


the crucial element in epistemology, as percepts are 'the only realities we ever directly know'. Their relationship with concepts is stated as follows:

'Things' are known to us by our senses and are called 'presentations' by some authors, to distinguish them from the ideas or 'representations' which we may have when our senses are closed. I myself have grown accustomed to the words 'percept' and 'concept' in treating of the contrast. ...

And further:

The great difference between percepts and concepts is that percepts are continuous and concepts are discrete. Not discrete in their being, for conception as an act is a part of the flux of feeling, but discrete from each other in their several meanings. ... The perceptual flux as such, on the contrary, means nothing, and is but what it immediately is. No matter how small a tract of it be taken, it is always a much-at-once, and contains innumerable aspects and characters which conception can pick out, isolate, and thereafter always intend. (McDermott, 1968, pp. 232-233)

The problem, further, is still more complicated if the object of perception is not actually present, as in memory awareness. According to Ransdell (1986, p. 72), the recall of some event in past experience can be the propositional knowledge that such-and-such occurred, but it can be the experiential recall in memory of some part of one's past experience as well. If the attempt to recall is successful, one may say of memory that the intended object (the remembered event itself) and its iconic sign are at least formally identical. In this sense, memory perception of this sort is direct, and so one has direct as well as evidential access to the past.

An interesting musical analogy of this problem is provided by Godøy's claims about the dynamics of representation. In describing the transformation from a flux to some kind of object he considers the possibility (and legitimacy) of

thinking a musical object in different temporal representations, from "real time" versions to extremely compressed, i.e. "instantaneous" or "synoptic" kinds of representations, which have also been called "outside time" representations of musical objects. (Godøy, 1997a, p. 11)

The coupling with the inexorable character of time is a critical factor here. Mental operations that emancipate themselves from a merely time-bound character of proceeding are lacking in sensory resolution but are gaining in abstract and conceptual autonomy. To quote Godøy again:

There may be "high-speed" or "broad-band" types of representations in the form of concentrated graphical overviews showing longer stretches of temporal unfolding "at a glance", or there may be "slower" or more "sequential" types of "frame by frame" overviews. For this reason, I believe that a multiplicity of temporal representations is not only possible and legitimate, but even highly desirable, as each velocity of representation can provide a different kind


of perspective, hence a different kind of knowledge of the musical substance. This question of different velocities will then also concern what I shall call resolution and perspective, both of which are perhaps rather "atemporal" terms in ordinary usage. (Godøy, 1997a, p. 66)


I will not enter into the elaboration of these representational transformations. Crucial in this is the role of the cognizing subject and the strategies he/she leans upon. There is, however, a tension between a vague, inexact and macroscopic knowledge of the music and precise local knowledge that does justice to the idiosyncrasies of the sonorous articulation. For the moment I argue for a complementarity of representations that combines discrete conceptual knowledge with the dynamical unfolding through time. Dealing with music, in fact, involves both conceptual decoding and sensory processing that keeps step with the actual articulation through time. I recall once again James' doctrine of radical empiricism, in which he claims a transition from discrete particulars to relational continuity. Both positions provide advantages, but the price for their acceptance is too steep, as each of them violates the actual way in which we achieve our experience. The overarching principle of unity cannot account for particularity and mere association of particulars cannot provide a principle of continuity (James, 1976, p. XXIII). He therefore affirms a relational continuity in reality, and he further contends that such a relational continuity is effectively experienced by us in our 'stream of consciousness' (1976, p. XX).

The implications of these insights for dealing with music are numerous. To quote Serafine: 'Sound events that are logically discrete and isolable ... in fact are perceived or felt as a continuous gesture.' (1988, p. 75, see also Todd, 1999; Gjerdingen, 1994). What is needed, therefore, is a kind of processual categorization of the sonorous articulation, and an interesting conceptual tool for dealing with this matter is Langacker's distinction between summary and sequential scanning:

[summary and sequential scanning] are contrasting modes of cognitive processing ... Summary scanning is basically additive, and the processing of conceptual components proceeds roughly in parallel. All the facets of the complex scene are simultaneously available, and through their coactivation ... they constitute a coherent gestalt. This is the mode of processing of things and atemporal relations... Sequential scanning, on the other hand, involves the successive transformations of one configuration into another. The component states are processed in series rather than in parallel, and though a coherent experience requires a certain amount of continuity from one state to the next, they are construed as neither coexistent nor simultaneously available. This is the mode of processing that characterizes processual predications and defines what it means to follow the evolution of a situation through time. (Langacker, 1987, p. 248).

The critical factor in this distinction is the relative timing of processing, and whether the scanned events that correspond to different facets of a complex scene are activated simultaneously or successively.

The concept of processual predication is an important tool for dealing with music. Making sense of music, in fact, involves an act of imagination that grasps the sonorous unfolding as a processual figure that unfolds through time. What is meant here is a path


of becoming, a kind of continuous transformation that is not restricted to a single state. The concept is somewhat related to some of Lakoff's image-schema transformations, which are especially fruitful in providing operational descriptions of the listener's 'listening strategies' (see also Saslaw, 1996). I quote extensively:

Path-focus ↔ end-point-focus: It is a common experience to follow the path of a moving object until it comes to rest, and then to focus on where it is. Also, many paths are traveled in order to arrive at an endpoint that is kept in sight along the way. Such everyday experiences make the path-focus/end-point-focus transformation a natural principle of semantic relationships.

Multiplex ↔ mass: As one moves further away, there is a point at which a group of individuals, especially if they are behaving in concert, begins to be seen as a mass. Similarly, a sequence of points is seen as a continuous line when viewed from a distance.

0DMTR ↔ 1DTR: When we perceive a continuously moving object, we can mentally trace the path it is following, and some objects leave trails - perceptible paths. The capacity to trace a path and the experience of seeing a trail left behind make it natural for the transformation linking zero-dimensional moving trajectors [0DMTR] and a one-dimensional trajector [1DTR] to play a part in semantic relations in the lexicon... (Lakoff, 1988, p. 147)

The analogies with listening to music are obvious. It is possible, however, to enlarge the concept of image-schema transformation from the mere perception of moving objects to perceptual-motor activities (for a musical analogy, see Gromko and Poorman, 1998). Skills such as tying a knot or drawing Chinese characters are two examples. What is needed in order to perform these activities is a kind of continuous knowledge representation that is 'analog' and 'procedural' rather than 'declarative'. It is somewhat related to the 'phoronomic' interpretation of a curve in mathematics (the Greek verb 'phoreo' means 'to drag'), stressing the act of tracing rather than looking at the curve as a static depiction or an artefact.

All this involves a dynamic conception of predication that is somewhat analogous to Bergson's first thesis on movement - movement is not to be confused with the traversed space: the space is past and divisible, but the movement is present and indivisible (Bergson, 1896, see also Deleuze, 1983, p. 9) - and Deleuze's conception of image-movement, as exemplified in movies: movies give slices, but these are not to be considered as immobile slices with movement that is added to them, but as an irreducible unity of image and movement (1983, p. 11).

The philosophical implications of the phoronomic approach to listening are rather complex. I only mention a musical analogue that can be found in Schaeffer's (1966, p. 570) and Chion's (1983, p. 162) concept of melodic profile and Schenker's conception of voice-leading as a kind of drawing lines (Schenker, 1956/1933, p. 31). The melodic profile is characterized by Schaeffer as a variation that affects the whole mass of the sound in letting it draw a kind of trajectory in the melodic range (1983, p. 162). As Godøy (1997a) points out, notions of melodic profile were actually present in the Gregorian neumatic notation in the form of a 'typology' of shapes, but more important than such a rudimentary classification of shapes is the attempt to create more general categories of evolution of pitch. He conceives of them as 'envelopes'


in the sense that there are patterns of the patterns of change. Melodic profiles can accelerate, slow down, fluctuate and modulate (Schaeffer, 1966; Godøy, 1997a). Other musical analogies, however, are possible as well.

Experiential phenomenology

The concept of processual predication is an interesting tool for categorizing dynamic ongoing events such as music. It is somewhat related to the work of the late Belgian experimental psychologist Michotte and his theory of experiential phenomenology. Central in his theory are the fundamental problems of causality, permanence and reality (Costall, 1991). Space limitations do not allow me to go into detail here, but the concept of 'permanence' is likely to be an interesting tool for describing the dynamic unfolding of music through time. Three fundamental types can be distinguished: prior permanence is characterized by the object or one of its parts appearing to have existed prior to its perception; permanence of posteriority lets an object or one of its parts continue to exist, even if it ceases to be visible; and continuous permanence, finally, refers to an object that seems to remain itself, and maintains its fundamental identity during the whole time it is present despite apparent changes it may undergo. The problem of 'non-permanence', on the other hand, is still more interesting for characterising musical events. Three parallel cases can be distinguished here: creation, when an object appears 'born' or rises into view; annihilation, when an object that is present disappears, leaving no trace; and substitution, when an object seems suddenly replaced by another (Michotte, 1991, p. 78).

The musical analogies are obvious. Musical events, in fact, possess an intrinsic temporal structure that constitutes their own time and that is very closely linked to the impressions of creation and annihilation, with an object appearing or disappearing gradually (Butterworth, 1991, p. 137). The phenomenon is a key problem in experiential phenomenology. It has been elaborated by one of Michotte's students in a body of empirical work on phenomenal arisings (Knops, 1947). Giving the example of the firing up of a flame or the formation of bubbles on the surface of boiling water, he states that an object begins to exist at the moment of its appearing or ceases to exist when it disappears. Objects, however, often are not created at the moment of appearing but simply become visible. In this case, they preexist before their actual appearance and are permanent. This is the case when an object is hidden by another object, such as a landscape that becomes visible while opening the window (screen-effect).

Phenomenal arisings are frequent in the domains of audition and odour but are rare in the domain of vision, where one mostly has to do with screen effects. According to Knops (1947), they can be classified in four major types: phenomenal instantaneous apparition, in which the object is present at a glance, without any evolution or internal development; explosion, in which an amorphous mass fills in a limited space abruptly and with a movement of expansion; deployment, in which a kind of symmetrical or asymmetrical magnification occurs with all the parts of the object executing a common movement of centrifugal character; and finally overture, in which an object seems to symmetrically unfold its extremities while maintaining its mass motionless (Knops, 1947, pp. 572-575). It is challenging to apply this typology of phenomenal arisings


to the dynamic ongoing unfolding characteristics of the sonorous articulation through time. I refer only to the work of Miereanu (1998), who describes some semiotic textualities of musical unfolding (arising, crepuscular, zenithal, meteoric, enveloping), but much of this phenomenal typology still has to be worked out.

Perception and action

Empirical work on experiential phenomenology is a promising area of research. There is, however, the danger that music will be reduced to the exosomatic realm of experience, leaving the listener out of the experience. I strongly argue, therefore, for a kind of enactive cognition that depends upon experience that involves motor and sensorimotor components and that brings together perception and action. There is a growing body of neurophysiological research that stresses the importance of coupling perception and action (Berthoz, 1997, 1999, and for the domain of music: Gromko & Poorman, 1998; Mikumo, 1994; Todd, 1999), but also the domain of linguistics has offered interesting claims. An important contribution comes from the description and categorization of events as exemplified in the categories of words that are used to describe them. Pioneering research in this field has been done in the domain of roadway categories that are used by professional drivers. Most of their descriptions are organized around the principal dichotomy of perception and action, with verbalized properties that reflect linguistic categories such as substantives and adjectives or action verbs. Conurbations and roadways are typically described in terms of substantives and adjectives. Villages and small towns evoke perceptual elements of the environment such as churches, shops and markets, while roads mostly are described in terms of congestions, trucks, cars and traffic signals. Intersections of roads, on the contrary, are described in terms of action verbs (Mazet, 1991).

A somewhat analogous contribution stems from language theorists who tried to build a bridge between linguistic and neurophysiological research in using the idea of activity signature (Beck, 1987). They studied the muscular habits that are associated with words we use in everyday speech (e.g. for the word 'chair' this means the motor habits the body normally goes through when sitting down or getting up) and found that they help to define the words and to subtly distinguish them from others (Beck, 1987, p. 24).

It is tempting to apply this to the enactive approach of dealing with music. What I have in mind is the whole domain of motor categorization of sounding events. A somewhat related program was already advocated by Godøy (1997b, 1999), who claims that listeners possess a vast repertoire of singular 'sound-producing actions' (hitting, striking, kicking, blowing, etc.) as well as more complex or compound actions (drumming a rhythmic pattern, sliding up and down a melodic contour, or even the formant changes in timbre resulting from the changing shape of the vocal tract) (see also Molino, 1988, and his concept of musical ergology). But also the metaphors used in talking about music refer in a way to sound-producing actions (slow, fast, up and down, etc.) and the same applies to musical terms like martellato, leggiero, tenuto, legato etc. (Godøy, 1999, p. 90). The category of sound-producing actions, however, can be widened to include all kinds of overt and covert movements that are


related to the production and reception of the sounds. It makes a difference, further, as to both the intensity and precision of the covert movements (the ideomotor simulation) if the subject who tries to imagine a certain musical structure is an expert or a layman. Subjects who received formal musical training can use this explicit musical knowledge and will easily imagine all the motor processes that are connected with the production of the sounds. There is, however, a less explicit level of ideomotor simulation. As the 'energetic' approaches to music psychology, advanced by, among others, Kurth (1931) and Mersmann (1926), demonstrated, there is a more general, less specific experience of ideomotor simulation contained in the listening process that can be basically conceived of as the 'forces' and 'energies' inherent in musical structures that in turn account for our perception and imagination of 'tension', 'dissolution' and 'movement' (Reybrouck, 1995).

I will not elaborate this distinction here, in order to focus merely on the coupling of perception, imagery and action. The whole field of 'motor learning' is a prototypical domain, but the 'central' version of the motor theory of perception can do as well. Accumulated data, in fact, provide support for a 'central' assumption of the current conception of motor control. Motor acts, according to Decety (1996), are centrally represented in the sense that their organization is supposed to be based upon the utilization of information stored in memory in the form of multiple hierarchically organized representations of action. This assumption, however, involves a decoupling of action schemas from the executive system (Annett & Smith, 1988). There is further evidence of a specific neural mechanism that is capable of storing action prototypes (Di Pellegrino et al., 1992) and it seems likely that there should be a mechanism which is active in both the production and the recognition of specific actions (Annett, 1996). This shows the interrelations between perception, imagery and action, and it is challenging to apply this to the process of dealing with music.

Enactive listening and motor imagery

The term 'enactive' has multiple meanings. I use it here in the sense that 'enactive listening' takes the human body and its actions as a reference. One has to make a distinction, however, between overt and covert action. Enactive listening, therefore, involves a kind of motor imagery. Music, in fact, can be conceived as movement through time. The problem, however, is complex, in that music can be conceived as the mover, but the listener can move or be moved as well. The question is related to the distinction between the objective-exosomatic and subjective-endosomatic realm and the weakening of the epistemological duality that distinguishes between the observer and the observed thing, as claimed by James (McDermott, 1968, p. 177). The crucial idea here is the conception of motor imagery, which is defined by Mahoney and Avener (1987) as a dynamic state during which a subject mentally simulates a given action. This type of phenomenal experience implies that he/she feels himself or herself performing a given action without actual manifestation of this action. It corresponds to the so-called 'internal imagery' (or first person perspective) of sport psychologists, and is ideomotor rather than sensorimotor activity. The transition from overt action to internalized forms of action, however, does not imply the abandoning

130 MUSICAL IMAGERY, SENSORY PROCESSING AND IDEOMOTOR SIMULATION

of the sensorimotor control systems that link the sensors to the central nervous system and the effectors (the muscles). It only cuts off the actual manifestation of the output or effector side of the control system.

Converging evidence from several sources (Decety, 1996) indicates further that motor imagery pertains to the same category of processes as those which are involved in programming and preparing actual actions, with the difference that in the latter case execution would be blocked at some level of the path from the cortex to the spinal cord (Decety, 1996). The representative and executive systems, in fact, can be decoupled such that perception is not inevitably translated into imitative action and that actions can be imagined without being translated into overt movement. Both perception and imagery must involve a very precisely tuned inhibitory mechanism (Annett, 1996). As a result, the actual feedback is lacking as motor images are substituted for manifest movement. In arguing for enactive listening, however, I stress the close interaction between representation and executive systems and the mental simulation of executed actions. The latter involves the activation of a central action plan and constitutes the preparation for action (Jeannerod, 1994). According to Annett (1996), however, motor imagery contains two elements: the activation of the action prototype and memories of the perceptual consequences of previous actions of a similar kind. What matters here is a current concept of motor control that is the result of the interplay between 'central' and 'peripheral' levels of control, a combination of 'ideomotor preparation' and 'sensorimotor' control. The same probably holds true for 'enactive' listening to music, if the listening process keeps step with the actual sonorous unfolding through time.

From sensorimotor processing to ideomotor simulation

In stressing the role of sensorimotor processing in dealing with music, I lean heavily upon James' theory of radical empiricism (James, 1976), in which he argues for a kind of cognition that keeps step with the perceptual flux. Abandoning the idiosyncrasies of the sonorous articulation, however, offers a kind of conceptual knowledge that emancipates itself from the inexorable character of time. The human brain, in fact, is not merely a reactive machine (Paillard, 1977, 1990, 1994b; Berthoz, 1997) with perceptual predispositions that act as selective filters and motor automatisms that are triggered in a quasi-automatic way. According to Paillard (1994b), this dispositional memory has to be supplemented with a representational memory that is analogous to the concept of working memory. It forms the material for the mental operations that allow cognitive control and anticipation. The human brain, in fact, is a predictive machine that controls direct reactivity to external solicitations. Or, as Paillard puts it: what is typically human is the possibility to perform the 'internal dialogues' that are so typical of mental activities. This implies a transition from mere sensorimotor to ideomotor activities with a corresponding integration of the sensorimotor dialogues of the 'organized machines' with the 'self-organizing capacities' of the nervous machinery (1994b, p. 929).

The idea of the brain as a predictive machine is an interesting line of thought (Berthoz, 1997; Paillard, 1994b). It considers the human brain as primarily proactive


MARK REYBROUCK 131

and anticipating upon the consequences of action, and this on the basis of an internal model of the body and the world. As Berthoz puts it, an expert ski-runner does not treat all the sensory information continuously. He must unroll the trajectory in his mind, predict the stages and halting-points and the possible places of difficulties, and make decisions before he starts the action. Skilled action, therefore, is anticipation, guessing and betting on the entrained behavior (Berthoz, 1997, pp. 7, 10). What really matters is a kind of mental simulation of the unfolding through time: the action is 'pro'-grammed in the most literal sense of the word, and the configurations of the sum total of sensory information, both of the body and the external environment, are verified with respect to possible discrepancies. Motor imagery then is the manifestation of the normal internal simulation which accompanies the planning and execution of movements (Berthoz, 1996, p. 110).

The same holds true for perception of movement. Berthoz offers the example of oculomotor pursuit, which is intrinsically predictive. What is pursued is not the target proper but the internal simulation of its predicted trajectory. The brain anticipates with respect to the trajectory (1997, p. 164). Musical analogies are obvious here. One only has to translate the concept of 'eye-tracking' to the concept of audiomotor pursuit of the auditory trajectory. Skilled listening, in fact, is predictive and highly anticipatory in dealing with greater tension-building chains (Reybrouck, 1995; Narmour, 1990).

Motor imagery and motor preparation

One of the central claims of this article was the suggestion that listening to music involves listening strategies that rely upon motor encoding of the sonorous articulation through time. Listening, then, should be related to action planning and motor preparation without actual motor output. There is indeed a close functional equivalence between motor imagery and motor preparation, as suggested by the positive effects of imagining movements on motor learning, the similarity between the neural structures involved, and the similar physiological correlates observed in both imagining and preparing (Jeannerod, 1994). It is further hypothesized that there is a continuum between motor preparation and motor imagery, although they have different subjective contents. Motor preparation is an entirely non-conscious process which escapes the subject's awareness. Only the final result is completely conscious. By contrast, the content of motor images can be accessed consciously by the imaginer (Jeannerod, 1994). The difference between the two situations, however, may be one of degree, and not of nature. This implies that motor preparation, or the intention to act, if it could be prolonged, would become progressively a motor image of the same action.

Motor planning, according to Jeannerod, is intentional rather than reactive: actions are driven by a represented goal rather than directly elicited by the external world. Representations may be built from the environment; they may rely, at least partly, on knowledge acquired from the outside, but the generation of actions involves a representational step operating with fixed rules and relying on identifiable building blocks (1994). I argue strongly, therefore, for a 'central version' of motor imagery, leaning heavily on internal representations of the self in action and the movements of external objects. There is, in fact, a whole tradition of motor imagery in music education focusing on the actions of learning motor skills like playing a musical instrument. The pupil who watches the teacher demonstrating an action must imagine in his/her mind the teacher's action in order to reproduce it later on (Jeannerod, 1994, p. 187). What I argue for, however, is not this imagery of manifest movement of playing a musical instrument, but the mental simulation of the actions that could be executed at a virtual level while listening to music (see also Delalande, 1984, 1988; Molino, 1988; Lidov, 1987). There is much empirical evidence now that motor imagery and execution involve activities of very similar cerebral motor structures 'at all stages of motor control' (Crammond, 1997).

Conclusions

In this paper I have argued for a widening of the concept of musical imagery. Rather than defining imagery as a perceptual sensation in the absence of corresponding sensory input, I have claimed that imagery can be coperceptual as well. Imagery, then, is a capacity for structuring and organizing the actual experience through time. This stresses the role of the subject in organizing his/her experience, as well as his/her cognitive and conceptual tools for mediating between the perceptual input and his/her imaginative projections. The latter, however, are closely linked to embodied and enactive forms of cognition. Dealing with music, in fact, leans upon sensorimotor and ideomotor activity. The former is a conservative process that keeps step with the articulation through time; the latter is a predictive process that allows the listener to make predictions as to the actual unfolding of the music through time. The combination of both modalities makes the process of dealing with music a richer experience that allows the listener to process music in a perceptual and conceptual way. It does justice to both the subtleties of the sonorous articulation and the more abstract and internal dialogues that allow the listener to simulate the actual unfolding through time.

I have argued further for a listening strategy that makes possible a transition from merely sensorimotor processing to an ideomotor simulation that makes use of our bodily representations of goal-directed actions. This could be a promising area of future research. What I have in mind is a kind of retooling of music theory with new concepts and new paradigms. I only mention the possibility of motor and kinesthetic categorization of sound, the gestural approach as a behavioral tool for describing the reactions of the listener (Delalande, 1988), the possibility of describing music in terms of a grammar of movements (Baily, 1985), and finally the sensorimotor couplings that can be performed both at a manifest and at an internalized level.

References

Annett, J. (1996). On knowing how to do things: a theory of motor imagery. Cognitive Brain Research, 3, 65-69.
Annett, J. & Smith, R. (1988). Motor imagery in Parkinson's disease. In C. Cornoldi (Ed.), Pre-proceedings of the Second International Workshop on Imagery and Cognition (pp. 373-388). Padua.
Baily, J. (1985). Music Structure and Human Movement. In P. Howell, I. Cross, & R. West (Eds.), Musical Structure and Cognition (pp. 237-258). London: Harcourt Brace Jovanovich.
Bateson, G. (1985). Mind and Nature. London: Fontana Paperbacks.
Beck, B. (1987). Metaphors, Cognition and Artificial Intelligence. In R. E. Haskell (Ed.), Cognition and Symbolic Structures: The Psychology of Metaphoric Transformation (pp. 9-30). Norwood: Ablex Publishing Corporation.
Bergson, H. (1896). Matière et mémoire. Essai sur la relation du corps à l'esprit. Paris: Alcan.
Berthoz, A. (1996). The role of inhibition in the hierarchical gating of executed and imagined movements. Cognitive Brain Research, 3, 101-113.
Berthoz, A. (1997). Le sens du mouvement. Paris: Odile Jacob.
Berthoz, A. (1999). Le secret du geste. In A. Berthoz, Le cerveau et le mouvement. Comment nos gestes construisent notre pensée. Science & Vie, Hors Série, 204, 68-76.
Bideaud, J. & Houdé, O. (1991). Catégorisation, logique et prototypicalité. Aspects développementaux. In D. Dubois (Ed.), Sémantique et cognition. Catégories, prototypes, typicalité (pp. 55-70). Paris: Éditions du CNRS.
Bouveresse, J. (1995). Langage, perception et réalité. Nîmes: Jacqueline Chambon.
Butterworth, G. (1991). Phenomenal Permanence. In G. Thinès, A. Costall & G. Butterworth (Eds.), Michotte's Experimental Phenomenology of Perception (pp. 117-167). Hillsdale/London: Lawrence Erlbaum.
Carroll-Phelan, B. & Hampson, P. (1996). Multiple Components of the Perception of Musical Sequences: A Cognitive Neuroscience Analysis and Some Implications for Auditory Imagery. Music Perception, 13(4), 517-556.
Chion, M. (1983). Guide des objets sonores. Paris: Éditions Buchet/Chastel.
Costall, A. (1991). Phenomenal Causality. In G. Thinès, A. Costall & G. Butterworth (Eds.), Michotte's Experimental Phenomenology of Perception (pp. 51-64). Hillsdale/London: Lawrence Erlbaum.
Crammond, D. (1997). Motor imagery: never in your wildest dream. Trends in Neurosciences, 20, 54-57.
Decety, J. (1996). Do imagined actions share the same neural substrate? Cognitive Brain Research, 3, 87-93.
Deecke, L. (1996). Planning, preparation, execution, and imagery of volitional action. Cognitive Brain Research, 3, 59-64.
Delalande, F. (1984). La musique est un jeu d'enfant. Paris - Bry-sur-Marne: Buchet/Chastel - INA.
Delalande, F. (1988). Le geste, outil d'analyse: quelques enseignements d'une recherche sur la gestique de Glenn Gould. Analyse Musicale, 10, 43-46.
Deleuze, G. (1983). Cinéma 1. L'image-mouvement. Paris: Les Éditions de Minuit.
Di Pellegrino, G., Fadiga, L., Fogassi, L., Gallese, V. & Rizzolatti, G. (1992). Understanding motor events: a neurophysiological study. Experimental Brain Research, 91, 176-180.
Dretske, F. (1985). Precis of Knowledge and the Flow of Information. In H. Kornblith (Ed.), Naturalizing Epistemology (pp. 169-187). Cambridge/London: MIT Press.
Fillmore, C. (1984). Frames and the Semantics of Understanding. Unpublished manuscript. Dept. of Linguistics, Berkeley: University of California.
Gibson, J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin Company.
Gjerdingen, R. (1994). Apparent motion in music. Music Perception, 11(4), 335-370.
Glasersfeld, E. von (1995). Radical Constructivism: A Way of Knowing and Learning. London - Washington: The Falmer Press.
Godøy, R. I. (1997a). Formalization and Epistemology. Oslo: Scandinavian University Press.
Godøy, R. I. (1997b). Knowledge in Music Theory by Shapes of Musical Objects and Sound-Producing Actions. In M. Leman (Ed.), Music, Gestalt and Computing. Studies in Cognitive and Systematic Musicology (pp. 89-102). Berlin - Heidelberg: Springer Verlag.
Godøy, R. I. (1999). Cross-Modality and Conceptual Shapes and Spaces in Music Theory. In I. Zannos (Ed.), Music and Signs (pp. 85-98). Bratislava: ASCO Art & Science.
Gromko, J. & Poorman, A. (1998). Does perceptual-motor performance enhance perception of patterned art music? Musicae Scientiae, II(2), 157-170.
Hoffman, R. & Honeck, R. (1987). Proverbs, Pragmatics, and the Ecology of Abstract Categories. In R. E. Haskell (Ed.), Cognition and Symbolic Structures: The Psychology of Metaphoric Transformation (pp. 121-140). Norwood: Ablex Publishing Corporation.
Intons-Peterson, M. (1992). Components of Auditory Imagery. In D. Reisberg (Ed.), Auditory Imagery (pp. 45-72). Hillsdale, New Jersey and London: Lawrence Erlbaum.
Jackendoff, R. (1987). Consciousness and the Computational Mind. Cambridge, Mass. and London: MIT Press.
Jackendoff, R. (1988). Conceptual semantics. In U. Eco, M. Santambrogio & P. Violi (Eds.), Meaning and mental representations (pp. 81-97). Bloomington & Indianapolis: Indiana University Press.
Jacobson, E. (1930). Electrical measurements of neuromuscular states during mental activities. I. Imagination of movement involving skeletal muscle. American Journal of Physiology, 91, 567-608.
James, W. (1901/1890). Principles of Psychology, II. London: Macmillan.
James, W. (1976). Essays in Radical Empiricism. Cambridge, Mass. and London: Harvard University Press.
Jeannerod, M. (1994). The Representing Brain: Neural Correlates of Motor Intention and Imagery. Behavioral and Brain Sciences, 17, 187-202.
Johnson, M. (1987). The Body in the Mind. The Bodily Basis of Meaning, Imagination, and Reason. Chicago and London: The University of Chicago Press.
Jones, M. R. (1987). Dynamic pattern structure in music: Recent theory and research. Perception and Psychophysics, 41(6), 631-634.
Jones, M. R. & Boltz, M. (1989). Dynamic attending and responses to time. Psychological Review, 96, 459-491.
Kant, I. (1974/1790). Kritik der Urteilskraft (Ed. K. Vorländer). Hamburg: Felix Meiner.
Knops, L. (1947). Contribution à l'étude de la "naissance" et de la "permanence" phénoménales dans le champ visuel. In Extrait de 'Miscellanea Psychologica Albert Michotte' (pp. 560-610). Louvain: Institut supérieur de philosophie.
Kosslyn, S. (1980). Image and Mind. Cambridge, Mass.: Harvard University Press.
Kurth, E. (1931). Musikpsychologie. Berlin: Max Hesses Verlag.
Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. Chicago: University of Chicago Press.
Lakoff, G. (1988). Cognitive Semantics. In U. Eco, M. Santambrogio & P. Violi (Eds.), Meaning and mental representations (pp. 119-154). Bloomington & Indianapolis: Indiana University Press.
Langacker, R. (1987). Foundations of cognitive grammar, Vol. 1. Stanford, CA: Stanford University Press.
Le Ny, J.-F. (1994). Les représentations mentales. In M. Richelle, J. Requin & M. Robert (Eds.), Traité de psychologie expérimentale, 2 (pp. 183-223). Paris: Presses Universitaires de France.
Liberman, A. & Mattingly, I. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.
Lidov, D. (1987). Mind and body in music. Semiotica, 66(1/3), 69-97.
MacKay, D. (1992). Constraints on Theories of Inner Speech. In D. Reisberg (Ed.), Auditory Imagery (pp. 121-149). Hillsdale, New Jersey and London: Lawrence Erlbaum.
Mahoney, M. & Avener, M. (1987). Psychology of the elite athlete. Cognitive Therapy and Research, 1, 135-141.
Mazet, C. (1991). Fonctionnalité dans l'organisation catégorielle. In D. Dubois (Ed.), Sémantique et cognition. Catégories, prototypes, typicalité (pp. 89-100). Paris: Éditions du CNRS.
McDermott, J. (1968). The Writings of William James. A Comprehensive Edition. New York: Random House.
Mersmann, H. (1926). Angewandte Musikästhetik. Berlin: Max Hesses Verlag.
Michotte, A. (1991). On phenomenal permanence: fact and theories. In G. Thinès, A. Costall & G. Butterworth (Eds.), Michotte's Experimental Phenomenology of Perception (pp. 122-139). Hillsdale/London: Lawrence Erlbaum.
Miereanu, C. (1998). Stratégies du discontinu. Vers une forme musicale accidentée. In C. Miereanu & X. Hascher (Eds.), Les Universaux en musique. Actes du 4e Congrès international sur la signification musicale (pp. 31-42). Paris: Publications de la Sorbonne.
Mikumo, M. (1994). Motor Encoding Strategy for Pitches of Melodies. Music Perception, 12(2), 175-197.
Molino, J. (1988). La musique et le geste: prolégomènes à une anthropologie de la musique. Analyse musicale, 10, 8-15.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
Paillard, J. (1977). La machine organisée et la machine organisante. Revue de l'éducation physique belge, 27, 19-48.
Paillard, J. (1987). Vers une psychobiologie de l'intentionalité. In M. Laurent & P. Therme (Eds.), Recherche en activités physiques et sportives 2 (pp. 163-194). Marseille: Éditions UEREPS, Université d'Aix-Marseille II.
Paillard, J. (1990). Réactif et Prédictif: deux modes de gestion de la motricité. In V. Nougier & J. Bianqui (Eds.), Pratiques sportives et modélisation du geste (pp. 13-56). Grenoble: Université Joseph-Fourier.
Paillard, J. (1994a). La conscience. In M. Richelle, J. Requin, & M. Robert (Eds.), Traité de psychologie expérimentale, 2 (pp. 639-684). Paris: Presses Universitaires de France.
Paillard, J. (1994b). L'intégration sensori-motrice et idéo-motrice. In M. Richelle, J. Requin, & M. Robert (Eds.), Traité de psychologie expérimentale, 1 (pp. 925-961). Paris: Presses Universitaires de France.
Piaget, J. (1967). Biologie et connaissance. Essai sur les relations entre les régulations organiques et les processus cognitifs. Paris: Gallimard.
Ransdell, J. (1986). On Peirce's Conception of the Iconic Sign. In P. Bouissac, M. Herzfeld & R. Posner (Eds.), Iconicity. Essays on the Nature of Culture. Festschrift für Th. A. Sebeok (pp. 51-74). Tübingen: Stauffenburg Verlag.
Reybrouck, M. (1995). Spanning en Ontspanning in de Muziek. Een semiotische benadering van de omgang met muziek (Tension and Relaxation in Music. A semiotic approach of dealing with music). PhD Thesis (unpublished). University of Louvain.
Reybrouck, M. (1997). Gestalt Concepts and Music: Limitations and Possibilities. In M. Leman (Ed.), Music, Gestalt and Computing. Studies in Cognitive and Systematic Musicology (pp. 57-69). Berlin - Heidelberg: Springer Verlag.
Reybrouck, M. (1998, December 1-5). Deixis, pointing and categorization as operational tools for musical semantics. Lecture delivered at the Sixth International Conference on Musical Signification, Aix-en-Provence (to appear).
Reybrouck, M. (1999). The musical sign between sound and meaning. In I. Zannos (Ed.), Music and Signs (pp. 39-58). Bratislava: ASCO Art & Science.
Reybrouck, M. (2000). Biological roots of musical epistemology: functional cycles, Umwelt and enactive listening. Semiotica, 131(1/4).
Reybrouck, M. (2001). Musical semantics between epistemological assumptions and operational claims. Acta Semiotica Fennica (to appear).
Rollins, M. (1989). Mental Imagery. On the Limits of Cognitive Science. New Haven & London: Yale University Press.
Rosenbaum, D. A. (1991). Human Motor Control. San Diego - New York: Harcourt Brace.
Saslaw, J. (1996). Forces, containers, and paths: the role of body-derived image schemas in the conceptualization of music. Journal of Music Theory, 40(2), 217-243.
Schaeffer, P. (1966). Traité des objets musicaux. Paris: Éditions du Seuil.
Scheerer, E. (1984). Motor theories of cognitive structure: a historical review. In W. Prinz & A. Sanders (Eds.), Cognition and motor processes (pp. 77-97). Berlin: Springer.
Schenker, H. (1956/1933). Neue musikalische Theorien und Phantasien III: Der freie Satz. Wien.
Serafine, M. (1988). Music as Cognition. The Development of Thought in Sound. New York: Columbia University Press.
Todd, N. P. M. (1999). Motion in Music: A Neurobiological Perspective. Music Perception, 17(1), 115-126.
Uexküll, J. von (1957/1934). A Stroll Through the Worlds of Animals and Men. A Picture Book of Invisible Worlds. In C. Schiller (Ed.), Instinctive Behavior. The Development of a Modern Concept (pp. 5-79). New York: International Universities Press.
Varela, F., Thompson, E. & Rosch, E. (1991). The Embodied Mind. Cognitive Science and Human Experience. Cambridge, MA and London: MIT Press.
Viviani, P. (1990). Motor-perceptual interactions: the evolution of an idea. In M. Piattelli-Palmarini (Ed.), Cognitive Science in Europe: Issues and Trends (pp. 11-39). Golem.
Viviani, P. & Stucchi, N. (1992). Biological movements look uniform. Evidence of motor-perceptual interactions. Journal of Experimental Psychology: Human Perception and Performance, 18, 603-623.
Whitehead, A. N. (1927). Symbolism. Its Meaning and Effect. New York: Capricorn Books.
Zatorre, R. & Halpern, A. (1993). Effects of unilateral temporal-lobe excision on perception and imagery of songs. Neuropsychologia, 31, 221-232.


8

Musical Imagery as Related to Schemata of Emotional Expression in Music and on the Prosodic Level of Speech

Dalia Cohen and Edna Inbar

Introduction

How does the concept of musical imagery relate to sound schemata? Are there sound schemata that guide and create images of emotional expression, or types of emotions, either consciously or unconsciously, in music and speech? What is the role of natural and learned schemata in emotional communication in music and speech?

We assume that musical imagery is related to musical schemata that are formed in our minds and stored in our long-term memory, and that every act of perception takes place in relation to these schemata. It is associated with rules that relate to different levels of musical organization, which are meaningful to the types of experiences that music is supposed to evoke in the listener. Thus, musical imagery comprises the different kinds of experiences and emotions embedded in musical schemata. We can also talk about 'emotional schemata' (or 'emotive images', according to Reber [1985]), which may be manifested in other expressive modes besides music, such as the prosodic level of speech, which, our findings indicate, is based on similar musical schemata. They may also appear in gestures, facial expressions, and so on.

Thus, musical imagery is inherently connected to schemata in 'non-musical' domains, including language. Linguistic terms play a complementary communication role in the expression of emotions, and they may evoke associated emotional schemata. Hence, in addition to addressing psychophysical phenomena such as pitch imagery (in the 'inner ear') and color associations, we suggest a broader scope for considering musical imagery, including metaphorical images and concepts in different modalities.

It is important to distinguish between learned schemata as formulated in theory - which are based on selected combinations of various measurable parameters (scales, harmonic patterns, metric patterns, etc.) and are culturally determined (although not necessarily arbitrary) - and 'natural' schemata, which are not culture-dependent, are familiar from outside the realm of music and of art in general, have emotional associations, are not expressed in precise quantitative terms, and are hardly mentioned in Western tonal music theory.

In our opinion, the natural schemata are characterized mainly by the ranges of occurrence of the various parameters (texture in its broad sense [Cohen & Dubnov, 1997]); meaningful curves of change over time; types of operations that can be regarded as cognitive; and the degree of definability as determined by the degree of categorization of the various elements and by concurrence/nonconcurrence of parameters and of the boundaries of units. Here we focus mainly on the natural schemata and their contribution to the creation of emotional images in music.

Numerous research studies have been and are still being conducted on the production and perception of emotions, but the question of how emotions are communicated and how musical imagery is evoked has not yet been fully answered. This is especially true if one takes into account the fact that the excitement factors may be manifested either in a written piece, which includes learned and natural schemata, or on the level of performance, which relates to natural schemata only. The two may also be nonconcurrent ('noncongruent affect') - as in an excited performance of a calm piece or a calm performance of a piece based on excitement factors. In addition, excitement in speech may appear on the semantic-lexical level and the prosodic-emotional level, with or without concurrence.

The present study is an attempt to contribute to this broad field of research and to address the above questions by presenting a theoretical model and by conducting experiments accordingly, with reference mainly to natural schemata.

By and large, an understanding of the meaning of emotions in speech and music is based on their classification. For example, in the psychological literature, we find two major categories, which may partly overlap each other (note that most of the experiments relate to emotional facial expressions):

1. Simple or basic emotions (joy, surprise, anger, sadness, disgust, fear, guilt, and shame) versus complex emotions (e.g., anxiety, which is composed of the basic emotion of fear plus sadness, guilt, and/or shame; and depression, as a basic emotion of fear plus sadness, anger, and/or disgust, and the like) (Izard & Buechler, 1980; Plutchik, 1980; and others). The complex emotions are also defined as 'emotional patterns' (as we also discovered in our study; for example, longing, which may be composed of both joy and sadness).

2. Positive emotions (the only well-defined one is joy) versus negative emotions (sadness, anger, disgust, fear, etc.) (Lazarus et al., 1980; Heath, 1986).1


As for emotional expression in music, there are three main fields of research:

1. In the media of written music, some studies discuss the rules for general excitement versus calm (e.g., Hanslick, 1854; Sacks, 1946; Meyer, 1956; Cohen, 1971), and some concentrate on specific emotions (e.g., Fónagy & Magdics, 1963/1972; Dowling & Harwood, 1986; Sloboda, 1991; Hacohen & Wagner, 1997; and articles in the special issue on emotional expression and communication in music, Psychology of Music, 24(1) [1996]).

2. Studies of musical performance, which came much later, look into the rules of performance that aim to express specific emotions (e.g., Gabrielsson & Juslin, 1996), and rules of performance of any other type (laboratories in Stockholm, at the University of Geneva, Haskins Laboratories in New Haven, the Ohio State University, IRCAM in Paris, and vocal analysis at the Hebrew University of Jerusalem). The importance of these studies (even when they do not deal directly with excitement) in the matter at hand lies in the very finding of 'natural regularity in the performance',2 such as deviation from precise 'correct' performance; ways to emphasize a specific note or musical units; the natural tendency to coordinate parameters3 (e.g., Gabrielsson, 1974, 1987; Clynes, 1977; Cohen, 1978; Sundberg, 1982; Sundberg et al., 1991; Clarke, 1988; Todd, 1985, 1992; and Repp, 1998); or 'expressive timing in the mind's ear' (Repp, this volume). The greatest challenge in research of performance was set forth by Clynes, who attempts to formulate rules of performance for the computer so as to reflect the written content of the music.

3. Studies of emotional expression by the human voice (verbal or non-verbal) and animal vocalizations from various perspectives, such as List (1963), on the transition from speech that evokes excitement to music; Lieberman (1975), on the innate linguistic and musical potential in infants; Cohen (1983) and Katzir (1995), on the meaning of bird calls; Cohen (1986), on the rules of recitation in the performance of Rig Veda hymns (which are considered exciting); Bolinger (1972), a collection of works some of which focus on the level of 'attitude' in speech; Sundberg's comparison of emotion in speech and music (Sundberg, 1982); Scherer's comparison of emotions in speech and those expressed facially (Scherer, 1986); Rapoport (1998), on the different tone qualities and timbral-expressive elements with regard to the emotional expression of opera singers; and the comprehensive work of Fónagy and Magdics (1963/1972), who also examined musical parallels to the emotions expressed by the human voice.4

In the present study, we are interested in elucidating the universal factors that shape emotional auditory images both in written and performed music and in speech, with reference to general and specific types of emotions with potential for family-resemblance relationships. We focus on presenting several theoretical assumptions and a selection of examples and analyses from our comprehensive study.


140 IMAGES OF EMOTIONAL EXPRESSION IN MUSIC AND SPEECH

Suggestions and assumptions

The assumptions pertain to (1) types of excitement that are expressed aurally; (2) rules of excitement that are derived from universal principles of sound organization and that may be regarded as overall schemata of excitement versus calm, by means of which we attempt to characterize the various types of emotions; and (3) music and speech - the meaning of their hierarchical levels.

Types of excitement and emotions and basic principles that can shape them

Variables of types of excitement and emotion
Emotional expression has many facets. The following overall classifications seem to be relevant to the emotions produced by auditory material:
• Latent-complex-abstract excitement (E₁) (as a general contrast to calm) - familiar primarily from artistic organization - versus direct-immediate-concrete (E₂), which is familiar from nonmusical contexts and bears a specific name (e.g., happy or sad), with intermediate states between E₁ and E₂.⁵

• Positive/negative (as mentioned above, but only with respect to E₂). This variable is problematic with respect to E₁ because 'in art, every tear becomes a pearl' (as the Israeli poet Zelda put it). It is further complicated by the possible intervention of variables that are characteristic of works of art, such as interesting/boring and pleasant/unpleasant (which may be due not only to the specific form of organization but also to psychoacoustic constraints). Consequently, both relaxation and tension may be positive or negative. Nevertheless, an analogy is possible between the transition from negative to positive and a transition from states of tension and uncertainty to resolution. The distinction between positive and negative is not entirely clear with respect to E₂ either.

• Basic and well-defined versus mixed (as mentioned above). An example in the direct E₂ type is well-defined joy, as opposed to mixed love or longing, which may appear in combination with opposite emotions such as joy and sadness. In the latent E₁, the mixed type involves a simultaneous appearance of musical units with several meanings or several interpretations of the same unit. This classification partially overlaps the classification of clear/unclear.

• Externalized/internalized or overt/suppressed. With respect to E₂, some of the emotions may appear at the two extremes (e.g., suppressed anger as opposed to explosive anger); some may be only at one extreme (e.g., sadness, which is internalized); and some are unclear. The same is true for E₁, in which all the possibilities may be found.

Other types (calm and neutral):
• Relaxed or calm, which is the general opposite of all the pairs of opposites in E₁ and E₂ and, as stated above, may itself be considered a definite expression, since relaxation is a specific kind of emotion. It may be regarded as a reference point for the other types of emotions. If we define the rules of relaxation, then violation of these rules may be considered excitement.

• 'Neutral.' Though it is now a fairly well-accepted concept, it does not really exist in music or speech. If we think of it as an intermediate state between positive and negative, then we get apathy, which can be seen as an opposite of both calm and excitement of all kinds. In musical performance, neutralization of the performer's creative role, i.e., precise performance of the written material, is meaningless (Repp, 1989). The same is true of the rules of performance in speech (Laufer, 1987). Another point is that the degree of difficulty in expressing emotions vocally varies from emotion to emotion and may also be related to the general characterization of the emotions as clear or unclear.

All these (except neutrality) may be manifested in various media: the written music, the musical performance, written or auditory verbal material (speech or recitation), and nonverbal sounds (screaming, crying, etc.). Different media emphasize different emotional variables. For example, E₁ is more prominent in written music; E₂, in contrast, is found in almost all media, but it is especially salient when it appears in pure form in nonverbal utterances (List, 1963). In music, E₂ always appears in addition to E₁ and is less concrete than in excited speech.

General rules of excitement, based on natural schemata
• Intensification in all parameters. Therefore, a 'natural' ending for a unit in music or speech (calm following tension) will involve a descent in pitch, density, and intensity, whereas an expression of excitement, or a question that requires a continuation, will end with an ascent.

• Sudden versus gradual change in all parameters. This is one of the characteristics of the rules of Gestalt. According to this factor, the aforementioned 'natural' ending can also be explained in that it disappears gradually. Some examples are large melodic intervals (as opposed to a second) and sudden switches from slow to rapid notes. The laws of 'natural' melodic progressions, formulated by Narmour (1990), are one realization of this principle.

• Deviation from the optimal normative range (represented by an inverted U function for the various parameters) toward either extreme (Cohen & Granot, 1995): with respect to register (very high/very low); ambitus (very large/very small); intensity; density; degree of change (zigzag/great evenness, i.e., no change under certain conditions); complexity; etc. The two extremes (which are not symmetrical) are what lead to the above classification of the emotions as 'externalized' or 'internalized.' An especially prominent example of the two extremes is Kabuki, which tends to be externalized, as opposed to No, which is internalized. A marked example of unexcited expression can be found in the rules of Palestrina counterpoint, which express maximum adherence to the optimum range.

• Nonconcurrence between the directions of progressions in various parameters (e.g., an increase in pitch in diminuendo; or a convex curve in one parameter with a zigzag or concave curve in another parameter) or between the boundaries of simultaneous units that are determined by learned or natural schemata (Cohen & Wagner, 2000). Nonconcurrence contributes to complexity and to uncertainty in defining the various events. For example, concurrence is salient in the works of Mozart, whereas nonconcurrence is salient in the works of Bach. Many of the rules of performance mentioned above are essentially an expression of states of concurrence/nonconcurrence of parameters.
• Uncertainty as to the continuation of the progression (maximum certainty may be caused by learned schemata such as the 'directional' harmonic patterns I-IV-V-I, or by natural schemata such as the convex curve, or 2ⁿ (1+1+2+4 ...), the rules of Gestalt, and concurrence). For example, concave curves are salient in excited expression, such as the recitation of the Rig Veda hymns, or in Romantic music, and are forbidden in Renaissance music, especially that of Palestrina, where the convex curve is seen on various levels.

• Deviation from expectations (a much-discussed factor) - an expansion of point 3 with respect to the inverted U function and of additional points, if we assume 'natural expectations' (e.g., deviation from natural or learned schemata).

All these rules may appear on various levels of organization in auditory expressions, separately and in various combinations, and may express different types of emotions. These principles enable us to formulate some rules of styles governed by the ideal of tension versus calm (as in Baroque versus Renaissance music) and to characterize various emotions in excited speech. They may even serve as criteria for distinguishing between bird calls in a state of excitement (e.g., presence in hostile surroundings or during a fight) and a state of calm (Cohen, 1983). Most of these rules can be explained psychologically or biologically (see, for example, Lieberman, 1963; Fónagy & Magdics, 1963/1972; Meyer, 1965; Bolinger, 1972; Cohen, 1983; Katzir, 1995; Wallin et al., 2000).

Some assumptions regarding music and speech
Many studies have been done of various aspects of this subject. Here we contribute our view on the subject by elucidating just three points: (1) types of parameters; (2) basic levels; (3) learned and natural schemata.

Types of parameters and schemata based on them in music and speech
Here we distinguish between two types of general schemata (A and B [possible intermediate stages are not presented here]) and subschemata (A₁, A₂ and B₁, B₂), as follows:
A. Schemata based on parameters that permit complex hierarchical organization - pitch and duration (intensity is an essential concomitant for purposes of emphasis).

B. Timbre, which, as we know, does not lend itself to clear hierarchical organization.

• A₁: Learned schemata based on measurable sizes with well-defined categories: from the parameter of pitch - a coherent interval system, scales, chords, etc.; from the parameter of duration - rhythms, meters, etc.

• A₂: Natural schemata that are not stated in precise terms but as greater than ..., less than ..., or equal to .... They are familiar to us outside of music and are meaningful. These are texture and curves of change for the various parameters, including intensity (e.g., a convex curve, a flat curve, or a zigzag curve for various parameters).


Figure 1. A comparison of music and speech in terms of the parameters. [The figure contrasts schemata of type A (pitch, duration, intensity) with type B (timbre): A₁ - exact sizes with well-defined categories (from pitch: intervals, scales, etc.; from duration: meter, rhythm, etc.); A₂ - 'types' of curves defined by more/less/equal (texture); B₁ - well-defined categories (phonemes, some musical instruments); B₂ - unclear classification (methods of performance).]

• B₁: A system of timbres with clear categories, in which the timbre can be perceived even out of context as an isolated, indivisible event, as in the phoneme system or certain musical instruments.

• B₂: Poorly defined events in the gray area between texture and timbre, where tiny, unmeasurable, but significant changes occur (Cohen & Dubnov, 1997). They are salient in manners of performance (e.g., types of staccato and legato, small changes in duration and intensity) and especially in artistic singing.

The meaning of levels in speech and music
Speech, as we know, is composed of hierarchical levels, from the lexical/semantic level to the prosodic/emotional level; between them are levels expressed mainly in terms of the three 'musical factors in speech' (A₂) - pitch, duration, and intensity - which are primarily responsible for shaping natural schemata. The prosodic sublevels are as follows: classification of syllables (stressed/unstressed; long/short [this depends on the particular language]); classification of words based on the location of the stressed syllable and its length (in Hebrew this distinction is very important); the syntactic level, which determines units of meaning; the level of the attitude or intention that relates directly to the text; and the pure emotional level (the last two levels may partly overlap). This arrangement of the sublevels in speech reflects their proximity to or distance from the two extremes - the 'pure' lexical level and the emotional level - which, to a large extent, reflect the two polar schemata, learned and natural.


Each of these prosodic levels represents various interpretations of the lexical level, although they can exist on their own. They are also involved in shaping the levels above them. In real life, we are exposed to all the levels simultaneously, and we may pay attention to different levels each time. Moreover, all the levels share in shaping the curves of the three parameters; in order to focus on a particular level we have to eliminate the contribution of the factors other than the one that interests us. Here we focus mainly on the purely emotional level, taking into account the other levels. We note that most studies that discuss the prosodic levels of speech and their analogies in music do not focus on the emotional prosodic level (see the special issue on this subject in Music Perception, 16(1) [1998]).

In music we can speak in very general terms about the level of the raw material (characteristic of a specific culture, period, or style), which is divided into sublevels of learned schemata; the rules of composition (which include operations and may be thought of as various realizations of the raw material); the written piece, which can be regarded as one realization of the rules of composition, is itself divided into sublevels, and includes natural schemata; and the performance, which is based only on natural schemata (Cohen & Granot, 1995). It should be stressed that this clear separation between levels is typical of Western music, and especially of tonal music.

In speech, of course, the prosodic-emotional level is manifested in vocalization only by means of natural schemata, while the musical parallel is expressed both in the written composition and in the performance.

Learned and natural schemata on various levels of speech and music
Here we focus on schemata based on the four types of parameters A₁, A₂, B₁, B₂ (presented in Fig. 1) in relation to the two types of emotions: E₁ (latent) and E₂ (direct). As was mentioned above, all the intermediate states can be found between E₁, which is the primary message of the form of musical organization, and E₂, which is the primary message of the prosodic-emotional sublevel and reaches its peak in nonverbal sounds.

We classify these schemata as follows (see Table 1):

1. Learned schemata, which constitute the building blocks of music theory and the lexical level of speech.

2. Natural schemata (NS), which are manifested in three ways:
a. Texture and contours in music and in speech (the nonverbal aspects);
b. Rules of composition that reflect cognitive constraints on images and metaphors in poetry and music;
c. Rules of performance for music and for speech.

The experiments

Three types of experiments were conducted: (α) subjects' responses; (β) analysis of the auditory material; (γ) composers' 'interpretations'.


DALIA COHEN AND EDNA INBAR 145

Table 1. Comparison of Music and Speech (A - schemata based mainly on pitch and duration; B - based on timbre; A₁, B₁ - well defined; A₂, B₂ - not well defined)

Learned schemata
  Music: theoretical material of traditional music (not necessarily arbitrary)
  Speech: lexical-semantic level (arbitrary, not necessarily vocal)
  Selected 'raw' material in hierarchical levels
    Music: intervals, scales, chords, rhythm and meter; combinations due to the stylistic ideal
    Speech: phonemes, syllables; combinations create units of meaning
  Types of sound - Music: in the 'tonal West': A₁; elsewhere: A₂, B. Speech: timbre: B₁; combinations: also B₂, A₂
  Types of excitement - Music: E₁. Speech: E₁-E₂

Natural schemata (texture)
  Music: texture; range of occurrence and contours - in all parameters
  Speech: prosodic level; interpretation of the lexical-semantic level: syntax (parsing), attitude, emotions
  Types of sound - Music: A₂. Speech: A₂
  Types of excitement - Music: E₁, E₂. Speech: E₂

Natural schemata (compositional rules, known in various areas in addition to the auditory field, including rules of symmetry, order/disorder, etc.)
  Music: structures and forms; various types of organization of similarities and differences; types of repetition; categories of operations
  Poetry: rules of rhythm, rhyme, and overall structure, based on syllabic classification; an additional artistic dimension to the lexical material
  Types of excitement - Music: E₂, E₁. Poetry: E₁

Natural schemata (performance)
  Music: various interpretations of the written composition
  Speech: the prosodic level (with all sublevels)
  Types of sound - Music: mainly A₂, B₂. Speech: A₂
  Types of excitement - Music: E₁, E₂. Speech: E₂

Method in experiment α: Subjects' verbal responses to the emotions in auditory sentences
The experiments examine listeners' responses to sentences aimed at expressing emotions of various types, on either the lexical level, the prosodic level, or both levels. Each verbal sentence was performed under three conditions: (1) concurrence between the lexical and prosodic levels; (2) nonconcurrence between the two levels; (3) neutrality on the prosodic level.


The selected emotions
Seven emotions were selected: anger (overt and suppressed), joy, firmness, disgust, fear, sadness, and longing. The selection was based on the assumption that these emotions represent different emotional dimensions, such as positive-negative (joy - sadness), externalized-internalized (anger - for both; sadness - internalized only), and simple-complex (in our case, all of the emotions except longing are considered simple and basic), and all of them can be expressed vocally to various extents of clarity (in contrast to an emotional mental state such as love, which may be expressed in an unlimited number of ways). We should already note that this classification is not simple.

In our study each emotion was represented in several different sentences (between four and ten), taking into account both lexical and prosodic properties in various combinations.

The sentences on the lexical level: General
• The sentences were in ordinary, standard 'student/adult language' (in Hebrew).
• No emotional terms were inserted, and the emotional meaning was supposed to be inferred from the context.
• The sentences were relatively short (from ten syllables for the shorter sentences to twenty-two syllables for the longer ones). The average duration was four seconds.

The sentences: Specific
Thirty-three different sentences, representing the seven emotions (each one performed in three different prosodic expressions), were selected for the experiment.

The selection was done in two steps: (1) Five 'judges' had to determine the emotional content of written verbal sentences from a list of the seven emotions; only sentences that achieved 80% agreement were included on the list. (2) After listening to a recording of these sentences (each sentence had at least three prosodic versions) spoken by a professional radio broadcaster (a woman), the judges had to exclude from the list all of the poorly uttered sentences, again with 80% agreement. The final selection included 94 sentences.
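The 80%-agreement criterion used in both steps can be sketched as a simple filter. This is our illustration, not part of the original study, and the judge labels below are invented:

```python
# Illustrative sketch (not from the original study) of the 80%-agreement
# criterion: a sentence is kept only if at least 80% of the judges
# assign it the intended emotion.
from collections import Counter

def passes_agreement(judgments, target_emotion, threshold=0.8):
    """Return True if the share of judges naming the target emotion
    reaches the threshold (judgments: list of emotion labels)."""
    counts = Counter(judgments)
    return counts[target_emotion] / len(judgments) >= threshold

# Invented example: five judges label two candidate sentences.
print(passes_agreement(["joy", "joy", "joy", "joy", "longing"], "joy"))            # True (4/5)
print(passes_agreement(["anger", "firmness", "anger", "fear", "anger"], "anger"))  # False (3/5)
```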

Utterance of the sentences: Combinations of prosodic and lexical levels
Each of the sentences was read in three ways:
1. With the lexical and prosodic levels in concurrence, i.e., eliciting the emotion inherent in the lexical level.
2. With the lexical and prosodic levels in nonconcurrence, i.e., vocally expressing a different emotion from that of the written text.
3. In a 'neutral' or 'flat' voice.

The sentences were read and recorded following these instructions: for the concurrent sentences, use 'ordinary' speech, and for the nonconcurrent sentences, stress (without exaggerating) the emotion on the prosodic level. (In this respect, our study differs from most other studies, in which the emotion selected was expressed in exaggerated fashion on the emotional level.) The reader was not given specific instructions for the 'neutral' sentences.

The recorded sentences were edited and re-recorded in random order in terms of the different emotions, with a five-second pause between sentences.

Subjects
The experiment was carried out with 45 subjects, all adult university students, who were divided into two groups: musicians (M), n=15, and non-musicians (NM), n=30. The musicians were all seniors at the Jerusalem Rubin Academy of Music. The non-musicians were graduate students in the School of Education at the Hebrew University of Jerusalem. Their ages ranged from 25 to 40.

Procedure
The experiment took place in a regular classroom. The students were given a questionnaire on which to write their responses to the recorded sentences. The sentences were introduced to the subjects on an ordinary Sony cassette recorder in random order in four blocks (22 to 24 sentences per block), with a five-second pause between sentences and a two-minute pause between blocks. Altogether, the experiment lasted about 20 minutes.

Task
The subjects had to listen to 94 recorded sentences. During the five-second pause before the next sentence, they had to choose one of the seven emotions indicated on the chart and write it on the questionnaire. They could mark 'other' and/or 'unclear' instead of or in addition to the seven categories of emotions.

Method in experiment β: Analysis of the auditory material
This analysis was carried out in two ways: (1) on a selection of sentences with the aid of the melograph (CSL Pitch program, Model 4331); (2) by listening to the entire set of recorded sentences. In both cases the analysis was based on the fundamental factors of excitement/relaxation. We note here already that fairly good coordination between the two types of analysis was found.

Melographic analysis
The melographic analysis provides the experiment with information about the three 'musical factors in speech': pitch, duration, and intensity. The factor of timbre, which is extremely important in determining some of the emotions, as the studies of Fónagy and Magdics (1963/1972) and Gabrielsson (1996) have shown, was not examined here. The curve of intensity most clearly represents the rhythmic organization of the various levels between the lexical and emotional levels. Some pairs of concurrent/nonconcurrent sentences were selected for analysis.⁶

Method in experiment γ: Composers' 'interpretations' of verbal emotional content
This experiment was based on two types of data: In the first part, we asked composers to compose musical phrases for each of the seven emotions, based on written verbal sentences taken from the set of sentences used in experiment α. They were instructed to compose a one-part melodic line for each sentence (seven melodic phrases in all). In the second part, we analyzed musical pieces in which the emotional content is explicitly stated and compared their features with our findings from the verbal responses.

Results of experiment α: Findings from the subjects' responses

Statistical analysis
We calculated the percentage of correct identifications of each emotion by each subject in each group: musicians (M) and non-musicians (NM). Then we calculated the overall mean of correct identifications for each emotion. We used a MANOVA procedure to test group differences in correct identification across the spectrum of emotions. The MANOVA was applied separately to each of the research conditions (neutral; concurrent; nonconcurrent).
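The first computation described here - per-subject percentage of correct identifications, then the group mean per emotion - can be sketched as follows. The subject data are invented for illustration, and the MANOVA step itself (run on the real response matrix with standard statistical software) is not reproduced:

```python
# Sketch of the scoring step above: per-subject % correct for each emotion,
# then the mean over a group. Data are invented; the MANOVA is omitted.
from statistics import mean

def subject_score(responses, targets, emotion):
    """% correct for one subject on trials whose target is the given emotion.
    responses/targets are parallel lists of emotion labels."""
    trials = [(r, t) for r, t in zip(responses, targets) if t == emotion]
    return 100.0 * sum(r == t for r, t in trials) / len(trials)

def group_mean(subjects, emotion):
    """Mean % correct for one emotion across (responses, targets) pairs."""
    return mean(subject_score(r, t, emotion) for r, t in subjects)

# Two invented subjects, four trials each.
musicians = [
    (["joy", "anger", "joy", "fear"], ["joy", "anger", "joy", "sadness"]),
    (["joy", "firmness", "joy", "fear"], ["joy", "anger", "joy", "fear"]),
]
print(group_mean(musicians, "joy"))    # 100.0
print(group_mean(musicians, "anger"))  # 50.0
```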

Responses to concurrent sentences
In concurrent sentences the two levels - lexical and prosodic - are inherently related, and we therefore expected a great deal of agreement among the subjects regarding the emotional content of the sentences. Nevertheless, there was quite significant scatter among the responses, and not all of the emotions were identified clearly.

The MANOVA found a significant difference between the two groups - musicians (M) and non-musicians (NM) - in terms of correct identifications of the emotions under concurrent conditions (F(1,43) = 6.50, p<0.05). In order to identify the source of the difference in the between-groups effect, we examined the univariate analysis of group differences for each emotion. Significant differences between groups were found in longing (F(1,43) = 4.61, p<0.05) and in firmness (F(1,43) = 5.70, p<0.05). In both cases the musicians were better at identification.

The findings indicate three groups of emotions, according to their degree of definability: (1) joy (79.75%) and firmness (74.81%); (2) anger (57.77%), fear (55.92%), and longing (53.88%); and (3) disgust (31.11%) and sadness (17.77%).

Although slight differences were found between the musicians and the non-musicians, the picture of three groups of emotions was maintained. This general grouping is in part supported by other factors and in part requires consideration of the 'substitutes,' as will be described later.

Responses to nonconcurrent sentences
In this experiment, responses could relate either to the lexical or to the prosodic level. The main findings are:
1. Subjects in both groups responded significantly better to the prosodic level than to the lexical level. Musicians' responses: lexical level - 12.7%; prosodic - 42.2%. Non-musicians' responses: lexical - 9.3%; prosodic - 26.6%.

2. The MANOVA procedure found a significant difference between the two groups. Musicians tended to respond significantly better than non-musicians to the prosodic level (F(1,43) = 21.65, p<0.001). (In group 1, F(1,14) = 26.409, p<0.01; in group 2, F(1,29) = 9.382, p<0.05.)


In order to identify the source of the difference in the between-groups effect, we examined the univariate analysis of group differences for each emotion. Significant differences were found for disgust (F(1,43) = 7.65, p<0.05), anger (F(1,43) = 9.60, p<0.05), fear (F(1,43) = 5.20, p<0.05), and firmness (F(1,43) = 7.17, p<0.05). The emotion most frequently identified by both groups was firmness.

Responses to neutral sentences
In the neutral sentences, ostensibly the only option was to respond to the semantic content. Nevertheless, the prosodic levels were also somewhat important. The reader, according to her understanding of 'neutrality,' uttered the sentences in a relatively low register, in medium to low volume, with monotonic pitch and intensity. She only maintained the proper rhythm of the syllables, so that the duration reflected the units at all hierarchical levels in speech, without the two uppermost levels of attitude and emotion. From our standpoint, this reflects, to a large extent, the pole of 'less' in the inverted U function. Indeed, a summation of all the responses, irrespective of their semantic content, shows a preference for emotions from the poles of 'less'.

No significant difference was found between the two groups of subjects, except for firmness (p<0.01), sadness (p<0.05), and fear (p<0.05). In both groups the most salient emotion was firmness, followed by disgust and sadness. Identification of other emotions was statistically insignificant.

'Substitutes' and their contribution to groups of emotions
The concept of the 'substitute' refers to the emotions chosen by subjects as an alternative to the target emotion on the lexical or the prosodic level. The substitute may be similar for various reasons: semantic (positive/negative or part of a complex emotion) or in terms of auditory factors such as externalized/internalized, etc. Sadness, for example, may be linked semantically to joy through the mediation of longing. The connection between these three emotions was found in our experiment in responses to both concurrent and nonconcurrent sentences. Here we sum up the possible substitutes (the 'alternatives' chosen by the subjects) for each emotion examined and those that they did not replace in the concurrent and nonconcurrent sentences.

In a state of concurrence, we found, for example, that the substitutes for longing are sadness or joy, but not firmness, anger, fear, or disgust. The substitutes for sadness may be disgust, anger, or longing, but not joy, firmness, or fear. Only joy and longing are not substitutes for disgust; only longing and sadness can be substitutes for fear. Interestingly, the substitutes are not always symmetrical. For example, joy may be perceived as anger, but anger is not perceived as joy.

Thus we may discern three degrees of possible relationships between the target emotions, as intended by the speaker, and the alternative emotions suggested by the listeners:

1. Two-way relationships between the 'target' emotion and its substitutes.
2. A one-way relationship in two forms: (a) the emotions in the group serve as substitutes for the target emotion (but not vice versa); (b) the target emotion serves as a substitute for each of the emotions in the group (but not vice versa).

3. No relationship.
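A minimal sketch of this three-degree classification treats substitution as a directed relation. The mapping below is a small invented subset, not the study's full Table 2, though it does reflect the asymmetry noted above (joy may be heard as anger, but not vice versa):

```python
# Sketch of the three degrees of relationship, with substitution as a
# directed relation: SUBSTITUTES[target] = emotions chosen in its place.
# The mapping is an invented illustration.
SUBSTITUTES = {
    "joy": {"anger"},          # joy may be perceived as anger ...
    "anger": {"firmness"},     # ... but anger is not perceived as joy
    "longing": {"sadness", "joy"},
    "sadness": {"longing"},
}

def relationship(a, b, subs=SUBSTITUTES):
    """Classify the substitution relationship between emotions a and b."""
    a_to_b = b in subs.get(a, set())
    b_to_a = a in subs.get(b, set())
    if a_to_b and b_to_a:
        return "two-way"       # degree 1
    if a_to_b or b_to_a:
        return "one-way"       # degree 2 (forms a and b)
    return "none"              # degree 3

print(relationship("longing", "sadness"))  # two-way
print(relationship("joy", "anger"))        # one-way
print(relationship("joy", "firmness"))     # none
```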


Table 2. Summations of substitutes (or absence thereof) for emotions in concurrent sentences, with classification into four levels of connection with the 'target' emotion.

[The body of Table 2 did not survive scanning legibly. Its columns give the seven target emotions (anger, disgust, sadness, joy, longing, fear, firmness); its rows list, for each target, the substitutes at level 1 (two-way), levels 2a and 2b (one-way, in either direction), and level 3 (emotions that did not serve as substitutes).]

These relationships are summed up in Table 2, which shows some possible groupings and family-resemblance relationships. Particularly prominent is joy, the most 'independent' emotion, which appears as a substitute only for longing and has almost no substitutes itself. In other words, it is the best defined and the most clearly identifiable.

If we sum up the percentage of responses to the various emotions together with the substitutes that reflect family resemblance, we obtain a high percentage for all the emotions. For most of the emotions, the responses ranged between 85% and 95%; a few were between 60% and 70%. This is in sharp contrast to the three highly distinct groups obtained from responses to the target emotion only, without the substitutes. (In the less well-defined group, the responses range between 17% and 30%; in the better-defined group they are between 75% and 80%.) In this summation, the substitutes are salient in the less well-defined emotions, sadness and longing, and minimal in joy and firmness, which are the best defined. Interestingly, joy and firmness are at the two extremes of the U function; joy is at the pole of 'more' and firmness is at the pole of 'less'.

In a state of nonconcurrence, the responses were summed up in accordance with their 'correct' answers to the emotion as a target, expressed on the lexical or prosodic level, and also their substitutes.

As for the lexical level, as stated above, the correct responses were few; this requires further study.

At the prosodic level, the responses were significant mainly when the substitutes were taken into account. A summation of the responses of all subjects to all emotions, with a breaking down of anger into externalized and internalized categories, points to a first preference for externalized anger (irrespective of substitutes), followed by firmness. There was no difference between the responses of musicians and non-musicians to the two types of anger in a state of concurrence. In a state of nonconcurrence the responses to external anger were greater than those to internal anger, and the musicians scored much higher than the non-musicians: 85% and 62%, respectively, for external anger, and 50% and 12%, respectively, for internal anger. (The results show that musicians have great sensitivity to fine nuances in speech, too.) Moreover, the sole substitute for anger in both groups was firmness. This substitute was salient with internal anger, which, like firmness, is at the pole of 'less.' (For example, for non-musicians, firmness was chosen as a substitute four times as often as the target emotion itself.)

Responses of 'vague' or 'unclear'
As stated above, subjects were allowed to respond 'Vague' or 'Unclear' in addition to mentioning a specific emotion. These responses were compared in two aspects:
1. The distribution of 'vague' responses for each emotion by each group of subjects (musicians and non-musicians) under the three conditions (concurrent, nonconcurrent, neutral).
2. The correct identification by the two groups.

The main findings were as follows: (1) Emotions identified as of minimum vagueness are usually those identified the most. (2) Vagueness is greater with respect to nonconcurrence than to neutrality and concurrence. (3) Musicians give the 'vague' response more frequently than non-musicians (an average of 26.2% 'vague' responses by musicians, versus 12% by non-musicians; it would be interesting to find out why).

Findings from the melographic and auditory analyses of speech
The melographic analysis indicates that there is some similarity between sentences that share the same lexical content, although the prosodic expression is different. The difference produced by the emotional-prosodic level is manifested mainly in the following factors: the degree of zigzag of the curve in terms of pitch and intensity, including the silences; the separation of syllables (analogous to staccato/legato); the evenness of rhythm, pitch and loudness; the ending curve (ascending/descending, accelerando/ritardando, etc.); the range of appearance in the various parameters; and nonconcurrence. All these are reflections of the basic principles of tension/relaxation, which in this case are manifested only in natural schemata.
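Two of the factors just listed, the degree of zigzag and evenness, can be quantified directly from a sampled pitch contour. The sketch below is our own minimal illustration, not the melographic software used in the study; contours are plain lists of pitch values.

```python
# Minimal sketch (not the study's melograph software) of two contour
# features named above: zigzag (direction reversals) and evenness
# (spread of step sizes). Input: a sampled pitch contour, e.g. in Hz.
from statistics import pstdev

def zigzag_degree(contour):
    """Fraction of interior points where the pitch direction reverses."""
    diffs = [b - a for a, b in zip(contour, contour[1:])]
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return reversals / max(len(diffs) - 1, 1)

def step_evenness(contour):
    """Standard deviation of absolute step sizes; lower = more even."""
    steps = [abs(b - a) for a, b in zip(contour, contour[1:])]
    return pstdev(steps)

calm  = [200, 210, 220, 230, 240]        # steady ascent: no reversals
angry = [200, 260, 205, 270, 210, 280]   # wide, agitated zigzag
print(zigzag_degree(calm), zigzag_degree(angry))   # 0.0 1.0
```

On such measures, an 'angry' contour would score high on zigzag, while a 'firm' one would score low on the evenness statistic (small spread of step sizes), in line with the family resemblances discussed below.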

Overall, in the two types of analysis - melographic and verbal responses, which proved to be very similar - we found 'family-resemblance' relationships among the emotions. For example, both joy and anger share intensification (more pronounced in anger than in joy) of register, ambitus, and change (zigzag pattern), but joy is differentiated from anger by the evenness that typifies anger on a non-immediate level (evenness of the zigzag). Anger and firmness share rhythmicity and evenness, but in firmness, unlike anger, the evenness is on the immediate level (as it is for pitch); they differ with respect to pitch and density.

In Figure 2 below we see some examples of relationships between emotions in terms of musical parameters that represent a move away from the optimum in the inverted U function toward either extreme (more [+] or less [-]; e.g., very

152 IMAGES OF EMOTIONAL EXPRESSION IN MUSIC AND SPEECH

[Figure 2: schematic diagram relating the emotions along the parameters change (zigzag), evenness, tempo, ending, intensity, fragmentation, and register in pitch, each marked + (more) or - (less); diagram residue omitted.]

Figure 2. Some relationships between emotional expressions in terms of relevant characteristics represented by + (more) or - (less).

high/very low, strong/weak, slow/fast). As for the very existence of change (in pitch, intensity, or duration), an extreme change may be contrasted with a lack of change or gradual change. (Here we chose the latter and focused on evenness separately.) Not all of the properties can be described by the two extremes.

Let us stress that the impact of the two extremes in the inverted U function is salient in music for both E1 and E2 (in educational music for children, as we explain below, the extremes are avoided).

Results of experiment y: Musical analysis
The analyses pertain to musical pieces in which the emotional content is stated explicitly in the title. Some of the pieces are taken from educational art music; others were specially commissioned for the present study.

Expression of emotions in children's music
Expression of immediate emotion in music (E2) is salient in educational art music for children, since one of the characteristics of this music is a connection with the concrete world of the child, as manifested in visual and emotional images (Shapira, in press). In his study on this type of music for the piano in Israel, Shapira found that only about one-third of works have no verbal title. The emotional expressions are usually not extreme in both directions; most of them are 'positive' (all the dances

express happiness), while the 'negative' emotions, such as sadness, are limited to mild expressions.

Analysis of the titled works in Shapira's repertoire (short works commonly used in piano teaching) shows a strong similarity between factors that cause excitement in these works and those found in our study. The overall picture reveals that most of the characteristics of these works fall into the framework of 'neither too much nor too little' (for most of the parameters). The register does not extend beyond two octaves above middle C and three octaves below it. The loudness generally ranges from p to f; there is no ff and there are only a few instances of pp. The tempo ranges from andante to allegro. Most of the excerpts are in the general form of ABA. ('Joke' stands out for its ABA'B' form.)

Overall, the excerpts from this study are very different from each other. Nevertheless, we observe some shared characteristics: a proliferation of concurrences of different types, a proliferation of convexity in the various parameters, a paucity of jumps, and avoidance of extremes in tempo, register, loudness, and ambitus.

Within this confining framework, we notice rather clear differences between the pieces that express external excitement (especially joy) and the others. Among the latter, we notice a difference between calm and sadness.

In the external type, we find 'more' in pitch, staccato, tempo, zigzags, intensity, evenness of meter, and expansions of the major scale. In the second type, the major scale does not occur in the sad excerpts. We encounter convex motives with simple beats, and the calm excerpt has a greater variety of rhythms and embellishments.

Expression of excitement in musical phrases
As stated above, the musical phrases were written at our request and were tailored to the selection of verbal sentences used in our experiments. Thus, the musical excerpts are self-standing phrases. Some were written for a specific text; others reflect only the emotion expressed lexically. All excerpts are 3-12 measures long. Despite the differences among excerpts written for the same emotions, a similarity was found - sometimes to a surprising extent - perhaps in part because of the identity of the text and the structure of the sentence, e.g., a question mark or an exclamation point at the end. From our standpoint, it is important that the similarities pertain at least to some principles of the factors that arouse different types of excitement. In Figure 3 below we present the musical 'interpretations' of two emotions - anger and fear - by two composers. In 'anger' (Fig. 3.1a, b), both sentences exhibit great loudness, a zigzag curve, fractures and staccato, a certain degree of evenness, and an upward ending. In 'fear,' one sentence (Fig. 3.2a) is in adagio and mostly in p, while the other (Fig. 3.2b) is in agitato and mostly f, but they nevertheless have a lot in common. Both works exhibit considerable fracture, begin within a small ambitus and chromatic descent with repetitions of notes, include sudden 'jumps' (fright?), and end in an upward direction (a question mark?).

[Figure 3: two composers' short vocal settings of 'fear' and 'anger', with tempo and dynamic markings; score residue omitted.]

Figure 3. Musical interpretation of two emotions - anger (1), fear (2) - by two composers (a and b).

Concluding remarks

Responses to the expression of emotions in speech
The responses to the emotions - expressed in ordinary, unexaggerated speech - were examined with respect to identification of the speaker's intention and the substitutes chosen by the listeners. These two variables help us classify the emotions - one of our major goals - which enabled us to understand the overall schemata in the expression of emotions in speech and music. Indeed, in our investigation the responses varied greatly with respect to the various emotions and were found to be dependent on the type of emotion, on the level at which it appeared (lexical or prosodic), on the linkage of emotions (concurrence or nonconcurrence), and on the category of listeners (musicians and non-musicians).

In no case was there a 100% response to a deliberately expressed emotion. Even when the levels (prosodic and lexical) were concurrent and mutually supportive, various substitutes were offered for all emotions to varying extents. Perhaps this fact should not have surprised us, bearing in mind how common misunderstandings are

in ordinary speech (in terms of the emotions expressed). As was expected, in a state of nonconcurrence, in which the two levels 'compete' with each other and each may interfere with the listener's perception of the other level, the number of 'substitutes' was much greater than in concurrent sentences.

Nevertheless, the findings of our study show a certain regularity in the responses to the emotions - a regularity guided by the classification of the emotions - which may regulate both the hierarchies among them in terms of correct identity and their specific substitutes. One may also say that the findings support the classification of the emotions.

This classification is not simple, because people perceive emotions in different dimensions - the psycho-linguistic (P.L.) and the psycho-musical (P.M.) - simultaneously.

The P.L. dimension provides the positive-negative classification and the classification of relationships that appear in mixed-emotion forms (P.L. emotional patterns). The P.M. dimension provides the externalized-internalized classification and relationships between emotions in terms of universal causes of excitement; it also determines levels of clarity and distinguishability in the auditory manifestation of the various emotions.

Despite the complexity, the results from the concurrent sentences and the prosodic level in the nonconcurrent sentences indicate that, to some degree, there is a shared emotional auditory image among subjects. (Note that we did not check reflexive 'real-time' emotional responses, but rather reflective images, which were activated by emotional auditory stimuli.)

Family resemblance in the images of the various types of emotions
Our findings from the analysis of the verbal responses to the vocal sentences heard show that each emotion is determined by a number of factors in different parameters; although they may vary, usually one or more factors appear consistently. For example, firmness is always even (expressing power, situated at the internalized pole) in at least one of the parameters, while the other factors may rest at the externalized pole. Joy is always high in pitch and zigzag curve (the externalized pole). Sadness is always low, quiet, and slow (the internalized pole). Longing is situated at the internalized pole and includes nonconcurrence among the parameters (an ascending ending in diminuendo). Nonconcurrence may create a linkage among different emotions in accordance with its clashing traits. Thus, longing may be associated both with joy, because of its ascending pitch, and with sadness, because of its diminuendo ending and additional characteristics at the internalized pole (in addition to the P.L. association among the three, as mentioned above). All these characteristics create the potential for natural family-resemblance relationships among emotions, which may be classified according to their position at the internal pole, the external pole, or both.

Integration of the findings from the verbal responses and the analyses
The location of the emotions on the psycho-linguistic (P.L.) and psycho-musical (P.M.) dimensions may help explain the formation of groups of emotions created by substitutes and hierarchy. Furthermore, as stated above, they may be linked by means of the P.L. emotional patterns in 'family-resemblance relationships'.

Groups of emotions derived from 'targets' and their 'substitutes'
Tables 3 and 4 below present groups of emotions and explanations based on analysis of common factors, at least for some of the emotions, in light of the two dimensions (P.M./P.L.). The grouping of emotions results from subjects' verbal responses to the sentences heard. The connections between the emotions in these groups reflect the complexity of classification of emotions, and they are part of the emotional superschema.

Groups in the hierarchy of emotions
The hierarchy, in terms of correct identification, was found to differ slightly between the prosodic- and lexical-level combinations in concurrence and in nonconcurrence. In concurrence, first place is held by joy; in nonconcurrence, externalized anger comes first. Emotions that maintained their position in the hierarchy were firmness (in second place) and disgust (second to last). In last place were sadness in concurrence and fear in nonconcurrence. Thus, at the top of the hierarchy we find externalized emotions that are more distinguishable (in auditory terms) or which project a message of power: firmness, joy, and externalized anger.

Expression of emotions in music
Generally, the 'aesthetic ideal' guiding the rules in artistic musical styles is based mainly on E1 (latent-abstract emotion). In our study, we chose to examine ways of expressing type E2 emotions (direct-immediate) in two types of musical material: (1) Israeli educational literature for children, which is limited mainly to two types of emotions - joy and sadness - that are expressed in the psycho-linguistic (positive-negative classification) and psycho-musical (externalized-internalized classification) dimensions; and (2) composers' interpretations of several sentences used in our study. In this material, we found meaningful parallels to our findings in the domain of speech.

Summary

To sum up, our findings with regard to music and speech fall into line, to a large extent, with the work of others. Here our aim was to investigate the principles of the basic rules of excitement as well as to classify emotions in the P.L. and P.M. dimensions. This was done without looking deeply into the connections between these two dimensions, which would also require a look at the biophysiological dimension.

Here we related to the auditory manifestation of emotions in various media but focused on the performance of emotions in everyday speech, in accordance with rules of excitement in music. It would be interesting to perform a parallel study focusing on written and performed music.

The findings, along with the findings of other studies, provide some answers to our initial questions, by supporting the hypothesis concerning the existence of an overall superschema for excitement.

We found (1) that there is a correspondence between emotional images and auditory schemata in music and on the prosodic level of speech; and (2) that the auditory schemata are part of an overall superschema that embraces all emotional expressions

Table 3. Location of the emotions on the P.L. and P.M. dimensions, based on our assumptions.

Emotion     P.M. (internal/external)   P.L. (positive/negative)
Joy         External                   Positive
Firmness    Internal or External       Positive or Negative
Longing     Internal                   Positive or Negative
Anger       Internal or External       Negative
Sadness     Internal                   Negative
Fear        Internal                   Negative
Disgust     Internal                   Negative

Table 4. Groups of emotions: 'target' and main substitutes in the psycho-linguistic and psycho-musical dimensions (for more substitutes in concurrence, see Table 2 on page 150).

Explanation of type of connection           Main substitutes:                        Target
(psycho-linguistic; psycho-musical)         Concurrence (Con.) / Nonconcurrence      emotion
                                            (N.con.)

Positive                                    Joy / Joy, Disgust*                      Longing
Introvert; negative (except joy: Ep.PL**)   Longing, Disgust / Anger, Joy            Sadness
Family resemblance; negative (power)        Disgust, Firmness / Anger, Disgust       Anger-ex.
Family resemblance; negative                Firmness / Firmness, Disgust             Anger-in.
Family resemblance with joy and with        Anger, Joy / Anger, Joy                  Firmness
  anger; positive
Anger-ex. with longing; positive            Disgust* / Longing (slight), Anger       Joy
Introvert; Ep.PL**                          Longing / Sadness                        Fear
Introvert                                   In prosodic and in lexical:              Neutral
                                            Firmness, Disgust, Sadness

Interestingly, 'fear' never appeared as a significant substitute for any emotion.
* The appearance of 'disgust' as a substitute for a positive emotion in a state of nonconcurrence may reflect awareness of the very nature of the contradiction between the two levels.
** Ep.PL = emotional patterns in the psycho-linguistic dimension.

in the different domains. We realize that this superschema is not simple for a number of reasons: the variety and the many nuances of emotions, their different dimensions, the complex connections of family resemblance, and the diversity in people's responses.

As for the question about the role of learned versus natural schemata in the communication of emotions in music and speech, we may conclude that both types of schemata - learned and natural - are present in the perception and production of emotional images. This conclusion raises a question: Can every emotional image indeed manifest itself in some kind of musical image? And by the same token - can every musical expression be connected to some emotional image? Of course, further study is required, and we intend to continue to investigate emotional expression in music and speech in light of the proposed model.

Notes

1. The researchers note the distinction between emotions, which are fleeting, and moods, which are long-term emotional states.

2. We may say that a performance that violates the natural rules, like an ascent in a diminuendo, may arouse excitement. We may also say that if the rules of Palestrina counterpoint (P.C.) represent natural laws of calm (Cohen 1971), then the rules that represent the Affektenlehre in Baroque music (Buelow 1983) are deliberate 'violations' of the rules of P.C.

3. E.g., to accelerate and amplify the loudness of a rising melodic line or to treat a descending line in theopposite way.

4. For a rather comprehensive summary of studies on the expression of excitement in speech between 1930 and 1984, see Scherer (1985).

5. E1 represents the uniqueness of music, the most abstract of the arts, whose complex organization is based mostly on learned schemata; E2 is less complex and is based mostly on natural schemata.

6. The shortage of space precludes us from presenting examples. Those interested can obtain examplesfrom us directly.

References

Bolinger, D. (Ed.) (1972). Intonation. Middlesex, England: Penguin Modern Linguistics Readings.
Buelow, G. (1983). Johann Mattheson and the invention of the Affektenlehre. In G. Buelow & H. Marx (Eds.), New Mattheson Studies (pp. 393-407). Cambridge: Cambridge University Press.
Clarke, E. F. (1988). Generative principles in music performance. In J. A. Sloboda (Ed.), Generative Processes in Music (pp. 1-26). Oxford: Clarendon Press.
Clynes, M. (1977; rev. ed. 1989). Sentics, the Touch of Emotions. New York: Doubleday.
Cohen, D. (1971). Palestrina counterpoint: A musical expression of unexcited speech. Journal of Music Theory, 15, 23-57.
Cohen, D. (1978). Rhythm and meter in music and poetry. Israel Studies in Musicology, 3, 99-141.
Cohen, D. (1983). Birdcalls and the rules of Palestrina counterpoint: Towards the discovery of universal qualities in vocal expression. Israel Studies in Musicology, 3, 96-123.
Cohen, D. (1986). The performance practice of Rig-Veda: A musical expression of excited speech. Yuval, 5, 292-317.
Cohen, D., & Granot, R. (1995). Constant and variable influences on stages of musical activities. Journal of New Music Research, 24, 197-229.
Cohen, D., & Dubnov, S. (1997). Gestalt phenomena in musical texture. In M. Leman (Ed.), Music, Gestalt, and Computing (pp. 326-405). Berlin: Springer.
Cohen, D., & Wagner, N. (2000). Concurrence and non-concurrence between learned and natural schemata: The case of J. S. Bach's saraband in C minor for cello solo. Journal of New Music Research, 29(1), 23-36.
Dowling, W. J., & Harwood, D. L. (1986). Music Cognition. New York: Academic Press.
Fónagy, I., & Magdics, K. (1963/1972). Emotional patterns in intonation and music. In D. Bolinger (Ed.), Intonation (pp. 286-312). Middlesex, England: Penguin Modern Linguistics Readings.
Gabrielsson, A. (1974). Performance of rhythm patterns. Scandinavian Journal of Psychology, 15, 63-72.
Gabrielsson, A. (1987). Once again: The theme from Mozart's piano sonata in A major. A comparison of five performances. In A. Gabrielsson (Ed.), Action and Perception in Rhythm and Music (vol. 55, pp. 81-103). Royal Swedish Academy of Music.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance. Psychology of Music, 24(1), 68-91.
Hacohen, R., & Wagner, N. (1997). The communicative force of Wagner's leitmotifs: Complementary relationships between their connotations and denotations. Music Perception, 14(4), 445-476.
Hanslick, E. (1854/1957). The Beautiful in Music (trans. G. Cohen). New York: Liberal Arts Press.
Heath, R. G. (1986). The neural substrate for emotion. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, Research, and Experience (vol. 3, pp. 3-35). New York: Academic Press.
Izard, C. E., & Buechler, S. (1980). Aspects of consciousness and personality. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, Research, and Experience (vol. 1, pp. 165-187). New York: Academic Press.
Katzir, Z. (1995). The meaning of the variations in the Babbler 'shout': A musical ethological approach. Behavioral Processes, 34, 213-232.
Laufer, A. (1987). Intonation. Jerusalem: The Hebrew University of Jerusalem.
Lazarus, R. S., Kanner, A. D., & Folkman, S. (1980). Emotions: A cognitive-phenomenological analysis. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, Research, and Experience (vol. 1, pp. 189-217). New York: Academic Press.
Lieberman, P. (1975). On the Origins of Language: An Introduction to Human Speech. New York: Macmillan.
List, G. (1963). The boundaries of speech and song. Ethnomusicology, 7, 1-16.
Meyer, L. B. (1956). Emotion and Meaning in Music. Chicago: University of Chicago Press.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press.
Plutchik, R. (1980). A general psychoevolutionary theory of emotion. In R. Plutchik & H. Kellerman (Eds.), Emotion: Theory, Research, and Experience (vol. 1, pp. 3-33). New York: Academic Press.
Rapoport, E. (1998, May 28-30). Singing: The art, expression, and science: Exploring the interior of vocal tones. Paper presented at the symposium Musical Cognition and Behavior: Relevance to Music Composition, Econa, University of Rome 'La Sapienza', Rome.
Reber, A. (1985). Dictionary of Psychology. Penguin Books.
Repp, B. H. (1998). Obligatory 'expectations' of expressive timing induced by perception of musical structure. Psychological Research, 61, 33-43.
Sachs, C. (1946). The Commonwealth of Art. New York: W. W. Norton & Company.
Scherer, K. R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143-165.
Shapira, S. (2000). Educational Art Music for Piano in Israel. Unpublished PhD thesis, The Hebrew University of Jerusalem, Israel.
Sloboda, J. (1991). Music structure and emotional response: Some empirical findings. Psychology of Music, 19, 110-120.
Sundberg, J. (1982). Speech, song and emotions. In M. Clynes (Ed.), Music, Mind and Brain: Neuropsychology of Music (pp. 137-149). New York: Plenum Press.
Sundberg, J., Friberg, A., & Frydén, L. (1991). Common secrets of musicians and listeners: An analysis-by-synthesis study of musical performance. In P. Howell, R. West, & I. Cross (Eds.), Representing Musical Structure (pp. 161-197). London: Academic Press.
Todd, N. P. McA. (1985). A model of expressive timing in tonal music. Music Perception, 3, 33-58.
Todd, N. P. McA. (1992). The dynamics of dynamics: A model of musical expression. Journal of the Acoustical Society of America, 91, 3540-3550.
Wallin, N. L., Merker, B., & Brown, S. (Eds.) (2000). The Origins of Music. Cambridge, Mass.: MIT Press.


9

Imaging Soundscapes: Identifying Cognitive Associations between Auditory and Visual Dimensions

Kostas Giannakis and Matt Smith

Introduction

The extent to which we can represent musical information in computers, in a way that matches our mental images, is one of the most intriguing and compelling questions in music research. Leman (1993) distinguishes three kinds of musical representation:
1. Acoustical, i.e. based on the physical properties of sound (e.g., sonograms, spectrograms).
2. Subsymbolic, i.e. based on the known behaviour of the human hearing system and how the human brain processes auditory information (e.g., auditory models).
3. Symbolic, i.e. based on the manipulation of symbols (e.g., Common Music Notation).

Our research focuses on the mapping between a symbolic (or abstract) system (in this case a colour model) and the perceptual dimensions of musical sounds. The postulate here is that it is possible to devise a visual approach to sound design based on human cognitive associations between auditory and visual percepts. Empirically derived auditory-visual associations can contribute to the design of more cognitively useful sound design tools that support the externalisation of composers' internal auditory images.

By auditory percepts we mean the following perceptual dimensions of sound:

1. Loudness, i.e. a psychoacoustic variable that refers to the subjective perception of sound intensity.
2. Pitch, i.e. a psychoacoustic variable that refers to the subjective perception of sound frequency.
3. Timbre, i.e. 'that attribute of sensation in terms of which a listener can judge that two steady complex tones having the same loudness, pitch and duration are dissimilar' (Plomp, 1976).

By visual percepts we refer to elements such as colour, shape, and texture. In the present study we focus on the perception of colour. Colour has three perceptual dimensions (Fortner & Meyer, 1997):
1. Hue, i.e. the dominant wavelength in the power spectrum of a colour.
2. Saturation, i.e. the degree to which hue is perceived to be present in a colour.
3. Light intensity, i.e. how light or dark a colour is.

We have designed and conducted an experiment in order to examine the associations between pitch, loudness and the above colour dimensions. The results are presented and discussed.

Related work

Colour spaces for sound
An important concept in colour research is the concept of colour space: a formal method of representing the visual dimensions of colour (Jackson et al., 1994). There are various examples of colour spaces, ranging from purely physical models (e.g., RGB) to more perceptually based models (e.g., CIELUV, CIELAB, HSL, HSV, NCS).1 Colour spaces present a number of important features that are also highly desirable in the area of sound design. For example, by arranging colours in a three-dimensional space it is easy to understand concepts such as colour complementarity, similarity, and contrast. In a similar manner, it may be possible to structure timbre relations in terms of similarity, difference, etc. Various empirical studies (e.g., Bismarck, 1974; Grey, 1975; Plomp, 1976; Slawson, 1985; McAdams, 1999) have shown that timbre is a multidimensional attribute of sound and have proposed a small number of prominent dimensions (e.g., sharpness, compactness, roughness, etc.) for the qualitative description of timbre.
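The perceptual colour dimensions discussed above can be read off a physical RGB triple with a standard colour-space conversion; Python's standard colorsys module provides the HLS model directly. A small illustration:

```python
# Converting a physical RGB colour into the perceptual-style dimensions
# discussed above (hue, lightness, saturation), using the HLS model
# from the Python standard library.
import colorsys

h, l, s = colorsys.rgb_to_hls(1.0, 0.0, 0.0)  # pure red
print(h, l, s)  # 0.0 0.5 1.0: hue at red, mid lightness, full saturation
```

Moving through such a space makes the similarity and contrast relations mentioned above explicit: nearby points are similar colours, and complementary hues sit half a revolution apart on the hue axis.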

Barrass (1997, pp. 96-97) cites Caivano (1994) and Padgham (1986) as the first attempts to model sound using colour spaces. The auditory-visual associations that were proposed by these studies are summarised in Table 1. However, these studies are based on correspondences that may exist between the physical dimensions of sound and colour. For example, in Caivano's approach, hue is associated with pitch, since both these dimensions are closely related to the dominant wavelengths in colour and sound spectra respectively. In the same manner, pure (or high-saturated) colours are associated with pure (or narrow-bandwidth) tones, whereas low-saturated colours (those that involve wider bandwidths of wavelengths) are associated with complex tones and noise. Finally, light intensity is associated with loudness (black and white represent silence and maximum loudness respectively, with the greyscale representing intermediate levels of loudness). It is of further interest to investigate whether the above associations can be supported by empirical studies.

Table 1. Studies that attempt to model sound using colour dimensions.

Study             Hue        Colourfulness   Lightness
Padgham (1986)    Formants   Timbre          Loudness
Caivano (1994)    Pitch      Timbre          Loudness
Barrass (1997)    Timbre     Brightness      Pitch
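The Caivano-style correspondences just described (pitch to hue, spectral purity to saturation, loudness to light intensity) can be sketched as a direct mapping. The numeric ranges and scaling below are our own illustrative choices, not values from Caivano (1994).

```python
# Illustrative sketch of the Caivano-style correspondences described
# above: pitch -> hue, spectral purity -> saturation, loudness -> light
# intensity. The pitch range and log scaling are our own assumptions.
import colorsys
import math

def sound_to_rgb(pitch_hz, purity, loudness):
    """Map (pitch in Hz, purity 0-1, loudness 0-1) to an RGB triple."""
    lo, hi = math.log(27.5), math.log(4186.0)    # roughly piano range A0-C8
    hue = (math.log(pitch_hz) - lo) / (hi - lo)  # log-scaled pitch -> hue
    hue = min(max(hue, 0.0), 1.0)
    # HLS model: lightness carries loudness, saturation carries purity
    return colorsys.hls_to_rgb(hue, loudness, purity)

silent = sound_to_rgb(440.0, purity=1.0, loudness=0.0)  # -> black (0, 0, 0)
```

In an HLS space, loudness 0 maps to black and loudness 1 to white regardless of hue, mirroring the silence/maximum-loudness convention described above.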

Synaesthesia
One useful source of information for auditory-visual associations may be a closer investigation of the phenomenon of synaesthesia. In one of the most detailed accounts of synaesthesia to date, Marks (1997, p. 47) defines synaesthesia as '...the translation of attributes of sensation from one sensory domain to another...'. The association between visual and sonic stimuli (i.e. coloured-hearing synaesthesia) is one of the most common synaesthetic conditions and manifests itself in two different but very related phenomena:
1. Coloured vowels, i.e. visual sensations produced by the sound of vowels.
2. Coloured music, i.e. visual sensations produced by musical sound.

164 IMAGING SOUNDSCAPES

Marks examined a large number of reported synaesthesia studies related to coloured vowels and combined the results in order to identify general characteristics and consistencies among synaesthetes. The opponent colour model (see Fairchild, 1998; Jackson et al., 1994) was used, with the opponent colour axes being black-white, red-green, and yellow-blue. Marks found that the black-white axis predicts vowel pitch and that the red-green axis predicts the ratio of the first two formants in the vowel spectra (the first two formants are considered to be the most important ones for vowel discrimination). In further studies with musical tones (Marks, 1997, p. 72), Marks reports experiments with non-synaesthete subjects that have shown associations between pitch and light intensity as well as between loudness and light intensity. Although these associations are in agreement with earlier synaesthesia studies, Marks' overall conclusion was that it is neither pitch nor loudness that is related to light intensity, but auditory brightness. This conclusion is based on the assumption that auditory brightness is the same as auditory density, a dimension that increases when both pitch and loudness increase. However, auditory brightness has been shown to be a dimension of timbre that is determined by the upper limiting frequency and the way energy is distributed over the frequency spectrum of a sound (see Bismarck, 1974; Grey, 1975). A further problem lies in the method behind the above-described experiments. Marks investigated only the dimension of light intensity; hue and saturation were therefore not considered. It is not very surprising that when people are asked to relate either pitch or loudness to a dark-light scale, they will succeed with both pitch and loudness. The question that arises is what happens when there are multiple visual and auditory dimensions for subjects to associate.

Table 2. Current computer music systems for sound design that employ auditory-visual associations.

  Application                   Hue  Colourfulness  Lightness  Description
  Metasynth (Wenger, 1998)       ✓        ✗            ✓       Red-yellow-green scale for spatial position; dark-light for soft-loud.
  Phonogramme (Lesbros, 1996)    ✗        ✗            ✓       Light-dark for soft-loud.

Sound design systems

Colour dimensions have been incorporated in a number of current computer music systems for sound design (see Table 2). The most common association is between light intensity and loudness. The level of light intensity (how dark or light a colour is) specifies the loudness of a sound, with black and white usually being the minimum and maximum values respectively. Hue and saturation have been neglected, with the exception of Metasynth (Wenger, 1998), where a red-yellow-green hue scale is used to determine the spatial position of sound. Furthermore, there is no general agreement on the use of dark (or light) as soft or loud, although in conventional sound representations (e.g., sonograms, correlograms, cochleagrams; see note 2) darker areas represent higher levels of amplitude whereas lighter areas represent lower levels.
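The two opposing conventions can be captured in a small sketch (our own illustration, not code from any of the reviewed systems; the linear mapping from lightness to amplitude is an assumption):

```python
def lightness_to_amplitude(grey, dark_is_soft=True):
    """Map an 8-bit lightness value (0 = black, 255 = white) to a linear
    amplitude in [0, 1].  dark_is_soft=True follows the sound-design systems
    reviewed above (black = minimum loudness); dark_is_soft=False follows the
    sonogram convention, where darker areas mean higher amplitude."""
    a = grey / 255.0
    return a if dark_is_soft else 1.0 - a

print(lightness_to_amplitude(0))                      # 0.0 (black, silent)
print(lightness_to_amplitude(255))                    # 1.0 (white, loudest)
print(lightness_to_amplitude(0, dark_is_soft=False))  # 1.0 (sonogram: dark = loud)
```

The single boolean flag makes the point of the paragraph explicit: the mapping itself is trivial, but its polarity is a design decision on which the reviewed systems disagree.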

Summary of reviewed work

The reviewed attempts to create colour spaces for sound lack empirical evidence to support the proposed physical correlations between auditory and colour dimensions. Similarly, none of the reviewed computer music systems for sound design are based on empirical studies that support their design strategies. This has resulted in a number of approaches that are in certain cases very different and mutually inconsistent. The majority of synaesthesia-related studies have suggested an association between pitch and light intensity. Although this association is empirically supported, various methodological problems have been identified (e.g., other colour dimensions have not been investigated in the reported experiments).

In general, there is no theoretical framework for auditory-visual associations that is based on empirical studies and can be used for intuitive sound descriptions. The lack of such a framework forms the motivation behind the research we have conducted.


KOSTAS GIANNAKIS AND MATT SMITH

Figure 1. The Colour Palette. A constant-hue square: saturation runs along the horizontal axis, lightness along the vertical axis. Clicking on a colour chip displays a popup menu with the numbers 1-6, corresponding to the six sounds in a sequence and the six slots in the display area (from left to right). Selecting a number from the popup menu causes the selected colour to be displayed in the appropriate slot.

Experimental design

An experiment was designed to investigate the matching of pure tones with colours.The main objective was to provide results to help answer the following questions:

1. To what extent can a colour model based on hue, saturation, and light intensity provide a useful metaphor to describe loudness and pitch?
2. Which of these colour dimensions are associated with loudness and pitch?
3. To what extent do different sound frequency ranges influence colour selections?

A prototype computer application was designed for use in this experiment, comprising a custom colour palette and three series of sound sequences.

Colour palette

We used a computer implementation of the well-known HSV (Hue, Saturation, Value) colour model (see Jackson et al., 1994) in order to select six hues: red (R), yellow (Y), green (G), cyan (C), blue (B), magenta (M) - in other words the three primary (RGB) and three secondary (YCM) hues. Value is the correlate of the dimension of light intensity in the HSV colour space (Fortner and Meyer, 1997). The saturation and value levels were subdivided into six equal steps, thus producing thirty-six different saturation-value combinations for each hue (6 x 36 = 216 colours in total). The HSV values were then translated into their RGB equivalents and encoded in the MacProlog32 programming environment in order to display the custom colour palette (see Fig. 1).
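Such a palette can be sketched in a few lines (an illustrative reconstruction, not the original MacProlog32 code; the exact step values, e.g. whether the six equal steps start at 1/6 or at 0, are our assumption):

```python
import colorsys

def build_palette():
    """Rebuild the palette described above: 6 hues (R, Y, G, C, B, M),
    each with 6 saturation x 6 value steps = 216 colours."""
    hues_deg = [0, 60, 120, 180, 240, 300]  # red, yellow, green, cyan, blue, magenta
    steps = [i / 6 for i in range(1, 7)]    # six equal steps: 1/6 .. 1
    palette = []
    for h in hues_deg:
        for s in steps:
            for v in steps:
                # colorsys expects all HSV components in the range 0-1
                r, g, b = colorsys.hsv_to_rgb(h / 360, s, v)
                palette.append((round(r * 255), round(g * 255), round(b * 255)))
    return palette

palette = build_palette()
print(len(palette))  # 216
```

The final HSV-to-RGB translation mirrors the step described in the text, here via the standard-library `colorsys` module.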


Table 3. The sound sequences used in our experiment.

  Sequence     1  2  3  4 | 5  6 | 7  8  9  10 | 11
  Complexity       1      |  2   |      3      |  4

Sound sequences

The auditory dimensions under examination in this experiment were loudness and pitch. We used pure tones, i.e. sounds with a single sinusoidal frequency component, in order to neutralise the effect of timbral richness (sound complexity) on subjects' responses. All sequences consisted of six sounds whose frequency content was a single fundamental frequency. The individual tones were designed with PowerSynthesiser (Russell, 1995), a computer application for the design of psychoacoustical experiments involving sound. Although PowerSynthesiser provides adequate control over amplitude and frequency, it should be noted that these are objective physical properties of sound, whereas the closely related loudness and pitch are subjective perceptual measures.

Three series of eleven sound sequences were designed - one series for each of the following frequency ranges: 110-220 Hz (Low), 440-880 Hz (Mid), and 1760-3520 Hz (High). Each sequence consisted of six tones. The sequences were designed and classified according to their level of complexity (Table 3 depicts the content of each sequence in a series).
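A complexity-1 sequence of this kind can be sketched as follows (our own minimal reconstruction, not PowerSynthesiser code; tone duration, amplitude, and sample rate are assumptions, since the chapter does not specify them):

```python
import math

SAMPLE_RATE = 44100  # assumption; the chapter does not give a sample rate

def pure_tone(freq_hz, amplitude, duration_s=0.5):
    """One pure tone (a single sinusoidal component) as float samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(n)]

def sequence(freqs_hz, amplitudes, duration_s=0.5):
    """Concatenate six tones into one stimulus sequence."""
    samples = []
    for f, a in zip(freqs_hz, amplitudes):
        samples.extend(pure_tone(f, a, duration_s))
    return samples

# A complexity-1 sequence in the Mid range: pitch rises in six linear steps
# from 440 Hz to 880 Hz while loudness stays constant.
freqs = [440 + i * (880 - 440) / 5 for i in range(6)]
mid_seq = sequence(freqs, [0.5] * 6)
print(len(mid_seq))  # 6 tones x 22050 samples = 132300
```

Sequences for the Low and High series follow the same pattern with the 110-220 Hz and 1760-3520 Hz bounds substituted.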

The first complexity level comprised sequences where tones were either increasing or decreasing linearly in one auditory dimension while keeping the other constant. The second level extended this with tones varying in a non-linear way. The third level incorporated sequences with both loudness and pitch varying simultaneously, either in the same or in opposite linear directions. Finally, the fourth level extended the previous case with non-linear variation of loudness and pitch. It should be mentioned that during the experiment, sequences were not introduced in the order depicted in Table 3. Instead, they were shuffled; their order was the same for all subjects within a subject group, but in all cases sequences started with low complexity and progressed to higher complexity.

Subjects

We had twenty-four subjects in total, all of whom were given a screening questionnaire about their experience in both traditional and computer music. The exact composition of the twenty-four subjects was:

1. twelve undergraduate students studying sonic arts (11) or other fields (1), with average music experience,
2. five individuals with a great deal of computer music experience,
3. seven individuals with no musical background.


Subjects were randomly assigned to three groups of eight (one group for each series of sound sequences) and screened with an Ishihara colour plates test to detect colour vision deficiencies (one subject failed the Ishihara test). The purpose of the colour vision test was not to disqualify subjects but to test the effect(s) of colour blindness on subjects' colour selections.

Experimental environment

The experiment was conducted in a room with normal 'office' lighting, and sounds were presented binaurally through headphones. Due to hardware limitations the experiment was designed and run on an Apple Power Macintosh capable of representing only thousands of colours, a limitation that in some cases produced less uniform colour variation in our Colour Palette. Subjects sat approximately 80 cm away from the computer screen, and the components of the interface were sized for comfortable viewing and manipulation at that distance.

Experimental task

The experimenter demonstrated how to use the sequence player and the colour palette. This was followed by a short practice period with one tone sequence. The practice sequence was part of the series and was reintroduced later in the experiment. The experimental task was to create, for the current sequence of six tones, a sequence of six corresponding colours. Subjects could listen to the current sequence as many times as they wished, at any point during the task. Each subject completed the task for eleven sequences. Subjects performed the experiment at their own pace, and times ranged from thirty to forty-five minutes. The experimenter was present throughout the experiment, recording observations that formed the basis for post-experiment interviews with subjects. Finally, a data collection program logged colour selections in terms of hue, saturation, and value, as well as completion time per sequence.

Results

We now present the results obtained from the above-described experiment. The presentation is based on a qualitative method supported by quantitative data. The major qualitative variable is the colour selection strategy followed by subjects. With three colour dimensions there are 2³ = 8 possible strategies. In order to identify a strategy we assigned a score to each strategy based on the correlation with the corresponding variation in pitch and/or loudness. Tables 6-13 show the results after the processing of the raw data obtained from subjects' colour selections. It must be noted here that each colour chip in the palette was a discrete step (the same held for loudness and pitch values). This allowed for the classification of sounds and colours also in discrete steps. As an example, let us assume that the six colour selections (raw data) for a sequence of six tones with varying loudness and constant pitch were those depicted in Table 4. These are translated to the discrete steps shown in Table 5.

Table 4. Raw data for six colour selections. Hue is expressed in angle degrees around the HSV colour wheel.

  Dimension    Colour 1  Colour 2  Colour 3  Colour 4  Colour 5  Colour 6
  Hue            0°        0°        0°        0°        0°        0°
  Saturation    16%       32%       48%       64%       80%       96%
  Value         64%       64%       64%       64%       64%       64%

Table 5. The discrete colour steps based on the raw data in Table 4.

  Dimension    Step 1  Step 2  Step 3  Step 4  Step 5  Step 6
  Hue            1       1       1       1       1       1
  Saturation     1       2       3       4       5       6
  Value          4       4       4       4       4       4

If we now examine these step sequences, we can conclude that the variation in saturation correlates perfectly with the linear step variation in loudness (i.e. from soft to loud: 1 2 3 4 5 6) while the hue and value steps remained constant. Thus, the colour strategy for this example sound sequence was to associate saturation with loudness within the same hue and value confines. Furthermore, Figures 3-10 are based on the patterns obtained from subjects and display average levels for each colour dimension relative to pitch and/or loudness. Note that floating point numbers were not actually part of the experiment; for example, a saturation step of 2.3 was not a possible selection. Thus, the figures should be interpreted as showing the trend in the patterns and not actual steps.
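The translation from raw selections to discrete steps, and the identification of which dimensions varied, can be sketched as follows (function names and the quantization rule are our own illustration; the authors' scoring against pitch/loudness variation is not reproduced here):

```python
def to_steps(values, n_steps=6, lo=0.0, hi=1.0):
    """Quantize raw dimension values (e.g. saturation fractions) into
    discrete steps 1..n_steps."""
    width = (hi - lo) / n_steps
    return [min(n_steps, int((v - lo) / width) + 1) for v in values]

def varied_dimensions(hue_steps, sat_steps, value_steps):
    """Return which of the three colour dimensions changed across the six
    selections -- the basis of the 2^3 = 8 possible strategies."""
    return tuple(len(set(steps)) > 1
                 for steps in (hue_steps, sat_steps, value_steps))

# The worked example of Tables 4 and 5: constant hue and value,
# saturation rising 16% .. 96%.
sat = to_steps([0.16, 0.32, 0.48, 0.64, 0.80, 0.96])
print(sat)                                       # [1, 2, 3, 4, 5, 6]
print(varied_dimensions([1] * 6, sat, [4] * 6))  # (False, True, False)
```

Here the `(False, True, False)` result corresponds to the saturation-only strategy of the worked example.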

Discussion

Table 6 shows the overall results for sequences where the only varying (linear and non-linear) auditory attribute was loudness. In single-dimension terms (i.e. varying one dimension while keeping the remaining dimensions constant), hue varied in 1/72, saturation in 38/72, and value in 9/72 sequences. Furthermore, hue remained constant in 55/72, saturation in 15/72, and value in 49/72 sequences. These results suggest that the majority of subjects from all three groups varied the saturation level while keeping hue and value at constant levels. This correlation between saturation and loudness can also be seen in Figure 3. Quiet sounds were associated with weak colours while louder sounds evoked stronger colour selections.

Table 6. Overall results for sequences with varying loudness (linear and non-linear) and constant pitch. (✓ = dimension varied, ✗ = dimension remained constant; columns give the number of sequences in the Low, Mid, and High frequency groups.)

  Hue  Sat.  Value |  Low  Mid  High  Total
   ✗    ✗     ✗   |   0    3    0      3
   ✓    ✗     ✗   |   1    0    0      1
   ✗    ✓     ✗   |  13   14   11     38
   ✗    ✗     ✓   |   5    1    3      9
   ✓    ✓     ✗   |   1    3    3      7
   ✓    ✗     ✓   |   0    0    2      2
   ✗    ✓     ✓   |   2    1    2      5
   ✓    ✓     ✓   |   2    2    3      7
  Total           |  24   24   24     72

Table 7 shows the overall results for sequences where the only varying (linear and non-linear) auditory attribute was pitch. In this case the results are not as clear as for loudness. There was no dominant colour selection strategy, and a high number of selections involved variation in all three colour dimensions. The results for strategies that involve variation in only one colour dimension show that subjects varied hue in 10/72 sequences, saturation in 15/72, and value in 11/72. However, these figures are too small to support safe conclusions, although Figure 4 hints at a possible pitch-value association. Furthermore, we can suggest that hue does not seem to have any immediate role in colour selections, since hue remained constant in 41/72 sequences, saturation in 23/72, and value in 34/72.

Table 7. Overall results for sequences with varying pitch (linear and non-linear) and constant loudness.

  Hue  Sat.  Value |  Low  Mid  High  Total
   ✗    ✗     ✗   |   1    0    0      1
   ✓    ✗     ✗   |   5    3    2     10
   ✗    ✓     ✗   |   4    4    7     15
   ✗    ✗     ✓   |   6    3    2     11
   ✓    ✓     ✗   |   3    2    3      8
   ✓    ✗     ✓   |   1    0    0      1
   ✗    ✓     ✓   |   0   11    3     14
   ✓    ✓     ✓   |   4    1    7     12
  Total           |  24   24   24     72

Table 8 shows the overall results for sequences where both loudness and pitch were varying simultaneously in a positively correlated fashion. Here, the dominant colour selection strategy (17/48 sequences) was to vary both saturation and value while hue remained constant (33/48 sequences). This supports the point made before that saturation and value are the key dimensions for differences in loudness and pitch. However, since both loudness and pitch follow the same pattern (either soft-loud/low-high or loud-soft/high-low), we cannot immediately tell which of the two colour dimensions corresponds to which auditory dimension.
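The per-dimension tallies quoted in the discussion (e.g. 'saturation varied in 38/72', 'hue remained constant in 55/72' for the varying-loudness sequences) can be recomputed from a strategy table; the dictionary encoding below is our own sketch:

```python
# (hue_varied, sat_varied, value_varied) -> counts for the Low, Mid, High groups,
# using the varying-loudness results reported in the discussion.
loudness_table = {
    (False, False, False): [0, 3, 0],
    (True,  False, False): [1, 0, 0],
    (False, True,  False): [13, 14, 11],
    (False, False, True):  [5, 1, 3],
    (True,  True,  False): [1, 3, 3],
    (True,  False, True):  [0, 0, 2],
    (False, True,  True):  [2, 1, 2],
    (True,  True,  True):  [2, 2, 3],
}

def varied_alone(table, dim):
    """Sequences in which only dimension `dim` (0=hue, 1=sat, 2=value) varied."""
    return sum(sum(counts) for key, counts in table.items()
               if key[dim] and sum(key) == 1)

def remained_constant(table, dim):
    """Sequences in which dimension `dim` did not vary."""
    return sum(sum(counts) for key, counts in table.items() if not key[dim])

print(varied_alone(loudness_table, 1))       # 38  ('saturation varied in 38/72')
print(remained_constant(loudness_table, 0))  # 55  ('hue remained constant in 55/72')
```

The same two helpers reproduce every single-dimension and constant-dimension figure quoted for the pitch and combined-variation tables.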


Table 8. Overall results for sequences with linear (ascending or descending) variation in both pitch and loudness.

  Hue  Sat.  Value |  Low  Mid  High  Total
   ✗    ✗     ✗   |   0    0    1      1
   ✓    ✗     ✗   |   1    0    5      6
   ✗    ✓     ✗   |   1    5    4     10
   ✗    ✗     ✓   |   3    0    2      5
   ✓    ✓     ✗   |   1    0    1      2
   ✓    ✗     ✓   |   1    0    0      1
   ✗    ✓     ✓   |   7    8    2     17
   ✓    ✓     ✓   |   2    3    1      6
  Total           |  16   16   16     48

Table 9. Overall results for sequences with descending loudness (linear) and ascending pitch (linear).

  Hue  Sat.  Value |  Low  Mid  High  Total
   ✗    ✗     ✗   |   0    0    1      1
   ✓    ✗     ✗   |   0    0    2      2
   ✗    ✓     ✗   |   1    2    1      4
   ✗    ✗     ✓   |   0    0    1      1
   ✓    ✓     ✗   |   0    1    1      2
   ✓    ✗     ✓   |   1    0    0      1
   ✗    ✓     ✓   |   5    5    1     11
   ✓    ✓     ✓   |   1    0    1      2
  Total           |   8    8    8     24

In order to address this issue we examined the subjects' responses for the third level of sequence complexity, i.e. sequences with negatively correlated levels of loudness and pitch. Tables 9 and 10 contain the results for these two cases. The results suggest that in both cases saturation and value were again the varying colour dimensions (hue remained constant in 17/24 and 16/24 sequences respectively). Table 11 (left side) breaks down the results for this colour selection strategy in sequences where loudness descended linearly from loud to soft and pitch followed the reverse pattern. The dominant strategy was to decrease saturation and increase value levels. Based on these results we can suggest that saturation and value predict loudness and pitch respectively. However, this would hold only if the same applied to sequences where pitch descended linearly from high to low and loudness followed the reverse pattern. This is clearly demonstrated by the results shown in Table 11 (right side).

Table 10. Overall results for sequences with ascending loudness (linear) and descending pitch (linear).

  Hue  Sat.  Value |  Total
   ✗    ✗     ✗   |    0
   ✓    ✗     ✗   |    3
   ✗    ✓     ✗   |    3
   ✗    ✗     ✓   |    3
   ✓    ✓     ✗   |    2
   ✓    ✗     ✓   |    1
   ✗    ✓     ✓   |   10
   ✓    ✓     ✓   |    2
  Total           |   24

Table 11. (Left) Breakdown of results for the saturation-value strategy based on the results in Table 9. (Right) Breakdown of results for the saturation-value strategy based on the results in Table 10.

  Left (Table 9 breakdown):
  Saturation  Value |  Low  Mid  High  Total
      ↑         ↑   |   1    1    0      2
      ↑         ↓   |   1    1    0      2
      ↓         ↓   |   0    0    0      0
      ↓         ↑   |   3    3    1      7
  Total             |   5    5    1     11

  Right (Table 10 breakdown):
  Saturation  Value |  Low  Mid  High  Total
      ↑         ↑   |   0    0    0      0
      ↑         ↓   |   4    3    0      7
      ↓         ↓   |   0    1    2      3
      ↓         ↑   |   0    0    0      0
  Total             |   4    4    2     10

Table 12. Overall results for sequences with non-linear variation in both loudness and pitch.

  Hue  Sat.  Value |  Low  Mid  High  Total
   ✗    ✗     ✗   |   1    0    0      1
   ✓    ✗     ✗   |   1    0    0      1
   ✗    ✓     ✗   |   0    1    2      3
   ✗    ✗     ✓   |   1    0    0      1
   ✓    ✓     ✗   |   0    0    3      3
   ✓    ✗     ✓   |   0    0    0      0
   ✗    ✓     ✓   |   2    4    3      9
   ✓    ✓     ✓   |   3    3    0      6
  Total           |   8    8    8     24

Table 13. Compiled results for all the sequences in which hue remained constant.

  Hue pair      |  Low  Mid  High  Total
  Red-Yellow    |  16   20   28     64
  Green-Cyan    |  20   28   10     58
  Blue-Magenta  |  26   19    8     53
  Total         |  62   67   46    175

The results for the fourth level of complexity are shown in Table 12. Once again, subjects varied saturation and value in response to the non-linear simultaneous variation in both pitch and loudness. As can be seen in Figures 8, 9, and 10, the variations in saturation and value seem to match the variations in loudness and pitch respectively; however, the complexity of the sequences clearly affected the accuracy of colour selections.

Finally, we examined the colour selection strategy that involved no variation in any colour dimension, i.e. subjects selected the same colour for all the tones in the sequence despite the variation in loudness and/or pitch. These results are shown in the first row of each of the tables discussed above (except Table 11). Summing up these figures results in six such cases which, surprisingly, belong only to the subject that failed the Ishihara test. However, since there was only one colour-blind subject, this observation has no statistical significance.

As previously mentioned, the vast majority of subjects kept hue at constant levels. It is of further interest to examine to what extent these levels are related to sound characteristics. In Table 13 we have compiled the results for all the sequences in which hue remained constant. The hues are organised in pairs, in the same order as they appear in the HSV colour space. For sequences of low frequency tones, chosen hues appear to fall most often in the blue-magenta region. For sequences of middle frequency tones, chosen hues appear to fall most often in the green-cyan region. Finally, for high frequency tones, chosen hues appear to fall most often in the red-yellow region. Therefore, although hue does not seem to have any immediate effect on pitch-colour selections, there seems to be an effect in terms of general frequency ranges. This means that subjects associated hue with certain frequency ranges and varied value with the various frequencies that fall within those ranges.

Summary and conclusions

Based on the above analysis and discussion we can argue that the loudness and pitch of pure tones can be predicted by saturation and light intensity respectively (see Fig. 2). In general, quiet tones were associated with low levels of saturation while louder tones were associated with increasing levels of saturation. Furthermore, low-pitched tones evoked dark (low levels of light intensity) colour selections while high-pitched tones were associated with lighter colours. Hue was not found to have any immediate association with pitch or loudness. However, the experimental results suggest an association between hue and certain sound frequency ranges. Finally, our experimental design suggests that the use of a three-dimensional colour space can provide a more useful framework for the investigation of auditory-visual associations than the single-dimension scales used by previous studies.

Figure 2. Proposed space for the associations between pitch-light intensity and loudness-saturation. Saturation runs from weak to strong along one axis, loudness from quiet to loud along the other; a high-pitched quiet sound corresponds to a light colour weak in saturation.

Figure 3. Trends for each colour dimension based on the results in Table 6.

Figure 4. Trends for each colour dimension based on the results in Table 7.

Figure 5. Trends for each colour dimension based on the results in Table 8.

Figure 6. Trends for each colour dimension based on the results in Table 9.

Figure 7. Trends for each colour dimension based on the results in Table 10.

Figure 8. Trends for each colour dimension based on the results in Table 12 (low frequency).

Figure 9. Trends for each colour dimension based on the results in Table 12 (mid frequency).

Figure 10. Trends for each colour dimension based on the results in Table 12 (high frequency).

Further work

We should point out that this research did not attempt to derive perceptual scales for the associations between auditory and visual dimensions. Developing such scales would require rigorous psychophysical experiments; such experiments would be a natural extension of the research we describe here. A limitation of our experiment is that it focused on pure tones. However, in order to design effective sound design tools we must deal with the dimension of timbre. As we discussed earlier, the perception of timbre is a more complex and multidimensional phenomenon. Recently, visual texture has proven effective in the visualization of multidimensional data sets (e.g., Ware & Knight, 1992; Healey & Enns, 1998). In another study (Giannakis & Smith, 2000), we have also identified a number of important similarities between timbre and visual texture that suggest further investigation of the potential cognitive associations between these sensory percepts.


Acknowledgements


We would like to thank Dr. John Dack and his associates at the Sonic Arts Dept. (Middlesex University) for their support in carrying out our experiment. Furthermore, we wish to thank Prof. Ann Blandford for her feedback on drafts of this chapter. This research is funded by a Middlesex University research studentship.

Notes

1. The CIELUV and CIELAB colour spaces are based on a system of colorimetry developed by the Commission Internationale de l'Eclairage (CIE). The Hue, Saturation, Lightness (HSL) and Hue, Saturation, Value (HSV) colour spaces are based on the perceptual dimensions of colour described above and are widely used in computer systems due to their ease of implementation. Finally, the Natural Colour System (NCS) is based on the opponent theory of colour that is described later in this chapter. For a more detailed description of these colour spaces, see Jackson et al. (1994).

2. For a more detailed description of these visual representations of sound, see Roads (1996).

References

Barrass, S. (1997). Auditory Information Design. PhD Thesis. Canberra: The Australian National University.

von Bismarck, G. (1974). Timbre of Steady Sounds: A Factorial Investigation of its Verbal Attributes. Acustica, 30, 146-159.

Caivano, J. L. (1994). Colour and Sound: Physical and Psychophysical Relations. Color Research and Application, 19(2), 126-132.

Fairchild, M. D. (1998). Colour Appearance Models. Reading, Massachusetts: Addison-Wesley Longman.

Fortner, B. and Meyer, T. (1997). Number by Colors. New York: Springer-Verlag.

Giannakis, K. and Smith, M. (2000). Towards a Theoretical Framework for Sound Synthesis based on Auditory-Visual Associations. In Proceedings of the AISB 2000 Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science (pp. 87-92). University of Birmingham, UK.

Grey, J. M. (1975). Exploration of Musical Timbre. PhD Thesis, Report No. STAN-M-2. Stanford, California: CCRMA, Stanford University.

Healey, C. and Enns, J. (1998). Building Perceptual Textures to Visualize Multidimensional Datasets. In Proceedings of IEEE Visualization 1998 (pp. 111-118). Los Alamitos, California: IEEE Computer Society Press.

Jackson, R. et al. (1994). Computer Generated Colour: A Practical Guide to Presentation and Display. Chichester: John Wiley & Sons.

Leman, M. (1993). Symbolic and Subsymbolic Description of Music. In G. Haus (Ed.), Music Processing. Oxford: Oxford University Press.

Lesbros, V. (1996). From Images to Sounds, A Dual Representation. Computer Music Journal, 20(3), 59-69.

Marks, L. E. (1997). On Colored-Hearing Synesthesia: Cross-Modal Translations of Sensory Dimensions. In S. Baron-Cohen and J. E. Harrison (Eds.), Synaesthesia: Classic and Contemporary Readings (pp. 49-98). Oxford: Blackwell Publishers Ltd.

McAdams, S. (1999). Perspectives on the Contribution of Timbre to Musical Structure. Computer Music Journal, 23(3), 85-102.

Padgham, C. (1986). The Scaling of the Timbre of the Pipe Organ. Acustica, 60, 189-204.

Plomp, R. (1976). Aspects of Tone Sensation. London: Academic Press.

Roads, C. (1996). The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press.

Russell, (1995). PowerSynthesiser Manual. Brighton: University of Sussex.

Slawson, W. (1985). Sound Color. Berkeley, California: University of California Press.

Ware, C. and Knight, W. (1992). Orderable Dimensions of Visual Texture for Data Display: Orientation, Size, and Contrast. In Proceedings of the 1992 CHI ACM Conference on Human Factors in Computing Systems (pp. 203-209). New York: ACM Press.

Wenger, E. (1998). Metasynth Manual. U & I Software (http://www.uisoftware.com/).


II

Performance and Composition

Rolf Inge Godøy and Harald Jørgensen

The focus in the second part of this book is on applications of musical imagery. Musical imagery is an integral element of composition and related activities, such as improvisation and arranging, and of most kinds of performance, be that instrumental, vocal or in conducting. Anecdotes of the way outstanding composers and performers have worked with musical imagery are many and remarkable; however, the challenge now is to work towards more systematic theoretical and practical knowledge of how to generate and/or enhance images of musical sound in various tasks. This is a long-term and admittedly rather immodest goal, but if such a project is successful, these systematic methods of musical imagery could have important consequences for the daily work of many musicians as well as for music education at all levels.

Much of what we know about musical imagery in performance and compositionis from indirect sources in the sense that we have to deduce what is going on in theminds of the performers and composers from their musical practice. This is similar towhat we saw in the contributions in the first part of this book, i.e. that we often haveto deduce information about musical imagery from the perceptual process. One primecase of this is expressivity in performance and musical imagery. 'Expressivity' hasnow become a common term for denoting a live, 'human' performance of a musicalwork as opposed to a flat, 'machine-like' performance, and includes elements such asvariations in tempo and timing, various accents and other dynamic nuances, phrasing,use of vibrato and/or tremolo, various timbral inflections, etc., in sum anything whichattests to a 'human touch' in an actual performance of a notated work of music.


Furthermore, expressivity belongs to 'performance tradition' in the sense that Western notation is at best quite sketchy with regard to such qualities. Western art music is in fact dependent upon an 'oral' transmission of performance tradition, as is the case with the more or less unbroken 'apostolic' succession from one virtuoso to another from the first half of the nineteenth century to our own times, telling us how the works of Chopin, Liszt, Schumann, etc. should be performed. This tradition of not notated, yet well known, expressivity is in fact a case of musical imagery, and the contribution of Bruno Repp at the beginning of this part of the book (chapter 10) is devoted to exploring images of expressive timing in musical imagery. The chapter reviews related research and, in addition, reports the results of recent experiments by the author. One of the conclusions in this chapter is that expressive timing is indeed integral to musical imagery, and that it is also closely related to the musician's motor intentions.

Trying to establish links between musicians' intentions, internal images and features of the resultant sounds is also the topic of the contribution of Wolfgang Auhagen and Victor Schoner (chapter 11). They explore the possibilities of associating timbral images with certain linguistic metaphors musicians use to designate various timbral qualities. A mapping of metaphors onto qualities of the resultant sounds is carried out by analysing various prominent features in the acoustic signal. The authors conclude that it is indeed possible to establish correlations between certain commonly used verbal attributes of timbre and properties of the musical signal, and that musicians employ a kind of feedback loop to adjust their playing technique until the desired sonorous results are obtained.

It seems inevitable that musicians somehow make previews of what to do next, as shown in this study about timbre, and in general when performing longer stretches of music. Many accounts, ranging from anecdotes of how performers visualize an entire piece of music 'all at once', to studies of the preparation of performance movements at smaller time-scales, all seem to indicate that there are various kinds of overviews and preparations in the minds of musicians. This is the subject of Tellef Kvifte's contribution (chapter 12), where he discusses various images of musical form, exemplified with Norwegian Hardingfiddle music. This is a topic situated between performance and composition, because performers of Norwegian Hardingfiddle music usually put together a number of tune fragments in a performance, so that each performance of any well-known tune may be a novel concatenation of fragments. It has often been claimed that grouping information in hierarchies facilitates memory tasks and is hence also an integral element in images of musical form. An analysis of actual performances of Norwegian fiddlers' tunes, as well as interviews with fiddlers, contradicts this view, showing that networks or chains of small tune fragments, allowing variable concatenations, are equally possible. Hence, images of large-scale forms in this folk music tradition are flexible yet robust at the same time.

Also situated between performance and composition, the contribution of Rolf Inge Godøy (chapter 13) tries to show how motor images can play an important role in generating and enhancing images of music at various levels of resolution, i.e. both at the meso- and macro-levels of phrases, sections and even whole movements, as well as at the micro-levels of individual sounds. In addition to images of sound-producing actions, there are the many ecologically conditioned images of excitation and resonance, meaning knowledge of both the material properties and the behaviour of various instruments (and other physical objects not considered musical instruments in the usual sense), knowledge which can be used to generate, enhance and differentiate images of musical sound in our minds.

Images of the various components of sound-production can be applied to both performance and composition. However, as can be seen from the contribution of Rosemary Mountain (chapter 15), composers employ many strategies to generate images of musical sound. The bag of tips and tricks is rather heterogeneous, ranging from graphical sketches to spreadsheets and algorithms, and makes it difficult to conclude otherwise than that composers' strategies are as diverse as the music they compose. However, common to most composers is that they are creating illusions, and that these illusions often relate to objects, phenomena, events, etc. of the 'real world'. Furthermore, principles of grouping and segregation usually play important roles even in the most abstract cases of sonic design.

One such type of sonic design encountered in Western art music is that which relates to the keyboard. This is the main theme of James Baker's contribution (chapter 14), where he starts out with the observation that musicians often tend to make the same movements when asked to imagine music as they use when really playing, attesting to a close relationship between musical sound and its generation. This relationship is here explored in various works of music of the past three centuries, i.e. during a period when the keyboard has been perhaps one of the most important tools for compositional work. This 'paradigm' of the keyboard in composition is seen in relation to contemporary ideas of embodied cognition, as well as to recent research which points to the crucial role of instrumental practice in the memory for music.

Finally, with the contribution of Lewis Rowell (chapter 16), we encounter in the musical imagery of India a more 'holistic' view, as musical imagery is seen in relation not only to performance and composition, but to philosophy and cosmology as well. The notions of musical imagery depicted here belong to an oral tradition, and notably so, a tradition with a highly elaborate conceptual apparatus for designating sonorous qualities. The links to Indian cosmology may perhaps seem strange to most Westerners; however, these concepts may also be regarded as a practical support for memory, similar in function to the repertoire of metaphors used in Western music to denote various features when we speak about and imagine musical sound. This presentation of Indian notions of musical imagery is a fitting ending to this book, not only as a counterbalance to our frequent ethnocentric ways of thinking, but also because it leads us back to the basic 'philosophy of mind' issues which were presented at the beginning of this book, issues which we believe will invariably be part of the study of musical imagery.


10

Expressive Timing in the Mind's Ear

Bruno H. Repp

Introduction

Musical imagery, defined here as the vivid imagination of musical sounds that are not physically present at the moment, may occur in at least four different real-life situations. First, a composer may imagine novel music without the aid of notation or a musical instrument. Such creative imagery is likely to be fragmentary and exploratory in nature (see Mountain, this volume). Second, a musically literate person may imagine music as he or she is reading an unfamiliar score. This ability varies widely even among professional musicians (see Brodsky, Henik, Rubinstein, & Zorman, 1999), and in most cases the resulting imagery is probably incomplete and at a much slower tempo than the one intended by the composer. Third, a previously heard or imagined piece of music may be recalled from memory, with or without the aid of a score. Although this process also is often fragmentary and discontinuous, it can be carried out at approximately the correct tempo and in a continuous fashion if the memory is strong enough and if attention is devoted to the task. Fourth, a musician may use musical imagery during performance to achieve the desired sound and expression, and to compare the musical image with the immediate feedback received from the instrument. This imagery must of course be continuous and at the correct tempo, though it can be intermittent or absent when performance or listening is on automatic pilot, as it were. An involved listener thoroughly familiar with a piece of music may engage in an analogous process of what Levinson (1997, p. 16) has called 'vivid anticipation'.

It may be methodologically difficult, however, to distinguish this form of anticipatory imagery from the perception of the simultaneously occurring sounds.

This chapter is concerned mainly with the third of these situations ('playing back' a familiar piece from memory at an appropriate tempo) and to some extent with the fourth (imagery during performance). Moreover, it addresses only one aspect of musical imagery and that is its timing. Specific properties of the imagined musical sounds, such as their vividness, pitch, timbre, or intensity, will not be dealt with here. The question of interest is whether imagined music is (or can be) expressive in the same way that a good performance is expressive, and specifically whether the musical image is (or can be) paced in the same way as an expressive performance.

Expressive timing consists in a continuous modulation of the local tempo of a performance, which is most obvious in compositions from the Romantic period. This modulation has three largely independent aspects: First, there is the mean value around which the modulation occurs, which corresponds roughly to the basic or global tempo. Second, there is the magnitude of the modulation, which may be measured either in absolute terms by a standard deviation or in relative terms by a coefficient of variation (the standard deviation divided by the mean). Third, there is the specific pattern of the modulation, which is the focus of attention in the present research. The expressive timing pattern is constrained but not fully determined by the musical structure, and different patterns are possible for the same music, although a typical timing pattern can usually be identified by analyzing large samples of performances (Repp, 1998a).
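As a concrete illustration (a sketch with invented IOI values, not data from this chapter), the first two aspects can be computed directly from a series of inter-onset intervals:

```python
# Sketch: the quantitative aspects of expressive timing modulation, computed
# from a series of inter-onset intervals (IOIs) in milliseconds. The IOI
# values below are hypothetical, with a lengthening at the end of a phrase.

def timing_stats(iois):
    n = len(iois)
    mean = sum(iois) / n  # mean IOI, corresponding roughly to the global tempo
    sd = (sum((x - mean) ** 2 for x in iois) / n) ** 0.5  # absolute modulation
    cv = sd / mean        # relative modulation (coefficient of variation)
    return mean, sd, cv

iois = [500, 510, 495, 520, 540, 610, 500, 505]  # hypothetical values
mean, sd, cv = timing_stats(iois)
```

The third aspect, the modulation pattern itself, is simply the IOI series considered position by position; it is this pattern that the correlations reported below compare across performances.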

To the best of the author's knowledge, there are no previous studies of musical imagery that have examined whether imagined music is expressive, and specifically whether it is expressively timed. In fact, the question may not even have been raised previously. However, several authors have attempted to determine whether music is imagined at the same global tempo as it is performed. Some of these studies compared keyboard performance with and without auditory feedback. Playing on a silent keyboard clearly requires musical imagery, though not necessarily to a greater extent than normal performance does; the imagery may just be more conscious in the absence of heard sound. More than a century ago, Ebhardt (1898) observed that pianists played more slowly when the piano action was separated from the strings, but more recent studies with electronic keyboards by Gates and Bradshaw (1974), Finney (1997), and Repp (1999c) have not found any significant tempo difference. Gabrielsson and Lindström (1995) asked musicians to play simple tunes on a keyboard and then to tap rhythmically on a sentograph (a silent pressure-sensitive button) while imagining the tunes. They, too, found only a nonsignificant tendency for the sentograph tapping to be slower than the keyboard performances. Clynes and Walker (1982) report an experiment in which musicians were asked to repeatedly play or merely imagine ('think') various short pieces. The imagined performances were found to be significantly slower and also more variable in tempo than the actual performances. However, Halpern (1988) observed that the tempi of imagined songs were not significantly slower than the preferred tempi for the same songs in an adjustment task, though she did find a regression to the mean, with slow songs being imagined faster and fast songs being imagined slower than their preferred versions.

In his book, Sentics: The Touch of the Emotions, Clynes (1977) reports informal experiments with several famous musicians in which they were asked to tap rhythmically on a sentograph while imagining works by different composers. Analysis of the averaged sentographic pressure curves revealed different shapes for different composers, which lent support to Clynes's theory of a composer's distinctive 'inner pulse'. These findings demonstrate that musical imagery can give rise to characteristic motor kinematics, but they do not imply that the timing of the finger taps imitated the expressive timing patterns of performances; on the contrary, the finger taps were paced by a regular visual signal and thus must have been fairly evenly timed.

One study that came close to addressing the question of expressive timing is the aforementioned one by Gabrielsson and Lindström (1995), in which musicians were asked to play simple tunes on a keyboard and then to tap out the rhythmic patterns of the tunes on a sentograph with the intention of conveying specific emotions. The timing patterns in the two conditions turned out to be similar, which provides evidence that the different emotional performance styles were preserved in the musical imagination, particularly with regard to their timing. However, Gabrielsson and Lindström were mainly concerned with gross differences in tempo and rhythm between contrasting emotional styles, not with the subtle temporal inflections of typical expressive performance.

Of course, one might ask: Why should imagined familiar music not have all the expressive features of a real performance, including its expressive timing? After all, listeners are exposed only to expressive performances, and musicians normally produce only expressive performances, so that one may reasonably assume that this is the format in which music is stored in people's memories. Certainly, musicians must be able to generate expressive intentions from long-term memory in order to play expressively, and experienced listeners presumably derive expectations from a similar memory representation in the process of appreciating the expressive details of a performance. However, psychologists and music theorists commonly assume that what musicians and listeners store in memory is a categorized and schematic musical structure similar to what can be seen in a printed score, and that expressive nuances are generated at the time of performance or listening from implicitly learned rules and conventions that are applied to the stored structure (see, e.g., Clarke, 1985, 1988; Todd, 1985). Such a theoretical division between abstract representation and concrete realization, reminiscent of Chomsky's well-known competence-performance distinction in linguistics, leaves open the theoretical possibility of not applying the expressive rules to the remembered structures and of imagining (or indeed producing) a deadpan performance. Therefore, it may be wrong to assume that imagined music is automatically and necessarily expressive. Rather, the expressiveness of an imagined performance, as of a real performance, may well be under conscious strategic control and may depend on the requirements of a particular task. Thus it makes sense to ask whether an imagined performance even can be as expressively timed as a real performance. The remainder of this chapter summarizes some suggestive findings from the author's research.

Evidence for expressive timing in imagined music

The relevant empirical evidence comes from several recent experiments on timing perception or production that included conditions in which musical imagery was required (Repp, 1998c, 1999b, in press). These experiments all made use of the same musical excerpt, the opening of Chopin's Etude in E major, op. 10, No. 3, which is shown in Figure 1 on the facing page. The final chord, which is not in the original music, was added to give the excerpt maximal closure. The melody, in the highest voice, is divided into a number of segments or rhythmic groups, each ending with a long note, as indicated above the score. An accompaniment in continuous sixteenth-note values is provided by the alto voice. The lower voices, played by the left hand, provide a rhythmic and harmonic underpinning. With the exception of the initial upbeat (an undivided eighth note), there is at least one note onset at every sixteenth-note metrical subdivision, so that the timing of a performance of the excerpt can be described in terms of nominally equal inter-onset intervals (IOIs), as indicated below the score.

A typical performance of the excerpt lasts about 20 seconds and contains large expressive timing modulations. The graph below the musical score represents a typical expressive timing pattern (or timing profile), which was obtained by averaging the timing patterns of performances by 18 advanced student and amateur pianists (Repp, 1999a). The pattern represents the average durations of the IOIs between successive 'primary' tones, defined as the highest-pitched tones in the sixteenth-note positions in the score. (The initial eighth-note upbeat is not included in the graph or in any of the following comparisons and correlations.) This average timing pattern is quite representative of the individual performances in the sample; radical departures from the typical pattern are observed only in performances of some experienced concert pianists (Repp, 1998a). For further discussion of this typical timing pattern, which is also aesthetically pleasing, see Repp (1997, 1998a, 1998b).

The first set of results relating to imagery comes from a study in which 6 skilled pianists were requested to carry out a number of different tasks, each of which was repeated 10 times in immediate succession (Repp, 1999b). Variation across repetitions was small, and therefore the data were averaged over all repetitions, separately for each pianist. The first task was normal expressive performance of the Chopin excerpt on a digital piano (Roland RD-250s). The data were recorded in MIDI format on a Macintosh Quadra 660AV computer using MAX software. Figure 2 on page 190 compares the average expressive timing pattern of each pianist (heavy line) with the typical timing pattern (thin line), previously shown in Figure 1. The correlations between the profiles are shown as well. It is evident that all individual patterns were highly similar to the typical pattern, although there were individual differences in some details and in the magnitude of the timing modulation. One pianist (T.C.), a specialist in 20th century music, unexpectedly produced a rather flat timing profile, and a second pianist (M.S.) also showed somewhat reduced timing modulation, probably due to a rather fast tempo. The other four pianists, however, produced large expressive timing modulations, as expected.
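The profile comparisons just described reduce to a Pearson correlation between two IOI series. A minimal sketch, with invented profile values rather than the chapter's data:

```python
# Sketch: correlating an individual timing profile with the typical (average)
# profile, as in the comparisons reported for Figure 2. All IOI values (ms)
# are hypothetical.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# The typical profile is the position-by-position mean of several profiles:
profiles = [
    [510, 530, 500, 600, 520],  # hypothetical pianist A
    [490, 540, 510, 650, 500],  # hypothetical pianist B
]
typical = [sum(col) / len(col) for col in zip(*profiles)]
r = pearson_r(profiles[0], typical)
```

A high r indicates that the individual modulates tempo at the same places and in the same direction as the typical profile, regardless of the absolute tempo or the overall size of the modulation.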

Figure 1. Top: The opening of Chopin's Etude in E major, op. 10, No. 3, with melodic-rhythmic groups and nominal inter-onset intervals (IOIs) indicated. Bottom: A typical expressive timing profile. [Reproduced from Repp (1999a: Fig. 1) with permission of The Psychonomic Society.]

The second task in the experiment again required expressive performance, but with the sound of the digital piano turned off. Although auditory imagery may well be involved in normal expressive performance, it seems more essential in silent performance. Figure 3 on page 191 compares the expressive timing patterns produced with (heavy line) and without (thin line) auditory feedback. They were extremely similar. Only the final IOI was substantially shortened by two pianists. It can also be seen that there were no consistent differences in overall tempo between the two tasks. This result is in agreement with several other studies that have compared keyboard performance with and without auditory feedback (Gates & Bradshaw, 1974; Finney, 1997; for additional detailed analyses of the present data, see Repp, 1999c), and it suggests that auditory imagery can effectively pace expressive performance. However, the pianist's physical interaction with the keyboard may also contribute to the expressive timing pattern.

Figure 2. Average timing profiles of six pianists in Task 1 (expressive performance), compared with the typical timing profile (from Fig. 1). [Reproduced from Repp (1999b: Fig. 2) with permission of The Helen Dwight Reid Educational Foundation (Heldref Publications).]

Figure 3. Average timing profiles of six pianists in Tasks 1 (normal expressive performance) and 2 (expressive performance without auditory feedback). [Reproduced from Repp (1999b: Fig. 3) with permission of The Helen Dwight Reid Educational Foundation (Heldref Publications).]

The third task of the experiment required the pianists to tap with the index finger on a quiet response key (the 'enter' key of the computer keyboard) in synchrony with every sixteenth note of one of their own expressive performances of the Chopin Etude excerpt. However, it is the fourth task which is of particular interest here. Here the pianists were asked to tap in synchrony with an imagined expressive performance of the excerpt. In this condition, not only was there no sound, but the physical interaction with the piano keyboard was also absent. This, then, was a pure musical imagery condition, and the finger taps reflected the temporal unfolding of the auditory image. Figure 4 on page 192 compares the pianists' tap IOIs (thin line) with the timing profiles of their silent performances in the second task (heavy line). The four pianists who had played with much expressive timing (top and center panels) were able to imagine and tap out their expressive timing patterns quite well, though not with perfect accuracy. By contrast, the two pianists who had played with less expressive timing (bottom panels) tapped almost metronomically. (The overall correlations for these two pianists are misleading because they reflect mainly the lengthened final or initial IOIs; they are much lower when these IOIs are omitted.) It may be that, even in their expressive performances, these two pianists intended to play metronomically, and that the small timing nuances in their performances were an unintended byproduct of their physical interaction with the piano keyboard (cf. Drake & Palmer, 1993; Penel & Drake, 1998; Repp, 1999b, 1999d). Thus their deadpan musical imagery may not reflect a general inability to imagine music with expression but rather may be a reflection of their expressive intentions in performances of this specific musical excerpt. Of course, there remains also the possibility that these two pianists had somehow misunderstood the instructions. In that case, their results indicate that musical imagery is not automatically expressive. Indeed, the author (B.R.) as a participant felt that the task required considerable attention. It may be easier, after all, to imagine a deadpan performance.

Figure 4. Average timing profiles of six pianists in Tasks 2 (expressive performance without auditory feedback) and 4 (tapping in synchrony with an imagined expressive performance). [Reproduced from Repp (1999b: Fig. 4) with permission of The Helen Dwight Reid Educational Foundation (Heldref Publications).]

In fact, conditions in the second half of the experiment required the participants to do just that (see Repp, 1999b).

Two further tasks from the first half of the same experiment are pertinent to musical imagery. First, as in the third task, the pianists were required to tap their finger in synchrony with an expressive performance of the Chopin excerpt (Task 5). However, here the model performance was computer-generated and had a typical timing pattern (based on performances from nine pianists) very similar to the one shown in Figure 1. Figure 5 on the facing page compares the model timing pattern (heavy line) with the tap timing pattern of each pianist (thin solid line) and also shows the correlations, labeled '(5)', between these two patterns. After only three practice trials, not included in the figure, all pianists were quite successful in anticipating the timing variations in the model, even though there was a general tendency to underestimate long IOIs and to overshoot the following IOIs in compensation. What is especially noteworthy is that at least one of the two pianists who did not seem to have intentions or imagery of expressive timing (bottom panels) was as accurate as the others in anticipating what to him must have seemed unusually large timing variations. To the extent that musical imagery was involved in this task, and there seems little doubt that it was, at least as far as timing is concerned, the image presumably was based on a memory of the repeatedly heard model performance, or more likely on a conflation of that memory with the individual pianist's preferred timing pattern.

Figure 5. Expressive model timing profile (heavy line) and average tap timing profiles of six pianists in Tasks 5 (synchronization with music) and 6 (synchronization with clicks). [Reproduced with slight modifications from Repp (1999b: Fig. 11) with permission of The Helen Dwight Reid Educational Foundation (Heldref Publications).]

The final task of the first half of the experiment (Task 6) was again a pure musical imagery task. Here the pianists had to synchronize their finger taps with a series of 'clicks' (actually, very high-pitched digital piano tones) while imagining the music in synchrony with the clicks. The clicks followed exactly the same expressive timing pattern as the music in the preceding task. As can be seen in Figure 5 (dotted line, '(6)' correlations), the pianists were about as accurate in this task as in the preceding one. Thus they were able to anticipate the timing pattern of the clicks to a considerable extent by generating an appropriately timed musical image from memory. It seems unlikely that they would have been able to maintain this level of synchronization accuracy without imagining the music.

A more recent synchronization experiment using the same musical excerpt (Repp, in press) included a baseline condition in which click sequences were presented initially without musical imagery. Four different timing patterns were presented: three very different patterns (T1, T2, T4) derived from an analysis of a large sample of expert performances (Repp, 1998a), and a random pattern (R1), generated by scrambling the IOIs of the T1 pattern. Twelve musically trained undergraduate students, mostly string instrument players, participated. They tried to tap in synchrony with the click sequences instantiating the different timing patterns, without having been told about the music. There were 10 successive trials for each pattern. Subsequently, the participants tapped in synchrony with the same click sequences, which now were superimposed on the identically timed Chopin excerpt. The presence of the music was expected to facilitate synchronization with the musical patterns, but not with the random pattern. Finally, the participants again tapped to the click sequences without hearing the music, but with instructions to imagine the music in synchrony with the clicks.

The results are summarized in Figure 6 on the next page in terms of an index of anticipation accuracy, rO*, which ranges from 0 to 1. The index represents the correlation between the model timing pattern and the tap timing pattern, corrected for a hypothetical minimum correlation that would obtain if the taps tracked rather than anticipated the model pattern (see Michon, 1967). Somewhat surprisingly, the musical T2 pattern was more difficult to synchronize with than the random pattern, R1. However, as predicted, anticipation accuracy improved substantially for all three musical patterns when the music was added to the clicks, whereas the R1 pattern showed only a small improvement, probably due to general task practice. Moreover, the level of performance achieved in the music condition was essentially maintained in the imagery condition, at least for the T1 and T4 patterns. These results indicate that synchronization accuracy is similar when music is present and when it is merely imagined, and they also demonstrate that synchronization with at least some musical timing patterns is facilitated in both conditions, relative to a condition in which music is neither heard nor imagined. (Unfortunately, a subsequent replication of this experiment with a modified design did not yield clear evidence of imagery; see Repp [in press].)
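The correction behind rO* is only described verbally here, so the following sketch is a plausible reconstruction rather than Michon's (1967) actual formula: the tracking baseline is assumed to be a tapper who merely repeats each model IOI one event late, and the raw correlation is rescaled against that baseline. Both the lag-1 baseline and the sample IOI pattern are assumptions introduced for illustration.

```python
# Sketch of an anticipation-accuracy index in the spirit of rO*.
# ASSUMPTION: 'tracking' is modelled as reproducing each model IOI one event
# late (lag 1); Michon's (1967) actual correction may differ in detail.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def anticipation_index(model, taps):
    r = pearson_r(model, taps)
    tracked = model[:1] + model[:-1]      # hypothetical lag-1 'tracking' taps
    r_track = pearson_r(model, tracked)   # minimum correlation from tracking
    return max(0.0, (r - r_track) / (1.0 - r_track))

model = [500, 520, 480, 600, 510, 530, 470, 590]  # hypothetical IOI pattern (ms)
```

With the baseline removed, the index is 0 for taps that merely follow the pattern one event late and 1 for perfect anticipation, matching the 0-to-1 range described above.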

Figure 6. Average indices of anticipation accuracy (rO*) for four different timing patterns in three conditions: clicks only, clicks accompanied by music, and clicks accompanied by musical imagery. [Data from Repp (in press).]

The same experiment also included completely isochronous stimulus sequences with IOIs of 500 ms. Even though they served mainly as a practice condition, they yielded another very interesting result pertaining to imagery, shown in Figure 7 on page 196. The constant IOIs of the stimulus sequence are represented by the dotted horizontal line in each panel. The data points with double standard-error bars represent the average IOIs between the participants' synchronized finger taps. In the top panel are the results for clicks without music. After an initial 'tuning in' to the sequence tempo, the tap IOIs did not deviate significantly from 500 ms, as expected. The center panel shows the results for isochronous clicks accompanied by isochronous music. Here, as observed in several earlier studies in which participants had tapped in synchrony with isochronous music, albeit without superimposed clicks (Repp, 1999a, 1999b, 1999d), there was a systematic pattern of deviations from isochrony in the tap IOIs. That pattern bears some relation to the musical structure and to the typical expressive timing pattern (Fig. 1); it is believed to represent a combination of typical expressive tendencies and automatic error correction processes (see, e.g., Pressing, 1998). The result of principal interest here is shown in the bottom panel of the figure. This is the condition in which the participants tapped in synchrony with an isochronous click sequence while only imagining the music. Here, too, significant deviations from regularity occurred in the tap timing. The deviations were smaller but extremely similar in pattern to those produced when the music was heard (r = .91, or .84 if the initial three data points are omitted). These results were replicated in the second experiment of Repp (in press). Thus, musical imagery can not only facilitate synchronization with appropriate expressive timing patterns, but it can also interfere with synchronization with a mechanically regular sequence by introducing involuntary and subconscious timing modulations in the motor activity.

Figure 7. Average tap timing profiles (with double standard errors) for synchronization with an isochronous timing pattern in three conditions: (a) clicks only; (b) clicks accompanied by music; and (c) clicks accompanied by musical imagery. [Data from Repp (in press).]

Finally, one result should be mentioned that failed to show an effect of musical imagery. The author has conducted many perceptual studies in which participants were required to detect small hesitations (lengthenings of single IOIs) in an otherwise isochronous rendition of a musical excerpt (Repp, 1992, 1998b, 1998c, 1998d, 1999a, 1999d). The detectability of these timing perturbations always varied greatly with position in the music, as shown for the Chopin Etude excerpt by the filled circles and solid line in Figure 8 on the next page. These variations were found to be strongly related to the typical expressive timing pattern for the music (Fig. 1), such that a slight lengthening of an IOI was difficult to detect when that IOI tended to be lengthened in typical expressive performances. In one of these earlier experiments (Repp, 1998c: Exp. 2), participants were given the task of detecting similar deviations from isochrony in a sequence of clicks while imagining the music in synchrony with the clicks. Each click sequence was immediately preceded by an expressively timed rendition of the Chopin excerpt (with a typical timing pattern similar to that shown in Fig. 1), so that the participants heard the music many times in the course of the experiment. They also read along in the score as they imagined the music and marked their detection responses in the score. The results are shown as the open circles and dotted line in Figure 8. Even though there was unexpectedly large variation in the detectability of timing perturbations across positions in the click sequence, the pattern of this variation did not at all resemble the pattern obtained with music, apart from the poor detection scores in the initial and final positions which may be attributed to psychophysical causes. Thus, musical imagery did not seem to affect timing perception, at least not in a structurally specific way. By contrast, results very similar to those for the music alone were obtained when the isochronous click sequence was superimposed on the Chopin excerpt, even though participants had been instructed to focus on the clicks and ignore the music (Repp, 1998c: Exp. 3).

Conclusions

The results summarized here show that music can be imagined as expressively timed. However, they do not prove that music is always or necessarily imagined in that way. The author's introspections and the finding that two out of six pianists tapped metronomically when synchronizing with imagined music (Fig. 4) suggest that expressive timing is optional and possibly a luxury in the mind's ear. To achieve expressive timing, it may be necessary to imagine not just the sound of the music but the bodily activity of performing it as well. This may require extra effort. However, it may also be the case that the ongoing physical activity (the finger tapping), which is naturally inclined towards isochrony, inhibits the imagination of musically relevant performance movements. Perhaps the imagination of expressive timing is easier without finger tapping, but then a different measure of its occurrence would have to be found.

Clearly, the imagination of expressive timing is under conscious control, at least to the same extent as expressive timing in performance. However, musical imagery apparently also has consequences for timing that are below the level of awareness. As we have seen, imagining music without expressive timing while tapping in synchrony with a metronome leads to small but systematic variations in tap timing, similar to the variation found when isochronous music is actually heard in synchrony with the metronome. One might then also predict that tapping directly in synchrony with an


198 EXPRESSIVE TIMING IN THE MIND'S EAR


Figure 8. Percent correct hits as a function of sequence position in a task requiring detection of small local increments in IOI duration in an isochronous sequence, in two conditions: music, and clicks with imagined music. [Reproduced with slight modifications from Repp (1998c: Fig. 7) with permission of Springer-Verlag.]

imagined metronomic music performance, without an auditory metronome, would reveal a similar pattern of timing variations in the finger taps. Repp's (1999b) study actually included this task as well, but the results were not so clear. This calls for further research.

So far, all the positive evidence for effects of musical imagery on timing comes from the timing of concomitant motor activity. There are no results so far suggesting that musical imagery interacts with the perception of the timing of simultaneous auditory input. Although the imagination of musical sounds is a form of internal perception, the pacing of that imagination may be viewed as a form of internal action - a performance on the mind's synthesizer, as it were. This is a potentially rich area for further exploration by cognitive psychologists.

Acknowledgements

The author's research and preparation of this chapter were supported by NIH grant MH-51230. Thanks to Amandine Penel for helpful comments on the manuscript.



References


Brodsky, W., Henik, A., Rubinstein, B., & Zorman, M. (1999). Inner hearing among symphony orchestra musicians: Intersectional differences of string-players versus wind-players. In S. W. Yi (Ed.), Music, mind, and science (pp. 370-392). Seoul, Korea: Seoul National University Press.
Clarke, E. F. (1985). Structure and expression in rhythmic performance. In P. Howell, I. Cross, & R. West (Eds.), Musical structure and cognition (pp. 209-236). London: Academic Press.
Clarke, E. F. (1988). Generative principles in music performance. In J. A. Sloboda (Ed.), Generative processes in music: The psychology of performance, improvisation, and composition (pp. 1-26). Oxford, U.K.: Clarendon Press.
Clynes, M. (1977). Sentics: The touch of the emotions. New York: Doubleday. (Bridport, Dorset, U.K.: Prism Press, 1989.)
Clynes, M., & Walker, J. (1982). Neurobiologic functions of rhythm, time, and pulse in music. In M. Clynes (Ed.), Music, mind, and brain: The neuropsychology of music (pp. 171-216). New York: Plenum Press.
Drake, C., & Palmer, C. (1993). Accent structures in music performance. Music Perception, 10, 343-378.
Ebhardt, K. (1898). Zwei Beiträge zur Psychologie des Rhythmus und des Tempo. Zeitschrift für Psychologie und Physiologie der Sinnesorgane, 18, 99-154.
Finney, S. A. (1997). Auditory feedback and musical keyboard performance. Music Perception, 15, 153-174.
Gabrielsson, A., & Lindström, E. (1995). Emotional expression in synthesizer and sentograph performance. Psychomusicology, 14, 94-116.
Gates, A., & Bradshaw, J. L. (1974). Effects of auditory feedback on a musical performance task. Perception & Psychophysics, 16, 105-109.
Halpern, A. R. (1988). Perceived and imagined tempos of familiar songs. Music Perception, 6, 193-202.
Levinson, J. (1997). Music in the moment. Ithaca, NY: Cornell University Press.
Michon, J. A. (1967). Timing in temporal tracking. Assen, NL: van Gorcum.
Penel, A., & Drake, C. (1998). Sources of timing variations in music performance: A psychological segmentation model. Psychological Research, 61, 12-32.
Pressing, J. (1998). Error correction processes in temporal pattern production. Journal of Mathematical Psychology, 42, 63-101.
Repp, B. H. (1992). Probing the cognitive representation of musical time: Structural constraints on the perception of timing perturbations. Cognition, 44, 241-281.
Repp, B. H. (1997). The aesthetic quality of a quantitatively average music performance: Two preliminary experiments. Music Perception, 14, 419-444.
Repp, B. H. (1998a). A microcosm of musical expression: I. Quantitative analysis of pianists' timing in the initial measures of Chopin's Etude in E major. Journal of the Acoustical Society of America, 104, 1085-1100.
Repp, B. H. (1998b). Variations on a theme by Chopin: Relations between perception and production of deviations from isochrony in music. Journal of Experimental Psychology: Human Perception and Performance, 24, 791-811.
Repp, B. H. (1998c). Obligatory 'expectations' of expressive timing induced by perception of musical structure. Psychological Research, 61, 33-43.
Repp, B. H. (1998d). The detectability of local deviations from a typical expressive timing pattern. Music Perception, 15, 265-290.
Repp, B. H. (1999a). Detecting deviations from metronomic timing in music: Effects of perceptual structure on the mental timekeeper. Perception & Psychophysics, 61, 529-548.
Repp, B. H. (1999b). Control of expressive and metronomic timing in pianists. Journal of Motor Behavior, 31, 145-164.
Repp, B. H. (1999c). Effects of auditory feedback deprivation on expressive piano performance. Music Perception, 16, 409-438.
Repp, B. H. (1999d). Relationships between performance timing, perception of timing perturbations, and perceptual-motor synchronization in two Chopin preludes. Australian Journal of Psychology, 51, 188-203.
Repp, B. H. (in press). The embodiment of musical structure: Effects of musical context on sensorimotor synchronization with complex timing patterns. In W. Prinz & B. Hommel (Eds.), Attention and Performance XIX: Common mechanisms in perception and action. Oxford, U.K.: Oxford University Press.
Todd, N. (1985). A model of expressive timing in tonal music. Music Perception, 3, 33-58.


11

Control of Timbre by Musicians
A Preliminary Report

Wolfgang Auhagen and Viktor Schöner

Introduction

According to the standard definition, timbre is 'that attribute of auditory sensation in terms of which a listener can judge that two sounds similarly presented and having the same loudness and pitch are dissimilar' (ANSI S3.20, 1973). This definition reflects the multidimensional nature of timbre, which makes it difficult to specify precisely the relation of certain physical parameters to perceived sound qualities. Accordingly, this question is one of the main areas of research on timbre or sound colour. This research can be divided into two main categories: a) studies focused on timbre production, especially on the physical parameters determining the timbre of different musical instruments, and b) studies focused on timbre perception.

Studies of the first category investigate, for example, which constructional details of a musical instrument are responsible for its characteristic timbral qualities in comparison to other, related instruments. Another aim is the control of the sound quality of instruments in the process of industrial production. In general, standardized methods of excitation are used to obtain comparable results (cf. Dünnwald, 1994; Wogram, 1992). It is obvious that the player of a musical instrument can modify the timbre of the musical sounds he/she produces, yet there are comparatively few studies focused on the control of timbre by instrumentalists or singers (e.g., Canazza et al., 1997; De Poli, Rodà & Vidolin, 1998; Jost, 1967; Krautgartner, 1982; Sundberg, 1975, 1981).



Studies on timbre perception can be divided into several subcategories: research on the perceived similarity of sound colours (cf. Wedin & Goude, 1972), on the segregation or blend of sound colours (cf. Reuter, 1996), on categorical timbre perception (cf. Grey, 1975), and on the verbal attributes of sound qualities. It is by no means self-evident that perceived sound qualities can be adequately described with metaphors. Of course, there is a long tradition of verbal descriptions of sounds in music-theoretical treatises that include recommendations on instrumentation (e.g., Praetorius, 1619; Mattheson, 1713) or in handbooks on orchestration (e.g., Berlioz, 1843; Piston, 1955). But it has not been clarified what physical parameters those descriptions relate to and whether they are inter-subjectively valid. A more scientific way to analyze the interrelations between verbal attributes and perceived qualities of sound is an experimental design in which subjects have to rate the adequacy of a list of adjectives as a description of their impressions. This method was introduced by K. Hevner (1935, 1936) and was improved by applying factor analysis or multidimensional scaling techniques to the data.

W. Herkner (1969) used 44 sounds of orchestral instruments, played with various articulations and on various pitches. He asked his subjects to rate the suitability of adjectives on a 7-point scale. Factor analysis of the correlations between the ratings resulted in three components, which Herkner called 'Lust - Unlust' (pleasure), 'Ernst - Heiterkeit' (seriousness), and 'Aktivierung - Ruhe' (activity). Herkner was able to trace the expressivity of sounds back to certain acoustical parameters. He found that noise-like sounds with many partials induced the impression of activity, whereas harmonic sounds with low fundamentals and no frequency fluctuations of the partials induced the impression of seriousness. In summary, pitch, transients, and variation of the spectrum over time were important physical parameters.

Another example is the study by G. von Bismarck (1974a, 1974b). In order to have exact control of the physical parameters, Bismarck used synthetic sounds with varying cutoff frequency and envelope slope of the spectrum and different types of spectra (harmonic vs. noise). Bismarck suggested that compactness and sharpness were the two principal components of factor analysis of subjects' responses. They could be traced back to the harmonic or inharmonic structure of the sound spectrum, its frequency range, and the slope of the spectral envelope.

R. A. Kendall and E. C. Carterette (1993a) used von Bismarck's adjectives in a study that presented pairs of wind instrument timbres to the subjects. They found that sharpness was not a good discriminator for these kinds of sounds. As a consequence, in a following study (1993b) they used 21 adjectives drawn from W. Piston's book on orchestration (1955). Subjects had to rate the adequacy of these adjectives on a 100-point scale. Various analyses of the data showed that several interpretations of the dimensions of a timbral space were possible. According to the authors, the best interpretation was a two-dimensional space with the dimensions 'nasal' (negative) versus 'rich' (positive), and 'brilliant' (negative) versus 'reedy' (positive) (Kendall & Carterette, 1993b, p. 491). However, because of the ambiguity of MDS solutions, Kendall & Carterette recommended using different methods of research on timbre perception:

One of the clear messages of this study is that the naming of dimensions in a



MDS configuration must be accompanied by collateral studies. Far too many MDS configurations are named arbitrarily, with little basis other than intuition and subject to biases of expectation [...] Additional studies need to be done on the influence of method on subject response strategy [...] (1993b, pp. 494-496).


Thus, there is still a controversy on the relation of verbal attributes to physical parameters of sound. In addition, until now it has not been studied in detail
• whether metaphors like 'bright', 'dull' etc. can evoke precise images of sounds,
• whether these images are inter-individually similar, and
• which physical aspects of sounds are 'encoded' in imagery.

In nearly all of the studies on perceptual aspects of timbre, subjects had to evaluate sounds they actually listened to. The only exception is a study by M. A. Pitt and R. G. Crowder (1992). In this study subjects had to judge whether imagined sounds and sounds presented through audio equipment were identical or not. Various manipulations of the presented sounds (pitch, spectrum, transients) showed that spectral properties played a dominant role in memory for timbre, whereas temporal properties were of subordinate relevance.

In the experiment described in this paper, we study instrumentalists' imagery for musical timbre. Since changes of sound colour are part of an interpretation of a composition, we expect musicians to have precise predetermined images. However, it is uncertain whether these images can be evoked by verbal attributes, and which physical aspects of sounds are modified as a result of timbral imagery.

Experiment

It was the general idea of the investigation to give verbal attributes to musicians and ask them to produce sounds that present the imagined sound qualities. In this way, we could analyze the complex interaction of many physical parameters, e.g. dynamics, pitch, attack time, and sound spectrum (cf. Houtsma, 1997). In addition, we can gain insight into the variety of sound qualities that can be produced on one instrument. Our study focuses on the following questions:
• To what degree do players of different musical instruments modify the sound of their instruments according to imagined sound qualities?
• Are there similarities between musicians' realizations of tones on different instruments, indicating that the images associated with certain verbal attributes of sound are similar as well?

The assumption that physical similarities of tones are indicators of similarities in musicians' timbral imagery warrants some discussion. Since the sound of an instrument is partly determined by its construction, size, material and radiation characteristics, there will always be similarities between different sounds. Thus, the analyses of the data have to focus on the systematic variation of the musicians' technique. If there are similar variations across various instrumental groups with different constructional features (strings, woodwinds etc.), this would be a strong indicator for a meaningful



Table 1. Verbal sound-attributes of high importance for the subjects.

dunkel (dark)
hell, klar (bright, clear)
grell, scharf, durchdringend (harsh, sharp, penetrating)
matt, fahl, muffig, dumpf, belegt (dim, pale, muffled, dull, dampened)
eng (narrow)
samtig (velvety)
tragend (carrying)
groß, voll (big, sonorous)
schön (beautiful)
aggressiv (aggressive)
traurig (sad)
schreiend (screaming)

variation of sound colour by musicians. If no systematic variation can be detected, there are several possible explanations:
• verbal attributes are not an effective means of evoking timbral imagery,
• the meaning of the attributes has no inter-subjective validity,
• the role of the instrument in sound production is dominant, and therefore players are not able to produce clearly distinguishable sound colours.

Subjects and method of research

Subjects are players of the following instruments: violin (4), violoncello (5), bassoon (4), oboe (1), and french horn (1).1 All of our subjects study or have studied these instruments at the Hochschule für Musik 'Hanns Eisler' in Berlin and thus are experts. The experiment is run in a room of about 2000 cubic metres2 with a reverberation time of 1.7 seconds (500-1000 Hz). This room is used for lectures, but also for concerts.

The first part of the experiment is an interview with the student. He or she is asked to give some information on the instrument, on the duration of his or her training, on his/her teacher(s), on the importance of verbal attributes of sound for the communication between teacher and student, and on those verbal attributes that seem to be especially important. These attributes are listed in table 1. In the second part of the experiment, the student is asked to realize the following verbal attributes of sound on his/her instrument: dunkel, hell, tragend, groß, schön, aggressiv, traurig. These attributes represent words that describe perceived qualities, an aesthetic evaluation, and emotional aspects of the tones. In addition, they are among those of special importance for the students. The attributes dark and bright have already been studied with expressive music performances (Canazza et al., 1997; Canazza, De Poli & Vidolin, 1997; De Poli, Rodà & Vidolin, 1998). Thus, results of this experiment can be compared with former research.

In a first trial, the subject chooses pitch and dynamic level (p, mf, or f) of the tones



according to his or her idea of realizing the sound quality in the best way. Then, he/she is asked to produce the same sound quality on standardized tones in different octaves and on different dynamic levels. These standardized tones build a basic set of directly comparable tones that can be used, e.g., for calculating average spectra across the sounds of different players. Since tones may be produced using various fingerings, we were interested in examining the influence of this parameter on the production of different sound qualities.

The performance is registered by a video camera monitoring the playing technique, e.g. with the string instruments: bowing point, velocity of bowing, and vibrato of the left hand. It is also recorded by a DAT-recorder (sampling rate: 44.1 kHz) with microphones positioned in two different places.3 One microphone is positioned near the ear of the instrumentalist, another in front of the instrumentalist at the reverberation radius (a distance of about two meters). The video tapes are analyzed via a digital video system, the audio tapes via a computer using the program 'SoundScope/16', which includes various forms of FFT-analyses (average spectrum, sonogram).4 Up to now, we have analyzed the signals from the microphone representing the player's perspective.

Research on the acoustics of musical instruments has revealed many parameters contributing to characteristic timbral properties: e.g. attack and decay time, development of the spectrum during the transient processes, spectral envelope of the steady-state portion of the sound, and noise (Benade, 1976). Since Pitt & Crowder (1992) argued that spectral characteristics of the steady-state part of sounds are especially important for timbral imagery, we started with analyses of this parameter. It has to be emphasized that variations of the spectral content of a sound as analyzed by Fourier transformation of the signal are not a direct measure of perceived changes in timbre (cf. De Poli & Prandoni, 1997). Accordingly, the results of signal analysis have to be checked by listening tests in which synthesized sounds are systematically varied (cf. Canazza, De Poli & Vidolin, 1997). This will be one of the next steps of our research. Another step will be the development of statistical methods for the analysis of a greater number of sounds. Brightness seems to correspond to the location of the spectral centroid of a sound (cf. De Poli, Rodà & Vidolin, 1998), defined as 'the midpoint of a spectrum's energy distribution' (Sandell, 1995, p. 222). However, it has to be tested whether this measure holds for sounds with cyclically increasing and decreasing amplitudes of the partials, which are characteristic of woodwind instrument tones.
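The centroid measure cited here can be illustrated with a minimal sketch. The linear amplitude weighting below is an assumption, since weighting conventions (amplitude, power, log-magnitude) differ across studies, and the function name is ours.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a spectrum: Sandell's
    'midpoint of a spectrum's energy distribution' under a linear
    amplitude-weighting assumption."""
    total = sum(amplitudes)
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total

# A spectrum with two equally strong partials has its centroid midway
# between them:
c = spectral_centroid([440.0, 880.0], [1.0, 1.0])  # -> 660.0
```

On this measure, boosting a limited band of higher partials (as in the 'bright' sounds analyzed below) shifts the centroid upward without requiring a general boost of all high-order partials.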

Results

The results of the interviews can be summarized as follows: verbal attributes of tones play an important role in the communication between teacher and student. The attributes chosen for the experiment are widely used by different groups of instrumentalists, while there are some others used by one special group only. For example, the attribute 'muffled' seems to be a terminus technicus of the bassoon players. Despite the widespread use of such attributes, the students have only vague explicit knowledge of the physical sound parameters that are influenced when they imagine and produce a special sound quality. The students have heard about overtones but do not know exactly in which way these overtones influence the timbre of the sounds of their instruments. However, even without such explicit knowledge, the results of the analyses indicate that at least with some verbal attributes, players of different instruments change their playing parameters in such a way that the produced sounds are very similar.

Firstly, we examined the sounds produced to realize the attributes 'bright' and 'dark'. Figures 1 and 2 on the next page show the spectra of the tones E2 and D3 realized in mezzoforte or forte by two bassoon players (A, B).5 The dark bassoon sounds are characterized by high amplitudes of the partials in the region between 400-600 Hz and a rapid decrease of the amplitudes of higher partials. The region of high energy is a formant often described in the literature (cf. Voigt, 1975). Bright sounds show higher amplitudes of partials in the region of 1200-1400 Hz in comparison with dark sounds. With sound A, the relative maximum in this region is raised from -30 dB to -20 dB in relation to the peak of the first formant. With sound B, it is raised from -28 dB to -10 dB. Obviously, for the quality 'bright' not all of the higher partials have higher amplitudes but only those within a relatively small frequency region. This region is called the second formant in some studies but was not consistently observed in former research (cf. Reuter, 1995, p. 100). Thus, this formant clearly depends on musicians' ideas of timbre. The 'dark' sounds were produced with a low chin drawn backwards, creating a big volume of the mouth cavity. The 'bright' sounds were produced with a high chin pushed forwards, resulting in a small volume.

Figures 3 and 4 on page 208 show typical spectra of the tones Db3 and D4 realized by two violoncello players (A, B). One detail should be mentioned first: the spectra of dark violoncello and violin sounds are characterized by amplitude minima that correspond to the bowing point, which is at about 1/4 to 1/8 of the string length (l) for most of the sounds, and to multiples of this fraction. With the Db3 tone shown in figure 3, it is at about 1/8 l, and with D4, it is at about 1/5 l. Fractions like 1/4 or 1/5 l were realized by shortening the string with a special fingering and moving the bow towards the fingerboard. In contrast, the bowing point with bright tones is at about 1/10 or even smaller fractions of the string length, and accordingly there are no prominent minima to be seen in the spectra.
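The placement of these minima follows from idealized string physics: a mode whose number is a multiple of m has a node at the point l/m of the string, so excitation at that point transfers little energy into the mode. A toy sketch of this rule follows (real bowing is considerably more complex; the function name and values are illustrative):

```python
def suppressed_partials(m, n_partials):
    """Partial numbers with a node at the excitation point l/m of the string
    length, which an idealized model predicts will be weakly excited."""
    return [k for k in range(1, n_partials + 1) if k % m == 0]

# Bowing at about 1/5 of the string length predicts spectral minima at
# partials 5, 10, 15, ...:
print(suppressed_partials(5, 16))  # -> [5, 10, 15]
```

This is consistent with the observation above: bowing at 1/10 l or less pushes the first predicted minimum so high in the spectrum that no prominent minima remain visible.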

Like bassoon tones, the dark violoncello sounds show spectra whose envelopes are characterized by a steep slope, while the higher partials of bright sounds again have comparatively higher amplitudes within a limited frequency range. This can be seen more clearly if one calculates average spectra across several sounds of one quality and plots the difference between the average spectra of bright and dark sounds. Before calculating the average, the spectra were normalized. That is, the maximum peaks were adjusted to the same dB-value. Figure 5 on page 209 shows normalized average spectra of four bright and nine dark violoncello tones and the difference between these spectra.6 The maximum of this difference (10 dB) is below 5000 Hz. Above 6000 Hz, there is hardly any difference.
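The normalize-average-difference procedure just described can be sketched as follows. Details of the authors' actual analysis (frequency-bin alignment, dB reference) are assumptions here, and the data are invented for illustration.

```python
def normalize_db(spectrum_db):
    """Shift a dB spectrum so its maximum peak sits at 0 dB."""
    peak = max(spectrum_db)
    return [v - peak for v in spectrum_db]

def average_spectrum(spectra_db):
    """Bin-by-bin mean of peak-normalized dB spectra (equal-length lists)."""
    normalized = [normalize_db(s) for s in spectra_db]
    n = len(normalized)
    return [sum(col) / n for col in zip(*normalized)]

def spectral_difference(avg_a, avg_b):
    """Bin-by-bin difference between two average spectra (e.g. bright - dark)."""
    return [a - b for a, b in zip(avg_a, avg_b)]

# Toy dB spectra over three frequency bins:
bright = [[-10, -30, -50], [-12, -28, -52]]
dark = [[-8, -40, -70], [-6, -44, -66]]
diff = spectral_difference(average_spectrum(bright), average_spectrum(dark))
```

Because every spectrum is peak-normalized first, the difference curve isolates changes in spectral shape from overall level differences between players and dynamic levels.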

The results of the analysis of dark and bright violin tones are very similar to those obtained with violoncellos. Figure 6 on page 210 shows the average spectra of five bright and seven dark sounds and the difference between these averages. The spectra of bright tones on average show higher amplitudes in a limited frequency range: the difference reaches its maximum value of 10 dB between 4000 and 8000 Hz. The



Figure 1. Sound spectra of dark bassoon tones. A: E2, forte, Instrumentalist 1. B: D3, mezzoforte, Instrumentalist 2.

Figure 2. Sound spectra of bright bassoon tones. A: E2, forte, Instrumentalist 1. B: D3, mezzoforte, Instrumentalist 2.



Figure 3. Sound spectra of dark violoncello tones. A: Db3, mezzoforte, Instrumentalist 1. B: D4, forte, Instrumentalist 2.

Figure 4. Sound spectra of bright violoncello tones. A: Db3, forte, Instrumentalist 1. B: D4, forte, Instrumentalist 2.



Figure 5. Average sound spectra of bright and dark violoncello tones. I: Bright, 4 sounds, Db3-F3. II: Dark, 9 sounds, Db3-A3. III: Difference I-II.

fact that bright tones are not characterized by a general boost of high-order partials corresponds with a result of the study by De Poli, Rodà, and Vidolin (1998, Fig. 15). They used the spectral centroid as a measure of brightness and found that in a violin performance the bright version was not characterized by an extremely high value of this parameter. Instead, the highest value was found with the 'heavy' version.

After the analysis of bright and dark tones, bassoon sounds associated with the attribute 'bright' were compared to those associated with the attribute 'carrying'. The spectra of the latter sounds show an additional region of higher amplitudes at about 1900 to 2100 Hz. This third formant, seldom mentioned in the literature, can be clearly seen in figure 7 on page 211. With both sounds (A and B), the peak value in this frequency region is raised by 8 dB compared to the corresponding values of the spectra



Figure 6. Average sound spectra of bright and dark violin tones. I: Bright, 5 sounds, A4-B4. II: Dark, 7 sounds, A4-B4. III: Difference I-II.

of bright tones (cf. Fig. 2 on page 207). 'Carrying' sounds were produced in a similar way to the dark sounds, with the chin drawn back- and downwards, leading to a big volume of the mouth cavity. In addition to the spectral difference, there is another difference between the bassoon players' realization of bright and carrying tones: carrying tones are played with a stronger vibrato. This holds for carrying tones played by violoncello and violin players, too.

With the bowed string instruments such a strong vibrato leads to an asynchronous increase and decrease of the various partials of the sounds (Meyer, 1992, 1993). This



Figure 7. Sound spectra of carrying bassoon tones. A: E2, forte, Instrumentalist 1. B: D3, mezzoforte, Instrumentalist 2.

can be seen in the spectrogram of a carrying violoncello tone shown in figure 8 on the next page.7 High amplitudes of partials are colored black, very low amplitudes are colored light grey. Following the time axis, one can see the variation of the amplitudes of the partials over time. The reason for these changing intensities is the various modes of the string vibration, which at some moments excite body resonances of the violoncello. At other moments, whenever the frequencies of the modes do not fit a resonance, these resonances are not excited. Since some modes of the string vibration fit body resonances with the fundamental frequency of the string lowered, whereas other modes fit resonances with the frequency raised, the changes of their intensities are not synchronized.

Hence, several features indicate that musicians' ideas of carrying may relate to the idea of a 'broadening' of sound:
1. broadening of the perceived pitch. This effect depends on the frequency range of the vibrato (Meyer, 1979, Fig. 7).
2. broadening of the sound colour: one aspect is the raising of the amplitudes of high-order partials, as can be seen with the bassoon sounds. This can also be observed with violin sounds. Figure 9 on page 213 shows normalized average spectra of eight carrying and seven dark violin sounds and their difference. Within a large frequency range up to 12000 Hz this difference is about 10 dB. However, comparing average spectra of carrying and dark violoncello tones, only very small differences in the upper frequency range could be detected. Figure 10 on page 214 shows normalized average spectra of eleven carrying and nine dark sounds and the difference between these averages. With such tones below middle C, there was hardly any difference above 6000 Hz.

212 CONTROL OF TIMBRE BY MUSICIANS

Figure 8. Spectrogram of a carrying violoncello tone, C3, forte.

Thus, the idea of broadening obviously is not necessarily coupled to a strong excitation of high-order partials. An important aspect might be the feeling of 'spaciousness'. In room acoustics it is well known that perceived spaciousness varies with the interaural cross correlation of the signals received by the left and the right ear: with this correlation coefficient decreasing, the perceived spaciousness increases (Potter, Raatgever & Bilsen, 1995). As a consequence, in future research the signals received by the ears of the instrument players should be registered with two microphones placed directly into the auricles, and the maximum value of the cross correlation function should be calculated. If our hypothesis is correct, this value should be lower for sounds associated with the attribute 'carrying' than for other sounds.
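The proposed measurement can be sketched in a few lines of code. This is only an illustrative outline of the idea, not the authors' procedure: the function name, the pure-Python signal handling and the lag range are assumptions.

```python
import math

def max_interaural_cross_correlation(left, right, max_lag):
    """Maximum of the normalized cross-correlation between the two ear
    signals over lags from -max_lag to +max_lag (in samples)."""
    def normalize(x):
        # Zero mean, unit energy, so the correlation is a coefficient in [-1, 1].
        mean = sum(x) / len(x)
        centered = [v - mean for v in x]
        energy = math.sqrt(sum(v * v for v in centered)) or 1.0
        return [v / energy for v in centered]

    l, r = normalize(left), normalize(right)
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate l against r delayed by `lag` samples.
        s = sum(l[i] * r[i - lag]
                for i in range(max(0, lag), min(len(l), len(r) + lag)))
        best = max(best, s)
    return best

# Identical ear signals give a maximum near 1 (low spaciousness);
# uncorrelated signals give a value near 0 (high spaciousness).
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
print(round(max_interaural_cross_correlation(tone, tone, 40), 2))  # → 1.0
```

The hypothesis would then predict a lower maximum for 'carrying' sounds than for other sounds recorded at the players' ears.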

Discussion

Timbre of musical instruments can be modified to a high degree by playing technique. Therefore, the instrumentalist's role should be given more attention in research on musical instruments.

Figure 9. Average sound spectra of carrying and dark violin tones. I: Carrying, 8 sounds, D4-B4. II: Dark, 7 sounds, A4-B4. III: Difference I-II.

Our musicians were able to produce changes of sound quality to correspond with verbal attributes, and they showed a consistency in the variation of playing parameters and parameters of the sound spectrum. They were able to produce certain sound colours at different dynamic levels, which shows that timbre and intended dynamics are not completely coupled. The difference between dark and bright tones can be traced back to a difference in the amplitudes of partials within a limited frequency range and in the inner structure of the spectra with respect to amplitude minima. Carrying tones realized by bassoon players are characterized by the appearance of a third formant in the region around 2000 Hz, which may be interpreted as a



Figure 10. Average sound spectra of carrying and dark violoncello tones. I: Carrying, 11 sounds, C3-G3. II: Dark, 9 sounds, D3-A3. III: Difference I-II.

broadening of the spectral content. By contrast, the predominant feature of carrying tones realized by violin and violoncello players seems to be a vibrato-controlled spectral variation of sound over time. For further statistical analyses, a measure of these spectral fluctuations will have to be developed.

The aforementioned parameters refer to the steady-state part of a sound. As already mentioned, in a next step of analysis the transients have to be analyzed. De Poli, Roda & Vidolin (1998) found a great variation of this parameter with different interpretations of an excerpt from A. Corelli's Violin Sonata in A Major Op. V (normal, hard, soft, heavy, light, bright, dark). A first inspection of 'beautiful' versus 'aggressive' sounds shows a similar variation of the attack time in our experiment.

With attributes referring to aesthetic qualities (e.g., 'pleasant') the results up to now are less clear, indicating individual preferences for certain aspects of sound quality. Emotional attributes could hardly be realized by our subjects with single tones. Here, the general opinion was that such qualities are related to musical phrases. This indicates that the way of passing from one tone to the next is especially important for the aesthetic evaluation of a performance. This idea is supported by a result of Canazza et al. (1997). The analysis of listeners' impressions of different performances (normal, hard, soft, heavy, light, bright, dark) of an excerpt from W.A. Mozart's Clarinet Concerto K622 showed that tempo and articulation were important factors influencing subjects' responses. However, further detailed research on the realizations of the aesthetic attributes is necessary, since in the study by Canazza et al. such versions were not included, and accordingly corresponding factors are missing in the expressivity model of Canazza et al. (1999).

With respect to the teaching and the playing of musical instruments, it seems that the 'metaphorical' terminology is quite useful. The sound images associated with some verbal attributes are surprisingly stable despite missing definitions of the words. It has to be tested whether those images have some intercultural validity. On the other hand, we observed that some of our subjects needed many trials until they were satisfied with the obtained sound quality. Thus, the question is raised whether the performances were really a result of timbral images or of auditory analysis and modification of actual sounds. In our opinion sound production should be seen as a feedback process: the performer has an internal image of the sound he/she wants to produce and adjusts playing technique to this image. While realizing a sound he/she listens to his/her performance and makes readjustments until the sound produced corresponds to his/her image. Various explanations of our subjects' different behavior are possible. Room response may have influenced the perceived timbre, or even the order in which the instrumentalists played the sounds may have had some influence. Finally, differences in the precision of timbral images are possible, as well as differences in the motor skills needed for the realization of different sounds. With the strings, precise control of the bowing point, velocity of bow movement, and bow pressure, and the control of the vibrato of the left hand are necessary to produce differentiated sound colours. With the woodwinds, pressure of the lips upon the reed (damping) as well as position of the tongue and volume of the mouth cavity are critical parameters that have to be controlled. Thus, a more detailed insight into the nature of sound production and the consequences of varying certain playing parameters for the sound could make sound variation more comprehensible and effective.
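The feedback view of sound production described above can be caricatured as a small control loop: the player compares the produced sound with the internal target image and readjusts a playing parameter until they match. Everything in this sketch - the scalar 'timbre' value, the toy response function, the step rule - is invented for illustration and is not a model from the study.

```python
def play(parameter):
    """Toy instrument response: maps a playing parameter to a scalar
    'timbre' value (an invented stand-in for a real spectral measure)."""
    return 0.8 * parameter + 0.1

def realize(target_image, parameter=0.0, tolerance=1e-4):
    """Adjust the playing parameter until the produced timbre matches
    the internal image, as in the feedback process described above."""
    adjustments = 0
    while abs(play(parameter) - target_image) > tolerance:
        # Readjust in proportion to the perceived mismatch.
        parameter += 0.5 * (target_image - play(parameter))
        adjustments += 1
    return parameter, adjustments

parameter, n = realize(target_image=0.7)
print(round(play(parameter), 2))  # → 0.7
```

The number of loop iterations is the code analogue of the "many trials" some subjects needed before being satisfied with the obtained sound quality.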

Notes

1. The numbers of subjects who participated in the study up to now are written in brackets. Since there was only one oboist and one horn player, these instruments are left out in the following.

2. Width: 18 m, length: 16 m, height: 7 m.
3. Camcorder: Panasonic SX 30; DAT-Recorder: Sony Typ TCD-D8; Microphones: Neumann UM 57, polar pattern.


4. Video system Casablanca by MacroSystem; Macintosh computer Quadra 650; software Soundscope/16 by GW Instruments; sampling rate: 44.1 kHz.

5. Data of the FFT: Hamming time-window, 2^14 samples (= 0.37 sec.).
6. The average spectra were calculated by summing the amplitude values of each sample of the sound spectra and dividing the sum by the number of spectra taken into account; since the average spectra should be representative of the general spectral envelope of the sounds, the parameters of the FFT were set to wideband analysis.
• Violin tones: Hamming time window of 5 ms corresponding to a filter width of 300 Hz within the octave C3-H3, Hamming window of 3.2 ms corresponding to a filter width of 450 Hz within the octave C4-B4;
• Violoncello tones: Hamming window of 8 ms corresponding to a filter width of 184 Hz within the octave C3-H3, Hamming window of 10 ms corresponding to a filter width of 150 Hz within the octave C4-B4.
7. Data of the spectrogram: Hamming time-window, 33 ms, frame advance per analysis: 5 ms; the lowest partial of the tone C3 shown in figure 8 is the third partial.

References

ANSI S. 3.20 - 1973, American National Standards Institute.
Benade, A.H. (1976). Fundamentals of Musical Acoustics. London: Oxford University Press.
Berlioz, H. (1843). Grand Traité d'Instrumentation et d'Orchestration modernes. Paris: Schonenberger.
Bismarck, G. von. (1974a). Timbre of steady sounds: a factorial investigation of its verbal attributes. Acustica, 30, 146-159.
Bismarck, G. von. (1974b). Sharpness as an attribute of the timbre of steady sounds. Acustica, 30, 159-172.
Canazza, S., De Poli, G., Rinaldin, S. & Vidolin, A. (1997). Sonological analysis of clarinet expressivity. In M. Leman (Ed.), Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology (pp. 431-440). Berlin, Heidelberg: Springer-Verlag.
Canazza, S., De Poli, G. & Vidolin, A. (1997). Perceptual analysis of the musical expressive intention in a clarinet performance. In M. Leman (Ed.), Music, Gestalt, and Computing: Studies in Cognitive and Systematic Musicology (pp. 441-450). Berlin, Heidelberg: Springer-Verlag.
Canazza, S., De Poli, G., Di Frederico, R., Drioli, C. & Roda, A. (1999). Symbolic and audio processing to change the expressive intention of a recorded music performance. Proceedings of the 2nd COST G-6 Workshop on Digital Audio Effects (DAFx99), NTNU, Trondheim, December 9-11, W99-1-4 (http://www.tele.ntnu.no/akustikk/meetings/DAFx99/canazza.pdf).
De Poli, G. & Prandoni, P. (1997). Sonological models for timbre characterization. Journal of New Music Research, 26, 170-197.
De Poli, G., Roda, A. & Vidolin, A. (1998). Note-by-note analysis of the influence of expressive intentions and musical structure in violin performance. Journal of New Music Research, 27, 293-321.
Dünnwald, H. (1994). Die Klangqualität von Violinen unter besonderer Berücksichtigung der Herkunft der Instrumente. In Zum Streichinstrumentenbau des 18. Jahrhunderts. Bericht über das 11. Symposium zu Fragen des Musikinstrumentenbaus, Michaelstein 9.-10. November 1990 (pp. 71-82). Michaelstein: Institut für Aufführungspraxis.
Grey, J.M. (1975). An exploration of musical timbre using computer-based techniques for analysis, synthesis and perceptual scaling. PhD Thesis, Stanford University.
Herkner, W. (1969). Der Ausdruck der Klangfarben von Musikinstrumenten. PhD Thesis (typescript), University of Vienna.
Hevner, K. (1935). The affective character of the major and minor modes in music. The American Journal of Psychology, 47, 103-118.
Hevner, K. (1936). Experimental studies of the elements of expression in music. The American Journal of Psychology, 48, 246-268.
Houtsma, A.J.M. (1997). Pitch and timbre: Definition, meaning and use. Journal of New Music Research, 26, 104-115.
Jost, E. (1967). Akustische und psychometrische Untersuchungen an Klarinettenklängen. Köln: Arno Volk.
Kendall, R.A. & Carterette, E.C. (1993a). Verbal attributes of simultaneous wind instrument timbres: I. von Bismarck's adjectives. Music Perception, 10, 445-468.
Kendall, R.A. & Carterette, E.C. (1993b). Verbal attributes of simultaneous wind instrument timbres: II. Adjectives induced from Piston's Orchestration. Music Perception, 10, 469-502.
Krautgartner, K. (1982). Untersuchungen zur Artikulation bei Klarinetteninstrumenten im Jazz. PhD Thesis (typescript), University of Cologne.
Mattheson, J. (1713). Das Neu-Eroeffnete Orchestre. Hamburg: Benjamin Schillers Witwe.
Meyer, J. (1979). Zur Tonhöhenempfindung bei musikalischen Klängen in Abhängigkeit vom Grad der Gehörschulung. Acustica, 42, 189-204.
Meyer, J. (1992). Zur klanglichen Wirkung des Streicher-Vibratos. Acustica, 76, 283-291.
Meyer, J. (1993). Vibrato sounds in large halls. In Proceedings of the Stockholm Music Acoustics Conference, 1993 (pp. 117-121).
Piston, W. (1955). Orchestration. New York: Norton.
Pitt, M.A. & Crowder, R.G. (1992). The role of spectral and dynamic cues in imagery for musical timbre. Journal of Experimental Psychology: Human Perception and Performance, 18, 728-738.
Potter, J.M., Raatgever, J. & Bilsen, F.A. (1995). Measures for spaciousness in room acoustics based on a binaural strategy. Acta Acustica, 3, 429-443.
Praetorius, M. (1619). Syntagmatis Musici Tomus Tertius. Wolfenbüttel: Elias Holwein.
Reuter, C. (1995). Der Einschwingvorgang nichtperkussiver Musikinstrumente. Frankfurt/M.: Lang.
Reuter, C. (1996). Die auditive Diskrimination von Orchesterinstrumenten. Frankfurt/M.: Lang.
Sandell, G.J. (1995). Roles for spectral centroid and other factors in determining 'blended' instrument pairings in orchestration. Music Perception, 13, 209-246.
Sundberg, J. (1975). Formant technique in a professional female singer. Acustica, 32, 89-96.
Sundberg, J. (1981). Formants and fundamental frequency control in singing. Acustica, 49, 47-54.
Voigt, W. (1975). Untersuchungen zur Formantbildung in Klängen von Fagott und Dulzianen. Regensburg: Bosse.
Wedin, L. & Goude, G. (1972). Dimension analysis of the perception of instrumental timbre. The Scandinavian Journal of Psychology, 13, 228-240.
Wogram, K. (1992). Impulsmeßverfahren für die Qualitätsbestimmung bei Blechblasinstrumenten. Instrumentenbau-Zeitschrift, 46 (2/3), 122-132.


12

Images of Form: An Example from Norwegian Hardingfiddle Music

Tellef Kvifte

Introduction

To understand the formal structure of a piece of music, we often use visual images in one form or another. One of the strengths of visual images - in contrast to auditory images - is the overview, giving us the complete musical object accessible for inspection at one and the same time. Much of traditional music theory aims at making visual representations on paper of musical objects, to characterize and show structures of these objects. Images of form are of course not limited to representations on paper. As musicians and listeners, we spontaneously make our own internal images of form, and part of the process of learning to listen to and to play in a specific genre of music consists of the development of images that 'work' in some sense, giving us what we experience as an understanding of the music in question.

Reading music theory, I sometimes get a feeling that 'form' is a relatively fixed entity, an entity that one may discover given good analytical tools and procedures that are applied to musical notation. But in a musical genre where music notation is no important part of the teaching and transmission of musical works, such analytical procedures seem irrelevant if the outcome is not in some way grounded in the experiences of insiders. With the genre of Norwegian Hardingfiddle music as point of departure, I will discuss possible structures that may form the basis of formal structures employed by performers of this genre. Central in this discussion is the question of the relation between observed formal structure and possible internal images of them: how can we


220 IMAGES OF FORM

[Two tree diagrams, not reproduced; the biological classification example branches into the families Lestoidae, Calopterygoidae, Aeshnoidae and Libelluloidae.]

Figure 1. Hierarchy of administrative units and hierarchy of classification in biology.

infer from behaviour to structure of internal images? I will draw on different kinds of data in this discussion, including my own experiences as a performer.

Images of hierarchies

One of the most pervasive kinds of images of structure is the image of a hierarchy. We can find examples of hierarchies almost anywhere, in most contexts where we care to look. We seem to view both society and nature around us in terms of hierarchies, as Figure 1 may indicate.

In the context of musical activity - research, listening and performance - images of hierarchies are encountered in a number of situations. Besides being the basis for classification of musical genres and of musical instruments, most works on musical form tend to take the hierarchy as the obvious point of departure.

A hierarchy may be seen as a number of units combined into a smaller number of units on a higher level, and so on. Alternatively, the hierarchy may be described as an 'inverted tree' where the 'root' is divided into a number of units on a lower level (hence 'inverted'), each of these new units further divided into units on a still lower level, and so on. Most systems of classification are built like this; it is in fact hard to envisage classification without a hierarchy.

The use of hierarchies as analytical tools, or even as the fundamental way of looking at things in general, is motivated in various ways by a number of authors. In his classic paper, Simon (1962) argues that we not only tend to view complex

Page 234: Musical Imagery

TELLEF KVIFTE


Figure 2. Events causing mental images.


processes as hierarchies, but that also nature itself seems to favour hierarchies when producing complex systems. His argument is basically that in this way, smaller units may be built and tested by evolution before they are combined into larger units.

Another classical paper, 'The Magical Number Seven, Plus or Minus Two' (Miller, 1956), uses hierarchies as a basis for explaining how we are able to handle large amounts of sensory information. The concepts of recoding and chunking are central here, the idea being that a number of units (like numbers, letters, items...), that is, a chunk of information on one level, may be recoded into one single unit on a higher level, and units on this level may then be chunked and recoded, and so on. Remembering a long tune may then be explained as a series of recodings on several levels from, say, a small motif level, via sections, to the whole tune. The structure of these recodings may obviously be seen as a hierarchy.

Earlier (Kvifte, 1978, 1981), I have used arguments like those held by Simon and Miller to argue in favour of hierarchies as a fundamental way of describing Norwegian fiddle tunes, both as an analytical tool and as a description of possible mental models used by fiddlers. In addition to theory inspired by Miller and Simon, I argued that the terminology used by fiddlers to describe tunes and parts of tunes, as well as their way of splitting the tunes when teaching them to others, pointed to hierarchical structures, structures that I described in some detail. Later (Kvifte, 1994), I abandoned the strict hierarchical position. The present paper continues some of the arguments from this work, but here, I also want to question the at times almost self-evident position of hierarchies as the preferred analytical structure. One of my aims is to show situations where the hierarchy is not the obviously best choice.

The status of images, and how to study them

It is possible to regard the conscious images as only an epiphenomenon - as a beautiful, but completely non-functional embellishment in our minds. They may be regarded as a product of external stimuli - in our case probably music, as indicated in Figure 2.

Or the images may be produced without such stimuli, as we may just imagine a tune or a sound or a visual scene without obvious connection with present stimuli. The image may then be seen as a product of internal, subconscious processes, processes that also may be able to produce behaviour. But the mental images may still be seen as a useless byproduct, as indicated in Figure 3 on the following page.

Leaving the more general issues of the nature, function and origins of mental images aside here, it may be argued that acting as if images are functionally significant has advantages. As a performer, I know that the process of learning and performing a tune is heavily influenced by my conscious image of the structure of the tune in question. As a teacher, I know that it matters very much what images of a tune I manage to convey to my students. Problems with 'catching' a certain phrase may for instance often be solved by actively encouraging the student to look for alternative images of the phrase in question.

Figure 3. Subconscious processes causing mental images and behaviour, but no necessary relation between mental images and behaviour.

I prefer to see the situation more like in Figure 4, taking into consideration that the mental images on the one hand may shape my behaviour, and on the other hand, that I may use my mental images to shape subconscious processes that in turn may shape behaviour. I do not have to be consciously aware of the images of the formal structure of the tune all the time I play. On the contrary, I practice my instrument and my tunes precisely in order to be able to play without being consciously aware of technicalities of fingering and problems of formal structure, i.e. so that my attention may be on all sorts of other important issues, like the general 'feel' of the music, the movement of the dancers when playing dance music, or a particularly nice lady in the band or in the audience. And my musical behaviour may still be no different from when I do have a clear structural image in my mind.

In other words, I see internal images as tools for the fiddler (and listener), tools which may be put to a variety of uses: assisting in parsing the tune in relevant ways; finding relevant features for the task at hand, like finding the next motif during a performance, or the bow direction at a certain point for demonstration for a pupil; putting together a new motif; exploring and combining features of a tune in novel ways; etc.


Figure 4. Subconscious processes causing mental images and behaviour, and mental images also possibly causing behaviour and further subconscious processes.


Available evidence in the study of images


How am I able to know what the fiddlers' images of form are like? In the following, I will exploit three different kinds of available evidence, starting with analysis of musical behaviour, including several possible performances of a tune. Furthermore, I will refer to some verbal evidence from fiddlers, and finally use introspection relating to my personal experience as a fiddler. The discussion starts with what may be regarded as circumstantial evidence, not addressing images directly, but using the evidence to argue for certain kinds of structural properties which must be present in the images. The introspective evidence will be more directly related to images as such.

Analysis of musical behaviour

In the following example, we see a somewhat hypothetical, but possible, performance of a Norwegian tune. The actual tune is not hypothetical, only the performance shown here. As will be shown later, a normal performance of this particular tune will include more motifs. The actual pitches used do not always correspond to normal western-type intonation. The intonation of notated C2 will be somewhere between C2 and C#2, a number of embellishments will be used, and normally, two strings are used throughout, producing a variable drone. The bowing patterns are also important for the performance. However, the information in Figure 5 on the following page is sufficient for the subsequent discussion.

In order to play this tune, what kind of image of the formal structure do I need? Taking for granted that I break the performance down into units of some kind, my behaviour may be explained quite simply by a linear structure like the one shown in Figure 6 on the next page.

On the other hand, looking closely at the units in Figure 5, we find that they are not all different, but display a structure similar to Figure 7. One may notice that the second and fourth b-units are prolonged in this representation, but an alternative is to let the third unit be longer. However, that choice is not important for the argument here.

If we call each unit a motif, we notice that we have two different motifs, conveniently called 'a' and 'b' respectively. Now, given this regularity, it is tempting to infer that my internal structure is not linear, but rather like a hierarchy, with the tune consisting of two equal parts, labelled '1' and '2' as illustrated in Figure 8 on page 225, each with an A and a B section, containing the motifs 'a' and 'b' respectively.

It is convenient to establish a terminology for the different levels in the hierarchy. The lowest level will be called the motif level; the level above, where the units are made up of a number of similar motifs, will be called the vek level, following fiddler terminology. The next level - 'playing the tune through once' - will be called the round level. The top level in Figure 8 I usually prefer to label the performance level, and reserve the tune concept for an even higher level, a level not shown in the figure, as this level would incorporate all possible performances of the tune.

As an image of the form of the tune, this model works quite well, as we account both for how units on lower levels combine into units on a higher level, as well as for the sequence of the units, if we read the sequence of units from left to right as shown in the figure. Such a hierarchy may be called ordered, in contrast to hierarchies like


224 IMAGES OF FORM

Figure 5. Melody of the tune Den gamle Sordølen. The half barlines indicate my perception of the motif structure; the dotted bracket in line two indicates an overlapping between two motifs.

Figure 6. Linear structural image of Den gamle Sordølen, with each box corresponding to units between half barlines in Figure 5.

Figure 7. Linear structural image, with motif names.



Figure 8. The tune as a hierarchy. Here, the top level is a performance of the tune, and the bottom level is what is called motif units in the text. The level with units 1 and 2 will be referred to as the 'round level' - each unit on this level amounts to 'playing the tune through once'. The units with capital letters are referred to as the vek level in the text. The number of levels is, of course, an arbitrary analytical decision.

in Figure 1, where the graphical layout does not necessarily imply that the different units on the lowest level are to be understood as part of a particular sequence.

More data may come from additional performances of the same tune. Continuing with our small tune from Figure 5, you may notice when I play it one more time that this time, I play the b-motif three times, but perhaps omit an a-motif. Still, we might explain this by saying that I use a linear structure, but that I have two alternative versions of the tune.

But if I continue to produce more different versions of the tune, the linear structure will become less and less likely - the implication of a linear structure being that I have to learn each version separately. Knowing that an average fiddler has a repertoire of more than a hundred tunes, most of which are played in different ways each time, this seems rather unlikely. A hierarchy is a much simpler way of explaining my behaviour. If we add some rules as to how the number of subunits for a given level may vary, like (for the motif level) 'play each motif unit one, two or three times', many different performances may be generated from the simple hierarchical model.
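The generative power of such a rule-augmented hierarchy can be sketched in code. The two-motif vek sequence and the 'one, two or three times' rule come from the text above; the data layout and function names are illustrative assumptions.

```python
import random

# Vek level of the simple two-motif tune (cf. Figure 8): two rounds,
# each consisting of an a-vek followed by a b-vek.
VEK_SEQUENCE = ['a', 'b', 'a', 'b']

def generate_performance(rng):
    """Expand the hierarchy into a motif sequence, applying the rule
    'play each motif unit one, two or three times'."""
    performance = []
    for motif in VEK_SEQUENCE:
        performance.extend([motif] * rng.choice([1, 2, 3]))
    return performance

rng = random.Random(0)
print(generate_performance(rng))  # e.g. ['a', 'a', 'b', 'b', 'a', ...]
```

With three possible repeat counts for each of the four vek slots, this single model already covers 3^4 = 81 surface realizations, where the linear model would need 81 separately memorized versions.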

There is more evidence from behaviour to support the hierarchy. The linear structure implies that I do not really have an overview of the tune. If I for some reason should make a mistake at some point, I may be unable to continue, but will have to start from the very beginning, as the only road to each structural unit goes from the

Figure 9. Two possible versions of the tune.


Figure 10. Motifs in the tune Den gamle Sordølen (five motifs, labelled A to E; notation not reproduced).

beginning. For the same reason, if I want to show a fellow fiddler how a certain motif is played, a linear structure will force me to start from the beginning and play till I reach the motif in question. A hierarchy, on the other hand, will enable me to access the units at various levels and points of a tune. Good fiddlers obviously do not have to play from the beginning of a tune to access motifs in the middle of it. They are able to show any part of any tune at any time, displaying a great skill of retrieving musical material relevant for the task at hand, be that teaching tunes, playing for dance, or answering silly questions from ethnomusicologists. Their internal tune structure is certainly not linear.

So far, I have shown you only one part of the tune. The complete version has some more units, as shown in Figure 10.

The additional units present no problem to the hierarchical model. It is simply a matter of adding more units at the proper level of the hierarchy. The additional complexity that each of the units of Figure 10 may be played a fifth higher or lower than given here is no problem either - just add some more units, perhaps at another level in the hierarchy, like in Figure 11 on the facing page. We could conclude that so far, the hierarchy is the natural choice of structure.

Conflicting evidence - and the network as an alternative

A problem with the hierarchies is that units on lower levels cannot overlap units on higher levels. The low-level units must be well-defined, at least where they share borders with higher-level units, because how should we be able to tell where the B part starts if we are unable to tell where the a-motif ends and the b-motif starts? This is the case in Den gamle Sordølen, where there are at least two obvious possibilities for the border between the a- and b-motifs, as the motifs share some of the same material (indicated by the dotted bracket in Figure 5). Experientially, one has to choose only one of the two possibilities at any one time, much in the same way as the Gestalt principle of exclusive allocation works, cf. various well-known cases of this such as the vase/figure and the duck/rabbit illustrations. But it is always possible to change


Figure 11. Incorporating additional motifs in the hierarchy. Underlined symbols indicate 'motif played a fifth higher than the motif with plain style'. Only the first round is shown.

one's mind, and perceive the music according to the other possible punctuation. Further, there is no obvious analytical procedure for distinguishing the two possibilities. In other words, there is no way to find one single hierarchical model to describe the tune performance, as the two possible borders need separate hierarchies.

An even stronger argument against the hierarchy is the fact that in playing this tune, the sequence of motivic units is not always the same. Given only two different units, as in the first example, such an observation does not make sense, as an a-motif will sooner or later always be followed by a b-motif. But with four different units, with variations, the question is quite relevant. Adding to the confusion, there are several possibilities for starting and ending a performance of the tune. A possible performance of the tune may be like the one in Figure 12.

If I play the tune like this, I am obviously not behaving as if using a hierarchy in any simple way. There is nothing that corresponds to 'playing the tune through once', as the round level (see Fig. 8 on page 225) is absent. The vek level, on the other hand, is clearly visible in clusters of similar motifs (two a-motifs together followed by two

Figure 12. Possible sequence of motif units.


228 IMAGES OF FORM

Figure 13. A possible network structure for the tune.

b-motifs ...). But the vek units do not follow each other in a fixed order that may be visualised in an ordered hierarchy. I seem to combine the motifs more freely, and a structure like the network shown in Figure 13 seems to be a better explanation.

This structure also accounts for how the different units may be combined; the network in Figure 13 implies that I may combine the motifs in almost all possible ways, the only exceptions being that C and D units are not allowed to follow a B unit. In other words, of the twelve theoretical combinations of units (excluding the trivial cases where one unit may follow itself), 10 are allowed in practice in this tune.
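As an illustrative aside (not part of the original chapter), the transition constraints just described can be sketched in code; the unit labels A-D and the two forbidden transitions are taken from the verbal description of Figure 13:

```python
# A sketch of the network in Figure 13: four vek units (A, B, C, D),
# where any unit may follow any other, except that C and D may not follow B.
UNITS = ["A", "B", "C", "D"]

# The two forbidden transitions, as described in the text.
FORBIDDEN = {("B", "C"), ("B", "D")}

def allowed(prev, nxt):
    """A transition is allowed if it is neither trivial (self-repetition is
    excluded from the count in the text) nor forbidden."""
    return prev != nxt and (prev, nxt) not in FORBIDDEN

# Of the 12 theoretical combinations (4 units x 3 possible successors),
# 10 are allowed in practice, matching the count given in the text.
combinations = [(p, n) for p in UNITS for n in UNITS if p != n]
allowed_pairs = [pair for pair in combinations if allowed(*pair)]
print(len(combinations), len(allowed_pairs))  # 12 10
```

On this view, any legal performance of the tune is simply a walk through the allowed pairs, with no fixed round structure imposed from above.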

Saving the hierarchies

So far, the evidence from the performed music seems to point to a network as the better structural choice. But one should notice that it is possible to use a hierarchy in a way that will produce performances like the ones described above. I may think in terms of a hierarchy where the motifs come in a fixed sequence, but simply omit some motifs from time to time when performing, as in Figure 14 on the facing page. This will give the impression that the sequence of the motifs is not well-defined, as in a network like the one illustrated in Figure 13.
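This 'fixed sequence with omissions' idea can likewise be sketched (again an illustrative aside; the particular vek order and the omission probability are hypothetical choices of mine, not taken from the tune itself):

```python
import random

# Hypothetical sketch: a hierarchy prescribing a fixed vek sequence,
# from which some units are simply omitted from time to time in performance.
FIXED_SEQUENCE = ["A", "B", "C", "D"]  # one 'round' of the hierarchy

def perform(rounds=3, omit_probability=0.3, rng=None):
    """Play the fixed sequence several times, randomly skipping units.
    The surface result looks as if the sequence were not well-defined."""
    rng = rng or random.Random()
    performance = []
    for _ in range(rounds):
        for unit in FIXED_SEQUENCE:
            if rng.random() >= omit_probability:  # keep this unit
                performance.append(unit)
    return performance

# Two 'performances' of the same underlying hierarchy look quite different:
print(perform(rng=random.Random(1)))
print(perform(rng=random.Random(2)))
```

The underlying structure remains strictly hierarchical; only the realised surface sequence varies, which is exactly why observable performances alone cannot decide between the two models.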

Another way to envisage this is a hierarchy with a number of extra connections between units on a certain level, as in Figure 15 on the next page. This type of structure may be seen as a way of incorporating a network in the hierarchy. Notice that the level where the new connections are introduced is the vek level, corresponding to the units connected in the network illustration in Figure 13.

Therefore, as networks, hierarchies, and hierarchy-like structures may all produce performances of the type described, we will need additional evidence to approach the question of the structure of the images experienced and used by fiddlers.



Figure 14. The tune organised as a hierarchy, but skipping some motivic units to simulate a network.

Verbal evidence

Verbal evidence is obviously one possible source of relevant information, and an important part of this is the terminology commonly employed by fiddlers to describe formal structure. The term vek is taken from fiddlers' terminology, but is used here in a slightly different sense, as fiddlers tend to use the term for units on the vek level as well as for units on the motif level. The term for a tune, slått, may also be used for units on more than one level: 'You play the vek twice [the a-motif repeated] - then you have played a vek' [the 'A' section], and further: 'you play the tune once through [the '1' section in Fig. 8], then one more time [the '2' section] - then you have played the tune.' These concepts, and the way they are explained, are documented in interviews with performers (see e.g. NFSTd-2909; tape in the Norwegian Collection of Folk Music, Univ. of Oslo), and point to a hierarchy as the basis for their perception of form.

Another kind of evidence comes from the answers I get when I ask fiddlers where one motif ends and the next begins. Sometimes I get a quick and certain answer, but sometimes - like when asking where the a-motif ends and the b-motif begins in Den gamle Sordølen - fiddlers may be very reluctant to provide an answer. Here, the two motifs share a common part, and in the transition from the a- to the b-motif the common part may be seen as belonging to either motif (see the notes under the

Figure 15. A structure incorporating both a hierarchy and a network.



broken bracket in Fig. 5). This is the case in many tunes, and on this background it is not surprising that fiddlers at times may be very reluctant to show exactly where one motif ends and the next begins. On the other hand, it is perfectly possible to decide that one of the possibilities is 'right' and the other interpretation 'wrong'. That the fiddlers tend not to do this, I take as evidence against a strict hierarchical model, as a strict hierarchy has to be built on well-defined units, as said above. Maybe the fact that fiddlers seem to use the same term for (at least) two different levels in our hierarchy also indicates that the case is not as clear-cut as one could wish. Insisting on a strict analytical hierarchy may therefore not be in agreement with the formal structure perceived by the performers.

Two kinds of tunes

Before turning to introspective evidence, it is important to point out that Norwegian fiddle tunes may be regarded as belonging to a number of different formal types. The tune Den gamle Sordølen belongs to a relatively small class of tunes where the possible variation in performance is far greater than in most other tunes. Analytically, a large body of tunes in the repertoire are readily seen as hierarchies quite similar to the one shown in Figure 8, where vek units follow each other in an orderly, predictable sequence. One may also view some tunes as belonging to various 'intermediate types'. In the following, I will discuss images related both to tunes of the Den gamle Sordølen type and to more regular ones.

Introspective evidence

As said above, there are several possible structures which may fit my observable performances of Den gamle Sordølen. But before relating what my own introspection may reveal, a few words should be said about the possible relevance of my personal experience. To what extent my images are representative of traditional fiddlers is of course an open question, as my musical background is quite different from that of a traditional fiddler. Without going into details of such differences, the only argument I can put forward in favour of the relevance of my internal images is that I am actually able to produce musical behaviour which is culturally accepted in a traditional context. Specifically, I am able to come up with ever new performances of the tune Den gamle Sordølen which are formally different, but still acceptable as correct performances of this tune (as well as of a number of other tunes in the genre). On the other hand, I have previously spent quite some time and effort arguing in favour of hierarchies as the basis for performance of Norwegian fiddle tunes, and my present images may of course be influenced by my theoretical efforts.

Returning to the 'fuzzy border' argument against hierarchies, this argument does not seem relevant introspectively. My image of situations as described above, where two succeeding motifs share an element, so that the transition from the first to the second is 'blurred' by this common element, is similar to the image shown in Figure 16 on the facing page. The motifs seem fuzzy, in the sense that I am not able - or willing - to define an exact transition point. But the vek level is quite distinct, as the 'a-ness' of the a-motif is quite distinct from the 'b-ness' of the b-motif.



Figure 16. Hierarchy with fuzzy boundaries at the motif level. The fuzzy boundaries do not prevent me from having distinct - not overlapping - experiences of the respective motifs, and of the sections of the tune made up of the respective motifs.

This is part of the power of the hierarchy, in that we may use each new level as an abstraction. The capital A in Figure 16 is not simply the part of the tune where the unit 'a' is used; it is my image of 'a-ness'. The hierarchy shows me that in this tune, 'a-ness' comes before 'b-ness'. Hierarchies of this kind seem to me to describe my internal images of the formal structure of a large group of tunes, namely tunes with a clear round level and with consecutive motifs that share motivic material.

The images, however, do not really 'look like' hierarchies in the form shown in Figures 11, 14 and 15. On the motif level, I have quite detailed images, in the form of animated pictures made up of a combination of fiddle fingerboard, finger movements and music notation. The images are extremely hard to draw on paper, and a drawing would not be a good description, since the images are truly multi-sensory, including vision, sound and, equally important, a unique 'feel' for each motif. For the vek level, I also have clear images, but not as detailed as on the motif level. Here, the 'feel' component is the more important. The same holds for the round, performance and tune levels. But I definitely have images of the units on the different levels shown in Figures 11, 14 and 15. What I do not have is a clear visual image of the hierarchical structure as it is shown in the mentioned figures. I have no conscious visual image of a graphical layout on a page with lines connecting the units. What I do have is an image of sequences on each level. I have an image of how motif variants should follow each other, as well as an image of how vek units should follow each other. But once I arrive at a vek, I do not have to move along a line to another unit at another level; 'being in' a vek, I have immediate access to the motif and all the variants. Instead of 'going down' I 'go in'. But structurally, those images behave like an ordered hierarchy, and the figures shown are in this sense as good representations of my images of form as is possible for me to put on paper. The two modified hierarchies, however, do not seem to fit the situation when playing Den gamle Sordølen. For tunes of this kind, a network seems more to the point.

But the network does not look or feel like the one in Figure 13. My internal image is seen from 'inside' the network; each vek is a different 'place', with connecting 'roads' of a quite abstract kind. The point, however, is that at any one time, I feel 'situated' in a vek, being able to see a number of possible connections to other vek, but not



having an overview like the picture in Figure 13 would indicate. Supporting this is the lack of experience, or image, of a round level when playing this tune. In playing more 'regular' tunes, I have a clear experience of rounds, knowing exactly when I 'start over again', and will usually be able to tell afterwards how many rounds I played. Playing Den gamle Sordølen, I have no clear image of the vek sequence.

The difference between networks and hierarchies also shows up in how I am able to describe tune performances: it is harder for me to recall the specific form of a certain performance of Den gamle Sordølen than it is with tunes where my image is clearly hierarchical. This is also what one would expect, as an ordered hierarchy gives a rich vocabulary for places within a tune performance, with the round level well defined and a definite sequence of the motifs. One might argue that the possible amount of variation is greater in Den gamle Sordølen than in more regular tunes, but this is not necessarily so. Many 'regular hierarchical' tunes may be varied quite a lot during performance. Even if the vek sequence is unaltered, the number of motif repetitions within a vek may vary, as well as the way motifs are performed with respect to melodic, ornamental and bowing variations. I have, however, different images of such possible variations depending on the situation. If I play (for dance, for listening, or just for my own pleasure), I will usually have a number of possible 'ready-made' variations for the motif in each vek - and an image of each complete motif including fingering, ornaments, bowing and timing. In a different situation, when I teach tunes, or when I explore the tunes for my own pleasure and for expanding the possible ways of playing them, I find that I use images of e.g. bowing patterns as entities of their own - entities that may be applied to different motifs. But once a bowing pattern is applied to a motif in a novel fashion and rehearsed for a short while, the new bowing-pattern/motif combination is added to the existing store of possible motif variations for that particular vek in the same way as the others. Apart from bowing patterns, I may also use specific images of different kinds of ornaments, double stops, rhythmic variations and melodic alternatives.

Forming the images of form

A final observation: my image of Den gamle Sordølen is not always the same. The 'inside the tune' network image is at work when I play, but when discussing the tune, or teaching it, my experience is that I have direct access to any motif in the tune; in other words, I may go from the tune level to the vek level directly, hence implying a hierarchical structure. My image in such discussion situations feels more hierarchical than when playing.

I have no other evidence to support the suggestion that fiddlers in general share my experience in this respect, and my analytical training may very well influence my images to a large extent. But the general point remains: there is no reason to believe that one should have only one kind of image of a given tune (or other kind of musical performance or work). Rather, it seems reasonable to expect that the images perceived and used will depend on the actual situation, and be geared to suit the tasks at hand. Given that images are in some way functional and are used as tools by the fiddler, this is quite obvious: my need for conscious access to details is greater when I teach



than when I play; my need for overview is greater when I want to surprise an attentive listener, or when I compose a new tune, than when I play for people absorbed in dancing, etc. During a number of such activities, I use and construct images pertaining to details of playing as well as to the form of the music, not always being able to distinguish images of playing activities from images of form.

How, and to what extent, images vary in this respect is an open and partly empirical question. Music theory is usually more concerned with observable patterns in music notation than with structures of the experience of listeners and musicians, and there is usually no reference to possible situations where a certain formal analysis will be relevant outside the analytical context.

Form as formed by personal musical history

My personal history of playing Norwegian fiddle tunes includes a movement from strict hierarchies to more differentiated inner images of the formal structure of fiddle tunes. My first images of Den gamle Sordølen were also clearly hierarchic, and my performances reflected this in that I played the different vek in a fixed sequence. If I had not paid close attention to how my fiddle teacher actually played this tune on several occasions, I might still have a hierarchical image of it, with the possible performances and the artistic space available to me effectively reduced. It is also tempting to speculate that my previous musical background had made the hierarchy a preferred structure for me, so that I tended to understand tune performances in this way whenever possible. Reversing this train of thought, I have tried to play several of the tunes normally played in a regular hierarchic fashion as networks, that is, loosening up the sequence of vek in the performance. While provoking protests from fellow fiddlers who know the tunes, such a procedure does in many cases make sense stylistically and musically. This is supported by the fact that fiddlers not knowing the specific tune in question will find such performances quite in style.

This leads to the relatively obvious observation that I tend to understand the music I hear in ways that are familiar to me, and to the equally obvious observation that I could very well understand it in a different way if I tried. What is not so obvious is that this opens up a number of different scenarios in the historical development of Norwegian fiddle tunes. One is parallel to my own personal history: fiddle tunes perceived as primarily hierarchic in structure, with performances being quite similar, at least on the vek level, but developed into more flexible structures through use as e.g. dance music that required the music to adapt to the whims of the dancers. Another possibility is that most fiddle tunes were perceived as flexible networks, but that through pressure from other musics of higher status, e.g. through schools, churches and later mass media, forms became more fixed. A third possibility is that both kinds of structure have been available all along, and have been used for different tunes and maybe for different uses, dance playing and playing for listening being two possibilities.

The point of these reflections is not to rewrite the history of fiddle tunes; it is of a more general nature, and I will therefore not speculate further on that history. The point here is to draw attention to perceived formal structure as an active force in shaping the development of a musical genre, and to argue that 'musical form' is found neither in the score nor in the sound, but as images in the heads of musicians



and listeners. Musical works and performances may to a certain extent indicate certain perceived structures, but cannot by themselves prove how they are perceived. Therefore, it is quite difficult to assess the different versions of Norwegian fiddle tune history I outlined above. Getting access to the experiences of other people is very hard, and getting to know the experiences of the people who played and used the music one, two or three hundred years ago will never be possible. But changing the perspective from the musical score to the experiences of people will perhaps make us look for more data than we otherwise would have, and also make us assess the data we have in a different way.

Conclusion

My argument is based on empirical evidence and personal experience with one specific genre of music, traditional Norwegian dance tunes, but I see no reason why the general implications could not be extended to other musics. The empirical evidence available may be different, but there is nothing genre-specific in this way of thinking. The actual structures perceived may differ a lot from genre to genre. There is no need to assume that the whole of The Ring of the Nibelung is perceived using the same kind of structural image as when dancing to techno dance music. But in both cases, the same theoretical or paradigmatic point of departure may be used, namely that formal structure should be viewed as a dynamic quality of the perceived music, not as a static property of the music score.

Formal analysis of single performances and scores should not be regarded as meaningful analysis of a musical work, but merely as possibilities for experience. A conventional analysis of a tune usually aims at a single structure for the description of the form, basing the argumentation on the performance as played or transcribed. But a single performance of Den gamle Sordølen may, as shown above, be perceived as a hierarchy, as a linear structure, or as a network. Therefore, I will argue that a sensible analysis cannot be undertaken on such a basis, but needs a wider empirical base, both widening the scope of musical evidence from musical scores to musical performance, and also including evidence that may indicate what kind of images are actually used by listeners and musicians. Formal analysis should be regarded more as an empirical than an analytical discipline, and should draw on a variety of evidence of musical and other behaviour. In this paper, I have used information from several structurally different performances of the same tune - a privilege not available in all genres of music. Further, verbal behaviour of performers was drawn into the discussion, as well as observations of my own images.

Musical imagery is in itself a fascinating subject. My aim in this paper, however, has not been to describe such images in themselves, but to show that the perspective of imagery is central to understanding aspects of musical behaviour, like the formal structure of music as performed. I will also stress the fact that the study of imagery invites us to use a wide variety of empirical material. This in turn may contribute to expanding musicological research and, ultimately, our understanding of music as a phenomenon and human activity.



References


Kvifte, T. (1978). Om variabilitet i fremføring av hardingfeleslåtter. Thesis. University of Oslo.
Kvifte, T. (1981). On variability, ambiguity and formal structure in the Harding fiddle music. In E. Stockmann (Ed.), Studia Instrumentorum Musicae Popularis VII (pp. 102-107). Stockholm: Musikhistoriska museet.
Kvifte, T. (1994). Om variabilitet i fremføring av hardingfeleslåtter - og paradigmer i folkemusikkforskningen. Oslo: Institutt for musikk og teater.
Miller, G. A. (1956). The magical number seven, plus or minus two. Psychological Review, 63(2), 81-97.
Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical Society, 106(6), 467-482.


13

Imagined Action, Excitation, and Resonance

Rolf Inge Godøy

Introduction

Although recent studies have explored the cognitive and neurological bases of musical imagery (Zatorre et al., 1996), questions of the actual sonorous content or sonorous qualities (such as timbre and texture) in musical imagery seem to have received less attention. Questions of sonorous qualities are difficult because they do not fit well with the symbol-oriented paradigm of our musical culture (such as discrete pitches and durations), as sonorous qualities are highly multidimensional (evolving spectra, transients, etc.) and also rely heavily on introspective reports (i.e. pose difficult methodological problems for any experimental approach). Nevertheless, musicians, composers, musicologists, and non-professional music-lovers for that matter, all rely on imagining musical sound, and for composers and/or arrangers, developing a capability for imagining sonorous qualities is quite simply an integral part of musical craftsmanship. In this paper, I will argue that images of sound-producing actions (such as hitting a drum, plucking a string, blowing on the tip of a bottle, etc.) can enhance this capacity for imagining sonorous qualities. I will do this by presenting a conceptual model of imagined sound-production which separates excitation and resonance, and relates this to some ideas from the domains of motor control and motor imagery.

There are two fundamental (and related) ideas underlying this paper:


238 IMAGINED ACTION, EXCITATION, AND RESONANCE

• That there is a strong link between our knowledge of sound and sound sources, both in perception and cognition, so that features of sound are in most cases related to features of sound-production, sound-production here understood as including both the sound-producing action and the features of the resonant bodies and environments. And, as an extension of this:

• That images of sound-production, including visual, motor, tactile, etc. elements, may actually trigger images of sound, and conversely, that images of sound may trigger images of sound-production.

There seems now to be a fair amount of research which directly or indirectly demonstrates close links between images of sound and sound-production in perception and cognition (more on this in the section 'Ecological constraints' below). However, the idea that images of sound-production may actually trigger images of musical sound (and vice versa) in our minds is, as far as I know, a less well explored topic. There is one study which suggests that imagined sound-producing action may serve as 'stimulus support' in auditory imagery (Smith, Reisberg & Wilson, 1992), and I suspect that images of sound-production may play an important role in generating images of musical sound for the following two reasons:

• There are fairly strong indications of functional relationships between perception and imagery in general, as has been demonstrated in the domains of visual and motor imagery, meaning that imagining something will share many features (neural substrates, time needed, sense of effort, degree of difficulty or ease, etc.) with the actual doing or perceiving (Kosslyn, 1994; Jeannerod, 1995). It is plausible to assume that this could also be the case for musical perception and imagery, something which in fact seems to be suggested by some studies of musical imagery (see Janata, this volume).

• Informal observations of performers, composers, music students, etc. suggest that tasks of musical imagery (such as the recollection of music, the reading of scores, etc.) are facilitated, and results enhanced, by imagining the actions of actually producing the sound, i.e. by imagined playing, conducting, etc.

It could in any case be justified to propose this triggering of sound images by images of sound-production simply as a (hopefully) fruitful hypothesis for further study, as well as a heuristic strategy for exploring features of musical sound. This last point refers to a kind of 'guided imagery', a kind of phenomenological exploration of musical sound, where the conceptual mapping out of the various elements of sound-production may hopefully enhance our images of musical sound, i.e. we have more salient images of sound when we have more salient images of how the sounds are produced (as is, by the way, one of the experiences of working with sound synthesis based on physical models). Such explorations of images of sound-production could open up the territory of non-symbolic qualia, such as timbre and expressivity, which have until now been little explored in musical imagery.

Clearly, such explorations will imply a good deal of introspective, 'arm chair' type research. As a point of method, this will mean that we should accept several different approaches to the investigation of musical imagery (brain scans and other



measurements and observations of brain activity, various experimental psychological approaches, as well as more conceptual studies), and that introspective insights will have their place together with other observational and experimental methods, along the lines previously demonstrated by Pierre Schaeffer's phenomenological explorations of the features of musical sound (Schaeffer, 1966).1

Ecological constraints

In discussing the link between our images of sounds and our images of sound sources, it is useful to postulate two possible extremes (or poles) in recent audition and music cognition research: signal based versus schema based, or what could roughly be termed a bottom-up versus a top-down understanding of audition and music cognition. Of course, most projects in audition and music cognition research propose models situated somewhere between these two extremes, models which often rely on interactions between the two poles. Yet the distinction is relevant here, because the idea of a strong link between images of sound and images of sound sources, as well as the idea of sound source images actually triggering sound images and vice versa, clearly relies on top-down, schematic elements at work in audition and cognition. In this way, audition, music cognition, and musical imagery could all be understood as 'impure' phenomena in the sense of being cross-modal as well as conditioned by ecological constraints. There has recently been a significant increase of interest in cross-modality (Calvert, Brammer & Iversen, 1998), and some researchers reject the classical division of the senses altogether (Berthoz, 1997), postulating instead a complex interaction of sense modalities as well as motor elements in all acts of perception and imagery.

Interestingly, we find an analogous distinction between 'pure' and 'impure' sound in the domain of digital signal processing, between 'signal based' and 'physical model based' approaches. The first (i.e. signal based) is essentially an abstract generation of signals, such as in additive synthesis or frequency modulation synthesis. The second (i.e. physical model based) is a simulation of the excitation and resonance of various physical bodies (which may be more or less 'real world' or more or less chimerical), such as the plucking of a string, a hammer hitting a steel plate, or the plucking of a string attached to a steel plate which also has the resonant features of a human female voice (which would actually be a hybrid, source-filter kind of model). Digital synthesis by physical models could, amongst other things, actually be understood as an attempt to bring elements of top-down schematic control, and even human gestures, into the process of synthesis (Cadoz, 1991).
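As an illustrative aside (not part of the original chapter), the contrast can be made concrete with the well-known Karplus-Strong plucked-string algorithm, a simple physical-model-style synthesis in which a noise burst (the excitation) circulates in a delay line with an averaging filter (the resonance), so that the resulting sound is specified by the simulated set-up rather than by an abstract signal description:

```python
import random

def pluck(frequency=220.0, sample_rate=44100, duration=1.0):
    """Karplus-Strong plucked string: a noise burst (the excitation)
    circulates in a delay line with an averaging filter (the resonance)."""
    period = int(sample_rate / frequency)  # delay-line length sets the pitch
    # The excitation: one period of random noise, as when plucking a string.
    line = [random.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for _ in range(int(sample_rate * duration)):
        sample = line.pop(0)
        # The averaging filter: this makes the 'string' ring and slowly decay.
        line.append(0.5 * (sample + line[0]))
        out.append(sample)
    return out

random.seed(1)  # deterministic for this example
samples = pluck()
# Energy decays over time, as in a real plucked string:
first = sum(s * s for s in samples[:1000])
last = sum(s * s for s in samples[-1000:])
print(first > last)  # True
```

The point of the sketch is that pitch, decay and timbre all emerge from the simulated excitation and resonance, not from any explicit spectral specification.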

In the domain of auditory scene analysis, the close link between our images of sound and images of sound sources in perception and cognition is understood as an ecological constraint, meaning that audition has evolved as a means of survival and orientation (Bregman, 1990). Briefly stated, auditory scene analysis suggests that there are two levels involved in audition: a low level, mostly signal based, called 'primitive' scene analysis and essentially concerned with what could be called qualitative discontinuities in the continuous auditory signal, and a higher level, what Bregman calls 'schema based' integration, meaning auditory cognition based



more on learned schemata, such as assumptions about the sound source. It has been shown that this schema based integration can not only supplement degraded or incomplete signals so as to enable us to make sense of what we (assume to) hear, but even override auditive signals, so that when looking at a video displaying the facial movements of a person pronouncing a different sound than the one we are actually hearing, we may be convinced that we heard something else than what was fed to us in the acoustic signal (McGurk & MacDonald, 1976). This massive repertoire of relations between sounds and sound sources, which we may assume most of us have, constitutes one element of the ecological constraints of sound and sound source, a constraint which is clearly an advantage in unfavourable circumstances, such as with complex intermingled sounds or in noisy environments. The advantage of this knowledge of sound sources has been demonstrated by research into computational auditory scene analysis, where artificial systems must somehow have recourse to a repertoire of previously learned patterns in order to function satisfactorily (Rosenthal & Okuno, 1998).

There seems, then, to be a fairly large amount of research which supports the idea of top-down schemata based on learning at work in music cognition. Of particular interest here is the idea of motor schemata at work in perception and cognition, known in various variants as the motor theory of perception. In particular, this has been applied in the study of category formation in speech perception, with the idea that speech sounds are identified not only by their acoustic features but also by images of how the sounds are assumed to be produced (Liberman & Mattingly, 1985). Although there has been much controversy surrounding this theory in the past decades (Harnad, 1987), the accumulation of various indications of motor imitative behaviour in infants and at later stages of development makes the basic principles of this theory seem more and more well founded. It should also be noted that the idea of motor schemata is important for categorization in general (Rosch et al., 1976), as one of the attributes of categories is that of similarity of action: e.g. what is common to all instances of the category 'chair', i.e. across all possible variant appearances, is that it is something we sit on. In cognitive linguistics, the idea of motor schemata has been understood as underlying very many metaphors in everyday language (Johnson, 1987).

One consequence of these ideas of motor components in perception and cognition is that imagery does not only include ecologically acquired knowledge, as suggested earlier by R. N. Shepard (1984), but also includes the element of simulation as proposed by A. Berthoz (1997). Finding most uses of the term 'mental representation' problematic, Berthoz suggests rather to use the term 'simulation', as this term is both a rejection of the idea of the human mind as an abstract symbol-manipulating machine and an endorsement of the idea that cognition and imagery are incessant re-enactments of the process of perception, including all the motor components that go into the perceptive act (movement of head, eyes, etc.), as well as a constant production of hypotheses as to what is to come in the next moment and what will be the appropriate action to be taken then (e.g. seeing a chair, and immediately thinking that this is something to sit on). In fact, Berthoz argues that movement and action are probably the very basis for cognition in general, quoting from Goethe's Faust that 'In the beginning was movement'. With regard to perception and imagery of musical sound, there is then good reason to suspect that the production of sound is integral to the very notion of sound itself.


Sound-production

I have earlier proposed a schematic triangular model of cross-modality at work in music cognition (Godøy, 1997). The three main modalities of vision, action and audition are seen as situated so that each modality directly relates to the others, i.e. there are strong links between action and audition, between action and vision, between audition and vision, etc. There is of course much to be said about what constitutes the various modalities in perception and cognition, and in particular as to what are the delimitations of one supposedly specific modality in relation to another. As has become increasingly clear in the last couple of decades, there is probably always an interaction and cooperation between a multitude of faculties and/or neural substrates in perception and cognition (Berthoz, 1997). What is most relevant in our context here is the accumulation of research results supporting the idea of a quite strong integration of vision, action and audition also at a neurological level (Stein & Meredith, 1993), i.e. that the integration of vision, action and audition is not only a phenomenon of learning, but a matter of neurophysiological disposition as well. The earlier mentioned various guises of motor theory in perception and cognition could then be understood as instances of cross-modality, since both visual and auditory objects in perception and cognition are related to motor phenomena.

As for the action component in such a triangular depiction of cross-modality, there is invariably the problem of how to represent action. Action may variously be represented by physical parameters such as force, velocity, distance, angles, etc., or visually and/or graphically as trajectories in time-space, or physiologically as patterns of neuronal activity in muscular control, and cognitively as scripts, schemata, goals, or even linguistically or poetically by various images of the kinaesthetic sensations of movement. Yet in our context of imagery, the problem remains that action is also a 'silent' internal image of 'how things are done', as is depicted by the classical distinction between declarative and procedural knowledge (Haberlandt, 1994), meaning that whereas declarative knowledge may be more or less satisfactorily represented and communicated by language or other symbols, procedural knowledge resides in the body and remains a 'feel' for certain movements, such as riding a bicycle, swimming, skiing, etc.

For this reason, I believe the triangular model of audition, action and vision is useful in exploring musical imagery, because it suggests that sensations of movement (procedural knowledge) are intimately linked with sensations of sound, and that both sensations of movement and sensations of sound may also have visual correlates. As for movements, this means that the visualization of movement as trajectories in time-space could be an integral element of imagining a sound, e.g. when imagining the sound of a drum, also imagining the trajectory of the drummer's mallet and hand from the initial position to impact with the drum membrane and back again to the initial position, i.e. as an entire 'action unit'. In this way, I believe it could be useful to regard most musical sound as included in action trajectories, and that these action trajectories have both a procedural motor component (a 'feel' for the effort, velocity, amplitude of movement, etc.) and a visual component (actually visualizing the action trajectory in time-space).


The ecology of sound-producing actions

Excitation → Resonance
'What we do' → 'The effects of what we do'
Motor images → Materials images

Figure 1. A schematic overview of the separation of excitation and resonance. Another way to understand this separation is the distinction between 'what we do' and 'the effects of what we do', as well as the distinction between motor images and images of the resonance features of whatever body is excited (strings, plates, membranes, etc.), that is, of what could be termed 'materials images'.

Sound-producing actions are of course but one class of movements which may be associated with musical sound, as is evident from dance and other kinds of movement to music (such as in marching, in singing cradlesongs, etc.), or in the case of spoken language for that matter, where gestures often seem to serve the purpose of emphasizing elocution and parsing, as well as visually giving some shape to the meaning of the utterance (McNeill, 1992). Also, the border between specifically sound-producing actions and more visually communicative and dance-like gestures made by performers may be quite fuzzy, as is evident from the way in which many performers include their sound-producing movements in larger-scale phrasal, or perhaps rhetorical, gestures. This inclusion of sound-producing actions into more expressive gestures could however in turn be understood as a case of coarticulatory fusion (see the section 'Excitation' below), hence actually as a higher-level organization of sound-producing actions.

Even though I have tried to argue that sound and sound source are inseparable in most cases of music cognition, I believe it could be useful to conceptually separate these two components here for the sake of enhancing our mental images of musical sound,² as illustrated in figure 1.

This separation of excitation and resonance designates the generally valid difference between the 'active' excitatory effort and the 'passive' resultant qualities.³ This means understanding all 'natural' (i.e. non-electronic) sound as the result of actions, as well as recognizing that our action images are 'active' phenomena because of our motor-mimetic capabilities, as is the central claim of the various motor theories of perception and cognition.

A further consequence of this focus on the excitation separately from the resonance is that sounds are not only the results of excitations, but that the onset-points of sounds, i.e. the points in time when the energy of the acoustic signal is first perceived, are but points in a more extended action trajectory: The onset of a drum sound comes after the impact of the mallet with the membrane, and this impact comes after the mallet and the hand have travelled from an initial position somewhere away from the drum membrane; furthermore, the mallet and the hand (usually) travel back again after the impact. Or: the onset of the singer's tone is but a point in an action trajectory comprising taking a breath, tensing the vocal cords, shaping the vocal tract and pressing the air through this vocal apparatus. This inclusion of most natural sound into action trajectories, i.e. into a context of motor 'prefix' and 'suffix', should indicate that there are motor schemata which run parallel to 'pure' sound, constituting a 'silent choreography' of sound-production integral to notions of musical sound.

This opens up an interesting domain for further exploration, potentially applicable to most features of musical sound: not just the patterns of onsets (rhythmical and textural patterns) and the various expressive features associated with them (tempo, rubato, various articulations, etc.), but also contours of pitch and use of pitch-space (tessitura), and even timbral features, where various formant shapes, transients and patterns of harmonic fluctuation could be understood as the result of action trajectories (relatable to personal experiences of the vocal apparatus or other ecological experiences). In particular, the notion of action units in the sense of effort-relaxation and/or extension-contraction of limbs could be interesting to study in relation to the formation of musical gestalts (rhythmical and textural patterns, 'gaits', phrases, etc.). This action element could then also be included in more systematic explorations or mental exercises of incremental variation, as a kind of 'analysis by synthesis' approach to exploring the relationship between imagined action and imagined sound.

There is of course the question of innateness vs. more specifically learned associations involved in this, i.e. of the role of universal experiences vs. individual, personal knowledge, or of expert vs. amateur, in images of sound-production. This could provoke criticism similar to that which has been directed towards the motor theory, in that we are obviously able to perceive, recognize, remember and discriminate sounds and other impressions from the outside world which we cannot possibly execute or reproduce ourselves; e.g. we may recognize speech sounds of a foreign language we are unable to speak. Now, the answer to this criticism is that even in unfamiliar circumstances we make hypotheses as to what we believe are the actions behind an impression, and notably so, if necessary, in a rather coarse and undifferentiated manner (i.e. many finer details and distinctions are not captured), yet still sufficiently for a first discrimination, a discriminatory capability which may of course be enhanced by repetition and learning. This could be understood as a general (and ecological) tendency to relate to the world by incessantly making 'hypotheses' and actually carrying out 'simulations' as to the causes of what we perceive (Berthoz, 1997).

As an example, consider the notion of pitch, which will have certain universal, culture-independent elements for the simple reason that strings, tubes, membranes, etc. must have different lengths in order to produce different pitches. However, images of the exact arrangement or spatial layout of the source may vary according to socio-cultural background, as for instance in the case of some African xylophones where low to high pitches are arranged from right to left (instead of in ascending order from left to right as in most Western European pitched percussion instruments). Also, within our Western musical culture, there may be different image schemata at work in the case of different instruments: e.g. keyboard players may have the left-to-right ascending ordering, string players a right-to-left ordering combined with upwards movements on the string, clarinet, oboe, etc. players an ordering ascending from the end of the instrument towards the mouthpiece, etc. That this is in fact the case is suggested by a truly remarkable study by M. Mikumo (1998), a study which also suggests that the recall and imagery of musical sound is enhanced by motor imagery (see the section 'Motor programmes and motor imagery' below).

Although excitation and resonance are separated here for the reasons given above, this must not be understood as disregarding the information in, and the schemata emerging from, the acoustic substrate. As has been shown (Leman, 1995), features emerging from the continuous signal can on a more long-term basis lead to the formation of schemata based on principles of self-organization, schemata which in turn exert influence on the perceived signal. Rather, it is plausible that there is a convergence of different schemata in music perception, as is actually one of the main conclusions of auditory scene analysis (Bregman, 1990). In our context here, there will probably be a cooperation between 'signal intrinsic' and 'sound-producing' schemata, in the sense that the signal intrinsic schemata (based on auditory modeling) are at work in the resonance part of the conceptual model given above, and the sound-producing schemata are at work in the excitation part of the model, i.e. a dynamical process of mutual influence.

Resonance

We are often capable of an immediate and fairly accurate identification of sound sources in our everyday environment (see Handel [1995] for a review of some relevant studies). This knowledge of sound sources will in most cases also comprise knowledge of the materials involved (e.g. wood, metal, glass, etc.) and other features (e.g. large, small, thin, long, hollow, dense, etc.), and could be understood as knowledge about the reactions of objects in our world to our actions, and, in the case of musical imagery, as simulations of the reactions of imagined objects to our imagined actions.

This is a kind of 'tacit' knowledge about physical properties which in acoustical terms could be expressed as patterns of energy dissipation, specifically with regard to the distribution of frequencies in the spectrum (e.g. degree of harmonicity or inharmonicity, noise components, etc.) and the dynamics of this (e.g. patterns of fluctuation of harmonic content, patterns of damping or decay and frequency loss, etc.). For most listeners (except those who are trained in diagnosing the acoustic correlates of sound), this kind of knowledge is conceptualized quite simply as knowledge of objects in our environment, something which is again reflected in a number of metaphors used to characterize sounds, such as 'wooden', 'metallic', 'hollow', etc. Furthermore, it would be reasonable to assume that there are also bodily correlates of resonances in our vocal apparatus, meaning the experience of producing sound with our vocal cords (as well as unvoiced sounds), sound which is transformed by altering the shape of the vocal tract. Metaphors for vocalic qualities, such as 'open', 'narrow', 'wide', etc., attest to this (Slawson, 1985), and we may also find a number of other metaphors which characterize sound as a whole, but which in acoustical terms could be understood as belonging to the resonant features of objects set in motion, such as 'sharp', 'hard', 'soft', etc., and often borrowed from the visual domain, such as 'bright', 'dark', 'slim', 'fat', etc.
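As a rough acoustic illustration of such 'materials images' (my own sketch, not from the text): varying just two parameters, the stretching of partial frequencies (inharmonicity) and the decay time of the energy envelope, is enough to move a synthetic struck tone between something 'wooden' (harmonic, heavily damped) and something 'metallic' (stretched partials, long ring). The partial-stretching formula follows the common stiff-string approximation; all parameter values are illustrative.

```python
import math

SR = 8000  # a low sample rate keeps the sketch fast

def struck_tone(f0, inharmonicity, decay_time, dur=1.0, partials=6):
    """Sketch of 'materials images' in acoustic terms: partial frequencies
    are stretched by an inharmonicity factor (metal-like when > 0), and
    energy dissipates with a per-material decay time in seconds."""
    freqs = [f0 * k * math.sqrt(1 + inharmonicity * k * k)
             for k in range(1, partials + 1)]
    out = []
    for i in range(int(SR * dur)):
        t = i / SR
        env = math.exp(-t / decay_time)  # energy dissipation pattern
        out.append(env * sum(math.sin(2 * math.pi * f * t) for f in freqs) / partials)
    return out

wooden = struck_tone(220, 0.0, 0.05)     # harmonic, heavily damped
metallic = struck_tone(220, 0.002, 0.8)  # stretched partials, long ring
```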

Knowledge of resonant features often also includes notions of what we could call 'source coherence', meaning the identification of a source across variant instances of excitation, e.g. recognizing the instrument 'piano' played in its entire dynamical range (softest to loudest) and entire register (lowest to highest). It is generally agreed upon today that such notions of source coherence across the entire range of dynamics, pitch and mode of playing depend upon a combination of acoustic invariants and learning (Rossing, 1982),⁴ and it seems fair to assume that these notions of source coherence also apply to musical imagery, allowing for a large range of variation in excitation.

Excitation

From the cross-modal model presented above, it follows that the excitation of sound has both visual components (what the sound-producing actions look like) and motor components (what the actions feel like, i.e. velocity, effort, distance travelled, etc.). As was the case for resonance features, we encounter also here what could be labelled a kind of 'ecological physics', in the sense that physical principles play a fundamental role but are embodied in the sensations of movement. One simple example of this is the so-called mass-spring model of movement (Rosenbaum, 1991), meaning that a limb which is extended by muscular tension will tend to move back to an equilibrium position once the muscles are relaxed, just as a spring which is bent will move back into equilibrium once it is no longer held in a bent position. This image is applicable to several sound-producing actions, such as the playing of percussion instruments, the plucking of strings, the playing of keyboards, etc. Another image of effort is that presented by more continuous modes of excitation, such as blowing a wind or brass instrument or singing; however, there is still a shifting between muscular tension and relaxation involved here.
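The mass-spring image can be given a simple numerical form. The sketch below (mine, with illustrative parameter values) integrates a damped mass-spring released from a displaced position: the simulated 'limb' overshoots the equilibrium and oscillates back to rest, much like a plucking or striking hand.

```python
def mass_spring(x0, k=50.0, c=4.0, m=1.0, dt=0.001, steps=3000):
    """Simulate a damped mass-spring returning to equilibrium (x = 0)
    after being 'held' at displacement x0 and then released.
    Semi-implicit Euler integration; k = stiffness, c = damping, m = mass."""
    x, v = x0, 0.0
    path = [x]
    for _ in range(steps):
        a = (-k * x - c * v) / m  # restoring force plus damping
        v += a * dt
        x += v * dt
        path.append(x)
    return path

path = mass_spring(1.0)  # release the 'limb' from displacement 1.0
```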

I have elsewhere (Godøy, 1999) tried to give a sketch of how we could classify sound-producing actions, and I will here only briefly recapitulate that we could divide sound-producing actions into two main groups: ballistic and sustained. Ballistic sound-producing actions consist of a very short exertion phase followed immediately by a relaxation phase,⁵ such as in hitting and plucking various instruments. Sustained sound-producing actions consist of a more continuous effort from the onset of the sound until the end of the sound, such as in blowing or bowing. (There are however some intermediate types of sound-producing actions where there are multiple excitations in one gesture, such as in stroking maracas or tambourines.)

Both ballistic and sustained types of actions may appear singly or may be concatenated into longer chains of more composite actions. However, actions may also be subsumed into hierarchies where several small-scale actions are merged into a higher-level gesture, such as in the playing of a scale on a keyboard, where a number of smaller finger movements are subsumed into the gesture of the entire hand and arm (and even torso) moving along the keyboard. In such cases of strong subsumption, we may speak of coarticulation, meaning that we have several different action trajectories (with different velocities and different axes of movement) at once, but all fused into one action unit, causing the hierarchy of several movements to disappear in favour of this higher-level action unit. Used in phonology to denote the anticipatory shaping of the vocal apparatus for the next, not yet pronounced sound (Rosenbaum, 1991), and used in studies of human motor control to denote anticipatory actions performed while executing another action, such as the flexing of the fingers while still extending the arm at the elbow to pick up some object (Rosenbaum et al., 1995), the application of coarticulation to sound-production seems quite obvious.

Sound-producing actions may of course be combined in highly elaborate patterns, as in complex instrumental textures for solo instruments or orchestral textures, where there are often several different simultaneous layers of action, constituting rather complex choreographic images. All this relates to the previously mentioned idea of a motor theory of perception and cognition, where the basic belief is that there is an imitative component in all perception of movement, even of rather complex scenes which we may not be able to reproduce accurately, but which we may still remember as vague images of effort and movement.

Motor programmes and motor imagery

As an extension of the idea that images of sound-production are integral to images of musical sound, it could be useful to briefly consider some points about motor programmes, meaning the planning or 'script' of actions, such as for picking up a pencil, kicking a pebble, going for a walk or performing a piece of music (Rosenbaum, 1991), as well as motor imagery, meaning imagining actions without actually executing them (Jeannerod, 1995):

• Even though motor programmes denote what is to be done and in what sequence, there will in many cases be alternative ways of doing something, such as opening a door with my foot when I am carrying something in both hands. This substitution of one action by another which achieves more or less the same result is called motor equivalence, and is quite convenient in our context here because it implies the possibility of variability yet repeatability and relative stability across several different variants. This means that a sequence of musical sounds, such as a melody, a rhythmical pattern, or a textural fragment, may have several variant guises, i.e. be imagined performed in various ways or transferred to different instrumental and/or vocal settings, yet preserve some overall features across the various variants. This is in fact in accordance with a basic principle of categorization, in that similar yet variant guises of an action can constitute a category, allowing for the constitution of prototypical images (Rosch et al., 1976). If we consider the case of hitting a bass drum with a soft mallet, this action will first of all belong to the category of sound-production on membranes hit with a mallet, but furthermore belong to the larger category of hitting something, which would include most percussion and keyboard instruments as well. For this reason we could speak of a generative capacity due to motor equivalence, which could be very useful in exploring and differentiating images of musical sound, as well as enabling a continuum between rather vague (low resolution) and more exact (high resolution) images of musical sound in our minds.


• Sound-producing actions are in a way 'simple', or have the appearance of being a unit, such as in the case of a single ballistic movement (e.g., hitting a tamtam), yet the resultant sound may be quite complex (e.g., complex patterns of transients, a complex inharmonic spectrum, noise components, etc.), and the neural dynamics assumed to be at the base of the hitting movement may be quite complex as well. In this way, sound-producing actions could be regarded as focal points, as at one and the same time both complex and simple. This kind of 'translation' from complex sound to more simple action images is reminiscent of G. A. Miller's idea of 'recoding' as presented in his classic paper from 1956 (Miller, 1956). This fits well with the fundamental idea of motor theory as presented above, meaning that there is a motor-mimetic component in perception which tries to depict the sound-producing action as part of the musical sound, using it as a memory trace for that particular musical sound. Such an idea of 'recoding', or even 'compression', in musical imagery has several attractive features, first of all that of facilitating the memorization and recall of musical sound. It is tempting to speak of a kind of 'triple coding'⁶ here, including the visual and motor components of action together with the auditory component.

• This recoding into action images also has the advantage of allowing for a dynamic of imagery, specifically in the sense of compressed, 'fast forward' types of images, as well as images in random order, looped, in slow motion, etc. This could perhaps shed some light on the enigmatic phenomenon of chunking (another main point in Miller's 1956 paper) of musical sound, or on the cumulative, compressed, 'instantaneous' overview images of musical sound, such as those we have after listening to even fairly long pieces of music. Macroscopic, chunked, overview images, or vague and approximate recollections of long sequences of musical sound, are comparable to the compressed versions of movies ('trailers') used in advertising, where in the course of a 30-second clip we are exposed to a dense collage of salient scenes from the movie (often the most violent and/or passionate scenes), so as to transmit a general image of a two-hour movie in the course of that clip. There is actually a similar idea of a point-by-point presentation of salient actions exploited both in cartoons and in the production of animated cartoons. This is known as 'key framing' (Rosenbaum et al., 1995), meaning that it is most economical from a production point of view to start out with a number of significant postures or moments of 'frozen' action, and then interpolate the continuous trajectories between these postures.

• Needless to say, motor imagery offers a number of advantages in terms of memory for music (Mikumo, 1998) and for mental practice as well, although this field also remains to be more systematically explored in music. Following the idea of 'simulations' as the essential component of cognition and imagery, the more systematic application of motor imagery should have priming effects on other tasks of music cognition by 'making present' a large amount of material for consideration, or for an 'analysis by synthesis' approach to composition, arranging and instrumentation. The close link between motor programmes and musical creativity has been remarkably described in Sudnow's introspective study of jazz improvisation (Sudnow, 1978), but again this is something which should be more systematically explored.
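The 'key framing' idea mentioned above can also be sketched computationally: given a few 'frozen' postures, the continuous trajectory is interpolated between them. The following is my own minimal illustration (the two-parameter 'posture' and all values are hypothetical), not from the chapter:

```python
def interpolate_keyframes(keys, steps):
    """Linearly interpolate a continuous trajectory between 'key' postures,
    each posture given as a tuple of posture parameters."""
    path = []
    for a, b in zip(keys, keys[1:]):
        for s in range(steps):
            t = s / steps
            path.append(tuple(p + t * (q - p) for p, q in zip(a, b)))
    path.append(keys[-1])  # end exactly on the final posture
    return path

# three 'frozen' postures of a mallet stroke: raised, impact, raised again
keys = [(0.0, 1.0), (0.5, 0.0), (0.0, 1.0)]
traj = interpolate_keyframes(keys, 10)
```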


Further research and conclusions

Although I have tried to present a number of arguments in this paper for a close link between images of sound-production and images of musical sound, and for the mutual triggering of images of sound and of sound-producing action, this is of course something which must be explored extensively in order to gain more credibility. As far as I can see, this would have to be done in several different domains; briefly, the most important ones seem to be:

• Exploring motor-mimetic phenomena as a basis for cognition, as remarkably suggested on a rather introspective basis by Merleau-Ponty, and more recently systematically explored by researchers like Meltzoff, Stein and Meredith, Berthoz, etc.

• More studies on the neurological and cognitive workings of cross-modality in imagery.

• More studies of motor control in sound-production, in particular of the various guises of coarticulation, as this seems to be quite close to the context-dependent images of real musical sound.

• Experimental explorations of the effects of imagined action on musical imagery in terms of priming, mental practice (performance), composition, arranging, orchestration, etc.

• Exploring the interactions between schemata based on auditory modeling and schemata based on images of sound-production.

• Better visualizations of the physical properties of actions as well as of resonance.

Needless to say, the scope of unanswered questions here is very great. Still, I believe we have for the moment reasonable grounds for claiming that there are significant advantages in cultivating motor imagery within musical imagery, in particular as a tangible and dynamic means of evoking rich, multidimensional images of musical sound. 'Structure' is a much used (and abused) word in musical analysis and music theory, and motor imagery could serve as a kind of 'deep structure' for musical sound, offering a healthy antidote to unfortunate abstractions in musical thought. This does of course pose profound challenges to our notions of knowledge, as we would have to learn how to handle and represent knowledge of our embodied 'feel' of movement.

Notes

1. It should be noted that in Schaeffer's phenomenological explorations of musical sound, the source of the sound was to be ignored in favour of the 'internal' features of the sound (ordered in a multidimensional 'typo-morphological' matrix). However, as a point of method, guiding our explorations of musical sound by a progressive differentiation of features in our images of musical objects, this exploration of excitation and resonance that I am suggesting here has a fundamental resemblance to Schaeffer's approach.

2. Separating these two components can have advantages not just for the exploration and enhancement of musical imagery, but for other domains of musical thinking as well. In particular, this separation of excitation and resonance could be useful for the study of orchestration or instrumentation, meaning both the components of texture and timbre. Considering the various possible variant resonances separately from the excitatory actions is informative as to the choice of the sustained quality of sounds, and considering the excitations separately will enable an attentional focus on the more textural qualities of orchestration. Conceptually, we could understand this as a continuum between a textural pole of onsets and a resonant pole of more or less stable spectral shapes.

3. In the case of vocal sound, the resonance apparatus is of course not just a 'passive' component. Also, there will in many cases be a feedback from the resonance to the excitation, e.g. in piano sounds, where the energy in the resonance may partially dissipate back into exciting the string, hence introducing a third component here. However, from an embodied, ecological point of view, this distinction between excitation and resonance is valid enough as a schema for understanding images of musical sound.

4. Applications of physical models in digital synthesis provide a means for simulating and exploring source coherence, by varying the parameters of excitation (e.g. the velocity of a mallet striking a steel plate) while retaining the basic mode of excitation (mallet striking steel plate), or by keeping the excitation constant and varying the size or other qualities of the resonating body (e.g. the size of the steel plate). (See Freed [1990] for relevant research here.) Interestingly, such variations of excitation and resonance parameters seem, within certain limits, to produce an auditory image with source coherence across several variants.

5. The term 'ballistic' is used in the human movement literature (Rosenbaum, 1991), denoting specifically discontinuous exertion in the form of a 'spike' of effort immediately followed by relaxation. For instance, in the case of percussion instruments, the excitation is in the form of an impact caused by a ballistic movement, and this impact is followed by a relaxation phase (e.g., the hand going back to its initial position) concurrent with an energy dissipation phase in the excited body.

6. In allusion to Allan Paivio's idea of dual coding, i.e. both pictorial and verbal (Paivio, 1986).

References

Berthoz, A. (1997). Le sens du mouvement. Paris: Odile Jacob.

Bregman, A. S. (1990). Auditory Scene Analysis. Cambridge, Mass. & London: The MIT Press.

Cadoz, C. (1991). Timbre et causalité. In J.-B. Barrière (Ed.), Le timbre, métaphore pour la composition (pp. 17-46). Paris: I.R.C.A.M./Christian Bourgois.

Calvert, G. A., Brammer, M. J., & Iversen, S. D. (1998). Crossmodal identification. Trends in Cognitive Sciences, 2(7), 247-253.

Freed, D. J. (1990). Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. Journal of the Acoustical Society of America, 87, 311-322.

Godøy, R. I. (1997). Knowledge in Music Theory by Shapes of Musical Objects and Sound-Producing Actions. In M. Leman (Ed.), Music, Gestalt, and Computing (pp. 106-110). Berlin: Springer Verlag.

Godøy, R. I. (1999). Cross-modality and conceptual shapes and spaces in music theory. In I. Zannos (Ed.), Music and Signs (pp. 85-98). Bratislava: ASCO Art & Science.

Haberlandt, K. (1994). Cognitive Psychology. Needham Heights, Mass.: Allyn and Bacon.

Handel, S. (1995). Timbre Perception and Auditory Object Identification. In B. C. Moore (Ed.), Hearing (pp. 425-461). San Diego: Academic Press.

Harnad, S. (Ed.) (1987). Categorical Perception. Cambridge: Cambridge University Press.

Jeannerod, M. (1995). Mental Imagery in the Motor Context. Neuropsychologia, 33(11), 1419-1432.

Johnson, M. (1987). The Body in the Mind. Chicago: The University of Chicago Press.

Kosslyn, S. M. (1994). Image and Brain. Cambridge, Mass.: The MIT Press.

Leman, M. (1995). Music and schema theory: Cognitive foundations of systematic musicology. Berlin, Heidelberg: Springer-Verlag.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition, 21, 1-36.

McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago Press.


250 IMAGINED ACTION, EXCITATION, AND RESONANCE

Mikumo, M. (1998). Encoding strategies for pitch information. Japanese Psychological Monographs No. 27. (The Japanese Psychological Association.)

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

Paivio, A. (1986). Mental representations: A dual coding approach. New York: Oxford University Press.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic Objects in Natural Categories. Cognitive Psychology, 8, 382-436.

Rosenbaum, D. A. (1991). Human Motor Control. San Diego: Academic Press, Inc.

Rosenbaum, D. A., Loukopoulos, L. D., Meulenbroek, R. G. J., Vaughan, J., & Engelbrecht, S. E. (1995). Planning Reaches by Evaluating Stored Postures. Psychological Review, 102(1), 28-47.

Rosenthal, D. F., & Okuno, H. G. (Eds.) (1998). Computational Auditory Scene Analysis. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Rossing, T. (1982). The science of sound. Menlo Park, CA and London: Addison-Wesley.

Schaeffer, P. (1966). Traité des objets musicaux. Paris: Éditions du Seuil.

Shepard, R. N. (1984). Ecological constraints on internal representation: Resonant kinematics of perceiving, imagining, thinking, and dreaming. Psychological Review, 91(4), 417-447.

Slawson, W. (1985). Sound Color. Berkeley/Los Angeles/London: University of California Press.

Smith, J. D., Reisberg, D., & Wilson, M. (1992). Subvocalization and Auditory Imagery: Interactions Between the Inner Ear and Inner Voice. In D. Reisberg (Ed.), Auditory Imagery (pp. 95-119). Hillsdale, New Jersey: Lawrence Erlbaum Associates.

Stein, B. E., & Meredith, M. A. (1993). The Merging of the Senses. Cambridge, Mass.: The MIT Press.

Sudnow, D. (1978). Ways of the Hand. Cambridge, Mass.: Harvard University Press.

Zatorre, R. J., Halpern, A. R., Perry, D. W., Meyer, E., & Evans, A. C. (1996). Hearing in the Mind's Ear: A PET Investigation of Musical Imagery and Perception. Journal of Cognitive Neuroscience, 8(1), 29-46.


14

The Keyboard as Basis for Imagery of Pitch Relations

James M. Baker

Having taught music theory and musicianship for over twenty-five years, I am convinced that many musicians image music in terms of the instruments they play. In giving melodic dictation, for example, I have seen a cellist raise her left hand and locate a problematic interval on an imaginary cello fingerboard in order to identify it. Guitarists are likely to associate heard chords with particular finger configurations on their instrument. Since in these situations an auditory stimulus is present, such behavior might not be considered in the strictest sense to involve musical imagery, which has been defined as 'our mental capacity for imagining musical sound in the absence of a directly audible sound source'.1 Nonetheless, the fact that students solving dictation problems frequently seek to gauge intervallic distance and other pitch information in motor-tactile terms is indicative of the multi-modal nature of musical imagery. Musical cognition in trained musicians entails not only auditory perception but also aspects of performance.2 Musical imagery thus derives from a complex web of auditory, visual, tactile, motor, and other experiences, from which no single type of stimulus or mode of perception can be easily extricated. For those of us who play an instrument, our training has a profound impact on the ways we both perceive and conceive musical structure.

Of all the standard instruments of western music, the keyboard is by far the best suited for imaging the complex tonal relations of western art music. Since the beginning of the Baroque era, with the introduction of the basso continuo and the invention of the figured bass as a shorthand notation for that practice, the keyboard has served as


252 THE KEYBOARD AS BASIS FOR IMAGERY OF PITCH RELATIONS

the basis for teaching and learning composition and theory. Today, a full four centuries later, when the guitar can perhaps claim equal status with the keyboard as a means for learning to play and compose, the keyboard retains its preeminence in the academy. Its chief advantage, of course, is that it makes it possible for a single musician to perform the essential harmony and voice leading of a musical work, and - in the case of the synthesizer - to simulate the actual timbres of a work selected from the full spectrum of orchestral colors.

The importance of the keyboard goes beyond being a useful tool, however. It could be argued that the existence of the keyboard in the Renaissance made possible and even necessitated the development of the science of harmony. Because it has been the means by which most musicians and composers in the western tradition since the seventeenth century have attained mastery of musicianship skills and theoretical concepts, the design of the keyboard has influenced and, to a certain extent, may have even determined the ways that trained musicians conceive musical structure, both in terms of local features of harmony and voice leading and of overall design.

The following features and capabilities of the keyboard seem particularly significant regarding the imaging of pitch relations:

1. The irregular arrangement of white and black keys for the chromatic scale within the octave, together with a duplication of that pattern for each of the octaves (over seven full octaves on the modern piano), allows for instant visual-tactile association of any key with its sounding pitch, and for speedy identification of all octave equivalents. Transposing material up or down an octave or multiple octaves is highly idiomatic. The octave lies comfortably within reach for the average hand.3

2. Movement from left to right across the keyboard is linked with notation of pitches in musical score from bottom to top, in association with movement in pitch register from low to high. This arrangement of the keyboard appears to reflect a right-handed orientation, in which the left hand is ordinarily associated with chordal foundation and accompanimental figuration, while the right hand is assigned the main melodic ideas and their elaborations.4

3. The bilateral symmetry of the body is exploited in the allocation of musical voices, with more important outer parts (soprano and bass) projected by the outer portions of the hands. The clumsier thumbs usually take up less important inner-voice material. A pianist learns to weight the motion of arms and hands toward the outer fingers to bring out the bass and create a singing melody.

4. The keyboardist practices all types of chordal and melodic configurations, as well as combinations of motions between the hands: similar and parallel motions, and hands in opposition. The strong physical sensations associated with these configurations and motions are distinct and easily recognized.

5. Seated comfortably at the keyboard, the pianist covers the registers associated with fundamental structure (in the Schenkerian sense of the Ursatz) merely by extending the hands forward in relaxed fashion. Registral extremes are literally a reach for the hands and arms in each direction.

My thesis that the keyboard is a basis for auditory imagery is supported by theories of cognition recently developed by Mark Johnson, Gerald M. Edelman, and others. For


JAMES M. BAKER 253

Johnson, 'meaning is always a matter of human understanding, which constitutes our experience of a common world that we can make some sense of.... And understanding involves image schemata and their metaphorical projections...' (1987, p. 174). Johnson regards image schemata as embodied structures that give 'general form to our understanding in terms of structures such as CONTAINER, PATH, CYCLE, LINK, BALANCE, etc. This is the level that defines form itself, and allows us to make sense of the relations among diverse experiences' (1987, p. 208). Edelman postulates a preverbal conceptual ability involving 'coordination of the simultaneous activity of those brain regions mediating the sense of joint movement, weight, touch, hearing, vision, and smell. The shape and feel of the body as it moves and interacts with the environment play key roles in the building up of a sense of space and of the possibilities of action' (1995, p. 41). I would maintain that for musicians who learn music through the keyboard, responses to auditory stimuli are mapped in the brain at the same time that responses to the visual, tactile, and motor stimuli involved in playing those sounds are mapped. Accordingly, for these musicians the image of a sound is a complex construction integrating maps of responses to all sorts of stimuli, including those resulting from the physical motions of seeing and playing the keys.

I want to make it clear that I approach this topic as a music theorist and musician and claim no particular expertise in music cognition. I have, however, come across a number of recent experimental studies in music perception which would appear to bear out my contention that the visual and motor aspects of keyboard playing are implicated in musical imaging. The work of Bangert, Parlitz, and Altenmüller - presented at the 1999 Conference on Musical Imagery in Oslo - offers perhaps the most significant experimental corroboration to date. They show that, for pianists with as little as five weeks of keyboard training, cortical structures for auditory and sensorimotor aspects of keyboard performance are always activated together, regardless of whether the musician is actually playing or simply listening. They speculate that activation of right frontobasal regions may play a crucial role in keyboard imaging in expert musicians.

A number of earlier experiments demonstrate the complexity of the auditory image. Crowder and Pitt (1992) report that subjects more readily recognize two successive instrumental tones as having the same pitch if they also have the same timbre, and that recognition of pitch is speedier if the subject has already imaged the timbre of the tone. In experiments by Intons-Peterson (1992), subjects asked to generate an auditory image of a familiar event (e.g., glass breaking) also generated a visual one, with the visual image usually produced first.

Sloboda (1984) has shown that music reading is 'a real species of, and window onto, music perception'. Musical knowledge comes into play in sightreading, because good sightreaders are able to provide expressive interpretation of the music they are reading, since they apparently grasp superordinate structures and are able to plan finger movements before they actually play the notes. Sloboda's work shows sightreading to be a complex multimodal activity involving genuine analysis of musical structure and entailing a strong motor component.

Guiard (1989) had skilled pianists simultaneously sing the pitches and pitch names of the right- or left-hand parts of the music as they played. Singing the right-hand part was judged easy, whereas singing the left-hand part was more difficult. Manual performance errors were few in comparison to vocal errors, and it proved easier to


accompany the voice with the hands than to do the reverse. This work again demonstrates a strong hand-finger motor component in pianists' mapping of musical imagery, while for them motor elements associated with vocalization do not appear so strongly implicated.

Takeuchi and Hulse (1991) found that individuals with absolute pitch identified black-key auditory pitches and visual pitch names more slowly than white-key pitches and pitch names, while subjects without absolute pitch showed no difference. These results seem to indicate that the early musical experiences of those subjects with absolute pitch involved the keyboard, with white keys encoded before black, as is the case in traditional piano pedagogy. Since black keys are identified in terms of adjacent white keys, the results might have to do more with names than keyboard imaging. However, it seems likely that, for those who first learned music through the keyboard, the aural-visual-tactile encoding of that medium is the source of the phenomenon. This research strongly supports the hypothesis that early instrumental training profoundly impacts musical imaging, and further investigation in this area is surely warranted.

Mikumo (1994) investigated memory for melodies by testing trained musicians using various strategies: verbal (naming pitches), auditory (retaining in memory by humming or singing, vocally or subvocally), visual (imaging contours), and motor (by tapping fingers as if playing the notes of the melody on a keyboard). Finger movement proved to be the most effective strategy for encoding, especially as the time interval for retention increased. A subsequent experiment by Mikumo (Experiment No. 7, 1998) confirmed that the tapping strategy is effective for melody retention, even in interference conditions.

In another experiment (No. 6, 1998), Mikumo tested the basis for melody retention using twelve combinations of non-motor interference conditions. This experiment showed that skilled musicians often encode pitch information using two or three codes at the same time, with naming notes, visualizing staff notation, and mental rehearsal of pitches being the most effective for retaining tonal melodies. For atonal melodies, a combination of pitch rehearsal and visualizing melodic contour worked best. Mikumo's experiments 8 and 9 (1998) tested visual tracking as an aspect of melodic retention. Significantly, Mikumo found that subjects highly trained in playing an instrument 'precisely tracked their internal spatial representations in the compatible spatial directions related to the motor system used in playing the instrument' (1998, p. 121).

The considerable literature on subvocalization in speech imagery is particularly relevant to the question of the keyboard as a basis for musical imagery, for speech imaging seems to require motor input from various sources: vocal cords, tongue, lips, positioning of teeth, etc. Smith (1992) reports that auditory hallucinations in schizophrenics entail subvocalization. Smith, Reisberg, and Wilson (1992) postulate that actual motor performance may not be required for imaging, but rather that 'an unrealized motor plan ... somehow produces a representation which can be interpreted by mechanisms of ordinary hearing'. We could likewise postulate that keyboard imaging need involve only a visual-motor plan, even though actual movement is implicated often enough in musical imaging.

Before proceeding to my musical analyses, I would like to issue a small caveat regarding experimental research in musical imaging. Studies such as those just cited


Figure 1. Beethoven, Piano Sonata in C minor, Op. 13, i, mm. 221-254. Copyright 1952 by G. Henle Verlag, München. Used by permission.

provide valuable information on the ways in which a broad range of human subjects, from unskilled listeners to expert musicians, perceive various aspects of music. What these studies cannot tell us, however, is how our most gifted composers heard their music. We should not too quickly assume, because of limits on perception demonstrated by experiments on us mere mortals, that Beethoven or Bach were limited in the same ways. Just because ordinary subjects cannot identify the two tonal components of the bi-tonal Petrushka chord (Krumhansl and Schmuckler, 1986), we cannot infer that Stravinsky was incapable of doing so. Just because the abstruse relations of atonal and serial music are inaccessible to most listeners, we should not infer that Schoenberg was composing out relations beyond his own hearing abilities. Of course, we shall never know whether the great composers of the past were capable of perceiving aurally the full range of complex pitch relations disclosed through the analytical techniques now available to us. It may be that in a test of purely aural perception they would have exhibited limitations not dissimilar from those of other skilled musicians.


The real-time constraints for purely aural perception may be more severe for all humans (even our greatest composers) than for the other various cognitive components of musical imagery. Certain composers may even have relied heavily on the non-aural aspects of musical imagery in creating their compositions. In analyzing music, I take the composition to be the product of the composer's art, and I believe that it is a valid pursuit to try to discover by whatever means available the structural relations which underlie that composition. I personally cannot accept the findings of empirical investigations as a necessary constraint on the types of relations I investigate, for in so doing I would be making potentially false assumptions regarding the capabilities of the composer.

That the keyboard informs our conception of musical space can be illustrated by a familiar example from the first movement of Beethoven's Pathétique Sonata (Fig. 1 on the preceding page). In this excerpt the accompaniment is given to the left hand, whose position remains throughout in a narrow register in the vicinity of middle C until m. 253. The right-hand part, however, is split between two more extreme registers, entailing considerable crossing back and forth over the left hand. This hand crossing is highly idiomatic for the keyboard, since the right hand is positioned similarly in both registers, supplying strong visual-tactile patterning. For example, in mm. 221 through 231, the thumb covers C2 or C5, and the index finger F3 or F5.5 The relatively symmetrical apportionment of musical space, focusing on the octaves below and above middle C, is evident not only in the visual-spatial layout of the keyboard, but especially in the athletic physical motion of the right hand crossing back and forth at similar distances over the left. Note that the symmetry of octaves above and below middle C is restored as the hands resume normal playing positions in m. 253. It may be going too far to say that a full appreciation of the musical structure of the passage demands that the listener know how it feels to play it. But it would seem that the physical act of manipulating the keyboard was a determining factor in Beethoven's control of musical space in this excerpt.

The second movement of Webern's Piano Variations, Op. 27, offers another example of symmetrical apportionment of musical space coordinated with a very physical exploitation of the body's bilateral symmetry. As is well known, the pitch structure of this work entails fixed dyadic associations, symmetrical about a single pitch, the A above middle C. For purposes of this discussion, I shall focus on one dyad, B♮-G♮, and show how it is treated in motor performance. I choose this dyad because it is involved in the registral extremes of the movement and in the most conspicuous event at the center of the piece. The score for this well-known piece is given in Figure 2a (p. 257), and 2b provides specifics on the eight compositional settings of the B-G dyad. In m. 2, the two pitches occur in extreme registers (in fact, these are the highest and lowest pitches of the movement) as grace-notes, with symmetrical fingerings and wrists turning inward, with left hand preceding right. The same pitches occur in m. 5 as staccato eighths, with right preceding left. In m. 8, the pitches occur in less extreme registers, in conjunction with the same figuration as in m. 2 (l.h.-r.h.), but with B and G as main notes, each decorated by grace-notes. (Here, interestingly, these pitches are in the same slurred pairings as in m. 2, but in opposite hands.) The very next event in m. 8 presents B and G in 'closed position' (both within the octave above middle C), as the interior elements of fortissimo chords, each played with thumbs, in the right


Figure 2. (a) Webern, Piano Variations, Op. 27, ii (score). (b) Treatment of the B-G dyad in Webern's Op. 27, ii (table listing, for each of the eight presentations, the pitch, register, hand, and fingering of each dyad note). Copyright 1937 (renewed) Universal Edition. All rights reserved. Used by permission of European American Music Distributors LLC, Sole Agent for Universal Edition.


hand, then the left. In the first half of the piece, then, the four presentations entail the following reciprocating pairings of orderings in the hands: l-r, r-l, l-r, r-l. In each case, the higher register note is taken by the right hand.

In the second half of the piece, in the spirit of a variations movement, musical space is assigned very differently to the hands. This is immediately evident in the wild hand crossing of mm. 12-13, where the extreme pitches of the piece are played as four staccato eighths in direct succession, coordinated as follows: low-high-high-low, played l-r-l-r. The final two presentations of the B-G dyad occur in mm. 19-20, again as four eighths in direct succession. Here both dyads are in closed position, the first pair as the middle elements of fortissimo trichords (both hands similarly fingered), the second as a simple two-note slurred grouping. Register and hands are coordinated in exact reciprocity to the preceding grouping in m. 12: high-low-low-high, played r-l-r-l.6

Overall, the eight presentations of the B-G dyad seem to segment into four two-pair groupings, on the basis of alternating registers, hand allocations, and pitch allocations as well. Three of the four pairs entail the sequence B-G-G-B, with only the second grouping inversely as G-B-B-G. Groupings 1 and 3, the initial groupings of each half of the piece, entail solely the B2-G6 dyad. The last presentation in each half involves the B4-G4 dyad. The G3-B5 dyad occurs only once, at the beginning of grouping 2, perhaps as a transitional event in the registration. We have examined the treatment of only one of six dyads in the piece, and one can imagine that others occur with similar patterned variations. (Only the axis pitch, A4, is treated consistently throughout, in auditory reinforcement of its structural status.) What is patently clear is that the process of variation at the core of the work includes the bodily realization of musical relations. It would be a huge violation of the structure of the work to reconfigure the hand allocations to make the piece easier to play. A full appreciation of this music demands not only the aural recognition of pitches, but also a sense of reciprocal motions stretching through musical space - gestures that can only be accomplished through the choreography prescribed by Webern's notation.
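The inversional symmetry underlying these dyads can be checked arithmetically. In MIDI numbering (an assumption of this sketch, not notation used in the chapter: C4 = 60, so A4 = 69), every dyad symmetrical about A4 satisfies p + q = 138, and the partner of any pitch p is simply 138 - p:

```python
NOTE = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}

def midi(name):
    """Natural note name plus octave -> MIDI number, e.g. 'B2' -> 47."""
    return NOTE[name[0]] + 12 * (int(name[1:]) + 1)

AXIS = midi('A4')  # 69: the single invariant axis pitch of the movement

def partner(p):
    """Inversional partner about the axis: p -> 2*AXIS - p."""
    return 2 * AXIS - p

# The three registral forms of the B-G dyad discussed above are all
# symmetrical about A4:
dyads = [('B2', 'G6'), ('G3', 'B5'), ('B4', 'G4')]
symmetric = all(partner(midi(a)) == midi(b) for a, b in dyads)
```

The same check applies to the movement's other five dyads; only A4 itself is its own partner, which is why it alone is treated consistently throughout.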

To continue this investigation of musical imagery via the keyboard, I want to focus on the keyboardist's image of the voice exchange. As defined by Heinrich Schenker, the voice exchange is a procedure in which a pair of voices, usually soprano and bass, switch elements. Such a maneuver is shown in Figure 3 on the next page (Bach, Chorale No. 3), where in m. 1 bass and soprano exchange pitches, with A3 in the bass moving to A4 in the soprano, while soprano C5 is transferred to C4 in the bass. The reciprocal motions of descending third from C to A in the soprano and ascending third from A to C in the bass (each filled in with a passing-note B harmonized as vii6) take place in conjunction with a prolongation of the A-minor tonic triad. In order to image the voice-exchange relation, one must first be aware of the outer voices interacting. An experienced listener can learn to recognize local-level voice exchanges on the basis of pitch recognition alone. In performance, an individual singer might be less aware of the other parts and thus would probably not grasp this feature. A student unskilled in score reading would not spot it on the basis of notation alone. A skilled score reader might notice the connection on the basis of replication of pitch names (A and C) in two registers or the auditory associations with the notated pitches. A keyboardist would be most likely to grasp the relation, not only on the basis of these features, but also

Figure 3. Bach, Chorale No. 3 (score of m. 1, with the voice exchange annotated and the motor stimuli indicated: wrists and fingers, left hand 5-4-3, right hand 3-4-5).

on the basis of visual observation of the A-C thirds in adjacent octaves traversed in contrary motion on the keyboard (where the layout makes the relation quite obvious), as well as the motor sensations involved in playing the progression (one senses the reciprocity in the contrary motion of the left wrist turning clockwise while the right wrist turns counterclockwise, or perhaps in the possible fingering 5-4-3 in the left hand moving against 5-4-3 in the right). The motor performance of the voice exchange at the keyboard embodies the metaphor of the voice exchange as embracing or containing - and thus prolonging - a particular harmony. The keyboardist's image of the voice exchange is thus a complex integration of neural mappings of auditory, visual, tactile, and motor stimuli. For skilled keyboardists, this image would occur regardless of whether the music is actually played or heard, since reading the music would entail developing a motor plan for performance.

Our next example is the first of Schoenberg's Six Little Pieces, Op. 19. The piece, while not tonal in any conventional way, nonetheless features a type of voice exchange as the culmination of a process involving the symmetrical apportionment of musical space. Figure 4a on page 260 shows the pitch content of the opening measures, together with a graph demonstrating symmetries about the axis dyad F♮4-F♯4. In m. 1, B♮4 and C4 are generated as perfect fourths above and below the axis. These pitches are projected outwards to B5 and C3 in m. 3, subsequently resolving inwards by step to A5 and D3. The F♮4-F♯4 dyad is likewise projected outwards to F♮5-G♭5 and F♮3-

Figure 4b juxtaposes the opening and closing measures of the piece in order to show that the piece is framed by pitch relations which approximate the enclosing effect of the tonal voice exchange. At the beginning, the upper voice features the progression from F♮4 to F♯4 supported by the verticality D♯2-E3. At the close, F♯4 moves reciprocally to F♮4 and serves as the lower part, with E-D♯ transferred to the sixth octave and played horizontally. The allocation of space involves a near-symmetry, with D♯2 sounding two octaves plus 2 half-steps below the F♮4 at the beginning, and D♯6 two

Figure 5. Beethoven, Piano Sonata in A-flat, Op. 26 (1800-01), i: (a) Mm. 1-12 (score; Andante con variazioni). Copyright 1952 by G. Henle Verlag, München. Used by permission. (b) Mm. 1-8 (analytical sketch).

octaves minus 2 half-steps above F4 at the end. The relation of these two events is highlighted to the performer by the similar 'feels' of the left-hand chord in m. 1 and the right-hand chord in m. 17 (they are fingered the same in their respective hands and involve the large stretch from D♯ to E). The sketch illustrates as well that the tetrachord in the left hand in m. 1 is turned inside-out in the right hand of m. 17, with interior and exterior intervals (marked 'x' and 'y') reversed. All of these reciprocities are most readily apparent through performance of the piece at the keyboard, in the course of which the visual and motor stimuli merge with auditory input to form complex imagery. It seems likely that the composition originated in Schoenberg's experience of such complex keyboard-based imagery, rather than through manipulation of pitch relations and other parameters dissociated from the keyboard.
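The near-symmetry of this frame is easy to verify in semitones (a sketch using standard MIDI numbers, C4 = 60; the numbering, not the analysis, is my assumption): the opening D♯2 lies 26 semitones below F♮4, the closing D♯6 lies 22 semitones above it, i.e. two octaves plus and minus two half-steps respectively.

```python
# Standard MIDI numbers (C4 = 60): F4 = 65, D#2 = 39, D#6 = 87.
F4, DS2, DS6 = 65, 39, 87

below = F4 - DS2   # interval from the opening D#2 up to F-natural 4
above = DS6 - F4   # interval from F4 up to the closing D#6

OCTAVE = 12
# Two octaves plus 2 half-steps below, two octaves minus 2 half-steps above:
near_symmetry = (below == 2 * OCTAVE + 2) and (above == 2 * OCTAVE - 2)
```

The 4-semitone discrepancy between the two intervals is what makes the frame a near-symmetry rather than an exact one.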

The expressive potential of the tonal voice exchange is amply illustrated in the theme of the first movement of Beethoven's Piano Sonata in A-flat, Op. 26, an Andante con variazioni. Figure 5a presents the score of the first eight-measure phrase, with an analytical sketch provided in Figure 5b.7 The character of this theme owes much to the voice leading of the outer parts, which entails a good deal of parallel motion in sixths and tenths on the surface, and a pair of voice exchanges at a slightly deeper level. The main tone of the upper voice, C5, is established as the goal of a voice exchange of dyads in conjunction with the pattern 6-6 10-10, as shown in the sketch. Here the A♭-C third is unfolded in ascending fashion in the upper voice, against the reciprocal motion in the bass in conjunction with a motion from I6 to root-position I. In the latter


Figure 6a. Beethoven, Piano Sonata in A-flat, Op. 110 (1821), i, mm. 1-23 (score; Moderato cantabile molto espressivo). Copyright 1952 by G. Henle Verlag, München. Used by permission.


Figure 6b. Beethoven, Piano Sonata in A-flat, Op. 110, i, mm. 1-23 (analytical sketch).

half of the eight-measure phrase (mm. 5-7), a similar 6-6 10-10 voice exchange takes the melody from C5 to E♭5 as the bass progresses from ... to I6. Melody and bass then descend to the main elements of the tonic, C5 over A♭3, before proceeding to the half cadence in m. 8. In my reading, the first four-measure phrase oscillates between tonic and tonic sixth before cadencing rather brusquely to V in m. 4. Thereafter focus on the tonic is restored weakly in conjunction with an arpeggiation through ... and I6 to I just prior to a stronger half cadence to V in m. 8.

This theme projects an ambiance that is at once courtly yet tender. The intimacy of the theme is bound together with the motion of the hands of the player performing the piece.8 The keyboard is a basis for musical imaging in this piece in another important respect as well. Beethoven's choice of the key of A-flat was probably not arbitrary. As is well known, specific keys had strong affective associations for composers in the eighteenth and nineteenth centuries. The individual qualities and colors of the keys derived from the various keyboard tuning systems in use prior to equal temperament, in which various tonalities truly sounded different, not only in register, but more important in the specific sizes of the intervals among the notes of their respective scales. These associations continued vestigially even after equal temperament had wiped out the small proportional differences in the intervals of the various keys. The key of A-flat has been traditionally associated with a nocturnal, amorous ambiance; Beethoven certainly used the key in this way in the slow movements of his Piano Sonata Op. 13 (Pathétique) and the String Quartet Op. 127. In the Op. 26 theme we have examined, the stylized slow triple meter, suggesting the dance, as well as the legato articulation, generally soft dynamic level with periodic swells, and characteristic employment of appoggiaturas (known in the Classic period as 'sighing motives') - all work together to enhance the projection of a romantic topic.

Twenty years later Beethoven turned to similar subject matter for the first movement of his Piano Sonata Op. 110, also in the key of A-flat. Music and a sketch are presented in Figures 6a and 6b. Here his evocation of intimacy is made explicit in the instructions at the outset: Moderato cantabile molto espressivo and especially the




Figure 7. Schubert, Impromptu in A-flat, Op. 142 No. 2 (Dec. 1827): (a) mm. 1-16. Copyright 1948 by G. Henle Verlag, München. Used by permission. (b) mm. 1-8.

unique instruction con amabilità, perhaps best translated as 'with loveability' or 'lovingly'. Beethoven's thematic materials here bear remarkable resemblance to those of his Op. 26 theme. In the opening measures, voice exchanges are stated directly at the surface, while parallelisms (especially parallel 10ths) are in evidence at slightly deeper levels at the beginning and on the surface at the end of the passage shown (mm. 20-21). The focal element in the melody of Op. 110 is C (C5 at the beginning, and C6 later on), and the bass motion essentially arpeggiates the tonic triad. It would thus appear that Beethoven, in the Op. 110 sonata of 1821, was revisiting and reworking earlier ideas from Op. 26, composed in 1801. Astonishing corroboration of this connection is provided by the flighty arpeggiated material which enters in m. 12 to initiate a bridge section modulating to the dominant. Underlying this passage is precisely the 6-6 10-10 pattern of the initial voice exchange of the Op. 26 theme. The only difference is one of register: whereas in the earlier theme the voices are in relatively close position, here they are in registral extremes, with more than five octaves separating A♭1 and C6. The sketch shows further exchanges of A♭ and C associated with the stratospheric ascent of the upper voice to C7 in m. 20. If Beethoven began his Op. 110 sonata with an evocation of the tender intimacy of Op. 26, that mood has given way to flights of imagination in which the original idea has been re-imaged in a moment of visionary ecstasy. The arms, which at the beginning were in closed position in simulated enfolding of the Beloved, are now open wide, to embrace all of humanity or to receive the blessings of heaven.9



Figure 8a. Beethoven, Piano Sonata in A-flat, Op. 110, i, mm. 76-79. Copyright 1952 by G. Henle Verlag, München. Used by permission.

Figure 8b. Schubert, Impromptu in A-flat, Op. 142 No. 2, mm. 23-30. Copyright 1948 by G. Henle Verlag, München. Used by permission.

One composer who almost surely grasped the connection between Beethoven's Op. 26 and 110 sonatas was Franz Schubert. In December 1827, only months after Beethoven's death (and just months prior to his own death in 1828), Schubert composed his Impromptu in A-flat Op. 142/2. The main theme of that work is clearly modeled on the Op. 26 theme, as is demonstrated in Figure 7 on the facing page. In other respects, however, Schubert's piece draws upon ideas from Beethoven's Op. 110. Compare for instance the passage in mm. 27-29 of the Schubert, with its focus on F♭ (E♮), and mm. 76-78 of Beethoven's Op. 110 (see Figs. 8a and 8b). The texture of the right-hand arpeggios against repeated dense chords in the low bass in Schubert's composition, mm. 69-74, might well derive from the material in mm. 12-16 in Beethoven's Op. 110 (compare Fig. 9 on the next page with mm. 12-16 of Fig. 6a on page 262). Replete with the rich musical imagery of Beethoven's pieces, Schubert's impromptu was likely intended as a tender homage to the master he had never dared to meet but whose music he idolized.

One might well lodge the reservation that this discussion of the keyboard as a basis for musical imagery has dealt primarily with keyboard music. However, much instrumental writing in western music seems to have been conceived in terms of relations most easily seen and felt in the design of the keyboard. Passages such as that shown in Figure 10 on the following page, from Beethoven's Harp Quartet, seem strikingly similar to the crossing-hands passage from the Op. 13 Piano Sonata examined in Figure 1 on page 255. Some might argue that Beethoven's piano writing was informed by the quartet idiom; I would argue the opposite, however, that the back-and-forth



Figure 9. Schubert, Impromptu in A-flat, Op. 142 No. 2, mm. 67-75. Copyright 1948 by G. Henle Verlag, München. Used by permission.

Figure 10. Beethoven, String Quartet in E-flat, Op. 74, i, mm. 109-22. Used by permission of Dover Publications.

between violin and cello might well obtain its energy through the association with the crossing-hands keyboard technique, or at least that the passage reflects an apportionment of musical space likely imaged in terms of keyboard layout. In the posttonal era as well, instrumental music often continues to reflect keyboard imaging. Igor Stravinsky's harmony, for example, seems often to have been derived by juxtaposing keyboard configurations which, taken separately, are familiar and idiomatic, but which in combination produce striking results. In such circumstances, keyboard imaging can be the key to a successful analysis. Figure 11 on the facing page presents an excerpt



Figure 11. Stravinsky, Rite of Spring, Introduction, mm. 64-65. Version for piano four-hands. Copyright 1912, 1921 by Hawkes & Son (London) Ltd. Copyright Renewed. Used by permission of Boosey & Hawkes, Inc.

from the Introduction to Stravinsky's Rite of Spring in the composer's version for piano four-hands, which makes fairly clear that some type of E7 chord is the essential harmony of the passage; its status is not nearly so apparent in Stravinsky's orchestration. Almost all western composers since the establishment of the figured bass have learned the basics through the keyboard, and many regularly composed at the piano. I would therefore contend that keyboard imaging has been a determining factor of structure in much of the great body of Western art music composed over the past four centuries.

Notes

1. Musical imagery was so defined in the call for papers issued by the organizing committee for the Conference on Musical Imagery held at the University of Oslo in June 1999.

2. John Baily astutely defines music as 'the sonic product of action' and recommends that we study 'the extent to which the creation of musical structures is shaped by sensorimotor factors' (1985, p. 237). Baily speculates that 'motor grammars may control certain aspects of the patterning of music' (1985, p. 258).

3. The present-day keyboard layout existed as early as the 15th century, with a main row of seven keys (usually white) corresponding to the pitches of the church modes and a second row of five keys (usually black) for flat and sharp pitches needed for transposition and correction of faulty intervals. Although apparently asymmetrical, the design is in fact symmetrical about the pitch D (an odd coincidence, probably unrelated to the fact that D was considered the root of the primary mode, the Dorian), an attribute not exploited until the nineteenth century by composers such as Clementi, Brahms, and Ziehn (Berry, 2000).

4. In a study using reversed keyboards, Laeng and Park (1999) found that 'inexperienced participants of either handedness find it more comfortable to perform the melody with their preferred hand'. It seems reasonable to infer that the standard keyboard design favored the right-handed majority. Laeng and Park found that left-handed musicians appear to adapt fairly easily to the right-oriented keyboard.

5. For purposes of designating specific pitches, middle C is identified as C4. The lowest C on the piano is C1, the highest C8. All pitches between a given C and the next highest C are identified in terms of the integer of the lower C; thus the A between C3 and C4 is called A3.

6. The fact that B and G appear as interior elements of the trichords in mm. 19-20 in no way negates the importance of this statement of the dyad. In instructing the pianist Peter Stadlen in preparing for the world premiere of Op. 27, the composer specifically directed that these notes be brought out, circling them in pencil and writing in the instruction 'fortsetzen'. See the facsimile of Stadlen's working copy of the score, published as Universal Edition No. 16845 (Vienna, 1979).

7. My analysis differs somewhat from Schenker's in Figure 85 of his Free Composition (1935; trans. New York, 1979), specifically in the interpretation of mm. 5-7, where I read continued tonic prolongation, as compared with Schenker's reading of a prolongation of the motion from IV to V, with tonic relegated to consonant-passing status.

8. I shall take the risk of making a psychoanalytic speculation that the rhythmic motions of the hands, relatively close together, moving in parallel or turning reciprocally, might entail unconscious associations with experiences from infancy of bonding with mother, or even reflect a sublimated and rarefied eroticism.

9. I was delighted to discover that my speculations regarding the content and meaning of Beethoven's Op. 26 and 110 sonatas are apparently supported by Maynard Solomon's psychobiographical research. In an intriguing analysis of four dreams Beethoven recounted in letters to close friends, Solomon concludes that 'the central desire expressed in all four of Beethoven's dreams is to return to [an idealized] Bonn infancy-Eden where he and his parents had been harmoniously united' (1988, p. 73). Unfortunately, in actuality 'the child Beethoven had been an object of strife between his father and mother, his person the battleground of a failed marriage, his allegiance the prize of a silent struggle' (p. 81). The boy grew up craving love but feeling unworthy of affection (p. 55). His ambivalence toward his mother was partially responsible for his problematic relations with women in adult life (p. 144). After a series of unfulfilled romantic episodes (his romantic attentions were usually directed to women whose age or marital or social status made them virtually unattainable), the composer attempted, with only partial success, to sublimate his longings by devoting himself totally to his art. In the first entry in a diary he kept during the years 1812 through 1818, Beethoven wrote: 'You must not be a human being, not for yourself, but only for others: for you there is no longer any happiness except within yourself, in your art' (Solomon's translation, p. 246). And later: 'Everything that is called life should be sacrificed to the sublime and be a sanctuary of art' (p. 258). Significantly, Beethoven evidently intended to dedicate the Op. 110 sonata to Antonie Brentano, whom Solomon identifies as the famous 'Immortal Beloved', years after their romance had been concluded (pp. 181-82). (Ultimately the work was published without dedication, so that Beethoven's only composition with a published dedication to her would be the Op. 120 Diabelli Variations.) The recasting of the Op. 26 material in the Op. 110 movement makes sense, in light of the composer's resolve to forego the pleasures of earthly love, as an attempt to sublimate the amorous expression of his earlier work, elevating it to higher artistic purposes.
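The octave-numbering convention described in note 5 amounts to a simple rule: the octave integer increments at each C. As a minimal illustration (not from the chapter; it assumes the standard MIDI convention that middle C is note number 60):

```python
# Illustrative sketch of the scientific pitch-notation rule in note 5,
# expressed as a conversion from MIDI note numbers (assumption: 60 = middle C = C4).
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_name(midi_note: int) -> str:
    """Return scientific pitch notation; the octave number changes at each C."""
    octave = midi_note // 12 - 1   # MIDI 0 lies in octave -1 by convention
    return f"{NAMES[midi_note % 12]}{octave}"

print(pitch_name(60))   # C4 (middle C)
print(pitch_name(57))   # A3: the A between C3 and C4, as in note 5
print(pitch_name(24))   # C1: lowest C on the piano
print(pitch_name(108))  # C8: highest C on the piano
```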

References

Baily, J. (1985). Music structure and human movement. In P. Howell, I. Cross, & R. West (Eds.), Musical structure and cognition (pp. 237-258). London: Academic Press.

Bangert, M., Parlitz, D., & Altenmüller, E. (1999, June). Neuronal correlates of the pianist's 'inner ear'. Paper presented at the Conference on Musical Imagery, 6th International Conference on Systematic and Comparative Musicology, University of Oslo, June 17-20, 1999.

Berry, D. C. (2000, March). The multivalence of axis-tone 'd' in the history and evolution of symmetrical inversion. Paper presented at the annual meeting of the New England Conference of Music Theorists, Brandeis University, Waltham, Massachusetts, USA, March 25-26, 2000.

Crowder, R. G. & Pitt, M. A. (1992). Research on memory/imagery for musical timbre. In D. Reisberg (Ed.), Auditory imagery (pp. 29-44). Hillsdale, NJ: Lawrence Erlbaum Associates.

Edelman, G. M. (1985). Neural Darwinism: population thinking and higher brain function. In M. Shafto (Ed.), How we know (pp. 1-30). San Francisco: Harper and Row.

Edelman, G. M. (1987). Neural Darwinism: the theory of neuronal group selection. New York: Basic Books.

Edelman, G. M. (1992). Bright air, brilliant fire: on the matter of the mind. New York: Basic Books.

Edelman, G. M. (1995). The wordless metaphor: visual art and the brain. In K. Kertess (Ed.), 1995 Biennial exhibition (pp. 33-47). New York: Whitney Museum of American Art.

Guiard, Y. (1989). Failure to sing the left-hand part of the score during piano performance: loss of the pitch and stroop vocalizations. Music Perception, 6, 299-314.

Intons-Peterson, M. J. (1992). Components of auditory imagery. In D. Reisberg (Ed.), Auditory imagery (pp. 45-72). Hillsdale, NJ: Lawrence Erlbaum Associates.

Johnson, M. (1987). The body in the mind: the bodily basis of meaning, imagination, and reason. Chicago and London: The University of Chicago Press.

Krumhansl, C. L. & Schmuckler, M. A. (1986). The Petroushka chord: a perceptual investigation. Music Perception, 4, 153-184.

Laeng, B. & Park, A. (1999). Handedness effects on playing a reversed or normal keyboard. Laterality, 4, 1-14.

Mikumo, M. (1994). Motor encoding strategy for pitches of melodies. Music Perception, 12, 175-197.

Mikumo, M. (1998). Encoding strategies for pitch information. Japanese Psychological Monographs, No. 27. The Japanese Psychological Association.

Sloboda, J. A. (1984). Experimental studies of music reading: a review. Music Perception, 2, 222-236.

Smith, J. D. (1992). The auditory hallucinations of schizophrenia. In D. Reisberg (Ed.), Auditory imagery (pp. 151-178). Hillsdale, NJ: Lawrence Erlbaum Associates.

Smith, J. D., Reisberg, D., & Wilson, M. (1992). Subvocalization and auditory imagery: interactions between the inner ear and inner voice. In D. Reisberg (Ed.), Auditory imagery (pp. 95-120). Hillsdale, NJ: Lawrence Erlbaum Associates.

Solomon, M. (1988). Beethoven essays. Cambridge: Harvard University Press.

Takeuchi, A. H. & Hulse, S. H. (1991). Absolute-pitch judgments of black- and white-key pitches. Music Perception, 9, 27-46.


15

Composers and Imagery: Myths and Realities

Rosemary Mountain

Introduction

The art of composing may be considered an art of creating illusions. Music is the only context in which we hear sounds which have no particular basis in the physical world, even though some physical props are necessary to create the artifice. In our everyday lives, the sonic environment provides sensory information on which we depend to analyze what is going on around us, while hearing music for its own sake is a very different experience. By common agreement, we try to suspend our normal listening habits and allow the music to create a different atmosphere, where the sounds are interpreted as transcending the mundane world and creating new images - sometimes abstract, sometimes more literal imitations of familiar phenomena. It comes as no surprise, therefore, that composers often resort to modes of imagery that draw on the properties of the physical world in order to produce illusions that retain a convincing presence and behaviour.

The problems associated with the study of imagery in music are compounded when studying the use of imagery in the compositional process. Much of the compositional process can take place in the subconscious, and composers have rarely felt any need to examine their strategies, much less to articulate them. Typically, the process and strategies employed differ widely according to the characteristics of the people involved. In addition, various comments by and about composers through the ages have resulted in some pervasive misconceptions. In investigating the role of imagery



in the compositional process, I scanned various writings by and interviews with composers, looking for hints of how each one thought about composing. I complemented this study with questions posed to colleagues and composition students, drawing on my own experience as well.1 It is hoped that the sampling of approaches and reflections will illustrate the richness and complexity of the field and thus not only stimulate further inquiry but also aid in the preparation of adequate tools for the investigation.

The nature of composition

Intentions and objectives

Varying functions of music through the ages and in different cultures have led to quite divergent intentions and objectives on the part of the composer. Music may be composed to accompany a religious ritual, or a film; it may express a personal state of mind or describe a particular scene or mood; it may be an abstract design in the sonic medium, or even a means by which to investigate properties of time or perception. The intentions and objectives will necessarily condition the process and consequently the specific use of imagery.

Processes and strategies

There are various stages in the compositional process which can be grouped under three headings: the gathering of material, the arranging of the gathered material, and the encoding of the material for eventual communication to the listener.

Strategies for the collection and arrangement of material differ widely among composers. The collection process is usually an on-going activity throughout the years; eventually, a composer may have a remarkable repository of material available for mental recall. The diversity of material that can contribute to a composer's repertoire of sounds is illustrated in Figure 1 on the next page. The sounds and designs considered most appealing are remembered, analyzed, imitated, extended, developed, and/or transformed, then recycled into the memory bank. The entire collection process may occur mentally, or be supplemented by notated sketches and/or recordings. At any point, the composer may begin playing with the memory bank sounds, arranging them into various configurations. The individual's personality and work habits will determine whether such manipulation of the material occurs more at the subconscious or more at the conscious level, whether it involves methodical rigour, playful improvisation, or both. The encoding generally takes the form of notating a set of instructions to performers in the form of music notation; other means range from mental preparation to play the composition in an improvisatory style to the recording of an electronic work onto CD.

The sequence of these stages is not always linear, and very often the stages occur in a nested fashion. Thus, a composer may begin by gathering and arranging pitches and rhythms for a melody which is then written down in music notation, ready to be selected and arranged at a later date for inclusion in a larger work. Equally possible is that the large-scale form of the work, with something of its overall character, will be established before the details of any melody are arranged. The first melodic line to be written will not necessarily be the first to occur in the final work; authors will



[Figure 1 diagram; legible labels include: natural environments; non-vocal sounds (e.g. footsteps); artificial (man-made) environments, with sound as objective (e.g. jackhammer, engine); and invented sounds created by the composer, either mentally, through imagination, or acoustically, through experimentation with electronics or acoustic sources.]

Figure 1. Sources of sound for a composer's repertoire.

recognize the parallel with refining chapters of a book in a non-sequential fashion. Commissions and requests from performers often dictate a specific instrumentation, and in some circumstances even the length of the work, the venue and the technical abilities of the performers will be important factors from the start.

The role of inspiration and associated myths

The initiative for composing is often thought to be inspiration, and in that guise it may subsume gathering and arranging stages. The role of inspiration in the compositional process is clearly relevant, but less clearly definable; today, much of what used to be regarded as inspiration is now referred to as the workings of the subconscious. It could be argued that inspiration implies imagery; certainly the myths that surround the one have confused investigation of the other.

The most basic and insidious form of what I think of as 'the Mozart Myth' pretends that the composer's task is to receive divine inspiration in the form of a musical masterpiece, and then transcribe it: 'a stenographer to his Muse,' as Erickson wryly expresses it (1955, p. 43). Its most erroneous implication is that inspiration arrives in the form of a pure and complete auditory image, already orchestrated, which the composer proceeds to encode from memory, from beginning to end. The myth doubtless dates from antiquity, but seems to have been given particular resonance from two separate events recorded from Mozart's life. The first is his transcription of Allegri's Miserere at age seven, after a single hearing: a feat that would require an extraordinary



musical memory and a complete grasp of the correlation between the heard sound and the notation. Given the less complex musical vocabulary of the time, it is not quite as dramatic as a child of today being able to notate Boulez's Structures, for example, but nonetheless it is proof (if true) of Mozart's formidable talent. The second event is his completion of the Overture to Don Giovanni in a single day, just before the first performance.2 In this case, it is clear that the composer would have been working on it mentally for a long time previously.

The Mozart Myth becomes considerably more believable if one allows for the possibility that the heard masterpiece is not necessarily specific and complete in terms of its notational detail, nor that it is necessarily the first step in the process of composition. Many works have indeed had a sudden moment of inspiration as their births - it is just that the 'fleshing out' of the idea may require subsequent months or years of work involving techniques which the composer has already refined over years of training. On the other hand, it is possible that the composer does hear a complete, detailed work all in a flash - but usually after having spent many months working with specific musical ideas that the subconscious has finally arranged into a satisfying design.

Attempts to explain the real compositional process do not always clarify; Schoenberg, on the first page of Fundamentals of Musical Composition, says:

A composer does not, of course, add bit by bit, as a child does in building with wooden blocks. He conceives an entire composition as a spontaneous vision. Then he proceeds, like Michelangelo who chiselled his Moses out of the marble without sketches, complete in every detail, thus directly forming his material. [italics the author's] (1967, pp. 1-2)

As if anticipating the skepticism of other composers, he hastens to add: 'No beginner is capable of envisaging a composition in its entirety; hence he must proceed gradually, from the simpler to the more complex.' The spontaneity of his own vision is thrown into doubt by reports of his own struggles with Jacob's Ladder, a work which he finally abandoned after many years. However, the inconsistencies fade on re-reading: Schoenberg may well have been referring to a spontaneous vision as yet untranslated into the musical language, and therefore the years of work are simply a testimony to the difficulty of finding the appropriate language for expressing it.3

Conversations and readings, as well as personal experience, suggest that such spontaneous vision of an entire work is most common upon awakening from sleep. Stockhausen reports (Cott, 1974, p. 24): 'I have all sorts of sound visions very often at night in a deep sleep. I wake up and the entire pieces are in me; I've heard them.' Such an experience can serve as a potent stimulus, but the process of transferring all details of the auditory image to a reproducible score is problematic. Typically, not all of the details are retained with the same clarity; just as the listener discovers when leaving the concert hall, less crucial details fade first while key themes and textures leave more indelible impressions. Even if the original image is purely auditory (an assumption we will examine below), it will not necessarily involve specific instruments. Instead, the composer might 'hear' an abstract sonic configuration, which must then be encoded: 'translated' into a common musical language and notational system for reproduction on available acoustic instruments by available techniques - a process that implies compromise, or at least interpretation.4 Unless the composer has a firm



belief in the divinity of the inspiration, he or she may also be tempted to improve on the aural experience, only to discover that such editing interferes with the retention of the original.5 Likewise, even if the entire image is maintained in the memory for some time, the mental focussing required to remember every detail can jeopardize its retention. Stravinsky is equally articulate on his recognition of the dream vision and his repudiation of the Mozart Myth; he makes a most poignant comment about his old age when he remarks: 'I dreamed a new episode of my work-in-progress but realized, when I awoke, that I could not walk to my desk to write it down, and that it would be gone by morning' (1972, p. 111) and elsewhere comments:

The idea of work to be done is for me so closely bound up with the idea of the arranging of materials and of the pleasure that the actual doing of the work affords us that, should the impossible happen and my work suddenly be given to me in a perfectly completed form, I should be embarrassed and nonplussed by it, as by a hoax. (1970, pp. 52-53)

As further clarification of his objection to the passive role as Muse's stenographer, he asserts: 'We have a duty towards music, namely, to invent it' (1970, p. 53).

The use of imagery in the compositional process

Imagery may be intertwined with inspiration and stimulation, or it may be employed in a more mundane way as a practical tool which enables the composer to maintain a musical idea in memory while searching for appropriate expression of the idea. It can come into play at any stage of composition, from the initial gathering of seminal ideas to the final encoding. Different tasks and different personalities affect the type and amount of imagery employed, but it is not uncommon for one composer to draw on many types of imagery during the process of composing a single work. Auditory and visual imagery often occur in predominantly single-mode form, and will be examined before moving to the complexities of the more clearly multi-modal situations.

Auditory and visual imagery

Auditory imagery

Given that two of the three main stages of the compositional process are the gathering and arranging of material, and that much if not all of that material is of an aural nature, it is logical that auditory imagery plays a fundamental role in compositional strategies. A major part of a composer's development is learning how to manipulate auditory images in order to arrange them into more extended configurations. This implies not only the musical imagination to hear sonic gestures and chordal structures, but also the musical memory to be able to store and retrieve them again at will, and the capacity to alter each image (for instance, by substituting a different instrumentation, changing the tempo, or transposing the pitch) and mentally replay it. A strong sense of auditory imagery comes into play when the composer decides to construct a new sound mentally from the superposition, juxtaposition or mixing of known sounds. The vividness of the auditory image is equally necessary during the encoding stage, so that



it can be clearly maintained and referred to during the sometimes tedious procedure of notation.

By mentally 'playing back' the auditory image repeatedly to oneself, it can become 'engraved' more firmly into memory, but is still subject to being dislodged or obscured by other information, particularly of a similar sonic variety. This aspect can be deduced from the fact that many composers search out quiet places to work, in order that physical sounds not interfere with the mental images. For the same reason, many composers prefer to work away from the piano in order to avoid timbral confusion and the interference from potentially wrong notes struck while playing.

An imagined sonic configuration may or may not be complete in all its parameters. Frequently, as evidenced by composers' sketchbooks and writings, the initial idea may consist of a melodic contour and rhythm but not yet be fixed in terms of starting pitch, instrumentation, dynamics or even precise tempo. Such incompleteness, far from being a deficiency, permits a greater range of possibilities for appropriate transformation into the final composition. However, the initial 'hearing' of the sound object will often embody a particular character or expressive quality even if not present in the initial sketch. Therefore, when the composer begins to choose instrumentation and dynamics for a particular fragment, certain choices might be automatically excluded because they are contradictory to the desired character.

Visual imagery

Visual imagery also plays an extremely important role in a composer's training. Most of us began decoding music notation at an early age: learning the cryptic correspondence between complex graphic symbols and sonic parameters. The underlying principle is the use of an x/y graph representing frequency and time coordinates (Fig. 2a)6; dynamics are often represented by the graphic 'hairpin' as well as by letter symbols (Fig. 2b). The notation system has been continuously developed and extended; in the 20th century many new symbols and refinements were introduced to reflect a growing interest in specifying timbral shading (as in Figure 2c) and new approaches to temporal organization. In particular, the palette has been broadened by the advent of electroacoustic music, which uses graphics to represent sonic configurations as well as the tools for their manipulation. Music theory and analysis also rely on symbols, largely based on Roman and Arabic numerals, for describing tonal hierarchies, chord configurations (such as figured bass), metric structure, and various classifications of pitch sets and intervals. Analytical approaches which incorporate graphics, whether drawings of melodic contours (Erickson, 1955; Zuckerkandl, 1959), diagrams (Lerdahl & Jackendoff, 1983), graphic symbols and even spectrograms (Cogan & Escot, 1976; Cogan, 1984), have been well-received.7
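The frequency/time graph principle underlying staff notation can be made concrete with a small sketch (illustrative only, not from the chapter; it assumes equal temperament with A4 = 440 Hz and MIDI note numbering):

```python
# Illustrative sketch (assumptions: equal temperament, A4 = 440 Hz, MIDI
# numbering): the x/y "frequency vs. time" principle behind staff notation,
# rendered as explicit (onset, frequency) coordinates for a short melody.
def frequency_hz(midi_note: int) -> float:
    """Equal-tempered frequency: each semitone is a factor of 2**(1/12)."""
    return 440.0 * 2 ** ((midi_note - 69) / 12)

melody = [(0.0, 60), (0.5, 62), (1.0, 64)]  # (onset in seconds, MIDI note): C4 D4 E4
points = [(onset, round(frequency_hz(note), 1)) for onset, note in melody]
print(points)  # [(0.0, 261.6), (0.5, 293.7), (1.0, 329.6)]
```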

Composers have long used these and similar symbols in their compositional sketches, allowing them to think about and/or play with the material without having to specify all details. For example, the sketch in Figure 2d might serve to represent a local pitch/duration complex of a few seconds' duration, or a plan for an extended passage. In either case, it might be a sketch from a very early stage of the compositional process, representing a vague idea of the structure before the detail is specified, or it might be the graphic representation of a passage in which details have already been determined and therefore do not have to be specified in this particular diagram.

ROSEMARY MOUNTAIN

[Figure 2 appears here: four graphic-notation panels, a)-d).]

Figure 2. Graphic notation symbols. a) Standard music notation, in the context of a frequency/time graph. b) Two manifestations of a symbol indicating decrease in volume. c) New notational symbols: vibrato with indication of frequency, a smooth non-metric decrease of durational values, and non-specific frequency changes. d) A composer's sketch with variable meanings.

In the latter case, the sketch allows a considerable amount of information to be codified into a single unit, if we keep in mind that the composer could mentally place a magnifying glass on any part of the drawing and see, or hear, the detail. Although such images often exist as drawings on paper, it is common for the composer to use them mentally as well.

Computer sound programs also make visible a procedure which is familiar from the compositional process: the zooming in and out on the time scale. At some stages it is useful to 'play' an entire composition-in-progress at the intended tempo, even if a few sections are still lacking in certain details (at which point playback becomes extremely low-fidelity, resembling attempts to sing a song when one has forgotten some of the notes and words). However, during much of the compositional process, it would be inefficient to start playback at the beginning, and visual imagery can provide a convenient way to navigate from one section to another (then normally reverting to auditory imagery once the desired starting point is reached).

The composer may mentally construct certain passages with the aid of graphic imagery (especially traditional music notation) without transforming them into their sonic counterparts. In fact, when assembling a piece by collage technique, à la Stravinsky or Varèse, it is easier to imagine the manipulation of scraps of paper than scraps of auditory material, since in the physical world it is considerably easier to exchange the spatial arrangement of two objects than the temporal order of two events.

278 COMPOSERS AND IMAGERY: MYTHS AND REALITIES

Those who work with sound programs on computers are even more accustomed to dissociating sound from its temporal position in a large scale, as the cut-and-paste technique of word processing is equally easy to perform with sonic data. However, given that the result of the work is always of an aural nature, there is usually considerable interaction between the visual and the auditory image, and often the composer will regard them as inseparable. Given that the final encoding of a work is usually in the form of a musical score, it is not surprising that the composer moves freely between the auditory concept and its visual representation.

Multi-modal imagery

Even while attempting to restrict the discussion to 'purely' auditory or visual imagery, it becomes evident that there are often latent associations between such images and a more complete model which has all the attributes of an entity or phenomenon of our physical environment. Kinaesthetic imagery, for example, may come into play when a composer imagines a gesture, a dance movement, or the execution of a musical passage on a specific instrument. However, it is difficult to isolate this type of imagery from a more complete multi-modal one. The composer may arrive at multi-modal imagery in various ways: a scene, such as in film or opera, may benefit from the inclusion of sound effects to clarify the action; a certain mood may be desired, and so the composer draws on extra-musical imagery which is consistent with such a mood; or a particularly potent image may serve as a stimulus for the creation of a passage or even an entire composition. The difference between these examples is more a question of degree and attitude than of imagery itself. In the case of the sound effect, a sound is used as a more or less direct representation of its sound source, whereas in the case of a metaphor as stimulus, the resulting sonic configuration may be far removed from any auditory properties of the original image. In the case of creating mood, much of the process may be largely unconscious on the part of the composer; it is such a common objective that the means to achieve it may be indistinguishable from codified musical practice. Text provides a special correlation with sound: in addition to the aural properties of speech itself - articulation, rhythm, contour, etc. - the words may refer to visual images and generally evoke mood as well.8 A composer setting text to music will normally be sensitive to all such imagery, even if the intention is to avoid obvious parallels. In the case of opera, the complexity increases with the combination of movement, song, visual imagery, and narrative.

Sound effects

Even before the integration of recorded sounds into musical contexts, composers could convey the sense of an action or environment by an approximation of typical sounds. Traditionally, the 'translation' of non-musical sounds into musical contexts involved some adaptation or abstraction, so the listener is required to use some imagination to read the illusion, just as theatre-goers are asked to accept a few well-chosen props as indicating a change of scene. Beethoven's Pastoral Symphony is a famous example; the factory rhythms of Mossolov's String Quartet and the train in Villa-Lobos's Little Train of the Caipira are equally convincing portrayals of specific sound-producers within very musical contexts. The incorporation of actual extra-musical sound sources into a compositional context is rare, however (the cannons of Tchaikovsky's 1812 Overture being a notable exception); the ease of inserting them via high-fidelity recordings led to the converse in the form of musique concrète: the transformation of familiar sounds from the environment into abstract configurations to be appreciated for their sonic characteristics instead of their traditional associations. However, a study of such work suggests that considerable effort must be expended on the part of the composer - and sometimes, the listener as well (viz. Schaeffer's 'écoute réduite' [1966]) - to rip a sound away from the imagery of its natural source.

Creation of mood or atmosphere

An enormous segment of the musical repertoire was never intended as 'pure' music, but was designed to convey at least mood or atmosphere, and often more overtly human emotions, narrative, activities, etc. What little information we have about the role of music in the earliest days of humanity can be supplemented by our knowledge of humanity itself: music must have been integrated into ritual, accompanied dance, and served as the vehicle for conveying verse, whether epic narrative or simple expressions of love. It is recorded that bardic training among the ancient Celts involved playing music that could, at will, provoke joy, tears, or a healing sleep in the listener(s). In recent centuries, the integration of music with other arts has hardly lessened: opera, court dances, and art song are waning, but film, television, and even such events as figure skating have created new contexts for music with words, images, and movements. It is difficult to imagine a composer writing for film without making reference to atmosphere, visual images, or movement. Although the exact musical elements and combinations responsible for expressing the range of mood and atmosphere so typical of film have hardly begun to be studied in any rigorous fashion (see however Cohen, 2000), there are obvious correspondences that draw on our knowledge of the behaviour of people, creatures, and things in our environment. Sounds which resemble human utterances are among the most difficult to dissociate from their natural or probable source: we are so accustomed to reacting to aural cues such as sighs, cries, or a quiver in the voice that music which is imbued with similar properties may unconsciously evoke similar reactions. (See for example Sundberg, 1982; Lindstrom, 1997.)

My own research (Mountain, 1993) suggests that much of our rhythmic perception in music is conditioned by our utter familiarity with human movement, and thereby contributes to our appraisal of the amount of energy or tension in a musical passage by encouraging us (even if unconsciously) to compare the rhythms to our own capacity for executing them. (This is parallel to our tendency to judge the scale of a sculpture in comparison with our own size.) Aside from speech, motor movements such as walking and running provide the most obvious correlations, but gestures and physically tangible evidence of metabolic states such as trembling or shivering also contribute to create specific moods. (See Krumhansl, 1997, for an overview on the subject with special attention to music/dance correlations.)

The agreement between listeners on the impact of a musical passage and its mood or activity association (see for example Sloboda, Lehmann, & Parncutt, 1997) is presented as tentative confirmation that the listener, if only subconsciously, imagines a plausible sound source (such as a human body) whose behaviour and characteristics could produce the musical gesture or component. This is hinted at even in quite abstract musical contexts by verbal annotations in the score: agitato, lirico, allegro, without expression, hammering like a madman. Such instructions are an efficient way to convey to the performers a coherent program that will modify every parameter of that passage according to a global directive. The imagery is developed only as far as necessary to indicate a set of global characteristics which govern the behaviour of the sonic object. Research into the expressive timing and other aspects of performance beyond the conventional notation coding has examined the effect of such directions, and verified that the musician will try to modify the performance in subtle but audible ways to convey the attribute requested. There is substantial evidence in many styles of music of a complex set of rules applied by the performer which can be approximated, if rather crudely, by systematic algorithms. (See for instance Clarke, 1985; Clarke & Windsor, 1997; Kendall & Carterette, 1990.) In order for such a system to work, we can deduce that the composer is acquainted with the same set of rules.
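The kind of rule system just described can be sketched in a few lines of code. The sketch below is purely illustrative and hypothetical: the representation of a note as a (pitch, duration, velocity) triple and the two 'agitato' rules (clipped durations, raised dynamics) are simplifying assumptions of mine, not the actual rules documented in the studies cited above.

```python
# Illustrative sketch only: a toy rule system in the spirit of the
# expressive-performance research cited above. The two rules are
# hypothetical simplifications, not rules taken from those studies.

def apply_agitato(notes):
    """Approximate an 'agitato' directive as a global modification.

    `notes` is a list of (midi_pitch, duration_seconds, velocity) tuples;
    every note is nudged by the same two rules, mirroring the idea that a
    single verbal annotation modifies every parameter of a passage.
    """
    performed = []
    for pitch, dur, vel in notes:
        performed.append((
            pitch,
            round(dur * 0.85, 3),       # rule 1: shorter, more urgent notes
            min(127, int(vel * 1.15)),  # rule 2: louder overall, capped at MIDI max
        ))
    return performed

# A short three-note phrase, before and after the directive:
score = [(60, 0.5, 80), (64, 0.5, 80), (67, 1.0, 90)]
print(apply_agitato(score))
```

The point of the sketch is only that a global verbal directive can be modelled as a uniform transformation over every parameter of a passage; a realistic system would make the rules context-sensitive, as the cited research describes.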

Analogies and metaphors

Inspiration need not arrive in aural form. The auditory shape may be developed to express a more abstract or non-musical idea, event, process or action. The extent to which the original stimulus is manifest in the final composition is almost exclusively dependent on the intentions (and skills) of the composer, though naturally some stimuli are more transferable into the musical domain than others. Whether the listener is aware of the original stimulus is often irrelevant; the function of the imagery can be exclusively that of a compositional tool. When an image is borrowed from the familiar four-dimensional world, decisions about the details of the sonic manifestations can be made by consulting the known characteristics of the image.

The types of metaphor and analogy that are commonly used by composers can be grouped into general categories: animate beings, inanimate objects, processes and concepts. As with inspiration in auditory form, such metaphors and analogies may provide the stimulus for a simple musical gesture or for the large-scale structure of a work. In addition, it must be stressed that the composer may in fact be only dimly aware of the analogy, and it may either grow or fade in importance depending on its usefulness in the compositional process. Naturally, there are also situations when the composer borrows from a mixed collection.

Animate beings

The realm of animate beings seems the most potent in the field of musical imagery, especially when the beings are of human type. This doubtless stems from various factors: the influence of the Romantic era and its emphasis on the individual; the intimate and common knowledge of human activity among composers and listeners and the subsequent richness of expression which can be alluded to through imitation of speech, gesture, walking, etc.; the gradual codification through the musical repertoire (largely but not exclusively culture-specific) of these same allusions; the familiarity of narrative structure from opera, theatre and literature; and in general, the perceived appropriateness of music to expression of our human environment.

Schoenberg (1967, p. 93) explains: 'The term character, applied to music, refers not only to the emotion which the piece should produce and the mood in which it was composed, but also the manner in which it must be played.' He continues (p. 95): 'In composing even the smallest exercises, the student should never fail to keep in mind a special character. A poem, a story, a play or a moving picture may provide the stimulus to express definite moods.' A clear example of the role of this type of imagery in the compositional process is revealed by Stravinsky, who reports:

More than a decade before composing Jeu de Cartes, I was aware of an idea for a ballet with playing-card costumes and a green-baize gaming-table backdrop. The origins of the ballet, in the sense of the attraction of the subject, go back to a childhood holiday with my parents at a German spa, and my first impressions of a casino there.... In fact the trombone theme with which each of the ballet's three 'Deals' begins imitates the voice of the master of ceremonies at that first casino ... and the timbre, character, and pomposity of the announcement are echoed, or caricatured, in my music. (1972, p. 43)

Similarly, Carter explains that the soloists in the Double Concerto act as 'mediators between unpitched percussion and pitched instruments' (1976, p. 76) and that the design of his Piano Concerto 'pits the 'crowd' of the orchestra against the piano's 'individual,' mediated by a concertino of seven soloists' (p. 77). This kind of imagery is clearly related to the performance directives mentioned above, but on a larger scale, coherent through an entire work.

Inanimate objects, processes and concepts

The trend towards abstract art in sculpture and painting in the 20th century was similarly present in music, though more subtle due to a perception of music's innate abstraction. A move away from the human sonic environment resulted not only in unsingable melodies and rhythms too slow for dancing, but also bolder ways of presenting temporal activities in general: collages of static textures, simultaneous presentations of musical passages moving at different rates, a lack of continuity from one motive to the next. Some composers felt, and acted, more like scientists exploring time and sound, and the imagery used reflects this. Ligeti, for example, describes his approach to the electronic piece Artikulation thus:

First I chose types with various group-characteristics and various types of internal organization, as: grainy, friable, fibrous, slimy, sticky and compact materials. An investigation of the relative permeability of these characters indicated which could be mixed and which resisted mixture. (1958, p. 15)

When describing the stochastic laws which he applied in many compositions, Xenakis gives as illustration (1971, pp. 8-9) the sound of hail on a hard surface, the sound of cicadas in a summer field, and the sound of a political crowd, chanting, shouting, and being dispersed by bullets. Rather than wanting to imitate the specific sonic aspects of any of these, it is the transformation of the rhythms from order to disorder which he chooses as a model. Stockhausen uses strikingly parallel analogies when describing his own compositional methods:

I very often used the image of a swarm of bees to describe such a process. You can't say how many bees are in the swarm, but you can see how big or how dense the swarm is, and which envelope it has. Or when birds migrate in autumn, the wild geese sometimes break formation, flying in nonperiodic patterns. Or think of the distribution of the leaves on a tree; you could change the position of all the leaves and it wouldn't change the tree at all. (Cott, 1974, p. 68)

Later he develops this further:

You can put [sounds] together at any speed, density, or distribution in a given time and space field of the audibility range. You can produce a structure and relate it to any natural event. You could, for instance, distribute sounds the way the leaves on a tree are distributed. (p. 71)

A compositional technique which came into vogue in the 20th century, usually referred to as 'mapping', encouraged novel configurations while maintaining at least a tenuous link with natural phenomena. Mapping refers to the assignment of a set of non-musical data onto specific musical parameters such as pitch, duration, dynamics, etc. A very elementary form can be found in Cage's Atlas Eclipticalis, for which the composer took a map of the stars and placed it underneath transparent score paper, so that the dots representing the stars were transformed into dots representing pitches, and the temporal distribution of the pitches was determined by their relative positions along the score lines. A stricter application of the concept is found in Dodge's electronic piece Earth's Magnetic Field, in which data from Californian seismographic machines determined the organization of pitch, dynamics, and timbral content. The temporal order of the data was retained, although the scale was reduced to compress a year's worth of data into several minutes. A more abstract form was used by Xenakis in Pithoprakta, where mathematical formulae derived from the kinetic theory of gases gave a Gaussian distribution to the pitch structures (see Xenakis, 1971, pp. 12-21).
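In modern terms, this kind of mapping is easy to make concrete. The sketch below is purely illustrative: it mimics the general idea (non-musical data assigned to pitch and duration, with temporal order preserved and the time scale compressed) rather than any procedure actually used by Cage, Dodge, or Xenakis, and the parameter ranges are arbitrary choices of mine.

```python
# Illustrative sketch only: mapping a series of non-musical readings onto
# musical parameters. The pitch range and duration scaling are arbitrary
# assumptions, not those of the pieces discussed above.

def map_to_notes(data, pitch_low=36, pitch_high=84, dur_short=0.1, dur_long=1.0):
    """Map each value in `data` (assumed to lie in 0.0-1.0) to a
    (midi_pitch, duration_seconds) pair.

    High values become high, short notes; low values become low, long notes.
    The temporal order of the source data is preserved, as in Dodge's
    year-into-minutes compression described above.
    """
    notes = []
    for x in data:
        pitch = pitch_low + round(x * (pitch_high - pitch_low))
        duration = round(dur_long - x * (dur_long - dur_short), 3)
        notes.append((pitch, duration))
    return notes

# e.g. four normalized field readings, compressed into a short gesture:
readings = [0.0, 0.25, 0.5, 1.0]
print(map_to_notes(readings))
```

The design choice worth noting is that the mapping is a pure function of the data: all compositional character lives in the choice of parameter assignments and ranges, which is exactly where the composers discussed above differed from one another.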

A more poetic use of mapping is simply to use a stimulus such as a visual scene or work of art in another medium and then freely 'translate' the concept or image to the sonic realm. This may or may not involve a temporal matching from one medium to another. For instance, if the stimulus were the image of a flowering garden on a sunny day, the notable characteristics of the image are neither auditory nor temporal. Therefore, it would necessitate a much freer kind of association, dependent for its specifics on the particular predilections of the composer. An early example is Honegger's orchestral suite Rugby, in which the energetic movements of the game suggested the musical configurations. Several of my own compositions draw inspiration from natural phenomena. In Underground Streams, The Fish Weren't Jumping and Spring Thaw, for example, I was thinking about the behaviour of water in melting ice, brooks, eddies, etc., which suggested various treatments of pitch, dynamics and rhythm at several levels, from foreground detail to large-scale formal structure. In such cases, my thinking about water would conjure up various visual and auditory images of brooks, rivers, and ocean waves, as well as non-image-based thoughts about their behaviour and ways in which that could contribute to the musical design. This imagery would give way to more strictly musical considerations when appropriate.


Mixed metaphors

It may be that those composers who are not content to pour their musical ideas into traditional molds are the most likely to resort to cross-modal imagery, in a search for new ways to organize their musical ideas or thoughts. In their search, they often wander from one image to another. Ligeti, whose talk about music is extremely rich in visual and other imagery, expresses an approach which seems to embrace both a poetic association and Xenakis's view when he describes a particular passage as being 'rather like the slow, gradual transformation of the 'molecular state' of sound or the changing pattern of a kaleidoscope' (Ligeti, 1983, p. 39). In a similarly 'mixed imagery' comment (p. 60), he discusses the harpsichord work Continuum: 'The initial minor third is slowly blurred by the appearance of other intervals, then this complexity clears away and gradually a major second comes to dominate.' Later in the same conversation he remarks that 'my general idea for that movement was the surface of a stretch of water, where everything takes place below the surface. The musical events you hear are blurred; suddenly a tune emerges and then sinks back again.' On the 'technical process of composition' he says that it is

like letting a crystal form in a supersaturated solution. The crystal is potentially there in the solution but becomes visible only at the moment of crystallization [producing] supersaturated polyphony.... My aim was to arrest the process, fix [it] just at the moment before crystallization. (1983, p. 15)

Berio likewise exploits mixed-imagery metaphors when, talking about Circles (1985, p. 144), he says: 'I grouped the instruments around the text, reflecting the phonetic families so that the sound is sometimes short-circuited and explodes.' In speaking of his First String Quartet, Carter says:

the Adagio [displays a] strong opposition between the soft, muted music of the two high violins and the loud, vigorous recitative of the viola and cello . . . while the Allegro scorrevole is a reduction of the typically diversified texture to a stream of sixteenth notes with a seven-note theme, fragmented into diversified bits that form a constantly changing mosaic. (p. 71)

Stockhausen also speaks of differentiation of musical layers in imagery terms, but is more firmly abstract, though quite deliberately mixing the metaphors to clarify the organization:

nowadays I even want to compose pieces where you have one layer which is completely static and another which is then moving with a clear direction toward a climax and a third layer which is epic, like telling you something but not aiming at a certain end-narrative. (Cott, 1974, p. 35)

Because music is free of the constraints of the physical world beyond those of the properties of sound and, in the case of acoustic music, the capabilities of performers and instruments, it provides an ideal realm to explore such mixing of imagery. Elements or environments that would be incompatible in the real world are potentially evocable through music, though as the complexities of such configurations increase, so must the composer's reliance on imagery to maintain the clarity of distinction.


Metaphors about music in general

A composer may find general metaphors and analogies about music useful for clarifying the entire process of composition and thus influencing the choice of strategies on how to proceed, and even what to choose as a focus. The metaphor of the sonic object was found very useful within the context of electronic music (e.g. Schaeffer, 1966), where the concepts of melody and chord were inappropriate, and also proved helpful in perceptual issues of distinguishing objects, or auditory images, and identifying their boundary-forming characteristics (Bregman, 1990; McAdams, 1982, 1987). As with so many of our metaphors in music, 'sonic object' and 'auditory image' have obvious roots in the visual field: not only the words themselves but also the Gestalt principles which had a significant influence on the refinement of the concepts. Despite the static qualities of most objects in the physical world, the sonic object must embody dynamic concepts, as is made explicit in Godøy (1997b). However, on the whole this particular metaphor is applied to smaller rather than larger units. McAdams refers to 'coherent behavior' (1987, p. 39) in his definition of an auditory image. I have found this a very helpful metaphor for describing musical strata (see Mountain, 1998), as the term behaviour suggests a clear link to the physical world, where everything has its own properties and behaviour, be it a human, a volcano, or the moon. It implies not only the passage of time, but also the probability of change, whether development, growth and decay, modification, minor fluctuation, etc. The limits of the amount of change which an object will experience and/or delineate are within boundaries often recognizable only through observation.

It is clear that many composers are comfortable with thinking about aspects of their music in terms of characters, beings, objects, and phenomena. Extending this, it is not difficult to propose that in some sense they are creating imaginary worlds for the listener's and performer's exploration. Emboldened by Johnson, who urges us to embrace metaphor as a potent tool for understanding music, and our relationship to it (1999), I propose a global metaphor to describe the phenomenon of many musical compositions: the performers are puppeteers who help present the composer's designed fantasy world, moving the themes and gestures in convincing ways so that they appear to be emitted by imaginary physical beings, objects and phenomena. The idea of melodies being moved by puppeteer musicians may seem more complicated than the usual one of melodies which move by themselves, but since the latter contradicts common sense on reflection, it may be easier to think of musical gestures, phrases and textures as being the product of sophisticated sleight of hand which is reproducing auditory images of physical sound sources, whether familiar or fictitious.

Xenakis moves beyond such a metaphor to a loftier aim: he explains that his explorations in musical composition grow out of an 'overriding need to consider sound and music as a vast potential reservoir in which a knowledge of the laws of thought and the structured creations of thought may find a completely new medium of materialization, i.e., of communication' (1971, p. ix). He continues: 'the quantity of intelligence carried by the sounds must be the true criterion of the validity of a particular music.' What I find fascinating about this approach is that it argues in favour of maintaining, and extending, the links between music and our physical world: not only for reasons of perception, but in order to refine our thinking about the world in general. It is interesting that Xenakis's use of stochastic organization, like Ligeti's and Stockhausen's, grew out of a dissatisfaction with the perceptual deficiencies of serialism, a mid-20th-century development in compositional technique. It cannot be coincidental that these three renowned composers returned to methods of organization that relate directly to models from the physical world. Our perception evolved to understand the physical world; it seems that our perception of musical illusions is thoroughly conditioned by its upbringing.

Reflections and Summary

The uses of imagery in the compositional process seem extremely variable, but it is clear to me that all types of imagery are used by some composers, and some types by most of us. The imagery employed by a composer while thinking about music may contain vestiges of visual, kinaesthetic, auditory, and even visceral aspects, even though the intended musical configuration is meant to be an abstract sonic design. This is not surprising, given that the attempt to dissociate the sound from its sound source is in contradiction to logic; although art is often artificial, asking us to suspend our natural expectations, we should not be surprised at the evidence of the props when examining the backstages of the production. After all, it is natural that a composer who is working to create convincing illusions of line, space, mass, and movement would consult his or her own knowledge about the physical world and the relationship between the behaviour of objects, beings, and processes, and the sounds they emit. Bregman (1990), for example, argues that our hearing, in its full cognitive sense, evolved in response to our need to analyze all sensory data from the environment. Therefore, our strategies for grouping and segregating sounds can be seen as based in a constant drive to identify the sources of sounds and their behaviour, significance, and interconnectedness. Godøy, in a similar vein, argues that our knowledge of the sources and behaviours of sounds in the physical world conditions our musical listening by imbuing the sonic events with the characteristics of their plausible sound sources (Godøy, 1997a) and extends this into the compositional realm by suggesting that composers incorporate kinematic images of sound production (Godøy, 1998).

Although many composers work much of the time in 'purely musical' terms, solving problems of harmony and rhythm in a way similar to that of a mathematician solving equations, they may still depend upon non-aural images for most of their thinking about the music. Notation, graphic sketches, analytical symbols and visual patterns are all very useful and efficient coding systems, so many composers supplement the auditory image with a visual correlate, which may be written down or retained as a purely mental image. Improving the mental manipulation of the musical material through auditory, visual and cross-modal imagery is crucial to the composer's development; the discovery of particularly fertile correlations between the imagery and the sonic designs can in turn stimulate more flexible and powerful organization of the music. As one of the most difficult aspects of composing for many of us is not the formulation of basic musical ideas but rather their refinement and subsequent notation onto paper, anything that can facilitate that process is treasured. The appropriate use of imagery can help retain the essential characteristics of the idea while the necessary details are chosen which will permit the full expression of those ideas into music.


This paper does not offer conclusions, but rather a survey from an 'insider' in a field that has understandably been shrouded in mystery: semi-conscious artists, not always verbally articulate or consistent, creating ephemeral designs whose trace lasts only in the memory of those who have the information to decode what they hear. The survey is not all-encompassing: the composers about whom I have read and with whom I have talked do not constitute a fully representative group. I have focussed on 20th-century composers, as I feel most secure with understanding their language and context. Jazz has largely been ignored, not due to any lack of interest but because the issues are different; not only is much of the composing done in real time, but joint creative collaborations are typical. In addition, those who are composing in non-Western cultures have not yet been considered; an extreme sensitivity to differences of culture would be required to avoid posing questions which are culturally specific - even the identification of one individual as composer can be a foreign concept. Despite these omissions, I hope that the information presented here will help those who are in a better position to analyze and contextualize to understand something of the magnitude of the complexity of the issue of auditory and cross-modal imagery in composition. It's a complicated job, creating effective illusions of imaginary sonic objects moving through time, but according to all those involved - composers, performers, and listeners - it seems worth the effort!

Notes

1. The first issue of the 'Armchair Researcher' project was designed to help me in this investigation. Copies were distributed to friends, colleagues, and students in Europe and North America (particularly the University of Aveiro in Portugal, the University of Salford in England, Concordia University in Montreal and participants at the CMI-99 in Oslo, Norway). The author would like to thank all of those who took the time to reflect and respond. Copies of the questionnaire are available from the author.

2. Sadie (1980, p. 709) says that this is an exaggeration; the overture was finished two days before the concert. He sidesteps the issue of when it was begun, but as overtures give a preview of music to follow, one can assume that the first mental sketch began with the first idea for the opera in its entirety (incidentally confirming the non-sequential order of composing). In the same article (p. 681), he also mentions that many of the anecdotal reports of Mozart's childhood achievements were taken from reports written after his death by his sister and a friend.

3. Another inconsistency is remarked on inadvertently by Stravinsky, who comments that Schoenberg composed the violin part of his Fantasy first, and then added the piano part based on the violin's content (Stravinsky, 1970, p. 61).

4. The excitement with which many composers greeted the advent of electronic instruments was due to the elimination of this necessity of translation, as the original sonic idea could theoretically be reproduced with great fidelity to all its nuances, without having to be mediated by physical limitations of instruments and performers. Unfortunately, there are two tremendous obstacles to this process: the time required to arrive at the desired sound, and the amount of extraneous sounds which may have to be heard in the process, both of which can interfere with the integrity of the remembered sonic image in the composer's mind.

5. One should perhaps keep in mind, however, that Stockhausen also believes we are all 'transistors in the literal sense', so the theme of divine inspiration returns in a modern, slightly modified, guise (Cott, 1974, p. 24).

6. The representation is more complex with more than one instrument, as a high note in one may be below a low note in another, but even the orchestral score represents the higher instruments above the lower ones of the same family: violin above cello above double bass, piccolo above flute above bassoon.


ROSEMARY MOUNTAIN

7. Interestingly, Cogan (1984, pp. 85-92) attributes the stimulus for his exploration of spectral analysis (Cogan & Escot, 1976) to a quotation from Debussy in a letter about Nuages, where he says that the work was 'an exploration of the different arrangements that a single color can give - as, for example, in painting, a study in gray'. Although this was not necessarily the initial idea, Debussy had not yet completed the work, so we can probably conclude that he found the metaphor useful while composing.

8. Laurie Anderson claims that her involvement with music grew out of talking: 'the talking becoming more musical - and now it's becoming more like talking again' (Smith & Smith, 1994).

References

Berio, L. (1985). Two interviews with Rossana Dalmonte and Bálint András Varga. New York: Marion Boyars.
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Carter, E. (1976). Music and the time screen. In J. W. Grubbs (Ed.), Current thought in musicology (pp. 63-88). Austin: University of Texas Press.
Clarke, E. F. (1985). Structure and expression in rhythmic performance. In P. Howell, I. Cross & R. West (Eds.), Musical structure and cognition (pp. 209-236). London: Academic Press.
Clarke, E. F. & Windsor, W. L. (1997). Expressive timing and dynamics in real and artificial musical performances: Using an algorithm as an analytical tool. Music Perception, 15(2), 127-152.
Cogan, R. (1984). New images of musical sound. Cambridge, MA: MIT Press.
Cogan, R. & Escot, P. (1976). Sonic design. Englewood Cliffs, NJ: Prentice-Hall.
Cohen, A. J. (2000). Film music communication: Perspectives from cognitive psychology. In J. Buhler, C. Flinn & D. Neumeyer (Eds.), Music and cinema (pp. 360-377). Middletown, CT: Wesleyan University Press.
Cott, J. (1974). Stockhausen: Conversations with the composer. London: Picador.
Erickson, R. (1955). The structure of music. New York: The Noonday Press.
Godøy, R. I. (1997a). Chunking in music theory by imagined sound-producing actions. In Proceedings of the third triennial ESCOM conference (pp. 557-562). Uppsala: Uppsala University.
Godøy, R. I. (1997b). Knowledge in music theory by shapes of musical objects and sound-producing actions. In M. Leman (Ed.), Music, gestalt, and computing (pp. 106-110). Berlin: Springer-Verlag.
Godøy, R. I. (1998, May). Compositional sketching by kinematic images of sound production. Paper presented at the symposium Musical cognition and behavior: Relevance for music composing at the Interuniversity Centre for the Research on Cognitive Processing in Natural and Artificial Systems, University La Sapienza, Rome, Italy.
Johnson, M. (1999, June). Something in the way she moves: Musical motion and musical space. Paper presented at the Conference on Musical Imagery, Sixth International Conference on Systematic and Comparative Musicology, Oslo, Norway.
Kendall, R. A. & Carterette, E. C. (1990). The communication of musical expression. Music Perception, 8(2), 129-164.
Krumhansl, C. (1997). Musical tension: Cognitive, motional, and emotional aspects. In Proceedings of the third triennial ESCOM conference (pp. 3-12). Uppsala: Uppsala University.
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Ligeti, G. (1958). Metamorphoses of musical form. Die Reihe, 7, 5-19.
Ligeti, G. (1983). Ligeti in conversation, with Péter Várnai, Josef Häusler, Claude Samuel, and himself. London: Eulenberg Books.
Lindström, E. (1997). Impact of melodic structure on emotional expression. In Proceedings of the third triennial ESCOM conference (pp. 292-297). Uppsala: Uppsala University.
McAdams, S. E. (1982). Spectral fusion and the creation of auditory images. In M. Clynes (Ed.), Music, mind, and brain (pp. 279-298). New York: Plenum Press.
McAdams, S. E. (1987). Music: A science of the mind? Contemporary Music Review, 2, 1-61.
Mountain, R. S. (1993). An investigation of periodicity in music, with reference to three twentieth-century compositions. (Doctoral dissertation, University of Victoria, 1993). Information accessible at Dissertation Abstracts International, 55, DANN9013.
Mountain, R. S. (1998, April). Sorting out the strata: Auditory scene analysis applied. Paper presented at the Dept. of Music, University of Ottawa, Canada.
Sadie, S. (1980). Mozart, Wolfgang Amadeus. In S. Sadie (Ed.), The new Grove dictionary of music and musicians (Vol. 12, pp. 680-725). London: MacMillan.
Schaeffer, P. (1966). Traité des objets musicaux. Paris: Seuil.
Schoenberg, A. (Strang, G. & Stein, L., Eds.) (1967). Fundamentals of musical composition. London: Faber and Faber.
Sloboda, J. A., Lehmann, A. C. & Parncutt, R. (1997). Perceiving intended emotion in concert-standard performances of Chopin's Prelude no. 4 in E minor. In Proceedings of the third triennial ESCOM conference (pp. 629-634). Uppsala: Uppsala University.
Smith, G. & Smith, N. W. (1994). American originals. London: Faber and Faber.
Stravinsky, I. (1970). Poetics of music. Cambridge: Harvard University Press.
Stravinsky, I. (1972). Themes and conclusions. London: Faber and Faber.
Sundberg, J. (1982). Speech, song, and emotions. In M. Clynes (Ed.), Music, mind, and brain (pp. 137-149). New York: Plenum Press.
Xenakis, I. (1971). Formalized music: Thought and mathematics in composition. Bloomington: Indiana University Press.
Zuckerkandl, V. (1959). The sense of music. Princeton: Princeton University Press.


16

The Musical Imagery of India

Lewis Rowell

The present article differs from most of its companion studies in this volume in its focus on oral representations of music in one of the world's most important old high cultures. 'India', in this context, refers to the common heritage of all the modern states of the Indian subcontinent: Pakistan, India, Sri Lanka, and Bangladesh. I define musical imagery as mental activity (including all its interpretations and descriptions) that arises in response to music as heard, remembered, or imagined.

In her reliance upon oral notations, metaphors, and analogies for the transmission of her music, India exhibits many of the same characteristics found in other oral traditions of world music. The unique qualities of Indian musical imagery may be attributed in part to the depth of her historical tradition and in part to the enthusiasm with which her images have come under examination by philosophers, experts in religious ritual, theatre and literary critics, and speech scientists, as well as musicologists. As a result, a practical core of effective, explicit, and satisfying representations of musical sound is supported by a body of explanation drawn from several related literatures.

Memory, in any oral tradition, is at once the mother of the arts and the inescapable reason why musical information is compressed, organized, divided into units and hierarchies, encoded for storage, and flagged for retrieval with colorful cultural associations. While the link between imagery and mnemonics is a predictable one (as one would expect in an oral tradition of sacred chant in which the correct recitation of scriptures is seen as a manifestation of cosmic power), Indian literature is again unique in having maintained a detailed analysis of the nature and function of memory


for more than three thousand years. Indeed, the Sanskrit word for 'tradition' is smrti (literally, 'that which has been remembered'), as opposed to the kernels of revealed truth embedded in the Vedas (which are called sruti, 'that which has been heard'). This distinction between presentational (sruti) and representational (smrti) cognition is as important for artistic practice as it is for spiritual practice: Direct, unmediated knowledge must always be the goal, but it is to be sought by means of representations. To paraphrase a medieval Jewish saying, 'If you seek to understand the inaudible, listen closely to the audible.'

After a series of preliminary observations that will set forth the ideological context for musical imagery, I will focus upon the major model for musical sound - the vocal tract and the process of articulation. The following section will demonstrate the color and diversity of Indian imagery in several musical dimensions. The article concludes with some observations on musical practice in an oral world.

Preliminary observations

What can be said, in general, about India's musical imagery? First, that the idea and basic image of music comes from 'in here', not 'out there'. The traditional Hindu-Buddhist Weltanschauung regards the world from an inside perspective, with the individual human body as the absolute center of a concentric universe - a center from which a continuous stream of sound emerges in an upward spiral, carried by the breath, amplified as it passes through various chambers of resonance, and transformed into perceivable units by the organs of articulation that serve as the gateway to the outside world. This is the core concept of sound, as described in the following passage.

The mind stirs up the body's fire;
The fire then sets in motion wind;
The wind then, moving through the chest,
Produces pleasurable sound.
As stirred in heart by means of fire of friction,
Less is it than the least; in throat, it is doubled;
And know that on the tongue-tip it is trebled;
Come forth, it is the alphabet! - They say thus.

Maitri 7.11.4-5 (Hume, 1931)

On the strength of descriptions such as the preceding, the human voice, along with its articulatory procedures, came to be regarded as the model for all music, instrumental as well as vocal. There is no suggestion that music is an imitation of external nature, as we find in many other world musics. It is, rather, an emanation of internal reality, that which lies beyond nature.

Second, that Indian music, like most Indian scripts, is a syllabary. The essence of Indian musical imagery is to be found in her oral notations, distinctive collections of syllables by which pitch and rhythmic units are captured for purposes of teaching, learning, performing, remembering, and reception. The Sanskrit word for syllable (akṣara) is usually taken to mean 'imperishable' or 'permanent', and the implicit


message is that the individual syllables that represent clusters of microscopic musical events are the abiding part of any spoken or musical utterance. Aristotle may have had something similar in mind when he conceived of perception in terms of essential and accidental features. The essential features of Indian music are encoded into strings of syllables, which have the capacity to attract unto themselves all the accidental or contingent features of music that might otherwise escape the reach of the mind.

Third, verbal representations of Indian musical sound are in no way medium-specific. The characteristic descriptions, analogies, and metaphors draw upon a collection of simple, everyday, down-to-earth objects and concepts: a clay pot, an oil lamp, drops of cooking oil, monkeys, elephants, and tigers - used over and over again in different contexts, with no small amount of humor, and often in the most outrageous metaphors:

Because of [this treatise of] Vitthala, may people now cross without fear the ocean of tala (meter) which has become extremely difficult to traverse, owing to the crocodiles of conflict between theory and practice.

Nartananirṇaya 1.260cdef (Sathyanarayana, 1994)

Indian scholarship has cultivated a taste for metaphor and analogy, rather than straightforward explanation: learning is neither an incremental nor a linear process - it comes in moments of revelation when the mind is suddenly flooded with meaning after an encounter with a colorful kernel of truth.

And fourth, images are somehow closer to reality and have a higher ontic status than in the West. If the world itself is a parade of images and our perception veiled by the curtain of illusion known as maya, on what grounds can we distinguish between one category of imagery and another? Images of musical sound - or of anything - are something more than convenient, disposable tools for the imagination and the memory: they are the mind's only contact with the reality they represent.

Fifth, oral notations are by no means the exclusive property of nonliterate musical traditions. Syllable notations have proved valuable in nonliterate and preliterate societies, but they are equally valuable as codes in bodies of oral literature - especially in contexts where written knowledge has been imperfectly preserved and musical scripts have been regarded as inherently defective. Nor is the notation in any way intended to convey the full details of any musical utterance: a syllable of musical notation - heard, pronounced, or imagined - is a flag for the memory, within which more detailed versions of musical events have been stored.

And finally, we may legitimately ask whether images are means or ends. For many Indian and Western listeners, music is assumed to be a symbol of emotion, triggering a stream of visions, dreams, and memories. In this view, music is always about something and has a content. I would like to suggest that, while Indian imagery is often tinged with cultural associations and emotional flavors, its essential purpose is to reduce the load on the memory. Indian authors are fond of pointing out that in a state of ignorance the mind is occupied with its own contents and how they are being processed; but in a state of knowledge, all such mediation disappears.

I will return to the practical consequences of these issues in the final section of this essay.


Phonetic representations of musical sound


In the early history of Indian music and musical thought, music - as a topic - was often expounded within the context of the literature of articulatory phonetics, and Indian scholars make a strong case that the conceptual framework and guiding imagery for their music arose primarily from speech science and the performance manuals for sacred chant. Ancient traditions of sacred music were almost always recorded more diligently than secular music, in India as in the ancient and medieval West, so the picture we see is perhaps drawn from an institutional and ideological perspective. Nevertheless, the need to devise concepts by which ritual music could be preserved and transmitted was one of the major reasons for the rise of musical systems, their mental and verbal representations, and their notations. In the case of India, the accomplishments of the ancient phoneticians and grammarians are well-known and rightly celebrated.

The primary model for both the sounds of the Sanskrit language and the sounds of Indian music was the vocal tract - and, more specifically, the order and structure of the Sanskrit alphabet with respect to place of articulation (see Table 1). Imagine a human head drawn around the display of the 48 Sanskrit morphophonemes in Table 1, in profile and facing toward the right. The sounds of Sanskrit have traditionally been conceived and displayed in the form of a linear grid with (a) various subcategories of vowels and consonants as the vertical axis and (b) place of articulation as the horizontal axis: from left to right, the throat (velar), the palate (palatal), the roof of the mouth (retroflex), the teeth (dental), and the lips (labial). The order of the Sanskrit alphabet, which was determined more than 2500 years ago, is a straightforward progression through the vowels, diphthongs, and consonants - from top to bottom within each category of sounds and moving gradually from the back to the front of the mouth. The individual letters and the sounds they represent follow the path of sound as it emerges from the inner regions of the vocal tract and reaches the outside world. Many of the consonants require very subtle distinctions, especially the five nasals, and this is one of the reasons why Indian chanters and singers are so precise in their articulation - remarkably so when compared to many other vocal traditions in world music.

It is important to point out the unique features of this system: Many languages do very well without alphabets, many alphabets have no phonetic basis, and - for most of those alphabets that do have a phonetic basis - their order and structure is generally quite arbitrary. The order of vowels in the Roman alphabet does follow a relatively consistent path from the back to the front of the mouth, but by what principle are they interspersed with the consonants and by what principle are the consonants ordered? In contrast, the phonetic principles of Sanskrit are reflected not only in the conventional order and structure of the alphabet, but also in what we might call the cultural model for the sounds of speech, chant, and song.

Table 1. The Sanskrit Morphophonemes

                          Velar   Palatal   Retroflex   Dental   Labial
Vowels
  Simple, short           a       i         ṛ           ḷ        u
  Simple, long            ā       ī         ṝ           -        ū
  Diphthongs, short       e       -         -           -        o
  Diphthongs, long        ai      -         -           -        au
Visarga¹                  ḥ
Anusvāra²                 ṃ
Stops
  voiceless unaspirated   k       c         ṭ           t        p
  voiceless aspirated     kh      ch        ṭh          th       ph
  voiced unaspirated      g       j         ḍ           d        b
  voiced aspirated        gh      jh        ḍh          dh       bh
Nasals                    ṅ       ñ         ṇ           n        m
Semivowels                -       y         r           l        v
Sibilants                 -       ś         ṣ           s        -
Aspirate                  h

Note. ¹ Final aspiration. ² Nasal closure.

My point here is that this model of a stream of sound rising in a continuous spiral in the body and then crisply articulated into a string of distinct syllables lies at the heart of Indian musical imagery. Indian singers are constantly aware of the smooth outflow of breath and the need for precise articulation at syllable junctures, and, like Vedic chanters and reciters, they often mimic the internal path of sound with hand gestures. The result is an image of sound as fluid continuity, with minimal but distinct articulations, as described in the two following analogies:

Just as a tigress who, when carrying her cubs by the scruff of the neck, holds them securely with her teeth so that she shall not drop them, and yet takes care not to bite them, so it is that one should recite the sacred syllables [of the Vedas].

Nāradīyaśikṣā 2.8.30 (trans. Rowell)

When counting time or musical units, [one should think of] drops of oil, rather than drops of honey, ghee, or water, because of oil's greater viscosity.

(saying)

The logical consequence of this line of reasoning was to extend these analogies to instrumental practice. From early times, Indian authors have noted the resemblance between (a) the grid that locates the articulation of sounds in the vocal tract, and (b) the grid of strings and crosswise frets on string instruments such as the vina, and later the sitar. The Aitareya Āraṇyaka, which was written perhaps as early as 500 B.C.E., draws a set of correspondences between the human body and the vina, with reference to their production of sounds (3.2.5, see Janaki, 1985); and the subsequent development of Indian music demonstrates clearly that vocal sound has become the primary model for all instrumental practice. This accounts for the extraordinary profusion of ornamental


clusters, transitions, and oscillations in the domain of melody, a cultural preference that has virtually eliminated the distinction between vocal and instrumental style.

Categories of imagery

Samples of five different categories of imagery are presented in the following pages: metrical groups (in poetry and in music), rhythmic syllables, pitch syllables, ornamental phrases, and distinctions of timbre. Each of these examples will permit us to view Indian musical imagery from a different perspective; together they will demonstrate something of the range and characteristic flavors of this body of imagery. The five categories have one common feature: each one is a representation of what we might call a unit of performance (a metric or rhythmic pattern, a duration, a note or tone, an ornament, a tone color) - anything that can be conceived as a unit and therefore stored in the mind as a single image. I shall begin with syllabic units and proceed to other types of imagery.

Figure 1 and Figure 2 contain samples of the metrical patterns of Sanskrit verse and the rhythmic patterns of tala (the system of musical meter, as manifested by hand gestures and recited syllables). Both metric and rhythmic patterns have often been described with reference to the gaits of various animals and birds, as shown in Figure 1: an elephant (gaja), swan (hamsa), lion (simha), horse (turaga), or the mythical sarabha (an eight-legged Himalayan deer). Medieval Sanskrit verse is often framed in long lines that depend on punctuation for their internal structure, either by caesura (as in sarabhā) or grouping boundaries (as in sarabhalalita, which differs from sarabhā only in its lack of caesuras). This type of imagery is of course reserved for meters that depend on internal repetition and are therefore suggestive of a regular 'gait'; they amount to no more than a small fraction of the recognized poetic meters, of which many are much more irregular.

In contrast, the medieval tala patterns displayed in Figure 1 are much shorter and were probably performed at a faster speed because they were not linked with the conventional tempo of speech syllables. But the principle is the same: continuous repetition of a short, often irregular, sequence of durations. The results can be described as additive (as opposed to divisive) rhythm: uneven groupings of a lowest common denominator (as opposed to even divisions of a highest common multiple). The point here is not whether these are accurate representations of animal gaits - perhaps they are, perhaps they aren't; Indian authors saw no need to argue the question. From the perspective of an oral tradition, the issue is how effectively can these patterns be demonstrated, learned, and remembered? Poetic and musical meters were set down, not in the longs and shorts of verse and the Western rhythmic notation shown in Figure 1, but in syllable codes that reduced the information to forms that could be more easily recited and remembered (see Rowell, 1992, pp. 215-221).

Figure 2 is a syllabic display of the structure of the very popular modern Hindustani tala Tintāl, a tala that is likely to be performed in most concerts of North Indian music: a total of 16 beats (each marked by a clap, wave, or finger count) divided 4 + 4 + 4 + 4, with a wave of the hand on beat 9 instead of the claps that initiate each of the other groups of four (beats 1, 5, and 13). Each of the 16 beats

Classical Sanskrit meters:

gajagati (elephant-gait)
hamsagati (swan-gait)
sarabha (a male sarabha)
sarabhā (a female sarabha)
sarabhalalita (like a playful sarabha)

Medieval talas:

gajalīla (elephant-play)
hamsanāda (the call of a swan)
sarabhalīla (sarabha-play)
simha (lion)
simhalīla (lion-play)
simhanāda (roaring lion)
turagalīla (horse-play)

[long-short patterns and rhythmic notation omitted]

Figure 1. Poetic and musical meters named after animal gaits.

is further represented by a drum syllable, as shown on the lower line of the example. This is how North Indian talas are conceived today - as strings of drum syllables that are differentiated by the various phonetic oppositions of the parent language (Sanskrit and the local languages) and embody the physical patterns of energy in the tala. In the case of Tintāl, each cycle is often molded into two crests of energy that discharge on beats 1 and 9 and thus establish a norm from which the subsequent performance variations deviate and then return.

Here is an extremely important point: These syllables are more than drum syllables - they are abstract phonetic patterns that can be recited, played, and danced. They are distinguished from each other by impact, quality, aspiration, closure, and other phonetic features. They are not medium-specific, and, because not many syllables are available, there can be no one-to-one correspondence between (a) a sound and (b) the action that sound represents. The sounds and actions are associated only by tradition and convention, although they seem natural enough when performed in context.

But what do the syllables represent? Phonetic oppositions such as long versus short, open versus closed [syllables], aspirated versus unaspirated [consonants], voiced versus voiceless [consonants], velar versus dental [articulation], and the like can be translated into such things as left versus right [drumstrokes, arm or leg movements], ringing versus damped [drumstrokes], up versus down [pitches, dance steps], one

X                   2                   0                   3
dha dhin dhin dha   dha dhin dhin dha   dha tin tin ta      dha dhin dhin dha

X, 2, and 3 = claps; 0 = right hand waves to the right

Figure 2. The structure of Tintāl

drumhead versus another, forward versus back, and many other possibilities. Surprisingly enough, dynamic variations of loud versus soft are excluded from this long list of performance variables, on the probable grounds that immediate dynamic contrasts have never been a major feature of Indian performance style. A highlight of an Indian dance performance occurs when the dancer or dancing master dictates a rapid sequence of syllables, and the dancer and drummer then simultaneously translate the syllables into their own physical and sonic languages.

With Table 2 we enter the domain of pitch, and here - once again - we find an oral notation consisting of syllables, the so-called sargam notation, which has been in use for more than two thousand years. (The name sargam is a composite of the first three syllables: sa, ri, and ga.) The third vertical column in Table 2 displays the performing syllables for the seven degrees of the diatonic scale - sa, ri, ga, ma, pa, dha, and ni. (Elaborated and inflected versions of these syllables are used for tuning purposes in both the Hindustani and Carnatic musical systems, but these are not used in performance.) The point I wish to emphasize here is the flexibility with which these syllables are affixed to complex clusters of musical information. The scale degrees are called svaras, a concept that will be examined in the final section of this essay.

The sargam notation has often been referred to as a sol-fa system and the obvious Western parallel is the set of hexachord syllables devised by the eleventh-century Guido of Arezzo: ut, re, mi, fa, sol, la. But there is a profound difference between the two systems: Guido's syllables were derived from an actual piece of music and represent fixed interval relationships - from ut (or do) to re is always a whole step or, more technically, a major second interval. But the svaras were and are not specific with regard to their pitch. They represent what we might call generalized degrees of the scale, and, while sa and pa are fixed in their location, the others vary in their tuning from raga to raga.

Table 2. The Seven Svaras (sargam notation)

Degree   Name        Abbreviation
1        ṣaḍja       sa
2        ṛṣabha      ri
3        gāndhāra    ga
4        madhyama    ma
5        pañcama     pa
6        dhaivata    dha
7        niṣāda      ni

Every performance of Indian music will at some point include a section or sections

Description:

rolling like a pearl
like the gait of a drunken bull elephant
associated with tender luster
playful
trembling, as if under a heavy burden
rolling like waves of the Ganges
swirling like water in a half-filled jar
a high note touched quickly like a flame
a ball tossed and caught repeatedly
resounding like a bell
swinging
rocking
stumbling
increasing and then decreasing (in volume)
circling like the vortex of a whirlpool
zig-zag, as in a flash of lightning

Figure 3. A selection of ornamental phrases (sthāyas) from the Saṅgītaratnākara (ca. 1240)

of music based upon the sargam syllables, whether sung by a singer or imagined by an instrumentalist. And, even after their tuning has been specified in a particular raga, the syllables are not restricted to single pitches but may also represent ornaments and ornamental clusters. In the great South Indian raga Toḍi, for example, the syllable ga is often performed as an oscillation that surrounds (but never touches) its own pitch location, reaching into the territory of its lower and upper neighbors. This - for me - is one of the defining features of the musical imagery of India: the economy with which minimal phonetic signals (the sargam syllables) can become associated with complexes of musical information through demonstration and reinforcement, and thereby be preserved and insured against loss. I will amplify this point in the final section of this essay.

The two remaining examples extend the reach of musical imagery beyond the syllable, and into the ornamental and qualitative domains of Indian music. Figure 3, which is based on information from an authoritative thirteenth-century music treatise (see Shringy & Sharma, 1989, pp. 175-198), is a partial list of the 96 sthāyas, a word that can be best translated as 'ornamental phrases'. Jazz musicians have something similar in mind when they refer to their repertoire of 'licks' - musical figures that can be performed in a variety of scales and contexts. The attempt here was to suggest a set of models for melodic development, and the solution found was to represent these musical patterns by vivid extramusical imagery. The analogies obviously require live demonstration, but once learned, never forgotten. Many of the same figures are still learned today by students of Indian music as the basis for their improvisatory technique.

Figure 4, my final example, is an interesting case and something of a puzzle.

298 THE MUSICAL IMAGERY OF INDIA

brilliant (dīptā): sharp (tīvrā), fierce (raudrī), thundering (vajrikā), fearful (ugrā)

extended (āyatā): like a lotus (kumudvatī), passionate (krodhā), pervasive (prasāriṇī), inflaming (sandīpanī), adolescent (rohiṇī)

moderate (madhyā): metrical (chandovatī), charming (rañjanī), purifying (mārjanī), enamoured (raktikā), restful (ramyā), excited (kṣobhiṇī)

tender (mṛdu): low (mandā), lovely (ratikā), beloved (prīti), patient (kṣiti)

compassionate (karuṇā): compassionate (dayāvatī), conversing (ālāpinī), maddening (madantikā)

Figure 4. Distinctions of timbre from the Saṅgītaśiromaṇi (C.E. 1428)

When phonetic scholars began to study the melodic basis for Vedic chant, perhaps as early as 3000 years ago, it became clear to them that the tonal distinctions between one note and another were too complex to be captured under the category of pitch. So they conceived the chant melody as a 'unified field' within which distinctions of pitch, dynamics, and vocal quality (timbre) were intermingled: one note, for example, could be higher, louder, raspier, more tense, or more sharply impacted than another. The same distinctions were subsequently applied to musical melody, but seem to have been misunderstood by later writers and eventually fell into disuse (Ramanathan, 1980). There is some evidence that early analysts of Western chant experimented with similar solutions, but they too gave up the attempt in favor of notations that privileged pitch.

Figure 4 is a display of the so-called śruti-jātis, which were divided into five main categories (in the left margin) and 22 subcategories. The word śruti refers to a microtone or any small distinction between one tone and another; jāti means 'species' or 'type'. The consensus of most Indian scholars today is that these were measures of tone quality, not pitch, as single syllables were brightened, tensed, relaxed, or inflected in other qualitative ways - suggesting an entirely new kind of music that the conventional pitch notation was unable to represent.

LEWIS ROWELL 299

The five main species can be grouped into two 'elevated' tones (one characterized by brightness and the other by some type of sustained intensity), one intermediate tone, and two 'depressed' tones, whose properties are not as clearly defined. The original five were probably sufficient for recitation, but were later expanded to 22 - perhaps because the musical octave was divided into 22 microtones (also called śrutis), and the authors reasoned that if there were 22 separate regions of pitch, there should also be 22 distinctions of timbre. Once again, these distinctions could have been taught only by demonstration.

It is not surprising that this set of timbral distinctions never developed into a comprehensive theory of musical tone quality, given the lack of logic in the various subcategories: 'maddening' may not have the same range of meaning that it has for us today, but neither does it seem to belong in a depressed category; and it seems strange to find both 'restful' and 'excited' within the same category. The tradition was apparently lost at some point, and later authors tried unsuccessfully to convert it to a set of pitch distinctions (see Te Nijenhuis, 1992, pp. 91-95). It represents one of those blind alleys in musical thought that can easily lead later interpreters astray, and this is, unfortunately, one of the problems with an oral tradition. The other two things to note in this scheme are (a) the tendency toward hyperbole and (b) the amorous imagery, which occurs in at least five subcategories. Both are staples of Indian metaphorical language, which tends both to exaggerate and to link musical qualities with emotional content.

Conclusions: Musical practice in an oral milieu

Readers will be aware that this essay reflects the methods and perspectives of classical indology, which are in turn based primarily on the insights of literati from diverse branches of Indian learning. The preceding section, with its emphasis on examples of imagery drawn from the Indian musicological literature, has served to demonstrate both the characteristic flavors of Indian musical imagery and the ingenuity with which musicians and scholars sought to develop and promote mental and oral representations of music appropriate to the prevailing tradition. But theory, as Indian scholars have consistently maintained, is barren unless it leads to practice. The focus in this final section is upon practical music-making - composition, improvisation, teaching, learning, performance, and listening - and its attendant imagery.

To begin with, I should like to put forward the following propositions, which have been working assumptions throughout this study: first, that musical imagery is a product of culture; second, that imagery, from a philosophical perspective, is a form of mediation and will therefore come under the influence of cultural attitudes toward sensation, mediation, experience, and perception; third, that the most difficult kind of imagery to survey is the imagery practiced by listeners; fourth, that testimony is often unreliable, because it is colored by personal attitudes, unconscious biases, and motivations; fifth, that the experience of music will vary widely from person to person, even within the most powerful cultural tradition; sixth, that texted music is a special category, in that it has the power to supplant imagery that would otherwise arise in so-called abstract music; seventh, that one cardinal difference between the music of India and the music of the West lies in the Indian concept of svara; and finally, that the types and principles of imagery illustrated in this essay continue to inform the practice


of Indian music today.

To return to the central question framed by the editors of this volume: what occurs in the minds of performers and, to a lesser extent, listeners, as they hear and imagine music? I have discussed the question with a number of Indian colleagues and was mildly surprised by some of the responses. Two points stand out: (1) the Indian experience of music relies very little on forms of visual imagery, which did not come as a surprise and requires no further comment; and (2) the cultural tradition by which musical notes are conceived, learned, remembered, and apprehended as flexible bands, not fixed points, shapes the entire field of musical imagery.

I once heard a great Indian musician say, 'A note is not a point, it is a region to be explored.' When that remark was quoted in a lecture, a Western colleague, who - like me - spends much of her working day in teaching students that notes are precisely situated and separated by equally precise intervals, said to me, 'Don't tell that to our students!' The single most defining feature of Indian musical imagery is the concept of svara, which teaches that the single musical note is an illusion by which the mind captures a miniature cluster of musical information and represents it in the form of a syllable. The importance of this notion has not escaped Indian authors: the twin concepts of svara (note) and śruti (microtone) have generated an enormous amount of philosophical analysis over the last two thousand years, and each of the major systems of Indian philosophy has addressed the issue from its own perspective (see Rowell, 1992, pp. 149-152). By now, the habit of thinking in svaras has become the central core of, and inescapable context for, the music of India.

Some analogies may help Western readers understand the extraordinary flexibility with which a small set of syllables can represent the hundreds of intonations and pitch relationships in the vast repertoire of ragas. For Western musicians, do - re - mi is a precise sequence of tones that can be sung, played, imagined, or notated on any pitch - but that is entirely their problem; for ordinary listeners, the problem does not exist, although they may be aware of differences in pitch height or the psychological colors of various keys. But in the music of India, the territory and relative stability of each note vary from raga to raga. If musical notes are filters through which our musical thinking and musical experience flow into culturally-authorized channels and groupings, the scales of India require a large supply of such filters - larger perhaps than in any other world music.

Perhaps an even better analogy is to compare the svara syllables to changeable type-fonts on a computer, fonts that differ in size, style, intensity, and scale and permit translation between different languages (e.g., singing, drumming, dancing). Once a particular font has been selected and locked in place, it can specify with great precision a wide range of preset musical microfeatures. In the case of Indian music, the letters in these fonts are the svara syllables, and, while their order remains the same within each raga (as in an alphabet), their tuning, stability or instability, and function will vary in the same way that letters vary in their ability to combine with other letters in a particular language.

The example of Indian music is a valuable one when we consider the possible universality of musical imagery, and it also demonstrates the crucial role of mediation in teaching, learning, and maintaining a musical language. The question has so far been addressed from the perspective of a performer, but it is not as clear what happens in the minds of listeners. Those who claim Indian music as their first language will be familiar with the continuous drone, the ornate melody, the twists and turns of the ornaments, the satisfying stability of sa (the tonic or final for the entire system of ragas), and even some of the special contours of well-known ragas. But it is dubious whether they will automatically translate these sounds into svaras, as trained musicians do. We could say much the same thing with regard to the experience of untrained listeners in the West, except that they will be more prone to interpret single notes as units with fixed locations and hear anything that lies in-between (slides and transitions of all sorts) as performance variables and expressive deviations.

Indian musicians, on the other hand, hear - or claim to hear - known compositions and known ragas as strings of svaras, which filter the flow of data into the mind but in no way diminish their apprehension and savor of the musical details. But no one can know every detail of every raga, and even the most learned musicians will occasionally confront ragas to which they have not been preconditioned. What happens then depends on the individual; my guess is that he or she will translate the svaras on the basis of their resemblance to similar configurations in other ragas.

Much of Indian music consists of texted, vocal music, and the presence of a meaningful text has profound consequences for the mental image of music. Whether a performer is conscious of a text as meaning or as pure phonetic play, it adds another dimension to the mental content. As a result, the mind is more likely to go on 'automatic pilot' with respect to the location of the svaras and all the melodic features of the raga, which have been - in effect - 'locked' into position by the time a performance begins. Since ancient times, Vedic priests have been trained to memorize and recite a text both with and without consciousness of its meaning, as insurance against error.

My final point concerns mediation. For millennia, Indian authors have recognized the value and power of mediation in such things as learning, perception, and memory, but have held - at the same time - that mediation of any kind is an obstacle that bars us from the direct experience of reality. We live in a world of shadows and images, some more substantial and persuasive than others, but they should be known and enjoyed for what they are and not be confused with what they represent. In the traditional Hindu doctrine of the four ends of life (right conduct, material gain, pleasure, and liberation), the immediate goal of music is pleasure (kāma), but the ultimate goal is liberation (mokṣa). It is only through attachment (to the images of the deities and the pleasures of music, literature and the other arts) that detachment becomes a possibility - when one is able to know the real without the mediation of images.

When asked, both Western and Indian musicians of an imaginative turn of mind will happily spin flights of fancy, but when it comes down to the working vocabulary of music - the notes, rhythms, syllables, and ornaments - anything that reduces the mental content is all to the good. Whether what is needed is a link to a composition already stored in memory or a reminder of the melodic structure and metric structure (raga and tala) that will inform and guide each performance, the Indian musician's motto is 'less is more'. And I suspect that most of us who work with Western music from the inside will be likely to agree.


References


Hume, R.E. (Trans.). (1931). The Thirteen Principal Upanishads. London: Oxford University Press.
Janaki, S.S. (1985). The role of Sanskrit in the development of Indian music. Journal of the Music Academy, Madras, 56, 66-98.
Ramanathan, N. (1980). The concept of śruti-jātis. Journal of the Music Academy, Madras, 51, 99-112.
Rowell, L.E. (1992). Music and Musical Thought in Early India. Chicago: University of Chicago Press.
Sathyanarayana, R. (Trans. & Ed.). (1994). Nartananirṇaya of Paṇḍarīka Viṭṭhala. New Delhi: Indira Gandhi National Centre for the Arts.
Shringy, R.K. & Sharma, P.L. (Trans.). (1989). Saṅgīta Ratnākara of Śārṅgadeva, vol. II. New Delhi: Munshiram Manoharlal.
Te Nijenhuis, E. (Trans. & Ed.). (1992). Saṅgītaśiromaṇi: A Medieval Handbook of Indian Music. Leiden: E.J. Brill.


Name Index

Albersheim, G., 108, 114
Alho, K., 93
Allegri, 19, 273
Allerhand, M., 115
Altenmüller, E. O., 78, 79, 92, 253, 268
Annett, J., 120, 129, 130, 132
Arezzo, G., 296
Aristotle, 6, 7, 11, 23
Ashby, F. G., 113, 114
Atkinson, R. C., 45, 54
Auhagen, W., 182, 201
Avener, M., 129, 134

Bach, J. S., 10, 66, 96, 103, 105, 106, 109, 255, 258, 259
Baddeley, A., 44, 45, 46, 47, 54, 55
Baily, J., 132, 132, 267, 268
Baker, J., 183, 251
Ball, T. M., 44, 55
Bangert, M., 253, 268
Barneah, A., 93
Barrass, S., 162, 163, 178
Bateson, G., 123, 133
Baxter, D. A., 52, 55
Beaman, P., 53, 54
Beck, B., 128, 133
Beckett, C., 53, 55
Beethoven, L. v., 13, 18, 19, 57, 59, 255, 256, 261, 262, 263, 264, 265, 266, 268, 278
Békésy, G. v., 102, 114
Bekkering, H., 53, 55
Benade, A. H., 205, 216
Bergmann, G., 23
Bergson, H., 14, 126, 133
Berio, L., 283, 287
Berlioz, H., 202, 216

Berry, D. C., 267, 268
Berthoz, A., 118, 120, 121, 122, 128, 130, 131, 133, 239, 240, 241, 243, 248, 249
Bertrand, O., 40, 41, 42
Besson, M., 32, 33, 34, 41, 42, 78, 91, 93
Bideaud, J., 123, 133
Bilsen, F. A., 212, 217
Bismarck, G. von, 162, 163, 178, 202, 216
Block, N., 6, 23
Boer, E. de, 100, 114
Bolinger, D., 139, 142, 158
Bolz, M. G., 53, 54, 121, 134
Boring, E., 10, 18, 24
Bortz, J., 108, 114
Boulez, P., 274
Bouveresse, J., 118, 133, 250
Bradshaw, J. L., 186, 189, 199
Brahms, J., 267
Brammer, M. J., 239, 249
Bregman, A., 108, 114, 239, 244, 249, 284, 285, 287
Brentano, F. v., 2, 7, 10, 11, 12, 13, 15, 23, 24
Brodsky, W., 185, 199
Brooks, L., 50, 54
Brown, S., 20, 26, 159
Bruhn, G., 100, 114
Buechler, S., 138, 159
Buelow, G., 158, 158
Butterworth, G., 127, 133

Cadoz, C., 239, 249
Cage, J., 282
Caivano, J. L., 162, 163, 178
Calvert, G. A., 239, 249
Camurri, A., 74, 75
Canazza, S., 201, 204, 205, 215, 216
Cariani, P., 59, 69, 75, 100, 101, 114
Carreras, F., 64, 65, 66, 69, 75
Carrier, M., 6, 24
Carrol-Phelan, B., 53, 54, 123, 133
Carter, E., 280, 287
Carterette, E. C., 202, 217, 280, 287
Casey, E. S., 11, 24
Chauvel, P., 34, 41
Chion, M., 14, 24, 126, 133
Chisholm, R., 11, 24
Chomsky, 187
Chopin, F., 182, 188, 189, 192, 194, 197
Clarke, E. F., 139, 158, 186, 187, 199, 280, 287
Clarke, J. M., 6, 24
Clarkson, D., 93
Clementi, 267
Clifton, T., 13, 24
Clynes, M., 139, 158, 187, 199
Cogan, R., 108, 109, 114, 276, 287, 287
Cohen, A. J., 279, 287
Cohen, D., 4, 78, 93, 138, 139, 141, 142, 143, 144, 158, 158
Cohen, E. A., 99, 114
Coles, M. G. H., 33, 41, 93
Corelli, A., 214
Costall, A., 127, 133
Cott, J., 274, 282, 283, 286, 287
Cowan, N., 39, 41, 49, 55
Crammond, D., 132, 133
Crow, H. J., 34, 42
Crowder, R. G., 20, 21, 23, 24, 25, 44, 54, 113, 114, 203, 205, 217, 253, 268
Curtis, S., 40, 41
Czternasty, C., 34, 41

Dali, S., 10
Davies, J. B., 23, 24
Decety, J., 120, 129, 130, 133
Deecke, L., 123, 133
Delalande, F., 132, 133
Deleuze, G., 126, 133
Delgutte, B., 100, 101, 114
Demany, L., 101, 114
Dennett, D. C., 6, 24
De Poli, G., 201, 204, 205, 209, 214, 216
Descartes, R., 6, 7, 11, 24

Deutsch, D., 48, 49, 54
Dierks, T., 29, 41
Di Frederico, R., 216
Di Pellegrino, G., 120, 129, 133
Donchin, E., 33, 41, 79, 93
Doubleday, C. N., 93
Dowling, W. J., 139, 159
Drake, C., 191, 199
Dretske, F., 123, 133
Drioli, C., 216
Dubnow, S., 138, 143, 158
Duchesneau, L., 19, 24
Dünnwald, H., 201, 216
Dyson, M. C., 110, 116

Ebbinghaus, H., 18, 24
Ebhardt, K., 186, 199
Echallier, J. F., 40, 42
Eckensberger, L., 92, 93
Edelmann, G. M., 252, 253, 268
Edinger, S. C., 32, 42
Edworthy, J., 48, 49, 50, 51, 55
Eggermont, J.-J., 59, 60, 75
Ehrenfels, C. V., 19, 24
Ehret, G., 62, 74, 75
Elbert, T., 40, 41
Engelbrecht, S. E., 250
Enns, J., 177, 178
Erez, A., 78, 93
Erickson, R., 273, 276, 287
Ericsson, K. A., 53, 54
Eulitz, C., 40, 41
Evans, A., 76
Evans, A. C., 32, 40, 42, 55, 250

Fabiani, M., 78, 79, 93
Fadiga, L., 120, 133
Fairchild, M. D., 163, 178
Faita, F., 33, 41, 91, 93
Farah, M. J., 44, 54
Farley, G. R., 114
Fillmore, C., 119, 133
Filz, O., 32, 42
Finke, R. A., 28, 41, 44, 54
Finney, S. A., 186, 189, 199
Fleischer, H., 99, 114
Fletcher, N. H., 97, 114
Fogassi, L., 120, 133
Fokker, A. D., 15, 24
Folkmann, S., 159
Fónagy, I., 139, 142, 147, 159
Formisano, E., 29, 41
Fortner, B., 162, 165, 178
Frackowiak, R. S. J., 29, 41
Freed, D. J., 249, 249
Friberg, A., 159
Friederici, A. D., 33, 41
Friston, K. J., 29, 41
Fryden, L., 159
Fukuhara, H., 54
Fusella, V., 20, 25

Gabrielsson, A., 139, 147, 159, 186, 187, 199
Gallese, V., 120, 133
Garner, W. R., 113, 114
Gates, A., 186, 189, 199
Georges, K. E., 6, 24
Giannakis, K., 4, 161, 176, 178
Gibson, E., 33, 42
Gibson, J., 121, 133
Giguère, C., 115
Gjerdingen, R., 125, 133
Glasersfeld, E. von, 119, 133
Godøy, R. I., 1, 5, 20, 64, 75, 95, 96, 114, 124, 126, 127, 128, 133, 182, 237, 241, 245, 249, 284, 285, 287
Goebel, R., 29, 41
Goethe, J. W., 240
Goldman-Rakic, P. S., 29, 32, 41
Goldstein, J., 101, 116
Gómez de la Serna, R., 10, 24
Goude, G., 202, 217
Granot, R., 93, 141, 144, 158
Gratton, G., 93
Gray, W. D., 250
Greenberg, S., 100, 114
Grey, J. M., 162, 163, 178, 202, 216
Griffiths, T. D., 29, 41
Gromko, J., 126, 128, 133
Grossberg, S., 108, 114

Guiard, Y., 253, 269
Gunter, T., 33, 41
Gurwitsch, A., 16, 23, 24

Haberlandt, K., 241, 249
Hacohen, R., 139, 159
Halgren, E., 34, 41
Halpern, A. R., 32, 40, 41, 44, 54, 55, 76, 123, 135, 186, 199, 250
Hamel, R., 22, 25
Hampson, J., 53, 54, 123, 133
Hampson, S., 40, 41
Handel, S., 244, 249
Hanslick, E., 139, 159
Hantz, E. C., 33, 41, 78, 93
Harnad, S., 240, 249
Hartmann, W. A., 100, 102, 114
Harwood, D. L., 92, 93, 139, 159
Hatta, T., 52, 54
Healey, C., 177, 178
Heath, R. G., 138, 159
Helmholtz, H. v., 99, 114
Henik, A., 185, 199
Herkner, W., 202, 216
Hesse, H. P., 100, 114
Hevner, K., 202, 216
Hewitt, M., 102, 115
Hirose, T., 54
Hishitani, S., 52, 54
Hitch, G., 46, 54
Hoffman, R., 118, 133
Holcomb, P. J., 33, 42
Holdsworth, J., 115
Honeck, R., 118, 133
Honegger, A., 282
Horst, J. W., 114
Houde, O., 123, 133
Houtsma, A. J. M., 100, 111, 114, 203, 216
Hubbard, T. L., 20, 24, 44, 54
Hulse, S. H., 254, 269
Hume, D., 8, 24
Hume, R. E., 290, 302
Husserl, E., 2, 6, 11, 12, 13, 21, 23, 24, 113

Ikeda, K., 54
Immerseel, L. M. van, 102, 114
Inbar, E., 4
Ingarden, R., 13, 14, 24
Intons-Peterson, M. J., 52, 53, 54, 96, 114, 123, 134, 253, 269
Ivarson, P., 21, 24
Iversen, S. D., 239, 249
Izard, C. E., 138, 159

Jackendoff, R., 119, 123, 134, 276, 287
Jackson, M. C., 29, 41
Jackson, R., 162, 163, 165, 178, 178
Jacobson, E., 122, 134
Jaensch, E., 19, 24
Jairazbhoy, N. A., 89, 93
James, W., 8, 23, 25, 122, 123, 125, 130, 134
Janaki, S. S., 293, 302
Janata, P., 2, 21, 27, 32, 33, 34, 41, 78, 93, 238
Jandl, M., 29, 41
Javel, E., 100, 114
Jeannerod, M., 120, 122, 130, 131, 132, 134, 238, 246, 249
John, E. R., 34, 42
Johnson, D. M., 250
Johnson, M., 118, 119, 120, 134, 240, 249, 252, 253, 269, 284, 287
Jones, D. M., 53, 54, 55
Jones, M. R., 121, 134
Jost, E., 201, 216
Juslin, P. N., 139, 159
Jørgensen, H., 1, 181

Kaernbach, C., 101, 114
Kalakoski, V., 2, 43, 50, 51, 52, 53, 54, 55
Kananen, W., 33, 41, 93
Kanner, A. D., 159
Kant, I., 8, 9, 10, 11, 25, 110, 118, 134
Karis, D., 93
Katzir, Z., 139, 142, 159
Kaufmann, L., 40, 41
Keidel, W. D., 101, 102, 114

Keller, T. A., 49, 55
Kendall, R. A., 202, 217, 280, 287
Kerzel, D., 53, 55
Kessler, E., 63, 64, 75
Kintsch, W., 53, 54
Klein, M., 78, 93
Kliever, J., 116
Klinger, E., 6, 25
Knepler, G., 19, 25
Knight, W., 177, 179
Knops, L., 127, 134
Kochmann, R., 19, 20, 25
Koelsch, S., 33, 41
Köhler, W., 18, 23
Kolinsky, R., 53, 55
Kosslyn, S. M., 20, 25, 28, 39, 41, 44, 55, 95, 96, 115, 117, 134, 238, 249
Krautgartner, K., 201, 217
Kreilick, K. G., 33, 41, 93
Krumhansl, C. L., 21, 24, 63, 64, 75, 255, 269, 279, 287
Külpe, O., 15
Kurth, E., 14, 25, 129, 134
Kutas, M., 34, 41
Kvifte, T., 182, 219, 221, 235

Laeng, B., 267, 269
Lakoff, G., 118, 119, 120, 126, 134
Lanfermann, H., 29, 41
Langacker, R., 123, 125, 134
76
Laufer, A., 141, 159
Lazarus, R. S., 138, 159
Lehmann, A. C., 279, 288
Leman, M., 2, 58, 63, 64, 65, 66, 69, 74, 75, 76, 113, 161, 178, 244, 249
Le Ny, J. F., 117, 134
Leppert, R., 95, 115
Lerdahl, F., 276, 287
Lesbros, V., 164, 178
Lessafre, M., 66, 75
Levinson, J., 185, 199
Libermann, A. M., 122, 134, 240, 249
Lidov, D., 119, 132, 134
Liebermann, P., 139, 142, 159
Ligeti, G., 71, 72, 281, 283, 284, 287
Linden, D. E., 29, 41
Lindström, E., 186, 187, 199, 279, 287
List, G., 139, 141, 159
Liszt, F., 182
Locke, J., 113
Logie, R., 44, 45, 46, 48, 49, 50, 51, 54, 55
Loukopoulos, L. D., 250

Macar, F., 33, 41, 93
MacDonald, J., 240, 249
MacKay, D., 122, 134
Macken, W. J., 53, 54, 55
Magdics, C., 139, 142, 147, 159
Mahoney, M., 129, 134
Mainwaring, J., 20, 25
Marin, O. S. M., 6, 25
Marinkovic, K., 34, 41
Marks, L. E., 163, 178
Martens, J. P., 102, 114
Mattheson, J., 202, 217
Mattingly, I., 122, 134, 240, 249
Mazet, C., 128, 134
McAdams, S. E., 162, 178, 284, 287
McCarthy, G., 93
McDaniel, M. A., 6, 25, 52, 54
McDermott, J., 119, 123, 124, 129, 134
McGurk, H., 240, 249
McKeown, D., 115
McNeill, D., 242, 249
Meddis, R., 102, 115
Menzel, R., 28, 42
Meredith, M. A., 21, 26, 241, 250
Merker, 159
Merleau-Ponty, M., 13, 248
Mersmann, H., 129, 134
Mervis, C. B., 250
Meulenbroek, R. G. J., 250
Meyer, E., 32, 40, 42, 55, 76, 250
Meyer, J., 210, 211, 217
Meyer, L. B., 14, 25, 139, 142, 159
Meyer, T., 162, 165, 178
Michon, J. A., 194, 199
Michotte, A., 127, 134
Miereanu, C., 128, 134

Mikumo, M., 23, 25, 53, 55, 128, 134, 244, 247, 250, 254, 269
Miller, G. A., 221, 235, 247, 250
Mittelstrasse, J., 6, 24
Moelants, D., 69, 75
Mohr, G., 48, 49, 55
Molino, J., 120, 128, 132, 134
Moore, B. J. C., 100, 115
Mossolov, A., 278
Mountain, R., 183, 185, 271, 279, 284, 287, 288
Mozart, W. A., 19, 23, 215, 273, 274, 275, 286
Müllensiefen, 107, 115

Näätänen, R., 6, 25, 30, 31, 40, 41, 87, 93, 110, 115
Narmour, E., 121, 131, 134, 141, 159
Neisser, U., 43, 55
Neuhaus, C., 3, 77
Newmann, E. B., 19, 25
Nunez, P. L., 31, 32, 41, 42

Okuno, O. G., 240, 250

Paavilainen, P., 93
Padgham, C., 162, 163, 178
Paillard, J., 122, 130, 135
Paivio, A., 249, 250
Paller, K. A., 78, 93
Palmer, C., 191, 199
Pantev, C., 40, 41
Park, A., 267, 269
Parlitz, D., 253, 268
Parncutt, R., 279, 288
Patel, A. D., 33, 42
Patterson, R. D., 95, 102, 115
Pechmann, T., 48, 49, 55
Penel, A., 191, 199
Penney, C. G., 47, 55
Peretz, I., 53, 55
Pernier, J., 40, 42
Perrin, F., 40, 42
Perry, D. W., 6, 25, 32, 42, 55, 76, 250
Petsche, H., 32, 33, 41, 42
Pflüger, H. J., 28, 42
Piaget, J., 121, 135
Pickles, J., 74, 76
Picton, T., 31, 41
Piston, W., 202, 217
Pitt, M. A., 20, 21, 25, 203, 205, 217, 253, 268
Plomp, R., 162, 178
Plutchik, R., 138, 159
Poorman, A., 126, 128, 133
Potter, J. M., 212, 217
Praetorius, M., 202, 217
Prandioni, 205, 216
Pratt, H., 93
Pressing, J., 195, 199
Pressley, M., 6, 25

Raatgever, J., 212, 217
Ramanathan, N., 298, 302
Ransdell, J., 124, 135
Rapoport, E., 139, 159
Rappelsberger, P., 32, 42
Ratner, J., 33, 42
Ravel, M., 57
Reber, A., 137, 159
Reisberg, D., 47, 52, 55, 238, 250, 254, 269
Reiser, B. J., 44, 55
Remington, R., 93
Repp, B. H., 139, 141, 159, 182, 185, 186, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199
Reuter, C., 21, 25, 202, 205, 217
Reybrouck, M., 4, 20, 117, 119, 121, 123, 129, 131, 135
Rhode, W. S., 100, 114
Richter, P., 32, 42
Riemann, H., 15, 17, 18, 23, 25, 96, 115
Rinaldin, S., 216
Ritter, W., 34, 42
Rizzolati, G., 120, 133
Roads, C., 178, 178
Robinson, K., 115
Rodà, A., 201, 204, 205, 209, 214, 216
Rollins, M., 123, 135
Rosch, E., 240, 246, 250

Romand, R., 62, 74, 75
Rosch, E., 135
Rosenbaum, D. A., 122, 135, 245, 246, 247, 250
Rosenthal, D. F., 240, 250
Roskos-Ewoldsen, B. B., 52, 54
Rossing, T. D., 97, 109, 114, 115, 245, 250
Rouw, R., 22, 25
Rowell, L., 183, 289, 294, 300, 302
Rubinstein, B., 185, 199
Ruchkin, D. S., 34, 42
Russell, P., 166, 179

Saariluoma, P., 52, 53, 55
100, 102, 115
Sachs, C., 139, 159
Sacks, O., 19, 25
Sadie, S., 286, 288
Salamé, P., 47, 55
Sams, 93
Sandell, G. J., 205, 217
Sartre, J. P., 13, 25
Saslaw, J., 126, 135
Sathyanarayana, R., 291, 302
Saults, J. S., 49, 55
Schaeffer, P., 14, 22, 25, 126, 127, 135, 239, 250, 279, 284, 288
Scheerer, E., 6, 25, 122, 135
126, 135, 252
Scherer, K. R., 139, 158, 159
Schmuckler, A., 255, 269
Schneider, A., 1, 3, 5, 12, 15, 16, 21, 25, 87, 90, 93, 95, 97, 98, 99, 100, 101, 102, 107, 110, 113, 115
Schoenberg, A., 3, 255, 259, 260, 274, 280, 288
Schoner, V., 182, 201
Schouten, J. F., 100, 115
Schreiner, C., 59, 76
Schröger, E., 33, 41
Schubert, F., 264, 265, 266
Schütz, A., 13, 25
Schuierer, G., 40, 41
Schumann, R., 182
Scruton, R., 10, 14, 25
Seebeck, A., 100
Segal, S. J., 6, 20, 25
Seifert, D., 96, 115
Serafine, M., 125, 135
Sergent, J., 57, 76
Sethares, W., 69, 70, 76
Shapira, S., 152, 153, 159
Sharma, P. L., 297, 302
Shaw, G., 60, 76
Shephard, R. N., 113, 115, 240, 250
Shiffrin, R., 45, 54
Shorr, J. E., 6, 26
Shringy, R. K., 297, 302
Silberstein, L. B., 32, 42
Simon, H. A., 220, 235
Simson, R., 34, 42
Singer, W., 29, 41
Slawson, W., 18, 26, 99, 115, 162, 179, 244, 250
Sloboda, J., 23, 26, 139, 159, 253, 269, 279, 288
Smith, A. F., 44, 54
Smith, G., 287, 288
Smith, J. D., 47, 50, 52, 55, 238, 250, 254, 269
Smith, 4, 161, 177, 178
Smith, N. W., 287, 288
Smith, R., 129, 132
Solbach, L., 102, 116
Solomon, 268, 269
Sonenshine, 52, 55
Spillane, J. A., 29, 41
Srinivasan, R., 32, 42
Srulovicz, P., 101, 116
Stadlen, P., 268
Staley, T., 69, 70, 76
Stein, A. von, 32, 42
Stein, B. E., 21, 26, 241, 250
Stern, W., 111, 116
Stoeckig, K., 20, 24, 44, 54
Stockhausen, K., 274, 281, 283, 284
Stravinsky, I., 255, 266, 267, 275, 277, 281, 286, 288
Stucchi, N., 122, 135
Stumpf, 11, 13, 15, 16, 17, 18, 23, 26, 95, 99, 108, 112, 113, 116
Sudnow, D., 247, 250
Sundberg, J., 139, 159, 201, 217, 279, 288
Sutton, S., 34, 42
Swartz, K. P., 33, 41, 93

Takeuchi, A. H., 254, 269
Tanghe, K., 66, 69, 75
Tanguay, P. E., 93
Taub, J. M., 78, 93
Te Nijenhuis, E., 299, 302
Terhardt, E., 98, 99, 100, 116
Tervaniemi, M., 58, 74, 76, 78, 93
t'Hart, J., 100, 115
Tholen, H., 111, 114
Thomas, A., 92, 93
Thompson, E., 135
Todd, M. P. M., 125, 128, 135, 139, 159
Todd, N., 186, 199
Townsend, J. T., 113, 114
Tshaikovsky, P., 279
Tueting, P., 34, 42

Uexküll, J. von, 121, 135

Varela, F., 120, 135
Varèse, E., 277
Vaughan, H. G., 34, 42
Vaughan, J., 250
Verbeke, B., 69, 75
Verkindt, C., 40, 41
Verleger, R., 33, 42, 79, 90, 93
Vidolin, A., 201, 204, 205, 209, 214, 216
Villa-Lobos, H., 278
Villringer, A., 30, 42
Viviani, P., 122, 135
Vogel, M., 11, 15, 26, 96, 116
Voigt, W., 206, 217

Wagner, N., 139, 141, 159
Walker, J., 187, 199
Wallin, N. L., 142, 159
Walter, W. G., 34, 42
Wang, J. Z., 40, 41, 42
Ware, C., 177, 179
Wassmann, J., 92, 93
Watkins, A. J., 110, 116
Weber, R. J., 20, 26
Webern, A., 13, 256, 257, 258
Wedin, L., 202, 217
Weinberg, H., 34, 42
Wenger, E., 164, 179
Whitehead, A. N., 123, 135
Wilbanks, J., 8, 26
Williamson, S. J., 40, 41
Wilson, M., 47, 55, 238, 250, 254, 269
Windsor, W. L., 280, 287
Winkler, I., 6, 25, 30, 39, 41, 110, 115
Wogram, K., 201, 217
Wood, C. C., 93
Wöhrmann, R., 116
Wundt, W., 6, 110

Xenakis, I., 281, 282, 283, 284, 288

Young, E. D., 100, 102, 115

Zannos, I., 97, 116
Zatorre, R. J., 32, 40, 41, 42, 44, 53, 55, 59, 76, 123, 135, 237, 250
74, 76, 102, 116
Zhang, C., 115
Zimmermann, B. A., 10
Zorman, M., 185, 199
Zubin, J., 34, 42
Zuckerkandl, V., 276, 288

Subject Index

abstract (cognition) 4, 12, 17, 22, 28, 29, 59, 61, 74, 75, 96, 118, 120, 124, 132, 161, 183, 187, 231, 239, 240, 272, 274, 279-283, 285, 295, 299
abstract vs. concrete (Schaeffer) 3, 22
abstract (vs. eidetic) 29
abstraction 8, 12, 57, 61, 62, 96, 110, 123, 231, 248, 278, 281
acculturation 2
action 4, 20, 23, 62, 119-132, 186, 198, 237-248, 253, 267, 278, 280, 295
action trajectories 241-243, 245
action unit 241, 243, 246
adaptive listening 89
additive rhythm 294
aerophones 97
aesthetic (experience, judgement, etc.) 9, 10, 13, 106, 107, 112, 156, 188, 204, 215
affordances 121
amplitude modulation 99, 105
analogy 124, 126, 140, 280, 291, 300
analysis by synthesis 243, 247
animate beings 280, 281
anticipation 13, 122, 130, 131, 185, 194, 195
anticipatory actions 246
apperception 5, 7, 9, 11, 16, 78, 96, 108, 110-113
apprehension 6, 110, 112, 113, 301
arranging 181, 247, 248
articulatory phonetics 292
artistic invention 10
atmosphere 271, 279, 280
attention 2, 11, 16, 31, 40, 49, 89, 92, 144, 185, 191, 212, 222, 233, 237, 249, 268, 279
audition 2, 6, 127, 239, 241
auditory brightness 163
auditory cortex 29, 30, 32, 34, 39, 40, 59
auditory density 163
auditory evoked potentials 27
auditory image 27, 28, 44, 45, 52, 62, 95, 96, 102, 105, 110, 112, 139, 155, 161, 249, 253, 273-278, 282, 284, 285
auditory imagery 28, 29, 31, 32, 35, 36, 39, 40, 44-46, 49, 50, 52, 96, 113, 123, 188-190, 219, 238, 252, 275, 277
auditory midbrain 95, 101
auditory nerve 59, 66, 67, 70, 95, 101, 102
auditory percepts 161
auditory scene analysis 239, 244
auditory sensory memory 39
auditory-visual associations 161-164, 175
autocorrelation 101
awareness 11, 59, 78, 81, 124, 131, 197

ballistic (sound-producing actions) 245-247, 249
bandpass filters 100, 102, 103
basilar membrane 102
bells 97-111, 113
body (human) 3, 118-120, 127-129, 131, 241, 252, 253, 256, 257, 290, 292, 293
bottom-up 30, 65, 112, 239
bowing patterns 223, 232
brain 2, 6, 19, 23, 27-40, 44, 58, 59, 61, 62, 74, 78, 81, 91, 92, 122, 130, 131, 161, 238, 239, 253
brain activity 1, 2, 3, 27-40, 77-92, 96, 239
brain areas 28-34, 39, 40, 59
brain-electrical activity 27-40, 77-92
bright (attribute of timbre) 203-216, 245
brightness 18, 99, 163, 205, 209, 298
broadening (of spectral content) 211, 212, 214

carillons 3, 97, 111, 113
categorical perception 12, 87, 90, 91
categorization 52, 66, 96, 125, 128, 132, 138, 240
causality 63, 127
cellular automata 60, 74
central executive 46, 49, 51
character 280, 281, 284
chord (harmony) 7, 11, 15-17, 20, 32-34, 64, 66, 69, 78, 96, 97, 106-113, 142, 145, 188, 243, 244, 251, 252, 255, 256, 261, 265, 267, 275, 276, 284
chordophones 97
choreography 243, 258
chunking 221, 247
circle of fifths 64
coarticulation 245-248
cochlea 74, 100-102, 164
cognitive epoch 90
cognitive linguistics 240
cognitive psychology 43, 45, 95
cognitive science 1, 5, 6
cognitive semantics 118
coherence 60, 63, 69, 245, 249
colour 3, 4, 18, 99, 138, 161-177, 202-204, 211, 213, 215, 252, 263, 287, 294, 300
colour space 162, 172, 175, 178
comparison 11, 12, 15, 44, 48-52, 63, 82, 112, 139, 143, 145, 201, 206, 279
competence-performance distinction 187
completions 34
composition 10, 12, 13, 17, 19, 22, 60, 96, 113, 144, 145, 181-183, 186, 247, 248, 252, 256, 261, 268, 271-286, 299, 301
computer model 58, 96, 102
concepts 8, 9, 23, 61, 62, 90, 92, 97, 112, 118, 124, 138, 280, 281, 291, 292, 300
concrete (as opposed to abstract) 3, 22, 61, 140, 141, 152, 187
concurrence/nonconcurrence 138, 142
conducting 181, 238
consciousness 8-14, 119, 125, 301
consonance 16, 97, 106-108, 111
constraints 2, 3, 22, 58, 60-63, 75, 77, 90, 92, 113, 140, 144, 238-240, 256, 283

context 2, 17, 19, 21, 27, 28, 33, 40, 63, 66, 70, 78, 119, 120, 140, 143, 146, 243, 248, 271, 278-280, 286, 290, 291, 295, 297, 300
context updating 79, 89, 90
continua 12, 100
continuation 35, 37, 39, 108, 141, 142
contour 20, 49, 65, 110, 112, 128, 144, 145, 243, 254, 276, 278, 301
correlation 4, 37, 38, 69, 73, 164, 167, 168, 182, 188, 190, 192, 202, 212, 274, 278, 279, 285
cross-cultural 77, 78, 81, 90, 92
cross-modal 20, 239, 245, 283, 285, 286
cross-modality 3, 21, 239, 241, 248
culture specific 3, 77, 91, 92, 280

dancing 233, 234, 281, 296, 300
dance 3, 222, 226, 232-234, 242, 263, 278, 279, 295, 296
dark (attribute of timbre) 162, 164, 175, 204, 206-216, 245
declarative (knowledge) 96, 126, 241
dimension (feature category) 15, 18, 53, 59, 100, 106-113, 126, 145, 146, 155-157, 161-177, 202, 280, 290, 301
distractor task 49
drawing 20, 110, 126, 231, 272, 276, 277
dreams 7, 268, 291
dual-task paradigm 46, 50

ear-training 22
echoic memory 45, 65

ecological (bases for cognition) 2, 3, 22, 57-75, 182, 238-249
eidetic (images, memory) 19, 29, 123
electroencephalogram (EEG) 31-40, 77, 78, 81
electrophysiological (data) 27-40, 84-92
embodied (cognition) 119, 120, 132, 183, 245, 248, 249, 253
emotional (expression, image, etc.) 3, 4, 11, 137-158, 187, 279
enactive 119-121, 128-130
endosomatic 119, 129
envelope extraction 71
episodic (long-term) memory 64, 69, 71, 74
epistemology 5-8, 23, 124
event-perception 118
event-related potential (ERP) 31-40, 77-92
excitation (generation of sound) 182, 201, 202, 237-249
excitement (emotions) 138-158, 286
exosomatic 119, 128, 129
expectancy 2, 3, 13, 21, 27-40, 63, 78, 89, 91, 92, 112
experiential cognition 119, 120
experiential phenomenology 127, 128
experiment 2, 6, 16, 18-21, 27-39, 44-53, 63, 66, 74, 77-92, 98-113, 138-153, 162-178, 182, 186-198, 203-216, 237, 239, 248, 253-255, 298
expert (vs. amateur) 17, 52, 53, 111, 129, 131, 194, 204, 243, 253, 255, 289
explicit (knowledge) 11, 27, 29, 129, 205, 206
expressive (performance) 30, 60, 74, 137, 139, 182, 185-198, 242, 243, 253, 261, 276, 280, 301

facial expressions 137, 138
factor analysis 82, 202
feature dimension 14
fiddler 182, 221-223

filter bank 100-106
filtering 3, 31, 61, 73
finger movement 231, 245, 253, 254
form, see musical form
formant 128, 163, 206, 209, 213, 243
frequency dispersion 97
frequency modulation 97, 239
frequency shifts 97
frontal cortex 40
functional cycle 121
functional equivalence 21, 58, 72, 131
functional magnetic resonance imaging (fMRI) 30
functional theories (of musical imagery) 44

gah 243, 294, 295, 297
gammatone 102-106
geometrization 96, 97, 112, 113
Gestalt 2, 13, 16, 19, 23, 90, 108, 110, 119, 125, 141, 142, 226, 243, 284
gesture 60, 71, 74, 125, 137, 239, 242, 245, 258, 275, 278-280, 284, 293, 294
gong 97, 98
grouping (of sounds) 182, 183, 258, 285, 294, 300

hallucination 29, 57, 254
Harding fiddle music 182, 219
harmonic fluctuation 243
harmony 7, 16-18, 252, 259, 266, 267, 285
heptatonic scales 79, 80
hierarchy 13, 99, 155, 156, 182, 220-234, 245, 276
history (of imagery) 5-21
hue 162-178
human body 119, 129, 279, 290, 293
human voice 14, 139, 290

ideal objects 12
ideomotor 117-132
idiophones 97, 99
illusion 6-9, 183, 271, 278, 285, 286, 291, 300
image generation 39
image maintenance 39
imagination 6-8, 10, 15, 18, 19, 23, 112, 118, 125, 129, 185, 197, 198, 264, 273, 275, 278, 291

immanent (knowledge) 8
implicit (knowledge) 11, 27, 29, 122, 187
improvisation 12, 14, 29, 181, 247, 299
inanimate objects 280, 281
India 183, 289-301
inferior colliculus 59
information processing 59, 61, 63, 72, 78, 79, 90
inharmonic sounds 3, 95, 97-113
inner ear 47, 50, 53, 57, 102, 105, 138
inspiration 273-275, 280, 282, 286
instrumentation 14, 202, 247, 248, 273, 275, 276
integration (neuronal) 66, 73, 101, 102, 110
integration (perceptual) 99, 101, 111, 123, 239-241, 259
integration (sensorimotor) 121, 122, 130
intentionality 10-14, 119, 131
interference 49, 53, 254, 276
inter-onset interval (IOI) 188-198
intonation 16, 17, 223, 300
intra-modal 20
introspection 1, 2, 23, 58, 96, 197, 223, 230, 237-239, 247, 248
isochrony 195, 197

judgement 10-12, 15, 44, 48-50, 52, 87, 90, 97, 106-108, 112

key center 69
keyboard 17, 22, 53, 97, 103, 183, 186, 187, 189-191, 243, 245, 246, 251-268
kinesthetic 241, 278, 285

language 10, 46, 47, 53, 123, 128, 137, 143, 146, 240-243, 274, 286, 292, 295, 296, 299-301

learning 18, 52, 64, 66, 69, 70, 92, 129, 131, 132, 219, 221, 240, 242, 243, 245, 252, 275, 276, 290, 291, 299-301
lexical 138, 143-157
light intensity 162-175
listening 3, 11-17, 30-32, 39, 47, 59, 67, 79-91, 96, 99, 108-113, 119-121, 123, 126, 129-132, 146, 147, 185, 187, 205, 220, 232, 233, 247, 253, 271, 285, 299
logic 60-63, 73, 74, 285, 299
long-term memory 19, 28, 30, 43, 45, 52, 53, 63, 64, 69, 74, 96, 137, 187
looping 64, 70, 71, 74
loudness 62, 107, 110, 151, 153, 158, 162-175, 201

machine 4, 58, 130
magnetoencephalogram (MEG) 31, 40
mapping 59, 66, 161, 182, 238, 254, 282
mass-spring model 245
mathematics 8, 74, 126
mediation 123, 149, 291, 299-301
melody 13, 20, 29, 32-35, 37, 39, 40, 50-52, 58, 60, 188, 224, 246, 252, 254, 263, 264, 267, 272, 284, 294, 298, 301
melographic analysis 147, 151
memory 1, 2, 7, 9, 13, 18, 19, 21, 23, 28, 29, 30, 32, 39, 40, 43-54, 58-75, 79, 85, 96, 110-113, 124, 129, 130, 137, 182, 183, 185-187, 193, 194, 203, 247, 254, 272-276, 286, 289, 291, 301
mental act 12, 15, 28
mental activity 11, 130, 289
mental imagery 6, 7, 20, 28, 39, 43-46, 52, 53, 58
mental practice 22, 247, 248
mental representation 57, 58, 64, 79, 85, 118, 120, 240
mental scanning 44
metallophone 98, 99
metaphor 6, 10, 14, 57, 128, 138, 165, 182, 183, 203-216, 240, 244, 253, 259, 278, 280-287, 289, 291, 299

metrical 188, 294, 298
microtones 299, 300
mind-body relation 6, 120
mismatch negativity (MMN) 31, 39, 78, 91
missing fundamental 100
mnemonics 289
modalities (senses) 3, 53, 54, 132, 138, 239, 241
model (auditory) 71, 95, 96, 100-103, 161, 244, 248
model (colour and sound) 161-177
model (general) 121, 122, 131, 138, 158, 241, 278
model (imagery and memory) 2, 27, 28, 40, 44-46, 52, 54, 57-74, 237, 245
model (musical form) 223, 225-227, 230, 281, 285, 297
model (perception) 2, 15, 79, 89, 109, 112, 113, 239
model (performance) 192-194, 215
model (sound generation) 238, 239, 249, 290, 292, 293
modes (of vibration) 211
modulation (tonality) 17
mood 158, 264, 272, 278-281
motif 10, 221-232
motor control 53, 129-132, 237, 246, 248
motor equivalence 246
motor imagery 21, 23, 53, 122, 129-132, 237, 238, 244, 246-248
motor preparation 120, 131
motor program 122, 244, 246, 247
motor response 51
motor sensation 259
motor theory (of perception) 23, 122, 129, 240-243, 246, 247
motor-mimetic 242, 247, 248
motor-tactile 251
multidimensional (features) 14, 162, 177, 201, 237, 248
multidimensional scaling 202

multi-modal 2, 74, 251, 253, 275, 278
multi-modular 2, 53
musical behaviour 220-223, 225, 230, 234, 240, 271, 279, 280, 282, 284, 285
musical form 12, 14, 22, 132, 143, 153, 182, 219-234, 272
musical history 233, 234, 292
musical instruments 14, 101, 132, 143, 183, 187, 201, 203, 205, 212, 213, 215, 220
musical object 13, 14, 18, 20, 124, 219, 248
musical training 19, 49, 129
musique concrète 279

natural science 8
network 65, 101, 182, 226, 228, 229, 231-234
neural activity pattern (NAP) 31, 102
neuroimaging 29, 40
neurological 21, 237, 241, 248
neuromusicology 58, 77
neuron 2, 30, 31, 58-65, 72, 74
neuronal 44, 58, 59, 61, 62, 65, 66, 70, 74, 89, 241
neurophysiological (bases for musical imagery) 2, 3, 27-40, 80, 123, 128
neuroscience 6, 122
nominalism 117
non-invasive neurometrical methods 30, 77, 92
notation 17, 50, 81, 109, 110, 126, 182, 185, 219, 231, 233, 251, 252, 254, 258, 272, 274, 276, 277, 280, 285, 289, 290-298

objectivist 118, 119
oddball paradigm 79-81
ontology 13
oral tradition 183, 289, 294, 299
orchestration 202, 248, 249
ornamental 79, 232, 293, 294, 297, 301

paired-image subtraction method 59
paradigm 1, 5, 46, 50, 79, 80, 81, 117, 118, 122, 132, 183, 234, 237
parsing 145, 222, 242
perceptual ambiguity 95, 98, 99
perceptual scales 176
performance 13, 17, 22, 28, 46-54, 59, 64, 71, 74, 91, 103, 112, 138, 139, 141, 143-145, 156, 158, 181, 182, 185-198, 204-216, 220-234, 248, 251, 253-256, 258, 259, 261, 274, 280, 281, 292, 294-296, 299, 301

performer 22, 74, 141, 181, 182, 215, 219, 220, 221, 229, 230, 234, 238, 242, 261, 272, 273, 280, 283, 284, 286, 300, 301
periodicity 59, 68, 69, 95, 98, 100-102
permanence 127
phenomenological 2, 6, 11, 14, 19, 23, 120, 238, 239, 248
phenomenology 127, 128
philosophical 2, 5, 6, 10, 11, 15, 19, 118, 126, 299, 300
philosophy 6, 8, 13, 95, 183, 300
philosophy of mind 6, 183
phonetic representations (of musical sound) 292
phonological loop 46, 47, 51
phonology 246
phoronomic 126
physiological measures 30, 32
pictorial (representation) 96, 249
pitch (colour representation) 161-178
pitch (comparison) 15, 20, 21, 40, 44, 48-50, 52, 202-204, 211
pitch (general) 27, 29, 59-63, 66, 68-70, 126, 138, 141-145, 147, 149, 151-153, 155, 186, 223, 237, 243, 245, 251-268, 272, 275, 276, 281, 282, 290, 294-300
pitch (perception) 3, 17, 20, 21, 77-92, 95-113
pitch syllables 294
pitch space 243
place-code 59, 62, 66
playing 22, 30, 53, 90, 120, 132, 182, 183, 186, 205, 206, 213, 215, 222-227, 231-233, 238, 245, 253, 254, 256, 259, 272, 276, 279
playing technique 182, 205, 212, 215
pleasantness 106, 107, 112
positron emission tomography (PET) 6, 30, 32, 40

prefix 243
prefrontal cortex 29
priming 22, 33, 247, 248
procedural (knowledge) 96, 126, 241
processing 20, 28-33, 45, 54, 58-63, 67, 72-74, 78-81, 87, 89-91, 95, 96, 102, 110-113, 117, 118, 121, 125, 130, 132
processing negativity 87, 90, 91
propositional (knowledge) 96, 123, 124
prosodic 137-158
protention 2, 13, 21
pseudo-fundamental 98, 111
pseudo-periods 101

qualia 22, 238
qualitative discontinuities 239
qualitative method 167

radical constructivism 119
radical empiricism 123, 125, 130
raga 90, 296, 297, 300, 301
rate-place code 62
rate-time code 62
reaction time 21, 30, 44
real time 110, 112, 113, 124, 155, 256, 286
realism 117
recall 4, 7, 23, 47, 49-53, 57, 58, 69, 112, 124, 185, 232, 244, 247, 272
recoding 47, 221, 247
recognition 15, 20, 49, 50, 52, 64, 65, 67, 69, 71, 74, 96, 110, 111, 129, 253, 258, 275
recollection 13, 15, 20, 39, 112, 238, 247
regional cerebral blood flow 30
rehearsal 45-51, 254
relaxation 140, 147, 151, 243, 245, 249
reproductive (imagination) 9, 118

reproductive (task) 20
resolution 22, 30, 40, 60, 63, 71, 124, 182, 246
resonance 65, 67, 69, 71, 182, 211, 237, 239, 242, 244, 245, 248, 249, 290
resonant features 239, 244, 245
retention 2, 13, 21, 47, 254, 275
rhythm 22, 53, 60, 62, 64, 70, 71, 128, 142, 143, 145, 147, 149, 151, 153, 186-189, 232, 243, 246, 268, 272, 276, 278, 279, 281, 282, 285, 290, 294, 301
rhythmic syllables 294
roughness 16, 98, 106-108, 111, 162
round (musical form) 223, 227, 231, 232

Sanskrit 290, 292-295
sargam notation 296-297
saturation (colour) 162-178
scale (pitch) 3, 20, 30, 77-92, 109, 138, 142, 143, 145, 153, 252, 263, 296, 297, 300
scanning (mental) 32, 40, 44, 69, 118, 125
schema 2-21, 64, 65, 69, 70, 77, 90-92, 112, 118, 126, 129, 137-158, 187, 239-244, 248, 249, 253
schematizing function 118
secondary auditory cortex 30, 59
segregation 183
self-consciousness 9
self-organization 2, 3, 64, 65, 69, 75, 130, 244
sensation 2, 3, 7-12, 15, 16, 23, 58, 59, 87, 89, 91, 98-100, 107, 108, 110, 112, 117, 118, 123, 132, 162, 163, 201, 241, 245, 252, 299
sentograph 186
short-term memory 43, 63, 79, 85
signal based 239
signal processing 58, 73, 74, 113, 239
simulation 64, 100, 117, 122, 129-132, 239, 240, 243, 244, 247
singing 20, 22, 97, 143, 242, 245, 252-254, 300
sketch 11, 46, 183, 272, 274, 276, 277, 285, 286
sonic object 280, 284, 286
sonorous qualities 183, 237
sound colour 18, 201-216
sound effects 278
sound object 14
sound-producing actions 23, 128, 237, 238, 242, 245-248

soundscapes 161
source coherence 245, 249
space 7, 14, 15, 61-69, 74, 108, 109, 113, 119, 126, 127, 162, 164, 165, 172, 173, 175, 178, 202, 233, 253, 256, 258, 259, 266, 282, 285
space and time 8, 12, 60, 63
spaciousness 212
spatio-temporal 40, 57-66, 69, 72, 74, 95
spectral analysis (of EEG data) 32
spectral analysis (of sound) 102, 105, 287
spectral centroid 97, 99, 205, 209
spectral composition 97, 101, 102, 108
spectral density 97, 99, 100
spectral pitch 99, 100
speech 4, 6, 47-53, 95, 100, 102, 113, 128, 137-158, 240, 243, 254, 273, 278-280, 289, 292, 294
sruti 89, 290, 298-300
statistical (long-term) memory 63-74
strike note 100
subharmonic matching 98
subjectivity 119
sub-symbolic 2, 161
subvocal (rehearsal) 47-52, 254
subvocalization 47, 254
suffix 243
superior temporal gyrus 40
sustained (sound-producing actions) 245, 249
svara 296, 299-301
syllable 20, 143, 145, 146, 149, 151, 290-301
symbol-based 60, 72, 75

symmetry 12, 60, 61, 127, 141, 145, 149, 252, 256, 259, 267
synaesthesia 4, 21, 163, 164
synchronization 31, 193-196
syntactic 12, 113
synthetic sound 202

tala 294, 295, 301
tempo 14, 44, 107, 152, 153, 181, 186-198, 215, 275-277, 294
temporal 13, 14, 21, 30-32, 37, 40, 44, 53, 57, 59, 60, 62-64, 71, 97, 101, 102, 105, 110, 118, 124, 127, 187, 190, 203, 276-278, 281, 282
tension 99, 111, 125, 129, 131, 140-142, 151, 245, 279
texture 10, 12, 16, 17, 100, 101, 111, 113, 138, 142-145, 162, 177, 237, 246, 249, 265, 274, 281, 283, 284
timbre 18, 20-22, 27, 44, 62, 69, 98-100, 107, 110, 113, 128, 142, 143, 145, 147, 162, 163, 176, 177, 182, 186, 201-215, 237, 238, 249, 252, 253, 281, 294, 298, 299
time (perception and cognition) 8, 12-15, 23, 57-75, 124-132, 219, 222, 238, 242, 272, 281, 282, 284, 286
time (general) 3, 19, 20, 28, 29, 40, 44, 47, 78, 84-92, 96, 110, 111, 113, 117-119, 123, 254, 293
time/frequency representation 102, 103, 105, 108, 109, 202, 203, 205, 211, 214-216, 276, 277
time function 18, 97, 98, 102
time-code 59, 60, 62, 63, 66, 74
time-domain 32, 62, 102
time patterns 59
time-scales 63, 68, 182, 277
time series 38
time-space 108, 241
time to place mapping 59
time window 37
timing 125, 139, 181, 182, 185-198, 232, 280
tone colour 18, 294
top-down 30, 39, 62, 64, 65, 69, 71, 239, 240
topography 35-40
topology 64, 65, 70
trajectory 67, 69, 70, 74, 109, 126, 131, 241-247
transcendental 8
transient 21, 99, 202, 203, 205, 214, 237, 243, 247

tune (melody) 10, 22, 50, 182, 186, 187, 221-234, 283
tuning 16, 17, 97, 102, 103, 107, 263, 296, 297, 300

Umwelt 121
universal (features, principles) 23, 79, 91, 92, 113, 139, 140, 155, 243

verbal attributes 182, 202-215
verbal memory 47, 53
Verschmelzung 16, 18
vibrato 181, 205, 210, 211, 214, 215, 277
violation (of expectancies) 2, 3, 37, 89, 91, 140, 158

virtual pitch 98-100
vision 3, 6, 7, 19, 96, 127, 167, 231, 241, 253, 274, 275, 291
visual dimension 161-178
visual imagery 21, 22, 46, 51, 275-278, 300
visual image 28, 44, 219, 231, 253, 278, 279
visual percept 161, 162
visual tracking 254
visualizing 241, 254
vivid (intensity of image) 14, 29, 39, 57, 59, 185, 186, 275, 297
vocal (apparatus) 128, 243, 244, 246, 249, 254, 290, 292, 293
voice leading 106, 107, 126, 252, 261

working memory 39, 43-54, 64, 72, 130

xylophone 97, 243

zooming 22, 277

STUDIES ON NEW MUSIC RESEARCH

1. Signal Processing, Speech and Music.
Stan Tempelaars.
1996. ISBN 90 265 1481 6 (hardback)

2. Musical Signal Processing.
Edited by C. Roads, S. T. Pope, A. Piccialli, G. De Poli.
1997. ISBN 90 265 1482 4 (hardback)
ISBN 90 265 1483 2 (paperback)

3. Rhythm Perception and Production.
Edited by Peter Desain and Luke Windsor.
2000. ISBN 90 265 1636 3 (hardback)

4. Representing Musical Time - A Temporal-Logic Approach.
Alan Marsden.
2000. ISBN 90 265 1635 5 (hardback)

5. Musical Imagery.
Edited by Rolf Inge God¢y and Harald J¢rgensen.
2001. ISBN 90 265 1831 5 (hardback)