Icmpc 2010 Paper






    Ben Duane

    Northwestern University


    Although melodies seem to be focal points for attention, figures to

    the grounds created by other musical lines, little is known about

    why this is so. This paper tests the hypothesis that melodies mark

    themselves for attention partly by being less predictable than the

    lines that accompany them. As in several previous studies,

    predictability is quantified using various types of information

    entropy. These entropies are computed for melodic and

    non-melodic lines extracted from two musical corpora, one

    containing rock songs, the other containing Baroque keyboard

    minuets. Results show that the various entropies not only are

    significantly higher in melodies than in non-melodies, but can be

    used to classify lines as melodic or non-melodic with above-chance accuracy.



    Melodies, more than other musical lines, seem to attract listeners'

    attention, to occupy the foreground of their perception. Indeed,

    the whole concept of melody is often explained using the

    figure-ground metaphor. Melodies are not just coherent series of

    notes; they are principal. They stand apart from their

    accompaniments, like faces from the backgrounds of portraits, and

    place themselves under the listener's notice.

    Figure 1 Boccherini, string quintet, op. 11, no. 5, mm. 1-4.

    When listening to Figure 1, for instance, one's attention is drawn

    more to the melodic line of the Violin 1 than to the non-melodic

    lines of the other four parts. It is for this reason, I suspect, that

    Boccherini's minuet can be identified easily by the first violin's line

    but not by one of the other four. For someone who did not know the

    piece by name, it would make perfect sense to refer to the quintet

    whose melody goes:

    But not the quintet with that cello line:

    The Violin 1 melody is a signature of sorts, a feature by which the

    piece can be instantly recognized, and it could hardly function this

    way without being a focal point of the listener's attention.

    This much is almost common sense. Most of us realize, at least

    intuitively, that melodies are uniquely marked for attention. But

    what marks them? What about melodies makes them stand out,

    attract attention, acquire principality? One possibility is that our

    attention is drawn not to the melody per se but to the top of the

    texture, since the melody is usually the highest line present. This,

    however, does not explain excerpts like the Boccherini, in which

    the non-melodic Violin 2 and Viola are often above the melodic

    Violin 1. The effect might also be due not to the melody but its

    performance. One might, in other words, attend to Figure 1's first

    violin part because the first violinist brings it out. But how would

    he or she know to do this? First violinists cannot count on always

    having the melody. No, it must be something about the line itself,

    not how it is played or orchestrated, that designates it as melody and

    prompts the listener to attend to it.

    That something, which marks melodies for attention, might be

    partly their predictability, or lack thereof. In Figure 1, the Violin 1

    line is, in every apparent way, less predictable than the other four.

    This line features nothing as pervasive and uniform as the second

    violin's Es, the viola's eighth notes, the first cello's thirds, or the second cello's articulations. The melody is simply the least

    predictable, most interesting line in this example. And so it is in

    many other examples from many other musical styles. Countless

    accompanimental patterns, from the Alberti bass to the oompah

    bass, are more predictable than the melodies they are paired with.

    Perhaps, then, the lower predictability of melodies is operative: a

    feature that somehow directs our attention toward them.

    This hypothesis is tested here. Like several previous authors, I

    quantify the predictability of musical lines using different types of

    information entropy. These quantities are calculated across two

    musical corpora, each piece contributing one melodic line and one

    non-melodic line. The results suggest that melodies indeed tend to

    be less predictable than non-melodies, meaning that this lower

    predictability might mark them for attention.


    Madsen and Widmer (2006) tested essentially the same hypothesis

    as this paper: that listeners tend to focus on the most complex

    (least repetitive) voice, experiencing this as foreground (p. 1812).

    Like me, these authors quantify complexity using different types of

    information entropy, but their work differs from mine on two

    counts. First, they compute some types of entropy that I do not, and

    vice-versa. (They do not compute the sub-phrase entropy described

    in section 3.4, for instance.) Second, their data comprise one

    symphony and one concerto, whereas mine include a corpus of

    minuets and a corpus of rock songs.

    Although psychologists have not, to my knowledge, studied why

    melodies attract attention, they have researched musical attention

    from other angles. Davison and Banks (2003) had subjects listen to

    two-voice counterpoint, instructing them to attend to one of the

    voices, and found that their perception of the voice attended to was

    affected by the structure of the voice ignored. Dowling, Lung, and

    Herrbold (1987) found that if listeners are primed to expect notes in

    certain pitch regions, and at certain times, they often direct their

    attention accordingly. And Bigand, McAdams, and Fort (2000)

    tested two competing models of musical attention: a divided

    attention model, in which listeners concurrently attend to multiple

    lines; and a figure-ground model, in which listeners attend to just

    one part at a time. The authors' results, however, did not fully

    support either hypothesis, leading them to propose a third,

    integrative model, by which listeners switch their attention

    between one voice at a time and all voices at once.

    Several researchers have applied information theory to the study of

    music. Meyer (1967a, 1967b) discussed connections between

    information theory and musical meaning and aesthetics. Other

    authors have tried to use information entropy as a metric of musical

    style (Youngblood 1958, Knopoff and Hutchinson 1983, Snyder

    1990, Margulis and Beatty 2008). Hiller (1967) performed an

    information-theoretic analysis on Webern's Symphonie, op. 21.

    And Knopoff and Hutchinson (1981) proposed a method for

    computing information-theoretic quantities for musical continua.

    None of this research, however, attempted to use entropy to predict

    whether lines are melodic or not.


    3.1 First- and Second-Order Entropy

    The central premise of information theory is, of course, that

    information and predictability are inversely related: that the

    unexpected is also the most informative. Mathematically, this

    means that as probability decreases, information increases or, more

    specifically, that the information, I(x), of some event x is given by

    Equation 1:

    $$ I(x) = -\log_2 p(x) \tag{1} $$

    where p(x) is the probability of x.

    Entropy, then, is the average information of each event in a signal

    (i.e. a series of events). The first-order entropy, H(1), of a signal is

    defined as:

    $$ H^{(1)} = \sum_{i=1}^{n} p(x_i)\, I(x_i) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i) \tag{2} $$


    where n is the number of possible events xi. (That first-order

    entropy derives from zeroth-order Markov probability is an

    unfortunate quirk of standard terminology.) Equation 2 is

    conceived as a mean of the information, I(xi), across all possible

    events xi, weighted by the respective probabilities, p(xi). As such,

    this mean estimates the expected information of each event in the

    signal or, in another sense, each event's average surprisal.

    Say, for example, that the signal is a series of coin flips. The

    alphabet of possible events would be X = {heads, tails}, and the

    first-order entropy of the signal would be:

    $$ H^{(1)} = -p(\text{heads}) \log_2 p(\text{heads}) - p(\text{tails}) \log_2 p(\text{tails}) \tag{3} $$

    If the coin was fair, and p(heads) = p(tails) = 0.5, the entropy would

    be H(1) = 1. But if the coin was rigged, say with p(heads) = 0.75

    and p(tails) = 0.25, then the entropy would be H(1) = 0.81. Entropy

    is lower with the rigged coin because that coin is more predictable:

    most of the time, it will turn up heads, whereas the first coin's

    distribution is half-and-half.
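
    The coin-flip figures can be checked with a short Python sketch of
    Equation 2. The function names are illustrative choices, not part of
    the original analysis; the second function assumes that probabilities
    are estimated by relative frequency.

```python
import math
from collections import Counter

def first_order_entropy(probabilities):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def entropy_of_signal(events):
    """Estimate each event's probability by relative frequency (an assumption),
    then apply the first-order entropy formula."""
    counts = Counter(events)
    total = len(events)
    return first_order_entropy(c / total for c in counts.values())

fair = first_order_entropy([0.5, 0.5])
rigged = first_order_entropy([0.75, 0.25])
print(round(fair, 2), round(rigged, 2))  # prints: 1.0 0.81
```

    A signal such as "HHHT", in which heads occurs three times out of
    four, yields the same value as the rigged coin.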

    Second-order entropy is computed using first-order transition

    probabilities, that is, the probabilities of observing certain events,

    given the events directly preceding them. The formula is:

    $$ H^{(2)} = -\sum_{i=1}^{n} \sum_{j=1}^{n} p(x_i)\, p_i(x_j) \log_2 p_i(x_j) \tag{4} $$


    where p(xi) is the probability of event xi and pi(xj) is the probability

    of event xj given that event xi has just occurred. Equation 4 could be

    rewritten as:

    $$ H^{(2)} = \sum_{i=1}^{n} p(x_i)\, H^{(1)}(x_i) \tag{5} $$


    where H(1)(xi) is the first-order entropy of the signal, given that xi

    was the most recent event. Second-order entropy, then, is a mean of

    the first-order entropies H(1)(xi) for each prior event xi, weighted by

    the probabilities p(xi) of these events.
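
    The weighted-mean reading of second-order entropy can be sketched in
    Python as follows. This is a minimal illustration, not the code used
    for the analysis. It assumes that both p(xi) and the transition
    probabilities are estimated by counting adjacent pairs, so the final
    event of the signal (which has no successor) does not contribute to
    p(xi).

```python
import math
from collections import Counter, defaultdict

def second_order_entropy(events):
    """Mean of the conditional first-order entropies, weighted by p(x_i)."""
    # Count how often each event follows each prior event.
    transitions = defaultdict(Counter)
    for prev, nxt in zip(events, events[1:]):
        transitions[prev][nxt] += 1

    n_pairs = len(events) - 1
    total = 0.0
    for prev, followers in transitions.items():
        n_prev = sum(followers.values())
        p_prev = n_prev / n_pairs  # assumption: p(x_i) estimated from pairs
        h_cond = -sum((c / n_prev) * math.log2(c / n_prev)
                      for c in followers.values())
        total += p_prev * h_cond
    return total

alternating = second_order_entropy("HTHTHTHT")  # each flip determines the next
mixed = second_order_entropy("HHTTHHTTH")       # each flip is followed by H or T equally often
```

    A strictly alternating signal has a first-order entropy of one bit
    but a second-order entropy of zero, since every event is fully
    determined by its predecessor.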

    3.2 Pitch Entropy

    In what follows, pitch entropy is defined using the alphabet of

    possible melodic intervals X = {leap up, step up, unison, step down,

    leap down}. To compute this entropy for a given line, each interval

    is placed into one of the categories in X, as illustrated in Figure 2.

    Each interval's probability, p(x) for each x ∈ X, is estimated by counting that interval's instantiations. The line's entropy is then

    calculated from these probabilities.

    Figure 2 Vocal line from "She Loves You" by the Beatles.

    We could, of course, expand the alphabet of intervals. The leap up

    category, for instance, could distinguish between thirds, fourths,

    fifths, and so on. But larger alphabets can become too large for the

    relatively short signals found in music. If an alphabet contains n

    intervals, then as n increases, so does the likelihood that some

    intervals will be absent from a given musical line. Such absences

    become even more likely with the n^2 possible pairs of intervals

    reflected by second-order entropy. And, when many possibilities

    are not represented, entropy seems to become less effective in

    distinguishing musical signals. The size of the interval alphabet is

    kept low for this reason.
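
    The five-category interval alphabet might be implemented along the
    following lines. Two details here are assumptions rather than parts
    of the method described above: pitches are represented as MIDI note
    numbers, and a step is taken to be an interval of one or two
    semitones (the max_step parameter), with anything larger counted as
    a leap. The example line is hypothetical and comes from neither
    corpus.

```python
import math
from collections import Counter

def interval_category(semitones, max_step=2):
    """Place a melodic interval into one of the five categories in X."""
    if semitones == 0:
        return "unison"
    size = "step" if abs(semitones) <= max_step else "leap"  # assumed threshold
    direction = "up" if semitones > 0 else "down"
    return f"{size} {direction}"

def interval_categories(midi_pitches):
    """Categorize each interval between consecutive pitches."""
    return [interval_category(b - a)
            for a, b in zip(midi_pitches, midi_pitches[1:])]

def pitch_entropy(midi_pitches):
    """First-order entropy (in bits) of the line's interval categories."""
    counts = Counter(interval_categories(midi_pitches))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

line = [60, 62, 64, 64, 67, 65, 60]  # hypothetical line
print(interval_categories(line))
print(round(pitch_entropy(line), 2))
```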

    3.3 Rhythmic Entropy

    Entropy is also computed with respect to rhythmic durations. For

    each note in the line under analysis, the following value is obtained

    (see Figure 2):

    $$ \hat{b} = \operatorname{round}\{\log_2 b\} \tag{6} $$

    where b is the number of beats the note occupies, and round{·}

    returns the integer nearest its argument. Rhythmic entropy, then, is defined

    using the alphabet $X = \{\hat{b} < -3, \hat{b} = -3, \hat{b} = -2, \hat{b} = -1, \hat{b} = 0, \hat{b} = 1, \hat{b} = 2, \hat{b} = 3, \hat{b} > 3\}$. This definition serves the dual purpose of transforming rhythmic duration from a continuous to a discrete

    variable, which makes entropy simpler to calculate, and limiting the

    number of possibilities, for the reason discussed in section 3.2.
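
    A brief Python sketch of this discretization follows. It reads the
    nine-symbol alphabet as running from "below -3" to "above 3" in
    rounded log2 units, and the string labels are illustrative choices.
    Python's built-in round resolves exact ties to the nearest even
    integer, which is an implementation detail rather than something
    specified above.

```python
import math
from collections import Counter

def rhythm_category(beats):
    """Map a duration in beats to round(log2 b), clipped to the nine-symbol alphabet."""
    value = round(math.log2(beats))
    if value < -3:
        return "< -3"
    if value > 3:
        return "> 3"
    return str(value)

def rhythmic_entropy(durations_in_beats):
    """First-order entropy (in bits) of the discretized durations."""
    counts = Counter(rhythm_category(b) for b in durations_in_beats)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical durations: quarter, two eighths, dotted quarter, whole note.
durations = [1, 0.5, 0.5, 1.5, 4]
print([rhythm_category(b) for b in durations])
```

    Under this mapping a dotted quarter note (1.5 beats) falls into the
    same category as a half note, which illustrates how the rounding
    limits the number of distinct symbols.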

    3.4 Entropy of Sub-Phrases

    We seem, as listeners, to process melodies and other lines not only

    as complete units but also as series of shorter segments. (The

    melody in Figure 2, for example, is easily heard as the segments

    shown beneath the staff.) It is plausible, therefore, that computing

    the entropy of such segments, rather than of full lines, would be

    useful: that it would tap into some key aspect of music perception

    that we would otherwise overlook. I obtained such quantities for the

    analysis reported below. Melodies and other lines were

    automatically segmented into sub-phrases, using an algorithm

    proposed by Cambouropoulos (1997), and then entropy was

    computed for these sub-phrases.
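
    One way such a quantity could be computed is sketched below in
    Python. The segmentation algorithm itself is not reproduced:
    sub-phrase boundaries are taken as given input. The text above does
    not say how the per-segment values are combined, so averaging them
    is purely an assumption made for illustration.

```python
import math
from collections import Counter

def entropy(symbols):
    """First-order entropy (in bits) of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_subphrase_entropy(symbols, boundaries):
    """Average the entropies of the segments delimited by `boundaries`.

    `boundaries` lists the indices at which a new sub-phrase begins
    (hypothetical input, e.g. produced by a segmentation algorithm).
    """
    cuts = [0, *boundaries, len(symbols)]
    segments = [symbols[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]
    return sum(entropy(seg) for seg in segments) / len(segments)

symbols = ["a", "a", "a", "b", "c", "b", "c"]  # hypothetical category sequence
whole_line = entropy(symbols)
by_subphrase = mean_subphrase_entropy(symbols, boundaries=[3])
```

    In this toy case the first segment is perfectly uniform and the
    second alternates between two symbols, so the segment-level figure
    is lower than the entropy of the undivided line.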


    To assess whether melodic lines are in fact less predictable than

    non-melodic lines, two corpora of music were assembled. Two

    lines, one melodic and the other non-melodic, were extracted from

    each musical work. For each line, six varieties of entropy were

    computed: first- and second-order pitch and rhythmic entropy for

    the complete lines and first-order pitch and rhythmic entropy for

    lines segmented into sub-phrases.

    The first corpus included the nine two-part minuets from J. S.

    Bach's Notebook for Anna Magdalena Bach. In this case, the right

    hand was considered melodic, the left hand non-melodic. The

    second corpus comprised the first verse and chorus from thirteen

    rock songs by The Beatles, Cream, Tom Petty, Billy Joel, and Elvis

    Costello. For each song, I transcribed the vocal line, which was

    considered melodic, and the bass line, which was considered non-melodic.


    5. RESULTS

    The data from each corpus were analyzed in two ways. Multivariate

    analyses of variance and paired t-tests were used to compare the

    entropies of melodic and non-melodic lines. And linear

    discriminant analysis (LDA), quadratic discriminant analysis

    (QDA), and logistic regression (LR) were employed to see how

    well the six entropies could classify lines as melodic or

    non-melodic. I tested classifiers using not only all six entropies

    together, but also the other sixty-two possible subsets of these six.

    Accuracy was defined as the percentage of pieces in the corpus that

    were correctly classified, and statistical significance was evaluated

    using a binomial test. The classifiers were tested using

    cross-validation. For each piece, a classifier was trained on the

    other members of the corpus, and this classifier was tested on the

    withheld piece.
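
    The bookkeeping behind this procedure (the feature subsets and the
    leave-one-piece-out splits) can be sketched in Python. The feature
    names are hypothetical labels for the six entropies, and the
    classifiers themselves are omitted; only the enumeration is shown.

```python
from itertools import combinations

# Hypothetical names for the six entropies.
FEATURES = [
    "pitch_1st_full", "pitch_2nd_full",
    "rhythm_1st_full", "rhythm_2nd_full",
    "pitch_1st_sub", "rhythm_1st_sub",
]

def feature_subsets(features):
    """All non-empty subsets of the features, smallest first."""
    return [subset
            for k in range(1, len(features) + 1)
            for subset in combinations(features, k)]

def leave_one_piece_out(piece_ids):
    """Yield (training_pieces, held_out_piece) for each piece in turn."""
    for held_out in piece_ids:
        training = [p for p in piece_ids if p != held_out]
        yield training, held_out

subsets = feature_subsets(FEATURES)
folds = list(leave_one_piece_out(list(range(1, 10))))  # the nine minuets
print(len(subsets), len(folds))  # prints: 63 9
```

    The 63 non-empty subsets consist of the full set of six entropies
    plus the sixty-two smaller subsets. Holding out a whole piece keeps
    both of its lines out of the training data for that fold.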

    The results for the minuets are shown in Tables 1 and 2. A

    MANOVA showed a significant difference in one dimension

    between the entropies of the right and left hands (p = 0.01).

    Follow-up t-tests showed that both second-order pitch entropy for

    full lines and first-order pitch entropy for sub-phrases were

    significantly higher in melodic than in non-melodic lines. None of

    the other entropies produced significant differences. The LDA

    classifier performed above chance when using all six entropies,

    labeling the minuets with 83.33% accuracy (p < 0.01). Several

    subsets of the six entropies increased the accuracy of all three

    classifiers to 88.89% (p < 0.001).
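
    The reported significance levels can be approximated with a
    one-sided binomial test, sketched here in Python. Two assumptions
    are made: that each of the 18 lines (nine minuets, two lines each)
    counts as one trial, which reproduces 83.33% as 15/18 and 88.89% as
    16/18, and that chance performance is 0.5. Neither the number of
    trials nor the sidedness of the test is stated explicitly above.

```python
from math import comb

def binomial_p_value(successes, trials, chance=0.5):
    """One-sided probability of at least `successes` correct labels by chance."""
    return sum(comb(trials, k) * chance**k * (1 - chance)**(trials - k)
               for k in range(successes, trials + 1))

p_all_six = binomial_p_value(15, 18)      # 15 of 18 lines, i.e. 83.33%
p_best_subset = binomial_p_value(16, 18)  # 16 of 18 lines, i.e. 88.89%
print(p_all_six, p_best_subset)
```

    Under these assumptions the first value falls below 0.01 and the
    second below 0.001, in line with the thresholds quoted for the two
    accuracies.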


    Melodic             Non-melodic
    Mean    St. Dev.    Mean    St. Dev.    p-value
    1.51    0.22        1.26    0.21        0.10
    2.02    0.12        1.97    0.18        0.73
    1.11    0.15        0.85    0.15        0.01
    1.54    0.15        1.59    0.15        1.00
    1.50    0.22        1.09    0.16        0.01
    1.98    0.18        1.82    0.25        0.20

    Table 1 Distributions of entropies for the minuets.


    Accuracy p-value Accuracy p-value

    LDA 83.33% 0.004 88.89%

    rhythmic for sub-phrases, were significantly higher in vocal lines

    than bass lines. All three classifiers performed above chance when

    using the six entropies. Each classifier's performance improved

    with certain subsets of the entropies, with the LR model reaching

    100% accuracy in one case.


    P-VALUE Mean St. Dev. Mean St. Dev.



    1.85 0.29 1.20 0.51