
Personalized Music Emotion Recognition via Model Adaptation


Abstract

In music information retrieval (MIR) research, developing a computational model that comprehends the affective content of a music signal, and using such a model to organize music collections, has been an essential topic. Emotion perception in music is by nature subjective. Consequently, building a general emotion recognition system that performs equally well for every user may be insufficient. It would be more desirable for one's personal computer or device to understand his or her individual perception of music emotion. In our previous work, we developed the acoustic emotion Gaussians (AEG) model, which learns the broad emotion perception of music from general users. Such a general music emotion model, called the background AEG model in this paper, can recognize the perceived emotion of unseen music from a general point of view. In this paper, we go one step further and realize personalized music emotion modeling by adapting the background AEG model with a limited number of emotion annotations provided by a target user in an online and dynamic fashion. A novel maximum a posteriori (MAP)-based algorithm is proposed to achieve this in a probabilistic framework. We carry out quantitative evaluations on a well-known emotion-annotated corpus, MER60, to validate the effectiveness of the proposed method for personalized music emotion recognition.


Page 1: Personalized Music Emotion Recognition via Model Adaptation

Ju-Chiang Wang, Yi-Hsuan Yang, Hsin-Min Wang, and Shyh-Kang Jeng

Academia Sinica and National Taiwan University, Taipei, Taiwan

Page 2: Outline

• Introduction

• The Acoustic Emotion Gaussians (AEG) Model

• Personalization via MAP Adaptation

• Music Emotion Recognition using AEG

• Evaluation and Results

• Conclusion

Page 3: Introduction

• Developing a computational model that comprehends the affective content of the musical audio signal, for automatic music emotion recognition and content-based music retrieval

• Emotion perception in music is by nature subjective (fairly user-dependent)
  – A general music emotion recognition (MER) system could be insufficient
  – One's personal device should ideally understand his or her perception of music emotion
  – We therefore pursue an adaptive MER method that is efficient and effective

Page 4: Basic Idea

• The UBM-GMM framework for speaker adaptation
  – The basis of state-of-the-art speaker recognition systems
  – A large background GMM (the universal background model, UBM) represents the speaker-independent distribution of acoustic features
  – A speaker-dependent GMM is obtained via model adaptation with the speech data of a specific speaker

• Adaptive MER method for personalization
  – A probabilistic background emotion model learns the broad emotion perception of music from general users
  – The background emotion model is personalized via model adaptation in an online and dynamic fashion

Page 5: Multi-Dimensional Emotion

• Emotions are considered as numerical values (instead of discrete labels) over two emotion dimensions, i.e., Valence and Arousal (Activation)

• Good visualization, a unified model

[Figure: the Mr. Emo system developed by Yang and Chen]

Page 6: The Valence-Arousal Annotations

• Different emotions may be elicited from a song

• Assumption: the VA annotations of a song can be drawn from a Gaussian distribution, as empirically observed

• Learn from the multiple annotations and the acoustic features of the corresponding song

• Predict the emotion of a song as a single Gaussian

Page 7: The Acoustic Emotion Gaussians Model

• Represent the acoustic features of a song by a probabilistic histogram vector

• Develop a model to comprehend the relationship between acoustic features and VA annotations
  – Wang et al. (2012), "The acoustic emotion Gaussians model for emotion-based music annotation and retrieval," Proc. ACM Multimedia (full paper)

[Figure: acoustic GMM posterior distributions]

Page 8: Construct Feature Reference Model

[Figure: A universal music database provides music tracks and audio signals; frame-based features are extracted, and a global set of frame vectors randomly selected from each track is used for EM training of the acoustic GMM {A_1, ..., A_K}, where each component represents a specific acoustic pattern.]
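To make this pipeline concrete, here is a minimal sketch using scikit-learn; the value of K, the per-track sampling size, and the diagonal covariance are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def pool_frames(track_features, frames_per_track=500):
    """Randomly select frame vectors from each track to form the global set."""
    pooled = []
    for feats in track_features:          # feats: (num_frames, feature_dim)
        idx = rng.choice(len(feats), size=min(frames_per_track, len(feats)),
                         replace=False)
        pooled.append(feats[idx])
    return np.vstack(pooled)

def train_acoustic_gmm(track_features, K=128):
    """EM training of the acoustic GMM, the 'feature reference model'.
    K and the diagonal covariance are assumptions for this sketch."""
    X = pool_frames(track_features)
    gmm = GaussianMixture(n_components=K, covariance_type='diag',
                          max_iter=100, random_state=0)
    gmm.fit(X)                             # EM training
    return gmm
```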

Page 9: Represent a Song in the Probabilistic Space

[Figure: The frame-based feature vectors of a song are scored against the acoustic GMM {A_1, ..., A_K}; the posterior probabilities over the K components form a probabilistic histogram, the acoustic GMM posterior.]
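A minimal sketch of this step, assuming (consistently with the bag-of-frames view) that the histogram is the per-frame component posterior averaged over all frames of the song; the function name is hypothetical.

```python
import numpy as np

def acoustic_posterior(gmm, song_frames):
    """Map a song's frame features (T x D) to the K-dim acoustic GMM
    posterior histogram: theta_k = (1/T) * sum_t p(A_k | x_t)."""
    frame_posteriors = gmm.predict_proba(song_frames)  # (T, K)
    return frame_posteriors.mean(axis=0)               # (K,), sums to 1
```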

Page 10: Generative Process of VA GMM

• Key idea: each component in the acoustic GMM can generate a corresponding component VA Gaussian

[Figure: The audio signal of each clip is described by the acoustic GMM {A_1, ..., A_K}, viewed as a set of acoustic codewords; each codeword generates a component Gaussian, yielding a mixture of Gaussians in the VA space.]

Page 11: The Likelihood Function of VA GMM

• Each training clip s_i is annotated by multiple users {u_j}, indexed by j

• Annotated corpus: assume each annotation e_ij of clip s_i is generated by a VA GMM weighted by the acoustic GMM posterior {θ_ik}

• Form the corpus-level likelihood and maximize it using the EM algorithm (see the sketch after the equations below)

Annotation-level likelihood, with the acoustic GMM posterior $\theta_{ik}$ as mixture weights and $\{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}$ the parameters of each latent VA Gaussian to learn:

$$p(\mathbf{e}_{ij} \mid s_i) = \sum_{k=1}^{K} \theta_{ik}\, \mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

Corpus-level likelihood, in which each user contributes equally to the clip-level likelihood:

$$p(\mathbf{E}) = \prod_{i=1}^{N} p(\mathbf{E}_i \mid s_i) = \prod_{i=1}^{N} \prod_{j=1}^{U_i} \sum_{k=1}^{K} \theta_{ik}\, \mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

Page 12: Personalizing VA GMM via MAP

• Apply maximum a posteriori (MAP) adaptation

• Suppose we have a set of personally annotated songs {e_i, θ_i}, i = 1, ..., M

• The posterior probability over each component z_k for e_i, and the expected sufficient statistics, are computed as follows (sketched in code after the equations):

$$p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i) = \frac{\theta_{ik}\, \mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{q=1}^{K} \theta_{iq}\, \mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)}$$

$$E_k(\boldsymbol{\mu}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i)\, \mathbf{e}_i}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i)}, \qquad
E_k(\boldsymbol{\Sigma}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i)\, \mathbf{e}_i \mathbf{e}_i^{T}}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i)}$$
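A NumPy/SciPy sketch of these two steps; names are hypothetical, with E the M x 2 matrix of the target user's VA annotations and thetas the matching acoustic posteriors.

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(E, thetas, mus, Sigmas):
    """p(z_k | e_i, theta_i) for each personal annotation e_i."""
    M, K = len(E), len(mus)
    R = np.zeros((M, K))
    for i in range(M):
        for k in range(K):
            R[i, k] = thetas[i][k] * multivariate_normal.pdf(
                E[i], mean=mus[k], cov=Sigmas[k])
        R[i] /= R[i].sum()                 # normalize over components
    return R

def sufficient_stats(E, R):
    """Expected first and second moments per component (the E_k terms)."""
    Nk = R.sum(axis=0)                     # soft counts, used as M_k
    E_mu = (R.T @ E) / Nk[:, None]         # E_k(mu)
    E_Sigma = np.einsum('ik,id,ie->kde', R, E, E) / Nk[:, None, None]
    return Nk, E_mu, E_Sigma               # E_Sigma is E_k(e e^T)
```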

Page 13: MAP for GMM: Parameter Interpolation

• The updated parameters of the personalized VA GMM are derived by interpolation

• M_k is the effective number of annotations assigned to component z_k for the target user

• The data-dependent interpolation factors α_k can be set from these effective counts (a sketch follows the equations):

$$\boldsymbol{\mu}_k' \leftarrow \alpha_k\, E_k(\boldsymbol{\mu}) + (1 - \alpha_k)\, \boldsymbol{\mu}_k$$

$$\boldsymbol{\Sigma}_k' \leftarrow \alpha_k\, E_k(\boldsymbol{\Sigma}) + (1 - \alpha_k)\left(\boldsymbol{\Sigma}_k + \boldsymbol{\mu}_k \boldsymbol{\mu}_k^{T}\right) - \boldsymbol{\mu}_k' \boldsymbol{\mu}_k'^{T}$$

$$M_k = \sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \boldsymbol{\theta}_i), \qquad \alpha_k = \frac{M_k}{M_k + r}$$

This interpolates each parameter between the personal expectation and the background model; $r$ is a fixed relevance factor controlling how quickly the adapted model moves toward the user's data.
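A sketch of this update; the relevance factor r (and its default value) is a Reynolds-style assumption for illustration, not necessarily the paper's exact setting.

```python
import numpy as np

def map_update(mus, Sigmas, Nk, E_mu, E_Sigma, r=16.0):
    """Interpolate background VA GMM parameters toward the personal
    statistics; r is an assumed relevance factor."""
    alpha = Nk / (Nk + r)                            # data-dependent factors
    mus_new = alpha[:, None] * E_mu + (1 - alpha)[:, None] * mus
    mumuT = np.einsum('kd,ke->kde', mus, mus)        # mu_k mu_k^T
    mumuT_new = np.einsum('kd,ke->kde', mus_new, mus_new)
    Sigmas_new = (alpha[:, None, None] * E_Sigma
                  + (1 - alpha)[:, None, None] * (Sigmas + mumuT)
                  - mumuT_new)
    return mus_new, Sigmas_new
```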

Page 14: Graphical Interpretation – MAP Adaptation

[Figure: MAP adaptation of the component VA Gaussians; each component's shift toward the personal annotations is weighted by its acoustic GMM posterior and interpolation factor.]

• The personal annotations can even come from clips not included in the background training set

Page 15: Music Emotion Recognition

• Given the acoustic GMM posterior of a test song, predict the emotion as a single VA Gaussian

[Figure: acoustic GMM posterior of the test song → learned VA GMM → predicted single Gaussian]

$$p(\mathbf{e} \mid s) = \sum_{k=1}^{K} \hat{\theta}_k\, \mathcal{N}(\mathbf{e} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),$$

where $\hat{\theta}_k$ is the acoustic GMM posterior of the test song; the mixture is then summarized as a single Gaussian $\{\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*\}$.

Page 16: Find the Representative Gaussian

• Minimize the cumulative weighted relative entropy
  – The representative Gaussian has the minimal cumulative KL distance from all the component VA Gaussians

• The optimal parameters of the Gaussian are given in closed form below (sketched in code afterward):

$$\{\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*\} = \arg\min_{\{\boldsymbol{\mu}, \boldsymbol{\Sigma}\}} \sum_{k=1}^{K} \hat{\theta}_k\, D_{\mathrm{KL}}\!\left( \mathcal{N}(\mathbf{e} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \,\middle\|\, \mathcal{N}(\mathbf{e} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \right)$$

$$\boldsymbol{\mu}^* = \sum_{k=1}^{K} \hat{\theta}_k\, \boldsymbol{\mu}_k, \qquad
\boldsymbol{\Sigma}^* = \sum_{k=1}^{K} \hat{\theta}_k \left( \boldsymbol{\Sigma}_k + (\boldsymbol{\mu}_k - \boldsymbol{\mu}^*)(\boldsymbol{\mu}_k - \boldsymbol{\mu}^*)^{T} \right)$$
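A sketch of the closed-form collapse; theta is the test song's acoustic GMM posterior, and all names are hypothetical.

```python
import numpy as np

def predict_single_gaussian(theta, mus, Sigmas):
    """Collapse the weighted VA GMM into one representative Gaussian:
    mu* = sum_k theta_k mu_k,
    Sigma* = sum_k theta_k (Sigma_k + (mu_k - mu*)(mu_k - mu*)^T)."""
    mu_star = theta @ mus                            # (2,)
    diff = mus - mu_star                             # (K, 2)
    Sigma_star = np.einsum(
        'k,kde->de', theta, Sigmas + np.einsum('kd,ke->kde', diff, diff))
    return mu_star, Sigma_star
```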

Page 17: Evaluation – Dataset and Acoustic Features

• MER60
  – 60 music clips, each 30 seconds long
  – 99 users in total; each clip annotated by 40 subjects
  – 6 users annotated all the clips
  – Personalization is evaluated on these 6 users

• Bag-of-frames representation; emotion is analyzed at the clip level instead of the frame level (a partial feature-extraction sketch follows)
  – 70 dimensions: dynamic, spectral, timbre (13 MFCCs, 13 delta MFCCs, and 13 delta-delta MFCCs), and tonal features
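For illustration only, a librosa-based sketch of the timbre part of this feature set; librosa was not necessarily the authors' toolchain, and the dynamic, spectral, and tonal descriptors are omitted here.

```python
import numpy as np
import librosa

def timbre_features(path):
    """13 MFCCs plus delta and delta-delta MFCCs (39 of the 70 dims);
    returns a frame-by-feature matrix."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)
    d2 = librosa.feature.delta(mfcc, order=2)
    return np.vstack([mfcc, d1, d2]).T    # (num_frames, 39)
```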

Page 18: Evaluation – Incremental Setting

• Incremental adaptation experiment per target user
  – Randomly split all the clips (with annotations) into 6 folds
  – Perform 6-fold cross-validation:
    • Hold out one fold for testing
    • The remaining 5 folds: all annotations except the target user's are used to train a background VA GMM
    • At each of P = 5 iterations, add one fold of the target user's annotations to the adaptation pool
  – The adaptation pool is used to adapt the background VA GMM
  – Evaluate the prediction performance on the test fold

Page 19: Evaluation – Results

• Metric (ALL_i): compute the log-likelihood of the target user's ground-truth annotation under the predicted Gaussian of each test clip; higher is better (a sketch follows)
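A sketch of this metric under the stated definition; annotations holds the target user's ground-truth VA points, and names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def average_loglik(annotations, mu_star, Sigma_star):
    """Mean log-likelihood of the ground-truth annotations under the
    predicted Gaussian; higher is better."""
    return float(np.mean([multivariate_normal.logpdf(e, mean=mu_star,
                                                     cov=Sigma_star)
                          for e in annotations]))
```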

Page 20: Conclusion and Future Work

• The AEG model provides a principled probabilistic framework that is technically sound and flexible for adaptation

• We have presented a novel MAP-based adaptation technique that personalizes the AEG model very efficiently

• We demonstrated the effectiveness of the proposed method for personalized MER in an incremental learning manner

• We will investigate maximum likelihood linear regression (MLLR), which learns a linear transformation over the parameters of the AEG model