Abstract
In music information retrieval (MIR) research, developing a computational model that comprehends the affective content of music signals and using such a model to organize music collections has been an essential topic. Emotion perception in music is by nature subjective. Consequently, a general emotion recognition system that performs equally well for every user could be insufficient. It would be more desirable for one's personal computer or device to understand his/her perception of music emotion. In our previous work, we developed the acoustic emotion Gaussians (AEG) model, which can learn the broad emotion perception of music from general users. Such a general music emotion model, called the background AEG model in this paper, can recognize the perceived emotion of unseen music from a general point of view. In this paper, we go one step further and realize personalized music emotion modeling by adapting the background AEG model with a limited number of emotion annotations provided by a target user in an online and dynamic fashion. A novel maximum a posteriori (MAP)-based algorithm is proposed to achieve this in a probabilistic framework. We carry out quantitative evaluations on a well-known emotion-annotated corpus, MER60, to validate the effectiveness of the proposed method for personalized music emotion recognition.
Personalized Music Emotion Recognition via Model Adaptation
Ju-Chiang Wang, Yi-Hsuan Yang, Hsin-Min Wang, and Shyh-Kang Jeng
Academia Sinica, National Taiwan University, Taipei, Taiwan
Outline
• Introduction
• The Acoustic Emotion Gaussians (AEG) Model
• Personalization via MAP Adaptation
• Music Emotion Recognition using AEG
• Evaluation and Results
• Conclusion
Introduction
• Developing a computational model that comprehends the affective content of musical audio signals, for automatic music emotion recognition and content-based music retrieval
• Emotion perception in music is by nature subjective (fairly user-dependent)
– A general music emotion recognition (MER) system could be insufficient
– It is desirable for one's personal device to understand his/her perception of music emotion
– An adaptive MER method should be efficient and effective
Basic Idea
• The UBM-GMM approach to speaker adaptation
– State of the art for speaker recognition
– A large background GMM (the universal background model, UBM) represents the speaker-independent distribution of acoustic features
– A speaker-dependent GMM is obtained via model adaptation with the speech data of a specific speaker
• Adaptive MER method for personalization
– A probabilistic background emotion model learns the broad emotion perception of music from general users
– The background emotion model is personalized via model adaptation in an online and dynamic fashion
Multi-Dimensional Emotion
• Emotions are treated as numerical values (instead of discrete labels) over two emotion dimensions, i.e., Valence and Arousal (Activation)
• Good visualization, a unified model
[Figure: the Mr. Emo system, developed by Yang and Chen]
The Valence-Arousal Annotations
• Different emotions may be elicited from a song
• Assumption: the VA annotations of a song can be drawn from a Gaussian distribution, as observed
• Learn from the multiple annotations and the acoustic features of the corresponding song
• Predict the emotion as a single Gaussian
The Acoustic Emotion Gaussians Model
• Represent the acoustic features of a song by a probabilistic histogram vector
• Develop a model that comprehends the relationship between acoustic features and VA annotations
– Wang et al. (2012), “The acoustic emotion Gaussians model for emotion-based music annotation and retrieval,” Proc. ACM Multimedia (full paper)

[Figure: acoustic GMM posterior distributions]
Construct Feature Reference Model
[Diagram: frame-based features are extracted from the music tracks (audio signals) of a universal music database; a global set of frame vectors, randomly selected from each track, is used for EM training of the acoustic GMM, whose components A_1, …, A_K each represent a specific acoustic pattern.]
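The EM training sketched in the diagram could look as follows; this is a minimal NumPy sketch (diagonal covariances, hypothetical data and function names), not the authors' implementation:

```python
import numpy as np

def fit_diag_gmm(X, K, n_iter=50, seed=0):
    """Minimal EM for a diagonal-covariance GMM -- a sketch of how the
    acoustic codebook (A_1..A_K) could be trained on pooled frame vectors."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X[rng.choice(N, K, replace=False)]   # init means from data points
    var = np.tile(X.var(axis=0), (K, 1))      # init variances from global data
    w = np.full(K, 1.0 / K)                   # mixture weights
    for _ in range(n_iter):
        # E-step: responsibilities via log-domain Gaussian densities
        logp = (np.log(w)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, variances from responsibilities
        Nk = r.sum(axis=0)
        w = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        var = (r.T @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

In practice a library implementation (e.g., a standard GMM trainer) would be used; the loop above only illustrates the EM mechanics behind the feature reference model.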
Represent a Song into Probabilistic Space
[Diagram: each frame vector of a song is evaluated against the acoustic GMM (codewords A_1, …, A_K); the resulting posterior probabilities over the K components are averaged into a histogram, the acoustic GMM posterior representation of the song.]
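The song-level probabilistic histogram could be computed from the frame vectors as follows; a minimal sketch assuming the diagonal-covariance GMM parameters (`w`, `mu`, `var`) from the previous step (names hypothetical):

```python
import numpy as np

def gmm_posterior_histogram(X, w, mu, var):
    """Map a song's frame vectors X (T x D) to a K-dim probabilistic
    histogram: the average posterior over the K acoustic Gaussians."""
    # Log-density of every frame under every component (diagonal covariance)
    logp = (np.log(w)
            - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
            - 0.5 * np.sum((X[:, None, :] - mu) ** 2 / var, axis=2))
    logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)   # per-frame posteriors
    return post.mean(axis=0)                  # song-level histogram
```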
Generative Process of VA GMM
• Key idea: each component in the acoustic GMM generates a corresponding component Gaussian in the VA space
[Diagram: the audio signal of each clip is mapped through the acoustic GMM, viewed as a set of acoustic codewords A_1, …, A_K; each codeword generates a Gaussian in the VA space, yielding a mixture of Gaussians over valence and arousal.]
The Likelihood Function of VA GMM
• Each training clip is annotated by multiple users {uj}, indexed by j
• An annotated corpus: assume each annotation eij of clip si can be generated by a weighted VA GMM with {qik}!
• Generating the Corpus-level likelihood and maximize it using the EM algorithm
Annotation-level likelihood (the acoustic GMM posterior $q_{ik}$ weights the $K$ latent VA Gaussians, whose parameters $\{\boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}$ are to be learned):

$$p(\mathbf{e}_{ij} \mid s_i) = \sum_{k=1}^{K} q_{ik}\,\mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$

Corpus-level likelihood (each user contributes equally to the clip-level likelihood):

$$p(\mathbf{E} \mid \theta) = \prod_{i=1}^{N} p(\mathbf{E}_i \mid \theta) = \prod_{i=1}^{N} \prod_{j=1}^{U_i} \sum_{k=1}^{K} q_{ik}\,\mathcal{N}(\mathbf{e}_{ij} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$$
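The annotation-level likelihood can be evaluated directly; a minimal sketch for 2-D VA annotations, with hypothetical variable names:

```python
import numpy as np

def clip_likelihood(e, q, mus, Sigmas):
    """p(e | s_i) = sum_k q_ik N(e | mu_k, Sigma_k): likelihood of one
    2-D VA annotation e under the VA GMM weighted by acoustic posteriors q."""
    lik = 0.0
    for qk, mu, S in zip(q, mus, Sigmas):
        d = e - mu
        det = np.linalg.det(S)
        quad = d @ np.linalg.solve(S, d)               # Mahalanobis term
        lik += qk * np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(det))
    return lik
```

For example, a single standard-normal component evaluated at the origin gives the 2-D Gaussian peak density 1/(2π).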
Personalizing the VA GMM via MAP
• Apply Maximum A Posteriori (MAP) adaptation
• Suppose we have a set of personally annotated songs {e_i, q_i}, i = 1, …, M
• Compute the posterior probability of each component z_k for e_i, and the expected sufficient statistics weighted by these posteriors
Posterior probability of component $z_k$ given annotation $\mathbf{e}_i$:

$$p(z_k \mid \mathbf{e}_i, \theta) = \frac{q_{ik}\,\mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}{\sum_{q=1}^{K} q_{iq}\,\mathcal{N}(\mathbf{e}_i \mid \boldsymbol{\mu}_q, \boldsymbol{\Sigma}_q)}$$

Expected sufficient statistics, weighted by the posteriors:

$$E_k(\boldsymbol{\mu}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)\,\mathbf{e}_i}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)}, \qquad E_k(\boldsymbol{\Sigma}) = \frac{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)\,\mathbf{e}_i \mathbf{e}_i^T}{\sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)}$$
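The posteriors and expected sufficient statistics could be computed as follows; a vectorized NumPy sketch assuming 2-D VA annotations `E` (M x 2) and per-song acoustic posteriors `Q` (M x K), with hypothetical names:

```python
import numpy as np

def map_sufficient_stats(E, Q, mus, Sigmas):
    """Compute p(z_k | e_i), effective counts M_k, and the expected
    first/second-moment statistics E_k(mu), E_k(Sigma) for MAP adaptation."""
    M, K = Q.shape
    dens = np.zeros((M, K))
    for k in range(K):
        d = E - mus[k]
        S = Sigmas[k]
        quad = np.einsum('md,md->m', d @ np.linalg.inv(S), d)
        dens[:, k] = np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(S)))
    post = Q * dens
    post /= post.sum(axis=1, keepdims=True)            # p(z_k | e_i, theta)
    Mk = post.sum(axis=0)                              # effective counts M_k
    E_mu = (post.T @ E) / Mk[:, None]                  # E_k(mu)
    E_Sigma = np.einsum('mk,md,me->kde', post, E, E) / Mk[:, None, None]  # E_k(Sigma)
    return post, Mk, E_mu, E_Sigma
```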
MAP for GMM: Parameter Interpolation
• The updated parameters for the personalized VA GMM are derived by interpolation between the expected statistics and the background model:

$$\boldsymbol{\mu}'_k \leftarrow \alpha_k E_k(\boldsymbol{\mu}) + (1 - \alpha_k)\,\boldsymbol{\mu}_k,$$

$$\boldsymbol{\Sigma}'_k \leftarrow \alpha_k E_k(\boldsymbol{\Sigma}) + (1 - \alpha_k)\left(\boldsymbol{\Sigma}_k + \boldsymbol{\mu}_k \boldsymbol{\mu}_k^T\right) - \boldsymbol{\mu}'_k {\boldsymbol{\mu}'_k}^T.$$

• The effective number of component $z_k$ for the target user:

$$M_k = \sum_{i=1}^{M} p(z_k \mid \mathbf{e}_i, \theta)$$

• The data-dependent interpolation factors can be set as $\alpha_k = M_k / M$.
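The interpolation step might look like this in NumPy; a sketch assuming α_k = M_k / M as read off the slide (note that Σ_k M_k = M, since the component posteriors sum to one per annotation), with hypothetical names:

```python
import numpy as np

def map_update(Mk, E_mu, E_Sigma, mus, Sigmas):
    """MAP interpolation between the expected sufficient statistics and
    the background VA GMM, with data-dependent alpha_k = M_k / M."""
    alpha = Mk / Mk.sum()                       # Mk.sum() equals M
    new_mus = alpha[:, None] * E_mu + (1 - alpha)[:, None] * mus
    # Covariance update: interpolate second moments, then re-center
    new_Sigmas = (alpha[:, None, None] * E_Sigma
                  + (1 - alpha)[:, None, None]
                    * (Sigmas + np.einsum('kd,ke->kde', mus, mus))
                  - np.einsum('kd,ke->kde', new_mus, new_mus))
    return new_mus, new_Sigmas
```

A useful sanity check: if the expected statistics exactly match the background model's moments, the update leaves the parameters unchanged.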
Graphical Interpretation – MAP Adaptation
[Diagram: the component VA Gaussians of the background model are shifted toward the personal annotations, with the acoustic GMM posterior and the interpolation factors controlling how much each component moves. The personal annotations can come from clips exclusive to (i.e., not included in) the background training set.]
Music Emotion Recognition
• Given the acoustic GMM posterior of a test song, predict the emotion as a single VA Gaussian
[Diagram: acoustic GMM posterior $\{\hat{q}_1, \ldots, \hat{q}_K\}$ → learned VA GMM → predicted single Gaussian $\{\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*\}$]

$$p(\hat{\mathbf{e}} \mid s) = \sum_{k=1}^{K} \hat{q}_k\,\mathcal{N}(\hat{\mathbf{e}} \mid \hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k)$$
Find the Representative Gaussian
• Minimize the cumulative weighted relative entropy
– The representative Gaussian has the minimal cumulative KL distance from all the component VA Gaussians
• The optimal parameters of the Gaussian are
$$\{\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*\} = \arg\min_{\{\boldsymbol{\mu}, \boldsymbol{\Sigma}\}} \sum_{k=1}^{K} \hat{q}_k\, D_{\mathrm{KL}}\!\left( \mathcal{N}(\mathbf{e} \mid \hat{\boldsymbol{\mu}}_k, \hat{\boldsymbol{\Sigma}}_k)\, \big\|\, \mathcal{N}(\mathbf{e} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) \right)$$

$$\boldsymbol{\mu}^* = \sum_{k=1}^{K} \hat{q}_k\, \hat{\boldsymbol{\mu}}_k, \qquad \boldsymbol{\Sigma}^* = \sum_{k=1}^{K} \hat{q}_k \left( \hat{\boldsymbol{\Sigma}}_k + (\hat{\boldsymbol{\mu}}_k - \boldsymbol{\mu}^*)(\hat{\boldsymbol{\mu}}_k - \boldsymbol{\mu}^*)^T \right)$$
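The optimal parameters above amount to moment matching of the mixture; a minimal NumPy sketch (names hypothetical):

```python
import numpy as np

def representative_gaussian(theta, mus, Sigmas):
    """Collapse the predicted VA GMM (weights theta, components mus/Sigmas)
    into the single Gaussian minimizing the cumulative weighted KL distance."""
    mu_star = np.einsum('k,kd->d', theta, mus)          # weighted mean
    diff = mus - mu_star
    # Weighted within- plus between-component covariance
    Sigma_star = np.einsum('k,kde->de', theta,
                           Sigmas + np.einsum('kd,ke->kde', diff, diff))
    return mu_star, Sigma_star
```

For instance, two equally weighted unit-covariance components at valence ±1 collapse to a Gaussian at the origin whose valence variance is inflated by the spread between the component means.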
Evaluation – Dataset and Acoustic Features
• MER60
– 60 music clips, each 30 seconds long
– 99 users in total; each clip annotated by 40 subjects
– 6 users annotated all the clips
– Personalization is evaluated on the basis of these 6 users
• Bag-of-frames representation; the analysis of emotion is performed at the clip level instead of the frame level
– 70 dimensions: dynamic, spectral, timbre (13 MFCCs, 13 delta MFCCs, and 13 delta-delta MFCCs), and tonal features
Evaluation – Incremental Setting
• Incremental adaptation experiment per target user
– Randomly split all the clips (with annotations) into 6 folds
– Perform 6-fold cross-validation:
• Hold out one fold for testing
• Use the remaining 5 folds, with all annotations except the target user's, to train the background VA GMM
• In each of P = 5 iterations, add one fold of the target user's annotations to the adaptation pool
– Use the adaptation pool to adapt the background VA GMM
– Evaluate prediction performance on the test fold
Evaluation – Result
• Metric (ALL_i): compute the log-likelihood of the target user's ground-truth annotations under the predicted Gaussian
Conclusion and Future Work
• The AEG model provides a principled probabilistic framework that is technically sound, and flexible for adaptation
• We have presented a novel MAP-based adaptation technique which is very efficient for personalizing the AEG model
• Demonstrated the effectiveness of the proposed method for personalizing MER in an incremental learning manner
• We will investigate maximum likelihood linear regression (MLLR), which learns a linear transformation over the parameters of the AEG model