Emotion in Music Task: Lessons Learned
Anna Aljanaki¹, Yi-Hsuan Yang², Mohammad Soleymani¹
¹University of Geneva, Switzerland   ²Academia Sinica, Taiwan
20-21 October, MediaEval 2016
Emotion in Music Task
- Focused on audio analysis (optionally, metadata)
- Most attention was paid to recognizing how emotion changes over time
- Used the valence/arousal model
Valence/Arousal model
Dynamic emotion tracking (over duration of a piece)
Emotion in Music Task
- 2013 — Emotion in Music Brave New Task
  - Organized by M. Soleymani, M. N. Caro, E. M. Schmidt and Y.-H. Yang
  - 2 tasks: dynamic (per-second) music emotion recognition and song-level emotion recognition
  - 3 participating teams
- 2014 — Emotion in Music Task, Second Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 2 tasks: dynamic (per-second) music emotion recognition and feature design
  - 7 participating teams
- 2015 — Emotion in Music Task, Third Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 1 task: dynamic (per-second) music emotion recognition, with three submission types: features, prediction on baseline features, prediction on custom features
  - 11 participating teams
Quality of the annotations
Year                        2013         2014         2015
Total length                9h 18min     12h 30min    3h 46min
Cronbach's α for arousal    .28 ± 0.28   .31 ± 0.30   .66 ± 0.26
GAM's R² for arousal        .13 ± 0.10   .14 ± 0.11   .44 ± 0.19
Cronbach's α for valence    .28 ± 0.29   .20 ± 0.24   .51 ± 0.35
GAM's R² for valence        .13 ± 0.10   .10 ± 0.08   .37 ± 0.21
- 2013 & 2014: 45-second excerpts. 2015: full songs.
- 2013 & 2014: Amazon Mechanical Turk workers. 2015: both lab and AMT workers.
- 2015: introduced preliminary listening.
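The Cronbach's α figures above measure how consistently the annotators' dynamic curves move together within a song. A minimal sketch of the computation, assuming each song's ratings are arranged as a time-by-annotator matrix (the variable names are illustrative, not from the task's code):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-annotator consistency.

    ratings: 2-D array of shape (n_timepoints, n_annotators),
    one column per annotator's dynamic rating curve.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of annotators
    item_variances = ratings.var(axis=0, ddof=1)  # variance of each annotator's curve
    total_variance = ratings.sum(axis=1).var(ddof=1)  # variance of the summed curve
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```

Perfectly parallel annotators yield α = 1; curves that move independently drive α toward 0, which is roughly what the 2013–2014 valence numbers show.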
Quality of the annotations - Arousal
Quality of the annotations - Valence
Continuous annotation interface
Continuous annotation problems
- Absolute scale
- Reaction time
- Scaling ('zoom' levels)
Absolute scale ratings
We tried to scale each annotation to the dynamic mean of the song: a_{j,i} = a_{j,i} + (A_j − Ā)
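One way to implement such a correction, as a sketch: the symbols in the formula above are not defined on the slide, so this assumes the goal is to remove each annotator's own offset and recentre every curve on the song's dynamic mean (the mean over all annotators):

```python
import numpy as np

def align_to_song_mean(annotations):
    """Shift each annotator's curve so that its mean coincides with
    the dynamic mean of the song (the mean over all annotators).

    annotations: array of shape (n_annotators, n_timepoints).
    """
    annotations = np.asarray(annotations, dtype=float)
    annotator_means = annotations.mean(axis=1, keepdims=True)  # each annotator's own mean
    song_mean = annotations.mean()                             # dynamic mean of the song
    return annotations - annotator_means + song_mean
```

After the shift all curves share the same mean, so only their dynamics (the shape over time) remain to disagree about.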
There is a reaction time in the annotations: before listeners can give judgements on the emotional content of music, they need to listen to it for some time.
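A sketch of how such a lag could be estimated (this is an illustration, not the task's procedure): shift one annotator's curve against a reference curve, e.g. the average of the other annotators, and keep the shift that maximizes correlation.

```python
import numpy as np

def estimate_lag(reference, annotation, max_lag):
    """Estimate an annotator's reaction time in samples: the forward
    shift of the annotation that best correlates with a reference curve."""
    reference = np.asarray(reference, dtype=float)
    annotation = np.asarray(annotation, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        ref = reference[:-lag] if lag else reference
        ann = annotation[lag:]
        corr = np.corrcoef(ref, ann)[0, 1]  # Pearson correlation at this shift
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```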
There is a scaling problem: the unit of emotional expression can be a structural section, a phrase, or a single note.
Best solutions
Method             ρ           RMSE
2013, BLSTM-RNN    .31 ± .37   .08 ± .05
2014, LSTM         .35 ± .45   .10 ± .05
2015, BLSTM-RNN    .66 ± .25   .12 ± .06

Table: Winning algorithms on arousal, ordered by Spearman's ρ. BLSTM-RNN: Bi-directional Long Short-Term Memory Recurrent Neural Network.
Method             ρ           RMSE
2013, BLSTM-RNN    .19 ± .43   .08 ± .04
2014, LSTM         .20 ± .49   .08 ± .05
2015, BLSTM-RNN    .17 ± .09   .12 ± .54

Table: Winning algorithms on valence, ordered by Spearman's ρ.
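The ρ and RMSE columns are computed per song and then averaged, which is where the ± spreads come from. A minimal sketch of the two metrics, using a simple rank-based Spearman without tie correction (adequate for continuous-valued annotations, which rarely tie exactly):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed
    sequences (no tie correction in this sketch)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def rmse(predicted, actual):
    """Root-mean-square error between a predicted and an actual curve."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

Note that the two metrics can disagree, as in the tables above: a prediction can track the shape of the annotation well (high ρ) while being offset or mis-scaled (high RMSE), and vice versa.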
Categorical interface
Possible solutions and modifications
- Change the task from emotion tracking to dynamics tracking (diminuendo, crescendo, rallentando)
- Change the data collection interface
- Find a practical task where continuous tracking is necessary:
  - Retrieval by an emotional trajectory
  - Thumbnailing
  - Emotion prediction from physiological signals and audio
Acknowledgements
We thank Erik M. Schmidt, Mike N. Caro, Cheng-Ya Sha, Alexander Lansky, Sung-Yen Liu and Eduardo Coutinho for their contributions to task development, and the anonymous Turkers for their work.