Emotion in Music Task: Lessons Learned
Anna Aljanaki¹, Yi-Hsuan Yang², Mohammad Soleymani¹
¹University of Geneva, Switzerland   ²Academia Sinica, Taiwan
20-21 October, MediaEval 2016
Emotion in Music Task
- Focused on audio analysis (optionally, metadata)
- Most attention was paid to recognizing how emotion changes over time
- Used the valence/arousal model
Valence/Arousal model
Dynamic emotion tracking (over duration of a piece)
Emotion in Music Task
- 2013 — Emotion in Music Brave New Task
  - Organized by M. Soleymani, M. N. Caro, E. M. Schmidt and Y.-H. Yang
  - 2 tasks: dynamic (per-second) music emotion recognition and song-level emotion recognition
  - 3 participating teams
- 2014 — Emotion in Music Task, Second Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 2 tasks: dynamic (per-second) music emotion recognition and feature design
  - 7 participating teams
- 2015 — Emotion in Music Task, Third Edition
  - Organized by A. Aljanaki, Y.-H. Yang, M. Soleymani
  - 1 task: dynamic (per-second) music emotion recognition, with three submission types: features, prediction on baseline features, prediction on custom features
  - 11 participating teams
Quality of the annotations
Year                        2013         2014         2015
Total length                9h 18min     12h 30min    3h 46min
Cronbach's α for arousal    .28 ± 0.28   .31 ± 0.30   .66 ± 0.26
GAM's R² for arousal        .13 ± 0.10   .14 ± 0.11   .44 ± 0.19
Cronbach's α for valence    .28 ± 0.29   .20 ± 0.24   .51 ± 0.35
GAM's R² for valence        .13 ± 0.10   .10 ± 0.08   .37 ± 0.21
- 2013 & 2014: 45-second excerpts. 2015: full songs.
- 2013 & 2014: Amazon Mechanical Turk workers. 2015: both lab and AMT workers.
- 2015: introduced preliminary listening.
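The Cronbach's α figures above measure how consistently the annotators' dynamic curves move together within a song. A minimal sketch of the computation, assuming each song's ratings are arranged as a time-by-annotator matrix (the variable names are illustrative, not from the task's code):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for inter-annotator consistency.

    ratings: 2-D array of shape (n_timepoints, n_annotators),
    one column per annotator's dynamic rating curve.
    """
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of annotators
    item_variances = ratings.var(axis=0, ddof=1)  # variance of each annotator's curve
    total_variance = ratings.sum(axis=1).var(ddof=1)  # variance of the summed curve
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```

Perfectly parallel annotators yield α = 1; curves that move independently drive α toward 0, which is roughly what the 2013–2014 valence numbers show.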
Quality of the annotations - Arousal
Quality of the annotations - Valence
Continuous annotation interface
Continuous annotation problems
- Absolute scale
- Reaction time
- Scaling ('zoom' levels)
Absolute scale ratings
We tried to scale each annotation to the dynamic mean of the song: a_{j,i} = a_{j,i} + (A_j − Ā)
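One way to implement such a correction, as a sketch: the symbols in the formula above are not defined on the slide, so this assumes the goal is to remove each annotator's own offset and recentre every curve on the song's dynamic mean (the mean over all annotators):

```python
import numpy as np

def align_to_song_mean(annotations):
    """Shift each annotator's curve so that its mean coincides with
    the dynamic mean of the song (the mean over all annotators).

    annotations: array of shape (n_annotators, n_timepoints).
    """
    annotations = np.asarray(annotations, dtype=float)
    annotator_means = annotations.mean(axis=1, keepdims=True)  # each annotator's own mean
    song_mean = annotations.mean()                             # dynamic mean of the song
    return annotations - annotator_means + song_mean
```

After the shift all curves share the same mean, so only their dynamics (the shape over time) remain to disagree about.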
There is a reaction time in the annotations: before listeners can give judgements on the emotional content of music, they need to listen to it for some time.
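A sketch of how such a lag could be estimated (this is an illustration, not the task's procedure): shift one annotator's curve against a reference curve, e.g. the average of the other annotators, and keep the shift that maximizes correlation.

```python
import numpy as np

def estimate_lag(reference, annotation, max_lag):
    """Estimate an annotator's reaction time in samples: the forward
    shift of the annotation that best correlates with a reference curve."""
    reference = np.asarray(reference, dtype=float)
    annotation = np.asarray(annotation, dtype=float)
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        ref = reference[:-lag] if lag else reference
        ann = annotation[lag:]
        corr = np.corrcoef(ref, ann)[0, 1]  # Pearson correlation at this shift
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```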
There is a scaling problem: the unit of emotional expression can be a structural section, a phrase, or a single note.
Best solutions
Method             ρ           RMSE
2013, BLSTM-RNN    .31 ± .37   .08 ± .05
2014, LSTM         .35 ± .45   .10 ± .05
2015, BLSTM-RNN    .66 ± .25   .12 ± .06

Table: Winning algorithms on arousal, ordered by Spearman's ρ. BLSTM-RNN: Bi-directional Long Short-Term Memory Recurrent Neural Network.
Method             ρ           RMSE
2013, BLSTM-RNN    .19 ± .43   .08 ± .04
2014, LSTM         .20 ± .49   .08 ± .05
2015, BLSTM-RNN    .17 ± .09   .12 ± .54

Table: Winning algorithms on valence, ordered by Spearman's ρ.
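The ρ and RMSE columns are computed per song and then averaged, which is where the ± spreads come from. A minimal sketch of the two metrics, using a simple rank-based Spearman without tie correction (adequate for continuous-valued annotations, which rarely tie exactly):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed
    sequences (no tie correction in this sketch)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def rmse(predicted, actual):
    """Root-mean-square error between a predicted and an actual curve."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

Note that the two metrics can disagree, as in the tables above: a prediction can track the shape of the annotation well (high ρ) while being offset or mis-scaled (high RMSE), and vice versa.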
Categorical interface
Possible solutions and modifications
- Change the task from emotion tracking to dynamics tracking (diminuendo, crescendo, rallentando)
- Change the data collection interface
- Find a practical task where continuous tracking is necessary:
  - Retrieval by an emotional trajectory
  - Thumbnailing
  - Emotion prediction from physiological signals and audio
Acknowledgements
We thank Erik M. Schmidt, Mike N. Caro, Cheng-Ya Sha, Alexander Lansky, Sung-Yen Liu and Eduardo Coutinho for their contributions to task development, and the anonymous Turkers for their work.