Automatic Transcription: An Enabling Technology for Music Analysis
Simon Dixon
Centre for Digital Music
School of Electronic Engineering and Computer Science
Queen Mary University of London
Semantic Audio
Extraction of symbolic “meaning” from audio signals: How would a person describe the audio?
  Solo violin, D minor, slow tempo, Baroque period
  Bar 12 starts at 0:21.961
  2nd verse starts at 1:43.388
  Quarter comma meantone temperament
Annotation of audio
  Onset detection
  Beat tracking
  Segmentation (structural, ...)
  Instrument identification
  Temperament and inharmonicity analysis
  Automatic transcription
Automatically generated annotations are stored as metadata
  Matched against content-based queries
  Also useful for navigation within a document
Simon Dixon (C4DM) Automatic Transcription 2 / 16
Automatic music transcription (AMT)
Audio recording → Score
Applications:
  Music information retrieval
  Interactive music systems
  Musicological analysis
Subtasks:
  Pitch estimation
  Onset/offset detection
  Instrument identification
  Rhythmic parsing
  Pitch spelling
  Typesetting
Still remains an open problem
Background (1)
Core problem: Multiple-F0 estimation
Common time-frequency representations:
  Short-Time Fourier Transform
  Constant-Q Transform
  ERB Gammatone Filterbank
Common estimation techniques:
  Signal processing methods
  Statistical modelling methods
  NMF/PLCA
  Sparse coding
  Machine learning algorithms
HMMs commonly used for note tracking
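As a minimal sketch of the first representation listed above, the following computes a Short-Time Fourier Transform magnitude spectrogram with NumPy. The function name, the Hann window, and the default frame and hop sizes are illustrative assumptions, not settings taken from the talk.

```python
import numpy as np

def stft_magnitude(x, frame_len=1024, hop=512):
    """Magnitude spectrogram of a mono signal, shape (n_bins, n_frames).

    Assumes len(x) >= frame_len. Window and sizes are illustrative choices.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Real FFT per frame, then transpose so that rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

Bin k corresponds to frequency k * sample_rate / frame_len, so the linear spacing is coarse at low pitches in musical terms; this is one common motivation for the log-frequency alternatives in the list.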
Background (2)
Evaluation:
  Best results achieved by signal processing methods
Multiple-F0 estimation approaches rely on assumptions:
  Harmonicity
  Power summation
  Spectral smoothness
  Constant spectral template
[Figure: two magnitude spectra; axes: Magnitude (dB) against Frequency (kHz)]
Background (3)
Design considerations:
  Time-frequency representation
  Iterative/joint estimation method
  Sequential/joint pitch tracking
[Figure: two panels; STFT Magnitude against Frequency (kHz), and RTFI Magnitude against RTFI Index]
Signal Processing Approach (ICASSP 2011)
RTFI time-frequency representation
Preprocessing: noise suppression, spectral whitening
Onset detection: late fusion of spectral flux & pitch salience
Pitch candidate combinations are evaluated, modelling tuning, inharmonicity, overlapping partials and temporal evolution
HMM-based offset detection
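To illustrate one ingredient of the onset detection step above, here is a minimal sketch of half-wave rectified spectral flux with simple peak picking. It covers only the spectral flux feature, not the pitch salience feature or the late fusion; the function names and the fixed threshold are hypothetical and are not taken from the ICASSP 2011 system.

```python
def spectral_flux(frames):
    """Half-wave rectified spectral flux.

    frames: list of magnitude spectra (equal-length lists), one per time frame.
    Returns one flux value per frame; the first frame has flux 0.
    """
    flux = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        # Sum only the increases in magnitude between consecutive frames.
        flux.append(sum(max(c - p, 0.0) for p, c in zip(prev, cur)))
    return flux

def pick_onsets(flux, threshold):
    """Indices of local maxima of the flux curve that exceed a fixed threshold."""
    onsets = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]:
            onsets.append(i)
    return onsets
```

In practice an adaptive threshold (for example a moving median) is often preferred to a fixed one; the fixed value here only keeps the sketch short.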
[Figure: (a) Ground-truth guitar excerpt ‘RWC MDB-J-2001 No. 9’; (b) Transcription output of the same recording. Vertical axis: MIDI Pitch]
Background: Non-negative Matrix Factorisation (NMF)
X = WH
  X is the power (or magnitude) spectrogram
  W is the spectral basis matrix (dictionary of atoms)
  H is the atom activity matrix
Solved by successive approximation
  Minimise || X − WH ||
  Subject to constraints (e.g. non-negativity, sparsity, harmonicity)
Elegant model, but ...
  Not guaranteed to converge to a meaningful result
  Difficult to extend / generalise
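The successive approximation described above can be sketched with the widely used multiplicative update rules for the Euclidean cost. This is one common scheme, chosen here for illustration; the slide does not say which update rule was used, and only the non-negativity constraint is enforced (no sparsity or harmonicity). The function name and parameters are assumptions.

```python
import numpy as np

def nmf(X, n_atoms, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (n_bins x n_frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost ||X - WH||.
    W and H start non-negative and stay non-negative, because each update
    multiplies by a ratio of non-negative quantities.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n_bins, n_frames = X.shape
    W = rng.random((n_bins, n_atoms)) + eps   # spectral basis (atoms)
    H = rng.random((n_atoms, n_frames)) + eps # atom activities
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The result depends on the random initialisation, and nothing in the cost forces an atom to correspond to a single note, which is one way to read the caveat about convergence to a meaningful result.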
PLCA-based Transcription
(Ref: Benetos and Dixon, SMC 2011)
Probabilistic Latent Component Analysis (PLCA): probabilistic framework that is easy to generalize and interpret
PLCA algorithms have been used in the past for music transcription
Shift-invariant PLCA-based AMT system with multiple templates:

$$P(\omega, t) = P(t) \sum_{p,s} P(\omega \mid s, p) *_{\omega} P(f \mid p, t)\, P(s \mid p, t)\, P(p \mid t)$$

  P(ω, t) is the input log-frequency spectrogram,
  P(t) the signal energy,
  P(ω|s, p) spectral templates for instrument s and pitch p,
  P(f|p, t) the pitch impulse distribution,
  P(s|p, t) the instrument contribution for each pitch, and
  P(p|t) the piano-roll transcription.
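To give a feel for how such a model is fitted, the sketch below runs EM for a deliberately simplified PLCA model, P(ω, t) = P(t) Σ_p P(ω|p) P(p|t), with fixed templates. It omits the shift-invariant convolution with P(f|p, t) and the instrument variable s, so it is not the system described above; the function name and arguments are hypothetical.

```python
import numpy as np

def plca_pitch_activations(V, templates, n_iter=50, eps=1e-12):
    """Estimate P(p|t) for a simplified PLCA model with fixed templates.

    V:         non-negative spectrogram, shape (n_bins, n_frames)
    templates: P(w|p), shape (n_bins, n_pitches), each column sums to 1
    Returns (energy, Ppt): energy is the per-frame sum of V (P(t) up to a
    global scale) and Ppt holds P(p|t), shape (n_pitches, n_frames).
    """
    V = np.asarray(V, dtype=float)
    templates = np.asarray(templates, dtype=float)
    n_pitches, n_frames = templates.shape[1], V.shape[1]
    energy = V.sum(axis=0)
    Ppt = np.full((n_pitches, n_frames), 1.0 / n_pitches)  # uniform start
    for _ in range(n_iter):
        # Current model of each frame: sum_p P(w|p) P(p|t).
        model = templates @ Ppt + eps
        # Combined E and M step: reweight P(p|t) by how well each template
        # explains the observed energy, then renormalise over pitches.
        Ppt *= templates.T @ (V / model)
        Ppt /= Ppt.sum(axis=0, keepdims=True) + eps
    return energy, Ppt
```

Multiplying P(p|t) by the frame energy and thresholding would give a rough piano roll; the full model additionally lets each template shift along the log-frequency axis.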
PLCA-based Transcription (2)
System is able to utilize sets of extracted pitch templates from various instruments
[Figure: violin templates; axes: Frequency against Pitch Number]
PLCA-based Transcription (3)
[Figure: transcription of a Cretan lyra excerpt; panels: Original, Transcription; axes: Frequency against Time (frames)]
Application 1: Characterisation of Musical Style
(Ref: Mearns, Benetos and Dixon, SMC 2011)
Bach Chorales: audio and MIDI versions
Audio transcribed and segmented by beats
Each vertical slice is matched to chord templates based on Schoenberg’s definition of diatonic triads and 7ths for the major and minor modes
HMM detects key and modulations
  Observations are chord symbols, hidden states are keys
  Observation matrix models relationships between chords and keys
  State transition matrix models modulation behaviour
  Various models based on Schoenberg and Krumhansl: comparison of models from music theory and music psychology

         Transcription   Chords   Key
Audio    33%             56%      64%
MIDI     (100%)          85%      65%
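The HMM decoding step above (chord symbols as observations, keys as hidden states) can be sketched with a standard Viterbi decoder. The decoder itself is textbook material; any keys, chords and probabilities passed to it in an example are invented for illustration and are not the Schoenberg or Krumhansl models from the study.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden-state (key) sequence for a chord sequence.

    Works in log space; assumes every probability that is used is non-zero.
    """
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}
    back = []  # one dict of back-pointers per time step after the first
    for o in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this step.
            best = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (score[best] + math.log(trans_p[best][s])
                            + math.log(emit_p[s][o]))
            pointers[s] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With a high self-transition probability the decoder tolerates isolated out-of-key chords and only switches key when the evidence persists, which is the behaviour one would want from a modulation detector.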
Application 2: Analysis of Harpsichord Temperament
(Ref: Tidhar et al. JNMR 2010; Dixon et al. JASA 2011, to appear)
For fixed-pitch instruments, tuning systems are chosen which aim to maximise the consonance of common intervals
Temperament is a compromise arising from the impossibility of satisfying all tuning constraints simultaneously
We measure temperament to aid study of historical performance
  Perform “conservative” transcription
  Calculate precise frequencies of partials (CQIFFT)
  Estimate fundamental and inharmonicity of each note
  Classify temperament

                             Synthesised   Acoustic (real)
Spectral Peaks               96%           79%
Conservative Transcription   100%          92%
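The "estimate fundamental and inharmonicity" step can be illustrated under the standard stiff-string model, in which partial k lies at f_k = k · f0 · sqrt(1 + B·k²). That model is an assumption here; the slide does not state which one was used. Squaring gives (f_k / k)² = f0² + f0²·B·k², which is linear in k², so an ordinary least-squares line fit recovers both parameters. The function name is hypothetical.

```python
import math

def fit_inharmonicity(partials):
    """Estimate (f0, B) from measured partials.

    partials: list of (k, f_k) pairs, with partial number k >= 1 and
    frequency f_k in Hz; at least two distinct partial numbers are needed.
    Assumes the stiff-string model f_k = k * f0 * sqrt(1 + B * k**2).
    """
    xs = [k * k for k, _ in partials]
    ys = [(f / k) ** 2 for k, f in partials]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals f0^2 * B
    intercept = mean_y - slope * mean_x  # equals f0^2
    return math.sqrt(intercept), slope / intercept
```

The fit is only as good as the partial frequencies fed into it, which is why precise peak estimation and a conservative choice of notes matter for this application.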
Application 3: Analysis of Cello Timbre
(Ref: Chudy, Benetos and Dixon, work in progress)Expert human listeners can recognise the “sound” of a performer
Not just the recordingNot just the instrument
Can this be captured in standard timbral features?Temporal: attack time, decay time, temporal centroidSpectral: spectral centroid, spectral deviation, spectral spread,irregularity, tristimuli, odd/even ratio, envelope, MFCCsSpectrotemporal: spectral flux, roughness
In order to estimate features, it is necesary to identify the notes(onset, offset, pitch, etc.)
As a manual task, this is very time consumingConservative transcription solves the problem: high precision, lowrecallSystem is trained on cello samples from RWC database
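Two of the spectral features listed above, spectral centroid and spectral spread, can be sketched for a single frame as follows. Definitions vary between toolboxes (magnitude or power weighting, for instance); the magnitude-weighted form used here is an assumption, as are the function names.

```python
import math

def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency of one spectral frame."""
    total = sum(mags)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs, mags)) / total

def spectral_spread(freqs, mags):
    """Magnitude-weighted standard deviation around the spectral centroid."""
    centroid = spectral_centroid(freqs, mags)
    total = sum(mags)
    variance = sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    return math.sqrt(variance)
```

Computing such features only over frames that lie inside confidently transcribed notes is where the high-precision, low-recall transcription becomes useful.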
Take-home Messages
Automatic transcription enables new musicological questions to be investigated
Many other applications: music education, practice, MIR
The technology is not perfect but music is highly redundant
Shift-invariant PLCA is ideal for instrument-specific polyphonic transcription
Proposed procedure for folk music analysis:
  Extract templates for desired instruments (e.g. using NMF)
  Perform shift-invariant PLCA transcription
  Estimate features of interest
  Interpret results according to estimates obtained
Any Questions?
Acknowledgements
  Emmanouil Benetos, Lesley Mearns, Magdalena Chudy: the PhD students who do all the work
  Dan Tidhar, Matthias Mauch: collaboration on the harpsichord project
  Aggelos, Christina and EURASIP for the invitation