Page 1: Institute of Information Science Academia Sinica 1 Singer Identification and Clustering of Popular Music Recordings Wei-Ho Tsai wesley@iis.sinica.edu.tw

Singer Identification and Clustering of Popular Music Recordings

Wei-Ho Tsai
wesley@iis.sinica.edu.tw

Institute of Information Science, Academia Sinica


Extracting Information From Music

Music Information Retrieval (MIR)
– To develop ways of managing collections of musical material for preservation, access, research, and other uses.

MIR communities & research areas [after Futrelle & Downie, 2002]

Communities
– Computer Science
– Audio Engineering
– Psychology & Philosophy
– Library Science
– Law
– Musicology

Research areas
– Representation
– Indexing
– User Interface Design
– Compression
– Feature Detection
– Machine Learning
– Metadata
– Musical Analysis
– Epistemology & Ontology
– Perception
– Intellectual Property
– Classification


Extracting Voice Information From Music

Viewing MIR from a speech-processing perspective

Speech Processing
– Analysis/Synthesis
– Recognition
  • Speech Recognition: Phone Recognition, Word Recognition, Tone Recognition
  • Speaker Recognition: Speaker Identification, Speaker Verification, Speaker Clustering
  • Language Recognition: Language Identification, Language Verification, Dialect Identification
– Coding

Singing Processing
– Analysis/Synthesis
– Recognition
  • "Singing Recognition": Phone Recognition, Lyric Transcription, Melody Extraction
  • Singer Recognition: Singer Identification, Singer Detection, Singer Clustering
  • Language Recognition: Language Identification, Language Verification, Dialect Identification
– Coding

Speech/Singing/Music/Other Sounds Discrimination


Singer Recognition Tasks (I)

Singer Identification
– Determining who is singing

[Figure: a music recording and four candidate singers, each marked "?". Who performed this music recording?]


Singer Recognition Tasks (II)

Singer Detection
– Determining whether or not a specified singer is present in a music recording

[Figure: Is Coco's voice in this music recording?]


Singer Recognition Tasks (III)

Singer Tracking
– Locating where a specified singer is present in a music recording

[Figure: Where is Coco's voice?]


Singer Recognition Tasks (IV)

Singer Clustering
– Grouping the same-singer music recordings into a cluster

[Figure: clustering by singer]


Potential Applications

Indexing
– Finding cameos or guest appearances in live concert recordings.
– Identifying the singers in a movie’s musical interludes.

Music recommendation systems
– Suggesting music by singers with similar voices.

Karaoke services
– Efficiently organizing the customer’s recordings.
– Personalization.

Copyright protection
– Distinguishing between an original song and a cover-band version.
– Rapidly scanning suspect websites for piracy.


Singer’s Vocal Characteristics

Humans use several levels of perceptual cues for distinguishing among singers

Learned traits
– Characteristics: singing pronunciation and idiosyncrasies. Sources: culture and socio-background
– Characteristics: modulation of pitch, rhythm, speed, intonation, and volume. Sources: personality type

Physical traits
– Characteristics: acoustic aspect of singing, e.g., nasal, deep, breathy, and rough. Sources: anatomical structure of vocal apparatus


Major Challenges In Singer Recognition

The vast majority of popular music contains background accompaniment during most or all vocal passages
– Infeasible to acquire isolated solo voice data for extracting the singer’s vocal characteristics

The proposed solution:

Vocal segment detection followed by solo vocal signal modeling


Vocal/Non-vocal Segmentation

[Block diagram: Music Recording → Feature Extraction (filter bank; Mel-scale Frequency Cepstral Coefficients, MFCCs) → Feature Vectors (one per frame) → Sliding Window (segment) → scoring against the Vocal Model and the Non-vocal Model → Decision → Vocal or Non-vocal]
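The decision stage of the segmentation can be sketched as follows. This is only an illustration under stated assumptions: features are scalars rather than MFCC vectors, each class model is a small 1-D GMM written as (weight, mean, variance) tuples, and the names `gmm_logpdf` and `segment` as well as the window length and hop are hypothetical choices, not values from the slides.

```python
import math

def gmm_logpdf(x, gmm):
    """Log density of scalar x under a 1-D GMM given as (weight, mean, var) tuples."""
    total = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in gmm)
    return math.log(total)

def segment(frames, vocal_gmm, nonvocal_gmm, win=3, hop=3):
    """Label each window of frames by comparing summed log-likelihoods of the two models."""
    labels = []
    for start in range(0, len(frames) - win + 1, hop):
        window = frames[start:start + win]
        ll_vocal = sum(gmm_logpdf(x, vocal_gmm) for x in window)
        ll_nonvocal = sum(gmm_logpdf(x, nonvocal_gmm) for x in window)
        labels.append("vocal" if ll_vocal > ll_nonvocal else "non-vocal")
    return labels

# Toy models and frames (made-up numbers, for illustration only).
vocal_gmm = [(0.5, 2.0, 1.0), (0.5, 4.0, 1.0)]
nonvocal_gmm = [(1.0, -2.0, 1.0)]
frames = [2.1, 3.9, 3.0, -2.2, -1.8, -2.5]
labels = segment(frames, vocal_gmm, nonvocal_gmm)
print(labels)
```

Summing log-likelihoods over a window, instead of deciding frame by frame, smooths the decision; the window size trades temporal resolution against stability.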


Gaussian Mixture Model (I)

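Model description
– The distribution of the feature vector x is represented by a mixture of M component Gaussian densities, i.e.,

$$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$$

  • $\mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$ is the i-th Gaussian density with mean $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$
– A Gaussian mixture model (GMM) is characterized by

$$\lambda = \{ w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i \mid 1 \le i \le M \}$$

[Figure: the feature vector x is fed to the M Gaussian densities N(μ_1, Σ_1), N(μ_2, Σ_2), ..., N(μ_M, Σ_M); their outputs are weighted by w_1, w_2, ..., w_M and summed to give p(x | λ).]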
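As a small worked example of the mixture density, the sketch below evaluates p(x | λ) for a two-component model. It assumes diagonal covariance matrices (the slide allows full covariances), and the function names and parameter values are illustrative only.

```python
import math

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density N(x; mean, diag(var)) for list-valued x."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def gmm_pdf(x, weights, means, variances):
    """Mixture density: the weighted sum of the component Gaussian densities."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Two-component toy model in two dimensions (made-up parameters).
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
p = gmm_pdf([0.0, 0.0], weights, means, variances)
print(p)
```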


Gaussian Mixture Model (II)

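Parameter estimation
– Using the EM algorithm, an initial model $\lambda$ is created, and the new model $\hat{\lambda}$ is then estimated by maximizing the auxiliary function

$$Q(\lambda, \hat{\lambda}) = \sum_{t=1}^{T} \sum_{i=1}^{M} p(i \mid \mathbf{x}_t, \lambda) \log p(\mathbf{x}_t, i \mid \hat{\lambda}),$$

where

$$p(\mathbf{x}_t, i \mid \hat{\lambda}) = \hat{w}_i \, \mathcal{N}(\mathbf{x}_t; \hat{\boldsymbol{\mu}}_i, \hat{\boldsymbol{\Sigma}}_i)$$

and

$$p(i \mid \mathbf{x}_t, \lambda) = \frac{w_i \, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)}{\sum_{m=1}^{M} w_m \, \mathcal{N}(\mathbf{x}_t; \boldsymbol{\mu}_m, \boldsymbol{\Sigma}_m)}$$

– Letting $\nabla Q(\lambda, \hat{\lambda}) = 0$ for each parameter to be re-estimated, we have

$$\hat{w}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid \mathbf{x}_t, \lambda)$$

$$\hat{\boldsymbol{\mu}}_i = \frac{\sum_{t=1}^{T} p(i \mid \mathbf{x}_t, \lambda) \, \mathbf{x}_t}{\sum_{t=1}^{T} p(i \mid \mathbf{x}_t, \lambda)}$$

$$\hat{\boldsymbol{\Sigma}}_i = \frac{\sum_{t=1}^{T} p(i \mid \mathbf{x}_t, \lambda) \, \mathbf{x}_t \mathbf{x}_t'}{\sum_{t=1}^{T} p(i \mid \mathbf{x}_t, \lambda)} - \hat{\boldsymbol{\mu}}_i \hat{\boldsymbol{\mu}}_i'$$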
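The re-estimation formulas can be illustrated with one EM iteration on scalar data. This is a minimal sketch, not the authors' implementation: it assumes 1-D features, and the variance floor and all names (`em_step`, `log_likelihood`) are additions made here for the example.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_step(data, weights, means, variances):
    """One EM iteration for a scalar GMM; returns re-estimated (weights, means, variances)."""
    M, T = len(weights), len(data)
    # E-step: posterior p(i | x_t, lambda) for every frame t and component i.
    post = []
    for x in data:
        joint = [weights[i] * normal_pdf(x, means[i], variances[i]) for i in range(M)]
        total = sum(joint)
        post.append([j / total for j in joint])
    # M-step: posterior-weighted averages, as in the re-estimation formulas.
    new_w, new_m, new_v = [], [], []
    for i in range(M):
        occupancy = sum(post[t][i] for t in range(T))
        mean = sum(post[t][i] * data[t] for t in range(T)) / occupancy
        second = sum(post[t][i] * data[t] ** 2 for t in range(T)) / occupancy
        new_w.append(occupancy / T)
        new_m.append(mean)
        new_v.append(max(second - mean ** 2, 1e-6))  # variance floor: an added safeguard
    return new_w, new_m, new_v

def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of the data under the scalar GMM."""
    return sum(math.log(sum(w * normal_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]          # made-up frames
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])    # initial model
before = log_likelihood(data, *params)
params = em_step(data, *params)
after = log_likelihood(data, *params)
print(before, after)
```

Each iteration should not decrease the log-likelihood, which is a convenient check when implementing the update.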


Distilling Singers’ Voices From Music

Substantial similarities exist between the instrumental regions and the accompaniment of the vocal signal.

Solo voice can be modeled by suppressing the background music estimated from the instrumental regions.

[Figure: Solo Voice + Accompaniments = Accompanied Voice]


Solo Vocal Signal Modeling (I)

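Model description

[Figure: a solo voice S = {s_1, s_2, ..., s_T} (unobservable), modeled by a GMM λ_s, and a background music B = {b_1, b_2, ..., b_T} (unobservable), modeled by a GMM λ_b, are mixed as V = f(S, B) to give an accompanied voice V = {v_1, v_2, ..., v_T} (observable).]

$$\lambda_s = \{ w_{s,i}, \boldsymbol{\mu}_{s,i}, \boldsymbol{\Sigma}_{s,i} \mid 1 \le i \le M \}, \qquad \lambda_b = \{ w_{b,j}, \boldsymbol{\mu}_{b,j}, \boldsymbol{\Sigma}_{b,j} \mid 1 \le j \le N \}$$

$$p(\mathbf{V} \mid \lambda_s, \lambda_b) = \prod_{t=1}^{T} \sum_{i=1}^{M} \sum_{j=1}^{N} w_{s,i} \, w_{b,j} \, p(\mathbf{v}_t \mid i, j, \lambda_s, \lambda_b),$$

$$p(\mathbf{v}_t \mid i, j, \lambda_s, \lambda_b) = \iint_{\mathbf{v}_t = f(\mathbf{s}_t, \mathbf{b}_t)} \mathcal{N}(\mathbf{s}_t; \boldsymbol{\mu}_{s,i}, \boldsymbol{\Sigma}_{s,i}) \, \mathcal{N}(\mathbf{b}_t; \boldsymbol{\mu}_{b,j}, \boldsymbol{\Sigma}_{b,j}) \, d\mathbf{s}_t \, d\mathbf{b}_t.$$

– $\lambda_b$ can be approximately estimated using the instrumental regions of music
– Our aim is to find an optimal $\lambda_s$ such that (in the maximum likelihood sense)

$$\lambda_s^* = \arg\max_{\lambda_s} p(\mathbf{V} \mid \lambda_s, \lambda_b).$$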


Solo Vocal Signal Modeling (II)

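Parameter estimation
– Defining an auxiliary function

$$Q(\lambda_s, \hat{\lambda}_s) = \sum_{t=1}^{T} \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b) \log p(\mathbf{v}_t, i, j \mid \hat{\lambda}_s, \lambda_b),$$

where

$$p(\mathbf{v}_t, i, j \mid \hat{\lambda}_s, \lambda_b) = \hat{w}_{s,i} \, w_{b,j} \, p(\mathbf{v}_t \mid i, j, \hat{\lambda}_s, \lambda_b),$$

$$p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b) = \frac{w_{s,i} \, w_{b,j} \, p(\mathbf{v}_t \mid i, j, \lambda_s, \lambda_b)}{\sum_{m=1}^{M} \sum_{n=1}^{N} w_{s,m} \, w_{b,n} \, p(\mathbf{v}_t \mid m, n, \lambda_s, \lambda_b)}.$$

– Letting $\nabla Q(\lambda_s, \hat{\lambda}_s) = 0$ for each parameter to be re-estimated, we have

$$\hat{w}_{s,i} = \frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b),$$

$$\hat{\boldsymbol{\mu}}_{s,i} = \frac{\sum_{t=1}^{T} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b) \, E\{\mathbf{s}_t \mid \mathbf{v}_t, i, j, \lambda_s, \lambda_b\}}{\sum_{t=1}^{T} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b)},$$

$$\hat{\boldsymbol{\Sigma}}_{s,i} = \frac{\sum_{t=1}^{T} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b) \, E\{\mathbf{s}_t \mathbf{s}_t' \mid \mathbf{v}_t, i, j, \lambda_s, \lambda_b\}}{\sum_{t=1}^{T} \sum_{j=1}^{N} p(i, j \mid \mathbf{v}_t, \lambda_s, \lambda_b)} - \hat{\boldsymbol{\mu}}_{s,i} \hat{\boldsymbol{\mu}}_{s,i}'.$$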


Solo Vocal Signal Modeling (III)

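Re-estimation formulas for linear spectral features
– Suppose V is a linear spectral feature, and S and B are additive in the time domain; then $v_t = s_t + b_t$
– $p(v_t \mid i, j, \lambda_s, \lambda_b)$ is the convolution of the solo and background music densities, i.e.,

$$p(v_t \mid i, j, \lambda_s, \lambda_b) = \int \mathcal{N}(s_t; \mu_{s,i}, \sigma_{s,i}^2) \, \mathcal{N}(v_t - s_t; \mu_{b,j}, \sigma_{b,j}^2) \, ds_t$$

– $E\{s_t \mid v_t, i, j, \lambda_s, \lambda_b\}$ and $E\{s_t^2 \mid v_t, i, j, \lambda_s, \lambda_b\}$ can be shown in the following form:

$$E\{s_t \mid v_t, i, j, \lambda_s, \lambda_b\} = \frac{\sigma_{s,i}^2}{\sigma_{s,i}^2 + \sigma_{b,j}^2} \, (v_t - \mu_{b,j}) + \frac{\sigma_{b,j}^2}{\sigma_{s,i}^2 + \sigma_{b,j}^2} \, \mu_{s,i},$$

$$E\{s_t^2 \mid v_t, i, j, \lambda_s, \lambda_b\} = \frac{\sigma_{s,i}^2 \, \sigma_{b,j}^2}{\sigma_{s,i}^2 + \sigma_{b,j}^2} + \left( E\{s_t \mid v_t, i, j, \lambda_s, \lambda_b\} \right)^2.$$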
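The additive case can be checked numerically. The sketch below assumes scalar features and a single (i, j) component pair. It uses the standard result that the convolution of two Gaussians is a Gaussian with summed means and variances, and compares the closed-form conditional mean with a midpoint-rule integration. All function names and numbers are illustrative.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixed_likelihood(v, mu_s, var_s, mu_b, var_b):
    """p(v | i, j): the convolution of two Gaussians is N(v; mu_s + mu_b, var_s + var_b)."""
    return normal_pdf(v, mu_s + mu_b, var_s + var_b)

def solo_moments(v, mu_s, var_s, mu_b, var_b):
    """Closed-form E[s | v] and E[s^2 | v] when v = s + b."""
    total = var_s + var_b
    mean = var_s / total * (v - mu_b) + var_b / total * mu_s
    second = var_s * var_b / total + mean ** 2
    return mean, second

def numeric_check(v, mu_s, var_s, mu_b, var_b, lo=-30.0, hi=30.0, steps=20000):
    """Midpoint-rule estimates of the convolution integral and of E[s | v]."""
    h = (hi - lo) / steps
    z = first = 0.0
    for k in range(steps):
        s = lo + (k + 0.5) * h
        weight = normal_pdf(s, mu_s, var_s) * normal_pdf(v - s, mu_b, var_b) * h
        z += weight
        first += weight * s
    return z, first / z

closed_mean, closed_second = solo_moments(3.0, 1.0, 2.0, 0.5, 1.0)
numeric_density, numeric_mean = numeric_check(3.0, 1.0, 2.0, 0.5, 1.0)
print(closed_mean, numeric_mean)
print(mixed_likelihood(3.0, 1.0, 2.0, 0.5, 1.0), numeric_density)
```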


Solo Vocal Signal Modeling (IV)

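Re-estimation formulas for cepstral features
– Suppose V is a cepstral feature, and S and B are additive in the time domain; then $v_t = \log[\exp(s_t) + \exp(b_t)]$. We approximate $v_t \approx \max(s_t, b_t)$.
– It can be shown that

$$p(v_t \mid i, j, \lambda_s, \lambda_b) = \mathcal{N}(v_t; \mu_{s,i}, \sigma_{s,i}^2) \, \Phi\!\left( \frac{v_t - \mu_{b,j}}{\sigma_{b,j}} \right) + \mathcal{N}(v_t; \mu_{b,j}, \sigma_{b,j}^2) \, \Phi\!\left( \frac{v_t - \mu_{s,i}}{\sigma_{s,i}} \right),$$

$$\Phi(\varepsilon) = \int_{-\infty}^{\varepsilon} \frac{1}{\sqrt{2\pi}} \, e^{-w^2/2} \, dw.$$

$$E\{s_t \mid v_t, i, j, \lambda_s, \lambda_b\} = p(s_t = v_t \mid i, j, \lambda_s, \lambda_b) \, v_t + \left[ 1 - p(s_t = v_t \mid i, j, \lambda_s, \lambda_b) \right] E\{s_t \mid s_t < v_t, i, j, \lambda_s, \lambda_b\},$$

$$E\{s_t^2 \mid v_t, i, j, \lambda_s, \lambda_b\} = p(s_t = v_t \mid i, j, \lambda_s, \lambda_b) \, v_t^2 + \left[ 1 - p(s_t = v_t \mid i, j, \lambda_s, \lambda_b) \right] E\{s_t^2 \mid s_t < v_t, i, j, \lambda_s, \lambda_b\},$$

$$p(s_t = v_t \mid i, j, \lambda_s, \lambda_b) = \frac{\mathcal{N}(v_t; \mu_{s,i}, \sigma_{s,i}^2) \, \Phi\!\left( \frac{v_t - \mu_{b,j}}{\sigma_{b,j}} \right)}{\mathcal{N}(v_t; \mu_{s,i}, \sigma_{s,i}^2) \, \Phi\!\left( \frac{v_t - \mu_{b,j}}{\sigma_{b,j}} \right) + \mathcal{N}(v_t; \mu_{b,j}, \sigma_{b,j}^2) \, \Phi\!\left( \frac{v_t - \mu_{s,i}}{\sigma_{s,i}} \right)},$$

$$E\{s_t \mid s_t < v_t, i, j, \lambda_s, \lambda_b\} = \mu_{s,i} - \sigma_{s,i}^2 \, \frac{\mathcal{N}(v_t; \mu_{s,i}, \sigma_{s,i}^2)}{\Phi\!\left( \frac{v_t - \mu_{s,i}}{\sigma_{s,i}} \right)}.$$

$$E\{s_t^2 \mid s_t < v_t, i, j, \lambda_s, \lambda_b\} = \mu_{s,i}^2 + \sigma_{s,i}^2 - \sigma_{s,i}^2 \, (v_t + \mu_{s,i}) \, \frac{\mathcal{N}(v_t; \mu_{s,i}, \sigma_{s,i}^2)}{\Phi\!\left( \frac{v_t - \mu_{s,i}}{\sigma_{s,i}} \right)}.$$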
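Under the max approximation, the component likelihood is the density of the maximum of two independent Gaussians. The sketch below evaluates it for scalar features, together with the probability that the solo voice is the dominant term and the resulting conditional mean. It is an illustration only: the function names are hypothetical, and `math.erfc` is used for the normal CDF to keep the tails from underflowing to zero.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def std_normal_cdf(e):
    """Standard normal CDF, written with erfc for accuracy in the lower tail."""
    return 0.5 * math.erfc(-e / math.sqrt(2.0))

def max_mix(v, mu_s, var_s, mu_b, var_b):
    """Return p(v | i, j) for v = max(s, b) and the probability that s = v."""
    sd_s, sd_b = math.sqrt(var_s), math.sqrt(var_b)
    solo_wins = normal_pdf(v, mu_s, var_s) * std_normal_cdf((v - mu_b) / sd_b)
    back_wins = normal_pdf(v, mu_b, var_b) * std_normal_cdf((v - mu_s) / sd_s)
    density = solo_wins + back_wins
    return density, solo_wins / density

def solo_mean(v, mu_s, var_s, mu_b, var_b):
    """E[s | v]: v if the solo voice dominates, else the mean of s truncated to s < v."""
    _, p_solo = max_mix(v, mu_s, var_s, mu_b, var_b)
    sd_s = math.sqrt(var_s)
    truncated = mu_s - var_s * normal_pdf(v, mu_s, var_s) / std_normal_cdf((v - mu_s) / sd_s)
    return p_solo * v + (1.0 - p_solo) * truncated

density, p_solo = max_mix(0.3, 1.0, 1.0, -1.0, 2.0)   # made-up parameters
print(density, p_solo, solo_mean(0.3, 1.0, 1.0, -1.0, 2.0))
```

A quick sanity check is that the density integrates to one over v and that identical solo and background components give a dominance probability of 0.5.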


Singer Identification (SID)

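Block diagram

Training phase
– Training data → vocal/instrumental segmentation
– Instrumental signal B (non-vocal portion) → Gaussian mixture modeling, $\max_{\lambda_b} p(\mathbf{B} \mid \lambda_b)$ → background music model $\lambda_b$
– Accompanied signal V (vocal portion), together with $\lambda_b$ → solo signal modeling, $\max_{\lambda_s} p(\mathbf{V} \mid \lambda_s, \lambda_b)$ → solo model $\lambda_s$

Testing phase
– Test data X → vocal/instrumental segmentation
– Instrumental signal $\tilde{\mathbf{B}}$ → Gaussian mixture modeling, $\max_{\tilde{\lambda}_b} p(\tilde{\mathbf{B}} \mid \tilde{\lambda}_b)$ → background music model $\tilde{\lambda}_b$
– Accompanied signal $\mathbf{X}_V$, scored against the solo models $\lambda_{s,1}, \lambda_{s,2}, \ldots, \lambda_{s,P}$ for P singers → maximum likelihood decision

$$\text{hypothesized singer} = \arg\max_{i} \, p(\mathbf{X}_V \mid \lambda_{s,i}, \tilde{\lambda}_b)$$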
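The maximum likelihood decision of the testing phase can be sketched as below. The example makes simplifying assumptions that are not from the slides: scalar features that are additive (so each component pair contributes a Gaussian with summed means and variances), single-component toy models, and hypothetical names such as `accompanied_loglik` and `identify`.

```python
import math

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def accompanied_loglik(frames, solo_gmm, background_gmm):
    """log p(V | lambda_s, lambda_b) for scalar, additive features (v = s + b)."""
    total = 0.0
    for v in frames:
        p = sum(ws * wb * normal_pdf(v, ms + mb, vs + vb)
                for ws, ms, vs in solo_gmm
                for wb, mb, vb in background_gmm)
        total += math.log(p)
    return total

def identify(frames, solo_models, background_gmm):
    """Return the singer whose solo model maximizes the likelihood, plus all scores."""
    scores = {name: accompanied_loglik(frames, gmm, background_gmm)
              for name, gmm in solo_models.items()}
    return max(scores, key=scores.get), scores

# Toy solo models as (weight, mean, variance) tuples; all values are made up.
solo_models = {
    "singer_A": [(1.0, 1.0, 0.5)],
    "singer_B": [(1.0, 4.0, 0.5)],
}
background = [(1.0, 2.0, 0.5)]      # stands in for the model estimated from the test recording
frames = [5.8, 6.1, 6.3, 5.9]       # vocal frames of the test recording
best, scores = identify(frames, solo_models, background)
print(best, scores)
```

The background model is re-estimated from the instrumental portion of each test recording, so the same solo models can be scored against different accompaniments.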


SID Experiments

Music data
– 200 tracks from Mandarin pop music CDs
– 10 female & 10 male singers
– 5 tracks/singer for training; 5 tracks/singer for testing
– 20-min instrumental-only data for training the non-vocal GMM
– 22.05 kHz sampling rate (down-sampled from 44.1 kHz)

Vocal/Non-vocal segmentation
– 82.3% frame accuracy

[Figure: SID accuracy (in %, axis from 65.0 to 100.0) versus recording length (# frames: 1000, 3000, 6000, 9000, entire), with four curves: GMM with manual segmentation; GMM with automatic segmentation; solo modeling with manual segmentation; solo modeling with automatic segmentation.]


Singer Clustering (I)

Block diagram

[Block diagram: the feature vectors of the N music recordings, x_1, x_2, ..., x_N, each undergo solo modeling to produce the models λ_1, λ_2, ..., λ_N. Log-likelihood computation, L_ij = log p(x_i | λ_j), yields for each recording i the log-likelihoods L_i1, L_i2, ..., L_iN. Each row of log-likelihoods is transformed into a characteristic vector F_i, and vector clustering of F_1, F_2, ..., F_N produces the clusters, e.g., Cluster 1: {x_3, x_7, ...}, Cluster 2: {x_1, x_9, ...}, ..., Cluster M: {x_6, x_10, ...}.]


Singer Clustering (II)

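An example of the characteristic vectors

[Figure: log-likelihoods L and the corresponding characteristic vectors F for recordings ordered by singer (Singer 1 to Singer 5). The legend relates F_{i,j} to L_{i,j} and arg max_k L_{i,k}, with an axis value of 1.0; the exact expression did not survive extraction.]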


Singer Clustering (III)

Determining the number of clusters
– Bayesian Information Criterion (BIC)
  • Measuring how well the model fits a data set, and how simple the model is, specifically

$$\mathrm{BIC}(\Lambda) = \log p(D \mid \Lambda) - \frac{1}{2} \, \gamma \, d \log |D|$$

    d: no. of free parameters in model Λ; |D|: size of the data set D; γ: a penalty factor

– The BIC for a K-clustering is computed by:

$$\mathrm{BIC}(K) = -\sum_{k=1}^{K} \frac{n_k}{2} \log |\Sigma_k| - \frac{1}{2} \, \gamma K \left( M + \frac{1}{2} M (M + 1) \right) \log M$$

    M: total no. of elements; n_k: no. of elements of the cluster k; Σ_k: covariance matrix of the characteristic vectors in the cluster k

– A reasonable number of clusters can be determined by

$$K^* = \arg\max_{1 \le K \le M} \mathrm{BIC}(K)$$

[Figure: a K = 3 clustering compared with a K = 4 clustering. BIC increases or not?]


Singer Clustering Experiments (I)

Music data
– 200 tracks (20 singers; 10 tracks/singer)

Assessment method
– Cluster purity

$$\rho_k = \sum_{p=1}^{P} \frac{n_{kp}^2}{n_k^2},$$

  • ρ_k is the purity of the cluster k, n_k the total no. of recordings in the cluster k, and n_kp the no. of recordings in the cluster k that were performed by singer p
– Average purity

$$\bar{\rho} = \frac{1}{M} \sum_{k=1}^{K} n_k \, \rho_k,$$

  • M is the total no. of recordings, and K the no. of clusters

Examples:

$$\rho = \frac{6^2 + 2^2 + 1^2 + 1^2}{10^2} = 0.42, \qquad \rho = \frac{6^2}{6^2} = 1.0, \qquad \rho = \frac{1^2 + 1^2 + 1^2 + 1^2}{4^2} = 0.25$$
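The purity measures are easy to reproduce in code. The sketch below represents each cluster as a list of per-singer recording counts (a representation chosen here for illustration) and recomputes the three example purities. Treating the three example clusters as one clustering of 20 recordings is an assumption made only to demonstrate the average.

```python
def cluster_purity(counts):
    """Purity of one cluster: sum over singers of (n_kp / n_k) squared."""
    n_k = sum(counts)
    return sum(c * c for c in counts) / (n_k * n_k)

def average_purity(clusters):
    """Average purity: cluster purities weighted by cluster size, divided by M."""
    total = sum(sum(counts) for counts in clusters)
    return sum(sum(counts) * cluster_purity(counts) for counts in clusters) / total

# Per-singer counts for the three example clusters on the slide.
clusters = [[6, 2, 1, 1], [6], [1, 1, 1, 1]]
purities = [cluster_purity(counts) for counts in clusters]
avg = average_purity(clusters)
print(purities, avg)
```

A cluster holding recordings of a single singer has purity 1.0, and purity falls toward 1/P as the recordings spread evenly over P singers.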


Singer Clustering Experiments (II)

Results

[Figure: average purity (0.00 to 1.00) versus no. of clusters (0 to 90) for four configurations: manual segmentation with a 32-mix solo GMM & 8-mix background GMM per recording; manual segmentation with a 32-mix vocal GMM per recording; automatic segmentation with a 24-mix solo GMM & 8-mix background GMM per recording; automatic segmentation with a 24-mix vocal GMM per recording.]

[Figure: BIC (-520 to -40) versus no. of clusters (0 to 50) for 5, 10, 15, and 20 singers, with the appropriate no. of clusters marked.]


Summary

We have
– Separated vocal from non-vocal segments of music;
– Isolated singers’ vocal characteristics from the background music;
– Distinguished singers from one another.

We will
– Handle a wider variety of music data, including duets, trios, chorus, background vocals, or music with multiple simultaneous or non-simultaneous singers;
– Deal with the other problems of voice information retrieval from music, such as lyric transcription and singing language recognition.


To Probe Further (I)

Selected references
– Music information retrieval
  • A. L. Uitdenbogerd, “Music IR: past, present, and future,” Proceedings of International Symposium on Music Information Retrieval, 2000.
  • J. Futrelle and J. S. Downie, “Interdisciplinary communities and research issues in music information retrieval,” Proceedings of International Conference on Music Information Retrieval, pp. 215–221, 2002.
– Artist recognition
  • B. Whitman, G. Flake, and S. Lawrence, “Artist detection in music with Minnowmatch,” Proceedings of IEEE Workshop on Neural Networks for Signal Processing, 2001.
  • A. Berenzweig, D. P. W. Ellis, and S. Lawrence, “Using voice segments to improve artist classification of music,” Proceedings of International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
– Singer identification
  • Y. E. Kim and B. Whitman, “Singer identification in popular music recordings using voice coding features,” Proceedings of International Conference on Music Information Retrieval, pp. 164–169, 2002.
  • C. C. Liu and C. S. Huang, “A singer identification technique for content-based classification of MP3 music objects,” Proceedings of International Conference on Information and Knowledge Management, pp. 438–445, 2002.
  • T. Zhang, “Automatic Singer Identification,” Proceedings of International Conference on Multimedia and Expo, 2003.
  • W. H. Tsai, H. M. Wang, and D. Rodgers, “Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal,” Proceedings of European Conference on Speech Communication and Technology, 2003.
– Singer clustering
  • W. H. Tsai, H. M. Wang, D. Rodgers, S. S. Cheng, and H. M. Yu, “Blind clustering of popular music recordings based on singer voice characteristics,” to appear in Proceedings of International Conference on Music Information Retrieval, 2003.


To Probe Further (II)

General resources
– Important conferences
  • International Conference on Music Information Retrieval
  • International Computer Music Conference
  • IEEE International Conference on Multimedia and Expo
  • ACM International Multimedia Conference
  • International Conference on New Interfaces for Musical Expression
– Organizations
  • International Computer Music Association (http://www.computermusic.org/)
  • The Australasian Computer Music Association (http://www.acma.asn.au/)
  • ACM Multimedia (http://www.acm.org/sigmm/)
  • Acoustical Society of America (http://asa.aip.org/)
– Journals
  • Computer Music Journal (http://www-mitpress.mit.edu/catalog/item/default.asp?ttype=4&tid=15)
  • Journal of New Music Research (http://www.swets.nl/jnmr/jnmr.html)
  • Computing in Musicology (http://www.ccarh.org/publications/books/cm/)
– Useful links
  • http://www.leighsmith.com/Browsers/Cmusic.html
  • http://www2.siba.fi/Kulttuuripalvelut/computers.html