Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals

AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Michael A. Casey
Digital Musics
Dartmouth College, Hanover, NH


Page 1: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Thursday, November 13, 2008
ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals

AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Michael A. Casey
Digital Musics
Dartmouth College, Hanover, NH

Page 2: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Scalable Similarity

• 8M tracks in commercial collection
• PByte of multimedia data
• Require passage-level retrieval (~ 2 bars)
• Require scalable nearest-neighbor methods

Page 3: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Specificity

• Partial track retrieval
• Alternate versions: remix, cover, live, album
• Task is mid-high specificity

Page 4: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Example: remixing

• Original Track
• Remix 1
• Remix 2
• Remix 3

Page 5: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Audio Shingles

A shingle is defined as the concatenation of l frames of m-dimensional features.

• Shingles provide contextual information about features
• Originally used for Internet search engines:
  • Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”. Computer Networks 29(8-13): 1157-1166 (1997)
• Related to N-grams, overlapping sequences of features
• Applied to audio domain by Casey and Slaney:
  • Casey, M., Slaney, M.: “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006
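The shingle definition above can be sketched in a few lines of Python. This is a minimal illustration, not the audioDB implementation: the function name `make_shingles`, the `hop` parameter, and the toy feature values are assumptions introduced here.

```python
from typing import List, Sequence


def make_shingles(frames: Sequence[Sequence[float]], l: int, hop: int = 1) -> List[List[float]]:
    """Concatenate every run of l consecutive frames into one shingle vector.

    Each frame has m features, so each shingle has l * m dimensions.
    `hop` (an assumption of this sketch) is the step between shingle starts.
    """
    if l <= 0 or hop <= 0:
        raise ValueError("l and hop must be positive")
    shingles = []
    for start in range(0, len(frames) - l + 1, hop):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)  # append the m features of this frame
        shingles.append(shingle)
    return shingles


# Toy input: 5 frames with m = 2 features each (made-up values).
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
shingles = make_shingles(frames, l=3)
```

With a hop of one frame, consecutive shingles overlap in l - 1 frames, which is what makes them comparable to N-grams over feature sequences.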

Page 6: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Audio Shingle Similarity

Page 7: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Audio Shingle Similarity

• A query shingle is drawn from a query track {Q}
• The database of audio tracks is indexed by n
• A database shingle is drawn from track n

Shingles are normalized to unit vectors, so the squared Euclidean distance between two shingles is determined by their inner product.

For shingles with M dimensions (M = l·m); m = 12, 20; l = 30, 40
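The equation on this slide did not survive extraction. As a hedged illustration, the sketch below checks the standard identity for unit vectors, namely that squared Euclidean distance equals 2 minus twice the inner product. The helper names and the example vectors are assumptions, and the use of Euclidean distance is inferred from the later slides that speak of squared distances.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two made-up "shingles", normalized to unit length.
q = normalize([3.0, 4.0, 0.0])
a = normalize([1.0, 2.0, 2.0])

lhs = squared_distance(q, a)   # direct computation
rhs = 2.0 - 2.0 * dot(q, a)    # identity for unit vectors
```

A practical consequence is that squared distances between unit shingles always lie between 0 and 4, so a single radius threshold is comparable across tracks.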

Page 8: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

AudioDB: Shingle Nearest Neighbor Search

• Open source: google: “audioDB”
• Management of tracks, sequences, salience
• Automatic indexing parameters
• OMRAS2, Yahoo!, AWAL, CHARM, more…
• Web-services interface (SOAP / JSON)
• Implementation of LSH for large N ~ 1B
• 1-10 ms whole-track retrieval from 1B vectors

Page 9: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

AudioDB: Shingle Nearest Neighbor Search

Page 10: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Whole-track similarity

• Often want to know which tracks are similar
• Similarity depends on specificity of task
  • Distortion / filtering / re-encoding (high)
  • Remix with new audio material (mid)
  • Cover song: same song, different artist (mid)

Page 11: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Whole-track resemblance: radius-bounded search

Compute the number of shingle collisions between two tracks:

Page 12: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Whole-track resemblance: radius-bounded search

Compute the number of shingle collisions between two tracks:

• Requires a threshold for considering shingles to be related
• Need a way to estimate relatedness (threshold) for data set
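The collision formula itself is missing from the extracted slide. One plausible reading, sketched below, counts the (query shingle, track shingle) pairs whose squared distance falls within the threshold. The pair-counting definition, the function names, and the toy vectors are assumptions of this sketch; the slide may have used a different normalization or counted per query shingle.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(
    query_shingles: Sequence[Sequence[float]],
    track_shingles: Sequence[Sequence[float]],
    x_thresh: float,
) -> int:
    """Count shingle pairs whose squared distance is at most x_thresh."""
    return sum(
        1
        for q in query_shingles
        for t in track_shingles
        if squared_distance(q, t) <= x_thresh
    )


# Made-up unit-length shingles for a query track and a database track.
query = [[1.0, 0.0], [0.0, 1.0]]
track = [[1.0, 0.0], [0.6, 0.8], [-1.0, 0.0]]

collisions = count_collisions(query, track, x_thresh=0.5)
```

This brute-force loop costs one distance computation per shingle pair, which is the cost that the indexing discussed later in the talk is meant to avoid.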

Page 13: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Statistical approaches to modeling distance distributions

Page 14: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Distribution of minimum distances

Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sample (1/98 000 000) of the full histogram of all distances.

Page 15: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Radius-bounded retrieval performance: cover song (opus task)

• Performance depends critically on xthresh, the collision threshold

• Want to estimate xthresh automatically from unlabelled data

Page 16: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Order Statistics

• Minimum-value distribution is analytic
• Estimate the distribution parameters
• Substitute into minimum value distribution
• Define a threshold in terms of FP rate
• This gives an estimate of xthresh

Page 17: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Estimating xthresh from unlabelled data

• Use theoretical statistics
• Null Hypothesis:
  • H0: shingles are drawn from unrelated tracks
• Assume elements i.i.d., normally distributed
• M dimensional shingles, d effective degrees of freedom
• Squared distance distribution for H0

Page 18: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

ML for background distribution

• Likelihood for N data points (distances squared)
• d = effective degrees of freedom
• M = shingle dimensionality

Page 19: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Background distribution parameters

• Likelihood for N data points (distances squared)
• d = effective degrees of freedom
• M = shingle dimensionality
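The likelihood expression is not recoverable from the extracted text, so the sketch below does not reproduce the maximum-likelihood fit. Instead it swaps in a simpler method-of-moments estimate, under the assumption that squared distances between unrelated shingles follow a scaled chi-squared distribution with d effective degrees of freedom. The function name, the scale parameter, and the synthetic data are hypothetical, and the role of M in the slide's likelihood is not modelled here.

```python
import random
from typing import Sequence, Tuple


def fit_scaled_chi2(sq_dists: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of sq_dists ~ scale * chi2(d).

    For X = scale * chi2(d): mean = scale * d and variance = 2 * scale**2 * d,
    so d = 2 * mean**2 / variance and scale = variance / (2 * mean).
    Returns (d, scale).
    """
    n = len(sq_dists)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(sq_dists) / n
    var = sum((x - mean) ** 2 for x in sq_dists) / (n - 1)
    if mean <= 0.0 or var <= 0.0:
        raise ValueError("samples must be positive and non-constant")
    return 2.0 * mean * mean / var, var / (2.0 * mean)


# Synthetic "background" squared distances with known parameters.
# scale * chi2(d) is a gamma distribution with shape d/2 and scale 2*scale.
rng = random.Random(0)
true_d, true_scale = 10.0, 0.1
samples = [rng.gammavariate(true_d / 2.0, 2.0 * true_scale) for _ in range(20000)]

d_hat, scale_hat = fit_scaled_chi2(samples)
```

Moment matching is less efficient than maximum likelihood but needs no special functions, which keeps the example dependency-free.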

Page 20: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Minimum value over N samples
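The formula on this slide was lost in extraction. For reference, the standard order-statistics result for the minimum of N independent samples is given below, assuming each sample has cumulative distribution F and density f. The notation is an assumption and may differ from the slide's.

```latex
F_{\min}(x) = \Pr\Bigl\{\min_{1 \le i \le N} X_i \le x\Bigr\}
            = 1 - \bigl(1 - F(x)\bigr)^{N},
\qquad
f_{\min}(x) = N\, f(x)\, \bigl(1 - F(x)\bigr)^{N-1}.
```

The first expression follows because the minimum exceeds x only when all N samples exceed x; the second is its derivative with respect to x.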

Page 21: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Minimum value distribution of unrelated shingles

Page 22: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Estimate of xthresh

xthresh is estimated as a function of the false positive rate.
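The closed-form estimate from the slide is missing, so here is a numerical sketch of the idea under stated assumptions: given a background CDF for squared distances under H0 and a database of N shingles, pick the threshold at which the minimum-value CDF equals the target false positive rate. The function names, the bisection approach, and the toy background distribution (a scaled chi-squared with 2 degrees of freedom, chosen only because it has a closed form) are hypothetical.

```python
import math
from typing import Callable


def min_value_cdf(cdf: Callable[[float], float], x: float, n: int) -> float:
    """CDF of the minimum of n i.i.d. samples: 1 - (1 - F(x))**n."""
    p = cdf(x)
    if p >= 1.0:
        return 1.0
    # log1p/expm1 keep precision when F(x) is tiny and n is huge.
    return -math.expm1(n * math.log1p(-p))


def estimate_xthresh(
    cdf: Callable[[float], float], n: int, fp_rate: float, iters: int = 200
) -> float:
    """Solve min_value_cdf(x) = fp_rate for x >= 0 by bisection."""
    lo, hi = 0.0, 1.0
    while min_value_cdf(cdf, hi, n) < fp_rate:  # grow the bracket if needed
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if min_value_cdf(cdf, mid, n) < fp_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy background model: scale * chi2(2), whose CDF is 1 - exp(-x / (2 * scale)).
scale = 0.1
n_shingles = 1_400_000  # database size quoted on the minimum-distance slide


def toy_cdf(x: float) -> float:
    return -math.expm1(-x / (2.0 * scale))


x_thresh = estimate_xthresh(toy_cdf, n_shingles, fp_rate=0.01)

# Closed form for this toy model, used only to sanity-check the solver.
closed_form = -2.0 * scale * math.log1p(-0.01) / n_shingles
```

In practice the background CDF would come from the fitted distance distribution rather than from this toy model.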

Page 23: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Unlabelled data experiment

• Unlabelled data set
• Known to contain:
  • Cover songs (same work, different performer)
  • Near duplicate recordings (misattribution, encoding)
• Estimate background distance distribution
• Estimate minimum value distribution
• Set xthresh so FP rate is <= 1%
• Whole-track retrieval based on shingle collisions

Page 24: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Cover song retrieval

Page 25: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Scaling

• Locality sensitive hashing
• Trade-off approximate NN for time complexity
• 3 to 4 orders of magnitude speed-up
• No noticeable degradation in performance
• For optimal radius threshold
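To make the trade-off above concrete, here is a toy locality-sensitive hashing index for Euclidean distance, using quantized random Gaussian projections. It is a generic textbook-style sketch and not audioDB's implementation: the class name, the parameters `k`, `tables`, and `w`, and the example vectors are all assumptions.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


class EuclideanLSH:
    """Toy LSH index: k quantized Gaussian projections per table, several tables."""

    def __init__(self, dim: int, k: int = 4, tables: int = 8, w: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.w = w
        # For each table, k (direction, offset) pairs define one hash key.
        self.projections = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w)) for _ in range(k)]
            for _ in range(tables)
        ]
        self.buckets = [defaultdict(list) for _ in range(tables)]
        self.vectors: List[List[float]] = []

    def _key(self, table: int, v: Sequence[float]) -> Tuple[int, ...]:
        return tuple(
            math.floor((sum(a * x for a, x in zip(direction, v)) + offset) / self.w)
            for direction, offset in self.projections[table]
        )

    def add(self, v: Sequence[float]) -> int:
        """Insert v into every table and return its index."""
        idx = len(self.vectors)
        self.vectors.append(list(v))
        for t, table in enumerate(self.buckets):
            table[self._key(t, v)].append(idx)
        return idx

    def query(self, q: Sequence[float], x_thresh: float) -> List[int]:
        """Return indices of colliding vectors within squared distance x_thresh."""
        candidates = set()
        for t, table in enumerate(self.buckets):
            candidates.update(table.get(self._key(t, q), []))
        # Exact check only on the candidates, not on the whole database.
        return sorted(
            i for i in candidates
            if sum((a - b) ** 2 for a, b in zip(self.vectors[i], q)) <= x_thresh
        )


index = EuclideanLSH(dim=3)
for vec in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0]):
    index.add(vec)

hits = index.query([1.0, 0.0, 0.0], x_thresh=0.1)
```

The bucket width `w` plays the role of the radius bound: it has to be chosen relative to the distance threshold, which is why an automatic estimate of that threshold matters for indexing.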

Page 26: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

LSH

Page 27: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Remix retrieval via LSH

Page 28: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Current deployment

• Large commercial collections
  • AWAL ~ 100,000 tracks
  • Yahoo! 2M+ tracks, related song classifier
• AudioDB: open-source, international consortium of developers
• Google: “audioDB”

Page 29: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Conclusions

• Radius-bounded retrieval model for tracks
• Shingles preserve temporal information, high d
• Implements mid-to-high specificity search
• Optimal radius threshold from order statistics
  • Null hypothesis: shingles are drawn from unrelated tracks
• LSH requires radius bound, automatic estimate
• Scales to 1B+ shingles using LSH

Page 30: AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Thanks

• Malcolm Slaney, Yahoo! Research Inc.
• Christophe Rhodes, Goldsmiths, U. of London
• Michela Magas, Goldsmiths, U. of London
• Funding: EPSRC: EP/E02274X/1