Fingerprinting - Audio and Video Detection Systems

Page 1: Fingerprinting - Audio and Video Detection Systems

1/53

Introduction Audio Fingerprinting Video Fingerprinting Conclusion

Fingerprinting: Audio and Video Detection Systems

Speaker: Matteo Pozzetti

Supervisor: Laura Peer

01 April 2015

Overview

1 Introduction

2 Audio Fingerprinting

3 Video Fingerprinting

4 Conclusion

Introduction

Fingerprinting

Content-based fingerprinting:

An algorithm that maps a multimedia object into a compact digital summary (a set of hashes)

Used to retrieve such an object in a database

Applications:

Search engines (Shazam)

Copyright protection (YouTube)

Requirements

A fingerprinting algorithm must be noise resistant and computationally efficient.

Table of Contents

1 Introduction

2 Audio Fingerprinting: Introduction, Hashing, Searching and Scoring, Results

3 Video Fingerprinting

4 Conclusion

Introduction

Process

Goals

The recorded sample should be no more than 15 seconds long

The sample can come from any portion of the original audio track

Robust

Figure: The signal in the time domain (horizontal axis: time in seconds, 0 to 20; vertical axis: normalized amplitude, −1 to 1).

Hashing

Transformations to the frequency domain: Fourier transform

Fourier transform: maps the signal $x(t) : \mathbb{R} \to \mathbb{C}$ into

$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j 2\pi f t}\, dt$$

which decomposes the function into its frequencies.

Short-time Fourier Transform

STFT is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time.

$$x(t) \;\overset{\mathrm{STFT}}{\longmapsto}\; X(\tau, f) = \int_{-\infty}^{\infty} x(t)\, w(t - \tau)\, e^{-j 2\pi f t}\, dt$$
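A discrete version of the transform above can be sketched in a few lines. This is an illustration only: the function name `stft_magnitudes`, the Hann window, and the frame and hop sizes are assumptions that do not come from the slides, and a practical system would use an FFT rather than the naive DFT shown here.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Return |X(tau, f)| for each frame (tau) and each DFT bin (f).

    Discrete counterpart of the STFT integral: the signal is multiplied
    by a window w positioned at each frame, then Fourier-transformed.
    """
    # Hann window (an assumption; any tapering window would do).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2), kept for readability.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A pure tone that completes 4 cycles per 32-sample frame.
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = stft_magnitudes(tone, frame_len=32, hop=16)
# Index of the strongest frequency bin in every frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each row of `spec` is one time slice of the spectrogram; the strongest bins per slice are the kind of peaks a constellation map would keep.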

Constellation and Hashing

Hashing

Each anchor point is sequentially paired with points within its target zone.

$(f_1, f_2, \Delta t) \to \text{hash}$

30-bit hash:

10 bits for the anchor frequency

10 bits for the target frequency

10 bits for the time difference

Each hash is associated with the time offset:

hash:offset = $(f_1, f_2, \Delta t) : t_1$
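The 30-bit layout can be illustrated with plain bit shifts. This is a minimal sketch under stated assumptions: the frequencies and the time difference are taken to be already quantized to integers in 0..1023, the field order (anchor, target, time difference) is one possible choice, and the names `pack_hash`, `unpack_hash` and `hashes_for_anchor` are hypothetical.

```python
def pack_hash(f1, f2, dt):
    """Pack (f1, f2, dt) into a 30-bit integer: 10 bits per field."""
    for value in (f1, f2, dt):
        if not 0 <= value < 1024:
            raise ValueError("each field must fit in 10 bits (0..1023)")
    return (f1 << 20) | (f2 << 10) | dt


def unpack_hash(h):
    """Recover (f1, f2, dt) from a 30-bit hash."""
    return (h >> 20) & 0x3FF, (h >> 10) & 0x3FF, h & 0x3FF


def hashes_for_anchor(anchor, targets):
    """Pair one anchor peak with the peaks in its target zone.

    Peaks are (time, frequency_bin) tuples; returns (hash, t1) pairs,
    where t1 is the time offset of the anchor.
    """
    t1, f1 = anchor
    return [(pack_hash(f1, f2, t2 - t1), t1) for t2, f2 in targets]
```

For example, `hashes_for_anchor((10, 3), [(12, 4)])` pairs an anchor at time 10 with one target two time steps later and stores the result together with the anchor offset 10.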

Searching and Scoring

We take all matching hashes of a given song and plot their time offsets

DB song: h:t

Query song: h:τ

Scoring

Matching features: $t_k = \tau_k + \text{constant}$

For each $(t_k, \tau_k)$ in the scatter plot, we calculate:

$$\Delta_k = t_k - \tau_k$$

Then we plot a histogram of these $\Delta_k$
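The histogram step can be sketched as follows. The function name `score_track` and the representation of a track as a list of (hash, offset) pairs are assumptions made for illustration; the height of the tallest histogram bin is used here as the score of the track.

```python
from collections import Counter


def score_track(db_hashes, query_hashes):
    """Return (best_delta, count) for one database track.

    db_hashes and query_hashes are lists of (hash, offset) pairs.
    """
    # Index the database track: hash -> list of offsets t.
    db_index = {}
    for h, t in db_hashes:
        db_index.setdefault(h, []).append(t)

    # Histogram of delta_k = t_k - tau_k over all matching hashes.
    histogram = Counter()
    for h, tau in query_hashes:
        for t in db_index.get(h, []):
            histogram[t - tau] += 1

    if not histogram:
        return None, 0
    # The tallest bin corresponds to the diagonal in the scatter plot.
    return histogram.most_common(1)[0]
```

A matching track concentrates most of its matches in a single bin, whereas a non-matching track spreads the few accidental matches over many bins.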

Scoring: track that does not match the sample

Figure: No diagonal in the scatter plot.

Scoring: track that matches the sample

Figure: Diagonal in the scatter plot, cluster in the histogram.

Results

The algorithm performs well even with significant levels of noise

For a database of about 20 thousand tracks, the search time is on the order of 5-500 milliseconds.

Table of Contents

1 Introduction

2 Audio Fingerprinting

3 Video Fingerprinting: Introduction, Preprocessing, Discrete cosine transform, 3D-DCT, Fingerprinting using TIRIs, Searching, Results

4 Conclusion

Introduction

Copy detection system

The purpose of this system is to detect copyright violations.

A fingerprint should be

Robust to content-preserving distortions

Discriminant

Easy to compute

Compact

Image and Video Representations

RGB

Colors are represented as a combination of red, green and blue (3D space)

Each component is an integer value in [0, 255]

YUV

Y is the luminance component; U and V are the chrominance components.
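The slides do not give the conversion between the two representations. As an illustration only, the luminance can be computed from RGB with the ITU-R BT.601 weights; the function name `rgb_to_luma` is hypothetical, and the choice of BT.601 is an assumption rather than something stated in the talk.

```python
def rgb_to_luma(r, g, b):
    """Luminance Y of an 8-bit RGB pixel, using ITU-R BT.601 weights."""
    for c in (r, g, b):
        if not 0 <= c <= 255:
            raise ValueError("components must lie in [0, 255]")
    # Green contributes most to perceived brightness, blue least.
    return 0.299 * r + 0.587 * g + 0.114 * b
```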

Preprocessing

Discrete cosine transform

The DCT is similar to the discrete Fourier transform: it transforms a signal or image from the spatial domain to the frequency domain.

It is a widely used tool in image and video processing and compression!

It is possible to extract some relevant information from the frequency representation

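Discrete cosine transform: formula

The two-dimensional DCT of an M-by-N matrix A is defined as follows:

$$B_{pq} = \alpha_p \alpha_q \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} A_{mn} \cos\frac{\pi (2m+1) p}{2M} \cos\frac{\pi (2n+1) q}{2N}$$

$$\alpha_p = \begin{cases} \dfrac{1}{\sqrt{M}}, & p = 0 \\ \sqrt{\dfrac{2}{M}}, & 1 \le p \le M-1 \end{cases} \qquad \alpha_q = \begin{cases} \dfrac{1}{\sqrt{N}}, & q = 0 \\ \sqrt{\dfrac{2}{N}}, & 1 \le q \le N-1 \end{cases}$$

where $0 \le p \le M-1$ and $0 \le q \le N-1$ are the matrix indices.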
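The definition above translates directly into code. The sketch below is a literal, unoptimized transcription meant only to make the indices concrete; the function name `dct2` and the list-of-rows matrix representation are assumptions, and real implementations use fast separable algorithms instead of the four nested loops.

```python
import math


def dct2(a):
    """Two-dimensional DCT of an M-by-N matrix given as a list of rows."""
    m_rows, n_cols = len(a), len(a[0])

    def alpha(index, size):
        # Normalization factors alpha_p and alpha_q from the definition.
        return math.sqrt(1.0 / size) if index == 0 else math.sqrt(2.0 / size)

    b = [[0.0] * n_cols for _ in range(m_rows)]
    for p in range(m_rows):
        for q in range(n_cols):
            total = 0.0
            for m in range(m_rows):
                for n in range(n_cols):
                    total += (a[m][n]
                              * math.cos(math.pi * (2 * m + 1) * p / (2 * m_rows))
                              * math.cos(math.pi * (2 * n + 1) * q / (2 * n_cols)))
            b[p][q] = alpha(p, m_rows) * alpha(q, n_cols) * total
    return b


# A constant image has all of its energy in the (0, 0) coefficient.
flat = dct2([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
```

For a constant input, only the coefficient at p = q = 0 is non-zero, which is why low-frequency coefficients carry the coarse structure of an image.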

Discrete cosine transform: example

3D-DCT

3D Discrete cosine transform

3D-DCT is a DCT applied to a 3-dimensional matrix.

Figure: time domain and frequency domain panels.

Fingerprinting using 3D-DCT

Binary fingerprints are derived by thresholding the low-frequency coefficients of the transform; the threshold is the median value of the selected coefficients.
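The thresholding step can be sketched as below. The function name `binarize` is hypothetical, the selection of the low-frequency coefficients is assumed to have happened already, and mapping a coefficient equal to the median to 1 is an arbitrary choice made for this illustration.

```python
from statistics import median


def binarize(coefficients):
    """Turn selected transform coefficients into a list of bits.

    A bit is 1 when its coefficient is at or above the median of the
    selected coefficients, and 0 otherwise.
    """
    threshold = median(coefficients)
    return [1 if c >= threshold else 0 for c in coefficients]
```

Using the median as the threshold yields roughly as many ones as zeros, whatever the overall scale of the coefficients.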

Fingerprinting using TIRIs

TIRI: Temporally Informative Representative Image

A TIRI contains spatial and temporal information of a short segment of a video sequence.

$$l'_{m,n} = \sum_{k=1}^{J} w_k \, l_{m,n,k}$$

where $l_{m,n,k}$ is the luminance value of the $(m,n)$th pixel of the $k$th frame in a set of $J$ frames and $w_k$ is a weighting factor.

TIRI: weighting factor

Experiments were made with different $w_k$: constant, linear, exponential.

Figure: (d) $w_k = 1$ (constant), (e) $w_k = k$ (linear), (f) $w_k = 1.2^k$ (exponential)
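The weighted sum and the three weighting schemes can be sketched as follows. The names `tiri` and `make_weights` are hypothetical, frames are assumed to be lists of rows of luminance values, and the exponential scheme is read as 1.2 raised to the power k, which is an interpretation of the figure caption.

```python
def tiri(frames, weights):
    """Weighted sum of J luminance frames: l'[m][n] = sum_k w_k * l[m][n][k].

    frames is a list of J frames, each a list of rows of luminance values.
    """
    if len(frames) != len(weights):
        raise ValueError("one weight per frame is required")
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(w * frame[m][n] for w, frame in zip(weights, frames))
             for n in range(cols)]
            for m in range(rows)]


def make_weights(kind, j, gamma=1.2):
    """Weights w_1..w_J for the constant, linear and exponential schemes."""
    if kind == "constant":
        return [1.0] * j
    if kind == "linear":
        return [float(k) for k in range(1, j + 1)]
    if kind == "exponential":
        return [gamma ** k for k in range(1, j + 1)]
    raise ValueError(f"unknown weighting scheme: {kind}")
```

With linear or exponential weights, later frames of the segment contribute more to the representative image than earlier ones, which is how the temporal order leaves a trace in a single image.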

TIRI-DCT

Searching

Searching requirements

After the fingerprint is extracted, the database is searched for the closest fingerprint.

Nearest neighbor problem

Fingerprints of two different copies of the same video are similar but not necessarily identical. We therefore search for the most similar fingerprint in the binary space.

Since the database is very large, the algorithm must be fast. A fast approximate algorithm is preferred over a slow exact one.

Hamming distance

The process used to create binary fingerprints allows us to use the Hamming distance as a metric of similarity.
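For reference, the Hamming distance and the exhaustive search that the approximate methods try to avoid can be sketched as below. Fingerprints are represented as bit strings for readability, and the function names are hypothetical; a production system would more likely pack the bits into integers and count set bits of an XOR.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest(query, database):
    """Exhaustive nearest-neighbor search; returns (index, distance)."""
    distances = [hamming_distance(query, fp) for fp in database]
    best = min(range(len(database)), key=distances.__getitem__)
    return best, distances[best]
```

The exhaustive search compares the query with every stored fingerprint, so its cost grows linearly with the size of the database.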

Inverted-File-Based Similarity search

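A fingerprint of L bits is divided into n words of m bits each:

0110001 (1st word), 0010110 (2nd word), ..., 1110001 ((n − 1)th word), 0010101 (nth word)

m = word length

$n = L / m$

Inverted-file table: the columns are the positions of a word in a fingerprint (1, 2, 3, ..., n − 1, n), the rows are the possible word values (1, 2, ..., $2^m$), and each cell lists the indices of the fingerprints that contain that word at that position. Cell entries legible on the slide: row 1: (1, 44, 250), (10, 14, 99), (21, 250); row 2: (2, 86), (5, 170, 241); row $2^m$: (61, 26), (4).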

To find a query fingerprint in a database:

The fingerprint is divided into n words (of m bits each)

The query is compared with the fingerprints that start with the same word.

If the Hamming distance is below a certain threshold, the match is found.

Otherwise the procedure is repeated with the second word, and so on.
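The steps above can be sketched as follows. This is an illustration under assumptions: fingerprints are bit strings whose length is a multiple of m, the inverted file is a dictionary keyed by (word position, word value), and all function names are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def split_words(fingerprint, m):
    """Split a bit string of length L into n = L / m words of m bits."""
    return [fingerprint[i:i + m] for i in range(0, len(fingerprint), m)]


def build_index(database, m):
    """Map (word position, word value) -> indices of fingerprints."""
    index = defaultdict(list)
    for fp_id, fp in enumerate(database):
        for position, word in enumerate(split_words(fp, m)):
            index[(position, word)].append(fp_id)
    return index


def search(query, database, index, m, threshold):
    """Return the index of the first fingerprint within the threshold, or None."""
    for position, word in enumerate(split_words(query, m)):
        # Only fingerprints sharing this exact word at this position are tested.
        for fp_id in index.get((position, word), []):
            if hamming(query, database[fp_id]) < threshold:
                return fp_id
    return None
```

The search is approximate because a true match is missed when every one of its words differs from the query in at least one bit.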

Cluster-based similarity search

The database is clustered into K non-overlapping groups; each cluster has a centroid called the cluster head.

The cluster head closest to the query fingerprint is found. In the example, each 7-bit word of the query is mapped to 0 or 1, whichever of the all-zeros and all-ones words is closer in Hamming distance:

Word      Bit   Distance
0100001   0     dist0 = 2
0010110   0     dist0 = 3
1010111   1     dist1 = 2
0000001   0     dist0 = 1
1111101   1     dist1 = 1

Cluster head = 00101

All fingerprints belonging to this cluster are searched to find a match

If no match is found, the cluster that is second closest to the query is examined, then the third, and so on
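The example on the slide can be reproduced with a short sketch. The names `cluster_head` and `cluster_search` are hypothetical, the tie-break towards 0 is an arbitrary choice, and representing the clustered database as a dictionary from cluster head to fingerprints is an assumption made for illustration.

```python
def cluster_head(fingerprint, m):
    """Map each m-bit word to 0 or 1, whichever constant word is closer."""
    bits = []
    for i in range(0, len(fingerprint), m):
        word = fingerprint[i:i + m]
        dist0 = word.count("1")  # Hamming distance to 00...0
        dist1 = word.count("0")  # Hamming distance to 11...1
        bits.append("0" if dist0 <= dist1 else "1")
    return "".join(bits)


def cluster_search(query, clusters, m, threshold):
    """Search clusters in order of increasing distance from the query's head.

    clusters maps a cluster head (bit string) to its list of fingerprints.
    """
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    head = cluster_head(query, m)
    # Closest cluster first, then the second closest, and so on.
    for candidate in sorted(clusters, key=lambda h: hamming(head, h)):
        for fp in clusters[candidate]:
            if hamming(query, fp) < threshold:
                return fp
    return None
```

Calling `cluster_head` on the five words of the slide example with m = 7 yields the head 00101 shown above.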

Results

A database of 200 videos was created

Videos were attacked (distorted) to create query videos

Metrics

Recall: the true positive rate. It is a measure of robustness.

$$\text{Recall} = \frac{TP}{TP + FN}$$

Precision: the percentage of correct hits within all detected copies.

$$\text{Precision} = \frac{TP}{TP + FP}$$

F-score: the harmonic mean of precision and recall:

$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
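The three metrics can be computed from raw counts as in the sketch below. The function name `precision_recall_f` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this illustration.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-score) from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```

For example, 90 true positives, 10 false positives and 30 false negatives give a precision of 0.9 and a recall of 0.75.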



Results

F-score per attack:

Attack           3D-DCT    TIRI-DCT
Noise            1.00      0.99
Brightness       1.00      0.99
Contrast         1.00      0.99
Rotation         1.00      1.00
Time shift       0.91      0.98
Spatial shift    1.00      0.98
Frame loss       0.98      0.99
Average          0.99      0.99

Speed

TIRI-DCT is more than 3 times faster than 3D-DCT

Approximate search algorithms are much faster than the exhaustive search when the error rates are low.

The cluster-based approach is faster.


Conclusion


Shazam

The algorithm is noise resistant and efficient. It can identify a short segment of music out of a database of over a million tracks. However, the algorithm is not meant to be resistant to general attacks (e.g. time compression, pitch alterations, ...).

Shazam was founded in 1999 and launched its first identification service in 2002

Shazam now has 400 million users in 200 countries and has been used to identify 15 billion songs

The company is now valued at half a billion dollars



Video fingerprinting

The proposed system is robust and can find a match in a database of tens of millions of fingerprints in a few seconds

It yields a high average true positive rate of 97.6% and a low average false positive rate of 1%.



YouTube testing

Results of Scott Smitelli's tests:

Modification performed         Result
Song reversed                  Pass
Pitch was lowered 6%           Pass
Pitch was lowered 5%           Fail
Pitch was raised 5%            Fail
Pitch was raised 6%            Pass
Time was expanded 6%           Pass
Time was expanded 5%           Fail
Time was compressed 5%         Fail
Time was compressed 6%         Pass
44% noise, 56% song            Fail
45% noise, 55% song            Pass
Volume was reduced 48 dB       Fail
Volume was reduced 18 dB       Fail
Volume was reduced 6 dB        Fail
Volume was increased 6 dB      Fail
Volume was increased 18 dB     Fail
Volume was increased 48 dB     Fail
Slow down 4%                   Pass
Slow down 3%                   Fail
Speed up 3%                    Fail
Speed up 4%                    Fail
Speed up 5%                    Pass


Questions?