
Multimodal pattern matching algorithms and applications

In this presentation I focus on three projects I have been working on during the last year. The first is a novel pattern matching algorithm based on the well-known Dynamic Time Warping (DTW). The presented algorithm can be used to find real-valued subsequences within a longer sequence, without prior knowledge of their start-end points. I have applied the algorithm to the task of acoustic matching, for which I will show some preliminary results. I will then explain a second DTW-based algorithm, which performs an online alignment of two musical pieces. One of the pieces can be input live or retrieved from an audio file, while the second is extracted from an online music video. The online alignment allows the music video to be played in synchrony with the corresponding ambient/recorded audio. Finally, I will talk about video copy detection, which is the task of finding duplicate video segments within a large database. I will explain our multimodal approach, based on audio-visual change-based features.

Page 1: Multimodal pattern matching algorithms and applications

Multimodal pattern matching algorithms and applications

Xavier Anguera, Telefonica Research

Page 2: Multimodal pattern matching algorithms and applications

Outline

• Introduction
• Partial sequence matching
  – U-DTW algorithm
• Music/video online synchronization
  – MuViSync prototype
• Video copy detection

Page 3: Multimodal pattern matching algorithms and applications

Partial Sequence Matching Using an Unbounded Dynamic Time Warping Algorithm

Xavier Anguera, Robert Macrae and Nuria Oliver

Telefonica Research, Barcelona, Spain

Page 4: Multimodal pattern matching algorithms and applications

Proposed challenge

• Given one or several audio signals, we want to find and align recurring acoustic patterns.

Page 5: Multimodal pattern matching algorithms and applications

Proposed challenge

• We could use the ASR/phonetic output and search for symbol repetitions
  PROS:
  – It is easy to apply; the ASR takes care of any time warping
  CONS:
  – ASR is language dependent and requires training
  – We introduce additional sources of error (acoustic conditions, OOVs)
  – It can be very slow and not embeddable
• Automatic motif discovery directly in the speech signal
  – Training-free, language independent and resilient to some noise

[Diagram labels: ASR/phonetization, symbolic representation, symbol alignment, acoustic alignment; outputs: alignment locations and scores]

Page 6: Multimodal pattern matching algorithms and applications

Areas of application

• Improve ASR by disambiguation over several repetitions (Park and Glass, 2005)
• Pattern-based speech recognition, flat modelling (Zweig and Nguyen, 2010)
• Acoustic summarization (Muscariello, 2009)
• Musical structure analysis (Müller, 2007)
• Server-less mobile voice search (Anguera, 2010)

Page 7: Multimodal pattern matching algorithms and applications

Automatic motif discovery

• The goal is to avoid going to text and therefore be more robust to errors
• A good deal of applicable work exists in this area:
  – Biomedicine, in matching DNA sequences (converting the speech signals into symbol strings)
  – Directly from real-valued multidimensional samples using DTW-like algorithms
    • Müller’07, Muscariello’09, Park’05, Zweig’10
    • Most need to compute the whole cost matrix a priori

Page 8: Multimodal pattern matching algorithms and applications

Dynamic Time Warping (DTW)

• The DTW algorithm allows the computation of the optimal alignment between two time series $X_U, X_V \in \Phi^D$:

$$X_U = (u_1, \ldots, u_m, \ldots, u_M)$$

$$X_V = (v_1, \ldots, v_n, \ldots, v_N)$$

[Figure: image by Daniel Lemire]

Page 9: Multimodal pattern matching algorithms and applications

Dynamic Time Warping (II)

• The optimal alignment can be found with O(MN) complexity using dynamic programming.
• We need to define a cost function between any two elements in the series and build a distance matrix:

$$d : \Phi^D \times \Phi^D \rightarrow \mathbb{R}_{\geq 0}$$

  where usually $d(u_m, v_n) = \lVert u_m - v_n \rVert$ (Euclidean distance).

• Warping function: $F = c(1), \ldots, c(K)$, where $c(k) = (i(k), j(k))$

[Figure: image by Tsanko Dyustabanov]

Page 10: Multimodal pattern matching algorithms and applications

Warping constraints

For speech signals, some constraints are usually applied to the warping function F:

– Monotonicity:

$$i(k-1) \leq i(k), \qquad j(k-1) \leq j(k)$$

– Continuity (i.e. local constraints):

$$i(k) - i(k-1) \leq 1, \qquad j(k) - j(k-1) \leq 1$$

$$D(m,n) = \min \begin{cases} D(m-1,n) \\ D(m,n-1) \\ D(m-1,n-1) \end{cases} + d(u_m, v_n)$$

[Diagram labels: (m, n), (m−1, n−1), (m−1, n)]

Sakoe, H. and Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. on Acoust., Speech, and Signal Process., ASSP-26, 43-49.
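The recurrence above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `dtw`, the 1-based path coordinates and the pluggable `dist` argument are choices made here for clarity.

```python
import math

def dtw(u, v, dist):
    """Standard DTW between sequences u and v.

    Returns (total cost, warping path); the path holds 1-based (m, n) pairs
    and obeys the boundary, monotonicity and continuity constraints.
    """
    M, N = len(u), len(v)
    if M == 0 or N == 0:
        raise ValueError("both sequences must be non-empty")
    # D[m][n] is the accumulated cost; row/column 0 act as an infinite border.
    D = [[math.inf] * (N + 1) for _ in range(M + 1)]
    D[0][0] = 0.0
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            best_prev = min(D[m - 1][n], D[m][n - 1], D[m - 1][n - 1])
            D[m][n] = best_prev + dist(u[m - 1], v[n - 1])
    # Backtrack from (M, N) to (1, 1) following the cheapest predecessor.
    m, n = M, N
    path = [(m, n)]
    while (m, n) != (1, 1):
        candidates = [
            (D[m - 1][n - 1], m - 1, n - 1),
            (D[m - 1][n], m - 1, n),
            (D[m][n - 1], m, n - 1),
        ]
        _, m, n = min(candidates)
        path.append((m, n))
    path.reverse()
    return D[M][N], path

cost, path = dtw([1, 2, 3], [1, 2, 2, 3], lambda a, b: abs(a - b))
```

Both nested loops visit every cell once, which is where the O(MN) complexity comes from.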

Page 11: Multimodal pattern matching algorithms and applications

Warping constraints (II)

– Boundary condition:

$$i(1) = 1, \quad j(1) = 1, \quad i(K) = M, \quad j(K) = N$$

  i.e. DTW needs prior knowledge of the start-end alignment points.

– Global constraints

[Figure: image from Keogh and Ratanamahatana]

Page 12: Multimodal pattern matching algorithms and applications

DTW Dynamic Programming

Page 16: Multimodal pattern matching algorithms and applications

DTW main problem

• The boundary condition constrains the time series to be aligned from start to end
  – We need a modification to DTW that allows common pattern discovery in reference and query signals regardless of the sequences’ other content

Page 17: Multimodal pattern matching algorithms and applications

Alternative proposals

• Meinard Müller’s path extraction for music
  – Needs to pre-compute the complete cost matrix.
• Alex Park’s Segmental DTW
  – Needs to pre-compute the complete cost matrix; very computationally expensive afterwards.
• Armando Muscariello’s word discovery algorithm
  – Searches for patterns locally; does not check all possible starting points.

[1] M. Müller, “Information Retrieval for Music and Motion”, Springer, New York, USA, 2007.
[2] A. Park et al., “Towards unsupervised pattern discovery in speech,” in Proc. ASRU’05, Puerto Rico, 2005.
[3] A. Muscariello et al., “Audio keyword extraction by unsupervised word discovery,” in Proc. INTERSPEECH’09, 2009.

Page 18: Multimodal pattern matching algorithms and applications

Unbounded-DTW Algorithm

• U-DTW is a modification to DTW that is fast and accurate in finding recurring patterns
• We call it unbounded because:
  – The start-end positions of both segments are not constrained
  – Multiple matching segments can be found with a single pass of the algorithm
  – It minimizes the computational cost of comparing two multidimensional time series

Page 19: Multimodal pattern matching algorithms and applications

U-DTW cost function and matching length

• Given two sequences to be matched, $U = (u_1, u_2, \ldots, u_M)$ and $V = (v_1, v_2, \ldots, v_N)$, we use the inner product similarity

$$s(m,n) = \cos\theta = \frac{\langle u_m, v_n \rangle}{\lVert u_m \rVert \, \lVert v_n \rVert}$$

  Values range in [−1, 1]; the higher, the closer.
• We look for matching sequences with a minimum length L_min (set at 400 ms in our experiments)
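As a small illustration of the similarity above, the sketch below computes the cosine between two feature vectors and converts the minimum length into frames. The 10 ms frame step is taken from the experimental setup later in the deck; the function names and the handling of all-zero vectors are assumptions made for this example.

```python
import math

def cosine_similarity(u_m, v_n):
    """Inner-product similarity s(m, n) between two feature vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u_m, v_n))
    norm_u = math.sqrt(sum(a * a for a in u_m))
    norm_v = math.sqrt(sum(b * b for b in v_n))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat an all-zero frame as uncorrelated
    return dot / (norm_u * norm_v)

def min_length_in_frames(l_min_ms=400, frame_step_ms=10):
    """Minimum match length expressed in feature frames."""
    return l_min_ms // frame_step_ms
```

With these defaults a match must span at least 40 frames.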

Page 20: Multimodal pattern matching algorithms and applications

U-DTW global/local constraints

• No global constraints are applied, in order to allow matching of any segment between both sequences
• Local constraints are set to allow warping up to 2X:

$$D(m,n) = \max \begin{cases} D(m-1,n-2) \\ D(m-1,n-1) \\ D(m-2,n-1) \end{cases} + s(m,n)$$

[Diagram: local paths into (m, n) from (m−1, n−2), (m−1, n−1) and (m−2, n−1)]

Page 21: Multimodal pattern matching algorithms and applications

U-DTW computational savings

• Computational savings are achieved because:
  1. We sample the distance/similarity matrix at certain possible matching start points (setting synchronization points)
  2. Dynamic programming is done forward, pruning out low-similarity paths

Page 22: Multimodal pattern matching algorithms and applications

Synchronization points

• Only certain (m,n) positions in the matrix are analyzed for possible matching segments
  – Selected so as not to lose any matching segment
  – Optimize the computational cost
• Two methods are followed: horizontal and diagonal bands

[Figure: synchronization points on the U × V matrix for both methods; labels: τh, τd, λ, π/4, 2τh, (m,n)]

Page 23: Multimodal pattern matching algorithms and applications

U-DTW Dynamic Programming

Page 24: Multimodal pattern matching algorithms and applications

Forward dynamic programming

• For each position (m,n), 3 possible forward paths are considered: (m+1, n+2), (m+1, n+1) and (m+2, n+1)
• A forward path is extended to (m′, n′) if and only if:
  – Its normalized global similarity is above a pruning threshold:

$$S(m',n') = \frac{D(m,n) + s(m',n')}{M(m,n) + 1} \geq Thr_{prun}$$

  – $S(m',n')$ is greater than that of any previous path at that location
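The two extension rules can be sketched as follows for a single synchronization point. This is an illustrative reading of the slide, not the published implementation: it assumes that M(m,n) is the length of the best path reaching (m,n), that the similarity matrix `sim` is already available as a list of lists, and the names `forward_pass` and `FORWARD_STEPS` are hypothetical.

```python
FORWARD_STEPS = ((1, 2), (1, 1), (2, 1))  # local constraints: warping up to 2X

def forward_pass(sim, start, thr_prun):
    """Extend paths forward from one synchronization point.

    sim[m][n] is the frame similarity s(m, n). Returns a dict mapping each
    reached cell to (accumulated similarity D, path length M, predecessor).
    """
    rows, cols = len(sim), len(sim[0])
    m0, n0 = start
    best = {(m0, n0): (sim[m0][n0], 1, None)}
    # Every step increases m, so a row is final before the next row is visited.
    for m in range(m0, rows):
        for n in range(n0, cols):
            if (m, n) not in best:
                continue
            acc, length, _ = best[(m, n)]
            for dm, dn in FORWARD_STEPS:
                m2, n2 = m + dm, n + dn
                if m2 >= rows or n2 >= cols:
                    continue
                acc2 = acc + sim[m2][n2]
                score = acc2 / (length + 1)      # normalized similarity S(m', n')
                if score < thr_prun:
                    continue                      # rule 1: prune low-similarity paths
                old = best.get((m2, n2))
                if old is None or score > old[0] / old[1]:
                    best[(m2, n2)] = (acc2, length + 1, (m, n))  # rule 2
    return best
```

Cells that never appear in the returned dictionary were pruned, so their similarities would not need to be computed at all in a lazy implementation; this is the source of the computational savings.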

Page 25: Multimodal pattern matching algorithms and applications

U-DTW Dynamic Programming

Page 27: Multimodal pattern matching algorithms and applications

Backward path algorithm

• When a possible matching segment is found in the forward path, the same is done backwards, starting from the originating synchronization point (SP) position.
• The same procedure is followed as in the forward path, using the steps (m−1, n−2), (m−1, n−1) and (m−2, n−1).

Page 28: Multimodal pattern matching algorithms and applications

U-DTW Dynamic Programming

Page 30: Multimodal pattern matching algorithms and applications

Computational savings example

[Figure: example with the word “Barcelona” on both axes]

Page 31: Multimodal pattern matching algorithms and applications

Experimental setup

• We asked 23 people to record 47 words from 6 categories (nature, cities, people, events, family, monuments), 5 iterations each:

$$X_{U,V}[n,i], \quad i = 1 \ldots 5, \quad n = 1 \ldots 47$$

• Simple energy-based trimming eliminates non-speech regions
• We simulate acoustic context by attaching different start-end audio sequences to $X_{U,V}$.

Page 32: Multimodal pattern matching algorithms and applications

Experimental setup (II)

• Signals are parameterized with 10 MFCCs every 10 ms
• Each word $X_U$ is compared to all words $X_V$ from the same speaker (234 comparisons) and the closest one is retrieved:

$$\arg\min_{m,j} D(X_U[n,i], X_V[m,j]) \quad \text{subject to} \quad (n,i) \neq (m,j)$$

  We get a hit if m = n, and a miss otherwise.
• Tests were performed on an Ubuntu Linux PC @ 2.4 GHz.

Page 33: Multimodal pattern matching algorithms and applications

Comparing systems

• Standard DTW
  – Compares the sequences without any added acoustic context (i.e. with prior knowledge of start-end points)
• Segmental DTW (Park and Glass, 2005)
  – Minimum segment length of 500 ms
  – Band size of 70 ms, 50% overlap
  – Used 2 distances: Euclidean and 1 − inner product

Page 34: Multimodal pattern matching algorithms and applications

Performance evaluation

Metrics used:

– Accuracy: percentage of words correctly matched ($X_U$ and $X_V$ are different iterations of the same word).

$$Acc = \frac{\sum \text{correct matches}}{\text{all matches}} \cdot 100$$

– Average processing time per sequence pair ($X_U$, $X_V$), excluding parameterization.

$$Time = \frac{\sum time(D(X_U[n,i], X_V[m,j]))}{\#\text{matches}} \cdot 100$$

– Average ratio of frame-pair distances computed within each sequence-pair cost matrix.

$$Ratio = \frac{\sum computed(d(X_U[n,i], X_V[m,j]))}{MN} \cdot 100$$

Page 35: Multimodal pattern matching algorithms and applications

Results

Algorithm                      Accuracy   Avg. time   Ratio
Segmental DTW w/ Eucl.         80.61%     82.7 ms     1
Segmental DTW w/ inner prod.   74.62%     86.7 ms     1
U-DTW horiz. bands             89.53%     10.6 ms     0.51
U-DTW diag. bands              89.34%     9.0 ms      0.42
Standard DTW                   95.42%     0.6 ms      1

Page 36: Multimodal pattern matching algorithms and applications

Effect of the Cutout Threshold

Page 37: Multimodal pattern matching algorithms and applications

Conclusions and future work

• We propose a novel algorithm called U-DTW for unconstrained pattern discovery in speech
• We show it is faster and more accurate than existing alternatives
• We are starting to test the algorithm for unrestricted audio summarization

Page 38: Multimodal pattern matching algorithms and applications

MuViSync: Audiovisual Music Synchronization

Xavier Anguera, Robert Macrae and Nuria Oliver

Page 39: Multimodal pattern matching algorithms and applications

People enjoy listening to their favorite music everywhere…

…on the go, …

…at home, …

…or at a party with friends

Page 40: Multimodal pattern matching algorithms and applications

Users increasingly have a personal mp3 music collection…

…but it usually contains ‘only’ music.

What if you could watch the video clip of any of your songs while listening to it?

Page 41: Multimodal pattern matching algorithms and applications

You could go to sites like YouTube…

…but the audio quality is much worse than in your mp3…

What if you could listen to your high quality mp3 music while watching the video clips?

Page 42: Multimodal pattern matching algorithms and applications

MuViSync: Music and Video Synchronization system

MuViSync synchronizes audio and video from two different sources and plays them together in sync.

[Diagram labels: Personal Music, Video clip, streaming, local, MuViSync]

Page 43: Multimodal pattern matching algorithms and applications

Application scenarios

• Watch your favorite music on TV
  – Personal music synchronization with video clips, either local or streamed
• Watch your music on your iPhone
  – Personal music synchronization by streaming the video to the iPhone
• Identify and watch any music
  – Combined with songID technology, either at home or on the go.

Page 44: Multimodal pattern matching algorithms and applications

MuViSync application

• We have developed a prototype application for Windows/Mac, and soon for iPhone.

Page 45: Multimodal pattern matching algorithms and applications

Alignment algorithm requirements

• Perform an alignment between the mp3 music and the video’s audio track
• Initially only partial knowledge is available from both sources (live recording or buffering)
• Alignment has to be done online and in real time
• Emphasis is needed on user satisfaction when playing the video.

Page 46: Multimodal pattern matching algorithms and applications

Application testbed

• We use 320 music videos (YouTube) + their corresponding mp3 files
• A supervised ground-truth alignment was performed using offline DTW and checking for consistency
• Audio is processed every 100 ms (200 ms window) and chroma features are extracted

Page 47: Multimodal pattern matching algorithms and applications

MuViSync online alignment algorithm

1. Initial path discovery
   – Both signals (audio and video) are buffered, features are extracted and an initial alignment is found
2. Real-time online alignment
   – An incremental alignment is computed
3. Alignment post-processing to ensure a smooth playback of the aligned video.

[Diagram labels: Audio + feats extraction, Feats extraction, 1) Initial path discovery, 2) Real-time alignment, ta, tv, alignment]

Page 48: Multimodal pattern matching algorithms and applications

Initial path discovery (online mp3 playback + video buffering)

[Figure labels: Audio available from the video; Audio from the mp3 file; Video buffering end; Sync request]

Page 49: Multimodal pattern matching algorithms and applications

Initial path discovery

• A segment of the audio and the buffered video are checked for alignment using forward-DTW
• The global similarity D(m,n) at each location (m,n) is normalized by the length of the optimum path to that location
• At each step, all paths with D′(m,n) < D_ave(*,n) are pruned.
• The initial alignment is selected when only one path survives or the sync time is reached.

Page 50: Multimodal pattern matching algorithms and applications

Initial path discovery

[Figure axes: audio available from the video; audio being played from mp3. Label: audio time alignment buffer (about 1 s)]

Page 54: Multimodal pattern matching algorithms and applications

Real-time online alignment

• Starting from the initial alignment we iteratively compute:
  1. The locally optimum forward path for L steps, p1…pL, using local constraints a) (no dynamic programming)
  2. Backward (standard) DTW from pL to p1 using local constraints b)
  3. Add the initial L/2 steps to the final path, and restart step 1 from pL/2 until the playback ends
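The three steps above can be sketched as a loop over a cost matrix. Several things here are assumptions made for illustration: the local constraints a) and b) are only shown graphically on the slides, so the sketch uses the moves (1,1), (1,2), (2,1) for the greedy forward path and the standard three-neighbour DTW steps for the backward pass; it also assumes the full cost matrix is available, whereas the online system only sees it incrementally. All function names are hypothetical.

```python
import math

def greedy_forward(cost, start, n_steps, moves=((1, 1), (1, 2), (2, 1))):
    """Step 1: follow the locally cheapest move for n_steps (no dynamic programming)."""
    rows, cols = len(cost), len(cost[0])
    m, n = start
    path = [start]
    for _ in range(n_steps):
        options = [(cost[m + dm][n + dn], m + dm, n + dn)
                   for dm, dn in moves if m + dm < rows and n + dn < cols]
        if not options:
            break
        _, m, n = min(options)
        path.append((m, n))
    return path

def box_dtw(cost, p_start, p_end):
    """Step 2: standard DTW restricted to the rectangle between p_start and p_end."""
    (m0, n0), (m1, n1) = p_start, p_end
    h, w = m1 - m0 + 1, n1 - n0 + 1
    D = [[math.inf] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            prev = 0.0 if i == 0 and j == 0 else min(
                D[i - 1][j] if i else math.inf,
                D[i][j - 1] if j else math.inf,
                D[i - 1][j - 1] if i and j else math.inf)
            D[i][j] = cost[m0 + i][n0 + j] + prev
    i, j = h - 1, w - 1
    path = [(m1, n1)]
    while (i, j) != (0, 0):                      # backtrack from pL to p1
        options = []
        if i and j:
            options.append((D[i - 1][j - 1], i - 1, j - 1))
        if i:
            options.append((D[i - 1][j], i - 1, j))
        if j:
            options.append((D[i][j - 1], i, j - 1))
        _, i, j = min(options)
        path.append((m0 + i, n0 + j))
    path.reverse()
    return path

def online_align(cost, start, L=8):
    """Step 3: keep the first half of each refined path and restart from its midpoint."""
    final, point = [], start
    while True:
        forward = greedy_forward(cost, point, L)
        if len(forward) < 2:                     # reached the edge of the matrix
            final.append(point)
            return final
        refined = box_dtw(cost, forward[0], forward[-1])
        half = max(1, len(refined) // 2)
        final.extend(refined[:half])
        point = refined[half]
```

Keeping only the first half of each refined segment means every committed point has been checked against later context, at the price of a latency of roughly L/2 steps.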

Page 55: Multimodal pattern matching algorithms and applications

Real-time online alignment

[Figure axes: audio available from the video; audio being played from mp3]

Page 56: Multimodal pattern matching algorithms and applications

Real-time online alignment

1) Forward locally best path with L = 8

[Figure axes: audio available from the video; audio being played from mp3. Labels: p1, pL]

Page 57: Multimodal pattern matching algorithms and applications

Real-time online alignment

2) Standard DTW

[Figure axes: audio available from the video; audio being played from mp3. Labels: p1, pL]

Page 58: Multimodal pattern matching algorithms and applications

Real-time online alignment

3) Move the new starting point forward

[Figure axes: audio available from the video; audio being played from mp3. Label: p1]

Page 59: Multimodal pattern matching algorithms and applications

Alignment postprocessing

• Alignment estimates every 100 ms are not enough to drive 25/30 fps video
• An interpolation of the points + averaging over 5 seconds gives the projection estimate for current playback
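One plausible reading of this step is sketched below; the slides do not give the exact formula, so the details are assumptions. The audio-to-video offset of the alignment points in the last 5 seconds is averaged, and that smoothed offset is applied to the current audio time, which yields a video position for any frame time between the 100 ms estimates. The function name `projected_video_time` is hypothetical.

```python
def projected_video_time(points, t_audio, window=5.0):
    """Estimate the video position (seconds) matching audio time t_audio.

    points: (audio_time, video_time) alignment estimates in seconds.
    Only estimates from the last `window` seconds are used.
    """
    recent = [(a, v) for a, v in points if t_audio - window <= a <= t_audio]
    if not recent:
        raise ValueError("no alignment estimates inside the averaging window")
    mean_offset = sum(v - a for a, v in recent) / len(recent)
    return t_audio + mean_offset

# Estimates every 100 ms with the video 0.5 s ahead of the audio.
points = [(i / 10, i / 10 + 0.5) for i in range(100)]
video_t = projected_video_time(points, 3.03)   # query between two estimates
```

At 25 fps the player would call this once per frame (every 40 ms) and seek or adjust playback speed towards the returned position.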

Page 60: Multimodal pattern matching algorithms and applications

Experiments

• We use 320 videos + mp3 files, aligned using offline DTW and manually checked for consistency.
• Accuracy is computed as the percentage of songs with an average error below a given number of milliseconds.

[Figure: average accuracy @ 100 ms for different video buffer lengths]

Page 61: Multimodal pattern matching algorithms and applications

Experiments  

Page 62: Multimodal pattern matching algorithms and applications

Video Duplicate Detection

Xavier Anguera and Pere Obrador

Page 63: Multimodal pattern matching algorithms and applications

Let’s say you’re looking for the Bush attack video…

Page 64: Multimodal pattern matching algorithms and applications

…and you get 11,100 results.

Page 65: Multimodal pattern matching algorithms and applications

…after 40 minutes...

watching many of the returned videos, you notice that many are similar, i.e. near duplicates:

– 27% on average on YouTube [Wu et al., 2007]
– 12% on average on YouTube [Anguera et al., 2009]

Page 66: Multimodal pattern matching algorithms and applications

Near duplicate (NDVC) definition

• Identical or approximately identical videos that differ in some feature:
  – file formats, encoding parameters
  – photometric variations (color, lighting changes)
  – overlays (caption, logo, audio commentary)
  – editing operations (frames added/removed)
  – semantic similarity

NDVC are videos that are “essentially the same”

Page 67: Multimodal pattern matching algorithms and applications

Near duplicates (NDVC) vs. video copies

• These two concepts are not clearly distinguished in the literature.
• Video copy: exact video segment, with some transformations applied to it
• Near duplicate: similar videos on the same topic (different viewpoints, semantically similar videos, …)

In our research we address video copy detection.

Page 68: Multimodal pattern matching algorithms and applications

Examples of video copies

Page 69: Multimodal pattern matching algorithms and applications

Use scenarios: copyright law enforcement

Detection of copyright-infringing videos on online video sharing sites

In a recent study we found that on average 12% of search results on YouTube are copies of the same video

Page 70: Multimodal pattern matching algorithms and applications

Use scenarios: video forensics for illegal activities

Currently police forces usually have to manually scroll through ALL materials in pederasty cases searching for evidence.

Discover illegal content hidden within other videos

Page 71: Multimodal pattern matching algorithms and applications

Use scenarios: database management

Database management/optimization and help in searches over historic contents

Video excerpts used several times

Page 72: Multimodal pattern matching algorithms and applications

Use scenarios: advertisement detection and management

Advertisement detection/identification

Programming analysis

Page 73: Multimodal pattern matching algorithms and applications

Use scenarios: information overload reduction

Improved (more diverse) video search results by clustering all video duplicates.

[Figure: search results for “George Bush” before clustering and after clustering]

Page 74: Multimodal pattern matching algorithms and applications

Steps in video duplicate detection

1. Indexing of the reference videos
   A. Obtain features representing the video
   B. Store these features in a scalable manner
2. Search of queries within the reference set

[Diagram labels: OFFLINE: Ref videos, Feature extraction, References indexing, Features Database. ONLINE: Query video, Feature extraction, Search for duplicates]

Page 75: Multimodal pattern matching algorithms and applications

Ways to approach near-duplicate video detection

• Local features
  – Extracted from selected frames in the videos
  – Focus on local characteristics within those frames
• Global features
  – Extracted from selected frames or from the whole video
  – Focus on overall characteristics

Page 76: Multimodal pattern matching algorithms and applications

Local features

• They come from previous knowledge on image copy detection/near-duplicate detection
• Steps:
  – Keyframes are first extracted from the videos at regular intervals or by detecting shots
  – Local features are obtained for these keyframes:
    • SIFT
    • SURF
    • HARRIS
    • …

Page 77: Multimodal pattern matching algorithms and applications

Global features

• Features are extracted either from the whole video or from keyframes by looking at the overall image (not at particular points).

In our work we extract them from the whole video.

Page 78: Multimodal pattern matching algorithms and applications

Multimodal video copy detection

• Most works use only video/image information
  – They prefer local features for their robustness
• We introduce audio information by combining global features from both the audio and video tracks
• We are also experimenting with fusing local features with global features (work in progress)

Page 79: Multimodal pattern matching algorithms and applications

Multimodal global features

• We use features based on the changes in the data, which are more robust to transformations
• Video:
  – Hue + saturation interframe change
  – Lightest and darkest centroid interframe distance
• Audio:
  – Bayesian information criterion (BIC) between adjacent segments
  – Cross-BIC between adjacent segments
  – Kullback-Leibler divergence (KL2) between adjacent segments

Page 80: Multimodal pattern matching algorithms and applications

Hue + saturation interframe change

1. Transform the colorspace from RGB to HSV (Hue + Saturation + Value)

Page 81: Multimodal pattern matching algorithms and applications

Hue + saturation interframe change

2. Compute the HS histogram of each pair of consecutive frames and compute their intersection.

[Formula shown as an image on the original slide]
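Since the slide’s formula is not available, the sketch below uses the common histogram-intersection definition (sum of bin-wise minima of two normalized histograms) as a stand-in; it should not be read as the exact expression used in the work. Frames are assumed to be lists of (R, G, B) tuples with 8-bit values, and the bin counts are arbitrary illustrative choices.

```python
import colorsys

def hs_histogram(frame, hue_bins=8, sat_bins=4):
    """Normalized hue/saturation histogram of a frame given as (R, G, B) tuples."""
    if not frame:
        raise ValueError("frame has no pixels")
    hist = [0.0] * (hue_bins * sat_bins)
    for r, g, b in frame:
        # Step 1: RGB -> HSV; the value channel is discarded.
        h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hi = min(int(h * hue_bins), hue_bins - 1)
        si = min(int(s * sat_bins), sat_bins - 1)
        hist[hi * sat_bins + si] += 1.0
    return [count / len(frame) for count in hist]

def histogram_intersection(h1, h2):
    """Step 2: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def hs_change_stream(frames):
    """One intersection value per pair of consecutive frames."""
    hists = [hs_histogram(f) for f in frames]
    return [histogram_intersection(a, b) for a, b in zip(hists, hists[1:])]
```

Because only the change between frames is kept, a global color shift applied to the whole video leaves the stream largely unaffected.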

Page 82: Multimodal pattern matching algorithms and applications

Lightest and darkest centroid interframe distance

1. Find the lightest and darkest regions in each frame and obtain their centroids

Page 83: Multimodal pattern matching algorithms and applications

Lightest and darkest centroid interframe distance

2. We compute the Euclidean distance between the centroids of each two adjacent frames, obtaining two global feature streams
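A minimal sketch of these two steps follows. How the lightest and darkest regions are delimited is not stated on the slides, so this example simply takes the brightest and darkest fraction of pixels of a grayscale frame; that choice, the `fraction` parameter and the function names are assumptions.

```python
import math

def extreme_centroids(gray, fraction=0.1):
    """Return (lightest_centroid, darkest_centroid) as (row, col) pairs.

    gray: 2-D list of luminance values for one frame.
    """
    pixels = sorted((value, r, c)
                    for r, row in enumerate(gray)
                    for c, value in enumerate(row))
    k = max(1, int(len(pixels) * fraction))

    def centroid(selection):
        return (sum(p[1] for p in selection) / len(selection),
                sum(p[2] for p in selection) / len(selection))

    return centroid(pixels[-k:]), centroid(pixels[:k])

def centroid_streams(frames, fraction=0.1):
    """Two feature streams: interframe distance of the lightest and darkest centroids."""
    cents = [extreme_centroids(f, fraction) for f in frames]
    light = [math.dist(a[0], b[0]) for a, b in zip(cents, cents[1:])]
    dark = [math.dist(a[1], b[1]) for a, b in zip(cents, cents[1:])]
    return light, dark
```

`math.dist` requires Python 3.8 or later.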

Page 84: Multimodal pattern matching algorithms and applications

Acoustic features

• Compute some acoustic distance between adjacent acoustic segments

[Diagram: adjacent segments A and B, modelled by GMM A, GMM B and GMM A+B]

Page 85: Multimodal pattern matching algorithms and applications

Acoustic features (II)

• Likelihood-based metrics:
  – Bayesian Information Criterion (BIC)
  – Cross-BIC
• Model distance metrics:
  – Kullback-Leibler divergence (KL2)
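To give an idea of how a likelihood-based metric behaves, the sketch below computes the classic ΔBIC between two adjacent segments. It is a deliberate simplification: the slides model each segment with a GMM, whereas this example fits a single one-dimensional Gaussian per segment, and `penalty_weight` (the usual λ) is an assumed tuning parameter.

```python
import math

def variance(samples):
    """Maximum-likelihood variance, floored to avoid log(0)."""
    mean = sum(samples) / len(samples)
    return max(sum((x - mean) ** 2 for x in samples) / len(samples), 1e-10)

def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between two adjacent 1-D segments (lists of floats).

    Positive values favour two separate models (an acoustic change);
    negative values favour a single model for A+B.
    """
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    gain = 0.5 * (n * math.log(variance(seg_a + seg_b))
                  - n_a * math.log(variance(seg_a))
                  - n_b * math.log(variance(seg_b)))
    n_params = 2                       # mean and variance of a 1-D Gaussian
    penalty = 0.5 * n_params * math.log(n)
    return gain - penalty_weight * penalty
```

Sliding this computation along the audio track yields one value per segment boundary, i.e. a change-based feature stream.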

Page 86: Multimodal pattern matching algorithms and applications

Acoustic features (III)

• For example, the Bayesian Information Criterion (BIC) output:

[Figure: example BIC output]

Page 87: Multimodal pattern matching algorithms and applications

Search for full copies

• For each video-query pair we compute the correlation of each feature pair
• We then find the positions with high similarity (peaks).

[Diagram: reference → FFT and possible copy → FFT; the two are multiplied (X), followed by IFFT → find peaks]
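The FFT → multiply → IFFT → peak chain can be sketched with a small radix-2 FFT written with the standard library only. This is an illustration under assumptions: feature streams are plain lists of floats, the query is no longer than the reference, no normalization of the streams is applied, and the helper names are hypothetical. A production system would use an optimized FFT library instead.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def cross_correlate(reference, query):
    """corr[lag] = sum over n of reference[n + lag] * query[n]."""
    size = 1
    while size < len(reference) + len(query):   # zero-pad to avoid wrap-around
        size *= 2
    ref_f = fft([complex(v) for v in reference] + [0j] * (size - len(reference)))
    qry_f = fft([complex(v) for v in query] + [0j] * (size - len(query)))
    product = [r * q.conjugate() for r, q in zip(ref_f, qry_f)]
    return [c.real / size for c in fft(product, inverse=True)]

def best_delay(reference, query):
    """Lag at which the query best matches the reference."""
    corr = cross_correlate(reference, query)
    n_lags = len(reference) - len(query) + 1    # query fully inside the reference
    return max(range(n_lags), key=lambda lag: corr[lag])

delay = best_delay([0, 0, 1, 3, 2, 0, 0, 0], [1, 3, 2])
```

The frequency-domain product costs O(n log n) instead of the O(n²) of a direct sliding dot product, which matters when the reference set holds many hours of video.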

Page 88: Multimodal pattern matching algorithms and applications

Multimodal fusion

• When multiple modalities are available, fusion is performed on the correlations

Page 89: Multimodal pattern matching algorithms and applications

Output score

• The resulting score is computed as the weighted sum of the different modalities’ normalized dot products at the found peak
• Automatic weights are obtained via [formula shown as an image on the original slide]

Page 90: Multimodal pattern matching algorithms and applications

Finding subsegments of the query

• The previously described algorithm assumes that the whole query matches a portion of the reference videos
• To avoid this restriction, a modification to the algorithm first splits the query into overlapping 20 s segments
• By accumulating the resulting peaks for each segment we can obtain the main delay and its segment
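The accumulation step can be pictured as a vote over delays. The sketch below assumes that each query segment has already produced one correlation peak, expressed as a position in the reference; the voting scheme, the `tolerance` parameter and the function name are illustrative choices rather than the method’s actual details.

```python
from collections import Counter

def main_delay(segment_peaks, tolerance=1):
    """Find the dominant delay and the query span that supports it.

    segment_peaks: list of (segment_start, peak_position_in_reference),
    both in the same time unit (e.g. feature frames).
    Returns (delay, (first_supporting_start, last_supporting_start)).
    """
    if not segment_peaks:
        raise ValueError("no peaks to accumulate")
    votes = Counter(peak - start for start, peak in segment_peaks)
    delay, _count = votes.most_common(1)[0]
    support = [start for start, peak in segment_peaks
               if abs((peak - start) - delay) <= tolerance]
    return delay, (min(support), max(support))

# Three consecutive segments agree on a delay of 100; the fourth is an outlier.
delay, span = main_delay([(0, 100), (10, 110), (20, 120), (30, 500)])
```

Segments belonging to the same copied excerpt vote for the same delay, so the span of supporting segments delimits the copied part of the query.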

Page 91: Multimodal pattern matching algorithms and applications

Algorithm performance evaluation

• To test the algorithm we used the MUSCLE-VCD database:
  – Over 100 hours of reference videos from the SoundVision group (Netherlands)
  – 2 test sets
    • ST1: 15 query videos where the whole query is considered
    • ST2: 3 videos with 21 segments appearing in the reference database

http://www-roc.inria.fr/imedia/civr-bench/benchMuscle.html

Page 92: Multimodal pattern matching algorithms and applications

MUSCLE-VCD transformation examples

Page 93: Multimodal pattern matching algorithms and applications

Evaluation metrics

• We use the same metrics as in the MUSCLE-VCD benchmark tests

Page 94: Multimodal pattern matching algorithms and applications

Evaluation metrics (II)

• We also use the more standard precision and recall metrics
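For reference, precision and recall over a set of detections can be computed as below. The identifiers in the usage example are made up, and returning 0.0 for empty sets is a convention chosen here.

```python
def precision_recall(detected, relevant):
    """Precision and recall of a set of detected copies against the ground truth."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query/reference pair identifiers.
p, r = precision_recall({"q1-r7", "q2-r3", "q3-r9"},
                        {"q2-r3", "q3-r9", "q4-r1", "q5-r2"})
```

Here two of the three detections are correct (precision 2/3) and two of the four true copies are found (recall 1/2).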

Page 95: Multimodal pattern matching algorithms and applications

Evaluation results

Page 96: Multimodal pattern matching algorithms and applications

Evaluation results histogram for ST1

Page 97: Multimodal pattern matching algorithms and applications

YouTube reranking application

• We downloaded all videos searching for the top 20 most viewed and 20 most visited videos

Page 98: Multimodal pattern matching algorithms and applications

YouTube reranking application

• We applied multimodal copy detection and grouped all near duplicates

Page 99: Multimodal pattern matching algorithms and applications

YouTube reranking test

• Results show how some videos have multiple clear copies that can boost their ranking once clustered

Page 100: Multimodal pattern matching algorithms and applications

Thanks for your attention

xanguera@tid.es

LinkedIn: http://es.linkedin.com/in/xanguera
Twitter: http://twitter.com/xanguera
Website: http://www.xavieranguera.com/