17
1 1 Lecture 16: Earth-Mover Distance COMS E6998-9 F15

Lecture 16: Earth-Mover Distance - MIT

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

11

Lecture 16:

Earth-Mover Distance

COMS E6998-9 F15

Administrivia, Plan

ā€¢ Administrivia:

ā€“ NO CLASS next Tuesday 11/3 (holiday)

ā€¢ Plan:

ā€“ Earth-Mover Distance

ā€¢ Scriber?

2

Earth-Mover Distance

ā€¢ Definition:

ā€“ Given two sets š“, šµ of points in a metric space

ā€“ šøš‘€š·(š“, šµ) = min cost bipartite matching between š“ and šµ

ā€¢ Which metric space?

ā€“ Can be plane, ā„“2, ā„“1ā€¦

ā€¢ Applications in image vision

Images courtesy of Kristen Grauman

Embedding EMD into ā„“1

ā€¢ Why ā„“1?

ā€¢ At least as hard as ā„“1

ā€“ Can embed 0,1 š‘‘ into EMD with distortion 1

ā€¢ ā„“1 is richer than ā„“2

ā€¢ Will focus on integer grid Ī” 2:

4

Embedding EMD into ā„“1

ā€¢ Theorem: Can embed EMD over Ī” 2 into ā„“1 with distortion š‘‚ log Ī” . In fact, will construct a randomized š‘“: 2 Ī” 2

ā†’ ā„“1 such that:ā€“ for any š“, šµ āŠ‚ Ī” 2:

šøš‘€š· š“, šµ ā‰¤ š‘¬ ||š‘“ š“ āˆ’ š‘“ šµ ||1 ā‰¤ š‘‚ log Ī” ā‹… šøš‘€š·(š“, šµ)

ā€“ time to embed a set of š‘  points: š‘‚ š‘  logĪ” .

ā€¢ Consequences:ā€“ Nearest Neighbor Search: š‘‚(š‘ logĪ” ) approximation with

š‘‚(š‘ š‘›1+1/š‘) space, and š‘‚(š‘›1/š‘ ā‹… š‘  log Ī”) query time.

ā€“ Computation: š‘‚(logĪ”) approximation in š‘‚ š‘  logĪ” timeā€¢ Best known: 1 + šœ– approximation in š‘‚ š‘  time [ASā€™12]

[Charikarā€™02, Indyk-Thaperā€™03]

What if š“ ā‰  |šµ| ?

ā€¢ Suppose:

ā€“ š“ = š‘Ž

ā€“ šµ = š‘ < š‘Ž

ā€¢ Define

šøš‘€š·āˆ† š“, šµ = Ī” š‘Ž āˆ’ š‘ + š‘šš‘–š‘›š“ā€²,šœ‹

š‘Žāˆˆš“ā€²

š‘‘(š‘Ž, šœ‹ š‘Ž )

where

š“ā€² ranges over all subsets of š“ of size š‘

šœ‹: š“ā€² ā†’ šµ ranges over all 1-to-1 mappings

For optimal š“ā€², call š‘Ž āˆˆ š“\š“ā€² unmatched

6

Embedding EMD over small gridā€¢ Suppose = 3

ā€¢ š‘“(š“) has nine coordinates, counting # points in each integer point

ā€“ š‘“(š“) = (2,1,1,0,0,0,1,0,0)

ā€“ š‘“(šµ) = (1,1,0,0,2,0,0,0,1)

ā€¢ Claim: 2 2 distortion embedding

High level embedding

ā€¢ Set in Ī” 2 box

ā€¢ Embedding of set š“:ā€“ take a quad-tree

ā€¢ grid of cell size Ī”/3ā€¢ partition each cell in 3x3

ā€¢ recurse until of size 3x3

ā€“ randomly shift it

ā€“ Each cell gives a

coordinate:

š‘“ (š“)š‘=#points in the

cell š‘

ā€¢ Want to proveš‘¬ š‘“ š“ āˆ’ š‘“ šµ

1ā‰ˆ šøš‘€š·(š“, šµ)

8

2 2

1 0

02 11

1

0 00

0 0 0

0

0 2

21

š‘“ š‘Ø = ā€¦2210ā€¦0002ā€¦0011ā€¦0100ā€¦0000ā€¦

š‘“ š‘© = ā€¦1202ā€¦0100ā€¦0011ā€¦0000ā€¦1100ā€¦

Main idea: intuition

ā€¢ Decompose EMD over []2 into EMDs over smaller grids

ā€¢ Recursively reduce to = š‘‚(1)

+ā‰ˆ

Decomposition Lemma

ā€¢ For randomly-shifted cut-grid šŗ of side length š‘˜, will prove:1) šøš‘€š·(š“, šµ) ā‰¤ šøš‘€š·š‘˜(š“1, šµ1) + šøš‘€š·š‘˜(š“2, šµ2) + ā‹Æ

+ š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ, šµšŗ)

2) šøš‘€š·Ī” š“, šµ ā‰„1

3š‘¬[šøš‘€š·š‘˜ š“1, šµ1 + šøš‘€š·š‘˜ š“2, šµ2 + ā‹Æ]

3) šøš‘€š·Ī” š“, šµ ā‰„ š‘¬[š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ)]

ā€¢ The distortion will

follow by applying the lemma

recursively to (AG,BG)

/š‘˜

š‘˜

1 (lower bound)ā€¢ Claim 1: for a randomly-shifted cut-grid šŗ of side length š‘˜:

šøš‘€š·(š“, šµ) ā‰¤ šøš‘€š·š‘˜(š“1, šµ1) + šøš‘€š·š‘˜(š“2, šµ2) + ā‹Æ+ š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ)

ā€¢ Construct a matching šœ‹ for šøš‘€š·Ī” š“, šµ from the matchings on RHS as follows

ā€¢ For each š‘Žš“ (suppose š‘Žš“š‘–) it is either:1) matched in šøš‘€š·(š“š‘– , šµš‘–) to some š‘šµš‘– (if š‘Ž āˆˆ š“š‘–ā€²)

ā€¢ then šœ‹ š‘Ž = š‘

2) or š‘Ž āˆ‰ š“š‘–ā€², and then it is matched

in šøš‘€š·(š“šŗ , šµšŗ) to some š‘šµš‘— (š‘— ā‰  š‘–)ā€¢ then šœ‹ š‘Ž = š‘

ā€¢ Cost? 1) paid by šøš‘€š·(š“š‘– , šµš‘–)2) Move š‘Ž to center ()

ā€¢ Charge to šøš‘€š·(š“š‘– , šµš‘–)

Move from cell š‘– to cell š‘—ā€¢ Charge š‘˜ to šøš‘€š·(š“šŗ , šµšŗ)

ā€¢ If š“ > |šµ|, extra |š“| āˆ’ |šµ|

pay š‘˜ ā‹…Ī”

š‘˜= Ī” on LHS & RHS

/š‘˜

š‘˜

2 & 3 (upper bound)

ā€¢ Claims 2,3: for a randomly-shifted cut-grid šŗ of side length š‘˜, we have:

2) šøš‘€š·Ī” š“, šµ ā‰„1

3š‘¬[šøš‘€š·š‘˜ š“1, šµ1 + šøš‘€š·š‘˜ š“2, šµ2 + ā‹Æ]

3) šøš‘€š·Ī” š“, šµ ā‰„ š‘¬[š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ)]

ā€¢ Fix a matching šœ‹ minimizing šøš‘€š·Ī”(š“, šµ)ā€“ Will construct matchings for each EMD on RHS

ā€¢ Uncut pairs š‘Ž, š‘ āˆˆ šœ‹ are matched in respective (š“, šµ)ā€¢ Cut pairs š‘Ž, š‘ āˆˆ šœ‹ :

ā€“ are unmatched in their mini-grids

ā€“ are matched in (š“šŗ , šµšŗ)

3: Cost

ā€¢ Claim 2: ā€¢ 3 ā‹… šøš‘€š·Ī” š“,šµ ā‰„ š‘¬[šøš‘€š·š‘˜ š“1, šµ1 + šøš‘€š·š‘˜ š“2, šµ2 + ā‹Æ]

ā€¢ Uncut pairs (š‘Ž, š‘) are matched in respective (š“š‘– , šµš‘–)ā€“ Total contribution from uncut pairs ā‰¤ šøš‘€š·Ī”(š“, šµ)

ā€¢ Consider a cut pair (š‘Ž, š‘) at distance š‘Ž āˆ’ š‘ = (š‘‘š‘„ , š‘‘š‘¦)ā€“ (š‘Ž, š‘) can contribute to RHS as they may be unmatched in their

own mini-grids

ā€“ Pr[(š‘Ž, š‘) cut] = 1 āˆ’ 1 āˆ’š‘‘š‘„

š‘˜ +1 āˆ’

š‘‘š‘¦

š‘˜ +ā‰¤

š‘‘š‘„

š‘˜+

š‘‘š‘¦

š‘˜ā‰¤

1

š‘˜||š‘Ž āˆ’ š‘||2

ā€“ Expected contribution of (š‘Ž, š‘) to RHS:ā‰¤Pr[(š‘Ž, š‘) cut] ā‹… 2š‘˜ ā‰¤ 2 š‘Ž āˆ’ š‘

2

ā€“ Total expected cost contributed to RHS:2 ā‹… šøš‘€š·Ī”(š“, šµ)

ā€¢ Total (cut & uncut pairs): 3 ā‹… šøš‘€š·Ī”(š“, šµ)

š‘‘š‘„

š‘˜

3: Cost

ā€¢ Claim:

ā€“ šøš‘€š·Ī” š“, šµ ā‰„ š‘¬[š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ)]

ā€¢ Uncut pairs: contribute zero to RHS!

ā€¢ Cut pair: š‘Ž, š‘ āˆˆ šœ‹ with š‘Ž āˆ’ š‘ = (š‘‘š‘„, š‘‘š‘¦)

ā€“ if š‘‘š‘„ = š‘„š‘˜ + š‘Ÿš‘˜, and š‘‘š‘¦ = š‘¦š‘˜ + š‘Ÿš‘¦ , then

ā€“ expected cost contribution to š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ):

ā‰¤ š‘„ +š‘Ÿš‘„

š‘˜ā‹… š‘˜ + š‘¦ +

š‘Ÿš‘¦

š‘˜ā‹… š‘˜ = š‘‘š‘„ + š‘‘š‘¦ = š‘Ž āˆ’ š‘

2

ā€¢ Total expected cost ā‰¤ šøš‘€š·Ī”(š“, šµ)

14

š‘‘š‘„

š‘˜ š‘˜ š‘˜

Recurse on decompositionā€¢ For randomly-shifted cut-grid šŗ of side length š‘˜, we have:

1) šøš‘€š·(š“, šµ) ā‰¤ šøš‘€š·š‘˜(š“1, šµ1) + šøš‘€š·š‘˜(š“2, šµ2) + ā‹Æ

+ š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ, šµšŗ)

2) šøš‘€š·Ī” š“, šµ ā‰„1

3š‘¬[šøš‘€š·š‘˜ š“1, šµ1 + šøš‘€š·š‘˜ š“2, šµ2 + ā‹Æ]

3) šøš‘€š·Ī” š“, šµ ā‰„ š‘¬[š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ , šµšŗ)]

ā€¢ We applying decomposition recursively for š‘˜ = 3ā€“ Choose randomly-shifted cut-grid šŗ1 on []2

ā€“ Obtain many grids [3]2, and a big grid [/3]2

ā€“ Then choose randomly-shifted cut-grid šŗ2 on [/3]2

ā€“ Obtain more grids [3]2, and another big grid [/9]2

ā€“ Then choose randomly-shifted cut-grid šŗ3 on [/9]2

ā€“ ā€¦

ā€¢ Then, embed each of the small grids [3]2 into ā„“1, using š‘‚(1)distortion embedding, and concatenate the embeddingsā€“ Each 3 2 grid occupies 9 coordinates on ā„“1 embedding

15

Proving recursion worksā€¢ Claim: embedding contracts distances by š‘‚ 1 :

šøš‘€š·(š“, šµ) ā‰¤

ā‰¤ āˆ‘š‘– šøš‘€š·š‘˜ š“š‘–, šµš‘– + š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ1, šµšŗ1

)

ā‰¤ āˆ‘š‘– šøš‘€š·š‘˜ š“š‘–, šµš‘– + š‘˜āˆ‘š‘– šøš‘€š·š‘˜ š“šŗ1,š‘–, šµšŗ1,š‘–

+š‘˜ ā‹… šøš‘€š· Ī”

š‘˜2š“šŗ2

, šµšŗ2

ā‰¤ ā€¦ā‰¤ sum of šøš‘€š·3 costs of 3 Ɨ 3 instances

ā‰¤1

2 2||š‘“ š“ āˆ’ š‘“ šµ ||1

ā€¢ Claim: embedding distorts distances by š‘‚ log Ī” in expectation:3 logš‘˜ Ī” ā‹… šøš‘€š·(š“, šµ)

ā‰„ 3 ā‹… šøš‘€š·(š“, šµ) + 3 logš‘˜Ī”

š‘˜ā‹… šøš‘€š·Ī”(š“, šµ)

ā‰„ š„[ āˆ‘š‘– šøš‘€š·š‘˜(š“š‘–, šµš‘–) + 3 logš‘˜Ī”

š‘˜ā‹… š‘˜ ā‹… šøš‘€š·Ī”/š‘˜(š“šŗ1

, šµšŗ1) ]

ā‰„ ā‹Æā‰„ sum of šøš‘€š·3 costs of 3 Ɨ 3 instances

ā‰„ ||š‘“ š“ āˆ’ š‘“ šµ ||1

Final theoremā€¢ Theorem: can embed EMD over [Ī”]2 into ā„“1

with š‘‚(log Ī”) distortion in expectation.

ā€¢ Notes:ā€“ Dimension required: š‘‚(Ī”2), but a set š“ of size š‘ 

maps to a vector that has only š‘‚(š‘  ā‹… log Ī”) non-zero coordinates.

ā€“ Time: can compute in š‘‚(š‘  ā‹… log )ā€“ By Markovā€™s, itā€™s š‘‚(log Ī”) distortion with 90%

probability

ā€¢ Applications:ā€“ Can compute šøš‘€š·(š“, šµ) in time š‘‚(š‘  ā‹… log Ī”)ā€“ NNS: š‘‚(š‘ ā‹… log Ī”) approximation, with š‘‚(š‘›1+1/š‘ ā‹…

š‘ ) space, and š‘‚(š‘›1/š‘ ā‹… š‘  ā‹… log Ī”) query time.