
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS

TECHNICAL REPORT OF IEICE.


Local feature description for keypoint matching

Mitsuru AMBAI†, Takahiro HASEGAWA††, and Hironobu FUJIYOSHI††

† Denso IT Laboratory, Inc.  Shibuya CROSS TOWER 28th Floor, 2–15–1 Shibuya, Shibuya-ku, Tokyo, 150–0002 Japan

†† Chubu University  1200 Matsumoto, Kasugai, Aichi, 487–8501 Japan
E-mail: †[email protected], ††{tkhr@vision., hf@}cs.chubu.ac.jp

Abstract  The task of finding physically identical points among multiple images captured from different viewpoints is called keypoint matching, and a discriminative local feature representation is essential for it. We categorize previously proposed methods into three groups: (1) real-valued feature representation, (2) binary feature representation, and (3) feature representation by deep learning. In this paper we briefly overview the classic real-valued feature representations and focus especially on the recent binary feature representations. In addition, we introduce a new trend that utilizes deep learning for nonlinear feature representation. Open-source software, datasets, and implementation techniques are also reviewed.
Key words  Binary features, deep learning, keypoint matching

1. Introduction

Local features are a powerful tool that makes image matching possible, and they have contributed to dramatic progress in a wide range of applications such as object recognition, 3D reconstruction, and image retrieval. A local feature is computed in two stages: keypoint detection and feature description. The key design goal of both stages is robustness against geometric changes (rotation, scale, affine deformation) and illumination changes, and many local features have been proposed, beginning with the Scale-Invariant Feature Transform (SIFT) [20]. This report does not deal with keypoint detection; it focuses on how the descriptor itself is represented.

Looking at recent trends, feature representations for keypoint matching can be divided roughly into the following three groups:

(1) Methods that represent a local feature as a real-valued vector (1999–)
(2) Methods that represent a local feature as a binary vector (2010–)


(3) Methods that learn the local feature representation with deep learning (2015–)

Category (1) has already been covered by many tutorials and survey articles, so this report reviews it only briefly and concentrates on category (2), the binary feature representations that appeared between 2010 and 2015, roughly a decade after SIFT. In addition, we introduce category (3), feature learning based on Convolutional Neural Networks, which appears to be the coming trend, including methods presented as recently as ICCV 2015. The remainder of this report is organized as follows. Section 2 briefly reviews real-valued descriptors such as SIFT. Section 3 introduces binary feature representations, and Sections 4 and 5 describe representative binary descriptors in detail. Section 6 summarizes them in a unified form, Section 7 covers deep-learning-based methods, and Section 8 presents implementation know-how and datasets. An earlier version of this survey was presented at the November 2013 CVIM meeting [40]; see also [42].

(Note 1) Binary features are also referred to as binary descriptors or binary codes in the literature.

2. Real-valued feature representation

The most representative real-valued descriptor is SIFT (Fig. 1). SIFT detects keypoints as extrema of a Difference of Gaussian (DoG) pyramid and describes the region around each keypoint as a 128-dimensional vector of gradient-orientation histograms [41]. Many descriptors derived from SIFT have been proposed, including GLOH [21], PCA-SIFT [18], SURF [7], RIFF [33], and ASIFT [22]; a survey of this line of work up to 2010 can be found in [41]. Representative examples are SIFT [20], SURF [7], PCA-SIFT [18], and GLOH [21].

2.1 Problems of SIFT and SURF

Real-valued descriptors such as SIFT and SURF are widely used, for example in large-scale image retrieval [23], [25], but they have two practical problems: memory consumption and matching cost.

First, a SIFT descriptor has 128 dimensions. Even when each dimension is stored as a 1-byte unsigned char, one descriptor occupies 128 bytes, so an image with 1,000 keypoints already requires

1,000 × 128 byte ≈ 125 Kbyte.

Second, the distance between two descriptors x_1, x_2 ∈ R^D is usually measured by the Euclidean distance d_E in Eq. (1) or the angle d_θ in Eq. (2):

d_E(x_1, x_2) = √( (x_1 − x_2)^T (x_1 − x_2) ),   (1)

d_θ(x_1, x_2) = arccos( x_1^T x_2 / (|x_1||x_2|) ).   (2)

Evaluating either distance for D-dimensional vectors requires on the order of D multiplications and D − 1 additions per descriptor pair, which becomes a bottleneck when many SIFT descriptors (D = 128) have to be compared.
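To make this cost concrete, here is a minimal C sketch, assuming the 128-dimensional unsigned-char layout discussed above and dummy descriptor values; it evaluates the squared distance underlying Eq. (1) with D multiplications and D − 1 additions per pair (the square root is usually omitted because it does not change the ranking of candidates).

#include <stdio.h>

#define D 128  /* dimensionality of a SIFT-like descriptor */

/* Squared Euclidean distance between two D-dimensional descriptors
 * stored as unsigned char (1 byte per dimension). */
static unsigned int squared_euclidean(const unsigned char *x1,
                                      const unsigned char *x2)
{
    unsigned int acc = 0;
    for (int i = 0; i < D; i++) {
        int d = (int)x1[i] - (int)x2[i];
        acc += (unsigned int)(d * d);   /* one multiply and one add per dimension */
    }
    return acc;
}

int main(void)
{
    unsigned char a[D], b[D];
    for (int i = 0; i < D; i++) { a[i] = (unsigned char)i; b[i] = (unsigned char)(i + 1); }
    printf("%u\n", squared_euclidean(a, b));  /* prints 128 */
    return 0;
}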


Fig. 1  SIFT.
Fig. 2

3. Binary feature representation

A binary feature describes a local patch with a B-bit vector y. Two equivalent notations appear in the literature,

y ∈ {−1, +1}^B,   (3)
y ∈ {0, +1}^B,    (4)

where (3) is the convention commonly used in the Binary Hashing literature and (4) reflects how the code is actually stored; in either case y occupies only B bits (Note 3). The Hamming distance between two such codes can be computed by XOR-ing the two bit strings and counting the 1 bits of the result, and modern CPUs execute both operations extremely fast (see Section 8.1).

(Note 3) When notation (3) is used, a −1 is stored as a 0 bit.

3.1 Advantages of binary features

Binary features therefore have two advantages over real-valued descriptors: (1) a far smaller memory footprint and (2) much cheaper matching. For example, a 64-bit binary code occupies one sixteenth of the 128 bytes of a SIFT descriptor, and descriptors such as BinBoost reach SIFT-level matching accuracy with 64 bits [34].

4. Binary descriptors computed directly from image patches

Since BRIEF [12] appeared in 2010, a large number of binary descriptors that compute their bits directly from the image patch have been proposed; Fig. 3 summarizes them on a timeline.


Fig. 3  Timeline and taxonomy of binary descriptors (2010–2015): BRIEF [ECCV2010], BRISK [ICCV2011], ORB [ICCV2011], CARD [ICCV2011], FREAK [CVPR2012], D-BRIEF [ECCV2012], LDAHash [PAMI2012], BinBoost [CVPR2013], BOLD [CVPR2015], and Random Projections, classified into direct and indirect approaches and into methods without training, with unsupervised learning, with supervised learning, and with an improved Hamming distance.

Fig. 4  BRIEF.

BRISK [19] and ORB [27] extend this idea with rotation and scale invariance. ORB was developed by Willow Garage, the original developers of OpenCV, and is available in OpenCV. ORB and FREAK [3] further improve how the binary tests are selected. D-BRIEF [35] and BinBoost [34] use supervised learning with positive and negative patch pairs, where a positive pair shows the same physical point and a negative pair shows different points. BOLD [6] adapts the set of binary tests online to each patch.

4.1 BRIEF: Binary Robust Independent Elementary Features

Calonder et al. [12] proposed BRIEF, which describes a patch by comparing the intensities of pairs of pixels. B point pairs u_i, v_i ∈ R^2 (i = 1, ..., B) are chosen at random inside the patch; for each pair the smoothed intensities I(u_i) and I(v_i) are compared, and the sign of I(u_i) − I(v_i) yields one bit, 1 or 0. Concatenating the B results gives a B-bit code (Fig. 4). Because only pixel comparisons are needed, BRIEF can be computed far faster than SIFT or SURF.
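As a concrete illustration, the following C sketch computes a BRIEF-style code from a pre-smoothed patch. The pair coordinates are generated with rand() here purely as a stand-in for the fixed random pattern that a real BRIEF implementation ships with, and the patch size and bit count are arbitrary choices for this example.

#include <stdint.h>
#include <stdlib.h>

#define PATCH 32   /* patch is PATCH x PATCH pixels, already smoothed */
#define BITS  256  /* number of binary tests B */

/* One BRIEF-style test per bit: compare the intensities at the two points
 * (u_i, v_i) inside the patch and pack the results into a bit string. */
static void brief_describe(uint8_t patch[PATCH][PATCH],
                           const int ux[BITS], const int uy[BITS],
                           const int vx[BITS], const int vy[BITS],
                           uint8_t code[BITS / 8])
{
    for (int i = 0; i < BITS / 8; i++) code[i] = 0;
    for (int i = 0; i < BITS; i++) {
        int bit = patch[uy[i]][ux[i]] < patch[vy[i]][vx[i]];  /* sign of I(u)-I(v) */
        if (bit) code[i / 8] |= (uint8_t)(1u << (i % 8));
    }
}

int main(void)
{
    static uint8_t patch[PATCH][PATCH];
    int ux[BITS], uy[BITS], vx[BITS], vy[BITS];
    uint8_t code[BITS / 8];
    for (int y = 0; y < PATCH; y++)
        for (int x = 0; x < PATCH; x++)
            patch[y][x] = (uint8_t)rand();
    for (int i = 0; i < BITS; i++) {   /* stand-in for BRIEF's fixed test pattern */
        ux[i] = rand() % PATCH; uy[i] = rand() % PATCH;
        vx[i] = rand() % PATCH; vy[i] = rand() % PATCH;
    }
    brief_describe(patch, ux, uy, vx, vy, code);
    return 0;
}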

BRIEF itself, however, is neither rotation- nor scale-invariant; this weakness motivated BRISK and ORB.

4.2 BRISK: Binary Robust Invariant Scalable Keypoints

Leutenegger et al. [19] proposed BRISK, which, like BRIEF, builds its code from pairwise intensity comparisons but replaces the random pairs with a fixed, hand-designed sampling pattern. Keypoints are detected with a multi-scale extension of FAST [26], and rotation invariance is obtained by estimating a characteristic orientation as described below.


Fig. 5  BRISK sampling pattern: (a) sampling points, (b) long-distance pairs L, (c) short-distance pairs S.

Whereas BRIEF compares randomly chosen pixel pairs, BRISK places 60 sampling points u_i ∈ R^2 (i = 1, ..., 60) on concentric circles around the keypoint (Fig. 5(a)). From all possible point pairs, BRISK keeps two subsets: the long-distance pairs L, whose mutual distance is larger than a threshold δ_min, and the short-distance pairs S, whose distance is smaller than δ_max (Fig. 5(b)(c)). The set L is used to estimate the orientation of the keypoint, and the set S is used to compute the descriptor bits.

For each long-distance pair (u_i, u_j) ∈ L, a local gradient is defined as

g(u_i, u_j) = (u_j − u_i) · ( I(u_j, σ_j) − I(u_i, σ_i) ) / ||u_j − u_i||^2,   (5)

where I(u_i, σ_i) and I(u_j, σ_j) are the image intensities smoothed with Gaussians of standard deviation σ_i and σ_j; g(u_i, u_j) therefore points from u_i toward u_j with a magnitude proportional to the intensity difference (Fig. 5(b)). The characteristic orientation α is obtained by averaging these local gradients over L:

g = (g_x, g_y)^T = (1 / |L|) Σ_{(u_i, u_j) ∈ L} g(u_i, u_j),   (6)
α = atan2(g_y, g_x).   (7)

The sampling points u_i, u_j are then rotated by α to obtain u_i^α, u_j^α, and each descriptor bit is computed from the short-distance pairs as

y_{ij} = 1 if I(u_j^α, σ_j) > I(u_i^α, σ_i), and 0 otherwise, for all (u_i, u_j) ∈ S.   (8)

The threshold δ_max is chosen so that S contains 512 pairs, so BRISK produces a 512-bit descriptor.
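A minimal C sketch of the orientation estimate in Eqs. (5)–(7), assuming the smoothed intensities I(u_i, σ_i) have already been sampled into an array and that the long-distance pairs are passed in as index pairs; all names and the toy data in main() are illustrative only.

#include <math.h>
#include <stdio.h>

/* BRISK orientation, Eqs. (5)-(7): average the local gradients over the
 * long-distance pairs L and take atan2 of the result.  pts[i][0..1] are the
 * sampling-point coordinates, val[i] the smoothed intensity I(u_i, sigma_i),
 * and (pairs[k][0], pairs[k][1]) the index pairs in L. */
static double brisk_orientation(double pts[][2], const double val[],
                                int pairs[][2], int num_pairs)
{
    double gx = 0.0, gy = 0.0;
    for (int k = 0; k < num_pairs; k++) {
        int i = pairs[k][0], j = pairs[k][1];
        double dx = pts[j][0] - pts[i][0];
        double dy = pts[j][1] - pts[i][1];
        double norm2 = dx * dx + dy * dy;          /* ||u_j - u_i||^2 */
        double scale = (val[j] - val[i]) / norm2;  /* Eq. (5) */
        gx += dx * scale;
        gy += dy * scale;
    }
    gx /= num_pairs;                               /* Eq. (6) */
    gy /= num_pairs;
    return atan2(gy, gx);                          /* Eq. (7), the angle alpha */
}

int main(void)
{
    double pts[2][2] = {{0.0, 0.0}, {1.0, 0.0}};
    double val[2] = {10.0, 20.0};
    int pairs[1][2] = {{0, 1}};
    printf("%f\n", brisk_orientation(pts, val, pairs, 1));  /* prints 0.0 */
    return 0;
}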

4.3 ORB: Oriented FAST and Rotated BRIEF

ORB [27], proposed in the same year as BRISK, also extends BRIEF with rotation invariance. Keypoints are detected with FAST and, as in BRISK, the set of binary tests is rotated according to an estimated orientation. ORB estimates the orientation α with the intensity centroid: the centroid C of the patch intensities is computed from the image moments m_pq, and α is the direction from the patch center O to C,

m_pq = Σ_{x,y} x^p y^q I(x, y),   (9)
C = ( m_10 / m_00 , m_01 / m_00 ),   (10)
α = atan2(m_01, m_10).   (11)
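The moments in Eqs. (9)–(11) reduce to a few lines of C. The sketch below assumes a square patch centered on the keypoint, with coordinates (x, y) measured from the patch center; the patch radius is an arbitrary choice for this example.

#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define R 15  /* patch radius: the patch covers x, y in [-R, R] */

/* Orientation by the intensity centroid, Eqs. (9)-(11).  patch[y+R][x+R]
 * holds I(x, y); the moments m10 and m01 are accumulated directly. */
static double orb_orientation(uint8_t patch[2 * R + 1][2 * R + 1])
{
    double m01 = 0.0, m10 = 0.0;
    for (int y = -R; y <= R; y++) {
        for (int x = -R; x <= R; x++) {
            double I = (double)patch[y + R][x + R];
            m10 += x * I;   /* m_10 = sum of x * I(x, y) */
            m01 += y * I;   /* m_01 = sum of y * I(x, y) */
        }
    }
    return atan2(m01, m10);  /* Eq. (11); the centroid itself would also use m_00 */
}

int main(void)
{
    static uint8_t patch[2 * R + 1][2 * R + 1];
    for (int y = 0; y < 2 * R + 1; y++)
        for (int x = 0; x < 2 * R + 1; x++)
            patch[y][x] = (uint8_t)x;            /* brighter toward +x */
    printf("%f\n", orb_orientation(patch));      /* centroid on +x axis -> 0.0 */
    return 0;
}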

In contrast to BRIEF, ORB does not choose the binary tests at random but learns them from training data. A good test should produce a bit y_i that takes the values 0 and 1 with roughly 50% probability each, and different bits y_i, y_j should be uncorrelated; ORB selects such tests with a greedy procedure related to the ideas in [36], [38].


Fig. 6  Informative and uninformative binary tests.

Figure 6 illustrates the idea with candidate tests A to E: a bit that takes the same value for almost all patches carries no information, and two bits that are strongly correlated are redundant, so one of them can be removed. In ORB, roughly 205,590 candidate tests can be defined inside a patch. The selection is greedy: every candidate test is evaluated on a large number of training patches, the tests are sorted by how close their mean response is to 0.5, and the sorted list is scanned while keeping a test only if its correlation with all previously kept tests stays below a threshold, until 256 tests remain. Figure 7 shows the tests selected by ORB.

Fig. 7  Binary tests selected by ORB.

4.4 FREAK: Fast Retina Keypoint

Alahi et al. [3] proposed FREAK, which, like ORB and BRISK, computes a binary descriptor from pairwise intensity comparisons on a fixed sampling pattern, but designs the pattern after the human retina.


Fig. 8  FREAK sampling pattern.

Figure 8 shows the FREAK sampling pattern proposed by Alahi et al.: in contrast to ORB and BRISK, it consists of 43 receptive fields whose density and smoothing kernels become finer toward the center, mimicking the retina. The orientation is estimated from a set of symmetric pairs in a manner similar to BRISK. From the candidate point pairs, 512 binary tests are selected with a greedy procedure similar to the one used in ORB, so FREAK is also a 512-bit descriptor. A further characteristic of FREAK is its coarse-to-fine matching: the 512 bits are arranged into four blocks of 128 bits, ordered from coarse (peripheral) to fine (foveal) information. During matching, the first 128 bits are compared first, and candidates whose partial Hamming distance exceeds a threshold are rejected immediately; more than 90% of the candidates are discarded by this first comparison alone.
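The cascade can be implemented with an early-out comparison such as the following C sketch, which assumes 512-bit descriptors stored as eight 64-bit words ordered from coarse to fine; the rejection threshold and the portable popcount helper are illustrative assumptions of this example.

#include <stdint.h>
#include <stdio.h>

/* Portable popcount of a 64-bit word (parallel bit counting, cf. Fig. 26). */
static unsigned pop64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Coarse-to-fine matching of two 512-bit descriptors (8 x 64-bit words),
 * the first two words (128 bits) holding the coarse information.
 * Returns -1 if the candidate is rejected by the first block, otherwise
 * the full Hamming distance. */
static int cascade_distance(const uint64_t a[8], const uint64_t b[8],
                            unsigned first_block_threshold)
{
    unsigned d = pop64(a[0] ^ b[0]) + pop64(a[1] ^ b[1]);  /* first 128 bits */
    if (d > first_block_threshold)
        return -1;                      /* rejected: most candidates stop here */
    for (int i = 2; i < 8; i++)
        d += pop64(a[i] ^ b[i]);        /* remaining 384 bits */
    return (int)d;
}

int main(void)
{
    uint64_t a[8] = {0}, b[8] = {0};
    b[0] = 0xff;                                     /* 8 differing coarse bits */
    printf("%d\n", cascade_distance(a, b, 16));      /* prints 8 */
    return 0;
}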

4.5 D-BRIEF: Discriminative BRIEF

BRIEF, BRISK, ORB, and FREAK all compute each bit by comparing the intensities of two pixels inside the patch. If the N × N patch is raster-scanned into a vector x (Note 4), such a two-pixel comparison can be written as the linear projection (Note 5)

y_i = sgn(w_i^T x + τ_i),   (12)

where the threshold is τ_i = 0 and the projection vector w_i contains a single +1, a single −1, and zeros everywhere else. Figure 9 visualizes the w_i implicitly used by BRIEF, BRISK, ORB, and FREAK.

Trzcinski and Lepetit [35] proposed D-BRIEF, which keeps the form of Eq. (12) but learns w_i and τ_i from training data instead of fixing them by hand. Figure 10 shows examples of the learned w_i. The training data consist of positive pairs (two patches showing the same physical point) and negative pairs (patches showing different points), as illustrated in Fig. 11, and w_i and τ_i are learned so that positive pairs receive identical bits and negative pairs receive different bits.

A learned w_i is, however, generally dense, so evaluating w_i^T x for every bit would be much slower than the simple two-pixel comparisons of BRIEF. D-BRIEF therefore approximates each projection by a sparse combination of dictionary atoms,

w_i ≈ D s_i,   (13)

where the columns of D are simple filters that can be evaluated very efficiently (for example with integral images) and s_i is a sparse coefficient vector; the projection w_i^T x is then computed as a weighted sum of a few precomputed filter responses.
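A sketch of how a sparse s_i speeds up the projection: assuming the responses of all dictionary atoms on the current patch, d_k^T x, have been precomputed once (all names and sizes below are illustrative), each bit costs only as many multiply-adds as s_i has nonzero entries.

#include <stdio.h>

/* One D-BRIEF-style bit, Eq. (12) with w_i approximated by D s_i (Eq. (13)).
 * atom_response[k] holds the precomputed response d_k^T x of the k-th
 * dictionary atom on the current patch; (idx, coef) give the nonzero
 * entries of the sparse coefficient vector s_i. */
static int sparse_projection_bit(const double atom_response[],
                                 const int idx[], const double coef[],
                                 int num_nonzero, double tau)
{
    double proj = tau;                      /* accumulates w_i^T x + tau_i */
    for (int k = 0; k < num_nonzero; k++)
        proj += coef[k] * atom_response[idx[k]];
    return proj >= 0.0 ? 1 : 0;             /* sgn(.) mapped to {1, 0} */
}

int main(void)
{
    double responses[3] = {0.5, -1.0, 2.0};
    int idx[2] = {0, 2};
    double coef[2] = {1.0, -0.2};
    printf("%d\n", sparse_projection_bit(responses, idx, coef, 2, 0.0));  /* prints 1 */
    return 0;
}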

Figure 12 illustrates this approximation. D-BRIEF learns the parameters by minimizing

min_{(s_i, τ_i)} Σ_{i=1,...,B} [ Σ_{(x,x')∈N} sgn((D s_i)^T x + τ_i) sgn((D s_i)^T x' + τ_i)
                               − Σ_{(x,x')∈P} sgn((D s_i)^T x + τ_i) sgn((D s_i)^T x' + τ_i) + λ|s_i|_1 ]
subject to (D s_i)^T (D s_j) = δ_ij,   (14)

(Note 4) x ∈ R^{N^2} denotes the N × N patch raster-scanned into a vector.
(Note 5) Eq. (12) has the same form as the Binary Hashing projections discussed in Section 5.


Fig. 9  The projection vectors w_i implicitly used by BRIEF, BRISK, ORB, and FREAK.
Fig. 10  Projection vectors w_i learned by D-BRIEF (adapted from [35]).

where (x, x') ∈ P are positive pairs, (x, x') ∈ N are negative pairs, and λ controls the sparsity of s_i: a larger λ yields a sparser s_i, which makes the descriptor faster to compute but approximates w_i more coarsely. δ_ij is the Kronecker delta (1 when i = j and 0 otherwise), so the constraint (D s_i)^T (D s_j) = δ_ij decorrelates the different projections, playing a role similar to the decorrelation of binary tests in ORB.

Equation (14) is difficult to optimize directly. Writing the binarized results of x and x' as y and y', (14) can be rewritten as

min_{(s_i, τ_i)} Σ_{(x,x')∈N} y^T y' − Σ_{(x,x')∈P} y^T y' + λ|s_i|_1,   (15)

which is a combinatorial (NP-hard) problem, so s_i and τ_i cannot be obtained by straightforward optimization.

To interpret (15), note that for binary vectors y, y' ∈ {−1, +1}^B the Hamming distance d_H(y, y') and the inner product y^T y' are related by

y^T y' = B − 2 d_H(y, y').   (16)

Minimizing the inner product of negative pairs and maximizing that of positive pairs therefore means making the Hamming distance between y and y' small for positive pairs and large for negative pairs. The sets P and N in (14) are built from patch pairs such as those in Fig. 11.

Fig. 11  Examples of (a) positive pairs and (b) negative pairs used to learn w_i and τ_i.

Since (14) contains the sgn function and the L1 term, D-BRIEF optimizes s_i and τ_i in stages. First, ignoring sgn and the sparsity term, an initial projection w_i^0 is obtained from

{w_i^0} = argmin_{w_i} Σ_i [ Σ_{(x,x')∈P} (w_i^T (x − x'))^2 / Σ_{(x,x')∈N} (w_i^T (x − x'))^2 ],   (17)

which is the objective of Linear Discriminant Embedding (LDE) [11] and can be solved as a generalized eigenvalue problem. Because sgn was ignored, the threshold τ_i is then determined separately: with the LDE solution w_i fixed, (14) is optimized with respect to τ_i alone (Note 6). Finally, w_i^0 is approximated by a sparse combination D s_i as described next.

(Note 6) The thresholds τ_i are determined in the same manner as in LDAHash [32] (see Section 5.3.2).


The sparse coefficients s_i that approximate w_i^0 are obtained by solving the Lasso-type problem

{s_i^0} = argmin_{s_i} ||w_i^0 − D s_i||_2^2 + λ|s_i|_1.   (18)

4.6 BinBoost

Trzcinski et al. [34] proposed BinBoost, which learns every bit of the descriptor with boosting. The i-th bit is computed as

y_i(x) = sgn(w_i^T h_i(x)),   (19)

where x is the patch, y_i ∈ {−1, +1} is the i-th bit, h_i(x) = (h_{i,1}(x), h_{i,2}(x), ..., h_{i,K}(x))^T ∈ R^K collects the responses of K weak learners, and w_i ∈ R^K is their weight vector. Unlike D-BRIEF, each weak learner h_{i,j}(x) is a nonlinear function of the patch x: it is parameterized by three quantities, a rectangular region R, a gradient orientation e, and a threshold T, and it outputs −1 or +1 depending on whether the gradient energy of orientation e accumulated over R exceeds T (Fig. 13). As shown in Fig. 14, one bit is produced by K weak learners, so a B-bit descriptor uses B × K weak learners in total. BinBoost learns them with an AdaBoost-like procedure.

Given N training pairs {(x_n, x'_n, l_n)}_{n=1}^N, where x_n and x'_n form a positive pair when l_n = +1 and a negative pair when l_n = −1, BinBoost minimizes the exponential loss

L = min_{{w_i, h_i}_{i=1}^B} Σ_{n=1}^N exp( −γ l_n Σ_{i=1}^B c_i(x_n, x'_n; w_i, h_i) ),   (20)

where γ is a constant and c_i measures whether the i-th bits of the two patches agree:

c_i(x_n, x'_n; w_i, h_i) = y_i(x_n) y_i(x'_n) = sgn(w_i^T h_i(x_n)) sgn(w_i^T h_i(x'_n)).   (21)

For l_n = +1 the loss is small when c_i = +1 (the bits agree), and for l_n = −1 it is small when c_i = −1; by the relation (16), this corresponds to a small Hamming distance for positive pairs and a large one for negative pairs.

As in AdaBoost, the bits are learned sequentially, starting from i = 1. When learning h_i and w_i, minimizing (20) reduces to maximizing the weighted correlation

max_{w_i, h_i} Σ_{n=1}^N l_n W_i(n) c_i(x_n, x'_n; w_i, h_i),   (22)

where the sample weight

W_i(n) = exp( −γ l_n Σ_{i'=1}^{i−1} c_{i'}(x_n, x'_n; w_{i'}, h_{i'}) )   (23)

emphasizes the pairs that the already learned bits 1, ..., i − 1 handle poorly, exactly as in AdaBoost.

Because c_i contains the sgn function, (22) is hard to maximize directly. Dropping the outer sgn gives the relaxed problem

max_{w_i, h_i} w_i^T ( Σ_{n=1}^N l_n W_i(n) h_i(x_n) h_i(x'_n)^T ) w_i.   (24)

The weak learners h_i are chosen greedily, one at a time, using the sample weights W_i(n), following the boosting framework of [30]. Once the K weak learners are fixed, (24) becomes

max_{w_i} w_i^T M w_i,   (25)
M = Σ_{n=1}^N l_n W_i(n) h_i(x_n) h_i(x'_n)^T.   (26)

Since multiplying w_i by a positive constant does not change the sign in (19), w_i is constrained to ||w_i||_2 = 1, and the maximizer of (25) is the eigenvector of M associated with its largest eigenvalue. Repeating this for every bit i yields K weak learners h_i(x) and a weight vector w_i per bit, B sets in total. BinBoost reaches matching accuracy comparable to SIFT with only 64 bits.

4.7 BOLD: Binary Online Learned Descriptor

Balntas et al. proposed BOLD [6]. BRIEF, BRISK, ORB, and FREAK [12], [19], [27], [3] apply exactly the same set of binary tests to every patch. BOLD, in contrast, adapts the descriptor online: at description time it selects, for each patch, the subset of binary tests that is stable for that particular patch.


Fig. 12  Approximation of w_i by a sparse combination of dictionary atoms (adapted from [35]).
Fig. 13  (a), (b): the weak learners used in BinBoost.
Fig. 14  In BinBoost, each bit is computed from K weak learners, so a B-bit descriptor uses B × K weak learners.
Fig. 15  BOLD.

Figure 15 illustrates BOLD. The binary tests themselves are first selected offline with a greedy procedure similar to ORB's; whereas ORB keeps 256 tests, BOLD keeps 512. For each patch to be described, BOLD then applies a small number of synthetic geometric perturbations (warps of the patch) and recomputes the 512 bits for every perturbed version (Fig. 15). Dimensions whose bit value stays constant (variance equal to zero) across the perturbations are considered stable for that patch,


and are marked with 1 in a binary mask β, while dimensions whose bit changes (variance not equal to zero) are marked with 0. When matching two patches p and q with descriptors y_p, y_q and masks β_p, β_q, BOLD uses the masked Hamming distance

d_H(y_p, y_q) = (1 / G_p) pop( β_p ∧ (y_p ⊕ y_q) ) + (1 / G_q) pop( β_q ∧ (y_p ⊕ y_q) ),   (27)

where ⊕ is XOR, ∧ is bitwise AND, pop(·) counts the set bits, and G_p, G_q are the numbers of 1s in β_p and β_q. Each patch thus contributes only its own stable dimensions to the distance, which makes BOLD more robust than applying a fixed set of tests to every patch.
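A sketch of the masked distance in Eq. (27) for 512-bit descriptors and masks stored as eight 64-bit words; the popcount helper mirrors Fig. 26 and the toy data in main() are illustrative.

#include <stdint.h>
#include <stdio.h>

static unsigned pop64(uint64_t x)   /* portable popcount, cf. Fig. 26 */
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Masked Hamming distance of Eq. (27): each patch contributes only the
 * dimensions its own mask beta marks as stable (bit = 1). */
static double bold_distance(const uint64_t yp[8], const uint64_t yq[8],
                            const uint64_t bp[8], const uint64_t bq[8])
{
    unsigned dp = 0, dq = 0, gp = 0, gq = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t x = yp[i] ^ yq[i];     /* differing bits */
        dp += pop64(bp[i] & x);         /* pop(beta_p AND (y_p XOR y_q)) */
        dq += pop64(bq[i] & x);
        gp += pop64(bp[i]);             /* G_p: number of 1s in beta_p */
        gq += pop64(bq[i]);
    }
    return (double)dp / (double)gp + (double)dq / (double)gq;
}

int main(void)
{
    uint64_t yp[8] = {0}, yq[8] = {0}, bp[8], bq[8];
    for (int i = 0; i < 8; i++) { bp[i] = ~0ULL; bq[i] = ~0ULL; }  /* all dims stable */
    yq[0] = 0xf;                                                   /* 4 differing bits */
    printf("%f\n", bold_distance(yp, yq, bp, bq));                 /* prints 0.015625 */
    return 0;
}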

5. Binarizing real-valued descriptors (Binary Hashing)

The second family of binary descriptors starts from an existing real-valued descriptor and converts it into a short binary code, a technique known as Binary Hashing that was originally developed for large-scale (web-scale) nearest-neighbor search. A real-valued descriptor x ∈ R^D is mapped to a B-bit code y by

y = sgn(P x + t),   (28)

where P is a B × D projection matrix and t is a threshold vector. The methods in this family differ in how P and t are obtained: Random projections [5], [13] draw P at random, whereas CARD [4] and LDAHash [32] learn the projection from data. The following subsections describe Random projections, CARD, and LDAHash.

5.1 Random Projections

Random projections [5], [13] are the simplest form of Binary Hashing: the projection matrix P is simply filled with random numbers. Remarkably, as the code length B → ∞, the normalized Hamming distance between the resulting codes converges to a scaled version of the angle between the original vectors. Figure 16 illustrates this with a two-dimensional example (D = 2). Consider two vectors x_1 and x_2 separated by an angle θ, and let t = 0 in (28) so that each row p_i of P defines a random hyperplane through the origin. The i-th bits of the two codes differ, sgn(p_i^T x_1) ≠ sgn(p_i^T x_2), exactly when this hyperplane falls between x_1 and x_2, and they agree, sgn(p_i^T x_1) = sgn(p_i^T x_2), otherwise.

Fig. 16  Random projections.

Fig. 17  Normalized Hamming distance of random projections as a function of the number of bits B (horizontal axis: 0 to 1000 bits; vertical axis: 0 to 1).

Because each row p_i of P is drawn from an isotropic random distribution, the probability that the random hyperplane separates x_1 and x_2 is proportional to the angle θ between them:

Pr[ sgn(p_i^T x_1) ≠ sgn(p_i^T x_2) ] = θ / π.   (29)

Since the rows p_i are drawn independently, the expected fraction of differing bits, i.e., the normalized Hamming distance between the codes of x_1 and x_2, equals θ/π, and as B → ∞ the observed Hamming distance divided by B converges to this value. Figure 17 shows an experiment with x_1 = (1, 0)^T and x_2 = (0, 1)^T, for which θ = π/2 and hence θ/π = 0.5: as the number of random projections grows, the normalized Hamming distance converges to 0.5, and with roughly 500 bits the estimate is already stable. Because SIFT and SURF descriptors are non-negative, they are first mean-centered before random projections are applied. Random projections require no training data at all.
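The construction is only a few lines of code. The sketch below draws P from a unit Gaussian via a Box-Muller transform and binarizes an already mean-centered descriptor; the dimensions and the random data are arbitrary choices for illustration.

#include <math.h>
#include <stdint.h>
#include <stdlib.h>

#define D  128   /* input dimensionality (e.g., a SIFT descriptor) */
#define B  64    /* number of bits */
#define PI 3.14159265358979323846

static double gauss(void)   /* standard normal sample via Box-Muller */
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

/* y = sgn(Px) with a random Gaussian P (Eq. (28) with t = 0).
 * x is assumed to be mean-centered beforehand, as discussed above. */
static void random_projection_code(double P[B][D], const double x[D],
                                   uint64_t *code)
{
    *code = 0;
    for (int i = 0; i < B; i++) {
        double proj = 0.0;
        for (int j = 0; j < D; j++) proj += P[i][j] * x[j];
        if (proj >= 0.0) *code |= (1ULL << i);
    }
}

int main(void)
{
    static double P[B][D], x[D];
    uint64_t code;
    for (int i = 0; i < B; i++)
        for (int j = 0; j < D; j++) P[i][j] = gauss();
    for (int j = 0; j < D; j++) x[j] = gauss();   /* stand-in for a centered descriptor */
    random_projection_code(P, x, &code);
    return (int)(code & 1);
}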


Fig. 18  Spatial cell arrangement used by CARD.

5.2 CARD: Compact And Real-time Descriptors

Ambai and Yoshida [4] proposed CARD, which builds a SIFT-like real-valued descriptor and then binarizes it. CARD contains two key ideas: (1) the gradient-histogram descriptor is extracted very quickly using lookup tables, and (2) the descriptor is binarized by Binary Hashing with a learned sparse projection matrix P. Unlike BRISK and ORB, which compute bits directly from pixel comparisons, CARD therefore belongs to the Binary Hashing family.

5.2.1 Fast descriptor extraction with lookup tables

Like SIFT, CARD describes a keypoint with histograms of gradient orientations computed over K spatial cells and L orientation bins. Instead of SIFT's 4 × 4 grid, CARD adopts the log-polar cell arrangement of GLOH [21] shown in Fig. 18, with K = 17 cells and L = 8 orientation bins, giving a 136-dimensional histogram (SIFT has 128 dimensions). SIFT handles the keypoint orientation α by resampling the rotated patch, which involves extra computation. CARD avoids this by quantizing the orientation into M discrete values α = 0, 1, ..., M − 1 and preparing, for each quantized α, a lookup table that maps every pixel position (x, y)^T of the patch directly to its spatial cell; Fig. 19 shows the tables for (a) α = 0 and (b) α = 3 when M = 40. The orientation bin is obtained in a similar table-driven way. Let θ(x, y) denote the gradient orientation at (x, y), quantized into M levels. The orientation bin l is computed as

l = Q_L( Q_M^{-1}( θ(x, y) − α ) ),   (30)

where l = 0, ..., L − 1, Q_N(·) quantizes an angle into one of N levels 0, ..., N − 1, and Q_N^{-1}(·) maps a quantized level back to an angle. Since both θ(x, y) and α take only M discrete values, the composite mapping Q_L(Q_M^{-1}(·)) in (30) can be precomputed as an M × M lookup table (Fig. 20). With the two lookup tables of Figs. 19 and 20, the histogram can be accumulated with simple table references, which makes the descriptor extraction very fast.

5.2.2 Binarization with a sparse projection matrix

The real-valued histogram is then binarized by Binary Hashing,

y = sgn(P x),   (31)

where x is the mean-centered descriptor, so the threshold vector t of (28) can be omitted.

Binary HashingP B D (31)

— 12 —- 64 -

Page 13: 0 : s g w h w à ¯ q - MPRGmprg.jp/data/MPRG/F_group/F168_ambai2015.pdf · This article is a technical report without peer review, and its polished and/or extended version may be

Evaluating (31) naively costs B × D multiplications and B × (D − 1) additions. CARD avoids this with two constraints on P: (1) P is made sparse, with only S nonzero elements, and (2) every element of P is restricted to the three values −1, 0, and +1, so that P x can be evaluated with additions and subtractions only. Under these constraints, (31) becomes very cheap to compute.

The sparse ternary matrix P is learned so that the Hamming distance between the binary codes preserves the angular distance between the original descriptors. Given a set Ω of descriptor pairs (x, x') and their binary codes (y, y'), CARD minimizes

C(P) = Σ_{(x,x')∈Ω} ( d_θ(x, x') / π − d_h(y, y') / B )^2   (32)

subject to the constraint on the elements p_ij of P

p_ij ∈ {−1, 0, +1},  Σ_{ij} |p_ij| = S.   (33)

Minimizing (32) under (33) is a combinatorial problem, and CARD solves it with a greedy algorithm. In the learned P, roughly 90% of the elements are zero, and CARD produces a B = 128-bit code.

5.3 LDAHash

Strecha et al. [32] proposed LDAHash, a supervised Binary Hashing method. The training data consist of positive pairs (x, x') ∈ P, two descriptors of the same physical point, and negative pairs (x, x') ∈ N, descriptors of different points; from P and N, both the projection P and the threshold vector t of (28) are learned. LDAHash minimizes the objective

L = α E{ d_H(y, y') | P } − E{ d_H(y, y') | N },   (34)

where d_H is the Hamming distance between the codes y and y', E{·|P} and E{·|N} denote expectations over the positive and negative pairs, and α balances the two terms: minimizing (34) makes the Hamming distance small for positive pairs and large for negative pairs. Because (34) involves the discrete Hamming distance, it is rewritten in two equivalent forms:

L = E{ y^T y' | N } − α E{ y^T y' | P },   (35)
L = α E{ ||y − y'||^2 | P } − E{ ||y − y'||^2 | N }.   (36)

For y ∈ {−1, +1}^B, (34) and (35) are equivalent because d_H and the inner product y^T y' are related by (16). The equivalence of (34) and (36) follows from

||y − y'||^2 = ||y||^2 − 2 y^T y' + ||y'||^2,   (37)

which, combined with (16), gives

||y − y'||^2 = 4 d_H(y, y') − 2B + ||y||^2 + ||y'||^2.   (38)

Since y ∈ {−1, +1}^B implies ||y||^2 = ||y'||^2 = B, the squared Euclidean distance in (36) equals 4 d_H(y, y'), so (36) differs from (34) only by constant factors. To optimize over P and t, LDAHash relaxes (36) by dropping the sgn function from y and y', i.e., by replacing the codes with the real-valued projections:

L = α E{ ||P x − P x'||^2 | P } − E{ ||P x − P x'||^2 | N }.   (39)

The threshold t cancels in (39) because only differences of projections appear, so P is obtained by minimizing (39) first, and t is then determined separately from (35) as described in Section 5.3.2.

5.3.1 Computing the projection P

Strecha et al. propose two ways to minimize (39): Linear Discriminant Analysis (LDA) and Difference of Covariances (DIF). Both are based on the covariance matrices of descriptor differences over the positive and negative pairs,

Σ_P = E{ (x − x')(x − x')^T | P },   (40)
Σ_N = E{ (x − x')(x − x')^T | N },   (41)

with LDA using the ratio matrix Σ_R = Σ_P Σ_N^{-1} and DIF the difference matrix Σ_D = α Σ_P − Σ_N.

Linear Discriminant Analysis (LDA).  Using Σ_P and Σ_N, the objective (39) can be written as

L = α tr(P Σ_P P^T) − tr(P Σ_N P^T).   (42)


Table 1  Binary descriptors covered in Sections 4 and 5: BRIEF, BRISK, ORB, FREAK, D-BRIEF, BinBoost, BOLD, Random projections, CARD, and LDAHash.

Whitening the descriptor differences with Σ_N^{-1/2} (i.e., substituting x ← Σ_N^{-1/2} x in (42)) shows that minimizing (42) is proportional to minimizing

L ∝ tr( P Σ_N^{-1/2} Σ_P Σ_N^{-T/2} P^T )   (43)
  = tr( P Σ_P Σ_N^{-1} P^T ) = tr( P Σ_R P^T ),   (44)

where Σ_R = Σ_P Σ_N^{-1}. The trace tr(P Σ_R P^T) is minimized by projecting onto the B eigenvectors of Σ_R with the smallest eigenvalues, which gives

P = S_R^{-1/2} U_R^T Σ_N^{-1/2},   (45)

where S_R is the B × B diagonal matrix of the B smallest eigenvalues and U_R is the D × B matrix of the corresponding eigenvectors.

Difference of Covariances (DIF).  Substituting Σ_D = α Σ_P − Σ_N into (42) gives

L = tr( P Σ_D P^T ).   (46)

Unlike LDA, P is obtained directly from the eigendecomposition of Σ_D: with S_D the B × B diagonal matrix of the B smallest of the D eigenvalues and U_D the matrix of the corresponding eigenvectors,

P = S_D^{-1/2} U_D^T.   (47)

LDA and DIF differ in how strongly the positive and negative pairs influence P through α; in the limit α → ∞ the negative pairs no longer matter, which corresponds to setting Σ_N = I.

5.3.2 Computing the thresholds t

Once P has been fixed, the threshold vector t is determined using the form (35). Because each element of t affects only one bit, the thresholds can be optimized independently: the i-th threshold t_i is found by

min_{t_i}  E{ sgn(p_i^T x + t_i) sgn(p_i^T x' + t_i) | N } − α E{ sgn(p_i^T x + t_i) sgn(p_i^T x' + t_i) | P },   (48)

where p_i^T is the i-th row of P. Since t_i is a single scalar, (48) can be minimized by a simple one-dimensional search over candidate values.

6. A unified view

Table 1 summarizes the methods described in Sections 4 and 5. Among them, LDAHash, D-BRIEF, and BinBoost learn their parameters from labeled pairs: LDAHash binarizes an existing real-valued descriptor by Binary Hashing, D-BRIEF learns sparse projections of the raw patch, and BinBoost, which must evaluate many weak learners per bit, needs on the order of 1 msec per keypoint for description.

All of the methods in Sections 4 and 5 can be written in the unified form

y = sgn( P f(x) + t ),   (49)

where x ∈ R^{N^2} is an N × N patch raster-scanned into a vector.


Table 2  The components P, t, and f(x) of Eq. (49) for each method.

  Method             | P                                                | t       | f(x)
  BRIEF              | one +1 and one −1 per row (random pairs)         | t = 0   | f(x) = x
  BRISK              | one +1 and one −1 per row (fixed pattern)        | t = 0   | f(x) = x
  ORB                | one +1 and one −1 per row (selected pairs)       | t = 0   | f(x) = x
  FREAK              | one +1 and one −1 per row (selected pairs)       | t = 0   | f(x) = x
  D-BRIEF            | sparse combinations of dictionary atoms          | learned | f(x) = x
  BinBoost           | weights w_i learned by boosting                  | t = 0   | weak-learner responses
  BOLD               | one +1 and one −1 per row (with per-patch masks) | t = 0   | f(x) = x
  Random projections | random                                           | t = 0   | real descriptor (SIFT, SURF, ...), mean centering
  CARD               | sparse, elements in {−1, 0, +1}                  | t = 0   | descriptor computed with LUTs, mean centering
  LDAHash            | learned by LDA or DIF                            | learned | real descriptor (SIFT, SURF, ...)

Fig. 21  The unified form y = sgn(P f(x) + t).

In (49), P ∈ R^{B×D} is a projection matrix, t ∈ R^B is a threshold vector, and f(·) maps the patch vector x to an intermediate D-dimensional representation,

f : R^{N^2} → R^D.   (50)

Every method is thus characterized by the three components P, t, and f(·) (Fig. 21), and the methods fall into roughly two groups according to f(·). BRIEF, BRISK, ORB, FREAK, D-BRIEF, and BOLD use the patch itself, f(x) = x, so they can be viewed as Binary Hashing applied directly to raw pixels. Random projections, CARD, and LDAHash instead use a real-valued descriptor such as SIFT or SURF as f(x) and apply Binary Hashing to it. BinBoost also works on the patch, but its f(x) consists of the nonlinear responses of D weak learners, and it is distinctive in learning both f(x) and P jointly by boosting.

6.1 A brief history of Binary Hashing

Binary Hashing itself has been studied extensively in the context of large-scale retrieval. An early example is Semantic Hashing [28], presented by Salakhutdinov and Hinton at a SIGIR workshop in 2007, which learns binary codes with RBMs. Spectral Hashing [38] by Weiss et al. (NIPS 2008) formulated Binary Hashing as a spectral problem, and Iterative Quantization [15] (CVPR 2011) became another widely used baseline. Many further variants have since been proposed, including Spherical Hashing [17], bilinear projections for high-dimensional data such as VLAD and Fisher Vectors [14], and sparse isotropic hashing [29].

7. Feature representation by deep learning

Starting in 2015, methods that learn the local feature representation itself with a Convolutional Neural Network (CNN) have appeared. This section introduces two representative ones.

Fig. 22  Architecture of the descriptor CNN of [31].
Fig. 23  Siamese training of the CNN.

7.1 Discriminative Learning of Deep Convolutional Feature Point Descriptors (ICCV 2015)

Simo-Serra et al. trained a CNN that maps an image patch directly to a descriptor [31]. As shown in Fig. 22, the CNN consists of three convolutional layers with parameters w, uses hyperbolic tangent activations and L2 pooling, and outputs a 128-dimensional descriptor, so it can be used as a drop-in replacement for SIFT. The parameters w are learned with a Siamese architecture [8] (Fig. 23): two copies of the CNN with shared weights w receive the two patches of a training pair, and their output descriptors x_1, x_2 are compared with the L2 distance. The loss l(·) is

l(P_1, P_2) = ||x_1 − x_2||_2 for a positive pair, and max(0, C − ||x_1 − x_2||_2) for a negative pair,   (51)

where P_1, P_2 are the input patches, x_1, x_2 their CNN outputs, and C is a margin constant; minimizing (51) pulls the L2 distance of positive pairs toward zero and pushes that of negative pairs beyond the margin C. The weights w are trained on the Multi-view Stereo Correspondence Dataset (MVS) [2]. Because most randomly sampled negative pairs are too easy, the authors use hard negative mining: the negative pairs (and likewise the positive pairs) that the current network handles worst, the hard negatives and hard positives, are collected and used for further training.

7.2 Learning to Compare Image Patches via Convolutional Neural Networks (CVPR 2015)

Zagoruyko and Komodakis trained CNNs that compare two patches directly [39]: instead of producing a descriptor per patch, the network receives a pair of patches and outputs a single similarity score. The networks are built from convolution, ReLU, and max-pooling layers, followed by a fully connected decision part with ReLU activations and 512 hidden units, and the paper compares five architectural variants (Fig. 24).

(a) 2-channel
In the 2-channel network (Fig. 24(a)), the two patches are stacked as the two channels of a single input image and processed by one CNN whose single output is the similarity; the two patches are thus combined from the very first layer.


Fig. 24  Network architectures compared in [39]: (a) 2-channel, (b) Siamese and Pseudo-siamese, (c) central-surround two-stream, (d) SPP network.

(b-1) Siamese
The Siamese network (Fig. 24(b)) follows [8]: the two patches are processed by two branches with shared weights, and the two resulting representations are concatenated and fed to the fully connected decision part, which outputs the similarity.

(b-2) Pseudo-siamese
The Pseudo-siamese network (Fig. 24(b)) has the same structure as the Siamese network, but the weights of the two branches are not shared.

(c) Central-surround two-stream network
The central-surround two-stream network (Fig. 24(c)) splits each 64 × 64 patch into two inputs: (1) the central 32 × 32 region at full resolution and (2) the whole patch downsampled by 1/2 to 32 × 32. The two streams are processed separately and then combined (Fig. 24(c)). This structure can be combined with the Siamese, Pseudo-siamese, or 2-channel configurations.

(d) Spatial Pyramid Pooling (SPP) network


The SPP network (Fig. 24(d)) inserts a Spatial Pyramid Pooling layer [16] between the convolutional part and the fully connected part, so that patches of different sizes can be handled by the same network. As shown in Fig. 24(d), the SPP layer can again be combined with the Siamese, Pseudo-siamese, and 2-channel configurations.

All of these CNNs are trained on the MVS dataset [2] by minimizing the L2-regularized hinge loss

min_w  (λ/2) ||w||^2 + Σ_{i=1}^N max(0, 1 − l_i o_i^net),   (52)

where w are the CNN weights, o_i^net is the network output for the i-th training pair, l_i ∈ {−1, +1} indicates a negative or positive pair, and λ is the weight-decay coefficient, set to λ = 0.0005. The hinge term drives the output toward +1 for positive pairs and toward −1 for negative pairs, while the term (λ/2)||w||^2 in (52) keeps the weights w small and prevents overfitting. The training data are augmented by rotating the patches by 90°, 180°, and 270°; this data augmentation noticeably improves the performance of the CNN.

7.3 Discussion

The two approaches differ in what is learned. The method of [31] learns only the descriptor, so matching remains an ordinary nearest-neighbor search in descriptor space, whereas the method of [39] learns the comparison itself end-to-end, from the raw patch pair to the similarity score. End-to-end learning of the similarity tends to be more accurate, but the decision network must be evaluated for every candidate pair, which makes such architectures, including the Siamese and Pseudo-siamese variants, harder to use in applications such as SLAM that require real-time matching of many keypoints.

#include <intrin.h>   /* MSVC; on GCC/Clang the intrinsics are in <nmmintrin.h> */

int     _mm_popcnt_u32(unsigned int a);       /* counts the 1 bits of a 32-bit word */
__int64 _mm_popcnt_u64(unsigned __int64 a);   /* counts the 1 bits of a 64-bit word */

Fig. 25  Popcount intrinsics available with SSE4.2.

inline unsigned int popcnt32(unsigned int x)
{
    /* Parallel (SWAR) bit counting: at each step, adjacent groups of bits
     * are summed in place, doubling the group width from 1 to 32 bits. */
    x = (x & 0x55555555) + ((x >>  1) & 0x55555555);  /* 1-bit groups  -> 2-bit sums  */
    x = (x & 0x33333333) + ((x >>  2) & 0x33333333);  /* 2-bit groups  -> 4-bit sums  */
    x = (x & 0x0f0f0f0f) + ((x >>  4) & 0x0f0f0f0f);  /* 4-bit groups  -> 8-bit sums  */
    x = (x & 0x00ff00ff) + ((x >>  8) & 0x00ff00ff);  /* 8-bit groups  -> 16-bit sums */
    x = (x & 0x0000ffff) + ((x >> 16) & 0x0000ffff);  /* 16-bit groups -> final sum   */
    return x;
}

Fig. 26  Portable population count by parallel bit counting (after [24], [37]).

8. Implementation know-how

8.1 Computing the Hamming distance

The Hamming distance between two binary descriptors is computed by XOR-ing the two bit strings and counting the 1 bits of the result (population count). CPUs that support SSE4.2, such as the Intel Core i7, provide a dedicated popcount instruction. In C/C++ it becomes available by including intrin.h, which declares the 32-bit and 64-bit intrinsics shown in Fig. 25 (Note 7). Since _mm_popcnt_u64() counts a 64-bit word in a single instruction, the Hamming distance of a 256-bit descriptor, for example, requires only four XORs and four popcounts over 8-byte words. On processors without a popcount instruction, the parallel bit-counting techniques described in [24], [37] can be used instead; Fig. 26 shows a 32-bit implementation. When the bits of many words have to be counted, the number of operations can be reduced further with a carry-save adder (CSA), described next.

(Note 7) With GCC, the compiler option -msse4.2 must be specified.
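For concreteness, here is a sketch of the 256-bit case described above, using the intrinsic of Fig. 25; the header name <nmmintrin.h> is the non-MSVC location of the intrinsic and is an assumption of this example (compile with -msse4.2 on GCC, as in Note 7).

#include <stdint.h>
#include <stdio.h>
#include <nmmintrin.h>   /* _mm_popcnt_u64; on MSVC use intrin.h as in Fig. 25 */

/* Hamming distance between two 256-bit descriptors stored as four 64-bit
 * words: four XORs and four popcount instructions. */
static unsigned hamming256(const uint64_t a[4], const uint64_t b[4])
{
    unsigned d = 0;
    for (int i = 0; i < 4; i++)
        d += (unsigned)_mm_popcnt_u64(a[i] ^ b[i]);
    return d;
}

int main(void)
{
    uint64_t a[4] = {0, 0, 0, 0};
    uint64_t b[4] = {1, 2, 4, 8};           /* differs from a in exactly 4 bits */
    printf("%u\n", hamming256(a, b));       /* prints 4 */
    return 0;
}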


/* Carry-save adder: compresses three words a, b, c into a "carry" word h
 * and a "sum" word l such that pop(a)+pop(b)+pop(c) = 2*pop(h)+pop(l). */
#define CSA(h, l, a, b, c)                              \
    { unsigned u = (a) ^ (b); unsigned v = (c);         \
      (h) = ((a) & (b)) | (u & v); (l) = u ^ v; }

Fig. 27  A carry-save adder (CSA) in C/C++.
Fig. 28  Bit counting with CSAs: (a) three words, (b) seven words.

As described in [24], a carry-save adder (CSA) takes three words a, b, c and produces two words h and l; the C/C++ macro in Fig. 27 implements it with a handful of logical operations. The three inputs a, b, c are compressed by one CSA into the pair h, l, where l collects the "sum" bits and h the "carry" bits, so that the bit counts satisfy

pop(a) + pop(b) + pop(c) = 2 · pop(h) + pop(l),   (53)

where pop(·) denotes the population count. Counting the bits of the three words a, b, c therefore needs only one CSA followed by two popcounts (of h and l), as shown in Fig. 28(a). By cascading CSAs the savings grow: Fig. 28(b) shows that seven words can be compressed with four CSAs into three words whose weighted popcounts give the total, so that only three popcount operations are needed. Because a CSA consists of a few cheap logical instructions, this technique is effective on CPUs whose popcount is slow or emulated in software; see [24] for further refinements.
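A sketch of the seven-word case of Fig. 28(b), using the CSA macro of Fig. 27 extended to 64-bit words: four CSAs compress the seven words into three, whose popcounts are combined with weights 4, 2, and 1 according to Eq. (53). __builtin_popcountll is used here as the popcount and is a GCC/Clang builtin; the test data in main() are illustrative.

#include <stdint.h>
#include <stdio.h>

/* 64-bit carry-save adder, cf. the macro of Fig. 27. */
#define CSA64(h, l, a, b, c)                                   \
    { uint64_t u_ = (a) ^ (b); uint64_t v_ = (c);              \
      (h) = ((a) & (b)) | (u_ & v_); (l) = u_ ^ v_; }

/* Total number of set bits in seven 64-bit words using four CSAs and
 * only three popcounts (Fig. 28(b)); the weights follow Eq. (53). */
static unsigned popcount7(const uint64_t w[7])
{
    uint64_t h1, l1, h2, l2, h3, l3, h4, l4;
    CSA64(h1, l1, w[0], w[1], w[2]);
    CSA64(h2, l2, w[3], w[4], w[5]);
    CSA64(h3, l3, l1, l2, w[6]);
    CSA64(h4, l4, h1, h2, h3);
    return 4u * (unsigned)__builtin_popcountll(h4)
         + 2u * (unsigned)__builtin_popcountll(l4)
         +      (unsigned)__builtin_popcountll(l3);
}

int main(void)
{
    uint64_t w[7] = {1, 3, 7, 15, 31, 63, 127};   /* 1+2+...+7 = 28 set bits */
    printf("%u\n", popcount7(w));                 /* prints 28 */
    return 0;
}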

8.2 Datasets and software

Two datasets are commonly used to evaluate keypoint descriptors. The Affine Covariant Regions Datasets [1] by Mikolajczyk et al. [21] provide image sequences together with the ground-truth homography between the images. The Multi-view Stereo Correspondence Dataset [2] by Brown et al. [9] provides 64 × 64 patches extracted around keypoints detected with the Difference of Gaussian (DoG) and Harris detectors, labeled by whether they correspond to the same 3D point. Table 3 lists publicly available implementations of the descriptors covered in this report.

9. Conclusion

This report surveyed feature representations for keypoint matching from three viewpoints: (1) real-valued representations, (2) binary representations, and (3) representations learned with deep learning. Binary descriptors have dramatically reduced the memory and matching cost compared with SIFT and SURF, and deep-learning-based descriptors are emerging as the next trend.

[1] Affine covariant regions datasets. http://www.robots.ox.ac.uk/~vgg/data/data-aff.html.
[2] Multi-view stereo correspondence dataset. http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html.


Table 3  Publicly available implementations.

  Method                              | Language / environment                             | URL
  BRIEF                               | C++ (requires OpenCV 2.0)                          | http://cvlab.epfl.ch/research/detect/brief
  BRISK                               | C++ (requires OpenCV 2.2), MATLAB                  | http://www.asl.ethz.ch/people/lestefan/personal/BRISK
                                      | also included in OpenCV                            | http://opencv.org/
  ORB                                 | OpenCV                                             | http://opencv.org/
  FREAK                               | C++ (requires OpenCV)                              | http://www.ivpe.com/freak.htm
                                      | also included in OpenCV                            | http://opencv.org/
                                      | MATLAB R2013a, Computer Vision System Toolbox 5.2  |
  D-BRIEF                             | C++ (requires OpenCV 2.0), MATLAB                  | http://cvlab.epfl.ch/research/detect/dbrief
  BinBoost                            | C++                                                | http://infoscience.epfl.ch/record/186246?ln=en
  BOLD                                | C++ (requires OpenCV)                              | https://github.com/vbalnt/bold
  CARD                                | MATLAB                                             | https://github.com/DensoITLab/CARD (Denso IT Laboratory, Inc.)
                                      | iOS app: CARDesc, AppStore (Apple Inc.)            |
  LDAHash                             | C++ (requires OpenCV)                              | http://cvlab.epfl.ch/research/detect/ldahash
  CNN Descriptor [Simo-Serra, ICCV15] |                                                    | https://github.com/etrulls/deepdesc-release
  CNN Similarity [Zagoruyko, CVPR15]  | Torch, MATLAB                                      | https://github.com/szagoruyko/cvpr15deepcompare

[3] Alexandre Alahi, Raphael Ortiz, and Pierre Vandergheynst. FREAK: Fast retina keypoint. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 510–517, 2012.
[4] Mitsuru Ambai and Yuichi Yoshida. CARD: Compact and real-time descriptors. In International Conference on Computer Vision (ICCV), pp. 97–104, 2011.
[5] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 2008.
[6] V. Balntas, L. Tang, and K. Mikolajczyk. BOLD - binary online learned descriptor for efficient image matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[7] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-up robust features (SURF). Computer Vision and Image Understanding, Vol. 110, pp. 346–359, June 2008.
[8] J. Bromley, I. Guyon, Y. LeCun, E. Sackinger, and R. Shah. Signature verification using a "siamese" time delay neural network. In Neural Information Processing Systems (NIPS), 1994.
[9] M. Brown, G. Hua, and S. Winder. Discriminant learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 33, pp. 43–57, 2011.
[10] M. Brown, G. Hua, and S. Winder. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 33, No. 1, pp. 43–57, 2011.
[11] Matthew Brown, Gang Hua, and Simon Winder. Discriminative learning of local image descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 33, No. 1, pp. 43–57, 2011.
[12] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. BRIEF: Binary robust independent elementary features. In European Conference on Computer Vision (ECCV), pp. 778–792, 2010.
[13] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, Vol. 42, pp. 1115–1145, November 1995.
[14] Yunchao Gong, Sanjiv Kumar, Henry A. Rowley, and Svetlana Lazebnik. Learning binary codes for high-dimensional data using bilinear projections. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 484–491, 2013.
[15] Yunchao Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 817–824, 2011.
[16] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 37, No. 9, pp. 1904–1916, 2015.
[17] Jae-Pil Heo, Youngwoon Lee, Junfeng He, Shih-Fu Chang, and Sung-Eui Yoon. Spherical hashing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2957–2964, 2012.
[18] Yan Ke and Rahul Sukthankar. PCA-SIFT: A more distinctive representation for local image descriptors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 506–513, 2004.
[19] S. Leutenegger, M. Chli, and R. Y. Siegwart. BRISK: Binary robust invariant scalable keypoints. In International Conference on Computer Vision (ICCV), pp. 2548–2555, 2011.
[20] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), Vol. 60, pp. 91–110, November 2004.
[21] Krystian Mikolajczyk and Cordelia Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 27, pp. 1615–1630, 2005.
[22] Jean-Michel Morel and Guoshen Yu. ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, Vol. 2, No. 2, pp. 438–469, April 2009.
[23] David Nister and Henrik Stewenius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2161–2168, 2006.
[24] Andy Oram and Greg Wilson. Beautiful Code: Leading Programmers Explain How They Think. O'Reilly & Associates Inc, 2007.
[25] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[26] Edward Rosten and Tom Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision (ECCV), pp. 430–443, 2006.
[27] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. ORB: An efficient alternative to SIFT or SURF. In International Conference on Computer Vision (ICCV), pp. 2564–2571, 2011.
[28] R. R. Salakhutdinov and G. E. Hinton. Semantic hashing. In SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.


[29] Ikuro Sato, Mitsuru Ambai, and Koichiro Suzuki. Sparse isotropic hashing. IPSJ Transactions on Computer Vision and Applications, Vol. 5, No. 0, pp. 40–44, 2013.
[30] Gregory Shakhnarovich. Learning task-specific similarity. Ph.D. thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
[31] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In International Conference on Computer Vision (ICCV), 2015.
[32] C. Strecha, A. M. Bronstein, M. M. Bronstein, and P. Fua. LDAHash: Improved matching with smaller descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 34, No. 1, pp. 66–78, 2012.
[33] Gabriel Takacs, Vijay Chandrasekhar, Sam Tsai, David Chen, Radek Grzeszczuk, and Bernd Girod. Unified real-time tracking and recognition with rotation-invariant fast features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 934–941, 2010.
[34] T. Trzcinski, M. Christoudias, V. Lepetit, and P. Fua. Boosting binary keypoint descriptors. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2874–2881, 2013.
[35] Tomasz Trzcinski and Vincent Lepetit. Efficient discriminative projections for compact binary descriptors. In European Conference on Computer Vision (ECCV), pp. 228–242, 2012.
[36] Jun Wang, Sanjiv Kumar, and Shih-Fu Chang. Sequential projection learning for hashing with compact codes. In International Conference on Machine Learning (ICML), 2010.
[37] Henry S. Warren. Hacker's Delight (2nd Edition). Addison-Wesley Professional, 2012.
[38] Yair Weiss, Antonio Torralba, and Robert Fergus. Spectral hashing. In Neural Information Processing Systems (NIPS), pp. 1753–1760, 2008.
[39] S. Zagoruyko and N. Komodakis. Learning to compare image patches via convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[40] IPSJ SIG Technical Report (CVIM), Vol. 2013, No. 19, pp. 1–14, Nov. 2013 (in Japanese).
[41] 2010 (in Japanese).
[42] Vol. 77, No. 12, pp. 1109–1116, 2011 (in Japanese).

Copyright ©2015 by IEICE. This article is a technical report without peer review, and its polished and/or extended version may be published elsewhere.