Nearest Neighbor Search and Product Quantization

Nearest Neighbor Search(NNS)

NGUYEN ANH TUAN

Bachelor 4th year, Faculty of Engineering, the University of Tokyo

May 14, 2014

NGUYEN ANH TUAN, Group study memo

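Self-introduction

2008: Came to Japan

2009 to 2012: Akashi National College of Technology (Hyogo Prefecture). Graduation thesis: "A digital watermarking method using edge-direction histograms" (target: document images)

2012: Transferred into the second year at the University of Tokyo

Favorite programming languages: C/C++ and Java; recently I have been trying Ruby.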

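Today's outline

1 Overview
  Problem definition
  Explanation of the definition
  Space and distance
  Dimension
  Challenge: the curse of dimensionality
  Related problems
  k-nearest neighbors
  Voronoi diagrams
  Approximate nearest neighbor search

2 Algorithms
  Linear search
  kd-tree
  Locality Sensitive Hashing
  Others

3 Recent research
  Vector Quantization
  Product Quantization
  Relation to nearest neighbor search

4 Summary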

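Problem definition

Nearest Neighbor Search (NNS problem)

In the D-dimensional Euclidean space $\mathbb{R}^D$, write a vector as $\mathbf{x} = (x_1, x_2, \ldots, x_D)$ and the distance as

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_D - y_D)^2}$$

Then:

Input: a subset $Y \subset \mathbb{R}^D$ and a vector $\mathbf{x} \in \mathbb{R}^D$

Output: the element $\mathrm{NN}(\mathbf{x})$ of $Y$ that is "closest" to $\mathbf{x}$.

In short, given the input $(Y, \mathbf{x})$, the problem is to find

$$\mathrm{NN}(\mathbf{x}) = \arg\min_{\mathbf{y} \in Y} d(\mathbf{x}, \mathbf{y}) \tag{1}$$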
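Equation (1) can be evaluated directly by scanning every element of Y. The following is a minimal sketch, assuming Y is a finite list of equal-length tuples; the function names and the sample data are illustrative and do not come from the slides.

```python
import math


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def nearest_neighbor(Y, x):
    """Return NN(x) = argmin over y in Y of d(x, y) by scanning all of Y."""
    return min(Y, key=lambda y: euclidean(x, y))


# Hypothetical example data (not from the slides).
Y = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
x = (0.9, 1.2)
print(nearest_neighbor(Y, x))  # (1.0, 1.0)
```

The scan computes one distance per element of Y, and each distance reads all D coordinates.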

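Explanation

Space and distance

High and low dimensions

Distance in Euclidean space

Euclidean distance → intuitive, the ordinary distance

→ Minkowski distance $d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p}$

→ When $p = 2$: Euclidean distance

When $p = 1$: Manhattan distance

$$d_M(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{D} |x_i - y_i| \tag{2}$$

Question: should Minkowski distances with $p > 2$ be used?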
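The Minkowski family can be written as a single function of p. This is a small sketch under the assumption that vectors are equal-length tuples of floats; the name `minkowski` and the sample points are illustrative.

```python
def minkowski(x, y, p):
    """Minkowski distance d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p), for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for d_p to be a metric")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)


# Hypothetical sample points (not from the slides).
x = (0.0, 0.0)
y = (3.0, 4.0)
print(minkowski(x, y, 1))  # 7.0, the Manhattan distance of equation (2)
print(minkowski(x, y, 2))  # 5.0, the Euclidean distance
print(minkowski(x, y, 3))  # a p > 2 distance, between 4 and 5 here
```

For a fixed pair of points the value does not increase as p grows, and it approaches the largest coordinate difference (4 in this example).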

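Hamming Space

Hamming Space: with $H = \{0, 1\}$, the space of all bit strings of length D is called the D-dimensional Hamming space $H^D$.

Hamming distance → the dissimilarity between two bit strings

$$d_H(\mathbf{x}, \mathbf{y}) = \left| \{ i \mid x_i \neq y_i \} \right| \tag{3}$$

Example: the dissimilarity between 00011 and 00101 is 2.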
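Equation (3) translates almost literally into code. The sketch below shows two variants, one for bit strings and one for integers whose binary digits are the bits; both function names are illustrative.

```python
def hamming(x, y):
    """Hamming distance: number of positions i where x[i] != y[i]."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length D")
    return sum(1 for xi, yi in zip(x, y) if xi != yi)


def hamming_int(a, b):
    """Same distance for bit strings packed into integers: popcount of XOR."""
    return bin(a ^ b).count("1")


print(hamming("00011", "00101"))          # 2
print(hamming_int(0b00011, 0b00101))      # 2
```

The integer form is the one usually used in practice, because XOR followed by a bit count handles a whole machine word at a time.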

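Explanation

Space and distance

Reference 1

Theorem 1 (quoted)

Let F be an arbitrary distribution of n points (from a database of N uniformly distributed points), and let the distance function be a k-norm Minkowski distance function inside a Euclidean space $\mathbb{R}^D$. Then,

$$C_k \leq \lim_{D \to +\infty} \left[ \frac{d_{\max} - d_{\min}}{D^{1/k - 1/2}} \right] \leq (N - 1)\, C_k \tag{4}$$

where $d_{\max}$ and $d_{\min}$ are the farthest and nearest distances from a point in F to the query point, respectively. $C_k$ is a constant that depends on k.

1. A. Hinneburg et al., "What Is the Nearest Neighbor in High Dimensional Spaces?", Proceedings of the 26th International Conference on Very Large Data Bases, pp. 506-515, 2000

When the Minkowski distance $d_k(\mathbf{x}, \mathbf{y})$ with $k > 2$ is used,

as the dimension D of the space increases,

$d_{\max} - d_{\min}$ converges to 0. The exponent $1/k - 1/2$ is negative for $k > 2$, so $D^{1/k - 1/2}$ tends to 0, and the bound in (4) forces $d_{\max} - d_{\min}$ to shrink with it.

→ In high dimensions with a $k > 2$ Minkowski distance, the shortest and longest distances in the dataset almost coincide.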
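The shrinking gap can be observed with a small experiment. This is only an illustrative sketch: it assumes points drawn uniformly from [0, 1]^D, a single random query, 100 points, and a fixed seed, none of which come from the slides, and the helper name `contrast` is hypothetical.

```python
import random


def minkowski(x, y, k):
    """Minkowski distance d_k(x, y) = (sum_i |x_i - y_i|^k)^(1/k)."""
    return sum(abs(xi - yi) ** k for xi, yi in zip(x, y)) ** (1.0 / k)


def contrast(D, k, n=100, seed=0):
    """Return d_max - d_min from one random query to n uniform points in [0, 1]^D."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(D)]
    dists = [
        minkowski(query, [rng.random() for _ in range(D)], k)
        for _ in range(n)
    ]
    return max(dists) - min(dists)


# Print d_max - d_min for k = 1, 2, 4 as the dimension grows.
for D in (2, 20, 200, 2000):
    print(D, [round(contrast(D, k), 3) for k in (1, 2, 4)])
```

Following the scaling $D^{1/k - 1/2}$ in (4), the k = 1 column should tend to grow with D, the k = 2 column should stay roughly level, and the k = 4 column should shrink. A single random sample is noisy, so the trend matters more than any one value.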

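The curse of dimensionality

At first glance, this looks like an ordinary problem.

Simplest method: linear search → time complexity O(nD), where n is the number of points in Y

Once the dimension exceeds a certain value, any algorithm becomes effectively equivalent to linear search (exhaustive search).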
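For a rough sense of scale, assume a hypothetical database of $n = 10^6$ vectors in $D = 128$ dimensions (illustrative sizes, not taken from the slides). A single linear-search query then reads every coordinate of every vector:

$$n \cdot D = 10^6 \times 128 = 1.28 \times 10^8 \ \text{coordinate differences per query}$$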
