50
SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT TRIES Huanchen Zhang Hyeontaek Lim, Viktor Leis, David G. Andersen Michael Kaminsky, Kimberly Keeton, Andrew Pavlo

SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT TRIES

Huanchen ZhangHyeontaek Lim, Viktor Leis, David G. Andersen

Michael Kaminsky, Kimberly Keeton, Andrew Pavlo

Page 2: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

2

Page 3: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

Page 4: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

Page 5: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

YES, 100% No False Negatives

Page 6: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

Page 7: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

NO, 99%

Page 8: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

NO, 99%

YES, 1%

Page 9: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

NO, 99%

YES, 1%

Page 10: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Filters answer approximate membership queries

Billionaire

2

NO, 99%

YES, 1% False Positive Rate

Page 11: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

3

Local Memory

Slow Devices

Queries

Filters pre-reject most negative queries

Page 12: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

3

Local Memory

Slow Devices

Queries

Filters pre-reject most negative queries

NOProbably YES

Page 13: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Existing filters only support point filtering

Point Filtering

4

Bloom Filter (1970)

Quotient Filter (2012)

Cuckoo Filter (2014)

SELECT * FROM BillionairesWHERE LastName = ‘Pavlo’

Page 14: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Existing filters only support point filtering

Point Filtering

4

Bloom Filter (1970)

Quotient Filter (2012)

Cuckoo Filter (2014)

SELECT * FROM BillionairesWHERE LastName = ‘Pavlo’

Range Filtering

SELECT * FROM BillionairesWHERE LastName LIKE ‘Pav%’

Page 15: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Existing filters only support point filtering

Point Filtering

4

Bloom Filter (1970)

Quotient Filter (2012)

Cuckoo Filter (2014)

SELECT * FROM BillionairesWHERE LastName = ‘Pavlo’

Range Filtering

SELECT * FROM BillionairesWHERE LastName LIKE ‘Pav%’

Page 16: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Our solution: Succinct Range Filters (SuRF)

First practical, general-purpose range filter

5

SMALL: close to theoretic minimum

FAST: comparable to fastest trees

USEFUL: evaluated in RocksDB

64-bit integer keys, 1% false positive rate: ≈12 bits per key

10 million 64-bit integer keys: ≈ 200 ns per query

speed up range queries by up to 5x

Page 17: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Starting point: a complete trie

S

6

I

G

M

O

D

K

D

D

O

P

S

Page 18: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Starting point: a complete trie

S

6

I

G

M

O

D

K

D

D

O

P

S

TOO BIG

Page 19: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

S

7

I

G

M

O

D

K

D

D

O

P

S

S

I

G

MK O

Make it smaller: a truncated trie

Page 20: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

S

7

I

G

M

O

D

K

D

D

O

P

S

S

I

G

MK O

Make it smaller: a truncated trie

SIGMODSIGMETRICS

Page 21: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Page 22: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

SIGMETRICS

Page 23: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

SIGMETRICS

0x18

Page 24: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

SIGMETRICS

0x18

SIGMETRICS

E

Page 25: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Page 26: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Each bit reduces FPR by half

Page 27: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Each bit reduces FPR by half

Cannot help range queries

Page 28: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Each bit reduces FPR by half

Cannot help range queries

Benefit point & range queries

Page 29: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

8

Use suffix bits to reduce false positive rate

S

I

G

MK O

0x200xC8 0x06

Hashed Suffix Bits Real Suffix Bits

S

I

G

MK O

OD P

Each bit reduces FPR by half

Cannot help range queries

Benefit point & range queries

Weaker distinguishability

Page 30: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Succinct Data Structure

9

… uses an amount of space that is “close” to the information-theoretic lower bound, but still allows efficient query operations. [wikipedia]

Page 31: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

SuRF’s encoding is small and fast

10

Small≈10 + suffix bits per key for 64-bit integers

≈14 + suffix bits per key for emails

FastMatches state-of-the-art pointer-based trees

Page 32: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

11

Bloom filters speed up point queries in RocksDB

B

B

B

Cached FiltersB, B, B, …

SSTable

Page 33: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

11

B

B

B

Cached FiltersB, B, B, …

GET(16)

Bloom filters speed up point queries in RocksDB

Page 34: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

11

B

B

B

Cached FiltersB, B, B, …

GET(16)NO

Bloom filters speed up point queries in RocksDB

Page 35: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

12

B

B

B

Cached FiltersB, B, B, …

SEEK(14, 18)

Bloom filters can’t help range queries in RocksDB

Page 36: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

12

B

B

B

Cached FiltersB, B, B, …

SEEK(14, 18)

Bloom filters can’t help range queries in RocksDB

Page 37: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

13

S

S

S

Cached FiltersS, S, S, …

SuRFs can benefit both point and range queries

Page 38: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LN-2

LN-1

LN

…, 6, 20, …

…, 12, 21, …

…, 11, 19, …

13

S

S

S

Cached FiltersS, S, S, … SEEK(14, 18)

GET(16)NO

SuRFs can benefit both point and range queries

Page 39: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Evaluation setup: a time-series benchmark

14

Time

Key: 64-bit timestamp + 64-bit sensor IDValue: 1KB payload

Page 40: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Evaluation setup: a time-series benchmark

14

Time

Key: 64-bit timestamp + 64-bit sensor IDValue: 1KB payload

SEEK(t1, t2)GET(t)

t t1 t2

Queries:

Page 41: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Evaluation setup: a time-series benchmark

14

Time

Key: 64-bit timestamp + 64-bit sensor IDValue: 1KB payload

SEEK(t1, t2)GET(t)

t t1 t2

Queries:

System Config

Dataset: ≈100 GB on SSD

DRAM: 32 GB

Page 42: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Evaluation setup: a time-series benchmark

14

Time

Key: 64-bit timestamp + 64-bit sensor IDValue: 1KB payload

SEEK(t1, t2)GET(t)

t t1 t2

Queries:

Filter ConfigBloom filter: 14 bits per key

SuRF: 4-bit real suffix

System Config

Dataset: ≈100 GB on SSD

DRAM: 32 GB

Page 43: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

15

SuRFs still benefit point queries in RocksDB

0

10

20

30

40

Th

rou

gh

pu

t (K

op

s/s)

No Filter Bloom Filter SuRF

All-false point queries

Worst-case Gap

Page 44: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

0

2

4

6

8

10T

hro

ug

hp

ut

(Ko

ps/

s)

Percent of queries with empty results

10 20 30 40 50 60 70 80 90 99

No Filter/ Bloom Filter

SuRF

16

SuRFs speed up range queries in RocksDB

Page 45: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

0

2

4

6

8

10T

hro

ug

hp

ut

(Ko

ps/

s)

Percent of queries with empty results

10 20 30 40 50 60 70 80 90 99

SuRF

16

5x

SuRFs speed up range queries in RocksDB

No Filter/ Bloom Filter

Page 46: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Conclusion

17

SuRF is a fast and compact data structureoptimized for range filtering

github.com/efficient/SuRF

rangefilter.io[Demo]

Page 47: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Backup Slides

Page 48: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

Comparing ARF to SuRF

B1

Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries (ARF uses 2M queries for training)

Bits per Key (held constant)

Range Query Throughput (Mops/s)

False Positive Rate

Build Time (s)

Build Memory (GB)

Training Time (s)

Training Throughput (Mops/s)

ARF SuRF Improvement

14

0.16

25.7

118

26

117

0.02

14

3.3

2.2

1.2

0.02

N/A

N/A

-

20x

12x

98x

1300x

N/A

N/A

Page 49: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LOUDS-Sparse encoding example

B2

a i

h t f t

v1 v2 v3 v4 v5

a i d h t f t1 1 0 0 0 0 0

1 0 1 0 0 1 0v1 v2 v3 v4 v5

Label:

Structure:

Has-child:

Value:

LOUDS-Sparse

d

10N bitsTheoretic Limit ≈ 9.4N bits

moveToChild (p) = select(S, rank(HC, p) + 1)

Page 50: SuRF: PRACTICAL RANGE FILTERING WITH FAST SUCCINCT …huanche1/slides/surf-sigmod18.pdfComparing ARF to SuRF B1 Experiment: insert 5M 64-bit integers, 10M Zipf-distributed range queries

LOUDS-DS trades small space for performance

B3

LOUDS-Dense

LOUDS-Sparse

Hot

Cold

space overhead

speed-up

< 1%

3x