Indexing Delight
Thinking Cap of Fractal-tree Indexes

BohuTANG @2012 / [email protected]
B-tree
Invented in 1972, 40 years ago!
B-tree

[Diagram: a B-tree with root Block0, internal nodes Block1/Block2/Block3, and leaves Block4/Block5]

File on disk: ... Block0 ... ... Block3 ... Block5 ...
B-tree Insert

[Diagram, 3 animation steps: inserting x walks root-to-leaf, Block0 -> Block3 -> Block5, with a random seek at each level]

File on disk: ... Block0 ... ... Block3 ... Block5 ...

Insert one item causes many random seeks!
B-tree Search

[Diagram: searching for x walks root-to-leaf, Block0 -> Block3 -> Block5, one seek per level]

Query is fast: the I/O cost is O(log_B N)
B-tree Conclusions

● Search: O(log_B N) block transfers.
● Insert: O(log_B N) block transfers (slow).
● B-tree range queries are slow.
● IMPORTANT: parent and child blocks are scattered sparsely on disk.
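As a rough back-of-the-envelope check of the O(log_B N) bound, here is a minimal Python sketch. The 1-billion-key and fanout-256 numbers are illustrative assumptions (roughly a 4KB block holding 16-byte entries), not figures from the slides:

```python
import math

def btree_block_transfers(n_keys, fanout):
    """Worst-case block transfers for one B-tree point query: O(log_B N)."""
    return math.ceil(math.log(n_keys, fanout))

# Hypothetical example: 1 billion keys, fanout 256.
# Only a handful of block reads per search -- but an insert pays the
# same root-to-leaf walk, and each block read may be a random seek.
print(btree_block_transfers(10**9, 256))  # 4
```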
A Simplified Fractal-tree
Cache-Oblivious Lookahead Array (COLA), invented by MITers
COLA

[Diagram: log_2 N sorted arrays of doubling sizes]

Binary search in each level costs O(log_2 N); searching all log_2 N levels costs O(log_2^2 N).
COLA (Using Fractional Cascading)

[Diagram: log_2 N sorted arrays of doubling sizes, linked by fractional-cascading pointers]

● Search: O(log_2 N) block transfers.
● Insert: O((1/B) log_2 N) amortized block transfers.
● Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
● Behaves like a balanced binary search tree.
COLA Conclusions

● Search: O(log_2 N) block transfers (using Fractional Cascading).
● Insert: O((1/B) log_2 N) amortized block transfers.
● Data is stored in log_2 N arrays of sizes 2, 4, 8, 16, ...
● Behaves like a balanced binary search tree.
● Lookahead (prefetch) friendly; data-intensive!
● BUT the bottom level grows bigger and bigger, so merges become expensive.
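The insert path above can be sketched as a binary-counter-style merge of doubling arrays. This is a toy in-memory model for illustration only (not nessDB or TokuDB code), and `sorted()` stands in for the linear-time merge a real COLA would use:

```python
from bisect import bisect_left

class COLA:
    """Toy Cache-Oblivious Lookahead Array: log_2 N sorted arrays of
    sizes 1, 2, 4, ...; inserting merges full levels downward, like
    carrying in binary addition."""
    def __init__(self):
        self.levels = []  # levels[k] is None or a sorted list of length 2**k

    def insert(self, key):
        carry = [key]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append(None)
            if self.levels[k] is None:
                self.levels[k] = carry
                return
            # Level k is full: merge it into the carry and push down.
            # This is the amortized O((1/B) log N) part -- and why the
            # deepest (biggest) level makes merges expensive.
            carry = sorted(self.levels[k] + carry)
            self.levels[k] = None
            k += 1

    def search(self, key):
        # Without fractional cascading: binary-search every level, O(log^2 N).
        for lvl in self.levels:
            if lvl:
                i = bisect_left(lvl, key)
                if i < len(lvl) and lvl[i] == key:
                    return True
        return False

c = COLA()
for x in [5, 3, 9, 1, 7]:
    c.insert(x)
print(c.search(7), c.search(2))  # True False
```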
COLA vs B-tree

● Search: (log_2 N) / (log_B N) = log_2 B times slower than B-tree (in theory).
● Insert: (log_B N) / ((1/B) log_2 N) = B / (log_2 B) times faster than B-tree (in theory).

If B = 4KB:
COLA search is 12 times slower than B-tree.
COLA insert is 341 times faster than B-tree.
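The 12x and 341x figures follow directly from the two ratios; a quick arithmetic check, treating B = 4096 as the slide does:

```python
import math

B = 4096  # the slide's "4KB" block, treated as entries per block
search_slowdown = math.log2(B)       # (log_2 N)/(log_B N) = log_2 B
insert_speedup = B / math.log2(B)    # (log_B N)/((1/B) log_2 N) = B / log_2 B
print(round(search_slowdown), round(insert_speedup))  # 12 341
```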
LSM-tree

[Diagram: an in-memory buffer on top, feeding a cascade of on-disk levels of buffers]

● Lazy insertion; data is sorted before being pushed down.
● Level_i is the buffer of Level_i+1.
● Search: O(log_B N) * O(log N)
● Insert: O((log_B N)/B)
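The lazy-insertion idea can be modeled with a dict as the in-memory buffer and sorted lists standing in for on-disk runs. The class name, flush threshold, and run layout are all made up for illustration:

```python
from bisect import bisect_left

class TinyLSM:
    """Toy LSM-tree: writes land in an in-memory buffer, which is sorted
    and flushed to an on-"disk" run when full. No compaction policy here."""
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.limit = memtable_limit
        self.runs = []  # newest first; each run is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value       # lazy insertion: O(1), in memory
        if len(self.memtable) >= self.limit:
            self.flush()

    def flush(self):
        run = sorted(self.memtable.items())  # sorted before hitting "disk"
        self.runs.insert(0, run)
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:            # O(log N) runs, binary search in each
            keys = [k for k, _ in run]
            i = bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(i, i * i)
print(db.get(7), db.get(42))  # 49 None
```

Note how `get` must consult every run in the worst case; that multiplicative O(log N) factor is exactly what fractional cascading removes on the next slide.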
LSM-tree (Using Fractional Cascading)

[Diagram: in-memory buffer on top of cascaded on-disk levels, linked by fractional-cascading pointers]

● Search: O(log_B N) (using FC).
● Insert: O((log_B N)/B).
● A 0.618 Fractal-tree? But NOT cache-oblivious...
LSM-tree (Merging)

[Diagram: merges running between every pair of adjacent levels while the in-memory buffer sleeps]

A lot of I/O is wasted during merging!
Like a headless fly buzzing around aimlessly...
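The level-to-level merging itself is just a k-way merge of sorted runs; in Python, `heapq.merge` streams all inputs in one pass. The slide's complaint is that an LSM-tree repeats such merges level after level, re-reading the same data many times:

```python
import heapq

# Three sorted runs, as produced by earlier flushes (toy data).
runs = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# One streaming pass over all runs produces the next, bigger run.
merged = list(heapq.merge(*runs))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```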
Fractal-tree Indexes
Just fractal. Patented by Tokutek...
Fractal-tree Indexes

Search: O(log_B N)
Insert: O((log_B N)/B) (amortized)

Search is the same as a B-tree, but insert is faster than a B-tree.
Fractal-tree Indexes (Block size)

[Diagram, animation: B is 4MB. When a block fills up, its contents are pushed down into child blocks, which fill up and split in turn, recursively]

Fractal! 4MB, one seek...
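The fill-then-flush behavior can be sketched with a node that buffers incoming keys and pushes a full buffer down to its children in a single pass. The names, pivot values, and the tiny buffer limit are illustrative; a real fractal-tree block would be on the order of 4MB, so each flush costs one seek:

```python
from bisect import bisect_right

class BufferedNode:
    """Toy fractal-tree node: a pivot-partitioned node with a message buffer.
    Inserts just append to the buffer; a full buffer is flushed to the
    children in one pass."""
    def __init__(self, pivots=None, buffer_limit=4):
        self.buffer = []
        self.limit = buffer_limit
        self.pivots = pivots or []   # k pivots -> k + 1 children
        self.children = ([BufferedNode(buffer_limit=buffer_limit)
                          for _ in range(len(self.pivots) + 1)]
                         if pivots else [])

    def insert(self, key):
        self.buffer.append(key)      # O(1): land in this block's buffer
        if len(self.buffer) >= self.limit and self.children:
            self.flush()

    def flush(self):
        # One pass pushes every buffered key one level down --
        # amortizing the block write over many inserts.
        for key in self.buffer:
            self.children[bisect_right(self.pivots, key)].insert(key)
        self.buffer = []

root = BufferedNode(pivots=[10, 20])
for x in [3, 15, 25, 8, 12, 30]:
    root.insert(x)
print(root.children[0].buffer, root.children[1].buffer, root.children[2].buffer)
```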
Bε-tree
Just a constant factor on block fanout...
Bε-tree

[Figure: insert/search trade-off curve. B-tree (ε=1): fast search, slow inserts. AOF (append-only file): fast inserts, slow search. ε=1/2 sits on the optimal curve between them]
Bε-tree

               insert              search
B-tree (ε=1)   O(log_B N)          O(log_B N)
ε=1/2          O((log_B N)/√B)     O(log_B N)
ε=0            O((log N)/B)        O(log N)

If we want optimal point queries plus very fast inserts, we should choose ε=1/2.
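Plugging the table's formulas into concrete numbers (N = 10^9 and B = 4096 are assumed values, not from the slides; the helper only covers ε > 0, since the ε=0 row uses a different formula):

```python
import math

def be_tree_costs(n, block, eps):
    """Asymptotic block-transfer estimates for a B^epsilon-tree, eps > 0:
    insert ~ log_B N / B^(1 - eps),  search ~ (1/eps) * log_B N."""
    log_b_n = math.log(n, block)
    return log_b_n / (block ** (1 - eps)), log_b_n / eps

n, B = 10**9, 4096
for eps in (1.0, 0.5):
    ins, srch = be_tree_costs(n, B, eps)
    print(f"eps={eps}: insert ~ {ins:.4f}, search ~ {srch:.2f} block transfers")
```

For ε=1/2 the search cost only doubles versus a B-tree (still O(log_B N)), while the insert cost drops by a factor of √B = 64: the "optimal point queries + very fast inserts" point on the curve.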
Bε-tree

So, if the block size is B, the fanout should be √B.
Cache-Oblivious Data Structures
All of the above are JUST cache-oblivious data structures...
Cache-Oblivious Data Structures

Question:
Reading a sequence of k consecutive blocks at once is not much more expensive than reading a single block. How can we take advantage of this?
Cache-Oblivious Data Structures

My Questions (translated from Chinese):
Q1: With only 1MB of memory, how do you merge two 64MB sorted files into one sorted file?
Q2: On most mechanical disks, reading several consecutive blocks costs about the same as reading a single block. How can Q1 exploit this?
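One possible answer to Q1 and Q2, sketched in Python: stream both files through large sequential read buffers and write the merge in one pass. The function name, file names, and buffer size are illustrative (three 1MB buffers is looser than Q1's strict 1MB budget, but the shape of the answer is the same), and the demo uses tiny files:

```python
import heapq

def merge_sorted_files(path_a, path_b, path_out, buf=1 << 20):
    """Merge two sorted files of lines using constant memory beyond the
    I/O buffers. Large per-file buffers exploit Q2: reading many
    consecutive blocks costs about the same as reading one."""
    with open(path_a, buffering=buf) as a, \
         open(path_b, buffering=buf) as b, \
         open(path_out, "w", buffering=buf) as out:
        out.writelines(heapq.merge(a, b))  # streaming 2-way merge of lines

# Tiny demo (Q1 would use two 64MB inputs; the code is the same):
with open("a.txt", "w") as f:
    f.write("1\n3\n5\n")
with open("b.txt", "w") as f:
    f.write("2\n4\n6\n")
merge_sorted_files("a.txt", "b.txt", "out.txt")
print(open("out.txt").read().split())  # ['1', '2', '3', '4', '5', '6']
```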
nessDB
https://github.com/shuttler/nessDB
You should agree that the VFS does caching better than you can yourself!
nessDB

[Diagram: a tree of blocks]

Each block is a small, splittable tree.
nessDB, What's going on?

[Diagram: blocks laid out in a plane rather than along a single line]

From the line to the plane...
Thanks!
Most of the references are from: Tokutek, MIT CSAIL & Stony Brook.

Drafted by BohuTANG using Google Drive, @2012/12/12