Upload
bruce-park
View
229
Download
3
Embed Size (px)
Citation preview
CS 245 Notes 4 1
CS 245: Database System Principles
Notes 4: Indexing
Hector Garcia-Molina
CS 245 Notes 4 2
Indexing & Hashing
value
Chapter 4
value
record
CS 245 Notes 4 3
Topics
• Conventional indexes• B-trees• Hashing schemes
CS 245 Notes 4 4
Sequential File
2010
4030
6050
8070
10090
CS 245 Notes 4 5
Sequential File
2010
4030
6050
8070
10090
Dense Index
10203040
50607080
90100110120
CS 245 Notes 4 6
Sequential File
2010
4030
6050
8070
10090
Sparse Index
10305070
90110130150
170190210230
CS 245 Notes 4 7
Sequential File
2010
4030
6050
8070
10090
Sparse 2nd level
10305070
90110130150
170190210230
1090
170250
330410490570
CS 245 Notes 4 8
• Comment:{FILE,INDEX} may be contiguous
or not (blocks chained)
CS 245 Notes 4 9
Question:
• Can we build a dense, 2nd level index for a dense index?
CS 245 Notes 4 10
Notes on pointers:
(1) Block pointer (sparse index) can be smaller than record pointer
BP
RP
CS 245 Notes 4 11
(2) If file is contiguous, then we can omitpointers (i.e., compute them)
Notes on pointers:
CS 245 Notes 4 12
K1
K3
K4
K2
R1
R2
R3
R4
CS 245 Notes 4 13
K1
K3
K4
K2
R1
R2
R3
R4
say:1024 Bper block
• if we want K3 block: get it at offset (3-1)1024 = 2048 bytes
CS 245 Notes 4 14
Sparse vs. Dense Tradeoff
• Sparse: Less index space per record can keep more of
index in memory• Dense: Can tell if any record exists
without accessing file
(Later: – sparse better for insertions– dense needed for secondary indexes)
CS 245 Notes 4 15
Terms
• Index sequential file• Search key ( primary key)• Primary index (on Sequencing field)• Secondary index• Dense index (all Search Key values in)• Sparse index• Multi-level index
CS 245 Notes 4 16
Next:
• Duplicate keys
• Deletion/Insertion
• Secondary indexes
CS 245 Notes 4 17
Duplicate keys
1010
2010
3020
3030
4540
CS 245 Notes 4 18
1010
2010
3020
3030
4540
10101020
20303030
1010
2010
3020
3030
4540
10101020
20303030
Dense index, one way to implement?
Duplicate keys
CS 245 Notes 4 19
1010
2010
3020
3030
4540
10203040
Dense index, better way?
Duplicate keys
CS 245 Notes 4 20
1010
2010
3020
3030
4540
10102030
Sparse index, one way?
Duplicate keys
CS 245 Notes 4 21
1010
2010
3020
3030
4540
10102030
Sparse index, one way?
Duplicate keys
care
ful if lookin
gfo
r 2
0 o
r 3
0!
CS 245 Notes 4 22
1010
2010
3020
3030
4540
10203030
Sparse index, another way?
Duplicate keys
– place first new key from block
CS 245 Notes 4 23
1010
2010
3020
3030
4540
10203030
Sparse index, another way?
Duplicate keys
– place first new key from block
shouldthis be40?
CS 245 Notes 4 24
Duplicate values, primary index
• Index may point to first instance ofeach value only
File Index
Summary
aaa
b
CS 245 Notes 4 25
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
CS 245 Notes 4 26
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 40
CS 245 Notes 4 27
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 40
CS 245 Notes 4 28
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 30
CS 245 Notes 4 29
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete record 30
4040
CS 245 Notes 4 30
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete records 30 & 40
CS 245 Notes 4 31
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete records 30 & 40
CS 245 Notes 4 32
Deletion from sparse index
2010
4030
6050
8070
10305070
90110130150
– delete records 30 & 40
5070
CS 245 Notes 4 33
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
CS 245 Notes 4 34
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
– delete record 30
CS 245 Notes 4 35
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
– delete record 30
40
CS 245 Notes 4 36
Deletion from dense index
2010
4030
6050
8070
10203040
50607080
– delete record 30
4040
CS 245 Notes 4 37
Insertion, sparse index case
2010
30
5040
60
10304060
CS 245 Notes 4 38
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 34
CS 245 Notes 4 39
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 34
34
• our lucky day! we have free space where we need it!
CS 245 Notes 4 40
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 15
CS 245 Notes 4 41
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 15
15
2030
20
CS 245 Notes 4 42
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 15
15
2030
20
• Illustrated: Immediate reorganization• Variation:
– insert new block (chained file)– update index
CS 245 Notes 4 43
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 25
CS 245 Notes 4 44
Insertion, sparse index case
2010
30
5040
60
10304060
– insert record 25
25
overflow blocks(reorganize later...)
CS 245 Notes 4 45
Insertion, dense index case
• Similar
• Often more expensive . . .
CS 245 Notes 4 46
Secondary indexesSequencefield
5030
7020
4080
10100
6090
CS 245 Notes 4 47
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Sparse index
302080
100
90...
CS 245 Notes 4 48
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Sparse index
302080
100
90...
does not make sense!
CS 245 Notes 4 49
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Dense index
CS 245 Notes 4 50
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Dense index10203040
506070...
CS 245 Notes 4 51
Secondary indexesSequencefield
5030
7020
4080
10100
6090
• Dense index10203040
506070...
105090...
sparsehighlevel
CS 245 Notes 4 52
With secondary indexes:
• Lowest level is dense• Other levels are sparse
Also: Pointers are record pointers
(not block pointers; not computed)
CS 245 Notes 4 53
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
CS 245 Notes 4 54
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10101020
20304040
4040...
one option...
CS 245 Notes 4 55
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10101020
20304040
4040...
one option...
Problem:excess overhead!
• disk space• search time
CS 245 Notes 4 56
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10
another option...
4030
20
CS 245 Notes 4 57
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10
another option...
4030
20Problem:variable sizerecords inindex!
CS 245 Notes 4 58
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
Another idea (suggested in class):Chain records with same key?
CS 245 Notes 4 59
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
Another idea (suggested in class):Chain records with same key?
Problems:• Need to add fields to records• Need to follow chain to know records
CS 245 Notes 4 60
Duplicate values & secondary indexes
1020
4020
4010
4010
4030
10203040
5060...
buckets
CS 245 Notes 4 61
Why “bucket” idea is useful
Indexes RecordsName: primary EMP
(name,dept,floor,...)
Dept: secondaryFloor: secondary
CS 245 Notes 4 62
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
CS 245 Notes 4 63
Query: Get employees in
(Toy Dept) ^ (2nd floor)
Dept. index EMP Floor index
Toy 2nd
Intersect toy bucket and 2nd Floor
bucket to get set of matching EMP’s
CS 245 Notes 4 64
This idea used in text information retrieval
Documents
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
CS 245 Notes 4 65
This idea used in text information retrieval
Documents
...the cat is fat ...
...was raining cats and dogs...
...Fido the dog ...
Inverted lists
cat
dog
CS 245 Notes 4 66
IR QUERIES
• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”
CS 245 Notes 4 67
IR QUERIES
• Find articles with “cat” and “dog”• Find articles with “cat” or “dog”• Find articles with “cat” and not “dog”
• Find articles with “cat” in title• Find articles with “cat” and “dog”
within 5 words
CS 245 Notes 4 68
Common technique: more info in inverted
list
cat Title 5
Title 100
Author 10
Abstract 57
Title 12
d3d2
d1
dog
typeposit
ion
locatio
n
CS 245 Notes 4 69
Posting: an entry in inverted list.Represents occurrence
ofterm in article
Size of a list: 1 Rare words or (in postings) miss-spellings
106 Common words
Size of a posting: 10-15 bits (compressed)
CS 245 Notes 4 70
IR DISCUSSION
• Stop words• Truncation• Thesaurus• Full text vs. Abstracts• Vector model
CS 245 Notes 4 71
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>
Query= <0 0 1 1 0 0 0 …>
CS 245 Notes 4 72
Vector space model
w1 w2 w3 w4 w5 w6 w7 …
DOC = <1 0 0 1 1 0 0 …>
Query= <0 0 1 1 0 0 0 …>
PRODUCT = 1 + ……. = score
CS 245 Notes 4 73
• Tricks to weigh scores + normalize
e.g.: Match on common word not asuseful as match on rare
words...
CS 245 Notes 4 74
• How to process V.S. Queries?
w1 w2 w3 w4 w5 w6 …Q = < 0 0 0 1 1 0 … >
CS 245 Notes 4 75
• Try Stanford Libraries• Try Google, Yahoo, ...
CS 245 Notes 4 76
Summary so far
• Conventional index– Basic Ideas: sparse, dense, multi-
level…– Duplicate Keys– Deletion/Insertion– Secondary indexes
– Buckets of Postings List
CS 245 Notes 4 77
Conventional indexes
Advantage:- Simple- Index is sequential file
good for scans
Disadvantage:- Inserts expensive,
and/or- Lose sequentiality &
balance
CS 245 Notes 4 78
Example Index (sequential)
continuous
free space
102030
405060
708090
CS 245 Notes 4 79
Example Index (sequential)
continuous
free space
102030
405060
708090
39313536
323834
33
overflow area(not sequential)
CS 245 Notes 4 80
Outline:
• Conventional indexes• B-Trees NEXT• Hashing schemes
CS 245 Notes 4 81
• NEXT: Another type of index– Give up on sequentiality of index– Try to get “balance”
CS 245 Notes 4 82
Root
B+Tree Example n=3
100
120
150
180
30
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
CS 245 Notes 4 83
Sample non-leaf
to keys to keys to keys to keys
< 57 57 k<81 81k<95 95
57
81
95
CS 245 Notes 4 84
Sample leaf node:
From non-leaf node
to next leafin
sequence5
7
81
95
To r
eco
rd
wit
h k
ey 5
7
To r
eco
rd
wit
h k
ey 8
1
To r
eco
rd
wit
h k
ey 8
5
CS 245 Notes 4 85
In textbook’s notationn=3
Leaf:
Non-leaf:
30
35
30
30 35
30
CS 245 Notes 4 86
Size of nodes: n+1 pointersn keys
(fixed)
CS 245 Notes 4 87
Don’t want nodes to be too empty
• Use at least
Non-leaf: (n+1)/2pointers
Leaf: (n+1)/2 pointers to data
CS 245 Notes 4 88
Full nodemin. node
Non-leaf
Leaf
n=3
12
01
50
18
0
30
3 5 11
30
35
counts
even if
null
CS 245 Notes 4 89
B+tree rules tree of order n
(1) All leaves at same lowest level(balanced tree)
(2) Pointers in leaves point to records except for “sequence pointer”
CS 245 Notes 4 90
(3) Number of pointers/keys for B+tree
Non-leaf(non-root) n+1 n (n+1)/2 (n+1)/2- 1
Leaf(non-root) n+1 n
Root n+1 n 1 1
Max Max Min Min ptrs keys ptrsdata keys
(n+1)/2 (n+1)/2
CS 245 Notes 4 91
Insert into B+tree
(a) simple case– space available in leaf
(b) leaf overflow(c) non-leaf overflow(d) new root
CS 245 Notes 4 92
(a) Insert key = 32 n=33 5 11
30
31
30
100
CS 245 Notes 4 93
(a) Insert key = 32 n=33 5 11
30
31
30
100
32
CS 245 Notes 4 94
(a) Insert key = 7 n=3
3 5 11
30
31
30
100
CS 245 Notes 4 95
(a) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
CS 245 Notes 4 96
(a) Insert key = 7 n=3
3 5 11
30
31
30
100
3 5
7
7
CS 245 Notes 4 97
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
CS 245 Notes 4 98
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
179
CS 245 Notes 4 99
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
18
0
160
179
CS 245 Notes 4 100
(c) Insert key = 160 n=3
10
0
120
150
180
150
156
179
180
200
160
18
0
160
179
CS 245 Notes 4 101
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
CS 245 Notes 4 102
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
CS 245 Notes 4 103
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
CS 245 Notes 4 104
(d) New root, insert 45 n=3
10
20
30
1 2 3 10
12
20
25
30
32
40
40
45
40
30new root
CS 245 Notes 4 105
(a) Simple case - no example
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys(d) Cases (b) or (c) at non-leaf
Deletion from B+tree
CS 245 Notes 4 106
(b) Coalesce with sibling– Delete 50
10
40
100
10
20
30
40
50
n=4
CS 245 Notes 4 107
(b) Coalesce with sibling– Delete 50
10
40
100
10
20
30
40
50
n=4
40
CS 245 Notes 4 108
(c) Redistribute keys– Delete 50
10
40
100
10
20
30
35
40
50
n=4
CS 245 Notes 4 109
(c) Redistribute keys– Delete 50
10
40
100
10
20
30
35
40
50
n=4
35
35
CS 245 Notes 4 110
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese– Delete 37
n=4
25
CS 245 Notes 4 111
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese– Delete 37
n=4
30
25
CS 245 Notes 4 112
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese– Delete 37
n=4
40
30
25
CS 245 Notes 4 113
40
45
30
37
25
26
20
22
10
141 3
10
20
30
40
(d) Non-leaf coalese– Delete 37
n=4
40
30
25
25
new root
CS 245 Notes 4 114
B+tree deletions in practice
– Often, coalescing is not implemented– Too hard and not worth it!
CS 245 Notes 4 115
Comparison: B-trees vs. static indexed sequential
fileRef #1: Held & Stonebraker
“B-Trees Re-examined”CACM, Feb. 1978
CS 245 Notes 4 116
Ref # 1 claims:- Concurrency control harder in B-Trees
- B-tree consumes more space
For their comparison:block = 512 byteskey = pointer = 4 bytes4 data records per block
CS 245 Notes 4 117
Example: 1 block static index
127 keys
(127+1)4 = 512 Bytes-> pointers in index implicit! up to 127
blocks
k1
k2
k3
k1
k2
k3
1 datablock
CS 245 Notes 4 118
Example: 1 block B-tree
63 keys
63x(4+4)+8 = 512 Bytes-> pointers needed in B-tree up to 63
blocks because index is blocksnot contiguous
k1
k2
...
k63
k1
k2
k3
1 datablock
next
-
CS 245 Notes 4 119
Size comparison Ref. #1Size comparison Ref. #1
Static Index B-tree# data # datablocks height blocks
height
2 -> 127 2 2 -> 63 2128 -> 16,129 3 64 -> 3968 316,130 -> 2,048,383 4 3969 -> 250,047 4
250,048 -> 15,752,961 5
CS 245 Notes 4 120
Ref. #1 analysis claims
• For an 8,000 block file,after 32,000 inserts
after 16,000 lookups Static index saves enough accesses
to allow for reorganization
CS 245 Notes 4 121
Ref. #1 analysis claims
• For an 8,000 block file,after 32,000 inserts
after 16,000 lookups Static index saves enough accesses
to allow for reorganization
Ref. #1 conclusion Static index better!!
CS 245 Notes 4 122
Ref #2: M. Stonebraker, “Retrospective on a
database system,” TODS, June 1980
Ref. #2 conclusion B-trees better!!
CS 245 Notes 4 123
• DBA does not know when to reorganize• DBA does not know how full to load
pages of new index
Ref. #2 conclusion B-trees better!!
CS 245 Notes 4 124
• Buffering– B-tree: has fixed buffer requirements– Static index: must read several
overflow blocks to be efficient (large & variable size buffers needed for this)
Ref. #2 conclusion B-trees better!!
CS 245 Notes 4 125
• Speaking of buffering… Is LRU a good policy for B+tree
buffers?
CS 245 Notes 4 126
• Speaking of buffering… Is LRU a good policy for B+tree
buffers? Of course not! Should try to keep root in memory
at all times(and perhaps some nodes from second level)
CS 245 Notes 4 127
Interesting problem:
For B+tree, how large should n be?
…
n is number of keys / node
CS 245 Notes 4 128
Sample assumptions:(1) Time to read node from disk is
(S+Tn) msec.
CS 245 Notes 4 129
Sample assumptions:(1) Time to read node from disk is
(S+Tn) msec.(2) Once block in memory, use
binary search to locate key:(a + b LOG2
n) msec.For some constants a,b; Assume a <<
S
CS 245 Notes 4 130
Sample assumptions:(1) Time to read node from disk is
(S+Tn) msec.(2) Once block in memory, use
binary search to locate key:(a + b LOG2
n) msec.For some constants a,b; Assume a <<
S(3) Assume B+tree is full, i.e.,
# nodes to examine is LOGn N where N = # records
CS 245 Notes 4 131
Can get: f(n) = time to find a record
f(n)
nopt n
CS 245 Notes 4 132
FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
CS 245 Notes 4 133
FIND nopt by f’(n) = 0
Answer is nopt = “few hundred”
(see homework for details)
What happens to nopt as
• Disk gets faster?• CPU get faster?
Exercise
• f(n)= lognN*[S+T*n+a+b*log2(n)]
CS 245 Notes 4 134
S= 14000
T= 0.2
b= 0.002
a= 0
N= 10000000
N=10 million records
CS 245 Notes 4 135
S= 14000T= 0.2b= 0.002a= 0
N=1000000
0
times in microseconds
N=100 million records
CS 245 Notes 4 136
S= 14000T= 0.2b= 0.002a= 0
N=1000000
0
times in microseconds
CS 245 Notes 4 137
Variation on B+tree: B-tree (no +)• Idea:
– Avoid duplicate keys– Have record pointers in non-leaf nodes
CS 245 Notes 4 138
to record to record to record with K1 with K2 with K3
to keys to keys to keys to keys < K1 K1<x<K2 K2<x<k3 >k3
K1 P1 K2 P2 K3 P3
CS 245 Notes 4 139
B-tree example n=2
65
125
145
165
85
105
25
45
10
20
30
40
110
120
90
100
70
80
170
180
50
60
130
140
150
160
CS 245 Notes 4 140
B-tree example n=2
65
125
145
165
85
105
25
45
10
20
30
40
110
120
90
100
70
80
170
180
50
60
130
140
150
160
• sequence pointers not useful now! (but keep space for simplicity)
CS 245 Notes 4 141
Note on inserts
• Say we insert record with key = 25
10
20
30 n=3
leaf
CS 245 Notes 4 142
Note on inserts
• Say we insert record with key = 25
10
20
30 n=3
leaf
10
– 20 –
25
30
• Afterwards:
CS 245 Notes 4 143
So, for B-trees:
MAX MINTree Rec Keys Tree Rec KeysPtrs Ptrs Ptrs Ptrs
Non-leafnon-root n+1 n n (n+1)/2 (n+1)/2-1
(n+1)/2-1Leafnon-root 1 n n 1 n/2 n/2
Rootnon-leafn+1 n n 2 1 1
RootLeaf 1 n n 1 1 1
CS 245 Notes 4 144
Tradeoffs:
B-trees have faster lookup than B+trees
in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated
CS 245 Notes 4 145
Tradeoffs:
B-trees have faster lookup than B+trees
in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated
B+trees preferred!
CS 245 Notes 4 146
But note:
• If blocks are fixed size (due to disk and buffering restrictions)
Then lookup for B+tree is actually better!!
CS 245 Notes 4 147
Example:- Pointers 4 bytes- Keys 4 bytes- Blocks 100 bytes (just example)- Look at full 2 level tree
CS 245 Notes 4 148
Root has 8 keys + 8 record pointers+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
B-tree:
CS 245 Notes 4 149
Root has 8 keys + 8 record pointers+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
B-tree:
Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes
CS 245 Notes 4 150
Root has 8 keys + 8 record pointers+ 9 son pointers
= 8x4 + 8x4 + 9x4 = 100 bytes
B-tree:
Each of 9 sons: 12 rec. pointers (+12 keys)
= 12x(4+4) + 4 = 100 bytes2-level B-tree, Max # records =
12x9 + 8 = 116
CS 245 Notes 4 151
Root has 12 keys + 13 son pointers= 12x4 + 13x4 = 100 bytes
B+tree:
CS 245 Notes 4 152
Root has 12 keys + 13 son pointers= 12x4 + 13x4 = 100 bytes
B+tree:
Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4 +4) + 4 = 100 bytes
CS 245 Notes 4 153
Root has 12 keys + 13 son pointers= 12x4 + 13x4 = 100 bytes
B+tree:
Each of 13 sons: 12 rec. ptrs (+12 keys)
= 12x(4 +4) + 4 = 100 bytes2-level B+tree, Max # records
= 13x12 = 156
CS 245 Notes 4 154
So...
ooooooooooooo ooooooooo 156 records 108 records
Total = 116
B+ B
8 records
CS 245 Notes 4 155
So...
ooooooooooooo ooooooooo 156 records 108 records
Total = 116
B+ B
8 records
• Conclusion:– For fixed block size,– B+ tree is better because it is bushier
CS 245 Notes 4 156
An Interesting Problem...
• What is a good index structure when:– records tend to be inserted with keys
that are larger than existing values?(e.g., banking records with growing data/time)
– we want to remove older data
CS 245 Notes 4 157
One Solution: Multiple Indexes
• Example: I1, I2
day days indexed days indexed I1 I210 1,2,3,4,5 6,7,8,9,1011 11,2,3,4,5 6,7,8,9,1012 11,12,3,4,5 6,7,8,9,1013 11,12,13,4,5 6,7,8,9,10
•advantage: deletions/insertions from smaller index•disadvantage: query multiple indexes
CS 245 Notes 4 158
Another Solution (Wave Indexes)
day I1 I2 I3 I410 1,2,3 4,5,6 7,8,9 1011 1,2,3 4,5,6 7,8,9 10,1112 1,2,3 4,5,6 7,8,9 10,11, 1213 13 4,5,6 7,8,9 10,11, 1214 13,14 4,5,6 7,8,9 10,11, 1215 13,14,15 4,5,6 7,8,9 10,11, 1216 13,14,15 16 7,8,9 10,11, 12
•advantage: no deletions•disadvantage: approximate windows
CS 245 Notes 4 159
Outline/summary
• Conventional Indexes• Sparse vs. dense• Primary vs. secondary
• B trees• B+trees vs. B-trees• B+trees vs. indexed sequential
• Hashing schemes --> Next