Chapter 9
The B+ Tree Family and
Indexed Sequential File Access
TABLE OF CONTENTS
● Indexed Sequential Access
● Maintaining a Sequence Set
● Adding a Simple Index to the Sequence Set
● Separators Instead of Keys
● The Simple Prefix B+ Tree
● Simple Prefix B+ Tree Maintenance
● Index Set Block Size
● Internal Structure of Index Set Blocks
● Loading a Simple Prefix B+ Tree
● B+ Tree
● Perspective ( B / B+ / Simple Prefix B+ Tree )
1. Indexed Sequential Access
● Concept
✔ Support both indexed access and sequential access
✔ File organization
– Indexed part (for random access)
– Sequential part (for batch processing)
● Examples
✔ Student record systems at universities
✔ Credit card systems
✔ Banking systems
2. Maintaining a Sequence Set
● Problems with a sorted file
✔ Record insertion/deletion/update is expensive
✔ Keeping the whole file sorted incurs I/O overhead
2.1 The Use of Blocks
● Basic Idea
✔ Restrict the effects of insertion or deletion
✔ Collect the records into blocks
✔ Connect the blocks in a linked list (Figure 9.1)
✔ The linked blocks form the sequence set
● Operations (Figure 9.1)
✔ Overflow: allocate a new page and adjust the links
✔ Underflow: redistribution or concatenation
● Costs
✔ Internal fragmentation
✔ No clustering: logically adjacent blocks need not be physically adjacent
✔ The records stay sorted (O), but binary search is not possible (X)
(a)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . COLE . . . DAVIS . . .
Block 3: DENVER . . . ELLIS . . .
(b)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . CARTER . . .
Block 3: DENVER . . . ELLIS . . .
Block 4: COLE . . . DAVIS . . .
(c)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . CARTER . . .
Block 3: Available for reuse
Block 4: COLE . . . DENVER . . . ELLIS . . .
FIGURE 9.1 Block splitting and concatenation due to insertions and deletions in the sequence set. (a) Initial blocked sequence set. (b) Sequence set after insertion of CARTER record – block 2 splits, and the contents are divided between blocks 2 and 4. (c) Sequence set after deletion of DAVIS record – block 4 is less than half full, so it is concatenated with block 3.
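The split in Figure 9.1(b) can be sketched in C as follows. This is only an illustration under stated assumptions: the type SeqBlock, the helper copy_key, the function block_insert, the block capacity of 4, and the 16-byte key buffer are all hypothetical names and sizes chosen for the sketch, not part of the slides. Blocks are held in memory here; a real sequence set would read and write them on disk.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_CAPACITY 4   /* assumed: records per block, as drawn in Figure 9.1 */
#define KEY_LEN 16         /* assumed: fixed key buffer size for this sketch */

/* Hypothetical in-memory image of one sequence set block. */
typedef struct {
    int  count;                          /* number of keys in use */
    char keys[BLOCK_CAPACITY][KEY_LEN];  /* keys in ascending order */
    int  next;                           /* block number of the successor, -1 if none */
} SeqBlock;

static void copy_key(char *dst, const char *src)
{
    strncpy(dst, src, KEY_LEN - 1);
    dst[KEY_LEN - 1] = '\0';
}

/* Insert key into blk, keeping the keys sorted. If blk overflows, the upper
   half moves to the empty block fresh (block number fresh_no) and the links
   are adjusted. Returns 1 if a split happened, 0 otherwise. */
int block_insert(SeqBlock *blk, SeqBlock *fresh, int fresh_no, const char *key)
{
    char all[BLOCK_CAPACITY + 1][KEY_LEN];
    int n = 0, i, keep, placed = 0;

    assert(blk->count >= 0 && blk->count <= BLOCK_CAPACITY);

    /* Merge the existing keys and the new key into one sorted work area. */
    for (i = 0; i < blk->count; i++) {
        if (!placed && strcmp(key, blk->keys[i]) < 0) {
            copy_key(all[n++], key);
            placed = 1;
        }
        copy_key(all[n++], blk->keys[i]);
    }
    if (!placed)
        copy_key(all[n++], key);

    /* Keep everything if it fits, otherwise keep the lower half (rounded up). */
    keep = (n <= BLOCK_CAPACITY) ? n : (n + 1) / 2;
    for (i = 0; i < keep; i++)
        copy_key(blk->keys[i], all[i]);
    blk->count = keep;
    if (keep == n)
        return 0;

    for (i = keep; i < n; i++)
        copy_key(fresh->keys[i - keep], all[i]);
    fresh->count = n - keep;
    fresh->next = blk->next;   /* new block takes over the old successor */
    blk->next = fresh_no;      /* old block now points at the new block */
    return 1;
}
```

Inserting CARTER into a full block 2 (BYNUM, CARSON, COLE, DAVIS) with block 4 as the fresh block leaves BYNUM, CARSON, CARTER in block 2 and COLE, DAVIS in block 4, with the links running 2 → 4 → 3.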
2.2 Choice of Block Size
● How does the block size affect file performance?
● Considerations
1) Several blocks must fit in RAM at the same time
– Needed for block splitting or concatenation
– What about two-to-three splitting?
2) Accessing one block should not cost extra I/O time
– Block size = 1 cluster !
3. Adding a Simple Index to Sequence Set
● Motivation
✔ Support random access to the record with a given key
✔ Index: holds the last key of each block (Figure 9.3)
✔ If the index fits in RAM (Simple Index)
– Binary search is possible
– Efficient index update is possible
✔ If the index does not fit in RAM, use a tree-structured index (B+ Tree)
– B-Tree index +
– A sequence set that holds the actual records
Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS
FIGURE 9.2 Sequence of blocks showing the range of keys in each block.

Key      Block number
BERNE    1
CAGE     2
DUTTON   3
EVANS    4
FOLK     5
GADDIS   6
FIGURE 9.3 Simple index for the sequence set illustrated in Fig. 9.2.
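A lookup in the simple index of Figure 9.3 can be sketched as a binary search for the first entry whose key is not smaller than the search key. The names IndexEntry, FIG_9_3, and find_block are hypothetical, and the index is assumed to be an in-memory array sorted by key.

```c
#include <assert.h>
#include <string.h>

/* One entry of the simple index: the last key of a block and its number. */
typedef struct {
    const char *last_key;
    int         block_no;
} IndexEntry;

/* The index of Figure 9.3, held in memory for this sketch. */
static const IndexEntry FIG_9_3[6] = {
    { "BERNE", 1 }, { "CAGE", 2 }, { "DUTTON", 3 },
    { "EVANS", 4 }, { "FOLK", 5 }, { "GADDIS", 6 }
};

/* Return the number of the only block that can hold key, or -1 when key is
   greater than the last key of the last block. */
int find_block(const IndexEntry *idx, int n, const char *key)
{
    int lo = 0, hi = n - 1, ans = -1;

    assert(n >= 0);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(key, idx[mid].last_key) <= 0) {
            ans = idx[mid].block_no;   /* candidate; look for an earlier one */
            hi = mid - 1;
        } else {
            lo = mid + 1;
        }
    }
    return ans;
}
```

With this table, BOLEN maps to block 2 and EMBRY to block 4. The block itself still has to be read and searched to learn whether the record exists.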
4. Separators Instead of Keys
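● Role of the (non-leaf) index keys in a B+ Tree
✔ They only act as separators that guide the search to a leaf node (Figure 9.4)
✔ They do not have to be actual keys ← difference from a B-Tree
● Optimization
✔ Use the shortest possible string as the separator
✔ Effect: fan-out increases

/* Precondition: key1 < key2. Copies key2 into sep up to and including
   the first character that differs from key1. */
void find_sep(const char *key1, const char *key2, char *sep)
{
    while ((*sep++ = *key2++) == *key1++)
        ;
    *sep = '\0';
}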
Block:      1            2           3            4            5           6
Keys:       ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS
Separator:             BO          CAM          E            F           FOLKS
FIGURE 9.4 Separator between blocks in the sequence set.

Block 3: CAMP-DUTTON    Block 4: EMBRY-EVANS
Potential separators: DUTU, DVXGHESJF, DZ, E, EBQX, ELEEMOSYNARY
FIGURE 9.5 A list of potential separators.
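Why every string in Figure 9.5 qualifies can be checked with a small predicate. The sketch assumes the usual search rule that keys smaller than the separator go to the left block and all other keys go to the right block; the function name is_valid_separator is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if sep can separate two adjacent blocks, i.e.
   last_of_left < sep <= first_of_right in strcmp order. */
int is_valid_separator(const char *last_of_left,
                       const char *first_of_right,
                       const char *sep)
{
    assert(strcmp(last_of_left, first_of_right) < 0);
    return strcmp(last_of_left, sep) < 0 &&
           strcmp(sep, first_of_right) <= 0;
}
```

For blocks 3 and 4 the bounds are DUTTON and EMBRY. All six strings of Figure 9.5 pass the check, E is the shortest of them, and a string such as EN fails because it sorts after EMBRY.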
5. The Simple Prefix B+ Tree
● Structure (Figure 9.8)
✔ Index set + Sequence set
✔ Simple prefix: the index set contains
– shortest separators, or
– prefixes of the keys
Index set
Root:         E
Left node:    BO  CAM        (blocks 1, 2, 3)
Right node:   F  FOLKS       (blocks 4, 5, 6)

Sequence set
Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS
FIGURE 9.8 A B-tree index set for the sequence set, forming a simple prefix B+ tree.
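Searching the tree of Figure 9.8 can be sketched as follows. The node layout (IndexNode with at most four separators), the flag points_to_blocks, and the function find_seq_block are assumptions made for this illustration. The descent rule used is the conventional one: move right past every separator that is less than or equal to the key.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEPS 4   /* assumed small node capacity for the sketch */

/* Hypothetical in-memory index set node. */
typedef struct IndexNode {
    int n_seps;                                   /* separators in use */
    const char *seps[MAX_SEPS];                   /* separators, ascending */
    int points_to_blocks;                         /* 1: children are sequence set blocks */
    const struct IndexNode *child[MAX_SEPS + 1];  /* used when points_to_blocks == 0 */
    int block[MAX_SEPS + 1];                      /* used when points_to_blocks == 1 */
} IndexNode;

/* Descend from the root and return the sequence set block number that
   must hold key if key exists. */
int find_seq_block(const IndexNode *node, const char *key)
{
    assert(node != NULL);
    for (;;) {
        int i = 0;
        /* Skip every separator that is <= key. */
        while (i < node->n_seps && strcmp(key, node->seps[i]) >= 0)
            i++;
        if (node->points_to_blocks)
            return node->block[i];
        node = node->child[i];
    }
}

/* The index set of Figure 9.8. */
static const IndexNode FIG_9_8_LEFT  = { 2, { "BO", "CAM" },  1, { 0 }, { 1, 2, 3 } };
static const IndexNode FIG_9_8_RIGHT = { 2, { "F", "FOLKS" }, 1, { 0 }, { 4, 5, 6 } };
static const IndexNode FIG_9_8_ROOT  = { 1, { "E" }, 0,
                                         { &FIG_9_8_LEFT, &FIG_9_8_RIGHT }, { 0 } };
```

Searching for EMBRY goes right at E and stops before F, which yields block 4. Searching for CAGE goes left at E, passes BO, stops before CAM, and yields block 2.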
6. Simple Prefix B+ Tree Maintenance
6.1 Changes Localized to Single Blocks in a Sequence Set
● Examples
✔ Deleting “EMBRY” from Figure 9.8
– Only the sequence set changes (the index is unchanged)
✔ Deleting “FOLKS” from Figure 9.8
– The sequence set changes
– The separator FOLKS stays in the index ← is that a problem?
✔ Inserting “EATON” into Figure 9.9
– The sequence set changes
– Does any separator in the index set change?
Index set
Root:         E
Left node:    BO  CAM        (blocks 1, 2, 3)
Right node:   F  FOLKS       (blocks 4, 5, 6)

Sequence set
Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  ERVIN-EVANS  FABER-FOLK  FROST-GADDIS
FIGURE 9.9 The deletion of the EMBRY and FOLKS records from the sequence set leaves the index set unchanged.
6.2 Changes Involving Multiple Blocks in a Sequence Set
● Handled like B-Tree insertion/deletion
✔ Block 1 of Figure 9.9 splits
– Figure 9.10
✔ Block 2 of Figure 9.10 underflows
– Blocks 2 & 3 are concatenated
– Figure 9.11
Index set
Root:          BO  E
Left node:     AY            (blocks 1, 7)
Middle node:   CAM           (blocks 2, 3)
Right node:    F  FOLKS      (blocks 4, 5, 6)

Sequence set
Block:  1            7            2           3            4            5           6
Keys:   ADAMS-AVERY  AYERS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS
FIGURE 9.10 An insertion into block 1 causes a split and the consequent addition of block 7. The addition of a block in the sequence set requires a new separator in the index set. Insertion of the AY separator into the node containing BO and CAM causes a node split in the index set B-tree and consequent promotion of BO to the root.
Index set
Root:         E
Left node:    AY  BO         (blocks 1, 7, 2)
Right node:   F  FOLKS       (blocks 4, 5, 6)

Sequence set
Block:  1            7            2             4            5           6
Keys:   ADAMS-AVERY  AYERS-BERNE  BOLEN-DUTTON  ERVIN-EVANS  FABER-FOLK  FROST-GADDIS
FIGURE 9.11 A deletion from block 2 causes underflow and the consequent concatenation of blocks 2 and 3. After the concatenation, block 3 is no longer needed and can be placed on an avail list. Consequently, the separator CAM is no longer needed. Removing CAM from its node in the index set forces a concatenation of index set nodes, bringing BO back down from the root.
● Rules for updating the index set
✔ When a block in the sequence set splits, a new separator must be inserted into the index set.
✔ When blocks in the sequence set are concatenated, a separator must be removed from the index set.
✔ When records are redistributed between blocks in the sequence set, the value of a separator in the index set must be changed.
● How do changes in the index set relate to changes in the sequence set?
7. Index Set Block Size
● An index set block is usually the same size as a sequence set block
✔ The reasons for choosing the sequence set block size also apply to the index set block size
✔ Makes a virtual simple prefix B+ Tree easier to implement
✔ The index set and the sequence set are often kept in the same file
– Advantage: avoids seek time between separate files
8. Internal Structure of Index Set Blocks
● Requirements
✔ Support variable-length separators
– What about fixed-length separators?
✔ Support an efficient binary search
● Structure
✔ An index page consists of the separators, an index to the separators, and pointers to the lower-level blocks
● Components (Figure 9.12 ~ Figure 9.14)
✔ Separator count: needed for the binary search
✔ Total length of separators: locates the start of the index
✔ Separators: concatenation of all separators
✔ Index: offset of each separator
✔ Pointer: relative block number
● Characteristics
✔ The number of separators in a block is not fixed.
✔ The order of the B+ Tree is variable
– Because the separators are variable-length
– Split/Concatenate/Redistribute decisions become more complicated
Concatenated separators:  AsBaBroCChCraDeleEdiErrFaFle
Index to separators:      00 02 04 07 08 10 13 17 20 23 25
FIGURE 9.12 Variable-length separators and corresponding index.
Separator count:             11
Total length of separators:  28
Separators:                  AsBaBroCChCraDeleEdiErrFaFle
Index to separators:         00 02 04 07 08 10 13 17 20 23 25
Relative block numbers:      B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11
FIGURE 9.13 Structure of an index set block.

B00 As B01 Ba B02 Bro B03 C B04 Ch B05 Cra B06 Dele B07 Edi B08 Err B09 Fa B10 Fle B11
Separator subscript: As = 0, Ba = 1, Bro = 2, C = 3, Ch = 4, Cra = 5, Dele = 6, Edi = 7, Err = 8, Fa = 9, Fle = 10
FIGURE 9.14 Conceptual relationship of separators and relative block numbers.
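A binary search over the block layout of Figure 9.13 can be sketched as below. The struct IndexBlock, the helper cmp_key_sep, and the function index_block_search are hypothetical; the block is assumed to be already unpacked into memory, and relative block numbers B00 to B11 are represented as the integers 0 to 11. The length of separator i is derived from the offset of separator i + 1, or from the total length for the last one.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEPS 32   /* assumed upper bound for the sketch */

/* Hypothetical unpacked image of one index set block (Figure 9.13). */
typedef struct {
    int sep_count;          /* separator count */
    int total_len;          /* total length of separators */
    const char *seps;       /* concatenated separators */
    int offset[MAX_SEPS];   /* index to separators */
    int rbn[MAX_SEPS + 1];  /* relative block numbers */
} IndexBlock;

/* Compare key with a separator of length len that is not NUL-terminated. */
static int cmp_key_sep(const char *key, const char *sep, int len)
{
    int c = strncmp(key, sep, (size_t)len);
    if (c != 0)
        return c;
    /* The first len characters match: equal only if key ends here. */
    return key[len] == '\0' ? 0 : 1;
}

/* Return the relative block number to follow for key: the pointer that
   sits after the last separator <= key. */
int index_block_search(const IndexBlock *b, const char *key)
{
    int lo = 0, hi = b->sep_count;   /* answer lies in [lo, hi] */

    assert(b->sep_count >= 0 && b->sep_count <= MAX_SEPS);
    while (lo < hi) {
        int mid   = lo + (hi - lo) / 2;
        int start = b->offset[mid];
        int end   = (mid + 1 < b->sep_count) ? b->offset[mid + 1]
                                             : b->total_len;
        if (cmp_key_sep(key, b->seps + start, end - start) >= 0)
            lo = mid + 1;            /* separator <= key: go right */
        else
            hi = mid;
    }
    return b->rbn[lo];
}

/* The block of Figure 9.13. */
static const IndexBlock FIG_9_13 = {
    11, 28, "AsBaBroCChCraDeleEdiErrFaFle",
    { 0, 2, 4, 7, 8, 10, 13, 17, 20, 23, 25 },
    { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 }
};
```

A key such as Bat falls between the separators Ba and Bro, so the search returns block B02, which agrees with Figure 9.14.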
• Data Page
– Supports variable-length records
– Normal record: Maximum Size = 1 Page
Page contents:
Records: Record(0) Record(1) Record(2) • • • Record(cnt-1)
Free Area
Record pointers: PtrRID(cnt-1) • • • PtrRID(1) PtrRID(0)
Control fields: ThisPage, first free byte, FileID, RIDcnt, PrevPage, NextPage
• Record Header
Type (Moved, Not Moved, New Home)
Kind (Normal, Slice, Crumb)
Length
• Long Data Item
– Long Data Item = Directory + Slice + Crumb
Directory: # of bytes | # of segments | RID 1, Length 1 | • • • | RID n, Length n
• RID ( Record ID )
Volume ID | Page Address | Slot
• Index Structure
Index         B-Tree                              Hash
Description   Prefix B-Tree                       Extendible Hash
Structure     Root, Internal, Leaf                Root, Leaf
Scan          Boolean Expression, Range Scan      Boolean Expression, Search Key (Matching)
Etc           Secondary Index: Managing Long      Root: Hash Table with ShortPID
              Data Item for Long RID list
• Root & Internal Node
– Header Information: < Key Type, Key Length, Offset into Record >
– Header Information is kept in the Root Node
Page contents:
Header Information
Entries: PID 1, Key 1 | PID 2, Key 2 | • • • | PID n, Key n | PID n+1
Free Space
Offsets: offset n • • • offset 2 offset 1
Control Information
• Leaf Node
– Data Entry: < Key, Count, RID list >
– Control Information
  – File ID, Page Identifiers ( ThisPage, Next, Previous )
  – Count Information ( # of Free Bytes, # of Entries )
  – Miscellaneous Information ( type of page, uniqueness of index, … )
Page contents:
Entries: Data Entry 1 | Data Entry 2 | • • • | Data Entry n
Free Space
Offsets: offset n • • • offset 2 offset 1
Control Information
9. Loading a Simple Prefix B+ Tree
● Two ways to build a B+ Tree
✔ Perform N B+ Tree insertions
– Searching overhead
– Splitting overhead
✔ Build bottom-up, level by level, starting from the leaves
● Example
✔ Figure 9.15 ~ Figure 9.17
✔ Underflow may have to be handled
Index set block:           ALWASPBET   00 03 06
Sequence set blocks:       ACCESS-ALSO  ALWAYS-ASK  ASPECT-BEST  BETTER-CAST
Next sequence set block:   CATCH-CHECK
Next separator:            CAT
FIGURE 9.15 Formation of the first index set block as the sequence set is loaded.
Upper index level:         CAT   00 -1 -1
Lower index level:         ALWASPBET   00 03 06
                           Index block containing no separators   -1 -1 -1
Sequence set blocks:       ACCESS-ALSO  ALWAYS-ASK  ASPECT-BEST  BETTER-CAST  CATCH-CHECK
FIGURE 9.16 Simultaneous building of two index set levels as the sequence set continues to grow.
Upper index level:         CATDR   00 03 -1   • • •
Lower index level:         ALWASPBET   00 03 06
                           CLCOSDE     00 02 05
                           EFHIG       00 02 03   • • •
Sequence set blocks:       ACCESS-ALSO  ALWAYS-ASK  ASPECT-BEST  BETTER-CAST
                           CATCH-CHECK  CLASS-COPY  COST-DAMAGE  DELETE-DISK
                           DRUM-EDITOR  EFFORT-GROW  HEAD-IDEAL  IGNORE-ITEM   • • •
FIGURE 9.17 Continued growth of index set built up from the sequence set.
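The separators of Figure 9.15 can be reproduced with a short sketch of the bottom-up loading step. Everything named here is hypothetical: BlockRange, FIG_9_15_BLOCKS, shortest_sep, and pack_separators. The sketch assumes the sequence set blocks arrive in key order, that each separator fits in 32 bytes, and it only builds the concatenated separators and their offsets for one index block; writing blocks and promoting separators to a higher level are left out.

```c
#include <assert.h>
#include <string.h>

/* First and last key of one loaded sequence set block. */
typedef struct {
    const char *first;
    const char *last;
} BlockRange;

/* The first four blocks of Figure 9.15. */
static const BlockRange FIG_9_15_BLOCKS[4] = {
    { "ACCESS", "ALSO" }, { "ALWAYS", "ASK" },
    { "ASPECT", "BEST" }, { "BETTER", "CAST" }
};

/* Copy hi into out up to and including the first character that differs
   from lo. Requires lo < hi. Returns the separator length. The result is
   truncated, and then no longer a valid separator, if cap is too small. */
static int shortest_sep(const char *lo, const char *hi, char *out, int cap)
{
    int i = 0;

    assert(cap > 0 && strcmp(lo, hi) < 0);
    while (i < cap - 1) {
        out[i] = hi[i];
        if (hi[i] != lo[i]) {
            i++;
            break;
        }
        i++;
    }
    out[i] = '\0';
    return i;
}

/* Build the concatenated separators and their offsets for n_blocks
   adjacent blocks. Returns the total separator length, or -1 if concat
   (capacity cap, including the terminating NUL) is too small. */
int pack_separators(const BlockRange *blk, int n_blocks,
                    char *concat, int cap, int *offsets)
{
    int i, total = 0;

    assert(cap > 0);
    for (i = 0; i + 1 < n_blocks; i++) {
        char sep[32];
        int len = shortest_sep(blk[i].last, blk[i + 1].first,
                               sep, (int)sizeof sep);
        if (total + len >= cap)
            return -1;
        offsets[i] = total;
        memcpy(concat + total, sep, (size_t)len);
        total += len;
    }
    concat[total] = '\0';
    return total;
}
```

For the four blocks above this gives the string ALWASPBET with offsets 0, 3, 6, and the next boundary (CAST, CATCH) gives the separator CAT, as in the figure.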
● “Build bottom-up, starting from the leaf level” – evaluation
✔ Faster B+ Tree construction
– No searching/splitting overhead
✔ The desired fill factor can be chosen
✔ Physical proximity between index set and sequence set (×)
10. B+ Trees
● Difference from the Simple Prefix B+ Tree
✔ In a B+ Tree, each separator is a copy of an actual key
✔ Figure 9.15 ↔ Figure 9.18

Index set block:           ALWAYSASPECTBETTER   00 06 12
Sequence set blocks:       ACCESS-ALSO  ALWAYS-ASK  ASPECT-BEST  BETTER-CAST
Next sequence set block:   CATCH-CHECK
Next separator:            CATCH
FIGURE 9.18 Formation of the first index set block in a B+ tree without the use of shortest separators.
● B+ Tree: disadvantages and advantages
✔ Disadvantage: the fan-out is relatively smaller.
✔ Advantage: simplicity
– Fixed-length keys can be used
– Suitable when compression does not pay off.
11. B-Tree Family: Perspective
● The B-Tree family is not always the best choice.
✔ If the index is small enough, use a simple index
✔ If only random access is needed, use hashing
● Common characteristics of the B-Tree family
✔ Paged index structure
✔ Maintain height-balanced trees
✔ The trees grow from the bottom up.
✔ Greater storage efficiency ( 1-to-2, 2-to-3 )
✔ Virtual tree structures can be used
✔ Variable-length records can be supported
● B-Tree
✔ (key, info) pairs are stored in every node
– Not only in the leaf nodes
✔ Less space than B+ Tree
– But, a pointer to the data record is needed
– Separator < Key ← effect on the tree depth?
✔ Sequential search is possible through in-order traversal
– Data stored together with the index? (NO!)
● B+ Tree
✔ Leaf nodes are connected in a linked list
– Efficient sequential access is possible
– Range queries are supported
✔ Tree order increases
– Internal nodes hold no pointers to data
✔ Deletion is simpler
✔ Leaf and non-leaf nodes can share one entry format
– (Key, Child /Data pointer)
– B-Tree: (Key, Child pointer, Data pointer)
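The effect of the two entry formats on tree order can be illustrated with a little arithmetic in C. The sketch assumes 16-byte fixed-length keys, 32-bit pointers, and a 4096-byte page; the type names BTreeEntry and BPlusEntry and the function entries_per_page are hypothetical, and per-page header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_BYTES 16   /* assumed fixed key size for the sketch */

/* B-Tree entry: (Key, Child pointer, Data pointer). */
typedef struct {
    char     key[KEY_BYTES];
    uint32_t child;
    uint32_t data;
} BTreeEntry;

/* B+ Tree entry: (Key, Child/Data pointer). The single pointer is a child
   pointer in a non-leaf node and a data pointer in a leaf node. */
typedef struct {
    char     key[KEY_BYTES];
    uint32_t ptr;
} BPlusEntry;

/* Number of whole entries that fit in one page. */
size_t entries_per_page(size_t page_bytes, size_t entry_bytes)
{
    assert(entry_bytes > 0);
    return page_bytes / entry_bytes;
}
```

Because BPlusEntry carries one pointer fewer, entries_per_page(4096, sizeof(BPlusEntry)) is larger than entries_per_page(4096, sizeof(BTreeEntry)), which is the higher tree order noted on the slide.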