Oxford University Press © 2012
Data Structures Using C++ by Dr Varsha Patil
13. Indexing and Multiway Trees
Objectives
Indexing techniques
B-trees which prove invaluable for problems of external information retrieval
A class of trees called tries, which share some properties of table lookup
Important uses of trees in many search techniques
Introduction
A file is a collection of records, each record having one or more fields
The fields used to distinguish among the records are known as keys
File organization describes the way in which the records are stored in a file
File organization is concerned with representing data records on an external storage medium
File organization breaks down into two aspects:
Directory—for collection of indices
File organization—for the physical organization of records
File organization is the way records are organized on physical storage
One such organization is sequential (ordered and unordered)
In this general framework, processing a query or an update request would proceed in two steps:
The indices would be interrogated to determine the parts of the physical file to be searched
These parts of the physical file will be searched
Indexing
An index, whether it is a book index or a data file index (in computer memory), is based on basic concepts such as keys and reference fields
The index to a book provides a way to find a topic quickly
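The key and reference-field idea can be sketched roughly as follows. This is only an illustration: the names `Record`, `IndexEntry`, `buildIndex`, and `findRecord` are hypothetical, and the "file" is an in-memory vector whose positions stand in for record addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A record in the data file; its position in the vector plays the role of an address.
struct Record {
    int key;
    std::string data;
};

// One index entry: a key plus a reference field (here, the record's position).
struct IndexEntry {
    int key;
    int position;
};

// Build an index sorted by key. The data file itself stays in its original order.
std::vector<IndexEntry> buildIndex(const std::vector<Record>& file) {
    std::vector<IndexEntry> index;
    index.reserve(file.size());
    for (int i = 0; i < static_cast<int>(file.size()); ++i) {
        index.push_back({file[i].key, i});
    }
    std::sort(index.begin(), index.end(),
              [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; });
    return index;
}

// Binary-search the index and return the referenced position, or -1 if the key is absent.
int findRecord(const std::vector<IndexEntry>& index, int key) {
    auto it = std::lower_bound(index.begin(), index.end(), key,
                               [](const IndexEntry& entry, int k) { return entry.key < k; });
    if (it == index.end() || it->key != key) return -1;
    return it->position;
}
```

The index imposes an order on the keys while the records themselves are never moved.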
Cylinder-Surface Indexing
This is the simplest type of index organization.
It is useful only for the primary key index of a sequentially ordered file
In a sequentially ordered file, the physical sequence of records is ordered by the key, called the primary key
The cylinder-surface index consists of a cylinder index and several surface indexes
For each cylinder, there is a surface index. If the disk has S usable surfaces, then each surface index has S entries. The total number of surface index entries is C × S, where C is the number of cylinders.
Emp. No.  Emp. Name  Cylinder  Surface
1         Abolee     1         1
2         Anand      1         1
3         Amit       1         2
4         Amol       1         2
5         Rohit      2         1
6         Santosh    2         1
7         Saurabh    2         2
8         Shila      2         2
Let there be two surfaces and two records stored per track. The file is organized sequentially on the field ‘Emp. name’
The cylinder index is shown in the following table:
Cylinder  Highest Key Value
1         Amol
2         Shila
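A lookup through a cylinder-surface index might be sketched as below. To keep the example short it uses integer keys rather than employee names, and the type `CylinderSurfaceIndex` and function `locateTrack` are hypothetical names, not from the text. The cylinder index is searched first, then the surface index of the chosen cylinder.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// cylinderIndex[c]   : highest key stored on cylinder c
// surfaceIndex[c][s] : highest key stored on surface s of cylinder c
// Both are in ascending order because the file is sequentially ordered on the key.
struct CylinderSurfaceIndex {
    std::vector<int> cylinderIndex;
    std::vector<std::vector<int>> surfaceIndex;
};

// Return the 1-based (cylinder, surface) of the track that may hold `key`,
// or (0, 0) if the key is larger than every key in the file.
std::pair<int, int> locateTrack(const CylinderSurfaceIndex& idx, int key) {
    assert(idx.cylinderIndex.size() == idx.surfaceIndex.size());
    // Step 1: the first cylinder whose highest key is >= key.
    auto c = std::lower_bound(idx.cylinderIndex.begin(), idx.cylinderIndex.end(), key);
    if (c == idx.cylinderIndex.end()) return {0, 0};
    std::size_t ci = static_cast<std::size_t>(c - idx.cylinderIndex.begin());
    // Step 2: within that cylinder, the first surface whose highest key is >= key.
    const std::vector<int>& surfaces = idx.surfaceIndex[ci];
    auto s = std::lower_bound(surfaces.begin(), surfaces.end(), key);
    if (s == surfaces.end()) return {0, 0};
    return {static_cast<int>(ci) + 1, static_cast<int>(s - surfaces.begin()) + 1};
}
```

Only the selected track then has to be read and searched sequentially for the record.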
This method of maintaining a file and index is referred to as ISAM (indexed sequential access method)
It is the simplest file organization for single key files but not useful for multiple key files
Hashed Indexes
The operations related to hashed indexes are the same as those for hash tables
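As a minimal sketch of a hashed index, the standard `std::unordered_map` can map a key to the position of its record. The names `Employee`, `HashedIndex`, `buildHashedIndex`, and `lookup` are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Employee {
    std::string name;
    int number;
};

// The index maps a key (employee name) to the record's position in the file.
using HashedIndex = std::unordered_map<std::string, std::size_t>;

HashedIndex buildHashedIndex(const std::vector<Employee>& file) {
    HashedIndex index;
    for (std::size_t i = 0; i < file.size(); ++i) {
        index[file[i].name] = i;  // insert, exactly as in a hash table
    }
    return index;
}

// Return a pointer to the matching record, or nullptr if the key is not indexed.
const Employee* lookup(const std::vector<Employee>& file, const HashedIndex& index,
                       const std::string& name) {
    auto it = index.find(name);
    return it == index.end() ? nullptr : &file[it->second];
}
```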
Multiway Search Trees
A multiway search tree is a tree of order m, where each node has at most m children
Figure: a multiway search tree (node diagram not reproduced)
B-trees
A B-tree is a balanced M-way tree. A node of the tree contains many records or keys of records and pointers to children
To reduce disk access, the following points are applicable:
Height is kept minimum
All leaves are kept at the same level
All nodes other than leaves must have at least a minimum number of children
Definition: A B-tree of order m is an m-way tree with the following properties:
The number of keys in each internal node is one less than the number of its non-empty children, and these keys partition the keys in the children in the fashion of a search tree
All leaves are on the same level
All internal nodes except the root have at most m non-empty children and at least ⌈m/2⌉ non-empty children
The root is either a leaf node, or it has from two to m children
A leaf node contains no more than m − 1 keys
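The occupancy limits in the definition can be worked out for a given order m. The helper below is a small sketch; `BTreeBounds` and `internalNodeBounds` are illustrative names, and the restriction m ≥ 3 is an assumption made here to exclude degenerate orders.

```cpp
#include <cassert>

// Limits for an internal, non-root node of a B-tree of order m.
struct BTreeBounds {
    int minChildren;
    int maxChildren;
    int minKeys;
    int maxKeys;
};

BTreeBounds internalNodeBounds(int m) {
    assert(m >= 3);                 // assumption of this sketch
    int minChildren = (m + 1) / 2;  // integer form of the ceiling of m/2
    // A node always holds one key fewer than it has children.
    return {minChildren, m, minChildren - 1, m - 1};
}
```

For example, in a B-tree of order 5 every internal node other than the root has 3 to 5 children and therefore 2 to 4 keys.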
Node structure
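Ptr1 | Key1 | Ptr2 | Key2 | ... | Ptri | Keyi | ... | Keyn-1 | Ptrn
Ptr1 leads to the subtree of keys X with X < Key1
Ptri leads to the subtree of keys X with Keyi-1 < X < Keyi
Ptrn leads to the subtree of keys X with X > Keyn-1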
Operations on B-tree
Search a node
Insertion of a key into a B-tree
Deletion from a B-tree
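Of these operations, search is the simplest to sketch. The code below assumes integer keys and an in-memory node; `BTreeNode` and `search` are hypothetical names. Insertion and deletion, which need node splitting and merging, are not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct BTreeNode {
    std::vector<int> keys;                             // sorted: Key1 < Key2 < ...
    std::vector<std::unique_ptr<BTreeNode>> children;  // empty for a leaf, else keys.size() + 1
    bool isLeaf() const { return children.empty(); }
};

// Search for `key`, descending one level per iteration.
bool search(const BTreeNode* node, int key) {
    while (node != nullptr) {
        // Position of the first key that is not less than the search key.
        auto it = std::lower_bound(node->keys.begin(), node->keys.end(), key);
        if (it != node->keys.end() && *it == key) return true;
        if (node->isLeaf()) return false;
        assert(node->children.size() == node->keys.size() + 1);
        // All keys below child i lie between keys[i-1] and keys[i].
        std::size_t i = static_cast<std::size_t>(it - node->keys.begin());
        node = node->children[i].get();
    }
    return false;
}
```

Each iteration corresponds to one node access, so the number of disk reads is bounded by the height of the tree.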
B+ Tree
In a B-tree, the nodes contain whatever information is associated with the key as well as the key values
In a B+ tree, the data pointers are kept only in the leaf nodes; the internal nodes hold search values that guide the search
B+ Tree Structure
The structure of a B+ tree can be understood from the following points:
A B+ tree is in the form of a balanced tree where every path from the root of the tree to a leaf of the tree is of the same length
Each non-leaf node (internal node) in the tree has between ⌈n/2⌉ and n children, where n is fixed
The pointer (Ptr) can point to either a file record or a bucket of pointers, each of which points to a file record
Search time is lower in B+ trees, at the cost of some wasted space
Nodes of B+ Tree
Internal node of a B+ tree with q − 1 search values
Leaf node of a B+ tree with q − 1 search values and q − 1 data pointers
Node structure
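Ptr1 | Key1 | Ptr2 | Key2 | ... | Ptri | Keyi | ... | Keyn-1 | Ptrn
Ptr1 is a tree pointer to the subtree of keys X with X < Key1
Ptri is a tree pointer to the subtree of keys X with Keyi-1 < X < Keyi
Ptrn is a tree pointer to the subtree of keys X with X > Keyn-1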
Advantages of B+ trees over Indexed Sequential Access Method
A dynamic index structure that adjusts gracefully to inserts and deletes
A balanced tree
Leaf pages are not allocated sequentially. They are linked together through pointers (a doubly linked list)
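One benefit of linking the leaf pages is that a range query can walk from leaf to leaf without returning to the internal nodes. The sketch below assumes integer keys and uses integer record ids in place of data pointers; `LeafNode` and `rangeScan` are illustrative names, and the starting leaf is assumed to have been found by an ordinary root-to-leaf search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A B+ tree leaf page: search values, one data pointer per value, and sibling links.
struct LeafNode {
    std::vector<int> keys;       // sorted search values
    std::vector<int> recordIds;  // stand-ins for data pointers, parallel to keys
    LeafNode* next = nullptr;    // right sibling
    LeafNode* prev = nullptr;    // left sibling (doubly linked list)
};

// Collect the record ids of all keys in [lo, hi], starting at the leaf that
// would contain `lo` and following the `next` links.
std::vector<int> rangeScan(const LeafNode* start, int lo, int hi) {
    std::vector<int> result;
    for (const LeafNode* leaf = start; leaf != nullptr; leaf = leaf->next) {
        assert(leaf->keys.size() == leaf->recordIds.size());
        for (std::size_t i = 0; i < leaf->keys.size(); ++i) {
            if (leaf->keys[i] > hi) return result;  // keys are sorted: nothing further qualifies
            if (leaf->keys[i] >= lo) result.push_back(leaf->recordIds[i]);
        }
    }
    return result;
}
```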
Trie Tree
A multiway tree that branches on each successive character of a key can grow very large; one solution is to prune from the tree all the branches that do not lead to any key
The resulting tree is called a trie (short for reTRIEval and pronounced ‘try’)
The number of steps needed to search a trie is proportional to the number of characters in a key
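A minimal trie might look like the sketch below, which assumes keys made only of the lowercase letters a to z; `TrieNode` and `Trie` are illustrative names. Branches are created only when a key needs them, so no branch leads nowhere, and both operations take one step per character of the key.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

struct TrieNode {
    std::array<std::unique_ptr<TrieNode>, 26> child;  // one possible branch per letter
    bool isKey = false;                               // true if a key ends at this node
};

class Trie {
public:
    // Insert a key, creating only the branches that lead to it.
    void insert(const std::string& key) {
        TrieNode* node = &root_;
        for (char ch : key) {
            assert(ch >= 'a' && ch <= 'z');  // assumption of this sketch
            std::unique_ptr<TrieNode>& next = node->child[static_cast<std::size_t>(ch - 'a')];
            if (!next) next = std::make_unique<TrieNode>();
            node = next.get();
        }
        node->isKey = true;
    }

    // Follow one branch per character; fail as soon as a branch is missing.
    bool contains(const std::string& key) const {
        const TrieNode* node = &root_;
        for (char ch : key) {
            if (ch < 'a' || ch > 'z') return false;
            node = node->child[static_cast<std::size_t>(ch - 'a')].get();
            if (node == nullptr) return false;
        }
        return node->isKey;
    }

private:
    TrieNode root_;
};
```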
Splay Trees
Splay trees are a form of a BST. A splay tree maintains a balance without any explicit balance condition such as color
Instead, ‘splay operations’, which involve rotations, are performed within the tree every time an access is made
If we use a BST or even an AVL tree, then the records of a newly admitted patient will go to a leaf position, far from the root, and the access will be slower
Instead, we want to keep the records that are newly inserted or frequently accessed very near to the root, while the inactive records stay far off, in the leaf positions
However, we do not want to rebuild the tree into the desired shape. Instead, we need to make the tree a self-adjusting data structure that automatically changes its shape to bring the records closer to the root as they are used frequently, allowing inactive records to drift slowly down towards the leaves. Such trees are called splay trees
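The self-adjusting behaviour can be sketched with a recursive splay operation. This is one common formulation, not necessarily the one used in the book; `SplayNode`, `splay`, `splayInsert`, and `destroy` are illustrative names, and nodes are managed with raw pointers only to keep the sketch short.

```cpp
#include <cassert>

struct SplayNode {
    int key;
    SplayNode* left = nullptr;
    SplayNode* right = nullptr;
    explicit SplayNode(int k) : key(k) {}
};

SplayNode* rotateRight(SplayNode* x) {
    SplayNode* y = x->left;
    x->left = y->right;
    y->right = x;
    return y;
}

SplayNode* rotateLeft(SplayNode* x) {
    SplayNode* y = x->right;
    x->right = y->left;
    y->left = x;
    return y;
}

// Bring `key` (or the last node on its search path) to the root using rotations.
SplayNode* splay(SplayNode* root, int key) {
    if (root == nullptr || root->key == key) return root;
    if (key < root->key) {
        if (root->left == nullptr) return root;
        if (key < root->left->key) {         // left-left case
            root->left->left = splay(root->left->left, key);
            root = rotateRight(root);
        } else if (key > root->left->key) {  // left-right case
            root->left->right = splay(root->left->right, key);
            if (root->left->right != nullptr) root->left = rotateLeft(root->left);
        }
        return root->left == nullptr ? root : rotateRight(root);
    }
    if (root->right == nullptr) return root;
    if (key > root->right->key) {            // right-right case
        root->right->right = splay(root->right->right, key);
        root = rotateLeft(root);
    } else if (key < root->right->key) {     // right-left case
        root->right->left = splay(root->right->left, key);
        if (root->right->left != nullptr) root->right = rotateRight(root->right);
    }
    return root->right == nullptr ? root : rotateLeft(root);
}

// Insert `key` so that the new node becomes the root.
SplayNode* splayInsert(SplayNode* root, int key) {
    if (root == nullptr) return new SplayNode(key);
    root = splay(root, key);
    if (root->key == key) return root;  // already present
    SplayNode* node = new SplayNode(key);
    if (key < root->key) {
        node->right = root;
        node->left = root->left;
        root->left = nullptr;
    } else {
        node->left = root;
        node->right = root->right;
        root->right = nullptr;
    }
    return node;
}

void destroy(SplayNode* root) {
    if (root == nullptr) return;
    destroy(root->left);
    destroy(root->right);
    delete root;
}
```

A newly inserted record ends up at the root, and any record that is accessed through `splay` is moved to the root as well, so active records stay near the top.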
Red-black Trees
A red-black tree is a BST with one extra bit of storage per node: its colour, which can either be red or black
Properties of red-black trees
Every node is either red or black
All the external nodes (leaf nodes) are black
The rank in a tree goes from zero up to the maximum rank, which occurs at the root. The ranks of two consecutive nodes differ by at most 1. Each leaf node has rank 0
If a node is red, then both its children are black. In other words, consecutive red nodes are disallowed. This means every red node is followed by a black node; on the other hand, a black node may be followed by a black or a red node
This implies that at most 50% of the nodes on any path from an external node to the root are red
The number of black nodes on any path from, but not including, the node x to a leaf is called the black height of the node x, denoted as bh(x)
Every simple path from the root to a leaf contains the same number of black nodes
In addition, every simple path from a node to a descendant leaf contains the same number of black nodes
If a black node has a rank r, then its parent has the rank r + 1
If a red node has a rank r, then its parent will have the rank r as well
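Two of these properties, no consecutive red nodes and equal black counts on every path, can be checked with a short recursive function. The sketch below treats null links as the black external nodes; `RBNode`, `blackCount`, and `satisfiesRedBlackProperties` are illustrative names, and the rank-based properties are not checked.

```cpp
#include <cassert>

enum class Colour { Red, Black };

struct RBNode {
    int key;
    Colour colour;
    RBNode* left = nullptr;
    RBNode* right = nullptr;
    RBNode(int k, Colour c) : key(k), colour(c) {}
};

bool isRed(const RBNode* node) {
    return node != nullptr && node->colour == Colour::Red;  // null links count as black
}

// Return the number of black nodes on every path from `node` down to a null link
// (counting `node` itself), or -1 if a property is violated in this subtree.
int blackCount(const RBNode* node) {
    if (node == nullptr) return 0;
    int left = blackCount(node->left);
    int right = blackCount(node->right);
    if (left < 0 || right < 0 || left != right) return -1;  // unequal black counts
    if (node->colour == Colour::Red) {
        if (isRed(node->left) || isRed(node->right)) return -1;  // consecutive red nodes
        return left;
    }
    return left + 1;
}

bool satisfiesRedBlackProperties(const RBNode* root) {
    return blackCount(root) >= 0;
}
```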
KD-Trees
A KD-tree is a data structure used in computer science for orthogonal range searching, for instance, to find the set of points that fall into a given rectangle
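A two-dimensional version of such a rectangle query can be sketched as follows. The construction by median splitting is one common choice and is an assumption here, as are the names `Point`, `KDNode`, `build`, and `rangeSearch`. The tree alternates between splitting on x and on y at successive depths.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Point {
    int x;
    int y;
};

struct KDNode {
    Point p;
    std::unique_ptr<KDNode> left;
    std::unique_ptr<KDNode> right;
};

// Build a 2-D tree by splitting on the median, alternating x (even depth) and y (odd depth).
std::unique_ptr<KDNode> build(std::vector<Point> pts, int depth = 0) {
    if (pts.empty()) return nullptr;
    bool byX = depth % 2 == 0;
    auto mid = pts.begin() + static_cast<std::ptrdiff_t>(pts.size() / 2);
    std::nth_element(pts.begin(), mid, pts.end(), [byX](const Point& a, const Point& b) {
        return byX ? a.x < b.x : a.y < b.y;
    });
    auto node = std::make_unique<KDNode>();
    node->p = *mid;
    node->left = build(std::vector<Point>(pts.begin(), mid), depth + 1);
    node->right = build(std::vector<Point>(mid + 1, pts.end()), depth + 1);
    return node;
}

// Append to `out` every point inside the rectangle [lo.x, hi.x] x [lo.y, hi.y].
void rangeSearch(const KDNode* node, Point lo, Point hi, int depth, std::vector<Point>& out) {
    if (node == nullptr) return;
    const Point& p = node->p;
    if (lo.x <= p.x && p.x <= hi.x && lo.y <= p.y && p.y <= hi.y) out.push_back(p);
    bool byX = depth % 2 == 0;
    int split = byX ? p.x : p.y;
    int low = byX ? lo.x : lo.y;
    int high = byX ? hi.x : hi.y;
    // Descend only into the sides of the splitting line that the rectangle overlaps.
    if (low <= split) rangeSearch(node->left.get(), lo, hi, depth + 1, out);
    if (split <= high) rangeSearch(node->right.get(), lo, hi, depth + 1, out);
}
```

Subtrees that lie entirely on the wrong side of a splitting line are skipped, which is what makes the query faster than testing every point.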
AA Tree
An AA tree is a balanced BST with the following properties:
Every node is colored either red or black
The root is black
If a node is red, both of its children are black
Every path from a node to a null reference has the same number of black nodes
Left children may not be red
Advantages of AA Trees
They eliminate half the restructuring cases
They simplify deletion by removing an annoying case:
If an internal node has only one child, that child must be a red right child
We can always replace a node with the smallest node in its right subtree; it will either be a leaf node or have a red child
An AA tree, being a balanced BST, supports efficient operations, since most operations only have to traverse one or two root-to-leaf paths
Representing Balance Information in AA Tree
In each node of an AA tree, we store a level. The level is defined by the following rules:
If a node is a leaf, its level is one
If a node is red, its level is the level of its parent
If a node is black, its level is one less than the level of its parent
Here, the level is the number of left links on the path to a null reference
Links in an AA tree
A horizontal link is a connection between a node and a child with equal levels
The properties of such horizontal links are as follows:
Horizontal links are right references
There cannot be two consecutive horizontal links
Nodes at level two or higher must have two children
If a node has no right horizontal link, its two children are at the same level
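The horizontal-link rules are usually restored after an insertion by two small rotations, commonly called skew and split. The sketch below follows that general scheme with hypothetical names (`AANode`, `skew`, `split`, `aaInsert`, `destroy`); deletion is not shown, and raw pointers are used only for brevity.

```cpp
#include <cassert>

struct AANode {
    int key;
    int level = 1;  // a new node is a leaf, so its level is one
    AANode* left = nullptr;
    AANode* right = nullptr;
    explicit AANode(int k) : key(k) {}
};

// Skew: a left child at the same level is a left horizontal link, which is not
// allowed; a right rotation turns it into a right horizontal link.
AANode* skew(AANode* t) {
    if (t != nullptr && t->left != nullptr && t->left->level == t->level) {
        AANode* l = t->left;
        t->left = l->right;
        l->right = t;
        return l;
    }
    return t;
}

// Split: two consecutive right horizontal links are not allowed; a left rotation
// lifts the middle node and raises its level by one.
AANode* split(AANode* t) {
    if (t != nullptr && t->right != nullptr && t->right->right != nullptr &&
        t->right->right->level == t->level) {
        AANode* r = t->right;
        t->right = r->left;
        r->left = t;
        ++r->level;
        return r;
    }
    return t;
}

// Ordinary BST insertion, followed by skew and split on the way back up.
AANode* aaInsert(AANode* t, int key) {
    if (t == nullptr) return new AANode(key);
    if (key < t->key) {
        t->left = aaInsert(t->left, key);
    } else if (key > t->key) {
        t->right = aaInsert(t->right, key);
    }
    return split(skew(t));
}

void destroy(AANode* t) {
    if (t == nullptr) return;
    destroy(t->left);
    destroy(t->right);
    delete t;
}
```

Inserting 1, 2, 3 in order first builds a chain of right links at level one; the split then makes 2 the root at level two with 1 and 3 as its children.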
Summary
A node of a BST has only one key value entry stored in it. A multiway tree has many key values stored in each node and thus each node may have multiple subtrees
Different indexing techniques are used to locate a record quickly: a hashed index can find a record in O(1) expected time, while a tree index takes time proportional to the height of the tree. The index is a pair of key value and address. It is an indirect addressing that imposes order on a file without rearranging the file
Indexing techniques are classified as hashed indexing, tree indexing, B-tree, B+ tree, and trie
Splay trees are self-adjusting trees