akhilesh-chittora
8/3/2019 About Trees
Binary Search Tree
In computer science, a binary search tree (BST) is a node-based binary tree data structure which has the following properties:
The left subtree of a node contains only nodes with keys less than the node's key. The right subtree of a node contains only nodes with keys greater than the node's key.
Both the left and right subtrees must also be binary search trees.
From the above properties it naturally follows that:
Each node (item in the tree) has a distinct key.
Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes, nodes are compared according to their keys rather than any part of their associated records.
The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as in-order traversal can be very efficient.
Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associative arrays.
A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13
A binary search tree is a binary tree in which each internal node x stores an element such that the elements stored in the left subtree of x are less than or equal to x and the elements stored in the right subtree of x are greater than or equal to x. This is called the binary-search-tree property.
The basic operations on a binary search tree take time proportional to the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of n nodes, however, the same operations take Θ(n) worst-case time.
Binary search tree
First of all, a binary search tree (BST) is a dynamic data structure, which means that its size is limited only by the amount of free memory in the operating system and the number of elements may vary during the program run. The main advantage of binary search trees is rapid search, while addition is quite cheap. Let us see a more formal definition of BST.
Binary search tree is a data structure, which meets the following requirements:
it is a binary tree;
each node contains a value;
a total order is defined on these values (every two values can be compared with each other);
the left subtree of a node contains only values less than the node's value;
the right subtree of a node contains only values greater than the node's value.
Notice that the definition above doesn't allow duplicates.
Example of a binary search tree
What are binary search trees used for?
A binary search tree is used to construct a map data structure. In practice, data can often be associated with some unique key. For instance, in the phone book such a key is a telephone number. Storing such data in a binary search tree allows looking up a record by key faster than if it were stored in an unordered list. Also, a BST can be utilized to construct a set data structure, which allows storing an unordered collection of unique values and performing operations on such collections.
Performance of a binary search tree depends on its height. In order to keep the tree balanced and minimize its height, the idea of binary search trees was advanced in balanced search trees (AVL trees, Red-Black trees, Splay trees). Here we will discuss the basic ideas lying at the foundation of binary search trees.
Binary search tree. Internal representation
Like any other dynamic data structure, a BST requires storing some additional auxiliary data in order to keep its structure. Each node of a binary tree contains the following information:
a value (user's data);
a link to the left child (auxiliary data);
a link to the right child (auxiliary data).
Depending on the size of user data, memory overhead may vary, but in general it is quite reasonable. In some implementations, a node may store a link to the parent, but this depends on the algorithm the programmer wants to apply to the BST. For basic operations like addition, removal and search, a link to the parent is not necessary. It is needed in order to implement iterators.
With a view to internal representation, the sample from the overview changes:
Leaf nodes have links to the children, but they don't have children. In a programming language this means that the corresponding links are set to NULL.
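The node layout described above can be sketched in a few lines of Python. This is only an illustration: the class name BSTNode is borrowed from the code-snippets section later in the text, the field names are assumptions, and None stands in for NULL.

```python
class BSTNode:
    """One node of a binary search tree: a value plus two child links."""

    def __init__(self, value):
        self.value = value  # user's data
        self.left = None    # link to the left child (None plays the role of NULL)
        self.right = None   # link to the right child


# A leaf still carries both links; they are simply empty.
leaf = BSTNode(7)
print(leaf.left is None and leaf.right is None)
```

A parent link could be added as a fourth field if iterators are needed, at the cost of one more reference per node.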
Binary search tree. Adding a value
Adding a value to BST can be divided into two stages:
search for a place to put a new element;
insert the new element to this place.
Let us see these stages in more detail.
Search for a place
At this stage an algorithm should follow the binary search tree property. If a new value is less than the current node's value, go to the left subtree, else go to the right subtree. Following this simple rule, the algorithm reaches a node which has no left or right subtree. By the moment a place for insertion is found, we can say for sure that the new value has no duplicate in the tree. Initially, a new node has no children, so it is a leaf. Let us see it in the picture. Gray circles indicate possible places for a new node.
Now, let's go down to the algorithm itself. Here and in almost every operation on a BST, recursion is utilized. Starting from the root,
1. check whether the value in the current node and the new value are equal. If so, a duplicate is found. Otherwise,
2. if the new value is less than the node's value:
   o if the current node has no left child, the place for insertion has been found;
   o otherwise, handle the left child with the same algorithm.
3. if the new value is greater than the node's value:
   o if the current node has no right child, the place for insertion has been found;
   o otherwise, handle the right child with the same algorithm.
Just before code snippets, let us have a look at an example demonstrating a case of insertion in the binary search tree.
Example
Insert 4 to the tree, shown above.
Code snippets
The only difference between the algorithm above and the real routine is that first we should check whether a root exists. If not, just create it and don't run the common algorithm for this special case. This can be done in the BinarySearchTree class. The principal algorithm is implemented in the BSTNode class.
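The original snippets are not reproduced here, so the following is a minimal Python sketch of the split just described. The class names BinarySearchTree and BSTNode come from the text; the method name add, its boolean return value and the sample values are assumptions made for illustration.

```python
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

    def add(self, value):
        """Return True if value was inserted, False if it is a duplicate."""
        if value == self.value:
            return False                    # step 1: duplicate found
        if value < self.value:              # step 2: go left
            if self.left is None:
                self.left = BSTNode(value)  # place for insertion found
                return True
            return self.left.add(value)
        if self.right is None:              # step 3: go right
            self.right = BSTNode(value)
            return True
        return self.right.add(value)


class BinarySearchTree:
    def __init__(self):
        self.root = None

    def add(self, value):
        # Special case: an empty tree has no root to start from.
        if self.root is None:
            self.root = BSTNode(value)
            return True
        return self.root.add(value)


tree = BinarySearchTree()
for v in (5, 2, 18, -4, 3):   # illustrative values, not taken from the lost figure
    tree.add(v)
inserted = tree.add(4)        # 4 < 5, 4 > 2, 4 > 3: becomes the right child of 3
duplicate = tree.add(4)       # second attempt is rejected
print(inserted, duplicate)
```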
Binary search tree. Lookup operation
Searching for a value in a BST is very similar to the add operation. The search algorithm traverses the tree "in-depth", choosing the appropriate way to go by following the binary search tree property, and compares the value of each visited node with the one we are looking for. The algorithm stops in two cases:
a node with necessary value is found;
algorithm has no way to go.
Search algorithm in detail
Now, let's see a more detailed description of the search algorithm. Like the add operation, and almost every operation on a BST, the search algorithm utilizes recursion. Starting from the root,
1. check whether the value in the current node and the searched value are equal. If so, the value is found. Otherwise,
2. if the searched value is less than the node's value:
   o if the current node has no left child, the searched value doesn't exist in the BST;
   o otherwise, handle the left child with the same algorithm.
3. if the searched value is greater than the node's value:
   o if the current node has no right child, the searched value doesn't exist in the BST;
   o otherwise, handle the right child with the same algorithm.
Just before code snippets, let us have a look at an example demonstrating searching for a value in the binary search tree.
Example
Search for 3 in the tree, shown above.
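As a rough illustration of the lookup steps, here is a small recursive sketch in Python. The function name search, the node constructor and the sample tree are assumptions; the original figure is not available, so the tree below is only an example that happens to contain 3.

```python
class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def search(node, value):
    """Return True if value is stored in the subtree rooted at node."""
    if node is None:
        return False                      # no way to go: value is absent
    if value == node.value:
        return True                       # node with the necessary value found
    if value < node.value:
        return search(node.left, value)   # continue in the left subtree
    return search(node.right, value)      # continue in the right subtree


# Hypothetical sample tree:      5
#                              /   \
#                             2     18
#                            / \
#                          -4   3
root = BSTNode(5, BSTNode(2, BSTNode(-4), BSTNode(3)), BSTNode(18))
found = search(root, 3)
missing = search(root, 4)
print(found, missing)
```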
Binary search tree. Removing a node
The remove operation on a binary search tree is more complicated than add and search. Basically, it can be divided into two stages:
search for a node to remove;
if the node is found, run remove algorithm.
Remove algorithm in detail
Now, let's see a more detailed description of the remove algorithm. The first stage is identical to the algorithm for lookup, except we should track the parent of the current node. The second part is more tricky. There are three cases, which are described below.
1. Node to be removed has no children.
This case is quite simple. Algorithm sets corresponding link of the parent to NULL and disposes the node.
Example. Remove -4 from a BST.
2. Node to be removed has one child.
In this case, the node is cut from the tree and the algorithm links its single child (with its subtree) directly to the parent of the removed node.
Example. Remove 18 from a BST.
3. Node to be removed has two children.
This is the most complex case. To solve it, let us see one useful BST property first. We are going to use the idea that the same set of values may be represented as different binary search trees. For example, those BSTs:
contain the same values {5, 19, 21, 25}. To transform the first tree into the second one, we can do the following:
o choose the minimum element from the right subtree (19 in the example);
o replace 5 by 19;
o hang 5 as a left child.
The same approach can be utilized to remove a node which has two children:
o find the minimum value in the right subtree;
o replace the value of the node to be removed with the found minimum. Now the right subtree contains a duplicate!
o apply remove to the right subtree to remove the duplicate.
Notice that the node with the minimum value has no left child and, therefore, its removal may result in the first or second case only.
Example. Remove 12 from a BST.
Find minimum element in the right subtree of the node to be removed. In current example it is 19.
Replace 12 with 19. Notice that only values are replaced, not nodes. Now we have two nodes with the same value.
Remove 19 from the right subtree.
AVL trees
An AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
The AVL tree is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm for the organization of information."
The balance factor of a node is the height of its right subtree minus the height of its left subtree, and a node with balance factor 1, 0, or -1 is considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance factor is either stored directly at each node or computed from the heights of the subtrees.
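To make the definition concrete, here is a small Python sketch that computes the balance factor from subtree heights (the second of the two options just mentioned). The names Node, height and balance_factor are assumptions, as is the convention that an empty subtree has height -1.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height counted in edges; an empty subtree is assumed to have height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    """Height of the right subtree minus height of the left subtree."""
    return height(node.right) - height(node.left)


def is_balanced(node):
    return balance_factor(node) in (-1, 0, 1)


chain = Node(1, right=Node(2, right=Node(3)))   # 1 -> 2 -> 3, leaning right
bushy = Node(2, Node(1), Node(3))               # same keys, balanced shape
print(balance_factor(chain), is_balanced(chain))  # 2 False
print(balance_factor(bushy), is_balanced(bushy))  # 0 True
```

Recomputing heights on every call is linear in the subtree size; real implementations usually cache the height or the balance factor in each node.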
AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also take O(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications. The AVL tree balancing algorithm appears in many computer science curricula.
B-Trees: Balanced Tree Data Structures
Tree structures support various basic dynamic set operations including Search, Predecessor, Successor, Minimum, Maximum, Insert, and Delete in time proportional to the height of the tree. Ideally, a tree will be balanced and the height will be log n where n is the number of nodes in the tree. To ensure that the height of the tree is as small as possible and therefore provide the best running time, a balanced tree structure like a red-black tree, AVL tree, or b-tree must be used.
When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primary storage (RAM). Instead, a relatively small portion of the data structure is maintained in primary storage, and additional data is read from secondary storage as needed. Unfortunately, a magnetic disk, the most common form of secondary storage, is significantly slower than random access memory (RAM). In fact, the system often spends more time retrieving data than actually processing data.
B-trees are balanced trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time consuming) operations, a b-tree tries to minimize the number of disk accesses. For example, a b-tree with a height of 2 and a branching factor of 1001 can store over one billion keys but requires at most two disk accesses to search for any node (Cormen 384).
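The "over one billion keys" figure can be checked with a short calculation. The sketch below assumes every node is full, so each node has 1001 children and therefore 1000 keys, and counts the root as level 0; the function name max_keys is hypothetical.

```python
def max_keys(height, branching_factor):
    """Keys held by a tree whose nodes all have branching_factor children
    (and so branching_factor - 1 keys each); the root is at level 0."""
    keys_per_node = branching_factor - 1
    nodes = sum(branching_factor ** level for level in range(height + 1))
    return nodes * keys_per_node


# 1 + 1001 + 1001**2 = 1,003,003 nodes, each with 1000 keys.
print(max_keys(2, 1001))  # 1003003000
```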
The Structure of B-Trees
Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root for a subtree containing all keys greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.
Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time consuming).
Height of B-Trees
For n greater than or equal to one, the height h of an n-key b-tree T with a minimum degree t greater than or equal to 2 satisfies
h ≤ log_t((n + 1) / 2)
For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest pages 383-384.
The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanced tree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search tends to be smaller than required by other tree structures. Although this does not affect the asymptotic worst case height, b-trees tend to have smaller heights than other trees with the same asymptotic height.
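The effect of the logarithm base can be illustrated numerically. The sketch below relies on the standard fact behind the bound h ≤ log_t((n + 1) / 2): a b-tree of height h and minimum degree t holds at least 2t^h - 1 keys. It finds the largest height that n keys allow using integer arithmetic only, to avoid floating-point rounding. The function name max_height and the sample sizes are assumptions.

```python
def max_height(n, t):
    """Largest height h of a b-tree with n >= 1 keys and minimum degree t >= 2,
    i.e. the largest h such that 2 * t**h - 1 <= n."""
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:
        h += 1
    return h


n = 1_000_000
print(max_height(n, 2))    # 18
print(max_height(n, 500))  # 2
```

Both results are O(log n), but with a minimum degree of 500 a search over a million keys touches at most three levels, compared with nineteen for a minimum degree of 2.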
Operations on B-Trees
The algorithms for the search, create, and insert operations are shown below. Note that these algorithms are single pass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses and the nodes are usually stored on disk, this single-pass approach will reduce the number of node visits and thus the number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.
Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), all references to a given node must be preceded by a read operation denoted by Disk-Read. Similarly, once a node is modified and it is no longer needed, it must be written out to secondary storage with a write operation denoted by Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a corresponding Disk-Read operation. New nodes are created and assigned storage with the Allocate-Node call. The implementation details of the Disk-Read, Disk-Write, and Allocate-Node functions are operating system and implementation dependent.
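As a rough, in-memory illustration of the single-pass search, here is a Python sketch. It is not the original algorithm listing: BTreeNode, btree_search and the sample tree are assumptions, and disk_read is a stand-in for Disk-Read that simply returns the node it is given.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # keys in non-decreasing order
        self.children = children or []  # empty for a leaf, else len(keys) + 1 entries

    @property
    def is_leaf(self):
        return not self.children


def disk_read(node):
    """Hypothetical stand-in for Disk-Read: nodes already live in memory here."""
    return node


def btree_search(node, key):
    """Walk down from node without backtracking.
    Return (node, index) where key is stored, or None if it is absent."""
    while True:
        i = bisect_left(node.keys, key)      # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if node.is_leaf:
            return None
        node = disk_read(node.children[i])   # one read per level


# Hypothetical two-level tree.
root = BTreeNode([10, 20],
                 [BTreeNode([3, 7]), BTreeNode([12, 15]), BTreeNode([21, 30])])
hit = btree_search(root, 21)
miss = btree_search(root, 13)
print(hit[0].keys, hit[1], miss)
```

The loop reads one node per level and never returns to a parent, which is the property the text calls single pass.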
Examples
Sample B-Tree
Searching a B-Tree for Key 21
Inserting Key 33 into a B-Tree (w/ Split)
Binary Heap
A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:
The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
The heap property: each node is greater than or equal to each of its children according to some comparison predicate which is fixed for the entire data structure.
"Greater than or equal to" means according to whatever comparison function is chosen to sort the heap, not necessarily "greater than or equal to" in the mathematical sense (since the quantities are not always numerical). Heaps where the comparison function is mathematical "greater than or equal to" are called max-heaps; those where the comparison function is mathematical "less than or equal to" are called min-heaps. Conventionally, min-heaps are used, since they are readily applicable for use in priority queues.
Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent can be freely interchanged, as long as this does not violate the shape and heap properties (compare with treap).
The binary heap is a special case of the d-ary heap in which d = 2.
It is possible to modify the heap structure to allow extraction of both the smallest and largest element in O(log n) time.[1] To do this, the rows alternate between min heap and max heap. The algorithms are roughly the same, but, in each step, one must consider the alternating rows with alternating comparisons. The performance is roughly the same as a normal single direction heap. This idea can be generalised to a min-max-median heap.
Adding to the heap
If we have a heap, and we add an element, we can perform an operation known as up-heap, bubble-up, percolate-up, sift-up, or heapify-up in order to restore the heap property. We can do this in O(log n) time, using a binary heap, by following this algorithm:
1. Add the element on the bottom level of the heap.
2. Compare the added element with its parent; if they are in the correct order, stop.
3. If not, swap the element with its parent and return to the previous step.
We do this at most once for each level in the tree (the height of the tree), which is O(log n). However, since approximately 50% of the elements are leaves and 75% are in the bottom two levels, it is likely that the new element to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in average constant time, O(1).
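The three steps can be sketched for a max-heap stored in a Python list. This is an illustration under assumptions: indices are 0-based, so the parent of index i is (i - 1) // 2; the function names sift_up and heap_push are hypothetical; and the sample array is a layout chosen to be consistent with the numbers used in this section's worked example.

```python
def sift_up(heap, i):
    """Move heap[i] up until its parent is at least as large (max-heap)."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            break                                     # step 2: correct order, stop
        heap[parent], heap[i] = heap[i], heap[parent]  # step 3: swap with parent
        i = parent


def heap_push(heap, item):
    heap.append(item)             # step 1: add on the bottom level
    sift_up(heap, len(heap) - 1)


heap = [11, 5, 8, 3, 4]   # assumed array form of the example max-heap
heap_push(heap, 15)       # 15 swaps with 8, then with 11
print(heap)               # [15, 5, 11, 3, 4, 8]
```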
Say we have a max-heap
and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However, the heap property is violated since 15 is greater than 8, so we need to swap the 15 and the 8. So, we have the heap looking as follows after the first swap:
However the heap property is still violated since 15 is greater than 11, so we need to swap again:
which is a valid max-heap. There is no need to check the children after this. Before we placed 15 on X, the heap was valid, meaning 11 is greater than 5. If 15 is greater than 11, and 11 is greater than 5, then 15 must be greater than 5.
Deleting the root from the heap
The procedure for deleting the root from the heap (effectively extracting the maximum element in a max-heap or the minimum element in a min-heap) starts by replacing it with the last element on the last level. So, if we have the same max-heap as before, we remove the 11 and replace it with the 4.
Now the heap property is violated since 8 is greater than 4. The operation that restores the property is called down-heap, bubble-down, percolate-down, sift-down, or heapify-down. In this case, swapping the two elements 4 and 8 is enough to restore the heap property and we need not swap elements further:
The downward-moving node is swapped with the larger of its children in a max-heap (in a min-heap it would be swapped with its smaller child), until it satisfies the heap property in its new position. This functionality is achieved by the Max-Heapify function as defined below in pseudocode for an array-backed heap A. Note that "A" is indexed starting at 1, not 0 as is common in many programming languages.
Max-Heapify[2] (A, i):
    left ← 2i
    right ← 2i + 1
    largest ← i
    if left ≤ heap-length[A] and A[left] > A[i] then:
        largest ← left
    if right ≤ heap-length[A] and A[right] > A[largest] then:
        largest ← right
    if largest ≠ i then:
        swap A[i] ↔ A[largest]
        Max-Heapify(A, largest)
Note that the down-heap operation (without the preceding swap) can be used in general to modify the value of the root, even when an element is not being deleted.
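For comparison with the 1-indexed pseudocode, here is a runnable Python sketch of the same idea. It differs in two stated ways: indices are 0-based, so the children of i are 2i + 1 and 2i + 2, and a loop replaces the recursive call. The function name extract_max and the sample array are assumptions chosen to match the values in the worked example.

```python
def max_heapify(heap, i):
    """Sift heap[i] down until neither child is larger (max-heap, 0-based)."""
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return                                   # heap property holds here
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest                                  # continue from the child


def extract_max(heap):
    """Remove and return the root of a non-empty max-heap."""
    top = heap[0]
    last = heap.pop()          # last element on the last level
    if heap:
        heap[0] = last         # move it to the root, then restore the property
        max_heapify(heap, 0)
    return top


heap = [11, 5, 8, 3, 4]        # assumed array form of the example max-heap
top = extract_max(heap)        # 4 replaces 11, then swaps with 8
print(top, heap)               # 11 [8, 5, 4, 3]
```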