About Trees

Embed Size (px)

Citation preview

  • 8/3/2019 About Trees

    1/19

    Binary Search Tree

    In computer science, a binary search tree (BST) is a node based binary tree data structure which has the following properties:

    The left subtree of a node contains only nodes with keys less than the node's key. The right subtree of a node contains only nodes with keys greater than the node's key.

    Both the left and right subtrees must also be binary search trees.

    From the above properties it naturally follows that:

    Each node (item in the tree) has a distinct key.

    Generally, the information represented by each node is a record rather than a single data element. However, for sequencing purposes,

    nodes are compared according to their keys rather than any part of their their associated records.

    The major advantage of binary search trees over other data structures is that the related sorting algorithms and search algorithms such as

    order traversal can be very efficient.

    Binary search trees are a fundamental data structure used to construct more abstract data structures such as sets, multisets, and associa

    arrays.

    A binary search tree of size 9 and depth 3, with root 8 and leaves 1, 4, 7 and 13

    Binary Search tree is a binary tree in which each internal nodex stores an element such that theelement stored in the left subtree ofx are less than or equal tox and elements stored in the right subtr

    ofx are greater than or equal tox. This is called binary-search-tree property.

    http://en.wikipedia.org/wiki/File:Binary_search_tree.svg
  • 8/3/2019 About Trees

    2/19

    The basic operations on a binary search tree take time proportional to the height of the tree. For a

    complete binary tree with node n, such operations runs in (lg n) worst-case time. If the tree is a

    linear chain of n nodes, however, the same operations takes (n) worst-case time.

    Binary search tree

    First of all, binary search tree (BST) is a dynamic data structure, which means, that its size is only limited

    amount of free memory in the operating system and number of elements may vary during the program run. Madvantage of binary search trees is rapid search, while addition is quite cheap. Let us see more formal definiti

    of BST.

    Binary search tree is a data structure, which meets the following requirements:

    it is abinary tree; each node contains a value; a total order is defined on these values (every two values can be comparedwith each other); left subtree of a node contains only values lesser, than the node's value; right subtree of a node contains only values greater, than the node's value.

    Notice, that definition above doesn't allow duplicates.

    Example of a binary search tree

    What for binary search trees are used?

    Binary search tree is used to construct map data structure. In practice, data can be often associated with so

    unique key. For instance, in the phone book such a key is a telephone number. Storing such a data in bin

    search tree allows to look up for the record by key faster, than if it was stored in unordered list. Also, BST can

    http://www.algolist.net/Data_structures/Binary_treehttp://www.algolist.net/Data_structures/Binary_treehttp://www.algolist.net/Data_structures/Binary_treehttp://www.algolist.net/Data_structures/Dictionary_(ADT)http://www.algolist.net/Data_structures/Dictionary_(ADT)http://www.algolist.net/Data_structures/Dictionary_(ADT)http://www.algolist.net/Data_structures/Binary_tree
  • 8/3/2019 About Trees

    3/19

    utilized to construct set data structure, which allows to store an unordered collection of unique values and ma

    operations with such collections.

    Performance of a binary search tree depends of its height. In order to keep tree balanced and minimize its heig

    the idea of binary search trees was advanced in balanced search trees (AVL trees, Red-Black trees, Splay tree

    Here we will discuss the basic ideas, laying in the foundation of binary search trees.

    Binary search tree. Internal representation

    Like any other dynamic data structure, BST requires storing of some additional auxiliary data, in order to keep

    structure. Each node of binary tree contains the following information:

    a value (user's data); a link to the left child (auxiliary data); a link to the right child (auxiliary data).

    Depending on the size of user data, memory overhead may vary, but in general it is quite reasonable. In someimplementations, node may store a link to the parent, but it depends on algorithm, programmer want to apply to

    BST. For basic operations, like addition, removal and search a link to the parent is not necessary. It is needed in

    order to implement iterators.

    With a view to internal representation, the sample from the overview changes:

    Leaf nodes have links to the children, but they don't have children. In a programming language it means, that

    corresponding links are set toNULL.

    Binary search tree. Adding a value

    Adding a value to BST can be divided into two stages:

    search for a place to put a new element;

  • 8/3/2019 About Trees

    4/19

    insert the new element to this place.

    Let us see these stages in more detail.

    Search for a place

    At this stage analgorithm should follow binary search tree property. If a new value is less, than the current node's

    value, go to the left subtree, else go to the right subtree. Following this simple rule, the algorithm reaches a node,

    which has no left or right subtree. By the moment a place for insertion is found, we can say for sure, that a new vahas no duplicate in the tree. Initially, a new node has no children, so it is a leaf. Let us see it at the picture. Gray

    circles indicate possible places for a new node.

    Now, let's go down to algorithm itself. Here and in almost every operation on BST recursion is utilized. Startingfrom the root,

    1. check, whether value in current node and a new value are equal. If so, duplicate is found. Otherwise,

    2. if a new value is less, than the node's value:o if a current node has no left child, place for insertion has been found;o otherwise, handle the left child with the same algorithm.

    3. if a new value is greater, than the node's value:o if a current node has no right child, place for insertion has been found;o otherwise, handle the right child with the same algorithm.

    Just before code snippets, let us have a look on the example, demonstrating a case of insertion in the binary search tree.

    Example

    Insert 4 to the tree, shown above.

  • 8/3/2019 About Trees

    5/19

  • 8/3/2019 About Trees

    6/19

    Code snippets

    The only the difference, between the algorithm above and the real routine is that first we should check, if a rootexists. If not, just create it and don't run a common algorithm for this special case. This can be done in the

    BinarySearchTree class. Principal algorithm is implemented in theBSTNode class.

    Binary search tree. Lookup operation

    Searching for a value in a BST is very similar to add operation. Search algorithm traverses the tree "in-depth",

    choosing appropriate way to go, following binary search tree property and compares value of each visited node withe one, we are looking for. Algorithm stops in two cases:

    a node with necessary value is found;

    algorithm has no way to go.

  • 8/3/2019 About Trees

    7/19

    Search algorithm in detail

    Now, let's see more detailed description of the search algorithm. Like an add operation, and almost every operatioon BST, search algorithm utilizes recursion. Starting from the root,

    1. check, whether value in current node and searched value are equal. If so,value is found. Otherwise,2. if searched value is less, than the node's value:

    o if current node has no left child, searched value doesn't exist in the BST;o otherwise, handle the left child with the same algorithm.

    3. if a new value is greater, than the node's value:o if current node has no right child, searched value doesn't exist in the BST;o otherwise, handle the right child with the same algorithm.

    Just before code snippets, let us have a look on the example, demonstrating searching for a value in the binary search tree

    Example

    Search for 3 in the tree, shown above.

  • 8/3/2019 About Trees

    8/19

    Binary search tree. Removing a node

    Remove operation on binary search tree is more complicated, than add and search. Basically, in can be divided into

    two stages:

  • 8/3/2019 About Trees

    9/19

    search for a node to remove;

    if the node is found, run remove algorithm.

    Remove algorithm in detail

    Now, let's see more detailed description of a remove algorithm. First stage is identical toalgorithm for lookup,except we should track the parent of the current node. Second part is more tricky. There are three cases, which are

    described below.

    1. Node to be removed has no children.

    This case is quite simple. Algorithm sets corresponding link of the parent to NULL and disposes the node.

    Example. Remove -4 from a BST.

    2. Node to be removed has one child.

    It this case, node is cut from the tree and algorithm links single child (with it's subtree) directly to the parenof the removed node.

    Example. Remove 18 from a BST.

    http://www.algolist.net/Data_structures/Binary_search_tree/Lookuphttp://www.algolist.net/Data_structures/Binary_search_tree/Lookuphttp://www.algolist.net/Data_structures/Binary_search_tree/Lookuphttp://www.algolist.net/Data_structures/Binary_search_tree/Lookup
  • 8/3/2019 About Trees

    10/19

    3. Node to be removed has two children.

    This is the most complex case. To solve it, let us see one useful BST property first. We are going to use theidea, that the same set of values may be represented as different binary-search trees. For example those

    BSTs:

  • 8/3/2019 About Trees

    11/19

    contains the same values {5, 19, 21, 25}. To transform first tree into second one, we can do following:

    o choose minimum element from the right subtree (19 in the example);o replace 5 by 19;o hang 5 as a left child.

    The same approach can be utilized to remove a node, which has two children:

    o find a minimum value in the right subtree;o replace value of the node to be removed with found minimum. Now, right subtree contains a duplicate!o apply remove to the right subtree to remove a duplicate.

    Notice, that the node with minimum value has no left child and, therefore, it's removal may result in first or

    second cases only.

    Example. Remove 12 from a BST.

  • 8/3/2019 About Trees

    12/19

    Find minimum element in the right subtree of the node to be removed. In current example it is 19.

    Replace 12 with 19. Notice, that only values are replaced, not nodes. Now we have two nodes with the samvalue.

  • 8/3/2019 About Trees

    13/19

    Remove 19 from the left subtree.

  • 8/3/2019 About Trees

    14/19

    AVL tress

    An AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the heights of the tw

    child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(

    n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions marequire the tree to be rebalanced by one or more tree rotations.

    The AVL tree is named after its two inventors, G.M. Adelson-Velskii and E.M. Landis, who published it in their 1962 paper "An algorithm fo

    the organization of information."

    The balance factor of a node is the height of its right subtree minus the height of its left subtree and a node with balance factor 1, 0, or -1 i

    considered balanced. A node with any other balance factor is considered unbalanced and requires rebalancing the tree. The balance facto

    either stored directly at each node or computed from the heights of the subtrees.

    AVL trees are often compared with red-black trees because they support the same set of operations and because red-black trees also takO(log n) time for the basic operations. AVL trees perform better than red-black trees for lookup-intensive applications.The AVL tree balanc

    algorithm appears in many computer science curricula.

    B-Trees: Balanced Tree Data Structures

    Tree structures support various basic dynamic set operations including Search, Predecessor, Successor,Minimum

    Maximum,Insert, andDelete in time proportional to the height of the tree. Ideally, a tree will be balanced and theheight will be log n where n is the number of nodes in the tree. To ensure that the height of the tree is as small as

    possible and therefore provide the best running time, a balanced tree structure like a red-black tree, AVL tree, or btree must be used.

    When working with large sets of data, it is often not possible or desirable to maintain the entire structure in primarstorage (RAM). Instead, a relatively small portion of the data structure is maintained in primary storage, and

    additional data is read from secondary storage as needed. Unfortunately, a magnetic disk, the most common form o

    secondary storage, is significantly slower than random access memory (RAM). In fact, the system often spends mtime retrieving data than actually processing data.

    B-trees are balanced trees that are optimized for situations when part or all of the tree must be maintained insecondary storage such as a magnetic disk. Since disk accesses are expensive (time consuming) operations, a b-tretries to minimize the number of disk accesses. For example, a b-tree with a height of 2 and a branching factor of

    1001 can store over one billion keys but requires at most two disk accesses to search for any node (Cormen 384).

    The Structure of B-Trees

  • 8/3/2019 About Trees

    15/19

    Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children. The keys are stored i

    non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keysless than or equal to the key but greater than the preceeding key. A node also has an additional rightmost child that

    the root for a subtree containing all keys greater than any keys in the node.

    A b-tree has a minumum number of allowable children for each node known as the minimization factor. Iftis thisminimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowe

    to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently,children.

    Since each node tends to have a large branching factor (a large number of children), it is typically neccessary totraverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b

    tree will minimize the number of disk accesses required. The minimzation factor is usually chosen so that the total

    size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifi

    and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot resiin primary storage and accesses to secondary storage are comparatively expensive (or time consuming).

    Height of B-Trees

    For n greater than or equal to one, the height of an n-key b-tree T of height h with a minimum degree tgreater thanor equal to 2,

    For a proof of the above inequality, refer to Cormen, Leiserson, and Rivest pages 383-384.

    The worst case height is O(log n). Since the "branchiness" of a b-tree can be large compared to many other balanctree structures, the base of the logarithm tends to be large; therefore, the number of nodes visited during a search

    tends to be smaller than required by other tree structures. Although this does not affect the asymptotic worst case

    height, b-trees tend to have smaller heights than other trees with the same asymptotic height.

    Operations on B-Trees

    The algorithms for the search, create, and insertoperations are shown below. Note that these algorithms are singlepass; in other words, they do not traverse back up the tree. Since b-trees strive to minimize disk accesses and the

    nodes are usually stored on disk, this single-pass approach will reduce the number of node visits and thus the

    number of disk accesses. Simpler double-pass approaches that move back up the tree to fix violations are possible.

    Since all nodes are assumed to be stored in secondary storage (disk) rather than primary storage (memory), all

    references to a given node be be preceeded by a read operation denoted byDisk-Read. Similarly, once a node ismodified and it is no longer needed, it must be written out to secondary storage with a write operation denoted by

  • 8/3/2019 About Trees

    16/19

    Disk-Write. The algorithms below assume that all nodes referenced in parameters have already had a correspondin

    Disk-Readoperation. New nodes are created and assigned storage with theAllocate-Node call. The implementatiodetails of theDisk-Read,Disk-Write, andAllocate-Node functions are operating system and implementation

    dependent.

    Examples

    Sample B-Tree

    Searching a B-Tree for Key 21

    Inserting Key 33 into a B-Tree (w/ Split)

    A binary heap is aheapdata structurecreated using abinary tree. It can be seen as a binary tree with two additionconstraints:

    The shape property: the tree is acomplete binary tree; that is, all levels of the tree, except possibly the last one

    (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left

    right.

    The heap property: each node is greater than or equal to each of its children according to some comparison predic

    which is fixed for the entire data structure.

    http://en.wikipedia.org/wiki/Heap_(data_structure)http://en.wikipedia.org/wiki/Heap_(data_structure)http://en.wikipedia.org/wiki/Data_structurehttp://en.wikipedia.org/wiki/Data_structurehttp://en.wikipedia.org/wiki/Data_structurehttp://en.wikipedia.org/wiki/Binary_treehttp://en.wikipedia.org/wiki/Binary_treehttp://en.wikipedia.org/wiki/Binary_treehttp://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_treeshttp://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_treeshttp://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_treeshttp://en.wikipedia.org/wiki/Binary_tree#Types_of_binary_treeshttp://en.wikipedia.org/wiki/Binary_treehttp://en.wikipedia.org/wiki/Data_structurehttp://en.wikipedia.org/wiki/Heap_(data_structure)
  • 8/3/2019 About Trees

    17/19

    "Greater than or equal to" means according to whatever comparison function is chosen to sort the heap, not

    necessarily "greater than or equal to" in the mathematical sense (since the quantities are not always numerical).Heaps where the comparison function is mathematical "greater than or equal to" are called max-heaps; those wher

    the comparison function is mathematical "less than" are called "min-heaps". Conventionally, min-heaps are used,

    since they are readily applicable for use inpriority queues.

    Note that the ordering of siblings in a heap is not specified by the heap property, so the two children of a parent ca

    be freely interchanged, as long as this does not violate the shape and heap properties (compare with treap).

    The binary heap is a special case of thed-ary heapin which d = 2.

    It is possible to modify the heap structure to allow extraction of both the smallest and largest element in O(logn)time.

    [1]To do this, the rows alternate between min heap and max heap. The algorithms are roughly the same, but, i

    each step, one must consider the alternating rows with alternating comparisons. The performance is roughly the

    same as a normal single direction heap. This idea can be generalised to a min-max-median heap.

    Adding to the heap

    If we have a heap, and we add an element, we can perform an operation known as up-heap, bubble-up,percolate-

    sift-up, or heapify-up in order to restore the heap property. We can do this inO(log n) time, using a binary heap, b

    following this algorithm:

    1. Add the element on the bottom level of the heap.2. Compare the added element with its parent; if they are in the correct order, stop.

    3. If not, swap the element with its parent and return to the previous step.

    We do this at maximum for each level in the treethe height of the tree, which is O(log n). However, sinceapproximately 50% of the elements are leaves and 75% are in the bottom two levels, it is likely that the new eleme

    to be inserted will only move a few levels upwards to maintain the heap. Thus, binary heaps support insertion in

    average constant time, O(1).

    Say we have a max-heap

    and we want to add the number 15 to the heap. We first place the 15 in the position marked by the X. However, thheap property is violated since 15 is greater than 8, so we need to swap the 15 and the 8. So, we have the heap

    looking as follows after the first swap:

    http://en.wikipedia.org/wiki/Priority_queuehttp://en.wikipedia.org/wiki/Priority_queuehttp://en.wikipedia.org/wiki/Priority_queuehttp://en.wikipedia.org/wiki/Treaphttp://en.wikipedia.org/wiki/Treaphttp://en.wikipedia.org/wiki/Treaphttp://en.wikipedia.org/wiki/D-ary_heaphttp://en.wikipedia.org/wiki/D-ary_heaphttp://en.wikipedia.org/wiki/D-ary_heaphttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Binary_heap#cite_note-sym-0http://en.wikipedia.org/wiki/Binary_heap#cite_note-sym-0http://en.wikipedia.org/wiki/Binary_heap#cite_note-sym-0http://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/File:Heap_add_step1.svghttp://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/Binary_heap#cite_note-sym-0http://en.wikipedia.org/wiki/Big_O_notationhttp://en.wikipedia.org/wiki/D-ary_heaphttp://en.wikipedia.org/wiki/Treaphttp://en.wikipedia.org/wiki/Priority_queue
  • 8/3/2019 About Trees

    18/19

    However the heap property is still violated since 15 is greater than 11, so we need to swap again:

    which is a valid max-heap. There is no need to check the children after this. Before we placed 15 on X, the heap wvalid, meaning 11 is greater than 5. If 15 is greater than 11, and 11 is greater than 5, then 15 must be greater than 5

    [edit] Deleting the root from the heap

    The procedure for deleting the root from the heapeffectively extracting the maximum element in a max-heap orthe minimum element in a min-heapstarts by replacing it with the last element on the last level. So, if we have th

    same max-heap as before, we remove the 11 and replace it with the 4.

    Now the heap property is violated since 8 is greater than 4. The operation that restores the property is called downheap, bubble-down,percolate-down, sift-down, or heapify-down. In this case, swapping the two elements 4 and 8, enough to restore the heap property and we need not swap elements further:

    The downward-moving node is swapped with the largerof its children in a max-heap (in a min-heap it would beswapped with its smaller child), until it satisfies the heap property in its new position. This functionality is achieve

    by the Max-Heapify function as defined below inpseudocodefor anarray-backed heapA. Note that "A" is indexestarting at 1, not 0 as is common in many programming languages.

    Max-Heapify[2]

    (A, i):

    left 2i

    http://en.wikipedia.org/w/index.php?title=Binary_heap&action=edit&section=3http://en.wikipedia.org/w/index.php?title=Binary_heap&action=edit&section=3http://en.wikipedia.org/w/index.php?title=Binary_heap&action=edit&section=3http://en.wikipedia.org/wiki/Pseudocodehttp://en.wikipedia.org/wiki/Pseudocodehttp://en.wikipedia.org/wiki/Pseudocodehttp://en.wikipedia.org/wiki/Array_data_structurehttp://en.wikipedia.org/wiki/Array_data_structurehttp://en.wikipedia.org/wiki/Array_data_structurehttp://en.wikipedia.org/wiki/Binary_heap#cite_note-CLRS-1http://en.wikipedia.org/wiki/Binary_heap#cite_note-CLRS-1http://en.wikipedia.org/wiki/Binary_heap#cite_note-CLRS-1http://en.wikipedia.org/wiki/File:Heap_remove_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step1.svghttp://en.wikipedia.org/wiki/File:Heap_add_step3.svghttp://en.wikipedia.org/wiki/File:Heap_add_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step1.svghttp://en.wikipedia.org/wiki/File:Heap_add_step3.svghttp://en.wikipedia.org/wiki/File:Heap_add_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step1.svghttp://en.wikipedia.org/wiki/File:Heap_add_step3.svghttp://en.wikipedia.org/wiki/File:Heap_add_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step2.svghttp://en.wikipedia.org/wiki/File:Heap_remove_step1.svghttp://en.wikipedia.org/wiki/File:Heap_add_step3.svghttp://en.wikipedia.org/wiki/File:Heap_add_step2.svghttp://en.wikipedia.org/wiki/Binary_heap#cite_note-CLRS-1http://en.wikipedia.org/wiki/Array_data_structurehttp://en.wikipedia.org/wiki/Pseudocodehttp://en.wikipedia.org/w/index.php?title=Binary_heap&action=edit&section=3
  • 8/3/2019 About Trees

    19/19

    right 2i + 1

    largest i

    ifleft heap-length[A] andA[left] > A[i] then:

    largest left

    ifright heap-length[A] andA[right] >A[largest] then:

    largest rightiflargest ithen:

    swapA[i] A[largest]

    Max-Heapify(A, largest)

    Note that the down-heap operation (without the preceding swap) can be used in general to modify the value of theroot, even when an element is not being deleted.