Upload
gloria-walsh
View
226
Download
0
Tags:
Embed Size (px)
Citation preview
Heaps and HeapsortHeaps and Heapsort
Prof. Sin-Min Lee Department of Computer
Science San Jose State University
• Heap Definition.
• Adding a Node.
• Removing a Node.
• Array Implementation.
• Analysis
• Source Code.
HeapsHeaps
What is Heap?
A heap is like a binary search tree. It is a binary tree where the six comparison operators form a total order semantics that can be used to compare the node’s entries. But the arrangement of the elements in a heap follows some new rules that are different from a binary search tree.
HeapsHeaps
A heap is a certain kind of complete binary tree. It is defined by the following properties.
Heaps (Priority Queues)
A heap is a data structure which supports efficient implementation of three operations: insert, findMin, and deleteMin. (Such a heap is a min-heap, to be more precise.)
Two obvious implementations are based on an array (assuming the size n is known) and on a linked list. The comparisons of their relative time complexity are as follows:
insert findMin deleteMin Sorted array O(n) O(1) O(1) (from large (to move (find at end) (delete the end) to small) elements)
Linked list O(1) O(1) O(n) (append) (a pointer to Min) (reset Min pointer)
HeapsHeaps
1. All leaf nodes occur at adjacent levels.
When a completebinary tree is built,
its first node must bethe root.
When a completebinary tree is built,
its first node must bethe root.
Root
HeapsHeaps
2. All levels of the tree, except for the bottom level are completely filled. All nodes on the bottom level occur as far left as possible.
Left childof theroot
The second node isalways the left child
of the root.
The second node isalways the left child
of the root.
HeapsHeaps
3. The key of the root node is greater than or equal to the key of each child. Each subtree is also a heap.
Right childof the
root
The third node isalways the right child
of the root.
The third node isalways the right child
of the root.
HeapsHeaps
The next nodesalways fill the next
level from left-to-right..
The next nodesalways fill the next
level from left-to-right..
HeapsHeaps
The next nodesalways fill the next
level from left-to-right.
The next nodesalways fill the next
level from left-to-right.
HeapsHeaps
Complete binary tree.
Heap Storage Rules• A heap is a binary tree where the entries of the
nodes can be compared with a total order semantics. In addition, these two rules are followed:
• The entry contained by a node is greater than or equal to the entries of the node’s children.
• The tree is a complete binary tree, so that every level except the deepest must contain as many nodes as possible; and at the deepest level, all the nodes are as far left as possible.
HeapsHeaps
A heap is a certain kind of complete binary tree. 19
4222127
23
45
35
The "heap property"requires that each
node's key is >= thekeys of its children
The "heap property"requires that each
node's key is >= thekeys of its children
HeapsHeaps
This tree is not a heap. It breaks rule number 1.
19
27
23
45
35
HeapsHeaps
This tree is not a heap. It breaks rule number 2.
21
23
45
35
HeapsHeaps
This tree is not a heap. It breaks rule number 3.
2135
23
45
27
Heap implementation of priority queue
• Each node of the heap contains one entry along with the entry’s priority, and the tree is maintained so that it follows the heap storage rules using the entry’s priorities to compare nodes. Therefore:
• The entry contained by a node has a priority that is greater than or equal to the priority of the entries of the nodes children.
• The tree is a complete binary tree.
Adding a Node to a HeapAdding a Node to a Heap
Put the new node in the next available spot.
19
4222127
23
45
35
42
Adding a Node to a HeapAdding a Node to a Heap
Push the new node upward, swapping with its parent until the new node reaches an acceptable location.
19
4222142
23
45
35
27
Adding a Node to a HeapAdding a Node to a Heap
Push the new node upward, swapping with its parent until the new node reaches an acceptable location.
19
4222135
23
45
42
27
Adding a Node to a HeapAdding a Node to a Heap
The parent has a key that is >= new node, or
The node reaches the root. The process of pushing the
new node upward is called reheapification upward.
19
4222135
23
45
42
27
Removing the Top of a HeapRemoving the Top of a Heap
Move the last node onto the root.
19
4222135
23
45
42
27
Removing the Top of a HeapRemoving the Top of a Heap
Move the last node onto the root.
19
4222135
23
27
42
Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-
place node downward, swapping with its larger child until the new node reaches an acceptable location.
19
4222135
23
27
42
Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-
place node downward, swapping with its larger child until the new node reaches an acceptable location.
19
4222135
23
42
27
Removing the Top of a HeapRemoving the Top of a HeapPush the out-of-
place node downward, swapping with its larger child until the new node reaches an acceptable location.
19
4222127
23
42
35
Removing the Top of a HeapRemoving the Top of a Heap The children all have keys <= the out-of-place node, or
The node reaches the leaf.
The process of pushing the new node downward is called reheapification downward.
19
4222127
23
42
35
Adding an Entry to a Heap
519
27 21 422
2335
45
As an example, suppose that we already have nine entries that are arranged in a heap with the above priorities.
Suppose that we are adding a new entry with a priority 42.The first stepis to add this entry in a way that keeps the binary tree complete. In thiscase the new entry will be the left child of the entry with priority 21.
519
27 21 422
2335
45
4242
The structure is no longer a heap, since the node with priority 21 has a child with a higher priority. The new entry (with priority 42) rises upwarduntil it reaches an acceptable location. This is accomplished by swappingthe new entry with its parent.
519
27 4242 422
2335
45
21
The new entry is still bigger than its parent, so a second swap is done.
519
27 35 422
234242
45
21
Adding an entry to a priority queue
Pseudocode for Adding an entry
The priority queue has been implemented as a heap.
1.Place the new entry in the heap in the first available location.This keeps the structure as a complete binary tree, but it might no longer be a heap since the new entry might have a higher priority than its parent.
2. while (the new entry has a priority that is higher than its parent)
swap the new entry with its parent.
Reheapification upward
Now the new entry stops rising, because its parent (with priority 45) has a higher prioritythan the new entry. In general, the “new entry rising” stops when the new entry reaches the root. The rising process is called reheapification upward.
Implementing a HeapImplementing a Heap
• We will store the data from the nodes in a partially-filled array.
An array of dataAn array of data
2127
23
42
35
Implementing a HeapImplementing a Heap
• Data from the root goes in the first location of the array.
An array of dataAn array of data
2127
23
42
35
42
Implementing a HeapImplementing a Heap
• Data from the next row goes in the next two array locations.
An array of dataAn array of data
2127
23
42
35
42 35 23
Implementing a HeapImplementing a Heap
• Data from the next row goes in the next two array locations.
• Any node’s two children reside at indexes (2n) and (2n + 1) in the array. An array of An array of datadata
2127
23
42
35
42 35 23 27 21
Implementing a HeapImplementing a Heap
• Data from the next row goes in the next two array locations.
An array of dataAn array of data
2127
23
42
35
42 35 23 27 21
We don't care what's inWe don't care what's inthis part of the array.this part of the array.
Important Points about the Implementation
Important Points about the Implementation
• The links between the tree's nodes are not actually stored as pointers, or in any other way.
• The only way we "know" that "the array is a tree" is from the way we manipulate the data.
An array of dataAn array of data
2127
23
42
35
42 35 23 27 21
Important Points about the Implementation
Important Points about the Implementation
• If you know the index of a node, then it is easy to figure out the indexes of that node's parent and children.
[1][1] [2] [3] [4] [5]
2127
23
42
35
42 35 23 27 21
Removing an entry from a Heap
• When an entry is removed from a priority, we must always remove the
• entry with the highest priority - the entry that stands “on top of heap”.
• For example, the entry at the root, with priority 45 will be removed.
The highest priority item, at the root, will be removed from the heap
519
27 35 422
2342
45
21
4545
Removing an entry from a heap
• During the removal, we must ensure that the resulting structure remains a heap. If the root entry is the only entry in the heap, then there is really no more work to do except to decrement the member variable that is keeping track of the size of the heap. But if there are other entries in the tree, then the tree must be rearranged because a heap is not allowed to run around without a root. The rearrangement begins by moving the last entry in the last level upto the root, like this:
519
35 422
2342
45
2121
27
2121
422
23
35
42
27
519
Before the moveAfter the move
The last entry in the last levelhas been moved to the root.
Removing an entry from a Heap
Heapsort
• Heapsort combines the time efficiency of mergesort and the storage efficiency of quicksort. Like mergesort, heapsort has a worst case running time that is O(nlogn), and like quicksort it does not require an additional array. Heapsort works by first transforming the array to be sorted into a heap.
Heap representation
A heap is a complete binary tree, and an efficient method of representing a complete binary tree in an array follows these rules:
Data from the root of the complete binary tree goes in the [0] component of the array.
The data from depth one nodes is placed in the next two components of the array.
We continue in this way, placing the four nodes of depth two in the next four components of the array, and so forth. For example, a heap with ten nodes can be stored in a ten element array as shown below:
27
45
42
23 352221
5419
45 27 42 21 23 22 35 19 4 5
45
Rules for location of data in the array
• The data from the root always appears in the [0] component of the array.
• Suppose that the data for a root node appears in the component [i] of the array. Then the data for its parent is always at location [(i - 1) / 2] (using integer division).
• Suppose that the data for a node appears in component [i] of the array. Then its children (if they exist)always have their data at these locations:
Left child at component [2i + 1]
Right child at component [2i + 2]
• The largest value in a heap is stored in the root and that in our array representation of a complete binary tree, the root node is stored in the first array position. Thus, since the array represents a heap, we know that the largest array element is in the first array position. To get the largest value in the correct array position, we simply interchange the first and the final array elements. After interchanging these two array elements, we know that the largest array element is in the final array position, as shown below:
• The dark vertical line in the array separates the sorted side of the array from the unsorted side (on the left) . Moreover the left side still represents a complete binary tree which is almost a heap.
5 27 42 21 23 22 35 19 4 45
When we consider the unsorted side as a complete binary tree, it is only the root that violates the heapstorage rule. This root is smaller than its children. The next step of the heapsort is to reposition this out of place value in order to restore the heap. The process called reheapification downward begins by comparing the value in the root with the value of each of its children. If one or both children are larger than the root , then the root’s value is exchanged with the larger of the two children. This moves the troublesome value down one level in the tree. For example:
27
22
419
21 23 35
5
42
42 27 5 21 23 22 35 19 4 45
• In the example 5 is still out of place, so we will once again exchange it with its largest child resulting in the array and heap shown below:
27
22
419
21 23
35
5
42
42 27 35 21 23 22 5 19 4 45
The unsorted side of the array must be maintained a heap. When the unsorted side of the array is once again a heap, the heapsort continues by exchanging the largest element in the unsorted side with the rightmost element of the unsorted side. For example 42 is exchanged with 4 as shown below:
4 27 35 21 23 22 5 19 42 45
• The sorted side now has two largest elements, and the unsorted side is once again almost a heap, as shown below:
4 27 35 21 23 22 5 19 42 45
522
35
4
27
2321
19
Only the root (4) is out of place, and that may be fixed by another reheapification downward. After the reheapification downward, the largest value of the unsorted side will once again reside at location [0]and we can continue to pull out the next largest value.
Pseudocode for the heapsort algorithm//Heapsort for the array called data with n elements1. Convert the array of n elements into a heap.2. unsorted = n; // The number of elements in the unsorted
side3. while ( unsorted > 1){ // Reduce the unsorted side by one unsorted - -; Swap data[0] with data [unsorted]. The unsorted side of the array is now a heap with the
root out of place. Do a reheapification downward to turn the unsorted side back into a heap. }
• While the Heapsort is a constant factor slower than the Quicksort for average data sets, it still has a complexity of O(n log n) for best, worst and average data.
• Reheapification is the process by which an out of place node is exchanged with its parent (upward) or its children (downward) until the node’s key is <= to it’s parent and >= to it’s children. Then reheapification is complete.
Analysis Analysis
A simpler but related problem: Find the smallest and the second smallest elements in an array (of size n):
• A straightforward method:
Find the smallest using n –1 comparisons; swap it to the end of the array, find the next smallest in n –2 comparisons, for a total of 2n – 3 comparisons.• A method that “saves” results of some earlier comparisons:
(1) Divide the array into two halves, find the smallest elements of each half, call them min1 and min2, respectively; (2) Compare min1 and min2, to find the smallest;
(3) If min1 is the smallest, compare min2 with the remaining elements of the first half to determine the second smallest; similarly for the case if min2 is the smallest.
The number of comparisons is (n –1) + (n/2 –1) = (3n/2 –2.
The second method can be depicted in the following figure:
min1 min2
min
First half of array Second half of array
In the figure, a link connects a smaller element to a larger one going downward; for example, min min1 and min min2, and min1 each element in the first half of the array, similarly for min2. This organization facilitates finding the smallest and second smallest elements.
H1 H2
A binary tree data structure for Min-heaps:A binary tree is called a left-complete binary tree if
(1) the tree is full at each level except possibly at the maximum level (depth), where the tree is full at level i means there are 2i nodes at that level (recall the root is at level 0); and
(2) at the tree’s maximum level h, if there are fewer than 2h nodes then the missing nodes are on the right side of the tree.
Missing on the right side
If there are n nodes in a left-complete binary tree of depth h, then 2h n 2h+1 –1. Thus, h lg n < (h +1). In particular, h = O(lg n).
A binary tree satisfies the (min-)heap-order property if the value at each is less than or equal to the value at each of the child nodes (if exist).
A binary min-heap is a left-complete binary tree that also satisfies the heap-order property.
13
21
24 31
16
19 68
65 26 32
Note that there is no definite order between values in sibling nodes; also, it is obvious that the minimum value is at the root.
A binary min-heap
Two basic operations on binary min-heaps:
(1)Suppose the value at a node is decreased which causes violation to the heap-order property (because the value is smaller than that of its parent’s). The operation to restore the heap property is named percolateUp (siftUp).
(2)Suppose the value at a node is increased which becomes larger than either (or both) of the value of the children’s. The operation to restore the heap order is named percolateDown (siftDown).
Both operations can be completed in time proportional to the tree height, thus, of time complexity O(lg n) where n is the total number of nodes. These operations are crucial in supporting the heap operations insert and deleteMin.
The percolateUp (siftUp) operation:
13
21 35
25 37 43 12
The heap order violated at node valued 12
21
25 37 43
13
12
35
percolateUp
21
25 37 43 35
13
12
percolateUpNote that the time of each step (each level) is constant, i.e., compare with parent and swap
The percolateDown (siftDown) operation:
43
21 35
37 25 45 63
The heap order violated at node valued 43
Each step of percolateDown finds the smaller of the two children (if exist) then swap it with the violating node
35
37 25 45 63
43
21
Who is smaller ?
Who is smaller ?
35
37 45 63
21
25
43
percolateDown
percolate-Down
Implementation of the heap operations:
insert(): insert a node next to the last node, call percolateUp from it, treating the new node as a violation.
deleteMin(): delete the root node, move the last node to the root, then call percolateDown from the root.
findMin(): return the root’s value.
Finally, a binary heap can be implemented using an array storing the heap elements from the root towards the leaves, level by level and from left to right within each level. The left-completeness property guarantees that there are no “holes” in the array. Also, if we use array indexes 1..n to store n elements, the parent’s index (if exists) of node i is i/2, the left child’s index is 2i, the right child 2i + 1 (if exist). The time complexity of deleteMin and insert both are O(lg n); the time complexity of findMin is O(1).
An array implementation of a binary heap:
1
2 3
4 5 6
Number the nodes (starting at 1) by levels, from top to bottom and left to right within level
1 2 3 4 5 6 . . . . .
Level 2 nodesLevel 1 nodesRoot
Parent to children (index i to 2i, 2i+1)
Convert an array T[1..n] into a heap (known as heapify):
A top-down method: repeatedly call insert() by inserting T[i] into a heap T[1..(i –1)], for i = 2 to n.
21
12 9
13 7
Initially, T[1] by itself ia a heap
12
21 9
13 7
Insert 12 into T[1..1], making T[1..2] a heap
9
21 12
13 7
9
13 12
21 7
7
9 12
21 13
Insert 9 into T[1..2], making T[1..3] a heap
Insert 13 into T[1..3], making T[1..4} a heap
Insert 7 into T[1..4], making T[1..5] a heap
The total time complexity of top-down heapify is O(n lg n).
A bottom-up heapify method:
12 9
13 7
21For i = n/2 down to 1
/* percolate down T[i], making the subtree rooted at i a heap */
call percolateDown(i)n = 5, start at node n/2, the node of the highest index that has any children
Complexity analysis: Node1 travels down h levels during percolateDown, nodes 2 and 3 each (h –1) levels, nodes 4 through 7 each (h –2) levels, etc. The total number of levels traveled is h + 2(h –1) + 4 (h –2) + …+ 2h –1(h – (h – 1)) = 2h –1(1 + 2(1/2) + 3(1/2)2 + 4(1/2)3 + …) 2n, because 2h –1 < n/2 since h lg n, and the infinite series 1 + 2x + 3x2 + … = 1/(1 –x)2 when |x| < 1. Thus, the time complexity of heapify is O(n).