Sorting and Lower Bounds
15-211 Fundamental Data Structures and Algorithms
Peter Lee
February 25, 2003
Announcements
Quiz #2 available today
Open until Wednesday midnight
Midterm exam next Tuesday
Tuesday, March 4, 2003, in class
Review session in Thursday’s class
Homework #4 is out
You should finish Part 1 this week!
Reading: Chapter 8
Recap
Naïve sorting algorithms: bubble sort.
105  47  13  99  30 222
 47 105  13  99  30 222
 13  47 105  99  30 222
 13  47  99 105  30 222
 13  30  47  99 105 222
Insertion sort.
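A minimal sketch of these two naïve sorts (not the lecture's own code); both are O(N²) in the worst case:

```python
def bubble_sort(a):
    """Repeatedly swap adjacent out-of-order pairs until sorted."""
    a = list(a)
    for i in range(len(a) - 1, 0, -1):
        for j in range(i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def insertion_sort(a):
    """Insert each element into the already-sorted prefix to its left."""
    a = list(a)
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```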
Heapsort
Build heap. O(N)
DeleteMin until empty. O(N log N)
Total worst case: O(N log N)
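A sketch of the two heapsort phases using Python's standard min-heap module (not the lecture's code):

```python
import heapq

def heapsort(a):
    """Build-heap in O(N), then N deleteMins at O(log N) each: O(N log N)."""
    heap = list(a)
    heapq.heapify(heap)                       # bottom-up build-heap, O(N)
    return [heapq.heappop(heap) for _ in range(len(a))]
```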
Shellsort
Example with sequence 3, 1.
105 47 13 99 30 222
99 47 13 105 30 222
99 30 13 105 47 222
99 30 13 105 47 222
30 99 13 105 47 222
30 13 99 105 47 222
...
Several inverted pairs fixed in one exchange.
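The gap-sequence idea can be sketched as follows; the default gap sequence (3, 1) matches the example above, and any sequence ending in 1 is correct, since the final pass is plain insertion sort:

```python
def shellsort(a, gaps=(3, 1)):
    """Insertion sort on elements `gap` apart, for each gap in sequence.
    The last gap must be 1 so the final pass fully sorts the array."""
    a = list(a)
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]     # one exchange fixes several inversions
                j -= gap
            a[j] = x
    return a
```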
Divide-and-conquer
Analysis of recursive sorting
Suppose it takes time T(N) to sort N elements.
Suppose also it takes time N to combine the two sorted arrays.
Then:
T(1) = 1
T(N) = 2T(N/2) + N, for N>1
Solving for T gives the running time for the recursive sorting algorithm.
Divide-and-Conquer Theorem
Theorem: Let a, b, c > 0.
The recurrence relation
T(1) = b
T(N) = aT(N/c) + bN
for any N which is a power of c
has upper-bound solutions
T(N) = O(N) if a < c
T(N) = O(N log N) if a = c
T(N) = O(N^(log_c a)) if a > c
a = 2, b = 1, c = 2 for recursive sorting, giving O(N log N).
Exact solutions
It is sometimes possible to derive closed-form solutions to recurrence relations.
Several methods exist for doing this.
Telescoping-sum method
Repeated-substitution method
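As a worked example, the telescoping-sum method applied to the recursive-sorting recurrence T(1) = 1, T(N) = 2T(N/2) + N (for N a power of 2) divides both sides by N and telescopes log₂N times:

```latex
\frac{T(N)}{N} = \frac{T(N/2)}{N/2} + 1
              = \frac{T(N/4)}{N/4} + 1 + 1
              = \cdots
              = \frac{T(1)}{1} + \log_2 N
\;\Longrightarrow\;
T(N) = N + N\log_2 N = O(N \log N)
```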
Mergesort
Mergesort is the most basic recursive sorting algorithm.
Divide the array into halves A and B.
Recursively mergesort each half.
Combine A and B by repeatedly comparing the first remaining elements of A and B and moving the smaller one to the result array.
Note: be careful to avoid creating lots of intermediate result arrays.
Mergesort
Use simple indexes to perform the split.
Use a single extra array to hold each intermediate result.
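A sketch along those lines (index-based splits, one auxiliary array allocated up front); not the lecture's code:

```python
def mergesort(a):
    """Recursive mergesort; one aux array is reused for every merge."""
    a = list(a)
    aux = [None] * len(a)

    def sort(lo, hi):                 # sorts a[lo:hi] in place
        if hi - lo <= 1:
            return
        mid = (lo + hi) // 2
        sort(lo, mid)
        sort(mid, hi)
        # merge the two sorted halves into aux, then copy back
        i, j = lo, mid
        for k in range(lo, hi):
            if j >= hi or (i < mid and a[i] <= a[j]):
                aux[k] = a[i]; i += 1
            else:
                aux[k] = a[j]; j += 1
        a[lo:hi] = aux[lo:hi]

    sort(0, len(a))
    return a
```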
Analysis of mergesort
Mergesort generates almost exactly the same recurrence relations shown before.
T(1) = 1
T(N) = 2T(N/2) + N - 1, for N>1
Thus, mergesort is O(N log N).
Comparison-based sorting
Recall that these are all examples of comparison-based sorting algorithms:
• Items are stored in an array.
• Can be moved around in the array.
• Can compare any two array elements.
Comparison has 3 possible outcomes:
< = >
Non-comparison-based sorting
If we can do more than just compare pairs of elements, we can sometimes sort more quickly
Two simple examples are bucket sort and radix sort
Bucket Sort
Bucket sort
In addition to comparing pairs of elements, we require these additional restrictions:
all elements are non-negative integers
all elements are less than a predetermined maximum value
Bucket sort
Input: 1 3 3 1 2
Buckets: 1 → {1, 1}; 2 → {2}; 3 → {3, 3}
Output: 1 1 2 3 3
Bucket sort characteristics
Runs in O(N + M) time, where M is the predetermined maximum value.
Easy to implement each bucket as a linked list.
Is stable:
If two elements (A,B) are equal with respect to sorting, and they appear in the input in order (A,B), then they remain in the same order in the output.
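A minimal sketch, assuming non-negative integer keys below a known maximum (list-based buckets, as above; appending preserves input order, which is what makes it stable):

```python
def bucket_sort(a, max_value):
    """Sort non-negative ints < max_value in O(N + max_value) time, stably."""
    buckets = [[] for _ in range(max_value)]
    for x in a:
        buckets[x].append(x)          # append order = input order: stable
    return [x for bucket in buckets for x in bucket]
```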
Radix Sort
Radix sort
Another sorting algorithm that goes beyond comparison is radix sort.
Initial (decimal 2 0 5 1 7 3 4 6):
010 000 101 001 111 011 100 110
After stable sort on bit 0: 010 000 100 110 101 001 111 011
After stable sort on bit 1: 000 100 101 001 010 110 111 011
After stable sort on bit 2: 000 001 010 011 100 101 110 111 (decimal 0 1 2 3 4 5 6 7)
Each sorting step must be stable.
Radix sort characteristics
Each sorting step can be performed via bucket sort, and is thus O(N).
If the numbers are all b bits long, then there are b sorting steps.
Hence, radix sort is O(bN).
Also, radix sort can be implemented in-place (just like quicksort).
Not just for binary numbers
Radix sort can be used for decimal numbers and alphanumeric strings.
Initial:                        032 224 016 015 031 169 123 252
After stable sort on 3rd digit: 031 032 252 123 224 015 016 169
After stable sort on 2nd digit: 015 016 123 224 031 032 252 169
After stable sort on 1st digit: 015 016 031 032 123 169 224 252
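The passes above can be sketched as a generic LSD radix sort (a hypothetical `radix_sort` helper; the caller supplies the digit count and base):

```python
def radix_sort(a, digits, base=10):
    """LSD radix sort: one stable bucket pass per digit, O(digits * N)."""
    for d in range(digits):
        buckets = [[] for _ in range(base)]
        for x in a:
            # stable: equal digits keep their relative order
            buckets[(x // base ** d) % base].append(x)
        a = [x for bucket in buckets for x in bucket]
    return a
```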
Why comparison-based?
When their restrictions apply, bucket and radix sort are asymptotically faster than any comparison-based sorting algorithm.
Unfortunately, we can’t always live with the restrictions imposed by these algorithms.
In such cases, comparison-based sorting algorithms give us general solutions.
Back to Quick Sort
Review: Quicksort algorithm
If array A has 1 (or 0) elements, then done.
Choose a pivot element x from A.
Divide A-{x} into two arrays:
B = {y ∈ A−{x} | y ≤ x}
C = {y ∈ A−{x} | y ≥ x}
Quicksort arrays B and C.
Result is B+{x}+C.
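A direct transcription of this description (not in-place; refinements come next). Note the slide's B and C overlap on elements equal to the pivot; this sketch arbitrarily sends equal elements to B:

```python
def quicksort(a):
    """Recursive quicksort exactly as described: pick pivot, split, recurse."""
    if len(a) <= 1:
        return list(a)
    x = a[0]                      # simplest pivot choice (refined later)
    rest = a[1:]
    B = [y for y in rest if y <= x]
    C = [y for y in rest if y > x]
    return quicksort(B) + [x] + quicksort(C)
```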
Implementation issues
Quick sort can be very fast in practice, but this depends on careful coding
Three major issues:
1. doing quicksort in-place
2. picking the right pivot
3. avoiding quicksort on small arrays
1. Doing quicksort in place
85 24 63 50 17 31 96 45
85 24 63 45 17 31 96 50
L R
85 24 63 45 17 31 96 50
L R
31 24 63 45 17 85 96 50
L R
1. Doing quicksort in place
31 24 63 45 17 85 96 50
L R
31 24 17 45 63 85 96 50
R L
L and R have crossed: swap the element at L with the pivot.
31 24 17 45 50 85 96 63
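The steps of the trace can be sketched as a partition routine (a hypothetical `partition` helper; the pivot is assumed to have been swapped to the last position, as in the first step of the trace):

```python
def partition(a, lo, hi):
    """In-place partition of a[lo..hi]; the pivot sits at a[hi].
    Returns the pivot's final index."""
    pivot = a[hi]
    L, R = lo, hi - 1
    while True:
        while L <= R and a[L] < pivot:   # L stops on elements >= pivot
            L += 1
        while R >= L and a[R] > pivot:   # R stops on elements <= pivot
            R -= 1
        if L >= R:                       # pointers met or crossed
            break
        a[L], a[R] = a[R], a[L]
        L += 1
        R -= 1
    a[L], a[hi] = a[hi], a[L]            # park the pivot between the halves
    return L
```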
2. Picking the pivot
In real life, inputs to a sorting routine are often partially sorted
why does this happen?
So, picking the first or last element to be the pivot is usually a bad choice
One common strategy is to pick the middle element
this is an OK strategy
2. Picking the pivot
A more sophisticated approach is to use random sampling
think about opinion polls
For example, the median-of-three strategy:
take the median of the first, middle, and last elements to be the pivot
3. Avoiding small arrays
While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays
For small enough arrays, a simpler method such as insertion sort works better
The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements
Putting it all together
85 24 63 50 17 31 96 45
85 24 63 45 17 31 96 50
L R
85 24 63 45 17 31 96 50
L R
31 24 63 45 17 85 96 50
L R
Putting it all together
31 24 63 45 17 85 96 50
L R
31 24 17 45 63 85 96 50
R L
L and R have crossed: swap the element at L with the pivot.
31 24 17 45 50 85 96 63
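Combining the three issues (in-place partitioning, median-of-three pivoting, an insertion-sort cutoff) gives a sketch like the following; the cutoff value 10 is an assumption within the 10-30 range mentioned earlier:

```python
CUTOFF = 10   # assumed; the right value is language- and machine-dependent

def insertion_sort_range(a, lo, hi):
    """Insertion-sort a[lo..hi]; used for small subarrays."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi]; park the median (the pivot) at a[hi-1]."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    a[mid], a[hi - 1] = a[hi - 1], a[mid]
    return a[hi - 1]

def quicksort(a, lo=0, hi=None):
    """In-place quicksort with median-of-three pivot and small-array cutoff."""
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort_range(a, lo, hi)
        return
    pivot = median_of_three(a, lo, hi)
    L, R = lo + 1, hi - 2                # a[lo] and a[hi-1] act as sentinels
    while True:
        while a[L] < pivot:              # L stops on elements >= pivot
            L += 1
        while a[R] > pivot:              # R stops on elements <= pivot
            R -= 1
        if L >= R:
            break
        a[L], a[R] = a[R], a[L]
        L += 1
        R -= 1
    a[L], a[hi - 1] = a[hi - 1], a[L]    # restore pivot to its final position
    quicksort(a, lo, L - 1)
    quicksort(a, L + 1, hi)
```

Both L and R stop on elements equal to the pivot, which keeps the two halves balanced on inputs with many duplicates.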
A complication!
What should happen if we encounter an element that is equal to the pivot?
Four possibilities:
L stops, R keeps going
R stops, L keeps going
L and R stop
L and R keep going
Quiz Break
Red-green quiz
What should happen if we encounter an element that is equal to the pivot?
Four possibilities:
L stops, R keeps going
R stops, L keeps going
L and R stop
L and R keep going
Explain why your choice is the only reasonable one
Quick Sort Analysis
Worst-case behavior
(Tree diagram: on the input 105 47 13 17 30 222 5 19, an unlucky pivot choice removes only one element at each level — pivots 5, 13, 17, … — so the tree degenerates to depth N−1 and the total work is N + (N−1) + … + 1 = O(N²).)
Best-case analysis
In the best case, the pivot is always the median element.
In that case, the splits are always “down the middle”.
Hence, same behavior as mergesort.
That is, O(N log N).
Average-case analysis
Consider the quicksort tree:
105 47 13 17 30 222 5 19
(Tree diagram: each node's array is partitioned around its pivot into left and right children — e.g. pivot 19 splits the root into {5 17 13} and {47 30 222 105}, which are split in turn.)
Average-case analysis
The time spent at each level of the tree is O(N).
So, on average, how many levels?
That is, what is the expected height of the tree?
If on average there are O(log N) levels, then quicksort is O(N log N) on average.
Expected height of qsort tree
Assume that pivot is chosen randomly.
When is a pivot “good”? “Bad”?
5 13 17 19 30 47 105 222
Call a pivot “good” if it lies in the middle half of the sorted order (here 17, 19, 30, or 47).
Probability of a good pivot is 0.5.
After a good pivot, each child is at most 3/4 the size of its parent.
Expected height of qsort tree
So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is:
N(3/4)(3/4) … (3/4)   (k times)
= N(3/4)^k
But on average, only half of the pivots will be good, so after k levels the size is at most N(3/4)^(k/2).
Setting N(3/4)^(k/2) = 1 gives k = 2 log_{4/3} N = O(log N).
Summary of quicksort
A fast sorting algorithm in practice.
Can be implemented in-place.
But is O(N²) in the worst case.
O(N log N) average-case performance.
Lower Bound for the Sorting Problem
How fast can we sort?
We have seen several sorting algorithms with O(N log N) running time.
In fact, Ω(N log N) is a general lower bound for the sorting problem itself.
A proof appears in Weiss.
Informally…
Upper and lower bounds
(Graph: for large N, T(N) lies below c·f(N) and above d·g(N); that is, T(N) = O(f(N)) and T(N) = Ω(g(N)).)
Decision tree for sorting
(Decision tree for sorting three elements a, b, c: the root compares a and b; each internal node compares one pair, and the 3! = 6 leaves are the orderings a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a.)
N! leaves.
So, the tree has height at least log(N!).
log(N!) = Θ(N log N).
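The last step can be checked directly, without Stirling's formula: the upper bound is immediate, and the lower bound keeps only the larger half of the terms:

```latex
\log(N!) = \sum_{i=1}^{N} \log i \le N \log N,
\qquad
\log(N!) \ge \sum_{i=N/2}^{N} \log i \ge \frac{N}{2}\log\frac{N}{2} = \Omega(N \log N)
```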
Summary on sorting bound
If we are restricted to comparisons on pairs of elements, then the general lower bound for sorting is Ω(N log N).
A decision tree is a representation of the possible comparisons required to solve a problem.
External Sorting
External sorting
In many real-world situations, the amount of data to be sorted is much more than can be stored in memory
So, it is important in some cases to use algorithms that work well when sorting data stored externally
See tomorrow’s recitation…
World’s Fastest Sorters
Sorting competitions
There are several world-wide sorting competitions
Unix CoSort has achieved 1GB in under one minute, on a single Alpha
Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations
Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine