Sorting
15-211 Fundamental Data Structures and Algorithms
Klaus Sutner
February 17, 2004
Announcements
Homework 5 is out
Reading:
Chapter 8 in MAW
Quiz 1 available on Thursday
Introduction to Sorting
Boring …
Sorting is admittedly not very sexy, everybody knows some algorithms already, …
But: Good sorting algorithms are needed absolutely everywhere.
Sorting is fairly well understood theoretically.
Provides a good way to introduce some important ideas.
The Problem
We are given a sequence of items
a1 a2 a3 … an-1 an
We want to rearrange them so that they are in non-decreasing order.
More precisely, we need a permutation f such that
af(1) ≤ af(2) ≤ af(3) ≤ … ≤ af(n-1) ≤ af(n).
A Constraint
Comparison Based Sorting
While we are rearranging the items, we will only use queries of the form
ai ≤ aj
or variants thereof (<, > and so forth).
Say What?
The important point here is that the algorithm can only make comparisons such as
if( a[i] < a[j] ) …
We are not allowed to look at pieces of the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare the most significant digits.
An Easy Upper Bound
Here is a simple idea to sort an array: a flip is a position in the array where two adjacent elements are out of order.
a[i] > a[i+1]
Let’s look for a flip and correct it by swapping the two elements.
A Prototype Algorithm
// FlipSort
while( there is a flip )
    pick one, fix it
Is this algorithm guaranteed to terminate?
If so, what can we say about its running time?
Is it correct, i.e., is the array sorted?
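As a concrete sketch, the prototype might look like this in Python (the name flip_sort and the leftmost-flip strategy are mine, not from the slides; the questions above apply to any variant):

```python
def flip_sort(a):
    """Repeatedly find a flip -- an adjacent out-of-order pair -- and fix it."""
    while True:
        for i in range(len(a) - 1):
            if a[i] > a[i + 1]:              # found a flip
                a[i], a[i + 1] = a[i + 1], a[i]
                break                        # restart the scan
        else:                                # no flip found: array is sorted
            return a
```

Termination is not obvious from the code alone; that is exactly what the inversion argument below settles.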
Termination
while( there is a flip )
    pick one, fix it
It’s tempting to do induction on the number of flips, but beware:
10 15 5 10  →  10 5 15 10
Fixing the flip (15, 5) creates a new flip (10, 5), so the number of flips need not decrease.
We need to talk about inversions instead.
Flips and Inversions
24 47 13 99 105 222
The adjacent pair (47, 13) is a flip; the pair (24, 13) is an inversion but not a flip. Every flip is an inversion.
Running Time
The total number of inversions is at most n(n-1)/2, i.e., quadratic.
So we can sort in quadratic time if we can manage to find and fix a flip in constant time.
We need to organize the search somehow.
Probably should try to avoid recomputation.
Naïve sorting algorithms
Bubble Sort
Selection Sort
Insertion Sort (this one is actually important)
Are all quadratic in the worst case and on average.
Bubble Sort
Scan through the array, fix flips as you go along. Repeat until array is sorted.
for( i = 2; i <= n; i++ )
    for( j = n; j >= i; j-- )
        if( A[j-1] > A[j] )
            swap A[j-1] and A[j];
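A runnable Python transliteration of the pseudocode (a sketch; indices are shifted to 0-based, and as in the slides the scan runs from the back so the smallest remaining element bubbles to the front):

```python
def bubble_sort(a):
    """Bubble sort: repeatedly fix flips while scanning from the back."""
    n = len(a)
    for i in range(1, n):
        for j in range(n - 1, i - 1, -1):    # j runs from n-1 down to i
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
    return a
```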
Selection Sort
For k = n, n-1, …, find the smallest element among the last k elements of the array and swap it to the front of that range.
for( i = 1; i <= n-1; i++ )
    find A[j] minimal in A[i..n];
    swap with A[i];
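The same idea as a Python sketch (0-based indices; the selection of the minimum is my own phrasing of "find A[j] minimal in A[i..n]"):

```python
def selection_sort(a):
    """Repeatedly select the minimum of the unsorted suffix and swap it forward."""
    n = len(a)
    for i in range(n - 1):
        m = min(range(i, n), key=a.__getitem__)  # index of the minimum in a[i:]
        a[i], a[m] = a[m], a[i]
    return a
```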
Insertion Sort
Place the ith element into its proper place within the already sorted list of the first i-1 elements.
for i = 2 to n do
order-insert a[i] in a[1:i-1]
Can be implemented nicely.
Insertion Sort
Using a sentinel.
for( i = 2; i <= n; i++ ) {
    x = A[i];
    A[0] = x;    // sentinel: guarantees the inner loop stops
    for( j = i; x < A[j-1]; j-- )
        A[j] = A[j-1];
    A[j] = x;
}
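In Python there is no spare A[0] slot to hold a sentinel, so this sketch uses an explicit bounds check instead; otherwise it mirrors the C version above:

```python
def insertion_sort(a):
    """Insert a[i] into the sorted prefix a[0:i], shifting larger elements right."""
    for i in range(1, len(a)):
        x = a[i]
        j = i
        while j > 0 and x < a[j - 1]:    # bounds check replaces the sentinel
            a[j] = a[j - 1]
            j -= 1
        a[j] = x
    return a
```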
Insertion sort
105 47 13 99 30 222
47 105 13 99 30 222
13 47 105 99 30 222
13 47 99 105 30 222
13 30 47 99 105 222
The sorted sublist at the front grows by one element at each step.
How fast is insertion sort?
Takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with.
3 2 1 6 5 4 9 8 7 …
How long does it take to sort?
Can we do better than O(n²)?
In the worst case?
In the average case?
Sorting in O(n log n)
O(n log n) turns out to be a Magic Wall: it is hard to reach, and exceedingly hard to break through.
In fact, for comparison-based sorting it is impossible to do better than O(n log n) in the worst case.
We already know that Heapsort will give us this bound:
- build the heap in linear time,
- destroy it in O(n log n).
Heapsort in practice
The average-case analysis for heapsort is somewhat complex.
In practice, heapsort consistently tends to use nearly n log n comparisons.
So, while the worst case is better than n², other algorithms sometimes work better.
Shellsort
Shellsort, like insertion sort, is based on swapping inverted pairs.
It achieves O(n^(4/3)) running time for suitable gap sequences.
[See your book for details.]
Shellsort
Example with gap sequence 3, 1.
105 47 13 99 30 222
99 47 13 105 30 222
99 30 13 105 47 222
30 99 13 105 47 222
30 13 99 105 47 222
...
Several inverted pairs fixed in one exchange.
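A hedged Python sketch of Shellsort using the slide's gap sequence 3, 1 (real implementations use longer, carefully chosen sequences; any sequence ending in 1 gives a correct sort, since the final pass is plain insertion sort):

```python
def shellsort(a, gaps=(3, 1)):
    """For each gap g, insertion-sort the interleaved subsequences a[i], a[i+g], ..."""
    for gap in gaps:
        for i in range(gap, len(a)):
            x = a[i]
            j = i
            while j >= gap and x < a[j - gap]:   # compare elements gap apart
                a[j] = a[j - gap]
                j -= gap
            a[j] = x
    return a
```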
Recursive Sorting
Recursive sorting
Intuitively, divide the problem into pieces and then recombine the results.
If array is length 1, then done.
If array is length N>1, then split in half and sort each half.
Then combine the results.
An example of a divide-and-conquer algorithm.
Divide-and-conquer
Why divide-and-conquer works
Suppose the amount of work required to divide and recombine is linear, that is, O(n).
Suppose also that the work to solve a problem outright grows faster than linearly.
Then solving two half-size subproblems costs less than solving the whole problem directly, while the split and recombination require only linear extra work.
Divide-and-conquer is big
We will see several examples of divide-and-conquer in this course.
Recursive Sorting
If array is length 1, then done.
Otherwise, split into two smaller pieces.
Sort each piece.
Combine the sorted pieces.
Two Major Approaches
1. Make the split trivial, but perform some work when the pieces are combined Merge Sort.
2. Work during the split, but then do nothing in the combination step Quick Sort.
In either case, the overhead should be linear with small constants.
Analysis
The analysis is relatively easy if the two pieces have (approximately) the same size.
This is the case for Merge Sort, but not for Quick Sort.
Let’s ignore the second case for the time being.
Recurrence Equations
We need to deal with equations of the form
T(1) = 1
T(n) = 2 T(n/2) + f(n)
Here f(n) is the non-recursive overhead.
There are two recursive calls, each to a sub-instance of the same size n/2.
Of course, there are other cases to consider.
Recurrence Equations
A slight generalization is
T(1) = 1
T(n) = a T(n/b) + f(n)
Here f(n) is again the non-recursive overhead.
There are a recursive calls, each to a sub-instance of the size n/b.
Recurrence Equations
Of course, we’re cheating:
T(1) = 1
T(n) = a T(n/b) + f(n)
Makes no sense unless b divides n.
Let’s just ignore this. In reality there are ceilings and floors and continuity arguments everywhere.
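For the case a = b = 2 and f(n) = cn (the Merge Sort shape), unrolling the recurrence shows where n log n comes from; as suggested, ignore the floors and ceilings and assume n is a power of 2:

```latex
T(n) = 2\,T(n/2) + cn
     = 4\,T(n/4) + 2cn
     = 8\,T(n/8) + 3cn
     \;\;\vdots
     = 2^k\,T(n/2^k) + kcn
```

Setting k = log₂ n gives T(n) = n T(1) + cn log₂ n = O(n log n).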
Mergesort
The Algorithm
Merging the two sorted parts here is responsible for the overhead.
merge( nil, B ) = B;
merge( A, nil ) = A;
merge( a:A, b:B ) =
    if( a <= b )
        prepend( merge( A, b:B ), a )
    else
        prepend( merge( a:A, B ), b )
The Algorithm
The main function.
List MergeSort( List L ) {
    if( length(L) <= 1 ) return L;
    A = first half of L;
    B = second half of L;
    return merge( MergeSort(A), MergeSort(B) );
}
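The same algorithm as a runnable Python sketch (an iterative merge rather than the functional prepend version above, but the logic is identical):

```python
def merge(a, b):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]          # one of the two tails is empty

def merge_sort(lst):
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2                 # halves found by index arithmetic
    return merge(merge_sort(lst[:mid]), merge_sort(lst[mid:]))
```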
Harsh Reality
In reality, the items are always given in an array.
The first and second half can be found by index arithmetic.
But Note …
We cannot perform the merge operation in place.
Rather, we need to have another array as scratch space.
The total space requirement for Merge Sort is
2n + O(log n)
Assuming the recursive implementation.
Running Time
Solving the recurrence equation for Merge Sort one can see that the running time is
O(n log n)
Since Merge Sort reads the data strictly sequentially it is sometimes useful when data reside on slow external media.
But overall it is no match for Quick Sort.
Quicksort
Quicksort
Quicksort was invented in 1960 by Tony Hoare.
Although it has O(N²) worst-case performance, on average it is O(N log N).
More importantly, it is the fastest known comparison-based sorting algorithm in practice.
Quicksort idea
Choose a pivot.
Rearrange so that pivot is in the “right” spot.
Recurse on each half and conquer!
Quicksort algorithm
If array A has 1 (or 0) elements, then done.
Choose a pivot element x from A.
Divide A-{x} into two arrays:
B = { y ∈ A−{x} | y ≤ x }
C = { y ∈ A−{x} | y ≥ x }
Quicksort arrays B and C.
Result is B+{x}+C.
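Written out directly from the B / {x} / C description (a sketch only; real implementations work in place, as the later slides show, and the pivot choice here is arbitrarily the first element):

```python
def quicksort(a):
    if len(a) <= 1:                       # 1 (or 0) elements: done
        return a
    x, rest = a[0], a[1:]                 # pivot choice is unspecified; take the first
    b = [y for y in rest if y <= x]       # B = { y | y <= x }
    c = [y for y in rest if y > x]        # C gets the rest
    return quicksort(b) + [x] + quicksort(c)
```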
Quicksort algorithm
105 47 13 17 30 222 5 19          choose pivot 19
[5 17 13]   19   [47 30 222 105]  partition around the pivot
[5] 13 [17]      [30] 47 [222 105]  recurse: pivots 13 and 47
                      [105] 222     recurse again
Result: 5 13 17 19 30 47 105 222
In practice, insertion sort is used once the subarrays get “small enough”.
Doing quicksort in place
85 24 63 50 17 31 96 45    choose pivot 50 and swap it to the end
85 24 63 45 17 31 96 50    L scans right from 85, R scans left from 96
31 24 63 45 17 85 96 50    swap 85 and 31 (L stopped at ≥ pivot, R at ≤ pivot)
31 24 17 45 63 85 96 50    swap 63 and 17
31 24 17 45 63 85 96 50    L and R have crossed
31 24 17 45 50 85 96 63    swap the pivot into place: done
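A Python sketch of the in-place partition step (pivot stashed at the end, L and R scanning toward each other; my translation of the pictures, not the course's code):

```python
def partition(a, lo, hi, p):
    """Partition a[lo..hi] around a[p]; return the pivot's final index."""
    a[p], a[hi] = a[hi], a[p]            # stash the pivot at the end
    pivot = a[hi]
    l, r = lo, hi - 1
    while True:
        while l <= r and a[l] < pivot:   # L stops at an element >= pivot
            l += 1
        while r >= l and a[r] > pivot:   # R stops at an element <= pivot
            r -= 1
        if l >= r:                       # pointers met or crossed
            break
        a[l], a[r] = a[r], a[l]
        l, r = l + 1, r - 1
    a[l], a[hi] = a[hi], a[l]            # swap the pivot into place
    return l
```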
Quicksort is fast but hard to do
Quicksort, in the early 1960’s, was famous for being incorrectly implemented many times.
More about invariants next time.
Quicksort is very fast in practice.
Faster than mergesort because Quicksort can be done “in place”.
Informal analysis
If there are duplicate elements, the algorithm does not specify which subarray, B or C, should get them. Ideally, they split down the middle.
Also, it is not specified how to choose the pivot. Ideally it would be the median value of the array, but that is expensive to compute.
As a result, it is possible that Quicksort will show O(N²) behavior.
Worst-case behavior
105 47 13 17 30 222 5 19     pivot 5: the left subarray is empty
47 13 17 30 222 19 105       pivot 13: empty again
47 17 30 222 19 105          pivot 17
47 30 222 19 105             and so on …
Each call strips off only the pivot, so there are N levels of linear work.
Analysis of quicksort
Assume random pivot.
T(0) = 1
T(1) = 1
T(N) = T(i) + T(N-i-1) + cN, for N > 1
where i is the size of the left subarray.
Worst-case analysis
If the pivot is always the smallest element, then:
T(0) = 1
T(1) = 1
T(N) = T(0) + T(N-1) + cN ≈ T(N-1) + cN, for N > 1
which solves to O(N²).
See the book for details on this solution.
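The worst case is easy to exhibit empirically. This instrumented sketch (my code, counting one comparison per element examined at each partition) always picks the first element as pivot, so an already-sorted input triggers the quadratic behavior:

```python
def quicksort_count(a, count):
    """Quicksort with first-element pivot; count[0] accumulates comparisons."""
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    count[0] += len(rest)                # one comparison per remaining element
    left = [y for y in rest if y <= pivot]
    right = [y for y in rest if y > pivot]
    return quicksort_count(left, count) + [pivot] + quicksort_count(right, count)

c = [0]
quicksort_count(list(range(16)), c)      # sorted input, N = 16
# c[0] is now 15 + 14 + ... + 1 = N(N-1)/2 = 120 comparisons
```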
Best-case analysis
In the best case, the pivot is always the median element.
In that case, the splits are always “down the middle”.
Hence, same behavior as mergesort.
That is, O(N log N).
Average-case analysis
Consider the quicksort tree:
105 47 13 17 30 222 5 19          pivot 19
[5 17 13]   19   [47 30 222 105]
[5] 13 [17]      [30] 47 [222 105]
                      [105] 222
Average-case analysis
The time spent at each level of the tree is O(N).
So, on average, how many levels? That is, what is the expected height of the tree?
If on average there are O(log N) levels, then quicksort is O(N log N) on average.
Average-case analysis
We’ll answer this question next time…
Summary of quicksort
A fast sorting algorithm in practice.
Can be implemented in-place.
But is O(N²) in the worst case.
Average-case performance?