1 CSC 2053 1 MERGESORT –Radix and Bin Sort - Csc 2053 SORTING



Stable vs. Non-Stable Sorts

We frequently use sorting methods for items with multiple keys

Sometimes we need to sort the items on more than one key

– For instance, we may want to sort a list of people by last name and then by age

So, among people of the same age, Black (30) should appear before Jones (30)


Stable vs. Non-Stable Sorts

If we sort a list on the first key (name) and then sort it on the second key (age), how can we guarantee that items with the same age are still ordered on the first key?

Definition:

A sorting method is said to be stable if it preserves the relative order of items with duplicate keys in the list


An Example of a Stable Sort (adapted from Algorithms by R. Sedgewick)

Original list       Sorted by name      Then sorted by age (stable)
Adams (30)          Adams (30)          Black (23)
Washington (23)     Black (23)          Jackson (23)
Wilson (50)         Brown (40)          Washington (23)
Black (23)          Jackson (23)        Adams (30)
Brown (40)          Jones (50)          Smith (30)
Smith (30)          Smith (30)          Brown (40)
Thompson (40)       Thompson (40)       Thompson (40)
Jackson (23)        Washington (23)     Jones (50)
White (50)          White (50)          White (50)
Jones (50)          Wilson (50)         Wilson (50)
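The two passes in this example can be reproduced with the C++ standard library. This is a minimal sketch: `Person` and `sortByNameThenAge` are illustrative names, not part of the original material. `std::sort` orders the records by name (the names in this example are all distinct, so stability does not matter in that pass), and `std::stable_sort` then orders them by age while keeping equal ages in name order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

// Sort by name, then stably by age: people of equal age stay in name order.
std::vector<Person> sortByNameThenAge(std::vector<Person> people) {
    // First key: name. Any sort will do because every name here is distinct.
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.name < b.name; });
    // Second key: age. std::stable_sort keeps equal ages in their current (name) order.
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) { return a.age < b.age; });
    return people;
}
```

Replacing `std::stable_sort` by `std::sort` in the second pass gives no such guarantee, because `std::sort` is not required to be stable.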


Stable vs. Non-Stable Sorts

Mergesort is relatively easy to make stable

– Just make sure the merge function is stable: when two keys are equal, take the one from the left half first

Heapsort sorts in O(n log n) but it is not stable

Quicksort is also not stable

Exercise: You should experiment with all the main sorting algorithms to understand which ones are stable and which ones are not.
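One way to run this experiment, sketched under the assumption that each record carries its original position as an extra `id` field: sort, then check that records with equal keys still have increasing ids. Selection sort and insertion sort are used here only as illustrations; `Rec`, `selectionSort`, `insertionSort` and `keptOrder` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Rec {
    int key;
    int id;   // original position, used only to check stability
};

// Straight selection sort: a long-distance swap can carry an item past
// another item with the same key, so it is not stable.
void selectionSort(std::vector<Rec>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t smallest = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j].key < a[smallest].key) smallest = j;
        std::swap(a[i], a[smallest]);
    }
}

// Insertion sort: an item only moves past strictly larger keys, so it is stable.
void insertionSort(std::vector<Rec>& a) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        Rec x = a[i];
        std::size_t j = i;
        while (j > 0 && a[j - 1].key > x.key) {
            a[j] = a[j - 1];
            --j;
        }
        a[j] = x;
    }
}

// True if every run of equal keys still has increasing original ids.
bool keptOrder(const std::vector<Rec>& a) {
    for (std::size_t i = 1; i < a.size(); ++i)
        if (a[i - 1].key == a[i].key && a[i - 1].id > a[i].id) return false;
    return true;
}
```

On the input (2, 0), (2, 1), (1, 2), selection sort's first swap moves (2, 0) past (2, 1), so the check fails, while insertion sort passes it.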


Mergesort

We saw that Quicksort is based on the idea of selecting an element (the pivot), dividing the list into two parts around it, and then sorting the parts separately

The complementary process is called merging

– Given two ordered lists, combine them into one larger ordered list
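As a small illustration of merging, assuming the lists are held in `std::vector<int>`: the standard algorithm `std::merge` combines two sorted ranges into one sorted output, and it is stable, so for equal keys the elements of the first range come first. `mergeOrdered` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Combine two ordered lists into one larger ordered list.
std::vector<int> mergeOrdered(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    // std::merge walks both ranges once, always copying the smaller front element.
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

For example, merging 2 4 9 10 with 3 4 11 12 produces 2 3 4 4 9 10 11 12.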


MERGESORT

Selection and merging are complementary because

– Selection divides a list into two independent lists

– Merging joins two independent lists into one larger list

Mergesort consists of two recursive calls and one merging procedure


Mergesort

The desirable features of Mergesort

– It performs in O(n log n) in the worst case

– It is stable

– It is quite independent of the way the initial list is organized

– Good for linked lists: it can be implemented in such a way that data is accessed sequentially
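A minimal sketch of why mergesort suits linked lists, assuming a hypothetical singly linked `Node` type: the list is split at its middle by walking it, and the merge relinks nodes while only ever moving forward, so no auxiliary array is needed and every access is sequential. `Node`, `mergeLists` and `mergesortList` are illustrative names, not the course's code.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* next;
};

// Merge two sorted lists by relinking nodes. On equal keys the node from
// `a` (the left list) goes first, which keeps the sort stable.
Node* mergeLists(Node* a, Node* b) {
    Node head{0, nullptr};   // dummy head makes appending uniform
    Node* tail = &head;
    while (a != nullptr && b != nullptr) {
        if (b->key < a->key) { tail->next = b; b = b->next; }
        else                 { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;   // append whatever is left
    return head.next;
}

// Sort a singly linked list using only forward traversal.
Node* mergesortList(Node* list) {
    if (list == nullptr || list->next == nullptr) return list;
    // Find the middle with a slow and a fast pointer, then cut the list in two.
    Node* slow = list;
    Node* fast = list->next;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;
        fast = fast->next->next;
    }
    Node* right = slow->next;
    slow->next = nullptr;
    return mergeLists(mergesortList(list), mergesortList(right));
}
```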


Mergesort

Drawbacks

– It may require an auxiliary array of up to the size of the original list

This can be avoided, but the algorithm becomes significantly more complicated, which is usually not worth it

Instead, when extra memory is a concern, we can use heapsort, which is also O(n log n) (although not stable)


Understanding the Algorithm

1. Calculate the index of the middle of the list; call it m

2. Use recursion to sort the two partitions [first,m] and [m+1,last]

3. Merge the two ordered lists into one large list


Mergesort

void merge(int list[], int first, int m, int last);   // merges two sorted lists (defined below)

void mergesort(int list[], int first, int last)
{
   // PRE: list is an array &&
   //      the portion to be sorted runs from first to last inclusive
   if (first >= last)             // Nothing to sort
      return;
   int m = (first + last) / 2;    // calculate the middle of the list

   // Recursively sort the two partitions
   mergesort(list, first, m);
   mergesort(list, m + 1, last);

   merge(list, first, m, last);   // merges two sorted lists

   // POST: list is sorted in ascending order between first and last
}


Understanding the merge function

We know that we can easily merge two arrays into a third one. Let us try to improve on this by using only two arrays.

Given an array organized so that list[first..m] holds one sorted half and list[m+1..last] holds the other sorted half, we can merge the two halves "in place", back into the same array:

– merge(list, first, m, last)

We'll see that an auxiliary array is still required, but only one: both halves are copied into it and the merged result is written straight back into list, instead of keeping the two halves and the result in three separate arrays.


Understanding the merge

When the merge function is called we have the following scenario: a list divided into two sections, each in ascending order. Copy both sections into a second (auxiliary) array, as below.

To make the algorithm simpler, reverse the second half of the list, storing it in the auxiliary array in descending order. The largest element of each half then sits at the boundary between them, so when one half is used up its index simply runs into the other half's largest elements, and the merge loop needs no test for either half running out.

list:   first .. m in ascending order   |   m+1 .. last in ascending order
aux:    first .. m in ascending order   |   m+1 .. last in descending order

Tracing the merge

Example: list = 2 4 9 10 3 4 11 12, with first = 0, m = 3 and last = 7, so list[0..3] and list[4..7] are each in ascending order.

Step 1 – Create an auxiliary array aux. Set i to m (the middle) and copy list[i] into aux[i], decrementing i, until the whole left half has been copied in ascending order:

   aux = 2 4 9 10 . . . .

Step 2 – Set j to last and copy the right half into the rest of aux backwards, from list[7] = 12 down to list[4] = 3, so that it is stored in descending order. Stop when j reaches m:

   aux = 2 4 9 10 12 11 4 3

Step 3 – Now merge the two parts back into list. Set i to first (0), j to last (7) and k to first (0). At each step compare aux[i] with aux[j], store the smaller in list[k] and increment k; taking aux[i] increments i, taking aux[j] decrements j.

    k   i   j   aux[i]  aux[j]   stored          list afterwards
    0   0   7     2       3       2  (i++)        2 4 9 10 3 4 11 12
    1   1   7     4       3       3  (j--)        2 3 9 10 3 4 11 12
    2   1   6     4       4       4  (i++, tie)   2 3 4 10 3 4 11 12
    3   2   6     9       4       4  (j--)        2 3 4 4 3 4 11 12
    4   2   5     9      11       9  (i++)        2 3 4 4 9 4 11 12
    5   3   5    10      11      10  (i++)        2 3 4 4 9 10 11 12
    6   4   5    12      11      11  (j--)        2 3 4 4 9 10 11 12
    7   4   4    12      12      12  (i++)        2 3 4 4 9 10 11 12

At k = 2 the keys are the same, so the first one, aux[i] from the left half, is chosen: this is what keeps a merge stable. Note, however, that once the left half is used up (from k = 6 on, i points into the reversed right half), equal keys from the right half would be taken from both ends and could come out in reverse order, so this reversed-copy merge is not stable in general.

After k passes last, i and j have crossed (i = 5, j = 4), so the method ends and the main list is sorted in ascending order: 2 3 4 4 9 10 11 12.


merge

void merge(int list[], int first, int m, int last)
{
   // MAXSIZE is assumed to be a compile-time constant defined elsewhere, >= last + 1
   int i, j;
   int aux[MAXSIZE];

   // Copy the left half list[first..m] into aux in ascending order (leaves i == first)
   for (i = m + 1; i > first; i--)
      aux[i - 1] = list[i - 1];
   // Copy the right half list[m+1..last] into aux in descending order (leaves j == last)
   for (j = m; j < last; j++)
      aux[last + m - j] = list[j + 1];

   // ASSERT: aux has the left half in ascending order and the right half in descending order
   // Re-assemble list[first..last] in sorted order, taking from both ends of aux
   for (int k = first; k <= last; k++)
      if (aux[j] < aux[i])
         list[k] = aux[j--];
      else
         list[k] = aux[i++];
}
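For comparison, here is a sketch of a conventional merge that tests each half's bound explicitly instead of reversing the right half. It assumes `std::vector<int>` storage; `stableMerge` and `stableMergesort` are hypothetical names, and the first/last/m convention mirrors mergesort above. Because it takes from the right half only when that element is strictly smaller, equal keys keep their original left-before-right order.

```cpp
#include <cassert>
#include <vector>

// Merge the sorted runs list[first..m] and list[m+1..last] (inclusive bounds).
void stableMerge(std::vector<int>& list, int first, int m, int last) {
    // Copy both runs into a temporary buffer; aux[0..mid-1] is the left run.
    std::vector<int> aux(list.begin() + first, list.begin() + last + 1);
    const int mid = m - first + 1;
    const int end = last - first + 1;
    int i = 0;     // next unused element of the left run
    int j = mid;   // next unused element of the right run
    for (int k = first; k <= last; ++k) {
        // Take from the left unless it is used up or the right element is
        // strictly smaller; on equal keys the left one goes first (stable).
        if (i < mid && (j >= end || aux[i] <= aux[j]))
            list[k] = aux[i++];
        else
            list[k] = aux[j++];
    }
}

// Sort list[first..last] inclusive.
void stableMergesort(std::vector<int>& list, int first, int last) {
    if (first >= last) return;                 // nothing to sort
    int m = first + (last - first) / 2;        // middle, written to avoid overflow
    stableMergesort(list, first, m);
    stableMergesort(list, m + 1, last);
    stableMerge(list, first, m, last);
}
```

The temporary buffer is allocated on each call for simplicity; a tuned version would allocate one buffer of n elements up front and pass it down.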


On the performance of Mergesort

Unlike quicksort, mergesort guarantees O(n log n) in the worst case

– The reason for this is that quicksort's split depends on the value of the pivot, whereas mergesort always splits the list in half by index

Why is it O(n log n)?

– Merging two sorted lists holding n elements in total requires at most n comparisons
– The list is halved at each level of recursion
– So the standard divide-and-conquer recurrence, T(n) = 2T(n/2) + n, applies to mergesort
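The mergesort recurrence T(n) = 2T(n/2) + n (two half-size sorts plus a linear merge) can be unrolled as a worked check, assuming n is a power of 2 and that a one-element list costs nothing, T(1) = 0:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + n \\
     &= 4\,T(n/4) + 2n \\
     &= 8\,T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^{j}\,T\!\left(n/2^{j}\right) + j\,n .
\end{aligned}
```

Stopping at j = log_2 n, where n/2^j = 1, gives T(n) = n T(1) + n log_2 n = n log_2 n, that is, O(n log n).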


Lecture Sort - Key Points

Quicksort

– Use for good overall performance where time is not a constraint

Heap Sort

– Slower than quick sort, but guaranteed O(n log n)

– Use for real-time systems where time is critical


Radixsort

In many applications the key used in the ordering is not as simple as we would expect

– Keys in phone books

– The key in a library catalog

– ZIP codes

So far we have not considered the subtleties of dealing with complex keys


Radix Sort

Also, in several applications it is not necessary to consider the whole key

– How many letters of a person's name do we compare to find this person in the phone book?

Radix sort algorithms try to gain the same kind of efficiency by decomposing keys into pieces and comparing the pieces


Radixsort

So the main idea is to treat numbers as being represented in some base (the radix) and to work with the individual digits of the numbers

– We could represent a number in binary and work with the individual bits

– We can consider the number in decimal and work with the individual digits

– We can consider strings as sequences of characters and work with the individual characters

– etc.


Radixsort

Radix sort is used by several applications that deal with

– Telephone numbers
– Social Security numbers
– ZIP codes

Consider the example with ZIP codes

– Letters can be divided into 10 boxes: ZIP codes starting with 0 go to box 0, ZIP codes starting with 1 go to box 1, and so on
– After the ZIP codes are separated into boxes, each of the boxes can be sorted further by considering the second digit
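The ZIP-code procedure can be sketched as a recursive MSD radix sort, assuming the codes are stored as equal-length strings of decimal digits (real data would need validating first). `msdRadixSort` is a hypothetical name; each call distributes the keys into 10 boxes by one digit and then sorts each box on the next digit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// MSD radix sort for equal-length strings of decimal digits (e.g. ZIP codes).
// Assumes every character is '0'..'9' and all keys have the same length.
void msdRadixSort(std::vector<std::string>& keys, std::size_t d = 0) {
    if (keys.size() <= 1 || d >= keys[0].size()) return;   // nothing left to split
    std::vector<std::vector<std::string>> box(10);          // one box per digit value
    for (const std::string& k : keys)
        box[k[d] - '0'].push_back(k);                       // distribute on digit d
    keys.clear();
    for (std::vector<std::string>& b : box) {
        msdRadixSort(b, d + 1);                             // sort the box on the next digit
        keys.insert(keys.end(), b.begin(), b.end());        // read boxes back in order
    }
}
```

Boxes holding zero or one key stop the recursion immediately, so digits are examined only where keys still need to be told apart.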


An example (MSD)

0 2 1 0 4

3 2 8 3 2

2 2 5 1 3

5 5 3 1 6

0 0 0 1 7

1 8 2 2 8

6 6 5 8 8

3 6 6 9 4

3 8 3 0 6

1 3 4 4 5

5 4 4 4 8

2 6 4 7 5

0 2 9 4 1

An example (MSD)

Sorted on first digit (each key goes to the box for its first digit; keys keep their original order within a box):

box 0: 02104 00017 02941
box 1: 18228 13445
box 2: 22513 26475
box 3: 32832 36694 38306
box 5: 55316 54448
box 6: 66588
(boxes 4, 7, 8 and 9 are empty)

Sorted on first digit and second digit (each box is split again on the second digit):

00: 00017
02: 02104 02941
13: 13445
18: 18228
22: 22513
26: 26475
32: 32832
36: 36694
38: 38306
54: 54448
55: 55316
66: 66588

Only box 02 still holds more than one key, so only those two keys need their third digit examined; every other key is placed after looking at two digits.


Types of Radixsort

Because radixsort deals with individual digits of a number, we have a choice of the order in which to examine the digits

– From left to right
– From right to left

The methods that use the first order are called MSD (Most-Significant-Digit) Radix Sorts

The methods that use the second order are called LSD (Least-Significant-Digit) Radix Sorts

MSD sorts are normally used more frequently because they examine only as much of each key as is needed to get the job done


Which radix should we use?

This depends on the size of the keys.

For smaller keys, simply extracting the (decimal) digits of the keys can do the job

For large keys it may be a better idea to use the binary representation of the key.

– Computers represent data with binary numbers

– Most modern languages allow us to deal with the binary representation of variables
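A sketch of working with the binary representation, assuming 32-bit unsigned keys: an LSD radix sort that treats each key as four 8-bit digits (radix 256) and makes one stable counting-sort pass per digit, least significant byte first. `lsdRadixSort` is a hypothetical name, and counting sort is just one standard choice for the per-digit pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on the binary representation of 32-bit unsigned keys,
// one byte (radix 256) per pass, least significant byte first.
void lsdRadixSort(std::vector<std::uint32_t>& a) {
    std::vector<std::uint32_t> tmp(a.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // count[b + 1] = how many keys have byte value b in this position
        std::vector<std::size_t> count(257, 0);
        for (std::uint32_t x : a)
            ++count[((x >> shift) & 0xFFu) + 1];
        // prefix sums: count[b] = first output slot for byte value b
        for (int b = 0; b < 256; ++b)
            count[b + 1] += count[b];
        // scatter in input order, so each pass is stable
        for (std::uint32_t x : a)
            tmp[count[(x >> shift) & 0xFFu]++] = x;
        a.swap(tmp);
    }
}
```

The bit shifts and masks are the "binary representation" access mentioned above; using 8-bit digits keeps the number of passes at k = 4 instead of 32.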


Sorting - Bin Sort

Assume

– All the keys lie in a small, fixed range, e.g.
   – integers 0-99
   – characters ‘A’-‘z’, ‘0’-‘9’

– There is at most one item with each value of the key

Bin sort

– Allocate a bin for each value of the key (usually an entry in an array)

– For each item: extract the key, compute its bin number, place the item in the bin

– Finished!
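A minimal sketch of this bin sort, assuming integer keys in 0..m-1 with no duplicates and a hypothetical `Item` record: the key is used directly as the bin number, and reading the bins back in index order produces the sorted output.

```cpp
#include <cassert>
#include <vector>

struct Item {
    int key;       // assumed to lie in 0 .. m-1, with no duplicates
    char payload;  // hypothetical data carried with the key
};

// Bin sort: one bin (array entry) per key value.
std::vector<Item> binSort(const std::vector<Item>& items, int m) {
    std::vector<Item> bin(m);            // allocate a bin per key value: O(m)
    std::vector<bool> used(m, false);
    for (const Item& it : items) {       // for each item (n times)...
        bin[it.key] = it;                // ...its key is its bin number: O(1)
        used[it.key] = true;
    }
    std::vector<Item> out;
    for (int b = 0; b < m; ++b)          // read the occupied bins in key order: O(m)
        if (used[b]) out.push_back(bin[b]);
    return out;
}
```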


Sorting - Bin Sort: Analysis

– All the keys lie in a small, fixed range: there are m possible key values

– There is at most one item with each value of the key

Bin sort

– Allocate a bin for each value of the key (usually an entry in an array): O(m)

– For each item (n times):
   – Extract the key: O(1)
   – Compute its bin number: O(1)
   – Place it in the bin: O(1)
   giving n × O(1) = O(n)

Result: O(n) + O(m) = O(n + m) = O(n) if n >> m

Key condition: the keys must lie in a small range, so that n >> m


Sorting - Bin Sort: Caveat

Key Range

– All the keys lie in a small, fixed range: there are m possible key values

– If this condition is not met, e.g. m >> n, then bin sort is O(m)

Example

– Key is a 32-bit integer, m = 2^32

– Clearly, this isn't a good way to sort a few thousand integers

– Also, we may not have enough space for bins!

Bin sort trades space for speed!

– There's no free lunch!


Sorting - Bin Sort with duplicates

– Relax the condition "There is at most one item with each value of the key"

Bin sort

– Allocate a bin for each value of the key: O(m)
   (usually an entry in an array: an array of list heads)

– For each item (n times):
   – Extract the key: O(1)
   – Compute its bin number: O(1)
   – Add it to a list: O(1)
   giving O(n)

– Join the lists: O(m)

– Finished! O(n) + O(m) = O(n + m) = O(n) if n >> m
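With duplicates allowed, each bin becomes a list. A sketch using `std::list` as the list type (an implementation choice, not from the slides) and a hypothetical `Record` type: items are appended to the end of their bin, which keeps equal keys in input order, and `std::list::splice` joins each bin onto the output in constant time, so the join costs O(m).

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Record {
    int key;           // assumed to lie in 0 .. m-1; duplicates allowed
    std::string name;  // data carried along with the key
};

// Bin sort with duplicates: an array of list heads, one list per key value.
std::list<Record> binSortDup(const std::vector<Record>& items, int m) {
    std::vector<std::list<Record>> bins(m);   // allocate m empty bins: O(m)
    for (const Record& r : items)
        bins[r.key].push_back(r);             // add to the end of its list: O(1) each
    std::list<Record> out;
    for (std::list<Record>& bin : bins)
        out.splice(out.end(), bin);           // join the lists: O(1) per bin, O(m) total
    return out;
}
```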

Sorting - Generalised Bin Sort

Radix sort - Bin sort in phases

Example: 36 9 0 25 1 49 64 16 81 4

• Phase 1 - Sort by least significant digit (bins 0-9)

   bin 0: 0
   bin 1: 1 81
   bin 4: 64 4
   bin 5: 25
   bin 6: 36 16
   bin 9: 9 49
   (bins 2, 3, 7 and 8 are empty)

   Concatenated: 0 1 81 64 4 25 36 16 9 49

• Phase 2 - Sort by most significant digit, taking the items in the order produced by phase 1

   bin 0: 0 1 4 9
   bin 1: 16
   bin 2: 25
   bin 3: 36
   bin 4: 49
   bin 6: 64
   bin 8: 81
   (bins 5, 7 and 9 are empty)

   Concatenated: 0 1 4 9 16 25 36 49 64 81

Be careful to add each item after anything already in its bin! This keeps each phase stable, so the order established in phase 1 survives inside each phase 2 bin.

The 0 bin holds the values 0-9, so it had to be quite large!

How much space is needed in each phase? n items, m bins
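The two phases above can be sketched in C++ for values 0..99, assuming ten bins per phase held as a `std::vector<std::vector<int>>`; `twoPhaseRadix` is a hypothetical name. Appending with `push_back` is what adding "after anything already in the bin" means in code.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

// Two-phase radix sort (bin sort in phases) for values 0..99, base 10.
std::vector<int> twoPhaseRadix(std::vector<int> a) {
    for (int divisor : {1, 10}) {                 // phase 1: units digit, phase 2: tens digit
        std::vector<std::vector<int>> bins(10);
        for (int x : a)
            bins[(x / divisor) % 10].push_back(x); // append keeps each phase stable
        a.clear();
        for (const std::vector<int>& bin : bins)   // concatenate bins 0..9
            a.insert(a.end(), bin.begin(), bin.end());
    }
    return a;
}
```

Running it on 36 9 0 25 1 49 64 16 81 4 passes through 0 1 81 64 4 25 36 16 9 49 after phase 1 and ends with 0 1 4 9 16 25 36 49 64 81.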


Sorting - Generalised Bin Sort

Radix sort - Analysis

• Phase 1 - Sort by least significant digit
   • Create m bins O(m)
   • Allocate n items O(n)

• Phase 2
   • Create m bins O(m)
   • Allocate n items O(n)

• Final
   • Link m bins O(m)

• All steps in sequence, so add
   • Total O(3m + 2n) = O(m + n) = O(n) for m << n


Sorting - Radix Sort - Analysis

Radix sort - General

• Base (or radix) in each phase can be anything suitable

• Integers
   • Base 10, 16, 100, …
   • Bases don't have to be the same

• Still O(n) if n >> si for all i

class date {
   int day;    /* 1 .. 31 */
   int month;  /* 1 .. 12 */
   int year;   /* 0 .. 99 */
};

Phase 1 - s1 = 31 bins (day)

Phase 2 - s2 = 12 bins (month)

Phase 3 - s3 = 100 bins (year)
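A sketch of the three-phase date sort, using the fields of the date class above with its members made public so the sketch can read them (in the slide's class they are private by default). `binPhase` and `sortDates` are hypothetical names; each phase is a stable bin sort with its own number of bins, least significant field first.

```cpp
#include <cassert>
#include <vector>

class date {
public:
    int day;    /* 1 .. 31 */
    int month;  /* 1 .. 12 */
    int year;   /* 0 .. 99 */
};

// One stable bin-sort phase with `bins` bins; binOf maps a date to its bin.
template <typename BinFn>
void binPhase(std::vector<date>& a, int bins, BinFn binOf) {
    std::vector<std::vector<date>> bin(bins);
    for (const date& d : a) bin[binOf(d)].push_back(d);   // append keeps it stable
    a.clear();
    for (const std::vector<date>& b : bin) a.insert(a.end(), b.begin(), b.end());
}

// Radix sort with a different base per phase.
void sortDates(std::vector<date>& a) {
    binPhase(a, 31, [](const date& d) { return d.day - 1; });    // s1 = 31 bins
    binPhase(a, 12, [](const date& d) { return d.month - 1; });  // s2 = 12 bins
    binPhase(a, 100, [](const date& d) { return d.year; });      // s3 = 100 bins
}
```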


Performance of Radixsort

For sorting n records whose keys have k digits, the running time of radixsort is proportional to nk, which is O(n) when k is a constant

– This is because the algorithm makes k passes (k constant) over all n keys

Clearly this performance depends on the method used to sort the elements on each digit: each pass must take linear time (e.g. a bin sort), and for an LSD radix sort it must also be stable.

Radix Sort - Analysis

Generalised Radix Sort Algorithm

radixsort( A, n ) {
   for(i=0;i<k;i++) {                            // for each of the k radices
      for(j=0;j<s[i];j++)                        // O(si): clear the si bins
         bin[j] = EMPTY;                         //   for the ith radix
      for(j=0;j<n;j++) {                         // O(n): move element A[j] to the end
         move A[j] to the end of bin[A[j]->fi]   //   of the bin addressed by the
      }                                          //   ith field of A[j]
      for(j=0;j<s[i];j++)                        // O(si): concatenate the si bins
         concat bin[j] onto the end of A;        //   into one list again
   }
}


Sorting - Better than O(n log n)?

If all we know about the keys is an ordering rule

– No! (a sort that only compares keys needs on the order of n log n comparisons in the worst case)

However,

– If we can compute an address from the key (in constant time), then

– bin sort algorithms can provide better performance


Performance of Radixsort

For large values of n the running time of radixsort is comparable to O(n log n)

– If we use the binary representation of the keys and sort 1 million 32-bit keys one bit at a time, then k = 32 and log n ≈ 20, so kn is of the same order as n log n
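A rough check of that comparison, taking logarithms base 2 and counting one unit of work per key per pass (a simplification that ignores constant factors):

```latex
kn = 32 \times 10^{6} = 3.2 \times 10^{7},
\qquad
n \log_2 n = 10^{6} \times \log_2 10^{6} \approx 10^{6} \times 19.93 \approx 2.0 \times 10^{7}
```

The two counts differ by a factor of only about 1.6, which is why neither method wins clearly on this size of input.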


Radix Sort - Analysis

Total

– k iterations, 2si + n for each one

– In general, if the keys are in [0, b^k - 1], they are k-digit base-b numbers and si = b for every phase, so the total is k(2b + n)

– As long as k (and b) is constant:

Complexity O(kn + kb) = O(n)


Radix Sort - Analysis

Any set of keys can be mapped to [0, b^k - 1]

So can we always obtain O(n) sorting?

If k is constant, yes


Radix Sort - Analysis

– But, if k is allowed to increase with n
   e.g. it takes log_b n base-b digits to represent n different keys

– Radix sort is no better than quicksort:
   k(2b + n) = log_b n (2b + n) = O(n log n + 2b log n) = O(n log n)


Radix Sort - Analysis

• Radix sort is no better than quicksort

• Another way of looking at this:
   • We can keep k constant as n increases if we allow duplicate keys
      • keys are in [0, b^k), b^k < n
   • but if the keys must be unique, then k must increase with n

• For O(n) performance, the keys must lie in a restricted range


Radix Sort - Realities

Radix sort uses a lot of memory

– n × si locations for each phase

– In practice, this makes it difficult to achieve O(n) performance

– Cost of memory management outweighs benefits


Lecture 9 - Key Points

Bin Sorts

– If a function exists which can convert the key to an address (i.e. a small integer) and the number of addresses (= number of bins) is not too large, then we can obtain O(n) sorting

   … but remember it's actually O(n + m)

– Number of bins, m, must be constant and small


Bin or Bucket Sort Analysis

Bucket sorts work well for data sets where the possible key values are known and relatively small and there are on average just a few elements per bucket.

This means the cost of sorting the contents of each bucket can be reduced toward zero.

The ideal result is if the order in each bucket is uninteresting or trivial, for instance, when each bucket holds a single key.

The buckets may be arranged so the concatenation phase is not needed, for instance, the buckets are contiguous parts of an array.


Sorting

We now know several sorting algorithms

– Insertion O(n^2)
– Heap O(n log n) Guaranteed
– Quick O(n log n) Most of the time!

Can we do any better?