33
Chapter 9: Medians and Order Statistics

Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Chapter 9: Medians and Order Statistics

Page 2: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

About this lecture

• Finding max, min in an unsorted array (upper bound and lower bound)

• Finding both max and min (upper bound)

• Selecting the kth smallest element

kth smallest element kth order statistics

2014/11/24 2

Page 3: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Maximumin unsorted array

Page 4: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Maximum (Method I)

• Let S denote the input set of n items • To find the maximum of S, we can:

Step 1: Set max = item 1Step 2: for k = 2, 3, …, n

if (item k is larger than max) Update max = item k;

Step 3: return max;

# comparisons = n – 12014/11/24 4

Page 5: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Maximum (Method II)

• Define a function Find-Max as follows:Find-Max(R, k) /* R is a set with k items */

1. if (k 2) return maximum of R;

2. Partition items of R into pairs;3. Delete smaller item from R in each pair;4. return Find-Max(R, k - );

Calling Find-Max(S, n) gives the maximum of S

2/k

2/k

2014/11/24 5

Page 6: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Maximum (Method II)

• Let T(n) = # comparisons for Find-Max with problem size n

• So, T(n) = T(n - n/2) + n/2 for n 3T(2) = 1

• Solving the recurrence (by substitution), we get T(n) = n - 1

2014/11/24 6

Page 7: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Lower Bound

• Question: Can we find the maximum using fewer than n – 1 comparisons?

• Answer: No ! Every element except the winner must drop at least one match

• So, we need to ensure n-1 items not max at least n – 1 comparisons are needed

2014/11/24 7

Page 8: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Minin unsorted array

Page 9: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Min

• Can we find both max and min quickly?

• Solution 1: First, find max with n – 1 comparisonsThen, find min with n – 1 comparisons Total = 2n – 2 comparisons

Is there a better solution ??

2014/11/24 9

Page 10: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Min• Better Solution: (Case 1: if n is even)

First, partition items into n/2 pairs;

Next, compare items within each pair;

= larger = smaller2014/11/24 10

Page 11: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Min

• Then, max = Find-Max in larger itemsmin = Find-Min in smaller items

Find-Max Find-Min

# comparisons = 3n/2 – 22014/11/24 11

Page 12: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Min

• Better Solution: (Case 2: if n is odd)

• We find max and min of first n - 1 items;if (last item is larger than max)

Update max = last item;if (last item is smaller than min )

Update min = last item;

# comparisons = 3(n-1)/22014/11/24 12

Page 13: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Finding Both Max and Min

• Conclusion: • To find both max and min:

if n is odd: 3(n-1)/2 comparisonsif n is even: 3n/2 – 2 comparisons

• Combining: at most 3n/2 comparisons better than finding max and min

separately

2014/11/24 13

Page 14: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Selecting kth smallest itemin unsorted array

Page 15: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Selection in Expected Linear Time

Randomized-Select(A, p, r, i)1. if p==r return A[p]2. q = Randomized-Partition(A, p, r)3. k = q – p + 14. if i ==k //the pivot value is the answer

return A[q]5. else if i < k

return Randomized-Partition(A, p, q-1, i)6. else return Randomized-Partition(A, q+1, r, i-k)2014/11/24 15

Page 16: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time (1)

• Worst case: T(n) = O(n) + T(n–1) = O(n2)• Average case:• E[T(n)] = O(n) + 1/n ∑1≤k≤n E[T(max(k-1, n-k))]

= O(n) + 2/n ∑n/2≤k≤n-1 E[T(k)] (why?)• We can prove E[T(n)] ≤ cn using

mathematic induction method.• Basis: T(n) = O(1) for n less some

constant

2014/11/24 16

Page 17: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time (2)

• Induction Step:• Assume E[T(n)] ≤ K hold• We need to prove n = K + 1 hold

2014/11/24 17

• E 푇 푛 ≤ ∑ 푐푘 + 푎푛

• = ∑ 푘 −∑ 푘 + 푎푛

• = ( ) −( ) /

+ 푎푛

Page 18: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time (3)

2014/11/24 18

≤2푐푛

푛 − 1 푛2 −

푛2 − 2 푛

2 − 12 + 푎푛

=2푐푛

푛 − 푛2 −

푛4 − 3푛

2 + 22 + 푎푛

=푐푛

3푛4 +

푛2 − 2 + 푎푛

= c3푛4 +

12 −

2푛 + 푎푛

Page 19: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time (4)

≤3푐푛4 +

푐2 + 푎푛

= 푐푛 −푐푛4 −

푐2 − 푎푛

• In order to complete the proof, we need − − 푎푛 ≥ 0 => − 푎푛 ≥

푛 − 푎 ≥ => 푛 ≥ =

• Thus, if we assume T(n) = O(1) for 푛 < , then E[T(n)] = O(n)

2014/11/24 19

Page 20: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Selection in Linear Time• In next slides, we describe a recursive call

Select(S, k)which supports finding the kth smallest element in S

• Recursion is used for two purposes:(1) selecting a good pivot (as in Quicksort)(2) solving a smaller sub-problem

2014/11/24 20

Page 21: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Selection in worst-case linear time (1)

2014/11/24 21

Page 22: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

5. Let q = # items of S smaller than m;6. If (k == q + 1)

return m;/* Partition with pivot */7. Else partition S into X and Y

X = {items smaller than m} Y = {items larger than m}

/* Next, form a sub-problem */8. If (k ≤ q)

return Select(X, k)9. Else

return Select(Y, k–(q+1));

Selection in worst-case linear time (2)

2014/11/24 22

Page 23: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

2014/11/24 23

Page 24: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time

• In our selection algorithm, we choose m, which is the median of medians, to be a pivot and partition S into two sets X and Y

• In fact, if we choose any other item as the pivot, the algorithm is still correct

• Why don’t we just pick an arbitrary pivot so that we can save some time ??

2014/11/24 24

Page 25: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time

• A closer look reviews that the worst-case running time depends on |X| and |Y|

• Precisely, if T(|S|) denote the worst-case running time of the algorithm on S, then

T(|S|) = T(|S|/5) + Q(|S|) + max {T(|X|),T(|Y|) }

2014/11/24 25

Page 26: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Running Time

• Later, we show that if we choose m, the “median of medians”, as the pivot,

both |X| and |Y| will be at most 7|S|/10 + 6

• Consequently,

T(n) = T(n /5) + Q(n) + T(7n/10 + 6)

T(n) = O(n) (obtained by substitution)2014/11/24 26

Page 27: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Substitution

cnanccncn

anccnanccnccn

anncncnT

),710/(710/9

610/75/)610/7(5/)(

if 0710/ anccn

2014/11/24 27

=> c ≥ 10an/(n -70) when n > 70 If we assume n ≥ 140, we have n/(n -70) ≤ 2.So, we choose c ≥ 20 a

Page 28: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Median of Medians

• Let’s begin with n/5 sorted groups, each has 5 items (one group may have fewer)

= larger = smaller= median

2014/11/24 28

Page 29: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Median of Medians• Then, we obtain the median of medians, m

= mGroups with median smaller than m

Groups with median larger than m

2014/11/24 29

Page 30: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Median of MediansThe number of items with value greater than m is at least

3(n/5/2 – 2)

each full group has 3 ‘crossed’ items min # of

groups

two groups may not have 3 ‘crossed’

items

number of items: at least 3n/10 – 6

2014/11/24 30

Page 31: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Median of MediansPrevious page implies that at most

7n/10 + 6 itemsare smaller than m

For large enough n (say, n 140)7n/10 + 6 3n/4

|X| is at most 3n/4 for large enough n

2014/11/24 31

Page 32: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Median of MediansSimilarly, we can show that at most

7n/10 + 6 items are larger than m |Y| is at most 3n/4 for large enough n

Conclusion: The “median of medians” helps us control the worst-case size of the sub-problem without it, the algorithm runs in Q(n2)

time in the worst-case

2014/11/24 32

Page 33: Chapter 9: Medians and Order Statisticshscc.cs.nthu.edu.tw/~sheujp/lecture_note/14algorithm/Ch 9 14.pdf · Finding Maximum (Method I) • Let S denote the input set of n items •

Homework

• Exercises: 9.1-1, 9.3-3, 9.3-5, 9.3-6*, 9.3-7* (Due: Nov. 5)

2014/11/24 33