2/2/2016
COMP 355 Advanced Algorithms
More on Greedy Algorithms: Minimizing Lateness & Huffman Coding
Scheduling to Minimize Lateness
Minimizing lateness problem.
– Single resource processes one job at a time.
– Job j requires tj units of processing time and is due at time dj.
– If j starts at time sj, it finishes at time fj = sj + tj.
– Lateness: ℓj = max { 0, fj - dj }.
– Goal: schedule all jobs to minimize maximum lateness L = max ℓj.
Ex:
Job j   1   2   3   4   5   6
tj      3   2   1   4   3   2
dj      6   8   9   9   14  15
[Figure: a sample schedule of the six jobs on the timeline 0 to 15. One late job has lateness = 2, another job has lateness = 0, and the max lateness = 6.]
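The lateness definitions above can be checked with a small Python sketch. The job table is taken from the example; the helper name `max_lateness` and the particular job order passed to it are assumptions made for illustration (the order is one that happens to give a maximum lateness of 6, and is not necessarily the one drawn on the slide).

```python
# (processing time t_j, deadline d_j) for jobs 1..6 from the example table.
JOBS = {1: (3, 6), 2: (2, 8), 3: (1, 9), 4: (4, 9), 5: (3, 14), 6: (2, 15)}

def max_lateness(order, jobs=JOBS):
    """Run the jobs back to back from time 0 in the given order
    and return the maximum lateness L = max_j max(0, f_j - d_j)."""
    t = 0        # current time; the next job starts here (s_j = t)
    worst = 0
    for j in order:
        tj, dj = jobs[j]
        t += tj                            # finish time f_j = s_j + t_j
        worst = max(worst, max(0, t - dj))
    return worst

# Hypothetical order, chosen only for illustration.
print(max_lateness([3, 2, 6, 1, 5, 4]))  # 6
```

In this order job 1 finishes at time 8 (lateness 2) and job 4 finishes at time 15 (lateness 6).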
Minimizing Lateness: Greedy Algorithms
Greedy template. Consider jobs in some order.
• [Shortest processing time first] Consider jobs in ascending order of processing time tj.
• [Earliest deadline first] Consider jobs in ascending order of deadline dj.
• [Smallest slack] Consider jobs in ascending order of slack dj - tj.
Two of these rules are not optimal.
• [Shortest processing time first] Counterexample:
  Job j   1    2
  tj      1    10
  dj      100  10
  This rule runs job 1 first, so job 2 finishes at time 11 with lateness 1; running job 2 first gives lateness 0.
• [Smallest slack] Counterexample:
  Job j   1    2
  tj      1    10
  dj      2    10
  This rule runs job 2 (slack 0) before job 1 (slack 1), so job 1 finishes at time 11 with lateness 9; running job 1 first gives max lateness 1.
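To see the two counterexamples numerically, here is a minimal Python sketch that applies all three orderings from the greedy template to each instance. The names `RULES`, `greedy_order`, and `evaluate` are hypothetical helpers introduced for this illustration.

```python
def max_lateness(order, jobs):
    """Max lateness when jobs run back to back from time 0 in this order."""
    t, worst = 0, 0
    for j in order:
        tj, dj = jobs[j]
        t += tj
        worst = max(worst, max(0, t - dj))
    return worst

# Sort key for each rule, as a function of (t_j, d_j).
RULES = {
    "shortest processing time": lambda t, d: t,
    "earliest deadline": lambda t, d: d,
    "smallest slack": lambda t, d: d - t,
}

def greedy_order(jobs, key):
    """Job ids in ascending order of the given key."""
    return sorted(jobs, key=lambda j: key(*jobs[j]))

def evaluate(jobs):
    """Max lateness achieved by each rule on this instance."""
    return {name: max_lateness(greedy_order(jobs, key), jobs)
            for name, key in RULES.items()}

# job id -> (t_j, d_j), copied from the two counterexample tables.
spt_counter = {1: (1, 100), 2: (10, 10)}
slack_counter = {1: (1, 2), 2: (10, 10)}

print(evaluate(spt_counter))
print(evaluate(slack_counter))
```

On the first instance only shortest-processing-time-first incurs lateness; on the second, smallest-slack does far worse than the other two rules.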
Minimizing Lateness: Greedy Algorithms
Greedy algorithm. Earliest deadline first.
Sort n jobs by deadline so that d1 ≤ d2 ≤ … ≤ dn
t ← 0
for j = 1 to n
    Assign job j to interval [t, t + tj]
    sj ← t, fj ← t + tj
    t ← t + tj
output intervals [sj, fj]
[Figure: the resulting schedule for the six-job example on the timeline 0 to 15, running the jobs in the order 1, 2, 3, 4, 5, 6; max lateness = 1.]
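The earliest-deadline-first pseudocode can be sketched in Python as follows. This is an illustrative version, not the course's reference code: the function name `earliest_deadline_first` and the dictionary layout (job id mapped to `(t_j, d_j)`) are assumptions.

```python
# (processing time t_j, deadline d_j) for jobs 1..6 from the example table.
JOBS = {1: (3, 6), 2: (2, 8), 3: (1, 9), 4: (4, 9), 5: (3, 14), 6: (2, 15)}

def earliest_deadline_first(jobs):
    """Return a list of (job, s_j, f_j) with jobs sorted by deadline
    and scheduled back to back starting at time 0."""
    schedule = []
    t = 0
    for j in sorted(jobs, key=lambda j: jobs[j][1]):   # ascending d_j
        tj, _ = jobs[j]
        schedule.append((j, t, t + tj))                # s_j = t, f_j = t + t_j
        t += tj
    return schedule

schedule = earliest_deadline_first(JOBS)
lateness = {j: max(0, f - JOBS[j][1]) for j, _, f in schedule}
print(schedule)
print(max(lateness.values()))  # 1
```

Only job 4 is late here: it occupies [6, 10] against a deadline of 9. The sort dominates the running time, so the sketch runs in O(n log n).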
Minimizing Lateness: No Idle Time
Observation. There exists an optimal schedule with no idle time.
Observation. The greedy schedule has no idle time.
[Figure: two schedules on the timeline 0 to 11 for three jobs with deadlines d = 4, d = 6, and d = 12; one has idle gaps between the jobs, the other runs the same jobs with no idle time.]
Minimizing Lateness: Inversions
Def. An inversion in schedule S is a pair of jobs i and j such that: i < j but j is scheduled before i. (Jobs are numbered in order of deadline, so i < j means di ≤ dj.)
Observation. Greedy schedule has no inversions.
Observation. If a schedule (with no idle time) has an inversion, it has one with a pair of inverted jobs scheduled consecutively.
[Figure: before the swap, job j is scheduled immediately before job i with i < j; this adjacent pair is an inversion.]
Claim. Swapping two adjacent, inverted jobs reduces the number of inversions by one and does not increase the max lateness.
Pf. Let ℓ be the lateness before the swap, and let ℓ' be it afterwards.
– ℓ'k = ℓk for all k ≠ i, j
– ℓ'i ≤ ℓi
– If job j is late:
    ℓ'j = f'j - dj    (definition)
        = fi - dj     (j finishes at time fi)
        ≤ fi - di     (i < j)
        ≤ ℓi          (definition)
[Figure: before the swap, j runs immediately before i and i finishes at time fi; after the swap, i runs immediately before j and j finishes at time f'j = fi.]
Minimizing Lateness: Analysis of Greedy Algorithm
Theorem. Greedy schedule S is optimal.
Pf. Define S* to be an optimal schedule that has the fewest number of inversions, and let's see what happens.
• Can assume S* has no idle time.
• If S* has no inversions, then S = S*.
• If S* has an inversion, let i-j be an adjacent inversion.
– swapping i and j does not increase the maximum lateness and strictly decreases the number of inversions
– this contradicts definition of S*
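The exchange argument can be watched in action with a small Python sketch: starting from an arbitrary schedule with no idle time, repeatedly swap adjacent inverted jobs and record the maximum lateness after each swap. The function `remove_inversions` and the starting order are hypothetical; the job data is the six-job example from earlier in the lecture.

```python
# (processing time t_j, deadline d_j) for jobs 1..6 from the example table.
JOBS = {1: (3, 6), 2: (2, 8), 3: (1, 9), 4: (4, 9), 5: (3, 14), 6: (2, 15)}

def max_lateness(order, jobs):
    """Max lateness when jobs run back to back from time 0 in this order."""
    t, worst = 0, 0
    for j in order:
        tj, dj = jobs[j]
        t += tj
        worst = max(worst, max(0, t - dj))
    return worst

def remove_inversions(order, jobs):
    """Swap adjacent inverted pairs (later deadline scheduled first) until
    none remain. Returns the final order and the max lateness recorded
    before the first swap and after every swap."""
    order = list(order)
    history = [max_lateness(order, jobs)]
    swapped = True
    while swapped:
        swapped = False
        for k in range(len(order) - 1):
            a, b = order[k], order[k + 1]
            if jobs[a][1] > jobs[b][1]:          # adjacent inversion
                order[k], order[k + 1] = b, a
                history.append(max_lateness(order, jobs))
                swapped = True
    return order, history

final_order, history = remove_inversions([3, 2, 6, 1, 5, 4], JOBS)
print(history)
```

By the claim, the recorded values never increase, and the final schedule has the same maximum lateness as the earliest-deadline-first schedule.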
Greedy Analysis Strategies
Greedy algorithm stays ahead. Show that after each step of the greedy algorithm, its solution is at least as good as any other algorithm's.
Exchange argument. Gradually transform any solution to the one found by the greedy algorithm without hurting its quality.
Structural. Discover a simple "structural" bound asserting that every possible solution must have a certain value. Then show that your algorithm always achieves this bound.
Data Storage
Normal encoding:
• ASCII or Unicode, each character represented by a fixed-length codeword of bits (8 or 16 bits/character)
• Easy to decode
• Not the most efficient way to store data
Fixed Length Encoding
Suppose we have a 4-character alphabet {a, b, c, d}
Given the string “abacdaacac”, it would be encoded with the 2-bit codewords a → 00, b → 01, c → 10, d → 11 as
00 01 00 10 11 00 00 10 00 10
The final 20-character binary string would be “00010010110000100010”.
But what if we knew the frequency of the characters in advance?
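Variable Length Encoding
Using variable length codes, frequent characters get shorter codewords. Suppose the characters occur with probabilities p(a) = 0.60, p(b) = 0.05, p(c) = 0.30, p(d) = 0.05 and are assigned the codewords a → 0, b → 110, c → 10, d → 111.
Given the string “abacdaacac”, it would be encoded as
0 110 0 10 111 0 0 10 0 10
The resulting 17-character string would be “01100101110010010” (a savings of 3 bits).
For a string of n characters, the expected encoded length is
n(0.60 · 1 + 0.05 · 3 + 0.30 · 2 + 0.05 · 3) = n(0.60 + 0.15 + 0.60 + 0.15) = 1.5n,
compared to 2n for the fixed-length code, a savings of 25% in expected encoding length.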
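The two encodings of “abacdaacac” can be reproduced with a short Python sketch. The code tables `FIXED` and `VARIABLE` are assumptions inferred from the two encoded bit strings quoted above and from the codeword lengths in the expected-length calculation; `encode` is a hypothetical helper.

```python
# Code tables inferred from the encoded strings in the text (assumption).
FIXED = {"a": "00", "b": "01", "c": "10", "d": "11"}
VARIABLE = {"a": "0", "b": "110", "c": "10", "d": "111"}

def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)

fixed_bits = encode("abacdaacac", FIXED)
variable_bits = encode("abacdaacac", VARIABLE)
print(fixed_bits, len(fixed_bits))        # 00010010110000100010 20
print(variable_bits, len(variable_bits))  # 01100101110010010 17
```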
Prefix Codes
How to decode variable-length codes?
In the variable-length codes given in the example above, no codeword is a prefix of another (very important!)
Observe: If two codewords did share a common prefix, e.g. a → 001 and b → 00101, then when we see 00101, how do we know whether the first character of the encoded message is “a” or “b”?
Conversely: If no codeword is a prefix of any other, then as soon as we see a codeword appearing as a prefix in the encoded text, we know that we may decode it.
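The decoding rule just described (emit a character as soon as the bits read so far form a codeword) can be sketched in Python. The table `VARIABLE` is an assumption inferred from the 17-bit example string, and `decode` and `is_prefix_free` are hypothetical helper names.

```python
# Variable-length code inferred from the earlier example (assumption).
VARIABLE = {"a": "0", "b": "110", "c": "10", "d": "111"}

def is_prefix_free(code):
    """True if no codeword is a prefix of another. After sorting, a codeword
    that is a prefix of any other is also a prefix of its successor."""
    words = sorted(code.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

def decode(bits, code):
    """Scan left to right, emitting a character whenever the bits
    accumulated so far match a codeword."""
    inverse = {word: ch for ch, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)

print(is_prefix_free(VARIABLE))                # True
print(decode("01100101110010010", VARIABLE))   # abacdaacac
```

With the problematic pair from the text (a → 001, b → 00101), `is_prefix_free` returns False, and this greedy scan would always emit "a" and never recognize "b".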
Prefix Codes
Mapping of codewords to characters so that no codeword is a prefix of another.
Expected Encoding Length
Optimal Code Generation: Given an alphabet C and the probabilities p(x) of occurrence for each character x ∈ C, compute a prefix code T that minimizes the expected length of the encoded bit-string, B(T).
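As a rough illustration of the quantity being minimized, the sketch below computes the expected number of bits per character for a given code, assuming the cost is the probability-weighted codeword length. The probabilities come from the 1.5n calculation earlier; the code tables and the name `expected_length` are assumptions made for this example.

```python
# Probabilities from the earlier example; code tables are inferred (assumption).
PROB = {"a": 0.60, "b": 0.05, "c": 0.30, "d": 0.05}
FIXED = {"a": "00", "b": "01", "c": "10", "d": "11"}
VARIABLE = {"a": "0", "b": "110", "c": "10", "d": "111"}

def expected_length(code, prob):
    """Expected bits per character: sum over x of p(x) * len(codeword(x))."""
    return sum(prob[ch] * len(code[ch]) for ch in code)

print(expected_length(FIXED, PROB))     # about 2.0 bits per character
print(expected_length(VARIABLE, PROB))  # about 1.5 bits per character
```

Multiplying by the text length n gives the expected size of the encoded bit string.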
Huffman’s Algorithm
• We are given the occurrence probabilities for the characters.
• Build the tree up from the leaf level.
• Take the two characters x and y with the smallest probabilities, and “merge” them into a single super-character called z (prob(z) = prob(x) + prob(y)), which then replaces x and y in the alphabet.
• Continue recursively building the code on the new alphabet, which has one fewer character.
• When done, if codeword for z is 010, then x is 0100 and y is 0101.
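The steps above can be sketched in Python with a binary heap. This is one possible implementation under stated assumptions, not the pseudocode from the lecture: the function name `huffman_code`, the use of nested tuples for merged super-characters, and the tie-breaking counter are all choices made for this example, and characters are assumed to be strings. The weights are the earlier probabilities scaled to integers.

```python
import heapq
from itertools import count

def huffman_code(freq):
    """Build a prefix code from a dict mapping character -> weight."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate one-letter alphabet
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, x = heapq.heappop(heap)       # two smallest weights
        w2, _, y = heapq.heappop(heap)
        # Merge x and y into a super-character z with weight w1 + w2.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (x, y)))
    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):          # internal node: extend codeword
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes

codes = huffman_code({"a": 60, "b": 5, "c": 30, "d": 5})
print({ch: len(word) for ch, word in sorted(codes.items())})
# {'a': 1, 'b': 3, 'c': 2, 'd': 3}
```

The exact bit patterns depend on how ties are broken and on which child is labeled 0, but the codeword lengths, and therefore the cost, match the variable-length code used earlier. With a heap, the construction takes O(n log n) time for n characters.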
Huffman Code Construction
Character count in text.
Char   Freq
E      125
T      93
A      80
O      76
I      73
N      71
S      65
R      61
H      55
L      41
D      40
C      31
U      27
Huffman Code Construction
At each step, the two lowest-frequency nodes are merged into a new node whose frequency is their sum, and the new node is put back into the list:
1. C (31) + U (27) → 58
2. D (40) + L (41) → 81
3. H (55) + 58 → 113
4. R (61) + S (65) → 126
5. N (71) + I (73) → 144
6. A (80) + O (76) → 156
7. 81 + T (93) → 174
8. 113 + E (125) → 238
9. 126 + 144 → 270
10. 156 + 174 → 330
11. 238 + 270 → 508
12. 330 + 508 → 838 (root)
Huffman Code Construction
Labeling the two branches at each internal node 0 and 1 gives the Huffman codewords, shown next to a 4-bit fixed-length code:
Char   Freq   Fixed   Huff
E      125    0000    110
T      93     0001    011
A      80     0010    000
O      76     0011    001
I      73     0100    1011
N      71     0101    1010
S      65     0110    1001
R      61     0111    1000
H      55     1000    1111
L      41     1001    0101
D      40     1010    0100
C      31     1011    11100
U      27     1100    11101
Total  838
Average bits per character: 4.00 (fixed) vs. about 3.62 (Huffman).
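The table can be sanity-checked with a short Python sketch: it confirms that the Huffman column is prefix-free, computes the total and average encoded length, and compares the total against the cost obtained by repeatedly merging the two smallest frequencies. The frequencies and codewords are transcribed from the table; `is_prefix_free` and `optimal_cost` are hypothetical helpers.

```python
import heapq

FREQ = {"E": 125, "T": 93, "A": 80, "O": 76, "I": 73, "N": 71, "S": 65,
        "R": 61, "H": 55, "L": 41, "D": 40, "C": 31, "U": 27}
HUFF = {"E": "110", "T": "011", "A": "000", "O": "001", "I": "1011",
        "N": "1010", "S": "1001", "R": "1000", "H": "1111", "L": "0101",
        "D": "0100", "C": "11100", "U": "11101"}

def is_prefix_free(code):
    """True if no codeword is a prefix of another."""
    words = sorted(code.values())
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))

def optimal_cost(freq):
    """Total bits used by a Huffman code: the sum of all merged weights."""
    heap = list(freq.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged
        heapq.heappush(heap, merged)
    return total

total_bits = sum(FREQ[ch] * len(HUFF[ch]) for ch in FREQ)
average = total_bits / sum(FREQ.values())
print(is_prefix_free(HUFF))                 # True
print(total_bits, optimal_cost(FREQ))       # 3036 3036
print(f"{average:.4f}")                     # 3.6229
```

The sum of the merged weights equals the encoded length because each merge adds one bit to the codeword of every character beneath the new node.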
Huffman’s Algorithm: Analysis
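Recall that the cost of any encoding tree T is B(T) = Σ p(x) · dT(x), summed over all characters x ∈ C, where dT(x) is the depth of the leaf for x in T (the length of its codeword).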
Need to show that any tree that differs from the one constructed by Huffman’s algorithm can be converted into one that is equal to Huffman’s tree without increasing its cost.
The key is showing that the greedy choice is always the proper one to make (or at least it is as good as any other choice).
Our approach is based on a few observations.
1. The Huffman tree is a full binary tree, meaning that every internal node has exactly two children.
2. The two characters with the lowest probabilities will be siblings at the maximum depth in the tree.
Huffman’s Algorithm: Analysis
Claim: Consider the two characters, x and y with the smallest probabilities. Then there is an optimal code tree in which these two characters are siblings at the maximum depth in the tree.
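One way to see why the claim holds is the usual exchange step, sketched below as a worked inequality. It assumes the cost of a tree is the probability-weighted depth of its leaves; the symbols b (a character at maximum depth in an optimal tree T) and T' (the tree obtained from T by swapping x and b) are introduced here for illustration and do not appear in the slides.

```latex
% Assumptions: B(T) = \sum_{x \in C} p(x)\, d_T(x), where d_T(x) is the depth of x's leaf.
% T is an optimal tree and b is a character at maximum depth, so d_T(b) \ge d_T(x).
% x has the smallest probability, so p(x) \le p(b).
% T' is T with the leaves of x and b swapped.
\begin{align*}
B(T') &= B(T) - p(x)\,d_T(x) - p(b)\,d_T(b) + p(x)\,d_T(b) + p(b)\,d_T(x) \\
      &= B(T) - \bigl(p(b) - p(x)\bigr)\bigl(d_T(b) - d_T(x)\bigr) \\
      &\le B(T).
\end{align*}
```

Since T is optimal, T' is optimal as well. Because the tree is full, b has a sibling at maximum depth, and repeating the same swap for y and that sibling makes x and y siblings at the maximum depth without increasing the cost.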
Claim: Huffman’s algorithm produces an optimal prefix code tree.
Practice
What is the optimal Huffman code for the following set of frequencies, based on the first 8 Fibonacci numbers?
a: 1, b: 1, c: 2, d: 3, e: 5, f: 8, g: 13, h: 21
Can you generalize your answer to find the optimal code when the frequencies are the first n Fibonacci numbers?
Next Time
• Graphs: Definitions, Representations, Traversals