Greedy Algorithms, Continued
DPV Chapter 5, Part 2
Jim Royer
March 4, 2019
(Unless otherwise credited, all images are from DPV.)


Huffman Encoding, 1

A toy example:
• Suppose our alphabet is {A, B, C, D}.
• Suppose T is a text of 130 million characters.
• What is a shortest binary string representing T? (A hard question.)

Encoding 1: A ↦ 00, B ↦ 01, C ↦ 10, D ↦ 11. Total: 260 megabits.

Statistics on T:
Symbol   Frequency
A        70 million
B        3 million
C        20 million
D        37 million

Idea: Use variable-length codes, with A's code ≪ D's code ≪ B's code (more frequent symbols get shorter codewords).

Encoding 2: A ↦ 0, B ↦ 100, C ↦ 101, D ↦ 11. Total: 213 megabits, 17% better.
Q: How to unambiguously decode?
Q: How to come up with the code?
Q: How good is the result?
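The two totals above can be checked directly; a quick sketch (frequencies in millions of characters, code lengths from the two encodings):

```python
# Character frequencies in millions (from the statistics on T above).
freqs = {"A": 70, "B": 3, "C": 20, "D": 37}

# Code lengths under the two encodings.
fixed_len = {"A": 2, "B": 2, "C": 2, "D": 2}   # Encoding 1: two bits each
var_len = {"A": 1, "B": 3, "C": 3, "D": 2}     # Encoding 2: variable lengths

total_fixed = sum(freqs[s] * fixed_len[s] for s in freqs)
total_var = sum(freqs[s] * var_len[s] for s in freqs)
print(total_fixed, total_var)  # 260 213 (megabits)
```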


Huffman Encoding, 2

Definition
A prefix-free code is a code in which no codeword is the prefix of another.
Prefix-free codes can be represented by full binary trees (i.e., trees in which each non-leaf node has two children).
Example:

Figure 5.10 (DPV, p. 154): A prefix-free encoding. Frequencies are shown in square brackets.

Symbol   Codeword
A        0
B        100
C        101
D        11

[Tree of Figure 5.10: the root's 0-child is the leaf A [70]; its 1-child is the internal node [60], whose 1-child is the leaf D [37] and whose 0-child is [23], with children B [3] and C [20].]

…for our toy example, where (under the codes of Figure 5.10) the total size of the binary string drops to 213 megabits, a 17% improvement.

In general, how do we find the optimal coding tree, given the frequencies f1, f2, . . . , fn of n symbols? To make the problem precise, we want a tree whose leaves each correspond to a symbol and which minimizes the overall length of the encoding,

cost of tree = ∑_{i=1}^{n} f_i · (depth of ith symbol in tree)

(the number of bits required for a symbol is exactly its depth in the tree).

There is another way to write this cost function that is very helpful. Although we are only given frequencies for the leaves, we can define the frequency of any internal node to be the sum of the frequencies of its descendant leaves; this is, after all, the number of times the internal node is visited during encoding or decoding. During the encoding process, each time we move down the tree, one bit gets output for every nonroot node through which we pass. So the total cost, the total number of bits which are output, can also be expressed thus:

The cost of a tree is the sum of the frequencies of all leaves and internal nodes, except the root.

The first formulation of the cost function tells us that the two symbols with the smallest frequencies must be at the bottom of the optimal tree, as children of the lowest internal node (this internal node has two children since the tree is full). Otherwise, swapping these two symbols with whatever is lowest in the tree would improve the encoding.

This suggests that we start constructing the tree greedily: find the two symbols with the smallest frequencies, say i and j, and make them children of a new node, which then has frequency fi + fj. To keep the notation simple, let's just assume these are f1 and f2. By the second formulation of the cost function, any tree in which f1 and f2 are sibling-leaves has cost f1 + f2 plus the cost for a tree with n − 1 leaves of frequencies (f1 + f2), f3, f4, . . . , fn:

Question: How do you use such a tree to decode a file? Sample: 01101001010
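One way to answer: read bits left to right, emitting a symbol the moment the buffer matches a codeword; prefix-freeness guarantees this is unambiguous. A small sketch using the codewords of Figure 5.10:

```python
# Codewords from Figure 5.10; inverted for decoding.
code = {"A": "0", "B": "100", "C": "101", "D": "11"}
decode_map = {w: s for s, w in code.items()}

def decode(bits):
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in decode_map:   # a complete codeword: no codeword extends another
            out.append(decode_map[buf])
            buf = ""
    return "".join(out)

print(decode("01101001010"))  # ADABCA
```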


Huffman Encoding, 3

Goal: Find an optimal coding tree for the frequencies given.

cost of a tree = ∑_{i=1}^{n} f[i] · (depth of the ith symbol in tree)
              = ∑_{i=1}^{n} f[i] · (# of bits required for the ith symbol)

Assigning frequencies to all tree nodes
(a) Leaf nodes get the frequency of their character.
(b) Internal nodes get the sum of the freqs of the leaf nodes below them.



Huffman Encoding, 4

Observation
In an optimal code tree, the two lowest-frequency characters must be the children of the lowest internal node. (Why? Try a replacement argument.)

Greedy Strategy
Find these two characters, build this node, repeat (where some nodes are groups of characters as we go along).

procedure Huffman(f)
// Input: An array f[1 . . . n] of freqs
// Output: An encoding tree with n leaves
H ← a priority queue of integers, ordered by f
for i ← 1 to n do insert(H, i, f[i])
for k ← n + 1 to 2n − 1 do
    i ← deletemin(H); j ← deletemin(H)
    create a node numbered k with children i, j
    f[k] ← f[i] + f[j]; insert(H, k, f[k])

[Figure (DPV, p. 155): the two smallest frequencies f1 and f2 become children of a new node of frequency f1 + f2, leaving a smaller instance with frequencies (f1 + f2), f3, f4, f5.]

The latter problem is just a smaller version of the one we started with. So we pull f1 and f2 off the list of frequencies, insert (f1 + f2), and loop. The resulting algorithm can be described in terms of priority queue operations (as defined on page 120) and takes O(n log n) time if a binary heap (Section 4.5.2) is used.


Returning to our toy example: can you tell if the tree of Figure 5.10 is optimal?
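A runnable sketch of the procedure, using Python's heapq as the priority queue; the function name and returned representation are my own, but nodes are numbered 1 . . . 2n − 1 as in the pseudocode:

```python
import heapq

def huffman(freqs):
    """freqs: frequencies of symbols 1..n. Returns (f, children), where f[k]
    is the frequency of node k and children[k] = (i, j) for internal nodes."""
    n = len(freqs)
    f = {i + 1: freqs[i] for i in range(n)}   # leaves are nodes 1..n
    children = {}
    H = [(f[i], i) for i in f]
    heapq.heapify(H)
    for k in range(n + 1, 2 * n):             # the n - 1 merges
        fi, i = heapq.heappop(H)
        fj, j = heapq.heappop(H)
        children[k] = (i, j)
        f[k] = fi + fj
        heapq.heappush(H, (f[k], k))
    return f, children

# Toy example: A, B, C, D with frequencies 70, 3, 20, 37 (millions).
f, children = huffman([70, 3, 20, 37])
# Cost = sum of all node frequencies except the root's (second formulation):
print(sum(f.values()) - f[2 * 4 - 1])  # 213
```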


Huffman Encoding, 5

procedure Huffman(f)
// Input: An array f[1 . . . n] of freqs
// Output: An encoding tree with n leaves
H ← a priority queue of integers, ordered by f
for i ← 1 to n do insert(H, i, f[i])
for k ← n + 1 to 2n − 1 do
    i ← deletemin(H)
    j ← deletemin(H)
    create a node numbered k with children i, j
    f[k] ← f[i] + f[j]
    insert(H, k, f[k])
return deletemin(H)

Example
a: 45%, b: 13%, c: 12%, d: 16%, e: 9%, f: 5%

[Trace on board]
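For tracing by hand, only the merge order and the total matter. A compact cost-only sketch: merge the two smallest frequencies until one remains; each merge of a and b adds a + b to the cost, since everything under the new node gets one bit deeper (frequencies in percent, so the result is bits per 100 characters):

```python
import heapq

def huffman_cost(freqs):
    # Repeatedly merge the two smallest frequencies; the total of all merge
    # results equals the Huffman cost (the sum of all non-root node freqs).
    H = list(freqs)
    heapq.heapify(H)
    cost = 0
    while len(H) > 1:
        a, b = heapq.heappop(H), heapq.heappop(H)
        cost += a + b
        heapq.heappush(H, a + b)
    return cost

print(huffman_cost([45, 13, 12, 16, 9, 5]))  # 224 bits per 100 characters
```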


Huffman Encoding, 6

procedure Huffman(f)
// Input: An array f[1 . . . n] of freqs
// Output: An encoding tree with n leaves
H ← a priority queue of integers, ordered by f
for i ← 1 to n do insert(H, i, f[i])
for k ← n + 1 to 2n − 1 do
    i ← deletemin(H)
    j ← deletemin(H)
    create a node numbered k with children i, j
    f[k] ← f[i] + f[j]
    insert(H, k, f[k])
return deletemin(H)

Runtime Analysis
• initializing H: Θ(n) time
• for-loop iterations: n − 1
• deletemin's & insert's: cost O(log n) each
Total: Θ(n) + (n − 1) · O(log n) = O(n log n).


Huffman Encoding, 7: Correctness

[Diagram: a tree T with x and y somewhere above the max-depth siblings a and b, and the tree T′ after swapping a ↔ x and b ↔ y.]

Suppose x & y are the two chars with the smallest freqs with f [x] ≤ f [y].

Lemma (1)
There is an optimal code tree in which the codewords of x and y have the same length and differ only in their last bit.

Proof.
Suppose T is an optimal code tree, and let a and b be characters that are max-depth siblings in T with f [a] ≤ f [b].

Let T′ be the result of swapping a ↔ x and b ↔ y. Then:

cost(T) − cost(T′) = f [x] · (dT(x) − dT(a)) + f [y] · (dT(y) − dT(b))
                   + f [a] · (dT(a) − dT(x)) + f [b] · (dT(b) − dT(y))
                 = (f [a] − f [x]) · (dT(a) − dT(x)) + (f [b] − f [y]) · (dT(b) − dT(y)) ≥ 0.

So cost(T) ≥ cost(T′). ∴ Since T is optimal, so is T′.


Huffman Encoding, 8: Correctness

[Diagram: replacing the parent of the sibling leaves x : f [x] and y : f [y] with a single leaf z : f [x] + f [y].]

Suppose x & y are the two chars with the smallest freqs with f [x] ≤ f [y].

Lemma (2)
Replace x and y by a new character z with frequency f [x] + f [y]. Suppose T′ is an optimal code tree for the new character set. Then swapping the z-node for a node with children x and y results in an optimal code tree T for the old character set.

Proof.
Then cost(T) = cost(T′) + f [x] + f [y]. (Why?)
Suppose T′′ is an optimal code tree for the old character set.
WLOG, T′′ has x and y as siblings of max depth. (Why?)
Replace x's and y's parent's subtree with a node for z with frequency f [x] + f [y] and call the resulting tree T′′′. Then

cost(T′′′) = cost(T′′) − f [x] − f [y] ≤ cost(T) − f [x] − f [y] = cost(T′).

But as T′ is optimal, so is T′′′. ∴ cost(T) = cost(T′′) and T is also optimal.


Huffman Encoding, 9: Correctness

Suppose x & y are the two chars with the smallest frequencies with f [x] ≤ f [y].

Lemma 1: The greedy choice is safe
There is an optimal code tree in which x and y have the same length and differ only in their last bit.

Lemma 2: Optimal code trees have optimal substructure
Replace x and y by a new character z with frequency f [x] + f [y]. Suppose T′ is an optimal code tree for the new character set. Then swapping the z-node for a node with children x and y results in an optimal code tree T for the old character set.

procedure Huffman(f)
// Input: An array f [1 . . . n] of freqs
// Output: An encoding tree with n leaves
H ← a priority queue of integers, ordered by f
for i ← 1 to n do insert(H, i, f [i])
for k ← n + 1 to 2n − 1 do
    i ← deletemin(H); j ← deletemin(H)   // Safe by Lemma 1
    create a node numbered k with children i, j
    f [k] ← f [i] + f [j]; insert(H, k, f [k])   // Safe by Lemma 2


Improving on Huffman: LZ Compression

• LZ = Abraham Lempel and Jacob Ziv
• The rough idea: Start with Huffman, but . . .
  • Keep statistics on frequencies in a sliding window of a few kilobytes.
  • Keep readjusting the Huffman coding to fit the frequencies of the sliding window (and note the change in coding in the compressed file).
• Huffman ≈ LZ with the sliding window = the whole file.
• There are many variations on this; see:
  http://en.wikipedia.org/wiki/LZ77_and_LZ78


Propositional Logic

• The formulas of propositional logic are given by the grammar:
    P ::= Var | ¬P | P ∧ P | P ∨ P | P ⇒ P        Var ::= standard syntax
• A truth assignment is a function I : Variables → { False, True }.
• A truth assignment I determines the value of a formula as follows:
    I[[x]] = True iff I(x) = True (x a variable)
    I[[¬p]] = True iff I[[p]] = False
    I[[p ∧ q]] = True iff I[[p]] = I[[q]] = True
    I[[p ∨ q]] = True iff I[[p]] = True or I[[q]] = True
    I[[p ⇒ q]] = True iff I[[p]] = False or I[[q]] = True
• A satisfying assignment for a formula p is an I with I[[p]] = True.
• Finding satisfying assignments for general propositional formulas seems hard. (See Chapter 8.)
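The clauses of this semantics translate directly into a recursive evaluator; the tuple encoding of formulas below is my own convention, not the text's:

```python
def evaluate(p, I):
    """Evaluate formula p under truth assignment I (a dict: variable -> bool).
    Formulas: ('var', x), ('not', p), ('and', p, q), ('or', p, q), ('imp', p, q)."""
    op = p[0]
    if op == "var":
        return I[p[1]]
    if op == "not":
        return not evaluate(p[1], I)
    if op == "and":
        return evaluate(p[1], I) and evaluate(p[2], I)
    if op == "or":
        return evaluate(p[1], I) or evaluate(p[2], I)
    if op == "imp":                  # p ⇒ q: false only when p true and q false
        return (not evaluate(p[1], I)) or evaluate(p[2], I)
    raise ValueError("unknown connective: " + str(op))

# x ⇒ y under I(x) = True, I(y) = False:
print(evaluate(("imp", ("var", "x"), ("var", "y")), {"x": True, "y": False}))  # False
```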


Horn clauses

Definition
A Horn clause is a propositional logic formula of one of two special forms:

Positive implications:  Var ∧ · · · ∧ Var ⇒ Var
Pure negative clauses:  ¬Var ∨ · · · ∨ ¬Var

A Horn formula is the conjunction of a set of Horn clauses.

Example Horn Formula
toddler ⇒ child
(child ∧ male) ⇒ boy
infant ⇒ child
(child ∧ female) ⇒ girl
⇒ toddler
⇒ female
¬girl

Example from: http://bluehawk.monmouth.edu/~rscherl/Classes/KF/slides6.pdf


Satisfying Horn Formulas, 1

A Horn clause is a propositional logic formula of one of two special forms:

Positive implications:  Var ∧ · · · ∧ Var ⇒ Var
Pure negative clauses:  ¬Var ∨ · · · ∨ ¬Var

A Horn formula is the conjunction of a set of Horn clauses.

Finding Satisfying Assignments for Sets of Clauses
Given: A set of Horn clauses { c1, . . . , cn }.
Find: A truth assignment I that satisfies each of c1, . . . , cn, or else report that there is no such I.

Observation:
1. The positive implications push us to make things true.
2. The pure negative clauses push us to make things false.

Strategy:
• We greedily build up a satisfying assignment I for the positive implications, making as few variables True as possible.
• We check that I also satisfies the pure negative clauses.


Satisfying Horn Formulas, 2

Trace with:
(w ∧ y ∧ z) ⇒ x, (x ∧ z) ⇒ w, x ⇒ y, ⇒ x, (x ∧ y) ⇒ w, (¬w ∨ ¬x ∨ ¬y), (¬z)

// Input: H, a Horn formula
// Output: a satisfying assignment, if one exists
T ← ∅ // T = the set of vars set to True
// Invariant: Each x ∈ T must be set to True in any satisfying assignment.
while (there is an (x1 ∧ · · · ∧ xk) ⇒ x0 in H with x1, . . . , xk ∈ T but x0 ∉ T) do
    T ← T ∪ { x0 }
for (each pure negative clause (¬x1 ∨ · · · ∨ ¬xk) in H) do
    if x1, . . . , xk ∈ T then return No satisfying assignment
return T

Step 1. T ← ∅
Step 2. T ← T ∪ { x }, because of ⇒ x and ∅ ⊆ T.
Step 3. T ← T ∪ { y }, because of x ⇒ y and { x } ⊆ T.
Step 4. T ← T ∪ { w }, because of (x ∧ y) ⇒ w and { x, y } ⊆ T.
Step 5. The while loop exits, and (¬w ∨ ¬x ∨ ¬y) is not satisfied since w, x, y ∈ T. So there is no satisfying assignment.
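A runnable sketch of the greedy procedure. The clause representation is an assumption of mine: an implication (x1 ∧ · · · ∧ xk) ⇒ x0 is a pair (set of premises, x0), with an empty premise set for clauses like ⇒ x, and a pure negative clause is the set of its variables:

```python
def horn_sat(implications, negatives):
    """Greedy Horn satisfiability. Returns the forced-True set T, or None
    if there is no satisfying assignment."""
    T = set()
    changed = True
    while changed:                 # fire implications until none applies
        changed = False
        for premises, x0 in implications:
            if premises <= T and x0 not in T:
                T.add(x0)          # x0 must be True in any satisfying assignment
                changed = True
    for clause in negatives:
        if clause <= T:            # every literal ¬xi in the clause is falsified
            return None
    return T                       # set T True, all other variables False

# The trace from this slide:
imps = [({"w", "y", "z"}, "x"), ({"x", "z"}, "w"), ({"x"}, "y"),
        (set(), "x"), ({"x", "y"}, "w")]
negs = [{"w", "x", "y"}, {"z"}]
print(horn_sat(imps, negs))  # None: (¬w ∨ ¬x ∨ ¬y) is violated
```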

Royer v Greedy Algorithms 15

Page 26: Greedy Algorithms, Continued · The resulting algorithm can be described in terms of priority queue operations (as deÞned on page 120) and takes O(n logn) time if a binary heap (Section

Satisfying Horn Formulas, 2

Trace with:(w∧ y∧ z)⇒ x, (x∧ z)⇒ w, x⇒ y, ⇒ x, (x∧ y)⇒ w, (¬w∨ ¬x∨ ¬y), (¬z)

// Input: H, a Horn formula// Output: a satisfying assignment, if one existsT ← ∅ // T = the set of vars set to True// Invariant: Each x ∈ T must be set to True in// any satisfying assignment.

while

(there is an (x1 ∧ · · · ∧ xk)⇒ x0 inH with x1, . . . , xk ∈ T but x0 /∈ T

)do

T ← T ∪ { x0 }for

(each pure negative clause(¬x1 ∨ · · · ∨ ¬xk) in H

)do

if x1, . . . , xk ∈ Tthen return No satisfying assignment

return T

Step 1. T ← ∅

Step 2. T ← T ∪ { x }because of: ⇒ x and ∅ ⊆ T

Step 3. T ← T ∪ { y }because of: x⇒ y and { x } ⊆ T

Step 4. T ← T ∪ {w }because of: (x∧ y)⇒ w and{ x, y } ⊆ T

Step 5. The while loop exits and(¬w∨ ¬x∨ ¬y) is unsatisfiablesince w, x, y ∈ T

Royer v Greedy Algorithms 15

Page 27: Greedy Algorithms, Continued · The resulting algorithm can be described in terms of priority queue operations (as deÞned on page 120) and takes O(n logn) time if a binary heap (Section

Satisfying Horn Formulas, 2

Trace with:(w∧ y∧ z)⇒ x, (x∧ z)⇒ w, x⇒ y, ⇒ x, (x∧ y)⇒ w, (¬w∨ ¬x∨ ¬y), (¬z)

// Input: H, a Horn formula// Output: a satisfying assignment, if one existsT ← ∅ // T = the set of vars set to True// Invariant: Each x ∈ T must be set to True in// any satisfying assignment.

while

(there is an (x1 ∧ · · · ∧ xk)⇒ x0 inH with x1, . . . , xk ∈ T but x0 /∈ T

)do

T ← T ∪ { x0 }for

(each pure negative clause(¬x1 ∨ · · · ∨ ¬xk) in H

)do

if x1, . . . , xk ∈ Tthen return No satisfying assignment

return T

Step 1. T ← ∅

Step 2. T ← T ∪ { x }because of: ⇒ x and ∅ ⊆ T

Step 3. T ← T ∪ { y }because of: x⇒ y and { x } ⊆ T

Step 4. T ← T ∪ {w }because of: (x∧ y)⇒ w and{ x, y } ⊆ T

Step 5. The while loop exits and(¬w∨ ¬x∨ ¬y) is unsatisfiablesince w, x, y ∈ T

Royer v Greedy Algorithms 15

Page 28: Greedy Algorithms, Continued · The resulting algorithm can be described in terms of priority queue operations (as deÞned on page 120) and takes O(n logn) time if a binary heap (Section

Satisfying Horn Formulas, 2

Trace with:(w∧ y∧ z)⇒ x, (x∧ z)⇒ w, x⇒ y, ⇒ x, (x∧ y)⇒ w, (¬w∨ ¬x∨ ¬y), (¬z)

// Input: H, a Horn formula// Output: a satisfying assignment, if one existsT ← ∅ // T = the set of vars set to True// Invariant: Each x ∈ T must be set to True in// any satisfying assignment.

while

(there is an (x1 ∧ · · · ∧ xk)⇒ x0 inH with x1, . . . , xk ∈ T but x0 /∈ T

)do

T ← T ∪ { x0 }for

(each pure negative clause(¬x1 ∨ · · · ∨ ¬xk) in H

)do

if x1, . . . , xk ∈ Tthen return No satisfying assignment

return T

Step 1. T ← ∅

Step 2. T ← T ∪ { x }because of: ⇒ x and ∅ ⊆ T

Step 3. T ← T ∪ { y }because of: x⇒ y and { x } ⊆ T

Step 4. T ← T ∪ {w }because of: (x∧ y)⇒ w and{ x, y } ⊆ T

Step 5. The while loop exits and(¬w∨ ¬x∨ ¬y) is unsatisfiablesince w, x, y ∈ T


Satisfying Horn Formulas, 3

// Input: H, a Horn formula (i.e., a set of Horn clauses)
// Output: a satisfying assignment, if one exists
T ← ∅ // T = the set of vars set to True
// Invariant: Each x ∈ T must be set to True in any satisfying assignment.
while (there is an (x1 ∧ · · · ∧ xk) ⇒ x0 in H with x1, . . . , xk ∈ T but x0 /∈ T) do
    T ← T ∪ { x0 }
for each pure negative clause (¬x1 ∨ · · · ∨ ¬xk) in H do
    if x1, . . . , xk ∈ T then return "No satisfying assignment"
return T

Why does this work?
▶ Claim 1: The invariant holds in the while-loop. (Why?)
▶ Claim 2: The while-loop eventually terminates. (Why?)
▶ Claim 3: When the while-loop terminates, T = the set of variables that must be true in any satisfying assignment for H's positive implications. (Why?)
▶ Claim 4: The algorithm is correct. (Why?)
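As a concrete illustration, the pseudocode above can be sketched in Python. The clause encoding (implications as (body, head) pairs, pure negative clauses as lists of variables) is an assumption of this sketch, not notation from DPV:

```python
def horn_sat(implications, negatives):
    """Greedy Horn satisfiability, following the pseudocode above.

    implications: list of (body, head) pairs encoding (x1 AND ... AND xk) => x0;
                  an empty body encodes a clause of the form => x0.
    negatives:    list of variable lists, one per pure negative clause
                  (~x1 OR ... OR ~xk).
    Returns the set T of variables forced True, or None if H is unsatisfiable.
    """
    T = set()
    changed = True
    while changed:  # fire implications until none applies
        changed = False
        for body, head in implications:
            if head not in T and all(x in T for x in body):
                T.add(head)  # head must be True in any satisfying assignment
                changed = True
    for clause in negatives:  # finally, check the pure negative clauses
        if all(x in T for x in clause):
            return None  # this clause is falsified by the forced assignment
    return T

# The trace from the earlier slide: unsatisfiable, since w, x, y are all forced.
H_imp = [(('w', 'y', 'z'), 'x'), (('x', 'z'), 'w'), (('x',), 'y'),
         ((), 'x'), (('x', 'y'), 'w')]
H_neg = [['w', 'x', 'y'], ['z']]
assert horn_sat(H_imp, H_neg) is None
```

The `while changed` loop is a blunt way of realizing "while some implication fires"; a smarter bookkeeping scheme yields the linear-time implementation hinted at in the exercises.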


Satisfying Horn Formulas, 4

// Input: H, a Horn formula (i.e., a set of Horn clauses)
// Output: a satisfying assignment, if one exists
T ← ∅ // T = the set of vars set to True
// Invariant: Each x ∈ T must be set to True in any satisfying assignment.
while (there is an (x1 ∧ · · · ∧ xk) ⇒ x0 in H with x1, . . . , xk ∈ T but x0 /∈ T) do
    T ← T ∪ { x0 }
for each pure negative clause (¬x1 ∨ · · · ∨ ¬xk) in H do
    if x1, . . . , xk ∈ T then return "No satisfying assignment"
return T

Runtime Analysis
▶ n = the number of characters in the Horn formula.
▶ Naïvely, O(n²) time. (Why?)
▶ But see DPV Exercise 5.32: a linear-time implementation is possible.

Note: This is in part a setup for Chapter 8.


Set Cover, 1

Suppose B is a set and S1, . . . , Sm ⊆ B.

Definition
(a) A set cover of B is a { S′1, . . . , S′k } ⊆ { S1, . . . , Sm } with B ⊆ S′1 ∪ · · · ∪ S′k.
(b) A minimal set cover of B is a set cover of B using as few of the Si-sets as possible.

The Set Cover Problem (SCP)
Given: B and S1, . . . , Sm as above.
Find: A minimal set cover of B.

Example
For B = { 1, . . . , 14 } and
    S1 = { 1, 2 }
    S2 = { 3, 4, 5, 6 }
    S3 = { 7, 8, 9, 10, 11, 12, 13, 14 }
    S4 = { 1, 3, 5, 7, 9, 11, 13 }
    S5 = { 2, 4, 6, 8, 10, 12, 14 }
the solution to SCP is { S4, S5 }.


Set Cover, 2

A Greedy Approx. to the Set Cover Problem

// Input: B and S1, . . . , Sm ⊆ B as above.
// Output: A set cover of B which is close to minimal.
C ← ∅
while (some element of B is not yet covered) do
    Pick the Si with the largest number of uncovered B-elements
    C ← C ∪ { Si }
return C

Example
For B = { 1, . . . , 14 } and
    S1 = { 1, 2 }
    S2 = { 3, 4, 5, 6 }
    S3 = { 7, 8, 9, 10, 11, 12, 13, 14 }
    S4 = { 1, 3, 5, 7, 9, 11, 13 }
    S5 = { 2, 4, 6, 8, 10, 12, 14 }
the algorithm returns { S1, S2, S3 }, which is not optimal but not too bad.
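The greedy rule is easy to run on this example. A minimal Python sketch (the dict-of-named-sets representation is an assumption of the sketch):

```python
def greedy_set_cover(B, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(B)
    cover = []
    while uncovered:
        # max over set names, keyed by how many uncovered elements each covers
        best = max(sets, key=lambda name: len(sets[name] & uncovered))
        cover.append(best)
        uncovered -= sets[best]
    return cover

# The slide's example:
B = set(range(1, 15))
S = {"S1": {1, 2}, "S2": {3, 4, 5, 6}, "S3": {7, 8, 9, 10, 11, 12, 13, 14},
     "S4": {1, 3, 5, 7, 9, 11, 13}, "S5": {2, 4, 6, 8, 10, 12, 14}}

cover = greedy_set_cover(B, S)
assert set(cover) == {"S1", "S2", "S3"}
```

Greedy first takes S3 (8 new elements), then S2 (4), then S1 (2): three sets, where the two sets S4 and S5 would have sufficed.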


Set Cover, 3

A Greedy Approx. to SCP

// Input: B and S1, . . . , Sm ⊆ B
// Output: A near-minimal set cover
C ← ∅
while (not all of B is covered) do
    Pick the Si with the largest number of uncovered B-elements
    C ← C ∪ { Si }
return C

Claim
Suppose B contains n elements and the minimal cover has k sets. Then the greedy algorithm will use at most (k ln n) sets.

Proof: Let nt = the number of uncovered elements after t iterations, so n0 = n.
Consider iteration t > 0:
▶ Just before it, nt−1 elements are still uncovered.
▶ The k sets of a minimal cover cover all of them.
▶ So some set must contain at least nt−1/k of the uncovered elements.
▶ So, by the greedy choice,

    nt ≤ nt−1 − nt−1/k = nt−1 (1 − 1/k),

and hence nt ≤ n0 (1 − 1/k)^t.
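The recurrence can be checked numerically on the running example (n = 14, and k = 2 since { S4, S5 } is a minimal cover). This is a sanity check of the bound, not part of the proof:

```python
# Track n_t, the number of uncovered elements after t greedy iterations,
# and verify n_t <= n * (1 - 1/k)^t on the slide's example.
B = set(range(1, 15))
S = [{1, 2}, {3, 4, 5, 6}, {7, 8, 9, 10, 11, 12, 13, 14},
     {1, 3, 5, 7, 9, 11, 13}, {2, 4, 6, 8, 10, 12, 14}]

uncovered = set(B)
counts = [len(uncovered)]  # counts[t] = n_t, starting with n_0 = n
while uncovered:
    best = max(S, key=lambda s: len(s & uncovered))  # the greedy choice
    uncovered -= best
    counts.append(len(uncovered))

n, k = len(B), 2  # the minimal cover {S4, S5} has k = 2 sets
for t, n_t in enumerate(counts):
    assert n_t <= n * (1 - 1 / k) ** t  # the bound from the proof
```

Here the uncovered counts shrink as 14, 6, 2, 0, comfortably under the bound 14 · (1/2)^t.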


Set Cover, 4

A Greedy Approx. to SCP

// Input: B and S1, . . . , Sm ⊆ B
// Output: A near-minimal set cover
C ← ∅
while (not all of B is covered) do
    Pick the Si with the largest number of uncovered B-elements
    C ← C ∪ { Si }
return C

Claim
Suppose B contains n elements and the minimal cover has k sets. Then the greedy algorithm will use at most (k ln n) sets.

nt = the number of uncovered elements after t iterations.
We know: nt ≤ n (1 − 1/k)^t.

Fact: 1 − x ≤ e^−x for all x, with equality iff x = 0.
(Most easily proved by a picture: the graph of 1 − x lies on or below that of e^−x, touching it only at x = 0.)

From DPV (Algorithms, p. 160): Thus

    nt ≤ n0 (1 − 1/k)^t < n0 (e^−1/k)^t = n e^−t/k.

At t = k ln n, therefore, nt is strictly less than n e^−ln n = 1, which means no elements remain to be covered.

The ratio between the greedy algorithm's solution and the optimal solution varies from input to input but is always less than ln n. And there are certain inputs for which the ratio is very close to ln n (Exercise 5.33). We call this maximum ratio the approximation factor of the greedy algorithm. There seems to be a lot of room for improvement, but in fact such hopes are unjustified: it turns out that under certain widely-held complexity assumptions (which will be clearer when we reach Chapter 8), there is provably no polynomial-time algorithm with a smaller approximation factor.

So: nt ≤ n (1 − 1/k)^t < n (e^−1/k)^t = n e^−t/k.
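The chain of inequalities can be spot-checked numerically with the running example's values (n = 14, k = 2); this is only a sanity check of the arithmetic:

```python
import math

n, k = 14, 2  # the example: |B| = 14, minimal cover of size k = 2

# 1 - x < e^{-x} for x > 0, hence (1 - 1/k)^t < e^{-t/k} for t >= 1:
for t in range(1, 30):
    assert (1 - 1 / k) ** t < math.exp(-t / k)

# At t = k ln n the upper bound n * e^{-t/k} equals n * e^{-ln n} = 1,
# so beyond k ln n iterations nothing can remain uncovered.
t_star = k * math.log(n)
assert abs(n * math.exp(-t_star / k) - 1) < 1e-9
```

For this example the bound k ln n ≈ 5.28 iterations; greedy actually finished in 3.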


Set Cover, 5

A Greedy Approx. to SCP

// Input: B and S1, . . . , Sm ⊆ B
// Output: A near-minimal set cover
C ← ∅
while (not all of B is covered) do
    Pick the Si with the largest number of uncovered B-elements
    C ← C ∪ { Si }
return C

Claim
Suppose B contains n elements and the minimal cover has k sets. Then the greedy algorithm will use at most (k ln n) sets.

nt = the number of uncovered elements after t iterations.
We know: nt < n · e^−t/k for t > 0.

∴ When t ≥ k ln n,

    nt < n · e^−t/k ≤ n · e^−ln n = n/n = 1,

i.e., we must have covered all of B.

So the greedy algorithm is optimal to within a ln n factor.

Fact: If certain widely-held complexity assumptions hold, then no poly-time algorithm has a better than (ln n)-approximation factor. (More on this in Chapters 8 and 9.)


Aside: Braess’s Paradox, 1 (Greed not always good)

Braess's Paradox
Adding capacity to a network can actually reduce(!!!) its throughput when "rational actors" can choose their routes through the network.

Example (Part 1)
▶ A road network of four roads with 4000 drivers.
▶ All want to go from START to END.
▶ Roads START→B and A→END have a 45 min travel time.
▶ Roads START→A and B→END have a T/100 min travel time, where T = the number of travelers on that road.
▶ If 2000 drivers go the north route and 2000 go the south route, everyone has a travel time of 45 + (2000/100) = 65 mins, which is optimal.

The example + image are from http://en.wikipedia.org/wiki/Braess’s_paradox.
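The 65-minute figure follows from a two-line cost model (a sketch; the function names are mine, the costs are the example's):

```python
# Cost model from the example: each route combines one fixed 45-minute road
# with one congestion-dependent road costing T/100 minutes for T travelers.
def north_route(T_start_a):  # START -> A -> END; congestion on START -> A
    return T_start_a / 100 + 45

def south_route(T_b_end):    # START -> B -> END; congestion on B -> END
    return 45 + T_b_end / 100

# With the 4000 drivers split evenly, both routes cost 45 + 2000/100 = 65 min:
assert north_route(2000) == south_route(2000) == 65.0
```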


Aside: Braess’s Paradox, 2

Example (Part 2)
▶ Now add the A→B road with travel time 0.
▶ Since all the drivers are "rational" (i.e., greedy), they will all take the START→A→B route, since they can arrive at B five minutes faster than by the START→B route.
▶ But then their total travel time to END is 80 mins.
▶ If any one driver tries another route, that driver gets a worse outcome (i.e., an ≈85 minute travel time).
▶ Since the drivers are all "rational", no one changes routes.
▶ So the travel time of the new network is 80 minutes.
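The 80- and ≈85-minute figures follow directly from the example's cost model (a self-contained arithmetic check; the variable names are mine):

```python
drivers = 4000

# Everyone takes START -> A -> B -> END: two congested T/100 roads
# (each carrying all 4000 drivers) plus the free A -> B link.
all_through = drivers / 100 + 0 + drivers / 100
assert all_through == 80.0

# A lone deviator taking START -> B -> END pays the fixed 45 minutes plus
# B -> END, which is still shared with (essentially) all 4000 drivers.
deviator = 45 + drivers / 100
assert deviator == 85.0  # worse than 80, so no "rational" driver switches
```

Since no unilateral switch helps, the 80-minute outcome is an equilibrium, even though the pre-link network achieved 65 minutes.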


Aside: Braess’s Paradox, 3

Example (Part 3)
▶ This is what economists call a market failure.
▶ See the Wikipedia article for
    • references to mathematically rigorous versions of the above, and
    • examples of actual networks that improved travel times by closing roads.
▶ The problem for computer scientists:
    • Many networks are inhabited by "rational actors".
    • How do we avoid situations like this?
