Greedy algorithms David Kauchak cs161 Summer 2009

Greedy algorithms

David Kauchak

Summer 2009

Administrative

Thank your TAs

Kruskal’s and Prim’s

Make greedy decision about the next edge to add

Greedy algorithm structure

A locally optimal decision can be made which results in a subproblem that does not rely on the local decision

Interval scheduling

Given n activities A = [a1,a2, .., an] where each activity has start time si and a finish time fi. Schedule as many as possible of these activities such that they don’t conflict.

Interval scheduling

Which activities conflict?

Interval scheduling

Which activities conflict?

Simple recursive solution

Enumerate all possible solutions and find which schedules the most activities

Simple recursive solution

Is it correct? max{all possible solutions}

Running time? O(n!)

Can we do better?

Dynamic programming O(n2)

Greedy solution – Is there a way to make a local decision?

Overview of a greedy approach

Greedily pick an activity to schedule Add that activity to the answer Remove that activity and all conflicting

activities. Call this A’. Repeat on A’ until A’ is empty

Greedy options

Select the activity that starts the earliest, i.e. argmin{s1, s2, s3, …, sn}?

Greedy options

Select the activity that starts the earliest?

non-optimal

Greedy options

Select the shortest activity, i.e. argmin{f1-s1, f2-s2, f3-s3, …, fn-sn}

Greedy options

Select the shortest activity, i.e. argmin{f1-s1, f2-s2, f3-s3, …, fn-sn}

non-optimal

Greedy options

Select the activity with the smallest number of conflicts

Greedy options

Select the activity that ends the earliest, i.e. argmin{f1, f2, f3, …, fn}?

Greedy options

Multiple optimal solutions

Greedy options

Efficient greedy algorithm

Once you’ve identified a reasonable greedy heuristic: Prove that it always gives the correct answer Develop an efficient solution

Is our greedy approach correct?

“Stays ahead” argument: show that no matter what other solution someone provides you, the solution provided by your algorithm always “stays ahead”, in that no other choice could do better

Is our greedy approach correct?

“Stays ahead” argument Let r1, r2, r3, …, rk be the solution found by our

approach

Let o1, o2, o3, …, ok of another optimal solution

Show our approach “stays ahead” of any other solution

…r1 r2 r3 rk

o1 o2 o3 ok

Stays ahead

…r1 r2 r3 rk

o1 o2 o3 ok

Compare first activities of each solution

Stays ahead

…r1 r2 r3 rk

o1 o2 o3 ok

finish(r1) ≤ finish(o1)

Stays ahead

…r2 r3 rk

o2 o3 ok

We have at least as much time as any other solution to schedule the remaining 2…k tasks

An efficient solution

Running time?

Θ(n log n)

Overall: Θ(n log n)Better than:

O(n!)O(n2)

Scheduling all intervals

Given n activities, we need to schedule all activities. Minimize the number of resources required.

Greedy approach?

The best we could ever do is the maximum number of conflicts for any time period

Calculating max conflicts efficiently

Calculating max conflicts

Correctness?

We can do no better then the max number of conflicts. This exactly counts the max number of conflicts.

Runtime?

O(2n log 2n + n) = O(n log n)

Horn formulas

Horn formulas are a particular form of boolean logic formulas

They are one approach to allow a program to do logical reasoning

Boolean variables: represent some event x = the murder took place in the kitchen y = the butler is innocent z = the colonel was asleep at 8 pm

Implications

Left-hand side is an AND of any number of positive literals

Right-hand side is a single literal

x = the murder took place in the kitcheny = the butler is innocentz = the colonel was asleep at 8 pm

If the colonel was asleep at 8 pm and the butler is innocent then the murder took place in the kitchen

Implications

Left-hand side is an AND of any number of positive literals

Right-hand side is a single literal

x = the murder took place in the kitcheny = the butler is innocentz = the colonel was asleep at 8 pm

the murder took place in the kitchen

Negative clauses

An OR of any number of negative literals

u = the constable is innocentt = the colonel is innocenty = the butler is innocent

not every one is innocent

Given a horn formula (i.e. set of implications and negative clauses), determine if the formula is satisfiable (i.e. an assignment of true/false that is consistent with all of the formula)

u x y z

0 1 1 0

u x y z

not satifiable

wzx yxw

wyx xzyw

implications tell us to set some variables to true

negative clauses encourage us make them false

A brute force solution

Try each setting of the boolean variables and see if any of them satisfy the formula

For n variables, how many settings are there? 2n

A greedy solution?

wzx yxw wyx

A greedy solution?

wzx yxw wyx

A greedy solution?

wzx yxw wyx

A greedy solution?

wzx yxw wyx

A greedy solution?

wzx yxw wyx

not satisfiable

A greedy solution

set all variables of the implications of the form “x” to true

A greedy solution

if the all variables of the lhs of an implication are true, then set the rhs variable to true

A greedy solution

see if all of the negative clauses are satisfied

Correctness of greedy solution

Two parts: If our algorithm returns an assignment, is it a valid

assignment? If our algorithm does not return an assignment,

does an assignment exist?

If our algorithm returns an assignment, is it a valid assignment?

explicitly check all negative clauses

don’t stop until all implications with all lhs elements true have rhs true

If our algorithm does not return an assignment, does an assignment exist?

Our algorithm is “stingy”. It only sets those variables that have to be true. All others remain false.

Running time?

n = number of variables

m = number of formulas

Knapsack problems: Greedy or not?

0-1 Knapsack – A thief robbing a store finds n items worth v1, v2, .., vn dollars and weight w1, w2, …, wn pounds, where vi and wi are integers. The thief can carry at most W pounds in the knapsack. Which items should the thief take if he wants to maximize value.

Fractional knapsack problem – Same as above, but the thief happens to be at the bulk section of the store and can carry fractional portions of the items. For example, the thief could take 20% of item i for a weight of 0.2wi and a value of 0.2vi.

Data compression

Given a file containing some data of a fixed alphabet Σ (e.g. A, B, C, D), we would like to pick a binary character code that minimizes the number of bits required to represent the data.

A C A D A A D B … 0010100100100 …

minimize the size of the encoded file

Compression algorithms

http://en.wikipedia.org/wiki/Data_compression

Simplifying assumption over general compression?

Given a file containing some data of a fixed alphabet Σ (e.g. A, B, C, D), we would like to pick a binary character code that minimizes the number of bits required to represent the data.

A C A D A A D B … 0010100100100 …

minimize the size of the encoded file

Frequency only

The problem formulation makes the simplifying assumption that we only have character frequency information for a file

A C A D A A D B …

=Symbol Frequency

Fixed length code

Use ceil(log2|Σ|) bits for each character

A = B = C = D =

Fixed length code

A = 00B = 01C = 10D = 11

Symbol Frequency

How many bits to encode the file?

2 x 70 +2 x 3 +2 x 20 + 2 x 37 =

260 bits

Fixed length code

A = 00B = 01C = 10D = 11

Symbol Frequency

Can we do better?

2 x 70 +2 x 3 +2 x 20 + 2 x 37 =

260 bits

Variable length code

What about:

A = 0B = 01C = 10D = 1

Symbol Frequency

1 x 70 +2 x 3 +2 x 20 + 1 x 37 =

173 bits

Decoding a file

A = 0B = 01C = 10D = 1

010100011010

What characters does this sequence represent?

Decoding a file

A = 0B = 01C = 10D = 1

010100011010

What characters does this sequence represent?

A D or B?

Variable length code

What about:

A = 0B = 100C = 101D = 11

Symbol Frequency

How many bits to encode the file?

1 x 70 +3 x 3 +3 x 20 + 2 x 37 =

213 bits(18% reduction)

Prefix codes

A prefix code is a set of codes where no codeword is a prefix of some other codeword

A = 0B = 100C = 101D = 11

A = 0B = 01C = 10D = 1

Prefix tree We can encode a prefix code using a full binary tree

where each child represents an encoding of a symbol

A = 0B = 100C = 101D = 11

Decoding using a prefix tree

To decode, we traverse the graph until a leaf node is reached and output the symbol

A = 0B = 100C = 101D = 11

Traverse the graph until a leaf node is reached and output the symbol

1000111010100

B A D C

1000111010100

B A D C A

1000111010100

B A D C A B

Determining the cost of a file

0 1Symbol Frequency

i i ifT1

)depth()(cost

0 1Symbol Frequency

What if we label the internal nodes with the sum of the children?

0 1Symbol Frequency

Cost is equal to the sum of the internal nodes and the leaf nodes

60 times we see a prefix that starts with a 1

of those, 37 times we see an additional 1

the remaining 23 times we see an additional 0

70 times we see a 0 by itself

of these, 20 times we see a last 1 and 3 times a last 0

As we move down the tree, one bit gets read for every nonroot node

A greedy algorithm?

Given file frequencies, can we come up with a prefix-free encoding (i.e. build a prefix tree) that minimizes the number of bits?

Symbol Frequency

B 3C 20D 37A 70

Symbol Frequency

BC 23D 37A 70

merging with this node will incur an additional cost of 23

Symbol Frequency

BCD 60A 70

Symbol Frequency

ABCD 130

Is it correct?

The algorithm selects the symbols with the two smallest frequencies first (call them f1 and f2)

Consider a tree that did not do this:

Is it correct?

The algorithm selects the symbols with the two smallest frequencies first (call them f1 and f2)

Consider a tree that did not do this:

- frequencies don’t change- cost will decrease since f1 < fi

contradiction

i i ifT1

)depth()(cost

Runtime?

1 call to MakeHeap

2(n-1) calls ExtractMin

n-1 calls Insert

O(n log n)

Non-optimal greedy algorithms

All the greedy algorithms we’ve looked at today give the optimal answer

Some of the most common greedy algorithms generate good, but non-optimal solutions set cover clustering hill-climbing relaxation

Greedy algorithms David Kauchak cs161 Summer 2009

Documents

Sorting Conclusions David Kauchak cs302 Spring 2013

Linear Programming David Kauchak cs161 Summer 2009

CS161%Midterm%2%Review%cs161/sp15/slides/cs161-mt2-review-sl… · Stac%Analysis% • SyntacAc%analysis% – Does%notinterpretthe%statements% • SemanAc%analysis% – Interpretstatements%

PROBABILISTIC MODELS David Kauchak CS451 – Fall 2013

Recurrences David Kauchak cs161 Summer 2009. Administrative Algorithms graded on efficiency! Be specific about the run times (e.g. log bases) Reminder:

Today in CS161

Graphs: MSTs and Shortest Paths David Kauchak cs161 Summer 2009

CS161 – Design and Architecture of Computer Systemskkhas001/0-introduction-cs161f16.pdf · CS161 Goal ! Introduction to Computer Architecture ! Familiarity with processor components

CS457 Fall 2011 David Kauchak

CS161%Midterm%2%Review%gamescrafters.berkeley.edu/~cs161/sp15/slides/cs161-mt2-review-slides.pdf · Stac%Analysis% • SyntacAc%analysis% – Does%notinterpretthe%statements% •

© Geoffrey Werner-Allen – Harvard University CS161 Introductory Lecture: CS161 Overview and Survival Tips Geoffrey Werner-Allen 07 Feb 2007

Watermarking - EECS Instructional Support Group Home …cs161/fa05/Notes/cs161.1130.pdf · Watermarking Watermarking: ... Microsoft PowerPoint - cs161-watermarking.ppt Author: Doug

+ Natural Language Processing CS151 David Kauchak

Eggen y Kauchak - Estrategias Docentes

Greedy algorithms David Kauchak cs302 Spring 2012

SEARCH ALGORITHMS David Kauchak CS30 – Spring 2015

CS161 – Design and Architecture of Computer Systems

ANALIZA TRANSPORTNIH MREMREŽŽAA - nastava.sf.bg.ac.rs · Prožždrljivi (Greedy) heuristi drljivi (Greedy) heurističčki algoritmiki algoritmi Proždrljivi (Greedy) algoritmi generišu

Aprendizaje Cooperativo - Eggen&Kauchak

CS161 Midterm 1 Review - inst.eecs.berkeley.educs161/sp15/slides/cs161-mt1-review... · CS161 Midterm 1 Review Midterm 1: March 4, 18:30-20:00 Same room as lecture. Security Analysis