Chapter 7 : NP-Completeness
Reference : Computers and Intractability: A Guide to the Theory of NP-Completeness
by Garey and Johnson, W. H. Freeman & Company, 1979
7.1 General Problems, Input Size and Time Complexity
Time complexity of algorithms :
polynomial time algorithm ("efficient algorithm") vs.
exponential time algorithm ("inefficient algorithm")
f(n) \ n        10              30              50
n               0.00001 sec     0.00003 sec     0.00005 sec
n^5             0.1 sec         24.3 sec        5.2 mins
2^n             0.001 sec       17.9 mins       35.7 yrs
Input size and running time of an algorithm, revisited
The running time of an algorithm is expressed in terms of the size of a problem
instance.
Example : for sorting algorithms, we use "n" as the input size; for graph
algorithms, we use a combination of |E| and |V|.
In a computer, we use binary bits to encode numbers.
Precisely, the running time should be expressed in terms of the total number of
input characters (total number of binary bits)
Example : for a sorting algorithm, the input is a set of n numbers {x1, x2, …, xn}
Let L be the largest number in {x1, x2, …, xn}. The size of L in binary is
lg L bits.
The input size is no more than (n * lg L) bits
For mergesort, the total running time for input size n, as given in Chapter 1.2,
is T(n) = n * lg n, where each step is a comparison of (lg L)-bit numbers
So, the running time in terms of the "real" input size m = n * lg L is
T(n * lg L) = (n * lg L) * lg n
            <= (n * lg L) * lg (n * lg L)
i.e. T(m) <= m lg m, substituting "n * lg L" for "m"
We obtain the same result.
In fact, most algorithms give the same running time results either way.
Let us consider the following prime number checking algorithm :
prime(int n) { // assume n > 2
    i = 2;
    while ( i < n ) {
        if (n % i == 0) return "n is not a prime";
        i++;
    }
    return "n is a prime";
}
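A direct Python transcription of this pseudocode (a sketch; the name `is_prime` and the step counter are ours, not from the notes) makes the point below concrete: the loop runs about n times, where n is the magnitude of the input, not its bit length.

```python
def is_prime(n):
    """Trial division exactly as in the pseudocode; assumes n > 2.
    Returns (answer, steps), where steps counts loop iterations."""
    steps = 0
    i = 2
    while i < n:
        steps += 1
        if n % i == 0:
            return False, steps   # i divides n, so n is not prime
        i += 1
    return True, steps

# For a prime input the loop runs n - 2 times, i.e. about 2^(input bits):
answer, steps = is_prime(97)
```

For n = 97 the input is 7 bits long, yet the loop iterates 95 times.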
The input size is lg n bits
The magnitude of the input is n
The total number of steps is at most n
The running time (in terms of the input size) is exponential, i.e. n = 2^(lg n)
steps, which is exponential in the input size of lg n bits
An algorithm whose running time is bounded by a polynomial function of the input
size and magnitude is called a pseudopolynomial time algorithm
The running time of the above algorithm is exponential time and also
pseudopolynomial time.
Another pseudopolynomial time algorithm : the O(nM) running time of the
0/1 Knapsack algorithm in Chapter 5.4
Intractable problem : a "hard" problem for a computer to solve. Most likely,
there is no polynomial time algorithm to solve the problem.
Decision problem: The solution to the problem is "yes" or "no". Most
optimization problems can be phrased as decision problems (still have the same
time complexity).
Example : Assume we have a decision algorithm X for the 0/1 Knapsack problem,
i.e. algorithm X returns "Yes" or "No" to the question "is there a solution with
profit >= P subject to knapsack capacity <= M?"
We can repeatedly run algorithm X for various profits to find an optimal
solution. Example : use binary search to get the optimal profit, with at most
lg(total profit) runs.
Min Bound Optimal Profit Max Bound
|_____________________|______________________|
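The binary search over profits can be sketched as follows, assuming a hypothetical decision oracle `knapsack_decision` (implemented here by brute force only so the sketch is self-contained; items are (weight, profit) pairs, and all names are ours).

```python
from itertools import combinations

def knapsack_decision(items, M, P):
    """Hypothetical decision algorithm X : is there a subset with
    total weight <= M and total profit >= P? (brute force stand-in)"""
    for r in range(len(items) + 1):
        for sub in combinations(items, r):
            if sum(w for w, p in sub) <= M and sum(p for w, p in sub) >= P:
                return True
    return False

def optimal_profit(items, M):
    """Binary search on the profit between 0 and the total profit,
    calling the oracle O(lg(total profit)) times."""
    lo, hi = 0, sum(p for _, p in items)
    while lo < hi:                      # invariant: optimum lies in [lo, hi]
        mid = (lo + hi + 1) // 2
        if knapsack_decision(items, M, mid):
            lo = mid                    # profit mid is achievable
        else:
            hi = mid - 1
    return lo

best = optimal_profit([(2, 3), (3, 4), (4, 5), (5, 6)], M=5)   # best is 7
```

With a polynomial-time oracle the whole search would be polynomial; the brute-force oracle above is only for illustration.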
7.2 The Classes of P and NP
The class P and Deterministic Turing Machine
Given a decision problem X, if there is a polynomial time Deterministic
Turing Machine program that solves X, then X belongs to P
Informally, there is a polynomial time algorithm to solve the problem
The class NP and Non-deterministic Turing Machine
Given a decision problem X, if there is a polynomial time Non-deterministic
Turing Machine program that solves X, then X belongs to NP
Given a decision problem X. For every instance I of X, (a) guess a solution S
for I, and (b) check "is S a solution to I?". If (a) and (b) can be done in
polynomial time, then X belongs to NP.
Obvious : P ⊆ NP, i.e. a problem in P does not need to "guess a solution". The
correct solution can be computed in polynomial time.
Some problems which are in NP, but may not be in P :
0/1 Knapsack Problem
PARTITION Problem : Given a finite set of positive integers Z.
Question : Is there a subset Z' of Z such that the sum of the numbers in Z'
equals the sum of the numbers in Z-Z', i.e. ∑Z' = ∑(Z-Z')?
Two-Processor Non-Preemptive Schedule Length Problem : Given a set of n
tasks T with processing times {p1, p2, …, pn}, two processors and a positive
number L.
Question : Is there a nonpreemptive schedule for T on two processors such
that the schedule length <= L?
How to make $1,000,000!!
One of the most important open problems in theoretical computer science :
Is P = NP ?
See : http://www.claymath.org/millennium/
Most likely "No". Currently, there are many known problems in NP, and no one
has been able to show that any one of them is in P.
7.3 NP-Complete Problems
Stephen Cook introduced the notion of NP-Complete problems. This makes the
question "P = NP ?" much more interesting to study.
The following are several important ideas presented by Cook :
1. Polynomial Transformation ("∝")
L1 ∝ L2 : there is a polynomial time transformation that transforms an
arbitrary instance of L1 to some instance of L2.
If L1 ∝ L2, then L2 in P implies L1 in P (or : L1 not in P ⇒ L2 not in P)
If L1 ∝ L2 and L2 ∝ L3, then L1 ∝ L3
2. Focus on the class of NP – decision problems only. Many intractable
problems, when phrased as decision problems, belong to this class.
3. L is NP-Complete if L ∈ NP and for all other L' ∈ NP, L' ∝ L
If a problem in NP-complete can be solved in polynomial time, then all
problems in NP can be solved in polynomial time.
If some problem in NP cannot be solved in polynomial time, then no problem
in NP-complete can be solved in polynomial time.
So, if an NP-complete problem is in P then P = NP;
if P != NP then all NP-complete problems are in NP-P
Question : how can we obtain the first NP-complete problem L?
4. Cook's Theorem : SATISFIABILITY is NP-Complete.
Instance : Given a set of variables U and a collection of clauses C over U.
Question : Is there a truth assignment for U that satisfies all clauses in C?
Example :
U = {x1, x2}
C1 = {(x1, ¬x2), (¬x1, x2)}
= (x1 OR NOT x2) AND (NOT x1 OR x2)
if x1 = x2 = True then C1 = True
C2 = (x1, x2)(x1, ¬x2)(¬x1) is not satisfiable
With Cook's Theorem, we have the following property :
Lemma : If L1 and L2 belong to NP, L1 is NP-complete, and L1 ∝ L2, then L2 is
NP-complete.
i.e. L1, L2 ∈ NP and for all other L' ∈ NP, L' ∝ L1 and L1 ∝ L2 ⇒ L' ∝ L2
So now, to prove a problem L to be NP-complete, we need to
show L is in NP
select a known NP-complete problem L'
construct a polynomial time transformation f from L' to L
prove the correctness of f and that f is a polynomial transformation
Some NP-complete problems :
SATISFIABILITY
0/1 Knapsack
PARTITION
Two-Processor Non-Preemptive Schedule Length
CLIQUE : Instance : An undirected graph G=(V, E) and a positive integer J <= |V|
Question : Does G contain a clique (complete subgraph) of size J or more?
7.4 Proving NP-Completeness Results
Example 1 : Show that the PARTITION problem is NP-complete.
Given that the Sum of Subsets (SS) problem is a known NPC problem, show that
the PARTITION problem is NPC.
SS Problem
Instance : Let A = {a1, a2, …, an} be a set of n positive integers.
Question : Given M, is there a subset A' ⊆ A such that ∑A' = M?
PARTITION Problem
Instance : Given a finite set Z of m positive integers.
Question : Is there a subset Z' ⊆ Z such that ∑Z' = ∑(Z-Z')?
PARTITION is in NP
guess a subset Z'            O(m) // or use choice(1,m)
verify ∑Z' = ∑(Z-Z')?        O(m)
Total                        O(m)
SS ∝ PARTITION
Given an arbitrary instance of SS, i.e. A = {a1, a2, …, an} and M,
construct an instance of PARTITION as follows :
Z = {b1, b2, …, bn, bn+1, bn+2}, a set of m = n+2 positive integers,
where
bi = ai for 1 <= i <= n
bn+1 = M + 1
bn+2 = ∑A + 1 - M
Note : ∑bi = 2∑A + 2. Also, the transformation can be done in
polynomial time (based on the input sizes of A and M)
To show the transformation is correct : the SS problem has a solution if and
only if the PARTITION problem has a solution.
If the SS problem has a solution, then the PARTITION problem has a solution
assume A' is the solution for the SS problem; then
Z' = A' ∪ {bn+2} and Z-Z' = (A-A') ∪ {bn+1}
∑Z' = M + ∑A + 1 - M = ∑A + 1 = ∑(Z-Z')
If the PARTITION problem has a solution, then the SS problem has a
solution
if Z' is the solution, then ∑Z' = ∑A + 1
exactly one of bn+2 or bn+1 ∈ Z' (since bn+1 + bn+2 = ∑A + 2 > ∑Z')
if bn+2 ∈ Z', then A' = Z' - {bn+2} and ∑A' = M
if bn+1 ∈ Z', then use Z - Z' to obtain A'
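As a sanity check, the transformation can be tested by brute force on a small instance (a sketch; all function names are ours, and the brute-force checkers are only for tiny inputs).

```python
from itertools import combinations

def has_subset_sum(A, M):
    """Does some subset of A sum to exactly M? (brute force check)"""
    return any(sum(c) == M for r in range(len(A) + 1)
               for c in combinations(A, r))

def has_partition(Z):
    """Can Z be split into two parts with equal sums? (brute force)"""
    total = sum(Z)
    return total % 2 == 0 and has_subset_sum(Z, total // 2)

def ss_to_partition(A, M):
    """The transformation from the notes: append M+1 and sum(A)+1-M."""
    return list(A) + [M + 1, sum(A) + 1 - M]

# Spot check: yes-instances map to yes-instances, no-instances to no-instances
A = [3, 5, 8, 9]
for M in range(1, sum(A) + 1):
    assert has_subset_sum(A, M) == has_partition(ss_to_partition(A, M))
```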
Example 2 : Show that the Traveling Salesman (TS) Problem is NP-complete.
Given that the Hamiltonian Circuit (HC) problem is a known NPC problem, show
that the TS problem is NPC.
Hamiltonian Circuit (HC) problem
Instance : Given an undirected graph G=(V, E)
Question : Does G contain a Hamiltonian circuit, i.e. a sequence <v1, v2, …, vn>
of all vertices in V which forms a simple cycle?
Traveling Salesman (TS) Problem
Instance : Given an undirected complete graph G=(V, E) with distance d(i,j) >= 0
for each edge (i,j), i ≠ j, and a positive integer B.
Question : Is there a tour of all cities (a simple cycle containing all
vertices) with total distance no more than B?
TS is in NP
guess a tour, i.e. sequence of all vertices O(V)
verify that it is a cycle covering all vertices and total distance <= B O(E)
HC ∝ TS
Given an arbitrary instance of HC, i.e. G=(V, E),
construct an instance of TS as follows :
G' = (V, E'), where (u,v) ∈ E' for all u, v ∈ V with u ≠ v
d(u,v) = 0 if (u,v) ∈ E
d(u,v) = 1 if (u,v) ∉ E
and B = 0
Note : The transformation can be done in polynomial time (based on
input size V and E)
To show the transformation is correct : The HC problem has a solution if and
only if the TS problem has a solution.
If HC problem has a solution, then TS problem has a solution
Assume <v1, v2, …, vn> is the solution for HC
It is a simple cycle which contains all vertices
Each edge (u,v) in this cycle has d(u,v) = 0
Total distance is 0 <= B
⇒ a solution for TS
If the TS problem has a solution, then the HC problem has a solution
A tour with total distance <= B = 0 uses only edges with d(u,v) = 0, i.e.
edges of E, so it is a Hamiltonian circuit in G.
Example 3 : Show that the Vertex Cover (VC) Problem is NP-complete
(Optional)
Given that the 3SAT problem is NPC, show that the VC problem is NPC.
3SAT Problem
Instance : Given a set of variables U = {u1, u2, …, un} and a collection of clauses
C = {c1, c2, …, cm} over U such that | ci | = 3 for 1 <= i <= m.
Question : Is there a truth assignment for U that satisfies all clauses in C?
Note : 3SAT problem is a restricted problem of SATISFIABILITY problem.
Vertex Cover (VC) Problem
Instance : Given an undirected graph G=(V, E) and a positive integer K <= |V|
Question : Is there a vertex cover of size K or less for G, i.e. a subset V' ⊆ V
such that |V'| <= K and, for each (u,v) ∈ E, at least one of u or v ∈ V'?
VC is in NP
guess a set of vertices V' ⊆ V                                        O(V)
verify that |V'| <= K and, for each (u,v) ∈ E, u ∈ V' or v ∈ V'       O(V*E)
3SAT ∝ VC
Given an arbitrary instance of 3SAT, i.e. U = {u1, u2, …, un} and C = {c1, c2, …,
cm}, construct an instance of VC as follows :
G = (V, E) and K = n+2m
V = Vu ∪ Vc
Vu = {u1t, u1f, u2t, u2f, …, unt, unf} and
Vc = {a11, a12, a13} ∪ {a21, a22, a23} ∪ … ∪ {am1, am2, am3}
E = Eu ∪ Ec ∪ Euc
Eu = {(u1t, u1f), (u2t, u2f), …, (unt, unf)}
Ec = {(a11, a12), (a12, a13), (a13, a11)} ∪ … ∪ {(am1, am2), (am2, am3),
(am3, am1)}
Assume ci = (xi, yi, zi) for 1 <= i <= m,
and find the corresponding literal vertices xi, yi, zi in Vu
Euc = {(x1, a11), (y1, a12), (z1, a13)} ∪ … ∪ {(xm, am1), (ym, am2), (zm, am3)}
|V| = 2n+3m and |E| = n+3m+3m
The transformation can be done in polynomial time (based on input sizes n
and m)
Example : U = {u1, u2, u3, u4} and C = {{u1, u3, u4}, {u1, u2, u4}}
[Figure : the variable vertices u1t, u1f, …, u4t, u4f in a row on top, and the
two clause triangles {a11, a12, a13} and {a21, a22, a23} below, with the edges
Eu, Ec, Euc as defined above]
Major Property : if there is a vertex cover set V' with |V'| <= K, then
(a) |V'| = n + 2m, and
(b) V' must include exactly 1 vertex from {uit, uif} for 1 <= i <= n in Vu and
exactly 2 vertices from {ai1, ai2, ai3} for 1 <= i <= m in Vc,
i.e. n vertices from Vu and 2m vertices from Vc
Look at the edges in Eu and Ec : a vertex cover set V' must include
at least 1 vertex from {uit, uif} for 1 <= i <= n, and
at least 2 vertices from {ai1, ai2, ai3} for 1 <= i <= m.
Since |V'| <= K = n + 2m ⇒ |V'| = K, with exactly these counts
To show the transformation is correct : the 3SAT problem has a solution if
and only if the VC problem has a solution.
If the VC problem has a solution, then the 3SAT problem has a solution
From the above property, V' contains n vertices from Vu and 2m vertices
from Vc
From Vu, the truth assignment for {u1, u2, …, un} in 3SAT is
ui = T if uit ∈ V'
ui = F if uif ∈ V', for 1 <= i <= n
To see that this is a solution for 3SAT :
we must show that for each ci = (xi, yi, zi), there is at least one literal
in {xi, yi, zi} which sets ci to TRUE, for 1 <= i <= m
From the above property, exactly 2 vertices from {ai1, ai2, ai3} are in V',
for 1 <= i <= m
They can cover only 2 of the 3 edges {(xi, ai1), (yi, ai2), (zi, ai3)} in Euc
assume the edge (xi, ai1) is not covered by the 2 vertices from {ai1, ai2, ai3}
then xi ∈ V', since V' is a vertex cover set
⇒ xi sets the clause ci to True, for 1 <= i <= m
If the 3SAT problem has a solution, then the VC problem has a solution
The vertex cover set V' with exactly n+2m vertices can be obtained as
follows :
From the truth assignment for {u1, u2, …, un} in 3SAT, we get n
vertices from Vu,
i.e. uit ∈ V' if ui = T; otherwise uif ∈ V', for 1 <= i <= n
This covers all edges in Eu and at least one edge in {(xi, ai1), (yi, ai2),
(zi, ai3)} for 1 <= i <= m
From Vc, include into V' the 2 vertices of each {ai1, ai2, ai3} whose Euc
edges are not already covered, for 1 <= i <= m. These 2 vertices cover all
edges {(ai1, ai2), (ai2, ai3), (ai3, ai1)} and also cover the edges in Euc
that were not covered previously.
Example 4 : Show that the Square Packing (SP) Problem is NP-complete
(Optional)
Motivation : truck loading, the design of VLSI chips, etc.
Square Packing Problem
Instance : Given a packing square S and a set of squares L = {s1, s2, ..., sn}
to be packed.
Question : Is there an orthogonal packing of L into S?
Note : orthogonal packing : the sides of the squares are parallel to the
vertical and horizontal axes
3-Partition Problem
Instance : Given a list A = {a1, a2, ..., a3z} of 3z positive integers such that
the sum of all numbers is zB and B/4 < ai < B/2 for each 1 <= i <= 3z.
Question : Can A be partitioned into z groups such that the sum of the
numbers in each group is B? Note : the bounds force each group to have
exactly 3 numbers
Proof : Refer to the research paper
Exercises
1. Use Vertex Cover Problem to show that the CLIQUE Problem is NP-complete.
2. Use PARTITION Problem to show that Two-Processor Nonpreemptive
Schedule Length Problem is NP-complete
3. Use 3SAT to show that the SET SPLITTING Problem is NP-complete
SET SPLITTING Problem
Given a finite set S = {a1, a2, …, am} and a collection C = {s1, s2, …, sk}
where si ⊆ S, 1 <= i <= k.
Question : Is there a partition of S into two subsets S1 and S2, i.e. S1 ∩ S2 = ∅
and S1 ∪ S2 = S, such that no si is entirely contained in S1 or S2?
4. Show the following packing problems are NP-complete
a set of squares into a larger rectangle.
a set of rectangles into a larger square.
Note : You should use different reductions.
Solution to problem # 1: Show that CLIQUE problem is NP-complete
Vertex cover problem : Given an undirected graph G=(V,E) and K <= |V|.
Question : Is there a vertex cover set V' ⊆ V such that |V'| <= K?
CLIQUE problem : Given an undirected graph G=(V,E) and J <= |V|.
Question : Does G contain a set V* ⊆ V such that |V*| >= J and the vertices
in V* form a complete subgraph (clique)?
a) Show that CLIQUE is in NP
1. Guess a set of vertices V*                O(V)
2. Check |V*| >= J                           O(V)
3. Check, for each pair u,v ∈ V*,            O(E)
   u ≠ v and (u,v) ∈ E
If both (2) and (3) are OK, return "Yes"; otherwise, return "No"
b) Show that Vertex Cover ∝ CLIQUE
Given an arbitrary instance of the vertex cover problem, i.e. G=(V,E) and K,
construct an instance of the CLIQUE problem :
G'=(V,E') and J = |V| - K
for each pair u,v ∈ V with u ≠ v :
if (u,v) ∈ E then (u,v) ∉ E'
if (u,v) ∉ E then (u,v) ∈ E'
Note : G' is called the complement graph of G
The transformation can be done in O(V*V)
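The complement-graph construction can be sketched in a few lines of Python (illustrative names; edges are stored as frozensets so (u,v) and (v,u) are the same edge).

```python
def complement(V, E):
    """Build G' = (V, E'): (u,v) is in E' iff u != v and (u,v) not in E."""
    E = {frozenset(e) for e in E}
    return {frozenset((u, v)) for u in V for v in V
            if u != v and frozenset((u, v)) not in E}

# Example: a path a-b-c; its complement has the single edge (a,c)
V = {"a", "b", "c"}
E = [("a", "b"), ("b", "c")]
assert complement(V, E) == {frozenset(("a", "c"))}
```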
To show the transformation is correct, we need to prove that the Vertex Cover
problem has a solution if and only if the constructed CLIQUE problem has a
solution.
Assume there is a vertex cover set V' ⊆ V such that |V'| <= K.
Let V" = V - V'; clearly
|V"| >= |V| - K
for any pair of vertices u,v ∈ V" with u ≠ v, (u,v) ∉ E; otherwise,
u ∈ V' or v ∈ V'.
Now, consider the constructed G' and V* = V"; clearly
|V"| >= |V| - K ⇒ |V*| >= J
for any pair of vertices u,v ∈ V* with u ≠ v, (u,v) ∈ E'
⇒ V* forms a clique in G' with |V*| >= J
Assume there is a clique V* in G' such that |V*| >= J.
|V*| >= J ⇒ |V*| >= |V| - K
for any pair of vertices u,v ∈ V* with u ≠ v, (u,v) ∈ E', since V*'s
vertices form a complete subgraph
Now, consider G and V' = V - V*; clearly
|V*| >= |V| - K ⇒ |V'| <= K
for any pair of vertices u,v ∈ V* with u ≠ v, (u,v) ∉ E
for any edge (a,b) ∈ E, either a ∉ V* or b ∉ V* ⇒ a ∈ V' or b ∈ V'
⇒ V' is a vertex cover set in G with |V'| <= K
Solution to problem # 2 : Show that the Two-Processor Nonpreemptive Schedule
Length (TNSL) problem is NP-complete
PARTITION problem : Given a set Z = {a1, a2, …, an} of n positive integers.
Question : Is there a subset Z' ⊆ Z such that ∑Z' = ∑(Z-Z')?
TNSL Problem : Given 2 processors, a set J of m jobs with processing times {p1,
p2, …, pm} and a positive integer L.
Question : Is there a nonpreemptive schedule for the m jobs on 2 processors
such that the schedule length <= L?
a) Show that the TNSL problem is in NP
1. Guess a set of jobs T ⊆ J to be scheduled on the 1st processor   O(m)
2. Check ∑T <= L                                                    O(m)
3. Check ∑(J-T) <= L                                                O(m)
Note : ∑T = total processing time of all jobs in T
If both (2) and (3) are OK, return "Yes"; otherwise, return "No"
b) Show that PARTITION ∝ TNSL
Given an arbitrary instance of the PARTITION problem, i.e. Z = {a1, a2, …, an},
construct an instance of the TNSL problem :
m = n jobs
pi = ai for 1 <= i <= n
L = (∑ai)/2
The transformation can be done in O(n)
To show the transformation is correct, we need to prove that the PARTITION
problem has a solution if and only if the constructed TNSL problem has a
solution.
Assume there is a set Z' such that ∑Z' = ∑(Z - Z')
Let T ⊆ J and T' = J - T.
For each ai ∈ Z', let job i ∈ T. Clearly
∑Z' = ∑(Z - Z') = (∑ai)/2 ⇒ ∑T = ∑T' = (∑pi)/2 = L
⇒ the jobs in T and the jobs in T' can be scheduled on the 1st and 2nd
processors respectively with schedule length = L
Assume there is a schedule with schedule length <= L.
Let T and T' be the sets of jobs that are scheduled on the 1st and 2nd
processors respectively.
Clearly both processors have no idle time from 0 to L, since the
total processing time ∑pi = 2L.
i.e. ∑T = ∑T' = (∑pi)/2 = L
Let ai ∈ Z' iff job i ∈ T ⇒ ∑Z' = ∑(Z - Z')
⇒ a solution to the PARTITION problem
7.5 Coping with NP-Complete Problems
NP-hard Problems
Note : Refer to Chapter 5 of Garey and Johnson
If L' ∝ L and L' is an NP-complete problem, then L is called an NP-hard problem.
All NPC problems are NP-hard.
There are some NP-hard decision problems that are apparently not in NP.
Example : the Kth Largest Subset Problem is not known to be in NP
Instance : Given a set of positive integers A = {a1, a2, …, an}, and two
non-negative numbers B <= ∑A and K <= 2^|A|.
Question : Are there at least K distinct subsets A' ⊆ A such that each subset
has total sum <= B?
Note : PARTITION problem ∝ Kth Largest Subset Problem
Pseudo-polynomial time algorithms
Note : Refer to Chapter 4 of Garey and Johnson
Some NP-complete problems may be solved in "polynomial" time (based on input
size and magnitude).
Example : PARTITION problem
Dynamic Programming Algorithm :
let B = (sum of the n integers)/2
construct a table of size (approx.) n x B
fill in the table row by row
for each row, add a new element
mark the sums of all possible subsets
if there is a subset with sum = B, stop.
Time Complexity : O(nB)
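A minimal sketch of this dynamic program in Python (names are ours; the n x B table is kept as a single reusable row of booleans, one "row" per added element).

```python
def can_partition(Z):
    """O(n*B) dynamic program from the notes, with B = sum(Z)//2:
    reachable[s] marks that some subset of the elements seen so far sums to s."""
    total = sum(Z)
    if total % 2:                        # odd total: no equal split exists
        return False
    B = total // 2
    reachable = [True] + [False] * B     # only the empty sum 0 at the start
    for a in Z:                          # "add a new element" = one table row
        for s in range(B, a - 1, -1):    # downward, so each a is used once
            if reachable[s - a]:
                reachable[s] = True
        if reachable[B]:                 # subset with sum = B found: stop
            return True
    return reachable[B]
```

Note the running time depends on the magnitude B, not just on n, which is exactly why this is pseudo-polynomial.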
NP-completeness in the strong sense
Note : Refer to Chapter 4 of Garey and Johnson
If L is an NP-complete problem in the strong sense, then L cannot be solved by a
pseudo-polynomial time algorithm unless P=NP
L is NP-complete in the strong sense if L' ∝ L, L' is NP-complete in the
strong sense, and L is in NP
Example : the 3-Partition problem is NPC in the strong sense.
Solving more restricted problems
If we restrict the problem L to a problem L',
i.e. L' is a special/restricted case of L,
then we may be able to solve L' in polynomial time.
For example : PARTITION problem
if we assume that each input integer ai <= n, where n is the number of input
integers, then the pseudo-polynomial time algorithm becomes a polynomial time
algorithm, i.e. O(n^3)
Chapter 8 : Approximation Algorithms
8.1 Introduction
In general, a computer cannot solve NPC problems efficiently
But, many NPC problems are too important to abandon
If a problem is an NPC problem, you may try to
find a pseudo-polynomial time algorithm if it is not NPC in the strong sense
solve restricted problems
find approximation algorithms (a.k.a. heuristics; usually a simple & fast
algorithm)
Let us consider optimization problems only
An algorithm A is an approximation algorithm for a problem L if, given any valid
instance I, it finds a solution A(I) for L and A(I) is "close" to the optimal
solution OPT(I).
[sometimes it is also nice to include : if I is an invalid instance (with no
solution), then it should return "no solution"].
Approximation ratio (or bound) α of an approximation algorithm A for problem L :
A(I)/OPT(I) <= α if L is a minimization problem and A(I) >= OPT(I) > 0
OPT(I)/A(I) <= α if L is a maximization problem and OPT(I) >= A(I) > 0
When you provide an algorithm (pseudo-polynomial/polynomial/heuristic), you
need to prove that it works correctly (of course, sometimes the proof is obvious)
For a heuristic, we also need to prove the performance of the algorithm.
You don't want to give a bad approximation algorithm that sometimes gives poor
performance. Also, you don't want to give a good approximation algorithm but
show only a loose bound (i.e. not a tight bound)
8.2 Vertex Cover (VC) problem
Optimization VC Problem : Given an undirected graph G=(V, E), find a
minimum vertex cover set V' for G, i.e. V' ⊆ V such that for each (u,v) ∈ E, at
least one of u or v ∈ V'.
Approximation VC Algorithm
// Input : a graph G using adjacency lists
// Output : a vertex cover set C
C = ∅
SE = E // initially, SE = E, i.e. the adjacency lists
while ( SE ≠ ∅ ) {
    delete an arbitrary edge (u,v) from SE // **
    C = C ∪ {u, v}
    delete all edges incident to either u or v from SE
}
return C
Running time : O(V+E)
Example :
[Figure : a graph on vertices a, b, c, d, e, f, g, and three iterations (1)-(3)
of the algorithm, each picking one remaining edge and deleting all edges
incident to its two endpoints]
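A sketch of this algorithm in Python (function name ours; it uses a plain edge list rather than adjacency lists, so this sketch does not achieve the O(V+E) bound of the notes).

```python
def approx_vertex_cover(V, E):
    """Matching-based 2-approximation from the notes: repeatedly take an
    arbitrary remaining edge (u, v), add both endpoints to C, and delete
    every edge touching u or v."""
    C = set()
    SE = [tuple(e) for e in E]
    while SE:
        u, v = SE.pop()                               # arbitrary edge (**)
        C |= {u, v}
        SE = [e for e in SE if u not in e and v not in e]
    return C

# Example: the 4-cycle a-b-c-d; the optimum cover has 2 vertices
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
C = approx_vertex_cover({"a", "b", "c", "d"}, E)
assert all(u in C or v in C for u, v in E)            # C is a vertex cover
assert len(C) <= 2 * 2                                # within twice optimal
```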
Obviously, the result set C is a vertex cover set
Theorem : The above approximation VC algorithm returns a vertex cover set C
such that |C| / |C*| <= 2, where C* is an optimal (minimum) vertex cover set.
We need to show :
(a) It is easy to show that there exists a G(V,E) such that |C| / |C*| = 2.
(b) For all G(V,E), |C| / |C*| <= 2
Refer to ** in the algorithm, and consider only the set Ē of edges deleted there
No two edges in Ē share an endpoint (Ē is a matching)
Assume |Ē| = K
C contains exactly the endpoints of the edges in Ē, so |C| = 2K
a minimum vertex cover set for Ē alone needs K vertices
⇒ a minimum vertex cover set C* for G has >= K vertices
So, |C| / |C*| <= 2
[Figure : two example graphs on vertices a, b, c, d, panels (1) and (2),
illustrating part (a), where the algorithm returns |C| = 2 |C*|]
8.3 Maximum Programs Stored (PS) Problem
Optimization PS Problem : Given a set of n programs and two storage devices, let
si be the amount of storage needed to store the i-th program, and let L be the
storage capacity of each disk. Determine the maximum number of these n programs
that can be stored on the two disks (without splitting a program across the
disks).
The decision PS problem is NPC : PARTITION ∝ PS (you should try this!)
Approximation PS Algorithm
// assume programs are sorted in nondecreasing order of program size
// i.e. s1 <= s2 <= … <= sn
i = 1; c = 0; // c counts the number of stored programs
for j = 1 to 2 {
    sum = 0
    while ( i <= n and sum + si <= L ) {
        store the i-th program on the j-th device
        sum += si
        i++; c++
    }
}
return c
Example :
L = 10, si = (2, 4, 5, 6)
Disk 1 : s1, s2 (2 + 4 = 6 <= 10)
Disk 2 : s3 (5 <= 10)
C = 3, while the optimal solution stores all 4 programs (s2, s4 on one disk
and s1, s3 on the other)
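A Python sketch of the algorithm (function name ours; it sorts internally, whereas the pseudocode assumes pre-sorted input, and the `i < n` check replaces the early return).

```python
def store_programs(sizes, L):
    """Fill two disks of capacity L greedily, taking programs in
    nondecreasing order of size; returns the count c of programs stored."""
    sizes = sorted(sizes)                # s1 <= s2 <= ... <= sn
    i, c = 0, 0
    for _disk in range(2):
        room = L
        while i < len(sizes) and sizes[i] <= room:
            room -= sizes[i]             # store program i on this disk
            i += 1
            c += 1
    return c

# The example from the notes: stores 3, while the optimum stores all 4
count = store_programs([2, 4, 5, 6], 10)   # count is 3
```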
Let C* be the optimal (maximum) number of programs that can be stored on the
two disks.
The above approximation PS algorithm gives a very good performance ratio :
C* <= C + 1, i.e. C*/C <= 1 + 1/C
i.e. the algorithm stores at most 1 program fewer than the optimal solution
Theorem : The above approximation PS algorithm returns a number C such that
C* <= C + 1, where C* is the optimal value.
(a) It is easy to show that there exist {s1, s2, …, sn} and L such that
C* = C + 1
The above example gave C* = C + 1
(b) For all {s1, s2, …, sn} and L, C* <= C + 1
Let us consider only one disk with capacity 2L.
It is obvious that we can store the maximum number of programs on this disk by
considering the programs in the order s1 <= s2 <= … <= sn
Let p be the maximum number of programs that can be stored on this single disk
Clearly p >= C* and s1 + s2 + … + sp <= 2L   (i)
Let j be the index such that
(s1 + s2 + … + sj) <= L and (s1 + s2 + … + sj+1) > L   (ii)
Obviously j <= p, and programs 1 through j are stored on the 1st disk by the
above approximation algorithm
By (i) & (ii), (sj+2 + sj+3 + … + sp) <= L
⇒ (sj+1 + sj+2 + … + sp-1) <= L, since the sizes are nondecreasing
⇒ at least the (j+1)-th, (j+2)-th, …, (p-1)-th programs fit on the 2nd disk,
so they are stored there by the above approximation algorithm
⇒ C >= j + (p-1-j) = p - 1 >= C* - 1. Done!
8.4 N-Processor Nonpreemptive Schedule Length Problem (Optional)
Given a set of n tasks and m processors, produce a non-preemptive schedule with
minimum schedule length.
The decision problem can easily be proved to be NP-complete.
Let us consider the following approximation LPT (largest processing time first)
algorithm :
Whenever a processor becomes free for assignment, assign to it the
task with the largest execution time among the available tasks.
Example :
T1 T2 T3 T4 T5 T6
10  8  7  5  3  1
Given m = 2 processors
The LPT rule gives :
0 8 10 15 16 18
--------------------------------------------------------------------------
P1 | T1 | T4 | T5 |
--------------------------------------------------------------------------
P2 | T2 | T3 | T6 |
--------------------------------------------------------------------------
P1 = {T1,T4,T5} and P2 = {T2,T3,T6} , SL = 18
Optimal schedule :
P1 = {T1,T3} and P2 = {T2,T4,T5,T6} , SL = 17
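The LPT rule can be sketched with a min-heap of processor free times (a sketch for identical processors; the function name is ours).

```python
import heapq

def lpt_schedule_length(times, m):
    """LPT: sort tasks by decreasing execution time and always assign the
    next task to the processor that becomes free first; returns the
    schedule length."""
    free = [0] * m                       # min-heap of processor free times
    heapq.heapify(free)
    for t in sorted(times, reverse=True):
        heapq.heapreplace(free, free[0] + t)   # give t to earliest-free proc
    return max(free)

# The example from the notes: SL(LPT) = 18 while SL(OPT) = 17
sl = lpt_schedule_length([10, 8, 7, 5, 3, 1], 2)   # sl is 18
```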
Theorem : SL(LPT)/SL(OPT) <= 4/3 - 1/(3m), i.e. the performance of LPT is at
most 33% worse than the optimal solution
Need to show :
(a) There exists a task set TS such that SL(LPT)/SL(OPT) = 4/3 - 1/(3m)
Consider the following 2m+1 tasks (for m an even number); the execution
times of the tasks are :
e(T2i-1) = e(T2i) = 2m-i for 1 <= i <= m
and e(T2m+1) = m.
Total execution time of the 2m+1 tasks :
e(T1) = e(T2) = 2m-1
e(T3) = e(T4) = 2m-2
: :
e(T2m-1) = e(T2m) = 2m - m = m
e(T2m+1) = m
Total = 2((2m-1) + (2m-2) + ... + (2m-m)) + m
      = 2(2m*m - (1+2+...+m)) + m
      = 4m^2 - m(m+1) + m
      = 3m^2
LPT rule produces the following schedule : SL(LPT) = 4m-1
0 3m-1 4m-1
| | |
______________________________________________________
p1 | T1 | T2m | T2m+1 |
p2 | T2 | T2m-1 |/////////////
p3 | T3 | T2m-2 |/////////////
p4 | T4 | T2m-3 |/////////////
: :
pi | Ti | T2m-(i-1) |/////////////
: :
pm-1 | Tm-1 | Tm+2 |/////////////
pm | Tm | Tm+1 |/////////////
Optimal schedule : SL(OPT) = 3m
0 3m
| |
______________________________________________________
p1 | T1 | T2m-2 |
p2 | T2 | T2m-3 |
p3 | T3 | T2m-4 |
p4 | T4 | T2m-5 |
: :
pi | Ti | T2m-(i+1) |
: :
pm-1 | Tm-1 | Tm |
pm | T2m-1 | T2m | T2m+1 |
We have :
SL(LPT)
------------ = (4m-1)/3m = 4/3 - 1/3m
SL(OPT)
(b) For any set of tasks TS, SL(LPT)/SL(OPT) <= 4/3 - 1/(3m)
For m = 1, LPT is optimal
Let m >= 2; the proof is by contradiction
Assume that SL(LPT)/SL(OPT) <= 4/3 - 1/(3m) is not always true.
Let TS = {T1, T2, ..., Tn} be the smallest set of tasks (least number of tasks)
that violates the bound, i.e. SL(LPT, TS)/SL(OPT, TS) > 4/3 - 1/(3m).
WLOG, assume e(T1) >= e(T2) >= ... >= e(Tn)
Goal : to show that this TS has SL(LPT, TS)/SL(OPT, TS) <= 4/3 - 1/(3m) ⇒
contradiction!
Let f(Tx) and s(Tx) be the finishing time and starting time of Tx ∈ TS in the
LPT schedule.
Claim 1 : f(Tn) = SL(LPT, TS) and f(Ti) < SL(LPT, TS) for 1 <= i <= n-1.
Proof :
if this is not true, then there is a task Tk, k < n, such that
f(Tk) = SL(LPT, TS).
Consider the set of the first k tasks of TS, TS' = {T1, T2, ..., Tk}; it is
clear that
SL(LPT, TS') >= SL(LPT, TS) and SL(OPT, TS') <= SL(OPT, TS)
⇒ SL(LPT, TS')/SL(OPT, TS') >= SL(LPT, TS)/SL(OPT, TS) > 4/3 - 1/(3m)
⇒ TS' is a smaller set of tasks violating the bound, contradicting our
assumption that TS is the smallest such set.
Therefore, claim 1 must be true.
Claim 2 : In an optimal schedule of TS, no processor executes more than 2
tasks, i.e. SL(OPT, TS) < 3 e(Tn)
Proof : Let P = e(T1) + e(T2) + … + e(Tn)
SL(OPT, TS) >= P/m   (i)
s(Tn) <= [e(T1) + e(T2) + … + e(Tn-1)] / m   (ii)
SL(LPT, TS)
= s(Tn) + e(Tn)   from claim 1
<= [e(T1) + e(T2) + … + e(Tn-1)] / m + e(Tn)   by (ii)
= P/m + (1 - 1/m) e(Tn)   (iii)
SL(LPT, TS) / SL(OPT, TS)
<= [P/m + (1 - 1/m) e(Tn)] / SL(OPT, TS)   by (iii)
= (P/m) / SL(OPT, TS) + [(1 - 1/m) e(Tn)] / SL(OPT, TS)
<= 1 + [(1 - 1/m) e(Tn)] / SL(OPT, TS)   by (i)
Since SL(LPT, TS)/SL(OPT, TS) > 4/3 - 1/(3m), we have
1 + [(1 - 1/m) e(Tn)] / SL(OPT, TS) > 4/3 - 1/(3m)
⇒ [(m-1) e(Tn)] / (m SL(OPT, TS)) > (m-1)/(3m)
⇒ 3 e(Tn) > SL(OPT, TS)
So, the optimal schedule length is less than 3 times the smallest execution
time e(Tn) ⇒ at most 2 tasks per processor (3 tasks would take >= 3 e(Tn))
Now that we have claim 1 and claim 2, we want to transform the optimal schedule
S into a schedule S' such that the schedule length never increases,
i.e. SL(S') <= SL(S).
Finally, we show that S' is an LPT schedule, i.e. SL(S') = SL(LPT).
This is the contradiction we sought, since we assumed
SL(LPT)/SL(OPT) > 4/3 - 1/(3m) but this shows
SL(LPT)/SL(OPT) <= 1.
Let us consider the following transformation operations :
Type I : Swap the positions of Tj and Ti on the same processor
P' | Ti | Tj |   where e(Tj) > e(Ti)
P' | Tj | Ti |   note : no change in SL
Type II : Move Tu to an earlier starting time
P' | Ti | Tu |
P" | Tj |        where e(Ti) > e(Tj)
P' | Ti |
P" | Tj | Tu |   note : no increase in SL
Type III : Swap the positions of Tu and Tv
P' | Ti | Tu |
P" | Tj | Tv |   where e(Ti) > e(Tj) and e(Tu) > e(Tv)
P' | Ti | Tv |
P" | Tj | Tu |   note : no increase in SL
Start with the optimal schedule S (note : at most 2 tasks on each processor)
Apply these three transformations exhaustively until none can be applied.
Note : all intermediate schedules are also optimal schedules, since the
schedule length does not increase
Rearrange all the processors according to their 1st task as follows :
All processors with one task are arranged as P1 to Ps
with e(Tk1) >= e(Tk2) >= ... >= e(Tks), and
All processors with two tasks are arranged as Ps+1 to Ps+t
with e(Ti1) >= e(Ti2) >= ... >= e(Tit)
P1 | Tk1 |/////////////////
P2 | Tk2 |////////////////////
: :
: :
Ps | Tks |////////////////////////
Ps+1 | Ti1 | Tj1 |///////////////
Ps+2 | Ti2 | Tj2 |/////////////////
: :
: :
Ps+t | Tit | Tjt |////////////////////
Since no type I operation applies ⇒ e(Tiz) >= e(Tjz) for 1 <= z <= t
Since no type II operation applies ⇒ e(Tks) >= e(Ti1)
Since no type III operation applies ⇒ e(Tjt) >= e(Tjt-1) >= ... >= e(Tj1)
⇒ e(Tk1) >= e(Tk2) >= ... >= e(Tks) >=
e(Ti1) >= e(Ti2) >= ... >= e(Tit) >=
e(Tjt) >= e(Tjt-1) >= ... >= e(Tj1)
Is this an LPT schedule? Yes, except in the following case :
P' | Ti' | Tj' |
P" | Ti" | Tj" |
where e(Ti') >= e(Ti") >= e(Tj") >= e(Tj') and e(Ti') > e(Ti") + e(Tj")
then LPT would look like
P' | Ti' |
P" | Ti" | Tj" | Tj' |
This cannot happen in the new final schedule by claim 2.
⇒ this new final schedule S' has SL(S') <= SL(S), and S' is an LPT schedule;
therefore
SL(LPT, TS) <= SL(OPT, TS) ⇒ contradiction to our assumption.
Note : It can be shown that at most a finite number of type I, II and III
operations are needed to transform S into S'.
8.5 Traveling Salesperson Problem
Not all NPC problems have polynomial time approximation algorithms!
Recall : Traveling Salesman (TS) Problem
Given an undirected complete graph G=(V, E) with distance d(i,j) >= 0 for each
edge (i,j), i ≠ j, find a tour of all cities (a simple cycle with all vertices)
with minimum total distance.
If P ≠ NP, then there is no polynomial time approximation algorithm with bound
α, where α >= 1 is any constant, for the TS problem.
Assume there is a polynomial time approximation algorithm A for the TS
problem with integer bound α.
We would like to show that there is then a polynomial time algorithm to solve
the Hamiltonian Circuit problem ⇒ P = NP ⇒ contradiction!
We prove this by using algorithm A to solve the HC problem
Let G=(V, E) be an instance of the HC problem
Construct G' = (V, E') for TS in polynomial time as follows :
(u,v) ∈ E' for all u, v ∈ V with u ≠ v
d(u,v) = α|V| + 1 if (u,v) ∉ E
d(u,v) = 1 if (u,v) ∈ E
If algorithm A returns a tour with total distance <= α|V|
it must have cost exactly |V|, since the weight of each edge is either 1 or
α|V| + 1, and a tour has |V| edges
⇒ it is a HC in G
If algorithm A returns a tour with total distance > α|V|
since the bound is α, the optimal solution must be > |V|
⇒ every tour contains at least one edge with cost α|V| + 1
⇒ there is no solution to the HC problem.
Therefore, algorithm A can solve the HC problem.
Note : With some restrictions, you may still find an approximation algorithm
for the TS problem.
If d(u,w) <= d(u,v) + d(v,w) for any three vertices u, v, w ∈ V (the triangle
inequality), then there is a polynomial time approximation algorithm (α = 2)
for the TS problem.
Skip the algorithm; just an example :
[Figure : (i) a minimum spanning tree (MST) of a graph on vertices a, b, c, d,
e, f, g, h; (ii) the tour obtained from a preorder walk ordering of the MST
in (i)]
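The (α = 2) algorithm illustrated above, build an MST and then visit the vertices in preorder, can be sketched in Python (all names ours; Prim's algorithm for the MST, with the distance given as a function).

```python
import heapq

def mst_preorder_tour(V, d):
    """2-approximation for metric TSP: build an MST (Prim's algorithm),
    then output the vertices in a preorder walk of the tree. The triangle
    inequality bounds the resulting tour by 2 * OPT."""
    V = list(V)
    root = V[0]
    in_tree, children = {root}, {v: [] for v in V}
    pq = [(d(root, v), root, v) for v in V if v != root]
    heapq.heapify(pq)
    while len(in_tree) < len(V):
        w, u, v = heapq.heappop(pq)
        if v in in_tree:                 # stale entry, skip
            continue
        in_tree.add(v)
        children[u].append(v)            # (u, v) becomes a tree edge
        for x in V:
            if x not in in_tree:
                heapq.heappush(pq, (d(v, x), v, x))
    tour, stack = [], [root]             # preorder walk of the MST
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour

# Example: 4 points on a line, with distance = absolute difference
pts = {"a": 0, "b": 1, "c": 2, "d": 3}
tour = mst_preorder_tour(pts, lambda u, v: abs(pts[u] - pts[v]))
```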