
DAA Concepts




UNIT I INTRODUCTION

1. Algorithm

An algorithm is a sequence of unambiguous instructions for solving a problem. It is a finite set of instructions that, if followed, accomplishes a particular task.

2. Algorithm Properties/characteristics

An algorithm must satisfy the following five criteria:
- Input
- Output
- Definiteness
- Finiteness
- Effectiveness

3. Fundamentals of Algorithmic Problem Solving

1. Understanding the problem
2. Ascertaining the capabilities of a computational device
3. Choosing between exact and approximate problem solving
4. Deciding on an appropriate data structure
5. Algorithm design techniques
6. Methods of specifying an algorithm
7. Proving an algorithm's correctness
8. Analysing an algorithm
9. Coding an algorithm

4. Important Problem Types

Sorting, searching, string processing, graph problems, combinatorial problems, geometric problems, and numerical problems.

5. Fundamentals of the Analysis of Algorithm Efficiency

a. General characteristics
b. Measuring input size
c. Units for measuring running time
d. Orders of growth
e. Worst-case, best-case, and average-case efficiencies
f. Asymptotic notations

6. Running time measurement

Running time cannot be meaningfully measured in seconds, milliseconds, and so on. The reasons are:

- dependence on the speed of a particular computer
- dependence on the quality of the program implementing the algorithm
- dependence on the compiler used in generating the machine code
- the difficulty of clocking the actual running time.

7. Basic Operation

The basic operation is the operation contributing the most to the total running time; the analysis counts the number of times the basic operation is executed.


Basic operation is usually the most time-consuming operation in the algorithm’s inner loop.

T(n) ≈ c_op · C(n), where c_op is the execution time of the basic operation and C(n) is the number of times it is executed. Examples of basic operations: comparison in sorting; multiplication and addition in matrix multiplication.
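For illustration (not part of the original notes), a short Python sketch of this estimate, with an assumed basic-operation time of 1 ns and C(n) = n(n-1)/2:

def running_time_estimate(n, c_op=1e-9):
    """Estimated running time T(n) ~ c_op * C(n), assuming a 1 ns basic operation."""
    count = n * (n - 1) / 2      # number of basic-operation executions
    return c_op * count

# Doubling the input size roughly quadruples the estimate, because
# C(2n)/C(n) = (2n)(2n-1) / (n(n-1)) approaches 4 for large n.
print(running_time_estimate(1000))   # ~5e-4 s
print(running_time_estimate(2000))   # ~2e-3 s, about 4x larger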

8. Worst-Case efficiency of an algorithm

The worst-case efficiency is the maximum number of steps that an algorithm can take for any collection of data values. It is the algorithm's efficiency for the worst-case input of size n, an input of size n for which the algorithm runs the longest among all possible inputs of that size.

Example: sequential search, where the searched value is at the end of the list or is not present at all.

9. Best-Case efficiency of an algorithm

The best-case efficiency is the minimum number of steps that an algorithm can take for any collection of data values. It is the algorithm's efficiency for the best-case input of size n, an input of size n for which the algorithm runs the fastest among all possible inputs of that size.

Example: sequential search, where the searched value is at the first position. The algorithm makes the smallest number of comparisons: C_best(n) = 1 ∈ O(1).

10. Average-Case efficiency of an algorithm

The average-case efficiency cannot, in general, be obtained by taking the average of the worst-case and best-case efficiencies (although occasionally that average happens to be correct). It is the efficiency averaged over all possible inputs, under some assumptions about the possible inputs of size n.

Example: sequential search, where the searched value is, on average, near the middle position; the algorithm makes an average number of comparisons.

11. Asymptotic Notations

The efficiency analysis framework concentrates on the order of growth of an algorithm’s basic operation count as the principal indicator of the algorithm’s efficiency.

To compare and rank such orders of growth, three notations are used: O (big oh), Ω (big omega), and Θ (big theta).

12. O (big oh) notation

A function f(n) = O(g(n)) (that is, f(n) ∈ O(g(n))) iff there exist positive constants c and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀.

13. Ω (big omega) notation

A function f(n) = Ω(g(n)) (that is, f(n) ∈ Ω(g(n))) iff there exist positive constants c and n₀ such that f(n) ≥ c·g(n) for all n ≥ n₀.

14. Θ (big theta) notation

A function f(n) = Θ(g(n)) (that is, f(n) ∈ Θ(g(n))) iff there exist positive constants c₁, c₂ and n₀ such that c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀.
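A short worked example (added for illustration): to verify that 100n + 5 ∈ O(n²), note that 100n + 5 ≤ 100n + n = 101n ≤ 101n² for all n ≥ 5, so the definition is satisfied with c = 101 and n₀ = 5.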

15. Steps in analyzing efficiency of non-recursive algorithms

1. Decide on a parameter indicating an input’s size


2. Identify the basic operation (usually located in the innermost loop).
3. Check whether the number of times the basic operation is executed depends only on the size of an input. If it also depends on some additional property, the worst-case, average-case, and best-case efficiencies have to be investigated separately.
4. Set up a sum expressing the number of times the algorithm's basic operation is executed.
5. Using standard formulas and rules of sum manipulation, either find a closed-form formula for the count or, at the very least, establish its order of growth.

16. Steps for analyzing efficiency of recursive algorithms

1. Decide on a parameter indicating an input's size.
2. Identify the algorithm's basic operation.
3. Check whether the number of times the basic operation is executed can vary on different inputs of the same size. If it can, investigate the worst, best, and average cases separately.
4. Set up a recurrence relation, with an appropriate initial condition, for the number of times the basic operation is executed.
5. Solve the recurrence or at least ascertain the order of growth of its solution.

17. Time Efficiency
Time efficiency is measured by counting the number of times the algorithm's basic operation is executed.

18. Space Efficiency
Space efficiency is measured by counting the number of extra memory units consumed by the algorithm.

19. Need for case efficiencies
The efficiencies of some algorithms may differ significantly for inputs of the same size. For such algorithms, the case (best, worst, average) efficiencies are needed.

20. Need for algorithm analysis framework
The framework's primary interest lies in the order of growth of the algorithm's running time as its input size goes to infinity.

21. Efficiency classes
The efficiencies of a large number of algorithms fall into the following few classes: constant, logarithmic, linear, "n-log-n", quadratic, cubic, and exponential.

22. Main tool for analysing the time efficiency of a non-recursive algorithm
Setting up a sum expressing the number of executions of its basic operation and ascertaining the sum's order of growth.

23. Main tool for analysing the time efficiency of a recursive algorithm
Setting up a recurrence expressing the number of executions of its basic operation and ascertaining the solution's order of growth.

24. Brute Force Method
Brute force is a straightforward approach to solving a problem, usually directly based on the problem's statement and definitions of the concepts involved. Examples: matrix multiplication, selection sort, sequential search.


25. Strength & Weakness of Brute Force Method Strength Wide applicability and simplicity Weakness Efficiency 26. Exhaustive Search Exhaustive search is a brute force approach to combinatorial problems. It suggests generating each and every combinatorial object of the problem, selecting those of them that satisfy the problem’s constraints and then finding a desired object. This search is impractical for large problems.

UNIT II DIVIDE AND CONQUER METHODOLOGY

27. What is Divide and conquer methodology?

1. A problem instance is divided into several smaller instances of the same problem, ideally of about the same size.

2. The smaller instances are solved (typically recursively though sometimes a different algorithm is employed when instances become small enough)

3. If necessary, the solutions obtained for the smaller instances are combined to get a solution to the original problem.

Algorithm DANDC(P)
    if SMALL(P) then
        return SOLUTION(P)
    else
        divide P into smaller instances P1, P2, ..., Pk, k ≥ 1
        apply DANDC to each of these subproblems
        return COMBINE(DANDC(P1), DANDC(P2), ..., DANDC(Pk))

28. Computing time for finding the sum of n numbers
The time efficiency T(n) of many divide-and-conquer algorithms satisfies the equation T(n) = aT(n/b) + f(n).

T(n) = T(1)               for n = 1
T(n) = aT(n/b) + f(n)     for n > 1

Let a = b = 2, T(1) = 2, f(n) = n:
T(n) = 2T(n/2) + n
     = 2(2T(n/4) + n/2) + n = 4T(n/4) + 2n
     = 4(2T(n/8) + n/4) + 2n = 8T(n/8) + 3n
     ...
     = 2^i T(n/2^i) + i·n     for 1 ≤ i ≤ log₂n

If i = log₂n, then (since n/2^(log₂n) = 1 and 2^(log₂n) = n)
T(n) = nT(1) + n·log₂n = n·log₂n + 2n.

29. Merge Sort
A merge sort works as follows:

1. If the list is of length 0 or 1, then it is already sorted. Otherwise:
2. Divide the unsorted list into two sublists of about half the size.
3. Sort each sublist recursively by re-applying merge sort.
4. Merge the two sublists back into one sorted list.
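An illustrative Python sketch of these four steps (not the notes' own pseudocode):

def merge_sort(a):
    """Sort a list using the divide-and-conquer scheme above."""
    if len(a) <= 1:                 # step 1: lists of length 0 or 1 are sorted
        return a
    mid = len(a) // 2               # step 2: divide into two halves
    left = merge_sort(a[:mid])      # step 3: sort each half recursively
    right = merge_sort(a[mid:])
    return merge(left, right)       # step 4: merge the sorted halves

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # one key comparison per iteration
            result.append(left[i]); i += 1
        else:
            result.append(right[j]); j += 1
    result.extend(left[i:])         # append the non-empty remainder
    result.extend(right[j:])
    return result

print(merge_sort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]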

Data structure used: Array
Worst-case performance: Θ(n log n)
Best-case performance: Θ(n log n)
Average-case performance: Θ(n log n)
Worst-case space complexity: Θ(n)

Its principal drawback is the significant extra storage requirement.

30. Computing time for Merge sort

a n=1 a & c are constants T(n) = 2T(n/2) + cn n>1 When n is a power of 2, n=2k [k = log2n] T(n) = 2(2T(n/4) + c(n/2)) + cn = 4T(n/4) + 2cn = 4(2T(n/8) + c(n/4) + 2cn …………….. ……………. = 2kT(1) +kcn = an +c(nlogn) T(n) = O(nlogn) for Worst & Best case 31. Recurrence relation for the number of key comparisons in Merge sort

C(n) = 0                            for n = 1
C(n) = 2C(n/2) + C_merge(n)         for n > 1

C_merge(n) is the number of key comparisons performed during the merging stage. In the worst case, neither of the two arrays becomes empty before the other one contains just one element, so C_merge(n) = n − 1.

Worst case:
C_worst(n) = 0                      for n = 1
C_worst(n) = 2C_worst(n/2) + n − 1  for n > 1


If n = 2^k, then C_worst(n) = n·log₂n − n + 1 and T(n) = O(n log n).

Best case:
C_best(n) = 0                       for n ≤ 1
C_best(n) = 2C_best(n/2) + n/2      for n > 1

32. Disadvantages of Merge sort

1. Extra space (an auxiliary array of size n) is needed during the merge process.
2. Stack space is increased by the use of recursion; the maximum depth of the stack is log n.
3. Time is spent on recursion instead of sorting.

33. Quick Sort
Quicksort sorts by employing a divide-and-conquer strategy to divide a list into two sub-lists. The steps are:

1. Pick an element, called a pivot, from the list.
2. Reorder the list so that all elements less than the pivot come before the pivot and all elements greater than the pivot come after it (equal values can go either way). After this partitioning, the pivot is in its final position. This is called the partition operation.
3. Recursively sort the sub-list of lesser elements and the sub-list of greater elements.

Quicksort works by partitioning its input's elements according to their value relative to some preselected element, called the pivot element. The elements to the left of the pivot should be less than the pivot element and the elements to the right of the pivot should be greater than the pivot element.

Worst-case performance: Θ(n²)
Best-case performance: Θ(n log n)
Average-case performance: Θ(n log n) comparisons
Worst-case space complexity: varies by implementation
Optimal result: sometimes

34. Binary Search
Binary search is an O(log n) algorithm for searching in sorted arrays. It searches a sorted array by repeatedly dividing the search interval in half: begin with an interval covering the whole array; if the value of the search key is less than the item in the middle of the interval, narrow the interval to the lower half, otherwise narrow it to the upper half; repeat until the value is found or the interval is empty.

Data structure: Array
Worst-case performance: O(log n)
Best-case performance: O(1)
Average-case performance: O(log n)
Worst-case space complexity: O(1)
Optimal: Yes
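An iterative Python sketch of this interval-halving search (illustrative only):

def binary_search(a, key):
    """Return an index of key in sorted list a, or -1 if absent."""
    low, high = 0, len(a) - 1
    while low <= high:            # the interval a[low..high] is non-empty
        mid = (low + high) // 2
        if a[mid] == key:
            return mid
        elif key < a[mid]:
            high = mid - 1        # narrow to the lower half
        else:
            low = mid + 1         # narrow to the upper half
    return -1                     # interval became empty: not found

print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
print(binary_search([1, 3, 5, 7, 9, 11], 4))   # -1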

35. Worst-case analysis of binary search
The number of key comparisons in the worst case is C_w(n).
- Worst case: there is no match, or the first or last remaining element matches the key.
- After one comparison, an array of half the size is considered, so the recurrence for C_w(n) is

  C_w(n) = 1                      for n = 1
  C_w(n) = C_w(⌊n/2⌋) + 1         for n > 1

- The standard way of solving such recurrences is to assume n = 2^k (n is a power of 2) and solve by backward substitution, which gives C_w(n) = log₂n + 1 (in general, ⌊log₂n⌋ + 1 = ⌈log₂(n+1)⌉).

To prove that C_w(n) = log₂n + 1 satisfies the recurrence, consider positive even n = 2i, i > 0:
LHS: C_w(n) = log₂n + 1 = log₂(2i) + 1 = log₂2 + log₂i + 1 = log₂i + 2
RHS: C_w(⌊n/2⌋) + 1 = C_w(i) + 1 = (log₂i + 1) + 1 = log₂i + 2


LHS = RHS, so the formula holds. The worst-case efficiency is O(log n).

36. Multiplication Of Large Integers
- The idea is to decrease the total number of multiplications performed at the expense of a slight increase in the number of additions.
- Modern cryptology requires manipulation of integers that are over 100 decimal digits long. Such integers are too long to fit in a single word of a modern computer, so they require special treatment.

The divide-and-conquer technique to multiply two n-digit numbers reduces the number of multiplications. Representation:

c = a * b, where a is the first number and b the second, each of n digits.
Split each number into halves: a = a₁a₀ = a₁·10^(n/2) + a₀ and b = b₁b₀ = b₁·10^(n/2) + b₀. Then

c = (a₁·10^(n/2) + a₀)(b₁·10^(n/2) + b₀)
  = (a₁b₁)·10ⁿ + (a₁b₀ + a₀b₁)·10^(n/2) + (a₀b₀)     [since 10^(n/2)·10^(n/2) = 10ⁿ]

Formula: c = c₂·10ⁿ + c₁·10^(n/2) + c₀, where
c₂ = a₁·b₁
c₀ = a₀·b₀
c₁ = (a₁ + a₀)·(b₁ + b₀) − (c₂ + c₀)
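A runnable Python sketch of this three-multiplication scheme (an illustration using Python integers; the function name is mine):

def multiply(a, b):
    """Divide-and-conquer multiplication of non-negative integers using
    three recursive multiplications: c1 = (a1+a0)(b1+b0) - c2 - c0."""
    if a < 10 or b < 10:                  # one-digit case: multiply directly
        return a * b
    n = max(len(str(a)), len(str(b)))
    half = n // 2
    p = 10 ** half
    a1, a0 = divmod(a, p)                 # a = a1*10^half + a0
    b1, b0 = divmod(b, p)                 # b = b1*10^half + b0
    c2 = multiply(a1, b1)
    c0 = multiply(a0, b0)
    c1 = multiply(a1 + a0, b1 + b0) - (c2 + c0)
    return c2 * 10 ** (2 * half) + c1 * p + c0

print(multiply(2135, 4014))   # 8569890, same as 2135 * 4014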

37. How many digit multiplications does the algorithm for multiplying two large integers make?
- Multiplication of n-digit numbers requires 3 multiplications of n/2-digit numbers.
- The recurrence relation for the number of multiplications is

  M(n) = 1          for n = 1
  M(n) = 3M(n/2)    for n > 1

By backward substitution for n = 2^k:
M(2^k) = 3M(2^(k−1)) = 3(3M(2^(k−2))) = 3²M(2^(k−2)) = ... = 3^i M(2^(k−i)) = ... = 3^k M(2^(k−k)) = 3^k

Since k = log₂n,
M(n) = 3^(log₂n) = n^(log₂3) ≈ n^1.585     [by the property of logarithms a^(log_b c) = c^(log_b a)]

The divide-and-conquer algorithm for multiplying two n-digit integers thus requires about n^1.585 one-digit multiplications.


38. Strassen's Algorithm
Strassen's algorithm needs only seven multiplications to multiply two 2-by-2 matrices but requires more additions than the definition-based algorithm. By exploiting the divide-and-conquer technique, this algorithm can multiply two n-by-n matrices with about n^2.807 multiplications.

39. Strassen’s method

( A11 A12 )   ( B11 B12 )   ( C11 C12 )
( A21 A22 ) * ( B21 B22 ) = ( C21 C22 )

C11 = m1 + m4 − m5 + m7
C12 = m3 + m5
C21 = m2 + m4
C22 = m1 + m3 − m2 + m6

where
m1 = (A11 + A22)(B11 + B22)
m2 = (A21 + A22)B11
m3 = A11(B12 − B22)
m4 = A22(B21 − B11)
m5 = (A11 + A12)B22
m6 = (A21 − A11)(B11 + B12)
m7 = (A12 − A22)(B21 + B22)

This method needs 7 multiplications and 18 additions or subtractions:
- 7 multiplications in m1, m2, ..., m7
- 10 additions or subtractions in m1, m2, ..., m7
- 8 additions or subtractions for computing the Cij's
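A direct Python transcription of these 2-by-2 formulas (illustrative; the entries here are numbers, though in the recursive algorithm they would themselves be submatrices):

def strassen_2x2(A, B):
    """Multiply two 2-by-2 matrices with Strassen's 7 multiplications.
    A and B are given as [[a11, a12], [a21, a22]]."""
    m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1])
    m2 = (A[1][0] + A[1][1]) * B[0][0]
    m3 = A[0][0] * (B[0][1] - B[1][1])
    m4 = A[1][1] * (B[1][0] - B[0][0])
    m5 = (A[0][0] + A[0][1]) * B[1][1]
    m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1])
    m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1])
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 + m3 - m2 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]], matching the definition-based product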

- Recurrence relation:

  T(n) = b                 for n ≤ 2
  T(n) = 7T(n/2) + an²     for n > 2

40. How many multiplications M(n) are made by Strassen's method?

Recurrence relation:

M(n) = 1          for n = 1
M(n) = 7M(n/2)    for n > 1

Let n = 2^k:
M(n) = 7M(2^(k−1)) = 7(7M(2^(k−2))) = 7²M(2^(k−2)) = ... = 7^i M(2^(k−i)) = ... = 7^k M(2^(k−k)) = 7^k M(1) = 7^k

Since k = log₂n,
M(n) = 7^(log₂n) = n^(log₂7)     (by the logarithm property)
     ≈ n^2.807, which is smaller than n³.

41. Strassen's Matrix Multiplication for matrices of order n > 2
Consider 4×4 matrices X and Y, and partition each into four 2×2 blocks:

    ( A B )        ( R S )
X = ( C D )    Y = ( T U )

Find Z = X × Y.

Method I: multiply block-wise by the definition,

    ( AR+BT  AS+BU )
Z = ( CR+DT  CS+DU )

applying Strassen's method to compute the block products AR, BT, AS, BU, CR, DT, CS, DU.

Method II: apply Strassen's method to X and Y directly at the block level:

M1 = (A+D)(R+U)      L = M1 + M4 − M5 + M7
M2 = (C+D)R          M = M3 + M5
M3 = A(S−U)          N = M2 + M4
M4 = D(T−R)          O = M1 + M3 − M2 + M6
M5 = (A+B)U
M6 = (C−A)(R+S)
M7 = (B−D)(T+U)

    ( L M )
Z = ( N O )

Adding zeroes to matrices whose order is not a power of two: for Z = X × Y with X a 3×4 matrix and Y a 4×3 matrix, add a zero row as the last row of X and a zero column as the last column of Y (or, alternatively, a zero first row and first column) so that both become 4×4 matrices X′ and Y′; then compute Z = X′ × Y′ by Strassen's method.

GREEDY METHOD

42. What is the Greedy method?
- The greedy approach suggests constructing a solution through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached.
- On each step the choice made must be:
  - Feasible: it has to satisfy the problem's constraints.
  - Locally optimal: it has to be the best local choice among all feasible choices available on that step.
  - Irrevocable: once made, it cannot be changed on subsequent steps of the algorithm.

Algorithm GREEDY(a,n)


    solution = Ø
    for i = 1 to n do
        x = SELECT(a)
        if FEASIBLE(solution, x) then
            solution = UNION(solution, x)
    return solution

43. Spanning Tree
A spanning tree of a connected graph is its connected acyclic subgraph (i.e., a tree) that contains all the vertices of the graph.

44. Minimum Spanning Tree
A minimum spanning tree of a weighted connected graph is its spanning tree of the smallest weight, where the weight of a tree is defined as the sum of the weights on all its edges.

45. Prim's algorithm
Prim's algorithm is a greedy algorithm for constructing a minimum spanning tree of a weighted connected graph. It works by attaching to a previously constructed subtree a vertex closest to the vertices already in the tree.

Prim's algorithm is an algorithm that finds a minimum spanning tree for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized.

46. Steps-Prim’s algorithm

1. consider a vertex as the initial vertex for the minimum spanning tree 2. shortest edge from the initial vertex is selected

a. to find the shortest edge the information about the nearest vertices are provided in the following way

i. name of the nearest vertices and length(weight) of the corresponding edge. This is Fringe. Fringe contains only the vertices that are not in the tree but are adjacent to at least one tree vertex. These are candidates from which the next tree vertex is selected.

ii. Vertices that are not adjacent to any of the tree vertex can be given a ∞ label. This is Unseen. All the other vertices of the graph, is called Unseen, because they are yet to be affected by the algorithm.

3. After identifying a vertex u* to be added to the tree, perform 2 operations a. move u* from the set V-VT to the set of tree vertices VT b. for each remaining vertex u in V-VT that is connected to u* by a shorter edge than

the u’s current distance label, update its labels by u* and weight of the edge between u* & u, respectively.
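An illustrative Python sketch of Prim's algorithm (not from the notes) using a min-heap as the priority queue, in the spirit of the O(E log n) implementation discussed below:

import heapq

def prim(graph, start):
    """Minimum spanning tree of a connected weighted graph.
    graph: dict mapping vertex -> list of (weight, neighbor) pairs."""
    in_tree = {start}
    fringe = [(w, start, v) for w, v in graph[start]]   # candidate edges
    heapq.heapify(fringe)
    mst = []                                  # edges of the growing tree
    while fringe:
        w, u, v = heapq.heappop(fringe)       # cheapest fringe edge
        if v in in_tree:
            continue                          # v was attached earlier; skip
        in_tree.add(v)
        mst.append((u, v, w))
        for w2, x in graph[v]:                # new candidates from v
            if x not in in_tree:
                heapq.heappush(fringe, (w2, v, x))
    return mst

g = {'a': [(3, 'b'), (5, 'c')],
     'b': [(3, 'a'), (1, 'c')],
     'c': [(5, 'a'), (1, 'b')]}
print(prim(g, 'a'))   # [('a', 'b', 3), ('b', 'c', 1)]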

47. Implementation of Prim's algorithm (data structure used)
- The efficiency of Prim's algorithm depends on the data structure chosen for the graph; a priority queue is the best choice.
- If the graph is represented by its weight matrix and the priority queue is implemented as an unordered array, the algorithm's running time will be Θ(n²), where n is the number of vertices in the graph.


- On each of the n − 1 iterations, the array implementing the priority queue is traversed to find and delete the minimum and then, if necessary, to update the priorities of the remaining vertices.
- The priority queue can also be implemented with a min-heap.
- If the graph is represented by its adjacency linked lists and the priority queue is implemented as a min-heap, the running time of the algorithm is O(E log n). This is because the algorithm performs n − 1 deletions of the smallest element and makes E verifications and, possibly, changes of an element's priority in a min-heap of size not greater than n.
- Each of these operations is an O(log n) operation.
- The running time of this implementation of Prim's algorithm is therefore
  ((n − 1) + E)·O(log n) = O(E log n), since n − 1 ≤ E in a connected graph.

48. Kruskal's Algorithm
- This algorithm looks at a MST of a weighted connected graph G = <V,E> as an acyclic subgraph with |V| − 1 edges for which the sum of the edge weights is the smallest.
- The algorithm constructs a MST as an expanding sequence of subgraphs, which are always acyclic but not necessarily connected on the intermediate stages of the algorithm.
- The algorithm begins by sorting the graph's edges in non-decreasing order of their weights.
- Then, starting with the empty subgraph, it scans this sorted list, adding the next edge on the list to the current subgraph if such an inclusion does not create a cycle and simply skipping the edge otherwise.
- The intermediate result will generally be a forest, since the set of edges t can be completed into a tree iff there are no cycles in t.
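A compact Python sketch of this edge-scanning procedure with a simple union-find (illustrative; the 0..n−1 vertex numbering and helper names are assumptions):

def kruskal(n, edges):
    """MST by Kruskal's method: scan edges in non-decreasing weight order,
    keeping an edge only if it joins two different components.
    n: number of vertices (0..n-1); edges: list of (weight, u, v)."""
    parent = list(range(n))                 # union-find forest

    def find(x):                            # root of x's component
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # non-decreasing weights
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle is created
            parent[ru] = rv                 # union the two components
            mst.append((w, u, v))
    return mst

edges = [(4, 0, 1), (1, 1, 2), (3, 0, 2)]
print(kruskal(3, edges))   # [(1, 1, 2), (3, 0, 2)]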

Stated concisely: Kruskal's algorithm is a greedy algorithm for the minimum spanning tree problem. It constructs a minimum spanning tree by selecting edges in increasing order of their weights, provided that each inclusion does not create a cycle. It requires union-find algorithms.

49. Dijkstra's algorithm
Dijkstra's algorithm solves the single-source shortest-paths problem of finding the shortest paths from a given vertex (the source) to all the other vertices of a weighted graph or digraph. It works like Prim's algorithm but compares path lengths rather than edge lengths. Dijkstra's algorithm always yields a correct solution for a graph with non-negative edge lengths.

50. Difference between Dijkstra's and Prim's algorithm

Dijkstra's:
1. Finds single-source shortest paths.
2. Compares path lengths and therefore must add edge weights.

Prim's:
1. Finds a minimum spanning tree.
2. Compares the edge weights as given.

51. Difference between Prim’s & Kruskal’s algorithm

Prim's:
1. Expands the tree by one vertex at a time, selecting the nearest vertex (via the minimum-weight edge to the tree) available at that stage.
2. At any point of time the resultant subgraph is a tree with no cycles.

Kruskal's:
1. Selects the minimum-weight edge among all remaining edges.
2. At any point of time the resultant subgraph may be a forest.
3. All edges are sorted in increasing order of weight first.

52. Difference between Greedy and Dynamic Programming Method
- In the greedy method, only one decision sequence is ever generated.
- In dynamic programming, many decision sequences may be generated.
- Dynamic programming algorithms often have a polynomial complexity.

53. Single Source Shortest Path (SSSP) Problem (Dijkstra's algorithm)
- Given a directed graph G = (V,E) with non-negative costs on each edge and a selected source node v in V, for all w in V find the cost of the least-cost path from v to w.
- The cost of a path is simply the sum of the costs on the edges traversed by the path.

54. Data structures used by Dijkstra's algorithm include:

- a cost matrix C, where C[i,j] is the weight on the edge connecting node i to node j; if there is no such edge, C[i,j] = infinity.
- a set of nodes S, containing all the nodes whose shortest path from the source node is known. Initially, S contains only the source node.
- a distance vector D, where D[i] contains the cost of the shortest path (so far) from the source node to node i, using only the nodes in S as intermediaries.
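An illustrative Python sketch combining these structures (C, S, D) with a min-heap (the example matrix is an assumption):

import heapq

def dijkstra(C, source):
    """Single-source shortest paths on a cost matrix C,
    where C[i][j] = float('inf') if there is no edge i -> j."""
    n = len(C)
    D = [float('inf')] * n        # D[i]: best known cost source -> i
    D[source] = 0
    S = set()                     # nodes whose shortest path is final
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in S:
            continue
        S.add(u)
        for v in range(n):        # relax every edge leaving u
            if C[u][v] + d < D[v]:
                D[v] = C[u][v] + d
                heapq.heappush(heap, (D[v], v))
    return D

INF = float('inf')
C = [[INF, 2, 5],
     [INF, INF, 1],
     [INF, INF, INF]]
print(dijkstra(C, 0))   # [0, 2, 3]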

UNIT III

DYNAMIC PROGRAMMING

55. What is Dynamic programming?

- Dynamic programming is a technique for solving problems with overlapping subproblems. These subproblems arise from a recurrence relating a solution to a given problem with solutions to its smaller subproblems of the same type.
- It is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions. An optimal sequence of decisions can be found by making the decisions one at a time and never making an erroneous decision.
- It suggests solving each smaller subproblem once and recording the results in a table from which a solution to the original problem can then be obtained.

56. Principle of Optimality

The principle of optimality states that an optimal sequence of decisions has the property that, whatever the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with regard to the state resulting from the first decision.

Or: an optimal solution to any instance of a problem must be made up of optimal solutions to its sub-instances.

57. Computing Binomial Coefficient

- The binomial coefficient, denoted C(n,k) or (n choose k), is the number of combinations (subsets) of k elements from an n-element set (0 ≤ k ≤ n).
- The binomial coefficient is defined by factorials:

  C(n,k) = n! / (k!(n−k)!)     for 0 ≤ k ≤ n
  C(n,k) = 0                   for k < 0 or k > n

- Of the numerous properties of binomial coefficients, the method concentrates on the following two:

  C(n,k) = C(n−1,k−1) + C(n−1,k)   for n > k > 0
  C(n,0) = C(n,n) = 1

- To solve, record the values of the binomial coefficients in a table of n+1 rows and k+1 columns, numbered from 0 to n and from 0 to k, respectively.
- To compute C(n,k), fill the table row by row, starting with row 0 and ending with row n.
- Each row i (0 ≤ i ≤ n) is filled left to right, starting with 1 because C(i,0) = 1.
- Rows 0 through k also end with 1 on the table's main diagonal: C(i,i) = 1 for 0 ≤ i ≤ k.
- Compute the other entries by adding the contents of the cell in the preceding row and previous column and the cell in the preceding row and the same column.

58. Time efficiency for computing the binomial coefficient

- The basic operation is addition.
- Let A(n,k) be the total number of additions for computing C(n,k).
- Computing each entry in the table requires just one addition.
- The first k+1 rows of the table form a triangle; the remaining n−k rows form a rectangle. So split the sum expressing A(n,k) as

  A(n,k) = Σ_{i=1}^{k} Σ_{j=1}^{i−1} 1 + Σ_{i=k+1}^{n} Σ_{j=1}^{k} 1
         = Σ_{i=1}^{k} (i−1) + Σ_{i=k+1}^{n} k
         = (k−1)k/2 + k(n−k)
         ∈ O(nk)

Working to prove A(n,k) = (k−1)k/2 + k(n−k), using Σ_{i=1}^{n} i = n(n+1)/2 and Σ_{i=l}^{u} 1 = u − l + 1:

  Σ_{j=1}^{i−1} 1 = i − 1
  Σ_{i=k+1}^{n} 1 = n − (k+1) + 1 = n − k
  Σ_{i=1}^{k} (i−1) = Σ_{i=1}^{k} i − Σ_{i=1}^{k} 1 = k(k+1)/2 − k = (k² − k)/2 = (k−1)k/2

Hence A(n,k) = (k−1)k/2 + k(n−k).

59. Warshall’s algorithm

- Warshall's algorithm constructs the transitive closure of a given digraph with n vertices through a series of n-by-n Boolean matrices
  R^(0), R^(1), ..., R^(k−1), R^(k), ..., R^(n).
- Each of these matrices provides certain information about directed paths in the digraph.
- The element r_ij^(k) in the ith row and jth column of the matrix R^(k) (k = 0, 1, ..., n) is equal to 1 iff there exists a directed path from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than k.
- The formula for generating the elements of matrix R^(k) from the elements of matrix R^(k−1) is

  r_ij^(k) = r_ij^(k−1) OR (r_ik^(k−1) AND r_kj^(k−1))

- If an element r_ij is 1 in R^(k−1), it remains 1 in R^(k).
- If an element r_ij is 0 in R^(k−1), it becomes 1 in R^(k) iff the element in its row i and column k and the element in its column j and row k are both 1 in R^(k−1).
- The time efficiency is cubic, O(n³).

60. Floyd's Algorithm

- Floyd's algorithm is used to solve the all-pairs shortest-paths problem.
- It uses the idea of Warshall's algorithm.
- Given a weighted connected graph (undirected or directed), the all-pairs shortest-paths problem asks for the distances (the lengths of the shortest paths) from each vertex to all other vertices.
- The lengths of the shortest paths are recorded in an n-by-n matrix D called the distance matrix. The element d_ij in the ith row and jth column of this matrix indicates the length of the shortest path from the ith vertex to the jth vertex, 1 ≤ i, j ≤ n.
- The algorithm computes the distance matrix of a weighted graph with n vertices through a series of n-by-n matrices
  D^(0), D^(1), ..., D^(k−1), D^(k), ..., D^(n).
- Each of these matrices contains the lengths of the shortest paths with certain constraints on the paths considered.


- The element d_ij^(k) in the ith row and jth column of the matrix D^(k) (k = 0, 1, ..., n) is equal to the length of the shortest path among all paths from the ith vertex to the jth vertex with each intermediate vertex, if any, numbered not higher than k.
- The shortest path among the paths that use the kth vertex is equal to d_ik^(k−1) + d_kj^(k−1), giving the recurrence

  d_ij^(k) = min{ d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1) }   for k ≥ 1
  d_ij^(0) = w_ij

- The element in the ith row and jth column of the current distance matrix D^(k−1) is replaced by the sum of the element in the same row i and the kth column and the element in the kth row and the same column j iff the latter sum is smaller than its current value.
- The time efficiency is O(n³).

61. Difference between Warshall's & Floyd's algorithms

Warshall's:

- Input is the adjacency matrix.
- Output is the transitive closure.
- If there is no direct edge between vertices, the value in the adjacency matrix is zero.
- The algorithm looks for 1's in the adjacency matrix to find the transitive closure matrix.
- The transitive closure matrix is a Boolean matrix.

Floyd's:
- Input is the weight matrix.
- Output is the distance matrix.
- If there is no direct edge between vertices, the value in the weight matrix is infinity, and the diagonal elements are zero.
- The algorithm looks for minimum values in the weight matrix to find the distance matrix.
- The distance matrix is not a Boolean matrix and will not contain infinity.
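For comparison, an illustrative Python sketch of Floyd's algorithm applying the distance recurrence above (the example matrix is an assumption):

def floyd(W):
    """All-pairs shortest paths from weight matrix W
    (W[i][j] = float('inf') if no edge, 0 on the diagonal)."""
    n = len(W)
    D = [row[:] for row in W]     # D^(0) is the weight matrix itself
    for k in range(n):            # allow vertex k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D

INF = float('inf')
W = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
print(floyd(W))   # [[0, 3, 4], [3, 0, 1], [2, 5, 0]]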

62. 0/1 Knapsack Problem

- Given n items of known weights w₁, ..., wₙ and values v₁, ..., vₙ and a knapsack of capacity W, find the most valuable subset of the items that fit into the knapsack. All weights and the knapsack capacity are positive integers; the item values can be real numbers.
- The aim is to fill the knapsack in a way that maximizes the value of the included objects while respecting the capacity constraint.
- Let xᵢ be 0 if we do not select object i, or 1 if we include object i. The problem may be stated as:

  maximize Σ_{i=1}^{n} vᵢxᵢ
  subject to Σ_{i=1}^{n} wᵢxᵢ ≤ W,
  where vᵢ > 0, wᵢ > 0 and xᵢ ∈ {0,1}.

- To solve the problem by dynamic programming, we set up a table V(1:n, 0:W), with one row for each available object and one column for each weight from 0 to W.
- The solution of the instance can be found in V(n,W).
- Fill the table either row by row or column by column.
- The recurrence for the knapsack problem is

  V(i,j) = max{ V(i−1,j), V(i−1,j−wᵢ) + vᵢ }   if j − wᵢ ≥ 0
  V(i,j) = V(i−1,j)                            if j − wᵢ < 0

  If V(i−1,j) is larger, object i is discarded.


  If V(i−1,j−wᵢ) + vᵢ is larger, object i is included.

- The initial conditions are V(0,j) = 0 for j ≥ 0; V(i,j) = −∞ for all i when j < 0; V(i,0) = 0 for i ≥ 0.
- Fill the table V using the above formula.
- The solution is found at V(n,W).
- To find the solution vector, start from V(n,W) and trace back the computations in the table.
- To add or discard an item i, the following criteria are checked for each item:
  1. If V(i,j) = V(i−1,j) and V(i,j) ≠ V(i−1,j−wᵢ) + vᵢ, then discard item i.
  2. If V(i,j) ≠ V(i−1,j) and V(i,j) = V(i−1,j−wᵢ) + vᵢ, then include item i.
  3. If V(i,j) = V(i−1,j) and V(i,j) = V(i−1,j−wᵢ) + vᵢ, then include item i.
  4. If V(i,j) ≠ V(i−1,j) and V(i,j) ≠ V(i−1,j−wᵢ) + vᵢ, then discard item i.
- If item i is added, update the weight of the items included in the sack.
- The time and space efficiency is O(nW), the time necessary to construct the table.
- The composition of the optimal load can be determined in a time of O(n+W).
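An illustrative bottom-up Python sketch of this recurrence (the small instance used is an assumption, chosen to be checkable by hand):

def knapsack(weights, values, W):
    """Bottom-up 0/1 knapsack table V(0..n, 0..W); the answer is V[n][W]."""
    n = len(weights)
    V = [[0] * (W + 1) for _ in range(n + 1)]   # V(0,j) = V(i,0) = 0
    for i in range(1, n + 1):
        for j in range(W + 1):
            if j - weights[i-1] >= 0:           # item i can fit
                V[i][j] = max(V[i-1][j],
                              V[i-1][j - weights[i-1]] + values[i-1])
            else:                               # item i cannot fit
                V[i][j] = V[i-1][j]
    return V[n][W]

print(knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))   # 37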

63. Memory Functions

- The direct top-down approach to finding a solution to such a recurrence leads to an algorithm that solves common subproblems more than once and is hence inefficient (exponential or worse).
- The classic 0/1 knapsack algorithm of dynamic programming works bottom-up; the solutions of some of the smaller subproblems are often not necessary for getting a solution to the problem given.
- The goal is a method that solves only the subproblems that are necessary and solves each of them only once. Such a method exists; it is based on using memory functions.
  - This method solves a given problem in the top-down manner but, in addition, maintains a table of the kind used in the bottom-up approach.
  - Initially, all the table's entries are initialized with a special "null" symbol to indicate that they have not yet been calculated.
  - Whenever a new value needs to be calculated, the method checks the corresponding entry in the table first; if this entry is not "null", it is simply retrieved from the table; otherwise it is computed by the recursive call, whose result is then recorded in the table.
  - After initializing the table, the recursive function is called with i = n (the number of items) and j = W (the knapsack capacity).

63.a. Optimal Binary Search Tree

An optimal binary search tree is a binary search tree in which the average cost of looking up an item (the expected search cost) is minimized.

UNIT IV BACKTRACKING

64. Backtracking
Backtracking constructs its state-space tree in the depth-first-search fashion. If the sequence of choices represented by a current node of the state-space tree can be developed


further without violating the problem's constraints, it is done by considering the first remaining legitimate option for the next component. Otherwise, the method backtracks by undoing the last component of the partially built solution and replacing it with the next alternative.

64.a. Characteristics of Backtracking

- Backtracking is typically applied to difficult combinatorial problems for which no efficient algorithms for finding exact solutions possibly exist.
- Unlike the exhaustive-search approach, which is doomed to be extremely slow for all instances of a problem, backtracking at least holds a hope of solving some instances of non-trivial sizes in an acceptable amount of time. This is especially true for optimization problems, for which the idea of backtracking can be further enhanced by evaluating the quality of partially constructed solutions.
- Even if backtracking does not eliminate any elements of a problem's state space and ends up generating all of its elements, it provides a specific technique for doing so, which can be of value in its own right.

65. State Space Tree

A state-space tree is a rooted tree whose nodes represent partially constructed solutions to the problem in question. It is constructed in the manner of depth-first search. Its root represents an initial state before the search for a solution begins. The nodes of the first level in the tree represent the choices made for the first component of a solution, the nodes of the second level represent the choices for the second component, and so on.

66. Promising Node

A node in a state-space tree is said to be promising if it corresponds to a partially constructed solution that may still lead to a complete solution.

67. Explicit Constraints

Explicit constraints are rules that restrict each xᵢ to take a value only from a given set. Examples:

  xᵢ ≥ 0          Sᵢ = {all non-negative real numbers}
  xᵢ = 0 or 1     Sᵢ = {0, 1}
  lᵢ ≤ xᵢ ≤ uᵢ    Sᵢ = {a : lᵢ ≤ a ≤ uᵢ}

The explicit constraints depend on the particular instance I of the problem being solved. All tuples that satisfy the explicit constraints define a possible solution space for I.

68. Implicit Constraints

Implicit constraints are rules that determine which of the tuples in the solution space of I satisfy the criterion function. Thus implicit constraints describe the way in which the xᵢ must relate to each other.

69. N-Queens Problem

The N-queens problem is a combinatorial problem: place n queens on an n-by-n chessboard so that no two queens attack each other by being in the same row, in the same column, or on the same diagonal.

Explicit constraint: the value of each xᵢ must be from S = {1, 2, 3, ..., n}; the solution space consists of nⁿ n-tuples. xᵢ represents the column number of the queen in row i.

Implicit constraints: no two xᵢ's can be the same (i.e., all queens must be in different columns), and no two queens can be on the same diagonal. Two queens placed at (i, j) and (k, l) lie on the same diagonal iff |j − l| = |i − k|.

70. Hamiltonian Circuit (Cycle)
Let G = <V,E> be a connected graph with n vertices. A Hamiltonian cycle is a round-trip path along n edges of G that visits every vertex once and returns to its starting position. If a Hamiltonian cycle begins at some vertex v₁ of G and the vertices of G are visited in the order v₁, v₂, ..., v_{n+1}, then the edges (vᵢ, vᵢ₊₁) are in E for 1 ≤ i ≤ n, and the vᵢ are distinct except for v₁ and v_{n+1}, which are equal.

Implicit constraints:
- All vertices should be included in the cycle.
- All vertices should be distinct except x₁ and x_{n+1}.
- Only distinct cycles are output.

Explicit constraint:
- xᵢ = vertex number.

71. Sum of Subsets

Given positive numbers wᵢ, 1 ≤ i ≤ n, and m, the problem is to find all subsets of the wᵢ whose sum equals m. The problem can be formulated using either fixed-size or variable-size tuples.

72. Sum-of-subsets formulations

Variable size:
- The state-space tree is constructed using breadth-first search (queue method).
- The tree representation is not a binary tree representation.
- The xᵢ values are weights or indices of weights.
- The solution vector is a k-tuple.

Fixed size:
- The state-space tree is constructed using D-search (depth search, stack method).
- The tree representation is a binary tree representation.
- The xᵢ values are either 1 or 0.
- The solution vector is an n-tuple.
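An illustrative Python sketch of the fixed-tuple (include/exclude) formulation with simple pruning:

def sum_of_subsets(w, m, chosen=(), start=0, total=0):
    """Fixed-tuple-style backtracking: at each level decide whether
    to include w[start]; prune when the running total exceeds m."""
    if total == m:
        yield chosen                          # a subset summing to m
        return
    if start == len(w) or total > m:
        return                                # non-promising node
    # include w[start], then exclude it
    yield from sum_of_subsets(w, m, chosen + (w[start],), start + 1,
                              total + w[start])
    yield from sum_of_subsets(w, m, chosen, start + 1, total)

print(list(sum_of_subsets((3, 5, 6, 7), 15)))   # [(3, 5, 7)]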

BRANCH AND BOUND

73. Branch and Bound

Branch and bound is an algorithm design technique that enhances the idea of generating a state-space tree with the idea of estimating the best value obtainable from a current node of the decision tree: if such an estimate is not superior to the best solution seen up to that point in the processing, the node is eliminated from further consideration. 74. Principal idea behind Branch and Bound Technique

- The problem is represented by a state-space tree.
- A node's bound value is compared with the value of the best solution seen so far: if the bound value is not better than the best solution seen so far (i.e., not smaller for a minimization problem and not larger for a maximization problem), the node is non-promising and can be terminated. No solution obtained from it can yield a better solution than the one already available.

75. A search path at the current node in a state space tree is terminated for any one of the following 3 reasons:

1. The value of the node's bound is not better than the value of the best solution seen so far.
2. The node represents no feasible solutions because the constraints of the problem are already violated.
3. The subset of feasible solutions represented by the node consists of a single point (no further choices can be made). In this case the value of the objective function for this feasible solution is compared with that of the best solution seen so far, and the latter is updated with the former if the new solution is better.

76. Feasible Solution
A feasible solution is a point in the problem's search space that satisfies all the problem's constraints. Examples: a cycle (tour) in the traveling salesperson problem; a set of items whose total weight does not exceed the capacity of the bag.

77. Optimal Solution
An optimal solution is a feasible solution with the best value of the objective function. Examples: the shortest tour in the traveling salesperson problem; the most valuable set of items that fits the bag.

78. Best-First Branch and Bound Strategy

- In branch and bound, instead of generating a single child of the last promising node, all the children of the most promising among the non-terminated leaves in the current tree are generated. (Non-terminated, still promising, leaves are called live.)

- Compare the lower bounds of the live nodes to find which of the nodes is most promising and consider a node with the best bound as most promising.

- This strategy is called best-first branch and bound.

79. Knapsack Problem

- Order the items of a given instance in descending order by their value-to-weight ratios (vᵢ/wᵢ):
  v₁/w₁ ≥ v₂/w₂ ≥ ... ≥ vₙ/wₙ
- Compute the upper bound ub = v + (W − w)(v_{i+1}/w_{i+1}), where v and w are the total value and total weight of the items already selected.
- The root of the state-space tree has no items selected; the total weight w of the items already selected and their total value v are equal to 0, and the value of ub is computed by the formula.
- For the nodes of the next level, w, v, and ub are computed:
  - A level's left node's ub and right node's ub are compared; in a maximization problem, the node with the higher ub value is selected for finding the next value.
  - Also, w at each level is checked to see whether it exceeds the bag capacity. If a node's w exceeds the bag capacity, that path will not give a feasible solution.

80. Assignment Problem

- The assignment problem is the problem of assigning n people to n jobs so that the total cost of the assignment is as small as possible.


- The assignment problem is specified by an n-by-n cost matrix C.
- The problem is stated as follows: select one element in each row of the matrix so that no two selected elements are in the same column and their sum is the smallest possible.

81. Traveling Salesperson Problem

The problem is to find a least-cost tour of the N cities in a sales region. The tour is to visit each city exactly once. To help find an optimum tour, the salesperson has a cost matrix C, where element C(i,j) equals the cost (usually in terms of time, money or distance) of direct travel between city i & city j. 82. Common issues in Backtracking and Branch and Bound

Backtracking and Branch & Bound - are used to solve some large instance of difficult combinatorial problems - can be considered as an improvement over exhaustive search - Are based on the construction of a state-space-tree, whose nodes reflect specific choices

made for a solution’s components. - Terminate a node as soon as it can be guaranteed that no solution, to the problem can be

obtained by considering choices that correspond to the node’s descendants. 83. Difference between Backtracking and Branch and Bound Backtracking

- Applicable to non-optimization problems.
- The state-space tree is developed depth-first.

Branch & bound:
- Applicable only to optimization problems, because it is based on computing a bound on possible values of the problem's objective function.
- Can generate nodes according to several rules; the most natural is the best-first rule.
- Has both the challenge and the opportunity of choosing an order of node generation and finding a good bounding function.
- The best-first rule may or may not lead to a solution faster than other strategies.

84. Difference between Backtracking and Backward approach

Backtracking:
1. Starts from the root of the tree.
2. When a node is non-promising, moves back to the previous node or level.
3. From a non-promising node, moves backwards first by BFS and then by DFS.

Backward approach:
1. Starts from a leaf and moves towards the root.
2. Moves backwards either by BFS or DFS to reach the root.

UNIT V

NP-HARD AND NP-COMPLETE PROBLEMS

85. Heuristic approach

A heuristic is a common-sense rule drawn from experience rather than from a mathematically proven assertion.

86. Approximation algorithms


Approximation algorithms are often used to find approximate solutions to difficult problems of combinatorial optimization. Approximation algorithms run the gamut in level of sophistication; many of them are greedy algorithms based on some problem-specific heuristic.

87. Accuracy of approximate solution

The accuracy of an approximate solution S_a to a problem minimizing some function f can be quantified by the size of the relative error of this approximation:

  re(S_a) = (f(S_a) − f(S*)) / f(S*)

where S* is an exact solution to the problem. Since re(S_a) = f(S_a)/f(S*) − 1, the accuracy ratio

  r(S_a) = f(S_a) / f(S*)

is used as a measure of the accuracy of S_a. The accuracy ratio of an approximate solution to a maximization problem is computed as

  r(S_a) = f(S*) / f(S_a)

The closer r(S_a) is to 1, the better the approximate solution is.

88. Performance Ratio
The performance ratio is the principal metric for measuring the accuracy of such approximation algorithms.

89. Nearest Neighbor algorithm
Nearest neighbor is a simple greedy algorithm for approximating a solution to the traveling salesperson problem. The performance ratio of this algorithm is unbounded above, even for the important subset of Euclidean graphs.
Step 1: Choose an arbitrary city as the start.
Step 2: Repeat the following operation until all the cities have been visited: go to the unvisited city nearest to the one visited last (ties can be broken arbitrarily).
Step 3: Return to the starting city.

90. Twice-around-the-tree algorithm
Twice-around-the-tree is an approximation algorithm for the traveling salesperson problem with a performance ratio of 2 for Euclidean graphs. The algorithm is based on modifying a walk around a minimum spanning tree by shortcuts.
Step 1: Construct a minimum spanning tree of the graph corresponding to a given instance of the traveling salesperson problem.
Step 2: Starting at an arbitrary vertex, perform a walk around the minimum spanning tree, recording the vertices passed by.
Step 3: Scan the list of vertices obtained in Step 2 and eliminate from it all repeated occurrences of the same vertex except the starting one at the end of the list. The vertices remaining on the list will form a Hamiltonian circuit, which is the output of the algorithm.
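An illustrative Python sketch of the nearest-neighbor steps from item 89 (the cost matrix is an assumed example):

def nearest_neighbor_tour(C, start=0):
    """Greedy TSP approximation: always go to the nearest unvisited city.
    C is a symmetric cost matrix; returns the tour as a list of cities."""
    n = len(C)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:                          # step 2: nearest unvisited city
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: C[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)                        # step 3: return to the start
    return tour

C = [[0, 1, 2, 6],
     [1, 0, 3, 4],
     [2, 3, 0, 1],
     [6, 4, 1, 0]]
print(nearest_neighbor_tour(C))   # [0, 1, 2, 3, 0]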

91. Approximation Schemes
Polynomial-time approximation schemes exist for the discrete version of the knapsack problem; these are parametric families of algorithms that allow one to get approximations S_a^(k) with any predefined accuracy level:

  f(S*) / f(S_a^(k)) ≤ 1 + 1/k

for any instance of size n, where k is an integer parameter in the range 0 ≤ k ≤ n.

92. Complexity Theory
Complexity theory seeks to classify problems according to their computational complexity. The principal split is between tractable and intractable problems: problems that can and cannot be solved in polynomial time, respectively. Complexity theory concentrates on decision problems, problems with yes/no answers.

93. Halting Problem
The halting problem is an example of an undecidable decision problem, i.e., one that cannot be solved by any algorithm. The halting problem is to determine, for an arbitrary deterministic algorithm A and an input I, whether algorithm A with input I ever terminates (or enters an infinite loop). This problem is undecidable; hence there exists no algorithm to solve it.

94. Polynomial time

An algorithm solves a problem in polynomial time if its worst-case time efficiency belongs to O(p(n)), where p(n) is a polynomial of the problem's input size n. Equivalently, an algorithm is said to run in polynomial time if the number of steps required to complete it for a given input is O(n^k) for some nonnegative integer k, where n is the size of the input. Polynomial-time algorithms are said to be "fast." Problems that can be solved in polynomial time are called tractable; problems that cannot be solved in polynomial time are called intractable.

95. Class P
P is the class of all decision problems that can be solved in polynomial time, i.e., the class of decision problems (problems with yes/no answers) that can be solved in polynomial time by deterministic algorithms. This class of problems is called polynomial.

96. Non-deterministic Algorithm
A non-deterministic algorithm is a two-stage procedure that takes as its input an instance I of a decision problem and does the following:
- Non-deterministic stage (guessing): an arbitrary string S is generated that can be thought of as a candidate solution to the given instance I.
- Deterministic stage (verification): a deterministic algorithm takes both I and S as its input and outputs yes if S represents a solution to instance I.
A non-deterministic algorithm is said to solve a decision problem iff, for every yes instance of the problem, it returns yes on some execution. A non-deterministic algorithm is said to be non-deterministic polynomial if the time efficiency of its verification stage is polynomial.

97. Class NP

NP is the class of all decision problems whose randomly generated candidate solutions can be verified in polynomial time. Class NP is the class of decision problems that can be solved by non-deterministic polynomial algorithms; this class of problems is called non-deterministic polynomial.

98. NP-Complete


A decision problem D is said to be NP-complete if:
1. it belongs to class NP, and
2. every problem in NP is polynomially reducible to D.

The decision version of every difficult combinatorial problem mentioned above is NP-complete; examples are the traveling salesperson problem and the knapsack problem. It is not known whether P = NP or P is just a proper subset of NP. A discovery of a polynomial-time algorithm for any of the thousands of known NP-complete problems would imply that P = NP.

99. Polynomially Reducible
A decision problem D1 is said to be polynomially reducible to a decision problem D2 if there exists a function t that transforms instances of D1 to instances of D2 such that:
1. t maps all yes instances of D1 to yes instances of D2 and all no instances of D1 to no instances of D2, and
2. t is computable by a polynomial-time algorithm.

This definition implies that if a problem D1 is polynomially reducible to some problem D2 that can be solved in polynomial time, then problem D1 can also be solved in polynomial time.

100. Exponential time
In complexity theory, exponential time is the computation time of a problem where the time to complete the computation, m(n), is bounded by an exponential function of the problem size n (i.e., as the size of the problem increases linearly, the time to solve the problem increases exponentially). Written mathematically: there exists k > 1 such that m(n) = Ω(kⁿ), and there exists c such that m(n) = O(cⁿ).

101. Reasons for intractability

1. Arbitrary instances of intractable problems cannot be solved in a reasonable amount of time unless such instances are very small.
2. Although there is a huge difference between the running times in O(p(n)) for polynomials of drastically different degrees, there are very few useful polynomial-time algorithms with the degree of the polynomial higher than 3.
3. Polynomial functions possess many convenient properties: both the sum and the composition of two polynomials are always polynomials.
4. The polynomial-time standard led to the development of an extensive theory called computational complexity, which seeks to classify problems according to their inherent difficulty.

102. Undecidable Problems

Not every decision problem can be solved in polynomial time. Some decision problems cannot be solved at all by any algorithm; such problems are called undecidable.

103. Decision Problem
Any problem for which the answer is either zero or one is called a decision problem.

104. Optimization problem


Any problem that involves the identification of an optimal (either minimum or maximum) value of a given cost function is known as an optimization problem.

105. Cook's Theorem
Cook's theorem states that the Boolean satisfiability problem is NP-complete. That is, any problem in NP can be reduced in polynomial time by a deterministic Turing machine to the problem of determining whether a Boolean formula is satisfiable.

106. Satisfiability Problem
The satisfiability problem is to determine whether a formula is true for some assignment of truth values to its variables. The CNF-satisfiability problem restricts the formula to conjunctive normal form (CNF).

[Figure: diagram of the complexity classes P, NP, NP-Complete, and NP-Hard, showing the relationship between them, provided that P ≠ NP; if P = NP, then the classes P, NP, and NP-Complete are all equal.]