
3.1 Algorithms 202

1. Course Instructional Materials / Enactment

1.1 Teaching/ Instructional Methods and Aids (Content Delivery)

The following Instructional methods have been employed during the course:

a) Lecture method: The primary mode of instruction was lectures that introduced the students to new concepts, models and their applications. The students were challenged to think about how and why the concepts, models and proof were developed. Students were guided to emulate the thought process of the researcher that led to the particular concept, model or proof technique. The Lecture material was designed such that it was stimulating and thought provoking. The instructor combined lectures with questions to involve students in the learning process and to check their comprehension.

Instructional aids used (Content Delivery): Chalk or marker board.

b) Individualized learning: Written assignments help students organize knowledge, absorb facts and prepare better for examinations. This method emphasizes individual work by the learner and supports both teaching and learning.

c) Group-learning techniques: Case study, quiz, assignments.

Instructional aids used: Chalk or marker board.


1.2 Lecture Notes

UNIT-I

Mathematical Logic

Statements and notations:

A proposition or statement is a declarative sentence that is either true or false but not both. For instance, the following are propositions: “Paris is in France” (true), “London is in Denmark” (false), “2 < 4” (true), “4 = 7” (false).

However the following are not propositions: “what is your name?” (this is a question), “do your homework” (this is a command), “this sentence is false” (neither true nor false), “x is an even number” (it depends on what x represents), “Socrates” (it is not even a sentence). The truth or falsehood of a proposition is called its truth value.

Connectives:

Connectives are used for making compound propositions. The main ones are the following (p and q represent given propositions):

Name            Represented   Meaning
Negation        ¬p            “not p”
Conjunction     p ∧ q         “p and q”
Disjunction     p ∨ q         “p or q (or both)”
Exclusive Or    p ⊕ q         “either p or q, but not both”
Implication     p → q         “if p then q”
Biconditional   p ↔ q         “p if and only if q”

Truth Tables:

Logical identity

Logical identity is an operation on one logical value, typically the value of a proposition that produces a value of true if its operand is true and a value of false if its operand is false.

The truth table for the logical identity operator is as follows:

Logical Identity

p p

T T

F F

Logical negation

Logical negation is an operation on one logical value, typically the value of a proposition that produces a value of true if its operand is false and a value of false if its operand is true.

The truth table for NOT p (also written as ¬p or ~p) is as follows:

Logical Negation

p ¬p

T F

F T

Binary operations

Truth table for all binary logical operators

Here is a truth table giving definitions of all 16 of the possible truth functions of 2 binary variables (P,Q are thus boolean variables):


P Q 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

T T F F F F F F F F T T T T T T T T

T F F F F F T T T T F F F F T T T T

F T F F T T F F T T F F T T F F T T

F F F T F T F T F T F T F T F T F T

where T = true and F = false.

Key: 0, false, Contradiction

1, NOR, Logical NOR

2, Converse nonimplication

3, ¬p, Negation

4, Material nonimplication

5, ¬q, Negation

6, XOR, Exclusive disjunction

7, NAND, Logical NAND

8, AND, Logical conjunction

9, XNOR, If and only if, Logical biconditional

10, q, Projection function

11, if/then, Logical implication

12, p, Projection function

13, then/if, Converse implication

14, OR, Logical disjunction

15, true, Tautology
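The numbering in the table above is not arbitrary: reading a column from the F F row up to the T T row gives the 4-bit binary form of its index. The following is a minimal sketch of that observation; the function name `truth_function` and the `NAMED` dictionary are illustrative choices, not part of the notes.

```python
# Input rows in the same order as the table: (P, Q) = TT, TF, FT, FF.
ROWS = [(True, True), (True, False), (False, True), (False, False)]


def truth_function(index):
    """Return column `index` (0..15) of the table as four booleans (TT, TF, FT, FF)."""
    # Bit 3 of the index is the output for the TT row, bit 0 the output for FF.
    return [bool((index >> (3 - row)) & 1) for row in range(4)]


# A few entries of the key, written as ordinary Python expressions.
NAMED = {
    1: lambda p, q: not (p or q),    # NOR
    6: lambda p, q: p != q,          # XOR
    7: lambda p, q: not (p and q),   # NAND
    8: lambda p, q: p and q,         # AND
    9: lambda p, q: p == q,          # XNOR
    11: lambda p, q: (not p) or q,   # if/then
    14: lambda p, q: p or q,         # OR
}

# Each named operator reproduces exactly the column with its index.
matches = {
    index: truth_function(index) == [f(p, q) for p, q in ROWS]
    for index, f in NAMED.items()
}
print(matches)
```

Every entry of `matches` should be `True`, which confirms that the key and the columns of the table agree for the operators listed.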

Logical operators can also be visualized using Venn diagrams.

Logical conjunction

Logical conjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if both of its operands are true.

The truth table for p AND q (also written as p ∧ q, p & q, or p · q) is as follows:

Logical Conjunction

p q p ∧ q

T T T

T F F

F T F

F F F

In ordinary language terms, if both p and q are true, then the conjunction p ∧ q is true. For all other assignments of logical values to p and to q the conjunction p ∧ q is false.

It can also be said that if p, then p ∧ q is q, otherwise p ∧ q is p.

Logical disjunction

Logical disjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if at least one of its operands is true.

The truth table for p OR q (also written as p ∨ q, p || q, or p + q) is as follows:


Logical Disjunction

p q p ∨ q

T T T

T F T

F T T

F F F

Logical implication

Logical implication and the material conditional are both associated with an operation on two logical values, typically the values of two propositions, that produces a value of false just in the singular case the first operand is true and the second operand is false. The truth table associated with the material conditional if p then q (symbolized as p → q) and the logical implication p implies q (symbolized as p ⇒ q) is as follows:

Logical Implication

p q p → q

T T T

T F F

F T T

F F T

Logical equality

Logical equality (also known as biconditional) is an operation on two logical values, typically the values of two propositions, that produces a value of true if both operands are false or both operands are true.

The truth table for p XNOR q (also written as p ↔ q, p = q, or p ≡ q) is as follows:

Logical Equality

p q p ≡ q

T T T

T F F

F T F

F F T

Exclusive disjunction

Exclusive disjunction is an operation on two logical values, typically the values of two propositions, that produces a value of true if one but not both of its operands is true.

The truth table for p XOR q (also written as p ⊕ q, or p ≠ q) is as follows:

Exclusive Disjunction

p q p ⊕ q

T T F

T F T

F T T

F F F

Logical NAND

The logical NAND is an operation on two logical values, typically the values of two propositions, that produces a value of false if both of its operands are true. In other words, it produces a value of true if at least one of its operands is false.

The truth table for p NAND q (also written as p ↑ q or p | q) is as follows:

Logical NAND

p q p ↑ q

T T F

T F T

F T T

F F T

It is frequently useful to express a logical operation as a compound operation, that is, as an operation that is built up or composed from other operations. Many such compositions are possible, depending on the operations that are taken as basic or "primitive" and the operations that are taken as composite or "derivative".

In the case of logical NAND, it is clearly expressible as a compound of NOT and AND.

The negation of a conjunction, ¬(p ∧ q), and the disjunction of negations, (¬p) ∨ (¬q), can be tabulated as follows:

p q p ∧ q ¬(p ∧ q) ¬p ¬q (¬p) ∨ (¬q)

T T T F F F F

T F F T F T T

F T F T T F T

F F F T T T T

Logical NOR

The logical NOR is an operation on two logical values, typically the values of two propositions, that produces a value of true if both of its operands are false. In other words, it produces a value of false if at least one of its operands is true. ↓ is also known as the Peirce arrow after its inventor, Charles Sanders Peirce, and is a sole sufficient operator, that is, every other connective can be expressed using ↓ alone.

The truth table for p NOR q (also written as p ↓ q or p ⊥ q) is as follows:

Logical NOR

p q p ↓ q

T T F

T F F

F T F

F F T

The negation of a disjunction ¬(p ∨ q), and the conjunction of negations (¬p) ∧ (¬q) can be tabulated as follows:

p q p ∨ q ¬(p ∨ q) ¬p ¬q (¬p) ∧ (¬q)

T T T F F F F

T F T F F T F

F T T F T F F

F F F T T T T

Inspection of the tabular derivations for NAND and NOR, under each assignment of logical values to the functional arguments p and q, produces the identical patterns of functional values for ¬(p ∧ q) as for (¬p) ∨ (¬q), and for ¬(p ∨ q) as for (¬p) ∧ (¬q). Thus the first and second expressions in each pair are logically equivalent, and may be substituted for each other in all contexts that pertain solely to their logical values.

These two equivalences are De Morgan's laws.
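The tabular comparison above can be repeated mechanically. The sketch below checks both pairs of expressions on all four assignments; the helper name `equivalent` is an assumption made for this illustration.

```python
from itertools import product


def equivalent(f, g, arity=2):
    """True if f and g agree on every assignment of truth values."""
    return all(f(*v) == g(*v) for v in product([True, False], repeat=arity))


# not (p and q)  versus  (not p) or (not q)
nand_pair = equivalent(lambda p, q: not (p and q),
                       lambda p, q: (not p) or (not q))

# not (p or q)  versus  (not p) and (not q)
nor_pair = equivalent(lambda p, q: not (p or q),
                      lambda p, q: (not p) and (not q))

print(nand_pair, nor_pair)
```

Both checks succeed, while a pair such as p ∧ q and p ∨ q fails the same test because the two columns differ on the rows T F and F T.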

The truth value of a compound proposition depends only on the value of its components. Writing F for “false” and T for “true”, we can summarize the meaning of the connectives in the following way:

p  q  ¬p  p ∧ q  p ∨ q  p ⊕ q  p → q  p ↔ q
T  T  F   T      T      F      T      T
T  F  F   F      T      T      F      F
F  T  T   F      T      T      T      F
F  F  T   F      F      F      T      T

Note that ∨ represents a non-exclusive or, i.e., p ∨ q is true when either of p, q is true and also when both are true. On the other hand, ⊕ represents an exclusive or, i.e., p ⊕ q is true only when exactly one of p and q is true.

Well formed formulas (wff):

Not all strings can represent propositions of the predicate logic. Those which produce a proposition when their symbols are interpreted must follow the rules given below, and they are called wffs (well-formed formulas) of the first order predicate logic.

Rules for constructing wffs: A predicate name followed by a list of variables such as P(x, y), where P is a predicate name and x and y are variables, is called an atomic formula.

A well formed formula of predicate calculus is obtained by using the following rules.
1. An atomic formula is a wff.
2. If A is a wff, then ¬A is also a wff.
3. If A and B are wffs, then (A ∨ B), (A ∧ B), (A → B) and (A ↔ B) are also wffs.
4. If A is a wff and x is any variable, then (x)A and (∃x)A are wffs.
5. Only those formulas obtained by using (1) to (4) are wffs.

Since we will be concerned only with wffs, we shall use the term formulas for wffs. We shall follow the same conventions regarding the use of parentheses as was done in the case of statement formulas.

Wffs are constructed using the following rules:
1. True and False are wffs.
2. Each propositional constant (i.e. a specific proposition) and each propositional variable (i.e. a variable representing propositions) is a wff.
3. Each atomic formula (i.e. a specific predicate with variables) is a wff.
4. If A and B are wffs, then so are ¬A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of discourse) and A is a wff, then so are ∀x A and ∃x A.

For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff by Rule 2. Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue". Thus it is a wff by Rule 3 above. By applying Rule 5 to B(x), ∀x B(x) is a wff and so is ∃x B(x). Then by applying Rule 4 to them, ∀x B(x) ∧ ∃x B(x) is seen to be a wff. Similarly, if R is a predicate name representing "being round", then R(x) is an atomic formula. Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) ∧ R(x) is obtained. In this manner, larger and more complex wffs can be constructed following the rules given above. Note, however, that strings that cannot be constructed by using those rules are not wffs.

For example: ∀x B(x)R(x) and B(∃x) are NOT wffs, nor are B(R(x)) and B(∃x R(x)).

More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as taller(Tom, John) ∧ ¬taller(John, Tom), which is also a wff.

If x is a variable representing people in the world, then taller(x, Tom), ∀x taller(x, Tom), ∃x taller(x, Tom), and ∃x ∀y taller(x, y) are all wffs, among others. However, taller(∃x, John) and taller(Tom ∧ Mary, Jim), for example, are NOT wffs.

Tautology, Contradiction, Contingency:

A proposition is said to be a tautology if its truth value is T for any assignment of truth values to its components. Example: The proposition p ∨ ¬p is a tautology.

A proposition is said to be a contradiction if its truth value is F for any assignment of truth values to its components. Example: The proposition p ∧ ¬p is a contradiction.

A proposition that is neither a tautology nor a contradiction is called a contingency.

p  ¬p  p ∨ ¬p  p ∧ ¬p
T  F   T       F
F  T   T       F
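The three definitions translate directly into a check over all assignments. The following is a small sketch; the function name `classify` and its string return values are assumptions of this example.

```python
from itertools import product


def classify(formula, arity):
    """Classify a formula, given as a Python function of `arity` booleans."""
    values = {bool(formula(*v)) for v in product([True, False], repeat=arity)}
    if values == {True}:
        return "tautology"       # true under every assignment
    if values == {False}:
        return "contradiction"   # false under every assignment
    return "contingency"         # true for some assignments, false for others


print(classify(lambda p: p or not p, 1))
print(classify(lambda p: p and not p, 1))
print(classify(lambda p, q: (not p) or q, 2))
```

The first two calls correspond to the two columns of the table above; the third, p → q, is a contingency because it is false only when p is true and q is false.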

Equivalence and Implication:

We say that the statements r and s are logically equivalent if their truth tables are identical. For example, the truth tables of p → q and ¬p ∨ q are identical, so p → q is equivalent to ¬p ∨ q. It is easily shown that the statements r and s are equivalent if and only if r ↔ s is a tautology.

Predicates / Predicate logic:

A predicate or propositional function is a statement containing variables. For instance “x + 2 = 7”, “X is American”, “x < y”, “p is a prime number” are predicates. The truth value of the predicate depends on the value assigned to its variables. For instance, if we replace x with 1 in the predicate “x + 2 = 7” we obtain “1 + 2 = 7”, which is false, but if we replace it with 5 we get “5 + 2 = 7”, which is true. We represent a predicate by a letter followed by the variables enclosed in parentheses: P(x), Q(x, y), etc.

An example for P(x) is a value of x for which P(x) is true. A counterexample is a value of x for which P(x) is false. So, 5 is an example for “x + 2 = 7”, while 1 is a counterexample.

Each variable in a predicate is assumed to belong to a universe (or domain) of discourse. For instance, in the predicate “n is an odd integer”, n represents an integer, so the universe of discourse of n is the set of all integers. In “X is American” we may assume that X is a human being, so in this case the universe of discourse is the set of all human beings.
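A predicate behaves like a function that returns a truth value. The sketch below models “x + 2 = 7” as a Python function and searches a small finite universe for examples and counterexamples; the range -10..10 is an arbitrary choice made only so the search terminates.

```python
def P(x):
    """The predicate "x + 2 = 7"."""
    return x + 2 == 7


# A finite stand-in for the universe of discourse (assumed for illustration).
universe = range(-10, 11)

examples = [x for x in universe if P(x)]
counterexamples = [x for x in universe if not P(x)]

print(examples)              # values of x that make P(x) true
print(1 in counterexamples)  # 1 makes P(x) false
```

Within this universe the only example is 5, and every other value, including 1, is a counterexample.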


Free & Bound variables:

Let's now turn to a rather important topic: the distinction between free variables and bound variables.

Have a look at the following formula:

The first occurrence of x is free, whereas the second and third occurrences of x are bound, namely by the first occurrence of the quantifier . The first and second occurrences of the variable y are also bound, namely by the second occurrence of the quantifier .

Informally, the concept of a bound variable can be explained as follows. Recall that quantifications are generally of the form ∀x φ or ∃x φ, where x may be any variable. Generally, all occurrences of this variable within the quantification are bound. But we have to distinguish two cases:

1. x may occur within another, embedded, quantification ∀x ψ or ∃x ψ inside φ. Then we say that it is bound by the quantifier of this embedded quantification (and so on, if there's another embedded quantification over x within ψ).

2. Otherwise, we say that it is bound by the top-level quantifier.

Here's a full formal simultaneous definition of free and bound:

1. Any occurrence of any variable is free in any atomic formula.

2. No occurrence of any variable is bound in any atomic formula.

3. If an occurrence of any variable is free in φ or in ψ, then that same occurrence is free in ¬φ, (φ → ψ), (φ ∨ ψ), and (φ ∧ ψ).

4. If an occurrence of any variable is bound in φ or in ψ, then that same occurrence is bound in ¬φ, (φ → ψ), (φ ∨ ψ), and (φ ∧ ψ). Moreover, that same occurrence is bound in ∀y φ and ∃y φ as well, for any choice of variable y.

5. In any formula of the form ∀y φ or ∃y φ (where y can be any variable at all in this case) the occurrence of y that immediately follows the initial quantifier symbol is bound.

6. If an occurrence of a variable x is free in φ, then that same occurrence is free in ∀y φ and ∃y φ, for any variable y distinct from x. On the other hand, all occurrences of x that are free in φ are bound in ∀x φ and in ∃x φ.
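The recursive definition above can be turned into a short recursive function. The sketch below is a simplification: it computes the set of variables that have at least one free occurrence, rather than tracking individual occurrences. The tuple encoding of formulas and the predicate names P and Q are assumptions made for this example.

```python
def free_variables(formula):
    """Return the set of variables with at least one free occurrence.

    Formulas are nested tuples (an encoding assumed for this sketch):
      ("atom", name, (var, ...))
      ("not", f)
      ("and" | "or" | "implies", f, g)
      ("forall" | "exists", var, f)
    """
    kind = formula[0]
    if kind == "atom":
        # Clause 1: every occurrence in an atomic formula is free.
        return set(formula[2])
    if kind == "not":
        # Clause 3: negation does not change which occurrences are free.
        return free_variables(formula[1])
    if kind in ("and", "or", "implies"):
        # Clause 3: free occurrences of either part stay free.
        return free_variables(formula[1]) | free_variables(formula[2])
    if kind in ("forall", "exists"):
        # Clause 6: the quantified variable becomes bound, others stay free.
        return free_variables(formula[2]) - {formula[1]}
    raise ValueError(f"unknown formula kind: {kind!r}")


def is_sentence(formula):
    """A formula with no free variables."""
    return not free_variables(formula)


# P(x) ∨ ∀x Q(x, y): the x in P(x) is free, the x in Q(x, y) is bound, y is free.
mixed = ("or", ("atom", "P", ("x",)),
               ("forall", "x", ("atom", "Q", ("x", "y"))))
closed = ("forall", "x", ("exists", "y", ("atom", "Q", ("x", "y"))))

print(free_variables(mixed))
print(is_sentence(closed))
```

In the first formula x is reported as free because of its occurrence in P(x), even though another occurrence of x is bound; this is why the definition speaks of occurrences rather than variables.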

If a formula contains no occurrences of free variables we call it a sentence.

Rules of inference:

The two rules of inference are called rules P and T.

Rule P: A premise may be introduced at any point in the derivation.

Rule T: A formula S may be introduced in a derivation if S is tautologically implied by any one or more of the preceding formulas in the derivation.

Before proceeding to the actual process of derivation, some important implications and equivalences are listed in the following tables.

Implications

I1   P ∧ Q ⇒ P                        (simplification)
I2   P ∧ Q ⇒ Q                        (simplification)
I3   P ⇒ P ∨ Q                        (addition)
I4   Q ⇒ P ∨ Q                        (addition)
I5   ¬P ⇒ P → Q
I6   Q ⇒ P → Q
I7   ¬(P → Q) ⇒ P
I8   ¬(P → Q) ⇒ ¬Q
I9   P, Q ⇒ P ∧ Q
I10  ¬P, P ∨ Q ⇒ Q                    (disjunctive syllogism)
I11  P, P → Q ⇒ Q                     (modus ponens)
I12  ¬Q, P → Q ⇒ ¬P                   (modus tollens)
I13  P → Q, Q → R ⇒ P → R             (hypothetical syllogism)
I14  P ∨ Q, P → R, Q → R ⇒ R          (dilemma)

Equivalences

E1   ¬¬P ⇔ P
E2   P ∧ Q ⇔ Q ∧ P                    (commutative laws)
E3   P ∨ Q ⇔ Q ∨ P                    (commutative laws)
E4   (P ∧ Q) ∧ R ⇔ P ∧ (Q ∧ R)        (associative laws)
E5   (P ∨ Q) ∨ R ⇔ P ∨ (Q ∨ R)        (associative laws)
E6   P ∧ (Q ∨ R) ⇔ (P ∧ Q) ∨ (P ∧ R)  (distributive laws)
E7   P ∨ (Q ∧ R) ⇔ (P ∨ Q) ∧ (P ∨ R)  (distributive laws)
E8   ¬(P ∧ Q) ⇔ ¬P ∨ ¬Q               (De Morgan's laws)
E9   ¬(P ∨ Q) ⇔ ¬P ∧ ¬Q               (De Morgan's laws)
E10  P ∨ P ⇔ P
E11  P ∧ P ⇔ P
E12  R ∨ (P ∧ ¬P) ⇔ R
E13  R ∧ (P ∨ ¬P) ⇔ R
E14  R ∨ (P ∨ ¬P) ⇔ T
E15  R ∧ (P ∧ ¬P) ⇔ F
E16  P → Q ⇔ ¬P ∨ Q
E17  ¬(P → Q) ⇔ P ∧ ¬Q
E18  P → Q ⇔ ¬Q → ¬P
E19  P → (Q → R) ⇔ (P ∧ Q) → R
E20  ¬(P ↔ Q) ⇔ P ↔ ¬Q
E21  P ↔ Q ⇔ (P → Q) ∧ (Q → P)
E22  (P ↔ Q) ⇔ (P ∧ Q) ∨ (¬P ∧ ¬Q)

Example 1. Show that R is logically derived from P → Q, Q → R, and P.

Solution.
{1}        (1) P → Q       P
{2}        (2) P           P
{1, 2}     (3) Q           T, (1), (2) and I11
{4}        (4) Q → R       P
{1, 2, 4}  (5) R           T, (3), (4) and I11

Example 2. Show that S ∨ R is tautologically implied by (P ∨ Q) ∧ (P → R) ∧ (Q → S).

Solution.
{1}        (1) P ∨ Q       P
{1}        (2) ¬P → Q      T, (1), E1 and E16
{3}        (3) Q → S       P
{1, 3}     (4) ¬P → S      T, (2), (3) and I13
{1, 3}     (5) ¬S → P      T, (4), E18 and E1
{6}        (6) P → R       P
{1, 3, 6}  (7) ¬S → R      T, (5), (6) and I13
{1, 3, 6}  (8) S ∨ R       T, (7), E16 and E1

Example 3. Show that ¬Q, P → Q ⇒ ¬P.

Solution.
{1}        (1) P → Q       P
{1}        (2) ¬Q → ¬P     T, (1) and E18
{3}        (3) ¬Q          P
{1, 3}     (4) ¬P          T, (2), (3) and I11

Example 4. Prove that R ∧ (P ∨ Q) is a valid conclusion from the premises P ∨ Q, Q → R, P → M and ¬M.

Solution.
{1}           (1) P → M          P
{2}           (2) ¬M             P
{1, 2}        (3) ¬P             T, (1), (2) and I12
{4}           (4) P ∨ Q          P
{1, 2, 4}     (5) Q              T, (3), (4) and I10
{6}           (6) Q → R          P
{1, 2, 4, 6}  (7) R              T, (5), (6) and I11
{1, 2, 4, 6}  (8) R ∧ (P ∨ Q)    T, (4), (7) and I9
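A derivation shows step by step that a conclusion is tautologically implied by the premises. For small examples the same fact can be cross-checked by brute force over all truth assignments. This is a sketch of such a check, not a replacement for the derivation; the function names are assumptions of this example.

```python
from itertools import product


def implies(a, b):
    """Material conditional a → b."""
    return (not a) or b


def tautologically_implies(premises, conclusion, arity):
    """True if the conclusion holds in every assignment that satisfies all premises."""
    return all(
        conclusion(*v)
        for v in product([True, False], repeat=arity)
        if all(h(*v) for h in premises)
    )


# Example 1: P → Q, Q → R, P  ⇒  R   (variables P, Q, R)
example_1 = tautologically_implies(
    [lambda P, Q, R: implies(P, Q),
     lambda P, Q, R: implies(Q, R),
     lambda P, Q, R: P],
    lambda P, Q, R: R,
    arity=3,
)

# Example 4: P ∨ Q, Q → R, P → M, ¬M  ⇒  R ∧ (P ∨ Q)   (variables P, Q, R, M)
example_4 = tautologically_implies(
    [lambda P, Q, R, M: P or Q,
     lambda P, Q, R, M: implies(Q, R),
     lambda P, Q, R, M: implies(P, M),
     lambda P, Q, R, M: not M],
    lambda P, Q, R, M: R and (P or Q),
    arity=4,
)

print(example_1, example_4)
```

Both checks succeed. By contrast, the invalid argument "P → Q, Q, therefore P" fails the same test, because the assignment P = F, Q = T satisfies the premises but not the conclusion.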

There is a third inference rule, known as rule CP or rule of conditional proof.

Rule CP: If we can derive S from R and a set of premises, then we can derive R → S from the set of premises alone.

Note.
1. Rule CP follows from the equivalence E19, which states that (P ∧ R) → S ⇔ P → (R → S).
2. Let P denote the conjunction of the set of premises and let R be any formula. The above equivalence states that if R is included as an additional premise and S is derived from P ∧ R, then R → S can be derived from the premises P alone.
3. Rule CP is also called the deduction theorem and is generally used if the conclusion is of the form R → S. In such cases, R is taken as an additional premise and S is derived from the given premises and R.

Example 5. Show that R → S can be derived from the premises P → (Q → S), ¬R ∨ P, and Q.

Solution.
{1}           (1) ¬R ∨ P         P
{2}           (2) R              P, assumed premise
{1, 2}        (3) P              T, (1), (2) and I10
{4}           (4) P → (Q → S)    P
{1, 2, 4}     (5) Q → S          T, (3), (4) and I11
{6}           (6) Q              P
{1, 2, 4, 6}  (7) S              T, (5), (6) and I11
{1, 4, 6}     (8) R → S          CP

Example 6. Show that P → S can be derived from the premises ¬P ∨ Q, ¬Q ∨ R, and R → S.

Solution.
{1}           (1) ¬P ∨ Q         P
{2}           (2) P              P, assumed premise
{1, 2}        (3) Q              T, (1), (2) and I10
{4}           (4) ¬Q ∨ R         P
{1, 2, 4}     (5) R              T, (3), (4) and I10
{6}           (6) R → S          P
{1, 2, 4, 6}  (7) S              T, (5), (6) and I11
{1, 4, 6}     (8) P → S          CP

Example 7. "If there was a ball game, then traveling was difficult. If they arrived on time, then traveling was not difficult. They arrived on time. Therefore, there was no ball game." Show that these statements constitute a valid argument.

Solution. Let
P: There was a ball game.
Q: Traveling was difficult.
R: They arrived on time.

The given premises are P → Q, R → ¬Q and R. The conclusion is ¬P.

{1}        (1) P → Q       P
{2}        (2) R → ¬Q      P
{3}        (3) R           P
{2, 3}     (4) ¬Q          T, (2), (3) and I11
{1, 2, 3}  (5) ¬P          T, (1), (4) and I12

Consistency of premises:

Consistency

A set of formulas H1, H2, …, Hm is said to be consistent if their conjunction has the truth value T for some assignment of truth values to the atomic variables appearing in H1, H2, …, Hm.

Inconsistency

If for every assignment of the truth values to the atomic variables, at least one of the formulas H1, H2, …, Hm is false, so that their conjunction is identically false, then the formulas H1, H2, …, Hm are called inconsistent.

Equivalently, a set of formulas H1, H2, …, Hm is inconsistent if their conjunction implies a contradiction, that is,

H1 ∧ H2 ∧ … ∧ Hm ⇒ R ∧ ¬R

where R is any formula. Note that R ∧ ¬R is a contradiction, and that this condition is both necessary and sufficient for H1, H2, …, Hm to be inconsistent.

Indirect method of proof

In order to show that a conclusion C follows logically from the premises H1, H2, …, Hm, we assume that C is false and consider ¬C as an additional premise. If the new set of premises is inconsistent, so that they imply a contradiction, then the assumption that ¬C is true does not hold simultaneously with H1 ∧ H2 ∧ … ∧ Hm being true. Therefore, C is true whenever H1 ∧ H2 ∧ … ∧ Hm is true. Thus, C follows logically from the premises H1, H2, …, Hm.

Example 8. Show that ¬(P ∧ Q) follows from ¬P ∧ ¬Q.

Solution.

We introduce ¬¬(P ∧ Q) as an additional premise and show that this additional premise leads to a contradiction.

{1}     (1) ¬¬(P ∧ Q)    P, assumed premise
{1}     (2) P ∧ Q        T, (1) and E1
{1}     (3) P            T, (2) and I1
{4}     (4) ¬P ∧ ¬Q      P
{4}     (5) ¬P           T, (4) and I1
{1, 4}  (6) P ∧ ¬P       T, (3), (5) and I9

Here (6) P ∧ ¬P is a contradiction. Thus {1, 4}, viz. ¬¬(P ∧ Q) and ¬P ∧ ¬Q, leads to the contradiction P ∧ ¬P.

Example 9. Show that the following premises are inconsistent.

1. If Jack misses many classes through illness, then he fails high school.
2. If Jack fails high school, then he is uneducated.
3. If Jack reads a lot of books, then he is not uneducated.
4. Jack misses many classes through illness and reads a lot of books.

Solution.

P: Jack misses many classes through illness.
Q: Jack fails high school.
R: Jack reads a lot of books.
S: Jack is uneducated.

The premises are P → Q, Q → S, R → ¬S and P ∧ R.

{1}           (1)  P → Q                  P
{2}           (2)  Q → S                  P
{1, 2}        (3)  P → S                  T, (1), (2) and I13
{4}           (4)  R → ¬S                 P
{4}           (5)  S → ¬R                 T, (4) and E18
{1, 2, 4}     (6)  P → ¬R                 T, (3), (5) and I13
{1, 2, 4}     (7)  ¬P ∨ ¬R                T, (6) and E16
{1, 2, 4}     (8)  ¬(P ∧ R)               T, (7) and E8
{9}           (9)  P ∧ R                  P
{1, 2, 4, 9}  (10) (P ∧ R) ∧ ¬(P ∧ R)     T, (8), (9) and I9
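The definition of consistency can also be checked directly: a set of premises is consistent exactly when some truth assignment makes all of them true at once. The sketch below applies this to the four premises about Jack; the function name `consistent` is an assumption of this example.

```python
from itertools import product


def implies(a, b):
    """Material conditional a → b."""
    return (not a) or b


def consistent(premises, arity):
    """True if some assignment makes every premise true simultaneously."""
    return any(
        all(h(*v) for h in premises)
        for v in product([True, False], repeat=arity)
    )


# Variables in the order P, Q, R, S as defined in Example 9.
jack = [
    lambda P, Q, R, S: implies(P, Q),      # misses classes → fails high school
    lambda P, Q, R, S: implies(Q, S),      # fails high school → uneducated
    lambda P, Q, R, S: implies(R, not S),  # reads a lot → not uneducated
    lambda P, Q, R, S: P and R,            # misses classes and reads a lot
]

print(consistent(jack, 4))      # all four premises together
print(consistent(jack[:3], 4))  # the three conditionals alone
```

The four premises together are inconsistent, in agreement with the derivation, while the first three alone are consistent (for instance, every variable false satisfies them).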

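The rules above can be summed up in the following table. The "Tautology" column shows how to interpret the notation of a given rule.

Rule of inference          Tautology                           Name
p ∴ p ∨ q                  p → (p ∨ q)                         Addition
p ∧ q ∴ p                  (p ∧ q) → p                         Simplification
p, q ∴ p ∧ q               ((p) ∧ (q)) → (p ∧ q)               Conjunction
p, p → q ∴ q               (p ∧ (p → q)) → q                   Modus ponens
¬q, p → q ∴ ¬p             (¬q ∧ (p → q)) → ¬p                 Modus tollens
p → q, q → r ∴ p → r       ((p → q) ∧ (q → r)) → (p → r)       Hypothetical syllogism
p ∨ q, ¬p ∴ q              ((p ∨ q) ∧ ¬p) → q                  Disjunctive syllogism
p ∨ q, ¬p ∨ r ∴ q ∨ r      ((p ∨ q) ∧ (¬p ∨ r)) → (q ∨ r)      Resolution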
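Each named rule corresponds to a formula that is true under every assignment. As a hedged illustration, the sketch below writes the standard textbook form of each rule as a Python function of p, q, r and confirms that it is a tautology; the formulas are the usual ones associated with these rule names, stated here as an assumption rather than quoted from the table.

```python
from itertools import product


def implies(a, b):
    """Material conditional a → b."""
    return (not a) or b


RULES = {
    "Addition": lambda p, q, r: implies(p, p or q),
    "Simplification": lambda p, q, r: implies(p and q, p),
    "Conjunction": lambda p, q, r: implies(p and q, p and q),
    "Modus ponens": lambda p, q, r: implies(p and implies(p, q), q),
    "Modus tollens": lambda p, q, r: implies((not q) and implies(p, q), not p),
    "Hypothetical syllogism": lambda p, q, r: implies(
        implies(p, q) and implies(q, r), implies(p, r)),
    "Disjunctive syllogism": lambda p, q, r: implies((p or q) and (not p), q),
    "Resolution": lambda p, q, r: implies(
        (p or q) and ((not p) or r), q or r),
}


def is_tautology(formula, arity=3):
    """True if the formula holds for every assignment of truth values."""
    return all(formula(*v) for v in product([True, False], repeat=arity))


results = {name: is_tautology(rule) for name, rule in RULES.items()}

# A well-known fallacy for comparison: from q and p → q, conclude p.
affirming_the_consequent = lambda p, q, r: implies(q and implies(p, q), p)

print(results)
print(is_tautology(affirming_the_consequent))
```

All eight rules pass, whereas affirming the consequent does not, since p = F, q = T makes its hypothesis true and its conclusion false.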

Example 1

Let us consider the following argument: "If it rains today, then we will not go on a canoe trip today. If we do not go on a canoe trip today, then we will go on a canoe trip tomorrow. Therefore (the mathematical symbol for "therefore" is ∴), if it rains today, we will go on a canoe trip tomorrow." To make use of the rules of inference in the above table we let p be the proposition "It rains today", q be "We will not go on a canoe trip today" and r be "We will go on a canoe trip tomorrow". Then this argument is of the form:

p → q
q → r
∴ p → r

which is the hypothetical syllogism.

Example 2

Let us consider a more complex set of assumptions: "It is not sunny today and it is colder than yesterday", "We will go swimming only if it is sunny", "If we do not go swimming, then we will have a barbecue", and "If we will have a barbecue, then we will be home by sunset" lead to the conclusion "We will be home by sunset."

Proof by rules of inference: Let p be the proposition "It is sunny today", q the proposition "It is colder than yesterday", r the proposition "We will go swimming", s the proposition "We will have a barbecue", and t the proposition "We will be home by sunset". Then the hypotheses become ¬p ∧ q, r → p, ¬r → s and s → t. Using our intuition we conjecture that the conclusion might be t. Using the Rules of Inference table we can prove the conjecture easily:

Step           Reason
1. ¬p ∧ q      Hypothesis
2. ¬p          Simplification using Step 1
3. r → p       Hypothesis
4. ¬r          Modus tollens using Steps 2 and 3
5. ¬r → s      Hypothesis
6. s           Modus ponens using Steps 4 and 5
7. s → t       Hypothesis
8. t           Modus ponens using Steps 6 and 7

Proof by contradiction:

"Proof by contradiction" is also known as reductio ad absurdum, Latin for "reduction to the absurd".

Here's the idea:

1. Assume that a given proposition is untrue.
2. Based on that assumption, reach two conclusions that contradict each other.

This is based on a classical formal logic construction known as Modus Tollens: If P implies Q and Q is false, then P is false. In this case, Q is a proposition of the form (R and not R) which is always false. P is the negation of the fact that we are trying to prove and if the negation is not true then the original proposition must have been true. If computers are not "not stupid" then they are stupid. (I hear that "stupid computer!" phrase a lot around here.)

Example:

Let's prove that there is no largest prime number (this is the idea of Euclid's original proof). Prime numbers are integers greater than 1 with no exact integer divisors except 1 and themselves.

1. To prove: "There is no largest prime number" by contradiction.

2. Assume: There is a largest prime number, call it p.

3. Consider the number N that is one larger than the product of all of the primes smaller than or equal to p: N = 2*3*5*7*11*...*p + 1. Is it prime?

4. N is at least as big as p+1 and so is larger than p and so, by Step 2, cannot be prime.

5. On the other hand, N has no prime factors between 1 and p because they would all leave a remainder of 1. It has no prime factors larger than p because Step 2 says that there are no primes larger than p. So N has no prime factors and therefore must itself be prime (see note below).

We have reached a contradiction (N is not prime by Step 4, and N is prime by Step 5) and therefore our original assumption that there is a largest prime must be false.

Note: The conclusion in Step 5 makes implicit use of one other important theorem, the Fundamental Theorem of Arithmetic: every integer greater than 1 can be uniquely represented as a product of primes (up to the order of the factors). So if N had a composite (i.e. non-prime) factor, that factor would itself have prime factors which would also be factors of N.
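The key arithmetic fact in the proof, that N leaves remainder 1 when divided by each prime up to p, is easy to check numerically. The sketch below uses simple trial division; the function names are assumptions of this example.

```python
def primes_up_to(n):
    """All primes <= n, found by trial division (fine for small n)."""
    return [k for k in range(2, n + 1)
            if all(k % d for d in range(2, int(k ** 0.5) + 1))]


def euclid_number(p):
    """One more than the product of all primes <= p."""
    product = 1
    for q in primes_up_to(p):
        product *= q
    return product + 1


def smallest_prime_factor(n):
    """Smallest prime dividing n (n itself if n is prime), for n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


N = euclid_number(13)  # 2*3*5*7*11*13 + 1
remainders = [N % q for q in primes_up_to(13)]

print(N)
print(remainders)
print(smallest_prime_factor(N))
```

For p = 13 this gives N = 30031, which leaves remainder 1 for every prime up to 13 and whose smallest prime factor is 59. So in ordinary arithmetic N itself need not be prime (30031 = 59 × 509); what always holds is that every prime factor of N is larger than p. Inside the proof, N is forced to be prime only because of the assumption that no prime exceeds p, and either way that assumption is contradicted.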

Automatic Theorem Proving:

Automatic Theorem Proving (ATP) deals with the development of computer programs that show that some statement (the conjecture) is a logical consequence of a set of statements (the axioms and hypotheses). ATP systems are used in a wide variety of domains. For example, a mathematician might prove the conjecture that groups of order two are commutative, from the axioms of group theory; a management consultant might formulate axioms that describe how organizations grow and interact, and from those axioms prove that organizational death rates decrease with age; a hardware developer might validate the design of a circuit by proving a conjecture that describes a circuit's performance, given axioms that describe the circuit itself; or a frustrated teenager might formulate the jumbled faces of a Rubik's cube as a conjecture and prove, from axioms that describe legal changes to the cube's configuration, that the cube can be rearranged to the solution state. All of these are tasks that can be performed by an ATP system, given an appropriate formulation of the problem as axioms, hypotheses, and a conjecture.

The language in which the conjecture, hypotheses, and axioms (generically known as formulae) are written is a logic, often classical 1st order logic, but possibly a non-classical logic and possibly a higher order logic. These languages allow a precise formal statement of the necessary information, which can then be manipulated by an ATP system. This formality is the underlying strength of ATP: there is no ambiguity in the statement of the problem, as is often the case when using a natural language such as English. Users have to describe the problem at hand precisely and accurately, and this process in itself can lead to a clearer understanding of the problem domain. This in turn allows the user to formulate their problem appropriately for submission to an ATP system.

The proofs produced by ATP systems describe how and why the conjecture follows from the axioms and hypotheses, in a manner that can be understood and agreed upon by everyone, even other computer programs. The proof output may not only be a convincing argument that the conjecture is a logical consequence of the axioms and hypotheses, it often also describes a process that may be implemented to solve some problem. For example, in the Rubik's cube example mentioned above, the proof would describe the sequence of moves that need to be made in order to solve the puzzle.

ATP systems are enormously powerful computer programs, capable of solving immensely difficult problems. Because of this extreme capability, their application and operation sometimes needs to be guided by an expert in the domain of application, in order to solve problems in a reasonable amount of time. Thus ATP systems, despite the name, are often used by domain experts in an interactive way. The interaction may be at a very detailed level, where the user guides the inferences made by the system, or at a much higher level where the user determines intermediate lemmas to be proved on the way to the proof of a conjecture. There is often a synergetic relationship between ATP system users and the systems themselves:

The system needs a precise description of the problem written in some logical form,

the user is forced to think carefully about the problem in order to produce an appropriate formulation and hence acquires a deeper understanding of the problem,

the system attempts to solve the problem,

if successful, the proof is a useful output,

if unsuccessful, the user can provide guidance, or try to prove some intermediate result, or examine the formulae to ensure that the problem is correctly described,

and so the process iterates.

ATP is thus a technology very suited to situations where a clear thinking domain expert can interact with a powerful tool, to solve interesting and deep problems. Potential ATP users need not be concerned that they need to write an ATP system themselves; there are many ATP systems readily available for use.


UNIT-II

Sets and Relations and Functions

RELATIONS

Introduction

The elements of a set may be related to one another. For example, in the set of natural numbers there is the ‘less than’ relation between the elements. The elements of one set may also be related to the elements of another set.

Binary Relation

A binary relation between two sets A and B is a rule R which decides, for any elements a ∈ A and b ∈ B, whether a is in relation R to b. If so, we write a R b. If a is not in relation R to b, then we write a /R b.

We can also consider a R b as the ordered pair (a, b), in which case we can define a binary relation from A to B as a subset of A × B. This subset is denoted by R.

In general, any set of ordered pairs defines a binary relation.

For example, the relation of father to his child is F = {(a, b) / a is the father of b}

In this relation F, the first member is the name of the father and the second is the name of the child.

The definition of relation permits any set of ordered pairs to define a relation.

For example, the set S given by

S = {(1, 2), (3, a), (b, a), (b, Joe)}

is a binary relation.

Definition

The domain D of a binary relation S is the set of all first elements of the ordered pairs in the relation, i.e. D(S) = {a / ∃ b for which (a, b) ∈ S}.

The range R of a binary relation S is the set of all second elements of the ordered pairs in the relation, i.e. R(S) = {b / ∃ a for which (a, b) ∈ S}.

For example

For the relation S = {(1, 2), (3, a), (b, a) ,(b, Joe)}

D(S) = {1, 3, b} and

R(S) = {2, a, Joe}
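The domain and range of a finite relation can be computed directly from its ordered pairs. The following is a minimal sketch, assuming the relation is stored as a Python set of tuples; the helper names `domain` and `range_of` are illustrative only, and the symbols a, b and Joe are modelled as strings.

```python
def domain(relation):
    """Set of all first elements of the ordered pairs."""
    return {a for (a, _) in relation}


def range_of(relation):
    """Set of all second elements of the ordered pairs."""
    return {b for (_, b) in relation}


# The relation S from the example; symbols are modelled as strings.
S = {(1, 2), (3, "a"), ("b", "a"), ("b", "Joe")}

print(domain(S))
print(range_of(S))
```

Because the results are sets, repeated elements such as b in the domain or a in the range are listed only once.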

Let X and Y be any two sets. A subset of the Cartesian product X × Y defines a relation, say C. For any such relation C, we have D(C) ⊆ X and R(C) ⊆ Y, and the relation C is said to be from X to Y. If Y = X, then C is said to be a relation from X to X. In such a case, C is called a relation in X. Thus any relation in X is a subset of X × X. The set X × X is called the universal relation in X, while the empty set, which is also a subset of X × X, is called the void relation in X.

For example

Let L denote the relation “less than or equal to” and D denote the relation

“divides” where x D y means “ x divides y”. Both L and D are defined on the

set {1, 2, 3, 4}

L = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 3), (3, 4), (4, 4)}

D = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 4), (3, 3), (4, 4)}


L ∩ D = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 2), (2, 4), (3, 3), (4, 4)} = D
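The two relations above can be generated from their defining conditions and intersected mechanically. This is a small sketch, assuming relations are represented as Python sets of pairs; the variable names follow the text.

```python
X = {1, 2, 3, 4}

# L: "less than or equal to" on X
L = {(x, y) for x in X for y in X if x <= y}

# D: "divides" on X, i.e. x D y when y is a multiple of x
D = {(x, y) for x in X for y in X if y % x == 0}

print(sorted(L))
print(sorted(D))
print((L & D) == D)
```

The last line checks that the intersection equals D, which holds here because x divides y implies x ≤ y for positive integers.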

Properties of Binary Relations:

Definition: A binary relation R in a set X is reflexive if, for every x ∈ X, x R x, that is, (x, x) ∈ R. Symbolically, R is reflexive in X ⇔ (x) (x ∈ X → x R x).

For example

The relation ≤ is reflexive in the set of real numbers.

The set inclusion is reflexive in the family of all subsets of a universal set.

The relation equality of sets is also reflexive.

The relation “is parallel to” is reflexive in the set of lines in a plane.

The relation of similarity in the set of triangles in a plane is reflexive.

Definition: A relation R in a set X is symmetric if, for every x and y in X, whenever x R y, then y R x, i.e. R is symmetric in X ⇔ (x) (y) (x ∈ X ∧ y ∈ X ∧ x R y → y R x).

For example

The relation equality of set is symmetric.

The relation of similarity in the set of triangles in a plane is symmetric.

The relation of being a sister is not symmetric in the set of all people.

However, in the set of females it is symmetric.

Definition: A relation R in a set X is transitive if, for every x, y and z in X, whenever x R y and y R z, then x R z, i.e. R is transitive in X ⇔ (x) (y) (z) (x ∈ X ∧ y ∈ X ∧ z ∈ X ∧ x R y ∧ y R z → x R z).

For example

The relations <, ≤, >, ≥ and = are transitive in the set of real numbers.

The relations ⊆, ⊂, ⊇, ⊃ and equality are also transitive in the family of sets.

The relation of similarity in the set of triangles in a plane is transitive.


Definition: A relation R in a set X is irreflexive if, for every x ∈ X, (x, x) ∉ R.

For example

The relation < is irreflexive in the set of all real numbers.

The relation proper inclusion is irreflexive in the set of all nonempty subsets of a universal set.

Let X = {1, 2, 3}. The relation S = {(1, 1), (1, 2), (3, 2), (2, 3), (3, 3)} is neither irreflexive nor reflexive.

Definition: A relation R in a set X is antisymmetric if, for every x and y in X, whenever x R y and y R x, then x = y.

Symbolically, (x) (y) (x ∈ X ∧ y ∈ X ∧ x R y ∧ y R x → x = y)

For example

The relations ≤, ≥ and = are antisymmetric in the set of real numbers.

The relation ⊆ is antisymmetric in the set of subsets of a universal set.

The relation “divides” is antisymmetric in the set of natural numbers.

Consider the relation “is a son of” on the male children in a family. Evidently the relation is not symmetric, not transitive and not reflexive.

The relation “is a divisor of” is reflexive and transitive but not symmetric on the set of natural numbers.

Consider the set H of all human beings. Let R be the relation “is married to”. R is symmetric.

Let I be the set of integers. R on I is defined as a R b if a - b is an even number. R is reflexive, symmetric and transitive.
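For a relation on a finite set, each of the properties defined above can be tested by brute force. The following is a minimal sketch under that assumption; the function names are illustrative, and the "divides" relation is restricted to {1, …, 8} only so that it is finite.

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)


def is_irreflexive(R, X):
    return all((x, x) not in R for x in X)


def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)


def is_antisymmetric(R):
    # Whenever both (x, y) and (y, x) are present, x must equal y.
    return all(x == y for (x, y) in R if (y, x) in R)


def is_transitive(R):
    return all((x, w) in R for (x, y) in R for (z, w) in R if y == z)


# The relation S on X = {1, 2, 3} from the text: neither reflexive nor irreflexive.
X = {1, 2, 3}
S = {(1, 1), (1, 2), (3, 2), (2, 3), (3, 3)}
print(is_reflexive(S, X), is_irreflexive(S, X))

# "divides" restricted to {1, ..., 8}
N8 = set(range(1, 9))
divides = {(a, b) for a in N8 for b in N8 if b % a == 0}
print(is_reflexive(divides, N8), is_symmetric(divides),
      is_antisymmetric(divides), is_transitive(divides))
```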

Equivalence Relation:

Definition: A relation R in a set A is called an equivalence relation if

a R a for every a ∈ A, i.e. R is reflexive,

a R b => b R a for every a, b ∈ A, i.e. R is symmetric,

a R b and b R c => a R c for every a, b, c ∈ A, i.e. R is transitive.

For example


The relation equality of numbers on set of real numbers.

The relation being parallel on a set of lines in a plane.

Problem 1: Let us consider the set T of triangles in a plane. Let us define a relation R in T as R = {(a, b) / a, b ∈ T and a is similar to b}.

We have to show that the relation R is an equivalence relation.

Solution:

A triangle a is similar to itself, so a R a.

If the triangle a is similar to the triangle b, then the triangle b is similar to the triangle a, so a R b => b R a.

If a is similar to b and b is similar to c, then a is similar to c, i.e. a R b and b R c => a R c.

Hence R is an equivalence relation.

Problem 2: Let X = {1, 2, 3, …, 7} and R = {(x, y) / x - y is divisible by 3}.

Show that R is an equivalence relation.

Solution: For any a ∈ X, a - a = 0 is divisible by 3. Hence a R a, and R is reflexive.

For any a, b ∈ X, if a - b is divisible by 3, then b - a is also divisible by 3. Hence R is symmetric.

For any a, b, c ∈ X, if a R b and b R c, then a - b is divisible by 3 and b - c is divisible by 3, so that (a - b) + (b - c) is also divisible by 3; hence a - c is divisible by 3. Thus R is transitive.

Hence R is an equivalence relation.

Problem 3: Let Z be the set of all integers and let m be a fixed integer. Two integers a and b are said to be congruent modulo m if and only if m divides a - b, in which case we write a ≡ b (mod m). This relation is called the relation of congruence modulo m, and we can show that it is an equivalence relation.

Solution:

a - a = 0 and m divides a - a, i.e. a R a, so (a, a) ∈ R and R is reflexive.

a R b ⇒ m divides a - b

⇒ m divides b - a

⇒ b ≡ a (mod m)

⇒ b R a,

that is, R is symmetric.

a R b and b R c ⇒ a ≡ b (mod m) and b ≡ c (mod m)

⇒ m divides a - b and m divides b - c

⇒ a - b = km and b - c = lm for some k, l ∈ Z

⇒ (a - b) + (b - c) = km + lm

⇒ a - c = (k + l) m

⇒ a ≡ c (mod m)

⇒ a R c,

that is, R is transitive.

Hence the congruence relation is an equivalence relation.

Equivalence Classes:

Let R be an equivalence relation on a set A. For any a ∈ A, the equivalence class generated by a is the set of all elements b ∈ A such that a R b, and it is denoted by [a]. It is also called the R-equivalence class of a,

i.e., [a] = {b ∈ A / b R a}

Let Z be the set of integers and R be the relation called “congruence modulo 3”, defined by R = {(x, y) / x ∈ Z ∧ y ∈ Z ∧ (x - y) is divisible by 3}.

Then the equivalence classes are

[0] = {… -6, -3, 0, 3, 6, …}

[1] = {…, -5, -2, 1, 4, 7, …}

[2] = {…, -4, -1, 2, 5, 8, …}
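The three classes above can be reproduced on a finite window of the integers. This is a sketch only: the helper `equivalence_classes` is a hypothetical name, and the window -6 … 8 is chosen arbitrarily to match the elements listed in the text.

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            # For an equivalence relation, comparing with one representative suffices.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


def congruent_mod_3(x, y):
    return (x - y) % 3 == 0


classes = equivalence_classes(range(-6, 9), congruent_mod_3)
for cls in classes:
    print(cls)
```

Every integer in the window falls into exactly one class, which illustrates that the equivalence classes partition the set.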

Composition of binary relations:

Definition: Let R be a relation from X to Y and S be a relation from Y to Z. Then the relation R o S given by

R o S = {(x, z) / x ∈ X ∧ z ∈ Z ∧ ∃ y ∈ Y such that (x, y) ∈ R ∧ (y, z) ∈ S}

is called the composite relation of R and S.

The operation of obtaining R o S is called the composition of relations.

Example: Let R = {(1, 2), (3, 4), (2, 2)} and

S = {(4, 2), (2, 5), (3, 1),(1,3)}

Then R o S = {(1, 5), (3, 2), (2, 5)} and S o R = {(4, 2), (3, 2), (1, 4)}

It is to be noted that R o S ≠ S o R.

Also R o (S o T) = (R o S) o T = R o S o T.

Note: We write R o R as R², R o R o R as R³, and so on.
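The composite relation can be computed by matching the second member of a pair in R with the first member of a pair in S. The sketch below assumes relations are Python sets of pairs and follows the convention of these notes, in which R o S means "R first, then S"; the function name `compose` is illustrative.

```python
def compose(R, S):
    """Return R o S: all (x, z) with (x, y) in R and (y, z) in S for some y."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


R = {(1, 2), (3, 4), (2, 2)}
S = {(4, 2), (2, 5), (3, 1), (1, 3)}

print(sorted(compose(R, S)))
print(sorted(compose(S, R)))
```

The two results differ, in line with the remark that composition of relations is not commutative in general.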

Definition

Let R be a relation from X to Y. A relation Ř from Y to X is called the converse of R, where the ordered pairs of Ř are obtained by interchanging the members in each of the ordered pairs of R. This means that, for x ∈ X and y ∈ Y, x R y ⇔ y Ř x.

Thus the converse of R is given by Ř = {(y, x) / (x, y) ∈ R}.

Example:

Let R = {(1, 2),(3, 4),(2, 2)}

Then Ř = {(2, 1),(4, 3),(2, 2)}

Note: If R is an equivalence relation, then Ř is also an equivalence relation.

Definition: Let X be any finite set and R be a relation in X. The relation R⁺ = R ∪ R² ∪ R³ ∪ … in X is called the transitive closure of R in X.

Example: Let R = {(a, b), (b, c), (c, a)}.

Now R² = R o R = {(a, c), (b, a), (c, b)}

R³ = R² o R = {(a, a), (b, b), (c, c)}

R⁴ = R³ o R = {(a, b), (b, c), (c, a)} = R

R⁵ = R³ o R² = R², and so on.

Thus, R⁺ = R ∪ R² ∪ R³ ∪ R⁴ ∪ …

= R ∪ R² ∪ R³

= {(a, b), (b, c), (c, a), (a, c), (b, a), (c, b), (a, a), (b, b), (c, c)}


We see that R⁺ is a transitive relation containing R. In fact, it is the smallest transitive relation containing R.
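The union of powers can be accumulated until a new power adds nothing. The following is one possible sketch for a finite relation stored as a set of pairs; `compose` and `transitive_closure` are illustrative names, and more efficient methods exist.

```python
def compose(R, S):
    """R o S: apply R first, then S."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def transitive_closure(R):
    """Union of R, R o R, R o R o R, ... for a finite relation R."""
    closure = set(R)
    power = set(R)
    while True:
        power = compose(power, R)   # next power of R
        if power <= closure:        # nothing new, so later powers add nothing either
            return closure
        closure |= power


R = {("a", "b"), ("b", "c"), ("c", "a")}
closure = transitive_closure(R)
print(sorted(closure))
```

For this example the loop stops once the fourth power, which equals R, has been computed, and the closure contains all nine ordered pairs over {a, b, c}.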

Partial Ordering Relations:

Definition

A binary relation R in a set P is called a partial order relation or partial ordering in P iff R is reflexive, antisymmetric, and transitive.

A partial order relation is denoted by the symbol ≤. If ≤ is a partial ordering on P, then the ordered pair (P, ≤) is called a partially ordered set or a poset.

Let R be the set of real numbers. The relation “less than or equal to”, or ≤, is a partial ordering on R.

Let X be a set and ρ(X) be its power set. The relation subset, ⊆, on ρ(X) is a partial ordering.

Let Sn be the set of divisors of n. The relation D, meaning “divides”, is a partial ordering on Sn.

In a partially ordered set (P, ≤), an element y ∈ P is said to cover an element x ∈ P if x < y and there does not exist any element z ∈ P such that x < z and z < y;

that is, y covers x ⇔ (x < y ∧ (x ≤ z ≤ y ⇒ x = z ∨ z = y))

A partial order relation ≤ on a set P can be represented by means of a diagram known as a Hasse diagram or partial order set diagram of (P, ≤). In such a diagram, each element is represented by a small circle or a dot. The circle for x ∈ P is drawn below the circle for y ∈ P if x < y, and a line is drawn between x and y if y covers x.

If x < y but y does not cover x, then x and y are not connected directly by a single line. However, they are connected through one or more elements of P.

Hasse Diagram:

A Hasse diagram is a digraph for a poset which does not have loops and arcs implied by the transitivity.

Example 10: For the relation {<a, a>, <a, b>, <a, c>, <b, b>, <b, c>, <c, c>} on the set {a, b, c}, the Hasse diagram has the arcs {<a, b>, <b, c>}.

Ex: Let A be a given finite set and ρ(A) its power set. Let ⊆ be the subset relation on the elements of ρ(A). Draw the Hasse diagram of (ρ(A), ⊆) for A = {a, b, c}.
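The lines of a Hasse diagram are exactly the cover pairs of the poset, so they can be listed by applying the definition of "covers" to every pair of elements. The sketch below does this for the power set of {a, b, c} under inclusion; `power_set` and `cover_pairs` are hypothetical helper names, and the brute-force search is only suitable for small posets.

```python
from itertools import combinations


def power_set(A):
    """All subsets of A, as frozensets."""
    items = sorted(A)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def cover_pairs(P, leq):
    """Pairs (x, y) such that y covers x: x < y and no z lies strictly between."""
    pairs = []
    for x in P:
        for y in P:
            if x != y and leq(x, y):
                between = any(z != x and z != y and leq(x, z) and leq(z, y)
                              for z in P)
                if not between:
                    pairs.append((x, y))
    return pairs


P = power_set({"a", "b", "c"})
edges = cover_pairs(P, lambda s, t: s <= t)   # s <= t is the subset test
for x, y in edges:
    print(sorted(x), "->", sorted(y))
```

Each printed pair corresponds to one line of the diagram, drawn upward from the smaller set to the set that covers it.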

Functions:

Introduction

A function is a special type of relation. It may be considered as a relation in which each element of the domain belongs to only one ordered pair in the relation. Thus a function f from A to B is a subset of A × B having the property that for each a ∈ A, there is one and only one b ∈ B such that (a, b) ∈ f.


Definition

Let A and B be any two sets. A relation f from A to B is called a function if for every a ∈ A there is a unique b ∈ B such that (a, b) ∈ f.

Note that the definition of function requires that a relation must satisfy two additional conditions in order to qualify as a function.

The first condition is that every a ∈ A must be related to some b ∈ B, i.e. the domain of f must be A and not merely a subset of A. The second requirement, uniqueness, can be expressed as (a, b) ∈ f ∧ (a, c) ∈ f => b = c.

Intuitively, a function from a set A to a set B is a rule which assigns to every element of A a unique element of B. If a ∈ A, then the unique element of B assigned to a under f is denoted by f(a). The usual notation for a function f from A to B is f: A → B defined by a → f(a), where a ∈ A; f(a) is called the image of a under f, and a is called a pre-image of f(a).

Let X = Y = R and f(x) = x² + 2. Then Df = R and Rf ⊆ R.

Let X be the set of all statements in logic and let Y = {True, False}.

A mapping f: X → Y is a function.

A program written in a high-level language is mapped into machine language by a compiler. Similarly, the output from a compiler is a function of its input.

Let X = Y = R. Then f(x) = x² is a function from X → Y, but g(x²) = x is not a function from X → Y, since both x and -x would be assigned to the same value x².

A mapping f: A → B is called one-to-one (injective or 1-1) if distinct elements of A are mapped into distinct elements of B, i.e. f is one-to-one if

a1 ≠ a2 => f(a1) ≠ f(a2), or equivalently f(a1) = f(a2) => a1 = a2.

For example, f: N → N given by f(x) = x is 1-1, where N is the set of natural numbers.

A mapping f: A → B is called onto (surjective) if for every b ∈ B there is an a ∈ A such that f(a) = b, i.e. if every element of B has a pre-image in A. Otherwise it is called into.

For example, f: Z → Z given by f(x) = x + 1 is an onto mapping.

A mapping that is both 1-1 and onto is called bijective.

For example, f: R → R given by f(x) = x + 1 is bijective.
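For mappings between finite sets, the two definitions above can be checked directly. The sketch below stores a mapping as a Python dictionary; the function names are illustrative. Since Z and R are infinite, the example uses x + 1 taken modulo 5 on {0, …, 4}, which is an assumption made only to keep the sets finite.

```python
def is_injective(f):
    """Distinct arguments have distinct images."""
    images = list(f.values())
    return len(images) == len(set(images))


def is_surjective(f, codomain):
    """Every element of the codomain has a pre-image."""
    return set(f.values()) == set(codomain)


A = range(5)
shift = {x: (x + 1) % 5 for x in A}        # finite stand-in for f(x) = x + 1
square = {x: x * x for x in range(-2, 3)}  # x and -x share the image x * x

print(is_injective(shift), is_surjective(shift, A))
print(is_injective(square), is_surjective(square, range(5)))
```

The first mapping is bijective on {0, …, 4}; the second is neither one-to-one nor onto {0, …, 4}.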


Definition: A mapping f: A → B is called a constant mapping if, for all a ∈ A, f(a) = b, a fixed element of B.

For example, f: Z → Z given by f(x) = 0 for all x ∈ Z is a constant mapping.

Definition

A mapping f: A → A is called the identity mapping of A if f(a) = a for all a ∈ A. Usually it is denoted by IA or simply I.

Composition of functions:

If f: A → B and g: B → C are two functions, then the composition of functions f and g, denoted by g o f, is the function g o f: A → C given by

g o f = {(a, c) / a ∈ A ∧ c ∈ C ∧ ∃ b ∈ B such that f(a) = b ∧ g(b) = c}

and (g o f)(a) = g(f(a)).

Example 1: Consider the sets A = {1, 2, 3}, B = {a, b} and C = {x, y}.

Let f: A → B be defined by f(1) = a, f(2) = b and f(3) = b, and

let g: B → C be defined by g(a) = x and g(b) = y,

i.e. f = {(1, a), (2, b), (3, b)} and g = {(a, x), (b, y)}.

Then g o f: A → C is defined by

(g o f)(1) = g(f(1)) = g(a) = x

(g o f)(2) = g(f(2)) = g(b) = y

(g o f)(3) = g(f(3)) = g(b) = y

i.e., g o f = {(1, x), (2, y), (3, y)}

If f: A → A and g: A → A, where A = {1, 2, 3}, are given by

f = {(1, 2), (2, 3), (3, 1)} and g = {(1, 3), (2, 2), (3, 1)},

then g o f = {(1, 2), (2, 1), (3, 3)}, f o g = {(1, 1), (2, 3), (3, 2)},

f o f = {(1, 3), (2, 1), (3, 2)} and g o g = {(1, 1), (2, 2), (3, 3)}.
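The four compositions on A = {1, 2, 3} can be verified mechanically. This is a minimal sketch that represents each function as a dictionary from arguments to images; `compose` is an illustrative name, and its argument order mirrors the notation g o f (apply f first, then g).

```python
def compose(g, f):
    """Return g o f, the mapping a -> g(f(a))."""
    return {a: g[f[a]] for a in f}


f = {1: 2, 2: 3, 3: 1}
g = {1: 3, 2: 2, 3: 1}

print(compose(g, f))   # g o f
print(compose(f, g))   # f o g
print(compose(f, f))   # f o f
print(compose(g, g))   # g o g
```

Note that this order is the reverse of the convention used earlier for the composition R o S of relations.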


Example 2: Let f(x) = x + 2, g(x) = x - 2 and h(x) = 3x for x ∈ R, where R is the set of real numbers.

Then f o f = {(x, x + 4) / x ∈ R}

f o g = {(x, x) / x ∈ R}

g o f = {(x, x) / x ∈ R}

g o g = {(x, x - 4) / x ∈ R}

h o g = {(x, 3x - 6) / x ∈ R}

h o f = {(x, 3x + 6) / x ∈ R}

Inverse functions:

Let f: A → B be a one-to-one and onto mapping. Then its inverse, denoted by f⁻¹, is given by f⁻¹ = {(b, a) / (a, b) ∈ f}. Clearly f⁻¹: B → A is one-to-one and onto.

Also we observe that f o f⁻¹ = IB and f⁻¹ o f = IA.

If f⁻¹ exists, then f is called invertible.

For example: Let f: R → R be defined by f(x) = x + 2.

Then f⁻¹: R → R is defined by f⁻¹(x) = x - 2.

Theorem: Let f: X → Y and g: Y → Z be two one-to-one and onto functions. Then g o f is also a one-to-one and onto function.

Proof

Let f: X → Y and g: Y → Z be two one-to-one and onto functions. Let x1, x2 ∈ X with

(g o f)(x1) = (g o f)(x2).

Then g(f(x1)) = g(f(x2)),

f(x1) = f(x2) [since g is 1-1],

x1 = x2 [since f is 1-1],

so that g o f is 1-1.

By the definition of composition, g o f: X → Z is a function.

We have to prove that every element z ∈ Z is the image of some x ∈ X under g o f.

Since g is onto, ∃ y ∈ Y such that g(y) = z, and since f is onto from X to Y, ∃ x ∈ X such that f(x) = y.

Now, (g o f)(x) = g(f(x))

= g(y) [since f(x) = y]

= z [since g(y) = z]

which shows that g o f is onto.

Theorem: (g o f)⁻¹ = f⁻¹ o g⁻¹,

i.e. the inverse of a composite function can be expressed in terms of the composition of the inverses in the reverse order.

Proof.

f: A → B is one-to-one and onto.

g: B → C is one-to-one and onto.

g o f: A → C is also one-to-one and onto.

⇒ (g o f)⁻¹: C → A is one-to-one and onto.

Let a ∈ A; then there exists an element b ∈ B such that f(a) = b ⇒ a = f⁻¹(b).

Now b ∈ B ⇒ there exists an element c ∈ C such that g(b) = c ⇒ b = g⁻¹(c).

Then (g o f)(a) = g[f(a)] = g(b) = c ⇒ a = (g o f)⁻¹(c). …(1)

(f⁻¹ o g⁻¹)(c) = f⁻¹(g⁻¹(c)) = f⁻¹(b) = a ⇒ a = (f⁻¹ o g⁻¹)(c). …(2)

Combining (1) and (2), we have

(g o f)⁻¹ = f⁻¹ o g⁻¹

Theorem: If f: A → B is an invertible mapping, then

f o f⁻¹ = IB and f⁻¹ o f = IA

Proof: Since f is invertible, f⁻¹ is defined by f(a) = b ⇔ f⁻¹(b) = a, where a ∈ A and b ∈ B.

Now we have to prove that f o f⁻¹ = IB.

Let b ∈ B and f⁻¹(b) = a, a ∈ A;

then (f o f⁻¹)(b) = f(f⁻¹(b))

= f(a) = b

therefore (f o f⁻¹)(b) = b ∀ b ∈ B => f o f⁻¹ = IB

Now (f⁻¹ o f)(a) = f⁻¹(f(a)) = f⁻¹(b) = a

therefore (f⁻¹ o f)(a) = a ∀ a ∈ A => f⁻¹ o f = IA.

Hence the theorem.

UNIT-III

ALGORITHMS, MATHEMATICAL INDUCTION AND RECURSION

ALGORITHMS

The term algorithm is a corruption of the name al-Khowarizmi, a mathematician of the ninth century, whose book on Hindu numerals is the basis of modern decimal notation. Originally, the word algorism was used for the rules for performing arithmetic using decimal notation. Algorism evolved into the word algorithm by the eighteenth century. With the growing interest in computing machines, the concept of an algorithm was given a more general meaning, to include all definite procedures for solving problems, not just the procedures for performing arithmetic. (We will discuss algorithms for performing arithmetic with integers in Chapter 4.)

In this book, we will discuss algorithms that solve a wide variety of problems. In this section we will use the problem of finding the largest integer in a finite sequence of integers to illustrate the concept of an algorithm and the properties algorithms have. Also, we will describe algorithms for locating a particular element in a finite set. In subsequent sections, procedures for finding the greatest common divisor of two integers, for finding the shortest path between two points in a network, for multiplying matrices, and so on, will be discussed.

EXAMPLE 1 Describe an algorithm for finding the maximum (largest) value in a finite sequence of integers.

Solution of Example 1: We perform the following steps.

1. Set the temporary maximum equal to the first integer in the sequence. (The temporary maximum will be the largest integer examined at any stage of the procedure.)

2. Compare the next integer in the sequence to the temporary maximum, and if it is larger than the temporary maximum, set the temporary maximum equal to this integer.

3. Repeat the previous step if there are more integers in the sequence.

4. Stop when there are no integers left in the sequence. The temporary maximum at this point is the largest integer in the sequence. ◂

An algorithm is a finite sequence of precise instructions for performing a computation or for solving a problem.

ALGORITHM 1 Finding the Maximum Element in a Finite Sequence.

procedure max(a1, a2, … , an: integers)
max := a1
for i := 2 to n
    if max < ai then max := ai
return max {max is the largest element}

This algorithm first assigns the initial term of the sequence, a1, to the variable max. The “for” loop is used to successively examine terms of the sequence. If a term is greater than the current value of max, it is assigned to be the new value of max. The algorithm terminates after all terms have been examined. The value of max on termination is the maximum element in the sequence.

To gain insight into how an algorithm works, it is useful to construct a trace that shows its steps when given specific input. For instance, a trace of Algorithm 1 with input 8, 4, 11, 3, 10 begins with the algorithm setting max to 8, the value of the initial term. It then compares 4, the second term, with 8, the current value of max. Because 4 ≤ 8, max is unchanged. Next, the algorithm compares the third term, 11, with 8, the current value of max. Because 8 < 11, max is set equal to 11. The algorithm then compares 3, the fourth term, and 11, the current value of max. Because 3 ≤ 11, max is unchanged. Finally, the algorithm compares 10, the fifth term, and 11, the current value of max. As 10 ≤ 11, max remains unchanged. Because there are five terms, we have n = 5. So after examining 10, the last term, the algorithm terminates, with max = 11. When it terminates, the algorithm reports that 11 is the largest term in the sequence.
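The procedure traced above translates almost line for line into a programming language. The following Python sketch is one possible rendering; the name `find_max` is chosen for illustration (Python already has a built-in `max`), and the input is assumed to be a non-empty list.

```python
def find_max(seq):
    """Return the largest element of a non-empty sequence of integers."""
    current_max = seq[0]            # max := a1
    for term in seq[1:]:            # for i := 2 to n
        if current_max < term:      # if max < ai then max := ai
            current_max = term
    return current_max


print(find_max([8, 4, 11, 3, 10]))
```

On the input used in the trace, the temporary maximum changes only once, from 8 to 11.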

PROPERTIES OF ALGORITHMS There are several properties that algorithms generally share. They are useful to keep in mind when algorithms are described. These properties are:

▶ Input. An algorithm has input values from a specified set.

▶ Output. From each set of input values an algorithm produces output values from a specified set. The output values are the solution to the problem.

▶ Definiteness. The steps of an algorithm must be defined precisely.

▶ Correctness. An algorithm should produce the correct output values for each set of input values.

▶ Finiteness. An algorithm should produce the desired output after a finite (but perhaps large) number of steps for any input in the set.

▶ Effectiveness. It must be possible to perform each step of an algorithm exactly and in a finite amount of time.

▶ Generality. The procedure should be applicable for all problems of the desired form, not just for a particular set of input values.

EXAMPLE 2 Show that Algorithm 1 for finding the maximum element in a finite sequence of integers has all the properties listed.

Solution: The input to Algorithm 1 is a sequence of integers. The output is the largest integer in the sequence. Each step of the algorithm is precisely defined, because only assignments, a finite loop, and conditional statements occur. To show that the algorithm is correct, we must show that when the algorithm terminates, the value of the variable max equals the maximum of the terms of the sequence. To see this, note that the initial value of max is the first term of the sequence; as successive terms of the sequence are examined, max is updated to the value of a term if the term exceeds the maximum of the terms previously examined. This (informal) argument shows that when all the terms have been examined, max equals the value of the largest term. (A rigorous proof of this requires the use of mathematical induction, a proof technique developed in Section 5.1.) The algorithm uses a finite number of steps, because it terminates after all the integers in the sequence have been examined. The algorithm can be carried out in a finite amount of time because each step is either a comparison or an assignment, there are a finite number of these steps, and each of these two operations takes a finite amount of time. Finally, Algorithm 1 is general, because it can be used to find the maximum of any finite sequence of integers. ◂

Searching Algorithms

The problem of locating an element in an ordered list occurs in many contexts. For instance, a program that checks the spelling of words searches for them in a dictionary, which is just an ordered list of words. Problems of this kind are called searching problems. We will discuss several algorithms for searching in this section. We will study the number of steps used by each of these algorithms in Section 3.3.

The general searching problem can be described as follows: Locate an element x in a list of distinct elements a1, a2, … , an, or determine that it is not in the list. The solution to this search problem is the location of the term in the list that equals x (that is, i is the solution if x = ai) and is 0 if x is not in the list.

THE LINEAR SEARCH The first algorithm that we will present is called the linear search, or sequential search, algorithm. The linear search algorithm begins by comparing x and a1. When x = a1, the solution is the location of a1, namely, 1. When x ≠ a1, compare x with a2. If x = a2, the solution is the location of a2, namely, 2. When x ≠ a2, compare x with a3. Continue this process, comparing x successively with each term of the list until a match is found, where the solution is the location of that term, unless no match occurs. If the entire list has been searched without locating x, the solution is 0. The pseudocode for the linear search algorithm is displayed as Algorithm 2.

ALGORITHM 2 The Linear Search Algorithm.

procedure linear search(x: integer, a1, a2, … , an: distinct integers)
i := 1
while (i ≤ n and x ≠ ai)
    i := i + 1
if i ≤ n then location := i
else location := 0
return location {location is the subscript of the term that equals x, or is 0 if x is not found}
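The linear search can be sketched in Python as follows. The function keeps the convention of the text, returning a 1-based location and 0 when x is absent; Python lists are 0-based, so the subscript is shifted by one when indexing. The name `linear_search` is illustrative.

```python
def linear_search(x, a):
    """Return the 1-based position of x in list a, or 0 if x is not in a."""
    i = 1
    n = len(a)
    while i <= n and x != a[i - 1]:   # a[i - 1] plays the role of ai
        i += 1
    return i if i <= n else 0


terms = [8, 4, 11, 3, 10]
print(linear_search(11, terms))
print(linear_search(7, terms))
```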


THE BINARY SEARCH We will now consider another searching algorithm. This algorithm can be used when the list has terms occurring in order of increasing size (for instance: if the terms are numbers, they are listed from smallest to largest; if they are words, they are listed in lexicographic, or alphabetic, order). This second searching algorithm is called the binary search algorithm. It proceeds by comparing the element to be located to the middle term of the list. The list is then split into two smaller sublists of the same size, or where one of these smaller lists has one fewer term than the other. The search continues by restricting the search to the appropriate sublist based on the comparison of the element to be located and the middle term. In Section 3.3, it will be shown that the binary search algorithm is much more efficient than the linear search algorithm. Example 3 demonstrates how a binary search works.

EXAMPLE 3 To search for 19 in the list

1 2 3 5 6 7 8 10 12 13 15 16 18 19 20 22,

first split this list, which has 16 terms, into two smaller lists with eight terms each, namely,

1 2 3 5 6 7 8 10 and 12 13 15 16 18 19 20 22.

Then, compare 19 and the largest term in the first list. Because 10 < 19, the search for 19 can be restricted to the list containing the 9th through the 16th terms of the original list. Next, split this list, which has eight terms, into the two smaller lists of four terms each, namely,

12 13 15 16 and 18 19 20 22.

Because 16 < 19 (comparing 19 with the largest term of the first list) the search is restricted to the second of these lists, which contains the 13th through the 16th terms of the original list. The list 18 19 20 22 is split into two lists, namely,

18 19 and 20 22.

Because 19 is not greater than the largest term of the first of these two lists, which is also 19, the search is restricted to the first list: 18 19, which contains the 13th and 14th terms of the original list. Next, this list of two terms is split into two lists of one term each: 18 and 19. Because 18 < 19, the search is restricted to the second list: the list containing the 14th term of the list, which is 19. Now that the search has been narrowed down to one term, a comparison is made, and 19 is located as the 14th term in the original list. ◂

We now specify the steps of the binary search algorithm. To search for the integer x in the list a1, a2, … , an, where a1 < a2 < ⋯ < an, begin by comparing x with the middle term am of the list, where m = ⌊(n + 1)∕2⌋. (Recall that ⌊x⌋ is the greatest integer not exceeding x.) If x > am, the search for x is restricted to the second half of the list, which is am+1, am+2, … , an. If x is not greater than am, the search for x is restricted to the first half of the list, which is a1, a2, … , am.

The search has now been restricted to a list with no more than ⌈n∕2⌉ elements. (Recall that ⌈x⌉ is the smallest integer greater than or equal to x.) Using the same procedure, compare x to the middle term of the restricted list. Then restrict the search to the first or second half of the list. Repeat this process until a list with one term is obtained. Then determine whether this term is x. Pseudocode for the binary search algorithm is displayed as Algorithm 3.

Algorithm 3 proceeds by successively narrowing down the part of the sequence being searched. At any given stage only the terms from ai to aj are under consideration. In other words, i and j are the smallest and largest subscripts of the remaining terms, respectively. Algorithm 3 continues narrowing the part of the sequence being searched until only one term of the sequence remains. When this is done, a comparison is made to see whether this term equals x.
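The narrowing procedure just described can be sketched in Python. The function below keeps the 1-based endpoints i and j of the text and returns 0 when x is not found; the name `binary_search` is illustrative, and the input list is assumed to be in increasing order.

```python
def binary_search(x, a):
    """Return the 1-based position of x in the increasing list a, or 0."""
    i, j = 1, len(a)              # left and right endpoints of the search interval
    while i < j:
        m = (i + j) // 2          # floor of (i + j) / 2
        if x > a[m - 1]:          # a[m - 1] plays the role of am
            i = m + 1             # continue in the second half
        else:
            j = m                 # continue in the first half
    if a and a[i - 1] == x:       # one term left: compare it with x
        return i
    return 0


terms = [1, 2, 3, 5, 6, 7, 8, 10, 12, 13, 15, 16, 18, 19, 20, 22]
print(binary_search(19, terms))
```

For the list of Example 3, the interval shrinks from 1..16 to 9..16, 13..16, 13..14 and finally to the single position 14.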

ALGORITHM 3 The Binary Search Algorithm.

procedure binary search (x: integer, a1, a2, … , an: increasing integers)
i := 1 {i is left endpoint of search interval}
j := n {j is right endpoint of search interval}
while i < j
    m := ⌊(i + j)∕2⌋
    if x > am then i := m + 1
    else j := m
if x = ai then location := i
else location := 0
return location {location is the subscript i of the term ai equal to x, or 0 if x is not found}

Sorting

Suppose that we have a list of elements of a set. Furthermore, suppose that we have a way to order elements of the set. (The notion of ordering elements of sets will be discussed in detail in Section 9.6.) Sorting is putting these elements into a list in which the elements are in increasing order. For instance, sorting the list 7, 2, 1, 4, 5, 9 produces the list 1, 2, 4, 5, 7, 9. Sorting the list d, h, c, a, f (using alphabetical order) produces the list a, c, d, f, h.

There are many reasons why sorting algorithms interest computer scientists and mathematicians. Among these reasons are that some algorithms are easier to implement, some algorithms are more efficient (either in general, or when given input with certain characteristics, such as lists slightly out of order), some algorithms take advantage of particular computer architectures, and some algorithms are particularly clever. In this section we will introduce two sorting algorithms, the bubble sort and the insertion sort. Two other sorting algorithms, the selection sort and the binary insertion sort, are introduced in the exercises, and the shaker sort is introduced in the Supplementary Exercises. In Section 5.4 we will discuss the merge sort and introduce the quick sort in the exercises in that section; the tournament sort is introduced in the exercise set in Section 11.2. We cover sorting algorithms both because sorting is an important problem and because these algorithms can serve as examples for many important concepts.

THE BUBBLE SORT The bubble sort is one of the simplest sorting algorithms, but not one of the most efficient. It puts a list into increasing order by successively comparing adjacent elements, interchanging them if they are in the wrong order. To carry out the bubble sort, we perform the basic operation, that is, interchanging a larger element with a smaller one following it, starting at the beginning of the list, for a full pass. We iterate this procedure until the sort is complete. Pseudocode for the bubble sort is given as Algorithm 4. We can imagine the elements in the list placed in a column. In the bubble sort, the smaller elements “bubble” to the top as they are interchanged with larger elements. The larger elements “sink” to the bottom. This is illustrated in Example 4.

EXAMPLE 4 Use the bubble sort to put 3, 2, 4, 1, 5 into increasing order.

Solution: The steps of this algorithm are illustrated in Figure 1. Begin by comparing the first two elements, 3 and 2. Because 3 > 2, interchange 3 and 2, producing the list 2, 3, 4, 1, 5. Because3 < 4, continue by comparing 4 and 1. Because 4 > 1, interchange 1 and 4, producing the list2, 3, 1, 4, 5. Because 4 < 5, the first pass is complete. The first pass guarantees that the largest element, 5, is in the correct position.

The second pass begins by comparing 2 and 3. Because these are in the correct order, 3 and 1 are compared. Because 3 > 1, these numbers are interchanged, producing 2, 1, 3, 4, 5. Because 3 < 4, these numbers are in the correct order. It is not necessary to do any more comparisons for this pass because 5 is already in the correct position. The second pass guarantees that thetwo largest elements, 4 and 5, are in their correct positions.

The third pass begins by comparing 2 and 1. These are interchanged because 2 > 1, producing 1, 2, 3, 4, 5. Because 2 < 3, these two elements are in the correct order. It is not necessary to do any more comparisons for this pass because 4 and 5 are already in the correct posi-tions. The third pass guarantees that the three largest elements, 3, 4, and 5, are in their correct positions.

The fourth pass consists of one comparison, namely, the comparison of 1 and 2. Because 1 < 2, these elements are in the correct order. This completes the bubble sort. ◂
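The passes traced in this example can be reproduced with a short Python sketch. It is only an illustration of the procedure described above: the function name `bubble_sort_passes` and the choice to record a snapshot after every pass are assumptions made here, not part of the text.

```python
def bubble_sort_passes(items):
    """Bubble sort a copy of items and return the list as it looks after each pass."""
    a = list(items)              # work on a copy; the caller's list is untouched
    n = len(a)
    snapshots = []
    for i in range(1, n):        # passes i = 1, ..., n - 1
        for j in range(n - i):   # compare the adjacent pair at 0-based positions j, j + 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # interchange the pair
        snapshots.append(list(a))
    return snapshots


passes = bubble_sort_passes([3, 2, 4, 1, 5])
for number, state in enumerate(passes, start=1):
    print("after pass", number, state)
```

The last snapshot is the sorted list; for the list 3, 2, 4, 1, 5 the four snapshots match the four passes described in the solution.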

FIGURE 1 The steps of a bubble sort. (The figure has four panels: first pass, second pass, third pass, and fourth pass. Its legend marks an interchange, a pair in correct order, and, in color, the numbers guaranteed to be in correct order.)

ALGORITHM 5 The Insertion Sort.

procedure insertion sort(a_1, a_2, … , a_n: real numbers with n ≥ 2)
for j := 2 to n
    i := 1
    while a_j > a_i
        i := i + 1
    m := a_j
    for k := 0 to j − i − 1
        a_{j−k} := a_{j−k−1}
    a_i := m
{a_1, … , a_n is in increasing order}


THE INSERTION SORT The insertion sort is a simple sorting algorithm, but it is usually not the most efficient. To sort a list with n elements, the insertion sort begins with the second element. The insertion sort compares this second element with the first element and inserts it before the first element if it does not exceed the first element and after the first element if it exceeds the first element. At this point, the first two elements are in the correct order. The third element is then compared with the first element, and if it is larger than the first element, it is compared with the second element; it is inserted into the correct position among the first three elements.

In general, in the jth step of the insertion sort, the jth element of the list is inserted into the correct position in the list of the previously sorted j − 1 elements. To insert the jth element in the list, a linear search technique is used (see Exercise 45); the jth element is successively compared with the already sorted j − 1 elements at the start of the list until the first element that is not less than this element is found or until it has been compared with all j − 1 elements; the jth element is inserted in the correct position so that the first j elements are sorted. The algorithm continues until the last element is placed in the correct position relative to the already sorted list of the first n − 1 elements. The insertion sort is described in pseudocode in Algorithm 5.

EXAMPLE 5 Use the insertion sort to put the elements of the list 3, 2, 4, 1, 5 in increasing order.

Solution: The insertion sort first compares 2 and 3. Because 3 > 2, it places 2 in the first position, producing the list 2, 3, 4, 1, 5 (the sorted part of the list is shown in color). At this point, 2 and 3 are in the correct order. Next, it inserts the third element, 4, into the already sorted part of the list by making the comparisons 4 > 2 and 4 > 3. Because 4 > 3, 4 remains in the third position. At this point, the list is 2, 3, 4, 1, 5 and we know that the ordering of the first three elements is correct. Next, we find the correct place for the fourth element, 1, among the already sorted elements, 2, 3, 4. Because 1 < 2, we obtain the list 1, 2, 3, 4, 5. Finally, we insert 5 into the correct position by successively comparing it to 1, 2, 3, and 4. Because 5 > 4, it stays at the end of the list, producing the correct order for the entire list. ◂
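The insertion step used in this example can be sketched in Python as follows. This is a minimal illustration that mirrors the linear search and shifting described above with 0-based indices; the function name `insertion_sort` and the decision to return a sorted copy are assumptions made for the sketch.

```python
def insertion_sort(items):
    """Return a sorted copy of items, inserting each element by linear search."""
    a = list(items)
    for j in range(1, len(a)):      # the text's j = 2, ..., n, written 0-based
        i = 0
        # Linear search from the front of the sorted part; it stops at i == j
        # at the latest, because a[j] > a[j] is false.
        while a[j] > a[i]:
            i += 1
        m = a[j]                    # element being inserted
        for k in range(j - i):      # shift a[i], ..., a[j - 1] one place to the right
            a[j - k] = a[j - k - 1]
        a[i] = m
    return a


print(insertion_sort([3, 2, 4, 1, 5]))
```

Because the search starts at the front of the sorted part, an element that is already in place (such as 5 in the example) is compared with every earlier element before it is left where it is.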

ALGORITHM 4 The Bubble Sort.

procedure bubblesort(a_1, … , a_n: real numbers with n ≥ 2)
for i := 1 to n − 1
    for j := 1 to n − i
        if a_j > a_{j+1} then interchange a_j and a_{j+1}
{a_1, … , a_n is in increasing order}


ALGORITHM 6 Naive String Matcher.

procedure string match(n, m: positive integers, m ≤ n, t_1, t_2, … , t_n, p_1, p_2, … , p_m: characters)
for s := 0 to n − m
    j := 1
    while (j ≤ m and t_{s+j} = p_j)
        j := j + 1
    if j > m then print “s is a valid shift”

String Matching

Although searching and sorting are the most commonly encountered problems in computer science, many other problems arise frequently. One of these problems asks where a particular string of characters P, called the pattern, occurs, if it does, within another string T, called the text. For instance, we can ask whether the pattern 101 can be found within the string 11001011. By inspection we can see that the pattern 101 occurs within the text 11001011 at a shift of four characters, because 101 is the string formed by the fifth, sixth, and seventh characters of the text. On the other hand, the pattern 111 does not occur within the text 110110001101.

Finding where a pattern occurs in a text string is called string matching. String matching plays an essential role in a wide variety of applications, including text editing, spam filters, systems that look for attacks in a computer network, search engines, plagiarism detection, bioinformatics, and many other important applications. For example, in text editing, the string matching problem arises whenever we need to find all occurrences of a string so that we can replace this string with a different string. Search engines look for matching of search keywords with words on web pages. Many problems in bioinformatics arise in the study of DNA molecules, which are made up of four bases: thymine (T), adenine (A), cytosine (C), and guanine (G). The process of DNA sequencing is the determination of the order of the four bases in DNA. This leads to string matching problems involving strings made up from the four letters T, A, C, and G. For instance, we can ask whether the pattern CAG occurs in the text CATCACAGAGA. The answer is yes, because it occurs with a shift of five characters. Solving questions about the genome requires the use of efficient algorithms for string matching, especially because a string representing a human genome is about $3 \times 10^9$ characters long.

We will now describe a brute force algorithm, Algorithm 6, for string matching, called the naive string matcher. The input to this algorithm is the pattern we wish to match, $P = p_1 p_2 \ldots p_m$, and the text, $T = t_1 t_2 \ldots t_n$. When this pattern begins at position s + 1 in the text T, we say that P occurs with shift s in T, that is, when $t_{s+1} = p_1, t_{s+2} = p_2, \ldots, t_{s+m} = p_m$. To find all valid shifts, the naive string matcher runs through all possible shifts s from s = 0 to s = n − m, checking whether s is a valid shift. In Figure 2, we display the operation of Algorithm 6 when it is used to search for the pattern P = eye in the text T = eceyeye.
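The search over all shifts can be sketched in Python as follows. This is a minimal illustration of the brute force idea described above, using 0-based string indexing; the function name `naive_string_match` and the choice to return the valid shifts as a list instead of printing them are assumptions made for the sketch.

```python
def naive_string_match(text, pattern):
    """Return every shift s at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # candidate shifts s = 0, ..., n - m
        j = 0
        # Compare character by character until a mismatch or the pattern ends.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:                      # every pattern character matched
            shifts.append(s)
    return shifts


print(naive_string_match("eceyeye", "eye"))
print(naive_string_match("CATCACAGAGA", "CAG"))
```

The sketch assumes a nonempty pattern no longer than the text, as in the description above; when the pattern is longer than the text, the loop body never runs and the result is an empty list.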

FIGURE 2 The steps of the naive string matcher with P = eye in T = eceyeye. Matches are identified with a solid line and mismatches with a jagged line. The algorithm finds two valid shifts, s = 2 and s = 4. (The figure has one panel for each shift s = 0, 1, 2, 3, 4.)

“Greed is good ... Greed is right, greed works. Greed clarifies ...” – spoken by the character Gordon Gekko in the film Wall Street.


You have to prove that a greedy algorithm always finds an optimal solution.

Many other string matching algorithms have been developed besides the naive string matcher. These algorithms use a surprisingly wide variety of approaches to make them more efficient than the naive string matcher. To learn more about these algorithms, consult [CoLeRiSt09], as well as books on algorithms in bioinformatics.

Greedy Algorithms

Many algorithms we will study in this book are designed to solve optimization problems. The goal of such problems is to find a solution to the given problem that either minimizes or maximizes the value of some parameter. Optimization problems studied later in this text include finding a route between two cities with least total mileage, determining a way to encode messages using the fewest bits possible, and finding a set of fiber links between network nodes using the least amount of fiber.

Surprisingly, one of the simplest approaches often leads to a solution of an optimization problem. This approach selects the best choice at each step, instead of considering all sequences of steps that may lead to an optimal solution. Algorithms that make what seems to be the “best” choice at each step are called greedy algorithms. Once we know that a greedy algorithm finds a feasible solution, we need to determine whether it has found an optimal solution. (Note that we call the algorithm “greedy” whether or not it finds an optimal solution.) To do this, we either prove that the solution is optimal or we show that there is a counterexample where the algorithm yields a nonoptimal solution. To make these concepts more concrete, we will consider the cashier’s algorithm that makes change using coins. (This algorithm is called the cashier’s algorithm because cashiers often used this algorithm for making change in the days before cash registers became electronic.)

EXAMPLE 6 Consider the problem of making n cents change with quarters, dimes, nickels, and pennies, and using the least total number of coins. We can devise a greedy algorithm for making change for n cents by making a locally optimal choice at each step; that is, at each step we choose the coin of the largest denomination possible to add to the pile of change without exceeding n cents. For example, to make change for 67 cents, we first select a quarter (leaving 42 cents). We next select a second quarter (leaving 17 cents), followed by a dime (leaving 7 cents), followed by a nickel (leaving 2 cents), followed by a penny (leaving 1 cent), followed by a penny. ◂

We display the cashier’s algorithm for n cents, using any set of denominations of coins, as Algorithm 7.
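As a hedged illustration, the greedy choice of always taking the largest coin that still fits can be sketched in Python. The function name `make_change` and the dictionary it returns are assumptions made for this sketch; it also assumes that a one-cent coin is among the denominations, so that the whole amount can always be paid out.

```python
def make_change(n, denominations):
    """Greedy change-making: return {denomination: number of coins used}."""
    counts = {}
    for c in sorted(denominations, reverse=True):   # largest denomination first
        counts[c] = 0
        while n >= c:                                # take this coin while it still fits
            counts[c] += 1
            n -= c
    return counts


us_coins = make_change(67, [25, 10, 5, 1])
print(us_coins, "coins used:", sum(us_coins.values()))

no_nickels = make_change(30, [25, 10, 1])
print(no_nickels, "coins used:", sum(no_nickels.values()))
```

The second call uses a set of denominations without nickels. Greedy selection still produces some way to make 30 cents, but the number of coins it uses need not be the smallest possible for every set of denominations.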

ALGORITHM 7 Cashier’s Algorithm.

procedure change(c_1, c_2, … , c_r: values of denominations of coins, where c_1 > c_2 > ⋯ > c_r; n: a positive integer)
for i := 1 to r
    d_i := 0 {d_i counts the coins of denomination c_i used}
    while n ≥ c_i
        d_i := d_i + 1 {add a coin of denomination c_i}
        n := n − c_i
{d_i is the number of coins of denomination c_i in the change for i = 1, 2, … , r}

We have described the cashier’s algorithm, a greedy algorithm for making change, using any finite set of coins with denominations $c_1, c_2, \ldots, c_r$. In the particular case where the four denominations are quarters, dimes, nickels, and pennies, we have $c_1 = 25$, $c_2 = 10$, $c_3 = 5$, and $c_4 = 1$. For this case, we will show that this algorithm leads to an optimal solution in the sense that it uses the fewest coins possible. Before we embark on our proof, we show that there are sets of coins for which the cashier’s algorithm (Algorithm 7) does not necessarily produce change using the fewest coins possible. For example, if we have only quarters, dimes, and pennies (and no nickels) to use, the cashier’s algorithm would make change for 30 cents using six coins (a quarter and five pennies), whereas we could have used three coins, namely, three dimes.

LEMMA 1 If n is a positive integer, then n cents in change using quarters, dimes, nickels, and pennies, using the fewest coins possible, has at most two dimes, at most one nickel, at most four pennies, and cannot have two dimes and a nickel. The amount of change in dimes, nickels, and pennies cannot exceed 24 cents.

Proof: We use a proof by contradiction. We will show that if we had more than the specified number of coins of each type, we could replace them using fewer coins that have the same value. We note that if we had three dimes we could replace them with a quarter and a nickel, if we had two nickels we could replace them with a dime, if we had five pennies we could replace them with a nickel, and if we had two dimes and a nickel we could replace them with a quarter. Because we can have at most two dimes, one nickel, and four pennies, but we cannot have two dimes and a nickel, it follows that 24 cents is the most money we can have in dimes, nickels, and pennies when we make change using the fewest number of coins for n cents.

THEOREM 1 The cashier’s algorithm (Algorithm 7) always makes change using the fewest coins possible when change is made from quarters, dimes, nickels, and pennies.

Proof: We will use a proof by contradiction. Suppose that there is a positive integer n such that there is a way to make change for n cents using quarters, dimes, nickels, and pennies that uses fewer coins than the greedy algorithm finds. We first note that $q'$, the number of quarters used in this optimal way to make change for n cents, must be the same as $q$, the number of quarters used by the greedy algorithm. To show this, first note that the greedy algorithm uses the most quarters possible, so $q' \le q$. However, it is also the case that $q'$ cannot be less than $q$. If it were, we would need to make up at least 25 cents from dimes, nickels, and pennies in this optimal way to make change. But this is impossible by Lemma 1.

Because there must be the same number of quarters in the two ways to make change, the value of the dimes, nickels, and pennies in these two ways must be the same, and these coins are worth no more than 24 cents. There must be the same number of dimes, because the greedy algorithm used the most dimes possible and by Lemma 1, when change is made using the fewest coins possible, at most one nickel and at most four pennies are used, so that the most dimes possible are also used in the optimal way to make change. Similarly, we have the same number of nickels and, finally, the same number of pennies.

A greedy algorithm makes the best choice at each step according to a specified criterion. The next example shows that it can be difficult to determine which of many possible criteria to choose.

EXAMPLE 7 Suppose we have a group of proposed talks with preset start and end times. Devise a greedy algorithm to schedule as many of these talks as possible in a lecture hall, under the assumptions that once a talk starts, it continues until it ends, no two talks can proceed at the same time, and a talk can begin at the same time another one ends. Assume that talk j begins at time $s_j$ (where s stands for start) and ends at time $e_j$ (where e stands for end).

Solution: To use a greedy algorithm to schedule the most talks, that is, an optimal schedule, we need to decide how to choose which talk to add at each step. There are many criteria we could use to select a talk at each step, where we choose from the talks that do not overlap talks already selected. For example, we could add talks in order of earliest start time, we could add talks in order of shortest time, we could add talks in order of earliest finish time, or we could use some other criterion.

We now consider these possible criteria. Suppose we add the talk that starts earliest among the talks compatible with those already selected. We can construct a counterexample to see that the resulting algorithm does not always produce an optimal schedule. For instance, suppose that we have three talks: Talk 1 starts at 8 A.M. and ends at 12 noon, Talk 2 starts at 9 A.M. and ends at 10 A.M., and Talk 3 starts at 11 A.M. and ends at 12 noon. We first select Talk 1 because it starts earliest. But once we have selected Talk 1 we cannot select either Talk 2 or Talk 3 because both overlap Talk 1. Hence, this greedy algorithm selects only one talk. This is not optimal because we could schedule Talk 2 and Talk 3, which do not overlap.

Now suppose we add the talk that is shortest among the talks that do not overlap any of those already selected. Again we can construct a counterexample to show that this greedy algorithm does not always produce an optimal schedule. So, suppose that we have three talks: Talk 1 starts at 8 A.M. and ends at 9:15 A.M., Talk 2 starts at 9 A.M. and ends at 10 A.M., and Talk 3 starts at 9:45 A.M. and ends at 11 A.M. We select Talk 2 because it is shortest, requiring one hour. Once we select Talk 2, we cannot select either Talk 1 or Talk 3 because neither is compatible with Talk 2. Hence, this greedy algorithm selects only one talk. However, it is possible to select two talks, Talk 1 and Talk 3, which are compatible.

However, it can be shown that we schedule the most talks possible if in each step we select the talk with the earliest ending time among the talks compatible with those already selected. We will prove this in Chapter 5 using the method of mathematical induction. The first step we will make is to sort the talks according to increasing finish time. After this sorting, we relabel the talks so that $e_1 \le e_2 \le \cdots \le e_n$. The resulting greedy algorithm is given as Algorithm 8. ◂

ALGORITHM 8 Greedy Algorithm for Scheduling Talks.

procedure schedule(s_1 ≤ s_2 ≤ ⋯ ≤ s_n: start times of talks, e_1 ≤ e_2 ≤ ⋯ ≤ e_n: ending times of talks)
sort talks by finish time and reorder so that e_1 ≤ e_2 ≤ ⋯ ≤ e_n
S := ∅
for j := 1 to n
    if talk j is compatible with S then
        S := S ∪ {talk j}
return S {S is the set of talks scheduled}
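The earliest-finish-time rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions: each talk is represented as a `(start, end)` pair of numbers (hours, with 9.25 standing for 9:15), the function name `schedule_talks` is made up for the sketch, and compatibility is checked only against the most recently selected talk, which suffices because the talks are processed in order of increasing end time.

```python
def schedule_talks(talks):
    """Greedy scheduling: repeatedly take the compatible talk that ends earliest."""
    selected = []
    last_end = None
    for start, end in sorted(talks, key=lambda talk: talk[1]):   # sort by end time
        # A talk may begin at the same time another one ends, hence >=.
        if last_end is None or start >= last_end:
            selected.append((start, end))
            last_end = end
    return selected


# The two sets of three talks used as counterexamples for the other criteria.
print(schedule_talks([(8, 12), (9, 10), (11, 12)]))
print(schedule_talks([(8, 9.25), (9, 10), (9.75, 11)]))
```

On both sets of talks the sketch selects two talks, whereas the earliest-start and shortest-talk criteria each selected only one.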


The Halting Problem

We will now describe a proof of one of the most famous theorems in computer science. We will show that there is a problem that cannot be solved using any procedure. That is, we will show there are unsolvable problems. The problem we will study is the halting problem. It asks whether there is a procedure that does this: It takes as input a computer program and input to the program and determines whether the program will eventually stop when run with this input. It would be convenient to have such a procedure, if it existed. Certainly being able to test whether a program entered into an infinite loop would be helpful when writing and debugging programs. However, in 1936 Alan Turing showed that no such procedure exists (see his biography in Section 13.4).

Before we present a proof that the halting problem is unsolvable, first note that we cannot simply run a program and observe what it does to determine whether it terminates when run with the given input. If the program halts, we have our answer, but if it is still running after any fixed length of time has elapsed, we do not know whether it will never halt or we just did not wait long enough for it to terminate. After all, it is not hard to design a program that will stop only after more than a billion years has elapsed.

FIGURE 3 Showing that the halting problem is unsolvable. (The diagram shows the program H(P, I) receiving a program P both as program and as input and producing the output H(P, P), which is passed to the program K(P): if H(P, P) = “halt,” then K(P) loops forever; if H(P, P) = “loops forever,” then K(P) halts.)

We will describe Turing’s proof that the halting problem is unsolvable; it is a proof by contradiction. (The reader should note that our proof is not completely rigorous, because we have not explicitly defined what a procedure is. To remedy this, the concept of a Turing machine is needed. This concept is introduced in Section 13.5.)

Proof: Assume there is a solution to the halting problem, a procedure called H(P, I). The procedure H(P, I) takes two inputs, one a program P and the other I, an input to the program P.

H(P, I) generates the string “halt” as output if H determines that P stops when given I as input. Otherwise, H(P, I) generates the string “loops forever” as output.

We will now derive a contradiction.

When a procedure is coded, it is expressed as a string of characters; this string can be interpreted as a sequence of bits. This means that a program itself can be used as data. Therefore, a program can be thought of as input to another program, or even itself. Hence, H can take a program P as both of its inputs, which are a program and input to this program. H should be able to determine whether P will halt when it is given a copy of itself as input.

To show that no procedure H exists that solves the halting problem, we construct a simple procedure K(P), which works as follows, making use of the output H(P, P). If the output of H(P, P) is “loops forever,” which means that P loops forever when given a copy of itself as input, then K(P) halts. If the output of H(P, P) is “halt,” which means that P halts when given a copy of itself as input, then K(P) loops forever. That is, K(P) does the opposite of what the output of H(P, P) specifies. (See Figure 3.)

Now suppose we provide K as input to K. We note that if the output of H(K, K) is “loops forever,” then by the definition of K, we see that K(K) halts. This means that by the definition of H, the output of H(K, K) is “halt,” which is a contradiction. Otherwise, if the output of H(K, K) is “halt,” then by the definition of K, we see that K(K) loops forever, which means that by the definition of H, the output of H(K, K) is “loops forever.” This is also a contradiction. Thus, H cannot always give the correct answers. Consequently, there is no procedure that solves the halting problem.
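The construction of K from H can be made concrete with a small Python sketch. It is not a proof and does not implement a halting tester; it only shows how one candidate H is defeated by the procedure K built from it. The names `make_K` and `always_loops` are hypothetical, and the candidate chosen here simply answers “loops forever” for every program and input.

```python
def make_K(H):
    """Build the procedure K of the proof from a claimed halting tester H."""
    def K(P):
        if H(P, P) == "halt":
            while True:          # H says P halts on itself, so K loops forever
                pass
        return "halted"          # H says P loops on itself, so K halts
    return K


def always_loops(P, I):
    """A (necessarily incorrect) candidate for H: it always answers the same way."""
    return "loops forever"


K = make_K(always_loops)
print(always_loops(K, K))   # the candidate's prediction about K run on K
print(K(K))                 # K(K) returns, so the prediction was wrong
```

A candidate that answers “halt” for (K, K) is defeated in the opposite way, because K(K) would then never return; for that reason the sketch should not be run with such a candidate.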


3.2 The Growth of Functions

Introduction

In Section 3.1 we discussed the concept of an algorithm. We introduced algorithms that solve a variety of problems, including searching for an element in a list and sorting a list. In Section 3.3 we will study the number of operations used by these algorithms. In particular, we will estimate the number of comparisons used by the linear and binary search algorithms to find an element in a sequence of n elements. We will also estimate the number of comparisons used by the bubble sort and by the insertion sort to sort a list of n elements. The time required to solve a problem depends on more than only the number of operations it uses. The time also depends on the hardware and software used to run the program that implements the algorithm. However, when we change the hardware and software used to implement an algorithm, we can closely approximate the time required to solve a problem of size n by multiplying the previous time required by a constant. For example, on a supercomputer we might be able to solve a problem of size n a million times faster than we can on a PC. However, this factor of one million will not depend on n (except perhaps in some minor ways). One of the advantages of using big-O notation, which we introduce in this section, is that we can estimate the growth of a function without worrying about constant multipliers or smaller order terms. This means that, using big-O notation, we do not have to worry about the hardware and software used to implement an algorithm. Furthermore, using big-O notation, we can assume that the different operations used in an algorithm take the same time, which simplifies the analysis considerably.

Big-O notation is used extensively to estimate the number of operations an algorithm uses as its input grows. With the help of this notation, we can determine whether it is practical to use a particular algorithm to solve a problem as the size of the input increases. Furthermore, using big-O notation, we can compare two algorithms to determine which is more efficient as the size of the input grows. For instance, if we have two algorithms for solving a problem, one using $100n^2 + 17n + 4$ operations and the other using $n^3$ operations, big-O notation can help us see that the first algorithm uses far fewer operations when n is large, even though it uses more operations for small values of n, such as n = 10.

This section introduces big-O notation and the related big-Omega and big-Theta notations. We will explain how big-O, big-Omega, and big-Theta estimates are constructed and establish estimates for some important functions that are used in the analysis of algorithms.

Big-O Notation

The growth of functions is often described using a special notation. Definition 1 describes this notation.

DEFINITION 1 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is O(g(x)) if there are constants C and k such that

$$|f(x)| \le C|g(x)|$$

whenever x > k. [This is read as “f(x) is big-oh of g(x).”]

Remark: Intuitively, the definition that f(x) is O(g(x)) says that f(x) grows slower than some fixed multiple of g(x) as x grows without bound.

The constants C and k in the definition of big-O notation are called witnesses to the relationship f(x) is O(g(x)). To establish that f(x) is O(g(x)) we need only one pair of witnesses to this relationship. That is, to show that f(x) is O(g(x)), we need find only one pair of constants C and k, the witnesses, such that $|f(x)| \le C|g(x)|$ whenever x > k.

Note that when there is one pair of witnesses to the relationship f(x) is O(g(x)), there are infinitely many pairs of witnesses. To see this, note that if C and k are one pair of witnesses, then any pair $C'$ and $k'$, where $C < C'$ and $k < k'$, is also a pair of witnesses, because $|f(x)| \le C|g(x)| \le C'|g(x)|$ whenever $x > k' > k$.

THE HISTORY OF BIG-O NOTATION Big-O notation has been used in mathematics for more than a century. In computer science it is widely used in the analysis of algorithms, as will be seen in Section 3.3. The German mathematician Paul Bachmann first introduced big-O notation in 1892 in an important book on number theory. The big-O symbol is sometimes called a Landau symbol after the German mathematician Edmund Landau, who used this notation throughout his work. The use of big-O notation in computer science was popularized by Donald Knuth, who also introduced the big-Ω and big-Θ notations defined later in this section.

WORKING WITH THE DEFINITION OF BIG-O NOTATION A useful approach for finding a pair of witnesses is to first select a value of k for which the size of $|f(x)|$ can be readily estimated when x > k and to see whether we can use this estimate to find a value of C for which $|f(x)| \le C|g(x)|$ for x > k. This approach is illustrated in Example 1.

EXAMPLE 1 Show that $f(x) = x^2 + 2x + 1$ is $O(x^2)$.

Solution: We observe that we can readily estimate the size of f(x) when x > 1 because $x < x^2$ and $1 < x^2$ when x > 1. It follows that

$$0 \le x^2 + 2x + 1 \le x^2 + 2x^2 + x^2 = 4x^2$$

whenever x > 1, as shown in Figure 1. Consequently, we can take C = 4 and k = 1 as witnesses to show that f(x) is $O(x^2)$. That is, $f(x) = x^2 + 2x + 1 < 4x^2$ whenever x > 1. (Note that it is not necessary to use absolute values here because all functions in these equalities are positive when x is positive.)

Alternatively, we can estimate the size of f(x) when x > 2. When x > 2, we have $2x \le x^2$ and $1 \le x^2$. Consequently, if x > 2, we have

$$0 \le x^2 + 2x + 1 \le x^2 + x^2 + x^2 = 3x^2.$$

It follows that C = 3 and k = 2 are also witnesses to the relation f(x) is $O(x^2)$.

Observe that in the relationship “f(x) is $O(x^2)$,” $x^2$ can be replaced by any function that has larger values than $x^2$ for all x ≥ k for some positive real number k. For example, f(x) is $O(x^3)$, f(x) is $O(x^2 + x + 7)$, and so on.

It is also true that $x^2$ is $O(x^2 + 2x + 1)$, because $x^2 < x^2 + 2x + 1$ whenever x > 1. This means that C = 1 and k = 1 are witnesses to the relationship $x^2$ is $O(x^2 + 2x + 1)$. ◂
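The witnesses found in this example can be spot-checked numerically with a short Python sketch. A finite check of this kind is not a proof, since the definition requires the inequality for every x > k; it only helps catch a wrong choice of C or k. The helper name `holds_on_samples` and the sample grid are assumptions made for the sketch.

```python
def holds_on_samples(f, g, C, k, samples):
    """Check |f(x)| <= C * |g(x)| at every sample point x with x > k."""
    return all(abs(f(x)) <= C * abs(g(x)) for x in samples if x > k)


def f(x):
    return x**2 + 2*x + 1


def g(x):
    return x**2


samples = [i / 10 for i in range(1, 10001)]     # 0.1, 0.2, ..., 1000.0

print(holds_on_samples(f, g, 4, 1, samples))    # witnesses C = 4, k = 1
print(holds_on_samples(f, g, 3, 2, samples))    # witnesses C = 3, k = 2
print(holds_on_samples(f, g, 3, 1, samples))    # C = 3 is too small when k = 1
```

The third call fails because, for example, at x = 1.1 we have f(x) = 4.41 while 3g(x) = 3.63, which shows that the constant C and the threshold k have to be chosen together.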

FIGURE 1 The function $x^2 + 2x + 1$ is $O(x^2)$. (The graph shows the curves $4x^2$, $x^2 + 2x + 1$, and $x^2$; the part of the graph of $f(x) = x^2 + 2x + 1$ that satisfies $f(x) < 4x^2$ is shown in color.)

Note that in Example 1 we have two functions, $f(x) = x^2 + 2x + 1$ and $g(x) = x^2$, such that f(x) is O(g(x)) and g(x) is O(f(x)), the latter fact following from the inequality $x^2 \le x^2 + 2x + 1$, which holds for all nonnegative real numbers x. We say that two functions f(x) and g(x) that satisfy both of these big-O relationships are of the same order. We will return to this notion later in this section.

Remark: The fact that f(x) is O(g(x)) is sometimes written f(x) = O(g(x)). However, the equals sign in this notation does not represent a genuine equality. Rather, this notation tells us that an inequality holds relating the values of the functions f and g for sufficiently large numbers in the domains of these functions. However, it is acceptable to write f(x) ∈ O(g(x)) because O(g(x)) represents the set of functions that are O(g(x)).

When f(x) is O(g(x)), and h(x) is a function that has larger absolute values than g(x) does for sufficiently large values of x, it follows that f(x) is O(h(x)). In other words, the function g(x) in the relationship f(x) is O(g(x)) can be replaced by a function with larger absolute values. To see this, note that if

$$|f(x)| \le C|g(x)| \quad \text{if } x > k,$$

and if $|h(x)| > |g(x)|$ for all x > k, then

$$|f(x)| \le C|h(x)| \quad \text{if } x > k.$$

Hence, f(x) is O(h(x)).

When big-O notation is used, the function g in the relationship f(x) is O(g(x)) is often chosen to have the smallest growth rate of the functions belonging to a set of reference functions, such as functions of the form $x^n$, where n is a positive real number. (Important reference functions are discussed later in this section.)

In subsequent discussions, we will almost always deal with functions that take on only positive values. All references to absolute values can be dropped when working with big-O estimates for such functions. Figure 2 illustrates the relationship f (x) is O(g(x)).

FIGURE 2 The function f(x) is O(g(x)). (The graph shows the curves Cg(x), f(x), and g(x); the part of the graph of f(x) that satisfies f(x) < Cg(x), which is the part with x > k, is shown in color.)

Example 2 illustrates how big-O notation is used to estimate the growth of functions.

EXAMPLE 2 Show that $7x^2$ is $O(x^3)$.

Solution: Note that when x > 7, we have $7x^2 < x^3$. (We can obtain this inequality by multiplying both sides of x > 7 by $x^2$.) Consequently, we can take C = 1 and k = 7 as witnesses to establish the relationship $7x^2$ is $O(x^3)$. Alternatively, when x > 1, we have $7x^2 < 7x^3$, so that C = 7 and k = 1 are also witnesses to the relationship $7x^2$ is $O(x^3)$. ◂


EXAMPLE 3 Show that $n^2$ is not $O(n)$.

Solution: To show that $n^2$ is not $O(n)$, we must show that no pair of witnesses C and k exist such that $n^2 \le Cn$ whenever n > k. We will use a proof by contradiction to show this.

Suppose that there are constants C and k for which $n^2 \le Cn$ whenever n > k. Observe that when n > 0 we can divide both sides of the inequality $n^2 \le Cn$ by n to obtain the equivalent inequality $n \le C$. However, no matter what C and k are, the inequality $n \le C$ cannot hold for all n with n > k. In particular, once we set a value of k, we see that when n is larger than the maximum of k and C, it is not true that $n \le C$ even though n > k. This contradiction shows that $n^2$ is not $O(n)$. ◂

EXAMPLE 4 Example 2 shows that $7x^2$ is $O(x^3)$. Is it also true that $x^3$ is $O(7x^2)$?

Solution: To determine whether $x^3$ is $O(7x^2)$, we need to determine whether witnesses C and k exist, so that $x^3 \le C(7x^2)$ whenever x > k. We will show that no such witnesses exist using a proof by contradiction.

If C and k are witnesses, the inequality $x^3 \le C(7x^2)$ holds for all x > k. Observe that the inequality $x^3 \le C(7x^2)$ is equivalent to the inequality $x \le 7C$, which follows by dividing both sides by the positive quantity $x^2$. However, no matter what C is, it is not the case that $x \le 7C$ for all x > k no matter what k is, because x can be made arbitrarily large. It follows that no witnesses C and k exist for this proposed big-O relationship. Hence, $x^3$ is not $O(7x^2)$. ◂
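The arguments in Examples 3 and 4 both rest on the same observation: whatever constants are proposed, an argument larger than both the threshold k and the bound (C in Example 3, 7C in Example 4) breaks the inequality. The following Python sketch, with the hypothetical helper name `first_violation`, produces such an argument for any proposed pair of constants.

```python
import math


def first_violation(bound, k):
    """Return a positive integer that exceeds both bound and k."""
    return max(math.floor(max(bound, k)) + 1, 1)


# Example 3: for a proposed pair (C, k), find n > k with n^2 > C * n.
C, k = 1000, 5
n = first_violation(C, k)
print(n, n**2 > C * n)

# Example 4: for a proposed pair (C, k), find x > k with x^3 > C * (7 * x^2).
x = first_violation(7 * C, k)
print(x, x**3 > C * (7 * x**2))
```

Because the helper works for every choice of C and k, no pair of constants can serve as witnesses, which is the content of the two proofs by contradiction.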

Big-O Estimates for Some Important Functions

Polynomials can often be used to estimate the growth of functions. Instead of analyzing the growth of polynomials each time they occur, we would like a result that can always be used to estimate the growth of a polynomial. Theorem 1 does this. It shows that the leading term of a polynomial dominates its growth by asserting that a polynomial of degree n or less is $O(x^n)$.

THEOREM 1 Let $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, where $a_0, a_1, \ldots, a_{n-1}, a_n$ are real numbers. Then f(x) is $O(x^n)$.

Proof: Using the triangle inequality (see Exercise 9 in Section 1.8), if x > 1 we have

$$\begin{aligned}
|f(x)| &= |a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0| \\
&\le |a_n| x^n + |a_{n-1}| x^{n-1} + \cdots + |a_1| x + |a_0| \\
&= x^n \left( |a_n| + |a_{n-1}|/x + \cdots + |a_1|/x^{n-1} + |a_0|/x^n \right) \\
&\le x^n \left( |a_n| + |a_{n-1}| + \cdots + |a_1| + |a_0| \right).
\end{aligned}$$

This shows that

$$|f(x)| \le C x^n,$$

where $C = |a_n| + |a_{n-1}| + \cdots + |a_0|$ whenever x > 1. Hence, the witnesses $C = |a_n| + |a_{n-1}| + \cdots + |a_0|$ and k = 1 show that f(x) is $O(x^n)$.
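The witnesses produced by this proof are easy to compute. The following Python sketch, under the assumption that a polynomial is given as a list of coefficients from $a_n$ down to $a_0$, returns the pair C, k from the proof and spot-checks the bound at a few integer points; the function names `polynomial_witnesses` and `evaluate` are made up for the illustration.

```python
def polynomial_witnesses(coefficients):
    """Return (C, k) with C the sum of the absolute values of the coefficients, k = 1."""
    return sum(abs(a) for a in coefficients), 1


def evaluate(coefficients, x):
    """Evaluate the polynomial at x with Horner's method (highest degree first)."""
    value = 0
    for a in coefficients:
        value = value * x + a
    return value


coefficients = [3, -5, 0, 2]                 # 3x^3 - 5x^2 + 2
degree = len(coefficients) - 1
C, k = polynomial_witnesses(coefficients)
print(C, k)
print(all(abs(evaluate(coefficients, x)) <= C * x**degree for x in range(k + 1, 200)))
```

The constant obtained this way is usually far from the smallest one that works; the proof only needs some valid pair of witnesses.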

We now give some examples involving functions that have the set of positive integers as their domains.

EXAMPLE 5 How can big-O notation be used to estimate the sum of the first n positive integers?

Solution: Because each of the integers in the sum of the first n positive integers does not exceed n, it follows that

1 + 2 + ⋯ + n ≤ n + n + ⋯ + n = n^2.

From this inequality it follows that 1 + 2 + 3 + ⋯ + n is O(n^2), taking C = 1 and k = 1 as witnesses. (In this example the domains of the functions in the big-O relationship are the set of positive integers.) ◂

In Example 6 big-O estimates will be developed for the factorial function and its logarithm. These estimates will be important in the analysis of the number of steps used in sorting procedures.

EXAMPLE 6 Give big-O estimates for the factorial function and the logarithm of the factorial function, where the factorial function f(n) = n! is defined by

n! = 1 ⋅ 2 ⋅ 3 ⋅ ⋯ ⋅ n

whenever n is a positive integer, and 0! = 1. For example,

1! = 1, 2! = 1 ⋅ 2 = 2, 3! = 1 ⋅ 2 ⋅ 3 = 6, 4! = 1 ⋅ 2 ⋅ 3 ⋅ 4 = 24.

Note that the function n! grows rapidly. For instance,

20! = 2,432,902,008,176,640,000.

Solution: A big-O estimate for n! can be obtained by noting that each term in the product does not exceed n. Hence,

n! = 1 ⋅ 2 ⋅ 3 ⋅ ⋯ ⋅ n ≤ n ⋅ n ⋅ n ⋅ ⋯ ⋅ n = n^n.

This inequality shows that n! is O(n^n), taking C = 1 and k = 1 as witnesses. Taking logarithms of both sides of the inequality established for n!, we obtain

log n! ≤ log n^n = n log n.

This implies that log n! is O(n log n), again taking C = 1 and k = 1 as witnesses. ◂

EXAMPLE 7 In Section 5.1, we will show that n < 2^n whenever n is a positive integer. Show that this inequality implies that n is O(2^n), and use this inequality to show that log n is O(n).

Solution: Using the inequality n < 2^n, we quickly can conclude that n is O(2^n) by taking k = C = 1 as witnesses. Note that because the logarithm function is increasing, taking logarithms (base 2) of both sides of this inequality shows that

log n < n.

It follows that

log n is O(n).

(Again we take C = k = 1 as witnesses.)

If we have logarithms to a base b, where b is different from 2, we still have log_b n is O(n) because

log_b n = (log n)∕(log b) < n∕(log b)

whenever n is a positive integer. We take C = 1∕log b and k = 1 as witnesses. (We have used Theorem 3 in Appendix 2 to see that log_b n = log n∕log b.) ◂

As mentioned before, big-O notation is used to estimate the number of operations needed to solve a problem using a specified procedure or algorithm. The functions used in these estimates often include the following:

1, log n, n, n log n, n^2, 2^n, n!

Using calculus it can be shown that each function in the list is smaller than the succeeding function, in the sense that the ratio of a function and the succeeding function tends to zero as n grows without bound. Figure 3 displays the graphs of these functions, using a scale for the values of the functions that doubles for each successive marking on the graph. That is, the vertical scale in this graph is logarithmic.
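The claim about ratios can be spot-checked numerically. The following is a minimal sketch, not a proof: it evaluates the seven functions at a few values of n and prints the ratio of each function to its successor. The helper names growth_values and successive_ratios are hypothetical, and logarithms are taken to base 2 as an assumption.

```python
import math

def growth_values(n):
    """Return [1, log n, n, n log n, n^2, 2^n, n!] for an integer n >= 2 (log base 2)."""
    log_n = math.log2(n)
    return [1, log_n, n, n * log_n, n ** 2, 2 ** n, math.factorial(n)]

def successive_ratios(n):
    """Ratio of each function in the list to the function that follows it."""
    values = growth_values(n)
    return [values[i] / values[i + 1] for i in range(len(values) - 1)]

# Each ratio shrinks as n grows, which is consistent with the limit being zero.
for n in (16, 64):
    print(n, ["%.3g" % r for r in successive_ratios(n)])
```

A shrinking ratio at two sample points only illustrates the trend; the limit statement itself still needs the calculus argument.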

[Figure 3: graphs of 1, log n, n, n log n, n^2, 2^n, and n! plotted against n, with a logarithmic vertical scale marked from 1 to 4096.]

FIGURE 3 A display of the growth of functions commonly used in big-O estimates.

USEFUL BIG-O ESTIMATES INVOLVING LOGARITHMS, POWERS, AND EXPONENTIAL FUNCTIONS We now give some useful facts that help us determine whether big-O relationships hold between pairs of functions when each of the functions is a power of a logarithm, a power, or an exponential function of the form b^n where b > 1. Their proofs are left as Exercises 57–62 for readers skilled with calculus.

Theorem 1 shows that if f(n) is a polynomial of degree d or less, then f(n) is O(n^d). Applying this theorem, we see that if d > c > 1, then n^c is O(n^d). We leave it to the reader to show that the reverse of this relationship does not hold. Putting these facts together, we see that if d > c > 1, then

n^c is O(n^d), but n^d is not O(n^c).

In Example 7 we showed that log_b n is O(n) whenever b > 1. More generally, whenever b > 1 and c and d are positive, we have

(log_b n)^c is O(n^d), but n^d is not O((log_b n)^c).

This tells us that every positive power of the logarithm of n to the base b, where b > 1, is big-O of every positive power of n, but the reverse relationship never holds.

In Example 7, we also showed that n is O(2^n). More generally, whenever d is positive and b > 1, we have

n^d is O(b^n), but b^n is not O(n^d).

This tells us that every power of n is big-O of every exponential function of n with a base that is greater than one, but the reverse relationship never holds. Furthermore, when c > b > 1 we have

b^n is O(c^n), but c^n is not O(b^n).

This tells us that if we have two exponential functions with different bases greater than one, one of these functions is big-O of the other if and only if its base is smaller or equal.

Finally, we note that if c > 1, we have

c^n is O(n!), but n! is not O(c^n).

We can use the big-O estimates discussed here to help us order the growth of different functions, as Example 8 illustrates.

EXAMPLE 8 Arrange the functions f1(n) = 8√n, f2(n) = (log n)^2, f3(n) = 2n log n, f4(n) = n!, f5(n) = (1.1)^n, and f6(n) = n^2 in a list so that each function is big-O of the next function.

Solution: From the big-O estimates described in this subsection, we see that f2(n) = (log n)^2 is the slowest growing of these functions. (This follows because log n grows slower than any positive power of n.) The next three functions, in order, are f1(n) = 8√n, f3(n) = 2n log n, and f6(n) = n^2. (We know this because f1(n) = 8n^(1∕2), f3(n) = 2n log n is a function that grows faster than n but slower than n^c for every c > 1, and f6(n) = n^2 is of the form n^c where c = 2.) The next function in the list is f5(n) = (1.1)^n, because it is an exponential function with base 1.1. Finally, f4(n) = n! is the fastest growing function on the list, because f(n) = n! grows faster than any exponential function of n. ◂


The Growth of Combinations of Functions

Many algorithms are made up of two or more separate subprocedures. The number of steps used by a computer to solve a problem with input of a specified size using such an algorithm is the sum of the number of steps used by these subprocedures. To give a big-O estimate for the number of steps needed, it is necessary to find big-O estimates for the number of steps used by each subprocedure and then combine these estimates.

Big-O estimates of combinations of functions can be provided if care is taken when different big-O estimates are combined. In particular, it is often necessary to estimate the growth of the sum and the product of two functions. What can be said if big-O estimates for each of two functions are known? To see what sort of estimates hold for the sum and the product of two functions, suppose that f1(x) is O(g1(x)) and f2(x) is O(g2(x)).

From the definition of big-O notation, there are constants C1, C2, k1, and k2 such that

|f1(x)| ≤ C1|g1(x)|

when x > k1, and

|f2(x)| ≤ C2|g2(x)|

when x > k2. To estimate the sum of f1(x) and f2(x), note that

|(f1 + f2)(x)| = |f1(x) + f2(x)|
               ≤ |f1(x)| + |f2(x)|    using the triangle inequality |a + b| ≤ |a| + |b|.

When x is greater than both k1 and k2, it follows from the inequalities for f1(x) and f2(x) that

|f1(x)| + |f2(x)| ≤ C1|g1(x)| + C2|g2(x)|
                  ≤ C1|g(x)| + C2|g(x)|
                  = (C1 + C2)|g(x)|
                  = C|g(x)|,

where C = C1 + C2 and g(x) = max(|g1(x)|, |g2(x)|). [Here max(a, b) denotes the maximum, or larger, of a and b.]

This inequality shows that |(f1 + f2)(x)| ≤ C|g(x)| whenever x > k, where k = max(k1, k2). We state this useful result as Theorem 2.

THEOREM 2 Suppose that f1(x) is O(g1(x)) and that f2(x) is O(g2(x)). Then (f1 + f2)(x) is O(g(x)), where g(x) = max(|g1(x)|, |g2(x)|) for all x.

We often have big-O estimates for f1 and f2 in terms of the same function g. In this situation, Theorem 2 can be used to show that (f1 + f2)(x) is also O(g(x)), because max(g(x), g(x)) = g(x). This result is stated in Corollary 1.

COROLLARY 1 Suppose that f1(x) and f2(x) are both O(g(x)). Then (f1 + f2)(x) is O(g(x)).

In a similar way big-O estimates can be derived for the product of the functions f1 and f2. When x is greater than max(k1, k2) it follows that

|(f1f2)(x)| = |f1(x)||f2(x)|
            ≤ C1|g1(x)| C2|g2(x)|
            ≤ C1C2|(g1g2)(x)|
            ≤ C|(g1g2)(x)|,

where C = C1C2. From this inequality, it follows that f1(x)f2(x) is O(g1g2(x)), because there are constants C and k, namely, C = C1C2 and k = max(k1, k2), such that |(f1f2)(x)| ≤ C|g1(x)g2(x)| whenever x > k. This result is stated in Theorem 3.

THEOREM 3 Suppose that f1(x) is O(g1(x)) and f2(x) is O(g2(x)). Then (f1f2)(x) is O(g1(x)g2(x)).

The goal in using big-O notation to estimate functions is to choose a function g(x) as simple as possible, that grows relatively slowly so that f(x) is O(g(x)). Examples 9 and 10 illustrate how to use Theorems 2 and 3 to do this. The type of analysis given in these examples is often used in the analysis of the time used to solve problems using computer programs.

EXAMPLE 9 Give a big-O estimate for f(n) = 3n log(n!) + (n^2 + 3) log n, where n is a positive integer.

Solution: First, the product 3n log(n!) will be estimated. From Example 6 we know that log(n!) is O(n log n). Using this estimate and the fact that 3n is O(n), Theorem 3 gives the estimate that 3n log(n!) is O(n^2 log n).

Next, the product (n^2 + 3) log n will be estimated. Because (n^2 + 3) < 2n^2 when n > 2, it follows that n^2 + 3 is O(n^2). Thus, from Theorem 3 it follows that (n^2 + 3) log n is O(n^2 log n). Using Theorem 2 to combine the two big-O estimates for the products shows that f(n) = 3n log(n!) + (n^2 + 3) log n is O(n^2 log n). ◂

EXAMPLE 10 Give a big-O estimate for f(x) = (x + 1) log(x^2 + 1) + 3x^2.

Solution: First, a big-O estimate for (x + 1) log(x^2 + 1) will be found. Note that (x + 1) is O(x). Furthermore, x^2 + 1 ≤ 2x^2 when x > 1. Hence,

log(x^2 + 1) ≤ log(2x^2) = log 2 + log x^2 = log 2 + 2 log x ≤ 3 log x,

if x > 2. This shows that log(x^2 + 1) is O(log x).

From Theorem 3 it follows that (x + 1) log(x^2 + 1) is O(x log x). Because 3x^2 is O(x^2), Theorem 2 tells us that f(x) is O(max(x log x, x^2)). Because x log x ≤ x^2, for x > 1, it follows that f(x) is O(x^2). ◂

Ω and Θ are the Greek uppercase letters omega and theta, respectively.

Big-Omega and Big-Theta Notation

Big-O notation is used extensively to describe the growth of functions, but it has limitations. In particular, when f(x) is O(g(x)), we have an upper bound, in terms of g(x), for the size of f(x) for large values of x. However, big-O notation does not provide a lower bound for the size of f(x) for large x. For this, we use big-Omega (big-Ω) notation. When we want to give both an upper and a lower bound on the size of a function f(x), relative to a reference function g(x), we use big-Theta (big-Θ) notation. Both big-Omega and big-Theta notation were introduced by Donald Knuth in the 1970s. His motivation for introducing these notations was the common misuse of big-O notation when both an upper and a lower bound on the size of a function are needed. We now define big-Omega notation and illustrate its use. After doing so, we will do the same for big-Theta notation.

Definition 2 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is Ω(g(x)) if there are constants C and k with C positive such that

|f(x)| ≥ C|g(x)|

whenever x > k. [This is read as “f(x) is big-Omega of g(x).”]

There is a strong connection between big-O and big-Omega notation. In particular, f(x) is Ω(g(x)) if and only if g(x) is O(f(x)). We leave the verification of this fact as a straightforward exercise for the reader.

EXAMPLE 11 The function f(x) = 8x^3 + 5x^2 + 7 is Ω(g(x)), where g(x) is the function g(x) = x^3. This is easy to see because f(x) = 8x^3 + 5x^2 + 7 ≥ 8x^3 for all positive real numbers x. This is equivalent to saying that g(x) = x^3 is O(8x^3 + 5x^2 + 7), which can be established directly by turning the inequality around. ◂

Often, it is important to know the order of growth of a function in terms of some relatively simple reference function such as x^n when n is a positive integer or c^x, where c > 1. Knowing the order of growth requires that we have both an upper bound and a lower bound for the size of the function. That is, given a function f(x), we want a reference function g(x) such that f(x) is O(g(x)) and f(x) is Ω(g(x)). Big-Theta notation, defined as follows, is used to express both of these relationships, providing both an upper and a lower bound on the size of a function.

Definition 3 Let f and g be functions from the set of integers or the set of real numbers to the set of real numbers. We say that f(x) is Θ(g(x)) if f(x) is O(g(x)) and f(x) is Ω(g(x)). When f(x) is Θ(g(x)), we say that f is big-Theta of g(x), that f(x) is of order g(x), and that f(x) and g(x) are of the same order.

When f(x) is Θ(g(x)), it is also the case that g(x) is Θ(f(x)). Also note that f(x) is Θ(g(x)) if and only if f(x) is O(g(x)) and g(x) is O(f(x)) (see Exercise 31). Furthermore, note that f(x) is Θ(g(x)) if and only if there are positive real numbers C1 and C2 and a positive real number k such that

C1|g(x)| ≤ |f(x)| ≤ C2|g(x)|

whenever x > k. The existence of the constants C1, C2, and k tells us that f(x) is Ω(g(x)) and that f(x) is O(g(x)), respectively.

Usually, when big-Theta notation is used, the function g(x) in Θ(g(x)) is a relatively simple reference function, such as x^n, c^x, log x, and so on, while f(x) can be relatively complicated.

EXAMPLE 12 We showed (in Example 5) that the sum of the first n positive integers is O(n^2). Determine whether this sum is of order n^2 without using the summation formula for this sum.

Solution: Let f(n) = 1 + 2 + 3 + ⋯ + n. Because we already know that f(n) is O(n^2), to show that f(n) is of order n^2 we need to find a positive constant C such that f(n) > Cn^2 for sufficiently large integers n. To obtain a lower bound for this sum, we can ignore the first half of the terms. Summing only the terms that are at least ⌈n∕2⌉, we find that

1 + 2 + ⋯ + n ≥ ⌈n∕2⌉ + (⌈n∕2⌉ + 1) + ⋯ + n
              ≥ ⌈n∕2⌉ + ⌈n∕2⌉ + ⋯ + ⌈n∕2⌉
              = (n − ⌈n∕2⌉ + 1)⌈n∕2⌉
              ≥ (n∕2)(n∕2)
              = n^2∕4.

This shows that f(n) is Ω(n^2). We conclude that f(n) is of order n^2, or in symbols, f(n) is Θ(n^2). ◂

Remark: Note that we can also show that f(n) = 1 + 2 + ⋯ + n is Θ(n^2) using the closed formula 1 + 2 + ⋯ + n = n(n + 1)∕2 from Table 2 in Section 2.4 and derived in Exercise 37(b) of that section.

EXAMPLE 13 Show that 3x^2 + 8x log x is Θ(x^2).

Solution: Because 0 ≤ 8x log x ≤ 8x^2, it follows that 3x^2 + 8x log x ≤ 11x^2 for x > 1. Consequently, 3x^2 + 8x log x is O(x^2). Clearly, x^2 is O(3x^2 + 8x log x). Consequently, 3x^2 + 8x log x is Θ(x^2). ◂

One useful fact is that the leading term of a polynomial determines its order. For example, if f(x) = 3x^5 + x^4 + 17x^3 + 2, then f(x) is of order x^5. This is stated in Theorem 4, whose proof is left as Exercise 50.

THEOREM 4 Let f(x) = a_n x^n + a_(n−1) x^(n−1) + ⋯ + a_1 x + a_0, where a_0, a_1, …, a_n are real numbers with a_n ≠ 0. Then f(x) is of order x^n.

EXAMPLE 14 The polynomials 3x^8 + 10x^7 + 221x^2 + 1444, x^19 − 18x^4 − 10,112, and −x^99 + 40,001x^98 + 100,003x are of orders x^8, x^19, and x^99, respectively. ◂

Unfortunately, as Knuth observed, big-O notation is often used by careless writers and speakers as if it had the same meaning as big-Theta notation. Keep this in mind when you see big-O notation used. The recent trend has been to use big-Theta notation whenever both upper and lower bounds on the size of a function are needed.

3.3 Complexity of Algorithms

Introduction

When does an algorithm provide a satisfactory solution to a problem? First, it must always produce the correct answer. How this can be demonstrated will be discussed in Chapter 5. Second, it should be efficient. The efficiency of algorithms will be discussed in this section.

How can the efficiency of an algorithm be analyzed? One measure of efficiency is the time used by a computer to solve a problem using the algorithm, when input values are of a specified size. A second measure is the amount of computer memory required to implement the algorithm when input values are of a specified size.

Questions such as these involve the computational complexity of the algorithm. An analysis of the time required to solve a problem of a particular size involves the time complexity of the algorithm. An analysis of the computer memory required involves the space complexity of the algorithm. Considerations of the time and space complexity of an algorithm are essential when algorithms are implemented. It is important to know whether an algorithm will produce an answer in a microsecond, a minute, or a billion years. Likewise, the required memory must be available to solve a problem, so that space complexity must be taken into account.

Considerations of space complexity are tied in with the particular data structures used to implement the algorithm. Because data structures are not dealt with in detail in this book, space complexity will not be considered. We will restrict our attention to time complexity.

Time Complexity

The time complexity of an algorithm can be expressed in terms of the number of operations used by the algorithm when the input has a particular size. The operations used to measure time complexity can be the comparison of integers, the addition of integers, the multiplication of integers, the division of integers, or any other basic operation.

Time complexity is described in terms of the number of operations required instead of actual computer time because of the difference in time needed for different computers to perform basic operations. Moreover, it is quite complicated to break all operations down to the basic bit operations that a computer uses. Furthermore, the fastest computers in existence can perform basic bit operations (for instance, adding, multiplying, comparing, or exchanging two bits) in 10^−11 second (10 picoseconds), but personal computers may require 10^−8 second (10 nanoseconds), which is 1000 times as long, to do the same operations.

We illustrate how to analyze the time complexity of an algorithm by considering Algorithm 1 of Section 3.1, which finds the maximum of a finite set of integers.

EXAMPLE 1 Describe the time complexity of Algorithm 1 of Section 3.1 for finding the maximum element in a finite set of integers.

Solution: The number of comparisons will be used as the measure of the time complexity of the algorithm, because comparisons are the basic operations used.

To find the maximum element of a set with n elements, listed in an arbitrary order, the temporary maximum is first set equal to the initial term in the list. Then, after a comparison i ≤ n has been done to determine that the end of the list has not yet been reached, the temporary maximum and second term are compared, updating the temporary maximum to the value of the second term if it is larger. This procedure is continued, using two additional comparisons for each term of the list: one, i ≤ n, to determine that the end of the list has not been reached and another, max < a_i, to determine whether to update the temporary maximum. Because two comparisons are used for each of the second through the nth elements and one more comparison is used to exit the loop when i = n + 1, exactly 2(n − 1) + 1 = 2n − 1 comparisons are used whenever this algorithm is applied. Hence, the algorithm for finding the maximum of a set of n elements has time complexity Θ(n), measured in terms of the number of comparisons used. Note that for this algorithm the number of comparisons is independent of the particular input of n numbers. ◂
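The count of 2n − 1 comparisons can be reproduced with a small instrumented sketch. Algorithm 1 of Section 3.1 is not shown in this section, so the loop below is an assumed rendering of it based on the description above; the function name find_max and the comparison counter are illustrative only.

```python
def find_max(values):
    """Return (maximum, comparisons), counting both loop tests and element tests."""
    n = len(values)
    comparisons = 0
    current_max = values[0]          # temporary maximum := a_1
    i = 2                            # 1-based index, as in the text
    while True:
        comparisons += 1             # the test i <= n
        if i > n:
            break
        comparisons += 1             # the test max < a_i
        if current_max < values[i - 1]:
            current_max = values[i - 1]
        i += 1
    return current_max, comparisons

print(find_max([3, 1, 4, 1, 5]))     # 5 elements, so 2*5 - 1 = 9 comparisons
```

The counter does not depend on the order of the input, which matches the remark that the number of comparisons is independent of the particular list.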

Next, we will analyze the time complexity of searching algorithms.

EXAMPLE 2 Describe the time complexity of the linear search algorithm (specified as Algorithm 2 in Section 3.1).

Solution: The number of comparisons used by Algorithm 2 in Section 3.1 will be taken as the measure of the time complexity. At each step of the loop in the algorithm, two comparisons are performed: one, i ≤ n, to see whether the end of the list has been reached and one, x ≠ a_i, to compare the element x with a term of the list. Finally, one more comparison i ≤ n is made outside the loop. Consequently, if x = a_i, 2i + 1 comparisons are used. The most comparisons, 2n + 2, are required when the element is not in the list. In this case, 2n comparisons are used to determine that x is not a_i, for i = 1, 2, …, n, an additional comparison is used to exit the loop, and one comparison is made outside the loop. So when x is not in the list, a total of 2n + 2 comparisons are used. Hence, a linear search requires Θ(n) comparisons in the worst case, because 2n + 2 is Θ(n). ◂
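The two counts in this solution, 2i + 1 when x is the ith term and 2n + 2 when x is absent, can be checked with an instrumented sketch. The loop structure is an assumption reconstructed from the description above (Algorithm 2 itself is not reproduced in this section), and the name linear_search is illustrative.

```python
def linear_search(x, values):
    """Return (location, comparisons); location is 1-based, 0 if x is absent."""
    n = len(values)
    comparisons = 0
    i = 1
    while True:
        comparisons += 1             # the test i <= n
        if i > n:
            break
        comparisons += 1             # the test x != a_i
        if x == values[i - 1]:
            break
        i += 1
    comparisons += 1                 # the final test i <= n outside the loop
    location = i if i <= n else 0
    return location, comparisons

print(linear_search(30, [10, 20, 30, 40, 50]))   # third term: 2*3 + 1 comparisons
print(linear_search(99, [10, 20, 30, 40, 50]))   # absent: 2*5 + 2 comparisons
```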

WORST-CASE COMPLEXITY The type of complexity analysis done in Example 2 is a worst-case analysis. By the worst-case performance of an algorithm, we mean the largest number of operations needed to solve the given problem using this algorithm on input of specified size. Worst-case analysis tells us how many operations an algorithm requires to guarantee that it will produce a solution.

EXAMPLE 3 Describe the time complexity of the binary search algorithm (specified as Algorithm 3 in Section 3.1) in terms of the number of comparisons used (and ignoring the time required to compute m = ⌊(i + j)∕2⌋ in each iteration of the loop in the algorithm).

Solution: For simplicity, assume there are n = 2^k elements in the list a_1, a_2, …, a_n, where k is a nonnegative integer. Note that k = log n. (If n, the number of elements in the list, is not a power of 2, the list can be considered part of a larger list with 2^(k+1) elements, where 2^k < n < 2^(k+1). Here 2^(k+1) is the smallest power of 2 larger than n.)

At each stage of the algorithm, i and j, the locations of the first term and the last term of the restricted list at that stage, are compared to see whether the restricted list has more than one term. If i < j, a comparison is done to determine whether x is greater than the middle term of the restricted list.

At the first stage the search is restricted to a list with 2^(k−1) terms. So far, two comparisons have been used. This procedure is continued, using two comparisons at each stage to restrict the search to a list with half as many terms. In other words, two comparisons are used at the first stage of the algorithm when the list has 2^k elements, two more when the search has been reduced to a list with 2^(k−1) elements, two more when the search has been reduced to a list with 2^(k−2) elements, and so on, until two comparisons are used when the search has been reduced to a list with 2^1 = 2 elements. Finally, when one term is left in the list, one comparison tells us that there are no additional terms left, and one more comparison is used to determine if this term is x.

Hence, at most 2k + 2 = 2 log n + 2 comparisons are required to perform a binary search when the list being searched has 2^k elements. (If n is not a power of 2, the original list is expanded to a list with 2^(k+1) terms, where k = ⌊log n⌋, and the search requires at most 2⌈log n⌉ + 2 comparisons.) It follows that in the worst case, binary search requires O(log n) comparisons. Note that in the worst case, 2 log n + 2 comparisons are used by the binary search. Hence, the binary search uses Θ(log n) comparisons in the worst case, because 2 log n + 2 = Θ(log n). From this analysis it follows that in the worst case, the binary search algorithm is more efficient than the linear search algorithm, because we know by Example 2 that the linear search algorithm has Θ(n) worst-case time complexity. ◂
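The bound of 2 log n + 2 comparisons can be observed with an instrumented sketch. Algorithm 3 of Section 3.1 is not shown here, so the loop below is an assumed rendering that follows the stages described in this solution (test i < j, then test whether x exceeds the middle term); the name binary_search is illustrative, and the list is assumed to be nonempty and sorted in increasing order.

```python
def binary_search(x, values):
    """Return (location, comparisons) for a sorted, nonempty list; location is 1-based."""
    comparisons = 0
    i, j = 1, len(values)
    while True:
        comparisons += 1             # the test i < j
        if not i < j:
            break
        m = (i + j) // 2             # floor of (i + j)/2, not counted
        comparisons += 1             # the test x > a_m
        if x > values[m - 1]:
            i = m + 1
        else:
            j = m
    comparisons += 1                 # the final test x = a_i
    location = i if values[i - 1] == x else 0
    return location, comparisons

# With n = 8 = 2^3 elements, every search uses 2*3 + 2 = 8 comparisons.
print(binary_search(5, [1, 2, 3, 4, 5, 6, 7, 8]))
```

In this rendering the count for n = 2^k does not depend on x, because the list is halved exactly k times whether or not x is present.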

AVERAGE-CASE COMPLEXITY Another important type of complexity analysis, besides worst-case analysis, is average-case analysis. The average number of operations used to solve the problem over all possible inputs of a given size is found in this type of analysis. Average-case time complexity analysis is usually much more complicated than worst-case analysis. However, the average-case analysis for the linear search algorithm can be done without difficulty, as shown in Example 4.

EXAMPLE 4 Describe the average-case performance of the linear search algorithm in terms of the average number of comparisons used, assuming that the integer x is in the list and it is equally likely that x is in any position.

Solution: By hypothesis, the integer x is one of the integers a_1, a_2, …, a_n in the list. If x is the first term a_1 of the list, three comparisons are needed: one, i ≤ n, to determine whether the end of the list has been reached, one, x ≠ a_i, to compare x and the first term, and one, i ≤ n, outside the loop. If x is the second term a_2 of the list, two more comparisons are needed, so that a total of five comparisons are used. In general, if x is the ith term of the list a_i, two comparisons will be used at each of the i steps of the loop, and one outside the loop, so that a total of 2i + 1 comparisons are needed. Hence, the average number of comparisons used equals

(3 + 5 + 7 + ⋯ + (2n + 1))∕n = (2(1 + 2 + 3 + ⋯ + n) + n)∕n.

Using the formula from line 2 of Table 2 in Section 2.4 (and see Exercise 37(b) of Section 2.4),

1 + 2 + 3 + ⋯ + n = n(n + 1)∕2.

Hence, the average number of comparisons used by the linear search algorithm (when x is known to be in the list) is

2[n(n + 1)∕2]∕n + 1 = n + 2,

which is Θ(n). ◂
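The closed form n + 2 can be confirmed by averaging the per-position counts directly. This is a minimal sketch under the same hypothesis as the example (x is in the list and every position is equally likely); the function name average_comparisons is illustrative.

```python
from fractions import Fraction

def average_comparisons(n):
    """Exact average of 2i + 1 over the positions i = 1, ..., n."""
    total = sum(2 * i + 1 for i in range(1, n + 1))
    return Fraction(total, n)

for n in (1, 5, 100):
    print(n, average_comparisons(n))   # equals n + 2 in each case
```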


Remark: In the analysis in Example 4 we assumed that x is in the list being searched. It is also possible to do an average-case analysis of this algorithm when x may not be in the list (see Exercise 23).

Remark: Although we have counted the comparisons needed to determine whether we have reached the end of a loop, these comparisons are often not counted. From this point on we will ignore such comparisons.

WORST-CASE COMPLEXITY OF TWO SORTING ALGORITHMS We analyze the worst-case complexity of the bubble sort and the insertion sort in Examples 5 and 6.

EXAMPLE 5 What is the worst-case complexity of the bubble sort in terms of the number of comparisons made?

Solution: The bubble sort described before Example 4 in Section 3.1 sorts a list by performing a sequence of passes through the list. During each pass the bubble sort successively compares adjacent elements, interchanging them if necessary. When the ith pass begins, the i − 1 largest elements are guaranteed to be in the correct positions. During this pass, n − i comparisons are used. Consequently, the total number of comparisons used by the bubble sort to order a list of n elements is

(n − 1) + (n − 2) + ⋯ + 2 + 1 = (n − 1)n∕2

using a summation formula from line 2 in Table 2 in Section 2.4 (and Exercise 37(b) in Section 2.4). Note that the bubble sort always uses this many comparisons, because it continues even if the list becomes completely sorted at some intermediate step. Consequently, the bubble sort uses (n − 1)n∕2 comparisons, so it has Θ(n^2) worst-case complexity in terms of the number of comparisons used. ◂
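The count (n − 1)n∕2 can be reproduced by instrumenting a bubble sort that makes n − i comparisons in pass i and never stops early, as described above. This is a sketch of that description rather than a copy of the Section 3.1 pseudocode, which is not shown here; the name bubble_sort is illustrative.

```python
def bubble_sort(values):
    """Return (sorted copy, comparisons) using n - 1 full passes with no early exit."""
    a = list(values)
    n = len(a)
    comparisons = 0
    for i in range(1, n):                # passes i = 1, ..., n - 1
        for j in range(0, n - i):        # pass i makes n - i comparisons
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons

print(bubble_sort([5, 2, 4, 1, 3]))      # 5 elements: (5 - 1) * 5 / 2 = 10 comparisons
```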

EXAMPLE 6 What is the worst-case complexity of the insertion sort in terms of the number of comparisons made?

Solution: The insertion sort (described in Section 3.1) inserts the jth element into the correct position among the first j − 1 elements that have already been put into the correct order. It does this by using a linear search technique, successively comparing the jth element with successive terms until a term that is greater than or equal to it is found or it compares aj with itself and stops because aj is not less than itself. Consequently, in the worst case, j comparisons are required to insert the jth element into the correct position. Therefore, the total number of comparisons used by the insertion sort to sort a list of n elements is

2 + 3 + ⋯ + n = n(n + 1)∕2 − 1,

using the summation formula for the sum of consecutive integers in line 2 of Table 2 of Section 2.4 (and see Exercise 37(b) of Section 2.4), and noting that the first term, 1, is missing in this sum. Note that the insertion sort may use considerably fewer comparisons if the smaller elements started out at the end of the list. We conclude that the insertion sort has worst-case complexity Θ(n^2). ◂
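Both the worst-case count and the remark about fewer comparisons can be seen in an instrumented sketch. The version below follows the description in this example (a linear scan from the front until a term greater than or equal to the jth element is found, possibly the element itself); it is an assumed rendering, and the name insertion_sort is illustrative.

```python
def insertion_sort(values):
    """Return (sorted copy, comparisons), scanning from the front of the sorted prefix."""
    a = list(values)
    comparisons = 0
    for j in range(1, len(a)):           # 0-based index of the element to insert
        i = 0
        while True:
            comparisons += 1             # the test a_j > a_i (a_j meets itself at i = j)
            if not a[j] > a[i]:
                break
            i += 1
        a.insert(i, a.pop(j))            # move the element into position i
    return a, comparisons

print(insertion_sort([1, 2, 3, 4, 5]))   # increasing input: 2 + 3 + 4 + 5 = 14 comparisons
print(insertion_sort([5, 4, 3, 2, 1]))   # decreasing input: one comparison per insertion
```

With this front-to-back scan, an already increasing list is the worst case, while a decreasing list needs only n − 1 comparisons.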


In Examples 5 and 6 we showed that both the bubble sort and the insertion sort have worst-case time complexity Θ(n^2). However, the most efficient sorting algorithms can sort n items in O(n log n) time, as we will show in Sections 8.3 and 11.1 using techniques we develop in those sections. From this point on, we will assume that sorting n items can be done in O(n log n) time.

You can run animations found on many different websites that simultaneously run different sorting algorithms on the same lists. Doing so will help you gain insights into the efficiency of different sorting algorithms. Among the sorting algorithms that you can find are the bubble sort, the insertion sort, the shell sort, the merge sort, and the quick sort. Some of these animations allow you to test the relative performance of these sorting algorithms on lists of randomly selected items, lists that are nearly sorted, and lists that are in reversed order.

Complexity of Matrix Multiplication

The definition of the product of two matrices can be expressed as an algorithm for computing the product of two matrices. Suppose that C = [c_ij] is the m × n matrix that is the product of the m × k matrix A = [a_ij] and the k × n matrix B = [b_ij]. The algorithm based on the definition of the matrix product is expressed in pseudocode in Algorithm 1.

We can determine the complexity of this algorithm in terms of the number of additions and multiplications used.

EXAMPLE 7 How many additions of integers and multiplications of integers are used by Algorithm 1 to multiply two n × n matrices with integer entries?

Solution: There are n^2 entries in the product of A and B. To find each entry requires a total of n multiplications and n − 1 additions. Hence, a total of n^3 multiplications and n^2(n − 1) additions are used. ◂
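These totals can be checked with an instrumented sketch of multiplication from the definition. One assumption is made to match the count of n − 1 additions per entry: each entry starts from its first product rather than from 0, so adding the first product to an initial zero is not counted. The name matrix_multiply is illustrative, and the inner dimension is assumed to be at least 1.

```python
def matrix_multiply(A, B):
    """Return (C, multiplications, additions) for an m x k matrix A and a k x n matrix B."""
    m, k, n = len(A), len(B), len(B[0])
    multiplications = additions = 0
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = A[i][0] * B[0][j]    # first product, no addition needed
            multiplications += 1
            for q in range(1, k):
                total += A[i][q] * B[q][j]
                multiplications += 1
                additions += 1
            C[i][j] = total
    return C, multiplications, additions

# For 2 x 2 matrices: 2^3 = 8 multiplications and 2^2 * (2 - 1) = 4 additions.
print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```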

Surprisingly, there are more efficient algorithms for matrix multiplication than that given in Algorithm 1. As Example 7 shows, multiplying two n × n matrices directly from the definition requires O(n^3) multiplications and additions. Using other algorithms, two n × n matrices can be multiplied using O(n^√7) multiplications and additions. (Details of such algorithms can be found in [CoLeRiSt09].)

We can also analyze the complexity of the algorithm we described in Chapter 2 for computing the Boolean product of two matrices, which we display as Algorithm 2.

ALGORITHM 1 Matrix Multiplication.

procedure matrix multiplication(A, B: matrices)
for i := 1 to m
    for j := 1 to n
        c_ij := 0
        for q := 1 to k
            c_ij := c_ij + a_iq b_qj
return C {C = [c_ij] is the product of A and B}

The number of bit operations used to find the Boolean product of two n × n matrices can be easily determined.

EXAMPLE 8 How many bit operations are used to find A ⊙ B, where A and B are n × n zero–one matrices?

Solution: There are n^2 entries in A ⊙ B. Using Algorithm 2, a total of n ORs and n ANDs are used to find an entry of A ⊙ B. Hence, 2n bit operations are used to find each entry. Therefore, 2n^3 bit operations are required to compute A ⊙ B using Algorithm 2. ◂
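The count of 2n^3 bit operations can be reproduced by instrumenting the Boolean product, charging one OR and one AND for each value of q. This is a sketch that assumes matrices are given as lists of rows of 0s and 1s; the name boolean_product is illustrative.

```python
def boolean_product(A, B):
    """Return (C, bit_operations) for zero-one matrices A (m x k) and B (k x n)."""
    m, k, n = len(A), len(B), len(B[0])
    bit_operations = 0
    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for q in range(k):
                C[i][j] = C[i][j] | (A[i][q] & B[q][j])   # one OR and one AND
                bit_operations += 2
    return C, bit_operations

# For 2 x 2 matrices: 2 * 2^3 = 16 bit operations.
print(boolean_product([[1, 0], [0, 1]], [[0, 1], [1, 0]]))
```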

Understanding the Complexity of AlgorithmsTable 1 displays some common terminology used to describe the time complexity of algorithms. For example, an algorithm that finds the largest of the first 100 terms of a list of n elements by applying Algorithm 1 to the sequence of the first 100 terms, where n is an integer with n 100, has constant complexity because it uses 99 comparisons no matter what n is (as the reader can verify). The linear search algorithm has linear (worst-case or average-case) complexity and the binary search algorithm has logarithmic (worst-case) complexity. Many important algorithms have n log n, or linearithmic (worst-case) complexity, such as the merge sort, which we will introduce in Chapter 4. (The word linearithmic is a combination of the words linear and logarithmic.)

TABLE 1 Commonly Used Terminology for the Complexity of Algorithms.

Complexity              Terminology
Θ(1)                    Constant complexity
Θ(log n)                Logarithmic complexity
Θ(n)                    Linear complexity
Θ(n log n)              Linearithmic complexity
Θ(n^b)                  Polynomial complexity
Θ(b^n), where b > 1     Exponential complexity
Θ(n!)                   Factorial complexity

ALGORITHM 2 The Boolean Product of Zero–One Matrices.

procedure Boolean product of Zero–One Matrices(A, B: zero–one matrices)
for i := 1 to m
    for j := 1 to n
        cij := 0
        for q := 1 to k
            cij := cij ∨ (aiq ∧ bqj)
return C {C = [cij] is the Boolean product of A and B}
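The triple loop of Algorithm 2 can be sketched in Python as follows. This is only an illustration under our own assumptions: the function name `boolean_product`, the list-of-lists representation and the sample matrices are not from the text. The counter tallies one AND and one OR per pass through the innermost loop, so for two n × n matrices it reaches 2n^3.

```python
def boolean_product(a, b):
    """Boolean product of an m x k and a k x n zero-one matrix.

    Returns (c, bit_ops), where bit_ops counts every AND and OR performed.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    bit_ops = 0
    for i in range(m):
        for j in range(n):
            for q in range(k):
                c[i][j] = c[i][j] | (a[i][q] & b[q][j])
                bit_ops += 2  # one AND and one OR
    return c, bit_ops


# Hypothetical 3 x 3 inputs, chosen only for illustration.
A = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
B = [[0, 1, 0], [1, 0, 0], [1, 1, 0]]
C, ops = boolean_product(A, B)
print(C)    # [[1, 1, 0], [1, 0, 0], [1, 1, 0]]
print(ops)  # 54, which is 2 * 3**3
```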

Discrete Mathematics Department of CSE

UNIT-IV

DISCRETE PROBABILITY THEORY AND ADVANCED COUNTING TECHNIQUES

Basis of counting:

If X is a set, let us use |X| to denote the number of elements in X.

Two Basic Counting Principles

Two elementary principles act as “building blocks” for all counting problems. The first principle says that the whole is the sum of its parts; it is at once immediate and elementary.

Sum Rule : The principle of disjunctive counting :

If a set X is the union of disjoint nonempty subsets S1, ….., Sn, then | X | = | S1 | + | S2 | + ….. + | Sn |.

We emphasize that the subsets S1, S2, …., Sn must have no elements in common. Moreover, since X = S1 U S2 U ……U Sn, each element of X is in exactly one of the subsets Si. In other words, S1, S2, …., Sn is a partition of X.

If the subsets S1, S2, …, Sn were allowed to overlap, then a more profound principle would be needed: the principle of inclusion and exclusion.

Frequently, instead of asking for the number of elements in a set per se, some problems ask in how many ways a certain event can happen.

The difference is largely in semantics, for if A is an event, we can let X be the set of ways that A can happen and count the number of elements in X. Nevertheless, let us state the sum rule for counting events.

If E1, …, En are mutually exclusive events, and E1 can happen e1 ways, E2 can happen e2 ways, …, En can happen en ways, then E1 or E2 or … or En can happen e1 + e2 + … + en ways.

Again we emphasize that mutually exclusive events E1 and E2 mean that E1 or E2 can happen but both cannot happen simultaneously.

The sum rule can also be formulated in terms of choices: If an object can be selected from a reservoir in e1 ways and an object can be selected from a separate reservoir in e2 ways, then the selection of one object from either one reservoir or the other can be made in e1 + e2 ways.

St.peters Engineering college


Product Rule: The principle of sequencing counting

If S1, …, Sn are nonempty sets, then the number of elements in the Cartesian product S1 x S2 x … x Sn is the product of the sizes |Si|. That is,

$|S_1 \times S_2 \times \cdots \times S_n| = \prod_{i=1}^{n} |S_i|.$

To illustrate, suppose S1 = {a1, a2, a3, a4, a5} and S2 = {b1, b2, b3}, and picture the possible choices as a tree. Observe that there are 5 branches in the first stage corresponding to the 5 elements of S1 and to each of these branches there are 3 branches in the second stage corresponding to the 3 elements of S2, giving a total of 15 branches altogether. Moreover, the Cartesian product S1 x S2 can be partitioned as (a1 x S2) U (a2 x S2) U (a3 x S2) U (a4 x S2) U (a5 x S2), where (ai x S2) = {(ai, b1), (ai, b2), (ai, b3)}.

Thus, for example, (a3 x S2) corresponds to the third branch in the first stage followed by each of the 3 branches in the second stage.

More generally, if a1, …, an are the n distinct elements of S1 and b1, …, bm are the m distinct elements of S2, then $S_1 \times S_2 = \bigcup_{i=1}^{n} (a_i \times S_2)$.

For if x is an arbitrary element of S1 x S2, then x = (a, b) where a ∈ S1 and b ∈ S2. Thus, a = ai for some i and b = bj for some j. Thus, x = (ai, bj) ∈ (ai x S2) and therefore $x \in \bigcup_{i=1}^{n} (a_i \times S_2)$.

Conversely, if $x \in \bigcup_{i=1}^{n} (a_i \times S_2)$, then x ∈ (ai x S2) for some i and thus x = (ai, bj) where bj is some element of S2. Therefore, x ∈ S1 x S2.

Next observe that (ai x S2) and (aj x S2) are disjoint if i ≠ j, since if x ∈ (ai x S2) ∩ (aj x S2) then x = (ai, bk) for some k and x = (aj, bl) for some l. But then (ai, bk) = (aj, bl) implies that ai = aj and bk = bl. But since i ≠ j, ai ≠ aj, a contradiction.

Thus, we conclude that S1 x S2 is the disjoint union of the sets (ai x S2). Furthermore |ai x S2| = |S2| since there is obviously a one-to-one correspondence between the sets ai x S2 and S2, namely, (ai, bj) → bj.

Then by the sum rule

$|S_1 \times S_2| = \sum_{i=1}^{n} |a_i \times S_2| = \underbrace{|S_2| + |S_2| + \cdots + |S_2|}_{n \text{ summands}} = n\,|S_2| = nm.$




Therefore, we have proven the product rule for two sets. The general rule follows by mathematical

induction.

We can reformulate the product rule in terms of events. If events E1, E2 , …., En can happen e1, e2,

…., and en ways, respectively, then the sequence of events E1 first, followed by E2,…., followed by En can

happen e1e2 …en ways.

In terms of choices, the product rule is stated thus: If a first object can be chosen e1 ways, a second e2 ways, …, and an nth object en ways, then a choice of a first, second, …, and nth object can be made in e1e2…en ways.
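As a small check of the two rules, the sketch below builds a Cartesian product with 5 and 3 elements and counts it two ways: directly (product rule) and as a disjoint union of blocks (sum rule). The element labels are placeholders of our own choosing.

```python
from itertools import product

S1 = ["a1", "a2", "a3", "a4", "a5"]
S2 = ["b1", "b2", "b3"]

# Product rule: |S1 x S2| = |S1| * |S2|.
pairs = list(product(S1, S2))
print(len(pairs))  # 15

# Sum rule: S1 x S2 is the disjoint union of the blocks (ai x S2),
# so its size is the sum of the block sizes.
blocks = [[(a, b) for b in S2] for a in S1]
block_total = sum(len(block) for block in blocks)
print(block_total)  # 15
```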

Combinations & Permutations:

Definition.

A combination of n objects taken r at a time (called an r-combination of n objects) is an unordered

selection of r of the objects.

A permutation of n objects taken r at a time (also called an r-permutation of n objects) is an ordered

selection or arrangement of r of the objects.

Note that we are simply defining the terms r-combinations and r-permutations here and have not

mentioned anything about the properties of the n objects.

For example, these definitions say nothing about whether or not a given element may appear more than once

in the list of n objects.

In other words, it may be that the n objects do not constitute a set in the normal usage of the word.

SOLVED PROBLEMS

Example 1. Suppose that the 5 objects from which selections are to be made are: a, a, a, b, c. Then the 3-combinations of these 5 objects are: aaa, aab, aac, abc. The 3-permutations are:

aaa, aab, aba, baa, aac, aca, caa,

abc, acb, bac, bca, cab, cba.

Neither do these definitions say anything about any rules governing the selection of the r-objects: on

one extreme, objects could be chosen where all repetition is forbidden, or on the other extreme, each object

may be chosen up to t times, or then again may be some rule of selection between these extremes; for

instance, the rule that would allow a given object to be repeated up to a certain specified number of times.




We will use expressions like {3 . a , 2. b ,5.c} to indicate either

(1) that we have 3 + 2 + 5 =10 objects including 3a’s , 2b’s and 5c’s, or (2) that we have 3 objects a, b, c,

where selections are constrained by the conditions that a can be selected at most three times, b can be

selected at most twice, and c can be chosen up to five times.

The numbers 3, 2 and 5 in this example will be called repetition numbers.

Example 2 The 3-combinations of {3. a, 2. b, 5. c} are:

aaa, aab, aac, abb, abc, ccc, ccb, cca, cbb.

Example 3. The 3-combinations of {3 . a, 2. b, 2. c , 1. d} are:

aaa, aab, aac, aad, bba, bbc, bbd, cca, ccb, ccd, abc, abd, acd, bcd.
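Lists such as those in Examples 2 and 3 can be checked by brute force. The sketch below, which assumes a multiset is given as a dictionary from object to repetition number (our own encoding, with a hypothetical helper name), generates every unordered selection of size r and keeps those that respect the repetition numbers.

```python
from collections import Counter
from itertools import combinations_with_replacement


def bounded_combinations(limits, r):
    """r-combinations of a multiset given as {object: repetition number}."""
    result = []
    for combo in combinations_with_replacement(sorted(limits), r):
        counts = Counter(combo)
        if all(counts[obj] <= limits[obj] for obj in counts):
            result.append("".join(combo))
    return result


ex2 = bounded_combinations({"a": 3, "b": 2, "c": 5}, 3)
ex3 = bounded_combinations({"a": 3, "b": 2, "c": 2, "d": 1}, 3)
print(len(ex2), ex2)  # 9 selections, matching Example 2 up to letter order
print(len(ex3), ex3)  # 14 selections, matching Example 3 up to letter order
```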

In order to include the case where there is no limit on the number of times an object can be repeated in a selection (except that imposed by the size of the selection) we use the symbol ∞ as a repetition number to mean that an object can occur an infinite number of times.

Example 4. The 3-combinations of {∞. a, 2.b, ∞.c} are the same as in Example 2 even though a and c can be repeated an infinite number of times. This is because, in 3-combinations, 3 is the limit on the number of objects to be chosen.

If we are considering selections where each object has ∞ as its repetition number then we designate

such selections as selections with unlimited repetitions. In particular, a selection of r objects in this case will

be called r-combinations with unlimited repetitions and any ordered arrangement of these r objects will be an

r-permutation with unlimited repetitions.

Example 5. The 3-combinations of a, b, c, d with unlimited repetitions are the 3-combinations of {∞.a, ∞.b, ∞.c, ∞.d}. There are 20 such 3-combinations, namely:

aaa, aab, aac, aad, bbb, bba, bbc, bbd, ccc, cca, ccb, ccd, ddd, dda, ddb, ddc, abc, abd, acd, bcd.

Moreover, there are 4^3 = 64 3-permutations with unlimited repetitions since the first position can be filled 4 ways (with a, b, c, or d), the second position can be filled 4 ways, and likewise for the third position. The 2-combinations and 2-permutations of {∞.a, ∞.b, ∞.c, ∞.d} do not present such a formidable list and so we tabulate them in the following table.




2-combinations with         2-permutations with
Unlimited Repetitions       Unlimited Repetitions
aa                          aa
ab                          ab, ba
ac                          ac, ca
ad                          ad, da
bb                          bb
bc                          bc, cb
bd                          bd, db
cc                          cc
cd                          cd, dc
dd                          dd
10                          16
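The counts in Example 5 and in the table above can be reproduced with the standard library. In this sketch `combinations_with_replacement` enumerates unordered selections with unlimited repetitions and `product` enumerates ordered ones; the variable names are ours.

```python
from itertools import combinations_with_replacement, product

objects = "abcd"

comb3 = ["".join(c) for c in combinations_with_replacement(objects, 3)]
perm3 = ["".join(p) for p in product(objects, repeat=3)]
comb2 = ["".join(c) for c in combinations_with_replacement(objects, 2)]
perm2 = ["".join(p) for p in product(objects, repeat=2)]

print(len(comb3), len(perm3))  # 20 64
print(len(comb2), len(perm2))  # 10 16
print(comb2)  # ['aa', 'ab', 'ac', 'ad', 'bb', 'bc', 'bd', 'cc', 'cd', 'dd']
```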

Of course, these are not the only constraints that can be placed on selections; the possibilities are endless. We list some more examples just for concreteness. We might, for example, consider selections of {∞.a, ∞.b, ∞.c} where b can be chosen only an even number of times. Thus, 5-combinations with these repetition numbers and this constraint would be those 5-combinations with unlimited repetitions and where b is chosen 0, 2, or 4 times.

Example 6. The 3-combinations of {∞.a, ∞.b, 1.c, 1.d} where b can be chosen only an even number of times are the 3-combinations of a, b, c, d where a can be chosen up to 3 times, b can be chosen 0 or 2 times, and c and d can be chosen at most once. The 3-combinations subject to these constraints are:

aaa, aac, aad, abb, bbc, bbd, acd.

As another example, we might be interested in selections of {∞.a, 3.b, 1.c} where a can be chosen only a prime number of times. Thus, the 8-combinations subject to these constraints would be all those 8-combinations where a is chosen 2, 3, 5, or 7 times, b can be chosen up to 3 times, and c can be chosen at most once.




There are, as we have said, an infinite variety of constraints one could place on selections. You can just let your imagination go free in conjuring up different constraints on the selection. Any selection of r objects satisfying the constraints would constitute an r-combination according to our definition. Moreover, any arrangement of these r objects would constitute an r-permutation.

While there may be an infinite variety of constraints, we are primarily interested in two major types:

one we have already described—combinations and permutations with unlimited repetitions, the other we

now describe.

If the repetition numbers are all 1, then selections of r objects are called r-combinations without

repetitions and arrangements of the r objects are r-permutations without repetitions. We remind you that r-

combinations without repetitions are just subsets of the n elements containing exactly r elements. Moreover,

we shall often drop the repetition number 1 when considering r-combinations without repetitions. For

example, when considering r-combinations of {a, b, c, d} we will mean that each repetition number is 1

unless otherwise designated, and, of course, we mean that in a given selection an element need not be chosen

at all, but, if it is chosen, then in this selection this element cannot be chosen again.

Example 7. Suppose selections are to be made from the four objects a, b, c, d.

2-combinations              2-permutations
without Repetitions         without Repetitions
ab                          ab, ba
ac                          ac, ca
ad                          ad, da
bc                          bc, cb
bd                          bd, db
cd                          cd, dc
6                           12

There are six 2-combinations without repetitions and to each there are two 2-permutations giving a total of

twelve 2-permutations without repetitions.

Note that the total number of 2-combinations with unlimited repetitions in Example 5 included the six 2-combinations without repetitions of Example 7 as well as 4 other 2-combinations where repetitions actually occur. Likewise, the sixteen 2-permutations with unlimited repetitions included the twelve 2-permutations without repetitions.

3-combinations              3-permutations
without Repetitions         without Repetitions
abc                         abc, acb, bac, bca, cab, cba
abd                         abd, adb, bad, bda, dab, dba
acd                         acd, adc, cad, cda, dac, dca
bcd                         bcd, bdc, cbd, cdb, dbc, dcb
4                           24

Note that to each of the 3-combinations without repetitions there are 6 possible 3-

permutations without repetitions. Momentarily, we will show that this observation can be generalized.

Combinations And Permutations With Repetitions:

General formulas for enumerating combinations and permutations will now be presented. At this

time, we will only list formulas for combinations and permutations without repetitions or with unlimited

repetitions. We will wait until later to use generating functions to give general techniques for enumerating

combinations where other rules govern the selections.

Let P (n, r) denote the number of r-permutations of n elements without repetitions.

Theorem 5.3.1.( Enumerating r-permutations without repetitions).

P(n, r) = n(n-1)……. (n – r + 1) = n! / (n-r)!

Proof. Since there are n distinct objects, the first position of an r-permutation may be filled in n ways. This done, the second position can be filled in n – 1 ways since no repetitions are allowed and there are n – 1 objects left to choose from. The third can be filled in n – 2 ways, and so on, until the rth position, which can be filled in n – r + 1 ways. By applying the product rule, we conclude that

P (n, r) = n(n-1)(n-2)……. (n – r + 1).

From the definition of factorials, it follows that

P (n, r) = n! / (n-r)!

When r = n, this formula becomes

P (n, n) = n! / 0! = n!

When explicit reference to r is not made, we assume that all the objects are to be arranged; thus, when we talk about the permutations of n objects, we mean the case r = n.

Corollary 1. There are n! permutations of n distinct objects.

Example 1. There are 3! = 6 permutations of {a, b, c}. There are 4! = 24 permutations of {a, b, c, d}. The number of 2-permutations of {a, b, c, d, e} is P(5, 2) = 5! / (5 - 2)! = 5 x 4 = 20. The number of 5-letter words using each of the letters a, b, c, d, and e at most once is P (5, 5) = 120.

Example 2 There are P (10, 4) = 5,040 4-digit numbers that contain no repeated digits since each such

number is just an arrangement of four of the digits 0, 1, 2, 3 , …., 9 (leading zeroes are allowed). There are P

(26, 3) P(10, 4) license plates formed by 3 distinct letters followed by 4 distinct digits.
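The values quoted in Examples 1 and 2 can be reproduced with a direct implementation of the product n(n-1)…(n-r+1). The function name `P` mirrors the notation of the theorem; the brute-force cross-check with `itertools.permutations` is our own addition.

```python
from itertools import permutations


def P(n, r):
    """Number of r-permutations of n distinct objects: n(n-1)...(n-r+1)."""
    result = 1
    for factor in range(n, n - r, -1):
        result *= factor
    return result


print(P(3, 3), P(4, 4), P(5, 2), P(5, 5))  # 6 24 20 120
print(P(10, 4))                            # 5040
print(P(26, 3) * P(10, 4))                 # 78624000 license plates

# Cross-check one value by listing the arrangements explicitly.
print(len(list(permutations("abcde", 2))))  # 20
```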

Example3. In how many ways can 7 women and 3 men be arranged in a row if the 3 men must always

stand next to each other?

There are 3! ways of arranging the 3 men. Since the 3 men always stand next to each other, we treat them as a single entity, which we denote by X. Then if W1, W2, …, W7 represent the women, we next are interested in the number of ways of arranging {X, W1, W2, W3, …, W7}. There are 8! permutations of these 8 objects. Hence there are (3!) (8!) permutations altogether. (Of course, if there has to be a prescribed order of arrangement of the 3 men, then there are only 8! total permutations.)

Example4. In how many ways can the letters of the English alphabet be arranged so that there are exactly

5 letters between the letters a and b?


There are P (24, 5) ways to arrange the 5 letters between a and b, 2 ways to place a and b, and then 20! ways to arrange the resulting 7-letter word, treated as one unit, along with the remaining 19 letters. The total is P (24, 5) (20!) (2).

The permutations in the above examples are linear permutations, for the objects are being arranged in a line. If instead of arranging objects in a line, we arrange them in a circle, then the number of permutations decreases.

Example 5. In how many ways can 5 children arrange themselves in a ring?

Solution. Here, the 5 children are not assigned to particular places but are only arranged relative to one

another. Thus, the arrangements (see Figure 2-3) are considered the same if the children are in the same

order clockwise. Hence, the position of child C1 is immaterial and it is only the position of the 4 other

children relative to C1 that counts. Therefore, keeping C1 fixed in position, there are 4! arrangements of the

remaining children.
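The count 4! = 24 can be confirmed by brute force. The sketch below lists all 5! linear orders and identifies two orders when one is a rotation of the other, by rotating each so that C1 comes first; the helper name and the child labels are assumptions made for illustration.

```python
from itertools import permutations
from math import factorial


def circular_arrangements(items):
    """Arrangements in a ring, treating rotations of a ring as the same."""
    seen = set()
    for p in permutations(items):
        # Rotate so that items[0] comes first; this fixes one child in place.
        i = p.index(items[0])
        seen.add(p[i:] + p[:i])
    return seen


children = ("C1", "C2", "C3", "C4", "C5")
rings = circular_arrangements(children)
print(len(rings))    # 24
print(factorial(4))  # 24
```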

Binomial Coefficients:

In mathematics, the binomial coefficient $\binom{n}{k}$ is the coefficient of the x^k term in the polynomial expansion of the binomial power (1 + x)^n.

In combinatorics, $\binom{n}{k}$ is interpreted as the number of k-element subsets (the k-combinations) of an n-element set, that is, the number of ways that k things can be "chosen" from a set of n things. Hence, $\binom{n}{k}$ is often read as "n choose k" and is called the choose function of n and k. The notation $\binom{n}{k}$ was introduced by Andreas von Ettingshausen in 1826, although the numbers were already known centuries before that (see Pascal's triangle). Alternative notations include C(n, k), nCk, ${}_nC_k$, and ${}^nC_k$, in all of which the C stands for combinations or choices.

For natural numbers (taken to include 0) n and k, the binomial coefficient $\binom{n}{k}$ can be defined as the coefficient of the monomial X^k in the expansion of (1 + X)^n. The same coefficient also occurs (if k ≤ n) in the binomial formula

$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}$

(valid for any elements x, y of a commutative ring), which explains the name "binomial coefficient".

Another occurrence of this number is in combinatorics, where it gives the number of ways, disregarding order, that k objects can be chosen from among n objects; more formally, the number of k-element subsets (or k-combinations) of an n-element set. This number can be seen to be equal to the one of the first definition, independently of any of the formulas below to compute it: if in each of the n factors of the power (1 + X)^n one temporarily labels the term X with an index i (running from 1 to n), then each subset of k indices gives after expansion a contribution X^k, and the coefficient of that monomial in the result will be the number of such subsets. This shows in particular that $\binom{n}{k}$ is a natural number for any natural numbers n and k. There are many other combinatorial interpretations of binomial coefficients (counting problems for which the answer is given by a binomial coefficient expression), for instance the number of words formed of n bits (digits 0 or 1) whose sum is k, but most of these are easily seen to be equivalent to counting k-combinations.

Several methods exist to compute the value of $\binom{n}{k}$ without actually expanding a binomial power or counting k-combinations.
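As a sketch of how the two definitions agree, the code below computes a binomial coefficient with the multiplicative formula, compares it with a direct count of k-element subsets, and also reads the coefficients off an explicit expansion of (1 + X)^n. The function names are ours; `math.comb` is the standard-library routine used only as a reference value.

```python
from itertools import combinations
from math import comb


def binomial(n, k):
    """n choose k via the multiplicative formula, in exact integers."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)
    result = 1
    for i in range(1, k + 1):
        # After this step, result equals C(n - k + i, i), so // is exact.
        result = result * (n - k + i) // i
    return result


def expand_one_plus_x(n):
    """Coefficient list of (1 + X)^n, built by repeated multiplication."""
    coeffs = [1]
    for _ in range(n):
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


print(binomial(5, 2), comb(5, 2))             # 10 10
print(len(list(combinations(range(5), 2))))   # 10 two-element subsets
print(expand_one_plus_x(4))                   # [1, 4, 6, 4, 1]
```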

Binomial Multinomial theorems:

Binomial theorem:

In elementary algebra, the binomial theorem describes the algebraic expansion of powers of a binomial. According to the theorem, it is possible to expand the power (x + y)^n into a sum involving terms of the form a x^b y^c, where the coefficient a of each term is a positive integer, and the sum of the exponents of x and y in each term is n. For example,

$(x + y)^4 = x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + y^4.$

The coefficients appearing in the binomial expansion are known as binomial coefficients. They are the same as the entries of Pascal's triangle, and can be determined by a simple formula involving factorials. These numbers also arise in combinatorics, where the coefficient of x^(n−k) y^k is equal to the number of different combinations of k elements that can be chosen from an n-element set.

According to the theorem, it is possible to expand any power of x + y into a sum of the form

$(x + y)^n = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n} x^0 y^n,$

where $\binom{n}{k}$ denotes the corresponding binomial coefficient. Using summation notation, the formula above can be written

$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}.$

This formula is sometimes referred to as the Binomial Formula or the Binomial Identity.

A variant of the binomial formula is obtained by substituting 1 for x and x for y, so that it involves only a single variable. In this form, the formula reads

$(1 + x)^n = \binom{n}{0} x^0 + \binom{n}{1} x^1 + \binom{n}{2} x^2 + \cdots + \binom{n}{n} x^n,$

or equivalently

$(1 + x)^n = \sum_{k=0}^{n} \binom{n}{k} x^{k}.$

EXAMPLE

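Simplify $(x + \sqrt{x^2-1})^6 + (x - \sqrt{x^2-1})^6$.

Solution: let $\sqrt{x^2-1} = a$, so we have:

$(x+a)^6 + (x-a)^6$

$= \left[x^6 + \binom{6}{1}x^5a + \binom{6}{2}x^4a^2 + \binom{6}{3}x^3a^3 + \binom{6}{4}x^2a^4 + \binom{6}{5}xa^5 + \binom{6}{6}a^6\right]$

$\quad + \left[x^6 - \binom{6}{1}x^5a + \binom{6}{2}x^4a^2 - \binom{6}{3}x^3a^3 + \binom{6}{4}x^2a^4 - \binom{6}{5}xa^5 + \binom{6}{6}a^6\right]$

$= 2\left[x^6 + \binom{6}{2}x^4a^2 + \binom{6}{4}x^2a^4 + \binom{6}{6}a^6\right]$

$= 2\left[x^6 + 15x^4(x^2-1) + 15x^2(x^2-1)^2 + (x^2-1)^3\right]$

$= 2\left[x^6 + 15x^6 - 15x^4 + 15x^6 + 15x^2 - 30x^4 + x^6 - 1 - 3x^4 + 3x^2\right]$

$= 2\left[32x^6 - 48x^4 + 18x^2 - 1\right]$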

Multinomial theorem:

In mathematics, the multinomial theorem says how to write a power of a sum in terms of powers of the terms in that sum. It is the generalization of the binomial theorem to polynomials.

For any positive integer m and any nonnegative integer n, the multinomial formula tells us how a polynomial expands when raised to an arbitrary power:

$(x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1 + k_2 + \cdots + k_m = n} \binom{n}{k_1, k_2, \ldots, k_m} x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}, \qquad \binom{n}{k_1, k_2, \ldots, k_m} = \frac{n!}{k_1!\,k_2! \cdots k_m!}.$

The summation is taken over all sequences of nonnegative integer indices k1 through km such that the sum of all ki is n. That is, for each term in the expansion, the exponents must add up to n. Also, as with the binomial theorem, quantities of the form x^0 that appear are taken to equal 1 (even when x equals zero). Alternatively, this can be written concisely using multiindices as

$(x_1 + \cdots + x_m)^n = \sum_{|\alpha| = n} \binom{n}{\alpha} x^{\alpha},$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_m)$ and $x^{\alpha} = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_m^{\alpha_m}$.

Example

$(a + b + c)^3 = a^3 + b^3 + c^3 + 3a^2b + 3a^2c + 3b^2a + 3b^2c + 3c^2a + 3c^2b + 6abc.$

We could have calculated each coefficient by first expanding $(a + b + c)^2 = a^2 + b^2 + c^2 + 2ab + 2bc + 2ac$, then multiplying the result by (a + b + c) again to get (a + b + c)^3 (and if we were raising it to higher powers, we would keep multiplying by a + b + c). However this process is slow, and can be avoided by using the multinomial theorem. The multinomial theorem "solves" this process by giving us the closed form for any coefficient we might want. It is possible to "read off" the multinomial coefficients from the terms by using the multinomial coefficient formula. For example:

$a^2b^0c^1$ has the coefficient $\binom{3}{2, 0, 1} = \frac{3!}{2!\,0!\,1!} = 3$

$a^1b^1c^1$ has the coefficient $\binom{3}{1, 1, 1} = \frac{3!}{1!\,1!\,1!} = 6$.

We could have also had a 'd' variable, or even more variables—hence the multinomial theorem.
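The coefficient formula n!/(k1! k2! … km!) is easy to evaluate directly. The sketch below uses a helper name of our own (`multinomial`) and also checks that the coefficients of (a + b + c)^3 add up to 3^3 = 27, which is what setting a = b = c = 1 in the expansion gives.

```python
from itertools import product
from math import factorial


def multinomial(*ks):
    """Multinomial coefficient n! / (k1! k2! ... km!) with n = k1 + ... + km."""
    result = factorial(sum(ks))
    for k in ks:
        result //= factorial(k)
    return result


print(multinomial(2, 0, 1))  # 3, the coefficient of a^2 c
print(multinomial(1, 1, 1))  # 6, the coefficient of abc

# Sum of all coefficients of (a + b + c)^3.
total = sum(multinomial(*ks)
            for ks in product(range(4), repeat=3) if sum(ks) == 3)
print(total)  # 27
```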

The principles of Inclusion – Exclusion:

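Let |A| denote the cardinality of set A, then it follows immediately that

$|A \cup B| = |A| + |B| - |A \cap B|$ (1)

where ∪ denotes union, and ∩ denotes intersection. The more general statement

$\left|\bigcup_{i=1}^{N} E_i\right| \le \sum_{i=1}^{N} |E_i|$ (2)

also holds, and is known as Boole's inequality.

This formula can be generalized in the following beautiful manner. Let $\mathcal{A} = \{A_i\}_{i=1}^{p}$ be a p-system of S consisting of sets A1, ..., Ap, then

$|A_1 \cup A_2 \cup \cdots \cup A_p| = \sum_{1 \le i \le p} |A_i| - \sum_{1 \le i_1 < i_2 \le p} |A_{i_1} \cap A_{i_2}| + \sum_{1 \le i_1 < i_2 < i_3 \le p} |A_{i_1} \cap A_{i_2} \cap A_{i_3}| - \cdots + (-1)^{p-1} |A_1 \cap A_2 \cap \cdots \cap A_p|$ (3)

where the sums are taken over k-subsets of $\mathcal{A}$. This formula holds for infinite sets as well as finite sets.

The principle of inclusion-exclusion was used by Nicholas Bernoulli to solve the rencontres problem of finding the number of derangements.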

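For example, for the three subsets A1 = {2, 3, 7, 9, 10}, A2 = {1, 2, 3, 9}, and A3 = {2, 4, 9, 10} of S = {1, 2, ..., 10}, the following table summarizes the terms appearing in the sum.

#     term               set                 length
1     A1                 2, 3, 7, 9, 10      5
      A2                 1, 2, 3, 9          4
      A3                 2, 4, 9, 10         4
2     A1 ∩ A2            2, 3, 9             3
      A1 ∩ A3            2, 9, 10            3
      A2 ∩ A3            2, 9                2
3     A1 ∩ A2 ∩ A3       2, 9                2

|A1 ∪ A2 ∪ A3| is therefore equal to (5 + 4 + 4) − (3 + 3 + 2) + 2 = 7, corresponding to the seven elements A1 ∪ A2 ∪ A3 = {1, 2, 3, 4, 7, 9, 10}.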
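The alternating sum can be evaluated mechanically. The following sketch (the function name `union_size` is our own) adds the sizes of all k-fold intersections with sign (−1)^(k−1) and compares the result with the size of the union computed directly, using the three sets {2, 3, 7, 9, 10}, {1, 2, 3, 9} and {2, 4, 9, 10} from the table.

```python
from itertools import combinations


def union_size(sets):
    """Size of the union via the inclusion-exclusion sum."""
    total = 0
    for k in range(1, len(sets) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(sets, k):
            total += sign * len(set.intersection(*group))
    return total


A1 = {2, 3, 7, 9, 10}
A2 = {1, 2, 3, 9}
A3 = {2, 4, 9, 10}

print(union_size([A1, A2, A3]))  # 7
print(sorted(A1 | A2 | A3))      # [1, 2, 3, 4, 7, 9, 10]
```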

Pigeon hole principles and its application:

The statement of the Pigeonhole Principle:

If m pigeons are put into m pigeonholes, there is an empty hole iff there's a hole with more than one pigeon.

If n > m pigeons are put into m pigeonholes, there's a hole with more than one pigeon.

Example:

Consider a chess board with two of the diagonally opposite corners removed. Is it possible to cover the board with pieces of domino whose size is exactly two board squares?


Solution

No, it's not possible. Two diagonally opposite squares on a chess board are of the same color. Therefore, when these are removed, the number of squares of one color exceeds by 2 the number of squares of another color. However, every piece of domino covers exactly two squares and these are of different colors. Every placement of domino pieces establishes a 1-1 correspondence between the set of white squares and the set of black squares. If the two sets have different number of elements, then, by the Pigeonhole Principle, no 1-1 correspondence between the two sets is possible.

Generalizations of the pigeonhole principle

A generalized version of this principle states that, if n discrete objects are to be allocated to m containers, then at least one container must hold no fewer than ⌈n/m⌉ objects, where ⌈x⌉ is the ceiling function, denoting the smallest integer larger than or equal to x. Similarly, at least one container must hold no more than ⌊n/m⌋ objects, where ⌊x⌋ is the floor function, denoting the largest integer smaller than or equal to x.

A probabilistic generalization of the pigeonhole principle states that if n pigeons are randomly put into m pigeonholes with uniform probability 1/m, then at least one pigeonhole will hold more than one pigeon with probability

$1 - \frac{(m)_n}{m^n},$

where $(m)_n$ is the falling factorial m(m − 1)(m − 2)...(m − n + 1). For n = 0 and for n = 1 (and m > 0), that probability is zero; in other words, if there is just one pigeon, there cannot be a conflict. For n > m (more pigeons than pigeonholes) it is one, in which case it coincides with the ordinary pigeonhole principle. But even if the number of pigeons does not exceed the number of pigeonholes (n ≤ m), due to the random nature of the assignment of pigeons to pigeonholes there is often a substantial chance that clashes will occur. For example, if 2 pigeons are randomly assigned to 4 pigeonholes, there is a 25% chance that at least one pigeonhole will hold more than one pigeon; for 5 pigeons and 10 holes, that probability is 69.76%; and for 10 pigeons and 20 holes it is about 93.45%. If the number of holes stays fixed, there is always a greater probability of a pair when you add more pigeons. This problem is treated at much greater length under the birthday paradox.
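The percentages quoted above follow from the expression 1 − (m)_n / m^n, where (m)_n = m(m − 1)…(m − n + 1). A minimal sketch, using exact fractions and a function name of our own choosing:

```python
from fractions import Fraction


def collision_probability(n, m):
    """Probability that n pigeons in m holes share a hole: 1 - (m)_n / m^n."""
    falling = 1
    for i in range(n):
        falling *= m - i  # becomes 0 as soon as n > m
    return 1 - Fraction(falling, m ** n)


print(collision_probability(2, 4))                      # 1/4
print(round(float(collision_probability(5, 10)), 4))    # 0.6976
print(round(float(collision_probability(10, 20)), 4))   # 0.9345
print(collision_probability(5, 4))                      # 1
```

For n > m one of the factors m − i is zero, so the falling factorial vanishes and the probability is exactly 1, in agreement with the ordinary pigeonhole principle.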

A further probabilistic generalisation is that when a real-valued random variable X has a finite mean E(X), then the probability is nonzero that X is greater than or equal to E(X), and similarly the probability is nonzero that X is less than or equal to E(X). To see that this implies the standard pigeonhole principle, take any fixed arrangement of n pigeons into m holes and let X be the number of pigeons in a hole chosen uniformly at random. The mean of X is n/m, so if there are more pigeons than holes the mean is greater than one. Therefore, X is sometimes at least 2.

Applications:

The pigeonhole principle arises in computer science. For example, collisions are inevitable in a hash table because the number of possible keys exceeds the number of indices in the array. No hashing algorithm, no matter how clever, can avoid these collisions. This principle also proves that any general-purpose lossless compression algorithm that makes at least one input file smaller will make some other input file larger. (Otherwise, two files would be compressed to the same smaller file and restoring them would be ambiguous.)

A notable problem in mathematical analysis is, for a fixed irrational number a, to show that the set {[na]: n is an integer} of fractional parts is dense in [0, 1]. After a moment's thought, one finds that it is not easy to explicitly find integers n, m such that |na − m| < e, where e > 0 is a small positive number and a is some arbitrary irrational number. But if one takes M such that 1/M < e, by the pigeonhole principle there must be n1, n2 ∈ {1, 2, ..., M + 1} such that n1a and n2a are in the same integer subdivision of size 1/M (there are only M such subdivisions between consecutive integers). In particular, we can find n1, n2 such that n1a is in (p + k/M, p + (k + 1)/M), and n2a is in (q + k/M, q + (k + 1)/M), for some p, q integers and k in {0, 1, ..., M − 1}. We can then easily verify that (n2 − n1)a is in (q − p − 1/M, q − p + 1/M). This implies that [na] < 1/M < e, where n = n2 − n1 or n = n1 − n2. This shows that 0 is a limit point of {[na]}. We can then use this fact to prove the case for p in (0, 1]: find n such that [na] < 1/M < e; then if p ∈ (0, 1/M], we are done. Otherwise p in (j/M, (j + 1)/M], and by setting k = sup{r ∈ N : r[na] < j/M}, one obtains |[(k + 1)na] − p| < 1/M < e.

Recurrence Relation

Generating Functions:

In mathematics, a generating function is a formal power series in one indeterminate, whose coefficients encode information about a sequence of numbers an that is indexed by the natural numbers. Generating functions were first introduced by Abraham de Moivre in 1730, in order to solve the general linear recurrence problem. One can generalize to formal power series in more than one indeterminate, to encode information about arrays of numbers indexed by several natural numbers.

Generating functions are not functions in the formal sense of a mapping from a domain to a codomain; the name is merely traditional, and they are sometimes more correctly called generating series.

Ordinary generating function

The ordinary generating function of a sequence $a_n$ is

$G(a_n; x) = \sum_{n=0}^{\infty} a_n x^n.$

When the term generating function is used without qualification, it is usually taken to mean an ordinary generating function.

If $a_n$ is the probability mass function of a discrete random variable, then its ordinary generating function is called a probability-generating function.

The ordinary generating function can be generalized to arrays with multiple indices. For example, the ordinary generating function of a two-dimensional array $a_{m,n}$ (where n and m are natural numbers) is

$G(a_{m,n}; x, y) = \sum_{m,n=0}^{\infty} a_{m,n} x^m y^n.$

Example:

Exponential generating function

The exponential generating function of a sequence $a_n$ is

$EG(a_n; x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}.$

Example:

Function of Sequences:

Generating functions giving the first few powers of the nonnegative integers are given in the following table.

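n^p     f(x)                           series
1       x/(1 − x)                      x + x^2 + x^3 + …
n       x/(1 − x)^2                    x + 2x^2 + 3x^3 + …
n^2     x(x + 1)/(1 − x)^3             x + 4x^2 + 9x^3 + …
n^3     x(x^2 + 4x + 1)/(1 − x)^4      x + 8x^2 + 27x^3 + …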

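There are many beautiful generating functions for special functions in number theory. A few particularly nice examples are

$f(x) = \frac{1}{(x)_\infty}$ (2)

$\quad = \sum_{n=0}^{\infty} P(n)\, x^n$ (3)

$\quad = 1 + x + 2x^2 + 3x^3 + 5x^4 + \cdots$ (4)

for the partition function P, where $(q)_\infty$ is a q-Pochhammer symbol, and

$f(x) = \frac{x}{1 - x - x^2}$ (5)

$\quad = \sum_{n=0}^{\infty} F_n x^n$ (6)

$\quad = x + x^2 + 2x^3 + 3x^4 + \cdots$ (7)

for the Fibonacci numbers $F_n$.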
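To see concretely how the coefficients of a generating function encode a sequence, the sketch below expands the rational function x/(1 − x − x^2), a standard generating function for the Fibonacci numbers, as a power series. The helper `series_quotient` is hypothetical and assumes the denominator has constant term 1, so every coefficient stays an integer.

```python
def series_quotient(num, den, terms):
    """First `terms` power-series coefficients of num(x) / den(x).

    num and den are coefficient lists (constant term first); den[0] must be 1.
    """
    assert den[0] == 1
    coeffs = []
    for n in range(terms):
        acc = num[n] if n < len(num) else 0
        # Solve den * coeffs = num for the coefficient of x^n.
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * coeffs[n - k]
        coeffs.append(acc)
    return coeffs


fib = series_quotient([0, 1], [1, -1, -1], 10)
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```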

Generating functions are very useful in combinatorial enumeration problems. For example, the subset sum problem, which asks the number of ways to select m out of M given integers such that their sum equals s, can be solved using generating functions.

Calculating Coefficient of generating function:

By using the following polynomial expansions, we can calculate the coefficient of a generating function.

Polynomial Expansions:

1) $\frac{1 - x^{m+1}}{1 - x} = 1 + x + x^2 + \cdots + x^m$

2) $\frac{1}{1 - x} = 1 + x + x^2 + \cdots$

3) $(1 + x)^n = 1 + C(n,1)x + C(n,2)x^2 + \cdots + C(n,r)x^r + \cdots + C(n,n)x^n$

4) $(1 - x^m)^n = 1 - C(n,1)x^m + C(n,2)x^{2m} - \cdots + (-1)^k C(n,k)x^{km} + \cdots + (-1)^n C(n,n)x^{nm}$

5) $\frac{1}{(1 - x)^n} = 1 + C(1+n-1, 1)x + C(2+n-1, 2)x^2 + \cdots + C(r+n-1, r)x^r + \cdots$

6) If h(x) = f(x)g(x), where $f(x) = a_0 + a_1x + a_2x^2 + \cdots$ and $g(x) = b_0 + b_1x + b_2x^2 + \cdots$, then

$h(x) = a_0b_0 + (a_1b_0 + a_0b_1)x + (a_2b_0 + a_1b_1 + a_0b_2)x^2 + \cdots + (a_rb_0 + a_{r-1}b_1 + a_{r-2}b_2 + \cdots + a_0b_r)x^r + \cdots$
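The product rule for series, in which the coefficient of x^r in f(x)g(x) is a_r b_0 + a_(r−1) b_1 + … + a_0 b_r, can be used to check the expansion of 1/(1 − x)^n numerically. The sketch below multiplies truncated copies of 1 + x + x^2 + … together n times and compares the result with C(r + n − 1, r); the function name `multiply` and the truncation length are our own choices.

```python
from math import comb


def multiply(f, g, terms):
    """Coefficients of f(x) g(x) up to x^(terms - 1) (Cauchy product)."""
    h = [0] * terms
    for i, a in enumerate(f[:terms]):
        for j, b in enumerate(g[:terms - i]):
            h[i + j] += a * b
    return h


terms = 8
geometric = [1] * terms            # 1/(1 - x) = 1 + x + x^2 + ...
power = [1] + [0] * (terms - 1)    # the constant series 1
n = 3
for _ in range(n):
    power = multiply(power, geometric, terms)

print(power)                                        # [1, 3, 6, 10, 15, 21, 28, 36]
print([comb(r + n - 1, r) for r in range(terms)])   # the same list
```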




Recurrence relations:

Introduction: A recurrence relation is a formula that relates, for any integer n ≥ 1, the n-th term of a sequence $A = \{a_r\}_{r=0}^{\infty}$ to one or more of the terms $a_0, a_1, \ldots, a_{n-1}$.

Example. If $S_n$ denotes the sum of the first n positive integers, then

(1) $S_n = n + S_{n-1}$.

Similarly if d is a real number, then the nth term of an arithmetic progression with common difference d satisfies the relation

(2) $a_n = a_{n-1} + d$.

Likewise if $p_n$ denotes the nth term of a geometric progression with common ratio r, then

(3) $p_n = r\,p_{n-1}$.

We list other examples as:

(4) $a_n - 3a_{n-1} + 2a_{n-2} = 0$.
(5) $a_n - 3a_{n-1} + 2a_{n-2} = n^2 + 1$.
(6) $a_n - (n-1)a_{n-1} - (n-1)a_{n-2} = 0$.
(7) $a_n - 9a_{n-1} + 26a_{n-2} - 24a_{n-3} = 5n$.
(8) $a_n - 3(a_{n-1})^2 + 2a_{n-2} = n$.
(9) $a_n = a_0a_{n-1} + a_1a_{n-2} + \cdots + a_{n-1}a_0$.
(10) $a_n^2 + (a_{n-1})^2 = -1$.

Definition. Suppose n and k are nonnegative integers. A recurrence relation of the form $c_0(n)a_n + c_1(n)a_{n-1} + \cdots + c_k(n)a_{n-k} = f(n)$ for n ≥ k, where $c_0(n), c_1(n), \ldots, c_k(n)$, and f(n) are functions of n, is said to be a linear recurrence relation. If $c_0(n)$ and $c_k(n)$ are not identically zero, then it is said to be a linear recurrence relation of degree k. If $c_0(n), c_1(n), \ldots, c_k(n)$ are constants, then the recurrence relation is known as a linear relation with constant coefficients. If f(n) is identically zero, then the recurrence relation is said to be homogeneous; otherwise, it is inhomogeneous.

Thus, all the examples above are linear recurrence relations except (8), (9), and (10); the relation (8), for instance, is not linear because of the squared term. The relations in (1), (2), (3), (4), (5), and (7) are linear with constant coefficients. Relations (1), (2), and (3) have degree 1; (4), (5), and (6) have degree 2; (7) has degree 3. Relations (3), (4), and (6) are homogeneous. There are no general techniques that will enable one to solve all recurrence relations. There are, nevertheless, techniques that will enable us to solve linear recurrence relations with constant coefficients.

SOLVING RECURRENCE RELATIONS BY SUBSTITUTION AND GENERATING FUNCTIONS

We shall consider four methods of solving recurrence relations in this and the next two sections:

1. Substitution (also called iteration),

2. Generating functions,

3. Characteristic roots, and

4. Undetermined coefficients.

In the substitution method the recurrence relation for a_n is used repeatedly to solve for a general expression for a_n in terms of n. We desire that this expression involve no other terms of the sequence except those given by boundary conditions.

The mechanics of this method are best described in terms of examples. We used this method in Example 5.3.4. Let us also illustrate the method in the following examples.

Example

Solve the recurrence relation a_n = a_{n-1} + f(n) for n ≥ 1 by substitution.

a_1 = a_0 + f(1)

a_2 = a_1 + f(2) = a_0 + f(1) + f(2)

a_3 = a_2 + f(3) = a_0 + f(1) + f(2) + f(3)

...

a_n = a_0 + f(1) + f(2) + ... + f(n) = a_0 + ∑_{k=1}^{n} f(k)

Thus, a_n is just the sum of the f(k)'s plus a_0.




More generally, if c is a constant, then we can solve a_n = c a_{n-1} + f(n) for n ≥ 1 in the same way:

a_1 = c a_0 + f(1)

a_2 = c a_1 + f(2) = c(c a_0 + f(1)) + f(2) = c^2 a_0 + c f(1) + f(2)

a_3 = c a_2 + f(3) = c(c^2 a_0 + c f(1) + f(2)) + f(3) = c^3 a_0 + c^2 f(1) + c f(2) + f(3)

...

a_n = c a_{n-1} + f(n) = c(c^{n-1} a_0 + c^{n-2} f(1) + ... + c f(n-2) + f(n-1)) + f(n)

= c^n a_0 + c^{n-1} f(1) + c^{n-2} f(2) + ... + c f(n-1) + f(n)

or

a_n = c^n a_0 + ∑_{k=1}^{n} c^{n-k} f(k)
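The closed form obtained by substitution can be compared numerically with direct iteration of the recurrence. The sketch below is only an illustration; the class name SubstitutionMethod and the sample values a_0 = 1, c = 2, f(k) = k are assumptions, not taken from the notes.

```java
import java.util.function.IntToLongFunction;

// Hypothetical helper comparing two ways of evaluating a_n = c * a_{n-1} + f(n).
class SubstitutionMethod {
    // Applies the recurrence n times, starting from a_0.
    static long iterate(long a0, long c, IntToLongFunction f, int n) {
        long a = a0;
        for (int k = 1; k <= n; k++) {
            a = c * a + f.applyAsLong(k);
        }
        return a;
    }

    // Evaluates c^n a_0 + sum_{k=1}^{n} c^(n-k) f(k).
    static long closedForm(long a0, long c, IntToLongFunction f, int n) {
        long total = power(c, n) * a0;
        for (int k = 1; k <= n; k++) {
            total += power(c, n - k) * f.applyAsLong(k);
        }
        return total;
    }

    // Integer power by repeated multiplication (small exponents only).
    static long power(long base, int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) {
            result *= base;
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data: a_0 = 1, c = 2, f(k) = k, n = 5.
        System.out.println(iterate(1, 2, k -> k, 5));    // prints 89
        System.out.println(closedForm(1, 2, k -> k, 5)); // prints 89
    }
}
```

Setting c = 1 recovers the first example, where a_n is a_0 plus the sum of the f(k).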

Solution of Linear Inhomogeneous Recurrence Relations:

The equation a_n + c_1 a_{n-1} + c_2 a_{n-2} = f(n), where c_1 and c_2 are constants and f(n) is not identically 0, is called a second-order linear inhomogeneous recurrence relation (or difference equation) with constant coefficients. The homogeneous case, which we've looked at already, occurs when f(n) ≡ 0. The inhomogeneous case occurs more frequently. The homogeneous case is so important largely because it gives us the key to solving the inhomogeneous equation. If you've studied linear differential equations with constant coefficients, you'll see the parallel.

We will call the difference equation obtained by setting the right-hand side equal to 0 the "associated homogeneous equation." We know how to solve this. Say that V is a solution of it. Now suppose that g(n) is any particular solution of the inhomogeneous equation. (That is, it solves the equation, but does not necessarily match the initial data.) Then U = V + g(n) is a solution to the inhomogeneous equation, which you can see simply by substituting U into the equation.

On the other hand, every solution U of the inhomogeneous equation is of the form U = V + g(n), where V is a solution of the homogeneous equation and g(n) is a particular solution of the inhomogeneous equation. The proof of this is straightforward. If we have two solutions to the inhomogeneous equation, say U_1 and U_2, then their difference U_1 - U_2 = V is a solution to the homogeneous equation, which you can check by substitution. But then U_1 = V + U_2, and we can set U_2 = g(n), since by assumption U_2 is a particular solution.

This leads to the following theorem: the general solution to the inhomogeneous equation is the general solution to the associated homogeneous equation, plus any particular solution to the inhomogeneous equation. This gives the following procedure for solving the inhomogeneous equation:

1) Solve the associated homogeneous equation by the method we've learned. This will involve variable (or undetermined) coefficients.

2) Guess a particular solution to the inhomogeneous equation. It is because of the guess that I've called this a procedure, not an algorithm. For simple right-hand sides, we can say how to compute a particular solution, and in these cases, the procedure merits the name "algorithm."

3) The general solution to the inhomogeneous equation is the sum of the answers from the two steps above.

4) Use the initial data to solve for the undetermined coefficients from step 1.



Example. Solve the equation a_n - 6a_{n-1} + 8a_{n-2} = 3. Let's suppose that we are also given the initial data a_0 = 3, a_1 = 3. The associated homogeneous equation is a_n - 6a_{n-1} + 8a_{n-2} = 0, so the characteristic equation is r^2 - 6r + 8 = 0, which has roots r_1 = 2 and r_2 = 4. Thus, the general solution to the associated homogeneous equation is c_1 2^n + c_2 4^n.

When the right-hand side is a polynomial, as in this case, there will always be a particular solution that is a polynomial. Usually, a polynomial of the same degree will work, so we'll guess in this case that there is a constant C that solves the inhomogeneous equation. If that is so, then a_n = a_{n-1} = a_{n-2} = C, and substituting into the equation gives C - 6C + 8C = 3, and we find that C = 1. Now, the general solution to the inhomogeneous equation is c_1 2^n + c_2 4^n + 1. Reassuringly, this is the answer given in the back of the book. Our initial data lead to the equations c_1 + c_2 + 1 = 3 and 2c_1 + 4c_2 + 1 = 3, whose solution is c_1 = 3, c_2 = -1. Finally, the solution to the inhomogeneous equation, with the initial condition given, is a_n = 3·2^n - 4^n + 1.

Sometimes, a polynomial of the same degree as the right-hand side doesn't work. This happens when the characteristic equation has 1 as a root. If our equation had been a_n - 6a_{n-1} + 5a_{n-2} = 3, when we guessed that the particular solution was a constant C, we'd have arrived at the equation C - 6C + 5C = 3, or 0 = 3. The way to deal with this is to increase the degree of the polynomial. Instead of assuming that the solution is constant, we'll assume that it's linear. In fact, we'll guess that it is of the form g(n) = nC. Then we have nC - 6(n-1)C + 5(n-2)C = 3, which simplifies to 6C - 10C = 3, so that C = -3/4. Thus, g(n) = -3n/4. This won't be enough if 1 is a root of multiplicity 2, that is, if (r-1)^2 is a factor of the characteristic polynomial. Then there is a particular solution of the form g(n) = Cn^2. For second-order equations, you never have to go past this.

If the right-hand side is a polynomial of degree greater than 0, then the process works just the same, except that you start with a polynomial of the same degree, increase the degree by 1 if necessary, and then once more if need be. For example, if the right-hand side were f(n) = 2n - 1, we would start by guessing a particular solution g(n) = C_1 n + C_2. If it turned out that 1 was a characteristic root, we would amend our guess to g(n) = C_1 n^2 + C_2 n + C_3. If 1 is a double root, this will fail also, but g(n) = C_1 n^3 + C_2 n^2 + C_3 n + C_4 will work in this case.

Another case where there is a simple way of guessing a particular solution is when the right-hand side is an exponential, say f(n) = C^n. In that case, we guess that a particular solution is just a constant multiple of f, say g(n) = kC^n. Here we have trouble when C is a characteristic root. We then guess that g(n) = knC^n, which will fail only if C is a double root. In that case we must use g(n) = kn^2 C^n, which is as far as we ever have to go in the second-order case.

These same ideas extend to higher-order recurrence relations, but we usually solve them numerically, rather than exactly. A third-order linear difference equation with constant coefficients leads to a cubic characteristic polynomial. There is a formula for the roots of a cubic, but it's very complicated. For fourth-degree polynomials, there's also a formula, but it's even worse. For fifth and higher degrees, no such formula exists. Even for the third-order case, the exact solution of a simple-looking inhomogeneous linear recurrence relation with constant coefficients can take pages to write down. The coefficients will be complicated expressions involving square roots and cube roots. For most, if not all, purposes, a simpler answer with numerical coefficients is better, even though the coefficients must, in the nature of things, be approximate.

The procedure I've suggested may strike you as silly. After all, we've already solved the characteristic equation, so we know whether 1 is a characteristic root, and what its multiplicity is. Why not start with a polynomial of the correct degree? This is all well and good while you're taking the course and remember the procedure in detail. However, if you have to use this procedure some years from now, you probably won't remember all the details. Then the method I've suggested will be valuable. Alternatively, you can start with a general polynomial of the maximum possible degree. This leads to a lot of extra work if you're solving by hand, but it's the approach I prefer for computer solution.
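A quick numerical check of the worked example a_n - 6a_{n-1} + 8a_{n-2} = 3 with a_0 = a_1 = 3 can catch arithmetic slips. This is a minimal sketch; the class and method names are hypothetical and only the recurrence, the initial data and the formula a_n = 3·2^n - 4^n + 1 come from the text.

```java
// Hypothetical check of the worked example: compares the recurrence with the closed form.
class InhomogeneousExample {
    // Computes a_n from a_n = 6 a_{n-1} - 8 a_{n-2} + 3 with a_0 = a_1 = 3.
    static long byRecurrence(int n) {
        long twoBack = 3;
        long oneBack = 3;
        for (int i = 2; i <= n; i++) {
            long current = 6 * oneBack - 8 * twoBack + 3;
            twoBack = oneBack;
            oneBack = current;
        }
        return oneBack;
    }

    // Evaluates 3 * 2^n - 4^n + 1, writing 4^n as 2^(2n).
    static long byFormula(int n) {
        return 3 * (1L << n) - (1L << (2 * n)) + 1;
    }

    // True when both computations agree for every index from 0 to maxN.
    static boolean agreeUpTo(int maxN) {
        for (int n = 0; n <= maxN; n++) {
            if (byRecurrence(n) != byFormula(n)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(byRecurrence(3)); // prints -39
        System.out.println(agreeUpTo(20));   // prints true
    }
}
```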




UNIT-V

Graph Theory

Representation of Graphs:

There are two different sequential representations of a graph. They are

1. Adjacency matrix representation

2. Path matrix representation

Adjacency Matrix Representation

Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have been ordered and are called v_1, v_2, ..., v_m. Then the adjacency matrix A = (a_ij) of the graph G is the m x m matrix defined as follows:

a_ij = 1 if v_i is adjacent to v_j, that is, if there is an edge (v_i, v_j)

a_ij = 0 otherwise

Suppose G is an undirected graph. Then the adjacency matrix A of G will be a symmetric matrix, i.e., one in which a_ij = a_ji for every i and j.
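The definition translates directly into a two-dimensional array. The sketch below is illustrative only: the class name, the edge-list input format and the use of 0-based indices (index i stands for node v_{i+1}) are assumptions.

```java
import java.util.Arrays;

// Hypothetical helper that builds an adjacency matrix from a list of edges.
class AdjacencyMatrixSketch {
    // edges[k] = {i, j} means an edge from node i to node j (0-based indices).
    static int[][] build(int m, int[][] edges, boolean directed) {
        int[][] a = new int[m][m];
        for (int[] e : edges) {
            a[e[0]][e[1]] = 1;
            if (!directed) {
                a[e[1]][e[0]] = 1; // an undirected edge is recorded in both directions
            }
        }
        return a;
    }

    // An undirected graph always yields a symmetric matrix: a[i][j] == a[j][i].
    static boolean isSymmetric(int[][] a) {
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < i; j++) {
                if (a[i][j] != a[j][i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}};
        System.out.println(Arrays.deepToString(build(3, edges, true)));
        // prints [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
        System.out.println(isSymmetric(build(3, edges, false))); // prints true
    }
}
```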

Drawbacks

1. It may be difficult to insert and delete nodes in G.

2. If the number of edges is O(m) or O(m log2 m), then the matrix A will be sparse; hence a great deal of space will be wasted.

Path Matrix Representation

Let G be a simple directed graph with m nodes, v_1, v_2, ..., v_m. The path matrix of G is the m-square matrix P = (p_ij) defined as follows:

p_ij = 1 if there is a path from v_i to v_j

p_ij = 0 otherwise
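One common way to obtain the path matrix from the adjacency matrix is Warshall's algorithm. The notes do not name a method, so this choice, the class name and the 0-based indexing are assumptions made for illustration.

```java
// Hypothetical sketch: path matrix computed from an adjacency matrix with Warshall's algorithm.
class PathMatrixSketch {
    static boolean[][] fromAdjacency(int[][] a) {
        int m = a.length;
        boolean[][] p = new boolean[m][m];
        // Paths of length 1 are exactly the edges.
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                p[i][j] = a[i][j] != 0;
            }
        }
        // Allow node k as an intermediate stop, one node at a time.
        for (int k = 0; k < m; k++) {
            for (int i = 0; i < m; i++) {
                for (int j = 0; j < m; j++) {
                    if (p[i][k] && p[k][j]) {
                        p[i][j] = true;
                    }
                }
            }
        }
        return p;
    }

    public static void main(String[] args) {
        // Edges 0 -> 1 and 1 -> 2; node 3 is isolated.
        int[][] a = {{0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}};
        boolean[][] p = fromAdjacency(a);
        System.out.println(p[0][2]); // prints true: 0 -> 1 -> 2
        System.out.println(p[2][0]); // prints false: no path back
    }
}
```

The triple loop costs O(m^3) time, which is acceptable for the small graphs used in these notes.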

Graphs and Multigraphs

A graph G consists of two things:

1. A set V of elements called nodes (or points or vertices)

2. A set E of edges such that each edge e in E is identified with a unique (unordered) pair [u, v] of nodes in V, denoted by e = [u, v]

Sometimes we indicate the parts of a graph by writing G = (V, E).

Suppose e = [u, v]. Then the nodes u and v are called the endpoints of e, and u and v are said to be adjacent nodes or neighbors. The degree of a node u, written deg(u), is the number of edges containing u. If deg(u) = 0, that is, if u does not belong to any edge, then u is called an isolated node.

Path and Cycle

A path P of length n from a node u to a node v is defined as a sequence of n + 1 nodes P = (v_0, v_1, v_2, ..., v_n) such that u = v_0; v_{i-1} is adjacent to v_i for i = 1, 2, ..., n; and v_n = v.

Types of Path

1. Simple Path

2. Cycle Path

(i) Simple Path

A simple path is a path in which the first and last vertices are different (v_0 ≠ v_n). In the usual definition, all the vertices of a simple path are also distinct.

(ii) Cycle Path

A cycle path is a path in which the first and last vertices are the same (v_0 = v_n). It is also called a closed path.

Connected Graph

A graph G is said to be connected if there is a path between any two of its nodes.

Complete Graph

A graph G is said to be complete if every node u in G is adjacent to every other node v in G.




Tree

A connected graph T without any cycles is called a tree graph or free tree or, simply, a tree.

Labeled or Weighted Graph

If a weight is assigned to each edge of the graph, then it is called a weighted or labeled graph.

The definition of a graph may be generalized by permitting the following:

1. Multiple edges: Distinct edges e and e' are called multiple edges if they connect the same endpoints, that is, if e = [u, v] and e' = [u, v].

2. Loops: An edge e is called a loop if it has identical endpoints, that is, if e = [u, u].

3. Finite graph: A multigraph M is said to be finite if it has a finite number of nodes and a finite number of edges.




Directed Graphs

A directed graph G, also called a digraph or graph, is the same as a multigraph except that each edge e in G is assigned a direction, or in other words, each edge e is identified with an ordered pair (u, v) of nodes in G.

Outdegree and Indegree

Indegree: The indegree of a vertex v is the number of edges for which v is the head.

Example

Indegree of 1 = 1

Indegree of 2 = 2

Outdegree: The outdegree of a node or vertex v is the number of edges for which v is the tail.

Example

Outdegree of 1 = 1

Outdegree of 2 = 2

Simple Directed Graph

A directed graph G is said to be simple if G has no parallel edges. A simple graph G may have loops, but it cannot have more than one loop at a given node.

Graph Traversal

The breadth first search (BFS) and the depth first search (DFS) are the two algorithms used for traversing and searching a node in a graph. They can also be used to find out whether a node is reachable from a given node or not.

Depth First Search (DFS)


The aim of the DFS algorithm is to traverse the graph in such a way that it goes as far as possible from the root node before backtracking. A stack is used in the implementation of depth first search. Let's see how depth first search works with respect to the following graph:

As stated before, in DFS, nodes are visited by going through the depth of the tree from the starting node. If we do the depth first traversal of the above graph and print the visited node, it will be “A B E F C D”. DFS visits the root node and then its children nodes until it reaches the end node, i.e. E and F nodes, then moves up to the parent nodes.

Algorithmic Steps

1. Step 1: Push the root node onto the stack.

2. Step 2: Loop until the stack is empty.

3. Step 3: Peek at the node on top of the stack.

4. Step 4: If the node has unvisited child nodes, get an unvisited child node, mark it as traversed and push it onto the stack.

5. Step 5: If the node does not have any unvisited child nodes, pop the node from the stack.

Based upon the above steps, the following Java code shows the implementation of the DFS algorithm:

    public void dfs() {
        // DFS uses a Stack data structure
        Stack<Node> s = new Stack<>();
        s.push(this.rootNode);
        rootNode.visited = true;
        printNode(rootNode);
        while (!s.isEmpty()) {
            Node n = s.peek();
            Node child = getUnvisitedChildNode(n);
            if (child != null) {
                child.visited = true;
                printNode(child);
                s.push(child);
            } else {
                s.pop();
            }
        }
        // Clear visited property of nodes
        clearNodes();
    }
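The method above depends on a surrounding class (Node, rootNode, getUnvisitedChildNode, printNode, clearNodes) that these notes do not show. For experimenting, here is a self-contained sketch of the same stack-based steps. The class name DfsSketch and the sample graph are assumptions: the edges A-B, A-C, A-D, B-E, B-F were chosen only because they reproduce the traversal order "A B E F C D" quoted earlier, since the original figure is not available.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone version of the stack-based DFS steps.
class DfsSketch {
    // children maps each node to its child nodes, in the order they should be tried.
    static List<String> dfs(Map<String, List<String>> children, String root) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        visited.add(root);
        order.add(root);
        while (!stack.isEmpty()) {
            String node = stack.peek();
            String next = null;
            for (String child : children.getOrDefault(node, List.of())) {
                if (!visited.contains(child)) {
                    next = child; // first unvisited child
                    break;
                }
            }
            if (next != null) {
                visited.add(next);
                order.add(next);
                stack.push(next);
            } else {
                stack.pop(); // no unvisited children: backtrack
            }
        }
        return order;
    }

    // Assumed sample graph (the figure from the notes is not reproduced here).
    static Map<String, List<String>> sampleGraph() {
        return Map.of("A", List.of("B", "C", "D"), "B", List.of("E", "F"));
    }

    public static void main(String[] args) {
        System.out.println(dfs(sampleGraph(), "A")); // prints [A, B, E, F, C, D]
    }
}
```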

Breadth First Search (BFS)

This is a very different approach for traversing the graph nodes. The aim of BFS algorithm is to traverse the graph as close as possible to the root node. Queue is used in the implementation of the breadth first search. Let’s see how BFS traversal works with respect to the following graph:

If we do the breadth first traversal of the above graph and print the visited node as the output, it will print the following output. “A B C D E F”. The BFS visits the nodes level by level, so it will start with level 0 which is the root node, and then it moves to the next levels which are B, C and D, then the last levels which are E and F.

Algorithmic Steps

1. Step 1: Insert the root node into the queue.

2. Step 2: Loop until the queue is empty.

3. Step 3: Remove the node at the front of the queue.

4. Step 4: If the removed node has unvisited child nodes, mark them as visited and insert the unvisited children into the queue.


Based upon the above steps, the following Java code shows the implementation of the BFS algorithm:

    public void bfs() {
        // BFS uses a Queue data structure
        Queue<Node> q = new LinkedList<>();
        q.add(this.rootNode);
        printNode(this.rootNode);
        rootNode.visited = true;
        while (!q.isEmpty()) {
            Node n = q.remove();
            Node child = null;
            while ((child = getUnvisitedChildNode(n)) != null) {
                child.visited = true;
                printNode(child);
                q.add(child);
            }
        }
        // Clear visited property of nodes
        clearNodes();
    }
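As with DFS, the method above relies on helper members that are not shown. The following self-contained sketch follows the same queue-based steps. The class name BfsSketch and the sample graph (edges A-B, A-C, A-D, B-E, B-F) are assumptions, chosen so that the output matches the level-by-level order "A B C D E F" described in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-alone version of the queue-based BFS steps.
class BfsSketch {
    static List<String> bfs(Map<String, List<String>> children, String root) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        queue.add(root);
        visited.add(root);
        order.add(root);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            for (String child : children.getOrDefault(node, List.of())) {
                // Set.add returns true only the first time a node is seen.
                if (visited.add(child)) {
                    order.add(child);
                    queue.add(child);
                }
            }
        }
        return order;
    }

    // Assumed sample graph (the figure from the notes is not reproduced here).
    static Map<String, List<String>> sampleGraph() {
        return Map.of("A", List.of("B", "C", "D"), "B", List.of("E", "F"));
    }

    public static void main(String[] args) {
        System.out.println(bfs(sampleGraph(), "A")); // prints [A, B, C, D, E, F]
    }
}
```

The only structural difference from the DFS sketch is the container: a queue returns the oldest discovered node first, which produces the level-by-level order.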

Spanning Trees:

In the mathematical field of graph theory, a spanning tree T of a connected, undirected graph G is a tree composed of all the vertices and some (or perhaps all) of the edges of G. Informally, a spanning tree of G is a selection of edges of G that form a tree spanning every vertex. That is, every vertex lies in the tree, but no cycles (or loops) are formed. On the other hand, every bridge of G must belong to T.

A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices.

Example:

[Figure: A spanning tree (blue heavy edges) of a grid graph.]

Spanning forests

A spanning forest is a type of subgraph that generalises the concept of a spanning tree. However, there are two definitions in common use. One is that a spanning forest is a subgraph that consists of a spanning tree in each connected component of a graph. (Equivalently, it is a maximal cycle-free subgraph.) This definition is common in computer science and optimisation. It is also the definition used when discussing minimum spanning forests, the generalization to disconnected graphs of minimum spanning trees. Another definition, common in graph theory, is that a spanning forest is any subgraph that is both a forest (contains no cycles) and spanning (includes every vertex).

Counting spanning trees

The number t(G) of spanning trees of a connected graph is an important invariant. In some cases, it is easy to calculate t(G) directly. For example, if G is itself a tree, then t(G) = 1, while if G is the cycle graph C_n with n vertices, then t(G) = n. For any graph G, the number t(G) can be calculated using Kirchhoff's matrix-tree theorem. Spanning trees are also widely used in data structures and algorithms.

Cayley's formula is a formula for the number of spanning trees in the complete graph K_n with n vertices. The formula states that t(K_n) = n^{n-2}. Another way of stating Cayley's formula is that there are exactly n^{n-2} labelled trees with n vertices. Cayley's formula can be proved using Kirchhoff's matrix-tree theorem or via the Prüfer code.

If G is the complete bipartite graph K_{p,q}, then t(G) = p^{q-1} q^{p-1}, while if G is the n-dimensional hypercube graph Q_n, then t(G) = 2^{2^n - n - 1} ∏_{k=2}^{n} k^{C(n,k)}, where C(n,k) is the binomial coefficient. These formulae are also consequences of the matrix-tree theorem.
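The matrix-tree theorem says that t(G) equals the determinant of the Laplacian matrix (degree matrix minus adjacency matrix) with one row and the matching column deleted. The sketch below applies it to small graphs; the class name, the helper methods and the choice of fraction-free (Bareiss) elimination for an exact integer determinant are assumptions made for illustration.

```java
// Hypothetical sketch: counting spanning trees with Kirchhoff's matrix-tree theorem.
class SpanningTreeCount {
    // adj[i][j] = number of edges between i and j (symmetric, zero diagonal).
    static long count(int[][] adj) {
        int n = adj.length - 1; // drop the last vertex's row and column
        if (n == 0) {
            return 1; // a single vertex has exactly one (empty) spanning tree
        }
        long[][] m = new long[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < adj.length; j++) {
                if (i == j) {
                    continue;
                }
                m[i][i] += adj[i][j];     // the degree goes on the diagonal
                if (j < n) {
                    m[i][j] -= adj[i][j]; // minus adjacency off the diagonal
                }
            }
        }
        return determinant(m);
    }

    // Exact integer determinant by fraction-free (Bareiss) elimination.
    static long determinant(long[][] m) {
        int n = m.length;
        long sign = 1;
        long previousPivot = 1;
        for (int k = 0; k < n - 1; k++) {
            if (m[k][k] == 0) {
                int swap = -1;
                for (int i = k + 1; i < n; i++) {
                    if (m[i][k] != 0) {
                        swap = i;
                        break;
                    }
                }
                if (swap < 0) {
                    return 0; // the whole column is zero
                }
                long[] tmp = m[k];
                m[k] = m[swap];
                m[swap] = tmp;
                sign = -sign;
            }
            for (int i = k + 1; i < n; i++) {
                for (int j = k + 1; j < n; j++) {
                    m[i][j] = (m[i][j] * m[k][k] - m[i][k] * m[k][j]) / previousPivot;
                }
            }
            previousPivot = m[k][k];
        }
        return sign * m[n - 1][n - 1];
    }

    // Adjacency matrix of the complete graph K_n.
    static int[][] complete(int n) {
        int[][] adj = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                adj[i][j] = (i == j) ? 0 : 1;
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        System.out.println(count(complete(4))); // prints 16, which is 4^2
    }
}
```

The values stay within long range only for small graphs; a larger experiment would need BigInteger.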

If G is a multigraph and e is an edge of G, then the number t(G) of spanning trees of G satisfies the deletion-contraction recurrence t(G)=t(G-e)+t(G/e), where G-e is the multigraph obtained by deleting e and G/e is the contraction of G by e, where multiple edges arising from this contraction are not deleted.

Uniform spanning trees

A spanning tree chosen randomly from among all the spanning trees with equal probability is called a uniform spanning tree (UST). This model has been extensively researched in probability and mathematical physics.

Algorithms

The classic spanning tree algorithm, depth-first search (DFS), is due to Robert Tarjan. Another important algorithm is based on breadth-first search (BFS).




Planar Graphs:

In graph theory, a planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints.

A planar graph already drawn in the plane without edge intersections is called a plane graph or planar embedding of the graph. A plane graph can be defined as a planar graph with a mapping from every node to a point in 2D space, and from every edge to a plane curve, such that the extreme points of each curve are the points mapped from its end nodes, and all curves are disjoint except on their extreme points. Plane graphs can be encoded by combinatorial maps.

It is easily seen that a graph that can be drawn on the plane can be drawn on the sphere as well, and vice versa.

The equivalence class of topologically equivalent drawings on the sphere is called a planar map. Although a plane graph has an external or unbounded face, none of the faces of a planar map have a particular status.

Applications

1. Telecommunications, e.g. spanning trees

2. Vehicle routing, e.g. planning routes on roads without underpasses

3. VLSI, e.g. laying out circuits on a computer chip

The puzzle game Planarity requires the player to "untangle" a planar graph so that none of its edges intersect.


Example graphs

Planar:

1. Butterfly graph

2. The complete graph K4

Nonplanar:

1. K5

2. K3,3


Graph Theory and Applications

Graphs are among the most ubiquitous models of both natural and human-made structures. They can be used to model many types of relations and process dynamics in physical, biological and social systems. Many problems of practical interest can be represented by graphs.

In computer science, graphs are used to represent networks of communication, data organization, computational devices, the flow of computation, etc. One practical example: The link structure of a website could be represented by a directed graph. The vertices are the web pages available at the website and a directed edge from page A to page B exists if and only if A contains a link to B. A similar approach can be taken to problems in travel, biology, computer chip design, and many other fields. The development of algorithms to handle graphs is therefore of major interest in computer science. There, the transformation of graphs is often formalized and represented by graph rewrite systems. They are either directly used or properties of the rewrite systems (e.g. confluence) are studied. Complementary to graph transformation systems focussing on rule-based in-memory manipulation of graphs are graph databases geared towards transaction-safe, persistent storing and querying of graph-structured data.

Graph-theoretic methods, in various forms, have proven particularly useful in linguistics, since natural language often lends itself well to discrete structure. Traditionally, syntax and compositional semantics follow tree-based structures, whose expressive power lies in the Principle of Compositionality, modeled in a hierarchical graph. Within lexical semantics, especially as applied to computers, modeling word meaning is easier when a given word is understood in terms of related words; semantic networks are therefore important in computational linguistics. Still other methods in phonology (e.g. Optimality Theory, which uses lattice graphs) and morphology (e.g. finite-state morphology, using finite-state transducers) are common in the analysis of language as a graph. Indeed, the usefulness of this area of mathematics to linguistics has borne organizations such as TextGraphs, as well as various 'Net' projects, such as WordNet, VerbNet, and others.


Graph theory is also used to study molecules in chemistry and physics. In condensed matter physics, the three-dimensional structure of complicated simulated atomic structures can be studied quantitatively by gathering statistics on graph-theoretic properties related to the topology of the atoms, for example Franzblau's shortest-path (SP) rings. In chemistry a graph makes a natural model for a molecule, where vertices represent atoms and edges represent bonds. This approach is especially used in computer processing of molecular structures, ranging from chemical editors to database searching. In statistical physics, graphs can represent local connections between interacting parts of a system, as well as the dynamics of a physical process on such systems.

Graph theory is also widely used in sociology as a way, for example, to measure actors' prestige or to explore diffusion mechanisms, notably through the use of social network analysis software. Likewise, graph theory is useful in biology and conservation efforts, where a vertex can represent a region where certain species exist (a habitat) and the edges represent migration paths, or movement between the regions. This information is important when looking at breeding patterns, tracking the spread of disease or parasites, or studying how changes to movement can affect other species.

In mathematics, graphs are useful in geometry and certain parts of topology, e.g. Knot Theory. Algebraic graph theory has close links with group theory.

A graph structure can be extended by assigning a weight to each edge of the graph. Graphs with weights, or weighted graphs, are used to represent structures in which pairwise connections have some numerical values. For example if a graph represents a road network, the weights could represent the length of each road.

Basic Concepts

Isomorphism: Let G1 and G2 be two graphs and let f be a function from the vertex set of G1 to the vertex set of G2. Suppose that f is one-to-one and onto, and that f(v) is adjacent to f(w) in G2 if and only if v is adjacent to w in G1.

Then we say that the function f is an isomorphism and that the two graphs G1 and G2 are isomorphic. So two graphs G1 and G2 are isomorphic if there is a one-to-one correspondence between the vertices of G1 and those of G2 with the property that two vertices of G1 are adjacent if and only if their images in G2 are adjacent. If two graphs are isomorphic, then as far as we are concerned they are the same graph, though the location of the vertices may be different. To show you how the program can be used to explore isomorphism, draw the graph in figure 4 with the program (first get the null graph on four vertices and then use the right mouse button to add edges). Save this graph as Graph 1 (you need to click Graph then Save). Now get the circuit graph with 4 vertices. It looks like figure 5, and we shall call it C(4).

Example:

The two graphs G and H are isomorphic, despite their different-looking drawings.

[Figures: Graph G and Graph H]

An isomorphism between G and H:

ƒ(a) = 1

ƒ(b) = 6

ƒ(c) = 8

ƒ(d) = 3

ƒ(g) = 5

ƒ(h) = 2

ƒ(i) = 4

ƒ(j) = 7
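Whether a given vertex mapping is an isomorphism can be verified mechanically: the mapping must be one-to-one, and the image of the edge set of G must equal the edge set of H. The sketch below checks the mapping listed above. The edge lists for G and H are assumptions, because the figures are not reproduced in these notes, and the class and method names are hypothetical. The check also assumes simple graphs without isolated vertices.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical checker: does the vertex mapping f carry the edges of G exactly onto the edges of H?
class IsomorphismCheck {
    // Encodes an undirected edge so that {u, v} and {v, u} give the same key.
    static String key(String u, String v) {
        return u.compareTo(v) <= 0 ? u + "-" + v : v + "-" + u;
    }

    static boolean isIsomorphism(String[][] edgesG, String[][] edgesH, Map<String, String> f) {
        // f must be one-to-one: no two vertices may share an image.
        if (new HashSet<>(f.values()).size() != f.size()) {
            return false;
        }
        Set<String> target = new HashSet<>();
        for (String[] e : edgesH) {
            target.add(key(e[0], e[1]));
        }
        Set<String> image = new HashSet<>();
        for (String[] e : edgesG) {
            String u = f.get(e[0]);
            String v = f.get(e[1]);
            if (u == null || v == null) {
                return false; // f is not defined on some vertex of G
            }
            image.add(key(u, v));
        }
        return image.equals(target);
    }

    // Assumed edge list of G (figure not available).
    static String[][] sampleG() {
        return new String[][] {
            {"a", "g"}, {"a", "h"}, {"a", "i"}, {"b", "g"}, {"b", "h"}, {"b", "j"},
            {"c", "g"}, {"c", "i"}, {"c", "j"}, {"d", "h"}, {"d", "i"}, {"d", "j"}};
    }

    // Assumed edge list of H (figure not available).
    static String[][] sampleH() {
        return new String[][] {
            {"1", "2"}, {"1", "4"}, {"1", "5"}, {"2", "3"}, {"2", "6"}, {"3", "4"},
            {"3", "7"}, {"4", "8"}, {"5", "6"}, {"5", "8"}, {"6", "7"}, {"7", "8"}};
    }

    // The mapping listed in the text.
    static Map<String, String> sampleMapping() {
        return Map.of("a", "1", "b", "6", "c", "8", "d", "3",
                      "g", "5", "h", "2", "i", "4", "j", "7");
    }

    public static void main(String[] args) {
        System.out.println(isIsomorphism(sampleG(), sampleH(), sampleMapping())); // prints true
    }
}
```

Checking one given mapping is cheap. Finding a mapping when none is supplied is a much harder problem and is not attempted here.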

Subgraphs:

A subgraph of a graph G is a graph whose vertex set is a subset of that of G, and whose adjacency relation is a subset of that of G restricted to this subset. In the other direction, a supergraph of a graph G is a graph of which G is a subgraph. We say a graph G contains another graph H if some subgraph of G is H or is isomorphic to H.

A subgraph H is a spanning subgraph, or factor, of a graph G if it has the same vertex set as G. We say H spans G.

A subgraph H of a graph G is said to be induced if, for any pair of vertices x and y of H, xy is an edge of H if and only if xy is an edge of G. In other words, H is an induced subgraph of G if it has all the edges that appear in G over the same vertex set. If the vertex set of H is the subset S of V(G), then H can be written as G[S] and is said to be induced by S.

A graph that does not contain H as an induced subgraph is said to be H-free.

A universal graph in a class K of graphs is a simple graph in which every element in K can be embedded as a subgraph.

K5, a complete graph. If a subgraph looks like this, the vertices in that subgraph form a clique of size 5.

Multi graphs:

In mathematics, a multigraph or pseudograph is a graph which is permitted to have multiple edges (also called "parallel edges"), that is, edges that have the same end nodes. Thus two vertices may be connected by more than one edge. Formally, a multigraph G is an ordered pair G := (V, E) with

1. V a set of vertices or nodes,

2. E a multiset of unordered pairs of vertices, called edges or lines.

Multigraphs might be used to model the possible flight connections offered by an airline. In this case the multigraph would be a directed graph with pairs of directed parallel edges connecting cities to show that it is possible to fly both to and from these locations.






A multigraph with multiple edges (red) and a loop (blue). Not all authors allow multigraphs to have loops.

Euler circuits:

In graph theory, an Eulerian trail is a trail in a graph which visits every edge exactly once. Similarly, an Eulerian circuit is an Eulerian trail which starts and ends on the same vertex. They were first discussed by Leonhard Euler while solving the famous Seven Bridges of Königsberg problem in 1736. Mathematically the problem can be stated like this:

Given the graph on the right, is it possible to construct a path (or a cycle, i.e. a path starting and ending on the same vertex) which visits each edge exactly once?

Euler proved that a necessary condition for the existence of Eulerian circuits is that all vertices in the graph have an even degree, and stated without proof that connected graphs with all vertices of even degree have an Eulerian circuit. The first complete proof of this latter claim was published in 1873 by Carl Hierholzer.

The term Eulerian graph has two common meanings in graph theory. One meaning is a graph with an Eulerian circuit, and the other is a graph with every vertex of even degree. These definitions coincide for connected graphs.

For the existence of Eulerian trails it is necessary that no more than two vertices have an odd degree; this means the Königsberg graph is not Eulerian. If there are no vertices of odd degree, all Eulerian trails are circuits. If there are exactly two vertices of odd degree, all Eulerian trails start at one of them and end at the other. Sometimes a graph that has an Eulerian trail but not an Eulerian circuit is called semi-Eulerian.
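The degree conditions above are easy to test by machine. The sketch below classifies an undirected multigraph as having an Eulerian circuit, only an Eulerian trail, or neither, using the standard result that a connected graph with zero or exactly two odd-degree vertices has a circuit or a trail respectively. The class name, the return strings and the matrix input format are assumptions. The sample data is the usual Königsberg multigraph with seven bridges.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical classifier based on vertex degrees and connectivity.
class EulerCheck {
    // adj[i][j] = number of edges between i and j (symmetric, no loops).
    static String classify(int[][] adj) {
        int n = adj.length;
        int[] degree = new int[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                degree[i] += adj[i][j];
            }
        }
        int odd = 0;
        int start = -1;
        for (int i = 0; i < n; i++) {
            if (degree[i] % 2 == 1) {
                odd++;
            }
            if (degree[i] > 0 && start < 0) {
                start = i;
            }
        }
        // All edges must lie in one connected piece.
        if (start >= 0 && !allEdgesReachable(adj, degree, start)) {
            return "none";
        }
        if (odd == 0) {
            return "circuit";
        }
        if (odd == 2) {
            return "trail";
        }
        return "none";
    }

    // Depth-first search from start; every vertex that has an edge must be reached.
    static boolean allEdgesReachable(int[][] adj, int[] degree, int start) {
        boolean[] seen = new boolean[adj.length];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(start);
        seen[start] = true;
        while (!stack.isEmpty()) {
            int u = stack.pop();
            for (int v = 0; v < adj.length; v++) {
                if (adj[u][v] > 0 && !seen[v]) {
                    seen[v] = true;
                    stack.push(v);
                }
            }
        }
        for (int v = 0; v < adj.length; v++) {
            if (degree[v] > 0 && !seen[v]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Königsberg: four land masses with degrees 5, 3, 3, 3.
        int[][] konigsberg = {{0, 2, 2, 1}, {2, 0, 0, 1}, {2, 0, 0, 1}, {1, 1, 1, 0}};
        System.out.println(classify(konigsberg)); // prints none
    }
}
```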

An Eulerian trail, or Euler walk, in an undirected graph is a path that uses each edge exactly once. If such a path exists, the graph is called traversable or semi-Eulerian.

St.peters Engineering college

An Eulerian cycle, Eulerian circuit or Euler tour in an undirected graph is a cycle that uses each edge exactly once. If such a cycle exists, the graph is called unicursal. While such graphs are Eulerian graphs, not every Eulerian graph possesses an Eulerian cycle.

For directed graphs path has to be replaced with directed path and cycle with directed cycle.

The definition and properties of Eulerian trails, cycles and graphs are valid for multigraphs as well.

[Figure: the Königsberg graph] This graph is not Eulerian; therefore, a solution does not exist.

[Figure: a graph with labelled edges] Every vertex of this graph has an even degree; therefore this is an Eulerian graph. Following the edges in alphabetical order gives an Eulerian circuit/cycle.

Hamiltonian graphs: In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected graph which visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle in an undirected graph which visits each vertex exactly once and also returns to the starting vertex. Determining whether such paths and cycles exist in graphs is the Hamiltonian path problem, which is NP-complete.

Hamiltonian paths and cycles are named after William Rowan Hamilton, who invented the Icosian game, now also known as Hamilton's puzzle, which involves finding a Hamiltonian cycle in the edge graph of the dodecahedron. Hamilton solved this problem using the Icosian Calculus, an algebraic structure based on roots of unity with many similarities to the quaternions (also invented by Hamilton). This solution does not generalize to arbitrary graphs.

A Hamiltonian path or traceable path is a path that visits each vertex exactly once. A graph that contains a Hamiltonian path is called a traceable graph. A graph is Hamilton-connected if for every pair of vertices there is a Hamiltonian path between the two vertices.

A Hamiltonian cycle, Hamiltonian circuit, vertex tour or graph cycle is a cycle that visits each vertex exactly once (except the vertex which is both the start and end, and so is visited twice). A graph that contains a Hamiltonian cycle is called a Hamiltonian graph.

Similar notions may be defined for directed graphs, where each edge (arc) of a path or cycle can only be traced in a single direction (i.e., the vertices are connected with arrows and the edges traced "tail-to-head").

A Hamiltonian decomposition is an edge decomposition of a graph into Hamiltonian circuits.

Examples:

a) A complete graph with more than two vertices is Hamiltonian.

b) Every cycle graph is Hamiltonian.

c) Every tournament has an odd number of Hamiltonian paths.

d) Every platonic solid, considered as a graph, is Hamiltonian.
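The first two examples can be checked on small instances with a brute-force search. This is an illustrative Python sketch only: it takes exponential time in the worst case, as expected for an NP-complete problem, and `find_hamiltonian_cycle`, `complete_graph` and `cycle_graph` are hypothetical helper names, not something the notes define.

```python
def find_hamiltonian_cycle(adjacency):
    """Backtracking search. `adjacency` maps each vertex to the set of its neighbours.
    Returns a list of vertices forming a Hamiltonian cycle, or None."""
    vertices = list(adjacency)
    if len(vertices) < 3:
        return None  # a simple graph needs at least 3 vertices for a cycle
    start = vertices[0]
    path = [start]
    visited = {start}

    def extend():
        if len(path) == len(vertices):
            return start in adjacency[path[-1]]  # can the cycle be closed?
        for v in adjacency[path[-1]]:
            if v not in visited:
                visited.add(v)
                path.append(v)
                if extend():
                    return True
                path.pop()          # undo the choice and try the next neighbour
                visited.remove(v)
        return False

    return path + [start] if extend() else None

def complete_graph(n):
    return {u: {v for v in range(n) if v != u} for u in range(n)}

def cycle_graph(n):
    return {u: {(u - 1) % n, (u + 1) % n} for u in range(n)}

# A star with three leaves has no Hamiltonian cycle.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(find_hamiltonian_cycle(complete_graph(5)) is not None)  # True
print(find_hamiltonian_cycle(cycle_graph(6)) is not None)     # True
print(find_hamiltonian_cycle(star))                           # None
```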

Chromatic Numbers:

In graph theory, graph coloring is a special case of graph labeling; it is an assignment of labels traditionally called "colors" to elements of a graph subject to certain constraints. In its simplest form, it is a way of coloring the vertices of a graph such that no two adjacent vertices share the same color; this is called a vertex coloring. Similarly, an edge coloring assigns a color to each edge so that no two adjacent edges share the same color, and a face coloring of a planar graph assigns a color to each face or region so that no two faces that share a boundary have the same color.

Vertex coloring is the starting point of the subject, and other coloring problems can be transformed into a vertex version. For example, an edge coloring of a graph is just a vertex coloring of its line graph, and a face coloring of a planar graph is just a vertex coloring of its planar dual. However, non-vertex coloring problems are often stated and studied as is. That is partly for perspective, and partly because some problems are best studied in non-vertex form, as for instance is edge coloring.
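The reduction from edge coloring to vertex coloring of the line graph can be sketched as follows. This Python example is an illustration under stated assumptions: `line_graph` and `greedy_vertex_coloring` are hypothetical names, the input is a simple graph with each edge listed once, and the greedy step yields a proper coloring but not necessarily one with the fewest colors.

```python
from itertools import combinations

def line_graph(edges):
    """Vertices of the line graph are the edges of the original graph;
    two of them are adjacent when the original edges share an endpoint."""
    adjacency = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            adjacency[e].add(f)
            adjacency[f].add(e)
    return adjacency

def greedy_vertex_coloring(adjacency):
    """Give each vertex the smallest color (0, 1, 2, ...) unused by its colored neighbours."""
    color = {}
    for v in adjacency:
        used = {color[u] for u in adjacency[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# Edges of the path a-b-c-d plus the chord b-d.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]

# A vertex coloring of the line graph is an edge coloring of the original graph.
edge_color = greedy_vertex_coloring(line_graph(edges))
for edge, c in edge_color.items():
    print(edge, "-> color", c)
```

Edges that meet at a common vertex, such as the three edges at b, always receive different colors.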

The convention of using colors originates from coloring the countries of a map, where each face is literally colored. This was generalized to coloring the faces of a graph embedded in the plane. By planar duality it became coloring the vertices, and in this form it generalizes to all graphs. In mathematical and computer representations it is typical to use the first few positive or nonnegative integers as the "colors". In general one can use any finite set as the "color set". The nature of the coloring problem depends on the number of colors but not on what they are.


Graph coloring enjoys many practical applications as well as theoretical challenges. Besides the classical types of problems, different limitations can also be set on the graph, or on the way a color is assigned, or even on the color itself. It has even reached popularity with the general public in the form of the popular number puzzle Sudoku. Graph coloring is still a very active field of research.

A proper vertex coloring of the Petersen graph with 3 colors, the minimum number possible.

Vertex coloring

When used without any qualification, a coloring of a graph is almost always a proper vertex coloring, namely a labelling of the graph’s vertices with colors such that no two vertices sharing the same edge have the same color. Since a vertex with a loop could never be properly colored, it is understood that graphs in this context are loopless.

The terminology of using colors for vertex labels goes back to map coloring. Labels like red and blue are only used when the number of colors is small, and normally it is understood that the labels are drawn from the integers {1,2,3,...}. A coloring using at most k colors is called a (proper) k-coloring. The smallest number of colors needed to color a graph G is called its chromatic number, χ(G). A graph that can be assigned a (proper) k-coloring is k-colorable, and it is k-chromatic if its chromatic number is exactly k. A subset of vertices assigned to the same color is called a color class; every such class forms an independent set. Thus, a k-coloring is the same as a partition of the vertex set into k independent sets, and the terms k-partite and k-colorable have the same meaning.
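The definitions of k-colorable and chromatic number can be made concrete with a small exhaustive search. The Python sketch below is illustrative only and runs in exponential time; `is_k_colorable`, `chromatic_number` and `from_edges` are hypothetical names, and the vertex numbering of the Petersen graph (outer 5-cycle, inner pentagram, spokes) is an assumption, since the notes give no labelling.

```python
def is_k_colorable(adjacency, k):
    """Backtracking test: can the vertices be colored with colors 0..k-1
    so that adjacent vertices differ?"""
    vertices = list(adjacency)
    color = {}

    def assign(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            # c is allowed if no already-colored neighbour uses it
            if all(color.get(u) != c for u in adjacency[v]):
                color[v] = c
                if assign(i + 1):
                    return True
                del color[v]
        return False

    return assign(0)

def chromatic_number(adjacency):
    """Smallest k for which a proper k-coloring exists (loopless graphs only)."""
    k = 0
    while not is_k_colorable(adjacency, k):
        k += 1
    return k

def from_edges(n, edges):
    adjacency = {v: set() for v in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency

# Petersen graph: outer 5-cycle 0..4, inner pentagram 5..9, spokes i -- i+5.
petersen = from_edges(10,
    [(i, (i + 1) % 5) for i in range(5)] +
    [(5 + i, 5 + (i + 2) % 5) for i in range(5)] +
    [(i, i + 5) for i in range(5)])
triangle = from_edges(3, [(0, 1), (1, 2), (2, 0)])

print(chromatic_number(triangle))  # 3
print(chromatic_number(petersen))  # 3
```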

This graph can be 3-colored in 12 different ways.

The following table gives the chromatic number for familiar classes of graphs.

Graph                                             Chromatic number
complete graph K_n                                n
cycle graph C_n, n >= 3                           2 if n is even, 3 if n is odd
star graph with at least one edge                 2
wheel graph (hub plus rim cycle C_m, m >= 3)      3 if m is even, 4 if m is odd

2. Support materials

In order for students to complete their assignments, fare well in exams and fulfill the learning objectives, BVRIT supplies them with additional material.

2.1 Computer files, programs, and documents

This section includes online documentation and the electronic version of the handouts, teaching material and course activity support documents.


2.2 Unit Wise Question Bank

2.2.1 Subjective Question Bank

UNIT-I

1) Write the following statements in symbolic form:

a) Mark is poor but happy
b) Mark is rich or unhappy
c) Mark is neither rich nor happy
d) Mark is poor or he is both rich and unhappy

2) Construct the truth tables for the following formulas

a) (Q (P

b) (P Q) R)

3) Determine which of the formulas are tautologies or contradictions or contingencies


a) ((P Q) P )

b) ((P Q) (Q P))

c) ((Q P) Q)

4) Show the following implications

a) (P Q) (P Q)

b) (P (Q R)) (P Q) (P R) without constructing a truth table

5) Show the following equivalence

a) (P Q) (P Q) (P Q) without constructing a truth table

b) (P Q) (R Q) (P R) Q

6) Obtain the principal disjunctive normal form of P (P (Q (Q R)))

7) Obtain the principal conjunctive normal form of P (P (Q P) )

8) Express P(PQ) in terms of only. Express the same formula in terms of only.

9) Obtain product- of- sums canonical form for (P Q) (P Q).

10) Obtain sum-of-products canonical form for (P Q) (P Q)

1) Indicate the variables that are free and bound. Also show the scope of the quantifier.

a) (P(x) R(x)) P(x) Q(x)

b) (P(x) Q(x)) ( P(x) Q(x) )

2) Convert the following predicates into symbolic form:

a) x is the father of the mother of y
b) All the world loves a lover
c) All men are giants
d) Some cats are black

3) Show that P(x) Q(x) (P(x) Q(x)).

4) Show that Q(x) is a valid conclusion from the premises (P(x) Q(x)) , P(x).

5) Using Rule CP show the following implication

(P(x) Q(x) ), ( R(x) Q(x) ) ( R(x) P(x))

6) Show the following implication using indirect method of proof

(P(x) Q(x)) P(x) Q(x)

7) Test the validity of the following arguments

No real number has a negative square.

All real numbers are complex numbers.

Some complex numbers have negative squares.

Z is a number whose square is not negative.

Therefore, Z is a real number.

8) Show that S R is tautologically implied by (P Q) (P R) (Q S) using automatic theorem proving.

UNIT-II

1) Given S = {1,2,3,4,5,6,7,8,9,10} and a relation R on S where R = {(x, y) | x + y = 10}, what are the properties of the relation R?

2) Let X = {1,2,3,4} and R = {(x, y) | x > y}. Draw the graph of R and also give its matrix.

3) Let R denote a relation on the set of ordered pairs of positive integers such that (x, y) R (u, v) iff xv = yu. Show that R is an equivalence relation.

4) Given a set S = {1,2,3,4,5}, find the equivalence relation on S which generates the partition {{1,2},{3},{4,5}}. Draw the graph of the relation.

5) Let the compatibility relation on a set {x1, x2, x3, ..., x6} be given by the matrix

x2   1
x3   1   1
x4   0   0   1
x5   0   0   1   1
x6   1   0   1   0   1
     x1  x2  x3  x4  x5

Draw the graph and find the maximal compatibility blocks of the relation.

6) Given the relation matrix M_R of a relation R on the set {a, b, c}, find the relation matrices of R', R^2, R^3 and R o R'.

        1 0 1
M_R =   1 1 0
        1 1 1

7) Draw the Hasse diagrams of the following sets under the partial ordering relation "divides".

a) {2,6,24} b) {1,2,3,6,12} c) {3,9,27,54}

8) List all possible functions from X = {a, b, c} to Y = {0,1} and indicate in each case whether the function is one-to-one, is onto, and is one-to-one onto.

9) Let f: R → R and g: R → R, where R is the set of real numbers. Find f o g and g o f, where f(x) = x^2 - 2 and g(x) = x + 4. State whether these functions are injective, surjective, and bijective.

10) Let f: R → R be given by f(x) = x^2 - 2. Find f^-1.


UNIT-IV

1) In how many ways can we draw a heart or a spade from an ordinary deck of playing cards? A heart or an ace? An ace or a king? A card numbered 2 through 10? A numbered card or a king?

2) How many 3-letter words can be formed using the letters a, b, c, d, e, f and using a letter only once if:

a) The letter a is to be used?
b) Either a or b or both a and b are used?
c) The letter a is not used?

3) How many ways are there to roll two distinguishable dice to yield a sum that is divisible by 3?

4) In how many ways can 10 people arrange themselves

a) In a row of 10 chairs?
b) In a circle of 10 chairs?

5) a) How many binary sequences are there of length 15?
b) How many binary sequences are there of length 15 with exactly six 1's?

6) A multiple-choice test has 15 questions and 4 choices for each answer. In how many ways can the 15 questions be answered so that

a) Exactly 3 answers are correct?
b) At least 3 answers are correct?

7) Find the number of ways in which 5 different English books, 6 French books, 3 German books and 7 Russian books can be arranged on a shelf so that all books of the same language are together.

8) How many solutions are there to the equation x1 + x2 + x3 + x4 + x5 = 50 in nonnegative integers?

9) In how many ways can we distribute 12 white balls and 2 black balls

a) Into 9 numbered boxes?
b) Into 9 numbered boxes where each box contains at least one white ball?

10) Consider the word TRIANNUAL.

a) How many arrangements are there of these 9 letters?
b) How many 9-letter words are there with the letters T, I, and U separated by exactly 2 of the other letters?
c) How many 6-letter words can be formed from the letters of TRIANNUAL with no N's?


1) Find the coefficient of x^16 in (1 + x^4 + x^8)^10.

2) In (1 + x^4 + x^9)^10, find the coefficients of x^23 and x^32.

3) Write the formal power series expression for 1/(1 - 5x)^3.

4) Find the coefficient of x^12 in (1 - x^4 - x^7 + x^11)/(1 - x)^5.

5) Solve the recurrence relation a_n - 7a_{n-1} + 10a_{n-2} = 0 for n >= 2 using the generating functions method.

6) Solve the following recurrence relation by substitution: a_n = a_{n-1} + n(n-1), where a_0 = 1.

7) Solve the following recurrence relation using generating functions: a_n - 2a_{n-3} + a_{n-6} = 0 for n >= 6, with a_0 = 1, a_1 = 0, a_2 = a_3 = a_4 = a_5 = 0.

8) Solve the following recurrence relation using the characteristic roots: a_n - 7a_{n-1} + 8a_{n-2} = 0, with a_0 = 2, a_1 = -7.

9) Solve the following recurrence relation using generating functions: a_n - 5a_{n-1} + 6a_{n-2} = 4n-2 for n >= 2, with a_0 = 1, a_1 = 5.

10) Find the complete solution to a_n + 2a_{n-1} = n + 3 for n >= 1, with a_0 = 3.




UNIT-V

1) Suppose that G is a nondirected graph with 12 edges. Suppose that G has 6 vertices of degree 3 and the rest have degrees less than 3. Determine the minimum number of vertices G can have.

2) Is there a graph with the degree sequence (1,1,3,3,3,4,6,7)? If yes, find whether the graph is a simple graph or a multigraph.

3) Prove that if V = {v1, v2, ..., vn} is the vertex set of a nondirected graph, then the sum of deg(vi) for i = 1 to n equals 2|E|. If G is a directed graph, then the sum of deg+(vi) equals the sum of deg-(vi), and both equal |E|.

4) Verify whether the following two graphs are isomorphic or not.

5) Determine the number of edges in the following graphs:

a) K_n b) K_{m,n} c) C_n d) P_n

Also determine the number of vertices in K_{m,n}.

6) Define spanning tree and circuit rank. Draw all possible spanning trees for the following graph.

7) Write the algorithm for breadth-first search. Construct a spanning tree for the following figure using BFS.




8) Construct spanning tree for the following graph using DFS algorithm.

9) Find a minimal spanning tree for the following graph using Kruskal's algorithm.

10) Find a minimal spanning tree for the following graph using Prim's algorithm.




1) Show that K_5 is nonplanar.

2) Draw planar graphs for n = 2, 3, 4, 5.

3) Draw the dual graph for each of the following graphs.

4) a) A complete graph K_n is planar iff n <= 4.

b) A complete bipartite graph K_{m,n} is planar iff m <= 2 or n <= 2.

5) Which of the multigraphs have Euler paths, circuits, or neither?

6) Find a Hamiltonian cycle in each of the following graphs.


7) How many different Hamiltonian cycles are there in K_n, a complete graph on n vertices?

8) Define chromatic number. Find the chromatic number of a) a cycle b) a tree c) a bipartite graph.

9) Determine the chromatic number for the following graphs.

