1/11/2017

Randomized Algorithms

Prof. Tapio Elomaa

[email protected]

Course Basics

• A 4 credit unit course
• Part of Theoretical Computer Science courses at the Laboratory of Mathematics
• There will be 4 hours of lectures per week
• Weekly exercises start next week
• We will assume familiarity with
  – Necessary mathematics
  – Basic programming

11-Jan-17, MAT-72306 RandAl, Spring 2017


Organization & Timetable

• Lectures: Prof. Tapio Elomaa
  – Mon 12–14 in SE100J & Wed 14–16 in SE203
  – Jan. 9 – Feb. 22, 2017
  – Exceptions: ?
• Exercises: Ph.D. Juho Lauri
  – Thu 12–14 SE100J
• Exam: Fri Mar. 3, 2017 @ 9–12 (next Apr. 20)


Course Grading


• Exam: Maximum of 30 points
• Weekly exercises yield extra points
  • 40% of questions answered: 1 point
  • 80% answered: 6 points
  • In between: linear scale (so that decimals are possible)


Material

• The textbook of the course is
  – Michael Mitzenmacher & Eli Upfal: Probability and Computing, Cambridge University Press, 2005
• There is no prepared material; the slides appear on the web as the lectures proceed
  – http://www.cs.tut.fi/~elomaa/teach/72306.html
• The exam is based on the lectures (i.e., not on the slides only)


Content (Plan)

1. Events and Probability
2. Discrete Random Variables and Expectation
3. Moments and Deviations
4. Chernoff Bounds
5. Balls, Bins, and Random Graphs
6. The Probabilistic Method
7. Markov Chains and Random Walks
8. Continuous Distributions and the Poisson Process
9. Entropy, Randomness, and Information
10. The Monte Carlo Method (if time)



1. Events and Probability

Verifying Polynomial Identities
Axioms of Probability
Verifying Matrix Multiplication
A Randomized Min-Cut Algorithm

1.1. Verifying Polynomial Identities

• Suppose we have a program that multiplies together monomials
• Consider the problem of verifying the following identity, which might be output by our program:

$$(x+1)(x-2)(x+3)(x-4)(x+5)(x-6) \overset{?}{\equiv} x^6 - 7x^3 + 25$$

• To verify the identity: multiply together the terms on the LHS and see if the resulting polynomial matches the RHS



• When we multiply all the constant terms on the left, the result does not match the constant term on the right, so the identity cannot be valid
• Given two polynomials $F(x)$ and $G(x)$, we can verify the identity $F(x) \overset{?}{\equiv} G(x)$ by converting the two polynomials to their canonical forms
• Two polynomials are equivalent iff all the coefficients in their canonical forms are equal


• Let us assume that $F(x)$ is given as a product $F(x) = \prod_{i=1}^{d} (x - a_i)$ and $G(x)$ is given in its canonical form
• Transforming $F(x)$ to its canonical form by consecutively multiplying the $i$th monomial with the product of the first $i - 1$ monomials requires $\Theta(d^2)$ multiplications of coefficients
• We assume that each multiplication can be performed in constant time
• If the products of the coefficients grow large then it could require more than constant time to add and multiply numbers together
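The deterministic conversion described above can be sketched as follows. This is only an illustration: the function name `expand_product` and the coefficient-list representation (`coeffs[i]` is the coefficient of x^i) are assumptions made here, not part of the course material. The inner loop touches every coefficient once per monomial, which is where the quadratic number of coefficient multiplications comes from.

```python
def expand_product(roots):
    """Return [c_0, ..., c_d], the canonical coefficients of prod_i (x - a_i)."""
    coeffs = [1]  # start from the constant polynomial 1
    for a in roots:
        # Multiply the current polynomial by the monomial (x - a).
        nxt = [0] * (len(coeffs) + 1)
        for i, c in enumerate(coeffs):
            nxt[i + 1] += c      # contribution of c * x^i * x
            nxt[i] -= a * c      # contribution of c * x^i * (-a)
        coeffs = nxt
    return coeffs


# Example product (x+1)(x-2)(x+3)(x-4)(x+5)(x-6): the a_i are -1, 2, -3, 4, -5, 6.
lhs = expand_product([-1, 2, -3, 4, -5, 6])
rhs = [25, 0, 0, -7, 0, 0, 1]    # x^6 - 7x^3 + 25
print(lhs[0], rhs[0])            # constant terms: -720 and 25
print(lhs == rhs)                # False
```

The constant terms already differ, which matches the observation that this example identity cannot be valid.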



• Let us utilize randomness
• Assume that the maximum degree, or the largest exponent of $x$, in $F(x)$ and $G(x)$ is $d$
• The algorithm chooses an integer $r$ uniformly at random in the range $\{1, \ldots, 100d\}$
• All integers are equally likely to be chosen
• Compute the values $F(r)$ and $G(r)$
• If $F(r) \neq G(r)$ the algorithm decides that the two polynomials are not equivalent, and
• if $F(r) = G(r)$ the algorithm decides that the two polynomials are equivalent
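A minimal sketch of one run of the randomized test, assuming F is given by the list of its a_i values and G by its coefficient list (both representations, and all function names, are choices made for this illustration). It also assumes d ≥ 1.

```python
import random


def eval_product(roots, x):
    """Evaluate F(x) = prod_i (x - a_i) with one multiplication per monomial."""
    value = 1
    for a in roots:
        value *= x - a
    return value


def eval_canonical(coeffs, x):
    """Evaluate G(x) = sum_i c_i x^i by Horner's rule (coeffs[i] = c_i)."""
    value = 0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def probably_equivalent(roots, coeffs, rng=random):
    """One run of the test: True means 'report equivalent'."""
    d = max(len(roots), len(coeffs) - 1)       # maximum degree, assumed >= 1
    r = rng.randint(1, 100 * d)                # uniform over {1, ..., 100d}
    return eval_product(roots, r) == eval_canonical(coeffs, r)
```

For example, `probably_equivalent([1, 2], [2, -3, 1])` compares (x-1)(x-2) with x^2 - 3x + 2 and returns `True` for every choice of r, since the two polynomials are equivalent.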


• Suppose that the algorithm can generate an integer chosen uniformly at random in the range $\{1, \ldots, 100d\}$ in one computation step
• Computing the values of $F(r)$ and $G(r)$ can be done in $O(d)$ time, which is faster than computing the canonical form of $F(x)$
• The randomized algorithm, however, may give a wrong answer
  – If $F(x) \equiv G(x)$, the algorithm gives the correct answer, since $F(r) = G(r)$ for any value of $r$
  – If $F(x) \not\equiv G(x)$ and $F(r) \neq G(r)$, the algorithm gives the correct answer since it has found a case where $F(x)$ and $G(x)$ disagree



• Thus, when the algorithm decides that the two polynomials are not the same, the answer is always correct
• If $F(x) \not\equiv G(x)$ and $F(r) = G(r)$, the algorithm gives the wrong answer
• It is possible that the algorithm erroneously decides that the two polynomials are the same
• For this error to occur, $r$ must be a root of the equation $F(x) - G(x) = 0$
• The degree of the polynomial $F(x) - G(x)$ is no larger than $d$


• By the fundamental theorem of algebra, a polynomial of degree up to $d$ has no more than $d$ roots
• Thus, if $F(x) \not\equiv G(x)$, then there are no more than $d$ values in the range $\{1, \ldots, 100d\}$ for which $F(r) = G(r)$
• Since there are $100d$ values in the range $\{1, \ldots, 100d\}$, the chance that the algorithm chooses such a value and returns a wrong answer is no more than $1/100$
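The counting argument above can be checked exhaustively on a small case. The pair of polynomials below is a hypothetical example chosen for this sketch (they differ by (x-3)(x-5), so d = 2 and exactly two values of r fool the test); the helper name `bad_values` is likewise an assumption.

```python
from fractions import Fraction


def bad_values(f, g, d):
    """All r in {1, ..., 100d} on which f and g agree."""
    return [r for r in range(1, 100 * d + 1) if f(r) == g(r)]


d = 2
f = lambda x: (x - 1) * (x - 2)          # x^2 - 3x + 2
g = lambda x: 2 * x * x - 11 * x + 17    # f(x) + (x - 3)(x - 5)

bad = bad_values(f, g, d)
error_probability = Fraction(len(bad), 100 * d)
print(bad, error_probability)            # [3, 5] 1/100
```

Here the error probability meets the 1/100 bound exactly; a pair whose difference has fewer integer roots in the range would do strictly better.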



1.2. Axioms of Probability

Definition 1.1: A probability space has three components:
1. a sample space $\Omega$, which is the set of all possible outcomes of the random process modeled by the probability space;
2. a family of sets $\mathcal{F}$ representing the allowable events, where each set in $\mathcal{F}$ is a subset of the sample space $\Omega$; and
3. a probability function $\Pr\colon \mathcal{F} \to \mathbb{R}$ satisfying Definition 1.2


• An element of $\Omega$ is called a simple or elementary event
• In the randomized algorithm for verifying polynomial identities, the sample space is the set of integers $\{1, \ldots, 100d\}$
• Each choice of an integer $r$ in this range is a simple event



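Definition 1.2: A probability function is any function $\Pr\colon \mathcal{F} \to \mathbb{R}$ that satisfies the conditions:
1. for any event $E$, $0 \le \Pr(E) \le 1$;
2. $\Pr(\Omega) = 1$; and
3. for any finite or countably infinite sequence of pairwise mutually disjoint events $E_1, E_2, E_3, \ldots$,

$$\Pr\Bigl(\bigcup_{i \ge 1} E_i\Bigr) = \sum_{i \ge 1} \Pr(E_i)$$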


• We mostly use discrete probability spaces (DPS)
• In a DPS the sample space $\Omega$ is finite or countably infinite, and the family $\mathcal{F}$ of allowable events consists of all subsets of $\Omega$
• In a DPS, the probability function is uniquely defined by the probabilities of the simple events
• Events are sets, so we use standard set theory notation to express combinations of events
• Write $E_1 \cap E_2$ for the occurrence of both $E_1$ and $E_2$, and $E_1 \cup E_2$ for the occurrence of either $E_1$ or $E_2$ (or both)



• Suppose we roll two dice
• If $E_1$ is the event that the first die is a 1 and $E_2$ is the event that the second die is a 1
  – Then $E_1 \cap E_2$ denotes the event that both dice are 1
  – $E_1 \cup E_2$ denotes the event that at least one of the two dice lands on 1
• Similarly, we write $E_1 - E_2$ for the occurrence of an event that is in $E_1$ but not in $E_2$
  – With the same dice example, $E_1 - E_2$ consists of the event where the first die is a 1 and the second die is not
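Because events in a discrete probability space are just sets, the dice example maps directly onto Python sets. The sketch below assumes two fair six-sided dice (fairness is not stated on the slide) and uses a hypothetical helper `pr` that counts outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes (first die, second die).
omega = set(product(range(1, 7), repeat=2))


def pr(event):
    """Probability of an event (a subset of omega) when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))


e1 = {w for w in omega if w[0] == 1}   # first die is a 1
e2 = {w for w in omega if w[1] == 1}   # second die is a 1

print(pr(e1 & e2))   # both dice are 1: 1/36
print(pr(e1 | e2))   # at least one die is 1: 11/36
print(pr(e1 - e2))   # first die is 1, second is not: 5/36
```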


• We use the notation $\bar{E}$ as shorthand for $\Omega - E$
• E.g., if $E$ is the event that we obtain an even number when rolling a die, then $\bar{E}$ is the event that we obtain an odd number
• Definition 1.2 yields the following lemma

Lemma 1.1: For any two events $E_1$ and $E_2$,

$$\Pr(E_1 \cup E_2) = \Pr(E_1) + \Pr(E_2) - \Pr(E_1 \cap E_2)$$

• A consequence of Definition 1.2 is known as the union bound



Lemma 1.2: For any finite or countably infinite sequence of events $E_1, E_2, \ldots$,

$$\Pr\Bigl(\bigcup_{i \ge 1} E_i\Bigr) \le \sum_{i \ge 1} \Pr(E_i)$$

• The third part of Definition 1.2 is an equality and requires the events to be pairwise mutually disjoint


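• Lemma 1.1 can be generalized to the following equality, known as the inclusion-exclusion principle

Lemma 1.3: Let $E_1, \ldots, E_n$ be any $n$ events. Then

$$\Pr\Bigl(\bigcup_{i=1}^{n} E_i\Bigr) = \sum_{i=1}^{n} \Pr(E_i) - \sum_{i<j} \Pr(E_i \cap E_j) + \sum_{i<j<k} \Pr(E_i \cap E_j \cap E_k) - \cdots + (-1)^{\ell+1} \sum_{i_1 < i_2 < \cdots < i_\ell} \Pr\Bigl(\bigcap_{r=1}^{\ell} E_{i_r}\Bigr) + \cdots$$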
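As a sanity check, the union bound and the inclusion-exclusion principle can be evaluated by brute force on a small sample space. The sketch assumes three fair dice and the events "die i shows a 1"; the helper names `pr` and `inclusion_exclusion` are made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), repeat=3))   # three fair dice


def pr(event):
    return Fraction(len(event), len(omega))


def inclusion_exclusion(events):
    """Alternating sum over all non-empty groups of events."""
    total = Fraction(0)
    for size in range(1, len(events) + 1):
        sign = (-1) ** (size + 1)
        for group in combinations(events, size):
            total += sign * pr(set.intersection(*group))
    return total


events = [{w for w in omega if w[i] == 1} for i in range(3)]
union = set.union(*events)

print(pr(union))                      # 91/216
print(inclusion_exclusion(events))    # 91/216, the same value
print(sum(pr(e) for e in events))     # 1/2, the (larger) union bound
```

The union bound is an inequality here because the three events are not disjoint.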




• The only case in which the algorithm may fail to give the correct answer is when the two polynomials $F(x)$ and $G(x)$ are not equivalent
  – The algorithm then gives an incorrect answer if the random number it chooses is a root of the polynomial $F(x) - G(x)$
• Let $E$ represent the event that the algorithm failed to give the correct answer



• The elements of the set corresponding to $E$ are the roots of the polynomial $F(x) - G(x)$ that are in the set of integers $\{1, \ldots, 100d\}$
• Since the polynomial has no more than $d$ roots it follows that the event $E$ includes no more than $d$ simple events, and therefore

$$\Pr(\text{algorithm fails}) = \Pr(E) \le \frac{d}{100d} = \frac{1}{100}$$


• The algorithm gives the correct answer at least 99% of the time even when the polynomials are not equivalent
• One way to improve this probability is to choose the random number from a larger range of integers
• If our sample space is the set of integers $\{1, \ldots, 1000d\}$, then the probability of a wrong answer is at most $1/1000$
• At some point, however, the range of values we can use is limited by the precision available on the machine on which we run the algorithm



• Another approach is to repeat the algorithm multiple times, using different random values to test the identity
• The algorithm has a one-sided error
  – It may be wrong only when it outputs that the two polynomials are equivalent
• If any run yields a number $r$ s.t. $F(r) \neq G(r)$, then the polynomials are not equivalent
• Repeat the algorithm a number of times and if we find $F(r) \neq G(r)$ in any round, we know that $F(x)$ and $G(x)$ are not equivalent
• Output that the two polynomials are equivalent only if there is equality for all runs


• In repeating the algorithm we repeatedly choose a random number in the range $\{1, \ldots, 100d\}$
• Repeatedly choosing random numbers by a given distribution is known as sampling
• We can repeatedly choose random numbers either with or without replacement
  – In sampling with replacement we do not remember which numbers we have already tested
  – Sampling without replacement means that, once we have chosen a number $r$, we do not allow it to be chosen on subsequent runs
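The repeated test with either sampling scheme might look like the sketch below. The polynomials are passed as plain Python callables, and the function name, the `with_replacement` flag, and the example pair f, g are assumptions of this illustration rather than anything fixed by the lectures.

```python
import random


def repeated_test(f, g, d, k, with_replacement=True, rng=random):
    """Run the identity test k times; True means 'report equivalent'."""
    n = 100 * d
    if with_replacement:
        samples = [rng.randint(1, n) for _ in range(k)]   # repeats allowed
    else:
        samples = rng.sample(range(1, n + 1), k)          # k distinct values
    for r in samples:
        if f(r) != g(r):
            return False   # witness found: definitely not equivalent
    return True            # equality in every run


# Hypothetical example with d = 2; f and g agree only at r = 3 and r = 5.
f = lambda x: (x - 1) * (x - 2)
g = lambda x: 2 * x * x - 11 * x + 17
print(repeated_test(f, f, 2, 10))                          # True: f is equivalent to itself
print(repeated_test(f, g, 2, 3, with_replacement=False))   # False: 3 distinct values cannot all be roots
```

A `False` answer is always correct because of the one-sided error; only a `True` answer can be mistaken.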



• Consider sampling with replacement
• Repeat the algorithm $k$ times, and assume that the input polynomials are not equivalent
• What is the probability that in all $k$ iterations our random sampling yields roots of the polynomial $F(x) - G(x)$, resulting in a wrong output?
  – If $k = 1$: this probability is at most $d/100d = 1/100$
  – If $k = 2$, the probability that the 1st iteration finds a root is at most $1/100$ and the probability that the 2nd iteration finds a root is at most $1/100$, so the probability that both iterations find a root is at most $(1/100)^2$
• Generalizing, the probability of choosing roots for $k$ iterations would be at most $(1/100)^k$


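Definition 1.3: Two events $E$ and $F$ are independent if and only if

$$\Pr(E \cap F) = \Pr(E) \cdot \Pr(F).$$

More generally, events $E_1, E_2, \ldots, E_k$ are mutually independent if and only if, for any subset $I \subseteq [1, k]$,

$$\Pr\Bigl(\bigcap_{i \in I} E_i\Bigr) = \prod_{i \in I} \Pr(E_i).$$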
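Definition 1.3 can be tested mechanically on a finite sample space. The sketch below assumes two fair dice; the three parity events are a standard illustration (not taken from the slides) of events that are pairwise independent without being mutually independent.

```python
from fractions import Fraction
from itertools import combinations, product

omega = set(product(range(1, 7), repeat=2))   # two fair dice


def pr(event):
    return Fraction(len(event), len(omega))


def mutually_independent(events):
    """Check the product rule for every subset of at least two events."""
    for size in range(2, len(events) + 1):
        for group in combinations(events, size):
            expected = Fraction(1)
            for e in group:
                expected *= pr(e)
            if pr(set.intersection(*group)) != expected:
                return False
    return True


first_even = {w for w in omega if w[0] % 2 == 0}
second_even = {w for w in omega if w[1] % 2 == 0}
sum_even = {w for w in omega if (w[0] + w[1]) % 2 == 0}

print(mutually_independent([first_even, second_even]))            # True
print(mutually_independent([first_even, sum_even]))               # True
print(mutually_independent([first_even, second_even, sum_even]))  # False
```

The last call fails because knowing that both dice are even already forces the sum to be even.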



• We choose a random number uniformly at random from the set $\{1, \ldots, 100d\}$, and thus the choice in one iteration is independent of those in previous iterations
• Let $E_i$ be the event that, on the $i$th run, we choose a root $r_i$ s.t. $F(r_i) - G(r_i) = 0$
• The probability that the algorithm returns the wrong answer is given by

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k)$$


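• Since $\Pr(E_i) \le d/100d$ and the events $E_1, E_2, \ldots, E_k$ are independent, the probability that the algorithm gives the wrong answer after $k$ iterations is

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \prod_{i=1}^{k} \Pr(E_i) \le \prod_{i=1}^{k} \frac{d}{100d} = \Bigl(\frac{1}{100}\Bigr)^{k}$$

• The probability of making an error is therefore at most exponentially small in the number of trials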



• Consider now the case where sampling is done without replacement
• In this case the probability of choosing a given number is conditioned on the events of the previous iterations

Definition 1.4: The conditional probability that event $E$ occurs given that event $F$ occurs is

$$\Pr(E \mid F) = \frac{\Pr(E \cap F)}{\Pr(F)}$$

The conditional probability is well-defined only if $\Pr(F) > 0$



• We are looking for the probability of $E \cap F$ within the set of events defined by $F$
• Because $F$ defines our restricted sample space, we normalize the probabilities by dividing by $\Pr(F)$, so that the sum of the probabilities of all events is 1


• When $\Pr(F) > 0$, the definition can also be written in the useful form

$$\Pr(E \mid F) \Pr(F) = \Pr(E \cap F)$$

• Notice that, when $E$ and $F$ are independent and $\Pr(F) \neq 0$, we have

$$\Pr(E \mid F) = \frac{\Pr(E \cap F)}{\Pr(F)} = \frac{\Pr(E) \Pr(F)}{\Pr(F)} = \Pr(E)$$

• Intuitively, if two events are independent, then information about one event should not affect the probability of the second event
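Conditional probability (Definition 1.4) and its interaction with independence can be checked on the two-dice sample space. The events and the helper `pr_given` below are illustrative assumptions, with fair dice assumed throughout.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice


def pr(event):
    return Fraction(len(event), len(omega))


def pr_given(e, f):
    """Pr(E | F) = Pr(E and F) / Pr(F); only defined when Pr(F) > 0."""
    if not f:
        raise ValueError("conditioning event has probability 0")
    return pr(e & f) / pr(f)


has_one = {w for w in omega if 1 in w}         # at least one die shows 1
sum_is_4 = {w for w in omega if sum(w) == 4}   # (1,3), (2,2), (3,1)
e1 = {w for w in omega if w[0] == 1}           # first die is a 1
e2 = {w for w in omega if w[1] == 1}           # second die is a 1

print(pr(has_one), pr_given(has_one, sum_is_4))   # 11/36 versus 2/3
print(pr(e1), pr_given(e1, e2))                   # 1/6 and 1/6: independent events
```

Conditioning on the sum changes the probability of seeing a 1, whereas conditioning on the other die leaves the probability for the first die unchanged.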


• What is the probability that in all the $k$ iterations our random sampling without replacement yields roots of the polynomial $F(x) - G(x)$, resulting in a wrong output?
• As in the previous analysis, let $E_i$ be the event that the random number $r_i$ chosen in the $i$th iteration of the algorithm is a root of $F(x) - G(x)$
• Again, the probability that the algorithm returns the wrong answer is given by

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k)$$



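• Applying the definition of conditional probability, we obtain

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \Pr(E_k \mid E_1 \cap E_2 \cap \cdots \cap E_{k-1}) \cdot \Pr(E_1 \cap E_2 \cap \cdots \cap E_{k-1})$$

• Repeating this argument gives

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) = \Pr(E_1) \cdot \Pr(E_2 \mid E_1) \cdot \Pr(E_3 \mid E_1 \cap E_2) \cdots \Pr(E_k \mid E_1 \cap E_2 \cap \cdots \cap E_{k-1})$$

• Recall that there are at most $d$ values $r$ for which $F(r) - G(r) = 0$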


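• If trials 1 through $j - 1 < d$ have found $j - 1$ of them, then when sampling without replacement only $d - (j - 1)$ of the $100d - (j - 1)$ remaining choices are roots
• Hence

$$\Pr(E_j \mid E_1 \cap E_2 \cap \cdots \cap E_{j-1}) \le \frac{d - (j - 1)}{100d - (j - 1)}$$

• and the probability of the wrong answer after $k \le d$ iterations is bounded by

$$\Pr(E_1 \cap E_2 \cap \cdots \cap E_k) \le \prod_{j=1}^{k} \frac{d - (j - 1)}{100d - (j - 1)} \le \Bigl(\frac{1}{100}\Bigr)^{k}$$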
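The two error bounds can be compared with exact rational arithmetic. This is only a numeric illustration of the formulas above; the function names are made up here, and d and k are free parameters.

```python
from fractions import Fraction


def bound_with_replacement(d, k):
    """(d / 100d)^k, the bound for k independent samples."""
    return Fraction(d, 100 * d) ** k


def bound_without_replacement(d, k):
    """Product over j = 1..k of (d - (j-1)) / (100d - (j-1))."""
    bound = Fraction(1)
    for j in range(1, k + 1):
        bound *= Fraction(d - (j - 1), 100 * d - (j - 1))
    return bound


print(bound_with_replacement(2, 2))      # 1/10000
print(bound_without_replacement(2, 2))   # 1/19900, slightly smaller
print(bound_without_replacement(2, 3))   # 0: with d + 1 distinct samples no error is possible
```

The factor for j = d + 1 is zero, which is the product-form view of the fact that d + 1 distinct samples cannot all be roots.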



• $(d - (j - 1))/(100d - (j - 1)) < d/100d$ when $j > 1$, and our bounds on the probability of an error are actually slightly better without replacement
• Also, if we take $d + 1$ samples w/o replacement and two polynomials are not equivalent, then we are guaranteed to find an $r$ s.t. $F(r) - G(r) \neq 0$
• Thus, in $d + 1$ iterations we are guaranteed to output the correct answer
• However, computing the value of the polynomial at $d + 1$ points takes $\Theta(d^2)$ time using the standard approach, which is no faster than finding the canonical form deterministically


1.3. Verifying Matrix Multiplication

• We are given three $n \times n$ matrices $A$, $B$, and $C$
• For convenience, assume we are working over the integers modulo 2
• We want to verify whether

$$AB = C$$

• One way to accomplish this is to multiply $A$ and $B$ and compare the result to $C$
• The simple matrix multiplication algorithm takes $\Theta(n^3)$ operations



• A randomized algorithm allows for faster verification – at the expense of possibly returning a wrong answer with small probability
• It is similar in spirit to our randomized algorithm for checking polynomial identities
• The algorithm chooses a random vector $\bar{r} = (r_1, r_2, \ldots, r_n) \in \{0,1\}^n$
• It then computes $AB\bar{r}$ by first computing $B\bar{r}$ and then $A(B\bar{r})$, and it also computes $C\bar{r}$
• If $A(B\bar{r}) \neq C\bar{r}$, then $AB \neq C$
• Otherwise, it returns that $AB = C$
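One trial of the verification algorithm can be sketched in a few lines. Matrices are represented as lists of rows with entries 0 or 1, all arithmetic is modulo 2, and the function names are assumptions made for this illustration.

```python
import random


def mat_vec_mod2(m, v):
    """Matrix-vector product over the integers modulo 2."""
    return [sum(x * y for x, y in zip(row, v)) % 2 for row in m]


def verify_product_once(a, b, c, rng=random):
    """One trial: True means 'report AB = C', False means AB != C for certain."""
    n = len(a)
    r = [rng.randint(0, 1) for _ in range(n)]      # each r_i uniform in {0, 1}
    abr = mat_vec_mod2(a, mat_vec_mod2(b, r))      # A(Br): two matrix-vector products
    cr = mat_vec_mod2(c, r)                        # Cr: a third one
    return abr == cr


identity = [[1, 0], [0, 1]]
print(verify_product_once(identity, identity, identity))   # True for every r
```

Computing A(Br) instead of (AB)r is what avoids the full matrix product.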


• The algorithm requires three matrix-vector multiplications in time $\Theta(n^2)$

Lemma 1.5: Choosing $\bar{r} = (r_1, r_2, \ldots, r_n) \in \{0,1\}^n$ uniformly at random is equivalent to choosing each $r_i$ independently and uniformly from $\{0,1\}$.
Proof: If each $r_i$ is chosen independently and uniformly at random, then each of the $2^n$ possible vectors $\bar{r}$ is chosen with probability $2^{-n}$, giving the lemma.

• The probability that the algorithm returns $AB = C$ when they are actually not equal is bounded by the following theorem



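Theorem 1.4: If $AB \neq C$ and $\bar{r}$ is chosen uniformly at random from $\{0,1\}^n$, then

$$\Pr(AB\bar{r} = C\bar{r}) \le \frac{1}{2}.$$

Proof: Let $D = AB - C \neq 0$. Then $AB\bar{r} = C\bar{r}$ implies that $D\bar{r} = 0$. Since $D \neq 0$ it must have some nonzero entry; w.l.o.g., let that entry be $d_{11}$. For $D\bar{r} = 0$, it must be the case that

$$\sum_{j=1}^{n} d_{1j} r_j = 0$$

or, equivalently,

$$r_1 = -\frac{\sum_{j=2}^{n} d_{1j} r_j}{d_{11}} \qquad (1.1)$$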


• Instead of reasoning about the vector $\bar{r}$, we choose the $r_k$ independently and uniformly at random from $\{0, 1\}$ in order, from $r_n$ down to $r_1$
• By Lemma 1.5 this is equivalent to choosing a vector $\bar{r}$ uniformly at random
• Consider the situation just before $r_1$ is chosen
• The RHS of Eqn. (1.1) is determined, and there is at most one choice for $r_1$ that will make that equality hold
• Since there are two choices for $r_1$, the equality holds with probability at most $1/2$, and hence the probability that $AB\bar{r} = C\bar{r}$ is at most $1/2$
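For small n the bound of Theorem 1.4 can be confirmed by enumerating all 2^n vectors. The 2 × 2 matrices below are a made-up example in which the claimed product differs from AB in a single entry; the helper names are also assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product


def mat_vec_mod2(m, v):
    return [sum(x * y for x, y in zip(row, v)) % 2 for row in m]


def pass_probability(a, b, c):
    """Exact probability, over uniform r in {0,1}^n, that A(Br) == Cr."""
    n = len(a)
    passing = sum(
        1
        for r in product((0, 1), repeat=n)
        if mat_vec_mod2(a, mat_vec_mod2(b, r)) == mat_vec_mod2(c, r)
    )
    return Fraction(passing, 2 ** n)


a = [[1, 1], [0, 1]]
b = [[1, 0], [1, 1]]
c_right = [[0, 1], [1, 1]]   # equals AB modulo 2
c_wrong = [[1, 1], [1, 1]]   # differs from AB in the top-left entry

print(pass_probability(a, b, c_right))   # 1
print(pass_probability(a, b, c_wrong))   # 1/2
```

In this example D = AB - C has the single nonzero entry d_11, so the test passes exactly when r_1 = 0, which meets the 1/2 bound with equality.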



• By considering all variables besides $r_1$ as having been set, we have reduced the sample space to the set of two values $\{0, 1\}$ for $r_1$ and have changed the event being considered to whether Eqn. (1.1) holds
• This idea is called the principle of deferred decisions
• When there are several random variables, such as the $r_i$ of the vector $\bar{r}$, it often helps to think of some of them as being set at one point in the algorithm with the rest of them being left random – or deferred – until some further point in the analysis


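Theorem 1.6 [Law of Total Probability]: Let $E_1, E_2, \ldots, E_n$ be mutually disjoint events in the sample space $\Omega$, and let $\bigcup_{i=1}^{n} E_i = \Omega$. Then

$$\Pr(B) = \sum_{i=1}^{n} \Pr(B \cap E_i) = \sum_{i=1}^{n} \Pr(B \mid E_i) \Pr(E_i).$$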
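A small numeric check of the law of total probability, assuming two fair dice: the events E_i = "the first die shows i" partition the sample space, and B = "the sum is 8" is an arbitrary event picked for this illustration.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))   # two fair dice


def pr(event):
    return Fraction(len(event), len(omega))


def pr_given(e, f):
    return pr(e & f) / pr(f)


b = {w for w in omega if sum(w) == 8}
parts = [{w for w in omega if w[0] == i} for i in range(1, 7)]   # E_1, ..., E_6

by_intersection = sum(pr(b & e) for e in parts)
by_conditioning = sum(pr_given(b, e) * pr(e) for e in parts)
print(pr(b), by_intersection, by_conditioning)   # 5/36 5/36 5/36
```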

• To improve on the error probability of Thm 1.4, we again use the fact that the algorithm has a one-sided error and run it multiple times
• If we ever find an $\bar{r}$ s.t. $AB\bar{r} \neq C\bar{r}$, then the algorithm correctly returns that $AB \neq C$



• If we always find $AB\bar{r} = C\bar{r}$, then the algorithm returns that $AB = C$ and there is some probability of a mistake
• Choosing $\bar{r}$ with replacement from $\{0,1\}^n$ for each trial, we obtain that, after $k$ trials, the probability of error is at most $2^{-k}$
• Repeated trials lead to running time of $\Theta(kn^2)$
• If we do this verification 100 times, the running time of the algorithm is still $\Theta(n^2)$ – faster than deterministic algorithms for sufficiently large $n$
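The repeated verification can be sketched as a loop over independent trials. As before, matrices are lists of 0/1 rows, arithmetic is modulo 2, and the names `verify_product` and `error_bound` are hypothetical.

```python
import random
from fractions import Fraction


def mat_vec_mod2(m, v):
    return [sum(x * y for x, y in zip(row, v)) % 2 for row in m]


def verify_product(a, b, c, k, rng=random):
    """Run k independent trials, each costing three matrix-vector products."""
    n = len(a)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]   # fresh vector, with replacement
        if mat_vec_mod2(a, mat_vec_mod2(b, r)) != mat_vec_mod2(c, r):
            return False    # witness found: AB != C for certain
    return True             # wrong with probability at most 2**-k


def error_bound(k):
    """Upper bound on the probability that AB != C survives k trials."""
    return Fraction(1, 2 ** k)


a = [[1, 1], [0, 1]]
b = [[1, 0], [1, 1]]
c = [[0, 1], [1, 1]]                 # equals AB modulo 2
print(verify_product(a, b, c, 100))  # True
print(float(error_bound(100)))       # roughly 7.9e-31
```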


• The probability that an incorrect input passes the verification test 100 times is at most $2^{-100}$
• In practice, the computer is much more likely to crash during the execution of the algorithm than to return a wrong answer
• A related problem is to evaluate the gradual change in our confidence in the correctness of the matrix multiplication as we repeat the randomized test
• Toward that end we introduce Bayes' law
