13
Combinatorics (Important to algorithm analysis ) lem I: many N-bit strings contain at least 1 ze lem II: many N-bit strings contain more than 1 z

Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Embed Size (px)

Citation preview

Page 1: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Combinatorics(Important to algorithm analysis )

Problem I: How many N-bit strings contain at least 1 zero?

Problem II: How many N-bit strings contain more than 1 zero?

Page 2: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Which areas of CS need all this?

1. Information Theory2. Data Storage, Retrieval3. Data Transmission4. Encoding

Page 3: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Information (Shannon) Entropyquantifies, in the sense of an expected value, the information contained in a message.

Example 1: A fair coin has an entropy of 1 [bit]. If the coin is not fair, then the uncertainty is lower (if asked to bet on the next outcome, we would bet preferentially on the most frequent result) => the Shannon entropy is lower than 1.

Example 2: A long string of repeating characters: S=0

Example 3: English text: S ~ 0.6 to 1.3

The source coding theorem: as the length of a stream of independent and identically-distributed random variable data tends to infinity, it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss.

Page 4: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Information (Shannon) Entropy Cont’d

The source coding theorem: It is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss.

(works in the limit of large length of a stream of independent and identically-distributed random variable data )

Page 5: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Combinatorics cont’dProblem: “Random Play” on your I-Touch works like this. When pressed once, it plays a random song from your library of N songs. When pressed again, it excludes the song just played from the library, and draws another song at random from the remaining N-1 songs. Suppose you have pressed “Random Play” k times. What is the probability you will have heard your one most favorite song?

Page 6: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Combinatorics cont’dThe “Random Play” problem.

1. Tactics. Get your hands dirty. 2. P(1) = 1/N. P(2) = ? Need to be careful: what if the song has not

played 1st? What if it has? Becomes complicated for large N.3. Table. Press # (k) vs. P.

1 1/N 2 1/(N-1) k 1/(N-k +1)

4. Key Tactics: find complimentary probability P(not) = (1 - 1/N)(1 - 1/(N-1))*…(1- 1/(N-k +1)); then P(k) = 1 - P(not).

5. Re-arrange. P(not) = (N-1)/N * (N-2)/ (N-1) * (N-3)/(N-2)*…(N-k)/(N-k+1) = (N-k)/N. Thus, P = 1 - P(not) = k/N.

6. If you try to guess the solution, make sure your guess works for simple cases when the answer is obvious. E.g. k=1, k=N. Also, P <= 1.

7. The very simple answer suggests that a simpler solution may be possible. Can you find it?

Page 7: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Data Compression

Lossless Compression: 25.888888888 => 25.[9]8

Lossy Compression: 25.888888888 => 26

Lossless compression: exploit statistical redundancy. For example, In English, letter “e” is common, but “z” is not. And you never have a “q” followed by a “z”.

Drawback: not universal. If no pattern - no compression.

Lossy compression (e.g. JPEG images).

Page 8: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Combinatorics Cont’dProblem: DNA sequence contains only 4 letters (A,T,G and C). Short “words” made of K consecutive letters are the genetic code. Each word (called “codon”, ) codes for a specific amino-acid in proteins. There are a total of 20 biologically relevant amino-acids. Prove that genetic code based on fixed K is degenerate, that is there are amino-acids which are coded for by more than one “word”. It is assumed that every word codes for an amino-acid

Page 9: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Permutations• How many three digit numbers (decimal representation)

are there if you cannot use a number more that once?

Page 10: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

P(n,r)• The number of ways a subset of r elements

can be chosen from a set of n elements is given by

Page 11: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Theorem• Suppose we have n objects of k different

types, with nk identical objects of the kth type. Then the number of distinct arrangements of those n objects is equal to

Visualize: nk identical balls of color k. total # of balls = n

Page 12: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Combinatorics Cont’d

The Mississippi formula.

Example: What is the number of letter permutations of the word BOOBOO?

6!/(2! 4!)

Page 13: Combinatorics (Important to algorithm analysis ) Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain

Permutations and Combinations with Repetitions

• How many distinct arrangements are there of the letters in the word MISSISSIPPI?Each letter has a unique color (e.g. 1st “I” is blue, the 2nd is red, etc. ). Each arrangement must still read the same MISSISSIPPI.