Chapter 5. Balls, Bins and Random Graphs
Part II. Application
Probability and Computing
Michael Mitzenmacher and Eli Upfal
Presenter : Kim, Deawoo
Password checker
• Prevent users from choosing common, easily cracked passwords
• Keep a dictionary of unacceptable passwords
• Is the requested password in the unacceptable set?
How to search the unacceptable password list?
• Binary search on the sorted dictionary
• Θ(log 𝑚) time for 𝑚 words
• Chain hashing: reduces search time
• Bloom filter: saves space
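As a baseline for the two hashing approaches, the binary-search option can be sketched as follows. This is a minimal illustration, not part of the original slides: the function name `is_unacceptable` is hypothetical, and the word list reuses the example entries shown in the slide figures.

```python
from bisect import bisect_left

def is_unacceptable(sorted_words, password):
    """Binary search in a sorted list: Theta(log m) comparisons for m words."""
    i = bisect_left(sorted_words, password)
    return i < len(sorted_words) and sorted_words[i] == password

# The dictionary must be sorted once before any lookups.
dictionary = sorted(["12", "123", "345", "12345", "22", "11", "lanada"])
print(is_unacceptable(dictionary, "lanada"))   # True
print(is_unacceptable(dictionary, "s3cure!"))  # False
```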
App 1. Hashing
[Figure: the unacceptable set, with example entries 12, 123, 345, 12345, 22, 11, lanada]
Goal: reduce search time
Store each unacceptable password in the appropriate bin, chosen by a hash function
Searching for an item
1. Hash the input to find the appropriate bin
2. Search sequentially through the linked list in that bin
Hash function 𝑓: 𝑈 → [0, 𝑛 − 1]
• 𝑓: uniformly random, computable in 𝑂(1) time
• 𝑈: all possible password strings
• 𝑛: array size (# of bins)
• 𝑚: # of unacceptable passwords (# of balls)
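The store and search steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the class name `ChainHashTable` is hypothetical, Python lists stand in for the linked lists, and Python's built-in `hash` stands in for the uniformly random function 𝑓 (it is not truly uniform).

```python
class ChainHashTable:
    """Chain hashing: n bins, each holding a chain (list) of stored items."""

    def __init__(self, n):
        self.n = n
        self.bins = [[] for _ in range(n)]

    def _bin(self, item):
        # Assumption: built-in hash() stands in for f: U -> [0, n - 1].
        return hash(item) % self.n

    def add(self, item):
        chain = self.bins[self._bin(item)]
        if item not in chain:
            chain.append(item)

    def __contains__(self, item):
        # Step 1: hash to find the bin. Step 2: scan that bin's chain.
        return item in self.bins[self._bin(item)]

unacceptable = ["12", "123", "345", "12345", "22", "11", "lanada"]
table = ChainHashTable(n=len(unacceptable))  # n = m bins
for word in unacceptable:
    table.add(word)

print("lanada" in table)   # True
print("s3cure!" in table)  # False
```

Lookups are always exact here; only the running time depends on how evenly the hash spreads the words over the bins.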
Chain Hashing [1/2]
[Figure: hash table whose bins hold chains of entries: lanada; 123, 345, 12345; 11]
Balls-and-bins model with 𝑚 balls in 𝑛 bins
• The number of balls in a bin is approximately Poisson distributed with 𝜇 = 𝑚/𝑛
Total expected time for searching
• 𝐸[# balls in a bin] = 𝑚/𝑛
• Since the # of balls in a bin is approximately Poisson, 𝐸[𝑋] = 𝜇 = 𝑚/𝑛
• If 𝑛 = 𝑚, 𝐸[# balls in a bin] = 1
• The expected search time is therefore constant
Maximum time for searching
• With 𝑛 = 𝑚, the maximum # of balls in a bin is Θ(ln 𝑚 / ln ln 𝑚) w.h.p.
• Asymptotically better than the Θ(log 𝑚) of binary search
Drawback: wasted space
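A small simulation can make the balls-and-bins picture concrete. The sketch below is illustrative only: the function name `bin_loads`, the seed, and the size 𝑚 = 𝑛 = 100000 are arbitrary choices, and the printed maximum load should only be compared with ln 𝑚 / ln ln 𝑚 up to a constant factor.

```python
import random
from math import log

def bin_loads(m, n, rng):
    """Throw m balls independently and uniformly at random into n bins."""
    loads = [0] * n
    for _ in range(m):
        loads[rng.randrange(n)] += 1
    return loads

rng = random.Random(0)
m = n = 100_000
loads = bin_loads(m, n, rng)

print("average load:", sum(loads) / n)  # exactly m / n = 1.0
print("maximum load:", max(loads))
print("ln m / ln ln m:", log(m) / log(log(m)))
print("empty bins (wasted space):", loads.count(0))
```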
Chain Hashing [2/2]
Goal: save space
Bloom filters
• Array of 𝑛 bits, initially all set to 0
• 𝑘 independent hash functions 𝐻1, 𝐻2, … , 𝐻𝑘, each with range {0, … , 𝑛 − 1}
• A Bloom filter represents a set 𝑆 = {𝑠1, 𝑠2, … , 𝑠𝑚} of 𝑚 elements: for each 𝑠 ∈ 𝑆, the bits 𝐻1(𝑠), … , 𝐻𝑘(𝑠) are set to 1
• Balls-and-bins model with 𝑘 ⋅ 𝑚 balls in 𝑛 bins
Q: Is an element 𝑥 in 𝑆?
• Answer yes if and only if all the bits 𝐻1(𝑥), … , 𝐻𝑘(𝑥) are 1
False positive: the Bloom filter reports that the element is in 𝑆, but it actually is not.
False positive matches are possible, but false negatives are not.
Useful for the “Password checker”
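The insert and query operations of a Bloom filter can be sketched as below. This is a minimal sketch, not the slides' implementation: the class name `BloomFilter` is hypothetical, and salting SHA-256 with the index `i` is an assumption standing in for 𝑘 independent uniform hash functions.

```python
import hashlib

class BloomFilter:
    """n-bit array with k hash functions; may report false positives only."""

    def __init__(self, n, k):
        self.n = n
        self.k = k
        self.bits = [0] * n

    def _positions(self, item):
        # Assumption: SHA-256 salted with i approximates H_1, ..., H_k.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.n

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # "Yes" only if every one of the k bits is set.
        return all(self.bits[pos] for pos in self._positions(item))

unacceptable = ["12", "123", "345", "12345", "22", "11", "lanada"]
bloom = BloomFilter(n=8 * len(unacceptable), k=5)
for word in unacceptable:
    bloom.add(word)

print(all(word in bloom for word in unacceptable))  # True: no false negatives
```

For a password checker a false positive only rejects an acceptable password, which is a tolerable error; an unacceptable password is never let through.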
Bloom Filters [1/3]
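Example (𝑆 = {𝑥1, 𝑥2}, 𝑘 = 3)
• Bloom filter: 𝐻1(𝑥1) = 3, 𝐻2(𝑥1) = 5, 𝐻3(𝑥1) = 11 and 𝐻1(𝑥2) = 5, 𝐻2(𝑥2) = 6, 𝐻3(𝑥2) = 9, so bits 3, 5, 6, 9, 11 are set
• 𝑦 is not in 𝑆: 𝐻1(𝑦) = 5, 𝐻2(𝑦) = 6, 𝐻3(𝑦) = 10; bit 10 is 0
• False positive: 𝐻1(𝑦) = 3, 𝐻2(𝑦) = 6, 𝐻3(𝑦) = 9; all three bits are 1 although 𝑦 is not in 𝑆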
Calculating false positive probability (balls-and-bins model)
Bloom filter
• The prob. that a specific bit is still 0 after all 𝑚 elements are hashed is $(1 - 1/n)^{km} \approx e^{-km/n}$
For requested password
• Let $p = e^{-km/n}$. Then the prob. of a false positive is $f = (1 - (1 - 1/n)^{km})^k \approx (1 - e^{-km/n})^k = (1 - p)^k$
Optimize the # of hash functions 𝑘 to minimize the false positive probability 𝑓, for given 𝑚 and 𝑛
Bloom Filters [2/3]
• Increasing 𝑘: gives us more chances to find a 0-bit for an element that is not a member of 𝑆
• Decreasing 𝑘: increases the fraction of 0-bits in the array
Minimizing false positive probability
• $\min_k f = e^{g}$, where $g = k \ln(1 - e^{-km/n})$
• This yields a global minimum at $k = (\ln 2) \cdot (n/m)$
• In this case the prob. $f$ is $(1/2)^k \approx (0.6185)^{n/m}$
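One way to see where the minimum comes from is the substitution $p = e^{-km/n}$, which makes $g$ symmetric in $p$ and $1 - p$. The following is a sketch of that standard argument, treating $k$ as a real number:

```latex
\begin{align*}
g &= k \ln\bigl(1 - e^{-km/n}\bigr), \qquad
  p = e^{-km/n} \;\Longrightarrow\; k = -\frac{n}{m}\ln p,\\
g &= -\frac{n}{m}\,\ln p \,\ln(1 - p),\\
\frac{dg}{dp} &= -\frac{n}{m}\left(\frac{\ln(1-p)}{p} - \frac{\ln p}{1-p}\right) = 0
  \quad\text{at } p = \tfrac12,\\
k &= \frac{n}{m}\ln 2, \qquad
  f = (1 - p)^k = \left(\tfrac12\right)^{k} = 2^{-(\ln 2)\,n/m} \approx (0.6185)^{n/m}.
\end{align*}
```

Because $\ln p \,\ln(1-p)$ is symmetric about $p = 1/2$ and largest there, $g$ (and hence $f$) is smallest when exactly half of the bits are expected to be 0.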
Bloom filters allow a constant prob. of a false positive while keeping 𝑛/𝑚, the number of bits per element, constant
Bloom filters are highly effective even if 𝑛 = 𝑐𝑚 for a small constant 𝑐, such as 𝑐 = 8
• In this case, when 𝑘 = 5 or 𝑘 = 6 the false positive prob. is just over 0.02
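The figures quoted for 𝑐 = 8 can be checked numerically with the approximation 𝑓 ≈ (1 − e^(−𝑘𝑚/𝑛))^𝑘. A short sketch follows; the function name `false_positive_rate` is hypothetical.

```python
from math import exp, log

def false_positive_rate(n, m, k):
    """Approximate Bloom filter false positive probability (1 - e^(-km/n))^k."""
    return (1 - exp(-k * m / n)) ** k

c = 8  # bits per element, n = c * m
for k in range(1, 11):
    print(k, round(false_positive_rate(c, 1, k), 4))

# The real-valued optimum (ln 2) * (n / m) lies between k = 5 and k = 6.
print("optimal k:", c * log(2))
```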
Bloom Filters [3/3]
Random graph model $G_{n,p}$
• 𝑛: # of nodes
• 𝑝: probability that each possible edge is added, independently of the others
• Expected number of edges in the graph is $\binom{n}{2} p$
• Each vertex has expected degree $(n - 1)p$
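Sampling from this model is straightforward, and comparing a sample with the two expectations is a quick sanity check. The sketch below is illustrative: the function name `sample_gnp`, the adjacency-list representation, and the values of 𝑛, 𝑝, and the seed are assumptions made for the example.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Sample G(n, p): each of the C(n, 2) possible edges appears with prob. p."""
    adj = {v: [] for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    return adj

rng = random.Random(0)
n, p = 200, 0.1
adj = sample_gnp(n, p, rng)
edges = sum(len(nbrs) for nbrs in adj.values()) // 2

print("edges:", edges, "expected:", n * (n - 1) / 2 * p)
print("mean degree:", 2 * edges / n, "expected:", (n - 1) * p)
```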
App 2. Random Graphs
Hamiltonian Cycle Problem
• Input: a graph 𝐺 = (𝑉, 𝐸) with 𝑛 vertices
• Question: does 𝐺 have a Hamiltonian cycle?
A Hamiltonian cycle is a cycle in the graph that visits every vertex in 𝐺 exactly once
The Hamiltonian cycle problem is NP-complete
Question
• Q: Is the problem hard for most inputs, or only for a relatively small fraction of all graphs?
• A: Finding a Hamiltonian cycle is not hard for suitably randomly selected graphs (balls-and-bins model)
Analysis
• Propose a randomized algorithm for finding a Hamiltonian cycle in random graphs
• Probabilistic analysis over the random choices and the input distribution, using the balls-and-bins model
Hamiltonian Cycles in Random Graph
𝑟𝑜𝑡𝑎𝑡𝑒(𝑣6, 𝑣3), using the edge from the head 𝑣6 to 𝑣3: 𝑣1, 𝑣2, 𝑣3, 𝑣4, 𝑣5, 𝑣6 → 𝑣1, 𝑣2, 𝑣3, 𝑣6, 𝑣5, 𝑣4 (the new head is 𝑣4)
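The rotation is simply a reversal of the part of the path after the chosen vertex. A minimal sketch, assuming the path is stored as a Python list whose last element is the head; the function name and the index-based signature are choices made for this example.

```python
def rotate(path, i):
    """Rotate `path` (head = last element) using the edge (head, path[i]).

    The prefix up to and including path[i] is kept and the rest is reversed,
    so path[i + 1] becomes the new head.
    """
    return path[: i + 1] + path[i + 1 :][::-1]

path = ["v1", "v2", "v3", "v4", "v5", "v6"]
print(rotate(path, path.index("v3")))
# ['v1', 'v2', 'v3', 'v6', 'v5', 'v4']
```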
Hamiltonian Cycles in Random Graph
[Figure: two drawings of the path on vertices 𝑣1, … , 𝑣6]
When the “snake” (the path) has no unused edge left at its head to “eat”, the algorithm stops and reports “FAIL”
When will the algorithm fail?
• Difficult to analyze directly…
Modify the algorithm into a deliberately “stupid” version that is easy to analyze (Algorithm version 2)
Failure Case of the Algorithm
Modify the rotation process so that the next head of the list is chosen uniformly at random from among all vertices of the graph
Algorithm Version 2
Current state:
• 𝑃: 𝑣4, 𝑣3, 𝑣2, 𝑣1
• 𝑣𝑘: head after the 𝑘-th step (𝑣1)
• 𝑥𝑘: used-edge list of 𝑣𝑘 (vertices already visited from 𝑣𝑘)
Case 1: prob = 1/𝑛
• Reverse 𝑃, making 𝑣4 the head 𝑣𝑘
Case 3: prob = 1 − 1/𝑛 − |𝑥𝑘|/𝑛
• Choose a random node 𝑣 adjacent to 𝑣𝑘 that was not previously visited from 𝑣𝑘
• Add 𝑣 to 𝑥𝑘
Current 𝑥4: 𝑣3
Case 3-1: choose 𝑣2, 𝑟𝑜𝑡𝑎𝑡𝑒(𝑣4, 𝑣2), 𝑥4: 𝑣2, 𝑣3
Case 3-2: choose 𝑣5, 𝑒𝑥𝑡𝑒𝑛𝑑(𝑣5), 𝑥4: 𝑣3, 𝑣5
Algorithm Version 2
[Figure: four drawings on vertices 1 to 5, with panels labeled <Case 1>, <Case 3-1>, <Case 3-2>]
Case 2: prob = |𝑥𝑘|/𝑛
• Choose a random node 𝑣 that was previously visited from 𝑣𝑘, i.e. select 𝑣 ∈ 𝑥𝑘
Current 𝑥4: 𝑣3
Choose 𝑣3 as next head: 𝑟𝑜𝑡𝑎𝑡𝑒(𝑣3, 𝑣4)
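Putting the three cases together, one possible reading of Algorithm version 2 is sketched below. It is an illustration under stated assumptions rather than the presenter's code: the graph is undirected and given as symmetric adjacency lists without self-loops, the names `hamiltonian_cycle_v2`, `used`, and `unused` are hypothetical, and a rotation makes the vertex after the chosen one the new head.

```python
import random

def rotate(path, i):
    # Keep path[0..i], reverse the rest; path[i + 1] becomes the new head.
    return path[: i + 1] + path[i + 1 :][::-1]

def hamiltonian_cycle_v2(adj, max_steps, rng):
    """Return the vertices of a Hamiltonian cycle in order, or None on FAIL."""
    n = len(adj)
    unused = {v: list(nbrs) for v, nbrs in adj.items()}
    for nbrs in unused.values():
        rng.shuffle(nbrs)               # random order of each unused-edge list
    used = {v: [] for v in adj}         # x_k: vertices already visited from a head
    path = [rng.choice(list(adj))]      # random start vertex; head = path[-1]
    for _ in range(max_steps):
        head = path[-1]
        if not unused[head]:
            return None                 # head has no unused edge: FAIL
        r = rng.random()
        if r < 1 / n:                   # Case 1: reverse the path
            path.reverse()
            continue
        if r < (1 + len(used[head])) / n:
            v = rng.choice(used[head])  # Case 2: reuse a used edge
        else:
            v = unused[head].pop()      # Case 3: take an unused edge
            used[head].append(v)
            if v not in path:
                path.append(v)          # Case 3-2: extend the path
                continue
            if len(path) == n and v == path[0]:
                return path             # edge back to the first vertex closes the cycle
        path = rotate(path, path.index(v))  # Cases 2 and 3-1: rotate
    return None

k6 = {v: [u for u in range(6) if u != v] for v in range(6)}  # complete graph on 6 vertices
print(hamiltonian_cycle_v2(k6, max_steps=1000, rng=random.Random(0)))
```

A run either prints a permutation of 0..5 that forms a cycle or prints `None` when it reports FAIL; on such a small graph FAIL is a real possibility because each unused-edge list is short.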
Lemma 5.15
If, after the 𝑘-th step, at least one vertex adjacent to 𝑣𝑘 has not yet been visited from 𝑣𝑘, then for any vertex 𝑢
$\Pr(V_{k+1} = u \mid V_k = u_k, V_{k-1} = u_{k-1}, \ldots, V_0 = u_0) = 1/n$
That is, every vertex becomes the next head with the same probability 1/𝑛
Algorithm Version 2
[Figure: drawing on vertices 1 to 5 labeled <Case 2>]
Proof sketch
Balls-and-bins model with 𝑛 bins and 𝑂(𝑛 ln 𝑛) balls
Failure cases (total probability $O(n^{-1})$)
• 𝜀1: the algorithm ran for 3𝑛 ln 𝑛 steps but failed to construct a Hamiltonian cycle
• 𝜀2: at least one unused-edge list became empty during the first 3𝑛 ln 𝑛 iterations
Analysis using Balls-and-Bins Model
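𝜀1: the algorithm ran for 3𝑛 ln 𝑛 steps but failed to construct a Hamiltonian cycle
• 𝜀1𝑎: failed to construct a Hamiltonian path within 2𝑛 ln 𝑛 steps
• Corresponds to an empty bin remaining after throwing 2𝑛 ln 𝑛 balls
• Probability that one given bin is empty: $(1 - 1/n)^{2n \ln n} \le e^{-2 \ln n} = 1/n^2$
• By the union bound over the 𝑛 bins, $\Pr(\varepsilon_{1a}) \le 1/n$
• 𝜀1𝑏: failed to complete the Hamiltonian path to a cycle within a further 𝑛 ln 𝑛 steps
• Each step closes the cycle with probability 1/𝑛, so $\Pr(\varepsilon_{1b}) \le (1 - 1/n)^{n \ln n} \le 1/n$
• $\Pr(\varepsilon_1) \le 2/n$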
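The empty-bin argument can be checked empirically by counting how many throws are needed before no bin is empty and comparing with the budget 2𝑛 ln 𝑛. The sketch below is illustrative only: the function name `throws_until_no_empty_bin`, the seed, and the values of `n` and `trials` are arbitrary choices.

```python
import random
from math import log

def throws_until_no_empty_bin(n, rng):
    """Throw balls uniformly at random until every one of n bins is hit."""
    seen = set()
    throws = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        throws += 1
    return throws

rng = random.Random(0)
n, trials = 200, 500
budget = 2 * n * log(n)
late = sum(throws_until_no_empty_bin(n, rng) > budget for _ in range(trials))

print("fraction of runs exceeding 2n ln n:", late / trials)
print("union bound 1/n:", 1 / n)
```

The observed fraction is typically close to the bound; both shrink as 𝑛 grows.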
Proof [1/2]
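𝜀2: at least one unused-edge list became empty during the first 3𝑛 ln 𝑛 iterations
• 𝜀2𝑎: at least 9 ln 𝑛 edges were removed from the unused-edge list of at least one vertex in 3𝑛 ln 𝑛 steps
• Corresponds to the maximum load of a bin reaching 9 ln 𝑛 after throwing 3𝑛 ln 𝑛 balls
• 𝜀2𝑏: at least one vertex initially had fewer than 10 ln 𝑛 edges in its unused-edge list
The probability that the algorithm fails to find a Hamiltonian cycle in 3𝑛 ln 𝑛 steps is bounded by
$\Pr(\varepsilon_1) + \Pr(\varepsilon_2) \le \Pr(\varepsilon_{1a}) + \Pr(\varepsilon_{1b}) + \Pr(\varepsilon_{2a}) + \Pr(\varepsilon_{2b}) = O(n^{-1})$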
Proof [2/2]