The Multiplicative Weights Update Method
Based on Arora, Hazan & Kale (2005)
Mashor Housh
Oded Cats
Advanced simulation methods
Prof. Rubinstein
Outline
Weighted Majority Algorithm
  Binary case
  Generalized
Applications
  Game Theory
    Zero-Sum games
    Linear Programming
    Fractional Packing problems
  NP-hard problems
    Set Cover problem
  Artificial intelligence (Boosting)
WMA – Binary case
$N$ experts give their predictions. Our decision rule is a weighted majority of the expert predictions. Initially, all experts have the same weight in our decision rule. The update rule for experts that predicted incorrectly is:

$$w_i^{t+1} = (1-\varepsilon)\, w_i^t$$
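The binary case can be sketched in a few lines of Python (a minimal illustration; the function name `weighted_majority` and the 0/1 encoding of predictions are our own choices, not the paper's):

```python
def weighted_majority(expert_predictions, outcomes, eps=0.5):
    """expert_predictions[t][i] and outcomes[t] are 0/1 values.
    Returns (number of mistakes we made, final expert weights)."""
    n = len(expert_predictions[0])
    w = [1.0] * n                      # all experts start with equal weight
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Predict by weighted majority vote.
        up = sum(wi for wi, p in zip(w, preds) if p == 1)
        down = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if up >= down else 0
        mistakes += (guess != outcome)
        # Update rule: shrink the weight of every incorrect expert.
        w = [wi * (1 - eps) if p != outcome else wi
             for wi, p in zip(w, preds)]
    return mistakes, w

# Four experts, three always wrong, the fourth always right:
preds = [[0, 0, 0, 1]] * 10
m, w = weighted_majority(preds, [1] * 10)
```

Here the algorithm is wrong only on the first two days, while the wrong experts still outweigh the correct one, consistent with the bound of Theorem 1.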
WMA – Binary case
This procedure will yield gains/losses that are roughly as good as those of the best of these experts.

Theorem 1 – the algorithm results in the following bound:

$$m^t \le \frac{2\ln n}{\varepsilon} + 2(1+\varepsilon)\, m_i^t$$

where $m_i^t$ is the number of mistakes that expert $i$ has made after $t$ steps, and $m^t$ is the number of mistakes that our algorithm has made.
WMA Binary case – Proof of Theorem 1
I. By induction: $w_i^t = (1-\varepsilon)^{m_i^t}$
II. Define the 'potential function' $\Phi^t = \sum_i w_i^t$, so that $\Phi^1 = n$
III. Each time we make a mistake, at least half of the total weight decreases by a factor of $(1-\varepsilon)$, so:
$$\Phi^{t+1} \le \Phi^t\left(\tfrac{1}{2} + \tfrac{1}{2}(1-\varepsilon)\right) = \Phi^t\left(1 - \tfrac{\varepsilon}{2}\right)$$
IV. By induction: $\Phi^t \le n\,(1-\varepsilon/2)^{m^t}$
V. Since $w_i^t \le \Phi^t$, we get $(1-\varepsilon)^{m_i^t} \le n\,(1-\varepsilon/2)^{m^t}$; taking logarithms and using $-\ln(1-x) \le x + x^2$ for $x \le \tfrac{1}{2}$ gives
$$m^t \le \frac{2\ln n}{\varepsilon} + 2(1+\varepsilon)\, m_i^t$$
WMA – Binary case : Example1
Four analysts give their predictions for the stock market: three are always wrong and the fourth is always right.
[Table: the daily predictions of experts 1–4 and the actual market movement over days 1–4]
WMA – Binary case : Example1 (Cont.)
Expert weights by day ($\varepsilon = 0.5$):

Day:    1      2      3      4
w_1:    1      0.5    0.25   0.125
w_2:    1      0.5    0.25   0.125
w_3:    1      0.5    0.25   0.125
w_4:    1      1      1      1

Balance of powers (wrong experts : correct expert): 3:1, 1.5:1, 0.75:1, 0.375:1 — from day 3 on, the user's weighted-majority vote follows expert 4 and is correct.
WMA – Binary case : Example1 (Cont.)
Since our fourth analyst is never wrong, $m_4^t = 0$ for every $t$, and Theorem 1 gives:

$$m^t \le \frac{2\ln n}{\varepsilon} + 2(1+\varepsilon)\, m_4^t = \frac{2\ln 4}{0.5} + 0 \approx 5.545$$

so the algorithm makes at most 5 mistakes, no matter how long it runs.
WMA – Binary case : Example2
100 analysts give their predictions for the stock market: 99 predict "up" with probability 0.05; the 100th predicts "up" with probability 0.99. The market goes up 99% of the time.
WMA – Binary case : Example2 (Cont.)
Generalization of the WMA
The set of events/outcomes $P$ need not be bounded. $M(i,j)$ is the penalty that expert $i$ pays when the outcome is $j$. $D^t = (p_1^t, p_2^t, \ldots, p_n^t)$ is the distribution associated with the experts. The probability of choosing expert $i$ is:

$$p_i^t = \frac{w_i^t}{\sum_k w_k^t}$$

At every round $t$ we choose an expert according to $D^t$ and follow his advice.
Generalization of the WMA (Cont.)
The update rule is:

$$w_i^{t+1} = \begin{cases} w_i^t\,(1-\varepsilon)^{M(i,j^t)} & \text{if } M(i,j^t) \ge 0 \\ w_i^t\,(1+\varepsilon)^{-M(i,j^t)} & \text{if } M(i,j^t) < 0 \end{cases}$$

The expected penalty of the randomized algorithm is not much worse than that of the best expert.

Theorem 2 – for every expert $i$, the algorithm results in:

$$\sum_t M(D^t, j^t) \le \frac{\ln n}{\varepsilon} + (1+\varepsilon)\sum_{t:\,M(i,j^t)\ge 0} M(i,j^t) + (1-\varepsilon)\sum_{t:\,M(i,j^t)<0} M(i,j^t)$$
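A sketch of the generalized algorithm for penalties in $[0,1]$, so only the $(1-\varepsilon)$ branch of the update fires (function and variable names are illustrative):

```python
import math

def run_mw(penalty_rounds, eps=0.5):
    """penalty_rounds[t][i] = M(i, j^t) in [0, 1].
    Returns (total expected penalty, cumulative penalty per expert)."""
    n = len(penalty_rounds[0])
    w = [1.0] * n
    expected_penalty = 0.0
    cum = [0.0] * n
    for col in penalty_rounds:
        total = sum(w)
        p = [wi / total for wi in w]           # the distribution D^t
        expected_penalty += sum(pi * m for pi, m in zip(p, col))
        # Multiplicative update: penalized experts shrink.
        w = [wi * (1 - eps) ** m for wi, m in zip(w, col)]
        cum = [c + m for c, m in zip(cum, col)]
    return expected_penalty, cum

# Three experts; expert 0 is never penalized.
ep, cum = run_mw([[0, 1, 1]] * 10)
# Theorem 2 bound relative to the best expert (cum[0] = 0):
bound = math.log(3) / 0.5 + 1.5 * cum[0]
```

Running this, the total expected penalty stays below the Theorem 2 bound $\ln(3)/\varepsilon \approx 2.2$.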
WMA – Comparison via example
The stock market example again, now choosing a single expert at random instead of taking a weighted majority vote. The penalty matrix over the two market outcomes flips with the parity of the round:

$$M^t(i,j) = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \ (t \text{ odd}), \qquad M^t(i,j) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \ (t \text{ even})$$
WMA – Comparison via example (Cont.)
With penalties only ($M(i,j) \in [0,1]$):
WMA – Comparison via example (Cont.)

With penalties and rewards ($M(i,j) \in [-1,1]$):
Generalization of the WMA - Example
Four weathermen give their forecasts. There are four possible weather conditions. The payoff matrix is:

$$M(i,j) = \begin{pmatrix} 100 & 33 & 33 & 33 \\ 50 & 50 & 50 & 50 \\ 33 & 33 & 33 & 100 \\ 25 & 25 & 25 & 25 \end{pmatrix}$$

with columns: Sunny, Cloudy, Rainy, Snowy.
Generalization of the WMA – Example (Cont.)
The actual weather alternates between sunny and cloudy.
Generalization of the WMA – Example (Cont.)
The actual weather cycles through the four possible weather conditions.
Applications
To draw the analogy, define the following components: experts, events, payoff matrix, weights, and the update rule.
Applications Game theory
Zero-Sum games
Experts – the row player's pure strategies
Events – the column player's pure strategies
Payoff matrix – $M(i,j)$, the payoff to the row player when the row player plays strategy $i$ and the column player plays strategy $j$
A distribution $D$ on the experts represents a mixed row strategy
The game value $\lambda^*$ (von Neumann's min-max theorem and Nash equilibrium):

$$\lambda^* = \min_D \max_j M(D,j) = \max_P \min_i M(i,P)$$
Applications Game theory
Algorithm for solving a zero-sum game:
1) Initialize $W^1 = (1, \ldots, 1)$. Determine $\varepsilon$ and the accuracy $\delta$.
2) Draw a row strategy $i$ at random according to $D^t$.
3) The column player chooses the strategy $j$ that maximizes his revenue.
4) Update:
$$w_i^{t+1} = w_i^t\,(1-\varepsilon)^{M(i,j)}$$
5) If $t \ge \dfrac{16\ln n}{\delta^2}$, stop. Otherwise return to step 2.
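The steps above can be sketched as follows; for simplicity the sketch plays the full mixed strategy $D^t$ each round instead of sampling from it, and the parameter choices (`eps = delta / 4`) are illustrative, not the paper's tuning:

```python
import math

def solve_zero_sum(M, delta=0.2):
    """M[i][j] in [0, 1] is the row player's penalty.
    Returns the average row mixed strategy."""
    n, m = len(M), len(M[0])
    eps = delta / 4.0                       # illustrative tuning
    T = int(16 * math.log(n) / delta ** 2)  # stopping rule from step 5
    w = [1.0] * n
    avg = [0.0] * n
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]
        # Step 3: the column player best-responds to D^t.
        j = max(range(m), key=lambda c: sum(p[i] * M[i][c] for i in range(n)))
        # Step 4: multiplicative update against the chosen column.
        w = [wi * (1 - eps) ** M[i][j] for i, wi in enumerate(w)]
        avg = [a + pi / T for a, pi in zip(avg, p)]
    return avg

# Matching pennies: the value is 1/2, attained at p = (1/2, 1/2).
p_bar = solve_zero_sum([[1.0, 0.0], [0.0, 1.0]])
value = max(p_bar[0], p_bar[1])   # worst case over the two columns
```

The average strategy `p_bar` guarantees a penalty within $\delta$ of the game value against any column.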
Applications Game theory – Example1
The penalty matrix, with row maxima and column minima:

  i\j    1      2      3     max_j
  1      1/4    1/3    1/2   1/2
  2      1/4    1/3    1/2   1/2
  3      1/4    1/3    1/2   1/2
  min_i  1/4    1/3    1/2

The row player chooses the minimum of the maximum penalties; the column player chooses the maximum of the minimum penalties. Both give the game value $\lambda^* = 1/2$.
Applications Game theory – Example1 (Cont.)
Applications Game theory – Example2
(1) The row player chooses a strategy at random according to $D^t$
(2) The column player chooses the strategy that yields the maximum benefit for him
(3) The weights over the row strategies are updated:
$$w_i^{t+1} = w_i^t\,(1-\varepsilon)^{M(i,j)}$$

  i\j    1      2      3     max_j
  1      1/4    1/3    1/3   1/3
  2      0      0      3/4   3/4
  3      2/3    1/3    2/3   2/3
  min_i  0      0      1/3

The game value is $\lambda^* = 1/3$.
Applications Game theory – Example2 (Cont.)
Applications Artificial Intelligence
The objective is to learn an unknown function $c: X \to \{0,1\}$. A sequence of training examples $(x, c(x))$ is given. $D$ is the fixed, unknown distribution on the domain $X$. The learning algorithm outputs a hypothesis $h: X \to \{0,1\}$. The error is:

$$E_{x \sim D}\left[\,|h(x) - c(x)|\,\right]$$
[Figure: an example of a target function $c(x)$ plotted over the domain $x$]
Applications Artificial Intelligence (Cont.)
Strong learning algorithm – for every distribution $D$ and every $\varepsilon, \delta > 0$, with probability at least $1-\delta$ it outputs a hypothesis with $E_{x\sim D} \le \varepsilon$.
$\gamma$-weak learning algorithm – for every distribution $D$ and every $\delta > 0$, with probability at least $1-\delta$ it outputs a hypothesis with $E_{x\sim D} \le 1/2 - \gamma$.
Boosting – combining several moderately accurate rules of thumb into a single highly accurate prediction rule.
Applications Artificial Intelligence (Cont.)
Experts – the samples in the training set
Events – the set of all hypotheses that can be generated by the weak learning algorithm
Payoff matrix:
$$M(i,j) = \begin{cases} 1 & h_j(x_i) = c(x_i) \\ 0 & h_j(x_i) \ne c(x_i) \end{cases}$$
The final hypothesis is obtained via a majority vote among $h_1(x), h_2(x), \ldots, h_T(x)$, where
$$T = \frac{2}{\gamma^2}\ln\frac{1}{\varepsilon}$$
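A toy boosting sketch under these definitions, with a one-dimensional threshold rule standing in for the weak learner (the weak learner, the data, and all names are illustrative, not from the paper):

```python
def weak_learner(xs, ys, p):
    """Return the threshold rule with the smallest p-weighted error."""
    best, best_err = None, float("inf")
    for thr in xs:
        for sign in (1, -1):
            h = lambda x, t=thr, s=sign: 1 if s * (x - t) >= 0 else 0
            err = sum(pi for x, y, pi in zip(xs, ys, p) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
    return best

def boost(xs, ys, T=10, eps=0.25):
    w = [1.0] * len(xs)
    hyps = []
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]
        h = weak_learner(xs, ys, p)
        hyps.append(h)
        # M(i, j) = 1 when h classifies example i correctly, so correctly
        # classified examples shrink and hard examples gain relative weight.
        w = [wi * (1 - eps) if h(x) == y else wi
             for wi, x, y in zip(w, xs, ys)]
    # Final hypothesis: majority vote among h_1 .. h_T.
    return lambda x: 1 if 2 * sum(h(x) for h in hyps) >= len(hyps) else 0

xs = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
f = boost(xs, ys)
```

On this separable toy set the majority vote classifies every training example correctly.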
Applications
Linear Programming
Finding a feasible solution for a system of $m$ constraints:
$$Ax \ge b$$
Experts – the constraints
Events – solution vectors $x$
Payoff matrix – the distance from satisfying the constraint: $M(i,x) = A_i x - b_i$
The final solution is $\bar{x} = \frac{1}{T}\sum_t x^t$
The algorithm also tracks cases where there is no feasible solution.
Applications
Linear Programming (Cont.)

Algorithm for finding a feasible solution to an LP problem:
1) Initialize $W^1 = (1, \ldots, 1)$ and the resulting distribution $P^1$.
2) Call an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
$$c^T x \ge d, \qquad c = \sum_i p_i^t A_i, \quad d = \sum_i p_i^t b_i$$
If there is no feasible solution – break: the original system is infeasible.
Applications
Linear Programming (Cont.)

Algorithm for finding a feasible solution to an LP problem (Cont.):
3) Update the weights:
$$w_i^{t+1} = \begin{cases} w_i^t\,(1-\varepsilon)^{M(i,x)} & \text{if } M(i,x) \ge 0 \\ w_i^t\,(1+\varepsilon)^{-M(i,x)} & \text{if } M(i,x) < 0 \end{cases}, \qquad M(i,x) = A_i x - b_i$$
4) Update the distribution:
$$p_i^{t+1} = \frac{w_i^{t+1}}{\sum_k w_k^{t+1}}$$
5) If $t \ge T$, stop and output $\bar{x}$. Otherwise return to step 2.
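The LP steps can be sketched with a simple box oracle (maximize $c^T x$ over bound constraints, the "easy" part here); the penalty is rescaled into $[-1,1]$ because the analysis assumes bounded penalties, and a single symmetric exponent stands in for the two-case update rule. All names and parameter choices are illustrative:

```python
def mw_lp_feasible(A, b, lower, upper, T=50, eps=0.01):
    """Look for x with A x >= b inside the box [lower, upper]^n."""
    m, n = len(A), len(A[0])
    w = [1.0] * m
    x_bar = [0.0] * n
    for _ in range(T):
        total = sum(w)
        p = [wi / total for wi in w]
        # Oracle: maximize c^T x over the box, with c = sum_i p_i A_i.
        c = [sum(p[i] * A[i][k] for i in range(m)) for k in range(n)]
        x = [upper[k] if c[k] >= 0 else lower[k] for k in range(n)]
        d = sum(p[i] * b[i] for i in range(m))
        if sum(ck * xk for ck, xk in zip(c, x)) < d:
            return None          # even the relaxed single constraint fails
        # Penalty M(i, x) = A_i x - b_i, rescaled into [-1, 1].
        slack = [sum(A[i][k] * x[k] for k in range(n)) - b[i]
                 for i in range(m)]
        rho = max(abs(s) for s in slack) or 1.0
        # Satisfied constraints shrink; violated ones gain weight
        # (single-exponent variant of the two-case rule above).
        w = [wi * (1 - eps) ** (s / rho) for wi, s in zip(w, slack)]
        x_bar = [xb + xk / T for xb, xk in zip(x_bar, x)]
    return x_bar

A = [[1, 1, 1], [3, 2, 4], [3, 2, 0]]
b = [20, 42, 30]
x = mw_lp_feasible(A, b, [0, 0, 0], [30, 30, 30])
```

On the small system used in the example that follows, the averaged point satisfies all three constraints.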
Applications
Linear Programming - Example
Finding a feasible solution to the following problem:

$$A = \begin{pmatrix} 1 & 1 & 1 \\ 3 & 2 & 4 \\ 3 & 2 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 20 \\ 42 \\ 30 \end{pmatrix}$$

Easy constraints: lower bound $(0,0,0)$, upper bound $(30,30,30)$.
Solution: $x = (23.06,\ 20.9,\ 19.71)$
Applications
Fractional Vertex Covering problem
Finding a feasible solution for a system of $m$ constraints:
$$Ax \ge b, \qquad A \ge 0,\ b \ge 0,\ x \ge 0$$
Experts – the constraints
Events – solution vectors $x$
Payoff matrix – $M(i,x)$
The final solution is $\bar{x} = \frac{1}{T}\sum_t x^t$
The algorithm also tracks cases where there is no feasible solution.
Applications
Fractional Vertex Covering problem (Cont.)

Algorithm for finding a feasible solution to a Fractional Covering problem:
1) Initialize $W^1 = (1, \ldots, 1)$ and the resulting distribution $P^1$.
2) Call an oracle which solves the following feasibility problem with a single constraint plus a set of easy constraints (Plotkin, Shmoys and Tardos):
$$c^T x \ge d, \qquad c = \sum_i p_i^t A_i, \quad d = \sum_i p_i^t b_i$$
If there is no feasible solution – break.
Applications
Fractional Vertex Covering problem (Cont.)

Algorithm for finding a feasible solution to a Fractional Covering problem (Cont.):
3) Update the weights:
$$w_i^{t+1} = w_i^t\,(1-\varepsilon)^{M(i,x)}, \qquad M(i,x) = A_i x$$
4) Update the distribution:
$$p_i^{t+1} = \frac{w_i^{t+1}}{\sum_k w_k^{t+1}}$$
5) If $t \ge T$, stop and output $\bar{x}$. Otherwise return to step 2.
Applications
Flow problems
Maximum multi-commodity flow problem: a set of source–sink pairs and capacity-constrained edges:

$$\max \sum_p f_p \qquad \text{s.t.} \quad \sum_{p \ni e} f_p \le c_e \quad \forall e$$
Applications
Flow problems (Cont.)
Experts – the edges
Events – a flow of value $c_p$ on path $p$, where $c_p$ is the minimum capacity of an edge on the path
Payoff matrix:
$$M(e, p^t) = \frac{c_{p^t}}{c_e}$$
Update rule:
$$w_e^{t+1} = w_e^t\,(1+\varepsilon)^{c_{p^t}/c_e}$$
Termination rule: stop once $w_e^T \ge 1$ for some edge $e$.
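A sketch on a toy instance where the candidate paths are listed explicitly (a real implementation would find the minimum-weight path with a shortest-path routine; the initial weight `delta` and the final scaling follow the standard Garg–Könemann-style analysis and are our choices):

```python
import math

def mw_flow(capacities, paths, eps=0.1, delta=0.5):
    """capacities[e] per edge; paths = lists of edge indices.
    Returns a capacity-feasible flow value per path."""
    w = [delta] * len(capacities)      # start small so w_e can grow to 1
    f = [0.0] * len(paths)
    while all(we < 1.0 for we in w):   # termination rule from the slides
        # Route along the path with minimum total weight.
        k = min(range(len(paths)), key=lambda q: sum(w[e] for e in paths[q]))
        cp = min(capacities[e] for e in paths[k])
        f[k] += cp
        # Update rule: w_e <- w_e * (1 + eps)^(c_p / c_e).
        for e in paths[k]:
            w[e] *= (1 + eps) ** (cp / capacities[e])
    # Scale down so that every edge respects its capacity.
    scale = math.log((1 + eps) / delta) / math.log(1 + eps)
    return [fp / scale for fp in f]

# Two parallel unit-capacity edges from s to t, one path per edge.
flow = mw_flow([1.0, 1.0], [[0], [1]])
```

The scaled flow respects both capacities and recovers most of the optimal total flow of 2.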
Applications
Set Cover problem
Find the minimal number of subsets in the collection $C$ whose union equals the universe $U$.
Experts – the elements of the universe
Events – the sets in the collection
Payoff matrix:
$$M(i, C_j) = \begin{cases} 1 & i \in C_j \\ 0 & i \notin C_j \end{cases}$$
Applications
Set Cover problem (Cont.)
Update rule:
$$w_i^{t+1} = w_i^t\,(1 - M(i, C_j)) = \begin{cases} 0 & i \in C_j \\ w_i^t & i \notin C_j \end{cases}$$
i.e., covered elements drop to weight 0 and uncovered elements keep their weight.
At each step we search for the set $C_j$ which maximizes the covered probability mass
$$\sum_i p_i\, M(i, C_j) = \sum_{i \in C_j} p_i$$
which is exactly the Greedy Set Cover Algorithm.
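The greedy set-cover procedure via MW (with $\varepsilon = 1$, so covered elements drop to weight zero) can be sketched as follows; the names are illustrative and the instance is the 5-node, 6-edge example on the following slides:

```python
def greedy_set_cover(universe, sets):
    """Pick sets until the universe is covered; returns chosen indices."""
    w = {i: 1.0 for i in universe}     # expert weights; eps = 1 zeroes them
    chosen = []
    while any(wi > 0 for wi in w.values()):
        total = sum(w.values())
        p = {i: wi / total for i, wi in w.items()}
        # Choose C_j maximizing sum_{i in C_j} p_i (the covered mass).
        j = max(range(len(sets)),
                key=lambda k: sum(p[i] for i in sets[k]))
        chosen.append(j)
        for i in sets[j]:
            w[i] = 0.0                 # covered elements drop to weight 0
    return chosen

# Edges 1..6 as elements; each node contributes the set of edges it covers.
nodes = [{1, 2, 4}, {2, 3, 5}, {1, 3}, {6}, {4, 5, 6}]
cover = [j + 1 for j in greedy_set_cover(set(range(1, 7)), nodes)]
```

On this instance the greedy choice selects nodes 1, 2 and 4, matching the first run shown in the example.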
Applications
Vertex covering problem - Example
The vertex cover ILP for a graph with nodes $N = \{1,\ldots,5\}$ and edges $E = \{1,\ldots,6\}$:

$$\min \sum_{n \in N} x_n \quad \text{s.t.} \quad \sum_{n \in e} x_n \ge 1 \ \ \forall e \in E, \qquad x_n \in \{0,1\}$$

The edge–node incidence matrix (rows are edges $i$, columns are nodes $j$):

$$M(i,j) = \begin{pmatrix} 1&0&1&0&0 \\ 1&1&0&0&0 \\ 0&1&1&0&0 \\ 1&0&0&0&1 \\ 0&1&0&0&1 \\ 0&0&0&1&1 \end{pmatrix}$$

[Figure: the graph with nodes 1–5 and edges 1–6]
Applications
Vertex covering problem – Example (Cont.)
Find the minimum number of nodes (subsets) that cover all of the edges (the universe).

Node   Set of edges it covers
1      {1, 2, 4}
2      {2, 3, 5}
3      {1, 3}
4      {6}
5      {4, 5, 6}

The number of chosen sets satisfies $T \le \ln(n)\cdot|X_{opt}|$.
Applications
Vertex covering problem – Example (Cont.)
The maximum subset which includes edge 1 is: node = 1
The selected nodes are: c = 1

Iteration: 2
The probability for edge i: p = 0 0 0.3333 0 0.3333 0.3333
Choose an edge following the distribution: i = 3
The maximum subset which includes edge 3 is: node = 2
The selected nodes are: c = 1 2

Iteration: 3
The probability for edge i: p = 0 0 0 0 0 1
Choose an edge following the distribution: i = 6
The maximum subset which includes edge 6 is: node = 4
The selected nodes are: c = 1 2 4
Applications
Vertex covering problem – Example (Cont.)
Iteration: 1
The probability for edge i: p = 0.1667 0.1667 0.1667 0.1667 0.1667 0.1667
Choose an edge following the distribution: i = 6
The maximum subset which includes edge 6 is: node = 5
The selected nodes are: c = 5

Iteration: 2
The probability for edge i: p = 0.3333 0.3333 0.3333 0 0 0
Choose an edge following the distribution: i = 3
The maximum subset which includes edge 3 is: node = 2
The selected nodes are: c = 5 2

Iteration: 3
The probability for edge i: p = 1 0 0 0 0 0
Choose an edge following the distribution: i = 1
The maximum subset which includes edge 1 is: node = 1
The selected nodes are: c = 5 2 1
Summary
This paper presents a comprehensive survey of the multiplicative weights update method.
Various fields independently developed methods that share a common ground, which can be generalized into one conceptual procedure.
The procedure requires determining the experts, the events, the penalty matrix, the weights, and an update rule.
Additional relevant inputs: the error size, the size of $\varepsilon$, and the number of iterations.