Lesson 35 Game Theory and Linear Programming

  • Lesson 35: Game Theory and Linear Programming

    Math 20

    December 14, 2007

    Announcements

    - Pset 12 due December 17 (last day of class)
    - Lecture notes and K&H on website
    - Next OH Monday 12 (SC 323)

  • Outline

    Recap: Definitions; Examples; Fundamental Theorem; Games we can solve so far

    GT problems as LP problems: From the continuous to the discrete; Standardization; Rock/Paper/Scissors again

    The row player's LP problem

  • Definition: A zero-sum game is defined by a payoff matrix $A$, where $a_{ij}$ represents the payoff to the row player if R chooses option $i$ and C chooses option $j$.

    - The row player chooses from the rows of the matrix, and the column player from the columns.

    - The payoff could be a negative number, representing a net gain for the column player.

  • Definition: A strategy for a player consists of a probability vector representing the portion of time each option is employed.

    - We use a row vector $\mathbf{p}$ for the row player's strategy, and a column vector $\mathbf{q}$ for the column player's strategy.

    - A pure strategy (select the same option every time) is represented by a standard basis vector: a row vector $\mathbf{e}^i$ for R, or a column vector $\mathbf{e}_j$ for C. For instance, if R has three choices and C has five:

      $$\mathbf{e}^2 = \begin{pmatrix} 0 & 1 & 0 \end{pmatrix}, \qquad \mathbf{e}_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$

    - A non-pure strategy is called mixed.

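  • Definition: The expected value of row and column strategies $\mathbf{p}$ and $\mathbf{q}$ is the scalar

    $$E(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i a_{ij} q_j = \mathbf{p} A \mathbf{q}$$

    Probabilistically, this is the average amount per play that the row player receives (or, if it is negative, that the column player receives) when the players employ these strategies.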
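    As a quick check of the definition, the double sum can be computed directly. This is a minimal sketch, assuming strategies and matrices are plain Python lists; the function name `expected_value` and the 2 × 2 matrix are illustrative choices, not from the lecture.

```python
from fractions import Fraction


def expected_value(p, A, q):
    """Return E(p, q) = sum over i, j of p_i * a_ij * q_j, i.e. the scalar p A q."""
    return sum(p[i] * A[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))


# A small illustrative 2 x 2 game: R mixes 50/50, C always plays column 2.
A = [[3, -1],
     [-2, 1]]
p = [Fraction(1, 2), Fraction(1, 2)]
q = [0, 1]
print(expected_value(p, A, q))  # 0
```

    Exact fractions are used so that the results match hand calculations without rounding.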

  • Rock/Paper/Scissors

    Example

    What is the payoff matrix for Rock/Paper/Scissors?

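    Solution: With rows and columns in the order Rock, Paper, Scissors, the payoff matrix is

    $$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$$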
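    The matrix can also be generated from the rules, which is a useful check on sign conventions. This sketch assumes a win is worth 1 to the winner and orders the options Rock, Paper, Scissors; the names `BEATS` and `rps_payoff` are illustrative.

```python
OPTIONS = ["Rock", "Paper", "Scissors"]
# BEATS[x] is the option that x defeats.
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}


def rps_payoff(row_choice, col_choice):
    """Payoff to R: +1 if R wins, -1 if C wins, 0 on a tie."""
    if BEATS[row_choice] == col_choice:
        return 1
    if BEATS[col_choice] == row_choice:
        return -1
    return 0


A = [[rps_payoff(r, c) for c in OPTIONS] for r in OPTIONS]
for row in A:
    print(row)
# [0, -1, 1]
# [1, 0, -1]
# [-1, 1, 0]
```

    Because the game treats both players the same way, the matrix comes out skew-symmetric ($A^T = -A$).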

  • Example

    Consider a new game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What is the payoff matrix?

    Solution

    $$A = \begin{pmatrix} 1 & -2 & -3 \\ -1 & 2 & -3 \\ -1 & -2 & 3 \end{pmatrix}$$
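    The same rule-to-matrix translation works here. In this sketch (the helper name `number_game_payoff` is hypothetical), the diagonal holds what C pays R, and every off-diagonal entry is minus the number C chose, so column $j$ is $-j$ off the diagonal.

```python
def number_game_payoff(i, j):
    """Payoff to R when R picks i and C picks j (both in 1..3)."""
    if i == j:
        return i      # same number: C pays R that amount
    return -j         # different numbers: R pays C the amount C chose


A = [[number_game_payoff(i, j) for j in (1, 2, 3)] for i in (1, 2, 3)]
for row in A:
    print(row)
# [1, -2, -3]
# [-1, 2, -3]
# [-1, -2, 3]
```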

  • Theorem (Fundamental Theorem of Matrix Games)

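    There exist optimal strategies $\mathbf{p}^*$ for R and $\mathbf{q}^*$ for C such that for all strategies $\mathbf{p}$ and $\mathbf{q}$:

    $$E(\mathbf{p}, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q})$$

    $E(\mathbf{p}^*, \mathbf{q}^*)$ is called the value $v$ of the game.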
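    The two inequalities can be checked numerically for a candidate pair of strategies. This sketch assumes the uniform strategies for Rock/Paper/Scissors as the candidates; because $E$ is linear in each argument, it checks only the pure strategies, which is enough (this is the content of the lemma later in the lecture).

```python
from fractions import Fraction

A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]           # Rock/Paper/Scissors
third = Fraction(1, 3)
p_star = [third] * 3       # candidate optimal strategy for R
q_star = [third] * 3       # candidate optimal strategy for C


def E(p, q):
    """Expected payoff p A q for this matrix."""
    return sum(p[i] * A[i][j] * q[j] for i in range(3) for j in range(3))


def pure(k):
    """Standard basis vector for option k (0-indexed)."""
    return [1 if t == k else 0 for t in range(3)]


v = E(p_star, q_star)
# E(p, q*) <= v for every pure p, and v <= E(p*, q) for every pure q.
assert all(E(pure(i), q_star) <= v for i in range(3))
assert all(v <= E(p_star, pure(j)) for j in range(3))
print(v)  # 0
```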

  • Reflect on the inequality

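    $$E(\mathbf{p}, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q})$$

    In other words,

    - $E(\mathbf{p}^*, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q})$: by playing $\mathbf{p}^*$, R can guarantee a lower bound on his/her payoff

    - $E(\mathbf{p}, \mathbf{q}^*) \le E(\mathbf{p}^*, \mathbf{q}^*)$: by playing $\mathbf{q}^*$, C can guarantee an upper bound on how much he/she loses

    - This value could be negative, in which case C has the advantage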

  • Fundamental problem of zero-sum games

    - Find the $\mathbf{p}^*$ and $\mathbf{q}^*$!
    - Last time we did these:
      - Strictly-determined games
      - $2 \times 2$ non-strictly-determined games
    - The general case we'll look at next.

  • Pure Strategies are optimal in Strictly-Determined Games

    Theorem: Let $A$ be a payoff matrix. If $a_{rs}$ is a saddle point, then $\mathbf{e}^r$ is an optimal strategy for R and $\mathbf{e}_s$ is an optimal strategy for C. Also $v = E(\mathbf{e}^r, \mathbf{e}_s) = a_{rs}$.
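    Finding saddle points is a simple search. This sketch assumes the usual definition (an entry that is the smallest in its row and the largest in its column); `saddle_points` is a hypothetical helper, and it reports 0-indexed positions.

```python
def saddle_points(A):
    """Return all (r, s), 0-indexed, where a_rs is the minimum of row r
    and the maximum of column s."""
    points = []
    for r, row in enumerate(A):
        for s, a in enumerate(row):
            column = [A[i][s] for i in range(len(A))]
            if a == min(row) and a == max(column):
                points.append((r, s))
    return points


A = [[3, 1],
     [4, 2]]
print(saddle_points(A))  # [(1, 1)]: R plays row 2, C plays column 2, and v = 2
```

    An empty list means the game is not strictly determined, as happens for Rock/Paper/Scissors.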

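  • Optimal strategies in 2 × 2 non-Strictly-Determined Games

    Let $A$ be a $2 \times 2$ matrix with no saddle points. Then the optimal strategies are

    $$\mathbf{p} = \frac{1}{\Delta}\begin{pmatrix} a_{22} - a_{21} & a_{11} - a_{12} \end{pmatrix}, \qquad \mathbf{q} = \frac{1}{\Delta}\begin{pmatrix} a_{22} - a_{12} \\ a_{11} - a_{21} \end{pmatrix},$$

    where $\Delta = a_{11} + a_{22} - a_{12} - a_{21}$. Also

    $$v = \frac{|A|}{\Delta}.$$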
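    The closed-form formulas translate line for line into code. This is a sketch assuming the input has no saddle point (so $\Delta \ne 0$); `solve_2x2` is a hypothetical name, and the example matrix is chosen only for illustration.

```python
from fractions import Fraction


def solve_2x2(A):
    """Optimal strategies and value for a 2 x 2 game with no saddle point,
    using the closed-form formulas (exact rational arithmetic)."""
    (a11, a12), (a21, a22) = A
    delta = Fraction(a11 + a22 - a12 - a21)
    p = [(a22 - a21) / delta, (a11 - a12) / delta]   # row player's strategy
    q = [(a22 - a12) / delta, (a11 - a21) / delta]   # column player's strategy
    v = (a11 * a22 - a12 * a21) / delta              # |A| / delta
    return p, q, v


p, q, v = solve_2x2([[3, -1], [-2, 1]])  # p = (3/7, 4/7), q = (2/7, 5/7), v = 1/7
```

    One way to confirm the answer: with $\mathbf{p} = (3/7, 4/7)$, both entries of $\mathbf{p}A$ equal $1/7$, so R receives $v$ no matter which column C picks.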

  • This could get a little weird

    This derivation is not something that needs to be memorized, but should be understood at least once.

  • Objectifying the problem

    Let's think about the problem from the column player's perspective. If she chooses strategy $\mathbf{q}$, and R knew it, he would choose $\mathbf{p}$ to maximize the payoff $\mathbf{p} A \mathbf{q}$. Thus the column player wants to minimize that quantity. That is, C's objective is realized when the payoff is

    $$E = \min_{\mathbf{q}} \left( \max_{\mathbf{p}} \mathbf{p} A \mathbf{q} \right).$$

    This seems hard! Luckily, linearity saves us.

  • From the continuous to the discrete

    Lemma: Regardless of $\mathbf{q}$, we have

    $$\max_{\mathbf{p}} \mathbf{p} A \mathbf{q} = \max_{1 \le i \le m} \mathbf{e}^i A \mathbf{q}.$$

    Here $\mathbf{e}^i$ is the probability vector representing the pure strategy of going only with choice $i$.

    The idea is that a weighted average of things is no bigger than the largest of them. (Think about grades.)

  • Proof of the lemma

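    Proof. We must have

    $$\max_{\mathbf{p}} \mathbf{p} A \mathbf{q} \ge \max_{1 \le i \le m} \mathbf{e}^i A \mathbf{q}$$

    (the maximum over a larger set must be at least as big). On the other hand, let $\mathbf{q}$ be C's strategy, and let the quantity on the right be maximized when $i = i_0$. Let $\mathbf{p}$ be any strategy for R. Notice that $\mathbf{p} = \sum_i p_i \mathbf{e}^i$. So, using $p_i \ge 0$,

    $$E(\mathbf{p}, \mathbf{q}) = \mathbf{p} A \mathbf{q} = \sum_{i=1}^{m} p_i \mathbf{e}^i A \mathbf{q} \le \sum_{i=1}^{m} p_i \mathbf{e}^{i_0} A \mathbf{q} = \left( \sum_{i=1}^{m} p_i \right) \mathbf{e}^{i_0} A \mathbf{q} = \mathbf{e}^{i_0} A \mathbf{q}.$$

    Thus $\max_{\mathbf{p}} \mathbf{p} A \mathbf{q} \le \mathbf{e}^{i_0} A \mathbf{q}$, and the two inequalities together give equality.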
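    The lemma is easy to test empirically: no mixed strategy for R should beat the best pure response. This sketch assumes a random 3 × 4 payoff matrix and random strategies (fixed seed, purely illustrative); `random_strategy` is a hypothetical helper, and a small tolerance absorbs floating-point rounding.

```python
import random

random.seed(0)
A = [[random.randint(-5, 5) for _ in range(4)] for _ in range(3)]   # 3 x 4 payoff matrix


def random_strategy(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


q = random_strategy(4)
Aq = [sum(A[i][j] * q[j] for j in range(4)) for i in range(3)]   # entries e^i A q
best_pure = max(Aq)

for _ in range(10_000):
    p = random_strategy(3)
    pAq = sum(p[i] * Aq[i] for i in range(3))   # p A q as a weighted average of Aq
    assert pAq <= best_pure + 1e-12             # never beats the largest entry
print("max over pure strategies:", best_pure)
```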

  • The next step is to introduce a new variable $v$ representing the value of this inner maximization. Our objective is to minimize it. Saying it's the maximum of all payoffs from pure strategies is the same as saying

    $$v \ge \mathbf{e}^i A \mathbf{q}$$

    for all $i$. So we finally have something that looks like an LP problem! We want to choose $\mathbf{q}$ and $v$ which minimize $v$ subject to the constraints

    $$\begin{aligned} v &\ge \mathbf{e}^i A \mathbf{q}, & i &= 1, 2, \dots, m \\ q_j &\ge 0, & j &= 1, 2, \dots, n \\ \sum_{j=1}^{n} q_j &= 1 \end{aligned}$$

  • Trouble with this formulation

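    - Simplex method with equalities?
    - Not in standard form

    Resolution:

    - We may assume all $a_{ij} > 0$, so $v > 0$ (adding the same constant $c$ to every entry adds $c$ to every expected payoff, so it shifts the value by $c$ without changing the optimal strategies)
    - Let $x_j = \dfrac{q_j}{v}$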

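  • Since we know $v > 0$, we still have $\mathbf{x} \ge 0$. Now

    $$\sum_{j=1}^{n} x_j = \frac{1}{v} \sum_{j=1}^{n} q_j = \frac{1}{v}.$$

    So minimizing $v$ is the same as maximizing $\sum_j x_j$: our problem is now to choose $\mathbf{x} \ge 0$ which maximizes $\sum_j x_j$. The constraints now take the form

    $$v \ge \mathbf{e}^i A \mathbf{q} \iff 1 \ge \mathbf{e}^i A \mathbf{x},$$

    for all $i$. Another way to write this is

    $$A \mathbf{x} \le \mathbf{1},$$

    where $\mathbf{1}$ is the vector consisting of all ones.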
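    The substitution can be checked on a concrete case. This sketch assumes the Rock/Paper/Scissors matrix with 2 added to every entry (so all entries are positive) and, purely for illustration, C's uniform strategy; it computes R's best-response payoff $v$, forms $x_j = q_j / v$, and checks both facts derived above.

```python
from fractions import Fraction

A = [[2, 1, 3],
     [3, 2, 1],
     [1, 3, 2]]                      # Rock/Paper/Scissors with 2 added to every entry
q = [Fraction(1, 3)] * 3             # an illustrative strategy for C
# v = max over i of e^i A q, i.e. R's best-response payoff against q
v = max(sum(A[i][j] * q[j] for j in range(3)) for i in range(3))

x = [qj / v for qj in q]             # the substitution x_j = q_j / v
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

assert all(entry <= 1 for entry in Ax)   # A x <= 1
assert sum(x) == 1 / v                   # x_1 + ... + x_n = 1 / v
# Here v = 2 and x = (1/6, 1/6, 1/6).
```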

  • Upshot

    Theorem: Consider a game with payoff matrix $A$, where each entry of $A$ is positive. The column player's optimal strategy $\mathbf{q}$ is

    $$\mathbf{q} = \frac{\mathbf{x}}{x_1 + \cdots + x_n},$$

    where $\mathbf{x} \ge 0$ solves the LP problem of maximizing $x_1 + \cdots + x_n$ subject to the constraints $A \mathbf{x} \le \mathbf{1}$.
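    The theorem turns directly into a small solver. Below is a minimal sketch, assuming every entry of $A$ is positive and using exact arithmetic from Python's `fractions` module; `column_player_strategy` is a hypothetical name. It pivots with the rule used in the Rock/Paper/Scissors walkthrough (most negative bottom-row entry enters, smallest ratio departs, ties going to the leftmost column or topmost row). That rule can in principle cycle on degenerate problems; Bland's rule is the standard way to rule this out.

```python
from fractions import Fraction


def column_player_strategy(A):
    """Solve  maximize x_1 + ... + x_n  subject to  A x <= 1, x >= 0
    with a dense simplex tableau, then return (q, v) where
    q = x / sum(x) and v = 1 / sum(x).

    Assumes every entry of A is positive, so x = 0 is feasible, the LP is
    bounded, and the ratio test always finds a pivot row.
    """
    m, n = len(A), len(A[0])
    # Row i holds [a_i1 .. a_in | slack columns | right-hand side 1].
    T = [[Fraction(a) for a in A[i]]
         + [Fraction(1 if k == i else 0) for k in range(m)]
         + [Fraction(1)]
         for i in range(m)]
    # Objective row for z - (x_1 + ... + x_n) = 0; its last entry is z.
    obj = [Fraction(-1)] * n + [Fraction(0)] * (m + 1)
    basis = list(range(n, n + m))     # the slack variables start out basic

    while True:
        # Entering variable: most negative objective coefficient (leftmost on ties).
        col = min(range(n + m), key=lambda j: obj[j])
        if obj[col] >= 0:
            break                     # nothing left to improve: optimal
        # Departing variable: smallest ratio rhs / entry over positive entries.
        _, row = min((T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > 0)
        # Pivot: scale the pivot row, then clear the column in every other row.
        pivot = T[row][col]
        T[row] = [a / pivot for a in T[row]]
        for i in range(m):
            if i != row:
                factor = T[i][col]
                T[i] = [a - factor * b for a, b in zip(T[i], T[row])]
        factor = obj[col]
        obj = [a - factor * b for a, b in zip(obj, T[row])]
        basis[row] = col

    # Read off x: basic x-variables take their right-hand side, the rest are 0.
    x = [Fraction(0)] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    z = obj[-1]                       # optimal value of x_1 + ... + x_n
    return [xj / z for xj in x], 1 / z


q, v = column_player_strategy([[2, 1, 3], [3, 2, 1], [1, 3, 2]])  # q = (1/3, 1/3, 1/3), v = 2
```

    The returned $v$ is the value of the shifted game; subtract the constant that was added to make $A$ positive to get the value of the original game.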

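  • Rock/Paper/Scissors

    The payoff matrix is

    $$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}.$$

    We can add 2 to everything to make every entry positive:

    $$A = \begin{pmatrix} 2 & 1 & 3 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}.$$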

  • Convert to LP

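    The problem is to maximize $x_1 + x_2 + x_3$ subject to the constraints

    $$\begin{aligned} 2x_1 + x_2 + 3x_3 &\le 1 \\ 3x_1 + 2x_2 + x_3 &\le 1 \\ x_1 + 3x_2 + 2x_3 &\le 1. \end{aligned}$$

    We introduce slack variables $y_1$, $y_2$, and $y_3$, so the constraints now become

    $$\begin{aligned} 2x_1 + x_2 + 3x_3 + y_1 &= 1 \\ 3x_1 + 2x_2 + x_3 + y_2 &= 1 \\ x_1 + 3x_2 + 2x_3 + y_3 &= 1. \end{aligned}$$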

  • An easy initial basic solution is to let $\mathbf{x} = 0$ and $\mathbf{y} = \mathbf{1}$. The initial tableau is therefore

            x1    x2    x3    y1    y2    y3     z  value
    y1       2     1     3     1     0     0     0      1
    y2       3     2     1     0     1     0     0      1
    y3       1     3     2     0     0     1     0      1
    z       -1    -1    -1     0     0     0     1      0

  • Which should be the entering variable? The coefficients in the bottom row are all the same, so let's just pick one, $x_1$. To find the departing variable, we look at the ratios of the value column to the $x_1$ column: $\frac{1}{2}$, $\frac{1}{3}$, and $\frac{1}{1}$. The smallest is $\frac{1}{3}$, so $y_2$ is the departing variable. We scale row 2 by $\frac{1}{3}$:

            x1    x2    x3    y1    y2    y3     z  value
    y1       2     1     3     1     0     0     0      1
    y2       1   2/3   1/3     0   1/3     0     0    1/3
    y3       1     3     2     0     0     1     0      1
    z       -1    -1    -1     0     0     0     1      0

  • Then we use row operations to zero out the rest of column one:

            x1    x2    x3    y1    y2    y3     z  value
    y1       0  -1/3   7/3     1  -2/3     0     0    1/3
    x1       1   2/3   1/3     0   1/3     0     0    1/3
    y3       0   7/3   5/3     0  -1/3     1     0    2/3
    z        0  -1/3  -2/3     0   1/3     0     1    1/3
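    Each step of the walkthrough is one pivot: scale the pivot row, then clear the pivot column with row operations. This sketch assumes the bottom row of the initial tableau is written as $-1, -1, -1$ (the coefficients of $z - x_1 - x_2 - x_3 = 0$); the helper name `pivot` is hypothetical.

```python
from fractions import Fraction as F

# Columns: x1 x2 x3 y1 y2 y3 z value  (the initial tableau)
tableau = [
    [F(2), F(1), F(3), F(1), F(0), F(0), F(0), F(1)],     # y1
    [F(3), F(2), F(1), F(0), F(1), F(0), F(0), F(1)],     # y2
    [F(1), F(3), F(2), F(0), F(0), F(1), F(0), F(1)],     # y3
    [F(-1), F(-1), F(-1), F(0), F(0), F(0), F(1), F(0)],  # z
]


def pivot(T, row, col):
    """Scale T[row] so the pivot entry is 1, then use row operations to
    zero out column col in every other row (including the z row)."""
    pivot_entry = T[row][col]
    T[row] = [a / pivot_entry for a in T[row]]
    for i in range(len(T)):
        if i != row:
            factor = T[i][col]
            T[i] = [a - factor * b for a, b in zip(T[i], T[row])]


pivot(tableau, row=1, col=0)       # x1 enters, y2 departs
for r in tableau:
    print([str(a) for a in r])
```

    Two more calls, `pivot(tableau, row=0, col=2)` ($x_3$ enters, $y_1$ departs) and `pivot(tableau, row=2, col=1)` ($x_2$ enters, $y_3$ departs), carry out the remaining steps of the example.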

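  • We can still improve this: $-2/3$ is the most negative entry in the bottom row, so $x_3$ is the entering variable, and the ratios $\frac{1/3}{7/3} = \frac{1}{7}$, $\frac{1/3}{1/3} = 1$, and $\frac{2/3}{5/3} = \frac{2}{5}$ make $y_1$ the departing variable. The new tableau is

            x1    x2    x3    y1    y2    y3     z  value
    x3       0  -1/7     1   3/7  -2/7     0     0    1/7
    x1       1   5/7     0  -1/7   3/7     0     0    2/7
    y3       0  18/7     0  -5/7   1/7     1     0    3/7
    z        0  -3/7     0   2/7   1/7     0     1    3/7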

  • Finally, entering x2 and departing y3 gives

            x1    x2    x3    y1    y2    y3     z  value
    x3       0     0     1  7/18 -5/18  1/18     0    1/6
    x1       1     0     0  1/18  7/18 -5/18     0    1/6
    x2       0     1     0 -5/18  1/18  7/18     0    1/6
    z        0     0     0   1/6   1/6   1/6     1    1/2

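  • No entry in the bottom row is negative, so this tableau is optimal. The $x$ variables have values $x_1 = x_2 = x_3 = 1/6$. Furthermore $z = x_1 + x_2 + x_3 = 1/2$, so $v = 1/z = 2$. This also means that $q_1 = q_2 = q_3 = \frac{1/6}{1/2} = 1/3$. So C's optimal strategy is to play each option one third of the time. (Since we added 2 to every entry, the value of the original game is $2 - 2 = 0$.)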

  • Now let's think about the problem from the row player's perspective. If he chooses strategy $\mathbf{p}$, and C knew it, she would choose $\mathbf{q}$ to minimize the payoff $\mathbf{p} A \mathbf{q}$. Thus the row player wants to maximize that quantity. That is, R's objective is realized when the payoff is

    $$E = \max_{\mathbf{p}} \min_{\mathbf{q}} \mathbf{p} A \mathbf{q}.$$

  • Lemma: Regardless of $\mathbf{p}$, we have

    $$\min_{\mathbf{q}} \mathbf{p} A \mathbf{q} = \min_{1 \le j \le n} \mathbf{p} A \mathbf{e}_j.$$

  • The next step is to introduce a new variable $v$ representing the value of this inner minimization. Our objective is to maximize it. Saying it's the minimum of all payoffs from pure strategies is the same as saying

    $$v \le \mathbf{p} A \mathbf{e}_j$$

    for all $j$. Again, we have something that looks like an LP problem! We want to choose $\mathbf{p}$ and $v$ which maximize $v$ subject to the constraints

    $$\begin{aligned} v &\le \mathbf{p} A \mathbf{e}_j, & j &= 1, 2, \dots, n \\ p_i &\ge 0, & i &= 1, 2, \dots, m \\ \sum_{i=1}^{m} p_i &= 1 \end{aligned}$$

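  • As before, we can standardize this by renaming

    $$\mathbf{y} = \frac{1}{v} \mathbf{p}^T$$

    (the transpose makes $\mathbf{y}$ a column vector). Then

    $$\sum_{i=1}^{m} y_i = \frac{1}{v},$$

    so maximizing $v$ is the same as minimizing $\mathbf{1}^T \mathbf{y}$. Likewise, the constraints become $v \le (v \mathbf{y}^T) A \mathbf{e}_j$ for all $j$, or $\mathbf{y}^T A \ge \mathbf{1}^T$, or (taking transposes) $A^T \mathbf{y} \ge \mathbf{1}$. If all the entries of $A$ are positive, then $v > 0$, so the constraints $\mathbf{p} \ge 0$ are satisfied if and only if $\mathbf{y} \ge 0$.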

  • Upshot

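    Theorem: Consider a game with payoff matrix $A$, where each entry of $A$ is positive. The row player's optimal strategy $\mathbf{p}$ is

    $$\mathbf{p} = \frac{\mathbf{y}^T}{y_1 + \cdots + y_m},$$

    where $\mathbf{y} \ge 0$ solves the LP problem of minimizing $y_1 + \cdots + y_m = \mathbf{1}^T \mathbf{y}$ subject to the constraints $A^T \mathbf{y} \ge \mathbf{1}$.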

  • The big idea

    The big observation is this:

    Theorem: The row player's LP problem is the dual of the column player's LP problem.

  • The final tableau in the Rock/Paper/Scissors LP problem was this:

            x1    x2    x3    y1    y2    y3     z  value
    x3       0     0     1  7/18 -5/18  1/18     0    1/6
    x1       1     0     0  1/18  7/18 -5/18     0    1/6
    x2       0     1     0 -5/18  1/18  7/18     0    1/6
    z        0     0     0   1/6   1/6   1/6     1    1/2

    The entries in the objective row below the slack variables are the solutions to the dual problem! In this case $y_1 = y_2 = y_3 = 1/6$, the same values as the $x_j$, which means R has the same strategy as C. This reflects the symmetry of the original game.
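    The dual solution can be read off and checked in a few lines. This sketch hard-codes the bottom row of the final tableau and the shifted Rock/Paper/Scissors matrix; it verifies that $\mathbf{y}$ is feasible for the row player's LP ($A^T\mathbf{y} \ge \mathbf{1}$, $\mathbf{y} \ge 0$) and that $\mathbf{1}^T\mathbf{y}$ equals the primal optimum $z$. Equal objective values for a feasible primal and a feasible dual certify that both are optimal.

```python
from fractions import Fraction as F

A = [[2, 1, 3],
     [3, 2, 1],
     [1, 3, 2]]        # shifted Rock/Paper/Scissors matrix
# Bottom row of the final tableau: x1 x2 x3 y1 y2 y3 z value
z_row = [F(0), F(0), F(0), F(1, 6), F(1, 6), F(1, 6), F(1), F(1, 2)]

y = z_row[3:6]                          # entries below the slack columns
# Feasibility for the row player's LP: A^T y >= 1 and y >= 0.
ATy = [sum(A[i][j] * y[i] for i in range(3)) for j in range(3)]
assert all(entry >= 1 for entry in ATy) and all(yi >= 0 for yi in y)
# The dual objective 1^T y matches the primal optimum z = 1/2.
assert sum(y) == z_row[-1]

p = [yi / sum(y) for yi in y]           # row player's strategy: each entry is 1/3
```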

  • Example

    Consider the game: players R and C each choose a number 1, 2, or 3. If they choose the same thing, C pays R that amount. If they choose differently, R pays C the amount that C has chosen. What should each do?

    Answer.

    Choice    R               C
    1         22.7% (5/22)    54.5% (6/11)
    2         36.4% (4/11)    27.3% (3/11)
    3         40.9% (9/22)    18.2% (2/11)

    The expected payoff is 6/11 ≈ 0.545 to the column player (the value of the game is $v = -6/11$).
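    A proposed answer can be verified without rerunning the simplex method. If R's strategy $\mathbf{p}$ earns at least some amount against every pure strategy of C, and C's strategy $\mathbf{q}$ holds R to at most the same amount against every pure strategy of R, then by the lemmas both strategies are optimal and that amount is the value. This sketch applies the check to the number game, with $\mathbf{p} = (5/22, 4/11, 9/22)$ and $\mathbf{q} = (6/11, 3/11, 2/11)$ as the candidates.

```python
from fractions import Fraction as F

A = [[1, -2, -3],
     [-1, 2, -3],
     [-1, -2, 3]]                    # the number game
p = [F(5, 22), F(4, 11), F(9, 22)]   # candidate strategy for R
q = [F(6, 11), F(3, 11), F(2, 11)]   # candidate strategy for C

# R's payoff with p against each pure strategy of C (the vector p A).
pA = [sum(p[i] * A[i][j] for i in range(3)) for j in range(3)]
# R's payoff with each pure strategy against q (the vector A q).
Aq = [sum(A[i][j] * q[j] for j in range(3)) for i in range(3)]

# R guarantees min(pA); C concedes at most max(Aq). Equality certifies optimality.
assert min(pA) == max(Aq)
print(min(pA), max(Aq))  # -6/11 -6/11
```

    Every entry of $\mathbf{p}A$ and of $A\mathbf{q}$ equals $-6/11$, so neither player can gain by deviating, and the column player wins $6/11$ per play on average.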
