Transcript
  • 3. Simulation

    Environment:

    Bounded Agents

    Results:

    1.  Background

    Myopic decision-making: Cognitive limitations prevent bounded agents from fully considering all possible long-term consequences of their actions.

    Optimism bias: People systematically overestimate the probability of good outcomes [1] and underestimate how long it will take to achieve them [2]. Yet optimists often perform better than realists [1].

    Question: Is it resource-rational to be optimistic?

    Hypothesis: The optimism bias supports rational decision-making by compensating for the limitation that people can look only a small number of steps ahead.

    Does the Optimism Bias Support Rational Action?

    Falk Lieder ∙ Sidharth Goel ∙ Ronald Kwan ∙ Thomas L. Griffiths

    University of California at Berkeley, CA, USA

    † Correspondence: [email protected]

    References:

    [1] T. Sharot, “The optimism bias,” Current Biology, vol. 21, no. 23, pp. R941–R945, 2011.

    [2] R. Buehler, D. Griffin, and M. Ross, “Exploring the ‘planning fallacy’: Why people underestimate their task completion times,” Journal of Personality and Social Psychology, vol. 67, no. 3, p. 366, 1994.

    [3] R. Neumann, A. N. Rafferty, and T. L. Griffiths, “A bounded rationality account of wishful thinking,” in Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.

    [4] R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, 1990.

    [5] P. Auer, “Using confidence bounds for exploitation-exploration trade-offs,” J. Mach. Learn. Res., vol. 3, pp. 397–422, 2003.

    [6] I. Szita and A. Lorincz, “The many faces of optimism: a unifying approach,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1048–1055, ACM, 2008.

    [7] P. Sunehag and M. Hutter, “Rationality, optimism and guarantees in general reinforcement learning,” J. Mach. Learn. Res., vol. 16, pp. 1345–1390, 2015.

    [8] A. Stankevicius, Q. J. M. Huys, A. Kalra, and P. Series, “Optimism as a prior belief about the probability of future reward,” PLoS Comput. Biol., vol. 10, no. 5, e1003605, 2014.

    Acknowledgment: This work was supported by ONR MURI N00014-13-1-0341.

    2.  Model

    Environment: MDP lifetime:

    Bounded Rational Agent

    Internal Model: in which transition probabilities are distorted depending on the value of the transition (cf. [3]):

    Optimism: Realism: Pessimism:

    Decision Mechanism:

    Limited Computational resources:

    Planning Horizon Expected Life Time
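The internal model described above can be illustrated with a minimal Python sketch. It assumes that each true transition probability is multiplied by sigmoid(V(s') − V(s)) raised to the power α and that the result is renormalized; the function names, the two-outcome example, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function; adequate for the small value differences used here."""
    return 1.0 / (1.0 + math.exp(-x))


def distort_transitions(probs, values, current_value, alpha):
    """Reweight true transition probabilities by the value of each transition.

    probs: dict mapping next state -> true transition probability
    values: dict mapping next state -> value of that state
    current_value: value of the current state
    alpha: > 0 gives optimism, 0 gives realism, < 0 gives pessimism
    (Assumed form: weight = p * sigmoid(V(s') - V(s)) ** alpha, renormalized.)
    """
    weights = {
        s2: p * sigmoid(values[s2] - current_value) ** alpha
        for s2, p in probs.items()
    }
    total = sum(weights.values())
    return {s2: w / total for s2, w in weights.items()}


# Hypothetical example: a project either progresses (valuable) or stalls.
true_probs = {"progress": 0.3, "stall": 0.7}
values = {"progress": 1.0, "stall": 0.0}

optimistic = distort_transitions(true_probs, values, 0.0, alpha=2.0)
realistic = distort_transitions(true_probs, values, 0.0, alpha=0.0)
pessimistic = distort_transitions(true_probs, values, 0.0, alpha=-2.0)
```

With α = 0 the weights reduce to the true probabilities, while positive α shifts probability mass toward higher-valued successor states and negative α shifts it away from them.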

    3. Experiment: Does optimism help people decide better?

    •  Product Manager Paradigm:

    1.  Simulation Phase

    2.  Decision Phase

    •  1 episode of an MDP structurally equivalent to simulations

    •  Bonus of up to $1 proportional to financial gain in the game

    •  Participants: 337 adults recruited on Amazon Mechanical Turk

    •  Independent variables:

    1.  Horizon: 5, 12, 24, or 72 steps (months)

    2.  Progress rate in simulations:

    a)  Pessimism (50% of true rate)

    b)  Realism (true rate)

    c)  Optimism: 80%

    •  Dependent variables: number of investments, financial gain

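    Environment: $E = (S, A, T, \gamma, r)$

    Internal model: $M = (S, A, \hat{T}, \gamma, r)$

    Distorted transition probabilities: $\hat{T}_\alpha(s' \mid s, a) \propto T(s' \mid s, a) \cdot \mathrm{sigmoid}\big(V_E^*(s') - V_E^*(s)\big)^{\alpha}$

    $\alpha > 0$: optimism; $\alpha = 0$: realism; $\alpha < 0$: pessimism

    Decision mechanism: $\pi_h(s) = \arg\max_a Q_h(s, a)$

    $Q_h(s, a) = \mathbb{E}_{\hat{T}}\left[ r(s_t, a, S_{t+1}) + \max_\pi \sum_{i=t+1}^{t+h+1} r(S_i, \pi(S_i), S_{i+1}) \right]$

    Planning horizon: $h$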
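A bounded-horizon decision mechanism of this kind can be sketched in Python as a finite-horizon lookahead over the agent's internal model. This is a simplified illustration, not the authors' implementation: the toy investment MDP, the reward values, and the names `build_model`, `reward`, `q_value`, and `greedy_action` are hypothetical; optimism is approximated by inflating the believed success probability; and `horizon` counts the lookahead steps after the first action, which is an assumed index convention.

```python
def build_model(p):
    """Hypothetical internal model: investing succeeds with believed probability p."""
    return {
        "start": {"invest": {"halfway": p, "start": 1 - p}, "quit": {"quit": 1.0}},
        "halfway": {"invest": {"done": p, "halfway": 1 - p}, "quit": {"quit": 1.0}},
        "done": {"stay": {"done": 1.0}},
        "quit": {"stay": {"quit": 1.0}},
    }


def reward(state, action, next_state):
    """Investing costs 1 per step; completing the project pays 10 (assumed values)."""
    cost = -1.0 if action == "invest" else 0.0
    payoff = 10.0 if state != "done" and next_state == "done" else 0.0
    return cost + payoff


def q_value(model, rewards, state, action, horizon):
    """Expected return of `action` followed by `horizon` greedy steps under `model`."""
    total = 0.0
    for next_state, p in model[state][action].items():
        future = 0.0
        if horizon > 0:
            future = max(
                q_value(model, rewards, next_state, a, horizon - 1)
                for a in model[next_state]
            )
        total += p * (rewards(state, action, next_state) + future)
    return total


def greedy_action(model, rewards, state, horizon):
    """Pick the action with the highest bounded-horizon Q-value."""
    return max(model[state], key=lambda a: q_value(model, rewards, state, a, horizon))


# True success probability 0.3; the optimistic agent believes 0.6 (hypothetical).
myopic_realist = greedy_action(build_model(0.3), reward, "start", horizon=1)
myopic_optimist = greedy_action(build_model(0.6), reward, "start", horizon=1)
farsighted_realist = greedy_action(build_model(0.3), reward, "start", horizon=5)
```

In this toy setting the short-sighted realist quits, whereas the short-sighted optimist makes the same choice as the far-sighted realist and invests, which mirrors the hypothesis that optimism can compensate for a limited planning horizon.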