Does the Optimism Bias Support Rational Action?

Falk Lieder, Sidharth Goel, Ronald Kwan, Thomas L. Griffiths
University of California at Berkeley, CA, USA
Correspondence: [email protected]

1. Background

Myopic decision-making: Cognitive limitations prevent bounded agents from fully considering all possible long-term consequences of their actions.

Optimism bias: People systematically overestimate the probability of good outcomes [1] and underestimate how long it will take to achieve them [2]. Yet optimists often perform better than realists [1].

Question: Is it resource-rational to be optimistic?

Hypothesis: The optimism bias supports rational decision-making by compensating for the limitation that people can look only a small number of steps ahead.

2. Model

Environment: MDP $E = (S, A, T, \gamma, r)$ with lifetime $L$.

Bounded Rational Agent

Internal Model: $M = (S, A, \hat{T}, \gamma, r)$, in which transition probabilities are distorted depending on the value of the transition (cf. [3]):

$$\hat{T}_\alpha(s' \mid s, a) \propto T(s' \mid s, a) \cdot \mathrm{sigmoid}\big(V_E^*(s') - V_E^*(s)\big)^{\alpha}$$

Optimism: $\alpha > 0$. Realism: $\alpha = 0$. Pessimism: $\alpha < 0$.

Decision Mechanism:

$$\pi_h(s) = \arg\max_a Q_h(s, a)$$

$$Q_h(s_t, a) = \mathbb{E}_{\hat{T}}\left[ r(s_t, a, S_{t+1}) + \max_\pi \sum_{i=t+1}^{t+h+1} r(S_i, \pi(S_i), S_{i+1}) \right]$$

Limited Computational Resources: the planning horizon is much shorter than the expected lifetime, $h \ll \mathbb{E}[L]$.

3. Simulation

Environment: $T(s' \mid s, a_1) = \mathrm{Binomial}(s' - s;\ n = 100,\ p)$ with $p = 0.2$; rewards $r_0 = +1$, $r_1 = -1$, $r_2 = +100$.

Bounded Agents: $\alpha_{\text{pessimism}} = -10$, $\alpha_{\text{realism}} = 0$, $\alpha_{\text{optimism}} = +10$; planning horizon $h = 5$.

Results: Myopic realists and pessimists failed to work towards the goal. Optimism compensated for the limitations of myopic planning.

[Figure: expected returns (log scale) for myopic pessimism, myopic realism, myopic optimism, and the optimal policy.]

3. Experiment: Does optimism help people decide better?

• Product Manager Paradigm:
  1. Simulation Phase
  2. Decision Phase: 1 episode of an MDP structurally equivalent to the simulations
• Bonus of up to $1, proportional to financial gain in the game
• Participants: 337 adults recruited on Amazon Mechanical Turk
• Independent variables:
  1. Horizon: 5, 12, 24, or 72 steps (months)
  2. Progress rate in simulations:
     a) Pessimism (50% of true rate)
     b) Realism (true rate)
     c) Optimism: 80%
• Dependent variables: number of investments, financial gain

4. Results

Main Experiment: Optimism failed to improve decision-making. There was no significant effect of the belief manipulation in either analysis (F(2,306) = 2.32, p = 0.1000; F(2,306) = 2.36, p = 0.0961).

[Figure: financial gain (± 1 SEM) by horizon (5, 12, 24, 72) for optimism, realism, and pessimism.]

[Figure: investment frequency (in %) by horizon (5, 12, 24, 72) for optimism, realism, and pessimism.]

Control Experiments: Belief manipulations were effective. People's estimates were well explained as Bayesian learning from the observed progress with a Beta prior whose mean corresponds to 28% progress per step and whose precision corresponds to about 300 observations.

[Figure: Bayesian Model of Optimism. Estimated rate of progress against presented rate of progress; observations and fit of the Bayesian model.]

[Figure: estimated progress rate (in %) by task duration in months for pessimism, realism, and optimism.]

5. Discussion

Theory: Our theory of optimism generalizes the model of [3] to general MDPs. In contrast to previous notions of optimism in RL [4-7], we model optimism as a distortion of the transition probabilities. This distortion can be beneficial even when the environment is fully known. It may result from an optimistic prior on transition probabilities (cf. [8]).

Experiments: While the simulations supported this perspective, the behavioral experiment did not.

Limitation of the experiment: People could solve the task without planning by calculating how many steps it would take to fully develop HoverCar. Future experiments will eliminate this confound, e.g. by using the scenario of [3].

Acknowledgment: This work was supported by ONR MURI N00014-13-1-0341.

References:
[1] T. Sharot, "The optimism bias," Current Biology, vol. 21, no. 23, pp. R941–R945, 2011.
[2] R. Buehler, D. Griffin, and M. Ross, "Exploring the 'planning fallacy': Why people underestimate their task completion times," Journal of Personality and Social Psychology, vol. 67, no. 3, p. 366, 1994.
[3] R. Neumann, A. N. Rafferty, and T. L. Griffiths, "A bounded rationality account of wishful thinking," in Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.
[4] R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, 1990.
[5] P. Auer, "Using confidence bounds for exploitation-exploration trade-offs," J. Mach. Learn. Res., vol. 3, pp. 397–422, 2003.
[6] I. Szita and A. Lorincz, "The many faces of optimism: a unifying approach," in Proceedings of the 25th International Conference on Machine Learning, pp. 1048–1055, ACM, 2008.
[7] P. Sunehag and M. Hutter, "Rationality, optimism and guarantees in general reinforcement learning," J. Mach. Learn. Res., vol. 16, pp. 1345–1390, 2015.
[8] A. Stankevicius, Q. J. M. Huys, A. Kalra, and P. Series, "Optimism as a prior belief about the probability of future reward," PLoS Comput. Biol., vol. 10, no. 5, e1003605, 2014.
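The Bayesian account of participants' estimates in the Results (a Beta prior with mean 28% and a precision of about 300 observations) can be illustrated with a small worked example. This is only a sketch under stated assumptions: it reads "precision" as the prior pseudo-count a0 + b0 = 300, treats each observed step as a success/failure outcome, and uses hypothetical presented rates and a hypothetical sample size of 100 simulated steps. The function name `beta_posterior_mean` is introduced here for illustration and does not come from the poster.

```python
def beta_posterior_mean(prior_mean, prior_strength, successes, trials):
    """Posterior mean of a rate under a Beta prior after binomial data.

    The prior is Beta(a0, b0) with a0 + b0 = prior_strength (pseudo-count)
    and a0 / (a0 + b0) = prior_mean. Conjugacy gives the posterior
    Beta(a0 + successes, b0 + trials - successes).
    """
    a0 = prior_mean * prior_strength
    b0 = (1.0 - prior_mean) * prior_strength
    return (a0 + successes) / (a0 + b0 + trials)


PRIOR_MEAN = 0.28        # from the poster: 28% progress per step
PRIOR_STRENGTH = 300.0   # from the poster: about 300 observations
N_OBSERVED = 100         # hypothetical number of simulated steps shown

# Hypothetical presented rates; the estimate is pulled towards the prior mean.
for presented_rate in (0.10, 0.20, 0.40):
    successes = presented_rate * N_OBSERVED
    estimate = beta_posterior_mean(PRIOR_MEAN, PRIOR_STRENGTH,
                                   successes, N_OBSERVED)
    print(f"presented {presented_rate:.0%} -> estimated {estimate:.1%}")
```

Under these assumptions a strong prior of this kind keeps estimates close to 28% even when the presented rate differs substantially, because 100 observations carry only a quarter of the total weight.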

Does the Optimism Bias Support Rational Action? cocosci.princeton.edu/falk/Optimism.pdf


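The bounded agent of the Model section (transition probabilities distorted by the value of the transition, combined with a short planning horizon h) can be sketched in a few lines. This is a minimal illustration, not the authors' simulation code: the three-step chain, the per-step goal payoff of 10, the discount factor 0.95, and the horizon h = 3 are all hypothetical choices, and the lookahead is implemented as a discounted finite-horizon backup rather than the exact sum in the poster's definition of Q_h. The distortion follows the form T-hat proportional to T times sigmoid(V*(s') - V*(s)) raised to the power alpha.

```python
import math

GAMMA = 0.95                 # discount factor (hypothetical value)
GOAL = 3                     # index of the absorbing goal state (hypothetical)
STATES = range(GOAL + 1)
WAIT, INVEST = 0, 1
ACTIONS = (WAIT, INVEST)
P_ADVANCE = 0.2              # true chance that investing advances one step


def true_T(s, a):
    """True transition distribution {next_state: prob} of the toy chain."""
    if s == GOAL or a == WAIT:
        return {s: 1.0}
    return {s + 1: P_ADVANCE, s: 1.0 - P_ADVANCE}


def reward(s, a):
    """Hypothetical rewards: payoff at the goal, small gain/cost elsewhere."""
    if s == GOAL:
        return 10.0
    return 1.0 if a == WAIT else -1.0


def backup(T, V, s, a):
    """Expected return of action a in state s under model T and values V."""
    return reward(s, a) + GAMMA * sum(p * V[s2] for s2, p in T(s, a).items())


def optimal_values(T, sweeps=2000):
    """Value iteration for V* in the true environment."""
    V = [0.0 for _ in STATES]
    for _ in range(sweeps):
        V = [max(backup(T, V, s, a) for a in ACTIONS) for s in STATES]
    return V


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def distorted_T(alpha, V_star):
    """Internal model: reweight T by sigmoid(V*(s') - V*(s))**alpha."""
    def T_hat(s, a):
        weights = {s2: p * sigmoid(V_star[s2] - V_star[s]) ** alpha
                   for s2, p in true_T(s, a).items()}
        total = sum(weights.values())
        return {s2: w / total for s2, w in weights.items()}
    return T_hat


def myopic_action(s, alpha, h, V_star):
    """Greedy action after h lookahead backups under the distorted model."""
    T_hat = distorted_T(alpha, V_star)
    V = [0.0 for _ in STATES]
    for _ in range(h):
        V = [max(backup(T_hat, V, s2, a) for a in ACTIONS) for s2 in STATES]
    return max(ACTIONS, key=lambda a: backup(T_hat, V, s, a))


V_STAR = optimal_values(true_T)
for name, alpha in (("pessimism", -10), ("realism", 0), ("optimism", 10)):
    choice = myopic_action(0, alpha, h=3, V_star=V_STAR)
    print(name, "invests" if choice == INVEST else "waits")
```

In this toy chain investing is the optimal policy, but with h = 3 only the optimistic planner chooses to invest from the start state; the realistic and pessimistic planners wait because the goal payoff looks too unlikely within their horizon. The numbers are illustrative and should not be read as a reproduction of the poster's simulation results.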