Does the Optimism Bias Support Rational Action?
cocosci.princeton.edu/falk/Optimism.pdf

Falk Lieder ∙ Sidharth Goel ∙ Ronald Kwan ∙ Thomas L. Griffiths
University of California at Berkeley, CA, USA
† Correspondence: [email protected]

1. Background
Myopic decision-making: Cognitive limitations prevent bounded agents from fully considering all possible long-term consequences of their actions.
Optimism bias: People systematically overestimate the probability of good outcomes [1] and underestimate how long it will take to achieve them [2]. Yet, optimists often perform better than realists [1].
Question: Is it resource-rational to be optimistic?
Hypothesis: The optimism bias supports rational decision-making by compensating for the limitation that people can look only a small number of steps ahead.

2. Model
Environment: MDP \( E = (S, A, T, \gamma, r) \) with lifetime \( L \).
Bounded Rational Agent
Internal Model: \( M = (S, A, \hat{T}, \gamma, r) \), in which transition probabilities are distorted depending on the value of the transition (cf. [3]):
\[ \hat{T}(s' \mid s, a) \propto T(s' \mid s, a) \cdot \mathrm{sigmoid}\big( V_E^*(s') - V_E^*(s) \big)^{\alpha} \]
Optimism: \( \alpha > 0 \)
Realism: \( \alpha = 0 \)
Pessimism: \( \alpha < 0 \)
Decision Mechanism:
\[ \pi_h(s) = \arg\max_a Q_h(s, a) \]
\[ Q_h(s, a) = \mathbb{E}_{\hat{T}} \left[ r(s_t, a, S_{t+1}) + \max_{\pi} \sum_{i=t+1}^{t+h+1} r(S_i, \pi(S_i), S_{i+1}) \right] \]
Limited computational resources: the planning horizon \( h \) is much shorter than the expected lifetime, \( h \ll \mathbb{E}[L] \).

3. Simulation
Environment: \( T(s' \mid s, a_1) = \mathrm{Binomial}(s' - s;\ n = 100,\ p) \) with \( p = 0.2 \); \( r_0 = +1 \), \( r_1 = -1 \), \( r_2 = +100 \).
Bounded Agents: \( h = 5 \); \( \alpha_{\text{pessimism}} = -10 \), \( \alpha_{\text{realism}} = 0 \), \( \alpha_{\text{optimism}} = +10 \).
Results:
• Myopic realists and pessimists failed to work towards the goal.
• Optimism compensated for the limitations of myopic planning.
[Figure: Expected returns (log scale, 10^1 to 10^3) for Myopic Pessimism, Myopic Realism, Myopic Optimism, and Optimal.]

3. Experiment: Does optimism help people decide better?
• Product Manager Paradigm:
  1. Simulation Phase
  2. Decision Phase
• 1 episode of an MDP structurally equivalent to simulations
• Bonus of up to $1 proportional to financial gain in the game
• Participants: 337 adults recruited on Amazon Mechanical Turk
• Independent variables:
  1. Horizon: 5, 12, 24, or 72 steps (months)
  2. Progress rate in simulations:
     a) Pessimism (50% of true rate)
     b) Realism (true rate)
     c) Optimism: 80%
• Dependent variables: #investments, financial gain

4. Results
Main Experiment: Optimism failed to improve decision-making.
• No significant effect of belief manipulation (F(2,306) = 2.32, p = 0.1000).
• No significant effect of belief manipulation (F(2,306) = 2.36, p = 0.0961).
[Figure: Financial gain (±1 SEM) by horizon (5, 12, 24, 72) for Optimism, Realism, and Pessimism.]
[Figure: Investment frequency (in %) by horizon (5, 12, 24, 72) for Optimism, Realism, and Pessimism.]
Control Experiments: Belief manipulations were effective.
People's estimates were well explained as Bayesian learning from the observed progress with a Beta prior whose mean corresponds to 28% progress per step and a precision corresponding to about 300 observations.
[Figure: Bayesian Model of Optimism. Estimated rate of progress vs. presented rate of progress; observations and fit of the Bayesian model.]
[Figure: Estimated progress rate (in %) by task duration in months (72, 24, 12, 4) for Pessimism, Realism, and Optimism.]

5. Discussion
Theory:
• Our theory of optimism generalizes the model by [3] to general MDPs.
• In contrast to previous notions of optimism in RL [4-7], we model optimism as a distortion of the transition probabilities.
• This distortion can be beneficial even when the environment is fully known.
• It may result from an optimistic prior on transition probabilities (cf. [8]).
Experiments:
• While the simulations supported this perspective, the behavioral experiment did not.
• Limitation of the experiment: People could solve the task without planning by calculating how many steps it would take to fully develop HoverCar.
• Future experiments will eliminate this confound, e.g. using the scenario of [3].

References:
[1] T. Sharot, “The optimism bias,” Current Biology, vol. 21, no. 23, pp. R941–R945, 2011.
[2] R. Buehler, D. Griffin, and M. Ross, “Exploring the ‘planning fallacy’: Why people underestimate their task completion times,” Journal of Personality and Social Psychology, vol. 67, no. 3, p. 366, 1994.
[3] R. Neumann, A. N. Rafferty, and T. L. Griffiths, “A bounded rationality account of wishful thinking,” in Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.
[4] R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, 1990.
[5] P. Auer, “Using confidence bounds for exploitation-exploration trade-offs,” J. Mach. Learn. Res., vol. 3, pp. 397–422, 2003.
[6] I. Szita and A. Lorincz, “The many faces of optimism: a unifying approach,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1048–1055, ACM, 2008.
[7] P. Sunehag and M. Hutter, “Rationality, optimism and guarantees in general reinforcement learning,” J. Mach. Learn. Res., vol. 16, pp. 1345–1390, 2015.
[8] A. Stankevicius, Q. J. M. Huys, A. Kalra, and P. Series, “Optimism as a prior belief about the probability of future reward,” PLoS Comput. Biol., vol. 10, no. 5, e1003605, 2014.

Acknowledgment: This work was supported by ONR MURI N00014-13-1-0341.
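
Illustration: The two calculations the poster relies on, the value-dependent distortion of transition probabilities (Section 2) and Bayesian learning of the progress rate from a Beta prior (Section 4), can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code. It assumes that the sigmoid of the value difference is raised to the power α and that the result is renormalized, and it reads "precision" as the prior's pseudo-observation count a + b. The function names and the two-outcome example are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def distort_transitions(probs, next_values, current_value, alpha):
    """Reweight true transition probabilities by sigmoid(V(s') - V(s)) ** alpha.

    ASSUMPTION: the exponent form and the renormalization are one reading of
    the poster's distortion formula.
    alpha > 0: optimism, alpha == 0: realism, alpha < 0: pessimism.
    """
    # Work in log space so large |alpha| cannot overflow.
    log_w = [
        math.log(p) + alpha * log_sigmoid(v - current_value) if p > 0 else -math.inf
        for p, v in zip(probs, next_values)
    ]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def beta_posterior_mean(prior_mean, prior_precision, successes, trials):
    """Posterior mean of a rate under a Beta prior after binomial observations.

    ASSUMPTION: prior_precision is the pseudo-count a + b of the Beta prior.
    """
    a = prior_mean * prior_precision + successes
    b = (1.0 - prior_mean) * prior_precision + (trials - successes)
    return a / (a + b)


# Hypothetical example: two successor states, the second one more valuable.
true_probs = [0.8, 0.2]
successor_values = [0.0, 1.0]
for label, alpha in [("pessimism", -10), ("realism", 0), ("optimism", 10)]:
    print(label, distort_transitions(true_probs, successor_values, 0.0, alpha))

# Prior from the poster: mean 28% progress per step, about 300 observations.
print(beta_posterior_mean(0.28, 300, successes=10, trials=100))
```

With α = 0 the distorted model equals the true one. Positive α shifts probability mass towards the higher-valued successor and negative α shifts it away. The Beta example shows why estimates stay close to the prior: 100 new observations are outweighed by 300 pseudo-observations.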

