3. Simulation
Environment:
Bounded Agents
Results:
1. Background
Myopic decision-making: Cognitive limitations prevent bounded agents from fully considering all possible long-term consequences of their actions.
Optimism bias: People systematically overestimate the probability of good outcomes [1] and underestimate how long it will take to achieve them [2]. Yet, optimists often perform better than realists [1].
Question: Is it resource-rational to be optimistic?
Hypothesis: The optimism bias supports rational decision-making by compensating for the limitation that people can look only a small number of steps ahead.
Does the Optimism Bias Support Rational Action?
Falk Lieder ∙ Sidharth Goel ∙ Ronald Kwan ∙ Thomas L. Griffiths
University of California at Berkeley, CA, USA
† Correspondence: [email protected]
References:
[1] T. Sharot, “The optimism bias,” Current Biology, vol. 21, no. 23, pp. R941–R945, 2011.
[2] R. Buehler, D. Griffin, and M. Ross, “Exploring the ‘planning fallacy’: Why people underestimate their task completion times,” Journal of Personality and Social Psychology, vol. 67, no. 3, p. 366, 1994.
[3] R. Neumann, A. N. Rafferty, and T. L. Griffiths, “A bounded rationality account of wishful thinking,” in Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.
[4] R. S. Sutton, “Integrated architectures for learning, planning, and reacting based on approximating dynamic programming,” in Proceedings of the Seventh International Conference on Machine Learning, pp. 216–224, 1990.
[5] P. Auer, “Using confidence bounds for exploitation-exploration trade-offs,” J. Mach. Learn. Res., vol. 3, pp. 397–422, 2003.
[6] I. Szita and A. Lorincz, “The many faces of optimism: a unifying approach,” in Proceedings of the 25th International Conference on Machine Learning, pp. 1048–1055, ACM, 2008.
[7] P. Sunehag and M. Hutter, “Rationality, optimism and guarantees in general reinforcement learning,” J. Mach. Learn. Res., vol. 16, pp. 1345–1390, 2015.
[8] A. Stankevicius, Q. J. M. Huys, A. Kalra, and P. Series, “Optimism as a prior belief about the probability of future reward,” PLoS Comput. Biol., vol. 10, no. 5, p. e1003605, 2014.
Acknowledgment: This work was supported by ONR MURI N00014-13-1-0341.
2. Model
Environment: MDP lifetime:
Bounded Rational Agent
Internal Model: in which transition probabilities are distorted depending on the value of the transition (cf. [3]):
Optimism: Realism: Pessimism:
Decision Mechanism:
Limited Computational resources:
Planning Horizon Expected Life Time
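The value-dependent distortion of the internal model can be illustrated with a minimal sketch. It assumes that each true transition probability is weighted by sigmoid(V(s') − V(s)) raised to the power α and that the weights are then renormalised to sum to one; the normalisation step, the function names, and the two-outcome example are illustrative assumptions, not the authors' implementation.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def distort_transitions(probs, v_current, v_next, alpha):
    """Reweight true transition probabilities by the value of each transition.

    probs:     dict mapping next state -> true probability T(s'|s,a)
    v_current: value of the current state, V(s)
    v_next:    dict mapping next state -> value V(s')
    alpha:     > 0 optimism, 0 realism, < 0 pessimism
    """
    weights = {
        s2: p * sigmoid(v_next[s2] - v_current) ** alpha
        for s2, p in probs.items()
    }
    total = sum(weights.values())  # renormalise (assumed step)
    return {s2: w / total for s2, w in weights.items()}


# Hypothetical example: a project that rarely makes progress.
true_probs = {"progress": 0.2, "stall": 0.8}
values = {"progress": 1.0, "stall": 0.0}
v_now = 0.0

optimist = distort_transitions(true_probs, v_now, values, alpha=2.0)
realist = distort_transitions(true_probs, v_now, values, alpha=0.0)
pessimist = distort_transitions(true_probs, v_now, values, alpha=-2.0)
```

In this toy case the optimist assigns a higher probability to "progress" than the realist, the pessimist a lower one, and α = 0 returns the true probabilities unchanged.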
3. Experiment: Does optimism help people decide better?
• Product Manager Paradigm:
1. Simulation Phase
2. Decision Phase
• 1 episode of an MDP structurally equivalent to simulations
• Bonus of up to $1 proportional to financial gain in the game
• Participants: 337 adults recruited on Amazon Mechanical Turk
• Independent variables:
1. Horizon: 5, 12, 24, or 72 steps (months)
2. Progress rate in simulations:
a) Pessimism (50% of true rate)
b) Realism (true rate)
c) Optimism: 80%
• Dependent variables: #investments, financial gain
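Environment: $E = (S, A, T, \gamma, r)$
Internal model: $M = (S, A, \hat{T}, \gamma, r)$
Distorted transition probabilities: $\hat{T}_\alpha(s' \mid s, a) \propto T(s' \mid s, a) \cdot \mathrm{sigmoid}\big(V_E^*(s') - V_E^*(s)\big)^{\alpha}$
Optimism: $\alpha > 0$; realism: $\alpha = 0$; pessimism: $\alpha < 0$
Decision mechanism: $\pi_h(s) = \arg\max_a Q_h(s, a)$
$Q_h(s, a) = \mathbb{E}_{\hat{T}}\left[ r(s_t, a, S_{t+1}) + \max_\pi \sum_{i=t+1}^{t+h+1} r(S_i, \pi(S_i), S_{i+1}) \right]$
Planning horizon: $h$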
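The horizon-limited decision mechanism can be sketched as a finite-horizon look-ahead over the agent's internal model. This is a minimal sketch under stated assumptions: it uses standard recursive dynamic programming without discounting, counts h as the number of further steps planned after the chosen action, and runs on a hypothetical three-stage project MDP whose states, rewards, and probabilities are invented for illustration. An inflated progress probability stands in for the distorted model.

```python
def q_h(model, reward, state, action, h):
    """Expected return of `action` followed by h further planned steps.

    model[state][action] maps next states to the probabilities the agent
    believes in (its internal model, not necessarily the true one).
    """
    total = 0.0
    for s2, p in model[state][action].items():
        future = 0.0
        if h > 0:
            future = max(q_h(model, reward, s2, a2, h - 1) for a2 in model[s2])
        total += p * (reward(state, action, s2) + future)
    return total


def policy_h(model, reward, state, h):
    """Action that maximises the h-step look-ahead value."""
    return max(model[state], key=lambda a: q_h(model, reward, state, a, h))


def make_model(p):
    """Hypothetical project: stage 0 -> stage 1 -> stage 2 (finished)."""
    model = {2: {"wait": {2: 1.0}}}
    for s in (0, 1):
        model[s] = {"invest": {s + 1: p, s: 1.0 - p}, "wait": {s: 1.0}}
    return model


def reward(s, a, s2):
    """Investing costs 1 per step; finishing the project pays 10 (assumed)."""
    cost = -1.0 if a == "invest" else 0.0
    payoff = 10.0 if (s, s2) == (1, 2) else 0.0
    return cost + payoff


true_model = make_model(0.3)        # true progress probability (assumed)
optimistic_model = make_model(0.6)  # inflated belief about progress

myopic_realist = policy_h(true_model, reward, 0, h=1)         # "wait"
myopic_optimist = policy_h(optimistic_model, reward, 0, h=1)  # "invest"
farsighted_realist = policy_h(true_model, reward, 0, h=6)     # "invest"
```

In this toy problem the short-sighted realist declines to invest, whereas the short-sighted optimist makes the same choice as a realist who plans six steps ahead. This only illustrates the hypothesis on invented numbers and says nothing about the reported simulations or experiment.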