2
POSTER PRESENTATION Open Access An abstract model of the basal ganglia, reward learning and action selection Pierre Berthet 1,2* , Anders Lansner 1,2,3 From Twentieth Annual Computational Neuroscience Meeting: CNS*2011 Stockholm, Sweden. 23-28 July 2011 Learning is of major interest for various fields, from human pathologies to artificial intelligence. Animal and biology experiments have provided a great amount of data in this area, especially in classical and operant con- ditioning. Reinforcement learning is largely used in computational models in order to reproduce and explain these observations. This method enables to model a wide range of phenomena from the neuronal level to the behavior of a whole organism. Studying how infor- mation about reward or punishment is processed in the brain is crucial in the understanding of the action selec- tion and decision-making in normal and pathological conditions. Basal Ganglia are a group of nuclei playing a major role in processing motor, associative and limbic information [1] and could be specialized in resolving conflicts between these sub-systems that compete for access to limited cognitive ressources [2]. Dopaminergic neurons activity is believed to be related to the reward prediction error and is involved in long term potentia- tion (LTP) and depression (LTD) in the striatum [3]. We present here an abstract computational model of the Basal Ganglia using reinforcement learning and Bayesian inference. It is based on a dual pathway archi- tecture similar to the direct / Go and indirect / NoGo pathways that have been found in biology [4]. The units could be seen as grandmother cells, with inputs consist- ing of states and outputs being actions. Thus as a result of the activity in both pathways, an action will be selected based on the state and the previous outcomes of the different actions when dealing with the same state, that is if a reward has been obtained or not. One aim of our model is to be biologically plausible, weights can be updated via a triple activation Hebbian learning rule, similar to pre-synaptic activity, post-synaptic depo- larisation and dopamine level [5]. The update equation is based on the probability of co-activity of the different active units. In its current form, the model uses trace activation and a simple delay mechanism. Basically, when a reward occurs, the weight between the active units is increased in the Go projection while it is decreased in the NoGo connection. In a one to one mapping learning scheme (only one specific action for a given state triggers a reward), the simulation shows good results in both learning and re- learning, i.e. the mapping is then shuffled. One interesting feature is the homeostasis property of the units: the variations of the sum of the outgoing and ingoing weights and bias, of a particular unit, are very small. In constrained set up (low numbers of possible actions and states), similar to experimental design, learning in stochastic reward occurrence is handled and the results are similar to the data. The response of mid- brain dopamine neurons is positively correlated with the number of unrewarded trials [6] and our model pro- duces a similar result with the prediction-error value. Future works will focus on reproducing conditioning phenomena and implementing spiking neurons. Acknowledgements FACETS-ITN 237955, Swedish Scientific Counsel Author details 1 Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, 106 91, Sweden. 2 Stockholm Brain Institute, Karolinska Institutet Stockholm, 171 77, Sweden. 3 Department of Computational Biology, Royal Institute of Technology (KTH), Stockholm, 106 91, Sweden. Published: 18 July 2011 References 1. Mink JW: The basal ganglia: focused selection and inhibition of competing motor programs. Progress in neurobiology 1996, 50(4):381-425. * Correspondence: [email protected] 1 Department of Numerical Analysis and Computer Science, Stockholm University, Stockholm, 106 91, Sweden Full list of author information is available at the end of the article Berthet and Lansner BMC Neuroscience 2011, 12(Suppl 1):P189 http://www.biomedcentral.com/1471-2202/12/S1/P189 © 2011 Berthet and Lansner; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An abstract model of the basal ganglia, reward learning and action selection

Embed Size (px)

Citation preview

POSTER PRESENTATION Open Access

An abstract model of the basal ganglia, rewardlearning and action selectionPierre Berthet1,2*, Anders Lansner1,2,3

From Twentieth Annual Computational Neuroscience Meeting: CNS*2011Stockholm, Sweden. 23-28 July 2011

Learning is of major interest for various fields, fromhuman pathologies to artificial intelligence. Animal andbiology experiments have provided a great amount ofdata in this area, especially in classical and operant con-ditioning. Reinforcement learning is largely used incomputational models in order to reproduce and explainthese observations. This method enables to model awide range of phenomena from the neuronal level tothe behavior of a whole organism. Studying how infor-mation about reward or punishment is processed in thebrain is crucial in the understanding of the action selec-tion and decision-making in normal and pathologicalconditions. Basal Ganglia are a group of nuclei playing amajor role in processing motor, associative and limbicinformation [1] and could be specialized in resolvingconflicts between these sub-systems that compete foraccess to limited cognitive ressources [2]. Dopaminergicneurons activity is believed to be related to the rewardprediction error and is involved in long term potentia-tion (LTP) and depression (LTD) in the striatum [3].We present here an abstract computational model of

the Basal Ganglia using reinforcement learning andBayesian inference. It is based on a dual pathway archi-tecture similar to the direct / Go and indirect / NoGopathways that have been found in biology [4]. The unitscould be seen as grandmother cells, with inputs consist-ing of states and outputs being actions. Thus as a resultof the activity in both pathways, an action will beselected based on the state and the previous outcomesof the different actions when dealing with the samestate, that is if a reward has been obtained or not. Oneaim of our model is to be biologically plausible, weightscan be updated via a triple activation Hebbian learning

rule, similar to pre-synaptic activity, post-synaptic depo-larisation and dopamine level [5]. The update equationis based on the probability of co-activity of the differentactive units. In its current form, the model uses traceactivation and a simple delay mechanism. Basically,when a reward occurs, the weight between the activeunits is increased in the Go projection while it isdecreased in the NoGo connection.In a one to one mapping learning scheme (only one

specific action for a given state triggers a reward), thesimulation shows good results in both learning and re-learning, i.e. the mapping is then shuffled.One interesting feature is the homeostasis property of

the units: the variations of the sum of the outgoing andingoing weights and bias, of a particular unit, are verysmall. In constrained set up (low numbers of possibleactions and states), similar to experimental design,learning in stochastic reward occurrence is handled andthe results are similar to the data. The response of mid-brain dopamine neurons is positively correlated with thenumber of unrewarded trials [6] and our model pro-duces a similar result with the prediction-error value.Future works will focus on reproducing conditioningphenomena and implementing spiking neurons.

AcknowledgementsFACETS-ITN 237955, Swedish Scientific Counsel

Author details1Department of Numerical Analysis and Computer Science, StockholmUniversity, Stockholm, 106 91, Sweden. 2Stockholm Brain Institute, KarolinskaInstitutet Stockholm, 171 77, Sweden. 3Department of ComputationalBiology, Royal Institute of Technology (KTH), Stockholm, 106 91, Sweden.

Published: 18 July 2011

References1. Mink JW: The basal ganglia: focused selection and inhibition of

competing motor programs. Progress in neurobiology 1996, 50(4):381-425.

* Correspondence: [email protected] of Numerical Analysis and Computer Science, StockholmUniversity, Stockholm, 106 91, SwedenFull list of author information is available at the end of the article

Berthet and Lansner BMC Neuroscience 2011, 12(Suppl 1):P189http://www.biomedcentral.com/1471-2202/12/S1/P189

© 2011 Berthet and Lansner; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the CreativeCommons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, andreproduction in any medium, provided the original work is properly cited.

2. Redgrave P, Prescott T, Gurney K: The basal ganglia: a vertebrate solutionto the selection problem? Neuroscience 1999, 89(4):1009-1023.

3. Cohen MX, Frank MJ: Neurocomputational models of basal gangliafunction in learning, memory and choice. Behavioural brain research 2009,199(1):141-156.

4. Surmeier DJ, Ding J, Day M, Wang Z, Shen W: D1 and D2 dopamine-receptor modulation of striatal glutamatergic signaling in striatalmedium spiny neurons. Trends in neurosciences 2007, 30(5):228-235.

5. Reynolds JNJ, Wickens JR: Dopamine-dependent plasticity ofcorticostriatal synapses. Neural Networks 2002, 15(4-6):507-521.

6. Nakahara H, Itoh H, Kawagoe R, Takikawa Y, Hikosaka O: Dopamineneurons can represent context-dependent prediction error. Neuron 2004,41(2):269-280.

doi:10.1186/1471-2202-12-S1-P189Cite this article as: Berthet and Lansner: An abstract model of the basalganglia, reward learning and action selection. BMC Neuroscience 2011 12(Suppl 1):P189.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit

Berthet and Lansner BMC Neuroscience 2011, 12(Suppl 1):P189http://www.biomedcentral.com/1471-2202/12/S1/P189

Page 2 of 2