20
Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning Train agent to act in an environment by maximizing reward Comparison Supervised learning: Exact action given Unsupervised learning: No reward given Deep Reinforcement Learning Agents use neural network policies REINFORCEMENT LEARNING 2 In Reinforcement Learning, we try to learn a policy of an agent that acts in an environment. The agent should act ‘optimally’, in the sense that it should maximize rewards it receives from the environment. This is different from supervised learning, where agents are told exactly what is the best action to perform. The agent just receives a reward from the environment, but no feedback as to what action leads to good rewards. Furthermore, it’s also not like unsupervised learning, where no explicit feedback on how to act is given. We do receive rewards! REINFORCEMENT LEARNING LOOP 3 learner policy environment state s t reward r t ac9on a t - This is the standard reinforcement loop that most approaches follow. We have an environment, like a game, or the real world in which a robot has to act. We assume it is black-box, which means that we have no information about what the environment looks like. The environment receives actions from the agent, which follows a policy. A policy is a probability distribution that suggests which action to take in some state. Of course, it receives the state again from the environment. - This loop happens for every timestep t. The environment presents a state s_t to the policy \pi_\theta, which then chooses an action a_t. The environment does some magic, and chooses the next state s_{t+1}. It also returns a reward r_{t+1} for that time step, which could be received because the agent achieves some goal in the environment. This reward is then used in the learner to update the policy parameters.

Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Emile van Krieken Deep Learning 2020

Lecture 9: Reinforcement Learning

dlvu.github.io

Reinforcement learning • Train agent to act in an environment by maximizing reward Comparison • Supervised learning: Exact action given • Unsupervised learning: No reward given Deep Reinforcement Learning • Agents use neural network policies

REINFORCEMENT LEARNING

2

In Reinforcement Learning, we try to learn a policy of an agent that acts in an environment. The agent should act ‘optimally’, in the sense that it should maximize rewards it receives from the environment.

This is different from supervised learning, where agents are told exactly what is the best action to perform. The agent just receives a reward from the environment, but no feedback as to what action leads to good rewards.

Furthermore, it’s also not like unsupervised learning, where no explicit feedback on how to act is given. We do receive rewards!

REINFORCEMENT LEARNING LOOP

3

learnerpolicy <latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓environment

state <latexit sha1_base64="/pzW3Mv4fuxhZwPDsc7BaKD9MQQ=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEuhN3z/etQyu/zPrA3gz2jc11eveitT+ZEhiHKBIQA85vbIuK2wQwEUCM0r1JzBEF8B746CYWs95tEkQ0FiiCqflSarMYm4KYGZM5DRiCAq/kAEAWyAQTzgEDUEjyvWoURxEIET+YLgLK10O+8NcDAeRt3ybLvCxpZWHiM0DnAVxWyBIQ8hCIeW2Sr0KvOolijNgirE5mlJJRcS4RgwHPanAqC/OeZpXm5+R0o89XdI4iniYxw2l5oRQQY2gmF+ZDjkRMk/xm5OO9568Ei9FBNsznXg0Buz9D0wOZU5mo4swwAaI65YVZcSL0AEkYgmiaTGiaTARaimRycJhK8aXpYWn2CGDTqvMsTZJJVjPPM8+ktSK+K4nvVHFUEkebHyF4as4IMxfy+RPGTWk0pYUFEPHq6oti9cy8UKMvS+KlKl6VxCtV9OKSGtfURUld1NSHkvqgqsuSuFTFVUlcqeKHkvhBFa93xf6yK/ZXJVbWX26K1d5kimby65G/Qsk9TJM3529/SpN+fm3ehRiZdtUIvUfj8bjT7XdTVcaPenvcs51hXS8MXWdguzpD4Rh0XWs42rIcKd4C2rI6o0FHjYJ4q/f67riub2GtQXfoaAxb2rHbHnU35UMoUqx+4XP7nXatLH6R03cc56Rf1wuD03bd3pHGUDjc4XA4cHMUGjOKkeKlj8ZO58SqR9EiqGd12gONvn0AVs9xarC0xOKMHXto5ywCAaw4RfGyuL1Bf6QGiW39nYHr1h6gKJW/59rDtsawZT0ZnoyOcxLCQOSrVSFF+Trd3nFPjSJF0Lgrn2CNhWx/adx3rG6NhZRYxo7rZIWt7kS5yW7s22T9yd3uO3PfNtNUNcuNppqzvfdoVrxYY8bNbq19l1+/ADfC1+8UwqZ4qAmHjTBQxwKb4aEWHu6C9+t+vyne14T7jTC+jsVvhve18P4ueFr306Z4qgmnjTBUx0Kb4akWnu6CF3W/aIoXmnDRCCN0LKIZXmjhxS54UveTpniiCSeNMETHQprhiRae7IBn6EG2fFmnIN+uhNUiZU8uu9lchzgBNT0/T+QywQmvyZ6YIwEy3QslU/5P3QPiwiGHqj4NOCRxJDaQGCcTH0gtXfc0NDuCAGxmLTzBZhBtupzKB5hmi++hbHzX7nUhhkgeZRh6K3uk97L/BrIl/VHeMvPDQN6y/Ds5yEa7jGD5aJSjPXmsstVDVH1wdXRotw9t++f2/uvO5oT1zPjO+N74wbCNrvHaeGOcGhcGNLDxm/G78Ufrz9bfrX9a/66tT59s1nxrVK7Wf/8Db7dJIA==</latexit>st

reward <latexit sha1_base64="JCho6RHVaZ12zLSSoSGaNUMc/P8=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq65B8+Dx//9JcokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9hKF7wKa3/PbZvnVoFZepFvam2Dc21+nt89b+ZEpgEqKIQwzi+INtUX6TAsYDiFG2N0liRAG8Az76kPBZ7yYNIppwFMHMfCG0WYJNTswcypwGDEGOV6IAkAUiwYRzwADkAn2vHhWjCIQoPpguAhqvy3jhrwsOxH3fpMuiL1ltYuozQOcBXNbIUhDGIeBzZTBehV59ECUYsUVYH8wpBaPkXCIGgzjvwalozDuatzo+J6cbfb6icxTFWZownFUnCgExhmZiYlHGiCc0LW5GPN+7+CVnCTrIy2Ls5RCwuzM0PRA5tYE6zgwTwOtDXpg3J0L3kIQhiKbphGbphKMlTycHh5kQX5geFmaPiLej7jzL0nSS98zzzDNhrYlvK+JbWRxVxNHmRwiemjPCzIV4/oTFpjCawsICiOL67Ity9sy8kKMvK+KlLF5VxCtZ9JKKmijqoqIuFPW+ot7L6rIiLmVxVRFXsvixIn6Uxetdse93xf4qxYr+i0Wx2ptM0Ux8PopXKL2DWfr6/M3PWdovrs27kCDTrhuh92A8Hne6/W4my/hBb497tjNU9dLQdQa2qzOUjkHXtYajLcuR5C2hLaszGnTkKIi3eq/vjlV9C2sNukNHY9jSjt32qLtpH0KRZPVLn9vvtJW2+GVO33Gck76qlwan7bq9I42hdLjD4XDgFig0YRQjyUsfjJ3OiaVG0TKoZ3XaA42+fQBWz3EUWFphccaOPbQLFo4Alpy8fFnc3qA/koP4tv/OwHWVB8gr7e+59rCtMWxZT4Yno+OChDAQ+XJXSNm+Trd33JOjSBk07oonqLCQ7S+N+47VVVhIhWXsuE7e2PpKFIvsg32Trj+523Vn7ttmlslmsdBkc772HsySF2vMuNmtte/y6yfgRnj1TiFsioeacNgIA3UssBkeauHhLnhf9ftN8b4m3G+E8XUsfjO8r4X3d8FT1U+b4qkmnDbCUB0LbYanWni6C56rft4UzzXhvBGG61h4MzzXwvNd8ET1k6Z4ogknjTBEx0Ka4YkWnuyAXx8I8p2CeLtSpkSKPbnYzRY6xClQ9JgDjgqZ4DRWZI/PEQe57oWCqfhH9YCkdIhS1qdBDEkS8Q0kxunEB0LL1nsamh9BADbzLTzBZhBtdjm1DzDNJ99BsfFdu9eNGCJxlGHojdgjvRP7byC2pD+JW2Z+GIhbFn8nB3m1ywiWD0ZR7YljlS0fotTi6ujQbh/a9i/t/VedzQnrifG98YPxo2EbXeOV8do4NS4MaITGb8bvxh+tP1t/t/5p/bu2Pn60mfOdUbta//0PDH9Jjg==</latexit>rt

ac9on <latexit sha1_base64="+LzeGFZxpTjAKZV/HNGMwJ8Fw9M=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq6+G8+Dx//9JdokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9BMD885bfPtu3Dq3iMtXC3hT7xuY6vX3e2p9MCUxCFHGIQRx/sC3Kb1LAeAAxyvYmSYwogHfARx8SPuvdpEFEE44imJkvhDZLsMmJmUOZ04AhyPFKFACyQCSYcA6YgBPoe/WoGEUgRPHBdBHQeF3GC39dcCDu+yZdFn3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0bzF8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxM+L53sUvOUvQQV4WYy+HgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjC9LAwewSwad15lqXpJO+Z55lnwloT31bEt7I4qoijzY8QPDVnhJkL8fwJi01hNIWFBRDF9dkX5eyZeSFHX1bES1m8qohXsuglFTVR1EVFXSjqfUW9l9VlRVzK4qoirmTxY0X8KIvXu2Lf74r9VYoV/ReLYrU3maKZ+PsoXqH0Dmbp6/M3P2dpv7g270KCTLtuhN6D8Xjc6fa7mSzjB7097tnOUNVLQ9cZ2K7OUDoGXdcajrYsR5K3hLaszmjQkaMg3uq9vjtW9S2sNegOHY1hSzt226Pupn0IRZLVL31uv9NW2uKXOX3HcU76ql4anLbr9o40htLhDofDgVug0IRRjCQvfTB2OieWGkXLoJ7VaQ80+vYBWD3HUWBphcUZO/bQLlg4Alhy8vJlcXuD/kgO4tv+OwPXVR4gr7S/59rDtsawZT0ZnoyOCxLCQOTLXSFl+zrd3nFPjiJl0LgrnqDCQra/NO47VldhIRWWseM6eWPrK1Essg/2Tbr+y92uO3PfNrNMNouFJpvztfdglrxYY8bNbq19l18/ATfCq3cKYVM81ITDRhioY4HN8FALD3fB+6rfb4r3NeF+I4yvY/Gb4X0tvL8Lnqp+2hRPNeG0EYbqWGgzPNXC013wXPXzpniuCeeNMFzHwpvhuRae74Inqp80xRNNOGmEIToW0gxPtPBkBzxD92LLl+8UxNuVMiVyfWAodIhToOgxBxwVMsFprMgenyMOct0LBVPxRfWApHSIUtanQQxJEvENJMbpxAdCy9Z7GpofQQA28y08wWYQbXY5tT9gmk++g2Lju3avGzFE4ijD0BuxR3on9t9AbEl/ErfM/DAQtyw+Jwd5tcsIlg9GUe2JY5UtH6LU4uro0G4f2vYv7f1Xnc0J64nxvfGD8aNhG13jlfHaODUuDGiExm/G78YfrT9bf7f+af27tj5+tJnznVG7Wv/9D6soSYc=</latexit>at

- This is the standard reinforcement loop that most approaches follow. We have an environment, like a game, or the real world in which a robot has to act. We assume it is black-box, which means that we have no information about what the environment looks like. The environment receives actions from the agent, which follows a policy. A policy is a probability distribution that suggests which action to take in some state. Of course, it receives the state again from the environment.

- This loop happens for every timestep t. The environment presents a state s_t to the policy \pi_\theta, which then chooses an action a_t. The environment does some magic, and chooses the next state s_{t+1}. It also returns a reward r_{t+1} for that time step, which could be received because the agent achieves some goal in the environment. This reward is then used in the learner to update the policy parameters.

Page 2: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

CART POLE

4

learnerpolicy <latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓physics engine

angle of pole

reward: 1 if poll upright 0 otherwise

le? or right

https://medium.com/@tuzzer/cart-pole-balancing-with-q-learning-b54c6068d947

Here, we present an example of an RL environment: Cart pole balancing. Our agent is the funny object on the left of the screen, a cart, has to balance the wooden stick, the pole, so that it remains upright.

To encode the state, it is sufficient to pass the angle of the pole, though additional features can be thought of! Our agent uses its policy to decide whether the cart should be moved to the left or the right, so the pole remains upright.

It receives a positive reward if the pole is upright, and otherwise receives no reward. The reward is used in the learner to update the parameters.

Simplest (Deep) RL setting • Only neural network policies • Episodic • Online • Model-free

Gradient estimation focus

THIS LECTURE

5

In this lecture, we will focus on the simplest of Deep Reinforcement Learning settings. We will only use neural networks to model our policies, and not look at the tabular reinforcement learning setting or different machine learning models.

We will assume our environment is episodic. This means that our agent acts in the environment for a finite amount of timesteps. There is some state that is the terminal state: From this point, nothing happens anymore! There are also non-episodic environments in which there is never an end to the agent acting and receiving rewards. This is however much less common in practice.

Secondly, we will be assuming an online RL setting: Here, we train the agent while it interacts with the environment. Other settings, like offline RL, exist, where instead we receive a batch of data of another agent acting in the environment, and we should find an agent that acts optimally just from inspecting this batch of data. This is a very challenging setting, and an active research area!

We will also only look at model-free methods for now. There are also model-based methods to RL, where in addition to learning the policy, we also learn a model of the environment! This is also an exciting and active research area.

In this lecture, we will attempt to explain Deep RL with a focus on gradient estimation. We will go into more details later what exactly this is, but we have seen an example of gradient estimation before in our course: The reparameterization in VAEs!

This lecture will be introducing RL and explaining the basic Deep RL algorithm, while the second lecture on RL, lecture 12) will focus on more recent popular methods for Deep RL.

Page 3: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

DEEPMIND ATARI

6h"ps://www.youtube.com/watch?v=VCdxqn0fcnE

One benefit of RL is that a single system can be developed for many different tasks, so long as the interface between the world and the learner stays the same. Here is a famous experiment by DeepMind, the company behind AlphaGo. The environment is an Atari simulator. The state is a single image, containing everything that can be seen on the screen. The ackons are the four possible movements of the joyskck and the pressing of the fire bulon. The reward is determined by the score shown on the screen.

The amazing thing here is that the system was not pre-programmed with any knowledge of any of the games. For several of the games the system learned play the game beler than the top human performance. source: h"ps://www.youtube.com/watch?v=V1eYniJ0Rnk

CNN policy network <latexit sha1_base64="nfi9SmFy8d5udS4HKyngRVb/rHE=">AAAONXicfZdNb9s2HMaV7q3L5jXdjr0ICwp0QxBIieOXQ4Faso0e1jYL8tItCgKKpmUhlEhQlGNX02GfZtftsu8yYMdh132FUbIjSxRlXcLwefj4xz9FgXQp9iNuGH/tPPro408+/ezx57tffNn66sne068vIxIziC4gwYS9d0GEsB+iC+5zjN5ThkDgYnTl3tmZfjVHLPJJeM6XFN0EwAv9qQ8BF123e88c6t86Lp8hDl44AGa9vzgRBxx9d7u3bxwa+aPXG+a6sa+tn9Pbp60dZ0JgHKCQQwyi6No0KL9JAOM+xCjddeIIUQDvgIeuYz7t3SR+SGOOQpjqz4U2jbHOiZ6B6hOfIcjxUjQAZL5I0OEMMIEoprNbjYpQCAIUHUzmPo1WzWjurRociFrcJIu8VmllYOIxQGc+XFTIEhBEAeCzWme0DNxqJ4oxYvOg2plRCkbJuUAM+lFWg1NRmHc0K3R0Tk7X+mxJZyiM0iRmOC0PFAJiDE3FwLwZIR7TJJ+MWPO76CVnMTrImnnfyyFgd2dociByKh1VnCkmgFe73CArTojuIQkCEE4Sh6aJw9GCJ87BYSrE57qLhdklgE2qzrM0SZysZq6rnwlrRXxbEt/K4qgkjtY/QvBEnxKmz8X6ExbpwqgLC/MhiqqjL4rRU/1Cjr4siZeyeFUSr2TRjUtqXFPnJXVeU+9L6r2sLkriQhaXJXEpix9K4gdZfL8t9qdtsT9LsaL+YlMsd50JmopPSv4KJXcwTV6fv/khTfr5s34XYqSbVSN0H4zH4063301lGT/o7XHPtIZ1vTB0rYFpqwyFY9C1jeFow3IkeQtow+iMBh05CuKN3uvb47q+gTUG3aGlMGxox3Z71F2XD6FQsnqFz+532rWyeEVO37Ksk35dLwxW27Z7RwpD4bCHw+HAzlFozChGkpc+GDudE6MeRYugntFpDxT6ZgGMnmXVYGmJxRpb5tDMWTgCWHLy4mWxe4P+SA7im/pbA9uuLSAvlb9nm8O2wrBhPRmejI5zEsJA6MlVIUX5Ot3ecU+OIkXQuCtWsMZCNr807ltGt8ZCSixjy7aywlZ3othk1+ZNsvrkbvadvm/qaSqbxUaTzdneezBLXqww42a30r7Nrx6AG+HrM4WwKR4qwmEjDFSxwGZ4qISH2+C9ut9rivcU4V4jjKdi8ZrhPSW8tw2e1v20KZ4qwmkjDFWx0GZ4qoSn2+B53c+b4rkinDfCcBULb4bnSni+DZ7U/aQpnijCSSMMUbGQZniihCdb4Bm6F0e+7KQg3q6E1SJX14ZchzgBNT2/TuQywUlUk1e3j0x3A8GU/yN7Jn4ESRzyPIXixPGAUNLViYVmFwyA9eyATrDuh+szTOXzSrOhd1Aca1fu1TSHSFxUGHojTkDvxOkaiAPn92JCzAt8MSHx1znIWtuMYPFgFK1dcWky5StSvXF1dGi2D03zx/b+q876/vRYe6Z9q73QTK2rvdJea6fahQa1X7XftN+1P1p/tv5u/dP6d2V9tLMe841WeVr//Q/9WUJo</latexit>

⇡✓(a|s)

ATARI POLICY

7

Left

<latexit sha1_base64="waUnr2Aq8dArVtVjOAboSaAlUwM=">AAAOHnicfZdNb9s2HMaV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8rbVQUHRtKyFEgmScuwK+g67bpd9mh2HXbdvM0p2ZImirEsYPg8f//inKJAexQEXlvXf3qOPPv7k088ef77/xZetr548ffb1FScxg+gSEkzYjQc4wkGELkUgMLqhDIHQw+jau3Mz/XqBGA9IdCFWFN2GwI+CWQCBkF1XEy6AQO+fHlhHVv6Y9Ya9aRwYm+fs/bPW3mRKYByiSEAMOH9nW1TcJoCJAGKU7k9ijiiAd8BH72Ix690mQURjgSKYms+lNouxKYiZEZnTgCEo8Eo2AGSBTDDhHDAAheTer0ZxFIEQ8cPpIqB83eQLf90QQE76NlnmRUkrAxOfAToP4LJCloCQh0DMa518FXrVThRjxBZhtTOjlIyKc4kYDHhWgzNZmLc0qzO/IGcbfb6icxTxNIkZTssDpYAYQzM5MG9yJGKa5JORi3vHXwgWo8Osmfe9GAJ2d46mhzKn0lHFmWECRLXLC7PiROgekjAE0TSZ0DSZCLQUyeTwKJXic9PD0uwRwKZV53maJJOsZp5nnktrRXxTEt+o4qgkjjY/QvDUnBFmLuT6E8ZNaTSlhQUQ8eroy2L0zLxUo69K4pUqXpfEa1X04pIa19RFSV3U1PuSeq+qy5K4VMVVSVyp4oeS+EEVb3bF/rwr9hclVtZfborV/mSKZvLbkb9CyR1Mk1cXr39Mk37+bN6FGJl21Qi9B+PJuNPtd1NVxg96e9yznWFdLwxdZ2C7OkPhGHRdazjashwr3gLasjqjQUeNgnir9/ruuK5vYa1Bd+hoDFvasdsedTflQyhSrH7hc/uddq0sfpHTdxzntF/XC4PTdt3escZQONzhcDhwcxQaM4qR4qUPxk7n1KpH0SKoZ3XaA42+XQCr5zg1WFpiccaOPbRzFoEAVpyieFnc3qA/UoPEtv7OwHVrCyhK5e+59rCtMWxZT4eno5OchDAQ+WpVSFG+Trd30lOjSBE07soVrLGQ7S+N+47VrbGQEsvYcZ2ssNWdKDfZO/s2WX9yt/vOPLDNNFXNcqOp5mzvPZgVL9aYcbNba9/l1w/AjfD1mULYFA814bARBupYYDM81MLDXfB+3e83xfuacL8Rxtex+M3wvhbe3wVP637aFE814bQRhupYaDM81cLTXfCi7hdN8UITLhphhI5FNMMLLbzYBU/qftIUTzThpBGG6FhIMzzRwpMd8AzdyyNfdlKQb1fCapHyTC5Ps7kOcQJqen6byGWCE16TPTFHAmS6F0qm/B/VMw04JHEk8hSKk4kPpJKuTyw0u2AAbGYHdILNINqcYSqfV5oNvYPyWLt2r6c5RPKiwtBreQJ6K0/XQB44f5ATYn4YyAnJv5PDrLXLCJYPRtnal5cmW70i1RvXx0d2+8i2f2ofvOxs7k+PjW+N74zvDdvoGi+NV8aZcWlA41fjN+N344/Wn62/Wn+3/llbH+1txnxjVJ7Wv/8DASs5Vg==</latexit>s <latexit sha1_base64="tu4dC4JjNNVWy9U2qYJ6CpDY5rg=">AAAOH3icfZdNb9s2HMaV7q3L5q3djrsICwoMQxBIieOXQ4Fako0e1jYLkjhbHBQUTcuCKZGgKMeuoA+x63bZp9lx2LXfZpTsyBJFWRfRfB4+/vEvUSBdiv2IG8bHgyeffPrZ5188/fLwq69b33z77Pl3NxGJGUTXkGDCbl0QIeyH6Jr7HKNbyhAIXIzG7sLO9PESscgn4RVfU3QfAC/0Zz4EXHSNJwBm9/fPjowTI7/0esPcNo607XXx/nnrYDIlMA5QyCEGUXRnGpTfJ4BxH2KUHk7iCFEAF8BDdzGf9e4TP6QxRyFM9RdCm8VY50TPkPSpzxDkeC0aADJfJOhwDphAE+CH1agIhSBA0fF06dNo04yW3qbBgZj1fbLKq5JWBiYeA3Tuw1WFLAFBFAA+r3VG68CtdqIYI7YMqp0ZpWCUnCvEoB9lNbgQhXlHswJHV+Riq8/XdI7CKE1ihtPyQCEgxtBMDMybEeIxTfLJiKe7iF5yFqPjrJn3vXQAW1yi6bHIqXRUcWaYAF7tcoOsOCF6gCQIQDhNJjRNJhyteDI5PkmF+EJ3sTC7BLBp1XmZJskkq5nr6pfCWhHflsS3sjgsicPtnxA81WeE6Uvx/AmLdGHUhYX5EEXV0dfF6Jl+LUfflMQbWRyXxLEsunFJjWvqsqQua+pDSX2Q1VVJXMniuiSuZfFDSfwgi7f7Yn/bF/u7FCvqLxbF+nAyRTPx8chfoWQB0+T11Ztf0qSfX9t3IUa6WTVC99F4Nup0+91UlvGj3h71TMup64Whaw1MW2UoHIOubTjDHcup5C2gDaMzHHTkKIh3eq9vj+r6DtYYdB1LYdjRjuz2sLstH0KhZPUKn93vtGtl8YqcvmVZ5/26Xhistm33ThWGwmE7jjOwcxQaM4qR5KWPxk7n3KhH0SKoZ3TaA4W+ewBGz7JqsLTEYo0s0zFzFo4Alpy8eFns3qA/lIP4rv7WwLZrD5CXyt+zTaetMOxYz53z4VlOQhgIPbkqpChfp9s768lRpAgadcUTrLGQ3T+N+pbRrbGQEsvIsq2ssNWVKBbZnXmfbD65u3WnH5l6mspmsdBkc7b2Hs2SFyvMuNmttO/zqwfgRvj6TCFsioeKcNgIA1UssBkeKuHhPniv7vea4j1FuNcI46lYvGZ4Twnv7YOndT9tiqeKcNoIQ1UstBmeKuHpPnhe9/OmeK4I540wXMXCm+G5Ep7vgyd1P2mKJ4pw0ghDVCykGZ4o4ckeeIYexJYv2ymItythtcjNcSHXIU5ATY844CiXCU6imuzyOeIg091AMOU/ZM/UjyCJQ56nUJxMPCCUdLNjodkBA2A926ATrPvhdg9T+bzSbOgCim3txr2ZpoPEQYWhN2IH9E7sroHYcP4sJsS8wBcTEvfJcdbaZwSrR6NoHYpDkykfkeqN8emJ2T4xzV/bR6862/PTU+0H7UftJ83Uutor7bV2oV1rUFtof2h/an+1/m790/q39d/G+uRgO+Z7rXK1Pv4PL2I5vQ==</latexit>aMnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

Here, we see an example of what a Deep RL policy might look like. Like we mentioned, the states in our atari game are simple images, so, a sensible idea is to use a CNN! This CNN policy takes the current state of the game, does a lot of hard neural network computation, and computes a probability distribution over actions! Since we have a finite set of actions, we can use a softmax output layer to create a categorical distribution over actions. Then, we sample an action to perform from this distribution.

We won’t discuss policy network architectures any further in this lecture. For choosing network architecture, generally the same recommendations apply as for normal deep learning: Use CNNs for states represented as images, Graph Neural Networks for states represented as graphs, RNNs or transformers for text, etc.

Figure from the original DQN paper in Nature: Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

• Components of RL: • Actions • States • Rewards • These are random variables!

• Trajectories • Initial state • Terminal state

<latexit sha1_base64="+LzeGFZxpTjAKZV/HNGMwJ8Fw9M=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq6+G8+Dx//9JdokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9BMD885bfPtu3Dq3iMtXC3hT7xuY6vX3e2p9MCUxCFHGIQRx/sC3Kb1LAeAAxyvYmSYwogHfARx8SPuvdpEFEE44imJkvhDZLsMmJmUOZ04AhyPFKFACyQCSYcA6YgBPoe/WoGEUgRPHBdBHQeF3GC39dcCDu+yZdFn3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0bzF8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxM+L53sUvOUvQQV4WYy+HgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjC9LAwewSwad15lqXpJO+Z55lnwloT31bEt7I4qoijzY8QPDVnhJkL8fwJi01hNIWFBRDF9dkX5eyZeSFHX1bES1m8qohXsuglFTVR1EVFXSjqfUW9l9VlRVzK4qoirmTxY0X8KIvXu2Lf74r9VYoV/ReLYrU3maKZ+PsoXqH0Dmbp6/M3P2dpv7g270KCTLtuhN6D8Xjc6fa7mSzjB7097tnOUNVLQ9cZ2K7OUDoGXdcajrYsR5K3hLaszmjQkaMg3uq9vjtW9S2sNegOHY1hSzt226Pupn0IRZLVL31uv9NW2uKXOX3HcU76ql4anLbr9o40htLhDofDgVug0IRRjCQvfTB2OieWGkXLoJ7VaQ80+vYBWD3HUWBphcUZO/bQLlg4Alhy8vJlcXuD/kgO4tv+OwPXVR4gr7S/59rDtsawZT0ZnoyOCxLCQOTLXSFl+zrd3nFPjiJl0LgrnqDCQra/NO47VldhIRWWseM6eWPrK1Essg/2Tbr+y92uO3PfNrNMNouFJpvztfdglrxYY8bNbq19l18/ATfCq3cKYVM81ITDRhioY4HN8FALD3fB+6rfb4r3NeF+I4yvY/Gb4X0tvL8Lnqp+2hRPNeG0EYbqWGgzPNXC013wXPXzpniuCeeNMFzHwpvhuRae74Inqp80xRNNOGmEIToW0gxPtPBkBzxD92LLl+8UxNuVMiVyfWAodIhToOgxBxwVMsFprMgenyMOct0LBVPxRfWApHSIUtanQQxJEvENJMbpxAdCy9Z7GpofQQA28y08wWYQbXY5tT9gmk++g2Lju3avGzFE4ijD0BuxR3on9t9AbEl/ErfM/DAQtyw+Jwd5tcsIlg9GUe2JY5UtH6LU4uro0G4f2vYv7f1Xnc0J64nxvfGD8aNhG13jlfHaODUuDGiExm/G78YfrT9bf7f+af27tj5+tJnznVG7Wv/9D6soSYc=</latexit>at

<latexit sha1_base64="/pzW3Mv4fuxhZwPDsc7BaKD9MQQ=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEuhN3z/etQyu/zPrA3gz2jc11eveitT+ZEhiHKBIQA85vbIuK2wQwEUCM0r1JzBEF8B746CYWs95tEkQ0FiiCqflSarMYm4KYGZM5DRiCAq/kAEAWyAQTzgEDUEjyvWoURxEIET+YLgLK10O+8NcDAeRt3ybLvCxpZWHiM0DnAVxWyBIQ8hCIeW2Sr0KvOolijNgirE5mlJJRcS4RgwHPanAqC/OeZpXm5+R0o89XdI4iniYxw2l5oRQQY2gmF+ZDjkRMk/xm5OO9568Ei9FBNsznXg0Buz9D0wOZU5mo4swwAaI65YVZcSL0AEkYgmiaTGiaTARaimRycJhK8aXpYWn2CGDTqvMsTZJJVjPPM8+ktSK+K4nvVHFUEkebHyF4as4IMxfy+RPGTWk0pYUFEPHq6oti9cy8UKMvS+KlKl6VxCtV9OKSGtfURUld1NSHkvqgqsuSuFTFVUlcqeKHkvhBFa93xf6yK/ZXJVbWX26K1d5kimby65G/Qsk9TJM3529/SpN+fm3ehRiZdtUIvUfj8bjT7XdTVcaPenvcs51hXS8MXWdguzpD4Rh0XWs42rIcKd4C2rI6o0FHjYJ4q/f67riub2GtQXfoaAxb2rHbHnU35UMoUqx+4XP7nXatLH6R03cc56Rf1wuD03bd3pHGUDjc4XA4cHMUGjOKkeKlj8ZO58SqR9EiqGd12gONvn0AVs9xarC0xOKMHXto5ywCAaw4RfGyuL1Bf6QGiW39nYHr1h6gKJW/59rDtsawZT0ZnoyOcxLCQOSrVSFF+Trd3nFPjSJF0Lgrn2CNhWx/adx3rG6NhZRYxo7rZIWt7kS5yW7s22T9yd3uO3PfNtNUNcuNppqzvfdoVrxYY8bNbq19l1+/ADfC1+8UwqZ4qAmHjTBQxwKb4aEWHu6C9+t+vyne14T7jTC+jsVvhve18P4ueFr306Z4qgmnjTBUx0Kb4akWnu6CF3W/aIoXmnDRCCN0LKIZXmjhxS54UveTpniiCSeNMETHQprhiRae7IBn6EG2fFmnIN+uhNUiZU8uu9lchzgBNT0/T+QywQmvyZ6YIwEy3QslU/5P3QPiwiGHqj4NOCRxJDaQGCcTH0gtXfc0NDuCAGxmLTzBZhBtupzKB5hmi++hbHzX7nUhhkgeZRh6K3uk97L/BrIl/VHeMvPDQN6y/Ds5yEa7jGD5aJSjPXmsstVDVH1wdXRotw9t++f2/uvO5oT1zPjO+N74wbCNrvHaeGOcGhcGNLDxm/G78Ufrz9bfrX9a/66tT59s1nxrVK7Wf/8Db7dJIA==</latexit>st<latexit sha1_base64="JCho6RHVaZ12zLSSoSGaNUMc/P8=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq65B8+Dx//9JcokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9hKF7wKa3/PbZvnVoFZepFvam2Dc21+nt89b+ZEpgEqKIQwzi+INtUX6TAsYDiFG2N0liRAG8Az76kPBZ7yYNIppwFMHMfCG0WYJNTswcypwGDEGOV6IAkAUiwYRzwADkAn2vHhWjCIQoPpguAhqvy3jhrwsOxH3fpMuiL1ltYuozQOcBXNbIUhDGIeBzZTBehV59ECUYsUVYH8wpBaPkXCIGgzjvwalozDuatzo+J6cbfb6icxTFWZownFUnCgExhmZiYlHGiCc0LW5GPN+7+CVnCTrIy2Ls5RCwuzM0PRA5tYE6zgwTwOtDXpg3J0L3kIQhiKbphGbphKMlTycHh5kQX5geFmaPiLej7jzL0nSS98zzzDNhrYlvK+JbWRxVxNHmRwiemjPCzIV4/oTFpjCawsICiOL67Ity9sy8kKMvK+KlLF5VxCtZ9JKKmijqoqIuFPW+ot7L6rIiLmVxVRFXsvixIn6Uxetdse93xf4qxYr+i0Wx2ptM0Ux8PopXKL2DWfr6/M3PWdovrs27kCDTrhuh92A8Hne6/W4my/hBb497tjNU9dLQdQa2qzOUjkHXtYajLcuR5C2hLaszGnTkKIi3eq/vjlV9C2sNukNHY9jSjt32qLtpH0KRZPVLn9vvtJW2+GVO33Gck76qlwan7bq9I42hdLjD4XDgFig0YRQjyUsfjJ3OiaVG0TKoZ3XaA42+fQBWz3EUWFphccaOPbQLFo4Alpy8fFnc3qA/koP4tv/OwHWVB8gr7e+59rCtMWxZT4Yno+OChDAQ+XJXSNm+Trd33JOjSBk07oonqLCQ7S+N+47VVVhIhWXsuE7e2PpKFIvsg32Trj+523Vn7ttmlslmsdBkc772HsySF2vMuNmtte/y6yfgRnj1TiFsioeacNgIA3UssBkeauHhLnhf9ftN8b4m3G+E8XUsfjO8r4X3d8FT1U+b4qkmnDbCUB0LbYanWni6C56rft4UzzXhvBGG61h4MzzXwvNd8ET1k6Z4ogknjTBEx0Ka4YkWnuyAXx8I8p2CeLtSpkSKPbnYzRY6xClQ9JgDjgqZ4DRWZI/PEQe57oWCqfhH9YCkdIhS1qdBDEkS8Q0kxunEB0LL1nsamh9BADbzLTzBZhBtdjm1DzDNJ99BsfFdu9eNGCJxlGHojdgjvRP7byC2pD+JW2Z+GIhbFn8nB3m1ywiWD0ZR7YljlS0fotTi6ujQbh/a9i/t/VedzQnrifG98YPxo2EbXeOV8do4NS4MaITGb8bvxh+tP1t/t/5p/bu2Pn60mfOdUbta//0PDH9Jjg==</latexit>rt

<latexit sha1_base64="nyvr2I+1rJ5Zdc1PSdcTqOSsGX4=">AAAOs3icfZdbb9s2GIaVdocua7x0u9yNsKDAMHiBlDg+XASoJdvoxdpmgZN0jYKAomlZMCUSJOXYFfRDd7H/Mkp2dJZ1ky98X75+9EkUSJtilwtN+/fgxctvvv3u+1c/HP74+qj10/Gbn285CRhEN5Bgwj7bgCPs+uhGuAKjz5Qh4NkY3dlLM9bvVohxl/hTsaHowQOO785dCIQcejx+smwBgkuLCyDQo9ZWLQATJS4ZegJs9qjLcqvrma5n+lmqn6nWjAiemsLpn3qUGaepcfp4fKKdasmlVgt9V5wou+vq8c3RicyGgYd8ATHg/F7XqHgIARMuxCg6tAKOKIBL4KD7QMz7D6Hr00AgH0bqW6nNA6wKosZNUGcuQ1DgjSwAZK5MUOECMEktW3VYjOLIBx7i7dnKpXxb8pWzLQSQfX4I18lziAoTQ4cBunDhukAWAo97QCwqg3zj2cVBFGDEVl5xMKaUjCXnGjHo8rgHV7Ixn2jcez4lVzt9saEL5PMoDBiO8hOlgBhDczkxKTkSAQ2Tm5Hv05JfChagdlwmY5cjwJbXaNaWOYWBIs4cEyCKQ7YXN8dHT5B4HvBnoUWj0BJoLUKrfRpJ8a1qY2m2iXxPis7rKAytuGe2rV5La0H8mBM/lsVxThzvfoTgmTonTF3J508YV6VRlRbmQsSLs2/S2XP1phx9mxNvy+JdTrwri3aQU4OKusqpq4r6lFOfyuo6J67L4iYnbsri15z4tSx+3hf7z77YL6VY2X+5KDaH1gzN5ecqeYXCJYzC99MPf0XhILl270KAVL1ohPaz8XzS7Q16UVnGz3pn0teNUVVPDT1jqJt1htQx7JnaaJyxnJW8KbSmdcfDbjkK4kzvD8xJVc9gtWFvZNQYMtqJ2Rn3du1DyC9ZndRnDrqdSlucNGdgGMbFoKqnBqNjmv2zGkPqMEej0dBMUGjAKEYlL302drsXWjWKpkF9rdsZ1ujZA9D6hlGBpTkWY2LoIz1hEQjgklOkL4vZHw7G5SCR9d8YmmblAYpc+/umPurUGDLWi9HF+DwhIQz4TrkrJG1ft9c/75ejSBo06cknWGEh2S9NBobWq7CQHMvEMI24scWVKBfZvf4Qbj+52bpTT3Q1ispmudDK5njtPZtLXlxjxs3uWvs+f/0E3AhfvVMIm+JhTThshIF1LLAZHtbCw33wTtXvNMU7NeFOI4xTx+I0wzu18M4+eFr106Z4WhNOG2FoHQtthqe18HQfvKj6RVO8qAkXjTCijkU0w4taeLEPnlT9pCme1ISTRhhSx0Ka4UktPNkDvz0axDsF+XaFrBK5PUkkOsQhqOjJeSKRCQ55RbbFAgkQ67YnmZJ/qh4QpA5ZlvWZyyEJfLGDxDi0HCC1aLunofERBGA13sITrLr+bpdT+ADTePISyo3v1r1txAjJowxDH+Qe6ZPcfwO5Jf1D3jJzPFfesvxrteNqnxGsn42yOpTHKr18iKoWd2eneudU1//unLzr7k5Yr5Rfld+U3xVd6SnvlPfKlXKjQOW/g5cHrw+OWp3Wl5bdmm2tLw52c35RClfL+x8CFGve</latexit>

⌧ = s0,a0, r1, s1,a1, r2, s2 . . .aT-1, rT , sT<latexit sha1_base64="GEr+u/sGsjmjWJOBOSadRWZ5rCM=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEurPunu9bh1Z+mfWBvRnsG5vr9O5Fa38yJTAOUSQgBpzf2BYVtwlgIoAYpXuTmCMK4D3w0U0sZr3bJIhoLFAEU/Ol1GYxNgUxMyZzGjAEBV7JAYAskAkmnAMGoJDke9UojiIQIn4wXQSUr4d84a8HAsjbvk2WeVnSysLEZ4DOA7iskCUg5CEQ89okX4VedRLFGLFFWJ3MKCWj4lwiBgOe1eBUFuY9zSrNz8npRp+v6BxFPE1ihtPyQikgxtBMLsyHHImYJvnNyMd7z18JFqODbJjPvRoCdn+GpgcypzJRxZlhAkR1yguz4kToAZIwBNE0mdA0mQi0FMnk4DCV4kvTw9LsEcCmVedZmiSTrGaeZ55Ja0V8VxLfqeKoJI42P0Lw1JwRZi7k8yeMm9JoSgsLIOLV1RfF6pl5oUZflsRLVbwqiVeq6MUlNa6pi5K6qKkPJfVBVZclcamKq5K4UsUPJfGDKl7viv1lV+yvSqysv9wUq73JFM3k1yN/hZJ7mCZvzt/+lCb9/Nq8CzEy7aoReo/G43Gn2++mqowf9fa4ZzvDul4Yus7AdnWGwjHoutZwtGU5UrwFtGV1RoOOGgXxVu/13XFd38Jag+7Q0Ri2tGO3PepuyodQpFj9wuf2O+1aWfwip+84zkm/rhcGp+26vSONoXC4w+Fw4OYoNGYUI8VLH42dzolVj6JFUM/qtAcaffsArJ7j1GBpicUZO/bQzlkEAlhxiuJlcXuD/kgNEtv6OwPXrT1AUSp/z7WHbY1hy3oyPBkd5ySEgchXq0KK8nW6veOeGkWKoHFXPsEaC9n+0rjvWN0aCymxjB3XyQpb3Ylyk93Yt8n6k7vdd+a+baapapYbTTVne+/RrHixxoyb3Vr7Lr9+AW6Er98phE3xUBMOG2GgjgU2w0MtPNwF79f9flO8rwn3G2F8HYvfDO9r4f1d8LTup03xVBNOG2GojoU2w1MtPN0FL+p+0RQvNOGiEUboWEQzvNDCi13wpO4nTfFEE04aYYiOhTTDEy082QHP0INs+bJOQb5dCatFyp5cdrO5DnECanp+nshlghNekz0xRwJkuhdKpvyfugfEhUMOVX0acEjiSGwgMU4mPpBauu5paHYEAdjMWniCzSDadDmVDzDNFt9D2fiu3etCDJE8yjD0VvZI72X/DWRL+qO8ZeaHgbxl+XdykI12GcHy0ShHe/JYZauHqPrg6ujQbh/a9s/t/dedzQnrmfGd8b3xg2EbXeO18cY4NS4MaGDjN+N344/Wn62/W/+0/l1bnz7ZrPnWqFyt//4Hwl9I3A==</latexit>s0

<latexit sha1_base64="mT6OGDXD2gX+Wm6tE1qKZ8PXC3g=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEuju/e75vHVr5ZdYH9mawb2yu07sXrf3JlMA4RJGAGHB+Y1tU3CaAiQBilO5NYo4ogPfARzexmPVukyCisUARTM2XUpvF2BTEzJjMacAQFHglBwCyQCaYcA4YgEKS71WjOIpAiPjBdBFQvh7yhb8eCCBv+zZZ5mVJKwsTnwE6D+CyQpaAkIdAzGuTfBV61UkUY8QWYXUyo5SMinOJGAx4VoNTWZj3NKs0PyenG32+onMU8TSJGU7LC6WAGEMzuTAfciRimuQ3Ix/vPX8lWIwOsmE+92oI2P0Zmh7InMpEFWeGCRDVKS/MihOhB0jCEETTZELTZCLQUiSTg8NUii9ND0uzRwCbVp1naZJMspp5nnkmrRXxXUl8p4qjkjja/AjBU3NGmLmQz58wbkqjKS0sgIhXV18Uq2fmhRp9WRIvVfGqJF6poheX1LimLkrqoqY+lNQHVV2WxKUqrkriShU/lMQPqni9K/aXXbG/KrGy/nJTrPYmUzSTX4/8FUruYZq8OX/7U5r082vzLsTItKtG6D0aj8edbr+bqjJ+1Nvjnu0M63ph6DoD29UZCseg61rD0ZblSPEW0JbVGQ06ahTEW73Xd8d1fQtrDbpDR2PY0o7d9qi7KR9CkWL1C5/b77RrZfGLnL7jOCf9ul4YnLbr9o40hsLhDofDgZuj0JhRjBQvfTR2OidWPYoWQT2r0x5o9O0DsHqOU4OlJRZn7NhDO2cRCGDFKYqXxe0N+iM1SGzr7wxct/YARan8PdcetjWGLevJ8GR0nJMQBiJfrQopytfp9o57ahQpgsZd+QRrLGT7S+O+Y3VrLKTEMnZcJytsdSfKTXZj3ybrT+5235n7tpmmqlluNNWc7b1Hs+LFGjNudmvtu/z6BbgRvn6nEDbFQ004bISBOhbYDA+18HAXvF/3+03xvibcb4TxdSx+M7yvhfd3wdO6nzbFU004bYShOhbaDE+18HQXvKj7RVO80ISLRhihYxHN8EILL3bBk7qfNMUTTThphCE6FtIMT7TwZAc8Qw+y5cs6Bfl2JawWKXty2c3mOsQJqOn5eSKXCU54TfbEHAmQ6V4omfJ/6h4QFw45VPVpwCGJI7GBxDiZ+EBq6bqnodkRBGAza+EJNoNo0+VUPsA0W3wPZeO7dq8LMUTyKMPQW9kjvZf9N5At6Y/ylpkfBvKW5d/JQTbaZQTLR6Mc7cljla0eouqDq6NDu31o2z+39193NiesZ8Z3xvfGD4ZtdI3Xxhvj1LgwoIGN34zfjT9af7b+bv3T+ndtffpks+Zbo3K1/vsftLlJAA==</latexit>sT

TRAJECTORIES

8

As we have seen before, reinforcement learning settings are modelled using actions, states and rewards. Note that RL settings are stochastic: This means these three components are random variables.

Furthermore, we have trajectories, which are a full rollout of the agent interacting with the environment and receiving rewards. It starts at the initial state s_0, and ends in the terminal state s_T, from which the agent stops interacting with the environment.

Page 4: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

ATARI TRAJECTORY

9

= Left<latexit sha1_base64="5c7YI5DF4pFOIUZakFvff7LcCFY=">AAAOR3icfZfLbuM2GIWV6W2a1tOZdtmN0GCAog0CKXF8WQwwlmxjFp2ZNMitjY2AomlZMCWyFOXYI+g5um3fpq/Ql+iy6LKUbMsSKVkb/+Y5PP70SzRIh2Iv5Ibx98GTjz7+5NPPnn5++MWXjWdfPX/x9U1IIgbRNSSYsDsHhAh7AbrmHsfojjIEfAejW2dup/rtArHQI8EVX1E09oEbeFMPAi6GxiMA08+HmP9oJg/Pj4wTI7t0tTA3xZG2uS4eXjSORhMCIx8FHGIQhvemQfk4Box7EKPkcBSFiAI4By66j/i0M469gEYcBTDRXwptGmGdEz0F0yceQ5DjlSgAZJ5I0OEMMAEo8A/LUSEKgI/C48nCo+G6DBfuuuBA3Ps4Xma9SUoTY5cBOvPgskQWAz/0AZ8pg+HKd8qDKMKILfzyYEopGCXnEjHohWkPLkRj3tO0zeEVudjosxWdoSBM4ojhpDhRCIgxNBUTszJEPKJxdjPiGc/DV5xF6Dgts7FXfcDml2hyLHJKA2WcKSaAl4ccP21OgB4h8X0QTOIRTeIRR0sej45PEiG+1B0szA4BbFJ2XiZxPEp75jj6pbCWxHcF8Z0sDgriYPMjBE/0KWH6Qjx/wkJdGHVhYR5EYXn2dT57ql/L0TcF8UYWbwvirSw6UUGNFHVRUBeK+lhQH2V1WRCXsrgqiCtZ/FAQP8ji3b7YX/bF/irFiv6LRbE6HE3QVPyFZK9QPIdJ/Obq7U9J3M2uzbsQId0sG6GzNZ4NW+1uO5FlvNWbw45p9VU9N7StnmlXGXJHr20b/cGO5VTy5tCG0Rr0WnIUxDu907WHqr6DNXrtvlVh2NEO7eagvWkfQoFkdXOf3W01lba4eU7XsqzzrqrnBqtp253TCkPusPv9fs/OUGjEKEaSl26Nrda5oUbRPKhjtJq9Cn33AIyOZSmwtMBiDS2zb2YsHAEsOXn+stidXncgB/Fd/62ebSsPkBfa37HNfrPCsGM9758PzjISwkDgyl0hefta7c5ZR44iedCwLZ6gwkJ2vzTsWkZbYSEFlqFlW2ljyytRLLJ7cxyv/3J3604/MvUkkc1iocnmdO1tzZIXV5hxvbvSvs9fPQHXwqt3CmFdPKwIh7UwsIoF1sPDSni4D95V/W5dvFsR7tbCuFUsbj28Wwnv7oOnqp/WxdOKcFoLQ6tYaD08rYSn++C56ud18bwinNfC8CoWXg/PK+H5Pnii+kldPKkIJ7UwpIqF1MOTSniyB56hR7HlS3cK4u2KmRK5PjRkOsQxUPSQA44ymeA4VGSHzxAHqe74gin7onpAlDtEKesTL4QkCvgGEuN45AKhJes9DU2PIADr6RaeYN0LNruc0h8wTSfPodj4rt3rRvSROMow9Fbskd6L/TcQW9IfxC0z1/fELYvP0XFa7TOC5dYoqkNxrDLlQ5Ra3J6emM0T0/y5efS6tTlhPdW+1b7TvtdMra291t5oF9q1BrXftN+1P7Q/G381/mn82/hvbX1ysJnzjVa6nh38D+KuSgQ=</latexit>at+1

https://becominghuman.ai/lets-build-an-atari-ai-part-0-intro-to-rl-9b2c5336e0ec

= Left<latexit sha1_base64="+LzeGFZxpTjAKZV/HNGMwJ8Fw9M=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq6+G8+Dx//9JdokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9BMD885bfPtu3Dq3iMtXC3hT7xuY6vX3e2p9MCUxCFHGIQRx/sC3Kb1LAeAAxyvYmSYwogHfARx8SPuvdpEFEE44imJkvhDZLsMmJmUOZ04AhyPFKFACyQCSYcA6YgBPoe/WoGEUgRPHBdBHQeF3GC39dcCDu+yZdFn3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0bzF8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxM+L53sUvOUvQQV4WYy+HgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjC9LAwewSwad15lqXpJO+Z55lnwloT31bEt7I4qoijzY8QPDVnhJkL8fwJi01hNIWFBRDF9dkX5eyZeSFHX1bES1m8qohXsuglFTVR1EVFXSjqfUW9l9VlRVzK4qoirmTxY0X8KIvXu2Lf74r9VYoV/ReLYrU3maKZ+PsoXqH0Dmbp6/M3P2dpv7g270KCTLtuhN6D8Xjc6fa7mSzjB7097tnOUNVLQ9cZ2K7OUDoGXdcajrYsR5K3hLaszmjQkaMg3uq9vjtW9S2sNegOHY1hSzt226Pupn0IRZLVL31uv9NW2uKXOX3HcU76ql4anLbr9o40htLhDofDgVug0IRRjCQvfTB2OieWGkXLoJ7VaQ80+vYBWD3HUWBphcUZO/bQLlg4Alhy8vJlcXuD/kgO4tv+OwPXVR4gr7S/59rDtsawZT0ZnoyOCxLCQOTLXSFl+zrd3nFPjiJl0LgrnqDCQra/NO47VldhIRWWseM6eWPrK1Essg/2Tbr+y92uO3PfNrNMNouFJpvztfdglrxYY8bNbq19l18/ATfCq3cKYVM81ITDRhioY4HN8FALD3fB+6rfb4r3NeF+I4yvY/Gb4X0tvL8Lnqp+2hRPNeG0EYbqWGgzPNXC013wXPXzpniuCeeNMFzHwpvhuRae74Inqp80xRNNOGmEIToW0gxPtPBkBzxD92LLl+8UxNuVMiVyfWAodIhToOgxBxwVMsFprMgenyMOct0LBVPxRfWApHSIUtanQQxJEvENJMbpxAdCy9Z7GpofQQA28y08wWYQbXY5tT9gmk++g2Lju3avGzFE4ijD0BuxR3on9t9AbEl/ErfM/DAQtyw+Jwd5tcsIlg9GUe2JY5UtH6LU4uro0G4f2vYv7f1Xnc0J64nxvfGD8aNhG13jlfHaODUuDGiExm/G78YfrT9bf7f+af27tj5+tJnznVG7Wv/9D6soSYc=</latexit>at

Credit assignment: What action causes reward?

<latexit sha1_base64="gfi6SL5j681Wde1MvJUNdF/DmQ4=">AAAOW3icfZdNb9s2HMbV7q3LmizdsNMOExYU6IYgkBLHL4cCtWQbPaxtFuRti4KAomlZMCUSFOXY1XTcp9l1+zAD9mFGyY4sUZR1yT98Hj7+6S9RIF2K/Ygbxr9Pnn7y6Weff/Hsy52vnu/ufb3/4puriMQMoktIMGE3LogQ9kN0yX2O0Q1lCAQuRtfuzM706zlikU/CC76k6C4AXuhPfAi4GLrf/8Gh/r3j8ini4JUDYD7K/3AiDji65z/d7x8YR0Z+6fXCXBcH2vo6u3+xe+CMCYwDFHKIQRTdmgbldwlg3IcYpTtOHCEK4Ax46Dbmk+5d4oc05iiEqf5SaJMY65zoGaw+9hmCHC9FASDzRYIOp4AJTHFLO9WoCIUgQNHheO7TaFVGc29VcCD6cZcs8n6llYmJxwCd+nBRIUtAEAWAT2uD0TJwq4MoxojNg+pgRikYJecCMehHWQ/ORGM+0KzZ0QU5W+vTJZ2iMEqTmOG0PFEIiDE0ERPzMkI8pkl+M+K5z6LXnMXoMCvzsdcDwGbnaHwocioDVZwJJoBXh9wga06IHiAJAhCOE4emicPRgifO4VEqxJe6i4XZJYCNq87zNEmcrGeuq58La0V8XxLfy+KwJA7XP0LwWJ8Qps/F8ycs0oVRFxbmQxRVZ18Wsyf6pRx9VRKvZPG6JF7LohuX1LimzkvqvKY+lNQHWV2UxIUsLkviUhY/lsSPsnizLfa3bbG/S7Gi/2JRLHecMZqIz0r+CiUzmCZvL979kia9/Fq/CzHSzaoRuo/Gk1G70+uksowf9daoa1qDul4YOlbftFWGwtHv2MZguGE5lrwFtGG0h/22HAXxRu/27FFd38Aa/c7AUhg2tCO7Neys24dQKFm9wmf32q1aW7wip2dZ1mmvrhcGq2Xb3WOFoXDYg8Ggb+coNGYUI8lLH43t9qlRj6JFUNdot/oKffMAjK5l1WBpicUaWebAzFk4Alhy8uJlsbv93lAO4pv+W33brj1AXmp/1zYHLYVhw3o6OB2e5CSEgdCTu0KK9rU73ZOuHEWKoFFHPMEaC9n80qhnGZ0aCymxjCzbyhpbXYlikd2ad8nqk7tZd/qBqaepbBYLTTZna+/RLHmxwoyb3Ur7Nr96Am6Er98phE3xUBEOG2GgigU2w0MlPNwG79X9XlO8pwj3GmE8FYvXDO8p4b1t8LTup03xVBFOG2GoioU2w1MlPN0Gz+t+3hTPFeG8EYarWHgzPFfC823wpO4nTfFEEU4aYYiKhTTDEyU82QLP0IPY8mU7BfF2JawWuTo65DrECajp+YEilwlOopq8OoFkuhsIpvyfugfEhUOUsj72I0jikK8hMU4cDwgtXe1paHYEAVjPtvAE63643uVUPsA0mzyDYuO7cq8aMUDiKMPQO7FH+iD230BsSX8Wt8y8wBe3LP46h1m1zQgWj0ZR7YhjlSkfourF9fGR2ToyzV9bB2/a6xPWM+177UftlWZqHe2N9lY70y41qP2p/aX9rf2z+9/e072dvecr69Mn6znfapVr77v/ARAATx0=</latexit>

⇡✓(at|st)

<latexit sha1_base64="/pzW3Mv4fuxhZwPDsc7BaKD9MQQ=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEuhN3z/etQyu/zPrA3gz2jc11eveitT+ZEhiHKBIQA85vbIuK2wQwEUCM0r1JzBEF8B746CYWs95tEkQ0FiiCqflSarMYm4KYGZM5DRiCAq/kAEAWyAQTzgEDUEjyvWoURxEIET+YLgLK10O+8NcDAeRt3ybLvCxpZWHiM0DnAVxWyBIQ8hCIeW2Sr0KvOolijNgirE5mlJJRcS4RgwHPanAqC/OeZpXm5+R0o89XdI4iniYxw2l5oRQQY2gmF+ZDjkRMk/xm5OO9568Ei9FBNsznXg0Buz9D0wOZU5mo4swwAaI65YVZcSL0AEkYgmiaTGiaTARaimRycJhK8aXpYWn2CGDTqvMsTZJJVjPPM8+ktSK+K4nvVHFUEkebHyF4as4IMxfy+RPGTWk0pYUFEPHq6oti9cy8UKMvS+KlKl6VxCtV9OKSGtfURUld1NSHkvqgqsuSuFTFVUlcqeKHkvhBFa93xf6yK/ZXJVbWX26K1d5kimby65G/Qsk9TJM3529/SpN+fm3ehRiZdtUIvUfj8bjT7XdTVcaPenvcs51hXS8MXWdguzpD4Rh0XWs42rIcKd4C2rI6o0FHjYJ4q/f67riub2GtQXfoaAxb2rHbHnU35UMoUqx+4XP7nXatLH6R03cc56Rf1wuD03bd3pHGUDjc4XA4cHMUGjOKkeKlj8ZO58SqR9EiqGd12gONvn0AVs9xarC0xOKMHXto5ywCAaw4RfGyuL1Bf6QGiW39nYHr1h6gKJW/59rDtsawZT0ZnoyOcxLCQOSrVSFF+Trd3nFPjSJF0Lgrn2CNhWx/adx3rG6NhZRYxo7rZIWt7kS5yW7s22T9yd3uO3PfNtNUNcuNppqzvfdoVrxYY8bNbq19l1+/ADfC1+8UwqZ4qAmHjTBQxwKb4aEWHu6C9+t+vyne14T7jTC+jsVvhve18P4ueFr306Z4qgmnjTBUx0Kb4akWnu6CF3W/aIoXmnDRCCN0LKIZXmjhxS54UveTpniiCSeNMETHQprhiRae7IBn6EG2fFmnIN+uhNUiZU8uu9lchzgBNT0/T+QywQmvyZ6YIwEy3QslU/5P3QPiwiGHqj4NOCRxJDaQGCcTH0gtXfc0NDuCAGxmLTzBZhBtupzKB5hmi++hbHzX7nUhhkgeZRh6K3uk97L/BrIl/VHeMvPDQN6y/Ds5yEa7jGD5aJSjPXmsstVDVH1wdXRotw9t++f2/uvO5oT1zPjO+N74wbCNrvHaeGOcGhcGNLDxm/G78Ufrz9bfrX9a/66tT59s1nxrVK7Wf/8Db7dJIA==</latexit>st

<latexit sha1_base64="6BZVX3qWN3a5GYVMCDT6XJYIKOY=">AAAOS3icfZdNb9s2HMbVbl27bG7T7biLsKDAsAWBlDh+OQSoJdvoYW2zIG9bbAQUTcuCKZEgKceuoE+y6/Zt9gX2NXYcdhglO7JESdYlDJ+Hj3/8UxRIh2KPC8P4+8nTzz5/9sXzF1/uffV14+Wr/dffXHMSMoiuIMGE3TqAI+wF6Ep4AqNbyhDwHYxunLmd6DcLxLhHgkuxomjsAzfwph4EQnbd778aMfQA2OQ+Ej+Z8Zlxv39gHBnpo5cb5qZxoG2e8/vXjYPRhMDQR4GAGHB+ZxpUjCPAhAcxivdGIUcUwDlw0V0opp1x5AU0FCiAsf5GatMQ64LoCZw+8RiCAq9kA0DmyQQdzgADUMgp7BWjOAqAj/jhZOFRvm7yhbtuCCDnP46WaX3iwsDIZYDOPLgskEXA5z4Qs1InX/lOsROFGLGFX+xMKCWj4lwiBj2e1OBcFuYjTUrOL8n5Rp+t6AwFPI5ChuP8QCkgxtBUDkybHImQRulk5DrP+ZlgITpMmmnfWR+w+QWaHMqcQkcRZ4oJEMUux0+KE6AHSHwfBJNoRONoJNBSRKPDo1iKb3QHS7ND5DtSdF7EUTRKauY4+oW0FsQPOfGDKg5y4mDzIwRP9Clh+kKuP2Fcl0ZdWpgHES+OvspGT/UrNfo6J16r4k1OvFFFJ8ypYUld5NRFSX3IqQ+qusyJS1Vc5cSVKn7KiZ9U8XZX7K+7Yn9TYmX95aZY7Y0maCo/I+krFM1hHL27fP9zHHXTZ/MuhEg3i0boPBpPhq12tx2rMn7Um8OOafXLemZoWz3TrjJkjl7bNvqDLcux4s2gDaM16LXUKIi3eqdrD8v6FtbotftWhWFLO7Sbg/amfAgFitXNfHa31SyVxc1yupZlnXbLemawmrbdOa4wZA673+/37BSFhoxipHjpo7HVOjXKUTQL6hitZq9C3y6A0bGsEizNsVhDy+ybKYtAACtOkb0sdqfXHahBYlt/q2fbpQUUufJ3bLPfrDBsWU/7p4OTlIQwELhqVUhWvla7c9JRo0gWNGzLFSyxkO0vDbuW0S6xkBzL0LKtpLDFnSg32Z05jtaf3O2+0w9MPY5Vs9xoqjnZe49mxYsrzLjeXWnf5a8egGvhyzOFsC4eVoTDWhhYxQLr4WElPNwF75b9bl28WxHu1sK4VSxuPbxbCe/ugqdlP62LpxXhtBaGVrHQenhaCU93wYuyX9TFi4pwUQsjqlhEPbyohBe74EnZT+riSUU4qYUhVSykHp5UwpMd8OtrQXJSkG9XxEqR8kwuT7OpDnEESjoXQKBUJjjiJdkRMyRAoju+ZEr/KXtAmDlkU9UnHockDMQGEuNo5AKpxeszDU2uIADryRGeYN0LNqecwgeYJoPnUB581+51IfpIXmUYei/PSB/l+RvII+mPcsrM9T05Zfl3dJi0dhnB8tEoW3vyWmWql6hy4+b4yGwemeYvzYO3rc0N64X2nfa99oNmam3trfZOO9euNKiF2u/aH9qfjb8a/zT+bfy3tj59shnzrVZ4Xj77H4vTSr0=</latexit>

rt+1 = 0

<latexit sha1_base64="PlwOklHMGJBk/GATo+aTkqmNU+g=">AAAORnicfZfLbuM2GIU109s0bebSLrsRGgxQtEEgJY4viwHGkm3MojOTBrlMGwcBRdOyYEokSMqxR9BrdNu+TZ+hD9Fl0W0p2ZElirI2YXgOjz/9EoWfHsUBF5b196PHn3z62edfPPly76uv958+e/7imytOYgbRJSSYsA8e4AgHEboUgcDoA2UIhB5G197czfTrBWI8INGFWFF0GwI/CqYBBEJOjcdcAIHuEvGTnd49P7COrPwy6wN7MzgwNtfZ3Yv9g/GEwDhEkYAYcH5jW1TcJoCJAGKU7o1jjiiAc+Cjm1hMu7dJENFYoAim5kupTWNsCmJmXOYkYAgKvJIDAFkgE0w4AwxAIen3qlEcRSBE/HCyCChfD/nCXw8EkLd+myzz0qSVhYnPAJ0FcFkhS0DIQyBmtUm+Cr3qJIoxYouwOplRSkbFuUQMBjyrwZkszHuaVZtfkLONPlvRGYp4msQMp+WFUkCMoalcmA85EjFN8puRj3jOXwkWo8NsmM+9GgA2P0eTQ5lTmajiTDEBojrlhVlxInQPSRiCaJKMaZqMBVqKZHx4lErxpelhafYIYJOq8zxNknFWM88zz6W1Ir4rie9UcVgSh5sfIXhiTgkzF/L5E8ZNaTSlhQUQ8erqy2L11LxUo69K4pUqXpfEa1X04pIa19RFSV3U1PuSeq+qy5K4VMVVSVyp4seS+FEVP+yK/XVX7G9KrKy/3BSrvfEETeUXJH+FkjlMkzcXb39Ok15+bd6FGJl21Qi9B+PJqN3pdVJVxg96a9S1nUFdLwwdp2+7OkPh6HdcazDcshwr3gLastrDfluNgnird3vuqK5vYa1+Z+BoDFvakdsadjblQyhSrH7hc3vtVq0sfpHTcxzntFfXC4PTct3uscZQONzBYNB3cxQaM4qR4qUPxnb71KpH0SKoa7VbfY2+fQBW13FqsLTE4owce2DnLAIBrDhF8bK43X5vqAaJbf2dvuvWHqAolb/r2oOWxrBlPR2cDk9yEsJA5KtVIUX52p3uSVeNIkXQqCOfYI2FbH9p1HOsTo2FlFhGjutkha3uRLnJbuzbZP3J3e4788A201Q1y42mmrO992BWvFhjxs1urX2XX78AN8LX7xTCpnioCYeNMFDHApvhoRYe7oL3636/Kd7XhPuNML6OxW+G97Xw/i54WvfTpniqCaeNMFTHQpvhqRae7oIXdb9oiheacNEII3QsohleaOHFLnhS95OmeKIJJ40wRMdCmuGJFp7sgGfoXrZ8Wacg366E1SJlTy672VyHOAE1PT9T5DLBCa/JnpghATLdCyVT/k/dA+LCIYeqPgk4JHEkNpAYJ2MfSC1d9zQ0O4IAbGYtPMFmEG26nMoHmGaL51A2vmv3uhADJI8yDL2VPdJ72X8D2ZL+KG+Z+WEgb1n+HR9mo11GsHwwytGePFbZ6iGqPrg+PrJbR7b9S+vgdXtzwnpifGd8b/xg2EbHeG28Mc6MSwMa1Pjd+MP4c/+v/X/2/93/b219/Giz5lujcj01/gelwUmd</latexit>st+1<latexit sha1_base64="9Fg4jXwGbDbUAdJrMoo6ifvIsd8=">AAAORnicfZfLbuM2GIU109s0bebSLrsRGgxQtEEgJY4viwHGkm3MojOTBrlMGwcBRdOyYEokSMqxR9BrdNu+TZ+hD9Fl0W0p2ZElirI2YXgOjz/9EoWfHsUBF5b196PHn3z62edfPPly76uv958+e/7imytOYgbRJSSYsA8e4AgHEboUgcDoA2UIhB5G197czfTrBWI8INGFWFF0GwI/CqYBBEJOjcdcAIHuEvHTcXr3/MA6svLLrA/szeDA2Fxndy/2D8YTAuMQRQJiwPmNbVFxmwAmAohRujeOOaIAzoGPbmIx7d4mQURjgSKYmi+lNo2xKYiZcZmTgCEo8EoOAGSBTDDhDDAAhaTfq0ZxFIEQ8cPJIqB8PeQLfz0QQN76bbLMS5NWFiY+A3QWwGWFLAEhD4GY1Sb5KvSqkyjGiC3C6mRGKRkV5xIxGPCsBmeyMO9pVm1+Qc42+mxFZyjiaRIznJYXSgExhqZyYT7kSMQ0yW9GPuI5fyVYjA6zYT73agDY/BxNDmVOZaKKM8UEiOqUF2bFidA9JGEIokkypmkyFmgpkvHhUSrFl6aHpdkjgE2qzvM0ScZZzTzPPJfWiviuJL5TxWFJHG5+hOCJOSXMXMjnTxg3pdGUFhZAxKurL4vVU/NSjb4qiVeqeF0Sr1XRi0tqXFMXJXVRU+9L6r2qLkviUhVXJXGlih9L4kdV/LAr9tddsb8psbL+clOs9sYTNJVfkPwVSuYwTd5cvP05TXr5tXkXYmTaVSP0Howno3an10lVGT/orVHXdgZ1vTB0nL7t6gyFo99xrcFwy3KseAtoy2oP+201CuKt3u25o7q+hbX6nYGjMWxpR25r2NmUD6FIsfqFz+21W7Wy+EVOz3Gc015dLwxOy3W7xxpD4XAHg0HfzVFozChGipc+GNvtU6seRYugrtVu9TX69gFYXcepwdISizNy7IGdswgEsOIUxcvidvu9oRoktvV3+q5be4CiVP6uaw9aGsOW9XRwOjzJSQgDka9WhRTla3e6J101ihRBo458gjUWsv2lUc+xOjUWUmIZOa6TFba6E+Umu7Fvk/Und7vvzAPbTFPVLDeaas723oNZ8WKNGTe7tfZdfv0C3Ahfv1MIm+KhJhw2wkAdC2yGh1p4uAver/v9pnhfE+43wvg6Fr8Z3tfC+7vgad1Pm+KpJpw2wlAdC22Gp1p4ugte1P2iKV5owkUjjNCxiGZ4oYUXu+BJ3U+a4okmnDTCEB0LaYYnWniyA56he9nyZZ2CfLsSVouUPbnsZnMd4gTU9PxMkcsEJ7wme2KGBMh0L5RM+T91D4gLhxyq+iTgkMSR2EBinIx9ILV03dPQ7AgCsJm18ASbQbTpciofYJotnkPZ+K7d60IMkDzKMPRW9kjvZf8NZEv6o7xl5oeBvGX5d3yYjXYZwfLBKEd78lhlq4eo+uD6+MhuHdn2L62D1+3NCeuJ8Z3xvfGDYRsd47XxxjgzLg1oUON34w/jz/2/9v/Z/3f/v7X18aPNmm+NyvXU+B+zmUme</latexit>st+2

<latexit sha1_base64="0Xz+aHDaXnZyuOfTbAOjKslbg60=">AAAOS3icfZdNb9s2HMbVbl27bG7T7biLsKDAsAWBlDh+OQSoJdvoYW2zIG9bbAQUTcuCKZGgKMeuoE+y6/Zt9gX2NXYcdhglO7JEStYlDJ+Hj3/8UxRIh2Iv5Ibx95Onn33+7IvnL77c++rrxstX+6+/uQ5JxCC6ggQTduuAEGEvQFfc4xjdUoaA72B048ztVL9ZIBZ6JLjkK4rGPnADb+pBwEXX/f6rEUMPgE3uY/7TcXJm3u8fGEdG9uhqw9w0DrTNc37/unEwmhAY+SjgEIMwvDMNyscxYNyDGCV7oyhEFMA5cNFdxKedcewFNOIogIn+RmjTCOuc6CmcPvEYghyvRANA5okEHc4AA5CLKeyVo0IUAB+Fh5OFR8N1M1y46wYHYv7jeJnVJykNjF0G6MyDyxJZDPzQB3ymdIYr3yl3oggjtvDLnSmlYJScS8SgF6Y1OBeF+UjTkoeX5Hyjz1Z0hoIwiSOGk+JAISDG0FQMzJoh4hGNs8mIdZ6HZ5xF6DBtZn1nfcDmF2hyKHJKHWWcKSaAl7scPy1OgB4g8X0QTOIRTeIRR0sejw6PEiG+0R0szA4R70jZeZHE8SitmePoF8JaEj8UxA+yOCiIg82PEDzRp4TpC7H+hIW6MOrCwjyIwvLoq3z0VL+So68L4rUs3hTEG1l0ooIaKeqioC4U9aGgPsjqsiAuZXFVEFey+KkgfpLF212xv+6K/U2KFfUXm2K1N5qgqfiMZK9QPIdJ/O7y/c9J3M2ezbsQId0sG6HzaDwZttrddiLL+FFvDjum1Vf13NC2eqZdZcgdvbZt9AdblmPJm0MbRmvQa8lREG/1TtceqvoW1ui1+1aFYUs7tJuD9qZ8CAWS1c19drfVVMri5jldy7JOu6qeG6ymbXeOKwy5w+73+z07Q6ERoxhJXvpobLVODTWK5kEdo9XsVejbBTA6lqXA0gKLNbTMvpmxcASw5OT5y2J3et2BHMS39bd6tq0sIC+Uv2Ob/WaFYct62j8dnGQkhIHAlatC8vK12p2TjhxF8qBhW6ygwkK2vzTsWkZbYSEFlqFlW2lhyztRbLI7cxyvP7nbfacfmHqSyGax0WRzuvcezZIXV5hxvbvSvstfPQDXwqszhbAuHlaEw1oYWMUC6+FhJTzcBe+qfrcu3q0Id2th3CoWtx7erYR3d8FT1U/r4mlFOK2FoVUstB6eVsLTXfBc9fO6eF4RzmtheBULr4fnlfB8FzxR/aQunlSEk1oYUsVC6uFJJTzZAb++FqQnBfF2xUyJFGdycZrNdIhjoOghBxxlMsFxqMgOnyEOUt3xBVP2j+oBUe4QTVmfeCEkUcA3kBjHIxcILVmfaWh6BQFYT4/wBOtesDnllD7ANB08h+Lgu3avC9FH4irD0HtxRvoozt9AHEl/FFNmru+JKYu/o8O0tcsIlo9G0doT1ypTvkSpjZvjI7N5ZJq/NA/etjY3rBfad9r32g+aqbW1t9o77Vy70qAWab9rf2h/Nv5q/NP4t/Hf2vr0yWbMt1rpefnsf6eESr8=</latexit>

rt+2 = 1

<latexit sha1_base64="LQ65ew/VbtFCIOFTc/0FK50zRhc=">AAAOY3icfZfdbts2GIbV7q/LliztdjYMEBYU67YgkBLHPwcFask2erC2WZA02aIgoGhaFkyJBEU5djVdwq5mp9uF7HwXMkp2ZImUrBN/5vvy9aNPokG6FPsRN4x/Hz3+6ONPPv3syec7X3y5u/fV/tNn7yMSM4guIcGEXbsgQtgP0SX3OUbXlCEQuBhduTM706/miEU+CS/4kqLbAHihP/Eh4GLobv8Hh/p3jsuniIMXDoD5aMJ/NtM/nIgDjlZffrzbPzCOjPzS1cJcFwfa+jq7e7p74IwJjAMUcohBFN2YBuW3CWDchxilO04cIQrgDHjoJuaT7m3ihzTmKISp/lxokxjrnOgZtD72GYIcL0UBIPNFgg6ngAlccWs71agIhSBA0eF47tNoVUZzb1VwIPpymyzyvqWViYnHAJ36cFEhS0AQBYBPlcFoGbjVQRRjxOZBdTCjFIySc4EY9KOsB2eiMe9o1vTogpyt9emSTlEYpUnMcFqeKATEGJqIiXkZIR7TJL8Z8fxn0UvOYnSYlfnYywFgs3M0PhQ5lYEqzgQTwKtDbpA1J0T3kAQBCMeJQ9PE4WjBE+fwKBXic93FwuwSwMZV53maJE7WM9fVz4W1Ir4tiW9lcVgSh+sfIXisTwjT5+L5ExbpwqgLC/MhiqqzL4vZE/1Sjn5fEt/L4lVJvJJFNy6psaLOS+pcUe9L6r2sLkriQhaXJXEpix9K4gdZvN4W+9u22N+lWNF/sSiWO84YTcTfS/4KJTOYJq8v3vySJr38Wr8LMdLNqhG6D8aTUbvT66SyjB/01qhrWgNVLwwdq2/adYbC0e/YxmC4YTmWvAW0YbSH/bYcBfFG7/bskapvYI1+Z2DVGDa0I7s17Kzbh1AoWb3CZ/faLaUtXpHTsyzrtKfqhcFq2Xb3uMZQOOzBYNC3cxQaM4qR5KUPxnb71FCjaBHUNdqtfo2+eQBG17IUWFpisUaWOTBzFo4Alpy8eFnsbr83lIP4pv9W37aVB8hL7e/a5qBVY9iwng5Ohyc5CWEg9OSukKJ97U73pCtHkSJo1BFPUGEhm18a9Syjo7CQEsvIsq2ssdWVKBbZjXmbrP5yN+tOPzD1NJXNYqHJ5mztPZglL64x42Z3rX2bv34CboRX7xTCpnhYEw4bYWAdC2yGh7XwcBu8p/q9pnivJtxrhPHqWLxmeK8W3tsGT1U/bYqnNeG0EYbWsdBmeFoLT7fBc9XPm+J5TThvhOF1LLwZntfC823wRPWTpnhSE04aYUgdC2mGJ7XwZAs8Q/diy5ftFMTblTAlcnWEyHWIE6Do+aEilwlOIkVenUQy3Q0EU/5F9YC4cIhS1sd+BEkc8jUkxonjAaGlqz0NzY4gAOvZFp5g3Q/Xu5zKHzDNJs+g2Piu3KtGDJA4yjD0RuyR3on9NxBb0p/ELTMv8MUti0/nMKu2GcHiwSiqHXGsMuVDlFpcHR+ZrSPT/LV18Kq9PmE90b7VvtdeaKbW0V5pr7Uz7VKD2p/aX9rf2j+7/+3t7D3b+2ZlffxoPedrrXLtffc/yxxSFQ==</latexit>

⇡✓(at+1|st+1)

To show this setup, let’s again look at the atari example. We see three states, represented using images. These go into our CNN policy, and we sample an action from them, in this case left (we see the bat go to the left). The ball hits a block on the third picture, which increases the score by one! That is a positive reward.

This brings into question the credit assignment problem: what was it that caused this positive reward? It probably was not bat moving left, but rather, several actions that were taken quite a few timesteps back, where the bat reflected the ball. We need to assign credit to exactly the action(s) that caused the positive reward because it allows us to ‘reinforce' these actions: These were good choices!

Actions , states and rewards Trajectories Markov decision processes (MDPs) model

• Policy • State transition distribution • Initial state distribution

<latexit sha1_base64="+LzeGFZxpTjAKZV/HNGMwJ8Fw9M=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq6+G8+Dx//9JdokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9BMD885bfPtu3Dq3iMtXC3hT7xuY6vX3e2p9MCUxCFHGIQRx/sC3Kb1LAeAAxyvYmSYwogHfARx8SPuvdpEFEE44imJkvhDZLsMmJmUOZ04AhyPFKFACyQCSYcA6YgBPoe/WoGEUgRPHBdBHQeF3GC39dcCDu+yZdFn3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0bzF8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxM+L53sUvOUvQQV4WYy+HgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjC9LAwewSwad15lqXpJO+Z55lnwloT31bEt7I4qoijzY8QPDVnhJkL8fwJi01hNIWFBRDF9dkX5eyZeSFHX1bES1m8qohXsuglFTVR1EVFXSjqfUW9l9VlRVzK4qoirmTxY0X8KIvXu2Lf74r9VYoV/ReLYrU3maKZ+PsoXqH0Dmbp6/M3P2dpv7g270KCTLtuhN6D8Xjc6fa7mSzjB7097tnOUNVLQ9cZ2K7OUDoGXdcajrYsR5K3hLaszmjQkaMg3uq9vjtW9S2sNegOHY1hSzt226Pupn0IRZLVL31uv9NW2uKXOX3HcU76ql4anLbr9o40htLhDofDgVug0IRRjCQvfTB2OieWGkXLoJ7VaQ80+vYBWD3HUWBphcUZO/bQLlg4Alhy8vJlcXuD/kgO4tv+OwPXVR4gr7S/59rDtsawZT0ZnoyOCxLCQOTLXSFl+zrd3nFPjiJl0LgrnqDCQra/NO47VldhIRWWseM6eWPrK1Essg/2Tbr+y92uO3PfNrNMNouFJpvztfdglrxYY8bNbq19l18/ATfCq3cKYVM81ITDRhioY4HN8FALD3fB+6rfb4r3NeF+I4yvY/Gb4X0tvL8Lnqp+2hRPNeG0EYbqWGgzPNXC013wXPXzpniuCeeNMFzHwpvhuRae74Inqp80xRNNOGmEIToW0gxPtPBkBzxD92LLl+8UxNuVMiVyfWAodIhToOgxBxwVMsFprMgenyMOct0LBVPxRfWApHSIUtanQQxJEvENJMbpxAdCy9Z7GpofQQA28y08wWYQbXY5tT9gmk++g2Lju3avGzFE4ijD0BuxR3on9t9AbEl/ErfM/DAQtyw+Jwd5tcsIlg9GUe2JY5UtH6LU4uro0G4f2vYv7f1Xnc0J64nxvfGD8aNhG13jlfHaODUuDGiExm/G78YfrT9bf7f+af27tj5+tJnznVG7Wv/9D6soSYc=</latexit>at<latexit sha1_base64="/pzW3Mv4fuxhZwPDsc7BaKD9MQQ=">AAAOQnicfZfLbuM2GIU109s0rduZdtmN0GCAoggCKXF8WQwwlmxjFp2ZNMitjYOAomlZCCUSJOXYI+glum3fpk/RR+iy6LaLUrIjSxRlbcLwHB5/+iUKPz2KAy4s668nTz/6+JNPP3v2+d4XX7a++vr5i28uOYkZRBeQYMKuPcARDiJ0IQKB0TVlCIQeRlfevZvpVwvEeECic7Gi6DYEfhTMAgiEnLqecAEEuhN3z/etQyu/zPrA3gz2jc11eveitT+ZEhiHKBIQA85vbIuK2wQwEUCM0r1JzBEF8B746CYWs95tEkQ0FiiCqflSarMYm4KYGZM5DRiCAq/kAEAWyAQTzgEDUEjyvWoURxEIET+YLgLK10O+8NcDAeRt3ybLvCxpZWHiM0DnAVxWyBIQ8hCIeW2Sr0KvOolijNgirE5mlJJRcS4RgwHPanAqC/OeZpXm5+R0o89XdI4iniYxw2l5oRQQY2gmF+ZDjkRMk/xm5OO9568Ei9FBNsznXg0Buz9D0wOZU5mo4swwAaI65YVZcSL0AEkYgmiaTGiaTARaimRycJhK8aXpYWn2CGDTqvMsTZJJVjPPM8+ktSK+K4nvVHFUEkebHyF4as4IMxfy+RPGTWk0pYUFEPHq6oti9cy8UKMvS+KlKl6VxCtV9OKSGtfURUld1NSHkvqgqsuSuFTFVUlcqeKHkvhBFa93xf6yK/ZXJVbWX26K1d5kimby65G/Qsk9TJM3529/SpN+fm3ehRiZdtUIvUfj8bjT7XdTVcaPenvcs51hXS8MXWdguzpD4Rh0XWs42rIcKd4C2rI6o0FHjYJ4q/f67riub2GtQXfoaAxb2rHbHnU35UMoUqx+4XP7nXatLH6R03cc56Rf1wuD03bd3pHGUDjc4XA4cHMUGjOKkeKlj8ZO58SqR9EiqGd12gONvn0AVs9xarC0xOKMHXto5ywCAaw4RfGyuL1Bf6QGiW39nYHr1h6gKJW/59rDtsawZT0ZnoyOcxLCQOSrVSFF+Trd3nFPjSJF0Lgrn2CNhWx/adx3rG6NhZRYxo7rZIWt7kS5yW7s22T9yd3uO3PfNtNUNcuNppqzvfdoVrxYY8bNbq19l1+/ADfC1+8UwqZ4qAmHjTBQxwKb4aEWHu6C9+t+vyne14T7jTC+jsVvhve18P4ueFr306Z4qgmnjTBUx0Kb4akWnu6CF3W/aIoXmnDRCCN0LKIZXmjhxS54UveTpniiCSeNMETHQprhiRae7IBn6EG2fFmnIN+uhNUiZU8uu9lchzgBNT0/T+QywQmvyZ6YIwEy3QslU/5P3QPiwiGHqj4NOCRxJDaQGCcTH0gtXfc0NDuCAGxmLTzBZhBtupzKB5hmi++hbHzX7nUhhkgeZRh6K3uk97L/BrIl/VHeMvPDQN6y/Ds5yEa7jGD5aJSjPXmsstVDVH1wdXRotw9t++f2/uvO5oT1zPjO+N74wbCNrvHaeGOcGhcGNLDxm/G78Ufrz9bfrX9a/66tT59s1nxrVK7Wf/8Db7dJIA==</latexit>st

<latexit sha1_base64="JCho6RHVaZ12zLSSoSGaNUMc/P8=">AAAOQ3icfZdNb9s2HMbV7q3L5rXdjrsICwoMQxBIieOXQ4Faso0e1jYL8tbVQUDRtCyEEgmKcuwK+hS7bt9mX2JfYcdh1wGjZEeWSMq65B8+Dx//9JcokB7FQcwt669Hjz/59LPPv3jy5d5XX7e+efrs+beXMUkYRBeQYMKuPRAjHEToggcco2vKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuj9hKF7wKa3/PbZvnVoFZepFvam2Dc21+nt89b+ZEpgEqKIQwzi+INtUX6TAsYDiFG2N0liRAG8Az76kPBZ7yYNIppwFMHMfCG0WYJNTswcypwGDEGOV6IAkAUiwYRzwADkAn2vHhWjCIQoPpguAhqvy3jhrwsOxH3fpMuiL1ltYuozQOcBXNbIUhDGIeBzZTBehV59ECUYsUVYH8wpBaPkXCIGgzjvwalozDuatzo+J6cbfb6icxTFWZownFUnCgExhmZiYlHGiCc0LW5GPN+7+CVnCTrIy2Ls5RCwuzM0PRA5tYE6zgwTwOtDXpg3J0L3kIQhiKbphGbphKMlTycHh5kQX5geFmaPiLej7jzL0nSS98zzzDNhrYlvK+JbWRxVxNHmRwiemjPCzIV4/oTFpjCawsICiOL67Ity9sy8kKMvK+KlLF5VxCtZ9JKKmijqoqIuFPW+ot7L6rIiLmVxVRFXsvixIn6Uxetdse93xf4qxYr+i0Wx2ptM0Ux8PopXKL2DWfr6/M3PWdovrs27kCDTrhuh92A8Hne6/W4my/hBb497tjNU9dLQdQa2qzOUjkHXtYajLcuR5C2hLaszGnTkKIi3eq/vjlV9C2sNukNHY9jSjt32qLtpH0KRZPVLn9vvtJW2+GVO33Gck76qlwan7bq9I42hdLjD4XDgFig0YRQjyUsfjJ3OiaVG0TKoZ3XaA42+fQBWz3EUWFphccaOPbQLFo4Alpy8fFnc3qA/koP4tv/OwHWVB8gr7e+59rCtMWxZT4Yno+OChDAQ+XJXSNm+Trd33JOjSBk07oonqLCQ7S+N+47VVVhIhWXsuE7e2PpKFIvsg32Trj+523Vn7ttmlslmsdBkc772HsySF2vMuNmtte/y6yfgRnj1TiFsioeacNgIA3UssBkeauHhLnhf9ftN8b4m3G+E8XUsfjO8r4X3d8FT1U+b4qkmnDbCUB0LbYanWni6C56rft4UzzXhvBGG61h4MzzXwvNd8ET1k6Z4ogknjTBEx0Ka4YkWnuyAXx8I8p2CeLtSpkSKPbnYzRY6xClQ9JgDjgqZ4DRWZI/PEQe57oWCqfhH9YCkdIhS1qdBDEkS8Q0kxunEB0LL1nsamh9BADbzLTzBZhBtdjm1DzDNJ99BsfFdu9eNGCJxlGHojdgjvRP7byC2pD+JW2Z+GIhbFn8nB3m1ywiWD0ZR7YljlS0fotTi6ujQbh/a9i/t/VedzQnrifG98YPxo2EbXeOV8do4NS4MaITGb8bvxh+tP1t/t/5p/bu2Pn60mfOdUbta//0PDH9Jjg==</latexit>rt<latexit sha1_base64="nyvr2I+1rJ5Zdc1PSdcTqOSsGX4=">AAAOs3icfZdbb9s2GIaVdocua7x0u9yNsKDAMHiBlDg+XASoJdvoxdpmgZN0jYKAomlZMCUSJOXYFfRDd7H/Mkp2dJZ1ky98X75+9EkUSJtilwtN+/fgxctvvv3u+1c/HP74+qj10/Gbn285CRhEN5Bgwj7bgCPs+uhGuAKjz5Qh4NkY3dlLM9bvVohxl/hTsaHowQOO785dCIQcejx+smwBgkuLCyDQo9ZWLQATJS4ZegJs9qjLcqvrma5n+lmqn6nWjAiemsLpn3qUGaepcfp4fKKdasmlVgt9V5wou+vq8c3RicyGgYd8ATHg/F7XqHgIARMuxCg6tAKOKIBL4KD7QMz7D6Hr00AgH0bqW6nNA6wKosZNUGcuQ1DgjSwAZK5MUOECMEktW3VYjOLIBx7i7dnKpXxb8pWzLQSQfX4I18lziAoTQ4cBunDhukAWAo97QCwqg3zj2cVBFGDEVl5xMKaUjCXnGjHo8rgHV7Ixn2jcez4lVzt9saEL5PMoDBiO8hOlgBhDczkxKTkSAQ2Tm5Hv05JfChagdlwmY5cjwJbXaNaWOYWBIs4cEyCKQ7YXN8dHT5B4HvBnoUWj0BJoLUKrfRpJ8a1qY2m2iXxPis7rKAytuGe2rV5La0H8mBM/lsVxThzvfoTgmTonTF3J508YV6VRlRbmQsSLs2/S2XP1phx9mxNvy+JdTrwri3aQU4OKusqpq4r6lFOfyuo6J67L4iYnbsri15z4tSx+3hf7z77YL6VY2X+5KDaH1gzN5ecqeYXCJYzC99MPf0XhILl270KAVL1ohPaz8XzS7Q16UVnGz3pn0teNUVVPDT1jqJt1htQx7JnaaJyxnJW8KbSmdcfDbjkK4kzvD8xJVc9gtWFvZNQYMtqJ2Rn3du1DyC9ZndRnDrqdSlucNGdgGMbFoKqnBqNjmv2zGkPqMEej0dBMUGjAKEYlL302drsXWjWKpkF9rdsZ1ujZA9D6hlGBpTkWY2LoIz1hEQjgklOkL4vZHw7G5SCR9d8YmmblAYpc+/umPurUGDLWi9HF+DwhIQz4TrkrJG1ft9c/75ejSBo06cknWGEh2S9NBobWq7CQHMvEMI24scWVKBfZvf4Qbj+52bpTT3Q1ispmudDK5njtPZtLXlxjxs3uWvs+f/0E3AhfvVMIm+JhTThshIF1LLAZHtbCw33wTtXvNMU7NeFOI4xTx+I0wzu18M4+eFr106Z4WhNOG2FoHQtthqe18HQfvKj6RVO8qAkXjTCijkU0w4taeLEPnlT9pCme1ISTRhhSx0Ka4UktPNkDvz0axDsF+XaFrBK5PUkkOsQhqOjJeSKRCQ55RbbFAgkQ67YnmZJ/qh4QpA5ZlvWZyyEJfLGDxDi0HCC1aLunofERBGA13sITrLr+bpdT+ADTePISyo3v1r1txAjJowxDH+Qe6ZPcfwO5Jf1D3jJzPFfesvxrteNqnxGsn42yOpTHKr18iKoWd2eneudU1//unLzr7k5Yr5Rfld+U3xVd6SnvlPfKlXKjQOW/g5cHrw+OWp3Wl5bdmm2tLw52c35RClfL+x8CFGve</latexit>

⌧ = s0,a0, r1, s1,a1, r2, s2 . . .aT-1, rT , sT<latexit sha1_base64="Ij/51rj3GbZAgyYSleGt0K1m3sU=">AAAOTHicfZfdbts2GIbVbuu6bO7S7XAnwoIC3RAEUuL456BALclGD9Y2C/LTLQ4CiqZlwZRIkJRjV9Od7HS7m13ArmOHw4BRsiNLlGQdJDTfl68ffRKNjy7FPheG8fejx598+tmTz59+sfflV61nX+8//+aKk4hBdAkJJuyDCzjCfoguhS8w+kAZAoGL0bU7t1P9eoEY90l4IVYU3QbAC/2pD4GQU3f7+/Tl2BUg+k3+nSEBfrjbPzCOjOzSqwNzMzjQNtfZ3fPWwXhCYBSgUEAMOL8xDSpuY8CEDzFK9sYRRxTAOfDQTSSmvdvYD2kkUAgT/YXUphHWBdFTOn3iMwQFXskBgMyXCTqcAQagkPewV47iKAQB4oeThU/5esgX3noggCzAbbzMCpSUFsYeA3Tmw2WJLAYBD4CYVSb5KnDLkyjCiC2C8mRKKRkV5xIx6PO0BmeyMO9pWnN+Qc42+mxFZyjkSRwxnBQXSgExhqZyYTbkSEQ0zm5GPug5fyVYhA7TYTb3ygFsfo4mhzKnNFHGmWICRHnKDdLihOgekiAA4SQe0yQeC7QU8fjwKJHiC93F0uwSwCZl53kSx+O0Zq6rn0trSXxXEN+p4rAgDjdfQvBEnxKmL+TzJ4zr0qhLC/Mh4uXVl/nqqX6pRl8VxCtVvC6I16roRgU1qqiLgrqoqPcF9V5VlwVxqYqrgrhSxY8F8aMqftgV+8uu2F+VWFl/uSlWe+MJmsrfkewViucwid9cvP0pifvZtXkXIqSbZSN0H4wno063301UGT/o7VHPtJyqnhu61sC06wy5Y9C1DWe4ZTlWvDm0YXSGg44aBfFW7/XtUVXfwhqDrmPVGLa0I7s97G7Kh1CoWL3cZ/c77UpZvDynb1nWab+q5warbdu94xpD7rAdxxnYGQqNGMVI8dIHY6dzalSjaB7UMzrtQY2+fQBGz7IqsLTAYo0s0zEzFoEAVpwif1ns3qA/VIPEtv7WwLYrD1AUyt+zTaddY9iynjqnw5OMhDAQempVSF6+Trd30lOjSB406sonWGEh228a9S2jW2EhBZaRZVtpYcs7UW6yG/M2Xv/kbvedfmDqSaKa5UZTzeneezArXlxjxs3uWvsuf/0C3AhfvVMIm+JhTThshIF1LLAZHtbCw13wXtXvNcV7NeFeI4xXx+I1w3u18N4ueFr106Z4WhNOG2FoHQtthqe18HQXvKj6RVO8qAkXjTCijkU0w4taeLELnlT9pCme1ISTRhhSx0Ka4UktPNkBz9C9bPnSTkG+XTGrRMqeXHazmQ5xDCo6F0CgTCY45hV5fdxIdTeQTNmHqgdEuUMOVX3ic0iiUGwgMY7HHpBasu5paHoEAVhPW3iCdT/cdDmlH2CaLp5D2fiu3etCOEgeZRh6K3uk97L/BrIl/VHeMvMCX96y/D8+TEe7jGD5YJSjPXmsMtVDVHVwfXxkto9M8+f2wevO5oT1VPtO+157qZlaV3utvdHOtEsNagvtd+0P7c/WX61/Wv+2/ltbHz/arPlWK13PnvwPn69Law==</latexit>

p(⌧|✓)

<latexit sha1_base64="q+zu+qOySWKNgXCi+xVWreMKZs0=">AAAOvnicfZfbbts2HMaV7tRlq5duFxuwG2FBgWTzAilxfLgIVku20Yu1zYqctigzKJqWBVMiQVGOXUU3e8u9wR5jlCzLOloXNs3v4+ef/iIF0qTY9rii/Lv37JNPP/v8i+df7n/19YvGNwcvv73xiM8guoYEE3ZnAg9h20XX3OYY3VGGgGNidGvO9Ui/XSDm2cS94iuKHhxgufbUhoCLrvHBP/TIMDnwn8TnDHFwfCE6PA44GivHBmVkMg74hRL+HVz9qoYGtceJ8cgAMI7gT4mfH6dDA/6LGjZlg6FHwCbrn/KTnKqRthl+PD44VE6U+JLLDTVpHErJdTl++eLQmBDoO8jlEAPPu1cVyh8CwLgNMQr3Dd9DFMA5sNC9z6fdh8B2qc+RC0P5ldCmPpY5kaN6yBObIcjxSjQAZLZIkOEMMEEnqrafj/KQCxzkNScLm3rrprew1g0ORMkfgmX8SMLcwMBigM5suMyRBcDxHMBnpU5v5Zj5TuRjxBZOvjOiFIwF5xIxaHtRDS5FYd7TqMbeFblM9NmKzpDrhYHPcJgdKATEGJqKgXHTQ9ynQXwzYmrNvQvOfNSMmnHfxQCw+Qc0aYqcXEceZ4oJ4Pku04mK46JHSBwHuJPAoGFgcLTkgdE8CYX4SjaxMJtETJ2880MYBEZUM9OUPwhrTnyXEd8VxWFGHCZ/QvBEnhImL8TzJ8yThVEWFmZD5OVHX6ejp/J1MfomI94UxduMeFsUTT+j+iV1kVEXJfUxoz4W1WVGXBbFVUZcFcWPGfFjUbzbFfvnrti/CrGi/mJRrPaNCZqKN1c8hYI5DIM3V29/D4NefCVzwUeymjdCc2M8G7U7vU5YlPFGb426qjYo66mho/VVvcqQOvodXRkMtyynBW8KrSjtYb9djIJ4q3d7+qisb2GVfmegVRi2tCO9Newk5UPILVit1Kf32q1SWaw0p6dp2nmvrKcGraXr3dMKQ+rQB4NBX49RqM8oRgUv3Rjb7XOlHEXToK7SbvUr9O0DULqaVoKlGRZtpKkDNWbhCOCCk6eTRe/2e8NiEN/WX+vreukB8kz5u7o6aFUYtqzng/PhWUxCGHCtYlVIWr52p3vWLUaRNGjUEU+wxEK2/zTqaUqnxEIyLCNN16LC5leiWGT36kOwfuVu1518qMphWDSLhVY0R2tvYy54cYUZ17sr7bv81QNwLXz5TiGsi4cV4bAWBlaxwHp4WAkPd8FbZb9VF29VhFu1MFYVi1UPb1XCW7vgadlP6+JpRTithaFVLLQenlbC013wvOzndfG8IpzXwvAqFl4Pzyvh+S54UvaTunhSEU5qYUgVC6mHJ5XwZAf8+rQQ7RTE7ApYKXJ9Yoh1iANQ0uPTRSwTHHgleX1uiXTTEUzxj7IH+KlDNIv6xPYg8V2eQGIcGBYQWrje09DoCAKwHG3hCZZtN9nl5F7ANBo8h2Lju3avCzFA4ijD0FuxR3ov9t9AbEl/FrfMLMcWtyy+jWbU2mUEy41RtPbFsUotHqLKjdvTE7V1oqp/tA5ft5MT1nPpR+kn6UhSpY70WnojXUrXEpT+22vsfb/3Q+O3Bmo4DbK2PttLxnwn5a7G8n/biXMp</latexit>

p(⌧|✓) = p(s0)T-1Y

t=0

⇡✓(at|st)p(st+1, rt+1|st,at)

<latexit sha1_base64="gfi6SL5j681Wde1MvJUNdF/DmQ4=">AAAOW3icfZdNb9s2HMbV7q3LmizdsNMOExYU6IYgkBLHL4cCtWQbPaxtFuRti4KAomlZMCUSFOXY1XTcp9l1+zAD9mFGyY4sUZR1yT98Hj7+6S9RIF2K/Ygbxr9Pnn7y6Weff/Hsy52vnu/ufb3/4puriMQMoktIMGE3LogQ9kN0yX2O0Q1lCAQuRtfuzM706zlikU/CC76k6C4AXuhPfAi4GLrf/8Gh/r3j8ini4JUDYD7K/3AiDji65z/d7x8YR0Z+6fXCXBcH2vo6u3+xe+CMCYwDFHKIQRTdmgbldwlg3IcYpTtOHCEK4Ax46Dbmk+5d4oc05iiEqf5SaJMY65zoGaw+9hmCHC9FASDzRYIOp4AJTHFLO9WoCIUgQNHheO7TaFVGc29VcCD6cZcs8n6llYmJxwCd+nBRIUtAEAWAT2uD0TJwq4MoxojNg+pgRikYJecCMehHWQ/ORGM+0KzZ0QU5W+vTJZ2iMEqTmOG0PFEIiDE0ERPzMkI8pkl+M+K5z6LXnMXoMCvzsdcDwGbnaHwocioDVZwJJoBXh9wga06IHiAJAhCOE4emicPRgifO4VEqxJe6i4XZJYCNq87zNEmcrGeuq58La0V8XxLfy+KwJA7XP0LwWJ8Qps/F8ycs0oVRFxbmQxRVZ18Wsyf6pRx9VRKvZPG6JF7LohuX1LimzkvqvKY+lNQHWV2UxIUsLkviUhY/lsSPsnizLfa3bbG/S7Gi/2JRLHecMZqIz0r+CiUzmCZvL979kia9/Fq/CzHSzaoRuo/Gk1G70+uksowf9daoa1qDul4YOlbftFWGwtHv2MZguGE5lrwFtGG0h/22HAXxRu/27FFd38Aa/c7AUhg2tCO7Neys24dQKFm9wmf32q1aW7wip2dZ1mmvrhcGq2Xb3WOFoXDYg8Ggb+coNGYUI8lLH43t9qlRj6JFUNdot/oKffMAjK5l1WBpicUaWebAzFk4Alhy8uJlsbv93lAO4pv+W33brj1AXmp/1zYHLYVhw3o6OB2e5CSEgdCTu0KK9rU73ZOuHEWKoFFHPMEaC9n80qhnGZ0aCymxjCzbyhpbXYlikd2ad8nqk7tZd/qBqaepbBYLTTZna+/RLHmxwoyb3Ur7Nr96Am6Er98phE3xUBEOG2GgigU2w0MlPNwG79X9XlO8pwj3GmE8FYvXDO8p4b1t8LTup03xVBFOG2GoioU2w1MlPN0Gz+t+3hTPFeG8EYarWHgzPFfC823wpO4nTfFEEU4aYYiKhTTDEyU82QLP0IPY8mU7BfF2JawWuTo65DrECajp+YEilwlOopq8OoFkuhsIpvyfugfEhUOUsj72I0jikK8hMU4cDwgtXe1paHYEAVjPtvAE63643uVUPsA0mzyDYuO7cq8aMUDiKMPQO7FH+iD230BsSX8Wt8y8wBe3LP46h1m1zQgWj0ZR7YhjlSkfourF9fGR2ToyzV9bB2/a6xPWM+177UftlWZqHe2N9lY70y41qP2p/aX9rf2z+9/e072dvecr69Mn6znfapVr77v/ARAATx0=</latexit>

⇡✓(at|st)<latexit sha1_base64="SYCfd0d/aUDyf9x/Pic/Ian8kLo=">AAAObnicfZfbbts2HMbV7tRlS5Zu2NUwTFhQIFuNQEocHy4K1JJt9GJtsyCnLQ4CiqZlwZRIkJRjV9N77Gl2u73CnmKvMEp2ZImSrBtT/D5+/ukvUiAdij0uDOPfJ08/+viTTz979vnOF1/u7n21//zrK05CBtElJJiwGwdwhL0AXQpPYHRDGQK+g9G1M7MT/XqOGPdIcCGWFN35wA28iQeBkF33+8f0cMQFEOg+Ei/NuKGPGHoAbLy6/WOtCdkPYDpC/HS/f2AcGemllxvmunGgra+z++e7B6MxgaGPAgEx4PzWNKi4iwATHsQo3hmFHFEAZ8BFt6GYdO4iL6ChQAGM9RdSm4RYF0RP+PWxxxAUeCkbADJPJuhwCpikk0+5U4ziKAA+4o3x3KN81eRzd9UQQJboLlqkJYwLAyOXATr14KJAFgGf+0BMS5186TvFThRixOZ+sTOhlIyKc4EY9HhSgzNZmPc0qTG/IGdrfbqkUxTwOAoZjvMDpYAYQxM5MG1yJEIapQ8jp8KMvxIsRI2kmfa96gM2O0fjhswpdBRxJpgAUexy/KQ4AXqAxPdBMI5GNI5GAi1ENGocxVJ8oTtYmh0ip03ReR5H0SipmePo59JaEN/lxHeqOMiJg/WfEDzWJ4Tpc/n+CeO6NOrSwjyIeHH0ZTZ6ol+q0Vc58UoVr3PitSo6YU4NS+o8p85L6kNOfVDVRU5cqOIyJy5V8UNO/KCKN9tif9sW+7sSK+svF8VyZzRGE/mlSadQNINx9Obi7S9x1E2v9VwIkW4WjdB5NJ4MW+1uO1Zl/Kg3hx3T6pf1zNC2eqZdZcgcvbZt9AcblmPFm0EbRmvQa6lREG/0TtcelvUNrNFr960Kw4Z2aDcH7XX5EAoUq5v57G6rWSqLm+V0Lcs67Zb1zGA1bbtzXGHIHHa/3+/ZKQoNGcVI8dJHY6t1apSjaBbUMVrNXoW+eQFGx7JKsDTHYg0ts2+mLAIBrDhFNlnsTq87UIPEpv5Wz7ZLL1Dkyt+xzX6zwrBhPe2fDk5SEsJA4KpVIVn5Wu3OSUeNIlnQsC3fYImFbP5p2LWMdomF5FiGlm0lhS2uRLnIbs27aPXJ3aw7/cDU41g1y4WmmpO192hWvLjCjOvdlfZt/uoBuBa+/KQQ1sXDinBYCwOrWGA9PKyEh9vg3bLfrYt3K8LdWhi3isWth3cr4d1t8LTsp3XxtCKc1sLQKhZaD08r4ek2eFH2i7p4UREuamFEFYuohxeV8GIbPCn7SV08qQgntTCkioXUw5NKeLIFfnVSSHYKcnZFrBS5OjGkOsQRKOnp2SKVCY54SXbEFAmQ6I4vmdKbsgeEmUM2VX3scUjCQKwhMY5GLpBavNrT0OQIArCebOEJ1r1gvcspfIBpMngG5cZ35V4Voo/kUYaht3KP9F7uv4Hckv4sH5m5vicfWf6OGklrmxEsHo2ytSOPVaZ6iCo3ro+PzOaRaf7aPHjdWp+wnmnfaT9qh5qptbXX2hvtTLvUoPan9pf2t/bP7n973+59v/fDyvr0yXrMN1rh2jv8H/d8Vic=</latexit>

p(st+1, rt+1|st,at)<latexit sha1_base64="CZu97J0BQEi9bXmmD2bQ3YcmuT4=">AAAORXicfZdNb9s2HMbV7q3L5q3djrsICwp0QxBIieOXQ4Faso0e1jYL8tItDgKKpmUhlEhQlGNX0MfYdfs2+w77DjsOu26U7MgSSVmXMHwePv7pL1H406M4iLll/fXo8Ucff/LpZ08+3/viy9ZXXz999s1lTBIG0QUkmLD3HogRDiJ0wQOO0XvKEAg9jK68OzfXrxaIxQGJzvmKopsQ+FEwCyDgYuqavpjEHHB0a/1w+3TfOrSKy1QH9mawb2yu09tnrf3JlMAkRBGHGMTxtW1RfpMCxgOIUbY3SWJEAbwDPrpO+Kx3kwYRTTiKYGY+F9oswSYnZo5lTgOGIMcrMQCQBSLBhHPAAOQCfq8eFaMIhCg+mC4CGq+H8cJfDzgQd36TLovKZLWFqc8AnQdwWSNLQRiHgM+VyXgVevVJlGDEFmF9MqcUjJJziRgM4rwGp6Iw72he7PicnG70+YrOURRnacJwVl0oBMQYmomFxTBGPKFpcTPiCd/FLzlL0EE+LOZeDgG7O0PTA5FTm6jjzDABvD7lhXlxInQPSRiCaJpOaJZOOFrydHJwmAnxuelhYfYIYNO68yxL00leM88zz4S1Jr6tiG9lcVQRR5sfIXhqzggzF+L5ExabwmgKCwsgiuurL8rVM/NCjr6siJeyeFURr2TRSypqoqiLirpQ1PuKei+ry4q4lMVVRVzJ4oeK+EEW3++K/WVX7K9SrKi/2BSrvckUzcQHpHiF0juYpa/P3/yUpf3i2rwLCTLtuhF6D8bjcafb72ayjB/09rhnO0NVLw1dZ2C7OkPpGHRdazjashxJ3hLasjqjQUeOgnir9/ruWNW3sNagO3Q0hi3t2G2PupvyIRRJVr/0uf1OWymLX+b0Hcc56at6aXDarts70hhKhzscDgdugUITRjGSvPTB2OmcWGoULYN6Vqc90OjbB2D1HEeBpRUWZ+zYQ7tg4QhgycnLl8XtDfojOYhv6+8MXFd5gLxS/p5rD9saw5b1ZHgyOi5ICAORL1eFlOXrdHvHPTmKlEHjrniCCgvZ/tK471hdhYVUWMaO6+SFre9Escmu7Zt0/cnd7jtz3zazTDaLjSab8733YJa8WGPGzW6tfZdfvwA3wqt3CmFTPNSEw0YYqGOBzfBQCw93wfuq32+K9zXhfiOMr2Pxm+F9Lby/C56qftoUTzXhtBGG6lhoMzzVwtNd8Fz186Z4rgnnjTBcx8Kb4bkWnu+CJ6qfNMUTTThphCE6FtIMT7TwZAc8Q/ei5cs7BfF2pUyJFD256GYLHeIUKHpxoChkgtNYkT0+RxzkuhcKpuIf1QOS0iGGsj4NYkiSiG8gMU4nPhBatu5paH4EAdjMW3iCzSDadDm1DzDNF99B0fiu3etCDJE4yjD0RvRI70T/DURL+qO4ZeaHgbhl8XdykI92GcHywShGe+JYZcuHKHVwdXRotw9t++f2/qvO5oT1xPjO+N54YdhG13hlvDZOjQsDGsT4zfjd+KP1Z+vv1j+tf9fWx482a741alfrv/8BzsRJuw==</latexit>

p(s0)

MARKOV DECISION PROCESSES

10

Markov decision processes model the RL loop. An MDP is defined as a distribution over trajectories: What trajectories are most likely, given that our agent acts in the environment according to the policy \pi_\theta?

An MDP defines the probability of a trajectory, when interacting using policy \pi_theta. This probability is given with the long equation. We will explain in a lot of detail in the coming slides what exactly this means.

The probability of a trajectory is build up using three probability distributions: The first is the policy, we have seen this one before. It represents our agent, and defines a distribution over actions given states.

The second distribution is the state transition distribution, which represents the environment. This one is more complicated: It defines a distribution over what the next state of the environment is, given our current state of the environment and the action performed by the agent. It is jointly distributed with the reward, which is also dependent both on the previous state and action.

MARKOV DECISION PROCESSES

11

<latexit sha1_base64="1XD/drTWwzJ56XWvUlFI4PIcB5s=">AAAN2HicfZdNb9s2HMbV7q3L6q1dj7sICwp0QxBIieMXFAVqSTZ6WNssyFsbBwFF04pgSiQoyrErCDsOuw7YdfsW+y77NqNkR5YoyrqY4PPw8U9/icKfLsV+xA3jvwcPP/v8iy+/evT1zjePW99+9+Tp9+cRiRlEZ5Bgwi5dECHsh+iM+xyjS8oQCFyMLtyZnekXc8Qin4SnfEnRdQC80J/6EHAxdUVfjCMOOLoxfrp5smvsG/ml1wfmerCrra/jm6eP/x1PCIwDFHKIQRRdmQbl1wlg3IcYpTvjOEIUwBnw0FXMp73rxA9pzFEIU/250KYx1jnRMyx94jMEOV6KAYDMFwk6vAUMQC7gd6pREQpBgKK9ydyn0WoYzb3VgANx59fJIq9MWlmYeAzQWx8uKmQJCKIA8NvaZLQM3OokijFi86A6mVEKRsm5QAz6UVaDY1GY9zQrdnRKjtf67ZLeojBKk5jhtLxQCIgxNBUL82GEeEyT/GbEE55FrziL0V42zOdeOYDNTtBkT+RUJqo4U0wAr065QVacEN1BEgQgnCRjmiZjjhY8Ge/tp0J8rrtYmF0C2KTqPEmTZJzVzHX1E2GtiO9K4jtZHJbE4fpPCJ7oU8L0uXj+hEW6MOrCwnyIourqs2L1VD+To89L4rksXpTEC1l045Ia19R5SZ3X1LuSeieri5K4kMVlSVzK4qeS+EkWL7fFftgW+1GKFfUXm2K5M56gqfiA5K9QMoNp8ub07S9p0s+v9bsQI92sGqF7bzwcdbr9birL+F5vj3qm5dT1wtC1BqatMhSOQdc2nOGG5UDyFtCG0RkOOnIUxBu917dHdX0Dawy6jqUwbGhHdnvYXZcPoVCyeoXP7nfatbJ4RU7fsqyjfl0vDFbbtnsHCkPhsB3HGdg5Co0ZxUjy0ntjp3Nk1KNoEdQzOu2BQt88AKNnWTVYWmKxRpbpmDkLRwBLTl68LHZv0B/KQXxTf2tg27UHyEvl79mm01YYNqxHztHwMCchDISeXBVSlK/T7R325ChSBI264gnWWMjmn0Z9y+jWWEiJZWTZVlbY6k4Um+zKvE5Wn9zNvtN3TT1NZbPYaLI523v3ZsmLFWbc7Fbat/nVC3AjfP1OIWyKh4pw2AgDVSywGR4q4eE2eK/u95riPUW41wjjqVi8ZnhPCe9tg6d1P22Kp4pw2ghDVSy0GZ4q4ek2eF7386Z4rgjnjTBcxcKb4bkSnm+DJ3U/aYoninDSCENULKQZnijhyRZ4hu5Ey5d1CuLtSlgtUvTkopvNdYgTUNPzA0UuE5xE6arNoNmpAGA966oJ1v1w3XhUvok0WzWDohdduVdsDhKnC4beirblvWiJgegSfxYUzAt8QSF+x3vZaJsRLO6NYrQjTjqmfK6pDy4O9s32vmn+2t59/XJ96Hmk/aD9qL3QTK2rvdbeaMfamQY1ov2l/a390/rQ+q31e+uPlfXhg/WaZ1rlav35Px0uGww=</latexit>

p(s0)<latexit sha1_base64="YHkXKgzGtE9903LLKx5hW4i+GOE=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm7Mpj5m6A6wyY352xjATDfE4Eo2frzZ3TP2jazp1Y657uxp63Z88+zpx/GEwMhHAYcYhOGVaVB+HQPGPYhRsjOOQkQBnAEXXUV82r2OvYBGHAUw0Z8LbRphnRM95dYnHkOQ46XoAMg8kaDDW8AEp3i6nXJUiALgo7A5mXs0XHXDubvqcCBKcx0vstIlpYmxywC99eCiRBYDP/QBv60MhkvfKQ+iCCM298uDKaVglJwLxKAXpjU4FoV5T9Nqh6fkeK3fLuktCsIkjhhOihOFgBhDUzEx64aIRzTOHkYsgVn4irMINdNuNvZqANjsBE2aIqc0UMaZYgJ4ecjx0+IE6A4S3wfBJB7TJB5ztODxuLmfCPG57mBhdohYMmXnSRLH47RmjqOfCGtJfFcQ38nisCAO139C8ESfEqbPxfsnLNSFURcW5kEUlmef5bOn+pkcfV4Qz2XxoiBeyKITFdSoos4L6ryi3hXUO1ldFMSFLC4L4lIW7wvivSxebov9dVvsBylW1F9siuXOeIKm4guTLaF4BpP4zenbn5O4l7X1WoiQbpaN0HkwHo7anV4nkWX8oLdGXdMaVPXc0LH6pq0y5I5+xzYGww3LgeTNoQ2jPey35SiIN3q3Z4+q+gbW6HcGlsKwoR3ZrWFnXT6EAsnq5j67125VyuLmOT3Lso56VT03WC3b7h4oDLnDHgwGfTtDoRGjGEle+mBst4+MahTNg7pGu9VX6JsXYHQtqwJLCyzWyDIHZsbCEcCSk+eLxe72e0M5iG/qb/Vtu/ICeaH8XdsctBSGDevR4Gh4mJEQBgJXrgrJy9fudA+7chTJg0Yd8QYrLGTzT6OeZXQqLKTAMrJsKy1seSeKTXZlXserT+5m3+l7pp4ksllsNNmc7r0Hs+TFCjOudyvt2/zqCbgWvvqkENbFQ0U4rIWBKhZYDw+V8HAbvFv1u3XxriLcrYVxVSxuPbyrhHe3wdOqn9bFU0U4rYWhKhZaD0+V8HQbPK/6eV08V4TzWhiuYuH18FwJz7fBk6qf1MUTRTiphSEqFlIPT5TwZAv86paQnhTE6opZJXJ1d8h0iGNQ0bMLRSYTHIfJ6phB01sBwHp6qiZY94L1waP0TaTprBkUZ9GVe8U2QOJ2wdBbcWx5L47EQJwSXwoK5vqeoBC/42ba22YEiwej6O2Im44p32uqnYuDfbO1b5q/tPZe/7S+9DzRvtd+0F5optbRXmtvtGPtTIPa79pH7R/t38Z944/Gn42/VtbHj9ZzvtNKrfH3/2uAJ24=</latexit>

p(s1, r1|a0, s0)<latexit sha1_base64="TtotXtVyiolaey0Pj34Znxjw2ao=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm4OmvqYoTvAJjcHv40BzHRTDK5k88eb3T1j38iaXu2Y686etm7HN8+efhxPCIx8FHCIQRhemQbl1zFg3IMYJTvjKEQUwBlw0VXEp93r2AtoxFEAE/250KYR1jnRU2594jEEOV6KDoDMEwk6vAVMcIqn2ylHhSgAPgqbk7lHw1U3nLurDgeiNNfxIitdUpoYuwzQWw8uSmQx8EMf8NvKYLj0nfIgijBic788mFIKRsm5QAx6YVqDY1GY9zStdnhKjtf67ZLeoiBM4ojhpDhRCIgxNBUTs26IeETj7GHEEpiFrziLUDPtZmOvBoDNTtCkKXJKA2WcKSaAl4ccPy1OgO4g8X0QTOIxTeIxRwsej5v7iRCf6w4WZoeIJVN2niRxPE5r5jj6ibCWxHcF8Z0sDgvicP0nBE/0KWH6XLx/wkJdGHVhYR5EYXn2WT57qp/J0ecF8VwWLwrihSw6UUGNKuq8oM4r6l1BvZPVRUFcyOKyIC5l8b4g3svi5bbYX7fFfpBiRf3FpljujCdoKr4w2RKKZzCJ35y+/TmJe1lbr4UI6WbZCJ0H4+Go3el1ElnGD3pr1DWtQVXPDR2rb9oqQ+7od2xjMNywHEjeHNow2sN+W46CeKN3e/aoqm9gjX5nYCkMG9qR3Rp21uVDKJCsbu6ze+1WpSxuntOzLOuoV9Vzg9Wy7e6BwpA77MFg0LczFBoxipHkpQ/GdvvIqEbRPKhrtFt9hb55AUbXsiqwtMBijSxzYGYsHAEsOXm+WOxuvzeUg/im/lbftisvkBfK37XNQUth2LAeDY6GhxkJYSBw5aqQvHztTvewK0eRPGjUEW+wwkI2/zTqWUanwkIKLCPLttLClnei2GRX5nW8+uRu9p2+Z+pJIpvFRpPN6d57MEterDDjerfSvs2vnoBr4atPCmFdPFSEw1oYqGKB9fBQCQ+3wbtVv1sX7yrC3VoYV8Xi1sO7Snh3Gzyt+mldPFWE01oYqmKh9fBUCU+3wfOqn9fFc0U4r4XhKhZeD8+V8HwbPKn6SV08UYSTWhiiYiH18EQJT7bAr24J6UlBrK6YVSJXd4dMhzgGFT27UGQywXGYrI4ZNL0VAKynp2qCdS9YHzxK30SazppBcRZduVdsAyRuFwy9FceW9+JIDMQp8aWgYK7vCQrxO26mvW1GsHgwit6OuOmY8r2m2rk42Ddb+6b5S2vv9U/rS88T7XvtB+2FZmod7bX2RjvWzjSo/a591P7R/m3cN/5o/Nn4a2V9/Gg95zut1Bp//w+haSdy</latexit>

p(s2, r2|a1, s1)<latexit sha1_base64="dlBFF7y4eqn9g3p0iFNRijY0e0s=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm4Om/qYoTvAJjeHv40BzPQDMbiSD3682d0z9o2s6dWOue7saet2fPPs6cfxhMDIRwGHGIThlWlQfh0Dxj2IUbIzjkJEAZwBF11FfNq9jr2ARhwFMNGfC20aYZ0TPeXWJx5DkOOl6ADIPJGgw1vABKd4up1yVIgC4KOwOZl7NFx1w7m76nAgSnMdL7LSJaWJscsAvfXgokQWAz/0Ab+tDIZL3ykPoggjNvfLgymlYJScC8SgF6Y1OBaFeU/Taoen5Hit3y7pLQrCJI4YTooThYAYQ1MxMeuGiEc0zh5GLIFZ+IqzCDXTbjb2agDY7ARNmiKnNFDGmWICeHnI8dPiBOgOEt8HwSQe0yQec7Tg8bi5nwjxue5gYXaIWDJl50kSx+O0Zo6jnwhrSXxXEN/J4rAgDtd/QvBEnxKmz8X7JyzUhVEXFuZBFJZnn+Wzp/qZHH1eEM9l8aIgXsiiExXUqKLOC+q8ot4V1DtZXRTEhSwuC+JSFu8L4r0sXm6L/XVb7AcpVtRfbIrlzniCpuILky2heAaT+M3p25+TuJe19VqIkG6WjdB5MB6O2p1eJ5Fl/KC3Rl3TGlT13NCx+qatMuSOfsc2BsMNy4HkzaENoz3st+UoiDd6t2ePqvoG1uh3BpbCsKEd2a1hZ10+hALJ6uY+u9duVcri5jk9y7KOelU9N1gt2+4eKAy5wx4MBn07Q6ERoxhJXvpgbLePjGoUzYO6RrvVV+ibF2B0LasCSwss1sgyB2bGwhHAkpPni8Xu9ntDOYhv6m/1bbvyAnmh/F3bHLQUhg3r0eBoeJiREAYCV64KycvX7nQPu3IUyYNGHfEGKyxk80+jnmV0KiykwDKybCstbHknik12ZV7Hq0/uZt/pe6aeJLJZbDTZnO69B7PkxQozrncr7dv86gm4Fr76pBDWxUNFOKyFgSoWWA8PlfBwG7xb9bt18a4i3K2FcVUsbj28q4R3t8HTqp/WxVNFOK2FoSoWWg9PlfB0Gzyv+nldPFeE81oYrmLh9fBcCc+3wZOqn9TFE0U4qYUhKhZSD0+U8GQL/OqWkJ4UxOqKWSVydXfIdIhjUNGzC0UmExyHyeqYQdNbAcB6eqomWPeC9cGj9E2k6awZFGfRlXvFNkDidsHQW3FseS+OxECcEl8KCub6nqAQv+Nm2ttmBIsHo+jtiJuOKd9rqp2Lg32ztW+av7T2Xv+0vvQ80b7XftBeaKbW0V5rb7Rj7UyD2u/aR+0f7d/GfeOPxp+Nv1bWx4/Wc77TSq3x9//XUid2</latexit>

p(s3, r3|a2, s2)

<latexit sha1_base64="v7EdJ+ThHXgJX0gdR//dSzJOt1g=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHzU+NmJOODo1vjmdm/fODTyS68X5rrY19bX6e2Tx/85YwLjAIUcYhBF16ZB+U0CGPchRumOE0eIAjgDHrqO+aR7k/ghjTkKYao/Fdokxjonegarj32GIMdLUQDIfJGgwylgAlPc0k41KkIhCFB0MJ77NFqV0dxbFRyIftwki7xfaWVi4jFApz5cVMgSEEQB4NPaYLQM3OogijFi86A6mFEKRsm5QAz6UdaDU9GYNzRrdnROTtf6dEmnKIzSJGY4LU8UAmIMTcTEvIwQj2mS34x47rPoOWcxOsjKfOz5ALDZGRofiJzKQBVnggng1SE3yJoTojtIggCE48ShaeJwtOCJc3CYCvGp7mJhdglg46rzLE0SJ+uZ6+pnwloRX5fE17I4LInD9Y8QPNYnhOlz8fwJi3Rh1IWF+RBF1dkXxeyJfiFHX5bES1m8KolXsujGJTWuqfOSOq+pdyX1TlYXJXEhi8uSuJTFdyXxnSy+3Rb747bYn6RY0X+xKJY7zhhNxGclf4WSGUyTl+evvk+TXn6t34UY6WbVCN174/Go3el1UlnG93pr1DWtQV0vDB2rb9oqQ+Hod2xjMNywHEneAtow2sN+W46CeKN3e/aorm9gjX5nYCkMG9qR3Rp21u1DKJSsXuGze+1WrS1ekdOzLOukV9cLg9Wy7e6RwlA47MFg0LdzFBozipHkpffGdvvEqEfRIqhrtFt9hb55AEbXsmqwtMRijSxzYOYsHAEsOXnxstjdfm8oB/FN/62+bdceIC+1v2ubg5bCsGE9GZwMj3MSwkDoyV0hRfvane5xV44iRdCoI55gjYVsfmnUs4xOjYWUWEaWbWWNra5EsciuzZtk9cndrDt939TTVDaLhSabs7V3b5a8WGHGzW6lfZtfPQE3wtfvFMKmeKgIh40wUMUCm+GhEh5ug/fqfq8p3lOEe40wnorFa4b3lPDeNnha99OmeKoIp40wVMVCm+GpEp5ug+d1P2+K54pw3gjDVSy8GZ4r4fk2eFL3k6Z4oggnjTBExUKa4YkSnmyBZ+hObPmynYJ4uxJWi1wdHXId4gTU9PxAkcsEJ1FNXp1AMt0NBFP+z2orQrOTA8B6tvMmWPfD9eak8t2k2cwZFPvVlXvFP0DiBMLQK7G1eSO2zUDsJL8VpMwLfEEq/joHWbXNCBb3RlHtiNOQKZ996sXV0aHZOjTNH1r7L75bH4weaV9qX2vPNFPraC+0l9qpdqFB7RftN+137Y/dX3f/3P1r9++V9eGD9ZwvtMq1+8//ia4zmg==</latexit>

⇡✓(a0|s0)<latexit sha1_base64="4j85n+ya1RERqomaZORIgNwngbY=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHzU/NmJOODo1vzmdm/fODTyS68X5rrY19bX6e2Tx/85YwLjAIUcYhBF16ZB+U0CGPchRumOE0eIAjgDHrqO+aR7k/ghjTkKYao/Fdokxjonegarj32GIMdLUQDIfJGgwylgAlPc0k41KkIhCFB0MJ77NFqV0dxbFRyIftwki7xfaWVi4jFApz5cVMgSEEQB4NPaYLQM3OogijFi86A6mFEKRsm5QAz6UdaDU9GYNzRrdnROTtf6dEmnKIzSJGY4LU8UAmIMTcTEvIwQj2mS34x47rPoOWcxOsjKfOz5ALDZGRofiJzKQBVnggng1SE3yJoTojtIggCE48ShaeJwtOCJc3CYCvGp7mJhdglg46rzLE0SJ+uZ6+pnwloRX5fE17I4LInD9Y8QPNYnhOlz8fwJi3Rh1IWF+RBF1dkXxeyJfiFHX5bES1m8KolXsujGJTWuqfOSOq+pdyX1TlYXJXEhi8uSuJTFdyXxnSy+3Rb747bYn6RY0X+xKJY7zhhNxGclf4WSGUyTl+evvk+TXn6t34UY6WbVCN174/Go3el1UlnG93pr1DWtQV0vDB2rb9oqQ+Hod2xjMNywHEneAtow2sN+W46CeKN3e/aorm9gjX5nYCkMG9qR3Rp21u1DKJSsXuGze+1WrS1ekdOzLOukV9cLg9Wy7e6RwlA47MFg0LdzFBozipHkpffGdvvEqEfRIqhrtFt9hb55AEbXsmqwtMRijSxzYOYsHAEsOXnxstjdfm8oB/FN/62+bdceIC+1v2ubg5bCsGE9GZwMj3MSwkDoyV0hRfvane5xV44iRdCoI55gjYVsfmnUs4xOjYWUWEaWbWWNra5EsciuzZtk9cndrDt939TTVDaLhSabs7V3b5a8WGHGzW6lfZtfPQE3wtfvFMKmeKgIh40wUMUCm+GhEh5ug/fqfq8p3lOEe40wnorFa4b3lPDeNnha99OmeKoIp40wVMVCm+GpEp5ug+d1P2+K54pw3gjDVSy8GZ4r4fk2eFL3k6Z4oggnjTBExUKa4YkSnmyBZ+hObPmynYJ4uxJWi1wdHXId4gTU9PxAkcsEJ1FNXp1AMt0NBFP+z2orQrOTA8B6tvMmWPfD9eak8t2k2cwZFPvVlXvFP0DiBMLQK7G1eSO2zUDsJL8VpMwLfEEq/joHWbXNCBb3RlHtiNOQKZ996sXV0aHZOjTNH1r7L75bH4weaV9qX2vPNFPraC+0l9qpdqFB7RftN+137Y/dX3f/3P1r9++V9eGD9ZwvtMq1+8//pNUznA==</latexit>

⇡✓(a1|s1)<latexit sha1_base64="y2/EBqPf66V+UZmCJQzpbqka44c=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHz06Gcn4oCj26Nvbvf2jUMjv/R6Ya6LfW19nd4+efyfMyYwDlDIIQZRdG0alN8kgHEfYpTuOHGEKIAz4KHrmE+6N4kf0pijEKb6U6FNYqxzomew+thnCHK8FAWAzBcJOpwCJjDFLe1UoyIUggBFB+O5T6NVGc29VcGB6MdNssj7lVYmJh4DdOrDRYUsAUEUAD6tDUbLwK0OohgjNg+qgxmlYJScC8SgH2U9OBWNeUOzZkfn5HStT5d0isIoTWKG0/JEISDG0ERMzMsI8Zgm+c2I5z6LnnMWo4OszMeeDwCbnaHxgcipDFRxJpgAXh1yg6w5IbqDJAhAOE4cmiYORwueOAeHqRCf6i4WZpcANq46z9IkcbKeua5+JqwV8XVJfC2Lw5I4XP8IwWN9Qpg+F8+fsEgXRl1YmA9RVJ19Ucye6Bdy9GVJvJTFq5J4JYtuXFLjmjovqfOaeldS72R1URIXsrgsiUtZfFcS38ni222xP26L/UmKFf0Xi2K544zRRHxW8lcomcE0eXn+6vs06eXX+l2IkW5WjdC9Nx6P2p1eJ5VlfK+3Rl3TGtT1wtCx+qatMhSOfsc2BsMNy5HkLaANoz3st+UoiDd6t2eP6voG1uh3BpbCsKEd2a1hZ90+hELJ6hU+u9du1driFTk9y7JOenW9MFgt2+4eKQyFwx4MBn07R6ExoxhJXnpvbLdPjHoULYK6RrvVV+ibB2B0LasGS0ss1sgyB2bOwhHAkpMXL4vd7feGchDf9N/q23btAfJS+7u2OWgpDBvWk8HJ8DgnIQyEntwVUrSv3eked+UoUgSNOuIJ1ljI5pdGPcvo1FhIiWVk2VbW2OpKFIvs2rxJVp/czbrT9009TWWzWGiyOVt792bJixVm3OxW2rf51RNwI3z9TiFsioeKcNgIA1UssBkeKuHhNniv7vea4j1FuNcI46lYvGZ4TwnvbYOndT9tiqeKcNoIQ1UstBmeKuHpNnhe9/OmeK4I540wXMXCm+G5Ep5vgyd1P2mKJ4pw0ghDVCykGZ4o4ckWeIbuxJYv2ymItythtcjV0SHXIU5ATc8PFLlMcBLV5NUJJNPdQDDl/6y2IjQ7OQCsZztvgnU/XG9OKt9Nms2cQbFfXblX/AMkTiAMvRJbmzdi2wzETvJbQcq8wBek4q9zkFXbjGBxbxTVjjgNmfLZp15cHR2arUPT/KG1/+K79cHokfal9rX2TDO1jvZCe6mdahca1H7RftN+1/7Y/XX3z92/dv9eWR8+WM/5Qqtcu//8D7/8M54=</latexit>

⇡✓(a2|s2)

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0

<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1

<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2~

~ ~

Environment

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2

<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

Agent <latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

Environment

Agent

<latexit sha1_base64="0ikfrFZg78atJ8tLBb9MSbBADcA=">AAANl3icfZfbbts2HMbV7tR59dZuNwN2IywoMAxBICWODxg61JJs5GJt0yCnNjYCiqYVwZRIUJQPFfQUu90ebG8zynZkiaKsK4Lfx08//UkKpEuxH3HD+O/J0y++/Orrb5592/juefP7H168/PE6IjGD6AoSTNitCyKE/RBdcZ9jdEsZAoGL0Y07szP9Zo5Y5JPwkq8oGgfAC/2pDwEXXR9HDC0Am9yf3L84MI6M9aNXG+a2caBtn/P7l88XowmBcYBCDjGIojvToHycAMZ9iFHaGMURogDOgIfuYj7tjhM/pDFHIUz1V0KbxljnRM+g9InPEOR4JRoAMl8k6PABMAC5QG+UoyIUggBFh5O5T6NNM5p7mwYH4rvHyXJdl7Q0MPEYoA8+XJbIEhBEAeAPlc5oFbjlThRjxOZBuTOjFIySc4kY9KOsBueiMO9pVurokpxv9YcVfUBhlCYxw2lxoBAQY2gqBq6bEeIxTdYfI+Z3Fr3mLEaHWXPd99oBbHaBJocip9RRxpliAni5yw2y4oRoAUkQgHCSjGiajDha8mR0eJQK8ZXuYmF2iVgdZedFmiSjrGauq18Ia0l8VxDfyeKgIA62LyF4ok8J0+di/gmLdGHUhYX5EEXl0Vf56Kl+JUdfF8RrWbwpiDey6MYFNa6o84I6r6iLgrqQ1WVBXMriqiCuZPFzQfwsi7f7Yj/ui/0kxYr6i02xaowmaCp+H+sllMxgmpxdvv0rTXrrZ7sWYqSbZSN0H40nw3an10llGT/qrWHXtJyqnhs6Vt+0VYbc0e/YhjPYsRxL3hzaMNqDfluOgnind3v2sKrvYI1+x7EUhh3t0G4NOtvyIRRKVi/32b12q1IWL8/pWZZ12qvqucFq2Xb3WGHIHbbjOH17jUJjRjGSvPTR2G6fGtUomgd1jXarr9B3E2B0LasCSwss1tAyHXPNwhHAkpPni8Xu9nsDOYjv6m/1bbsygbxQ/q5tOi2FYcd66pwOTtYkhIHQk6tC8vK1O92TrhxF8qBhR8xghYXs3jTsWUanwkIKLEPLtrLClnei2GR35jjZ/HJ3+04/MPU0lc1io8nmbO89miUvVphxvVtp3+dXD8C18NUvhbAuHirCYS0MVLHAeniohIf74L2q36uL9xThXi2Mp2Lx6uE9Jby3D55W/bQunirCaS0MVbHQeniqhKf74HnVz+viuSKc18JwFQuvh+dKeL4PnlT9pC6eKMJJLQxRsZB6eKKEJ3vgNxeC7KQgVlfC0s05gWbHeoD17FhMsO6H25ND6adGs1EzKA6TG/cm3EHiesDQW3HueC/OtEAc835PRoB5gR+m4rrgjQ6z1j4jWD4aRashriqmfDGpNm6Oj8zWkWl+aB28+WN7a3mm/aL9qv2mmVpHe6OdaefalQa1QPtb+0f7t/lz88/msHm2sT59sh3zk1Z6mh/+B1b9/uQ=</latexit>r3

<latexit sha1_base64="y2/EBqPf66V+UZmCJQzpbqka44c=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHz06Gcn4oCj26Nvbvf2jUMjv/R6Ya6LfW19nd4+efyfMyYwDlDIIQZRdG0alN8kgHEfYpTuOHGEKIAz4KHrmE+6N4kf0pijEKb6U6FNYqxzomew+thnCHK8FAWAzBcJOpwCJjDFLe1UoyIUggBFB+O5T6NVGc29VcGB6MdNssj7lVYmJh4DdOrDRYUsAUEUAD6tDUbLwK0OohgjNg+qgxmlYJScC8SgH2U9OBWNeUOzZkfn5HStT5d0isIoTWKG0/JEISDG0ERMzMsI8Zgm+c2I5z6LnnMWo4OszMeeDwCbnaHxgcipDFRxJpgAXh1yg6w5IbqDJAhAOE4cmiYORwueOAeHqRCf6i4WZpcANq46z9IkcbKeua5+JqwV8XVJfC2Lw5I4XP8IwWN9Qpg+F8+fsEgXRl1YmA9RVJ19Ucye6Bdy9GVJvJTFq5J4JYtuXFLjmjovqfOaeldS72R1URIXsrgsiUtZfFcS38ni222xP26L/UmKFf0Xi2K544zRRHxW8lcomcE0eXn+6vs06eXX+l2IkW5WjdC9Nx6P2p1eJ5VlfK+3Rl3TGtT1wtCx+qatMhSOfsc2BsMNy5HkLaANoz3st+UoiDd6t2eP6voG1uh3BpbCsKEd2a1hZ90+hELJ6hU+u9du1driFTk9y7JOenW9MFgt2+4eKQyFwx4MBn07R6ExoxhJXnpvbLdPjHoULYK6RrvVV+ibB2B0LasGS0ss1sgyB2bOwhHAkpMXL4vd7feGchDf9N/q23btAfJS+7u2OWgpDBvWk8HJ8DgnIQyEntwVUrSv3eked+UoUgSNOuIJ1ljI5pdGPcvo1FhIiWVk2VbW2OpKFIvs2rxJVp/czbrT9009TWWzWGiyOVt792bJixVm3OxW2rf51RNwI3z9TiFsioeKcNgIA1UssBkeKuHhNniv7vea4j1FuNcI46lYvGZ4TwnvbYOndT9tiqeKcNoIQ1UstBmeKuHpNnhe9/OmeK4I540wXMXCm+G5Ep5vgyd1P2mKJ4pw0ghDVCykGZ4o4ckWeIbuxJYv2ymItythtcjV0SHXIU5ATc8PFLlMcBLV5NUJJNPdQDDl/6y2IjQ7OQCsZztvgnU/XG9OKt9Nms2cQbFfXblX/AMkTiAMvRJbmzdi2wzETvJbQcq8wBek4q9zkFXbjGBxbxTVjjgNmfLZp15cHR2arUPT/KG1/+K79cHokfal9rX2TDO1jvZCe6mdahca1H7RftN+1/7Y/XX3z92/dv9eWR8+WM/5Qqtcu//8D7/8M54=</latexit>

⇡✓(a2|s2)

<latexit sha1_base64="yadDSl6u/Z0oEAShEoO9Io7EzIg=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+9P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+jyvHT</latexit>s3

MDPs can be illustrated using a variant of computation graphs. These represent the dependencies between how the different components of the MDP are generated step by step. This is done by following the lines in the graph. Dotted lines denote non-differentiable computation. A node with the tilde (~) operation is a sampling step.

First, we generate the initial state s_0 from the initial state distribution. This is not conditioned on any input, which is represented by the fact that there are no incoming arrows. Next, we pass the generated initial state through the policy network \pi_\theta to get a probability distribution over possible actions. From this distribution, we generate the first action a_0.

Following the arrows, we pass the initial state s_0 and the first action a_0 to the environment and generate the next state s_1 and the associated reward r_1 of this state.

The process repeats from this point: Generate an action from the policy, and use it together with the state to generate the next state and reward.

Page 5: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

• Full observability of state

• Partial observability: POMDP • Out of scope!

OBSERVABILITY

12

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

MDPs contain two important assumptions. These assumptions are used to develop efficient algorithms, but are often not a good model of the real world! The first one is full observability of states. We assume that we receive from the environment all there is to know about the current state of the environment. An example of an environment with full observability is chess: Both our agent and their opponent know everything there is to know about the current state of the game by just observing the game board. This is often a wrong assumption: Consider the game of poker, where our agent only knows their hand and the open cards, and not the hands of other players. Poker is an example of a partially observable MDP (POMDP), where our agent can only observe a small part of the environment, or its observations are noisy. POMDPs are much more complex to work with than MDPs, but there is a lot of literature out there! For this lecture, it’s out of scope, however.

Markov assumption: independent of history given :

• Used to derive strong algorithms! • Fundamental assumption behind RL

<latexit sha1_base64="hFxcxLt2af6z65PzziqJZ9qXfU0=">AAAN+XicfZdNb9s2HMbV7q3L6rXdjrsICwoMQxBIieOXQ4Fako0e1jYL8rbFQUDRtCyYEgmKcuwI+hK7bocdh133abZPM0p2ZImirEsYPg8f//SXKPzpUuxH3DD+ffL0k08/+/yLZ1/uffW89fWLl6++uYxIzCC6gAQTdu2CCGE/RBfc5xhdU4ZA4GJ05c7tTL9aIBb5JDznK4puA+CF/tSHgIup63HEAUd3/O7lvnFo5JdeH5ibwb62uU7vXj3/bzwhMA5QyCEGUXRjGpTfJoBxH2KU7o3jCFEA58BDNzGf9m4TP6QxRyFM9ddCm8ZY50TPmPSJzxDkeCUGADJfJOhwBhiAXJDvVaMiFIIARQeThU+j9TBaeOsBB+K2b5NlXpa0sjDxGKAzHy4rZAkIogDwWW0yWgVudRLFGLFFUJ3MKAWj5FwiBv0oq8GpKMxHmlU6OienG322ojMURmkSM5yWFwoBMYamYmE+jBCPaZLfjHi88+gNZzE6yIb53BsHsPkZmhyInMpEFWeKCeDVKTfIihOie0iCAISTZEzTZMzRkifjg8NUiK91FwuzSwCbVJ1naZKMs5q5rn4mrBXxQ0n8IIvDkjjc/AjBE31KmL4Qz5+wSBdGXViYD1FUXX1RrJ7qF3L0ZUm8lMWrkngli25cUuOauiipi5p6X1LvZXVZEpeyuCqJK1l8KIkPsni9K/aXXbG/SrGi/mJTrPbGEzQVX4/8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Nx6NOt99NZRk/6u1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76aOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd779EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqen6eyGWCk6gmu3yGOMh0NxBM+T/rVoRmJweA9azzJlj3w01zUvlu0mzlHIp+de1e8ztInEAYei9am4+ibQaik/xRkDIv8AWp+Ds+yEa7jGD5aBSjPXEaMuWzT31wdXRotg9N8+f2/tvO5mD0TPtO+177QTO1rvZWe6edahca1LD2m/a79kfrofVn66/W32vr0yebNd9qlav1z/9NHCok</latexit>st<latexit sha1_base64="+i+xB44Y0hcRWpnFl3QGzjOKgDk=">AAAN/XicfZdNb9s2HMbV7q3L6rXdjrsICwoMQxZIieOXQ4Fako0e1jYL8rbFQUDRtCyYEgmKcuwK2sfYdTvsOOy6z7J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHH338yaefPfl874unrS+fPX/x1WVEYgbRBSSYsGsXRAj7IbrgPsfomjIEAhejK3duZ/rVArHIJ+E5X1F0GwAv9Kc+BFxMjccRBxzdJfwHM717vm8cGvml1wfmZrCvba7TuxdP/xtPCIwDFHKIQRTdmAbltwlg3IcYpXvjOEIUwDnw0E3Mp73bxA9pzFEIU/2l0KYx1jnRMy594jMEOV6JAYDMFwk6nAEGIBf0e9WoCIUgQNHBZOHTaD2MFt56wIG49dtkmZcmrSxMPAbozIfLClkCgigAfFabjFaBW51EMUZsEVQnM0rBKDmXiEE/ympwKgrznmbVjs7J6UafregMhVGaxAyn5YVCQIyhqViYDyPEY5rkNyMe8Tx6xVmMDrJhPvfKAWx+hiYHIqcyUcWZYgJ4dcoNsuKE6B6SIADhJBnTNBlztOTJ+OAwFeJL3cXC7BLAJlXnWZok46xmrqufCWtFfFcS38nisCQONz9C8ESfEqYvxPMnLNKFURcW5kMUVVdfFKun+oUcfVkSL2XxqiReyaIbl9S4pi5K6qKm3pfUe1ldlsSlLK5K4koWP5TED7J4vSv2512xv0ixov5iU6z2xhM0FV+Q/BVK5jBN3py//TFN+vm1eRdipJtVI3QfjMejTrffTWUZP+jtUc+0nLpeGLrWwLRVhsIx6NqGM9yyHEneAtowOsNBR46CeKv3+vaorm9hjUHXsRSGLe3Ibg+7m/IhFEpWr/DZ/U67VhavyOlblnXSr+uFwWrbdu9IYSgctuM4AztHoTGjGEle+mDsdE6MehQtgnpGpz1Q6NsHYPQsqwZLSyzWyDIdM2fhCGDJyYuXxe4N+kM5iG/rbw1su/YAean8Pdt02grDlvXEORke5ySEgdCTq0KK8nW6veOeHEWKoFFXPMEaC9n+0qhvGd0aCymxjCzbygpb3Ylik92Yt8n6k7vdd/q+qaepbBYbTTZne+/BLHmxwoyb3Ur7Lr96AW6Er98phE3xUBEOG2GgigU2w0MlPNwF79X9XlO8pwj3GmE8FYvXDO8p4b1d8LTup03xVBFOG2GoioU2w1MlPN0Fz+t+3hTPFeG8EYarWHgzPFfC813wpO4nTfFEEU4aYYiKhTTDEyU82QHP0L1o+bJOQbxdCatFip5cdLO5DnECanp+pshlgpOoJrt8hjjIdDcQTPk/61aEZicHgPWs8yZY98NNc1L5btJs5RyKfnXtXvM7SJxAGHorWpv3om0GopP8XpAyL/AFqfg7PshGu4xg+WAUoz1xGjLls099cHV0aLYPTfOn9v7rzuZg9ET7RvtW+04zta72WnujnWoXGtSo9pv2u/ZH69fWn62/Wn+vrY8fbdZ8rVWu1j//Az0sK6I=</latexit>st-1

<latexit sha1_base64="09xIn/l5oG3MK3LZwLHHoihf5iw=">AAAOUXichZdNc6M2HMbZ9G2b3aTZ9tgL08zObDuuBxLHL4fMrAF79tDdTTN5a+M0I2QZMxZII4RjL+Xb9RP01n6EXttDjxXYwSDA5WKh59HjH38kRrIpdgOuaX882fno408+/ezp57vPnu/tf3Hw4surgIQMoktIMGE3NggQdn10yV2O0Q1lCHg2Rtf2zEz06zligUv8C76k6M4Dju9OXAi46Lo/+IW+GgUccHTPfx0BmHZG/Hs9bqjrfr2hNpvN7DYVvz3932Er3/3BodbU0kstN/R141BZX2f3L57/ORoTGHrI5xCDILjVNcrvIsC4CzGKd0dhgCiAM+Cg25BPuneR69OQIx/G6kuhTUKscqImz6qOXYYgx0vRAJC5IkGFU8AEr6jIbjEqQD7wUNAYz10arJrB3Fk1OBDlvIsWabnjwsDIYYBOXbgokEXACzzAp6XOYOnZxU4UYsTmXrEzoRSMknOBGHSDpAZnojDvaVL14IKcrfXpkk6RH8RRyHCcHygExBiaiIFpM0A8pFH6MGLazIJTzkLUSJpp36kF2OwcjRsip9BRxJlgAnixy/aS4vjoARLPA/44GtE4GnG04NGo0YyF+FK1sTDbBLBx0XkeR9EoqZltq+fCWhDf5cR3sjjIiYP1nxA8VieEqXPx/gkLVGFUhYW5EAXF0ZfZ6Il6KUdf5cQrWbzOideyaIc5NSyp85w6L6kPOfVBVhc5cSGLy5y4lMUPOfGDLN5si/1pW+zPUqyov1gUy93RGE3EVymdQtEMxtGbi7c/xFEvvdZzIUSqXjRC+9F4PGx3ep1YlvGj3hp2dcMq65mhY/R1s8qQOfodU7MGG5YjyZtBa1p70G/LURBv9G7PHJb1DazW71hGhWFDOzRbg866fAj5ktXJfGav3SqVxclyeoZhnPTKemYwWqbZPaowZA7Tsqy+maLQkFGMJC99NLbbJ1o5imZBXa3d6lfomxegdQ2jBEtzLMbQ0C09ZeEIYMnJs8lidvu9gRzEN/U3+qZZeoE8V/6uqVutCsOG9cQ6GRynJIQB35GrQrLytTvd464cRbKgYUe8wRIL2fzTsGdonRILybEMDdNICltciWKR3ep30eqTu1l36qGuxrFsFgtNNidr79EseXGFGde7K+3b/NUDcC18+UkhrIuHFeGwFgZWscB6eFgJD7fBO2W/UxfvVIQ7tTBOFYtTD+9Uwjvb4GnZT+viaUU4rYWhVSy0Hp5WwtNt8Lzs53XxvCKc18LwKhZeD88r4fk2eFL2k7p4UhFOamFIFQuphyeV8GQLPEMPYsuX7BTE7IpYKXJ1hkh1iCNQ0tNDRSoTHAUl2eZTxEGi255gSm9WWxGanBwAVpOdN8Gq6683J4XvJk1GzqDYr67cK34LiRMIQ2/F1ua92DYDsZP8TpAyx3MFqfgdNZLWNiNYPBpFa1echnT57FNuXB819VZT139sHb5urw9GT5WvlW+UV4qudJTXyhvlTLlUoPKb8pfyt/LP3u97/+4r+zsr686T9ZivlMK1/+w/Dy1I1A==</latexit>

p(st|at-1, s1, ..., st-1) = p(st|at-1, st-1)

MARKOV ASSUMPTION

13

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

The second key assumption, which is maybe even more important than the previous one, is the Markov assumption. It says that the distribution over the next state s_t is independent of the history, if we have complete information of the current state s_{t-1}. Or, in other words, we can reliably reconstruct the next state using just the current state and the chosen action. This condition is very easy to violate: For example, a static image doesn’t contain the velocity of objects on the pixel! This requires multiple images, or a different feature space. Often, we can expand the features of the state, for example by taking the last few observations as part of the state, to solve such issues.

The Markov assumption is what distinguishes RL from black-box optimization algorithms, like evolutionary algorithms: These do not assume anything about the environment and thus their algorithms don't make use of this structure.

We introduced RL with MDPs The goal is to maximize reward! What does this mean in Deep RL?

REWARDS

14

So far, we have introduced the structure of RL using MDPs. In RL, we want to maximize reward. What exactly does this mean? And particularly, what does this mean in the context of Deep RL?

Page 6: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Total reward: Total discounted reward :

→Prefer rewards soon

Goal: maximize total discounted reward. • Rewards and states are stochastic! • So, maximize expected return

<latexit sha1_base64="2tv9YTwPlyMUmsvhwJ09u2oLHo8=">AAAOInicfZdNb9s2HMbV7q3L6i3djrsICwoMa5BJieOXQ4Baso0e1jYL8tItDgKKpmXBlEiQlGNX0MfYJ9jH2HU77DjstGEfZpRsyxIlWRf/zefh45//EgXSodjjwjD+efT4gw8/+viTJ5/uffa08fkX+8++vOYkZBBdQYIJe+cAjrAXoCvhCYzeUYaA72B048zsRL+ZI8Y9ElyKJUV3PnADb+JBIOTQ/f73I+ji6CI+GzH0ANj43nyxqY71F/poTARPPtdjl/f7B8aRkV56uTDXxYG2vs7vnz39V4bA0EeBgBhwfmsaVNxFgAkPYhTvjUKOKIAz4KLbUEw6d5EX0FCgAMb6c6lNQqwLoifw+thjCAq8lAWAzJMJOpwCBqCQf3GvGMVRAHzED8dzj/JVyefuqhBA9ucuWqT9iwsTI5cBOvXgokAWAZ/7QExLg3zpO8VBFGLE5n5xMKGUjIpzgRj0eNKDc9mYtzS5JfySnK/16ZJOUcDjKGQ4zk+UAmIMTeTEtORIhDRK/4x8Dmb8TLAQHSZlOnbWB2x2gcaHMqcwUMSZYAJEccjxk+YE6AES3wfBOBrROBoJtBDR6PAoluJz3cHS7BD5dBSdF3EUjZKeOY5+Ia0F8U1OfKOKg5w4WP8IwWN9Qpg+l/efMK5Loy4tzIOIF2dfZbMn+pUafZ0Tr1XxJifeqKIT5tSwpM5z6rykPuTUB1Vd5MSFKi5z4lIV3+fE96r4blfsT7tif1ZiZf/loljujcZoIl8z6SMUzWAcvbp8/UMcddNr/SyESDeLRuhsjCfDVrvbjlUZb/TmsGNa/bKeGdpWz7SrDJmj17aN/mDLcqx4M2jDaA16LTUK4q3e6drDsr6FNXrtvlVh2NIO7eagvW4fQoFidTOf3W01S21xs5yuZVmn3bKeGaymbXeOKwyZw+73+z07RaEhoxgpXroxtlqnRjmKZkEdo9XsVejbG2B0LKsES3Ms1tAy+2bKIhDAilNkD4vd6XUHapDY9t/q2XbpBopc+zu22W9WGLasp/3TwUlKQhgIXLUrJGtfq9056ahRJAsatuUdLLGQ7S8Nu5bRLrGQHMvQsq2kscWVKBfZrXkXrV6523WnH5h6HKtmudBUc7L2NmbFiyvMuN5dad/lr56Aa+HL/xTCunhYEQ5rYWAVC6yHh5XwcBe8W/a7dfFuRbhbC+NWsbj18G4lvLsLnpb9tC6eVoTTWhhaxULr4WklPN0FL8p+URcvKsJFLYyoYhH18KISXuyCJ2U/qYsnFeGkFoZUsZB6eFIJT3bArw4EyU4hOUCwUqTck8vdbKpDHIGSzgUQKJUJjnhJdsQUCZDoji+Z0i+rrQhNTg4A68nOm2DdC9abk8J7kyYzZ1DuV1fuFX8fyRMIQ6/l1uat3DYDuZP8TpIy1/ckqfwcHSbVLiNYbIyy2pOnIVM9+5SLm+Mjs3lkmj82D1621gejJ9rX2jfat5qptbWX2ivtXLvSoPaL9pv2u/ZH49fGn42/Gn+vrI8fred8pRWuxn//AxS9OTI=</latexit>

R = r1 + r2 + · · ·+ rT<latexit sha1_base64="HjJc7PKHsJyg8rTub5MneKi4BH0=">AAAOLnicfZfLbuM2GIWV6W2a1m2m7a4bocEARREEUuL4shhgLNnGLDozaZBbGwcBRdOyYEpkScqxI+hdum03fZouuii67WOUkm1ZoiRrY4rn8PjjL1EgHYo9Lgzj771nH3z40cefPP90/7PPG198efDiq2tOQgbRFSSYsFsHcIS9AF0JT2B0SxkCvoPRjTOzE/1mjhj3SHAplhTd+8ANvIkHgZBdDwffGCOMfh2NPQ5JGIjkRjcfDg6NYyO99HLDXDcOtfV1/vCisTcaExj6KBAQA87vTIOK+wgw4UGM4v1RyBEFcAZcdBeKSec+8gIaChTAWH8ptUmIdUH0hFAfewxBgZeyASDzZIIOp4ABKOQ89otRHAXAR/xoPPcoXzX53F01BJBFuI8WaZHiwsDIZYBOPbgokEXA5z4Q01InX/pOsROFGLG5X+xMKCWj4lwgBj2e1OBcFuY9TerOL8n5Wp8u6RQFPI5ChuP8QCkgxtBEDkybHImQRulk5MOe8VeChegoaaZ9r/qAzS7Q+EjmFDqKOBNMgCh2OX5SnAA9QuL7IBhHIxpHI4EWIhodHcdSfKk7WJodAti46LyIo2iU1Mxx9AtpLYjvcuI7VRzkxMH6Twge6xPC9Ll8/oRxXRp1aWEeRLw4+iobPdGv1OjrnHitijc58UYVnTCnhiV1nlPnJfUxpz6q6iInLlRxmROXqviUE59U8XZX7M+7Yn9RYmX95aJY7o/GaCK/JekrFM1gHL25fPtjHHXTa/0uhEg3i0bobIynw1a7245VGW/05rBjWv2ynhnaVs+0qwyZo9e2jf5gy3KieDNow2gNei01CuKt3unaw7K+hTV67b5VYdjSDu3moL0uH0KBYnUzn91tNUtlcbOcrmVZZ92ynhmspm13TioMmcPu9/s9O0WhIaMYKV66MbZaZ0Y5imZBHaPV7FXo2wdgdCyrBEtzLNbQMvtmyiIQwIpTZC+L3el1B2qQ2Nbf6tl26QGKXPk7ttlvVhi2rGf9s8FpSkIYCFy1KiQrX6vdOe2oUSQLGrblEyyxkO0/DbuW0S6xkBzL0LKtpLDFlSgX2Z15H60+udt1px+aehyrZrnQVHOy9jZmxYsrzLjeXWnf5a8egGvhyzOFsC4eVoTDWhhYxQLr4WElPNwF75b9bl28WxHu1sK4VSxuPbxbCe/ugqdlP62LpxXhtBaGVrHQenhaCU93wYuyX9TFi4pwUQsjqlhEPbyohBe74EnZT+riSUU4qYUhVSykHp5UwpMd8Aw9yi1fslOQb1fESpFyTy53s6kOcQRKOhdAoFQmOOIl2RFTJECiO75kSm9Uz+ZkkqZQHI1cIJV4tWOhyQEDYD3ZoBOse8F6D1P4vNJk6AzKbe3KvZpmH8mDCkNv5Q7ovdxdA7nh/EFOiLm+Jyckf0dHSWuXESw2Rtnal4cmUz0ilRs3J8dm89g0f2oevm6tz0/PtW+177TvNVNra6+1N9q5dqVB7Un7Tftd+6PxZ+Ovxj+Nf1fWZ3vrMV9rhavx3/+4aD8Y</latexit>

0 6 � 6 1<latexit sha1_base64="mFWMPbZtkcP/WzEwCBfWOSfltVc=">AAAOgnicfZddb9s2GIWVdh9dtnrpdrkbYUGBbc0CKXH8ASxALdlGL9Y2C5ImW5wFFE3LgimRICnHrqC/uPv9j91uGCXbskRJ1k2Y9xweP3olCqRDsceFYfy99+TpJ59+9vmzL/a//Op54+uDF9984CRkEF1Dggm7dQBH2AvQtfAERreUIeA7GN04MzvRb+aIcY8EV2JJ0b0P3MCbeBAIWXo4mI6gi6PL+GE09jgkYSDORww9AjZ+MF9ltU3pRH+lZ8U/Tzbl07RMBC/I0dXPZryxXD0cHBrHRnrp5YG5Hhxq6+vi4cXzPZkJQx8FAmLA+Z1pUHEfASY8iFG8Pwo5ogDOgIvuQjHp3EdeQEOBAhjrL6U2CbEuiJ7ctT72GIICL+UAQObJBB1OAQNQyN7sF6M4CoCP+NF47lG+GvK5uxoIIBt7Hy3SxseFiZHLAJ16cFEgi4DPfSCmpSJf+k6xiEKM2NwvFhNKyag4F4hBjyc9uJCNeU+TZ8mvyMVany7pFAU8jkKG4/xEKSDG0EROTIcciZBG6c3IF2jGzwUL0VEyTGvnfcBml2h8JHMKhSLOBBMgiiXHT5oToEdIfB8E42hE42gk0EJEo6PjWIovdQdLs0Pk21F0XsZRNEp65jj6pbQWxHc58Z0qDnLiYP0jBI/1CWH6XD5/wrgujbq0MA8iXpx9nc2e6Ndq9Iec+EEVb3LijSo6YU4NS+o8p85L6mNOfVTVRU5cqOIyJy5V8WNO/KiKt7tif98V+4cSK/svF8VyfzRGE/l9Sl+haAbj6M3V21/jqJte63chRLpZNEJnYzwdttrddqzKeKM3hx3T6pf1zNC2eqZdZcgcvbZt9AdblhPFm0EbRmvQa6lREG/1TtcelvUtrNFr960Kw5Z2aDcH7XX7EAoUq5v57G6rWWqLm+V0Lcs665b1zGA1bbtzUmHIHHa/3+/ZKQoNGcVI8dKNsdU6M8pRNAvqGK1mr0LfPgCjY1klWJpjsYaW2TdTFoEAVpwie1nsTq87UIPEtv9Wz7ZLD1Dk2t+xzX6zwrBlPeufDU5TEsJA4KpdIVn7Wu3OaUeNIlnQsC2fYImFbH9p2LWMdomF5FiGlm0ljS2uRLnI7sz7aPXJ3a47/dDU41g1y4WmmpO1tzErXlxhxvXuSvsuf/UEXAtfvlMI6+JhRTishYFVLLAeHlbCw13wbtnv1sW7FeFuLYxbxeLWw7uV8O4ueFr207p4WhFOa2FoFQuth6eV8HQXvCj7RV28qAgXtTCiikXUw4tKeLELnpT9pC6eVISTWhhSxULq4UklPNkBvzoQJDuF5OTBSpFyTy53s6kOcQRKOhdAoFQmOOIl2RFTJECiO75kSv9RPZsjSppCcTRygVTi1Y6FJgcMgPVkg06w7gXrPUzh80qTqTMot7Ur9+o2+0geVBh6K3dA7+XuGsgN50/yhpjre/KG5N/RUTLaZQSLjVGO9uWhyVSPSOXBzcmx2Tw2zd+ah69b6/PTM+077XvtB83U2tpr7Y12oV1rUPtL+0f7V/uv8bTxY8NsnK6sT/bWc77VClfjl/8B2N5ccw==</latexit>

R� = r1 + �r2 + �2r3 + · · ·+ �T-1rT

<latexit sha1_base64="TiVJ9OGn/h7kAOyG3yMK5v3AT8g=">AAAOfHicfZdvb6M2HMe5279bt3a97eGesFUn3bZeBW2aPw8qXSCJTtPurqvapluJKuM4BMVgy5g0Ocb72avZ020vZtIMSQgYCE/i+Pv1l49/YGTbFLsB17R/nzz96ONPPv3s2ed7X3y5f/DV4fOvbwMSMohuIMGE3dkgQNj10Q13OUZ3lCHg2RgN7ZmZ6MM5YoFL/Gu+pGjkAcd3Jy4EXHQ9HBoWdHD0c/zSsvkUcfDDheUBPrXtqB8/RDTpBqH6x0aN71P/Vfxgjd0AktDno4fDI+1ESy+13NDXjSNlfV0+PN//zhoTGHrI5xCDILjXNcpHEWDchRjFe1YYIArgDDjoPuST9ihyfRpy5MNYfSG0SYhVTtRkPurYZQhyvBQNAJkrElQ4BQxALma9V4wKkA88FByP5y4NVs1g7qwaHIiSjaJFWtK4MDByGKBTFy4KZBHwgqRUpc5g6dnFThRixOZesTOhFIySc4EYdIOkBpeiMO9p8pSCa3K51qdLOkV+EEchw3F+oBAQY2giBqbNAPGQRulkxKsxCy44C9Fx0kz7LnqAza7Q+FjkFDqKOBNMAC922V5SHB89QuJ5wB9HFo0ji6MFj6zjk1iIL1QbC7NNABsXnVdxFK1fL/VKWAviu5z4Thb7ObG/vgnBY3VCmDoXz5+wQBVGVViYC1FQHH2TjZ6oN3L0bU68lcVhThzKoh3m1LCkznPqvKQ+5tRHWV3kxIUsLnPiUhY/5MQPsni3K/a3XbG/S7Gi/mJRLPesMZqIL0/6CkUzGEdvrt/+Eked9Fq/CyFS9aIR2hvj2aDZ6rRiWcYbvTFo60avrGeGltHVzSpD5ui2TK3X37KcSt4MWtOa/W5TjoJ4q7c75qCsb2G1bqtnVBi2tAOz0W+ty4eQL1mdzGd2mo1SWZwsp2MYxnmnrGcGo2Ga7dMKQ+Ywe71e10xRaMgoRpKXbozN5rlWjqJZUFtrNroV+vYBaG3DKMHSHIsxMPSenrJwBLDk5NnLYra7nb4cxLf1N7qmWXqAPFf+tqn3GhWGLet577x/lpIQBnxHrgrJytdstc/achTJggYt8QRLLGR7p0HH0FolFpJjGRimkRS2uBLFIrvXR9Hqk7tdd+qRrsaxbBYLTTYna29jlry4wozr3ZX2Xf7qAbgWvjxTCOviYUU4rIWBVSywHh5WwsNd8E7Z79TFOxXhTi2MU8Xi1MM7lfDOLnha9tO6eFoRTmthaBULrYenlfB0Fzwv+3ldPK8I57UwvIqF18PzSni+C56U/aQunlSEk1oYUsVC6uFJJTzZAc/Qo9jyJTuF5IzASpFiTy52s6kOcQRKesABR6lMcBSU5NURJNFtTzClf8oeEGYO0ZT1zaElvQvFkeUAocSrHQ1NDiAAq8kGnmDV9dd7nMLnlyZDZ1Bse1fuVRl6SBxkGHordkjvxe4biA3pj2LCzPFcMWHxax0nrV1GsNgYRWtPHKp0+QhVbgxPT/TGia7/2jh63Vyfr54p3yrfKy8VXWkpr5U3yqVyo0DlT+Uv5W/ln/3/Do4Ofjp4tbI+fbIe841SuA6a/wOkZ1zz</latexit>

J(✓) = Ep(⌧|✓)[R�]<latexit sha1_base64="bXnJqj0zFD7qRndUfNZuXhjS+jM=">AAAORnicfZdNb6NGHMbJ9m2b1ttse+wFNVppu4oiSBy/HCKtwbZWVXc3jfLWhjQaxmOMPDCjYXDsRXyJfppe20s/Qz9Ej1WvHTDBMIA5jed55vGPPwz6j02xG3BN+3vnyUcff/LpZ08/3/3iy9azr/aef30VkJBBdAkJJuzGBgHCro8uucsxuqEMAc/G6Nqem4l+vUAscIl/wVcU3XnA8d2pCwEXU/d7B5bNZ4iDX1+dRhZgjuWBZXyfTVrQwdEP8cvs5/f3e/vaoZZeanWgZ4N9JbvO7p+3dqwJgaGHfA4xCIJbXaP8LgKMuxCjeNcKA0QBnAMH3YZ82ruLXJ+GHPkwVl8IbRpilRM1IVcnLkOQ45UYAMhckaDCGWAAcnF/u+WoAPnAQ8HBZOHSYD0MFs56wIEozl20TIsXlxZGDgN05sJliSwCXuABPqtMBivPLk+iECO28MqTCaVglJxLxKAbJDU4E4V5T5PnEVyQs0yfregM+UEchQzHxYVCQIyhqViYDgPEQxqlNyNegnlwylmIDpJhOnc6BGx+jiYHIqc0UcaZYgJ4ecr2kuL46AESzwP+JLJoHFkcLXlkHRzGQnyh2liYbQLYpOw8j6PISmpm2+q5sJbEdwXxnSyOCuIo+xOCJ+qUMHUhnj9hgSqMqrAwF6KgvPoyXz1VL+Xoq4J4JYvXBfFaFu2woIYVdVFQFxX1oaA+yOqyIC5lcVUQV7L4oSB+kMWbbbE/b4v9RYoV9RebYrVrTdBUfGPSVyiawzh6c/H2xzjqp1f2LoRI1ctGaD8aj8edbr8byzJ+1Nvjnm4Mq3pu6BoD3awz5I5B19SGow3LkeTNoTWtMxp05CiIN3qvb46r+gZWG3SHRo1hQzs226NuVj6EfMnq5D6z32lXyuLkOX3DME76VT03GG3T7B3VGHKHORwOB2aKQkNGMZK89NHY6Zxo1SiaB/W0TntQo28egNYzjAosLbAYY0Mf6ikLRwBLTp6/LGZv0B/JQXxTf2NgmpUHyAvl75n6sF1j2LCeDE9GxykJYcB35KqQvHydbu+4J0eRPGjcFU+wwkI2/zTuG1q3wkIKLGPDNJLClnei2GS3+l20/uRu9p26r6txLJvFRpPNyd57NEteXGPGze5a+zZ//QLcCF+9Uwib4mFNOGyEgXUssBke1sLDbfBO1e80xTs14U4jjFPH4jTDO7XwzjZ4WvXTpnhaE04bYWgdC22Gp7XwdBs8r/p5UzyvCeeNMLyOhTfD81p4vg2eVP2kKZ7UhJNGGFLHQprhSS082QLP0INo+ZJOITkhsEqk6MlFN5vqEEegogcccJTKBEdBRV4fNxLd9gRT+kP2TNwAktDnaQrFkeUAocTrjoUmBwyA1aRBJ1h1/ayHKX1eabJ0DkVbu3avb3OIxEGFobeiA3ovumsgGs5X6cnIc8UNJSekg2S0zShOUJlRjHbFoUmXj0jVwfXRod4+1PWf2vuvO9n56anyrfKd8lLRla7yWnmjnCmXClR+U35X/lD+bP3V+qf1b+u/tfXJTrbmG6V0PVP+Bw2XSHw=</latexit>

✓⇤ = argmax✓J(✓)

EXPECTED RETURN

15

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

The total reward is simply the sum of all rewards the agent receives during one trajectory. Note that a sum of random variables is also a random variable, so the total reward is a random variable!

We will often deal with discounted rewards. These encode that we prefer our rewards sooner than later, and is encoded using the discount factor \gamma. Usually, this is a number very close to 1, like 0.999. Using the discount factor, rewards decay exponentially, such that rewards very far in the future are weighted infinitely low. This might seem like an unnatural way to develop reward functions. Unfortunately, some algorithms require a discount factor smaller than 1!

To maximize rewards, we have to note that again that total (discounted) reward is a random variable: So instead, we want to maximize the expected total (discounted) return given a policy \pi_\theta!

We will explain how exactly this expectation works and how to compute it. To maximize it, we should find the policy for which the expected return is maximized. As we assume our policies are parameterized neural networks, we can turn this into optimizing the parameters of our policy network instead!

Expectation over trajectories by following policy Requires summing over all trajectories!

→Monte Carlo (sampling) estimation

<latexit sha1_base64="TiVJ9OGn/h7kAOyG3yMK5v3AT8g=">AAAOfHicfZdvb6M2HMe5279bt3a97eGesFUn3bZeBW2aPw8qXSCJTtPurqvapluJKuM4BMVgy5g0Ocb72avZ020vZtIMSQgYCE/i+Pv1l49/YGTbFLsB17R/nzz96ONPPv3s2ed7X3y5f/DV4fOvbwMSMohuIMGE3dkgQNj10Q13OUZ3lCHg2RgN7ZmZ6MM5YoFL/Gu+pGjkAcd3Jy4EXHQ9HBoWdHD0c/zSsvkUcfDDheUBPrXtqB8/RDTpBqH6x0aN71P/Vfxgjd0AktDno4fDI+1ESy+13NDXjSNlfV0+PN//zhoTGHrI5xCDILjXNcpHEWDchRjFe1YYIArgDDjoPuST9ihyfRpy5MNYfSG0SYhVTtRkPurYZQhyvBQNAJkrElQ4BQxALma9V4wKkA88FByP5y4NVs1g7qwaHIiSjaJFWtK4MDByGKBTFy4KZBHwgqRUpc5g6dnFThRixOZesTOhFIySc4EYdIOkBpeiMO9p8pSCa3K51qdLOkV+EEchw3F+oBAQY2giBqbNAPGQRulkxKsxCy44C9Fx0kz7LnqAza7Q+FjkFDqKOBNMAC922V5SHB89QuJ5wB9HFo0ji6MFj6zjk1iIL1QbC7NNABsXnVdxFK1fL/VKWAviu5z4Thb7ObG/vgnBY3VCmDoXz5+wQBVGVViYC1FQHH2TjZ6oN3L0bU68lcVhThzKoh3m1LCkznPqvKQ+5tRHWV3kxIUsLnPiUhY/5MQPsni3K/a3XbG/S7Gi/mJRLPesMZqIL0/6CkUzGEdvrt/+Eked9Fq/CyFS9aIR2hvj2aDZ6rRiWcYbvTFo60avrGeGltHVzSpD5ui2TK3X37KcSt4MWtOa/W5TjoJ4q7c75qCsb2G1bqtnVBi2tAOz0W+ty4eQL1mdzGd2mo1SWZwsp2MYxnmnrGcGo2Ga7dMKQ+Ywe71e10xRaMgoRpKXbozN5rlWjqJZUFtrNroV+vYBaG3DKMHSHIsxMPSenrJwBLDk5NnLYra7nb4cxLf1N7qmWXqAPFf+tqn3GhWGLet577x/lpIQBnxHrgrJytdstc/achTJggYt8QRLLGR7p0HH0FolFpJjGRimkRS2uBLFIrvXR9Hqk7tdd+qRrsaxbBYLTTYna29jlry4wozr3ZX2Xf7qAbgWvjxTCOviYUU4rIWBVSywHh5WwsNd8E7Z79TFOxXhTi2MU8Xi1MM7lfDOLnha9tO6eFoRTmthaBULrYenlfB0Fzwv+3ldPK8I57UwvIqF18PzSni+C56U/aQunlSEk1oYUsVC6uFJJTzZAc/Qo9jyJTuF5IzASpFiTy52s6kOcQRKesABR6lMcBSU5NURJNFtTzClf8oeEGYO0ZT1zaElvQvFkeUAocSrHQ1NDiAAq8kGnmDV9dd7nMLnlyZDZ1Bse1fuVRl6SBxkGHordkjvxe4biA3pj2LCzPFcMWHxax0nrV1GsNgYRWtPHKp0+QhVbgxPT/TGia7/2jh63Vyfr54p3yrfKy8VXWkpr5U3yqVyo0DlT+Uv5W/ln/3/Do4Ofjp4tbI+fbIe841SuA6a/wOkZ1zz</latexit>

J(✓) = Ep(⌧|✓)[R�]<latexit sha1_base64="OFVdZgNlJf9zFZcA/y+6jnOW0PU=">AAAOI3icfZdNb9s2HMaV7q3L5q3djrsICwoMQxBIieOXQ4Faso0e1jYL8tItDgKKpmUhlEiQlGNX0MfYdbvs0+w47LLDvsso2ZElirIuYfg8fPzjn6JAehQHXFjWv3tPPvr4k08/e/r5/hdftr76+tnzb644iRlEl5Bgwt57gCMcROhSBAKj95QhEHoYXXv3bqZfLxDjAYkuxIqi2xD4UTALIBCy62ZCg7uJJ+ZIgLtnB9aRlT9mvWFvGgfG5jm7e97am0wJjEMUCYgB5ze2RcVtApgIIEbp/iTmiAJ4D3x0E4tZ7zYJIhoLFMHUfCG1WYxNQcwMy5wGDEGBV7IBIAtkggnngAEoJPx+NYqjCISIH04XAeXrJl/464YAcua3yTKvTFoZmPgM0HkAlxWyBIQ8BGJe6+Sr0Kt2ohgjtgirnRmlZFScS8RgwLManMnCvKNZsfkFOdvo8xWdo4inScxwWh4oBcQYmsmBeZMjEdMkn4xc4Xv+UrAYHWbNvO/lELD7czQ9lDmVjirODBMgql1emBUnQg+QhCGIpsmEpslEoKVIJodHqRRfmB6WZo8ANq06z9MkmWQ18zzzXFor4tuS+FYVRyVxtPkRgqfmjDBzIdefMG5KoyktLICIV0dfFqNn5qUafVUSr1TxuiReq6IXl9S4pi5K6qKmPpTUB1VdlsSlKq5K4koVP5TED6r4flfsL7tif1ViZf3lpljtT6ZoJj8g+SuU3MM0eX3x5qc06efP5l2IkWlXjdB7NJ6MO91+N1Vl/Ki3xz3bGdb1wtB1BrarMxSOQde1hqMty7HiLaAtqzMadNQoiLd6r++O6/oW1hp0h47GsKUdu+1Rd1M+hCLF6hc+t99p18riFzl9x3FO+3W9MDht1+0dawyFwx0OhwM3R6ExoxgpXvpo7HROrXoULYJ6Vqc90OjbBbB6jlODpSUWZ+zYQztnEQhgxSmKl8XtDfojNUhs6+8MXLe2gKJU/p5rD9saw5b1dHg6OslJCAORr1aFFOXrdHsnPTWKFEHjrlzBGgvZ/tK471jdGgspsYwd18kKW92JcpPd2LfJ+pO73XfmgW2mqWqWG001Z3vv0ax4scaMm91a+y6/fgBuhK/PFMKmeKgJh40wUMcCm+GhFh7ugvfrfr8p3teE+40wvo7Fb4b3tfD+Lnha99OmeKoJp40wVMdCm+GpFp7ughd1v2iKF5pw0QgjdCyiGV5o4cUueFL3k6Z4ogknjTBEx0Ka4YkWnuyAZ+hBHvmyk4J8uxJWi5RncnmazXWIE1DTuQAC5TLBCa/J69tGpnuhZMr/UT3TgEMSRyJPoTiZ+EAq6frEQrMLBsBmdkAn2AyizRmm8nml2dB7KI+1a/d6mkMkLyoMvZEnoHfydA3kgfNHOSHmh4GckPw7Ocxau4xg+WiUrX15abLVK1K9cX18ZLePbPvn9sGrzub+9NT4zvje+MGwja7xynhtnBmXBjSI8Zvxu/FH68/WX62/W/+srU/2NmO+NSpP67//AVItO3M=</latexit>

⇡✓

EXPECTED RETURN

16

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

So what exactly is this expectation over returns? It is an expectation over trajectories that result from the agent following the policy \pi_\btheta. Remember that, given a policy, the probability of a trajectory is defined using the MDP formalism. We will use this to compute our expectation!

Unfortunately, computing an expectation requires summing (or integrating) over all possible trajectories that could be sampled. There are a huge amount of possible trajectories, however! So, clearly, we cannot hope to precisely compute the expected return. Therefore, we will do what we usually do when we see an expectation: Sample! This is called Monte Carlo estimation, which sounds fancy but just means estimating an expectation using sampling.

Input: Parameters , discount

while is not terminal:

return

<latexit sha1_base64="7SymiPph3MfAvvzD9Aij6Ac3EKc=">AAAOH3icfZdNb9s2HMaV7q3L5q3djrsICwoMQxBIieOXQ4Fako0e1jYLkjhbHBQUTcuCKZGgKMeuoA+x63bZp9lx2LXfZpTsyBJFWZcwfB4+/vFPUSBdiv2IG8bHgyeffPrZ5188/fLwq69b33z77Pl3NxGJGUTXkGDCbl0QIeyH6Jr7HKNbyhAIXIzG7sLO9PESscgn4RVfU3QfAC/0Zz4EXHSNJy6fIw7ePzsyToz80esNc9s40rbPxfvnrYPJlMA4QCGHGETRnWlQfp8Axn2IUXo4iSNEAVwAD93FfNa7T/yQxhyFMNVfCG0WY50TPUPSpz5DkOO1aADIfJGgwzlgAHIBfliNilAIAhQdT5c+jTbNaOltGhyIWd8nq7wqaWVg4jFA5z5cVcgSEEQB4PNaZ7QO3GonijFiy6DamVEKRsm5Qgz6UVaDC1GYdzQrdHRFLrb6fE3nKIzSJGY4LQ8UAmIMzcTAvBkhHtMkn4xY3UX0krMYHWfNvO+lA9jiEk2PRU6lo4ozwwTwapcbZMUJ0QMkQQDCaTKhaTLhaMWTyfFJKsQXuouF2SWATavOyzRJJlnNXFe/FNaK+LYkvpXFYUkcbn+E4Kk+I0xfivUnLNKFURcW5kMUVUdfF6Nn+rUcfVMSb2RxXBLHsujGJTWuqcuSuqypDyX1QVZXJXEli+uSuJbFDyXxgyze7ov9bV/s71KsqL/YFOvDyRTNxMcjf4WSBUyT11dvfkmTfv5s34UY6WbVCN1H49mo0+13U1nGj3p71DMtp64Xhq41MG2VoXAMurbhDHcsp5K3gDaMznDQkaMg3um9vj2q6ztYY9B1LIVhRzuy28PutnwIhZLVK3x2v9OulcUrcvqWZZ3363phsNq23TtVGAqH7TjOwM5RaMwoRpKXPho7nXOjHkWLoJ7RaQ8U+m4BjJ5l1WBpicUaWaZj5iwcASw5efGy2L1BfygH8V39rYFt1xaQl8rfs02nrTDsWM+d8+FZTkIYCD25KqQoX6fbO+vJUaQIGnXFCtZYyO6XRn3L6NZYSIllZNlWVtjqThSb7M68Tzaf3N2+049MPU1ls9hosjnbe49myYsVZtzsVtr3+dUDcCN8faYQNsVDRThshIEqFtgMD5XwcB+8V/d7TfGeItxrhPFULF4zvKeE9/bB07qfNsVTRThthKEqFtoMT5XwdB88r/t5UzxXhPNGGK5i4c3wXAnP98GTup80xRNFOGmEISoW0gxPlPBkDzxDD+LIl50UxNuVsFqkOJOL02yuQ5yAmh5xwFEuE5xENXlz28h0NxBM+T+yZ+pHkMQhz1MoTiYeEEq6ObHQ7IIBsJ4d0AnW/XB7hql8Xmk2dAHFsXbj3kzTQeKiwtAbcQJ6J07XQBw4fxYTYl7giwmJv5PjrLXPCFaPRtE6FJcmU74i1Rvj0xOzfWKav7aPXnW296en2g/aj9pPmql1tVfaa+1Cu9agttD+0P7U/mr93fqn9W/rv431ycF2zPda5Wl9/B/dNzm3</latexit>

✓<latexit sha1_base64="pkTOV+MdRGIubSrCB6GPllmwtkc=">AAAOIXicfZdNb9s2HMaV7q3L5rXdjrsICwoMQxBIieOXQ4Fako0e1jYL8tbVQUHRtCyYEgmKcuwK+hS7bpd9mh2H3YZ9mVGyLUsUZV3C8Hn4+Mc/RYF0KfYjbhj/Hjz65NPPPv/i8ZeHX33d+ubJ02ff3kQkZhBdQ4IJu3NBhLAfomvuc4zuKEMgcDG6ded2pt8uEIt8El7xFUX3AfBCf+pDwEXXu/HEjyCJQ/7h6ZFxYuSPXm+Ym8aRtnkuPjxrHYwnBMYBCjnEIIremwbl9wlg3IcYpYfjOEIUwDnw0PuYT3v3iR/SmKMQpvpzoU1jrHOiZ1D6xGcIcrwSDQCZLxJ0OAMMQC7QD6tREQpBgKLjycKn0boZLbx1gwMx7/tkmdclrQxMPAbozIfLClkCgigAfFbrjFaBW+1EMUZsEVQ7M0rBKDmXiEE/ympwIQrzlmaljq7IxUafregMhVGaxAyn5YFCQIyhqRiYNyPEY5rkkxHrO49ecBaj46yZ971wAJtfosmxyKl0VHGmmABe7XKDrDgheoAkCEA4ScY0TcYcLXkyPj5Jhfhcd7EwuwSwSdV5mSbJOKuZ6+qXwloR35TEN7I4LInDzY8QPNGnhOkLsf6ERbow6sLCfIii6ujrYvRUv5ajb0rijSzelsRbWXTjkhrX1EVJXdTUh5L6IKvLkriUxVVJXMnix5L4URbv9sW+2xf7qxQr6i82xepwPEFT8fnIX6FkDtPk1dXrn9Oknz+bdyFGulk1QndrPBt1uv1uKst4q7dHPdNy6nph6FoD01YZCsegaxvOcMdyKnkLaMPoDAcdOQrind7r26O6voM1Bl3HUhh2tCO7PexuyodQKFm9wmf3O+1aWbwip29Z1nm/rhcGq23bvVOFoXDYjuMM7ByFxoxiJHnp1tjpnBv1KFoE9YxOe6DQdwtg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmI7+pvDWy7toC8VP6ebTpthWHHeu6cD89yEsJA6MlVIUX5Ot3eWU+OIkXQqCtWsMZCdr806ltGt8ZCSiwjy7aywlZ3othk7837ZP3J3e07/cjU01Q2i40mm7O9tzVLXqww42a30r7Prx6AG+HrM4WwKR4qwmEjDFSxwGZ4qISH++C9ut9rivcU4V4jjKdi8ZrhPSW8tw+e1v20KZ4qwmkjDFWx0GZ4qoSn++B53c+b4rkinDfCcBULb4bnSni+D57U/aQpnijCSSMMUbGQZniihCd74Bl6EEe+7KQg3q6E1SLFmVycZnMd4gTU9IgDjnKZ4CSqyS6fIQ4y3Q0EU/6P7NneSfIUipOxB4SSrk8sNLtgAKxnB3SCdT/cnGEqn1eaDZ1Dcaxdu9fTdJC4qDD0WpyA3orTNRAHzp/EhJgX+GJC4u/4OGvtM4Ll1ihah+LSZMpXpHrj9vTEbJ+Y5i/to5edzf3psfa99oP2o2ZqXe2l9kq70K41qAXab9rv2h+tP1t/tf5u/bO2PjrYjPlOqzyt//4HLdw6vA==</latexit>�

<latexit sha1_base64="zs6C1jBlXSW4Dyc42P6i5TTkpdc=">AAAOMnicfZdNb9s2HMaV7q3L5i3dLgN2ERYU6IYgkBLHL4cCtWQbPaxtFuRti4OAomlZMCUSJOXYFbRPs+t22afpcdh1H2KUbMsSJVkXU3wePv7xL1EgHYo9Lgzjw96Tjz7+5NPPnn6+/8WXja++Pnj2zTUnIYPoChJM2K0DOMJegK6EJzC6pQwB38HoxpnZiX4zR4x7JLgUS4rufeAG3sSDQMiuh4PvRlwAgR6MEfd8nb7Y3P74cHBoHBvppZcb5rpxqK2v84dnjb3RmMDQR4GAGHB+ZxpU3EeACQ9iFO+PQo4ogDPgortQTDr3kRfQUKAAxvpzqU1CrAuiJ5T62GMICryUDQCZJxN0OAUMQCHnsl+M4igAPuJH47lH+arJ5+6qIYAsxH20SAsVFwZGLgN06sFFgSwCPveBmJY6+dJ3ip0oxIjN/WJnQikZFecCMejxpAbnsjDvaFJ7fknO1/p0Saco4HEUMhznB0oBMYYmcmDa5EiENEonIx/4jL8ULERHSTPte9kHbHaBxkcyp9BRxJlgAkSxy/GT4gToERLfB8E4GtE4Ggm0ENHo6DiW4nPdwdLsEMDGRedFHEWjpGaOo19Ia0F8mxPfquIgJw7Wf0LwWJ8Qps/l8yeM69KoSwvzIOLF0VfZ6Il+pUZf58RrVbzJiTeq6IQ5NSyp85w6L6mPOfVRVRc5caGKy5y4VMX3OfG9Kt7uiv11V+xvSqysv1wUy/3RGE3k9yR9haIZjKPXl29+jqNueq3fhRDpZtEInY3xdNhqd9uxKuON3hx2TKtf1jND2+qZdpUhc/TattEfbFlOFG8GbRitQa+lRkG81Ttde1jWt7BGr923Kgxb2qHdHLTX5UMoUKxu5rO7rWapLG6W07Us66xb1jOD1bTtzkmFIXPY/X6/Z6coNGQUI8VLN8ZW68woR9EsqGO0mr0KffsAjI5llWBpjsUaWmbfTFkEAlhxiuxlsTu97kANEtv6Wz3bLj1AkSt/xzb7zQrDlvWsfzY4TUkIA4GrVoVk5Wu1O6cdNYpkQcO2fIIlFrL9p2HXMtolFpJjGVq2lRS2uBLlIrsz76PVJ3e77vRDU49j1SwXmmpO1t7GrHhxhRnXuyvtu/zVA3AtfHmmENbFw4pwWAsDq1hgPTyshIe74N2y362LdyvC3VoYt4rFrYd3K+HdXfC07Kd18bQinNbC0CoWWg9PK+HpLnhR9ou6eFERLmphRBWLqIcXlfBiFzwp+0ldPKkIJ7UwpIqF1MOTSniyA56hR7nlS3YK8u2KWClS7snlbjbVIY5ASU8PFKlMcMRLsiOmSIBEd3zJlN6onrHHIQkDkaZQHI1cIJV4tWOhyQEDYD3ZoBOse8F6D1P4vNJk6AzKbe3KvZpmH8mDCkNv5A7ondxdA7nh/ElOiLm+Jyckf0dHSWuXESw2Rtnal4cmUz0ilRs3J8dm89g0f2kevmqtz09Pte+1H7QXmqm1tVfaa+1cu9Kg9rv2h/an9lfj78aHxj+Nf1fWJ3vrMd9qhavx3//SwUBc</latexit>

s0 ⇠ p(s0)<latexit sha1_base64="4jpVXBlgvUiiEoKBL4Bt9y+yCVs=">AAAOG3icfZdNb9s2HMaV7q3L5rXdjrsICwoMQxBIieOXQ4Bako0e1jbL8rbFQUHRtCyYEgmKcuwK+gi7bpd9mh2HXXfYtxklO7JEUdYlDJ+Hj3/8SxT+cin2I24Y/+09+ejjTz797Onn+1982frq2fMXX19HJGYQXUGCCbt1QYSwH6Ir7nOMbilDIHAxunHndqbfLBCLfBJe8hVF9wHwQn/qQ8DF1M/8zHj//MA4MvJLrw/MzeBA21zn71+09sYTAuMAhRxiEEV3pkH5fQIY9yFG6f44jhAFcA48dBfzae8+8UMacxTCVH8ptGmMdU70DEef+AxBjldiACDzRYIOZ4AByAX0fjUqQiEIUHQ4Wfg0Wg+jhbcecCB2fJ8s84qklYWJxwCd+XBZIUtAEAWAz2qT0Spwq5MoxogtgupkRikYJecSMehHWQ3ORWHe0azI0SU53+izFZ2hMEqTmOG0vFAIiDE0FQvzYYR4TJN8M+LOzqMzzmJ0mA3zuTMHsPkFmhyKnMpEFWeKCeDVKTfIihOiB0iCAISTZEzTZMzRkifjw6NUiC91FwuzSwCbVJ0XaZKMs5q5rn4hrBXxbUl8K4vDkjjc/AjBE31KmL4Q95+wSBdGXViYD1FUXX1VrJ7qV3L0dUm8lsWbkngji25cUuOauiipi5r6UFIfZHVZEpeyuCqJK1n8UBI/yOLtrthfdsX+KsWK+otDsdofT9BUvDjyRyiZwzR5ffnmxzTp59fmWYiRblaN0H00now63X43lWX8qLdHPdNy6nph6FoD01YZCsegaxvOcMtyLHkLaMPoDAcdOQrird7r26O6voU1Bl3HUhi2tCO7PexuyodQKFm9wmf3O+1aWbwip29Z1mm/rhcGq23bvWOFoXDYjuMM7ByFxoxiJHnpo7HTOTXqUbQI6hmd9kChb2+A0bOsGiwtsVgjy3TMnIUjgCUnLx4WuzfoD+Ugvq2/NbDt2g3kpfL3bNNpKwxb1lPndHiSkxAGQk+uCinK1+n2TnpyFCmCRl1xB2ssZPtLo75ldGsspMQysmwrK2z1JIpDdmfeJ+tX7vbc6QemnqayWRw02ZydvUez5MUKM252K+27/OoFuBG+vlMIm+KhIhw2wkAVC2yGh0p4uAveq/u9pnhPEe41wngqFq8Z3lPCe7vgad1Pm+KpIpw2wlAVC22Gp0p4ugue1/28KZ4rwnkjDFex8GZ4roTnu+BJ3U+a4okinDTCEBULaYYnSniyA56hB9HyZZ2CeLoSVosUPbnoZnMd4gTU9IgDjnKZ4CSqyS6fIQ4y3Q0EU/6P7Jn4ESRxyPMUipOxB4SSrjsWmn1gAKxnDTrBuh9uepjK65VmS+dQtLVr93qbDhIfKgy9ER3QO9FdA9Fw/iA2xLzAFxsSf8eH2WiXESwfjWK0Lz6aTPkTqT64OT4y20em+VP74FVn8/30VPtW+077XjO1rvZKe62da1ca1DztN+137Y/Wn62/Wn+3/llbn+xt1nyjVa7Wv/8DWDA3nA==</latexit>

t = 0<latexit sha1_base64="9IsLHzzfNX+Fj4UGAv60VtFgecw=">AAAOIHicfZdNb9s2HMaV7q3L5q3djrsICwoMQxBIieOXQ4Faso0e1jYL8rbVQUDRtCyEEgmScuwK+hK7bpd9mh2HHbdPM0p2ZImirEsYPg8f//inKJAexQEXlvXv3pOPPv7k08+efr7/xZetr75+9vybK05iBtElJJiwGw9whIMIXYpAYHRDGQKhh9G1d+9m+vUCMR6Q6EKsKLoNgR8FswACIbtuJlwAge7E3bMD68jKH7PesDeNA2PznN09b+1NpgTGIYoExIDz97ZFxW0CmAggRun+JOaIAngPfPQ+FrPebRJENBYogqn5QmqzGJuCmBmTOQ0YggKvZANAFsgEE84BA1BI8v1qFEcRCBE/nC4CytdNvvDXDQHktG+TZV6WtDIw8Rmg8wAuK2QJCHkIxLzWyVehV+1EMUZsEVY7M0rJqDiXiMGAZzU4k4V5R7NK8wtyttHnKzpHEU+TmOG0PFAKiDE0kwPzJkcipkk+Gbm89/ylYDE6zJp538shYPfnaHoocyodVZwZJkBUu7wwK06EHiAJQxBNkwlNk4lAS5FMDo9SKb4wPSzNHgFsWnWep0kyyWrmeea5tFbEtyXxrSqOSuJo8yMET80ZYeZCrj9h3JRGU1pYABGvjr4sRs/MSzX6qiReqeJ1SbxWRS8uqXFNXZTURU19KKkPqrosiUtVXJXElSp+KIkfVPFmV+wvu2J/VWJl/eWmWO1Ppmgmvx75K5TcwzR5ffHmpzTp58/mXYiRaVeN0Hs0now73X43VWX8qLfHPdsZ1vXC0HUGtqszFI5B17WGoy3LseItoC2rMxp01CiIt3qv747r+hbWGnSHjsawpR277VF3Uz6EIsXqFz6332nXyuIXOX3HcU77db0wOG3X7R1rDIXDHQ6HAzdHoTGjGCle+mjsdE6tehQtgnpWpz3Q6NsFsHqOU4OlJRZn7NhDO2cRCGDFKYqXxe0N+iM1SGzr7wxct7aAolT+nmsP2xrDlvV0eDo6yUkIA5GvVoUU5et0eyc9NYoUQeOuXMEaC9n+0rjvWN0aCymxjB3XyQpb3Ylyk723b5P1J3e778wD20xT1Sw3mmrO9t6jWfFijRk3u7X2XX79ANwIX58phE3xUBMOG2GgjgU2w0MtPNwF79f9flO8rwn3G2F8HYvfDO9r4f1d8LTup03xVBNOG2GojoU2w1MtPN0FL+p+0RQvNOGiEUboWEQzvNDCi13wpO4nTfFEE04aYYiOhTTDEy082QHP0IM88mUnBfl2JawWKc/k8jSb6xAnoKbn94lcJjjhNdkTcyRApnuhZMr/UT3TgEMSRyJPoTiZ+EAq6frEQrMLBsBmdkAn2AyizRmm8nml2dB7KI+1a/d6mkMkLyoMvZEnoHfydA3kgfNHOSHmh4GckPw7Ocxau4xg+WiUrX15abLVK1K9cX18ZLePbPvn9sGrzub+9NT4zvje+MGwja7xynhtnBmXBjSw8Zvxu/FH68/WX62/W/+srU/2NmO+NSpP67//AbWYOj0=</latexit>st

<latexit sha1_base64="OodyT2X3NV9wlOnDK1QsNJowrCo=">AAAOR3icfZdNb9s2HMaV7q3Lli7djrsICwp0QxZIieOXQ4Faso0e1jYL8tI1CgKKpmXBlEiQlGNX06fYp9l1u+wr7EvsOOw4SnZkiZKsSxg+Dx//+BcpkC7FPheG8ffOo48+/uTTzx5/vvvFl3tPvtp/+vUVJxGD6BISTNg7F3CE/RBdCl9g9I4yBAIXo2t3Zqf69Rwx7pPwQiwpug2AF/oTHwIhu+72f3QAzFrC4X6gO9S/c1wxRQI8z5VfHS6AQHfi+7v9A+PIyB692jDXjQNt/ZzdPd3bccYERgEKBcSA8xvToOI2Bkz4EKNk14k4ogDOgIduIjHp3sZ+SCOBQpjoz6Q2ibAuiJ6i62OfISjwUjYAZL5M0OEUMIkpJ7hbjuIoBAHih+O5T/mqyefeqiGArM5tvMiql5QGxh4DdOrDRYksBgEPgJhWOvkycMudKMKIzYNyZ0opGRXnAjHo87QGZ7Iwb2labH5Bztb6dEmnKORJHDGcFAdKATGGJnJg1uRIRDTOJiNXwYy/ECxCh2kz63sxAGx2jsaHMqfUUcaZYAJEucsN0uKE6B6SIADhOHZoEjsCLUTsHB4lUnymu1iaXQLYuOw8T+LYSWvmuvq5tJbENwXxjSoOC+Jw/SMEj/UJYfpcvn/CuC6NurQwHyJeHn2Zj57ol2r0VUG8UsXrgnitim5UUKOKOi+o84p6X1DvVXVREBequCyIS1X8UBA/qOK7bbG/bIt9r8TK+stNsdx1xmgiPzLZEopnMIlfXbz+KYl72bNeCxHSzbIRug/Gk1G70+skqowf9Naoa1qDqp4bOlbftOsMuaPfsY3BcMNyrHhzaMNoD/ttNQrijd7t2aOqvoE1+p2BVWPY0I7s1rCzLh9CoWL1cp/da7cqZfHynJ5lWae9qp4brJZtd49rDLnDHgwGfTtDoRGjGCle+mBst0+NahTNg7pGu9Wv0TcvwOhaVgWWFliskWUOzIxFIIAVp8gXi93t94ZqkNjU3+rbduUFikL5u7Y5aNUYNqyng9PhSUZCGAg9tSokL1+70z3pqlEkDxp15BussJDNL416ltGpsJACy8iyrbSw5Z0oN9mNeRuvPrmbfacfmHqSqGa50VRzuvcezIoX15hxs7vWvs1fPwA3wldnCmFTPKwJh40wsI4FNsPDWni4Dd6r+r2meK8m3GuE8epYvGZ4rxbe2wZPq37aFE9rwmkjDK1joc3wtBaeboMXVb9oihc14aIRRtSxiGZ4UQsvtsGTqp80xZOacNIIQ+pYSDM8qYUnW+AZupdHvvSkIFdXzCqRq6tDpkMcg4qeXSgymeCYV+TVDSTV3UAyZf+onrHPIYlCkaVQHDsekEqyOrHQ9IIBsJ4e0AnW/XB9hil9Xmk6dAblsXblXk1zgORFhaHX8gT0Vp6ugTxw/iAnxLzAlxOSf53DtLXNCBYPRtnalZcmU70iVRvXx0dm68g0f24dvGyv70+PtW+177Tnmql1tJfaK+1Mu9Sg9pv2u/aH9ufeX3v/7P2799/K+mhnPeYbrfQ82fkfs7hJNQ==</latexit>

at ⇠ ⇡✓(at|st)<latexit sha1_base64="vYZ3uLJ56rkkqyluUrknHQsGSvo=">AAAObnicfZdNb9s2HMaV7q3LlizdsNMwTFhQIFuNQEocvxwK1JJt9LC2WZA02aIgoGhaFkyJBEk5djV9sX2T3XbdLvsKo2THlijJupjm8/Dxj3+RBulS7HNhGH/tPPno408+/ezp57tffLm3/9XBs6/fcxIxiK4gwYTduIAj7IfoSvgCoxvKEAhcjK7dqZ3q1zPEuE/CS7Gg6C4AXuiPfQiE7Lo/uHS4AALdx+KFmTR0h6EHwEbLrw73A50ebXHof+grVUgFwCxT/HR/cGgcG9mjlxvmqnGorZ7z+2d7O86IwChAoYAYcH5rGlTcxYAJH2KU7DoRRxTAKfDQbSTGnbvYD2kkUAgT/bnUxhHWBdHTGeojnyEo8EI2AGS+TNDhBDBJJ+uwW4ziKAQB4o3RzKd82eQzb9kQQBbxLp5nRU4KA2OPATrx4bxAFoOAB0BMSp18EbjFThRhxGZBsTOllIyKc44Y9Hlag3NZmHc0rTG/JOcrfbKgExTyJI4YTvIDpYAYQ2M5MGtyJCIaZ5ORi2XKXwoWoUbazPpe9gGbXqBRQ+YUOoo4Y0yAKHa5QVqcED1AEgQgHMUOTWJHoLmIncZxIsXnuoul2SVy4RSdF0kcO2nNXFe/kNaC+DYnvlXFQU4crH6E4JE+JkyfyfdPGNelUZcW5kPEi6Ov1qPH+pUa/T4nvlfF65x4rYpulFOjkjrLqbOS+pBTH1R1nhPnqrjIiQtV/JATP6jizbbY37bF/q7EyvrLTbHYdUZoLP+LsiUUT2ESv75880sSd7NntRYipJtFI3QfjafDVrvbTlQZP+rNYce0+mV9bWhbPdOuMqwdvbZt9AcblhPFu4Y2jNag11KjIN7ona49LOsbWKPX7lsVhg3t0G4O2qvyIRQqVm/ts7utZqks3jqna1nWWbesrw1W07Y7JxWGtcPu9/s9O0OhEaMYKV76aGy1zoxyFF0HdYxWs1ehb16A0bGsEizNsVhDy+ybGYtAACtOsV4sdqfXHahBYlN/q2fbpRcocuXv2Ga/WWHYsJ71zwanGQlhIPTUqpB1+VrtzmlHjSLroGFbvsESC9n80rBrGe0SC8mxDC3bSgtb3Ilyk92ad/HyL3ez7/RDU08S1Sw3mmpO996jWfHiCjOud1fat/mrB+Ba+PJMIayLhxXhsBYGVrHAenhYCQ+3wXtlv1cX71WEe7UwXhWLVw/vVcJ72+Bp2U/r4mlFOK2FoVUstB6eVsLTbfCi7Bd18aIiXNTCiCoWUQ8vKuHFNnhS9pO6eFIRTmphSBULqYcnlfBkC/zyrpCeFOTqilkpcnljyHSIY1DSs7tFJhMc85LsigkSINXdQDJlX1TPyOeQRKHIUiiOHQ9IJVmeWGh6wQBYTw/oBOt+uDrDFP5eaTp0CuWxduleTrOP5EWFoTfyBPROnq6BPHD+LCfEvMCXE5KfTiNtbTOC+aNRtnblpclUr0jlxvXJsdk8Ns1fm4evWqv701PtO+1H7Ugztbb2SnutnWtXGtT+1P7W/tH+3ftv/9v97/d/WFqf7KzGfKMVnv2j/wFd5lSV</latexit>

st+1, rt+1 ⇠ p(st+1, rt+1|st,at)<latexit sha1_base64="INSRbFyUjTjotUdeQeTz/mxDbOo=">AAAOHXicfZdPb9s2GMaV7l+XzWu7HXcRFhQYtiCQEsd/DgFqyTZ6WNssiJNssVFQNC0LpkSCohy7gj7Drttln2bHYddh32aU7MgSRVmXMHwePv7xlSi8cij2Qm4Y/x08+ejjTz797Onnh1982fjq2fMXX9+EJGIQjSDBhN05IETYC9CIexyjO8oQ8B2Mbp2Fneq3S8RCjwTXfE3RxAdu4M08CLiYGvEL/qP5/vmRcWJkl14dmNvBkba9Lt+/aByMpwRGPgo4xCAM702D8kkMGPcgRsnhOAoRBXABXHQf8VlnEnsBjTgKYKK/FNoswjonegqkTz2GIMdrMQCQeSJBh3PAAOQC+7AcFaIA+Cg8ni49Gm6G4dLdDDgQe57Eq6wmSWlh7DJA5x5clchi4Ic+4PPKZLj2nfIkijBiS788mVIKRsm5Qgx6YVqDS1GYdzQtc3hNLrf6fE3nKAiTOGI4KS4UAmIMzcTCbBgiHtE424y4t4vwgrMIHafDbO6iD9jiCk2PRU5poowzwwTw8pTjp8UJ0AMkvg+CaTymSTzmaMXj8fFJIsSXuoOF2SGATcvOqySOx2nNHEe/EtaS+LYgvpXFQUEcbH+E4Kk+I0xfivtPWKgLoy4szIMoLK8e5atn+kiOvimIN7J4WxBvZdGJCmpUUZcFdVlRHwrqg6yuCuJKFtcFcS2LHwriB1m82xf7y77YX6VYUX9xKNaH4ymaiVdH9gjFC5jEr6/f/JTE3ezaPgsR0s2yETqPxrNhq91tJ7KMH/XmsGNa/aqeG9pWz7RVhtzRa9tGf7BjOZW8ObRhtAa9lhwF8U7vdO1hVd/BGr1231IYdrRDuzlob8uHUCBZ3dxnd1vNSlncPKdrWdZ5t6rnBqtp251ThSF32P1+v2dnKDRiFCPJSx+Nrda5UY2ieVDHaDV7Cn13A4yOZVVgaYHFGlpm38xYOAJYcvL8YbE7ve5ADuK7+ls9267cQF4of8c2+02FYcd63j8fnGUkhIHAlatC8vK12p2zjhxF8qBhW9zBCgvZ/dKwaxntCgspsAwt20oLWz6J4pDdm5N488rdnTv9yNSTRDaLgyab07P3aJa8WGHG9W6lfZ9fvQDXwld3CmFdPFSEw1oYqGKB9fBQCQ/3wbtVv1sX7yrC3VoYV8Xi1sO7Snh3Hzyt+mldPFWE01oYqmKh9fBUCU/3wfOqn9fFc0U4r4XhKhZeD8+V8HwfPKn6SV08UYSTWhiiYiH18EQJT/bAM/QgWr60UxBPV8wqkaInF91spkMcg4oecsBRJhMchxXZ4XPEQao7vmDK/pE9Uy+EJAp4lkJxPHaBUJJNx0LTDwyA9bRBJ1j3gm0PU3q90nTpAoq2duPebLOPxIcKQ29EB/ROdNdANJw/iA0x1/fEhsTf8XE62mcEq0ejGB2KjyZT/kSqDm5PT8zmiWn+3Dx61dp+Pz3VvtW+077XTK2tvdJea5faSIOap/2m/a790fiz8Vfj78Y/G+uTg+2ab7TS1fj3f06AOFA=</latexit>

t = t+ 1

ESTIMATE EXPECTED RETURN

17

Monte Carlo estimate of expected return <latexit sha1_base64="TiVJ9OGn/h7kAOyG3yMK5v3AT8g=">AAAOfHicfZdvb6M2HMe5279bt3a97eGesFUn3bZeBW2aPw8qXSCJTtPurqvapluJKuM4BMVgy5g0Ocb72avZ020vZtIMSQgYCE/i+Pv1l49/YGTbFLsB17R/nzz96ONPPv3s2ed7X3y5f/DV4fOvbwMSMohuIMGE3dkgQNj10Q13OUZ3lCHg2RgN7ZmZ6MM5YoFL/Gu+pGjkAcd3Jy4EXHQ9HBoWdHD0c/zSsvkUcfDDheUBPrXtqB8/RDTpBqH6x0aN71P/Vfxgjd0AktDno4fDI+1ESy+13NDXjSNlfV0+PN//zhoTGHrI5xCDILjXNcpHEWDchRjFe1YYIArgDDjoPuST9ihyfRpy5MNYfSG0SYhVTtRkPurYZQhyvBQNAJkrElQ4BQxALma9V4wKkA88FByP5y4NVs1g7qwaHIiSjaJFWtK4MDByGKBTFy4KZBHwgqRUpc5g6dnFThRixOZesTOhFIySc4EYdIOkBpeiMO9p8pSCa3K51qdLOkV+EEchw3F+oBAQY2giBqbNAPGQRulkxKsxCy44C9Fx0kz7LnqAza7Q+FjkFDqKOBNMAC922V5SHB89QuJ5wB9HFo0ji6MFj6zjk1iIL1QbC7NNABsXnVdxFK1fL/VKWAviu5z4Thb7ObG/vgnBY3VCmDoXz5+wQBVGVViYC1FQHH2TjZ6oN3L0bU68lcVhThzKoh3m1LCkznPqvKQ+5tRHWV3kxIUsLnPiUhY/5MQPsni3K/a3XbG/S7Gi/mJRLPesMZqIL0/6CkUzGEdvrt/+Eked9Fq/CyFS9aIR2hvj2aDZ6rRiWcYbvTFo60avrGeGltHVzSpD5ui2TK3X37KcSt4MWtOa/W5TjoJ4q7c75qCsb2G1bqtnVBi2tAOz0W+ty4eQL1mdzGd2mo1SWZwsp2MYxnmnrGcGo2Ga7dMKQ+Ywe71e10xRaMgoRpKXbozN5rlWjqJZUFtrNroV+vYBaG3DKMHSHIsxMPSenrJwBLDk5NnLYra7nb4cxLf1N7qmWXqAPFf+tqn3GhWGLet577x/lpIQBnxHrgrJytdstc/achTJggYt8QRLLGR7p0HH0FolFpJjGRimkRS2uBLFIrvXR9Hqk7tdd+qRrsaxbBYLTTYna29jlry4wozr3ZX2Xf7qAbgWvjxTCOviYUU4rIWBVSywHh5WwsNd8E7Z79TFOxXhTi2MU8Xi1MM7lfDOLnha9tO6eFoRTmthaBULrYenlfB0Fzwv+3ldPK8I57UwvIqF18PzSni+C56U/aQunlSEk1oYUsVC6uFJJTzZAc/Qo9jyJTuF5IzASpFiTy52s6kOcQRKesABR6lMcBSU5NURJNFtTzClf8oeEGYO0ZT1zaElvQvFkeUAocSrHQ1NDiAAq8kGnmDV9dd7nMLnlyZDZ1Bse1fuVRl6SBxkGHordkjvxe4biA3pj2LCzPFcMWHxax0nrV1GsNgYRWtPHKp0+QhVbgxPT/TGia7/2jh63Vyfr54p3yrfKy8VXWkpr5U3yqVyo0DlT+Uv5W/ln/3/Do4Ofjp4tbI+fbIe841SuA6a/wOkZ1zz</latexit>

J(✓) = Ep(⌧|✓)[R�]

<latexit sha1_base64="e35Dprw10MZ6DsPODwhWpB7Vo5U=">AAAOQHicfZfLbuM2GIWV6W2a1m2mXXYjNJiiKNJAShxfFgHGkm3MojOTBrl1okxA0bQsmBIJknLsEfQGfZpu200fo0/QZdFtV6Vkx5YoydqE4Tk8/vhTFEiXYp8Lw/hr58kHH3708SdPP9397PPGF1/uPfvqipOIQXQJCSbsxgUcYT9El8IXGN1QhkDgYnTtTu1Uv54hxn0SXogFRXcB8EJ/7EMgZNf93ncOj4L72D81k3dCd0Y+hyQKxbvY/9FMdIehB8BG9/793r5xaGSPXm6Yq8a+tnrO7p81dpwRgVGAQgEx4PzWNKi4iwETPsQo2XUijiiAU+Ch20iMO3exH9JIoBAm+nOpjSOsC6KnzPrIZwgKvJANAJkvE3Q4AQxAIWe2W4ziKAQB4gejmU/5ssln3rIhgCzLXTzPypYUBsYeA3Tiw3mBLAYBD4CYlDr5InCLnSjCiM2CYmdKKRkV5xwx6PO0BmeyMG9ouhL8gpyt9MmCTlDIkzhiOMkPlAJiDI3lwKzJkYhonE1GLv+UnwoWoYO0mfWd9gGbnqPRgcwpdBRxxpgAUexyg7Q4IXqAJAhAOIodmsSOQHMROweHiRSf6y6WZpfIt6PoPE/i2Elr5rr6ubQWxNc58bUqDnLiYPUjBI/0MWH6TK4/YVyXRl1amA8RL46+XI8e65dq9FVOvFLF65x4rYpulFOjkjrLqbOS+pBTH1R1nhPnqrjIiQtVfJ8T36vizbbYX7bFvlViZf3lpljsOiM0ll+X7BWKpzCJX168+imJu9mzehcipJtFI3QfjcfDVrvbTlQZP+rNYce0+mV9bWhbPdOuMqwdvbZt9AcbliPFu4Y2jNag11KjIN7ona49LOsbWKPX7lsVhg3t0G4O2qvyIRQqVm/ts7utZqks3jqna1nWSbesrw1W07Y7RxWGtcPu9/s9O0OhEaMYKV76aGy1ToxyFF0HdYxWs1ehbxbA6FhWCZbmWKyhZfbNjEUggBWnWL8sdqfXHahBYlN/q2fbpQUUufJ3bLPfrDBsWE/6J4PjjIQwEHpqVci6fK1257ijRpF10LAtV7DEQja/NOxaRrvEQnIsQ8u20sIWd6LcZLfmXbz85G72nb5v6kmimuVGU83p3ns0K15cYcb17kr7Nn/1AFwLX54phHXxsCIc1sLAKhZYDw8r4eE2eK/s9+rivYpwrxbGq2Lx6uG9SnhvGzwt+2ldPK0Ip7UwtIqF1sPTSni6DV6U/aIuXlSEi1oYUcUi6uFFJbzYBk/KflIXTyrCSS0MqWIh9fCkEp5sgV9eCNKTgny7YlaKlGdyeZrNdIhjUNK5AAJlMsExL8mumCABUt0NJFP2j+p5vKNkKRTHjgekkixPLDS9YACspwd0gnU/XJ1hCp9Xmg6dQnmsXbqX0+wjeVFh6JU8Ab2Rp2sgD5w/yAkxL/DlhORf5yBtbTOC+aNRtnblpclUr0jlxvXRodk8NM2fm/svWqv701PtG+1b7XvN1NraC+2ldqZdalD7VftN+137o/Fn4+/GP41/l9YnO6sxX2uFp/Hf/5xgRsY=</latexit>

tX

i=1

�i-1ri

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

Here, we show the monte carlo algorithm for computing an estimation of the expected return given a policy \pi_\theta. The algorithm is simple: We generate a trajectory from the MDP, and compute the total discounted return.

The tildes used in this algorithm are an operation that denotes sampling from a distribution.

In the second line, we sample an initial state s_0. Then, we loop until we find a terminal state:

1: we use our policy network \pi_\theta to compute a distribution over actions, from which we sample our next action to take

2: use the sampled action and the previous state in our environment to sample the next state and reward.

Step 2 is a bit hard to follow: Environments are black-box functions that simulate some complicated process, like a game or a physics engine. It need not be stochastic, but it’s easiest to assume it is. Sampling from the environment simply means to simulate the environment and create the next state, whether this is done deterministically or stochastically.

After encountering a terminal state, we return the monte carlo estimate: The discounted sum of rewards found in the trajectory.

Note that algorithms described in this way are also called generative stories or generative processes: They describe how the data is generated. It is an alternative and intuitive formulation for probabilistic graphical models to describe complicated joint probability distributions like MDPs! We have seen a generative story before in the VAE lecture.

Page 7: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

How to find ? →Policy gradient methods: Use in gradient ascent

<latexit sha1_base64="bXnJqj0zFD7qRndUfNZuXhjS+jM=">AAAORnicfZdNb6NGHMbJ9m2b1ttse+wFNVppu4oiSBy/HCKtwbZWVXc3jfLWhjQaxmOMPDCjYXDsRXyJfppe20s/Qz9Ej1WvHTDBMIA5jed55vGPPwz6j02xG3BN+3vnyUcff/LpZ08/3/3iy9azr/aef30VkJBBdAkJJuzGBgHCro8uucsxuqEMAc/G6Nqem4l+vUAscIl/wVcU3XnA8d2pCwEXU/d7B5bNZ4iDX1+dRhZgjuWBZXyfTVrQwdEP8cvs5/f3e/vaoZZeanWgZ4N9JbvO7p+3dqwJgaGHfA4xCIJbXaP8LgKMuxCjeNcKA0QBnAMH3YZ82ruLXJ+GHPkwVl8IbRpilRM1IVcnLkOQ45UYAMhckaDCGWAAcnF/u+WoAPnAQ8HBZOHSYD0MFs56wIEozl20TIsXlxZGDgN05sJliSwCXuABPqtMBivPLk+iECO28MqTCaVglJxLxKAbJDU4E4V5T5PnEVyQs0yfregM+UEchQzHxYVCQIyhqViYDgPEQxqlNyNegnlwylmIDpJhOnc6BGx+jiYHIqc0UcaZYgJ4ecr2kuL46AESzwP+JLJoHFkcLXlkHRzGQnyh2liYbQLYpOw8j6PISmpm2+q5sJbEdwXxnSyOCuIo+xOCJ+qUMHUhnj9hgSqMqrAwF6KgvPoyXz1VL+Xoq4J4JYvXBfFaFu2woIYVdVFQFxX1oaA+yOqyIC5lcVUQV7L4oSB+kMWbbbE/b4v9RYoV9RebYrVrTdBUfGPSVyiawzh6c/H2xzjqp1f2LoRI1ctGaD8aj8edbr8byzJ+1Nvjnm4Mq3pu6BoD3awz5I5B19SGow3LkeTNoTWtMxp05CiIN3qvb46r+gZWG3SHRo1hQzs226NuVj6EfMnq5D6z32lXyuLkOX3DME76VT03GG3T7B3VGHKHORwOB2aKQkNGMZK89NHY6Zxo1SiaB/W0TntQo28egNYzjAosLbAYY0Mf6ikLRwBLTp6/LGZv0B/JQXxTf2NgmpUHyAvl75n6sF1j2LCeDE9GxykJYcB35KqQvHydbu+4J0eRPGjcFU+wwkI2/zTuG1q3wkIKLGPDNJLClnei2GS3+l20/uRu9p26r6txLJvFRpPNyd57NEteXGPGze5a+zZ//QLcCF+9Uwib4mFNOGyEgXUssBke1sLDbfBO1e80xTs14U4jjFPH4jTDO7XwzjZ4WvXTpnhaE04bYWgdC22Gp7XwdBs8r/p5UzyvCeeNMLyOhTfD81p4vg2eVP2kKZ7UhJNGGFLHQprhSS082QLP0INo+ZJOITkhsEqk6MlFN5vqEEegogcccJTKBEdBRV4fNxLd9gRT+kP2TNwAktDnaQrFkeUAocTrjoUmBwyA1aRBJ1h1/ayHKX1eabJ0DkVbu3avb3OIxEGFobeiA3ovumsgGs5X6cnIc8UNJSekg2S0zShOUJlRjHbFoUmXj0jVwfXRod4+1PWf2vuvO9n56anyrfKd8lLRla7yWnmjnCmXClR+U35X/lD+bP3V+qf1b+u/tfXJTrbmG6V0PVP+Bw2XSHw=</latexit>

✓⇤ = argmax✓J(✓)<latexit sha1_base64="o5GAbZzkXTS46Vj7G4bTnaIGpf4=">AAAOOXicfZdNb9s2HMaV7q3L5i3djjtMWFCgG4JAShy/HArUkm0Uw9pmQV7aRUFA0bQimBIJknLsCjru0+y6XfZNdttx2HVfYJTsyBIlWRfTfB4+/vFP0SBdin0uDOOvnUcffPjRx588/nT3s89bX3y59+SrS04iBtEFJJiwty7gCPshuhC+wOgtZQgELkZX7sxO9as5Ytwn4blYUnQTAC/0pz4EQnbd7n3rhMDF4NZxxR0SQHegh+Mfk2fr79/f7u0bh0b26NWGuW7sa+vn9PZJa8eZEBgFKBQQA86vTYOKmxgw4UOMkl0n4ogCOAMeuo7EtHcT+yGNBAphoj+V2jTCuiB6CqtPfIagwEvZAJD5MkGHd4ABKOSUdstRHIUgQPxgMvcpXzX53Fs1hJwjuokXWb2S0sDYY4De+XBRIotBwAMg7iqdfBm45U4UYcTmQbkzpZSMinOBGPR5WoNTWZg3NF0Cfk5O1/rdkt6hkCdxxHBSHCgFxBiayoFZkyMR0TibjFz3GX8uWIQO0mbW93wI2OwMTQ5kTqmjjDPFBIhylxukxQnRPSRBAMJJ7NAkdgRaiNg5OEyk+FSXbwucuQSwSdl5lsSxk9bMdfUzaS2Jrwvia1UcFcTR+kcInuhTwvS5XH/CuC6NurQwHyJeHn2Rj57qF2r0ZUG8VMWrgnilim5UUKOKOi+o84p6X1DvVXVREBequCyIS1V8XxDfq+LbbbHvtsX+osTK+stNsdx1Jmgq/1ayVyiewSR+ef7qpyTuZ8/6XYiQbpaN0H0wHo873X43UWX8oLfHPdMaVvXc0LUGpl1nyB2Drm0MRxuWI8WbQxtGZzToqFEQb/Re3x5X9Q2sMegOrRrDhnZst0fddfkQChWrl/vsfqddKYuX5/QtyzrpV/XcYLVtu3dUY8gd9nA4HNgZCo0YxUjx0gdjp3NiVKNoHtQzOu1Bjb5ZAKNnWRVYWmCxxpY5NDMWgQBWnCJ/WezeoD9Sg8Sm/tbAtisLKArl79nmsF1j2LCeDE9GxxkJYSD01KqQvHydbu+4p0aRPGjclStYYSGbXxr3LaNbYSEFlrFlW2lhyztRbrJr8yZe/eVu9p2+b+pJoprlRlPN6d57MCteXGPGze5a+zZ//QDcCF+dKYRN8bAmHDbCwDoW2AwPa+HhNniv6vea4r2acK8Rxqtj8ZrhvVp4bxs8rfppUzytCaeNMLSOhTbD01p4ug1eVP2iKV7UhItGGFHHIprhRS282AZPqn7SFE9qwkkjDKljIc3wpBaebIFn6F4e+dKTQnpDYJVIeSaXp9lMhzgGFZ0LIFAmExzziry6bqS6G0im7IvqmfgckigUWQrFseMBqSSrEwtNLxgA6+kBnWDdD9dnmNLfK02HzqA81q7cq2kOkbyoMPRKnoDeyNM1kAfOH+SEmBf4ckLy0zlIW9uMYPFglK1deWky1StStXF1dGi2D03z5/b+i876/vRY+0b7TnummVpXe6G91E61Cw1qv2q/ab9rf7T+bP3d+qf178r6aGc95mut9LT++x8FyEOq</latexit>

r✓J(✓)<latexit sha1_base64="nRUMS05VMtYzLdEIwt+hn6g1tIA=">AAAOZXicfZfdbts2HMWV7qvLmizdht3sYsKCAt0aBFLi+OMiQC3ZRrGtbRbESbYoCCialgVTIkFRjl1BD7WnGXa33ew5Rsm2LFGSdWOK5/D4x79EgbQpdgOuaX/vPPno408+/ezp57tfPNvb//Lg+VfXAQkZRENIMGG3NggQdn005C7H6JYyBDwboxt7aib6zQyxwCX+FV9QdO8Bx3fHLgRcdD0c/GLZfII4eIjcV3p8vr5z1VeqBSmOLIDpBMSWD2wsTJkeC9nB0c/xy6zrx4eDQ+1YSy+13NBXjUNldV08PN/bsUYEhh7yOcQgCO50jfL7CDDuQoziXSsMEAVwChx0F/Jx+z5yfRpy5MNYfSG0cYhVTtRkYurIZQhyvBANAJkrElQ4AQxALqa/W4wKkA88FByNZi4Nls1g5iwbXMwU3UfztLZxYWDkMEAnLpwXyCLgBR7gk1JnsPDsYicKMWIzr9iZUApGyTlHDLpBUoMLUZj3NHlcwRW5WOmTBZ0gP4ijkOE4P1AIiDE0FgPTZoB4SKN0MuIdmQbnnIXoKGmmfec9wKaXaHQkcgodRZwxJoAXu2wvKY6PHiHxPOCPIovGkcXRnEfW0XEsxBeqeGfg1CaAjYrOyziKrKRmtq1eCmtBfJcT38liPyf2V39C8EgdE6bOxPMnLFCFURUW5kIUFEcPs9FjdShHX+fEa1m8yYk3smiHOTUsqbOcOiupjzn1UVbnOXEui4ucuJDFDznxgyzebov9fVvsH1KsqL9YFItda4TG4hOUvkLRFMbRm6u3v8ZRJ71W70KIVL1ohPbaeDpotjqtWJbxWm8M2rrRK+uZoWV0dbPKkDm6LVPr9TcsJ5I3g9a0Zr/blKMg3ujtjjko6xtYrdvqGRWGDe3AbPRbq/Ih5EtWJ/OZnWajVBYny+kYhnHWKeuZwWiYZvukwpA5zF6v1zVTFBoyipHkpWtjs3mmlaNoFtTWmo1uhb55AFrbMEqwNMdiDAy9p6csHAEsOXn2spjtbqcvB/FN/Y2uaZYeIM+Vv23qvUaFYcN61jvrn6YkhAHfkatCsvI1W+3TthxFsqBBSzzBEgvZ/NOgY2itEgvJsQwM00gKW1yJYpHd6ffR8pO7WXfqoa7GsWwWC002J2tvbZa8uMKM692V9m3+6gG4Fr48Uwjr4mFFOKyFgVUssB4eVsLDbfBO2e/UxTsV4U4tjFPF4tTDO5XwzjZ4WvbTunhaEU5rYWgVC62Hp5XwdBs8L/t5XTyvCOe1MLyKhdfD80p4vg2elP2kLp5UhJNaGFLFQurhSSU82QLP0KPY8iU7heSMwEqRYk8udrOpDnEESnrAAUepTHAUlOTlgSPRbU8wpTeyZ+QGkIQ+T1OSc4wDhBIvdyw0OWAArCYbdIJV11/tYQqfV5oMnUKxrV26l9PsIXFQYeit2AG9F7trIDacP4kJMcdzxYTEr3WUtLYZwXxtFK1dcWjS5SNSuXFzcqw3jnX9t8bh6+bq/PRU+U75QXmp6EpLea28US6UoQKVP5W/lH+Uf/f+23+2/83+t0vrk53VmK+VwrX//f9eYlIX</latexit>

✓i+1 = ✓i + ↵r✓iJ(✓i)

MAXIMIZING EXPECTED RETURN

18

We just introduced how we can estimate the expected return of a policy \pi_\theta. Now, we want to know how to maximize this expected return!

Maximizing an objective in Deep RL amounts to finding the optimal parameters \theta^* that maximize the expected return. We will do this using policy gradient methods: Estimate the gradient of expected return, then use this gradient to update the parameters using gradient ascent. Note that we do gradient ascent and not gradient descent like in most other Deep Learning: We want to maximize the return, not minimize it! Alternatively, we can minimize the negative expected return :) Like in SGD, we choose a step size (often called learning rate) \alpha with which to update the parameters.

That is all nice, but this requires that we have access to the gradient of the expected return. Unfortunately, this is not an easy quantity! This is because of two problems:

1. The environment is assumed to be black-box (unknown) and stochastic. This means that we cannot compute gradients through the environment! This is a huge downside to RL: A differentiable environment could give us a lot of information on how to cleverly update the policy parameters.

2. Sampling an action from the policy is also not differentiable, because sampling from a distribution in general is not. Luckily, there are several techniques from the gradient estimation literature to circumvent this problem.

Reinforcement Learning • Agent acts in environment to receive rewards

MDPs • Full observability • Markov assumption

Find parameters maximizing expected reward • Policy gradient methods • Estimate gradient for gradient ascent

RECAP PART 1

19

This was it for the first part of the RL lecture! We introduced the reinforcement learning framework through the MDP. The MDP is a decision process with two strong assumptions: States are fully observable, with no information hidden, and the Markov assumption: The next state is independent of the history given the current state.

We then looked at the fundamental problem of RL: Maximizing the expected (discounted) reward. We showed a simple algorithm that estimates this reward by simply sampling a trajectory and summing over the rewards. We want to find parameters for our policy network that maximize this reward. How do we do this? We use gradient ascent, and estimate the gradient of the expected reward using policy gradient methods. We will explain those in the next part!

How to find ? • Policy gradient methods: Use in gradient ascent

<latexit sha1_base64="bXnJqj0zFD7qRndUfNZuXhjS+jM=">AAAORnicfZdNb6NGHMbJ9m2b1ttse+wFNVppu4oiSBy/HCKtwbZWVXc3jfLWhjQaxmOMPDCjYXDsRXyJfppe20s/Qz9Ej1WvHTDBMIA5jed55vGPPwz6j02xG3BN+3vnyUcff/LpZ08/3/3iy9azr/aef30VkJBBdAkJJuzGBgHCro8uucsxuqEMAc/G6Nqem4l+vUAscIl/wVcU3XnA8d2pCwEXU/d7B5bNZ4iDX1+dRhZgjuWBZXyfTVrQwdEP8cvs5/f3e/vaoZZeanWgZ4N9JbvO7p+3dqwJgaGHfA4xCIJbXaP8LgKMuxCjeNcKA0QBnAMH3YZ82ruLXJ+GHPkwVl8IbRpilRM1IVcnLkOQ45UYAMhckaDCGWAAcnF/u+WoAPnAQ8HBZOHSYD0MFs56wIEozl20TIsXlxZGDgN05sJliSwCXuABPqtMBivPLk+iECO28MqTCaVglJxLxKAbJDU4E4V5T5PnEVyQs0yfregM+UEchQzHxYVCQIyhqViYDgPEQxqlNyNegnlwylmIDpJhOnc6BGx+jiYHIqc0UcaZYgJ4ecr2kuL46AESzwP+JLJoHFkcLXlkHRzGQnyh2liYbQLYpOw8j6PISmpm2+q5sJbEdwXxnSyOCuIo+xOCJ+qUMHUhnj9hgSqMqrAwF6KgvPoyXz1VL+Xoq4J4JYvXBfFaFu2woIYVdVFQFxX1oaA+yOqyIC5lcVUQV7L4oSB+kMWbbbE/b4v9RYoV9RebYrVrTdBUfGPSVyiawzh6c/H2xzjqp1f2LoRI1ctGaD8aj8edbr8byzJ+1Nvjnm4Mq3pu6BoD3awz5I5B19SGow3LkeTNoTWtMxp05CiIN3qvb46r+gZWG3SHRo1hQzs226NuVj6EfMnq5D6z32lXyuLkOX3DME76VT03GG3T7B3VGHKHORwOB2aKQkNGMZK89NHY6Zxo1SiaB/W0TntQo28egNYzjAosLbAYY0Mf6ikLRwBLTp6/LGZv0B/JQXxTf2NgmpUHyAvl75n6sF1j2LCeDE9GxykJYcB35KqQvHydbu+4J0eRPGjcFU+wwkI2/zTuG1q3wkIKLGPDNJLClnei2GS3+l20/uRu9p26r6txLJvFRpPNyd57NEteXGPGze5a+zZ//QLcCF+9Uwib4mFNOGyEgXUssBke1sLDbfBO1e80xTs14U4jjFPH4jTDO7XwzjZ4WvXTpnhaE04bYWgdC22Gp7XwdBs8r/p5UzyvCeeNMLyOhTfD81p4vg2eVP2kKZ7UhJNGGFLHQprhSS082QLP0INo+ZJOITkhsEqk6MlFN5vqEEegogcccJTKBEdBRV4fNxLd9gRT+kP2TNwAktDnaQrFkeUAocTrjoUmBwyA1aRBJ1h1/ayHKX1eabJ0DkVbu3avb3OIxEGFobeiA3ovumsgGs5X6cnIc8UNJSekg2S0zShOUJlRjHbFoUmXj0jVwfXRod4+1PWf2vuvO9n56anyrfKd8lLRla7yWnmjnCmXClR+U35X/lD+bP3V+qf1b+u/tfXJTrbmG6V0PVP+Bw2XSHw=</latexit>

✓⇤ = argmax✓J(✓)<latexit sha1_base64="o5GAbZzkXTS46Vj7G4bTnaIGpf4=">AAAOOXicfZdNb9s2HMaV7q3L5i3djjtMWFCgG4JAShy/HArUkm0Uw9pmQV7aRUFA0bQimBIJknLsCjru0+y6XfZNdttx2HVfYJTsyBIlWRfTfB4+/vFP0SBdin0uDOOvnUcffPjRx588/nT3s89bX3y59+SrS04iBtEFJJiwty7gCPshuhC+wOgtZQgELkZX7sxO9as5Ytwn4blYUnQTAC/0pz4EQnbd7n3rhMDF4NZxxR0SQHegh+Mfk2fr79/f7u0bh0b26NWGuW7sa+vn9PZJa8eZEBgFKBQQA86vTYOKmxgw4UOMkl0n4ogCOAMeuo7EtHcT+yGNBAphoj+V2jTCuiB6CqtPfIagwEvZAJD5MkGHd4ABKOSUdstRHIUgQPxgMvcpXzX53Fs1hJwjuokXWb2S0sDYY4De+XBRIotBwAMg7iqdfBm45U4UYcTmQbkzpZSMinOBGPR5WoNTWZg3NF0Cfk5O1/rdkt6hkCdxxHBSHCgFxBiayoFZkyMR0TibjFz3GX8uWIQO0mbW93wI2OwMTQ5kTqmjjDPFBIhylxukxQnRPSRBAMJJ7NAkdgRaiNg5OEyk+FSXbwucuQSwSdl5lsSxk9bMdfUzaS2Jrwvia1UcFcTR+kcInuhTwvS5XH/CuC6NurQwHyJeHn2Rj57qF2r0ZUG8VMWrgnilim5UUKOKOi+o84p6X1DvVXVREBequCyIS1V8XxDfq+LbbbHvtsX+osTK+stNsdx1Jmgq/1ayVyiewSR+ef7qpyTuZ8/6XYiQbpaN0H0wHo873X43UWX8oLfHPdMaVvXc0LUGpl1nyB2Drm0MRxuWI8WbQxtGZzToqFEQb/Re3x5X9Q2sMegOrRrDhnZst0fddfkQChWrl/vsfqddKYuX5/QtyzrpV/XcYLVtu3dUY8gd9nA4HNgZCo0YxUjx0gdjp3NiVKNoHtQzOu1Bjb5ZAKNnWRVYWmCxxpY5NDMWgQBWnCJ/WezeoD9Sg8Sm/tbAtisLKArl79nmsF1j2LCeDE9GxxkJYSD01KqQvHydbu+4p0aRPGjclStYYSGbXxr3LaNbYSEFlrFlW2lhyztRbrJr8yZe/eVu9p2+b+pJoprlRlPN6d57MCteXGPGze5a+zZ//QDcCF+dKYRN8bAmHDbCwDoW2AwPa+HhNniv6vea4r2acK8Rxqtj8ZrhvVp4bxs8rfppUzytCaeNMLSOhTbD01p4ug1eVP2iKV7UhItGGFHHIprhRS282AZPqn7SFE9qwkkjDKljIc3wpBaebIFn6F4e+dKTQnpDYJVIeSaXp9lMhzgGFZ0LIFAmExzziry6bqS6G0im7IvqmfgckigUWQrFseMBqSSrEwtNLxgA6+kBnWDdD9dnmNLfK02HzqA81q7cq2kOkbyoMPRKnoDeyNM1kAfOH+SEmBf4ckLy0zlIW9uMYPFglK1deWky1StStXF1dGi2D03z5/b+i876/vRY+0b7TnummVpXe6G91E61Cw1qv2q/ab9rf7T+bP3d+qf178r6aGc95mut9LT++x8FyEOq</latexit>

r✓J(✓)<latexit sha1_base64="nRUMS05VMtYzLdEIwt+hn6g1tIA=">AAAOZXicfZfdbts2HMWV7qvLmizdht3sYsKCAt0aBFLi+OMiQC3ZRrGtbRbESbYoCCialgVTIkFRjl1BD7WnGXa33ew5Rsm2LFGSdWOK5/D4x79EgbQpdgOuaX/vPPno408+/ezp57tfPNvb//Lg+VfXAQkZRENIMGG3NggQdn005C7H6JYyBDwboxt7aib6zQyxwCX+FV9QdO8Bx3fHLgRcdD0c/GLZfII4eIjcV3p8vr5z1VeqBSmOLIDpBMSWD2wsTJkeC9nB0c/xy6zrx4eDQ+1YSy+13NBXjUNldV08PN/bsUYEhh7yOcQgCO50jfL7CDDuQoziXSsMEAVwChx0F/Jx+z5yfRpy5MNYfSG0cYhVTtRkYurIZQhyvBANAJkrElQ4AQxALqa/W4wKkA88FByNZi4Nls1g5iwbXMwU3UfztLZxYWDkMEAnLpwXyCLgBR7gk1JnsPDsYicKMWIzr9iZUApGyTlHDLpBUoMLUZj3NHlcwRW5WOmTBZ0gP4ijkOE4P1AIiDE0FgPTZoB4SKN0MuIdmQbnnIXoKGmmfec9wKaXaHQkcgodRZwxJoAXu2wvKY6PHiHxPOCPIovGkcXRnEfW0XEsxBeqeGfg1CaAjYrOyziKrKRmtq1eCmtBfJcT38liPyf2V39C8EgdE6bOxPMnLFCFURUW5kIUFEcPs9FjdShHX+fEa1m8yYk3smiHOTUsqbOcOiupjzn1UVbnOXEui4ucuJDFDznxgyzebov9fVvsH1KsqL9YFItda4TG4hOUvkLRFMbRm6u3v8ZRJ71W70KIVL1ohPbaeDpotjqtWJbxWm8M2rrRK+uZoWV0dbPKkDm6LVPr9TcsJ5I3g9a0Zr/blKMg3ujtjjko6xtYrdvqGRWGDe3AbPRbq/Ih5EtWJ/OZnWajVBYny+kYhnHWKeuZwWiYZvukwpA5zF6v1zVTFBoyipHkpWtjs3mmlaNoFtTWmo1uhb55AFrbMEqwNMdiDAy9p6csHAEsOXn2spjtbqcvB/FN/Y2uaZYeIM+Vv23qvUaFYcN61jvrn6YkhAHfkatCsvI1W+3TthxFsqBBSzzBEgvZ/NOgY2itEgvJsQwM00gKW1yJYpHd6ffR8pO7WXfqoa7GsWwWC002J2tvbZa8uMKM692V9m3+6gG4Fr48Uwjr4mFFOKyFgVUssB4eVsLDbfBO2e/UxTsV4U4tjFPF4tTDO5XwzjZ4WvbTunhaEU5rYWgVC62Hp5XwdBs8L/t5XTyvCOe1MLyKhdfD80p4vg2elP2kLp5UhJNaGFLFQurhSSU82QLP0KPY8iU7heSMwEqRYk8udrOpDnEESnrAAUepTHAUlOTlgSPRbU8wpTeyZ+QGkIQ+T1OSc4wDhBIvdyw0OWAArCYbdIJV11/tYQqfV5oMnUKxrV26l9PsIXFQYeit2AG9F7trIDacP4kJMcdzxYTEr3WUtLYZwXxtFK1dcWjS5SNSuXFzcqw3jnX9t8bh6+bq/PRU+U75QXmp6EpLea28US6UoQKVP5W/lH+Uf/f+23+2/83+t0vrk53VmK+VwrX//f9eYlIX</latexit>

✓i+1 = ✓i + ↵r✓iJ(✓i)

MAXIMIZING EXPECTED RETURN

20

We just introduced how we can estimate the expected return of a policy \pi_\theta. Now, we want to know how to maximize this expected return!

Maximizing an objective in Deep RL amounts to finding the optimal parameters \theta^* that maximize the expected return. We will do this using policy gradient methods: Estimate the gradient of expected return, then use this gradient to update the parameters using gradient ascent. Note that we do gradient ascent and not gradient descent like in most other Deep Learning: We want to maximize the return, not minimize it! Alternatively, we can minimize the negative expected return :) Like in SGD, we choose a step size (often called learning rate) \alpha with which to update the parameters.

Page 8: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

MARKOV DECISION PROCESSES

21

<latexit sha1_base64="1XD/drTWwzJ56XWvUlFI4PIcB5s=">AAAN2HicfZdNb9s2HMbV7q3L6q1dj7sICwp0QxBIieMXFAVqSTZ6WNssyFsbBwFF04pgSiQoyrErCDsOuw7YdfsW+y77NqNkR5YoyrqY4PPw8U9/icKfLsV+xA3jvwcPP/v8iy+/evT1zjePW99+9+Tp9+cRiRlEZ5Bgwi5dECHsh+iM+xyjS8oQCFyMLtyZnekXc8Qin4SnfEnRdQC80J/6EHAxdUVfjCMOOLoxfrp5smvsG/ml1wfmerCrra/jm6eP/x1PCIwDFHKIQRRdmQbl1wlg3IcYpTvjOEIUwBnw0FXMp73rxA9pzFEIU/250KYx1jnRMyx94jMEOV6KAYDMFwk6vAUMQC7gd6pREQpBgKK9ydyn0WoYzb3VgANx59fJIq9MWlmYeAzQWx8uKmQJCKIA8NvaZLQM3OokijFi86A6mVEKRsm5QAz6UVaDY1GY9zQrdnRKjtf67ZLeojBKk5jhtLxQCIgxNBUL82GEeEyT/GbEE55FrziL0V42zOdeOYDNTtBkT+RUJqo4U0wAr065QVacEN1BEgQgnCRjmiZjjhY8Ge/tp0J8rrtYmF0C2KTqPEmTZJzVzHX1E2GtiO9K4jtZHJbE4fpPCJ7oU8L0uXj+hEW6MOrCwnyIourqs2L1VD+To89L4rksXpTEC1l045Ia19R5SZ3X1LuSeieri5K4kMVlSVzK4qeS+EkWL7fFftgW+1GKFfUXm2K5M56gqfiA5K9QMoNp8ub07S9p0s+v9bsQI92sGqF7bzwcdbr9birL+F5vj3qm5dT1wtC1BqatMhSOQdc2nOGG5UDyFtCG0RkOOnIUxBu917dHdX0Dawy6jqUwbGhHdnvYXZcPoVCyeoXP7nfatbJ4RU7fsqyjfl0vDFbbtnsHCkPhsB3HGdg5Co0ZxUjy0ntjp3Nk1KNoEdQzOu2BQt88AKNnWTVYWmKxRpbpmDkLRwBLTl68LHZv0B/KQXxTf2tg27UHyEvl79mm01YYNqxHztHwMCchDISeXBVSlK/T7R325ChSBI264gnWWMjmn0Z9y+jWWEiJZWTZVlbY6k4Um+zKvE5Wn9zNvtN3TT1NZbPYaLI523v3ZsmLFWbc7Fbat/nVC3AjfP1OIWyKh4pw2AgDVSywGR4q4eE2eK/u95riPUW41wjjqVi8ZnhPCe9tg6d1P22Kp4pw2ghDVSy0GZ4q4ek2eF7386Z4rgjnjTBcxcKb4bkSnm+DJ3U/aYoninDSCENULKQZnijhyRZ4hu5Ey5d1CuLtSlgtUvTkopvNdYgTUNPzA0UuE5xE6arNoNmpAGA966oJ1v1w3XhUvok0WzWDohdduVdsDhKnC4beirblvWiJgegSfxYUzAt8QSF+x3vZaJsRLO6NYrQjTjqmfK6pDy4O9s32vmn+2t59/XJ96Hmk/aD9qL3QTK2rvdbeaMfamQY1ov2l/a390/rQ+q31e+uPlfXhg/WaZ1rlav35Px0uGww=</latexit>

p(s0)<latexit sha1_base64="YHkXKgzGtE9903LLKx5hW4i+GOE=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm7Mpj5m6A6wyY352xjATDfE4Eo2frzZ3TP2jazp1Y657uxp63Z88+zpx/GEwMhHAYcYhOGVaVB+HQPGPYhRsjOOQkQBnAEXXUV82r2OvYBGHAUw0Z8LbRphnRM95dYnHkOQ46XoAMg8kaDDW8AEp3i6nXJUiALgo7A5mXs0XHXDubvqcCBKcx0vstIlpYmxywC99eCiRBYDP/QBv60MhkvfKQ+iCCM298uDKaVglJwLxKAXpjU4FoV5T9Nqh6fkeK3fLuktCsIkjhhOihOFgBhDUzEx64aIRzTOHkYsgVn4irMINdNuNvZqANjsBE2aIqc0UMaZYgJ4ecjx0+IE6A4S3wfBJB7TJB5ztODxuLmfCPG57mBhdohYMmXnSRLH47RmjqOfCGtJfFcQ38nisCAO139C8ESfEqbPxfsnLNSFURcW5kEUlmef5bOn+pkcfV4Qz2XxoiBeyKITFdSoos4L6ryi3hXUO1ldFMSFLC4L4lIW7wvivSxebov9dVvsBylW1F9siuXOeIKm4guTLaF4BpP4zenbn5O4l7X1WoiQbpaN0HkwHo7anV4nkWX8oLdGXdMaVPXc0LH6pq0y5I5+xzYGww3LgeTNoQ2jPey35SiIN3q3Z4+q+gbW6HcGlsKwoR3ZrWFnXT6EAsnq5j67125VyuLmOT3Lso56VT03WC3b7h4oDLnDHgwGfTtDoRGjGEle+mBst4+MahTNg7pGu9VX6JsXYHQtqwJLCyzWyDIHZsbCEcCSk+eLxe72e0M5iG/qb/Vtu/ICeaH8XdsctBSGDevR4Gh4mJEQBgJXrgrJy9fudA+7chTJg0Yd8QYrLGTzT6OeZXQqLKTAMrJsKy1seSeKTXZlXserT+5m3+l7pp4ksllsNNmc7r0Hs+TFCjOudyvt2/zqCbgWvvqkENbFQ0U4rIWBKhZYDw+V8HAbvFv1u3XxriLcrYVxVSxuPbyrhHe3wdOqn9bFU0U4rYWhKhZaD0+V8HQbPK/6eV08V4TzWhiuYuH18FwJz7fBk6qf1MUTRTiphSEqFlIPT5TwZAv86paQnhTE6opZJXJ1d8h0iGNQ0bMLRSYTHIfJ6phB01sBwHp6qiZY94L1waP0TaTprBkUZ9GVe8U2QOJ2wdBbcWx5L47EQJwSXwoK5vqeoBC/42ba22YEiwej6O2Im44p32uqnYuDfbO1b5q/tPZe/7S+9DzRvtd+0F5optbRXmtvtGPtTIPa79pH7R/t38Z944/Gn42/VtbHj9ZzvtNKrfH3/2uAJ24=</latexit>

p(s1, r1|a0, s0)<latexit sha1_base64="TtotXtVyiolaey0Pj34Znxjw2ao=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm4OmvqYoTvAJjcHv40BzHRTDK5k88eb3T1j38iaXu2Y686etm7HN8+efhxPCIx8FHCIQRhemQbl1zFg3IMYJTvjKEQUwBlw0VXEp93r2AtoxFEAE/250KYR1jnRU2594jEEOV6KDoDMEwk6vAVMcIqn2ylHhSgAPgqbk7lHw1U3nLurDgeiNNfxIitdUpoYuwzQWw8uSmQx8EMf8NvKYLj0nfIgijBic788mFIKRsm5QAx6YVqDY1GY9zStdnhKjtf67ZLeoiBM4ojhpDhRCIgxNBUTs26IeETj7GHEEpiFrziLUDPtZmOvBoDNTtCkKXJKA2WcKSaAl4ccPy1OgO4g8X0QTOIxTeIxRwsej5v7iRCf6w4WZoeIJVN2niRxPE5r5jj6ibCWxHcF8Z0sDgvicP0nBE/0KWH6XLx/wkJdGHVhYR5EYXn2WT57qp/J0ecF8VwWLwrihSw6UUGNKuq8oM4r6l1BvZPVRUFcyOKyIC5l8b4g3svi5bbYX7fFfpBiRf3FpljujCdoKr4w2RKKZzCJ35y+/TmJe1lbr4UI6WbZCJ0H4+Go3el1ElnGD3pr1DWtQVXPDR2rb9oqQ+7od2xjMNywHEjeHNow2sN+W46CeKN3e/aoqm9gjX5nYCkMG9qR3Rp21uVDKJCsbu6ze+1WpSxuntOzLOuoV9Vzg9Wy7e6BwpA77MFg0LczFBoxipHkpQ/GdvvIqEbRPKhrtFt9hb55AUbXsiqwtMBijSxzYGYsHAEsOXm+WOxuvzeUg/im/lbftisvkBfK37XNQUth2LAeDY6GhxkJYSBw5aqQvHztTvewK0eRPGjUEW+wwkI2/zTqWUanwkIKLCPLttLClnei2GRX5nW8+uRu9p2+Z+pJIpvFRpPN6d57MEterDDjerfSvs2vnoBr4atPCmFdPFSEw1oYqGKB9fBQCQ+3wbtVv1sX7yrC3VoYV8Xi1sO7Snh3Gzyt+mldPFWE01oYqmKh9fBUCU+3wfOqn9fFc0U4r4XhKhZeD8+V8HwbPKn6SV08UYSTWhiiYiH18EQJT7bAr24J6UlBrK6YVSJXd4dMhzgGFT27UGQywXGYrI4ZNL0VAKynp2qCdS9YHzxK30SazppBcRZduVdsAyRuFwy9FceW9+JIDMQp8aWgYK7vCQrxO26mvW1GsHgwit6OuOmY8r2m2rk42Ddb+6b5S2vv9U/rS88T7XvtB+2FZmod7bX2RjvWzjSo/a591P7R/m3cN/5o/Nn4a2V9/Gg95zut1Bp//w+haSdy</latexit>

p(s2, r2|a1, s1)<latexit sha1_base64="dlBFF7y4eqn9g3p0iFNRijY0e0s=">AAAN+XicfZfbbts2HMbV7tRl9ZZul7sRFhToCiOQEscHDAVqyTZ6sbZZkNMaBwFF04pgSiQoyrGj6R32CLscdjugt9tr7G1GyY4sUZR5Y4Lfx88//UUKpEOxF3LD+O/R408+/ezzL558ufPV08bX3+w++/Y8JBGD6AwSTNilA0KEvQCdcY9jdEkZAr6D0YUzs1P9Yo5Y6JHglC8puvaBG3hTDwIuhm52X9IX45ADjm4Om/qYoTvAJjeHv40BzPQDMbiSD3682d0z9o2s6dWOue7saet2fPPs6cfxhMDIRwGHGIThlWlQfh0Dxj2IUbIzjkJEAZwBF11FfNq9jr2ARhwFMNGfC20aYZ0TPeXWJx5DkOOl6ADIPJGgw1vABKd4up1yVIgC4KOwOZl7NFx1w7m76nAgSnMdL7LSJaWJscsAvfXgokQWAz/0Ab+tDIZL3ykPoggjNvfLgymlYJScC8SgF6Y1OBaFeU/Taoen5Hit3y7pLQrCJI4YTooThYAYQ1MxMeuGiEc0zh5GLIFZ+IqzCDXTbjb2agDY7ARNmiKnNFDGmWICeHnI8dPiBOgOEt8HwSQe0yQec7Tg8bi5nwjxue5gYXaIWDJl50kSx+O0Zo6jnwhrSXxXEN/J4rAgDtd/QvBEnxKmz8X7JyzUhVEXFuZBFJZnn+Wzp/qZHH1eEM9l8aIgXsiiExXUqKLOC+q8ot4V1DtZXRTEhSwuC+JSFu8L4r0sXm6L/XVb7AcpVtRfbIrlzniCpuILky2heAaT+M3p25+TuJe19VqIkG6WjdB5MB6O2p1eJ5Fl/KC3Rl3TGlT13NCx+qatMuSOfsc2BsMNy4HkzaENoz3st+UoiDd6t2ePqvoG1uh3BpbCsKEd2a1hZ10+hALJ6uY+u9duVcri5jk9y7KOelU9N1gt2+4eKAy5wx4MBn07Q6ERoxhJXvpgbLePjGoUzYO6RrvVV+ibF2B0LasCSwss1sgyB2bGwhHAkpPni8Xu9ntDOYhv6m/1bbvyAnmh/F3bHLQUhg3r0eBoeJiREAYCV64KycvX7nQPu3IUyYNGHfEGKyxk80+jnmV0KiykwDKybCstbHknik12ZV7Hq0/uZt/pe6aeJLJZbDTZnO69B7PkxQozrncr7dv86gm4Fr76pBDWxUNFOKyFgSoWWA8PlfBwG7xb9bt18a4i3K2FcVUsbj28q4R3t8HTqp/WxVNFOK2FoSoWWg9PlfB0Gzyv+nldPFeE81oYrmLh9fBcCc+3wZOqn9TFE0U4qYUhKhZSD0+U8GQL/OqWkJ4UxOqKWSVydXfIdIhjUNGzC0UmExyHyeqYQdNbAcB6eqomWPeC9cGj9E2k6awZFGfRlXvFNkDidsHQW3FseS+OxECcEl8KCub6nqAQv+Nm2ttmBIsHo+jtiJuOKd9rqp2Lg32ztW+av7T2Xv+0vvQ80b7XftBeaKbW0V5rb7Rj7UyD2u/aR+0f7d/GfeOPxp+Nv1bWx4/Wc77TSq3x9//XUid2</latexit>

p(s3, r3|a2, s2)

<latexit sha1_base64="v7EdJ+ThHXgJX0gdR//dSzJOt1g=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHzU+NmJOODo1vjmdm/fODTyS68X5rrY19bX6e2Tx/85YwLjAIUcYhBF16ZB+U0CGPchRumOE0eIAjgDHrqO+aR7k/ghjTkKYao/Fdokxjonegarj32GIMdLUQDIfJGgwylgAlPc0k41KkIhCFB0MJ77NFqV0dxbFRyIftwki7xfaWVi4jFApz5cVMgSEEQB4NPaYLQM3OogijFi86A6mFEKRsm5QAz6UdaDU9GYNzRrdnROTtf6dEmnKIzSJGY4LU8UAmIMTcTEvIwQj2mS34x47rPoOWcxOsjKfOz5ALDZGRofiJzKQBVnggng1SE3yJoTojtIggCE48ShaeJwtOCJc3CYCvGp7mJhdglg46rzLE0SJ+uZ6+pnwloRX5fE17I4LInD9Y8QPNYnhOlz8fwJi3Rh1IWF+RBF1dkXxeyJfiFHX5bES1m8KolXsujGJTWuqfOSOq+pdyX1TlYXJXEhi8uSuJTFdyXxnSy+3Rb747bYn6RY0X+xKJY7zhhNxGclf4WSGUyTl+evvk+TXn6t34UY6WbVCN174/Go3el1UlnG93pr1DWtQV0vDB2rb9oqQ+Hod2xjMNywHEneAtow2sN+W46CeKN3e/aorm9gjX5nYCkMG9qR3Rp21u1DKJSsXuGze+1WrS1ekdOzLOukV9cLg9Wy7e6RwlA47MFg0LdzFBozipHkpffGdvvEqEfRIqhrtFt9hb55AEbXsmqwtMRijSxzYOYsHAEsOXnxstjdfm8oB/FN/62+bdceIC+1v2ubg5bCsGE9GZwMj3MSwkDoyV0hRfvane5xV44iRdCoI55gjYVsfmnUs4xOjYWUWEaWbWWNra5EsciuzZtk9cndrDt939TTVDaLhSabs7V3b5a8WGHGzW6lfZtfPQE3wtfvFMKmeKgIh40wUMUCm+GhEh5ug/fqfq8p3lOEe40wnorFa4b3lPDeNnha99OmeKoIp40wVMVCm+GpEp5ug+d1P2+K54pw3gjDVSy8GZ4r4fk2eFL3k6Z4oggnjTBExUKa4YkSnmyBZ+hObPmynYJ4uxJWi1wdHXId4gTU9PxAkcsEJ1FNXp1AMt0NBFP+z2orQrOTA8B6tvMmWPfD9eak8t2k2cwZFPvVlXvFP0DiBMLQK7G1eSO2zUDsJL8VpMwLfEEq/joHWbXNCBb3RlHtiNOQKZ996sXV0aHZOjTNH1r7L75bH4weaV9qX2vPNFPraC+0l9qpdqFB7RftN+137Y/dX3f/3P1r9++V9eGD9ZwvtMq1+8//ia4zmg==</latexit>

⇡✓(a0|s0)<latexit sha1_base64="4j85n+ya1RERqomaZORIgNwngbY=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHzU/NmJOODo1vzmdm/fODTyS68X5rrY19bX6e2Tx/85YwLjAIUcYhBF16ZB+U0CGPchRumOE0eIAjgDHrqO+aR7k/ghjTkKYao/Fdokxjonegarj32GIMdLUQDIfJGgwylgAlPc0k41KkIhCFB0MJ77NFqV0dxbFRyIftwki7xfaWVi4jFApz5cVMgSEEQB4NPaYLQM3OogijFi86A6mFEKRsm5QAz6UdaDU9GYNzRrdnROTtf6dEmnKIzSJGY4LU8UAmIMTcTEvIwQj2mS34x47rPoOWcxOsjKfOz5ALDZGRofiJzKQBVnggng1SE3yJoTojtIggCE48ShaeJwtOCJc3CYCvGp7mJhdglg46rzLE0SJ+uZ6+pnwloRX5fE17I4LInD9Y8QPNYnhOlz8fwJi3Rh1IWF+RBF1dkXxeyJfiFHX5bES1m8KolXsujGJTWuqfOSOq+pdyX1TlYXJXEhi8uSuJTFdyXxnSy+3Rb747bYn6RY0X+xKJY7zhhNxGclf4WSGUyTl+evvk+TXn6t34UY6WbVCN174/Go3el1UlnG93pr1DWtQV0vDB2rb9oqQ+Hod2xjMNywHEneAtow2sN+W46CeKN3e/aorm9gjX5nYCkMG9qR3Rp21u1DKJSsXuGze+1WrS1ekdOzLOukV9cLg9Wy7e6RwlA47MFg0LdzFBozipHkpffGdvvEqEfRIqhrtFt9hb55AEbXsmqwtMRijSxzYOYsHAEsOXnxstjdfm8oB/FN/62+bdceIC+1v2ubg5bCsGE9GZwMj3MSwkDoyV0hRfvane5xV44iRdCoI55gjYVsfmnUs4xOjYWUWEaWbWWNra5EsciuzZtk9cndrDt939TTVDaLhSabs7V3b5a8WGHGzW6lfZtfPQE3wtfvFMKmeKgIh40wUMUCm+GhEh5ug/fqfq8p3lOEe40wnorFa4b3lPDeNnha99OmeKoIp40wVMVCm+GpEp5ug+d1P2+K54pw3gjDVSy8GZ4r4fk2eFL3k6Z4oggnjTBExUKa4YkSnmyBZ+hObPmynYJ4uxJWi1wdHXId4gTU9PxAkcsEJ1FNXp1AMt0NBFP+z2orQrOTA8B6tvMmWPfD9eak8t2k2cwZFPvVlXvFP0DiBMLQK7G1eSO2zUDsJL8VpMwLfEEq/joHWbXNCBb3RlHtiNOQKZ996sXV0aHZOjTNH1r7L75bH4weaV9qX2vPNFPraC+0l9qpdqFB7RftN+137Y/dX3f/3P1r9++V9eGD9ZwvtMq1+8//pNUznA==</latexit>

⇡✓(a1|s1)<latexit sha1_base64="y2/EBqPf66V+UZmCJQzpbqka44c=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHz06Gcn4oCj26Nvbvf2jUMjv/R6Ya6LfW19nd4+efyfMyYwDlDIIQZRdG0alN8kgHEfYpTuOHGEKIAz4KHrmE+6N4kf0pijEKb6U6FNYqxzomew+thnCHK8FAWAzBcJOpwCJjDFLe1UoyIUggBFB+O5T6NVGc29VcGB6MdNssj7lVYmJh4DdOrDRYUsAUEUAD6tDUbLwK0OohgjNg+qgxmlYJScC8SgH2U9OBWNeUOzZkfn5HStT5d0isIoTWKG0/JEISDG0ERMzMsI8Zgm+c2I5z6LnnMWo4OszMeeDwCbnaHxgcipDFRxJpgAXh1yg6w5IbqDJAhAOE4cmiYORwueOAeHqRCf6i4WZpcANq46z9IkcbKeua5+JqwV8XVJfC2Lw5I4XP8IwWN9Qpg+F8+fsEgXRl1YmA9RVJ19Ucye6Bdy9GVJvJTFq5J4JYtuXFLjmjovqfOaeldS72R1URIXsrgsiUtZfFcS38ni222xP26L/UmKFf0Xi2K544zRRHxW8lcomcE0eXn+6vs06eXX+l2IkW5WjdC9Nx6P2p1eJ5VlfK+3Rl3TGtT1wtCx+qatMhSOfsc2BsMNy5HkLaANoz3st+UoiDd6t2eP6voG1uh3BpbCsKEd2a1hZ90+hELJ6hU+u9du1driFTk9y7JOenW9MFgt2+4eKQyFwx4MBn07R6ExoxhJXnpvbLdPjHoULYK6RrvVV+ibB2B0LasGS0ss1sgyB2bOwhHAkpMXL4vd7feGchDf9N/q23btAfJS+7u2OWgpDBvWk8HJ8DgnIQyEntwVUrSv3eked+UoUgSNOuIJ1ljI5pdGPcvo1FhIiWVk2VbW2OpKFIvs2rxJVp/czbrT9009TWWzWGiyOVt792bJixVm3OxW2rf51RNwI3z9TiFsioeKcNgIA1UssBkeKuHhNniv7vea4j1FuNcI46lYvGZ4TwnvbYOndT9tiqeKcNoIQ1UstBmeKuHpNnhe9/OmeK4I540wXMXCm+G5Ep5vgyd1P2mKJ4pw0ghDVCykGZ4o4ckWeIbuxJYv2ymItythtcjV0SHXIU5ATc8PFLlMcBLV5NUJJNPdQDDl/6y2IjQ7OQCsZztvgnU/XG9OKt9Nms2cQbFfXblX/AMkTiAMvRJbmzdi2wzETvJbQcq8wBek4q9zkFXbjGBxbxTVjjgNmfLZp15cHR2arUPT/KG1/+K79cHokfal9rX2TDO1jvZCe6mdahca1H7RftN+1/7Y/XX3z92/dv9eWR8+WM/5Qqtcu//8D7/8M54=</latexit>

⇡✓(a2|s2)

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0

<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1

<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2

<latexit sha1_base64="MJSdrcnRCn4KhDzKoML3O8YiGsk=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9yf3zw6NY2N56dXCXBeH2vo6v3++PxiOCIwDFHKIQRTdmgbldwlg3IcYpXvDOEIUwCnw0G3Mx+27xA9pzFEIU/2l0MYx1jnRMyh95DMEOV6IAkDmiwQdTgADkAv0vXJUhEIQoOhoNPNptCqjmbcqOBD3fZfMl31JSxMTjwE68eG8RJaAIAoAn1QGo0XglgdRjBGbBeXBjFIwSs45YtCPsh6ci8a8p1mro0tyvtYnCzpBYZQmMcNpcaIQEGNoLCYuywjxmCbLmxHrO41ecRajo6xcjr1yAJteoNGRyCkNlHHGmABeHnKDrDkheoAkCEA4SoY0TYYczXkyPDpOhfhSd7EwuwSwUdl5kSbJMOuZ6+oXwloS3xXEd7LYK4i99Y8QPNLHhOkzsf6ERbow6sLCfIii8uxBPnusD+Toq4J4JYvXBfFaFt24oMYVdVZQZxX1oaA+yOq8IM5lcVEQF7L4qSB+ksWbXbEfdsV+lGJF/8WmWOwNR2gsXh/LRyiZwjR5c/n2jzTpLK/1sxAj3SwbobsxnvabrU4rlWW80Rv9tmk5VT03tKyuaasMuaPbsg2nt2U5kbw5tGE0e92mHAXxVm937H5V38Ia3ZZjKQxb2r7d6LXW7UMolKxe7rM7zUalLV6e07Es66xT1XOD1bDt9onCkDtsx3G69hKFxoxiJHnpxthsnhnVKJoHtY1mo6vQtwtgtC2rAksLLFbfMh1zycIRwJKT5w+L3e52enIQ3/bf6tp2ZQF5of1t23QaCsOW9cw5650uSQgDoSd3heTta7bap205iuRB/ZZYwQoL2f5Sv2MZrQoLKbD0LdvKGlveiWKT3Zp3yeqVu913+qGpp6lsFhtNNmd7b2OWvFhhxvVupX2XXz0B18JX7xTCunioCIe1MFDFAuvhoRIe7oL3qn6vLt5ThHu1MJ6KxauH95Tw3i54WvXTuniqCKe1MFTFQuvhqRKe7oLnVT+vi+eKcF4Lw1UsvB6eK+H5LnhS9ZO6eKIIJ7UwRMVC6uGJEp6U4cWfR3ZqB1jPTr0E6364PhiU3lk0Oz5MoTgrrtyrG3eQOP0z9FYcK96LIysQp7hfkyFgXuCHqfga8IZHWbXLCOYbo6jEh4gpf3ZUi+uTY7NxbJp/Ng5f/77+Jnmqfa/9qP2imVpLe6290c61gQa1QPtL+1v7Z/+/gx8Ofjr4eWV9/Gg954VWug5++x8PJvG0</latexit>a2

Agent <latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

<latexit sha1_base64="0ikfrFZg78atJ8tLBb9MSbBADcA=">AAANl3icfZfbbts2HMbV7tR59dZuNwN2IywoMAxBICWODxg61JJs5GJt0yCnNjYCiqYVwZRIUJQPFfQUu90ebG8zynZkiaKsK4Lfx08//UkKpEuxH3HD+O/J0y++/Orrb5592/juefP7H168/PE6IjGD6AoSTNitCyKE/RBdcZ9jdEsZAoGL0Y07szP9Zo5Y5JPwkq8oGgfAC/2pDwEXXR9HDC0Am9yf3L84MI6M9aNXG+a2caBtn/P7l88XowmBcYBCDjGIojvToHycAMZ9iFHaGMURogDOgIfuYj7tjhM/pDFHIUz1V0KbxljnRM+g9InPEOR4JRoAMl8k6PABMAC5QG+UoyIUggBFh5O5T6NNM5p7mwYH4rvHyXJdl7Q0MPEYoA8+XJbIEhBEAeAPlc5oFbjlThRjxOZBuTOjFIySc4kY9KOsBueiMO9pVurokpxv9YcVfUBhlCYxw2lxoBAQY2gqBq6bEeIxTdYfI+Z3Fr3mLEaHWXPd99oBbHaBJocip9RRxpliAni5yw2y4oRoAUkQgHCSjGiajDha8mR0eJQK8ZXuYmF2iVgdZedFmiSjrGauq18Ia0l8VxDfyeKgIA62LyF4ok8J0+di/gmLdGHUhYX5EEXl0Vf56Kl+JUdfF8RrWbwpiDey6MYFNa6o84I6r6iLgrqQ1WVBXMriqiCuZPFzQfwsi7f7Yj/ui/0kxYr6i02xaowmaCp+H+sllMxgmpxdvv0rTXrrZ7sWYqSbZSN0H40nw3an10llGT/qrWHXtJyqnhs6Vt+0VYbc0e/YhjPYsRxL3hzaMNqDfluOgnind3v2sKrvYI1+x7EUhh3t0G4NOtvyIRRKVi/32b12q1IWL8/pWZZ12qvqucFq2Xb3WGHIHbbjOH17jUJjRjGSvPTR2G6fGtUomgd1jXarr9B3E2B0LasCSwss1tAyHXPNwhHAkpPni8Xu9nsDOYjv6m/1bbsygbxQ/q5tOi2FYcd66pwOTtYkhIHQk6tC8vK1O92TrhxF8qBhR8xghYXs3jTsWUanwkIKLEPLtrLClnei2GR35jjZ/HJ3+04/MPU0lc1io8nmbO89miUvVphxvVtp3+dXD8C18NUvhbAuHirCYS0MVLHAeniohIf74L2q36uL9xThXi2Mp2Lx6uE9Jby3D55W/bQunirCaS0MVbHQeniqhKf74HnVz+viuSKc18JwFQuvh+dKeL4PnlT9pC6eKMJJLQxRsZB6eKKEJ3vgNxeC7KQgVlfC0s05gWbHeoD17FhMsO6H25ND6adGs1EzKA6TG/cm3EHiesDQW3HueC/OtEAc835PRoB5gR+m4rrgjQ6z1j4jWD4aRashriqmfDGpNm6Oj8zWkWl+aB28+WN7a3mm/aL9qv2mmVpHe6OdaefalQa1QPtb+0f7t/lz88/msHm2sT59sh3zk1Z6mh/+B1b9/uQ=</latexit>r3

<latexit sha1_base64="y2/EBqPf66V+UZmCJQzpbqka44c=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHz06Gcn4oCj26Nvbvf2jUMjv/R6Ya6LfW19nd4+efyfMyYwDlDIIQZRdG0alN8kgHEfYpTuOHGEKIAz4KHrmE+6N4kf0pijEKb6U6FNYqxzomew+thnCHK8FAWAzBcJOpwCJjDFLe1UoyIUggBFB+O5T6NVGc29VcGB6MdNssj7lVYmJh4DdOrDRYUsAUEUAD6tDUbLwK0OohgjNg+qgxmlYJScC8SgH2U9OBWNeUOzZkfn5HStT5d0isIoTWKG0/JEISDG0ERMzMsI8Zgm+c2I5z6LnnMWo4OszMeeDwCbnaHxgcipDFRxJpgAXh1yg6w5IbqDJAhAOE4cmiYORwueOAeHqRCf6i4WZpcANq46z9IkcbKeua5+JqwV8XVJfC2Lw5I4XP8IwWN9Qpg+F8+fsEgXRl1YmA9RVJ19Ucye6Bdy9GVJvJTFq5J4JYtuXFLjmjovqfOaeldS72R1URIXsrgsiUtZfFcS38ni222xP26L/UmKFf0Xi2K544zRRHxW8lcomcE0eXn+6vs06eXX+l2IkW5WjdC9Nx6P2p1eJ5VlfK+3Rl3TGtT1wtCx+qatMhSOfsc2BsMNy5HkLaANoz3st+UoiDd6t2eP6voG1uh3BpbCsKEd2a1hZ90+hELJ6hU+u9du1driFTk9y7JOenW9MFgt2+4eKQyFwx4MBn07R6ExoxhJXnpvbLdPjHoULYK6RrvVV+ibB2B0LasGS0ss1sgyB2bOwhHAkpMXL4vd7feGchDf9N/q23btAfJS+7u2OWgpDBvWk8HJ8DgnIQyEntwVUrSv3eked+UoUgSNOuIJ1ljI5pdGPcvo1FhIiWVk2VbW2OpKFIvs2rxJVp/czbrT9009TWWzWGiyOVt792bJixVm3OxW2rf51RNwI3z9TiFsioeKcNgIA1UssBkeKuHhNniv7vea4j1FuNcI46lYvGZ4TwnvbYOndT9tiqeKcNoIQ1UstBmeKuHpNnhe9/OmeK4I540wXMXCm+G5Ep5vgyd1P2mKJ4pw0ghDVCykGZ4o4ckWeIbuxJYv2ymItythtcjV0SHXIU5ATc8PFLlMcBLV5NUJJNPdQDDl/6y2IjQ7OQCsZztvgnU/XG9OKt9Nms2cQbFfXblX/AMkTiAMvRJbmzdi2wzETvJbQcq8wBek4q9zkFXbjGBxbxTVjjgNmfLZp15cHR2arUPT/KG1/+K79cHokfal9rX2TDO1jvZCe6mdahca1H7RftN+1/7Y/XX3z92/dv9eWR8+WM/5Qqtcu//8D7/8M54=</latexit>

⇡✓(a2|s2)

<latexit sha1_base64="yadDSl6u/Z0oEAShEoO9Io7EzIg=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+9P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+jyvHT</latexit>s3

How to compute ? Gradient estimation!

<latexit sha1_base64="o5GAbZzkXTS46Vj7G4bTnaIGpf4=">AAAOOXicfZdNb9s2HMaV7q3L5i3djjtMWFCgG4JAShy/HArUkm0Uw9pmQV7aRUFA0bQimBIJknLsCjru0+y6XfZNdttx2HVfYJTsyBIlWRfTfB4+/vFP0SBdin0uDOOvnUcffPjRx588/nT3s89bX3y59+SrS04iBtEFJJiwty7gCPshuhC+wOgtZQgELkZX7sxO9as5Ytwn4blYUnQTAC/0pz4EQnbd7n3rhMDF4NZxxR0SQHegh+Mfk2fr79/f7u0bh0b26NWGuW7sa+vn9PZJa8eZEBgFKBQQA86vTYOKmxgw4UOMkl0n4ogCOAMeuo7EtHcT+yGNBAphoj+V2jTCuiB6CqtPfIagwEvZAJD5MkGHd4ABKOSUdstRHIUgQPxgMvcpXzX53Fs1hJwjuokXWb2S0sDYY4De+XBRIotBwAMg7iqdfBm45U4UYcTmQbkzpZSMinOBGPR5WoNTWZg3NF0Cfk5O1/rdkt6hkCdxxHBSHCgFxBiayoFZkyMR0TibjFz3GX8uWIQO0mbW93wI2OwMTQ5kTqmjjDPFBIhylxukxQnRPSRBAMJJ7NAkdgRaiNg5OEyk+FSXbwucuQSwSdl5lsSxk9bMdfUzaS2Jrwvia1UcFcTR+kcInuhTwvS5XH/CuC6NurQwHyJeHn2Rj57qF2r0ZUG8VMWrgnilim5UUKOKOi+o84p6X1DvVXVREBequCyIS1V8XxDfq+LbbbHvtsX+osTK+stNsdx1Jmgq/1ayVyiewSR+ef7qpyTuZ8/6XYiQbpaN0H0wHo873X43UWX8oLfHPdMaVvXc0LUGpl1nyB2Drm0MRxuWI8WbQxtGZzToqFEQb/Re3x5X9Q2sMegOrRrDhnZst0fddfkQChWrl/vsfqddKYuX5/QtyzrpV/XcYLVtu3dUY8gd9nA4HNgZCo0YxUjx0gdjp3NiVKNoHtQzOu1Bjb5ZAKNnWRVYWmCxxpY5NDMWgQBWnCJ/WezeoD9Sg8Sm/tbAtisLKArl79nmsF1j2LCeDE9GxxkJYSD01KqQvHydbu+4p0aRPGjclStYYSGbXxr3LaNbYSEFlrFlW2lhyztRbrJr8yZe/eVu9p2+b+pJoprlRlPN6d57MCteXGPGze5a+zZ//QDcCF+dKYRN8bAmHDbCwDoW2AwPa+HhNniv6vea4r2acK8Rxqtj8ZrhvVp4bxs8rfppUzytCaeNMLSOhTbD01p4ug1eVP2iKV7UhItGGFHHIprhRS282AZPqn7SFE9qwkkjDKljIc3wpBaebIFn6F4e+dKTQnpDYJVIeSaXp9lMhzgGFZ0LIFAmExzziry6bqS6G0im7IvqmfgckigUWQrFseMBqSSrEwtNLxgA6+kBnWDdD9dnmNLfK02HzqA81q7cq2kOkbyoMPRKnoDeyNM1kAfOH+SEmBf4ckLy0zlIW9uMYPFglK1deWky1StStXF1dGi2D03z5/b+i876/vRY+0b7TnummVpXe6G91E61Cw1qv2q/ab9rf7T+bP3d+qf178r6aGc95mut9LT++x8FyEOq</latexit>

r✓J(✓)

The problem with finding the gradient is that the computation graph of MDPs are not differentiable. What we mean with this is that there is no differentiable path from the rewards to the model parameters \theta.

Simple REINFORCE: 1. 2.

Let’s derive algorithm!

<latexit sha1_base64="SoGZ+HdYsbbenfP+QovWVhqqXZE=">AAAOVnicfZdPb9s2GMbVbl27bEnT7diLsKBANwSBlDj+cyhQS7bRw9JmQf5tsRFQNC0LpkSCpBy7mg77NLtu32b7MsMo2ZElSrIOyWs+Dx//9Eo0SIdijwvD+PfJ0y++fPbV8xdf73zz7e7ey/1X311zEjKIriDBhN06gCPsBehKeAKjW8oQ8B2MbpyZneg3c8S4R4JLsaRo5AM38CYeBEIO3e+/HjoChPqQe75O36Yffpd/p0iAH+/3D4wjI730cmGuiwNtfZ3fv9o9GI4JDH0UCIgB53emQcUoAkx4EKN4ZxhyRAGcARfdhWLSHkVeQEOBAhjrb6Q2CbEuiJ6A6mOPISjwUhYAMk8m6HAKGIBC3s5OMYqjAPiIH47nHuWrks/dVSGA7MUoWqS9igsTI5cBOvXgokAWAZ/7QExLg3zpO8VBFGLE5n5xMKGUjIpzgRj0eNKDc9mYTzRpP78k52t9uqRTFPA4ChmO8xOlgBhDEzkxLTkSIY3Sm5HPfMbfCRaiw6RMx971AJtdoPGhzCkMFHEmmABRHHL8pDkBeoDE90EwjoY0joYCLUQ0PDyKpfhGd7A0OwSwcdF5EUfRMOmZ4+gX0loQP+bEj6rYz4n99ZcQPNYnhOlz+fwJ47o06tLCPIh4cfZVNnuiX6nR1znxWhVvcuKNKjphTg1L6jynzkvqQ059UNVFTlyo4jInLlXxc078rIq322J/3Rb7mxIr+y8XxXJnOEYT+ZOSvkLRDMbRh8uzn+Ook17rdyFEulk0QufReDJotjqtWJXxo94YtE2rV9YzQ8vqmnaVIXN0W7bR629YjhVvBm0YzX63qUZBvNHbHXtQ1jewRrfVsyoMG9qB3ei31u1DKFCsbuazO81GqS1ultOxLOu0U9Yzg9Ww7fZxhSFz2L1er2unKDRkFCPFSx+NzeapUY6iWVDbaDa6FfrmARhtyyrB0hyLNbDMnpmyCASw4hTZy2K3u52+GiQ2/be6tl16gCLX/rZt9hoVhg3rae+0f5KSEAYCV+0KydrXbLVP2moUyYIGLfkESyxk802DjmW0SiwkxzKwbCtpbHElykV2Z46i1U/uZt3pB6Yex6pZLjTVnKy9R7PixRVmXO+utG/zV0/AtfDlO4WwLh5WhMNaGFjFAuvhYSU83Abvlv1uXbxbEe7WwrhVLG49vFsJ726Dp2U/rYunFeG0FoZWsdB6eFoJT7fBi7Jf1MWLinBRCyOqWEQ9vKiEF9vgSdlP6uJJRTiphSFVLKQenlTCky3wDD3ILV+yU5BvV8RKkXJPLnezqQ5xBEo6F0CgVCY44iV5ddxIdMeXTOmHsgeEmUOWqj72OCRhINaQGEdDF0gtXu1paHIEAVhPtvAE616w3uUUfoBpMnkG5cZ35V41oofkUYahM7lH+iT330BuSX+St8xc35O3LP8PD5NqmxEsHo2y2pHHKlM9RJWL6+Mj8/TI+KVx8P5sfcB6ob3WftDeaqbW0t5rH7Rz7UqD2h/an9pf2t+7/+z+t/ds7/nK+vTJes73WuHa2/8f2ZtNqg==</latexit>

⌧ ⇠ p(⌧|✓)

SIMPLE REINFORCE

22

Sample trajectory

Gradient ascent

<latexit sha1_base64="7hE5kK6cYV0J+Rq5uwgO1533hnI=">AAAOnHicfZfbbts2HMbl7tRla5puNxsGDMKCAt2WBVLi+HARoJZko8DWNgty2qLMoGhaFkyJBEU5djVd7Sn3CHuLUbIs62jdmOb38dOPf4kCaVHs+FxR/m09+ejjTz797Onne198+Wz/+cGLr258EjCIriHBhN1ZwEfY8dA1dzhGd5Qh4FoY3VpzPdZvF4j5DvGu+IqiBxfYnjN1IOCia3zwj2nxGeJANjGacsAYeZQ3XT/LJqQ4NAGmMxCJPzYOL6OxOXF8SAKPy6YfuOOQnyvRX+HVL2pkesDCYBymAWIIJrZsUmfb9coEMLkz/9v0OeBozH8cHxwqx0pyydWGmjYOpfS6GL941jInBAYu8jjEwPfvVYXyhxAw7kCMoj0z8BEFcA5sdB/wae8hdDwacOTBSH4ptGmAZU7kuCDyxGEIcrwSDQCZIxJkOANMYIqy7RWjfOQBF/lHk4VD/XXTX9jrBhdzRw/hMnkmUWFgaDNAZw5cFshC4Pou4LNKp79yrWInCjBiC7fYGVMKxpJziRh0/LgGF6Iw72lcbP+KXKT6bEVnyPOjMGA4yg8UAmIMTcXApOkjHtAwmYx4t+b+OWcBOoqbSd+5Adj8Ek2ORE6ho4gzxQTwYpflxsXx0CMkrgu8SWjSKDQ5WvLQPDqOhPhSFm8RnFsEsEnReRmFoRnXzLLkS2EtiO9y4ruyOMyJw/QmBE/kKWHyQjx/wnxZGGVhYQ5EfnH0dTZ6Kl+Xo29y4k1ZvM2Jt2XRCnJqUFEXOXVRUR9z6mNZXebEZVlc5cRVWfyQEz+UxbtdsX/siv2zFCvqLxbFas+coKn4dCWvUDiHUfjm6u1vUdhPrvRdCJCsFo3Q2hhPR51uvxuVZbzR26OeqhlVPTN0tYGq1xkyx6CrK8Zwy3JS8mbQitIZDjrlKIi3eq+vj6r6FlYZdA2txrClHentYTctH0JeyWpnPr3faVfKYmc5fU3TzvpVPTNobV3vndQYModuGMZAT1BowChGJS/dGDudM6UaRbOgntJpD2r07QNQeppWgaU5Fm2kqYaasHAEcMnJs5dF7w36w3IQ39ZfG+h65QHyXPl7umq0awxb1jPjbHiakBAGPLtcFZKVr9PtnfbKUSQLGnXFE6ywkO2dRn1N6VZYSI5lpOlaXNjiShSL7F59CNef3O26kw9VOYrKZrHQyuZ47W3MJS+uMeNmd619l79+AG6Er84UwqZ4WBMOG2FgHQtshoe18HAXvF31203xdk243Qhj17HYzfB2Lby9C55W/bQpntaE00YYWsdCm+FpLTzdBc+rft4Uz2vCeSMMr2PhzfC8Fp7vgidVP2mKJzXhpBGG1LGQZnhSC092wDP0KLZ88U4hPmqwSuT66JDoEIegoicHikQmOPQrcnoEEbrlCqbkT9mzOdskKfHhxwZCidY7FhofMACW4w06wbLjpXuYwueVxkPnUGxr1+71NA0kDioMvRU7oPdidw3EhvMnMSFmu46YkPg1j+LWLiNYboyitScOTWr5iFRt3Jwcq2fHyu/tw9dv0+PTU+k76QfplaRKXem19Ea6kK4lKP3Xet76pvXt/vf7xv6v+6n3SSsd87VUuPZv/gfDJGYO</latexit>

✓ ✓+ ↵R�

T-1X

t=0

r✓ log⇡✓(at|st)

We will first look at the simplest method to estimate the policy gradient: The simple REINFORCE algorithm.

The simple REINFORCE algorithm starts by sampling a trajectory from the agent interacting with the environment. This happens just like how we sampled a trajectory when estimating the expectation of the return.

Then, we estimate the gradient using the simple REINFORCE estimate. This might look a bit like magic! Where does it come from? We’ll derive it in the next few slides to ensure it’s not some equation thrown out of nowhere :)

This estimate gradient is finally used in the gradient ascent step.

Expectation is over all trajectories :(

To sample, we need an expression like

Solution: The score function!

<latexit sha1_base64="V94a1rmN00BZAOsNJWV/v3Cf5kA=">AAAOmnicfZfbcqM2HMbZ7WmbdtNse7k3tJmd2XYyGUgcHy4yswbbsz3sbppJNmmDxxWyTBgLpBHCsZfyen2HvkNv2+sKjDEIMDcW+j59/vFHYiSbYjfgmvb3o8cfffzJp589+Xzviy+f7n918Ozr9wEJGUTXkGDCbm0QIOz66Jq7HKNbyhDwbIxu7LmZ6DcLxAKX+Fd8RdHYA47vzlwIuOiaHPxh+cDGYBJZNr9HHMQWdHD0U/wyu//+PDNk95YH+L1tR8N4EtHEBEL1z403vktHX8YTa+oGkIQ+H08ODrVjLb3UakPPGodKdl1Mnj391poSGHrI5xCDILjTNcrHEWDchRjFe1YYIArgHDjoLuSz7jhyfRpy5MNYfSG0WYhVTtTkadWpyxDkeCUaADJXJKjwHjAAuajJXjkqQD7wUHA0Xbg0WDeDhbNucFEDNI6WacHj0sDIYYDeu3BZIouAFySlqnQGK88ud6IQI7bwyp0JpWCUnEvEoBskNbgQhXlHk3cYXJGLTL9f0XvkB3EUMhwXBwoBMYZmYmDaDBAPaZQ+jJg48+CcsxAdJc2073wA2PwSTY9ETqmjjDPDBPByl+0lxfHRAySeB/xpZNE4sjha8sg6Oo6F+EIVswnObQLYtOy8jKMom17qpbCWxLcF8a0sDgviMPsTgqfqjDB1Id4/YYEqjKqwMBeioDz6Oh89U6/l6PcF8b0s3hTEG1m0w4IaVtRFQV1U1IeC+iCry4K4lMVVQVzJ4oeC+EEWb3fF/rYr9ncpVtRfLIrVnjVFM/FdSqdQNIdx9PrqzS9x1EuvbC6ESNXLRmhvjKejdqfXiWUZb/TWqKsbg6qeGzpGXzfrDLmj3zG1wXDLciJ5c2hNaw/7bTkK4q3e7Zmjqr6F1fqdgVFj2NKOzNawk5UPIV+yOrnP7LVblbI4eU7PMIyzXlXPDUbLNLsnNYbcYQ4Gg76ZotCQUYwkL90Y2+0zrRpF86Cu1m71a/TtC9C6hlGBpQUWY2ToAz1l4QhgycnzyWJ2+72hHMS39Tf6pll5gbxQ/q6pD1o1hi3r2eBseJqSEAZ8R64KycvX7nRPu3IUyYNGHfEGKyxk+0+jnqF1KiykwDIyTCMpbHklikV2p4+j9Sd3u+7UQ12NY9ksFppsTtbexix5cY0ZN7tr7bv89QNwI3z1SSFsioc14bARBtaxwGZ4WAsPd8E7Vb/TFO/UhDuNME4di9MM79TCO7vgadVPm+JpTThthKF1LLQZntbC013wvOrnTfG8Jpw3wvA6Ft4Mz2vh+S54UvWTpnhSE04aYUgdC2mGJ7XwZAc8Qw9iy5fsFJIzAqtEij252M2mOsQRqOgBBxylMsFRUJGz44vQbU8wpTdVDwhzh2jK+ubQkv4LxZHlAKHE6x0NTQ4gAKvJBp5g1fWzPU7p80uToXMotr1r97oMAyQOMgy9ETukd2L3DcSG9AfxwMzxXPHA4tc6Slq7jGC5MYrWnjhU6fIRqtq4OTnWW8e6/mvr8FU7O189UZ4r3ykvFV3pKK+U18qFcq1A5S/lH+Vf5b/95/v9/R/3f15bHz/KxnyjlK79q/8BGzFpMg==</latexit>

r✓J(✓) = r✓Ep(⌧|✓)[R�]

<latexit sha1_base64="tUC8aEd9SL1MjT6TyR974uZxXQ4=">AAAOP3icfZdNb9s2HMbV7q3L5q3djrsICwoMQxBIieOXQ4Faso0e1jYL8rbVQUHRtCyYEgmScuwK+gy7bt9mX2NfYMdh191GyY4sUZR1CcPn4eOf/hKFPz2KAy4s669Hjz/6+JNPP3vy+cEXX7a++vrps2+uOYkZRFeQYMJuPcARDiJ0JQKB0S1lCIQeRjfews30myViPCDRpVhTdBcCPwpmAQRCTl1NPAHi908PrWMrv8z6wN4ODo3tdf7+WetwMiUwDlEkIAacv7MtKu4SwEQAMUoPJjFHFMAF8NG7WMx6d0kQ0VigCKbmc6nNYmwKYmZA5jRgCAq8lgMAWSATTDgHDEAhsQ+qURxFIET8aLoMKN8M+dLfDASQ93yXrPKapJWFic8AnQdwVSFLQMhDIOa1Sb4OveokijFiy7A6mVFKRsW5QgwGPKvBuSzMW5qVmV+S860+X9M5iniaxAyn5YVSQIyhmVyYDzkSMU3ym5HPdsFfCBajo2yYz70YAra4QNMjmVOZqOLMMAGiOuWFWXEidA9JGIJomkxomkwEWolkcnScSvG56WFp9ghg06rzIk2SSVYzzzMvpLUivimJb1RxVBJH2x8heGrOCDOX8vkTxk1pNKWFBRDx6uqrYvXMvFKjr0vitSrelMQbVfTikhrX1GVJXdbU+5J6r6qrkrhSxXVJXKvih5L4QRVv98X+si/2VyVW1l9uivXBZIpm8tORv0LJAqbJq8vXP6VJP7+270KMTLtqhN6D8XTc6fa7qSrjB7097tnOsK4Xhq4zsF2doXAMuq41HO1YThRvAW1ZndGgo0ZBvNN7fXdc13ew1qA7dDSGHe3YbY+62/IhFClWv/C5/U67Vha/yOk7jnPWr+uFwWm7bu9EYygc7nA4HLg5Co0ZxUjx0gdjp3Nm1aNoEdSzOu2BRt89AKvnODVYWmJxxo49tHMWgQBWnKJ4WdzeoD9Sg8Su/s7AdWsPUJTK33PtYVtj2LGeDc9GpzkJYSDy1aqQonydbu+0p0aRImjclU+wxkJ2vzTuO1a3xkJKLGPHdbLCVnei3GTv7Ltk88nd7Tvz0DbTVDXLjaaas733YFa8WGPGzW6tfZ9fvwA3wtfvFMKmeKgJh40wUMcCm+GhFh7ug/frfr8p3teE+40wvo7Fb4b3tfD+Pnha99OmeKoJp40wVMdCm+GpFp7ugxd1v2iKF5pw0QgjdCyiGV5o4cU+eFL3k6Z4ogknjTBEx0Ka4YkWnuyBZ+hetnxZpyDfroTVImVPLrvZXIc4ATWdCyBQLhOc8JrsiTkSINO9UDLl/9Q9IC4ccqjq04BDEkdiC4lxMvGB1NJNT0OzIwjAZtbCE2wG0bbLqXyAabZ4AWXju3FvCjFE8ijD0GvZI72V/TeQLemP8paZHwbyluXfyVE22mcEqwejHB3IY5WtHqLqg5uTY7t9bNs/tw9fdrYnrCfGd8b3xg+GbXSNl8Yr49y4MqARGL8Zvxt/tP5s/d36p/Xvxvr40XbNt0blav33PxfKR7o=</latexit>

JOINT DISTRIBUTION OF MDP

23

<latexit sha1_base64="YvMQ+ScIH1Lk+rrLO/ZFtn1H69o=">AAAOZHicfZfbbts2HMbV7tSlS5au2NWAQVhQIB2CQEocHy4K1JJt9GJtsyCnLTYCiqZlwZRIkJRjV9Mr7Gl2u73HHmDvMUp2ZImSrBvR/D5+/vEvUSAdij0uDOPfJ08/+/yLL7969vXO8292977df/HdNSchg+gKEkzYrQM4wl6AroQnMLqlDAHfwejGmdmJfjNHjHskuBRLikY+cANv4kEgZNf9/uGQh/59NHQECGN6mN71P3R5nyIBXuuTVddr/X7/wDg20ksvN8x140BbX+f3L3YPhmMCQx8FAmLA+Z1pUDGKABMexCjeGYYcUQBnwEV3oZi0R5EX0FCgAMb6K6lNQqwLoifU+thjCAq8lA0AmScTdDgFDEAh57ZTjOIoAD7iR+O5R/mqyefuqiGALMwoWqSFiwsDI5cBOvXgokAWAZ/7QExLnXzpO8VOFGLE5n6xM6GUjIpzgRj0eFKDc1mYjzR5FvySnK/16ZJOUcDjKGQ4zg+UAmIMTeTAtMmRCGmUTka+ADP+RrAQHSXNtO9ND7DZBRofyZxCRxFnggkQxS7HT4oToAdIfB8E42hI42go0EJEw6PjWIqvdAdLs0MAGxedF3EUDZOaOY5+Ia0F8UNO/KCK/ZzYX/8JwWN9Qpg+l8+fMK5Loy4tzIOIF0dfZaMn+pUafZ0Tr1XxJifeqKIT5tSwpM5z6rykPuTUB1Vd5MSFKi5z4lIVP+XET6p4uy32t22xvyuxsv5yUSx3hmM0kd+X9BWKZjCO3l2+/yWOOum1fhdCpJtFI3QejaeDZqvTilUZP+qNQdu0emU9M7SsrmlXGTJHt2Ubvf6G5UTxZtCG0ex3m2oUxBu93bEHZX0Da3RbPavCsKEd2I1+a10+hALF6mY+u9NslMriZjkdy7LOOmU9M1gN226fVBgyh93r9bp2ikJDRjFSvPTR2GyeGeUomgW1jWajW6FvHoDRtqwSLM2xWAPL7Jkpi0AAK06RvSx2u9vpq0FiU3+ra9ulByhy5W/bZq9RYdiwnvXO+qcpCWEgcNWqkKx8zVb7tK1GkSxo0JJPsMRCNv806FhGq8RCciwDy7aSwhZXolxkd+YoWn1yN+tOPzD1OFbNcqGp5mTtPZoVL64w43p3pX2bv3oAroUvzxTCunhYEQ5rYWAVC6yHh5XwcBu8W/a7dfFuRbhbC+NWsbj18G4lvLsNnpb9tC6eVoTTWhhaxULr4WklPN0GL8p+URcvKsJFLYyoYhH18KISXmyDJ2U/qYsnFeGkFoZUsZB6eFIJT7bAM/Qgt3zJTkG+XRErRco9udzNpjrEESjpXACBUpngiJfk1cEj0R1fMqU/yh4QZg7ZVPWxxyEJA7GGxDgaukBq8WpPQ5MjCMB6soUnWPeC9S6n8AGmyeAZlBvflXtViB6SRxmG3ss90ke5/wZyS/qznDJzfU9OWd6HR0lrmxEsHo2ytSOPVaZ6iCo3bk6Ozcaxaf7aOHjbXJ+wnmk/aD9ph5qptbS32jvtXLvSoPan9pf2t/bP7n97z/de7n2/sj59sh7zUitcez/+D96TUa0=</latexit>X

p(⌧|✓)f(⌧)

<latexit sha1_base64="w3D1YAruoyiXwnUwZKWQQQScaAE=">AAAOmnicfZfbcqM2HMbZ7Wmbdt1se7k3tJmdSTuZDCSODxeZWYPt2R52N83k1IaMK2QZMxZII4RjL+X1+g59h9621xXYxiDA3Fjo+/T5pz+IkWyK3YBr2t9Pnn708Seffvbs870vvnze+Gr/xdc3AQkZRNeQYMLubBAg7PromrscozvKEPBsjG7tmZnot3PEApf4V3xJ0YMHHN+duBBw0TXa/8PygY3ByLL5FHGgWtDB0U/x4fr++3MrCL1RJG5BGK/Uy3hkjd0AktDnqjScHqZO9c/NeHW0f6Ada+mllhv6unGgrK+L0Yvn31pjAkMP+RxiEAT3ukb5QwQYdyFG8Z4VBogCOAMOug/5pPMQuT4NOfJhrL4S2iTEKidqMlt17DIEOV6KBoDMFQkqnAIGIBc12StGBcgHHgqOxnOXBqtmMHdWDS5miR6iRVrwuDAwchigUxcuCmQR8AIP8GmpM1h6drEThRixuVfsTCgFo+RcIAbdIKnBhSjMe5o8w+CKXKz16ZJOkR/EUchwnB8oBMQYmoiBaTNAPKRROhnx4syCc85CdJQ0077zPmCzSzQ+EjmFjiLOBBPAi122lxTHR4+QeB7wx5FF48jiaMEj6+g4FuIrVbwvcGYTwMZF52UcRVZSM9tWL4W1IL7Lie9kcZATB+s/IXisTghT5+L5ExaowqgKC3MhCoqjr7PRE/Vajr7JiTeyeJsTb2XRDnNqWFLnOXVeUh9z6qOsLnLiQhaXOXEpix9y4gdZvNsV+9uu2N+lWFF/sSiWe9YYTcR3KX2FohmMozdXb3+Jo256rd+FEKl60QjtjfF02Gp327Es443eHHZ0o1/WM0Pb6OlmlSFz9Nqm1h9sWU4kbwataa1BryVHQbzVO11zWNa3sFqv3TcqDFvaodkctNflQ8iXrE7mM7utZqksTpbTNQzjrFvWM4PRNM3OSYUhc5j9fr9npig0ZBQjyUs3xlbrTCtH0Syoo7WavQp9+wC0jmGUYGmOxRgael9PWTgCWHLy7GUxO73uQA7i2/obPdMsPUCeK3/H1PvNCsOW9ax/NjhNSQgDviNXhWTla7U7px05imRBw7Z4giUWsv2nYdfQ2iUWkmMZGqaRFLa4EsUiu9cfotUnd7vu1ANdjWPZLBaabE7W3sYseXGFGde7K+27/NUDcC18eaYQ1sXDinBYCwOrWGA9PKyEh7vgnbLfqYt3KsKdWhinisWph3cq4Z1d8LTsp3XxtCKc1sLQKhZaD08r4ekueF7287p4XhHOa2F4FQuvh+eV8HwXPCn7SV08qQgntTCkioXUw5NKeLIDnqFHseVLdgrJIYKVIsWeXOxmUx3iCJT0gAOOUpngKCjJq/NGotueYEpvyh4QZg7RlPXNqSb9F4ojywFCiVc7GpocQABWkw08warrr/c4hc8vTYbOoNj2rtyrMvSROMgw9FbskN6L3TcQG9IfxISZ47liwuLXOkpau4xgsTGK1p44VOnyEarcuD051pvHuv5r8+B1a32+eqa8VL5TDhVdaSuvlTfKhXKtQOUv5R/lX+W/xstGr/Fj4+eV9emT9ZhvlMLVuPofDbVn1A==</latexit>

r✓J(✓) =X

R�r✓p(⌧|✓)

Let’s first recap what we know. We want to estimate the gradient of the expected return. This is the derivative of an expectation! We can write this derivative out by expanding the expectation: This results in a sum over all trajectories. We can move the derivative inwards, however. Still, it is obviously impossible to compute this: The amount of possible trajectories is likely infinitely big, so we cannot sum over it.

Like usually when we see an expectation, we want to estimate it using monte carlo estimation. However, to be able to do monte carlo estimation, we need an expression in the form of a sum of the probability times some quantity. Note that the gradient is not in this form: It is the gradient of the probability times a quantity (R_\gamma).

How can we write the gradient in a form that allows sampling? For that, we can use the score function! Let’s show what it is.

Page 9: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

This is an expression like !

THE SCORE FUNCTION

24

Multiply by 1

<latexit sha1_base64="YvMQ+ScIH1Lk+rrLO/ZFtn1H69o=">AAAOZHicfZfbbts2HMbV7tSlS5au2NWAQVhQIB2CQEocHy4K1JJt9GJtsyCnLTYCiqZlwZRIkJRjV9Mr7Gl2u73HHmDvMUp2ZImSrBvR/D5+/vEvUSAdij0uDOPfJ08/+/yLL7969vXO8292977df/HdNSchg+gKEkzYrQM4wl6AroQnMLqlDAHfwejGmdmJfjNHjHskuBRLikY+cANv4kEgZNf9/uGQh/59NHQECGN6mN71P3R5nyIBXuuTVddr/X7/wDg20ksvN8x140BbX+f3L3YPhmMCQx8FAmLA+Z1pUDGKABMexCjeGYYcUQBnwEV3oZi0R5EX0FCgAMb6K6lNQqwLoifU+thjCAq8lA0AmScTdDgFDEAh57ZTjOIoAD7iR+O5R/mqyefuqiGALMwoWqSFiwsDI5cBOvXgokAWAZ/7QExLnXzpO8VOFGLE5n6xM6GUjIpzgRj0eFKDc1mYjzR5FvySnK/16ZJOUcDjKGQ4zg+UAmIMTeTAtMmRCGmUTka+ADP+RrAQHSXNtO9ND7DZBRofyZxCRxFnggkQxS7HT4oToAdIfB8E42hI42go0EJEw6PjWIqvdAdLs0MAGxedF3EUDZOaOY5+Ia0F8UNO/KCK/ZzYX/8JwWN9Qpg+l8+fMK5Loy4tzIOIF0dfZaMn+pUafZ0Tr1XxJifeqKIT5tSwpM5z6rykPuTUB1Vd5MSFKi5z4lIVP+XET6p4uy32t22xvyuxsv5yUSx3hmM0kd+X9BWKZjCO3l2+/yWOOum1fhdCpJtFI3QejaeDZqvTilUZP+qNQdu0emU9M7SsrmlXGTJHt2Ubvf6G5UTxZtCG0ex3m2oUxBu93bEHZX0Da3RbPavCsKEd2I1+a10+hALF6mY+u9NslMriZjkdy7LOOmU9M1gN226fVBgyh93r9bp2ikJDRjFSvPTR2GyeGeUomgW1jWajW6FvHoDRtqwSLM2xWAPL7Jkpi0AAK06RvSx2u9vpq0FiU3+ra9ulByhy5W/bZq9RYdiwnvXO+qcpCWEgcNWqkKx8zVb7tK1GkSxo0JJPsMRCNv806FhGq8RCciwDy7aSwhZXolxkd+YoWn1yN+tOPzD1OFbNcqGp5mTtPZoVL64w43p3pX2bv3oAroUvzxTCunhYEQ5rYWAVC6yHh5XwcBu8W/a7dfFuRbhbC+NWsbj18G4lvLsNnpb9tC6eVoTTWhhaxULr4WklPN0GL8p+URcvKsJFLYyoYhH18KISXmyDJ2U/qYsnFeGkFoZUsZB6eFIJT7bAM/Qgt3zJTkG+XRErRco9udzNpjrEESjpXACBUpngiJfk1cEj0R1fMqU/yh4QZg7ZVPWxxyEJA7GGxDgaukBq8WpPQ5MjCMB6soUnWPeC9S6n8AGmyeAZlBvflXtViB6SRxmG3ss90ke5/wZyS/qznDJzfU9OWd6HR0lrmxEsHo2ytSOPVaZ6iCo3bk6Ozcaxaf7aOHjbXJ+wnmk/aD9ph5qptbS32jvtXLvSoPan9pf2t/bP7n97z/de7n2/sj59sh7zUitcez/+D96TUa0=</latexit>X

p(⌧|✓)f(⌧)

<latexit sha1_base64="vJo4jzbBWAJMjcNmkrvqihZCp28=">AAAQMXicrZdNb9s2HMad7K3L1qXdjrtoCzp0QxBIreOXQ4Baso1iWNssaJp2kWFQNK0IpkSCohy7mk67bl9mn2a7DbvuvvMo2ZYlUjI2YLqE4vPw0U9/kQ7pUOyFXNd/39t/59333v/gzocHH31895PDe/c/fRWSiEF0CQkm7LUDQoS9AF1yj2P0mjIEfAejK2dmpfrVHLHQI8FLvqRo5AM38KYeBFx0je/v/W0HwMFgbDv8BnGg2dDF8bfJw/X911+d2WHkj2NxD6JkJV8kY3vihZBEAdek8fRh5tR+3ARotn3wP4RMGYDxuj/vTtSef/W4uvxqjP/4uIpwBUB+YRsTdztQ24xchfuA3zhOPEjG6vOvlewdyfmw0cH43pF+omeXpjaMdeOosb7Ox/fvfmFPCIx8FHCIQRheGzrloxgw7kGMkgM7ChEFcAZcdB3xaWcUewGNOApgoj0Q2jTCGidaOgm1iccQ5HgpGgAyTyRo8AaIL8DFVD0oR4UoAD4Kjydzj4arZjh3Vw0uXhWN4kW2DpLSwNhlgN54cFEii4EfpuVUOsOl75Q7UYQRm/vlzpRSMErOBWLQC9ManIvCvKDp0gpfkvO1frOkNygIkzhiOCkOFAJiDE3FwKwZIh7ROHsZsZ5n4RlnETpOm1nfWR+w2QWaHIucUkcZZ4oJ4OUux0+LE6BbSHwfBJPYpklsc7TgsX18kgjxgSYmDZw5BLBJ2XmRxPF6CmoXwloSnxfE57I4KIiD9UMInmhTwrS5+P6EhZowasLCPIjC8ujLfPRUu5SjXxXEV7J4VRCvZNGJCmqkqPOCOlfU24J6K6uLgriQxWVBXMri24L4VhZf74p9syv2BylW1F8siuWBPUFT8e8im0LxDCbx05fPvkvibnat50KENKNshM7G+HjYanfbiSzjjd4cdgyzr+q5oW32DKvKkDt6bUvvD7YsjyRvDq3rrUGvJUdBvNU7XWuo6ltYvdfumxWGLe3Qag7a6/IhFEhWN/dZ3VZTKYub53RN0zztqnpuMJuW1XlUYcgdVr/f71kZCo0YxUjy0o2x1TrV1SiaB3X0VrNXoW8/gN4xTQWWFljMoWn0jYyFI4AlJ88ni9XpdQdyEN/W3+xZlvIBeaH8HcvoNysMW9bT/ungcUZCGAhcuSokL1+r3XnckaNIHjRsiy+osJDtk4ZdU28rLKTAMjQtMy1seSWKRXZtjOLVT+523WlHhpYkslksNNmcrr2NWfLiCjOud1fad/mrB+BaePVNIayLhxXhsBYGVrHAenhYCQ93wbuq362LdyvC3VoYt4rFrYd3K+HdXfBU9dO6eFoRTmthaBULrYenlfB0FzxX/bwunleE81oYXsXC6+F5JTzfBU9UP6mLJxXhpBaGVLGQenhSCU92wDN0K7Z86U4hPUgwJVLsycVuNtMhjoGihxxwlMkEx6Eirw4cqe74gim7UT0gyh2iKeubk032FIpj2wVCSVY7GpoeQADW0g08wZoXrPc4pZ9fmg6difPd2r0qQx+JgwxDz8QO6YXYfQOxIf1GvDBzfU+8sPhrH6etXUaw2BhFKz1UGfIRSm1cPToxmieG8X3z6Elrfb660/i88WXjYcNotBtPGk8b543LBtwf7f+0//P+L4e/Hv52+Mfhnyvr/t56zGeN0nX41z+F5QD7</latexit>

r✓J(✓) =X

R�r✓p(⌧|✓)

=X

R�r✓p(⌧|✓)p(⌧|✓)

p(⌧|✓)

=X

R�p(⌧|✓)r✓p(⌧|✓)

p(⌧|✓)

=X

p(⌧|✓)R�r✓ log p(⌧|✓)

= Ep(⌧|✓)[R�r✓ log p(⌧|✓)]

Again, we remind ourselves that the goal is to put the gradient of the expectation in a form so that we can sample from it. We will be using the score function trick to do this.

First, we multiply the expression by 1: p(\tau|\theta)/p(\tau|\theta). Next, we simply move the probability out of the numerator and the gradient of the probability into the numerator.

We know have an expression that we can sample from! There is a probability term, and a function (R times the fraction). This fraction is called the score function, and has a very useful representation that we will look at next.

THE SCORE FUNCTION

25

<latexit sha1_base64="bk1BZ5DCGxXqRdl1KaxQbkuQuWM=">AAAPWnichZfdbts2HMXt7qtN16bddjcMEBa06IYgkFrHHxcFask2erG2WdE06aogoGhaEUyJBEU5djXd7Wl2u73MsJcZJTuyRFKebkzzHB79+BclkB7FQcxN85/2rc8+/+LLr27f2bv79b37+w8efvM+JgmD6BQSTNi5B2KEgwid8oBjdE4ZAqGH0Zk3d3L9bIFYHJDoHV9RdBECPwpmAQRcdF0+bP/w2I2Ah8Gl6/ErxIHhYuIb9In4C5LfN50/ue7e88fujAGYuhQwHgCsN2ZbXZGk8YpuVLLXfZlR3NhYj7SyVB0j0cuGWsD/eDXxlw8OzCOzuAy1YW0aB63NdXL58N6BOyUwCVHEIQZx/NEyKb9I83lBjLI9N4kRBXAOfPQx4bP+RRpENOEogpnxSGizBBucGPmzMqYBQ5DjlWgAyAKRYMArIGbCxRPdq0fFKAIhig+ni4DG62a88NcNLqaNLtJlsVyy2sDUZ4BeBXBZI0tBGIeAXymd8Sr06p0owYgtwnpnTikYJecSMRjEeQ1ORGHe0HwFxu/IyUa/WtErFMVZmjCcVQcKATGGZmJg0YwRT2haTEYs+3n8nLMEHebNou/5CLD5WzQ9FDm1jjrODBPA611emBcnQteQhCGIpmI9ivXM0ZKn7uFRJsRHhlhAcO4RwKZ159ssTd28Zp5nvBXWmvi6Ir6WxXFFHG9uQvDUmBFmLMTzJyw2hNEQFhZAFNdHn5ajZ8apHP2+Ir6XxbOKeCaLXlJRE0VdVNSFol5X1GtZXVbEpSyuKuJKFj9VxE+yeL4r9sOu2N+kWFF/8VKs9twpmomvarGE0jnM0pfvXv2SpYPi2qyFBBlW3Qi9G+OzSbc36GWyjG/0zqRv2SNVLw09e2g5OkPpGPYcczTesjyVvCW0aXbHw64cBfFW7w+ciapvYc1hb2RrDFvaidMZ9zblQyiSrH7pcwbdjlIWv8wZ2LZ9PFD10mB3HKf/VGMoHc5oNBo6BQpNGMVI8tIbY7d7bKpRtAzqm93OUKNvH4DZt20FllZY7IltjayChSOAJScvF4vTHw7GchDf1t8eOo7yAHml/H3HGnU0hi3r8eh4/KwgIQxEvlwVUpav2+s/68tRpAya9MQTVFjI9k6TgW32FBZSYZnYjp0Xtv4mipfso3WRrj+52/fOOLCMLJPN4kWTzfm7d2OWvFhjxs1urX2XXz8AN8KrM4WwKR5qwmEjDNSxwGZ4qIWHu+B91e83xfuacL8Rxtex+M3wvhbe3wVPVT9tiqeacNoIQ3UstBmeauHpLniu+nlTPNeE80YYrmPhzfBcC893wRPVT5riiSacNMIQHQtphidaeLIDnqFrseXLdwpidaVMiRR7crGbLXSIU6DoMQccFTLBaazIm+OO0L1QMBV/VA9ISodoyvo0iCFJIr6BxDh1fSC0bL2nuTla5Vt4go0g2uxyah9gmg+ebw9p60KMkDjKMPRK7JHeiP03EFvSn8WUmR8GYsri1z3MW7uMYHljFK09cayy5EOU2jh7emR1jizr187Bi+7mhHW79X3rx9aTltXqtV60XrZOWqct2P6j/Wf7r/bf9//db+/f2b+7tt5qb8Z826pd+9/9B4dLrNQ=</latexit>

r✓ log p(⌧|✓)

=@ log p(⌧|✓)

@p(⌧|✓)

@p(⌧|✓)

@✓

=1

p(⌧|✓)r✓p(⌧|✓)

=r✓p(⌧|✓)

p(⌧|✓)?

Score function:<latexit sha1_base64="vJo4jzbBWAJMjcNmkrvqihZCp28=">AAAQMXicrZdNb9s2HMad7K3L1qXdjrtoCzp0QxBIreOXQ4Baso1iWNssaJp2kWFQNK0IpkSCohy7mk67bl9mn2a7DbvuvvMo2ZYlUjI2YLqE4vPw0U9/kQ7pUOyFXNd/39t/59333v/gzocHH31895PDe/c/fRWSiEF0CQkm7LUDQoS9AF1yj2P0mjIEfAejK2dmpfrVHLHQI8FLvqRo5AM38KYeBFx0je/v/W0HwMFgbDv8BnGg2dDF8bfJw/X911+d2WHkj2NxD6JkJV8kY3vihZBEAdek8fRh5tR+3ARotn3wP4RMGYDxuj/vTtSef/W4uvxqjP/4uIpwBUB+YRsTdztQ24xchfuA3zhOPEjG6vOvlewdyfmw0cH43pF+omeXpjaMdeOosb7Ox/fvfmFPCIx8FHCIQRheGzrloxgw7kGMkgM7ChEFcAZcdB3xaWcUewGNOApgoj0Q2jTCGidaOgm1iccQ5HgpGgAyTyRo8AaIL8DFVD0oR4UoAD4Kjydzj4arZjh3Vw0uXhWN4kW2DpLSwNhlgN54cFEii4EfpuVUOsOl75Q7UYQRm/vlzpRSMErOBWLQC9ManIvCvKDp0gpfkvO1frOkNygIkzhiOCkOFAJiDE3FwKwZIh7ROHsZsZ5n4RlnETpOm1nfWR+w2QWaHIucUkcZZ4oJ4OUux0+LE6BbSHwfBJPYpklsc7TgsX18kgjxgSYmDZw5BLBJ2XmRxPF6CmoXwloSnxfE57I4KIiD9UMInmhTwrS5+P6EhZowasLCPIjC8ujLfPRUu5SjXxXEV7J4VRCvZNGJCmqkqPOCOlfU24J6K6uLgriQxWVBXMri24L4VhZf74p9syv2BylW1F8siuWBPUFT8e8im0LxDCbx05fPvkvibnat50KENKNshM7G+HjYanfbiSzjjd4cdgyzr+q5oW32DKvKkDt6bUvvD7YsjyRvDq3rrUGvJUdBvNU7XWuo6ltYvdfumxWGLe3Qag7a6/IhFEhWN/dZ3VZTKYub53RN0zztqnpuMJuW1XlUYcgdVr/f71kZCo0YxUjy0o2x1TrV1SiaB3X0VrNXoW8/gN4xTQWWFljMoWn0jYyFI4AlJ88ni9XpdQdyEN/W3+xZlvIBeaH8HcvoNysMW9bT/ungcUZCGAhcuSokL1+r3XnckaNIHjRsiy+osJDtk4ZdU28rLKTAMjQtMy1seSWKRXZtjOLVT+523WlHhpYkslksNNmcrr2NWfLiCjOud1fad/mrB+BaePVNIayLhxXhsBYGVrHAenhYCQ93wbuq362LdyvC3VoYt4rFrYd3K+HdXfBU9dO6eFoRTmthaBULrYenlfB0FzxX/bwunleE81oYXsXC6+F5JTzfBU9UP6mLJxXhpBaGVLGQenhSCU92wDN0K7Z86U4hPUgwJVLsycVuNtMhjoGihxxwlMkEx6Eirw4cqe74gim7UT0gyh2iKeubk032FIpj2wVCSVY7GpoeQADW0g08wZoXrPc4pZ9fmg6difPd2r0qQx+JgwxDz8QO6YXYfQOxIf1GvDBzfU+8sPhrH6etXUaw2BhFKz1UGfIRSm1cPToxmieG8X3z6Elrfb660/i88WXjYcNotBtPGk8b543LBtwf7f+0//P+L4e/Hv52+Mfhnyvr/t56zGeN0nX41z+F5QD7</latexit>

r✓J(✓) =X

R�r✓p(⌧|✓)

=X

R�r✓p(⌧|✓)p(⌧|✓)

p(⌧|✓)

=X

R�p(⌧|✓)r✓p(⌧|✓)

p(⌧|✓)

=X

p(⌧|✓)R�r✓ log p(⌧|✓)

= Ep(⌧|✓)[R�r✓ log p(⌧|✓)]

As it happens, we can rewrite the fraction, the score function, into the derivative of the log probability! I’m sure you’ve seen this quantity a lot before: Losses like the cross entropy minimize log-probabilities by taking their derivatives.

We can clearly see now that this is an expectation: A sum over a probability times a function. So let’s rewrite it as such!

So how did we find that that fraction is equal to the derivative of the log-probability? It is easier to show this the other way around. Note that, by the chain rule, this derivative is equal to the 2nd line on the right. Note that the derivative of log x is 1/x, so we find on the left side 1/p(\tau|\theta). The right side is just the derivative of the probability wrt the parameters. This recovers the fraction!

How to compute ? MDP distribution over trajectories:

<latexit sha1_base64="8O3qtABy0P3Fkil1AsZUKWM4Rz4=">AAAOr3icfZftb6M2HMfJ7emW22W97eXesFUn3aaqgjbNw4tKF0ii07S766o+baXrjOMQFIMtY9LkGH/otH9mhiQEDIQ3cfz9+suHnzGybYrdgGvav41nn33+xZdfPf+6+eKbl61vD159dxOQkEF0DQkm7M4GAcKuj665yzG6owwBz8bo1p6biX67QCxwiX/FVxQ9eMDx3akLARddjweh5QMbg0fL5jPEgWpBB0e/xm82/38+tzzAZ7YdjeLHiCbdIPxnK8b3qf0yfrQmbgBJ6HM5DhNHlYc9NB8PDrVjLb3UckPfNA6VzXXx+Orlj9aEwNBDPocYBMG9rlH+EAHGXYhR3LTCAFEA58BB9yGf9h4i16chRz6M1ddCm4ZY5URNSqBOXIYgxyvRAJC5IkGFM8AA5KJQzWJUgHzgoeBosnBpsG4GC2fd4OJR0UO0TGchLgyMHAbozIXLAlkEvCApZ6kzWHl2sROFGLGFV+xMKAWj5FwiBt0gqcGFKMxHmkxscEUuNvpsRWfID+IoZDjODxQCYgxNxcC0GSAe0ih9GPE2zYNzzkJ0lDTTvvMhYPNLNDkSOYWOIs4UE8CLXbaXFMdHT5B4HvAnkUXjyOJoySPr6DgW4mtVvDRwbhPAJkXnZRxFm1dQvRTWgvghJ36QxVFOHG1uQvBEnRKmLsT8ExaowqgKC3MhCoqjr7PRU/Vajr7JiTeyeJsTb2XRDnNqWFIXOXVRUp9y6pOsLnPiUhZXOXEli59y4idZvNsX+8e+2D+lWFF/sShWTWuCpuJjlb5C0RzG0bur97/FUT+9Nu9CiFS9aIT21ng67nT73ViW8VZvj3u6MSzrmaFrDHSzypA5Bl1TG452LCeSN4PWtM5o0JGjIN7pvb45Lus7WG3QHRoVhh3t2GyPupvyIeRLVifzmf1Ou1QWJ8vpG4Zx1i/rmcFom2bvpMKQOczhcDgwUxQaMoqR5KVbY6dzppWjaBbU0zrtQYW+mwCtZxglWJpjMcaGPtRTFo4Alpw8e1nM3qA/koP4rv7GwDRLE8hz5e+Z+rBdYdixng3PRqcpCWHAd+SqkKx8nW7vtCdHkSxo3BUzWGIhuzuN+4bWLbGQHMvYMI2ksMWVKBbZvf4QrT+5u3WnHupqHMtmsdBkc7L2tmbJiyvMuN5dad/nrx6Aa+HLTwphXTysCIe1MLCKBdbDw0p4uA/eKfuduninItyphXGqWJx6eKcS3tkHT8t+WhdPK8JpLQytYqH18LQSnu6D52U/r4vnFeG8FoZXsfB6eF4Jz/fBk7Kf1MWTinBSC0OqWEg9PKmEJ3vgGXoSW75kp5AcJFgpUuzJxW421SGOQEkPOOAolQmOgpK8PnAkuu0JpvRP2QPCzCGasr492aR3oTiyHCCUeL2jockBBGA12cATrLr+Zo9T+PzSZOgcim3v2r0uwxCJgwxD78UO6aPYfQOxIf1FPDBzPFc8sPi1jpLWPiNYbo2ilRyqdPkIVW7cnhzr7WNd/719+LazOV89V35QflLeKLrSVd4q75QL5VqByn+NRqPZeNHSWjetv1p/r63PGpsx3yuFq+X+D0wEbhA=</latexit>

r✓J(✓) = Ep(⌧|✓)[R�r✓ log p(⌧|✓)]<latexit sha1_base64="sacn+9sntJwdnRn9kXqZPkjKjig=">AAAOZHicfZfbbts2HMbV7tSlS5au2NWAQVtQoCuCQEocHy4K1JJt9GJtsyCnLgoCiqYVwZRIUJRjV9M77Gl2u73GHmDvMUp2ZImirJsw/D5+/vEvUiBdiv2IG8a/jx5/9vkXX3715Outp99s73y7++y7i4jEDKJzSDBhVy6IEPZDdM59jtEVZQgELkaX7tTO9MsZYpFPwjO+oOgmAF7oT3wIuOi63X3lhMDF4NZx+R3iQHcw8XQH8oS+FF0g/mMl/JJu3e7uGQdG/uj1hrlq7Gmr5+T22fZPzpjAOEAhhxhE0bVpUH6TAMZ9iFG65cQRogBOgYeuYz7p3iR+SGOOQpjqL4Q2ibHOiZ5x62OfIcjxQjQAZL5I0OEdYAByMbutalSEQhCgaH8882m0bEYzb9ngYrroJpnnpUsrAxOPAXrnw3mFLAFBFAB+V+uMFoFb7UQxRmwWVDszSsEoOeeIQT/KanAiCvOBZm8jOiMnK/1uQe9QGKVJzHBaHigExBiaiIF5M0I8pkk+GbEEptFrzmK0nzXzvtcDwKanaLwvciodVZwJJoBXu9wgK06I7iEJAhCOE4emicPRnCfO/kEqxBe6WDhw6hLAxlXnaZokTlYz19VPhbUivi+J72VxWBKHqx8heKxPCNNn4v0TFunCqAsL8yGKqqPPi9ET/VyOviiJF7J4WRIvZdGNS2pcU2cldVZT70vqvazOS+JcFhclcSGLn0riJ1m82hT7cVPs71KsqL/YFIstZ4wm4guTL6FkCtPk7dm7X9Oklz+rtRAj3awaoftgPBq1O71OKsv4QW+NuqY1qOuFoWP1TVtlKBz9jm0MhmuWQ8lbQBtGe9hvy1EQr/Vuzx7V9TWs0e8MLIVhTTuyW8POqnwIhZLVK3x2r92qlcUrcnqWZR336nphsFq23T1UGAqHPRgM+naOQmNGMZK89MHYbh8b9ShaBHWNdquv0NcvwOhaVg2WlliskWUOzJyFI4AlJy8Wi93t94ZyEF/X3+rbdu0F8lL5u7Y5aCkMa9bjwfHwKCchDISeXBVSlK/d6R515ShSBI064g3WWMj6l0Y9y+jUWEiJZWTZVlbY6k4Um+zavEmWn9z1vtP3TD1NZbPYaLI523sPZsmLFWbc7FbaN/nVA3AjfH2mEDbFQ0U4bISBKhbYDA+V8HATvFf3e03xniLca4TxVCxeM7ynhPc2wdO6nzbFU0U4bYShKhbaDE+V8HQTPK/7eVM8V4TzRhiuYuHN8FwJzzfBk7qfNMUTRThphCEqFtIMT5TwZAM8Q/fiyJedFMTqSlgtUpzJxWk21yFOQE2POOAolwlOopq8vHBkuhsIpvyfugfEhUM0ZX3sR5DEIc9/heLE8YBQ0uWJhmYXEID17ABPsO6HqzNO5fNLs6FTKI69S/eyDAMkLjIMvRMnpA/i9A3EgfSVmDDzAl9MWPx19rPWJiOYPxhFK7tUmfIVqt64PDwwWwem+Vtr7017db96ov2g/ay91Eyto73R3mon2rkGtT+1v7S/tX+2/9t5uvN85/ul9fGj1ZjnWuXZ+fF/iK5SNw==</latexit>

r✓ log p(⌧|✓)

WORKING OUT MDP

26

<latexit sha1_base64="80Zb8J6SEWfrL4Q7Rb72xv70Zg8=">AAAOw3icfZfbbts2HMaV7tRla5aul9sAYUGBZPMCKXF8uAhQS7bRi7XNgpy6KDMompY1UyJBUY5dRdd7xj3E3mGULMs6Whc2ze/j55/+IgXSpNj2uKL8u/Pss8+/+PKr51/vfvPti73v9l9+f+MRn0F0DQkm7M4EHsK2i665zTG6owwBx8To1pzpkX47R8yziXvFlxQ9OMBy7YkNARddo/1/DMgDemiYHPhP4nOKODgKz0WPxwFHI+XIoIyMRwE/V8K/gqvf1NCg9ihxHhoAxjn8KfHzo3RowH9Vw4ZsMPQI2Hj1U36SUzXS1sOPRvsHyrESX3K5oSaNAym5LkYvXxwYYwJ9B7kcYuB596pC+UMAGLchRuGu4XuIAjgDFrr3+aTzENgu9TlyYSi/FtrExzInclQUeWwzBDleigaAzBYJMpwCJuhE6XbzUR5ygYO8xnhuU2/V9ObWqsGBqPtDsIifS5gbGFgM0KkNFzmyADieA/i01OktHTPfiXyM2NzJd0aUgrHgXCAGbS+qwYUozAca1di7IheJPl3SKXK9MPAZDrMDhYAYQxMxMG56iPs0iG9GzK+Zd86ZjxpRM+477wM2u0TjhsjJdeRxJpgAnu8ynag4LnqExHGAOw4MGgYGRwseGI3jUIivZRMLs0nE1Mk7L8MgMKKamaZ8Kaw58X1GfF8UBxlxkPwJwWN5Qpg8F8+fME8WRllYmA2Rlx99nY6eyNfF6JuMeFMUbzPibVE0/Yzql9R5Rp2X1MeM+lhUFxlxURSXGXFZFD9lxE9F8W5b7MdtsX8WYkX9xaJY7hpjNBGvr3gKBTMYBm+v3v0eBt34SuaCj2Q1b4Tm2ng6bLW77bAo47XeHHZUrV/WU0Nb66l6lSF19Nq60h9sWE4K3hRaUVqDXqsYBfFG73T1YVnfwCq9dl+rMGxoh3pz0E7Kh5BbsFqpT++2mqWyWGlOV9O0s25ZTw1aU9c7JxWG1KH3+/2eHqNQn1GMCl66NrZaZ0o5iqZBHaXV7FXomwegdDStBEszLNpQU/tqzMIRwAUnTyeL3ul1B8Ugvqm/1tP10gPkmfJ3dLXfrDBsWM/6Z4PTmIQw4FrFqpC0fK1257RTjCJp0LAtnmCJhWz+adjVlHaJhWRYhpquRYXNr0SxyO7Vh2D1yt2sO/lAlcOwaBYLrWiO1t7aXPDiCjOud1fat/mrB+Ba+PKdQlgXDyvCYS0MrGKB9fCwEh5ug7fKfqsu3qoIt2phrCoWqx7eqoS3tsHTsp/WxdOKcFoLQ6tYaD08rYSn2+B52c/r4nlFOK+F4VUsvB6eV8LzbfCk7Cd18aQinNTCkCoWUg9PKuHJFvjVaSHaKYjZFbBS5OrEEOsQB6Ckx6eLWCY48Ery6twS6aYjmOIfZQ/wU4doFvWx7UHiuzyBxDgwLCC0cLWnodERBGA52sITLNtussvJvYBpNHgGxcZ35V4Voo/EUYahd2KP9EHsv4HYkv4ibplZji1uWXwbjai1zQgWa6No7YpjlVo8RJUbtyfHavNYVf9oHrxpJSes59IP0s/SoaRKbemN9Fa6kK4lKP2382rnx52f9vp7f++xPb6yPttJxrySctde+D/2bXWG</latexit>

p(⌧|✓) = p(s0)T-1Y

t=0

⇡✓(at|st)p(st+1, rt+1|st,at)<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

So we found a way to sample trajectories, and we can use the discounted reward times the derivative of the log-probability of the trajectory: This is an unbiased estimate of the derivative of the expected return.

Still, we haven’t quite figured out how to compute the derivative of the log-probability over trajectories yet.

Recall that we defined the probability of a trajectory using MDPs. This was the long equation some slides back. Now with more knowledge of the structure of how the RL loop works, see if you can follow this equation: It follows exactly the generative process denoted in the graphical representation.

Page 10: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

WORKING OUT MDP

27

<latexit sha1_base64="uYjYt/uuu2HEd6uZCYSBVQ9UdEQ=">AAAPX3iclZfbbts2HMbtbuu6bG2a7WrYDdGgQ7J4gZQ4PlwEqCXb6MXaZkFOW5QFFE0rQiiRoCjHrqIH2NPsdnuU3e1RRsm2rKOH6SKh+X38/NOfpEGajNieUJS/608++fSzp58/+2Ljy6+ev9h8ufX1hUd9jvA5ooTyKxN6mNguPhe2IPiKcQwdk+BL816P9MsJ5p5N3TMxY/jGgZZrj20Ehey63aq/YjuGKaD/KP/eYQF3vz+WPZ6AAt8quwbjdHQbiGMl/C04+1ENDWbfLpw7BkRxiHhc+MVuMjQQe2rYAAbHD5CP5h/BI0jUSFsO3wWGsWEQaoEiC1j2L4nAngzxnQxTZPkPsL1szv/Cu325rewr8QOKDXXR2K4tnpPbrefbxogi38GuQAR63rWqMHETQC5sRHC4YfgeZhDdQwtf+2LcuQlsl/kCuygEr6U29gkQFEQTBkY2x0iQmWxAxG2ZANAd5JJOTutGNsrDLnSw1xhNbObNm97EmjcElGviJpjGaybMDAwsDtmdjaYZsgA6ngPFXaHTmzlmthP7BPOJk+2MKCVjzjnFHNleVIMTWZgPLKqxd0ZPFvrdjN1h1wsDn5MwPVAKmHM8lgPjpoeFz4L4ZeTav/eOBfdxI2rGfcd9yO9P8aghczIdWZwxoVBku0wnKo6LHxB1HOiOAoOFgSHwVARGYz+U4mtgEmk2qVw6WedpGARGVDPTBKfSmhHfp8T3eXGQEgeLL6FkBMaUg4mcf8o9II1AWriNsJcdfZ6MHoPzfPRFSrzIi5cp8TIvmn5K9QvqJKVOCupDSn3Iq9OUOM2Ls5Q4y4sfU+LHvHi1LvaXdbG/5mJl/eWmmG0YIzyWP63xEgruURi8PXv3Uxh042exFnwM1KwRmUvj4bDV7rbDvEyWenPYUbV+UU8Mba2n6mWGxNFr60p/sGI5yHkTaEVpDXqtfBQiK73T1YdFfQWr9Np9rcSwoh3qzUF7UT6M3ZzVSnx6t9UslMVKcrqaph11i3pi0Jq63jkoMSQOvd/v9/QYhfmcEZzzsqWx1TpSilEsCeoorWavRF9NgNLRtAIsS7FoQ03tqzGLwJDknCJZLHqn1x3kg8Sq/lpP1wsTKFLl7+hqv1liWLEe9Y8GhzEJ5dC18lWhSfla7c5hJx9Fk6BhW85ggYWuvmnY1ZR2gYWmWIaarkWFze5Eucmu1Ztg/pO72ndgWwVhmDfLjZY3R3tvac55SYmZVLtL7ev85QNIJXzxTRGqikcl4agSBpWxoGp4VAqP1sFbRb9VFW+VhFuVMFYZi1UNb5XCW+vgWdHPquJZSTirhGFlLKwanpXCs3XwougXVfGiJFxUwogyFlENL0rhxTp4WvTTqnhaEk4rYWgZC62Gp6XwdA38/LYQnRTk6gp4IXJ+Y4h1RAJY0OPbRSxTEngFeX57iXTTkUzxh6IH+olDNvP6yPYQ9V2xgCQkMCwotXB+pmHRFQQSEB3hKQG2uzjlZH6AWTT4HsmD79w9L0Qfy6sMx+/kGemDPH9DeST9Qb4ytxxbvrL8bzSi1jojnC6NsrUhr1Vq/hJVbFwe7KvNfVX9ubn9prW4YT2rfVd7VdupqbV27U3tbe2kdl5D9d/rf9T/rP/14p/Np5vyljy3PqkvxnxTyzyb3/4Lpo2sUQ==</latexit>

p(⌧|✓) = p(s0)T-1Y

t=0

⇡✓(at|st)p(st+1, rt+1|st,at)

log p(⌧|✓) = log p(s0) +T-1X

t=0

log⇡✓(at|st) + log p(st+1, rt+1|st,at)

<latexit sha1_base64="OEhv9+wa+eTwFIVpWZ+plvbJWQI=">AAAPhHicpZdNc+M0HMaThYWlsHQLRy4aOrvT0m7HbtO8HMps7CTsgd0tnb5BXTqyorieyJZGltNkXX8lPg0XDvBZkJ3E8WuGGXyJrOfRk5//ljySyYjtCUX5q/7kk0+ffvb5sy82vvzq+debL7a+ufSozxG+QJRQfm1CDxPbxRfCFgRfM46hYxJ8ZY71SL+aYO7Z1D0XM4ZvHWi59shGUMiuu636T4YLTQLvDFPcYwGBQagF2I68hf7jonP31QmosHkCCnyn7II9YHi+cxeIEyX8PTh/rYbZEfEAg9nL+x0DohhBPC5CxC4wjI1XeyXjkj8KxJ4a7gOD4wfIh/Nb8AgSNdKWuXHcf326/81+92JbOVDiCxQb6qKxXVtcp3dbz7eNIUW+g12BCPS8G1Vh4jaAXNiI4HDD8D3MIBpDC9/4YtS+DWyX+QK7KAQvpTbyCRAURG8VDG2OkSAz2YCI2zIBoHvIJaZ89xvZKA+70MHe/nBiM2/e9CbWvCHkY+PbYBpPrDAzMLA4ZPc2mmbIAuh4DhT3hU5v5pjZTuwTzCdOtjOilIw55xRzZHtRDU5lYT6wqNjeOT1d6Pczdo9dLwx8TsL0QClgzvFIDoybHhY+C+KHkQtk7J0I7uP9qBn3nfQgH5/h4b7MyXRkcUaEQpHtMp2oOC5+QNRxoDsMDBYGhsBTERj7B6EUXwI5gdDYpHKeZp1nYRAYUc1ME5xJa0Z8nxLf58V+Suwv/oSSIRhRDiby/VPuAWkE0sJthL3s6Itk9Ahc5KMvU+JlXrxKiVd50fRTql9QJyl1UlAfUupDXp2mxGlenKXEWV78mBI/5sXrdbG/rov9LRcr6y8XxWzDGOKR/P7GUygYozB4e/7u5zDoxNdiLvgYqFkjMpfGo0Gz1WmFeZks9cagrWq9op4YWlpX1csMiaPb0pVef8VymPMm0IrS7Heb+ShEVnq7ow+K+gpW6bZ6WolhRTvQG/3WonwYuzmrlfj0TrNRKIuV5HQ0TTvuFPXEoDV0vX1YYkgceq/X6+oxCvM5IzjnZUtjs3msFKNYEtRWmo1uib56AUpb0wqwLMWiDTS1p8YsAkOSc4pksujtbqefDxKr+mtdXS+8QJEqf1tXe40Sw4r1uHfcP4pJKIeula8KTcrXbLWP2vkomgQNWvINFljo6p8GHU1pFVhoimWg6VpU2OxKlIvsRr0N5p/c1boD2yoIw7xZLrS8OVp7S3POS0rMpNpdal/nLx9AKuGLT4pQVTwqCUeVMKiMBVXDo1J4tA7eKvqtqnirJNyqhLHKWKxqeKsU3loHz4p+VhXPSsJZJQwrY2HV8KwUnq2DF0W/qIoXJeGiEkaUsYhqeFEKL9bB06KfVsXTknBaCUPLWGg1PC2Fp2vg50eTaKcgZ1fAC5Hzo0OsIxLAgh4fKGKZksAryPMTSKSbjmSKb4oe6CcO2czrQ9tD1HfFApKQwLCg1ML5noZFRxBIQLSFpwTY7mKXk/kAs2jwGMmN79w9L0QPy6MMx+/kHumD3H9DuSX9QT4ytxxbPrL8Nfaj1jojnC6NsrUhj1Vq/hBVbFwdHqiNA1X9pbH9prk4YT2rfVf7vrZTU2ut2pva29pp7aKG6n/U/6z/Xf9n8+nm3ubR5vHc+qS+GPNtLXNt/vgvqya7Mw==</latexit>

r✓ log p(⌧|✓) = r✓ log p(s0) +T-1X

t=0

r✓ log⇡✓(at|st)

+r✓ log p(st+1, rt+1|st,at)

r✓ log p(⌧|✓) =T-1X

t=0

r✓ log⇡✓(at|st)Gradient of environment wrt policy parameters is 0!

<latexit sha1_base64="PJRypNmaWObVdaJOCcHo9uHG0zo=">AAAOQXicfZdNb9s2HMbV7q3L5q3djrsICwoMQxBIieOXQ4Faso0e1jYL8rbVQUHRtCyYEgmScuwK+hC7bt9m32LfYMdh111GyY4sUZR1CcPn4eOf/hKFPz2KAy4s669Hjz/6+JNPP3vy+cEXX7a++vrps2+uOYkZRFeQYMJuPcARDiJ0JQKB0S1lCIQeRjfews30myViPCDRpVhTdBcCPwpmAQRCTt1MPDFHArx/emgdW/ll1gf2dnBobK/z989ah5MpgXGIIgEx4PydbVFxlwAmAohRejCJOaIALoCP3sVi1rtLgojGAkUwNZ9LbRZjUxAzQzKnAUNQ4LUcAMgCmWDCOWAACgl+UI3iKAIh4kfTZUD5ZsiX/mYggLzru2SVVyWtLEx8Bug8gKsKWQJCHgIxr03ydehVJ1GMEVuG1cmMUjIqzhViMOBZDc5lYd7SrND8kpxv9fmazlHE0yRmOC0vlAJiDM3kwnzIkYhpkt+MfLoL/kKwGB1lw3zuxRCwxQWaHsmcykQVZ4YJENUpL8yKE6F7SMIQRNNkQtNkItBKJJOj41SKz00PS7NHAJtWnRdpkkyymnmeeSGtFfFNSXyjiqOSONr+CMFTc0aYuZTPnzBuSqMpLSyAiFdXXxWrZ+aVGn1dEq9V8aYk3qiiF5fUuKYuS+qypt6X1HtVXZXElSquS+JaFT+UxA+qeLsv9pd9sb8qsbL+clOsDyZTNJMfj/wVShYwTV5dvv4pTfr5tX0XYmTaVSP0Hoyn4063301VGT/o7XHPdoZ1vTB0nYHt6gyFY9B1reFox3KieAtoy+qMBh01CuKd3uu747q+g7UG3aGjMexox2571N2WD6FIsfqFz+132rWy+EVO33Gcs35dLwxO23V7JxpD4XCHw+HAzVFozChGipc+GDudM6seRYugntVpDzT67gFYPcepwdISizN27KGdswgEsOIUxcvi9gb9kRokdvV3Bq5be4CiVP6eaw/bGsOO9Wx4NjrNSQgDka9WhRTl63R7pz01ihRB4658gjUWsvulcd+xujUWUmIZO66TFba6E+Ume2ffJZtP7m7fmYe2maaqWW401ZztvQez4sUaM252a+37/PoFuBG+fqcQNsVDTThshIE6FtgMD7XwcB+8X/f7TfG+JtxvhPF1LH4zvK+F9/fB07qfNsVTTThthKE6FtoMT7XwdB+8qPtFU7zQhItGGKFjEc3wQgsv9sGTup80xRNNOGmEIToW0gxPtPBkDzxD97LlyzoF+XYlrBYpe3LZzeY6xAmo6VwAgXKZ4ITX5M1pI9O9UDLl/9Q9IC4ccqjq04BDEkdiC4lxMvGB1NJNT0OzIwjAZtbCE2wG0bbLqXyAabZ4AWXju3FvCjFE8ijD0GvZI72V/TeQLemP8paZHwbyluXfyVE22mcEqwejHB3IY5WtHqLqg5uTY7t9bNs/tw9fdrYnrCfGd8b3xg+GbXSNl8Yr49y4MqCxMH4zfjf+aP3Z+rv1T+vfjfXxo+2ab43K1frvf4a8SJo=</latexit>

Recall how the probability of a trajectory is defined in an MDP. First, we take the logarithm of this probability: This allows us to change multiplication into summation (because \log(ab)=\log(a) + \log(b) ). We thus have a big sum of different log-probabilities. The first and the last are probabilities representing the dynamics of the environment, and the middle one represents the policy.

Now remains one step: We are interested in the derivative of this log-probability! By linearity of the gradient, we simply put a gradient symbol in front of every log-probability. In the last line, we remove the terms associated with the environment. Why? The parameters of the policy only influence the policy (the agent), and not the environment! So, changing the value of the parameters of the policy changes nothing about the environment itself. This leaves us with a sum over gradients of log-policy probabilities.

Fill into :

<latexit sha1_base64="+BVeB0h2rbRMEP9ZjpfeD2GO/YY=">AAAOpXicfZdbb9s2HMXV7tZlq5duj3sRFrRIhyyQEseXhwC1ZBt96CULcmkXZQZF07JgSiRIyrGr6jPueR9kr9so2ZZ1tV5M8Rwe//iXKJA2xS4Xmvb3o8dffPnV1988+Xbvu++fNn7Yf/bjDScBg+gaEkzYBxtwhF0fXQtXYPSBMgQ8G6Nbe2bG+u0cMe4S/0osKbr3gOO7ExcCIbtG+67lAxuDkWWLKRJAtTBxVHoob0Hwed358sW5avHAG4XiXIv+DK9+06P8sGSURd3N/aEFYJIvPltcAIFG4uVo/0A71pJLLTf0deNAWV8Xo2dPD6wxgYGHfAEx4PxO16i4DwETLsQo2rMCjiiAM+Cgu0BMOveh69NAIB9G6nOpTQKsCqLG01bHLkNQ4KVsAMhcmaDCKWASUxZnLx/FkQ88xI/Gc5fyVZPPnVVDyGmj+3CRVD7KDQwdBujUhYscWQg87gExLXXypWfnO1GAEZt7+c6YUjIWnAvEoMvjGlzIwryncbH5FblY69MlnSKfR2HAcJQdKAXEGJrIgUmTIxHQMJmMfINm/FywAB3FzaTvvA/Y7BKNj2ROriOPM8EEiHyX7cXF8dEDJJ4H/HFo0Si0BFqI0Do6jqT4XJUvEJzZBLBx3nkZhaEV18y21UtpzYnvMuK7ojjIiIP1nxA8VieEqXP5/AnjqjSq0sJciHh+9HU6eqJeF6NvMuJNUbzNiLdF0Q4yalBS5xl1XlIfMupDUV1kxEVRXGbEZVH8lBE/FcUPu2I/7or9oxAr6y8XxXLPGqOJ/EAlr1A4g1H4+urtmyjsJtf6XQiQqueN0N4YT4etdrcdFWW80ZvDjm70y3pqaBs93awypI5e29T6gy3LScGbQmtaa9BrFaMg3uqdrjks61tYrdfuGxWGLe3QbA7a6/Ih5BesTuozu61mqSxOmtM1DOOsW9ZTg9E0zc5JhSF1mP1+v2cmKDRgFKOCl26MrdaZVo6iaVBHazV7Ffr2AWgdwyjB0gyLMTT0vp6wCARwwSnSl8Xs9LqDYpDY1t/omWbpAYpM+Tum3m9WGLasZ/2zwWlCQhjwnWJVSFq+Vrtz2ilGkTRo2JZPsMRCtv807Bpau8RCMixDwzTiwuZXolxkd/p9uPrkbtedeqCrUVQ0y4VWNMdrb2MueHGFGde7K+27/NUDcC18eaYQ1sXDinBYCwOrWGA9PKyEh7vgnbLfqYt3KsKdWhinisWph3cq4Z1d8LTsp3XxtCKc1sLQKhZaD08r4ekueFH2i7p4UREuamFEFYuohxeV8GIXPCn7SV08qQgntTCkioXUw5NKeLIDnqEHueWLdwry7QpZKXJ1dEh0iENQ0pMDRSITHPKSvDqBxLrtSabkpuwBQeqQzaI+djkkgS/WkBiHlgOkFq32NDQ+ggCsxlt4glXXX+9ych9gGg+eQbnxXblXhegjeZRh6K3cI72X+28gt6S/yikzx3PllOWvdRS3dhnBYmOUrT15rNKLh6hy4/bkWG8e6/rvzYNXrfUJ64nys/KLcqjoSlt5pbxWLpRrBSp/Kf8o/yr/NV403jSuGjcr6+NH6zE/KbmrMfof6KxsRQ==</latexit>

r✓ log p(⌧|✓) =T-1X

t=0

r✓ log⇡✓(at|st)

<latexit sha1_base64="GGl8c1l2EY7SWmYrhkOvnPwEFSM=">AAAOj3icfZfdbts2HMXV7qvL1izdLnejLSjQDUEgJY4/LjrEkm10wNpmQZykiwyDomlZMCUSFOXY1fRke5Jd7nZ7iVGyLUuUZF0kNM/h0U9/kQJpU+wGXNP+fvL0k08/+/yLZ18efPX188Nvjl58exuQkEE0hAQTdm+DAGHXR0PucozuKUPAszG6s+dmot8tEAtc4t/wFUUjDzi+O3Uh4KJrfDS0PMBnth3143FEX1k2B+Gf4u8McfBT/GBBB0fX8diauAEkoc8tH9gYjDcO1cLEUeVho4Px0bF2qqWXWm7om8axsrmuxi+e/2BNCAw95HOIQRA86Brlowgw7kKM4gMrDBAFcA4c9BDyaXsUuT4NOfJhrL4U2jTEKidq8ojqxGUIcrwSDQCZKxJUOAMMQC4KcVCMCpAPPBScTBYuDdbNYOGsG1w8KhpFy7TKcWFg5DBAZy5cFsgi4AVJOUudwcqzi50oxIgtvGJnQikYJecSMegGSQ2uRGHe0+TFBTfkaqPPVnSG/CCOQobj/EAhIMbQVAxMmwHiIY3ShxGzZR685ixEJ0kz7XvdA2x+jSYnIqfQUcSZYgJ4scv2kuL46BESzwP+JLJoHFkcLXlknZzGQnypikkD5zYBbFJ0XsdRtJmC6rWwFsR3OfGdLPZzYn9zE4In6pQwdSHeP2GBKoyqsDAXoqA4epiNnqpDOfo2J97K4l1OvJNFO8ypYUld5NRFSX3MqY+yusyJS1lc5cSVLH7MiR9l8X5f7Id9sX9IsaL+YlGsDqwJmoqPUTqFojmMozc3b3+Lo056beZCiFS9aIT21ng+aLY6rViW8VZvDNq60SvrmaFldHWzypA5ui1T6/V3LGeSN4PWtGa/25SjIN7p7Y45KOs7WK3b6hkVhh3twGz0W5vyIeRLVifzmZ1mo1QWJ8vpGIZx0SnrmcFomGb7rMKQOcxer9c1UxQaMoqR5KVbY7N5oZWjaBbU1pqNboW+ewFa2zBKsDTHYgwMvaenLBwBLDl5NlnMdrfTl4P4rv5G1zRLL5Dnyt829V6jwrBjvehd9M9TEsKA78hVIVn5mq32eVuOIlnQoCXeYImF7O406Bhaq8RCciwDwzSSwhZXolhkD/ooWn9yd+tOPdbVOJbNYqHJ5mTtbc2SF1eYcb270r7PXz0A18KXnxTCunhYEQ5rYWAVC6yHh5XwcB+8U/Y7dfFORbhTC+NUsTj18E4lvLMPnpb9tC6eVoTTWhhaxULr4WklPN0Hz8t+XhfPK8J5LQyvYuH18LwSnu+DJ2U/qYsnFeGkFoZUsZB6eFIJT/bAM/QotnzJTiE5SLBSpNiTi91sqkMcgZIecMBRKhMcBSV5feBIdNsTTOmPsgeEmUM0ZX17sknvQnFkOUAo8XpHQ5MDCMBqsoEnWHX9zR6n8PmlydA5FNvetXtdhh4SBxmG3ood0nux+wZiQ/qzeGDmeK54YPHfOkla+4xguTWKVnKo0uUjVLlxd3aqN051/ffG8WVzc756pnyv/Ki8UnSlpVwqb5QrZahA5S/lH+Vf5b/Do8Pm4S+Hl2vr0yebMd8phevw1/8BY+9j6g==</latexit>

Ep(⌧|✓)[R�r✓ log p(⌧|✓)]

BASIC REINFORCE

28

<latexit sha1_base64="tYG1LUh9ZhiVeJr5rMaGHKIgQN4=">AAAOz3icfZfdbts2HMWV7qvL1izdLnejLiiQdlkgJY4/LgLUkm0Uw9pmQdJkizyDomlFMCUSFOXYVTXsdg+wh9uz7GaUbMsSJVk3pnkOj376ixRIm2I34Jr2786jTz797PMvHn+5+9XXT/a+2X/67fuAhAyia0gwYbc2CBB2fXTNXY7RLWUIeDZGN/bUTPSbGWKBS/wrvqBo6AHHdycuBFx0jfb/sXxgYzCybH6POFAt6ODo5/hw9f/FueUBfm/bUT8eRTTpBuHHtRjfpfbLeGSN3QCS0OeqFYTeKOLnWvxHdPWTHsv5mDiqRd11x6EFYErCP1oBBxyN+Ivh7mj/QDvW0kstN/RV40BZXRejp0+eWWMCQw/5HGIQBHe6RvkwAoy7EKN41woDRAGcAgfdhXzSHkauT0OOfBirz4U2CbHKiZqUSB27DEGOF6IBIHNFggrvAROgopC7xagA+cBDwdF45tJg2QxmzrLBxZOjYTRP31JcGBg5DNB7F84LZBHwgqTcpc5g4dnFThRixGZesTOhFIySc44YdIOkBheiMO9oUu7gilys9PsFvUd+EEchw3F+oBAQY2giBqbNAPGQRunDiNk2Dc45C9FR0kz7znuATS/R+EjkFDqKOBNMAC922V5SHB89QOJ5wB9HFo0ji6M5j6yj41iIz1Uxh+DUJoCNi87LOIpWU1S9FNaC+DYnvpXFfk7sr25C8FidEKbOxPsnLFCFURUW5kIUFEdfZ6Mn6rUc/T4nvpfFm5x4I4t2mFPDkjrLqbOS+pBTH2R1nhPnsrjIiQtZ/JATP8ji7bbY37bF/i7FivqLRbHYtcZoIj5m6RSKpjCOXl+9+SWOOum1mgshUvWiEdpr4+mg2eq0YlnGa70xaOtGr6xnhpbR1c0qQ+botkyt19+wnEjeDFrTmv1uU46CeKO3O+agrG9gtW6rZ1QYNrQDs9FvrcqHkC9ZncxndpqNUlmcLKdjGMZZp6xnBqNhmu2TCkPmMHu9XtdMUWjIKEaSl66NzeaZVo6iWVBbaza6FfrmBWhtwyjB0hyLMTD0np6ycASw5OTZZDHb3U5fDuKb+htd0yy9QJ4rf9vUe40Kw4b1rHfWP01JCAO+I1eFZOVrttqnbTmKZEGDlniDJRayudOgY2itEgvJsQwM00gKW1yJYpHd6cNo+cndrDv1QFfjWDaLhSabk7W3NkteXGHG9e5K+zZ/9QBcC19+Ugjr4mFFOKyFgVUssB4eVsLDbfBO2e/UxTsV4U4tjFPF4tTDO5XwzjZ4WvbTunhaEU5rYWgVC62Hp5XwdBs8L/t5XTyvCOe1MLyKhdfD80p4vg2elP2kLp5UhJNaGFLFQurhSSU82QLP0IPY8iU7heSgwUqRy8NDqkMcgZKeHilSmeAoKMnLM0ii255gSv+UPSDMHKIp6+uTT3oXiiPLAUKJlzsamhxAAFaTDTzBquuv9jiFzy9Nhk6h2PYu3csy9JA4yDD0RuyQ3ondNxAb0pfigZnjueKBxa91lLS2GcF8bRSt5FCly0eocuPm5FhvHOv6r42DV83V+eqx8r3yg3Ko6EpLeaW8Vi6UawUq/+0823m58+Pexd5s78+9v5bWRzurMd8phWvv7/8BMM57eg==</latexit>

r✓J(✓) = Ep(⌧|✓)[R�

T-1X

t=0

r✓ log⇡✓(at|st)]<latexit sha1_base64="MUBwUlZhd5Oe24ZoK1vEHiLDUpA=">AAAOunicfZfbbts2HMaV7tRlbZZu2G52wy0okA5eICWODxgC1JJt9GJtsyCnLcoMiqYVwZTIUZRjVxWw19wL7DlGybaso3Vjmt/HTz/9KQqkxYjjC1X9d+fJJ59+9vkXT7/c/erZ872v9198c+3TgCN8hSih/NaCPiaOh6+EIwi+ZRxD1yL4xpoasX4zw9x3qHcpFgzfu9D2nImDoJBdo/1/TMgYp3NgIpuEF9HIHDs+ooEngOkH7igUZ2r0V3j5ixaZHrQIHJmWeMACApNQG5jMWXccmhAlmeKj6Qso8Ei8agDz7wCOgbTAQAY6LmCHyZ+Pq1GvRvsH6pGaXKDc0FaNA2V1nY9ePP/RHFMUuNgTiEDfv9NUJu5DyIWDCI52zcDHDKIptPFdICad+9DxWCCwhyLwUmqTgABBQVwNMHY4RoIsZAMi7sgEgB4gl08ia7abj/KxB13sN8Yzh/nLpj+zlw0hS4Pvw3kyIVFuYGhzyB4cNM+RhdD1XSgeSp3+wrXynTggmM/cfGdMKRkLzjnmyPHjGpzLwrxn8Xz4l/R8pT8s2AP2/CgMOImyA6WAOccTOTBp+lgELEweRr5YU/9M8AA34mbSd9aHfHqBxw2Zk+vI40wIhSLfZblxcTz8iKjrQm8cmiwKTYHnIjQbR5EUXwL5kqGpRSEf550XURiacc0sC1xIa058lxHfFcVBRhysbkLJGEwoBzM5/5T7QBqBtHAHYT8/+iodPQFXxejrjHhdFG8y4k1RtIKMGpTUWUadldTHjPpYVOcZcV4UFxlxURQ/ZMQPRfF2W+wf22L/LMTK+stFsdg1x3giv1vJKxROURS+uXz7WxR2k2v1LgQYaHkjstbGk2Gr3W1HRZms9eawo+n9sp4a2npPM6oMqaPXNtT+YMNyXPCm0KraGvRaxShENnqnawzL+gZW7bX7eoVhQzs0moP2qnwYewWrnfqMbqtZKoud5nR1XT/tlvXUoDcNo3NcYUgdRr/f7xkJCgs4I7jgZWtjq3WqlqNYGtRRW81ehb6ZALWj6yVYlmHRh7rW1xIWgSEpOEX6shidXndQDBKb+us9wyhNoMiUv2No/WaFYcN62j8dnCQklEPPLlaFpuVrtTsnnWIUTYOGbTmDJRa6udOwq6vtEgvNsAx1Q48Lm1+JcpHdaffh8pO7WXfgQANRVDTLhVY0x2tvbS54SYWZ1Lsr7dv81QNILXz5SRGqi0cV4agWBlWxoHp4VAmPtsHbZb9dF29XhNu1MHYVi10Pb1fC29vgWdnP6uJZRTirhWFVLKwenlXCs23wouwXdfGiIlzUwogqFlEPLyrhxTZ4WvbTunhaEU5rYWgVC62Hp5XwdAs8x49yyxfvFOKTCC9FLk8XiY5ICEt6cuZIZEpCvyQvjxuxbrmSKflT9sAgdchmUV8fjZK7MBKaNpRKtNzRsPgAAgmIN/CUAMdb7XFyn18WD50iue1dupdl6GN5kOH4rdwhvZe7byg3pD/LB+a268gHlr9mI25tM8L52ihbu/JQpRWPUOXGzfGR1jzStN+bB69bq/PVU+UH5SflUNGUtvJaeaOcK1cKUv7bebbz3c73e7/uwT1nb7q0PtlZjflWyV174n9idXIb</latexit>

⇡ R�

T-1X

t=0

r✓ log⇡✓(at|st), ⌧ ⇠ p(⌧|✓) Monte Carlo (sample) estimate

Now that we can compute the derivative of the log-probability of the trajectory, we can fill this into the expression of the gradient of the expected return.

This gives us the expression on the third line: An expectation over trajectories, again, but now we multiply the policy probabilities over time with the discounted reward.

https://spinningup.openai.com/en/latest/spinningup/rl_intro3.html#deriving-the-simplest-policy-gradient

Reinforce actions with high total return • Reinforce when is high? • Only reinforce actions with good consequences!

<latexit sha1_base64="D3QeuD0TaT5R0e0KVecWK/RRFjc=">AAAORXicfZdNj+M0HMazy9tSKOzCkUtgtBJCwyiZ6fTlsNI2aas9sLvDaN7YaTVyXDeN6sTGdjrtRvkcXOHL8B34DhwRV3DSTpo4SXOp6+fxk5//iSPbodjjwjD+evT4gw8/+viTJ582Pvu8+cWXT599dcVJyCC6hAQTduMAjrAXoEvhCYxuKEPAdzC6dhZ2ol8vEeMeCS7EmqKJD9zAm3kQCNk1GQOY/N5FFz+a8d3TA+PISC+93DC3jQNte53dPWt+O54SGPooEBADzm9Ng4pJBJjwIEZxYxxyRAFcABfdhmLWnUReQEOBAhjrz6U2C7EuiJ6A6VOPISjwWjYAZJ5M0OEcMAko8RvFKI4C4CN+OF16lG+afOluGgLIuU+iVVqbuDAwchmgcw+uCmQR8LkPxLzUyde+U+xEIUZs6Rc7E0rJqDhXiEGPJzU4k4V5S5My8wtyttXnazpHAY+jkOE4P1AKiDE0kwPTJkcipFE6GfmMF/yFYCE6TJpp34sBYItzND2UOYWOIs4MEyCKXY6fFCdA95D4Pgim0ZjG0ViglYjGh0exFJ/rDpZmhwA2LTrP4ygaJzVzHP1cWgvim5z4RhWHOXG4vQnBU31GmL6Uz58wrkujLi3Mg4gXR19mo2f6pRp9lROvVPE6J16rohPm1LCkLnPqsqTe59R7VV3lxJUqrnPiWhXf58T3qnizL/aXfbHvlFhZf7ko1o3xFM3kJyR9haIFjKNXF69/iqNeem3fhRDpZtEInQfjyajd6XViVcYPemvUNa1BWc8MHatv2lWGzNHv2MZguGM5VrwZtGG0h/22GgXxTu/27FFZ38Ea/c7AqjDsaEd2a9jZlg+hQLG6mc/utVulsrhZTs+yrNNeWc8MVsu2u8cVhsxhDwaDvp2i0JBRjBQvfTC226dGOYpmQV2j3epX6LsHYHQtqwRLcyzWyDIHZsoiEMCKU2Qvi93t94ZqkNjV3+rbdukBilz5u7Y5aFUYdqyng9PhSUpCGAhctSokK1+70z3pqlEkCxp15BMssZDdnUY9y+iUWEiOZWTZVlLY4kqUi+zWnESbT+5u3ekHph7HqlkuNNWcrL0Hs+LFFWZc76607/NXD8C18OWZQlgXDyvCYS0MrGKB9fCwEh7ug3fLfrcu3q0Id2th3CoWtx7erYR398HTsp/WxdOKcFoLQ6tYaD08rYSn++BF2S/q4kVFuKiFEVUsoh5eVMKLffCk7Cd18aQinNTCkCoWUg9PKuHJHniG7uWWL9kpyLcrYqXIzaEh1SGOQEnnAgiUygRHvCQ7Yo4ESHTHl0zpn7IHhJlDNlV96nFIwkCkd6E4GrtAKvFmR0OTAwjAerKBJ1j3gu0ep/D5pcnQBZTb3o17U4YBkgcZhl7LHdJbufsGckP6g5wwc31PTlj+jg+T1j4jWD0YZashD1WmeoQqN66Pj8zWkWn+3Dp42d6er55o32jfad9rptbRXmqvtDPtUoPar9pv2u/aH80/m383/2n+u7E+frQd87VWuJr//Q9MBUoK</latexit>aT-1<latexit sha1_base64="cMEvqefOIj4N4G00cthp1Waxs10=">AAAOQXicfZdNb9s2HMbV7q3L5q3djrtoCwoMQxBIieOXQ4Faso0e1jYL8tIuDgKKpmUhlEhQlGNX0KfYdfsy+xb7BjsOu+4ySnZkiaSsS/7h8/DxT3+JAulRHMTcsv569Pijjz/59LMnn+998WXrq6+fPvvmMiYJg+gCEkzYOw/ECAcRuuABx+gdZQiEHkZX3p2b61cLxOKAROd8RdFNCPwomAUQcDH0fsLQPWDTW/v26b51aBWXqRb2ptg3Ntfp7bPW95MpgUmIIg4xiONr26L8JgWMBxCjbG+SxIgCeAd8dJ3wWe8mDSKacBTBzHwutFmCTU7MHMqcBgxBjleiAJAFIsGEc8AA5AJ9rx4VowiEKD6YLgIar8t44a8LDsR936TLoi9ZbWLqM0DnAVzWyFIQxiHgc2UwXoVefRAlGLFFWB/MKQWj5FwiBoM478GpaMxbmrc6PienG32+onMUxVmaMJxVJwoBMYZmYmJRxognNC1uRjzfu/gFZwk6yMti7MUQsLszND0QObWBOs4ME8DrQ16YNydC95CEIYim6YRm6YSjJU8nB4eZEJ+bHhZmj4i3o+48y9J0kvfM88wzYa2JbyriG1kcVcTR5kcInpozwsyFeP6ExaYwmsLCAoji+uyLcvbMvJCjLyvipSxeVcQrWfSSipoo6qKiLhT1vqLey+qyIi5lcVURV7L4oSJ+kMV3u2Lf74r9VYoV/ReLYrU3maKZ+HwUr1B6B7P01fnrn7O0X1ybdyFBpl03Qu/BeDzudPvdTJbxg94e92xnqOqloesMbFdnKB2DrmsNR1uWI8lbQltWZzToyFEQb/Ve3x2r+hbWGnSHjsawpR277VF30z6EIsnqlz6332krbfHLnL7jOCd9VS8NTtt1e0caQ+lwh8PhwC1QaMIoRpKXPhg7nRNLjaJlUM/qtAcaffsArJ7jKLC0wuKMHXtoFywcASw5efmyuL1BfyQH8W3/nYHrKg+QV9rfc+1hW2PYsp4MT0bHBQlhIPLlrpCyfZ1u77gnR5EyaNwVT1BhIdtfGvcdq6uwkArL2HGdvLH1lSgW2bV9k64/udt1Z+7bZpbJZrHQZHO+9h7MkhdrzLjZrbXv8usn4EZ49U4hbIqHmnDYCAN1LLAZHmrh4S54X/X7TfG+JtxvhPF1LH4zvK+F93fBU9VPm+KpJpw2wlAdC22Gp1p4ugueq37eFM814bwRhutYeDM818LzXfBE9ZOmeKIJJ40wRMdCmuGJFp7sgF8fCPKdgni7UqZEij252M0WOsQpUPSYA44KmeA0VmSPzxEHue6Fgqn4R/WApHSIUtanQQxJEvHiVyhOJz4QSrbe0dD8AAKwmW/gCTaDaLPHqX1+aT71Dopt79q9bsMQiYMMQ6/FDumt2H0DsSH9Sdww88NA3LD4OznIq11GsHwwimpPHKps+QilFldHh3b70LZ/ae+/7GzOV0+M74wfjB8N2+gaL41XxqlxYUAjNH4zfjf+aP3Z+rv1T+vftfXxo82cb43a1frvf3fgSHA=</latexit>r1

CREDIT ASSIGNMENT

29

<latexit sha1_base64="tYG1LUh9ZhiVeJr5rMaGHKIgQN4=">AAAOz3icfZfdbts2HMWV7qvL1izdLnejLiiQdlkgJY4/LgLUkm0Uw9pmQdJkizyDomlFMCUSFOXYVTXsdg+wh9uz7GaUbMsSJVk3pnkOj376ixRIm2I34Jr2786jTz797PMvHn+5+9XXT/a+2X/67fuAhAyia0gwYbc2CBB2fXTNXY7RLWUIeDZGN/bUTPSbGWKBS/wrvqBo6AHHdycuBFx0jfb/sXxgYzCybH6POFAt6ODo5/hw9f/FueUBfm/bUT8eRTTpBuHHtRjfpfbLeGSN3QCS0OeqFYTeKOLnWvxHdPWTHsv5mDiqRd11x6EFYErCP1oBBxyN+Ivh7mj/QDvW0kstN/RV40BZXRejp0+eWWMCQw/5HGIQBHe6RvkwAoy7EKN41woDRAGcAgfdhXzSHkauT0OOfBirz4U2CbHKiZqUSB27DEGOF6IBIHNFggrvAROgopC7xagA+cBDwdF45tJg2QxmzrLBxZOjYTRP31JcGBg5DNB7F84LZBHwgqTcpc5g4dnFThRixGZesTOhFIySc44YdIOkBheiMO9oUu7gilys9PsFvUd+EEchw3F+oBAQY2giBqbNAPGQRunDiNk2Dc45C9FR0kz7znuATS/R+EjkFDqKOBNMAC922V5SHB89QOJ5wB9HFo0ji6M5j6yj41iIz1Uxh+DUJoCNi87LOIpWU1S9FNaC+DYnvpXFfk7sr25C8FidEKbOxPsnLFCFURUW5kIUFEdfZ6Mn6rUc/T4nvpfFm5x4I4t2mFPDkjrLqbOS+pBTH2R1nhPnsrjIiQtZ/JATP8ji7bbY37bF/i7FivqLRbHYtcZoIj5m6RSKpjCOXl+9+SWOOum1mgshUvWiEdpr4+mg2eq0YlnGa70xaOtGr6xnhpbR1c0qQ+botkyt19+wnEjeDFrTmv1uU46CeKO3O+agrG9gtW6rZ1QYNrQDs9FvrcqHkC9ZncxndpqNUlmcLKdjGMZZp6xnBqNhmu2TCkPmMHu9XtdMUWjIKEaSl66NzeaZVo6iWVBbaza6FfrmBWhtwyjB0hyLMTD0np6ycASw5OTZZDHb3U5fDuKb+htd0yy9QJ4rf9vUe40Kw4b1rHfWP01JCAO+I1eFZOVrttqnbTmKZEGDlniDJRayudOgY2itEgvJsQwM00gKW1yJYpHd6cNo+cndrDv1QFfjWDaLhSabk7W3NkteXGHG9e5K+zZ/9QBcC19+Ugjr4mFFOKyFgVUssB4eVsLDbfBO2e/UxTsV4U4tjFPF4tTDO5XwzjZ4WvbTunhaEU5rYWgVC62Hp5XwdBs8L/t5XTyvCOe1MLyKhdfD80p4vg2elP2kLp5UhJNaGFLFQurhSSU82QLP0IPY8iU7heSgwUqRy8NDqkMcgZKeHilSmeAoKMnLM0ii255gSv+UPSDMHKIp6+uTT3oXiiPLAUKJlzsamhxAAFaTDTzBquuv9jiFzy9Nhk6h2PYu3csy9JA4yDD0RuyQ3ondNxAb0pfigZnjueKBxa91lLS2GcF8bRSt5FCly0eocuPm5FhvHOv6r42DV83V+eqx8r3yg3Ko6EpLeaW8Vi6UawUq/+0823m58+Pexd5s78+9v5bWRzurMd8phWvv7/8BMM57eg==</latexit>

r✓J(✓) = Ep(⌧|✓)[R�

T-1X

t=0

r✓ log⇡✓(at|st)]

<latexit sha1_base64="mFWMPbZtkcP/WzEwCBfWOSfltVc=">AAAOgnicfZddb9s2GIWVdh9dtnrpdrkbYUGBbc0CKXH8ASxALdlGL9Y2C5ImW5wFFE3LgimRICnHrqC/uPv9j91uGCXbskRJ1k2Y9xweP3olCqRDsceFYfy99+TpJ59+9vmzL/a//Op54+uDF9984CRkEF1Dggm7dQBH2AvQtfAERreUIeA7GN04MzvRb+aIcY8EV2JJ0b0P3MCbeBAIWXo4mI6gi6PL+GE09jgkYSDORww9AjZ+MF9ltU3pRH+lZ8U/Tzbl07RMBC/I0dXPZryxXD0cHBrHRnrp5YG5Hhxq6+vi4cXzPZkJQx8FAmLA+Z1pUHEfASY8iFG8Pwo5ogDOgIvuQjHp3EdeQEOBAhjrL6U2CbEuiJ7ctT72GIICL+UAQObJBB1OAQNQyN7sF6M4CoCP+NF47lG+GvK5uxoIIBt7Hy3SxseFiZHLAJ16cFEgi4DPfSCmpSJf+k6xiEKM2NwvFhNKyag4F4hBjyc9uJCNeU+TZ8mvyMVany7pFAU8jkKG4/xEKSDG0EROTIcciZBG6c3IF2jGzwUL0VEyTGvnfcBml2h8JHMKhSLOBBMgiiXHT5oToEdIfB8E42hE42gk0EJEo6PjWIovdQdLs0Pk21F0XsZRNEp65jj6pbQWxHc58Z0qDnLiYP0jBI/1CWH6XD5/wrgujbq0MA8iXpx9nc2e6Ndq9Iec+EEVb3LijSo6YU4NS+o8p85L6mNOfVTVRU5cqOIyJy5V8WNO/KiKt7tif98V+4cSK/svF8VyfzRGE/l9Sl+haAbj6M3V21/jqJte63chRLpZNEJnYzwdttrddqzKeKM3hx3T6pf1zNC2eqZdZcgcvbZt9AdblhPFm0EbRmvQa6lREG/1TtcelvUtrNFr960Kw5Z2aDcH7XX7EAoUq5v57G6rWWqLm+V0Lcs665b1zGA1bbtzUmHIHHa/3+/ZKQoNGcVI8dKNsdU6M8pRNAvqGK1mr0LfPgCjY1klWJpjsYaW2TdTFoEAVpwie1nsTq87UIPEtv9Wz7ZLD1Dk2t+xzX6zwrBlPeufDU5TEsJA4KpdIVn7Wu3OaUeNIlnQsC2fYImFbH9p2LWMdomF5FiGlm0ljS2uRLnI7sz7aPXJ3a47/dDU41g1y4WmmpO1tzErXlxhxvXuSvsuf/UEXAtfvlMI6+JhRTishYFVLLAeHlbCw13wbtnv1sW7FeFuLYxbxeLWw7uV8O4ueFr207p4WhFOa2FoFQuth6eV8HQXvCj7RV28qAgXtTCiikXUw4tKeLELnpT9pC6eVISTWhhSxULq4UklPNkBvzoQJDuF5OTBSpFyTy53s6kOcQRKOhdAoFQmOOIl2RFTJECiO75kSv9RPZsjSppCcTRygVTi1Y6FJgcMgPVkg06w7gXrPUzh80qTqTMot7Ur9+o2+0geVBh6K3dA7+XuGsgN50/yhpjre/KG5N/RUTLaZQSLjVGO9uWhyVSPSOXBzcmx2Tw2zd+ah69b6/PTM+077XvtB83U2tpr7Y12oV1rUPtL+0f7V/uv8bTxY8NsnK6sT/bWc77VClfjl/8B2N5ccw==</latexit>

R� = r1 + �r2 + �2r3 + · · ·+ �T-1rT

This algorithm is very simple, and also very poor! One important reason for this is that it doesn’t do any credit assignment: Whenever we get a high total reward, the log-probability of all actions performed in that trajectory are increased evenly. Increasing the probability of performing an action because we got a good reward is called reinforcing that action.

We don’t really want this: Consider the situation where we got a high total reward because the very first reward r_1 was very high. All actions taken are equally reinforced, so the action a_T taken at the last timestep T will be reinforced as much as the first action a_0, although only the first action a_0 could have influenced the value of the first reward!

Page 11: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Recall Consider gradient of reward at :

Only influenced by actions until

<latexit sha1_base64="mFWMPbZtkcP/WzEwCBfWOSfltVc=">AAAOgnicfZddb9s2GIWVdh9dtnrpdrkbYUGBbc0CKXH8ASxALdlGL9Y2C5ImW5wFFE3LgimRICnHrqC/uPv9j91uGCXbskRJ1k2Y9xweP3olCqRDsceFYfy99+TpJ59+9vmzL/a//Op54+uDF9984CRkEF1Dggm7dQBH2AvQtfAERreUIeA7GN04MzvRb+aIcY8EV2JJ0b0P3MCbeBAIWXo4mI6gi6PL+GE09jgkYSDORww9AjZ+MF9ltU3pRH+lZ8U/Tzbl07RMBC/I0dXPZryxXD0cHBrHRnrp5YG5Hhxq6+vi4cXzPZkJQx8FAmLA+Z1pUHEfASY8iFG8Pwo5ogDOgIvuQjHp3EdeQEOBAhjrL6U2CbEuiJ7ctT72GIICL+UAQObJBB1OAQNQyN7sF6M4CoCP+NF47lG+GvK5uxoIIBt7Hy3SxseFiZHLAJ16cFEgi4DPfSCmpSJf+k6xiEKM2NwvFhNKyag4F4hBjyc9uJCNeU+TZ8mvyMVany7pFAU8jkKG4/xEKSDG0EROTIcciZBG6c3IF2jGzwUL0VEyTGvnfcBml2h8JHMKhSLOBBMgiiXHT5oToEdIfB8E42hE42gk0EJEo6PjWIovdQdLs0Pk21F0XsZRNEp65jj6pbQWxHc58Z0qDnLiYP0jBI/1CWH6XD5/wrgujbq0MA8iXpx9nc2e6Ndq9Iec+EEVb3LijSo6YU4NS+o8p85L6mNOfVTVRU5cqOIyJy5V8WNO/KiKt7tif98V+4cSK/svF8VyfzRGE/l9Sl+haAbj6M3V21/jqJte63chRLpZNEJnYzwdttrddqzKeKM3hx3T6pf1zNC2eqZdZcgcvbZt9AdblhPFm0EbRmvQa6lREG/1TtcelvUtrNFr960Kw5Z2aDcH7XX7EAoUq5v57G6rWWqLm+V0Lcs665b1zGA1bbtzUmHIHHa/3+/ZKQoNGcVI8dKNsdU6M8pRNAvqGK1mr0LfPgCjY1klWJpjsYaW2TdTFoEAVpwie1nsTq87UIPEtv9Wz7ZLD1Dk2t+xzX6zwrBlPeufDU5TEsJA4KpdIVn7Wu3OaUeNIlnQsC2fYImFbH9p2LWMdomF5FiGlm0ljS2uRLnI7sz7aPXJ3a47/dDU41g1y4WmmpO1tzErXlxhxvXuSvsuf/UEXAtfvlMI6+JhRTishYFVLLAeHlbCw13wbtnv1sW7FeFuLYxbxeLWw7uV8O4ueFr207p4WhFOa2FoFQuth6eV8HQXvCj7RV28qAgXtTCiikXUw4tKeLELnpT9pC6eVISTWhhSxULq4UklPNkBvzoQJDuF5OTBSpFyTy53s6kOcQRKOhdAoFQmOOIl2RFTJECiO75kSv9RPZsjSppCcTRygVTi1Y6FJgcMgPVkg06w7gXrPUzh80qTqTMot7Ur9+o2+0geVBh6K3dA7+XuGsgN50/yhpjre/KG5N/RUTLaZQSLjVGO9uWhyVSPSOXBzcmx2Tw2zd+ah69b6/PTM+077XvtB83U2tpr7Y12oV1rUPtL+0f7V/uv8bTxY8NsnK6sT/bWc77VClfjl/8B2N5ccw==</latexit>

R� = r1 + �r2 + �2r3 + · · ·+ �T-1rT<latexit sha1_base64="D3RwoSqmlhdpI3n1idNRMWypLRc=">AAAOPHicfZdNb9s2HMbV7q3L5q3djrtoC4oNWxBIieOXQ4Faso0e1jYL8tIuDgqKpmUhlEhQlGNX0FfYdfsy+yA77zjsuvMo2ZElkrIu/pvPw0c//SUKlEdxEHPL+uvBww8+/OjjTx59uvfZ560vvnz85KvLmCQMogtIMGFvPBAjHEToggccozeUIRB6GF15t26uXy0QiwMSnfMVRTch8KNgFkDAi6Hvf7LfPd63Dq3iMNXC3hT7xuY4ffek9e1kSmASoohDDOL42rYov0kB4wHEKNubJDGiAN4CH10nfNa7SYOIJhxFMDOfCm2WYJMTM+cxpwFDkOOVKABkgUgw4RwwALmg3qtHxSgCIYoPpouAxusyXvjrggNxyTfpsmhJVpuY+gzQeQCXNbIUhHEI+FwZjFehVx9ECUZsEdYHc0rBKDmXiMEgzntwKhrzmuZdjs/J6Uafr+gcRXGWJgxn1YlCQIyhmZhYlDHiCU2LixG39jZ+xlmCDvKyGHs2BOz2DE0PRE5toI4zwwTw+pAX5s2J0B0kYQiiaTqhWTrhaMnTycFhJsSnpoeF2SOATevOsyxNJ3nPPM88E9aa+KoivpLFUUUcbU5C8NScEWYuxP0nLDaF0RQWFkAU12dflLNn5oUcfVkRL2XxqiJeyaKXVNREURcVdaGodxX1TlaXFXEpi6uKuJLF9xXxvSy+2RX7dlfsr1Ks6L9YFKu9yRTNxJujeITSW5ilL85f/pyl/eLYPAsJMu26EXr3xuNxp9vvZrKM7/X2uGc7Q1UvDV1nYLs6Q+kYdF1rONqyHEneEtqyOqNBR46CeKv3+u5Y1bew1qA7dDSGLe3YbY+6m/YhFElWv/S5/U5baYtf5vQdxznpq3ppcNqu2zvSGEqHOxwOB26BQhNGMZK89N7Y6ZxYahQtg3pWpz3Q6NsbYPUcR4GlFRZn7NhDu2DhCGDJycuHxe0N+iM5iG/77wxcV7mBvNL+nmsP2xrDlvVkeDI6LkgIA5Evd4WU7et0e8c9OYqUQeOuuIMKC9meadx3rK7CQiosY8d18sbWV6JYZNf2Tbp+5W7Xnblvm1kmm8VCk8352rs3S16sMeNmt9a+y6+fgBvh1SuFsCkeasJhIwzUscBmeKiFh7vgfdXvN8X7mnC/EcbXsfjN8L4W3t8FT1U/bYqnmnDaCEN1LLQZnmrh6S54rvp5UzzXhPNGGK5j4c3wXAvPd8ET1U+a4okmnDTCEB0LaYYnWniyA56hO7Hly3cK4ulKmRIp9uRiN1voEKdA0WMOOCpkgtNYkT0+RxzkuhcKpuKP6gFJ6RClrE+DGJIk4sVZKE4nPhBKtt7R0PwDBGAz38ATbAbRZo9Te/3SfOotFNvetXvdhiESHzIMvRQ7pNdi9w3EhvRHccHMDwNxweJ3cpBXu4xgeW8U1Z74qLLlTyi1uDo6tNuHtv1Le/95Z/N99cj4xvjO+MGwja7x3HhhnBoXBjTmxm/G78YfrT9bf7f+af27tj58sJnztVE7Wv/9DzE7RcQ=</latexit>

t 0 + 1

<latexit sha1_base64="PhDCm1hi6TQRs7YSOLmhz/ykrow=">AAAOOnicfZdNb9s2HMbV7q3L5q3djrtoC4oNQxBIieOXQ4Faso0e1jYL8rbFQUHRtCyEEgmKcuwK+ga7bl9mn2THHYdd9wFGyY4skZR18d98Hj766S9RoDyKg5hb1l+PHn/w4Ucff/Lk073PPm998eXTZ19dxiRhEF1Aggm79kCMcBChCx5wjK4pQyD0MLry7txcv1ogFgckOucrim5D4EfBLICAi6Ez/v27p/vWoVUcplrYm2Lf2Byn7561vp1MCUxCFHGIQRzf2BbltylgPIAYZXuTJEYUwDvgo5uEz3q3aRDRhKMIZuZzoc0SbHJi5jTmNGAIcrwSBYAsEAkmnAMGIBfMe/WoGEUgRPHBdBHQeF3GC39dcCAu+DZdFg3JahNTnwE6D+CyRpaCMA4BnyuD8Sr06oMowYgtwvpgTikYJecSMRjEeQ9ORWPe0rzH8Tk53ejzFZ2jKM7ShOGsOlEIiDE0ExOLMkY8oWlxMeLG3sUvOEvQQV4WYy+GgN2doemByKkN1HFmmABeH/LCvDkRuockDEE0TSc0SyccLXk6OTjMhPjc9LAwewSwad15lqXpJO+Z55lnwloT31TEN7I4qoijzUkInpozwsyFuP+ExaYwmsLCAoji+uyLcvbMvJCjLyvipSxeVcQrWfSSipoo6qKiLhT1vqLey+qyIi5lcVURV7L4viK+l8XrXbG/7Ir9VYoV/ReLYrU3maKZeG8Uj1B6B7P01fnrn7K0XxybZyFBpl03Qu/BeDzudPvdTJbxg94e92xnqOqloesMbFdnKB2DrmsNR1uWI8lbQltWZzToyFEQb/Ve3x2r+hbWGnSHjsawpR277VF30z6EIsnqlz6332krbfHLnL7jOCd9VS8NTtt1e0caQ+lwh8PhwC1QaMIoRpKXPhg7nRNLjaJlUM/qtAcafXsDrJ7jKLC0wuKMHXtoFywcASw5efmwuL1BfyQH8W3/nYHrKjeQV9rfc+1hW2PYsp4MT0bHBQlhIPLlrpCyfZ1u77gnR5EyaNwVd1BhIdszjfuO1VVYSIVl7LhO3tj6ShSL7Ma+Tdev3O26M/dtM8tks1hosjlfew9myYs1Ztzs1tp3+fUTcCO8eqUQNsVDTThshIE6FtgMD7XwcBe8r/r9pnhfE+43wvg6Fr8Z3tfC+7vgqeqnTfFUE04bYaiOhTbDUy083QXPVT9viueacN4Iw3UsvBmea+H5Lnii+klTPNGEk0YYomMhzfBEC092wDN0L7Z8+U5BPF0pUyLFnlzsZgsd4hQoeswBR4VMcBorssfniINc90LBVPxRPSApHaKU9WkQQ5JEvDgLxenEB0LJ1jsamn+AAGzmG3iCzSDa7HFqr1+aT72DYtu7dq/bMETiQ4ah12KH9FbsvoHYkP4oLpj5YSAuWPxODvJqlxEsH4yi2hMfVbb8CaUWV0eHdvvQtn9u77/sbL6vnhjfGN8ZPxi20TVeGq+MU+PCgMbM+M343fij9Wfr79Y/rX/X1sePNnO+NmpH67//AdSERVQ=</latexit>

t 0

CREDIT ASSIGNMENT

30

<latexit sha1_base64="7jTNceGqOxA6oQx+dcGd9Nb5QjY=">AAAPinicrZfdbts2HMXt7qtL16XZLnejLeiWblkgJY4/MASoJdvoxdpkQdJ0i1yDomlFMCUSFOXYVfVae5cBu92eY5RsyxIl+aKYbsLwHB7/+BcpkBbFjs9V9a/6g48+/uTTzx5+vvPoi8df7j7Z++q1TwIG0TUkmLA3FvARdjx0zR2O0RvKEHAtjG6sqRHrNzPEfId4V3xB0dAFtudMHAi46Brt1c9ND1gYjEyL3yEOFNMF/M6ywn40CumB6AXB+5X2LLo1x44PSeDxtyH/ITIZugdsPBLtn7Ro+P3Zhw9WTD9wRftMjd6GVz9rkYyFia2Y1Fl3HJgAJjPg702fA45G/NlQ2THN/w0iVj+AYfRkXz1Sk0cpNrRVY7+2ei5Ge4+/NccEBi7yOMTA9281lfJhCBh3IEbRjhn4iAI4BTa6DfikPQwdjwYceTBSngptEmCFEyV+u8rYYQhyvBANAJkjEhR4B5jgFGtgJx/lIw+4yD8czxzqL5v+zF42uJg4GobzZIFFuYGhzQC9c+A8RxYC14/rXuj0F66V70QBRmzm5jtjSsEoOeeIQcePa3AhCnNO42r7V+Ripd8t6B3y/CgMGI6yA4WAGEMTMTBp+ogHNEwmIzbK1D/jLECHcTPpO+sBNr1E40ORk+vI40wwATzfZblxcTx0D4nrAm8cmjQKTY7mPDQPjyIhPlXEEoJTi4hllndeRmG4WqvKpbDmxFcZ8ZUs9jNif/UjBI+VCWHKTLx/wnxFGBVhYQ5Efn70dTp6olzL0a8z4mtZvMmIN7JoBRk1KKizjDorqPcZ9V5W5xlxLouLjLiQxXcZ8Z0svtkW+/u22D+kWFF/sSkWO+YYTcR3OFlC4RRG4Yurl79GYSd5VmshQIqWN0JrbTwZNFudViTLeK03Bm1N7xX11NDSu5pRZkgd3Zah9voblmPJm0KrarPfbcpREG/0dscYFPUNrNpt9fQSw4Z2YDT6rVX5EPIkq536jE6zUSiLneZ0dF0/7RT11KA3DKN9XGJIHUav1+saCQoNGMVI8tK1sdk8VYtRNA1qq81Gt0TfvAC1resFWJph0Qe61tMSFo4Alpw8XSxGu9vpy0F8U3+9axiFF8gz5W8bWq9RYtiwnvZO+ycJCWHAs+WqkLR8zVb7pC1HkTRo0BJvsMBCNr806Ohqq8BCMiwD3dDjwuZ3othkt9owXH5yN/tO2deUKJLNYqPJ5njvrc2SF5eYcbW71L7NXz4AV8IXZwphVTwsCYeVMLCMBVbDw1J4uA3eLvrtqni7JNyuhLHLWOxqeLsU3t4GT4t+WhVPS8JpJQwtY6HV8LQUnm6D50U/r4rnJeG8EoaXsfBqeF4Kz7fBk6KfVMWTknBSCUPKWEg1PCmFJ1vglzeL+KQgVlfICpHLu0OiQxyCgp7cKBKZ4NAvyMsrSKxbrmBK/il6QJA6RFPW11eh5FcoDk0bCCVanmhofAEBWIkP8AQrjrc64+Q+vzQeOoXi2Lt0L8vQQ+Iiw9BLcUI6F6dvIA6kP4oJM9t1xITFX/Mwbm0zgvnaKFo74lKlyVeoYuPm+EhrHGnab439583V/eph7Zvad7WDmlZr1Z7XXtQuatc1WP+z/nf9n/q/u492td3O7i9L64P6aszXtdyz2/sP5gnC2w==</latexit>

r✓Ep(⌧|✓)[�t0rt0+1] = Ep(⌧|✓)[�

t0rt0+1

T-1X

t=0

r✓ log⇡✓(at|st)]

= Ep(⌧|✓)[�t0rt0+1

t0X

t=0

r✓ log⇡✓(at|st)]

So can we improve upon this simple REINFORCE algorithm? Yes!

Let’s consider what the gradient of the discounted reward at timestep t+1’ is. So instead of taking the total expected reward, we only look at the expected reward at timestep t’+1. We do the same trick as before to find the first expression on the right.

It turns out this is equal to the second expression: We only need to some over the first t’ action log-probabilities! The intuition behind this is that the actions taken in timesteps t’+1 to T do not influence the outcome of the reward at t’+1.

In the RL assignment, you’ll prove this formally :)

https://youtu.be/oPGVsoBonLM?t=1523

Sum over timesteps:

Equivalent: Update actions based on following rewards • Discounted reward to go

Gradient estimate:

BETTER REINFORCE

31

<latexit sha1_base64="y4rJ5T0Ip9mHnRxKdsy3imVVGeM=">AAAPpHiclZdbb9s2HMWV7tZl69Jsj3vRFrRrtyyQUseXhwC1ZBsFtrZZlku3yDUomlYEUyJBUY5dVd9xr/sie90o2ZYlSnIxvYThOTz68S/SIG2K3YBr2t879z76+JNPP7v/+e4XXz74au/h/tdXAQkZRJeQYMLe2CBA2PXRJXc5Rm8oQ8CzMbq2p2aiX88QC1ziX/AFRUMPOL47cSHgomu0v+NaPrAxGFk2v0UcqJYH+K1tR/14FNEnoheE71fa0/jGgg6OzuORNXYDSEKfDx+ffmBEEHqjiP9wqsVvo4uf9Tgb+lb0xhZDd4CNE8dPeqyu3Kk5USU2TBzVou6644kFYDoN/t4KOOBoxJ8OVcvafXz6gXkUXpRS/f83ZTPjlTNTpakNRw8PtCMtfdRyQ181DpTVczbaf/CdNSYw9JDPIQZBcKNrlA8jwLgLMYp3rTBAFMApcNBNyCftYeT6NOTIh7H6SGiTEKucqMl3V8cuQ5DjhWgAyFyRoMJbwMS0xOrYLUYFyAceCg7HM5cGy2Ywc5YNLuqEhtE8XXpxYWDkMEBvXTgvkEXAC5JvUeoMFp5d7EQhRmzmFTsTSsEoOeeIQTdIanAmCvOaJh8nuCBnK/12QW+RH8RRyHCcHygExBiaiIFpM0A8pFE6GbGFpsEpZyE6TJpp32kPsOk5Gh+KnEJHEWeCCeDFLttLiuOjO0g8D/jjyKJxZHE055F1eBQL8ZEqVhyc2kQskqLzPI6i1fpVz4W1IL7Kia9ksZ8T+6uXEDxWJ4SpM/H9CQtUYVSFhbkQBcXRl9noiXopR1/lxCtZvM6J17Johzk1LKmznDorqXc59U5W5zlxLouLnLiQxXc58Z0svtkW+8e22D+lWFF/sSkWu9YYTcQvdLqEoimMoxcXL3+No076rNZCiFS9aIT22vhs0Gx1WrEs47XeGLR1o1fWM0PL6OpmlSFzdFum1utvWI4lbwatac1+tylHQbzR2x1zUNY3sFq31TMqDBvagdnot1blQ8iXrE7mMzvNRqksTpbTMQzjpFPWM4PRMM32cYUhc5i9Xq9rpig0ZBQjyUvXxmbzRCtH0SyorTUb3Qp98wG0tmGUYGmOxRgYek9PWTgCWHLybLGY7W6nLwfxTf2NrmmWPiDPlb9t6r1GhWHDetI76T9LSQgDviNXhWTla7baz9pyFMmCBi3xBUssZPOmQcfQWiUWkmMZGKaRFLa4E8Umu9GH0fInd7Pv1ANdjWPZLDaabE723toseXGFGde7K+3b/NUDcC18eaYQ1sXDinBYCwOrWGA9PKyEh9vgnbLfqYt3KsKdWhinisWph3cq4Z1t8LTsp3XxtCKc1sLQKhZaD08r4ek2eF7287p4XhHOa2F4FQuvh+eV8HwbPCn7SV08qQgntTCkioXUw5NKeLIFfnkvSE4KyW2KlSKXV41UhzgCJT29gKQywVFQkpc3lkS3PcGU/lP2gDBziKasr28y6VsojiwHCCVenmhocgEBWE0O8ASrrr864xR+fmkydArFsXfpXpahh8RFhqGX4oT0Wpy+gTiQ/igmzBzPFRMWf63DpLXNCOZro2jtikuVLl+hyo3r4yO9caTrvzUOnjdX96v7yrfK98oTRVdaynPlhXKmXCpw56+df3b+vafsPdr7Ze/3vcul9d7Oasw3SuHZe/sf1nvLzw==</latexit>

r✓Ep(⌧|✓)[R�] = Ep(⌧|✓)[T-1X

t0=0

�t0rt0+1

t0X

t=0

r✓ log⇡✓(at|st)]

= Ep(⌧|✓)[T-1X

t=0

r✓ log⇡✓(at|st)T-1X

t0=t

�t0rt0+1]

<latexit sha1_base64="YBuoKOy9cK4UEyGD6qqBudmOzR4=">AAAOV3icfZdNc+M0HMbdBZZSaGnhyMVDZwcGuh27TfNy6MzGTsIe2N3S6dtSdzuyojieyJZGktNkPf40fBqucIEvwyA7iePX+FJVz6MnP/1leSSbYpcLTft369knn372/PPtL3a+/Gp37+v9g29uOAkYRNeQYMLubMARdn10LVyB0R1lCHg2Rrf2xIz12yli3CX+lZhT9OABx3dHLgRCdj3un1vQweEv0aNQz1WLB95jKH44F9GH8OqlHqnW0OWQBL74ILtfCtnB0BNgw9j1sx497h9qx1ryqOWGvmwcKsvn4vFgd8saEhh4yBcQA87vdY2KhxAw4UKMoh0r4IgCOAEOug/EqP0Quj4NBPJhpL6Q2ijAqiBqPBV16DIEBZ7LBoDMlQkqHAMGoJAT3slHceQDD/Gj4dSlfNHkU2fREEBW6yGcJdWMcgNDhwE6duEsRxYCj3tAjEudfO7Z+U4UYMSmXr4zppSMBecMMejyuAYXsjDvaLxA/IpcLPXxnI6Rz6MwYDjKDpQCYgyN5MCkyZEIaJhMRr4VE34uWICO4mbSd94DbHKJhkcyJ9eRxxlhAkS+y/bi4vjoCRLPA/4wtGgUWgLNRGgdHUdSfKHaWJptIl+RvPMyCkMrrpltq5fSmhPfZsS3RbGfEfvLHyF4qI4IU6dy/QnjqjSq0sJciHh+9HU6eqReF6NvMuJNUbzNiLdF0Q4yalBSpxl1WlKfMupTUZ1lxFlRnGfEeVH8mBE/FsW7TbHvN8X+XoiV9ZebYr5jDdFIfnSSVyicwCh8ffXm1yjsJM/yXQiQqueN0F4ZTwfNVqcVFWW80huDtm70ynpqaBld3awypI5uy9R6/TXLScGbQmtas99tFqMgXuvtjjko62tYrdvqGRWGNe3AbPRby/Ih5BesTuozO81GqSxOmtMxDOOsU9ZTg9EwzfZJhSF1mL1er2smKDRgFKOCl66MzeaZVo6iaVBbaza6Ffp6AbS2YZRgaYbFGBh6T09YBAK44BTpy2K2u51+MUis6290TbO0gCJT/rap9xoVhjXrWe+sf5qQEAZ8p1gVkpav2WqftotRJA0atOQKlljI+pcGHUNrlVhIhmVgmEZc2PxOlJvsXn8IF5/c9b5TD3U1iopmudGK5njvrcwFL64w43p3pX2Tv3oAroUvzxTCunhYEQ5rYWAVC6yHh5XwcBO8U/Y7dfFORbhTC+NUsTj18E4lvLMJnpb9tC6eVoTTWhhaxULr4WklPN0EL8p+URcvKsJFLYyoYhH18KISXmyCJ2U/qYsnFeGkFoZUsZB6eFIJTzbAL24F8Ukhvk6wUqQ8k8vTbKJDHIKSzgUQKJEJDnlJtsUYCRDrtieZkn+KntVNJUmhOLQcIJVocWKh8QUDYDU+oBOsuv7yDJP7vNJ46ATKY+3CvZhmD8mLCkNv5AnonTxdA3ng/ElOiDmeKyck/1pHcWuTEcxWRtnakZcmvXhFKjduT471xrGu/9Y4fNVc3p+2le+U75UfFV1pKa+U18qFcq1A5Q/lT+Uv5e/df3b/23u+t72wPttajvlWyT17B/8D/qxMdg==</latexit>

Gt =T-1X

t0=t

�t0-trt0+1

<latexit sha1_base64="59UGCyusojdLxVCiBoNqxLXqpQ8=">AAAO83ichZfbcqM2HMad7WmbdtNse9kb2sx2sp00A4njw0Vm1mC7e9Hsppmc2uD1CFnGjAXSCOHYy/Ikvev0ti/QN+nbVGAbgwCXG8v6Pn38+CMxkkWx43NV/XfnyUcff/LpZ08/3/3iy2d7X+0///rWJwGD6AYSTNi9BXyEHQ/dcIdjdE8ZAq6F0Z01NWL9boaY7xDvmi8oGrjA9pyxAwEXXcP9f0wPWBgMTYtPEAeK6QI+saywFw1Deih6QfBhpb2MHkxo4/AqGpojx4ck8Pjgh/P/GaKYfuAOQ36uRu/C65+0KB37jitJ3s/RkMsUmNiKSZ11x6EJYMLLP5g+BxwN+cvBcP9APVaTSyk2tFXjoLa6LofPn31njggMXORxiIHvP2gq5YMQMO5AjKJdM/ARBXAKbPQQ8HFrEDoeDTjyYKS8ENo4wAonSlxHZeQwBDleiAaAzBEJCpwAJjhFtXfzUT7ygIv8o9HMof6y6c/sZYOLB0eDcJ68yig3MLQZoBMHznNkIXD9uOKFTn/hWvlOFGDEZm6+M6YUjJJzjhh0/LgGl6Iwb2lcbf+aXK70yYJOkOdHYcBwlB0oBMQYGouBSdNHPKBh8jBiSk79c84CdBQ3k77zLmDTKzQ6Ejm5jjzOGBPA812WGxfHQ4+QuC7wRqFJo9DkaM5D8+g4EuILRUwhOLUIYKO88yoKw9UsVa6ENSe+yYhvZLGXEXurmxA8UsaEKTPx/gnzFWFUhIU5EPn50Tfp6LFyI0ffZsRbWbzLiHeyaAUZNSios4w6K6iPGfVRVucZcS6Li4y4kMX3GfG9LN5vi/1tW+zvUqyov1gUi11zhMbii5dMoXAKo/D19cUvUdhOrtVcCJCi5Y3QWhtP+41muxnJMl7r9X5L07tFPTU09Y5mlBlSR6dpqN3ehuVE8qbQqtrodRpyFMQbvdU2+kV9A6t2ml29xLCh7Rv1XnNVPoQ8yWqnPqPdqBfKYqc5bV3Xz9pFPTXodcNonZQYUofR7XY7RoJCA0Yxkrx0bWw0ztRiFE2DWmqj3inRNy9Abel6AZZmWPS+rnW1hIUjgCUnTyeL0eq0e3IQ39Rf7xhG4QXyTPlbhtatlxg2rGfds95pQkIY8Gy5KiQtX6PZOm3JUSQN6jfFGyywkM2d+m1dbRZYSIalrxt6XNj8ShSL7EEbhMtP7mbdKQeaEkWyWSw02RyvvbVZ8uISM652l9q3+csH4Er44pNCWBUPS8JhJQwsY4HV8LAUHm6Dt4t+uyreLgm3K2HsMha7Gt4uhbe3wdOin1bF05JwWglDy1hoNTwthafb4HnRz6vieUk4r4ThZSy8Gp6XwvNt8KToJ1XxpCScVMKQMhZSDU9K4ckWeIYexZYv3inExwlWiFyeHRId4hAU9OREkcgEh35BXh5BYt1yBVPyp+gBQeoQTVlfn3mSu1AcmjYQSrTc0dD4AAKwEm/gCVYcb7XHyX1+aTx0CsW2d+lelqGLxEGGoQuxQ3ordt9AbEh/FA/MbNcRDyx+zaO4tc0I5mujaO2KQ5UmH6GKjduTY+3sWP21fvDqYnW8elr7tvZ97bCm1Zq1V7XXtcvaTQ3uKDv9nbc7l3vB3h97f+79tbQ+2VmN+aaWu/b+/g/36Yqc</latexit>

r✓Ep(⌧|✓)[R�] = Ep(⌧|✓)[T-1X

t=0

�tGtr✓ log⇡✓(at|st)]

In the last step, we found that for a single reward, we only need to update the probability of actions that came before the reward.

Let’s use this to find the gradient of the total expected reward (that is, over all timesteps)! We see that, again, for all rewards, we also should only consider the actions that came before to update. Those are the actions that could have led to a positive reward! The actions after receiving each reward couldn’t possibly have influenced it.

We can rewrite this to an equivalent formulation: for the update of each action, we should consider only the rewards received after performing the action. We will also proof this relation in the assignment.

To do this, let’s first define the discounted reward to go. This is the discounted reward received after timestep t, but only counting the discounting from that point. You might wonder why! It turns out this quantity represents something very useful in RL, and we will be going back to it in the next lecture on RL.

Finally, we find a nice expression for the policy gradient: Each action is reinforced based on the discounted reward to go it receives: The reward after executing that action.

https://youtu.be/oPGVsoBonLM?t=1523

REINFORCE: 1. 2.

<latexit sha1_base64="SoGZ+HdYsbbenfP+QovWVhqqXZE=">AAAOVnicfZdPb9s2GMbVbl27bEnT7diLsKBANwSBlDj+cyhQS7bRw9JmQf5tsRFQNC0LpkSCpBy7mg77NLtu32b7MsMo2ZElSrIOyWs+Dx//9Eo0SIdijwvD+PfJ0y++fPbV8xdf73zz7e7ey/1X311zEjKIriDBhN06gCPsBehKeAKjW8oQ8B2MbpyZneg3c8S4R4JLsaRo5AM38CYeBEIO3e+/HjoChPqQe75O36Yffpd/p0iAH+/3D4wjI730cmGuiwNtfZ3fv9o9GI4JDH0UCIgB53emQcUoAkx4EKN4ZxhyRAGcARfdhWLSHkVeQEOBAhjrb6Q2CbEuiJ6A6mOPISjwUhYAMk8m6HAKGIBC3s5OMYqjAPiIH47nHuWrks/dVSGA7MUoWqS9igsTI5cBOvXgokAWAZ/7QExLg3zpO8VBFGLE5n5xMKGUjIpzgRj0eNKDc9mYTzRpP78k52t9uqRTFPA4ChmO8xOlgBhDEzkxLTkSIY3Sm5HPfMbfCRaiw6RMx971AJtdoPGhzCkMFHEmmABRHHL8pDkBeoDE90EwjoY0joYCLUQ0PDyKpfhGd7A0OwSwcdF5EUfRMOmZ4+gX0loQP+bEj6rYz4n99ZcQPNYnhOlz+fwJ47o06tLCPIh4cfZVNnuiX6nR1znxWhVvcuKNKjphTg1L6jynzkvqQ059UNVFTlyo4jInLlXxc078rIq322J/3Rb7mxIr+y8XxXJnOEYT+ZOSvkLRDMbRh8uzn+Ook17rdyFEulk0QufReDJotjqtWJXxo94YtE2rV9YzQ8vqmnaVIXN0W7bR629YjhVvBm0YzX63qUZBvNHbHXtQ1jewRrfVsyoMG9qB3ei31u1DKFCsbuazO81GqS1ultOxLOu0U9Yzg9Ww7fZxhSFz2L1er2unKDRkFCPFSx+NzeapUY6iWVDbaDa6FfrmARhtyyrB0hyLNbDMnpmyCASw4hTZy2K3u52+GiQ2/be6tl16gCLX/rZt9hoVhg3rae+0f5KSEAYCV+0KydrXbLVP2moUyYIGLfkESyxk802DjmW0SiwkxzKwbCtpbHElykV2Z46i1U/uZt3pB6Yex6pZLjTVnKy9R7PixRVmXO+utG/zV0/AtfDlO4WwLh5WhMNaGFjFAuvhYSU83Abvlv1uXbxbEe7WwrhVLG49vFsJ726Dp2U/rYunFeG0FoZWsdB6eFoJT7fBi7Jf1MWLinBRCyOqWEQ9vKiEF9vgSdlP6uJJRTiphSFVLKQenlTCky3wDD3ILV+yU5BvV8RKkXJPLnezqQ5xBEo6F0CgVCY44iV5ddxIdMeXTOmHsgeEmUOWqj72OCRhINaQGEdDF0gtXu1paHIEAVhPtvAE616w3uUUfoBpMnkG5cZ35V41oofkUYahM7lH+iT330BuSX+St8xc35O3LP8PD5NqmxEsHo2y2pHHKlM9RJWL6+Mj8/TI+KVx8P5sfcB6ob3WftDeaqbW0t5rH7Rz7UqD2h/an9pf2t+7/+z+t/ds7/nK+vTJes73WuHa2/8f2ZtNqg==</latexit>

⌧ ⇠ p(⌧|✓)

REINFORCE

32

<latexit sha1_base64="6Uq5dn8r9bKNsuZ3zo0yxrYIDX8=">AAAOVHicfZdNb9s2HMaVdt26bMnS7biLsKDAsKWBlDh+GRCglmyvh7XNgrxtcRpQNC0LpkSCpBy7gr7LPs2u22WHfZZdRsmOLFGSdTHF5+HjH/8SBdKh2OPCMP7devL0k2effvb88+0vvtzZ/WrvxddXnIQMoktIMGE3DuAIewG6FJ7A6IYyBHwHo2tnaif69Qwx7pHgQiwouvOBG3hjDwIhu+73fhpCF0c/x/dCP9WHPPTvo+mpiD9EF6/MWB+OPA5JGIgP0fSVkPcMPQA2kp4fzfh+b984NNJLLzfMVWNfW11n9y92toYjAkMfBQJiwPmtaVBxFwEmPIhRvD0MOaIAToGLbkMxbt9FXkBDgQIY6y+lNg6xLoiezEMfeQxBgReyASDzZIIOJ4ABKORst4tRHAXAR/xgNPMoXzb5zF02BJCluovmaSnjwsDIZYBOPDgvkEXA5z4Qk1InX/hOsROFGLGZX+xMKCWj4pwjBj2e1OBMFuY9TZ4OvyBnK32yoBMU8DgKGY7zA6WAGENjOTBtciRCGqWTka/ElJ8KFqKDpJn2nfYAm56j0YHMKXQUccaYAFHscvykOAF6gMT3QTCKhjSOhgLNRTQ8OIyl+FJ3sDQ7RL4hRed5HEXDpGaOo59La0F8lxPfqWI/J/ZXf0LwSB8Tps/k8yeM69KoSwvzIOLF0ZfZ6LF+qUZf5cQrVbzOideq6IQ5NSyps5w6K6kPOfVBVec5ca6Ki5y4UMWPOfGjKt5siv1tU+zvSqysv1wUi+3hCI3lFyd9haIpjKM3F29/iaNOeq3ehRDpZtEInUfj8aDZ6rRiVcaPemPQNq1eWc8MLatr2lWGzNFt2Uavv2Y5UrwZtGE0+92mGgXxWm937EFZX8Ma3VbPqjCsaQd2o99alQ+hQLG6mc/uNBulsrhZTseyrJNOWc8MVsO220cVhsxh93q9rp2i0JBRjBQvfTQ2mydGOYpmQW2j2ehW6OsHYLQtqwRLcyzWwDJ7ZsoiEMCKU2Qvi93udvpqkFjX3+radukBilz527bZa1QY1qwnvZP+cUpCGAhctSokK1+z1T5uq1EkCxq05BMssZD1Pw06ltEqsZAcy8CyraSwxZUoF9mteRctP7nrdafvm3ocq2a50FRzsvYezYoXV5hxvbvSvslfPQDXwpdnCmFdPKwIh7UwsIoF1sPDSni4Cd4t+926eLci3K2FcatY3Hp4txLe3QRPy35aF08rwmktDK1iofXwtBKeboIXZb+oixcV4aIWRlSxiHp4UQkvNsGTsp/UxZOKcFILQ6pYSD08qYQnG+CXh4Jkp5CcJVgpUu7J5W421SGOQEnnAgiUygRHvCQ7YoIESHTHl0zpjep5PKekKRRHQxdIJV7uWGhywABYTzboBOtesNrDFD6vNBk6hXJbu3Qvp9lD8qDC0Fu5A3ovd9dAbjh/kBNiru/JCcnf4UHS2mQE80ejbG3LQ5OpHpHKjeujQ7NxaJq/NvZfN1fnp+fat9p32veaqbW019ob7Uy71KD2h/an9pf2984/O//tPt19trQ+2VqN+UYrXLu7/wN8SkvI</latexit>

Gt =T-1X

k=t

�k-trk+1

Sample trajectory

Gradient ascent

<latexit sha1_base64="XGafOMVHScycbbFGjl+Cs9TqUS0=">AAAOnnicfZfbbts2HMbl7tRlq5dsd+sNsaBAt6WBlDg+XASoJdvrxdJkRU5blBoUTcuCKZGgKMeupss95J5hLzFKPulo3Zjm9/HTj3+JAmkx4vhCVf+tPfvs8y++/Or513vffPui/t3+wfe3Pg04wjeIEsrvLehj4nj4RjiC4HvGMXQtgu+sqRHrdzPMfYd612LB8KMLbc8ZOwgK2TXc/8e0xAQLCEyCxwJyTp/AuutXYCJGQhMSNoERMP3AHYbiXI0+htdvtMgcOT6igSc+ChPZJPwtGgrTgxaBw3AVIQcRagOTOduu1yZEyb3F36YvoMBD8fNw/1A9VpMLFBvaqnGorK6r4cGLmjmiKHCxJxCBvv+gqUw8hpALBxEc7ZmBjxlEU2jjh0CM24+h47FAYA9F4JXUxgEBgoK4JGDkcIwEWcgGRNyRCQBNIJeYsnB72Sgfe9DF/tFo5jB/2fRn9rIh5NzxYzhPnkqUGRjaHLKJg+YZshC6vgvFpNDpL1wr24kDgvnMzXbGlJIx55xjjhw/rsGVLMwli4vtX9OrlT5ZsAn2/CgMOInSA6WAOcdjOTBp+lgELEwmI9+uqX8ueICP4mbSd96DfPoBj45kTqYjizMmFIpsl+XGxfHwE6KuC71RaLIoNAWei9A8Oo6k+ArItwhNLQr5KOv8EIWhGdfMssAHac2I71Pi+7zYT4n91U0oGYEx5WAmnz/lPpBGIC3cQdjPjr7ZjB6Dm3z0bUq8zYt3KfEuL1pBSg0K6iylzgrqU0p9yqvzlDjPi4uUuMiLn1Lip7x4vyv2z12xf+ViZf3loljsmSM8lh+v5BUKpygK311f/B6FneRavQsBBlrWiKy18XTQbHVaUV4ma70xaGt6r6hvDC29qxllho2j2zLUXn/LcpLzbqBVtdnvNvNRiGz1dscYFPUtrNpt9fQSw5Z2YDT6rVX5MPZyVnvjMzrNRqEs9iano+v6Waeobwx6wzDaJyWGjcPo9XpdI0FhAWcE57xsbWw2z9RiFNsEtdVmo1uibx+A2tb1AixLsegDXetpCYvAkOScYvOyGO1up58PEtv6613DKDxAkSp/29B6jRLDlvWsd9Y/TUgoh56drwrdlK/Zap+281F0EzRoySdYYKHbOw06utoqsNAUy0A39Liw2ZUoF9mD9hguP7nbdQcONRBFebNcaHlzvPbW5pyXlJhJtbvUvstfPoBUwhdnilBVPCoJR5UwqIwFVcOjUni0C94u+u2qeLsk3K6EsctY7Gp4uxTe3gXPin5WFc9KwlklDCtjYdXwrBSe7YIXRb+oihcl4aISRpSxiGp4UQovdsHTop9WxdOScFoJQ8tYaDU8LYWnO+A5fpJbvninEJ8oeCFyeXRIdERCWNCTA0UiUxL6BXl1BJG65Uqm5E/esz7VJCnx8ceGUomWOxYWHzAgAfEGnRLgeKs9TObzyuKhUyS3tUv3cpo9LA8qHF/IHdCl3F1DueH8RU6I264jJyR/zaO4tcsI52ujbO3JQ5OWPyIVG7cnx9rZsfpH4/Dtxer49Fx5qfykvFY0paW8Vd4pV8qNgpT/age1H2sv66A+qF/UL5fWZ7XVmB+UzFW//x+/k2c9</latexit>

✓ ✓+ ↵T-1X

t=0

�tGtr✓ log⇡✓(at|st)

We will first look at the simplest method to estimate the policy gradient: The REINFORCE algorithm.

First, we define G_t, the total discounted reward received after timestep t. This quantity will turn out to be very useful in future algorithms! Note that the discounting also happens from timestep t onward: the reward r_{t+1} is not discounted, because \gamma^{k-t}=\gamma^{t-t}=1

The REINFORCE algorithm start by sampling a trajectory from the agent interacting with the environment. This happens just like how we sampled a trajectory when estimating the expectation of the return.

Then, we estimate the gradient using the REINFORCE estimate. This might look a bit like magic! Where does it come from? We’ll derive it in the next few slides to ensure it’s not some equation thrown out of nowhere :)

This estimate gradient is finally used in the gradient ascent step.

Page 12: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

REINFORCE ALGORITHM FOR PYTHON

33

<latexit sha1_base64="J8Jiii67GqrDFbFcaX68oWlX93E=">AAAOdHicfZdNb9s2HMaV7q3Llizdjr0ICwJ0QxZIieOXQ4Basr0e1jYL8tI2Sg2KpmXBlEiQlGNX03fb19gX2HW77D5KtmW9WpcwfB4+/vEvUiBtil0uNO2vnSefff7Fl189/Xr3m2/39r87ePb9LScBg+gGEkzYOxtwhF0f3QhXYPSOMgQ8G6M7e2rG+t0MMe4S/1osKHrwgOO7YxcCIbuGBx8sHnjDUFxo0cfw+hc9Uq2RyyEJfPFRqBZ0cPhrNJQtH9gYDC1bTJAAqoWJo1rUXXe8sABMAsUfFhdAoKH4aXhwqJ1oyaOWG/qqcaisnsvhs70da0Rg4CFfQAw4v9c1Kh5CwIQLMYp2rYAjCuAUOOg+EOP2Q+j6NBDIh5F6JLVxgFVB1Hie6shlCAq8kA0AmSsTVDgBTGLKauzmozjygYf48WjmUr5s8pmzbAg5b/QQzpNSR7mBocMAnbhwniMLgcc9ICalTr7w7HwnCjBiMy/fGVNKxoJzjhh0eVyDS1mYtzQuNr8mlyt9sqAT5PMoDBiOsgOlgBhDYzkwaXIkAhomk5FLZsovBAvQcdxM+i56gE2v0OhY5uQ68jhjTIDId9leXBwfPULiecAfhRaNQkuguQit45NIikeqXEFwahPARnnnVRSGVlwz21avpDUnvsmIb4piPyP2Vz9C8EgdE6bO5PsnjKvSqEoLcyHi+dE36eixelOMvs2It0XxLiPeFUU7yKhBSZ1l1FlJfcyoj0V1nhHnRXGRERdF8VNG/FQU322Lfb8t9kMhVtZfborFrjVCY/lFSpZQOIVR+Or69W9R2Eme1VoIkKrnjdBeG88GzVanFRVlvNYbg7Zu9Mp6amgZXd2sMqSObsvUev0Ny2nBm0JrWrPfbRajIN7o7Y45KOsbWK3b6hkVhg3twGz0W6vyIeQXrE7qMzvNRqksTprTMQzjvFPWU4PRMM32aYUhdZi9Xq9rJig0YBSjgpeujc3muVaOomlQW2s2uhX65gVobcMowdIMizEw9J6esAgEcMEp0sVitrudfjFIbOpvdE2z9AJFpvxtU+81Kgwb1vPeef8sISEM+E6xKiQtX7PVPmsXo0gaNGjJN1hiIZtfGnQMrVViIRmWgWEacWHzO1Fusnv9IVx+cjf7Tj3U1SgqmuVGK5rjvbc2F7y4wozr3ZX2bf7qAbgWvjxTCOviYUU4rIWBVSywHh5WwsNt8E7Z79TFOxXhTi2MU8Xi1MM7lfDONnha9tO6eFoRTmthaBULrYenlfB0G7wo+0VdvKgIF7UwoopF1MOLSnixDZ6U/aQunlSEk1oYUsVC6uFJJTzZAs/QozzyxSeF+DrBSpHLq0OiQxyCkp5cKBKZ4JCX5OUNJNZtTzIl/xQ96ztNkkJxaDlAKtHyxELjCwbAanxAJ1h1/dUZJvd5pfHQKZTH2qV7Oc0ekhcVhl7LE9BbeboG8sD5s5wQczxXTkj+tY7j1jYjmK+NsrUrL0168YpUbtydnuiNE13/vXH4srm6Pz1Vnis/Ki8UXWkpL5VXyqVyo0DlT+Vv5R/l373/9p/vH+4fLa1PdlZjflByz/7J/+PmWH8=</latexit>

T-1X

t=0

�tGtr✓ log⇡✓(at|st)

1 discount = 0.9999 2 sgd = SGD(NN.parameters(), lr=1e-4) 3 for tau in range(100000): 4 s = env.reset() 5 sgd.zero_grad() 6 R, log_pi_as = [], [] 7 for t in range(100000): 8 pi = Categorical(logits=NN(s)) 9 a = pi.sample() 10 s, r, terminal, _ = env.step(a) 11 R.append(r) 12 log_pi_as.append(pi.log_prob(a)) 13 if terminal: 14 break 15 surrogate_l = 0. 16 for t in range(len(log_pi_as)): 17 Gt = 0. 18 for t’ in range(t, len(R)): 19 Gt += discount**(t’ - t) * R[t’] 20 surrogate_l += discount**t * Gt * log_pi_as[t] 21 (-surrogate_l).backward()

Sample a trajectory

Update parameters

Initialize

<latexit sha1_base64="YBuoKOy9cK4UEyGD6qqBudmOzR4=">AAAOV3icfZdNc+M0HMbdBZZSaGnhyMVDZwcGuh27TfNy6MzGTsIe2N3S6dtSdzuyojieyJZGktNkPf40fBqucIEvwyA7iePX+FJVz6MnP/1leSSbYpcLTft369knn372/PPtL3a+/Gp37+v9g29uOAkYRNeQYMLubMARdn10LVyB0R1lCHg2Rrf2xIz12yli3CX+lZhT9OABx3dHLgRCdj3un1vQweEv0aNQz1WLB95jKH44F9GH8OqlHqnW0OWQBL74ILtfCtnB0BNgw9j1sx497h9qx1ryqOWGvmwcKsvn4vFgd8saEhh4yBcQA87vdY2KhxAw4UKMoh0r4IgCOAEOug/EqP0Quj4NBPJhpL6Q2ijAqiBqPBV16DIEBZ7LBoDMlQkqHAMGoJAT3slHceQDD/Gj4dSlfNHkU2fREEBW6yGcJdWMcgNDhwE6duEsRxYCj3tAjEudfO7Z+U4UYMSmXr4zppSMBecMMejyuAYXsjDvaLxA/IpcLPXxnI6Rz6MwYDjKDpQCYgyN5MCkyZEIaJhMRr4VE34uWICO4mbSd94DbHKJhkcyJ9eRxxlhAkS+y/bi4vjoCRLPA/4wtGgUWgLNRGgdHUdSfKHaWJptIl+RvPMyCkMrrpltq5fSmhPfZsS3RbGfEfvLHyF4qI4IU6dy/QnjqjSq0sJciHh+9HU6eqReF6NvMuJNUbzNiLdF0Q4yalBSpxl1WlKfMupTUZ1lxFlRnGfEeVH8mBE/FsW7TbHvN8X+XoiV9ZebYr5jDdFIfnSSVyicwCh8ffXm1yjsJM/yXQiQqueN0F4ZTwfNVqcVFWW80huDtm70ynpqaBld3awypI5uy9R6/TXLScGbQmtas99tFqMgXuvtjjko62tYrdvqGRWGNe3AbPRby/Ih5BesTuozO81GqSxOmtMxDOOsU9ZTg9EwzfZJhSF1mL1er2smKDRgFKOCl66MzeaZVo6iaVBbaza6Ffp6AbS2YZRgaYbFGBh6T09YBAK44BTpy2K2u51+MUis6290TbO0gCJT/rap9xoVhjXrWe+sf5qQEAZ8p1gVkpav2WqftotRJA0atOQKlljI+pcGHUNrlVhIhmVgmEZc2PxOlJvsXn8IF5/c9b5TD3U1iopmudGK5njvrcwFL64w43p3pX2Tv3oAroUvzxTCunhYEQ5rYWAVC6yHh5XwcBO8U/Y7dfFORbhTC+NUsTj18E4lvLMJnpb9tC6eVoTTWhhaxULr4WklPN0EL8p+URcvKsJFLYyoYhH18KISXmyCJ2U/qYsnFeGkFoZUsZB6eFIJTzbAL24F8Ukhvk6wUqQ8k8vTbKJDHIKSzgUQKJEJDnlJtsUYCRDrtieZkn+KntVNJUmhOLQcIJVocWKh8QUDYDU+oBOsuv7yDJP7vNJ46ATKY+3CvZhmD8mLCkNv5AnonTxdA3ng/ElOiDmeKyck/1pHcWuTEcxWRtnakZcmvXhFKjduT471xrGu/9Y4fNVc3p+2le+U75UfFV1pKa+U18qFcq1A5Q/lT+Uv5e/df3b/23u+t72wPttajvlWyT17B/8D/qxMdg==</latexit>

Gt =T-1X

t0=t

�t0-trt0+1REINFORCE update

Here, we see some python code for a simple REINFORCE implementation. It uses PyTorch and the OpenAI Gym library, which implements many RL environments to play around with.

In the first loop (line 3), we loop over the trajectories we sample. In line 4, we sample the initial state s from the environment.

Next, we start looping over the timesteps (line 7). We use the NN to create a categorical distribution over actions from the state s. This is the policy distribution \pi_\theta(a|s)! From this distribution, we sample the next action a to perform. We then use env.step(a) to transition to the next state s, and we receive a reward r in the process. We save the reward and the log-probability of the action. We also get back a boolean terminal which tells us if the sampled state is a terminal state. If so, we break the loop and go to the REINFORCE update.

In the next lines, we compute the REINFORCE update. For each timestep, we compute a "loss" component, which is often called the surrogate loss. Don’t consider this loss as a meaningful quantity: We don’t want to maximize the loss, we want to maximize the expected reward! The REINFORCE update is straightforwardly computed, just like the equation suggests, with a loop in the sums.

Policy gradient is the gradient of expected return REINFORCE

• Simplest method to approximate policy gradient • General and unbiased :) • Very high variance! :(

• Not sample efficient Next part: General gradient estimation

REINFORCE

34

Recap: We discussed the policy gradient, which is the gradient with respect to the policy parameters of the expected return.

The simplest policy gradient algorithm is REINFORCE, which is general and unbiased, but has very high variance. In the next part, we will discuss gradient estimation in general, and will also discuss what it means for an estimator to be unbiased and to have high variance.

Policy gradient methods: Estimate gradient of expected return

Gradient estimation: Estimate gradient of any expectation

GRADIENT ESTIMATION

35

<latexit sha1_base64="IVji3a96ysOSHc8GAl0AbhpmMj0=">AAAPQXicfZdPb9s2GMbV7l+XrVm7HXcRFhTohiCwUsd/DgVqyTZ6WNssS5puURBQNK0IoUSCpBy7hD7Qvsa+wK7rBxiw47DrLqMUWZYoybqE4fPw8U8vJZuvR3HARafz4d79jz7+5NPPHny+88WXD3e/evT467ecxAyiM0gwYe88wBEOInQmAoHRO8oQCD2Mzr0bJ9XPF4jxgESnYkXRZQj8KJgHEAg1dfXIcQHzQ7C8kq4nrpEAiemGQFx7npwkV5Je5dNPXQ5CitH3yYULfSznSTFzefVor3PQyS6zPrDywZ6RX8dXjx/+5c4IjEMUCYgB5xdWh4pLCZgIIEbJjhtzRAG8AT66iMV8cCmDiMYCRTAxnyhtHmNTEDO9IXMWMAQFXqkBgCxQCSa8BgxAoW57pxrFUQRCxPdni4DyuyFf+HcDAVTNLuUyq2lSWSh9Buh1AJcVMglCnpaqNslXoVedRDFGbBFWJ1NKxag5l4jBgKc1OFaFeUPTbeKn5DjXr1f0GkU8kTHDSXmhEhBjaK4WZkOORExldjPq2bjhzwWL0X46zOaejwG7OUGzfZVTmajizDEBojrlhWlxInQLSRiCaCZdmkhXoKWQ7v5BosQnpoeV2SOAzarOk0TK/PEyT5S1Ir4uia91cVISJ/mHEDwz54SZC7X/hHFTGU1lYQFEvLr6rFg9N8/06Lcl8a0unpfEc1304pIa19RFSV3U1NuSequry5K41MVVSVzp4vuS+F4X322L/WVb7K9arKq/eilWO+4MzdVXT/YIyRuYyJenr35M5DC78mchRqZVNUJvbXw27fWH/USX8VrvTgeWPa7rhaFvjyynyVA4Rn2nM55sWA41bwHd6fQmo54eBfFGHwydaV3fwHZG/bHdYNjQTp3upJ+XD6FIs/qFzxn2urWy+EXO0Lbto2FdLwx213EGhw2GwuGMx+ORk6HQmKkvcs1L18Ze76hTj6JF0KDT644a9M0GdAa2XYOlJRZ7altjK2MRCGDNKYqHxRmMhhM9SGzqb48cp7aBolT+gWONuw2GDevR+GjyLCMhDES+XhVSlK/XHzwb6FGkCJr21Q7WWMjmk6ZDu9OvsZASy9R27LSw1TdRvWQX1qW8+8rdvHfmnmUmiW5WL5puTt+9tVnz4gYzbnc32rf5mxfgVvj6nULYFg8bwmErDGxige3wsBEeboP3636/Ld5vCPdbYfwmFr8d3m+E97fB07qftsXThnDaCkObWGg7PG2Ep9vgRd0v2uJFQ7hohRFNLKIdXjTCi23wpO4nbfGkIZy0wpAmFtIOTxrhyRZ4hm7VkS89KaQtAqtFqjO5Os1mOsQS1HQugECZTLDkNTlvUJTuhYop+6fuAXHhUMOajihf62oYYMWje2YBhySOREZCsXR9oJSkOPXMAtW3mIiLIMy6KO0msq5ofZPqPKbH+3y+OUxJX3VZLlEHdqDOsGknIn+eJvU1dLZ1zfE4yflo2kQBbKZNCMFmEOXntMpPCE3DbqA6ut+577ZyjFQzxtAr9SFv8vAfZNYfBmrT1F93Px1tM4Ll2qhGO6oxtPQ2sD44PzywugeW9VN378WrvEd8YHxrfGc8NSyjb7wwXhrHxpkBjd+MP4w/jQ+7v+/+vfvP7r931vv38jXfGJVr97//AbCVsRk=</latexit>

argmax✓

Ep✓(z)[f(z)]

In this final part of the lecture, we will be discussing gradient estimation in general. As we have seen in the previous part, policy gradient methods estimate the gradient of the expected return in an MDP. Gradient estimation as a field generalizes this: We want to estimate the gradient of any expectation, and not just over MDPs. This allows us to optimize problems of the form of the formula.

Page 13: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Derivative of expectation:

Continuous distributions:

Discrete distributions:

GRADIENT ESTIMATION

36

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

✓<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

Stochastic node

<latexit sha1_base64="XUoEPgQbWwxoFDiX98EqCDobYLY=">AAAPPnicfVfLbuM2FNVMX9O0k2baZTdCgwGmRRBIiePHYoBYso1ZNDPpdPJooyCgaFoRTIkESTn2CPqd/kZ/oNsW6AfMsui2y1KyLUuUZG1yw3N4dHhJyve6FPtcGMbfjx5/9PEnn3725POdL758uvvV3rOvLzmJGEQXkGDCrl3AEfZDdCF8gdE1ZQgELkZX7tRO8asZYtwn4TuxoOg2AF7oT3wIhBy62zt1QuBicOe44h4JoDsBEPeuGw+Tu5iuR184HAQUo++TGwd6OJ4k+cjt3d6+cWhkj14NzFWwr62e87tnTz84YwKjAIUCYsD5jWlQcRsDJnyIUbLjRBxRAKfAQzeRmHRvYz+kkUAhTPTnEptEWBdET1ejj32GoMALGQDIfKmgw3vAABRyzTtlKY5CECB+MJ75lC9DPvOWgZBJQLfxPEtoUpoYewzQex/OS85iEPA0VZVBvgjc8iCKMGKzoDyYupQeFeYcMejzNAfnMjFvaLpH/B05X+H3C3qPQp7EEcNJcaIEEGNoIidmIUcionG2GHkwpvylYBE6SMNs7OUAsOlbND6QOqWBsp0JJkCUh9wgTU6IHiAJAhCOY4cmsSPQXMTOwWEiwee6PE5w6hLAxmXm2ySOV8dLfyupJfB1AXytgsMCOFy9hOCxPiFMn8n9J4zrkqhLCvMh4uXZF/nsiX6hSl8WwEsVvCqAVyroRgU0qqCzAjqroA8F9EFF5wVwroKLArhQwfcF8L0KXm+T/WWb7K+KrMy/vBSLHWeMJvK7kx2heAqT+NW7sx+TuJc9q7MQId0sE6G7Jh6P2p1eJ1FhvMZbo65pDap4TuhYfdOuI+SMfsc2BsONlyOFm5s2jPaw31alIN7g3Z49quIbs0a/M7BqCBu3I7s17KzSh1CoUL2cZ/farUpavFynZ1nWSa+K5wSrZdvdoxpCzrAHg0HfzqzQiMkPucKla2K7fWJUpWgu1DXarX4NvtkAo2tZFbO04MUaWebAzLwIBLDCFPlhsbv93lAVEpv8W33brmygKKS/a5uDVg1h4/VkcDI8zpwQBkJPzQrJ09fudI+7qhTJhUYduYMVL2TzplHPMjoVL6TgZWTZVprY8k2Ul+zGvI2Xn9zNvdP3TT1JVLK8aCo5vXtrssLFNWTczK6lb+PXT8CN5qsrhbBJHtaIw0YzsM4LbDYPa83Dbea9Kt9rkvdqxL1GM16dF6/ZvFdr3ttmnlb5tEme1ojTRjO0zgttNk9rzdNt5kWVL5rkRY24aDQj6ryIZvOi1rzYZp5U+aRJntSIk0YzpM4LaTZPas2TLeYZepAlX1oppC0Cq0jKmlxWsxkOcQwqOBdAoAwmOOYVeNmJpLgbSE/ZP1UOiHKGDCs4onyNy9DH0o/KGfsckigUmROKY8cDEknyqmfsy75FR1z4QdZCKYvIuqL1ImU9psp7fLIppmJPdlkOkQU7kDVs2onEP4+S6hw63jrnfJCs/NG0iQJYT5sQgnU/XNVppZ8QmopNoSzdl+zlVg6QbMYYOpMvebMS/0FuGvMCX26a/OscpNE2IpiviTLakY2hqbaB1eDq6NBsHZrmT63907NVj/hE+1b7TnuhmVpHO9VeaefahQa137Q/tD+1v3Z/3/2w+8/uv0vq40erOd9opWf3v/8BLCWvgQ==</latexit>

r✓Ep✓(z)[f(z)]

<latexit sha1_base64="pe+CJofaq4h40W6t1CBpd7g06r4=">AAAPPnicfZdPb9s2GMbV7l+XrVm7HXcRFhTohiCQWsd/DgVqyTZ6WNusa5puVRBQNK0IpkSCpBy7gr7Ovsa+wK4bsA/Q47DrjqNkW5ZIybrkNZ+Hj396KTmkT3HIhWX9fev2Rx9/8ulndz4/+OLLu4df3bv/9RtOEgbROSSYsLc+4AiHMToXocDoLWUIRD5GF/7czfWLBWI8JPFrsaLoMgJBHM5CCIQcurr31IuBj8GV54trJIDphbEw6fbjQ4+DiGL0venBAKezbDcw3VRX946sE6u4TL2wN8WRsbnOru7f/eBNCUwiFAuIAefvbIuKyxQwEUKMsgMv4YgCOAcBepeIWf8yDWOaCBTDzHwgtVmCTUHM/G7MacgQFHglCwBZKBNMeA0YgELe80E9iqMYRIgfTxch5euSL4J1IWQT0GW6LBqa1SamAQP0OoTLGlkKIh4Bca0N8lXk1wdRghFbRPXBnFIyKs4lYjDkeQ/OZGNe0nyN+GtyttGvV/QaxTxLE4az6kQpIMbQTE4sSo5EQtPiZuSDMedPBEvQcV4WY09GgM1foemxzKkN1HFmmABRH/KjvDkxuoEkikA8TT2apZ5AS5F6xyeZFB+Y8nGCc58ANq07X2Vp6uU9833zlbTWxBcV8YUqjiviePMlBE/NGWHmQq4/YdyURlNaWAgRr88+L2fPzHM1+k1FfKOKFxXxQhX9pKImmrqoqAtNvamoN6q6rIhLVVxVxJUqvq+I71Xx7b7YX/bF/qrEyv7Ll2J14E3RTP7uFI9QOodZ+uz18x+zdFBcm2chQaZdN0J/a3w86fYGvUyV8VbvTPq2M9L10tBzhrbbZCgdw55rjcY7lkeKt4S2rO542FWjIN7p/YE70fUdrDXsjZwGw4524nbGvU37EIoVa1D63EG3o7UlKHMGjuOcDnS9NDgd1+0/ajCUDnc0Gg3dAoUmTP6MK166NXa7p5YeRcugvtXtDBv03QJYfcfRYGmFxZk49sguWAQCWHGK8mFx+8PBWA0Su/47Q9fVFlBU2t937VGnwbBjPR2djh8XJISBOFC7Qsr2dXv9x301ipRBk55cQY2F7L5pMnCsnsZCKiwTx3XyxtbfRPmSvbMv0/VP7u69M49sM8tUs3zRVHP+7m3Nihc3mHG7u9G+z988AbfC63cKYVs8bAiHrTCwiQW2w8NGeLgPPtD9QVt80BAetMIETSxBO3zQCB/sg6e6n7bF04Zw2gpDm1hoOzxthKf74IXuF23xoiFctMKIJhbRDi8a4cU+eKL7SVs8aQgnrTCkiYW0w5NGeLIHnqEbueXLdwr5UYFpkXJPLnezhQ5xCjSdCyBQIROcck1eH0hy3Y8kU/FB94CkdMhS0xHlW12WIZY8qmcackiSWBQkFKdeAKSSlbueaSjPLSbiIoyKI5RyE8WZaHuTcj+mxgd8tttMpUF2lXpEbtiB3MPmJ5H050mmz6HTvXPORtmGj+aHKIDN/BBCsBnGm31a7V8IzcPmUG7d1+71Uo6QPIwx9Fx+yctN+A9y0VgQhXLR5F/vOK/2GcFya5TVgTwY2uoxUC8uHp3YnRPb/qlz9PT55ox4x/jW+M54aNhGz3hqPDPOjHMDGr8Zfxh/Gn8d/n744fCfw3/X1tu3NnO+MWrX4X//A19wrrQ=</latexit>

r✓

Zp✓(z)f(z)dz

<latexit sha1_base64="3Sqeikh9CUHQFJkEOGIjtYba2mc=">AAAPPnicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIreOXQ4Faso0e1jbrmqZbFQQUTSuCKZEgKceuoK+zr7EvsOsG7AP0OOy64yjZliVSsi5h+Dx89NOfokz6FIdcWNbft25/9PEnn3525/ODL768e/jVvftfv+EkYRCdQ4IJe+sDjnAYo3MRCozeUoZA5GN04c/dXL9YIMZDEr8WK4ouIxDE4SyEQMiuq3tPvRj4GFx5vrhGApgeT6Irj4OIYmTSbffDTc/3pgcDnM6yXcfVvSPrxCouU2/Ym8aRsbnOru7f/eBNCUwiFAuIAefvbIuKyxQwEUKMsgMv4YgCOAcBepeIWf8yDWOaCBTDzHwgtVmCTUHM/GnMacgQFHglGwCyUCaY8BowAIV85oN6FEcxiBA/ni5CytdNvgjWDSGLgC7TZVHQrDYwDRig1yFc1shSEPEIiGutk68iv96JEozYIqp35pSSUXEuEYMhz2twJgvzkuZzxF+Ts41+vaLXKOZZmjCcVQdKATGGZnJg0eRIJDQtHka+GHP+RLAEHefNou/JCLD5KzQ9ljm1jjrODBMg6l1+lBcnRjeQRBGIp6lHs9QTaClS7/gkk+IDU75OcO4TwKZ156ssTb28Zr5vvpLWmviiIr5QxXFFHG9uQvDUnBFmLuT8E8ZNaTSlhYUQ8fro83L0zDxXo99UxDeqeFERL1TRTypqoqmLirrQ1JuKeqOqy4q4VMVVRVyp4vuK+F4V3+6L/WVf7K9KrKy/XBSrA2+KZvK7U7xC6Rxm6bPXz3/M0kFxbd6FBJl23Qj9rfHxpNsb9DJVxlu9M+nbzkjXS0PPGdpuk6F0DHuuNRrvWB4p3hLasrrjYVeNgnin9wfuRNd3sNawN3IaDDvaidsZ9zblQyhWrEHpcwfdjlaWoMwZOI5zOtD10uB0XLf/qMFQOtzRaDR0CxSaMPkhV7x0a+x2Ty09ipZBfavbGTbouwmw+o6jwdIKizNx7JFdsAgEsOIU5cvi9oeDsRokdvV3hq6rTaColL/v2qNOg2HHejo6HT8uSAgDcaBWhZTl6/b6j/tqFCmDJj05gxoL2d1pMnCsnsZCKiwTx3XywtZXolxk7+zLdP3J3a0788g2s0w1y4WmmvO1tzUrXtxgxu3uRvs+f/MA3AqvPymEbfGwIRy2wsAmFtgODxvh4T74QPcHbfFBQ3jQChM0sQTt8EEjfLAPnup+2hZPG8JpKwxtYqHt8LQRnu6DF7pftMWLhnDRCiOaWEQ7vGiEF/vgie4nbfGkIZy0wpAmFtIOTxrhyR54hm7kli/fKeRHBKZFyj253M0WOsQp0HQugECFTHDKNXl9EMl1P5JMxT+6BySlQzY1HVG+1WUzxJJH9UxDDkkSi4KE4tQLgFSyctczDeW5xURchFFxhFIeojgVbR9S7sfU+IDPdpupNMiuUo/IDTuQe9j8JJL+PMn0MXS6d8zZKNvw0fwQBbCZH0IINsN4s0+r/YTQPGwO5dZ97V5P5QjJwxhDz+VNXm7Cf5CTxoIolJMm/3rHeWufESy3Rtk6kAdDWz0G6o2LRyd258S2f+ocPX2+OSPeMb41vjMeGrbRM54az4wz49yAxm/GH8afxl+Hvx9+OPzn8N+19fatzZhvjNp1+N//t3CuuQ==</latexit>

r✓

X

z

p✓(z)f(z)

We want to find the derivative of an expectation. This is represented using the graph on the right: We use the parameter \theta to form a distribution p_\theta. Then, we sample z from this distribution. This is represented by the tilde (~) operation. Finally, z is used in the function f to optimize. The problem? There is no differentiable path from the function f to the parameters theta, so we cannot directly optimize this. This is because sampling isn’t differentiable, and because the function f might also not be!

The derivative of an expectation can mean different thing, depending on whether the distribution p_\btheta is a continuous or a discrete distribution. For continuous distributions, we compute an integral over all options of z, while for discrete distributions, it is a sum over all options. Both are normalized by the probability density.

Recall score function:

• All distributions • All functions • But very high variance

<latexit sha1_base64="/mpHnbc6UrcMwHv01CGcLvhpgb8=">AAAPEXicfZfbbts2HMbV7tRla5aul7sRFhTohiCQGseHiwK1ZBu9WNosy6FbHQQUTStCKJEgKceuoJfYC+x2e4NdDrvdE+wB9h6jZFuWSMm6CcPv46ef/hRl0qM44MKy/n3w8KOPP/n0s0ef73zx5ePdr/aefH3JScwguoAEE/bOAxzhIEIXIhAYvaMMgdDD6Mq7czP9aoYYD0h0LhYUXYfAj4JpAIGQXTd7T+nN2BO3SIDnYw5CitF35s3evnVo5ZepN+xVY99YXac3Tx7/N54QGIcoEhADzt/bFhXXCWAigBilO+OYIwrgHfDR+1hMu9dJENFYoAim5jOpTWNsCmJmgOYkYAgKvJANAFkgE0x4CxiAQj7GTjWKowiEiB9MZgHlyyaf+cuGALIG18k8r1FaGZj4DNDbAM4rZAkIeQjErdbJF6FX7UQxRmwWVjszSsmoOOeIwYBnNTiVhXlLs7Lzc3K60m8X9BZFPE1ihtPyQCkgxtBUDsybHImYJvnDyLm+4y8Fi9FB1sz7Xg4AuztDkwOZU+mo4kwxAaLa5YVZcSJ0D0kYgmiSjGmajAWai2R8cJhK8ZnpYWn2CGCTqvMsTZJxVjPPM8+ktSK+KYlvVHFYEoermxA8MaeEmTM5/4RxUxpNaWEBRLw6+qIYPTUv1OjLknipilcl8UoVvbikxpo6K6kzTb0vqfeqOi+Jc1VclMSFKn4oiR9U8d222J+3xf6ixMr6y0Wx2BlP0FR+SvJXKLmDafL6/OSHNOnl1+pdiJFpV43QWxuPRu1Or5OqMl7rrVHXdga6Xhg6Tt926wyFo99xrcFww/JC8RbQltUe9ttqFMQbvdtzR7q+gbX6nYFTY9jQjtzWsLMqH0KRYvULn9trt7Sy+EVOz3Gc456uFwan5brdFzWGwuEOBoO+m6PQmMkPueKla2O7fWzpUbQI6lrtVr9G30yA1XUcDZaWWJyRYw/snEUggBWnKF4Wt9vvDdUgsam/03ddbQJFqfxd1x60agwb1uPB8fAoJyEMRL5aFVKUr93pHnXVKFIEjTpyBjUWsrnTqOdYHY2FlFhGjutkha2uRLnI3tvXyfKTu1l35r5tpqlqlgtNNWdrb21WvLjGjJvdtfZt/voBuBFef1IIm+JhTThshIF1LLAZHtbCw23wvu73m+L9mnC/EcavY/Gb4f1aeH8bPNX9tCme1oTTRhhax0Kb4WktPN0GL3S/aIoXNeGiEUbUsYhmeFELL7bBE91PmuJJTThphCF1LKQZntTCky3wDN3LLV+2U5BvV8K0SLknl7vZXIc4AZrOBRAolwlOuCYvzx2Z7oWSKf9H94C4cMimpiPK17psBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4qUh8hPReuHlPsxNd7n081mKvHTm2RM5IYdyD1sdhJJfhql+hg62TrmdJCu+Gh2iALYzA4hBJtBtNqnVX5CaBZ2B+XWfeleTuUAycMYQyfyJm9X4d/LSWN+GMhJk3/HB1lrmxHM10bZ2pEHQ1s9BuqNqxeHduvQtn9s7b86WZ0RHxnfGN8azw3b6BivjNfGqXFhQGNh/Gb8bvyx++vun7t/7f69tD58sBrz1Khcu//8DznLm/0=</latexit>

p✓(z)<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

SCORE FUNCTION

37

<latexit sha1_base64="1g6JSjsblpN44H7bUqU2JV+O2LA=">AAAPynichVfdbts2GHW6vy5bl3a73I2woEU3BIGUOv65KFBLtlFga5t1TZOtCgKKphXBlMiRlGNX0N0eby+wB9h7jFJkWSIlTzf+xHN4ePiRlPl5FAdcmOY/e/c++fSzz7+4/+X+V18/+Obg4aNv33MSM4jOIcGEXXqAIxxE6FwEAqNLyhAIPYwuvIWT4RdLxHhAondiTdFVCPwomAcQCNl0/fDvJ24EPAyuXU/cIAEMFxPfoJvXpy4HIcXoR8N1958/cecMwMRd0DZamoMN7YYyjE7JBzCKEf6PnCZNY1w/PDSPzfwx9MAqgsNO8ZxdP3rwrzsjMA5RJCAGnH+wTCquEsBEADFK992YIwrgAvjoQyzmg6skiGgsUART47HE5jE2BDGyzBqzgCEo8FoGALJAKhjwBsjpCJn//boURxEIET+aLQPK70K+9O8CIeeOrpJVvrhprWPiM0BvAriqOUtAyEMgbrRGvg69eiOKMWLLsN6YuZQeFeYKMRjwLAdnMjFvaLZf+DtyVuA3a3qDIp4mMcNptaMEEGNoLjvmIUcipkk+GblJF/y5YDE6ysK87fkYsMVbNDuSOrWGup05JkDUm7wwS06EbiEJQxDNEpfK7SfQSiTu0XEqwceG3EVw4RHAZnXm2zRJ3Cxnnme8ldQa+LoCvlbBSQWcFIMQPDPmhBlLuf6EcUMSDUlhAUS83vu87D03zlXp9xXwvQpeVMALFfTiChpr6LKCLjX0toLequiqAq5UcF0B1yr4sQJ+VMHLXbK/75L9Q5GV+ZeHYr3vztBcfgPzLZQsYJq8fPfqlzQZ5k+xF2JkWHUi9DbEZ9Nef9hPVRhv8O50YNljHS8JfXtkOU2EkjHqO+Z4svVyonBL06bZm4x6qhTEW3wwdKY6vjVrjvpju4GwdTt1upN+kT6EIoXqlzxn2OtqafFLnaFt26dDHS8JdtdxBicNhJLhjMfjkZNboTGTX3KFSzfEXu/U1KVoKTQwe91RA75dAHNg25pZWvFiT21rbOVeBAJYYYpysziD0XCiColt/u2R42gLKCrpHzjWuNtA2Ho9HZ9OnuVOCAORr2aFlOnr9QfPBqoUKYWmfbmCmheyHWk6tM2+5oVUvExtx84SWz+J8pB9sK6Su0/u9twZh5aRpipZHjSVnJ29DVnh4gYybmc30nfxmzvgVvP6TCFsk4cN4rDVDGzyAtvNw0bzcJd5X+f7bfJ+g7jfasZv8uK3m/cbzfu7zFOdT9vkaYM4bTVDm7zQdvO00TzdZV7ofNEmLxrERasZ0eRFtJsXjebFLvNE55M2edIgTlrNkCYvpN08aTRPdphn6FZe+bKbgtxdCdMk5Z1c3mZzHOIEaDgXQKAcJjjhGnxXeGS4F0pP+YvOAXHJkKGGI8o3uAwDLP2onFnAIYkjkTuhOHF9IJG0vPXMAlm3GIiLIMzLOWUSeVm0maS8j6nyPp9vL1OJn14nLpEXdiDvsFklkvw2TfU+dLazz9k4LfzRrIgC2MiKEIKNICruabW/EJqJLWTBV7DvlnKMZDHG0Cs5yJtC/Ce5aMwPA7lo8tc9yqJdRLDaEGW0LwtDSy0D9eDi5NjqHlvWr93DF6+KGvF+5/vOD52nHavT77zovOycdc47cO9k73IP7HkHPx/8ebA+SO6o9/aKPt91as/BX/8BsGTeKw==</latexit>

r✓ log p✓(z)

=@ log p✓(z)

@p✓(z)r✓p✓(z)

=r✓p✓(z)

p✓(z)

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

✓<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

×

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

<latexit sha1_base64="R6z8TG0QwliEYM/f2YnOnQGLmEk=">AAAPFXicfZfPbts2HMfV7l+XrVm7HXbYRVhQoBuCQGod/zkUqCXb6GFps6xpulVBQNG0IoQSCZJy7Ap6jb3Artsb7DjsuvMeYO8xSrZliZSsSxh+v/zqox9FmfQpDrmwrH/v3P3gw48+/uTep3uffX5//4sHD798w0nCIDqHBBP21gcc4TBG5yIUGL2lDIHIx+jCv3Fz/WKOGA9J/FosKbqMQBCHsxACIbuuHnztYRKY9MrzxTUS4LHHQUQx+u7qwYF1ZBWXqTfsdePAWF+nVw/v/+dNCUwiFAuIAefvbIuKyxQwEUKMsj0v4YgCeAMC9C4Rs/5lGsY0ESiGmflIarMEm4KYOaQ5DRmCAi9lA0AWygQTXgMGoJCPsleP4igGEeKH03lI+arJ58GqIYCsw2W6KOqU1QamAQP0OoSLGlkKIh4Bca118mXk1ztRghGbR/XOnFIyKs4FYjDkeQ1OZWFe0bz0/DU5XevXS3qNYp6lCcNZdaAUEGNoJgcWTY5EQtPiYeR83/BngiXoMG8Wfc9GgN2coemhzKl11HFmmABR7/KjvDgxuoUkikA8TT2apZ5AC5F6h0eZFB+ZPpZmnwA2rTvPsjT18pr5vnkmrTXxZUV8qYrjijhe34TgqTkjzJzL+SeMm9JoSgsLIeL10efl6Jl5rka/qYhvVPGiIl6oop9U1ERT5xV1rqm3FfVWVRcVcaGKy4q4VMX3FfG9Kr7dFfvzrthflFhZf7kolnveFM3k56R4hdIbmKUvXp/8kKWD4lq/Cwky7boR+hvj00m3N+hlqow3emfSt52RrpeGnjO03SZD6Rj2XGs03rI8UbwltGV1x8OuGgXxVu8P3Imub2GtYW/kNBi2tBO3M+6ty4dQrFiD0ucOuh2tLEGZM3Ac53ig66XB6bhu/0mDoXS4o9Fo6BYoNGHyO6546cbY7R5behQtg/pWtzNs0LcTYPUdR4OlFRZn4tgju2ARCGDFKcqXxe0PB2M1SGzr7wxdV5tAUSl/37VHnQbDlvV4dDx+WpAQBuJArQopy9ft9Z/21ShSBk16cgY1FrK902TgWD2NhVRYJo7r5IWtr0S5yN7Zl+nqk7tdd+aBbWaZapYLTTXna29jVry4wYzb3Y32Xf7mAbgVXn9SCNviYUM4bIWBTSywHR42wsNd8IHuD9rig4bwoBUmaGIJ2uGDRvhgFzzV/bQtnjaE01YY2sRC2+FpIzzdBS90v2iLFw3hohVGNLGIdnjRCC92wRPdT9riSUM4aYUhTSykHZ40wpMd8Azdyi1fvlOQb1fKtEi5J5e72UKHOAWazgUQqJAJTrkmr44due5Hkqn4R/eApHTIpqYjyje6bIZY8qieacghSWJRkFCcegGQSlbueqahPLeYiIswKk5GykMUh6LNQ8r9mBof8Nl2M5UG2VXqEblhB3IPm59E0p8mmT6GTneOOR1laz6aH6IANvNDCMFmGK/3abWfEJqH3UC5dV+5V1M5QvIwxtCJvMmrdfj3ctJYEIVy0uRf7zBv7TKCxcYoW3vyYGirx0C9cfHkyO4c2faPnYPnJ+sz4j3jG+Nb47FhGz3jufHCODXODWhkxm/G78Yf+7/u/7n/1/7fK+vdO+sxXxm1a/+f/wG8IJ3D</latexit>

log p✓(z)

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

<latexit sha1_base64="8yXyhlDvM2p/v1QGHZPnlj0wrRU=">AAAPD3icfZfLbuM2GIU109s07aRJu+xGaDBAUQSBlDi+LAYYS7Yxi8lMmuYybRwEFE0rQiiRICnHHkHv0Bfotn2DLotu+wh9gL5HKcWWJVKyNvnDc3j86adkkx7FAReW9e+Tpx99/Mmnnz37fOuLL59vf7Wz+/UlJzGD6AISTNh7D3CEgwhdiEBg9J4yBEIPoyvv3s30qxliPCDRuVhQdBMCPwqmAQRCDt3u7I4JRQwIwiIQouSnN+ntzp51YOWXqRf2stgzltfp7e7z/8YTAuMQRQJiwPm1bVFxkwAmAohRujWOOaIA3gMfXcdi2r1JgojGAkUwNV9IbRpjUxAzwzMnAUNQ4IUsAGSBTDDhHWAACnkTW9UojjJmvj+ZBZQ/lnzmPxYCyA7cJPO8Q2llYuIzQO8COK+QJSDkIRB32iBfhF51EMUYsVlYHcwoJaPinCMGA5714FQ25h3Nms7PyelSv1vQOxTxNIkZTssTpYAYQ1M5MS85EjFN8puRK33PXwoWo/2szMdeDgC7P0OTfZlTGajiTDEBojrkhVlzIvQASRiCaJKMaZqMBZqLZLx/kErxhelhafYIYJOq8yxNknHWM88zz6S1Ir4tiW9VcVgSh8sPIXhiTgkzZ3L9CeOmNJrSwgKIeHX2RTF7al6o0Zcl8VIVr0rilSp6cUmNNXVWUmea+lBSH1R1XhLnqrgoiQtV/FASP6ji+02xP2+K/UWJlf2XL8ViazxBU/lFkj9CyT1Mk9fnJ2/SpJdfy2chRqZdNUJvZTwatTu9TqrKeKW3Rl3bGeh6Yeg4fdutMxSOfse1BsM1y6HiLaAtqz3st9UoiNd6t+eOdH0Na/U7A6fGsKYdua1hZ9k+hCLF6hc+t9duaW3xi5ye4zjHPV0vDE7LdbuHNYbC4Q4Gg76bo9CYUYwUL10Z2+1jS4+iRVDXarf6Nfp6Aayu42iwtMTijBx7YOcsAgGsOEXxsLjdfm+oBol1/52+62oLKErt77r2oFVjWLMeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZD1J416jtXRWEiJZeS4TtbY6psoX7Jr+yZ5/Mpdv3fmnm2mqWqWL5pqzt69lVnx4hozbnbX2jf56yfgRnj9TiFsioc14bARBtaxwGZ4WAsPN8H7ut9vivdrwv1GGL+OxW+G92vh/U3wVPfTpnhaE04bYWgdC22Gp7XwdBO80P2iKV7UhItGGFHHIprhRS282ARPdD9piic14aQRhtSxkGZ4UgtPNsAz9CC3fNlOQT5dCdMi5Z5c7mZzHeIEaDoXQKBcJjjhmuyJOyRApnuhZMr/0T0gLhyy1HRE+UqXZYAlj+qZBBySOBI5CcXJ2AdSSYtdzySQ5xYTcRGE+ZlIuQkQyh/T1U3K/Zga7/PpejOV+OltopyoRqk+h042zjkdpEs+mh2iADazQwjBZhAt92mVnxCahd1DuXV/dD8u5QDJwxhDJ/JD3i3Df5CLxvwwkIsm/473s2qTEcxXRlltyYOhrR4D9eLq8MBuHdj2j629VyfLM+Iz41vjO+N7wzY6xivjtXFqXBjQeDB+M343/tj+dfvP7b+2/360Pn2ynPONUbm2//kfl4Gb9w==</latexit>

SL

<latexit sha1_base64="xOiEwEJybV+SMihDCH1/bzGKjdI=">AAAQDnicnVfdbts2GLW7vy5b53a73I2woEM3BIGUOv65KFBLttGLtc26pukWGQFF04pgSiRIyrEr6Bn2Arvd3mCXw273CnuAvccoRZYlSjKw6SZfeM53ePSRlPk5FHtc6Prf7Tvvvf/Bhx/d/fjgk0/vfda5/+DzN5yEDKJzSDBhbx3AEfYCdC48gdFbyhDwHYwunKWV4BcrxLhHgtdiQ9HMB27gLTwIhBy6etDu2AFwMLiyHXGNBNBsH4hrx4km8VVEt6OPbA58itE38aUNXRwt4nxkpn395D8n2QsGYKTMXJNYJzbTbPvg/8ypvCcmbs2Us6v7h/qxnj5aNTCy4LCVPWdXD+79Y88JDH0UCIgB55eGTsUsAkx4EKP4wA45ogAugYsuQ7EYzCIvoKFAAYy1hxJbhFgTREtWR5t7DEGBNzIAkHlSQYPXQBZLyDU8KEtxFAAf8aP5yqP8NuQr9zYQ8l3RLFqnGyQuJUYuA/Tag+uSswj4PCloZZBvfKc8iEKM2MovDyYupUeFuUYMejypwZkszEua7Dn+mpxl+PWGXqOAx1HIcFxMlABiDC1kYhpyJEIapS8jN/qSPxEsREdJmI49GQO2fIXmR1KnNFC2s8AEiPKQ4yfFCdANJL4Pgnlk0ziyBVqLyD46jiX4UJO7Bi4dAti8zHwVR1G2CbVXkloCXxTAFyo4KYCTbBKC59qCMG0l158wrkmiJinMg4iXs8/z7IV2rkq/KYBvVPCiAF6ooBMW0LCCrgroqoLeFNAbFV0XwLUKbgrgRgXfFcB3Kvh2n+yP+2R/UmRl/eWh2BzYc7SQ39F0C0VLGEfPXj//Lo6G6ZPthRBpRpkInS3x8bTXH/ZjFcZbvDsdGOa4iueEvjkyrDpCzhj1LX082Xk5Ubi5aV3vTUY9VQriHT4YWtMqvjOrj/pjs4awczu1upN+Vj6EAoXq5jxr2OtWyuLmOkPTNE+HVTwnmF3LGpzUEHKGNR6PR1ZqhYZMfsgVLt0Se71TvSpFc6GB3uuOavDdAugD06yYpQUv5tQ0xkbqRSCAFabIN4s1GA0nqpDY1d8cWVZlAUWh/APLGHdrCDuvp+PTyePUCWEgcNWqkLx8vf7g8UCVIrnQtC9XsOKF7GaaDk29X/FCCl6mpmUmhS2fRHnILo1ZdPvJ3Z077dDQ4lgly4OmkpOztyUrXFxDxs3sWvo+fn0CbjRffVMIm+RhjThsNAPrvMBm87DWPNxn3q3y3SZ5t0bcbTTj1nlxm827tebdfeZplU+b5GmNOG00Q+u80GbztNY83WdeVPmiSV7UiItGM6LOi2g2L2rNi33mSZVPmuRJjThpNEPqvJBm86TWPNljnqEbeeVLbgpJI8EqkvJOLm+zKQ5xBCo4F0CgFCY44hX4tu9IcMeXntJ/qhwQ5gwZVnBE+RaXoYelH5Uz9zgkYSBSJxRHtgskEue3nrkn+xYNceH5aUuovETaFW1fUt7HVHmXL3aXqciVvZhN5IUdyDts0olEP0zjag6d7805G8eZP5o0UQBrSRNCsOYF2T2t9BNCE7GlbCcz9u1SjpFsxhh6Lid5mYl/KxeNub4nF03+tY+SaB8RrLdEGR3IxtBQ28BqcHFybHSPDeP77uHT51mPeLf1Zeur1qOW0eq3nraetc5a5y3YXrV/af/a/q3zc+f3zh+dP2+pd9pZzhet0tP561+BYfs3</latexit>

r✓Ep✓(z)[f(z)] = Ep✓(z)[f(z)r✓p✓(z)

p✓(z)]

= Ep✓(z)[f(z)r✓ log p✓(z)]

Recall the score function. We used it when deriving the REINFORCE estimator: The derivative of an expectation turns out to be equal to the expected value of the function times the gradient of the log-probability of the sample z. This is represented by the graph: The distribution is used to create the log-probability, which is multiplied with the function output to create the surrogate loss, which is named this way because it is like a fake loss.

The score function is very general: It can be used for any distribution and any function f! However, it has very, very high variance. So what exactly is variance?

Score function estimate:

Variance:

High variance →More samples needed →Unstable training

Deep RL lecture: Variance reduction for policy gradients!

<latexit sha1_base64="cbPFr7zuRCKtaLPWRAHzVEB6ZBQ=">AAAPYXicfZfNb9s2GMbV7qvL1ibtjr0ICwp0QxBIreOPQ4Faso0eljbL8tEtCgKKpmXBlMiRlGNX0B+4P2HXATvvut1GybYsUZJ1Cc3n4eOfXkoOX5dinwvD+PPBw88+/+LLrx59vffNt4+f7B88fXbFScQguoQEE/bRBRxhP0SXwhcYfaQMgcDF6Nqd2al+PUeM+yS8EEuKbgPghf7Eh0DIqbsD6Hh88kZ3oIfjSfLS4SCgGP3ghMDF4M5xxRQJoDuYeDrdfMxdR7rzewTG+vqz/OsHNba7g0Pj2MguvTow14NDbX2d3T19/LczJjAKUCggBpzfmAYVtzFgwocYJXtOxBEFcAY8dBOJSfc29kMaCRTCRH8htUmEdUH09Ib1sc8QFHgpBwAyXybocAoYgEKWZa8cxVEIAsSPxnOf8tWQz73VQMiKoNt4kdU8KS2MPQbo1IeLElkMAh4AMa1M8mXglidRhBGbB+XJlFIyKs4FYtDnaQ3OZGE+0HQb+QU5W+vTJZ2ikCdxxHBSXCgFxBiayIXZkCMR0Ti7GfnszPgbwSJ0lA6zuTcDwGbnaHwkc0oTZZwJJkCUp9wgLU6I7iEJAhCOY4cmsSPQQsTO0XEixRe6fLbgzCWAjcvO8ySOnbRmrqufS2tJfF8Q36visCAO119C8FifEKbP5f4TxnVp1KWF+RDx8urLfPVEv1SjrwrilSpeF8RrVXSjghpV1HlBnVfU+4J6r6qLgrhQxWVBXKrip4L4SRU/7or9dVfsb0qsrL98KZZ7zhhN5E9T9gjFM5jE7y5Of0riXnatn4UI6WbZCN2N8fWo3el1ElXGG7016prWoKrnho7VN+06Q+7od2xjMNyyvFK8ObRhtIf9thoF8Vbv9uxRVd/CGv3OwKoxbGlHdmvYWZcPoVCxernP7rVblbJ4eU7PsqyTXlXPDVbLtruvagy5wx4MBn07Q6ERk7/jipdujO32iVGNonlQ12i3+jX6dgOMrmVVYGmBxRpZ5sDMWAQCWHGK/GGxu/3eUA0S2/pbfduubKAolL9rm4NWjWHLejI4Gb7OSAgDoadWheTla3e6r7tqFMmDRh25gxUWsv2mUc8yOhUWUmAZWbaVFrb8JsqX7Ma8jVc/udv3Tj809SRRzfJFU83pu7cxK15cY8bN7lr7Ln/9AtwIX71TCJviYU04bISBdSywGR7WwsNd8F7V7zXFezXhXiOMV8fiNcN7tfDeLnha9dOmeFoTThthaB0LbYantfB0F7yo+kVTvKgJF40woo5FNMOLWnixC55U/aQpntSEk0YYUsdCmuFJLTzZAc/QvTzypSeFtN1glUh5Jpen2UyHOAYVnQsgUCYTHPOKvGo7Ut0NJFP2oeoBUe6Qw4qOKN/ocuhjyaN6xj6HJApFRkJx7HhAKkl+6hn7sm/RERd+kHVZyk1kTdHmJuV5TI2XDdn2MBV7yV3sEHlgB/IMm3Yi8S+jpLqGjneuORskaz6aNlEA62kTQrDuh+tzWulfCE3DZlAe3Vfu1VYOkGzGGDqVX/JhHf6j3DTmBb7cNPnXOUpHu4xgsTHK0Z5sDE21DawOrl8dm61j0/y5dfj2dN0jPtKea99rLzVT62hvtXfamXapQe0P7R/tX+2/J3/tP9o/2H+2sj58sF7znVa69p//D+BGuIw=</latexit>

gSF = f(z)r✓ log p✓(z), z ⇠ p✓(z)

VARIANCE

38

<latexit sha1_base64="zUrjBCKBODz+yF8ThFKxUxBltGA=">AAAPD3icfZdNb9s2HMbV7q3L1qzdjrsICwp0QxZIqeOXQ4Baso0eljbLmqRd5BoUTctCKJGgKMeuoI8wYJ9mx2HXnXbeB9l1GyXLskRJ1iUMn4ePfvqLlEmbYjfgmvb3vfsffPjRx588+HTvs88f7n/x6PGXVwEJGUSXkGDC3tggQNj10SV3OUZvKEPAszG6tm/NRL9eIBa4xH/NVxSNPeD47syFgIuuySPX8gCf23Z0Fd9YTjAbn246hvEkohPL5nPEQWxhNOM3VhB6k8g91eN3g6eJfeJ+X++/Wavjb98dW8x15nw8eXSgHWnppVYbetY4ULLrfPL44S/WlMDQQz6HGATBja5RPo4A4y7EKN6zwgBRAG+Bg25CPuuOI9enIUc+jNUnQpuFWOVETR5bnboMQY5XogEgc0WCCueAAchFcfbKUQHygYeCw+nCpcG6GSycdYMDUdlxtEwrH5cGRg4DdO7CZYksAl6QVKjSGaw8u9yJQozYwit3JpSCUXIuEYNukNTgXBTmFU1eZvCanGf6fEXnyA/iKGQ4Lg4UAmIMzcTAtBkgHtIofRgxg26DU85CdJg0077TAWC3F2h6KHJKHWWcGSaAl7tsLymOj+4g8TzgTyOLxpHF0ZJH1uFRLMQnqo2F2SaATcvOiziKslmlXghrSXxZEF/K4rAgDrObEDxVZ4SpC/H+CQtUYVSFhbkQBeXRl/nomXopR18VxCtZvC6I17JohwU1rKiLgrqoqHcF9U5WlwVxKYurgriSxfcF8b0svtkV+3ZX7M9SrKi/WBSrPWuKZuIDlU6h6BbG0YvXZz/EUS+9srkQIlUvG6G9MT4btTu9TizLeKO3Rl3dGFT13NAx+rpZZ8gd/Y6pDYZblmPJm0NrWnvYb8tREG/1bs8cVfUtrNbvDIwaw5Z2ZLaGnax8CPmS1cl9Zq/dqpTFyXN6hmGc9Kp6bjBaptk9rjHkDnMwGPTNFIWGjGIkeenG2G6faNUomgd1tXarX6NvX4DWNYwKLC2wGCNDH+gpC0cAS06eTxaz2+8N5SC+rb/RN83KC+SF8ndNfdCqMWxZTwYnw2cpCWHAd+SqkLx87U73WVeOInnQqCPeYIWFbO806hlap8JCCiwjwzSSwpZXolhkN/o4Wn9yt+tOPdDVOJbNYqHJ5mTtbcySF9eYcbO71r7LXz8AN8JXnxTCpnhYEw4bYWAdC2yGh7XwcBe8U/U7TfFOTbjTCOPUsTjN8E4tvLMLnlb9tCme1oTTRhhax0Kb4WktPN0Fz6t+3hTPa8J5IwyvY+HN8LwWnu+CJ1U/aYonNeGkEYbUsZBmeFILT3bAM3QntnzJTkHMrohVIsWeXOxmUx3iCFT0gAOOUpngKKjI2QFE6LYnmNJ/qh4Q5g7RlPWpG0AS+jy9C8WR5QChxPmOZuqKM4mKAu566TlKAgSe+KHcPIDYpcnx4mi03ShFjjg4WURsxoHYnyanjOinUZzdiyaHHYDV5LBAsOr62X6q9KmnSdgtFFvstXtd8gEShyaGzsRNXmXh34niMsdzRXHFX+swae0yguXGKFp74gCny8e1auP6+EhvHen6j62D52fZWe6B8rXyjfJU0ZWO8lx5oZwrlwpU/lL+Uf5V/tv/df+3/d/3/1hb79/LxnyllK79P/8HAhKayw==</latexit>

V[gSF] = Ep✓

"DX

i=1

(gSFi - Ep✓[gSFi])

2

#

Variance is the average squared distance from mean of an estimator., per dimension. In this case, it is the score function estimator. The D represents the dimensionality of the gradient, which is equal to the dimensionality of the parameters. Note that variance increases with dimensionality: With more dimensions, the variance usually increases.

High variance is pretty bad! It usually means that we will need way more samples to get a good estimate of the expected gradient. This also destabilizes the training loop: If the gradient has very high variance, the training algorithm has to deal with a lot of noise, and the training can jump everywhere!

This is in particular problematic in deep learning applications, because the dimensionality of the parameters is so large. And, as we mentioned, the variance scales with dimensionality…

In the next lecture on deep reinforcement learning, we will be looking into variance reduction techniques for policy gradient methods. This is essential to get them working on practice.

Page 14: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

MULTIPLE DEPENDENCIES

39

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

depends on directly and through

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

✓<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

Now, consider gradient estimation on the following graph. Unlike before, the parameter \theta isn’t just used to compute the probability p_\theta, but also used in the computation of the function f, that is, f(z, \theta). Furthermore, the function f is a differentiable function now!

The previous gradient will not do here: It’d not take into account how the parameter \theta influences f directly.

MULTIPLE DEPENDENCIES

40

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

×

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

<latexit sha1_base64="R6z8TG0QwliEYM/f2YnOnQGLmEk=">AAAPFXicfZfPbts2HMfV7l+XrVm7HXbYRVhQoBuCQGod/zkUqCXb6GFps6xpulVBQNG0IoQSCZJy7Ap6jb3Artsb7DjsuvMeYO8xSrZliZSsSxh+v/zqox9FmfQpDrmwrH/v3P3gw48+/uTep3uffX5//4sHD798w0nCIDqHBBP21gcc4TBG5yIUGL2lDIHIx+jCv3Fz/WKOGA9J/FosKbqMQBCHsxACIbuuHnztYRKY9MrzxTUS4LHHQUQx+u7qwYF1ZBWXqTfsdePAWF+nVw/v/+dNCUwiFAuIAefvbIuKyxQwEUKMsj0v4YgCeAMC9C4Rs/5lGsY0ESiGmflIarMEm4KYOaQ5DRmCAi9lA0AWygQTXgMGoJCPsleP4igGEeKH03lI+arJ58GqIYCsw2W6KOqU1QamAQP0OoSLGlkKIh4Bca118mXk1ztRghGbR/XOnFIyKs4FYjDkeQ1OZWFe0bz0/DU5XevXS3qNYp6lCcNZdaAUEGNoJgcWTY5EQtPiYeR83/BngiXoMG8Wfc9GgN2coemhzKl11HFmmABR7/KjvDgxuoUkikA8TT2apZ5AC5F6h0eZFB+ZPpZmnwA2rTvPsjT18pr5vnkmrTXxZUV8qYrjijhe34TgqTkjzJzL+SeMm9JoSgsLIeL10efl6Jl5rka/qYhvVPGiIl6oop9U1ERT5xV1rqm3FfVWVRcVcaGKy4q4VMX3FfG9Kr7dFfvzrthflFhZf7kolnveFM3k56R4hdIbmKUvXp/8kKWD4lq/Cwky7boR+hvj00m3N+hlqow3emfSt52RrpeGnjO03SZD6Rj2XGs03rI8UbwltGV1x8OuGgXxVu8P3Imub2GtYW/kNBi2tBO3M+6ty4dQrFiD0ucOuh2tLEGZM3Ac53ig66XB6bhu/0mDoXS4o9Fo6BYoNGHyO6546cbY7R5behQtg/pWtzNs0LcTYPUdR4OlFRZn4tgju2ARCGDFKcqXxe0PB2M1SGzr7wxdV5tAUSl/37VHnQbDlvV4dDx+WpAQBuJArQopy9ft9Z/21ShSBk16cgY1FrK902TgWD2NhVRYJo7r5IWtr0S5yN7Zl+nqk7tdd+aBbWaZapYLTTXna29jVry4wYzb3Y32Xf7mAbgVXn9SCNviYUM4bIWBTSywHR42wsNd8IHuD9rig4bwoBUmaGIJ2uGDRvhgFzzV/bQtnjaE01YY2sRC2+FpIzzdBS90v2iLFw3hohVGNLGIdnjRCC92wRPdT9riSUM4aYUhTSykHZ40wpMd8Azdyi1fvlOQb1fKtEi5J5e72UKHOAWazgUQqJAJTrkmr44due5Hkqn4R/eApHTIpqYjyje6bIZY8qieacghSWJRkFCcegGQSlbueqahPLeYiIswKk5GykMUh6LNQ8r9mBof8Nl2M5UG2VXqEblhB3IPm59E0p8mmT6GTneOOR1laz6aH6IANvNDCMFmGK/3abWfEJqH3UC5dV+5V1M5QvIwxtCJvMmrdfj3ctJYEIVy0uRf7zBv7TKCxcYoW3vyYGirx0C9cfHkyO4c2faPnYPnJ+sz4j3jG+Nb47FhGz3jufHCODXODWhkxm/G78Yf+7/u/7n/1/7fK+vdO+sxXxm1a/+f/wG8IJ3D</latexit>

log p✓(z)

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f<latexit sha1_base64="8yXyhlDvM2p/v1QGHZPnlj0wrRU=">AAAPD3icfZfLbuM2GIU109s07aRJu+xGaDBAUQSBlDi+LAYYS7Yxi8lMmuYybRwEFE0rQiiRICnHHkHv0Bfotn2DLotu+wh9gL5HKcWWJVKyNvnDc3j86adkkx7FAReW9e+Tpx99/Mmnnz37fOuLL59vf7Wz+/UlJzGD6AISTNh7D3CEgwhdiEBg9J4yBEIPoyvv3s30qxliPCDRuVhQdBMCPwqmAQRCDt3u7I4JRQwIwiIQouSnN+ntzp51YOWXqRf2stgzltfp7e7z/8YTAuMQRQJiwPm1bVFxkwAmAohRujWOOaIA3gMfXcdi2r1JgojGAkUwNV9IbRpjUxAzwzMnAUNQ4IUsAGSBTDDhHWAACnkTW9UojjJmvj+ZBZQ/lnzmPxYCyA7cJPO8Q2llYuIzQO8COK+QJSDkIRB32iBfhF51EMUYsVlYHcwoJaPinCMGA5714FQ25h3Nms7PyelSv1vQOxTxNIkZTssTpYAYQ1M5MS85EjFN8puRK33PXwoWo/2szMdeDgC7P0OTfZlTGajiTDEBojrkhVlzIvQASRiCaJKMaZqMBZqLZLx/kErxhelhafYIYJOq8yxNknHWM88zz6S1Ir4tiW9VcVgSh8sPIXhiTgkzZ3L9CeOmNJrSwgKIeHX2RTF7al6o0Zcl8VIVr0rilSp6cUmNNXVWUmea+lBSH1R1XhLnqrgoiQtV/FASP6ji+02xP2+K/UWJlf2XL8ViazxBU/lFkj9CyT1Mk9fnJ2/SpJdfy2chRqZdNUJvZTwatTu9TqrKeKW3Rl3bGeh6Yeg4fdutMxSOfse1BsM1y6HiLaAtqz3st9UoiNd6t+eOdH0Na/U7A6fGsKYdua1hZ9k+hCLF6hc+t9duaW3xi5ye4zjHPV0vDE7LdbuHNYbC4Q4Gg76bo9CYUYwUL10Z2+1jS4+iRVDXarf6Nfp6Aayu42iwtMTijBx7YOcsAgGsOEXxsLjdfm+oBol1/52+62oLKErt77r2oFVjWLMeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZD1J416jtXRWEiJZeS4TtbY6psoX7Jr+yZ5/Mpdv3fmnm2mqWqWL5pqzt69lVnx4hozbnbX2jf56yfgRnj9TiFsioc14bARBtaxwGZ4WAsPN8H7ut9vivdrwv1GGL+OxW+G92vh/U3wVPfTpnhaE04bYWgdC22Gp7XwdBO80P2iKV7UhItGGFHHIprhRS282ARPdD9piic14aQRhtSxkGZ4UgtPNsAz9CC3fNlOQT5dCdMi5Z5c7mZzHeIEaDoXQKBcJjjhmuyJOyRApnuhZMr/0T0gLhyy1HRE+UqXZYAlj+qZBBySOBI5CcXJ2AdSSYtdzySQ5xYTcRGE+ZlIuQkQyh/T1U3K/Zga7/PpejOV+OltopyoRqk+h042zjkdpEs+mh2iADazQwjBZhAt92mVnxCahd1DuXV/dD8u5QDJwxhDJ/JD3i3Df5CLxvwwkIsm/473s2qTEcxXRlltyYOhrR4D9eLq8MBuHdj2j629VyfLM+Iz41vjO+N7wzY6xivjtXFqXBjQeDB+M343/tj+dfvP7b+2/360Pn2ynPONUbm2//kfl4Gb9w==</latexit>

SL+

<latexit sha1_base64="6ATDkqxu0HoA8Eyeac2rmXaRNP4=">AAAPS3icfZfPbts2HMfVbd26bE3b7biLsKBAtwWB1Dr+cyhQS7bRQ9NmXdN0q4KAomlFCCUSJOXYFfRUe429QK8b9gI7DjuMkm1ZIiXrEobfL7/66EdRJn2KQy4s6+OtTz797PbnX9z5cu+rr+/u37v/4Ju3nCQMojNIMGHvfMARDmN0JkKB0TvKEIh8jM79azfXz+eI8ZDEb8SSoosIBHE4CyEQsuvy/olHKGJAEBaDCKW/vMieejDA6Sx75HEQUYx+MD1MApNeer64QgJs+38yVevl/QPryCouU2/Y68aBsb5OLx/c/dubEphEKBYQA87f2xYVFylgIoQYZXtewhEF8BoE6H0iZv2LNIxpIlAMM/Oh1GYJNgUx80czpyFDUOClbADIQplgwivAABSyAHv1KI7y5+WH03lI+arJ58GqIYCs3kW6KKqb1QamAQP0KoSLGlkKIh4BcaV18mXk1ztRghGbR/XOnFIyKs4FYjDkeQ1OZWFe0XzC+BtyutavlvQKxTxLE4az6kApIMbQTA4smhyJhKbFw8i35Jo/FSxBh3mz6Hs6Auz6NZoeypxaRx1nhgkQ9S4/yosToxtIogjE09SjWeoJtBCpd3iUSfGh6WNp9glg07rzdZamXl4z3zdfS2tNfFkRX6riuCKO1zcheGrOCDPncv4J46Y0mtLCQoh4ffRZOXpmnqnRbyviW1U8r4jnqugnFTXR1HlFnWvqTUW9UdVFRVyo4rIiLlXxQ0X8oIrvdsX+uiv2NyVW1l8uiuWeN0Uz+REqXqH0Gmbp8zcnL7J0UFzrdyFBpl03Qn9jfDLp9ga9TJXxRu9M+rYz0vXS0HOGtttkKB3DnmuNxluWx4q3hLas7njYVaMg3ur9gTvR9S2sNeyNnAbDlnbidsa9dfkQihVrUPrcQbejlSUocwaO4xwPdL00OB3X7T9uMJQOdzQaDd0ChSZMfscVL90Yu91jS4+iZVDf6naGDfp2Aqy+42iwtMLiTBx7ZBcsAgGsOEX5srj94WCsBolt/Z2h62oTKCrl77v2qNNg2LIej47HTwoSwkAcqFUhZfm6vf6TvhpFyqBJT86gxkK2d5oMHKunsZAKy8Rxnbyw9ZUoF9l7+yJdfXK36848sM0sU81yoanmfO1tzIoXN5hxu7vRvsvfPAC3wutPCmFbPGwIh60wsIkFtsPDRni4Cz7Q/UFbfNAQHrTCBE0sQTt80Agf7IKnup+2xdOGcNoKQ5tYaDs8bYSnu+CF7hdt8aIhXLTCiCYW0Q4vGuHFLnii+0lbPGkIJ60wpImFtMOTRniyA56hG7nly3cK+QmBaZFyTy53s4UOcQo0nQsgUCETnHJNXp1Ect2PJFPxj+4BSemQTU1HlG902Qyx5FE905BDksSiIKE49QIglazc9UxDeW4xERdhVJynlIcoDkWbh5T7MTU+4LPtZioNsstUOY1NMn0Mne4cczrK1nw0P0QBbOaHEILNMF7v02o/ITQPu4Zy675yr6ZyhORhjKETeZNX6/Af5aSxIArlpMm/3mHe2mUEi41RtvbkwdBWj4F64/zxkd05su2fOwfPTtZnxDvGd8b3xiPDNnrGM+O5cWqcGdD43fho/Gn8tf/H/j/7/+7/t7J+cms95lujdt27/T+9rLLm</latexit>

SL = f(z) log p✓(z) + f(z)<latexit sha1_base64="m/HsR0X3V7c0aAJ+Tm7VBPaejBc=">AAAQAXicpZfPbts2HMft7l+XrUu7HXcRFhTotiCwUsd/DgFqyTZ6aNqsa5puURBQNK0IpkSCpBy7gi57gV23N9hx2HVPsgfYe4ySbVkiJV+mS2h+v/zqI5JS+HMp9rlotf5p3vvgw48+/uT+p3ufff7gi/2Hj758y0nEILqABBP2zgUcYT9EF8IXGL2jDIHAxejSndmpfjlHjPskfCOWFF0HwAv9qQ+BkF03j5r3nBC4GNw4rrhFAhgOoYgBQVgIAhT/+CIxTg0HejieJk8cDgKK0bfqEEw8g25+5i7je0M1Kjn/Z2RIhAMoZWSheQMgbl03HiU3sZ6dXKlZ1zcPD1pHrewy9Ia5bhw01tf5zaMH/zoTAqMAhQJiwPmV2aLiOgZM+BCjZM+JOKIAzoCHriIx7V3HfkgjgUKYGI+lNo2wIYiRLogx8RmCAi9lA0DmywQD3gIGoJDLtleO4ihdFX44mfuUr5p87q0aQk4Cuo4X2Z5ISgNjjwF668NFiSwGAU+nSuvky8Atd6IIIzYPyp0ppWRUnAvEoM/TOTiXE/OKptuMvyHna/12SW9RyJM4YjgpDpQCYgxN5cCsyZGIaJw9jNzbM34qWIQO02bWdzoEbPYaTQ5lTqmjjDPFBIhylxukkxOiO0iCAIST2KFJ7Ai0ELFzeJRI8bEhtxOcuQSwSdn5Oonj9fYyXktrSXxZEF+q4qggjtY3IXhiTAkz5nL9CeOGNBrSwnyIeHn0RT56alyo0W8L4ltVvCyIl6roRgU10tR5QZ1r6l1BvVPVRUFcqOKyIC5V8X1BfK+K73bF/rQr9mclVs6/fCmWe84ETeWnM9tC8Qwm8fM3Zy+SuJ9d670QIcMsG6G7MT4dd7r9bqLKeKO3xz3TGup6buhaA9OuMuSOQdduDUdblmPFm0O3Wp3RoKNGQbzVe317rOtb2NagO7QqDFvasd0eddfTh1CoWL3cZ/c7bW1avDynb1nWSV/Xc4PVtu3ecYUhd9jD4XBgZyg0YvJDrnjpxtjpnLT0KJoH9Vqd9qBC3y5Aq2dZGiwtsFhjyxyaGYtAACtOkW8Wuzfoj9QgsZ1/a2Db2gKKwvT3bHPYrjBsWU+GJ6OnGQlhIPTUWSH59HW6vac9NYrkQeOuXEGNhWzvNO5bra7GQgosY8u20oktv4nyJbsyr+PVJ3f73hkHppEkqlm+aKo5ffc2ZsWLK8y43l1p3+WvHoBr4fUnhbAuHlaEw1oYWMUC6+FhJTzcBe/pfq8u3qsI92phvCoWrx7eq4T3dsFT3U/r4mlFOK2FoVUstB6eVsLTXfBC94u6eFERLmphRBWLqIcXlfBiFzzR/aQunlSEk1oYUsVC6uFJJTzZAc/QnTzypSeFtERgWqQ8k8vTbKZDHANN5wIIlMkEx1yTV5VIqruBZMp+6B4Q5Q7Z1HRE+UaXTR9LHtUz8TkkUSgyEopjxwNSSfJTz8SXdYuBuPCDrApUHiKrijYPKc9jarzHp9vDVOzJKkupGceJPoZOdo45HyZrPpoWUQAbaRFCsOGH63Na6V8ITcNmUB7dV+7VUg6RLMYYOpM3ebUO/04uGvMCXy6a/Oscpq1dRrDYGGVrTxaGploG6o3L4yOzfWSaP7QPnp2ta8T7ja8b3zSeNMxGt/Gs8bxx3rhowKbf/LX5W/P3/V/2/9j/c/+vlfVecz3mq0bp2v/7P+0q9LA=</latexit>

r✓ SL = f(z)r✓ log p✓(z) +r✓f(z) log p✓(z) +r✓f(z) 6⇡ r✓Ep✓(z)[f(z)]

The corresponding surrogate loss for this graph uses both the log probability and the function itself. Walk through the non-broken lines backwards from SL to see what paths to \theta exist. Does that work?

Looking at the derivative, no it doesn’t! There is an extra term in the middle which we didn’t expect. It comes from using the product rule on the function times log-probability term. So, this surrogate loss is unfortunately wrong.

MULTIPLE DEPENDENCIES

41

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

×

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

<latexit sha1_base64="R6z8TG0QwliEYM/f2YnOnQGLmEk=">AAAPFXicfZfPbts2HMfV7l+XrVm7HXbYRVhQoBuCQGod/zkUqCXb6GFps6xpulVBQNG0IoQSCZJy7Ap6jb3Artsb7DjsuvMeYO8xSrZliZSsSxh+v/zqox9FmfQpDrmwrH/v3P3gw48+/uTep3uffX5//4sHD798w0nCIDqHBBP21gcc4TBG5yIUGL2lDIHIx+jCv3Fz/WKOGA9J/FosKbqMQBCHsxACIbuuHnztYRKY9MrzxTUS4LHHQUQx+u7qwYF1ZBWXqTfsdePAWF+nVw/v/+dNCUwiFAuIAefvbIuKyxQwEUKMsj0v4YgCeAMC9C4Rs/5lGsY0ESiGmflIarMEm4KYOaQ5DRmCAi9lA0AWygQTXgMGoJCPsleP4igGEeKH03lI+arJ58GqIYCsw2W6KOqU1QamAQP0OoSLGlkKIh4Bca118mXk1ztRghGbR/XOnFIyKs4FYjDkeQ1OZWFe0bz0/DU5XevXS3qNYp6lCcNZdaAUEGNoJgcWTY5EQtPiYeR83/BngiXoMG8Wfc9GgN2coemhzKl11HFmmABR7/KjvDgxuoUkikA8TT2apZ5AC5F6h0eZFB+ZPpZmnwA2rTvPsjT18pr5vnkmrTXxZUV8qYrjijhe34TgqTkjzJzL+SeMm9JoSgsLIeL10efl6Jl5rka/qYhvVPGiIl6oop9U1ERT5xV1rqm3FfVWVRcVcaGKy4q4VMX3FfG9Kr7dFfvzrthflFhZf7kolnveFM3k56R4hdIbmKUvXp/8kKWD4lq/Cwky7boR+hvj00m3N+hlqow3emfSt52RrpeGnjO03SZD6Rj2XGs03rI8UbwltGV1x8OuGgXxVu8P3Imub2GtYW/kNBi2tBO3M+6ty4dQrFiD0ucOuh2tLEGZM3Ac53ig66XB6bhu/0mDoXS4o9Fo6BYoNGHyO6546cbY7R5behQtg/pWtzNs0LcTYPUdR4OlFRZn4tgju2ARCGDFKcqXxe0PB2M1SGzr7wxdV5tAUSl/37VHnQbDlvV4dDx+WpAQBuJArQopy9ft9Z/21ShSBk16cgY1FrK902TgWD2NhVRYJo7r5IWtr0S5yN7Zl+nqk7tdd+aBbWaZapYLTTXna29jVry4wYzb3Y32Xf7mAbgVXn9SCNviYUM4bIWBTSywHR42wsNd8IHuD9rig4bwoBUmaGIJ2uGDRvhgFzzV/bQtnjaE01YY2sRC2+FpIzzdBS90v2iLFw3hohVGNLGIdnjRCC92wRPdT9riSUM4aYUhTSykHZ40wpMd8Azdyi1fvlOQb1fKtEi5J5e72UKHOAWazgUQqJAJTrkmr44due5Hkqn4R/eApHTIpqYjyje6bIZY8qieacghSWJRkFCcegGQSlbueqahPLeYiIswKk5GykMUh6LNQ8r9mBof8Nl2M5UG2VXqEblhB3IPm59E0p8mmT6GTneOOR1laz6aH6IANvNDCMFmGK/3abWfEJqH3UC5dV+5V1M5QvIwxtCJvMmrdfj3ctJYEIVy0uRf7zBv7TKCxcYoW3vyYGirx0C9cfHkyO4c2faPnYPnJ+sz4j3jG+Nb47FhGz3jufHCODXODWhkxm/G78Yf+7/u/7n/1/7fK+vdO+sxXxm1a/+f/wG8IJ3D</latexit>

log p✓(z)

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f<latexit sha1_base64="8yXyhlDvM2p/v1QGHZPnlj0wrRU=">AAAPD3icfZfLbuM2GIU109s07aRJu+xGaDBAUQSBlDi+LAYYS7Yxi8lMmuYybRwEFE0rQiiRICnHHkHv0Bfotn2DLotu+wh9gL5HKcWWJVKyNvnDc3j86adkkx7FAReW9e+Tpx99/Mmnnz37fOuLL59vf7Wz+/UlJzGD6AISTNh7D3CEgwhdiEBg9J4yBEIPoyvv3s30qxliPCDRuVhQdBMCPwqmAQRCDt3u7I4JRQwIwiIQouSnN+ntzp51YOWXqRf2stgzltfp7e7z/8YTAuMQRQJiwPm1bVFxkwAmAohRujWOOaIA3gMfXcdi2r1JgojGAkUwNV9IbRpjUxAzwzMnAUNQ4IUsAGSBTDDhHWAACnkTW9UojjJmvj+ZBZQ/lnzmPxYCyA7cJPO8Q2llYuIzQO8COK+QJSDkIRB32iBfhF51EMUYsVlYHcwoJaPinCMGA5714FQ25h3Nms7PyelSv1vQOxTxNIkZTssTpYAYQ1M5MS85EjFN8puRK33PXwoWo/2szMdeDgC7P0OTfZlTGajiTDEBojrkhVlzIvQASRiCaJKMaZqMBZqLZLx/kErxhelhafYIYJOq8yxNknHWM88zz6S1Ir4tiW9VcVgSh8sPIXhiTgkzZ3L9CeOmNJrSwgKIeHX2RTF7al6o0Zcl8VIVr0rilSp6cUmNNXVWUmea+lBSH1R1XhLnqrgoiQtV/FASP6ji+02xP2+K/UWJlf2XL8ViazxBU/lFkj9CyT1Mk9fnJ2/SpJdfy2chRqZdNUJvZTwatTu9TqrKeKW3Rl3bGeh6Yeg4fdutMxSOfse1BsM1y6HiLaAtqz3st9UoiNd6t+eOdH0Na/U7A6fGsKYdua1hZ9k+hCLF6hc+t9duaW3xi5ye4zjHPV0vDE7LdbuHNYbC4Q4Gg76bo9CYUYwUL10Z2+1jS4+iRVDXarf6Nfp6Aayu42iwtMTijBx7YOcsAgGsOEXxsLjdfm+oBol1/52+62oLKErt77r2oFVjWLMeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZD1J416jtXRWEiJZeS4TtbY6psoX7Jr+yZ5/Mpdv3fmnm2mqWqWL5pqzt69lVnx4hozbnbX2jf56yfgRnj9TiFsioc14bARBtaxwGZ4WAsPN8H7ut9vivdrwv1GGL+OxW+G92vh/U3wVPfTpnhaE04bYWgdC22Gp7XwdBO80P2iKV7UhItGGFHHIprhRS282ARPdD9piic14aQRhtSxkGZ4UgtPNsAz9CC3fNlOQT5dCdMi5Z5c7mZzHeIEaDoXQKBcJjjhmuyJOyRApnuhZMr/0T0gLhyy1HRE+UqXZYAlj+qZBBySOBI5CcXJ2AdSSYtdzySQ5xYTcRGE+ZlIuQkQyh/T1U3K/Zga7/PpejOV+OltopyoRqk+h042zjkdpEs+mh2iADazQwjBZhAt92mVnxCahd1DuXV/dD8u5QDJwxhDJ/JD3i3Df5CLxvwwkIsm/473s2qTEcxXRlltyYOhrR4D9eLq8MBuHdj2j629VyfLM+Iz41vjO+N7wzY6xivjtXFqXBjQeDB+M343/tj+dfvP7b+2/360Pn2ynPONUbm2//kfl4Gb9w==</latexit>

SL+

<latexit sha1_base64="M6w28jWozx3wuePzZU3VLCnkxoY=">AAAPwnicfZfdbts2HMWd7qNdti7tdrkbYUGBrgsCqXX8cVGglmyjF02bZU3TNgoCiqYVwZTIkZRjV9Pb7SX2AHuPUbItS6Rk3YTmOTz66U9KIT2KAy5M89+9e199/c239x98t//9Dw9/PHj0+KcPnMQMogtIMGEfPcARDiJ0IQKB0UfKEAg9jC69mZPpl3PEeECi92JJ0XUI/CiYBhAI2XXz6B83Ah4GN64nbpEAhksoYkAQFoEQJX++SY2Xhgt9nEzTpy4HIcXoN3UIJr5BNz8Ll/G7oRqVHMMFlDKy0HwhELeel4zSm0TPTa/UnOubR4fmsZlfht6w1o3D1vo6u3n88D93QmAcokhADDi/skwqrhPARAAxSvfdmCMK4Az46CoW0951EkQ0FiiCqfFEatMYG4IYWUWNScAQFHgpGwCyQCYY8BYwAIWs+341iqOsrvxoMg8oXzX53F81hCwCuk4W+aSmlYGJzwC9DeCiQpaAkGel0jr5MvSqnSjGiM3DamdGKRkV5wIxGPCsBmeyMO9otk74e3K21m+X9BZFPE1ihtPyQCkgxtBUDsybHImYJvnDyMU54y8Fi9FR1sz7Xg4Bm52jyZHMqXRUcaaYAFHt8sKsOBG6gyQMQTRJXJomrkALkbhHx6kUnxhyOcGZRwCbVJ3naZKsl5dxLq0V8W1JfKuKo5I4Wt+E4IkxJcyYy/knjBvSaEgLCyDi1dEXxeipcaFGfyiJH1TxsiReqqIXl9RYU+clda6pdyX1TlUXJXGhisuSuFTFLyXxiyp+3BX7aVfsZyVW1l++FMt9d4Km8tuXL6FkBtPk9fvTN2nSz6/1WoiRYVWN0NsYX4w73X43VWW80dvjnmUPdb0wdO2B5dQZCseg65jD0ZblueItoE2zMxp01CiIt3qv74x1fQtrDrpDu8awpR077VF3XT6EIsXqFz6n32lrZfGLnL5t2yd9XS8Mdttxes9rDIXDGQ6HAydHoTGTH3LFSzfGTufE1KNoEdQzO+1Bjb6dALNn2xosLbHYY9saWjmLQAArTlEsFqc36I/UILGtvz1wHG0CRan8PccatmsMW9aT4cnoRU5CGIh8tSqkKF+n23vRU6NIETTuyhnUWMj2TuO+bXY1FlJiGduOnRW2+ibKl+zKuk5Wn9zte2ccWkaaqmb5oqnm7N3bmBUvrjHjZnetfZe/fgBuhNefFMKmeFgTDhthYB0LbIaHtfBwF7yv+/2meL8m3G+E8etY/GZ4vxbe3wVPdT9tiqc14bQRhtax0GZ4WgtPd8EL3S+a4kVNuGiEEXUsohle1MKLXfBE95OmeFITThphSB0LaYYntfBkBzxDd3LLl+0UsiMC0yLlnlzuZnMd4gRoOhdAoFwmOOGavDqJZLoXSqb8h+4BceGQTU1HlG902Qyw5FE9k4BDEkciJ6E4cX0glbTY9UwCeW4xEBdBmB/jlIfIT0Wbh5T7MTXe59PtZirx5SlLOfWNU30MnewcczZM13w0O0QBbGSHEIKNIFrv0yr/QmgWNoNy675yr6ZyiORhjKFTeZN36/BnctKYHwZy0uRf9yhr7TKCxcYoW/vyYGipx0C9cfn82GofW9Yf7cNXp+sz4oPWL61fW09bVqvbetV63TprXbTg3rO9s71Pe58PnIPg4K8DvrLe21uP+blVuQ7+/h/oydyl</latexit>

r✓ SL = f(z)r✓ log p✓(z) +r✓f(z) ⇡ r✓Ep✓(z)[f(z)]

<latexit sha1_base64="ArXytzzJAQac0ibpYZ8LM+5f/co=">AAAPUHicfZdNb9s2HMbV7q3LtrTdjrsICwq0WxBIreOXQ4Faso0e+pK1TdKtCgKKphUhlEiQlGNX0Afb19htp50GbN9gt42SbVkiJetihs/DRz/9KSqkT3HIhWX9cev2J59+9vkXd77c++rrb/bv3rv/7RknCYPoFBJM2HsfcITDGJ2KUGD0njIEIh+jc//azfXzOWI8JPE7saToIgJBHM5CCITsurz31iMUMSAIi0GE0rcvsqeeT8RDDwY4nWUPPQ4iitGjR6aHSWDSS88XV0iAUjB/MlXv5b0D68gqLlNv2OvGgbG+Ti7vf/OXNyUwiVAsIAacf7AtKi5SwEQIMcr2vIQjCuA1CNCHRMz6F2kY00SgGGbmA6nNEmwKYuYPaE5DhqDAS9kAkIUywYRXgAEoZBn26lEc5U/ND6fzkPJVk8+DVUMAWcOLdFHUOKsNTAMG6FUIFzWyFEQ8AuJK6+TLyK93ogQjNo/qnTmlZFScC8RgyPManMjCvKb5tPF35GStXy3pFYp5liYMZ9WBUkCMoZkcWDQ5EglNi4eR78o1fypYgg7zZtH3dATY9Rs0PZQ5tY46zgwTIOpdfpQXJ0Y3kEQRiKepR7PUE2ghUu/wKJPiA9PH0uwTwKZ155ssTb28Zr5vvpHWmviqIr5SxXFFHK9vQvDUnBFmzuX8E8ZNaTSlhYUQ8fro03L0zDxVo88q4pkqnlfEc1X0k4qaaOq8os419aai3qjqoiIuVHFZEZeq+LEiflTF97tif9kV+6sSK+svF8Vyz5uimfwUFa9Qeg2z9Pm7ly+ydFBc63chQaZdN0J/Y3wy6fYGvUyV8UbvTPq2M9L10tBzhrbbZCgdw55rjcZblseKt4S2rO542FWjIN7q/YE70fUtrDXsjZwGw5Z24nbGvXX5EIoVa1D63EG3o5UlKHMGjuMcD3S9NDgd1+0/bjCUDnc0Gg3dAoUmTH7HFS/dGLvdY0uPomVQ3+p2hg36dgKsvuNosLTC4kwce2QXLAIBrDhF+bK4/eFgrAaJbf2doetqEygq5e+79qjTYNiyHo+Ox08KEsJAHKhVIWX5ur3+k74aRcqgSU/OoMZCtneaDByrp7GQCsvEcZ28sPWVKBfZB/siXX1yt+vOPLDNLFPNcqGp5nztbcyKFzeYcbu70b7L3zwAt8LrTwphWzxsCIetMLCJBbbDw0Z4uAs+0P1BW3zQEB60wgRNLEE7fNAIH+yCp7qftsXThnDaCkObWGg7PG2Ep7vghe4XbfGiIVy0wogmFtEOLxrhxS54ovtJWzxpCCetMKSJhbTDk0Z4sgOeoRu55ct3CvkJgWmRck8ud7OFDnEKNJ0LIFAhE5xyTV6dRHLdjyRT8YfuAUnpkE1NR5RvdNkMseRRPdOQQ5LEoiChOPUCIJWs3PVMQ3luMREXYVScqpSHKA5Fm4eU+zE1PuCz7WYqDbLLVDmTTTJ9DJ3uHHMyytZ8ND9EAWzmhxCCzTBe79Nq/0JoHnYN5dZ95V5N5QjJwxhDL+VNXq/Df5STxoIolJMmf73DvLXLCBYbo2ztyYOhrR4D9cbZ4yP7+Mj6uXPw7OX6iHjH+N74wXho2EbPeGY8N06MUwMavxl/Gn8b/+z/vv/v/n93b62st9e/xndG7bq79z8h67Pg</latexit>

SL = ?(f(z)) log p✓(z) + f(z)

The correct surrogate loss uses the detach function. We have seen this before in lecture 2. This is an operation that blocks gradients from flowing backwards, but leaves forward computation intact.

Taking the derivative of the surrogate loss, we see that it uses both the score function and a direct derivative of the function with respect to the parameter \theta.

Page 15: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

GENERAL SURROGATE LOSS

42

<latexit sha1_base64="YBuoKOy9cK4UEyGD6qqBudmOzR4=">AAAOV3icfZdNc+M0HMbdBZZSaGnhyMVDZwcGuh27TfNy6MzGTsIe2N3S6dtSdzuyojieyJZGktNkPf40fBqucIEvwyA7iePX+FJVz6MnP/1leSSbYpcLTft369knn372/PPtL3a+/Gp37+v9g29uOAkYRNeQYMLubMARdn10LVyB0R1lCHg2Rrf2xIz12yli3CX+lZhT9OABx3dHLgRCdj3un1vQweEv0aNQz1WLB95jKH44F9GH8OqlHqnW0OWQBL74ILtfCtnB0BNgw9j1sx497h9qx1ryqOWGvmwcKsvn4vFgd8saEhh4yBcQA87vdY2KhxAw4UKMoh0r4IgCOAEOug/EqP0Quj4NBPJhpL6Q2ijAqiBqPBV16DIEBZ7LBoDMlQkqHAMGoJAT3slHceQDD/Gj4dSlfNHkU2fREEBW6yGcJdWMcgNDhwE6duEsRxYCj3tAjEudfO7Z+U4UYMSmXr4zppSMBecMMejyuAYXsjDvaLxA/IpcLPXxnI6Rz6MwYDjKDpQCYgyN5MCkyZEIaJhMRr4VE34uWICO4mbSd94DbHKJhkcyJ9eRxxlhAkS+y/bi4vjoCRLPA/4wtGgUWgLNRGgdHUdSfKHaWJptIl+RvPMyCkMrrpltq5fSmhPfZsS3RbGfEfvLHyF4qI4IU6dy/QnjqjSq0sJciHh+9HU6eqReF6NvMuJNUbzNiLdF0Q4yalBSpxl1WlKfMupTUZ1lxFlRnGfEeVH8mBE/FsW7TbHvN8X+XoiV9ZebYr5jDdFIfnSSVyicwCh8ffXm1yjsJM/yXQiQqueN0F4ZTwfNVqcVFWW80huDtm70ynpqaBld3awypI5uy9R6/TXLScGbQmtas99tFqMgXuvtjjko62tYrdvqGRWGNe3AbPRby/Ih5BesTuozO81GqSxOmtMxDOOsU9ZTg9EwzfZJhSF1mL1er2smKDRgFKOCl66MzeaZVo6iaVBbaza6Ffp6AbS2YZRgaYbFGBh6T09YBAK44BTpy2K2u51+MUis6290TbO0gCJT/rap9xoVhjXrWe+sf5qQEAZ8p1gVkpav2WqftotRJA0atOQKlljI+pcGHUNrlVhIhmVgmEZc2PxOlJvsXn8IF5/c9b5TD3U1iopmudGK5njvrcwFL64w43p3pX2Tv3oAroUvzxTCunhYEQ5rYWAVC6yHh5XwcBO8U/Y7dfFORbhTC+NUsTj18E4lvLMJnpb9tC6eVoTTWhhaxULr4WklPN0EL8p+URcvKsJFLYyoYhH18KISXmyCJ2U/qYsnFeGkFoZUsZB6eFIJTzbAL24F8Ukhvk6wUqQ8k8vTbKJDHIKSzgUQKJEJDnlJtsUYCRDrtieZkn+KntVNJUmhOLQcIJVocWKh8QUDYDU+oBOsuv7yDJP7vNJ46ATKY+3CvZhmD8mLCkNv5AnonTxdA3ng/ElOiDmeKyck/1pHcWuTEcxWRtnakZcmvXhFKjduT471xrGu/9Y4fNVc3p+2le+U75UfFV1pKa+U18qFcq1A5Q/lT+Uv5e/df3b/23u+t72wPttajvlWyT17B/8D/qxMdg==</latexit>

Gt =T-1X

t0=t

�t0-trt0+1 Rewards in RL are not differentiable

<latexit sha1_base64="faPeJyEyT6kqhqu06qDHnpseZ74=">AAAOtnicfZdbU+M2HMXN9kZpl8L2sS9umZ1hO5SxIeTywMzGTtJ9WHYp5bbF2YysKMaDbGlkOSQr/EH72k9S2QmOr/EDKDpHxz//LXkkm2I34Jr278aLr77+5tvvNr/f+uHHl9s/7ey+ug5IyCC6ggQTdmuDAGHXR1fc5RjdUoaAZ2N0Yz+YsX4zRSxwiX/J5xQNPeD47sSFgMuu0c6TRShigBPmAw+Jv99Hp6rlAX5v26IfjQTdt2wOwif59x5x8Ca6U60g9EaCn2rRZ3H5hx5ZYzeAJPT5Z65a0BF/RiPZwMRRLeqOlgP3LQCTO/InK+CAoxF/Mxzt7GmHWnKp5Ya+bOwpy+t8tPvyV2tMYOghn0MMguBO1ygfCsC4CzGKtqwwQBTAB+Cgu5BP2kPh+jTkyIeR+lpqkxCrnKhxJdSxyxDkeC4bADJXJqjwHjDJKeu1lY8KUFye4GA8dWmwaAZTZ9HgQBZ7KGbJy4hyA4XDAL134SxHJoAXxCUudQZzz853ohAjNvXynTGlZCw4Z4hBN4hrcC4L85HG1Q4uyflSv5/Te+QHkQgZjrIDpYAYQxM5MGkGiIdUJA8jJ9VDcMpZiA7iZtJ32gPs4QKND2ROriOPM8EE8HyX7cXF8dEjJJ4H/LGwaCQsjmZcWAeHkRRfqzaWZpsANs47LyIhltNSvZDWnPghI34oiv2M2F/ehOCxOiFMncr3T1igSqMqLcyFKMiPvkpHT9SrYvR1RrwuijcZ8aYo2mFGDUvqNKNOS+pjRn0sqrOMOCuK84w4L4pfMuKXoni7LvbTuth/CrGy/nJRzLesMZrIb1YyhcQDjMS7y7P3kegk13IuhEjV80ZoPxuPB81WpxUVZfysNwZt3eiV9dTQMrq6WWVIHd2WqfX6K5ajgjeF1rRmv9ssRkG80tsdc1DWV7Bat9UzKgwr2oHZ6LeW5UPIL1id1Gd2mo1SWZw0p2MYxkmnrKcGo2Ga7aMKQ+owe71e10xQaMgoRgUvfTY2mydaOYqmQW2t2ehW6KsXoLUNowRLMyzGwNB7esLCEcAFJ08ni9nudvrFIL6qv9E1zdIL5Jnyt02916gwrFhPeif944SEMOA7xaqQtHzNVvu4XYwiadCgJd9giYWs7jToGFqrxEIyLAPDNOLC5leiXGR3+lAsPrmrdafu6WoUFc1yoRXN8dp7Nhe8uMKM692V9nX+6gG4Fr78pBDWxcOKcFgLA6tYYD08rISH6+Cdst+pi3cqwp1aGKeKxamHdyrhnXXwtOyndfG0IpzWwtAqFloPTyvh6Tp4XvbzunheEc5rYXgVC6+H55XwfB08KftJXTypCCe1MKSKhdTDk0p4sgaeoUe55Yt3CnJ2CVaKXJwdEh1iAUp6cqJIZIJFUJIXR5BYtz3JlPwoe0CYOmSzqD8fcpK7UCwsB0glWuxoaHwAAViNN/AEq66/3OPkPr80HvoA5bZ34V6UoYfkQYahM7lD+rg8if0uH5g5nisfWP63DuLWOiOYPRtla0seqvTiEarcuDk61BuHuv5XY+/t2fJ8tan8ovym7Cu60lLeKu+Uc+VKgcp/G5sbuxuvtlvbw2207SysLzaWY35Wctc2/R9IonFp</latexit>

SL = Ep(⌧|✓)[T-1X

t=0

�tGt log⇡✓(at|st)]

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1

<latexit sha1_base64="OzSwNdQHk8BIaE6qZCRw39GSBas=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe/P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wOJuPHR</latexit>s1<latexit sha1_base64="faCoi3suTLeizhRJ44TVzLkE1Ww=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk9+b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gcCHfGz</latexit>a1

~<latexit sha1_base64="pd3L+wPVTqb/6ceddXPUL+1bOrE=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL70/unx0ax8by0quFuS4OtfV1fv98fzAcERgHKOQQgyi6NQ3K7xLAuA8xSveGcYQogFPgoduYj9t3iR/SmKMQpvpLoY1jrHOiZ1D6yGcIcrwQBYDMFwk6nAAGIBfoe+WoCIUgQNHRaObTaFVGM29VcCDu+y6ZL/uSliYmHgN04sN5iSwBQRQAPqkMRovALQ+iGCM2C8qDGaVglJxzxKAfZT04F415T7NWR5fkfK1PFnSCwihNYobT4kQhIMbQWExclhHiMU2WNyPWdxq94ixGR1m5HHvlADa9QKMjkVMaKOOMMQG8POQGWXNC9ABJEIBwlAxpmgw5mvNkeHScCvGl7mJhdglgo7LzIk2SYdYz19UvhLUkviuI72SxVxB76x8heKSPCdNnYv0Ji3Rh1IWF+RBF5dmDfPZYH8jRVwXxShavC+K1LLpxQY0r6qygzirqQ0F9kNV5QZzL4qIgLmTxU0H8JIs3u2I/7Ir9KMWK/otNsdgbjtBYvD6Wj1AyhWny5vLtH2nSWV7rZyFGulk2QndjPO03W51WKst4ozf6bdNyqnpuaFld01YZcke3ZRtOb8tyInlzaMNo9rpNOQrird7u2P2qvoU1ui3HUhi2tH270Wut24dQKFm93Gd3mo1KW7w8p2NZ1lmnqucGq2Hb7ROFIXfYjuN07SUKjRnFSPLSjbHZPDOqUTQPahvNRlehbxfAaFtWBZYWWKy+ZTrmkoUjgCUnzx8Wu93t9OQgvu2/1bXtygLyQvvbtuk0FIYt65lz1jtdkhAGQk/uCsnb12y1T9tyFMmD+i2xghUWsv2lfscyWhUWUmDpW7aVNba8E8UmuzXvktUrd7vv9ENTT1PZLDaabM723sYsebHCjOvdSvsuv3oCroWv3imEdfFQEQ5rYaCKBdbDQyU83AXvVf1eXbynCPdqYTwVi1cP7ynhvV3wtOqndfFUEU5rYaiKhdbDUyU83QXPq35eF88V4bwWhqtYeD08V8LzXfCk6id18UQRTmphiIqF1MMTJTwpw4s/j+zUDrCenXoJ1v1wfTAovbNodnyYQnFWXLlXN+4gcfpn6K04VrwXR1YgTnG/JkPAvMAPU/E14A2PsmqXEcw3RlGJDxFT/uyoFtcnx2bj2DT/bBy+/n39TfJU+177UftFM7WW9lp7o51rAw1qgfaX9rf2z/5/Bz8c/HTw88r6+NF6zgutdB389j8hPfHJ</latexit>r2

~

~ ~

~

~

<latexit sha1_base64="1RcFirdzOoS5tyEl3SzBljfwnaY=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnS+5P7Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q+WwfHS</latexit>s2

<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓<latexit sha1_base64="QERxgNakkv/TH9ixu1baowQUq7c=">AAAN/HicfZdNb9s2HMbVdi9dVq/tetxFWFBgGIJAShy/HArUkmz0sLZZkLctDgKKpmXBlEhQlGNX0MfYdTvsOOy677J9mlGyI0sUZV3C8Hn4+Ke/ROFPl2I/4obx76PHTz77/Isvn3619/Wz1jfPX7z89jIiMYPoAhJM2LULIoT9EF1wn2N0TRkCgYvRlTu3M/1qgVjkk/Ccryi6DYAX+lMfAi6mbsbUvxu7fIY4uHuxbxwa+aXXB+ZmsK9trtO7l8/+G08IjAMUcohBFN2YBuW3CWDchxile+M4QhTAOfDQTcynvdvED2nMUQhT/bXQpjHWOdEzLH3iMwQ5XokBgMwXCTqcAQYgF/B71agIhSBA0cFk4dNoPYwW3nrAgbjz22SZVyatLEw8BujMh8sKWQKCKAB8VpuMVoFbnUQxRmwRVCczSsEoOZeIQT/KanAqCvORZsWOzsnpRp+t6AyFUZrEDKflhUJAjKGpWJgPI8RjmuQ3I57wPHrDWYwOsmE+98YBbH6GJgcipzJRxZliAnh1yg2y4oToHpIgAOEkGdM0GXO05Mn44DAV4mvdxcLsEsAmVedZmiTjrGauq58Ja0X8UBI/yOKwJA43P0LwRJ8Spi/E8ycs0oVRFxbmQxRVV18Uq6f6hRx9WRIvZfGqJF7JohuX1LimLkrqoqbel9R7WV2WxKUsrkriShY/lcRPsni9K/aXXbG/SrGi/mJTrPbGEzQVH5D8FUrmME3enb//KU36+bV5F2Kkm1UjdB+Mx6NOt99NZRk/6O1Rz7Scul4YutbAtFWGwjHo2oYz3LIcSd4C2jA6w0FHjoJ4q/f69qiub2GNQdexFIYt7chuD7ub8iEUSlav8Nn9TrtWFq/I6VuWddKv64XBatt270hhKBy24zgDO0ehMaMYSV76YOx0Tox6FC2CekanPVDo2wdg9CyrBktLLNbIMh0zZ+EIYMnJi5fF7g36QzmIb+tvDWy79gB5qfw923TaCsOW9cQ5GR7nJISB0JOrQorydbq9454cRYqgUVc8wRoL2f7SqG8Z3RoLKbGMLNvKClvdiWKT3Zi3yfqTu913+r6pp6lsFhtNNmd778EsebHCjJvdSvsuv3oBboSv3ymETfFQEQ4bYaCKBTbDQyU83AXv1f1eU7ynCPcaYTwVi9cM7ynhvV3wtO6nTfFUEU4bYaiKhTbDUyU83QXP637eFM8V4bwRhqtYeDM8V8LzXfCk7idN8UQRThphiIqFNMMTJTzZAc/QvWj5sk5BvF0Jq0WKnlx0s7kOcQJqesQBR7lMcBLV5PVpI9PdQDDl/6xbEZqdHADWs86bYN0PN81J5btJs5VzKPrVtXvN7yBxAmHovWhtPoq2GYhO8kdByrzAF6Ti7/ggG+0yguWDUYz2xGnIlM8+9cHV0aHZPjTNn9v7bzubg9FT7Tvte+0HzdS62lvtnXaqXWhQI9pv2u/aH6209Wfrr9bfa+vjR5s1r7TK1frnf76FK1o=</latexit>

⇡✓

<latexit sha1_base64="vJL2nU9GCZRL0g9hkkWg4kuQvbc=">AAAPqnicfZfbbts2HMad7tRl69Kul7shFhRotyCQU8eHiwK1ZBu96CHLmqRtFAQUTctCKJEgKccuoUfaA+0B9h6jZFnW0boxze/jp59IyubfYcQT0jD+3Xvwzbffff/Dwx/3f/r50S8Hj5/8eiloyBG+QJRQ/smBAhMvwBfSkwR/YhxD3yH4yrmzYv1qgbnwaPBRrhi+8aEbeDMPQam7bh//Y1OGOZSUB9DH6u+30Svbh3LuOGoc3SobIaK+RJFN8Exe2yL0dZ+APiPY9gKwkYFNqAvY81R6AUDBCmzNhIDN8T3kU+12qHyefnsB/tyY1x3rXJeo8zh33ac/PXcub24fHxrHRnKBaqOdNg5b6XV2++TRf/aUotDHgUQECnHdNpi8UZBLDxEc7duhwAyiO+ji61DO+jfKC1gocYAi8Exrs5AASUE8dWDq6aeQZKUbEHFPJwA0hxwiqSd4vxglcDyf4mi68JhYN8XCXTck1Ktzo5bJ6kWFgcrlkM09tCyQKeiLeFEqnWLlO8VOHBLMF36xM6bUjCXnEnPkiXgOzvTEfGDxhhAf6Vmqz1dsjgMRqZCTKD9QC5hzPNMDk6bAMmQqeRi9C+/EK8lDfBQ3k75XI8jvzvH0SOcUOoo4M0KhLHY5fjw5Ab5H1PdhMFU2i5Qt8VIq++g40uIz4BBtdqjeI0XneaRUupGB3khF8X1OfF8WxzlxnN6EkimYUQ4Wev0pF0AbgbZwD2FRHH2RjZ6Bi3L0ZU68LItXOfGqLDphTg0r6iKnLirqfU69L6vLnLgsi6ucuCqLX3Pi17L4aVfs512xX0qxev71S7Hat6d4pn/kki2k7lCk3nx89zZSg+RK90KIQbtoRM7G+HLS7Q16UVkmG70z6bfNUVXPDD1z2LbqDJlj2LOM0XjLclLyZtCG0R0Pu+UoRLZ6f2BNqvoW1hj2RmaNYUs7sTrjXjp9GAclq5v5rEG3U5kWN8sZmKZ5OqjqmcHsWFb/pMaQOazRaDS0EhQWcv1/UPKyjbHbPTWqUSwL6hvdzrBG3y6A0TfNCizLsZgTsz1qJywSQ1JyymyzWP3hYFwOktv5N4eWVVlAmZv+vtUedWoMW9bT0en4ZUJCOQzc8qzQbPq6vf7LfjmKZkGTnl7BCgvd3mkyMI1ehYXmWCamZcYTW3wT9Ut23b5R65/c7XsHDtsgispm/aKVzfG7tzGXvKTGTJrdtfZd/voBpBG++qQINcWjmnDUCIPqWFAzPKqFR7vg3arfbYp3a8LdRhi3jsVthndr4d1d8KzqZ03xrCacNcKwOhbWDM9q4dkueFn1y6Z4WRMuG2FkHYtshpe18HIXPK36aVM8rQmnjTC0joU2w9NaeLoDPq0kVLLnFK9E6jO5Ps0muq5QYEUXEkqcyJQoUZEdOccSxrrja6bkS9UDw8yhmxUdM7HRddMjmqfsmXoC0TCQCQkjynahVqLs1DP1dN0CsJCen9RrpYdIiqvNQ+rzWDneFbPtYUq5cT1XrPYmUXUMm+4cczaKUj4WF1GQgLgIoQTosm19Tiv8hbA47A7po/vavV7KEdbFGMfv9E0+pOF/6EXjru/pRdOf9lHc2mWEy41Rt/Z1Ydgul4HVxuXJcfv02Pirc/j6XVoiPmz91vq99bzVbvVar1tvWmetixbae7o32DP3rIOjg/ODzwfXa+uDvXTM01bhOpj+D+141BE=</latexit>

SL = EZ

"X

z2Z

log p(z)X

z�r

?(r) +X

r2R

r

#

Schulman, John, et al. "Gradient estimation using stochastic computation graphs." Advances in Neural Information Processing Systems. 2015.

Can we generalize this? Yes! Schulman et al, 2015 introduces an algorithm for gradient estimation on any computation graph with stochastic nodes. Say we have a set of stochastic nodes Z and a set of rewards R. We will also use the notation z < r, which means that the reward r is influenced by the node z. A node is influenced by another if there is a directed path between them through the computation graph.

The correct surrogate loss then sums over each stochastic node to get their log-probability. This is multiplied by the sum of rewards that the stochastic node influences. Finally, we also have a sum over all rewards to ensure that parameters that directly influence the rewards are also taken into account for computing the gradient.

Compare this with the REINFORCE surrogate loss. We note that reward functions in RL are not differentiable, so we won’t need the second sum over rewards. Then, the automatic surrogate loss corresponds perfectly with the REINFORCE estimator: It sums over actions to compute their log-probability, and this is multiplied by the sum of rewards that the action influences!

Score function has high variance… Can we do better? Lecture 6 (VAEs):

• VAEs also use gradient estimation! • “Reparameterization trick”

CAN WE DO BETTER?

43

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

✓<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

Stochastic node

Can we do better than the score function, which has very high variance? Yes, but not always. Recall lecture 6, in which Jakub discussed VAEs. He also talked about reparameterization. The reparameterization is actually also a gradient estimator! Let’s dive in.

Gaussian

<latexit sha1_base64="+SC8ajZYfXh8LM2RijQA9snXI7U=">AAAPPHicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIreOXQ4daso0e1jZrm6ZbFAQUTStCKJEgKceuoG+zr7EvsOt22XnYadh151GyLUukZF1M83n46Kc/KZv0KQ65sKw/b93+4MOPPv7kzqd7n31+d/+Le/e/fMtJwiA6hQQT9s4HHOEwRqciFBi9owyByMfozL92c/1sjhgPSfxGLCm6iEAQh7MQAiG7Lu9970VAXEGA0xfZQw/6OPV8gqd8GcmP1IuSLDs0PUiV/tdhEIEs+/by3oF1ZBWXqTfsdePAWF8nl/fv/u1NCUwiFAuIAefntkXFRQqYCCFG2Z6XcEQBvAYBOk/ErH+RhjFNBIphZj6Q2izBpiBm/izmNGQICryUDQBZKBNMeAUYgEI+8V49iqMYRIgfTuch5asmnwerhgCyXBfpoihnVhuYBgzQqxAuamQpiHheN60zr0+9EyUYsXlU78wpJaPiXCAGQ57X4EQW5iXNZ4i/ISdr/WpJr1DMszRhOKsOlAJiDM3kwKLJkUhoWjyMXBbX/IlgCTrMm0XfkxFg16/Q9FDm1DrqODNMgKh3+VFenBjdQBJFIJ6mHs1ST6CFSL3Do0yKD0wfS7NPAJvWna+yNC3Wmu+br6S1Jr6oiC9UcVwRx+ubyKVozggz53L+CeOmNJrSwkKIeH30aTl6Zp6q0W8r4ltVPKuIZ6roJxU10dR5RZ1r6k1FvVHVRUVcqOKyIi5V8X1FfK+K73bF/rQr9mclVtZfvhTLPW+KZvJXp1hC6TXM0mdvnv+QpYPiWq+FBJl23Qj9jfHxpNsb9DJVxhu9M+nbzkjXS0PPGdpuk6F0DHuuNRpvWR4p3hLasrrjYVeNgnir9wfuRNe3sNawN3IaDFvaidsZ99blQyhWrEHpcwfdjlaWoMwZOI5zPND10uB0XLf/qMFQOtzRaDR0CxSaMIqR4qUbY7d7bOlRtAzqW93OsEHfToDVdxwNllZYnIljj+yCRSCAFacoF4vbHw7GapDY1t8Zuq42gaJS/r5rjzoNhi3r8eh4/LggIQzEgVoVUpav2+s/7qtRpAya9OQMaixke6fJwLF6GgupsEwc18kLW38T5Ut2bl+kq5/c7XtnHthmlqlm+aKp5vzd25gVL24w43Z3o32Xv3kAboXXnxTCtnjYEA5bYWATC2yHh43wcBd8oPuDtvigITxohQmaWIJ2+KARPtgFT3U/bYunDeG0FYY2sdB2eNoIT3fBC90v2uJFQ7hohRFNLKIdXjTCi13wRPeTtnjSEE5aYUgTC2mHJ43wZAc8Qzdyy5fvFOTqSpkWKffkcjdb6BCnQNO5AAIVsjxecE32xRUSINf9SDIVX3QPSEqHbGo6onyjy2aIJY/qmYYckiQWBUl+6AlAFBW3Wu16pqE8t5iIizAqDlDKQ4BI/pluHlLu5NT4gM+2m6k0yC5Tj8gNO5B72Pwkkr6eZPoYOt055mSUrflofogC2Fwd0swwXu/Tan8hNA+7hnLrvnKvpnKE5GGMoefyJi/X4d/JSWNBFMpJk5/eYd7aZQSLjVG29uTB0FaPgXrj7NGR3Tmy7R87B0+fr8+Id4yvjW+Mh4Zt9IynxjPjxDg1oPGL8Zvxu/HH/q/7f+3/s//vynr71nrMV0bt2v/vf133r4k=</latexit>

N(µ,⌃)<latexit sha1_base64="YRaiQUIBwB+l5XdKXuMbbgL16sE=">AAAPK3icfZfdbts2GIbV7q/L1qzdDnciLCjQDVkgtY5/DgrUkm30YE2zLn9dHAQUTStCKJEgKccuoavYbewGdrrdwc427LT3MUq2ZYmSrJN84fvy9aOPkk16FAdcWNY/9+5/9PEnn3724POdL758uPvVo8dfn3ESM4hOIcGEXXiAIxxE6FQEAqMLyhAIPYzOvVs31c9niPGARCdiQdFVCPwomAYQCDV0/ejHsYcoH/MgNMchEDcQYHmUPM1qbyqtZN9c13by/fWjPevAyi6zWtirYs9YXcfXjx9+GE8IjEMUCYgB55e2RcWVBEwEEKNkZxxzRAG8BT66jMW0eyWDiMYCRTAxnyhtGmNTEDNFNycBQ1DghSoAZIFKMOENYAAKdYM75SiOIhAivj+ZBZQvSz7zl4UAqjtXcp51LylNlD4D9CaA8xKZBCFPm1AZ5IvQKw+iGCM2C8uDKaVi1JxzxGDA0x4cq8a8oemC8BNyvNJvFvQGRTyRMcNJcaISEGNoqiZmJUcipjK7GfUU3PIXgsVoPy2zsRcDwG7fosm+yikNlHGmmABRHvLCtDkRuoMkDEE0kWOayLFAcyHH+weJEp+YHlZmjwA2KTvfJlIuHxzPfKusJfGoIB7p4rAgDlcfQvDEnBJmztT6E8ZNZTSVhQUQ8fLs03z21DzVo88K4pkunhfEc1304oIaV9RZQZ1V1LuCeqer84I418VFQVzo4vuC+F4XL7bFvtsW+6sWq/qvXorFzniCpupLJnuE5C1M5KuT1z8lspddq2chRqZdNkJvbXw+and6nUSX8Vpvjbq2M6jquaHj9G23zpA7+h3XGgw3LM80bw5tWe1hv61HQbzRuz13VNU3sFa/M3BqDBvakdsadlbtQyjSrH7uc3vtVqUtfp7TcxznsFfVc4PTct3usxpD7nAHg0HfzVBozChGmpeuje32oVWNonlQ12q3+jX6ZgGsruNUYGmBxRk59sDOWAQCWHOK/GFxu/3eUA8Sm/47fdetLKAotL/r2oNWjWHDejg4HD7PSAgDka93heTta3e6z7t6FMmDRh21ghUWsvmkUc+xOhUWUmAZOa6TNrb8JqqX7NK+ksuv3M17Z+7ZZpLoZvWi6eb03VubNS+uMeNmd619m79+Am6Er94phE3xsCYcNsLAOhbYDA9r4eE2eL/q95vi/ZpwvxHGr2Pxm+H9Wnh/Gzyt+mlTPK0Jp40wtI6FNsPTWni6DV5U/aIpXtSEi0YYUccimuFFLbzYBk+qftIUT2rCSSMMqWMhzfCkFp5sgWfoTm350p2Cerokq0SqPbnazWY6xBJUdC6AQJlMsOQV2RM3SIBU90LFlP1T9YA4d6iyoqtzzVpXZYAVj+6ZBBySOBIZCcVy7AOlJPmuZxKoc4uJuAjC7Lyk3QQI1Y/p+ibVTk6P9/l0s5mSfnItx0Rt2IHaw6YnEfnLKKnOoZOtc44HyYqPpocogM30EEKwGUSrfVrpJ4SmYbdQbd2X7uVSDpA6jDH0Wn3Im1X4D2rRmB8GatHU3/F+Wm0zgvnaqKoddTC09WNgtTh/dmC3Dmz759bey/bqjPjA+Nb4znhq2EbHeGm8Mo6NUwMavxl/GH8af+3+vvv37r+7/y2t9++t5nxjlK7dD/8Dpbqmxw==</latexit>

✏ ⇠ N(0, 1)<latexit sha1_base64="tfoCE8ruVfGunj6AvuNzONY8WGE=">AAAPPnicfVfLbttGFGXSV+o2btIuuyFqBChawyATWY+FAYuUhCyaxE3iOG1oGMPRiCZMcgYzQ1kKwd/pb/QHum2BfkCWRbdd9pKSKHJIihtdzzlzeObODH2vywJfSMP4+87djz7+5NPP7n2+98WX9/e/evDw6zeCxhyTc0wDyt+6SJDAj8i59GVA3jJOUOgG5MK9sTP8Yk648Gn0Wi4ZuQyRF/kzHyMJQ1cPTh2BQhaQE93BbpA4Lg2mYhnCT+KEcZrqP+qOS5hwMFPQV74XojS9enBgHBn5o9cDcx0caOvn7Orh/Q/OlOI4JJHEARLinWkweZkgLn0ckHTPiQVhCN8gj7yL5ax/mfgRiyWJcKo/AmwWB7qkerYafepzgmWwhABh7oOCjq8RR1jCmveqUoJEKCTicDr3mViFYu6tAokgYZfJIk9oWpmYeByxax8vKs4SFIoQyevaYJae6iCJA8LnYXUwcwkeFeaCcOyLLAdnkJgXLNsj8ZqerfHrJbsmkUiTmAdpeSIAhHMyg4l5KIiMWZIvBg7GjTiRPCaHWZiPnYwQv3lJpoegUxmo2pkFFMnqkBtmyYnILaZhiKJp4rA0cSRZyMQ5PEoBfKS7AZBdivi0ynyZJnCmIGeuq78EagV8XgKfq+C4BI7XL4GTqM8o1+ew/5QLHYg6ULiPiajOPi9mz/RzVfpNCXyjghcl8EIF3biExjV0XkLnNfS2hN6q6KIELlRwWQKXKvi+BL5Xwbe7ZH/ZJfurIgv5h0ux3HOmZAbfnfwIJTc4TZ6+fvZTmgzyZ30WYqKbVSJ2N8Qnk25v0EtVONjgnUnftEZ1vCD0rKFpNxEKxrBnG6Px1stjhVuYNozueNhVpXCwxfsDe1LHt2aNYW9kNRC2bid2Z9xbp4+QSKF6Bc8edDu1tHiFzsCyrONBHS8IVse2+48bCAXDHo1GQzu3wmIO33+FyzbEbvfYqEuxQqhvdDvDBny7AUbfsmpmWcmLNbHMkZl7kQQFClMWh8XuDwdjVUhu828Nbbu2gbKU/r5tjjoNhK3X49Hx+EnuhHIUeWpWaJG+bq//pK9K0UJo0oMdrHmh2zdNBpbRq3mhJS8Ty7ayxFZvIlyyd+Zlsvrkbu+dfmDqaaqS4aKp5OzubcgKN2ggB+3sRvoufvOEoNV8faUYt8njBnHcagY3ecHt5nGjebzLvFfne23yXoO412rGa/LitZv3Gs17u8yzOp+1ybMGcdZqhjV5Ye3mWaN5tsu8rPNlm7xsEJetZmSTF9luXjaal7vM0zqftsnTBnHaaoY2eaHt5mmjebrDPCe3UPJllQKcroTXJKEmh2o2x3GQoBouJJIkh6G7EDXYlddEogx3Q/CU/1HnoLhgQFjDoZHZ4BD6AfhROVNfYBpHMneS9TweCsP8VauqZ+pD36ITIf0wb6GUReTN1GaRUI+p8p6YbYupxEuvEodCwY6ghs06keTVJK3PYdOdc85G6dofy5ooFOirHk33o3WdVvkXwjKxGwyl+4q92soRgWaMk2fwkhdr8R9g07gX+rBp8OscZtEuIlpsiBDtQWNoqm1gPbh4fGR2jkzz587B6bN1j3hP+1b7TvteM7Wedqo91c60cw1rv2l/aH9qf+3/vv9h/5/9f1fUu3fWc77RKs/+f/8DRZuwMg==</latexit>

z = µ+ ✏⌃

GAUSSIAN REPARAMETERIZATION

44

~

<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

<latexit sha1_base64="zPFDJz1Q5MFquEJzZiL8BGpobt0=">AAAPFHicfZdNb9s2HMbV7q3L1izdgF12ERYUGIYgkBrHL4cCtWQbPSxtluWlax0EFE0rQiSRICnHrqaPsS+w6/YNdhx23X0fYN9jf8m2LFGSdTHN5+Gjn/4UbdJhviekYfz74OEHH3708SePPt357PPHu1/sPfnyUtCIY3KBqU/5GwcJ4nshuZCe9MkbxgkKHJ9cOXd2ql/NCBceDc/lgpHrALmhN/UwktB1s/f1GDt+PHaoPxGLAD7icRAlyc3evnFoZJdebZirxr62uk5vnjz+bzyhOApIKLGPhHhnGkxex4hLD/sk2RlHgjCE75BL3kVy2r2OvZBFkoQ40Z+CNo18XVI9ZdQnHidY+gtoIMw9SNDxLeIIS3iSnXKUICEKiDiYzDwmlk0xc5cNiaAM1/E8K1NSGhi7HLFbD89LZDEKRIDkbaUzrU25k0Q+4bOg3JlSAqPinBOOPZHW4BQK85qllRfn9HSl3y7YLQlFEkfcT4oDQSCckykMzJqCyIjF2cPAdN+J55JH5CBtZn3PB4jfnZHJAeSUOso4U58iWe5ygrQ4IbnHNAhQOInHLInHksxlPD44TEB8qjs+mB2K+KTsPEtieGWgZo6jn4G1JL4qiK9UcVgQh6ubwGuoTynXZzD/lAsdjDpYuIeJKI++yEdP9Qs1+rIgXqriVUG8UkUnKqhRRZ0V1FlFvS+o96o6L4hzVVwUxIUqvi+I71XxzbbYn7fFvlViof6wKBY74wmZwq9J9grFdziJX56f/JDEvexavQsR0c2yETtr49Go3el1ElX213pr1DWtQVXPDR2rb9p1htzR79jGYLhheaZ4c2jDaA/7bTUK+xu927NHVX0Da/Q7A6vGsKEd2a1hZ1U+QkLF6uY+u9duVcri5jk9y7KOe1U9N1gt2+4+qzHkDnswGPTtDIVFnPlE8bK1sd0+NqpRLA/qGu1Wv0bfTIDRtawKLCuwWCPLHJgZiyTIV5wyf1nsbr83VIPkpv5W37YrEygL5e/a5qBVY9iwHg+Oh0cZCeUodNWq0Lx87U73qKtG0Txo1IEZrLDQzZ1GPcvoVFhogWVk2VZa2PJKhEX2zryOlz+5m3Wn75t6kqhmWGiqOV17a7Pi9WvMfrO71r7NXz/Ab4SvPinGTfG4Jhw3wuA6FtwMj2vh8TZ4t+p3m+LdmnC3EcatY3Gb4d1aeHcbPKv6WVM8qwlnjTCsjoU1w7NaeLYNXlb9sile1oTLRhhZxyKb4WUtvNwGT6t+2hRPa8JpIwytY6HN8LQWnm6B5+QetnzpTgHerphXImFPDrvZTMd+jCq6kEiSTIajhajIjrwlEqW6EwBT9qXqQVHugGZFJ0ysdWh6PvConoknMI1CmZEwOOS4CJQk3/VMPDi36ERIL8gORspDoAD+TNcPCTs5Nd4V081mKnaTm3hMYcOOYA+bnkTin0ZJdQybbB1zOkhWfCw9RCFfXx7QdC9c7dNKfyEsDbvDsHVfupdTOSBwGOPkBG7yehX+PUwadwMPJg0+xwdpa5sRzddGaO3AwdBUj4HVxtWzQ7N1aJo/tvZfnKzOiI+0b7Rvte80U+toL7SX2ql2oWHtF+037Xftj91fd//c/Wv376X14YPVmK+00rX7z/9csZ51</latexit>µ

<latexit sha1_base64="GYuO5la/Tn1O75f0NO0RswHDISE=">AAAPF3icfZdNb9s2HMbV7q3L1izdTsMuwoICwxAEUuP45VCglmyjh6XN0rx0q4OAomlFCCUSJOXYFYR9jX2BXbdvsOOw6477APseo2RblkjJupjm8/DRT3+KNulRHHBhWf8+ePjBhx99/MmjT3c++/zx7hd7T7685CRmEF1Aggl76wGOcBChCxEIjN5ShkDoYXTl3bmZfjVDjAckOhcLiq5D4EfBNIBAyK6bva/HkOJk7BE84YtQfiTjN4EfgjS92du3Dq38MvWGvWrsG6vr9ObJ4//GEwLjEEUCYsD5O9ui4joBTAQQo3RnHHNEAbwDPnoXi2n3OgkiGgsUwdR8KrVpjE1BzAzTnAQMQYEXsgEgC2SCCW8BA1DIh9mpRnEUgRDxg8ksoHzZ5DN/2RBAVuI6meeVSisDE58BehvAeYUsASEPgbjVOrPyVDtRjBGbhdXOjFIyKs45YjDgWQ1OZWFe06z4/JycrvTbBb1FEU+TmOG0PFAKiDE0lQPzJkcipkn+MHLG7/hzwWJ0kDXzvucDwO7O0ORA5lQ6qjhTTICodnlhVpwI3UMShiCaJGOaJmOB5iIZHxymUnxqeliaPQLYpOo8S5NknNXM88wzaa2Ir0riK1UclsTh6ibyTTSnhJkzOf+EcVMaTWlhAUS8OvqiGD01L9Toy5J4qYpXJfFKFb24pMaaOiupM029L6n3qjoviXNVXJTEhSq+L4nvVfHtttiftsX+rMTK+stFsdgZT9BU/qDkr1ByB9Pk5fnJD2nSy6/VuxAj064aobc2Ho3anV4nVWW81lujru0MdL0wdJy+7dYZCke/41qD4YblmeItoC2rPey31SiIN3q35450fQNr9TsDp8awoR25rWFnVT6EIsXqFz63125pZfGLnJ7jOMc9XS8MTst1u89qDIXDHQwGfTdHoTGjGCleuja228eWHkWLoK7VbvVr9M0EWF3H0WBpicUZOfbAzlkEAlhxiuJlcbv93lANEpv6O33X1SZQlMrfde1Bq8awYT0eHA+PchLCQOSrVSFF+dqd7lFXjSJF0KgjZ1BjIZs7jXqO1dFYSIll5LhOVtjqSpSL7J19nSx/cjfrzty3zTRVzXKhqeZs7a3NihfXmHGzu9a+zV8/ADfC608KYVM8rAmHjTCwjgU2w8NaeLgN3tf9flO8XxPuN8L4dSx+M7xfC+9vg6e6nzbF05pw2ghD61hoMzythafb4IXuF03xoiZcNMKIOhbRDC9q4cU2eKL7SVM8qQknjTCkjoU0w5NaeLIFnqF7ueXLdgry7UqYFin35HI3m+sQJ0DTuQAC5bI8XXBN9sQtEiDTvVAy5V90D4gLh2xqOqJ8rctmgCWP6pkEHJI4EjlJdubxQRjmt1rueiaBPLeYiIsgzM9GykOAUP6Zrh9S7uTUeJ9PN5upxE9vkjGRG3Yg97DZSSR5M0r1MXSydczpIF3x0ewQBbC5PKOZQbTap1X+QmgWdgfl1n3pXk7lAMnDGEMn8iavV+Hfy0ljfhjISZOf44Ostc0I5mujbO3Ig6GtHgP1xtWzQ7t1aNs/tvZfnKzOiI+Mb4xvje8M2+gYL4yXxqlxYUDjF+M343fjj91fd//c/Wv376X14YPVmK+MyrX7z/9Rzp+w</latexit>

Move sampling out of computation path

Gradients flow to parameters :)

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

+

×

~<latexit sha1_base64="L6FT0jlPTopkQHcO1l/lf0lDk1k=">AAAPIXicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbr8tItDgKKphUhlEiQlGNX0AfY19gX2HX7BjsOuw0773uMkm1ZIiXrkn/4PHz805+STXoUB1xY1j8PHn7w4Ucff/Lo053PPn+8+8WTp19ecBIziM4hwYS98wBHOIjQuQgERu8oQyD0MLr07txMv5whxgMSnYkFRdch8KNgGkAg5NDNk71xCMQtBDh5nT7Pa2+aWOm+ua7t9Fvpsg6s/DL1wl4Ve8bqOr15+vi/8YTAOESRgBhwfmVbVFwngIkAYpTujGOOKIB3wEdXsZh2r5MgorFAEUzNZ1KbxtgUxMyAzUnAEBR4IQsAWSATTHgLGIBC3tZONYqjCISI709mAeXLks/8ZSGA7Ml1Ms97llYmJj4D9DaA8wpZAkKeNUEb5IvQqw6iGCM2C6uDGaVkVJxzxGDAsx6cysa8odky8DNyutJvF/QWRTxNYobT8kQpIMbQVE7MS45ETJP8ZuTa3/EXgsVoPyvzsRcDwO7eosm+zKkMVHGmmABRHfLCrDkRuockDEE0ScY0TcYCzUUy3j9IpfjM9LA0ewSwSdX5Nk2S5YPjmW+ltSK+LomvVXFYEoerDyF4Yk4JM2dy/QnjpjSa0sICiHh19nkxe2qeq9EXJfFCFS9L4qUqenFJjTV1VlJnmnpfUu9VdV4S56q4KIkLVXxfEt+r4rttsT9ti/1ZiZX9ly/FYmc8QVP51ZI/QskdTJNXZyffp0kvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5kV/Z1svzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiVskQKZ7oWTK/9E9IC4cstR0RPlal2WAJY/qmQQckjgSOQnFydgHUkmLXc8kkOcWE3ERhPkpSbkJEMof0/VNyp2cGu/z6WYzlfjpTTImcsMO5B42O4kkP45SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3LrvnQvl3KA5GGMoRP5IW9W4d/JRWN+GMhFk3/H+1m1zQjma6OsduTB0FaPgXpxeXhgtw5s+4fW3suT1RnxkfG18Y3x3LCNjvHSeGWcGucGNH4xfjN+N/7Y/XX3z92/dv9eWh8+WM35yqhcu//+D7xDoq8=</latexit>

N(0, 1)

<latexit sha1_base64="zPFDJz1Q5MFquEJzZiL8BGpobt0=">AAAPFHicfZdNb9s2HMbV7q3L1izdgF12ERYUGIYgkBrHL4cCtWQbPSxtluWlax0EFE0rQiSRICnHrqaPsS+w6/YNdhx23X0fYN9jf8m2LFGSdTHN5+Gjn/4UbdJhviekYfz74OEHH3708SePPt357PPHu1/sPfnyUtCIY3KBqU/5GwcJ4nshuZCe9MkbxgkKHJ9cOXd2ql/NCBceDc/lgpHrALmhN/UwktB1s/f1GDt+PHaoPxGLAD7icRAlyc3evnFoZJdebZirxr62uk5vnjz+bzyhOApIKLGPhHhnGkxex4hLD/sk2RlHgjCE75BL3kVy2r2OvZBFkoQ40Z+CNo18XVI9ZdQnHidY+gtoIMw9SNDxLeIIS3iSnXKUICEKiDiYzDwmlk0xc5cNiaAM1/E8K1NSGhi7HLFbD89LZDEKRIDkbaUzrU25k0Q+4bOg3JlSAqPinBOOPZHW4BQK85qllRfn9HSl3y7YLQlFEkfcT4oDQSCckykMzJqCyIjF2cPAdN+J55JH5CBtZn3PB4jfnZHJAeSUOso4U58iWe5ygrQ4IbnHNAhQOInHLInHksxlPD44TEB8qjs+mB2K+KTsPEtieGWgZo6jn4G1JL4qiK9UcVgQh6ubwGuoTynXZzD/lAsdjDpYuIeJKI++yEdP9Qs1+rIgXqriVUG8UkUnKqhRRZ0V1FlFvS+o96o6L4hzVVwUxIUqvi+I71XxzbbYn7fFvlViof6wKBY74wmZwq9J9grFdziJX56f/JDEvexavQsR0c2yETtr49Go3el1ElX213pr1DWtQVXPDR2rb9p1htzR79jGYLhheaZ4c2jDaA/7bTUK+xu927NHVX0Da/Q7A6vGsKEd2a1hZ1U+QkLF6uY+u9duVcri5jk9y7KOe1U9N1gt2+4+qzHkDnswGPTtDIVFnPlE8bK1sd0+NqpRLA/qGu1Wv0bfTIDRtawKLCuwWCPLHJgZiyTIV5wyf1nsbr83VIPkpv5W37YrEygL5e/a5qBVY9iwHg+Oh0cZCeUodNWq0Lx87U73qKtG0Txo1IEZrLDQzZ1GPcvoVFhogWVk2VZa2PJKhEX2zryOlz+5m3Wn75t6kqhmWGiqOV17a7Pi9WvMfrO71r7NXz/Ab4SvPinGTfG4Jhw3wuA6FtwMj2vh8TZ4t+p3m+LdmnC3EcatY3Gb4d1aeHcbPKv6WVM8qwlnjTCsjoU1w7NaeLYNXlb9sile1oTLRhhZxyKb4WUtvNwGT6t+2hRPa8JpIwytY6HN8LQWnm6B5+QetnzpTgHerphXImFPDrvZTMd+jCq6kEiSTIajhajIjrwlEqW6EwBT9qXqQVHugGZFJ0ysdWh6PvConoknMI1CmZEwOOS4CJQk3/VMPDi36ERIL8gORspDoAD+TNcPCTs5Nd4V081mKnaTm3hMYcOOYA+bnkTin0ZJdQybbB1zOkhWfCw9RCFfXx7QdC9c7dNKfyEsDbvDsHVfupdTOSBwGOPkBG7yehX+PUwadwMPJg0+xwdpa5sRzddGaO3AwdBUj4HVxtWzQ7N1aJo/tvZfnKzOiI+0b7Rvte80U+toL7SX2ql2oWHtF+037Xftj91fd//c/Wv376X14YPVmK+00rX7z/9csZ51</latexit>µ

<latexit sha1_base64="GYuO5la/Tn1O75f0NO0RswHDISE=">AAAPF3icfZdNb9s2HMbV7q3L1izdTsMuwoICwxAEUuP45VCglmyjh6XN0rx0q4OAomlFCCUSJOXYFYR9jX2BXbdvsOOw6477APseo2RblkjJupjm8/DRT3+KNulRHHBhWf8+ePjBhx99/MmjT3c++/zx7hd7T7685CRmEF1Aggl76wGOcBChCxEIjN5ShkDoYXTl3bmZfjVDjAckOhcLiq5D4EfBNIBAyK6bva/HkOJk7BE84YtQfiTjN4EfgjS92du3Dq38MvWGvWrsG6vr9ObJ4//GEwLjEEUCYsD5O9ui4joBTAQQo3RnHHNEAbwDPnoXi2n3OgkiGgsUwdR8KrVpjE1BzAzTnAQMQYEXsgEgC2SCCW8BA1DIh9mpRnEUgRDxg8ksoHzZ5DN/2RBAVuI6meeVSisDE58BehvAeYUsASEPgbjVOrPyVDtRjBGbhdXOjFIyKs45YjDgWQ1OZWFe06z4/JycrvTbBb1FEU+TmOG0PFAKiDE0lQPzJkcipkn+MHLG7/hzwWJ0kDXzvucDwO7O0ORA5lQ6qjhTTICodnlhVpwI3UMShiCaJGOaJmOB5iIZHxymUnxqeliaPQLYpOo8S5NknNXM88wzaa2Ir0riK1UclsTh6ibyTTSnhJkzOf+EcVMaTWlhAUS8OvqiGD01L9Toy5J4qYpXJfFKFb24pMaaOiupM029L6n3qjoviXNVXJTEhSq+L4nvVfHtttiftsX+rMTK+stFsdgZT9BU/qDkr1ByB9Pk5fnJD2nSy6/VuxAj064aobc2Ho3anV4nVWW81lujru0MdL0wdJy+7dYZCke/41qD4YblmeItoC2rPey31SiIN3q35450fQNr9TsDp8awoR25rWFnVT6EIsXqFz63125pZfGLnJ7jOMc9XS8MTst1u89qDIXDHQwGfTdHoTGjGCleuja228eWHkWLoK7VbvVr9M0EWF3H0WBpicUZOfbAzlkEAlhxiuJlcbv93lANEpv6O33X1SZQlMrfde1Bq8awYT0eHA+PchLCQOSrVSFF+dqd7lFXjSJF0KgjZ1BjIZs7jXqO1dFYSIll5LhOVtjqSpSL7J19nSx/cjfrzty3zTRVzXKhqeZs7a3NihfXmHGzu9a+zV8/ADfC608KYVM8rAmHjTCwjgU2w8NaeLgN3tf9flO8XxPuN8L4dSx+M7xfC+9vg6e6nzbF05pw2ghD61hoMzythafb4IXuF03xoiZcNMKIOhbRDC9q4cU2eKL7SVM8qQknjTCkjoU0w5NaeLIFnqF7ueXLdgry7UqYFin35HI3m+sQJ0DTuQAC5bI8XXBN9sQtEiDTvVAy5V90D4gLh2xqOqJ8rctmgCWP6pkEHJI4EjlJdubxQRjmt1rueiaBPLeYiIsgzM9GykOAUP6Zrh9S7uTUeJ9PN5upxE9vkjGRG3Yg97DZSSR5M0r1MXSydczpIF3x0ewQBbC5PKOZQbTap1X+QmgWdgfl1n3pXk7lAMnDGEMn8iavV+Hfy0ljfhjISZOf44Ostc0I5mujbO3Ig6GtHgP1xtWzQ7t1aNs/tvZfnKzOiI+Mb4xvje8M2+gYL4yXxqlxYUDjF+M343fjj91fd//c/Wv376X14YPVmK+MyrX7z/9Rzp+w</latexit>

<latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

Assume we have a Gaussian distribution with mean parameter \mu and covariance matrix \Sigma (often diagonal).

The idea behind reparameterization is to move the sampling step out of the computation graph, so that gradients can flow directly through the graph to the parameters. This is very easy to do for the Gaussian distribution: We simply sample noise epsilon from a normal gaussian (with mean 0 and identity covariance matrix), and transform this noise by multiplying it with the covariance matrix \Sigma and adding the mean \mu. We can now do a normal backwards pass from f directly to the parameters!

Page 16: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

EXAMPLE: VAE ELBO

45

<latexit sha1_base64="lNmv2rQRJfeS6jE27Zk1R6T5ass=">AAAPaXicfVfLbttGFKXTV+o2sdNuinYzqBEgKVyDjGU9FgYiUhICNA83je20pmEMRyOaEMmZcoayFIYf2WU/oJv+QLe9pCSKT3Hj6zlnDs/cmaHutbjrCKmqf+/c++TTzz7/4v6Xu199/eDh3v6jby4ECwNCzwlzWfDewoK6jk/PpSNd+p4HFHuWSy+tqZHglzMaCIf57+SC02sP274zcQiWMHSzPx2il0hHb05Nl9mI35iWvKUSPzFBOjLnMTI9Z4xMgT3u0qc/D26iX17GV3/eRKbF3LFYeFbC47dOHD9ZsT6u5z5FHz8ivh5+en2zf6AeqemDqoG2Cg6U1XN28+jBP+aYkdCjviQuFuJKU7m8jnAgHeLSeNcMBeWYTLFNr0I56V5Hjs9DSX0So8eATUIXSYaShaOxE1Ai3QUEmAQOKCByiwNMJKRntyglqI89Kg7HM4eLZShm9jKQGHJ7Hc3T3MeFiZEdYEgFmRecRdgTHpa3lcEke8VBGro0mHnFwcQleCwx5zQgjkhycAaJecOT7RTv2NkKv13wW+qLOAoDN85PBIAGAZ3AxDQUVIY8ShcDZ2gqTmUQ0sMkTMdOBziYvqXjQ9ApDBTtTFyGZXHI8pLk+PSOMM/D/hhOSRyZks5lZB4exQA+RpYLZIvhYFxkvo2jyExyZlnoLVAL4Osc+LoMDnPgcPUSOKhowgI0g/1ngUBAREAJHEJFcfZ5NnuCzsvSFznwogxe5sDLMmiFOTSsoLMcOqugdzn0rozOc+C8DC5y4KIMfsiBH8rg+22yv2+T/aMkC/mHS7HYNcd0Ap+o9AhFUxJHL969ehlHvfRZnYWQIq1IJNaaeDxqd3qduAy7a7w16mr6oIpnhI7e14w6Qsbodwx1MNx4eVbiZqZVtT3st8tSxN3g3Z4xquIbs2q/M9BrCBu3I6M17KzSR6lfotoZz+i1W5W02JlOT9f1k14Vzwh6yzC6z2oIGcMYDAZ9I7XCwwA+5CUuXxPb7RO1KsUzoa7abvVr8M0GqF1dr5jlOS/6SNcGWupFUuyWmDI7LEa33xuWheQm/3rfMCobKHPp7xraoFVD2Hg9GZwMj1MnLMC+Xc4Ky9LX7nSPu2UplgmNOrCDFS9s86ZRT1c7FS8s52WkG3qS2OJNhEt2pV1Hy0/u5t6hAw3FcZkMF61MTu7emlziujVkt5ldS9/Gr5/gNpqvrpSQJnlSI04azZA6L6TZPKk1T7aZt6t8u0nerhG3G83YdV7sZvN2rXl7m3le5fMmeV4jzhvN8DovvNk8rzXPt5mXVb5skpc14rLRjKzzIpvNy1rzcpt5VuWzJnlWI84azbA6L6zZPKs1z7aYD+gdlHxJpQCnKwoqklCTQzWb4sSNcAUXEkuawtBoiAq8bGES3PLAU/pPlYPDjAFhBadcrHEIHRf8lDljRxAW+jJ1wqHlsTEgcVb1jB3oWxAV0vHSbqu0iLQrWi8S6rGyvC0mm2IqsmPouRgU7Bhq2KQTiX4bxdU5fLx1ztkgXvnjSROFXbRs4ZDjr+q0wk8IT8SmBEr3JXu5lQMKzVhAX8FL3qzEf4JNC2zPgU2Dv+ZhEm0j4vmaCNEuNIZauQ2sBpfPjrTWkab92jp4/mrVI95XflB+VJ4omtJRnisvlDPlXCHKX8p/O8rOzsN/9/b3vtv7fkm9t7Oa861SePYO/geC7bn7</latexit>

ELBO = log p✓(x | z)-DKL[q�(z|x)||p(z)]

+

×

~

<latexit sha1_base64="L6FT0jlPTopkQHcO1l/lf0lDk1k=">AAAPIXicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbr8tItDgKKphUhlEiQlGNX0AfY19gX2HX7BjsOuw0773uMkm1ZIiXrkn/4PHz805+STXoUB1xY1j8PHn7w4Ucff/Lo053PPn+8+8WTp19ecBIziM4hwYS98wBHOIjQuQgERu8oQyD0MLr07txMv5whxgMSnYkFRdch8KNgGkAg5NDNk71xCMQtBDh5nT7Pa2+aWOm+ua7t9Fvpsg6s/DL1wl4Ve8bqOr15+vi/8YTAOESRgBhwfmVbVFwngIkAYpTujGOOKIB3wEdXsZh2r5MgorFAEUzNZ1KbxtgUxMyAzUnAEBR4IQsAWSATTHgLGIBC3tZONYqjCISI709mAeXLks/8ZSGA7Ml1Ms97llYmJj4D9DaA8wpZAkKeNUEb5IvQqw6iGCM2C6uDGaVkVJxzxGDAsx6cysa8odky8DNyutJvF/QWRTxNYobT8kQpIMbQVE7MS45ETJP8ZuTa3/EXgsVoPyvzsRcDwO7eosm+zKkMVHGmmABRHfLCrDkRuockDEE0ScY0TcYCzUUy3j9IpfjM9LA0ewSwSdX5Nk2S5YPjmW+ltSK+LomvVXFYEoerDyF4Yk4JM2dy/QnjpjSa0sICiHh19nkxe2qeq9EXJfFCFS9L4qUqenFJjTV1VlJnmnpfUu9VdV4S56q4KIkLVXxfEt+r4rttsT9ti/1ZiZX9ly/FYmc8QVP51ZI/QskdTJNXZyffp0kvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5kV/Z1svzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiVskQKZ7oWTK/9E9IC4cstR0RPlal2WAJY/qmQQckjgSOQnFydgHUkmLXc8kkOcWE3ERhPkpSbkJEMof0/VNyp2cGu/z6WYzlfjpTTImcsMO5B42O4kkP45SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3LrvnQvl3KA5GGMoRP5IW9W4d/JRWN+GMhFk3/H+1m1zQjma6OsduTB0FaPgXpxeXhgtw5s+4fW3suT1RnxkfG18Y3x3LCNjvHSeGWcGucGNH4xfjN+N/7Y/XX3z92/dv9eWh8+WM35yqhcu//+D7xDoq8=</latexit>

N(0, 1)

<latexit sha1_base64="zPFDJz1Q5MFquEJzZiL8BGpobt0=">AAAPFHicfZdNb9s2HMbV7q3L1izdgF12ERYUGIYgkBrHL4cCtWQbPSxtluWlax0EFE0rQiSRICnHrqaPsS+w6/YNdhx23X0fYN9jf8m2LFGSdTHN5+Gjn/4UbdJhviekYfz74OEHH3708SePPt357PPHu1/sPfnyUtCIY3KBqU/5GwcJ4nshuZCe9MkbxgkKHJ9cOXd2ql/NCBceDc/lgpHrALmhN/UwktB1s/f1GDt+PHaoPxGLAD7icRAlyc3evnFoZJdebZirxr62uk5vnjz+bzyhOApIKLGPhHhnGkxex4hLD/sk2RlHgjCE75BL3kVy2r2OvZBFkoQ40Z+CNo18XVI9ZdQnHidY+gtoIMw9SNDxLeIIS3iSnXKUICEKiDiYzDwmlk0xc5cNiaAM1/E8K1NSGhi7HLFbD89LZDEKRIDkbaUzrU25k0Q+4bOg3JlSAqPinBOOPZHW4BQK85qllRfn9HSl3y7YLQlFEkfcT4oDQSCckykMzJqCyIjF2cPAdN+J55JH5CBtZn3PB4jfnZHJAeSUOso4U58iWe5ygrQ4IbnHNAhQOInHLInHksxlPD44TEB8qjs+mB2K+KTsPEtieGWgZo6jn4G1JL4qiK9UcVgQh6ubwGuoTynXZzD/lAsdjDpYuIeJKI++yEdP9Qs1+rIgXqriVUG8UkUnKqhRRZ0V1FlFvS+o96o6L4hzVVwUxIUqvi+I71XxzbbYn7fFvlViof6wKBY74wmZwq9J9grFdziJX56f/JDEvexavQsR0c2yETtr49Go3el1ElX213pr1DWtQVXPDR2rb9p1htzR79jGYLhheaZ4c2jDaA/7bTUK+xu927NHVX0Da/Q7A6vGsKEd2a1hZ1U+QkLF6uY+u9duVcri5jk9y7KOe1U9N1gt2+4+qzHkDnswGPTtDIVFnPlE8bK1sd0+NqpRLA/qGu1Wv0bfTIDRtawKLCuwWCPLHJgZiyTIV5wyf1nsbr83VIPkpv5W37YrEygL5e/a5qBVY9iwHg+Oh0cZCeUodNWq0Lx87U73qKtG0Txo1IEZrLDQzZ1GPcvoVFhogWVk2VZa2PJKhEX2zryOlz+5m3Wn75t6kqhmWGiqOV17a7Pi9WvMfrO71r7NXz/Ab4SvPinGTfG4Jhw3wuA6FtwMj2vh8TZ4t+p3m+LdmnC3EcatY3Gb4d1aeHcbPKv6WVM8qwlnjTCsjoU1w7NaeLYNXlb9sile1oTLRhhZxyKb4WUtvNwGT6t+2hRPa8JpIwytY6HN8LQWnm6B5+QetnzpTgHerphXImFPDrvZTMd+jCq6kEiSTIajhajIjrwlEqW6EwBT9qXqQVHugGZFJ0ysdWh6PvConoknMI1CmZEwOOS4CJQk3/VMPDi36ERIL8gORspDoAD+TNcPCTs5Nd4V081mKnaTm3hMYcOOYA+bnkTin0ZJdQybbB1zOkhWfCw9RCFfXx7QdC9c7dNKfyEsDbvDsHVfupdTOSBwGOPkBG7yehX+PUwadwMPJg0+xwdpa5sRzddGaO3AwdBUj4HVxtWzQ7N1aJo/tvZfnKzOiI+0b7Rvte80U+toL7SX2ql2oWHtF+037Xftj91fd//c/Wv376X14YPVmK+00rX7z/9csZ51</latexit>µ

<latexit sha1_base64="GYuO5la/Tn1O75f0NO0RswHDISE=">AAAPF3icfZdNb9s2HMbV7q3L1izdTsMuwoICwxAEUuP45VCglmyjh6XN0rx0q4OAomlFCCUSJOXYFYR9jX2BXbdvsOOw6477APseo2RblkjJupjm8/DRT3+KNulRHHBhWf8+ePjBhx99/MmjT3c++/zx7hd7T7685CRmEF1Aggl76wGOcBChCxEIjN5ShkDoYXTl3bmZfjVDjAckOhcLiq5D4EfBNIBAyK6bva/HkOJk7BE84YtQfiTjN4EfgjS92du3Dq38MvWGvWrsG6vr9ObJ4//GEwLjEEUCYsD5O9ui4joBTAQQo3RnHHNEAbwDPnoXi2n3OgkiGgsUwdR8KrVpjE1BzAzTnAQMQYEXsgEgC2SCCW8BA1DIh9mpRnEUgRDxg8ksoHzZ5DN/2RBAVuI6meeVSisDE58BehvAeYUsASEPgbjVOrPyVDtRjBGbhdXOjFIyKs45YjDgWQ1OZWFe06z4/JycrvTbBb1FEU+TmOG0PFAKiDE0lQPzJkcipkn+MHLG7/hzwWJ0kDXzvucDwO7O0ORA5lQ6qjhTTICodnlhVpwI3UMShiCaJGOaJmOB5iIZHxymUnxqeliaPQLYpOo8S5NknNXM88wzaa2Ir0riK1UclsTh6ibyTTSnhJkzOf+EcVMaTWlhAUS8OvqiGD01L9Toy5J4qYpXJfFKFb24pMaaOiupM029L6n3qjoviXNVXJTEhSq+L4nvVfHtttiftsX+rMTK+stFsdgZT9BU/qDkr1ByB9Pk5fnJD2nSy6/VuxAj064aobc2Ho3anV4nVWW81lujru0MdL0wdJy+7dYZCke/41qD4YblmeItoC2rPey31SiIN3q35450fQNr9TsDp8awoR25rWFnVT6EIsXqFz63125pZfGLnJ7jOMc9XS8MTst1u89qDIXDHQwGfTdHoTGjGCleuja228eWHkWLoK7VbvVr9M0EWF3H0WBpicUZOfbAzlkEAlhxiuJlcbv93lANEpv6O33X1SZQlMrfde1Bq8awYT0eHA+PchLCQOSrVSFF+dqd7lFXjSJF0KgjZ1BjIZs7jXqO1dFYSIll5LhOVtjqSpSL7J19nSx/cjfrzty3zTRVzXKhqeZs7a3NihfXmHGzu9a+zV8/ADfC608KYVM8rAmHjTCwjgU2w8NaeLgN3tf9flO8XxPuN8L4dSx+M7xfC+9vg6e6nzbF05pw2ghD61hoMzythafb4IXuF03xoiZcNMKIOhbRDC9q4cU2eKL7SVM8qQknjTCkjoU0w5NaeLIFnqF7ueXLdgry7UqYFin35HI3m+sQJ0DTuQAC5bI8XXBN9sQtEiDTvVAy5V90D4gLh2xqOqJ8rctmgCWP6pkEHJI4EjlJdubxQRjmt1rueiaBPLeYiIsgzM9GykOAUP6Zrh9S7uTUeJ9PN5upxE9vkjGRG3Yg97DZSSR5M0r1MXSydczpIF3x0ewQBbC5PKOZQbTap1X+QmgWdgfl1n3pXk7lAMnDGEMn8iavV+Hfy0ljfhjISZOf44Ostc0I5mujbO3Ig6GtHgP1xtWzQ7t1aNs/tvZfnKzOiI+Mb4xvje8M2+gYL4yXxqlxYUDjF+M343fjj91fd//c/Wv376X14YPVmK+MyrX7z/9Rzp+w</latexit>

<latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

<latexit sha1_base64="hfskakxjJ4XE/BPcIPEUN43jUxo=">AAAPBHicfZdPb9s2GMbV7l+XrWm7HXcRFhQYhiCQEsd/DgVqyTZ6WNosS5pscRBQNC0LoUSCpBy7go77Artu32DHYdd9j32AfY9Rii1LpGRd8obPw8c/vZRs0qM44MKy/n30+KOPP/n0syef73zx5dPdZ89ffPWek5hBdAEJJuzKAxzhIEIXIhAYXVGGQOhhdOnduZl+OUeMByQ6F0uKbkLgR8E0gEDIoauxTEjGi/T2+Z51YOWXqRf2qtgzVtfp7Yun/40nBMYhigTEgPNr26LiJgFMBBCjdGccc0QBvAM+uo7FtHuTBBGNBYpgar6U2jTGpiBmxmROAoagwEtZAMgCmWDCGWAACkm+U43iKAIh4vuTeUD5Q8nn/kMhgLztm2SRtyWtTEx8BugsgIsKWQJCHgIx0wb5MvSqgyjGiM3D6mBGKRkV5wIxGPCsB6eyMe9o1ml+Tk5X+mxJZyjiaRIznJYnSgExhqZyYl5yJGKa5Dcjl/eOvxIsRvtZmY+9GgB2d4Ym+zKnMlDFmWICRHXIC7PmROgekjAE0SQZ0zQZC7QQyXj/IJXiS9PD0uwRwCZV51maJOOsZ55nnklrRXxbEt+q4rAkDlcfQvDEnBJmzuX6E8ZNaTSlhQUQ8ersi2L21LxQo9+XxPeqeFkSL1XRi0tqrKnzkjrX1PuSeq+qi5K4UMVlSVyq4oeS+EEVr7bF/rwt9hclVvZfvhTLnfEETeW3R/4IJXcwTd6cn/yQJr38Wj0LMTLtqhF6a+PRqN3pdVJVxmu9NerazkDXC0PH6dtunaFw9DuuNRhuWA4VbwFtWe1hv61GQbzRuz13pOsbWKvfGTg1hg3tyG0NO6v2IRQpVr/wub12S2uLX+T0HMc57ul6YXBarts9rDEUDncwGPTdHIXGjGKkeOna2G4fW3oULYK6VrvVr9E3C2B1HUeDpSUWZ+TYAztnEQhgxSmKh8Xt9ntDNUhs+u/0XVdbQFFqf9e1B60aw4b1eHA8PMpJCAORr3aFFO1rd7pHXTWKFEGjjlxBjYVsPmnUc6yOxkJKLCPHdbLGVt9E+ZJd2zfJw1fu5r0z92wzTVWzfNFUc/burc2KF9eYcbO71r7NXz8BN8LrdwphUzysCYeNMLCOBTbDw1p4uA3e1/1+U7xfE+43wvh1LH4zvF8L72+Dp7qfNsXTmnDaCEPrWGgzPK2Fp9vghe4XTfGiJlw0wog6FtEML2rhxTZ4ovtJUzypCSeNMKSOhTTDk1p4sgWeoXu55ct2CvLpSpgWKffkcjeb6xAnQNO5AALlsjxYcE32xAwJkOleKJnyf3QPiAuHLDUdUb7WZRlgyaN6JgGHJI5ETkLlEccHUkmLXc8kkOcWE3ERhPlBSLkJEMof0/VNyv2YGu/z6WYzlfjpbTImcsMO5B42O4kkP41SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF5eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoIGN34zfjT92f939c/ev3b8frI8freZ8bVSu3X/+BwQYl9I=</latexit>x

KL

<latexit sha1_base64="kz2OU4P+vdC26/8qpZKCr1THJxs=">AAAPD3icfZdNjts2GIaV/qbTZpq0y26EDgIUxWAgJR7/LALEkm1k0STTNJNJGw8GFE1rhJFEgqQ8dgTdoRfotr1Bl0W3PUIP0Hv0k2zLEilZG9N8X7569FG0SY+FgZCW9e+dDz786ONPPr372cHnX9w7/PL+g6/eCJpwTM4xDSl/6yFBwiAm5zKQIXnLOEGRF5IL78bN9YsF4SKg8Wu5YuQyQn4czAOMJHRd3X8w9Wg4E6sIPtIpuw6yq/tH1olVXKbesDeNI2NznV09uPffdEZxEpFY4hAJ8c62mLxMEZcBDkl2ME0EYQjfIJ+8S+S8f5kGMUskiXFmPgRtnoSmpGaOZ84CTrAMV9BAmAeQYOJrxBGW8BAH9ShBYhQRcTxbBEysm2LhrxsSQQUu02VRoaw2MPU5ggfFyxpZiiIRIXmtdea1qXeSJCR8EdU7c0pgVJxLwnEg8hqcQWFesrzo4jU92+jXK3ZNYpGlCQ+z6kAQCOdkDgOLpiAyYWnxMDDTN+KJ5Ak5zptF35MR4jevyOwYcmoddZx5SJGsd3lRXpyY3GIaRSiewTuQpVNJljKdHp9kID40vRDMHkV8Vne+ytJ0mtfM88xXYK2JLyriC1UcV8Tx5ibwGppzys0FzD/lwgSjCRYeYCLqo8/L0XPzXI1+UxHfqOJFRbxQRS+pqImmLirqQlNvK+qtqi4r4lIVVxVxpYrvK+J7VXy7L/bnfbG/KLFQf1gUq4PpjMzhh6R4hdIbnKXPXj//IUsHxbV5FxJi2nUj9rbGx5Nub9DLVDnc6p1J33ZGul4aes7QdpsMpWPYc63ReMfySPGW0JbVHQ+7ahQOd3p/4E50fQdrDXsjp8Gwo524nXFvUz5CYsXqlz530O1oZfHLnIHjOKcDXS8NTsd1+48aDKXDHY1GQ7dAYQlnIVG8bGvsdk8tPYqVQX2r2xk26LsJsPqOo8GyCoszceyRXbBIgkLFKcuXxe0PB2M1SO7q7wxdV5tAWSl/37VHnQbDjvV0dDp+XJBQjmJfrQoty9ft9R/31ShaBk16MIMaC93daTJwrJ7GQissE8d18sLWVyIssnf2Zbr+yd2tO/PINrNMNcNCU8352tuaFW/YYA7b3Y32ff7mAWErvP6kGLfF44Zw3AqDm1hwOzxuhMf74H3d77fF+w3hfiuM38Tit8P7jfD+Pnim+1lbPGsIZ60wrImFtcOzRni2D17qftkWLxvCZSuMbGKR7fCyEV7ug6e6n7bF04Zw2gpDm1hoOzxthKd74Dm5hS1fvlOAtyvlWiTsyWE3W+g4TJGmC4kkKWQ4WghN9uQ1kSjXvQiYii+6ByWlA5qaTpjY6tAMQuBRPbNAYJrEsiBhcMjxEShZueuZBXBuMYmQQVSciZSHQBH8mW4fEvZjarwv5rvNVOpnV+mUwoYdwR42P4mkP00yfQyb7R1zNso2fCw/RKHQXB/QzCDe7NNqfyEsD7vBsHVfu9dTOSJwGOPkOdzk5Sb8e5g07kcBTBp8To/z1j4jWm6N0DqAg6GtHgP1xsWjE7tzYts/do6ePt+cEe8a3xjfGt8ZttEznhrPjDPj3MDGrfGb8bvxx+Gvh38e/nX499r6wZ3NmK+N2nX4z//fYJwf</latexit>

<latexit sha1_base64="UxFu6bt6MHIcMG5kg7mdvIpbOf8=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0eljbL8tKtDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKtmWJlKxL/uHz8PFPf0o26VEccGFZ/z54+NHHn3z62aPP97748vH+k6fPvrriJGYQXUKCCXvnAY5wEKFLEQiM3lGGQOhhdO3N3Ey/niPGAxJdiCVFNyHwo2AaQCDk0PXYE3dIgNunB9aRlV+mXtjr4sBYX2e3zx7/N54QGIcoEhADzt/bFhU3CWAigBile+OYIwrgDPjofSym3ZskiGgsUART87nUpjE2BTEzJHMSMAQFXsoCQBbIBBPeAQagkOB71SiOIhAifjiZB5SvSj73V4UA8q5vkkXelbQyMfEZoHcBXFTIEhDyEIg7bZAvQ686iGKM2DysDmaUklFxLhCDAc96cCYb85ZmjeYX5Gyt3y3pHYp4msQMp+WJUkCMoamcmJcciZgm+c3I1Z3xl4LF6DAr87GXA8Bm52hyKHMqA1WcKSZAVIe8MGtOhO4hCUMQTZIxTZOxQAuRjA+PUik+Nz0szR4BbFJ1nqdJMs565nnmubRWxDcl8Y0qDkvicP0hBE/MKWHmXK4/YdyURlNaWAARr86+LGZPzUs1+qokXqnidUm8VkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8tyv2512xvyixsv/ypVjujSdoKr888kcomcE0eX1x+kOa9PJr/SzEyLSrRuhtjMejdqfXSVUZb/TWqGs7A10vDB2nb7t1hsLR77jWYLhleaF4C2jLag/7bTUK4q3e7bkjXd/CWv3OwKkxbGlHbmvYWbcPoUix+oXP7bVbWlv8IqfnOM5JT9cLg9Ny3e6LGkPhcAeDQd/NUWjMKEaKl26M7faJpUfRIqhrtVv9Gn27AFbXcTRYWmJxRo49sHMWgQBWnKJ4WNxuvzdUg8S2/07fdbUFFKX2d1170KoxbFlPBifD45yEMBD5aldI0b52p3vcVaNIETTqyBXUWMj2k0Y9x+poLKTEMnJcJ2ts9U2UL9l7+yZZfeVu3zvzwDbTVDXLF001Z+/exqx4cY0ZN7tr7bv89RNwI7x+pxA2xcOacNgIA+tYYDM8rIWHu+B93e83xfs14X4jjF/H4jfD+7Xw/i54qvtpUzytCaeNMLSOhTbD01p4ugte6H7RFC9qwkUjjKhjEc3wohZe7IInup80xZOacNIIQ+pYSDM8qYUnO+AZupdbvmynIJ+uhGmRck8ud7O5DnECNJ0LIFAuE5xwTV6dNjLdCyVT/o/uAXHhkKWmI8o3uiwDLHlUzyTgkMSRyEkoTsY+kEpa7HomgTy3mIiLIMzPQcpNgFD+mG5uUu7H1HifT7ebqcRPb5MxkRt2IPew2Ukk+WmU6nPoZOecs0G65qPZIQpgMzuEEGwG0XqfVvkJoVnYDMqt+8q9WsoBkocxhk7lh7xdh38vF435YSAXTf4dH2bVLiNYbIyy2pMHQ1s9BurF9Ysju3Vk2z+2Dl6drs+Ij4xvjG+N7wzb6BivjNfGmXFpQGNm/Gb8bvyx/+v+n/t/7f+9sj58sJ7ztVG59v/5H3tQlzY=</latexit>

<latexit sha1_base64="774AfefSWULS+zvPT37aYqCoi3U=">AAAPGXicfZdPb9s2GMbV7l+XrVm7HXsRFhTohiCQWsd/DgVqyTZ6WNusa5puURBQNK0IoUSCpBy7mg77GvsCu27fYMdh1532AfY9Rsm2LJGSdQnD5+Hjn15KNl+f4pALy/r31u0PPvzo40/ufLr32ed397+4d//Lt5wkDKJTSDBh73zAEQ5jdCpCgdE7yhCIfIzO/Gs318/miPGQxG/EkqKLCARxOAshEHLq8t4Deun54goJ8MiTaam3yH72OIgoRt9c3juwjqziMvWBvR4cGOvr5PL+3f+8KYFJhGIBMeD83LaouEgBEyHEKNvzEo4ogNcgQOeJmPUv0jCmiUAxzMyHUpsl2BTEzEHNacgQFHgpBwCyUCaY8AowAIW8nb16FEcxiBA/nM5DyldDPg9WAwFkLS7SRVGrrLYwDRigVyFc1MhSEPEIiCttki8jvz6JEozYPKpP5pSSUXEuEIMhz2twIgvziubl52/IyVq/WtIrFPMsTRjOqgulgBhDM7mwGHIkEpoWNyP3/Jo/FSxBh/mwmHs6Auz6NZoeypzaRB1nhgkQ9Sk/yosToxtIogjE09SjWeoJtBCpd3iUSfGh6WNp9glg07rzdZamXl4z3zdfS2tNfFkRX6riuCKO1x9C8NScEWbO5f4Txk1pNKWFhRDx+urTcvXMPFWj31bEt6p4VhHPVNFPKmqiqfOKOtfUm4p6o6qLirhQxWVFXKri+4r4XhXf7Yr9cVfsT0qsrL98KZZ73hTN5FdK8Qil1zBLn7958V2WDopr/SwkyLTrRuhvjE8m3d6gl6ky3uidSd92RrpeGnrO0HabDKVj2HOt0XjL8ljxltCW1R0Pu2oUxFu9P3Anur6FtYa9kdNg2NJO3M64ty4fQrFiDUqfO+h2tLIEZc7AcZzjga6XBqfjuv3HDYbS4Y5Go6FboNCEye9xxUs3xm732NKjaBnUt7qdYYO+3QCr7zgaLK2wOBPHHtkFi0AAK05RPixufzgYq0FiW39n6LraBopK+fuuPeo0GLasx6Pj8ZOChDAQB2pVSFm+bq//pK9GkTJo0pM7qLGQ7SdNBo7V01hIhWXiuE5e2PqbKF+yc/siXX3lbt8788A2s0w1yxdNNefv3saseHGDGbe7G+27/M0LcCu8fqcQtsXDhnDYCgObWGA7PGyEh7vgA90ftMUHDeFBK0zQxBK0wweN8MEueKr7aVs8bQinrTC0iYW2w9NGeLoLXuh+0RYvGsJFK4xoYhHt8KIRXuyCJ7qftMWThnDSCkOaWEg7PGmEJzvgGbqRR778pCCfrpRpkfJMLk+zhQ5xCjSdCyBQIcsOg2vyqv/IdT+STMU/ugckpUMONR1RvtHlMMSSR/VMQw5JEouChMpeJwBSycpTzzSUfYuJuAijojtSbqJoijY3Kc9janzAZ9vDVBpkl6lH5IEdyDNs3omkP0wyfQ2d7lxzMsrWfDRvogA28yaEYDOM1+e02k8IzcOuoTy6r9yrrRwh2Ywx9EJ+yKt1+Ldy01gQhXLT5F/vMB/tMoLFxihHe7IxtNU2UB+cPT6yO0e2/X3n4NmLdY94x3hgfG08MmyjZzwznhsnxqkBjV+M34zfjT/2f93/c/+v/b9X1tu31mu+MmrX/j//A/M2oA8=</latexit>

p✓(x|z)

<latexit sha1_base64="kE2njz4/c03+YmTpzL+HmwXe9NE=">AAAPBnicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbL8rbGQUDRtCKEEgmScuwKOu8L7Lp9gx2HXfc19gH2PUYptiyRknUJw+fho5/+FGXSozjgwrL+ffT4o48/+fSzJ59vffHl0+2vnj3/+pyTmEF0Bgkm7NIDHOEgQmciEBhdUoZA6GF04d25mX4xQ4wHJDoVC4quQ+BHwTSAQMiu9/TlmIOQYvT9zbMda8/KL1Nv2MvGjrG8jm+eP/1vPCEwDlEkIAacX9kWFdcJYCKAGKVb45gjCuAd8NFVLKbd6ySIaCxQBFPzhdSmMTYFMTMqcxIwBAVeyAaALJAJJrwFDEAh2beqURxFIER8dzILKH9o8pn/0BBAPvh1Ms8Lk1YGJj4D9DaA8wpZAkIeAnGrdfJF6FU7UYwRm4XVzoxSMirOOWIw4FkNjmVh3tGs1vyUHC/12wW9RRFPk5jhtDxQCogxNJUD8yZHIqZJ/jBygu/4K8FitJs1875XA8DuTtBkV+ZUOqo4U0yAqHZ5YVacCN1DEoYgmiRjmiZjgeYiGe/upVJ8YXpYmj0C2KTqPEmTZJzVzPPME2mtiG9L4ltVHJbE4fImBE/MKWHmTM4/YdyURlNaWAARr44+K0ZPzTM1+rwknqviRUm8UEUvLqmxps5K6kxT70vqvarOS+JcFRclcaGKH0riB1W83BT7y6bY90qsrL9cFIut8QRN5fcjf4WSO5gmb06PfkyTXn4t34UYmXbVCL2V8WDU7vQ6qSrjld4adW1noOuFoeP0bbfOUDj6HdcaDNcs+4q3gLas9rDfVqMgXuvdnjvS9TWs1e8MnBrDmnbktoadZfkQihSrX/jcXrullcUvcnqO4xz2dL0wOC3X7e7XGAqHOxgM+m6OQmMmv+OKl66M7fahpUfRIqhrtVv9Gn09AVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8S6/k7fdbUJFKXyd1170KoxrFkPB4fDg5yEMBD5alVIUb52p3vQVaNIETTqyBnUWMj6TqOeY3U0FlJiGTmukxW2uhLlIruyr5OHT+563Zk7tpmmqlkuNNWcrb2VWfHiGjNudtfaN/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw83wfu632+K92vC/UYYv47Fb4b3a+H9TfBU99OmeFoTThthaB0LbYantfB0E7zQ/aIpXtSEi0YYUccimuFFLbzYBE90P2mKJzXhpBGG1LGQZnhSC082wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7IlbJECme6Fkyv/RPSAuHLKp6YjylS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDmRyHlIfJD0eoh5X5Mjff5dL2ZSvz0JhkTuWEHcg+bnUSSn0epPoZONo45HqRLPpodogA2s0MIwWYQLfdplZ8QmoXdQbl1f3A/TOUAycMYQ0fyJu+W4T/ISWN+GMhJk3/Hu1lrkxHMV0bZ2pIHQ1s9BuqNi/09u7Vn2z+1dl4fLc+IT4xvje+Ml4ZtdIzXxhvj2DgzoBEZvxm/G39s/7r95/Zf238/WB8/Wo75xqhc2//8D62gmB8=</latexit>

p(z)

Rec

Rec KL

For Gaussian posteriors, we can use reparameterization to train this model well! Using the same trick as before, we sample noise \epsilon, multiply it with the covariance matrix of the approximate posterior q(z|x) and then add the result to the mean of the approximate posterior. This produces the sample z in a differentiable way!

Reparameterization:

• Noise distribution

Pathwise derivative (=backprop):

<latexit sha1_base64="XJx6yTR/c6jkd3UXTeCqFR3WGEU=">AAAPBHicfZdPb9s2GMbV7l+XrVm7HXcRFhTohiCQEsd/DgVqyTZ6WNssS5qscRBQNC0LoUSCpBy7go77Artu32DHYdd9j32AfY9Rii1LpGRd8obPw8c/vZRs0qM44MKy/n3w8KOPP/n0s0ef73zx5ePdr548/fodJzGD6BwSTNilBzjCQYTORSAwuqQMgdDD6MK7dTP9Yo4YD0h0JpYUXYfAj4JpAIGQQ5f0+dhDlH9/82TPOrDyy9QLe1XsGavr5Obp4//GEwLjEEUCYsD5lW1RcZ0AJgKIUbozjjmiAN4CH13FYtq9ToKIxgJFMDWfSW0aY1MQM2MyJwFDUOClLABkgUww4QwwAIUk36lGcRSBEPH9yTyg/L7kc/++EEDe9nWyyNuSViYmPgN0FsBFhSwBIQ+BmGmDfBl61UEUY8TmYXUwo5SMinOBGAx41oMT2Zi3NOs0PyMnK322pDMU8TSJGU7LE6WAGENTOTEvORIxTfKbkct7y18IFqP9rMzHXgwAuz1Fk32ZUxmo4kwxAaI65IVZcyJ0B0kYgmiSjGmajAVaiGS8f5BK8ZnpYWn2CGCTqvM0TZJx1jPPM0+ltSK+KYlvVHFYEoerDyF4Yk4JM+dy/QnjpjSa0sICiHh19nkxe2qeq9HvSuI7VbwoiReq6MUlNdbUeUmda+pdSb1T1UVJXKjisiQuVfFDSfygipfbYn/ZFvteiZX9ly/Fcmc8QVP57ZE/QsktTJNXZ69/TJNefq2ehRiZdtUIvbXxaNTu9DqpKuO13hp1bWeg64Wh4/Rtt85QOPod1xoMNyyHireAtqz2sN9WoyDe6N2eO9L1DazV7wycGsOGduS2hp1V+xCKFKtf+Nxeu6W1xS9yeo7jHPd0vTA4LdftHtYYCoc7GAz6bo5CY0YxUrx0bWy3jy09ihZBXavd6tfomwWwuo6jwdISizNy7IGdswgEsOIUxcPidvu9oRokNv13+q6rLaAotb/r2oNWjWHDejw4Hh7lJISByFe7Qor2tTvdo64aRYqgUUeuoMZCNp806jlWR2MhJZaR4zpZY6tvonzJruzr5P4rd/PemXu2maaqWb5oqjl799ZmxYtrzLjZXWvf5q+fgBvh9TuFsCke1oTDRhhYxwKb4WEtPNwG7+t+vynerwn3G2H8Oha/Gd6vhfe3wVPdT5viaU04bYShdSy0GZ7WwtNt8EL3i6Z4URMuGmFEHYtohhe18GIbPNH9pCme1ISTRhhSx0Ka4UktPNkCz9Cd3PJlOwX5dCVMi5R7crmbzXWIE6DpXACBcpnghGuyJ2ZIgEz3QsmU/6N7QFw4ZKnp8sCy1mUZYMmjeiYBhySORE5CcTL2gVTSYtczCeS5xURcBGF+EFJuAoTyx3R9k3Inp8b7fLrZTCV+epOMidywA7mHzU4iyc+jVJ9DJ1vnnAzSFR/NDlEAm9khhGAziFb7tMpPCM3CbqHcut+775dygORhjKHX8kPersJ/kIvG/DCQiyb/jvezapsRLNZGWe3Ig6GtHgP14uLwwG4d2PZPrb2X7dUZ8ZHxrfGd8dywjY7x0nhlnBjnBjSw8Zvxu/HH7q+7f+7+tfv3vfXhg9Wcb4zKtfvP/+celxo=</latexit>

p(✏)<latexit sha1_base64="GvKYDiZSGuyEGAgfrnliSPLOFCI=">AAAPN3icfZdNb9s2HMbV7q3L1qzdjrsICwq0QxBIieOXQ4Faso0e1jbr8tI1CgKKphUhlEiQlGNX0GfZ19gX2HU777Lbhl33DUbJsixRknUJw+fh45/+lGz+XYp9Lgzjz3v3P/r4k08/e/D5zhdfPtz96tHjr885iRhEZ5Bgwt65gCPsh+hM+AKjd5QhELgYXbi3dqpfzBHjPglPxZKiqwB4oT/zIRBy6vrRwOEgoBjpz3UHChx7yVPHFTdIgH3dcRHlz3SH+4FOr/Ppp/mCZ9eP9owDI7v0+sDMB3tafp1cP374tzMlMApQKCAGnF+aBhVXMWDChxglO07EEQXwFnjoMhKz/lXshzQSKISJ/kRqswjrgujpbehTnyHJu5QDAJkvE3R4AxiAQt7sTjWKoxAEiO9P5z7lqyGfe6uBALJSV/Eiq2RSWRh7DNAbHy4qZDEIeADETW2SLwO3OokijNg8qE6mlJJRcS4Qgz5Pa3AiC/OGppvDT8lJrt8s6Q0KeRJHDCflhVJAjKGZXJgNORIRjbObkU/ELX8uWIT202E293wE2O1bNN2XOZWJKs4MEyCqU26QFidEd5AEAQinsUOT2BFoIWJn/yCR4hPdxdLsEsCmVefbJI6dtGauq7+V1or4uiS+VsVxSRznH0LwVJ8Rps/l/hPGdWnUpYX5EPHq6rNi9Uw/U6PPS+K5Kl6UxAtVdKOSGtXUeUmd19S7knqnqouSuFDFZUlcquKHkvhBFd9ti/15W+x7JVbWX74Uyx1nimbyCyd7hOJbmMQvT1/9kMSD7MqfhQjpZtUI3bXxaNLtDXqJKuO13pn0TWtU1wtDzxqadpOhcAx7tjEab1gOFW8BbRjd8bCrRkG80fsDe1LXN7DGsDeyGgwb2ondGffy8iEUKlav8NmDbqdWFq/IGViWdTyo64XB6th2/7DBUDjs0Wg0tDMUGjH5Pa546drY7R4b9ShaBPWNbmfYoG82wOhbVg2WllisiWWOzIxFIIAVpygeFrs/HIzVILGpvzW07doGilL5+7Y56jQYNqzHo+PxUUZCGAg9tSqkKF+31z/qq1GkCJr05A7WWMjmkyYDy+jVWEiJZWLZVlrY6psoX7JL8ypefeVu3jt9z9STRDXLF001p+/e2qx4cYMZt7sb7dv8zQtwK3z9TiFsi4cN4bAVBjaxwHZ42AgPt8F7db/XFu81hHutMF4Ti9cO7zXCe9vgad1P2+JpQzhthaFNLLQdnjbC023wou4XbfGiIVy0wogmFtEOLxrhxTZ4UveTtnjSEE5aYUgTC2mHJ43wZAs8Q3fyyJeeFOTTFbNapDyTy9NspkMcg5rOBRAokwmOeU1etR2p7gaSKfun7gFR4ZDDmi4bmbUuhz6WPKpn6nNIolBkJBTHjgekkhSnnqkv+xYdceEHWe+k3ETWFK1vUp7H1HiPzzaHKdlmXccOkQd2IM+waScS/zRJ6mvodOuak1GS89G0iQJYT5sQgnU/zM9plZ8QmobdQnl0X7lXWzlCshlj6JX8kDd5+Pdy05gX+HLT5F9nPx1tM4LF2ihHO7IxNNU2sD64ODwwOwem+WNn78WrvEd8oH2rfac91Uytp73QXmon2pkGtV+037TftT92f939a/ef3X9X1vv38jXfaJVr97//AQAiqw4=</latexit>

z = g(✓,✏) ⇠ p✓(z)

PATHWISE DERIVATIVE

46

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

~

<latexit sha1_base64="BRpqmjUu9VUXHozRBk4PwnIRiEQ=">AAAPBHicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbL8rbGQUDRtCKEEgmScuwKOu4L7Lp9gx2HXfc99gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L51hdfPt3+6tnzr885iRlEZ5Bgwi49wBEOInQmAoHRJWUIhB5GF96dm+kXM8R4QKJTsaDoOgR+FEwDCIQcuqQvxx6i/PubZzvWnpVfpl7Yy2LHWF7HN8+f/jeeEBiHKBIQA86vbIuK6wQwEUCM0q1xzBEF8A746CoW0+51EkQ0FiiCqflCatMYm4KYGZM5CRiCAi9kASALZIIJbwEDUEjyrWoURxEIEd+dzALKH0o+8x8KAeRtXyfzvC1pZWLiM0BvAzivkCUg5CEQt9ogX4RedRDFGLFZWB3MKCWj4pwjBgOe9eBYNuYdzTrNT8nxUr9d0FsU8TSJGU7LE6WAGENTOTEvORIxTfKbkct7x18JFqPdrMzHXg0AuztBk12ZUxmo4kwxAaI65IVZcyJ0D0kYgmiSjGmajAWai2S8u5dK8YXpYWn2CGCTqvMkTZJx1jPPM0+ktSK+LYlvVXFYEofLDyF4Yk4JM2dy/QnjpjSa0sICiHh19lkxe2qeqdHnJfFcFS9K4oUqenFJjTV1VlJnmnpfUu9VdV4S56q4KIkLVfxQEj+o4uWm2F82xb5XYmX/5Uux2BpP0FR+e+SPUHIH0+TN6dGPadLLr+WzECPTrhqhtzIejNqdXidVZbzSW6Ou7Qx0vTB0nL7t1hkKR7/jWoPhmmVf8RbQltUe9ttqFMRrvdtzR7q+hrX6nYFTY1jTjtzWsLNsH0KRYvULn9trt7S2+EVOz3Gcw56uFwan5brd/RpD4XAHg0HfzVFozChGipeujO32oaVH0SKoa7Vb/Rp9vQBW13E0WFpicUaOPbBzFoEAVpyieFjcbr83VIPEuv9O33W1BRSl9ndde9CqMaxZDweHw4OchDAQ+WpXSNG+dqd70FWjSBE06sgV1FjI+pNGPcfqaCykxDJyXCdrbPVNlC/ZlX2dPHzlrt87c8c201Q1yxdNNWfv3sqseHGNGTe7a+2b/PUTcCO8fqcQNsXDmnDYCAPrWGAzPKyFh5vgfd3vN8X7NeF+I4xfx+I3w/u18P4meKr7aVM8rQmnjTC0joU2w9NaeLoJXuh+0RQvasJFI4yoYxHN8KIWXmyCJ7qfNMWTmnDSCEPqWEgzPKmFJxvgGbqXW75spyCfroRpkXJPLnezuQ5xAjSdCyBQLhOccE32xC0SINO9UDLl/+geEBcOWWq6PLCsdFkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZgfhJSbAKH8MV3dpNzJqfE+n643U4mf3iRjIjfsQO5hs5NI8vMo1efQycY5x4N0yUezQxTAZnYIIdgMouU+rfITQrOwOyi37g/uh6UcIHkYY+hIfsi7ZfgPctGYHwZy0eTf8W5WbTKC+cooqy15MLTVY6BeXOzv2a092/6ptfP6aHlGfGJ8a3xnvDRso2O8Nt4Yx8aZAQ1s/Gb8bvyx/ev2n9t/bf/9YH38aDnnG6Nybf/zP+4JlzE=</latexit>

p(✏)<latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="IEo9R4zn4kfotJN/50Rri/5e8ZA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8rbFQUHRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt199vzFV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6DYEfBdMAAiGHrsZQ4MRP3z/fsw6s/DL1wl4Ve8bqOn3/4ul/4wmBcYgiATHg/Ma2qLhNABMBxCjdGcccUQBnwEc3sZh2b5MgorFAEUzNl1KbxtgUxMyQzEnAkORYygJAFsgEE94BBqCQ4DvVKI4iECK+P5kHlD+UfO4/FALIu75NFnlX0srExGeA3gVwUSFLQMhDIO60Qb4MveogijFi87A6mFFKRsW5QAwGPOvBqWzMO5o1mp+T05V+t6R3KOJpEjOclidKATGGpnJiXnIkYprkNyNXd8ZfCRaj/azMx14NAJudocm+zKkMVHGmmABRHfLCrDkRuockDEE0ScY0TcYCLUQy3j9IpfjS9LA0ewSwSdV5libJOOuZ55ln0loR35bEt6o4LInD1YcQPDGnhJlzuf6EcVMaTWlhAUS8OvuimD01L9Toy5J4qYpXJfFKFb24pMaaOi+pc029L6n3qrooiQtVXJbEpSp+KIkfVPF6W+zP22J/UWJl/+VLsdwZT9BUfnnkj1Ayg2ny5vzkhzTp5dfqWYiRaVeN0Fsbj0btTq+TqjJe661R13YGul4YOk7fdusMhaPfca3BcMNyqHgLaMtqD/ttNQrijd7tuSNd38Ba/c7AqTFsaEdua9hZtQ+hSLH6hc/ttVtaW/wip+c4znFP1wuD03Ld7mGNoXC4g8Gg7+YoNGYUI8VL18Z2+9jSo2gR1LXarX6NvlkAq+s4GiwtsTgjxx7YOYtAACtOUTwsbrffG6pBYtN/p++62gKKUvu7rj1o1Rg2rMeD4+FRTkIYiHy1K6RoX7vTPeqqUaQIGnXkCmosZPNJo55jdTQWUmIZOa6TNbb6JsqX7Ma+TR6+cjfvnblnm2mqmuWLppqzd29tVry4xoyb3bX2bf76CbgRXr9TCJviYU04bISBdSywGR7WwsNt8L7u95vi/ZpwvxHGr2Pxm+H9Wnh/GzzV/bQpntaE00YYWsdCm+FpLTzdBi90v2iKFzXhohFG1LGIZnhRCy+2wRPdT5riSU04aYQhdSykGZ7UwpMt8Azdyy1ftlOQT1fCtEi5J5e72VyHOAGazgUQKJcJTrgme+IOCZDpXiiZ8n90D4gLhyw1HVG+1mUZYMmjeiYBhySORE5CcTL2gVTSYtczCeS5xURcBGF+DlJuAoTyx3R9k3Inp8b7fLrZTGXHp2RM5IYdyD1sdhJJfhql+hw62TrndJCu+Gh2iALYzA4hBJtBtNqnVX5CaBY2g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tJrztVG5dv/5H93Cl14=</latexit>g <latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

~

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

✓<latexit sha1_base64="T8FmzHO54kPteUaY1WhvZp8nAKY=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0e1jbL8tbFQUDRtCKEEgmScuwKuu4L7Lp9gx2HXfc59gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt396tnzry84iRlE55Bgwq48wBEOInQuAoHRFWUIhB5Gl969m+mXM8R4QKIzsaDoJgR+FEwDCIQcek9vx564QwLcPtuzDqz8MvXCXhV7xuo6uX3+9L/xhMA4RJGAGHB+bVtU3CSAiQBilO6MY44ogPfAR9exmHZvkiCisUARTM0XUpvG2BTEzKDMScAQFHghCwBZIBNMeAcYgEKi71SjOIpAiPj+ZBZQviz5zF8WAsj7vknmeV/SysTEZ4DeBXBeIUtAyEMg7rRBvgi96iCKMWKzsDqYUUpGxTlHDAY868GJbMw7mrWan5GTlX63oHco4mkSM5yWJ0oBMYamcmJeciRimuQ3I9f3nr8ULEb7WZmPvRwAdn+KJvsypzJQxZliAkR1yAuz5kToAZIwBNEkGdM0GQs0F8l4/yCV4gvTw9LsEcAmVedpmiTjrGeeZ55Ka0V8WxLfquKwJA5XH0LwxJwSZs7k+hPGTWk0pYUFEPHq7PNi9tQ8V6MvSuKFKl6WxEtV9OKSGmvqrKTONPWhpD6o6rwkzlVxURIXqvihJH5Qxattse+3xf6ixMr+y5disTOeoKn8+sgfoeQepsnrszc/pkkvv1bPQoxMu2qE3tp4NGp3ep1UlfFab426tjPQ9cLQcfq2W2coHP2Oaw2GG5ZDxVtAW1Z72G+rURBv9G7PHen6BtbqdwZOjWFDO3Jbw86qfQhFitUvfG6v3dLa4hc5Pcdxjnu6Xhiclut2D2sMhcMdDAZ9N0ehMaMYKV66Nrbbx5YeRYugrtVu9Wv0zQJYXcfRYGmJxRk59sDOWQQCWHGK4mFxu/3eUA0Sm/47fdfVFlCU2t917UGrxrBhPR4cD49yEsJA5KtdIUX72p3uUVeNIkXQqCNXUGMhm08a9Ryro7GQEsvIcZ2ssdU3Ub5k1/ZNsvzK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM/Qgt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44Zq8PG1kuhdKpvwf3QPiwiFLTUeUr3VZBljyqJ5JwCGJI5GTUJyMfSCVtNj1TAJ5bjERF0GYn4SUmwCh/DFd36TcyanxPp9uNlOJn94mYyI37EDuYbOTSPLzKNXn0MnWOSeDdMVHs0MUwGZ2CCHYDKLVPq3yE0KzsHsot+5L93IpB0gexhh6Iz/k3Sr8B7lozA8DuWjy73g/q7YZwXxtlNWOPBja6jFQLy4PD+zWgW3/1Np71V6dEZ8Y3xrfGd8bttExXhmvjRPj3IBGaPxm/G78sfvr7p+7f+3+vbQ+frSa841RuXb/+R+p/5gA</latexit>p✓

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

<latexit sha1_base64="DdJNsEp34u8IRqy4Kx09oRq+3wc=">AAAPaXicfZdPb9s2GMaV7l+XrU26XYbtIiwo0A5BIDWO/xwC1JJt9LC2WZY03SIjoGhaEUyJBEk5dgV9yB33AXbZF9h1lGzLEiVZlzB8Hj7+6aVk83Up9rkwjL/3Hn32+RdffvX46/1vvn3y9ODw2XcfOIkYRNeQYMI+uoAj7IfoWvgCo4+UIRC4GN24MzvVb+aIcZ+EV2JJ0TgAXuhPfQiEnLo7nDkBEPeuGw+Tu5jeOa64RwK8cDgIKEYvk1sHejieJvnMWD/XS2teOC6ivOiEAsee/LvKOtZXhpfju8Mj48TILr06MNeDI219Xdw9e/KPMyEwClAoIAac35oGFeMYMOFDjJJ9J+KIAjgDHrqNxLQ7jv2QRgKFMNGfS20aYV0QPb1xfeIzJMmWcgAg82WCDu8BA1DI8uyXozgKQYD48WTuU74a8rm3GgggazuOF1ntk9LC2GOA3vtwUSKLQcDTglUm+TJwy5MowojNg/JkSikZFecCMejztAYXsjDvabqd/IpcrPX7Jb1HIU/iiOGkuFAKiDE0lQuzIUcionF2M/IZmvFzwSJ0nA6zufMBYLNLNDmWOaWJMs4UEyDKU26QFidED5AEAQgnsUOT2BFoIWLn+CSR4nPdxdLsEsAmZedlEsfrh0y/lNaS+K4gvlPFYUEcrj+E4Ik+JUyfy/0njOvSqEsL8yHi5dXX+eqpfq1GfyiIH1TxpiDeqKIbFdSoos4L6ryiPhTUB1VdFMSFKi4L4lIVPxXET6r4cVfsH7ti/1RiZf3lS7HcdyZoKr+iskconsEkfnP19tck7mXX+lmIkG6WjdDdGE9H7U6vk6gy3uitUde0BlU9N3SsvmnXGXJHv2Mbg+GW5ZXizaENoz3st9UoiLd6t2ePqvoW1uh3BlaNYUs7slvDzrp8CIWK1ct9dq/dqpTFy3N6lmWd9ap6brBatt19VWPIHfZgMOjbGQqNmPwJULx0Y2y3z4xqFM2Duka71a/RtxtgdC2rAksLLNbIMgdmxiIQwIpT5A+L3e33hmqQ2Nbf6tt2ZQNFofxd2xy0agxb1rPB2fA0IyEMhJ5aFZKXr93pnnbVKJIHjTpyByssZPtJo55ldCospMAysmwrLWz5TZQv2a05jldfudv3Tj8y9SRRzfJFU83pu7cxK15cY8bN7lr7Ln/9AtwIX71TCJviYU04bISBdSywGR7WwsNd8F7V7zXFezXhXiOMV8fiNcN7tfDeLnha9dOmeFoTThthaB0LbYantfB0F7yo+kVTvKgJF40woo5FNMOLWnixC55U/aQpntSEk0YYUsdCmuFJLTzZAc/QgzzypSeFtGVglUh5Jpen2UyHOAYVnQsgUCYTHPOKvOo7Ut0NJFP2T9UDotwhhxVdtiwbXQ59LHlUz8TnkEShyEgojh0PSCXJTz0TX/YtOuLCD7JuS7mJrJ/a3KQ8j6nxHp9uD1OyobqLHSIP7ECeYdNOJP59lFTX0MnONReDZM1H0yYKYD1tQgjW/XB9Tiv9hNA0bAbl0X3lXm3lAMlmjKG38kPer8N/kZvGvMCXmyb/OsfpaJcRLDZGOdqXjaGptoHVwc2rE7N1Ypq/tY5ev133iI+1n7SftReaqXW019ob7UK71qD2l/bfnra39/Tfg8ODHw5+XFkf7a3XfK+VroOj/wEJrrqY</latexit>

Ep✓(z)[f(z)] = Ep(✏)[f(g(✓,✏))]

<latexit sha1_base64="nF8Z1UrThKxfAieV76Yt+FTIJg8=">AAAPX3icfVdNb9s2GFa7reuytWm307CLsKBANwSB1Dr+OASoJdvoYWmzrmm6VUFAUbQimBI5kXLsCvp9+w277bLjrtt1r2RF1qd18Us+Dx89fEnKfG1OPSE17c87dz/59LN7n9//Yu/Lrx483H/0+Ot3gkUhJueYURa+t5Eg1AvIufQkJe95SJBvU3JhL8wUv1iSUHgseCvXnFz6yA28uYeRhK6rR8hyuaOeqNY8RDi2Fly1sEvjeZJsGgL5nJKkjOddm4Ytr4lEyaFq/R4hB9qEC6B4vsqfZo0frh4daEda9qjNQM+DAyV/zq4eP/jbchiOfBJITJEQH3SNy8sYhdLD8OI9KxKEI7xALvkQyfnwMvYCHkkS4ER9Atg8oqpkajpd1fFCgiVdQ4Bw6IGCiq8RzEVCUvaqUoIEyCfi0Fl6XGxCsXQ3gUSQ0ct4lWU8qQyM3RDxaw+vKs5i5AsfyetGp1j7drWTRJSES7/amboEjzXmioTYE2kOziAxr3m6iOItO8vx6zW/JoFI4iikSXkgACQMyRwGZqEgMuJxNhnYOQtxIsOIHKZh1ncyQeHiDXEOQafSUbUzpwzJapftp8kJyA1mvo8CJ7Y47BRJVjK2Do8SAJ+oNgWyzVDoVJlvkji20pzZtvoGqBXwVQl8VQenJXCav4RRR52zUF3C+rNQqEBUgRJ6mIjq6PNi9Fw9r0u/K4Hv6uBFCbyog3ZUQqMGuiyhywZ6U0Jv6uiqBK7q4LoEruvgxxL4sQ6+3yX76y7Z32qykH84FOs9yyFz+DBlWyhe4CR++fb0pyQeZU++FyKi6lUitm+Jz2f9wWiQ1GF6i/dmQ92YNPGCMDDGutlGKBjjgalNplsvz2rcwrSm9afjfl0K0y0+HJmzJr41q40HE6OFsHU7M3vTQZ4+QoIa1S145qjfa6TFLXRGhmEcj5p4QTB6pjl81kIoGOZkMhmbmRUehfC9r3H5LbHfP9aaUrwQGmr93rgF3y6ANjSMhlle8mLMDH2iZ14kQbTGlMVmMYfj0bQuJLf5N8am2VhAWUr/0NQnvRbC1uvx5Hj6PHPCQhS49aywIn39wfD5sC7FCqHZAFaw4YVt3zQbGdqg4YWVvMwM00gTWz2JcMg+6Jfx5pO7PXfqga4mSZ0MB61OTs/eLbnGpS1k2s1upe/itw+gneabM8W4Sx63iONOM7jNC+42j1vN413m3Sbf7ZJ3W8TdTjNumxe327zbat7dZZ43+bxLnreI804zvM0L7zbPW83zXeZlky+75GWLuOw0I9u8yG7zstW83GWeNfmsS561iLNOM6zNC+s2z1rNsx3mQ3IDV770ppBWGGFDEu7kcJvNcExj1MCFRJJkMKOxaMB5JQK47YOnrNHkoKhgQNjAoWC5xSH0KPipcxxPYBYFMnPCaWy5CJCkuPU4HtQtKhHS87MaqzaJvHjaTBLuY3V5V8y3l6nYTa5ii8GFHcEdNq1E4l9mSXMMd3aOOZskuT+eFlGIqmkRwqjqBfk9rfIXwlOxBVR7OXuzlBMCxVhITuElr3PxH2HRQtf3YNHg1zpMo11EtLolQrQHhaFeLwObwcWzI713pOs/9w5enOY14n3lO+V75amiKwPlhfJSOVPOFaz8ofyj/Kv89/Cv/Xv7UCVvqHfv5GO+USrP/rf/A0rfuCA=</latexit>

gPD =@f

@z

@z

@✓, ✏ ⇠ p(✏)

In general, the reparameterization trick works as follows: Instead of sampling directly from p_\theta, we sample from some noise distribution p(\epsilon) and transform it using a function g. Importantly, this function should have the property that, after transforming the noise \epsilon, its results have the same distribution as the original distribution p_\theta. So, if z=g(\theta, \epsilon) is the result of the transformation, and \epsilon ~ p(\epsilon), then z ~ p_\theta(z).

After a reparameterization, we can use the pathwise derivative. This is a simple application of the chain rule: First compute the derivative of z with respect to the function f, then of z with respect to the parameters \theta through the function g. Although it sounds fancy, the pathwise derivative, this happens automatically when using backpropagation!

REPARAMETERIZATION IN RL?

47

Environment<latexit sha1_base64="1XD/drTWwzJ56XWvUlFI4PIcB5s=">AAAN2HicfZdNb9s2HMbV7q3L6q1dj7sICwp0QxBIieMXFAVqSTZ6WNssyFsbBwFF04pgSiQoyrErCDsOuw7YdfsW+y77NqNkR5YoyrqY4PPw8U9/icKfLsV+xA3jvwcPP/v8iy+/evT1zjePW99+9+Tp9+cRiRlEZ5Bgwi5dECHsh+iM+xyjS8oQCFyMLtyZnekXc8Qin4SnfEnRdQC80J/6EHAxdUVfjCMOOLoxfrp5smvsG/ml1wfmerCrra/jm6eP/x1PCIwDFHKIQRRdmQbl1wlg3IcYpTvjOEIUwBnw0FXMp73rxA9pzFEIU/250KYx1jnRMyx94jMEOV6KAYDMFwk6vAUMQC7gd6pREQpBgKK9ydyn0WoYzb3VgANx59fJIq9MWlmYeAzQWx8uKmQJCKIA8NvaZLQM3OokijFi86A6mVEKRsm5QAz6UVaDY1GY9zQrdnRKjtf67ZLeojBKk5jhtLxQCIgxNBUL82GEeEyT/GbEE55FrziL0V42zOdeOYDNTtBkT+RUJqo4U0wAr065QVacEN1BEgQgnCRjmiZjjhY8Ge/tp0J8rrtYmF0C2KTqPEmTZJzVzHX1E2GtiO9K4jtZHJbE4fpPCJ7oU8L0uXj+hEW6MOrCwnyIourqs2L1VD+To89L4rksXpTEC1l045Ia19R5SZ3X1LuSeieri5K4kMVlSVzK4qeS+EkWL7fFftgW+1GKFfUXm2K5M56gqfiA5K9QMoNp8ub07S9p0s+v9bsQI92sGqF7bzwcdbr9birL+F5vj3qm5dT1wtC1BqatMhSOQdc2nOGG5UDyFtCG0RkOOnIUxBu917dHdX0Dawy6jqUwbGhHdnvYXZcPoVCyeoXP7nfatbJ4RU7fsqyjfl0vDFbbtnsHCkPhsB3HGdg5Co0ZxUjy0ntjp3Nk1KNoEdQzOu2BQt88AKNnWTVYWmKxRpbpmDkLRwBLTl68LHZv0B/KQXxTf2tg27UHyEvl79mm01YYNqxHztHwMCchDISeXBVSlK/T7R325ChSBI264gnWWMjmn0Z9y+jWWEiJZWTZVlbY6k4Um+zKvE5Wn9zNvtN3TT1NZbPYaLI523v3ZsmLFWbc7Fbat/nVC3AjfP1OIWyKh4pw2AgDVSywGR4q4eE2eK/u95riPUW41wjjqVi8ZnhPCe9tg6d1P22Kp4pw2ghDVSy0GZ4q4ek2eF7386Z4rgjnjTBcxcKb4bkSnm+DJ3U/aYoninDSCENULKQZnijhyRZ4hu5Ey5d1CuLtSlgtUvTkopvNdYgTUNPzA0UuE5xE6arNoNmpAGA966oJ1v1w3XhUvok0WzWDohdduVdsDhKnC4beirblvWiJgegSfxYUzAt8QSF+x3vZaJsRLO6NYrQjTjqmfK6pDy4O9s32vmn+2t59/XJ96Hmk/aD9qL3QTK2rvdbeaMfamQY1ov2l/a390/rQ+q31e+uPlfXhg/WaZ1rlav35Px0uGww=</latexit>

p(s0)

<latexit sha1_base64="v7EdJ+ThHXgJX0gdR//dSzJOt1g=">AAAOEnicfZdNb9s2HMbV7q3Lmi3djjtMWFCgG4JAShy/YChQS7bRw9pmQV66RUFA0bQsmBIJinLsajruE+xj7Loddhx23RfYPs0o2ZElirIu+YfPw8c//SUKpEuxH3HD+PfBw/fe/+DDjx59vPPJ491PP9t78vllRGIG0QUkmLC3LogQ9kN0wX2O0VvKEAhcjK7cmZ3pV3PEIp+E53xJ0U0AvNCf+BBwMXS795VD/VvH5VPEwTMHwHzU+NmJOODo1vjmdm/fODTyS68X5rrY19bX6e2Tx/85YwLjAIUcYhBF16ZB+U0CGPchRumOE0eIAjgDHrqO+aR7k/ghjTkKYao/Fdokxjonegarj32GIMdLUQDIfJGgwylgAlPc0k41KkIhCFB0MJ77NFqV0dxbFRyIftwki7xfaWVi4jFApz5cVMgSEEQB4NPaYLQM3OogijFi86A6mFEKRsm5QAz6UdaDU9GYNzRrdnROTtf6dEmnKIzSJGY4LU8UAmIMTcTEvIwQj2mS34x47rPoOWcxOsjKfOz5ALDZGRofiJzKQBVnggng1SE3yJoTojtIggCE48ShaeJwtOCJc3CYCvGp7mJhdglg46rzLE0SJ+uZ6+pnwloRX5fE17I4LInD9Y8QPNYnhOlz8fwJi3Rh1IWF+RBF1dkXxeyJfiFHX5bES1m8KolXsujGJTWuqfOSOq+pdyX1TlYXJXEhi8uSuJTFdyXxnSy+3Rb747bYn6RY0X+xKJY7zhhNxGclf4WSGUyTl+evvk+TXn6t34UY6WbVCN174/Go3el1UlnG93pr1DWtQV0vDB2rb9oqQ+Hod2xjMNywHEneAtow2sN+W46CeKN3e/aorm9gjX5nYCkMG9qR3Rp21u1DKJSsXuGze+1WrS1ekdOzLOukV9cLg9Wy7e6RwlA47MFg0LdzFBozipHkpffGdvvEqEfRIqhrtFt9hb55AEbXsmqwtMRijSxzYOYsHAEsOXnxstjdfm8oB/FN/62+bdceIC+1v2ubg5bCsGE9GZwMj3MSwkDoyV0hRfvane5xV44iRdCoI55gjYVsfmnUs4xOjYWUWEaWbWWNra5EsciuzZtk9cndrDt939TTVDaLhSabs7V3b5a8WGHGzW6lfZtfPQE3wtfvFMKmeKgIh40wUMUCm+GhEh5ug/fqfq8p3lOEe40wnorFa4b3lPDeNnha99OmeKoIp40wVMVCm+GpEp5ug+d1P2+K54pw3gjDVSy8GZ4r4fk2eFL3k6Z4oggnjTBExUKa4YkSnmyBZ+hObPmynYJ4uxJWi1wdHXId4gTU9PxAkcsEJ1FNXp1AMt0NBFP+z2orQrOTA8B6tvMmWPfD9eak8t2k2cwZFPvVlXvFP0DiBMLQK7G1eSO2zUDsJL8VpMwLfEEq/joHWbXNCBb3RlHtiNOQKZ996sXV0aHZOjTNH1r7L75bH4weaV9qX2vPNFPraC+0l9qpdqFB7RftN+137Y/dX3f/3P1r9++V9eGD9ZwvtMq1+8//ia4zmg==</latexit>

⇡✓(a0|s0)

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0

<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1~

~<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

<latexit sha1_base64="nHYTn8zcrIspPDPFYAY/eEDvzmw=">AAAN73icfZdNb9s2HMbVdi9dVq/petxFWFCgG4xAShy/HArUkm30sLRZkLctDgKKphXBlEhQlGNX03UfYcdh1wG7rt9l32aU7MgSRVkXE3wePv7pT1IgHYq9kBvGf48eP/ns8y++fPrVztfPGt88333x7UVIIgbROSSYsCsHhAh7ATrnHsfoijIEfAejS2dmp/rlHLHQI8EZX1J04wM38KYeBFx03e7q9PWYoXvAJrfmb2MAs16jqY9DDji6NX643d0z9o3s0asNc93Y09bPye2LZ5/GEwIjHwUcYhCG16ZB+U0MGPcgRsnOOAoRBXAGXHQd8Wn3JvYCGnEUwER/JbRphHVO9JRWn3gMQY6XogEg80SCDu8AE5zinXbKUSEKgI/C5mTu0XDVDOfuqsGBKMhNvMgKlpQGxi4D9M6DixJZDPzQB/yu0hkufafciSKM2Nwvd6aUglFyLhCDXpjW4EQU5gNNqx2ekZO1frekdygIkzhiOCkOFAJiDE3FwKwZIh7ROHsZMfGz8A1nEWqmzazvzQCw2SmaNEVOqaOMM8UE8HKX46fFCdA9JL4Pgkk8pkk85mjB43FzPxHiK93BwuwQsWTKztMkjsdpzRxHPxXWkvi+IL6XxWFBHK7/hOCJPiVMn4v5JyzUhVEXFuZBFJZHn+ejp/q5HH1REC9k8bIgXsqiExXUqKLOC+q8ot4X1HtZXRTEhSwuC+JSFj8WxI+yeLUt9pdtsb9KsaL+YlMsd8YTNBXflWwJxTOYxO/Ojn9K4l72rNdChHSzbITOg/Fw1O70Ooks4we9Neqa1qCq54aO1TdtlSF39Du2MRhuWA4kbw5tGO1hvy1HQbzRuz17VNU3sEa/M7AUhg3tyG4NO+vyIRRIVjf32b12q1IWN8/pWZZ11KvqucFq2Xb3QGHIHfZgMOjbGQqNGMVI8tIHY7t9ZFSjaB7UNdqtvkLfTIDRtawKLC2wWCPLHJgZC0cAS06eLxa72+8N5SC+qb/Vt+3KBPJC+bu2OWgpDBvWo8HR8DAjIQwErlwVkpev3ekeduUokgeNOmIGKyxk80+jnmV0KiykwDKybCstbHknik12bd7Eq0/uZt/pe6aeJLJZbDTZnO69B7PkxQozrncr7dv86gG4Fr76phDWxUNFOKyFgSoWWA8PlfBwG7xb9bt18a4i3K2FcVUsbj28q4R3t8HTqp/WxVNFOK2FoSoWWg9PlfB0Gzyv+nldPFeE81oYrmLh9fBcCc+3wZOqn9TFE0U4qYUhKhZSD0+U8GQL/OqWkJ4UxOqKWSVydXfIdIhjUNGzC0UmExyHyeqYQdNbAcB6eqomWPeC9cGj9E2k6agZFGfRlXvFNkDidsHQsTi2fBBHYiBOiT8KCub6nqAQv+Nm2tpmBIsHo2jtiJuOKd9rqo3Lg32ztW+aP7f23h6vLz1Pte+077XXmql1tLfaO+1EO9eg9rv2j/av9qlBG380/mz8tbI+frQe81IrPY2//wceSCPD</latexit>

p(r1|a0, s0)

Reparameterization sounds pretty great! So why didn’t we just use it for Reinforcement Learning?

Page 17: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

REPARAMETERIZATION IN RL?

48

<latexit sha1_base64="1XD/drTWwzJ56XWvUlFI4PIcB5s=">AAAN2HicfZdNb9s2HMbV7q3L6q1dj7sICwp0QxBIieMXFAVqSTZ6WNssyFsbBwFF04pgSiQoyrErCDsOuw7YdfsW+y77NqNkR5YoyrqY4PPw8U9/icKfLsV+xA3jvwcPP/v8iy+/evT1zjePW99+9+Tp9+cRiRlEZ5Bgwi5dECHsh+iM+xyjS8oQCFyMLtyZnekXc8Qin4SnfEnRdQC80J/6EHAxdUVfjCMOOLoxfrp5smvsG/ml1wfmerCrra/jm6eP/x1PCIwDFHKIQRRdmQbl1wlg3IcYpTvjOEIUwBnw0FXMp73rxA9pzFEIU/250KYx1jnRMyx94jMEOV6KAYDMFwk6vAUMQC7gd6pREQpBgKK9ydyn0WoYzb3VgANx59fJIq9MWlmYeAzQWx8uKmQJCKIA8NvaZLQM3OokijFi86A6mVEKRsm5QAz6UVaDY1GY9zQrdnRKjtf67ZLeojBKk5jhtLxQCIgxNBUL82GEeEyT/GbEE55FrziL0V42zOdeOYDNTtBkT+RUJqo4U0wAr065QVacEN1BEgQgnCRjmiZjjhY8Ge/tp0J8rrtYmF0C2KTqPEmTZJzVzHX1E2GtiO9K4jtZHJbE4fpPCJ7oU8L0uXj+hEW6MOrCwnyIourqs2L1VD+To89L4rksXpTEC1l045Ia19R5SZ3X1LuSeieri5K4kMVlSVzK4qeS+EkWL7fFftgW+1GKFfUXm2K5M56gqfiA5K9QMoNp8ub07S9p0s+v9bsQI92sGqF7bzwcdbr9birL+F5vj3qm5dT1wtC1BqatMhSOQdc2nOGG5UDyFtCG0RkOOnIUxBu917dHdX0Dawy6jqUwbGhHdnvYXZcPoVCyeoXP7nfatbJ4RU7fsqyjfl0vDFbbtnsHCkPhsB3HGdg5Co0ZxUjy0ntjp3Nk1KNoEdQzOu2BQt88AKNnWTVYWmKxRpbpmDkLRwBLTl68LHZv0B/KQXxTf2tg27UHyEvl79mm01YYNqxHztHwMCchDISeXBVSlK/T7R325ChSBI264gnWWMjmn0Z9y+jWWEiJZWTZVlbY6k4Um+zKvE5Wn9zNvtN3TT1NZbPYaLI523v3ZsmLFWbc7Fbat/nVC3AjfP1OIWyKh4pw2AgDVSywGR4q4eE2eK/u95riPUW41wjjqVi8ZnhPCe9tg6d1P22Kp4pw2ghDVSy0GZ4q4ek2eF7386Z4rgjnjTBcxcKb4bkSnm+DJ3U/aYoninDSCENULKQZnijhyRZ4hu5Ey5d1CuLtSlgtUvTkopvNdYgTUNPzA0UuE5xE6arNoNmpAGA966oJ1v1w3XhUvok0WzWDohdduVdsDhKnC4beirblvWiJgegSfxYUzAt8QSF+x3vZaJsRLO6NYrQjTjqmfK6pDy4O9s32vmn+2t59/XJ96Hmk/aD9qL3QTK2rvdbeaMfamQY1ov2l/a390/rQ+q31e+uPlfXhg/WaZ1rlav35Px0uGww=</latexit>

p(s0)

<latexit sha1_base64="KIzHNcLYbod1+x/KccCLlzNOPGI=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YiIYnSe+P+2aFxbCwvvVqY6+JQW1/n98/3B8MRgXGAQg4xiKJb06D8LgGM+xCjdG8YR4gCOAUeuo35uH2X+CGNOQphqr8U2jjGOid6BqWPfIYgxwtRAMh8kaDDCWAAcoG+V46KUAgCFB2NZj6NVmU081YFB+K+75L5si9paWLiMUAnPpyXyBIQRAHgk8pgtAjc8iCKMWKzoDyYUQpGyTlHDPpR1oNz0Zj3NGt1dEnO1/pkQScojNIkZjgtThQCYgyNxcRlGSEe02R5M2J9p9ErzmJ0lJXLsVcOYNMLNDoSOaWBMs4YE8DLQ26QNSdED5AEAQhHyZCmyZCjOU+GR8epEF/qLhZmlwA2Kjsv0iQZZj1zXf1CWEviu4L4ThZ7BbG3/hGCR/qYMH0m1p+wSBdGXViYD1FUnj3IZ4/1gRx9VRCvZPG6IF7LohsX1LiizgrqrKI+FNQHWZ0XxLksLgriQhY/FcRPsnizK/bDrtiPUqzov9gUi73hCI3F62P5CCVTmCZvLt/+kSad5bV+FmKkm2UjdDfG036z1Wmlsow3eqPfNi2nqueGltU1bZUhd3RbtuH0tiwnkjeHNoxmr9uUoyDe6u2O3a/qW1ij23IshWFL27cbvda6fQiFktXLfXan2ai0xctzOpZlnXWqem6wGrbdPlEYcoftOE7XXqLQmFGMJC/dGJvNM6MaRfOgttFsdBX6dgGMtmVVYGmBxepbpmMuWTgCWHLy/GGx291OTw7i2/5bXduuLCAvtL9tm05DYdiynjlnvdMlCWEg9OSukLx9zVb7tC1HkTyo3xIrWGEh21/qdyyjVWEhBZa+ZVtZY8s7UWyyW/MuWb1yt/tOPzT1NJXNYqPJ5mzvbcySFyvMuN6ttO/yqyfgWvjqnUJYFw8V4bAWBqpYYD08VMLDXfBe1e/VxXuKcK8WxlOxePXwnhLe2wVPq35aF08V4bQWhqpYaD08VcLTXfC86ud18VwRzmthuIqF18NzJTzfBU+qflIXTxThpBaGqFhIPTxRwpMyvPjzyE7tAOvZqZdg3Q/XB4PSO4tmx4cpFGfFlXt14w4Sp3+G3opjxXtxZAXiFPdrMgTMC/wwFV8D3vAoq3YZwXxjFJX4EDHlz45qcX1ybDaOTfPPxuHr39ffJE+177UftV80U2tpr7U32rk20KAWaH9pf2v/7P938MPBTwc/r6yPH63nvNBK18Fv/wN8r/HQ</latexit>s0

<latexit sha1_base64="M8sqRzeDFh3Mi1iBGtqcTbyBm/Y=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4YQ4gSk98b9s0Pj2FheerUw18Whtr7O75/vD4YjAuMAhRxiEEW3pkH5XQIY9yFG6d4wjhAFcAo8dBvzcfsu8UMacxTCVH8ptHGMdU70DEof+QxBjheiAJD5IkGHE8AA5AJ9rxwVoRAEKDoazXwarcpo5q0KDsR93yXzZV/S0sTEY4BOfDgvkSUgiALAJ5XBaBG45UEUY8RmQXkwoxSMknOOGPSjrAfnojHvadbq6JKcr/XJgk5QGKVJzHBanCgExBgai4nLMkI8psnyZsT6TqNXnMXoKCuXY68cwKYXaHQkckoDZZwxJoCXh9wga06IHiAJAhCOkiFNkyFHc54Mj45TIb7UXSzMLgFsVHZepEkyzHrmuvqFsJbEdwXxnSz2CmJv/SMEj/QxYfpMrD9hkS6MurAwH6KoPHuQzx7rAzn6qiBeyeJ1QbyWRTcuqHFFnRXUWUV9KKgPsjoviHNZXBTEhSx+KoifZPFmV+yHXbEfpVjRf7EpFnvDERqL18fyEUqmME3eXL79I006y2v9LMRIN8tG6G6Mp/1mq9NKZRlv9Ea/bVpOVc8NLatr2ipD7ui2bMPpbVlOJG8ObRjNXrcpR0G81dsdu1/Vt7BGt+VYCsOWtm83eq11+xAKJauX++xOs1Fpi5fndCzLOutU9dxgNWy7faIw5A7bcZyuvUShMaMYSV66MTabZ0Y1iuZBbaPZ6Cr07QIYbcuqwNICi9W3TMdcsnAEsOTk+cNit7udnhzEt/23urZdWUBeaH/bNp2GwrBlPXPOeqdLEsJA6MldIXn7mq32aVuOInlQvyVWsMJCtr/U71hGq8JCCix9y7ayxpZ3othkt+ZdsnrlbvedfmjqaSqbxUaTzdne25glL1aYcb1bad/lV0/AtfDVO4WwLh4qwmEtDFSxwHp4qISHu+C9qt+ri/cU4V4tjKdi8erhPSW8twueVv20Lp4qwmktDFWx0Hp4qoSnu+B51c/r4rkinNfCcBULr4fnSni+C55U/aQunijCSS0MUbGQeniihCdlePHnkZ3aAdazUy/Buh+uDwaldxbNjg9TKM6KK/fqxh0kTv8MvRXHivfiyArEKe7XZAiYF/hhKr4GvOFRVu0ygvnGKCrxIWLKnx3V4vrk2Gwcm+afjcPXv6+/SZ5q32s/ar9optbSXmtvtHNtoEEt0P7S/tb+2f/v4IeDnw5+XlkfP1rPeaGVroPf/gf1BfGy</latexit>a0

~<latexit sha1_base64="9xsJmqaI6QVO7aNPEsbsg9BoZ44=">AAANdXicfZdbb9s2GIbVdocuS7Z2vdwuhGUdhiHIpMTxAUOBWpKNXqxtFsRJ2jgIKJqWBVMiQVGOXUG/YrfbD9sf2fUoH2SJoqyrD3xfvn70UZQpl2I/4obx76PHTz77/Isvn3619/X+wTffPnv+3VVEYgbRABJM2I0LIoT9EA24zzG6oQyBwMXo2p3amX49QyzySXjJFxTdBcAL/bEPARdDH4bQwwlL7837Z4fGsbG89GphrotDbX2d3z/fHwxHBMYBCjnEIIpuTYPyuwQw7kOM0r1hHCEK4BR46Dbm4/Zd4oc05iiEqf5SaOMY65zoGZQ+8hmCHC9EASDzRYIOJ4AByAX6XjkqQiEIUHQ0mvk0WpXRzFsVHIj7vkvmy76kpYmJxwCd+HBeIktAEAWATyqD0SJwy4MoxojNgvJgRikYJeccMehHWQ/ORWPe06zV0SU5X+uTBZ2gMEqTmOG0OFEIiDE0FhOXZYR4TJPlzYj1nUavOIvRUVYux145gE0v0OhI5JQGyjhjTAAvD7lB1pwQPUASBCAcJUOaJkOO5jwZHh2nQnypu1iYXQLYqOy8SJNkmPXMdfULYS2J7wriO1nsFcTe+kcIHuljwvSZWH/CIl0YdWFhPkRRefYgnz3WB3L0VUG8ksXrgngti25cUOOKOiuos4r6UFAfZHVeEOeyuCiIC1n8VBA/yeLNrtgPu2I/SrGi/2JTLPaGIzQWr4/lI5RMYZq8uXz7R5p0ltf6WYiRbpaN0N0YT/vNVqeVyjLe6I1+27Scqp4bWlbXtFWG3NFt2YbT27KcSN4c2jCavW5TjoJ4q7c7dr+qb2GNbsuxFIYtbd9u9Frr9iEUSlYv99mdZqPSFi/P6ViWddap6rnBath2+0RhyB224zhde4lCY0Yxkrx0Y2w2z4xqFM2D2kaz0VXo2wUw2pZVgaUFFqtvmY65ZOEIYMnJ84fFbnc7PTmIb/tvdW27soC80P62bToNhWHLeuac9U6XJISB0JO7QvL2NVvt07YcRfKgfkusYIWFbH+p37GMVoWFFFj6lm1ljS3vRLHJbs27ZPXK3e47/dDU01Q2i40mm7O9tzFLXqww43q30r7Lr56Aa+GrdwphXTxUhMNaGKhigfXwUAkPd8F7Vb9XF+8pwr1aGE/F4tXDe0p4bxc8rfppXTxVhNNaGKpiofXwVAlPd8Hzqp/XxXNFOK+F4SoWXg/PlfB8Fzyp+kldPFGEk1oYomIh9fBECU/K8OLPIzu1A6xnp16CdT9cHwxK7yyaHR+mUJwVV+7VjTtInP4ZeiuOFe/FkRWIU9yvyRAwL/DDVHwNeMOjrNplBPONUVTiQ8SUPzuqxfXJsdk4Ns0/G4evf19/kzzVvtd+1H7RTK2lvdbeaOfaQINaoP2l/a39s//fwQ8HPx38vLI+frSe80IrXQe//Q8UNPHI</latexit>r1~

<latexit sha1_base64="nHYTn8zcrIspPDPFYAY/eEDvzmw=">AAAN73icfZdNb9s2HMbVdi9dVq/petxFWFCgG4xAShy/HArUkm30sLRZkLctDgKKphXBlEhQlGNX03UfYcdh1wG7rt9l32aU7MgSRVkXE3wePv7pT1IgHYq9kBvGf48eP/ns8y++fPrVztfPGt88333x7UVIIgbROSSYsCsHhAh7ATrnHsfoijIEfAejS2dmp/rlHLHQI8EZX1J04wM38KYeBFx03e7q9PWYoXvAJrfmb2MAs16jqY9DDji6NX643d0z9o3s0asNc93Y09bPye2LZ5/GEwIjHwUcYhCG16ZB+U0MGPcgRsnOOAoRBXAGXHQd8Wn3JvYCGnEUwER/JbRphHVO9JRWn3gMQY6XogEg80SCDu8AE5zinXbKUSEKgI/C5mTu0XDVDOfuqsGBKMhNvMgKlpQGxi4D9M6DixJZDPzQB/yu0hkufafciSKM2Nwvd6aUglFyLhCDXpjW4EQU5gNNqx2ekZO1frekdygIkzhiOCkOFAJiDE3FwKwZIh7ROHsZMfGz8A1nEWqmzazvzQCw2SmaNEVOqaOMM8UE8HKX46fFCdA9JL4Pgkk8pkk85mjB43FzPxHiK93BwuwQsWTKztMkjsdpzRxHPxXWkvi+IL6XxWFBHK7/hOCJPiVMn4v5JyzUhVEXFuZBFJZHn+ejp/q5HH1REC9k8bIgXsqiExXUqKLOC+q8ot4X1HtZXRTEhSwuC+JSFj8WxI+yeLUt9pdtsb9KsaL+YlMsd8YTNBXflWwJxTOYxO/Ojn9K4l72rNdChHSzbITOg/Fw1O70Ooks4we9Neqa1qCq54aO1TdtlSF39Du2MRhuWA4kbw5tGO1hvy1HQbzRuz17VNU3sEa/M7AUhg3tyG4NO+vyIRRIVjf32b12q1IWN8/pWZZ11KvqucFq2Xb3QGHIHfZgMOjbGQqNGMVI8tIHY7t9ZFSjaB7UNdqtvkLfTIDRtawKLC2wWCPLHJgZC0cAS06eLxa72+8N5SC+qb/Vt+3KBPJC+bu2OWgpDBvWo8HR8DAjIQwErlwVkpev3ekeduUokgeNOmIGKyxk80+jnmV0KiykwDKybCstbHknik12bd7Eq0/uZt/pe6aeJLJZbDTZnO69B7PkxQozrncr7dv86gG4Fr76phDWxUNFOKyFgSoWWA8PlfBwG7xb9bt18a4i3K2FcVUsbj28q4R3t8HTqp/WxVNFOK2FoSoWWg9PlfB0Gzyv+nldPFeE81oYrmLh9fBcCc+3wZOqn9TFE0U4qYUhKhZSD0+U8GQL/OqWkJ4UxOqKWSVydXfIdIhjUNGzC0UmExyHyeqYQdNbAcB6eqomWPeC9cGj9E2k6agZFGfRlXvFNkDidsHQsTi2fBBHYiBOiT8KCub6nqAQv+Nm2tpmBIsHo2jtiJuOKd9rqo3Lg32ztW+aP7f23h6vLz1Pte+077XXmql1tLfaO+1EO9eg9rv2j/av9qlBG380/mz8tbI+frQe81IrPY2//wceSCPD</latexit>

p(r1|a0, s0)

~

<latexit sha1_base64="BRpqmjUu9VUXHozRBk4PwnIRiEQ=">AAAPBHicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbL8rbGQUDRtCKEEgmScuwKOu4L7Lp9gx2HXfc99gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L51hdfPt3+6tnzr885iRlEZ5Bgwi49wBEOInQmAoHRJWUIhB5GF96dm+kXM8R4QKJTsaDoOgR+FEwDCIQcuqQvxx6i/PubZzvWnpVfpl7Yy2LHWF7HN8+f/jeeEBiHKBIQA86vbIuK6wQwEUCM0q1xzBEF8A746CoW0+51EkQ0FiiCqflCatMYm4KYGZM5CRiCAi9kASALZIIJbwEDUEjyrWoURxEIEd+dzALKH0o+8x8KAeRtXyfzvC1pZWLiM0BvAzivkCUg5CEQt9ogX4RedRDFGLFZWB3MKCWj4pwjBgOe9eBYNuYdzTrNT8nxUr9d0FsU8TSJGU7LE6WAGENTOTEvORIxTfKbkct7x18JFqPdrMzHXg0AuztBk12ZUxmo4kwxAaI65IVZcyJ0D0kYgmiSjGmajAWai2S8u5dK8YXpYWn2CGCTqvMkTZJx1jPPM0+ktSK+LYlvVXFYEofLDyF4Yk4JM2dy/QnjpjSa0sICiHh19lkxe2qeqdHnJfFcFS9K4oUqenFJjTV1VlJnmnpfUu9VdV4S56q4KIkLVfxQEj+o4uWm2F82xb5XYmX/5Uux2BpP0FR+e+SPUHIH0+TN6dGPadLLr+WzECPTrhqhtzIejNqdXidVZbzSW6Ou7Qx0vTB0nL7t1hkKR7/jWoPhmmVf8RbQltUe9ttqFMRrvdtzR7q+hrX6nYFTY1jTjtzWsLNsH0KRYvULn9trt7S2+EVOz3Gcw56uFwan5brd/RpD4XAHg0HfzVFozChGipeujO32oaVH0SKoa7Vb/Rp9vQBW13E0WFpicUaOPbBzFoEAVpyieFjcbr83VIPEuv9O33W1BRSl9ndde9CqMaxZDweHw4OchDAQ+WpXSNG+dqd70FWjSBE06sgV1FjI+pNGPcfqaCykxDJyXCdrbPVNlC/ZlX2dPHzlrt87c8c201Q1yxdNNWfv3sqseHGNGTe7a+2b/PUTcCO8fqcQNsXDmnDYCAPrWGAzPKyFh5vgfd3vN8X7NeF+I4xfx+I3w/u18P4meKr7aVM8rQmnjTC0joU2w9NaeLoJXuh+0RQvasJFI4yoYxHN8KIWXmyCJ7qfNMWTmnDSCEPqWEgzPKmFJxvgGbqXW75spyCfroRpkXJPLnezuQ5xAjSdCyBQLhOccE32xC0SINO9UDLl/+geEBcOWWq6PLCsdFkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZgfhJSbAKH8MV3dpNzJqfE+n643U4mf3iRjIjfsQO5hs5NI8vMo1efQycY5x4N0yUezQxTAZnYIIdgMouU+rfITQrOwOyi37g/uh6UcIHkYY+hIfsi7ZfgPctGYHwZy0eTf8W5WbTKC+cooqy15MLTVY6BeXOzv2a092/6ptfP6aHlGfGJ8a3xnvDRso2O8Nt4Yx8aZAQ1s/Gb8bvyx/ev2n9t/bf/9YH38aDnnG6Nybf/zP+4JlzE=</latexit>

p(✏)<latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

Still no differentiable path from to !

<latexit sha1_base64="ch1NQVmILmHb69zi5BmIgLmlnNE=">AAAPBXicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0eljbL8tIuDgKKphUhlEiQlGNX0HVfYNftG+w47LrPsQ+w7zFKtmWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unuV8+ef33JScwguoAEE/bOAxzhIEIXIhAYvaMMgdDD6Mq7dzP9aoYYD0h0LhYU3YTAj4JpAIGQQ+/HDD0ANrm1b5/tWQdWfpl6Ya+KPWN1nd4+f/rfeEJgHKJIQAw4v7YtKm4SwEQAMUp3xjFHFMB74KPrWEy7N0kQ0VigCKbmC6lNY2wKYmZQ5iRgCAq8kAWALJAJJrwDDEAh0XeqURxFIER8fzILKF+WfOYvCwHkfd8k87wvaWVi4jNA7wI4r5AlIOQhEHfaIF+EXnUQxRixWVgdzCglo+KcIwYDnvXgVDbmLc1azc/J6Uq/W9A7FPE0iRlOyxOlgBhDUzkxLzkSMU3ym5Hre89fChaj/azMx14OALs/Q5N9mVMZqOJMMQGiOuSFWXMi9ABJGIJokoxpmowFmotkvH+QSvGF6WFp9oh8OqrOszRJxlnPPM88k9aK+KYkvlHFYUkcrj6E4Ik5JcycyfUnjJvSaEoLCyDi1dkXxeypeaFGX5bES1W8KolXqujFJTXW1FlJnWnqQ0l9UNV5SZyr4qIkLlTxQ0n8oIrvtsW+3xb7ixIr+y9fisXOeIKm8usjf4SSe5gmr89PfkyTXn6tnoUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfsQihSrX/jcXrultcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmNGMVK8dG1st48tPYoWQV2r3erX6JsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJDb9d/quqy2gKLW/69qDVo1hw3o8OB4e5SSEgchXu0KK9rU73aOuGkWKoFFHrqDGQjafNOo5VkdjISWWkeM6WWOrb6J8ya7tm2T5lbt578w920xT1SxfNNWcvXtrs+LFNWbc7K61b/PXT8CN8PqdQtgUD2vCYSMMrGOBzfCwFh5ug/d1v98U79eE+40wfh2L3wzv18L72+Cp7qdN8bQmnDbC0DoW2gxPa+HpNnih+0VTvKgJF40woo5FNMOLWnixDZ7oftIUT2rCSSMMqWMhzfCkFp5sgV8eCLKdgny6EqZFyj253M3mOsQJ0HQugEC5THDCNdkTd0iATPdCyZT/o3tAXDhkqemI8rUuywBLHtUzCTgkcSRyEoqTsQ+kkha7nkkgzy0m4iII85OQchMglD+m65uU+zE13ufTzWYq8dPbZEzkhh3IPWx2Ekl+HqX6HDrZOud0kK74aHaIAtjMDiEEm0G02qdVfkJoFnYP5dZ96V4u5QDJwxhDJ/JD3q7Cf5CLxvwwkIsm/473s2qbEczXRlntyIOhrR4D9eLq8MBuHdj2T629VyerM+IT41vjO+N7wzY6xivjtXFqXBjQCI3fjN+NP3Z/3f1z96/dv5fWx49Wc74xKtfuP/8D2hCX5w==</latexit>r1<latexit sha1_base64="UxFu6bt6MHIcMG5kg7mdvIpbOf8=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0eljbL8tKtDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKtmWJlKxL/uHz8PFPf0o26VEccGFZ/z54+NHHn3z62aPP97748vH+k6fPvrriJGYQXUKCCXvnAY5wEKFLEQiM3lGGQOhhdO3N3Ey/niPGAxJdiCVFNyHwo2AaQCDk0PXYE3dIgNunB9aRlV+mXtjr4sBYX2e3zx7/N54QGIcoEhADzt/bFhU3CWAigBile+OYIwrgDPjofSym3ZskiGgsUART87nUpjE2BTEzJHMSMAQFXsoCQBbIBBPeAQagkOB71SiOIhAifjiZB5SvSj73V4UA8q5vkkXelbQyMfEZoHcBXFTIEhDyEIg7bZAvQ686iGKM2DysDmaUklFxLhCDAc96cCYb85ZmjeYX5Gyt3y3pHYp4msQMp+WJUkCMoamcmJcciZgm+c3I1Z3xl4LF6DAr87GXA8Bm52hyKHMqA1WcKSZAVIe8MGtOhO4hCUMQTZIxTZOxQAuRjA+PUik+Nz0szR4BbFJ1nqdJMs565nnmubRWxDcl8Y0qDkvicP0hBE/MKWHmXK4/YdyURlNaWAARr86+LGZPzUs1+qokXqnidUm8VkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8tyv2512xvyixsv/ypVjujSdoKr888kcomcE0eX1x+kOa9PJr/SzEyLSrRuhtjMejdqfXSVUZb/TWqGs7A10vDB2nb7t1hsLR77jWYLhleaF4C2jLag/7bTUK4q3e7bkjXd/CWv3OwKkxbGlHbmvYWbcPoUix+oXP7bVbWlv8IqfnOM5JT9cLg9Ny3e6LGkPhcAeDQd/NUWjMKEaKl26M7faJpUfRIqhrtVv9Gn27AFbXcTRYWmJxRo49sHMWgQBWnKJ4WNxuvzdUg8S2/07fdbUFFKX2d1170KoxbFlPBifD45yEMBD5aldI0b52p3vcVaNIETTqyBXUWMj2k0Y9x+poLKTEMnJcJ2ts9U2UL9l7+yZZfeVu3zvzwDbTVDXLF001Z+/exqx4cY0ZN7tr7bv89RNwI7x+pxA2xcOacNgIA+tYYDM8rIWHu+B93e83xfs14X4jjF/H4jfD+7Xw/i54qvtpUzytCaeNMLSOhTbD01p4ugte6H7RFC9qwkUjjKhjEc3wohZe7IInup80xZOacNIIQ+pYSDM8qYUnO+AZupdbvmynIJ+uhGmRck8ud7O5DnECNJ0LIFAuE5xwTV6dNjLdCyVT/o/uAXHhkKWmI8o3uiwDLHlUzyTgkMSRyEkoTsY+kEpa7HomgTy3mIiLIMzPQcpNgFD+mG5uUu7H1HifT7ebqcRPb5MxkRt2IPew2Ukk+WmU6nPoZOecs0G65qPZIQpgMzuEEGwG0XqfVvkJoVnYDMqt+8q9WsoBkocxhk7lh7xdh38vF435YSAXTf4dH2bVLiNYbIyy2pMHQ1s9BurF9Ysju3Vk2z+2Dl6drs+Ij4xvjG+N7wzb6BivjNfGmXFpQGNm/Gb8bvyx/+v+n/t/7f+9sj58sJ7ztVG59v/5H3tQlzY=</latexit>

✓<latexit sha1_base64="IEo9R4zn4kfotJN/50Rri/5e8ZA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8rbFQUHRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt199vzFV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6DYEfBdMAAiGHrsZQ4MRP3z/fsw6s/DL1wl4Ve8bqOn3/4ul/4wmBcYgiATHg/Ma2qLhNABMBxCjdGcccUQBnwEc3sZh2b5MgorFAEUzNl1KbxtgUxMyQzEnAkORYygJAFsgEE94BBqCQ4DvVKI4iECK+P5kHlD+UfO4/FALIu75NFnlX0srExGeA3gVwUSFLQMhDIO60Qb4MveogijFi87A6mFFKRsW5QAwGPOvBqWzMO5o1mp+T05V+t6R3KOJpEjOclidKATGGpnJiXnIkYprkNyNXd8ZfCRaj/azMx14NAJudocm+zKkMVHGmmABRHfLCrDkRuockDEE0ScY0TcYCLUQy3j9IpfjS9LA0ewSwSdV5libJOOuZ55ln0loR35bEt6o4LInD1YcQPDGnhJlzuf6EcVMaTWlhAUS8OvuimD01L9Toy5J4qYpXJfFKFb24pMaaOi+pc029L6n3qrooiQtVXJbEpSp+KIkfVPF6W+zP22J/UWJl/+VLsdwZT9BUfnnkj1Ayg2ny5vzkhzTp5dfqWYiRaVeN0Fsbj0btTq+TqjJe661R13YGul4YOk7fdusMhaPfca3BcMNyqHgLaMtqD/ttNQrijd7tuSNd38Ba/c7AqTFsaEdua9hZtQ+hSLH6hc/ttVtaW/wip+c4znFP1wuD03Ld7mGNoXC4g8Gg7+YoNGYUI8VL18Z2+9jSo2gR1LXarX6NvlkAq+s4GiwtsTgjxx7YOYtAACtOUTwsbrffG6pBYtN/p++62gKKUvu7rj1o1Rg2rMeD4+FRTkIYiHy1K6RoX7vTPeqqUaQIGnXkCmosZPNJo55jdTQWUmIZOa6TNbb6JsqX7Ma+TR6+cjfvnblnm2mqmuWLppqzd29tVry4xoyb3bX2bf76CbgRXr9TCJviYU04bISBdSywGR7WwsNt8L7u95vi/ZpwvxHGr2Pxm+H9Wnh/GzzV/bQpntaE00YYWsdCm+FpLTzdBi90v2iKFzXhohFG1LGIZnhRCy+2wRPdT5riSU04aYQhdSykGZ7UwpMt8Azdyy1ftlOQT1fCtEi5J5e72VyHOAGazgUQKJcJTrgme+IOCZDpXiiZ8n90D4gLhyw1HVG+1mUZYMmjeiYBhySORE5CcTL2gVTSYtczCeS5xURcBGF+DlJuAoTyx3R9k3Inp8b7fLrZTGXHp2RM5IYdyD1sdhJJfhql+hw62TrndJCu+Gh2iALYzA4hBJtBtNqnVX5CaBY2g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tJrztVG5dv/5H93Cl14=</latexit>g

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

Unfortunately, reparamterization doesn’t work in Reinforcement Learning. Although we can transform the action sampling step to make it differentiable, this doesn’t solve the non-differentiability of the environment! As you can see, there is still no differentiable path from the parameters to the reward.

Pathwise derivative has low variance :) • Uses extra info:

Requires: • Differentiable function :( • Appropriate continuous distribution :(

No reparameterization for discrete distributions… But continuous relaxations exist!

<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f<latexit sha1_base64="l3K4jq2dkmrGyf0anK8516tlUYk=">AAAPEHicfZfbbts2HMbV7tRla5Z2l7sRFhTohiCQGseHiwK1ZBu9WNosy6FbHQQUTStCKJEgKceuoIfYC+x2e4NdDrvdG+wB9h6jZFuWSMm6CcPv46ef/hRl0qM44MKy/n3w8KOPP/n0s0ef73zx5ePdr/aePL3kJGYQXUCCCXvnAY5wEKELEQiM3lGGQOhhdOXduZl+NUOMByQ6FwuKrkPgR8E0gEDIrpu9p/Rm7IlbJMDzMQchxei7m71969DKL1Nv2KvGvrG6Tm+ePP5vPCEwDlEkIAacv7ctKq4TwEQAMUp3xjFHFMA74KP3sZh2r5MgorFAEUzNZ1KbxtgUxMz4zEnAEBR4IRsAskAmmPAWMACFfIqdahRHEQgRP5jMAsqXTT7zlw0BZAmuk3leorQyMPEZoLcBnFfIEhDyEIhbrZMvQq/aiWKM2CysdmaUklFxzhGDAc9qcCoL85ZmVefn5HSl3y7oLYp4msQMp+WBUkCMoakcmDc5EjFN8oeRU33HXwoWo4Osmfe9HAB2d4YmBzKn0lHFmWICRLXLC7PiROgekjAE0SQZ0zQZCzQXyfjgMJXiM9PD0uwRwCZV51maJOOsZp5nnklrRXxTEt+o4rAkDlc3IXhiTgkzZ3L+CeOmNJrSwgKIeHX0RTF6al6o0Zcl8VIVr0rilSp6cUmNNXVWUmeael9S71V1XhLnqrgoiQtV/FASP6jiu22xP2+L/UWJlfWXi2KxM56gqfyS5K9QcgfT5PX5yQ9p0suv1bsQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5YXiLaAtqz3st9UoiDd6t+eOdH0Da/U7A6fGsKEdua1hZ1U+hCLF6hc+t9duaWXxi5ye4zjHPV0vDE7LdbsvagyFwx0MBn03R6Exk99xxUvXxnb72NKjaBHUtdqtfo2+mQCr6zgaLC2xOCPHHtg5i0AAK05RvCxut98bqkFiU3+n77raBIpS+buuPWjVGDasx4Pj4VFOQhiIfLUqpChfu9M96qpRpAgadeQMaixkc6dRz7E6GgspsYwc18kKW12JcpG9t6+T5Sd3s+7MfdtMU9UsF5pqztbe2qx4cY0ZN7tr7dv89QNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynIN+uhGmRck8ud7O5DnECNJ0LIFAuE5xwTV4eOzLdCyVT/o/uAXHhkE1NR5SvddkMsORRPZOAQxJHIiehOBn7QCppseuZBPLcYiIugjA/FCkPkR+K1g8p92NqvM+nm81U4qc3yZjIDTuQe9jsJJL8NEr1MXSydczpIF3x0ewQBbCZHUIINoNotU+r/ITQLOwOyq370r2cygGShzGGTuRN3q7Cv5eTxvwwkJMm/44PstY2I5ivjbK1Iw+GtnoM1BtXLw7t1qFt/9jaf3WyOiM+Mr4xvjWeG7bRMV4Zr41T48KAxtz4zfjd+GP3190/d//a/XtpffhgNeZro3Lt/vM/LYGb0w==</latexit>

p✓(z)

PATHWISE DERIVATIVE

49

<latexit sha1_base64="JYRotR/AW1We/S2YX7ncehvQmNA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIjeOXQ4Faso0e1jbL8rbVQUDRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUbJtiyRknXJP3wePv7pT8kmPYoDLizr3wcPP/r4k08/e/T53hdfPt5/8vTZV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6CYEfBdMAAiGHrsaeuEMC3D49sI6s/DL1wl4XB8b6Or199vi/8YTAOESRgBhw/t62qLhJABMBxCjdG8ccUQBnwEfvYzHt3iRBRGOBIpiaz6U2jbEpiJkhmZOAISjwUhYAskAmmPAOMACFBN+rRnEUgRDxw8k8oHxV8rm/KgSQd32TLPKupJWJic8AvQvgokKWgJCHQNxpg3wZetVBFGPE5mF1MKOUjIpzgRgMeNaDU9mYdzRrND8np2v9bknvUMTTJGY4LU+UAmIMTeXEvORIxDTJb0au7oy/FCxGh1mZj70cADY7Q5NDmVMZqOJMMQGiOuSFWXMidA9JGIJokoxpmowFWohkfHiUSvG56WFp9ghgk6rzLE2ScdYzzzPPpLUivi2Jb1VxWBKH6w8heGJOCTPncv0J46Y0mtLCAoh4dfZFMXtqXqjRlyXxUhWvSuKVKnpxSY01dV5S55p6X1LvVXVREhequCyJS1X8UBI/qOL1rtifd8X+osTK/suXYrk3nqCp/PLIH6FkBtPk9fmbH9Kkl1/rZyFGpl01Qm9jPB61O71Oqsp4o7dGXdsZ6Hph6Dh9260zFI5+x7UGwy3LC8VbQFtWe9hvq1EQb/Vuzx3p+hbW6ncGTo1hSztyW8POun0IRYrVL3xur93S2uIXOT3HcU56ul4YnJbrdl/UGAqHOxgM+m6OQmNGMVK8dGNst08sPYoWQV2r3erX6NsFsLqOo8HSEoszcuyBnbMIBLDiFMXD4nb7vaEaJLb9d/quqy2gKLW/69qDVo1hy3oyOBke5ySEgchXu0KK9rU73eOuGkWKoFFHrqDGQrafNOo5VkdjISWWkeM6WWOrb6J8yd7bN8nqK3f73pkHtpmmqlm+aKo5e/c2ZsWLa8y42V1r3+Wvn4Ab4fU7hbApHtaEw0YYWMcCm+FhLTzcBe/rfr8p3q8J9xth/DoWvxner4X3d8FT3U+b4mlNOG2EoXUstBme1sLTXfBC94umeFETLhphRB2LaIYXtfBiFzzR/aQpntSEk0YYUsdCmuFJLTzZAc/QvdzyZTsF+XQlTIuUe3K5m811iBOg6VwAgXKZ4IRr8uq0keleKJnyf3QPiAuHLDUdUb7RZRlgyaN6JgGHJI5ETkJxMvaBVNJi1zMJ5LnFRFwEYX4OUm4ChPLHdHOTcienxvt8ut1MJX56m4yJ3LADuYfNTiLJT6NUn0MnO+ecDtI1H80OUQCb2SGEYDOI1vu0yk8IzcJmUG7dV+7VUg6QPIwx9EZ+yLt1+Pdy0ZgfBnLR5N/xYVbtMoLFxiirPXkwtNVjoF5cvTiyW0e2/WPr4FV7fUZ8ZHxjfGt8Z9hGx3hlvDZOjQsDGjPjN+N344/9X/f/3P9r/++V9eGD9Zyvjcq1/8//cW+XHQ==</latexit>

~

<latexit sha1_base64="BRpqmjUu9VUXHozRBk4PwnIRiEQ=">AAAPBHicfZdNb9s2HMbV7q3L1qzdjrsICwp0QxBIieOXQ4Faso0eljbL8rbGQUDRtCKEEgmScuwKOu4L7Lp9gx2HXfc99gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L51hdfPt3+6tnzr885iRlEZ5Bgwi49wBEOInQmAoHRJWUIhB5GF96dm+kXM8R4QKJTsaDoOgR+FEwDCIQcuqQvxx6i/PubZzvWnpVfpl7Yy2LHWF7HN8+f/jeeEBiHKBIQA86vbIuK6wQwEUCM0q1xzBEF8A746CoW0+51EkQ0FiiCqflCatMYm4KYGZM5CRiCAi9kASALZIIJbwEDUEjyrWoURxEIEd+dzALKH0o+8x8KAeRtXyfzvC1pZWLiM0BvAzivkCUg5CEQt9ogX4RedRDFGLFZWB3MKCWj4pwjBgOe9eBYNuYdzTrNT8nxUr9d0FsU8TSJGU7LE6WAGENTOTEvORIxTfKbkct7x18JFqPdrMzHXg0AuztBk12ZUxmo4kwxAaI65IVZcyJ0D0kYgmiSjGmajAWai2S8u5dK8YXpYWn2CGCTqvMkTZJx1jPPM0+ktSK+LYlvVXFYEofLDyF4Yk4JM2dy/QnjpjSa0sICiHh19lkxe2qeqdHnJfFcFS9K4oUqenFJjTV1VlJnmnpfUu9VdV4S56q4KIkLVfxQEj+o4uWm2F82xb5XYmX/5Uux2BpP0FR+e+SPUHIH0+TN6dGPadLLr+WzECPTrhqhtzIejNqdXidVZbzSW6Ou7Qx0vTB0nL7t1hkKR7/jWoPhmmVf8RbQltUe9ttqFMRrvdtzR7q+hrX6nYFTY1jTjtzWsLNsH0KRYvULn9trt7S2+EVOz3Gcw56uFwan5brd/RpD4XAHg0HfzVFozChGipeujO32oaVH0SKoa7Vb/Rp9vQBW13E0WFpicUaOPbBzFoEAVpyieFjcbr83VIPEuv9O33W1BRSl9ndde9CqMaxZDweHw4OchDAQ+WpXSNG+dqd70FWjSBE06sgV1FjI+pNGPcfqaCykxDJyXCdrbPVNlC/ZlX2dPHzlrt87c8c201Q1yxdNNWfv3sqseHGNGTe7a+2b/PUTcCO8fqcQNsXDmnDYCAPrWGAzPKyFh5vgfd3vN8X7NeF+I4xfx+I3w/u18P4meKr7aVM8rQmnjTC0joU2w9NaeLoJXuh+0RQvasJFI4yoYxHN8KIWXmyCJ7qfNMWTmnDSCEPqWEgzPKmFJxvgGbqXW75spyCfroRpkXJPLnezuQ5xAjSdCyBQLhOccE32xC0SINO9UDLl/+geEBcOWWq6PLCsdFkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZgfhJSbAKH8MV3dpNzJqfE+n643U4mf3iRjIjfsQO5hs5NI8vMo1efQycY5x4N0yUezQxTAZnYIIdgMouU+rfITQrOwOyi37g/uh6UcIHkYY+hIfsi7ZfgPctGYHwZy0eTf8W5WbTKC+cooqy15MLTVY6BeXOzv2a092/6ptfP6aHlGfGJ8a3xnvDRso2O8Nt4Yx8aZAQ1s/Gb8bvyx/ev2n9t/bf/9YH38aDnnG6Nybf/zP+4JlzE=</latexit>

p(✏)<latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="IEo9R4zn4kfotJN/50Rri/5e8ZA=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8rbFQUHRtCKYEgmScuwKuu0L7Lp9gx2HXfdB9gH2PUYptiyRknXJP3wePv7pT8kmPYoDLizr30ePP/r4k08/e/L5zhdfPt199vzFV5ecxAyiC0gwYdce4AgHEboQgcDomjIEQg+jK2/mZvrVHDEekOhcLCm6DYEfBdMAAiGHrsZQ4MRP3z/fsw6s/DL1wl4Ve8bqOn3/4ul/4wmBcYgiATHg/Ma2qLhNABMBxCjdGcccUQBnwEc3sZh2b5MgorFAEUzNl1KbxtgUxMyQzEnAkORYygJAFsgEE94BBqCQ4DvVKI4iECK+P5kHlD+UfO4/FALIu75NFnlX0srExGeA3gVwUSFLQMhDIO60Qb4MveogijFi87A6mFFKRsW5QAwGPOvBqWzMO5o1mp+T05V+t6R3KOJpEjOclidKATGGpnJiXnIkYprkNyNXd8ZfCRaj/azMx14NAJudocm+zKkMVHGmmABRHfLCrDkRuockDEE0ScY0TcYCLUQy3j9IpfjS9LA0ewSwSdV5libJOOuZ55ln0loR35bEt6o4LInD1YcQPDGnhJlzuf6EcVMaTWlhAUS8OvuimD01L9Toy5J4qYpXJfFKFb24pMaaOi+pc029L6n3qrooiQtVXJbEpSp+KIkfVPF6W+zP22J/UWJl/+VLsdwZT9BUfnnkj1Ayg2ny5vzkhzTp5dfqWYiRaVeN0Fsbj0btTq+TqjJe661R13YGul4YOk7fdusMhaPfca3BcMNyqHgLaMtqD/ttNQrijd7tuSNd38Ba/c7AqTFsaEdua9hZtQ+hSLH6hc/ttVtaW/wip+c4znFP1wuD03Ld7mGNoXC4g8Gg7+YoNGYUI8VL18Z2+9jSo2gR1LXarX6NvlkAq+s4GiwtsTgjxx7YOYtAACtOUTwsbrffG6pBYtN/p++62gKKUvu7rj1o1Rg2rMeD4+FRTkIYiHy1K6RoX7vTPeqqUaQIGnXkCmosZPNJo55jdTQWUmIZOa6TNbb6JsqX7Ma+TR6+cjfvnblnm2mqmuWLppqzd29tVry4xoyb3bX2bf76CbgRXr9TCJviYU04bISBdSywGR7WwsNt8L7u95vi/ZpwvxHGr2Pxm+H9Wnh/GzzV/bQpntaE00YYWsdCm+FpLTzdBi90v2iKFzXhohFG1LGIZnhRCy+2wRPdT5riSU04aYQhdSykGZ7UwpMt8Azdyy1ftlOQT1fCtEi5J5e72VyHOAGazgUQKJcJTrgme+IOCZDpXiiZ8n90D4gLhyw1HVG+1mUZYMmjeiYBhySORE5CcTL2gVTSYtczCeS5xURcBGF+DlJuAoTyx3R9k3Inp8b7fLrZTGXHp2RM5IYdyD1sdhJJfhql+hw62TrndJCu+Gh2iALYzA4hBJtBtNqnVX5CaBY2g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tJrztVG5dv/5H93Cl14=</latexit>g <latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f<latexit sha1_base64="B/YwSoY9opyj7zWp1UtSE4IyV2Y=">AAAPHXicfZfPbts2HMfV7l+XrVm7HXeYsKDAMASB1Dr+cyhQS7bRw9JmWdN0rYOAomlFMCUSJOXYFXTca+wFdt3eYMdh12EPsPcYJSuyRErWJTS/X3710Y+iQnoUB1xY1r937n7w4Ucff3Lv073PPr+//8WDh1++5iRmEJ1Dggl74wGOcBChcxEIjN5QhkDoYXThLdxMv1gixgMSvRJrii5D4EfBPIBAyK6rB99M5wzAZLqg5hT6OJmn6eYHByHFKL16cGAdWfll6g27aBwYxXV69fD+f9MZgXGIIgEx4PydbVFxmQAmAigD96YxRxTABfDRu1jM+5dJENFYoAim5iOpzWNsCmJmsOYsYAgKvJYNAFkgE0x4DSSwkI+0V4/iKAIh4oezZUD5psmX/qYhgKzHZbLK65XWBiY+A/Q6gKsaWQJCHgJxrXXydejVO1GMEVuG9c6MUjIqzhViMOBZDU5lYV7SbAr4K3Ja6Ndreo0iniYxw2l1oBQQY2guB+ZNjkRMk/xh5Lwv+FPBYnSYNfO+pyPAFmdodihzah11nDkmQNS7vDArToRuIAlDEM2SKZWvg0ArkUwPj1IpPjI9LM0eAWxWd56lSTLNauZ55pm01sQXFfGFKo4r4ri4CcEzc06YuZTzTxg3pdGUFhZAxOujz8vRc/NcjX5dEV+r4kVFvFBFL66osaYuK+pSU28q6o2qririShXXFXGtiu8r4ntVfLMr9uddsW+VWFl/uSjWe9MZmsvPSv4KJQuYJs9fnfyQJoP8Kt6FGJl23Qi9W+OTSbc36KWqjG/1zqRvOyNdLw09Z2i7TYbSMey51mi8ZXmseEtoy+qOh101CuKt3h+4E13fwlrD3shpMGxpJ25n3CvKh1CkWP3S5w66Ha0sfpkzcBzneKDrpcHpuG7/cYOhdLij0Wjo5ig0ZvI7rnjprbHbPbb0KFoG9a1uZ9igbyfA6juOBksrLM7EsUd2ziIQwIpTlC+L2x8OxmqQ2NbfGbquNoGiUv6+a486DYYt6/HoePwkJyEMRL5aFVKWr9vrP+mrUaQMmvTkDGosZHunycCxehoLqbBMHNfJCltfiXKRvbMvk80nd7vuzAPbTFPVLBeaas7W3q1Z8eIGM253N9p3+ZsH4FZ4/UkhbIuHDeGwFQY2scB2eNgID3fB+7rfb4v3G8L9Vhi/icVvh/cb4f1d8FT307Z42hBOW2FoEwtth6eN8HQXvND9oi1eNISLVhjRxCLa4UUjvNgFT3Q/aYsnDeGkFYY0sZB2eNIIT3bAM3Qjt3zZTiE7JjAtUu7J5W421yFOgKZzAQTKZYITrsmeuEYCZLoXSqb8h+4BcemQTU1HlN/qshlgyaN6ZgGHJI5ETkJxMvWBVNJy1zML5LnFRFwEYX5CUh5icygqHlLux9R4n8+3m6nET6+SKZEbdiD3sNlJJPlpkupj6GznmNNRWvDR7BAFsJkdQgg2g6jYp9X+hdAsbCGPdIV7M5UjJA9jDJ3Im7wswr+Xk8b8MJCTJv9OD7PWLiNY3Rpla08eDG31GKg3Lh4f2Z0j2/6xc/DspDgj3jO+Nr41vjNso2c8M54bp8a5AY1fjN+M340/9n/d/3P/r/2/N9a7d4oxXxm1a/+f/wGMCqHn</latexit>

@f

@z

The pathwise derivative using reparameterization has low variance. This is because it uses extra information compared to the score function: It has access to the derivative of the function f, which gives much more information on how to properly change the sampled value z to improve the result. This means that, when using the pathwise derivative, we will have much more stable and quick training.

However, reparameterization requires a differentiable function f. This unfortunately means we cannot use it in applications like Reinforcement Learning. It also requires a continuous distribution which is ‘reparameterizable’, that is, there is a transformation g on noise epsilon so that the result has the same distribution as p_\theta. This is particularly annoying because it means we cannot use the pathwise derivative for discrete distributions!

Gumbel Softmax: Continuous relaxation for categorical distribution Probabilities , temperature

<latexit sha1_base64="EsEnuX5FoMFWwQdj/1F+LCwp+7Y=">AAAPD3icfZfdbts2GIbV7q/L1izZDnciLCgwDIYhpY5/DgrUkm0U2NJmXZN0q4OAomlFCCUSJOXYFXQPu4Gdbneww2Gnu4RdwO5jlGLLEilZJ/nC9+XrRx8lm/QoDriwrH8fPPzgw48+/uTRp3ufff54/4uDwy8vOIkZROeQYMLeeoAjHEToXAQCo7eUIRB6GF16t26mXy4Q4wGJ3ogVRVch8KNgHkAg5ND1weGUBtd2y2y32y0zq7+/Pjiy2lZ+mXphr4sjY32dXR8+/m86IzAOUSQgBpy/sy0qrhLARAAxSvemMUcUwFvgo3exmPevkiCisUARTM0nUpvH2BTEzPDMWcAQFHglCwBZIBNMeAMYgELexF41iqMIhIi3ZouA8vuSL/z7QgDZgatkmXcorUxMfAboTQCXFbIEhDwE4kYb5KvQqw6iGCO2CKuDGaVkVJxLxGDAsx6cyca8olnT+RtyttZvVvQGRTxNYobT8kQpIMbQXE7MS45ETJP8ZuRK3/JngsWolZX52LMRYLev0awlcyoDVZw5JkBUh7wwa06E7iAJQxDNkilNk6lAS5FMW+1Uik9MD0uzRwCbVZ2v0ySZZj3zPPO1tFbElyXxpSqOS+J4/SEEz8w5YeZCrj9h3JRGU1pYABGvzj4vZs/NczX6oiReqOJlSbxURS8uqbGmLkrqQlPvSuqdqi5L4lIVVyVxpYrvS+J7VXy7K/bnXbG/KLGy//KlWO1NZ2guv0jyRyi5hWny4s3pD2kyyK/1sxAj064aobcxPp10e4Neqsp4o3cmfdsZ6Xph6DlD260zFI5hz7VG4y3LseItoC2rOx521SiIt3p/4E50fQtrDXsjp8awpZ24nXFv3T6EIsXqFz530O1obfGLnIHjOCcDXS8MTsd1+8c1hsLhjkajoZuj0JhRjBQv3Ri73RNLj6JFUN/qdoY1+nYBrL7jaLC0xOJMHHtk5ywCAaw4RfGwuP3hYKwGiW3/naHragsoSu3vu/aoU2PYsp6MTsZPcxLCQOSrXSFF+7q9/tO+GkWKoElPrqDGQrafNBk4Vk9jISWWieM6WWOrb6J8yd7ZV8n9V+72vTOPbDNNVbN80VRz9u5tzIoX15hxs7vWvstfPwE3wut3CmFTPKwJh40wsI4FNsPDWni4C97X/X5TvF8T7jfC+HUsfjO8Xwvv74Knup82xdOacNoIQ+tYaDM8rYWnu+CF7hdN8aImXDTCiDoW0QwvauHFLnii+0lTPKkJJ40wpI6FNMOTWniyA56hO7nly3YK8ulKmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIAEy3QslU/6P7gFx4ZClpiPKN7osAyx5VM8s4JDEkchJKE6mPpBKWux6ZoE8t5iIiyDMz0TKTYBQ/phublLux9R4n8+3m6nET6+TKZEbdiD3sNlJJPlpkupz6GznnLNRuuaj2SEKYDM7hBBsBtF6n1b5CaFZ2C2UW/d79/1SjpA8jDF0Kj/k1Tr8O7lozA8DuWjy77SVVbuMYLkxympPHgxt9RioF5fHbbvTtu0fO0fPT9dnxEfG18Y3xreGbfSM58YL48w4N6BxZ/xm/G78sf/r/p/7f+3/fW99+GA95yujcu3/8z8CzZm9</latexit>⇡1, ...,⇡K<latexit sha1_base64="SkG7TCRf01UKGl8g5KnI5kEWxU8=">AAAPCHicfZfLbuM2GIU109s07aQz7bIbocEARREEUuL4sphiLNnGLJqZNM1l2jgIKJqWhVAiQVKOPYIeoC/QbfsGXRbd9i36AH2PUootS6RkbfKH5/D400/JJj2KAy4s699Hjz/48KOPP3ny6c5nnz/d/eLZ8y8vOYkZRBeQYMLeeYAjHEToQgQCo3eUIRB6GF15d26mX80R4wGJzsWSopsQ+FEwDSAQcmg8hhQnYwHi9Hvr9tmedWDll6kX9qrYM1bX6e3zp/+NJwTGIYoExIDza9ui4iYBTAQQo3RnHHNEAbwDPrqOxbR7kwQRjQWKYGq+kNo0xqYgZsZlTgKGoMBLWQDIAplgwhlgAApJv1ON4igCIeL7k3lA+UPJ5/5DIYC89ZtkkbcmrUxMfAboLICLClkCQh4CMdMG+TL0qoMoxojNw+pgRikZFecCMRjwrAensjFvadZtfk5OV/psSWco4mkSM5yWJ0oBMYamcmJeciRimuQ3I5f4jr8ULEb7WZmPvRwAdneGJvsypzJQxZliAkR1yAuz5kToHpIwBNEkGdNUPgloIZLx/kEqxRemh6XZI4BNqs6zNEnGWc88zzyT1or4piS+UcVhSRyuPoTgiTklzJzL9SeMm9JoSgsLIOLV2RfF7Kl5oUZflsRLVbwqiVeq6MUlNdbUeUmda+p9Sb1X1UVJXKjisiQuVfF9SXyviu+2xf68LfYXJVb2X74Uy53xBE3lN0j+CCV3ME1en5/8kCa9/Fo9CzEy7aoRemvj0ajd6XVSVcZrvTXq2s5A1wtDx+nbbp2hcPQ7rjUYblgOFW8BbVntYb+tRkG80bs9d6TrG1ir3xk4NYYN7chtDTur9iEUKVa/8Lm9dktri1/k9BzHOe7pemFwWq7bPawxFA53MBj03RyFxoxipHjp2thuH1t6FC2Cula71a/RNwtgdR1Hg6UlFmfk2AM7ZxEIYMUpiofF7fZ7QzVIbPrv9F1XW0BRan/XtQetGsOG9XhwPDzKSQgDka92hRTta3e6R101ihRBo45cQY2FbD5p1HOsjsZCSiwjx3WyxlbfRPmSXds3ycNX7ua9M/dsM01Vs3zRVHP27q3NihfXmHGzu9a+zV8/ATfC63cKYVM8rAmHjTCwjgU2w8NaeLgN3tf9flO8XxPuN8L4dSx+M7xfC+9vg6e6nzbF05pw2ghD61hoMzythafb4IXuF03xoiZcNMKIOhbRDC9q4cU2eKL7SVM8qQknjTCkjoU0w5NaeLIFnqF7ueXLdgry6UqYFin35HI3m+sQJ0DTuQAC5TLBCddkT8yQAJnuhZIp/0f3gLhwyFLTEeVrXZYBljyqZxJwSOJI5CTZMccHUkmLXc8kkOcWE3ERhPlhSLkJEMof0/VNyv2YGu/z6WYzlfjpbTImcsMO5B42O4kkP41SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3Lr/uB+WMoBkocxhk7kh7xdhX8nF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF1eGB3Tqw7R9be69OVmfEJ8bXxjfGt4ZtdIxXxmvj1LgwoEGN34zfjT92f939c/ev3b8frI8freZ8ZVSu3X/+ByrqmTs=</latexit>

⌧ > 0<latexit sha1_base64="PlqAUZXtSC35Pnj91freVBvb4ng=">AAAPOnicfZdNb9s2HMbV7q3L1izdjrsICwp0g2FIjeMXYAVqyfYKbG2zrmna1UFA0bQihBIJknLsCvoy+xr7Artutx132GHYdR9glGLLEilZlzB8Hj7+8U/RJj2KAy4s689bt997/4MPP7rz8d4nn97d/+zg3uevOIkZRKeQYMJee4AjHEToVAQCo9eUIRB6GJ15V26mny0Q4wGJXooVRech8KNgHkAgZNfFwbdTRHmAZdNume12u2UWHd+bUx6E5pRQxIAgLAIhSr6LQw/h9IHVMu2vLw4OrbaVP6besNeNQ2P9nFzcu/v3dEZgHKJIQAw4f2tbVJwngIkAYpTuTWOOKIBXwEdvYzHvnydBRGOBIpia96U2j7EpiJnNxJwFDEGBV7IBIAtkggkvAQNQyPnuVaM4yuh5a7YIKL9p8oV/0xBAFus8WebFTCsDE58BehnAZYUsASEPgbjUOvkq9KqdKMaILcJqZ0YpGRXnEjEY8KwGJ7Iwz2m2PvwlOVnrlyt6iSKeJjHDaXmgFBBjaC4H5k2OREyTfDLypbjijwSLUStr5n2PRoBdvUCzlsypdFRx5pgAUe3ywqw4EbqGJAxBNEumNE2mAi1FMm21UyneNz0szR4BbFZ1vkiTZJrVzPPMF9JaEZ+VxGeqOC6J4/WHEDwz54SZC7n+hHFTGk1pYQFEvDr6tBg9N0/V6Fcl8ZUqnpXEM1X04pIaa+qipC409bqkXqvqsiQuVXFVEleq+K4kvlPF17ti3+yK/VmJlfWXm2K1N52hufzOyV+h5AqmyZOXT39Ik0H+rN+FGJl21Qi9jfFo0u0Neqkq443emfRtZ6TrhaHnDG23zlA4hj3XGo23LA8VbwFtWd3xsKtGQbzV+wN3outbWGvYGzk1hi3txO2Me+vyIRQpVr/wuYNuRyuLX+QMHMc5Huh6YXA6rtt/WGMoHO5oNBq6OQqNGcVI8dKNsds9tvQoWgT1rW5nWKNvF8DqO44GS0sszsSxR3bOIhDAilMUL4vbHw7GapDY1t8Zuq62gKJU/r5rjzo1hi3r8eh4fJSTEAYiX60KKcrX7fWP+moUKYImPbmCGgvZftJk4Fg9jYWUWCaO62SFre5Eucne2ufJzVfudt+Zh7aZpqpZbjTVnO29jVnx4hozbnbX2nf56wfgRnh9phA2xcOacNgIA+tYYDM8rIWHu+B93e83xfs14X4jjF/H4jfD+7Xw/i54qvtpUzytCaeNMLSOhTbD01p4ugte6H7RFC9qwkUjjKhjEc3wohZe7IInup80xZOacNIIQ+pYSDM8qYUnO+AZupZHvuykIN+uhGmR8kwuT7O5DnECNJ0LIFAuE5xwTfbEJRIg071QMuX/6B4QFw7Z1HR5rdno6xuO5pkFHJI4EjkJxcnUB1JJi1PPLJD3FhNxEYT59UmZBAjlj+lmkvI8psb7fL49TCV+epFU71Y/TVJ9DJ3tHHMyStd8NLtEAWxmlxCCzSBan9MqPyE0C7uC8uh+475ZyhGSlzGGnsoPeb4O/0YuGvPDQC6a/DttZa1dRrDcGGVrT14MbfUaqDfOHrbtTtu2f+wcPn66viPeMb40vjIeGLbRMx4bT4wT49SAxi/Gb8bvxh/7v+7/tf/P/r831tu31mO+MCrP/n//A6Dyq00=</latexit>

✏1, ..., ✏K ⇠ Gumbel(0, 1)<latexit sha1_base64="dsDndegRp6mghq77Qu4vHvM4YZE=">AAAPeXicfZfNc5tGGMZx+pHUbWKnPfbC1JMZJ/W4EMv6OGQmAkmTQ5O4aRwnDR7PslphxsDu7C6yFIY/tMce+k/01BckI1hAXLzmefbRj3cXaV+XBb6QhvH3zr2vvv7m2/sPvtv9/oeHj/b2H//4QdCYY3KOaUD5RxcJEvgROZe+DMhHxgkK3YBcuDd2pl/MCRc+jd7LJSOXIfIif+ZjJOHW1X7iCBSygOgvdAfLIPHSq8TBLEgcieI0PXRcwsSR7rg0mIplCH8Sh/npU7BTRjiSlEcoJMmfdCZDtEgPD52AejW//queJz39bRP+9Gr/wDg28kuvD8z14EBbX2dXjx/+60wpjkMSSRwgIT6bBpOXCeLSxwFJd51YEIbwDfLI51jO+peJH7FYkgin+hPQZnGgS6pnhdCnPifwxEsYIMx9SNDxNeIISyjXbjVKkOwhxdF07jOxGoq5txpIBLW+TBb5WqSViYnHEbv28aJClqBQhEhe125m9areJHFA+Dys3swogVFxLgjHvshqcAaFecuy5RXv6dlav16yaxKJNIl5kJYngkA4JzOYmA8FkTFL8oeBPXUjXkgek6NsmN97MUL85h2ZHkFO5UYVZxZQJKu33DArTkRuMQ1DFE1hX6SwEchCJs7RcQriE90NwOxSxKdV57s0SZysZq6rvwNrRXxTEt+o4rgkjtcfAltTn1Guz2H9KRc6GHWwcB8TUZ19Xsye6edq9IeS+EEVL0rihSq6cUmNa+q8pM5r6m1JvVXVRUlcqOKyJC5V8UtJ/KKKH7fFftoW+5cSC/WHl2K560zJDL6y8i2U3OA0efX+9e9pMsiv9V6IiW5Wjdi9M55Mur1BL1Xl4E7vTPqmNarrhaFnDU27yVA4hj3bGI03LM8VbwFtGN3xsKtG4WCj9wf2pK5vYI1hb2Q1GDa0E7sz7q3LR0ikWL3CZw+6nVpZvCJnYFnW6aCuFwarY9v95w2GwmGPRqOhnaOwmMNPh+Jld8Zu99SoR7EiqG90O8MGfbMARt+yarCsxGJNLHNk5iySoEBxymKz2P3hYKwGyU39raFt1xZQlsrft81Rp8GwYT0dnY5PchLKUeSpVaFF+bq9/klfjaJF0KQHK1hjoZtPmgwso1djoSWWiWVbWWGrbyK8ZJ/Ny2T1lbt57/QDU09T1QwvmmrO3r07s+INGsxBu7vRvs3fPCFoha8/KcZt8bghHLfC4CYW3A6PG+HxNniv7vfa4r2GcK8Vxmti8drhvUZ4bxs8q/tZWzxrCGetMKyJhbXDs0Z4tg1e1v2yLV42hMtWGNnEItvhZSO83AZP637aFk8bwmkrDG1ioe3wtBGeboHn5BaOfNlJAXZXwmuRcCaH02yu4yBBNV1IJEkuQ7sharIrr4lEme6GwJT/U/eguHBknY+qQ/Nyp8PQD4BH9Ux9gWkcyZwk63I8BEpanHqmPvQtOhHSD/PuS3mIvA+7e0g4j6nxnphtDlOrRq3agk3S+hw23TrnbJSu+VjWRKFAXzVtuh+tz2mVnxCWhd1gOLqv3KulHBFoxjh5DR/ydh3+DBaNe6EPiwZ/naNstM0InePaCKNdaAxNtQ2sDy6eH5udY9P8o3Pw8vW6R3yg/az9oh1qptbTXmqvtDPtXMPaPzv3d/Z3Hj/6b0/fO9x7trLe21nP+UmrXHsn/wMw4cEH</latexit>

z = g⌧(✏,⇡) = Softmax((log⇡+ ✏)/⌧)

GUMBEL SOFTMAX

50

~ <latexit sha1_base64="jP+88+J5EambS0r9WjxZ5v5gh3s=">AAAPAXicfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKpmUhlEiQlGNX0GVfYNftG+w47LpPsg+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXduZl+NUeMByQ6F0uKbkLgR8E0gEDIoYuxhyi/fb5nHVj5ZeqFvSr2jNV1evvi6X/jCYFxiCIBMeD82raouEkAEwHEKN0ZxxxRAO+Aj65jMe3eJEFEY4EimJovpTaNsSmImQGZk4AhKPBSFgCyQCaYcAYYgEJi71SjOIpAiPj+ZB5Q/lDyuf9QCCDv+SZZ5D1JKxMTnwE6C+CiQpaAkIdAzLRBvgy96iCKMWLzsDqYUUpGxblADAY868GpbMw7mrWZn5PTlT5b0hmKeJrEDKfliVJAjKGpnJiXHImYJvnNyLW9468Ei9F+VuZjrwaA3Z2hyb7MqQxUcaaYAFEd8sKsORG6hyQMQTRJxjRNxgItRDLeP0il+NL0sDR7BLBJ1XmWJsk465nnmWfSWhHflsS3qjgsicPVhxA8MaeEmXO5/oRxUxpNaWEBRLw6+6KYPTUv1OjLknipilcl8UoVvbikxpo6L6lzTb0vqfequiiJC1VclsSlKn4oiR9U8f222J+3xf6ixMr+y5diuTOeoKn86sgfoeQOpsmb85Mf0qSXX6tnIUamXTVCb208GrU7vU6qynitt0Zd2xnoemHoOH3brTMUjn7HtQbDDcuh4i2gLas97LfVKIg3erfnjnR9A2v1OwOnxrChHbmtYWfVPoQixeoXPrfXbmlt8YucnuM4xz1dLwxOy3W7hzWGwuEOBoO+m6PQmFGMFC9dG9vtY0uPokVQ12q3+jX6ZgGsruNosLTE4owce2DnLAIBrDhF8bC43X5vqAaJTf+dvutqCyhK7e+69qBVY9iwHg+Oh0c5CWEg8tWukKJ97U73qKtGkSJo1JErqLGQzSeNeo7V0VhIiWXkuE7W2OqbKF+ya/smefjK3bx35p5tpqlqli+aas7evbVZ8eIaM25219q3+esn4EZ4/U4hbIqHNeGwEQbWscBmeFgLD7fB+7rfb4r3a8L9Rhi/jsVvhvdr4f1t8FT306Z4WhNOG2FoHQtthqe18HQbvND9oile1ISLRhhRxyKa4UUtvNgGT3Q/aYonNeGkEYbUsZBmeFILT7bAM3Qvt3zZTkE+XQnTIuWeXO5mcx3iBGg6F0CgXCY44ZrsiRkSINO9UDLl/+geEBcOWWq6PK+sdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgpSbAKH8MV3fpNzJqfE+n242U4mf3iZjIjfsQO5hs5NI8tMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsi7Vfj3ctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6xtff6ZHVGfGJ8Y3xrfGfYRsd4bbwxTo0LAxqB8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcr43KtfvP/0/8llI=</latexit>✏

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z<latexit sha1_base64="T0QtDMwuTx+3LLD9eredKPOaxhM=">AAAPD3icfZdNjts2GIaV/qbTZpq0y26EDgIUxWAgJR7/LALEkm1k0STTNJNJGw8GFE1rhJFEgqQ8dgTdoRfotr1Bl0W3PUIP0Hv0k2zLEilZG9N8X7569FG0SY+FgZCW9e+dDz786ONPPr372cHnX9w7/PL+g6/eCJpwTM4xDSl/6yFBwiAm5zKQIXnLOEGRF5IL78bN9YsF4SKg8Wu5YuQyQn4czAOMJHRd3X9gTj0azsQqgo90yoLs6v6RdWIVl6k37E3jyNhcZ1cP7v03nVGcRCSWOERCvLMtJi9TxGWAQ5IdTBNBGMI3yCfvEjnvX6ZBzBJJYpyZD0GbJ6EpqZnjmbOAEyzDFTQQ5gEkmPgacYQlPMRBPUqQGEVEHM8WARPrplj464ZEUIHLdFlUKKsNTH2O2HWAlzWyFEUiQvJa68xLU+8kSUj4Iqp35pTAqDiXhONA5DU4g8K8ZHnRxWt6ttGvV+yaxCJLEx5m1YEgEM7JHAYWTUFkwtLiYWCmb8QTyRNynDeLvicjxG9ekdkx5NQ66jjzkCJZ7/KivDgxucU0ilA8g1cgS6eSLGU6PT7JQHxoeiGYPYr4rO58laXpNK+Z55mvwFoTX1TEF6o4rojjzU3gLTTnlJsLmH/KhQlGEyw8wETUR5+Xo+fmuRr9piK+UcWLinihil5SURNNXVTUhabeVtRbVV1WxKUqririShXfV8T3qvh2X+zP+2J/UWKh/rAoVgfTGZnDD0nxCqU3OEufvX7+Q5YOimvzLiTEtOtG7G2Njyfd3qCXqXK41TuTvu2MdL009Jyh7TYZSsew51qj8Y7lkeItoS2rOx521Sgc7vT+wJ3o+g7WGvZGToNhRztxO+PepnyExIrVL33uoNvRyuKXOQPHcU4Hul4anI7r9h81GEqHOxqNhm6BwhLOQqJ42dbY7Z5aehQrg/pWtzNs0HcTYPUdR4NlFRZn4tgju2CRBIWKU5Yvi9sfDsZqkNzV3xm6rjaBslL+vmuPOg2GHevp6HT8uCChHMW+WhValq/b6z/uq1G0DJr0YAY1Frq702TgWD2NhVZYJo7r5IWtr0RYZO/sy3T9k7tbd+aRbWaZaoaFpprztbc1K96wwRy2uxvt+/zNA8JWeP1JMW6Lxw3huBUGN7HgdnjcCI/3wfu632+L9xvC/VYYv4nFb4f3G+H9ffBM97O2eNYQzlphWBMLa4dnjfBsH7zU/bItXjaEy1YY2cQi2+FlI7zcB091P22Lpw3htBWGNrHQdnjaCE/3wHNyC1u+fKcAb1fKtUjYk8NuttBxmCJNFxJJUshwshCa7MlrIlGuexEwFV90D0pKBzQ1nTCx1aEZhMCjemaBwDSJZUHC4IzjI1CyctczC+DcYhIhg6g4EykPgSL4M90+JOzH1HhfzHebqdTPrtIphQ07gj1sfhJJf5pk+hg22zvmbJRt+Fh+iEKhuT6fmUG82afV/kJYHnaDYeu+dq+nckTgMMbJc7jJy0349zBp3I8CmDT4nB7nrX1GtNwaoXUAB0NbPQbqjYtHJ3bnxLZ/7Bw9fb45I941vjG+Nb4zbKNnPDWeGWfGuYGNW+M343fjj8NfD/88/Ovw77X1gzubMV8btevwn/8Bv3Kb1w==</latexit>

<latexit sha1_base64="8A1rC81NtEPScRgnEx/cyxTbwvY=">AAAPGXicfZdNb9s2HMbV7q3L1izdjr0ICwp0QxBIieOXQ4Fasr0eljbrmpctDgKKphUhlEiQlGNX0GFfY19g1+0b7DjsutM+wL7HKMWWJVKyLvmHz8PHP/0p2aRHccCFZf374OEHH3708SePPt367PPH21/sPPnyjJOYQXQKCSbswgMc4SBCpyIQGF1QhkDoYXTu3bqZfj5DjAckeicWFF2FwI+CaQCBkEPXO0/HhCIGBGERCFHyXRx6CKfPrT3T/uZ6Z9fat/LL1At7Weway+vk+snj/8YTAuMQRQJiwPmlbVFxlQAmAohRujWOOaIA3gIfXcZi2r1KgojGAkUwNZ9JbRpjUxAzAzUnAUNQ4IUsAGSBTDDhDWAACnk7W9UojjJ6vjeZBZTfl3zm3xcCyF5cJfO8V2llYuIzQG8COK+QJSDkIRA32iBfhF51EMUYsVlYHcwoJaPinCMGA5714EQ25g3N2s/fkZOlfrOgNyjiaRIznJYnSgExhqZyYl5yJGKa5Dcj1/yWvxAsRntZmY+9GAB2+xZN9mROZaCKM8UEiOqQF2bNidAdJGEIokkypmkyFmgukvHefirFZ6aHpdkjgE2qzrdpkoyznnme+VZaK+LrkvhaFYclcbj8EIIn5pQwcybXnzBuSqMpLSyAiFdnnxazp+apGn1WEs9U8bwknquiF5fUWFNnJXWmqXcl9U5V5yVxroqLkrhQxfcl8b0qXmyK/WlT7M9KrOy/fCkWW+MJmsqvlPwRSm5hmrx6d/x9mvTya/ksxMi0q0borYyHo3an10lVGa/01qhrOwNdLwwdp2+7dYbC0e+41mC4ZjlQvAW0ZbWH/bYaBfFa7/bcka6vYa1+Z+DUGNa0I7c17Czbh1CkWP3C5/baLa0tfpHTcxznqKfrhcFpuW73oMZQONzBYNB3cxQaM4qR4qUrY7t9ZOlRtAjqWu1Wv0ZfL4DVdRwNlpZYnJFjD+ycRSCAFacoHha32+8N1SCx7r/Td11tAUWp/V3XHrRqDGvWo8HR8DAnIQxEvtoVUrSv3ekedtUoUgSNOnIFNRay/qRRz7E6GgspsYwc18kaW30T5Ut2aV8l91+56/fO3LXNNFXN8kVTzdm7tzIrXlxjxs3uWvsmf/0E3Aiv3ymETfGwJhw2wsA6FtgMD2vh4SZ4X/f7TfF+TbjfCOPXsfjN8H4tvL8Jnup+2hRPa8JpIwytY6HN8LQWnm6CF7pfNMWLmnDRCCPqWEQzvKiFF5vgie4nTfGkJpw0wpA6FtIMT2rhyQZ4hu7kli/bKcinK2FapNyTy91srkOcAE3nAgiUywQnXJM9cYMEyHQvlEz5P7oHxIVDlpqOKF/psgyw5FE9k4BDEkciJ6E4GftAKmmx65kE8txiIi6CMD8dKTcBQvljurpJuR9T430+XW+mEj+9Tqpnqx9HqT6HTjbOORmkSz6aHaIANrNDCMFmEC33aZWfEJqF3UK5db933y/lAMnDGEPH8kPeLMO/lYvG/DCQiyb/jveyapMRzFdGWW3Jg6GtHgP14vxg327t2/YPrd2Xx8sz4iPjqfG18dywjY7x0nhlnBinBjR+MX4zfjf+2P51+8/tv7b/vrc+fLCc85VRubb/+R/B0J8W</latexit>

Gumbel(0, 1)

<latexit sha1_base64="l0y5LMtdVe+ZLM+5zZZspw78VC4=">AAAPEnicfZfLbuM2GIU109s07biZtrtuhAYDFEUQSInjy2KAsWQbs2hm0jSXaeMgoGhaFkyJBEk5dgQ9RV+g2/YNuiy67Qv0AfoepWRblijJ2oThOTz+9FOy+TsUe1wYxr9Pnn7w4Ucff/Ls073PPn/e+GL/xZfXnIQMoitIMGHvHcAR9gJ0JTyB0XvKEPAdjG6cmZ3oN3PEuEeCS7Gk6M4HbuBNPAiEnLrf/3oEBY7c+D4aQYqjkQBhHN/vHxhHRnrp5YG5Hhxo6+v8/sXz/0ZjAkMfBQJiwPmtaVBxFwEmPIhRvDcKOaIAzoCLbkMx6dxFXkBDgQIY6y+lNgmxLoieEOpjjyEJtZQDAJknE3Q4BQxAIe9jrxjFUQB8xA/Hc4/y1ZDP3dVAAFmEu2iRFikuLIxcBujUg4sCWQR87gMxLU3ype8UJ1GIEZv7xcmEUjIqzgVi0ONJDc5lYd7RpO78kpyv9emSTlHA4yhkOM4vlAJiDE3kwnTIkQhplN6M3OwZfyVYiA6TYTr3qg/Y7AKND2VOYaKIM8EEiOKU4yfFCdADJL4PgnE0orF8ENBCRKPDo1iKL3UHS7NDABsXnRdxFI2SmjmOfiGtBfFtTnyrioOcOFh/CMFjfUKYPpf7TxjXpVGXFuZBxIurr7LVE/1Kjb7OideqeJMTb1TRCXNqWFLnOXVeUh9y6oOqLnLiQhWXOXGpio858VEV3++K/XlX7C9KrKy/fCmWe6MxmsjvkvQRimYwjt5cnv0QR930Wj8LIdLNohE6G+PJsNXutmNVxhu9OeyYVr+sZ4a21TPtKkPm6LVtoz/Yshwr3gzaMFqDXkuNgnird7r2sKxvYY1eu29VGLa0Q7s5aK/Lh1CgWN3MZ3dbzVJZ3Cyna1nWabesZwaradud4wpD5rD7/X7PTlFoyChGipdujK3WqVGOollQx2g1exX6dgOMjmWVYGmOxRpaZt9MWQQCWHGK7GGxO73uQA0S2/pbPdsubaDIlb9jm/1mhWHLeto/HZykJISBwFWrQrLytdqdk44aRbKgYVvuYImFbD9p2LWMdomF5FiGlm0lhS2+ifIluzXvotVX7va90w9MPY5Vs3zRVHPy7m3MihdXmHG9u9K+y1+9ANfCl+8Uwrp4WBEOa2FgFQush4eV8HAXvFv2u3XxbkW4WwvjVrG49fBuJby7C56W/bQunlaE01oYWsVC6+FpJTzdBS/KflEXLyrCRS2MqGIR9fCiEl7sgidlP6mLJxXhpBaGVLGQenhSCU92wDP0II98yUlBPl0RK0XKM7k8zaY6xBEo6VwAgVKZ4IiXZEdMkQCJ7viSKf2n7AFh5kiaE1VHlG90OfSw5FE9Y49DEgYiJUm6HBdIJc5OPWNP9i064sLz07ZIuQngyx/TzU3K85ga7/LJ9jC16qWIPLADeYZNOpHop2FcXkPHO9ec9+M1H02aKID1pAkhWPeC9Tmt8BNCk7AZlEf3lXu1lX0kmzGGzuSHvFuHfy83jbm+JzdN/h0dJqNdRrDYGOVoTzaGptoGlgc3x0dm88g0f2wevD5b94jPtG+0b7XvNFNra6+1N9q5dqVB7VH7Tftd+6Pxa+PPxl+Nv1fWp0/Wa77SClfjn/8B+Lydow==</latexit>g⌧<latexit sha1_base64="dSa//P/biN8T0p7UDtRAa3DR8qw=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jiZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8UJZdS</latexit>

f

Although reparameterization doesn’t exist for discrete distributions, we can use so-called continuous relaxations. The Gumbel Softmax is a method that allows us to use a pathwise derivative for a continuous distribution that is almost like a categorical distribution.

We first sample noise from the Gumbel distribution. Then, we transform this noise. First, we add the log-probability for the i-th class. We divide this by the temperature parameter, which should be larger than 0. The result is transformed by the softmax! What does this look like?

Page 18: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Probabilities , temperature

<latexit sha1_base64="EsEnuX5FoMFWwQdj/1F+LCwp+7Y=">AAAPD3icfZfdbts2GIbV7q/L1izZDnciLCgwDIYhpY5/DgrUkm0U2NJmXZN0q4OAomlFCCUSJOXYFXQPu4Gdbneww2Gnu4RdwO5jlGLLEilZJ/nC9+XrRx8lm/QoDriwrH8fPPzgw48+/uTRp3ufff54/4uDwy8vOIkZROeQYMLeeoAjHEToXAQCo7eUIRB6GF16t26mXy4Q4wGJ3ogVRVch8KNgHkAg5ND1weGUBtd2y2y32y0zq7+/Pjiy2lZ+mXphr4sjY32dXR8+/m86IzAOUSQgBpy/sy0qrhLARAAxSvemMUcUwFvgo3exmPevkiCisUARTM0nUpvH2BTEzPDMWcAQFHglCwBZIBNMeAMYgELexF41iqMIhIi3ZouA8vuSL/z7QgDZgatkmXcorUxMfAboTQCXFbIEhDwE4kYb5KvQqw6iGCO2CKuDGaVkVJxLxGDAsx6cyca8olnT+RtyttZvVvQGRTxNYobT8kQpIMbQXE7MS45ETJP8ZuRK3/JngsWolZX52LMRYLev0awlcyoDVZw5JkBUh7wwa06E7iAJQxDNkilNk6lAS5FMW+1Uik9MD0uzRwCbVZ2v0ySZZj3zPPO1tFbElyXxpSqOS+J4/SEEz8w5YeZCrj9h3JRGU1pYABGvzj4vZs/NczX6oiReqOJlSbxURS8uqbGmLkrqQlPvSuqdqi5L4lIVVyVxpYrvS+J7VXy7K/bnXbG/KLGy//KlWO1NZ2guv0jyRyi5hWny4s3pD2kyyK/1sxAj064aobcxPp10e4Neqsp4o3cmfdsZ6Xph6DlD260zFI5hz7VG4y3LseItoC2rOx521SiIt3p/4E50fQtrDXsjp8awpZ24nXFv3T6EIsXqFz530O1obfGLnIHjOCcDXS8MTsd1+8c1hsLhjkajoZuj0JhRjBQv3Ri73RNLj6JFUN/qdoY1+nYBrL7jaLC0xOJMHHtk5ywCAaw4RfGwuP3hYKwGiW3/naHragsoSu3vu/aoU2PYsp6MTsZPcxLCQOSrXSFF+7q9/tO+GkWKoElPrqDGQrafNBk4Vk9jISWWieM6WWOrb6J8yd7ZV8n9V+72vTOPbDNNVbN80VRz9u5tzIoX15hxs7vWvstfPwE3wut3CmFTPKwJh40wsI4FNsPDWni4C97X/X5TvF8T7jfC+HUsfjO8Xwvv74Knup82xdOacNoIQ+tYaDM8rYWnu+CF7hdN8aImXDTCiDoW0QwvauHFLnii+0lTPKkJJ40wpI6FNMOTWniyA56hO7nly3YK8ulKmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIAEy3QslU/6P7gFx4ZClpiPKN7osAyx5VM8s4JDEkchJKE6mPpBKWux6ZoE8t5iIiyDMz0TKTYBQ/phublLux9R4n8+3m6nET6+TKZEbdiD3sNlJJPlpkupz6GznnLNRuuaj2SEKYDM7hBBsBtF6n1b5CaFZ2C2UW/d79/1SjpA8jDF0Kj/k1Tr8O7lozA8DuWjy77SVVbuMYLkxympPHgxt9RioF5fHbbvTtu0fO0fPT9dnxEfG18Y3xreGbfSM58YL48w4N6BxZ/xm/G78sf/r/p/7f+3/fW99+GA95yujcu3/8z8CzZm9</latexit>⇡1, ...,⇡K<latexit sha1_base64="SkG7TCRf01UKGl8g5KnI5kEWxU8=">AAAPCHicfZfLbuM2GIU109s07aQz7bIbocEARREEUuL4sphiLNnGLJqZNM1l2jgIKJqWhVAiQVKOPYIeoC/QbfsGXRbd9i36AH2PUootS6RkbfKH5/D400/JJj2KAy4s699Hjz/48KOPP3ny6c5nnz/d/eLZ8y8vOYkZRBeQYMLeeYAjHEToQgQCo3eUIRB6GF15d26mX80R4wGJzsWSopsQ+FEwDSAQcmg8hhQnYwHi9Hvr9tmedWDll6kX9qrYM1bX6e3zp/+NJwTGIYoExIDza9ui4iYBTAQQo3RnHHNEAbwDPrqOxbR7kwQRjQWKYGq+kNo0xqYgZsZlTgKGoMBLWQDIAplgwhlgAApJv1ON4igCIeL7k3lA+UPJ5/5DIYC89ZtkkbcmrUxMfAboLICLClkCQh4CMdMG+TL0qoMoxojNw+pgRikZFecCMRjwrAensjFvadZtfk5OV/psSWco4mkSM5yWJ0oBMYamcmJeciRimuQ3I5f4jr8ULEb7WZmPvRwAdneGJvsypzJQxZliAkR1yAuz5kToHpIwBNEkGdNUPgloIZLx/kEqxRemh6XZI4BNqs6zNEnGWc88zzyT1or4piS+UcVhSRyuPoTgiTklzJzL9SeMm9JoSgsLIOLV2RfF7Kl5oUZflsRLVbwqiVeq6MUlNdbUeUmda+p9Sb1X1UVJXKjisiQuVfF9SXyviu+2xf68LfYXJVb2X74Uy53xBE3lN0j+CCV3ME1en5/8kCa9/Fo9CzEy7aoRemvj0ajd6XVSVcZrvTXq2s5A1wtDx+nbbp2hcPQ7rjUYblgOFW8BbVntYb+tRkG80bs9d6TrG1ir3xk4NYYN7chtDTur9iEUKVa/8Lm9dktri1/k9BzHOe7pemFwWq7bPawxFA53MBj03RyFxoxipHjp2thuH1t6FC2Cula71a/RNwtgdR1Hg6UlFmfk2AM7ZxEIYMUpiofF7fZ7QzVIbPrv9F1XW0BRan/XtQetGsOG9XhwPDzKSQgDka92hRTta3e6R101ihRBo45cQY2FbD5p1HOsjsZCSiwjx3WyxlbfRPmSXds3ycNX7ua9M/dsM01Vs3zRVHP27q3NihfXmHGzu9a+zV8/ATfC63cKYVM8rAmHjTCwjgU2w8NaeLgN3tf9flO8XxPuN8L4dSx+M7xfC+9vg6e6nzbF05pw2ghD61hoMzythafb4IXuF03xoiZcNMKIOhbRDC9q4cU2eKL7SVM8qQknjTCkjoU0w5NaeLIFnqF7ueXLdgry6UqYFin35HI3m+sQJ0DTuQAC5TLBCddkT8yQAJnuhZIp/0f3gLhwyFLTEeVrXZYBljyqZxJwSOJI5CTZMccHUkmLXc8kkOcWE3ERhPlhSLkJEMof0/VNyv2YGu/z6WYzlfjpbTImcsMO5B42O4kkP41SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3Lr/uB+WMoBkocxhk7kh7xdhX8nF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF1eGB3Tqw7R9be69OVmfEJ8bXxjfGt4ZtdIxXxmvj1LgwoEGN34zfjT92f939c/ev3b8frI8freZ8ZVSu3X/+ByrqmTs=</latexit>

⌧ > 0<latexit sha1_base64="PlqAUZXtSC35Pnj91freVBvb4ng=">AAAPOnicfZdNb9s2HMbV7q3L1izdjrsICwp0g2FIjeMXYAVqyfYKbG2zrmna1UFA0bQihBIJknLsCvoy+xr7Artutx132GHYdR9glGLLEilZlzB8Hj7+8U/RJj2KAy4s689bt997/4MPP7rz8d4nn97d/+zg3uevOIkZRKeQYMJee4AjHEToVAQCo9eUIRB6GJ15V26mny0Q4wGJXooVRech8KNgHkAgZNfFwbdTRHmAZdNume12u2UWHd+bUx6E5pRQxIAgLAIhSr6LQw/h9IHVMu2vLw4OrbaVP6besNeNQ2P9nFzcu/v3dEZgHKJIQAw4f2tbVJwngIkAYpTuTWOOKIBXwEdvYzHvnydBRGOBIpia96U2j7EpiJnNxJwFDEGBV7IBIAtkggkvAQNQyPnuVaM4yuh5a7YIKL9p8oV/0xBAFus8WebFTCsDE58BehnAZYUsASEPgbjUOvkq9KqdKMaILcJqZ0YpGRXnEjEY8KwGJ7Iwz2m2PvwlOVnrlyt6iSKeJjHDaXmgFBBjaC4H5k2OREyTfDLypbjijwSLUStr5n2PRoBdvUCzlsypdFRx5pgAUe3ywqw4EbqGJAxBNEumNE2mAi1FMm21UyneNz0szR4BbFZ1vkiTZJrVzPPMF9JaEZ+VxGeqOC6J4/WHEDwz54SZC7n+hHFTGk1pYQFEvDr6tBg9N0/V6Fcl8ZUqnpXEM1X04pIaa+qipC409bqkXqvqsiQuVXFVEleq+K4kvlPF17ti3+yK/VmJlfWXm2K1N52hufzOyV+h5AqmyZOXT39Ik0H+rN+FGJl21Qi9jfFo0u0Neqkq443emfRtZ6TrhaHnDG23zlA4hj3XGo23LA8VbwFtWd3xsKtGQbzV+wN3outbWGvYGzk1hi3txO2Me+vyIRQpVr/wuYNuRyuLX+QMHMc5Huh6YXA6rtt/WGMoHO5oNBq6OQqNGcVI8dKNsds9tvQoWgT1rW5nWKNvF8DqO44GS0sszsSxR3bOIhDAilMUL4vbHw7GapDY1t8Zuq62gKJU/r5rjzo1hi3r8eh4fJSTEAYiX60KKcrX7fWP+moUKYImPbmCGgvZftJk4Fg9jYWUWCaO62SFre5Eucne2ufJzVfudt+Zh7aZpqpZbjTVnO29jVnx4hozbnbX2nf56wfgRnh9phA2xcOacNgIA+tYYDM8rIWHu+B93e83xfs14X4jjF/H4jfD+7Xw/i54qvtpUzytCaeNMLSOhTbD01p4ugte6H7RFC9qwkUjjKhjEc3wohZe7IInup80xZOacNIIQ+pYSDM8qYUnO+AZupZHvuykIN+uhGmR8kwuT7O5DnECNJ0LIFAuE5xwTfbEJRIg071QMuX/6B4QFw7Z1HR5rdno6xuO5pkFHJI4EjkJxcnUB1JJi1PPLJD3FhNxEYT59UmZBAjlj+lmkvI8psb7fL49TCV+epFU71Y/TVJ9DJ3tHHMyStd8NLtEAWxmlxCCzSBan9MqPyE0C7uC8uh+475ZyhGSlzGGnsoPeb4O/0YuGvPDQC6a/DttZa1dRrDcGGVrT14MbfUaqDfOHrbtTtu2f+wcPn66viPeMb40vjIeGLbRMx4bT4wT49SAxi/Gb8bvxh/7v+7/tf/P/r831tu31mO+MCrP/n//A6Dyq00=</latexit>

✏1, ..., ✏K ⇠ Gumbel(0, 1)<latexit sha1_base64="dsDndegRp6mghq77Qu4vHvM4YZE=">AAAPeXicfZfNc5tGGMZx+pHUbWKnPfbC1JMZJ/W4EMv6OGQmAkmTQ5O4aRwnDR7PslphxsDu7C6yFIY/tMce+k/01BckI1hAXLzmefbRj3cXaV+XBb6QhvH3zr2vvv7m2/sPvtv9/oeHj/b2H//4QdCYY3KOaUD5RxcJEvgROZe+DMhHxgkK3YBcuDd2pl/MCRc+jd7LJSOXIfIif+ZjJOHW1X7iCBSygOgvdAfLIPHSq8TBLEgcieI0PXRcwsSR7rg0mIplCH8Sh/npU7BTRjiSlEcoJMmfdCZDtEgPD52AejW//queJz39bRP+9Gr/wDg28kuvD8z14EBbX2dXjx/+60wpjkMSSRwgIT6bBpOXCeLSxwFJd51YEIbwDfLI51jO+peJH7FYkgin+hPQZnGgS6pnhdCnPifwxEsYIMx9SNDxNeIISyjXbjVKkOwhxdF07jOxGoq5txpIBLW+TBb5WqSViYnHEbv28aJClqBQhEhe125m9areJHFA+Dys3swogVFxLgjHvshqcAaFecuy5RXv6dlav16yaxKJNIl5kJYngkA4JzOYmA8FkTFL8oeBPXUjXkgek6NsmN97MUL85h2ZHkFO5UYVZxZQJKu33DArTkRuMQ1DFE1hX6SwEchCJs7RcQriE90NwOxSxKdV57s0SZysZq6rvwNrRXxTEt+o4rgkjtcfAltTn1Guz2H9KRc6GHWwcB8TUZ19Xsye6edq9IeS+EEVL0rihSq6cUmNa+q8pM5r6m1JvVXVRUlcqOKyJC5V8UtJ/KKKH7fFftoW+5cSC/WHl2K560zJDL6y8i2U3OA0efX+9e9pMsiv9V6IiW5Wjdi9M55Mur1BL1Xl4E7vTPqmNarrhaFnDU27yVA4hj3bGI03LM8VbwFtGN3xsKtG4WCj9wf2pK5vYI1hb2Q1GDa0E7sz7q3LR0ikWL3CZw+6nVpZvCJnYFnW6aCuFwarY9v95w2GwmGPRqOhnaOwmMNPh+Jld8Zu99SoR7EiqG90O8MGfbMARt+yarCsxGJNLHNk5iySoEBxymKz2P3hYKwGyU39raFt1xZQlsrft81Rp8GwYT0dnY5PchLKUeSpVaFF+bq9/klfjaJF0KQHK1hjoZtPmgwso1djoSWWiWVbWWGrbyK8ZJ/Ny2T1lbt57/QDU09T1QwvmmrO3r07s+INGsxBu7vRvs3fPCFoha8/KcZt8bghHLfC4CYW3A6PG+HxNniv7vfa4r2GcK8Vxmti8drhvUZ4bxs8q/tZWzxrCGetMKyJhbXDs0Z4tg1e1v2yLV42hMtWGNnEItvhZSO83AZP637aFk8bwmkrDG1ioe3wtBGeboHn5BaOfNlJAXZXwmuRcCaH02yu4yBBNV1IJEkuQ7sharIrr4lEme6GwJT/U/eguHBknY+qQ/Nyp8PQD4BH9Ux9gWkcyZwk63I8BEpanHqmPvQtOhHSD/PuS3mIvA+7e0g4j6nxnphtDlOrRq3agk3S+hw23TrnbJSu+VjWRKFAXzVtuh+tz2mVnxCWhd1gOLqv3KulHBFoxjh5DR/ydh3+DBaNe6EPiwZ/naNstM0InePaCKNdaAxNtQ2sDy6eH5udY9P8o3Pw8vW6R3yg/az9oh1qptbTXmqvtDPtXMPaPzv3d/Z3Hj/6b0/fO9x7trLe21nP+UmrXHsn/wMw4cEH</latexit>

z = g⌧(✏,⇡) = Softmax((log⇡+ ✏)/⌧)

GUMBEL SOFTMAX

51

Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical Reparameterization with Gumbel-Softmax." (2016).

In this image, you’ll see what the gumbel softmax distribution looks like. Samples (bottom row) look a lot like samples from a categorical distribution, for lower temperatures. For higher temperatures, it looks more like a uniform distribution over classes. However, note that it is not a distribution: It is a sample! So where the categorical distribution samples ‘one-hot’ vectors, the gumbel-softmax samples “kinda one-hot vectors, but not really”.

This is similar for the expectation of the distribution. For low temperatures, the expectation looks very similar to the expectation of the categorical distribution. For higher temperatures, the expectation again looks rather more like a uniform distribution over classes.

If the temperature increases • Variance decreases :) • Bias increases :(

<latexit sha1_base64="cIBrXXvYBrNrBvY0SmZNGFAjNp8=">AAAPBnicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso0eljbL8tI1DgKKpmUhlEiQlGNX0HlfYNftG+w47LqvsQ+w7zFKsWWJlKxL/uHz8PFPf0o26VEccGFZ/z56/NHHn3z62ZPPd7748unuV8+ef33JScwguoAEE/bOAxzhIEIXIhAYvaMMgdDD6Mq7czP9ao4YD0h0LpYU3YTAj4JpAIGQQ+/HkOJkLECc3j7bsw6s/DL1wl4Ve8bqOr19/vS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIB3wEfXsZh2b5IgorFAEUzNF1KbxtgUxMyozEnAEBR4KQsAWSATTDgDDEAh2XeqURxFIER8fzIPKH8o+dx/KASQN36TLPLGpJWJic8AnQVwUSFLQMhDIGbaIF+GXnUQxRixeVgdzCglo+JcIAYDnvXgVDbmLc16zc/J6UqfLekMRTxNYobT8kQpIMbQVE7MS45ETJP8ZuQC3/GXgsVoPyvzsZcDwO7O0GRf5lQGqjhTTICoDnlh1pwI3UMShiCaJGOayucALUQy3j9IpfjC9LA0ewSwSdV5libJOOuZ55ln0loR35TEN6o4LInD1YcQPDGnhJlzuf6EcVMaTWlhAUS8OvuimD01L9Toy5J4qYpXJfFKFb24pMaaOi+pc029L6n3qrooiQtVXJbEpSp+KIkfVPHdtthftsW+V2Jl/+VLsdwZT9BUfn/kj1ByB9Pk9fnJj2nSy6/VsxAj064aobc2Ho3anV4nVWW81lujru0MdL0wdJy+7dYZCke/41qD4YblUPEW0JbVHvbbahTEG73bc0e6voG1+p2BU2PY0I7c1rCzah9CkWL1C5/ba7e0tvhFTs9xnOOerhcGp+W63cMaQ+FwB4NB381RaMwoRoqXro3t9rGlR9EiqGu1W/0afbMAVtdxNFhaYnFGjj2wcxaBAFaconhY3G6/N1SDxKb/Tt91tQUUpfZ3XXvQqjFsWI8Hx8OjnIQwEPlqV0jRvnane9RVo0gRNOrIFdRYyOaTRj3H6mgspMQyclwna2z1TZQv2bV9kzx85W7eO3PPNtNUNcsXTTVn797arHhxjRk3u2vt2/z1E3AjvH6nEDbFw5pw2AgD61hgMzyshYfb4H3d7zfF+zXhfiOMX8fiN8P7tfD+Nniq+2lTPK0Jp40wtI6FNsPTWni6DV7oftEUL2rCRSOMqGMRzfCiFl5sgye6nzTFk5pw0ghD6lhIMzyphSdb4Bm6l1u+bKcgn66EaZFyTy53s7kOcQI0nQsgUC4TnHBN9sQMCZDpXiiZ8n90D4gLhyw1HVG+1mUZYMmjeiYBhySORE6SHXJ8IJW02PVMAnluMREXQZgfhZSbAKH8MV3fpNyPqfE+n242U4mf3iZjIjfsQO5hs5NI8vMo1efQydY5p4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37g/uh6UcIHkYY+hEfsjbVfgPctGYHwZy0eTf8X5WbTOCxdooqx15MLTVY6BeXB0e2K0D2/6ptffqZHVGfGJ8a3xnfG/YRsd4Zbw2To0LAxqR8Zvxu/HH7q+7f+7+tfv3g/Xxo9Wcb4zKtfvP/3f7mLk=</latexit>⌧

BIASED ESTIMATORS

52

Jang, Eric, Shixiang Gu, and Ben Poole. "Categorical Reparameterization with Gumbel-Softmax." (2016).

<latexit sha1_base64="lyLY/GSrbEuf+FI7XttV1ZoxKIk=">AAAP0nicfZdPb9s2GMad7l+XrUu7HXcRFhRIhyywWsd/BgSoJdvoYWmztmnaRUZG0bSiWRIJknLsCjoMu+7j7bIPsO+xV7ItS5RkXczwefjop5eUQtrMc4VsNv/du/fJp599/sX9L/e/+vrBNwcPH337TtCQY3KJqUf5exsJ4rkBuZSu9Mh7xgnybY9c2TMz0a/mhAuXBm/lkpGxj5zAnboYSei6efiPRRnhSFIeIJ9EhotEfKZZPpK3th0N45uIHVk2YeJJfG0FyPbQTWTZ1JuIpQ8/kcXcONYsAImm8ZGFpRc5MMjCDDSJwjheDT/W1FFPnox/2pGYJ6gwHFkC+cwjCdbm5uuesXbz8LB50kwvrdzQ143Dxvq6uHn04D9rQnHok0BiDwlxrTeZHEeISxd7JN63QkEYwjPkkOtQTrvjyA1YKEmAY+0xaNPQ0yTVkgprE5cTqMMSGghzFxI0fIs4whLmYb8YJUhSdnE8mbtMrJpi7qwaEmpDxtEineS4MDByOGK3Ll4UyCLki6Rspc6kcMVOEnqEz/1iZ0IJjIpzQTh2RVKDCyjMK5asG/GWXqz12yW7JYGIo5B7cX4gCIRzMoWBaVMQGbIofRhYrDNxJnlIjpNm2nc2QHz2mkyOIafQUcSZehTJYpftJ8UJyB2mvo+CCSyQGNYeWcjIOj6JQXyswSrDM5siPik6X8dRtF5q2muwFsSXOfGlKg5z4nB9E1ij2pRybQ7zT7nQwKiBhbuYiOLoy2z0VLtUo9/lxHeqeJUTr1TRDnNqWFLnOXVeUu9y6p2qLnLiQhWXOXGpih9z4kdVfL8r9sOu2N+UWKg/vBTLfWtCpvAtTJdQNMNx9OLt+S9x1Euv9VoIiaYXjdjeGJ+N2p1eJ1Zlb6O3Rl3dGJT1zNAx+rpZZcgc/Y7ZHAy3LE8VbwbdbLaH/bYahb2t3u2Zo7K+hW32OwOjwrClHZmtYWddPkICxepkPrPXbpXK4mQ5PcMwTntlPTMYLdPsPq0wZA5zMBj0zRSFhRy+5IqXbYzt9mmzHMWyoG6z3epX6NsJaHYNowTLcizGyNAHesoiCfIUp8wWi9nt94ZqkNzW3+ibZmkCZa78XVMftCoMW9bTwenwWUpCOQoctSo0K1+7033WVaNoFjTqwAyWWOj2TqOe0eyUWGiOZWSYRlLY4psIL9m1Po5Wn9zte6cd6locq2Z40VRz8u5tzIrXqzB79e5K+y5/9QCvFr78pBjXxeOKcFwLg6tYcD08roTHu+Cdst+pi3cqwp1aGKeKxamHdyrhnV3wrOxndfGsIpzVwrAqFlYPzyrh2S54WfbLunhZES5rYWQVi6yHl5Xwchc8LftpXTytCKe1MLSKhdbD00p4ugOekzvY8iU7BVhdES9Fwp4cdrOpjr0IlXQhkSSpDEcMUZJteUskSnTbB6b0j7IHhZkjOQ+pOhyONjo0XQ94VM/EFZiGgUxJkoOVg0CJs13PxIVzi0aEdP30WKc8RHos2jwk7MfUeEdMt5up1fGtcCh8M4rLY9hk55iLQbzmY8khCnna6vSmucF6n1b4F8KSsBmGrfvKvZrKAYHDGCfncJNX6/AfYdK447swafBrHSetXUa02BihtQ8HQ109BpYbV09P9NaJrv/aOnx+vj4j3m983/ihcdTQG53G88aLxkXjsoH3ft77fc/d++PgzcHy4M+Dv1bWe3vrMd81CtfB3/8DpU/nVA==</latexit>

Bias = Ep(✏)[r⇡f(g⌧(✏,⇡))]-r⇡Ep⇡(z)[f(z)]

So clearly, there’s something with this temperature parameter. When it increases, the variance decreases, so that’s good! This is quite easy to see: Samples will look more like very similar uniform distributions.

However, when the temperature increases, the bias also increases! What is the bias? The bias is the difference between the true gradient and the expected gradient using the gumbel softmax. So far, we have only looked at methods that are unbiased, so that the true gradient and expected gradient are equal. However, that’s not the case for the gumbel softmax: In essence, you could say that its approximation is wrong, and gets even more wrong when the temperature increases.

Does that mean that we cannot use it? Actually, the gumbel-softmax works remarkably well. It requires a continuous, differentiable function, but it can optimize to good performance. This is because, although it is biased, it becomes unbiased when the temperature approaches zero. When balancing bias and variance properly through controlling the temperature (called bias-variance trade-off), we can do well!

Discrete distributions: Make crisp choices • Example: Graphs from text or images • Transform neural representations to symbolic representation

DISCRETE DISTRIBUTIONS

53

XU, Pengfei, et al. A Survey of Scene Graph: Generakon and Applicakon. EasyChair, 2020.

Kipf, Thomas, et al. "Neural Relakonal Inference for Interackng Systems." Internakonal Conference on Machine Learning. 2018.

hlps://medium.com/analykcs-vidhya/enkty-linking-a-primary-nlp-task-for-informakon-extrackon-22f9d4b90aa8

So why do we even care so much about discrete distributions? Discrete distributions allow us to make crisp choices within our neural network: We can, for example, choose to only do one computation path instead of all of them. Or we can have intermediate symbolic representations of what the neural network is thinking of. For instance, we can transform text into a graph using a neural network, and then reason on this graph using a GCN to answer questions about the graph. The same thing goes for images: Transform them first into a graph representation and then answer questions!

Intermediate symbolic representations also are much more interpretable than the very high-dimensional continuous representations of neural networks. If we can transform these representations between each other, this could help us understand the neural network.

Page 19: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

Neuro-Symbolic AI: Combine Deep Learning and Logic • Deep Learning: Continuous • Logic: Discrete • Gradient estimation allows integrating these fields!

NEURO-SYMBOLIC AI

54

The field that tries to combine symbolic reasoning with Deep Learning is called neuro-symbolic AI. A fundamental challenge of this field is that Deep Learning does continuous reasoning, while symbolic reasoning, like logic, reasons about discrete data. This mismatch makes it hard to integrate the two fields.

However, gradient estimation turns out to be a general and principled way to integrate the two fields!

QUESTION ANSWERING WITH QUERIES

55

Who wrote “The Hobbit”?

<latexit sha1_base64="hfskakxjJ4XE/BPcIPEUN43jUxo=">AAAPBHicfZdPb9s2GMbV7l+XrWm7HXcRFhQYhiCQEsd/DgVqyTZ6WNosS5pscRBQNC0LoUSCpBy7go77Artu32DHYdd9j32AfY9Rii1LpGRd8obPw8c/vZRs0qM44MKy/n30+KOPP/n0syef73zx5dPdZ89ffPWek5hBdAEJJuzKAxzhIEIXIhAYXVGGQOhhdOnduZl+OUeMByQ6F0uKbkLgR8E0gEDIoauxTEjGi/T2+Z51YOWXqRf2qtgzVtfp7Yun/40nBMYhigTEgPNr26LiJgFMBBCjdGccc0QBvAM+uo7FtHuTBBGNBYpgar6U2jTGpiBmxmROAoagwEtZAMgCmWDCGWAACkm+U43iKAIh4vuTeUD5Q8nn/kMhgLztm2SRtyWtTEx8BugsgIsKWQJCHgIx0wb5MvSqgyjGiM3D6mBGKRkV5wIxGPCsB6eyMe9o1ml+Tk5X+mxJZyjiaRIznJYnSgExhqZyYl5yJGKa5Dcjl/eOvxIsRvtZmY+9GgB2d4Ym+zKnMlDFmWICRHXIC7PmROgekjAE0SQZ0zQZC7QQyXj/IJXiS9PD0uwRwCZV51maJOOsZ55nnklrRXxbEt+q4rAkDlcfQvDEnBJmzuX6E8ZNaTSlhQUQ8ersi2L21LxQo9+XxPeqeFkSL1XRi0tqrKnzkjrX1PuSeq+qi5K4UMVlSVyq4oeS+EEVr7bF/rwt9hclVvZfvhTLnfEETeW3R/4IJXcwTd6cn/yQJr38Wj0LMTLtqhF6a+PRqN3pdVJVxmu9NerazkDXC0PH6dtunaFw9DuuNRhuWA4VbwFtWe1hv61GQbzRuz13pOsbWKvfGTg1hg3tyG0NO6v2IRQpVr/wub12S2uLX+T0HMc57ul6YXBarts9rDEUDncwGPTdHIXGjGKkeOna2G4fW3oULYK6VrvVr9E3C2B1HUeDpSUWZ+TYAztnEQhgxSmKh8Xt9ntDNUhs+u/0XVdbQFFqf9e1B60aw4b1eHA8PMpJCAORr3aFFO1rd7pHXTWKFEGjjlxBjYVsPmnUc6yOxkJKLCPHdbLGVt9E+ZJd2zfJw1fu5r0z92wzTVWzfNFUc/burc2KF9eYcbO71r7NXz8BN8LrdwphUzysCYeNMLCOBTbDw1p4uA3e1/1+U7xfE+43wvh1LH4zvF8L72+Dp7qfNsXTmnDaCEPrWGgzPK2Fp9vghe4XTfGiJlw0wog6FtEML2rhxTZ4ovtJUzypCSeNMKSOhTTDk1p4sgWeoXu55ct2CvLpSpgWKffkcjeb6xAnQNO5AALlsjxYcE32xAwJkOleKJnyf3QPiAuHLDUdUb7WZRlgyaN6JgGHJI5ETkLlEccHUkmLXc8kkOcWE3ERhPlBSLkJEMof0/VNyv2YGu/z6WYzlfjpbTImcsMO5B42O4kkP41SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF5eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoIGN34zfjT92f939c/ev3b8frI8freZ8bVSu3X/+BwQYl9I=</latexit>x

J.R.R. Tolkien wrote “The Hobbit"

<latexit sha1_base64="Pun0bnZ4nz1JCDx7Zzagcw0h6Yk=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxL/uHz8NFPf1o26VEccGFZ/z56/NHHn3z62ZPPd7748unus+cvvrrkJGYQXUCCCXvvAY5wEKELEQiM3lOGQOhhdOXN3Ey/miPGAxKdiyVFNyHwo2AaQCDk0NUY+jhZprfP96wDK79MvbBXxZ6xuk5vXzz9bzwhMA5RJCAGnF/bFhU3CWAigBilO+OYIwrgDPjoOhbT7k0SRDQWKIKp+VJq0xibgpgZkjkJGIICL2UBIAtkggnvAANQSPCdahRHEQgR35/MA8ofSj73HwoB5FPfJIu8K2llYuIzQO8CuKiQJSDkIRB32iBfhl51EMUYsXlYHcwoJaPiXCAGA5714FQ25h3NGs3PyelKv1vSOxTxNIkZTssTpYAYQ1M5MS85EjFN8oeRqzvjrwSL0X5W5mOvBoDNztBkX+ZUBqo4U0yAqA55YdacCN1DEoYgmiRjmiZjgRYiGe8fpFJ8aXpYmj0C2KTqPEuTZJz1zPPMM2mtiG9L4ltVHJbE4eomBE/MKWHmXK4/YdyURlNaWAARr86+KGZPzQs1+rIkXqriVUm8UkUvLqmxps5L6lxT70vqvaouSuJCFZclcamKH0riB1V8vy32522xvyixsv/ypVjujCdoKr888o9QMoNp8ub85Ic06eXX6rMQI9OuGqG3Nh6N2p1eJ1VlvNZbo67tDHS9MHScvu3WGQpHv+Nag+GG5VDxFtCW1R7222oUxBu923NHur6BtfqdgVNj2NCO3Naws2ofQpFi9Quf22u3tLb4RU7PcZzjnq4XBqflut3DGkPhcAeDQd/NUWjMKEaKl66N7faxpUfRIqhrtVv9Gn2zAFbXcTRYWmJxRo49sHMWgQBWnKL4sLjdfm+oBolN/52+62oLKErt77r2oFVj2LAeD46HRzkJYSDy1a6Qon3tTveoq0aRImjUkSuosZDNnUY9x+poLKTEMnJcJ2ts9U2UL9m1fZM8fOVu3jtzzzbTVDXLF001Z+/e2qx4cY0ZN7tr7dv89RNwI7z+pBA2xcOacNgIA+tYYDM8rIWH2+B93e83xfs14X4jjF/H4jfD+7Xw/jZ4qvtpUzytCaeNMLSOhTbD01p4ug1e6H7RFC9qwkUjjKhjEc3wohZebIMnup80xZOacNIIQ+pYSDM8qYUnW+AZupdbvmynkB0MmBYp9+RyN5vrECdA07kAAuUywQnXZE/cIQEy3QslU/6P7gFx4ZClpiPK17osAyx5VM8k4JDEkchJKE7GPpBKWux6JoE8t5iIiyDMz0HKQ4BQ/piuH1Lux9R4n083m6nET2+TMZEbdiD3sNlJJPlplOpz6GTrnNNBuuKj2SEKYDM7hBBsBtFqn1b5CaFZ2AzKrfuD+2EpB0gexhg6kTd5twr/Xi4a88NALpr8O97Pqm1GsFgbZbUjD4a2egzUi6vDA7t1YNs/tvZen6zOiE+Mb4xvje8M2+gYr403xqlxYUBjZvxm/G78sfvr7p+7f+3+/WB9/Gg152ujcu3+8z8popdl</latexit>y

<http://dbpedia.org/property/JRRTolkien, ?name, "JRR Tolkien”> ...

Execute query <latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

Encode <latexit sha1_base64="7eHQXERXRpCNg9XYo48iDs3MBF8=">AAAO/XicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso1iWNs0y0u3OAgomlaEUCJBUo5dQcPuu27fYMdh132WfYB9j1GyLUukZF3yD5+Hj3/6U7JJj+KAC8v698HDDz786ONPHn2689nnj3e/ePL0ywtOYgbROSSYsHce4AgHEToXgcDoHWUIhB5Gl96dm+mXM8R4QKIzsaDoOgR+FEwDCIQcent282TPOrDyy9QLe1XsGavr5Obp4//GEwLjEEUCYsD5lW1RcZ0AJgKIUbozjjmiAN4BH13FYtq9ToKIxgJFMDWfSW0aY1MQM4MxJwFDUOCFLABkgUww4S1gAAqJvFON4igCIeL7k1lA+bLkM39ZCCDv9zqZ5/1IKxMTnwF6G8B5hSwBIQ+BuNUG+SL0qoMoxojNwupgRikZFeccMRjwrAcnsjFvaNZifkZOVvrtgt6iiKdJzHBanigFxBiayol5yZGIaZLfjFzXO/5csBjtZ2U+9nwA2N0pmuzLnMpAFWeKCRDVIS/MmhOhe0jCEESTZEzTZCzQXCTj/YNUis9MD0uzRwCbVJ2naZKMs555nnkqrRXxdUl8rYrDkjhcfQjBE3NKmDmT608YN6XRlBYWQMSrs8+L2VPzXI2+KIkXqnhZEi9V0YtLaqyps5I609T7knqvqvOSOFfFRUlcqOL7kvheFd9ti/1pW+zPSqzsv3wpFjvjCZrKr438EUruYJq8PHv1Q5r08mv1LMTItKtG6K2NR6N2p9dJVRmv9daoazsDXS8MHadvu3WGwtHvuNZguGE5VLwFtGW1h/22GgXxRu/23JGub2Ctfmfg1Bg2tCO3Neys2odQpFj9wuf22i2tLX6R03Mc57in64XBablu97DGUDjcwWDQd3MUGjOKkeKla2O7fWzpUbQI6lrtVr9G3yyA1XUcDZaWWJyRYw/snEUggBWnKB4Wt9vvDdUgsem/03ddbQFFqf1d1x60agwb1uPB8fAoJyEMRL7aFVK0r93pHnXVKFIEjTpyBTUWsvmkUc+xOhoLKbGMHNfJGlt9E+VLdmVfJ8uv3M17Z+7ZZpqqZvmiqebs3VubFS+uMeNmd619m79+Am6E1+8UwqZ4WBMOG2FgHQtshoe18HAbvK/7/aZ4vybcb4Tx61j8Zni/Ft7fBk91P22KpzXhtBGG1rHQZnhaC0+3wQvdL5riRU24aIQRdSyiGV7Uwott8ET3k6Z4UhNOGmFIHQtphie18GQLPEP3csuX7RTk05UwLVLuyeVuNtchToCmcwEEymWCE67JnrhFAmS6F0qm/B/dA+LCIUtNR5SvdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgJSbAKH8MV3fpNyPqfE+n242U4mf3iRjIjfsQO5hs5NI8uMo1efQydY5J4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37kv3cikHSB7GGHolP+TNKvw7uWjMDwO5aPLveD+rthnBfG2U1Y48GNrqMVAvLg8P7NaBbb9t7b34/tflGfGR8bXxjfGtYRsd44Xx0jgxzg1oIOM343fjj91fdv/c/Wv376X14YPVufIro3Lt/vM/62OU9w==</latexit>

TEncode <latexit sha1_base64="hfskakxjJ4XE/BPcIPEUN43jUxo=">AAAPBHicfZdPb9s2GMbV7l+XrWm7HXcRFhQYhiCQEsd/DgVqyTZ6WNosS5pscRBQNC0LoUSCpBy7go77Artu32DHYdd9j32AfY9Rii1LpGRd8obPw8c/vZRs0qM44MKy/n30+KOPP/n0syef73zx5dPdZ89ffPWek5hBdAEJJuzKAxzhIEIXIhAYXVGGQOhhdOnduZl+OUeMByQ6F0uKbkLgR8E0gEDIoauxTEjGi/T2+Z51YOWXqRf2qtgzVtfp7Yun/40nBMYhigTEgPNr26LiJgFMBBCjdGccc0QBvAM+uo7FtHuTBBGNBYpgar6U2jTGpiBmxmROAoagwEtZAMgCmWDCGWAACkm+U43iKAIh4vuTeUD5Q8nn/kMhgLztm2SRtyWtTEx8BugsgIsKWQJCHgIx0wb5MvSqgyjGiM3D6mBGKRkV5wIxGPCsB6eyMe9o1ml+Tk5X+mxJZyjiaRIznJYnSgExhqZyYl5yJGKa5Dcjl/eOvxIsRvtZmY+9GgB2d4Ym+zKnMlDFmWICRHXIC7PmROgekjAE0SQZ0zQZC7QQyXj/IJXiS9PD0uwRwCZV51maJOOsZ55nnklrRXxbEt+q4rAkDlcfQvDEnBJmzuX6E8ZNaTSlhQUQ8ersi2L21LxQo9+XxPeqeFkSL1XRi0tqrKnzkjrX1PuSeq+qi5K4UMVlSVyq4oeS+EEVr7bF/rwt9hclVvZfvhTLnfEETeW3R/4IJXcwTd6cn/yQJr38Wj0LMTLtqhF6a+PRqN3pdVJVxmu9NerazkDXC0PH6dtunaFw9DuuNRhuWA4VbwFtWe1hv61GQbzRuz13pOsbWKvfGTg1hg3tyG0NO6v2IRQpVr/wub12S2uLX+T0HMc57ul6YXBarts9rDEUDncwGPTdHIXGjGKkeOna2G4fW3oULYK6VrvVr9E3C2B1HUeDpSUWZ+TYAztnEQhgxSmKh8Xt9ntDNUhs+u/0XVdbQFFqf9e1B60aw4b1eHA8PMpJCAORr3aFFO1rd7pHXTWKFEGjjlxBjYVsPmnUc6yOxkJKLCPHdbLGVt9E+ZJd2zfJw1fu5r0z92wzTVWzfNFUc/burc2KF9eYcbO71r7NXz8BN8LrdwphUzysCYeNMLCOBTbDw1p4uA3e1/1+U7xfE+43wvh1LH4zvF8L72+Dp7qfNsXTmnDaCEPrWGgzPK2Fp9vghe4XTfGiJlw0wog6FtEML2rhxTZ4ovtJUzypCSeNMKSOhTTDk1p4sgWeoXu55ct2CvLpSpgWKffkcjeb6xAnQNO5AALlsjxYcE32xAwJkOleKJnyf3QPiAuHLDUdUb7WZRlgyaN6JgGHJI5ETkLlEccHUkmLXc8kkOcWE3ERhPlBSLkJEMof0/VNyv2YGu/z6WYzlfjpbTImcsMO5B42O4kkP41SfQ6dbJ1zOkhXfDQ7RAFsZocQgs0gWu3TKj8hNAu7g3Lr/uB+WMoBkocxhk7kh7xbhX8vF435YSAXTf4d72fVNiNYrI2y2pEHQ1s9BurF5eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoIGN34zfjT92f939c/ev3b8frI8freZ8bVSu3X/+BwQYl9I=</latexit>x

<latexit sha1_base64="3qjB0dabcOfw8YJ0IzkHg/QotWc=">AAAPFnicfZfdbts2GIbV/XbZmrXbyYCdCAsKdEMQSK3jn4MCtWQbxbC2Wdc03aogoGhaEUKJHEk5djUNu4vdwE63O9jhsNOd7gJ2H6NkW5ZIyToJw/fl60cfJZufT3HIhWX9e+Ott9959733b36w9+FHt/Y/vn3nk5ecJAyiU0gwYa98wBEOY3QqQoHRK8oQiHyMzvwrN9fP5ojxkMQvxJKi8wgEcTgLIRBy6uL2Zz9eePQyvOdxEFGMfvJkZOotsi8vbh9YR1ZxmfrAXg8OjPV1cnHn1n/elMAkQrGAGHD+2raoOE8BEyHEKNvzEo4ogFcgQK8TMeufp2FME4FimJl3pTZLsCmImVOa05AhKPBSDgBkoUww4SVgAAp5L3v1KI5iECF+OJ2HlK+GfB6sBgLIQpyni6JQWW1hGjAg7xwuamQpiHgExKU2yZeRX59ECUZsHtUnc0rJqDgXiMGQ5zU4kYV5RvPa8xfkZK1fLuklinmWJgxn1YVSQIyhmVxYDDkSCU2Lm5EbfsUfCpagw3xYzD0cAXb1HE0PZU5too4zwwSI+pQf5cWJ0TUkUQTiaerRLPUEWojUOzzKpHjX9LE0+wSwad35PEtTL6+Z75vPpbUmPq2IT1VxXBHH6w8heGrOCDPncv8J46Y0mtLCQoh4ffVpuXpmnqrRLyviS1U8q4hnqugnFTXR1HlFnWvqdUW9VtVFRVyo4rIiLlXxTUV8o4qvdsV+vyv2ByVW1l++FMs9b4pm8vukeITSK5ilj188+SZLB8W1fhYSZNp1I/Q3xgeTbm/Qy1QZb/TOpG87I10vDT1naLtNhtIx7LnWaLxlua94S2jL6o6HXTUK4q3eH7gTXd/CWsPeyGkwbGknbmfcW5cPoVixBqXPHXQ7WlmCMmfgOM7xQNdLg9Nx3f79BkPpcEej0dAtUGjC5Be64qUbY7d7bOlRtAzqW93OsEHfboDVdxwNllZYnIljj+yCRSCAFacoHxa3PxyM1SCxrb8zdF1tA0Wl/H3XHnUaDFvW49Hx+EFBQhiIA7UqpCxft9d/0FejSBk06ckd1FjI9pMmA8fqaSykwjJxXCcvbP1NlC/Za/s8XX3lbt8788A2s0w1yxdNNefv3saseHGDGbe7G+27/M0LcCu8fqcQtsXDhnDYCgObWGA7PGyEh7vgA90ftMUHDeFBK0zQxBK0wweN8MEueKr7aVs8bQinrTC0iYW2w9NGeLoLXuh+0RYvGsJFK4xoYhHt8KIRXuyCJ7qftMWThnDSCkOaWEg7PGmEJzvgGbqWR778pCCfrpRpkfJMLk+zhQ5xCjSdCyBQIcvOgmuyLy6RALnuR5Kp+Ef3gKR0yKGmI8o3uhyGWPKonmnIIUliUZBQ2eMEQCpZeeqZhrJvMREXYVS0RspNFN3R5ibleUyND/hse5hKg+wi9Yg8sAN5hs07kfS7SaavodOda05G2ZqP5k0UwGbehBBshvH6nFb7CaF52BWUR/eVe7WVIySbMYaeyA95tg7/Sm4aC6JQbpr86x3mo11GsNgY5WhPNoa22gbqg7P7R3bnyLa/7Rw8+vqXVY940/jc+MK4Z9hGz3hkPDZOjFMDGj8bvxm/G3/s/7r/5/5f+3+vrG/dWPeVnxq1a/+f/wG0DZ84</latexit>

q�(z|x) SELECT ?name WHERE { ?person a dbp:author ?person dbp:name ?name ?person db:wrote ?book ?book dbp:name “The Hobbit”}

<latexit sha1_base64="ytaGjQ3HxJR6GVJlezzzjEx+HxU=">AAAPA3icfZdNb9s2HMbV7q3L1rTdjrsICwoMQxBIieOXQ4Faso0eljbL8tItDgKKphXBlEiQlGNX0G1fYNftG+w47LoPsg+w7zFKsWWJlKxLGD4PH/30pyiTHsUBF5b176PHH338yaefPfl854svn+4+e/7iq0tOYgbRBSSYsPce4AgHEboQgcDoPWUIhB5GV97MzfSrOWI8ING5WFJ0EwI/CqYBBEJ2XY05CClGt8/3rAMrv0y9Ya8ae8bqOr198fS/8YTAOESRgBhwfm1bVNwkgIkAYpTujGOOKIAz4KPrWEy7N0kQ0VigCKbmS6lNY2wKYmZI5iRgCAq8lA0AWSATTHgHGIBCgu9UoziKQIj4/mQeUP7Q5HP/oSGAfOqbZJFXJa0MTHwG6F0AFxWyBIQ8BOJO6+TL0Kt2ohgjNg+rnRmlZFScC8RgwLManMrCvKNZofk5OV3pd0t6hyKeJjHDaXmgFBBjaCoH5k2OREyT/GHk7M74K8FitJ81875XA8BmZ2iyL3MqHVWcKSZAVLu8MCtOhO4hCUMQTZIxTZOxQAuRjPcPUim+ND0szR4BbFJ1nqVJMs5q5nnmmbRWxLcl8a0qDkvicHUTgifmlDBzLuefMG5KoyktLICIV0dfFKOn5oUafVkSL1XxqiReqaIXl9RYU+clda6p9yX1XlUXJXGhisuSuFTFDyXxgyq+3xb787bYX5RYWX+5KJY74wmayo9H/golM5gmb85PfkiTXn6t3oUYmXbVCL218WjU7vQ6qSrjtd4adW1noOuFoeP0bbfOUDj6HdcaDDcsh4q3gLas9rDfVqMg3ujdnjvS9Q2s1e8MnBrDhnbktoadVfkQihSrX/jcXrullcUvcnqO4xz3dL0wOC3X7R7WGAqHOxgM+m6OQmMmP+OKl66N7faxpUfRIqhrtVv9Gn0zAVbXcTRYWmJxRo49sHMWgQBWnKJ4WdxuvzdUg8Sm/k7fdbUJFKXyd1170KoxbFiPB8fDo5yEMBD5alVIUb52p3vUVaNIETTqyBnUWMjmTqOeY3U0FlJiGTmukxW2uhLlIru2b5KHT+5m3Zl7tpmmqlkuNNWcrb21WfHiGjNudtfat/nrB+BGeP1JIWyKhzXhsBEG1rHAZnhYCw+3wfu632+K92vC/UYYv47Fb4b3a+H9bfBU99OmeFoTThthaB0LbYantfB0G7zQ/aIpXtSEi0YYUccimuFFLbzYBk90P2mKJzXhpBGG1LGQZnhSC0+2wDN0L7d82U5Bvl0J0yLlnlzuZnMd4gRoOhdAoFwmOOGa7Ik7JECme6Fkyv/RPSAuHLKp6YjytS6bAZY8qmcScEjiSOQkFCdjH0glLXY9k0CeW0zERRDm5yDlIfIz0foh5X5Mjff5dLOZSvz0NhkTuWEHcg+bnUSSn0apPoZOto45HaQrPpodogA2s0MIwWYQrfZplZ8QmoXNoNy6P7gfpnKA5GGMoRN5k3er8O/lpDE/DOSkyb/j/ay1zQgWa6Ns7ciDoa0eA/XG1eGB3Tqw7R9be69PVmfEJ8Y3xrfGd4ZtdIzXxhvj1LgwoDEzfjN+N/7Y/XX3z92/dv9+sD5+tBrztVG5dv/5Hw1/l0A=</latexit>z

Executing queries is non-differentiable… Continuous relaxation isn’t possible!

<latexit sha1_base64="7eHQXERXRpCNg9XYo48iDs3MBF8=">AAAO/XicfZdNb9s2HMbV7q3L1qzdjrsICwoMQxBIieOXQ4Faso1iWNs0y0u3OAgomlaEUCJBUo5dQcPuu27fYMdh132WfYB9j1GyLUukZF3yD5+Hj3/6U7JJj+KAC8v698HDDz786ONPHn2689nnj3e/ePL0ywtOYgbROSSYsHce4AgHEToXgcDoHWUIhB5Gl96dm+mXM8R4QKIzsaDoOgR+FEwDCIQcent282TPOrDyy9QLe1XsGavr5Obp4//GEwLjEEUCYsD5lW1RcZ0AJgKIUbozjjmiAN4BH13FYtq9ToKIxgJFMDWfSW0aY1MQM4MxJwFDUOCFLABkgUww4S1gAAqJvFON4igCIeL7k1lA+bLkM39ZCCDv9zqZ5/1IKxMTnwF6G8B5hSwBIQ+BuNUG+SL0qoMoxojNwupgRikZFeccMRjwrAcnsjFvaNZifkZOVvrtgt6iiKdJzHBanigFxBiayol5yZGIaZLfjFzXO/5csBjtZ2U+9nwA2N0pmuzLnMpAFWeKCRDVIS/MmhOhe0jCEESTZEzTZCzQXCTj/YNUis9MD0uzRwCbVJ2naZKMs555nnkqrRXxdUl8rYrDkjhcfQjBE3NKmDmT608YN6XRlBYWQMSrs8+L2VPzXI2+KIkXqnhZEi9V0YtLaqyps5I609T7knqvqvOSOFfFRUlcqOL7kvheFd9ti/1pW+zPSqzsv3wpFjvjCZrKr438EUruYJq8PHv1Q5r08mv1LMTItKtG6K2NR6N2p9dJVRmv9daoazsDXS8MHadvu3WGwtHvuNZguGE5VLwFtGW1h/22GgXxRu/23JGub2Ctfmfg1Bg2tCO3Neys2odQpFj9wuf22i2tLX6R03Mc57in64XBablu97DGUDjcwWDQd3MUGjOKkeKla2O7fWzpUbQI6lrtVr9G3yyA1XUcDZaWWJyRYw/snEUggBWnKB4Wt9vvDdUgsem/03ddbQFFqf1d1x60agwb1uPB8fAoJyEMRL7aFVK0r93pHnXVKFIEjTpyBTUWsvmkUc+xOhoLKbGMHNfJGlt9E+VLdmVfJ8uv3M17Z+7ZZpqqZvmiqebs3VubFS+uMeNmd619m79+Am6E1+8UwqZ4WBMOG2FgHQtshoe18HAbvK/7/aZ4vybcb4Tx61j8Zni/Ft7fBk91P22KpzXhtBGG1rHQZnhaC0+3wQvdL5riRU24aIQRdSyiGV7Uwott8ET3k6Z4UhNOGmFIHQtphie18GQLPEP3csuX7RTk05UwLVLuyeVuNtchToCmcwEEymWCE67JnrhFAmS6F0qm/B/dA+LCIUtNR5SvdVkGWPKonknAIYkjkZNQnIx9IJW02PVMAnluMREXQZifgJSbAKH8MV3fpNyPqfE+n242U4mf3iRjIjfsQO5hs5NI8uMo1efQydY5J4N0xUezQxTAZnYIIdgMotU+rfITQrOwOyi37kv3cikHSB7GGHolP+TNKvw7uWjMDwO5aPLveD+rthnBfG2U1Y48GNrqMVAvLg8P7NaBbb9t7b34/tflGfGR8bXxjfGtYRsd44Xx0jgxzg1oIOM343fjj91fdv/c/Wv376X14YPVufIro3Lt/vM/62OU9w==</latexit>

T

<latexit sha1_base64="vK6hdGoGjJSy1TXBl+qkk3b4b1M=">AAAPHXicfZfdbuNEGIa9y99S2NKFQw6wqFZaUFXZbZqfg5U2dhKtELtbSv+grqrxZOJaHXtGM+M0WWOJE26DG+AU7oBDxCniArgPxk7i2GM7Pul03ndeP/4mTuZzKfa5MIx/Hzx859333v/g0YdbH338ePuTnSefnnMSMYjOIMGEXbqAI+yH6Ez4AqNLyhAIXIwu3Ds71S+miHGfhKdiTtF1ALzQn/gQCDl1s/MFvXFccYsEeOZAD8fz5CfdkbGxM0v29NOvbnZ2jX0ju/TqwFwOdrXldXzz5PF/zpjAKEChgBhwfmUaVFzHgAkfYpRsORFHFMA74KGrSEy617Ef0kigECb6U6lNIqwLoqew+thnCAo8lwMAmS8TdHgLGIBCPtJWOYqjEASI742nPuWLIZ96i4EAsh7X8SyrV1JaGHsM0FsfzkpkMQh4AMRtZZLPA7c8iSKM2DQoT6aUklFxzhCDPk9rcCwL84amW8BPyfFSv53TWxTyJI4YTooLpYAYQxO5MBtyJCIaZw8j9/2OPxcsQnvpMJt7PgDs7gSN92ROaaKMM8EEiPKUG6TFCdE9JEEAwnHs0CR2BJqJ2NnbT6T4VHexNLsEsHHZeZLEsZPWzHX1E2ktia8L4mtVHBbE4fImBI/1CWH6VO4/YVyXRl1amA8RL68+y1dP9DM1+rwgnqviRUG8UEU3KqhRRZ0W1GlFvS+o96o6K4gzVZwXxLkqvi2Ib1XxclPsD5tif1RiZf3lSzHfcsZoIr9Wso9QfAeT+OXpq2+TuJddy89ChHSzbITuyng4and6nUSV8UpvjbqmNajquaFj9U27zpA7+h3bGAzXLAeKN4c2jPaw31ajIF7r3Z49quprWKPfGVg1hjXtyG4NO8vyIRQqVi/32b12q1IWL8/pWZZ11KvqucFq2Xb3oMaQO+zBYNC3MxQaMYqR4qUrY7t9ZFSjaB7UNdqtfo2+3gCja1kVWFpgsUaWOTAzFoEAVpwi/7DY3X5vqAaJdf2tvm1XNlAUyt+1zUGrxrBmPRocDQ8zEsJA6KlVIXn52p3uYVeNInnQqCN3sMJC1nca9SyjU2EhBZaRZVtpYctvonzJrszrePGVu37v9F1TTxLVLF801Zy+eyuz4sU1ZtzsrrVv8tcvwI3w1SeFsCke1oTDRhhYxwKb4WEtPNwE71X9XlO8VxPuNcJ4dSxeM7xXC+9tgqdVP22KpzXhtBGG1rHQZnhaC083wYuqXzTFi5pw0Qgj6lhEM7yohReb4EnVT5riSU04aYQhdSykGZ7UwpMN8AzdyyNfelJI2wRWiZRncnmazXSIY1DRuQACZbJsLnhFXvQgqe4Gkin7p+oBUe6Qw4qOKF/pcuhjyaN6xj6HJApFRkJlm+MBqST5qWfsy75FR1z4QdYhKQ8BAvljunpIeR5T4z0+WR+mYi+5iR0iD+xAnmHTTiT+fpRU19DxxjXHg2TJR9MmCmA9bUII1v1weU4r/YTQNOwOyqP7wr3YygGSzRhDr+RN3izDv5abxrzAl5sm/zp76WiTEcxWRjnako2hqbaB1cHFwb7Z2jfN71q7L775edEjPtI+177Unmmm1tFeaC+1Y+1Mg9ov2m/a79of279u/7n91/bfC+vDB8u+8jOtdG3/8z8br6GZ</latexit>

p✓(y|x, T)

As an example, we present a neuro-symbolic approach to question-answering. Assume we have a dataset of question-answer pairs written in natural language. Given a question x, we encode it to a distribution over queries q_\phi(z|x). From this distribution, we sample a query that represents the question.

Next, we execute the query on the knowledge base to find the extracted data. Note that executing this query is not a differentiable operation!

Finally, we answer the question both by using the question and the retrieved data.

This model is very hard to train. This is because of the non-differentiability of executing the query on the dataset. However, we need to find the parameters \phi so we can learn how to extract queries from an answer. We can use gradient estimation here, although this is hard: We cannot use continuous relaxations either because executing queries is non-differentiable.

Storchastic: Define computation graph with sampling steps. Compute gradient estimators automatically!

• PyTorch library with easy API • Many low-variance estimators implemented • Focus on discrete distributions

STORCHASTIC

56https://github.com/HEmile/storchastic

We have been working on a PyTorch library for gradient estimation. With this library, you can define a computation graph with sampling steps (~), and it computes correct gradient estimators automatically. This allows people to develop stochastic deep learning models with a simple API. We have implemented state of the art gradient estimators with low variance, that can be plugged and played with. In particular, we have focused on gradient estimators for discrete distributions to allow people to develop neuro-symbolic models as discussed on the previous slide.

Page 20: Lecture 9: Reinforcement Learning Emile van Krieken · 2021. 1. 25. · Emile van Krieken Deep Learning 2020 Lecture 9: Reinforcement Learning dlvu.github.io Reinforcement learning

THESIS PROJECTS

57https://github.com/HEmile/storchastic

Master thesis projects with Storchastic • Designing new gradient estimators • Knowledge Graph walking • Question answering with querying • Model-based RL with discrete latent space • Logical reasoning as a stochastic hidden layer • Neural network pruning • … (your idea here)

Contact [email protected] if interested

RECAP

58

Score function: general, but high variance Reparameterization: move sampling out of computation graph

• Uses pathwise derivative with low variance • Requires appropriate continuous distribution • Requires differentiable functions

Gumbel softmax: Continuous approximation for discrete distributions But biased!

Next lecture: Variance reduction for policy gradients

[email protected] https://github.com/HEmile/storchastic

THANK YOU!

59https://github.com/HEmile/storchastic