Deep Reinforcement Learning at Scale Timothy Lillicrap Research Scientist, DeepMind & UCL Deep Learning at Supercomputer Scale | NIPS Workshop

Page 1

Deep Reinforcement Learning at Scale

Timothy Lillicrap

Research Scientist, DeepMind & UCL

Deep Learning at Supercomputer Scale | NIPS Workshop

Page 2

What is Reinforcement Learning?

Supervised Learning: fixed dataset

Reinforcement Learning: data depends on actions taken in environment

Page 3

Formalizing the Agent-Environment Loop

Environment

Actions

Observations

Rewards

Agent

Neural Network(s)
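
The loop on this slide can be sketched in a few lines of Python. This is only an illustration: `CoinFlipEnv`, `RandomAgent`, and `run_episode` are hypothetical names that do not come from the talk, and the random agent stands in for the neural network(s).

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses the next coin flip."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # placeholder initial observation

    def step(self, action):
        flip = self.rng.randint(0, 1)
        reward = 1.0 if action == flip else 0.0
        self.t += 1
        done = self.t >= self.episode_length
        return flip, reward, done  # observation, reward, episode-finished flag


class RandomAgent:
    """Stand-in for the neural network(s): maps an observation to an action."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randint(0, 1)


def run_episode(env, agent):
    """Run one episode: the agent emits actions, the environment answers
    with observations and rewards. The data seen depends on the actions."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps
```

Calling `run_episode(CoinFlipEnv(), RandomAgent())` returns the episode's summed reward and the number of steps taken.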

Page 4

Asynchronous Advantage Actor-Critic (A3C)

Mnih et al., ICML 2016

Page 5

A Single Trial (with Advantage Actor-Critic)

Mnih et al., ICML 2016

Time

Page 6

Combating Variance: Advantage Actor-Critic

Mnih et al., ICML 2016
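
A minimal sketch of the variance-reduction idea named on this slide, assuming the usual form: a bootstrapped n-step return minus a learned value baseline. The function name `n_step_advantages` and its arguments are hypothetical, not taken from the talk.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Compute discounted returns and advantages for one rollout segment.

    rewards[t] and values[t] = V(s_t) cover steps t = 0..T-1.
    bootstrap_value is V(s_T), or 0.0 if the segment ended in a terminal state.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    # Walk backwards so each return reuses the one after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Subtracting the baseline V(s_t) leaves the expected policy gradient
    # unchanged while reducing its variance.
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages
```

In an actor-critic update, the advantages would weight the log-probability gradients of the actions taken, and the returns would serve as regression targets for the value estimate.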

Page 7

Scaling Reinforcement Learning (A3C)

Actor / Learner

Parameter Server

Gradients

Parameters

Mnih et al., ICML 2016

Page 8

Scaling Reinforcement Learning

Replay Buffer

Learner(s)

Actors

Parameters

Experience + Initial Priorities

Updated Priorities

Experience

Horgan et al., 2017 & Schaul et al., 2015
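
The data flow on this slide (actors add experience with initial priorities, learners sample and write back updated priorities) can be sketched as below. This is a single-process illustration under stated assumptions, not the distributed system from the cited papers: the class `PrioritizedReplay`, its method names, the FIFO eviction, and the `eps` constant are all hypothetical, and importance-sampling corrections are omitted.

```python
import random


class PrioritizedReplay:
    """Toy replay buffer that samples in proportion to priority ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps  # keeps every sampling weight strictly positive
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)

    def add(self, transition, initial_priority):
        """Actor side: store experience together with its initial priority."""
        if len(self.items) >= self.capacity:
            # Assumed FIFO eviction; note this shifts the indices of older items.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(initial_priority)

    def sample(self, batch_size):
        """Learner side: draw indices with probability proportional to p ** alpha."""
        weights = [(p + self.eps) ** self.alpha for p in self.priorities]
        indices = self.rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update_priorities(self, indices, new_priorities):
        """Learner side: write back priorities, e.g. new absolute TD errors."""
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = p
```

Sampling is with replacement, so a high-priority transition can appear several times in one batch.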

Page 9

Off-policy Actor-Critic for Continuous Actions

Lillicrap et al., ICLR 2016

Page 10

Hoffman, Barth-Maron et al., 2017

Distributional Distributed DDPG (D4PG)

Page 11

Hoffman, Barth-Maron et al., 2017

Page 12

Hoffman, Barth-Maron et al., 2017

Page 13

Hoffman, Barth-Maron et al., 2017

Page 14

Hoffman, Barth-Maron et al., 2017

Page 15

Hoffman, Barth-Maron et al., 2017

Page 16

Hoffman, Barth-Maron et al., 2017

Page 17

Hoffman, Barth-Maron et al., 2017

Distributional Distributed DDPG (D4PG)

Page 20

Silver, Huang et al., Nature, 2016

Playing Go with Deep Networks and Planning

Use environment model in order to plan!

Page 21

Silver, Huang et al., Nature, 2016

Training Policy and Value Networks

Page 22

Silver, Huang et al., Nature, 2016

Planning with an Environment Model & MCTS
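
One ingredient of this style of planning is the rule that picks which move to explore next inside the tree. The sketch below assumes a PUCT-style rule, in which the search favours moves with a high mean value, a high network prior, and few visits so far. The function name `select_action`, the `stats` layout, and the constant `c_puct=1.0` are illustrative choices, not values from the talk.

```python
import math


def select_action(stats, c_puct=1.0):
    """Pick the child action maximising Q + U at one tree node.

    stats maps each action to a tuple (prior, visit_count, mean_value).
    The exploration bonus U grows with the policy prior and shrinks as
    the action accumulates visits.
    """
    total_visits = sum(visits for _, visits, _ in stats.values())

    def score(action):
        prior, visits, mean_value = stats[action]
        bonus = c_puct * prior * math.sqrt(total_visits) / (1 + visits)
        return mean_value + bonus

    return max(stats, key=score)
```

A full search would repeat this selection down the tree, expand a leaf using the policy network, evaluate it, and back the value up along the visited path.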

Page 23

Silver, Huang et al., Nature, 2016

Planning with an Environment Model

Page 24

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 25

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge


Page 26

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 27

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 28

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 29

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 30

Silver, Schrittwieser, Simonyan, et al. Nature, 2017

Playing Go Without Human Knowledge

Page 31

Questions?