Asynchronous Methods for Deep Reinforcement Learning Paper by Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu Presented by: Pihel Saatmann


Page 1

Page 2

Reinforcement learning

• State – a "snapshot" of the environment
• Action – leads to a new state, sometimes a reward
• Reward – time-delayed, sparse
• Policy – rules for choosing an action

Page 3

So far

• It was thought that online RL algorithms with deep neural networks are unstable.
• Problem: correlated and non-stationary input data.
• These problems can be countered by storing data in an experience replay memory.
• This uses more memory and computational power.
• Deep RL methods require specialized hardware (GPUs) or massively distributed architectures.

Page 4

Q-learning

• At each time step t, the agent receives a state s_t and selects an action a_t according to its policy π. The agent then receives the next state s_{t+1} and a scalar reward r_t.
• The goal is to maximize the expected return from each state s_t.
• The Q function estimates the value of an action.
• Each time the agent takes an action, the Q value is updated.
• Off-policy method – updating the Q function does not depend on the policy being followed.
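The update described above can be sketched as a tabular 1-step Q-learning step. This is a minimal illustration, not code from the paper; the toy states, actions, learning rate alpha and discount gamma are assumptions:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    # Move Q(s, a) toward the bootstrapped 1-step target
    # r + gamma * max_a' Q(s', a').
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# Toy example: states and actions are integers, all Q values start at 0.
Q = defaultdict(float)
actions = [0, 1]
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
```

Since all Q values start at 0, the first update moves Q(0, 1) by alpha times the reward.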

Page 5

Asynchronous RL framework

• Instead of using experience replay, multiple agents are executed asynchronously in parallel on multiple instances of the environment.
• Parallel actor-learners have a stabilizing effect on training.
• Runs on a single machine with a standard multi-core CPU.

Page 6

Asynchronous RL framework II

• Asynchronous variants of four standard RL algorithms:
  • 1-step Q-learning
  • n-step Q-learning
  • 1-step Sarsa
  • Advantage actor-critic (A3C)

Page 7

1-step Q-learning

• A neural network is used to approximate the function Q(s, a; θ).
• The parameters (weights) θ are learned by iteratively minimizing a sequence of loss functions, where the i-th loss function is defined as:
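The equation image on this slide did not survive extraction; in the paper it is the standard Q-learning squared error, where θ⁻ denotes the parameters of the slowly changing target network:

```latex
L_i(\theta_i) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta_i)\right)^2\right]
```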

Page 8

Async 1-step Q-learning

• Each thread has its own copy of the environment.
• At each step it computes a gradient of the Q-learning loss.
• Gradients are accumulated over multiple time steps before being applied.
• A shared and slowly changing target network is used.
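A minimal single-thread sketch of the accumulate-then-apply pattern. The scalar linear Q model, the toy transitions and the constants are assumptions for illustration only; the target network is simply frozen here to stand in for "slowly changing":

```python
shared_theta = 0.0        # shared parameter (a single scalar weight, Q(s) = theta * s)
target_theta = 0.0        # slowly changing target-network copy (frozen in this sketch)
alpha, gamma = 0.5, 0.9
I_ASYNC = 2               # apply accumulated gradients every I_ASYNC steps

# Toy (s, r, s_next) transitions with a single action.
transitions = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0), (1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]

grad_acc = 0.0
for t, (s, r, s_next) in enumerate(transitions, 1):
    # TD target uses the fixed target network, as on the slide.
    target = r + gamma * target_theta * s_next
    td_error = target - shared_theta * s
    grad_acc += td_error * s          # accumulate the loss gradient locally
    if t % I_ASYNC == 0:              # ...and apply it to the shared model in one step
        shared_theta += alpha * grad_acc
        grad_acc = 0.0
```

Accumulating before applying reduces how often threads write to the shared parameters, which is part of what makes the lock-free shared updates cheap.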

Page 9

Asynchronous 1-step Sarsa

• Same as 1-step Q-learning, but uses a different target value:
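The target-value formula on this slide was lost in extraction; from the paper, Sarsa bootstraps from the action a′ actually taken in the next state (on-policy), rather than from the maximum:

```latex
r + \gamma \, Q(s', a'; \theta^{-})
```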

Page 10

Asynchronous n-step Q-learning

• Potentially a faster way to propagate rewards.
• Uses a 'forward view' – selects actions using its policy for up to n steps into the future.
• Receives up to t_max rewards since the last update.
• Total accumulated return:
• The value function is updated after every t_max actions or after a terminal state.
• Each update uses the longest possible n-step return.
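The return formula on this slide was lost in extraction; from the paper, the n-step target sums the discounted rewards of the segment and bootstraps from the last state with the target network:

```latex
R_t = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} \max_{a} Q(s_{t+n}, a; \theta^{-})
```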

Page 11
Page 12

Asynchronous advantage actor-critic

• On-policy method – maintains a policy and an estimated value function.
• Uses a 'forward view'.
• Receives up to t_max rewards since the last update.
• The policy and value functions are updated after every t_max actions or after a terminal state.
• Each update uses the longest possible n-step return.
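The backward pass over a t_max-step segment can be sketched as follows. This is an illustrative helper, not code from the paper; the function name, reward values, discount and bootstrap value are all assumptions:

```python
def n_step_targets(rewards, bootstrap_value, gamma=0.99):
    # Work backwards from the value estimate of the state after the last step
    # (0 if the segment ended in a terminal state), so every step gets the
    # longest possible n-step return.
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    return list(reversed(returns))

# Toy 3-step segment ending in a terminal state.
returns = n_step_targets([1.0, 0.0, 1.0], bootstrap_value=0.0, gamma=0.5)
```

In A3C, the advantage for step i is `returns[i] - V(s_i)`, which scales the policy-gradient term, while the value network regresses toward `returns[i]`.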

Page 13
Page 14

Performance evaluation

• Four different platforms:
  • Atari 2600 – various games
  • TORCS – 3D car racing simulator
  • MuJoCo – physics simulator for continuous motor control (A3C only)
  • Labyrinth – finding rewards in randomly generated 3D mazes (A3C only)

Page 15

Atari 2600 games

• All four methods can successfully train neural network controllers.
• The asynchronous methods are mostly faster than DQN (Deep Q-Network).
• Advantage actor-critic performed best.

Page 16

Async A3C on 57 Atari games

Page 17

TORCS Car Racing Simulator

• Only the A3C algorithm was evaluated.
• The agent had to drive a race car using only raw pixels as input.
• During training, the agent was rewarded for maintaining high velocity along the center of the racetrack.

https://youtu.be/0xo1Ldx3L5Q

Page 18
Page 19

MuJoCo Physics Simulator

• Only the A3C algorithm was evaluated.
• Rigid-body physics with contact dynamics.
• Continuous actions.
• In all problems A3C found good solutions in less than 24 hours of training (typically a few hours).

https://youtu.be/0xo1Ldx3L5Q

Page 20

Labyrinth

• The agent was placed in a random maze and had 60 seconds to collect points.
  • Apples – 1 point
  • Portals – 10 points; respawned the apples and the agent in random locations
• Visual input only.
• The agent learned a reasonably good general strategy for exploring random mazes.

https://youtu.be/nMR5mjCFZCw

Page 21

Scalability

• The framework scales well with the number of parallel workers.
• It even shows superlinear speedups for some methods.

Page 22
Page 23
Page 24

Robustness and stability

• Models were trained on five games using 50 different learning rates and random initializations.
• Every game and algorithm combination had a range of learning rates for which all random initializations achieved good scores.
• Stability is indicated by virtually no scores of 0 in regions with good learning rates.

Page 25

To summarize

• Asynchronous methods for four standard reinforcement learning algorithms (1-step Q, n-step Q, 1-step Sarsa, A3C).
• Able to train neural network controllers on a variety of domains in a stable manner.
• Using parallel actor-learners to update a shared model stabilized the learning process (an alternative to experience replay).
• On Atari games, the advantage actor-critic (A3C) surpassed the then state-of-the-art in half the training time.
• Superlinear speedup when increasing the thread count for 1-step methods.