FeUdal Networks for Hierarchical Reinforcement Learning by Artem Bachysnkyi Computational Neuroscience Seminar University of Tartu 3 May 2017


Page 1: FeUdal Networks for Hierarchical Reinforcement Learning

FeUdal Networks for Hierarchical Reinforcement Learning

by Artem Bachysnkyi, Computational Neuroscience Seminar

University of Tartu, 3 May 2017

Page 2: FeUdal Networks for Hierarchical Reinforcement Learning

Reinforcement learning

The basic reinforcement learning model consists of:

• a set of environment and agent states S
• a set of actions A of the agent
• policies of transitioning from states to actions
• rules that determine the scalar immediate reward of a transition
• rules that describe what the agent observes
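
One standard way to formalize these ingredients (textbook MDP-style notation, not taken from the slides): at step $t$ the agent observes $o_t = O(s_t)$, samples an action $a_t \sim \pi(\cdot \mid o_t)$, the environment transitions as $s_{t+1} \sim P(\cdot \mid s_t, a_t)$, and a scalar reward $r_t = R(s_t, a_t, s_{t+1})$ is returned.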

Page 3: FeUdal Networks for Hierarchical Reinforcement Learning

ATARI games

Page 4: FeUdal Networks for Hierarchical Reinforcement Learning

Standard approach

• use an action-repeat heuristic, where each action translates into several consecutive actions in the environment (see the sketch below)
• not applicable in non-Markovian environments that require memory
• can't learn on the weak reward signal
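
A minimal sketch of such an action-repeat wrapper, assuming a generic env with reset() and step(action) -> (observation, reward, done) (a hypothetical interface, not the authors' code):

class ActionRepeat:
    """Repeat each agent action for several consecutive environment steps."""
    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, total_reward, done = None, 0.0, False
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward   # accumulate reward over the repeated steps
            if done:
                break
        return obs, total_reward, done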

Page 5: FeUdal Networks for Hierarchical Reinforcement Learning

Feudal reinforcement learning intuition

• levels of hierarchy within an agent communicate via explicit goals
• goals can be generated in a top-down fashion
• goal setting can be decoupled from goal achievement

Page 6: FeUdal Networks for Hierarchical Reinforcement Learning

Manager-Worker model

Manager:
• sets goals at a lower temporal resolution (see the sketch after this list)

Worker:
• operates at a higher temporal resolution
• produces primitive actions
• follows the goals via an intrinsic reward
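
A schematic sketch of the two temporal resolutions, under the simplifying assumption that the Manager only emits a new goal every c steps (hypothetical manager/worker/env objects, not the authors' implementation):

def hierarchical_episode(env, manager, worker, c=10):
    obs, done, goal, t = env.reset(), False, None, 0
    while not done:
        if t % c == 0:                         # Manager: lower temporal resolution
            goal = manager.set_goal(obs)
        action = worker.act(obs, goal)         # Worker: a primitive action every step
        obs, reward, done = env.step(action)
        worker.observe(obs, reward, goal)      # Worker is also rewarded intrinsically for following the goal
        t += 1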

Page 7: FeUdal Networks for Hierarchical Reinforcement Learning

Main proposals

• a consistent, end-to-end differentiable model
• approximate transition policy gradient update for training the Manager
• use of goals that are directional rather than absolute
• dilated LSTM for the Manager RNN design

Page 8: FeUdal Networks for Hierarchical Reinforcement Learning

FuN model description

Page 9: FeUdal Networks for Hierarchical Reinforcement Learning

FuN model description

$h^M, h^W$ – internal states of the Manager and Worker RNNs
$U_t$ – Worker's output
$\phi$ – maps $g_t$ into $w_t$
$\pi$ – vector of probabilities over primitive actions

$s_t$ – latent state representation
$g_t$ – goal vector
$x_t$ – observation from the environment
$z_t$ – shared intermediate representation
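
A schematic forward pass through the model in terms of these symbols (the sub-modules f_percept, f_Mspace, f_Mrnn, f_Wrnn and phi are placeholders here, not a working implementation of the paper):

import numpy as np

def fun_forward(x_t, f_percept, f_Mspace, f_Mrnn, f_Wrnn, phi):
    z_t = f_percept(x_t)          # shared intermediate representation
    s_t = f_Mspace(z_t)           # latent state representation
    g_t, h_M = f_Mrnn(s_t)        # Manager emits a goal vector and updates its state h^M
    w_t = phi(g_t)                # goal embedded into a low-dimensional vector w_t
    U_t, h_W = f_Wrnn(z_t)        # Worker output: one embedding row per primitive action
    logits = U_t @ w_t            # combine Worker output with the goal embedding
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()        # pi: probabilities over primitive actions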

Page 10: FeUdal Networks for Hierarchical Reinforcement Learning

Learning

Learning steps:
1. receive an observation from the environment
2. select an action from a finite set
3. the environment responds with a new observation and a scalar reward
4. the process continues until the terminal state is reached
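
These steps as a generic interaction loop (hypothetical env and agent objects, for illustration only):

def run_episode(env, agent):
    obs, done, total_reward = env.reset(), False, 0.0
    while not done:                            # 4. until the terminal state is reached
        action = agent.act(obs)                # 2. select an action from a finite set
        obs, reward, done = env.step(action)   # 1. & 3. new observation and scalar reward
        total_reward += reward
    return total_reward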

Page 11: FeUdal Networks for Hierarchical Reinforcement Learning

Learning

Bad idea:
train the feudal network end-to-end using a policy gradient algorithm operating on the actions taken by the Worker

Good idea:
independently train the Manager to predict advantageous directions in state space and to intrinsically reward the Worker to follow these directions

Page 12: FeUdal Networks for Hierarchical Reinforcement Learning

The agent's goal

Maximize the discounted return

$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$

where $\gamma \in [0, 1)$ is a discount factor and $r_t$ is the reward at step $t$.

The agent's behaviour is defined by its action-selection policy π. FuN produces a distribution over possible actions.
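
A tiny worked example of the discounted return (illustration only):

def discounted_return(rewards, gamma=0.99):
    R = 0.0
    for r in reversed(rewards):   # fold the rewards back-to-front
        R = r + gamma * R
    return R

discounted_return([0.0, 0.0, 1.0])   # a reward of 1 arriving two steps later is worth 0.99**2 = 0.9801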

Page 13: FeUdal Networks for Hierarchical Reinforcement Learning

Manager's update rule

$\nabla g_t = A^M_t \, \nabla_\theta \, d_{\cos}(s_{t+c} - s_t, \, g_t(\theta))$

where

$V^M_t(x_t, \theta)$ – value function estimate from the internal critic

$d_{\cos}(\alpha, \beta) = \alpha^{\top}\beta / (|\alpha|\,|\beta|)$ – cosine similarity

$A^M_t = R_t - V^M_t(x_t, \theta)$ – advantage function
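
A rough sketch of the quantity this rule maximizes (shapes assumed; in practice the gradient with respect to the goal parameters is taken by automatic differentiation):

import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def manager_objective(s_t, s_tc, g_t, R_t, V_t):
    advantage = R_t - V_t                                   # A^M_t from the internal critic
    return advantage * cosine_similarity(s_tc - s_t, g_t)   # rewards goals pointing in directions the state actually moved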

Page 14: FeUdal Networks for Hierarchical Reinforcement Learning

Worker's intrinsic reward

$r^I_t = \frac{1}{c} \sum_{i=1}^{c} d_{\cos}(s_t - s_{t-i}, \, g_{t-i})$

where

$c$ – horizon
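
The same quantity as a sketch, reusing cosine_similarity from the previous sketch (indexing assumed; illustration only):

def intrinsic_reward(states, goals, t, c):
    # states[t] is the latent state s_t; goals[t] is the goal emitted at step t
    terms = [cosine_similarity(states[t] - states[t - i], goals[t - i])
             for i in range(1, c + 1)]
    return sum(terms) / c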

Page 15: FeUdal Networks for Hierarchical Reinforcement Learning

The Worker's policy

Advantage actor-critic update

$\nabla \pi_t = A^D_t \, \nabla_\theta \log \pi(a_t \mid x_t; \theta)$

Advantage function

$A^D_t = R_t + \alpha R^I_t - V^D_t(x_t; \theta)$, where $\alpha$ weights the intrinsic reward and $V^D_t$ is the Worker's internal critic.
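
A sketch of the scalar whose gradient gives this update (alpha is a hyperparameter; log_pi_a is the log-probability of the action actually taken; maximized via automatic differentiation in practice):

def worker_objective(log_pi_a, R_t, R_int_t, V_t, alpha):
    advantage = R_t + alpha * R_int_t - V_t   # extrinsic return plus weighted intrinsic return
    return advantage * log_pi_a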

Page 16: FeUdal Networks for Hierarchical Reinforcement Learning

Architecture details

$f^{percept}$ – Convolutional Neural Network:
1. 16 8x8 filters, stride 4
2. 32 4x4 filters, stride 2
3. a fully connected layer with 256 hidden units
*each layer is followed by a rectified non-linearity

$f^{Mspace}$ – another fully connected layer
$f^{Wrnn}$ – standard LSTM
$f^{Mrnn}$ – dilated LSTM
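
A sketch of this perception network in PyTorch, assuming standard 84x84 single-channel Atari frames as input (the input size and framework are assumptions, not stated on the slide):

import torch
import torch.nn as nn

f_percept = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=8, stride=4),   # 16 8x8 filters, stride 4
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 32 4x4 filters, stride 2
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 9 * 9, 256),                  # fully connected layer, 256 hidden units
    nn.ReLU(),
)

z_t = f_percept(torch.zeros(1, 1, 84, 84))       # z_t has shape (1, 256)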

Page 17: FeUdal Networks for Hierarchical Reinforcement Learning

FuN model description

Page 18: FeUdal Networks for Hierarchical Reinforcement Learning

Dilated LSTM

State of the network with $r$ separate groups of sub-states: $h = \{\hat{h}^i\}_{i=1}^{r}$

At time $t$ we can indicate which group of cores is updated: the group with index $t \,\%\, r$.

At each time step only the corresponding part of the state is updated and the output is pooled across the previous c outputs. This allows the r groups of cores inside the dLSTM to preserve the memories for long periods.

*In the experiments r = 10.
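
A minimal NumPy sketch of this idea (cell is assumed to be any single-step recurrent cell with signature cell(x, h, c) -> (out, h, c); not the authors' implementation):

import numpy as np

class DilatedLSTM:
    def __init__(self, cell, hidden_size, r=10):
        self.cell, self.r, self.t = cell, r, 0
        self.h = [np.zeros(hidden_size) for _ in range(r)]     # r groups of sub-states
        self.c = [np.zeros(hidden_size) for _ in range(r)]
        self.outs = [np.zeros(hidden_size) for _ in range(r)]  # most recent output of each group

    def step(self, x):
        i = self.t % self.r                                    # only group i is updated at time t
        out, self.h[i], self.c[i] = self.cell(x, self.h[i], self.c[i])
        self.outs[i] = out
        self.t += 1
        return np.sum(self.outs, axis=0)                       # pool across the groups' latest outputs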

Page 19: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: ATARI

Page 20: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Montezuma's revenge

https://www.youtube.com/watch?v=_zbg9rs5QZY

Page 21: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Montezuma's revenge

Page 22: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Non-match and T-maze

Page 23: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Water maze

Page 24: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: transition policy gradient

Page 25: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Temporal resolution

Page 26: FeUdal Networks for Hierarchical Reinforcement Learning

Experiments: Dilated LSTM agent baseline