LEAP Algorithm Reinforcement Learning with Adaptive Partitioning
Tsufit Cohen, Eyal Radiano
Supervised by: Andrey Bernstein
Intro
Reinforcement Learning
– Learn the optimal policy by trial and error
– Reward for "good" steps
– Performance improves with experience
[Figure label: Agent]
Q-learn
Definitions:
Key specifications:
– Table representation
– [Author note: write out the formula for Q and the definitions, taken from page 6 of the paper, so that we can explain what Q is]
Exploration policy: epsilon-greedy
[Author note: maybe split this into two slides]
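A minimal sketch of table-based Q-learning with an epsilon-greedy exploration policy, to make the two items above concrete. The step size `alpha`, the discount `gamma`, the action names and the dictionary-based table are assumptions chosen for illustration; the slides do not give these values.

```python
import random
from collections import defaultdict

ACTIONS = ["UP", "DOWN", "LEFT", "RIGHT"]  # assumed action set


def epsilon_greedy(Q, state, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Ties are broken in favour of the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the new value of Q(s, a)."""
    best_next = max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]


# The table: one entry per (state, action) pair, initialised to 0.
Q = defaultdict(float)
new_value = q_update(Q, (0, 0), "RIGHT", -1.0, (1, 0))
```

With an all-zero table, one step with reward -1 moves Q((0, 0), RIGHT) to -0.1 under the assumed alpha = 0.1.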
LEAP: Learning Entity (LE) Adaptive Partitioning
Key specifications:
– Macro states
– Multi-partitioning (each partition is called an LE)
– Pruning and joining
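Update
For LE $i$, after taking action $a$ in state $s$ and reaching state $s'$:
$$Q_i(s,a) \leftarrow Q_i(s,a) + \alpha\left[R(s,a) + \gamma \max_{a' \in A} Q_i(s',a') - Q_i(s,a)\right]$$
$$M_i(s,a) \leftarrow M_i(s,a) + \alpha\left[\left(R(s,a) + \gamma \max_{a' \in A} Q_i(s',a')\right)^2 - M_i(s,a)\right]$$
$$v_i(s,a) \leftarrow v_i(s,a) + 1$$
Here $v_i$ is the visit counter. The step size $\alpha$, the discount $\gamma$ and the name $M_i$ for the second-moment estimate are assumed notation, since those symbols did not survive in the slide text.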
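A rough sketch of a per-LE update that keeps a value estimate, a running second moment of the target and a visit counter for each (macro state, action) pair. The class name `LEStats`, the table names `Q`, `M`, `v` and the values of `alpha` and `gamma` are hypothetical; this is one reading of the update slide, not the authors' code.

```python
from collections import defaultdict


class LEStats:
    """Per-LE tables keyed by (macro_state, action). Hypothetical layout."""

    def __init__(self):
        self.Q = defaultdict(float)  # value estimate
        self.M = defaultdict(float)  # running second moment of the target
        self.v = defaultdict(int)    # visit counter


def leap_update(le, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply the three updates for one transition; returns the target used."""
    # The target is computed once, before any table is modified.
    target = r + gamma * max(le.Q[(s_next, b)] for b in actions)
    le.Q[(s, a)] += alpha * (target - le.Q[(s, a)])
    le.M[(s, a)] += alpha * (target ** 2 - le.M[(s, a)])
    le.v[(s, a)] += 1
    return target


le = LEStats()
leap_update(le, "m0", "RIGHT", -1.0, "m1", ["LEFT", "RIGHT"])
```

Keeping both the mean and the second moment lets a variance estimate, M - Q^2, be read off per macro state; whether the original pruning rule uses exactly this quantity is not stated in the slides.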
Changes and Add-ons to the Algorithm
– Changed the order of pruning and updating
– Epsilon-greedy policy starts from epsilon = 0.9
– Boundary condition: Q = 0 at the end of the game
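The last two items can be sketched as follows. Only the starting value 0.9 and the Q = 0 boundary condition come from the slide; the geometric decay, its rate, the floor `eps_min` and the discount `gamma` are assumptions made so the example runs.

```python
def epsilon_schedule(trial, eps_start=0.9, decay=0.99, eps_min=0.0):
    """Exploration rate for a given trial index.

    Starts at 0.9 as on the slide; the geometric decay is an assumed schedule.
    """
    return max(eps_min, eps_start * decay ** trial)


def td_target(r, next_q_values, gamma=0.9, end_of_game=False):
    """Target for the Q update.

    At the end of the game the successor value is taken as 0, so the
    target reduces to the immediate reward.
    """
    if end_of_game:
        return r
    return r + gamma * max(next_q_values)
```

For example, `td_target(2.0, [5.0, 7.0], end_of_game=True)` ignores the successor values and returns the reward alone.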
Implementation
Key operations:
– Finding the active LE list for a given state
– Finding a macro state within an LE
– Adding/removing a JLE and/or a macro state
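As an illustration of finding and adding macro states, the sketch below treats the macro states of a one-variable LE as contiguous intervals separated by sorted cut points. This representation, the class name `IntervalPartition` and its methods are assumptions; the slides do not say how macro states are stored beyond a `CList`.

```python
from bisect import bisect_right, insort


class IntervalPartition:
    """Macro states of one state variable as intervals [low, high). Hypothetical."""

    def __init__(self, low, high):
        self.low, self.high = low, high
        self.cuts = []  # sorted interior boundaries between macro states

    def find(self, x):
        """Index of the macro state that contains x."""
        if not (self.low <= x < self.high):
            raise ValueError("state outside the partitioned range")
        return bisect_right(self.cuts, x)

    def split(self, cut):
        """Refine the partition by adding one boundary (adds one macro state)."""
        if self.low < cut < self.high and cut not in self.cuts:
            insort(self.cuts, cut)

    def __len__(self):
        return len(self.cuts) + 1


p = IntervalPartition(0, 20)   # one macro state covering x in [0, 20)
p.split(10)                    # now [0, 10) and [10, 20)
```

A sorted list with binary search keeps lookup logarithmic in the number of macro states; a linked list such as `CList` would need a linear scan instead.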
Data structures:
– Basic LE
– JLE
Inheritance: Basic LE and JLE both derive from LE
Members shown in the class diagram:
– CList<macrostate> Macro_list
– Int* ID_arr_
– Int order
– CList<JLE>* Sons_lists_arr
General Data Structure Implementation
Basic LE array: Basic LE 1, Basic LE 2, Basic LE 3
Sons list array of a Basic LE:
– pointer to the list of JLEs of order 1 (empty)
– pointer to the list of JLEs of order 2
– pointer to the list of JLEs of order 3
Basic LE 1, magnified: macro list, ID array, order, sons list array
3D Grid World Implementation Example
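Basic LE array: Basic LE X, Basic LE Y, Basic LE Z
Sons list array of Basic LE X: 0: (empty); 1: JLE XY, JLE XZ; 2: JLE XYZ
Sons list array of Basic LE Y: 0: (empty); 1: JLE YZ; 2: (empty)
Sons list array of Basic LE Z: 0: (empty); 1: (empty); 2: (empty)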
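The layout of the 3D example can be sketched with small Python classes that mirror the LE / Basic LE / JLE hierarchy. The field names follow the slide members loosely. The rule that a JLE is filed under the Basic LE of its first variable is inferred from the example (XY, XZ and XYZ under X, YZ under Y), and the helper names `build` and `label` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class LE:
    id_arr: tuple                                    # indices of the state variables used
    macro_list: list = field(default_factory=list)   # macro states of this LE

    @property
    def order(self):
        return len(self.id_arr)


@dataclass
class JLE(LE):
    pass


@dataclass
class BasicLE(LE):
    # sons_lists_arr[k] holds the JLEs of order k + 1 filed under this Basic LE
    sons_lists_arr: list = field(default_factory=list)


def build(n):
    """Build n Basic LEs and every JLE over them (assumed filing rule)."""
    basics = [BasicLE(id_arr=(i,), sons_lists_arr=[[] for _ in range(n)])
              for i in range(n)]
    for k in range(2, n + 1):
        for ids in combinations(range(n), k):
            # File the JLE under the Basic LE of its lowest-index variable.
            basics[ids[0]].sons_lists_arr[k - 1].append(JLE(id_arr=ids))
    return basics


def label(le, names):
    return "".join(names[i] for i in le.id_arr)


basics = build(3)
x_sons = [[label(j, "XYZ") for j in lst] for lst in basics[0].sons_lists_arr]
```

Here `x_sons` reproduces the sons list array of Basic LE X from the example: an empty slot for order 1, XY and XZ for order 2, and XYZ for order 3.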
Simulation 1 – 2D Grid World
Environment properties:
– Size: 20x20
– Step cost: -1
– Reward (prize): +2
– Available moves: Up, Down, Left, Right
– Wall bumping: no movement
– Taking the prize: starts a new episode
– Basic LEs: X, Y
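A minimal sketch of an environment with these properties. The start and prize positions, the coordinate convention and the choice that the prize step returns +2 without an additional step cost are assumptions; the slide only gives the size, the costs and the move set.

```python
class GridWorld:
    """Grid world with step cost, a single prize cell and solid walls."""

    MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}

    def __init__(self, size=20, start=(0, 0), prize=(19, 19),
                 step_cost=-1.0, prize_reward=2.0):
        # start and prize positions are assumed, not taken from the slides
        self.size, self.start, self.prize = size, start, prize
        self.step_cost, self.prize_reward = step_cost, prize_reward
        self.pos = start

    def reset(self):
        """Start a new episode."""
        self.pos = self.start
        return self.pos

    def step(self, action):
        """Return (next_state, reward, episode_done)."""
        dx, dy = self.MOVES[action]
        x, y = self.pos[0] + dx, self.pos[1] + dy
        # Wall bumping: the position is left unchanged, the step cost still applies.
        if 0 <= x < self.size and 0 <= y < self.size:
            self.pos = (x, y)
        if self.pos == self.prize:
            return self.pos, self.prize_reward, True
        return self.pos, self.step_cost, False


env = GridWorld()
state = env.reset()
state, reward, done = env.step("UP")  # bumps into the top wall
```

With X and Y as the Basic LEs, each LE sees only one coordinate of `state`.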
[Figure: the 20x20 grid with the start point and the prize marked, shown partitioned by x and partitioned by y]
Simulation 1 Results - Policy
[Figure: learned policy on the 20x20 grid, one action (UP, DOWN, LEFT or RIGHT) per cell, mostly RIGHT and DOWN; start and prize marked; x axis 0 to 20, y axis 0 to -20]
Results – Average Reward & refined macrostates count
[Figure: Refined macrostates count vs Number of Trials. x axis: Number of Trials (x 50), 0 to 1500; y axis: Number of Macrostates, 20 to 160]
[Figure: Average Reward vs Number of Trials. x axis: Number of Trials (x 50), 0 to 700; y axis: Average Reward, -300 to -50]
[Figure: Average Reward vs Number of Trials, zoomed. x axis: Number of Trials (x 50), 600 to 750; y axis: Average Reward, -60 to -25]
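The "(x 50)" on the trial axis suggests that each plotted point summarises a block of 50 trials. A small sketch of that kind of block averaging follows; the block interpretation and the function name `block_average` are assumptions, since the slides do not describe how the curves were produced.

```python
def block_average(episode_returns, block=50):
    """Average the per-trial returns over consecutive blocks of `block` trials.

    A trailing incomplete block is dropped.
    """
    last_start = len(episode_returns) - block
    return [
        sum(episode_returns[i:i + block]) / block
        for i in range(0, last_start + 1, block)
    ]


# 110 trials give two full blocks; the last 10 trials are ignored.
curve = block_average([1.0] * 50 + [3.0] * 50 + [9.0] * 10)
```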
Simulation 2 – Grid World with an obstacle
Environment properties:
– Size: 5x5
– Step cost: -1
– Reward (prize): +2
– Obstacle: -3
[Figure: the 5x5 grid with the start and the prize marked]
Simulation 2 Results
[Figure: learned policy on the 5x5 grid, one action (DOWN or RIGHT) per cell; x axis 0 to 5, y axis 0 to -5]
[Figure: Average Reward vs Number of Trials. x axis: Number of Trials (x 50), 0 to 2.5 x 10^4; y axis: Average Reward, -35 to -5]
• Note: the policy keeps changing because of the epsilon-greedy exploration.
LEAP vs Q-Learn
[Figure: Average Reward vs Number of Trials. x axis: Number of Trials (x 50), 0 to 700; y axis: Average Reward, -800 to 0; curves: LEAP with multi partition, Regular Q learning]