Daniele Loiacono - IWLCS 2011 One Step Fits All Fitted Q Iteration with XCS Daniele Loiacono

One Step Fits All


DESCRIPTION

Daniele Loiacono. "One Step Fits All: Fitted Q Iteration with XCS". IWLCS, 2011

Citation preview

  • 1. One Step Fits All: Fitted Q Iteration with XCS
    Daniele Loiacono

2. XCS(F) in multistep problems
XCS(F) has been successfully applied to large and complex single-step problems.
In contrast, even rather simple multistep problems can be very challenging for XCS(F).
Its connections with generalized reinforcement learning methods have been widely studied, and the two share common issues:
over-generalization
unstable learning process
divergence (with computed prediction)
Advanced prediction mechanisms (e.g. Tile Coding) generally help but do not provide any guarantee
3. XCS(F) searches for the best generalizations in the problem space
Generalizations might prevent XCS(F) from learning the optimal payoff landscape
The payoff landscape learned affects the search for generalizations in the problem space
4. What is this talk about?
We introduce an alternative approach to multistep problems
based on Fitted Q Iteration
involving a sequence of single step problems
We will show only some preliminary results to test the presented approach
Agenda
Fitted Q Iteration
Fitted Q Iteration + XCS
Preliminary Results
Discussion
Future Works
5. Fitted Q Iteration (Ernst et al., 2005)
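[Diagram: the Agent sends action at to the Problem and receives reward rt+1 and next state st+1, which becomes st after a delay. The set {} of sampled interactions is passed to a Learner, which produces Qi(s,a).]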
6. Fitted Q Iteration (Ernst et al., 2005)
[Diagram: starting from the same set {} of samples, the learner produces Q1(s,a), then Q2(s,a), and so on up to QL(s,a).]
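The iteration pictured on these two slides can be written compactly. The block below is a sketch of the standard Fitted Q Iteration recursion from Ernst et al. (2005), not a formula taken from the slides; the discount factor \gamma, the symbol \mathcal{TS}_i for the i-th training set, and the name Regress for the supervised learner are notation assumed here.

```latex
\begin{align*}
Q_0(s,a) &= 0 \\
\mathcal{TS}_i &= \Big\{ \big( (s_t, a_t),\; r_{t+1} + \gamma \max_{a'} Q_{i-1}(s_{t+1}, a') \big) \Big\}
  \quad \text{over all sampled } (s_t, a_t, r_{t+1}, s_{t+1}) \\
Q_i &= \mathrm{Regress}(\mathcal{TS}_i), \qquad i = 1, \dots, L
\end{align*}
```

Each Q_i is therefore the solution of an ordinary supervised regression problem whose targets are built from the previous approximation Q_{i-1}.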
7. Fitted Q Iteration + XCS
XCS is applied to the target multistep problem
The interaction between XCS and the problem is sampled
A sequence of single step regression problems is generated
the state is the concatenation of the state and the action of the original multistep problem
no actions
training set is built for all the (state, action) pairs collected
test set is built for all the (next state, action) pairs collected
XCS is applied iteratively to each single step problem generated
Qi(s,a) is computed as the system prediction on the test set
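The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the 5-cell corridor, the random sampling policy, the discount factor, and every function name are hypothetical, and `fit_regressor` is a simple averaging table standing in for XCS, which is not reproduced here.

```python
import random
from collections import defaultdict

GAMMA = 0.9        # discount factor (assumed value)
ACTIONS = (0, 1)   # 0 = left, 1 = right in a toy 5-cell corridor
GOAL = 4           # index of the rewarding cell


def step(state, action):
    """Toy deterministic corridor: reward 1.0 on reaching the goal cell."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def sample_transitions(episodes, rng, max_steps=20):
    """Collect (s, a, r, s_next, done) tuples with a uniformly random policy."""
    samples = []
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # start from a non-goal cell
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)
            s_next, r, done = step(s, a)
            samples.append((s, a, r, s_next, done))
            if done:
                break
            s = s_next
    return samples


def fit_regressor(inputs, targets):
    """Stand-in for XCS: average the targets seen for each (s, a) input."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    return lambda x: table.get(x, 0.0)


def fitted_q_iteration(samples, iterations):
    """Solve a sequence of single step regression problems, one per iteration."""
    q = lambda x: 0.0  # Q_0 is zero everywhere
    # Regression input: the original state concatenated with the action.
    inputs = [(s, a) for s, a, _, _, _ in samples]
    for _ in range(iterations):
        # Targets use the previous Q evaluated on the (next state, action) pairs.
        targets = [
            r if done else r + GAMMA * max(q((s_next, b)) for b in ACTIONS)
            for _, _, r, s_next, done in samples
        ]
        q = fit_regressor(inputs, targets)
    return q
```

A call such as `fitted_q_iteration(sample_transitions(200, random.Random(0)), 10)` returns a function that maps an `(s, a)` tuple to its estimated value. Swapping `fit_regressor` for another supervised learner leaves the outer loop unchanged, which is the property the approach relies on.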
8. Experimental Design
Woods 14
Woods 1
Maze 5
Maze 6
9. Experimental Results: Woods 1
XCS + sampling for 50 problems
10. Experimental Results: Woods 1
11. Experimental Results: Maze 5
XCS + sampling for 25 problems
12. Experimental Results: Maze 5
13. Experimental Results: Maze 6
XCS + sampling for 15 problems
14. Experimental Results: Maze 6
15. Experimental Results: Woods 14
XCS + sampling for 15 problems
16. Experimental Results: Woods 14
17. Discussion
Fitted Q Iteration + XCS offers several advantages
efficient learning
generalization over the action space
However
no real-time learning
assumes a static environment
how to perform a good problem space sampling and how does it affect the performance?
how does XCS compare to other supervised learning techniques in this task?
18. Future Works
Integrating Fitted Q Iteration and XCS in an incremental/iterated fashion
Test on more challenging problems that require generalization (e.g., Butz and Lanzi, 2010)
Investigate sampling strategies
Extend XCS based on some principles of Fitted Q Iteration?
19. Some hints about problem sampling
20. Some hints about problem sampling
21. Some hints about problem sampling
22. Results of a bad sampling on Woods 1
23. Results of a bad sampling on Woods 1