One Step Fits All

1. One Step Fits All: Fitted Q Iteration with XCS
Daniele Loiacono

2. XCS(F) in multistep problems
XCS(F) has been successfully applied to large and complex single step problems.
In contrast, even rather simple multistep problems can be very challenging for XCS(F).
The connections with generalized reinforcement learning methods have been widely studied, and so have the issues they share:
over-generalization
unstable learning process
divergence (with computed prediction)
Advanced prediction mechanisms (e.g. Tile Coding) generally help but do not provide any guarantee
3. XCS(F) searches for the best generalizations in the problem space
Generalizations might prevent XCS(F) from learning the optimal payoff landscape
The learned payoff landscape in turn affects the search for generalizations in the problem space
4. What is this talk about?
We introduce an alternative approach to multistep problems
based on Fitted Q Iteration
involving a sequence of single step problems
We will show only some preliminary results to test the presented approach
Agenda
Fitted Q Iteration
Fitted Q Iteration + XCS
Preliminary Results
Discussion
Future Works
5. Fitted Q Iteration (Ernst et al., 2005)
[Figure: Agent/Problem interaction loop with a Learner; labels: Qi(s,a), st, at, rt+1, st+1, delay]
6. Fitted Q Iteration (Ernst et al., 2005)
[Figure: sequence of learned approximations Q1(s,a), Q2(s,a), up to QL(s,a)]
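For reference, the regression problem solved at iteration i of Fitted Q Iteration can be written as below. This follows the usual formulation of Ernst et al. (2005) rather than anything shown on the slides: the discount factor \gamma, the sample set \mathcal{F}, and the starting point Q_0 = 0 are notation assumed here, and the squared-error objective is only one common choice of regression criterion.

```latex
\begin{align*}
\mathcal{F} &= \{(s_t, a_t, r_{t+1}, s_{t+1})\}, \qquad Q_0(s,a) = 0, \\
y_t^{(i)} &= r_{t+1} + \gamma \max_{a'} Q_{i-1}(s_{t+1}, a'), \\
Q_i &= \arg\min_{f} \sum_{t} \bigl( f(s_t, a_t) - y_t^{(i)} \bigr)^2 .
\end{align*}
```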
7. Fitted Q Iteration + XCS
XCS is applied to the target multistep problem
The interaction between XCS and the problem is sampled
A sequence of single step regression problems is generated
the state is the concatenation of the state and the action of the original multistep problem
no actions
training set is built for all the pairs collected
test set is built for all the collected
XCS is applied iteratively to each single step problem generated
Qi(s,a) is computed as the system prediction on the test set
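The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the supervised learner `fit_table` is a simple averaging table standing in for XCS, and the three-state corridor, the reward of 1000, and the discount factor of 0.5 are hypothetical values chosen so the result is easy to check by hand.

```python
from collections import defaultdict


def fit_table(inputs, targets):
    """Stand-in supervised learner (NOT XCS): averages the targets seen for each input."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    # Unseen inputs predict 0.0.
    return lambda x: table.get(x, 0.0)


def fitted_q_iteration(samples, actions, gamma, iterations, fit=fit_table):
    """Run Fitted Q Iteration on pre-collected (s, a, r, s_next, done) samples."""
    q = lambda x: 0.0  # Q_0 is zero everywhere
    # The regression input is the state concatenated with the action.
    inputs = [(s, a) for s, a, _, _, _ in samples]
    for _ in range(iterations):
        # Each iteration defines one single step regression problem.
        targets = [
            r if done else r + gamma * max(q((s_next, b)) for b in actions)
            for _, _, r, s_next, done in samples
        ]
        q = fit(inputs, targets)
    return q


# Hypothetical corridor: states 0 and 1, goal at state 2, moves -1 / +1.
ACTIONS = (-1, 1)
samples = [
    (0, -1, 0.0, 0, False),   # bumping into the wall leaves the agent in state 0
    (0, 1, 0.0, 1, False),
    (1, -1, 0.0, 0, False),
    (1, 1, 1000.0, 2, True),  # reaching the goal ends the problem
]
q = fitted_q_iteration(samples, ACTIONS, gamma=0.5, iterations=5)
for s, a, _, _, _ in samples:
    print((s, a), q((s, a)))
```

Replacing `fit_table` with any other regressor that takes inputs and targets and returns a prediction function leaves the loop unchanged, which is the point of reducing the multistep problem to a sequence of single step ones.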
8. Experimental Design
Woods 14
Woods 1
Maze 5
Maze 6
9. Experimental Results: Woods 1
[Figure: results plot; sampling for 50 problems]
11. Experimental Results: Maze 5
17. Discussion
Fitted Q Iteration + XCS offers several advantages
efficient learning
generalization over the action space
However
no real-time learning
assumes a static environment
how should the problem space be sampled, and how does the sampling affect performance?
how does XCS compare to other supervised learning techniques in this task?
18. Future Works
Integrate Fitted Q Iteration and XCS in an incremental/iterated fashion
Test on more challenging problems that require generalization (e.g., Butz and Lanzi, 2010)
Investigate sampling strategies
Extend XCS based on some principles of Fitted Q Iteration?
19. Some hints about problem sampling
22. Results of a bad sampling on Woods 1
23. Results of a bad sampling on Woods 1
