Daniele Loiacono. "One Step Fits All: Fitted Q Iteration with XCS". IWLCS, 2011
2. XCS(F) in multistep problems
XCS(F) has been successfully applied to complex and large single-step
problems.
In contrast, even rather simple multistep problems can be very
challenging for XCS(F)
Connections with generalized reinforcement learning methods have
been widely studied, and they share common issues:
over-generalization
unstable learning process
divergence (with computed prediction)
Advanced prediction mechanisms (e.g., Tile Coding) generally help,
but do not provide any guarantees
3. XCS(F) searches for the best generalizations in the problem
space
Generalizations might prevent XCS(F) from learning the optimal payoff
landscape
The payoff landscape learned affects the search for generalizations
in the problem space
4. What is this talk about?
We introduce an alternative approach to multistep problems
based on Fitted Q Iteration
involving a sequence of single step problems
We show only some preliminary results to validate the proposed
approach
Agenda
Fitted Q Iteration
Fitted Q Iteration + XCS
Preliminary Results
Discussion
Future Works
5. Fitted Q Iteration (Ernst et al., 2005)
[Diagram: agent-environment interaction loop. The agent observes state st,
takes action at, and (after a delay) receives reward rt+1 and next state st+1;
the sampled tuples {(st, at, rt+1, st+1)} are passed to a learner that
estimates Qi(s,a).]
6. Fitted Q Iteration (Ernst et al., 2005)
[Diagram: from the sampled tuples {(st, at, rt+1, st+1)}, a sequence of
estimates Q1(s,a), Q2(s,a), ..., QL(s,a) is fitted, one regression problem
per iteration.]
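The iterative scheme of slides 5 and 6 can be sketched as follows. This is a minimal illustration, not the implementation from the talk: a tiny 1-nearest-neighbor regressor stands in for the batch learner (Ernst et al. used extremely randomized trees; the talk substitutes XCS), and all names and the sample format are illustrative assumptions.

```python
import numpy as np

class NearestNeighborRegressor:
    """Minimal 1-NN regressor standing in for the batch supervised
    learner of Fitted Q Iteration (illustrative assumption)."""
    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y, dtype=float)
        return self
    def predict(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # squared distances to every training input; pick the closest
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        return self.y[d.argmin(axis=1)]

def fitted_q_iteration(samples, actions, gamma=0.9, n_iterations=20):
    """Fitted Q Iteration (Ernst et al., 2005), sketched.

    samples: list of (s, a, r, s_next, terminal) tuples collected
    from the multistep problem. Returns a regressor predicting
    Q(s, a) from the concatenated input [s, a].
    """
    X = np.array([np.append(s, a) for s, a, _, _, _ in samples])
    model = None
    for _ in range(n_iterations):
        y = []
        for s, a, r, s_next, terminal in samples:
            if model is None or terminal:
                q_next = 0.0  # Q0 is zero; terminal states have no successor value
            else:
                # max over actions of the previous iteration's estimate
                q_next = max(model.predict(np.append(s_next, a2))[0]
                             for a2 in actions)
            y.append(r + gamma * q_next)  # regression target for this iteration
        model = NearestNeighborRegressor().fit(X, y)
    return model
```

On a two-transition toy chain (action 1 from state 0 reaches a terminal state with reward 1, action 0 stays with reward 0), the sequence converges to Q(0,1) = 1 and Q(0,0) = gamma.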
7. Fitted Q Iteration + XCS
XCS is applied to the target multistep problem
The interaction between XCS and the problem is sampled
A sequence of single step regression problems is generated
the state is the concatenation of the state and the action of the
original multistep problem
there are no actions (pure regression)
the training set is built from all the collected pairs
the test set is built from all the collected state-action inputs
XCS is applied iteratively to each single step problem
generated
Qi(s,a) is computed as the system prediction on the test set
8. Experimental Design
Woods 14
Woods 1
Maze 5
Maze 6
9. Experimental Results: Woods 1
XCS + Fitted Q Iteration, sampling for 50 problems
10. Experimental Results: Woods 1
11. Experimental Results: Maze 5
XCS + Fitted Q Iteration, sampling for 25 problems
12. Experimental Results: Maze 5
13. Experimental Results: Maze 6
XCS + Fitted Q Iteration, sampling for 15 problems
14. Experimental Results: Maze 6
15. Experimental Results: Woods 14
XCS + Fitted Q Iteration, sampling for 15 problems
16. Experimental Results: Woods 14
17. Discussion
Fitted Q Iteration + XCS offers several advantages
efficient learning
generalization over the action space
However
no real-time learning
assumes a static environment
how can a good problem space sampling be performed, and how does it
affect the performance?
how does XCS compare to other supervised learning techniques on
this task?
18. Future Works
Integrating Fitted Q Iteration and XCS in an incremental/iterated
fashion
Test on more challenging problems that require generalization
(e.g., Butz and Lanzi, 2010)
Investigate sampling strategies
Extend XCS based on some principles of Fitted Q Iteration?
19. Some hints about problem sampling
20. Some hints about problem sampling
21. Some hints about problem sampling
22. Results of a bad sampling on Woods 1
23. Results of a bad sampling on Woods 1