31
Learning Environment Models in Car Racing using Stateful Genetic Programming UCD Natural Computing Research & Applications Group (NCRA) Alexandros Agapitos, Michael O’Neill, Anthony Brabazon School Of Computer Science and Informatics University College Dublin Ireland

Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Learning Environment Models in Car Racing using Stateful Genetic Programming

UCD Natural Computing Research & Applications Group (NCRA)

Alexandros Agapitos, Michael O’Neill, Anthony Brabazon School Of Computer Science and Informatics University College Dublin Ireland

Page 2: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Outline

•  Memory usage in Genetic Programming

•  Method for Building Racing Track Models in TORCS

•  Experimental Results

•  Conclusions and Future Work

UCD Natural Computing Research & Applications Group (NCRA)

Page 3: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Traditional Expression-tree GP

UCD Natural Computing Research & Applications Group (NCRA)

2⋅ π + (x + 3) − y5 +1

⎝ ⎜

⎠ ⎟

Page 4: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Indexed Memory: A Simple Addition to GP

•  Read and Write primitives are added as new non-terminals in the language.

•  Each GP expression-tree is given access to its own array of integers, indexed over the integers.

•  Read(X) returns the value stored in memory position X, where X is of same type as the type of memory elements (i.e. Read(4) returns the fifth element of the array).

•  Write(X, Y) returns the old value of a memory position X, and has the side effect of changing the value of position X to Y (i.e. Write(10, 104) returns the value of memory position 10 and overwrites it to 104).

•  Program state refers to the contents of array during program execution.

UCD Natural Computing Research & Applications Group (NCRA)

Page 5: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

An example

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write

Write

*

x

+

x

1

x

+ x

1 x

We wish to execute the program with consecutive input

{x1, x2, x3} = {1, 2, 3} while maintaining program state. 0

0

0

0

0

0

0

0

0

0

Indexed Mem

ory

Page 6: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

State Maintenance During Program Execution (1)

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

0

0

0

0

0

0

0

0

x=1

return 0

Write

x +

1 x

Page 7: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

0

0

0

0

0

0

0

0

return 0

x=1

return 0

State Maintenance During Program Execution (2)

Write

x +

1 x

Page 8: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write

Write

*

x

+ x

1

x +

x 1 x

0

0

0

0

0

0

0

0

0

0

State Maintenance During Program Execution (3)

x=1 0

0

1

0

0

0

0

0

0

0

before after

return 0

write 1

Page 9: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

1

0

0

0

0

0

0

0

State Maintenance During Program Execution (4)

return 0

return 0

x=1

Write

x +

1 x

Page 10: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

State Maintenance During Program Execution (5)

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

1

0

0

0

0

0

0

0

x=2

return 1

Write

x +

1 x

Page 11: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

1

0

0

0

0

0

0

0

return 1

x=2

return 0

State Maintenance During Program Execution (6)

Write

x +

1 x

Page 12: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write

Write

*

x

+ x

1

x +

x 1 x

0

0

1

0

0

0

0

0

0

0

State Maintenance During Program Execution (7)

x=2 0

0

4

0

0

0

0

0

0

0

before after

return 1

write 4

Page 13: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

4

0

0

0

0

0

0

0

State Maintenance During Program Execution (8)

return 1

return 1

x=2

Write

x +

1 x

Page 14: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

State Maintenance During Program Execution (9)

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

4

0

0

0

0

0

0

0

x=3

return 0

Write

x +

1 x

Page 15: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write *

x

+

1

x

x

0

0

4

0

0

0

0

0

0

0

return 0

x=3

return 0

State Maintenance During Program Execution (10)

Write

x +

1 x

Page 16: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write

Write

*

x

+ x

1

x +

x 1 x

0

0

4

0

0

0

0

0

0

0

State Maintenance During Program Execution (11)

x=3 0

0

4

0

3

0

0

0

0

0

before after

return 0

write 3

Page 17: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

UCD Natural Computing Research & Applications Group (NCRA)

IF

>

x

5 Read Read

+ 2

Write

Write

*

x

+ x

1

x +

x 1 x

0

0

4

0

3

0

0

0

0

0

State Maintenance During Program Execution (12)

return 0

return 0

x=3

Page 18: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Program Execution Overview

initial state

UCD Natural Computing Research & Applications Group (NCRA)

0

0

0

0

0

0

0

0

0

0

input: 1

output: 0 0

0

1

0

0

0

0

0

0

0

0

0

4

0

0

0

0

0

0

0

0

0

4

0

3

0

0

0

0

0

input: 2

output: 1 input: 3

output: 0

state a state b state c

Page 19: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Scope for Research

•  Is it possible to use program state to represent a model of a racing track?

•  Can this model be utilised for navigation purposes?

•  What is the best way to evolve programs with state using GP?

•  What are the effects of stateful program representations to the evolutionary search?

•  Test-bed used: The Open Car Racing Simulator

UCD Natural Computing Research & Applications Group (NCRA)

Page 20: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Cooperative Coevolution of Model-builder and Car-controller Programs •  A multi-phasic fitness evaluation procedure:

–  Phase A: Model Building based on sensory information. –  Phase B: Car Controlling with deprivation of sensory information.

•  Fitness assignment is based on the second phase of fitness evaluation.

•  Program representation employs a modular architecture that consists of two individual expression-tree branches that are coupled with a general-purpose two-dimensional data-structure.

UCD Natural Computing Research & Applications Group (NCRA)

Page 21: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Phase A: Information for Model Building

•  Information required to build a model of the racing track: –  X (normalised within the range [0, 163] for ETrack5) –  Y (normalised within the range [0, 1621] for ETrack5) –  Angle between the car direction and track axis –  TrackEdgeSensors A, TrackEdgeSensors B, TrackEdgeSensors C

UCD Natural Computing Research & Applications Group (NCRA)

A C

B

(x,y)

Dist. from start line

Dist. From right-most edge

Page 22: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Phase A: Model Building Flow-chart

UCD Natural Computing Research & Applications Group (NCRA)

Input file generated offline 2D array of 2000x2000

5 Write

*

TES A +

1 X

Model-building tree

Y

Page 23: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Phase B: Car-controlling Flow-chart

UCD Natural Computing Research & Applications Group (NCRA)

Real-time racing 2D array of 2000x2000

(updated in Phase A) *

X

Read Read

+ X

1

Car-controlling tree

Driving & Steering commands

X, Y, Speed

Y Speed

Page 24: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Fitness assignment: Combining Phases A & B

UCD Natural Computing Research & Applications Group (NCRA)

Model Building Phase Car Controlling Phase Racing track model

built in memory

fitness

TORCS race

Page 25: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Program Representation Language

UCD Natural Computing Research & Applications Group (NCRA)

Page 26: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Method for Localisation

UCD Natural Computing Research & Applications Group (NCRA)

Page 27: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Experiment design

•  Generational, Elitist GA

•  Population size: 300

•  Generations: 40

•  Tournament size: 3

•  Expression-tree initialisation: Ramped-half-and-half

•  Subtree mutation and crossover (prob. set to 0.7 in favour of mutation)

•  Multiobjective fitness function to be maximised:

•  Race duration: 5,000 time-steps

•  Racing track: Etrack5

•  Maximum gear: constrained to gear 1

•  Car1-trb1 speed: approx. 82Km/h

•  Distance covered: approx. 2,320m within 5,000 time-steps

UCD Natural Computing Research & Applications Group (NCRA)

f = w1DR − w21

5000TCDi

i=1

5000

Page 28: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Performance Histograms

UCD Natural Computing Research & Applications Group (NCRA)

0 10 20 30 400.2

0.1

0

0.1

0.2

0.3

0.4

0.5

Generation

Best

Fitn

ess

0 10 20 30 400

500

1000

1500

2000

Generation

Max

imum

Dis

tanc

e R

aced

0 10 20 30 400

0.2

0.4

0.6

0.8

1

Generation

Trac

kce

nter

Dev

iatio

n C

ompo

nent

0 10 20 30 400.2

0.1

0

0.1

0.2

0.3

0.4

0.5

Generation

Best

Fitn

ess

0 10 20 30 400

500

1000

1500

2000

Generation

Max

imum

Dis

tanc

e R

aced

0 10 20 30 400

0.2

0.4

0.6

0.8

1

Generation

Trac

kce

nter

Dev

iatio

n C

ompo

nent

(w1,w2)=(0.65, 0.35)

(w1,w2)=(0.75, 0.25)

Page 29: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Heatmap of Read-Write Overlap

UCD Natural Computing Research & Applications Group (NCRA)

Page 30: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Conclusions

UCD Natural Computing Research & Applications Group (NCRA)

•  Successful cooperative coevolution of two programs that share memory.

•  A fitness function that penalises a racing line that deviates heavily from the track’s center provides the necessary search bias towards the effective use of memory.

•  Most evolved racing track models exploited a roughly isomorphic relation between the environment and the memory by mapping the car’s move in the racing track to an equivalent position in the 2D array.

•  Indexed memory is a powerful extension to GP for the effective storage and retrieval of information.

•  For future work: –  Study methods to allow agents to utilise the track model to plan. –  Generalisation of model-building ability.

Page 31: Learning Environment Models in Car Racing using Stateful ...ncra.ucd.ie/COMP41190/GPCarRacing.pdf · from the track’s center provides the necessary search bias towards the effective

Thank you

UCD Natural Computing Research & Applications Group (NCRA)