Study on Genetic Network Programming (GNP) with Learning and Evolution
Hirasawa laboratory, Artificial Intelligence section, Information architecture field
Graduate School of Information, Production and Systems, Waseda University

Page 1

Study on Genetic Network Programming (GNP) with Learning and Evolution

Hirasawa laboratory, Artificial Intelligence section
Information architecture field
Graduate School of Information, Production and Systems
Waseda University

Page 2

I Research Background

Intelligent systems (evolutionary and learning algorithms) can solve problems automatically.

Systems are becoming large and complex: robot control, elevator group control systems, stock trading systems.

It is very difficult to design efficient control rules that take into account the many kinds of real-world phenomena.

Page 3

II Objective of the research

• Propose an algorithm that combines evolution and learning

– In the natural world:
  • Evolution: many individuals (living things) adapt to the world (environment) over a long span of generations; it gives living things their inherent functions and characteristics.
  • Learning: the knowledge living things acquire during their lifetime through trial and error.

Page 4

III Evolution

Page 5

Evolution

Characteristics of living things are determined by genes.
Evolution gives living things their inherent characteristics and functions.
Evolution is realized by the following components: selection, crossover, mutation.

Page 6

Selection
Individuals that fit the environment survive; the others die out.

Crossover
Genes are exchanged between two individuals. New individuals are produced.

Mutation
Some of the genes of the selected individuals are changed to other ones. New individuals are produced.

Page 7

IV Learning

Page 8

Important factors in reinforcement learning

• State transition (definition of states and actions)

• Trial and error learning

• Future prediction

Page 9

Framework of Reinforcement Learning

• Learn action rules through the interaction between an agent and an environment.

[Figure: the agent sends an action to the environment; the environment returns a state signal (sensor input) and a reward (an evaluation of the action).]

The aim of RL is to maximize the total reward obtained from the environment.
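A minimal sketch of this interaction loop in Python follows; the agent and environment interfaces (reset, step, select_action, observe) are assumptions for illustration, not the interfaces used in the study.

```python
def run_episode(agent, env, max_steps=1000):
    """Run one episode and return the total reward collected by the agent."""
    state = env.reset()                          # initial state signal (sensor input)
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.select_action(state)      # the agent decides an action
        state, reward, done = env.step(action)   # the environment returns the next state and a reward
        agent.observe(action, reward, state)     # the agent uses the reward to improve its action rules
        total_reward += reward
        if done:
            break
    return total_reward                          # RL tries to maximize this quantity
```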

Page 10

State transition

• State transition: the agent moves through states s_t, s_{t+1}, s_{t+2}, ..., s_{t+n}, taking actions a_t, a_{t+1}, ... and receiving rewards r_t, r_{t+1}, ..., r_{t+n}.

s_t: state at time t
a_t: an action taken at time t
r_t: reward at time t

Example: maze problem
Starting from the start cell, the agent passes through states s_t, s_{t+1}, s_{t+2}, ... until it reaches the goal at s_{t+n}, where it receives a reward of 100.
a_t: move right, a_{t+1}: move upward, a_{t+2}: move left, ..., a_{t+n}: do nothing (end)

Page 11

Trial-and-error learning

The concept of reinforcement learning is a trial-and-error learning method:

The agent decides an action and takes it.
  Success (it gets a reward) → take this action again.
  Failure (it gets a negative reward) → do not take this action again.

The reward (a scalar value) indicates whether the action was good or not; this is how the agent acquires knowledge.
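As a small sketch of this idea (the action names, learning rate, and exploration rate below are illustrative assumptions, not values from the study), an agent can keep a table of action values, prefer the best-valued action, and pull each estimate toward the rewards it actually receives:

```python
import random

# Trial-and-error learning sketch with a small action-value table.
values = {"turn_left": 0.0, "turn_right": 0.0, "go_forward": 0.0}
alpha, epsilon = 0.1, 0.2    # learning rate and exploration rate (illustrative)

def select_action():
    """Usually take the action believed to be best, sometimes explore a random one."""
    if random.random() < epsilon:
        return random.choice(list(values))
    return max(values, key=values.get)

def update(action, reward):
    """A positive reward pulls the estimate up; a negative reward pushes it down."""
    values[action] += alpha * (reward - values[action])
```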

Page 12

Future prediction

• Reinforcement learning estimates future rewards and takes actions accordingly.

[Figure: from the current state s_t, the agent takes actions a_t, a_{t+1}, a_{t+2}, ... and receives rewards r_t, r_{t+1}, r_{t+2}, ... while moving into the future states s_{t+1}, s_{t+2}, s_{t+3}.]

Page 13

Future prediction

• Reinforcement learning considers not only the current reward but also future rewards.

Case 1: rewards r_t = 1, r_{t+1} = 1, r_{t+2} = 1
Case 2: rewards r_t = 0, r_{t+1} = 0, r_{t+2} = 100
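A minimal worked sketch of this comparison, using a discounted sum of future rewards (the discount factor 0.9 is an illustrative assumption, not a value from the study):

```python
# Comparing the two cases above with a discounted sum of future rewards.
gamma = 0.9

def discounted_return(rewards):
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

case1 = discounted_return([1, 1, 1])     # 1 + 0.9 + 0.81 = 2.71
case2 = discounted_return([0, 0, 100])   # 0 + 0   + 81.0 = 81.0
# Judging only by the immediate reward, Case 1 looks better (1 > 0),
# but predicting future rewards shows that Case 2 is far better.
```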

Page 14

V GNP with evolution and learning

Page 15

Genetic Network Programming (GNP)

GNP is a form of evolutionary computation.

What is evolutionary computation?

solution = gene

• Solutions (programs) are represented by genes.
• The programs are evolved (changed) by selection, crossover and mutation.

Page 16

Structure of GNP

Graph structure and its gene structure (example rows of the gene table):

0 0 3 4
0 1 1 6
0 2 5 7
1 0 8 0
1 0 0 4
1 5 1 2
... ... ... ...

• GNP represents its programs using directed graph structures.
• The graph structures can be represented as gene structures.
• The graph structure is composed of processing nodes and judgment nodes.
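A rough sketch of such a representation in Python; the field names and their interpretation are assumptions for illustration, and the study's exact gene encoding may differ:

```python
from dataclasses import dataclass
from typing import List

# A GNP individual sketched as a directed graph stored as a flat list of node genes.
@dataclass
class NodeGene:
    node_type: int            # e.g. 0 = judgment node, 1 = processing node (assumed convention)
    function_id: int          # which sensor to judge, or which action to take
    connections: List[int]    # indices of the next nodes, one per branch

# A tiny illustrative individual: judgment nodes carry one connection per judgment
# result, processing nodes carry a single outgoing connection.
individual = [
    NodeGene(node_type=0, function_id=1, connections=[1, 2]),  # judge sensor 1: branch to node 1 or 2
    NodeGene(node_type=1, function_id=0, connections=[2]),     # processing: take an action, then go to node 2
    NodeGene(node_type=0, function_id=5, connections=[0, 1]),  # judge sensor 5: branch to node 0 or 1
]
```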

Page 17

Khepera robot

• The Khepera robot is used for the performance evaluation of GNP.

Sensors: each sensor value is close to zero when the robot is far from obstacles and close to 1023 when it is close to obstacles.

Wheels: the speed of the right wheel V_R and the speed of the left wheel V_L each range from -10 (backward) to 10 (forward).

Page 18

Node functions

Processing node: each node determines an agent action.
  Example (Khepera robot behavior): set the speed of the right wheel at 10.

Judgment node: each node selects a branch based on the judgment result.
  Example: judge the value of sensor 1 and branch on "500 or more" / "less than 500".

Page 19

An example of node transition

Judge sensor 1 (branches: the value is 700 or more / the value is less than 700)
→ Judge sensor 5 (branches: 80 or more / less than 80)
→ Set the speed of the right wheel at 5
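A small sketch of how this transition could be executed on one set of sensor readings; the thresholds and the wheel-speed setting come from the example above, while the exact branch wiring and the helper names are illustrative assumptions:

```python
# Sketch of running the node-transition example above.
def step(sensors):
    if sensors[1] >= 700:                         # judgment node: "judge sensor 1"
        if sensors[5] < 80:                       # judgment node: "judge sensor 5"
            return ("right_wheel_speed", 5)       # processing node: set the right wheel speed at 5
        return ("other_branch", None)             # branch not shown in the example
    return ("other_branch", None)

print(step({1: 750, 5: 60}))   # -> ('right_wheel_speed', 5)
```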

Page 20

Flowchart of GNP

start
→ Generate an initial population (initial programs)
→ one generation: task execution with reinforcement learning,
   then evolution (selection / crossover / mutation)
→ repeat until the last generation
→ stop
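As a sketch, the flowchart could be written as the following loop; the callables passed in are placeholders for the actual operators, not the study's API:

```python
def run_gnp(init_population, execute_task, learn, evolve, generations):
    """Sketch of the GNP flowchart: evaluate, learn during the task, then evolve."""
    population = init_population()                              # generate the initial programs
    for _ in range(generations):                                # one generation per loop
        for individual in population:
            individual["fitness"] = execute_task(individual)    # task execution ...
            learn(individual)                                   # ... with reinforcement learning
        population = evolve(population)                         # selection / crossover / mutation
    return max(population, key=lambda ind: ind["fitness"])      # best program at the last generation
```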

Page 21

Evolution of GNP: selection

Select good individuals (programs) from the GNP population based on their fitness.
Fitness indicates how well each individual achieves the given task.
The selected individuals are used for crossover and mutation.

Page 22

Evolution of GNP: crossover

Some nodes and their connections are exchanged between two individuals (Individual 1 and Individual 2).

Page 23

Evolution of GNP: mutation

• Change connections: a node's connection is rewired to another node.
• Change node function: a node's function is replaced with another one (e.g. processing nodes such as "Speed of right wheel: 5" or "Speed of left wheel: 10").
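A minimal sketch of these two operators on a list-of-nodes representation; the representation, the rates, and the uniform node exchange are illustrative assumptions rather than the study's exact operators:

```python
import random

def crossover(parent1, parent2, rate=0.1):
    """Exchange some nodes (with their connections) between two individuals."""
    child1, child2 = [dict(n) for n in parent1], [dict(n) for n in parent2]
    for i in range(len(child1)):
        if random.random() < rate:
            child1[i], child2[i] = child2[i], child1[i]   # swap the node at position i
    return child1, child2

def mutate(individual, num_nodes, rate=0.05):
    """Randomly rewire connections; node functions could be changed the same way."""
    for node in individual:
        node["connections"] = [random.randrange(num_nodes) if random.random() < rate else c
                               for c in node["connections"]]
    return individual
```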

Page 24

The role of learning

Example: a judgment node "Judge sensor 0" branches on "1000 or more" / "less than 1000", and a processing node sets the speed of the right wheel at 10. The robot collides with an obstacle.

• Judgment node: 1000 is changed to 500 in order to judge obstacles more sensitively.
• Processing node: 10 is changed to 5 so as not to collide with the obstacle.

Node parameters are changed by reinforcement learning.
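One way to realize such parameter learning, sketched below, is to give each node a few candidate parameters with Q-values and update them with a Sarsa-like rule; this is an illustrative formulation in the spirit of the slide, not the study's exact learning rule:

```python
import random

class LearnableNode:
    """A node holding candidate parameters (e.g. thresholds 1000/500 or speeds 10/5) with Q-values."""

    def __init__(self, candidates, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.candidates = candidates                 # candidate parameter values
        self.q = [0.0] * len(candidates)             # preference for each candidate
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self):
        """Pick the index of a candidate parameter (epsilon-greedy)."""
        if random.random() < self.epsilon:
            return random.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda i: self.q[i])

    def update(self, chosen, reward, next_q):
        """Move the chosen candidate's Q-value toward reward + gamma * next_q."""
        self.q[chosen] += self.alpha * (reward + self.gamma * next_q - self.q[chosen])

# Example: a judgment node whose threshold may be 1000 or 500.
threshold_node = LearnableNode(candidates=[1000, 500])
```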

Page 25

The aim of combining evolution and learning

• Create efficient programs
• Search for solutions faster

Evolution uses many individuals, and better ones are selected after task execution.
Learning uses one individual, and better action rules can be determined during task execution.

Page 26

VI Simulation

• Wall-following behavior
1. No sensor value may be more than 1000
2. At least one sensor value is more than 100
3. Move straight
4. Move fast

[Figure: simulation environment]

Reward(t) = C(t) * ((v_R(t) + v_L(t)) / 20) * (1 - |v_R(t) - v_L(t)| / 20)

fitness = Σ_{t=1}^{1000} Reward(t) / 1000

C(t) = 1 if conditions 1 and 2 are satisfied at time t
C(t) = 0 otherwise
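A direct sketch of this reward and fitness computation (the variable names are illustrative):

```python
# Wall-following reward and fitness above; v_R, v_L are wheel speeds in [-10, 10].
def reward(v_r, v_l, conditions_satisfied):
    """Large when the robot moves fast and straight while conditions 1 and 2 hold."""
    c = 1.0 if conditions_satisfied else 0.0
    return c * ((v_r + v_l) / 20.0) * (1.0 - abs(v_r - v_l) / 20.0)

def fitness(step_records):
    """Average reward over the 1000 time steps of one trial; step_records holds (v_R, v_L, C) per step."""
    return sum(reward(v_r, v_l, c) for v_r, v_l, c in step_records) / 1000.0
```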

Page 27

Node functions

Processing nodes (2 kinds): determine the speed of the right wheel; determine the speed of the left wheel.
Judgment nodes (8 kinds): judge the value of sensor 0, ..., judge the value of sensor 7.

Page 28

Simulation result

• Conditions
  – The number of individuals: 600
  – The number of nodes: 34 (judgment nodes: 24, processing nodes: 10)

[Figure: fitness (0 to 0.8) versus generation (0 to 1000); fitness curves of the best individuals averaged over 30 independent simulations, for GNP with learning and evolution and for standard GNP (GNP with evolution). A second figure shows the track of the robot from the start position.]

Page 29

Simulations in inexperienced environments

Simulation on the generalization ability: the best program obtained in the previous environment is executed in an inexperienced environment, starting from the start position. The robot can still show the wall-following behavior.

Page 30

VII Conclusion

• An algorithm of GNP that uses evolution and reinforcement learning is proposed.
  – The simulation results show that the proposed method can learn the wall-following behavior well.

• Future work
  – Apply GNP with evolution and reinforcement learning to real-world applications
    • Elevator control system
    • Stock trading model
  – Compare with other evolutionary algorithms

Page 31

VIII Other simulations

Example of Tileworld

[Figure: a Tileworld consisting of walls, floor, tiles, holes, and an agent.]

The agent can push a tile and drop it into a hole.
The aim of the agent is to drop as many tiles into holes as possible.

Fitness = the number of dropped tiles
Reward r_t = 1 (when dropping a tile into a hole)

Page 32

Node functions

Processing nodes:
• go forward
• turn right
• turn left
• stay

Judgment nodes:
• What is in the forward cell? (floor, tile, hole, wall or agent)
• What is in the backward cell? / the left cell? / the right cell?
• The direction of the nearest tile (forward, backward, left, right or nothing)
• The direction of the nearest hole
• The direction of the nearest hole from the nearest tile
• The direction of the second nearest tile

Page 33

Example of node transition

What is in the forward cell? (branches: floor, tile, hole, wall, agent)
→ Direction of the nearest hole (branches: forward, backward, left, right, nothing)
→ Go forward

Page 34

Simulation 1

– There are 30 tiles and 30 holes
– The same environment is used every generation
– Time limit: 150 steps

Environment I

Page 35

Fitness curve (simulation 1)

[Figure: fitness (0 to 20) versus generation (0 to 5000). Curves: GNP with learning and evolution, GNP with evolution, EP (evolution of finite state machines), GP (max depth 5), GP-ADFs (main tree: max depth 3, ADF: depth 2).]

Page 36

Simulation 2

• Put 20 tiles and 20 holes at random positions
• One tile and one hole appear just after the agent pushes a tile into a hole
• Time limit: 300 steps

Environment II (example of an initial state)

Page 37

Fitness curve (simulation 2)

[Figure: fitness (0 to 25) versus generation (0 to 5000). Curves: GNP with learning and evolution, GNP with evolution, EP, GP (max depth 5), GP-ADFs (main tree: max depth 3, ADF: depth 2).]

Page 38

Ratio of used nodes

[Figure: the ratio of used nodes for each node function (go forward, turn left, turn right, do nothing, judge forward, judge backward, judge left side, judge right side, direction of tile, direction of hole, direction of hole from tile, second nearest tile), compared between the initial generation and the last generation.]

Page 39

Summary of the simulations

Data on the best individuals obtained at the last generation (30 samples)

Simulation I
                     GNP-LE   GNP-E   GP      GP-ADFs   EP
Mean fitness         21.23    18.00   14.00   15.43     16.30
Standard deviation   2.73     1.88    4.00    1.94      1.99

T-test p-values (GNP-LE and GNP-E compared with the other methods):
1.04×10^-6, 3.13×10^-17, 3.17×10^-11, 3.03×10^-13, 1.32×10^-6, 5.31×10^-11, 5.95×10^-4

Simulation II
                     GNP-LE   GNP-E   GP      GP-ADFs   EP
Mean fitness         19.93    15.30   6.10    6.67      14.40
Standard deviation   2.43     3.88    1.75    3.19      2.54

T-test p-values (GNP-LE and GNP-E compared with the other methods):
5.90×10^-8, 1.53×10^-31, 5.91×10^-15, 7.46×10^-26, 1.36×10^-13, 2.90×10^-12, 1.46×10^-1

Page 40

Summary of the simulations

Calculation time comparison

Simulation I
                                            GNP with LE   GNP with E   GP      GP-ADFs   EP
Calculation time for 5000 generations [s]   1,717         1,019        3,281   3,252     2,802
Ratio (GNP with E = 1)                      1.68          1            3.22    3.19      2.75

Simulation II
                                            GNP with LE   GNP with E   GP       GP-ADFs   EP
Calculation time for 5000 generations [s]   2,734         1,177        12,059   5,921     1,584
Ratio (GNP with E = 1)                      2.32          1            10.25    5.03      1.35

Page 41

The program obtained by GNP

[Figure: the node transitions of the obtained program, shown step by step (steps 0 to 16).]

Page 42

Maze problem

[Figure: a maze consisting of walls, floor, a door, an agent, a key (K) and a goal (G).]

Objective: reach the goal as early as possible.
The key is necessary to open the door in front of the goal.
Time limit: 300 steps

fitness = remaining time (when reaching the goal); 0 (when the agent cannot reach the goal)
reward r_t = 1 (when reaching the goal)

Page 43

Node functions

Processing nodes: go forward, turn right, turn left, random (take one of the three actions randomly).
Judgment nodes: judge the forward cell, judge the backward cell, judge the left cell, judge the right cell.

Page 44

Fitness curve (maze problem)

[Figure: fitness (0 to 300) versus generation (0 to 3000). Curves: GNP with learning and evolution (GNP-LE), GNP with evolution (GNP-E), GP.]

Data on the best individuals obtained at the last generation (30 samples)

                                        GNP-LE   GNP-E   GP
Mean                                    253.0    246.2   227.0
Standard deviation                      0.00     2.30    37.4
Ratio of reaching the goal              100%     100%    100%
Ratio of obtaining the optimal policy   100%     3.3%    63%