43
ADAPTIVE HIGH-LEVEL STRATEGY LEARNING IN STARCRAFT SBGames 2013 Jiéverson Maissiat and Felipe Meneguzzi

Adaptive High-Level Strategy Learning in StarCraft

Embed Size (px)

Citation preview

Page 1: Adaptive High-Level Strategy Learning in StarCraft

ADAPTIVE HIGH-LEVEL STRATEGY LEARNING IN STARCRAFTSBGames 2013

Jiéverson Maissiat and Felipe Meneguzzi

Page 2: Adaptive High-Level Strategy Learning in StarCraft

Background

Page 3: Adaptive High-Level Strategy Learning in StarCraft

Reinforcement Learning

A learning technique for agents acting in the environment, observing the reached states and

the received rewards at each step

Page 4: Adaptive High-Level Strategy Learning in StarCraft

Reinforcement Learning

Page 5: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Updates the value of a state-action pair after the action has been taken in the state and an

immediate reward has been received

Page 6: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a)

● s represents the current state of the world;● a represents the action chosen by the agent;● Q(s,a) represents the value obtained the last time action a was executed at state s. This value is

often called Q-value.

Page 7: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) r

● r represents the reward obtained after performing action a in state s;

Page 8: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) fit( r )

● fit is responsible to adjust the reward;

Page 9: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = Q(s,a) + fit( r )

● The Q-value is increased with the adjusted reward;

Page 10: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = Q(s,a) + α * r

● α is the learning-rate, which determines the weight of new information over what the agent already knows: a factor of 0 prevents the agent from learning anything (by keeping the Q-value identical to its previous value) whereas a factor of 1 makes the agent consider all newly obtained information;

Page 11: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = 0 + α * r

● If the Q-value starts in 0...

Page 12: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = 0 + α * 2

● and the received reward is 2...

Page 13: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = 0 + 0.3 * 2

● with a learning-rate of 0.3...

Page 14: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

0.6 = 0 + 0.3 * 2

● than the Q-value is updated to 0.6;

Page 15: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

-2.4 = 0.6 + 0.3 * -10

● Others updates will be done over the first;

Page 16: Adaptive High-Level Strategy Learning in StarCraft

Q-Learning

Q(s,a) = -2.4 + 0.3 * r

● And so it follows.

Page 17: Adaptive High-Level Strategy Learning in StarCraft

Parameter Control

α / learning-rate

Influence how the state values are computed, and ultimately how a policy is generated

Page 18: Adaptive High-Level Strategy Learning in StarCraft

Learning Rate: α● High Learning-rate?

It can prevent the algorithm from converging, or lead to inaccuracies in the computed value of each state (e.g. a local maximum);

● Low Learning-rate?It can hampers the convergence of the Q-value;

● Varying the Learning-rate?A typical way to vary the α-value, is to start interactions with a value close to 1, and then decrease it over time toward 0.However, this approach is not effective for dynamic environments, since a drastic change in the environment with a learning-rate close to 0 prevents the agent from learning the optimal policy in the changed environment.

Page 19: Adaptive High-Level Strategy Learning in StarCraft

Exploration Policy

ε-greedy

It has the probability ε to select a random action, and a probability 1 - ε to select the optimal action (which has the highest Q-value)

Page 20: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Reasoning

Page 21: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Reasoning

Page 22: Adaptive High-Level Strategy Learning in StarCraft

StarCraft:Brood War

Page 23: Adaptive High-Level Strategy Learning in StarCraft

StarCraft

● Real-Time Strategy (RTS)

● Resource Management

● Battle!

Page 24: Adaptive High-Level Strategy Learning in StarCraft

StarCraft

Page 25: Adaptive High-Level Strategy Learning in StarCraft

StarCraft

Page 26: Adaptive High-Level Strategy Learning in StarCraft

META-LEVEL REINFORCEMENT LEARNING

Page 27: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Reasoning

Page 28: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Reasoning

Reinforcement Learning

Page 29: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Learning

Page 30: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Reasoning

ε-greedy

Page 31: Adaptive High-Level Strategy Learning in StarCraft

Meta-Level Action Selection

ε-greedy modified:Instead of keeping the exploitation-rate (ε-value) constant, we apply the

same meta-level reasoning to the ε-value, increasing the exploration rate, whenever we find that the agent must increase its learning-rate.

The more the agent wants to learn, the more it wants to explore; if there is nothing to learn, there is nothing to explore.

ε = α

Page 32: Adaptive High-Level Strategy Learning in StarCraft

Experimentsin StarCraft

Page 33: Adaptive High-Level Strategy Learning in StarCraft

Code Injection at StarCraft

● BWAPI○ Framework in C++ to inject code at

StarCraft;

● BTHAI○ Bot implemented in BWAPI, with different

high-level strategies;

Page 34: Adaptive High-Level Strategy Learning in StarCraft

A Reinforcement Learning Approach for StarCraft● High-Level Strategy

○ Learning and selection of strategies rather than micro-actions;

● Learn the end of each match○ Feedback of victory: +1○ Feedback of defeat: -1

● Terran vs CPU○ It was assumed that the agent plays only

as Terran;

Page 35: Adaptive High-Level Strategy Learning in StarCraft

List of Terran Strategies of BTHAI● Marine Rush

● Wraith Harass

● Terran Defensive

● Terran Defensive FB

● Terran Push

Page 36: Adaptive High-Level Strategy Learning in StarCraft

Results

Page 37: Adaptive High-Level Strategy Learning in StarCraft
Page 38: Adaptive High-Level Strategy Learning in StarCraft
Page 39: Adaptive High-Level Strategy Learning in StarCraft
Page 40: Adaptive High-Level Strategy Learning in StarCraft
Page 41: Adaptive High-Level Strategy Learning in StarCraft

Source CodeBWAPI & StarCrafthttps://github.com/jieverson/BTHAIMOD

Page 42: Adaptive High-Level Strategy Learning in StarCraft
Page 43: Adaptive High-Level Strategy Learning in StarCraft

[email protected] Alegre, 18 de outubro de 2013

Thank You