37
Learning to Win by Reading Manuals in a Monte-Carlo Framework S.R.K. Branavan, David Silver, Regina Barzilay MIT

Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Learning to Win by Reading Manuals in a Monte-Carlo Framework

S.R.K. Branavan, David Silver, Regina Barzilay

MIT

Page 2: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Traditional view:

Map text into an abstract representation

Alternative view:

Map text into a representation which helps

performance in a control application

Semantic Interpretation

2

Page 3: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Semantic Interpretation for Control Applications

3

lost

won

lost

End resultComplex strategy game

action 1

action 2

action 3

Traditional approach:

Learn action-selection policy from game feedback.

Our contribution:

Use textual advice to guide action-selection policy.

Page 4: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Leveraging Textual Advice: Challenges

4

1. Find sentences relevant to given game state.

Game state Strategy document

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …

settlercity

Page 5: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

5

Game state Strategy document

Leveraging Textual Advice: Challenges

1. Find sentences relevant to given game state.

settlercity

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible. You can also use settlers to irrigate land near your city. In order to survive and grow …

Page 6: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

settlercity

settler

6

Game state Strategy document

You start with two settler units. Although settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is suitable for the construction of your first city. Use settlers to build the city on grassland with a river running through it if possible.You can also use settlers to irrigate land near your city. In order to survive and grow …

Leveraging Textual Advice: Challenges

1. Find sentences relevant to given game state.

Page 7: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Leveraging Textual Advice: Challenges

7

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

2. Label sentences with predicate stucture.

Move the settler to a site suitable for building a city, onto grassland with a river if possible.

move_settlers_to()

settlers_build_city()

?

?

move_settlers_to()

Label words as action, state or background

Page 8: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Leveraging Textual Advice: Challenges

8

Build the city onplains or grassland with a river running through it if possible.

a1 – move_settlers_to(7,3)

S

a2 – settlers_build_city()

a3 – settlers_irrigate_land()

3. Guide action selection using relevant text

Page 9: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Learning from Game Feedback

9

Goal: Learn from game feedback as only source of supervision.

Key idea: Better parameter settings will lead to more victories.

You start with two settler units. Although

settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is

suitable for the construction of your first city. Use settlers to build the city on plains or grassland with a river running

through it if possible. In order to survive and grow …

a1

S

a2

a3

wonEnd result

a1

S

a2

a3 lost

End result

Model params:

θ1

Model params:

θ2

You start with two settler units. Although

settlers are capable of performing a variety of useful tasks, your first task is to move the settlers to a site that is

suitable for the construction of your first city. Use settlers to build the city on plains or grassland with a river running

through it if possible. In order to survive and grow …

Game manual

Game manual

Page 10: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Model Overview

Monte-Carlo Search Framework

• Learn action selection policy from simulations

• Very successful in complex games like Go and Poker.

Our Algorithm

• Learn text interpretation from simulation feedback

• Bias action selection policy using text

10

Page 11: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Monte-Carlo Search

11

Actual Game

State 1

Simulation

Irrigate

State 1

Copy

Game lost

Copygame

???

Select actions via simulations, game and opponent can be stochastic

Page 12: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

3.53.53.5

Monte-Carlo Search

Try many candidate actions from current state & see how well they perform.

State 1Current game state

Game scores

Ro

llou

t d

epth

0.1 0.4 3.51.2

12

Page 13: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Current game state

Monte-Carlo Search

Try many candidate actions from current state & see how well they perform.

Learn feature weights from simulation outcomes

State 1

Game scores

Ro

llou

t d

epth

0.1 0.4 3.51.2

13

- feature function

- model parameters

. . . . . . . . .

5 1 0 1 1 0 1 = 0.1

15 0 1 0 0 1 0 = 0.4

37 1 0 1 0 0 0 = 1.2

Page 14: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Model Overview

Monte-Carlo Search Framework

• Learn action selection policy from simulations

Our Algorithm

• Bias action selection policy using text

• Learn text interpretation from simulation feedback

14

Page 15: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

• Identify sentence relevant to game state

• Label sentence with predicate structure

• Estimate value of candidate actions

Modeling Requirements

15

Build cities near rivers or ocean.

Build cities near rivers or ocean. Build cities near rivers or ocean.

Build cities near rivers or ocean.

Fortify :

Irrigate : -10

. . . .

Build city :

-5

25

Page 16: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Sentence Relevance

16

Sentence is selected as relevant

State , candidate action , document

- weight vector

- feature functionLog-linear model:

Identify sentence relevant to game state and action

1

2

3

Page 17: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Word index , sentence , dependency info

- weight vector

- feature functionLog-linear model:

Predicate Structure

17

Select word labels based on sentence + dependency info

E.g., “Build cities near rivers or ocean.”

Predicate label = { action, state, background }

1

2

3

Page 18: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

- weight vector

- feature functionLinear model:

Final Q function approximation

18

State , candidate action

Predict expected value of candidate action

Document , relevant sentence , predicate labeling

1

2

3

Page 19: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Multi-layer neural network: Each layer represents a different stage of analysis

Model Representation

19

Input:game state, candidate action, document text

Select most relevant sentence

Q function approximation

Predict sentence predicate structure

Predicted action value

Page 20: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Parameter Estimation

20

Objective: Minimize mean square error between

predicted utility

and observed utility

25

State

Predicted utility:

Action

Observed utility:

Game rollout

Page 21: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Parameter Estimation

21

Method: Gradient descent – i.e., Backpropagation.

Parameter updates:

Page 22: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Features

22

State features:

- Amount of gold in treasury

- Government type

- Terrain surrounding current unit

Action features:

- Unit type (settler, worker, archer, etc)

- Unit action type

Text features:

- Word

- Parent word in dependency tree

- Word matches text label of unit

Page 23: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Experimental Domain

23

Sentences: 2083

Avg. sentence words: 16.7

Vocabulary: 3638

Document:

• Official game manual of Civilization II

Text Statistics:

Game:

• Complex, stochastic turn-based strategy game Civilization II.

• Branching factor: 1020

Page 24: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Experimental Setup

24

Game opponent:

• Built-in AI of Game.

• Domain knowledge rich AI, built to challenge humans.

Primary evaluation:

• Games won within first 100 game steps.

• Averaged over 200 independent experiments.

• Avg. experiment runtime: 1.5 hours

Secondary evaluation:

• Full games won.

• Averaged over 50 independent experiments.

• Avg. experiment runtime: 4 hours

Page 25: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Results

25

0% 20% 40% 60%

Full model

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Page 26: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Does Text Help ?

26

0% 20% 40% 60%

Full model

Game only

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Linear Q fn. approximation,

No text

Page 27: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Text vs. Representational Capacity

27

0% 20% 40% 60%

Full model

Game only

Latent variable

Built-in AI 0%

Non-Linear Q fn. approximation,

No text

% games won in 100 turns, averaged over 200 runs.

Page 28: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Linguistic Complexity vs. Performance Gain

28

0% 20% 40% 60%

Full model

Sentence relevance

Game only

Latent variable

Built-in AI 0%

% games won in 100 turns, averaged over 200 runs.

Page 29: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Results: Sentence Relevance

29

Problem: Sentence relevance depends on game state.

States are game specific, and not known a priori!

Solution: Add known non-relevant sentences to text.

E.g., sentences from the Wall Street Journal corpus.

Results: 71.8% sentence relevance accuracy…

Surprisingly poor accuracy given game win rate!

Page 30: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Results: Sentence Relevance

30

Sen

ten

ce r

elev

ance

acc

ura

cyTe

xt fe

atu

re im

po

rtan

ce

Text features

Game features

Page 31: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Results: Full Games

31

0% 20% 40% 60% 80% 100%

Full model

Latent variable

Percentage games won, averaged over 50 runs

Game only

Page 32: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Grounded Language Acquisition: Instruction InterpretationBranavan et al. 2009, 2010, Vogel & Jurafsky 2010

• Imperative descriptions of action sequences• Assume relevance of text to current world state

Language Analysis in GamesEisenstein et al. 2009

• Extract high-level semantic representation from text• Learn game rules from labeled traces + extracted formulae

Gorniak & Roy 2005

• Interpret spoken commands to control game character• Learn from labeled parallel corpus

Related Work

32

Page 33: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

• Human knowledge encoded in natural language can be automatically leveraged to improve control applications.

• Environment feedback is a powerful supervision signal for language analysis.

• Method is applicable to control applications that have an inherent success signal, and can be simulated.

Conclusions

Code, data & experimental framework available at:http://groups.csail.mit.edu/rbg/code/civ

33

Page 34: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

34

Page 35: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

Monte-Carlo Search: Summary

0.1 3.5 1.2 0.4 1.1 0.8 2.8 0.9 3.1 2.9 0 1.4

Game states and actions

Monte-CarloRollouts (simulations)

Use observed rollout scores to select game action

state 1 state 2 state 3action 1 action 2

35

Page 36: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

36

Model Complexity, Time and Performance

Page 37: Learning to Win by Reading Manuals in a Monte-Carlo Framework …people.csail.mit.edu/.../papers/acl2011_presentation.pdf · 2011-06-21 · Learning to Win by Reading Manuals in a

37

Dependency Information