David Chen Advisor: Raymond Mooney Research Preparation Exam August 21, 2008 Learning to Sportscast: A Test of Grounded Language Acquisition


David Chen

Advisor: Raymond Mooney

Research Preparation Exam

August 21, 2008

Learning to Sportscast: A Test of Grounded Language Acquisition


Semantics of Language

• The meaning of words, phrases, etc

• Crucial in communications

• Example: “Spanish goalkeeper Iker Casillas blocks the ball”

– Merriam-Webster: (transitive verb) to interfere usually legitimately with (as an opponent) in various games or sports

– WordNet: (v) parry, deflect


Language Grounding

• Problem: We are circularly defining the meanings of words in terms of other words.

• The meanings of many words are grounded in our perception of the physical world: red, ball, cup, run, hit, fall, etc.

– Symbol Grounding: Harnad (1990)

• Even many abstract words and meanings are metaphorical abstractions of terms grounded in the physical world: up, down, over, in, etc.

– Lakoff and Johnson’s Metaphors We Live By

• It’s difficult to put my ideas into words.

• Interest in competitions is up.


Grounding Language

Casillas blocks the ball

Block(Casillas)


Natural Language and Meaning Representation

Natural Language (NL): Casillas blocks the ball

Meaning Representation Language (MRL): Block(Casillas)

NL: A language that has evolved naturally, such as English, German, French, Chinese, etc.

MRL: Formal languages such as logic or any computer-executable code


Semantic Parsing and Tactical Generation

Semantic Parsing (NL → MRL): maps a natural-language sentence to a complete, detailed semantic representation

Tactical Generation (MRL → NL): generates a natural-language sentence from a meaning representation

NL: Casillas blocks the ball

MRL: Block(Casillas)


Learning Approach

Manually Annotated Training Corpora (NL/MRL pairs) → Semantic Parser Learner → Semantic Parser

Semantic Parser: NL → MRL

Learning Approach

Manually Annotated Training Corpora (NL/MRL pairs) → Tactical Generator Learner → Tactical Generator

Tactical Generator: MRL → NL

Example of Annotated Training Corpus

Natural Language (NL)              Meaning Representation Language (MRL)
Alice passes the ball to Bob       Pass(Alice, Bob)
Bob turns the ball over to John    Turnover(Bob, John)
John passes to Fred                Pass(John, Fred)
Fred shoots for the goal           Kick(Fred)
Paul blocks the ball               Block(Paul)
Paul kicks off to Nancy            Pass(Paul, Nancy)

Example of Annotated Training Corpus

Natural Language (NL)              Meaning Representation Language (MRL)
Alice passes the ball to Bob       P1(C1, C2)
Bob turns the ball over to John    P2(C2, C3)
John passes to Fred                P1(C3, C4)
Fred shoots for the goal           P3(C4)
Paul blocks the ball               P4(C5)
Paul kicks off to Nancy            P5(C5, C6)

Learning Language from Perceptual Context

• Constructing annotated corpora for language learning is difficult

• Children acquire language through exposure to linguistic input in the context of a rich, relevant, perceptual environment

• Ideally, a computer system can learn language in the same manner


Goals

• Learn to ground the semantics of language

• Learn language through correlated linguistic and visual inputs



Challenge

A linguistic input may correspond to many possible events:

“西班牙守門員擋下了球” (Chinese: “The Spanish goalkeeper blocked the ball”)

? Pass(GermanyPlayer1, GermanyPlayer2)

? Kick(GermanyPlayer2)

? Block(SpanishGoalie)

Overview

• Sportscasting task

• Related works

• Tactical generation

• Strategic generation

• Human evaluation


Learning to Sportscast

• Robocup Simulation League games

• No speech recognition

– Record commentaries in text form

• No computer vision

– Rule-based system to automatically extract game events in symbolic form

• Concentrate on linguistic issues


Robocup Simulation League

[Screenshot of a simulated game, with the commentary: Purple goalie blocked the ball]

Learning to Sportscast

• Learn to sportscast by observing sample human sportscasts

• Build a function that maps between natural language (NL) and meaning representation (MR)

– NL: Textual commentaries about the game

– MR: Predicate logic formulas that represent events in the game

Robocup Sportscaster Trace

Natural Language Commentary

Purple goalie turns the ball over to Pink8
Pink11 looks around for a teammate
Pink8 passes the ball to Pink11
Purple team is very sloppy today
Pink11 makes a long pass to Pink8
Pink8 passes back to Pink11

Meaning Representation

badPass(Purple1, Pink8)
turnover(Purple1, Pink8)
pass(Pink11, Pink8)
pass(Pink8, Pink11)
ballstopped
pass(Pink8, Pink11)
kick(Pink11)
kick(Pink8)
kick(Pink11)
kick(Pink11)
kick(Pink8)


Robocup Sportscaster Trace

Natural Language Commentary

Purple goalie turns the ball over to Pink8
Pink11 looks around for a teammate
Pink8 passes the ball to Pink11
Purple team is very sloppy today
Pink11 makes a long pass to Pink8
Pink8 passes back to Pink11

Meaning Representation

P6(C1, C19)
P5(C1, C19)
P2(C22, C19)
P2(C19, C22)
P0
P2(C19, C22)
P1(C22)
P1(C19)
P1(C22)
P1(C22)
P1(C19)

Robocup Data

• Collected human textual commentary for the 4 Robocup championship games from 2001-2004.

– Avg # events/game = 2,613

– Avg # sentences/game = 509

• Each sentence matched to all events within previous 5 seconds.

– Avg # MRs/sentence = 2.5 (min 1, max 12)

• Manually annotated with correct matchings of sentences to MRs (for evaluation purposes only).
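The 5-second matching rule can be sketched as follows. This is a minimal illustration rather than the authors' code: the `Event` and `Comment` types, the `candidate_mrs` helper, and every timestamp are hypothetical; only the window length and the MR strings come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float  # seconds into the game (hypothetical timestamp)
    mr: str      # meaning representation, e.g. "pass(Pink8, Pink11)"


@dataclass(frozen=True)
class Comment:
    time: float  # time at which the comment was recorded (hypothetical)
    text: str


def candidate_mrs(comments, events, window=5.0):
    """Pair each comment with every event in the `window` seconds before it."""
    pairs = []
    for c in comments:
        mrs = [e.mr for e in events if c.time - window <= e.time <= c.time]
        pairs.append((c.text, mrs))
    return pairs


# Made-up timestamps, for illustration only.
events = [
    Event(10.0, "badPass(Purple1, Pink8)"),
    Event(10.5, "turnover(Purple1, Pink8)"),
    Event(13.0, "kick(Pink8)"),
    Event(14.0, "pass(Pink8, Pink11)"),
    Event(25.0, "kick(Pink11)"),
]
comments = [
    Comment(12.0, "Purple goalie turns the ball over to Pink8"),
    Comment(16.0, "Pink8 passes the ball to Pink11"),
]
ambiguous = candidate_mrs(comments, events)
for text, mrs in ambiguous:
    print(text, "->", mrs)
```

Each sentence ends up with several candidate MRs, which is the ambiguity the learners have to resolve.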


Semantic Parser Learners

• Learn a function from NL to MR

NL: “Purple3 passes the ball to Purple5”

MR: Pass ( Purple3, Purple5 )

Semantic Parsing (NL → MR)

Tactical Generation (MR → NL)

• We experiment with two semantic parser learners

– WASP (Wong & Mooney, 2006; 2007)

– KRISP (Kate & Mooney, 2006)

WASP: Word Alignment-based Semantic Parsing

• Uses statistical machine translation techniques

– Synchronous context-free grammars (SCFG) [Wu, 1997; Melamed, 2004; Chiang, 2005]

– Word alignments [Brown et al., 1993; Och & Ney, 2003]

• Capable of both semantic parsing and tactical generation

KRISP: Kernel-based Robust Interpretation by Semantic Parsing

• Productions of MR language are treated like semantic concepts

• SVM classifier is trained for each production with string subsequence kernel

• These classifiers are used to compositionally build MRs of the sentences

• More resistant to noisy supervision but incapable of tactical generation


KRISPER: KRISP with EM-like Retraining

• Extension of KRISP that learns from ambiguous supervision [Kate & Mooney, 2007]

• Uses an iterative EM-like method to gradually converge on a correct meaning for each sentence.


1. Assume every possible meaning for a sentence is correct


2. Resulting NL-MR pairs are weighted and given to semantic parser learner


3. Estimate the confidence of each NL-MR pair using the resulting trained semantic parser


4. Use maximum weighted matching on a bipartite graph to find the best NL-MR pairs [Munkres, 1957]


5. Give the best pairs to the semantic parser learner in the next iteration, and repeat until convergence
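The five steps can be sketched as a small loop. This is a toy illustration under loud assumptions, not KRISP or WASP: `train` and `confidence` are a hypothetical word/predicate co-occurrence scorer standing in for the semantic parser learner, the uniform initial weights are an assumption, and `best_matching` uses exhaustive search in place of the Munkres algorithm cited in step 4, so it only suits tiny inputs. Identical MR strings are treated as the same event.

```python
from collections import defaultdict


def train(weighted_pairs):
    """Stand-in learner: weighted word/predicate co-occurrence counts."""
    cooc = defaultdict(float)        # (word, predicate) -> weighted count
    pred_total = defaultdict(float)  # predicate -> total weight
    for sentence, mr, weight in weighted_pairs:
        pred = mr.split("(")[0]
        pred_total[pred] += weight
        for word in set(sentence.lower().split()):
            cooc[(word, pred)] += weight
    return cooc, pred_total


def confidence(model, sentence, mr):
    """Average estimate of how often each word accompanies the MR's predicate."""
    cooc, pred_total = model
    pred = mr.split("(")[0]
    words = sentence.lower().split()
    if pred_total[pred] == 0:
        return 0.0
    return sum(cooc[(w, pred)] for w in words) / (pred_total[pred] * len(words))


def best_matching(candidates, score):
    """Exhaustive maximum-weight matching; each MR is used at most once."""
    sentences = list(candidates)
    best = (0.0, {})

    def search(i, used, total, chosen):
        nonlocal best
        if i == len(sentences):
            if total > best[0]:
                best = (total, dict(chosen))
            return
        s = sentences[i]
        search(i + 1, used, total, chosen)  # option: leave s unmatched
        for mr in candidates[s]:
            if mr not in used:
                chosen[s] = mr
                search(i + 1, used | {mr}, total + score(s, mr), chosen)
                del chosen[s]

    search(0, frozenset(), 0.0, {})
    return best[1]


def em_retrain(candidates, iterations=5):
    # Steps 1-2: treat every candidate as correct, with uniform weights.
    pairs = [(s, mr, 1.0 / len(mrs)) for s, mrs in candidates.items() for mr in mrs]
    matching = {}
    for _ in range(iterations):
        model = train(pairs)                                   # learner
        new = best_matching(                                   # steps 3-4
            candidates, lambda s, mr: confidence(model, s, mr))
        if new == matching:                                    # converged
            break
        matching = new
        pairs = [(s, mr, 1.0) for s, mr in matching.items()]   # step 5
    return matching


candidates = {
    "Pink8 passes to Pink11": ["kick(Pink8)", "pass(Pink8, Pink11)"],
    "Pink11 passes to Pink8": ["kick(Pink11)", "pass(Pink11, Pink8)"],
    "Pink3 kicks the ball": ["kick(Pink3)"],
    "Pink5 kicks the ball": ["kick(Pink5)"],
}
result = em_retrain(candidates)
for sentence, mr in result.items():
    print(sentence, "->", mr)
```

In this made-up example the two unambiguous "kicks" sentences let the scorer prefer `pass` for the "passes" sentences, and the matching stops changing after the second pass.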


Tactical Generation

• Learn how to generate NL from MR

• Example: Pass(Pink2, Pink3) → “Pink2 kicks the ball to Pink3”

• Two steps

1. Disambiguate the training data

2. Learn a language generator

WASPER

• WASP with EM-like retraining to handle ambiguous training data.

• Same augmentation as added to KRISP to create KRISPER.


KRISPER-WASP

• First train KRISPER to disambiguate the data

• Then train WASP on the resulting unambiguously supervised data.

WASPER-GEN

• Determines the best matching based on generation (MR→NL).

• Score each potential NL/MR pair by using the currently trained WASP⁻¹ generator.

• Compute NIST MT score [NIST report, 2002] between the generated sentence and the potential matching sentence.


NIST scores

Target: Purple2 quickly passes to Purple3

Candidate: Purple2 passes to Purple3

1-grams: Purple2, passes, to, Purple3 (4/4 matched)

2-grams: Purple2 passes, passes to, to Purple3 (2/3 matched)

3-grams: Purple2 passes to, passes to Purple3 (1/2 matched)

4-gram: Purple2 passes to Purple3 (0/1 matched)
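The n-gram counts in this example can be reproduced with a short sketch. It computes only the matched/total counts shown on the slide; the actual NIST score also weights n-grams by how informative they are and applies a length penalty, which this sketch omits. The helper names `ngrams` and `ngram_matches` are hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_matches(candidate, target, max_n=4):
    """For each n, count candidate n-grams that also occur in the target."""
    cand, targ = candidate.split(), target.split()
    result = {}
    for n in range(1, max_n + 1):
        cand_grams = ngrams(cand, n)
        targ_grams = set(ngrams(targ, n))
        matched = sum(1 for g in cand_grams if g in targ_grams)
        result[n] = (matched, len(cand_grams))
    return result


counts = ngram_matches(
    candidate="Purple2 passes to Purple3",
    target="Purple2 quickly passes to Purple3",
)
for n, (matched, total) in counts.items():
    print(f"{n}-grams: {matched}/{total}")
```

The extra word "quickly" in the target breaks every candidate n-gram that spans "Purple2 passes", which is why the match rate falls as n grows.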


System Overview

[Diagram: the Sportscaster's commentary and the Robocup Simulator's events form the Ambiguous Training Data. The Semantic Parser Learner trains a Semantic Parser on it, the parser is used to produce Unambiguous Training Data, and that data is fed back to the learner.]

Ambiguous Training Data

Sportscaster:
Purple7 loses the ball to Pink2
Pink2 kicks the ball to Pink5
Pink5 makes a long pass to Pink8
Pink8 shoots the ball

Robocup Simulator:
Turnover(purple7, pink2)
Pass(pink5, pink8)
Pass(purple5, purple7)
Kick(pink2)
Pass(pink2, pink5)
Kick(pink5)
Ballstopped
Kick(pink8)

Unambiguous Training Data

Purple7 loses the ball to Pink2     Turnover(purple7, pink2)
Pink2 kicks the ball to Pink5       Pass(pink2, pink5)
Pink5 makes a long pass to Pink8    Pass(pink5, pink8)
Pink8 shoots the ball               Kick(pink8)

KRISPER and WASPER

[Diagram: the System Overview loop, with KRISP or WASP as the Semantic Parser Learner. The resulting Semantic Parser turns the Ambiguous Training Data into Unambiguous Training Data.]

WASPER-GEN

[Diagram: the System Overview loop, with WASP as a Tactical Generator Learner. The resulting Tactical Generator, rather than a semantic parser, turns the Ambiguous Training Data into Unambiguous Training Data.]


Systems

System                                      Disambiguation               Learning language generator
WASP (lower baseline)                       Random                       WASP
KRISPER (Kate & Mooney, 2007)               KRISP                        N/A
WASPER                                      WASP                         WASP
KRISPER-WASP                                KRISP                        WASP
WASPER-GEN                                  WASP’s language generator    WASP
WASP with gold matching (upper baseline)    N/A                          WASP

Matching

• 4 Robocup championship games from 2001-2004.

– Avg # events/game = 2,613

– Avg # sentences/game = 509

• Leave-one-game-out cross-validation

• Metric:

– Precision: % of system’s annotations that are correct

– Recall: % of gold-standard annotations produced

– F-measure: Harmonic mean of precision and recall
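The three metrics can be written out as a small function. This is a sketch under the assumption that system and gold annotations are represented as sets of (sentence, MR) pairs; the `matching_scores` name and the example pairs are illustrative, not results from the experiments.

```python
def matching_scores(predicted, gold):
    """Precision, recall, and F-measure over sets of (sentence, MR) pairs."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative pairs only: 3 system annotations, 4 gold annotations, 2 in common.
gold = {
    ("Purple7 loses the ball to Pink2", "Turnover(purple7, pink2)"),
    ("Pink2 kicks the ball to Pink5", "Pass(pink2, pink5)"),
    ("Pink5 makes a long pass to Pink8", "Pass(pink5, pink8)"),
    ("Pink8 shoots the ball", "Kick(pink8)"),
}
predicted = {
    ("Purple7 loses the ball to Pink2", "Turnover(purple7, pink2)"),
    ("Pink2 kicks the ball to Pink5", "Kick(pink2)"),
    ("Pink8 shoots the ball", "Kick(pink8)"),
}
precision, recall, f_measure = matching_scores(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F={f_measure:.3f}")
```

With 2 of 3 system pairs correct and 2 of 4 gold pairs recovered, precision is 2/3, recall is 1/2, and the harmonic mean is 4/7.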


Matching Results

[Bar chart: average F-measure on leave-one-out experiments (y-axis from 0 to 0.8) for WASP, KRISPER, WASPER, and WASPER-GEN]

Tactical Generation

• 4 Robocup championship games from 2001-2004.

– Avg # events/game = 2,613

– Avg # sentences/game = 509

• Leave-one-game-out cross-validation

• NIST score [NIST report, 2002]

– Evaluate the quality of machine translations based on matching n-grams
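
The n-gram matching that such scores build on can be sketched as follows. This is a simplified illustration only: it computes clipped n-gram precision against a single reference and is not the full NIST metric, which additionally weights n-grams by how informative they are and applies a length penalty. The function names and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.
    Each match is clipped by the n-gram's count in the reference."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical generated sentence and human reference (illustration only)
candidate = "purple6 passes to purple2"
reference = "purple6 passes the ball to purple2"
unigram = ngram_precision(candidate, reference, 1)
bigram = ngram_precision(candidate, reference, 2)
```

Here all four candidate unigrams appear in the reference, while only two of the three candidate bigrams do.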


Tactical Generation Results

[Figure: bar chart of NIST score (axis from 0 to 6), average results on leave-one-out experiments, comparing WASP, WASPER, KRISPER-WASP, WASPER-GEN, and WASP with gold matching]


Overview

• Sportscasting task

• Related works

• Tactical generation

• Strategic generation

• Human evaluation


Strategic Generation

• Generation requires not only knowing how to say something (tactical generation) but also what to say (strategic generation).

• For automated sportscasting, one must be able to effectively choose which events to describe.


Example of Strategic Generation

pass ( purple7 , purple6 )

ballstopped

kick ( purple6 )

pass ( purple6 , purple2 )

ballstopped

kick ( purple2 )

pass ( purple2 , purple3 )

kick ( purple3 )

badPass ( purple3 , pink9 )

turnover ( purple3 , pink9 )

Strategic Generation

• For each event type (e.g. pass, kick) estimate the probability that it is described by the sportscaster.

• Requires correct NL/MR matching

– Use estimated matching from tactical generation

– Iterative Generation Strategy Learning
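
Given some NL/MR matching, the per-event-type probability can be estimated by simple counting. The sketch below assumes a matching is already available; the helper names are hypothetical, and the list of commented events is chosen for illustration rather than taken from any system output.

```python
from collections import Counter


def event_type(mr):
    """Event type of a meaning representation, e.g. 'pass' for 'pass(a,b)'."""
    return mr.split("(")[0].strip()


def comment_probabilities(events, commented):
    """Estimate, for each event type, the fraction of its occurrences
    that were matched to a sportscaster comment."""
    total = Counter(event_type(e) for e in events)
    described = Counter(event_type(e) for e in commented)
    return {t: described[t] / total[t] for t in total}


# Events in the style of the example trace
events = [
    "pass(purple7,purple6)", "ballstopped", "kick(purple6)",
    "pass(purple6,purple2)", "ballstopped", "kick(purple2)",
    "pass(purple2,purple3)", "kick(purple3)",
    "badPass(purple3,pink9)", "turnover(purple3,pink9)",
]
# Assumed matching: the events that received a comment (illustration only)
commented = [
    "pass(purple6,purple2)", "pass(purple2,purple3)", "turnover(purple3,pink9)",
]
probs = comment_probabilities(events, commented)
```

Under this assumed matching, two of the three passes and the single turnover are commented on, while kick, ballstopped, and badPass events never are.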


Iterative Generation Strategy Learning (IGSL)

• Directly estimates the likelihood of an event being commented on

• Self-training iterations to improve estimates

• Uses events not associated with any NL as negative evidence
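
One plausible reading of these three bullets is sketched below. It is an assumption-laden illustration, not the exact IGSL procedure: each sentence distributes one unit of credit over its candidate event types in proportion to the current estimates, and the credit is divided by the total number of occurrences of each type, so that occurrences never paired with a sentence lower the estimate. The function name, the iteration count, and the toy data are hypothetical.

```python
from collections import Counter


def igsl_sketch(all_events, sentence_candidates, iterations=10):
    """Iteratively estimate P(event type is commented on).

    all_events: event type of every event in the game, including events
        never paired with any sentence (these act as negative evidence).
    sentence_candidates: for each sentence, the event types of its
        candidate events (assumed to be drawn from all_events).
    """
    total = Counter(all_events)
    prob = {e: 1.0 for e in total}  # start with no preference between types
    for _ in range(iterations):
        credit = Counter()
        for candidates in sentence_candidates:
            z = sum(prob[e] for e in candidates)
            if z == 0:
                continue
            # Split one unit of credit according to the current estimates
            for e in candidates:
                credit[e] += prob[e] / z
        # Re-estimate: credited occurrences over all occurrences, capped at 1
        prob = {e: min(credit[e] / total[e], 1.0) for e in total}
    return prob


# Toy data: kicks are frequent but are never the only candidate for a sentence
all_events = ["pass"] * 2 + ["kick"] * 6 + ["turnover"]
sentence_candidates = [["pass", "kick"], ["pass", "kick"], ["turnover", "kick"]]
probs = igsl_sketch(all_events, sentence_candidates)
```

On this toy data the estimates for pass and turnover rise across iterations while the estimate for kick falls, because most kicks are not associated with any sentence.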


Strategic Generation Performance

• Evaluate how well the system can predict which events a human comments on

• Metric:

– Precision: % of system’s annotations that are correct

– Recall: % of gold-standard annotations correctly produced

– F-measure: Harmonic mean of precision and recall


Strategic Generation Results

[Figure: bar chart of F-measure (axis from 0 to 0.8), average results on leave-one-game-out cross-validation, comparing strategies inferred from WASP, inferred from KRISPER, inferred from WASPER, inferred from WASPER-GEN, IGSL, and inferred from gold matching]


Overview

• Sportscasting task

• Related works

• Tactical generation

• Strategic generation

• Human evaluation


Human Evaluation (Quasi Turing Test)

• 4 fluent English speakers as judges

• 8 commented game clips

– 2 minute clips randomly selected from each of the 4 games

– Each clip commented once by a human, and once by the machine

• Presented in random counter-balanced order

• Judges were not told which ones were human or machine generated


Demo Clip

• Game clip commentated using WASPER-GEN with IGSL, since this gave the best results for generation.

• FreeTTS was used to synthesize speech from textual output.


Human Evaluation

Score  English Fluency  Semantic Correctness  Sportscasting Ability
5      Flawless         Always                Excellent
4      Good             Usually               Good
3      Non-native       Sometimes             Average
2      Disfluent        Rarely                Bad
1      Gibberish        Never                 Terrible


Human Evaluation

Commentator  English Fluency  Semantic Correctness  Sportscasting Ability
Human        3.94             4.25                  3.63
Machine      3.44             3.56                  2.94
Difference   0.5              0.69                  0.69


Future Work

• Expand MRs beyond simple logic formulas

• Apply approach to learning situated language in a computer video-game environment (Gorniak & Roy, 2005)

• Apply approach to captioned images or video using computer vision to extract objects, relations, and events from real perceptual data (Fleischman & Roy, 2007)


Conclusion

• Current language learning work uses expensive, unrealistic training data.

• We have developed a language learning system that can learn from language paired with an ambiguous perceptual environment.

• We have evaluated it on the task of learning to sportscast simulated Robocup games.

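• The system learns to sportscast almost as well as humans: in the human evaluation, the machine scored 0.5 to 0.69 points below the human commentator on each 5-point scale.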


Backup Slides


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
ballstopped
kick ( purple6 )
pass ( purple6 , purple2 )
turnover ( purple3 , pink9 )
kick ( purple2 )
pass ( purple2 , purple3 )
kick ( purple3 )


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
pass ( purple6 , purple2 )
pass ( purple2 , purple3 )


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
kick ( purple6 )
kick ( purple2 )
kick ( purple3 )


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
kick ( purple3 )
ballstopped
kick ( purple6 )
pass ( purple6 , purple2 )
kick ( purple2 )
turnover ( purple3 , pink9 )
kick ( purple2 )
pass ( purple2 , purple3 )
kick ( purple3 )
kick ( purple3 )


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
kick ( purple3 )
kick ( purple6 )
kick ( purple2 )
kick ( purple2 )
kick ( purple3 )
kick ( purple3 )


Robocup Sportscaster Trace

Natural Language Commentary:
purple6 passes to purple2
purple3 loses the ball to pink9
purple2 makes a short pass to purple3

Meaning Representation:
kick ( purple3 )
kick ( purple6 )
kick ( purple2 )
kick ( purple2 )
kick ( purple3 )
kick ( purple3 )

Negative Evidence