Learning to Transform Natural to Formal Language
Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney
Presented by Ping Zhang (May 13th, 2006)


Page 1: Learning to Transform Natural to Formal Language

Learning to Transform Natural to Formal Language

Presented by Ping Zhang

Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney

Page 2: Learning to Transform Natural to Formal Language


Overview

- Background: SILT, CLANG and GEOQUERY
- Semantic Parsing using Transformation Rules
- String-based learning
- Tree-based learning
- Experiments
- Future work
- Conclusion

Page 3: Learning to Transform Natural to Formal Language


Natural Language Processing (NLP)

Natural language: human language, e.g., English.

The reason to process NL: to provide a more user-friendly interface.

Problems: NL is very complex and highly ambiguous; so far, NL cannot be used to program a computer.

Page 4: Learning to Transform Natural to Formal Language


Classification of Language

Traditional classification (Chomsky hierarchy):
- Regular grammars
- Context-free grammars (formal languages)
- Context-sensitive grammars
- Unrestricted grammars (natural language)

Currently, all programming languages are less flexible than context-sensitive languages. For example, C++ is a restricted context-sensitive language.

Page 5: Learning to Transform Natural to Formal Language


An Approach to process NL

Map a natural language to a formal query or command language.

Therefore, NL interfaces to complex computing and AI systems can be more easily developed.

Diagram: English --map--> Formal Language --> Compiler / Interpreter

Page 6: Learning to Transform Natural to Formal Language


Grammar Terms

Grammar: G = (N, T, S, P)
- N: finite set of non-terminal symbols
- T: finite set of terminal symbols
- S: starting non-terminal symbol, S ∈ N
- P: finite set of productions

Production: x -> y. For example:
- Noun -> "computer"
- AssignmentStatement -> i := 10;
- Statements -> Statement; Statements

where x ∈ (N ∪ T)+ and y ∈ (N ∪ T)*
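
To make the definition concrete, here is a minimal, hypothetical encoding of G = (N, T, S, P) in code; the class and variable names are my own illustration, not from the slides.

from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    lhs: tuple   # x: sequence of terminals/non-terminals
    rhs: tuple   # y: sequence of terminals/non-terminals

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # N
    terminals: frozenset      # T
    start: str                # S, with S in N
    productions: frozenset    # P

# A toy fragment built from the productions on this slide:
toy = Grammar(
    nonterminals=frozenset({"Statements", "Statement", "Noun"}),
    terminals=frozenset({"computer", ";"}),
    start="Statements",
    productions=frozenset({
        Production(("Noun",), ("computer",)),
        Production(("Statements",), ("Statement", ";", "Statements")),
    }),
)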

Page 7: Learning to Transform Natural to Formal Language


SILT

SILT—Semantic Interpretation by Learning Transformations

Transformation rules map substrings of NL sentences, or subtrees of their syntactic parse trees, to subtrees of the formal-language parse tree.

SILT learns transformation rules from training data: pairs of NL sentences and manually translated formal-language statements.

Two target formal languages: CLANG and GEOQUERY.
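
As a rough sketch (my own, not the authors' implementation), a string-based transformation rule can be thought of as an NL pattern paired with the production it introduces:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    pattern: tuple   # NL-side pattern: literal words and non-terminal placeholders
    lhs: str         # non-terminal of the production this rule introduces
    template: str    # formal-language template; placeholders get filled on application

# Hypothetical rule corresponding to the CONDITION example shown later in the talk:
bowner_rule = Rule(
    pattern=("TEAM", "UNUM", "has", "the", "ball"),
    lhs="CONDITION",
    template="(bowner TEAM {UNUM})",
)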

Page 8: Learning to Transform Natural to Formal Language


CLANG

A formal language used in coaching robotic soccer in the RoboCup Coach Competition.

CLANG grammar consists of 37 non-terminals and 133 productions.

All tactics and behaviors are expressed in terms of if-then rules

An example:

((bpos (penalty-area our))
 (do (player-except our {4}) (pos (half our))))

"If the ball is in our penalty area, all our players except player 4 should stay in our half."

Page 9: Learning to Transform Natural to Formal Language


GEOQUERY

A database query language for a small database of U.S. geography.

The database contains about 800 facts. The query language is based on Prolog, augmented with meta-predicates.

An example:

answer(A, count(B, (city(B), loc(B, C), const(C, countryid(usa))), A))

“How many cities are there in the US?”

Page 10: Learning to Transform Natural to Formal Language


Two methods

String-based transformation learning: directly maps strings of the NL sentences to the parse trees of the formal language.

Tree-based transformation learning: maps subtrees to subtrees between the two languages; assumes the syntactic parse trees (and a parser) for the NL sentences are provided.

Page 11: Learning to Transform Natural to Formal Language


Semantic Parsing

“TEAM UNUM has the ball”

CONDITION → (bowner TEAM {UNUM})

Syntactic parse of the sentence:
(S (NP TEAM UNUM) (VP (VBZ has) (NP (DT the) (NN ball))))

Page 12: Learning to Transform Natural to Formal Language


Examples of Parsing

1. “If our player 4 has the ball, our player 4 should shoot.”

2. "If TEAM UNUM has the ball, TEAM UNUM should ACTION."
   (bindings: TEAM = our, UNUM = 4; TEAM = our, UNUM = 4; ACTION = (shoot))

3. "If CONDITION, TEAM UNUM should ACTION."
   (bindings: CONDITION = (bowner our {4}); TEAM = our, UNUM = 4; ACTION = (shoot))

4. "If CONDITION, DIRECTIVE."
   (bindings: CONDITION = (bowner our {4}); DIRECTIVE = (do our {4} (shoot)))

5. RULE → ((bowner our {4}) (do our {4} (shoot)))
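
A rough simulation of this bottom-up rewriting on the string level (a sketch under my own simplified representation, not the authors' implementation; the lexical rules and the CONDITION pattern below are illustrative only):

from collections import namedtuple

# Same shape as the hypothetical Rule sketched earlier.
Rule = namedtuple("Rule", "pattern lhs template")

def match(pattern, window):
    """Return placeholder bindings if pattern matches the token window, else None.
    Tokens are (symbol, meaning) pairs; ALL-CAPS pattern elements are placeholders."""
    if len(window) != len(pattern):
        return None
    bindings = {}
    for pat, (sym, meaning) in zip(pattern, window):
        if pat.isupper():
            if sym != pat:
                return None
            bindings[pat] = meaning
        elif pat != sym:
            return None
    return bindings

def apply_rule(tokens, rule):
    """One left-to-right pass replacing every match of rule.pattern."""
    out, i, n = [], 0, len(rule.pattern)
    while i < len(tokens):
        bindings = match(rule.pattern, tokens[i:i + n])
        if bindings is not None:
            meaning = rule.template
            for name, value in bindings.items():
                meaning = meaning.replace(name, value)
            out.append((rule.lhs, meaning))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out

# Words start out meaning themselves; hypothetical lexical and phrasal rules then
# rewrite the sentence bottom-up, roughly mirroring steps 1-5 above.
sentence = [(w, w) for w in "our player 4 has the ball".split()]
rules = [
    Rule(("our",), "TEAM", "our"),
    Rule(("4",), "UNUM", "4"),
    Rule(("TEAM", "player", "UNUM", "has", "the", "ball"),
         "CONDITION", "(bowner TEAM {UNUM})"),
]
for r in rules:
    sentence = apply_rule(sentence, r)
print(sentence)   # [('CONDITION', '(bowner our {4})')]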

Page 13: Learning to Transform Natural to Formal Language


Variations of Rule Representation

SILT allows patterns to skip words or nodes, e.g. "if CONDITION, <1> DIRECTIVE.", where the gap <1> can absorb a word such as "then".

To deal with non-compositionality, SILT allows constraints on rule application: "in REGION" matches "CONDITION -> (bpos REGION)" only if "in REGION" follows "the ball <1>".

SILT allows templates with multiple productions: "TEAM player UNUM has the ball in REGION" maps to CONDITION → (and (bowner TEAM {UNUM}) (bpos REGION)).
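
One simple, hypothetical way to realize such gap patterns is to compile them to regular expressions; the <k> notation follows the slide, everything else is my own sketch:

import re

def pattern_to_regex(pattern):
    """Compile a SILT-style string pattern into a regex.
    <k> matches a gap of up to k arbitrary words; other tokens match literally.
    Assumes single-space tokenization; purely illustrative."""
    pieces = []
    for tok in pattern.split():
        gap = re.fullmatch(r"<(\d+)>", tok)
        if gap:
            pieces.append(r"(?:\S+ ){0,%s}" % gap.group(1))   # up to k skipped words
        else:
            pieces.append(re.escape(tok) + " ")
    return re.compile("".join(pieces).rstrip())

# CONDITION and DIRECTIVE stand for spans already rewritten to non-terminals.
rx = pattern_to_regex("if CONDITION, <1> DIRECTIVE.")
print(bool(rx.search("if CONDITION, then DIRECTIVE.")))   # True: <1> absorbs "then"
print(bool(rx.search("if CONDITION, DIRECTIVE.")))        # True: the gap may be empty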

Page 14: Learning to Transform Natural to Formal Language


Learning Transformation Rules

Input: a training set T of NL sentences paired with their formal representations; the set Π of productions in the formal grammar.
Output: a learned rule base L.
Algorithm:
  Parse all formal representations in T using Π.
  Collect positive examples P_π and negative examples N_π for every π ∈ Π.
  L = ∅
  Until all positive examples are covered, or no more good rules can be found for any π ∈ Π, do:
    R' = FindBestRules(Π, P, N)
    L = L ∪ R'
    Apply the rules in L to the sentences in T.

Given an NL sentence S:

- P_π: if π is used in the formal representation of S, then S is a positive example for π.
- N_π: if π is not used in the formal representation of S, then S is a negative example for π.
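
Rendered as Python pseudocode (my paraphrase of the slide; find_best_rules and covers are placeholders for the subroutines named on the slides, not real library calls):

def learn_rule_base(training_set, productions, find_best_rules, covers):
    """Sketch of SILT's outer learning loop.
    training_set: pairs (sentence, used), where used is the set of productions
    obtained by parsing the formal representation of the sentence.
    find_best_rules(productions, positives, negatives) and
    covers(rule_base, sentence, production) are supplied by the caller."""
    positives = {p: [s for s, used in training_set if p in used] for p in productions}
    negatives = {p: [s for s, used in training_set if p not in used] for p in productions}
    learned = set()
    # Until all positive examples are covered, or no more good rules can be found:
    while any(positives.values()):
        new_rules = find_best_rules(productions, positives, negatives)
        if not new_rules:
            break
        learned |= set(new_rules)
        # Apply the rules learned so far; positives they now cover are removed.
        for p in productions:
            positives[p] = [s for s in positives[p] if not covers(learned, s, p)]
    return learned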

Page 15: Learning to Transform Natural to Formal Language


Issues of SILT Learning

Non-compositionality

Rule cooperation: rules are learned in order, so an over-general ancestor leads to a group of over-general child rules, and no other rule can cooperate with rules of that kind.

Two approaches can address this:
1. Find the single best rule for all competing productions in each iteration.
2. Over-generate rules, then find a subset that can cooperate.

Page 16: Learning to Transform Natural to Formal Language


FindBestRule() For String-based Learning

Input: the set Π of productions in the formal grammar; sets of positive examples P_π and negative examples N_π for each π ∈ Π.
Output: the best rule BR.
Algorithm:
  R = ∅
  For each production π ∈ Π:
    Let R_π be the maximally-specific rules derived from P_π.
    Repeat for k = 1000 times:
      Choose r1, r2 ∈ R_π at random.
      g = GENERALIZE(r1, r2, π)
      Add g to R.
    R = R ∪ R_π
  BR = argmax_{r ∈ R} goodness(r)
  Remove the positive examples covered by BR from P.
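
A rough Python rendering of this search (my own sketch; generalize and score stand for the GENERALIZE and goodness routines described on the next slide, and using the whole positive sentence as the "maximally specific" pattern is a simplification):

import random

def find_best_rule(productions, positives, negatives, generalize, score, k=1000):
    """Sketch of string-based FindBestRule(): seed each production with maximally
    specific rules, generalize random pairs, and keep the highest-scoring candidate.
    score(rule, positives, negatives) implements the goodness measure."""
    candidates = []
    for p in productions:
        specific = [(tuple(s.split()), p) for s in positives[p]]   # maximally specific
        for _ in range(k):
            if len(specific) < 2:
                break
            r1, r2 = random.sample(specific, 2)
            candidates.append(generalize(r1, r2, p))
        candidates.extend(specific)
    if not candidates:
        return None
    best = max(candidates, key=lambda r: score(r, positives, negatives))
    # In the full algorithm, positive examples covered by `best` are then
    # removed from the corresponding positive sets.
    return best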

Page 17: Learning to Transform Natural to Formal Language


FindBestRule() Cont.

goodness(r) = pos(r)^2 / (pos(r) + neg(r))

GENERALIZE(r1, r2): r1 and r2 are two transformation rules based on the same production. For example:

π: REGION -> (penalty-area TEAM)
Pattern 1: TEAM 's penalty box
Pattern 2: TEAM penalty area
Generalization: TEAM <1> penalty
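
Concretely, the goodness score and the kind of word-level generalization shown above might look like this (an illustrative sketch, not the authors' code; the pos/neg counts would come from matching the rule against the example sentences):

import difflib

def goodness(pos_count, neg_count):
    """goodness(r) = pos(r)^2 / (pos(r) + neg(r)), as in the formula above."""
    total = pos_count + neg_count
    return pos_count ** 2 / total if total else 0.0

def generalize_patterns(p1, p2):
    """Word-level generalization: keep the common subsequence of the two patterns
    and write skipped stretches as gap tokens <k>. A rough reconstruction of the
    slide's example, not the paper's exact procedure."""
    a, b = p1.split(), p2.split()
    blocks = [blk for blk in difflib.SequenceMatcher(None, a, b).get_matching_blocks()
              if blk.size > 0]
    out, prev_a, prev_b = [], None, None
    for blk in blocks:
        if out:
            gap = max(blk.a - prev_a, blk.b - prev_b)   # words skipped on either side
            if gap > 0:
                out.append("<%d>" % gap)
        out.extend(a[blk.a:blk.a + blk.size])
        prev_a, prev_b = blk.a + blk.size, blk.b + blk.size
    return " ".join(out)

print(generalize_patterns("TEAM 's penalty box", "TEAM penalty area"))   # TEAM <1> penalty
print(goodness(10, 2))   # 8.333...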

Page 18: Learning to Transform Natural to Formal Language


Tree-based Learning

Uses a FindBestRules() algorithm similar to the string-based case.

GENERALIZE finds the largest common subgraph of the two rules' patterns. For example:

π: REGION -> (penalty-area TEAM)

Pattern 1: (NP (NP TEAM (POS 's)) (NN penalty) (NN box))
Pattern 2: (NP (PRP$ TEAM) (NN penalty) (NN area))
Generalization: (NP TEAM (NN penalty))
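
A toy version of finding common structure between two trees, using nested tuples (my own, deliberately weaker notion: the largest subtree occurring verbatim in both patterns, whereas SILT's largest-common-subgraph step can also keep partially shared nodes such as TEAM above):

# A tree is (label, child, ...); leaves are plain strings.

def subtrees(t):
    """Yield every subtree of a nested-tuple parse tree."""
    yield t
    if not isinstance(t, str):
        for child in t[1:]:
            yield from subtrees(child)

def size(t):
    return 1 if isinstance(t, str) else 1 + sum(size(c) for c in t[1:])

def largest_shared_subtree(t1, t2):
    """Largest subtree that occurs verbatim in both trees (a simplification)."""
    shared = set(subtrees(t1)) & set(subtrees(t2))
    return max(shared, key=size) if shared else None

pattern1 = ("NP", ("NP", "TEAM", ("POS", "'s")), ("NN", "penalty"), ("NN", "box"))
pattern2 = ("NP", ("PRP$", "TEAM"), ("NN", "penalty"), ("NN", "area"))
print(largest_shared_subtree(pattern1, pattern2))   # ('NN', 'penalty')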

Page 19: Learning to Transform Natural to Formal Language


Experiment

For CLANG: 300 instructions were selected randomly from the log files of the 2003 RoboCup Coach Competition. Each formal instruction was translated into English by a human. The average length of an NL sentence is 22.52 words.

For GEOQUERY: 250 questions were collected from undergraduate students. All English queries were translated manually. The average length of an NL sentence is 6.87 words.

Page 20: Learning to Transform Natural to Formal Language


Result for CLANG

Page 21: Learning to Transform Natural to Formal Language


Result for CLANG (Cont.)

Page 22: Learning to Transform Natural to Formal Language


Result for GEOQUERY

Page 23: Learning to Transform Natural to Formal Language


Result for GEOQUERY (Cont.)

Page 24: Learning to Transform Natural to Formal Language


Time Consumed

Time consumed, in minutes.

Page 25: Learning to Transform Natural to Formal Language


Future Work

Though improved, SILT still lacks the robustness of statistical parsing.

The hard-matching symbolic rules of SILT are sometimes too brittle.

A more unified implementation of tree-based SILT would allow directly comparing and evaluating the benefit of using initial syntactic parsers.

Page 26: Learning to Transform Natural to Formal Language


Conclusion

A novel approach, SILT, can learn transformation rules that map NL sentences into a formal language.

It shows better overall performance than previous approaches.

NLP still has a long way to go.

Page 27: Learning to Transform Natural to Formal Language


Thank you!

Questions or comments?