SWG Strategy v1 (C) Copyright IBM Corp. 2006, 2011. All Rights Reserved. ACITA 2011 demonstration of ongoing NLP work Dave Braines, David Mott, ETS, Hursley, IBM UK Steve Poteet, Boeing

SWG Strategy – Emerging Technology Services, Hursley

Supporting the analyst

[Figure: the analyst supported by CE tools. Elements shown: documents (doc27), NLP, CE facts, inference, rationale, argumentation, search, the Analysts Conceptual Model, assumptions, uncertainty, requirements and the analysis product.]

Controlled English

A Controlled Natural Language, being a subset of English

– limited syntax, but still readable as English

– meanings of the expressions unambiguously defined

Avoids the complexity of a real Natural Language

– computer systems can read, interpret and apply it

Retains the appearance of a real language

– humans can naturally use it, without learning "computer speak"

The analyst may use Controlled English to construct their Conceptual Model

the person John is married to the person Jane and has red as hair colour.
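Because the meaning of each CE construct is fixed, even a very small program can turn the example sentence into facts. The following is a minimal Python sketch under stated assumptions, not the CE tooling: the regular expressions and the `parse_ce` function are hypothetical, and they accept only the two clause shapes that occur in this one sentence.

```python
import re

# Hypothetical toy recogniser (not real CE grammar). It accepts only:
#   the <concept> <name> <clause> and <clause> ... .
# where a clause is "has <value> as <property>" or
# "<relation words> the <concept> <name>".
SUBJECT = re.compile(r"^the (\w+) (\w+) (.*)\.$")
HAS_VALUE = re.compile(r"^has (\w+) as ([\w ]+)$")
RELATION = re.compile(r"^([\w ]+?) the (\w+) (\w+)$")


def parse_ce(sentence):
    """Return the facts of one CE sentence as (predicate, subject, object) triples."""
    match = SUBJECT.match(sentence.strip())
    if match is None:
        raise ValueError(f"not a recognised CE sentence: {sentence!r}")
    concept, name, rest = match.groups()
    facts = [("is a", name, concept)]
    # Naive split: assumes " and " only ever separates clauses.
    for clause in rest.split(" and "):
        if (value_match := HAS_VALUE.match(clause)) is not None:
            value, prop = value_match.groups()
            facts.append((prop, name, value))
        elif (rel_match := RELATION.match(clause)) is not None:
            relation, obj_concept, obj_name = rel_match.groups()
            facts.append(("is a", obj_name, obj_concept))
            facts.append((relation, name, obj_name))
        else:
            raise ValueError(f"unrecognised clause: {clause!r}")
    return facts


if __name__ == "__main__":
    for fact in parse_ce(
        "the person John is married to the person Jane and has red as hair colour."
    ):
        print(fact)
```

Each clause shape maps to exactly one kind of fact, which is the property that lets a machine read CE without guessing.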

Current NLP Research Objectives

Improve Natural Language Processing of facts from documents

– analyst may utilise more information when inferencing

Allow the humans to be part of the NL processing

– hybrid reasoning about ambiguities, incomplete parsing, etc.

Facilitate configuration of NLP tools in CE

– define a model of linguistics, grammar and semantics

Improve Expressibility of CE

– much interest, but needs a more powerful grammar

How is the Analyst's Conceptual Model related to Natural Language?

We have used CE to model: [5]

• Collaborative Planning
• Analysis of IED activities and societal influences
• Matching Sensors to Missions
• Provenance
• Social Networks (Twitter)
• UK Government data (crimes, accidents, schools)
• NL processing itself

Our design principles for CE enhancement

Retain existing principles of a CE conceptual model

Based on full English grammar

Chart parser for efficient syntax parsing

Formal semantics, based upon scientific theory

Higher level extensions handled in same theory

Parser configurable in CE, based on linguistic model

Modelling of Sentence Context

Aim to significantly enhance CE expressivity

Parallel NL and CNL parsers

[Figure: an NL parser and a CNL parser in parallel, sharing a lexicon, the conceptual model, a reference English grammar and a semantic theory. Labels: "NLP", "Better understanding of linguistics", "Increase expressibility of CE", "expressive CE", "basic CE or predicate logic".]

Control of ambiguity

We start from basic CE and move towards full English.

How do we handle crossing the ambiguity barrier?

[Figure: a progression from Basic CE towards Full English, with ambiguity increasing as features are added: verb inflections, domain-specific syntax, sub-clauses, prepositional phrases, flexible identities and anaphoric reference. An "Ambiguity Barrier" is marked along the way.]

Stanford parser as reference

But it only provides syntax: what about semantics?

[Figure: the sentence "there is a person named Joe." as parsed by the Stanford parser and by the CE parser.]

Extended CE Parser

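Full English syntax (parse tree):

    (S (NP (EX there))
       (VP (VBZ is)
           (NP (NP (DT a) (NN person))
               (VP (VBN named)
                   (NP (NNP Joe))))))

Semantics (based on Montague semantics), built up from the constituents:

    a person                 v(A), person(A)
    named Joe                v(A), A=Joe
    a person named Joe       v(A), A=Joe, person(A)
    is a person named Joe    v(A), A=Joe, person(A)
    there                    exists(A)
    whole sentence           person(Joe)

Frame annotations: @copula, @be, @postmodifier, @nonfinite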
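The semantic column can be read as two operations: conjoining fragments that describe the same individual, then closing off the variable. Here is a minimal Python sketch of that idea, assuming fragments are plain tuples of literals; `Sem`, `conjoin`, `close` and `show` are hypothetical names, and this is a simplification rather than the parser's actual Montague machinery. "named Joe" is given a fresh variable B here only to show the renaming step; the slide writes it directly as v(A), A=Joe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sem:
    """A semantic fragment v(var), literal, ...; a literal is (predicate, args...)."""
    var: str
    literals: tuple


def conjoin(head: Sem, modifier: Sem) -> Sem:
    """Combine two fragments about the same individual: rename the
    modifier's variable to the head's, then concatenate the literals."""
    renamed = tuple(
        (lit[0],) + tuple(head.var if arg == modifier.var else arg for arg in lit[1:])
        for lit in modifier.literals
    )
    return Sem(head.var, head.literals + renamed)


def close(sem: Sem) -> list:
    """Existential closure: substitute the variable by any constant it is
    identified with ("=" literals) and drop the identity literals."""
    value = sem.var
    for lit in sem.literals:
        if lit[0] == "=" and lit[1] == sem.var:
            value = lit[2]
    return [
        (lit[0],) + tuple(value if arg == sem.var else arg for arg in lit[1:])
        for lit in sem.literals
        if lit[0] != "="
    ]


def show(literal) -> str:
    return f"{literal[0]}({', '.join(literal[1:])})"


a_person = Sem("A", (("person", "A"),))     # "a person"  -> v(A), person(A)
named_joe = Sem("B", (("=", "B", "Joe"),))  # "named Joe" -> v(B), B=Joe
np = conjoin(a_person, named_joe)           # v(A), person(A), A=Joe
print([show(lit) for lit in close(np)])     # the copula passes np through
```

In this sketch the copula contributes nothing of its own, and closure plays the role that exists(A) plays for "there".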

Linguistic Frame

there is a linguistic frame named vp0 that

has 'is the dog Fido' as example and

defines the verb phrase VP_vp0 and

has the sequence

( the copula BE_vp0 , and the noun phrase OBJ_vp0 )

as syntactic pattern and

is predicated on the thing T and

has the statement that

( the noun phrase OBJ_vp0 is predicated on the thing OBJ )

and

( the thing T is the same as the thing OBJ )

as semantic statement.

the word |is| belongs to the linguistic category 'copula'.

the word |dog| is a noun.

the entity concept ce:Dog is expressed by the word |dog| and

has 'dog' as concept term.

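[Figure: the frame applied to "is the dog fido". Syntax: copula + noun phrase → verb phrase. Semantics: the noun phrase gives v(OBJ), dog(OBJ), ... and the verb phrase gives v(T), T=OBJ, ... Sources: the Linguistic Model and the Analyst's Conceptual Model.]

The linguistic frame makes a semantic theory explicit.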
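To show how a frame pairs a syntactic pattern with a semantic statement, here is a small Python sketch of frame vp0. The dictionaries stand in for the CE lexicon facts above (|is| is a copula; |dog| expresses ce:Dog, written dog(OBJ) as in the figure); `frame_vp0` and `noun_phrase` are hypothetical helpers. Treating the proper name as OBJ=Fido is an assumption that follows the A=Joe notation of the previous slide.

```python
# Hypothetical lexicon mirroring the CE facts: |is| is a copula,
# |dog| is a noun expressing the concept dog.
CATEGORY = {"is": "copula", "the": "determiner", "dog": "noun"}
CONCEPT = {"dog": "dog"}  # word -> concept predicate (assumed naming)


def noun_phrase(tokens, var):
    """Recognise 'the <noun> <Name>' and return its semantics, else None."""
    if (
        len(tokens) == 3
        and CATEGORY.get(tokens[0]) == "determiner"
        and CATEGORY.get(tokens[1]) == "noun"
    ):
        noun, name = tokens[1], tokens[2]
        return [f"v({var})", f"{CONCEPT[noun]}({var})", f"{var}={name}"]
    return None


def frame_vp0(tokens):
    """Frame vp0. Syntactic pattern: (copula, noun phrase).
    Semantic statement: the noun phrase is predicated on OBJ,
    and the thing T is the same as the thing OBJ."""
    if not tokens or CATEGORY.get(tokens[0]) != "copula":
        return None
    obj_semantics = noun_phrase(tokens[1:], "OBJ")
    if obj_semantics is None:
        return None
    return ["v(T)", "T=OBJ"] + obj_semantics


print(frame_vp0("is the dog Fido".split()))
```

Keeping the pattern check and the semantic template in one function mirrors the 1-1 pairing in the CE frame definition. In the real system both halves are data written in CE, not code.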

Allowing the analyst to define how words express concepts

[Figure: the Analyst, Analyst Helper, Conceptual Model, WordNet, itanet, Entity Extractor, Stanford parser and Document, linked by the statements below.]

the concept C has the same meaning as the synset S.

the noun phrase NP has the word W as head/modifier and stands for the thing T.

the thing T is categorised as the concept C.

Mapping CE concepts to words via WordNet synsets

[Figure: two parallel mappings meeting in a "meeting of minds". The lexicographer relates words to word senses and word senses to a synset (its meaning); the analyst relates words to a concept (its meaning). Example: the words "tank" and "armoured combat vehicle" have the word senses tank/1 and armoured combat vehicle/1, which are components of the synset {tank, armoured combat vehicle}.]

conceptualise a ~ tank ~ T.

the synset {tank, armoured combat vehicle} means the same as the concept tank.

the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.

the word |tank| expresses the concept tank.

The Analyst STILL has to decide the lexical relations, since only he knows what his concept is.

CE rules to use WordNet to relate words to concepts

if ( the synset S means the same as the concept C ) and

( the synset S has the word sense WS as component ) and

( the word sense WS has the word W as word )

then

( the word W expresses the concept C )

The analyst provides the link between his meaning and a standard meaning

Now the parser can link words to concepts
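The rule is a three-way join over the synset, word sense and word facts. A minimal Python sketch, assuming the facts are held as sets of pairs; `words_expressing_concepts` is a hypothetical name, and a CE store would apply the rule itself rather than through code like this.

```python
def words_expressing_concepts(means_same_as, has_component, has_word):
    """If synset S means the same as concept C, S has word sense WS as
    component, and WS has word W, then W expresses C.

    means_same_as: set of (synset, concept)
    has_component: set of (synset, word_sense)
    has_word:      set of (word_sense, word)
    Returns a set of (word, concept) pairs.
    """
    return {
        (word, concept)
        for synset, concept in means_same_as
        for component_synset, word_sense in has_component
        if component_synset == synset
        for sense, word in has_word
        if sense == word_sense
    }


TANK = "{tank, armoured combat vehicle}"
print(sorted(words_expressing_concepts(
    means_same_as={(TANK, "tank")},
    has_component={(TANK, "tank/1"), (TANK, "armoured combat vehicle/1")},
    has_word={("tank/1", "tank"),
              ("armoured combat vehicle/1", "armoured combat vehicle")},
)))
```

With the analyst's single "means the same as" statement, both words in the synset become linked to the concept tank.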

Rationale for entity extraction

[Figure: rationale for entity extraction, linking the Document, Stanford Parser, WordNet, Entity Extractor, Analyst Helper and WordNet Inference (General Semantics) through the following statements:]

the concept C has the same meaning as the synset S.

the noun phrase NP has the word W as head/modifier.

the noun phrase NP stands for the thing T.

the word W expresses the word sense WS.

the word sense WS adds meaning to the wordnet synset S.

the word sense WS adds meaning to the ita synset S.

there is an ita synset named S.

the word W expresses the concept C.

the thing T is categorised as the concept C.

Hierarchy of linguistic frames

[Figure: a hierarchy of linguistic frames, each level with its own syntax and semantics: specialist CE, domain CE, linguistic CE and predicate CE, together with predicate logic. Example statements from the figure:]

John attends a meeting with Jane.

the person John attends the meeting X. the person Jane attends the meeting X.

there is a situation X that is categorised as the concept meeting and has the person John as agent role and has the person Jane as patient role.

the formula f3 has the statement that ( there is a meeting situation [123] that has the person Jane as patient role and has the person John as agent role ) as semantic expression.

Combining Linguistic and Analytic Rationale

A fact extracted by a parser may lead to conclusions via the analyst's reasoning

– may include assumptions and uncertainty

The extraction of the fact may itself include assumptions and uncertainty

The total rationale graph of linguistic and analytic reasoning shows all sources of uncertainty

– removing a linguistic assumption may leave the analyst's conclusions without support

Argumentation may need to occur at both the linguistic and analytic level

– but different skills (and people) needed for the different levels
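The point that retracting a linguistic assumption can undermine an analytic conclusion can be illustrated with simple support propagation over a rationale graph. This is a minimal Python sketch, a basic truth-maintenance style computation rather than the CE Store's own rationale mechanism; `supported` and the example facts (a coreference assumption, an extracted attendance fact) are hypothetical.

```python
def supported(justifications, assumptions):
    """Return every fact that has at least one derivation grounded in the assumptions.

    justifications: dict mapping a derived fact to a list of alternative
                    premise tuples (any one fully held tuple supports it)
    assumptions:    set of facts accepted without further justification
    """
    held = set(assumptions)
    changed = True
    while changed:
        changed = False
        for fact, alternatives in justifications.items():
            if fact not in held and any(
                all(premise in held for premise in premises) for premises in alternatives
            ):
                held.add(fact)
                changed = True
    return held


COREF = "the noun phrase NP2 is coreferent with the noun phrase NP1"  # linguistic
EXTRACTED = "the man T2 attends the meeting X"                        # extracted fact
SAME = "the thing T2 is the same as the thing T1"
CONCLUSION = "the man T1 attends the meeting X"                        # analytic

JUSTIFICATIONS = {
    SAME: [(COREF,)],
    CONCLUSION: [(SAME, EXTRACTED)],
}

print(CONCLUSION in supported(JUSTIFICATIONS, {COREF, EXTRACTED}))  # with the assumption
print(CONCLUSION in supported(JUSTIFICATIONS, {EXTRACTED}))         # assumption removed
```

Because linguistic and analytic steps live in one graph, the loss of support propagates across the boundary, which is why argumentation may be needed at both levels.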

CE Store and Agents

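[Figure: agents sharing a CE Store. Components shown: pre-processing of documents and reports; grammar parsing agents (grammar parsing1, grammar parsing2); semantic agents (semantic1, semantic2, semantic3, ... semanticN); analysts' inference producing the analysis product. Store contents include the Analysts Model, dialog context, semantic models, lexicon/grammar rules, parses, rules, entities and relations, and metadata structure.]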

Extractor/Anaphor Agent

[Figure: SYNCOIN sentences are parsed by the Stanford Parser into parse trees held in the CE Store; a Java agent performs entity extraction and anaphor resolution using rules, the Linguistic Model and the Analysts Model, yielding linguistically identified entities and "same as" relations.]

• The Stanford Parser reads SYNCOIN data and generates parse trees
• The Anaphor/Extractor Agent reads the parse information and uses rules + models to:
  • turn noun phrases into entities ("market")
  • link noun phrases that are anaphoric references ("he")

Sample Entity Extraction Rules in CE

if

( the noun phrase NP stands for the thing T and has the noun N as head ) and

( the noun N expresses the concept C )

then

( the thing T is categorised as the concept C )

.

if

( the noun phrase NP stands for the thing T and has the adjective A as modifier ) and

( the adjective A expresses the concept C )

then

( the thing T is categorised as the concept C )

.

if ( the noun phrase NP stands for the thing T and has the personal pronoun |he| as head )

then

( the thing T is a man ).
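The three sample rules can be mimicked procedurally over parsed noun phrases. A minimal Python sketch, assuming each noun phrase has already been reduced to its thing, head word, head category and adjective modifiers; `NounPhrase`, `categorise`, the adjective "armed" and the concept "armed thing" are hypothetical, while "market" and "he" come from the slide examples.

```python
from dataclasses import dataclass, field


@dataclass
class NounPhrase:
    thing: str                     # the thing T the noun phrase stands for
    head: str                      # head word
    head_category: str             # "noun" or "personal pronoun"
    modifiers: list = field(default_factory=list)  # adjective modifiers


def categorise(noun_phrases, noun_concepts, adjective_concepts):
    """Apply the three sample rules; return a set of (thing, concept) pairs."""
    categorised = set()
    for phrase in noun_phrases:
        # Rule 1: the head noun expresses a concept.
        if phrase.head_category == "noun" and phrase.head in noun_concepts:
            categorised.add((phrase.thing, noun_concepts[phrase.head]))
        # Rule 2: an adjective modifier expresses a concept.
        for adjective in phrase.modifiers:
            if adjective in adjective_concepts:
                categorised.add((phrase.thing, adjective_concepts[adjective]))
        # Rule 3: the personal pronoun |he| as head makes the thing a man.
        if phrase.head_category == "personal pronoun" and phrase.head == "he":
            categorised.add((phrase.thing, "man"))
    return categorised


phrases = [
    NounPhrase("T1", "man", "noun", ["armed"]),
    NounPhrase("T2", "market", "noun"),
    NounPhrase("T3", "he", "personal pronoun"),
]
print(sorted(categorise(
    phrases,
    noun_concepts={"man": "man", "market": "market"},
    adjective_concepts={"armed": "armed thing"},
)))
```

One thing may be categorised under several concepts, as T1 is here, which matches the rules: each fires independently.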

Simplistic Anaphor Rules in CE

if ( the noun phrase NP has the personal pronoun PRP as head )

then

( the noun phrase NP is an anaphor )

.

if ( the noun phrase NPA is an anaphor ) and

( the noun phrase NPA follows the noun phrase NP ) and

( the noun phrase NP stands for the man T ) and

( the noun phrase NPA stands for the man TA )

then

( the noun phrase NPA is coreferent with the noun phrase NP )

.

if ( the noun phrase NP1 is coreferent with the noun phrase NP2 ) and

( the noun phrase NP1 stands for the thing T1 ) and

( the noun phrase NP2 stands for the thing T2 )

then

( the thing T1 is the same as the thing T2 )

.

Needs many more rules, with selection constraints on the target NP

Needs to handle more categories
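A direct Python reading of the three anaphor rules makes their weakness visible. This is a sketch under stated assumptions: "follows" is taken to mean "appears anywhere earlier in the text", noun phrases are given in document order, and `NP` and `resolve` are hypothetical names.

```python
from typing import NamedTuple


class NP(NamedTuple):
    name: str         # e.g. "np1"
    thing: str        # the thing the noun phrase stands for
    is_pronoun: bool  # head is a personal pronoun


def resolve(noun_phrases, men):
    """Apply the three simplistic anaphor rules.

    noun_phrases: NP records in document order
    men:          set of things categorised as man
    Returns (coreferent pairs, same-as pairs).
    """
    # Rule 1: a noun phrase headed by a personal pronoun is an anaphor.
    anaphors = {np.name for np in noun_phrases if np.is_pronoun}
    coreferent = set()
    for i, npa in enumerate(noun_phrases):
        if npa.name not in anaphors or npa.thing not in men:
            continue
        # Rule 2: the anaphor is coreferent with every earlier noun phrase
        # standing for a man (no selection constraints at all).
        for earlier in noun_phrases[:i]:
            if earlier.thing in men:
                coreferent.add((npa.name, earlier.name))
    # Rule 3: coreferent noun phrases stand for the same thing.
    thing_of = {np.name: np.thing for np in noun_phrases}
    same_as = {(thing_of[a], thing_of[b]) for a, b in coreferent}
    return coreferent, same_as


phrases = [NP("np1", "T1", False), NP("np2", "T2", False), NP("np3", "T3", True)]
print(resolve(phrases, men={"T1", "T3"}))
```

With two earlier men in the text, "he" would be linked to both, which is exactly why selection constraints on the target NP are needed.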

Extended CE Parser Agent

[Figure: a Java agent runs the CE parser over SYNCOIN sentences, using the lexicon, grammar patterns and linguistic frames (semantic statement, mapping to concepts) drawn from the Predicate Logic Model, the SYNCOIN Model and the Analysts Model; the resulting CE semantics and entities go to the CE Store.]

• The CE Parser agent reads SYNCOIN data and runs simple CE linguistic frames
• The agent extracts the "best" parse and turns it into low-level CE
• This is simple entity extraction
  • when the noun phrase is at the start ("the man ...")

Extended CE Parser

[Figure: a chart parser (phrase structure grammar with lexical categories and annotations; lexicon of words, categories and syntactic features) and a semantic processor (semantic representation and combination) run in lock-step over documents and reports, producing parse trees and a logical representation that is output as CE. Each linguistic frame in the Linguistic Model pairs a syntactic pattern with a semantic statement (1-1), with a mapping to concepts in the Analyst's Conceptual Model and the Predicate Logic Model.]

The mapping assumes a simple 1-1 relation between words and concepts.
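"Lock-step" can be illustrated with a CKY chart parser in which every chart entry carries its semantics, so that each grammar rule combines categories and meanings at the same time. This is a minimal Python sketch, not the Extended CE Parser: CKY is only one kind of chart parser, and the binary grammar, the category PM (postmodifier) and the tuple-based semantics are assumptions chosen to cover "there is a person named Joe".

```python
# Toy lexicon: word -> (category, semantics). Semantics are tuples of literals
# such as ("person", "A"); a proper noun's semantics is just its name.
LEXICON = {
    "there": ("EX", ()),
    "is": ("VBZ", ()),
    "a": ("DT", ()),
    "person": ("NN", (("person", "A"),)),
    "named": ("VBN", ()),
    "Joe": ("NNP", "Joe"),
}


def close(sem):
    """Existential closure: substitute A by the constant it equals, drop '='."""
    value = next((lit[2] for lit in sem if lit[0] == "=" and lit[1] == "A"), "A")
    return tuple(
        (lit[0],) + tuple(value if arg == "A" else arg for arg in lit[1:])
        for lit in sem
        if lit[0] != "="
    )


# Binary rules: (left, right) -> (parent, semantic combination).
RULES = {
    ("DT", "NN"): ("NP", lambda dt, nn: dt + nn),
    ("VBN", "NNP"): ("PM", lambda vbn, name: (("=", "A", name),)),  # postmodifier
    ("NP", "PM"): ("NP", lambda np, pm: np + pm),
    ("VBZ", "NP"): ("VP", lambda be, np: np),                       # copula
    ("EX", "VP"): ("S", lambda ex, vp: close(vp)),                   # "there is ..."
}


def parse(words):
    """CKY chart parse; semantics are built alongside each chart entry."""
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        category, sem = LEXICON[word]
        chart[i][i + 1][category] = sem
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, left_sem in chart[i][k].items():
                    for right, right_sem in chart[k][j].items():
                        if (left, right) in RULES:
                            parent, combine = RULES[(left, right)]
                            chart[i][j].setdefault(parent, combine(left_sem, right_sem))
    return chart[0][n].get("S")  # None if no sentence spans all the words


print(parse("there is a person named Joe".split()))
```

Keeping only the first semantics per category and span (`setdefault`) is a simplification; a parser facing real ambiguity would keep every alternative and choose the "best" parse later.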

CE fact extraction framework

Two pipelines process the same SYNCOIN sentence:

• SYNCOIN sentence as parsed by the Stanford Parser + CE semantic extraction rules
• SYNCOIN sentence as parsed by the CE Parser + CE semantic extraction rules

Each pipeline produces the same layers of information:

– basic syntactic parse tree information (from the Stanford Parser or the CE Parser respectively)
– semantic information more general than the ACM
– semantic information added from the Analysts Conceptual Model
– CE facts extracted from the sentence

Applying rules to find entities

Prepositional phrase "in" as a container

Backup

Using WordNet to extend the linguistic mappings

[Figure: the lexicographer relates the words "tank" and "armoured combat vehicle" (word senses tank/1 and armoured combat vehicle/1) to the synset {tank, armoured combat vehicle}, and the analyst relates words to the concept tank, meeting in a "meeting of minds". The WordNet hyponym link from {tank, armoured combat vehicle} to {military vehicle} extends the mapping to the word "military vehicle".]

conceptualise a ~ tank ~ T.

the synset {tank, armoured combat vehicle} means the same as the concept tank.

the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.

the synset '{tank, armoured combat vehicle}' is a hyponym of the synset '{military vehicle}'.

the synset {military vehicle} means the same as the concept tank.

the word |military vehicle| expresses the concept tank.

CE rules to use WordNet to extend word-to-concept relations

if ( the synset S means the same as the concept C ) and

( the synset S is a hyponym of the synset Super )

then

( the synset Super means the same as the concept C )

.
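Applied repeatedly, this rule walks up the hypernym chain until nothing new is derived. A minimal Python sketch of that fixed-point iteration, assuming facts are sets of pairs; `propagate_meanings` is a hypothetical name, and the second hyponym link to {vehicle} is an illustrative assumption, not taken from the slides.

```python
def propagate_meanings(means_same_as, hyponym_of):
    """Apply: if S means the same as C and S is a hyponym of Super,
    then Super means the same as C; repeat until no new facts appear.

    means_same_as: set of (synset, concept)
    hyponym_of:    set of (synset, super_synset)
    """
    facts = set(means_same_as)
    while True:
        new = {
            (super_synset, concept)
            for synset, concept in facts
            for hyponym, super_synset in hyponym_of
            if hyponym == synset
        } - facts
        if not new:
            return facts
        facts |= new


TANK = "{tank, armoured combat vehicle}"
for fact in sorted(propagate_meanings(
    {(TANK, "tank")},
    {(TANK, "{military vehicle}"), ("{military vehicle}", "{vehicle}")},
)):
    print(fact)
```

Each step broadens the words that express the concept: after one step {military vehicle} means the same as tank, as on the previous slide, and a further step would do the same for {vehicle}. In practice it may be worth limiting the depth or asking the analyst to confirm such extensions.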