SWG Strategy
v1 (C) Copyright IBM Corp. 2006, 2011. All Rights Reserved.
ACITA 2011 demonstration of ongoing NLP work
Dave Braines, David Mott, ETS, Hursley, IBM UK
Steve Poteet, Boeing
SWG Strategy – Emerging Technology Services, Hursley
Supporting the analyst
[Diagram labels: documents (doc27); NLP; CE Facts; CE Tools; Search; Inference Rationale; Argumentation; Assumptions; Uncertainty; Analyst's Conceptual Model; Requirements; Product]
Controlled English
A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
The analyst may use Controlled English to construct their Conceptual Model
the person John is married to the person Jane and has red as hair colour.
Current NLP Research Objectives
Improve Natural Language Processing of facts from documents
– analyst may utilise more information when inferencing
Allow humans to take part in the NL processing
– hybrid reasoning about ambiguities, incomplete parsing, etc
Facilitate configuration of NLP tools in CE
Define a model of linguistics, grammar, semantics
Improve Expressibility of CE
– much interest, but needs a more powerful grammar
How is the Analyst's Conceptual Model related to Natural Language?
We have used CE to model [5]:
• Collaborative Planning
• Analysis of IED activities and societal influences
• Matching Sensors to Missions
• Provenance
• Social Networks (Twitter)
• UK Government data (crimes, accidents, schools)
• NL processing itself
Our design principles for CE enhancement
Retain existing principles of a CE conceptual model
Based on full English grammar
Chart parser for efficient syntax parsing
Formal semantics, based upon scientific theory
Higher level extensions handled in same theory
Parser configurable in CE, based on linguistic model
Modelling of Sentence Context
Aim to significantly enhance CE expressivity
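The "chart parser" principle listed above can be illustrated with a minimal sketch. This is not the authors' parser: CYK is used here only as one well-known chart-parsing algorithm, and the toy grammar, the lexicon and the function name `cyk_recognise` are hypothetical, chosen to cover the deck's example sentence "there is a person named Joe".

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("VBZ", "NP"): "VP",
    ("DT", "NN"): "NP",
    ("NP", "PM"): "NP",    # noun phrase followed by a postmodifier
    ("VBN", "NNP"): "PM",  # e.g. "named Joe"
}

# Hypothetical lexicon: word -> set of lexical categories.
LEXICON = {
    "there": {"NP"},
    "is": {"VBZ"},
    "a": {"DT"},
    "person": {"NN"},
    "named": {"VBN"},
    "joe": {"NNP"},
}


def cyk_recognise(words, start="S"):
    """Return True if the toy grammar derives `words` from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every category that spans words[i..j] inclusive.
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i] = set(LEXICON.get(word.lower(), ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for left, right in product(chart[i][k], chart[k + 1][j]):
                    parent = BINARY_RULES.get((left, right))
                    if parent:
                        chart[i][j].add(parent)
    return start in chart[0][n - 1]
```

Because every sub-span is analysed once and stored in the chart, larger spans reuse the results for smaller ones, which is the efficiency argument usually made for chart parsing.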
Parallel NL and CNL parsers
[Diagram labels: NL Parser; CNL Parser; lexicon; conceptual model; Reference English Grammar; Semantic Theory; NLP; basic CE or predicate logic; expressive CE]
Goals shown: increase expressibility of CE; better understanding of linguistics
Control of ambiguity
We start from basic CE and move towards full English
How do we handle crossing the ambiguity barrier?
[Diagram labels: Basic CE; anaphoric reference; sub clauses; prepositional phrases; flexible identities; verb inflections; domain specific syntax; Ambiguity; Ambiguity Barrier; Full English]
Stanford parser as reference
But it provides only syntax; what about semantics?
there is a person named Joe.
Stanford CE parser
Extended CE Parser
Full English Syntax
[Parse tree for "there is a person named Joe". Node labels: S; NP, VP; EX (there); NP; DT, NN (a person); VP; VBZ (is); VBN (named); NP; NNP (Joe). Annotations: @copula, @be, @postmodifier, @nonfinite]
Semantics (based on Montague semantics), as attached to tree nodes:
person(Joe)
v(A), A=Joe, person(A) (on two nodes)
v(A), A=Joe
exists(A)
v(A), person(A)
Linguistic Frame
there is a linguistic frame named vp0 that
has 'is the dog Fido' as example and
defines the verb phrase VP_vp0 and
has the sequence
( the copula BE_vp0 , and the noun phrase OBJ_vp0 )
as syntactic pattern and
is predicated on the thing T and
has the statement that
( the noun phrase OBJ_vp0 is predicated on the thing OBJ )
and
( the thing T is the same as the thing OBJ )
as semantic statement.
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
the entity concept ce:Dog is expressed by the word |dog| and
has 'dog' as concept term.
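[Diagram: the example "is the dog fido" shown as a verb phrase made of a copula and a noun phrase, with a syntax layer and a semantics layer. Semantics labels: v(OBJ), dog(OBJ).. and v(T) T=OBJ,... Models shown: Analyst's Conceptual Model; Linguistic Model]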
Makes explicit a semantic theory
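The linguistic frame vp0 above pairs a syntactic pattern with semantic statements. As a rough illustration, a frame can be held as a small record and matched against already-categorised constituents. The class `LinguisticFrame`, the field names and `match_frame` are assumptions made for this sketch and do not come from the CE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinguisticFrame:
    """Hypothetical in-memory form of a CE linguistic frame."""
    name: str
    example: str
    defines: str
    syntactic_pattern: tuple    # ordered constituent categories
    semantic_statements: tuple  # CE statements over the frame's variables


# The frame vp0, transcribed from the CE text on the slide.
VP0 = LinguisticFrame(
    name="vp0",
    example="is the dog Fido",
    defines="verb phrase",
    syntactic_pattern=("copula", "noun phrase"),
    semantic_statements=(
        "the noun phrase OBJ_vp0 is predicated on the thing OBJ",
        "the thing T is the same as the thing OBJ",
    ),
)


def match_frame(frame, constituents):
    """Match (category, text) pairs against the frame's syntactic pattern.

    Returns a category -> text binding when the categories appear in the
    pattern's order, otherwise None.
    """
    categories = tuple(category for category, _ in constituents)
    if categories != frame.syntactic_pattern:
        return None
    return {category: text for category, text in constituents}
```

A successful match would then license the frame's semantic statements for the bound constituents; that step is left out here because it depends on the semantic theory the deck refers to.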
Allowing analyst to define how words express concepts
[Diagram labels: Analyst; Analyst Helper; Conceptual Model; wordnet; itanet; Entity Extractor; Stanford parser; Document]
the concept C has the same meaning as the synset S.
the noun phrase NP has the word W as head/modifier and stands for the thing T.
the thing T is categorised as the concept C.
Mapping CE concepts to words via WordNet synsets
[Diagram labels, lexicographer side: word; word sense; synset; meaning. Analyst side: word; word sense; concept; meaning]
the synset {tank, armoured combat vehicle} means the same as the concept tank.
{tank,armoured combat vehicle}
armoured combat vehicle/1 tank/1
armoured combat vehicle
tank
conceptualise a ~ tank ~ T.
“meeting of minds”
the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.
the word |tank| expresses the concept tank.
The Analyst STILL has to decide the lexical relations, since only he knows what his concept is
CE rules to use WordNet to relate words to concepts
if ( the synset S means the same as the concept C ) and
( the synset S has the word sense WS as component ) and
( the word sense WS has the word W as word )
then
( the word W expresses the concept C )
Analyst provides the link between his meaning and a standard meaning
Now the parser can link words to concepts
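The rule above is a three-way join over facts. A minimal sketch of applying it follows, assuming the facts are held as sets of pairs; the function name `words_expressing_concepts` and the word-sense-to-word facts are hypothetical, and only the tank synset, its word senses and the concept come from the slides.

```python
def words_expressing_concepts(means_same_as, has_component, has_word):
    """Apply the rule: synset S means the same as concept C, S has word
    sense WS as component, WS has word W as word => W expresses C.

    means_same_as: set of (synset, concept)
    has_component: set of (synset, word_sense)
    has_word:      set of (word_sense, word)
    Returns a set of (word, concept) pairs.
    """
    expresses = set()
    for synset, concept in means_same_as:
        for other_synset, word_sense in has_component:
            if other_synset != synset:
                continue
            for other_sense, word in has_word:
                if other_sense == word_sense:
                    expresses.add((word, concept))
    return expresses


TANK_SYNSET = "{tank, armoured combat vehicle}"

# The analyst's single assertion linking a WordNet synset to a concept.
means_same_as = {(TANK_SYNSET, "tank")}
# Synset components as shown on the slides.
has_component = {
    (TANK_SYNSET, "tank/1"),
    (TANK_SYNSET, "armoured combat vehicle/1"),
}
# Assumed for illustration: each word sense names its word.
has_word = {
    ("tank/1", "tank"),
    ("armoured combat vehicle/1", "armoured combat vehicle"),
}

derived = words_expressing_concepts(means_same_as, has_component, has_word)
```

With these facts, one analyst assertion yields two word-to-concept links, which matches the slide's point that the analyst supplies only the link to a standard meaning.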
Rationale for entity extraction
the concept C has the same meaning as the synset S.
the noun phrase NP has the word W as head/modifier
the word sense WS adds meaning to the wordnet synset S.
the thing T is categorised as the concept C
the noun phrase NP stands for the thing T.
the word W expresses the concept C.
the word W expresses the word sense WS
Stanford Parser
wordnet
Document
Entity Extractor
the word sense WS adds meaning to the ita synset S.
the word W expresses the word sense WS
Analyst Helper
Wordnet Inference
there is an ita synset named S.
(General Semantics)
Hierarchy of linguistic frames
predicate CE
semantics
syntax
the person John attends the meeting X. the person Jane attends the meeting X.
there is a situation X that is categorised as the concept meeting and has the person John as agent role and has the person Jane as patient role.
linguistic CE
semantics
syntax
domain CE
semantics
syntax
specialist CE
semantics
syntax
John attends a meeting with Jane.
Predicate Logic
the formula f3 has the statement that ( there is a meeting situation [123] that has the person Jane as patient role and has the person John as agent role ) as semantic expression.
Combining Linguistic and Analytic Rationale
A fact extracted by a parser may lead to conclusions via the analyst's reasoning
– may include assumptions and uncertainty
The extraction of the fact may itself include assumptions and uncertainty
The total rationale graph of linguistic and analyst's reasoning shows all sources of uncertainty
– removing a linguistic assumption may leave the analyst's conclusions with no support
Argumentation may need to occur at both the linguistic and analytic level
– but different skills (and people) needed for the different levels
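The point about a combined rationale graph can be made concrete with a small sketch. It assumes the rationale is stored as (conclusion, premises) pairs plus a set of accepted base statements; the statements, the rule set and the function `is_supported` are all hypothetical and only illustrate how withdrawing a linguistic assumption can remove support from an analytic conclusion.

```python
def is_supported(statement, rules, accepted, _seen=frozenset()):
    """Return True if `statement` is accepted directly, or is concluded by
    some rule whose premises are all supported in turn."""
    if statement in accepted:
        return True
    if statement in _seen:  # guard against circular rationale
        return False
    seen = _seen | {statement}
    return any(
        all(is_supported(premise, rules, accepted, seen) for premise in premises)
        for conclusion, premises in rules
        if conclusion == statement
    )


# Hypothetical rationale: one linguistic step, then one analytic step.
RULES = [
    ("fact: John attends meeting X",
     ["parse of sentence 1", "assumption: 'he' refers to John"]),
    ("conclusion: John knows Jane",
     ["fact: John attends meeting X",
      "fact: Jane attends meeting X",
      "assumption: attendees know each other"]),
]

ACCEPTED = {
    "parse of sentence 1",
    "assumption: 'he' refers to John",       # linguistic assumption
    "fact: Jane attends meeting X",
    "assumption: attendees know each other",  # analytic assumption
}

with_assumption = is_supported("conclusion: John knows Jane", RULES, ACCEPTED)
without_assumption = is_supported(
    "conclusion: John knows Jane",
    RULES,
    ACCEPTED - {"assumption: 'he' refers to John"},
)
```

Here `with_assumption` is true and `without_assumption` is false: the analyst's conclusion depends on an assumption made at the linguistic level.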
CE Store and Agents
[Diagram labels: CE Store; Analyst's Model; Documents, Reports; Analysis product; pre-processing; dialog context; grammar parsing 1; grammar parsing 2; semantic 1; semantic 2; semantic 3; semantic N; analyst's inference; semantic models; Metadata structure; Entities and relations; Lexicon/Grammar rules; Parses; Rules]
Extractor/Anaphor Agent
[Diagram labels: CE Store; Analyst's Model; Linguistic Model; Stanford Parser; SYNCOIN sentences; Parse Tree; Rules; Entity Extraction; Anaphor Resolution; Java Agent (two agents); Entities and "same as" relations; Linguistically Identified]
• Stanford Parser reads SYNCOIN data and generates parse trees
• Anaphor/Extractor Agent reads parse information and uses rules + models to:
  • turn noun phrases into entities ("market")
  • link noun phrases that are anaphoric references ("he")
Sample Entity Extraction Rules in CE
if
( the noun phrase NP stands for the thing T and has the noun N as head ) and
( the noun N expresses the concept C )
then
( the thing T is categorised as the concept C )
.
if
( the noun phrase NP stands for the thing T and has the adjective A as modifier ) and
( the adjective A expresses the concept C )
then
( the thing T is categorised as the concept C )
.
if ( the noun phrase NP stands for the thing T and has the personal pronoun |he| as head )
then
( the thing T is a man ).
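The three rules above can be read procedurally. The sketch below is an assumption-laden paraphrase, not the CE Store's rule engine: noun phrases are plain dictionaries, `expresses` is a hypothetical one-to-one word-to-concept map, part-of-speech checks are omitted, and the function name `categorise_things` is invented for illustration.

```python
def categorise_things(noun_phrases, expresses):
    """Derive (thing, concept) pairs from noun phrases.

    noun_phrases: list of dicts with keys "thing", "head" and, optionally,
                  "modifiers" (a list of words).
    expresses:    dict mapping a word to the concept it expresses.
    """
    categorised = set()
    for noun_phrase in noun_phrases:
        thing = noun_phrase["thing"]
        head = noun_phrase["head"]
        # Rule 1: the head noun expresses a concept.
        if head in expresses:
            categorised.add((thing, expresses[head]))
        # Rule 2: a modifying adjective expresses a concept.
        for modifier in noun_phrase.get("modifiers", ()):
            if modifier in expresses:
                categorised.add((thing, expresses[modifier]))
        # Rule 3: the personal pronoun |he| implies a man.
        if head == "he":
            categorised.add((thing, "man"))
    return categorised


# Hypothetical input: two noun phrases and a small lexical mapping.
noun_phrases = [
    {"thing": "t1", "head": "market", "modifiers": ["crowded"]},
    {"thing": "t2", "head": "he"},
]
expresses = {"market": "market", "crowded": "crowded place"}

facts = categorise_things(noun_phrases, expresses)
```

Each derived pair corresponds to a CE conclusion of the form "the thing T is categorised as the concept C", or "the thing T is a man" for the pronoun rule.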
Simplistic Anaphor Rules in CE
if ( the noun phrase NP has the personal pronoun PRP as head )
then
( the noun phrase NP is an anaphor )
.
if ( the noun phrase NPA is an anaphor ) and
( the noun phrase NPA follows the noun phrase NP ) and
( the noun phrase NP stands for the man T ) and
( the noun phrase NPA stands for the man TA )
then
( the noun phrase NPA is coreferent with the noun phrase NP )
.
if ( the noun phrase NP1 is coreferent with the noun phrase NP2 ) and
( the noun phrase NP1 stands for the thing T1 ) and
( the noun phrase NP2 stands for the thing T2 )
then
( the thing T1 is the same as the thing T2 )
.
Needs many more rules with selection constraints on the target NP
Needs to handle more categories
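A sketch of the three anaphor rules, chained together, shows why the slide calls them simplistic. It assumes noun phrases arrive in document order, that "follows" means "appears anywhere later", and that the set of things known to be men is given; the pronoun list and the function `resolve_anaphors` are hypothetical.

```python
PERSONAL_PRONOUNS = {"he", "him"}  # hypothetical, deliberately tiny


def resolve_anaphors(noun_phrases, men):
    """Return (anaphor_thing, antecedent_thing) "same as" pairs.

    noun_phrases: list of (np_id, head_word, thing) in document order.
    men:          set of things already categorised as a man.
    """
    same_as = set()
    for index, (_np_id, head, thing) in enumerate(noun_phrases):
        # Rule 1: a noun phrase headed by a personal pronoun is an anaphor.
        if head not in PERSONAL_PRONOUNS or thing not in men:
            continue
        # Rule 2: it is coreferent with any earlier noun phrase for a man.
        for _earlier_id, _earlier_head, earlier_thing in noun_phrases[:index]:
            if earlier_thing in men:
                # Rule 3: coreferent noun phrases stand for the same thing.
                same_as.add((thing, earlier_thing))
    return same_as


one_man = resolve_anaphors(
    [("np1", "John", "t1"), ("np2", "market", "t2"), ("np3", "he", "t3")],
    men={"t1", "t3"},
)
two_men = resolve_anaphors(
    [("np1", "John", "t1"), ("np2", "Bill", "t4"), ("np3", "he", "t3")],
    men={"t1", "t3", "t4"},
)
```

With one candidate the pronoun resolves cleanly, but with two preceding men it is linked to both, which is the kind of over-generation that selection constraints on the target NP would need to prevent.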
Extended CE Parser Agent
[Diagram labels: CE Store; CE parser; CE semantics; semantic statement; Entities; Lexicon; SYNCOIN sentences; Grammar pattern; Linguistic Frame; mapping to concepts; Predicate Logic Model; SYNCOIN Model; Java Agent; Analyst's Model]
• CE Parser agent reads SYNCOIN data and runs simple CE linguistic frames
• Agent extracts "best" parse, turns into low level CE
• This is simple entity extraction
  • when the noun phrase is at the start ("the man ...")
Extended CE Parser
[Diagram labels: Documents, Reports; Chart Parser; Phrase structure grammar; lexical categories annotations; lexicon of words, categories and syntactic features; Parse Trees; Semantic processor; Semantic representation and combination; lock-step; Logical Representation; CE; linguistic frame; syntactic pattern; semantic statement; (1-1); mapping to concepts; Linguistic Model; Analyst's Conceptual Model; Predicate Logic Model]
Mapping assumes simple 1-1 word to concept
CE fact extraction framework
Two panels are shown for a SYNCOIN sentence:
• as parsed by Stanford Parser + CE semantic extraction rules
• as parsed by CE Parser + CE semantic extraction rules
Each panel shows the same layers:
– Basic syntactic parse tree information (from Stanford Parser or from CE Parser respectively)
– Semantic information more general than the ACM
– Semantic information added from Analyst's Conceptual Model
– CE facts extracted from sentence
Applying rules to find entities
Prepositional phrase "in" as a container
Backup
Using WordNet to extend the linguistic mappings
[Diagram labels, lexicographer side: word; word sense; synset; meaning. Analyst side: word; word sense; concept; meaning]
the synset {tank, armoured combat vehicle} means the same as the concept tank.
{tank,armoured combat vehicle}
armoured combat vehicle/1 tank/1
armoured combat vehicle
tank
conceptualise a ~ tank ~ T.
“meeting of minds”
the synset {tank, armoured combat vehicle} has the word sense tank/1 as component.
the synset '{tank, armoured combat vehicle}' is a hyponym of the synset '{military vehicle}'.
[Diagram labels: synset '{military vehicle}'; word: military vehicle]
the synset {military vehicle} means the same as the concept tank.
the word |military vehicle| expresses the concept tank.
CE rules to use WordNet to extend word-to-concept relations
if ( the synset S means the same as the concept C ) and
( the synset S is a hyponym of the synset Super )
then
( the synset Super means the same as the concept C )
.
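The hyponym rule above can be sketched as a small fixpoint computation. The function name `extend_meanings` and the set-of-pairs representation are assumptions; the synsets and the concept are those used on the backup slide. Applying the rule repeatedly, as done here, propagates the concept up an entire hyponym chain, which is a choice of this sketch and not something the slide states.

```python
def extend_meanings(means_same_as, hyponym_of):
    """Apply: S means the same as C, and S is a hyponym of Super
    => Super means the same as C. Repeats until nothing new is derived.

    means_same_as: set of (synset, concept)
    hyponym_of:    set of (synset, super_synset)
    """
    extended = set(means_same_as)
    changed = True
    while changed:
        changed = False
        for synset, concept in list(extended):
            for sub_synset, super_synset in hyponym_of:
                if sub_synset == synset and (super_synset, concept) not in extended:
                    extended.add((super_synset, concept))
                    changed = True
    return extended


TANK = "{tank, armoured combat vehicle}"
MILITARY_VEHICLE = "{military vehicle}"

extended = extend_meanings(
    means_same_as={(TANK, "tank")},
    hyponym_of={(TANK, MILITARY_VEHICLE)},
)
```

The new pair lets the earlier word-to-concept rule conclude that the word |military vehicle| expresses the concept tank, as the backup slide shows.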