From Question-Answering to Information-Seeking Dialogs
Jerry R. Hobbs
Artificial Intelligence Center
SRI International
Menlo Park, California
(with Douglas Appelt, Chris Culy, David Israel, David Martin, Martin Reddy, Mark Stickel, and Richard Waldinger)
05/15/02 Principal Investigator: Jerry R. Hobbs, SRI International
Key Ideas
1. Logical analysis/decomposition of questions into component questions, using a reasoning engine
2. Bottoming out in variety of web resources and information extraction engine
3. Use of component questions to drive subsequent dialogue, for elaboration, revision, and clarification
4. Use of analysis of questions to determine, formulate, and present answers.
Plan of Attack
Inference-Based System:
Inference for Question-Answering -- this year
Inference for Dialog Structure -- next year, but starting design this year
Document retrieval and information extraction for question-answering:
Incorporate as resource in inference-based system -- this year
Composition of Information from Multiple Sources
How far is it from Mascat to Kandahar?
What is the lat/long of Mascat?
What is the distance between the two lat/longs?
What is the lat/long of Kandahar?
Alexandria Digital Library Gazetteer
Geographical Formula
or www.nau.edu/~cvm/latlongdist.html
Question Decomposition via Logical Rules
Resources Attached to Reasoning Process
Alexandria Digital Library Gazetteer
GEMINI
SNARK
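The distance subquestion above bottoms out in a geographical formula attached to the reasoning process. A minimal sketch of such a formula (the haversine great-circle distance; the lat/long values below are approximate stand-ins for gazetteer entries, not project data):

```python
import math

def lat_long_distance(l1, l2):
    """Great-circle distance in km between two (lat, long) pairs,
    computed with the haversine formula."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*l1, *l2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

# Approximate gazetteer entries (illustrative values)
MASCAT = (23.6, 58.5)     # Muscat, Oman
KANDAHAR = (31.6, 65.7)   # Kandahar, Afghanistan

distance = lat_long_distance(MASCAT, KANDAHAR)
```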
Composition of Information from Multiple Sources
Show me the region 100 km north of the capital of Afghanistan.
What is the capital of Afghanistan?
What is the lat/long 100 km north?
What is the lat/long of Kabul?
CIA Fact Book
Geographical Formula
Question Decomposition via Logical Rules
Alexandria Digital Library Gazetteer
Show that lat/long
Terravision
Resources Attached to Reasoning Process
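The "lat/long 100 km north" subquestion again reduces to a simple geographical formula. A sketch (Kabul's coordinates are an assumed, approximate gazetteer entry):

```python
import math

EARTH_RADIUS_KM = 6371.0

def point_north(lat_long, km):
    """Lat/long of the point a given distance due north of lat_long.
    Moving due north changes only the latitude; one degree of latitude
    spans (pi/180) * EARTH_RADIUS_KM, about 111.2 km."""
    lat, lon = lat_long
    return (lat + km / (math.pi / 180 * EARTH_RADIUS_KM), lon)

KABUL = (34.53, 69.17)            # approximate (illustrative)
target = point_north(KABUL, 100)  # region to display in Terravision
```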
Combining Time, Space, and Personal Information
Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?
IE Engine
Geographical Reasoning
Question Decomposition via Logical Rules
Resource Attached to Reasoning Process
meet(a,b,t) & 1998 ≤ t ≤ 2001
→ at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)
→ go(a,x1,t) & go(b,x2,t)
IE Engine
Temporal Reasoning
Logical Form
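The decomposition above combines temporal overlap with spatial proximity. A toy sketch of that combination, over whereabouts facts of the kind an IE engine might supply (all names and data structures here are illustrative, not the system's actual interface):

```python
def intervals_overlap(i1, i2):
    """True if two (start, end) year intervals share any time."""
    return i1[0] <= i2[1] and i2[0] <= i1[1]

def could_have_met(a_whereabouts, b_whereabouts, near, window):
    """a_whereabouts, b_whereabouts: lists of (place, (start, end))
    facts; near: predicate on two places; window: the (start, end)
    constraint on t (here 1998-2001)."""
    for place_a, t_a in a_whereabouts:
        for place_b, t_b in b_whereabouts:
            if (intervals_overlap(t_a, t_b)
                    and intervals_overlap(t_a, window)
                    and intervals_overlap(t_b, window)
                    and near(place_a, place_b)):
                return True
    return False
```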
Two Central Systems
GEMINI: Large unification grammar of English
Under development for more than a decade
Fast parser
Generates logical forms
Used in ATIS and CommandTalk
SNARK: Large, efficient theorem prover
Under development for more than a decade
Built-in temporal and spatial reasoners
Procedural attachment, including for web resources
Extracts answers from proofs
Strategic controls for speed-up
Linguistic Variation
How far is Mascat from Kandahar?
How far is it from Mascat to Kandahar?
How far is it from Kandahar to Mascat?
How far is it between Mascat and Kandahar?
What is the distance from Mascat to Kandahar?
What is the distance between Mascat and Kandahar?
GEMINI parses and produces logical forms for most TREC-type queries
Use TACITUS and FASTUS lexicons to augment GEMINI lexicon
Unknown word guessing based on "morphology" and immediate context
"Snarkification"
Problem: GEMINI produces logical forms not completely aligned with what SNARK theories need
Current solution: Write simplification code to map from one to the other
Long-term solution: Logical forms that are aligned better
Relating Lexical Predicates to Core Theory Predicates
"... distance ...", "how far ..." → distance-between
Need to write these axioms for every domain we deal with
Have illustrative examples
Decomposition of Questions
lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2) --> distance-between(d,x,y)
Need axioms relating core theory predicates and predicates from available resources
Have illustrative examples
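Used in backward chaining, the axiom above lets the reasoner replace a distance-between goal with its component subquestions. A toy illustration of that single decomposition step (not SNARK's actual machinery):

```python
# The axiom
#   lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2)
#     --> distance-between(d,x,y)
# read backward: to prove distance-between, prove the three
# components.  Predicate names only; variable bookkeeping omitted.
AXIOMS = {
    "distance-between": ["lat-long", "lat-long", "lat-long-distance"],
}

def decompose(goal_predicate):
    """Component subquestions for a goal, if a decomposition axiom
    applies; otherwise the goal itself."""
    return AXIOMS.get(goal_predicate, [goal_predicate])
```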
Procedural Attachment
Declaration for certain predicates:
There is a procedure for proving it
Which arguments are required before it is called
lat-long(l1,x)
lat-long-distance(d,l1,l2)
When a predicate with those arguments bound is generated in a proof, the procedure is executed.
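A minimal sketch of that mechanism: each predicate declares which argument positions must be bound before its procedure may fire. The registry, function names, and the one-entry gazetteer are illustrative, not SNARK's actual API:

```python
PROCEDURES = {}

def attach(predicate, required_args, procedure):
    """Declare a procedure for proving a predicate, plus the argument
    positions that must be bound before it is called."""
    PROCEDURES[predicate] = (required_args, procedure)

def try_prove(predicate, args):
    """If all required arguments are bound (not None), execute the
    attached procedure to fill in the rest; otherwise defer."""
    if predicate not in PROCEDURES:
        return None
    required, procedure = PROCEDURES[predicate]
    if any(args[i] is None for i in required):
        return None
    return procedure(args)

# lat-long(l1, x): the place name x (position 1) must be bound.
attach("lat-long", [1],
       lambda a: ({"Kabul": (34.53, 69.17)}.get(a[1]), a[1]))
```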
Open Agent Architecture
OAA Agent
GEMINI → snarkify → SNARK
Resources via OAA Agents
Use of SMART + TextPro
Question
Subquestion-1
Other Resources
Question Decomposition via Logical Rules
Resources Attached to Reasoning Process
Subquestion-2
Subquestion-3
SMART + TextPro
One Resource Among Many
Information Extraction Engine as a Resource
SMART: Document retrieval for pre-processing
TextPro: Top of the line information extraction engine
Analyze NL query with GEMINI and SNARK
Run TextPro over documents retrieved by SMART
Retrieve best-match passage
Use TextPro annotations or GEMINI analysis to extract answer from passage
Linking SNARK with TextPro
TextSearch(EntType(?x), Terms(p), Terms(c), WSeq)
  & Analyze(WSeq, p(?x,c))
  --> p(?x,c)
Call to SMART+TextPro
Type of questioned constituent
Synonyms and hypernyms of word associated with p or c
Answer: Ordered sequence of strings of words
Match pieces of answer strings with pieces of query
Subquery generated by SNARK during analysis of query
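The Analyze step matches pieces of the returned word sequence against pieces of the query. A crude illustration of that idea (the real system uses TextPro annotations and GEMINI analyses, not bag-of-words subtraction):

```python
def extract_answer(answer_words, query_words):
    """Treat the words of the best-match passage that do not occur in
    the query as the candidate answer -- a deliberately simplistic
    stand-in for matching answer strings against the query."""
    query = {w.lower() for w in query_words}
    return [w for w in answer_words
            if (t := w.lower().strip(".,?")) and t not in query]
```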
Information Extraction Engine as a Resource
SMART: Document retrieval for pre-processing
TextPro: Top of the line information extraction engine
Analyze NL query with GEMINI and SNARK
Run TextPro over documents retrieved by SMART
TextPro returns relevant templates
Agent turns templates into logic for SNARK to use in proof
Domain-Specific Patterns
Decide upon domain (e.g., nonproliferation)
Compile list of principal properties and relations of interest
Implement these patterns in TextPro
Implement link between TextPro and SNARK, converting between templates and logic
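The last step, converting between templates and logic, amounts to flattening each filled template into ground atoms the theorem prover can use. A sketch with illustrative slot names (not TextPro's actual template schema):

```python
def template_to_atoms(template):
    """Convert an IE template (a dict of filled slots) into ground
    atoms, one per slot, all anchored to the template's event id."""
    event = template["event_id"]
    return [(slot, event, value)
            for slot, value in template.items()
            if slot != "event_id"]

atoms = template_to_atoms(
    {"event_id": "E1", "agent": "Atta",
     "location": "Prague", "time": "2000"})
```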
Temporal Reasoning: Structure
Topology of Time: start, end, before, between
Measures of Duration: for an hour, ...
Clock and Calendar: 3:45pm, Wednesday, June 12
Temporal Aggregates: every other Wednesday
Deictic Time: last year, ...
Temporal Reasoning: Goals
Develop temporal ontology (DAML)
Reason about time in SNARK (AQUAINT, DAML)
Link with Temporal Annotation Standards (AQUAINT)
Answer questions with temporal component (AQUAINT)
Nearly complete
In progress
Spatial and Geographical Reasoning: Structure
Topology of Space: Is Albania a part of Europe?
Dimensionality
Measures: How large is North Korea?
Orientation and Shape: What direction is Monterey from SF?
Latitude and Longitude: Alexandria Digital Library Gazetteer
Political Divisions: CIA World Fact Book, ...
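Orientation questions like "What direction is Monterey from SF?" reduce to a bearing between two lat/longs plus a compass-point lookup. A sketch, with approximate coordinates assumed for illustration:

```python
import math

def bearing(l1, l2):
    """Initial great-circle bearing in degrees from l1 to l2
    (0 = north, 90 = east)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*l1, *l2))
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

def compass(deg):
    """Nearest of the eight principal compass points."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[round(deg / 45) % 8]

SF = (37.77, -122.42)        # approximate (illustrative)
MONTEREY = (36.60, -121.90)  # approximate (illustrative)
direction = compass(bearing(SF, MONTEREY))
```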
Spatial and Geographical Reasoning: Goals
Develop spatial and geographical ontology (DAML)
Reason about space and geography in SNARK (AQUAINT, DAML)
Attach spatial and geographical resources (AQUAINT)
Answer questions with spatial component (AQUAINT)
Some capability now
Dialog Modeling
Key Idea: System matches user's utterance with one of several active tasks. Understanding dialog is one active task.
Rules of form:
property(situation) --> active(Task1)
including
utter(u,w) --> active(DialogTask)
want(u,Task1) --> active(Task1)
Understanding is matching utterance (conjunction of predications) with an active task or the condition of an inactive task.
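A toy version of that matching step, with utterances and tasks reduced to sets of predication labels (the task structure here is illustrative, not the project's actual dialog model):

```python
def match(utterance_preds, tasks):
    """Match an utterance (a set of predications) against an active
    task, or against the triggering condition of an inactive task,
    reporting any unmatched predications to be asked about."""
    # Prefer an active task whose expectations cover the utterance.
    for task in tasks:
        if task["active"] and utterance_preds <= task["expects"]:
            return task["name"], set()
    # Otherwise activate a task whose condition the utterance meets.
    for task in tasks:
        if not task["active"] and utterance_preds & task["condition"]:
            return task["name"], utterance_preds - task["condition"]
    return None, utterance_preds
```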
Dialog Task Model
understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)
yes: Action determined by utterance and task
no -- x unmatched: Ask about x
Fixed-Domain QA Evaluation
Pick a domain, e.g., nonproliferation
Pick a set of resources, including a corpus of texts, structured databases, web services
Have expert make up 200+ realistic questions, answerable with resources + inference
Divide questions into training and test sets
Give sites one month+ to work on training set
Test on test set