Meaning-Oriented Question-Answering with Ontological Semantics
An AQUAINT Project from
ILIT
• CRL is a research department in the School of Arts and Sciences at NMSU
• Funded externally
• Currently has a staff of 10 PhDs
• Mainly focuses on language engineering research
• Languages include Arabic, Farsi, Turkish, Spanish, Chinese, Japanese, Korean
• Advanced-technology company in Ithaca, New York
• Founded in 1990 by Dr. Richard Kittredge, Dr. Tanya Korelsky, and Dr. Owen Rambow.
• Goal is to transform results from research in natural language processing into practical software applications.
• Has developed a core set of text generation tools
• Current focus is on expanding the range of applications for this technology, with a particular focus on the Web.
• The Institute for Language and Information Technologies at University of Maryland Baltimore County
• Sergei Nirenburg, Director
• Begins operation in September 2002 with a team of 3 senior personnel
• Close collaboration with NMSU CRL
ILIT
Recent Projects: CRL
CREST: Cross-Language Retrieval, Extraction, Summarization and Translation (a TIDES project)
An Arabic-English Translation System (a TIDES project)
MINDS: multilingual summarization
Keizai, MINDSEYE: cross-language retrieval
FLAX: HTML parsing
Shiraz: Farsi-English, Dari-English MT
Expedition: Rapid Ramp-Up of MT for Low Density Languages
Recent Projects: CoGenTex
• Production of user-directed multi-document summaries (RIPTIDES)
• Multimedia display that will include fluent English responses coordinated with tables, diagrams, and hypertext follow-up (Reporter)
• Deep generation techniques that employ an explicit representation of communicative structure (FoG and LFS)
• Rule-based text generation tools, both for answer planning and syntactic generation (Exemplars and RealPro)
Meaning-Oriented Question-Answering with Ontological Semantics
• Domain: travel and meetings
– question understanding and interpretation;
– determining the answer and
– presenting the answer
• two kinds of data source
– open text (in English, Arabic and one of Persian, Russian or Spanish)
– Structured Fact Database containing instances of ontological entities
Project Tasks
• Design and Implementation of System Architecture
• Knowledge Acquisition
• Question Understanding
• Question Interpretation
• Answer Determination
• Answer Formulation
• Documentation; User and Evaluator Training; Testing; and System Evaluation
[Architecture diagram; its labels are listed below]
Input: User Question in English
Output: System Response in English
Processing Modules and Intermediate Results
• Question Understanding
• Basic Text Meaning Representation (TMR)
• Question Interpretation: task context, dialog context, user profile, analyst team profile
• Extended TMR: adds a statement of active goals, plans and scripts in the system
• Dialog and Self-Awareness-related Answer Determination (for running commentary and workflow and context-related communication)
• Task-Oriented Answer Determination from Fact Database: IE from Fact Database
• NL Query Generation: NL query in English, Arabic and one of Persian, Russian, Spanish
• Answer Determination from open text: IR, IE, Production of TMRs for Textual Fillers of IE Templates
• System Response in TMR
• Answer Formulation and Presentation
• Goal and Plan Processing Manager
• System Working Memory
• Goal Attainment and Plan Execution Agenda
Static Knowledge Sources
• Fact Database: including instances of goals, plans, scripts
• Ontology: including goals, plans, scripts
• Lexicons for Each Language in System: including names and phrases
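The processing chain named in the project tasks (question understanding, question interpretation, answer determination, answer formulation) can be pictured as functions that pass a meaning representation along. The sketch below is only an illustration under stated assumptions: the `TMR` class, the function names, the toy `FACT_DB`, and the hard-wired question pattern are all hypothetical and do not come from the project.

```python
from dataclasses import dataclass, field


@dataclass
class TMR:
    """Toy stand-in for a text meaning representation (hypothetical shape)."""
    head: str                                          # concept name, e.g. "TRAVEL"
    roles: dict = field(default_factory=dict)          # role -> filler (None = asked for)
    active_goals: list = field(default_factory=list)   # added during interpretation


# Hypothetical fact database: a list of concept instances.
FACT_DB = [
    {"head": "TRAVEL", "agent": "Hakan Sukur", "destination": "Istanbul"},
]


def understand(question: str) -> TMR:
    """Question understanding: text -> basic TMR (covers one toy pattern only)."""
    lowered = question.lower()
    if lowered.startswith("where did ") and " travel" in lowered:
        name = question[len("where did "):lowered.index(" travel")]
        return TMR(head="TRAVEL", roles={"agent": name, "destination": None})
    raise ValueError("question not covered by this toy analyzer")


def interpret(tmr: TMR, context: dict) -> TMR:
    """Question interpretation: basic TMR -> extended TMR with active goals."""
    tmr.active_goals = list(context.get("goals", []))
    return tmr


def determine_answer(tmr: TMR) -> TMR:
    """Answer determination from the fact database: fill the open roles."""
    for fact in FACT_DB:
        if fact["head"] == tmr.head and fact.get("agent") == tmr.roles.get("agent"):
            filled = {r: (fact.get(r) if v is None else v) for r, v in tmr.roles.items()}
            return TMR(tmr.head, filled, tmr.active_goals)
    return tmr  # nothing matched; open roles stay None


def formulate(tmr: TMR) -> str:
    """Answer formulation: response TMR -> English text."""
    destination = tmr.roles.get("destination")
    if destination is None:
        return "No answer found."
    return f"{tmr.roles['agent']} travelled to {destination}."


def answer(question: str, context: dict) -> str:
    return formulate(determine_answer(interpret(understand(question), context)))


print(answer("Where did Hakan Sukur travel?", {"goals": ["track-travel"]}))
```

Keeping each stage as a separate function mirrors the separation of modules in the task list; a real system would also consult open text, which this sketch leaves out.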
Development Strategy
• Rapid Prototyping
• Using pre-existing components
• Evaluation of end-to-end system performance for specific tasks
Deliverables
• a QA system in the domain of travel and meetings, with a capability to search for information in open texts in three languages and in a structured, ontology-based Fact DB;
• an enhanced text analysis system for each of the languages;
• a question interpretation module that takes into account user goals and the context of the dialog;
• an integrated IR/IE module working on open text in three languages, on the basis of ontologically defined extraction templates;
Deliverables (Cont.)
• an ontology of about 6,500 concepts;
• a Fact DB of about 100,000 facts;
• a system for automating the acquisition of the Fact DB;
• a semantic lexicon for each of the languages in the system, at about 20,000 entries;
• a decision-making module that determines the answer(s) and system action(s) at each step of the dialog/task processing;
• an ontological-semantic text generation module.
Sergei Nirenburg [email protected]
Cowie [email protected]
Korelsky [email protected]
Kittredge [email protected]
Structured Common Fact Database
• Uniform format for all kinds of data
• Uniform support for multiple applications and tools
• Semantically anchored in general ontology
• Constantly updated; today, manually to semi-automatically; tomorrow, automatically
• Supports both domain knowledge and workflow specification
FACT DATABASE: The “Asian-Nation” Instance: “Turkey”
Ontology Defined
• An ontology is a formally and semantically defined repository of concepts and relations about the world.
– Including knowledge about events, objects, and work flow scripts
• Linked to the ontology are:
– fact databases, including facts about actual events, objects, places, personalities, etc.
– “onomastica”, or multilingual proper name lists
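One way to picture how facts and proper-name lists hang off the ontology is a small data-structure sketch. Everything here is an assumption made for illustration: the class names, the `OBJECT` and `NATION` parents, the instance id, and the slot layout are hypothetical; only the "Asian-Nation" concept with the instance "Turkey" is taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A node in the ontology with a single IS-A parent (a simplification)."""
    name: str
    parent: Optional["Concept"] = None

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


@dataclass
class Fact:
    """An instance of an ontological concept, as kept in a fact database."""
    instance_id: str
    concept: Concept
    slots: dict = field(default_factory=dict)

    def is_a(self, concept_name: str) -> bool:
        if self.concept.name == concept_name:
            return True
        return any(a.name == concept_name for a in self.concept.ancestors())


# Hypothetical fragment of the concept hierarchy.
obj = Concept("OBJECT")
nation = Concept("NATION", parent=obj)
asian_nation = Concept("ASIAN-NATION", parent=nation)

# A fact anchored in the ontology (the id "NATION-1" is invented).
turkey = Fact("NATION-1", asian_nation, {"name": "Turkey"})

# Onomasticon: multilingual proper names pointing at the same fact instance.
onomasticon = {("en", "Turkey"): turkey, ("tr", "Türkiye"): turkey}

print(turkey.is_a("NATION"))
print(onomasticon[("tr", "Türkiye")].slots["name"])
```

Because every name in the onomasticon resolves to one shared instance, a query phrased in any language can reach the same stored fact.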
LEXICON: English lexical entry mapped to concept “EXIT”
LEXICON: Chinese lexical entry mapped to concept “EXIT”
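Mapping lexical entries from different languages to one concept lets analysis in any language arrive at the same meaning. A minimal sketch, assuming a hypothetical dictionary layout; the entries themselves are illustrative and are not the actual CRL lexicon entries shown on the slides.

```python
# Hypothetical lexicon fragments: (language, word) -> list of senses.
# The words chosen here are illustrative assumptions.
LEXICON = {
    ("en", "exit"): [{"pos": "verb", "concept": "EXIT"}],
    ("en", "leave"): [{"pos": "verb", "concept": "EXIT"}],
    ("zh", "出去"): [{"pos": "verb", "concept": "EXIT"}],
}


def concepts_for(lang: str, word: str) -> set:
    """Analysis direction: look up the concepts a word can express."""
    return {sense["concept"] for sense in LEXICON.get((lang, word.lower()), [])}


def words_for(concept: str, lang: str) -> list:
    """Generation direction: list the words of a language that express a concept."""
    return sorted(
        word
        for (entry_lang, word), senses in LEXICON.items()
        if entry_lang == lang and any(s["concept"] == concept for s in senses)
    )


print(concepts_for("en", "Exit"))
print(words_for("EXIT", "en"))
```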
Travel Tracking Template
PERSON-TRAVELLING
– NAME
– ALIAS
– NATIONALITY
– AFFILIATION
– POSITION
PURPOSE-OF-TRAVEL (attend meeting of world leaders)
DESTINATION (location of meeting)
FLIGHT-INFORMATION
– departure from
– departure time
– arrival at
– arrive time
– flight number
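The template's slots can be written down as a typed record so that an extraction module knows which fields are still open. This is a sketch under assumptions: the Python class and attribute names are invented renderings of the slot names, and grouping NAME through POSITION under PERSON-TRAVELLING is an assumed reading of the slide.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class PersonTravelling:
    name: Optional[str] = None
    alias: Optional[str] = None
    nationality: Optional[str] = None
    affiliation: Optional[str] = None
    position: Optional[str] = None


@dataclass
class FlightInformation:
    departure_from: Optional[str] = None
    departure_time: Optional[str] = None
    arrival_at: Optional[str] = None
    arrive_time: Optional[str] = None
    flight_number: Optional[str] = None


@dataclass
class TravelTrackingTemplate:
    person_travelling: PersonTravelling = field(default_factory=PersonTravelling)
    purpose_of_travel: Optional[str] = None
    destination: Optional[str] = None
    flight_information: FlightInformation = field(default_factory=FlightInformation)

    def unfilled(self) -> list:
        """Return dotted paths of the slots that are still empty."""
        missing = []
        for key, value in asdict(self).items():
            if isinstance(value, dict):  # nested group of slots
                missing += [f"{key}.{k}" for k, v in value.items() if v is None]
            elif value is None:
                missing.append(key)
        return missing


template = TravelTrackingTemplate()
template.person_travelling.name = "Hakan Sukur"
template.destination = "Istanbul"
template.flight_information.flight_number = "BA633"
print(template.unfilled())
```

A list of unfilled slots is one simple way to decide what to look for next in the open text or the fact database.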
Text Meaning Representation
Output:
• proposition_1
– head %travel_1
  • agent human_544 “Hakan Sukur”
  • source location_23 “London”
  • destination location_25 “Istanbul”
  • means flight-17776 “BA633”
– tmr-time
  • time-begin 20000702 “March 2, 2002”
– aspect
  • iteration single; phase end… “departed”
Input: Hakan Sukur arrived in Istanbul from London on British Airways Flight 633 on March 2, 2002
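A TMR like this one is a nested structure of propositions, role fillers, time, and aspect, so it can be held as nested dictionaries and read out into extraction-template fields. The sketch below assumes a hypothetical dictionary layout and a hypothetical `travel_fields` helper; the identifiers and strings are copied as they appear on the slide, without correcting them.

```python
# The slide's TMR transcribed as nested dicts (layout is an assumption).
# Each role filler is an (instance_id, surface_string) pair.
tmr = {
    "proposition_1": {
        "head": "%travel_1",
        "agent": ("human_544", "Hakan Sukur"),
        "source": ("location_23", "London"),
        "destination": ("location_25", "Istanbul"),
        "means": ("flight-17776", "BA633"),
        "tmr-time": {"time-begin": ("20000702", "March 2, 2002")},  # as on the slide
        "aspect": {"iteration": "single", "phase": "end"},
    }
}


def surface(filler):
    """Return the surface string of an (instance_id, surface_string) filler."""
    return filler[1]


def travel_fields(tmr_dict, proposition="proposition_1"):
    """Read travel-template fields out of a travel proposition."""
    prop = tmr_dict[proposition]
    if not prop["head"].startswith("%travel"):
        raise ValueError("proposition is not a travel event")
    return {
        "NAME": surface(prop["agent"]),
        "departure from": surface(prop["source"]),
        "arrival at": surface(prop["destination"]),
        "flight number": surface(prop["means"]),
    }


for slot, value in travel_fields(tmr).items():
    print(f"{slot}: {value}")
```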
Language-Oriented Data and Tool Resources at CRL
[Table; the assignment of cell values to language columns is not recoverable from the text, so values are listed in slide order]
Languages (columns): Arabic, Azerbaijani, Chinese, Croatian, Danish, English, French, German, Italian
Resources (rows):
• MRDs (among others…): 72,000; 233,000; 18,000; 115,000; 93,000; 44,000; 10,000 entries
• Computational Lexicons: 73,000; 45,000; 105,000; 75,500; 40,000; 80,000; 10,000 entries
• Lexicons Connected to Ontology
• Syntactic Grammars
• Morphological Grammars
• Text Corpora: 10MB; 325MB; 2MB; 2GB; 10MB; 3MB; 1MB
• Segmenters and Tokenizers
• Proper Name Recognizers
• Morphological Analyzers
• Syntactic Analyzers
• Semantic Analyzers
• Text Generators
Language-Oriented Data and Tool Resources at CRL (continued)
[Table; the assignment of cell values to language columns is not recoverable from the text, so values are listed in slide order]
Languages (columns): Japanese, Korean, Norwegian, Persian, Russian, Serbian, Spanish, Thai, Turkish, Ukrainian
Resources (rows):
• MRDs (among others…): 60,000; 76,000; 51,000; 48,000; 18,000; 80,000 entries
• Computational Lexicons: 41,500; 83,300; 35,000; 55,000; 48,000; 75,000; 52,500; 2,000; 31,000; 90,000 entries
• Lexicons Connected to Ontology
• Syntactic Grammars
• Morphological Grammars
• Text Corpora: 35MB; 3MB; 4MB; 10MB; 2MB; 2GB; 3MB
• Segmenters and Tokenizers
• Proper Name Recognizers
• Morphological Analyzers
• Syntactic Analyzers
• Semantic Analyzers
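Multilingual and Cross-lingual Applications at CRL
[Table; the assignment of cell values to language columns is not recoverable from the text, so values are listed in slide order]
Languages (columns): Arabic, Chinese, Croatian, Danish, English, French, German, Italian
Applications (rows), with language roles as marked:
• Knowledge-based MT: Source Target; Target Source; Source Target; Source Target
• Transfer-based MT: Source; Source; Source; Source; Target
• Multi-engine MT: Target
• IR
• IE
• Summarization
• QA and Other Mixed Complex Applications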
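Multilingual and Cross-lingual Applications at CRL (continued)
[Table; the assignment of cell values to language columns is not recoverable from the text, so values are listed in slide order]
Languages (columns): Japanese, Korean, Norwegian, Persian, Russian, Serbian, Spanish, Turkish, Ukrainian
Applications (rows), with language roles as marked:
• Knowledge-based MT: Source Target
• Transfer-based MT: Source; Source; Source; Source; Source; Source; Source; Source; Source
• Multi-engine MT: Source
• IR
• IE
• Summarization
• QA and Other Mixed Complex Applications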