Upload
others
View
17
Download
0
Embed Size (px)
Citation preview
^Åèìáëáíáçå=oÉëÉ~êÅÜ=mêçÖê~ã==do^ar^qb=p`elli=lc=_rpfkbpp=C=mr_if`=mlif`v=k^s^i=mlpqdo^ar^qb=p`elli=
Approved for public release, distribution unlimited.
Prepared for: Naval Postgraduate School, Monterey, California 93943
NPS-AM-08-035
bñÅÉêéí=Ñêçã=íÜÉ==
mêçÅÉÉÇáåÖë=çÑ=íÜÉ=
ÑáÑíÜ=^ååì~ä=^Åèìáëáíáçå=
oÉëÉ~êÅÜ=póãéçëáìã=
ReSEARCH: A REQUIREMENTS SEARCH ENGINE
Published: 23 April 2008
by
Dr. Craig Martell, LT Paige H. Adams, Dr. Pranav Anand, ENS Grant Gehrke, Dr. Ralucca Gera, CPT Marco Draeger, Dr. Kevin Squire
5th Annual Acquisition Research Symposium of the Naval Postgraduate School:
Acquisition Research: Creating Synergy for Informed Change
May 14-15, 2008
Report Documentation Page Form ApprovedOMB No. 0704-0188
Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering andmaintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information,including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, ArlingtonVA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if itdoes not display a currently valid OMB control number.
1. REPORT DATE 23 APR 2008 2. REPORT TYPE
3. DATES COVERED 00-00-2008 to 00-00-2008
4. TITLE AND SUBTITLE ReSEARCH: A Requirements Search Engine
5a. CONTRACT NUMBER
5b. GRANT NUMBER
5c. PROGRAM ELEMENT NUMBER
6. AUTHOR(S) 5d. PROJECT NUMBER
5e. TASK NUMBER
5f. WORK UNIT NUMBER
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Naval Postgraduate School,Department of Computer Science,Monterey,CA,93943
8. PERFORMING ORGANIZATIONREPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S)
11. SPONSOR/MONITOR’S REPORT NUMBER(S)
12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited
13. SUPPLEMENTARY NOTES 5th Annual Acquisition Research Symposium: Creating Synergy for Informed Change, May 14-15, 2008 inMonterey, CA
14. ABSTRACT
15. SUBJECT TERMS
16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT Same as
Report (SAR)
18. NUMBEROF PAGES
53
19a. NAME OFRESPONSIBLE PERSON
a. REPORT unclassified
b. ABSTRACT unclassified
c. THIS PAGE unclassified
Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18
^Åèìáëáíáçå=oÉëÉ~êÅÜ=mêçÖê~ã==do^ar^qb=p`elli=lc=_rpfkbpp=C=mr_if`=mlif`v=k^s^i=mlpqdo^ar^qb=p`elli=
The research presented at the symposium was supported by the Acquisition Chair of the Graduate School of Business & Public Policy at the Naval Postgraduate School. To request Defense Acquisition Research or to become a research sponsor, please contact: NPS Acquisition Research Program Attn: James B. Greene, RADM, USN, (Ret) Acquisition Chair Graduate School of Business and Public Policy Naval Postgraduate School 555 Dyer Road, Room 332 Monterey, CA 93943-5103 Tel: (831) 656-2092 Fax: (831) 656-2253 E-mail: [email protected] Copies of the Acquisition Sponsored Research Reports may be printed from our website www.acquisitionresearch.org Conference Website: www.researchsymposium.org
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- i - =
=
Proceedings of the Annual Acquisition Research Program
The following article is taken as an excerpt from the proceedings of the annual
Acquisition Research Program. This annual event showcases the research projects
funded through the Acquisition Research Program at the Graduate School of Business
and Public Policy at the Naval Postgraduate School. Featuring keynote speakers,
plenary panels, multiple panel sessions, a student research poster show and social
events, the Annual Acquisition Research Symposium offers a candid environment
where high-ranking Department of Defense (DoD) officials, industry officials,
accomplished faculty and military students are encouraged to collaborate on finding
applicable solutions to the challenges facing acquisition policies and processes within
the DoD today. By jointly and publicly questioning the norms of industry and academia,
the resulting research benefits from myriad perspectives and collaborations which can
identify better solutions and practices in acquisition, contract, financial, logistics and
program management.
For further information regarding the Acquisition Research Program, electronic
copies of additional research, or to learn more about becoming a sponsor, please visit
our program website at:
www.acquistionresearch.org
For further information on or to register for the next Acquisition Research
Symposium during the third week of May, please visit our conference website at:
www.researchsymposium.org
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- ii - =
=
THIS PAGE INTENTIONALLY LEFT BLANK
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 213 - =
=
ReSEARCH: A Requirements Search Engine
Presenter: Craig Martell holds a PhD from the University of Pennsylvania and is an Associate Professor in Computer Science at the Naval Postgraduate School. He specializes in natural-language processing applied to on-line chat, weblogs and semantic search.
Dr. Craig Martell Associate Professor Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Author: Paige H. Adams, Lieutenant, is a US Navy Information Warfare Officer. He has served as a Russian and Burmese linguist aboard a variety of surface, subsurface, and airborne platforms in the Atlantic, Pacific, and Arabian Gulf, as well as at joint field sites and on flag staff. Most recently, he served as Operations Officer and Joint Operations Department Head at the Misawa Security Operations Center in Japan. He is a 2000 graduate (summa cum laude) of Hawaii Pacific University with a BS/BA in Computer Information Systems. LT Paige Adams, USN Graduate Student Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Author: Dr. Pranav Anand is an Assistant Professor of Linguistics at UC Santa Cruz. He specializes in theoretical syntax and semantics, with an emphasis on how contextual factors affect the form and meaning of linguistic expressions. He received his AB from Harvard in 2001 and his PhD from MIT in 2006.
Dr. Pranav Anand Assistant Research Professor Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Author: Grant Gehrke, Ensign, is a native of Louisville, Kentucky. He graduated in 2007 from the Naval Academy with a degree in Computer Science. His current research focuses on using NLP approaches for authorship attribution in blogs. He will be graduating from the Naval Postgraduate School in June and reporting to Forrest Sherman, a destroyer homeported in Norfolk, Virginia. ENS Grant Gehrke, USN Graduate Student Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 214 - =
=
Author: Dr. Ralucca Gera received her BS (2000), MA (2002), and PhD (2005) in Mathematics, all from Western Michigan University. She has been an Assistant Professor in the Applied Mathematics Department at NPS since 2005. Gera’s research interests are in combinatorics and graph theory, particularly domination and alliances in graphs.
Dr. Ralucca Gera Assistant Professor Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Author: Marco Draeger is a Captain in the German Army and studied Computer Science at the University of the Federal Armed Forces from 2000 to 2004. From 2004 to 2005, he was assigned to the Modeling and Simulation Department of the Center for Transformation of the Federal Armed Forces. Following this, he served at the German Command Support School until 2007. Since June 2007, Draeger has been a student in the Operations Research Department at the Naval Postgraduate School.
Marco Draeger Graduate Student Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Author: Dr. Kevin Squire received his PhD degree from the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign (UIUC), Urbana, in 2004. He is currently an Assistant Professor of Computer Science at the Naval Postgraduate School, Monterey, CA. His research focuses on developing and applying machine-learning algorithms and models to many different domains—including natural language processing, autonomous language acquisition, scheduling, robotics, computer vision, and remote sensing.
Dr. Kevin Squire Assistant Professor Department of Computer Science Naval Postgraduate School Monterey, CA 93943 E-mail: [email protected]
Abstract This research address three closely related problems. (1) Most current search
technology is based on a popularity metric (e.g., PageRank or ExpertRank), but not on the semantic content of the document. (2) When building components in a service-oriented architecture (SOA), developers must investigate whether components that meet certain requirements already exist. (3) There is no easy way for writers of requirements documents to formally specify the meaning and domain of their requirements. Our goal in the research presented here is to address these concerns by designing a search-engine that searches over the “meanings” of requirements documents. In this paper, we present the current state of the ReSEARCH project.
Keywords: Semantic Search, Requirements, Open Architecture, Information Systems Technology
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 215 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 216 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 217 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 218 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 219 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 220 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 221 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 222 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 223 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 224 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 225 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 226 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 227 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 228 - =
=
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 229 - =
=
List of References Apache Lucene. (2008). Lucene-java wiki. Retrieved March 23, 2008, from http://wiki.apache.org/lucene-
java
Ask.com. (2008). Ask search technology. Retrieved March 28, 2008, from http://about.ask.com/en/docs/about/ask technology.shtml
Brin, S., & Page, L. (1998, April). The anatomy of a large-scale hypertextual web search engine. In Proceedings of the seventh international world wide web conference (pp. 107-117). Published in Computer Networks and ISDN Systems, 30(1-7).
Harman, D. (1991). How effective is suffixing? Journal of the American Society for Information Science, 42(1), 7-15.
Harris, Z.S. (1954). Distributional structure. Word, 10, 146-162.
Hearst, M.A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th conference on computational linguistics (pp. 539-545). Morristown, NJ: Association for Computational Linguistics.
Hersh, W.R. (2003). Information retrieval: A health and biomedical perspective. New York: Springer.
Johnson, J., & Blais, C. (2008). Share repository framework: Component specific and ontology. In Proceedings of the fifth annual acquisition research symposium. Monterey, CA: Naval Postgraduate School.
Langville, A.N., & Meyer, C.D. (2006, July). Google's PageRank and beyond: The science of search engine rankings. Princeton, NJ: Princeton University Press.
Lin, D. (1998). Extracting collocations from text corpora. In Workshop on computational terminology (pp. 57-63). Montreal, Canada.
Lin, D. (2008). MINIPAR. Retrieved March 28, 2008, from http://www.cs.ualberta.ca/~lindek/minipar.htm
Lin, D., & Pantel, P. (2001). DIRT—Discovery of inference from text. In KDD '01: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (pp. 323-328). New York: ACM.
Notess, G.R. (n.d.). Review of altavista. Retrieved March 23, 2008, from http://www.searchengineshowdown.com/features/av/review.html
Sapp, G. (2000). Altavista proffers updated search technology to online businesses. Retrieved March 23, 2008, from http://www.infoworld.com/articles/ic/xml/00/03/21/000321icalta.html
Schmidt, M. (2005). Lucene wikipedia indexer. Retrieved March 23, 2008, from http://schmidt.devlib.org/software/lucene-wikipedia.html
Schwartz, C. (1998, September). Web search engines. Journal of the American Society of Information Science, 49(11): 973-982.
Snow, R., Jurafsky, D., & Ng, A.Y. (2005). Learning syntactic patterns for automatic hypernym discovery. In L.K. Saul, Y. Weiss & L. Bottou (Eds.), Advances in neural information processing systems 17 (pp. 1297-1304). Cambridge, MA: MIT Press.
Vajda, A. (2005). Pulling java lucene into python: Pylucene. Retrieved March 23, 2008, from http://chandlerproject.org/PyLucene/Paper
Zhang, D., & Dong, Y. (2000). An efficient algorithm to rank web resources. Computer Networks, 33(1-6), 449-455.
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ=======- 230 - =
=
THIS PAGE INTENTIONALLY LEFT BLANK
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ =
=
2003 - 2008 Sponsored Research Topics
Acquisition Management
Software Requirements for OA Managing Services Supply Chain Acquiring Combat Capability via Public-Private Partnerships (PPPs) Knowledge Value Added (KVA) + Real Options (RO) Applied to Shipyard
Planning Processes Portfolio Optimization via KVA + RO MOSA Contracting Implications Strategy for Defense Acquisition Research Spiral Development BCA: Contractor vs. Organic Growth
Contract Management
USAF IT Commodity Council Contractors in 21st Century Combat Zone Joint Contingency Contracting Navy Contract Writing Guide Commodity Sourcing Strategies Past Performance in Source Selection USMC Contingency Contracting Transforming DoD Contract Closeout Model for Optimizing Contingency Contracting Planning and Execution
Financial Management
PPPs and Government Financing Energy Saving Contracts/DoD Mobile Assets Capital Budgeting for DoD Financing DoD Budget via PPPs ROI of Information Warfare Systems Acquisitions via leasing: MPS case Special Termination Liability in MDAPs
====================^Åèìáëáíáçå=oÉëÉ~êÅÜW=ÅêÉ~íáåÖ=ëóåÉêÖó=Ñçê=áåÑçêãÉÇ=ÅÜ~åÖÉ =
=
Human Resources
Learning Management Systems Tuition Assistance Retention Indefinite Reenlistment Individual Augmentation
Logistics Management
R-TOC Aegis Microwave Power Tubes Privatization-NOSL/NAWCI Army LOG MOD PBL (4) Contractors Supporting Military Operations RFID (4) Strategic Sourcing ASDS Product Support Analysis Analysis of LAV Depot Maintenance Diffusion/Variability on Vendor Performance Evaluation Optimizing CIWS Lifecycle Support (LCS)
Program Management
Building Collaborative Capacity Knowledge, Responsibilities and Decision Rights in MDAPs KVA Applied to Aegis and SSDS Business Process Reengineering (BPR) for LCS Mission Module
Acquisition Terminating Your Own Program Collaborative IT Tools Leveraging Competence
A complete listing and electronic copies of published research are available on our website: www.acquisitionresearch.org
^Åèìáëáíáçå=êÉëÉ~êÅÜ=mêçÖê~ã=dê~Çì~íÉ=ëÅÜççä=çÑ=ÄìëáåÉëë=C=éìÄäáÅ=éçäáÅó=k~î~ä=éçëíÖê~Çì~íÉ=ëÅÜççä=RRR=avbo=ol^aI=fkdboplii=e^ii=jlkqbobvI=`^ifclokf^=VPVQP=
www.acquisitionresearch.org
1
A Semantic-Based Search Engine for Open Architecture Requirements
DocumentsDr. Craig Martell
Associate Professor, Naval Postgraduate School
Our Team
• Paige Adams, LT USN• Pranav Anand, Assistant Prof, Linguistics, UCSC• Grant Gehrke, ENS USN• Ralucca Gera, Assistant Prof, Mathematics, NPS• Marco Draeger, CPT German Army• Craig Martell, Associate Prof, Computer Science, NPS• Kevin Squre, Assistant Prof, Computer Science, NPS
2
The ReSEARCH Project: What we’re up to.
• Using open-sourced components, we want to design a semantic search engine for requirements documents that supports the SHARE repository
• We need to match over the meaning of a requirement, not a question or a query string.
• Do processing to “enrich” both the query and the documents with semantic information.
• Automatically augment ontologies with new hypernomy (“is a”) and mereology (“is a part of”) relations.– That is, do we have to be told that a Hummer is a vehicle, and one
of its parts is a steering wheel, or can we discover this from the text.
• Etc.
3
4
Why use semantic search?
• Existing keyword-based search engines do not take into account the semantics of the documents they are searching.
• This is important when trying to find components that do what you need, not what you type.
5
Why use semantic search?
• The query string and the desired documents may not use the same phrases.
Q: What fuel does the F-22A consume?A: The F-22A’s Raptor uses JP-8.
• Query and answer convey same meaning, but use different forms– Here, “consume” and “uses” are synonymous.
autoenginebonnettyreslorryboot
caremissions
hood makemodeltrunk
makehiddenMarkovmodel
emissionsnormalize
Synonymy Problem Polysemy Problem
Prior and current strategies
Keyword Search Model
7
Prior and current strategies
• Brin and Page (1998) revolutionized search by using PageRank, which changes the order in which the pages that match the keywords in the query are returned.
• The essence of the Google innovation is in how the PageRank algorithm works.
PageRank algorithm
The rank of a particular page depends on:• The number of pages pointing to it, • The rank of each page pointing to it,• The number of outgoing links on each those
pages.
8
PageRank algorithm• PageRank metric PR(P) defines recursively
the rank/importance of each page P by
where • T1, T2, … are all the pages pointing to P • each Ti has C(Ti) outgoing links.
9
)()(...
)()(
)()()(
2
2
1
1
n
n
TCTPR
TCTPR
TCTPRPPR +++∝
Random Surfer
To further determine the rank of all web pages Google simulates the behavior of virtual surfers randomly surfing the web.
A page's rank is then updated based on how frequently the random surfers visit that page.
This pre-existing rank of each individual website is assigned independently of any query.
10
Expert Rank
Ask.com (Ask Jeeves) uses the ExpertRank algorithm:
• uses the number of incoming links as well• attempts to identify topic clusters related to
search• find experts within these topics to “seed” the
rank of some websites as “expert” sites.• PageRank has the problem that “correct” is
not the same as “highly-ranked.”
11
Current Online Semantic Search
• Powerset Labs has emerged as a forerunner in online semantic search using natural language to extract facts from text.
• On 11 May 08, Powerset’s search moved from beta to a public release
• Currently, Powerset searches only Wikipedia documents, but intends to expand search to the Internet in the future.
Powerset Indexing System
• Powerset’s algorithms are not publicly available, but their behavior can be inferred from publicly available demos– Powerset parses documents to
extract “factz”– “Factz” are generally triples of
subject-verb-object– Search is performed over these
“factz” rather than the full text
Question Answering• Keywords such as “When” tell the system how to narrow results• “W” words such as “Who” and “When” act as wildcards for matching
“factz,” allowing many searches to be matched exactly
Question Answering• Other functional words such as “From” in the search “Politician from
Virginia” improve results significantly over searching on just the keywords “Politician” and “Virginia”
Question Answering
• Question Answering task does not align exactly with requirements document search
• Requirements documents do not hold “Answers” to questions
• Encoding of facts is, however, useful– Computationally less demanding– Efficient use of storage space for the index– Allows domain specific constructs of facts to be
formulated and recognized in the corpus
Query Expansion with Synonymy• The search “What do zombies eat?” suggests that Powerset searches
for synonyms of query terms, matching “devour”• Additionally, stemming matches the inverted form “eaten by”
Query Expansion with Synonymy• Stemming of terms is found in most search engines
and is fairly easy to perform• Matching synonyms allows “close” matches on
meaning without requiring an exact keyword match• Using a structured ontology, expansion is not limited
to synonyms but may be extended to hypernyms, hyponyms, and meronyms as well– Ontology based query expansion does not appear to be
used in current Powerset searches– One of our primary approaches:
• Research Question: Can we automatically augment a given ontology using the text of the documents?
Discovering Synonymous Sentences
19
• Harris (1954): Synonymous words will occur in the same kinds of environments
• Lin & Pantel (2001): Synonymous sentences will contain the same kinds of words
The F-22A consumes JP-8The F-22A’s engine uses JP-8
• Idea: construct sentence similarity metric
Discovering Synonymous Sentences
• Sentence similarity is the geometric average of the similarity of the positions in the sentence:
Discovering Synonymous Sentences
21
• Position similarity is a normalized sum of the pointwise mutual information of all words that appear in both positions of the respective paths:
Discovering Synonymous Sentences
22
• Lin & Pantel evaluated system against TREC-8 Question Answering Task question set.
The ReSEARCH Project: Work for us.• Using open-sourced components, design a semantic search
engine for requirements documents that supports the SHARE repository
• Match over the meaning of a requirement, not a question.• Do processing to “enrich” both the query and the documents
with semantic information.• Automatically augment ontologies with new hypernomy (“is a”)
and mereology (“is a part of”) relations.– That is, do we have to be told that a Hummer is a vehicle, and one
of its parts is a steering wheel, or can we discover this from the text.
• Lots more!
23
Backup Slides
24
Building the Index
A Sample Search with Lucene
Using an Augmented Search String
28
GOAL: Design an alternative method to explicitly store/represent semantic metadata in order to enable semantic search.
Our Work