Semantic Search Facilitator: Concept and Current State of Development
InBCT Tekes PROJECT Chapter 3.1.3: “Industrial Ontologies and Semantic Web” (year 2004)
Industrial Ontologies Group
• Researchers: Vagan Terziyan, Oleksandr Kononenko, Andriy Zharko, Oleksiy Khriyenko, Olena Kaykova, Olga Klochko, Andriy Taranov
• Contact: e-mail: [email protected] Phone: +358 14 260 4618 URL: http://www.cs.jyu.fi/ai/OntoGroup
Resources
Resources used from InBCT Project in 2004:
12 000 EURO salaries for 5 months
Semantic-based Enhancement of the Information Retrieval
Motivation from Industrial Ontologies Group:
“While recently there is lack of annotated resources in the Web, which makes metadata-based search useless, we should develop enhanced Web search tool based on Google and WordNet ontology and provide semantic search user interface”
Semantic Web and Information Retrieval
Semantic Web promises many advantages and benefits, but:
• We are only in “transition” towards the Semantic Web
• Resources are not yet annotated semantically
• Not enough metadata is available in the Web for smarter search
Semantic search of non-semantic data? Yes, why not? We need a Semantic Facilitator!
Semantic Facilitator Concept
What is it?
• Search service that uses other services
• Utilizes other search engines as Web services and makes their performance better due to smart query generation algorithms
• Supports search within heterogeneous resources (Web pages, Web databases, local file system, etc.)
• Filters returned results based on user preferences
• Intelligent “semantic query”-based tool that really “understands” what users want to find
What is it not?
• Search engine, indexing tool, registry, etc.
• Data storage, database browser, etc.
Web search - What’s the Problem?
• Search in the web is not always convenient:
  • Polysemy of words gives irrelevant results
  • Synonymy is not supported by search engines => loss of relevant results
• There is a need to capture semantics from the search query
[Figure: the query word “Mouse” with a question mark]
Semantic Search Assistant: light version of Semantic Facilitator
• “Semantic Search Assistant” (SSA) is software that:
  • helps the user to obtain more relevant results while using a standard search engine (Google) by interaction with the WordNet ontology
  • finds possible contexts for words in the search query
  • can broaden or constrict the search query with other relevant words and phrases for result improvement
  • works with non-annotated documents
  • is not restricted to any concrete domain
Sense Determination
• WordNet is an open source ontology, which contains information about different meanings of a term, synonyms, antonyms and other lexical and semantic relations
• Having several words in a search query, we can determine in which context (sense) each of them is used with the help of WordNet:
  • by comparing the words' synsets
  • by comparing the words' textual descriptions and examples
  • by finding common roots going up the WordNet hierarchy tree for each word
  • by asking the user
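One of the options above, comparing textual descriptions, can be sketched as a simple word-overlap count between each sense description and the other query words. This is only an illustration, not the SSA implementation: the `SENSES` table is a toy stand-in for WordNet (its glosses are taken from the "driver" example in this presentation), and `tokens`, `rank_senses` and the stopword list are hypothetical names.

```python
import re

# Toy sense inventory standing in for WordNet (an assumption for this sketch).
SENSES = {
    "driver": [
        "the operator of a motor vehicle",
        "someone who drives animals that pull a vehicle",
        "a golfer who hits the golf ball with a driver",
        "a program that determines how a computer will communicate "
        "with a peripheral device",
        "a golf club with a near vertical face that is used for hitting "
        "long shots from the tee",
    ],
}

STOPWORDS = {"a", "the", "of", "that", "with", "who", "is", "for",
             "from", "will", "how", "to"}


def tokens(text):
    """Lower-cased content words of a text, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def rank_senses(word, query_words, senses=SENSES):
    """Return (sense_index, overlap) pairs, best overlap first.

    The overlap is the number of words shared by the sense description
    and the other words of the query.
    """
    context = set()
    for other in query_words:
        if other != word:
            context |= tokens(other)
    scored = [(idx, len(tokens(gloss) & context))
              for idx, gloss in enumerate(senses[word])]
    # sorted() is stable, so equally scored senses keep their original order.
    return sorted(scored, key=lambda pair: -pair[1])


print(rank_senses("driver", ["driver", "computer"])[0])
print(rank_senses("driver", ["driver", "golf"])[:2])
```

For the query "driver golf" two senses tie (the golfer and the golf club), which is the kind of case where the last option, asking the user, is needed.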
How does it work?
1. Gets a keyword query
2. Translates the original query into a series of queries to Google, taking into account the semantics of keywords
3. Combines returned results
Ontology Personalization
Ontology Personalization is a mechanism which allows users to have their own conceptual view and to use it for semantic querying of search facilities.
[Figure labels: “Driver” (repeated for several users), Common ontology, Search]
Semantic Search Enhancement
[Figure labels: Query, Common (linguistic) ontology, Domain ontology, Semantic Filtering, Result]
Semantic Search Facilitator uses ontologically (WordNet) defined knowledge about words and embedded support of advanced Google-search query features in order to construct more efficient queries from a formal textual description of the searched information. Semantic Search Facilitator hides from users the complexity of the query language of a concrete search engine and performs routine actions that most users do in order to achieve better performance and get more relevant results.
Capturing Semantics from Search Phrases
Motivation according to our Ukrainian colleague Vadim Ermolayev:
“Google query should be transformed based on domain ontology”
Semantic Search Assistant
Algorithm for the New Query Generation
Notation:
• Word(i), $i = 1, \dots, n$, where $n$ is the number of the words from the query
• Sense(ij), $j = 1, \dots, p$, where $p$ is the number of the word's senses
• Syn(ijk), $k = 1, \dots, m_{ij}$, where $m_{ij}$ is the number of the word's synonyms in sense $j$
• $R_{ij} \in [-1, 1]$: relevance of the word's sense
• $R_i \in [0, 1]$: significance of the word in the query
• $N_{ijk}$: number of the synonym's senses
Algorithm for the New Query Generation
Synonym Quality:
$$Q_{ijk} = \frac{1}{L \cdot N_{ijk}} \sum_{j=1}^{p} R_{ij}, \quad \text{where } R_{ij} \text{ is included only if } Syn_{ijk} \text{ is a member of synset}_j$$
$L$ is the number of the synsets which contain $Syn_{ijk}$.
[Figure: each synonym Syn of Word(i) is labelled with its quality $Q_{ijk}$]
Reduction of the synonym quality absolute value
• if $Q_{ijk} \ge 0$, then the synonym will be used via ”OR” in a query
• if $Q_{ijk} < 0$, then it will be used via ”AND NOT”
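The synonym quality can be illustrated with a small calculation. The sketch below follows one reading of the formula on this slide: the mean relevance of the senses whose synset contains the synonym, divided by the number of senses the synonym itself has. The function names and the sample ratings are hypothetical and not taken from SSA.

```python
def synonym_quality(sense_relevance, synsets, synonym, synonym_sense_count):
    """Quality Q of one synonym of one query word (one reading of the slide).

    sense_relevance     -- R_ij values in [-1, 1], one per sense j of the word
    synsets             -- one set of synonyms per sense j, same order
    synonym             -- the synonym being scored
    synonym_sense_count -- N_ijk, number of senses the synonym itself has
    """
    member_relevance = [r for r, synset in zip(sense_relevance, synsets)
                        if synonym in synset]
    containing = len(member_relevance)  # L on the slide
    if containing == 0:
        raise ValueError("synonym does not belong to any synset of the word")
    if synonym_sense_count <= 0:
        raise ValueError("a synonym must have at least one sense")
    return sum(member_relevance) / (containing * synonym_sense_count)


def connective(quality):
    """Non-negative quality widens the query, negative quality narrows it."""
    return "OR" if quality >= 0 else "AND NOT"


# Hypothetical ratings for three senses of one query word.
relevance = [1.0, -1.0, 0.5]
synsets = [{"booking"}, {"qualification"}, {"booking", "reserve"}]

q_booking = synonym_quality(relevance, synsets, "booking", 2)
q_qualification = synonym_quality(relevance, synsets, "qualification", 2)
print(q_booking, connective(q_booking))
print(q_qualification, connective(q_qualification))
```

Dividing by the synonym's own sense count means that a highly ambiguous synonym contributes less, whichever sign its quality has.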
Algorithm for the New Query Generation
Algorithm 1:
[Figure: each query word Word(1), …, Word(i), …, Word(n) is joined with its synonyms (Syn … Syn) via OR (AND NOT); the resulting word groups are joined via AND to form the Query]
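Algorithm 1 can be sketched as a small query builder. The output format imitates the generated queries shown in the examples of this presentation (quoted terms, OR inside a group, a leading minus for excluded terms, groups separated by spaces, which Google treats as AND). The function name and input shape are assumptions, and unlike the example queries this sketch always keeps the original word in its group.

```python
def build_query(words):
    """Build a Google-style query string.

    words -- list of (word, {synonym: quality}) pairs in query order;
             the quality values are assumed to be computed beforehand.
    """
    groups = []
    for word, scored in words:
        # Synonyms with non-negative quality widen the group via OR.
        positive = [word] + [s for s, q in scored.items()
                             if q >= 0 and s != word]
        # Synonyms with negative quality are excluded (AND NOT).
        negative = [s for s, q in scored.items() if q < 0]
        groups.append("(" + " OR ".join('"' + t + '"' for t in positive) + ")")
        if negative:
            groups.append("(" + " ".join('-"' + t + '"' for t in negative) + ")")
    # A space between groups acts as AND.
    return " ".join(groups)


print(build_query([
    ("hotel", {}),
    ("agency", {"bureau": 0.5, "means": -0.4}),
]))
```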
Algorithm for the New Query Generation
Algorithm 2:
[Figure: same scheme as Algorithm 1 (each query word joined with its synonyms via OR (AND NOT), word groups joined via AND to form the Query), with an added step: filtering of synonyms based on the significance of the word $R_i$; filter label: Syn, |Q| > …]
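The filtering step of Algorithm 2 can be sketched as a cut-off on the absolute synonym quality. The slide does not show how the cut-off depends on the word significance, so the rule used here (cut-off = 1 - significance) is purely a hypothetical choice for illustration, as are the function name and the sample values.

```python
def filter_synonyms(scored, significance):
    """Keep only synonyms whose |Q| exceeds a significance-dependent cut-off.

    scored       -- {synonym: quality Q}
    significance -- significance of the word in the query, in [0, 1]
    """
    if not 0.0 <= significance <= 1.0:
        raise ValueError("significance must lie in [0, 1]")
    # Hypothetical rule, not from the slides: the more significant the word,
    # the lower the cut-off, so more of its synonyms survive.
    cutoff = 1.0 - significance
    return {syn: q for syn, q in scored.items() if abs(q) > cutoff}


scored = {"booking": 0.375, "reserve": 0.1, "qualification": -0.5}
print(filter_synonyms(scored, significance=0.8))
```

Both strongly positive and strongly negative synonyms pass the filter, so the surviving set still feeds the OR and AND NOT parts of the query.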
Google API: Adaptation to search engine
We use Google because...
• Developers write software that connects remotely to the Google Web APIs service and accesses Google's index of more than 4 billion web pages
• Google Web APIs support the same search syntax as the Google.com site
• Communication is performed via the Simple Object Access Protocol (SOAP), an XML-based mechanism for exchanging typed information
...but that could be virtually any existing search engine
WordNet
(online access: http://www.cogsci.princeton.edu/cgi-bin/webwn)
WordNet 2.0 Search Example
• Search word: "driver"
The noun "driver" has 5 senses in WordNet.
1. driver -- (the operator of a motor vehicle)
2. driver -- (someone who drives animals that pull a vehicle)
3. driver -- (a golfer who hits the golf ball with a driver)
4. driver, device driver -- ((computer science) a program that determines how a computer will communicate with a peripheral device)
5. driver, number one wood -- (a golf club (a wood) with a near vertical face that is used for hitting long shots from the tee)
• Sense 1
driver -- (the operator of a motor vehicle)
  => busman, bus driver -- (someone who drives a bus)
  => chauffeur -- (a man paid to drive a privately owned car)
  => designated driver -- (the member of a party who is designated to refrain from alcohol and so is sober when it is time to drive home)
  => honker -- (a driver who causes his car's horn to make a loud honking sound; "the honker was fined for disturbing the peace")
  => motorist, automobilist -- (someone who drives (or travels in) an automobile)
  => owner-driver -- (a motorist who owns the car that he/she drives)
  => racer, race driver, automobile driver -- (someone who drives racing cars at high speeds)
  …
WordNet – Basic Terminology
• Syntactic category – part of speech {noun, verb, adjective, adverb}
• Synonymic set (synset) – list of synonymic words or collocations
• Every word can have several senses
• Every sense of a word is associated with synonyms (synset) of the word in that specific sense
• Synsets are organized in hierarchies interlinked with semantic relations
WordNet – Organization
Building Blocks:
• Word forms – common word orthography
• Word meanings – by synsets
Relations:
• Lexical – between word forms
• Semantic – between word meanings
=> Pointers:
• Lexical – pertain only to a specific word
• Semantic – pertain to all of the words in a semantic set
Semantic Search Assistant prototype
Features of SSA
• Platform independent (written in Java)
• Works in 2 modes:
  • common mode, implements almost all of Google functionality;
  • extended mode, extends common mode, makes several requests with the same semantic sense, returns compound results.
• Keeps results in XML format
Common mode
• SSA has a clear and simple interface, which helps the user make an advanced Google search without special knowledge
• SSA transforms the values of the fields into a Google request according to the special format which Google provides for advanced search
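The field-to-request transformation of the common mode might look roughly like the sketch below. The operators it emits (a quoted exact phrase, OR between alternatives, a leading minus for excluded words, `site:` for a domain restriction) are standard Google search syntax; the function and its field names are hypothetical, since the actual SSA form fields are not listed here.

```python
def advanced_query(all_words="", exact_phrase="", any_words="",
                   none_words="", site=""):
    """Turn hypothetical form fields into one Google query string."""
    parts = all_words.split()                      # every word must appear
    if exact_phrase.strip():
        parts.append('"' + exact_phrase.strip() + '"')
    any_terms = any_words.split()                  # at least one must appear
    if len(any_terms) == 1:
        parts.append(any_terms[0])
    elif any_terms:
        parts.append("(" + " OR ".join(any_terms) + ")")
    parts.extend("-" + term for term in none_words.split())  # excluded words
    if site.strip():
        parts.append("site:" + site.strip())
    return " ".join(parts)


print(advanced_query(all_words="hotel", exact_phrase="city centre",
                     any_words="booking reservation", none_words="hostel",
                     site="example.com"))
```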
Extended mode
• More powerful mode than the common one
• SSA takes the user request and tries to choose the most suitable sense with the user’s help
• Makes a set of requests, which extend the user’s request with synonyms and exclude unsuitable words
Generating the request set
• WordNet API and dictionaries are used for generating the set of requests
• When the user enters the original request, SSA switches to the panel where the different senses of the typed word are presented
Generating the request set (2)
• For every sense presented on this panel the user can see a description (and sometimes an example) extracted from the WordNet dictionary
• The user can also set a rate of correspondence for every sense in the range [-1, 1]
Making compound result
• SSA sends the generated requests to Google one by one
• It keeps the obtained results for each request separately
• The user finally gets an integrated result, which is generated according to special rules
Integrated results: generating rules
• The unique identifier for each result is its URL
• SSA counts the number of appearances of each URL in the returned results and sets this number as the index of that URL
• Results with a bigger index are shown first
• If indexes are equal, results are shown in the order in which Google returned them
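The rules above can be sketched in a few lines. This is an illustration rather than the SSA code: the function name is hypothetical, and for ties it assumes that "the order in which Google returned them" means the order of first appearance across the requests, taken in the order they were sent.

```python
def integrate(result_lists):
    """Merge per-request URL lists into one ranked list.

    result_lists -- one list of URLs per request, each in Google's order.
    """
    counts = {}       # URL -> number of appearances (the "index")
    first_seen = {}   # URL -> position of its first appearance overall
    for urls in result_lists:
        for url in urls:
            counts[url] = counts.get(url, 0) + 1
            first_seen.setdefault(url, len(first_seen))
    # Bigger index first; ties keep the order of first appearance.
    return sorted(counts, key=lambda u: (-counts[u], first_seen[u]))


responses = [
    ["a.example", "b.example", "c.example"],
    ["b.example", "d.example"],
    ["c.example", "b.example"],
]
print(integrate(responses))
```

Here `b.example` appears in all three responses and `c.example` in two, so they lead the integrated list, followed by the single-appearance URLs in their original order.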
Results analysis
• After making all requests, SSA shows the final results
• All results are also kept in files in XML format for further analysis
• The user can highlight the results of a specific request, if there was more than one request
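Keeping per-request results in XML could be done along these lines with the Python standard library. The element and attribute names (`results`, `request`, `result`, `query`, `rank`) are invented for this sketch; the actual SSA file layout is not described in this presentation.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results_by_query):
    """Serialize {query string: [URL, ...]} to an XML string."""
    root = ET.Element("results")
    for query, urls in results_by_query.items():
        request = ET.SubElement(root, "request", {"query": query})
        for rank, url in enumerate(urls, start=1):
            item = ET.SubElement(request, "result", {"rank": str(rank)})
            item.text = url
    return ET.tostring(root, encoding="unicode")


def xml_to_results(xml_text):
    """Read the same structure back for later analysis."""
    root = ET.fromstring(xml_text)
    return {request.get("query"): [item.text for item in request]
            for request in root.findall("request")}


saved = results_to_xml({
    '("hotel") ("booking" OR "reserve")': ["http://a.example/", "http://b.example/"],
})
print(saved)
print(xml_to_results(saved))
```

Storing each request separately, with the rank Google gave each URL, keeps enough information to recompute the integrated ranking later.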
Results
• Methods for automatic sense determination using the WordNet Lexical Database were studied and the corresponding algorithms were implemented
• An algorithm for new query generation was implemented and embedded into the software
• A user interface for advanced search (with Google integration) was developed with Semantic Search Assistant functionality
Example
• Initial query: hotel reservation agency (1, 7 and 5 senses respectively)
• Of the first 5 results only 3 are relevant (results with the whole sequence of query words do not even appear in the first three pages)
• Generated query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")
• Of the first 5 results all are relevant (using the synonym “booking” along with “reservation” was helpful)
Example
[Screenshots: results of initial query; results of generated query]
More Examples
Test 1:
Initial query: cork mousepad
Enhanced query: ("phellem" OR "bobfloat" OR "bobber" OR "cork" OR "bob") ("mousepad" OR "mouse mat")
Test 2:
Initial query: flowers present shop
Enhanced query: ("flower") (-"heyday" -"prime" -"efflorescence") ("present") (-"nowadays" -"present tense") ("store" OR "shop") (-"workshop")
Test 3:
Initial query: hotel reservation agency
Enhanced query: ("hotel") ("booking" OR "reserve") (-"qualification") ("bureau" OR "agency") (-"means")
Test 4:
Initial query: zodiac fish
Enhanced query: ("zodiac") ("pisces" OR "fish" OR "pisces the fishes")
Drawbacks
• Lack of highly specialized terminology for narrow domains in WordNet => difficult to get better results with SSA in such cases
• Frequent absence of a sense relation between the words of a whole phrase => difficulty of context determination by the algorithms used
• Presence of several very close senses for many terms in WordNet => the word does not clearly belong to one sense
• Possible wrong determination of the part of speech of a word in the query => improper synonyms and antonyms are used for making the query
Possible Improvements and further work
• Additional Adaptive Learning (for personalized context definition)
• Creating Global Sense Ontology on the basis of WordNet Database
• Improving algorithms for automatic computing of relevance indexes
• Adding algorithms for smart cutting off for generated queries
• Using fuzzy logic for determination of query context
• Adding other lexical databases for supporting search in specific domains (like programming, medicine)
• Multilingual support
Current status
• During Jan-May 2004 the main efforts for the InBCT “Semantic Search Facilitator” project were put into the research and design of the basic features of SSA and the implementation of the ontology-based search method.
• The development of the prototype Semantic Search Assistant software has been started and a pilot version is ready.
• Starting 1.06.2004 the kernel part of the Industrial Ontologies Group starts working on the TEKES project “SmartResource”: Proactive Self-Maintained Resources in Semantic Web at Agora Center, University of Jyväskylä
• Further development (from the point of view of stability and usability) of SSA will be continued during Jul-Sep 2004