Inside semantic Web search engines: between semantic annotation and Natural Language Processing
ISKO Italia meeting - Torino, 3 April 2009
Presentation by Mela Bosch
Terminology on Web Search Engines
Text search engine: based on lexical analysis. The main aim of lexical analysis is to divide the text into paragraphs, sentences and words, as well as entities such as e-mail addresses or URLs. All these elements are known as tokens. The search engine parses them using statistical parameters to produce a list of links in response to a query.
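As a rough illustration of the lexical analysis step, the sketch below splits a text into word, e-mail and URL tokens with one regular expression. The pattern, the `tokenize` name and the sample text are assumptions made for this example. Real engines use far more careful rules and also segment paragraphs and sentences.

```python
import re

# Hypothetical token pattern: e-mail addresses and URLs are tried before
# plain words so that they are kept whole instead of being split at "@" or "/".
TOKEN_RE = re.compile(
    r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<url>https?://[^\s<>\"]+)"
    r"|(?P<word>\w+)"
)

def tokenize(text):
    """Return (token_type, token_text) pairs in document order."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

tokens = tokenize("Mail bob@example.org or see http://example.org/a now")
for kind, value in tokens:
    print(kind, value)
```

Trying the more specific alternatives first is the design choice that matters here: a word-only pattern would break an address into several unrelated tokens.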
Latent semantic indexing (LSI): based on latent semantic analysis (LSA). LSI is a Natural Language Processing (NLP) technique that uses an indexed database of documents to find related terms. It can find a synonym of a query term and then return the best-matching websites for the query. LSI does not require exact word matches to rank results.
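The idea that LSI can rank a document that shares no word with the query can be sketched on a toy corpus. The example below is an assumption-laden illustration, not a production method. It builds a term-document matrix, keeps the two strongest latent dimensions, folds the query into that space and compares by cosine similarity. To stay within the standard library it obtains the latent dimensions by power iteration on the document-document matrix instead of calling an SVD routine. The corpus and all function names are hypothetical.

```python
import math

# Hypothetical corpus: document 1 never mentions "car", but "car" and
# "automobile" both co-occur with "engine" elsewhere in the collection.
DOCS = ["car engine", "automobile engine", "car automobile engine", "flower garden"]

def term_doc_matrix(docs):
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpairs(sym, k, iters=300):
    """Power iteration with deflation on a small symmetric matrix."""
    m = [row[:] for row in sym]
    n = len(m)
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(m, v)))
        pairs.append((lam, v))
        for i in range(n):          # deflate: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def lsi_similarities(query, docs, k=2):
    vocab, a = term_doc_matrix(docs)
    n = len(docs)
    # Document-document matrix A^T A; its eigenvectors are the right singular vectors of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    # Coordinates of document j in the latent space: sigma * v[j].
    doc_vecs = [[math.sqrt(lam) * v[j] for lam, v in pairs] for j in range(n)]
    # Fold the query in: q^T A v / sigma for each kept dimension.
    q = [query.split().count(t) for t in vocab]
    qa = [sum(q[t] * a[t][j] for t in range(len(vocab))) for j in range(n)]
    q_vec = [sum(x * y for x, y in zip(qa, v)) / math.sqrt(lam) for lam, v in pairs]
    return [cosine(q_vec, d) for d in doc_vecs]

sims = lsi_similarities("car", DOCS)
print([round(s, 3) for s in sims])
```

For the query "car", the document "automobile engine" scores close to the documents that contain the word itself, while "flower garden" scores close to zero. A plain keyword match would have given "automobile engine" no score at all.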
Semantic Web search engines: take the sense of a word as a factor in their ranking lists, or offer the user a choice as to the sense of a word or phrase.
Semantic Web search engines or Search engines of 3rd generation
Three types:
User-oriented Semantic Web search engine: It returns web page links. Internally it can use both Semantic Web technologies and LSI. Ex.: True Knowledge, Hakia and PowerSet.
Semantic Web Services-oriented engine: It returns links to ontologies, OWL files and RDF instances. It is not suited to end users. Ex.: SOWL, WSE, Watson, Falcons, Sindice and Swoogle. The idea is to provide ways for businesses to inter-operate across domains or services.
Social-semantic Web-oriented engine: The socio-semantic web (s2w) uses classification and ontologies in very practical situations. The aim of s2w search engines is to complement the formal Semantic Web vision by adding a pragmatic collaborative tagging (folksonomy) approach. The main interest is to enable users to share knowledge. Ex.: http://www.stumpedia.com/
Semantic Web search engines. What are all these differences for?
“Semantic Web means many things to different people:
• It is about artificial intelligence, computer programs solving complex optimization problems
• It is about web services, in terms of end user value
• It is the web of data, where information is represented in RDF or microformats and OWL.”
See: http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
The components of Semantic Web search engines
• Natural Language Processing (NLP)
• Annotation
• Annotation
Free-text annotation:
The annotations can be comments, notes, explanations, references, examples, advice, corrections or any other type of external remark that can be attached to or embedded in a Web document or a selected part of the document.
See: http://www.ncb.ernet.in/groups/dake/annotate/intro.shtml
Semantic annotation in general
Semantic annotation is the association of a data entity with an element from a classification scheme, ontology or other knowledge repository.
Examples of semantic annotation:
• the assignment of MeSH descriptors to citations in MEDLINE
• the assignment of Gene Ontology terms to gene products in UniProt
Semantic Web Annotation
It is crucial to the fulfillment of the Semantic Web to give useful meaning to data or to unstructured text.
• A semantic annotation is a formal annotation, where the predicate is an ontological term, and the object conforms to an ontological definition.
• It is the technique for uploading machine-understandable data to the Web by creating metadata through semantic tagging.
• The term “annotation” can denote both the process of annotating and the result of that process.
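The definition of a formal annotation above can be made concrete with a small check. In the sketch below an annotation is a subject-predicate-object record, and it counts as semantic only if the predicate is a term of the ontology and the object belongs to the class that the ontology requires for that predicate. The mini-ontology, the `hasDescriptor` predicate and the identifiers are hypothetical, loosely modelled on the MeSH example.

```python
from dataclasses import dataclass

# Hypothetical mini-ontology, for illustration only.
ONTOLOGY = {
    # predicate -> class that the object must belong to
    "predicates": {"hasDescriptor": "Descriptor"},
    # known instance -> its class
    "instances": {"Insulin": "Descriptor", "Neoplasms": "Descriptor", "Torino": "Place"},
}

@dataclass(frozen=True)
class Annotation:
    subject: str    # the annotated data entity, e.g. a citation identifier
    predicate: str  # should be an ontological term
    obj: str        # should conform to the ontological definition

def is_semantic(ann, ontology=ONTOLOGY):
    """True if the predicate is in the ontology and the object has the required class."""
    required = ontology["predicates"].get(ann.predicate)
    return required is not None and ontology["instances"].get(ann.obj) == required

print(is_semantic(Annotation("citation:1", "hasDescriptor", "Insulin")))   # True
print(is_semantic(Annotation("citation:1", "hasDescriptor", "Torino")))    # False
print(is_semantic(Annotation("citation:1", "comment", "nice paper")))      # False
```

The third case is a free-text annotation: it may be useful to a human reader, but nothing in it is tied to the ontology.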
Semantic Web Annotation
The Semantic Web Annotation process includes three components:
• an ontology which describes the domain of interest
• a data instance recognition process that discovers all instances of interest in target web documents based on the defined ontology
• an annotation generation process that creates a semantic meaning disclosure file for each annotated document. Through the semantic meaning disclosure file, any ontology-aware machine agent can understand the target document.
See: http://www.deg.byu.edu/ding/research/SemanticAnnotation.html
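The three components of this process can be sketched in a few lines. This is a minimal illustration under strong assumptions: the "ontology" is only a dictionary of classes with known labels, instance recognition is exact string matching, and the disclosure file is a JSON document. The format of the disclosure file is not specified here, so the JSON layout and all names below are hypothetical.

```python
import json
import re

# Component 1: a (very reduced, hypothetical) ontology of the domain of interest.
ONTOLOGY = {
    "City": ["Torino", "Helsinki"],
    "Organization": ["ISKO Italia"],
}

# Component 2: instance recognition, here plain label matching with offsets.
def recognize(text, ontology):
    found = []
    for cls, labels in ontology.items():
        for label in labels:
            for m in re.finditer(re.escape(label), text):
                found.append({"class": cls, "text": label,
                              "start": m.start(), "end": m.end()})
    return sorted(found, key=lambda a: a["start"])

# Component 3: annotation generation, one disclosure document per annotated text.
def generate(doc_id, text, ontology):
    return json.dumps({"document": doc_id,
                       "annotations": recognize(text, ontology)}, indent=2)

text = "ISKO Italia met in Torino."
print(generate("doc-1", text, ONTOLOGY))
```

An agent that knows the same ontology can read the generated file and learn that the document mentions an organization and a city without parsing the prose itself.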
Annotation can be generated manually, automatically or semi-automatically.
The process of annotating requires semantic annotation tools:
Types of semantic annotation tools
Inline annotation means that the original document is augmented with metadata information (embedded metadata).
<html>…<annot>…</html>
Also called: Semantic Authoring or bottom-up approach.
It focuses on annotating information on pages using RDF so that it is machine readable.
It is generally preferable from the point of view of inter-operability
Types of semantic annotation tools:
Standoff annotation means that the metadata is stored separately from the original document (attached metadata).
<html>…</html> annotation
Also called: top-down approach. Its focus is leveraging information from existing web pages to derive meaning automatically.
The annotations are then stored in a database that is made available to users via websites and sometimes via web services.
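The difference between the two storage styles can be shown on one sentence. In the sketch below the standoff form is a list of records with character offsets kept apart from the untouched text, and `to_inline` weaves the same records into the text as markup. The `annot` tag follows the schematic used on the slides. The record layout and function name are assumptions, and overlapping annotations and markup escaping are not handled.

```python
def to_inline(text, annotations):
    """Embed standoff annotations (non-overlapping, with offsets) into the text."""
    out, pos = [], 0
    for a in sorted(annotations, key=lambda a: a["start"]):
        out.append(text[pos:a["start"]])
        out.append('<annot class="%s">%s</annot>'
                   % (a["class"], text[a["start"]:a["end"]]))
        pos = a["end"]
    out.append(text[pos:])
    return "".join(out)

text = "Meeting in Torino"

# Standoff: the document is unchanged, the metadata lives elsewhere (e.g. a database).
standoff = [{"class": "City", "start": 11, "end": 17}]

# Inline: the same information embedded in the document itself.
inline = to_inline(text, standoff)
print(inline)
```

The standoff records can be updated or replaced without touching the page, whereas the inline form travels with the document wherever it is copied.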
There are several choices for annotation
The components of Semantic Web search engines
• Natural Language Processing (NLP)
Initially, NLP:
• is conceived as a support for Linguistics studies
• aims at using computers to interpret and manipulate words as a part of a language
Then:
• Artificial Intelligence defines NLP as the act of using computers to process written and spoken languages for some practical purpose, such as translating languages or carrying on conversations with machines.
A powerful method for the investigation and evaluation of human language itself, i.e. enhanced study over large corpora of texts.
After the Web explosion NLP has been used for the development of natural language understanding
systems that convert samples of human language into more formal representations that are easier to
manipulate for computer programs.
The components of Semantic Web search engines •Natural Language Processing (NLP)
Now:
• Thanks to NLP techniques, different algorithms such as chunking, clustering, parsing, spellchecking, tagging and word sense disambiguation are used to handle text intelligently and to extract information from the Web or from text data banks in order to answer questions.
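Of the algorithms listed, word sense disambiguation is the one most directly tied to semantic search, since it decides which sense of a query word is meant. The sketch below uses a simplified Lesk-style overlap count, which is one common approach and not necessarily what any engine named in this talk uses. The sense inventory, glosses and stop-word list are hypothetical.

```python
# Hypothetical sense inventory: sense id -> short gloss.
SENSES = {
    "bank": {
        "bank/finance": "institution that accepts money deposits and makes loans",
        "bank/river": "sloping land beside a river or lake",
    }
}

# Small hand-picked stop-word list, an assumption for this example.
STOP = {"a", "an", "the", "of", "or", "and", "that", "to", "in", "on", "at", "by", "beside"}

def disambiguate(word, context, senses=SENSES):
    """Pick the sense whose gloss shares the most content words with the context."""
    ctx = {w for w in context.lower().split() if w not in STOP}

    def overlap(sense_id):
        return len(ctx & set(senses[word][sense_id].split()))

    return max(senses[word], key=overlap)

print(disambiguate("bank", "she sat on the bank of the river"))
print(disambiguate("bank", "the bank approved the loans and deposits"))
```

The first call selects the river sense and the second the financial sense, because each context shares content words only with the matching gloss.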
Conclusion
However, both methodologies are now being combined:
• semantic web search engines need many pages to be annotated (which requires an enormous effort),
• so NLP becomes an important help in automatic or semi-automatic annotation.
• At the same time, the precision of text analysis may be optimized by means of techniques of assignment provided by users and professionals.
In conclusion, the trend is the development of collective knowledge systems that improve as more people participate, as they are based on human contributions. All of this will possibly be integrated by NLP algorithms.
References
Iskold, Alex. (2006) Semantic Web Patterns: A Guide to Semantic Technologies. http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php
Atanas, K. et al. (2005) Semantic Annotation, Indexing, and Retrieval. Ontotext Lab. http://www.ontotext.com/publications/SemAIR_ISWC169.pdf
Vehvilainen, A. et al. (2006) Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service. Helsinki University. http://www.seco.tkk.fi/publications/2006/vehvilainen-hyvonen-alm-semi-automatic-semantic-annotation-and-authoring-tool.pdf
Maynard, Diana. (2005) Benchmarking ontology-based annotation tools for the Semantic Web. Department of Computer Science, University of Sheffield, UK. http://gate.ac.uk/sale/ahm05/ahm.pdf
Good, Benjamin M.; Kawas, Edward; Wilkinson, Mark. (2007) Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer. http://precedings.nature.com/documents/945/version/2/html
Useful links:
http://www.semanticfocus.com/
http://logic.stanford.edu/oem/projects.html#_Coordinating_Collective_Work
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki