Inside semantic Web search engines: between semantic annotation and Natural Language Processing

Dentro i motori di ricerca semantici: tra annotazione semantica ed elaborazione della lingua naturale

ISKO Italia meeting, Torino, 3 April 2009

Presentation by Mela Bosch [email protected]


Terminology on Web Search Engines

Text search engine: based on lexical analysis. The main aim of lexical analysis is to divide the text into paragraphs, sentences and words, and to recognize entities such as e-mail addresses or URLs. All these elements are known as tokens. The search engine parses them using statistical parameters and returns a ranked list of links in response to a query.
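The lexical analysis step described above can be sketched as follows. This is only a minimal illustration, not the way any particular search engine works: the regular expressions and the names `TOKEN_RE`, `split_paragraphs`, `split_sentences` and `tokenize` are assumptions made for this example.

```python
import re

# Hypothetical, simplified patterns; real engines use far more robust rules.
TOKEN_RE = re.compile(
    r'(?P<url>https?://[^\s<>"]+[^\s<>".,;:!?)])'  # URL, without trailing punctuation
    r"|(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"    # e-mail address
    r"|(?P<word>\w+(?:[-']\w+)*)"                  # plain word, e.g. "don't", "e-mail"
)


def split_paragraphs(text):
    """Split a text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split: after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def tokenize(sentence):
    """Return (kind, token) pairs, where kind is 'url', 'email' or 'word'."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sentence)]


text = (
    "Write to info@example.org. See http://example.org/docs for details!\n\n"
    "A second paragraph."
)
for paragraph in split_paragraphs(text):
    for sentence in split_sentences(paragraph):
        print(tokenize(sentence))
```

The URL and e-mail patterns are tried before the word pattern, so that an address is kept as a single token instead of being broken into its parts.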

Latent semantic indexing (LSI): based on latent semantic analysis (LSA). LSI is a Natural Language Processing (NLP) technique that uses an indexed collection of documents to find terms that occur in similar contexts. It can therefore find synonyms of the query terms and return the best-matching websites for the query, so LSI does not require exact word matches to rank results.
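The idea that LSI can match a document without exact word overlap can be illustrated with a toy sketch. Everything here is an assumption made for the example: the four documents, the function names, and the use of plain power iteration in place of a library SVD routine (a real system would use an optimized truncated SVD over a large collection).

```python
import math

# Hypothetical toy collection: documents 0 and 1 share context words
# ("engine", "wheel") but use different words for the same thing.
DOCS = [
    "car engine wheel",
    "automobile engine wheel",
    "flower garden",
    "flower petal",
]


def build_matrix(docs):
    """Term-document count matrix (rows: terms, columns: documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]


def top_eigenpairs(sym, k, iters=200):
    """Top-k eigenpairs of a symmetric matrix by power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start vector: deterministic runs
        for _ in range(iters):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lsi_similarities(query, docs, k=2):
    """Cosine similarity between the query and each document in a k-dim concept space."""
    vocab, a = build_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # Eigenvectors of A^T A are the right singular vectors of A.
    gram = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
            for i in range(n_docs)]
    # Left singular vectors u = A v / sigma span the reduced "concept" space.
    basis = [[sum(a[t][j] * v[j] for j in range(n_docs)) / math.sqrt(lam)
              for t in range(n_terms)]
             for lam, v in top_eigenpairs(gram, k)]

    def project(term_vector):
        return [sum(u[t] * term_vector[t] for t in range(n_terms)) for u in basis]

    q_low = project([query.split().count(t) for t in vocab])
    return [cosine(q_low, project([a[t][j] for t in range(n_terms)]))
            for j in range(n_docs)]


print(lsi_similarities("car", DOCS))
```

With these toy documents, the query `car` scores close to 1 against "automobile engine wheel" although the two share no word, because both words occur with `engine` and `wheel`; the two `flower` documents score close to 0.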

Semantic Web search engines: take the sense of a word into account as a ranking factor, or offer the user a choice among the senses of a word or phrase.


Semantic Web search engines, or third-generation search engines

Three types:

User-oriented Semantic Web search engine: it returns links to web pages. Internally it can use both Semantic Web technologies and LSI. Ex.: True Knowledge, Hakia and PowerSet.

Semantic Web Services-oriented engine: it returns links to ontologies, OWL files and RDF instances, so it is not suited to end users. The idea is to provide ways for businesses to interoperate across domains or services. Ex.: SOWL, WSE, Watson, Falcons, Sindice and Swoogle.

Social-semantic Web-oriented engine: the socio-semantic web (s2w) uses classification and ontologies in very practical situations. The aim of s2w search engines is to complement the formal Semantic Web vision with a pragmatic, collaborative tagging (folksonomy) approach. The main interest is to enable users to share knowledge. Ex.: http://www.stumpedia.com/


Semantic Web search engines. What are all these differences for?

“Semantic Web means many things to different people:
• It is about artificial intelligence, computer programs solving complex optimization problems
• It is about web services, in terms of end user value
• It is the web of data, where information is represented in RDF or microformats and OWL.”

See: http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php

The components of Semantic Web search engines
• Natural Language Processing (NLP)
• Annotation


• Annotation

Free-text annotation: the annotations can be comments, notes, explanations, references, examples, advice, corrections or any other type of external remark that can be attached to or embedded in a Web document or a selected part of the document.

See: http://www.ncb.ernet.in/groups/dake/annotate/intro.shtml

Semantic annotation in general: semantic annotation is the association of a data entity with an element from a classification scheme, ontology or other knowledge repository.

Examples of semantic annotation:
• the assignment of MeSH descriptors to citations in MEDLINE
• the assignment of Gene Ontology terms to gene products in UniProt


Semantic Web Annotation

It is crucial to the fulfillment of the Semantic Web to give useful meaning to data or to unstructured text.

• A semantic annotation is a formal annotation, where the predicate is an ontological term and the object conforms to an ontological definition.

It is the technique for uploading machine-understandable data to the Web by creating metadata through semantic tagging.

• The term “annotation” can denote both the process of annotating and the result of that process.
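The definition of a formal annotation given above (the predicate is an ontological term, the object conforms to an ontological definition) can be made concrete with a small sketch. The `ex:` identifiers, the two dictionaries and the function name are hypothetical and stand in for a real ontology; they are not taken from MeSH, the Gene Ontology or any actual vocabulary.

```python
from dataclasses import dataclass

# Hypothetical toy ontology. Each property (predicate) declares the class
# its object must belong to; each known term declares its own class.
ONTOLOGY_PROPERTIES = {"ex:hasSubject": "ex:Descriptor"}
ONTOLOGY_TERMS = {"ex:Neoplasms": "ex:Descriptor", "ex:Torino": "ex:Place"}


@dataclass(frozen=True)
class Annotation:
    subject: str    # the annotated resource, e.g. a document URL
    predicate: str  # should be a property defined in the ontology
    obj: str        # should be a term of the class the property expects


def is_semantic_annotation(annotation):
    """True if the predicate is an ontology property and the object fits its definition."""
    expected_class = ONTOLOGY_PROPERTIES.get(annotation.predicate)
    if expected_class is None:
        return False  # free-text style remark: predicate unknown to the ontology
    return ONTOLOGY_TERMS.get(annotation.obj) == expected_class


good = Annotation("http://example.org/citation/1", "ex:hasSubject", "ex:Neoplasms")
bad = Annotation("http://example.org/citation/1", "ex:hasSubject", "ex:Torino")
print(is_semantic_annotation(good), is_semantic_annotation(bad))
```

A free-text comment attached to the same citation would fail the check, which is the difference between free-text and semantic annotation in this simplified reading.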


Semantic Web Annotation

The Semantic Web Annotation process includes three components:

• an ontology which describes the domain of interest

• a data instance recognition process that discovers all instances of interest in target web documents based on the defined ontology

• an annotation generation process that creates a semantic meaning disclosure file for each annotated document. Through the semantic meaning disclosure file, any ontology-aware machine agent can understand the target document.

See: http://www.deg.byu.edu/ding/research/SemanticAnnotation.html
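The three components of the annotation process (ontology, data instance recognition, annotation generation) can be sketched end to end. This is an illustration under strong assumptions: the "ontology" is reduced to one regular expression per concept, and the JSON layout of the disclosure file is invented for this example; neither is claimed to be the format used by the project linked on this slide.

```python
import json
import re

# Component 1: a toy "ontology" mapping each concept to a recognizer pattern.
ONTOLOGY = {
    "City": r"\b(?:Torino|Milano|Roma)\b",
    "Year": r"\b(?:19|20)\d{2}\b",
}


def recognize_instances(text, ontology=ONTOLOGY):
    """Component 2: find every instance of an ontology concept in the text."""
    found = []
    for concept, pattern in ontology.items():
        for match in re.finditer(pattern, text):
            found.append({
                "concept": concept,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
            })
    return sorted(found, key=lambda instance: instance["start"])


def generate_disclosure(document_url, text, ontology=ONTOLOGY):
    """Component 3: serialize the annotations as a separate disclosure file (JSON here)."""
    return json.dumps(
        {"document": document_url, "annotations": recognize_instances(text, ontology)},
        indent=2,
    )


page_text = "The ISKO Italia meeting took place in Torino in 2009."
print(generate_disclosure("http://example.org/page.html", page_text))
```

An ontology-aware agent would read the disclosure file rather than the page itself, and learn that the page mentions a `City` and a `Year`.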


Annotation can be generated manually, automatically or semi-automatically.

The process of annotating requires semantic annotation tools.

Types of semantic annotation tools

Inline annotation means that the original document is augmented with metadata information.

Embedded metadata: <html>…<annot>…</html>

Also called: Semantic Authoring or bottom-up approach. It focuses on annotating information on pages using RDF, so that it is machine readable.


Types of semantic annotation tools:

Standoff annotation means that the metadata is stored separately from the original document. It is generally preferable from the point of view of interoperability.

Attached metadata: <html>…</html> annotation

Also called: top-down approach. Its focus is leveraging information from existing web pages to derive meaning automatically.

The annotations are then stored in a database that is made available to users via websites and sometimes via web services.
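The difference between inline and standoff annotation can be shown on the same text span. In this sketch the `<annot>` tag is the schematic tag used on the slides, not a real HTML element, and the record layout of the standoff version is an assumption; spans are assumed not to overlap.

```python
import html


def inline_annotate(text, spans):
    """Inline: return a new document with the metadata embedded around each span."""
    out, cursor = [], 0
    for start, end, concept in sorted(spans):
        out.append(html.escape(text[cursor:start]))
        out.append('<annot concept="{}">{}</annot>'.format(
            html.escape(concept), html.escape(text[start:end])))
        cursor = end
    out.append(html.escape(text[cursor:]))
    return "".join(out)


def standoff_annotate(document_id, text, spans):
    """Standoff: leave the document untouched and return separate records
    that point into it by character offsets."""
    return [
        {"document": document_id, "start": start, "end": end,
         "concept": concept, "text": text[start:end]}
        for start, end, concept in sorted(spans)
    ]


text = "Meeting in Torino"
spans = [(11, 17, "City")]  # (start, end, concept), hypothetical recognizer output
print(inline_annotate(text, spans))
print(standoff_annotate("http://example.org/page.html", text, spans))
```

The standoff records can be stored in a database and served independently of the page, which is one reason this form is easier to share between tools.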


There are several choices for annotation


The components of Semantic Web search engines
• Natural Language Processing (NLP)

Initially, NLP
• was conceived as a support for linguistic studies
• aimed at using computers to interpret and manipulate words as part of a language

A powerful method for the investigation and evaluation of human language itself, i.e. enhanced study over large corpora of texts.

Then
• Artificial Intelligence defined NLP as the act of using computers to process written and spoken languages for some practical purpose, such as translating languages or carrying on conversations with machines.


The components of Semantic Web search engines
• Natural Language Processing (NLP)

After the Web explosion, NLP has been used for the development of natural language understanding systems that convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Now
• Thanks to NLP techniques, different algorithms such as chunking, clustering, parsing, spellchecking, tagging and word sense disambiguation are used to handle text intelligently and to get information from the Web or from text data banks in order to answer questions.
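Of the techniques listed above, word sense disambiguation is the one most directly tied to taking "the sense of a word" into account. The following is a minimal sketch in the spirit of the simplified Lesk approach (choose the sense whose definition shares the most words with the context). The sense inventory, glosses and stopword list are invented for this example; a real system would draw them from a lexical resource.

```python
import re

# Hypothetical mini sense inventory with one-line glosses.
SENSES = {
    "bank": {
        "financial institution": "institution that accepts deposits and lends money",
        "river side": "sloping land beside a river or lake",
    }
}
STOPWORDS = {"a", "an", "the", "of", "and", "or", "that", "to", "in", "on", "at"}


def content_words(text):
    """Lowercased words of the text, without stopwords."""
    return {w.lower() for w in re.findall(r"\w+", text)} - STOPWORDS


def disambiguate(word, context):
    """Pick the sense whose gloss overlaps most with the context words."""
    context_set = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_set & content_words(gloss))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(disambiguate("bank", "She went to the bank to deposit money"))
print(disambiguate("bank", "They fished from the bank of the river"))
```

A search engine could use the chosen sense as one ranking factor, or show both senses and let the user choose, as described in the terminology slide.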


Conclusion

However, both methodologies are now being combined:
• semantic web search engines need many pages to be annotated (which requires an enormous effort),
• so NLP becomes an important help in automatic or semi-automatic annotation.
• At the same time, the precision of text analysis may be optimized by means of assignment techniques provided by users and professionals.

In conclusion, the trend is the development of collective knowledge systems that improve as more people participate, as they are based on human contributions. All of this will possibly be integrated by NLP algorithms.


References

Iskold, Alex. (2006) Semantic Web Patterns: A Guide to Semantic Technologies. http://www.readwriteweb.com/archives/semantic_web_patterns_a_guide_redux.php

Atanas, K. et al. (2005) Semantic Annotation, Indexing, and Retrieval. Ontotext Lab. http://www.ontotext.com/publications/SemAIR_ISWC169.pdf

Vehvilainen, A. et al. (2006) Semi-Automatic Semantic Annotation and Authoring Tool for a Library Help Desk Service. Helsinki University. http://www.seco.tkk.fi/publications/2006/vehvilainen-hyvonen-alm-semi-automatic-semantic-annotation-and-authoring-tool.pdf

Maynard, Diana. (2005) Benchmarking ontology-based annotation tools for the Semantic Web. Department of Computer Science, University of Sheffield, UK. http://gate.ac.uk/sale/ahm05/ahm.pdf

Good, Benjamin M.; Kawas, Edward; Wilkinson, Mark. (2007) Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer. http://precedings.nature.com/documents/945/version/2/html

Useful links:
http://www.semanticfocus.com/
http://logic.stanford.edu/oem/projects.html#_Coordinating_Collective_Work
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki