Semantic Search (Semantische Suche), Bonn, 12 March 2013. Dr. Harald Sack, Hasso-Plattner-Institut for IT-Systems Engineering, University of Potsdam. Thursday, 14 March 2013

Semantic Search


DESCRIPTION

Presentation at VfM Seminar 'Medieninformation und Mediendokumentation', Bonn, 12.03.2013


Page 1: Semantic Search

Semantic Search (Semantische Suche)
Bonn, 12 March 2013

Dr. Harald Sack
Hasso-Plattner-Institut for IT-Systems Engineering
University of Potsdam

Page 2: Semantic Search

vfm - Seminar: Metadatenmanagement in Medienunternehmen, 05. September 2012, Bonn Jörg Waitelonis, Hasso-Plattner-Institut Potsdam

Hasso Plattner Institute for IT Systems Engineering, University of Potsdam

■ HPI was founded in October 1998 as a public-private partnership
■ HPI research and teaching focus on IT Systems Engineering
■ 10 professors and 100 scientific coworkers
■ 450 bachelor / master students
■ Winner of CHE-ranking 2010

Page 3: Semantic Search

Research Group Semantic Technologies & Multimedia Retrieval

■ Research Topics
  ■ Semantic Web Technologies
  ■ Ontological Engineering
  ■ Information Retrieval
  ■ Multimedia Analysis & Retrieval
  ■ Social Networking
  ■ Data/Information Visualization
■ Research Projects:

Page 4: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

Albrecht Dürer: Melancholia I, 1514

Page 5: Semantic Search

Lecture Semantic Web, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam

The 'Google Dilemma'

Page 6: Semantic Search


Page 7: Semantic Search

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, Workshop 'Corporate Semantic Web', XInnovations 2011, Berlin, 19 Sep. 2011

Google Multimedia Search

Page 8: Semantic Search

Harald Sack, Hasso-Plattner-Institute for IT-Systems Engineering, LDW 2011, Magdeburg, 30. Sep. 2011

Content Based Retrieval is Based on Textual Metadata

Google Multimedia Search

Page 9: Semantic Search

Search by Media Content

Page 10: Semantic Search


The Ordinary Archive is a Small World...

Jules Verne


Page 11: Semantic Search

But wouldn't it be nice if...

Jules Verne

...but maybe you are also interested in
- Georges Méliès (2 videos)
- Mark Twain (1 video)
- H. G. Wells (2 videos)
- science fiction (11 videos)
- adventure (20 videos)
- France (101 videos)
- Moon (33 videos)
- literature (434 videos)
- art (1,205 videos)

Page 12: Semantic Search

(Traditional) Information Retrieval

Page 13: Semantic Search

(Simplified) Information Retrieval Model
(acc. to Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, New York 1983)

[Figure: a Set of Documents (files of records) and a Set of Queries (information requests) are connected to a common indexing language by indexing and query formulation, respectively; documents and queries are compared by similarity.]

Page 14: Semantic Search

Evaluation of Information Retrieval Systems

R: relevant documents
P: retrieved documents
R ∩ P: relevant documents that have been retrieved

\[ \mathrm{Recall} = \frac{|R \cap P|}{|R|} \qquad \mathrm{Precision} = \frac{|R \cap P|}{|P|} \]

\[ F_\alpha = \frac{(1+\alpha)\cdot \mathrm{Recall}\cdot \mathrm{Precision}}{\alpha\cdot \mathrm{Precision} + \mathrm{Recall}} \]
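The evaluation measures can be tried out on a toy example. The following is a minimal sketch, not taken from the slides: the function name and the document ids are hypothetical, and the F-measure is computed in the common weighted form (1+α)·Precision·Recall / (α·Precision + Recall), which reduces to the harmonic mean 2·P·R/(P+R) for α = 1.

```python
def precision_recall_f(relevant, retrieved, alpha=1.0):
    """Return (precision, recall, F_alpha) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)                      # |R ∩ P|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = alpha * precision + recall
    f = (1 + alpha) * precision * recall / denom if denom else 0.0
    return precision, recall, f


# Hypothetical example: 4 relevant documents, 5 retrieved, 3 of them relevant.
p, r, f1 = precision_recall_f({"d1", "d2", "d3", "d4"},
                              {"d1", "d2", "d3", "d8", "d9"})
print(p, r, round(f1, 3))
```

Here three of five retrieved documents are relevant (precision 0.6) and three of four relevant documents are found (recall 0.75).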

Page 15: Semantic Search

Search Engines in the World Wide Web

• The World Wide Web is a distributed hypermedia system that
  • consists of multimedia documents and
  • is connected via hyperlinks

Page 16: Semantic Search

Search Engines in the WWW
Information Gathering via Web Crawler (Robot)

[Figure, steps numbered 1 to 4: the web crawler holds a URL list (http://www.xxxx.de/1234..., http://www.xxxx.de/2234..., http://www.xxxx.de/3234..., ...), sends an HTTP Request to the WWW-Server, and receives HTML documents containing hyperlinks (<a href="..." .../>).]

WWW server delivers requested HTML documents to the web crawler
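The crawler loop in the figure can be sketched roughly as follows. This is only an illustration under stated assumptions: the `fetch` callback, the `crawl` function, and the tiny in-memory "web" are hypothetical stand-ins for real HTTP requests, and politeness rules such as robots.txt are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl: fetch a URL, store the page, queue unseen links."""
    queue, seen, pages = deque([seed]), {seed}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)                 # stands in for the HTTP request
        if html is None:                  # page could not be delivered
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)   # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical three-page web used instead of real network access.
FAKE_WEB = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.org/a": '<a href="/b">B</a>',
    "http://example.org/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
pages = crawl("http://example.org/", FAKE_WEB.get)
print(sorted(pages))
```

Replacing `FAKE_WEB.get` with a function that performs real HTTP requests would turn the sketch into a very basic crawler.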

Page 17: Semantic Search

Search Engines in the WWW
Preprocessing and Indexing

Stages shown: Web Crawler, Data Normalization, Data Analysis and creation of index data structures

Document Preprocessing
• Tokenization
• Language Identification
• Word Stemming
• POS-Tagging
• Descriptor Generation

Page 18: Semantic Search

Search Engines in the WWW
Efficient Index Data Structures

Index (sorted terms): Aachen, Altavista, Ananas, ..., Zustand, Zypern

Inverted File, entry for the term "Ananas":

DocID  Pos          Frequency  Weight
D123   1;13;77;132  4          9.4
D456   22;38        2          6.7
…      …            …          …
D998   15           1          1.2

Location List for D123:
Columns: Frequency, URL, <H1> … <H6>, <title>, …, text
Values:  4 1 1 0 1 … 1

File:
D123 http://producers.ananas.org/index.htm
<html><head><title>Ananas around the World</title></head><body> … </body></html>
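A positional inverted file like the one shown for "Ananas" can be built in a few lines. The sketch below is an assumption-laden toy: the tokenizer, the function names, and the two documents are hypothetical, and the weight column of the slide is left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple tokenizer: lower-case word characters only."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}; positions are 1-based."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text), start=1):
            index[term].setdefault(doc_id, []).append(pos)
    return index


# Hypothetical mini collection.
docs = {
    "D1": "Ananas around the world",
    "D2": "The world loves ananas and more ananas",
}
index = build_inverted_index(docs)
for doc_id, positions in index["ananas"].items():
    # DocID, position list, frequency (as in the table on the slide)
    print(doc_id, positions, len(positions))
```

The frequency is simply the length of the position list, so it does not need to be stored separately in this toy version.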

Page 19: Semantic Search

Search Engines in the WWW
Relevance Ranking

• based on Link Popularity (Google PageRank)

Start: PR(A) = PR(B) = PR(C) = PR(D) = 1.0

Iteration of the PageRank computation

Nr.  PR(A)  PR(B)   PR(C)   PR(D)
1    1.0    1.0     1.0     1.0
2    1.0    0.575   2.275   0.15
3    2.083  0.575   1.1912  0.15
…    …      …       …       …
n    1.49   0.7833  1.577   0.15

Resulting PageRank: A = 1.49, B = 0.78, C = 1.57, D = 0.15
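The iteration table can be reproduced with a short script. The link structure is not recoverable from the extracted slide text, so the graph below (A links to B and C, B links to C, C links to A, D links to C) is an assumption, chosen because it yields the values in the table together with the non-normalized PageRank formula PR(p) = (1 − d) + d · Σ PR(q)/outdegree(q) and a damping factor d = 0.85, which is also assumed.

```python
def pagerank(links, d=0.85, iterations=100):
    """Non-normalized PageRank, all pages updated simultaneously per round."""
    pages = list(links)
    pr = {p: 1.0 for p in pages}                      # start values 1.0
    for _ in range(iterations):
        new = {}
        for p in pages:
            # rank passed on by every page q that links to p
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) + d * incoming
        pr = new
    return pr


# Assumed link graph (see lead-in); not given explicitly in the source.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

first = pagerank(LINKS, iterations=1)   # second row of the table
final = pagerank(LINKS)                 # values after convergence
print({p: round(v, 4) for p, v in first.items()})
print({p: round(v, 4) for p, v in final.items()})
```

Page D has no incoming links in this graph, so it keeps the minimum value 1 − d = 0.15 in every round.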

Page 20: Semantic Search

Semantic Web Technologies, Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam

The Web is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. (...according to Douglas Adams)

Page 21: Semantic Search

Language has its fallacies...

Page 22: Semantic Search

in particular, if we don't know the language

Page 23: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

Page 24: Semantic Search

Searching a (Multi) Media Archive

Step 1: Digitization of analog media
Step 2: Annotation with (text-based) metadata
Step 3: Content-based Retrieval based on available metadata

Page 25: Semantic Search


Today: Manual Annotation


Page 26: Semantic Search

Automated Media Analysis

• Visual Analysis
• Visual Concept Detection
• Text Recognition
• Face Detection
• Logo Detection
• Structural Analysis
• Audio-Mining
• Automated Speech Recognition
• Audio Event Detection

Page 27: Semantic Search

Automated Media Analysis

■ Result: multimedia data with spatio-temporal annotations

temporal metadata (e.g. MPEG-7)

...
<Video>
  <TemporalDecomposition>
    <VideoSegment>
      <TextAnnotation>
        <KeywordAnnotation>
          <Keyword>Astronaut</Keyword>
        </KeywordAnnotation>
      </TextAnnotation>
      <MediaTime>
        <MediaTimePoint>T00:05:05:0F25</MediaTimePoint>
        <MediaDuration>PT00H00M31S0N25F</MediaDuration>
      </MediaTime>
      ...
    </VideoSegment>
  </TemporalDecomposition>
</Video>
...

Page 28: Semantic Search

Automated Media Analysis

■ Result: multimedia data with spatio-temporal annotations

spatial metadata (e.g. MPEG-7)

...
<SpatialDecomposition>
  <TextAnnotation>
    <KeywordAnnotation>
      <Keyword>Astronaut</Keyword>
    </KeywordAnnotation>
  </TextAnnotation>
  <SpatialMask>
    <SubRegion>
      <Polygon>
        <Coords>480 150 620 480</Coords>
      </Polygon>
    </SubRegion>
  </SpatialMask>
  ...
</SpatialDecomposition>
...

Page 29: Semantic Search


Page 30: Semantic Search

Semantic Analysis
How to Determine the Meaning of Metadata?

• Authoritative Metadata
  • structured data
  • semi-structured data
  • natural language text
• Non-authoritative Metadata
  • (free) user tags and comments
  • restricted vocabularies
• (Media) Analysis Metadata
  • low level features
  • high level features
• etc.

Aspects shown: reliability, context, pragmatics, location dependency, accuracy, time dependency, level of abstraction

Page 31: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

Page 32: Semantic Search

Multimedia Ontologies
Semantic Metadata

• MPEG-7 has been re-engineered to become an OWL-DL ontology (2007: Arndt et al., COMM model)
• Localize a region → Draw a bounding box
• Annotate the content → Interpret the content → Tag 'Astronaut'

Page 33: Semantic Search

Multimedia Ontologies
Semantic Metadata

Example: Tagging with an MPEG-7 Ontology

[Figure: RDF graph around the region Reg1. Labels shown: mpeg7:image, mpeg7:depicts "Man on the Moon", mpeg7:spatial_decomposition Reg1, Reg1 rdf:type mpeg7:StillRegion, mpeg7:depicts "Neil Armstrong", mpeg7:SpatialMask, mpeg7:polygon, mpeg7:Coords.]

Page 34: Semantic Search

'Neil Armstrong' is more than just a character string

[Figure: Entities (Neil Armstrong, Juri Gagarin) are linked to Ontologies (classes Astronaut, Kosmonaut, Person, Science Occupation, Employment) by the relations "is a", "is NOT a", "has an", "same as", and "subClassOf".]

Page 35: Semantic Search

Where does the knowledge come from...?


Page 36: Semantic Search

Web of Data = Linked Open Data


Page 37: Semantic Search

Where does the knowledge come from...?

subject          property  object
:Neil_Armstrong  rdf:type  dbpedia-owl:Astronaut .

Page 38: Semantic Search

Named Entity Mapping

[Figure: the name "Neil Armstrong" is mapped to an entity described by rdfs:label Neil Armstrong, rdf:type dbpedia-owl:Astronaut, and rdf:type foaf:Person. Class labels shown: Astronaut, Person, Science Occupation, Employment, connected by "is a" and "subClassOf".]

Page 39: Semantic Search

Semantic Multimedia Retrieval

[Figure: Video Analysis / Metadata Extraction produces metadata along the time axis (e.g., person xy, location yz, event abc). Entity Recognition / Mapping connects this metadata with further data (e.g., bibliographical data, geographical data, encyclopedic data, ...).]

N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags. In Proc. of the 8th Int. Workshop on Text-based Information Retrieval, IEEE CS Press, 2011

Page 40: Semantic Search

Semantic Analysis
Named Entity Mapping

Text: "Armstrong landed the Eagle on the Moon."

Entity Mapping: http://dbpedia.org/resource/Neil_Armstrong

How do I find the right entity?

Page 41: Semantic Search

Semantic Web Technologies , Dr. Harald Sack, Hasso Plattner Institute, University of Potsdam

Armstrong


Page 42: Semantic Search

Semantic Analysis
Named Entity Mapping

Text: "Armstrong landed the Eagle on the Moon."

Entity Candidate Generation
Determine possible Entity Mapping Candidates

Armstrong, Florida; Armstrong, Ontario; Armstrong County, Texas; Armstrong Tunnel; Louis Armstrong; Armstrong Tools; Armstrong (moon crater); Armstrong (car); The Armstrongs; Craig Armstrong; Anton Armstrong; Edward Armstrong; Gary Armstrong; George Armstrong; The Armstrong Twins; Ian Armstrong; Neil Armstrong; Armstrong Bridge; Lance Armstrong; + 400 more...

Page 43: Semantic Search

Semantic Analysis
Named Entity Mapping

Text: "Armstrong landed the Eagle on the Moon."

Entity Candidate Generation
Determine possible Entity Mapping Candidates
• linguistic analysis (POS tagging)
• normalization
• encoding and spelling
• special (language dependent) characters
• language dependent spellings
• abbreviations, acronyms
• type dependent spellings
• alternative names and synonyms
• fuzzy string mapping
• ...
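Two of the listed steps, normalization and fuzzy string mapping, can be illustrated with the Python standard library. This is a rough sketch under assumptions: the function names, the label list, and the similarity cutoff of 0.8 are hypothetical, and a real system would look up labels in a knowledge base such as DBpedia rather than in a short list.

```python
import difflib
import re
import unicodedata


def normalize(label):
    """Lower-case a label and strip diacritics, e.g. 'Méliès' -> 'melies'."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


def candidates(mention, labels, cutoff=0.8):
    """Return all labels with a token that equals or closely resembles the mention."""
    mention = normalize(mention)
    found = []
    for label in labels:
        tokens = re.findall(r"\w+", normalize(label))
        # an exact match has similarity 1.0, so it passes the cutoff as well
        if difflib.get_close_matches(mention, tokens, n=1, cutoff=cutoff):
            found.append(label)
    return found


# Hypothetical label list; the first five names appear on the slides.
LABELS = ["Neil Armstrong", "Louis Armstrong", "Armstrong, Florida",
          "Armstrong (moon crater)", "Lance Armstrong", "Buzz Aldrin"]
print(candidates("Armstrong", LABELS))
print(candidates("Armstrnog", LABELS))   # misspelled mention
```

Both calls return the five "Armstrong" labels, which shows why candidate generation alone is not enough: a separate selection step still has to pick one entity.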

Page 44: Semantic Search

Semantic Analysis
Named Entity Mapping

Text: "Armstrong landed the Eagle on the Moon."

Entity Selection Process
Entity Selection is determined by
• context
• ambiguity of source data / mapping
• accuracy / reliability of source data / mapping

Candidates: Armstrong, Florida; Armstrong, Ontario; Armstrong County, Texas; Armstrong Tunnel; Louis Armstrong; Armstrong Tools; Armstrong (moon crater); Armstrong (car); The Armstrongs; Craig Armstrong; Anton Armstrong; Edward Armstrong; Gary Armstrong; George Armstrong; Ian Armstrong; Neil Armstrong; Armstrong Bridge

Page 45: Semantic Search

Semantic Analysis
Named Entity Mapping

SEMEX Multimedia Context Model

Text: "Armstrong landed the Eagle on the Moon."

[Figure: context model. A Context Item with its Contextual Description is shown together with Temporal Context, Spatial Context, and Social Context. The Context Dimensions (Class Diversity, Level of Structure, Source Reliability, Source Diversity) are connected to Ambiguity and Accuracy by "influences" arrows; a "determines" arrow leads to Relevance.]

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013

Page 46: Semantic Search

Semantic Analysis
Named Entity Mapping

Text: "Armstrong landed the Eagle on the Moon."

Consider all entities within the same context

Armstrong (448 entities): George Armstrong Custer; Neil Armstrong; The Armstrong Twins; Armstrong, Florida; Armstrong, Ontario; Armstrong Automobile; Joe Armstrong; Armstrong County, Texas; Armstrong Gun; Craig Armstrong; Armstrong (Moon Crater); Louis Armstrong; Armstrong Tunnel; Louis Armstrong International Airport; Armstrong's Theorem; Sir Thomas Armstrong; Ian Armstrong; Armstrong (British Columbia); Karen Armstrong; Curtis Armstrong; Gillian Armstrong; Hilary Armstrong; William L. Armstrong; ...

Eagle (95 entities): Eagle (Bird); Eagle (heraldry); USCGC Eagle; The Eagle (2011 film); Eagle (song); John H. Eagle; Eagle (typeface); Eagle Falls (Washington); Eagle (Moon Crater); Eagle (comic); Eagle (lunar module); Eagle TV; The Eagle (Pub); War Eagle; The Eagle (newspaper); Eagle (racehorse); Angela Eagle; Linda Eagle; James Philipp Eagle; ...

Moon (156 entities): Man on the Moon (film); Moon (song); Moon Son-Ri; C Moon; The Moon (Tarot card); Edgar Moon; Moon OS; Moon (Band); Moon; Moon 44; Man on the Moon (soundtrack); William Moon; Lottie Moon; Mr. Moon (song); Man on the Moon (musical); Darvin Moon; Moon 83; Francis Moon; Gary Moon; Robert Charles Moon; Black Moon; Allan Moon; Ban-Ki Moon; Fly me to the Moon (song); ...

Page 47: Semantic Search

Semantic Analysis
Named Entity Recognition

Entity Selection Process

Select matching entities from all possible candidate entities:
• Popularity based strategies
• Linguistic strategies
• Statistical strategies
• Semantic based strategies

Resources used:
• reference text corpus (Wikipedia)
• link graph (Wikipedia)
• semantic graph (DBpedia)

General Approach
1. Make an assumption
2. Do the strategies support or contradict your assumption?
3. Make a decision according to logical and probabilistic rules/constraints

N. Ludwig, H. Sack: Named Entity Recognition for User-Generated Tags, TIR 2011

Page 48: Semantic Search

Semantic Analysis
Named Entity Recognition

Entity Selection Process
(Semantic) Graph Analysis

Text: "Armstrong landed the Eagle on the Moon."

[Figure: the same candidate sets for Armstrong, Eagle, and Moon as on page 46 (448, 95, and 156 entities). Set apart from the other candidates: Neil Armstrong, Eagle (lunar module), Moon, Louis Armstrong, Fly me to the Moon (song).]

N. Steinmetz, H. Sack: Semantic Multimedia Information Retrieval Based on Contextual Descriptions, 2013
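The idea of selecting entities by graph analysis can be sketched as choosing, for all mentions at once, the combination of candidates that is best connected. The code below is a simplified illustration and not the method of the cited paper: the candidate lists are shortened, the `RELATED` links are hypothetical stand-ins for a Wikipedia or DBpedia graph, and the exhaustive search only works for a handful of candidates.

```python
from itertools import product

# Shortened candidate lists per mention (names taken from the slides).
CANDIDATES = {
    "Armstrong": ["Neil Armstrong", "Louis Armstrong", "Lance Armstrong"],
    "Eagle": ["Eagle (bird)", "Eagle (lunar module)"],
    "Moon": ["Moon", "Fly me to the Moon (song)"],
}

# Hypothetical undirected links between entities in a semantic graph.
RELATED = {
    frozenset(("Neil Armstrong", "Eagle (lunar module)")),
    frozenset(("Neil Armstrong", "Moon")),
    frozenset(("Eagle (lunar module)", "Moon")),
    frozenset(("Louis Armstrong", "Fly me to the Moon (song)")),
}


def coherence(entities):
    """Number of linked pairs within one combination of entities."""
    return sum(frozenset((a, b)) in RELATED
               for i, a in enumerate(entities) for b in entities[i + 1:])


def select_entities(candidates):
    """Pick one candidate per mention so that the combination is best connected."""
    mentions = list(candidates)
    best = max(product(*(candidates[m] for m in mentions)), key=coherence)
    return dict(zip(mentions, best))


print(select_entities(CANDIDATES))
```

With these links the combination Neil Armstrong, Eagle (lunar module), Moon wins because all three pairs are connected, while Louis Armstrong is only linked to the song.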

Page 49: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

The Tower of Babel (Turmbau zu Babel), Pieter Brueghel, 1563

Page 50: Semantic Search

How to use semantic metadata in retrieval?

Page 51: Semantic Search

Semantic Search (One of many Definitions...)

• Annotation of (text-based) metadata with semantic entities
• Entity-based Information Retrieval
• Make use of semantic relations, e.g. content-based similarities of relationships
• Interoperable metadata via semantic annotations
  • for content-based description
  • for structural / technical description (Multimedia Ontologies)

Overall Goal: Quantitative and qualitative improvement of Information Retrieval

Page 52: Semantic Search

Semantic metadata enable improvement of traditional keyword-based retrieval by

(1) Query String Extension/Refinement
    enables more precise or more complete search results
(2) Cross Referencing
    complements search results with additional associated or similar information
(3) Exploratory Search
    enables visualization and navigation of the search space
(4) Reasoning
    complements search results with implicitly given information

Page 53: Semantic Search

Semantic Search
Query String Extension

• Keyword-based search does not deliver all search results that are relevant for a query, because synonyms and metaphors might describe the queried content.
• Extension of the original query string (Query Extension)
  • from dictionaries and thesauri
    • extend query with synonyms, hyponyms (specializations), etc.
  • from domain ontologies
    • extend query with meronyms (part-of), related concepts, etc.

Original query string: Bank
Possible extensions: Bank ∨ depository financial institution ∨ credit union ∨ acquirer ∨ federal reserve ∨ ...
→ increases recall
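The effect of query extension on recall can be shown with a toy OR-search. This is a minimal sketch: the thesaurus entries are the extensions named on the slide, while the function names, the substring matching, and the four documents are hypothetical.

```python
# Extensions for "bank" as listed on the slide.
THESAURUS = {
    "bank": ["depository financial institution", "credit union",
             "acquirer", "federal reserve"],
}


def extend_query(term, thesaurus):
    """Return the original term followed by its thesaurus extensions."""
    return [term] + thesaurus.get(term.lower(), [])


def search_any(terms, docs):
    """OR-search: a document matches if it contains at least one term."""
    terms = [t.lower() for t in terms]
    return [doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in terms)]


# Hypothetical documents.
DOCS = {
    "d1": "The bank raised its interest rates.",
    "d2": "Our credit union offers free accounts.",
    "d3": "The Federal Reserve met on Tuesday.",
    "d4": "Apollo 11 landed on the Moon.",
}
print(search_any(["Bank"], DOCS))                          # literal match only
print(search_any(extend_query("Bank", THESAURUS), DOCS))   # extended query
```

The extended query also finds d2 and d3, which never mention the word "bank".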

Page 54: Semantic Search

Semantic Search
Query String Refinement

• Keyword-based search also delivers search results that are not relevant for a query, because query terms and document terms might be ambiguous.
• Refinement of the original query string (Query Refinement)
  • from dictionaries and thesauri
    • disambiguate polysemic terms with hypernyms (generalizations)
  • from domain ontologies
    • disambiguate polysemic terms with holonyms

Original query string: Bank
Possible refinements:
(1) Bank ∧ financial institution
(2) Bank ∧ incline ∧ slope ∧ side
(3) Bank ∧ container
(4) Bank ∧ deposit ∧ repository
→ increases precision
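Query refinement works in the opposite direction: additional terms are combined with AND, so fewer but more precise results remain. The sketch below uses hypothetical documents and a hypothetical `search_all` function with plain substring matching; the refinement terms are taken from the slide.

```python
def search_all(terms, docs):
    """AND-search: a document matches only if it contains every term."""
    terms = [t.lower() for t in terms]
    return [doc_id for doc_id, text in docs.items()
            if all(t in text.lower() for t in terms)]


# Hypothetical documents using "bank" in three different senses.
DOCS = {
    "d1": "The bank is a financial institution that grants loans.",
    "d2": "We sat on the bank of the river, on the grassy slope.",
    "d3": "A piggy bank is a container for coins.",
}
print(search_all(["Bank"], DOCS))                            # all three senses
print(search_all(["Bank", "financial institution"], DOCS))   # refinement (1)
print(search_all(["Bank", "container"], DOCS))               # refinement (3)
```

Each refinement keeps only the documents of one sense, at the cost of missing documents that use the sense without the added term.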

Page 55: Semantic Search

Semantic Search
Cross Referencing

• Provide search results that do not literally contain the query string but are closely related to the query by content
• Apply domain ontologies for determining related concepts
• Apply statistical analysis of large (text) document corpora

[Figure: the query string "Neil Armstrong" is mapped via NER to dbpedia:Neil_Armstrong. dbpedia:Neil_Armstrong, dbpedia:Buzz_Aldrin, and dbpedia:Michael_Collins are each connected to dbpedia:Apollo_11 by dbprop:mission.]
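Cross referencing over such a graph amounts to finding other subjects that share a property value with the query entity. The following sketch keeps the triples in a plain Python list instead of querying DBpedia; the function name is hypothetical, and the Apollo 13 triple is an extra illustrative fact that is not on the slide.

```python
# (subject, property, object) triples; the first three follow the slide.
TRIPLES = [
    ("dbpedia:Neil_Armstrong", "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Buzz_Aldrin", "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Michael_Collins", "dbprop:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Jim_Lovell", "dbprop:mission", "dbpedia:Apollo_13"),  # illustrative extra
]


def cross_references(entity, prop, triples):
    """Other subjects that share at least one value of `prop` with `entity`."""
    values = {o for s, p, o in triples if s == entity and p == prop}
    return sorted({s for s, p, o in triples
                   if p == prop and o in values and s != entity})


print(cross_references("dbpedia:Neil_Armstrong", "dbprop:mission", TRIPLES))
```

A search for "Neil Armstrong" could then also offer results about his two crew mates, although their names do not occur in the query string.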

Page 56: Semantic Search

Semantic Search
Exploratory Search

• Provide additional search results that do not necessarily contain the query string but are related to the query by content, or are related to the search results achieved by the direct query
• Apply domain ontologies and heuristics to determine the relevance of facts

[Figure: graph. Labels shown: dbpedia:Neil_Armstrong, dbpedia-owl:mission, dbpedia:Apollo_11, dcterms:subject, category:Apollo_program, dbpedia:Apollo_13, rdf:type, yago:Space_accidents_and_incidents, dbpedia:Space_Shuttle_Challenger.]

Page 57: Semantic Search

Semantic Search
Reasoning

• Provide additional search results (and information) that do not necessarily contain the query string but are related to the query by content, whereby the relation may not be a direct one, but can be derived via entailment.
• Apply domain ontologies, reasoning algorithms and heuristics to find new facts and determine the relevance of facts

Page 58: Semantic Search

Semantic Search
Reasoning

Example: query string = Neil Armstrong

[Figure: graph. Labels shown: dbpedia:Neil_Armstrong, dbpedia-owl:mission, dbpedia:Apollo_11, dcterms:subject, category:Missions_to_the_Moon, category:Exploration_of_the_Moon, category:Moon, category:Spaceflight, category:Animals_in_Space, dbpedia:Moon, skos:broader.]

(Hard) questions to solve via reasoning:
• Will there be the Moon or documents about the Moon in the search results?
• How is Neil Armstrong related to the Moon? (is he?)
• Was Neil Armstrong (really) on the Moon?
• ...
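One way to approach the question "How is Neil Armstrong related to the Moon?" is to follow category links upwards until both resources meet. The sketch below is a heuristic illustration only: the exact edges of the slide's graph are not recoverable from the text, so the triples are an assumed arrangement of the labels shown, and treating skos:broader as transitive is itself an assumption (SKOS defines a separate skos:broaderTransitive for that purpose).

```python
# Assumed arrangement of the labels shown in the figure (hypothetical edges).
TRIPLES = {
    ("dbpedia:Neil_Armstrong", "dbpedia-owl:mission", "dbpedia:Apollo_11"),
    ("dbpedia:Apollo_11", "dcterms:subject", "category:Missions_to_the_Moon"),
    ("category:Missions_to_the_Moon", "skos:broader", "category:Exploration_of_the_Moon"),
    ("category:Exploration_of_the_Moon", "skos:broader", "category:Moon"),
    ("dbpedia:Moon", "dcterms:subject", "category:Moon"),
}


def objects(subject, prop):
    """All objects of triples with the given subject and property."""
    return {o for s, p, o in TRIPLES if s == subject and p == prop}


def categories(resource):
    """Direct dcterms:subject categories plus all categories reachable via skos:broader."""
    found, stack = set(), list(objects(resource, "dcterms:subject"))
    while stack:
        cat = stack.pop()
        if cat not in found:
            found.add(cat)
            stack.extend(objects(cat, "skos:broader"))
    return found


def shared_categories(person, other):
    """Categories that the person's missions have in common with another resource."""
    via_missions = set()
    for mission in objects(person, "dbpedia-owl:mission"):
        via_missions |= categories(mission)
    return via_missions & categories(other)


print(shared_categories("dbpedia:Neil_Armstrong", "dbpedia:Moon"))
```

Under these assumptions the derived connection runs via Apollo 11 and two category steps up to category:Moon; whether such a distant link is relevant enough to show is exactly the kind of decision left to heuristics.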

Page 59: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

http://www.gocomics.com/calvinandhobbes/

Page 60: Semantic Search

Searching is not always just searching

Page 61: Semantic Search

I'm looking for the book "Brave New World" by Aldous Huxley in the first German edition...

Brave New World. - Aldous HUXLEY. - The Albatros Continental Library, 47 (Hamburg usw., Albatros Verlag, 1933) 257 S. 8“
II 1, 2506, 34548

Page 62: Semantic Search

I really liked "Brave New World" by Aldous Huxley, but how should I find what to read next...?

Page 63: Semantic Search

Exploratory Search

• What if the user does not know which query string to use?
• What if the user is looking for complex answers?
• What if the user does not know the domain he/she is looking for?
• What if the user wants to know all(!) about a specific topic?

• ...'Browsing' instead of 'Searching'
• ...to find something by chance, i.e. Serendipity
• ...to get an overview
• ...enable content based navigation

Page 64: Semantic Search

Enable Exploratory Search based on Linked Open Data

Gather knowledge about dbpedia:Brave_New_World and decide which interesting fact to follow....

http://dbpedia.org/page/Brave_New_World

Page 65: Semantic Search

[Figure: dbpedia:Brave_New_World is connected to dbpedia:Aldous_Huxley by dbpedia-owl:author; three further dbpedia-owl:author edges are shown.]

Page 66: Semantic Search

[Figure: dbpedia:Brave_New_World is connected to dbpedia:Aldous_Huxley by dbpedia-owl:author. dbpedia:Aldous_Huxley is connected by dbpedia:ontology/influences to dbpedia:H._G._Wells, dbpedia:George_Orwell, and dbpedia:Michel_Houellebecq.]

Page 67: Semantic Search

[Figure: dbpedia-owl:notableWork connects dbpedia:H._G._Wells with dbpedia:The_Time_Machine, dbpedia:George_Orwell with dbpedia:Nineteen_Eighty-Four, and dbpedia:Michel_Houellebecq with dbpedia:Les_Particules_élémentaires.]
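The path followed on the last three slides (book, author, related authors, their notable works) can be written as a chain of property look-ups. This is a sketch over an in-memory triple list, not a DBpedia query: the `follow` and `suggest_reading` functions are hypothetical, and the direction of the influences edges (Huxley as subject) is an assumption, since the figure text does not show arrow directions.

```python
# Triples named on the slides; edge direction of "influences" is assumed.
TRIPLES = [
    ("dbpedia:Brave_New_World", "dbpedia-owl:author", "dbpedia:Aldous_Huxley"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:H._G._Wells"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:George_Orwell"),
    ("dbpedia:Aldous_Huxley", "dbpedia:ontology/influences", "dbpedia:Michel_Houellebecq"),
    ("dbpedia:H._G._Wells", "dbpedia-owl:notableWork", "dbpedia:The_Time_Machine"),
    ("dbpedia:George_Orwell", "dbpedia-owl:notableWork", "dbpedia:Nineteen_Eighty-Four"),
    ("dbpedia:Michel_Houellebecq", "dbpedia-owl:notableWork",
     "dbpedia:Les_Particules_élémentaires"),
]


def follow(starts, prop):
    """Objects reached from any start node via the given property."""
    return [o for start in starts for s, p, o in TRIPLES
            if s == start and p == prop]


def suggest_reading(book):
    """book -> author -> related authors -> their notable works."""
    authors = follow([book], "dbpedia-owl:author")
    related_authors = follow(authors, "dbpedia:ontology/influences")
    return follow(related_authors, "dbpedia-owl:notableWork")


print(suggest_reading("dbpedia:Brave_New_World"))
```

Each hop is a choice the user could also make interactively, which is the point of exploratory search: the system offers the facts, the user decides which one to follow.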

Page 68: Semantic Search

...and now please surprise me..... SERENDIPITY

[Figure: graph. Labels shown: dbpedia-owl:author, dbpedia:Aldous_Huxley, rdf:type, Yago:EnglishExpatriatesInTheUnitedStates, dbpedia:Tim_Berners-Lee, dbpprop:inventor, dbpedia:World_Wide_Web, dbpedia:Patrick_Stewart, dbpedia-owl:starring, dbpedia:Star_Trek:_The_Next_Generation.]

Page 69: Semantic Search

Explorative Search

[Figure: graph. Labels shown: dbpedia:Neil_Armstrong, dbpedia:Buzz_Aldrin, dbpedia:Michael_Collins, dbpedia-owl:mission, dbpedia:Apollo_11, dcterms:subject, category:Apollo_program, dbpedia:Apollo_13, rdf:type, yago:Space_accidents_and_incidents, dbpedia:Space_Shuttle_Challenger.]

Page 70: Semantic Search

Exploratory Search and Serendipity

• Find something that you were not looking for on purpose ...

dbpedia:Buzz_Aldrin

dbpedia:Cookie_Monster

dbpedia:Strictly_Come_Dancing

Page 71: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Multimedia Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

Page 72: Semantic Search

Entity Based Search

• linguistic ambiguities of traditional keyword based search can be avoided
• enables high precision and high recall retrieval
• Query string refinement / extension
  • entity auto-suggestion
  • interpretation of natural language queries

http://www.yovisto.com/labs/autosuggestion/

J. Osterhoff, J. Waitelonis, H. Sack: Widen the Peepholes! Entity-Based Auto-Suggestion as a rich and yet immediate Starting Point for Exploratory Search, IVDW 2012

Page 73: Semantic Search


http://mediaglobe.yovisto.com:8080/mggui-dev2/

search facets

C. Hentschel, H. Sack, et al., Open up cultural heritage in video archives with mediaglobe, I2CS 2012


Page 74: Semantic Search


Page 75: Semantic Search


Exploratory Search with yovisto

J. Waitelonis, H. Sack: Towards exploratory video search using linked data, MTAP Volume 59, Number 2 (2012), 645-672

http://mediaglobe.yovisto.com:8080/


Page 76: Semantic Search

Semantic Search

Contents: ■ Introduction ■ Media Analysis ■ Semantic Analysis ■ Semantic Search ■ Explorative Search ■ Realization

Albrecht Dürer: Melancholia I, 1514

Page 77: Semantic Search

Thank you very much for your attention!

Contact:
Dr. Harald Sack
Hasso-Plattner-Institut für Softwaresystemtechnik
Universität Potsdam
Prof.-Dr.-Helmert-Str. 2-3
D-14482 Potsdam

Homepage: http://www.hpi.uni-potsdam.de/meinel/team/sack.html
Blog: http://yovisto.blogspot.com/
E-Mail: [email protected]
Twitter: lysander07 / biblionomicon / yovisto
Slides can be found at http://slideshare.com/lysander07/

more about Semantic Web Technologies at http://www.openhpi.de/