Interlinking Personal Semantic Data on the Semantic Desktop and the Web of Data

Interlinking Personal Semantic Data on the Desktop and the Web

Laura Drăgan

Outline

// Introduction

Background and motivation

Research questions

// Directions and results

Within the Semantic Desktop

To the Web of Data

A use case to rule them all

// Conclusion

Research answers

Future work

Background

Personal Information Management

Timeline: 1945, 1962, 1965, 1968

Web

Semantic Web

Semantic Desktop

Motivation

Use the framework provided by the Semantic Desktop to build useful applications and services

Research questions

How to build semantic applications and tools for the Semantic Desktop to provide the best experience for the users, while creating reusable semantic data?

How to expand the scope of the Semantic Desktop into the realm of the Web of Data, to benefit the users and enhance their experience?

Q1 sub-questions

semantic applications for the Semantic Desktop

How to create semantic data that is complete, correct, safe, and provides a high degree of interlinking with the already existing network of semantic data on the desktop?

How to reuse existing Semantic Desktop data in an application?

How to design the human-computer interaction in an application for the Semantic Desktop?

How to correctly evaluate a semantic application?

Q2 sub-questions

connect the Semantic Desktop with the Web of Data

How to find Web instances representing the same real-world thing described by a Semantic Desktop resource?

How to use the Web information which is related to a desktop resource?

How to make desktop data available online safely?

Directions

1. Within the Semantic Desktop

2. To the Web of Data

The two directions can be, should be, and are combined.

Within the Semantic Desktop

SemNotes

Challenges described by Q1

create new semantic data
  Data representation
  Data management

reuse existing Semantic Desktop data
  Interlinking

design the human-computer interaction
  Visualisation

correctly evaluate a semantic application
  Task-based comparison to Evernote

Data representation

<...> a pimo:Note ;
    nao:prefLabel "holiday plans" ;
    nao:created "2010-09-16T21:08:54.29Z"^^xsd:dateTime ;
    nao:lastModified "2010-09-17T10:59:01.58Z"^^xsd:dateTime ;
    nao:numericRating "9"^^xsd:int ;
    nao:description "..."^^xsd:string ;
    nao:hasTag <...> ;
    pimo:isRelated <...> , <...> .
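
A minimal sketch of how a note description like this could be produced from application data. The helper names `literal` and `note_to_turtle` and the example URIs are hypothetical; the slides do not show how SemNotes itself writes its data, and prefix declarations are omitted.

```python
from datetime import datetime


def literal(value):
    """Render a Python value as a Turtle literal."""
    if isinstance(value, datetime):
        return f'"{value.isoformat()}"^^xsd:dateTime'
    if isinstance(value, int):
        return f'"{value}"^^xsd:int'
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def note_to_turtle(subject, label, created, rating, tags, related):
    """Serialise one note as a Turtle description (prefixes omitted)."""
    pairs = [
        ("a", "pimo:Note"),
        ("nao:prefLabel", literal(label)),
        ("nao:created", literal(created)),
        ("nao:numericRating", literal(rating)),
    ]
    if tags:
        pairs.append(("nao:hasTag", " , ".join(f"<{t}>" for t in tags)))
    if related:
        pairs.append(("pimo:isRelated", " , ".join(f"<{r}>" for r in related)))
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    return f"<{subject}> {body} ."


turtle = note_to_turtle(
    subject="nepomuk:/res/example-note",   # hypothetical URI
    label="holiday plans",
    created=datetime(2010, 9, 16, 21, 8, 54),
    rating=9,
    tags=["nepomuk:/res/example-tag"],     # hypothetical URI
    related=["nepomuk:/res/Angela"],
)
print(turtle)
```

Tags and related resources are written as links to other desktop resources rather than as strings, which is what gives the note its place in the existing network of desktop data.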

Interlinking

Annotation suggestions:
  Based on the content of the note.
  Certain types preferred.
  Preference based on past use and matched length.

... brian ... -> Brian Davis, Brian Wall

... brian davis ... -> Brian Davis

Interlinking algorithm

Algorithm
  scan text; identify possible entities
  for each possible entity find a list of desktop resource candidates
    compute score for each possible candidate
    filter list by score
    sort by score
  present the candidates to the user
  create the relation only if the user chooses a resource
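
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the SemNotes implementation: the type weights, the usage formula, the threshold, and the names `Resource`, `suggest` and `link_if_chosen` are all hypothetical, and the whole text is treated as a single mention for brevity.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    uri: str
    label: str
    rdf_type: str
    use_count: int = 0


# Assumed type preferences; the slides only say that certain types are preferred.
TYPE_WEIGHT = {"pimo:Person": 1.0, "pimo:Project": 0.8}
DEFAULT_TYPE_WEIGHT = 0.5


def matched_length(text_tokens, label_tokens):
    """Longest run of consecutive text tokens that equals a prefix of the label."""
    best = 0
    for start in range(len(text_tokens)):
        n = 0
        while (n < len(label_tokens) and start + n < len(text_tokens)
               and text_tokens[start + n] == label_tokens[n]):
            n += 1
        best = max(best, n)
    return best


def suggest(text, resources, threshold=0.3):
    """Return desktop resources to offer as annotations, best first."""
    tokens = re.findall(r"\w+", text.lower())
    matches = []
    for res in resources:
        n = matched_length(tokens, re.findall(r"\w+", res.label.lower()))
        if n > 0:
            matches.append((n, res))
    if not matches:
        return []
    longest = max(n for n, _ in matches)
    scored = []
    for n, res in matches:
        if n < longest:          # prefer the longest matched mention
            continue
        usage = (1 + res.use_count) / (2 + res.use_count)  # grows with past use
        score = TYPE_WEIGHT.get(res.rdf_type, DEFAULT_TYPE_WEIGHT) * usage
        if score >= threshold:   # filter by score
            scored.append((score, res))
    scored.sort(key=lambda pair: pair[0], reverse=True)     # sort by score
    return [res for _, res in scored]


def link_if_chosen(note_uri, chosen):
    """Create the relation only when the user picked a suggestion."""
    return (note_uri, "pimo:isRelated", chosen.uri) if chosen else None
```

With two contacts labelled Brian Davis and Brian Wall, a note mentioning only "brian" yields both as suggestions, while "brian davis" yields only Brian Davis because the longer match wins.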

Visualisation - HCI

Visualisation - versions

Evaluation

Task-based experiment

Comparison of SemNotes to Evernote

The effort of interlinking is lower than the effort spent when searching.

Evaluation

Experimental setup

20 participants
  14 use note-taking regularly
  5 use Evernote in their daily activity

Familiar data
  130 contacts
  20 scientific papers
  50 notes

8 tasks
  2 tasks to familiarise the participants with the dataset
  6 tasks focused on note-taking, varying the complexity

Measurements
  Time spent
  Mouse clicks
  Keystrokes

Evaluation

Tasks
  Find notes tagged with todo
  Find to-dos that are related to DERI
  Find a to-do related to a presentation given by John
  Take a note about planning a social event for your group
  Find a note containing minutes from the last meeting about the NICE project. Change the date of the next meeting planned
  Take a note for the action item assigned to you at the last meeting

Evaluation

Quantitative results
  Time spent note-taking: no significant differences
  Time spent searching
    SemNotes significantly faster for complex queries
    no significant difference for simple queries

Questionnaire results
  Faster
  Better

Evaluation

Quantitative results

Columns: Task | Time (Avg, Med, t) | Clicks (Avg, Med, t)

T10.500.1520.16700.692

T2-8-8-2.94-0.333-1-0.48

T3-0.1251 -0.046 0.8571 1.426

T40.0630.0160.4866.06782.026

T514.357131.7134.81221.527

T60.2490.2431.00420.8123.08

But ...

The desktop is no longer the sole repository of personal information
  Social networks
  Mobile devices
  Cloud services

The semantic desktop, however efficient it might become with semantic tools and interconnected data, is no longer the only repository of personal data or, some would say, even the main one.

To the Web of Data

Challenges described by Q2

(Q2.1) find Web aliases of Semantic Desktop resources

Finding Web Aliases

Web alias = Web resource representing the same real-world entity as the desktop resource

Different identifiers

nepomuk:/res/Angela

http://angelaonthe.net/foaf/me

Different vocabularies

The sheer size of the Web of Data

Two-step approach

Candidate Selection
  Query various Web of Data sources
  Identify candidate URIs
  Retrieve data for each of the candidates

Candidate Filtering
  Compute similarity score
  Filter the candidates

Candidate Selection

Determined set of sources
  Specific requirements
  Restricted domain

Semantic search engine
  Generic domain
  Unknown data sources

Candidate Filtering

Filter by type

Compute similarity score

Filter by score

Input: (local, web); output: score

Matching Module

Input: (local, web)
Type matching? No: return 0. Yes: compute score.
Score compared with threshold (Yes / No)
Return score

Matching Parameters

String matching (SM)
  Exact matching versus approximate string matching
  Koeln vs. Köln

Weighted properties (WP)
  Weighted participation of properties in the final score
  Email address more exact than name

Multi-valued properties (MVP)
  All matching values for a property contribute to the score
  e.g. authors' names for a paper

Score Calculation

Driven by the local data

score = (weighted sum of matching props) / (total sum of all weighted props)
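
A sketch of the matching module and score calculation described above, under several assumptions that the slides do not confirm: properties and types of the local and Web resources are already aligned to shared keys, each local value counts once in the denominator, approximate matching uses `difflib.SequenceMatcher` with a cutoff of 0.85, and a score below the threshold is returned as 0. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def values_match(a, b, approximate=True, cutoff=0.85):
    """String matching (SM): exact, or approximate above an assumed cutoff."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return True
    return approximate and SequenceMatcher(None, a, b).ratio() >= cutoff


def similarity(local_props, web_props, weights, approximate=True):
    """Weighted sum of matching local values over the weighted sum of all local values."""
    matched = total = 0.0
    for prop, local_values in local_props.items():
        weight = weights.get(prop, 1.0)          # weighted properties (WP)
        web_values = web_props.get(prop, [])
        for local_value in local_values:         # multi-valued properties (MVP)
            total += weight
            if any(values_match(local_value, w, approximate) for w in web_values):
                matched += weight
    return matched / total if total else 0.0


def match(local, web, weights, threshold=0.5):
    """Type check first, then score, then threshold filter."""
    if local["type"] != web["type"]:
        return 0.0
    score = similarity(local["props"], web["props"], weights)
    return score if score >= threshold else 0.0


weights = {"email": 3.0, "name": 1.0}   # email is more exact than name
local = {"type": "person",
         "props": {"name": ["Brian Davis"], "email": ["brian@example.org"]}}
alias = {"type": "person",
         "props": {"name": ["Brian Davies"], "email": ["brian@example.org"]}}
other = {"type": "person",
         "props": {"name": ["Brian Davis"], "email": ["b.wall@other.net"]}}

print(match(local, alias, weights))
print(match(local, other, weights))
```

Because the score is driven by the local data, properties that exist only on the Web resource neither raise nor lower it; only the desktop resource's own properties enter the denominator.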

Evaluation

Manually constructed gold standard
  Data collection
  Relevance judgements

IR measures
  Effect of parameter settings
  Adjust thresholds

Data collection

Desktop data
  50 people - nco:PersonContact
  50 music albums - nmo:MusicAlbum
  50 publications - nfo:PaginatedTextDocument
  11,917 triples

Web data
  20 candidates for each desktop resource -> 3000 URIs
  1,530,686 triples

Relevance Judgements

3000 pairs x 3 experts

Fleiss' kappa = 0.638 0.214

Average pairwise agreement 92.252%
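
For reference, Fleiss' kappa for a set of judged pairs can be computed with the standard formula, sketched below. The two-category setup (relevant, not relevant) and the toy counts are assumptions for illustration and are not the study's data; the function name is hypothetical.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters.
    Returns (kappa, mean observed agreement).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Agreement among rater pairs, per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance, from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    # Undefined (division by zero) when every rating falls in one category.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return kappa, p_observed


# Toy example: 4 candidate pairs, 3 experts, categories (relevant, not relevant).
toy = [[3, 0], [0, 3], [2, 1], [0, 3]]
kappa, agreement = fleiss_kappa(toy)
print(round(kappa, 3), round(agreement, 3))
```

Kappa corrects the raw agreement for chance, which matters when one category dominates: raw agreement can be high simply because most candidate pairs are judged the same way.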

IR Measures

MAP

NDCG

P@k (k=1,2,3,4,5)

Baseline: exact match

all properties count equally

single value considered for each property
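
The measures listed above have standard definitions, sketched here for a ranked candidate list with binary relevance. Normalising average precision by the number of relevant candidates that appear in the ranking is an assumption; the slides do not say which variant was used.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k candidates that are relevant (P@k)."""
    return sum(relevance[:k]) / k


def average_precision(relevance):
    """Mean of the precision values at each rank holding a relevant candidate."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average precision averaged over all desktop resources."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def ndcg(relevance):
    """Normalised discounted cumulative gain with a log2 rank discount."""
    def dcg(values):
        return sum(v / math.log2(rank + 1) for rank, v in enumerate(values, start=1))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal else 0.0


# Relevance of the ranked candidates for one desktop resource (1 = true alias).
ranking = [1, 0, 1, 0, 0]
print([precision_at_k(ranking, k) for k in (1, 2, 3, 4, 5)])
print(average_precision(ranking), ndcg(ranking))
```

P@k looks only at the top of the list, while MAP and NDCG also reward placing every true alias as high as possible, so the three can react differently to a change in matching parameters.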

Evaluation Results

Approximate string matching
  improves results for albums and people
  does not help for publications

Weights and multiple values
  when combined improve results for publications, but not for the other types

Merging the two directions

1. Within the Semantic Desktop

2. To the Web of Data

A use case

Note -> Blog post

[Semantic] note-taking -> [Semantic] blogging

[Preserve context] [Preserve privacy]

Steps

(Note-taking & annotation)

(Entity matching)

Transformation
  On the local side
  Extension to SemNotes

Publication
  On the server side
  According to Linked Data principles

Levels and layers

Ontology level

Local - Nepomuk ontologies

Remote - SIOC, FOAF, DC, ...

pimo:Note -> sioc:Post
nao:Tag -> sioct:Tag
pimo:Person -> foaf:Person
pimo:Project -> doap:Project
pimo:Event -> ical:Vevent

nao:prefLabel -> rdfs:label
nao:created -> dcterms:created
nao:lastModified -> dcterms:modified
nao:hasTag -> sioc:topic
pimo:isRelated -> sioc:related_to
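
The mapping above can be applied mechanically to a note's triples, as in this sketch. Triples are plain tuples of prefixed names, and dropping any term that has no mapping is an assumption made here to keep unmapped desktop data from being published; the slides do not state how unmapped terms are handled.

```python
CLASS_MAP = {
    "pimo:Note": "sioc:Post",
    "nao:Tag": "sioct:Tag",
    "pimo:Person": "foaf:Person",
    "pimo:Project": "doap:Project",
    "pimo:Event": "ical:Vevent",
}

PROPERTY_MAP = {
    "nao:prefLabel": "rdfs:label",
    "nao:created": "dcterms:created",
    "nao:lastModified": "dcterms:modified",
    "nao:hasTag": "sioc:topic",
    "pimo:isRelated": "sioc:related_to",
}


def translate(triples):
    """Rewrite Nepomuk classes and properties into the Web vocabularies."""
    translated = []
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            if obj in CLASS_MAP:
                translated.append((subject, predicate, CLASS_MAP[obj]))
        elif predicate in PROPERTY_MAP:
            translated.append((subject, PROPERTY_MAP[predicate], obj))
        # Anything else is dropped (assumption, see the note above).
    return translated


local_note = [
    ("note:1", "rdf:type", "pimo:Note"),
    ("note:1", "nao:prefLabel", "holiday plans"),
    ("note:1", "nao:numericRating", 9),        # no mapping listed, so dropped
    ("note:1", "pimo:isRelated", "res:Angela"),
]
for triple in translate(local_note):
    print(triple)
```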

Data level

Local - notes, desktop resources (tags included)

Remote - blog posts, Web resources, tags

http://semnotes.deri.ie/notes/note/id
http://semnotes.deri.ie/notes/resource/id
http://semnotes.deri.ie/notes/tag/label

Application level - local

Plugin for SemNotes
  Ask server for server URLs for the new note and resources
  Replace desktop URIs with the server URLs in the note
  Add RDFa to the note
  Push the transformed note to the server
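
The URI replacement step can be sketched as follows. `mint_url` is a hypothetical local stand-in for the request to the server; it only follows the URL patterns shown at the data level, and the identifiers used in the example are invented for illustration. The RDFa and upload steps are not covered.

```python
from urllib.parse import quote

BASE = "http://semnotes.deri.ie/notes"


def mint_url(kind, key):
    """Stand-in for asking the server: build a URL following the published patterns."""
    if kind not in {"note", "resource", "tag"}:
        raise ValueError(f"unknown kind: {kind}")
    return f"{BASE}/{kind}/{quote(str(key), safe='')}"


def rewrite_uris(triples, url_for):
    """Replace each desktop URI that has a server URL; other terms stay unchanged."""
    return [tuple(url_for.get(term, term) for term in triple) for triple in triples]


url_for = {
    "nepomuk:/res/example-note": mint_url("note", 1),       # hypothetical id
    "nepomuk:/res/Angela": mint_url("resource", 42),        # hypothetical id
    "nepomuk:/res/example-tag": mint_url("tag", "holiday plans"),
}

note = [
    ("nepomuk:/res/example-note", "sioc:related_to", "nepomuk:/res/Angela"),
    ("nepomuk:/res/example-note", "sioc:topic", "nepomuk:/res/example-tag"),
]
for triple in rewrite_uris(note, url_for):
    print(triple)
```

In this sketch a desktop URI without a server URL is left as it is; a plugin that must preserve privacy would more likely drop such triples before pushing the note.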

Application level - remote

Web server with MySQL, PHP, ARC2
  Create new URLs for resources
  Receive and process the note
  Publish the data online

Published data

Research answers

How to build semantic applications and tools for the Semantic Desktop?
  SemNotes
    Create new data
    Reuse existing data
    HCI
    Evaluation

How to expand the scope of the Semantic Desktop into the Web of Data?
  Web aliases
  Semantic blogging use case

Future work

Information Extraction algorithms and methods
  create multiple types of relations based on the text
  extract new entities from text
  extract links between entities mentioned in the notes

Explore visualisations - personal data browser

Large-scale user study of semantic personal information usage and behaviours

Summary

1. Within the Semantic Desktop

2. To the Web of Data

Web aliases

+ semantic publishing use case

Digital Enterprise Research Institute www.deri.ie
