RDF GRAPH VISUALIZATION BY INTERPRETING LINKED DATA AS KNOWLEDGE

Rathachai CHAWUTHAI & Prof. Hideaki TAKEDA
National Institute of Informatics, and SOKENDAI

RDF4U

JIST2015, Yichang, China, 11-13 Nov 2015

RDF4U: RDF Graph Visualization by Interpreting Linked Data as Knowledge



AGENDA

• Motivation

• Methods
  • Graph Simplification
  • Triple Ranking
  • Property Selection

• Outcome

• Future Plan

MOTIVATION

THE ROLE OF SEMANTIC WEB IN KNOWLEDGE MANAGEMENT

[Figure: three-tier architecture with a Data tier, a Service tier (SPARQL, JENA, etc.), and an Application/Presentation/Visualisation tier.]

At the Visualisation Tier,
• RDF data are transformed into charts, geographic maps, etc., and then served to users.

It's cool, but
• Users are far from the RDF data, so they neither understand the power of the Semantic Web nor see how to contribute RDF data.

For this reason,
• It would be good if users could read RDF data directly through a node-link diagram or a concept-map diagram.

READING FROM A QUERY GRAPH


Querying the 2-hop neighbourhood (or more hops) of a given URI gives wider information on the topic.

[Figure: example query graph around "Caffe Mocha". Node labels include Espresso, Chocolate, Sugar, Milk, Coffee, cocoa, cow, sugarcane, caffeine (430 mg/L), sweet, bitter, white, and black; link labels include type, taste, color, contains, made from, produces, a shot of, topped by, and has layer of.]
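The idea of reading a k-hop neighbourhood can be sketched in a few lines of Python. This is a minimal illustration over an in-memory list of triples, not the query RDF4U actually sends to a SPARQL endpoint. The function name `k_hop_neighbourhood` and the sample triples are assumptions, loosely based on the labels visible in the figure.

```python
from collections import deque


def k_hop_neighbourhood(triples, start, hops=2):
    """Return the triples reachable within `hops` links of `start`.

    Links are followed in both directions, as a reader would do
    on a node-link diagram.
    """
    # Index every triple under both of its end nodes.
    by_node = {}
    for s, p, o in triples:
        by_node.setdefault(s, []).append((s, p, o))
        by_node.setdefault(o, []).append((s, p, o))

    seen_nodes = {start}
    result = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested radius
        for s, p, o in by_node.get(node, []):
            result.add((s, p, o))
            neighbour = o if s == node else s
            if neighbour not in seen_nodes:
                seen_nodes.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return result


# Illustrative triples only (hypothetical, loosely based on the figure).
TRIPLES = [
    ("Caffe Mocha", "type", "Coffee"),
    ("Caffe Mocha", "contains", "Espresso"),
    ("Caffe Mocha", "contains", "Chocolate"),
    ("Espresso", "taste", "bitter"),
    ("Chocolate", "contains", "Sugar"),
    ("Sugar", "made from", "sugarcane"),  # three hops away from the topic
]

one_hop = k_hop_neighbourhood(TRIPLES, "Caffe Mocha", hops=1)
two_hop = k_hop_neighbourhood(TRIPLES, "Caffe Mocha", hops=2)
```

With these sample triples the 1-hop result holds only the three triples that touch "Caffe Mocha" directly, while the 2-hop result also brings in what is known about Espresso and Chocolate. The graph grows quickly with each hop, which is the readability problem the next slides describe.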

PROBLEMS

1) A query graph is too complicated to read.

[Figure: query graphs for http://lod.ac/species/Bubo and http://dbpedia.org/resource/Tokyo]

PROBLEMS

2) RDF data lack a reading flow.

All triples are equal, so background content and the main point are not structured in an RDF graph.

[Figure: an unstructured RDF graph ≠ a graph organised around a Topic]

GOAL

We prefer:

✦ A simply readable graph
✦ A graph with a good reading flow

[Figure: graphs organised around a Topic, with Common Information distinguished from Topic-Specific Information.]

DEMO

Full URLs:

https://www.youtube.com/watch?v=z3roA9-Cp8g (bit.ly/youtube_rdf4u)

http://my.tv.sohu.com/us/271745761/81854223.shtml (bit.ly/sohu_rdf4u)

METHODS

OVERALL

[Figure: RDF4U turns an Original Query Graph into a Human-Readable Graph through three components, each controlled by the User:
• Property Selection: display/hide properties
• Graph Simplification: select simplification rules
• Triple Ranking: choose a proper rank]

GRAPH SIMPLIFICATION

• Some well-prepared RDF repositories apply reasoning over ontologies in order to support a SPARQL service.

• One impact is that the inferred triples create giant components in a graph.

• A closer look at the data indicates that the following situations are commonly found in complex RDF graphs:
  • equivalent or same-as instances (owl:sameAs),
  • transitive properties (e.g. skos:broaderTransitive), and
  • hierarchical classification (rdf:type & rdfs:subClassOf).

• Thus, this method aims to remove some redundant triples by using the mechanism of Semantic Web rules.

[Figure: two simplified patterns: x rdf:type C1 with C1 rdfs:subClassOf C2, and a chain x P y, y P z.]

GRAPH SIMPLIFICATION

1. To merge same-as nodes
   Given: s1 p1 o1 . s2 p2 o2 . s1 owl:sameAs s2 . and fD(s1) > fD(s2)
   Result: s1 p1 o1 . s1 p2 o2 .

2. To remove transitive links
   Given: x P y . y P z . x P z . and P rdf:type owl:TransitiveProperty .
   Result: x P y . y P z .

3. To remove inferred type hierarchies
   Given: x rdf:type C1 . x rdf:type C2 . C1 rdfs:subClassOf C2 .
   Result: x rdf:type C1 . C1 rdfs:subClassOf C2 .
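The three simplification rules can be sketched as plain set operations over triples. This is a minimal illustration under stated assumptions, not the RDF4U implementation. The function names, the `dataset_freq` mapping (standing in for fD), and the prefixed-name strings are hypothetical. The sketch also assumes the property and class hierarchies contain no cycles.

```python
SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def merge_same_as(triples, dataset_freq):
    """Rule 1: fold each owl:sameAs pair into the node with the higher fD."""
    def rank(node):
        # Ties on frequency are broken by name so the choice is deterministic.
        return (dataset_freq.get(node, 0), node)

    replace = {}
    for s, p, o in triples:
        if p == SAME_AS and s != o:
            keep, drop = (s, o) if rank(s) > rank(o) else (o, s)
            replace[drop] = keep

    def resolve(node):
        while node in replace:  # follow chains such as a -> b -> c
            node = replace[node]
        return node

    return {(resolve(s), p, resolve(o)) for s, p, o in triples if p != SAME_AS}


def remove_transitive_links(triples, transitive_props):
    """Rule 2: drop x P z when x P y and y P z already connect x to z."""
    triples = set(triples)
    redundant = set()
    for prop in transitive_props:
        succ = {}
        for s, p, o in triples:
            if p == prop:
                succ.setdefault(s, set()).add(o)
        for x, outs in succ.items():
            for y in outs:
                if y == x:
                    continue
                for z in succ.get(y, ()):
                    if z in outs and z not in (x, y):
                        redundant.add((x, prop, z))
    return triples - redundant


def remove_inferred_types(triples):
    """Rule 3: drop x rdf:type C2 when x rdf:type C1 and C1 subClassOf C2."""
    triples = set(triples)
    supers, types = {}, {}
    for s, p, o in triples:
        if p == SUBCLASS_OF and s != o:
            supers.setdefault(s, set()).add(o)
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    redundant = {
        (x, RDF_TYPE, c2)
        for x, classes in types.items()
        for c1 in classes
        for c2 in supers.get(c1, ())
        if c2 in classes
    }
    return triples - redundant


# The symbols below mirror the ones used on the slide.
graph = {
    ("s1", "p1", "o1"), ("s2", "p2", "o2"), ("s1", SAME_AS, "s2"),
    ("x", "P", "y"), ("y", "P", "z"), ("x", "P", "z"),
    ("x", RDF_TYPE, "C1"), ("x", RDF_TYPE, "C2"), ("C1", SUBCLASS_OF, "C2"),
}
fd = {"s1": 10, "s2": 3}  # hypothetical dataset frequencies
simplified = remove_inferred_types(
    remove_transitive_links(merge_same_as(graph, fd), {"P"})
)
```

Each rule is a separate function so that a user can switch rules on and off independently, in the spirit of "select simplification rules". The order in which the rules are chained here is an arbitrary choice for the example.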

GRAPH SIMPLIFICATION

Example Result

[Figure: the Original Query Graph for http://lod.ac/species/Bubo and the Simplified Graph produced by Graph Simplification. Node labels include Bubo, eagle owls, Strigidae, Strigiformes, owls, Aves, birds, Neognathae, Coelurosauria, Genus, Family, Order, Superorder, Class, Taxon Name, Scientific Name, and Common Name; link labels are hasParentTaxon, hasTaxonRank, hasSynonym, and type.]

TRIPLE RANKING

Since users have different background knowledge of a specific topic, beginners may be interested in reading common information before getting to topic-specific information, while experts may prefer to read only topic-specific information.

• Concept Level (resources or properties)

  • General Concepts are terms that are commonly known, such as “name”, “address”, and “class”; they are found throughout a corpus.

  • Key Concepts are important terms that are frequent in the query result but rare in the whole dataset.

• Information Level (triples)

  • Common Information explains background knowledge that helps readers understand the main content. (Many general concepts.)

  • Topic-Specific Information contains specific terms that are highly relevant to the article. (Many key concepts.)

TRIPLE RANKING

1. Get an RDF graph.

2. Identify the general concepts and the key concepts.

3. Common Information: most of the nodes and links are general concepts.

4. Topic-Specific Information: most of the nodes and links are key concepts.

[Figure legend: nodes and links are marked as either General Concepts or Key Concepts.]

TRIPLE RANKING

Concept Level: weight of a URI

$$w(uri) = \frac{f_Q(uri)}{\log(f_D(uri) + 1)}$$

Here fQ(uri) is the number of occurrences of a URI in the query result, and log(fD(uri) + 1) is a logarithmic scale of the number of occurrences of the URI in the whole dataset.

high: key concept; low: general concept

Information Level: visualization-weight of a triple

$$vw(\langle s,p,o \rangle) = \frac{\alpha \cdot w(s) + \beta \cdot w(p) + \gamma \cdot w(o)}{\alpha + \beta + \gamma}$$

The coefficients are 1.0 by default (so the denominator is 3), but they can be adjusted for a specific purpose.

high: topic-specific; low: common info
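The two weights can be computed as in the following sketch. It is a minimal illustration, not the RDF4U code. The function names are hypothetical, a base-10 logarithm is assumed, and a URI whose dataset count is unavailable (shown as "timeout" in the deck) is assumed to get weight 0. The sample counts are the fQ and fD values that the deck reports for the query topic dbpedia:Hydrogen.

```python
import math


def uri_weight(f_q, f_d):
    """w(uri) = fQ / log(fD + 1), with a base-10 logarithm (an assumption)."""
    if f_d is None or f_d <= 0:
        # Assumption: no usable dataset count (e.g. the count query timed out).
        return 0.0
    return f_q / math.log10(f_d + 1)


def triple_weight(w_s, w_p, w_o, alpha=1.0, beta=1.0, gamma=1.0):
    """vw(<s,p,o>): weighted mean of the subject, predicate and object weights."""
    return (alpha * w_s + beta * w_p + gamma * w_o) / (alpha + beta + gamma)


# (fQ, fD) pairs taken from the deck's dbpedia:Hydrogen example.
COUNTS = {
    "dbpedia:Hydrogen": (53, 1386),
    "dct:subject": (12, 30232709),
    "dbpedia:Category:Chemical_elements": (14, 10880),
    "rdf:type": (38, None),  # dataset count reported as a timeout
}

w = {uri: uri_weight(f_q, f_d) for uri, (f_q, f_d) in COUNTS.items()}

vw = triple_weight(
    w["dbpedia:Hydrogen"],
    w["dct:subject"],
    w["dbpedia:Category:Chemical_elements"],
)
print(round(vw, 2))  # 7.31
```

A frequent query term with a small dataset count, such as dbpedia:Hydrogen, gets a high weight and counts as a key concept. A term such as dct:subject, which is common across the whole dataset, gets a low one. Raising alpha, beta, or gamma shifts the ranking towards the subject, predicate, or object respectively.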

TRIPLE RANKING

Concept Level. Query Topic: dbpedia:Hydrogen

fQ: count in a query graph; fD: count in a whole dataset

Resources

URI | fQ | fD | fQ / log(fD)
http://dbpedia.org/resource/Hydrogen | 53 | 1,386 | 16.87
http://dbpedia.org/resource/Category:Chemical_elements | 14 | 10,880 | 3.47
http://dbpedia.org/resource/Hydrogen_economy | 13 | 6,489 | 3.41
http://dbpedia.org/resource/Category:Diatomic_nonmetals | 12 | 103 | 5.96
http://dbpedia.org/resource/Category:Airship_technology | 8 | 166 | 3.60
http://www.w3.org/2004/02/skos/core#Concept | 8 | 9,707,808 | 1.14
http://www.w3.org/2002/07/owl#Thing | 2 | 9,761,514 | 0.29
http://www.hydrogen.energy.gov/ | 1 | 1 | 0.00

Properties

URI | fQ | fD | fQ / log(fD)
http://www.w3.org/2002/07/owl#sameAs | 72 | timeout | 0.00
http://www.w3.org/1999/02/22-rdf-syntax-ns#type | 38 | timeout | 0.00
http://www.w3.org/2000/01/rdf-schema#subClassOf | 24 | timeout | 0.00
http://www.w3.org/2002/07/owl#equivalentClass | 22 | timeout | 0.00
http://purl.org/dc/terms/subject | 12 | 30,232,709 | 1.60
http://www.w3.org/2004/02/skos/core#broader | 12 | 2,485,421 | 1.88
http://xmlns.com/foaf/0.1/isPrimaryTopicOf | 3 | 34,557,438 | 0.40
http://purl.org/dc/elements/1.1/rights | 2 | 3,102,660 | 0.31

Additional annotations on the slide: (raw: 1,291,986) and (raw: 15,195,702).

TRIPLE RANKING

Information Level. For example: http://dbpedia.org/resource/Hydrogen

Subject | Predicate | Object | vw
dp:Hydrogen | rdf:type | owl:Thing | 5.62
dp:Hydrogen | rdf:type | skos:Concept | 6.01
dp:Hydrogen | dct:subject | dp:Chemical_elements | 7.31
dp:Hydrogen | dct:subject | dp:Airship_technology | 7.35
dp:Hydrogen | rdf:type | dp:Diatomic_nonmetals | 7.48

The rows run from Common (top) to Topic-Specific (bottom).

TRIPLE RANKING

In case of sub-property (also sub-class)

ltk:higherTaxon rdfs:subPropertyOf skos:broader
ltk:mergedInto rdfs:subPropertyOf skos:broader

Raw Data:
a ltk:higherTaxon x
a ltk:mergedInto y

Inferred Data:
a skos:broader x
a skos:broader y

Raw Data are more specific than Inferred Data.
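One way to read the sub-property case is as a test for whether a triple is already entailed by a more specific one through rdfs:subPropertyOf. The sketch below separates such inferred triples from the raw ones. It is an illustration under assumptions: the function names are hypothetical, and the deck does not say whether RDF4U removes the inferred triples or only ranks them lower.

```python
def super_properties(prop, sub_property_of):
    """All properties reachable from `prop` via rdfs:subPropertyOf."""
    found, stack = set(), [prop]
    while stack:
        for parent in sub_property_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def split_raw_and_inferred(triples, sub_property_of):
    """Separate triples that a more specific triple already entails."""
    triples = set(triples)
    inferred = set()
    for s, p, o in triples:
        for parent in super_properties(p, sub_property_of):
            if parent != p and (s, parent, o) in triples:
                inferred.add((s, parent, o))
    return triples - inferred, inferred


# Property hierarchy and triples as drawn on the slide.
SUB_PROPERTY_OF = {
    "ltk:higherTaxon": {"skos:broader"},
    "ltk:mergedInto": {"skos:broader"},
}
graph = [
    ("a", "ltk:higherTaxon", "x"), ("a", "skos:broader", "x"),
    ("a", "ltk:mergedInto", "y"), ("a", "skos:broader", "y"),
]

raw, inferred = split_raw_and_inferred(graph, SUB_PROPERTY_OF)
```

Here `raw` keeps the two ltk: triples and `inferred` collects the two skos:broader triples, matching the Raw Data and Inferred Data columns of the slide. The same walk over rdfs:subClassOf would cover the sub-class case.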

OUTCOME

PROTOTYPE

http://rc.lodac.nii.ac.jp/rdf4u/ (bit.ly/rdf4u)

Thanks to
Client: D3.js, Bootstrap, jQuery
Server: SimpleRDF, SPARQL for PHP

Features

• To simplify a graph by removing some inferred triples.

• To give ranking scores to triples based on common and topic-specific information.

• To filter a graph by selecting preferred properties.

• To control an interactive graph diagram.
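Property selection, the third feature, amounts to filtering triples by predicate. The sketch below shows the idea. The function name `select_properties` and its `hidden` and `shown` parameters are hypothetical, and the sample triples are illustrative, using property names that appear in the Bubo example graph.

```python
def select_properties(triples, hidden=(), shown=None):
    """Keep triples whose predicate is not hidden.

    If `shown` is given, only predicates listed there are kept.
    """
    hidden = set(hidden)
    shown = None if shown is None else set(shown)
    return [
        (s, p, o)
        for s, p, o in triples
        if p not in hidden and (shown is None or p in shown)
    ]


# Illustrative triples only (hypothetical values).
graph = [
    ("Bubo", "hasParentTaxon", "Strigidae"),
    ("Bubo", "hasTaxonRank", "Genus"),
    ("Bubo", "type", "Scientific Name"),
    ("Strigidae", "hasParentTaxon", "Strigiformes"),
]

taxonomy_only = select_properties(graph, shown={"hasParentTaxon"})
without_types = select_properties(graph, hidden={"type"})
```

Keeping `hidden` and `shown` as two separate parameters mirrors a "display/hide properties" control: a user can either switch off a few noisy properties or focus on a short list of preferred ones.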

DISCUSSION

Usefulness

• A diagram is sparser and easier for humans to read.

• Beginners can read common information first.

• Experts can read topic-specific information.

Uniqueness

Some graph visualisation works (Motif, Gephi, RDF Gravity, Fenfire, and IsaViz)

• do not use the power of the Semantic Web to sparsify a graph, and

• do not address providing different data for different user levels.

Novelty

• TF-IDF is adapted for ordering triples from the common to the topic-specific level of information.

• The degree of commonness versus specificity is calculated by evaluating the nature of the dataset with the algorithm.

Prospect

• The triple ranking can be extended by applying various algorithms in order to satisfy diverse characteristics of the data in other domains, such as Biodiversity Informatics.

• Mashup tools should consider this idea.

FUTURE PLAN

• To do a critical evaluation
  • Survey
  • Number of edges cut

• To find the precise border between common information and topic-specific information

• To find a better way to count the number of URIs (it always times out)

• To remove noisy triples

• To improve the triple ranking algorithm for other domains

RDF4U

[Figure: the overview from the OVERALL slide again: an Original Query Graph becomes a Human-Readable Graph through Property Selection, Graph Simplification, and Triple Ranking.]

http://rc.lodac.nii.ac.jp/rdf4u

Thank you very much
