Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling @ESWC2014


Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling

Johann Schaible
GESIS Leibniz-Institute for the Social Sciences, Cologne, Germany
johann.schaible@gesis.org

Thomas Gottron
Institute for Web Science and Technologies, University of Koblenz-Landau, Germany
gottron@uni-koblenz.de

Ansgar Scherp
Kiel University and Leibniz Information Center for Economics, Kiel, Germany
mail@ansgarscherp.net

1) Extended version as technical report: http://bit.ly/lodsurveyreport
2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata
#eswc2014Schaible

Motivation…

• How to…
  – …choose which vocabulary to reuse?
  – …find an appropriate mix of vocabularies?

• In order to achieve aspects such as
  – providing a clear data structure
  – making data easier to consume
  – achieving ontological agreement

Leads to different reuse strategies

Based on experience and “gut-feeling”

…and Contribution

Condense and aggregate experts’ knowledge and experience (“gut-feeling”)

1. Which aspects for reusing vocabularies are most important

2. Which vocabulary reuse strategy to follow in a real-world scenario

Survey Design

• Perspective of a LOD modeler
• “Suppose you have to model data as LOD…”

Ranking Task T1: Reuse vs. Interlink

Ranking Task T2: Appropriate mix of vocabularies

Ranking Task T3: Additional meta-information

Aspects for reusing vocabularies

Reasons for ranking decision

Ranking Tasks Structure

Assignment:

• Model data from a specific domain as LOD

• Need to reuse vocabularies

• “Which of the provided options do you consider the better vocabulary reuse strategy?”

Ranking Tasks Example

Strategy minV: Reuse a minimum amount of vocabularies

Strategy pop: Reuse mainly popular vocabularies

Features for popularity:

• Number of datasets using vocabulary V

• Total occurrence of vocabulary term vi

Strategy: minV

Strategy: pop
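The two popularity features named on this slide can be illustrated with a minimal Python sketch. It assumes usage data is available as (dataset, term) pairs with prefixed term names; the record format, the sample records, and the function names are hypothetical and are not taken from the survey.

```python
from collections import Counter, defaultdict

# Hypothetical usage records: (dataset, prefixed term) pairs.
# Sample values are made up for illustration only.
usage = [
    ("dataset_a", "foaf:name"),
    ("dataset_a", "foaf:name"),
    ("dataset_b", "foaf:Person"),
    ("dataset_b", "dcterms:title"),
    ("dataset_c", "dcterms:title"),
]


def vocabulary_of(term):
    """Return the vocabulary prefix of a prefixed term such as 'foaf:name'."""
    return term.split(":", 1)[0]


def datasets_per_vocabulary(records):
    """Feature 1: number of distinct datasets using each vocabulary V."""
    seen = defaultdict(set)
    for dataset, term in records:
        seen[vocabulary_of(term)].add(dataset)
    return {vocab: len(datasets) for vocab, datasets in seen.items()}


def term_occurrences(records):
    """Feature 2: total occurrence of each vocabulary term vi."""
    return Counter(term for _dataset, term in records)


print(datasets_per_vocabulary(usage))
print(term_occurrences(usage))
```

The first feature counts a vocabulary once per dataset, while the second counts every occurrence of a term, so a term used heavily inside a single dataset scores high on the second feature only.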

Ranking Task T1

Reuse vs. Interlink

• Domain: Movies and actors

• Vocabulary reuse strategies:

1. pop: Reuse popular vocabularies

2. link: Define own vocabulary and link it to existing popular vocabulary ()

3. max: Reuse a maximum amount of vocabularies (lower boundary)

• Number of possible models to choose from: 3

Ranking Task T2

Find appropriate mix of different vocabularies

• Domain: Publications and authors

• Vocabulary reuse strategies:

1. minV: Reuse a minimum amount of vocabularies

2. max: Reuse a maximum amount of vocabularies (lower boundary)

3. pop: Reuse popular vocabularies

4. minC: Reuse a minimum amount of vocabularies per concept

• Number of possible models to choose from: 4
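To make the difference between the minV/max and minC strategies concrete, here is a small Python sketch that counts vocabularies in a candidate model, both overall and per concept. The model representation (a mapping from concept to the prefixed terms describing it) and the example model are assumptions for illustration; they are not one of the four models shown to participants.

```python
# Hypothetical candidate model: concept -> prefixed terms used to describe it.
model = {
    "Publication": ["dcterms:title", "dcterms:creator", "swrc:pages"],
    "Person": ["foaf:name", "foaf:mbox"],
}


def vocabularies(terms):
    """Set of vocabulary prefixes appearing in a list of prefixed terms."""
    return {term.split(":", 1)[0] for term in terms}


def total_vocabularies(model):
    """Number of distinct vocabularies in the whole model (relevant to minV and max)."""
    used = set()
    for terms in model.values():
        used |= vocabularies(terms)
    return len(used)


def vocabularies_per_concept(model):
    """Number of distinct vocabularies used for each concept (relevant to minC)."""
    return {concept: len(vocabularies(terms)) for concept, terms in model.items()}


print(total_vocabularies(model))
print(vocabularies_per_concept(model))
```

Under these assumptions, a minV-style model would keep the overall count low, while a minC-style model would keep each per-concept count low even if the overall count grows.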

Ranking Task T3

Vocabulary reuse given additional meta-information

• Domain: Music and musical artists

• Vocabulary reuse strategies:

1. minD: Reuse only domain specific vocabularies

2. minV: Reuse a minimum amount of vocabularies

3. pop: Reuse popular vocabularies

• Number of possible models to choose from: 3

Results of Ranking Tasks

Key insights

• Reusing over interlinking

• Popular vocabularies over minimizing number of vocabularies

• Additional meta-information has an effect on choice

Meta-Information Useful?

Key insights

• No definite favorite support

• # of datasets using a vocabulary over total term occurrence

• Information on most common use by others: not valuable

Aspects for vocabulary reuse

[Chart: ratings on a 5-point Likert scale for the aspects Clear Data Structure, Data easier to be consumed, and Ontological Agreement; series: before ranking tasks, after first ranking task, after second ranking task]

Participants

• Linked Data experts and practitioners

• Acquired through LOD and Semantic Web mailing lists

• N = 79 (16 female, 63 male) (n.s. difference in answers)

• 67% academia, 23% industry, 10% both

• Research associates (22), postdocs (14), professors (8), engineers and other professions (27)

• Age: M = 34.6, SD = 8.6

• Experience in LOD (in years): M = 4, SD = 2.64

• Expertise in consuming and publishing LOD: M = 3.64, SD = 1 (on a 5-point Likert scale) (n.s. difference in answers of group > 4 and group < 4)

Conclusion

• Which aspects are most important?

– All aspects are “somewhat important” (Mdn = 4)

– Aspects are rated higher in theory than in real life

• Which strategy to follow?

– Preferred choice: reuse popular vocabularies

Better than minimizing number of vocabularies

– Popular vs. domain-specific vocabularies: unclear

– Interlinking does not have good uptake

• Which meta-information is most useful?

– # of datasets using a vocabulary

– Most common use does not have good uptake


Questions?

Thank you very much for participating in the survey and helping me with my research
