Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling

Johann Schaible, GESIS Leibniz-Institute for the Social Sciences, Cologne, Germany

Thomas Gottron, Institute for Web Science and Technologies, University of Koblenz-Landau, Germany

Ansgar Scherp, Kiel University and Leibniz Information Center for Economics, Kiel, Germany

1) Extended version as technical report: http://bit.ly/lodsurveyreport

2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata

#eswc2014Schaible

Survey on Common Strategies of Vocabulary Reuse in Linked Open Data Modeling @ESWC2014



Motivation…

• How to…
  – …choose which vocabulary to reuse?
  – …find an appropriate mix of vocabularies?

• In order to achieve aspects such as
  – providing a clear data structure
  – making data easier to consume
  – achieving ontological agreement

Leads to different reuse strategies

Based on experience and “gut-feeling”

…and Contribution

Condense and aggregate experts’ knowledge and experience (“gut-feeling”)

1. Which aspects for reusing vocabularies are most important?

2. Which vocabulary reuse strategy to follow in a real-world scenario?

Survey Design

• Perspective of a LOD modeler
• “Suppose, you have to model data as LOD…”

• Ranking Task T1: Reuse vs. Interlink
• Ranking Task T2: Appropriate mix of vocabularies
• Ranking Task T3: Additional meta-information

Further survey elements: aspects for reusing vocabularies; reasons for ranking decision

Ranking Tasks Structure

Assignment:

• Model data from a specific domain as LOD
• Need to reuse vocabularies
• “Which of the provided options do you consider the better vocabulary reuse strategy?”

Ranking Tasks Example

Strategy minV: Reuse a minimum number of vocabularies

Strategy pop: Reuse mainly popular vocabularies

Features for Popularity

• Number of datasets using vocabulary V
• Total occurrence of vocabulary term v_i

Strategy: minV

Strategy: pop
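The two popularity features named on this slide can be illustrated with a minimal Python sketch. The dataset names, the term lists, and both helper functions are hypothetical and chosen only for illustration; the survey itself does not prescribe how these counts are computed.

```python
from collections import Counter

# Hypothetical toy data (an assumption, not survey data):
# dataset name -> vocabulary terms used in that dataset.
DATASETS = {
    "dataset_a": ["foaf:name", "foaf:knows", "dc:title"],
    "dataset_b": ["foaf:name", "dc:title", "dc:creator"],
    "dataset_c": ["mo:MusicArtist", "foaf:name"],
}


def vocabulary_of(term):
    """Return the vocabulary prefix of a prefixed term such as 'foaf:name'."""
    return term.split(":", 1)[0]


def datasets_using_vocabulary(datasets):
    """Feature 1: number of datasets that use each vocabulary V."""
    counts = Counter()
    for terms in datasets.values():
        # A set ensures each vocabulary is counted once per dataset.
        for vocab in {vocabulary_of(t) for t in terms}:
            counts[vocab] += 1
    return counts


def total_term_occurrence(datasets):
    """Feature 2: total occurrence of each vocabulary term v_i."""
    counts = Counter()
    for terms in datasets.values():
        counts.update(terms)
    return counts
```

With this toy input, `datasets_using_vocabulary(DATASETS)` counts `foaf` in three datasets and `dc` in two, while `total_term_occurrence(DATASETS)` counts individual terms such as `foaf:name`.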

Ranking Task T1

Reuse vs. Interlink

• Domain: Movies and actors
• Vocabulary reuse strategies:
  1. pop: Reuse popular vocabularies
  2. link: Define own vocabulary and link it to existing popular vocabulary ()
  3. max: Reuse a maximum number of vocabularies (lower boundary)
• Number of possible models to choose from: 3
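As a rough illustration of the difference between the pop and link strategies, the following Python sketch builds two small triple lists for a movie. Everything here is an assumption made for illustration: the `ex:` namespace, the chosen terms, and in particular the use of `rdfs:subClassOf` and `rdfs:subPropertyOf` as linking predicates, since the slide does not state which linking mechanism was shown to participants.

```python
# Triples are plain (subject, predicate, object) tuples with prefixed names.
# "ex:" is a hypothetical namespace for the modeler's own vocabulary.


def model_by_reuse(movie, title):
    """Strategy 'pop' (sketch): describe the resource with existing terms."""
    return [
        (movie, "rdf:type", "schema:Movie"),
        (movie, "dc:title", title),
    ]


def model_by_interlinking(movie, title):
    """Strategy 'link' (sketch): use own terms, then link them to existing ones.

    The linking predicates below are an assumption for illustration.
    """
    return [
        ("ex:Film", "rdfs:subClassOf", "schema:Movie"),
        ("ex:title", "rdfs:subPropertyOf", "dc:title"),
        (movie, "rdf:type", "ex:Film"),
        (movie, "ex:title", title),
    ]


def terms_describing(resource, triples):
    """Return the classes and properties used directly on `resource`."""
    terms = set()
    for s, p, o in triples:
        if s == resource:
            # For rdf:type the class is the interesting term, else the property.
            terms.add(o if p == "rdf:type" else p)
    return terms
```

In the reuse model the instance is described only with existing terms, whereas in the interlinked model a consumer sees `ex:` terms on the instance and has to follow the schema-level links to reach the popular vocabulary.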

Ranking Task T2

Find appropriate mix of different vocabularies

• Domain: Publications and authors
• Vocabulary reuse strategies:
  1. minV: Reuse a minimum number of vocabularies
  2. max: Reuse a maximum number of vocabularies (lower boundary)
  3. pop: Reuse popular vocabularies
  4. minC: Reuse a minimum number of vocabularies per concept
• Number of possible models to choose from: 4
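One way to read minV and minC is as two different counts over a candidate model. The Python sketch below computes both for a hypothetical model of publications and authors; the model, the term choices, and the function names are assumptions for illustration and are not taken from the survey material.

```python
# Hypothetical candidate model (an assumption, not from the survey):
# concept -> prefixed terms used to describe that concept.
MODEL = {
    "Publication": ["swrc:Publication", "dc:title", "dc:creator"],
    "Author": ["foaf:Person", "foaf:name"],
}


def prefix(term):
    """Return the vocabulary prefix of a prefixed term such as 'dc:title'."""
    return term.split(":", 1)[0]


def vocabularies_in_model(model):
    """Set of vocabularies used anywhere in the model (what minV keeps small)."""
    return {prefix(t) for terms in model.values() for t in terms}


def vocabularies_per_concept(model):
    """Number of vocabularies per concept (what minC keeps small)."""
    return {
        concept: len({prefix(t) for t in terms})
        for concept, terms in model.items()
    }
```

For this toy model, three vocabularies are used overall, while the `Author` concept draws on a single vocabulary and `Publication` on two. A model can therefore score well on one count and less well on the other.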

Ranking Task T3

Vocabulary reuse given additional meta-information

• Domain: Music and musical artists
• Vocabulary reuse strategies:
  1. minD: Reuse only domain-specific vocabularies
  2. minV: Reuse a minimum number of vocabularies
  3. pop: Reuse popular vocabularies
• Number of possible models to choose from: 3

Results of Ranking Tasks

Key insights
• Reusing over interlinking
• Popular vocabularies over minimizing the number of vocabularies
• Additional meta-information has an effect on the choice

Meta-Information Useful?

Key insights
• No definite favorite support
• # of datasets using a vocabulary over total term occurrence
• Information on most common use by others: not valuable

Aspects for vocabulary reuse

[Figure: ratings on a 5-point Likert scale (axis 0 to 5) for the aspects Clear Data Structure, Data easier to be consumed, and Ontological Agreement; series: Before ranking tasks, After first ranking task, After second ranking task]

Participants

• Linked Data experts and practitioners
• Recruited through LOD and Semantic Web mailing lists
• N = 79 (16 female, 63 male) (n.s. difference in answers)
• 67% academia, 23% industry, 10% both
• Research associates (22), postdocs (14), professors (8), engineers and other professions (27)
• Age: M = 34.6, SD = 8.6
• Experience in LOD (in years): M = 4, SD = 2.64
• Expertise in consuming and publishing LOD: M = 3.64, SD = 1 (on a 5-point Likert scale) (n.s. difference in answers of group > 4 and group < 4)

Conclusion

• Which aspects are more important?
  – All aspects are “somewhat important” (Mdn = 4)
  – Aspects are rated higher in theory than in real life
• Which strategy to follow?
  – Preferred choice: reuse popular vocabularies (better than minimizing the number of vocabularies)
  – Popular vs. domain-specific vocabularies: unclear
  – Interlinking does not have good uptake
• Which meta-information is most useful?
  – # of datasets using a vocabulary
  – Most common use has no good uptake

1) Extended version as technical report: http://bit.ly/lodsurveyreport

2) Raw result data and survey in PDF: http://bit.ly/lodsurveydata

#eswc2014Schaible

Questions?

Thank you very much for participating in the survey and helping me with my research