The ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft). Improving Library Services with Semantic Web Technology - in the realm of Repository Systems. Dr. Timo Borst, Head of IT Development, German National Library for Economics / Leibniz-Information Centre Economics, Kiel/Hamburg, Germany. ICDK 2011, 14th – 16th February, Gurgaon/India

Page 1: Improving library services with semantic web technology in the realm of repositories

The ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft)

Improving Library Services with Semantic Web Technology - in the realm of Repository Systems

Dr. Timo Borst
Head of IT Development
German National Library for Economics / Leibniz-Information Centre Economics
Kiel/Hamburg, Germany

ICDK 2011
14th – 16th February, Gurgaon/India

Page 2: Improving library services with semantic web technology in the realm of repositories

Overview

1. Current situation: Distributed (meta-)data management in library applications

2. Popular approaches towards aggregation and homogeneity of metadata

3. Our approach: Integration and aggregation of authority values with Semantic Web technology
   a) General idea
   b) Use case: Indexing
   c) Use case: Retrieving

4. “Lightweight” integration into existing repository systems and service providers

5. Conclusion

Page 3: Improving library services with semantic web technology in the realm of repositories

Current situation

• The rise of repository systems for academic publishing…

• …has led to a landscape of distributed systems, each of them holding its own metadata…

• …which is harvested and aggregated by service providers

Page 4: Improving library services with semantic web technology in the realm of repositories

Popular approaches towards aggregation and homogeneity of metadata

• Normalization in advance (before harvesting) requires
  • a mandatory metadata scheme to be applied by the local repositories
  • a set of controlled vocabularies (e.g. for publication types)
  • an automatic validation of the harvested metadata

• Normalization afterwards (after harvesting) requires
  • the definition of a minimum set of metadata fields
  • the definition of a basic intermediate metadata scheme for normalizing the heterogeneous metadata records
  • optionally, data cleansing strategies like name disambiguation and automatic indexing on the basis of thesauri

Both approaches are problematic and reveal ambiguities on the aggregation level!
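
As a rough illustration of normalization after harvesting, the following Python sketch maps heterogeneous records onto a minimal intermediate scheme and checks one field against a controlled vocabulary. The field names, the alias table, the publication types and the sample record are assumptions made for this example; they are not taken from any real harvesting profile.

```python
from typing import Dict, List, Tuple

# Hypothetical minimum set of fields in the intermediate scheme.
REQUIRED_FIELDS = ("title", "creator", "type")

# Hypothetical mapping from local field names to intermediate field names.
FIELD_ALIASES = {
    "dc:title": "title",
    "titel": "title",
    "dc:creator": "creator",
    "author": "creator",
    "dc:type": "type",
    "doctype": "type",
}

# Hypothetical controlled vocabulary for publication types.
PUBLICATION_TYPES = {"article", "working paper", "thesis"}


def normalize(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map a harvested record to the intermediate scheme and collect problems."""
    normalized: Dict[str, str] = {}
    for key, value in record.items():
        target = FIELD_ALIASES.get(key.strip().lower())
        if target is not None:
            normalized[target] = " ".join(value.split())  # collapse whitespace
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in normalized]
    pub_type = normalized.get("type")
    if pub_type is not None and pub_type.lower() not in PUBLICATION_TYPES:
        problems.append(f"uncontrolled publication type: {pub_type}")
    return normalized, problems


# Made-up record as it might arrive from a local repository.
harvested = {"Titel": "Financial  crisis and banks", "Author": "Doe, J.", "DocType": "Preprint"}
record, problems = normalize(harvested)
print(record)
print(problems)
```

Even a record that passes such checks still carries its author and subject values as plain strings, so the mapping alone does not resolve ambiguous names or headings.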

Page 5: Improving library services with semantic web technology in the realm of repositories

Current situation

• …sounds easy and straightforward, but implies severe problems, especially with regard to the ambiguity of
  • author names
  • subject headings

Page 6: Improving library services with semantic web technology in the realm of repositories

Current situation

“The major difficulty we have found is with DSpace’s handling of metadata. While we feel that the number of fields in Dublin Core is adequate for most if not all uses (DCMI Usage Board 2006), we are troubled by the lack of authority control when completing its fields. Without some control over uniform titles, authors and subjects, accessing the items in the future will [be] very problematic.”

S. Chabot (http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/)

“Neither the standards nor the software underlying institutional repositories anticipated performing naming authority control on widely disparate metadata from highly unreliable sources.”

D. Salo (http://minds.wisconsin.edu/handle/1793/31735)

Page 7: Improving library services with semantic web technology in the realm of repositories

Our approach: Integration of authority values with Semantic Web technology

• General idea: “Provide a framework for integrating authority data, which is both normative and flexible enough to tolerate local idiosyncrasies on a string level.”

• Approach: Concept modelling based on Semantic Web / SKOS standards
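
A minimal Python sketch of this idea, assuming a SKOS-like model in which every concept has one normative URI, one preferred label per language and any number of alternative labels. The class and function names are hypothetical, and the labels are illustrative; only the concept URI is taken from the web service example in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A SKOS-style concept: one URI, preferred labels, alternative labels."""
    uri: str
    pref_label: Dict[str, str]                           # language -> preferred label
    alt_labels: List[str] = field(default_factory=list)  # synonyms and local variants


def _key(label: str) -> str:
    # Tolerate differences in case and spacing on the string level.
    return " ".join(label.lower().split())


class AuthorityIndex:
    def __init__(self, concepts: List[Concept]) -> None:
        self._by_label: Dict[str, Concept] = {}
        for concept in concepts:
            for label in list(concept.pref_label.values()) + concept.alt_labels:
                self._by_label[_key(label)] = concept

    def resolve(self, local_string: str) -> Optional[str]:
        """Return the normative concept URI for a local label, if known."""
        concept = self._by_label.get(_key(local_string))
        return concept.uri if concept else None


# Illustrative labels; a real thesaurus entry may differ.
crisis = Concept(
    uri="http://zbw.eu/stw/descriptor/19664-4",
    pref_label={"de": "Finanzkrise", "en": "Financial crisis"},
    alt_labels=["Financial crash"],
)
index = AuthorityIndex([crisis])
print(index.resolve("  financial   CRISIS "))
print(index.resolve("unknown term"))
```

The URI is the normative part, while the set of accepted labels can grow locally without touching it.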

Page 8: Improving library services with semantic web technology in the realm of repositories

Our approach: Integration of authority values with Semantic Web technology

Page 9: Improving library services with semantic web technology in the realm of repositories

Our approach: Integration of authority values with Semantic Web technology – Web service

Example queries (for concepts):

http://zbw.eu/beta/stw-ws/suggest?query=finanzkr
…delivers all terms beginning with “finanzkr”

http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php?service=labels&concept=http://zbw.eu/stw/descriptor/19664-4&lang=en
…delivers all English synonyms of the German “Finanzkrise”
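
The two example queries can be assembled programmatically. The sketch below only builds the URLs with Python's standard library and does not call the service, since the slides do not describe the response format and the beta endpoints may no longer be reachable. The function names are hypothetical. Note that `urlencode` percent-encodes the concept URI, whereas the slide shows it unencoded.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URLs as shown on the slide.
SUGGEST_URL = "http://zbw.eu/beta/stw-ws/suggest"
WRAPPER_URL = "http://zbw.eu/beta/stw-ws/stw-ws-wrapper.php"


def suggest_url(prefix: str) -> str:
    """URL asking for all terms beginning with `prefix`."""
    return f"{SUGGEST_URL}?{urlencode({'query': prefix})}"


def labels_url(concept_uri: str, lang: str) -> str:
    """URL asking for the labels of one concept in one language."""
    params = {"service": "labels", "concept": concept_uri, "lang": lang}
    return f"{WRAPPER_URL}?{urlencode(params)}"


print(suggest_url("finanzkr"))
url = labels_url("http://zbw.eu/stw/descriptor/19664-4", "en")
print(url)
# Decoding the query string recovers the original parameters.
print(parse_qs(urlsplit(url).query))
```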

Page 10: Improving library services with semantic web technology in the realm of repositories

Use case: (Self-)Indexing

• One of the most prominent use cases, especially for librarians, but also for scientists and active users not familiar with subject-specific vocabularies

• Main goals:
  • Support the process of indexing in order to achieve a classification of documents which is both coherent and flexible, in the sense that it permits local idiosyncrasies related to authority terms
  • Align different vocabularies, in the sense that indexing in one vocabulary is automatically linked to another vocabulary

• Implementation: Extension of the submission interface of our repository by integrating the terminology web service as an autosuggest function
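
What such an autosuggest function has to do can be sketched locally in Python: match the typed prefix against known labels, return the concept URI along with each label, and follow an alignment to a second vocabulary when the term is stored. The label list, the `example.org` URIs and the alignment table are assumptions for this sketch. Only the first concept URI appears in the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical label index: (label, concept URI).
LABELS: List[Tuple[str, str]] = [
    ("Finanzkrise", "http://zbw.eu/stw/descriptor/19664-4"),
    ("Finanzkredit", "http://example.org/vocab-a/finanzkredit"),
    ("Finanzmarkt", "http://example.org/vocab-a/finanzmarkt"),
]

# Hypothetical alignment between two vocabularies (in SKOS terms, an exact match).
EXACT_MATCH: Dict[str, str] = {
    "http://zbw.eu/stw/descriptor/19664-4": "http://example.org/vocab-b/financial-crisis",
}


def suggest(prefix: str, limit: int = 10) -> List[Tuple[str, str]]:
    """Return (label, URI) pairs whose label starts with `prefix`, ignoring case."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    hits = [(label, uri) for label, uri in LABELS if label.lower().startswith(needle)]
    return sorted(hits)[:limit]


def index_terms(chosen_uri: str) -> List[str]:
    """URIs to store for a submission: the chosen concept plus its aligned concept."""
    aligned = EXACT_MATCH.get(chosen_uri)
    return [chosen_uri] + ([aligned] if aligned else [])


print(suggest("finanzkr"))
print(index_terms("http://zbw.eu/stw/descriptor/19664-4"))
```

Storing the URI rather than the typed string is what keeps the indexing coherent, while the person submitting still sees and picks labels in their own language.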

Page 11: Improving library services with semantic web technology in the realm of repositories

Use case: (Self-)Indexing

Submission form https://econstor.eu

Page 12: Improving library services with semantic web technology in the realm of repositories

Use case: Retrieving

• To be considered as the most important use case

• Often leading into the classical dilemma of precision and recall

• Main goal:
  • Support the process of retrieving, so users can find the relevant set of documents

• Implementation: Automatic expansion of the original query with synonyms, narrower and related terms
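
The expansion step might look like the following Python sketch. The thesaurus excerpt is invented for illustration (real entries would come from the terminology service), and the OR syntax of the resulting query is an assumption about the search backend.

```python
from typing import Dict, List

# Hypothetical thesaurus excerpt.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "financial crisis": {
        "synonyms": ["financial crash"],
        "narrower": ["banking crisis", "currency crisis"],
        "related": ["economic crisis"],
    },
}


def expand_query(query: str, relations=("synonyms", "narrower", "related")) -> List[str]:
    """Return the original term followed by the selected expansion terms, without duplicates."""
    term = " ".join(query.lower().split())
    entry = THESAURUS.get(term, {})
    terms = [term]
    for relation in relations:
        for candidate in entry.get(relation, []):
            if candidate not in terms:
                terms.append(candidate)
    return terms


def to_boolean_query(terms: List[str]) -> str:
    """Join the terms into an OR query of quoted phrases."""
    return " OR ".join(f'"{t}"' for t in terms)


print(to_boolean_query(expand_query("Financial crisis")))
print(to_boolean_query(expand_query("Financial crisis", relations=("synonyms",))))
```

Restricting the expansion to synonyms tends to favour precision, while adding narrower and related terms tends to favour recall, so the choice of relations is one way to steer between the two.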

Page 13: Improving library services with semantic web technology in the realm of repositories

Use case: Retrieving

Expanded search for „financial crisis“ http://econstor.eu

Page 14: Improving library services with semantic web technology in the realm of repositories

Use case: Retrieving

Expanded search for „financial crisis“ http://econstor.eu

Page 15: Improving library services with semantic web technology in the realm of repositories

Use case: Retrieving

Expanded search for „financial crisis“ http://econstor.eu

Page 16: Improving library services with semantic web technology in the realm of repositories

Use case 2: Search

Page 17: Improving library services with semantic web technology in the realm of repositories

Use case 2: Search

Page 18: Improving library services with semantic web technology in the realm of repositories

“Lightweight” integration into existing repository systems and service providers

Page 19: Improving library services with semantic web technology in the realm of repositories

“Lightweight” integration into existing repository systems and service providers

Benefits

• “Lightweight” extension of legacy systems

• Strategy of “least intrusion”: No update or migration needed

• No changes to the core system; only some changes to the data model may be required:
  • Additional column for storing the URI of the authority key
  • Export or harvesting of the authority as a resource must be possible (-> OAI-ORE)

• Other types of library applications suitable for these adaptations:
  • catalogues
  • portals (e.g. to generate publication lists from an identified author or thematic issues)
  • any collaborative system with an annotation system
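
The data model change can be as small as one nullable column. The following sketch uses an in-memory SQLite database. The table and column names are hypothetical and do not reflect the schema of any particular repository system.

```python
import sqlite3

# Hypothetical legacy table; real repository schemas differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata_value (id INTEGER PRIMARY KEY, field TEXT, text_value TEXT)")
conn.execute("INSERT INTO metadata_value (field, text_value) VALUES ('subject', 'Finanzkrise')")

# The only schema change: one nullable column for the URI of the authority key.
conn.execute("ALTER TABLE metadata_value ADD COLUMN authority_uri TEXT")

# Existing rows stay valid; the URI is filled in once a string has been matched.
conn.execute(
    "UPDATE metadata_value SET authority_uri = ? WHERE field = 'subject' AND text_value = ?",
    ("http://zbw.eu/stw/descriptor/19664-4", "Finanzkrise"),
)
conn.commit()

rows = conn.execute("SELECT field, text_value, authority_uri FROM metadata_value").fetchall()
print(rows)
```

Because the original string column is left untouched, existing code that reads it keeps working, which is the point of the “least intrusion” strategy.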

Page 20: Improving library services with semantic web technology in the realm of repositories

Summary and conclusion

• Library applications each create and manage their own idiosyncratic data holdings.

• This complicates the maintenance, exchange, aggregation and homogenization of (meta-)data for advanced services.

• Upstream web services, as part of an overarching authority data infrastructure, can contribute to the homogenization of metadata at an early stage (while still allowing localization).

• If such web services emerge and are used widely, there is a chance of further interlinking locally maintained metadata while at the same time improving the data-based services.

• The option of “lightweight integration” is an offer to operators of library applications to integrate these web services into their applications with as little effort as possible.

Page 21: Improving library services with semantic web technology in the realm of repositories

Dr. Timo Borst
Deutsche Zentralbibliothek für Wirtschaftswissenschaften / Leibniz-Informationszentrum Wirtschaft (ZBW)

Thank you very much!

Page 22: Improving library services with semantic web technology in the realm of repositories

Use case 3: Capturing authors

• The normal case in catalogues; so far the exception in other data entry systems

• User groups: librarians + researchers (?) + library users (?)

• Process: Entry of author names

• Goal: To improve the process of capturing authors with the help of authority data provided by web services

Page 23: Improving library services with semantic web technology in the realm of repositories

Use case 3: Capturing authors

• Entry form at http://87.106.250.18/beta/econstor/

Page 24: Improving library services with semantic web technology in the realm of repositories

Previous approaches to aggregation & homogenization

• Metadata search via aggregators
  • Parallel querying of remote, distributed systems
  • Return and presentation of the search result as a merged hit list

• Harvesting
  • Regular collection of remote, distributed metadata
  • Homogenization ex ante or ex post

• Federated search

• …
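
The first approach, parallel querying with a merged hit list, can be sketched in Python as follows. The two backend functions are hypothetical stand-ins for remote repositories and return made-up hits. A real aggregator would issue network requests and would need a ranking strategy, which this sketch leaves out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Hit = Dict[str, str]


# Hypothetical stand-ins for remote repositories; each returns its own hit list.
def search_repo_a(query: str) -> List[Hit]:
    return [{"id": "a:1", "title": f"{query} in Europe"}]


def search_repo_b(query: str) -> List[Hit]:
    return [
        {"id": "b:7", "title": f"{query} and banks"},
        {"id": "a:1", "title": f"{query} in Europe"},  # same item, also held here
    ]


def federated_search(query: str, backends: List[Callable[[str], List[Hit]]]) -> List[Hit]:
    """Query all backends in parallel and merge the results into one hit list."""
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        result_lists = list(pool.map(lambda backend: backend(query), backends))
    merged: List[Hit] = []
    seen = set()
    for hits in result_lists:          # pool.map preserves the backend order
        for hit in hits:
            if hit["id"] not in seen:  # drop duplicates by identifier
                seen.add(hit["id"])
                merged.append(hit)
    return merged


results = federated_search("Financial crisis", [search_repo_a, search_repo_b])
for hit in results:
    print(hit["id"], hit["title"])
```

Deduplication here relies on shared identifiers. Without them, the merge step would have to compare author and title strings, which is where missing authority control becomes visible.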

Page 25: Improving library services with semantic web technology in the realm of repositories

References

• [1] http://wiki.dspace.org/index.php/Authority_Control_of_Metadata_Values
• [2] http://minds.wisconsin.edu/handle/1793/31735
• [3] http://dsug09.ub.gu.se/index.php/dsug/dsug09/paper/view/22/3
• [4] http://subjectobject.net/2006/11/09/the-dspace-digital-repository-a-project-analysis/
• [5] http://code.google.com/p/dspace-agrisap/wiki/ThesaurusAddOn
• [6] http://edoc.hu-berlin.de/conferences/dc-2008/subirats-imma-199/PDF/subirats.pdf
• [7] http://www.jisc.ac.uk/media/documents/programmes/sharedservices/names-phase-one-final-report,.pdf
• [8] http://idea.library.drexel.edu/bitstream/1860/3173/1/20070051011.pdf
• [9] http://ptsefton.com/blog/2006/06/06/the_affiliation_issue_in_institutional_repository_software/
• [10] http://library.ust.hk/info/nac/nac-technical.html
• [11] http://www.seco.tkk.fi/publications/2009/kurki-hyvonen-onki-people-2009.pdf
• [12] http://journals.sfu.ca/archivar/index.php/archivaria/article/download/11883/12836
• [13] http://www.dini.de/fileadmin/workshops/oa-netzwerk-juni2009/vernetzungstage_2009_malitz.pdf