HKUST Library’s experience on bibliographic linked data
Lam, Ki Tat
Publication information: Presentation at Peking University Library, Beijing, China, 19 December 2018
Copyright information: ©HKUST Library and Author
Notice: This version is available at HKUST Institutional Repository (http://repository.ust.hk/ir/) via http://repository.ust.hk/ir/Record/1783.1-94224
If it is the author’s pre-published version, changes introduced as a result of publishing processes such as copy-editing and formatting may not be reflected in this document. For a definitive version of this work, please refer to the published version.
HKUST Library’s experience on bibliographic linked data
K.T. Lam (林纪达), HKUST Library
Last revised: 20 December 2018
Presentation at Peking University Library, Beijing, China
19 December 2018
Agenda
1. Linked data explained
2. Bibliographic Linked Data Learning Platform
3. HKCAN Linked Data Service
4. Triplestore implementation details
Linked Open Data Cloud
Numerous datasets are openly accessible as linked data (http://lod-cloud.net)
Bibliographic Data
Are these bibliographic data visible on the Web? Are they accessible as linked data?
• Become part of the much larger web of data
• Increase visibility
• Enhance discovery
[Figure: library data, including classification schemes, name authority data, the library catalog, subject headings, research output data, digital scholarship data and researcher profiles]
1. Linked Data Explained
1. String: “Alice loves Bob”
2. Thing: “Alice” – a person; “loves” – a concept; “Bob” – a person
3. A web page is basically constructed from strings
4. A MARC record is also constructed from strings: its subfields are strings
245 10 |aAlice loves Bob /|cby Charlie
5. Humans understand strings, but machines do not
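Point 4 can be made concrete with a minimal Python sketch that splits the 245 display line above into tag, indicators and subfields. It assumes the slide's display convention, where “|” is followed by a one-character subfield code. Real MARC is exchanged as ISO 2709 or MARCXML. The parse_display_field function is a hypothetical helper. The output is still nothing but strings, so nothing in it tells a machine that “Alice” is a person.

```python
# Hypothetical helper: split one MARC field written in the slide's display
# convention ("TAG IND |aVALUE|cVALUE") into tag, indicators and subfields.
# This is an assumption for illustration, not an ISO 2709 or MARCXML parser.
def parse_display_field(line: str) -> tuple[str, str, list[tuple[str, str]]]:
    head, _, body = line.partition("|")  # text before the first "|" holds tag + indicators
    parts = head.split()
    tag = parts[0]
    indicators = parts[1] if len(parts) > 1 else ""
    # Each "|"-separated chunk starts with a one-character subfield code.
    subfields = [(chunk[0], chunk[1:]) for chunk in body.split("|") if chunk]
    return tag, indicators, subfields


print(parse_display_field("245 10 |aAlice loves Bob /|cby Charlie"))
# ('245', '10', [('a', 'Alice loves Bob /'), ('c', 'by Charlie')])
```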
Linked data explained
6. Break a string up into things to form a graph
7. Each thing has its own URI:
   Full form                       Prefixed form
   http://my.com/people/alice      mypeople:alice
   http://my.com/people/bob        mypeople:bob
   http://my.com/concept/love      myconcept:love
8. Represent things as triples, in the form of subject, predicate (property) and object:
   Subject           Predicate         Object
   mypeople:alice    myconcept:love    mypeople:bob
[Figure: graph in which Alice is linked to Bob by “loves”]
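9. RDF (Resource Description Framework) is the data model for these triples; an RDF/XML document is one way to describe them, alongside serializations such as Turtle, N-Triples and JSON-LD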
10. Things are interlinked, hence the term Linked Data
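Items 7 to 9 can be sketched in Python. The sketch expands the prefixed names into full URIs and writes the triple in N-Triples, a line-based RDF serialization. The my.com URIs are the slide's illustrative examples. The expand and to_ntriples functions are hypothetical helpers.

```python
# Illustrative prefixes from the slide's example (not real vocabularies).
PREFIXES = {
    "mypeople": "http://my.com/people/",
    "myconcept": "http://my.com/concept/",
}


def expand(prefixed: str) -> str:
    """Hypothetical helper: turn 'mypeople:alice' into its full URI."""
    prefix, local = prefixed.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Hypothetical helper: write one all-URI triple as an N-Triples line."""
    return f"<{expand(subject)}> <{expand(predicate)}> <{expand(obj)}> ."


print(to_ntriples("mypeople:alice", "myconcept:love", "mypeople:bob"))
# <http://my.com/people/alice> <http://my.com/concept/love> <http://my.com/people/bob> .
```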
[Figure: a graph of interlinked things (Alice, Bob, Macau and it.com) connected by “loves”, “born in”, “founded”, “has employee” and “has headquarters”]
Linked data explained (cont.)
11. Triples are stored in a triplestore
12. Use the SPARQL query language to retrieve and manipulate the triples in a triplestore
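At its core, a SPARQL query matches triple patterns, in which terms starting with ? are variables. The following is a toy Python illustration of matching one pattern against an in-memory list of triples. It is not a SPARQL engine, and the ex: triples and the match helper are hypothetical.

```python
# Hypothetical triples with made-up ex: names, for illustration only.
TRIPLES = [
    ("ex:alice", "ex:loves", "ex:bob"),
    ("ex:bob", "ex:bornIn", "ex:macau"),
    ("ex:itcom", "ex:hasEmployee", "ex:alice"),
]


def match(pattern, triples):
    """Hypothetical helper: yield a {variable: value} binding per matching triple."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # same variable would be bound to two different values
            elif term != value:
                break  # a constant term does not match
        else:
            yield binding


# Roughly what  SELECT ?who WHERE { ?who ex:loves ex:bob }  asks for:
print(list(match(("?who", "ex:loves", "ex:bob"), TRIPLES)))
# [{'?who': 'ex:alice'}]
```

A real triplestore joins the bindings of several such patterns and uses indexes instead of scanning every triple.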
[Figure: a SPARQL query and its results]
Linked data explained (cont.)
13. If a linked data dataset is available for open access, it is called Linked Open Data
Examples:
• Wikidata (http://wikidata.org)
• GeoNames (http://www.geonames.org)
• National libraries: BNB, BNE, BNF, DNB
• Authority data: LCNAF, LCSH, MeSH, VIAF
LOD cloud (https://lod-cloud.net)
Linked data explained (cont.)
14. A vocabulary (or, in a more complex sense, an ontology) defines the terms and relationships used in the data model
Examples of vocabularies that are linked data capable:
• FOAF is a vocabulary describing “Person”, see specification at http://xmlns.com/foaf/spec/
• SKOS supports the use of knowledge organization systems (e.g. thesauri, classification schemes, authority), see its vocabulary specification at https://www.w3.org/TR/2009/REC-skos-reference-20090818/#vocab
• Schema.org provides vocabularies for structured data on the Internet, see http://schema.org
• BIBFRAME 2.0 is a data model for bibliographic description, see its vocabulary specification at http://id.loc.gov/ontologies/bibframe.html
• MADSRDF is a vocabulary for authority data, see specification at http://www.loc.gov/standards/mads/rdf/v1.html#Identifier
• RDA is a descriptive cataloging standard. It also has a vocabulary to support linked data applications, available at https://www.rdaregistry.info/
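Each of these vocabularies lives under its own namespace IRI, and prefixes such as bf: or madsrdf: abbreviate that IRI. The following minimal Python sketch shortens full IRIs into prefixed names. The prefix labels are conventional choices rather than part of the vocabularies. The compact function is a hypothetical helper. RDA's many element-set namespaces are left out.

```python
# Namespace IRIs of some vocabularies above; the prefix labels are conventions.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "schema": "http://schema.org/",
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "madsrdf": "http://www.loc.gov/mads/rdf/v1#",
}


def compact(iri: str) -> str:
    """Hypothetical helper: prefix:local for the longest matching namespace, else <iri>."""
    matches = [(ns, prefix) for prefix, ns in NAMESPACES.items() if iri.startswith(ns)]
    if not matches:
        return f"<{iri}>"
    ns, prefix = max(matches, key=lambda m: len(m[0]))
    return prefix + ":" + iri[len(ns):]


print(compact("http://id.loc.gov/ontologies/bibframe/Work"))  # bf:Work
print(compact("http://xmlns.com/foaf/0.1/Person"))            # foaf:Person
```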
Linked data explained (cont.)
Graph showing some of the terms and relationships in the vocabulary of BIBFRAME 2.0
[Figure: BIBFRAME 2.0 graph for hkust:991011643229703412#Work (rdf:type bf:Work, bf:Text), with a bf:Title labelled “Wrinkles in time”@en and a bf:Contribution (bf:role id:vocabulary/relators/ctb) whose bf:agent, a bf:Person/bf:Agent labelled “Smoot, George”@en, is identified by lcnaf:n94027724. Source: http://catalog.ust.hk/bf/991011643229703412]
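One reading of the graph above can be sketched in Python as (subject, predicate, object) tuples. The _: labels stand for the unnamed nodes drawn as <>. Those labels, the follow helper, and the assumption that hkust: abbreviates the catalog's own base URI are all hypothetical. Following predicates from node to node shows what the MARC string lacks: a path from the work to its contributor's LC name authority identifier.

```python
# One reading of the figure; "_:" labels are hypothetical names for the blank nodes.
WORK = "hkust:991011643229703412#Work"
GRAPH = [
    (WORK, "rdf:type", "bf:Work"),
    (WORK, "rdf:type", "bf:Text"),
    (WORK, "bf:title", "_:title"),
    ("_:title", "rdf:type", "bf:Title"),
    ("_:title", "rdfs:label", '"Wrinkles in time"@en'),
    (WORK, "bf:contribution", "_:contrib"),
    ("_:contrib", "rdf:type", "bf:Contribution"),
    ("_:contrib", "bf:role", "id:vocabulary/relators/ctb"),
    ("_:contrib", "bf:agent", "_:agent"),
    ("_:agent", "rdf:type", "bf:Person"),
    ("_:agent", "rdf:type", "bf:Agent"),
    ("_:agent", "rdfs:label", '"Smoot, George"@en'),
    ("_:agent", "bf:identifiedBy", "_:id"),
    ("_:id", "rdf:type", "bf:Identifier"),
    ("_:id", "rdf:value", "lcnaf:n94027724"),
]


def follow(start, *path):
    """Hypothetical helper: walk a chain of predicates and return the nodes reached."""
    nodes = [start]
    for predicate in path:
        nodes = [o for s, p, o in GRAPH for n in nodes if s == n and p == predicate]
    return nodes


# From the Work to the contributor's LC name authority identifier:
print(follow(WORK, "bf:contribution", "bf:agent", "bf:identifiedBy", "rdf:value"))
# ['lcnaf:n94027724']
```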
Linked data explained (cont.)
2. Bibliographic linked data learning platform
Will MARC be replaced by BIBFRAME?
[Figure: library data (classification scheme, name authority data, library catalog, subject headings, research output data, digital scholarship data, researcher profiles) currently held in MARC: is MARC to be replaced by BIBFRAME?]
• “MAchine-Readable Cataloging”: defined in the 1960s, originally meant for printing catalog cards by machine
• String-based, not entity-based. Not compatible with the semantic web, which represents things (entities, concepts, …) as addressable URIs
• No built-in capability for linking things between records, nor within a record
• MARC data is absent from venues beyond library systems and invisible in linked open data
• In need of a replacement: a bibliographic data model with a vocabulary specifically designed to work in a linked data environment
LDR 02567nam a2200493Ia 4500
001 991011895759703412
005 20170811211630.0
008 090623s2009 cc a b 000 0 chi d
020 |a9787802500150
020 |a780250015X
035 |a(OCoLC)406830083
040 |aHNK|cHNK
041 1 |achi|hmul
049 |aHNKA
050 4 |aQB985|b.Y829 2008
245 00 |6880-01/$1|a宇宙简史 :|b无限宇宙中的无穷智慧 = A brief history of universe /|c哥白尼, 爱因斯坦, 霍金等著 ; 呂陈君主编
880 00 |6245-01|aYu zhou jian shi :|bwu xian yu zhou zhong de wu qiong zhi hui = A brief history of universe /|cGebaini, Aiyinsitan, Huojin deng zhu ; Lü Chenjun zhu bian.
246 30 |6880-02/$1|a无限宇宙中的无穷智慧
880 30 |6246-02|aWu xian yu zhou zhong de wu qiong zhi hui.
246 31 |aBrief history of universe.
250 |6880-03/$1|a第1版
880 |6250-03|aDi 1 ban.
260 |6880-04/$1|a北京市 :|b中国言实出版社,|c2009.
880 |6260-04|aBeijing Shi :|bZhongguo yan shi chu ban she,|c2009.
300 |a2, 4, 284 p. :|bill. ;|c25 cm.
440 0 |6880-05/$1|a思想悦读
880 0 |6440-05|aSi xiang yue du.
504 |aIncludes bibliographical references.
546 |aTranslated from various languages.
650 0 |aCosmology.
650 0 |aCosmology|xHistory.
700 1 |aCopernicus, Nicolaus,|d1473-1543.
700 1 |aEinstein, Albert,|d1879-1955.
700 1 |aHawking, Stephen,|d1942-
700 1 |6880-06/$1|a呂陈君
880 1 |6700-06|aLü, Chenjun.
MARC
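Within this record, the 700 field for 呂陈君 and its romanized 880 counterpart are tied together only by matching strings in subfield 6 ("880-06" and "700-06"). The following minimal Python sketch does that pairing on three lines copied from the record. The linked_pairs and subfields functions are hypothetical helpers. The parsing assumes the display convention shown above.

```python
# Three lines copied from the record above.
RECORD = [
    "700 1 |aHawking, Stephen,|d1942-",
    "700 1 |6880-06/$1|a呂陈君",
    "880 1 |6700-06|aLü, Chenjun.",
]


def subfields(line):
    """Hypothetical helper: (code, value) pairs after the first '|'."""
    body = line.partition("|")[2]
    return [(chunk[0], chunk[1:]) for chunk in body.split("|") if chunk]


def linked_pairs(lines):
    """Hypothetical helper: pair a field with its 880 via subfield 6 (tag + occurrence)."""
    by_key = {}
    for line in lines:
        tag = line.split()[0]
        for code, value in subfields(line):
            if code == "6":
                # "880-06/$1" -> ("880", "06"); "700-06" -> ("700", "06")
                linked_tag, occurrence = value.split("/")[0].split("-")
                # Key on the non-880 tag so both sides of the link share one key.
                key = (linked_tag if tag == "880" else tag, occurrence)
                by_key.setdefault(key, []).append(line)
    return [tuple(pair) for pair in by_key.values() if len(pair) == 2]


for field, alternate in linked_pairs(RECORD):
    print(field, "<->", alternate)
```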
• BIBFRAME (Bibliographic Framework) is being developed by the Library of Congress (work began in 2011)
• BIBFRAME 2.0 was released in 2016; it resolved many shortcomings and is more readily adoptable
• It simplifies FRBR’s WEMI (Work-Expression-Manifestation-Item) model into Work, Instance and Item
• Early adopters: who is experimenting with, implementing, extending, exploring or watching BIBFRAME?
• Libraries, consortia, national libraries and museums, ILS vendors, developers, librarians
• Find out more from recent presentations:
  • IFLA WLIC 2018 (http://library.ifla.org/2202/1/141-schreur-en.pdf)
  • European BIBFRAME Workshop 2018 (http://www.casalini.it/EBW2018/)
  • BIBFRAME Update Forum at the ALA Annual Conference 2018 (https://www.loc.gov/bibframe/news/bibframe-update-an2018.html)
From: https://www.loc.gov/bibframe/docs/bibframe2-model.html
Graph showing some of the terms, literals and relationships in BIBFRAME 2.0
[Figure: the BIBFRAME 2.0 graph of hkust:991011643229703412#Work (title “Wrinkles in time”@en; contributor “Smoot, George”@en, identified by lcnaf:n94027724). Source: http://catalog.ust.hk/bf/991011643229703412]
http://catalog.ust.hk/bf/991011643229703412
A page from HKUST Library’s Bibliographic Linked Data Learning Platform, showing terms, literals and relationships in BIBFRAME 2.0
Result set of a SPARQL query showing some triples from the BIBFRAME triplestore at HKUST Library
http://catalog.ust.hk/lod/graph/hkust:991011643229703412
Bibliographic Linked Data Learning Platform
http://catalog.ust.hk/bf
• For library colleagues to learn about bibliographic linked data (from Alma):
  • BIBFRAME 2.0
  • RDA/RDF
  • JSON-LD
• Tools for linked data experiments
  • Use cases on discovery
  • Harvest linked open data from Wikidata and construct a Knowledge Card in our library catalog (on Primo)
• BIBFRAME triplestore, with SPARQL query capability
  • Linked open data (LOD)
  • Endpoint: http://catalog.ust.hk/lod/sparql?query=...
  • As of 17 December 2018, the store has 100,569 named graphs and 25,631,525 triples
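The endpoint takes the query in a query parameter, so a client can build the request URL itself. The following minimal Python sketch assumes the query-via-GET form of the SPARQL 1.1 Protocol. The query text and the sparql_get_url helper are hypothetical examples. The sketch only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

ENDPOINT = "http://catalog.ust.hk/lod/sparql"  # endpoint listed above


def sparql_get_url(query: str) -> str:
    """Hypothetical helper: percent-encode the query as the 'query' URL parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# Hypothetical example query: list a few BIBFRAME Works.
query = """PREFIX bf: <http://id.loc.gov/ontologies/bibframe/>
SELECT ?work WHERE { ?work a bf:Work } LIMIT 10"""

url = sparql_get_url(query)
print(url)
```

To send the request, the URL could be passed to urllib.request.urlopen, optionally with an Accept header such as application/sparql-results+json. Whether this endpoint honours that header is not confirmed here.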
Bibliographic Linked Data Learning Platform (cont.)
http://catalog.ust.hk/bf
• SPARQL Query Form (https://catalog.ust.hk/lod/)
  • Three examples (use cases)
  • Query editor for constructing your own SPARQL queries
  • View results in table and JSON formats
• LOD endpoint: machines can send SPARQL queries to the “catalog”
Display the BIBFRAME linked data of this work in human-readable form and in RDF/XML. Machines can download the data in RDF and N-Triples formats (http://catalog.ust.hk/bf/991011643229703412)
Display RDA/RDF linked data of this work
Display MARCXML of this work, for comparison
Display Knowledge Card of names and subjects found in this work
Display record in HKUST Library Catalog (PowerSearch, on Primo)(http://catalog.ust.hk/bf/991011643229703412)
Display JSON-LD linked data of this work
Bibliographic Linked Data Learning Platform (cont.)
[Figure: Alma BIBFRAME 2.0 data shown on the Bibliographic Linked Data Learning Platform]
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79039943"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n79022889"/>
<rdf:value rdf:resource="http://id.loc.gov/authorities/names/n81020731"/>
Knowledge Card in Library Catalog
http://catalog.ust.hk/bf
https://lbdiscover.ust.hk/bib/991011895759703412
Knowledge Card assists Discovery
Display Wikipedia article
Launch a search in the Library Catalog
De-reference URIs of these identifiers to their linked data pages
Generate another Knowledge Card
• Based on the names/subjects mentioned in the BIBFRAME graphs that contain a name/subject (e.g. lcnaf:n81020731)
• 1 degree of separation
[Diagram: a name or a subject (e.g. lcnaf:n81020731) → a set of graphs containing this name/subject → a set of names/subjects in these graphs; the Knowledge Card containing this name/subject leads to a Knowledge Card containing related names/subjects]
http://catalog.ust.hk/bf/bf4hkust.php?recnum=991011895759703412&format=wikidata
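The “1 degree of separation” step can be sketched in Python. The sketch assumes each BIBFRAME graph has already been reduced to the set of name/subject URIs it mentions. Apart from lcnaf:n81020731, the graph and name identifiers are hypothetical placeholders. The related function is a hypothetical helper, not the platform's actual code.

```python
# Hypothetical index: graph ID -> names/subjects mentioned in that graph.
GRAPH_NAMES = {
    "ex:graph1": {"lcnaf:n81020731", "ex:nameA", "ex:subjectX"},
    "ex:graph2": {"lcnaf:n81020731", "ex:nameB"},
    "ex:graph3": {"ex:nameB", "ex:subjectY"},
}


def related(uri: str) -> set[str]:
    """Hypothetical helper: names/subjects sharing at least one graph with uri."""
    found: set[str] = set()
    for names in GRAPH_NAMES.values():
        if uri in names:  # this graph contains the starting name/subject
            found |= names
    found.discard(uri)  # the starting point is not "related" to itself
    return found


print(sorted(related("lcnaf:n81020731")))
# ['ex:nameA', 'ex:nameB', 'ex:subjectX']
```

Calling related again on one of the results gives the next Knowledge Card, one hop further out. For example, ex:nameB leads to ex:subjectY.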
Linked data in Alma/Primo
Alma
• [Done] View BIBFRAME in Alma Metadata Editor
• [Done] Provide RESTful APIs to publish bibliographic records in BIBFRAME, RDA/RDF and JSON-LD
• [Done] Enrich published linked data (i.e. BIBFRAME, RDA/RDF and JSON-LD) with authority links (e.g. to LCNAMES and LCSH, soon to HKCAN too)
• [Planning] Import metadata to Alma in BIBFRAME
• [Planning] Edit metadata in BIBFRAME (proof of concept only)
Primo
• [Testing] Make the library catalog visible on the Web by publishing metadata in a sitemap using schema.org
Dashboard: Linked Data Development in Ex Libris Products, maintained by the Working Body of the IGELU-ELUNA Linked Open Data Working Group (https://docs.google.com/document/d/1L5O-P_MbllYk8KLbCmh0l2mNM9pM7v4UrotVijmzlLE)
Mailing list of the IGELU-ELUNA Linked Open Data Working Group. Subscribe: https://exlibrisusers.org/listinfo/lod
3. HKCAN Linked Data Service
JULAC HKCAN – HKUST Library’s participation
• HKUST Library joined HKCAN in 2016
• Deeply involved in migrating the dataset to Alma
• Merged HKUST CJK authority records into HKCAN
• publish-merge-import: developed programs to synchronize content with LCNAF
• Automated data cleanup; compiled lists for manual cleanup
• Hosts the HKCAN server at HKUST
• HKCAN will soon be available in the Alma Community Zone
• Led the discussion with Ex Libris on enhancing Alma to support multilingual authority control
• Developed the HKCAN Linked Open Data Service
  • Everyone can access the dataset via this service (http://hkcan.julac.org)
http://hkcan.julac.org
Snapshot | Derived from LC | Original cataloging | Total
Pre-migration (30 Sept 2016) | 113,266 | 177,160 | 290,426
Day 1 on Alma NZ (18 Jan 2018) | 133,923 | 176,700 | 310,623
Currently (4 Nov 2018) | 133,085 | 177,618 | 310,703
HKCAN Infrastructure
[Diagram: database maintenance by catalogers via Alma in the Alma Network Zone (HKCAN MASTER); a merging program that harvests LCNAF from id.loc.gov and imports into HKCAN; multilingual authority control; publishing to the Alma Community Zone (HKCAN); HKCAN OAI publishing, harvested and transformed into the HKCAN Triplestore; the HKCAN Linked Data Service providing HKCAN URI de-referencing and SPARQL querying to linked data consumers]
Goals
• In Alma bibliographic linked data (e.g. BIBFRAME), URIs of HKCAN names can be properly de-referenced to an HKCAN URI landing page
• The HKCAN dataset becomes an authoritative source of Chinese names in the form of Linked Open Data
  • Serve RDF
  • Support SPARQL
HKCAN Linked Data Service
[Figure: linked data consumers de-reference HKCAN URIs and send SPARQL queries to the HKCAN Linked Data Service. Example from Alma BIBFRAME 2.0 linked data: an agent (rdf:type bf:Person, bf:Agent; rdfs:label "林語堂, 1895-1976"@zh-Hani) whose bf:identifiedBy points to a bf:Identifier with rdf:value hkcan:9811105907603406]
http://hkcan.julac.org
HKCAN MARC records in Alma
• Contain multiple 1XX fields (and no 7XX)
• Subfield 9 differentiates the language/script used in the field
• HKCAN uses three codes in subfield 9, following ISO 15924:
  • hani for Han (Hanzi, Kanji, Hanja)
  • jpan for Japanese (alias for Han + Hiragana + Katakana)
  • kore for Korean (alias for Hangul + Han)
Modelling HKCAN’s multiple preferred names
HKCAN linked data model
• Supports both MADSRDF and SKOS/RDF
• In MADSRDF, preferred names have the property madsrdf:authoritativeLabel
• In SKOS/RDF, preferred names have the property skos:prefLabel
• The following properties are used to link in the HKCAN-assigned preferred names:
  • MADSRDF: madsrdf:hasExactExternalAuthority
  • SKOS: skos:exactMatch
100 1 |aLin, Yutang,|d1895-1976
100 1 |a林語堂,|d1895-1976|9hani
100 1 |a林语堂,|d1895-1976|9hani
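Here is a minimal Python sketch of reading these headings. The heading without subfield 9 is taken as the English (LC-form) label, and each heading with $9 is tagged from its script code. The zh-Hani tag for hani matches the HKCAN graphs for this name. The following are assumptions: ja-Jpan and ko-Kore for jpan and kore, the en default, the und fallback, and the labelled_headings helper itself.

```python
# hani -> zh-Hani follows the HKCAN graphs; jpan/kore tags are assumptions.
SCRIPT_TAGS = {"hani": "zh-Hani", "jpan": "ja-Jpan", "kore": "ko-Kore"}

HEADINGS = [
    "100 1 |aLin, Yutang,|d1895-1976",
    "100 1 |a林語堂,|d1895-1976|9hani",
    "100 1 |a林语堂,|d1895-1976|9hani",
]


def labelled_headings(lines):
    """Hypothetical helper: (label, language tag) for each 1XX heading line."""
    result = []
    for line in lines:
        body = line.partition("|")[2]
        subs = [(chunk[0], chunk[1:]) for chunk in body.split("|") if chunk]
        script = next((v for c, v in subs if c == "9"), None)
        # Join the name parts ($a, $d, ...) with spaces, leaving out $9 itself.
        label = " ".join(v for c, v in subs if c != "9")
        # Assumed: no $9 means the English heading; unknown codes get "und".
        tag = "en" if script is None else SCRIPT_TAGS.get(script, "und")
        result.append((label, tag))
    return result


for label, tag in labelled_headings(HEADINGS):
    print(f'"{label}"@{tag}')
```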
Graph showing madsrdf:hasExactExternalAuthority and madsrdf:authoritativeLabel
[Figure: MADSRDF graph. hkcan:9811105907603406 (<http://hkcan.julac.org/authorities/names/9811105907603406>) is a madsrdf:PersonalName with madsrdf:authoritativeLabel "Lin, Yutang, 1895-1976"@en; two madsrdf:hasExactExternalAuthority links lead to madsrdf:PersonalName nodes with authoritative labels "林语堂, 1895-1976"@zh-Hani and "林語堂, 1895-1976"@zh-Hani]
Data model of HKCAN’s multiple preferred names – MADSRDF
Graph showing skos:exactMatch and skos:prefLabel
[Figure: SKOS graph. hkcan:9811105907603406 (<http://hkcan.julac.org/authorities/names/9811105907603406>) is a skos:Concept with skos:prefLabel "Lin, Yutang, 1895-1976"@en; two skos:exactMatch links lead to skos:Concept nodes with preferred labels "林語堂, 1895-1976"@zh-Hani and "林语堂, 1895-1976"@zh-Hani]
Data model of HKCAN’s multiple preferred names – SKOS
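The SKOS model above can be written out as N-Triples. In the following minimal Python sketch, the blank-node labels _:b1, _:b2 and the skos_ntriples and literal helpers are hypothetical. HKCAN's actual output, produced with LC's MADSRDF+SKOS transformation, may differ in detail.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def literal(text, lang):
    """Language-tagged N-Triples literal; escapes backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def skos_ntriples(uri, english_label, script_labels):
    """Hypothetical helper: the HKCAN concept plus one exactMatch node per script label."""
    subject = f"<{uri}>"
    lines = [
        f"{subject} {RDF_TYPE} <{SKOS}Concept> .",
        f"{subject} <{SKOS}prefLabel> {literal(english_label, 'en')} .",
    ]
    for i, (label, lang) in enumerate(script_labels, start=1):
        node = f"_:b{i}"  # hypothetical blank-node label
        lines += [
            f"{subject} <{SKOS}exactMatch> {node} .",
            f"{node} {RDF_TYPE} <{SKOS}Concept> .",
            f"{node} <{SKOS}prefLabel> {literal(label, lang)} .",
        ]
    return "\n".join(lines)


out = skos_ntriples(
    "http://hkcan.julac.org/authorities/names/9811105907603406",
    "Lin, Yutang, 1895-1976",
    [("林語堂, 1895-1976", "zh-Hani"), ("林语堂, 1895-1976", "zh-Hani")],
)
print(out)
```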
• Bibliographic linked data (e.g. Alma BIBFRAME) that contains HKCAN URIs can be de-referenced to this page
• Linked data consumers can retrieve HKCAN graph via its Linked Open Data endpoint
• Traditional users can download HKCAN records in MARCXML, MADSRDF and SKOS formats
http://hkcan.julac.org/authorities/names/9811105907603406
HKCAN URI landing page
• Build the HKCAN Triplestore so that linked data consumers can launch SPARQL queries against the HKCAN dataset
• As of 17 December 2018, it has 310,776 named graphs with a total of 25,274,936 triples
• Processes to create the store:
  • Extract HKCAN MARCXML records via the HKCAN OAI-PMH publishing endpoint on Alma
  • Transform the MARCXML records to RDF (using LC’s MADSRDF+SKOS transformation tool)
  • Load the RDF into the store
• SPARQL Query Form (http://hkcan.julac.org/lod)
  • Use cases provided
  • Query editor for constructing your own SPARQL queries
  • View results in table and JSON formats
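The extract step can be sketched in Python as paging through OAI-PMH ListRecords responses. Under OAI-PMH 2.0, each follow-up request carries only the verb and the resumptionToken, and an empty token marks the last page. BASE_URL, the marc21 metadataPrefix and the sample response are hypothetical placeholders, not the settings of the HKCAN endpoint on Alma.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://example.org/oai"  # hypothetical, not the real Alma endpoint


def first_request():
    # "marc21" as the metadataPrefix is an assumption.
    return BASE_URL + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "marc21"})


def read_page(xml_text):
    """Return (record identifiers, URL of the next page or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI)]
    token = root.find(".//oai:resumptionToken", OAI)
    if token is None or not (token.text or "").strip():
        return ids, None  # no token, or an empty one: this was the last page
    next_url = BASE_URL + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token.text.strip()})
    return ids, next_url


# Hypothetical sample response (metadata omitted for brevity).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(read_page(SAMPLE))
```

The transform step (LC's MADSRDF+SKOS tool) and the load step are not shown.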
HKCAN Triplestore
• Linked open data endpoint: http://hkcan.julac.org/lod/sparql?query=...
• Challenges:
  • High demand on server computing resources, for both speed and storage capacity
  • To-do: implement a text index to enhance searching of literals
SPARQL Query Editor
Result list of the above SPARQL query showing some triples from the HKCAN Triplestore
HKCAN Triplestore (cont.)
http://hkcan.julac.org/lod/graph/hkcan:9811105907603406
4. Triplestore implementation details
Apache Jena + Fuseki
• Jena
  • Open source Java framework for building semantic web and linked data applications
• TDB2
  • A component of Jena for RDF storage and query
  • Supports the Jena APIs
• Fuseki
  • HTTP interface to RDF data
  • Supports SPARQL querying and updating
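Fuseki's HTTP interface accepts a SPARQL 1.1 Update sent as a POST with Content-Type application/sparql-update. The following is a minimal Python sketch. The dataset name hkcan, the /update service path and the use of localhost are assumptions about a local setup; Fuseki listens on port 3030 by default. The update_request function is a hypothetical helper, and the request is only built, not sent.

```python
from urllib.request import Request

# Hypothetical local dataset "hkcan"; Fuseki's default port is 3030.
UPDATE_URL = "http://localhost:3030/hkcan/update"


def update_request(sparql_update: str) -> Request:
    """Hypothetical helper: build (not send) a SPARQL 1.1 Update POST request."""
    return Request(
        UPDATE_URL,
        data=sparql_update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


req = update_request('INSERT DATA { <http://example.org/s> <http://example.org/p> "o" }')
# urllib stores header names capitalized, hence "Content-type" here.
print(req.get_method(), req.full_url, req.get_header("Content-type"))
# To send it against a running server: urllib.request.urlopen(req)
```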
Triplestore implementation details
• SPARQL Query Form
  • Based on open source JavaScript from yasgui.org
  • YASQE – SPARQL Query Editor
  • YASR – SPARQL Result Set
• Triplestore server
  • Physical PC: Intel i5 CPU, 64GB RAM, 2 × 4TB solid-state disks, Linux CentOS 7
  • Hosts both the HKCAN dataset and an experimental BIBFRAME dataset of the Library Catalog (less than 10% of records loaded)
Questions?
Thank you!