

Using a Controlled Vocabulary for Managing a Digital Library Platform

Sean Boisen (sean@logos.com)
Logos Bible Software

SemTech 2010
Slides: http://semanticbible.com/other/talks/2010/semtech/lcv.html

Outline

• Introduce the Logos digital library
• Logos Controlled Vocabulary (LCV)
  – What it is
  – How do we use it
  – What’s interesting about it
• Next steps

Who Am I?

• 19 years with BBN Technologies
  – Information extraction, human language technology
  – Scientist, technology manager
• 3+ years with Logos Bible Software
  – Senior Information Architect
  – Manager of Design & Editorial Dept.
  – Academic Products Manager

The Importance of the Bible

• The most widely distributed book
  – ~83M per year worldwide
• The most widely translated work
  – > 2000 languages
  – 50 languages at www.biblegateway.com
• Spans 1000s of years of ancient history

Logos Bible Software

• High-end desktop digital library
  – > 10k titles
  – > 100k users in 180 countries
  – Extensive cross-indexing and hyperlinking
  – Resources in a dozen languages
  – Windows/Mac/iPhone/mobile
• Leading publisher and developer of digital resources for Bible study
• http://logos.com

Network Effects

• Rich markup and original content
• Information integration

Added Value Strategy

• Domain-specific focus
• Task-oriented guides that automate research
• Integrated tools and content
• Unique digital assets that integrate information and provide answers

Controlled Vocabularies

• Organized system for labeling content
  – Using English terms
• Consistent representation of content
• More effective search

Logos Controlled Vocabulary (LCV)

• Domain-specific (Biblical studies)
• Semantic organization of reference book content, not just terms
• Mitigates problems of ambiguity, homographs, synonyms, spelling variation

LCV Value Proposition

• Recognizes key terms in the knowledge domain
• Provides alternate search terms and query expansion
• Supports user-created content and reading lists
• Integrates reference content
• Provides semantic “glue” for the library


Example: Ambiguity


Example: Homographs


Example: Variation


Scope

TimBL's rules for Linked Data:

• Use URIs to identify things (= Identity)
  – Use HTTP URIs so people can look things up
• Provide useful information in a standard format when someone references a URI (= Utility)
• Include links to other URIs (= Relationships)

LCV as Linked Data: Prisca

Id: Prisca_Person
Label: “Prisca”
Type: Person
Name: True
PrefLabel: “Prisca”
Extra-biblical: False
AltLabel: “Priscilla”
Entities: agent:Prisca.1
Articles: Anchor.PRISCAPERSON, Tyndale.L4559, …
Topics: http://topics.logos.com/Prisca
Wikipedia: Priscilla and Aquila

Callouts: Identity, Utility, Relationships

LCV as Linked Data: Deceit

Id: deceit
Label: “Deceit”
Type:
Name: False
PrefLabel: “Deceit”
Extra-biblical: False
AltLabel: “Deception”, “Deceitful”, “Deceive”
Articles: ISBE.DECEIT, NBD.R494, …
Topics: http://topics.logos.com/deceit

Callouts: Identity, Utility, Relationships
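The two records above can be read as plain data. The following is a minimal sketch, not the LCV's actual storage format: the `Concept` class, its field names, and `build_label_index` are hypothetical, and only the values are copied from the slides. It shows how preferred and alternate labels might resolve to one concept id, which is the synonym handling the vocabulary is meant to provide.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One vocabulary entry; field names are illustrative, values come from the slides."""
    id: str
    pref_label: str
    term_type: str = ""          # left empty when the slide shows no type
    is_name: bool = False
    extra_biblical: bool = False
    alt_labels: list[str] = field(default_factory=list)
    articles: list[str] = field(default_factory=list)
    topic_url: str = ""


def build_label_index(concepts):
    """Map each case-folded label (preferred or alternate) to a set of concept ids."""
    index = {}
    for concept in concepts:
        for label in [concept.pref_label, *concept.alt_labels]:
            index.setdefault(label.casefold(), set()).add(concept.id)
    return index


prisca = Concept(
    id="Prisca_Person", pref_label="Prisca", term_type="Person", is_name=True,
    alt_labels=["Priscilla"],
    articles=["Anchor.PRISCAPERSON", "Tyndale.L4559"],
    topic_url="http://topics.logos.com/Prisca",
)
deceit = Concept(
    id="deceit", pref_label="Deceit",
    alt_labels=["Deception", "Deceitful", "Deceive"],
    articles=["ISBE.DECEIT", "NBD.R494"],
    topic_url="http://topics.logos.com/deceit",
)

index = build_label_index([prisca, deceit])
print(index["priscilla"])  # the alternate label resolves to the Prisca concept
```

Each label maps to a set of ids rather than a single id, so a homograph (one spelling, several concepts) would simply appear as a set with more than one member.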

Example Semantics

lcvinst:Aaron_Person rdf:type skos:Concept ;
    skos:prefLabel "Aaron"@en ;
    lcv:isname "true"^^xsd:boolean ;
    lcv:termType lcv:Person ;
    skos:related lcvinst:aaronsRod ;
    lcv:bkentity bk:Aaron .

res:anch.AARONPERSON rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person .
res:TYNBIBDCT.L1 rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person .
res:isbe.AARON rdf:type foaf:Document ;
    dct:subject lcvinst:Aaron_Person .
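To illustrate what the dct:subject statements buy, here is a small sketch that treats some of the statements above as plain (subject, predicate, object) tuples and collects the documents about one concept. It deliberately avoids any RDF library and keeps the prefixed names as opaque strings; `articles_about` is a hypothetical helper, not part of any Logos API.

```python
# Triples copied from the slide, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("lcvinst:Aaron_Person", "rdf:type", "skos:Concept"),
    ("lcvinst:Aaron_Person", "skos:prefLabel", "Aaron"),
    ("lcvinst:Aaron_Person", "skos:related", "lcvinst:aaronsRod"),
    ("res:anch.AARONPERSON", "rdf:type", "foaf:Document"),
    ("res:anch.AARONPERSON", "dct:subject", "lcvinst:Aaron_Person"),
    ("res:TYNBIBDCT.L1", "rdf:type", "foaf:Document"),
    ("res:TYNBIBDCT.L1", "dct:subject", "lcvinst:Aaron_Person"),
    ("res:isbe.AARON", "rdf:type", "foaf:Document"),
    ("res:isbe.AARON", "dct:subject", "lcvinst:Aaron_Person"),
]


def articles_about(concept, triples):
    """Return documents whose dct:subject is the given concept, in input order."""
    documents = {s for s, p, o in triples
                 if p == "rdf:type" and o == "foaf:Document"}
    return [s for s, p, o in triples
            if p == "dct:subject" and o == concept and s in documents]


print(articles_about("lcvinst:Aaron_Person", TRIPLES))
```

Because three dictionaries point at the same concept, a single lookup gathers parallel articles that were originally filed under different headwords.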

Semantic Inter-relationships

[Diagram. Node labels: Person, Concept, Thing, Place, Text. Other labels: Concrete, Conceptual.]

LCV Development

• Developed by merging content from 7 Bible dictionaries
  – Extract headwords
  – Do automatic alignment (conservative)
  – Review manually
• Reduced > 40k concepts down to ~10k
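The slide does not say how the automatic alignment works. As one possible reading of "conservative", the sketch below merges headwords only when they match exactly after light normalization and leaves everything else for manual review. The normalization rules, function names, and toy input are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(headword):
    """Case-fold, strip accents, and drop punctuation and spaces."""
    decomposed = unicodedata.normalize("NFKD", headword)
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())


def align_headwords(dictionaries):
    """Group headwords from several dictionaries by normalized form.

    Conservative: only exact matches after normalization are merged.
    Returns (merged, singletons); singletons are left for manual review.
    """
    groups = defaultdict(list)
    for source, headwords in dictionaries.items():
        for headword in headwords:
            groups[normalize(headword)].append((source, headword))
    merged = {k: v for k, v in groups.items() if len({s for s, _ in v}) > 1}
    singletons = {k: v for k, v in groups.items() if k not in merged}
    return merged, singletons


# Toy input; the dictionary names and headword lists are invented.
dictionaries = {
    "dict_a": ["Aaron", "Aaron's Rod", "Prisca"],
    "dict_b": ["AARON", "Aarons Rod", "Priscilla"],
}
merged, singletons = align_headwords(dictionaries)
print(sorted(merged))      # headwords found in more than one dictionary
print(sorted(singletons))  # left for a human reviewer
```

Note that string matching alone cannot tell that Prisca and Priscilla are the same person, nor that two identically spelled headwords may be different people, which is why the manual review step remains necessary.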

LCV Development Continues

• Additional resources suggest new concepts:
  – Archaeol. Dict. of the Holy Land: 90/547 (16%)
    • Mostly very specific locations (%EinSamiya_Place)
  – Nelson's Illus. Bible Dictionary: 200/4833 (4%)
  – Harper's Bible Dictionary: 81/2962 (3%)
• Adding alternate terms
• Subject areas for further expansion:
  – Individuals from church history
  – Specialized theological concepts

Use Case: Improved Topic Search

• Link to the same concept regardless of how originally labeled
• Provide consistent semantics for content
• Suggest alternate concepts for the same term
• Provide query expansions for full text search
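Query expansion from a controlled vocabulary can be sketched in a few lines. This is an illustration under assumptions, not the Logos search implementation: `expand_query` and the two dictionaries are hypothetical structures, and the labels are the ones shown on the Deceit and Prisca slides.

```python
def expand_query(term, label_index, concepts):
    """Return every label of every concept that the term could name.

    label_index: case-folded label -> set of concept ids
    concepts:    concept id -> list of labels, preferred label first
    """
    expansions = []
    for concept_id in sorted(label_index.get(term.casefold(), ())):
        for label in concepts[concept_id]:
            if label not in expansions:
                expansions.append(label)
    return expansions or [term]  # unknown terms pass through unchanged


# Labels taken from the Deceit and Prisca slides.
concepts = {
    "deceit": ["Deceit", "Deception", "Deceitful", "Deceive"],
    "Prisca_Person": ["Prisca", "Priscilla"],
}
label_index = {}
for concept_id, labels in concepts.items():
    for label in labels:
        label_index.setdefault(label.casefold(), set()).add(concept_id)

print(expand_query("deception", label_index, concepts))
print(expand_query("Priscilla", label_index, concepts))
```

A full-text engine could OR the returned labels together, so a search for "deception" also finds passages that only say "deceit" or "deceive".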

Use Case: Information Discovery

• Automatically link
  – Reference to concepts
  – Concept to related concepts
  – Concept to references

Text Mining: Reference to Concepts

• Aggregate reference counts
  – Each article votes on most likely references
  – Each concept votes on the most likely concepts for a reference
• Reverse index from reference to concepts
• Estimates should improve with more content
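One way to read the voting scheme above: each article about a concept casts one vote for every Bible reference it cites, and the per-concept tallies are then inverted into a reference-to-concepts index. The sketch below follows that reading; the function names and the toy article data are assumptions, and the concept id `Aquila_Person` is invented for the example.

```python
from collections import Counter, defaultdict


def reference_counts(articles):
    """Aggregate, per concept, how many articles cite each Bible reference.

    articles: iterable of (concept_id, references cited in one article).
    Each article casts one vote per distinct reference it cites.
    """
    counts = defaultdict(Counter)
    for concept_id, references in articles:
        counts[concept_id].update(set(references))
    return counts


def reverse_index(counts):
    """Invert concept -> reference votes into reference -> ranked concepts."""
    by_reference = defaultdict(Counter)
    for concept_id, votes in counts.items():
        for reference, n in votes.items():
            by_reference[reference][concept_id] = n
    return {
        ref: sorted(votes.items(), key=lambda item: (-item[1], item[0]))
        for ref, votes in by_reference.items()
    }


# Toy data: the article contents are invented for illustration.
articles = [
    ("Prisca_Person", ["Acts 18:2", "Rom 16:3"]),
    ("Prisca_Person", ["Acts 18:2", "Acts 18:26"]),
    ("Aaron_Person", ["Exod 4:14"]),
    ("Aquila_Person", ["Acts 18:2"]),
]
index = reverse_index(reference_counts(articles))
print(index["Acts 18:2"])  # Prisca_Person ranks first with two votes
```

With more dictionaries loaded, each concept collects more votes, which matches the slide's expectation that the estimates improve as content is added.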

Text Mining: Related Concepts

• Extract and aggregate key terms
• Cluster documents
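The slide gives no detail on the extraction or clustering method. As a rough stand-in, the sketch below aggregates non-stopword tokens across a concept's articles and ranks other concepts by Jaccard overlap of those term sets. This is pairwise similarity rather than true clustering, and the stopword list, function names, and article texts are all invented for illustration.

```python
import re

STOPWORDS = {"the", "of", "and", "a", "in", "was", "is", "by", "that", "another"}


def key_terms(texts):
    """Aggregate the non-stopword tokens of all articles for one concept."""
    terms = set()
    for text in texts:
        terms.update(t for t in re.findall(r"[a-z]+", text.lower())
                     if t not in STOPWORDS)
    return terms


def jaccard(a, b):
    """Set overlap between 0.0 (disjoint) and 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def related(concept_id, articles):
    """Rank the other concepts by key-term overlap, dropping zero scores."""
    profiles = {cid: key_terms(texts) for cid, texts in articles.items()}
    scores = [(other, jaccard(profiles[concept_id], terms))
              for other, terms in profiles.items() if other != concept_id]
    return sorted([s for s in scores if s[1] > 0], key=lambda s: (-s[1], s[0]))


# Invented article snippets keyed by concept id.
articles = {
    "Aaron_Person": ["Aaron was the brother of Moses and the first high priest.",
                     "Aaron the priest carried a rod that budded."],
    "aaronsRod": ["The rod of Aaron budded in the tabernacle."],
    "deceit": ["Deceit is misleading another person by falsehood."],
}
print(related("Aaron_Person", articles))  # aaronsRod is the only related concept
```

A relation found this way could be recorded as a skos:related link, like the one between Aaron and his rod in the Example Semantics slide.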

Conclusions

• Controlled vocabulary coupled with parallel content
• Platform for text mining, user contribution
• Future Work
  – Continue adding resources
  – Additional content extraction
  – Add hierarchy (LCSH, WordNet)
  – Crowdsourcing


Resources

• A Controlled Vocabulary for Biblical Studies (Boisen). Presentation at BibleTech:2010.

• Domain-Specific Tools to Add Value to E-Books (Pritchett). Presentation at O'Reilly Tools of Change for Publishing Conference 2010.

• Deploying Semantic Technologies for Digital Publishing (Boisen). Presentation at SemTech:2007.