SKOS
Ecoterm 2006
Alistair Miles
CCLRC Rutherford Appleton Laboratory
Semantic Web Best Practices and Deployment
http://www.w3.org/2004/02/skos Alistair Miles, Ecoterm 2006, slide 2
Reminder: what is it?
• Simple Knowledge Organisation System
• Formal language for representing controlled structured vocabularies (thesauri, classification schemes, … ?)
• Subject metadata & information retrieval …
– ‘this document is about romantic love’.
– ‘this document is about the cure of tuberculosis by x-ray in India in the 1950s’.
• Application of RDF
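To make "application of RDF" concrete, here is a minimal sketch that models an RDF graph as a plain set of triples, with no RDF library involved. The SKOS Core and Dublin Core namespace URIs are real; the `example.org` concepts, the document `doc1`, the helper names, and the choice of `dc:subject` to link a document to a concept are assumptions made for illustration.

```python
# Namespace URIs: SKOS Core and Dublin Core are real; example.org is made up.
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical namespace

# An RDF graph modelled as a set of (subject, predicate, object) triples.
triples = {
    (EX + "love", SKOS + "prefLabel", "love"),
    (EX + "romantic-love", SKOS + "prefLabel", "romantic love"),
    (EX + "romantic-love", SKOS + "broader", EX + "love"),
    (EX + "doc1", DC + "subject", EX + "romantic-love"),
}

def objects(graph, subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def about(graph, doc):
    """Return preferred labels of the concepts a document is about."""
    labels = []
    for concept in objects(graph, doc, DC + "subject"):
        labels.extend(objects(graph, concept, SKOS + "prefLabel"))
    return labels

print(about(triples, EX + "doc1"))  # ['romantic love']
```

The statement "this document is about romantic love" becomes one triple, and the label is looked up through the concept rather than stored on the document.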
Since Ecoterm 2005 …
• SKOS Core Guide & SKOS Core Vocabulary Specification …
– First Working Draft May 2005
– Second Working Draft October 2005
• Minor changes
• Quick Guide to Publishing a Thesaurus on the Semantic Web …
– First Working Draft May 2005
What comes next … ?
• Life after SWBPD-WG … ?
• Plans for next phase of W3C Semantic Web Activity …
• New WG?
• SKOS W3C Recommendation by end 2007?
• N.B. Not yet approved!
If Rec then …
• What is the scope? What is the fundamental design goal?
• First part of SKOS Rec would be requirements specification.
• Between now and Sept/Oct 2006 … define scope and requirements.
What I’d like to do here …
• Talk about some of the assumptions behind SKOS.
• Sketch some ideas on how to define scope and requirements for SKOS.
• Get your feedback.
[email protected]
“SKOS: Requirements for Standardization”
isegserv.itd.rl.ac.uk/public/skos/press/dc2006/paper.pdf
Brief history of scope …
• 2003-04: SWAD-Europe
– ISO 2788 thesauri
– “Non-standard” thesauri via extensibility e.g. GeMET
– Classification scheme (PACS)
– Multilingual thesauri
– Semantic mapping
• 2004: W3C Glossaries
• 2005: Discussion re “terminologies”
• Subject headings? Gazetteers? Folksonomies? Taxonomies?
Assumptions: purpose …
• Formal representation of controlled structured vocabularies intended for use in information retrieval applications.
Assumptions: workflow …
a) Build a vocabulary
b) Build an index
c) Retrieve
Assumptions: components …
• Vocabulary Development Application
– Something to help build a vocabulary
• Indexing Application
– Something to help build an index
• Retrieval Application
– Something to help retrieve things
• SKOS ultimately designed to support interoperation of these three “key components”.
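The build, index, retrieve workflow and its three components can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a description of any actual SKOS tool: the concept ids, document names, and function names are all hypothetical.

```python
from collections import defaultdict

# a) Build a vocabulary: concept id -> preferred label (hypothetical ids).
vocabulary = {"c1": "tuberculosis", "c2": "radiotherapy", "c3": "India"}

# b) Build an index: each document is assigned concepts from the vocabulary.
def build_index(assignments, vocab):
    """Return concept -> set of documents, rejecting uncontrolled terms."""
    index = defaultdict(set)
    for doc, concepts in assignments.items():
        for concept in concepts:
            if concept not in vocab:
                raise KeyError(f"{concept} is not in the controlled vocabulary")
            index[concept].add(doc)
    return index

# c) Retrieve: documents indexed with every concept in the query.
def retrieve(index, query_concepts):
    sets = [index.get(c, set()) for c in query_concepts]
    return sorted(set.intersection(*sets)) if sets else []

index = build_index({"doc1": ["c1", "c2", "c3"], "doc2": ["c1"]}, vocabulary)
print(retrieve(index, ["c1", "c3"]))  # ['doc1']
```

The only thing the three steps share is the vocabulary itself, which is why a common representation for it matters for interoperation.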
Proposed scope …
• SKOS is a formal language for representing controlled structured vocabularies intended for use within information retrieval applications.
• SKOS is required to support the interoperation of these three key components.
• I.e. define the requirements for SKOS by describing a set of functionalities that must be enabled.
Other components …
• Vocabulary mapping … ?
• Metadata registries … ?
• … ?
Component specs …
• … first discuss social and technological context, then return to component specs …
Context …
• What is the social and technological context in which controlled structured vocabs are used?
• Assume two basic needs…
– Locate something I already know about.
– Discover something new.
• N.B. a good location service is not necessarily a good discovery service.
– Cf. Google and del.icio.us
Strategies …
• Basic strategies for implementing retrieval services …
1. Statistical text analysis
2. Analysis of user behaviour
3. Index with controlled vocab
• Other strategies …
1. … KOS-assisted text analysis?
Cost problem …
• Given that applying controlled structured vocab for retrieval involves significant initial and ongoing investment…
• Given that other strategies are cheaper…
• Huge pressure to drive down cost and increase utility.
• Requirement for seamless integration.
– I.e. controlled vocab is seldom used in isolation, most applications will combine strategies.
Use case …
• Search portal …
• Use combined strategies.
Component specs …
• Important factors …
• Minimise cost.
– Decentralisation.
– Assistance.
• Maximise “utility”.
– Query expansion.
– Smart ranking.
– Maximise lifetime.
• Use the Semantic Web!
– Situation A. search across many collections, where indexers use same controlled vocab.
– Situation B. search across many collections, where indexers use different controlled vocabs.
Focus areas …
• Decentralisation requires different models of collaboration and change.
• Representing change a key factor to keeping a vocab applicable.
• Ranking and scoring well understood for text, less so for controlled index.
• Theory of query expansion? Field trials of query expansion?
• Strategies for providing assistance?
Change and collaboration
• Continuum of collaboration models: centralised <-> decentralised
• Continuum of change management models: continuous <-> discrete
• Decentralisation can reduce cost of development and maintenance
• Change management can ensure continued utility – maximise ROI
• Support for declarative representation of change a requirement for SKOS.
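One way to read "declarative representation of change" is that changes between vocabulary versions are stated as data that any tool can replay, rather than buried in an editor's history. The sketch below illustrates that reading only; the record format, the action names (`add`, `relabel`, `replace`), and the sample concepts are hypothetical and are not part of SKOS.

```python
# A vocabulary snapshot: concept id -> preferred label (hypothetical data).
v1 = {"c1": "consumption", "c2": "x-ray therapy"}

# Changes stated as data, not as code: each record says what changed.
changes = [
    {"action": "relabel", "concept": "c1", "label": "tuberculosis"},
    {"action": "add", "concept": "c3", "label": "radiotherapy"},
    {"action": "replace", "concept": "c2", "by": "c3"},
]

def apply_changes(vocab, change_list):
    """Return (new vocabulary, replacement map) without mutating the input."""
    new_vocab, replaced = dict(vocab), {}
    for change in change_list:
        action, concept = change["action"], change["concept"]
        if action in ("add", "relabel"):
            new_vocab[concept] = change["label"]
        elif action == "replace":
            del new_vocab[concept]
            replaced[concept] = change["by"]
        else:
            raise ValueError(f"unknown action: {action}")
    return new_vocab, replaced

v2, replaced = apply_changes(v1, changes)
# An index built against v1 can be migrated using the replacement map.
old_index_entry = ["c1", "c2"]
print([replaced.get(c, c) for c in old_index_entry])  # ['c1', 'c3']
```

Because the change list is explicit, an existing index can be brought up to date mechanically, which is one route to the continued utility mentioned above.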
Semantic Web architecture…
• Exploit Semantic Web facility to distribute and merge data.
• However, best practices for publishing data on the Semantic Web still need work.
• See “Best Practice Recipes for Publishing RDF Vocabularies” W3C Working Draft (Google “publishing RDF”).
Semantic Web architecture
Direct interaction …
Information retrieval…
• Indexing and query evaluation well understood for text content.
• Less well understood for controlled metadata.
• Query types?
• Query evaluation strategies, e.g. query expansion?
• Ranking?
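As one possible answer to the query expansion and ranking questions, the sketch below expands a query concept to its narrower concepts and scores documents lower the further the matching concept is from the query. The hierarchy, the index, the depth limit, and the 1 / (1 + distance) score are all assumptions chosen for illustration; the slides do not prescribe any particular strategy.

```python
from collections import deque

# Hypothetical hierarchy: concept -> directly narrower concepts.
narrower = {
    "love": ["romantic love", "parental love"],
    "romantic love": ["courtly love"],
}
# Hypothetical index: document -> concepts assigned by an indexer.
index = {
    "doc1": {"love"},
    "doc2": {"courtly love"},
    "doc3": {"parental love"},
    "doc4": {"war"},
}

def expand(concept, max_depth=2):
    """Breadth-first walk returning {concept: distance from the query concept}."""
    distances, queue = {concept: 0}, deque([concept])
    while queue:
        current = queue.popleft()
        if distances[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for child in narrower.get(current, []):
            if child not in distances:
                distances[child] = distances[current] + 1
                queue.append(child)
    return distances

def search(concept, max_depth=2):
    """Rank documents by 1 / (1 + distance) of their closest matching concept."""
    distances = expand(concept, max_depth)
    scored = []
    for doc, concepts in index.items():
        hits = [distances[c] for c in concepts if c in distances]
        if hits:
            scored.append((doc, 1 / (1 + min(hits))))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

print([doc for doc, _ in search("love")])  # ['doc1', 'doc3', 'doc2']
```

Setting `max_depth=0` turns expansion off, which makes it easy to compare expanded and unexpanded results in a field trial.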
Assistance for indexers …
• Provide suggestions
– Comparison of labels and annotations
– Machine learning
– Exploit lexical resources
– … ?
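The simplest form of "comparison of labels" is to look for vocabulary labels in the text being indexed. The sketch below does only that, as an assumption-laden illustration: the vocabulary, the tokenizer, and the `suggest` function are hypothetical, and a real indexing assistant would need stemming, disambiguation, and ranking on top.

```python
import re

# Hypothetical vocabulary: concept id -> preferred and alternative labels.
labels = {
    "c1": ["tuberculosis", "TB"],
    "c2": ["x-ray", "radiography"],
    "c3": ["romantic love"],
}

def tokenize(text):
    """Lower-case a string and split it into word tokens (hyphens kept)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def suggest(text):
    """Suggest concepts whose label tokens appear contiguously in the text."""
    tokens = tokenize(text)
    suggestions = []
    for concept, concept_labels in labels.items():
        for label in concept_labels:
            wanted = tokenize(label)
            n = len(wanted)
            if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
                suggestions.append((concept, label))
                break  # one matching label per concept is enough
    return suggestions

print(suggest("The cure of tuberculosis by x-ray in India in the 1950s"))
# [('c1', 'tuberculosis'), ('c2', 'x-ray')]
```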
Assistance for mappers …
• Provide suggestions …
– Analysis of labels and annotations
– Exploit lexical resources
– … ?
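For mapping between two vocabularies, label analysis can start with plain string similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to propose candidate mappings for a human mapper to confirm; the two vocabularies, the 0.6 threshold, and the function names are assumptions, and string similarity alone will miss synonyms that a lexical resource would catch.

```python
from difflib import SequenceMatcher

# Two hypothetical vocabularies: concept id -> preferred label.
vocab_a = {"a1": "Water pollution", "a2": "Soil erosion"}
vocab_b = {"b1": "pollution of water", "b2": "erosion", "b3": "water pollution"}

def similarity(x, y):
    """Case-insensitive similarity ratio between two labels, in [0, 1]."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def suggest_mappings(source, target, threshold=0.6):
    """For each source concept, suggest the most similar target concept.

    Returns (source id, target id, score) tuples for scores >= threshold.
    These are suggestions for review, not asserted mappings.
    """
    suggestions = []
    for s_id, s_label in source.items():
        t_id, t_label = max(target.items(), key=lambda item: similarity(s_label, item[1]))
        score = similarity(s_label, t_label)
        if score >= threshold:
            suggestions.append((s_id, t_id, round(score, 2)))
    return suggestions

for s_id, t_id, score in suggest_mappings(vocab_a, vocab_b):
    print(s_id, "->", t_id, score)
```

Raising the threshold trades recall for precision: at 0.9 only the exact label match survives.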
Summary
• SKOS: fundamental requirement to support information retrieval using controlled structured vocabularies.
• Define requirements by describing information retrieval functionalities.
• Divide functionalities into:
– Presentation styles
– Query types e.g. compound queries, coordination …
– Query evaluation strategies
• Assumptions:
– Key components
– Semantic Web interaction
– Context – pressure to make vocabularies “profitable”
– … Issues: change, assistance, theory …