LinkSphere: P2P Cross Database Search -- Architecture and Issues

Hugo Mills, University of Reading

LinkSphere

• Linking Researchers and their Data

• Social networking for researchers

• Cross-database search

– Mostly Arts and Humanities datasets

– “Promoting serendipity”

– Access to, and presentation of, datasets for wider audiences

Datasets

• Museums, archives

• Archaeology: Silchester Excavation, IADB

• Ure Museum of Classical Archaeology

• CentAUR: ePrints Library

• Beckett Collection

• Cole Museum of Zoology

• Film Collection

• Herbarium

• Typography Collections

Tycho

• Fully asynchronous peer-to-peer communications framework

• Written in Java

• Fully distributed

• Robust

“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.” (Leslie Lamport)

• Has a simple distributed data store (“Virtual Registry”) for client metadata

Tycho

• (Relatively) lightweight: 3 MiB for a fully functional system

• Fast

• Flexible, extensible:

– Bootstrap handlers

– Additional message types

– VR extensions

– Alternative communication protocols

– Discovery of core mediators via Bonjour/ZeroConf
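The extension points above imply that messages are routed by type, so new types can be plugged in. A minimal sketch, assuming string-tagged messages and a handler map (the class, record and method names are hypothetical, not Tycho's actual API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical dispatcher: extensions register handlers for extra message types. */
public class MessageDispatcher {
    /** Minimal message: a type tag plus a string payload (assumed format). */
    public record Message(String type, String payload) {}

    private final Map<String, Function<Message, String>> handlers = new HashMap<>();

    /** Register (or replace) the handler for one message type. */
    public void register(String type, Function<Message, String> handler) {
        handlers.put(type, handler);
    }

    /** Route a message to its handler; unknown types are reported, not thrown. */
    public String dispatch(Message msg) {
        Function<Message, String> handler = handlers.get(msg.type());
        if (handler == null) {
            return "unhandled:" + msg.type();
        }
        return handler.apply(msg);
    }

    public static void main(String[] args) {
        MessageDispatcher d = new MessageDispatcher();
        // An "extension" adds a new message type without changing the dispatcher.
        d.register("ping", m -> "pong:" + m.payload());
        System.out.println(d.dispatch(new Message("ping", "42")));   // prints pong:42
        System.out.println(d.dispatch(new Message("search", "x"))); // prints unhandled:search
    }
}
```

Keeping the handler table open like this is one way to get "additional message types" without recompiling the core; the real framework may well do it differently.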

XDB System Architecture

[Figure: XDB system architecture. Components shown: Tycho Core; Repo nodes backed by JDBC, Web API, SPARQL, ...; Search App nodes exposing a REST search API; VR (Virtual Registry) and Meta (metadata) entries.]
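The central idea of cross-database search, one query sent to many repositories with the answers merged, can be sketched as below. This assumes a hypothetical `Repository` interface standing in for the JDBC, Web API and SPARQL adaptors; it is not the XDB code, and the timeout and error handling are illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class FanOutSearch {
    /** Hypothetical common interface over one back end (JDBC, web API, SPARQL, ...). */
    public interface Repository {
        String name();
        List<String> search(String query);
    }

    /**
     * Send the query to every repository asynchronously and merge the hits.
     * A repository that throws, or does not answer within timeoutMs,
     * contributes an empty list instead of failing the whole search.
     */
    public static List<String> search(List<Repository> repos, String query, long timeoutMs) {
        List<CompletableFuture<List<String>>> pending = new ArrayList<>();
        for (Repository repo : repos) {
            pending.add(CompletableFuture
                    .supplyAsync(() -> repo.search(query))
                    .completeOnTimeout(List.of(), timeoutMs, TimeUnit.MILLISECONDS)
                    .exceptionally(err -> List.of()));
        }
        List<String> merged = new ArrayList<>();
        for (CompletableFuture<List<String>> f : pending) {
            merged.addAll(f.join()); // never throws: failures were mapped to List.of()
        }
        return merged;
    }

    /** Simple in-memory repository doing case-insensitive substring matching. */
    public static Repository inMemory(String name, List<String> records) {
        return new Repository() {
            public String name() { return name; }
            public List<String> search(String query) {
                List<String> hits = new ArrayList<>();
                for (String r : records) {
                    if (r.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT))) {
                        hits.add(name + ": " + r);
                    }
                }
                return hits;
            }
        };
    }

    public static void main(String[] args) {
        // Made-up sample records, purely for illustration.
        List<Repository> repos = List.of(
                inMemory("repoA", List.of("Roman lamp", "Attic vase")),
                inMemory("repoB", List.of("Roman coin")));
        System.out.println(search(repos, "roman", 1000));
        // prints [repoA: Roman lamp, repoB: Roman coin]
    }
}
```

Mapping failures and time-outs to empty results reflects the deck's point that any one peer may disappear; a production system might instead report which repositories did not answer.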

User Interface

• Main UI is web-based

– Uses AJAX

– Currently embedded within the LinkSphere project site

– Will ultimately move to the SNS (social networking site)

• Any UI possible using the REST API

Issues

• Getting the data is hard

– Implementation problems

– Maintenance problems

– Admin problems

– Social problems

– Legal problems

“Muddling along”

• Archive of material for intra-departmental use only

– Some legal issues involved

• Group of technicians administering the data

– Poor quality data

• Excel spreadsheet(!)

• Reluctant to have index of material made public

“Not ready yet”

• Big university projects

• New systems, (potentially) large data sets

• MERL museums archive (AdLib)

– Data all loaded from previous systems

– Access modules not yet installed

• CentAUR publications archive (ePrints 3)

– Very little data available yet

“Works For Me”

• Custom web application

– PHP, sophisticated

• External developer

• No documentation

• MySQL underneath

“It works, but...” (part 1)

• Non-technical users

• Admins are Mac-only, desktop-only people

• FileMaker Pro

• DB structure and UI developed externally

– No documentation

– This has bad implications

“It works, but...” (part 2)

• Completely custom application

– External developer

– No documentation (again)

– Large lump of write-only Perl

• Custom data store

– Not SQL. Not XML. Not RDF.

• No external access

Unreachable data

• Uncommunicative systems

• Custom applications

– Developers/administrators AWOL

• Custom data models

• Lost passwords

• Excel spreadsheets

– See also “Uncommunicative”

Unreachable data

• Private data

– Legal issues

– Possessive owners

• Internal use only

• Poor quality

• No data!

Conclusions

• Building the software is easy

• There is still a lot of hard-to-reach data out there

• Issues are largely not technical

• More outreach to Arts and Humanities (A&H) areas is needed

Acknowledgements and thanks

• LinkSphere team: Mark Baker, Shirley Williams, Pat Parslow (Reading), Claire Warwick, Melissa Terras, Claire Ross (UCL)

• Repository owners at Reading: Amy Smith (Ure Museum), Guy Baxter (University Archivist), Mary Dyson, Hadj Messelles (Typography), Jonathan Bignell (Film Studies), Alison Sutton (CentAUR), Mike Fulford, Amanda Clarke (Silchester)

• JISC VRE 3 programme

Tycho Architecture

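[Figure: Tycho architecture, showing mediators (M) with Virtual Registry (VR) instances, and clients (C) connected to the mediators.]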

REST Interface

• /api/query

– POST to start new query asynchronously

• /api/query/query_id

– GET for query metadata

– DELETE to cancel query (or it will time-out naturally)

• /api/query/query_id/start/finish

– GET a range of results from the query

• Feedback API coming soon
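A client-side sketch of this query lifecycle using the JDK's `java.net.http` classes is shown below. The base URL, the plain-text request body and whether `finish` is inclusive are assumptions; the slide does not specify them, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for the query endpoints; base URL and body format assumed. */
public class QueryRequests {
    private final String base; // e.g. "http://localhost:8080" (assumed)

    public QueryRequests(String base) { this.base = base; }

    /** POST /api/query: start a new query asynchronously. */
    public HttpRequest start(String queryText) {
        return HttpRequest.newBuilder(URI.create(base + "/api/query"))
                .header("Content-Type", "text/plain") // assumed body format
                .POST(HttpRequest.BodyPublishers.ofString(queryText))
                .build();
    }

    /** GET /api/query/query_id: query metadata. */
    public HttpRequest metadata(String queryId) {
        return HttpRequest.newBuilder(URI.create(base + "/api/query/" + queryId)).GET().build();
    }

    /** GET /api/query/query_id/start/finish: one range of results. */
    public HttpRequest results(String queryId, int start, int finish) {
        return HttpRequest.newBuilder(
                URI.create(base + "/api/query/" + queryId + "/" + start + "/" + finish))
                .GET().build();
    }

    /** DELETE /api/query/query_id: cancel rather than wait for the time-out. */
    public HttpRequest cancel(String queryId) {
        return HttpRequest.newBuilder(URI.create(base + "/api/query/" + queryId)).DELETE().build();
    }
}
```

Each request can be sent with `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`. How the server returns the new query_id after the POST is not stated, so that step is left open.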

REST Interface

• /api/repository

– GET list of repositories currently online

• /api/repository/repo_id

– GET for repository metadata, including:

  • Link to the repository itself

  • Link to the LinkSphere description of it
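The repository endpoints can be addressed in the same way. This is again a sketch: the base URL is assumed, `repoId` is assumed to be URL-safe, and the class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for the repository endpoints; base URL assumed. */
public class RepositoryRequests {
    private final URI base;

    public RepositoryRequests(String base) { this.base = URI.create(base); }

    /** GET /api/repository: repositories currently online. */
    public HttpRequest listOnline() {
        return HttpRequest.newBuilder(base.resolve("/api/repository")).GET().build();
    }

    /** GET /api/repository/repo_id: metadata for one repository. */
    public HttpRequest metadata(String repoId) {
        return HttpRequest.newBuilder(base.resolve("/api/repository/" + repoId)).GET().build();
    }
}
```

The response formats are not given on the slide beyond the two links in the metadata, so parsing is left out.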
