20
DSpace System Architecture 11 July 2002 DSpace System Architecture ... W orkflow C ontent M anagem ent API E-person/ G roup M anager A uthorisation H istory M anager B usiness Logic Layer A dm inistration Toolkit Federation Services Storage A PI D Space Public A PI B itstream Storage M anager RDBM S W rapper Search (Lucene W rapper) B row se H andle M anager Ingest W eb U I O A IM etadata Providing Service W eb Service Interface JD BC PostgreSQ L Filing System

DSpace System Architecture 11 July 2002 DSpace System Architecture

Embed Size (px)

DESCRIPTION

DSpace System Architecture 11 July 2002 Metadata I Descriptive: Qualified Dublin Core in RDBMS; registry of elements and qualifiers Other schema (e.g. MARC) stored as serialised bitstreams Administrative: Some stored as Dublin Core (e.g. provenance) Mostly stored in legacy DB schema

Citation preview

Page 1: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

DSpace System Architecture

...

Workflow

ContentManagement

API

E-person/Group

Manager

Authorisation

HistoryManager

BusinessLogic Layer

AdministrationToolkit

Federation Services

Storage API

DSpace Public API

Bitstream Storage ManagerRDBMS Wrapper

Search(Lucene

Wrapper)

Browse

HandleManager

Ingest

Web UI OAI Metadata ProvidingService Web Service Interface

JDBC

PostgreSQLFiling System

Page 2: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Logical Data Model

communitycommunity

collectioncollection

itemitem

bundlebundle

bitstreambitstreambitstreamformat

bitstreamformat

mmmm m

mmm

mmmm

mm

itemDC Record

1 m

1 1

Page 3: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Metadata I

•Descriptive:•Qualified Dublin Core in RDBMS; registry of elements and qualifiers•Other schema (e.g. MARC) stored as serialised bitstreams

•Administrative:•Some stored as Dublin Core (e.g. provenance)•Mostly stored in legacy DB schema

Page 4: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Metadata II

•Structural:•Currently, just structure of data model•Proposed use of METS to describe relationship between bitstreams in bundles

•Preservation:•Registry of bitstream formats with preservation metadata

Page 5: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Metadata III

•Much of the metadata proprietary, embedded in RDBMS schema

•Defining Archival Information Package based on METS to store this in a standard way

Page 6: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Persistent Naming

•Handle system from CNRI•Currently Handles assigned to items – soon to cover collections and communities•CNRI Handle resolving software with DSpace “plug-in”•Individual Handles managed within DSpace system

Page 7: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Implementation

•Implemented in Java

•UNIX-type operating systems: Linux, HPUX etc

•Open source – BSD-style license

•Uses only free, open source libraries

Page 8: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Storage Layer

•RDBMS: PostgreSQL via JDBC, and a simpler “homegrown” API on top

•Simple bitstream storage API – currently just stores bitstreams in the file system

Page 9: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Content Management API

•Java classes for manipulating content and metadata

•Fully transaction-safe

•Invokes authorisation system

•Records object history

Page 10: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Workflow System

•Submissions to a collection may go through editorial or review process

•Implemented simple submission workflow with three optional stage; meets initial requirements

Page 11: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Search

•Simple wrapper around Jakarta “Lucene” search engine

•Lucene maintains indices, implements stemming, stop-word removal and wild-card searching

•Wrapper allows bounded search within specific community/collection

Page 12: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Browse

•“Homegrown”

•Builds indices in RDBMS

•Browse by author, issue date, title

•Bounded browse within specific community/collection

Page 13: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

“E-people” and Group Management

•Currently represent only users of system: Not a naming authority

•Stores user’s e-mail address, real name, password, contact information

•Simple group management – organise e-people into named groups

Page 14: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Authorisation

Policies are triples: (object, action, group)

•Any object (community, collection, item, bitstream) can have access policies, and policies can be inherited from containers

Page 15: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

History

•Mechanical history of objects stored in RDF - Harmony ABC model

•Stored in file system

•Simple index held in RDBMS

Page 16: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Ingest

•Legacy import format (SIP) defined – soon to be based on METS

•To ingest, content + metadata transferred to temporary filespace on the server

•Ingest application is invoked, which trawls and ingests data

•Ingested content can be sent through or bypass workflow system

Page 17: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Administration Toolkit

•Ubiquitous “Context” object is used for authorisation and transaction-safety

•Configuration system – includes handling of configuration files for external tools (e.g. Apache, Tomcat)

•Standard text-based logging

•Classes for manipulating Dublin Core and bitstream format registries

Page 18: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Web User Interface I

•Implemented as Servlets (for processing HTTP requests) and JSPs (for displaying HTML) – allows easy modification of display without affecting functionality

•Tomcat 4.0 Servlet engine running with Apache (for SSL features)

•Caucho Resin also supported

Page 19: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Web User Interface II

•Authentication with e-mail/password or X509 certificates

•Other sites can provide a site authenticator class with custom code, e.g. interfacing with a departmental database

Page 20: DSpace System Architecture 11 July 2002 DSpace System Architecture

DSpace System Architecture 11 July 2002

Web User Interface III

Pages for:•Search

•Browse

•Configurable community and collection home pages

•“My DSpace” – managing in-progress submissions and workflow tasks

•Central administration UI – currently little support for self-administration by communities

•Item display

•Item submission