Upload
4science
View
333
Download
0
Embed Size (px)
Citation preview
PUTTING HISTORICAL DATA IN CONTEXT: HOW TO USE DSPACE-GLAM
We will talk about…
1. Theoretical and methodological foundations of the DSpace-GLAM project2. Managing digital objects with DSpace3. Exentending the DSpace data model with DSpace-GLAM4. Integrating DSpace and DSpace-GLAM entities5. Digital cultural resources fruition and sharing with add-ons6. Dataset analysis with CKAN7. Conclusions
The BIG DATA age
• Since several years the term "Big Data" has been bursting into the world of Information Technology,
• Promising potential related to a new generation of technologies and architectures able to extract value from the enormous amount of data which is continuously produced in the most different fields
In the science domain "Big Data" are seen as an opportunity even bigger
The "data deluge" will make obsolete some of the fundamental concepts on which the scientific method has been based so far
A new scientific paradigm ?
No more theories?
No more hypothesis?
No more models?
Numbers speak for themselves?
A new scientific paradigm ?
Certainly new opportunities…
Source:http://bouache.com/blog/big-data/
• Being able to manipulate and analyze massive amounts of data represents an important progress for science
• It won’t abolish the need to build, refine and verify theories
• It will allow to formulate hypotheses and test them infinitely more rapidly and on an infinitely larger sample than in the past
…also for humanities
No data deluge, but…growingamount of data
• Databases• Electronic journals• Digitization• Tools for data extaction• …
A variety of multidisciplinarydata are related to Cultural Heritage and History
Different in:TypologyFormatStructureScale
More and more complexity
In the humanities most of the dataare created or collected by people(not measured by instruments)
They are affected by individuals, place, time
The are fragmentary, partial, biased
Source: http://www.asianscientist.com/2016/07/print/body-as-a-source-of-big-data/
Putting data in context
Digital Cultural Data have to be analyzed togetherwith all contextual information, digital and notdigital, needed to answer research questions, suchas:
• (cultural, social, economic, technological…) production context of a document/monument
• formation processes of an archaeologicalrecord
• contextual associations at different levels and scales (according to the different dimensions of variations)
Source: https://ddd.uab.cat/pub/expbib/2006/terradefoc/10.pdf
A Digital Humanities approach is fundamental…
Such an approach, with its focus on relationships, can help in identifying the important dimensionsof variation (the CONTEXT)
It can help in analyzing primary sources asevidences of a network of heterogeneous systems which can be studied by means of them through a global (holistic) and multidimensional analysis
Technological Environmental Social
Cultural Economic
Source: Hodder I. 2016, Studies in Human-Thing Entanglement, p. 28
…within a Digital Library Management System
To move such an approach from theory to practice we need infrastructures and tools for integration, analysis and storage of digital data and resources.
Today most of the cultural digital resources and data are in the Digital Libraries or Repositories
Are Digital Libraries and Repositories that must provide tools for:
• modeling, visualising and analysing information, both in a qualitative and quantitative way, as well as collaboratively working on it
• highlighting the relationships between data at different scales• explaining interpretations about the important dimensions of variation and about the network
of contextual relations in which historical sources are involved
To enter the daily workflow of historians, archaeologists and humanities scholars.
,
Why DSpace?
To achieve the outlined goals and build a state-of-art Digital Library Management System, open source software is preferable.
Development of open source software gives effective way to create Digital Library Management Systems with a small financial investment.
Looking exactly at sustainability, among the most used open source Digital Library Management Systems, we chose DSpace.
,
Why DSpace?
DSpace out-of-the-box allows to:
• capture and describe digital material using a submission workflow module, or a variety of batch ingest options
• distribute digital assets over the web through a search and retrieval system
• preserve digital assets over the long term
,
Why DSpace?
The system is based on the specifications of the OAIS (Open Archival Information System) for Long Term Preservation and is able to manage the whole "life-cycle" of a digital object in terms of "Digital Curation", by means of:
• metadata creation according to different standards • SIP (Submission Information Package) import and validation• AIP (Archival Information Package) creation• AIP export• storage management• digital resources dissemination (also by means of the OAI-PMH)• digital object history management and integrity check
,
Why DSpace?
,
There are over 2200 digital repositories and libraries worldwide using the DSpace application for a variety of digital archiving and dissemination needs.
DSpace is often used as an institutional repository to provide access to research outputs, scholarly publications, library collections, educational material and more.
It is also used as a digital library to store, preserve and disseminate digital cultural heritage.
A fairly large part of the world cultural and scientific heritage is already managed, accessed and preserved using DSpace
It makes sense to enhance a system already widely used rather than propose to migrate data to new platforms
DSpace Data Model
,
Communities & Collections
,
• Communities and collections are entities useful to aggregate DSpaceitems by:• Provenance and responsibility >>> Communities• Metadata, workflow, curation >>> Collections
• They both aggregate the items but they are conceptually different things!
Communities
,
Create your Community
Collections
,
Create your Collection
Collections
,
Collections
,
Collections
,
Workflow
,
Curating items
,
User management
,
E-People andGroups are the way DSpace identifies application users for the purpose of granting privileges
DSpace metadata
, Out-of-the-box DSpace can support multiple flat metadata schemas
You can configure multiple schemas by means of the “Metadata Schema Registry” and select metadata fields from a mix of configured schemas to describe your items
Communities and collections have some simple descriptive metadata (a name, and some descriptive prose)
The submission process
,
Defining the submission form
,
Configure the submission form by meansof input-form.xml file
You can configure different forms for different collections
You can create internal vocabularies for the fields
input-form.xml
,
input-form.xml
,
dc-schema (Required) : Name of metadata schema employed, e.g. dc for Dublin Core. This value mustmatch the value of the schema element defined in the Metadata Schema Registry
dc-element (Required) : Name of the element
dc-qualifier: Qualifier of the element entered, e.g. when the field iscontributor.advisor the value of this element would be advisor. Leaving this out means the input is for anunqualified element.
repeatable: Value is true when multiple values of this field are allowed, false otherwise. When you marka field repeatable, the UI servlet will add a control to let the user ask for more fields to enter additionalvalues.
label (Required): Text to display as the label of this field, describing what to enter, e.g. "Your Advisor'sName".
input-type(Required): Defines the kind of interactive widget to put in the form to collect the Dublin Corevalue.
input-form.xml
,
hint (Required): Content is the text that will appear as a "hint", or instructions, next to the input fields. Canbe left empty, but it must be present.
required: When this element is included with any content, it marks the field as a required input. If theuser tries to leave the page without entering a value for this field, that text is displayed as a warningmessage. For example, <required>You must enter a title.</required> Note that leaving the requiredelement empty will not mark a field as required, e.g.:<required></required>
input-form.xml – dropdown menus
,
To create an internal flat vocabulary youhave to:
• use the «dropdown», «qualdrop» or «list» value within the <input-type> element
• populate the <value-pairs> element
Hierarchical Taxonomies and Controlled Vocabularies
,Dspace offers also a way for structuring and managing more complex, hierarchical controlled vocabularies
Managed in a separate file
Taxonomies are described in XML
Vocabularies are invoked from the input-form.xml, using the <vocabulary> tag within the related <field>
Batch submission process
,Requires the creation of a DSpace Simple Archive:
• A directory for each item to import, containing:
• the files that make up the item.
• An xml file where each metadata element has it's own entry within a <dcvalue> tagset. There are currently three tag attributes available in the <dcvalue> tagset:• <element> - the Dublin Core element• <qualifier> - the element's qualifier• <language>- (optional)ISO language code for
element
• A “contents” file, with the files enumeration
• An (optional) collection file with the information about the collection(s) the item belongs to
<dublin_core>
<dcvalue element="title" qualifier="none">A
Tale of Two Cities</dcvalue>
<dcvalue element="date"
qualifier="issued">1990</dcvalue>
<dcvalue element="title"
qualifier="alternative"
language="fr">J'aime les Printemps</
dcvalue>
</dublin_core>
UI Batch Import
,
You have to:
• Compress the item directories into a zip files.
• Place the zip file in a public domain URL, like Dropbox or Google Drive or wherever you have access to do so
• Then log-in as Administrator and fill the form
UI Batch Import
,
Batch metadata editing
,
DSpace provides a batch metadata editing tool.
The batch editing tool facilitates the user to perform the following:
• Batch editing of metadata by means of a comma delimited file in CSV format
• Batch additions of metadata (e.g. add an abstract to a set of items)
• Batch find and replace of metadata values (e.g. correct misspelled surname across several records)
• Mass move items between collections
• Mass deletion, withdrawal, or re-instatement of items
• Enable the batch addition of new items (without bitstreams) via a CSV file
• Re-order the values in a list (e.g. authors)
Batch metadata editing
,
Extending Dspace
Cultural Institutions in the «Big Data Age» ask for:
• Complex and multidimensional metadata structures• Complex data models• Relationships management between different entities• Tools for digital data and resources visualization, analysis and
interpretation
Why not use an “extended” version of DSpace to meet these relevant needs?
DSpace-GLAM(Galleries, Libraries, Archives, Museums)
Built by 4Science on top of DSpace and to meet the needs of Cultural Heritage institutions
Flexible and extensible data model inherited from DSpace-CRIS (our RIMS)to manage relevant metadata standardsand specific conceptual models
With dedicated add-ons for digital objectscuration, fruition and sharing
Also an add-on for datasets visualizationand analysis
DSpace-GLAM(Galleries, Libraries, Archives, Museums)
DSpace-GLAM is free, open source, compliant with open standards
Add-ons are mainly distributed following a new business model (crowdsourcing)
Provides institutions witha sustainable and effective tool to manage and analyze Cultural Heritage Information
Weakness of DSpace metadata management
• Flat metadata model• Weak support for technical and structural metadata• All information are stored as string at the database level with minimal
support (and validation) for data entry in the UI
• DSpace-GLAM improves the metadata at the item level providing:-Additional input types for data entry (number, year and regex validation)-Partial support for nested metadata-Support for technical and structural metadata
DSpace-GLAM: interoperability
,• Connect to VIAF records and Getty Vocabularies for precise identification
of persons, artists and places
• It has been reported to work nicely with «plain» DSpace, with the authority implementation. Plan to include it out-of-box in DSpace 7
Extending the DSpace Data Model
DSpace-GLAM can manage all the entities important to contextualize digital cultural heritage:
• Persons• Families• Fonds• Events• Places• Concepts• …………..
Entities can be created to integrate different metadata standards and conceptual models
Extending the Data Model
• Persons
• Projects
• Organizations
are pre-defined entities inherited from DSpace-CRIS
… but you are not required to use (all of) them.
you can define additional entities
you can define your own relationships between entities, including the ones that you have defined
Pre-defined entities
Defining other entities
Entities components
Tabs
Box
Fields
Entities components: tabs
Entities components: boxes
Entities components: fields
• Each DSpace-GLAM entity instance has a status flag• Public: the details page is visible to anyone and it will be linked where
appropriate. The record is included in public search results• Private: only administrators can access the details page. The entity is indexed
only for use as authority entry
• Each property/attribute value has an edit mode:• Editable• Visibility flag only• Only Administrators• Read only
• A field becomes visible when included in a public visible tab/box
Data model configurationDSpace-GLAM visibility and security
• Visibility of a tab or box can be restricted to System administrators
Only RP owner
Admins and RP Owner
specific users and groups related to the entity instance
• To restrict the visibility of a box or tab to specific groups or users one or more properties must be indicated containing the users and/or groups that have access to the protected box / tab
Data model configurationDSpace-GLAM visibility and security
• It can be performed via UI and exported to xls
• It can be imported from XLS files
Data model configurationData model configuration
Data model configurationCreating entities relationships
Data model configurationCreating entities relationships
Data model configurationCreating inverse relationships between entities
DSpace-CRIS can use the SOLR indexes to reverse a relation• Documents are linked to the person • But you can also list the documents under a specific person
Relations are defined in the configuration spring file cris-relationpreference.xml and characterized by
A nameThe target entity (a CRIS Entity or a DSpace Item)The SOLR query with {0}, {1} placeholders to be replaced with the CRIS-ID or the uuid of the source CRIS instance
Data model configurationCreating inverse relationships between entities
(cris-relationpreference.xml)
<bean id="relationINTERPRETATIONVSEVENTSConfiguration" class="org.dspace.app.cris.configuration.RelationConfiguration">
<property name="relationName" value="crisinterpretation.events" />
<property name="relationClass" value="org.dspace.app.cris.model.ResearchObject" />
<property name="type" value="crisevents" /><property name="query">
<value>crisevents.eventsrelatedinterpretation_authority:{0}</value></property>
</bean>
Name
Target entity
Solr query
Data model configurationCreating inverse relationships between entities
• Inverse relations can be• Visualized • Used to show aggregated statistics
• To be visualized, relations are embedded in components (see cris-components.xml)
Data model configurationCreating inverse relationships between entities
(cris-components.xml)
<!-- Dynamic object component --><bean id="doComponentsService" class="org.dspace.app.cris.integration.CrisComponentsService">
<property name="components"><map>
<entry key="journalspublications" value-ref="publicationlistforjournals" /><entry key="eventsdocuments" value-ref="publicationlistforevents" /><entry key="placesevents" value-ref="eventlistforplaces" /><entry key="eventsperson" value-ref="personlistforevents" /><entry key="fondschild" value-ref="fondschildforfonds" /><entry key="fondspublications" value-ref="publicationlistforfonds" /><entry key="conceptdocuments" value-ref="publicationlistforconcept"/><entry key="conceptperson" value-ref="personlistforconcept"/>
</map></property>
</bean>Name of the related box for visualizing data
Data model configurationCreating inverse relationships between entities
(cris-components.xml)
<!-- Person list for Events dynamic entity --><bean id="personlistforevents"
class="org.dspace.app.webui.cris.components.CRISRPConfigurerComponent"><property name="relationConfiguration" ref="relationEVENTSVSRPConfiguration" /><property name="commonFilter">
<util:constantstatic-field="org.dspace.app.webui.cris.util.RelationPreferenceUtil.HIDDEN_FILTER" />
</property><property name="target" value="org.dspace.app.cris.model.ResearchObject" /><property name="facets" ref="facetsRPforComponentConfiguration" /><property name="types">
<map><entry key="all" value-ref="allObjectsComponent" />
</map></property>
</bean>
Data model configurationCreating inverse relationships
Data model configurationIntegrating DSpace and DSpace-GLAM
(dspace.cfg)
• All the GLAM’s entities can be linked with DSpace Items and used as authorities for item’smetadata
• This can be done adding some code to dspace.cfg file
##### Authority Control Settings #####plugin.named.org.dspace.content.authority.ChoiceAuthority = \org.dspace.app.cris.integration.ORCIDAuthority = RPAuthority,\org.dspace.content.authority.ItemAuthority = PublicationAuthority,\org.dspace.content.authority.ItemAuthority = DataSetAuthority,\org.dspace.app.cris.integration.DOAuthority = EVENTAuthority,\org.dspace.app.cris.integration.DOAuthority = FONDSAuthority,\org.dspace.app.cris.integration.DOAuthority = CONCEPTAuthority,\org.dspace.app.cris.integration.DOAuthority = INTERPRETATIONAuthority,\
Data model configurationIntegrating DSpace and DSpace-GLAM
(dspace.cfg)choices.plugin.dc.relation.conference = EVENTAuthoritychoices.presentation.dc.relation.conference = suggestauthority.controlled.dc.relation.conference = truecris.DOAuthority.dc_relation_conference.filter = resourcetype_authority:eventscris.DOAuthority.dc.relation.conference.new-instances = eventsItemCrisRefDisplayStrategy.publicpath.dc.relation.conference = events
choices.plugin.dc.relation.concept = CONCEPTAuthoritychoices.presentation.dc.relation.concept = suggestauthority.controlled.dc.relation.concept = truecris.DOAuthority.dc.relation_concept.filter = resourcetype_authority:conceptcris.DOAuthority.dc.relation.concept.new-instances = conceptItemCrisRefDisplayStrategy.publicpath.dc.relation.concept = concept
choices.plugin.dc.relation.fond = FONDSAuthoritychoices.presentation.dc.relation.fond = suggestauthority.controlled.dc.relation.fond = truecris.DOAuthority.dc_relation_fond.filter = resourcetype_authority:crisfonds AND crisfonds.fondsleaf:trueItemCrisRefDisplayStrategy.publicpath.dc.relation.fond = fonds
Authority nameDisplay mode For authority values
Originfor authority values
Entity to populatewith new values
Authority has its own ID
Path to use to linkthe entity
Data model configurationIntegrating DSpace and DSpace-GLAM
Data model configurationCreating inverse relationships between items and
entities
Data model configurationCreating inverse relationships between items and
entities
Data model configurationClustering of related objects
Out-of-the-box are available components implementations to allow configurable rendering of inverse relation for each entities (dspace items or dspace-glam entities)It is possible
• to configure which facets show in the component• to apply filters to the relation• It is possible to enable a clustering using custom categories defined
by facet queries
It is aware of the preference expressed for the relationships
Managing hierarchical archival structures
Extending the data model makes the system able to manage the hierarchicalmetadata structure required by archival standards such as ISAD (G) and EAD
DSpace-GLAM can also manage the production and preservation context of the archive required by ISAAR-CPF, EAC-CPF and ISDIAH
Creating and managing Archival Fonds at differentlevels
Relating an Archival Unit (Item) to a Fond
Visualizing hierarchical archival structures
Overview of the DSpace-GLAM data model
Overview of the DSpace-GLAM data model
Pointing out Social Networks
The system is able to draw graphs based on relationships between Persons using data from the different entities and from the DSpace Items
In particular it draws relationships between persons who:
• Co-authored the same items• Partecipated in the same event(s)• Partecipated in event(s) in the same place(s)• Are related to the same concept(s)
Visualizing relationships between historical figures
Network configuration (network.cfg)
Networks are implemented by plugins
You can write your own implementation typically starting from the default ones
You can canfigure the network layout (colors, nodes numbers, levels)
Formalizing and analysing interpretations
Interpretations are logical processes which starts from data and/or assumptionsand through logical reasoning and connecting persons, events, documents, etc., arrive to one or more conclusions
Often, in humanities, such processes are merged and hidden within naturallanguage narratives
To make such processes explicit, we have to scompose them in differentcomponents and in atomic propositions and display such elements
Formalizing and analysing interpretations
Linking interpretations to entities
With DSpace-GLAM you can link an interpretation to the items, the events and the persons, it is related to
Moreover you can link different interpretations to the same entity
Contextualizing historical data
Painting: The flagellation Painter: Piero della Francesca
Event: Council of Ferrara (AD 1438)
Event: Council of Mantua (AD 1459)
Place: Ferrara Concept: RenaissanceConcept: HumanismConcept: Neoplatonism
Person: Emperor John VIII Palaiologos
Place: Mantua
Interpretation: Ronchey
Ready for Linked Open Data
Ready for Linked Open Data
Linking and relating the createdentities with other authorities,the institution is ready to be part of the Linked Data Graph
Now we are working to include also the additional entities intothe DSpace RDF management features
GLAM
Navigation
Global search across the whole Digital Library
Infographics
Global search across the whole Digital Library
Top objectsusing several criteria
Global search across the whole Digital Library
Faceted Search
Facets
Customizable Browse indexes
Customizable Browse indexes
DSpace-GLAM use cases
Cutural Heritage image files (digitalized manuscripts, paintings, monuments, archaeological finds, rare books, etc.) need to be consulted online, discussed and commented / annotated
IIIF protocols and formats allow you to meet these requirements in a standard and understandable (for both humans and machine) way
DSpace-GLAM use cases
High-quality scanned books have images typically over 100MB for each page
The structure of image sequences are complex and relevant (sequences of pages, of the phases of an historical event, of a cycle of frescoes, etc.)
DSpace-GLAM use cases
The same requirements apply to audio and video content
-Streaming
-Internal structure
-Annotation / commenting / transcript
Adopt an open standard: the MPEG-DASH format allows adaptive streaming over simple html client with full support for multiple tracks, ToC, subtitles
4Science IIIF Image Viewer Addon
IIIF Compliant
1. Presentation API
2. Image API
3. Search API
4. Authentication API (soon)
DSpace item with “see online option”
Offering an integrated Universal Viewerplayer
IIIF Image API allows a smooth interaction with the image files
IIIF Presentation API generated on the fly using the metadata of the item and the bitstreams
Bitstreams metadata
Hierarchical ToC
An example from a PDF document offered as a complex package of page-image
Link images with their textual transcription / OCR
Indexing standard format (hOCR) in a webannotationserver to supply IIIF Search API
Side by side – image vs text using an additional OCR panel
An example in Arabic charactershttps://dspace-glam.4science.it/handle/1234/24
IIIF Image Viewer: share and reuse
Share images with other scholars/users without waiving proper attribution, e.g. using the «manifest» JSON file:
https://dspace-glam.4science.it/json/iiif/1234/11/30/manifest
in another IIIF Image Viewer:http://projectmirador.org/demo/
Audio/Video streaming
Full open source stack:1. Transcoding2. Adaptive streaming3. MPEG-DASH standard
Audio/Video streaming
https://dspace-glam.4science.it/explore?bitstream_id=1841&handle=1234/7&provider=video-streaming
Allows the transcode of the audio/video formats in a format and encoding appropriate to the adopted media server (adaptive video streaming)
Using the DASH standard protocol allows sharing video with other scholars/users without waiving proper attribution, e.g. using the «manifest» XML file:
https://dspace-glam.4science.it/av-stream/1841/ch/0/29/94/83/manifest.mpd
in another DASH clienthttp://dashif.org/reference/players/javascript/v2.4.1/samples/dash-if-reference-player/index.html
Visualizing and analysing datasets
4Science has released a free and open source integration with CKAN, the world's leading open-source data management platform
Using an extensible viewer framework you can now offer data discovery, exploration, preview, sampling and visualization from your DSpace repository
CKAN makes open webservices for tabular data available: https://ckan.org/
Visualizing and analysing datasets
We look at Dspace-GLAM not only as a tool for management and preservation, but also for analysis
Our integration with CKAN allows the visualization and analysis of repertoires and inventories by means of grids, graphs or maps
Datasets can also be related to items and otherentities
https://dspace-glam.4science.it/handle/1234/15
Archaeological finds geolocalization
Visualizing and analysing datasets
https://dspace-glam.4science.it/explore?bitstream_id=1971&handle=1234/22&provider=ckan-recline
Pottery distribution
Why do I need DSpace-GLAM?
• DSpace-GLAM is a powerful extension of DSpace created by 4Science to meet
the needs of Galleries, Libraries, Archives and Museums
• to be able to manage, analyze and preserve digital objects
• together with historical, archaeological or other cultural datasets,
• relating them with other entities such as persons, events, places,
concepts, etc.
• to describe the context of cultural objects and data, according to different
granularity levels, and to different interpretations
• using worldwide adopted, cutting-edge, open-source software and open
standards
How I get DSpace-GLAM?
• Every institution, can install Dspace-GLAM or upgrade its DSpace installation to DSpace-GLAM, extending documents management by creating new entities
• Your publications will be safely managed as before, adding the advantage of linking them to relevant information such as authors, datasets, events, concepts, networks and much more
When can I move to DSpace-GLAM?
• Now: every moment is appropriate to enhance your Digital Library, to better support research activities and make your service more relevant
• Upgrading from DSpace to DSpace-GLAM or installing a brand-new “extended” DLMS does not take much extra effort and it is largely rewarded by the extraordinary results that you can get
• As an extra security, (if you already have a Dspace repository) DSpace-GLAM does not alter the structure of the current objects managed by DSpace, so you can go back from DSpace-GLAM to DSpace at any time just dropping (a lot of) extra tables… but we are confident that you will not want to do that
• Our goal is to provide an environment for integrating the traditional hermeneutic and interpretative work of historical sciences with data visualization and analysis
• In this way, we hope, there may be a fundamental change in the way digital cultural heritage is experienced, analyzed and contributed to by the whole scientific community
Data Science in a Digital Humanities Framework
Thanks for your attention
Andrea Bollini
mobile: +39 333 934 1808
skype: a.bollini
orcid: 0000-0002-9029-1854
Claudio Cortese
mobile: +39 333 9340846
skype: claudio.cortese74
orcid: 0000-0003-4572-9711