Preserving Italian archaeological open data: the MOD solution.
Francesca Anichini, MAPPA Lab, University of Pisa
Gabriele Gattiglia, MAPPA Lab, University of Pisa

CAA 2015


MAPPA Lab is a research lab of the University of Pisa, in which archaeologists, mathematicians and geologists work on:
• mathematical models for archaeology (www.mappaproject.org)
• open data (www.mappaproject.org/mod)

The MAPPA Open Data (MOD) archaeological repository (www.mappaproject.org/mod) stores, and makes openly accessible, all kinds of archaeological data produced during the research process, from raw data to linked open data, including metadata. It is connected with the Journal of Open Archaeological Data to promote good-quality data.


Our infrastructure is hosted in the University of Pisa data center (GNU/Linux) and is built on an open-source LAMP platform: the Apache HTTP Server, the PHP scripting language and the MySQL relational database.


The growing volume of openly accessible data has a strong impact on the management and preservation of data.

[Chart: MOD content in 2013 and 2014 by type (datasets, archives, grey literature); scale 0 to 400; growth labels: +18%, +761%, +59%]


The main functions of the MOD are to acquire, develop and manage data and related digital resources of value to archaeologists, and to promote and disseminate these resources as widely and effectively as possible. The preservation procedure starts from acquisition:
1. the archaeological data collections are ingested as they are: potentially, each archaeological dataset is important for the discipline;
2. the data collections are checked and validated according to the legal and regulatory framework;
3. the data collections are indexed with keyword terms using an appropriate thesaurus;
4. the data collections are catalogued according to appropriate metadata standards;
5. the data collections are licensed with appropriate licenses;
6. the datasets, documentation, metadata and other representation information that make up each data collection are kept in conditions suitable for long-term archival storage.

The primary objective of all archives is to select, preserve and make available for use documents or information which have permanent or continuing value.
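The six steps above form a fixed sequence. As a minimal sketch (not the MOD's actual PHP/MySQL code), they can be modelled as functions applied in order to a hypothetical `DataCollection` record, each step appending to an audit trail. All names here are illustrative assumptions, and the legal and indexing steps are placeholders for work that is done by people or by a real thesaurus.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataCollection:
    """Hypothetical record for one data collection; field names are illustrative."""
    title: str
    files: list[str]
    keywords: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    licence: str | None = None
    history: list[str] = field(default_factory=list)  # audit trail of actions taken


def ingest(c: DataCollection) -> DataCollection:
    c.history.append("ingested")  # 1. accepted as received, no quality filtering
    return c


def check_legal(c: DataCollection) -> DataCollection:
    c.history.append("legally checked")  # 2. stands in for the manual legal review
    return c


def index(c: DataCollection) -> DataCollection:
    c.keywords.append("archaeology")  # 3. placeholder for a real thesaurus lookup
    c.history.append("indexed")
    return c


def catalogue(c: DataCollection) -> DataCollection:
    c.metadata["title"] = c.title  # 4. a real record follows the metadata schema
    c.history.append("catalogued")
    return c


def assign_licence(c: DataCollection) -> DataCollection:
    c.licence = "CC-BY"  # 5. the MOD uses CC-BY or CC-BY-SA
    c.history.append("licensed")
    return c


def archive(c: DataCollection) -> DataCollection:
    c.history.append("archived")  # 6. hand-off to long-term archival storage
    return c


PIPELINE: list[Callable[[DataCollection], DataCollection]] = [
    ingest, check_legal, index, catalogue, assign_licence, archive,
]


def run_pipeline(c: DataCollection) -> DataCollection:
    # Apply the six steps strictly in order; each step records what it did.
    for step in PIPELINE:
        c = step(c)
    return c


if __name__ == "__main__":
    result = run_pipeline(DataCollection("Hypothetical survey 2014", ["finds.csv"]))
    print(result.history, result.licence)
```

Keeping the steps as a list makes the order explicit and lets the audit trail double as the documentation of actions that authenticity later depends on.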


The MOD endeavours to undertake long-term preservation within a framework conforming to ISO 14721:2012, the specification of a reference model for an Open Archival Information System (OAIS). The OAIS reference model is an international standard that proposes common terms and concepts, and a framework for entities and the relationships between them, in digital preservation environments. OAIS is a conceptual framework, not a concrete implementation plan. An Open Archival Information System is an archive, consisting of an organization (which may be part of a larger organization) of people and systems, that has accepted the responsibility to preserve information and make it available for a designated community. The term “open” refers to the fact that the standards are developed collaboratively via open forums; it does not imply that access to the archive is unrestricted.


The Open Archival Information System (OAIS) provides:
• a framework for the understanding and increased awareness of archival concepts needed for long-term preservation and access;
• the concepts needed by non-archival organizations to be effective participants in the preservation process;
• a framework, including terminology and concepts, for describing and comparing architectures and operations between archives;
• a framework for describing and comparing different long-term preservation strategies and techniques;
• a basis for comparing the data models of digital information preserved by archives.


1. Currently, as our main purpose is to persuade the Italian archaeological community of the importance of open data, we use a basic ingestion policy: we acquire raw data as they are.


We do not validate or describe the ‘archaeological’ quality of the data, because we firmly believe that the quality of research data must be judged by archaeologists themselves, through a sort of open peer review.

To increase the quality of data, we have chosen two approaches:
1. encouraging authors to publish data papers in the Journal of Open Archaeological Data;
2. fostering education through the Open School of Archaeological Data.


2. We validate the data from a legal point of view. We publish a detailed guide explaining the procedures that must be followed to prepare and provide the material to be published. In compliance with the law, published documents must not contain the personal data of natural persons who have not previously agreed to their publication. Specific disclaimers have been prepared and can be downloaded to help authors correctly collect the authorisations needed to put their material online.


We check each data collection against the following regulations:
a) Law 633/1941, “Protection of copyright and rights related to its exercise”;
b) Legislative Decree 42/2004, “Code of the cultural and landscape heritage”;
c) Legislative Decree 196/2003, “Personal data protection code” (Privacy Code);
d) Legislative Decree 30/2005, “Industrial property code” (CPI).


3. The data collections are indexed with keyword terms using an appropriate thesaurus.
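The slides do not name the thesaurus used, so the following is only a sketch of the general technique: author-supplied keywords are mapped to preferred thesaurus terms, and anything not found is set aside for a curator. The `THESAURUS` entries below are invented for illustration and are not the MOD vocabulary.

```python
# Hypothetical mini-thesaurus: entry term (lower case) -> preferred term.
# These entries are invented examples, not the thesaurus used by the MOD.
THESAURUS = {
    "pottery": "ceramics",
    "potsherd": "ceramics",
    "sherd": "ceramics",
    "ceramics": "ceramics",
    "coin": "numismatics",
    "coins": "numismatics",
    "burial": "funerary archaeology",
    "grave": "funerary archaeology",
}


def index_terms(free_keywords: list[str]) -> tuple[list[str], list[str]]:
    """Map free keywords to preferred terms; return (preferred, unmatched)."""
    preferred = set()
    unmatched = []
    for kw in free_keywords:
        term = THESAURUS.get(kw.strip().lower())
        if term is None:
            unmatched.append(kw)  # left for manual review by a curator
        else:
            preferred.add(term)
    return sorted(preferred), unmatched


if __name__ == "__main__":
    print(index_terms(["Pottery", "sherd", "Grave", "stratigraphy"]))
```

Collapsing synonyms onto one preferred term is what makes keyword search consistent across collections deposited by different authors.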


4. Once the data have been validated, we attach metadata to each dataset. The metadata schema describes all the information regarding the dataset itself: the structure and format of the digital data, the history of the archaeological investigation, the sources used, the method, and the relationship with the physical data. The schema is derived partly from Dublin Core and partly, for the geographical section, from the ISO 19115 core metadata.

Anichini F., Gattiglia G. MAPPA Open Data Metadata. The importance of archaeological background. F. Giligny, F. Djindjian, L. Costa, P. Moscati, S. Robert (eds.) Proceedings of the 42nd Annual Conference on Computer Applications and Quantitative Methods in Archaeology, CAA 2014, 21st century Archaeology
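A metadata record combining the two sources might be assembled as in the sketch below. The Dublin Core element names and the ISO 19115 `EX_GeographicBoundingBox` components are real; the layout of the record and the keys of the `archaeological` section (investigation history, method, sources, physical archive) are hypothetical stand-ins for the MOD schema described in the cited paper.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher", "contributor",
    "date", "type", "format", "identifier", "source", "language",
    "relation", "coverage", "rights",
}

# Components of the ISO 19115 geographic bounding box (EX_GeographicBoundingBox).
BBOX_KEYS = ("westBoundLongitude", "eastBoundLongitude",
             "southBoundLatitude", "northBoundLatitude")


def make_record(dc: dict, bbox: dict, archaeological: dict) -> dict:
    """Assemble a metadata record; the 'archaeological' keys are hypothetical."""
    unknown = set(dc) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    missing = [k for k in BBOX_KEYS if k not in bbox]
    if missing:
        raise ValueError(f"bounding box lacks {missing}")
    w, e, s, n = (bbox[k] for k in BBOX_KEYS)
    # Decimal degrees; west > east is allowed for boxes crossing the antimeridian.
    if not (-180 <= w <= 180 and -180 <= e <= 180 and -90 <= s <= n <= 90):
        raise ValueError("bounding box coordinates out of range")
    return {
        "dublinCore": dict(dc),
        "geographic": {"EX_GeographicBoundingBox": dict(bbox)},
        "archaeological": dict(archaeological),
    }


if __name__ == "__main__":
    record = make_record(
        {"title": "Hypothetical excavation dataset", "creator": "MAPPA Lab",
         "format": "text/csv", "rights": "CC-BY"},
        {"westBoundLongitude": 10.35, "eastBoundLongitude": 10.45,
         "southBoundLatitude": 43.68, "northBoundLatitude": 43.74},
        {"investigationHistory": "hypothetical 2013 excavation",
         "method": "stratigraphic excavation", "sources": "field records",
         "physicalArchive": "finds held by the excavating body"},
    )
    print(json.dumps(record, indent=2))
```

Rejecting non-Dublin-Core keys early keeps the general section interoperable, while the archaeology-specific context lives in its own section instead of being squeezed into generic elements.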


5. Reuse itself aids preservation; for this reason, all datasets are licensed under CC-BY or CC-BY-SA and published with a DOI (Digital Object Identifier).

[Chart: most used licenses]
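Before release, a repository can check mechanically that a dataset carries one of the two permitted licences and a well-formed DOI. This is a minimal sketch under assumptions: the licence strings are written as in the slide, and the DOI pattern is one published by Crossref for modern DOIs, which covers most but not all legacy DOIs, so a mismatch should prompt a manual check rather than an automatic rejection. The example DOI is invented.

```python
import re

# Licences accepted by the MOD policy described above.
ALLOWED_LICENCES = {"CC-BY", "CC-BY-SA"}

# Crossref's published pattern for modern DOIs: "10." + 4-9 digit prefix + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def publication_problems(licence: str, doi: str) -> list[str]:
    """Return a list of problems; an empty list means the dataset can be released."""
    problems = []
    if licence not in ALLOWED_LICENCES:
        problems.append(f"licence {licence!r} is not CC-BY or CC-BY-SA")
    if not DOI_PATTERN.match(doi):
        problems.append(f"{doi!r} does not look like a DOI")
    return problems


if __name__ == "__main__":
    # Hypothetical DOI, used only to exercise the check.
    print(publication_problems("CC-BY-SA", "10.1234/mod.2014.001"))
```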


6. The primary goal of the preservation policy is to ensure both long-term preservation and the highest level of authenticity.

Long-term preservation is related to the elimination of software dependence, but eliminating software dependence means sacrificing structure, so the end products of these transformations are not authentic versions of the original.

For example, an RDBMS could be migrated to a number of flat files. However, for each table, the relational links need to be expressed in documentation, in terms of the type of link and the manner in which the keys can be identified.

Authenticity then needs to be re-established by documenting the actions taken and validating that the substantive content has not been altered.
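The RDBMS-to-flat-file example can be sketched with Python's built-in `sqlite3` module standing in for a real database. The two-table excavation schema is hypothetical. The sketch exports each table to CSV, writes down the keys and foreign-key links that CSV can no longer express (read from SQLite's `PRAGMA table_info` and `PRAGMA foreign_key_list`), and logs a value-by-value comparison showing that the substantive content survived the migration.

```python
import csv
import io
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """A hypothetical two-table excavation database standing in for a real RDBMS."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE context (id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE find (
            id INTEGER PRIMARY KEY,
            context_id INTEGER REFERENCES context(id),
            material TEXT
        );
        INSERT INTO context VALUES (1, 'pit fill'), (2, 'wall foundation');
        INSERT INTO find VALUES (10, 1, 'pottery'), (11, 1, 'bone'), (12, 2, 'coin');
    """)
    return con


def as_text(value) -> str:
    # CSV has no types: every value becomes text and NULL becomes an empty field.
    return "" if value is None else str(value)


def export_table(con: sqlite3.Connection, table: str) -> str:
    cur = con.execute(f"SELECT * FROM {table} ORDER BY rowid")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows([as_text(v) for v in row] for row in cur)
    return buf.getvalue()


def describe_keys(con: sqlite3.Connection, table: str) -> list[str]:
    # Record the relational structure that the flat files can no longer express.
    pk = [r[1] for r in con.execute(f"PRAGMA table_info({table})") if r[5] > 0]
    notes = [f"{table}: primary key {', '.join(pk)}"]
    for fk in con.execute(f"PRAGMA foreign_key_list({table})"):
        # fk = (id, seq, referenced table, from column, to column, ...)
        notes.append(f"{table}.{fk[3]} references {fk[2]}.{fk[4]} (foreign key)")
    return notes


def content_unchanged(con: sqlite3.Connection, table: str, csv_text: str) -> bool:
    # Validation: the CSV rows must equal the source rows, value by value.
    exported = list(csv.reader(io.StringIO(csv_text)))[1:]
    source = [[as_text(v) for v in row]
              for row in con.execute(f"SELECT * FROM {table} ORDER BY rowid")]
    return exported == source


def migrate(con: sqlite3.Connection):
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    files, docs, log = {}, [], []
    for t in tables:
        files[f"{t}.csv"] = export_table(con, t)
        docs.extend(describe_keys(con, t))
        ok = content_unchanged(con, t, files[f"{t}.csv"])
        log.append(f"exported {t} to {t}.csv; content check {'passed' if ok else 'FAILED'}")
    return files, docs, log


if __name__ == "__main__":
    files, docs, log = migrate(build_demo_db())
    print(files["find.csv"])
    print("\n".join(docs + log))
```

The documentation lines and the log are what let a later user rebuild the links between `find.csv` and `context.csv` and trust that the values are those of the original database.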


• normalization, i.e. migration to widely supported open standards;

• version migration, i.e. migration through successive versions of a format; in many cases it is the only option for preserving proprietary formats that do not migrate to open standards (this is practical where the software using the proprietary format is widely used within a research community);

• format migration, i.e. migration to other formats for dissemination;
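Two of these strategies can be illustrated with the standard library alone; version migration depends on the specific proprietary software and is not sketched. As one possible normalization step (an example chosen here, not taken from the slides), legacy text in the Windows cp1252 encoding is re-encoded as UTF-8. Format migration for dissemination is shown by deriving a JSON copy from the archival CSV master, which itself stays untouched.

```python
import csv
import io
import json


def normalise_encoding(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode legacy text as UTF-8; decoding is strict, so bad bytes fail loudly."""
    return raw.decode(source_encoding).encode("utf-8")


def csv_to_json(csv_text: str) -> str:
    """Derive a JSON copy for dissemination; the CSV master is not modified."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    legacy = "site,period\nPisa,età romana\n".encode("cp1252")  # hypothetical legacy file
    master = normalise_encoding(legacy)          # archival copy in UTF-8
    print(csv_to_json(master.decode("utf-8")))   # dissemination copy
```

Keeping the normalized CSV as the single master and generating dissemination formats from it means a new output format never puts the preserved copy at risk.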


File formats by data type (best solution / acceptable solution):
• Tabular data: CSV, TAB, TXT / XLS, MDB/ACCDB, DBF, ODS
• Textual data: XML, RTF, TXT / HTML, DOC, ODF
• Documentation: RTF, HTML, ODT / DOC, PDF
• Geospatial data: SHP, GEOTIFF, DXF / MDB, KML
• Image data: TIF, JPEG, TIFF, RAW


• media refreshment, i.e. migration between media that leaves the data unchanged. Refreshment is the process of transferring data from one type of storage medium to another to ensure continued access to the information, without altering the format of the data.
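Because refreshment must leave the data unchanged, a copy to a new medium is normally paired with a fixity check. This sketch assumes the new medium is mounted as a directory and uses SHA-256 digests (a common choice, not one stated in the slides) to confirm that the copied file is bit-identical to the original.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    # Hash in chunks so large files do not need to fit in memory.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh(src: Path, new_medium: Path) -> Path:
    """Copy src to a directory on the new medium and verify it is bit-identical."""
    new_medium.mkdir(parents=True, exist_ok=True)
    dst = new_medium / src.name
    before = sha256(src)
    shutil.copy2(src, dst)  # copies content and basic file metadata
    if sha256(dst) != before:
        raise OSError(f"fixity check failed for {src.name}")
    return dst


if __name__ == "__main__":
    # Two temporary directories stand in for the old and the new storage medium.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        src = Path(old) / "finds.csv"
        src.write_bytes(b"id,material\n10,pottery\n")
        dst = refresh(src, Path(new) / "volume2")
        print(dst.name, sha256(dst))
```

Storing the digest alongside the file also allows later audits to detect silent corruption between refreshments.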

Thank you

More info
MAPPA Lab: [email protected], @MappaProject, http://www.mappaproject.org
Gabriele Gattiglia: [email protected], @g_gattiglia, http://pisa.academia.edu/GabrieleGattiglia
Francesca Anichini: [email protected], @FrAnichini, https://pisa.academia.edu/FrancescaAnichini
