Toward a long term data preservation strategy and interoperability at the French National Institute for Preventive Archaeological Research (Inrap) Federico Nurra Service activités internationales, Direction Scientifique et Technique, INRAP ARIADNE Winter School, Prato, 12-15 December 2016 - Legacy datasets and their inclusion in the ARIADNE Registry


Page 1: Federico Nurra - Toward a long term data preservation strategy and interoperability at the French National Institute for Preventive Archaeological Research

Toward a long term data preservation strategy and interoperability at the French National Institute for Preventive Archaeological Research (Inrap)

Federico Nurra

Service activités internationales, Direction Scientifique et Technique, INRAP

ARIADNE Winter School, Prato, 12-15 December 2016 - Legacy datasets and their inclusion in the ARIADNE Registry

Page 2

The French Institute for Preventive Archaeology

Organisation

• 8 regional headquarters

• About 50 archaeological centres

• About 2,000 archaeologists

• About 2,300 operations a year (85% trial trenching, 15% excavations)

• ~45,000 archaeological operations

Page 3

Starting point

Dolia:

• Inrap’s catalogue of its documentary collection / digital library

• ~28,500 reports

Page 4

Mapping Dolia (UNIMARC) - ACDM

Dolia field                          UNIMARC                  ACDM
Title (Titre)                        UNIMARC 200 [$a + $e]    dcterms:title
Description                          UNIMARC 330 [$a]         dcterms:description
Date                                 UNIMARC 210 [$d]         dcterms:issued
Keywords (Mots Clés)                 UNIMARC 610 [$a]         dcat:keyword
Language (Langue)                    UNIMARC 101 [$a]         dcterms:language
Chronology (Chronologie)             UNIMARC 634 [$5]         acdm:temporal
…                                    …                        …
Scientific lead (Responsable Sc.)    UNIMARC 700 [$a]         acdm:scientificResponsible
Subject (Sujet)                      UNIMARC 606 [$5]         acdm:nativeSubject
…                                    …                        …
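The field mapping above can be illustrated with a minimal Python sketch. It is not the tooling used at Inrap: the record layout (a dict of UNIMARC field numbers, each holding a list of subfield dicts), the function name `map_record`, the " : " separator used to join $a and $e, and the sample record are all assumptions made for illustration. Only the field-to-property pairs come from the slide, and the elided rows (…) are left out.

```python
# Field-to-property pairs copied from the mapping table; elided rows are omitted.
UNIMARC_TO_ACDM = [
    ("200", ("a", "e"), "dcterms:title"),
    ("330", ("a",), "dcterms:description"),
    ("210", ("d",), "dcterms:issued"),
    ("610", ("a",), "dcat:keyword"),
    ("101", ("a",), "dcterms:language"),
    ("634", ("5",), "acdm:temporal"),
    ("700", ("a",), "acdm:scientificResponsible"),
    ("606", ("5",), "acdm:nativeSubject"),
]


def map_record(record):
    """Map a UNIMARC-like record to ACDM properties.

    `record` uses a hypothetical layout: {field: [{subfield: value}, ...]}.
    Repeated fields produce one output value per occurrence.
    """
    mapped = {}
    for field, subfields, prop in UNIMARC_TO_ACDM:
        for occurrence in record.get(field, []):
            parts = [occurrence[code] for code in subfields if code in occurrence]
            if parts:
                # Joining $a and $e with " : " is an assumption, not from the slide.
                mapped.setdefault(prop, []).append(" : ".join(parts))
    return mapped


# Invented sample record, for illustration only.
sample = {
    "200": [{"a": "Rapport de diagnostic", "e": "Orléans"}],
    "610": [{"a": "fossé"}, {"a": "silo"}],
    "101": [{"a": "fre"}],
}
print(map_record(sample))
```

Keeping the mapping as a data table rather than as code branches means the elided rows could be added later without touching the function.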

Page 5

Thesaurus Mapping (Pactols-AAT)

Original Concept (Pactols) → Exact Match / Close Match / Broad Match / Narrow Match [SKOS] → ARIADNE Concept (AAT)

Results:

• 1,814 concepts (out of 5,149)

• 1,282 Exact (70.7%)

• 132 Close (7.5%)

• 389 Broad (21.4%)

• 7 Narrow (0.4%)

Pactols: http://pactols.frantiq.fr/opentheso/index.xhtml

AAT: http://vocab.getty.edu/

Page 6

Thesaurus Mapping (Pactols-AAT)

sourceLabel SourceURI matchURI targetLabel TargetURI

abandon de lieu http://ark.frantiq.fr/ark:/26678/pcrt9Xvh4RiNS5 skos:closeMatch emigration http://vocab.getty.edu/aat/300055406

abbatiale http://ark.frantiq.fr/ark:/26678/pcrt8icSwjOM7u skos:exactMatch abbey churches http://vocab.getty.edu/aat/300007495

abbaye http://ark.frantiq.fr/ark:/26678/pcrt8icSwjOM7u skos:exactMatch abbeys (monasteries) http://vocab.getty.edu/aat/300000642

abreuvoir http://ark.frantiq.fr/ark:/26678/pcrtBpaSMOlWvz skos:exactMatch troughs (containers) http://vocab.getty.edu/aat/300220971

abri sous roche http://ark.frantiq.fr/ark:/26678/pcrtkeLpJpVN4t skos:exactMatch rock shelters http://vocab.getty.edu/aat/300007699

abside http://ark.frantiq.fr/ark:/26678/pcrtJ71ZuyuOEq skos:exactMatch apses http://vocab.getty.edu/aat/300004607

acier http://ark.frantiq.fr/ark:/26678/pcrt3KtdnXPGUg skos:exactMatch steel (alloy) http://vocab.getty.edu/aat/300133751

acquisition de ressource naturelle http://ark.frantiq.fr/ark:/26678/pcrts8SiTTY3Ka skos:closeMatch extracting complexes http://vocab.getty.edu/aat/300000388

acquisition des données http://ark.frantiq.fr/ark:/26678/pcrtIq8AvOPlPV skos:exactMatch recording http://vocab.getty.edu/aat/300077610

acrotère http://ark.frantiq.fr/ark:/26678/pcrtTQjJbnRU7S skos:exactMatch acroteria http://vocab.getty.edu/aat/300002214

activités commerciales http://ark.frantiq.fr/ark:/26678/pcrtMNiNZMViWi skos:broadMatch trade (function) http://vocab.getty.edu/aat/300061886

administrations http://ark.frantiq.fr/ark:/26678/pcrtF4g7aXEn6l skos:exactMatch public administration http://vocab.getty.edu/aat/300254069

ADN http://ark.frantiq.fr/ark:/26678/pcrtDiedrkmaGg skos:exactMatch DNA http://vocab.getty.edu/aat/300379678

adobe http://ark.frantiq.fr/ark:/26678/pcrthELuSjXqGH skos:exactMatch adobe (material) http://vocab.getty.edu/aat/300081138

adolescent http://ark.frantiq.fr/ark:/26678/pcrtrg5RZoroR7 skos:exactMatch adolescents http://vocab.getty.edu/aat/300163862

adulte http://ark.frantiq.fr/ark:/26678/pcrtf4z1Qe2N4U skos:exactMatch adults http://vocab.getty.edu/aat/300154397

AFC http://ark.frantiq.fr/ark:/26678/pcrtu6MrQiPdBO skos:exactMatch factor analysis http://vocab.getty.edu/aat/300379464

agger http://ark.frantiq.fr/ark:/26678/pcrtKMaTkYqTSh skos:broadMatch embankments http://vocab.getty.edu/aat/300008023

agglomération secondaire http://ark.frantiq.fr/ark:/26678/pcrt1gsiMfFUP0 skos:broadMatch agglomerations http://vocab.getty.edu/aat/300008400

agglomération urbaine http://ark.frantiq.fr/ark:/26678/pcrtN5zGcqx0YR skos:exactMatch towns http://vocab.getty.edu/aat/300008375
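Rows like the ones above could be turned into SKOS statements along the following lines. This is a sketch under assumptions, not the procedure used for the ARIADNE mapping: the whitespace-separated row layout is inferred from the table, the N-Triples output is one possible serialisation chosen for illustration, and the names `ROW_PATTERN`, `parse_row` and `to_ntriple` are hypothetical.

```python
import re

# Namespace of the SKOS vocabulary (W3C).
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumed row layout: label, source URI, skos:*Match, label, target URI.
ROW_PATTERN = re.compile(
    r"^(?P<source_label>.+?)\s+(?P<source_uri>http\S+)\s+"
    r"skos:(?P<match>\w+Match)\s+"
    r"(?P<target_label>.+?)\s+(?P<target_uri>http\S+)$"
)


def parse_row(row):
    """Split one mapping row into its five columns, or return None."""
    found = ROW_PATTERN.match(row.strip())
    return found.groupdict() if found else None


def to_ntriple(mapping):
    """Serialise a parsed row as a single N-Triples statement."""
    return "<{}> <{}{}> <{}> .".format(
        mapping["source_uri"], SKOS, mapping["match"], mapping["target_uri"]
    )


row = (
    "abri sous roche http://ark.frantiq.fr/ark:/26678/pcrtkeLpJpVN4t "
    "skos:exactMatch rock shelters http://vocab.getty.edu/aat/300007699"
)
parsed = parse_row(row)
print(parsed["source_label"], "->", parsed["target_label"])
print(to_ntriple(parsed))
```

Anchoring the pattern on the two URIs and the `skos:` predicate is what lets multi-word labels such as "abri sous roche" be recovered without a column delimiter.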

Page 7

The French Institute for Preventive Archaeology

Total: 153,851

Match     Count      %
exact     110,824    72.03
close     9,212      5.99
broad     33,606     21.84
narrow    209        0.14

No subject: 5,898
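The shares on this slide can be recomputed from the four counts, which also confirms that they add up to the stated total. The short sketch below only restates the slide's arithmetic; the function name `match_shares` is hypothetical.

```python
def match_shares(counts):
    """Return the total and each match type's share in percent (2 decimals)."""
    total = sum(counts.values())
    shares = {kind: round(100 * n / total, 2) for kind, n in counts.items()}
    return total, shares


# Counts as given on the slide.
counts = {"exact": 110_824, "close": 9_212, "broad": 33_606, "narrow": 209}
total, shares = match_shares(counts)
print(total)
print(shares)
```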

Page 8

Geocoding

121, Rue d’Alésia, 75014 Paris → API Géoportail IGN (Géocodage) + API BAN → Lng: 2.323169, Lat: 48.829001

Results:

• 28,357 in total

• 3,636 exact points (12.8%)

• 8,327 ~ street (29.4%)

• 16,267 ~ town (57.4%)

• 127 no geolocation (0.4%)
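The precision classes in the results could be derived from a geocoder response roughly as sketched below. This is an illustration under several assumptions: the response is taken to be a GeoJSON FeatureCollection whose features carry `properties.type` values such as `housenumber`, `street`, `locality` or `municipality` and a `properties.score`, which is how the BAN address API is understood to answer; the mapping of those types onto the slide's categories is a guess; and the sample response is hand-written (only the coordinates come from the slide). No network call is made.

```python
def classify_precision(feature_collection):
    """Return (precision, lon, lat) for the best-scoring geocoding candidate."""
    features = feature_collection.get("features", [])
    if not features:
        return ("No geolocation", None, None)
    # Keep the candidate with the highest score (assumed field name).
    best = max(features, key=lambda f: f["properties"].get("score", 0.0))
    lon, lat = best["geometry"]["coordinates"][:2]
    kind = best["properties"].get("type")
    # Assumed mapping of result types onto the slide's categories.
    precision = {"housenumber": "Exact point", "street": "Street"}.get(kind, "Town")
    return (precision, lon, lat)


# Hand-written sample response; the score value is invented.
sample_response = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [2.323169, 48.829001]},
            "properties": {"type": "housenumber", "score": 0.9},
        }
    ],
}
print(classify_precision(sample_response))
print(classify_precision({"type": "FeatureCollection", "features": []}))
```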

Page 9

Geocoding

[Figures: Heatmap; Clustermap]

Page 10

Chronological Mapping

Period (Pactols)                        Earliest start    Latest stop
…                                       …                 …
Néolithique (Neolithic)                 -6000             -2201
Néolithique ancien (Early Neolithic)    -6000             -5301
Néolithique moyen (Middle Neolithic)    -5300             -4501
Néolithique récent (Late Neolithic)     -4500             -2201
…                                       …                 …
Protohistoire (Protohistory)            -2200             -51
Âge du Bronze (Bronze Age)              -2200             -801
Bronze ancien (Early Bronze Age)        -2200             -1601
…                                       …                 …

Results:

• 124 chronological concepts

• LOD on http://perio.do/
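The period table lends itself to a simple lookup structure. The sketch below holds only the rows shown on the slide (elided rows are not filled in) and assumes that negative years denote years BCE and that both bounds are inclusive; the names `PERIODS`, `periods_for_year` and `is_contiguous` are hypothetical, and this is not the PeriodO data model.

```python
# Rows copied from the slide: Pactols label -> (earliest start, latest stop).
PERIODS = {
    "Néolithique": (-6000, -2201),
    "Néolithique ancien": (-6000, -5301),
    "Néolithique moyen": (-5300, -4501),
    "Néolithique récent": (-4500, -2201),
    "Protohistoire": (-2200, -51),
    "Âge du Bronze": (-2200, -801),
    "Bronze ancien": (-2200, -1601),
}


def periods_for_year(year):
    """List every period whose inclusive range contains the given year."""
    return [name for name, (start, stop) in PERIODS.items() if start <= year <= stop]


def is_contiguous(names):
    """Check that the named periods follow each other without gap or overlap."""
    spans = sorted(PERIODS[name] for name in names)
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(spans, spans[1:]))


print(periods_for_year(-5000))
print(is_contiguous(["Néolithique ancien", "Néolithique moyen", "Néolithique récent"]))
```

Because each sub-period ends one year before the next begins (for example -5301 and -5300), a year never falls into two sibling periods at once.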

Page 11

Chronological Mapping

MED: ~2,200 reports

NP: ~3,200 reports

Page 12

The ARIADNE Portal

• URI Ark - Dolia

• Subject AAT (LOD)

• Geonames (LOD)

• PeriodO (LOD)

Page 13

All that glitters is not gold…

National Preservation Infrastructures

Page 14

A case study

Page 15

A case study

Page 16

A case study

Page 17

A case study

Page 18

A case study

• 11.6 km²

• 7 «prescriptions» issued by the regional State services (Emprises)

• 80 evaluation trenches (Ouvertures)

• 532 archaeological remains (Unités d’Observation)

• Archaeological investigations from 2001 to 2015

• 11.6 GB of legacy data (it’s not a joke!)

Page 19

Questions

• How to treat the legacy data?

• How to manage the 60 different recording systems? What standards to use?

• How to integrate artifacts?

• How to open the process to the public? (define the users)

Page 20

Proposed Solutions

ARIADNE Summer School, Athens, 12-17 June 2016 - Digital curation of archaeological knowledge

• Collect all the legacy data

• Find a formal management system

• Go for basic preservation of digital content and registration

• Reach the researchers’ community

• Follow a multi-structured approach for the next 5-6 years

• Organise awareness workshops

• Use the example of legacy data or the ARIADNE portal to convince the community of the need

• Develop a paradata journal / blog

Page 21

Pain Point(s)

Page 22

Solution(s)

Unique ID (e.g. INSEE code + operation number)
    1_DOCUMENTATION
        ADMINISTRATION
            FILE_1.docx
            FILE_2.xlsx
            FILE_N.xxx
    2_GIS_DATA
    3_DATA
    N_ETC…

Page 23

Minimum Viable Product

• 2001_45039_015AH

• 2007_45039_017AH

• 2008_45039_018AH

• 2009_45039_019AH

• 2012_45039_020AH

• 2013_45039_021AH

• 2014_45039_022AH
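Identifiers of this shape can be split into their parts and used to name a per-operation folder. The sketch below is an assumption-laden illustration: reading the three underscore-separated fields as year, INSEE commune code and operation number is inferred from the unique-ID note on the Solution(s) slide, the pattern is only as strict as these seven examples require, and the names `parse_operation_id` and `folder_skeleton` are hypothetical. The folder names are the numbered ones shown on that slide; the open-ended `N_ETC…` entry is not expanded.

```python
import re
from pathlib import PurePosixPath

# Assumed structure: <year>_<INSEE commune code>_<operation number>.
ID_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<insee>(?:\d{2}|2[AB])\d{3})_(?P<operation>\w+)$"
)

# Numbered top-level folders named on the Solution(s) slide.
TOP_LEVEL_FOLDERS = ("1_DOCUMENTATION", "2_GIS_DATA", "3_DATA")


def parse_operation_id(identifier):
    """Split an identifier into year, INSEE code and operation number."""
    found = ID_PATTERN.match(identifier)
    if found is None:
        raise ValueError(f"unexpected identifier format: {identifier!r}")
    return {
        "year": int(found["year"]),
        "insee": found["insee"],
        "operation": found["operation"],
    }


def folder_skeleton(identifier):
    """Return the folder paths for one operation as POSIX-style strings."""
    parse_operation_id(identifier)  # validate before building paths
    root = PurePosixPath(identifier)
    return [str(root / name) for name in TOP_LEVEL_FOLDERS]


print(parse_operation_id("2001_45039_015AH"))
print(folder_skeleton("2001_45039_015AH"))
```

Validating the identifier before creating any folder keeps malformed names out of the archive from the start.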


Page 24

Needed Activities

• Short Term Strategy (a couple of months)

    • Case study (7 archaeological operations)

    • ARIADNE Portal

• Medium Term Strategy (one year)

    • Specific training for archaeologists

    • Definition of «Best Practices»

    • Basic curation approach

• Long Term Strategy (5-6 years)

    • Collect all data (almost 40,000 archaeological interventions)

    • Automatic ingest of new data (almost 2,300 archaeological interventions per year)

    • Store and publish all data in a specific repository

    • Curation strategy for long-term preservation

Page 25

Q&A

Federico Nurra
