„Digital Humanities an der ÖAW“ “the right toolbox” - acdh data services full stack - Matej Ďurčo, lunch time lecture, ZIM-ACDH, Graz - 2016-12-20


„Digital Humanities an der ÖAW“

“the right toolbox”

acdh data services full stack

Matej Ďurčo

lunch time lecture, ZIM-ACDH, Graz - 2016-12-20


ACDH-OEAW - The New Centre

▸2014: ACDH Project; Planning & Preparation

▸1 January 2015: official foundation as 29th institute of the Academy

▸April 2015: operational

▸Team expansion June 2015 (100 applications, 40 interviews, ~20 new colleagues)

▸Current number of staff: ~50


Organisation

4 Working groups

▸Networks & Outreach

▸Tools, Service, Systems

▸Data, Resources and Standards

▸eLexicography

+ 1 Department: Variation und Wandel des Deutschen in Österreich (Variation and Change of German in Austria)

+ 3 directors: Charly, Georg, Tara


Position inside Academy


The Four Pillars

▸DH research

▸Technical services & expertise

▸Social Infrastructure

▸Knowledge Transfer


AG-1 Networks & Outreach

▸Knowledge transfer

▹ ACDH Lectures, DH Tool Gallery, DHA-Days

▸ESFRI liaison

▹ CLARIN, DARIAH, PARTHENOS, dariahTeach

▹ CLARIAH-AT - coordination, platform digital-humanities.at, DHA-Days


AG-2 Tools, Services & Systems

Provide technical infrastructure

▸Provision of services (tools, applications, …)

▸Servers/services completely dockerized

▸Develop/adopt software: “beg, steal & borrow” approach

▹36 projects on GitHub + dozens on redmine-git

▸Storage:

▹MySQL, PostgreSQL, MSSQL

▹Triple Store: Blazegraph, Virtuoso

▹XMLDB: eXist, BaseX

▸Maintain the systems

▹Run servers (in cooperation with ARZ – Academy’s computing centre)

▹Virtualisation

▹Databases

▹Low-level data management (backups, access)

▹AAI (SP)

▹Monitoring (icinga, piwik)


AG-3 Data, Resources, Standards

▸Consulting

▹ Consultancy for drafting project proposals

▹ Data Management Plans

▸Data Modelling

▹ TEI/XML, Relational DBs, RDF/LOD

▹ Mapping/Transformation/Conversion

▸Data/Knowledge management

▹ Reference resources, Vocabularies

▸Content Curation

▹ Annotation

▸Metadata Curation

▸Digital Editions

▸Digitisation

AG-2 and AG-3 tightly intertwined


ACDH projects, resources, services

http://www.oeaw.ac.at/acdh/projects

www.oeaw.ac.at/acdh/resources

https://github.com/acdh-oeaw


acdh data services

the right toolbox


Reference workflow

Wissik / Durco 2016


Working Areas / Fachbereiche

▸NLP-pipeline / Corpus Search / Analysis

▸Semantic Technologies & Knowledge Management (“from data to knowledge”)

▸Repositories & Data Management (“save the data”)

▸Online-Publication & Digital Editions

▸Visualization (esp. networks/graphs)

▸Geocoding / Geovisualization

▸Virtual Research Environments (or “collaborative spaces”)

▸Digitisation


online publication

corpus_shell


Online publication - corpus_shell

▸Modular framework for publishing a wide range of language resources, designed to operate in a distributed and heterogeneous environment

▸Distributed setup based on FCS

▸Integration with CLARIN CMDI

▸Open source

▸In ~4 flavours:

▹ XQuery/eXist - cr-xq-mets

▹ PHP

▹ Perl (ddc wrapper)

▹ Python (sketch engine wrapper)

▸+ cs-xsl - a generic XSL suite
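
FCS (CLARIN Federated Content Search) builds on the SRU protocol with CQL queries, so a client request to an endpoint of any flavour comes down to an HTTP GET with a few standard parameters. A minimal sketch, assuming SRU version 1.2 and a hypothetical endpoint address; `FCS_ENDPOINT` and `build_search_url` are illustrative names and not part of corpus_shell:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the address of a real FCS/SRU service.
FCS_ENDPOINT = "https://example.org/fcs"


def build_search_url(endpoint: str, cql_query: str, start: int = 1, maximum: int = 10) -> str:
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation name
        "version": "1.2",               # assumed protocol version
        "query": cql_query,             # CQL query string
        "startRecord": start,           # 1-based offset of the first hit
        "maximumRecords": maximum,      # page size
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_search_url(FCS_ENDPOINT, '"Haus"')
print(url)
```

Because every flavour would answer the same request shape, an aggregator can send one query to many endpoints and merge the results.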


ABaC:us

https://abacus.acdh.oeaw.ac.at/


https://mecmua.acdh.oeaw.ac.at/


https://baedeker.acdh.oeaw.ac.at/


https://tunico.acdh.oeaw.ac.at/


Lexical & Semantic services


Lexical & Semantic services

▸Motivation

▹Extract structured information from unstructured data (text); entails named entity recognition

▹Enrich/link with external semantic reference resources (entity linking)

▹Collaboratively maintain own reference resources (vocabularies/thesauri)

▹=> interlink heterogeneous data from different projects via reference resources

▸Requirements

▹Flexible selection of reference resource to match against

▹Customisable matching algorithm

▹Combination of automatic and manual processing

▹Programmatic access to the external resources (API) & user interface

▹Publish as LOD
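
The first three requirements (selectable reference resource, customisable matching, automatic plus manual processing) could be sketched as follows. This is an illustration only: the resources, identifiers and thresholds are hypothetical, and the `difflib` similarity is a stand-in for whatever matching the actual services use.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Hypothetical mini reference resources: label -> identifier.
RESOURCES: Dict[str, Dict[str, str]] = {
    "places": {"Wien": "place:1", "Graz": "place:2", "Linz": "place:3"},
    "persons": {"Mozart, Wolfgang Amadeus": "person:1"},
}


def ratio(a: str, b: str) -> float:
    """Default similarity: difflib ratio on lower-cased strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(term: str, resource: str,
          similarity: Callable[[str, str], float] = ratio,
          accept: float = 0.9, review: float = 0.6) -> Tuple[str, List[Tuple[str, float]]]:
    """Return a status ('auto', 'manual' or 'none') and the ranked candidates."""
    scored = sorted(
        ((ident, similarity(term, label)) for label, ident in RESOURCES[resource].items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    candidates = [(ident, score) for ident, score in scored if score >= review]
    if candidates and candidates[0][1] >= accept:
        return "auto", candidates[:1]   # confident enough to link automatically
    if candidates:
        return "manual", candidates     # hand over to a human curator
    return "none", []


print(match("Graz", "places"))
print(match("Gratz", "places"))
```

Passing a different `resource` key or `similarity` function changes the behaviour without touching the matching loop, which is the point of the requirements above.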


Lexical & Semantic services - Tools

▸ enrich (Apache Stanbol) - flexible framework for semantic content management

▸ reconcile - UI service for matching scientific plant names to common names, works on top of Common Name Service’s API

▸ OpenSKOS - vocabulary repository, editor, service

▸ RDF-Mapping tools (ontop, Karma, 3M)

▸ Triple store(s) – Virtuoso, Blazegraph
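
Vocabularies kept as SKOS and loaded into a triple store can be queried over the SPARQL protocol. A minimal sketch of a label lookup, assuming a hypothetical endpoint address; `SPARQL_ENDPOINT`, `concept_lookup_query` and `request_url` are illustrative names, and the real endpoint path depends on the deployment:

```python
from urllib.parse import urlencode

# Hypothetical SPARQL endpoint address (assumption, not from the slides).
SPARQL_ENDPOINT = "https://example.org/sparql"


def concept_lookup_query(label: str, lang: str = "en") -> str:
    """SPARQL query for SKOS concepts whose preferred label equals `label`."""
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept WHERE {\n"
        f'  ?concept skos:prefLabel "{escaped}"@{lang} .\n'
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """SPARQL protocol over GET: the query travels in the `query` parameter."""
    params = urlencode({"query": query})
    return f"{endpoint}?{params}"


query = concept_lookup_query("thesaurus")
print(query)
print(request_url(SPARQL_ENDPOINT, query))
```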


Enrich - Stanbol


http://enrich.acdh.oeaw.ac.at/
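
Apache Stanbol offers its Enhancer as a RESTful service: plain text is POSTed and the extracted entities come back as RDF. The sketch below only prepares such a request and does not send it. It assumes that the instance exposes Stanbol's default `/enhancer` path, which the slides do not confirm; `enhancer_request` is an illustrative helper name.

```python
from urllib.request import Request

# Assumption: the instance exposes Stanbol's default "/enhancer" path.
ENHANCER_URL = "http://enrich.acdh.oeaw.ac.at/enhancer"


def enhancer_request(text: str, url: str = ENHANCER_URL) -> Request:
    """Prepare (but do not send) a POST asking for enhancements as Turtle."""
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",  # raw text in
            "Accept": "text/turtle",                      # RDF out
        },
        method="POST",
    )


req = enhancer_request("Wolfgang Amadeus Mozart was born in Salzburg.")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req).read()
```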


Example: Reconcile

▸Look up one scientific name

▸ Import and reconcile a list of names

▸Get RDF file with response in EdmSKOS

▸Direct data post-processing online (filter, select, export)


http://reconcile.eos.arz.oeaw.ac.at/


Lexical & Semantic services - Data

External reference resources:

▸GND (Gemeinsame Normdatei, the Integrated Authority File): authority file for persons, corporate bodies, conferences and events, geographic information, topics and works

▸GeoNames - integrated geographical data source (contains names of places in various languages, coordinates, population, location types, etc.)

▸DBpedia - structured, machine-readable version of Wikipedia

▸Common names service - federated search of common names for a given scientific name

Local resources:

▸Dboe plants common names

▸BBT - DARIAH BackBone Thesaurus


Example: GeoNames
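
A lookup against the GeoNames JSON search service could be sketched as follows. The request needs a registered GeoNames username, so the example only builds the URL and parses an invented sample response; the sample values are placeholders, not real GeoNames data, and the helper names are illustrative.

```python
import json
from urllib.parse import urlencode


def geonames_search_url(place: str, username: str, max_rows: int = 5) -> str:
    """URL for the GeoNames JSON search service."""
    params = urlencode({"q": place, "maxRows": max_rows, "username": username})
    return f"http://api.geonames.org/searchJSON?{params}"


def candidates(response_text: str):
    """Extract (geonameId, name, country, lat, lng) tuples from a response."""
    payload = json.loads(response_text)
    return [
        (g["geonameId"], g["name"], g.get("countryName"), float(g["lat"]), float(g["lng"]))
        for g in payload.get("geonames", [])
    ]


# Invented sample in the shape of a GeoNames response (placeholder values).
SAMPLE = (
    '{"totalResultsCount": 1, "geonames": ['
    '{"geonameId": 1, "name": "Graz", "countryName": "Austria",'
    ' "lat": "47.07", "lng": "15.44"}]}'
)

print(geonames_search_url("Graz", username="YOUR_USERNAME"))
print(candidates(SAMPLE))
```

Several candidates for one place name are the normal case, which is why the workflow includes a disambiguation step.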


Lexical & Semantic services - Workflow

▸Unstructured Text -> NER + Entity Linking -> Curation/Disambiguation -> Publish as LOD

▸Example of specific workflow in APIS project:
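
Independently of the APIS workflow, the generic four-step pipeline from the first bullet can be sketched with toy components. Everything here is hypothetical: the gazetteer, the `example.org` URIs, the gazetteer-based "NER", and the choice of `dcterms:references` as linking property are assumptions for illustration.

```python
import re
from typing import Dict, List

# Hypothetical reference resource: surface form -> candidate URIs.
GAZETTEER: Dict[str, List[str]] = {
    "Graz": ["http://example.org/place/graz"],
    "Vienna": ["http://example.org/place/vienna-at", "http://example.org/place/vienna-us"],
}


def recognise(text: str) -> List[str]:
    """Toy NER: capitalised tokens that occur in the gazetteer."""
    return [tok for tok in re.findall(r"\b[A-Z]\w+", text) if tok in GAZETTEER]


def link(mentions: List[str]) -> Dict[str, List[str]]:
    """Entity linking: attach all candidate URIs to each mention."""
    return {m: GAZETTEER[m] for m in mentions}


def curate(linked: Dict[str, List[str]], decisions: Dict[str, str]) -> Dict[str, str]:
    """Curation: keep unambiguous links, resolve the rest from manual decisions."""
    resolved = {}
    for mention, uris in linked.items():
        if len(uris) == 1:
            resolved[mention] = uris[0]
        elif mention in decisions:
            resolved[mention] = decisions[mention]
    return resolved


def publish(doc_uri: str, resolved: Dict[str, str]) -> List[str]:
    """Publish as LOD: one N-Triples statement per resolved entity."""
    prop = "http://purl.org/dc/terms/references"
    return [f"<{doc_uri}> <{prop}> <{uri}> ." for uri in resolved.values()]


text = "The lecture in Graz was given by a colleague from Vienna."
linked = link(recognise(text))
resolved = curate(linked, {"Vienna": "http://example.org/place/vienna-at"})
triples = publish("http://example.org/doc/1", resolved)
print("\n".join(triples))
```

Ambiguous mentions without a manual decision are simply left out of the published data, so nothing unverified reaches the LOD output.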


Lexical & Semantic services

Challenges & Future work

Challenges:

▸Disambiguation

▸False matches (precision)

▸Missing matches (recall)
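
Precision and recall can be measured by comparing automatic links with a manually curated gold standard. A minimal sketch with invented (mention, identifier) pairs; the data and the function name are hypothetical:

```python
from typing import Set, Tuple

Link = Tuple[str, str]  # (mention, identifier)


def precision_recall(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float]:
    """Precision = correct / predicted, recall = correct / gold."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example: one false match (Linz) and one missing match (Steyr).
predicted = {("Graz", "p2"), ("Wien", "p1"), ("Linz", "p9")}
gold = {("Graz", "p2"), ("Wien", "p1"), ("Linz", "p3"), ("Steyr", "p4")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of three predicted links are correct (precision 2/3) and two of four gold links were found (recall 1/2).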

Plans:

▸Establish the feedback loop from manual post-processing to automatic processing

▸Integrated Portal for Lexical & Semantic Services

▸Make the resources available via Triple Store


GraphViewer

graphviewer.acdh.oeaw.ac.at


Web application(s): analyze/visualize


APIS – Persons-Places Network
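
A persons-places network is bipartite: every edge connects a person with a place. One common way to analyse such data is to project it onto persons, linking two persons whenever they share a place. A minimal sketch with invented relations (not APIS data); the function names are illustrative:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical person-place relations (not APIS data).
RELATIONS = [
    ("person:A", "place:Graz"),
    ("person:B", "place:Graz"),
    ("person:B", "place:Wien"),
    ("person:C", "place:Wien"),
]


def by_place(relations):
    """Group persons by the place they are related to."""
    groups = defaultdict(set)
    for person, place in relations:
        groups[place].add(person)
    return groups


def co_location_edges(relations):
    """Project the bipartite network onto persons: edge = shared place(s)."""
    edges = defaultdict(set)
    for place, persons in by_place(relations).items():
        for a, b in combinations(sorted(persons), 2):
            edges[(a, b)].add(place)
    return dict(edges)


for (a, b), places in co_location_edges(RELATIONS).items():
    print(a, "--", b, "via", sorted(places))
```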


CLARIAH-AT

digital-humanities.at


CLARIAH-AT

▹ACDH-OEAW coordinating the national consortium for CLARIN and DARIAH

▹Focus on commonalities and synergies right from the start

▹Some 13 institutes from 6 institutions

▹Contributing in-kind

▹Engaging on EU-level

▹Knowledge Sharing/Exchange

▹http://digital-humanities.at/ as common „platform“


CLARIAH-AT - European Involvement

DARIAH

▸Co-head VCC1

▸Participation in NCC, JRC

▸Participation in 10 working groups

CLARIN

▸Involvement in the development of core infrastructures (CMDI, FCS)

▸Participation in all the bodies (NCF, SCCTC, TFs: CMDI, FCS, Curation, CCR …)

▸Running a CLARIN Centre

▸Hosting/Providing data

▸Developing/Providing tools


Research Infrastructures

Work of AG-2/AG-3 deeply embedded in & contributing to RIs

▸DARIAH (Working groups)

▹thesaurus maintenance - Backbone Thesaurus

▹guidelines and standards, ...

▸CLARIN

▹CLARIN Centres (CLARIN Centre Vienna)

▹Depositing service, NLP services

▹Metadata Catalogue - VLO

▸PARTHENOS

▹Cluster of Humanities and Cultural Heritage RIs

▹Common semantic framework

▹Architecture / Framework for service provisioning

▸EUDAT 2020

▹storage for research data


DHA website – re-relaunch

http://digital-humanities.at/

1 December 2016

▸Decoupled front end: Drupal as CMS delivering data via an API consumed by an AngularJS client

▸Expand to a comprehensive information resource (“Knowledge Hub”)

▹ Overview of DHA projects

▹ Bibliography (Zotero library): Reading, Resources & Services

▹ Glossary: working definitions of common terms (used also for content tagging)
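
Using glossary terms for content tagging could work roughly as sketched below: every glossary term that occurs in a piece of content becomes a tag. The glossary entries and their definitions are invented for illustration and are not taken from the DHA glossary; `tag_content` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Hypothetical glossary entries: term -> working definition.
GLOSSARY: Dict[str, str] = {
    "linked open data": "Openly licensed structured data that links to other data.",
    "metadata": "Data describing other data.",
}


def tag_content(text: str, glossary: Dict[str, str] = GLOSSARY) -> List[str]:
    """Return the glossary terms that occur in the text as whole words."""
    return sorted(
        term for term in glossary
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    )


print(tag_content("Metadata curation and Linked Open Data publishing"))
```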


Reference workflow

Wissik / Durco 2016


How can technical infrastructure foster innovation?

▸enable collaboration

▹ Federated Identity (use all services with your institutional account)

▹ Virtual Research Environments

▸enable sharing

▹ Open Access - make (raw) data available

▹ Open Source - make source code available under permissive licenses (https://github.com/acdh-oeaw)

▸exploit sharing / reuse

▹ loads of resources freely available (data, applications): “everything is there, just use it”

▹ but, on the other hand, there is always a need for customization


Innovation vs. reliability/stability

▸Innovation: develop new experimental tools, which may fail by definition

▸But innovation needs a reliable/stable environment

▸Therefore development must (also) be conservative: striving to achieve production-level quality of service (“apps that just work”)

Quality of:

▸hardware (processing/storage capacity, throughput, uptime)

▸software (performance, correctness / no bugs)

▸data (curation, provenance)

▸ergonomics (as people are used to from commercial applications)

▸“human factor” (response time, expertise, knowledge exchange)

▸reliable technical infrastructure / seamless technology


Challenges

▸Reuse

▹ Functionality (libraries, modules)

▹ Modular software, growing dependency stacks

▸Reproducibility

▸Quality of Service

▸Innovation vs. Reliability

▸Workflows

▹ In a heterogeneous, distributed, dynamic environment

▸Communication / Knowledge Management

▹ How not to drown in emails (and basecamp- and slack- and … messages)?


Where do we want to go?

We’re in the process of finding out (2017!)

▸Vertical integration - VREs

▹ Harmonized portfolio of high-quality services supporting the whole research process

▹ Bring together Resources & Services

▸Horizontal integration

▹ broader coverage (of resources & services)

▹ From different domains

▸Safe home for the data

▸LR -> DH

▸Semantix*


Thank you!

Questions?
