Data Modelling, Interoperability and Exploration for the Garzoni Project
Frédéric Kaplan, Giovanni Colavizza EPFL

Garzoni conference 11 October 2014


Page 1: Garzoni conference 11 October 2014

Data Modelling, Interoperability and Exploration for the Garzoni Project

Frédéric Kaplan, Giovanni Colavizza EPFL

Page 2: Garzoni conference 11 October 2014

Interoperability: the VTM quest

3 layers of interoperability:
• Data models
• Data formats
• Interfaces

Page 3: Garzoni conference 11 October 2014

Requirements for the Garzoni project

• Future interoperability with VTM systems
• Modularity at all levels (data model, formats and interfaces) during the project and beyond
• Portability (no “final” system, data migration)

Page 4: Garzoni conference 11 October 2014

Data model: our approach

Semi-structured model: no distinction between data and schema (how data is organised). The amount of structure depends on the purpose.

How? RDF statements: subject - predicate - object.

E.g. “Matio da Tomet (stampador al torcollo) 1582-10-30” [SUBJECT] has_folia [PREDICATE] “97r” [OBJECT]
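
The subject - predicate - object pattern can be sketched in a few lines of Python. This is a minimal illustration using plain tuples rather than an RDF library; the `Triple` type and the `objects_of` helper are hypothetical names, and only the `has_folia` statement comes from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The contract identifier and the has_folia statement come from the slide.
CONTRACT = "Matio da Tomet (stampador al torcollo) 1582-10-30"
triples = [Triple(CONTRACT, "has_folia", "97r")]


def objects_of(store, subject, predicate):
    """Return every object asserted for a (subject, predicate) pair."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


print(objects_of(triples, CONTRACT, "has_folia"))
```

Because each statement is independent, adding a new kind of information only means appending triples with a new predicate; no schema has to change first.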

Page 5: Garzoni conference 11 October 2014

Data model

Temporal conditions

Financial conditions

Identified Person

Apprentice Masters Guarantors

Contract

Professions Parishes

Document level

Access level

Page 6: Garzoni conference 11 October 2014

Example of contents, the upper layer

• Id: name of apprentice, profession and date
• Registry
• Foliation
• Apprentice
• Master
• Guarantors
• Temporal conditions
• Financial conditions
• References to other documents (e.g. menzioni di contratto)

Page 7: Garzoni conference 11 October 2014

Technical solution

Semantic MediaWiki platform

Database (relational)

MediaWiki

Web interface for data entry

Web crawler

Intermediary layer: 1 page = 1 entity

everything is a page

RDF import/export

XML import/export

Page 8: Garzoni conference 11 October 2014

Technical solution

Semantic MediaWiki platform

Pros:
• Highly modular: changing the data model and the interface is (relatively) easy
• User management
• Versioning
• Totally open: web API, import/export in RDF and XML

Cons:
• The page is both a unit of content and a unit of visualisation
• Very limited for search and visualisation
• Slow

Therefore:
• Data already ported to other databases
• Need to build a proper search interface
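
Porting exported statements to another database is mostly a loading step. The sketch below assumes the export has already been parsed into (subject, predicate, object) tuples and loads them into an in-memory SQLite table. The table name `statements` and both function names are hypothetical, and SQLite is only a stand-in: the slides do not say which databases the data was ported to.

```python
import sqlite3


def load_triples(rows):
    """Load (subject, predicate, object) tuples into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)"
    )
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def find_subjects(conn, predicate, obj):
    """Return the subjects carrying the given predicate/object pair, sorted."""
    cur = conn.execute(
        "SELECT subject FROM statements "
        "WHERE predicate = ? AND object = ? ORDER BY subject",
        (predicate, obj),
    )
    return [row[0] for row in cur.fetchall()]


# Single statement taken from the data model slide.
rows = [("Matio da Tomet (stampador al torcollo) 1582-10-30", "has_folia", "97r")]
conn = load_triples(rows)
print(find_subjects(conn, "has_folia", "97r"))
```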

Page 9: Garzoni conference 11 October 2014

Data Exploration

Page 10: Garzoni conference 11 October 2014

Options

• Descriptive statistics
• Regression analysis
• Network analysis and visualisations

Page 11: Garzoni conference 11 October 2014

Most represented professions

[Figure: two bar charts.]

Chart 1:
• Paternostrer da vero: 41
• Tornidor: 42
• Diamanter: 54
• Arte dei colori: 61
• Marangon da casse da specchi: 63
• Indorador: 70
• Marangon da casse: 70
• Batioro: 80
• Librer: 85
• Stampador, Torcoler: 85
• Stampador, Compositor: 104
• Cuori d'oro: 122
• Dalle perle false: 127
• Tiraoro: 156
• Murer: 276
• Tagiapiera: 320
• Marzer: 343
• Marangon: 403
• Orese: 443
• Spechier: 853

Chart 2 labels: Intagiador; Pitor (24); Marangon da casse da specchi; Marangon da casse; Stampador, Torcoler; Indorador; Stampador, Compositor; Librer; Batioro; Cuori d’oro; Dalle perle false; Squerariol; Tiraoro; Murer; Tagiapiera; Marangon; Marzer; Orese; Spechier

Chart 2 values: 69 69 70 81 93 104 106 117 123 134 150 152 162 250 300 361 460 506 596 1032
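
A ranking like this is a plain frequency count over the profession field of each contract. The sketch below assumes one profession label per contract; the `most_represented` function and the toy input are hypothetical and are not the Garzoni data.

```python
from collections import Counter


def most_represented(professions, n):
    """Return the n most frequent profession labels with their counts."""
    return Counter(professions).most_common(n)


# Toy input: one label per contract (illustrative only, not real counts).
sample = ["Spechier", "Orese", "Spechier", "Marangon", "Spechier", "Orese"]
print(most_represented(sample, 2))
```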

Page 12: Garzoni conference 11 October 2014

Contracts by year

Page 13: Garzoni conference 11 October 2014

Women

• 7 women out of 4878 full-information contracts (17 overall)
• 43 among masters
• 256 among guarantors

Page 14: Garzoni conference 11 October 2014

Avg. salary per year and normalised number of contracts

[Figure: chart of mean salary in ducats (y-axis, 0 to 10) by year (x-axis, selected years from 1582 to 1664).]
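
A per-year summary of this kind can be sketched as a group-by over contract records. The slide does not say how the number of contracts was normalised, so dividing by the largest yearly count is an assumption; `yearly_summary` and the toy records are hypothetical.

```python
from collections import defaultdict


def yearly_summary(contracts):
    """Summarise (year, salary_in_ducats) pairs per year.

    Returns {year: (mean_salary, normalised_count)}. The count is divided
    by the largest yearly count, which is an assumed normalisation.
    """
    by_year = defaultdict(list)
    for year, salary in contracts:
        by_year[year].append(salary)
    max_count = max(len(values) for values in by_year.values())
    return {
        year: (sum(values) / len(values), len(values) / max_count)
        for year, values in sorted(by_year.items())
    }


# Toy records, not taken from the registers.
toy = [(1582, 4.0), (1582, 6.0), (1583, 5.0)]
print(yearly_summary(toy))
```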

Page 15: Garzoni conference 11 October 2014

Impact of print “industry”

[Figure: chart of mean salary in ducats (y-axis, 0 to 10) by year (x-axis, selected years from 1582 to 1664).]

Page 16: Garzoni conference 11 October 2014

Professions of the print “industry”

profession_code_strict                      Freq.   Percent     Cum.
Carter                                         15      3.44     3.44
Cartoler                                       27      6.19     9.63
Compositor                                      2      0.46    10.09
Compositor alla stampa                          6      1.38    11.47
Compositor, Intagiador da piere                 1      0.23    11.70
Dalli canoni, Ligador di libri                  1      0.23    11.93
Far libreti de spechi de miniar                 3      0.69    12.61
Getador da caratteri da stampa                 18      4.13    16.74
In tenir libri                                  1      0.23    16.97
Librer                                         85     19.50    36.47
Librer da carta bianca                         11      2.52    38.99
Librer da conti, Librer da carta bianca         4      0.92    39.91
Librer da libri a stampa                        2      0.46    40.37
Librer da libri in greco                        1      0.23    40.60
Librer, Getador da caratteri da stampa          1      0.23    40.83
Librer, Stampador                               2      0.46    41.28
Ligador da libri                                1      0.23    41.51
Miniador                                       24      5.50    47.02
Stampador                                      33      7.57    54.59
Stampador alle casse                            1      0.23    54.82
Stampador in carta                              1      0.23    55.05
Stampador, Compositor                         104     23.85    78.90
Stampador, Miniador                             3      0.69    79.59
Stampador, Stampador alle casse                 1      0.23    79.82
Stampador, Torcoler                            85     19.50    99.31
Stampator                                       2      0.46    99.77
Stampator al componer                           1      0.23   100.00
Total                                         436    100.00
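
The Percent and Cum. columns follow directly from the frequencies: each percentage is the frequency divided by the total of 436, and the cumulative column is the running sum. A minimal sketch, using the first three rows of the table; the function name `frequency_table` is hypothetical.

```python
def frequency_table(counts, total=None):
    """Return (label, freq, percent, cumulative_percent) rows.

    `total` defaults to the sum of the given frequencies; pass it
    explicitly when only part of a table is supplied.
    """
    if total is None:
        total = sum(freq for _, freq in counts)
    rows, running = [], 0
    for label, freq in counts:
        running += freq
        rows.append((
            label,
            freq,
            round(100 * freq / total, 2),
            round(100 * running / total, 2),
        ))
    return rows


# First three rows of the table, with the table's overall total of 436.
first_rows = [("Carter", 15), ("Cartoler", 27), ("Compositor", 2)]
for row in frequency_table(first_rows, total=436):
    print(row)
```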

Page 17: Garzoni conference 11 October 2014

Professions of the print “industry”

[Figure: bar chart with one bar per profession from Carter to Stampator al componer, in the same order as the frequency table; axis ticks from 0 to 40.]

Page 18: Garzoni conference 11 October 2014

The case of the press

[Figure: four density plots, two of a_age and two of annual_salary, for Stampador al componer and Stampador al torcolo.]

Page 19: Garzoni conference 11 October 2014

Regression analysis

What factors impact the salary globally?

• Quondam: a father who is still alive has a positive impact
• Age of apprentice: older apprentices get a better salary
• Date: moderate positive impact (inflation?)
• Length: longer contracts pay proportionally less
• Who pays the salary: very strong positive impact if it is not the master
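
The slides do not specify the regression model. As a minimal sketch of the idea, the code below fits ordinary least squares with a single predictor (salary against apprentice age) in plain Python. The function `fit_line` and the toy numbers are hypothetical and are not results from the Garzoni data; the analysis on the slide involves several factors at once.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-products of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (illustrative only): apprentice ages and annual salaries in ducats.
ages = [10, 12, 14, 16]
salaries = [2.0, 3.0, 4.0, 5.0]
slope, intercept = fit_line(ages, salaries)
print(slope, intercept)
```

A positive slope here corresponds to the "older apprentices get a better salary" reading; with several factors, the same idea extends to multiple regression.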

Page 20: Garzoni conference 11 October 2014

Network analysis: connected components

2 networks:
• Family relationships (mostly father-son)
• Contract relationships (master, apprentice, guarantor)
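
Connected components group together all persons linked, directly or indirectly, by a relationship. A minimal sketch using breadth-first search over an undirected edge list follows; the function name and the node labels are hypothetical placeholders, not persons from the registers.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components of an undirected graph as sorted lists."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical contract relationships: two separate contracts.
edges = [
    ("apprentice_1", "master_1"),
    ("master_1", "guarantor_1"),
    ("apprentice_2", "master_2"),
]
print(connected_components(edges))
```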

Page 21: Garzoni conference 11 October 2014

Network analysis: connected components

Page 22: Garzoni conference 11 October 2014

Network analysis: connected components

Page 23: Garzoni conference 11 October 2014

Network analysis: connected components

Page 24: Garzoni conference 11 October 2014

Network analysis

[Figure: network graph. Node labels: Grazioso Percacino; Marco Antonio di Giacomo (stampador al componer); Zampaulo; Battista di Piero (stampador al torcollo); Giacomo Petrobelli; Anibal Petrobelli; Zuan Antonio Pasini; Zamaria Pasini; Orazio dei Gobbi; Matio da Tomet; Cristofolo Galas; Santo Patariol; Bertolamio Albarella; Battista da Luso; Francesco Rubin; Gerolamo di Uberti; Sprospero (tentor); Santo di Domenico; Cabriel di Giacomo; Gerolamo di Giacomo (da Lonato); Giacomo (da Lonato); Bertolamio di Giacomo; Giacomo (de Trento della val de Eser); Nicolò di Francesco 1; Francesco di Vicenzo 1; Zamaria di Andrea; Gaudentio di Gaudentio; Tomio di Battista; Bortolo (tirador); Dorin di Durazo; Michiel di Noto Michiel; Giacomo Iusto; Anzolo di Gerolamo di Bernardo; Gerolamo di Bernardo; Marco di Vicenzo (grison); Cristofolo di Gales; Antonio di Bertolamio (dalla Val di Sole); Antonio Botoni; Durazo; Piero 2; Gales; Battista (trentin); Francesco Pasini; Vicenzo (ceruico); Giacomo 1; Bertolamio (dalla Val di Sole); Domenico (da Nondena); Noto Michiel; Eredi di Francesco Rubin; Gaudentio (grison); Francesco di Uberti; Giacomo Albarella; Andrea (dal Garda); Eredi di Bortolo (tirador); Giacomo (da Valvasone); Vicenzo da Tomet; Vicenzo (grison)]

Page 25: Garzoni conference 11 October 2014

Data Modelling, Interoperability and Exploration for the Garzoni Project

Thank you

Frédéric Kaplan, Giovanni Colavizza EPFL