Linked Humanities Data: The Next Frontier? A Case-Study in Historical Census Data Albert Meroño-Peñuela Knowledge Representation & Reasoning Group 29-10-2012

Page 1: Linked Humanities data

Linked Humanities Data: The Next Frontier?

A Case-Study in Historical Census Data

Albert Meroño-Peñuela
Knowledge Representation & Reasoning Group

29-10-2012

Page 2: Linked Humanities data


The Dutch historical censuses (1795-1971)

Page 3: Linked Humanities data

The Dutch historical censuses (1795-1971)

Page 4: Linked Humanities data

The Dutch historical censuses (1795-1971)

• Population, Houses and Occupation censuses

• 507 Excel files

• 2,288 tables

• 33,283 annotated cells

Page 5: Linked Humanities data

Heterogeneity: structural

Page 6: Linked Humanities data

Heterogeneity: semantic

• Variable meaning
  – Plaatselijke indeling / Kom, buiten de kom + Wijk + Naam / Plaats (Dutch column headers: "local division" / "town centre, outside the centre" + "district" + "name" / "place")
  – Variable design (age 14-18, 19-20 vs. 14-15, 16-20)

• Variable values
  – RomschKatholik, RomsKatholic, VaticanChristelijk
  – Change in municipalities, occupations

Page 7: Linked Humanities data

(Current) Harmonization

• Manually create a (more general) translation table using standard classification systems (CS)
  – Map occupation literals to HISCO codes
  – Map municipality literals to AC codes

• Cons
  – Expensive
  – Detail/specificity loss
  – Process is non-repeatable
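The translation-table approach can be sketched in a few lines. This is a minimal illustration, not the project's actual tooling: the table contents, the `REL-RC` code, and the helper names `normalize` and `harmonize` are all hypothetical, and the sketch assumes the three spellings from the "Heterogeneity: semantic" slide denote the same category.

```python
# Hypothetical translation table: normalized source literal -> harmonized code.
# "REL-RC" is a placeholder, not a real HISCO or AC code.
TRANSLATION_TABLE = {
    "romschkatholik": "REL-RC",
    "romskatholic": "REL-RC",
    "vaticanchristelijk": "REL-RC",
}


def normalize(literal: str) -> str:
    """Drop whitespace and lower-case so spelling variants share one key."""
    return "".join(literal.split()).lower()


def harmonize(literals):
    """Return (mapped, unmapped): codes per literal, and literals with no entry."""
    mapped, unmapped = {}, []
    for literal in literals:
        code = TRANSLATION_TABLE.get(normalize(literal))
        if code is None:
            unmapped.append(literal)  # needs a manual decision by an expert
        else:
            mapped[literal] = code
    return mapped, unmapped


mapped, unmapped = harmonize(["RomschKatholik", "Roms Katholic", "SomeOtherValue"])
```

The sketch also hints at the listed cons: every unmapped literal needs manual work, and once only the harmonized code is kept, the original spelling (and any nuance it carried) is lost.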

Page 8: Linked Humanities data

Additional requirements

• Errors: non-destructive update of values

• Provenance: record who did what, when, why

• Datamodel: do not commit to a specific one

• Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL)

• Publication: open data for researchers
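One way to read the first two requirements together is that a correction never overwrites the transcribed value; it is layered on top and carries its own provenance. The following is a sketch under that assumption. The `Cell` and `Correction` classes, the field names, and the example values are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    """One proposed value plus its provenance: who, when, why."""
    value: str
    author: str
    reason: str
    timestamp: datetime


@dataclass
class Cell:
    """A census cell whose original value is never overwritten."""
    original: str
    corrections: list = field(default_factory=list)

    def correct(self, value, author, reason, timestamp=None):
        # Append instead of replacing, so the full history stays available.
        ts = timestamp or datetime.now(timezone.utc)
        self.corrections.append(Correction(value, author, reason, ts))

    @property
    def current(self):
        # The latest correction wins; fall back to the original value.
        return self.corrections[-1].value if self.corrections else self.original


cell = Cell(original="1O3")  # made-up transcription error: letter O for zero
cell.correct("103", author="editor-1", reason="letter O typed instead of zero")
```

Because corrections are appended, a later reader can still inspect `cell.original` and decide whether to trust each change.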

Page 9: Linked Humanities data

Census RDF: architecture


• RDF Data Cube Vocabulary (cell data)

• D2S Vocabulary (layout data)

• Open Annotation Core Data Model (annotation data)
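To make the three layers concrete, the sketch below emits plain (subject, predicate, object) tuples for a single cell. It is an illustration only: the `http://example.org/` IRIs are hypothetical, the `LAYOUT` namespace and its `row`/`column` properties are stand-ins for the D2S vocabulary (whose actual terms are not shown on the slides), and the `oa#` namespace is the one used by the later published Open Annotation and Web Annotation vocabularies, which may differ from the draft in use at the time of the talk.

```python
QB = "http://purl.org/linked-data/cube#"                    # RDF Data Cube
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
OA = "http://www.w3.org/ns/oa#"                             # Open Annotation
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/census/"       # hypothetical base IRI
LAYOUT = "http://example.org/layout#"   # hypothetical stand-in for D2S terms


def cell_triples(table, row, col, value, note=None):
    """Describe one spreadsheet cell as cell data, layout data and annotation."""
    cell = f"{EX}{table}/cell/{row}-{col}"
    triples = [
        # Cell data: the cell is an observation in a data cube.
        (cell, RDF + "type", QB + "Observation"),
        (cell, QB + "dataSet", EX + table),
        (cell, SDMX_MEASURE + "obsValue", value),
        # Layout data: where the cell sat in the original table.
        (cell, LAYOUT + "row", row),
        (cell, LAYOUT + "column", col),
    ]
    if note is not None:
        # Annotation data: a note that targets the cell.
        annotation = f"{cell}/annotation"
        body = f"{annotation}/body"
        triples += [
            (annotation, RDF + "type", OA + "Annotation"),
            (annotation, OA + "hasTarget", cell),
            (annotation, OA + "hasBody", body),
            (body, RDF + "value", note),
        ]
    return triples


triples = cell_triples("table-1", row=4, col=2, value=120, note="value uncertain")
```

Keeping the layers in separate groups of triples means the layout or annotation data can be dropped or replaced without touching the observations themselves.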

Page 10: Linked Humanities data


Census RDF: cell data

Page 11: Linked Humanities data

Census RDF: layout data

Page 12: Linked Humanities data

Census RDF: annotation data

Page 13: Linked Humanities data

Querying the RDF’d census

Page 14: Linked Humanities data

Not ready-to-publish RDF

• Disconnected graphs (but 279,136 possible variable mappings!)

• Complex & non-homogeneous SPARQL queries

• Contradictory annotation statements

• Drifted concepts
  – Tile settler -> roof repairer
  – Shoemaker (works with leather) -> shoemaker (owns a company)

Page 15: Linked Humanities data

New challenges

• Dynamic ontologies
  – Different concept formalizations depending on the time frame
  – Subjective definitions (contested concepts)

• Partitions and counting
  – Cannot merge counts of non-aligned concepts
  – Infer individuals?

• Format round-tripping
  – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss
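The "partitions and counting" problem can be illustrated with the age ranges from the "Heterogeneity: semantic" slide (14-18, 19-20 vs. 14-15, 16-20). The sketch below looks for the coarsest bins that both schemes can express without splitting a count. The function names and the counts are made up for illustration, and the code assumes each scheme is a sorted, contiguous list of inclusive integer ranges covering the same overall span.

```python
def common_partition(bins_a, bins_b):
    """Coarsest bins expressible in both schemes.

    Each scheme is a sorted, contiguous list of inclusive (low, high)
    integer ranges; both are assumed to cover the same overall span.
    """
    shared_ends = sorted({h for _, h in bins_a} & {h for _, h in bins_b})
    start = bins_a[0][0]
    merged = []
    for end in shared_ends:
        merged.append((start, end))  # a boundary both schemes agree on
        start = end + 1
    return merged


def regroup(counts, target_bins):
    """Sum counts keyed by (low, high) into the enclosing target bin."""
    totals = {b: 0 for b in target_bins}
    for (low, high), n in counts.items():
        for t_low, t_high in target_bins:
            if t_low <= low and high <= t_high:
                totals[(t_low, t_high)] += n
                break
        else:
            raise ValueError(f"bin {(low, high)} fits no target bin")
    return totals


census_a = {(14, 18): 50, (19, 20): 20}  # made-up counts
census_b = {(14, 15): 18, (16, 20): 55}  # made-up counts
shared = common_partition(list(census_a), list(census_b))
totals_a = regroup(census_a, shared)
totals_b = regroup(census_b, shared)
```

Here the only boundary the two schemes share is age 20, so the comparable unit collapses to a single 14-20 bin. Anything finer would require estimating how individuals are distributed within a bin, which is the "infer individuals?" question.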

Page 16: Linked Humanities data

Thank you!
Questions, suggestions?

http://cedar-project.nl/
http://www.data2semantics.org/