Publishing and Pushing Linked Data in Archaeology Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/> Eric C. Kansa (@ekansa) UC Berkeley D-Lab & Open Context

Publishing and Pushing Linked Open Data


DESCRIPTION

This presentation outlines the need to invest intellectual and expert human effort in data publication in order to see compelling research outcomes. I gave this presentation on April 10, 2014, at the University of Pennsylvania, in an event sponsored by the Penn Humanities Forum (http://humanities.sas.upenn.edu/13-14/dhf_opendata.shtml).


Page 1

Publishing and Pushing Linked Data in Archaeology

Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>

Eric C. Kansa (@ekansa), UC Berkeley D-Lab & Open Context

Page 2

Introduction

Challenges in Reusing Data

1. Background

2. Data publishing workflow

3. Data curation and dynamism

Page 3

“Gold Standard” of professional contribution

Page 4

My Precious Data:

Dysfunctional incentives (poorly constructed metrics) limit the scope and diversity of publications

Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright

Page 8

Need more carrots!

1. Citation, credit, intellectually valued

2. Research outcomes (new insights from data reuse!)

Why linked data is so important

Page 9

EOL Computable Data Challenge

(Ben Arbuckle, Sarah W. Kansa, Eric Kansa)

Page 11

Large-scale data sharing & integration for exploring the origins of farming.

Funded by EOL / NEH

Page 12

1. 300,000 bone specimens

2. Complex: dozens of descriptive fields, up to 110

3. 34 contributors from 15 archaeological sites

4. More than 4 person-years of effort to create the data!

Page 13

Relatively collaborative bunch; Ben Arbuckle cultivated relationships & built trust over years prior to EOL funding.

Page 14

Introduction

Challenges in Reusing Data

1. Background

2. Data publishing workflow

3. Data curation and dynamism

Page 15

1. Referenced by the US National Science Foundation and the National Endowment for the Humanities for data management

2. “Data sharing as publishing” metaphor

Page 16

Raw Data: Idiosyncratic, sometimes highly coded, often inconsistent

Page 17

Raw Data Can Be Unappetizing

Page 18

Publishing Workflow

Improve / Enhance

1. Consistency

2. Context (intelligibility)

Page 19

Sometimes data is better served cooked

Page 20

- Documentation

- Review, editing

- Annotation

Page 25

“Ovis orientalis”

Recorded as: Code: 14; Wild sheep; Code: 70; Code: 16; Ovis orientalis; Code: 15; Sheep, wild; O. orientalis; Sheep (wild)

Page 26

- Documentation

- Review, editing

- Annotation

Page 27

“Ovis orientalis”: http://eol.org/pages/311906/

Recorded as: Code: 14; Wild sheep; Code: 70; Code: 16; Ovis orientalis; Code: 15; Sheep, wild; O. orientalis; Sheep (wild)
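The annotation step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Open Context's actual pipeline: it assumes each contributing dataset supplies its own code book for numeric codes (the dataset names `dataset_a`/`dataset_b` and their code assignments are hypothetical), and it maps every recognized label or code to the EOL page URI shown above.

```python
from typing import Optional

# Minimal sketch: annotate variant taxon labels and dataset-specific codes
# with one Linked Data URI (the EOL page for Ovis orientalis).
EOL_OVIS_ORIENTALIS = "http://eol.org/pages/311906/"

# Free-text labels seen across contributors, stored in normalized form
# (lowercase, single spaces).
LABEL_TO_URI = {
    "ovis orientalis": EOL_OVIS_ORIENTALIS,
    "o. orientalis": EOL_OVIS_ORIENTALIS,
    "wild sheep": EOL_OVIS_ORIENTALIS,
    "sheep, wild": EOL_OVIS_ORIENTALIS,
    "sheep (wild)": EOL_OVIS_ORIENTALIS,
}

# Hypothetical code books: a numeric code only means something
# within the dataset that defined it.
CODE_BOOKS = {
    "dataset_a": {"14": EOL_OVIS_ORIENTALIS},
    "dataset_b": {"70": EOL_OVIS_ORIENTALIS},
}


def annotate(dataset: str, value: str) -> Optional[str]:
    """Return the taxon URI for a raw value, or None if an editor must review it."""
    raw = value.strip()
    code_book = CODE_BOOKS.get(dataset, {})
    if raw in code_book:
        return code_book[raw]
    normalized = " ".join(raw.lower().split())
    return LABEL_TO_URI.get(normalized)


print(annotate("dataset_a", "14"))            # http://eol.org/pages/311906/
print(annotate("dataset_b", "Sheep,  Wild"))  # http://eol.org/pages/311906/
print(annotate("dataset_b", "14"))            # None: "14" is not a dataset_b code
```

Returning None instead of guessing keeps unresolved values visible, so they can be taken back to the data authors rather than silently misclassified.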

Page 28

● Controlled vocabulary

● Linked Data applications

Page 29

“Sheep/goat”: http://eol.org/pages/32609438/

1. Needed to mint new concepts like “sheep/goat”

2. Vocabularies need to be responsive to multidisciplinary applications
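Once records are annotated with concept URIs such as these, they can be published as RDF for Linked Data applications. A minimal sketch using only the Python standard library writes N-Triples; the specimen base URI and the predicate `http://example.org/vocab/hasTaxon` are hypothetical placeholders, not a vocabulary Open Context is known to use, while the two EOL URIs are the ones cited on these slides.

```python
# Minimal sketch: serialize specimen-to-taxon links as N-Triples.
# BASE and HAS_TAXON are hypothetical placeholders.
BASE = "http://example.org/specimens/"
HAS_TAXON = "http://example.org/vocab/hasTaxon"

TAXA = {
    "Ovis orientalis": "http://eol.org/pages/311906/",
    "sheep/goat": "http://eol.org/pages/32609438/",
}


def to_ntriples(records):
    """Yield one N-Triples line per (specimen_id, taxon_label) pair."""
    for specimen_id, taxon_label in records:
        taxon_uri = TAXA[taxon_label]  # a KeyError flags an unmapped label
        yield f"<{BASE}{specimen_id}> <{HAS_TAXON}> <{taxon_uri}> ."


for line in to_ntriples([("bone-001", "sheep/goat"), ("bone-002", "Ovis orientalis")]):
    print(line)
```

A concept like “sheep/goat” only works here because it has its own URI; without it, such records would have no valid target to link to.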

Page 32

Linking to UBERON

1. Needed a controlled vocabulary for bone anatomy

2. Better data modeling than is common in zooarchaeology, improving data quality

Page 33

Linking to UBERON

1. Models links between anatomy, developmental biology, and genetics

2. Unexpected links between the Humanities and Bioinformatics!

Page 36

7000 BC (many pigs, cattle)

7500 BC (sheep + goat dominate, few pigs, few cattle)

6500 BC (few pigs, mixing with wild animals?)

8000 BC (cattle, pigs, sheep + goats)

• Not a neat model of progress toward adopting a more productive economy. Very different, sometimes piecemeal, adoption in different regions.

• Separate coastal and inland routes for the spread of domestic animals, over a 1,000-year period.

Page 37

Easy to Align

1. Animal taxonomy

2. Bone anatomy

3. Sex determinations

4. Side of the animal

5. Fusion (bone growth, up to a point)

Page 38

Hard to Align (poor modeling, recording)

1. Tooth wear (age)

2. Fusion data

3. Measurements

Despite common research methods!!

Page 39

Professional expectations for data reuse

1. Need better data modeling (than feasible with, cough, Excel)

2. Data validation, normalization

3. Requires training & incentives for researchers to care more about the quality of their data!
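Part of the validation and normalization work can be automated. The Python sketch below is illustrative only: the field names (`side`, `sex`) and their allowed values are hypothetical stand-ins for a project's controlled vocabulary. It normalizes common spelling variants and reports anything it cannot resolve, so those values can go back to the data author.

```python
# Minimal sketch: check records against small controlled vocabularies.
# Field names and allowed values are hypothetical examples.
VOCAB = {
    "side": {"l": "left", "left": "left", "r": "right", "right": "right"},
    "sex": {"m": "male", "male": "male", "f": "female", "female": "female"},
}


def validate(record: dict) -> tuple[dict, list[str]]:
    """Return a normalized copy of the record plus a list of problems found."""
    clean, problems = dict(record), []
    for field, mapping in VOCAB.items():
        raw = record.get(field)
        if raw is None or str(raw).strip() == "":
            continue  # missing values are left for the editor to judge
        key = str(raw).strip().lower()
        if key in mapping:
            clean[field] = mapping[key]
        else:
            problems.append(f"{field}: unrecognized value {raw!r}")
    return clean, problems


clean, problems = validate({"id": "bone-001", "side": "L", "sex": "?"})
print(clean)     # {'id': 'bone-001', 'side': 'left', 'sex': '?'}
print(problems)  # ["sex: unrecognized value '?'"]
```

Unrecognized values are left unchanged and listed rather than coerced, since deciding what "?" means is an editorial question, not a programming one.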

Page 40

Nobody expected their data to see wider scrutiny either…

Page 41

…and not just academic researchers: linked open data involves many sectors!

Page 42

Digital Index of North American Archaeology (DINAA)

1. State “site files” created to comply with federal preservation laws

2. Main record of human occupation in North America

3. PIs: David G. Anderson and Josh Wells

Page 43

DINAA

1. Stable URI for each site file.

2. CC-Zero (public domain)

3. Beginning to link to controlled vocabularies

Page 44

Data are challenging!

1. Decoding takes 10x longer

2. Data management plans should also cover data modeling, quality control (esp. validation)

3. More work needed modeling research methods (esp. sampling)

4. Editing, annotation requires lots of back-and-forth with data authors

5. Data need investment to be useful!

Page 45

Introduction

Challenges in Reusing Data

1. Background

2. Data publishing workflow

3. Data curation and dynamism

Page 46

Investing in Data is a Continual Need

1. Data and code co-evolve. New visualizations and analyses may reveal unseen problems in data.

2. Data and metadata change routinely (revised stratigraphy requires ongoing updates to data in this analysis)

3. Problems and interpretive issues in data (and annotations) keep cropping up.

4. Is publishing a bad metaphor implying a static product?

Page 48

Data sharing as publication

Data sharing as open source release cycles?

Page 50

Data sharing as publication

AND

Data sharing as open source release cycles
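Treating a dataset like an open source project implies successive releases with a record of what changed between them. A minimal Python sketch, assuming each release is a table of records keyed by a stable record ID (the IDs and the `phase`/`taxon` fields are hypothetical), computes a simple changelog; the changed `phase` value stands in for the kind of revised stratigraphy mentioned earlier.

```python
# Minimal sketch: summarize changes between two releases of a dataset.
# Each release maps a stable record ID to that record's fields.
def changelog(old: dict, new: dict) -> dict:
    """Return added, removed, and changed records between two releases."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for rid in sorted(old.keys() & new.keys()):
        diffs = {
            field: (old[rid].get(field), new[rid].get(field))
            for field in sorted(old[rid].keys() | new[rid].keys())
            if old[rid].get(field) != new[rid].get(field)
        }
        if diffs:
            changed[rid] = diffs
    return {"added": added, "removed": removed, "changed": changed}


v1 = {"bone-001": {"phase": "2", "taxon": "sheep/goat"},
      "bone-002": {"phase": "3", "taxon": "Ovis orientalis"}}
v2 = {"bone-001": {"phase": "2a", "taxon": "sheep/goat"},  # revised phase
      "bone-003": {"phase": "1", "taxon": "sheep/goat"}}
print(changelog(v1, v2))
```

Stable record IDs are what make this comparison possible: if identifiers changed between releases, every record would appear as removed and re-added.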

Page 53

Image Credit: “Brainchildvn” via Flickr (CC-By)http://www.flickr.com/photos/brainchildvn/3957949195

Not an easy environment to seek new investments.

Page 56

Bethany Nowviskie (University of Virginia)

Shifts in Career Paths and Professions (#alt-academy): different publishing incentives are emerging as data assume greater emphasis

Page 57

Bethany Nowviskie (University of Virginia)

Alt-acs (contingent, low status) are not a good answer, but they reflect a wider need for institutional reform.

Page 58

One does not simply walk into Mordor Academia and share usable data…

Image Credit: Copyright New Line Cinema

Page 59

Final Thoughts

Data require intellectual investment, methodological and theoretical innovation.

Institutional structures poorly configured to support data-powered research

New professional roles needed, but who will pay for it?

Page 60

Thank you!

University of Pennsylvania Digital Humanities Forum and other Sponsors!