Libraries in a data-centered environment

Jakob Voß

Ticer Summer School, August 22nd, 2012

Slides: jakobib.github.io/ticer2012/slides.pdf



Prolegomena

The importance of Data

The importance of Libraries

Summary

Appendices

Section 1

Prolegomena

So what about the Cloud?

• It’s hype

• It’s a buzzword (cloud = bullshit)¹

• Better to know exactly what is referred to by “cloud”

Which notion of cloud do libraries refer to?

¹ Meant to impress and persuade, unconcerned with falsehoods (Frankfurt, 2005)

Three notions of the Cloud

Figure: Infrastructure as a Service (IaaS)

Figure: Platform as a Service (PaaS)

Figure: Software as a Service (SaaS)

Software as a Service (aka “web application”)

• Software that you don’t have to install or update.

• Software that hides some of its complexity.

Any software is inherently more complex than the task it automates. Don’t expect software to simplify anything!

Section 2

The importance of Data

Data vs. Applications

“Data matures like wine, applications like fish”— James Governor

Data vs. Applications

• For immediate consumption

• Requirements and business logic change

• Technical developments and trends

• People’s requirements change

Data vs. Applications

• Data can be used in different contexts and times, if it is done well:
    • respect special properties of data
    • respect different notions of data

Special properties of data

Bits can be freely rearranged. Consequently, data

• can be copied

• can be modified

very efficiently, without any traces or differences between “original” and “copy”.
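
A minimal Python sketch of this property, using made-up sample bytes and a hypothetical helper named `digest` (neither comes from the talk): a copy is bit-for-bit equal to its source, and a modified version carries no mark of having been changed.

```python
import hashlib

# Illustrative sample data; any byte sequence would behave the same way.
original = b"Qu'est-ce que la documentation?"

# Copying: a separate object holding exactly the same bits.
copy = bytes(bytearray(original))

# Modifying: the result is just another byte sequence, with no record of the edit.
modified = original.replace(b"documentation", b"information")


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte sequence (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


# The copy cannot be told apart from the original by content or by digest.
copy_is_identical = copy == original and digest(copy) == digest(original)

# The modified version differs, but nothing in it says which one came first.
modified_differs = digest(modified) != digest(original)

print(copy_is_identical, modified_differs)
```

Nothing in `copy` or `modified` says which sequence is the "original"; that knowledge has to be kept outside the data itself.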

Special properties of data

Digital collections and descriptions of data are themselves data.

Data challenges

This is where libraries are needed!

• Preservation

• Authenticity

• Provenance

• Identity

Data challenges: Preservation

• All data needs a carrier

• An unsolved problem in general, but an established discipline

Data challenges: Authenticity

• Data modification leaves no traces

• Related to preservation but more about trust
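
One common response to this challenge is a fixity check: record a cryptographic digest while the data is still trusted and compare it later. The sketch below assumes SHA-256 and uses hypothetical names (`record_digest`, `is_unchanged`) and made-up sample data; it illustrates the idea and is not a method described in the talk.

```python
import hashlib


def record_digest(data: bytes) -> str:
    """Compute the digest to store alongside the data at deposit time."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, recorded: str) -> bool:
    """Return True if the data still matches the digest recorded earlier."""
    return hashlib.sha256(data).hexdigest() == recorded


# Made-up example: a small record deposited today.
deposited = b"holdings: 1200\n"
stored_digest = record_digest(deposited)

# Later, someone silently edits the record.
tampered = b"holdings: 1300\n"

print(is_unchanged(deposited, stored_digest))  # the untouched record passes
print(is_unchanged(tampered, stored_digest))   # the edited record fails
```

The check only shifts the question: whoever can replace the data may also be able to replace the stored digest, so the digest has to be kept by a party that is trusted.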

Data challenges: Provenance

• Copying data leaves no traces

• Digital signatures and trust (again)
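
A rough sketch of how a source can vouch for a copy. Python's standard library does not include public-key signatures, so this example substitutes an HMAC (a keyed hash based on a shared secret) as a stand-in for a real digital signature. The key, the sample document, and the function names `sign` and `verify` are assumptions made for illustration.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Return an HMAC-SHA256 tag that binds the data to the holder of the key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, tag: str, key: bytes) -> bool:
    """Check a tag in constant time; True only for the same data and key."""
    return hmac.compare_digest(sign(data, key), tag)


# Hypothetical secret held by the library and a made-up document.
library_key = b"secret held by the library"
document = b"Digitized map, sheet 12"

tag = sign(document, library_key)

print(verify(document, tag, library_key))             # copy from the library
print(verify(document + b" (edited)", tag, library_key))  # altered copy
print(verify(document, tag, b"someone else's key"))   # tag from another source
```

With a shared secret only the key holders can verify a tag. A public-key signature would let anyone verify it, but the trust question stays the same: the verifier must trust that the key belongs to the claimed source.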

Data challenges: Identity

• A single bit changes the whole dataset

• Which modifications matter?
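
One way to make "which modifications matter" explicit is to compare a canonical form instead of the raw bytes. The sketch below assumes the records are JSON and that key order and whitespace do not matter; the function names and sample records are hypothetical.

```python
import hashlib
import json


def byte_identity(text: str) -> str:
    """Identity of the exact byte sequence: any changed bit gives a new value."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_identity(text: str) -> str:
    """Identity of a canonical form: parsed JSON, sorted keys, compact separators."""
    canonical = json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = '{"title": "On Bullshit", "year": 2005}'
b = '{ "year": 2005, "title": "On Bullshit" }'  # same content, different layout
c = '{"title": "On Bullshit", "year": 2006}'    # different content

print(byte_identity(a) == byte_identity(b))      # False: the bytes differ
print(record_identity(a) == record_identity(b))  # True: layout does not matter here
print(record_identity(a) == record_identity(c))  # False: the year changed
```

The choice of canonical form is itself a decision about identity: here layout is declared irrelevant and content relevant, and another collection might draw that line elsewhere.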

Notions of data

• It’s also hype

• It’s not a buzzword (data ≠ bullshit)²

• Better to know exactly what is referred to by “data”

Which notion of data do libraries refer to?

² Not primarily used to impress (but also used without care)

Three notions of data³

• Data as facts

• Data as subjective observations

• Data as communications

³ As identified by Ballsun-Stanton (2012)

Data as facts

• Hard numbers, product of reproducible measurements, scientific facts

• Used to reveal (the real) world

Data as facts in libraries

• Created by libraries
    • Holding counts
    • Patron information
    • Formal metadata

• Collected by libraries
    • Research data

Data as subjective observations

• Product of recorded observations, sense-impressions that must be filtered

• Used to construct (our) reality

Data as subjective observations in libraries

• Created by libraries
    • Subject indexing
    • User studies
    • Analysis of publication trends

• Collected by libraries
    • Research data

Data as communications

• Transferred or stored sign, a container of meaning in the form of a sequence of bits

• Used to describe (any) reality

Data as communications in libraries

digital objects, electronic resources, informational objects, electronic publications, ..., digital documents

• Collected by libraries

Data as communications/documents in libraries

A document is not information but a recorded “evidence in support of a fact” (Briet, 1951),⁴ which can be any possible statement. This notion somehow got lost in the history of library and information science / documentation science (Ørom, 2007).

• Advice: Don’t mess with research data as facts or as observations but treat them as documents, like other (digital) publications!

⁴ See Buckland (1997, 1998) for an introduction.

The importance of data in libraries

• More and more communication and publication is digital.

• If libraries still care for documents, they have to care for data.

Does your library actually care for data or is activity around data outsourced to the “tech people”? Would you outsource any activity around books to the “book people”?

Section 3

The importance of Libraries

What does a library do?

A library collects, arranges, and makes available (published) documents (among other services) to meet user needs.

This should also apply to digital documents:

• collect data

• arrange data

• make data available

Collect data

Figure: Data needs care

Figure: How many libraries store digital objects

The eResource fallacy

If libraries license eResources to be accessed from publisher sites, libraries are only temporary, intermediary retailers.

• Advice: Data that cannot be copied and modified is lost. Actually collect and process digital documents (or you won’t be in the document business anymore).

Why and how can libraries care for digital documents?

• Libraries are used to dealing with documents (data as communications)
    • collect digital documents
    • arrange digital documents
    • make digital documents available

• Libraries can respond to the data challenges, because of:
    • Trust
    • Neutrality
    • Persistence

• Ensure that documents can be used as data:
    • copying
    • modification

Example: Collections

• Create and present your digital collections

• in different forms on different platforms

Example: Annotations

One must be able to

• annotate your documents

• link to (parts of) them

• track changes when reusing them (revision control)
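
As a small illustration of the last point, the sketch below uses Python's `difflib` to show what changed between a document and a reused version of it. The two sample texts are invented for this example, and a real setup would rely on a revision control system such as Git instead of an ad hoc comparison.

```python
import difflib

# Invented example: a document and a version reused and changed by someone else.
original = [
    "Libraries collect documents.",
    "Libraries arrange documents.",
]
reused = [
    "Libraries collect data.",
    "Libraries arrange documents.",
    "Libraries make data available.",
]

# unified_diff yields header lines, then removed ("-"), added ("+"),
# and unchanged (" ") lines.
diff = list(
    difflib.unified_diff(
        original, reused, fromfile="original", tofile="reused", lineterm=""
    )
)

for line in diff:
    print(line)
```

The diff can only be produced because both versions can be obtained and compared as data, which is the precondition the list above asks for.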

Figure: Neatline.org screenshot by David McClure, with map tiles by Stamen Design (under CC BY 3.0), data by OpenStreetMap (under CC BY-SA), and maps from the Hotchkiss Map Collection at the Library of Congress

Section 4

Summary

The situation

• Software is inherently complex and becomes obsolete

• Data ages less rapidly if it can be used in different contexts and times

In the end all content will be digital – get used to it!

Care for data!

“Data that is loved tends to survive” — Kurt Bollacker

• Respect special properties of data
    • copying and modification must be possible
        • be able to store wherever you like
        • be able to transform to any form you like
        • be able to combine with any data you like
    • digital collections and metadata are also data

• Respect different notions of data
    • focus on data as communications instead of digging into details of research data

• Provide solutions to the data challenges
    • preservation
    • authenticity
    • provenance
    • identity
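
The abilities listed above under copying and modification (transform to any form, combine with any data) can be sketched in a few lines of Python. The records, the loan counts, and the field names are all invented for illustration.

```python
import csv
import io
import json

# Invented holdings records in JSON and loan counts keyed by the same id.
holdings_json = (
    '[{"id": "b1", "title": "On Bullshit", "copies": 2},'
    ' {"id": "b2", "title": "Document (re)turn", "copies": 1}]'
)
loans = {"b1": 14, "b2": 3}

records = json.loads(holdings_json)

# Combine: attach the loan count from the second dataset to each record.
for record in records:
    record["loans"] = loans.get(record["id"], 0)

# Transform: write the combined records as CSV instead of JSON.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["id", "title", "copies", "loans"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

print(csv_text)
```

None of this works on data that can only be viewed through a publisher's interface, which is why the ability to copy and modify comes first in the list.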

Where to start

• Collect digital publications!
    • Start archiving public websites, blogs, mailing lists etc.
    • Create and manage data/document repositories (see yesterday’s talks)
    • Invest in preservation
    • Exchange digital documents with other libraries and initiatives (see following talk by Herbert van de Sompel, LOCKSS...)

• Make data as accessible as possible (Open Data)
    • Publish your own digital publications
    • Allow annotating and connecting with your digital documents
        • simple access will not be the primary role of libraries
        • data needs to be usable in different contexts and times

Section 5

Appendices

References

• Ballsun-Stanton, Brian (2012): Asking About Data: Exploring Different Realities of Data via the Social Data Flow Network Methodology. PhD thesis

• Briet, Suzanne (1951): Qu’est-ce que la documentation? Editions documentaires, industrielles et techniques

• Buckland, Michael (1997): What is a “document”? In: Journal of the American Society for Information Science (JASIST) 48.9, pp. 804–809

• Buckland, Michael (1998): What is a “digital document”? In: Document Numerique 2.2, pp. 221–230

• Frankfurt, Harry G. (2005): On Bullshit. Princeton University Press

• Ørom, Anders (2007): The concept of information versus the concept of document. In: Skare et al. (eds.): Document (re)turn. Contributions from a research field in transition. pp. 53–72. Peter Lang

Image credits and licenses

All images from Wikimedia Commons:

• construction.jpg CC-0 by Schweinepeterle (Rolf H.)
• cubicals.jpg CC-BY-SA by David R. Tribble
• mcdonalds.jpg CC-0 by Raysonho@Grid Engine
• wine.jpg CC-BY-SA by Rafael Garcia-Suarez
• fish.jpg CC-BY-SA by mahalie stackpole
• periodictable.png CC-0 by Cepheus
• droste.jpg CC-BY by Zzubnik
• winecellar.jpg CC-BY-SA by Che (Petr Novak)
• cloud.jpg CC-0 by Sidik iz PTU
• regal.jpg CC-0 by Czarna Trucizna
• telescope.jpg CC-0 by C. Zorzi
• tree.jpg CC-BY-SA by Ji-Elle
• twins.jpg CC-0 by William Morris Agency
• earthquake.jpg CC-0 by Bert Cohen
• vermeer.jpg CC-BY-SA by Kunstkenner2305
• clown.jpg CC-BY by Hamdan Zakaria

Got questions? Just Ask!

http://libraries.stackexchange.com

Q&A about libraries and information science

This presentation as a digital document

Source code and images of this presentation are available at https://github.com/jakobib/ticer2012 to be copied and modified under the CC-BY-SA license.

Original data from libraries

• Data not used as documents:
    • facts: library data (holdings, patrons, loans, formal metadata)
    • observations: subject metadata (descriptions)
    • communication: your publications

• Can be used in other contexts and times
    • Example: authority files, connected via VIAF
    • Good use of data refers to you as source and authority

• Advice: Provide as much as possible of your original data and documents. Just publish and care for these documents like any other acquisitions.