Transcribing between the lines: crowd-sourcing historic data collection

Nicole Kearney, Museum Victoria, @nicolekearney

Dr Elycia Wallis, Museum Victoria, @elyw

Biodiversity Heritage Library (BHL)

The world’s largest online repository for biodiversity heritage and archival materials.

http://www.biodiversitylibrary.org

BHL-Australia

Total BHL-Au uploads
• 546 volumes
• 119 titles
• 140,252 pages

An average of 2,000 pages/month

BHL-Australia

The Naturalist's Miscellany, or Coloured figures Of natural objects, Vol. 10, George Shaw, 1799.

The first published illustration of the Duck-billed Platypus

“Of all the Mammalia yet known it seems the most extraordinary…

…at first view, it naturally excites the idea of some deceptive preparation by artificial means.”

A synopsis of the Birds of Australia and the adjacent islands, John Gould, 1837.

What’s in the box?

Ornithology Department Archives

“Estate of Graham Brown – note books”

Catalogued in our Records & Archives database (TRIM)

Why are field diaries so important?

Field diaries are full of

DATA

DATE: 26 September 1948

OBSERVATIONS

TIME: 8am

LOCATION: Lake Corangamite

BEHAVIOUR: nesting
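
An entry like the one above maps naturally onto a structured record. The following is a minimal sketch in Python, assuming a hypothetical `Observation` class whose field names simply mirror the labels on the slide. It is an illustration only, not the museum's database schema.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Observation:
    """One field-diary observation (hypothetical structure for illustration)."""
    observed_on: date
    observed_at: time
    location: str
    behaviour: str


def parse_observation(date_text, time_text, location, behaviour):
    # Date as written in the diary, e.g. "26 September 1948".
    # Month names are parsed with the active locale (assumed English here).
    observed_on = datetime.strptime(date_text, "%d %B %Y").date()
    # Time as written in the diary, e.g. "8am".
    observed_at = datetime.strptime(time_text, "%I%p").time()
    return Observation(observed_on, observed_at, location.strip(), behaviour.strip())


record = parse_observation("26 September 1948", "8am", "Lake Corangamite", "nesting")
print(record)
```

Storing the date and time as typed values rather than free text is what would make such records sortable and comparable across diaries.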

SILVER GULLS (26.9.48)

300 nests on 1 island

15 islands of similar size

Estimates 4500 nests

Nesting success: ~1.5 eggs/nest

= ~7,000 new gulls this year from this locality
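
The estimate above can be reproduced as a short calculation. This sketch uses only the figures on the slide; the variable names are illustrative, and it assumes (as the diary's arithmetic appears to) that each egg counts as one new gull.

```python
nests_counted = 300    # nests counted on one island
similar_islands = 15   # islands of similar size
eggs_per_nest = 1.5    # approximate nesting success

# Scale the single-island count up to all islands.
estimated_nests = nests_counted * similar_islands

# Assumption: every egg is counted as one new gull.
estimated_new_gulls = estimated_nests * eggs_per_nest

print(estimated_nests)       # 4500
print(estimated_new_gulls)   # 6750.0
```

The result of 6,750 is consistent with the diary's rounded figure of about 7,000.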

Underutilised resource

Inaccessible in their current state

• single hard copy
• single location
• hand-written (in the field)
• historic scripts
• unsearchable
• uncatalogued

Our scientists need this data!

1931, 2012, 2014

Grampians National Park

Images: Heath Warwick & Nicole Kearney / Museum Victoria

A historic baseline for climate change research

?

Step 1: create individual records

A record for every item

Step 2: create digital versions

Digitisation & post processing

A digital version in our database

OCR from a page of Graham Brown’s diary

l>^v-^wAl^ livU*^/) Curiae '^tila'* -u^vttcvi Lsefei cit^:< Lv. 1^ Ol^Vm?iJcw , L>w i^-Ôtv^ dS^îL* ll^Ûk^ M/tTM^li?'^ tvc4fi>r '^^-^ G^WtY^^ uve^v. llCCUvlr]^vv\l^ '^L^>u^ l^t^

You can’t search handwriting

Step 3: transcription

Step 3: select a transcription tool

How to attract online volunteers?

http://volunteer.ala.org.au/

Forums build an online community


Ready for display?


DigiVol export

Extracted transcript in Word


Converted & reformatted


Ready for display!

Transcript in our database

Step 4: make them accessible

Add the metadata


Upload into Internet Archive

https://archive.org/

Final destination: BHL


Along with the transcriptions!


Final step? Tell everyone!

http://museumvictoria.com.au/about/mv-blog

http://blog.biodiversitylibrary.org/

Progress thus far…


• 36 field diaries digitised
• 4 authors
• 18 diaries transcribed (2 per month)
• 4 diaries in BHL
• 70 crowd-sourced volunteers

New homes for our field diaries

… in our Scientific Art & Observation Collection.

But what about the data?

There’s a lot of data!

5 Graham Brown field diaries:

Date        Species                Location
09/09/1947  Red Wattle bird        Colac, near lake, in flowering gums
13/09/1947  Crested Grebes         Colac East, end of Church St, mouth of the creek
13/09/1947  Little Pied Cormorant  Colac, perched on the wreck
13/09/1947  Mountain Duck          Colac East, end of Church St, mouth of the creek
13/09/1947  Musk Duck              Colac, on the lake
13/09/1947  Silver Gull            Colac, over the lake, opposite Queen's Avenue
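
Once sightings are held as rows like the sample above, they can be tallied programmatically. This is a minimal sketch using only the six sample rows; the tuple layout and the rule of taking the text before the first comma as the town are assumptions made for illustration.

```python
from collections import Counter

# (date, species, location) rows copied from the sample above.
sightings = [
    ("09/09/1947", "Red Wattle bird", "Colac, near lake, in flowering gums"),
    ("13/09/1947", "Crested Grebes", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Little Pied Cormorant", "Colac, perched on the wreck"),
    ("13/09/1947", "Mountain Duck", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Musk Duck", "Colac, on the lake"),
    ("13/09/1947", "Silver Gull", "Colac, over the lake, opposite Queen's Avenue"),
]

# Number of sightings recorded on each date.
per_date = Counter(day for day, _, _ in sightings)

# Number of sightings per town (assumed to be the text before the first comma).
per_town = Counter(location.split(",")[0].strip() for _, _, location in sightings)

print(per_date)
print(per_town)
```

The same counting approach would extend to the full set of transcribed sightings.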

5611 animal sightings

547 mentions of people & organisations

A final word about online volunteers

Rewarding online volunteers


Slide credit: Paul Flemons, DigiVol Volunteer Survey, April 2015

Thank you

Nicole Kearney
@nicolekearney

Dr Elycia Wallis
@elyw
