Transcribing between the lines: crowd-sourcing historic data collection
Nicole Kearney, Museum Victoria, @nicolekearney
Dr Elycia Wallis, Museum Victoria, @elyw

Page 1: Transcribing between the lines: crowd-sourcing historic data collection

Transcribing between the lines: crowd-sourcing historic data collection

Nicole Kearney, Museum Victoria, @nicolekearney

Dr Elycia Wallis, Museum Victoria, @elyw

Page 2: Transcribing between the lines: crowd-sourcing historic data collection

Biodiversity Heritage Library (BHL)

The world’s largest online repository for biodiversity heritage and archival materials.

http://www.biodiversitylibrary.org

Page 3: Transcribing between the lines: crowd-sourcing historic data collection

BHL-Australia

Total BHL-Au uploads
• 546 volumes
• 119 titles
• 140,252 pages

An average of 2,000 pages/month

Page 4: Transcribing between the lines: crowd-sourcing historic data collection

BHL-Australia

The Naturalist's Miscellany, or Coloured figures of natural objects, Vol. 10, George Shaw, 1799.

The first published illustration of the Duck-billed Platypus

“Of all the Mammalia yet known it seems the most extraordinary…

…at first view, it naturally excites the idea of some deceptive preparation by artificial means.”

Page 5: Transcribing between the lines: crowd-sourcing historic data collection

A synopsis of the Birds of Australia and the adjacent islands, John Gould, 1837.

Page 10: Transcribing between the lines: crowd-sourcing historic data collection

What’s in the box?

Ornithology Department Archives

“Estate of Graham Brown – note books”

Catalogued in our Records & Archives database (TRIM)

Page 11: Transcribing between the lines: crowd-sourcing historic data collection

Why are field diaries so important?

Page 14: Transcribing between the lines: crowd-sourcing historic data collection

Field diaries are full of

DATA

Page 15: Transcribing between the lines: crowd-sourcing historic data collection

DATE: 26 September 1948

OBSERVATIONS

TIME: 8am

LOCATION: Lake Corangamite

Page 16: Transcribing between the lines: crowd-sourcing historic data collection

DATE: 26 September 1948

OBSERVATIONS

TIME: 8am

LOCATION: Lake Corangamite

BEHAVIOUR: nesting

Page 17: Transcribing between the lines: crowd-sourcing historic data collection

SILVER GULLS (26.9.48)

300 nests on 1 island

15 islands of similar size

Estimate: 4500 nests

Nesting success: ~1.5 eggs/nest

≈ 7000 new gulls this year from this locality
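
The diary estimate above appears to follow from two multiplications. A minimal sketch of that arithmetic, using only the figures on the slide (the variable names are illustrative, and reading "nesting success" as young produced per nest is an assumption):

```python
# Figures quoted on the slide (diary entry dated 26.9.48).
nests_on_sampled_island = 300
similar_islands = 15
eggs_per_nest = 1.5  # approximate nesting success quoted on the slide

# Scale the single-island count up to all islands of similar size.
estimated_nests = nests_on_sampled_island * similar_islands

# Assumption: each successful egg yields one new gull.
estimated_new_gulls = estimated_nests * eggs_per_nest

print(estimated_nests)      # 4500
print(estimated_new_gulls)  # 6750.0, rounded on the slide to about 7000
```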

Page 18: Transcribing between the lines: crowd-sourcing historic data collection

Underutilised resource

Inaccessible in their current state

• single hard copy
• single location
• hand-written (in the field)
• historic scripts
• unsearchable
• uncatalogued

Page 19: Transcribing between the lines: crowd-sourcing historic data collection

Our scientists need this data!

1931 2012 2014

Grampians National Park

Images: Heath Warwick & Nicole Kearney / Museum Victoria

Page 20: Transcribing between the lines: crowd-sourcing historic data collection

A historic baseline for climate change research

Page 21: Transcribing between the lines: crowd-sourcing historic data collection

?

Page 22: Transcribing between the lines: crowd-sourcing historic data collection

Step 1: create individual records

Page 23: Transcribing between the lines: crowd-sourcing historic data collection

A record for every item

Page 24: Transcribing between the lines: crowd-sourcing historic data collection

Step 2: create digital versions

Page 25: Transcribing between the lines: crowd-sourcing historic data collection

Digitisation & post processing

Page 26: Transcribing between the lines: crowd-sourcing historic data collection

A digital version in our database

Page 27: Transcribing between the lines: crowd-sourcing historic data collection

OCR from a page of Graham Brown’s diary

l>^v-^wAl^ livU*^/) Curiae '^tila'* -u^vttcvi Lsefei cit^:< Lv. 1^ Ol^Vm?iJcw , L>w i^-^Otv^ dS^^iL* ll^^Uk^ M/tTM^li?'^ tvc4fi>r '^^-^ G^WtY^^ uve^v. llCCUvlr]^vv\l^ '^L^>u^ l^t^

You can’t search handwriting

Page 28: Transcribing between the lines: crowd-sourcing historic data collection

Step 3: transcription

Page 33: Transcribing between the lines: crowd-sourcing historic data collection

Step 3: select a transcription tool

Page 36: Transcribing between the lines: crowd-sourcing historic data collection

How to attract online volunteers?

http://volunteer.ala.org.au/

Page 37: Transcribing between the lines: crowd-sourcing historic data collection

Forums build an online community

http://volunteer.ala.org.au/

Page 38: Transcribing between the lines: crowd-sourcing historic data collection

Ready for display?

Page 39: Transcribing between the lines: crowd-sourcing historic data collection

http://volunteer.ala.org.au/

DigiVol export

Page 40: Transcribing between the lines: crowd-sourcing historic data collection

Extracted transcript in Word

http://volunteer.ala.org.au/

Page 41: Transcribing between the lines: crowd-sourcing historic data collection

Converted & reformatted

http://volunteer.ala.org.au/

Page 42: Transcribing between the lines: crowd-sourcing historic data collection

Ready for display!

Page 43: Transcribing between the lines: crowd-sourcing historic data collection

Transcript in our database

Page 44: Transcribing between the lines: crowd-sourcing historic data collection

Step 4: make them accessible

Page 45: Transcribing between the lines: crowd-sourcing historic data collection

Add the metadata

Page 46: Transcribing between the lines: crowd-sourcing historic data collection

http://volunteer.ala.org.au/

Add the metadata

Page 47: Transcribing between the lines: crowd-sourcing historic data collection

http://volunteer.ala.org.au/

Add the metadata

Page 48: Transcribing between the lines: crowd-sourcing historic data collection

Upload into Internet Archive

https://archive.org/

Page 49: Transcribing between the lines: crowd-sourcing historic data collection

Final destination: BHL

http://www.biodiversitylibrary.org

Page 50: Transcribing between the lines: crowd-sourcing historic data collection

Along with the transcriptions!

http://www.biodiversitylibrary.org

Page 51: Transcribing between the lines: crowd-sourcing historic data collection

Final step? Tell everyone!

Page 52: Transcribing between the lines: crowd-sourcing historic data collection

http://museumvictoria.com.au/about/mv-blog

Page 53: Transcribing between the lines: crowd-sourcing historic data collection

http://blog.biodiversitylibrary.org/

Page 54: Transcribing between the lines: crowd-sourcing historic data collection

Progress thus far…

http://volunteer.ala.org.au/

• 36 field diaries digitised
• 4 authors
• 18 diaries transcribed (2 per month)
• 4 diaries in BHL
• 70 crowd-sourced volunteers

Page 55: Transcribing between the lines: crowd-sourcing historic data collection

New homes for our field diaries

… in our Scientific Art & Observation Collection.

Page 56: Transcribing between the lines: crowd-sourcing historic data collection

But what about the data?

Page 57: Transcribing between the lines: crowd-sourcing historic data collection

There’s a lot of data!

5 Graham Brown field diaries:

Date        Species                Location
09/09/1947  Red Wattle bird        Colac, near lake, in flowering gums
13/09/1947  Crested Grebes         Colac East, end of Church St, mouth of the creek
13/09/1947  Little Pied Cormorant  Colac, perched on the wreck
13/09/1947  Mountain Duck          Colac East, end of Church St, mouth of the creek
13/09/1947  Musk Duck              Colac, on the lake
13/09/1947  Silver Gull            Colac, over the lake, opposite Queen's Avenue

5611 animal sightings

547 mentions of people & organisations
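
Once sightings are extracted as date, species and location, they can be treated as structured records. The sketch below is one possible way to do that, not the presenters' actual workflow: the `Sighting` class and `parse_row` helper are hypothetical names, and the rows are the six examples shown on this slide.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Sighting:
    """One extracted observation (hypothetical record layout)."""
    observed_on: date
    species: str
    location: str


# Rows copied from the slide; dates are written day/month/year.
ROWS = [
    ("09/09/1947", "Red Wattle bird", "Colac, near lake, in flowering gums"),
    ("13/09/1947", "Crested Grebes", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Little Pied Cormorant", "Colac, perched on the wreck"),
    ("13/09/1947", "Mountain Duck", "Colac East, end of Church St, mouth of the creek"),
    ("13/09/1947", "Musk Duck", "Colac, on the lake"),
    ("13/09/1947", "Silver Gull", "Colac, over the lake, opposite Queen's Avenue"),
]


def parse_row(row):
    """Convert a (date, species, location) text row into a Sighting."""
    day, species, location = row
    return Sighting(datetime.strptime(day, "%d/%m/%Y").date(), species, location)


sightings = [parse_row(row) for row in ROWS]

# Count sightings per day, a first step towards a searchable dataset.
per_day = Counter(s.observed_on for s in sightings)
for day, count in sorted(per_day.items()):
    print(day.isoformat(), count)
```

Parsing the dates into real date values, rather than keeping them as text, is what makes the records sortable and comparable across diaries.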

Page 58: Transcribing between the lines: crowd-sourcing historic data collection

A final word about online volunteers

Page 59: Transcribing between the lines: crowd-sourcing historic data collection

Rewarding online volunteers

http://volunteer.ala.org.au/

Page 60: Transcribing between the lines: crowd-sourcing historic data collection

Slide credit: Paul Flemons, DigiVol Volunteer Survey, April 2015

Page 61: Transcribing between the lines: crowd-sourcing historic data collection

Thank you

Nicole Kearney

@nicolekearney

Dr Elycia Wallis

@elyw