NSF Documenting Endangered Languages Workshop, Durham, New Hampshire, October 2007
Nicholas Thieberger
PARADISEC

PARADISEC, the Pacific And Regional Archive for Digital Sources in Endangered Cultures

Nick Thieberger
Linguistics Department
University of Melbourne

Documenting Endangered Languages Workshop, November 2007

What is PARADISEC?

Project aiming to preserve and make accessible researchers’ field recordings of cultural materials:

• field tapes

• notes

• dictionaries

• grammars

• texts

• etc.

What is PARADISEC?

Collaborative digital research resource set up in 2003 by the University of Sydney, the University of Melbourne and the Australian National University (UNE joined in 2004)

75% of initial funding from the Australian Research Council LIEF Scheme

PARADISEC aims

Recognition of the responsibility of researchers to preserve outputs of their research

Preservation: to adopt current optimal standards and formats to maximise sustainability and future usability of the collection

Endangered recordings

• Small and endangered languages recorded on analogue formats becoming obsolete

• Recordings physically deteriorating due to poor storage conditions (mould, dust etc)

Examples:

• Stephen Wurm’s 1970s Solomon Islands tapes (~120 tapes and transcripts/fieldnotes)

• Arthur Capell’s 114 tapes, Pacific and PNG, 1950s (and 30 archive boxes of fieldnotes)

• Bert Voorhoeve’s 180 tapes - West Papua

• Tom Dutton’s 295 PNG tapes

Endangered recordings

• Difficult to discover existence and thus plan to preserve such collections

• Virtually impossible for speakers to locate material in their languages

• Loss of research heritage and education sector investment in research

• No current repository to house this material

Regional links

• Vanuatu Kaljoral Senta - provision of safe ‘blind’ backup of parts of their collection

• University of New Caledonia - digitisation of mouldy field recordings

• Tjibaou Centre, New Caledonia - discussion of metadata and archiving methods

• Institute of Papua New Guinea Studies - provision of CD copies of tapes, inclusion of funding for attendance at our conferences

Online catalogue: paradisec.org.au/catalog

Data access (paradisec.org.au/repository)

Rights

• Depositor and user agreement forms online

• Rights information embedded in the processing system for eventual automated access or restriction of access

• Password access currently implemented on shared database and store files

Access

• Currently only depositor access

• Download whole files from data store (e.g. for authorised community use)

• CD audio/data copies provided to depositors and to relevant cultural centre if appropriate

Access

• Streaming media (browsing, using Annodex)

• Audition section of file (planned)

• Sample stories with time-aligned transcripts (EOPAS)

• Building on LACITO’s work

http://maenad.itee.uq.edu.au/exist/exist/eopas3/transcript/13009745

http://paradisec.org.au/fieldnotes/SAW2/SAW2.htm

SAW2-009-excerpt.mp3/

Access

• Images of fieldnotes

• Wurm notes (initially 120 items): http://paradisec.org.au/fieldnotes/SAW2.htm

• Capell notes (30 boxes, 14,000 images): http://paradisec.org.au/fieldnotes/AC2.htm

• Roesler notes (600 images): http://paradisec.org.au/fieldnotes/ROES/web/roes.htm

Training

We have run training sessions in the use of linguistic software (in particular Shoebox, Toolbox, Transcriber, Elan and regular expressions) at the following locations during 2004-2007:

• Melbourne University (4 x)

• Sydney University (3 x)

• University of Queensland

• Kalgoorlie Language Centre

• Muurrbay Many Rivers Language Centre (Nambucca Heads)

• New South Wales Aboriginal Languages Research and Resource Centre (Sydney)

• Australian Institute for Aboriginal and Torres Strait Islander Studies (AIATSIS)

• Victorian Aboriginal Corporation for Languages (Melbourne)

• Australian Linguistic Society conferences

• University of Hawai'i at Manoa (3 x)

• LSA Summer Institute, July 2007

PARADISEC Progress report

As at September 17th 2007: 4,219 items in the catalog; 26,543 files totaling 3.34 TB, with 1854 hours of audio

Data from 599 languages from 55 countries

PARADISEC is one of 36 participating OLAC archives; OLAC is a sub-community of the Open Archives Initiative

Linkages

Importance of relationships with regional cultural organisations, including repatriation of copies of tapes

• Vanuatu Kaljoral Senta - provision of safe ‘blind’ backup of their digitised sound collection

• University of New Caledonia - digitisation of mouldy field recordings

• Institute of PNG Studies

• Need more such links

Working well?

• Relationships with regional agencies

• Workflow for digitisation, metadata entry etc

• Training of new researchers

• Developing trust of depositors

• Extent of data converted from analog

Critical issues not covered?

• Outreach to our region

• Location of endangered collections in the region

• Preservation of these collections

• Funding!

• Loss of expertise during funding hiatus

• Real need to establish methods for data curation, metadata etc that are easy to use

Cooperation between similar programs?

• OLAC

• DELAMAN

More efficient use of existing resources:

• Provision of templates and cataloging software

Contacts

http://paradisec.org.au

Director (Sydney)

[email protected]

Project manager (Melbourne)

[email protected]

Preservation - principles

• Conform to international standards

• Use standard digital archival formats

• Open source software (reusability of components) where possible

• Plan for user communities (speakers and their descendants)

Workflow

To build good data while doing normal work:

• Fieldwork

• Transcription

• Interlinearisation

• Lexicography

• Grammatical analysis

Typical workflow resulting in well-formed data:

• Recording, named

• Analogue digitised / digital captured

• Archival digital file

• Transcribed and linked (using e.g. Transcriber or Elan)

• Media corpus instantiates links to media (e.g. Audiamus)

• Concordance of texts, navigation tool

• Output to e.g. Shoebox for interlinearising

• Archived with PARADISEC

• Texts, dictionary etc, archived with PARADISEC

• Descriptive metadata added

Linkages

Testbed for the Australian Partnership for Sustainable Repositories project

Support from the Australian Partnership for Advanced Computing (APAC)

Participant in the Australian GrangeNet high-speed network

ANU Internet Futures Project (programming for web interface to the APAC account)

Australian Academy of the Social Sciences (French cooperation)

Sydney Uni International Development fund (U Texas visit)

EMELD (airfares, accommodation and registration at the EMELD conference in Michigan, USA)

School of Society Culture and Performance, University of Sydney (RIBG funding support)

Faculty of Arts, University of Sydney (refurbishment of rooms and infrastructural and training support)

Test project for EthnoER media annotation grant

More …