From DOBES to CLARIN and beyond Axel Horstmann Peter Wittenburg Erhard Hinrichs VolkswagenFoundation MPI for Psycholinguistics University of Tübingen ?


From DOBES to CLARIN and beyond

Axel Horstmann

Peter Wittenburg

Erhard Hinrichs

VolkswagenFoundation

MPI for Psycholinguistics

University of Tübingen

?

FACTS AND FIGURES

• Non-profit-making foundation established under private law, based in Hanover
• Not affiliated with the car manufacturer of the same name
• Founded in 1961 by the Governments of the Federal Republic of Germany and the State of Lower Saxony
• Objective: to support science and technology as well as the humanities and the social sciences in research and university teaching
• Assets: about 2.45 billion euros
• Funding p.a.: about 110 million euros
• One of the financially strongest private research funding foundations in Europe

FOCUS ON HUMANITIES AND SOCIAL SCIENCES

• Current funding initiatives (see KURZINFORMATION / BASIC INFORMATION): about 45 to 50 % of the funds given to H&SC

• Initiatives focussing on infrastructural support of H&SC:
  • Kulturwissenschaftliche Dokumentation (documentation in cultural studies; closed)
  • Archive als Fundus der Forschung (archives as a resource for research; closed)
  • DOBES: Dokumentation bedrohter Sprachen (documentation of endangered languages)

• Projects including infrastructural support of H&SC:
  • Strategy building on digitization of endangered books
  • Digitization of the so-called “Aschebücher” of the HAAB Weimar (in preparation)

"E-HUMANITIES": POSSIBILITIES AND PERSPECTIVES

• Strong interest in innovative approaches

• Funds available for projects involving activities towards "E-Humanities" (e.g.: digitization of data, collections, archival material) within current funding initiatives

• Funding possibilities for meetings, workshops, conferences etc. focussing on "E-Humanities" (within the funding initiative Symposia and Summer Schools)

• New perspectives on "E-Humanities" (possibly) opened up by a new funding initiative aiming at Research in Museums (currently in planning), which includes digitization activities to a certain extent

• ... and not to forget the flagship "DOBES"

Concrete steps or Tower of Babel?

• we don’t know exactly what eHumanities means

• we feel that mechanisms in research processes are changing rapidly, with technological innovation as the motor
  • but we can’t say: “we are now going to design eHumanities”
  • we probably can say: “let’s plan further concrete projects and actions and see”

• many excellent projects are around; let me just refer to the good sides of DOBES (Documentation of Endangered Languages, funded by the VolkswagenFoundation) as one of these steps

What is DOBES?

• 44 DOBES teams working fully distributed and self-organized, incl. linguists, anthropologists, musicologists, ethno-biologists, etc.
• in addition, VWF installed a central archive
• start in 2000

What changed in DOBES?

• handing over all data to an archive after a limited time was completely new and is an explicit step, even though the results are not yet final at that point

• there is a push to make data accessible to others from the beginning - also new for many and not without conflicts

• asking researchers to categorize and organize material according to agreed metadata was also new and still requires evangelization

• including multimedia in the documentation and dealing with audio/video as the basis was fairly new and requires technical know-how

Which infrastructure by DOBES?

• a stable, reliable and open repository/archiving system handling 30 TB
• data storage not encapsulated and in open formats
• introduction of persistent identifiers to protect the investments made in relating fragments to each other
• a network of 12 centers worldwide included in data distribution
• of these, 6 copies are held in centers with a hardware migration strategy
• a number of web-based applications offering various ways to access the data
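
To illustrate why persistent identifiers protect links between fragments when data is copied and migrated, here is a minimal sketch in Python. It is not the DOBES archive's actual system: the class `PidRegistry`, the prefix `hdl:0000` and all URLs are hypothetical and chosen only for illustration.

```python
class PidRegistry:
    """Hypothetical in-memory registry: one stable identifier, many locations."""

    def __init__(self, prefix="hdl:0000"):  # made-up prefix, not a real one
        self.prefix = prefix
        self._copies = {}  # PID -> list of current URLs
        self._next = 1

    def register(self, url):
        """Issue a new PID for a resource and record its first location."""
        pid = f"{self.prefix}/{self._next:06d}"
        self._next += 1
        self._copies[pid] = [url]
        return pid

    def add_copy(self, pid, url):
        """Record a further copy, e.g. at another center."""
        self._copies[pid].append(url)

    def migrate(self, pid, old_url, new_url):
        """Replace a location after a hardware or storage migration."""
        copies = self._copies[pid]
        copies[copies.index(old_url)] = new_url

    def resolve(self, pid):
        """Return all current locations for a PID."""
        return list(self._copies[pid])


registry = PidRegistry()
pid = registry.register("https://archive-a.example/session42/audio.wav")
registry.add_copy(pid, "https://mirror-b.example/session42/audio.wav")
registry.migrate(
    pid,
    "https://archive-a.example/session42/audio.wav",
    "https://archive-a.example/new-storage/audio.wav",
)
print(pid, registry.resolve(pid))
```

An annotation that cites the PID rather than a URL keeps working after the migration, because only the registry entry changes.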

CLARIN/D-SPIN Challenges

eResearch is about global collaboration in key areas of science and the next generation of infrastructure that will enable it (J. Taylor)

• goal is an open research infrastructure to overcome the huge fragmentation of language resources and tools and to offer them to research communities - in particular to the humanities

• help tackle the LARGE challenges (multilingual societies)

• but also help the individual researcher
  • example: align a transcription and an audio signal
  • how many researchers know how to do this?

• see CLARIN/D-SPIN as a huge virtual marketplace of resources and tools that can be combined thanks to integration and interoperability solutions
• not to forget Henry Thompson (one of the fathers of XML): we don't have an agreed descriptive system in our domain

CLARIN/D-SPIN Research Infrastructure

• the basis of a big supermarket is classification and convincing organization principles
  • based on 10 years of experience we know that only a flexible component model will be accepted

• we seem to be going towards a Federation of LRT (language resources and tools) producers that can make contracts with Identity Federations
  • just one signature is necessary to get all researchers integrated with their home identity
  • we have already set up a first small test federation (EC-DAM-LR)

• researchers' dream: building virtual collections and creating workflows flexibly - not trivial due to import/export aspects
  • LREC showed that we already know a lot about the problem

CLARIN/D-SPIN Network of Service Centers

• need a network of strong and persistent centers of "new" type

• researchers will only adapt if they can rely on new mechanisms

• need to simplify the IPR/license situation

towards eHumanities

• CLARIN has > 100 members from 32 countries
• in Germany: 9 well-known centers, and some more will join
• it is an enormous challenge to make a real step ahead in CLARIN

• can we all together extend this to an eHumanities infrastructure, or are we already close to collapse?

a few questions I

• will there be a separate infrastructure for each humanities discipline?

• NO

• there will be several shared services such as a PID registration and resolution service

• however:
  • building a joint infrastructure has to do with community building, trust, common language etc.
  • communities that are too big would not work
  • so let's move on in TextGrid, DARIAH, CLARIN etc.
  • but let's stay in close and fair contact to find synergies

• competition will become fierce, and our competitors are the Googles of the world!

a few questions II

• will there be a single market place for the humanities?

• NO

• acceptance of a market place depends on classification and organization principles - as already said
  • these are different in all disciplines

• so we have to start from the disciplines in our solutions
  • already difficult enough

• leave it to the Semantic Web guys to enable crosswalks

a few questions III

• who will be the main players?

• of course the big libraries, archives and museums
• but what about the universities and big organizations such as MPG?

• important:
  • we see new requirement profiles emerging
  • a kind of job sharing can be predicted

• of course: close collaboration with innovative libraries such as SUB etc is required

computer centers: RZG, GWDG

curation centers: MPDL + a few domain MPIs

content centers: a number of domain MPIs

highly specialized groups: highly specialized MPI departments

a few questions IV

• key bricks for interoperability?

• we need open registries of all sorts and smart registry frameworks
  • schema registries
  • concept registries (ISOcat - a creation of ISO TC37/SC4)
  • relation registries
  • etc.
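
How schema and concept registries could work together for interoperability can be sketched as follows. This is a toy illustration only: the concept identifiers (`C1`, `C2`), the schema names and field names, and the function `crosswalk` are all hypothetical and do not reflect ISOcat's actual identifiers or API.

```python
# Hypothetical concept registry: concept id -> definition
CONCEPTS = {
    "C1": "language of the recording",
    "C2": "date of the recording",
}

# Hypothetical schema registry: schema -> field name -> concept id
# (None marks a field that has no registered concept)
SCHEMAS = {
    "archive_a": {"Language": "C1", "RecDate": "C2"},
    "archive_b": {"lang": "C1", "created": "C2", "notes": None},
}


def crosswalk(record, source, target):
    """Translate a record between schemas via shared concept ids.

    Fields without a registered concept, or without a counterpart
    in the target schema, are dropped.
    """
    target_by_concept = {
        concept: name
        for name, concept in SCHEMAS[target].items()
        if concept is not None
    }
    result = {}
    for name, value in record.items():
        concept = SCHEMAS[source].get(name)
        if concept in target_by_concept:
            result[target_by_concept[concept]] = value
    return result


record_b = {"lang": "example-language", "created": "2000-01-01", "notes": "draft"}
print(crosswalk(record_b, "archive_b", "archive_a"))
```

The two archives never need to agree on field names; they only need to point their fields at the same registered concepts.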

• however:
  • a very complex landscape seems to emerge
  • how to make it usable by laymen?
  • how to convince researchers to work with them?

• no one knows yet - we need to try out - what else?

Summary

• we need new initiatives again and again to push the boundaries step by step

• it is now also time to transform existing knowledge into persistent infrastructures

• this will need a lot of sensitivity and patience - RI building takes time

• emerging landscapes will have an underlying complexity
  • need to offer discipline vocabulary
  • need to hide complexity to a certain extent
  • need to offer persistency

Project solutions are not per se useful as infrastructure solutions!

End

• in Germany we already have a good mixture with TextGrid, DOBES, eAqua, DARIAH and CLARIN/D-SPIN
• we have to get together frequently

Thanks for the attention.