Antoine Isaac Information and networking days H2020 / Connecting Europe Facility, Jan 15-16, 2014

Multilingual challenges in Europeana

DESCRIPTION

Presentation at the H2020-CEF Infoday, 16 January 2014 http://ec.europa.eu/digital-agenda/en/news/information-and-networking-days-h2020-work-programme-2014-2015-connecting-europe-facility

Page 1: Multilingual challenges in Europeana

Antoine Isaac

Information and networking days

H2020 / Connecting Europe Facility, Jan 15-16, 2014

Page 2: Multilingual challenges in Europeana

Europe’s platform to access cultural heritage

Currently 30M objects

Page 3: Multilingual challenges in Europeana

Built on descriptive metadata from a broad, heterogeneous network

Audiovisual collections

National Aggregators

Regional Aggregators

Archives

Thematic collections

Libraries

Musées Lausannois

Culture.fr

The European Library

APEX

European Film Gateway

Europeana Fashion

2,300 galleries, museums, archives and libraries

Page 4: Multilingual challenges in Europeana

Accessing items from 36 countries

top 16

Portal interface in 31 languages

Metadata in 33 languages

Page 5: Multilingual challenges in Europeana

Serving Europe’s citizens

5M visits on Europeana.eu

7M Facebook impressions

API use…

Page 6: Multilingual challenges in Europeana

Content (digital objects on the site of the provider)

Metadata (descriptive object information)

Public Domain

Creative Commons Licenses

Rights reserved

Orphan work

Facilitating re-use on the legal side

CC

Page 7: Multilingual challenges in Europeana

Facilitating re-use on the language side?

Our network needs automatic translation tools to address information needs all over Europe

Page 8: Multilingual challenges in Europeana

Gathering/linking existing multilingual data

Page 9: Multilingual challenges in Europeana

Related projects applying NLP tools

E.g., The PATHS project has developed techniques to enrich English and Spanish collections

1) Identification of key entities

2) Detection of (typed) similarities between objects, using metadata

3) “Background links” to external resources such as Wikipedia

4) Classification of objects against a hierarchy of topics

Applying these techniques to other languages would require work

1) requires language-specific tools (PoS tagging, lemmatization)

2) is straightforward to apply to new languages

3) requires language-specific tools

4) depends on (3) and on translation of some topics

http://www.paths-project.eu/eng/Resources/Semantic-Enrichment-of-Cultural-Heritage-content-in-PATHS
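
To illustrate why technique (2) carries over to new languages more easily than the others, here is a minimal sketch of a metadata similarity measure that uses only whitespace tokenization and set overlap. It is not the PATHS method: the field names (`title`, `subject`), the sample records and the choice of Jaccard overlap are all assumptions made for illustration.

```python
# Illustrative only: a language-agnostic similarity between two metadata
# records, based on token overlap (Jaccard). Field names are hypothetical.

def tokens(record, fields=("title", "subject")):
    """Collect lowercase whitespace-separated tokens from the chosen fields."""
    words = set()
    for field in fields:
        words.update(record.get(field, "").lower().split())
    return words


def jaccard(record_a, record_b):
    """Return |A ∩ B| / |A ∪ B| over the token sets of two records."""
    tokens_a, tokens_b = tokens(record_a), tokens(record_b)
    if not tokens_a and not tokens_b:
        return 0.0  # two empty records: define similarity as 0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical sample records
map_record = {"title": "Map of Paris", "subject": "cartography"}
plan_record = {"title": "Plan of Paris", "subject": "cartography"}
score = jaccard(map_record, plan_record)  # 3 shared tokens out of 5 distinct
```

Nothing in this sketch depends on a PoS tagger or lemmatizer, which is the property the slide points to. A real system would still need to cope with inflection and with languages that do not separate words by spaces.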

Page 10: Multilingual challenges in Europeana

Language challenges for Digital Libraries

Typical queries are very short

Average < 2 terms

Identification of query language is not easy, even manually

39% of queries may belong to several languages

Plenty of named entities

60% of queries are for persons & places

Not only is it hard for queries: the same issues apply to the descriptive metadata

Studies by Humboldt University on Europeana and The European Library

http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
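
The ambiguity of short queries can be sketched with a toy example. The word lists below are tiny, hand-made and purely hypothetical (they are not Europeana data), and the lookup is a deliberately naive stand-in for a real language identifier. It only shows how a one- or two-term query can be valid in several languages at once.

```python
# Illustrative only: toy per-language word lists (hypothetical, hand-made).
LEXICONS = {
    "en": {"art", "map", "film", "radio", "paris"},
    "fr": {"art", "carte", "film", "radio", "paris"},
    "de": {"kunst", "karte", "film", "radio", "paris"},
}


def candidate_languages(query):
    """Return every language whose word list contains all query terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    return {
        lang
        for lang, words in LEXICONS.items()
        if all(term in words for term in terms)
    }


def is_ambiguous(query):
    """A query is ambiguous if more than one language could explain it."""
    return len(candidate_languages(query)) > 1


paris_langs = candidate_languages("Paris")   # a place name, shared by all lists
carte_langs = candidate_languages("carte")   # only in the French list
```

With these lists, a named entity such as "Paris" matches every language, while "carte" matches only one. With fewer than two terms per query on average, there is rarely a second word to break the tie.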

Page 11: Multilingual challenges in Europeana

Language processing issues at the scale of Europe

Page 12: Multilingual challenges in Europeana

Thank you!

Antoine Isaac

[email protected]

@EuropeanaEU

Page 13: Multilingual challenges in Europeana

Europeana’s vision and mission

We believe in making cultural heritage openly accessible in a digital way, to promote the exchange of ideas and information

We want to be a catalyst for change in the world of cultural heritage