Conference "Europeana and the Bulgarian Institutions", Plovdiv, 3-4 Apr 2012

Ontotext Experience in Cultural Heritage
Bulgariana Collections in Europeana

Vladimir Alexiev, PhD, PMP
Mariana Damova, PhD


Semantic Technology Applicability to CH

• Best way to interconnect data. If the Web (1.0) is a giant hyper-linked document, the Semantic Web (3.0) is a giant linked database

• Unified, globalized and abstracted representation (RDF, RDFS, OWL2, RIF). Schema info (metadata) is represented the same way as data

• Ontologies and schemas ensure metadata interoperability (ESE, EDM, LIDO, CIDOC CRM, EADS, MODS…)

• Linked Open Data provides additional context (DBpedia, GeoNames, Freebase, WordNet, …)

• Thesauri ensure consistent vocabulary (Getty ULAN, AAT, TGN; Iconclass, VIAF, etc.)

• Europeana adopts semtech for all future development (EDM). Its first White Paper, "Knowledge = Information in Context", looks at the key role of LOD: "Linked data gives machines the ability to make associations and put search terms into context. Without linked data, Europeana could be seen as a simple collection of digital objects. With linked data, the potential is far greater"
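To make the bullets above concrete, here is a minimal sketch in plain Python (no RDF library assumed) in which schema statements such as rdfs:subClassOf and data statements about an object live in the same set of triples. The same pattern query serves both, a simple RDFS subclass rule derives extra types, an owl:sameAs triple links a local place to a DBpedia resource, and a SKOS-style lookup resolves label variants to one concept. The ex: resources, the object and the labels are hypothetical; prefixes such as rdfs:, skos:, owl:, crm: and dbpedia: are only abbreviated strings here, not resolved URIs. A production system would use a semantic repository (e.g. OWLIM) queried with SPARQL instead.

```python
# Minimal in-memory triple store (illustrative sketch, not a real RDF library).
# Schema and data are stored the same way: as (subject, predicate, object) tuples.
# All ex: resources are hypothetical; other prefixes are abbreviated strings.

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Schema (metadata) triples
    ("ex:Amphora", SUBCLASS_OF, "ex:Vessel"),
    ("ex:Vessel", SUBCLASS_OF, "crm:E22_Man-Made_Object"),
    # Data triples about a hypothetical museum object
    ("ex:object42", RDF_TYPE, "ex:Amphora"),
    ("ex:object42", "ex:foundNear", "ex:Plovdiv"),
    # Linked Open Data: identify the local place with a DBpedia resource
    ("ex:Plovdiv", "owl:sameAs", "dbpedia:Plovdiv"),
    # Thesaurus entry: one concept with a preferred and a variant label
    ("ex:conceptAmphora", "skos:prefLabel", "amphora"),
    ("ex:conceptAmphora", "skos:altLabel", "amphorae"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def superclasses(cls):
    """Transitive closure of rdfs:subClassOf, read from the schema triples."""
    found, todo = set(), [cls]
    while todo:
        for _, _, parent in match(todo.pop(), SUBCLASS_OF, None):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def infer_types():
    """Add the rdf:type triples implied by the RDFS subclass rule."""
    inferred = set()
    for s, _, cls in match(None, RDF_TYPE, None):
        for parent in superclasses(cls):
            inferred.add((s, RDF_TYPE, parent))
    triples.update(inferred)
    return inferred


def concept_for(label):
    """Resolve a preferred or alternative label (case-insensitive) to its concept."""
    wanted = label.casefold()
    for s, p, o in triples:
        if p in ("skos:prefLabel", "skos:altLabel") and o.casefold() == wanted:
            return s
    return None


if __name__ == "__main__":
    infer_types()
    print(sorted(o for _, _, o in match("ex:object42", RDF_TYPE, None)))
    print(concept_for("Amphorae"))
```

Because the schema is just more triples, adding a new subclass needs no code change, only one more triple; the same `match` call that finds data also finds schema statements.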

Ontotext experience in CH; Bulgariana collections in Europeana, 3-4 Apr 2012


Ontotext

• Ontotext is a Bulgarian company with 65 staff in Sofia, Varna, Ruse, Asenovgrad, Innsbruck (AT), London (UK), Connecticut (US) and Wellington (NZ)

• Started in 2000 as a research lab in Sirma Group. Spun off in 2008 with investment from NEVEQ

• World leader in semantic technologies. 360-degree semtech: repository (OWLIM), text mining (KIM, GATE), web mining (WMF), ontology and Linked Data management

• Most successful Bulgarian participant in EU FP5, FP6 and FP7 research projects (16 completed, 7 ongoing). Received the prestigious Pitagoras award

• Revenue growth over the last 3 years: 210%. 5M BGN in 2011, over 7M expected in 2012
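The slide gives only a cumulative figure. As a rough worked example, assuming "210% growth" means that revenue at the end of the period is 3.1 times the starting revenue, and that the 3-year window runs from 2008 to 2011 (both are assumptions, not stated on the slide), the implied average annual growth rate g and the implied starting revenue are:

```latex
g = \left(\frac{R_{2011}}{R_{2008}}\right)^{1/3} - 1 = 3.1^{1/3} - 1 \approx 0.458
\qquad
R_{2008} = \frac{R_{2011}}{3.1} \approx \frac{5\,\mathrm{M}}{3.1} \approx 1.6\,\mathrm{M}
\qquad
\frac{R_{2012}}{R_{2011}} > \frac{7\,\mathrm{M}}{5\,\mathrm{M}} = 1.4
```

Under this reading the 2012 forecast (over 40% year on year) is broadly in line with the roughly 46% average annual growth of the preceding period. Under the other common reading (revenue multiplied by 2.1 rather than 3.1), g = 2.1^(1/3) - 1 ≈ 0.28.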


Commercial Projects

• Commercial revenue grew 10x over the last 3 years and is now close to 2/3 of total revenue

• By sector: data providers 27% (jobs, food, cars), media/publishing 26%, government 18%, life sciences 11%, cultural heritage 10%, telecom 4%

• Technical topics range from core semtech to ontology design, master data management, web services, SOA, business processes, eGov, etc.

• By market: UK 59%, US 18%, Global 9%, BG 7%, IT 3%, KR 2%, MX 2%, DE, NL

• Regular SemTech training courses in London

• Cultural Heritage has great potential, so we want to focus on it


Clients Related to Media and Cultural Heritage

• Project clients: UK, KR, JP, SE, NL, BG

• Research projects executed by Ontotext

• Projects using OWLIM: EU, PL, JP, UK


Projects Related to Media and Cultural Heritage (1)

• British Broadcasting Corporation (BBC): Dynamic Semantic Publishing. The World Cup (2010), BBC Sports (2011) and Olympics (2012) multi-sites run on top of OWLIM. KIM-based Concept Extraction

• Press Association (UK): commercial image annotation and search, Concept Extraction

• The National Archives (UK): Semantic KB and search for Government Web Archive. 780M documents (150M after de-duplication), 10B facts

• British Museum (UK): ResearchSpace project funded by Mellon Foundation (US): Collaborative web-based research for the cultural heritage scholarly community. Based on the CIDOC CRM ontology

• de Bibliotheek (NL): aggregation of data from 150 national and local sources into a semantic format, with unified search over 40M objects

• National Institute of Informatics (JP): Linked Open Data in Academia (LODAC): aggregates museum and other data across multiple Japanese resources
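Concept extraction, as listed for the BBC and the Press Association, links mentions in text to entities in a knowledge base. The sketch below is a deliberately simplified dictionary (gazetteer) lookup, not KIM or GATE: it finds the longest label match at each position, case-insensitively, and returns the matched concept URI. The gazetteer entries are hypothetical (ex: URIs are invented; dbpedia: names are abbreviated strings). Real pipelines add tokenization, disambiguation and ranking on top of this.

```python
import re

# Hypothetical gazetteer: lower-case surface labels mapped to concept URIs.
GAZETTEER = {
    "world cup": "ex:FIFAWorldCup",
    "london": "dbpedia:London",
    "london olympics": "ex:LondonOlympics2012",
    "bbc": "dbpedia:BBC",
}


def extract_concepts(text, gazetteer=GAZETTEER):
    """Return (start, end, matched_text, uri) for non-overlapping label matches."""
    if not gazetteer:
        return []
    # Longer labels go first in the alternation, so "london olympics"
    # wins over "london" when both could match at the same position.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "The BBC covered the London Olympics and the World Cup."
    for start, end, label, uri in extract_concepts(sentence):
        print(start, end, label, uri)
```

Longest-match-first is a simple way to prefer the more specific concept when labels overlap; a real annotator would also use context to choose between ambiguous candidates.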


Projects Related to Media and Cultural Heritage (2)

• Polish Digital National Museum (PL): aggregates artifacts from 70 contributing cultural institutions

• PrestoSpace (FP6): Preservation towards storage and access. Standardized Practices for Audiovisual Contents in Europe. Continuation: PrestoCenter.org

• MOLTO (FP7): Multilingual Online Translation. Knowledge infrastructure, interoperability between natural language and structured queries, museum object descriptions in 15 languages. Based on the CIDOC CRM ontology

• Gothenburg City Museum (SE): 9K museum objects used as a case study for CH knowledge representation that supports querying and presenting semantic search results in natural language

• Bulgariana (BG, KR): a Bulgarian aggregator for Europeana, including digital repository for CH objects, semantic conversion (ESE, EDM), submission to Europeana, and community building
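MOLTO and the Gothenburg City Museum work both involve presenting structured museum data as natural-language text in several languages. As a minimal sketch, assuming a flat record with hypothetical `title`, `creator` and `year` fields, per-language string templates can verbalize one record in English and Swedish, falling back to a shorter template when a field is missing. This is a simplified stand-in for grammar-based generation, not the technology MOLTO actually uses; the object and artist below are invented.

```python
# Per-language templates, tried in order: (required fields, template).
# Field names and records are hypothetical, for illustration only.
TEMPLATES = {
    "en": [
        (("title", "creator", "year"), "'{title}' was created by {creator} in {year}."),
        (("title", "year"), "'{title}' dates from {year}."),
    ],
    "sv": [
        (("title", "creator", "year"), "'{title}' skapades av {creator} år {year}."),
        (("title", "year"), "'{title}' är från år {year}."),
    ],
}


def describe(record, lang):
    """Verbalize a record with the first template whose fields are all present."""
    for required, template in TEMPLATES[lang]:
        if all(record.get(field) is not None for field in required):
            return template.format(**record)
    raise ValueError(f"no {lang} template fits fields {sorted(record)}")


if __name__ == "__main__":
    painting = {"title": "Harbour View", "creator": "Anna Berg", "year": 1890}
    for lang in ("en", "sv"):
        print(describe(painting, lang))
```

Templates scale poorly once languages need agreement, case or word-order changes, which is why grammar-based approaches are preferred for many languages; the sketch only shows the data-to-text step.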


Bulgariana

• A Bulgarian aggregator to Europeana that includes:
  – A public website for sharing information
  – A wiki (Confluence) for discussion, technical materials, coordination and collaboration
  – A digital repository (DSpace) for storing and presenting digitized cultural heritage
  – Conversion/ingestion tools for converting objects to the required Europeana formats: ESE and EDM (pilot)
  – An OAI-PMH endpoint for serving content to Europeana
  – Semantic search using OWLIM (in the future)

• Partners:
  – BG-KR IT Cooperation Center: initial funding
  – Ontotext: initiative, semtech, Europeana contact
  – Sirma Media: digital repository

And we want you!
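The conversion and harvesting steps in the Bulgariana list can be sketched with the Python standard library. The first function wraps Dublin Core fields in a europeana:record using the ESE namespace and a few europeana: elements; the second builds an OAI-PMH ListRecords request of the kind a harvester sends to the endpoint. The field mapping, the "ese" metadataPrefix, the allowed type list and the example.org URLs are assumptions for illustration, not Bulgariana's actual configuration, and a real record may need further elements required by the ESE version in use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces: Dublin Core elements and Europeana Semantic Elements (ESE).
DC_NS = "http://purl.org/dc/elements/1.1/"
ESE_NS = "http://www.europeana.eu/schemas/ese/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("europeana", ESE_NS)

# Assumed set of values accepted for europeana:type in this sketch.
ESE_TYPES = {"TEXT", "IMAGE", "SOUND", "VIDEO"}


def to_ese(dc_fields, provider, data_provider, ese_type, shown_at):
    """Wrap Dublin Core fields (name -> list of values) in a europeana:record."""
    if ese_type not in ESE_TYPES:
        raise ValueError(f"unsupported europeana:type: {ese_type}")
    record = ET.Element(f"{{{ESE_NS}}}record")
    # Copy the Dublin Core metadata, e.g. as exported from the repository.
    for name, values in dc_fields.items():
        for value in values:
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    # Add a hypothetical minimal selection of Europeana-specific elements.
    for name, value in (
        ("provider", provider),
        ("dataProvider", data_provider),
        ("type", ese_type),
        ("isShownAt", shown_at),
    ):
        ET.SubElement(record, f"{{{ESE_NS}}}{name}").text = value
    return record


def list_records_url(base_url, metadata_prefix="ese", from_date=None):
    """Build an OAI-PMH ListRecords request URL, optionally incremental."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    record = to_ese(
        {"title": ["Golden Pages from the Bulgarian Renaissance"], "language": ["bg"]},
        provider="Bulgariana",
        data_provider="Institute of Bulgarian Language, BAS",
        ese_type="TEXT",
        shown_at="https://example.org/handle/123/456",  # hypothetical URL
    )
    print(ET.tostring(record, encoding="unicode"))
    print(list_records_url("https://example.org/oai/request", from_date="2012-04-01"))
```

The `from` parameter lets a harvester fetch only records changed since its last run, which keeps repeated submissions to Europeana incremental.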


Collaboration and Networking

• Google Group "Cultural Heritage Digitalisation"
  – Jointly created with SU FMI in Oct 2011
  – 40 members, 80 messages in 5 months (still not a lot of activity…)

• Meetings
  – 20121010: joint MS program in Digitalization (IMI BAS, UNIBIT). Welcome by Ontotext, proposed to use Bulgariana as a platform
  – 20120119: Restart(?) of expert working group (Ministry of Culture)
  – 20120130: Europeana1 "Mission Possible" (Ontotext, Sofia University)
  – 20120319: Europeana2: "Bulgarian projects for digitalization and presentation of cultural heritage Europeana" (V.Tarnovo Regional Library). All presentations and contacts are published
  – 20120305: "Workshop on Multilingual Digital Repositories and Services" (Sofia: ITD, VirtSOI, DSLL, ATLAS, Share.TEC)
  – 20120918: "Digitalization, Preservation and Presentation of Cultural and Scientific Heritage" (DiPP 2012, organized by IMI BAS, hosted by V.Tarnovo Library)
  – 201211xx: Europeana3 (Varna Regional Library)


Current Proposals

• WSR4Europeana (web science research for Europeana): FP7 People (Marie Curie) Initial Training Network (Multi-Partner ITN). Doctoral research, exchanges, training
  – Partners: Humboldt U (DE), Tampere (FI), Aalto U (FI), FORTH (GR), RSLIS (DK), U Mannheim (DE), VU Brussel (BE), NTUA (GR), Ontotext (BG), Seme4 (UK), Net7 (IT)
  – Associated: Europeana (NL), CNR ISTI (IT), U Carlos III (ES), CCS (DE), Tufts U (US)
  – Emerging fields, incl. semantic repositories for Europeana, semantic annotation

• SmartCulture: Regions of Knowledge 2012 cluster of clusters
  – International: Madrid, Basque, Birmingham, Siena, Eindhoven, Central Denmark, Sofia
  – BG cluster: Sofia Development Organization, Sofia University, UNIBIT, IMI-BAS, Ontotext, Tetracom, DSLL

• Ontology-based Digital Platform for Knowledge Sustainability: ICT Call 9
  – U Lyon, invited GeoCad93. Tentative

• Balkan Wars: PSP Call 6
  – Idea by PrimaSoft/SoftLib. Interested: VTU, V.Tarnovo Library, IMI BAS, Plovdiv Library
  – Need 6 international partners: Turkey, Serbia, Macedonia, etc. Tentative

• Geographical Regions: PSP Call 6. BAS, Austria, Romania, GeoCad93. Tentative

• Slavonic Manuscripts: PSP Call 6. BAS, tentative


Bulgariana Wiki


Bulgariana Collections (1)

• Prehistoric and Thracian Civilizations

• Unpublished Thracian archeological objects. Prof. Valeria Fol, Center of Thracology at the Institute for Balkan Studies at the Bulgarian Academy of Sciences


Bulgariana Collections (2)

• Golden Pages from the Bulgarian Renaissance
  – Unique manuscripts of Bulgarian folk songs collected in the 19th century by the Miladinov Brothers, published in 2008 by Dr Luchia Antonova, Institute of Bulgarian Language, BAS


MARKO KRALEVIKI FALLS ILL, REPENTS AND CONFESSES

Marko Kraleviki fell ill and lay abed for fully three years; nothing brought him a cure (1). And his old mother said to him: "Oh, Marko, oh, my dear son; you are not sick from the Lord, son, you are sick from your sins, son. Let me call the priests (2), the confessors, so that you may confess properly, son, and tell your sins!"…


Bulgariana Collections Published to Europeana (1)


Bulgariana Collections Published to Europeana (2)
