The Future is All Mine

Page 1: The Future is All Mine

The Future is All Mine
Text and Data Mining Projects in Europe

@openminted_eu @futuretdm

Funded by:

Page 2: The Future is All Mine

Projects funded by
@openminted_eu @futuretdm

Page 3: The Future is All Mine

Text and data mining is the future

“Text and data mining (TDM) is the process of deriving information from machine-read material. It works by copying large quantities of material, extracting the data, and recombining it to identify patterns.” (JISC)

Page 4: The Future is All Mine

Text and data mining helps us understand the past

Mining historical books: the evolution of language
Source: http://www.sciencemag.org/content/331/6014/176 (Baylor College of Medicine, Houston)

Page 5: The Future is All Mine

Text and data mining predicts the future

Mining newspapers: predicts revolutions
Source: http://journals.uic.edu/ojs/index.php/fm/article/view/3663/3040 (University of Illinois)

Page 6: The Future is All Mine

Text and data mining saves the future

Mining scientific publications about diseases: save lives
Source: http://dl.acm.org/citation.cfm?id=2623667 (Baylor College of Medicine, Houston)

Page 7: The Future is All Mine

Text mining – it seems so easy:

- STAGE 1: Information Retrieval
- STAGE 2: Linguistic Analysis: Entity Recognition
- STAGE 3: Information Extraction
- STAGE 4: Data Mining, Knowledge Discovery
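To make the four stages on this slide concrete (information retrieval, linguistic analysis with entity recognition, information extraction, data mining), here is a minimal sketch in Python. It is an illustration only, not the approach of either project: the toy corpus, the term list, and the function names `retrieve`, `recognise_entities`, `extract_pairs` and `mine` are all assumptions made up for this example, and real systems use far richer linguistic tooling than a keyword match and a fixed term list.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; a real project would pull documents from a repository.
CORPUS = [
    "Aspirin reduces Fever. Aspirin may irritate the stomach.",
    "Ibuprofen reduces Fever and Inflammation.",
    "The weather in Europe was mild this year.",
    "Aspirin and Ibuprofen both reduce Inflammation.",
]

# Hypothetical term list standing in for a real entity recogniser.
VOCABULARY = {"Aspirin", "Ibuprofen", "Fever", "Inflammation"}


def retrieve(corpus, keyword):
    """Stage 1, information retrieval: keep documents that mention the keyword."""
    return [doc for doc in corpus if keyword.lower() in doc.lower()]


def recognise_entities(sentence, vocabulary):
    """Stage 2, linguistic analysis: tokenise and keep tokens found in the term list."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return sorted({token for token in tokens if token in vocabulary})


def extract_pairs(doc, vocabulary):
    """Stage 3, information extraction: pairs of entities mentioned in one sentence."""
    sentences = [s.strip() for s in re.split(r"[.!?]", doc) if s.strip()]
    pairs = []
    for sentence in sentences:
        entities = recognise_entities(sentence, vocabulary)
        pairs.extend(combinations(entities, 2))
    return pairs


def mine(corpus, keyword, vocabulary):
    """Stage 4, data mining: count how often each entity pair co-occurs."""
    counts = Counter()
    for doc in retrieve(corpus, keyword):
        counts.update(extract_pairs(doc, vocabulary))
    return counts


if __name__ == "__main__":
    for pair, count in mine(CORPUS, "reduce", VOCABULARY).most_common():
        print(pair, count)
```

Even in this toy form, each stage depends on the one before it: nothing can be mined from documents that were never found or never made machine-readable.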

Page 8: The Future is All Mine

But it actually poses many challenges…

STAGE 1, STAGE 2, STAGE 3, STAGE 4

- Where do I find data?
- How do I make my texts readable by machines?
- Which mining method to use?

Page 9: The Future is All Mine

Current Barriers in Europe

Awareness across Institutions & Stakeholders
- Lack of awareness among research communities
- Lack of guidance to uncover TDM potential

Skills and Tools
- Availability and accessibility across disciplines
- Gap in skills across various sectors

Licensing & Open Access
- License proliferation and interoperability issues
- License barriers to transparent open access

Copyright and Data Protection
- TDM activities infringing current copyright laws
- Legal and policy limitations and barriers for TDM

Page 10: The Future is All Mine

EU PROJECTS on TDM

FutureTDM: Identify TDM barriers and policy solutions

OpenMinTeD: Build a TDM eInfrastructure

Page 11: The Future is All Mine

Main Objectives of FutureTDM

- ELABORATE a legal and policy framework for future TDM and specify a research agenda to foster the spread of TDM
- BUILD a website: a Collaborative Knowledge Base and an Open Information Hub combined
- ANALYSE current application areas and best practices in TDM
- ASSESS existing studies, legal regulations and policies on TDM
- INVOLVE all key stakeholders to identify practices, requirements, and specific challenges
- INCREASE awareness of TDM to attract new target groups and science domains

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No 665940.

Page 12: The Future is All Mine

FutureTDM

Bottom-up approach: stakeholder workshops and knowledge cafes throughout Europe

Page 13: The Future is All Mine

Objective of OpenMinTeD

Layer 1: Interoperability of text mining services (platforms or components)
- Mining platforms
- Proprietary architectures
- Platform services: Registry, Workflow Management, Auth2 & Policy management, Annotator, Accounting

Layer 2: Interoperability of language resources & corpora
- Language resources and corpora registry service
- Language resources
- Publisher text corpus
- OpenAIRE/CORE text corpus
- PMC text corpus
- Other text corpora
- Other types of text corpora

Layer 3: Interoperability to shared storage and computing resources
- Data centres
- Data centre in public cloud

Page 14: The Future is All Mine

OpenMinTeD brings together:

- ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules
- DISCOVERABLE SERVICES: easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs
- TDM COMMUNITIES: different scientific communities have different challenges
- VALUE ADDED APPS: community-driven applications to illustrate the value of the infrastructure. Engage with industry.

OPENMINTED = The Open Mining Infrastructure for Text and Data

Page 15: The Future is All Mine

Become involved

Follow us on Twitter for the latest updates and blogs: @openminted_eu, @futuretdm

Follow our websites: www.openminted.eu, www.futuretdm.eu

Page 16: The Future is All Mine

THANK YOU

PARTNERS OPENMINTED
- Athena RIC
- Univ. of Manchester (NacTem)
- Univ. of Darmstadt
- INRA
- EMBL-EBI
- Agro-Know
- LIBER
- Univ. of Amsterdam
- Open University UK
- EPFL
- CNIO
- Univ. of Sheffield (GATE)
- GESIS
- GRNET
- Frontiers
- Univ. of Stirling

PARTNERS FUTURETDM
- SYNYO GmbH (SYNYO)
- LIBER Europe
- Open Knowledge Foundation LBG (OK/CM)
- Radboud Univ. Nijmegen
- The British Library Board
- Univ. of Amsterdam
- Athena RIC
- Ubiquity Press
- Fundacja Projekt: Polska (FPP)