THE NEXT CHALLENGE · 2017. 12. 2. · 17 | Internal use only Multi Relational Big Data: the next...


1 | Internal use only Multi Relational Big Data: the next challenge ALA-ICA 2017

MULTI-RELATIONAL BIG DATA: THE NEXT CHALLENGE

Fernando Sancho Caparrini Universidad de Sevilla

fsancho@us.es

Agenda

• About information

• Ideal structures

• An initial proposal

• … and Big Data

• Conclusions

Introduction

20th Century

• Technological Revolution

• Democratization of computers as a working tool

• … A lot of data has been produced and digitized

21st Century

• Information Revolution

• Democratization of automatic information processing

About Information

• Unstructured vs. Structured

• Raw vs. Preprocessed

• Schemaless vs. Schema

• Massive analysis capability vs. Not

Two very different contexts:

• Scientific Areas

• Humanistic Areas

About Information

Examples of successful projects where information is strongly structured:

• Mathematics

• Physics

• Biological Databases

• Chess, Go,… (Games)

• Image Processing

• Expert Systems (health, insurance,…)

Information in Archives/Humanities

Main features:

• Structural Complexity

• Semantic Complexity

• Contextual Complexity

Sacrifice of interpretative facets

Searching for the Perfect Structure

• Existence of common (and ideal) structures in many disciplines:

• Vector Spaces in the Natural Sciences

• Data Frames in the Social Sciences

• …? for general complex purposes

• The importance of standards:

• Theory to support reasoning

• Case studies to compare developments

• Format conversion and adaptation

What we need…

• Flexibility

• Multilevel

• Schemaless

• Tools

• Storage

• Handling

• Analysis

• Standards-based

• Mergeable

• Natural

• Robust

• Reusable

• Verifiable

A Proposal: Multi-relational Networks

• Based on a robust mathematical theory: Graph Theory

• Methodology:

• Schema Generation

• Information projection

• Analysis by Long Distance Queries

• Link-Discovery
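The four methodology steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the author's implementation: the toy nodes and relation names (`CREATED`, `HAS_THEME`), the representation of the network as typed edge triples, and the deliberately naive link-discovery rule (two sources are candidates for a link when they share a projected target).

```python
from collections import defaultdict

# Hypothetical toy data: node -> type, and typed edges (source, relation, target).
node_type = {
    "artist_a": "Artist", "artist_b": "Artist",
    "work_1": "Work", "work_2": "Work",
    "theme_x": "Theme",
}
edges = [
    ("artist_a", "CREATED", "work_1"),
    ("artist_b", "CREATED", "work_2"),
    ("work_1", "HAS_THEME", "theme_x"),
    ("work_2", "HAS_THEME", "theme_x"),
]

def generate_schema(node_type, edges):
    """Schema generation: the (source type, relation, target type) triples in use."""
    return {(node_type[s], r, node_type[t]) for s, r, t in edges}

def follow(edges, start_nodes, relation_path):
    """Long-distance query: traverse a sequence of relation types from the start nodes."""
    index = defaultdict(set)
    for s, r, t in edges:
        index[(s, r)].add(t)
    frontier = set(start_nodes)
    for relation in relation_path:
        frontier = {t for n in frontier for t in index[(n, relation)]}
    return frontier

def project(node_type, edges, source_type, relation_path):
    """Information projection: collapse a relation path into direct (source, target) pairs."""
    sources = [n for n, t in node_type.items() if t == source_type]
    return {(s, t) for s in sources for t in follow(edges, [s], relation_path)}

def candidate_links(projection):
    """Link discovery (naive): pair up sources that share a projected target."""
    by_target = defaultdict(set)
    for s, t in projection:
        by_target[t].add(s)
    return {(a, b) for group in by_target.values()
            for a in group for b in group if a < b}

schema = generate_schema(node_type, edges)
artist_themes = project(node_type, edges, "Artist", ["CREATED", "HAS_THEME"])
suggested = candidate_links(artist_themes)
```

Here the schema is derived from the data rather than declared up front, which is one way to read the "schemaless" requirement: the network accepts any typed edge, and the schema is a summary computed afterwards.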

Some simple case studies…

Evolution of Hispanic Baroque

[Figure: network snapshots for consecutive 25-year periods, from 1550-1575 to 1825-1850]

Analysis of Ecuadorian Cultural Heritage

Contemporary art (Museo Reina Sofía)

[Figure panels: Temporal Evolution of Element Descriptors; Themes by Artist; Artist Clustering]

Gastronomic creativity in elBulli

[Figure: Schema]

BD + Archives = New Challenges

• Big Data: a concept from the business world

• Volume+Variety+Velocity+Veracity

• Curation Problem?

• Automatic Curation

• No curation at all!

• Hybrid Systems by…

• Merging:

• Merging Networks

• Mining Networks
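As an illustration of what merging networks could involve, here is a minimal Python sketch. The slides do not specify a merging procedure, so this is only one simple assumption: both networks are sets of (source, relation, target) triples, and a hand-made `same_as` mapping (hypothetical, standing in for a curation or entity-matching step) says which node identifiers in the second network denote a node already present in the first.

```python
def merge_networks(edges_a, edges_b, same_as=None):
    """Union of two edge sets after renaming nodes of B via `same_as` (B id -> A id)."""
    same_as = same_as or {}

    def rename(node):
        # Nodes without a known equivalent keep their original identifier.
        return same_as.get(node, node)

    merged = set(edges_a)
    merged.update((rename(s), r, rename(t)) for s, r, t in edges_b)
    return merged

# Hypothetical example: two archives describe the same person under different ids.
archive_a = {("person_1", "CREATED", "work_1")}
archive_b = {("p-001", "BORN_IN", "place_9"), ("p-001", "CREATED", "work_1")}

merged = merge_networks(archive_a, archive_b, same_as={"p-001": "person_1"})
```

Because edges are stored in a set, a relation recorded in both archives collapses into a single edge after renaming, while relations known to only one archive are preserved. Producing the `same_as` mapping automatically is the hard part and is exactly where the curation question arises.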

Making the defect a virtue: Semantics as a wealth of information

What can be automated? (…development required)

Manual Curation Problem

Automatic Annotation (Ontologies)

Machine Learning

Data Science

Link Discovery

Semantic Reasoning

Formal Concept Analysis

BD + Archives + AI = New Opportunities

(Ontologies)

We need to keep developing…

• Improve tools for the Multi-relational Networks methodology.

• New visualizations to get insight from networks.

• New algorithms to extract information from network data.

• Algorithms for automatic merging of complex networks.

• Improve data conversions:

• … to text (advanced OCR)

• … to network (advanced understanding)

Conclusions

In the face of big problems:

• Multidisciplinary approaches

• Development of new tools (theoretical and practical)

• Need for adequate training

Disciplines involved:

• Humanities (diverse) for targeting decisions and semantic interpretation

• Mathematics for theoretical modeling

• Computer Science for the effective development of visualization and manipulation

• Data Science for the analysis tools

Contact?

If you wish to get in touch:

Fernando Sancho Caparrini: fsancho@us.es

(or: fsanchocaparrini@gmail.com)

Thank you for your attention!
