
THE NEXT CHALLENGE · 2017. 12. 2. · 17 | Internal use only Multi Relational Big Data: the next challenge ALA-ICA 2017 We need to keep development… • Improve tools for Multi-relational


1 | Internal use only Multi Relational Big Data: the next challenge ALA-ICA 2017

MULTI-RELATIONAL BIG DATA: THE NEXT CHALLENGE

Fernando Sancho Caparrini Universidad de Sevilla

[email protected]

Agenda

• About information

• Ideal structures

• An initial proposal

• … and Big Data

• Conclusions

Introduction

20th Century

• Technological Revolution

• Democratization of computers as a working tool

…

• A lot of data has been produced and digitized

21st Century

• Information Revolution

• Democratization of automatic information processing

About Information

• Unstructured vs. Structured

• Raw vs. Preprocessed

• Schemaless vs. Schema

• Massive analysis capability vs. Not

Two very different contexts:

• Scientific Areas

• Humanistic Areas

About Information

Examples of successful projects where information is strongly structured:

• Mathematics

• Physics

• Biological Databases

• Chess, Go,… (Games)

• Image Processing

• Expert Systems (health, insurance, …)

Information in Archives/Humanities

Main features:

• Structural Complexity

• Semantic Complexity

• Contextual Complexity

Sacrifice of interpretative facets

Searching for the Perfect Structure

• Existence of common (and ideal) structures in many disciplines:

• Vector Spaces in Natural Sciences

• Data Frames in Social Sciences

• …? for general complex purposes

• The importance of standards:

• Theory to support reasoning

• Case studies to compare developments

• Format conversion and adaptation

What we need…

• Flexibility

• Multilevel

• Schemaless

• Tools

• Storage

• Handling

• Analysis

• Standards-based

• Mergeable

• Natural

• Robust

• Reusable

• Verifiable

A Proposal: Multi-relational Networks

• Based on a robust mathematical theory: Graph Theory

• Methodology:

• Schema Generation

• Information projection

• Analysis by Long Distance Queries

• Link-Discovery
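
The methodology listed above can be illustrated with a small sketch. It is not the author's implementation: the class `MultiRelationalNetwork`, its method names, and the toy artist/artwork/theme data are all hypothetical. It only aims to show how typed nodes and typed edges could support schema generation, a long-distance query (read here as a traversal that follows a fixed sequence of relation types), and a projection onto a single relation.

```python
from collections import defaultdict


class MultiRelationalNetwork:
    """Toy multi-relational network: typed nodes joined by typed, directed edges."""

    def __init__(self):
        self.node_type = {}                # node -> type label
        self.out_edges = defaultdict(set)  # (node, relation) -> set of targets

    def add_node(self, node, type_label):
        self.node_type[node] = type_label

    def add_edge(self, source, relation, target):
        self.out_edges[(source, relation)].add(target)

    def schema(self):
        """Schema generation: the (source type, relation, target type) triples in use."""
        return {
            (self.node_type[source], relation, self.node_type[target])
            for (source, relation), targets in self.out_edges.items()
            for target in targets
        }

    def traverse(self, start, relations):
        """Long-distance query: follow the given relation types, in order, from `start`."""
        frontier = {start}
        for relation in relations:
            frontier = {
                target
                for node in frontier
                for target in self.out_edges.get((node, relation), ())
            }
        return frontier

    def project(self, relation):
        """Projection: pair up sources that share a target through `relation`."""
        sources_of = defaultdict(set)
        for (source, rel), targets in self.out_edges.items():
            if rel == relation:
                for target in targets:
                    sources_of[target].add(source)
        pairs = set()
        for sources in sources_of.values():
            ordered = sorted(sources)
            for i, a in enumerate(ordered):
                for b in ordered[i + 1:]:
                    pairs.add((a, b))
        return pairs


# Hypothetical toy data, for illustration only.
net = MultiRelationalNetwork()
for name, kind in [("artist_a", "Artist"), ("artist_b", "Artist"),
                   ("work_1", "Artwork"), ("work_2", "Artwork"),
                   ("landscape", "Theme")]:
    net.add_node(name, kind)
net.add_edge("artist_a", "CREATED", "work_1")
net.add_edge("artist_b", "CREATED", "work_2")
net.add_edge("work_1", "HAS_THEME", "landscape")
net.add_edge("work_2", "HAS_THEME", "landscape")

themes_of_a = net.traverse("artist_a", ["CREATED", "HAS_THEME"])
works_sharing_theme = net.project("HAS_THEME")
```

Here `themes_of_a` collects the themes reachable from one artist in two hops, and `works_sharing_theme` is a single-relation network over artworks derived from the richer one.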

Some simple case studies…

Evolution of Hispanic Baroque

[Figure: panels for successive 25-year periods, 1550-1575 through 1825-1850]

Analysis of Ecuadorian Cultural Heritage

Contemporary art (Museo Reina Sofía)

Temporal Evolution of Element Descriptors

Theme by artists

Artist Clustering

Gastronomic creativity in elBulli

Schema

BD + Archives = New Challenges

• Big Data: a concept from the business world

• Volume + Variety + Velocity + Veracity

• Curation Problem?

• Automatic Curation

• No curation at all!

• Hybrid Systems by…

• Merging:

• Merging Networks

• Mining Networks
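
One way to read "Merging Networks" is as a union of typed edges after reconciling node identifiers across sources. The sketch below assumes each network is a set of `(source, relation, target)` triples and that a hand-made `same_as` mapping says which identifiers denote the same entity. Both the representation and every name in it are assumptions made for illustration, not something the talk specifies.

```python
def merge_networks(edges_a, edges_b, same_as):
    """Union two sets of (source, relation, target) triples.

    `same_as` maps an identifier to the canonical identifier that replaces it;
    identifiers missing from the mapping are kept unchanged.
    """
    def canonical(node):
        return same_as.get(node, node)

    merged = set()
    for source, relation, target in edges_a | edges_b:
        merged.add((canonical(source), relation, canonical(target)))
    return merged


# Hypothetical records from two archives that describe the same painter.
archive_a = {("painter_17", "CREATED", "work_1")}
archive_b = {("p-0017", "CREATED", "work_2"),
             ("p-0017", "BORN_IN", "city_1")}
same_as = {"p-0017": "painter_17"}

merged = merge_networks(archive_a, archive_b, same_as)
```

In this reading, building the `same_as` mapping is where curation effort (manual, automatic, or hybrid) would concentrate; the union itself is trivial.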

Making the defect a virtue: Semantics as a wealth of information

What can be automated? (… development required)

Manual Curation Problem

Automatic Annotation (Ontologies)

Machine Learning

Data Science

Link Discovery

Semantic Reasoning

Formal Concept Analysis

BD + Archives + AI = New Opportunities

(Ontologies)
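
As a deliberately simple instance of link discovery, the sketch below scores unlinked pairs of nodes by the Jaccard similarity of their neighbour sets. This is a standard baseline swapped in for illustration, not a method named in the talk; the function name and the toy graph are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(neighbours):
    """Score each unlinked node pair by the Jaccard similarity of its neighbour sets.

    `neighbours` maps every node to the set of nodes it is linked to (undirected).
    """
    scores = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # already linked, nothing to discover
        union = neighbours[a] | neighbours[b]
        if union:
            scores[(a, b)] = len(neighbours[a] & neighbours[b]) / len(union)
    return scores


# Hypothetical undirected toy graph.
graph = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}
scores = jaccard_link_scores(graph)
best_pair = max(scores, key=scores.get)
```

Nodes `a` and `b` have identical neighbourhoods but no direct link, so they come out as the strongest candidate for a missing link.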

We need to keep developing…

• Improve tools for the Multi-relational Networks methodology.

• New visualizations to get insight from networks.

• New algorithms to extract information from network data.

• Algorithms for automatic merging of complex networks.

• Improve data conversions:

• … to text (advanced OCR)

• … to network (advanced understanding)

Conclusions

In the face of big problems:

• Multidisciplinary approaches.

• Development of new tools (theoretical and practical).

• Need for adequate training.

Disciplines involved:

• Humanities (diverse) for targeting decisions and semantic interpretation.

• Mathematics for theoretical modeling.

• Computer Science for the effective development of visualization and manipulation.

• Data Science for the analysis tools.

Contact?

If you wish to get in touch:

Fernando Sancho Caparrini: [email protected]

(or: [email protected])

Thank you for your attention!