1 | Internal use only Multi Relational Big Data: the next challenge ALA-ICA 2017
MULTI-RELATIONAL BIG DATA: THE NEXT CHALLENGE
Fernando Sancho Caparrini Universidad de Sevilla
Agenda
• About information
• Ideal structures
• An initial proposal
• … and Big Data
• Conclusions
Introduction
20th Century
• Technological Revolution
• Democratization of computers as a working tool
• …
• A large amount of data has been produced and digitized
21st Century
• Information Revolution
• Democratization of automatic information processing
About Information
• Unstructured vs. Structured
• Raw vs. Preprocessed
• Schemaless vs. Schema-based
• Amenable to massive analysis vs. Not
Two very different contexts:
• Scientific Areas
• Humanistic Areas
About Information
Examples of successful projects where information is strongly structured:
• Mathematics
• Physics
• Biological Databases
• Chess, Go,… (Games)
• Image Processing
• Expert Systems (health, insurance,…)
Information in Archives/Humanities
Main features:
• Structural Complexity
• Semantic Complexity
• Contextual Complexity
Risk: structuring this information can sacrifice its interpretative facets
Searching for the Perfect Structure
• Existence of common (and ideal) structures in many disciplines:
  • Vector Spaces in Natural Sciences
  • Data Frames in Social Sciences
  • …? for general complex purposes
• The importance of standards:
  • Theory to support reasoning
  • Case studies to compare developments
  • Format conversion and adaptation
What we need…
• Flexibility
  • Multilevel
  • Schemaless
• Tools
  • Storage
  • Handling
  • Analysis
• Standards-based
• Mergeable
• Natural
• Robust
• Reusable
• Verifiable
A Proposal: Multi-relational Networks
• Based on a robust mathematical theory: Graph Theory
• Methodology:
  • Schema Generation
  • Information Projection
  • Analysis by Long-Distance Queries
  • Link Discovery
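The methodology above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the node identifiers, types and relation names (`artist_A`, `created`, `located_in` and so on) are hypothetical toy data; an inverse relation is written with a `~` prefix purely as a convention of this sketch; "schema generation" is read as collecting the (source type, relation, target type) triples present in the data; a long-distance query is read as following a chain of relations; and link discovery uses plain walk counting along a relation chain as a stand-in heuristic.

```python
from collections import defaultdict

# Hypothetical toy data: node id -> node type, and typed edges (source, relation, target).
NODE_TYPES = {
    "artist_A": "Artist", "artist_B": "Artist", "artist_C": "Artist",
    "work_1": "Artwork", "work_2": "Artwork", "work_3": "Artwork",
    "city_X": "City", "city_Y": "City",
}
EDGES = [
    ("artist_A", "created", "work_1"),
    ("artist_B", "created", "work_2"),
    ("artist_C", "created", "work_3"),
    ("work_1", "located_in", "city_X"),
    ("work_2", "located_in", "city_X"),
    ("work_3", "located_in", "city_Y"),
]


def build_index(edges):
    """Map (node, relation) to neighbouring nodes; '~rel' is the inverse of 'rel'."""
    index = defaultdict(set)
    for src, rel, tgt in edges:
        index[(src, rel)].add(tgt)
        index[(tgt, "~" + rel)].add(src)
    return dict(index)


def generate_schema(node_types, edges):
    """Schema generation: the (source type, relation, target type) triples seen in the data."""
    return {(node_types[s], r, node_types[t]) for s, r, t in edges}


def follow(index, start, path):
    """Long-distance query: nodes reached from `start` along the relation chain `path`."""
    frontier = {start}
    for rel in path:
        frontier = {m for n in frontier for m in index.get((n, rel), ())}
    return frontier


def count_walks(index, start, path):
    """Number of distinct walks from `start` along `path`, keyed by end node."""
    counts = {start: 1}
    for rel in path:
        nxt = defaultdict(int)
        for node, c in counts.items():
            for m in index.get((node, rel), ()):
                nxt[m] += c
        counts = nxt
    return dict(counts)


def suggest_links(index, node_types, node_type, path):
    """Score unordered pairs of `node_type` nodes by walk count along `path`.

    Assumes `path` is symmetric (it reads the same reversed, with inverses),
    so counting from the lexicographically smaller id of each pair suffices.
    """
    scores = {}
    for a in sorted(n for n, t in node_types.items() if t == node_type):
        for b, c in count_walks(index, a, path).items():
            if a < b:  # each unordered pair once; self-walks (a == b) are skipped
                scores[(a, b)] = c
    return scores


INDEX = build_index(EDGES)
SCHEMA = generate_schema(NODE_TYPES, EDGES)
CITIES_OF_A = follow(INDEX, "artist_A", ["created", "located_in"])
# Artists whose works share a city: created -> located_in -> ~located_in -> ~created.
SAME_CITY = ["created", "located_in", "~located_in", "~created"]
LINKS = suggest_links(INDEX, NODE_TYPES, "Artist", SAME_CITY)

print(sorted(SCHEMA))
print(CITIES_OF_A)
print(LINKS)
```

With this toy data, the only suggested link is between `artist_A` and `artist_B`, whose works share `city_X`. The relation chain, rather than a single edge, is what makes the query "long-distance"; richer scoring schemes could replace the raw walk count.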
Some simple case studies…
Evolution of Hispanic Baroque
[Figure: one panel per 25-year period, from 1550-1575 to 1825-1850]
Analysis of Ecuadorian Cultural Heritage
Contemporary art (Museo Reina Sofía)
[Figure panels: Temporal Evolution of Element Descriptors; Themes by Artist; Artist Clustering]
Gastronomic Creativity in elBulli
[Figure: schema]
BD (Big Data) + Archives = New Challenges
• Big Data: a concept from the business world
  • Volume + Variety + Velocity + Veracity
• Curation problem?
  • Automatic curation
  • No curation at all!
  • Hybrid systems by…
• Merging:
  • Merging Networks
  • Mining Networks
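The "Merging Networks" step can be illustrated with a minimal Python sketch, assuming that each archive has already been turned into a typed network and that the correspondence between node identifiers in the two sources (the hypothetical `ALIAS` mapping) is known in advance. All names and data are illustrative assumptions, and entity resolution, the step that would actually produce such a mapping, is deliberately left out.

```python
# Hypothetical toy networks: (node id -> type, set of (source, relation, target) edges).
ARCHIVE_A = (
    {"doc_1": "Document", "person_P": "Person"},
    {("doc_1", "mentions", "person_P")},
)
ARCHIVE_B = (
    {"person_P_variant": "Person", "place_Q": "Place"},
    {("person_P_variant", "born_in", "place_Q")},
)
# Hypothetical alignment: ids in B that denote the same entity as an id in A.
ALIAS = {"person_P_variant": "person_P"}


def merge_networks(net_a, net_b, alias=None):
    """Merge net_b into net_a, renaming net_b ids via `alias`.

    Returns (node_types, edges, conflicts). On a type clash the type from
    net_a is kept and the clash is recorded in `conflicts` for manual review.
    """
    alias = alias or {}
    types_a, edges_a = net_a
    types_b, edges_b = net_b

    def canon(node):
        # Map a net_b id onto its net_a counterpart when one is known.
        return alias.get(node, node)

    node_types = dict(types_a)
    conflicts = []
    for node, node_type in types_b.items():
        n = canon(node)
        if n in node_types and node_types[n] != node_type:
            conflicts.append((n, node_types[n], node_type))
        else:
            node_types.setdefault(n, node_type)
    edges = set(edges_a) | {(canon(s), r, canon(t)) for s, r, t in edges_b}
    return node_types, edges, conflicts


MERGED_TYPES, MERGED_EDGES, CONFLICTS = merge_networks(ARCHIVE_A, ARCHIVE_B, ALIAS)
print(MERGED_TYPES)
print(sorted(MERGED_EDGES))
print(CONFLICTS)
```

After the merge, `doc_1` reaches `place_Q` through `person_P`, a connection that exists in neither source alone and that a subsequent mining step could exploit. Recording type clashes instead of silently overwriting them is one possible way to leave room for the hybrid, partly manual curation mentioned above.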
Making the Defect a Virtue: Semantics as a Wealth of Information
BD + Archives + AI = New Opportunities
• Manual Curation Problem
• What can be automated? (…development required)
  • Automatic Annotation (Ontologies)
  • Machine Learning
  • Data Science
  • Link Discovery
  • Semantic Reasoning
  • Formal Concept Analysis
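Formal Concept Analysis, one of the techniques listed above, groups objects and attributes into formal concepts: pairs (extent, intent) in which the intent is exactly the set of attributes shared by every object in the extent, and the extent is exactly the set of objects that have all of those attributes. The Python sketch below enumerates the concepts of a tiny, hypothetical document/keyword context by brute force, closing every attribute subset. This is exponential and only meant as an illustration; dedicated algorithms such as Ganter's NextClosure are used on real data.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "doc_1": {"letter", "religious"},
    "doc_2": {"letter", "commercial"},
    "doc_3": {"inventory", "commercial"},
}
ATTRIBUTES = frozenset().union(*CONTEXT.values())


def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, a in CONTEXT.items() if attrs <= a)


def intent(objs):
    """Attributes shared by every object in `objs` (all attributes if `objs` is empty)."""
    shared = set(ATTRIBUTES)
    for o in objs:
        shared &= CONTEXT[o]
    return frozenset(shared)


def formal_concepts():
    """Brute force: closing each attribute subset Y yields the concept (Y', Y'')."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            objs = extent(frozenset(subset))
            concepts.add((objs, intent(objs)))
    return concepts


CONCEPTS = formal_concepts()
# Print from the most general concept (largest extent) to the most specific.
for objs, attrs in sorted(CONCEPTS, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

Every concept is reached this way, because closing a concept's own intent returns that same concept; the set removes the duplicates produced by different subsets with the same closure.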
We need to keep developing…
• Improve tools for the Multi-relational Networks methodology.
• New visualizations to gain insight from networks.
• New algorithms to extract information from network data.
• Algorithms for automatic merging of complex networks.
• Improve data conversions:
  • … to text (advanced OCR)
  • … to networks (advanced understanding)
Conclusions
In the face of big problems:
• Multidisciplinary approaches
• Development of new tools (theoretical and practical)
• Need for adequate training
Disciplines involved:
• Humanities (diverse): targeting decisions and semantic interpretation
• Mathematics: theoretical modeling
• Computer Science: effective development of visualization and manipulation tools
• Data Science: analysis tools
Contact?
If you wish to get in touch:
Fernando Sancho Caparrini: [email protected]
(or: [email protected])
Thank you
for your attention!