24
Introduction to NoSQL databases [email protected] Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years Unprecedent management of this volume Need distribution: Computation Data Clusters: Huge number of servers Ex: Google, Amazon, Facebook Google DataCenter : 5000 servers/data center, ~1M de servers Facebook : 1 PetaBytes of data Big Data Context

Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases [email protected] Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Introduction to NoSQL databases

[email protected]

Applications and Web platformsExponential growth of the amount of Data

x2 / 2 yearsUnprecedent management of this volume• Need distribution:

• Computation• Data

• Clusters: Huge number of serversEx: Google, Amazon, FacebookGoogle DataCenter :

• 5000 servers/data center, ~1M de serversFacebook :

• 1 PetaBytes of data

Big Data Context

Page 2: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Standard Data Warehousing Methodology

• VolumeDesigned to store GB/TBBut needs PB (maybe EB)

• VarietyHeterogeneous dataVariable types Text, semi-structured

• VelocityFast data production

DBMS vs the Problem of 3V

Page 3: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

DBMS vs the Problem of 3V

• Pros DBMS▫ Joins between tables▫ SQL: Rich query language▫ Constraints: integrity, data types, normal forms

• Cons for Distribution▫ How to partition/place data?▫ How make joins?▫ How to manage fault tolerance?

DBMS vs Distribution

Page 4: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Distributed systems: BASE• Basically Available :

• Any request => An answer• Even in a changing state

• Soft-state : • Opposite to Durability.• System’s state (servers or data) could

change over time (without any update)• Eventually consistent :

• With time, data can be consistent• Updates have to be propagated

ACID vs BASE

0 141 2 3 4 5 6 7 8 9 10 11 12 13

ACID BASEAtomicityConsistencyIsolationDurability

Basically AvailableSoft-StateEventually Consistent

NoSQLSGBDR VS

Local DBMS: ACID transactions• Atomicity: integral completion or none• Consistency: consistent at start and end• Isolation: no communication between them• Durability: an operation cannot be reversed

NoSQL : Not Only SQL• New data storage/management approach• Scales up the system (through distribution)• Complex metadata management (schemaless)

Do not substitute DBMS!!NoSQL’s target:• Very huge volume of data (PetaBytes)• Very short response time• Consistency is not mandatory

What is NoSQL?

Page 5: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

• No more relations• No schema (hardly data types)• Dynamic schemaØ Collections

• No more tuples• Nesting• ArraysØ JSON documents

• Distribution• Parallelism (abstraction of Map/Reduce, not cloud computing)

• Data replication• Availability vs Consistency• No more transactionsØ Few writes, many reads

NoSQL characteristics

• Files are segmented in chunks (64MB, 256MB…)• Chunks are distributed in the cluster• Data are placed according to a sharding strategy

• 3 types of technics of sharding:1. HDFS: Resource allocation based (racks, switches, datacenter)

• ++ Fault tolerance and massive computations2. Clustered index: Tree-based structure (sort)

• ++ Physical data clustering and dynamicity3. DHT: Hash-based structure

• ++ elasticity and self-management

Sharding: Data Distribution Strategies

Page 6: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Resources allocation strategy• Distributed file system• Rely on servers load balancing• Dedicated to fault tolerance• Dynamic allocation and optimization

Sharding with HDFS

(1) See Hadoop Distributed File System http://chewbii.com/transparents-hdfs-hadoop/

Sharding with HDFSExample

NameNode

DataNode

DataNode

DataNode

DataNode

DataNode

DataNode

DataNode

NameNode

DataNode

DataNodeRack Rack

DataNode

DataNode

DataNode

DataNode

DataNode

DataNodeRack

DataNode

NameNode

DataNode

DataNode

switch

switch

switch

Rack Rack

DataNode

DataNode

DataNode

DataNodeChunk 3

DataNode

DataNodeRack

DataNode

NameNode

Chunk 1

DataNodeChunk 4

DataNode

Chunk 2

Chunk 5

Chunk 6

switch

switch

switch

Rack Rack

DataNode

DataNode

DataNode

DataNodeChunk 3

DataNode

DataNodeRack

DataNode

NameNode

Chunk 1 Chunk 2

DataNodeChunk 4

Chunk 5DataNode

Chunk 2

Chunk 1 Chunk 3Chunk 4

Chunk 6Chunk 5

Chunk 6

switch

switch

switch

Rack Rack

DataNode

DataNode

DataNode

DataNodeChunk 3

DataNode

DataNodeRack

DataNode

NameNode

Chunk 1 Chunk 2

DataNode

Chunk 5

Chunk 4Chunk 1

Chunk 2

Chunk 3

Chunk 5DataNode

Chunk 2

Chunk 1 Chunk 3

Chunk 4

Chunk 4

Chunk 6Chunk 5

Chunk 6

Chunk 6

switch

switch

switch

Rack Rack

DataNode

DataNode

DataNode

DataNodeChunk 3

DataNode

DataNodeRack

DataNode

NameNode

Chunk 1

Secondary NameNode

Chunk 2

DataNode

Chunk 5

Chunk 4Chunk 1

Chunk 2

Chunk 3

Chunk 5DataNode

Chunk 2

Chunk 1 Chunk 3

Chunk 4

Chunk 4

Chunk 6Chunk 5

Chunk 6

Chunk 6

switch

switch

switch

Page 7: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Distributed computation Fault Tolerance

Sharding with HDFSNoSQL solutions

Distributed clustered index• Sorted data according to a key

• Primary key• Build chunks (default 256Mo)• Distribute chunks• Replicate chunks

�Range queries and group by queries⚠Only one key can be chosen

Sharding with Clustered index

(2) See « index non-dense » http://chewbii.com/videos-optimisation-bases-de-donnees/ (Vidéo 4)

Page 8: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Sharding with Clustered IndexExample

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud Noeud Noeud NoeudNoeud + réplicas

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud + réplicas Noeud + réplicas Noeud + réplicas Noeud + répl.Noeud + réplicas

Feuille-inf -> 30 000

Feuille60 001 -> +inf

Feuille30 001 -> 60 000

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud + réplicas Noeud + réplicas Noeud + réplicas Noeud + répl.Noeud + réplicas

Racine

Feuille-inf -> 30 000

Feuille60 001 -> +inf

Feuille30 001 -> 60 000

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud + réplicas Noeud + réplicas Noeud + réplicas Noeud + répl.Noeud + réplicas

RouteurRacine

Feuille-inf -> 30 000

Feuille60 001 -> +inf

Feuille30 001 -> 60 000

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud + réplicas Noeud + réplicas Noeud + réplicas Noeud + répl.Noeud + réplicas

Routeur x3Racine

Feuille-inf -> 30 000

Feuille60 001 -> +inf

Feuille30 001 -> 60 000

Chunk-inf -> 10 000

Chunk10 001 -> 20 000

Chunk20 001 -> 30 000

Chunk30 001 -> 40 000

Chunk40 001 -> 50 000

Chunk50 001 -> 60 000

Chunk60 001 -> 70 000

Chunk70 001 -> 80 000

Chunk80 001 —> inf

Noeud + réplicas Noeud + réplicas Noeud + réplicas Noeud + répl.

Clustering Complex queries

Sharding with Clustered IndexSolutions

Page 9: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Distributed Hash Table (DHT3)• Ring of virtual servers

• Max 264

• Unique hash table: splitted & distributed• Routing• Efficiency• Self-Management (no main server)

Sharding with DHT

(3) See DHT http://chewbii.com/hachagedynamique/ (Vidéo 4 & 5)

Sharding with DHTExample

0 = 264 = 4 x 262

262

263 = 2 x 262

3 x 262

0 = 264 = 4 x 262

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

0 = 264 = 4 x 262

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

d1

d2

d3

d4

d5

d6

d7

0 = 264 = 4 x 262

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

d1

d2

d3

d4

d5

d6 Chunk S1d1

d5 d2 d4réplicas

resp

Chunk S2d5 d2

d4réplicas

resp

d3

Chunk S3d4

réplicas

resp

d3 d6

d7

d7

Chunk S4d3

réplicas

resp

d6 d1

Chunk S5d6

réplicas

resp

d1 d5 d2

d7

d7

0 = 264 = 4 x 262

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

d1

d2

d3

d4

d5

d6 Chunk S1d1

d5 d2 d4réplicas

resp

Chunk S2d5 d2

d4réplicas

resp

d3

Chunk S3d4

réplicas

resp

d3 d6

d7

d7

Chunk S4d3

réplicas

resp

d6 d1

Chunk S5d6

réplicas

resp

d1 d5 d2

d7

d7

0 = 264 = 4 x 262d3 ?

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

d1

d2

d3

d4

d5

d6 Chunk S1d1

d5 d2 d4réplicas

resp

Chunk S2d5 d2

d4réplicas

resp

d3

Chunk S3d4

réplicas

resp

d3 d6

d7

d7

Chunk S4d3

réplicas

resp

d6 d1

Chunk S5d6

réplicas

resp

d1 d5 d2

d7

d7

0 = 264 = 4 x 262d3 ?

Portion du

serveur S1

Portion duserveur S2

Portion du serveur S3

Port

ion

duse

rveu

r S4

Portion du

serveur S5

S3

S4

S5

S1

S2

262

263 = 2 x 262

3 x 262

d1

d2

d3

d4

d5

d6 Chunk S1d1

d5 d2 d4réplicas

resp

Chunk S2d5 d2

d4réplicas

resp

d3

Chunk S3d4

réplicas

resp

d3 d6

d7

d7

Chunk S4d3

réplicas

resp

d6 d1

Chunk S5d6

réplicas

resp

d1 d5 d2

d7

d7

0 = 264 = 4 x 262d3 ?

Page 10: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Elasticity Self-Management

Sharding with DHTSolutions

NoSQL Data Type Families

Nicolastype:prof

RégisLuc

type:resp formation

Célinetype:prof

CNAMadresse:…

OCadresse:…

CentraleSupélec

employeur

employeur

employeur

vacation

Paris Gif-sur-Yvette

siège social siège social siège social

vacation employeur

vacation

vacation

Key-Value stores

Document-orientedGraph oriented

Column-oriented

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "type": "prof", "lieu": { "nom": "ESILV", "adresse":"12 avenue Léonard de Vinci", "ville": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "intérêts": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "lieu": { "nom": "ESILV", "adresse":"12 avenue Léonard de Vinci", "ville": "Paris La Défense" }, "spec": ["Signal Processing", "Tourism"], "intérêts": ["Music"]}

{ "_id": "Philippe", "type": "prof", "lieu": { "nom": "Cnam", "adresse":"2 rue Conté", "ville": "Paris" }, "spec": ["BDD", "NoSQL""Music"]}

{ "_id": "Céline", "type": "prof", "lieu": { "nom": "CentraleSupelec", "adresse":"rue Joliot-Cury", "ville": "Gif-sur-Yvette » }, "spec": ["Ontologie", "Logique formelle", "Visualisation"]}

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

Page 11: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Similar to a distributed “HashMap”

Key + Value:• No fixed schema on values (strings, object, integer, binaries…)

Drawbacks:• No structures nor typing• No structured-based queries

Key-Value Store

Key-Value StoreExample

22

Relational database

status employer spec

prof ESILV BDD, NoSQLresp

formation ESILV Signal Processing, Visualization

prof CNAM BDD, NoSQL, Music

prof CentraleSupelec Ontology, Formal Logic, Visualization

id

Nicolas

Gaël

Philippe

Céline

hobbies

BZH, Star Wars

Tourism, Music

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Page 12: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

CRUD• CREATE ( key, value)

• CREATE ("Nicolas", "type:'prof',employer:'ESILV',spec:'BDD,NoSQL',hobbies:'BZH,Star Wars' ") à OK• READ( key)

• READ("Nicolas") à "type:'prof',employer:'ESILV',spec:'BDD,NoSQL',hobbies:'BZH,Star Wars' "• UPDATE( key, value)

• UPDATE("Nicolas", "type:'prof',employer:ESILV',spec:'BDD,NoSQL' ") à OK• DELETE( key)

• DELETE("Nicolas") à OK

Key-Value StoreQueries

23

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQLhobbies: BZH, Star Wars

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

Nicolas Gaël Philippe Céline

type: profemployer: ESILV

spec: BDD,NoSQL

type: resp formationemployer: ESILV

spec: Signal Processing,Visualization

hobbies: Tourism, Music

type:profemployer:CNAM

spec: BDD, NoSQL, Music

type: profemployer: CentraleSupelec

spec: Ontology, Formal Logic, Visualization

Keys

values

• Web site logs store• Web site/DB cache• User profil management• Sensors status• E-commerce bucket storage

Key-Value StoreUse cases

Page 13: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Efficiency Easy to set up

Key-value StoreSolutions

25

Column-based storage• DBMS: tuples (lines/rows) are stored in a whole• Column-oriented: attributes are separated in column-families and stored independently• Efficient queries on columns (aggregates)

Easy to insert a new column• Dynamic schema

Column-Oriented Store

Page 14: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

status employer spec

prof ESILV BDD, NoSQLresp

formation ESILV Signal Processing, Visualization

prof CNAM BDD, NoSQL, Music

prof CentraleSupelec Ontology, Formal Logic, Visualization

id

Nicolas

Gaël

Philippe

Céline

hobbies

BZH, Star Wars

Tourism, Music

Column-Oriented StoresExample

27

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

Column-oriented queries• How many professors (status) are at

ESILV (employer)

Column-Oriented StoresQueries

28

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

status

profresp

formationprof

prof

id

Nicolas

Gaël

Philippe

Céline

employer

ESILV

ESILV

CNAM

CentraleSupelec

id

Nicolas

Gaël

Philippe

Céline

spec

BDD

Signal Processing

BDD

Ontology

id

Nicolas

Gaël

Philippe

Céline

NoSQLNicolas

VisualizationGaël

NoSQLPhilippe

MusicPhilippe

Formal LogicCéline

VisualizationCéline

id

Nicolas

Gaël

hobbies

BZH

Tourism

Nicolas Star Wars

Gaël Music

Page 15: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

• Statistics on online polls• Logging• Category searchs on products (ie. Ebay)• Reporting

Column-Oriented StoresUse cases

Aggregates Correlations

Column-Oriented StoresSolutions

30

Page 16: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Based on the key-value store• Add semi-structured data (JSON/XML)• Keys indexing• Lists of values, nesting• Allow fusion of concepts

HTTP API• More complex than CRUD• Rich queries (database)

Document-Oriented Stores

Document-Oriented StoresExample

32

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "type": "prof", "lieu": { "nom": "ESILV", "adresse":"12 avenue Léonard de Vinci", "ville": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "intérêts": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "lieu": { "nom": "ESILV", "adresse":"12 avenue Léonard de Vinci", "ville": "Paris La Défense" }, "spec": ["Signal Processing", "Tourism"], "intérêts": ["Music"]}

{ "_id": "Philippe", "type": "prof", "lieu": { "nom": "Cnam", "adresse":"2 rue Conté", "ville": "Paris" }, "spec": ["BDD", "NoSQL""Music"]}

{ "_id": "Céline", "type": "prof", "lieu": { "nom": "CentraleSupelec", "adresse":"rue Joliot-Cury", "ville": "Gif-sur-Yvette » }, "spec": ["Ontologie", "Logique formelle", "Visualisation"]}

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

status employer spec

prof ESILV BDD, NoSQLresp

formation ESILV Signal Processing, Visualization

prof CNAM BDD, NoSQL, Music

prof CentraleSupelec Ontology, Formal Logic, Visualization

id

Nicolas

Gaël

Philippe

Céline

hobbies

BZH, Star Wars

Tourism, Music

Page 17: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Nicolas Gaël Philippe Céline

{ "_id": "Nicolas", "status": "prof", "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["BDD","NoSQL"], "hobbies": ["BZH", "Star Wars"]}

Keys

documents

{ "_id": "Gaël", "status" : "resp formation "employer": { "name": "ESILV", "address":"12 avenue Léonard de Vinci", "city": "Paris La Défense" }, "spec": ["Signal Processing", "Visualization"], "hobbies": ["Tourism", "Music"]}

{ "_id": "Philippe", "status": "prof", "employer": { "name": "Cnam", "address":"2 rue Conté", "city": "Paris" }, "spec": ["BDD", "NoSQL", "Music"]}

{ "_id": "Céline", "status": "prof", "employer": { "name": "CentraleSupelec", "address":"rue Joliot-Cury", "city": "Gif-sur-Yvette » }, "spec": ["Ontology", "Formal Logic", "Visualization"]}

Manipulations on documents content• Establishment (employer.name) of professors (status) specialized in BDD (in spec)

Document-Oriented StoresQueries

33

• CMS: Content Management Systems• Online libraries• Products management• Software logging• Metadata on multimedia stores

• Collection of complex events• Emails storage• Social networks logging

Document-Oriented StoresUse cases

Page 18: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Richness of queries Manage objects

Document-Oriented StoresSolutions

35

Storage: nodes, relations and properties• Graph Theory• Pattern queries on the graph• Data are loaded on demand• Difficulties for data modeling

Graph-Oriented Stores

Page 19: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

status employer spec

prof ESILV BDD, NoSQLresp

formation ESILV Signal Processing, Visualization

prof CNAM BDD, NoSQL, Music

prof CentraleSupelec Ontology, Formal Logic, Visualization

id

Nicolas

Gaël

Philippe

Céline

hobbies

BZH, Star Wars

Tourism, Music

Graph-Oriented StoresExample

37

Nicolas PhilippeGaël CélineNicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

Nicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

CentraleSupelecCnamESILV

Nicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

CentraleSupelecCnam

employer employer employeremployer

ESILV

Nicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

CentraleSupelecCnam

employer

exemployer

vacation employer employeremployer

ESILV

Nicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

CentraleSupelec

Paris Gif-sur-Yvette

Cnam

employer

exemployer

vacation employer employer

location location location

employer

ESILV

Nicolastype:prof

Philippetype:prof

Gaëltype:resp

Célinetype:prof

CentraleSupelec

Paris Gif-sur-Yvette

Cnam

employer

exemployer

vacation employer employer

location location location

employer

ESILV

Pattern queries• Persons who had 2 employers at Paris

Graph-Oriented StoresQueries

38

Page 20: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

• Pattern recognition• Recommandation• Fraud detection• SIG (road network, electricty, reachability)

• Graph computations• Shorted path• PageRank• Connexity• Communities detection

• Linked data

Graph-Oriented StoresUse cases

FlockDB

Network Recommandation

Graph-Oriented StoresSolutions

40

Page 21: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

NoSQL Community Impact

https://db-engines.com/en/ranking

3 main properties for distributed management1. Consistency:

• A data returns the proper value at any time (coherency)

2. Availability:• Even if a server is down, data is

available3. Partition Tolerance:

• Even if the system is partitioned, a query must have an answer (unless for global failures)

Theorem: A distributed, networked system can have only two of these three properties.

The CAP Theorem[Brewer 2000]

Consistency

Availability

Partition Tolerance

CA AP

CP

Page 22: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v1 v1 v1 v1v1

écriture

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v1 v1 v1v1

L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v1 v1 v1v1

v2 v2L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v1 v1 v1v1

L1écriture L2 écriture

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v2v1 v1 v1v1

L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v2v1 v1 v1v1

L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

v2v1 v2v1 v1 v1v1

attenteL1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

v2v1 v2v1 v2v1 v1v1

attente

ack

L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

v2v1 v2v1 v2v1 v1v1

v2 v2L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

v2v1 v2v1 v2v1 v1v1

L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

écriture

v2v1 v2v1 v2v1 v2v1 v1

L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

L1écriture L2

v2v1 v2v1 v2v1 v2v1 v1

v2 v1L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

L1écriture L2

asynchrone

v2v1 v2v1 v2v1 v2v1 v1

v2 v1L1écriture L2 L1écriture L2

CACohérence + Disponibilité

CPCohérence + Distribution

APDisponibilité + Distribution

synchrone

L1écriture L2

asynchrone

v2v1 v2v1 v2v1 v2v1 v2v1

v2 v1

The CAP TheoremIllustration

43

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

OracleMySQL

SQLServer

DB2PostgreSQL

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

RedisMemcached

CosmosDB

SimpleDB OracleMySQL

SQLServer

DB2PostgreSQL

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

RedisMemcached

CosmosDB

SimpleDB

BigTableHBase

ElasticsearchSpark

OracleMySQL

SQLServer

DB2PostgreSQL

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

MongoDB

RedisMemcached

CosmosDB

SimpleDB

BigTableHBase

ElasticsearchSpark

OracleMySQL

SQLServer

DB2PostgreSQL

CouchBase DynamoDB

Cassandra

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

MongoDB

RedisMemcached

CosmosDB

SimpleDB

BigTableHBase

ElasticsearchSpark

OracleMySQL

SQLServer

DB2PostgreSQL

CouchBase DynamoDB

Cassandra

Neo4jOrientDBFlockDB

Modèle

Availability(Disponibilité)

Consistency(Cohérence)

Partition tolerance(Distribution)

RelationnelOrienté Clé-ValeurOrienté ColonneOrienté DocumentOrienté Graphe

MongoDB*

RedisMemcached*

CosmosDB*

SimpleDB*

BigTableHBase

ElasticsearchSpark

OracleMySQL

SQLServer

DB2PostgreSQL

CouchBase*DynamoDB*

Cassandra*

Neo4jOrientDBFlockDB

* Possibilité de changer le mode de cohérence

Cohérence <-> Disponibilité

Modèle

The CAP TheoremCAP triangle

44

Page 23: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

Data synchronization• PAC

• Partitioning/Data movements• Between Availability and Consistency

• ELC• Normal state of the network• Between Latency and Consistency

CAP vs PACELC[Abadi 2012]

• PA/EL• Dynamo, Cassandra, Riak• Always available with a minimum of latency

• PC/EC• VoltDB/H-Store, MegaStore• Consistency whatever happens

• PA/EC• MongoDB• While partitioning: availability• In normal state: consistency

Initially XML used for complex internet communications (Web Services)

• Too verboseJSON (JavaScript Object Notation)

• Lightweight, text-oriented, language independent

• Used for several Web services (Google API, Twitter API)

JSON: Javascript Object Notation

Page 24: Introduction to NoSQLdatabases€¦ · Introduction to NoSQLdatabases nicolas.travers@devinci.fr Applications and Web platforms Exponential growth of the amount of Data x2 / 2 years

JSON: Javascript Object NotationKey-values & Typing

Key + Value• “lastname” : “Travers”• Keys with quotationsIdentifiers• “ _id ” commonly used• Overwrite already stored ids• Can be automaticaly generated

• Ex MongoDB: "_id" : ObjectId(1234567890)

Objects/Documents• Collection of key/values• { “_id” : 1234,

“lastname” : “Travers”,“firstname” : “Nicolas”,“kind” : 1}

Scalar• String, Integer, float, boolean, null…Documents• Objects {…}List• Arrays [ … ]• No typing

“lessons” : [ “SQL”, 1, 4.2, null, “NoSQL” ]• Can nest documents

“doc” : [ {“test” : 1},{“test” : {“nesting” : 1.0}},{“key” : “text”, “value” : null} ]

{“_id” : 1234,“lastname” : “Travers”, “firstname” : “Nicolas”,“employers” : [

{ “company” : “ESILV”, "starting_date" : “2018-09",“location” : {“street” : “12 avenue Léonard de Vinci”,“city” : “Paris La Défense”, “zip” : 92916},

{ “company” : “CNAM”, "starting_date" : “2007-09", "end_date" : “2018-08",

“location” : {“street” : “2 rue Conté”,“city” : “Paris”, “zip” : 75141},{ “company” : “UCP”, "starting_date" : “2006-09", "end_date" : “2007-08",

“location” : {“street” : “2 rue Conté”,“city” : “Cergy-Pontoise”, “zip” : 91000},{ “company” : “UVSQ”, "starting_date" : “2004-01", "end_date" : “2006-08",

“location” : {“street” : “45 avenue des Etats-Unis”,“city” : “Versailles”, “zip” : 78000},]

},“fields” : [ “DB” , “DB optimization”, “XML”, “NoSQL”, “IR” ],"hobbies" : ["Star Wars", "BZH"]

}

JSON: Javascript Object NotationComplete example