Introduction to NoSQL
(Not Only SQL)
By Antonio Castellón :: Multi-Disciplinary Engineer – Computer Science
May, 2015 - for Philip Morris International R&D
Problem: Complex Data
Problem: Data Complex to Model
Problem: Dynamic Data (Uncertainty)
End-user requirements and the data itself sometimes generate different types of uncertainty.
The NoSQL Jungle
Data – NoSQL – Different implementations
Currently 150+
Data - NoSQL – Comparing data structure
Image from: http://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
Data - NoSQL – Compare
98% of the business requirements
There are still billions of nodes and relationships
Data - NoSQL – Keys to fit
                 Key-value store   Column store   Document store    Graph database
Performance      High              High           High              Variable
Scalability      High              High           Variable (High)   Variable
Flexibility      High              Moderate       High              High
Complexity       None              Low            Low               High
Functionality    Variable (None)   Minimal        Variable (Low)    Graph theory
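As a rough illustration of the four models compared above, the same small fact (one user who knows another) could be shaped as follows. This is only a sketch with plain in-memory Python structures, not the API of any particular product; the users, keys and the KNOWS relationship name are hypothetical.

```python
import json

# Hypothetical data: user "u1" (Ana) knows user "u2" (Luis).

# Key-value store: an opaque value looked up by a single key.
key_value = {
    "user:u1": json.dumps({"name": "Ana", "knows": ["u2"]}),
    "user:u2": json.dumps({"name": "Luis", "knows": []}),
}

# Column store: each row key maps to a sparse set of named columns.
column_store = {
    "u1": {"profile:name": "Ana", "knows:u2": "1"},
    "u2": {"profile:name": "Luis"},
}

# Document store: nested, self-describing documents.
document_store = [
    {"_id": "u1", "name": "Ana", "knows": [{"id": "u2"}]},
    {"_id": "u2", "name": "Luis", "knows": []},
]

# Graph: nodes and relationships are both first-class elements.
graph = {
    "nodes": {"u1": {"name": "Ana"}, "u2": {"name": "Luis"}},
    "relationships": [("u1", "KNOWS", "u2")],
}
```

In the first three shapes the relationship is buried inside a value, a column name or a nested field; only in the graph shape is it an element that can be traversed directly.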
Data – Our selection
Graph Databases
Data – Graph Databases – Why?
Flexible data structure
It doesn’t matter if the relationships change in the future.
Closer match to business logic
Data – Graph Databases – Why?
Natural query system
You say what you want, not how to get it.
with recursive cluster (party, path, depth) as (
    select cast(@userId as character varying),
           cast(@userId as character varying),
           1
    union (
        select (case when this.party = amc.userA then amc.userB
                     when this.party = amc.userB then amc.userA end),
               (this.path || '.' || (case when this.party = amc.userA then amc.userB
                                          when this.party = amc.userB then amc.userA end)),
               this.depth + 1
        from cluster this, chat amc
        where ((this.party = amc.userA and position(amc.userB in this.path) = 0)
            or (this.party = amc.userB and position(amc.userA in this.path) = 0))
          AND this.depth < @depth + 1
    )
)
select party, path
from cluster
where not exists (
    select *
    from cluster c2
    where cluster.party = c2.party
      and (
        char_length(cluster.path) > char_length(c2.path)
        or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
      )
)
order by party, path;
SQL = several hours to execute
VS
START b = node:User(UserId='Manolo')
MATCH (b)--(friend)--(friendoffriend)
RETURN count(friendoffriend)
Cypher Language = 635ms
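To make the friend-of-friend pattern concrete, here is a minimal Python sketch of the same two-hop traversal over an in-memory adjacency map. It is an illustration only, not how Neo4j executes the query. It assumes undirected friendships with at most one relationship per pair of users, and the helper names and sample users other than 'Manolo' are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (user_a, user_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def count_friends_of_friends(adjacency, user):
    """Count two-hop paths user -- friend -- friend_of_friend.

    Paths that step straight back to the starting user are skipped,
    since that would reuse the same relationship twice.
    """
    count = 0
    for friend in adjacency.get(user, ()):
        for friend_of_friend in adjacency.get(friend, ()):
            if friend_of_friend != user:
                count += 1
    return count


# Hypothetical sample data.
edges = [
    ("Manolo", "Ana"),
    ("Manolo", "Luis"),
    ("Ana", "Eva"),
    ("Luis", "Eva"),
    ("Ana", "Luis"),
]
adjacency = build_adjacency(edges)
total = count_friends_of_friends(adjacency, "Manolo")
```

Like the pattern match, this counts paths rather than distinct people, so a user reachable through two different friends is counted twice.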
Data - Graph Databases – Why?
Fits very well with complex data
Data - Graph Databases – Why?
Fits very well with Bio-Informatics
0.9 billion relationships
Data – Graph Databases – Why?
Fast prototyping and development
We don’t need to spend much time defining a fine-grained schema.
Data - Graph Databases – What is it?
Properties
Labels
Relationships
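The three building blocks named above can be sketched as plain Python data classes: nodes carry labels and properties, and relationships connect two nodes and can carry properties of their own. This is a simplified model for illustration, not Neo4j's internal representation; the class names, the FRIEND type and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # e.g. {"User"}
    properties: dict = field(default_factory=dict)   # key/value attributes


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str                                     # e.g. "FRIEND"
    properties: dict = field(default_factory=dict)   # relationships have properties too


# Hypothetical sample data.
manolo = Node(1, {"User"}, {"UserId": "Manolo"})
ana = Node(2, {"User"}, {"UserId": "Ana"})
friendship = Relationship(manolo, ana, "FRIEND", {"since": 2015})
```

Because nothing here fixes which properties a node must have, a new property or relationship type can be added later without a schema migration.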
Data - Graph Databases - Implemented by …
Data - Graph Databases – Top 3
OrientDB
API: Java
Query methods: Traverser API, Blueprints, Rexster, own SQL-like query language, Gremlin
Consistency: ACID, MVCC
Staff (people) / Community: 3 / Low

Neo4j
API: Java, Python, JPython, Ruby, JRuby, JavaScript (Node.js), PHP, .NET, Django, Clojure, Spring, Scala, or REST (any language)
Query methods: Cypher (native/preferred), native Java APIs (special cases), Traverser API, REST, Blueprints, Gremlin
Consistency: ACID
Staff (people) / Community: 42 / Very High

DEX
API: Java, C++, .NET
Query methods: Native Java, C# and C++ APIs, Blueprints, Gremlin
Consistency: Consistency, durability, and partial isolation and atomicity
Staff (people) / Community: 5 / ?
Data - Graph Databases - Neo4j customers
Data - Graph Database - Neo4j - Partners
End
Thank you for your attention.