ORIENTDB: GO FAST IN A GRAPH WORLD
Andrea Giuliano (@bit_shark), February 25th, 2016
$ whoami
Andrea Giuliano @bit_shark
ONCE UPON A TIME - 1979
• first commercially available RDBMS
• written in assembly
• runs in 128K of memory
• no support for transactions
• support for basic SQL queries and joins
RELATIONAL DATABASES
• data is presented to the user in the form of rows and columns (a relation)
• data can be manipulated through relational operators in a tabular form
OVER TIME
• data started growing in size
• data became heterogeneous
• structured, semi-structured, unstructured data
• the rate at which data is generated increased
BIG DATA
30 YEARS LATER (2009)
• NoSQL movement
• some goals of NoSQL databases:
• being non-relational
• simplicity of design
• simpler horizontal scaling
• speed up some operations
• distributed
(SOME) TYPES OF NOSQL DATABASES
• document
• key-value
• object-oriented
• graph
• multi-model
DOCUMENT MODEL
• a document encapsulates data in a standard format: YAML, JSON, XML, BSON
{ "id": 45, "name": "Andrea", "fav_colours": ["blue", "green"], "driver_license": { "number": "AA123" } }
KEY-VALUE MODEL
• dictionary in which data is represented as a collection of key-value pairs
> SET akey "Andrea"
> GET akey
"Andrea"
akey: Andrea
OBJECT-ORIENTED MODEL
• data is represented in the form of objects
[Figure: class hierarchy with Animal at the top and Dog and Cat below it]
GRAPH MODEL
• data is represented in the form of a graph
MULTI-MODEL
• Key-Value
• Document
• Object-oriented
• Graph
RELATIONAL VS NOSQL
• how data is represented
• how data is related
• relational databases have the concept of joins
• NoSQL databases have multiple concepts
• aggregation
• relation (through edges)
ISSUES WITH JOIN
User
name id
Andrea 45
John 48
Steven 53
Bill 70
Like
user_id food_id
45 13
45 49
70 38
Food
id name
13 Pasta
38 Sushi
49 Kebab
63 Meat
SELECT F.name FROM User U, Like L, Food F WHERE U.name='Andrea' AND U.id=L.user_id AND L.food_id=F.id;
double JOIN per record at runtime
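The query above can be tried on the slide's sample rows with a minimal sketch like the following. It assumes an in-memory SQLite database from Python's standard library rather than any particular RDBMS, and it quotes the table names because LIKE is an SQL keyword. The function name foods_liked_by is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample rows copied from the slide; table names are quoted because
# LIKE is an SQL keyword.
conn.executescript('''
CREATE TABLE "User" (name TEXT, id INTEGER PRIMARY KEY);
CREATE TABLE "Food" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "Like" (user_id INTEGER, food_id INTEGER);
INSERT INTO "User" VALUES ('Andrea', 45), ('John', 48), ('Steven', 53), ('Bill', 70);
INSERT INTO "Food" VALUES (13, 'Pasta'), (38, 'Sushi'), (49, 'Kebab'), (63, 'Meat');
INSERT INTO "Like" VALUES (45, 13), (45, 49), (70, 38);
''')


def foods_liked_by(user_name):
    """Run the slide's double JOIN for one user (hypothetical helper)."""
    rows = conn.execute('''
        SELECT F.name
        FROM "User" U, "Like" L, "Food" F
        WHERE U.name = ? AND U.id = L.user_id AND L.food_id = F.id
        ORDER BY F.name
    ''', (user_name,)).fetchall()
    return [name for (name,) in rows]


print(foods_liked_by("Andrea"))  # ['Kebab', 'Pasta']
```

Every call recomputes both joins from scratch, which is the runtime cost the slide points at.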
ISSUES WITH JOIN
• the relationships are computed every time a query is performed
• time complexity grows with data: O(log n) for each index lookup
• heavy runtime cost with large datasets
• an index does not solve the problem
• it speeds up searches but slows down inserts, updates and deletes
• imagine this on billions of records
speakerdeck.com/agiuliano/index-management-in-depth
SUMMING UP JOIN
• a join operation involves
• searching for a record in the starting table (User)
• using its key to look up the intermediate table (Like) through its index
• traversing the intermediate table to look up the target table (Food) ids
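The three steps can be sketched by hand to make the repeated lookups visible. This is only an illustration: the "indexes" here are sorted Python lists searched with bisect, an assumption standing in for a real database index, and all names are hypothetical.

```python
from bisect import bisect_left

# Rows from the slides.
users = [("Andrea", 45), ("John", 48), ("Steven", 53), ("Bill", 70)]
likes = [(45, 13), (45, 49), (70, 38)]  # (user_id, food_id)
foods = [(13, "Pasta"), (38, "Sushi"), (49, "Kebab"), (63, "Meat")]

# Hypothetical indexes: sorted lists of (key, row position).
user_name_idx = sorted((name, pos) for pos, (name, _) in enumerate(users))
like_user_idx = sorted((uid, pos) for pos, (uid, _) in enumerate(likes))
food_id_idx = sorted((fid, pos) for pos, (fid, _) in enumerate(foods))


def index_lookup(index, key):
    """Binary search (O(log n)) for every row position stored under key."""
    i = bisect_left(index, (key,))
    positions = []
    while i < len(index) and index[i][0] == key:
        positions.append(index[i][1])
        i += 1
    return positions


def foods_liked_by(user_name):
    result = []
    # 1. search for the record in the starting table (User)
    for upos in index_lookup(user_name_idx, user_name):
        user_id = users[upos][1]
        # 2. use its key to look up the intermediate table (Like)
        for lpos in index_lookup(like_user_idx, user_id):
            food_id = likes[lpos][1]
            # 3. look up the target table (Food) by id
            for fpos in index_lookup(food_id_idx, food_id):
                result.append(foods[fpos][1])
    return result


print(foods_liked_by("Andrea"))  # ['Pasta', 'Kebab']
```

Each matched row triggers a further binary search, and all of them are repeated on every query.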
The more entries you have, the SLOWER your queries get
www.flickr.com/photos/blacktigersdream/8737830046
SAVING PROJECTIONS
advantages
• data is predetermined
disadvantages
• data synchronization
• solves only reads
UserLikesFood
User user_id Like food_id
Andrea 45 Pasta 13
Andrea 45 Kebab 49
Bill 70 Sushi 38
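A possible sketch of the trade-off, again assuming an in-memory SQLite database: the projection answers reads without a JOIN, but it goes stale as soon as the base tables change. The column names user_name and food_name and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "User" (name TEXT, id INTEGER);
CREATE TABLE "Food" (id INTEGER, name TEXT);
CREATE TABLE "Like" (user_id INTEGER, food_id INTEGER);
INSERT INTO "User" VALUES ('Andrea', 45), ('Bill', 70);
INSERT INTO "Food" VALUES (13, 'Pasta'), (38, 'Sushi'), (49, 'Kebab'), (63, 'Meat');
INSERT INTO "Like" VALUES (45, 13), (45, 49), (70, 38);
''')


def rebuild_projection():
    """Recompute UserLikesFood from the base tables (the costly JOIN)."""
    conn.executescript('''
    DROP TABLE IF EXISTS UserLikesFood;
    CREATE TABLE UserLikesFood AS
        SELECT U.name AS user_name, U.id AS user_id,
               F.name AS food_name, F.id AS food_id
        FROM "User" U, "Like" L, "Food" F
        WHERE U.id = L.user_id AND L.food_id = F.id;
    ''')


def liked_foods(user_name):
    """Read from the projection: a single-table query, no JOIN."""
    rows = conn.execute(
        "SELECT food_name FROM UserLikesFood WHERE user_name = ? ORDER BY food_name",
        (user_name,),
    ).fetchall()
    return [food for (food,) in rows]


rebuild_projection()
before = liked_foods("Bill")
conn.execute('INSERT INTO "Like" VALUES (70, 63)')  # Bill now likes Meat too
stale = liked_foods("Bill")   # the projection has not been synchronized
rebuild_projection()
fresh = liked_foods("Bill")
print(before, stale, fresh)   # ['Sushi'] ['Sushi'] ['Meat', 'Sushi']
```

Reads get cheaper, but every write now has to keep the projection in sync.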
RELATIONSHIPS IN THE NOSQL WORLD
RELATIONSHIPS IN DOCUMENTS
• embed information in documents where you need them
• data duplication
• faster access
{ "id": 45, "name": "Andrea", "likes": ["Pasta", "Kebab"] }
GRAPHS
GRAPH
G = (V, E), where G is the graph, V the set of vertices and E the set of edges
[Figure: a graph with one vertex and one edge labelled]
GRAPH
[Figure: vertex Andrea (name: Andrea) connected to vertex BMW (model: X5, doors: 5) by the edge drives (license: A123)]
EDGES ARE DIRECTED
VERTICES CAN HAVE PROPERTIES
EDGES CAN HAVE PROPERTIES
GRAPH
[Figure: Andrea and BMW connected by two edges, drives and owns]
N-M relationships can be represented using multiple edges
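As a rough illustration, a property graph with two edges between the same pair of vertices could be modelled like this. The class and field names are hypothetical, and putting license on the drives edge follows the record layout shown later in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out: Vertex    # vertex the edge starts from
    into: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


andrea = Vertex("Andrea", {"name": "Andrea"})
bmw = Vertex("BMW", {"model": "X5", "doors": 5})

# Two directed edges between the same pair of vertices.
edges = [
    Edge("drives", andrea, bmw, {"license": "A123"}),
    Edge("owns", andrea, bmw),
]


def edge_labels(source, target):
    """Labels of all edges going from source to target."""
    return [e.label for e in edges if e.out is source and e.into is target]


print(edge_labels(andrea, bmw))  # ['drives', 'owns']
print(edge_labels(bmw, andrea))  # [] because edges are directed
```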
BUILD SMART RELATIONSHIPS
[Figure: a graph with the vertices Andrea, John, BMW and Ferrari and the root vertices Customers, Cars and Luxury Cars]
BUILD SMART RELATIONSHIPS
• root vertices can be meta graphs
• meta graphs add information to make traversal easier and faster
EXAMPLE: a Car can be enriched with information regarding
• date of purchase
• country of manufacture
www.flickr.com/photos/aigle_dore/5952275132
BUILD SMART RELATIONSHIPS
[Figure: the meta graph Purchase (Year 2016; Month Jan 2016 with Day 01/15/2016; Month Feb 2016 with Day 02/01/2016) linked to the cars BMW, Ferrari and Maserati]
[Figure: the meta graph Made (Europe; Italy; Germany) linked to the cars BMW, Ferrari and Maserati]
[Figure: both meta graphs, Purchase and Made, linked to the same cars BMW, Ferrari and Maserati]
get all the Italian cars sold on 01/15/2016
let's start from Made
found the cars made in Italy; now filter by date using incoming edges
let's try from Purchase
found the cars purchased on 01/15/2016; now filter by country using incoming edges
ORIENTDB
• NoSQL database
• multi-model
• high performance (can write 400,000 records/sec*)
• HTTP REST and JSON API
• ACID
*On Intel i7 8 core CPU, 16 GB RAM, SSD RPM, Multi-threads, no indexes (orientdb.com)
15+ languages 30+ drivers
INSTALLATION
orientdb.com/docs/2.1/Tutorial-Installation.html
$ docker run -d -v … orientdb/orientdb
$ brew install orientdb
LOGICAL CONCEPTS
• class: a type of data model
• cluster: stores groups of records within a class
[Figure: class Car with the clusters USA_car and Italy_car]
VERTICES
• record identifier (RID)
• each record has its own self-assigned unique ID
• composed of 2 parts: #<cluster-id>:<cluster-position>
• list of properties
• RIDs of the connected edges, stored as "in" and "out"
EDGES
• record identifier (RID)
• each record has its own self-assigned unique ID
• composed of 2 parts: #<cluster-id>:<cluster-position>
• in: RID of the vertex the edge points to
• out: RID of the vertex the edge starts from
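The two-part RID format can be illustrated with a small parser. This is a sketch of the textual format only, not OrientDB's own implementation; the names RID and parse_rid are hypothetical.

```python
from typing import NamedTuple


class RID(NamedTuple):
    cluster_id: int
    cluster_position: int

    def __str__(self):
        return f"#{self.cluster_id}:{self.cluster_position}"


def parse_rid(text):
    """Split '#<cluster-id>:<cluster-position>' into its two numeric parts."""
    if not text.startswith("#"):
        raise ValueError(f"not a RID: {text!r}")
    cluster, _, position = text[1:].partition(":")
    return RID(int(cluster), int(position))


rid = parse_rid("#15:100")
print(rid.cluster_id, rid.cluster_position)  # 15 100
print(str(rid))                              # #15:100
```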
RELATIONSHIPS
• does not make use of JOINs like an RDBMS
• physical links: O(1)
• the relationship is managed by storing the edge's RID in both vertices, as "out" and "in"
• for 1-to-n relationships, collections of RIDs are used
Andrea (#13:35): out: [#14:54], name: Andrea
drives (#14:54): out: [#13:35], in: [#15:100], license: A123
BMW (#15:100): in: [#14:54], model: X5
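The idea of following stored RIDs instead of joining can be sketched with a dictionary keyed by RID. A Python dict stands in for the storage layer here, and the "label" key is an assumption used to name the edge; neither reflects OrientDB's internals.

```python
# Records keyed by RID, mirroring the values on the slide.
records = {
    "#13:35": {"name": "Andrea", "out": ["#14:54"]},
    "#14:54": {"label": "drives", "license": "A123",
               "out": ["#13:35"], "in": ["#15:100"]},
    "#15:100": {"model": "X5", "in": ["#14:54"]},
}


def out_vertices(rid, edge_label):
    """Follow the outgoing edges with the given label and return the target records."""
    targets = []
    for edge_rid in records[rid].get("out", []):
        edge = records[edge_rid]  # direct lookup by RID, no index search
        if edge.get("label") == edge_label:
            targets.extend(records[vertex_rid] for vertex_rid in edge["in"])
    return targets


print([car["model"] for car in out_vertices("#13:35", "drives")])  # ['X5']
```

Each hop is a direct lookup whose cost does not depend on how many records exist, which is the point of the O(1) physical links.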
TRAVERSE A RELATIONSHIP
Andrea (#13:35): out: [#14:54]
drives (#14:54): out: [#13:35], in: [#15:100]
BMW (#15:100): in: [#14:54]
CREATE A CLASS
CREATE CLASS Car EXTENDS V
CREATE CLASS drives EXTENDS E
[Figure: class Car extends V; class drives extends E]
ADD PROPERTIES TO A CLASS
• creating a property involves defining its name and its type
• it is mandatory in order to define indexes or constraints
CREATE PROPERTY Car.model String
[Figure: class Car with the property model: String]
ADD CONSTRAINTS TO A PROPERTY
• alter the defined property to add the constraint
ALTER PROPERTY Car.model MANDATORY TRUE
[Figure: class Car with the property model: String]
QUERYING
SELECT FROM Car WHERE model='X5'
Car (rid: #15:6, model: X5)
SELECT FROM #15:6
QUERYING
SELECT FROM [#15:6, #15:7]
Car (rid: #15:6, model: X5)
Car (rid: #15:7, model: Z4)
QUERYING
SELECT name, OUT("drives").model AS DrivesCar FROM #17:0
name    DrivesCar
Andrea  ["X5", "Z4"]
QUERYING
SELECT name, OUT("drives").model AS DrivesCar FROM #17:0 UNWIND DrivesCar
name    DrivesCar
Andrea  X5
Andrea  Z4
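The effect of UNWIND on the result above can be mimicked in a few lines. This sketch only reproduces what the slide shows for this example and is not a statement about how OrientDB implements it; the function name unwind is hypothetical.

```python
def unwind(rows, column):
    """Emit one row per element of the list stored under column."""
    for row in rows:
        values = row[column]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            yield {**row, column: value}


rows = [{"name": "Andrea", "DrivesCar": ["X5", "Z4"]}]
for row in unwind(rows, "DrivesCar"):
    print(row["name"], row["DrivesCar"])
# Andrea X5
# Andrea Z4
```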
QUERYING
TRAVERSE * FROM #17:0 MAXDEPTH 4
[Figure: Andrea connected to BMW and to Maserati by two drives edges]
DEPTH FIRST SEARCH
TRAVERSE * FROM #17:0 STRATEGY DEPTH_FIRST
[Figure: tree with nodes numbered in visit order, level by level: 1 / 2, 7, 8 / 3, 6, 9, 12 / 4, 5, 10, 11]
BREADTH FIRST SEARCH
TRAVERSE * FROM #17:0 STRATEGY BREADTH_FIRST
[Figure: tree with nodes numbered in visit order, level by level: 1 / 2, 3, 4 / 5, 6, 7, 8 / 9, 10, 11, 12]
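The two strategies can be compared on a small tree. The tree shape below is an assumption inferred from the visit-order numbers on the two slides, with nodes labelled by their depth-first order; the sketch illustrates the two visit orders in general, not OrientDB's TRAVERSE implementation.

```python
from collections import deque

# Parent -> children, labelled with the depth-first visit order.
children = {
    1: [2, 7, 8],
    2: [3, 6],
    3: [4, 5],
    8: [9, 12],
    9: [10, 11],
}


def depth_first(root):
    """Visit a node, then fully explore each child before its siblings."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


def breadth_first(root):
    """Visit all nodes of one level before moving to the next level."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order


print(depth_first(1))    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(breadth_first(1))  # [1, 2, 7, 8, 3, 6, 9, 12, 4, 5, 10, 11]
```

The only difference between the two functions is the container: a stack gives depth-first order, a queue gives breadth-first order.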
WHEN
• you store interconnected data
• you query data through relationships of arbitrary length
• the data set is continuously evolving
• you want the database to be easy to evolve