2
About me
{
"_id": "555ae00a475a9b259281b21a",
"name": "Nicola Galgano",
"alias": "alikon",
"gender": "male",
"work": "DB consultant on banking systems",
"company": "looking for a new one",
"email": "[email protected]",
"twitter": "@alikon",
"address": "Roma, Italy, EU",
"current_hobby": "run away from dentist"
}
5
What is Big Data?
Big data is an all-encompassing term for any collection of data sets so large or complex that it becomes difficult to process them using traditional data processing applications.
From Wikipedia
7
Where does big data come from?
Internet of Everything
IPv6 = 2^128 ≈ 3.4e+38 addresses
IPv6 has enough addresses for every connected device in the world, many times over
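As a quick check of the figure above, here is a minimal sketch in Python. The variable name is only illustrative.

```python
# IPv6 addresses are 128 bits wide, so the address space holds 2^128 values.
ipv6_addresses = 2 ** 128

print(ipv6_addresses)           # the exact integer
print(f"{ipv6_addresses:.1e}")  # the same value in scientific notation
```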
11
…and for the 5x9 (five nines)?
Availability          Downtime/year    Downtime/month   Downtime/week
90 % (1 nine)         36.5 days        72 hours         16.8 hours
99 % (2 nines)        3.65 days        7.20 hours       1.68 hours
99.9 % (3 nines)      8.76 hours       43.8 minutes     10.1 minutes
99.99 % (4 nines)     52.56 minutes    4.38 minutes     1.01 minutes
99.999 % (5 nines)    5.26 minutes     25.9 seconds     6.05 seconds
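The table values can be reproduced with a small calculation. The sketch below assumes a 365-day year and a 30-day month, which is what the figures above imply; the function name is hypothetical.

```python
def downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Allowed downtime, in seconds, for a given availability over a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * 24 * 3600


# Assumption: year = 365 days, month = 30 days, week = 7 days.
for availability in (90, 99, 99.9, 99.99, 99.999):
    per_year_minutes = downtime_seconds(availability, 365) / 60
    print(f"{availability} %: {per_year_minutes:.2f} minutes of downtime per year")
```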
13
NoSQL (no-SQL or Not Only SQL)
Next-generation databases mostly addressing some of these points:
• non-relational
• distributed
• horizontally scalable
• open-source
From www.nosql-database.org
15
Non-relational? What?!
A data model is a representation that we use to perceive and manipulate data
• Logic model
• Normalization
• 1NF, 2NF, 3NF, ...
• E-R
• Schema (rigid)
• Algebra of sets
• Impedance mismatch
16
NoSQL data models
Schemaless (dynamic/implicit)
Denormalization
Aggregate
Aggregates are the basic element of data storage
17
Key / Value
Simple data model
Blob/Opaque
Only 3 API functions:
• Get(key)
• Set(key, value)
• Delete(key)
Key and value can be complex
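The three-function API can be sketched with an in-memory dictionary. This is only an illustration: the class name and the sample key are hypothetical, and a real key/value store adds persistence and distribution on top.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """In-memory stand-in for a key/value store; values are opaque to the store."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.set("user:42", b"\x00opaque-blob")  # the store never inspects the value
print(store.get("user:42"))
store.delete("user:42")
print(store.get("user:42"))
```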
18
Document
More transparent
JSON (JavaScript Object Notation)
A lightweight data interchange format
Easy for humans and machines to read and write
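To make the document model concrete, the sketch below serializes a nested document to JSON and reads it back with Python's standard `json` module. The field names and values are made up for illustration.

```python
import json

# A hypothetical order document: the whole aggregate lives in one record.
document = {
    "_id": "order-1",
    "customer": {"name": "Maria", "city": "Roma"},
    "items": [{"sku": "camera", "qty": 2}],
}

text = json.dumps(document)   # document -> JSON text
restored = json.loads(text)   # JSON text -> document

# Unlike an opaque blob, the structure is visible and can be queried by field.
print(restored["customer"]["city"])
```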
19
Column
Sparse, semi-structured, sorted map
Flexible number of columns
Column keys can be grouped into families
How it is stored
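One way to picture the column model is as a nested map: row key, then column family, then column. The sketch below is a simplified in-memory illustration with hypothetical names and data, not the storage layout of any particular product.

```python
from collections import defaultdict

# row key -> column family -> {column name: value}
table = defaultdict(lambda: defaultdict(dict))


def put(row, family, column, value):
    table[row][family][column] = value


def get_row(row):
    """Return the row's cells as (family, column, value), sorted by family and column."""
    return [
        (family, column, value)
        for family in sorted(table[row])
        for column, value in sorted(table[row][family].items())
    ]


# Rows are sparse: each one stores only the columns it actually has.
put("alice", "profile", "name", "Alice")
put("alice", "contact", "email", "alice@example.org")
put("bob", "profile", "name", "Bob")

print(get_row("alice"))
```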
20
Graph
Graph theory model G = (V, E)
Store, map and query relationships
• Nodes connected by edges
• Complex relationships
• Recommend products
• ACID
Queries = graph traversal
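A query expressed as a traversal might look like the following sketch: a breadth-first walk over an adjacency list that stops after a fixed number of hops. The graph data and function name are hypothetical; graph databases offer their own query languages for this.

```python
from collections import deque

# Adjacency list: node -> neighbours (people and the products they bought).
edges = {
    "nick": ["maria", "book"],
    "maria": ["nick", "camera"],
    "book": [],
    "camera": [],
}


def reachable_within(start, max_hops):
    """Breadth-first traversal returning {node: hops from start}, up to max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops


# Things two hops away from Nick are what his friends are connected to.
print(reachable_within("nick", 2))
```

Here "camera" sits two hops from "nick" (through "maria"), which is the kind of relationship a product recommendation would follow.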
21
MapReduce
Refers to 2 separate and distinct tasks:
The map job takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs)
The reduce job takes the output from a map as input and combines those data tuples into a smaller set of tuples
Tasks run in parallel
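The two tasks can be illustrated with a word count. The sketch below runs sequentially in a single process, so it only shows the data flow; a real MapReduce framework would distribute the map and reduce tasks across nodes. Function names and input are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map task: break a line into (key, value) tuples, one per word."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce task: combine tuples that share a key into a smaller set of tuples."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


lines = ["big data", "Big problems"]
mapped = chain.from_iterable(map_words(line) for line in lines)
counts = reduce_counts(mapped)
print(counts)
```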

There is no “Silver Bullet”
There are multiple ways to model data
How is the data going to be accessed?
• Read intensive or write intensive
• Complex queries
23
Schemaless
Normalized model
24
How do you scale?
Vertical (up): add more power (RAM/CPU/disk)
Horizontal (out): add more commodity systems
25
The 8 fallacies of distributed computing
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
26
Sharding
Split up data into multiple chunks
Store each chunk in a separate data node
• Partitioning strategy: “the shard key”
• Multi-shard operations (join/aggregate)
• Load balancing
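A common partitioning strategy is to hash the shard key and take the result modulo the number of shards. The sketch below assumes that strategy (range-based and directory-based partitioning are alternatives) and uses plain dictionaries as stand-ins for data nodes; all names are hypothetical.

```python
import hashlib


def shard_for(shard_key: str, shard_count: int) -> int:
    """Pick a shard by hashing the shard key (stable across processes)."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each dict stands in for a separate data node.
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

# A single-key lookup touches one shard; a join or aggregate must visit them all.
print([len(shard) for shard in shards])
print(get("user:7"))
```

Note that changing the shard count remaps most keys under this simple modulo scheme, which is one reason the choice of partitioning strategy matters.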
27
Replication
Master / Slave
Multi-Master
Synchronous / Asynchronous
Provide redundancy
Increase availability
Failover (automatic)
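The difference between synchronous and asynchronous replication can be sketched in a few lines. This is a single-process toy with hypothetical class names: in synchronous mode a write reaches every replica before it returns, while in asynchronous mode it is queued and shipped later.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Master:
    def __init__(self, replicas, synchronous):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self.pending = []  # writes not yet shipped to replicas (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the write completes only after every replica has it.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append((key, value))

    def flush(self):
        """Ship queued writes to all replicas."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
master = Master([replica], synchronous=False)
master.write("x", 1)
print(replica.data)  # replica is still behind the master
master.flush()
print(replica.data)  # replica has caught up
```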
29
RDBMSs are ACID, with transactions
Transaction: a sequence of operations that form a single unit of work
Transactions have 4 properties:
• Atomic
• Consistent
• Isolated
• Durable
30
ACID - Atomicity
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
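The transfer steps above can be sketched with Python's built-in `sqlite3` module to show atomicity in action. The table layout, starting balances and the `fail_midway` flag are assumptions made for this illustration; the balance check mirrors the "If A > 100" step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 0)])
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B; both writes commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = 'A'"
            ).fetchone()
            if balance > amount:
                conn.execute(
                    "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,),
                )
                if fail_midway:
                    raise RuntimeError("simulated crash between the two writes")
                conn.execute(
                    "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,),
                )
    except RuntimeError:
        pass  # the transaction has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(conn, 100, fail_midway=True)
print(balances(conn))  # the debit from A was rolled back
transfer(conn, 100)
print(balances(conn))  # both writes were applied
```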
31
ACID - Consistency
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
32
ACID - Isolation
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
33
ACID - Durability
Transfer 100€ from A to B
1. Read(A)
2. If A > 100
3. A=A-100
4. Write(A)
5. Read(B)
6. B=B+100
7. Write(B)
34
NoSQL systems are BASE
Basically Available: there will be a response to any request. Fast response even if some replicas are slow or crashed.
Soft state: the state of the system could change over time. It is the user application's task to guarantee consistency.
Eventually consistent: the system will eventually become consistent once it stops receiving input. The data will propagate everywhere.
35
Eventual consistency (example)
• Nick finds a cool photo and shares it with Maria by posting it on her Facebook wall
• Nick asks Maria to check it out
• Maria logs in to her account and checks her Facebook wall, but:
- Nothing is there! (x apart)
• Nick tells Maria to wait a bit and check later
• Maria waits for a minute or so and checks back:
- She finds the photo Nick shared with her!
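The story above can be simulated with a replica that only exposes writes after a delay. The sketch uses logical ticks instead of real time, and the class name, key and lag value are hypothetical.

```python
class LaggingReplica:
    """Replica on which a write becomes visible only `lag_ticks` after it was made."""

    def __init__(self, lag_ticks):
        self.lag_ticks = lag_ticks
        self.log = []  # (tick written, key, value)

    def replicate(self, tick, key, value):
        self.log.append((tick, key, value))

    def read(self, key, now):
        visible = [
            value
            for tick, k, value in self.log
            if k == key and tick + self.lag_ticks <= now
        ]
        return visible[-1] if visible else None


wall_replica = LaggingReplica(lag_ticks=3)

# Tick 0: Nick posts the photo; the write is on its way to Maria's replica.
wall_replica.replicate(tick=0, key="maria:wall", value="cool_photo.jpg")

first_look = wall_replica.read("maria:wall", now=1)   # too early: nothing there
second_look = wall_replica.read("maria:wall", now=5)  # the write has propagated
print(first_look, second_look)
```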
36
CAP theorem
It’s impossible for a distributed computer system to simultaneously provide all three of these guarantees:
• Consistency – all nodes see the same data at the same time
• Availability – all clients can always read and write
• Partition tolerance – the system keeps working when failures occur*
A distributed system can satisfy only 2 at the same time
38
The ATM example
• An ATM will allow you to withdraw money even if the machine is partitioned from the network
• Higher availability means higher revenue
• However, it puts a limit on the withdrawal amount
• The bank might also charge you a fee when an overdraft happens
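The ATM trade-off can be written down as a tiny decision function. This is a sketch under assumptions: the offline limit and the function name are invented for illustration, and a real ATM would reconcile with the bank once the partition heals.

```python
OFFLINE_LIMIT = 200  # hypothetical cap applied while partitioned from the bank


def can_withdraw(amount, known_balance, online):
    """Decide whether the ATM dispenses cash."""
    if online:
        # Connected: check against the authoritative balance (consistency).
        return amount <= known_balance
    # Partitioned: stay available, but bound the risk with a fixed limit.
    return amount <= OFFLINE_LIMIT


print(can_withdraw(150, known_balance=100, online=True))   # refused: not enough funds
print(can_withdraw(150, known_balance=100, online=False))  # allowed: may cause an overdraft
print(can_withdraw(500, known_balance=1000, online=False)) # refused: above the offline limit
```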
39
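From CAP to PACELC
If there is a partition (P), how does the system trade off availability (A) and consistency (C)?
Else (E), in the absence of partitions, how does the system trade off latency (L) and consistency (C)?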
41
Summary
ACID (RDBMS)                        BASE (NoSQL)
Strong consistency                  Weak consistency (stale data)
Isolation                           Last write wins
Transaction                         Program managed
Mature technology                   New technology
SQL                                 No standard
Available & consistent              Available & partition tolerant
Scale up (limited)                  Scale out (unlimited*)
Shared something (disk/ram/proc)    Shared nothing (parallelizable)