Politecnico di Milano NoSQL databases Elisabetta Di Nitto [email protected] 30/03/2017 Lecture for the course: Big Data Technologies Credits to Marco Scavuzzo


Politecnico di Milano

NoSQL databases

Elisabetta Di Nitto [email protected]

30/03/2017
Lecture for the course: Big Data Technologies

Credits to Marco Scavuzzo


What is big data?

Big Data is a collection of very huge data sets with a great diversity [Chen & Zhang 2014]

Big data should be thought of as a process: how to get to new insights, how to turn them into action, resulting in business value [Gartner 2015]

Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization [Gartner 2012]

RDBMS vs NoSQL

New requirements

A new size for data
  In 2009 Google was processing 24 petabytes per day
  In 2009 Facebook declared that it stored about 60 million images
  The Internet Archive stores 2 petabytes of data

Data instances can differ from one another
  Some fields may be missing
  Some other fields may have different types
  Example:
    Videocamera, 10 megapixel, 100$
    Apples, 3 Kg, 4$

Correlation between data is not defined a priori but discovered a posteriori

Data come at high speed


RDBMS: assumptions and benefits

Assumptions
  Well-defined structure for data, known when data is stored in the DBMS
  Data is dense and uniform
  Indexes can be defined a priori and used for queries
  Data stays within a few gigabytes

Benefits
  It uses the lowest amount of disk space
  It is a well-understood model and query language
  It can support a wide variety of use cases
  It has schema-enforced data consistency


Issues

Assumptions are not valid anymore!
  Well-defined structure for data, known when data is stored in the DBMS
    Relationships between data are not known in some cases
  Data is dense and uniform
    Videocamera, 10 megapixel, 100$
    Apples, 3 Kg, 4$
  Data stays within a few gigabytes
    24 petabytes per day… 60 million images…


Issues

RDBMS: end-user oriented
  Database takes care of data aggregation based on queries
  Database provides transactional guarantees, schemas, and referential integrity

Today
  Less interest in aggregation managed by databases
    We want to control aggregation and build parallel computation for this purpose
  Possibility to control integrity and validity of data at the application level


What is NoSQL?

No use of SQL (or of some specific constructs) as query language:
  Manage large volumes of data that do not necessarily follow a fixed schema
  Data is partitioned among different machines and JOIN operations are not usable

ACID guarantees may be relaxed:
  E.g., eventual consistency
  Transactions limited to single data items

Distributed, fault-tolerant architecture
  Data held in a redundant manner on several servers
  Horizontal scalability


NoSQL characteristics

Data model and CRUD operations
  Key-value
  Document-based
  Column-based
  Graph-based

Distributed management of data and queries
  Partitioning
  Replication
  PACELC


Result

RDBMS focuses
  Schema
  Relations between entities
  Transactions
  Integrity checks
  Rich query language

NoSQL focuses
  Light schema
  Aggregations
  Focus on eventual consistency
  Data partitioning
  Basic Create Read Update Delete (CRUD) operations


An example with relations

Student ID  Student Name
132         Giovanni Rossi
145         Ginevra Bianchi
150         Chiara Bassi

Course ID  Course Name  Instructor
123        SE           EDN
134        DB 1         LT
167        Math         GL

Student ID  Course ID  Date        Score
132         123        10/06/2013  25
132         134        11/06/2013  26
145         123        10/06/2013  30


NoSQL databases and data denormalization (1) - key-value approach

Key         Value
132course1  Giovanni Rossi, 123, SE, EDN, 10/06/2013, 25
132course2  Giovanni Rossi, 134, DB1, LT, 11/06/2013, 26
145course1  Ginevra Bianchi, 123, SE, EDN, 10/06/2013, 30
150         Chiara Bassi


NoSQL databases and data denormalization (2) - document-based approach

Key 132
  Student: {name: 'Giovanni Rossi', id: 132,
            Exams: [{id: 123, name: 'SE', instructor: 'EDN', date: '10/06/2013', score: 25},
                    {id: 134, name: 'DB1', instructor: 'LT', date: '11/06/2013', score: 26}]}

Key 145
  Student: {name: 'Ginevra Bianchi', id: 145,
            Exams: [{id: 123, name: 'SE', instructor: 'EDN', date: '10/06/2013', score: 30}]}

Key 150
  Student: {name: 'Chiara Bassi', id: 150}
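The per-student documents can be derived from the three relational tables of the earlier example. The following is a minimal Python sketch, assuming the tables are held as plain in-memory structures; the function name `to_documents` and the variable names are illustrative and not part of any database API.

```python
# Relational data from the example (IDs, names and scores as on the slides).
students = {132: "Giovanni Rossi", 145: "Ginevra Bianchi", 150: "Chiara Bassi"}
courses = {123: ("SE", "EDN"), 134: ("DB1", "LT"), 167: ("Math", "GL")}
exams = [
    (132, 123, "10/06/2013", 25),
    (132, 134, "11/06/2013", 26),
    (145, 123, "10/06/2013", 30),
]


def to_documents(students, courses, exams):
    """Embed each student's exams (with course data) into one document per student."""
    docs = {sid: {"name": name, "id": sid} for sid, name in students.items()}
    for sid, cid, date, score in exams:
        course_name, instructor = courses[cid]
        # Course data is copied into every exam: this duplication is the denormalization.
        docs[sid].setdefault("Exams", []).append(
            {"id": cid, "name": course_name, "instructor": instructor,
             "date": date, "score": score}
        )
    return docs


docs = to_documents(students, courses, exams)
```

A student without exams (key 150) ends up with no `Exams` field at all, which is the "light schema" situation described earlier.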


NoSQL databases and data denormalization (3) - column-based approach

         StudentData            ExamData
Row Key  S ID  Student Name     C ID  C Name  Instructor  Date        Score
CBNN     150   Chiara Bassi
GBSE     145   Ginevra Bianchi  123   SE      EDN         10/06/2013  30
GRDB     132   Giovanni Rossi   134   DB 1    LT          11/06/2013  26
GRSE     132   Giovanni Rossi   123   SE      EDN         10/06/2013  25

StudentData and ExamData are column families
Data ordered by row key


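NoSQL databases and data denormalization (4) - graph-based approach

Nodes
  Students: GR, GB, CB
  Courses: SE, DB1
  Instructors: EDN, LT

Edges
  GR --[Passed, 25, 10/06/2013]--> SE
  GR --[Passed, 26, 11/06/2013]--> DB1
  GB --[Passed, 30, 10/06/2013]--> SE
  SE --[Taught by]--> EDN
  DB1 --[Taught by]--> LT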


Key-value CRUD operations

Query operations are limited to
  put(key, value)
  get(key)
  delete(key)
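The three operations can be illustrated with a toy in-memory store. This is a minimal sketch in Python, not the API of any specific product; the class name `KeyValueStore` is hypothetical and the sample keys come from the denormalization example.

```python
class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is opaque to the store: it is never parsed or indexed.
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("132course1", "Giovanni Rossi, 123, SE, EDN, 10/06/2013, 25")
store.put("150", "Chiara Bassi")
exam = store.get("132course1")
store.delete("150")
```

Since the store cannot look inside values, a question such as "which students passed SE?" has to be answered by the application, by reading and parsing values itself.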


Document-based CRUD operations (MongoDB)

db.collection.find(<query filter>, <projection>)
  <query filter> -> {<field1>: <value1>, …}
  <projection> -> {<field1>: 1, <field2>: 1} includes both field1 and field2 in the result set
  Writing {<field1>: 0, <field2>: 0} instead excludes field1 and field2 from the result set

Examples
  db.newDB.find()
  db.newDB.find({"Student.name": "Giovanni Rossi"})
  db.newDB.find({"Student.Exams.name": "SE"})
  db.newDB.find({"Student.Exams.name": "SE"}, {"Student.Exams.instructor": 0})
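To make the meaning of a dotted query filter concrete, here is a small pure-Python sketch that loosely mimics equality matching on dotted paths, including paths that cross an array of embedded documents. It is an illustration under simplifying assumptions (equality only, no projection, no operators), not MongoDB's implementation; `values_at`, `find` and `new_db` are hypothetical names.

```python
def values_at(node, path):
    """Yield every value reachable through a dotted path, fanning out over lists."""
    if not path:
        yield node
    elif isinstance(node, list):
        for item in node:
            yield from values_at(item, path)
    elif isinstance(node, dict) and path[0] in node:
        yield from values_at(node[path[0]], path[1:])


def find(collection, query=None):
    """Return the documents in which every dotted field in `query` has the given value."""
    query = query or {}
    return [
        doc for doc in collection
        if all(value in values_at(doc, field.split("."))
               for field, value in query.items())
    ]


new_db = [
    {"Student": {"name": "Giovanni Rossi", "id": 132, "Exams": [
        {"id": 123, "name": "SE", "instructor": "EDN", "score": 25},
        {"id": 134, "name": "DB1", "instructor": "LT", "score": 26}]}},
    {"Student": {"name": "Ginevra Bianchi", "id": 145, "Exams": [
        {"id": 123, "name": "SE", "instructor": "EDN", "score": 30}]}},
    {"Student": {"name": "Chiara Bassi", "id": 150}},
]

se_students = [d["Student"]["name"]
               for d in find(new_db, {"Student.Exams.name": "SE"})]
```

An empty filter matches every document, which corresponds to the `find()` call with no arguments.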


Indexes

Support efficient execution of queries
  Without indexes, a whole collection needs to be scanned
  With indexes, the search can be limited to a subset of documents
  The index stores the value of a specific field or set of fields, ordered by the value of the field
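The idea of an index as an ordered list of field values can be sketched in a few lines of Python. This is a simplified illustration (real systems typically use B-tree variants); the names `name_index` and `find_by_name` are assumptions made for the example.

```python
import bisect

students = [
    {"id": 132, "name": "Giovanni Rossi"},
    {"id": 145, "name": "Ginevra Bianchi"},
    {"id": 150, "name": "Chiara Bassi"},
]

# Index on "name": (field value, position in the collection), ordered by field value.
name_index = sorted((doc["name"], pos) for pos, doc in enumerate(students))


def find_by_name(name):
    """Binary-search the index instead of scanning the whole collection."""
    i = bisect.bisect_left(name_index, (name, -1))
    matches = []
    # Collect all index entries carrying the requested value.
    while i < len(name_index) and name_index[i][0] == name:
        matches.append(students[name_index[i][1]])
        i += 1
    return matches


result = find_by_name("Ginevra Bianchi")
```

The lookup touches only the index entries around the requested value; the price is that the index must be kept up to date on every write.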


MongoDB - examples of indexes

Create an index on any field in the document

// 1 means ascending, -1 means descending
db.newDB.createIndex({"Student.name": 1})
db.newDB.createIndex({"Student.Exams.name": -1})


Issue

What if we want to look for all students attending a certain course?
  db.newDB.find({"Student.Exams.name": "SE"})

… but … we have to retrieve all Student documents (the aggregates containing the whole career of each student)

Do we have another option?
  Create aggregates by Course, not by Student
  Adopt a column-based approach
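The first option, aggregating by Course instead of by Student, can be sketched as a re-grouping of the student documents. This is a minimal Python illustration; the function name `aggregate_by_course` and the shape of the resulting course documents are assumptions, not something prescribed by the slides.

```python
student_docs = [
    {"name": "Giovanni Rossi", "id": 132, "Exams": [
        {"id": 123, "name": "SE", "instructor": "EDN", "date": "10/06/2013", "score": 25},
        {"id": 134, "name": "DB1", "instructor": "LT", "date": "11/06/2013", "score": 26}]},
    {"name": "Ginevra Bianchi", "id": 145, "Exams": [
        {"id": 123, "name": "SE", "instructor": "EDN", "date": "10/06/2013", "score": 30}]},
    {"name": "Chiara Bassi", "id": 150},
]


def aggregate_by_course(student_docs):
    """Build one document per course, embedding the students who took its exam."""
    courses = {}
    for student in student_docs:
        for exam in student.get("Exams", []):
            course = courses.setdefault(exam["name"], {
                "id": exam["id"], "name": exam["name"],
                "instructor": exam["instructor"], "Students": []})
            course["Students"].append({
                "id": student["id"], "name": student["name"],
                "date": exam["date"], "score": exam["score"]})
    return courses


course_docs = aggregate_by_course(student_docs)
se_names = [s["name"] for s in course_docs["SE"]["Students"]]
```

With this layout the "students of a course" query reads a single aggregate, while "career of a student" now needs to touch several course documents: the aggregate boundary has to follow the dominant access pattern.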


Column-based CRUD operations (HBase)

create 'Students', 'StudentData', 'ExamData'

put 'Students', 'CBNN', 'StudentData:S ID', '150'
put 'Students', 'CBNN', 'StudentData:Student Name', 'Chiara Bassi'
put 'Students', 'GBSE', 'StudentData:S ID', '145'
put 'Students', 'GBSE', 'StudentData:Student Name', 'Ginevra Bianchi'
put 'Students', 'GBSE', 'ExamData:Score', '30'
…


Column-based CRUD operations (HBase)

get 'Students', 'GBSE'
  -> you get all data concerning key GBSE

scan 'Students', {COLUMNS => ['StudentData:Student Name', 'ExamData:Score']}
  -> you get all data in the table concerning columns Student Name and Score

scan 'Students', {STARTROW => 'GR'}
  -> you get all rows from row key 'GR' onwards; in this table these are exactly the rows whose key starts with 'GR' (all exams of Giovanni Rossi)
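Because rows are kept sorted by key, a scan is a walk over a contiguous key range. The sketch below imitates this in Python on the four example rows; it is an illustration of the ordering semantics only (start row inclusive, stop row exclusive), and the function name `scan` and the dictionary layout are assumptions.

```python
import bisect

# Example rows; insertion order is deliberately unsorted.
rows = {
    "GRSE": {"StudentData:Student Name": "Giovanni Rossi", "ExamData:Score": "25"},
    "CBNN": {"StudentData:Student Name": "Chiara Bassi"},
    "GRDB": {"StudentData:Student Name": "Giovanni Rossi", "ExamData:Score": "26"},
    "GBSE": {"StudentData:Student Name": "Ginevra Bianchi", "ExamData:Score": "30"},
}
sorted_keys = sorted(rows)  # lexicographic order of the row keys


def scan(start_row="", stop_row=None):
    """Return the row keys with start_row <= key < stop_row, in sorted order."""
    first = bisect.bisect_left(sorted_keys, start_row)
    if stop_row is None:
        last = len(sorted_keys)
    else:
        last = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[first:last]


from_gr = scan("GR")        # every key >= "GR"
only_gb = scan("GB", "GC")  # keys with prefix "GB": stop just past the prefix
```

Starting at 'GB' without a stop row would also return the 'GR…' rows, which is why a prefix query generally needs a stop row as well.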


Row-key design considerations in HBase

The efficiency of an HBase system depends on how the row key is chosen
Depending on the data access pattern, you will need to design your key accordingly in order to achieve better performance (per table)
  Write intensive: random row keys
  Read intensive: sequential row keys
  Ex.: App. for time series analysis -> sequential row keys


Data partitioning

Distributed RDBMSs are often based on a shared-disk architecture
  Limited scalability

NoSQL systems focus on a shared-nothing architecture


Partitioning

Adopted when
  data exceeds the capacity of a single machine
  traffic grows -> data accesses need to be load-balanced

Vertical partitioning
Horizontal partitioning, also called sharding


Partitioning and key-values

Data are independent of each other -> easy to distribute them

Key         Value
132course1  Giovanni Rossi, 123, SE, EDN, 10/06/2013, 25
132course2  Giovanni Rossi, 134, DB1, LT, 11/06/2013, 26
145course1  Ginevra Bianchi, 123, SE, EDN, 10/06/2013, 30
150         Chiara Bassi


How to shard?

How to shard? (cont)

When nodes join or leave, data have to be redistributed -> inefficient
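The cost of redistribution can be made visible with a small experiment. The sketch below assumes that the scheme in question is the simple `hash(key) mod N` placement (an assumption, since the original figure is not available); `node_for` and the synthetic keys are hypothetical.

```python
import hashlib


def node_for(key, num_nodes):
    """Map a key to a node index with hash(key) mod N.

    MD5 is used only to get a hash that is stable across runs,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


keys = [f"student{i}" for i in range(1000)]

before = {k: node_for(k, 4) for k in keys}  # cluster with 4 nodes
after = {k: node_for(k, 5) for k in keys}   # one node joins

# Keys whose node changes must be physically moved.
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys change node")
```

With a uniform hash, a key keeps its node only when its hash gives the same remainder modulo 4 and modulo 5, so most keys are expected to move when going from 4 to 5 nodes.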

How to shard? (cont)

Sharding approaches

Hash-sharding
  See before
  No need for a coordinator, all nodes can compute the distribution
  Can efficiently support only get operations, not scans

Range-sharding
  Data are grouped based on value ranges and distributed accordingly
  A coordinator needs to manage the assignments and redistribution
  Scans within a certain value range can be local to a partition
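Range-sharding can be sketched as a lookup in a sorted list of split points. In the Python illustration below, the split points, the function names and the half-open range convention are all assumptions made for the example; in a real system the coordinator would own and update the split points.

```python
import bisect

# Hypothetical split points: partition 0 holds keys < "G",
# partition 1 holds "G" <= key < "N", partition 2 "N" <= key < "T",
# partition 3 holds keys >= "T".
split_points = ["G", "N", "T"]


def partition_for(key):
    """Partition that stores the given key."""
    return bisect.bisect_right(split_points, key)


def partitions_for_scan(start, stop):
    """Partitions that may hold keys in the half-open range [start, stop)."""
    first = partition_for(start)
    last = bisect.bisect_left(split_points, stop)
    return list(range(first, last + 1))


local_scan = partitions_for_scan("GR", "GS")  # all keys with prefix "GR"
full_scan = partitions_for_scan("A", "Z")
```

A scan over a narrow key range touches a single partition, whereas under hash-sharding the same keys would be scattered over all nodes.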


Sharding approaches

Entity-group sharding
  Its goal is to enable single-partition transactions on co-located data
  Entity groups are explicitly defined by the application or derived by analysing transactions


Sharding and HBase

         StudentData            ExamData
Row Key  S ID  Student Name     C ID  C Name  Instructor  Date        Score
CBNN     150   Chiara Bassi
GBSE     145   Ginevra Bianchi  123   SE      EDN         10/06/2013  30
GRDB     132   Giovanni Rossi   134   DB 1    LT          11/06/2013  26
GRSE     132   Giovanni Rossi   123   SE      EDN         10/06/2013  25

StudentData and ExamData are column families
Data ordered by row key


Sharding and HBase - physical view on data organization

StudentData column family

Row key  Column key                Timestamp      Cell value
CBNN     StudentData:S ID          1273516197868  150
CBNN     StudentData:Student Name  1273516197865  Chiara Bassi
GBSE     StudentData:S ID          1073516197865  145
GBSE     StudentData:Student Name  1273516197886  Ginevra Bianchi
…        …                         …              …

ExamData column family

Row key  Column key     Timestamp      Cell value
GBSE     ExamData:C ID  1373516197849  123
…        …              …              …


Sharding and HBase

[Figure, © Hortonworks Inc. 2011, "Architecting the Future of Big Data": Logical architecture. Distributed, persistent partitions of a BigTable]

Table A (rows a to p) is split into Region 1, Region 2, Region 3 and Region 4

Region Server 7: Table A, Region 1; Table A, Region 2; Table G, Region 1070; Table L, Region 25
Region Server 86: Table A, Region 3; Table C, Region 30; Table F, Region 160; Table F, Region 776
Region Server 367: Table A, Region 4; Table C, Region 17; Table E, Region 52; Table P, Region 1116

Legend:
- A single table is partitioned into Regions of roughly equal size.
- Regions are assigned to Region Servers across the cluster.
- Region Servers host roughly the same number of regions.

HBase Architecture

HBase physical view

HBase creates a separate HFile for each Column Family
  Queries for a single row that span multiple CFs require HBase to reconstruct the row from multiple HFiles
  Queries requesting multiple rows and a single Column Family are much more performant

Keys need to be chosen with care
  They are the basis for defining regions
  If you use timestamp-based keys, all data in a certain time interval will be in the same region
  This may cause overload of that region in case of batch queries or store operations


Replication

Adopted to:
  increase fault tolerance (mitigate disasters)
  load-balance traffic occurring on the same data subset (minimize latency)

Data replication approaches:
  1. Inter data center (multiple geographic zones)
     a. Active-Passive (one passive data center used just for backups and reads)
     b. Active-Active (both data centers accept reads and writes)
  2. Intra data center


Replication

Non-linearizable write operations - example

Replication strategies

Data updates sent to all replicas at the same time
Data updates sent to a master first
Data updates sent to an arbitrary location first


Data updates sent to all replicas at the same time

Assuming concurrent updates
  Replicas may choose different update orders => potential inconsistency
  Consensus protocol in place => increase of latency


Data updates sent to a master first

Replication is synchronous
  => Increased latency, which depends on the slowest node

Replication is asynchronous
  If reads are allowed from all nodes => inconsistency can occur (e.g., PNUTS)
  If reads are allowed only at the master node => no inconsistency, latency increases

A combination of synchronous and asynchronous
  E.g., DynamoDB, Cassandra, Riak


Data updates sent to an arbitrary location first

Similar to master-slave
If synchronous replication => latency can be further increased in case of simultaneous updates


PACELC (Abadi 2012)

PACELC (pronounced "pass-elk") = if the network is Partitioned, then trade off Availability and Consistency; Else, trade off Latency and Consistency
CAP theorem for abnormal cases + LC tradeoffs for normal operation


DBMS and CAP theorem (Brewer)

Three possible guarantees
  Consistency: all nodes see the same data at the same time
  Availability: every request receives a response about whether it was successful or failed
  Partition tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system

CAP theorem: it is impossible for a distributed computer system to simultaneously provide all three guarantees (proven by Gilbert and Lynch, 2002)


AP (Availability & Partition tolerance) = CAP-Availability system example

CP (Consistency & Partition tolerance) = CAP-Consistency system example

CAP Theorem in summary

When a network partition occurs
  Preserve CAP-Availability: the app keeps writing in the database -> data not replicated -> not CAP-Consistent
  Preserve CAP-Consistency, by either
    rejecting operations on all replicas, or
    accepting ops. on R1 and stopping ops. on R2 (or vice versa) until the partition is resolved and replicas are in sync (Who decides which replica can accept requests?)


Back to the non-linearizable example

Here we are focusing on reducing latency
Other scenarios are possible


PACELC exemplified

PA/EL: upon Partitions, favors Availability; Else, Latency
  DynamoDB, Cassandra, Riak

PC/EC: upon Partitions, favors Consistency; Else, Consistency
  HBase, VoltDB

PA/EC: upon Partitions, favors Availability; Else, Consistency
  MongoDB

PC/EL: upon Partitions, favors Consistency; Else, Latency
  PNUTS


Scalability vs expressiveness

[Figure: database families placed along two axes, Scalability and Expressiveness. Families shown: Key-Value, In-Memory Key-Value, Column-Based, Document-based, Graph DBs, RDBMS. Annotations in the figure: "Optimized query"; "Projections, sorting, dynamic queries, indexes, transactions, triggers …"; "ACID, lock, 2PC, low scalability"; "Horizontal and vertical partitioning, auto reconciling, P2P"; "Sharding, load balancer, strict consistency on request, lock-free transactions"]

Database as a Service (DaaS)

Relational databases: Google Cloud SQL, Amazon RDS
  Cross data-center replication

NoSQL
  Google Datastore
    Lightweight transactions (2PC), secondary indexes
  Azure Tables, Amazon DynamoDB, Google BigTable
  Azure DocumentDB

Different SLA levels, manageable trade-offs between consistency, latency and availability

Database as a Service (DaaS)

Automated scaling

Fault tolerance
  Nodes and replicas are managed by the cloud operator
  Automated failure recovery

Low maintenance
  Managed updates at different layers (bare metal, software patches, etc.)

Geographic distribution
  To increase durability and decrease latency

Accessibility (always on)


Lack of well-defined SLAs for cloud services

Our experience with DaaS
  Throughput for read and write operations?
  How many parallel reads/writes?
  What is the incoming and outgoing bandwidth?
  What is the meaning of the errors notified by the DaaS?

In most cases there is limited documentation


Our experience

TABLE I: Migrations preserving eventual consistency

                                  From GAE Datastore to Azure Tables    From Azure Tables to GAE Datastore
                                  dataset #a  dataset #b  dataset #c    dataset #a
Source size (MB)                  16          64          512           -
# of Entities                     36940       147758      1182062       36940
Migration time (s)                1098        4270        34111         13101
Entities throughput (ent/s)       33.643      34.604      34.653        2.820
Queued data (MB)                  81.98       336.73      2709.80       93.50
Extraction & Conversion time (s)  31          120         768           24
Queued data throughput (KB/s)     2707.985    2873.446    3613.067      3989.513
Avg. %CPU usage                   4.749       3.947       4.111         0.605

Moreover, it is exposed as a web service and can be easily used to configure and control a migration process.

In [20] and [23], we conducted several tests to measure the performance of our initial implementation. In particular, we migrated data generated by a proof-of-concept application called Meeting in the Cloud (MiC) [36] (footnote 2). MiC was built to be deployed on, and to exploit the services of, two clouds: Azure and Google App Engine (GAE) (footnote 3). More specifically, when it is deployed on GAE it uses GAE Datastore and supports reads according to an eventual consistency semantics. Instead, when it is deployed on Azure, it uses Azure Tables and supports reads according to a strong consistency semantics. The adoption of two different semantics in the use of the two DaaS therefore allowed us to experiment with data migration in both conditions.

Data are stored by the MiC application in the form of entities (both in GAE Datastore and in Azure Tables; an entity is similar to the concept of row in relational databases) whose average size is 754 bytes. In Tables I and II we report the results obtained when migrating these entities between these two DaaS. Multiple data sets have been considered and tests were performed three times (tables report average figures). Source size indicates the size in megabytes of the entities we are migrating in that particular test (whose number is denoted with # of Entities); Migration time reports the time, in seconds, needed to complete that particular migration; Entities throughput shows the ratio between the number of migrated entities and the migration time; Queued data indicates the total size of the entities (in the meta-model representation format) that have been stored inside the queue during the whole migration process; Extraction and conversion time shows the time needed to extract all the entities from the source DaaS, convert them into the meta-model representation and store them in the queue; Queued data throughput is the ratio of the two previous rows; finally, Avg. %CPU usage reports the average CPU percentage measured. In order to conduct these tests, we deployed the migration system on a Google Compute Engine (an IaaS platform) Virtual Machine (VM), physically hosted by the Google data center in Western Europe (WE), with two virtual CPUs and 7.5 GB of RAM, running a Linux-based operating system.
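The derived rows of Table I can be re-computed from the measured ones. The short Python check below uses the figures of dataset #a; the conversion 1 MB = 1024 KB is an assumption (it is the one that reproduces the published value), and the variable names are illustrative.

```python
# Figures copied from Table I (dataset #a, GAE Datastore -> Azure Tables).
entities = 36940
migration_time_s = 1098
queued_data_mb = 81.98
extraction_time_s = 31

# Entities throughput = migrated entities / migration time.
entities_throughput = entities / migration_time_s

# Queued data throughput = queued data / extraction and conversion time,
# converted to KB/s assuming 1 MB = 1024 KB (assumption, see lead-in).
queued_throughput_kbs = queued_data_mb * 1024 / extraction_time_s

# Reverse migration of the same 36,940 entities took 13101 s instead of 1098 s.
slowdown = 13101 / migration_time_s

print(round(entities_throughput, 3))    # 33.643
print(round(queued_throughput_kbs, 3))  # 2707.985
print(round(slowdown, 2))               # 11.93
```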

Test results, reported in the left part of Table I, showedthat we were able to migrate entities from GAE Datastoretowards Azure Tables, preserving eventual consistency, at anaverage rate of 34 entities/s, independently from the dataset

2GitHub repository - https://github.com/deib-polimi/mic-backend3Recently, it has been extended to be deployed also on other PaaS level

clouds.

size. In order to verify the reverse migration and to checkthe correctness of our migration process, we transferred data,already migrated, back to GAE Datastore. During these exper-iments we experienced several undocumented errors thrown bythe DaaS when migrating more than 70,000 entities towardsGAE Datastore, under various working conditions, and withdifferent data sets. For this reason, we report here only theexperiment concerning 36,940 entities (see the right part ofTable I). In this case we checked that the migration wasconservative by verifying that the number of migrated entitieswas the same and that their content was identical to the onewe migrated in the first test. The migration time, however,was increased by a factor of 11.93. This is because entitiesmanaged according to an eventual consistency policy and, thus,belonging to different Partition Groups in the intermediateformat, are required to have different Ancestors in Datastore.Thus, every write requires not only the creation of the entity inthe database but also the creation of a corresponding ancestor(this corresponds to two write operations plus one read neededto check if that ancestor is not already existing).

Table II reports the results achieved when migrating entities from Azure Tables to GAE Datastore, and vice versa, preserving strong consistency. As data to be handled according to the strong consistency semantics in Datastore have to be connected to the same ancestor, the migration time towards GAE Datastore includes not only the write operations but also all reads needed to retrieve the ancestor to be used for grouping the entities. As regards data migration back into Azure Tables, the results we obtained, in terms of throughput, are similar to the respective ones, when preserving eventual consistency, since no computationally intensive operation is performed to maintain strong consistency in Azure Tables (it is just a matter of setting the same Partition Key when translating entities contained in the same meta-model Partition Group). Of course, also in this case we verified that the migration was conservative.

In both scenarios, the CPU usage of Hegira4Cloud was always negligible. However, the limited throughput we obtained in all cases proved that this initial prototype is not suitable to manage Big Data problems.
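The two throughput rows of the tables follow directly from the definitions given at the start of this section. The sketch below recomputes them for dataset #d of Table II (Azure Tables to GAE Datastore). The function names are hypothetical, and the conversion 1 MB = 1024 KB is an assumption.

```python
def entities_throughput(num_entities, migration_time_s):
    """Migrated entities divided by the migration time (ent/s)."""
    return num_entities / migration_time_s


def queued_data_throughput_kb_s(queued_mb, extraction_time_s):
    """Queued data divided by the extraction and conversion time (KB/s)."""
    return queued_mb * 1024 / extraction_time_s   # assumes 1 MB = 1024 KB


# Table II, Azure Tables -> GAE Datastore, dataset #d
ent_s = entities_throughput(9235, 1402)
kb_s = queued_data_throughput_kb_s(22.40, 10)
print(round(ent_s, 3), round(kb_s, 2))
```

The entities throughput reproduces the reported 6.587 ent/s. The queued data throughput comes out slightly below the reported 2294.067 KB/s, presumably because the queued data size in the table is rounded to two decimals.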

VI. IMPROVING HEGIRA4CLOUD THROUGH ACTION RESEARCH

Given the lack of tools for design-level analysis and optimization of our framework, we decided to address the problem of improving the performance of Hegira4Cloud through action research. More specifically, we decided to explore the solution space through the development of various experiments and associated prototypes. Given the inexplicable and undocumented behavior we experienced while writing on GAE Datastore, we decided to perform our experiments referring to the migration from Datastore to Azure Tables and not vice versa. In our space exploration approach we went through the steps presented in the following subsections.

TABLE II: Migrations preserving strong consistency

                                     From Azure Tables to GAE Datastore     From GAE Datastore to Azure Tables
                                     dataset #d   dataset #e   dataset #f   dataset #d   dataset #e   dataset #f
# of Entities                        9235         36940        55410        9235         36940        55410
Migration time (s)                   1402         5340         8599         275          1098         1645
Entities throughput (ent/s)          6.587        6.918        6.444        33.581       33.643       33.684
Queued data (MB)                     22.40        89.61        134.41       22.97        93.50        140.33
Extraction and Conversion time (s)   10           30           41           8            31           45
Queued data throughput (KB/s)        2294.067     3058.627     3357.047     2940.16      3088.51      3193.29
Avg. %CPU usage                      1.509        1.139        0.957        4.932        4.746        4.564

A. Testing the read and write DaaS limits

The read and write throughput from/to the DaaS under consideration can certainly be considered as an upper bound for any migration system. For this reason, our investigation started with an analysis of such throughput. We could not find any information on this in the documentation offered by GAE Datastore and Azure Tables. In [37] the authors measured Azure Tables performance by issuing CRUD (i.e. create, read, update, and delete) operations, but since those tests were performed five years ago, they do not necessarily reflect the current situation. Thus, we decided to perform our own experiments.

TABLE III: Azure Tables entities throughput.

EAS        ETS (MB)   #threads   Tt (s)   X (ent/s)   X (KB/s)
754 Byte   106        10         217.3    680.0       499.5
754 Byte   106        20         121.3    1218.1      894.8
754 Byte   106        40         102.3    1444.4      1061.0
754 Byte   106        60         122.0    1211.1      889.7
754 Byte   106        192        127.3    1160.7      852.7
1KB        152        10         230.0    642.4       676.7
1KB        152        20         156.0    947.2       997.7
1KB        152        40         129.0    1145.4      1206.6
4KB        580        10         244.0    605.6       2434.1
4KB        580        20         148.0    998.4       4013.0
4KB        580        40         139.0    1063.0      4272.8
4KB        580        60         120.0    1231.3      4949.3
4KB        580        192        116.6    1267.2      5093.7

As regards the read throughput from the source DaaS, we can safely say that it is not a bottleneck for the migration system, since, based on the preliminary results we obtained in Tables I and II, Hegira4Cloud is able to read and process data at a throughput up to 1,539 entities/s (given by the ratio between the number of migrated entities and the extraction and conversion time). For this reason, we do not go further in testing the read limits.

As for the write throughput towards Azure Tables, we developed a tool which allows us to measure the maximum number of parallel writes an application is able to make in the same Azure Table account. In the experiments, we considered entities having different average sizes (754 Byte, 1KB and 4KB), trying to understand how entity size impacts Azure Tables write speed. For the tests we used an Azure VM with the following configuration: Ubuntu Server 12.04, located in the Microsoft WE data-center, with 4 CPUs and 7 GB of RAM. Moreover, we used the log4j library to compute the duration of the tests and a custom library to measure the size of objects stored in the queue and the size of Azure Tables entities [38]. Table III summarizes the test results. For each run we transferred a constant number of entities (147,758 entities) and we varied the following parameters:

• The entity average size EAS, as well as the entities total size ETS.

• The number of consumer threads #threads writing in parallel to Azure Tables.

As a result of the tests we collected the following output performance metrics:

1) The overall time Tt required for writing entities in Azure Tables.

2) The maximum throughput X achieved for data transfer. We report both the entities/s and the KB/s values.

Each of the values in the table is the average obtained from three runs with the same inputs.
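A measurement of this kind can be sketched as follows. This is not the authors' tool: `fake_write` is a hypothetical stand-in for a single write to the target DaaS, and the harness only shows how Tt and the two throughput values relate to the number of consumer threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_write(entity):
    """Hypothetical stand-in for one write to the target DaaS; returns bytes written."""
    return len(entity)


def measure_write_throughput(entities, num_threads, write=fake_write):
    """Write all entities with num_threads workers; return (Tt, X in ent/s, X in KB/s)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        written_bytes = sum(pool.map(write, entities))
    total_time = time.perf_counter() - start
    return total_time, len(entities) / total_time, written_bytes / 1024 / total_time


entities = [b"x" * 754 for _ in range(1000)]   # EAS = 754 bytes
for threads in (10, 20, 40):
    tt, x_ent, x_kb = measure_write_throughput(entities, threads)
    print(threads, round(tt, 4), round(x_ent, 1), round(x_kb, 1))
```

Replacing `fake_write` with a real client call and averaging three runs per configuration would give figures comparable in structure to Table III.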

We found that the maximum write throughput of Azure Tables varied between 600 and 1,444 entities/s, depending on the number of threads writing in parallel on it. While it seems we reached a limit in terms of writable entities per second, the throughput measured in KB/s kept increasing with the entities average size, proving that the database limits the number of operations per second and not the throughput in terms of bytes per second. These values, if compared with the results we obtained with our preliminary version of Hegira4Cloud in Section V (up to 34.6 ent/s), show that the target DaaS is not the system bottleneck and the limited throughput we obtained with Hegira4Cloud is due to some other cause. Hence, we conducted further tests.
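The observation that entities/s saturates while KB/s keeps growing can be checked against Table III. The sketch below recomputes both throughput columns for the 40-thread runs. The helper name `throughput` is hypothetical and 1 MB = 1024 KB is an assumption.

```python
NUM_ENTITIES = 147_758  # entities written in every run


def throughput(ets_mb, tt_s):
    """Return (X in ent/s, X in KB/s) for a run of NUM_ENTITIES entities."""
    return NUM_ENTITIES / tt_s, ets_mb * 1024 / tt_s   # assumes 1 MB = 1024 KB


# (EAS label, ETS in MB, Tt in seconds) for the 40-thread rows of Table III
runs = [("754 Byte", 106, 102.3), ("1KB", 152, 129.0), ("4KB", 580, 139.0)]

for eas, ets_mb, tt in runs:
    x_ent, x_kb = throughput(ets_mb, tt)
    print(f"{eas:>8}: {x_ent:7.1f} ent/s {x_kb:7.1f} KB/s")
```

The entity rate stays in the range of roughly 1,000 to 1,450 ent/s across the three sizes, while the byte rate grows about fourfold from 754 Byte to 4KB entities.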

B. Checking if the bandwidth is a system bottleneck

The next step towards the identification of the bottleneck consists in measuring the maximum network bandwidth available for migration.

For these tests, we deployed two VMs on Azure and one on Google Compute Engine. More specifically, the Azure VMs had the following configuration: Ubuntu Server 12.04, located in the same Microsoft WE data center (not in the same virtual network), with 4 CPU cores and 7 GB RAM. The Google VM had the following characteristics: Debian 7, hosted in Google WE data-center, with 2 virtual CPUs and 7.5GB of RAM.

On each of these VMs we installed the iperf tool [39], able to measure the maximum bandwidth on IP networks among


Polyglot  persistence


Polyglot  Persistence

• In recent years the idea of polyglot persistence has emerged and become popular
• Use the appropriate data model for different parts of the persistence layer
  • Relational databases for structured, tabular data
  • Document databases for un/semi-structured data
  • Key-Value databases for hash tables
  • Graph databases for highly linked referential data
• This brings data consistency and duplication issues
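The duplication issue can be made concrete with a toy example. The sketch below uses plain dictionaries as stand-ins for a document store and a graph store; no real database driver is involved and all names are hypothetical.

```python
# In-memory stand-ins for two different stores (hypothetical, no real drivers).
document_store = {}                          # product catalogue, one aggregate per product
graph_store = {"nodes": {}, "edges": []}     # e.g. "bought" relationships


def add_product(product_id, name):
    """Store the product in both stores; the name is duplicated."""
    document_store[product_id] = {"name": name}
    graph_store["nodes"][product_id] = {"name": name}


def rename_product_in_catalogue(product_id, new_name):
    """Update only the document store; nothing propagates to the graph."""
    document_store[product_id]["name"] = new_name


def inconsistent_products():
    """Products whose duplicated name differs between the two stores."""
    return [
        pid for pid, doc in document_store.items()
        if graph_store["nodes"][pid]["name"] != doc["name"]
    ]


add_product("p1", "Espresso machine")
rename_product_in_catalogue("p1", "Espresso maker")
print(inconsistent_products())   # ['p1']
```

Keeping the two copies aligned is left to the application, since there is no transaction spanning both stores.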


Polyglot  Persistence

[Figure: polyglot persistence example (diagram not recoverable). Annotations attached to the individual data stores:]

• Rapid access for reads and writes. No need to be durable.
• Needs transactional updates. Tabular structure fits data.
• Needs high availability across multiple locations. Can merge inconsistent writes.
• Rapidly traverse links between friends, product purchases, and ratings.
• High volume of writes on multiple nodes.
• Large-scale analytics on large cluster.
• SQL interfaces work well with reporting tools.
• Lots of reads, infrequent writes. Products make natural aggregate.


Multi-­model  NoSQL  databases

• Polyglot persistence may bring data consistency and duplication issues
• A polyglot persistent application requires managing and orchestrating different complex systems (operational complexity)
• Multi-model databases combine a document store, a KV store and a graph database in one database engine
• They provide a single query language and API
• The most common implementations (ArangoDB, OrientDB) map document and graph data models to the Key-Value data model
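The idea of mapping document and graph models onto a key-value model can be sketched in a few lines. This is an illustration only: the key layout and function names are assumptions made here and do not describe how ArangoDB or OrientDB work internally.

```python
kv = {}  # the single underlying key-value store


def put_document(collection, doc_id, doc):
    """Document model: one key per document."""
    kv[f"doc/{collection}/{doc_id}"] = doc


def get_document(collection, doc_id):
    return kv.get(f"doc/{collection}/{doc_id}")


def put_edge(src, label, dst):
    """Graph model: an adjacency list stored under one key per (source, label)."""
    kv.setdefault(f"edge/{src}/{label}", []).append(dst)


def neighbours(src, label):
    return kv.get(f"edge/{src}/{label}", [])


put_document("users", "alice", {"name": "Alice"})
put_document("users", "bob", {"name": "Bob"})
put_edge("users/alice", "friend", "users/bob")

# A mixed query: follow graph edges, then load the referenced documents.
friends = [get_document(*ref.split("/")) for ref in neighbours("users/alice", "friend")]
print(friends)   # [{'name': 'Bob'}]
```

Because both models live in the same store, a single query can combine a graph traversal with document lookups without coordinating two systems.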


Conclusions  on  NoSQL


Conclusions

RDBMS pros:
• ACID transactions (when needed) make development easier
• Most SQL code is portable to other SQL databases
• Typed columns and integrity constraints validate data before it is added to the DB, which increases data quality

RDBMS cons:
• The ORM layer can be complex
• RDBMS are difficult to scale out
• Sharding requires complex application code and is operationally inefficient
• Un/semi-structured data are difficult to store
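The sharding point can be illustrated with a sketch of application-level shard routing. It uses simple hash-modulo placement, chosen here for illustration and not prescribed by the slides; the function names are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def moved_keys(keys, old_shards, new_shards):
    """Keys whose shard changes when the number of shards changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


keys = [f"customer-{i}" for i in range(1000)]
moved = moved_keys(keys, 4, 5)
print(f"{len(moved)} of {len(keys)} keys must be moved when going from 4 to 5 shards")
```

With this placement most keys change shard whenever a shard is added, and the application has to carry out the data movement itself, which is part of the operational burden mentioned above.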


Conclusions

NoSQL pros:
• Data modeling can be an iterative process
• Linear scaling occurs as nodes are added to the cluster
• Lower operational costs are obtained by autosharding
• Native integration of Map/Reduce frameworks and full-text search engines
• No need for ORM layers
• Easy and efficient storage of highly variable data


Conclusions

NoSQL “cons”:
• Implicit schema at the application level
• Applications need to check for consistency and integrity constraints
• Not possible to express stored procedures (except for HBase and MongoDB)
• No transactions (across multiple objects); conflict resolution must be done by the client application
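What an implicit schema at the application level means in practice can be sketched as follows. The field list, the `validate` and `insert` helpers and the dictionary used as a store are all hypothetical and only illustrate checks that a relational database would otherwise enforce.

```python
REQUIRED_FIELDS = {"id": str, "email": str, "age": int}   # hypothetical implicit schema


def validate(doc):
    """Return a list of problems; an empty list means the document is acceptable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def insert(store, doc):
    """Insert a document after the checks the database itself does not perform."""
    problems = validate(doc)
    if problems:
        raise ValueError("; ".join(problems))
    if doc["id"] in store:                  # uniqueness, checked by the application
        raise ValueError("duplicate id")
    store[doc["id"]] = doc


users = {}
insert(users, {"id": "u1", "email": "u1@example.org", "age": 30})
print(validate({"id": "u2", "age": "unknown"}))
# ['missing field: email', 'wrong type for age']
```

Every application writing to the same data needs equivalent checks, otherwise documents that violate the implicit schema can enter the store unnoticed.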


Conclusions

NoSQL “cons”:
• ACID transactions are limited to just one element (row, document, entity, etc.), in contrast with RDBMS
  • 2nd generation NoSQL or NewSQL databases try to cope with this problem
• Data models and query languages are proprietary and create vendor lock-in
• Data structure is chosen (denormalization) upfront, based on the queries that will be expressed. If queries change, data also need to be changed (exception: graph DBs) → Map/Reduce
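The last point can be illustrated with a small example. Orders are embedded in customer aggregates, which suits the query "all orders of a customer". A new query, total quantity per product, does not fit that structure and is answered here with a map and a reduce step over all aggregates. The data and function names are made up for illustration.

```python
from collections import defaultdict

# Orders embedded in customer aggregates: good for "all orders of a customer".
customers = {
    "c1": {"orders": [{"product": "book", "qty": 2}, {"product": "pen", "qty": 5}]},
    "c2": {"orders": [{"product": "book", "qty": 1}]},
}


def map_customer(customer):
    """Map step: emit (product, quantity) pairs from one aggregate."""
    for order in customer["orders"]:
        yield order["product"], order["qty"]


def reduce_quantities(pairs):
    """Reduce step: sum the quantities emitted for each product."""
    totals = defaultdict(int)
    for product, qty in pairs:
        totals[product] += qty
    return dict(totals)


pairs = (pair for customer in customers.values() for pair in map_customer(customer))
totals = reduce_quantities(pairs)
print(totals)   # {'book': 3, 'pen': 5}
```

The alternative to scanning every aggregate would be to restructure the stored data around the new query, which is the cost the slide refers to.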


Good  candidate  projects  for  NoSQL  databases

Strategic
• Company core business (competitive advantage)
• Not utility projects (i.e. they aren’t central to the competitive advantage of the company)
  • Too risky to adopt and low benefits

AND

Rapid time to market
• To maximize the development productivity

AND/OR

Data intensive
• Lots of data
• Low latency and high availability
• Lots of traffic: reads or writes


Thanks  for  your  attention...


Bibliography

• https://medium.com/@sent0hil/consistent-hashing-a-guide-go-implementation-fe3421ac3e8f
• https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
• http://www.christof-strauch.de/nosqldbs.pdf
• http://bigdatauniversity.com/courses/course/view.php?id=572&justenroled=1
• http://www.n10k.com/blog/hbase-for-architects/
• [Chen & Zhang 2014] C.L. Philip Chen, Chun-Yang Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on Big Data, Information Sciences 275 (2014) 314–347.
• [Gartner 2015] Frank Buytendijk, Thomas W. Oestreich, Organizing for Big Data Through Better Process and Governance, Gartner report, March 2015.
• [Abadi et al 2013] The Beckman Report on Database Research, http://beckman.cs.wisc.edu/beckman-report2013.pdf
• http://docs.mongodb.org/
• Joshua Shinavier, http://www.slideshare.net/joshsh/texas-linuxfestival2014
• Dan McCreary, Ann Kelly, Making Sense of NoSQL: A Guide for Managers and the Rest of Us
• Pramod J. Sadalage, Martin Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence
• http://martinfowler.com/bliki/PolyglotPersistence.html
