d60 developing smart software solutions So you want to liberate your data? April 2012

So you want to liberate your data?


Page 1: So you want to liberate your data?


Page 2: So you want to liberate your data?

Mogens Heller Grabe

[email protected] @mookid8000

http://mookid.dk/oncode

Page 3: So you want to liberate your data?
Page 4: So you want to liberate your data?

Agenda

• Data, queries, etc.
• Concurrency
• Aggregation
• Deployment
• Durability
• Things to be aware of

Page 5: So you want to liberate your data?

MongoDB

• Document database
• Currently in v. 2.0.4
• Developed by 10gen
• Open source
  – server is GNU AGPL v3
  – clients (the official ones) are Apache v2
• Absolutely free to use
  – you can get a commercial version of the db though
  – has support, SSL, and more security features

Page 6: So you want to liberate your data?

Conceptual data organization

MongoDB: process → database → collection → document

Relational database: process → database → table → row

Page 7: So you want to liberate your data?

Data  

Page 8: So you want to liberate your data?

Example 1

• Install
• Mongo Shell
• Show database contents
• Add and show a document

Page 9: So you want to liberate your data?

Queries

including several other query operators: $gt, $gte, $lt, $lte, $exists, $all, etc...
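The operators above are written as nested documents inside the query. The sketch below is only a rough in-memory illustration of how such a query document reads, not MongoDB's implementation; `matches`, `operators` and the sample documents are made up for this example. In the mongo shell the same query document would be passed to `find` on a collection.

```javascript
// Toy, in-memory evaluation of a few MongoDB-style query operators.
// Hypothetical helper for illustration; real matching happens inside mongod.
const operators = {
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
  // $exists: true matches documents that have the field, false those that lack it.
  $exists: (value, arg) => (value !== undefined) === arg,
  // $all: the field must be an array containing every listed element.
  $all: (value, arg) => Array.isArray(value) && arg.every((x) => value.includes(x)),
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    const isOperatorDoc =
      condition !== null &&
      typeof condition === "object" &&
      !Array.isArray(condition) &&
      Object.keys(condition).every((k) => k.startsWith("$"));
    if (!isOperatorDoc) {
      return value === condition; // plain equality, primitive values only
    }
    return Object.entries(condition).every(([op, arg]) => operators[op](value, arg));
  });
}

// Hypothetical sample documents.
const people = [
  { name: "Homer", age: 39, tags: ["donuts", "beer"] },
  { name: "Bart", age: 10, tags: ["skateboard"] },
  { name: "Maggie", tags: [] },
];

// Age between 10 (inclusive) and 40 (exclusive).
const query = { age: { $gte: 10, $lt: 40 } };
const names = people.filter((p) => matches(p, query)).map((p) => p.name);
console.log(names);
```

Several operators on the same field are combined with a logical AND, which is why the range query needs only one nested document.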

Page 10: So you want to liberate your data?

Indexes  

Page 11: So you want to liberate your data?

Updates

including several other update modifiers: $inc, $set, $addToSet, $rename, etc...
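Update modifiers change parts of a document in place instead of replacing it. The sketch below imitates four of them on a plain JavaScript object to show what each one does; `applyUpdate` and the sample document are hypothetical, and the sketch handles only top-level fields and primitive values.

```javascript
// Toy, in-memory versions of four update modifiers (illustration only).
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing field starts at 0
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }
  for (const [field, value] of Object.entries(update.$addToSet || {})) {
    const current = result[field] || [];
    // Append only when the value is not already present.
    result[field] = current.includes(value) ? current : [...current, value];
  }
  for (const [oldName, newName] of Object.entries(update.$rename || {})) {
    if (oldName in result) {
      result[newName] = result[oldName];
      delete result[oldName];
    }
  }
  return result;
}

const before = { name: "Fry", deliveries: 1, tags: ["pizza"], nick: "Phil" };
const after = applyUpdate(before, {
  $inc: { deliveries: 2 },
  $set: { employer: "Planet Express" },
  $addToSet: { tags: "pizza" }, // already present, so tags stays unchanged
  $rename: { nick: "nickname" },
});
console.log(after);
```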

Page 12: So you want to liberate your data?

Example 2

• Import some data
• Query
• Update
• Index
• Query

Page 13: So you want to liberate your data?

ACID?

• Atomic: Yeah well, per document.
• Consistent: Yeah well, can be.
• Isolated: Yeah well, per document.
• Durable: Yeah well, can be – not default though...

Page 14: So you want to liberate your data?

Concurrency

• Pushing it down the stack

Page 15: So you want to liberate your data?

Concurrency

• Preserve invariants with update preconditions
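An update precondition puts the invariant into the query part of the update, so the modifier is applied only if the document still satisfies it. Here is a minimal in-memory sketch, assuming a hypothetical `accounts` collection whose balance must never go negative. In the mongo shell this would correspond to something like `db.accounts.update({_id: 1, balance: {$gte: 80}}, {$inc: {balance: -80}})`.

```javascript
// Toy single-document store (illustration only). In MongoDB the query and the
// modifier are applied to one document atomically, which is what makes the
// precondition reliable with concurrent writers.
const accounts = [{ _id: 1, balance: 100 }];

// Withdraw only if the balance covers the amount.
// Returns the number of documents updated, like the n reported after an update.
function withdraw(id, amount) {
  const doc = accounts.find((a) => a._id === id && a.balance >= amount); // precondition
  if (!doc) return 0;
  doc.balance -= amount; // corresponds to {$inc: {balance: -amount}}
  return 1;
}

const first = withdraw(1, 80);  // precondition holds
const second = withdraw(1, 80); // only 20 left, so nothing is updated
console.log(first, second, accounts[0].balance);
```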

Page 16: So you want to liberate your data?

Concurrency

• Use optimistic locking when replacing a document

(and then check whether n is 0 or 1...)
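A minimal sketch of the idea, assuming each document carries a hypothetical `version` field: the replacement is accepted only if the stored version is still the one that was read, and the caller inspects n to see whether it won. The in-memory `store` and `replaceIfUnchanged` stand in for a collection and a conditional update.

```javascript
// Toy in-memory collection (illustration only).
const store = [{ _id: 7, version: 1, title: "Draft" }];

// Replace the whole document, but only if the version is still the one we read.
// Returns n: 1 if the replacement happened, 0 if someone else got there first.
function replaceIfUnchanged(readDoc, changes) {
  const index = store.findIndex(
    (d) => d._id === readDoc._id && d.version === readDoc.version
  );
  if (index === -1) return 0;
  store[index] = { ...readDoc, ...changes, version: readDoc.version + 1 };
  return 1;
}

// Two clients read the same version of the document...
const readByA = { ...store[0] };
const readByB = { ...store[0] };

// ...and both try to write it back.
const nA = replaceIfUnchanged(readByA, { title: "Edited by A" });
const nB = replaceIfUnchanged(readByB, { title: "Edited by B" });
console.log(nA, nB, store[0]);
```

When n is 0, the losing client would typically re-read the document and retry or report a conflict.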

Page 17: So you want to liberate your data?

Concurrency

• Use FindAndModify to “check out” documents
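The check-out pattern can be sketched in memory as follows. The `jobs` collection, its `state` and `owner` fields and the `checkOut` helper are hypothetical. In the mongo shell the equivalent would be roughly `db.jobs.findAndModify({query: {state: "new"}, update: {$set: {state: "in-progress", owner: "worker-a"}}, new: true})`.

```javascript
// Toy in-memory job queue (illustration only).
const jobs = [
  { _id: 1, state: "new" },
  { _id: 2, state: "new" },
];

// Find one document in state "new", mark it as taken, and return it.
// In MongoDB, findAndModify performs the find and the modification as one
// atomic step, so two workers cannot check out the same document.
function checkOut(worker) {
  const job = jobs.find((j) => j.state === "new");
  if (!job) return null; // nothing left to check out
  job.state = "in-progress";
  job.owner = worker;
  return { ...job };
}

const a = checkOut("worker-a");
const b = checkOut("worker-b");
const c = checkOut("worker-c");
console.log(a._id, b._id, c);
```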

Page 18: So you want to liberate your data?

Aggregation

• Map/reduce

Page 19: So you want to liberate your data?

Aggregation

• Map/reduce
  – Map: for each document, emit 0 or more (key, value) tuples
  – Reduce: given a (key, value[]), return 1 value

Page 20: So you want to liberate your data?

Aggregation

m = function() {
    var doc = this;
    doc.appearances.forEach(function(a) {
        emit(a, {
            count: 1,
            names: [doc.firstName + " " + doc.lastName]
        });
    });
}

r = function(key, values) {
    var count = 0;
    var names = [];
    values.forEach(function(v) {
        count += v.count;
        names = names.concat(v.names);
    });
    return {count: count, names: names};
}

Page 21: So you want to liberate your data?

Example 3

• Use map/reduce to collect information on who appeared in each episode
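To see what the map and reduce functions from the slides produce without a running server, they can be driven by a small in-memory harness. This is only a sketch: the `emit` implementation, the harness loop and the sample documents are assumptions made for this example, and it is not how mongod executes map/reduce. In the mongo shell the two functions would instead be passed to `mapReduce` on the collection.

```javascript
// Minimal in-memory map/reduce harness (illustration only).
const emitted = new Map(); // key -> array of emitted values

function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Map and reduce functions as on the slide.
const m = function () {
  const doc = this;
  doc.appearances.forEach(function (a) {
    emit(a, { count: 1, names: [doc.firstName + " " + doc.lastName] });
  });
};

const r = function (key, values) {
  let count = 0;
  let names = [];
  values.forEach(function (v) {
    count += v.count;
    names = names.concat(v.names);
  });
  return { count: count, names: names };
};

// Hypothetical sample documents.
const characters = [
  { firstName: "Philip", lastName: "Fry", appearances: ["S01E01", "S01E02"] },
  { firstName: "Turanga", lastName: "Leela", appearances: ["S01E01"] },
];

// Map phase: `this` is the current document.
characters.forEach((doc) => m.call(doc));

// Reduce phase: one result per key.
const results = {};
for (const [key, values] of emitted) {
  // MongoDB does not call reduce for a key with a single value; same here.
  results[key] = values.length === 1 ? values[0] : r(key, values);
}
console.log(results);
```

Note that the reduce function returns a value of the same shape as the emitted values. That matters because reduce may be applied again to its own output.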

Page 22: So you want to liberate your data?

Aggregation

• Aggregation framework (not available until 2.2)
  – declarative syntax for construction of an aggregation pipeline
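"Declarative" here means that the pipeline is plain data: an array of stages applied in order. The sketch below shows a pipeline that counts appearances per episode, together with a toy evaluator that understands only the two stage shapes used in it. `runPipeline` and the sample documents are hypothetical and the evaluator is not MongoDB code. In a 2.2 shell the same array would be handed to `aggregate` on the collection.

```javascript
// The pipeline is plain data: an array of stages applied in order.
const pipeline = [
  { $unwind: "$appearances" }, // one output document per array element
  { $group: { _id: "$appearances", count: { $sum: 1 } } }, // count per episode
];

// Toy evaluator supporting only the two stage shapes used above
// (illustration only; hypothetical helper).
function runPipeline(docs, stages) {
  const field = (ref) => ref.slice(1); // "$appearances" -> "appearances"
  return stages.reduce((input, stage) => {
    if (stage.$unwind) {
      const name = field(stage.$unwind);
      return input.flatMap((doc) =>
        (doc[name] || []).map((item) => ({ ...doc, [name]: item }))
      );
    }
    if (stage.$group) {
      const groups = new Map();
      for (const doc of input) {
        const id = doc[field(stage.$group._id)];
        groups.set(id, (groups.get(id) || 0) + stage.$group.count.$sum);
      }
      return [...groups].map(([_id, count]) => ({ _id, count }));
    }
    throw new Error("unsupported stage");
  }, docs);
}

// Hypothetical sample documents.
const cast = [
  { name: "Fry", appearances: ["S01E01", "S01E02"] },
  { name: "Leela", appearances: ["S01E01"] },
];
const perEpisode = runPipeline(cast, pipeline);
console.log(perEpisode);
```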

Page 23: So you want to liberate your data?

Aggregation

• Aggregation framework (not available until 2.2)

Page 24: So you want to liberate your data?
Page 25: So you want to liberate your data?
Page 26: So you want to liberate your data?
Page 27: So you want to liberate your data?
Page 28: So you want to liberate your data?

Deployment

• Several configurations
  – we’ll check out replica sets and sharding

Page 29: So you want to liberate your data?

Replica sets

• Master-slave with automatic failover
  – Each mongod should be started with the --replSet argument
  – Additional nodes added from the shell
  – Make sure the number of nodes is odd, possibly by adding an arbiter
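As a sketch of the steps above, here is a replica set configuration document of the kind passed to `rs.initiate()` in the mongo shell. The set name, host names and ports are hypothetical. Further nodes can be added later from the shell with `rs.add(host)`, and arbiters with `rs.addArb(host)`. The small helper only checks the "odd number of nodes" rule.

```javascript
// Replica set configuration document, as it would be passed to rs.initiate()
// in the mongo shell. Set name, host names and ports are hypothetical.
const config = {
  _id: "rs0", // must match the name given to each mongod via --replSet
  members: [
    { _id: 0, host: "db1.example.com:27017" },
    { _id: 1, host: "db2.example.com:27017" },
    // The arbiter votes in elections but holds no data.
    { _id: 2, host: "db3.example.com:27017", arbiterOnly: true },
  ],
};

// With an odd number of members, a two-way network split always leaves
// one side with a strict majority that can elect a primary.
function hasOddMemberCount(cfg) {
  return cfg.members.length % 2 === 1;
}

console.log(hasOddMemberCount(config));
```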

Page 30: So you want to liberate your data?

Replica sets

• Higher availability
• Scale out reads
• Backup without interfering with the primary

Page 31: So you want to liberate your data?

Sharding

• Auto-sharding
  – happens by user-defined shard key
  – can be defined per collection
  – requires special nodes: mongos (the load balancer) and a mongod that is configured to be a configuration server

Page 32: So you want to liberate your data?

Sharding

• Scale out writes

• Limitations:
  – Shard key is immutable
  – All inserts/updates must include the shard key
  – Cannot enforce (arbitrary) uniqueness across shards, only for shard key

Page 33: So you want to liberate your data?

Sharding + replica sets

Page 34: So you want to liberate your data?

MongoDB’s durability story

• Memory-mapped files.
• fsync.

• Durability through replication
  – pre 1.8

• Durability through journaling
  – an option since 1.8
  – replica sets still cool though
  – default since 2.0

Page 35: So you want to liberate your data?

MongoDB’s durability story

• Inserts and updates are unsafe by default!!
  – only purpose: get awesome benchmarks
  – bad: bites you in the a**

• Exposed differently on drivers, but always maps to db.getLastError()
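Under the hood, a "safe" write is a write followed by a getLastError command on the same connection, and the options on that command decide how long to wait. The sketch below only builds such command documents. `getLastErrorCommand` and its parameter names are hypothetical, while `w`, `j` and `wtimeout` are options of the command itself. In the mongo shell the resulting document would be passed to `db.runCommand`.

```javascript
// Builds the getLastError command document for a given safety level.
// Hypothetical helper; only the option names (w, j, wtimeout) come from MongoDB.
function getLastErrorCommand({ replicas, journal, timeoutMs } = {}) {
  const command = { getlasterror: 1 };
  // w: wait until the write has reached this many nodes (primary included).
  if (replicas !== undefined) command.w = replicas;
  // j: wait until the write has been committed to the journal.
  if (journal) command.j = true;
  // wtimeout: give up waiting for replication after this many milliseconds.
  if (timeoutMs !== undefined) command.wtimeout = timeoutMs;
  return command;
}

// Just check that the server accepted the write.
const acknowledged = getLastErrorCommand();
// Wait for the journal and for one secondary, for at most five seconds.
const durable = getLastErrorCommand({ replicas: 2, journal: true, timeoutMs: 5000 });
console.log(acknowledged, durable);
```

Because the command is issued per write, the safety level can differ from one operation to the next.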

Page 36: So you want to liberate your data?
Page 37: So you want to liberate your data?

MongoDB’s durability story

• Conclusion: It’s cool that you can tweak it per operation, but it’s uncool that it’s unsafe.

Page 38: So you want to liberate your data?

Things to be aware of

• Safe mode off
• 32/64 bit
• Memory-mapped file
• Global write lock
• Indexes should always fit in RAM

Page 39: So you want to liberate your data?

Thanks for listening!

[email protected] @mookid8000

http://mookid.dk/oncode

Page 40: So you want to liberate your data?

Image credits

The world’s most interesting man: http://i.qkme.me/3mwy.jpg
Bison: http://www.flickr.com/photos/johan-gril/5632513228/
Tired Fry: http://cdn.memegenerator.net/instances/400x/18731987.jpg

Thanks for letting me borrow your awesome images – if you ever meet me, I’ll buy you a beer. Seriously, I will.