
© 2014 GridGain Systems, Inc.

YAKOV ZHDANOV, Director R&D

In-Memory Computing: From Disk-First Architecture to Memory-First

www.gridgain.com @gridgain


GridGain: In-Memory Computing Leader

• More than 5 years in production
• 100s of customers & users
• Starts every 10 seconds worldwide


Agenda  

• [R]Evolution of Data: Extreme Volume Growth
• In-Memory Computing: Faster Apps on Larger Data Sets
• Example System: Moving Application “In-Memory”
• GridGain: Overview And “Live Coding” Sessions
• Q&A


[R]Evolution of Data

2.5 exabytes* of data were generated every day in 2012 (IBM). That is a 62.5 km high stack of 1 TB 3.5” HDDs.

* 1 exabyte = 10^18 bytes

[Figure: stack of drives, 62.5 km high]
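The 62.5 km figure can be reproduced with a quick back-of-the-envelope calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a drive height of 25 mm. The deck states neither value; 25 mm is chosen here only because it reproduces the slide's number, and the function name is illustrative.

```python
EXABYTE = 10**18   # bytes, as defined in the slide footnote
TERABYTE = 10**12  # bytes, decimal units (assumption)


def stack_height_km(total_bytes, drive_bytes=TERABYTE, drive_height_mm=25):
    """Height in km of a stack of drives needed to hold total_bytes.

    drive_height_mm=25 is an assumed value, not one given in the deck.
    """
    drives = total_bytes // drive_bytes          # number of full drives
    return drives * drive_height_mm / 1_000_000  # millimetres -> kilometres


daily_bytes = 25 * EXABYTE // 10                 # 2.5 exabytes per day
print(stack_height_km(daily_bytes))              # height of one day's stack, in km
```

Under these assumptions, one day of data fills 2.5 million drives.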


Why Speed Up Data Processing?

• 2008: Amazon found every 100 ms of latency cost them 1% in sales
• Analysts demand sub-second, near real-time query results
• On-line traders want ultra-low latencies


Problem  

Data sets are large and complex: new tools are needed.


Solution: In-Memory Computing

• Data is in-memory: no disk IO
• Mostly distributed: multi-node topologies
• Parallel processing
• Deal with operational data set
• Map-Reduce support
• Middleware software

RAM storage and parallel distributed processing are two fundamental pillars of in-memory computing.


In-Memory Computing: The Best Use Cases*

• Investment banking
• Insurance claim processing & modeling
• Real-time ad platforms
• Merchant platform for online games
• Geospatial/GIS processing
• Medical imaging processing
• Complex event processing of streaming sensor data

*I can only speak for GridGain, which has production customers in a wide enough variety of industries to be statistically significant


Example  System  

• Imagined system at the beginning of its lifecycle
• Growing: handling more users and data
• Moving to memory for better characteristics
• Comparison: before and after


Example  System:  Just  Launched  


Example  System:  Growing  


Example  System:  Moving  to  Memory  (Step  1)  


Example  System:  Moving  to  Memory  (Step  2)  


Comparison: Before And After

• Data distribution in in-memory system
• Simple data update scenarios
• Long running queries
• Scalability
• Failure tolerance


Comparison: Data Distribution

• Larger data sets are stored in PARTITIONED manner

[Diagram: Employee cache, PARTITIONED across four servers; each server holds 1/4 + [1/4]]

Each server is PRIMARY for 1/4 of Employee objects and BACKUP for another 1/4.
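The primary/backup layout described above can be sketched in a few lines. This is a minimal illustration that assumes a simple hash-modulo placement with the backup on the next server in the ring; it is not GridGain's actual affinity function, and the server names and helper functions are hypothetical.

```python
import zlib

NODES = ["server-1", "server-2", "server-3", "server-4"]  # hypothetical names


def owners(key, nodes):
    """Return (primary, backup) for a key using hash-modulo placement."""
    p = zlib.crc32(key.encode("utf-8")) % len(nodes)
    primary = nodes[p]
    backup = nodes[(p + 1) % len(nodes)]  # backup is always a different server
    return primary, backup


def distribute(keys, nodes):
    """Build a per-server view of which keys it holds as primary and as backup."""
    layout = {n: {"primary": [], "backup": []} for n in nodes}
    for key in keys:
        primary, backup = owners(key, nodes)
        layout[primary]["primary"].append(key)
        layout[backup]["backup"].append(key)
    return layout


employees = [f"employee-{i}" for i in range(1000)]
for server, held in distribute(employees, NODES).items():
    print(server, len(held["primary"]), len(held["backup"]))
```

With a reasonably uniform hash, each of the four servers ends up as primary for roughly a quarter of the keys and as backup for roughly another quarter, which is the "1/4 + [1/4]" notation on the slide.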


Comparison: Data Distribution

• Smaller data sets may be REPLICATED for colocation
• Larger data sets are stored in PARTITIONED manner

[Diagram: four servers; each holds the FULL Company cache (REPLICATED) and 1/4 + [1/4] of the Employee cache (PARTITIONED)]

Companies and employees are colocated
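Because every server holds the full Company data set, each server can join its own share of employees to companies without asking any other server. The sketch below illustrates that idea with plain dictionaries; the sample data, the hash-modulo placement, and all function names are assumptions made for illustration, not GridGain APIs.

```python
import zlib

COMPANIES = {1: "Acme", 2: "Globex"}                          # hypothetical data
EMPLOYEES = [("Ann", 1), ("Bob", 2), ("Cid", 1), ("Dee", 2)]  # (name, company_id)
NODE_COUNT = 4


def build_nodes(companies, employees, node_count):
    """Replicate companies to every node; partition employees by name hash."""
    nodes = [{"companies": dict(companies), "employees": []}
             for _ in range(node_count)]
    for name, company_id in employees:
        idx = zlib.crc32(name.encode("utf-8")) % node_count
        nodes[idx]["employees"].append((name, company_id))
    return nodes


def local_join(node):
    """Join using only data stored on this node: no remote lookups needed."""
    return [(name, node["companies"][cid]) for name, cid in node["employees"]]


def cluster_join(nodes):
    """Collect the per-node join results into one sorted result set."""
    rows = []
    for node in nodes:
        rows.extend(local_join(node))
    return sorted(rows)


print(cluster_join(build_nodes(COMPANIES, EMPLOYEES, NODE_COUNT)))
```

The trade-off is memory: replication costs one full copy per server, which is why the slide reserves it for the smaller data set.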


Comparison: Data Access And Update

Before: Update User Profile

100% of the requests end up on the same DB server.

After: Update User Profile

Servers evenly share the load, since profiles are distributed across the cluster. Persistent storage is updated asynchronously.
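One common way to update persistent storage asynchronously is a write-behind queue: the update is applied in memory immediately and a background worker flushes it to the store later. The sketch below assumes that approach (the deck does not say how GridGain does it) and uses a plain dictionary as a stand-in for the database; the class and method names are hypothetical.

```python
import queue
import threading


class WriteBehindCache:
    """In-memory map whose updates are written to a backing store asynchronously."""

    def __init__(self, store):
        self._mem = {}
        self._store = store              # dict standing in for the database
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def put(self, key, value):
        self._mem[key] = value           # caller sees the update immediately
        self._pending.put((key, value))  # persistence happens later

    def get(self, key):
        return self._mem.get(key)

    def _flush_loop(self):
        while True:
            key, value = self._pending.get()
            self._store[key] = value     # the slow write, off the caller's path
            self._pending.task_done()

    def flush(self):
        """Block until every queued update has reached the store."""
        self._pending.join()


db = {}
profiles = WriteBehindCache(db)
profiles.put("user-1", {"name": "Ann"})
profiles.flush()
print(db)
```

The design choice here is latency over durability: an update that is still queued when the process dies is lost unless another copy, such as a backup replica, holds it.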


Comparison: Long Running Queries

Before: select avg(sum) from Orders

100% of the requests end up on the same DB server, which is responsible for scanning the entire data set.

After: select avg(sum) from Orders

Servers evenly share the load, running the query against partitioned data. Each server has a smaller data set to process.
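A partitioned average such as the one above is typically computed in two steps: each server returns a partial (sum, count) over its own rows, and the caller combines the partials. The sketch below simulates the servers with a thread pool; the function names and sample data are illustrative, and this is not how GridGain's SQL engine is necessarily implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_avg(order_sums):
    """Work done on one server: scan only the local partition."""
    return sum(order_sums), len(order_sums)


def distributed_avg(partitions):
    """Run partial_avg on every partition in parallel, then reduce."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(partial_avg, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Four partitions of Orders.sum values, one per simulated server.
orders = [[10, 20], [30], [40, 50, 60], []]
print(distributed_avg(orders))
```

Note that the servers return sums and counts rather than local averages: averaging the per-server averages would give a wrong answer whenever partitions differ in size.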


Comparison:  Scalability  



Comparison: Failure Tolerance

Before: failure of the master DB server, or of certain servers in a NoSQL deployment, threatens the app.

After: failure of 1, 2, 3 ... N nodes, or a planned shutdown, does not threaten the cluster.
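How many simultaneous node failures a partitioned cluster can absorb without losing data depends on how many backup copies each partition has. The sketch below makes that concrete under an assumed ring placement (primary on one node, backups on the following nodes); the node names, partition count, and placement rule are hypothetical and chosen only for illustration.

```python
NODES = ["n1", "n2", "n3", "n4"]   # hypothetical node names
PARTITIONS = range(8)              # hypothetical partition count


def owners(partition, nodes, backups):
    """Primary plus `backups` backup nodes, placed on consecutive ring positions."""
    n = len(nodes)
    return [nodes[(partition + i) % n] for i in range(backups + 1)]


def lost_partitions(failed, nodes=NODES, partitions=PARTITIONS, backups=1):
    """Partitions whose primary and all backups sit on failed nodes."""
    return [p for p in partitions
            if all(o in failed for o in owners(p, nodes, backups))]


print(lost_partitions({"n1"}))                   # single failure, one backup
print(lost_partitions({"n1", "n2"}))             # two neighbours fail at once
print(lost_partitions({"n1", "n2"}, backups=2))  # same failure, two backups
```

In this model a single failure never loses data with one backup, while surviving two simultaneous failures of neighbouring nodes requires a second backup. Sequential failures are easier to absorb if the cluster has time to re-create the missing copies in between, which this sketch does not model.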


Economics of In-Memory Computing

• High Performance and low Latencies
• RAM + Network Faster than Disk or Flash
• Volatile and Persistent
• Cost Effective
• OLAP and OLTP Use Cases
• Distributed or Not
• Caching, Streaming, Computations
• Data Querying: SQL or Unstructured


Where Is Your Application?

[Diagram labels: RDBMS; L2 Caching; NoSQL; Distributed Disk Systems; In-Memory Data Grids; IMDBs]


GridGain: Try In-Memory Computing

• Full in-memory stack: compute, caching, streaming
• Intuitive APIs: easy to start with
• Open-sourced under Apache 2.0 license
• Hosted and developed on GitHub: fork and enjoy!


GridGain: In-Memory Data Fabric
Strategic Approach to IMC

• Supports all Apps
• Simple Java APIs
• 1 JAR Dependency
• High Performance & Scale
• Automatic Fault Tolerance
• Management/Monitoring
• Runs on Commodity Hardware
• Supports existing & new data sources
• No need to rip & replace

[Diagram components: Clustering & Compute Grid; Data Grid; Streaming; Hadoop Acceleration]


GridGain: Clustering And Compute

• Direct API for MapReduce
• Direct API for Fork/Join
• Zero Deployment
• Cron-like Task Scheduling
• State Checkpoints
• Early and Late Load Balancing
• Automatic Failover
• Full Cluster Management
• Pluggable SPI Design
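The MapReduce and Fork/Join style of compute named above follows one pattern: split a task into independent jobs, run the jobs in parallel, and reduce the results. The sketch below shows that pattern with Python's standard thread pool standing in for cluster nodes; it does not use or imitate the GridGain API, and the function name and example phrase are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def count_chars(phrase, workers=4):
    """Count non-space characters by splitting the phrase into per-word jobs."""
    words = phrase.split()                        # split: one job per word
    with ThreadPoolExecutor(max_workers=workers) as pool:
        lengths = list(pool.map(len, words))      # map: jobs run in parallel
    return sum(lengths)                           # reduce: combine job results


print(count_chars("Hello In-Memory World"))
```

In a real compute grid the jobs would be shipped to remote nodes, so features from the list such as load balancing and failover decide where each job runs and what happens when a node dies mid-job.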


GridGain: Automatic Cluster Discovery


GridGain: Closure Execution



GridGain: In-Memory Caching and Data Grid

• Distributed In-Memory Key-Value Store
• Replicated and Partitioned
• TBs of data, of any type
• On-Heap and Off-Heap Storage
• Backup Replicas / Automatic Failover
• Distributed ACID Transactions
• SQL queries and JDBC driver
• Colocation of Compute and Data


GridGain: Cache Operations


GridGain: Cache Transaction


GridGain: Distributed Java Data Structures

• Distributed Map (cache)
• Distributed Set
• Distributed Queue
• CountDownLatch
• AtomicLong
• AtomicSequence
• AtomicReference
• Distributed ExecutorService


Client-Server vs Affinity Colocation

[Diagram panels: Client-Server; Affinity Colocation]
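The difference between the two approaches is where the computation runs: in client-server mode the data travels to the caller, while with affinity colocation the computation travels to the node that owns the data and only the result comes back. The sketch below counts how many values cross the simulated network in each case; the class, its methods, and the hash-modulo key placement are assumptions for illustration, not GridGain APIs.

```python
import zlib


class Cluster:
    """Toy cluster: each node is a dict; transfers are counted, not performed."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self.values_transferred = 0

    def _node_for(self, key):
        # The key alone determines the owning node (the key's "affinity").
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def put(self, key, values):
        self._node_for(key)[key] = list(values)

    def client_server_sum(self, key):
        values = self._node_for(key)[key]
        self.values_transferred += len(values)   # whole data set is shipped
        return sum(values)                       # computed on the client

    def colocated_sum(self, key):
        result = sum(self._node_for(key)[key])   # computed on the owning node
        self.values_transferred += 1             # only the result is shipped
        return result


cluster = Cluster(4)
cluster.put("orders:42", range(1000))
print(cluster.client_server_sum("orders:42"), cluster.values_transferred)
```

Both calls return the same sum; the colocated version simply moves far less data, and the gap grows with the size of the data set stored under the key.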


THANK  YOU  

www.gridgain.com   @gridgain