Coherence & Big Data Ben Stopford

Coherence & Big Data · BIG DATA! BAND WAGGON! Backing Layer e! L! Recent data in cache! Fast data load ! Lower cost full history! Write-through! Hadoop • Backing ... HBase, HDFS,


Coherence & Big Data

Ben Stopford


Can you do ‘Big Data’ in Coherence?


Maybe?!?!?!

•  Problem: Cost of memory / 6x storage ratio
   → Elastic Data (disk or RAM)
   → Keep the number of indexes small
   → Off-heap indexes (coming)

•  Problem: Getting your (big) data loaded
   → Recoverable caching
   → Use another distributed backing store
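The 6x ratio is easiest to feel as arithmetic. The sketch below assumes it means roughly six bytes of cluster memory per byte of raw data (the slide does not break the figure down) and uses a hypothetical 32 GB of usable heap per storage node. `MemorySizing` and both inputs are illustrative, not Coherence settings.

```java
// Back-of-envelope cluster sizing. The 6x ratio and the per-node heap are assumed inputs,
// not values taken from Coherence configuration.
class MemorySizing {
    /** Total cluster memory (GB) needed for rawDataGb of data at the given memory-to-data ratio. */
    static double requiredMemoryGb(double rawDataGb, double memoryRatio) {
        return rawDataGb * memoryRatio;
    }

    /** Storage nodes needed, rounded up, for a hypothetical usable heap per node. */
    static long nodesNeeded(double rawDataGb, double memoryRatio, double usableHeapPerNodeGb) {
        return (long) Math.ceil(requiredMemoryGb(rawDataGb, memoryRatio) / usableHeapPerNodeGb);
    }

    public static void main(String[] args) {
        double rawGb = 1024.0;        // 1 TB of raw data (illustrative)
        double ratio = 6.0;           // the slide's 6x figure, taken at face value
        double heapPerNodeGb = 32.0;  // hypothetical usable heap per storage node
        System.out.printf("memory: %.0f GB, nodes: %d%n",
                requiredMemoryGb(rawGb, ratio),
                nodesNeeded(rawGb, ratio, heapPerNodeGb));
    }
}
```

With those assumed inputs, `main` prints `memory: 6144 GB, nodes: 192`: about 6 TB of memory for 1 TB of data, which is why pure in-memory storage tops out in the low-TB zone.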


But

•  Elastic Data & recoverable caching are separate (plan to unify)
   – Recoverable caching feeding Elastic Data (RC => ED) is IO-intensive (two distinct copies)
   – 2x disk footprint
   – No compression
   – Rebalance time
   – Memory ratio (the 6x)
   >>> Low TB zone


BIG DATA BANDWAGON



Backing Layer

(Diagram: Coherence in front of a NoSQL backing store)

•  Recent data in cache
•  Fast data load
•  Lower-cost full history
•  Write-through
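Coherence's real hook for a backing layer is the `CacheStore` interface (load, store and erase, plus bulk variants), plugged in behind a read-write backing map. The sketch below does not use Coherence: it assumes a hypothetical `BackingStore` interface of the same shape and an in-memory stand-in for the NoSQL store, so the write-through and read-through behaviour can run on its own. Bounding the in-memory map to the most recently used entries mimics "recent data in cache, full history in the store".

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface shaped loosely like Coherence's CacheStore (load/store/erase).
interface BackingStore<K, V> {
    V load(K key);
    void store(K key, V value);
    void erase(K key);
}

// In-memory stand-in for a NoSQL store holding the full history.
class InMemoryStore<K, V> implements BackingStore<K, V> {
    final Map<K, V> data = new HashMap<>();
    public V load(K key) { return data.get(key); }
    public void store(K key, V value) { data.put(key, value); }
    public void erase(K key) { data.remove(key); }
}

// Write-through cache: every put goes synchronously to the store; only recent entries stay in memory.
class WriteThroughCache<K, V> {
    private final BackingStore<K, V> store;
    private final Map<K, V> recent;

    WriteThroughCache(BackingStore<K, V> store, int maxRecent) {
        this.store = store;
        // Access-ordered LinkedHashMap evicts the least recently used entry beyond maxRecent.
        this.recent = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxRecent;
            }
        };
    }

    void put(K key, V value) {
        store.store(key, value);   // write-through: persist before caching
        recent.put(key, value);
    }

    V get(K key) {
        V v = recent.get(key);
        if (v == null) {           // cache miss: read through to the backing store
            v = store.load(key);
            if (v != null) recent.put(key, v);
        }
        return v;
    }

    void remove(K key) {
        store.erase(key);
        recent.remove(key);
    }

    boolean isCached(K key) { return recent.containsKey(key); }
}
```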


Hadoop

•  Backing
   – HDFS
      •  Big files (~GBs)
      •  No random writes (OK if you journal writes)
      •  Use sequence files
      •  Hard to manage the active set
   – HBase (better option)
      •  Fast writes (LSM)
      •  Supports predicate pushdown
      •  More complex setup (ZooKeeper, NameNode, etc.)
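HBase's fast writes come from its log-structured merge (LSM) design: writes land in a sorted in-memory buffer that is flushed as an immutable sorted file, so storage only ever sees sequential appends, which also fits HDFS's no-random-write constraint. The toy sketch below illustrates the idea only; `ToyLsmStore` and its flush threshold are hypothetical, and it omits the write-ahead log, compaction and delete markers a real store such as HBase needs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy LSM store: a sorted in-memory memtable flushed to immutable sorted runs.
class ToyLsmStore {
    private final int flushThreshold;                                  // hypothetical memtable size limit
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<Map<String, String>> runs = new ArrayList<>(); // oldest first, newest last

    ToyLsmStore(int flushThreshold) { this.flushThreshold = flushThreshold; }

    void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) flush();
    }

    // Freeze the memtable into an immutable sorted run (a real store would write a file here).
    private void flush() {
        runs.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    // Reads check the memtable, then runs from newest to oldest, so the latest write wins.
    String get(String key) {
        String v = memtable.get(key);
        if (v != null) return v;
        for (int i = runs.size() - 1; i >= 0; i--) {
            v = runs.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }

    int runCount() { return runs.size(); }
}
```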


NoSQL Backing

•  Cassandra: low memory footprint, write-optimised
•  MongoDB: read/memory-optimised (3.0 a big improvement); rich queries
•  Oracle NoSQL: KV with secondary indexes & range predicates
•  Riak: KV, but can scan with the MapReduce API; eventual consistency may not suit
•  Couchbase: heavily memory-optimised; fast, but too similar to Coherence to be a good fit
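Several of these stores, and Coherence itself, answer range predicates from a secondary index: a sorted map from an attribute value to the keys holding it. A minimal sketch with hypothetical names (`IndexedStore`, a numeric `price` attribute) follows. The index stores every value a second time, which is the memory cost behind keeping the number of indexes small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical store: primary key -> price, plus one sorted secondary index on price.
class IndexedStore {
    private final Map<String, Integer> byKey = new HashMap<>();                  // primary data
    private final NavigableMap<Integer, Set<String>> byPrice = new TreeMap<>();  // secondary index

    void put(String key, int price) {
        Integer old = byKey.put(key, price);
        if (old != null) {                       // keep the index consistent on update
            Set<String> keys = byPrice.get(old);
            keys.remove(key);
            if (keys.isEmpty()) byPrice.remove(old);
        }
        byPrice.computeIfAbsent(price, p -> new TreeSet<>()).add(key);
    }

    // Range predicate lo <= price <= hi, answered from the index without scanning every entry.
    Set<String> keysWithPriceBetween(int lo, int hi) {
        Set<String> result = new TreeSet<>();
        for (Set<String> keys : byPrice.subMap(lo, true, hi, true).values()) {
            result.addAll(keys);
        }
        return result;
    }
}
```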


Streams


Message Stream Products

•  RabbitMQ
•  Kafka
•  Aeron


Messaging as a Backing Store

•  Great complement for Coherence
•  Write through to a topic. Immutable state.

(Diagram: clients make direct, synchronous reads & writes against a cache of recent data with a rich query API, which writes through to an event stream, the system of record. Async views (relational, raw, streaming, historic), inbound stream processors, async streaming clients and another data center's DB connect to the stream asynchronously.)
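Writing through to a topic can be sketched as: the cache appends each change to an append-only log (the system of record) before updating its own copy, and each view consumes the log asynchronously from its own offset. The in-memory `Topic` below is a hypothetical stand-in for a Kafka-style topic, not the Kafka client API, and the other class names are illustrative too.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Immutable change event.
final class ChangeEvent {
    final String key;
    final String value;
    ChangeEvent(String key, String value) { this.key = key; this.value = value; }
}

// Hypothetical append-only topic standing in for a Kafka-style log.
class Topic {
    private final List<ChangeEvent> log = new ArrayList<>();
    synchronized long append(ChangeEvent e) { log.add(e); return log.size() - 1; }
    synchronized List<ChangeEvent> readFrom(long offset) {
        return new ArrayList<>(log.subList((int) offset, log.size()));
    }
}

// Cache that writes through to the topic, which remains the system of record.
class TopicBackedCache {
    private final Topic topic;
    private final Map<String, String> cache = new HashMap<>();
    TopicBackedCache(Topic topic) { this.topic = topic; }

    void put(String key, String value) {
        topic.append(new ChangeEvent(key, value));   // immutable record first
        cache.put(key, value);
    }

    String get(String key) { return cache.get(key); }
}

// Asynchronous view: polls the topic from its own offset and applies new events.
class CountingView {
    private long offset = 0;
    final Map<String, Integer> updatesPerKey = new HashMap<>();

    void poll(Topic topic) {
        List<ChangeEvent> events = topic.readFrom(offset);
        for (ChangeEvent e : events) updatesPerKey.merge(e.key, 1, Integer::sum);
        offset += events.size();
    }
}
```

The view lags the cache until it polls, which is the "async" edge in the diagram; the cache never waits for views.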


Hang Tertiary ‘VIEWS’

•  Search: Elasticsearch, Solr
•  Graph: Neo4j, OrientDB
•  Relational: Oracle, Postgres, Teradata
•  Analytic: Exadata, Teradata, Greenplum
•  Document archive: MongoDB
•  Hadoop: HBase, HDFS, Parquet, Avro, Protocol Buffers, etc.

•  Complexity increases with the polyglot persistence pattern.

•  Replica instantiation is good


Stream Processors

•  Storm
•  Samza
•  Spark Streaming (micro-batch)
•  Libraries such as Esper
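A typical stream-processor operation is a windowed aggregate. The sketch counts events per key in fixed, non-overlapping (tumbling) time windows, processing one event at a time. It is a plain-Java illustration of the concept, not the Storm, Samza or Spark Streaming API; `TumblingWindowCounter`, the event shape and the window size are assumptions. A micro-batch engine such as Spark Streaming would instead apply the logic to small batches of events.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical operator: counts events per key in tumbling windows of windowMillis.
class TumblingWindowCounter {
    private final long windowMillis;
    // window start -> (key -> count); TreeMap keeps windows in time order
    private final TreeMap<Long, Map<String, Integer>> windows = new TreeMap<>();

    TumblingWindowCounter(long windowMillis) { this.windowMillis = windowMillis; }

    // Process a single event as it arrives.
    void onEvent(String key, long timestampMillis) {
        long windowStart = timestampMillis - Math.floorMod(timestampMillis, windowMillis);
        windows.computeIfAbsent(windowStart, w -> new HashMap<>()).merge(key, 1, Integer::sum);
    }

    int count(long windowStart, String key) {
        Map<String, Integer> w = windows.get(windowStart);
        return (w == null) ? 0 : w.getOrDefault(key, 0);
    }
}
```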


Lambda Architecture

(Diagram: all your data feeds both a batch layer, queried through a serving layer, and a fast stream layer, queried directly.)


Lambda Architecture

(Diagram: all your data feeds Kafka + Storm on the stream side and Hadoop on the batch side, with Cassandra as the serving layer; both sides are queried.)

- Cool architecture for use cases that cannot work in a single pass.
- General applicability is limited by double-query & double-coding.
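The double-query and double-coding costs show up directly in code: the same counting rule is written once for the batch layer and again for the speed layer, and every read must consult both views and merge them. A hypothetical sketch for a per-key count, with plain maps standing in for the batch output and the real-time store; `LambdaServingLayer` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Lambda-style serving layer for a per-key event count.
class LambdaServingLayer {
    private Map<String, Long> batchView = new HashMap<>();          // complete up to the last batch run
    private final Map<String, Long> realtimeView = new HashMap<>(); // events since that run

    // Batch layer: recompute the count from the full master dataset.
    void runBatch(List<String> allEventKeys) {
        Map<String, Long> fresh = new HashMap<>();
        for (String key : allEventKeys) fresh.merge(key, 1L, Long::sum);
        batchView = fresh;
        // Simplification: assumes the batch covered every event the speed layer has counted.
        realtimeView.clear();
    }

    // Speed layer: the same counting rule, coded a second time, applied incrementally.
    void onEvent(String key) { realtimeView.merge(key, 1L, Long::sum); }

    // Serving layer: every query reads both views and merges the answers.
    long count(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }
}
```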


Kappa Architecture

(Diagram: all your data flows through a stream; a stream processor populates views (search, NoSQL, SQL) that clients query.)


Kappa Architecture

(Diagram: Kafka feeds Samza or Storm, which populates views in Elasticsearch, Cassandra and Oracle that clients query.)

- Simpler choice where stream processors can handle the full problem set.
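In the Kappa style a single stream processor consumes the log and updates every view, so the business logic is written once, and reprocessing means replaying the log through the same code. A hypothetical sketch: `KappaProcessor` and its normalisation rule are illustrative, and views are plain callbacks standing in for writes to search, NoSQL and SQL stores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

// Hypothetical Kappa-style processor: one code path, fanned out to many views.
class KappaProcessor {
    private final List<BiConsumer<String, String>> views = new ArrayList<>();

    void addView(BiConsumer<String, String> view) { views.add(view); }

    // Apply the single transformation, then send the result to every registered view.
    void process(String key, String rawValue) {
        String normalised = rawValue.trim().toLowerCase(Locale.ROOT);  // business logic, coded once
        for (BiConsumer<String, String> view : views) view.accept(key, normalised);
    }

    // Reprocessing replays the log through the same process() method.
    void replay(List<String[]> log) {
        for (String[] record : log) process(record[0], record[1]);
    }
}
```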


Operational/Analytic Bridge

(Diagram: clients use an operational layer that writes all your data through to a stream; a stream processor maintains search, SQL and NoSQL views that other clients query.)


Operational/Analytic Bridge

(Diagram: clients use Coherence as the operational layer, which writes through to Kafka or RabbitMQ; Samza maintains views in Hadoop, Oracle, Cassandra/MongoDB, …)

- Adds the coordination layer needed for collaborative updates.
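Coordinating collaborative updates means concurrent writers to the same key must not silently overwrite each other. In Coherence this is typically done with an entry processor, which runs atomically against the entry on its owning member. The sketch approximates that with `ConcurrentHashMap.compute` (atomic per key) and a version check, and publishes only committed changes downstream; `CoordinatedStore`, `Versioned` and the version rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical versioned value so concurrent writers can detect conflicting updates.
final class Versioned {
    final long version;
    final String value;
    Versioned(long version, String value) { this.version = version; this.value = value; }
}

// Hypothetical coordination layer: atomic per-key compare-and-set, then publish the committed change.
class CoordinatedStore {
    private final ConcurrentHashMap<String, Versioned> entries = new ConcurrentHashMap<>();
    final List<String> publishedEvents = Collections.synchronizedList(new ArrayList<>());

    /** Commits newValue only if the caller saw the current version; returns true if committed. */
    boolean update(String key, long expectedVersion, String newValue) {
        boolean[] committed = {false};
        // compute() runs atomically for this key, loosely like an entry processor on the owning member.
        entries.compute(key, (k, current) -> {
            long currentVersion = (current == null) ? 0 : current.version;
            if (currentVersion != expectedVersion) {
                return current;                        // stale write: leave the entry unchanged
            }
            committed[0] = true;
            publishedEvents.add(k + "=" + newValue);   // publish inside compute to keep per-key commit order
            return new Versioned(currentVersion + 1, newValue);
        });
        return committed[0];
    }

    long version(String key) {
        Versioned v = entries.get(key);
        return (v == null) ? 0 : v.version;
    }
}
```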


Nice Stuff

•  Scale-by-Sharding at the front, Scale-by-Replication at the back

•  Some “normalisation” at the front; fully denormalised at the back.

•  Rewind (replaying the stream) is used to recreate ‘views’.
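The front/back split can be sketched in two functions: at the front, each key lives in exactly one partition chosen by hashing (scale by sharding); at the back, each view is a full replica derived from the whole log, so a new or damaged view is recreated by rewinding to the start and replaying. `FrontAndBack`, the partition count and the `[key, value]` log format are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sharding at the front and rebuild-by-rewind at the back.
class FrontAndBack {
    // Front: scale by sharding. The same key always maps to the same partition.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    // Back: scale by replication. Rewind to offset 0 and replay every change into a fresh view.
    static Map<String, String> rebuildView(List<String[]> log) {
        Map<String, String> view = new HashMap<>();
        for (String[] change : log) {
            view.put(change[0], change[1]);   // later changes overwrite earlier ones
        }
        return view;
    }
}
```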


So

•  New Coherence features should make TB+ generally viable

•  Sensible caching/processing layer over a simpler store

•  NoSQL can provide a sensible interim backing store for larger datasets

•  Forms a great write-through layer atop a streaming architecture (Op/Analytic Bridge)


Thanks!