
Coherence & Big Data

Ben Stopford

Can you do ‘Big Data’ in Coherence?

Maybe?!?!?!

•  Problem: the cost of memory (the ~6x storage ratio)
   –  Elastic Data (disk or RAM)
   –  Keep the number of indexes small
   –  Off-heap indexes (coming)

•  Problem: getting your (big) data loaded (see the batched-load sketch below)
   –  Recoverable caching
   –  Use another distributed backing store
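A minimal sketch of the batched-load idea, assuming a generic in-memory source; the cache name and batch size are illustrative, not from the talk. Entry-by-entry puts pay one network round trip each, so batching via putAll is the usual first fix:

    import com.tangosol.net.CacheFactory;
    import com.tangosol.net.NamedCache;

    import java.util.HashMap;
    import java.util.Map;

    public class BulkLoader {

        private static final int BATCH_SIZE = 1_000; // tune to entry size

        // 'records' stands in for whatever the real source is (files, a DB dump...)
        public static void load(Map<String, String> records) {
            NamedCache cache = CacheFactory.getCache("big-data"); // hypothetical name
            Map<String, String> batch = new HashMap<>();

            for (Map.Entry<String, String> e : records.entrySet()) {
                batch.put(e.getKey(), e.getValue());
                if (batch.size() >= BATCH_SIZE) {
                    cache.putAll(batch); // one round trip per batch, not per entry
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                cache.putAll(batch);
            }
        }
    }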

But

•  Elastic Data & recoverable caching are separate features (there is a plan to unify them)
   –  Recoverable caching over Elastic Data is IO-intensive (two distinct copies)
   –  2x disk footprint
   –  No compression
   –  Rebalance time
   –  Memory ratio (the 6x) >>> low-TB zone

Big Data Bandwagon

[Diagram: Coherence on top of a NoSQL backing layer — recent data in cache, fast data load, lower-cost full history, write-through.]

Hadoop

•  Backing with HDFS
   –  Big files (~GBs)
   –  No random writes (OK if you journal writes; see the sequence-file sketch below)
   –  Use sequence files
   –  Hard to manage the active set

•  Backing with HBase (the better option)
   –  Fast writes (LSM)
   –  Supports predicate pushdown
   –  More complex setup (ZooKeeper, NameNode, etc.)
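To make the "journal writes" point concrete: HDFS has no random writes, so updates are appended to a sequence file and compacted later. A minimal sketch using Hadoop's SequenceFile writer; the path and the Text/BytesWritable key and value types are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class HdfsJournal implements AutoCloseable {

        private final SequenceFile.Writer writer;

        public HdfsJournal(String pathOnHdfs) throws Exception {
            Configuration conf = new Configuration(); // reads core-site.xml / hdfs-site.xml
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path(pathOnHdfs)),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class));
        }

        // Journal one write: the key plus the serialised value, appended in order.
        public void journal(String key, byte[] value) throws Exception {
            writer.append(new Text(key), new BytesWritable(value));
        }

        @Override
        public void close() throws Exception {
            writer.close();
        }
    }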

NoSQL Backing

•  Cassandra — low memory footprint, write-optimised
•  MongoDB — read/memory-optimised (3.0 was a big improvement); rich queries
•  Oracle NoSQL — KV with secondary indexes & range predicates
•  Riak — KV, but can scan with the MR API; eventual consistency may not suit
•  Couchbase — heavily memory-optimised; fast, but too similar to Coherence to be a good fit

Streams

Message stream products: RabbitMQ, Kafka, Aeron

•  A great complement for Coherence
•  Write through to a topic: immutable state (see the CacheStore sketch below)
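A minimal sketch of what "write through to a topic" can look like: a Coherence CacheStore that publishes every cache write to Kafka, so the topic accumulates an immutable record of state changes. The topic name, broker address and string serialisation are assumptions, not from the talk:

    import com.tangosol.net.cache.CacheStore;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;

    public class KafkaCacheStore implements CacheStore {

        private static final String TOPIC = "cache-events"; // hypothetical topic

        private final KafkaProducer<String, String> producer;

        public KafkaCacheStore() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            producer = new KafkaProducer<>(props);
        }

        @Override
        public void store(Object key, Object value) {
            producer.send(new ProducerRecord<>(TOPIC, key.toString(), value.toString()));
        }

        @Override
        public void storeAll(Map entries) {
            for (Object o : entries.entrySet()) {
                Map.Entry e = (Map.Entry) o;
                store(e.getKey(), e.getValue());
            }
        }

        @Override
        public void erase(Object key) {
            producer.send(new ProducerRecord<>(TOPIC, key.toString(), null)); // tombstone
        }

        @Override
        public void eraseAll(Collection keys) {
            for (Object key : keys) {
                erase(key);
            }
        }

        @Override
        public Object load(Object key) {
            return null; // a topic is not key-addressable; reads come from the cache
        }

        @Override
        public Map loadAll(Collection keys) {
            return Collections.emptyMap();
        }
    }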

Messaging as a Backing Store

[Diagram: Coherence as a cache of recent data with a rich query API, backed by an event stream as the system of record; inbound stream processors alongside direct reads & writes; async views (relational, raw, streaming, historic) serving sync and async clients; other data-centre DBs downstream.]

Hang Tertiary ‘Views’

•  Search: Elasticsearch, Solr
•  Graph: Neo4j, OrientDB
•  Relational: Oracle, Postgres, Teradata
•  Analytic: Exadata, Teradata, Greenplum
•  Document archive: MongoDB
•  Hadoop: HBase, HDFS, Parquet, Avro, Protocol Buffers, etc.

•  Complexity increases with the Polyglot Persistence pattern.

•  Replica instantiation is good.

Stream Processors

•  Storm
•  Samza
•  Spark Streaming (micro-batch)
•  Libraries such as Esper
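For flavour, a minimal sketch of the classic Samza programming model: one process() callback per message, here simply transforming an input stream into an output stream. The system and topic names are hypothetical:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    public class UppercaseTask implements StreamTask {

        // Output stream: hypothetical Kafka topic name.
        private static final SystemStream OUTPUT = new SystemStream("kafka", "output-topic");

        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            String message = (String) envelope.getMessage();
            collector.send(new OutgoingMessageEnvelope(OUTPUT, message.toUpperCase()));
        }
    }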

Lambda Architecture

[Diagram: all your data flows into both a fast stream layer and a batch layer; a serving layer answers queries from each path.]

Lambda Architecture (example stack)

[Diagram: the same layout with Kafka + Storm as the stream layer, Hadoop as the batch layer, and Cassandra as the serving layer.]

-  A cool architecture for use cases that cannot be handled in a single pass.
-  General applicability is limited by double-querying and double-coding.

Kappa Architecture

[Diagram: all your data lands in a stream; a stream processor populates views (search, NoSQL, SQL) that clients query.]

-  A simpler choice where stream processors can handle the full problem set.

[Example stack: Kafka as the stream, Samza or Storm as the stream processor, feeding Elasticsearch, Cassandra, and Oracle views.]

Operational / Analytic Bridge

[Diagram: an operational layer sits in front of a stream holding all your data; a stream processor builds views (search, SQL, NoSQL) that clients query.]

Operational / Analytic Bridge (example stack)

[Diagram: Coherence as the operational layer; Kafka or RabbitMQ as the stream; Samza as the stream processor; views in Hadoop, Oracle, Cassandra, MongoDB, …]

-  Adds the coordination layer needed for collaborative updates (see the entry-processor sketch below).
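One way Coherence supplies that coordination is entry processors, which execute atomically against the entry's owning member, giving a check-and-set that a bare stream of writes cannot. A minimal sketch (a real deployment would also make the class serialisable, e.g. via POF); invoked as cache.invoke(key, new IncrementIfPresent()):

    import com.tangosol.util.InvocableMap;
    import com.tangosol.util.processor.AbstractProcessor;

    public class IncrementIfPresent extends AbstractProcessor {

        @Override
        public Object process(InvocableMap.Entry entry) {
            if (!entry.isPresent()) {
                return null; // nothing to coordinate on
            }
            Integer current = (Integer) entry.getValue();
            entry.setValue(current + 1); // read-modify-write without races
            return current + 1;
        }
    }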

Nice Stuff

•  Scale-by-sharding at the front, scale-by-replication at the back

•  Some “normalisation” at the front; fully denormalised at the back

•  Rewind is used to recreate ‘views’ (see the replay sketch below)
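To make the rewind point concrete: with Kafka as the stream, recreating a view means seeking the topic back to the beginning and replaying every event into a fresh store. A minimal sketch; the topic name, single partition and broker address are assumptions:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ViewRebuilder {

        public static void rebuild() {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition p = new TopicPartition("cache-events", 0); // hypothetical topic
                consumer.assign(Collections.singletonList(p));
                consumer.seekToBeginning(Collections.singletonList(p)); // the rewind

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) {
                        break; // caught up; a live rebuilder would keep consuming
                    }
                    for (ConsumerRecord<String, String> record : records) {
                        applyToView(record.key(), record.value());
                    }
                }
            }
        }

        private static void applyToView(String key, String value) {
            // Write into whichever view store is being rebuilt (search, SQL, NoSQL...).
        }
    }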

So

•  New Coherence features should make TB+ datasets generally viable

•  Coherence is a sensible caching/processing layer over a simpler store

•  NoSQL can provide a sensible interim backing store for larger datasets

•  Coherence forms a great write-through layer atop a streaming architecture (the Operational/Analytic Bridge)

Thanks!
