Intuitions for Scaling Data-Centric Architectures
Ben Stopford, Confluent Inc


Intuitions for Scale

“Intuition does not come to the unprepared mind.” (A.E.)

Locality & Sequential Addressing

Computers work best with sequential workloads

[Figure: the caching layers between disk and CPU: disk buffer, page cache, L3 cache, L2 cache, L1 cache]

Pre-fetch is your friend: each layer can read ahead when access is sequential

Random vs. Sequential Addressing

Random: ~300 reads/sec    Sequential: ~200 MB/s

e.g. for 100-byte rows, sequential access is ~7000x faster
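The ~7000x figure is back-of-envelope arithmetic. A minimal sketch, assuming each random read returns exactly one 100-byte row and taking 1 MB as 10^6 bytes:

```python
# Figures from the slide; one row per random read is an assumption.
RANDOM_READS_PER_SEC = 300          # random seeks per second
SEQUENTIAL_BYTES_PER_SEC = 200e6    # 200 MB/s streaming throughput
ROW_SIZE_BYTES = 100                # 100-byte rows

def rows_per_sec_sequential(throughput, row_size):
    """Rows delivered per second when rows are read back to back."""
    return throughput / row_size

sequential_rows = rows_per_sec_sequential(SEQUENTIAL_BYTES_PER_SEC, ROW_SIZE_BYTES)
speedup = sequential_rows / RANDOM_READS_PER_SEC   # ~6,667x, i.e. roughly 7000x

print(f"sequential: {sequential_rows:,.0f} rows/s")
print(f"random:     {RANDOM_READS_PER_SEC:,} rows/s")
print(f"speedup:    ~{speedup:,.0f}x")
```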

This isn’t just Disk

[Figure: the same effect through the L3, L2 and L1 CPU caches]

Random RAM ~ Sequential Disk

10-100x

Files

We can write sequentially to a file quickly

Reading Efficiently

Scan

Position & Scan (pages)

Avoid Random Reads
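“Position & scan” means paying for one positioning step, then reading a contiguous run of pages. A minimal sketch, assuming a hypothetical file of fixed-width records stored in id order so that a record's offset can be computed directly:

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record: 4-byte id + 12-byte payload.
RECORD = struct.Struct("<I12s")

def write_records(path, n):
    """Write n records sequentially, in id order."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(i, f"row-{i}".encode()))

def position_and_scan(path, start_id, count):
    """Seek once to the first record, then read the range with one sequential read."""
    with open(path, "rb") as f:
        f.seek(start_id * RECORD.size)        # one positioning step
        buf = f.read(count * RECORD.size)     # one sequential scan
    return [RECORD.unpack_from(buf, off) for off in range(0, len(buf), RECORD.size)]

path = os.path.join(tempfile.mkdtemp(), "records.dat")
write_records(path, 1000)
rows = position_and_scan(path, 500, 3)
print([(rid, payload.rstrip(b"\0").decode()) for rid, payload in rows])
```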

Writing Tradeoffs

Append-only journal (sequential IO): a new version v2 is written after v1

Update in place, ordered file (random IO): v2 overwrites v1 at its position
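The two write strategies can be contrasted in a small in-memory sketch. Python lists stand in for files here (an assumption for illustration): the journal only ever appends, while the ordered file must first locate a key's position before writing.

```python
import bisect

class Journal:
    """Append-only: every write goes to the end (sequential IO)."""
    def __init__(self):
        self.entries = []                   # stands in for a file opened in append mode

    def write(self, key, value):
        self.entries.append((key, value))   # old versions stay where they are

    def read(self, key):
        # The latest version wins, so scan backwards from the tail.
        for k, v in reversed(self.entries):
            if k == key:
                return v
        return None

class OrderedFile:
    """Update in place: keys kept sorted, writes land at their position (random IO)."""
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, key, value):
        i = bisect.bisect_left(self.keys, key)   # find the key's position first
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value                # overwrite v1 with v2 in place
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def read(self, key):
        i = bisect.bisect_left(self.keys, key)
        return self.values[i] if i < len(self.keys) and self.keys[i] == key else None

journal, ordered = Journal(), OrderedFile()
for store in (journal, ordered):
    store.write("k", "v1")
    store.write("k", "v2")
print(journal.entries)               # both versions retained
print(ordered.keys, ordered.values)  # only v2 remains
```

The journal makes writes cheap and reads of the latest value more expensive; the ordered file does the opposite.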

Supporting Lookups

Add Indexes for Selectivity

[Figure: a sorted index of names pointing into a heap file]

Goodbye Sequential Write Performance

[Figure: the heap file is still appended with sequential IO, but each insert into the sorted index is random IO]
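A minimal sketch of why the index costs sequential write performance, assuming in-memory lists stand in for the heap file and the index (all names hypothetical): rows always go to the end of the heap, but each index entry lands wherever its key sorts.

```python
import bisect

heap_file = []   # rows appended in arrival order (sequential IO)
index = []       # sorted (key, row_position) pairs

def insert(name, row):
    position = len(heap_file)
    heap_file.append(row)                        # always at the end
    slot = bisect.bisect_left(index, (name,))    # where the key belongs
    index.insert(slot, (name, position))         # lands mid-structure
    return slot

def lookup(name):
    slot = bisect.bisect_left(index, (name,))
    if slot < len(index) and index[slot][0] == name:
        return heap_file[index[slot][1]]
    return None

slots = [insert(n, {"name": n}) for n in ["mike", "bob", "vince", "dave", "fred"]]
print(slots)   # index insert positions jump around: random writes
```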

Option A: Put Index in Memory

[Figure: index held in RAM, data on disk]
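A minimal sketch of Option A, assuming an append-only data file plus a key-to-offset map kept entirely in RAM. This is roughly the approach of Bitcask, a storage backend used by Riak; the class and method names below are hypothetical, an `io.BytesIO` stands in for the file, and keys and values are assumed to contain no tabs or newlines.

```python
import io

class HashIndexedLog:
    """Data file on 'disk' (append-only); key -> byte offset map in RAM."""
    def __init__(self):
        self.data = io.BytesIO()   # stands in for an append-only file on disk
        self.offsets = {}          # the in-memory index

    def put(self, key: str, value: str) -> None:
        record = f"{key}\t{value}\n".encode()
        self.data.seek(0, io.SEEK_END)
        self.offsets[key] = self.data.tell()   # where this version starts
        self.data.write(record)                 # sequential append

    def get(self, key: str):
        offset = self.offsets.get(key)          # RAM lookup, no disk IO
        if offset is None:
            return None
        self.data.seek(offset)                  # at most one positioned read
        _, value = self.data.readline().decode().rstrip("\n").split("\t", 1)
        return value

log = HashIndexedLog()
log.put("bob", "1")
log.put("dave", "2")
log.put("bob", "3")
print(log.get("bob"), log.get("dave"), log.get("fred"))
```

Writes stay sequential and reads cost at most one seek; the trade-off is that every key must fit in memory.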

Option B: Use a chronology of small index files

Writes are batched up, sorted, and written to disk as a small index file, in front of the older files

…with tricks to optimise out the need for random IO

[Figure: file metadata & bloom filter held in RAM, index files on disk]
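A bloom filter is one of the tricks that avoids random IO: it can say “definitely not in this file”, so only files that might hold the key are opened. A minimal sketch, assuming per-file filters and hypothetical parameters (1024 bits, 3 hash positions derived by salting SHA-256):

```python
import hashlib

class BloomFilter:
    """Answers 'definitely not present' or 'possibly present' from a small bit array."""
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive k bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# One filter per on-disk index file (hypothetical contents).
files = {"file-1": ["bob", "dave"], "file-2": ["fred", "mike"]}
filters = {}
for name, keys in files.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    filters[name] = bf

# Only files whose filter says "maybe" need to be read.
candidates = [name for name, bf in filters.items() if bf.might_contain("fred")]
print(candidates)
```

False positives are possible (an extra file read), false negatives are not.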

Log Structured Merge Trees

• A collection of small, immutable indexes

• Append only, de-duplicate by merging files

• Small, memory-resident index structures increase read performance

Shifts the problem of random access from a “write” concern to a “read” concern
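A minimal LSM sketch, assuming an in-memory dict as the memtable and Python lists as the immutable sorted files. Real LSM stores also use a write-ahead log, tombstones for deletes and per-file bloom filters; those are omitted here, and the names are hypothetical.

```python
import bisect

class LSMTree:
    """A memtable plus a chronology of small, immutable, sorted runs."""
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}                 # recent writes, in RAM
        self.runs = []                     # sorted lists of (key, value), oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Batch up, sort, write out as a new small immutable file.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):    # newest file first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # De-duplicate by merging files; later runs overwrite earlier ones.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())]

db = LSMTree(memtable_limit=2)
db.put("bob", 1)
db.put("dave", 2)     # triggers a flush: first run
db.put("bob", 3)
db.put("fred", 4)     # second run, holding a newer "bob"
print(len(db.runs), db.get("bob"))
db.compact()
print(len(db.runs), db.get("bob"))
```

Writes never touch existing files; the cost moves to reads, which may consult several runs until compaction merges them.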

Option C: Brute Force

[Figure: a table with columns A, B, C, each stored in its own file (A1..A4, B1..B4, C1..C4)]

‘Column per file’ arrangement, with rows in the same order in each file

Option C: Columnar

[Figure: compressed columns A, B, C rejoined row by row with a merge join]

Brute Force, by Column

• Less IO: by column, compressed
• Held in row order => merge joins via rowid
• Predicates can operate on compressed data
• Late materialisation
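The bullets above can be sketched in a few lines. This assumes a hypothetical three-column table held as separate Python lists in the same row order, with one column run-length encoded (RLE is one common columnar encoding, used here as an example): the predicate runs once per compressed run, and other columns are fetched by rowid only for matching rows.

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encode a column into [value, run_length] pairs."""
    return [[v, len(list(g))] for v, g in groupby(values)]

# Hypothetical table, stored column per 'file', every column in the same row order.
rows = [("uk", "bob", 10), ("uk", "dave", 20), ("us", "fred", 30), ("us", "mike", 40)]
country = rle_encode([r[0] for r in rows])   # repeated values compress well
name = [r[1] for r in rows]
amount = [r[2] for r in rows]

def rowids_where_equal(rle_column, target):
    """Evaluate the predicate on compressed data: one comparison per run."""
    rowids, start = [], 0
    for value, length in rle_column:
        if value == target:
            rowids.extend(range(start, start + length))
        start += length
    return rowids

# Late materialisation: filter first, then stitch other columns together by rowid.
matches = rowids_where_equal(country, "us")
result = [(name[i], amount[i]) for i in matches]
print(country)
print(result)
```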

Many of the most scalable technologies play to one of these core efficiencies

• Index in RAM, data on disk: Riak, Mongo etc
• Append-only journal: Kafka (Queues are Databases - 1995 Jim Gray)
• LSM: HBase, Cassandra, RocksDB etc
• Columnar: Redshift etc, Parquet (Hadoop)

Parallelism

Partitioning & Replication

Partitioning - KV

K-V stores: single-endpoint query routing (each key lives on exactly one partition)
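A minimal sketch of single-endpoint routing, assuming hypothetical partition names and simple hash-modulo placement. A stable hash (SHA-256) is used because Python's built-in `hash()` for strings is salted per process.

```python
import hashlib

PARTITIONS = ["node-0", "node-1", "node-2"]   # hypothetical endpoints

def route(key: str) -> str:
    """Map a key to exactly one partition with a stable hash."""
    h = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    return PARTITIONS[h % len(PARTITIONS)]

stores = {p: {} for p in PARTITIONS}

def put(key, value):
    stores[route(key)][key] = value

def get(key):
    return stores[route(key)].get(key)   # one endpoint consulted, never all

put("bob", 1)
put("dave", 2)
print(route("bob"), get("bob"))
```

Plain modulo placement moves most keys when the partition count changes; consistent hashing is the usual alternative when partitions are added or removed.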

Partitioning - Batch

Divide and conquer
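Batch partitioning works by divide and conquer: each partition is processed independently and the partial results are combined. A minimal sketch, assuming a hypothetical dataset split three ways, with threads standing in for separate machines:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dataset split into partitions; each could live on a different machine.
partitions = [list(range(0, 100)), list(range(100, 200)), list(range(200, 300))]

def partial_sum(partition):
    """Independent work on one partition: sum the even values."""
    return sum(x for x in partition if x % 2 == 0)

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partials = list(pool.map(partial_sum, partitions))

total = sum(partials)   # combine step
print(partials, total)
```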

Partitioning: Concurrency Limits

Use of secondary indexes can limit concurrency at scale: a query on a secondary attribute may have to be sent to every partition
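A minimal sketch of the concurrency limit, assuming rows partitioned by a hypothetical `user_id` and a secondary index on city kept locally by each partition. A primary-key lookup touches one partition; a lookup by city must ask all of them, so every such query consumes capacity on every node.

```python
from collections import defaultdict

NUM_PARTITIONS = 4

class Partition:
    def __init__(self):
        self.rows = {}                     # user_id -> city
        self.by_city = defaultdict(set)    # local secondary index: city -> user_ids

partitions = [Partition() for _ in range(NUM_PARTITIONS)]

def insert(user_id, city):
    p = partitions[user_id % NUM_PARTITIONS]
    p.rows[user_id] = city
    p.by_city[city].add(user_id)

def get_by_id(user_id):
    """Primary-key lookup: routed to exactly one partition."""
    touched = [user_id % NUM_PARTITIONS]
    return partitions[touched[0]].rows.get(user_id), touched

def find_by_city(city):
    """Secondary-index lookup: every partition must be asked (scatter-gather)."""
    touched = list(range(NUM_PARTITIONS))
    hits = set()
    for i in touched:
        hits |= partitions[i].by_city.get(city, set())
    return sorted(hits), touched

for uid, city in [(1, "london"), (2, "paris"), (5, "london"), (6, "rome")]:
    insert(uid, city)

print(get_by_id(5))            # one partition touched
print(find_by_city("london"))  # all partitions touched
```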

Replication


• Replication provides one route out of this.

• Replicas isolate load -> scales out concurrency for general workloads.

• Obviously provides redundancy etc too.

• If async, trades off against consistency (CAP)
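A minimal sketch of the async trade-off, assuming a hypothetical leader that acknowledges writes before shipping them and a replica that applies changes in the background. A read served by the replica in between sees stale data.

```python
from collections import deque

class Leader:
    def __init__(self):
        self.data = {}
        self.outbox = deque()        # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value       # acknowledged before replication (async)
        self.outbox.append((key, value))

class Replica:
    def __init__(self):
        self.data = {}

    def apply_pending(self, leader, max_changes):
        # Replication runs in the background and may lag behind.
        for _ in range(min(max_changes, len(leader.outbox))):
            key, value = leader.outbox.popleft()
            self.data[key] = value

leader, replica = Leader(), Replica()
leader.write("balance", 100)
leader.write("balance", 50)
replica.apply_pending(leader, max_changes=1)
stale = replica.data["balance"]      # read served by the lagging replica
replica.apply_pending(leader, max_changes=10)
print(stale, replica.data["balance"])
```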

Atomicity & Ordering

These can be expensive

Solution: avoid, isolate or embrace disorder (Bloom etc)
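One way to embrace disorder is to make updates order-insensitive. Bloom is a language built on that idea; the sketch below swaps in a different, simpler technique, a grow-only counter (a G-Counter CRDT), whose merge is commutative, associative and idempotent. The replica states are hypothetical; merging them in every possible order gives the same result, so no coordination on ordering is needed.

```python
from itertools import permutations

def merge(a: dict, b: dict) -> dict:
    """G-Counter merge: element-wise max per node."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

def value(counter: dict) -> int:
    return sum(counter.values())

# Each node counts its own increments locally; states arrive in any order.
replica_states = [{"n1": 3}, {"n2": 5}, {"n1": 1, "n3": 2}]

results = set()
for order in permutations(replica_states):
    state = {}
    for s in order:
        state = merge(state, s)
    results.add(value(state))

print(results)   # a single value: ordering does not matter
```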

Atomic (mutable) vs. immutable

Circling Synchronous, Mutable State

Trapped in the persist & query pattern… in a fully ACID world

Separating Paradigms - CQRS

[Figure: the client sends commands to one DB and queries to another; data is denormalised / precomputed from the first into the second]
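A minimal CQRS sketch, assuming hypothetical class names and a synchronous callback between the two sides (in practice the hand-off is often asynchronous). The command side validates and stores normalised facts; the query side keeps a precomputed, denormalised answer so reads need no joins or scans.

```python
from collections import defaultdict

class CommandSide:
    """Write model: validates commands and records normalised facts."""
    def __init__(self, on_event):
        self.orders = []             # normalised store
        self.on_event = on_event     # keeps the read model up to date

    def place_order(self, customer, item, qty):
        if qty <= 0:
            raise ValueError("quantity must be positive")
        event = {"customer": customer, "item": item, "qty": qty}
        self.orders.append(event)
        self.on_event(event)

class QuerySide:
    """Read model: denormalised and precomputed for the query it serves."""
    def __init__(self):
        self.items_per_customer = defaultdict(int)

    def apply(self, event):
        self.items_per_customer[event["customer"]] += event["qty"]

    def total_items(self, customer):
        return self.items_per_customer.get(customer, 0)

queries = QuerySide()
commands = CommandSide(on_event=queries.apply)
commands.place_order("bob", "book", 2)
commands.place_order("bob", "pen", 3)
print(queries.total_items("bob"))
```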

Druid

[Figure: a realtime node and a historical node; queries hit both]
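A minimal illustration of “query hits both”, not Druid's API: it assumes hypothetical per-minute counts, with older minutes held in immutable historical segments and recent minutes on a realtime node, split at a hand-off time. A query over a time range asks each side for its part and merges the answers.

```python
# Hypothetical per-minute event counts.
historical = {0: 12, 1: 7, 2: 9}   # older, immutable segments
realtime = {3: 4, 4: 6}            # recent data, still being ingested
HANDOFF = 3                        # minutes before this are served historically

def query_count(start, end):
    """Sum events in [start, end) by asking both sides and merging."""
    from_history = sum(v for t, v in historical.items() if start <= t < min(end, HANDOFF))
    from_realtime = sum(v for t, v in realtime.items() if max(start, HANDOFF) <= t < end)
    return from_history + from_realtime

print(query_count(1, 5))
```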

Operational / Analytic Bridge

[Figure: clients use mutable SQL and NoSQL stores; data is streamed out and denormalised into immutable views such as search]

Lambda Architecture: Separating Stream & Batch

[Figure: all your data flows into a batch layer and a fast stream layer; queries are answered from both the serving layer and the stream layer]
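A minimal Lambda sketch, assuming hypothetical page-view events: the batch layer recomputes its view from all data, the stream layer keeps increments since the last batch run, and the serving side merges the two. After the next batch run, the stream view for absorbed data is discarded.

```python
# All your data: an immutable, append-only master dataset.
master = [("home", 1), ("about", 1), ("home", 1)]

def batch_view(events):
    """Batch layer: recompute the view from scratch over all data."""
    view = {}
    for page, n in events:
        view[page] = view.get(page, 0) + n
    return view

batch = batch_view(master)   # result of the last batch run
speed = {}                   # stream layer: increments since that run

def on_new_event(page, n):
    master.append((page, n))                  # seen by the next batch run
    speed[page] = speed.get(page, 0) + n      # visible immediately

def query(page):
    """Merge the batch view and the real-time view."""
    return batch.get(page, 0) + speed.get(page, 0)

on_new_event("home", 1)
print(query("home"))

# Next batch run absorbs the new data; the stream view is reset.
batch, speed = batch_view(master), {}
print(query("home"))
```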

Stream Data Platforms

[Figure: all your data flows through Kafka, with a stream processor, into views (search, columnar, Hadoop) that serve clients]

Isolate consistency concerns, leverage in-flight data, promote immutable replicas
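A minimal sketch of views built from a shared log. A Python list stands in for a Kafka topic partition (this is not the Kafka client API), and the view names and projections are hypothetical. Each view tracks its own offset, so a view added later can rebuild itself by replaying from offset 0.

```python
class Log:
    """Append-only, ordered log standing in for a topic partition."""
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1     # offset of the new record

    def read_from(self, offset):
        return self.records[offset:]

class View:
    """A consumer that builds its own read-optimised copy of the data."""
    def __init__(self, log, project):
        self.log, self.project = log, project
        self.offset = 0                  # each view tracks its own position
        self.state = {}

    def poll(self):
        for record in self.log.read_from(self.offset):
            self.project(self.state, record)
            self.offset += 1

def by_user(state, r):                   # lookup-style view
    state[r["user"]] = r["email"]

def by_domain(state, r):                 # analytic-style view: events per domain
    domain = r["email"].split("@")[1]
    state[domain] = state.get(domain, 0) + 1

log = Log()
views = [View(log, by_user), View(log, by_domain)]
log.append({"user": "bob", "email": "bob@example.com"})
log.append({"user": "dave", "email": "dave@example.org"})
for v in views:
    v.poll()

late = View(log, by_user)   # added later: replays history from offset 0
late.poll()
print(views[0].state, views[1].state, late.state == views[0].state)
```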

[Figure: Sys 1, Sys 2 and Sys 3 connected by a shared stream]

Things we Like

Treating state as an immutable chronology

Listening and reacting to things as they are written

Replaying things that happened before, to regenerate state and enrich views
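A minimal sketch of regenerating state by replay, assuming a hypothetical, time-ordered list of immutable account events. Folding the whole history gives current state; stopping early gives the state as it was at an earlier time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    time: int
    account: str
    amount: int   # positive = deposit, negative = withdrawal

# The immutable chronology: events are only ever appended, in time order.
history = [
    Event(1, "bob", 100),
    Event(2, "dave", 50),
    Event(3, "bob", -30),
    Event(4, "dave", 25),
]

def regenerate(events, as_of=None):
    """Fold the history into balances, optionally as of an earlier time."""
    balances = {}
    for e in events:
        if as_of is not None and e.time > as_of:
            break   # relies on events being in time order
        balances[e.account] = balances.get(e.account, 0) + e.amount
    return balances

print(regenerate(history))            # current state
print(regenerate(history, as_of=2))   # state as it was at time 2
```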

Avoiding (or isolating) the need to mutate

Read-optimising the immutable: denormalise

Primitive operations for Shards and Replicas (sync/async)

Being able to reason about time in an asynchronous world

Blending the utility of different tools in a single data platform


Thanks

slides available @ benstopford.com