
Spark Streaming Recipes and "Exactly Once" Semantics Revised


Page 1: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Spark Streaming: Recipes and "Exactly Once" Semantics Revised

Page 2: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Appsflyer: Basic Flow

[Diagram: the basic flow between publisher and advertiser: click → install]

Page 3: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Appsflyer as a Marketing Platform

● Attribution
● Statistics: clicks, installs, in-app events, launches, uninstalls, etc.
● Lifetime value
● Retargeting
● Fraud detection
● Prediction
● A/B testing
● etc.

Page 4: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Appsflyer Technology

● ~7B events / day
● Hundreds of machines in Amazon
● Tens of micro-services

[Diagram: micro-services communicating over Apache Kafka, with data flowing into Amazon S3, MongoDB, Redshift, and Druid]

Page 5: Spark Streaming Recipes and "Exactly Once" Semantics Revised

What is stream processing?

Page 6: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Stream Processing

Minimize latency between data ingestion and insights

Usages

● Real-time dashboard
● Fraud prevention
● Ad bidding
● etc.

Page 7: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Stream Processing Frameworks: Key Differences

● Latency
● Windowing support
● Delivery semantics
● State management
● API ease of use
● Programming language support
● Community support
● etc.

Page 8: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Apache Spark

Spark Driver

val textFile = sc.textFile("hdfs://...")
val counts = textFile
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")

[Diagram: the driver talks to the cluster manager, which allocates worker nodes; each worker node runs an executor that executes tasks]

Page 9: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Apache Spark

[Diagram: data is read from the external world into RDDs, passes through a chain of transformations, and is written back to the external world by an action]

Page 10: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Streaming in Spark

Advantages

● Reuse existing infra
● Rich API
● Straightforward windowing
● It's easier to implement "exactly once"

Disadvantages

● Latency

[Diagram: the input stream is split into micro-batches, which the Spark engine turns into processed data]

Page 11: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Windowing in Spark Streaming

● Window length and sliding interval must be multiples of the batch interval
● Possible usages (see the sketch below)
  ○ Finding the top N elements during the last M period of time
  ○ Pre-aggregating data before inserting it into a DB
  ○ etc.

[Diagram: a window of the given length slides over the DStream by the sliding interval]
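A minimal sketch of such a windowed aggregation in Scala; the socket source, the per-line key, and the 10-minute / 10-second durations are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object WindowedCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("windowed-counts")
    // Batch interval of 10 seconds; the window parameters below are multiples of it.
    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical source; any DStream[String] would do.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count occurrences per key over the last 10 minutes, recomputed every 10 seconds.
    val counts = lines
      .map(line => (line, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Minutes(10), Seconds(10))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}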

Page 12: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Do we need “exactly once” semantics?

Page 13: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Data Processing Paradigms

Lambda architecture: a batch layer alongside a real-time layer (https://en.wikipedia.org/wiki/Lambda_architecture).

Kappa architecture: a real-time layer only (http://www.kappa-architecture.com/).

Page 14: Spark Streaming Recipes and "Exactly Once" Semantics Revised

How do we achieve “exactly once”?

Page 15: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Achieving “Exactly once”

Producer: doesn't duplicate messages.

Stream processor: tracks state (checkpointing); resilient components.

Consumer: reads only new messages.

"Easy" way: message deduplication based on some ID, or an idempotent output destination.

Page 16: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Stream Checkpointing

https://en.wikipedia.org/wiki/Snapshot_algorithm

Barriers are injected into the data stream.

Once an intermediate operator has seen barriers from all of its input streams, it emits a barrier on all of its outgoing streams.

Once all sink operators have seen the barrier for a snapshot, they acknowledge it, and the snapshot is considered committed.

Multiple barriers can be in flight in the stream at the same time.

Operators store their state in external storage.

On failure, every operator's state falls back to the latest complete snapshot, and the data source rewinds to the position recorded with that snapshot.


Page 17: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Micro-batch Checkpointing

[Diagram: each micro-batch goes through receive → process → state; the micro-batch is the unit of fault tolerance]

while (true) {
  // 1. receive next batch of data
  // 2. compute next stream and state
}

Page 18: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Resilience in Spark Streaming

All Spark components must be resilient!

● Driver application process
● Master process
● Worker process
● Executor process
● Receiver thread
● Worker node

[Diagram: Driver → Master → Worker Nodes; each worker node runs an executor with tasks]

Page 19: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Driver Resilience

Client mode
The driver application runs inside the "spark-submit" process; if this process dies, the entire application is killed.

Cluster mode
The driver application runs on one of the worker nodes. The "--supervise" option makes the driver restart on a different worker node.

Running through Marathon
Marathon can restart failed applications automatically.


Page 20: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Master Resilience

Single master
If the master fails, the entire application is killed.

Multi-master mode
A standby master is elected active; worker nodes automatically register with the new master. Leader election is done via ZooKeeper.


Page 21: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Worker Resilience

Worker process
When it fails, all of its child processes (driver or executor) are killed, and a new worker process is launched automatically.

Executor process
Restarted on failure by the parent worker process.

Receiver thread
Runs inside the executor process, so it fails and recovers together with the executor.

Worker node
Failure of a worker node behaves the same as killing all of its components individually.


Page 22: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Resilience doesn’t ensure “exactly once”

Page 23: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Checkpointing

Checkpointing helps recover from driver failure by storing the computation graph in a fault-tolerant place (like HDFS or S3).

What is saved as metadata
● Metadata of queued but not yet processed batches
● Stream operations (code)
● Configuration

Disadvantages
● Frequent checkpointing reduces throughput.
● As the code itself is saved, upgrading the application is impossible without removing the checkpoints.
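A minimal sketch of how such driver recovery is wired up in Scala; the checkpoint path and the body of createContext are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointedApp {
  val checkpointDir = "hdfs://.../checkpoints" // any fault-tolerant storage

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpointed-app")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define the streaming computation here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Fresh start: createContext() is invoked. After a driver failure: the context
    // (operations, configuration, queued batches) is rebuilt from the checkpoint.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}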

Page 24: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Write Ahead Log

Synchronously saves received data to fault-tolerant storage, which helps recover blocks that were received but not yet committed.

Disadvantages
● Additional storage is required.
● Reduced throughput.

[Diagram: the receiver inside the executor writes the input stream to the write-ahead log]
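The WAL is turned on through configuration rather than code; a sketch of the relevant wiring (the checkpoint path is an assumption):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-app")
  // Enable the write-ahead log for receiver-based input streams.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")
val ssc = new StreamingContext(conf, Seconds(10))
// The log is written under the checkpoint directory, so it must point
// at fault-tolerant storage (HDFS, S3).
ssc.checkpoint("hdfs://.../checkpoints")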

Page 25: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Problems with Checkpointing and WAL

Data can be lost even when using checkpointing: batches held in memory are lost on driver failure.

Checkpointing and the WAL prevent data loss, but do not provide "exactly once" semantics.

If the receiver fails before updating the offsets in ZooKeeper, we are in trouble: the same data will be re-read both from Kafka and from the WAL.

Still not exactly once!

Page 26: Spark Streaming Recipes and "Exactly Once" Semantics Revised

The Solution

Don't use receivers; read directly from the input stream instead. The driver instructs the executors which range to read from the stream (the stream must be rewindable), and the read range is attached to the batch itself.

Example (Kafka direct stream):

1. The driver (StreamingContext) periodically queries the latest offsets for the topics & partitions.
2. It calculates the offset ranges for the next batch.
3. It schedules the next micro-batch job.
4. The executors consume the data for the calculated offset ranges.

Page 27: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Example #1

Page 28: Spark Streaming Recipes and "Exactly Once" Semantics Revised

The Problem

● Event counting
● Grouping by different sets of dimensions
● A pre-aggregation layer that reduces the load on the DB during spikes

DB contents:

app_id        event_name        country  count
com.app.bla   FIRST_LAUNCH      US       152
com.app.bla   purchase          IL       10
com.app.jo    custom_inapp_20   US       45

Page 29: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Transactional Events Aggregator

Based on an SQL database: Kafka partition offsets are stored in the DB, and event counters are incremented in a transaction, based on the current and stored offsets.

1. (Driver) Read the last Kafka partitions and their offsets from the DB.
2. Create a direct Kafka stream based on the read partitions and offsets.
3. (Executors) Consume events from Kafka.
4. Aggregate the events.
5. Upsert the event counters along with the current offsets in a single transaction.

Page 30: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Creating Kafka Stream
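The code on this slide is a screenshot that did not survive extraction. A sketch of creating a direct Kafka stream from stored offsets (spark-streaming-kafka 0.8 API; the topic, brokers, and offset values are illustrative assumptions):

import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("events-aggregator")
val ssc = new StreamingContext(conf, Seconds(30))

// Offsets previously committed by this application (e.g. read from the DB in step 1).
val fromOffsets: Map[TopicAndPartition, Long] = Map(
  TopicAndPartition("events", 0) -> 152L, // hypothetical values
  TopicAndPartition("events", 1) -> 98L
)

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

// Keep both key and message; the offsets stay attached to the batch itself.
val messageHandler =
  (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

val stream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)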

Page 31: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Aggregation & Writing to DB
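Again the original code is a lost screenshot. A sketch of steps 4-5 under stated assumptions: parseDimensions, db, and its transactional methods are hypothetical placeholders for the application's own parsing and DAO code:

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // The offset ranges of this micro-batch are carried by the RDD itself; the cast
  // must be done on the RDD returned by the direct stream, before any transformation.
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

  // 4. Pre-aggregate: one counter per (app_id, event_name, country).
  val counts = rdd
    .map { case (_, message) => (parseDimensions(message), 1L) } // hypothetical parser
    .reduceByKey(_ + _)
    .collect() // small after aggregation, so it can be written from the driver

  // 5. Hypothetical DAO: upsert counters and offsets in ONE transaction. A replayed
  // batch carries the same offsets, so the transaction can detect and reject it,
  // and the counters are never double-incremented.
  db.withTransaction { tx =>
    tx.upsertCounters(counts)
    tx.storeOffsets(offsetRanges)
  }
}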

Page 32: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Example #2

Page 33: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Snapshotting Events Aggregator

Aggregator application:

1. (Driver) Read the last Kafka partitions and their offsets from S3.
2. Create a direct Kafka stream based on the read partitions and offsets.
3. (Executors) Consume events from Kafka.
4. Aggregate the events.
5. Store the processed data and the Kafka offsets in S3 under /data/ts=<timestamp> and /offsets/ts=<timestamp> respectively.

Page 34: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Snapshotting Events Aggregator

Loader application:

1. (Driver) Find the last committed timestamp.
2. (Executors) Read the data for that timestamp from /data/ts=<timestamp>.
4. Aggregate the events by different dimensions and split them into cubes.
5. Increment the counters in the different cubes (Cassandra).
6. Delete the offsets and data for that timestamp (/offsets/ts=<timestamp> and /data/ts=<timestamp>) from S3.

Page 35: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Aggregator
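The aggregator code on this slide is a screenshot lost in extraction. A sketch of the snapshotting write path under stated assumptions; parseDimensions and writeOffsets are hypothetical helpers, and the S3 paths are illustrative:

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  val ts = System.currentTimeMillis()

  // Aggregate this micro-batch (same shape as in the transactional example).
  val aggregated = rdd
    .map { case (_, message) => (parseDimensions(message), 1L) } // hypothetical parser
    .reduceByKey(_ + _)

  // Data first, offsets last: the offsets file acts as the commit marker. A crash
  // in between leaves a data directory with no matching offsets, which is ignored.
  aggregated.saveAsTextFile(s"s3://bucket/data/ts=$ts")
  writeOffsets(s"s3://bucket/offsets/ts=$ts", offsetRanges) // hypothetical helper
}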

Page 36: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Aggregator

Page 37: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Loader
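Likewise a lost screenshot. A sketch of the loader's read-increment-delete cycle; sc is the SparkContext, and findLastCommittedTimestamp, parseLine, splitToCubes, openCassandraSession, incrementCounter, and deletePath are hypothetical helpers standing in for the application's own code:

// (Driver) The last timestamp for which offsets were written is committed.
val ts = findLastCommittedTimestamp() // hypothetical

val data = sc.textFile(s"s3://bucket/data/ts=$ts")

// Aggregate by different dimensions and split into cubes.
val cubes = data
  .map(parseLine)        // hypothetical parser
  .flatMap(splitToCubes) // hypothetical: one (cubeKey, count) pair per cube
  .reduceByKey(_ + _)

// Increment the Cassandra counters, one session per partition.
cubes.foreachPartition { partition =>
  val session = openCassandraSession() // hypothetical
  partition.foreach { case (key, count) => incrementCounter(session, key, count) }
  session.close()
}

// Commit: only after a successful load are the snapshot files removed.
deletePath(s"s3://bucket/offsets/ts=$ts") // hypothetical
deletePath(s"s3://bucket/data/ts=$ts")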

Page 38: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Deployment

We use Mesos:
● Master HA for free.
● Marathon keeps the Spark streaming application alive.

Tips

● Read carefully: http://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning
● Inspect, re-configure, retry.
● Turn off Spark dynamicity.
● Preserve data locality.
● Find the balance between cores, batch interval, and block interval (see the sketch below).
● Processing time must be less than the batch interval.
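A sketch of the configuration knobs these tips map to; the values shown are illustrative assumptions, not tuned recommendations:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("streaming-app")
  // "Turn off Spark dynamicity": dynamic allocation fits poorly with
  // long-running streaming jobs.
  .set("spark.dynamicAllocation.enabled", "false")
  // "Preserve data locality": how long a task waits for a data-local slot.
  .set("spark.locality.wait", "100ms")
  // Cores / batch interval / block interval balance: for receiver-based streams,
  // tasks per batch ≈ batch interval / block interval.
  .set("spark.streaming.blockInterval", "200ms")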

Page 39: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Thank you! (And we're hiring.)

Page 40: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Right Now

Real-time analytics dashboard

Page 41: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Right Now

● Processes ~50M events a day
● Reduces the stream in two sliding windows:
  1. Last 5 seconds ("now")
  2. Last 10 minutes ("recent")
● At-most-once semantics

Page 42: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Right Now

Page 43: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Right Now: Why Spark?

● Experience with Spark
● Convenient Clojure wrappers (Sparkling, Flambo)
● Documentation and community

Page 44: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Right Now: In Production

● 3 m3.xlarge machines for the workers (4 cores each)
● spark.default.parallelism=10
● Lesson learned: foreachRDD and foreachPartition (see the sketch below)
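The slide doesn't spell the lesson out; the usual one behind this pairing is resource scope, sketched here under that assumption. dstream stands for any DStream, and createConnection/send are hypothetical:

dstream.foreachRDD { rdd =>
  // The body of foreachRDD runs on the driver; only the function passed to
  // foreachPartition is shipped to the executors.
  rdd.foreachPartition { partition =>
    // One connection per partition instead of one per record, and never a
    // connection captured on the driver (it wouldn't serialize).
    val connection = createConnection() // hypothetical
    partition.foreach(record => connection.send(record)) // hypothetical sender
    connection.close()
  }
}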

Page 45: Spark Streaming Recipes and "Exactly Once" Semantics Revised

Thank you!