Streaming computing: architectures and technologies


Some loose thoughts about the latest buzzwords: streaming computing, real-time processing, and in-memory computing.


Streaming Computing

Some thoughts and technology choices for event-driven processing

Natalino Busa - 29 Aug. 2013

Outline

● Concurrency
● Streaming computing
● Technologies
  ○ Gigaspaces
  ○ Storm
  ○ Akka
● Comparison matrix
● Opportunities

Algorithms: a tribute

Numbers and Algorithms:

9th century Persian Muslim mathematician Abu Abdullah Muhammad ibn Musa Al-Khwarizmi,

whose work built upon that of the 7th century Indian mathematician Brahmagupta.

We owe a lot to these guys!

Why do we need parallelism?

The data gets bigger,

a single core doesn't get much faster,

BUT

we get more cores in a chip.

More cores = more parallelism. We are happy now, right?

Moore’s law

Every 18 months, the number of CPU cores doubles.

Another interpretation:

Every 18 months, the number of idle CPU cores doubles.

More parallelism

We trade:

Time vs. (CPU, memory, I/O)

Modern applications

Scalability:

Vertical: concurrency (use all the cores, memory, and I/O of a given machine)

Horizontal: distribution (use all the machines in the cluster)

High availability: fault tolerance at all levels (local, distributed)

(the Terminator effect: you can stop it, but you can't kill it)

Streaming applications

Performance: efficient use of resources (CPU and memory, but also OS threads and sockets)

Asynchronous: event-driven, reacts to new data

Distributed: more machines = more performance; the algorithm is partitioned and/or replicated across the cluster

What to increase?

More CPU: It helps when there is

computation involved

More MEMORY: It helps when there is

more state to keep

More I/O: It helps when there are

more messages to transfer

Streaming or batch?

[Diagram: data flows from a source system, through our system (processing), to a target system.]

What differentiates streaming from batch?

● Granularity of Data
● Granularity of Processing

Granularity impacts throughput, latency, and the cost of the system!

The choice is yours

Scenario: 1000 events/sec (1 KB/event), running on 100 cores all day long.

BATCH (Hadoop): "Wait a day, then process."

1000 events/sec × 86,400 sec/day ≈ 86 M events ≈ 86 GB of data

Latency: 24 hours. Throughput: 1 update/day.

STREAMING (Akka): "Do not wait."

Process 1 KB of data each millisecond.

Latency: 1 ms. Throughput: 1000 updates/sec.

"Both are valid options. It depends on the application domain and the requirements/specs of the target and source systems."
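A back-of-the-envelope sketch of the same trade-off; the figures are only the assumed ones from the slide (1000 events/sec at 1 KB/event):

```scala
// Batch vs. streaming, using the slide's assumed figures: 1000 events/sec at 1 KB/event.
object BatchVsStreaming extends App {
  val eventsPerSec  = 1000L
  val eventSizeKB   = 1L
  val secondsPerDay = 24L * 60 * 60

  val eventsPerDay = eventsPerSec * secondsPerDay        // 86,400,000 events (~86 M)
  val dataPerDayGB = eventsPerDay * eventSizeKB / 1e6    // ~86.4 GB (decimal GB)

  println(s"Batch (Hadoop):   $eventsPerDay events/day, ~$dataPerDayGB GB/day, latency 24 h, 1 update/day")
  println(s"Streaming (Akka): $eventSizeKB KB per event, latency ~1 ms, $eventsPerSec updates/sec")
}
```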

Mapping it to existing applications

                          Granularity of Data    Granularity of Processing
Traditional DB systems    256 GB                 1 CPU
Big Data (Hadoop)         256 GB                 100 CPUs
Traditional mail server   1 KB                   1 CPU
Web application server    1 KB                   100 CPUs

Technologies: Gigaspaces

Technologies: Storm

● Topology
● Supervising
● Scaling
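A minimal sketch of what a Storm topology looks like, assuming the classic backtype.storm API of that era; the EventSpout and CountBolt classes, the field names, and the parallelism figures are all hypothetical, not from the original deck:

```scala
import backtype.storm.{Config, LocalCluster}
import backtype.storm.spout.SpoutOutputCollector
import backtype.storm.task.TopologyContext
import backtype.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer, TopologyBuilder}
import backtype.storm.topology.base.{BaseBasicBolt, BaseRichSpout}
import backtype.storm.tuple.{Fields, Tuple, Values}
import scala.util.Random

// Hypothetical spout: emits a random event id every few milliseconds.
class EventSpout extends BaseRichSpout {
  private var collector: SpoutOutputCollector = _

  override def open(conf: java.util.Map[_, _], context: TopologyContext,
                    collector: SpoutOutputCollector): Unit = this.collector = collector

  override def nextTuple(): Unit = {
    Thread.sleep(10)
    collector.emit(new Values(Integer.valueOf(Random.nextInt(100))))
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("eventId"))
}

// Hypothetical bolt: keeps a running count per event id.
class CountBolt extends BaseBasicBolt {
  private val counts = scala.collection.mutable.Map.empty[Int, Long].withDefaultValue(0L)

  override def execute(tuple: Tuple, collector: BasicOutputCollector): Unit = {
    val id: Int = tuple.getInteger(0)
    counts(id) += 1
    collector.emit(new Values(Integer.valueOf(id), java.lang.Long.valueOf(counts(id))))
  }

  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("eventId", "count"))
}

object EventCountTopology extends App {
  val builder = new TopologyBuilder
  builder.setSpout("events", new EventSpout, 2)                 // spout parallelism hint
  builder.setBolt("counts", new CountBolt, 4)
         .fieldsGrouping("events", new Fields("eventId"))       // partition the stream by event id

  val conf = new Config
  conf.setNumWorkers(2)                                         // coarse-grained JVM workers

  val cluster = new LocalCluster
  cluster.submitTopology("event-counts", conf, builder.createTopology())
  Thread.sleep(10000)
  cluster.shutdown()
}
```

Each spout/bolt executor runs inside a worker JVM, which is why the comparison later in the deck describes Storm's processing elements as coarse-grained JVMs.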

Technologies: Akka

● Supervising: tree of actors
● Topology (static and dynamic actors)
● Scaling and distributed processing
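A minimal classic-actors sketch of those three points, assuming Akka 2.x of that era; the Event, EventWorker, and EventSupervisor names and the partitioning scheme are hypothetical illustrations, not from the original deck:

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

// Hypothetical event message.
case class Event(id: Long, payload: String)

// A lightweight worker actor: reacts to each incoming event and keeps its own state.
class EventWorker extends Actor {
  private var processed = 0L
  def receive = {
    case Event(id, payload) =>
      processed += 1
      // per-event logic would go here
  }
}

// A supervisor: builds a small static topology of workers and restarts them on failure.
class EventSupervisor extends Actor {
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: Exception => Restart   // fault tolerance: "you can stop it, but you can't kill it"
    }

  // Static topology: four workers; a dynamic topology would create and stop actors at runtime.
  private val workers =
    Vector.tabulate(4)(i => context.actorOf(Props[EventWorker], s"worker-$i"))

  def receive = {
    case e: Event =>
      workers((e.id % workers.size).toInt) forward e   // partition events by id
  }
}

object StreamingApp extends App {
  val system     = ActorSystem("streaming")
  val supervisor = system.actorOf(Props[EventSupervisor], "supervisor")

  (1L to 1000L).foreach(i => supervisor ! Event(i, s"payload-$i"))
}
```

Each actor is a small, lightweight processing element (far cheaper than a thread or a JVM), which is what makes the fine-grained, high-throughput claim in the comparison below plausible.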

Technology matrix

(rows: Granularity of Data; columns: Granularity of Processing)

                  Processing: Small    Processing: Big
Data: Small       Akka                 Akka, Gigaspaces
Data: Big         ?                    Storm

System end-to-end throughput

High (~10,000 events/sec): Akka
Medium (~100 events/sec): Storm / Gigaspaces
Low (~10 events/sec): scripting languages

Big Data in motion

All are distributed, fault-tolerant, streaming.

- Storm
  ++ multi-language
  -- not user/admin friendly
  -- slow supervising
  Processing elements are JVMs; ideal when data is coarse-grained.

- Akka
  ++ high throughput, fine-grained actors
  ++ dynamic topologies
  -- low-level, but high performance
  Processing elements are small and lightweight; ideal for millions of transactions per second.

- Gigaspaces
  ++ combines memory + application distribution
  -- framework API is not very flexible
  Processing elements are JVMs; ideal for an all-in-one solution, with little customization.

Opportunity: Lambda Architecture

Logic layer: Software as a Service, e.g. a real-time predictor

(from http://www.manning.com/marz/)
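As a rough illustration of the Lambda Architecture idea (a query merges a precomputed batch view with a real-time view maintained by the streaming layer), here is a minimal sketch; the view contents, names, and counting example are hypothetical, not taken from Marz's book beyond the general batch-view / real-time-view split:

```scala
// A toy serving layer in the Lambda style: query = batch view + real-time view.
object LambdaServing extends App {
  // Batch view: recomputed from the master dataset, e.g. once per day (hypothetical counts).
  val batchView: Map[String, Long] = Map("user-1" -> 100L, "user-2" -> 40L)

  // Real-time view: incremental counts from the streaming layer since the last batch run.
  val realtimeView: Map[String, Long] = Map("user-1" -> 3L, "user-3" -> 7L)

  // The query merges both views, so results stay fresh without waiting for the next batch.
  def query(key: String): Long =
    batchView.getOrElse(key, 0L) + realtimeView.getOrElse(key, 0L)

  println(query("user-1"))  // 103
  println(query("user-3"))  // 7
}
```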

Opportunity: Batch + Streaming

[Architecture diagram: batch computing and streaming computing, combined with in-memory distributed databases, data warehouses, and messaging buses; front-end services expose low-latency HTTP API services to an HTML5 client / responsive app via FETCH (refresh) and PUSH (SSE, notifications).]
