Kafka on YARN (KOYA) at Slider Meetup 20150304

Kafka On YARN (KOYA)

An Open Source Initiative to Integrate Kafka & YARN

Thomas Weise – thomas@datatorrent.com

Siyuan Hua – siyuan@datatorrent.com

March 4th, 2015

Apache Kafka

“A high-throughput distributed messaging system.”

“Fast, Scalable, Durable, Distributed”

Kafka is a natural fit to deliver events into our stream processing platform.

Kafka feeds Stream Processing

[Diagram: a Kafka cluster (Server-1, Server-2, Server-3, each hosting partitions P1, P2, P3) feeds a YARN cluster, where the Resource Manager and Node Managers run the DT AppMaster and DT Containers.]

Problem?

• It is not easy to get started with Kafka

– Initial deployment difficult (build your own tool)

• It is not easy to keep it running

– No central management (status, configuration changes,…)

– No automatic replacement for failed broker

• Operational Inefficiencies

– Resource fragmentation, underutilization

– Common infrastructure not leveraged, extra skill sets required

• Adoption Barrier!

Why Kafka on YARN

• YARN enables:

– Horizontal scalability with commodity hardware

– Central resource management with queues, limits and locality preferences

– Framework for achieving fault tolerance and security

• Automate:

– Broker recovery

– Deployment of Kafka clusters

• Integrate:

– User-friendly management (alternative to Kafka command line utilities)
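Automated broker recovery amounts to reconciling a desired broker count against the brokers that are actually running. The Python below is a minimal sketch of that idea, not Slider's or KOYA's code: `BrokerGroup`, `reconcile` and `on_container_started` are hypothetical names. It assumes broker IDs 0..N-1 and that a replacement container reuses the failed broker's ID, so Kafka sees the same broker rejoin the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class BrokerGroup:
    """Hypothetical view of one Kafka component as an application master tracks it."""
    desired: int                                  # number of brokers requested
    live: set[int] = field(default_factory=set)   # broker IDs with a running container


def reconcile(group: BrokerGroup, failed: set[int]) -> list[int]:
    """Forget failed brokers and return the IDs that need a replacement container.

    Assumes broker IDs 0..desired-1; a replacement reuses the failed broker's
    ID so Kafka sees the same broker rejoin the cluster.
    """
    group.live -= failed
    return sorted(set(range(group.desired)) - group.live)


def on_container_started(group: BrokerGroup, broker_id: int) -> None:
    """Record that a (replacement) broker container is running again."""
    group.live.add(broker_id)


group = BrokerGroup(desired=3, live={0, 1, 2})
to_restart = reconcile(group, failed={1})    # [1]
for broker_id in to_restart:
    on_container_started(group, broker_id)   # stands in for a YARN container request
```

Running `reconcile` on every container-failure notification keeps the loop level-triggered: whatever the cause of a loss, the answer is always "start whatever is missing from the desired set".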

Kafka on YARN through Slider

[Diagram: in the YARN cluster, the Resource Manager and Node Managers host the DT AppMaster with DT Containers alongside a Slider AM; Slider Agents in their containers run Kafka Server-1 and Server-2, each hosting partitions P1, P2, P3.]

Why Slider?

• Automates deployment and configuration of components

– Simplifies on-demand cluster creation

• Generic AM for long-running services

– Management of container failures: automates recovery

– Sticky allocation of components to hosts across AM restarts

– Isolation: node labels to pin components to a specific set of machines

• Central status

– View all servers in one place

• Areas for improvement

– Anti-affinity support (YARN limitation)

– Agent API documentation

– Flexibility in component instance specification
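Sticky allocation can be pictured as a placement history consulted whenever a container is requested. The sketch below is illustrative only. It assumes a hypothetical history mapping broker ID to host, serialized as JSON so it survives an AM restart, and a hypothetical per-host failure counter with an arbitrary threshold. A broker is asked back onto its previous host, where its Kafka log directories are likely still on local disk, unless that host has failed too often.

```python
import json
from collections import Counter
from typing import Optional


def save_history(history: dict[int, str]) -> str:
    """Serialize the broker-ID -> host placement history (e.g. to a file in HDFS)."""
    return json.dumps(history)


def load_history(blob: str) -> dict[int, str]:
    """Restore the history after an AM restart; JSON turns int keys into strings."""
    return {int(broker_id): host for broker_id, host in json.loads(blob).items()}


def placement_requests(
    history: dict[int, str],
    needed: list[int],
    failures: Counter,
    max_failures: int = 3,  # hypothetical threshold, not a Slider default
) -> list[tuple[int, Optional[str]]]:
    """Pair each broker that needs a container with its preferred host.

    A broker goes back to the host it last ran on (sticky allocation) unless
    that host has reached max_failures, in which case None lets the
    scheduler choose any node.
    """
    requests = []
    for broker_id in needed:
        host = history.get(broker_id)
        if host is not None and failures[host] >= max_failures:
            host = None
        requests.append((broker_id, host))
    return requests
```

Returning `None` leaves placement to the YARN scheduler. Nothing in this sketch keeps two brokers off the same host, which mirrors the anti-affinity gap listed under areas for improvement.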

Configuration Example
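Slider describes an application's container needs in a `resources.json` document. As a hedged illustration only, the sketch below generates such a document for a single Kafka component. It assumes Slider's `resources.json` layout (`schema`, `metadata`, `global`, `components`) and the keys `yarn.role.priority`, `yarn.component.instances`, `yarn.memory` (in MB) and `yarn.vcores`. The component name `KAFKA_BROKER` and the sizes are hypothetical and are not KOYA's actual settings.

```python
import json


def broker_resources(instances: int, memory_mb: int, vcores: int = 1) -> dict:
    """Build a Slider-style resources.json document for one Kafka component.

    KAFKA_BROKER is a hypothetical component name; Slider expects the
    per-component option values as strings.
    """
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {},
        "components": {
            "slider-appmaster": {},
            "KAFKA_BROKER": {
                "yarn.role.priority": "1",
                "yarn.component.instances": str(instances),
                "yarn.memory": str(memory_mb),
                "yarn.vcores": str(vcores),
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(broker_resources(instances=3, memory_mb=2048), indent=2))
```

Keeping the broker count in `yarn.component.instances` means growing or shrinking the Kafka cluster is a change to one value that the Slider AM then reconciles, rather than a manual deployment.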

Demo

Project Status

• Open Source: https://github.com/DataTorrent/koya

• Python Scripts + Configuration

• Works on Hadoop 2.6 with Slider 0.6

• Install: Embedded Slider or Application Package

• First Release by Q2

• Future Enhancements

– Expanded Status Info through Slider AM

– Explore Kafka management UI options

– Support for disk as a resource in YARN (YARN-2139)

– Better control over server placement (anti-affinity)

– Slider-799

Q & A

Thank You!