Streaming Data Ingest and Processing with Apache Kafka

You will learn how to

• Realize the value of streaming data ingest with Kafka

• Turn databases into live feeds for streaming ingest and processing

• Accelerate data delivery to enable real-time analytics

• Reduce skill and training requirements for data ingest

Apache Kafka and Stream Processing

About Confluent

• Founded by the creators of Apache Kafka
• Founded September 2014
• Technology developed while at LinkedIn
• 73% of active Kafka committers

Leadership

• Jay Kreps, CEO
• Neha Narkhede, CTO, VP Engineering
• Cheryl Dalrymple, CFO
• Luanne Dauber, CMO
• Todd Barnett, VP WW Sales
• Jabari Norton, VP Business Dev

What does Kafka do?

[Diagram: Producers publish to Kafka topics and Consumers read from them; they are your interfaces to the world. Kafka Connect links topics to your existing systems in real time.]

Kafka is much more than a pub-sub messaging system.
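To make the producer/consumer model concrete, here is a minimal Java sketch; the topic name "events", the record contents, and the localhost broker address are assumptions for illustration, not part of this deck:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // assumed local broker
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            // A producer appends records to a topic; any number of consumers can
            // later read the same topic independently, each at its own pace.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
            }  // close() flushes any buffered records before returning
        }
    }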

Before: Many Ad Hoc Pipelines

[Diagram: point-to-point pipelines connect each source (user tracking, operational logs, operational metrics, Espresso, Cassandra, Oracle) directly to each destination (Hadoop, search, security, fraud detection application, monitoring, data warehouse).]

After: Stream Data Platform with Kafka

[Diagram: the same sources and destinations all connect through Kafka, which is distributed and fault tolerant, stores messages, and processes streams; destinations include Hadoop, log search, monitoring, and the data warehouse.]

People Using Kafka Today

• Financial Services
• Entertainment & Media
• Consumer Tech
• Travel & Leisure
• Enterprise Tech
• Telecom
• Retail

Common Kafka Use Cases

Data transport and integration:
• Log data
• Database changes
• Sensors and device data
• Monitoring streams
• Call data records
• Stock ticker data

Real-time stream processing:
• Monitoring
• Asynchronous applications
• Fraud and security

What is the key challenge?

Making sure all data ends up in the right places

Kafka for Integration

Data Integration Anti-Patterns
1. Ad-hoc pipelines
2. Extreme processing
3. Loss of metadata
The result: tight coupling between systems and lost agility.

Why is Kafka such a great fit? Because at the heart of EVERY system there is a LOG, and Kafka is a scalable and reliable system to manage LOGs.

Basic Data Integration Patterns
• Push
• Pull

Kafka Connect Allows Kafka to Pull Data

Turn the Change Capture Log into a Kafka Topic


• Database data is available for any application
• No impact on production
• Database TABLES turned into a STREAM of events (see the sketch below)
• Ready for the next challenge? Stream processing applications
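One concrete way to turn a table into a topic is Kafka Connect's JDBC source connector. Below is a hedged sketch that registers such a connector against a Connect worker's REST API; the connector name, database URL, table, and column names are placeholder assumptions. Note that the JDBC source polls tables with queries; log-based change data capture, which Attunity Replicate provides (later slides), reads the database transaction log instead and avoids that query load.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class RegisterJdbcSource {
        public static void main(String[] args) throws Exception {
            // Connector config: poll a hypothetical "orders" table for new or
            // updated rows and publish each one to the topic "db.orders".
            String config = """
                {
                  "name": "orders-source",
                  "config": {
                    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                    "connection.url": "jdbc:postgresql://db-host:5432/shop",
                    "mode": "timestamp+incrementing",
                    "incrementing.column.name": "id",
                    "timestamp.column.name": "updated_at",
                    "table.whitelist": "orders",
                    "topic.prefix": "db."
                  }
                }""";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8083/connectors"))  // Connect REST port
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(config))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

A connector registered this way runs inside the Connect cluster itself; no custom producer code has to be deployed alongside the database.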

What’s next?

Confluent Platform with Attunity Connectivity

[Diagram: the Confluent Platform. At the core, Apache Kafka with Connectors and Kafka Streams, alongside Control Center and clients & developer tools. Data-integration connectors link sources and sinks such as Hadoop, ERP, CRM, data warehouses, RDBMS, database changes, mobile devices, IoT, logs, and website events. Real-time applications built on the platform cover alerting, monitoring, real-time analytics, custom applications, and transformations. Legend distinguishes Confluent Platform, Confluent Platform Enterprise, and external products; support, services, and consulting are provided by Confluent.]

Confluent Platform: It’s Kafka ++

Feature and benefit, compared across Apache Kafka, Confluent Platform 3.0, and Confluent Enterprise 3.0:

• Apache Kafka: high-throughput, low-latency, highly available, secure distributed messaging system
• Kafka Connect: advanced framework for connecting external sources and destinations into Kafka
• Java Client: provides easy integration into Java applications
• Kafka Streams: simple library that enables streaming application development within the Kafka framework
• Additional Clients: supports non-Java clients; C, C++, Python, etc.
• REST Proxy: provides universal access to Kafka from any network-connected device via HTTP
• Schema Registry: central registry for the format of Kafka data; guarantees all data is always consumable
• Pre-Built Connectors: HDFS, JDBC, and other connectors, fully certified and supported by Confluent
• Confluent Control Center: connection and monitoring command center providing advanced functionality and control, including connector management and stream monitoring
• Support: community for Apache Kafka and Confluent Platform 3.0 (free); 24x7x365 for Confluent Enterprise 3.0 (subscription)
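To give the Kafka Streams row above some texture, here is a minimal sketch of a streaming application; the input topic "payments", the output topic "fraud-alerts", and the flagged-JSON payload convention are all hypothetical assumptions for the example:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class FraudFilter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fraud-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Route payment events whose JSON payload carries "flagged":true to
            // an alert topic, leaving the original stream intact for others.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> payments = builder.stream("payments");
            payments.filter((key, value) -> value != null && value.contains("\"flagged\":true"))
                    .to("fraud-alerts");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }

Because Kafka Streams is just a library, this runs as an ordinary Java application; scaling out means starting more instances with the same application id.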

Confluent Control Center: Connector Management

• Configures Kafka Connect data pipelines
• Monitors all pipelines from end to end

Attunity Replicate: Streaming databases into Kafka

About Attunity

Overview:
• Global operations, US HQ
• 2,000 customers in 65 countries
• NASDAQ traded, fast growing
• Global footprint

Attunity Product Suite: Data Integration and Big Data Management

1. Accelerate data delivery and availability
2. Automate data readiness for analytics
3. Optimize data management with intelligence

• Attunity Replicate: universal data availability; move data to any platform
• Attunity Compose: data warehouse automation; automate ETL/EDW
• Attunity Visibility: data usage profiling and analytics; optimize performance and cost

Runs on premises and in the cloud, across Hadoop, files, RDBMS, EDW, SAP, and mainframe.

Attunity Replicate for Kafka

Stream your databases to Kafka with Attunity Replicate:
• Easily: a configurable and automated solution; with a few clicks you can turn databases into live feeds for Kafka
• Continuously: capture and stream data changes efficiently, in real time, and with low impact
• Heterogeneously: using the same platform for many source database systems (Oracle, SQL Server, DB2, mainframe, many more…)

Attunity Replicate architecture

[Diagram: sources (Hadoop, files, RDBMS, data warehouse, mainframe; on-prem or cloud) feed Attunity Replicate, which transfers data in batch or as incremental CDC, with transform and filter steps, over an in-memory or file channel backed by a persistent store, to targets (Hadoop, files, RDBMS, data warehouse, Kafka; on-prem or cloud).]

Kafka and real-time streaming CDC

Demand:
• Easy ingest and CDC
• Real-time processing
• Real-time monitoring
• Real-time Hadoop
• Scalable to 1000s of applications
• One publisher, multiple consumers

Attunity Replicate:
• Direct integration using Kafka APIs
• In-memory optimized data streaming
• Support for multi-topic and multi-partitioned data publication
• Full load and CDC
• Integrated management and monitoring via GUI

Attunity Replicate for Kafka - Architecture

[Diagram: Attunity Replicate performs bulk load and reads transaction logs, with in-memory optimized metadata management and data transport; messages (MSG 1, 2, …, n) stream to the Kafka message brokers. Topic partitions are spread across brokers, e.g. Broker 1 holds T1/P0, T2/P1, and T3/P0 while Broker 2 holds T1/P1 and T2/P0; each partition is an ordered sequence of messages M0, M1, M2, ….]

"table": "table-name",

"schema": "schema-name",

"op": "operation-type",

"ts": "change-timestamp",

"data": [{"col1": "val1"}, {"col2": "val2"}, …., {"colN": "valN"}]

"bu_data": [{"col1": "val1"}, {"col2": "val2"}, …., {"colN": "valN"}],

Easily create and manage Kafka endpoints

Eliminate manual coding:
• Drag-and-drop interface for all sources and targets
• Monitor and control data streams through the web console
• Bulk load or CDC
• Multi-topic and multi-partitioned data publication

[Screenshot: the Attunity Replicate console, contrasted with command-line scripting]

Zero-footprint architecture: lower impact on IT

• No software agents on sources and targets for mainstream databases
• Replicate data from 100s of source systems with easy configuration
• No software upgrades required at each database source or target
• Log-based capture with source-specific optimization

[Diagram: sources (Hadoop, files, RDBMS, EDW, mainframe) replicated to targets (Hadoop, files, RDBMS, EDW, Kafka)]

Heterogeneous: broad support for sources and targets

Sources:
• RDBMS: Oracle, SQL Server, DB2 LUW, DB2 iSeries, DB2 z/OS, MySQL, Sybase ASE, Informix
• Data Warehouse: Exadata, Teradata, Netezza, Vertica, Actian Vector, Actian Matrix
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• Legacy: IMS/DB, SQL/MP, Enscribe, RMS, VSAM
• Cloud: AWS RDS, Salesforce

Targets:
• RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
• Data Warehouse: Exadata, Teradata, Netezza, Vertica, Pivotal DB (Greenplum), Pivotal HAWQ, Actian Vector, Actian Matrix, Sybase IQ
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal
• NoSQL: MongoDB
• Cloud: AWS RDS/Redshift/EC2, Google Cloud SQL, Google Cloud Dataproc, Azure SQL Data Warehouse, Azure SQL Database
• Message Broker: Kafka