© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Streaming Analytics— Getting Started with Amazon Kinesis June 20, 2016

Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Page 1: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Streaming Analytics—Getting Started with Amazon Kinesis

June 20, 2016

Page 2: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

What to expect from this session

Amazon Kinesis: Getting started with streaming data on AWS
• Streaming scenarios
• Amazon Kinesis Streams overview
• Amazon Kinesis Firehose overview
• Firehose experience for Amazon S3 and Amazon Redshift

Page 3: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis: Streaming data made easy

Services make it easy to capture, deliver, and process streams on AWS

Amazon Kinesis Streams

Build your own custom applications that process or analyze streaming data

Amazon Kinesis Firehose

Easily load massive volumes of streaming data into Amazon S3 and Amazon Redshift

Amazon Kinesis Analytics (In Preview)

Easily analyze data streams using standard SQL queries

Page 4: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

What to expect from this session

Amazon Kinesis streaming data in the AWS cloud
• Amazon Kinesis Streams
• Amazon Kinesis Firehose (focus of this session)
• Amazon Kinesis Analytics (In Preview)

Page 5: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Streaming data scenarios across segments

Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis

Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data

Ad/marketing tech
• (1) Publisher, bidder data aggregation
• (2) Advertising metrics like coverage, yield, conversion
• (3) Analytics on user engagement with ads, optimized bid/buy engines

IoT
• (1) Sensor, device telemetry data ingestion
• (2) IT operational metrics dashboards
• (3) Sensor operational intelligence, alerts, and notifications

Gaming
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics for level success, transition rates, CTR
• (3) Clickstream analytics, leaderboard generation, player-skill match engines

Consumer engagement
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics like page views, CTR
• (3) Clickstream analytics, recommendation engines

Page 6: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis: Streaming data done the AWS way

Makes it easy to capture, deliver, and process real-time data streams

Pay as you go, no up-front costs

Elastically scalable

Right services for your specific use cases

Real-time latencies

Easy to provision, deploy, and manage

Page 7: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Streams

Build your own data streaming applications

Easy administration: Simply create a new stream and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.

Build real-time applications: Perform continual processing on streaming big data using Amazon Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.

Low cost: Cost-efficient for workloads of any scale.
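
Setting capacity with shards can be made concrete with a rough sizing sketch. The per-shard limits used here (1 MB/s and 1,000 records/s for writes, 2 MB/s for reads) come from the Amazon Kinesis Streams documentation, not from this slide, and the function name and inputs are hypothetical.

```python
import math

# Per-shard limits as documented for Kinesis Streams (an assumption here;
# the slide does not state them).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that covers all three throughput limits."""
    needed = max(
        write_mb_per_sec / WRITE_MB_PER_SEC,
        records_per_sec / WRITE_RECORDS_PER_SEC,
        read_mb_per_sec / READ_MB_PER_SEC,
    )
    # A stream always has at least one shard.
    return max(1, math.ceil(needed))
```

The estimate takes whichever limit is hit first, so a workload with many small records may need more shards than its byte rate alone suggests.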

Page 8: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Sending and reading data from Streams

Sending
• AWS SDK
• AWS Mobile SDK
• Amazon Kinesis Producer Library
• LOG4J
• Flume
• Fluentd

Consuming
• Get* APIs
• Amazon Kinesis Client Library + Connector Library
• AWS Lambda
• Apache Storm
• Apache Spark
• Amazon Elastic MapReduce

Page 9: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Streams

Managed service for real-time processing

Real-time streaming data ingestion

Custom-built streaming applications

Inexpensive: $0.014 per 1,000,000 PUT payload units
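
As a rough illustration of the PUT payload unit price quoted on this slide, the sketch below estimates a monthly PUT charge. It assumes a payload unit is a 25 KB chunk of a record, which comes from the Kinesis Streams pricing description and is not stated in the slides. It also ignores shard-hour charges, and the function names are hypothetical.

```python
import math

PRICE_PER_MILLION_UNITS = 0.014  # USD, from the slide
PAYLOAD_UNIT_BYTES = 25 * 1024   # assumption: one PUT payload unit = 25 KB


def put_payload_units(record_bytes):
    """Number of payload units billed for a single record."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def monthly_put_cost(records_per_sec, record_bytes, days=30):
    """Estimated PUT charge in USD for a steady record rate (shard hours excluded)."""
    units = put_payload_units(record_bytes) * records_per_sec * 86400 * days
    return units / 1_000_000 * PRICE_PER_MILLION_UNITS
```

Under these assumptions, a record of 25 KB or less counts as one unit, so small records cost the same per PUT regardless of their exact size.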

Page 10: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

We listened to our customers…

Page 11: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Streams select new features…

Amazon Kinesis Producer Library

PutRecords API, 500 records or 5 MB payload

Amazon Kinesis Client Library in Python, Node.js, Ruby…

Server-side time stamps

Increased individual max record payload from 50 KB to 1 MB

Reduced end-to-end propagation delay

Extended stream retention from 24 hours to 7 days
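
The PutRecords limits listed above (500 records or 5 MB per call, 1 MB per record) suggest client-side batching. The following is a minimal sketch of such batching. The helper name is hypothetical, and counting the partition key toward the 5 MB request size while checking only the data blob against 1 MB is an assumption about how the limits apply.

```python
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 5 * 1024 * 1024
MAX_RECORD_BYTES = 1024 * 1024


def batch_records(records):
    """Group (partition_key, data_bytes) pairs into PutRecords-sized batches."""
    batches, current, current_bytes = [], [], 0
    for key, data in records:
        if len(data) > MAX_RECORD_BYTES:
            raise ValueError("record exceeds the 1 MB per-record limit")
        size = len(key.encode("utf-8")) + len(data)
        # Start a new batch when either the count or the byte limit would be exceeded.
        if current and (len(current) == MAX_RECORDS_PER_CALL
                        or current_bytes + size > MAX_BYTES_PER_CALL):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"PartitionKey": key, "Data": data})
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each batch has the shape that the AWS SDK for Python (boto3) accepts as the `Records` argument of the Kinesis client's `put_records` call. Sending the batches and retrying failed records is left out of this sketch.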

Page 12: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose

Page 13: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose

Load massive volumes of streaming data into Amazon S3 and Amazon Redshift

Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other destinations without writing an application or managing infrastructure.

Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 seconds using simple configurations.

Seamless elasticity: Seamlessly scales to match data throughput without intervention.

Capture and submit streaming data to Firehose

Firehose loads streaming data continuously into S3 and Amazon Redshift

Analyze streaming data using your favorite BI tools

• Amazon S3
• Amazon Redshift
• Amazon Elasticsearch Service
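
To illustrate what "batch and compress for delivery in as little as 60 seconds" means in practice, here is a toy buffer that flushes when either a size or an age threshold is reached. It is not Firehose's implementation. The class name and the 5 MB default are illustrative assumptions, and only the 60-second figure comes from the slide.

```python
import gzip
import time


class DeliveryBuffer:
    """Toy buffer: flush when the buffered size or age crosses a threshold."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_age_seconds=60,
                 clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.records = []
        self.size = 0
        self.started = None

    def add(self, record):
        """Add one record (bytes); return a gzip batch if a flush is due, else None."""
        if self.started is None:
            self.started = self.clock()
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.started >= self.max_age_seconds
        return self.flush() if (too_big or too_old) else None

    def flush(self):
        """Concatenate buffered records, compress them, and reset the buffer."""
        payload = gzip.compress(b"".join(self.records))
        self.records, self.size, self.started = [], 0, None
        return payload
```

The age check in this sketch only runs when a record arrives. A real delivery service would also flush on a timer so that a quiet stream still delivers its last records.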

Page 14: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose

Capture IT and app logs, device and sensor data, and more
• Sources: AWS Platform SDKs, Mobile SDKs, Amazon Kinesis Agent, AWS IoT
• Send data from IT infra, mobile devices, sensors
• Integrated with AWS SDK, agents, and AWS IoT

• Fully managed service to capture streaming data
• Elastic w/o resource provisioning
• Pay-as-you-go: 3.5 cents/GB transferred

Enable near-real time analytics using existing tools
• Destinations: Amazon S3, Amazon Redshift, Amazon Elasticsearch Service
• Batch, compress, and encrypt data before loads
• Loads data into Amazon Redshift tables by using the COPY command

Page 15: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Streaming data scenarios across segments

Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics Generation, (3) Responsive Data Analysis

Data types: IT logs, application logs, social media/clickstreams, sensor or device data, market data

Marketing tech
• (1) Publisher, bidder data aggregation
• (2) Advertising metrics like coverage, yield, conversion
• (3) Analytics on user engagement with ads, optimized bid/buy engines

IoT
• (1) Sensor, device telemetry data ingestion
• (2) IT operational metrics dashboards
• (3) Sensor operational intelligence, alerts, and notifications

Gaming
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics for level success, transition rates, CTR
• (3) Clickstream analytics, leaderboard generation, player-skill match engines

Consumer online
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics like page views, CTR
• (3) Clickstream analytics, recommendation engines

Page 16: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose

Three simple concepts

1. Delivery stream: The underlying entity of Firehose. Use Firehose by creating a delivery stream to a specified destination and sending data to it.
• You do not have to create a stream or provision shards.
• You do not have to specify partition keys.

2. Records: The data producer sends data blobs as large as 1,000 KB to a delivery stream. That data blob is called a record.

3. Data producers: Producers send records to a delivery stream. For example, a web server that sends log data to a delivery stream is a data producer.
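
The three concepts can be sketched as a small data producer. The helper names and the stream name in the test are hypothetical. The request shape (`DeliveryStreamName`, `Record={"Data": ...}`) follows the Firehose `PutRecord` operation as exposed by the AWS SDK for Python (boto3), and `client` stands in for such a client. Appending a newline is a common convention because Firehose concatenates records on delivery; it is not something the slide prescribes.

```python
MAX_RECORD_BYTES = 1000 * 1024  # "as large as 1,000 KB", per the slide


def build_put_record_request(delivery_stream_name, data):
    """Build keyword arguments for a Firehose PutRecord call."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    if len(data) > MAX_RECORD_BYTES:
        raise ValueError("record is larger than 1,000 KB")
    # No partition key and no shard information is needed for a delivery stream.
    return {"DeliveryStreamName": delivery_stream_name, "Record": {"Data": data}}


def send_log_line(client, delivery_stream_name, line):
    """Send one log line; `client` must expose put_record(**kwargs)."""
    # The trailing newline keeps records separable after they are concatenated.
    request = build_put_record_request(delivery_stream_name, line + "\n")
    return client.put_record(**request)
```

Note that the request carries no partition key and no shard information, which matches the two bullets under the delivery stream concept.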

Page 17: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose console experience

Unified console experience for Firehose and Streams

Page 18: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose console (S3)

Create fully managed resources for delivery without building an app

Page 19: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose console (S3)

Configure data delivery options simply using the console

Page 20: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose console (Amazon Redshift)

Configure data delivery to Amazon Redshift simply using the console

Page 21: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose console (Amazon Elasticsearch Service)

Page 22: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis agent

Software agent makes submitting data to Firehose easy
• Monitors files and sends new data records to your delivery stream
• Handles file rotation, checkpointing, and retry upon failures
• Preprocessing capabilities such as format conversion and log parsing
• Delivers all data in a reliable, timely, and simple manner
• Emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process
• Supported on Amazon Linux AMI with version 2015.09 or later, or Red Hat Enterprise Linux version 7 or later; install on Linux-based server environments such as web servers, front ends, log servers, and more
• Also enabled for Streams
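
The agent is driven by a JSON configuration file. As a hedged sketch, the snippet below generates a minimal configuration with one flow. The key names (`flows`, `filePattern`, `deliveryStream`, `dataProcessingOptions`, `LOGTOJSON`) follow the Kinesis Agent documentation. The log path and stream name are placeholders, and the helper function is hypothetical.

```python
import json


def agent_config(file_pattern, delivery_stream, log_format=None):
    """Build a one-flow agent configuration as a Python dict."""
    flow = {"filePattern": file_pattern, "deliveryStream": delivery_stream}
    if log_format is not None:
        # Optional preprocessing: parse each log line and convert it to JSON.
        flow["dataProcessingOptions"] = [
            {"optionName": "LOGTOJSON", "logFormat": log_format}
        ]
    return {"flows": [flow]}


config = agent_config(
    "/var/log/httpd/access_log*",   # placeholder path
    "example-delivery-stream",      # placeholder delivery stream name
    log_format="COMMONAPACHELOG",
)
print(json.dumps(config, indent=2))
```

The printed JSON is what would be saved as the agent's configuration file (documented as `/etc/aws-kinesis/agent.json`). According to the same documentation, a flow that targets Streams uses a `kinesisStream` key in place of `deliveryStream`.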

Page 23: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose pricing

Simple, pay-as-you-go, and no up-front costs

Dimension                    Value
Per 1 GB of data ingested    $0.035
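
A back-of-the-envelope estimate from the single pricing dimension above might look like the sketch below. It uses only the $0.035 per GB figure from the slide and ignores any per-record rounding and any destination charges such as storage. The function name is hypothetical.

```python
PRICE_PER_GB = 0.035  # USD per GB ingested, from the slide


def monthly_firehose_cost(gb_per_day, days=30):
    """Estimated ingestion charge in USD for a steady daily volume."""
    return gb_per_day * days * PRICE_PER_GB
```

For example, `monthly_firehose_cost(100)` estimates the ingestion charge for a source that produces 100 GB per day.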

Page 24: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Firehose or Amazon Kinesis Streams?

Page 25: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Streams is a service for workloads that require custom processing per incoming record, sub-1-second processing latency, and a choice of stream processing frameworks.

Amazon Kinesis Firehose is a service for workloads that require zero administration, the ability to use existing analytics tools based on Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service, and a data latency of 60 seconds or higher.
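
The comparison can be reduced to a simple rule of thumb. The function below is a hypothetical encoding of the two criteria stated above (custom per-record processing and latency) and is not an official selection tool.

```python
def choose_service(max_latency_seconds, needs_custom_processing):
    """Pick a service using the latency and processing criteria from the slide."""
    # Firehose delivers with a data latency of 60 seconds or higher, so anything
    # tighter, or any custom per-record processing, points to Streams.
    if needs_custom_processing or max_latency_seconds < 60:
        return "Amazon Kinesis Streams"
    return "Amazon Kinesis Firehose"
```

A workload that can tolerate 60 seconds or more and only needs data loaded into S3, Redshift, or Elasticsearch Service falls on the Firehose side of this rule.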

Page 26: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Analytics

Page 27: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Amazon Kinesis Analytics

Analyze data streams continuously with standard SQL

Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.

Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies.

Scale elastically: Elastically scales to match data throughput without any operator intervention.

Preview!

Connect to Amazon Kinesis streams, Firehose delivery streams

Run standard SQL queries against data streams

Amazon Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
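
Kinesis Analytics itself is driven by SQL, and the slides do not show a query. To give a feel for what a continuous query over a stream typically computes, the sketch below counts events per key in fixed, non-overlapping time windows. It is plain Python for illustration only, not the Kinesis Analytics API, and the function name and event shape are assumptions.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key within fixed, non-overlapping time windows.

    `events` is an iterable of (epoch_seconds, key) pairs.
    Returns {window_start_seconds: Counter({key: count})}.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Align each event to the start of the window that contains it.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)
```

In SQL terms this corresponds roughly to a grouped count over a time window. A streaming engine would emit each window's result as the window closes rather than returning everything at the end.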

Page 28: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Select Amazon Kinesis Customer Case Studies

Ad tech Gaming IoT

Page 29: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

DataXu: Digital AdTech

Amazon Redshift

Page 30: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Sushiro: Kaiten Sushi Restaurants

380 stores stream data from sushi plate sensors to Amazon Kinesis

Page 31: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Sushiro: Kaiten Sushi Restaurants

380 stores stream data from sushi plate sensors to Kinesis

Page 33: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Final Hearst Data Pipeline

[Diagram] Components: Users to Hearst Properties, Clickstream, Node.JS App-Proxy, Amazon Kinesis Streams, Firehose, S3, ETL on EMR, Amazon Redshift, Agg Data Models, Data Science Application, API Ready Data, Buzzing API

Latency: Milliseconds, 30 Seconds, 100 Seconds, 5 Seconds

Throughput: 100 GB/Day, 5 GB/Day, 1 GB/Day, 1 GB/Day

Page 34: Getting Started with Amazon Kinesis | AWS Public Sector Summit 2016

Thank you!