aws.amazon.com/webinars/apac/webinar-week | #AWSWebinarWeek
Real-time Data Processing: Kinesis and Beyond
Santanu Dutt
INDEX
A. What is real-time?
B. Examples and Challenges
C. Kinesis and Beyond
1. Kinesis Stream
2. Kinesis Firehose
3. Kinesis Analytics (SQL)
4. DynamoDB Streams
5. ElasticSearch
6. IoT Rule Engine
D. Demo
Examples
• Algorithmic trading: < 10 msec
• Real-time bidding: < 100 msec
• Common IoT scenarios: < 5 to 10 sec
• Infrastructure monitoring dashboard: < 1 min
• Google Maps traffic: < 5 mins
• Social network and media recommendation: < 15 min to a day
• Most business analytics scenarios: < 30 mins
• Social network listening: depends on how fast you want to respond!
Challenges
A. Speed of Analytics and Response
B. Volume of data
C. Maturity or Capabilities of Analytics Framework
D. Storing and Presentation of results
The Motivation for Continuous Processing
Some statistics about AWS Data Services
• Metering service
  • 10s of millions of records per second
  • Terabytes per hour
  • Hundreds of thousands of sources
  • Auditors guarantee 100% accuracy at month end
• Data Warehouse
  • 100s of extract-transform-load (ETL) jobs every day
  • Hundreds of thousands of files per load cycle
  • Hundreds of daily users
  • Hundreds of queries per hour
Metering Service
Internal AWS Metering Service
Workload
• 10s of millions of records/sec
• Multiple TB per hour
• 100,000s of sources
Pain points
• Doesn’t scale elastically
• Customers want real-time alerts
• Expensive to operate
• Relies on eventually consistent storage
Our Big Data Transition
Old requirements
• Capture huge amounts of data and process it in hourly or daily batches
New requirements
• Make decisions faster, sometimes in real time
• Scale the entire system elastically
• Make it easy to “keep everything”
• Multiple applications can process data in parallel
A General Purpose Data Flow
Many different technologies, at different stages of evolution
[Diagram: Client/Sensor → Aggregator → Continuous Processing → Storage → Analytics + Reporting; technologies shown: Kafka, "?"]
Kinesis
Movement or activity in response to a stimulus.
A fully managed service for real-time processing of high-volume, streaming data. Kinesis can store and process terabytes of data an hour from hundreds of thousands of sources. Data is replicated across multiple Availability Zones to ensure high durability and availability.
Customer Scenarios across Industry Segments
Customer View
Data types: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Scenarios: (1) Accelerated Ingest-Transform-Load, (2) Continual Metrics/KPI Extraction, (3) Responsive Data Analysis

Software/Technology
• (1) IT server and app log ingestion
• (2) IT operational metrics dashboards
• (3) Device/sensor operational intelligence

Digital Ad Tech/Marketing
• (1) Advertising data aggregation
• (2) Advertising metrics like coverage, yield, conversion
• (3) Analytics on user engagement with ads, optimized bid/buy engines

Financial Services
• (1) Market/financial transaction order data collection
• (2) Financial market data metrics
• (3) Fraud monitoring, Value-at-Risk assessment, and auditing of market order data

Consumer Online/E-Commerce
• (1) Online customer engagement data aggregation
• (2) Consumer engagement metrics like page views, CTR
• (3) Customer clickstream analytics, recommendation engines
What business problem needs to be solved?

Mobile/Social Gaming
• Deliver continuous, real-time game insight data from 100s of game servers
• Custom-built solutions are operationally complex to manage and not scalable
• Delay in critical business data delivery
• Developer burden in building a reliable, scalable platform for real-time data ingestion/processing
• Slow-down of real-time customer insights
• Accelerate time to market of elastic, real-time applications while minimizing operational overhead

Digital Advertising Tech.
• Generate real-time metrics and KPIs for online ad performance for advertisers/publishers
• Store-and-forward fleet of log servers, and a Hadoop-based processing pipeline
• Lost data with the store-and-forward layer
• Operational burden in managing a reliable, scalable platform for real-time data ingestion/processing
• Batch-driven "real-time" customer insights
• Generate the freshest analytics on advertiser performance to optimize marketing spend and increase responsiveness to clients
Amazon Kinesis Streams
Build your own data streaming applications
• Easy administration: Simply create a new stream, and set the desired level of capacity with shards. Scale to match your data throughput rate and volume.
• Build real-time applications: Perform continual processing on streaming big data using Kinesis Client Library (KCL), Apache Spark/Storm, AWS Lambda, and more.
• Low cost: Cost-efficient for workloads of any scale.
Kinesis Architecture
Run code in response to an event and automatically manage compute.
Amazon Kinesis – An Overview
Kinesis Stream: Managed ability to capture and store data
• Streams are made of Shards
• Each Shard ingests data up to 1 MB/sec, and up to 1000 TPS
• Each Shard emits up to 2 MB/sec
• All data is stored for 24 hours
• Scale Kinesis streams by adding or removing Shards
• Replay data within the 24-hour window
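Given the per-shard limits above, a first estimate of the shard count follows from whichever constraint is tightest. The following is a minimal sketch, assuming the limits quoted on this slide and assuming that every consuming application reads the full stream and shares the per-shard read limit; the function name and the example workload are hypothetical.

```python
import math

# Per-shard limits as quoted on the slide above.
INGEST_MB_PER_SEC = 1.0
INGEST_RECORDS_PER_SEC = 1000
EGRESS_MB_PER_SEC = 2.0

def estimate_shards(records_per_sec, avg_record_kb, consumers=1):
    """Return the smallest shard count that satisfies all three limits."""
    ingest_mb = records_per_sec * avg_record_kb / 1024.0
    # Assumption: each consuming application reads every record once.
    egress_mb = ingest_mb * consumers
    return max(
        1,
        math.ceil(ingest_mb / INGEST_MB_PER_SEC),
        math.ceil(records_per_sec / INGEST_RECORDS_PER_SEC),
        math.ceil(egress_mb / EGRESS_MB_PER_SEC),
    )

# Hypothetical workload: 5,000 records/sec of 2 KB each, read by 3 applications.
shards_needed = estimate_shards(5000, 2, consumers=3)
print(shards_needed)
```

In this example the read side is the bottleneck, which is why the consumer count is part of the estimate rather than only the ingest rate.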
Putting Data into Kinesis
Simple Put interface to store data in Kinesis
• Producers use a PUT call to store data in a Stream
• PutRecord {Data, PartitionKey, StreamName}
• A Partition Key is supplied by the producer and used to distribute the PUTs across Shards
• Kinesis MD5-hashes the supplied partition key over the hash key range of a Shard
• A unique Sequence # is returned to the Producer upon a successful PUT call
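The partition-key routing described above can be illustrated locally. This is a sketch only: it assumes the 128-bit MD5 hash space is split evenly across shards, whereas a real stream reports each shard's actual hash key range. The helper names and the sample key are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value

def shard_ranges(shard_count):
    """Split the hash key space into contiguous, evenly sized ranges (an assumption)."""
    step = HASH_SPACE // shard_count
    return [
        (i * step, HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1)
        for i in range(shard_count)
    ]

def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose hash key range contains the key's MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    for index, (low, high) in enumerate(ranges):
        if low <= hash_value <= high:
            return index
    raise ValueError("hash value outside all ranges")

ranges = shard_ranges(4)
# The same partition key always lands on the same shard.
print(shard_for_key("device-42", ranges))
```

Because routing depends only on the key's hash, a producer that sends everything under one partition key will concentrate all traffic on a single shard.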
Creating and Sizing a Kinesis Stream
Building Kinesis Processing Apps: Kinesis Client Library
Client library for fault-tolerant, at-least-once, continuous processing
o Java client library, source available on GitHub
o Build and deploy your app with the KCL on your EC2 instance(s)
o The KCL is the intermediary between your application and the stream
  • Automatically starts a Kinesis Worker for each shard
  • Simplifies reading by abstracting individual shards
  • Increases or decreases Workers as the number of shards changes
  • Checkpoints to keep track of a Worker’s location in the stream, and restarts Workers if they fail
o Integrates with Auto Scaling groups to redistribute workers to new instances
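Checkpointing is what makes the processing at-least-once rather than exactly-once: a restarted worker resumes after the last checkpoint, so records processed after that checkpoint are seen again. The sketch below simulates this in memory; it is not the KCL API, and every class, function, and parameter name in it is hypothetical.

```python
class CheckpointStore:
    """In-memory stand-in for a durable checkpoint store."""

    def __init__(self):
        self._positions = {}

    def get(self, shard_id):
        return self._positions.get(shard_id, -1)  # -1: nothing processed yet

    def put(self, shard_id, sequence_number):
        self._positions[shard_id] = sequence_number

def run_worker(shard_id, records, store, processed,
               checkpoint_every=3, crash_after=None):
    """Process records after the last checkpoint; optionally simulate a crash."""
    start = store.get(shard_id) + 1
    for seq in range(start, len(records)):
        if crash_after is not None and seq - start >= crash_after:
            return False  # the worker dies before reaching the next checkpoint
        processed.append(records[seq])
        if (seq + 1) % checkpoint_every == 0:
            store.put(shard_id, seq)
    store.put(shard_id, len(records) - 1)
    return True

records = ["r0", "r1", "r2", "r3", "r4", "r5", "r6"]
store, processed = CheckpointStore(), []
run_worker("shard-0", records, store, processed, crash_after=5)  # crashes after r4
run_worker("shard-0", records, store, processed)                 # restart from checkpoint
print(processed)
```

The restarted worker reprocesses `r3` and `r4`, so downstream logic needs to tolerate duplicates.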
Amazon Kinesis Connector Library
Customizable, open-source code to connect Kinesis with S3, Redshift, and DynamoDB
[Diagram: Kinesis → ITransformer → IFilter → IBuffer → IEmitter → S3, DynamoDB, Redshift]
ITransformer
• Defines the transformation of records from the Amazon Kinesis stream to suit the user-defined data model
IFilter
• Excludes irrelevant records from processing
IBuffer
• Buffers the set of records to be processed by specifying a size limit (number of records) and total byte count
IEmitter
• Makes client calls to other AWS services and persists the records stored in the buffer
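The four roles above form a small pipeline: transform, filter, buffer, emit. The connector library itself is Java; the following Python sketch only mirrors the roles, with hypothetical names, a hypothetical record format, and a list standing in for the destination service.

```python
import json

def transform(raw):
    """ITransformer role: turn a raw stream record into the user's data model."""
    return json.loads(raw)

def keep(record):
    """IFilter role: drop records that are irrelevant to this pipeline."""
    return record.get("type") == "click"

class Buffer:
    """IBuffer role: hold records until a record-count or byte-size limit is hit."""

    def __init__(self, max_records, max_bytes):
        self.max_records, self.max_bytes = max_records, max_bytes
        self.records, self.byte_count = [], 0

    def add(self, record, size):
        self.records.append(record)
        self.byte_count += size

    def should_flush(self):
        return (len(self.records) >= self.max_records
                or self.byte_count >= self.max_bytes)

    def drain(self):
        drained, self.records, self.byte_count = self.records, [], 0
        return drained

def run_pipeline(raw_records, buffer, emit):
    """Wire the roles together; `emit` plays the IEmitter role."""
    for raw in raw_records:
        record = transform(raw)
        if not keep(record):
            continue
        buffer.add(record, len(raw))
        if buffer.should_flush():
            emit(buffer.drain())
    if buffer.records:  # flush whatever is left at the end
        emit(buffer.drain())

batches = []  # stand-in for S3, DynamoDB, or Redshift
raw = [json.dumps({"type": t, "id": i})
       for i, t in enumerate(["click", "view", "click", "click"])]
run_pipeline(raw, Buffer(max_records=2, max_bytes=1024), batches.append)
print(batches)
```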
Use Cases
Ultra-low-latency analytics (seconds)
Complex computations
• Complex algorithm execution
• Tuple processing: every piece of data is processed independently, as opposed to aggregation, which runs from the first row to the last.
• Moving window analysis: for example, tracking a moving car from the 2nd to the 3rd minute and then from the 5th to the 6th minute.
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3 and Amazon Redshift
• Zero administration: Capture and deliver streaming data into S3, Redshift, and other destinations without writing an application or managing infrastructure.
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data destinations in as little as 60 secs using simple configurations.
• Seamless elasticity: Scales to match data throughput without intervention.
Capture and submit streaming data to Firehose
Firehose loads streaming data continuously into S3 and Redshift
Analyze streaming data using your favorite BI tools
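The batching behaviour behind "delivery in as little as 60 secs" can be pictured as a buffer that is flushed when either a time interval or a size threshold is reached. This is an illustrative sketch, not Firehose code: the class, its parameters, and the tiny thresholds are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class DeliveryBuffer:
    """Accumulate records and deliver them as one batch on a size or time trigger."""

    def __init__(self, max_bytes, max_seconds, deliver):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.deliver = deliver
        self.pending, self.pending_bytes, self.opened_at = [], 0, None

    def put(self, record, now):
        if self.opened_at is None:
            self.opened_at = now  # the interval starts with the first record
        self.pending.append(record)
        self.pending_bytes += len(record)
        self._maybe_flush(now)

    def tick(self, now):
        """Called periodically so that a quiet stream still gets delivered."""
        self._maybe_flush(now)

    def _maybe_flush(self, now):
        if not self.pending:
            return
        too_big = self.pending_bytes >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            self.deliver(b"".join(self.pending))
            self.pending, self.pending_bytes, self.opened_at = [], 0, None

delivered = []  # stand-in for objects written to the S3 bucket
buf = DeliveryBuffer(max_bytes=10, max_seconds=60, deliver=delivered.append)
buf.put(b"aaaa", now=0)
buf.put(b"bbbbbb", now=5)   # size threshold reached: first batch delivered
buf.put(b"cc", now=10)
buf.tick(now=69)            # only 59 seconds old: nothing happens
buf.tick(now=70)            # interval reached: second batch delivered
print(delivered)
```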
Amazon Kinesis Firehose to Redshift
A two-step process
1. Use a customer-provided S3 bucket as an intermediate destination
• Still the most efficient way to do large-scale loads to Redshift.
• Never lose data: it is always safe and available in your S3 bucket.
2. Firehose issues the customer-provided COPY command synchronously. It issues the next COPY command once the previous COPY command has finished and been acknowledged by Redshift.
Use Cases
• Use Kinesis Firehose when you need to run batches more frequently, as long as the analysis can be done with SQL.
• Micro-batching scenarios where latencies of more than 60 seconds are tolerable
• With Redshift as the target: analytics that can be achieved with standard SQL and user-defined functions (UDFs)
• Most “Real-Time Business Insights” scenarios can be easily supported with Kinesis Firehose + Redshift!
Amazon Kinesis Analytics
Analyze data streams continuously with standard SQL
• Apply SQL on streams: Easily connect to data streams and apply existing SQL skills.
• Build real-time applications: Perform continual processing on streaming big data with sub-second processing latencies
• Scale elastically: Elastically scales to match data throughput without any operator intervention.
Announcement Only!
Connect to Kinesis streams and Firehose delivery streams
Run standard SQL queries against data streams
Kinesis Analytics can send processed data to analytics tools so you can create alerts and respond in real time
Use Cases
Low-latency time-series analytics
Analytics that can be achieved within the confines of the supported SQL
• Running totals
• Moving averages
• Number of people entering a stadium
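To make the first two use cases concrete, here is a small sketch of a running total and a moving average computed incrementally over a stream. It is plain Python rather than streaming SQL, and the per-minute stadium entry counts are invented for illustration.

```python
from collections import deque

def running_totals(values):
    """Yield the cumulative sum after each incoming value."""
    total = 0
    for value in values:
        total += value
        yield total

def moving_averages(values, window):
    """Yield the average of the most recent `window` values after each arrival."""
    recent = deque(maxlen=window)  # old values fall out automatically
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical counts of people entering a stadium, one value per minute.
entries_per_minute = [120, 80, 100, 60]
totals = list(running_totals(entries_per_minute))
averages = list(moving_averages(entries_per_minute, window=2))
print(totals)
print(averages)
```

Both functions are generators, so each result is available as soon as its input value arrives instead of after the whole data set has been read.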
Amazon DynamoDB Streams: time-ordered sequence of item-level changes
• Time- and partition-ordered log
• Provides a stream of inserts, deletes, and updates
  • Old item
  • New item
  • Primary key
  • Change type
• Stream items delivered exactly once
• Streams are asynchronous
• Scales with your table
[Diagram: DynamoDB → DynamoDB Streams]
Use Cases
Ultra-low-latency analytics (seconds) when data is available in Kinesis or a DynamoDB Stream, e.g.
• Energy meter data coming into Kinesis, used to continuously update billing info.
• Changes to a social network profile stored in DynamoDB, used to transmit updates to connections immediately (e.g. a user adds a new job to their profile).
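The profile-update scenario above maps naturally onto a function that receives batches of stream records. The sketch below follows the general shape of a DynamoDB Streams event as delivered to an AWS Lambda function (a `Records` list with `eventName` and a `dynamodb` section holding `Keys`, `OldImage`, and `NewImage`); the table attributes `user_id` and `job`, the message format, and the sample event are assumptions made for illustration.

```python
def handler(event, context=None):
    """Build one notification per relevant item-level change in the batch."""
    notifications = []
    for record in event["Records"]:
        change_type = record["eventName"]  # INSERT, MODIFY, or REMOVE
        data = record["dynamodb"]
        user_id = data["Keys"]["user_id"]["S"]  # hypothetical key attribute
        if change_type == "MODIFY":
            old_job = data.get("OldImage", {}).get("job", {}).get("S")
            new_job = data.get("NewImage", {}).get("job", {}).get("S")
            if new_job != old_job:
                notifications.append(
                    f"{user_id} changed job: {old_job} -> {new_job}")
        elif change_type == "INSERT":
            notifications.append(f"{user_id} joined")
    return notifications

# Hypothetical event containing a single profile update.
sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"user_id": {"S": "u1"}},
                "OldImage": {"user_id": {"S": "u1"}, "job": {"S": "Engineer"}},
                "NewImage": {"user_id": {"S": "u1"}, "job": {"S": "Architect"}},
            },
        }
    ]
}
result = handler(sample_event)
print(result)
```

Whether old and new images are present depends on how the stream is configured for the table, which is why the handler reads them defensively with `get`.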
How Elasticsearch can help
• Combined with Logstash and Kibana, the ELK stack provides a tool for real-time analytics and data visualization
Plug-ins
A. Kibana 3
B. Kibana 4
C. Jetty
D. cloud-aws
E. Kuromoji
F. icu
ElasticSearch API
• Query
• Aggregation
Aggregation and Filtering
[Diagram, built up over several slides: Documents → Query → Buckets → Metrics (example values: 123, 420, 510)]
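The documents → query → buckets → metrics flow shown above can be mimicked on an in-memory list. This sketch does not call Elasticsearch; it only illustrates the three stages, and the sample documents, field names, and helper function are hypothetical.

```python
from collections import defaultdict

# Hypothetical log documents.
documents = [
    {"status": 200, "host": "web-1", "bytes": 100},
    {"status": 500, "host": "web-1", "bytes": 23},
    {"status": 200, "host": "web-2", "bytes": 420},
    {"status": 200, "host": "web-1", "bytes": 410},
]

def aggregate(docs, query, bucket_field, metric_field):
    """Filter with `query`, group into buckets, then compute a sum per bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        if query(doc):                              # query: which documents count
            buckets[doc[bucket_field]].append(doc)  # buckets: group by a field
    # metrics: one number per bucket
    return {key: sum(d[metric_field] for d in group)
            for key, group in buckets.items()}

result = aggregate(documents, lambda d: d["status"] == 200, "host", "bytes")
print(result)
```

Here the query keeps successful requests, the buckets are hosts, and the metric is the total bytes served per host.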
Use Cases
• Real-time dashboards (Kibana)
• Alerting (Percolator API)
• Real-time text analytics, as in social media listening
• Real-time geospatial queries and geospatial analysis
AWS IoT
“Securely connect one or one-billion devices to AWS, so they can interact with applications and other devices”
AWS IoT
• Device SDK: set of client libraries to connect, authenticate, and exchange messages
• Device Gateway: communicate with devices via MQTT and HTTP
• Authentication and Authorization: secure with mutual authentication and encryption
• Rules Engine: transform messages based on rules and route them to AWS services
• Device Shadow: persistent thing state during intermittent connections
• Device Registry: identity and management of your things
• Applications: interact through the AWS IoT API
• Destinations: AWS services and third-party (3P) services
Use Cases
• Processing sensor data (millions of data points from hundreds of thousands of sensors) in real time for alerting
• Redirecting sensor data to Kinesis and DynamoDB for multi-data-point analysis
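Both use cases come down to matching an incoming message against rules and triggering an action. In AWS IoT the rules are written as SQL statements; the sketch below replaces them with Python predicates to show the routing idea only. The `Rule` class, the topic names, the threshold, and the list-based actions are all hypothetical, and topic matching follows MQTT wildcard conventions (`+` for one level, `#` for all remaining levels).

```python
def topic_matches(topic_filter, topic):
    """MQTT-style match: '+' matches one level, '#' matches all remaining levels."""
    filter_parts = topic_filter.split("/")
    topic_parts = topic.split("/")
    for i, part in enumerate(filter_parts):
        if part == "#":
            return True
        if i >= len(topic_parts):
            return False
        if part != "+" and part != topic_parts[i]:
            return False
    return len(filter_parts) == len(topic_parts)

class Rule:
    """A topic filter, a condition on the message, and an action to run."""

    def __init__(self, topic_filter, condition, action):
        self.topic_filter = topic_filter
        self.condition = condition
        self.action = action

def route(rules, topic, message):
    """Run the action of every rule whose filter and condition both match."""
    for rule in rules:
        if topic_matches(rule.topic_filter, topic) and rule.condition(message):
            rule.action(message)

alerts, to_stream = [], []  # stand-ins for an alerting target and a Kinesis stream
rules = [
    Rule("sensors/+/temperature", lambda m: m["value"] > 50, alerts.append),
    Rule("sensors/#", lambda m: True, to_stream.append),
]
route(rules, "sensors/boiler-1/temperature", {"value": 72})
route(rules, "sensors/boiler-2/temperature", {"value": 20})
route(rules, "sensors/boiler-1/pressure", {"value": 99})
print(alerts, len(to_stream))
```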
[Figure: tools arranged by difficulty of working with them: Spark/Storm; Lambda (arbitrary code: Node, Python, Java); Redshift (structured, SQL); ElasticSearch (unstructured, JSON); Hive SQL; QuickSight (GUI); Kinesis Analytics (limited SQL); IoT Rule Engine (SQL)]
[Figure: capabilities vs. latency (sub-second, few seconds, 2-5 minutes) for: Spark/Storm + Kinesis; Storm + Kafka; ElasticSearch + Logstash; Lambda + Kinesis; Kinesis Analytics; IoT Rule Engine; Redshift + DMS; Redshift + Firehose; MR/Hive/Impala/Presto + Firehose; QuickSight]
Demo Time.
Website - https://secure.amitksh.net/cdn/webinarWeek.html
Real time updates from Kinesis - https://secure.amitksh.net/rtChart.html
Interesting Possibilities!
Quick Sight
Online Labs & Training
Gain confidence and hands-on experience with AWS. Watch free instructional videos and explore self-paced labs.
Instructor-Led Classes
Learn how to design, deploy, and operate highly available, cost-effective, and secure applications on AWS in courses led by qualified AWS instructors.
AWS Certification
Validate your technical expertise with AWS and use practice exams to help you prepare for AWS Certification.
More info at http://aws.amazon.com/training