Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
April 2021
Real Time Data Ingestion for
Your Analytics Ecosystem
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Agenda
What is data streaming and real-time analytics?
Why should you use real-time analytics?
Common use-cases
Enabling streaming and real-time analytics
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
• The number of “smart” devices is
projected to be 200 billion by
2021 (over 100X increase in ten
years)
• 90% of the data in the world was
generated in the last 2 years
• There are 2.5 quintillion bytes of
data created each day, and this
pace is accelerating
The of data being produced is increasing
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Data is produced continuously and at high
Source:
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Data is produced from a large of sources
Smart buildings
Web clickstream Application logs
IoT sensorsMetering records
Mobile apps
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
The of data diminishes over time
Source: Perishable insights,
Mike Gualtieri, Forrester
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Failing to act in real-time can translate to real losses
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
To create companies must derive insights
from a of data sources that are
producing data at high and
real-time
Data
warehouse
Data
lake
Data
streams
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Data streaming and real-time analytics
Ingest, process, and analyze high volumes of high velocity data
from a variety of sources in real time
Devices and applications that
produce real-time
data at high velocity.
Data from tens of thousands of data sources can be collected and ingested
in real time.
Data is stored in the order it was received for a set
duration of time, and can be replayed
indefinitely during this time.
Records are read inthe order they are
produced enabling real-time analytics or streaming
ETL.
Data lakeData Warehouse(most common)
Database(least common)
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Common real-time analytics use cases
Anomaly and fraud detection
Empowering IoT Analytics
Nourishing Marketing campaigns
Real-time personalization
Tailoring customer experience in real-time
Supporting healthcare and emergency services
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Achieving real-time analytics use cases
Messaging between
micro-services
Response analytics
(Web and mobile
app notifications)
Log ingestion
IoT device
maintenance
Change Data
Capture (CDC)
Streaming ETL
into data lakes and
data warehouses
milliseconds seconds minutes
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Challenges of data streaming
Difficult to setup Tricky to scale
Hard to achieve high
availability
Integration required
development
Error prone and complex to
manage
Expensive to
maintain
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Streaming real-time data with AWS
Easy to use Elastic Highly available &
durable
Seamless AWS integrations
Fully managed
Pay for what you use
Get started in less than 10 minutes using our solution implementation templates –https://aws.amazon.com/solutions/implementations/aws-streaming-data-solution-for-amazon-kinesis/
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Data sources
Mobile Apps Web Clickstream/ Social
Application Logs IoT Sensors
Connected Products Smart Buildings
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Stream ingestion
AWS IoT
Amazon CloudWatch
Logs
Amazon CloudWatch
Events
AWS SDKLOG4J
Flume
FluentdAWS Mobile
SDK
Kinesis
Producer
Library
Kinesis Agent
* AWS DMS includes 8 on-premise databases, 1 Azure
database, 5 RDS/Aurora database types, and Amazon S3
AWS Toolkits/Libraries AWS Service Integrations 3rd Party Offerings
Amazon Database
Migration Service*
Data from tens of thousands of data sources can be collected and ingested in real time
Amazon Dynamo DB
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Stream storage
Data is stored in the order it was received for a set duration
of time, and can be replayed indefinitely during this time.
Collect and store data streams
for analytics
Amazon Managed
Streaming for
Apache Kafka
Amazon Kinesis
Data Streams
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Amazon Kinesis Data Streams
Amazon Kinesis
Data Analytics
Amazon Kinesis
Data Firehose
Spark on EMR
Amazon EC2
AWS LambdaINPUT
Capture and send data to
Amazon Kinesis Data Streams
Shard 1
Shard 2
Shard 3
Shard 4
Shard n
Kinesis Data Streams
Ingests and stores data streams
for processing
OUTPUT
Analyze Streaming data using
your favorite BI tools
• Easy administration and low cost
• Real-time, elastic performance
• Secure, durable storage
• Available to multiple real-time analytics applications
• Average latency of 200ms with one standard consumer
• Enhanced Fan Out offers typical average latency of 70 ms
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Amazon Managed Streaming for Apache Kafka (MSK)
Availability Zone 1 Availability Zone 2 Availability Zone 3
Amazon MSK VPC
Your VPC
Producer Consumer Cluster Operation
• Scalability
• Security
• High Availability
• Deep AWS integrations
• Monitoring
• Rolling Version Upgrades
• Fully compatible with Apache Kafka
• HIPAA, SOC, PCI compliant
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
AWS Glue Schema Registry
Centrally discover, control, and evolve your data schemas
Improve data quality for your data streaming applications
Enforce schemas and schema evolution to prevent downstream application failures
Easily integrates with Amazon Managed Streaming for Apache Kafka (MSK), Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics for Apache Flink for convenient setup
Use provided open source libraries to compress data and save on storage and data transfer costs
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Amazon Kinesis Data Firehose
Amazon S3
Amazon Redshift
Amazon
Elasticsearch
Splunk
HTTP EndpointsINPUT
Capture and send data to
Amazon Kinesis Data Firehose
Kinesis Data Firehose
Prepares and loads the data
continuously to the destinations
you chooseOUTPUT
Analyze Streaming data using
your favorite BI tools
• Zero administration and seamless elasticity
• Direct-to-data store integration
• Serverless continuous data transformations
• Near real time
• Data format conversion to Parquet/ ORC
• Deliver data directly to Datadog, Sumo Logic, New Relic and MongoDB
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Stream processing
Amazon EMR
AWS Lambda
Kinesis
Kinesis Client Library
+
Connector Library
AWS Services
Apache Spark
3rd party
SQL/
Java
Amazon
Kinesis Data
Analytics
Records are read in the order they are produced,
enabling real-time analytics or streaming ETL
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Amazon Kinesis Data Analytics
OUTPUT
Send processed data to analytics tools so you
can create alerts and respond in real-time
KINESIS DATA ANALYTICS
SQL
Query and analyze streaming
data using SQL
Stateful stream processing
using Apache Flink
Amazon Kinesis
Data Firehose
Amazon Kinesis
Data Streams
Amazon MSK
Additional
streaming sources
INPUT
Captured streaming data
• Interact with streaming data in real time using SQL or integrated Apache Flink applications
• Build fully managed and elastic stream processing applications
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Customers using AWS Streaming solutions
1 billion events per
week from connected
devices
Near-real-time home
valuation
(Zestimates)
Live clickstream
dashboards refreshed
under 10s
100 GB/day
clickstreams from
250+ sites
50 billion daily ad
impressions, sub-
50 ms responses
Online stylist
processing 10 million
events/day
Migrated data bus from
Self-Managed Apache
Kafka to Kinesis
Facilitate
communications
between 100+
microservices
Analyze billions of
network flows in
real time
IoT predictive
analytics
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
• “Zestimate” home valuation for more than 100 million homes in US
• Uses wide variety of data as inputs to the algorithm
• Legacy home-grown solution no longer meeting the scaling needs
• Needed more real-time estimates for a better customer experience
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
What they did
Amazon EMR
Amazon Kinesis
Data Streams
Amazon Kinesis
Data Firehose
Amazon Simple
Storage Service
(S3)
ClientClient Client
Input Data
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Benefits
• Compute Zestimates faster and more frequently
• Up-to-date and accurate information as it is based on latest data
• No longer concerned with managing and scaling servers
• Easy scalability to support growth
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
• Monitoring and optimizing network is critical
• Needed a solution to ingest, augment and analyze
network traffic
• Respond to downtime in real-time
• Identify performance improvement opportunities
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
What they did
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Benefits
• Process and enrich multiple terabytes each day, representing billions of events, with sub-second response times for analytics queries
• Highly cost efficient compared to competing solutions
• Freedom to experiment with system architecture
• Data ingestion initiated with just a few simple API calls
• Highly elastic solution with close to 1,000 Amazon Kinesis shards working in parallel
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
• SaaS startup in Germany serving over 100m monthly users
• Speed Kit – main product that accelerates e-commerce websites
• Legacy solution used batch processing, which could take up to 24 hours
• Delayed insights until the next day
• Transformed batch based analytics process to a continuous event-
processing pipeline using Kinesis Data Analytics
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
What they did
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Benefits
• End-to-end processing time reduced from 24 hours to sub-minute latency
• Fully functional prototype within 4 weeks and in production in 8 weeks
• 500 million page loads from over 100 million unique users every month
• Continuously updating dashboards for all our customers
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential
Getting Started
https://aws.amazon.com/solutions/implementations/aws-streaming-
data-solution-for-amazon-kinesis/
Spin-up a Kinesis application within minutes
Contact us at [email protected], [email protected]
Need a deep dive?
https://www.workshops.aws/categories/Analytics
https://amazonmsk-labs.workshop.aws/en/
Kinesis and MSK Hands-on labs
https://aws.amazon.com/solutions/implementations/aws-streaming-
data-solution-for-amazon-msk/
Get started in minutes on MSK with our ready to use templates