33
© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential © 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential April 2021 Real Time Data Ingestion for Your Analytics Ecosystem

Real Time Data Ingestion for Your Analytics Ecosystem

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

April 2021

Real Time Data Ingestion for

Your Analytics Ecosystem

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Agenda

What is data streaming and real-time analytics?

Why should you use real-time analytics?

Common use-cases

Enabling streaming and real-time analytics

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

• The number of “smart” devices is

projected to be 200 billion by

2021 (over 100X increase in ten

years)

• 90% of the data in the world was

generated in the last 2 years

• There are 2.5 quintillion bytes of

data created each day, and this

pace is accelerating

The of data being produced is increasing

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Data is produced continuously and at high

Source:

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Data is produced from a large of sources

Smart buildings

Web clickstream Application logs

IoT sensorsMetering records

Mobile apps

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

The of data diminishes over time

Source: Perishable insights,

Mike Gualtieri, Forrester

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Failing to act in real-time can translate to real losses

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

To create companies must derive insights

from a of data sources that are

producing data at high and

real-time

Data

warehouse

Data

lake

Data

streams

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Data streaming and real-time analytics

Ingest, process, and analyze high volumes of high velocity data

from a variety of sources in real time

Devices and applications that

produce real-time

data at high velocity.

Data from tens of thousands of data sources can be collected and ingested

in real time.

Data is stored in the order it was received for a set

duration of time, and can be replayed

indefinitely during this time.

Records are read inthe order they are

produced enabling real-time analytics or streaming

ETL.

Data lakeData Warehouse(most common)

Database(least common)

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Common real-time analytics use cases

Anomaly and fraud detection

Empowering IoT Analytics

Nourishing Marketing campaigns

Real-time personalization

Tailoring customer experience in real-time

Supporting healthcare and emergency services

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Achieving real-time analytics use cases

Messaging between

micro-services

Response analytics

(Web and mobile

app notifications)

Log ingestion

IoT device

maintenance

Change Data

Capture (CDC)

Streaming ETL

into data lakes and

data warehouses

milliseconds seconds minutes

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Challenges of data streaming

Difficult to setup Tricky to scale

Hard to achieve high

availability

Integration required

development

Error prone and complex to

manage

Expensive to

maintain

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Streaming real-time data with AWS

Easy to use Elastic Highly available &

durable

Seamless AWS integrations

Fully managed

Pay for what you use

Get started in less than 10 minutes using our solution implementation templates –https://aws.amazon.com/solutions/implementations/aws-streaming-data-solution-for-amazon-kinesis/

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Data sources

Mobile Apps Web Clickstream/ Social

Application Logs IoT Sensors

Connected Products Smart Buildings

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Stream ingestion

AWS IoT

Amazon CloudWatch

Logs

Amazon CloudWatch

Events

AWS SDKLOG4J

Flume

FluentdAWS Mobile

SDK

Kinesis

Producer

Library

Kinesis Agent

* AWS DMS includes 8 on-premise databases, 1 Azure

database, 5 RDS/Aurora database types, and Amazon S3

AWS Toolkits/Libraries AWS Service Integrations 3rd Party Offerings

Amazon Database

Migration Service*

Data from tens of thousands of data sources can be collected and ingested in real time

Amazon Dynamo DB

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Stream storage

Data is stored in the order it was received for a set duration

of time, and can be replayed indefinitely during this time.

Collect and store data streams

for analytics

Amazon Managed

Streaming for

Apache Kafka

Amazon Kinesis

Data Streams

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Amazon Kinesis Data Streams

Amazon Kinesis

Data Analytics

Amazon Kinesis

Data Firehose

Spark on EMR

Amazon EC2

AWS LambdaINPUT

Capture and send data to

Amazon Kinesis Data Streams

Shard 1

Shard 2

Shard 3

Shard 4

Shard n

Kinesis Data Streams

Ingests and stores data streams

for processing

OUTPUT

Analyze Streaming data using

your favorite BI tools

• Easy administration and low cost

• Real-time, elastic performance

• Secure, durable storage

• Available to multiple real-time analytics applications

• Average latency of 200ms with one standard consumer

• Enhanced Fan Out offers typical average latency of 70 ms

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Amazon Managed Streaming for Apache Kafka (MSK)

Availability Zone 1 Availability Zone 2 Availability Zone 3

Amazon MSK VPC

Your VPC

Producer Consumer Cluster Operation

• Scalability

• Security

• High Availability

• Deep AWS integrations

• Monitoring

• Rolling Version Upgrades

• Fully compatible with Apache Kafka

• HIPAA, SOC, PCI compliant

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

AWS Glue Schema Registry

Centrally discover, control, and evolve your data schemas

Improve data quality for your data streaming applications

Enforce schemas and schema evolution to prevent downstream application failures

Easily integrates with Amazon Managed Streaming for Apache Kafka (MSK), Amazon Kinesis Data Streams, and Amazon Kinesis Data Analytics for Apache Flink for convenient setup

Use provided open source libraries to compress data and save on storage and data transfer costs

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Amazon Kinesis Data Firehose

Amazon S3

Amazon Redshift

Amazon

Elasticsearch

Splunk

HTTP EndpointsINPUT

Capture and send data to

Amazon Kinesis Data Firehose

Kinesis Data Firehose

Prepares and loads the data

continuously to the destinations

you chooseOUTPUT

Analyze Streaming data using

your favorite BI tools

• Zero administration and seamless elasticity

• Direct-to-data store integration

• Serverless continuous data transformations

• Near real time

• Data format conversion to Parquet/ ORC

• Deliver data directly to Datadog, Sumo Logic, New Relic and MongoDB

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Stream processing

Amazon EMR

AWS Lambda

Kinesis

Kinesis Client Library

+

Connector Library

AWS Services

Apache Spark

3rd party

SQL/

Java

Amazon

Kinesis Data

Analytics

Records are read in the order they are produced,

enabling real-time analytics or streaming ETL

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Amazon Kinesis Data Analytics

OUTPUT

Send processed data to analytics tools so you

can create alerts and respond in real-time

KINESIS DATA ANALYTICS

SQL

Query and analyze streaming

data using SQL

Stateful stream processing

using Apache Flink

Amazon Kinesis

Data Firehose

Amazon Kinesis

Data Streams

Amazon MSK

Additional

streaming sources

INPUT

Captured streaming data

• Interact with streaming data in real time using SQL or integrated Apache Flink applications

• Build fully managed and elastic stream processing applications

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Customers using AWS Streaming solutions

1 billion events per

week from connected

devices

Near-real-time home

valuation

(Zestimates)

Live clickstream

dashboards refreshed

under 10s

100 GB/day

clickstreams from

250+ sites

50 billion daily ad

impressions, sub-

50 ms responses

Online stylist

processing 10 million

events/day

Migrated data bus from

Self-Managed Apache

Kafka to Kinesis

Facilitate

communications

between 100+

microservices

Analyze billions of

network flows in

real time

IoT predictive

analytics

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

• “Zestimate” home valuation for more than 100 million homes in US

• Uses wide variety of data as inputs to the algorithm

• Legacy home-grown solution no longer meeting the scaling needs

• Needed more real-time estimates for a better customer experience

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

What they did

Amazon EMR

Amazon Kinesis

Data Streams

Amazon Kinesis

Data Firehose

Amazon Simple

Storage Service

(S3)

ClientClient Client

Input Data

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Benefits

• Compute Zestimates faster and more frequently

• Up-to-date and accurate information as it is based on latest data

• No longer concerned with managing and scaling servers

• Easy scalability to support growth

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

• Monitoring and optimizing network is critical

• Needed a solution to ingest, augment and analyze

network traffic

• Respond to downtime in real-time

• Identify performance improvement opportunities

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

What they did

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Benefits

• Process and enrich multiple terabytes each day, representing billions of events, with sub-second response times for analytics queries

• Highly cost efficient compared to competing solutions

• Freedom to experiment with system architecture

• Data ingestion initiated with just a few simple API calls

• Highly elastic solution with close to 1,000 Amazon Kinesis shards working in parallel

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

• SaaS startup in Germany serving over 100m monthly users

• Speed Kit – main product that accelerates e-commerce websites

• Legacy solution used batch processing, which could take up to 24 hours

• Delayed insights until the next day

• Transformed batch based analytics process to a continuous event-

processing pipeline using Kinesis Data Analytics

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

What they did

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Benefits

• End-to-end processing time reduced from 24 hours to sub-minute latency

• Fully functional prototype within 4 weeks and in production in 8 weeks

• 500 million page loads from over 100 million unique users every month

• Continuously updating dashboards for all our customers

© 2021, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential

Getting Started

https://aws.amazon.com/solutions/implementations/aws-streaming-

data-solution-for-amazon-kinesis/

Spin-up a Kinesis application within minutes

Contact us at [email protected], [email protected]

Need a deep dive?

https://www.workshops.aws/categories/Analytics

https://amazonmsk-labs.workshop.aws/en/

Kinesis and MSK Hands-on labs

https://aws.amazon.com/solutions/implementations/aws-streaming-

data-solution-for-amazon-msk/

Get started in minutes on MSK with our ready to use templates