© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.SUMMIT
Modern Data Platforms: Re-thinking Data
Mariano Kamp, Principal Solutions Architect, Amazon Web Services
Why? 3 reasons why data is important to you.
What?
Re-thinking data? Broadening the horizon.
A differentiated view of needs leads to a differentiated
view on solution building blocks.
A conceptual view and exemplary implementation
scenarios.
How?
A bottom-up whirlwind tour through capabilities and
a top-down approach to deliver a solution seed to
engage with your business.
There is more data than people think
• Data grows >10x every 5 years
• Data platforms need to live for 15 years
• Data platforms need to scale 1,000x
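The three figures on this slide fit together as compound growth: 10x every 5 years, sustained over a 15-year platform lifetime, gives 1,000x. The following is a minimal sketch of that arithmetic; the function name and parameters are illustrative and not part of the original talk.

```python
def growth_factor(per_period: float, period_years: float, horizon_years: float) -> float:
    """Compound growth over horizon_years, given per_period growth every period_years."""
    return per_period ** (horizon_years / period_years)


# Figures taken from the slide: >10x growth every 5 years, 15-year platform lifetime.
factor = growth_factor(per_period=10, period_years=5, horizon_years=15)
print(f"Scale needed over the platform lifetime: {factor:,.0f}x")

# The same growth expressed per year (derived here, not stated on the slide).
annual = growth_factor(per_period=10, period_years=5, horizon_years=1)
print(f"Equivalent yearly growth: about {round((annual - 1) * 100)}%")
```

Read this way, the 1,000x scaling requirement is a consequence of the growth rate and the expected lifetime, not a separate assumption.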
There are more people accessing data
• Data Scientists
• Analysts
• Business Users
• Applications
And more requirements for making data available
• Secure
• Real time
• Flexible
• Scalable
Data is used on all levels
Within broad and narrow contexts
“Organizations that successfully
generate business value from their data
will outperform their peers. An
Aberdeen survey saw organizations
who implemented a data lake
outperforming similar companies by
9% in organic revenue growth.*”
Organic revenue growth: Leaders 24%, Followers 15%
*Aberdeen: Angling for Insight in Today’s Data Lake, Michael Lock, SVP Analytics and Business Intelligence
Driving value from data
Data Sources: Sensor & Log Data, Analytical Data, External Data, Systems of Record, Systems of Engagement, …
Move Data: Amazon Kinesis, AWS Glue, AWS Database Migration Service, …
Insights Consumption: Amazon QuickSight, Amazon Athena, AWS Lambda, Amazon SageMaker, Amazon API Gateway
Data Lake
A data lake is a centralized repository that allows you to store
all your structured and unstructured data at any scale
Data layers: Raw Data, Prepared Data, Consumable Data
Raw Data - “As-Is”
• All source system data for all time
• Data treated as containers
• Cheap
Prepared Data - “Integrated”
• Trustworthy
• Easy to use, frictionless
• Business atomic
Consumable Data - “As-Needed”
• Laser focused, application specific
• Right format, granularity
• Fits consumer’s technology
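One possible way to make the three layers concrete is to give each its own key prefix in object storage. The sketch below is an illustration only: the prefix names, the `dt=` partition convention, and the example dataset and file names are assumptions, not something the slides prescribe.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Layer:
    name: str     # layer name as used on the slide
    prefix: str   # hypothetical key prefix for this layer
    purpose: str  # short summary of the slide's description


LAYERS = {
    "raw": Layer("Raw Data", "raw", "as-is copy of all source system data"),
    "prepared": Layer("Prepared Data", "prepared", "integrated, trustworthy, business atomic"),
    "consumable": Layer("Consumable Data", "consumable", "as-needed, application specific"),
}


def object_key(layer: str, dataset: str, day: date, filename: str) -> str:
    """Build an object key for a dataset in the given layer (assumed naming scheme)."""
    prefix = LAYERS[layer].prefix
    return f"{prefix}/{dataset}/dt={day.isoformat()}/{filename}"


# Hypothetical dataset and file names, used purely for illustration.
print(object_key("raw", "orders", date(2019, 2, 26), "part-0000.json"))
print(object_key("prepared", "orders", date(2019, 2, 26), "part-0000.parquet"))
```

Keeping the layer in the key makes it straightforward to apply different retention, access, and cost settings per layer, which matches the "cheap" raw layer versus the "laser focused" consumable layer described above.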
Business Users
• Dashboarding
• Standardized Reports
• Aggregate Level Access
• Use KPIs to Steer Business Processes
• Slice & Dice Analysis
Data Experts
• Ad-hoc Reports
• Data Modeling
• Development of New KPIs
• Business Analysis
• Detail Level Data Access
• Data Visualization, Sandbox
Downstream Systems
• Data Feeds
• Data Platform acts as Information Hub
• Centralizing Enterprise Data Flow for dependent Systems
• Scheduled or On-Demand
Data Scientists
• Data Exploration
• Business Analysis
• Model Development
• Unstructured Data
• Approximate Data Integration
• Data Products
• Sandbox
Insights Applications
• Serving Actionable Insights at the Point of Impact
• Providing the Smartness to Business Processes
• Batch and Near Real Time
Data and Insights Applications
• Insights Applications: Actionable Insights at the Point of Impact
• Downstream Systems: Data Feeds, Information Hub
• Business Users: Dashboarding, Use KPIs, Slice & Dice
• Data Experts: Ad-hoc Reports, Create KPIs
• Data Scientists: Exploration, Integration, Predictive Models
Data Sources: Sensor & Log Data, Analytical Data, External Data, Systems of Record, Systems of Engagement, …
Move Data: Amazon Kinesis, AWS Glue, AWS Database Migration Service, …
Raw Data and Prepared Data: Amazon S3, Amazon S3 Glacier, …
Consumable Data: Amazon Elasticsearch Service, Amazon Redshift, Amazon DynamoDB, Amazon ElastiCache, Amazon Neptune, Amazon S3, …
Insights Consumption: Amazon QuickSight, Amazon Athena, AWS Lambda, Amazon SageMaker, Amazon API Gateway
Data Processing, Metadata Management: Amazon EMR, AWS Glue, Amazon Athena
Machine Learning: Amazon Rekognition, Amazon Transcribe, Amazon SageMaker, Amazon Comprehend
Management and Governance: Amazon CloudWatch, AWS CloudFormation, AWS CloudTrail, AWS Config
Security, Identity and Compliance: AWS IAM, AWS KMS, …
Data and Insights Applications: Insights Applications, Downstream Systems, Business Users, Data Experts, Data Scientists
Data Sources: Systems of Record
Move Data: AWS Database Migration Service
Raw Data and Prepared Data: Amazon S3
Data Processing, Metadata Management: AWS Glue
Machine Learning: Amazon SageMaker
Consumable Data
Insights Consumption: Amazon QuickSight, Amazon Athena, Amazon SageMaker
Data and Insights Applications: Insights Applications, Downstream Systems, Business Users, Data Experts, Data Scientists
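The reduced scenario above can be read as a chain of stages, each reading from one area and writing to the next. The sketch below models that chain as plain data and checks that the stages connect. The assignment of services to stages (for example AWS Glue moving data from raw to prepared) is an assumption inferred from the diagram labels, and the `Stage` type and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    service: str  # AWS service named on the slide
    source: str   # area the stage reads from (assumed)
    target: str   # area the stage writes to (assumed)


# Assumed ordering; the slide shows the building blocks but not explicit arrows.
STAGES = [
    Stage("AWS Database Migration Service", "Systems of Record", "Raw Data"),
    Stage("AWS Glue", "Raw Data", "Prepared Data"),
    Stage("Amazon Athena", "Prepared Data", "Insights Consumption"),
]


def is_connected(stages: list) -> bool:
    """True if every stage reads from the area the previous stage wrote to."""
    return all(prev.target == nxt.source for prev, nxt in zip(stages, stages[1:]))


def describe(stages: list) -> str:
    """Render the chain as a single readable line."""
    parts = [stages[0].source]
    parts += [f"{s.target} (via {s.service})" for s in stages]
    return " -> ".join(parts)


print(is_connected(STAGES))
print(describe(STAGES))
```

Writing the scenario down this way makes gaps visible early: a stage whose source does not match any earlier target points to a missing building block.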
https://aws.amazon.com/de/quickstart/architecture/data-lake-foundation-with-aws-services/
Instructions
Code
AWS Lake Formation (join the preview)
Build, secure, and manage a data lake in days
• Build a data lake in days, not months: Build and deploy a fully managed data lake with a few clicks
• Enforce security policies across multiple services: Centrally define security, governance, and auditing policies in one place and enforce those policies for all users and all applications
• Combine different analytics approaches: Empower analyst and data scientist productivity, giving them self-service discovery and safe access to all data from a single catalog
How it works
Thank you!
Mariano Kamp
More data lakes & analytics on AWS than anywhere else
Fortnite | 125+ million players
CHALLENGE
• Need to create constant feedback loop for designers
• Gain up-to-the-minute understanding of gamer satisfaction to guarantee gamers are engaged, thus resulting in the most popular game played in the world
Epic Games uses Data Lakes and analytics
Entire analytics platform running on AWS
S3 leveraged as a Data Lake
All telemetry data is collected with Kinesis
Real-time analytics done through Spark on EMR, with DynamoDB used to create scoreboards and serve real-time queries
Use Amazon EMR for large batch data processing
Game designers use data to inform their decisions
Architecture diagram (labels only):
• Game sources: Game clients, Game servers, Launcher, Game services
• Other inputs: APIs, Databases, S3, Other sources
• Ingestion: Kinesis
• Near real-time pipelines: Spark on EMR, DynamoDB; Grafana, Scoreboards API, Limited Raw Data (real time ad-hoc SQL), User ETL (metric definition)
• Batch pipelines: S3 (Data Lake), ETL using EMR; Tableau/BI, Ad-hoc SQL
Customer interest in AWS Lake Formation
“We are very excited about the launch of AWS Lake
Formation, which provides a central point of control to
easily load, clean, secure, and catalog data from thousands of
clients to our AWS-based data lake, dramatically reducing
our operational load. … Additionally, AWS Lake Formation
will be HIPAA compliant from day one …”
- Aaron Symanski, CTO, Change Healthcare
“I can’t wait for my team to get our hands on AWS Lake
Formation. With an enterprise-ready option like Lake
Formation, we will be able to spend more time deriving
value from our data rather than doing the heavy lifting
involved in manually setting up and managing our data lake.”
- Joshua Couch, VP Engineering, Fender Digital
Select AWS Glue customers