© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Gaurav Kumar, Christian Lam, David Winters
November 30, 2016
Taking Data to the Extreme (MBL202)
David Winters
Big Data Architect,
Data Science &
Engineering, GoPro
Gaurav Kumar
Product Lead,
Data Science &
Engineering, GoPro
Christian Lam
Analytics Engineer,
Data Science &
Engineering, GoPro
Origin Story
•Make Friends
•Haul Ass
•Maintain Balance
•No Half-Assery
•Integrity. Always
•Be a HERO
Yes, this comes from the top…
High-Level Architecture
ETL Cluster
• Aggregations and Joins
• Hive
• Map/Reduce
Secure Data Mart Cluster
• End User Query
• Impala/Sentry
• Parquet
Analytics Apps
•Hue
•Tableau
•Python
•R
Streaming Ingest Cluster
•Log file streaming
•RESTful service
•Kafka
•Spark Streaming
•HBase
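A minimal sketch of the micro-batch idea behind the streaming ingest path, assuming log events arrive as one JSON object per line. It uses only the Python standard library and stands in for, rather than reproduces, the Kafka and Spark Streaming setup named on the slide. The function names and the `event_type` field are hypothetical.

```python
import json
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[dict]]:
    """Group raw JSON log lines into fixed-size batches of parsed events."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        events = []
        for line in chunk:
            try:
                parsed = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed line; a real pipeline might keep it for inspection
            if isinstance(parsed, dict):
                events.append(parsed)
        yield events


def count_by_type(events: List[dict]) -> Counter:
    """Count events per (hypothetical) event_type field within one batch."""
    return Counter(e.get("event_type", "unknown") for e in events)
```

Each batch can then be summarized or written onward independently, which is the property that lets a streaming layer keep up with continuous log traffic.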
Batch Induction Framework
• Batch files
• Scheduled downloads
• Pre-processing
• Java App
Original
Cluster
JSON
JSON
Parquet
DDL
Data Pipeline
Streaming Ingest Cluster
ELB
HTTP
Pipeline for processing of streaming logs
To ETL Cluster
Data Pipeline
ETL Cluster
HDFS
Hive Metastore
To SDM Cluster
From Streaming
Ingest Cluster
Batch
Induction
Framework
Data Delivery!
HDFS
Hive Metastore
Applications
Thrift
ODBC
Server
User
Studio
Studio - Staging
GDA
Report
SDM Cluster
From ETL Cluster
Areas for Improvement
• Isolation of workloads
• Fast ingest
• Secure
• Fast delivery/queries
• Loosely coupled clusters
• Multiple copies of data
• Tightly coupled storage and compute
• Lack of elasticity
• Operational overhead of multiple clusters
Amazon S3
bucket
Future Architecture
Streaming
Ingest Cluster
Batch Induction Framework
Hive
Metastore
Ephemeral
ETL
Cluster
JSON
Parquet+
DDL
Aggregates
Events + State
Ephemeral
Data Mart
Cluster #1
Ephemeral
Data Mart
Cluster #2
Ephemeral
Data Mart
Cluster #N
Data Ops
Operations
Data ops dashboards allow us to monitor the health of data streams and detect anomalies, as well as Tableau Server itself.
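One simple way to flag anomalies in a data stream is to compare each new event count against a trailing window of recent counts. The sketch below is an illustration under that assumption, not the method used in the dashboards; the function name, window size, and threshold are all hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_anomalies(counts: Sequence[int], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return indexes whose count deviates strongly from the trailing window."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as anomalous.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

For example, `find_anomalies([100, 102, 98, 101, 99, 100, 5, 101])` flags only index 6, the sudden drop to 5 events.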
Operations
PRE BUILT AGGREGATIONS
• Aggregates are important for the successful adoption of Tableau
• Example: Karma flight table
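A pre-built aggregate trades raw detail for a small table that a dashboard can query quickly. The sketch below rolls per-flight records up to one row per day. The record fields (`device_id`, `start_time`, `duration_seconds`) are hypothetical and are not the actual Karma flight table schema.

```python
from collections import defaultdict
from typing import Iterable, List


def build_daily_aggregate(flights: Iterable[dict]) -> List[dict]:
    """Roll per-flight records up to one summary row per calendar day."""
    totals = defaultdict(lambda: {"flights": 0, "total_seconds": 0, "devices": set()})
    for flight in flights:
        day = flight["start_time"][:10]  # assumes ISO 8601 timestamps (YYYY-MM-DD...)
        bucket = totals[day]
        bucket["flights"] += 1
        bucket["total_seconds"] += flight["duration_seconds"]
        bucket["devices"].add(flight["device_id"])
    return [
        {
            "flight_date": day,
            "flights": totals[day]["flights"],
            "total_seconds": totals[day]["total_seconds"],
            "distinct_devices": len(totals[day]["devices"]),
        }
        for day in sorted(totals)
    ]
```

A reporting tool then reads the daily rows instead of scanning every flight record on each dashboard refresh.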
Operations
FEED PRODUCT INSIGHTS
• Product insights can result in product design changes
• How we observed Camera as a Hub upload behaviors
Analytics
Reporting
Analytics
CAMERA CONNECTS…
• Helps us identify stolen/smuggled cameras
• Helps us decide where to put our new marketing dollars for new products
DASHBOARDS (GoPro Plus SUBSCRIPTIONS)
• Real-time subscriber growth
• Utilizes Hive External Tables and JSON SerDe
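A Hive external table over JSON files lets newly landed files become queryable without a separate load step, which suits a near-real-time dashboard. The sketch below builds such a DDL statement as a string. The table name, columns, and storage location are hypothetical, and the SerDe class shown is the JSON SerDe that ships with Hive HCatalog, which may differ from the one used in the talk.

```python
from typing import Mapping

# JSON SerDe bundled with Hive HCatalog (an assumption about which SerDe applies).
JSON_SERDE = "org.apache.hive.hcatalog.data.JsonSerDe"


def external_json_table_ddl(table: str, columns: Mapping[str, str], location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for JSON files at `location`."""
    cols = ",\n".join(f"  `{name}` {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{cols}\n"
        f")\n"
        f"ROW FORMAT SERDE '{JSON_SERDE}'\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and bucket path, for illustration only.
ddl = external_json_table_ddl(
    "subscription_events",
    {"user_id": "STRING", "event_time": "STRING", "plan": "STRING"},
    "s3://example-bucket/subscription-events/",
)
```

Because the table is external, dropping it removes only the metadata and leaves the underlying JSON files in place.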
Analytics