Big Data, AWS and The Data Pipeline

HUG Ireland Event - DNM slides


Page 1: HUG Ireland Event - DNM slides

Big Data, AWS and The Data Pipeline

Page 2: HUG Ireland Event - DNM slides

Agenda

• AWS and Big Data

• Amazon Solutions:

− Kinesis

− S3

− EMR

− Redshift

− Data Pipeline

• DoneDeal Project Overview (Martin Peters)

• Amazon Solutions Applied to DoneDeal (Solution Overview)

• Q&A

Page 3: HUG Ireland Event - DNM slides

Why AWS and Big Data

• Agility – Amazon Web Services provides a broad range of services to help you build and deploy Big Data applications quickly and easily.

• Elasticity – AWS gives you fast access to flexible, low-cost IT resources, so you can rapidly scale virtually any Big Data application, including data warehousing, clickstream analytics, fraud detection, recommendation engines, event-driven ETL, serverless computing, and Internet-of-Things processing.

• Pay for what you need – With AWS you don’t need to make large upfront investments in time and money to build and maintain infrastructure. Instead, you can provision exactly the right type and size of resources you need to power your Big Data applications.

• Data Centric Services – AWS services make collecting, uploading, storing, and processing data faster, simpler, and increasingly comprehensive.

Page 4: HUG Ireland Event - DNM slides

AWS Data Centric Services

• Data-centric Services:

– Managing Databases is Painful and Difficult – Amazon RDS addresses many of the pain points and provides many ease-of-use features.

– SQL Databases do not Work Well at Scale – Amazon DynamoDB provides a fully managed NoSQL model that has no inherent scalability limits.

Page 5: HUG Ireland Event - DNM slides

AWS Data Centric Services Cont’d

• Hadoop is Difficult to Deploy and Manage – Amazon EMR can launch managed Hadoop clusters in minutes.

• Data Warehouses are Costly, Complex, and Slow – Amazon Redshift provides a fast, fully-managed petabyte-scale data warehouse at 1/10th the cost of traditional solutions.

• Streaming Data is Difficult to Capture – Amazon Kinesis facilitates real-time processing of data streams at terabyte scale.

Page 6: HUG Ireland Event - DNM slides

AWS and Big Data Use Cases

• On-Demand Big Data Analytics

• See http://aws.amazon.com/big-data/use-cases/ for more examples:

– Clickstream Analysis

– Event-driven Extract, Transform, Load (ETL)

Page 7: HUG Ireland Event - DNM slides

Big Data Challenges

• Ever Increasing

– Volume

– Velocity

– Variety

• Ever Decreasing Latency

– Big Data moving to Real-Time Big Data

• Multiple overlapping tools and platforms

Page 8: HUG Ireland Event - DNM slides

Which Tools?

Page 9: HUG Ireland Event - DNM slides

Simplify the Model

Page 10: HUG Ireland Event - DNM slides

Applying the Model to Solutions

QuickSight

Page 11: HUG Ireland Event - DNM slides

Ingest: Stream to Kinesis

• Multiple options e.g.

Page 12: HUG Ireland Event - DNM slides

Ingest And Store: Kinesis, KCL

• Why Stream Storage:

– Convert multiple event streams into fewer persistent sequential streams (easier to process)

– Buffer and de-couple producers and consumers

• Kinesis:

– Low Latency

– High Durability

– Managed Service

• Kinesis Connector Library:

– Transform

– Buffer

– Filter

– Emit
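The Transform, Filter, Buffer and Emit stages can be pictured with a small, self-contained Python sketch. This is not the Kinesis Connector Library API; the class `ConnectorPipeline`, its parameters, and the sample `ad_view` events are hypothetical names chosen only to illustrate how the stages might fit together, with an in-memory list standing in for a real destination such as S3 or Redshift.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConnectorPipeline:
    """Hypothetical transform -> filter -> buffer -> emit pipeline (illustration only)."""
    transform: Callable[[bytes], dict]   # raw stream record -> parsed record
    keep: Callable[[dict], bool]         # filter: True means the record is kept
    emit: Callable[[List[dict]], None]   # sends one batch to the destination
    buffer_limit: int = 3                # flush once this many records are buffered
    _buffer: List[dict] = field(default_factory=list)

    def process(self, raw: bytes) -> None:
        record = self.transform(raw)
        if not self.keep(record):
            return                       # filtered out, never buffered
        self._buffer.append(record)
        if len(self._buffer) >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        # Emit whatever is buffered as one batch, then start a new buffer.
        if self._buffer:
            self.emit(list(self._buffer))
            self._buffer.clear()


# In-memory stand-in for a destination; each entry is one emitted batch.
emitted_batches: List[List[dict]] = []

pipeline = ConnectorPipeline(
    transform=lambda raw: json.loads(raw.decode("utf-8")),
    keep=lambda rec: rec.get("event") == "ad_view",
    emit=emitted_batches.append,
    buffer_limit=2,
)

for raw in [
    b'{"event": "ad_view", "ad_id": 1}',
    b'{"event": "heartbeat"}',
    b'{"event": "ad_view", "ad_id": 2}',
    b'{"event": "ad_view", "ad_id": 3}',
]:
    pipeline.process(raw)
pipeline.flush()  # emit the final partial batch
```

Buffering before emitting is what lets the consumer write a few larger batches instead of many small ones, which is the de-coupling of producers and consumers that the slide refers to.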

Page 13: HUG Ireland Event - DNM slides

Dynamic Capacity: Auto scaling

• The scaling policies that you define adjust the number of instances, within your minimum and maximum number of instances, based on the criteria that you specify.
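The bounded adjustment described above can be sketched in a few lines of Python. This is a simplified illustration, not the AWS Auto Scaling API: the function `desired_capacity`, its threshold parameters, and the single-step policy are assumptions made for the example.

```python
def desired_capacity(current: int, metric: float,
                     scale_out_above: float, scale_in_below: float,
                     min_size: int, max_size: int, step: int = 1) -> int:
    """Hypothetical step policy: adjust the instance count based on a metric,
    then clamp the result to the configured minimum and maximum."""
    if metric > scale_out_above:
        target = current + step      # criteria met: add capacity
    elif metric < scale_in_below:
        target = current - step      # load is low: remove capacity
    else:
        target = current             # within the band: no change
    return max(min_size, min(max_size, target))


# Example: CPU utilisation (%) as the metric, group bounded to 1..4 instances.
scaled_out = desired_capacity(2, 85.0, 70.0, 30.0, min_size=1, max_size=4)
capped = desired_capacity(4, 85.0, 70.0, 30.0, min_size=1, max_size=4)
floored = desired_capacity(1, 10.0, 70.0, 30.0, min_size=1, max_size=4)
```

The clamp on the last line of the function is the part that corresponds to "within your minimum and maximum number of instances": the policy can ask for more or fewer instances, but the group size never leaves the configured bounds.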

Page 14: HUG Ireland Event - DNM slides

Store: Simple Storage Service (S3)

• Secure, Scalable, Reliable

Page 15: HUG Ireland Event - DNM slides

Process: Elastic MapReduce (EMR)

Page 16: HUG Ireland Event - DNM slides

Process: Redshift

Page 17: HUG Ireland Event - DNM slides

Redshift Cont’d

Page 18: HUG Ireland Event - DNM slides

Orchestrate: Data Pipeline

Page 19: HUG Ireland Event - DNM slides

Visualise: Tableau, QuickSight, etc.

Page 20: HUG Ireland Event - DNM slides

Q&A

• Nigel and Martin ..