Architecting big data solutions in the cloud


Session Objectives And Takeaways

Lambda Architecture

http://lambda-architecture.net/

1. All data entering the system is dispatched to both the batch layer and the speed layer for processing.

2. The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.

3. The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.

4. The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.

5. Any incoming query can be answered by merging results from batch views and real-time views.
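The five steps above can be sketched in a few lines of Python. This is a minimal, single-process illustration under stated assumptions, not a production design. Events are hypothetical page names, and every view is a simple count per page. The names LambdaSketch, ingest, run_batch and query are made up for illustration. The batch run is assumed to see every event that arrived before it, so the real-time view can simply be cleared afterwards.

```python
from __future__ import annotations

from collections import Counter


class LambdaSketch:
    """Toy Lambda Architecture: counts hits per (hypothetical) page name."""

    def __init__(self) -> None:
        self.master_dataset: list[str] = []        # batch layer: immutable, append-only raw data
        self.batch_view: Counter[str] = Counter()  # pre-computed view, indexed by the serving layer
        self.realtime_view: Counter[str] = Counter()  # speed layer: only data since the last batch run

    def ingest(self, event: str) -> None:
        # Step 1: dispatch every incoming event to both layers.
        self.master_dataset.append(event)
        self.realtime_view[event] += 1

    def run_batch(self) -> None:
        # Step 2: recompute the batch view from the full master dataset.
        self.batch_view = Counter(self.master_dataset)
        # Assumption: no events arrive during the batch run, so everything
        # the speed layer held is now covered by the batch view.
        self.realtime_view.clear()

    def query(self, key: str) -> int:
        # Step 5: answer by merging the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    lam = LambdaSketch()
    for e in ["home", "about", "home"]:
        lam.ingest(e)
    lam.run_batch()
    lam.ingest("home")          # recent data, seen only by the speed layer so far
    print(lam.query("home"))    # prints 3
```

Recomputing the batch view from scratch, rather than updating it in place, mirrors the idea that the master dataset is the immutable source of truth.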

What is HDInsight

Cluster operating systems: Linux, Windows

HDInsight clusters on Azure

What is HBase

As a flat record:

Order No   Customer Name   Customer Phone   Company Name   Company Address
12012015   Mostafa         101-232-2345     Microsoft      Redmond, WA

The same record in HBase, with Order No as the row key and the columns grouped into two column families, Customer and Company:

Row key    Customer:Name   Customer:Phone   Company:Name   Company:Address
12012015   Mostafa         101-232-2345     Microsoft      Redmond, WA

Operations: Create, Select, Update, Select
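To make the row key and column family layout concrete, here is a toy in-memory model of the HBase data model in Python. It is not an HBase client. ToyHBaseTable, put and get are hypothetical names. The model runs the Create, Select, Update, Select sequence on the Order No row. An update is just another put, and the older version of the cell is kept alongside the new one. For simplicity, every version is retained; real HBase keeps only as many as the column family's VERSIONS setting allows.

```python
from __future__ import annotations

import itertools
from collections import defaultdict


class ToyHBaseTable:
    """In-memory sketch of the HBase data model (not a real HBase client)."""

    def __init__(self, column_families: list[str]) -> None:
        # Column families are fixed when the table is created; qualifiers are not.
        self.families = set(column_families)
        # row key -> "family:qualifier" -> list of (timestamp, value), oldest first
        self.rows: dict[str, dict[str, list[tuple[int, str]]]] = defaultdict(
            lambda: defaultdict(list)
        )
        self._clock = itertools.count(1)  # stand-in for HBase cell timestamps

    def put(self, row_key: str, column: str, value: str) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # No separate update: a put on an existing cell adds a newer version.
        self.rows[row_key][column].append((next(self._clock), value))

    def get(self, row_key: str, max_versions: int = 1) -> dict[str, list[str]]:
        # Return the newest `max_versions` values of each column in the row.
        row = self.rows.get(row_key, {})
        return {
            col: [v for _, v in reversed(cells)][:max_versions]
            for col, cells in row.items()
        }


if __name__ == "__main__":
    t = ToyHBaseTable(["Customer", "Company"])
    # Create
    t.put("12012015", "Customer:Name", "Mostafa")
    t.put("12012015", "Customer:Phone", "101-232-2345")
    t.put("12012015", "Company:Name", "Microsoft")
    t.put("12012015", "Company:Address", "Redmond, WA")
    # Select
    print(t.get("12012015"))
    # Update (hypothetical new value), then Select again with two versions
    t.put("12012015", "Company:Address", "Redmond, Washington")
    print(t.get("12012015", max_versions=2)["Company:Address"])
```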

What is Hive

Hive is a data warehouse system built on Hadoop.

What is Apache Storm

Apache Storm is a distributed, fault-tolerant, open-source computation system.

Analytics solutions

Templates

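Apache Storm Components

Topology: a graph of spouts and bolts connected by streams; a real-time application runs in Storm as a topology.

Stream: an unbounded sequence of tuples.

Tuple: a named list of values, the unit of data carried by a stream.

Spout: a source of streams; it emits tuples into the topology.

Bolt: consumes one or more streams, processes their tuples, and may emit new streams.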
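The roles of these components can be illustrated with a word-count topology written as chained Python generators. This is only an in-process analogy that assumes a finite input list. It does not use Storm's API, and real Storm runs spouts and bolts as parallel tasks across a cluster over unbounded streams. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator

# A tuple is a named list of values; a plain dict stands in for one here.
StormTuple = dict


def sentence_spout(sentences: Iterable[str]) -> Iterator[StormTuple]:
    # Spout: the source of a stream; emits one tuple per input sentence.
    for s in sentences:
        yield {"sentence": s}


def split_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    # Bolt: consumes a stream, processes each tuple, emits a new stream of words.
    for t in stream:
        for word in t["sentence"].split():
            yield {"word": word.lower()}


def count_bolt(stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    # Bolt: keeps running counts and emits the updated count for each word.
    counts: Counter[str] = Counter()
    for t in stream:
        counts[t["word"]] += 1
        yield {"word": t["word"], "count": counts[t["word"]]}


def topology(sentences: Iterable[str]) -> list[StormTuple]:
    # Topology: spout -> split bolt -> count bolt, wired together by streams.
    return list(count_bolt(split_bolt(sentence_spout(sentences))))


if __name__ == "__main__":
    for out in topology(["the cat", "the dog"]):
        print(out)
```

Because each stage only consumes and emits tuples, any bolt can be replaced or reordered without touching the others, which is the composition idea behind a topology.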

What is Apache Spark

Up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

Azure Data Lake (ADL)

Azure Data Lake removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics.

Azure Data Lake Analytics

http://mostafa.rocks