Experts in Enterprise Data Lake & Build Lake on Cloud
ATTUNITY PARTNERS
Innovating and Engineering High Performance Data Integration, BI & Analytics Platforms: On-Premise, Cloud or Hybrid.
A data lake is a repository for large quantities and varieties of data, both structured and unstructured. Data generalists and programmers can tap the data stream for real-time analytics. Data scientists use the lake for discovery and ideation. Data lakes take advantage of commodity cluster computing techniques for massively scalable, low-cost storage of data files in any format.
Offload “Cold” Data From DW to Hadoop
Benefits
Dramatically lowers the cost per terabyte to store data: Hadoop-based storage is 30x cheaper
More information can be retained and analyzed
Improves performance of the data warehouse
“Cold” data is still available to be queried online or interactively
“Cold” data in Hadoop can be mined for additional insights or combined with other data
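As a rough illustration of the offload idea, the sketch below splits warehouse rows into "hot" and "cold" sets by how recently they were accessed. The 365-day threshold, the `last_accessed` field and the sample rows are assumptions made for this example; the brochure does not say how "cold" is defined.

```python
from datetime import date, timedelta

# Hypothetical threshold: rows untouched for more than a year count as "cold".
COLD_AFTER_DAYS = 365

def split_hot_cold(rows, today, cold_after_days=COLD_AFTER_DAYS):
    """Partition rows into (hot, cold) lists by their last_accessed date."""
    cutoff = today - timedelta(days=cold_after_days)
    hot, cold = [], []
    for row in rows:
        # Rows last accessed before the cutoff are candidates for Hadoop.
        (cold if row["last_accessed"] < cutoff else hot).append(row)
    return hot, cold

# Made-up sample rows, for illustration only.
rows = [
    {"order_id": 1, "last_accessed": date(2016, 5, 1)},
    {"order_id": 2, "last_accessed": date(2013, 2, 10)},
    {"order_id": 3, "last_accessed": date(2015, 12, 24)},
]
hot, cold = split_hot_cold(rows, today=date(2016, 6, 1))
```

In this sketch the hot rows would stay in the data warehouse and the cold rows would go to Hadoop in the bulk load.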
[Figure: Before / After view of offloading the data warehouse. Labels: ETL; Data Warehouse (“HOT”); Hadoop (“COLD”); Reports / Dashboard / Queries; ongoing data load; for frequently used data; initial bulk load of raw or infrequently used data; translate DW data model to Hive / HCatalog; re-factor queries and reports to work via HiveQL.]
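The "translate DW data model to Hive / HCatalog" step can be sketched as a small generator that turns warehouse column definitions into a HiveQL `CREATE EXTERNAL TABLE` statement. The type mapping, the database and table names and the S3 location are all assumptions for illustration; a real migration would need a mapping checked against each source system.

```python
# Assumed mapping from common warehouse column types to Hive types.
# Illustrative only; verify against the actual source database.
DW_TO_HIVE = {
    "VARCHAR": "STRING",
    "VARCHAR2": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "NUMBER": "DECIMAL(38,10)",
    "FLOAT": "DOUBLE",
    "DATE": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
}

def hive_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement from (name, dw_type) pairs."""
    col_lines = []
    for name, dw_type in columns:
        base = dw_type.split("(")[0].strip().upper()  # drop length/precision
        hive_type = DW_TO_HIVE.get(base, "STRING")    # unknown types fall back to STRING
        col_lines.append(f"  `{name.lower()}` {hive_type}")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        + ",\n".join(col_lines)
        + f"\n)\nSTORED AS PARQUET\nLOCATION '{location}'"
    )

# Hypothetical table and bucket names.
ddl = hive_ddl(
    "cold_dw",
    "orders",
    [("ORDER_ID", "NUMBER(10)"), ("CUSTOMER", "VARCHAR2(64)"), ("ORDER_DATE", "DATE")],
    "s3://example-bucket/cold/orders/",
)
print(ddl)
```

Generating the DDL from a mapping table keeps the type decisions in one reviewable place, which helps when reports are later re-factored to run against Hive.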
What is a Data Lake?
The data lake accepts input from various sources and can preserve both the original data fidelity and the lineage of data transformations. Data models emerge with usage over time rather than being imposed up front. The lake can serve as a staging area for the data warehouse, the location of more carefully "treated" data for reporting and analysis in batch mode.
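One way to picture "original data fidelity" and "lineage of data transformations" is a record that keeps the raw bytes untouched while each derived version lists the steps that produced it. This is a minimal sketch under that reading; the record layout and the function names `ingest_raw` and `derive` are hypothetical and do not come from any particular product.

```python
import hashlib
import json

def ingest_raw(payload: bytes, source: str) -> dict:
    """Keep the payload byte-for-byte and start an empty lineage trail."""
    return {
        "source": source,
        "raw": payload,
        "sha256": hashlib.sha256(payload).hexdigest(),
        "lineage": [],
    }

def derive(record: dict, step_name: str, transform) -> dict:
    """Apply transform to the latest data and record the step; raw stays as-is."""
    current = record.get("data", record["raw"])
    derived = dict(record)
    derived["data"] = transform(current)
    derived["lineage"] = record["lineage"] + [step_name]
    return derived

# No schema is imposed at ingest; structure is applied only when the data is used.
rec = ingest_raw(b'{"Customer": " Ada ", "total": "42"}', source="crm")
rec = derive(rec, "parse_json", lambda b: json.loads(b))
rec = derive(rec, "trim_names", lambda d: {**d, "Customer": d["Customer"].strip()})
```

Because the raw bytes and their checksum travel with every derived version, a later consumer can always re-read the original input and apply a different model to it.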
Data Lake Reference Architecture
[Figure: Data Lake on Cloud (AWS S3, Amazon AWS Cloud). Labels recovered from the diagram:
External business partners & third party: Google+, iTunes Store, Google Play, YouTube, Amazon MP3, Spotify, VEVO, Amazon Prime, Hulu
Data archives: XML, Excel, TXT, CSV, JSON, EDI, other
Enterprise systems (product, customer & other data): SAP, CRM, Oracle, SQL Server, MySQL
Data movement: FTP, ETL, replication, AWS Data Pipeline, Qubole
Processing: Spark, Hive, Presto, Hadoop MapReduce, Qubole
Dashboard and reporting: MicroStrategy | Business Objects
Analytics & data scientists: MicroStrategy | Tableau
Flow types: data streams to data lake, on-demand data flow, regular data flow]
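For files landing in an S3-backed lake from many sources and formats, one common convention is a Hive-style partitioned key layout, which engines such as Hive, Spark and Presto can read as partitions. The sketch below only builds the key string; the `raw/` prefix, the partition names and the sample file are assumptions for illustration, not a layout taken from the brochure.

```python
from datetime import date

def landing_key(source: str, fmt: str, load_date: date, filename: str) -> str:
    """Build a Hive-style partitioned object key for a landed file."""
    return (
        f"raw/source={source.lower()}/format={fmt.lower()}/"
        f"dt={load_date.isoformat()}/{filename}"
    )

# Hypothetical file from one of the third-party sources in the diagram.
key = landing_key("Spotify", "JSON", date(2016, 3, 14), "plays.json")
print(key)
```

Partitioning by source and load date lets a query read only the slice it needs instead of scanning the whole bucket.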
SERVICES
STAFFING | DATA WAREHOUSING | BI APPLICATIONS | CLOUD BI | MOBILE BI | BIG DATA | MASTER DATA MANAGEMENT
WWW.AGILEISS.COM