thanuja-seneviratne
Part I Recap
Big Data Market
› Data Growth
› Market Growth
› Market Drivers
› Adoption Cycle
› Forrester Market Report Findings
Big Data Products
› Enterprise Data Warehouses (EDW) – non-canonical, traditional
› Big Data Products Offering
› Hadoop and its Distros
› MapR and Others
› Big Data Products Stack
Future of Big Data
Data Science vs Traditional Analytics
› Traditional Analytics – Decide what data is relevant, create a static data model, visualize the data
› Data Science – Assemble all possible data, create a predictive model, operationalize the model (visualize, feed to another system)
Three types of data stores/data management systems
› Relational vs non-relational [MSSQL, Oracle, MySQL vs NoSQL products]
› Relational “big data” offering called EDW (mostly packaged as MPP appliances)
› Each of the three types has merits in certain use cases and will continue to be used in the industry
› Why EDW is not enough for new “big data” scenarios
Three Vs (volume, velocity, variety) becoming too heavy
Time to market is delayed
High cost
Write-first schema (schema-on-write) is unnecessary
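The write-first schema point can be illustrated with a minimal sketch. It contrasts validating records against a fixed schema at load time (EDW-style) with keeping raw records and interpreting fields only at query time (schema-on-read). The event records, field names, and function names below are hypothetical and chosen only for illustration; they do not come from any particular product.

```python
import json

# Hypothetical raw events; the field names are illustrative assumptions.
RAW_EVENTS = [
    '{"user": "a", "amount": 10}',
    '{"user": "b", "amount": 5, "channel": "web"}',
    '{"user": "a", "clicks": 3}',
]

# Assumed fixed schema, decided before any data is loaded.
FIXED_SCHEMA = ("user", "amount")


def load_schema_on_write(lines):
    """Validate each record against the fixed schema at load time."""
    accepted, rejected = [], []
    for line in lines:
        record = json.loads(line)
        if set(record) == set(FIXED_SCHEMA):
            accepted.append(record)
        else:
            # Records with extra or missing fields never reach the store.
            rejected.append(record)
    return accepted, rejected


def query_schema_on_read(lines, field):
    """Keep raw lines as-is and interpret only the requested field at query time."""
    total = 0
    for line in lines:
        record = json.loads(line)
        # Missing fields are treated as 0 instead of failing the load.
        total += record.get(field, 0)
    return total
```

With these sample events, the fixed schema accepts only the first record, while the read-time query can still sum `amount` or `clicks` across all three raw records.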
Importance of Individualized experience
› Another sample case: a person finds $1,000 in front of a bank. Will they return it to the bank or run away with it?
› Multiple business cases and multiple use cases
Hadoop as the premier open source “big data” offering and its distros
Other Hadoop-like “big data” offerings
Market Drivers
› Business Drivers
Proactive analytics instead of reactive analytics
Insights generated for competitive advantage
Rise of Data-First enterprise
› Technical Drivers
Data growing exponentially to petabyte scale
Data is everywhere, in a variety of formats
› Financial Drivers
Cost of IT continues to grow
Commodity hardware instead of enterprise hardware
Forrester Market Report Findings
› Unstoppable Hadoop momentum in the market
› More and more enterprises want to run POCs
› Open source is the key
› Many Big Data products: a fair number to choose from, but no market-dominating leader yet
Hadoop distributions
Other products including MapR
› Enterprise Hadoop and partnerships with large vendors
IBM, Teradata, Pivotal, Microsoft
› Hadoop in the cloud
› Hadoop Ecosystem
Enterprise Data Warehouses (EDWs)
› Traditional big data offering
› Non-canonical or original way of storing large data sets
› Refer to Part I slides
Hadoop and its distros
› History of Hadoop
› Hadoop as a Platform
Hortonworks Data Platform (HDP)
Cloudera's Distribution Including Apache Hadoop (CDH)
Big Vendors
› IBM’s BigInsights – This is a Hadoop distro through Cloudera’s CDH
› Microsoft’s HDInsight on Azure – this is a Hadoop distro through Hortonworks’ HDP
› SAP’s HANA – an in-memory database rather than a Hadoop distro itself; it integrates with Hadoop through Hortonworks’ HDP
MapR and Others
› Instead of HDFS, MapR uses its own file system (MapR-FS), which can be accessed over Network File System (NFS)
› MapR Distros
Open source M3 in Amazon Cloud
Premium M5 in Amazon Cloud
MapR distro on Google
› Others
Amazon EMR
› A Hadoop distro on Amazon EC2 clusters in the Amazon cloud
› Exposes a web service to manage the clusters
› Most popular and cost-effective distro apart from Cloudera and Hortonworks
Hybrids
› Converging SQL Enterprise Data Warehouses (especially MPP products) with Big Data
› Investments made in long-running contracts with EDW vendors are safeguarded
› Existing SQL/DW knowledge and skill sets can be utilized
› Following are popular products:
Future of Big Data
› Market leader by 2020
› Many products and alternatives are coming our way
› 5Vs-driven ecosystem instead of 3Vs
› Demanding skill set around Big Data technologies
› Enterprise Hadoop,
› Hadoop Distros,
› MapR and its Distros,
› Hadoop stack,
› Application Frameworks and languages
“R” language and frameworks
Scala language and frameworks
Subjective evolution instead of objective evolution
› Improvements to Big Data Infrastructure (BDI)
› Improvements to Big Data Life Cycle (BDLC)
› Evolve to All-Data processing