Big Data Storage and Analytics Q&A
Matthew Aslett, research director

Cloudian 451-hortonworks - webinar

Webinar Logistics

●  Be on the look-out for polling questions
●  You may ask questions at any time during the presentation by using the Q&A box
●  ON-Demand Viewers please tweet us questions @cloudianstorage
●  At the end of the presentation please provide feedback and rate us

451 Research is an information technology research & advisory company

Founded in 2000

210+ employees, including over 100 analysts

1,000+ clients: Technology & Service providers, corporate advisory, finance, professional services, and IT decision makers

12,500+ senior IT professionals in our research community

Over 52 million data points each quarter

4,500+ reports published each year covering 2,000+ innovative technology & service providers

Headquartered in New York City with offices in London, Boston, San Francisco, and Washington D.C.

451 Research and its sister company Uptime Institute comprise the two divisions of The 451 Group

Research & Data

Advisory Services

Events

Copyright (C) 2015 451 Research LLC

Our Speakers

Paul Turner leads marketing, product planning and strategy at Cloudian. A storage industry expert, he joined Cloudian from NetApp, where he ran the Product Strategy Office, guiding their investments into FlashRay, Iongrid and CacheIQ. Paul has more than 23 years of development and management leadership, including 15 years at Oracle.

Matt Aslett, Research Director for the data platforms and analytics research channel, has overall responsibility for the coverage of operational and analytic databases, data integration, data quality, and business intelligence. Matt's own primary area of focus is on relational and non-relational databases (including NoSQL and NewSQL), data warehousing, data caching, and Hadoop. Matthew is also an expert in open source software and regularly contributes to 451 Research's open source-related research.

John Kreisa is a veteran of the enterprise marketing industry. John has worked on products at every level of the IT stack, from the depths of storage through to the insight of business intelligence and analytics. Currently John leads partner and strategic marketing initiatives at open source leader Hortonworks, which develops, distributes and supports Apache Hadoop.

Big data: cause and effect

CAUSE?
•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling

•  Volume
•  Velocity
•  Variety

Economics:
•  Commodity hardware
•  Open source software

EFFECT EFFECTED CAUSE

Big data is driven by economics

“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.” – George Dyson

“Big data: New business insights based on storing, processing and analyzing data that was previously ignored due to the cost and functional limitations of traditional data management technologies.” – 451 Research

What happened when the cost of keeping information became less than the cost of throwing it away?
•  The processing and analysis of very large data sets in their entirety
•  Increased adoption of massively parallel processing approaches
•  Storage and analysis of both structured and multi-structured data
•  Integration of external (social) and corporate data for more complete perspective
•  Schema-free and schema-on-read approaches to data storage/analysis
•  Adoption of exploratory analytic approaches to identify new patterns in data
•  Predictive analytics as a fundamental component of BI strategies
•  Machine-learning algorithms automate the reflection of collective intelligence
•  Increased adoption of in-memory databases for rapid data ingestion
•  Real-time analysis of data prior to storage within the data warehouse/Hadoop
•  Interactive, native, SQL-based analysis of data in Hadoop and HBase
•  Large-scale processing of sensor and other machine-generated data/events
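The schema-on-read approach named in the list above can be illustrated with a minimal Python sketch. It is not taken from the webinar: the sample events, the field names and the `read_with_schema` helper are all hypothetical. The idea shown is only that raw records are stored untouched and each reader applies its own schema when it reads.

```python
import json

# Raw events are stored exactly as they arrive; no schema is enforced on write.
# (Hypothetical sample data, for illustration only.)
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ts": 1}',
    '{"user": "u2", "temp_c": 21.5, "ts": 2}',
    '{"user": "u1", "page": "/cart", "ts": 3, "referrer": "/home"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time.

    Keeps only records that contain every field named in the schema and
    converts each of those fields with the schema's type; extra fields
    in the raw record are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}


# Two different views over the same raw data, each chosen by the reader.
PAGE_VIEW_SCHEMA = {"user": str, "page": str, "ts": int}
SENSOR_SCHEMA = {"user": str, "temp_c": float}

page_views = list(read_with_schema(RAW_EVENTS, PAGE_VIEW_SCHEMA))
sensor_readings = list(read_with_schema(RAW_EVENTS, SENSOR_SCHEMA))

print(page_views)
print(sensor_readings)
```

Because nothing is rejected at write time, a new kind of analysis only needs a new schema dictionary, not a reload of the stored data.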

Big data: cause and effect

•  Apache Hadoop
•  Object storage
•  NoSQL
•  Stream processing
•  Predictive analytics
•  Data wrangling

•  Volume
•  Velocity
•  Variety

Economics:
•  Commodity hardware
•  Open source software

EFFECT EFFECTED CAUSE

IoT

© Hortonworks Inc. 2011 – 2015. All Rights Reserved

Traditional Analytic Systems Under Pressure

Challenges
•  Constrains data to app
•  Can’t manage new data
•  Costly to Scale

[Figure: business value of traditional versus new data, contrasting laggards with industry leaders. Traditional sources: ERP, CRM, SCM. New data sources: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs. Data volume: 2.8 zettabytes in 2012, 40 zettabytes in 2020.]
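The two data-volume figures on this slide (2.8 zettabytes in 2012, 40 zettabytes in 2020) imply a compound annual growth rate, which the short Python sketch below works out. The calculation is an illustration added here, not something stated in the webinar, and the function name is hypothetical.

```python
def compound_annual_growth_rate(start_value, end_value, years):
    """Return the constant yearly growth rate that turns start_value
    into end_value after the given number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures from the slide: 2.8 ZB in 2012 and 40 ZB in 2020.
growth = compound_annual_growth_rate(2.8, 40.0, 2020 - 2012)
print(f"Implied annual growth: {growth:.1%}")
```

Under these figures the implied growth is a little under 40 percent a year, sustained for eight years.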

Modern Data Architecture Emerges to Unify Analytics & Data Processing

Modern Data Analytics Architecture
•  Enable applications to have access to all your enterprise data through an efficient centralized platform
•  Supported with a centralized approach to analytics, governance, security and operations
•  Versatile to handle any applications and datasets no matter the size or type

[Figure: modern data architecture. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). Platform: HDFS (Hadoop Distributed File System) with YARN as the data operating system, running interactive, real-time, batch and partner ISV workloads alongside MPP and EDW systems. Analytics: data marts, business analytics, visualization & dashboards. Applications: business analytics, visualization & dashboards.]

Hadoop Driver: Enabling the Data Lake for Analytics

Data Lake Definition
•  Centralized Architecture: Multiple applications on a shared data set with consistent levels of service.
•  Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities.
•  Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value.

Drivers:
1.  Cost Optimization
2.  Advanced Analytic Apps

Goal:
•  Centralized Architecture
•  Data-driven Business

[Figure: Journey to the Data Lake with Hadoop, plotted on SCALE and SCOPE axes and labeled DATA LAKE and Systems of Insight.]

Your Data at Webscale Economics

HyperStore: Software Defined Storage

REPLICATION (RF=1,2,3,4)
ERASURE CODING (N+1,2,3,4)
COMPRESSION (Zlib, lz4)

Commodity Servers, Scale Out, Durable, Simple to Use

[Figure: heterogeneous nodes built from CPU, disks and network, shown at 100TB and 300TB.]
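The replication and erasure-coding options listed on this slide trade raw capacity for durability in different ways. The Python sketch below shows the generic arithmetic only. It is not HyperStore's implementation or API, and the 4+2 fragment layout and the 120 TB data set are assumptions picked for illustration.

```python
def replication_overhead(replication_factor):
    """Raw bytes stored per logical byte when every object is kept as
    replication_factor full copies."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return float(replication_factor)


def erasure_coding_overhead(data_fragments, parity_fragments):
    """Raw bytes stored per logical byte when each object is split into
    data_fragments pieces plus parity_fragments parity pieces."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and no negative parity")
    return (data_fragments + parity_fragments) / data_fragments


def raw_capacity_tb(logical_tb, overhead):
    """Raw capacity needed to hold logical_tb of user data."""
    return logical_tb * overhead


# Hypothetical example: 120 TB of user data.
logical_tb = 120
with_replication = raw_capacity_tb(logical_tb, replication_overhead(3))
with_erasure_coding = raw_capacity_tb(logical_tb, erasure_coding_overhead(4, 2))

print(f"RF=3 replication:   {with_replication:.0f} TB raw")
print(f"4+2 erasure coding: {with_erasure_coding:.0f} TB raw")
```

In this sketch RF=3 needs three times the logical capacity, while the assumed 4+2 layout needs one and a half times. Compression, also listed on the slide, would reduce both figures and is not modeled here.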

Smart Data

[Figure: INTERNET OF THINGS sources (consumer activity such as events, GPS and WiFi; social media; device tracking and logs) feed Cloudian HyperStore. A BIG DATA event processing platform runs analytics on the stored data and returns the result of analysis.]

✓  Analyze more – allows for efficient bulk data analysis in place
✓  Faster time-to-decision
✓  HyperStore scales out with your data – adding nodes for I/O

Integration of Cloudian and Hortonworks

Interoperability : Cloudian & Hortonworks

YARN : Data Operating System

Batch: Map Reduce
Script: Pig
Search: Solr
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Others: In-Memory Analytics, ISV engines

[Figure: the engines above run on YARN across nodes 1 to N.]

Linux, Windows, On-Premise, Cloud

HDFS, S3 Native File System (URI scheme: s3n)
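The slide names the S3 Native File System and its `s3n` URI scheme as the storage interface next to HDFS. As a rough sketch of what such a URI looks like from the client side, the Python below splits an `s3n://` URI into bucket and key and assembles a `hadoop fs -ls` command line without running it. The bucket name, path and credentials are hypothetical. The two `fs.s3n.*` properties are the stock Hadoop s3n credential settings. How the connector is pointed at a HyperStore endpoint is not covered on the slide and is not shown here.

```python
from urllib.parse import urlsplit


def split_s3n_uri(uri):
    """Split an s3n:// URI into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3n":
        raise ValueError(f"expected an s3n:// URI, got {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


def hadoop_ls_command(uri, access_key, secret_key):
    """Build (but do not run) a 'hadoop fs -ls' command for an s3n URI.

    Credentials are passed as generic -D options for illustration; a real
    deployment would more likely keep them in core-site.xml.
    """
    split_s3n_uri(uri)  # validates the scheme before building the command
    return [
        "hadoop", "fs",
        "-D", f"fs.s3n.awsAccessKeyId={access_key}",
        "-D", f"fs.s3n.awsSecretAccessKey={secret_key}",
        "-ls", uri,
    ]


# Hypothetical bucket, path and credentials.
uri = "s3n://sensor-data/2015/06/events.log"
bucket, key = split_s3n_uri(uri)
command = hadoop_ls_command(uri, "EXAMPLE_KEY", "EXAMPLE_SECRET")

print(bucket, key)
print(" ".join(command))
```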

Use Cases

Hadoop for Internet of Things

Clickstream data
Analysis of what people click on – individual web pages and in what order. Clickstream analysis can reveal how users research products and also how they complete their online purchases.
✓  Internet Marketing
✓  Online Commerce

Sentiment data
Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions. Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
✓  Retail
✓  Media & Entertainment

Server log data
Large enterprises build, manage and protect their own proprietary, distributed information networks. Server logs are the computer-generated records that report data on the operations of those networks. When there is a problem, it's one of the first places the IT team looks for a diagnosis.
✓  IT Organizations
✓  Customer Support

Sensor data
From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls. It is net new data that is increasing exponentially in the information age.
✓  Manufacturing
✓  Industrial
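The clickstream use case above (which pages people visit and in what order) can be sketched in a few lines of Python. This is a single-machine illustration with made-up events and hypothetical function names, not a Hadoop job. At scale the same grouping and counting would run as a distributed batch.

```python
from collections import Counter, defaultdict

# Hypothetical click events: (user, timestamp, page), arriving out of order.
CLICKS = [
    ("u1", 3, "/checkout"),
    ("u1", 1, "/home"),
    ("u2", 1, "/home"),
    ("u1", 2, "/product/42"),
    ("u2", 2, "/product/42"),
    ("u2", 3, "/checkout"),
    ("u3", 1, "/home"),
    ("u3", 2, "/search"),
]


def paths_by_user(clicks):
    """Group clicks by user and return each user's pages in time order."""
    by_user = defaultdict(list)
    for user, timestamp, page in clicks:
        by_user[user].append((timestamp, page))
    return {
        user: tuple(page for _, page in sorted(events))
        for user, events in by_user.items()
    }


def most_common_paths(clicks, top_n=3):
    """Count how many users followed each complete page sequence."""
    return Counter(paths_by_user(clicks).values()).most_common(top_n)


top_paths = most_common_paths(CLICKS)
for path, users in top_paths:
    print(users, " -> ".join(path))
```

Here two of the three users follow the same home, product, checkout path, which is the kind of purchase pattern the slide describes.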

Cloudian Smart Support

Thank You!

Matt Aslett [email protected] www.451research.com @maslett
Paul Turner [email protected] www.cloudian.com @CloudianStorage
John Kreisa [email protected] www.hortonworks.com @Hortonworks