Azure Café Marketplace: Hortonworks Data Platform
Learn how Hortonworks Data Platform (HDP), architected, developed, and built completely in the open, provides an enterprise-ready data platform for adopting a Modern Data Architecture.
Agenda
• Introductions
• Big Data Market Trends
• Hortonworks Marketplace Solutions
• Demo with Q&A
• Next Steps
Microsoft Azure Marketplace
An online store for highly optimized and integrated applications and services ready to deploy on Microsoft Azure
• Growing ecosystem of 3,000+ apps or components
• Reduced sales cycle with pre-configured, ready-to-run apps and services
• Streamlined configuration, deployment, and management
• Integrated platform experience
• Top scenarios include: big data, security, networking, DevOps & automation, business continuity & backup, management apps
Hortonworks Company Profile
• The ONLY 100% open source Apache Hadoop data platform
• Founded in 2011
• 1st Hadoop provider to go public: IPO 4Q14 (NASDAQ: HDP)
• 800+ employees across 17 countries
• 1,350 technology partners
• Fastest company to reach $100M in revenue
Page 11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hortonworks Delivers Two Connected Data Platforms: HDP and HDF
• HDF for data in motion: perishable insights
• HDP for data at rest: historical insights
• Together, they deliver actionable intelligence and modern data apps for the Internet of Anything
HDP + HDF Create Modern Data Apps
• Real-Time Cyber Security: protects systems with superior threat detection
• Smart Manufacturing: dramatically improves yields by managing more variables in greater detail
• Connected, Autonomous Cars: drive themselves and improve road safety
• Future Farming: optimizes soil, seeds, and equipment to measured conditions on each square foot
• Automatic Recommendation Engines: match products to preferences in milliseconds
HDF Manages Bidirectional Dataflow
Data flows in both directions between sources, regional infrastructure, and core infrastructure:
• Sources: constrained, high-latency, localized context
• Core infrastructure: hybrid cloud/on-premises, low-latency, global context
Hortonworks DataFlow (powered by Apache NiFi)
• Visual User Interface: drag and drop for efficient, agile operations
• Immediate Feedback: start, stop, tune, and replay dataflows in real time
• Adaptive to Volume and Bandwidth: any data, big or small
• Event-Level Data Provenance: governance, compliance, and data evaluation
• Secure Data Acquisition and Transport: fine-grained encryption for controlled data sharing and selective data democratization
Hortonworks Data Platform Processes Data at Rest
• Data Access: batch, interactive, search, real-time, and machine learning engines running on YARN: Data Operating System (Cluster Resource Management), over HDFS (Hadoop Distributed File System)
• Governance: manage and audit data according to policy
• Operations: manage, monitor, and maintain cluster operations
• Deployment
• Security: authentication, authorization, and encryption for data at rest or in motion
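HDFS stores each file as a sequence of fixed-size blocks and replicates every block across DataNodes. As a rough sizing aid, here is a minimal Python sketch, assuming the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The helper name `hdfs_footprint` is hypothetical, and real clusters often override both settings.

```python
# Hypothetical helper; assumes Hadoop 2.x defaults (dfs.blocksize = 128 MB,
# dfs.replication = 3). Real clusters frequently override these values.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # bytes
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_bytes: int,
                   block_size: int = DEFAULT_BLOCK_SIZE,
                   replication: int = DEFAULT_REPLICATION) -> tuple:
    """Return (blocks, block replicas, raw bytes stored) for one file."""
    if file_bytes < 0 or block_size <= 0 or replication <= 0:
        raise ValueError("sizes must be non-negative and settings positive")
    # Ceiling division: a partial final block still counts as one block.
    blocks = (file_bytes + block_size - 1) // block_size
    replicas = blocks * replication
    # A partial final block only occupies its actual size on disk,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    one_gib = 1024 ** 3
    print(hdfs_footprint(one_gib))  # (8, 24, 3221225472)
```

The raw-bytes figure is why capacity planning for HDFS typically budgets roughly three times the logical data size under default replication.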
Hortonworks Influences the Apache Community
• We Employ the Committers: one third of all committers to the Apache® Hadoop™ project, and a majority in Apache NiFi and other important projects
• Our Committers Innovate and improve Connected Data Platforms
• We Influence the Hadoop Roadmap by communicating important requirements to the community through our leaders
HDP and HDF – Flexible Deployment Options
On-Premises
• HDP on Your Hardware: Linux or Windows
• HDP on Appliance (turnkey Hadoop appliances): Teradata, Microsoft, PSSC Labs
Virtualized
• Your deployment of Hadoop: VMware, Docker, OpenStack
Cloud
• Infrastructure as a Service (IaaS): Amazon EC2, Microsoft Azure, Rackspace
• Hadoop as a Service (HaaS), a managed Hadoop service: Microsoft HDInsight
HDP on Azure
HDP Sandbox on Marketplace
• Single-node HDP cluster on Marketplace
• Fully functional: all HDP components are preinstalled and running
• CentOS 7.1 VM
• Great for getting started
HDP Azure IaaS on Marketplace
• Multi-node HDP on Azure IaaS
• Users specify the number of nodes, the type of VM, and HA/non-HA
• Processes HDFS data on VHD disks attached to VMs
• Can connect to WASB (Windows Azure Storage Blob)
• Great for maximally performing, non-elastic clusters
HDInsight on Marketplace
• Managed PaaS offering by Microsoft
• Great for elastic clusters: compute scales independently of storage
• Can spin up more nodes on demand automatically
• Processes data in WASB (ADLS, Azure Data Lake Store, coming soon)
Cloudbreak on launch.hortonworks…
• Autoscaling HDP clusters on Azure
• Runs HDP in Docker containers
• Processes WASB data
• Great for elastic clusters: scale the compute layer independently from storage
• Periscope scales clusters depending on SLA requirements
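Several of the Azure options read data from Windows Azure Storage Blob (WASB). Hadoop's Azure connector addresses a blob with a URI of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`, where `wasbs` uses TLS. The sketch below builds such URIs; the function name `wasb_uri` and the account and container names in the example are hypothetical, and only the storage account naming rule (3 to 24 lowercase letters and digits) is validated.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")


def wasb_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build a Hadoop WASB URI for a blob path (hypothetical helper).

    'wasbs' encrypts traffic with TLS; 'wasb' does not.
    """
    if not _ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not container:
        raise ValueError("container name is required")
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


print(wasb_uri("mystorageacct", "hdp-data", "/logs/2016/app.log"))
# wasbs://hdp-data@mystorageacct.blob.core.windows.net/logs/2016/app.log
```

A URI like this can then be passed to Hadoop tools wherever an HDFS path would otherwise appear, provided the cluster is configured with the storage account key.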
Classic Hadoop Driver: Cost Optimization
• Archive Data off EDW: move rarely used data to Hadoop as an active archive and store more data longer
• Offload Costly ETL Processes: free your EDW to perform high-value functions like analytics and operations, not ETL
• Enrich the Value of Your EDW: use Hadoop to refine new data sources, such as web and machine data, for new analytical context
HDP helps you reduce costs and optimize the value associated with your EDW.
[Figure: Sources (existing systems such as ERP, CRM, and SCM; new sources such as clickstream, web and social, geolocation, sensor and machine data, server logs, and unstructured data) feed the data systems layer. There, HDP 2.3 handles ELT as well as cold data, deeper archive, and new sources, while the Enterprise Data Warehouse (hot, MPP, in-memory) serves analytics: data marts, business analytics, visualization and dashboards.]
Case Study: 12-Month Hadoop Evolution at TrueCar
[Figure: Data platform capabilities over a 12-month execution plan]
• June 2013: Begin Hadoop execution
• July 2013: Hortonworks partnership
• Aug 2013: Training and development begin
• Nov 2013: Production cluster, 60 nodes, 2 PB
• Dec 2013: Three production apps (3 total)
• Jan 2014: 40% dev staff (Perficient)
• Feb 2014: Three more production apps (6 total)
• May 2014: IPO
12-Month Results at TrueCar
• Six production Hadoop applications
• Sixty nodes/2 PB of data
• Storage costs/compute costs reduced from $19/GB to $0.12/GB
“We addressed our data platform capabilities strategically as a precursor to IPO.”
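To put the TrueCar cost figures in perspective, the change from $19/GB to $0.12/GB can be expressed as a ratio and a percentage: roughly a 158-fold reduction, or about 99.4%. A quick Python check of that arithmetic, using only the two per-GB numbers quoted on the slide (the function name is illustrative):

```python
def cost_reduction(before_per_gb: float, after_per_gb: float) -> tuple:
    """Return (how many times cheaper, percent reduction) between two per-GB costs."""
    if before_per_gb <= 0 or after_per_gb <= 0:
        raise ValueError("costs must be positive")
    factor = before_per_gb / after_per_gb
    percent = (before_per_gb - after_per_gb) / before_per_gb * 100
    return factor, percent


factor, percent = cost_reduction(19.00, 0.12)
print(f"{factor:.0f}x cheaper, {percent:.1f}% lower")  # 158x cheaper, 99.4% lower
```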
Common Apache NiFi Use Cases
• Predictive Analytics: ensure the highest-value data is captured and available for analysis
• Compliance: gain full transparency into the provenance and flow of data
• IoT Optimization: secure, prioritize, enrich, and trace data at the edge
• Fraud Detection: move sales transaction data in real time to analyze on demand
• Big Data Ingest: easily and efficiently ingest data into Hadoop
• Value Resources: gain visibility into how data sources are used to determine value
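Event-level provenance is what makes the compliance use case possible: every action on a piece of data is recorded as an event, so its lineage can be walked back to the source. The toy Python model below illustrates only that idea and is not NiFi's API. The event type names (RECEIVE, CONTENT_MODIFIED, FORK, SEND) are borrowed from NiFi's provenance vocabulary, but the simplified record layout, the `ProvenanceEvent` class, the IDs, and the `lineage` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    """Toy provenance record (hypothetical; simplified from NiFi's model)."""
    event_id: int
    event_type: str   # e.g. RECEIVE, CONTENT_MODIFIED, FORK, SEND
    flowfile_id: str  # which piece of data the event applies to
    parent_ids: list = field(default_factory=list)  # flowfiles this one derives from
    details: str = ""


def lineage(events, flowfile_id):
    """Return events for flowfile_id and all of its ancestors, oldest first."""
    seen, stack = set(), [flowfile_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow parent links recorded on any event for this flowfile.
        for ev in events:
            if ev.flowfile_id == current:
                stack.extend(ev.parent_ids)
    return sorted((ev for ev in events if ev.flowfile_id in seen),
                  key=lambda e: e.event_id)


events = [
    ProvenanceEvent(1, "RECEIVE", "ff-1", details="sales transactions from POS gateway"),
    ProvenanceEvent(2, "CONTENT_MODIFIED", "ff-1", details="masked card numbers"),
    ProvenanceEvent(3, "FORK", "ff-2", parent_ids=["ff-1"], details="split high-value orders"),
    ProvenanceEvent(4, "SEND", "ff-2", details="to fraud-scoring system"),
    ProvenanceEvent(5, "SEND", "ff-3", details="unrelated flow"),
]

for ev in lineage(events, "ff-2"):
    print(ev.event_id, ev.event_type, ev.flowfile_id, ev.details)
```

Tracing `ff-2` returns events 1 through 4, showing an auditor where the data came from and how it was changed before it left the flow.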
Hadoop with MapReduce (2006)
• MapReduce (largely batch processing) on HDFS (Hadoop Distributed File System)
• Silo'd clusters
• Largely batch system
• Difficult to integrate
MR-279: YARN (2009)
Hadoop 2 and YARN-Based Architecture (October 23, 2013)
• YARN: Data Operating System running batch, interactive, and real-time workloads on HDFS
Hortonworks architected and led development of YARN to enable the Modern Data Architecture.
YARN: A Data Operating System
Enables Multi-Tenancy
Better Utilization of Existing Clusters
• 60%–150% improvement in node utilization
Enables Next-Generation Vendor Integration
• YARN is an application framework (e.g., SAS, R, SAP)
Runs Next-Generation Workloads
• Interactive SQL + streaming + ML + …
YARN in Production
• Yahoo: ~40,000 nodes in multiple clusters running YARN across over 365 PB of data
• Spotify, Progressive, Kohl's, UHG, Sprint, JPMC, Target, AIG, Samsung
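Multi-tenancy on YARN is commonly configured through the Capacity Scheduler: each queue under `root` receives a percentage of cluster capacity, and the capacities of sibling queues must add up to 100. The minimal Python check below works over a dictionary of `capacity-scheduler.xml` properties (`yarn.scheduler.capacity.root.queues` and `yarn.scheduler.capacity.root.<queue>.capacity`). The queue names `etl`, `analytics`, and `streaming` and the helper name are hypothetical, and a real deployment would usually also set limits such as `maximum-capacity`.

```python
import math

ROOT = "yarn.scheduler.capacity.root"


def root_queue_capacities(props: dict) -> dict:
    """Parse root's child queues and verify their capacities sum to 100 percent."""
    names = [q.strip() for q in props[f"{ROOT}.queues"].split(",") if q.strip()]
    capacities = {q: float(props[f"{ROOT}.{q}.capacity"]) for q in names}
    total = sum(capacities.values())
    if not math.isclose(total, 100.0):
        raise ValueError(f"child queue capacities under root sum to {total}, expected 100")
    return capacities


# Hypothetical tenants sharing one cluster.
props = {
    f"{ROOT}.queues": "etl,analytics,streaming",
    f"{ROOT}.etl.capacity": "50",
    f"{ROOT}.analytics.capacity": "30",
    f"{ROOT}.streaming.capacity": "20",
}
print(root_queue_capacities(props))  # {'etl': 50.0, 'analytics': 30.0, 'streaming': 20.0}
```

Guaranteeing each tenant a share while allowing idle capacity to be borrowed is one way YARN achieves the better cluster utilization described above.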
YARN: Data Operating System (Cluster Resource Management), over HDFS (Hadoop Distributed File System)
Batch, Interactive & Real-Time Data Access:
• Script: Pig (on Tez)
• SQL: Hive (on Tez)
• Java/Scala: Cascading (on Tez)
• Stream: Storm
• Search: Solr
• NoSQL: HBase, Accumulo (via Slider)
• In-Memory: Spark
• Others: ISV engines (via Slider)
How Do You Operate a Hadoop Cluster?
Apache Ambari is a platform to provision, manage, and monitor Hadoop clusters.
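Ambari exposes its provisioning and monitoring functions through a REST API under `/api/v1`. The minimal Python sketch below prepares a request that lists the services of one cluster, assuming Ambari's default HTTP port 8080 and basic authentication. The host name, cluster name, and credentials are placeholders. Ambari requires the `X-Requested-By` header on modifying requests (POST, PUT, DELETE); it is harmless on a GET and is included so the same helper could be reused.

```python
import base64
import json
import urllib.request


def ambari_services_request(host: str, cluster: str, user: str, password: str,
                            port: int = 8080) -> urllib.request.Request:
    """Prepare (but do not send) a GET for the services of one Ambari-managed cluster."""
    url = f"http://{host}:{port}/api/v1/clusters/{cluster}/services"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",
        "X-Requested-By": "ambari",  # required by Ambari for modifying calls
    })


def list_services(req: urllib.request.Request) -> list:
    """Send the request and return service names from the JSON 'items' list."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    return [item["ServiceInfo"]["service_name"] for item in body.get("items", [])]
```

Against a live Ambari server, `list_services(ambari_services_request("ambari.example.com", "demo", "admin", "secret"))` would return names such as HDFS or YARN for whatever services that cluster actually runs. Production setups should use HTTPS rather than plain HTTP for basic-auth credentials.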
Detailed Reference Architecture for IoT Applications
• Source data: server logs, application logs, firewall logs, CRM/ERP, sensors
• High-speed ingest: HDF, Flume, Sqoop, and Kafka (stream to HDF, forward to Storm)
• Real-time: Storm/Spark Streaming; Storm bolts write to HDFS and to HBase/Phoenix for real-time storage and event enrichment; JMS alerts; Silk dashboard
• Batch: Flume sink to HDFS; transforms with Pig and Hive
• Interactive: HiveServer and Spark Thrift server for reporting and BI tools
• Machine learning: iterative Spark-ML models
• UI framework
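The reference architecture splits one ingest stream into a real-time path (alerts) and a batch path (storage for later Hive, Pig, or Spark-ML processing). Below is a toy, in-memory Python model of that split. It uses no Kafka, Storm, or HDFS APIs; the deque, the two lists, the sensor fields, the threshold, and the function name are all hypothetical stand-ins.

```python
from collections import deque


def run_pipeline(readings, threshold):
    """Toy model: route every reading to storage, and out-of-range readings to alerts."""
    topic = deque(readings)   # stands in for a Kafka topic
    alerts, storage = [], []  # stand in for a JMS alert queue and an HDFS sink
    while topic:
        event = topic.popleft()  # the "stream processor" consumes one event at a time
        storage.append(event)    # batch path: everything is kept for later analysis
        if event["value"] > threshold:
            # real-time path: only anomalous readings raise an alert
            alerts.append(f"ALERT sensor={event['sensor']} value={event['value']}")
    return alerts, storage


readings = [
    {"sensor": "s1", "value": 41.0},
    {"sensor": "s2", "value": 97.5},
    {"sensor": "s1", "value": 43.2},
]
alerts, storage = run_pipeline(readings, threshold=90.0)
print(alerts)        # ['ALERT sensor=s2 value=97.5']
print(len(storage))  # 3
```

Keeping every event on the batch path, rather than only the alerts, is what lets the historical layer later retrain models on the full data set.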
Azure Café Next Steps
For more information regarding the Azure Marketplace and Hortonworks solutions, contact:
• Marti Stephens-Hartka, Microsoft ISV Leader, East Region
• Saptak Sen, Hortonworks Group Manager, Partner Solutions
Additional Resources:
• Azure HDInsight: https://azure.microsoft.com/en-us/services/hdinsight/
• Hortonworks Sandbox on Azure Marketplace: https://azure.microsoft.com/en-us/marketplace/partners/hortonworks/hortonworks-sandbox/
• Hortonworks Data Platform on Azure Marketplace: https://azure.microsoft.com/en-us/marketplace/partners/hortonworks/hortonworks/
• Hortonworks Customer Stories: http://hortonworks.com/customers/
• Hortonworks Blog: http://hortonworks.com/blog/
• Microsoft Cortana Analytics Suite: http://www.microsoft.com/en-us/server-cloud/cortana-analytics-suite/overview.aspx
• Azure Data Lake Analytics: https://azure.microsoft.com/en-us/solutions/data-lake/
• Hortonworks and Microsoft on YouTube: https://www.youtube.com/watch?v=zWVlOMlzZgw&feature=youtu.be
https://tryazuremarketplace.com/
Complete labs for top AMP ISVs: Hortonworks, Barracuda, Chef, Docker, Kemp, and coming soon (Hanu and more)
• Holistic: most popular Azure Marketplace solutions in 4 tracks
• Programmatic: 3-week intervals between same-track ISVs; onboarding assistance with lab setup
• Targeted: need access to an Azure subscription
Lab schedule by track:
• DevOps: Chef (April 27), Docker (May 11)
• Security: Barracuda (April 20), Kemp (May 18)
• Big Data: Hortonworks (May 5), DataStax (May 25)
• Management: Cloud Cruiser (May 18), Hanu (June 8)
• Also scheduled: CoreOS (June 1), Nasuni (June 15)
*Registration links not yet available
Barracuda Bus Tour Brief
Activity Name: Barracuda + Microsoft North America Bus/MTC Tour
Approximate Length: 15 cities, 02/09/16 – 04/21/16
General Overview
The Barracuda Bus Tour is an annual event; this is Barracuda’s fifth annual tour, but the first with a partner. The goals of this year’s tour with Microsoft are to:
• Drive awareness of the Barracuda/Microsoft solutions: Office 365 and Azure
• Target engagement across three focus areas: customers, partners, and Microsoft sellers
• Drive pipeline/revenue: Azure consumption and Office 365 active usage
Event Track
• 10:30am – 12:45pm: Customer Seminar, Migration and Security for Azure and Office 365
• 1:30pm – 3:30pm: Partner & Seller Seminar, Migration and Security for Azure and Office 365
Goals/Metrics: solution awareness, education, leads/revenue, Azure consumption, Office 365 active usage
Registration Link: https://www.barracuda.com/programs/expedition
Date(s) Location/ State
Tues, 2/16 Mountain View, CA
Tues, 2/23 Irvine, CA
Wed, 2/24 Los Angeles, CA
Wed, 3/16 Portland, OR
Thurs, 3/17 Seattle, WA
Wed, 3/30 Dallas, TX
Thurs, 3/31 Houston, TX
Wed, 4/6 Minneapolis, MN
Thurs, 4/7 Chicago, IL
Mon, 4/11 Detroit, MI
Wed, 4/13 Boston, MA
Thurs, 4/14 New York, NY
Mon, 4/18 Philadelphia, PA
Tues, 4/19 Reston, VA
Wed, 4/20 Charlotte, NC
Thurs, 4/21 Atlanta, GA