© Hortonworks Inc. 2011 – 2014. All Rights Reserved
Getting to What Matters: Accelerating Your Path Through the Big Data Lifecycle with CSC and Hortonworks
Presenters
• John Kreisa (@marked_man), VP Strategic Marketing, Hortonworks. Over 20 years in data management as a developer and a marketer.
• Tim Gasper (@TimGasper), Global Offerings Manager, CSC. Led product for Infochimps for 4 years, now called the CSC Big Data PaaS; leads product/offering management for CSC Big Data & Analytics.
Traditional systems under pressure
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale
[Figure: business value of traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume grows from 2.8 zettabytes in 2012 to 40 zettabytes in 2020; the chart contrasts laggards with industry leaders.]
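The two data-volume figures on this slide (2.8 zettabytes in 2012, 40 zettabytes in 2020) imply a yearly growth rate that the slide does not state. A minimal sketch of that arithmetic, assuming smooth compound growth between the two points (an assumption; the slide gives only the endpoints), and using a hypothetical helper name:

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1

# Figures from the slide: 2.8 ZB in 2012 and 40 ZB in 2020.
# Assumption: growth compounds evenly over the 8 years in between.
cagr = compound_annual_growth_rate(start=2.8, end=40.0, years=2020 - 2012)
print(f"Implied growth: {cagr:.1%} per year")
```

Under that assumption the data volume grows by roughly 39% per year, which is the pressure the "Costly to scale" bullet refers to.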
Hadoop emerged as foundation of new data architecture
Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data
• Built by Yahoo! to be the heartbeat of its ad & search business
• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises
• Incredibly disruptive to current platform economics
Advantages
✓ Manages new data paradigm
✓ Handles data at scale
✓ Cost effective
✓ Open source
[Figure: traditional Hadoop stack, with an application running on batch processing (MapReduce) over storage (HDFS).]
Hadoop is deeply integrated in the data center
• Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as HP, Microsoft, Red Hat, SAP, SAS & Teradata
• Broad Partnerships: Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users
[Figure: HDP 2.2 (Governance & Integration, Security, Operations, Data Access, Data Management, YARN) in the data center, surrounded by systems integrators, operational tools, dev & data tools, and infrastructure. Sources: existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data. Data systems: RDBMS, EDW, MPP. Applications sit on top.]
CSC and the Modern Data Architecture
Modern Data Architecture
• Enable applications to have access to all your enterprise data through an efficient centralized platform
• Supported by a centralized approach to governance, security and operations
• Versatile to handle any applications and datasets no matter the size or type
CSC Extends Hadoop’s Reach
• Allows for multiple deployment options - including on-premise, managed or Big Data as a Service.
• CSC’s global consulting services can help you architect, develop and implement your big data strategy, analytics, integrations, and platforms
[Figure: modern data architecture. Sources: clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing systems (ERP, CRM, SCM). Data systems: HDFS (Hadoop Distributed File System) under YARN, the data operating system, running batch, interactive, real-time and partner ISV workloads alongside MPP and EDW systems. Analytics and applications: data marts, business analytics, visualization & dashboards.]
Hadoop Driver: Cost optimization
• Archive data off EDW: Move rarely used data to Hadoop as active archive, store more data longer
• Offload costly ETL process: Free your EDW to perform high-value functions like analytics & operations, not ETL
• Enrich the value of your EDW: Use Hadoop to refine new data sources, such as web and machine data, for new analytical context
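To make the "archive rarely used data" idea concrete, here is a minimal sketch of how archive candidates might be picked from warehouse metadata. Everything in it is an assumption for illustration: the catalog entries, the `last_queried` field, the `split_hot_cold` helper and the one-year idle threshold are hypothetical, not part of HDP or any EDW product.

```python
from datetime import date, timedelta

def split_hot_cold(tables, today, max_idle_days=365):
    """Partition table metadata into hot (stay in the EDW) and cold (archive candidates)."""
    cutoff = today - timedelta(days=max_idle_days)
    hot, cold = [], []
    for table in tables:
        # A table not queried since the cutoff is treated as rarely used.
        (cold if table["last_queried"] < cutoff else hot).append(table["name"])
    return hot, cold

# Hypothetical catalog entries; names and dates are made up for illustration.
catalog = [
    {"name": "orders_2014", "last_queried": date(2014, 11, 2)},
    {"name": "orders_2009", "last_queried": date(2011, 3, 15)},
    {"name": "web_logs_2012", "last_queried": date(2013, 1, 20)},
]

hot, cold = split_hot_cold(catalog, today=date(2014, 12, 1))
print("Keep in EDW:", hot)
print("Archive to Hadoop:", cold)
```

In practice the threshold would likely come from query logs and retention policy rather than a fixed number of days.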
HDP helps you reduce costs and optimize the value associated with your EDW
[Figure: sources (clickstream, web & social, geolocation, sensor & machine, server logs, unstructured data, and existing ERP, CRM, SCM systems) feed the data systems layer, where HDP 2.2 holds cold data, deeper archive & new sources and connects through ELT to the hot tier: enterprise data warehouse, MPP, and in-memory systems. Analytics: data marts, business analytics, visualization & dashboards.]
Hadoop Driver: Advanced analytic applications
• Single View: Improve acquisition and retention
• Predictive Analytics: Identify your next best action
• Data Discovery: Uncover new findings
Financial Services
• New Account Risk Screens
• Trading Risk
• Insurance Underwriting
• Improved Customer Service
• Insurance Underwriting
• Aggregate Banking Data as a Service
• Cross-sell & Upsell of Financial Products
• Risk Analysis for Usage-Based Car Insurance
• Identify Claims Errors for Reimbursement
Telecom
• Unified Household View of the Customer
• Searchable Data for NPTB Recommendations
• Protect Customer Data from Employee Misuse
• Analyze Call Center Contacts Records
• Network Infrastructure Capacity Planning
• Call Detail Records (CDR) Analysis
• Inferred Demographics for Improved Targeting
• Proactive Maintenance on Transmission Equipment
• Tiered Service for High-Value Customers
Retail
• 360° View of the Customer
• Supply Chain Optimization
• Website Optimization for Path to Purchase
• Localized, Personalized Promotions
• A/B Testing for Online Advertisements
• Data-Driven Pricing, Improved Loyalty Programs
• Customer Segmentation
• Personalized, Real-time Offers
• In-Store Shopper Behavior
Manufacturing
• Supply Chain and Logistics
• Optimize Warehouse Inventory Levels
• Product Insight from Electronic Usage Data
• Assembly Line Quality Assurance
• Proactive Equipment Maintenance
• Crowdsource Quality Assurance
• Single View of a Product Throughout Lifecycle
• Connected Car Data for Ongoing Innovation
• Improve Manufacturing Yields
Healthcare
• Electronic Medical Records
• Monitor Patient Vitals in Real-Time
• Use Genomic Data in Medical Trials
• Improving Lifelong Care for Epilepsy
• Rapid Stroke Detection and Intervention
• Monitor Medical Supply Chain to Reduce Waste
• Reduce Patient Re-Admittance Rates
• Video Analysis for Surgical Decision Support
• Healthcare Analytics as a Service
Oil & Gas
• Unify Exploration & Production Data
• Monitor Rig Safety in Real-Time
• Geographic Exploration
• DCA to Slow Well Decline Curves
• Proactive Maintenance for Oil Field Equipment
• Define Operational Set Points for Wells
Government
• Single View of Entity
• CBM & Autonomic Logistic Analysis
• Sentiment Analysis on Program Effectiveness
• Prevent Fraud, Waste and Abuse
• Proactive Maintenance for Public Infrastructure
• Meet Deadlines for Government Reporting
Hadoop Driver: Enabling the data lake
Journey to the Data Lake with Hadoop
Drivers:
1. Cost Optimization
2. Advanced Analytic Apps
Goal:
• Centralized Architecture
• Data-driven Business
Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value
[Figure: the journey to the data lake plotted as scale versus scope, ending at the data lake and systems of insight.]
Case Study: 12 month Hadoop evolution at TrueCar
12 months execution plan (chart axis: data platform capabilities)
• June 2013: Begin Hadoop execution
• July 2013: Hortonworks partnership
• Aug 2013: Training & dev begins
• Nov 2013: Production cluster, 60 nodes, 2 PB
• Dec 2013: Three production apps (3 total)
• Jan 2014: 40% of dev staff proficient
• Feb 2014: Three more production apps (6 total)
• May 2014: IPO
12 Month Results at TrueCar
• Six production Hadoop applications
• Sixty nodes / 2 PB data
• Storage costs/compute costs from $19/GB to $0.23/GB
“We addressed our data platform capabilities strategically as a pre-cursor to IPO.”
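The cost figures quoted for TrueCar ($19/GB before, $0.23/GB after) can be turned into a ratio and a percentage saving. A small sketch of that arithmetic; the helper name `cost_reduction` is hypothetical, and only the two per-GB figures come from the slide:

```python
# Per-GB costs quoted on the slide, before and after the Hadoop rollout.
COST_BEFORE_PER_GB = 19.00
COST_AFTER_PER_GB = 0.23

def cost_reduction(before, after):
    """Return (ratio, percent_saved) for a per-unit cost change."""
    ratio = before / after
    percent_saved = (before - after) / before * 100
    return ratio, percent_saved

ratio, percent_saved = cost_reduction(COST_BEFORE_PER_GB, COST_AFTER_PER_GB)
print(f"{ratio:.1f}x cheaper per GB, a {percent_saved:.1f}% reduction")
```

The slide does not say which workloads or time period the two figures cover, so the result is best read as an order-of-magnitude comparison.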
CSC Big Data & Analytics
• Fastest Time to Value: Proven methodologies and customer success stories achieving insight in 30 days and production rollout in 90.
• Industry Analytics Expertise: Experience combining horizontal analytics approaches and techniques with industry and vertical specialization.
• Global Solutions Integrator: Worldwide delivery capabilities and experience with a broad set of both open and proprietary technologies and vendors.
• End-to-End Consulting: Taking customers on a journey from strategy and roadmap, to business and technology transformation, to ongoing SLA management and as-a-Service.
CSC Big Data Platform as a Service
• Platform pillars: Hadoop, Queries, Streams, CSC Command and Control
• Technologies: HDFS, YARN, MR, Spark, …; Hive w/ Tez; HBase; Accumulo; MongoDB; Elasticsearch; PostgreSQL; PostGIS; DataStax; TitanDB; Storm; Kafka
• Workloads: ETL, Data Transformation, Business Intelligence, Data Mining, Advanced Analytics, Geolocation
• CSC Command and Control: Deployment Center, Operations Center, Support Center, Application Center, Knowledge Center
• Flexible Deployment Options: Public Cloud, Virtual Private Cloud, Enterprise Private Cloud, Dedicated Cluster
• Enterprise Grade Security: Access Control, Compliance Support, Perimeter Security, Activity Monitoring, Audit Logging, Encryption, Malware Protection, Hardened OS
Across the Industries, Clients See the Possibilities
Financial Services
• Fraud detection
• Risk management
• 360° view of the customer
Utilities
• Analysis of weather impact on power generation
• Transmission monitoring
• Smart grid management
Transportation
• Real-time route optimization based on traffic and weather
• Maintenance optimization and asset tracking
Health and Life Sciences
• Epidemic early warning system
• ICU monitoring
• Remote healthcare monitoring
Retail
• 360° view of the customer
• Click-stream analysis
• Real-time promotions
Telecommunications
• CDR processing
• Churn prediction
• Geomapping/marketing
• Network monitoring
Law Enforcement
• Real-time multimodal surveillance
• Situational awareness
• Cybersecurity detection
Manufacturing
• Predictive maintenance
• Real-time parts flow monitoring
• Product configuration planning
But They Struggle With Consistent Challenges
1. Setting up and operating a big data and analytics platform
• Data complexity
• Robust and scalable service
• Speed of stand-up
2. Applying the right data science
3. Integrating insights into their business processes
4. Identifying and managing big data skills
• Skills shortage
• Skills retention
Time to Value, Time to Next Iteration
Data Warehouse Project: 12-24 Months to Reach Production
• Stages: Business Discovery; Info Discovery; Logical Data Model; Physical Data Model; System Staging; Data Ingestion, Transformation, ETL; Application Development; Analytics; Production Staging
Big Data Project: 3-6 Months to Reach Production
• Stages: Business Discovery; Info Discovery; System Staging; Initial Data Ingest; repeated cycles of Schema on Read, Analytics, and App Dev (six cycles shown); Production Staging
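The "Schema on Read" step in the big data timeline means raw data is stored as it arrives and a structure is applied only when a given analysis reads it. A minimal sketch in plain Python, assuming JSON-lines input; the sample events, field names and the `read_with_schema` helper are hypothetical and stand in for what a query engine such as Hive would do at scale:

```python
import json
from datetime import datetime

# Raw events are stored exactly as they arrived; no schema is enforced on write.
RAW_EVENTS = [
    '{"ts": "2014-02-01T10:15:00", "user": "u1", "page": "/home"}',
    '{"ts": "2014-02-01T10:16:30", "user": "u2", "page": "/pricing", "ref": "ad"}',
    '{"ts": "2014-02-01T10:17:05", "user": "u1"}',
]

# The reader chooses the schema: field name -> (parser, default if missing).
PAGE_VIEW_SCHEMA = {
    "ts": (datetime.fromisoformat, None),
    "user": (str, None),
    "page": (str, "unknown"),
}

def read_with_schema(lines, schema):
    """Parse raw JSON lines and project them onto the schema at read time."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (parse, default) in schema.items():
            value = record.get(field)
            row[field] = parse(value) if value is not None else default
        yield row

rows = list(read_with_schema(RAW_EVENTS, PAGE_VIEW_SCHEMA))
pages = [row["page"] for row in rows]
print(pages)
```

A later iteration could define a different schema, for example one that keeps the `ref` field, and read the same stored events without re-ingesting them, which is why each cycle on the slide starts again at Schema on Read rather than at data modeling.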
Following the Big Data Maturity Lifecycle and…
• Determining use cases
• Art of the possible
• Technology evaluation & understanding
• Validate business value hypothesis with real data
• Quick win, low hanging fruit, rapid initial phase
• Implement one key transformation or insight into business process
• Longer project timelines and robust ROI tracking
• Expand to other key use cases for a big data enabled department or business function
• Incorporate complementary tools and technology for a broader solution
• Shift from a department or function focus to a cross-org focus
• Introduce insights from across silos
• Implement self-service capabilities for analytics and data integration
• Provide marketplaces, catalogs, and collaboration zones
… Leveraging an App Reference Design Framework
It’s all about the apps.
Proof of Value: Food & Hospitality Retailer
This Food & Hospitality Retailer has a footprint of over 650 regional hotels, 2,800 coffee shops, and a number of restaurant chains. CSC provides the infrastructure, data platform, and analytics that uncover revenue opportunities in customer web interactions.
CHALLENGE
• The client wanted to quickly evaluate the use of big data and the value it brings in identifying new business opportunities
• Ease of use was a key need in making insights and reporting more accessible to analysts, and in increasing the speed with which they could analyze
• Time to market was a key factor in the decision to implement a comprehensive big data platform. The client realized:
– A bare platform would not be easy to manage
– Their staff does not possess the skills to operate a bare platform
– They needed to focus on the big data applications, rather than the platform
SOLUTION
• CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to transport web activity data within 90 days:
– Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez
– Aggregating lots of different data sources to create one massive web log data set
– Adding data science algorithms to clean up data for better insights
– Providing Pentaho Business Analytics as a comprehensive reporting and dashboard suite for insight presentation
• CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client
RESULTS
• The client is generating insights on how customers interact with their website, and improving their services for happier customers and more streamlined business:
– Faster path to ROI with both tech and services
– Creating a real-time customer insights dashboard and set of reports
– Ability to prove the value of big data internally through the mining of data and generation of insights and reports for various teams
– Scalability to more data sources and use cases, including plans for mobile application analytics and operational metrics, as well as operational business analytics combining internal and external data sources
Business Unit Strategy: Network Rail
Network Rail manages most of the rail infrastructure across Great Britain, responsible for control and maintenance of over 2,500 railway stations, 20,000 miles of track, and 40,000 bridges and tunnels. CSC provides a data and analytics hub for massive amounts of imagery and analog track monitoring data.
CHALLENGE
• Network Rail needed a platform that could not only store, but also analyze petabytes of data over the long term:
– Track imagery and video data captured via drones and cameras
– Vibration data captured via maintenance trains
– Other forms of large file size analog data crossed with operational, structured data sets
• Network Rail wanted to implement the solution quickly, and ramp up data volumes at a fast pace
• Goal of leveraging combined services to assist with loading data, managing the underlying infrastructure, and working with and analyzing the data
SOLUTION
• CSC designed and configured the solution, built and deployed it in the cloud, and developed ETL flows to import massive amounts of bulk data on an ongoing basis:
– Core platform (BDPaaS) leveraging Hortonworks Data Platform, including Hive with Tez
• CSC’s platform integrated with ESRI ArcGIS for Big Data geolocation analysis features including geotagging and geo tiles
• CSC managed the infrastructure, platform components, and data flows, in addition to providing continued support/consultation services to the client
RESULTS
• Network Rail is generating insights on how to prioritize in near real-time the improvement and maintenance of the massive railway track and infrastructure footprint:
– Advanced analytics of analog data, including geolocation capabilities
– Ability to handle the scale required by the massive amount of data under management and data growth
– Complete transformation of a business unit’s analytics capability on track for success in less than 12 months
Question & Answer session will be conducted electronically, using the panel to the right of your screen
Next Steps
• Get started with Hortonworks Sandbox: http://hortonworks.com/sandbox
• Follow us: @hortonworks
Learn More
• CSC Big Data Maturity Survey: http://www.csc.com/big_data_index
• CSC Big Data Home: http://www.csc.com/big_data
• @CSCNews