DESCRIPTION
The Briefing Room with Dr. Robin Bloor and TechWise. Live webcast on April 23, 2014. Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=fa24dc305208c34cd98fb63f62797323 Few innovations over the past 40 years compare to Hadoop in hype or significance. But despite the vast array of software vendors touting their Hadoop strategies, very few businesses have yet fully wrapped their heads around what this development means, or where it's going. Register for this inaugural episode of TechWise to hear veteran IT analyst Dr. Robin Bloor explain how Hadoop is vastly different from Linux or SOA, and why its future remains largely unwritten. He will be joined by Constellation Research founder Ray Wang, who will provide his perspective on the Hadoop landscape. They will then take questions from the audience about any facet of this transformative trend. Visit InsideAnalysis.com for more information.
Grab some coffee and enjoy the pre-show banter before the top of the hour!
The Briefing Room
Hand in Hand—Optimizing the Data Warehouse for Big Data
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics
This Month: BIG DATA
May: DATABASE
June: ANALYTICS & MACHINE LEARNING
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
Analyst: Claudia Imhoff
Claudia Imhoff is President & Founder of Intelligent Solutions, Inc.
Pentaho
• Pentaho offers a suite of open source business intelligence products called Pentaho Business Analytics
• Pentaho’s big data solution provides access to any data source, and includes data integration, discovery, analysis and visualization
• Pentaho’s solutions are available in community or enterprise editions
Guest: Chuck Yarbrough
Chuck is the Director of Big Data Product Marketing at Pentaho, a leading big data analytics company that helps organizations engineer big data connections, blend data, and report on and visualize all of their data. Much of Chuck's focus at Pentaho is on educating organizations about how the proper use of big data can help them win, serve and retain customers, lower costs and grow revenue. A lifelong participant in the data game, Chuck has held leadership roles at Deloitte Consulting, SAP Business Objects, Hyperion and National Semiconductor.
© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Data Warehouse Optimization Blueprint
Chuck Yarbrough
Director, Big Data Product Marketing @cyarbrough
April 29, 2014
OUR VISION
The New Reality: Powerful yet simplified analytics for all users
Analytics: ANY Analytics • Reports • Dashboards • Visualizations • Discovery • Predictive • Any role
Existing & New Data Infrastructure & Processes: ANY Environment • Data warehouses • Data marts • Stack vendors • Cloud • Embedded
ANY Data • Relational • Operational • Big Data • Data sources not yet anticipated (e.g., Billing, Location, Social Media, Customer, Web, Network)
Emerging big data use cases demand blending multiple data sources
Improve operational effectiveness
• Machines/sensors: predict failures, network attacks
• Financial risk management: reduce fraud, increase security
Reduce data warehouse cost
• Integrate new data sources without increased database cost
• Provide online access to ‘dark data’
Drive incremental revenue
• Predict customer behavior across all channels
• Understand and monetize customer behavior
• Begin to monetize data as a service
A Spectrum of Big Data Use Cases: What the Market is Deploying Today and Planning for Tomorrow
[Chart: use cases plotted by Use Case Complexity (Entry to Advanced) against Business Impact (Optimize, Transform, Monetize My Data), with Data Warehouse Optimization highlighted]
Use cases shown: Data Warehouse Optimization • Streamlined Data Refinery • Big Data Exploration • Customer 360 Degree View • Harnessing Machine & Sensor Data • Next Generation Applications • Internal Big Data as a Service • On-Demand Big Data Blending • Big Data Predictive Analytics
Data Warehouse Optimization: Remove the clutter and connect to Big Data
Cut Downtime and Focus on Product Creation
Remove Costly Legacy Systems
Simplicity Empowers Business Users
“Using Pentaho in our data warehouse, it now takes about 20 minutes to break down a metric and do specific analysis to identify performance issues. In the past, similar queries would take all night.” Greg Allen, Business Analyst, Kiva
“Pentaho Data Integration not only simplifies the data delivery process but also enables us to gather the high-quality data. Ultimately Pentaho has enabled us to reach our goal of making the Swiss real estate market more transparent.” Prof. Dr. Peter Ilg, Managing Director, Swiss Real Estate Datapool
“We needed fully functional reporting and data integration tools but wanted to cut the cost burden experienced with Oracle. After looking at what was out there, Pentaho had the complete tool set, and after further testing, our users noticed no difference in the features they need.” Uwe Geercken, IT Manager, Swissport
Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users
What is it?
• Existing DW infrastructure can’t keep up with the data explosion, and adding DW capacity is costly
• So offload low-priority data to a big data store to extend capacity
Why Do It?
• Save data capacity and management costs
• Empower IT and business users to meet goals on time
Key Considerations
• Usually built on Hadoop
• Relevant across industries
• May require new coding skill sets that are hard to find
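The offload decision above can be pictured as a simple tiering rule applied to warehouse rows. This is a minimal sketch under stated assumptions: the `Record` type, the 90-day threshold and the `split_for_offload` function are hypothetical illustrations, not part of Pentaho Data Integration, and a real deployment would express the rule as a PDI job or as SQL against the warehouse.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    """A hypothetical warehouse row: a key, its load date, and recent query count."""
    key: str
    loaded_on: date
    queries_last_30_days: int


def split_for_offload(records, today, max_age_days=90, min_queries=1):
    """Split rows into those kept in the warehouse and those offloaded to Hadoop.

    Assumed policy (illustrative only): a row stays in the warehouse if it is
    recent or still being queried; everything else counts as low priority.
    """
    cutoff = today - timedelta(days=max_age_days)
    keep, offload = [], []
    for r in records:
        if r.loaded_on >= cutoff or r.queries_last_30_days >= min_queries:
            keep.append(r)
        else:
            offload.append(r)
    return keep, offload


if __name__ == "__main__":
    today = date(2014, 4, 29)
    rows = [
        Record("order-1", date(2014, 4, 1), 12),   # recent and busy: stays
        Record("order-2", date(2012, 1, 15), 0),   # old and idle: offloaded
        Record("order-3", date(2011, 6, 30), 3),   # old but still queried: stays
    ]
    keep, offload = split_for_offload(rows, today)
    print([r.key for r in keep])     # ['order-1', 'order-3']
    print([r.key for r in offload])  # ['order-2']
```

Keeping "still queried" rows in the warehouse regardless of age is one design choice among many; an organization could equally tier by business value or regulatory retention rules.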
Data Warehouse Optimization: Shrink Data Costs & Boost Analytics Performance for Business Users
[Architecture diagram, built up over six slides: CRM & ERP Systems and a Data Warehouse connected by PDI (Pentaho Data Integration); then a Hadoop Cluster linked to the warehouse through PDI; then Other Data Sources loaded through PDI; then an Analytic Data Mart fed through PDI; finally a Relational Layer]
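One way to reason about the build-up in this diagram is as a dependency graph of integration jobs that must run in a valid order. The job names and dependencies below are assumptions read loosely off the diagram, not Pentaho's actual job design; the sketch uses the standard-library `graphlib` module (Python 3.9+) to compute one valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical PDI jobs mapped to their upstream dependencies (assumed, not from Pentaho).
# A job runs only after every job in its set has finished.
PIPELINE = {
    "load_crm_erp_to_dw": set(),
    "offload_dw_to_hadoop": {"load_crm_erp_to_dw"},
    "ingest_other_sources_to_hadoop": set(),
    "build_analytic_data_mart": {"offload_dw_to_hadoop", "ingest_other_sources_to_hadoop"},
}


def run_order(pipeline):
    """Return one valid execution order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step, job in enumerate(run_order(PIPELINE), start=1):
        print(step, job)
```

Independent jobs (here, the CRM/ERP load and the other-source ingest) have no fixed relative order, which is what allows a scheduler to run them in parallel.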
Data Warehouse Optimization: Cost-effective, fast processing
A Major Financial Institution
Business Challenge
• Gain competitive advantage through intraday balance reporting for commercial customers
• Use Hadoop and relational data stores to process huge volumes
Pentaho Benefits
• Graphical orchestration for Hadoop, HBase & DB2 data integration workloads
• 15x faster to develop, 10x faster to execute
• No coding
• Integrate with existing
• Easy-to-find resources
A Major Financial Institution: Optimize data infrastructure to connect hundreds of interdependent banking applications
[Architecture diagram components: Hundreds of Enterprise Data Sources (Customer & Account Master Data, Payments Data, Cash Processing Data, Other Financial Apps); PDI; Hadoop Cluster (Scalable Enterprise Data Hub); Historical Data Mart; Data Marts; consumers: Internal User Reporting & Data Mining, and Client Statements, Balance, Transaction Reporting & Analytics]
Thank You
JOIN THE CONVERSATION. YOU CAN FIND US ON:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
Streamlined Data Refinery: Drive a Sustainable Analytics Strategy with Big Data ETL at Scale
What is It? In the face of exploding volumes of transaction, customer, and other data, traditional ETL systems slow down, making analytics unworkable. One solution is to route most data through a scalable big data processing hub that pushes refined data to a data warehouse or analytical database for low-latency self-service analytics across a diverse base of data.
Why Do It?
• Give business users insight into all data
• Scale ETL and data management cost savings
• Next step after DW optimization
Benefits
• Establish usable analytics on diverse, high-volume sources (terabytes and up)
• Speed up queries substantially with rapid ingestion and powerful processing
• Reduce the cost of ETL processing
Challenges
• Expansive integration project
• May require new coding skill sets that are hard to find
• Depending on requirements, may call for replacing the data warehouse with a higher-performing analytic database
Vertical Fit: High Tech, Telecom, Media, Financial Services, etc.
Technology Fit: Primarily Hadoop, but also NoSQL
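To make "refined data" concrete, here is a minimal map/shuffle/reduce sketch in plain Python that distills raw transaction lines into per-customer totals, the kind of summary a refinery might push to an analytic database. The `customer_id,amount` line format and the function names are assumptions; this shows the general MapReduce pattern in a single process, not Pentaho MapReduce or Hadoop's Java API.

```python
from collections import defaultdict


def map_line(line):
    """Map step: parse one raw 'customer_id,amount' line into (key, value) pairs.

    Malformed lines emit nothing, a simple stand-in for data cleansing.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return []
    customer, amount = parts
    try:
        return [(customer, float(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Shuffle step: group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce step: collapse one customer's values into a refined summary row."""
    return key, {"transactions": len(values), "total": round(sum(values), 2)}


def refine(raw_lines):
    pairs = [pair for line in raw_lines for pair in map_line(line)]
    return dict(reduce_group(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    raw = ["c1,10.50", "c2,3.00", "c1,4.25", "garbage", "c2,not-a-number"]
    print(refine(raw))
    # {'c1': {'transactions': 2, 'total': 14.75}, 'c2': {'transactions': 1, 'total': 3.0}}
```

On a cluster, the map and reduce steps run in parallel across nodes and the shuffle moves data over the network; the logic per record stays the same.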
Streamlined Data Refinery: Drive a Sustainable Analytics Strategy with Big Data ETL at Scale
Why
• Offers a full platform for this use case, including broad data integration (incl. leading Hadoop distros and analytic DBs) and a powerful array of easy-to-use front-end analytics
• Visual MapReduce reduces the need for additional developers and makes big data accessible to existing IT staff
• Pentaho MapReduce runs much faster in the cluster than other scripting tools
[Diagram: Transactions (batch & real-time), Enrollments & Redemptions, and Location, Email & Other Data → PDI → Hadoop Cluster → PDI → Analytic Database → Analyzer and Reports]
Pentaho Big Data Analytics Platform: Simplified data preparation and analytics for all users
• Simplified Analytics Experience
• Enterprise Big Data Integration
• Blended Big Data
Copyright © 2014, Intelligent Solutions, Inc., All Rights Reserved
Claudia Imhoff
President and Founder, Intelligent Solutions, Inc.
A thought leader, visionary, and practitioner, Claudia Imhoff, Ph.D., is an internationally recognized expert on analytics, business intelligence, and the architectures that support these initiatives. Dr. Imhoff has co-authored five books on these subjects and has written more than 150 articles for technical and business magazines. She is also the founder of the Boulder BI Brain Trust, a consortium of internationally recognized independent analysts and experts. You can follow them on Twitter at #BBBT or become a subscriber at www.bbbt.us.
Email: [email protected] Phone: 303-444-6650 Twitter: Claudia_Imhoff
Topics
§ An extended data warehouse architecture for a modern BI environment
§ Questions
Data Warehouse Technology Drivers
§ Do more with less
§ Data compression
§ Schemas on read
§ Open source components
§ In-Memory capabilities
§ Simpler environments
§ Cloud deployments
§ Easier data management
§ Mobile and Self-service BI
§ Built-in analytic functions
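"Schema on read" means raw data is stored as it arrived and a schema is applied only when the data is queried. A minimal Python sketch under assumed inputs: the JSON-lines records, the field names and the `read_with_schema` helper are hypothetical. The point is that the same stored lines can be read under different schemas without reloading them.

```python
import json

# Raw events stored exactly as they arrived; no schema was enforced at write time.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.99", "channel": "web"}',
    '{"user": "u2", "amount": 5}',
    '{"user": "u3", "amount": "n/a", "channel": "mobile", "extra": true}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select the named fields and cast them.

    `schema` maps field name -> type constructor. Missing fields and values
    that fail to cast become None instead of rejecting the whole record.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            try:
                row[field] = cast(value) if value is not None else None
            except (TypeError, ValueError):
                row[field] = None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Two different "schemas" over the same stored data.
    print(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
    print(read_with_schema(RAW_EVENTS, {"user": str, "channel": str}))
```

The trade-off is that data-quality problems (such as the `"n/a"` amount) surface at query time rather than at load time, which is why governance still matters in this style of architecture.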
Extended Data Warehouse Architecture
[Diagram components: Traditional EDW environment; Investigative computing platform; Data refinery; Data integration platform; Analytic tools & applications; Operational real-time environment with RT analysis engine; BI services; Other internal & external structured & multi-structured data; Real-time streaming data; Operational systems]
Slide created by Colin White, BI Research, Inc.
Data Integration Use Case: Data Refinery
Ingests raw detailed data in batch and/or real-time into a managed data store
Distills the data into useful business information and distributes the results to downstream systems
May also directly analyze certain types of data
Employs low-cost hardware and software to enable large amounts of detailed data to be managed cost effectively
Requires (flexible) governance policies to manage data security, privacy, quality, archiving and destruction
[Architecture thumbnail: Traditional EDW environment, Investigative computing platform, Data refinery, Data integration platform]
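The governance requirement above often takes the form of a rule applied before refined data is distributed downstream. A minimal Python sketch, assuming a hypothetical policy that pseudonymizes some fields with a keyed hash (HMAC-SHA256 from the standard library) and drops others entirely. The field names, the `POLICY` table and the secret key are illustrative assumptions; a real key would come from a managed secrets store.

```python
import hashlib
import hmac

# Hypothetical governance policy: what to do with each sensitive field.
POLICY = {
    "email": "pseudonymize",   # keep joinable, but not readable
    "ssn": "drop",             # never leaves the refinery
}

SECRET_KEY = b"replace-with-managed-secret"  # assumption: supplied by a key manager


def pseudonymize(value, key=SECRET_KEY):
    """Deterministic keyed hash: equal inputs give equal tokens, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def apply_policy(record, policy=POLICY):
    """Return a copy of `record` that is safe to distribute under `policy`."""
    out = {}
    for field, value in record.items():
        action = policy.get(field, "keep")
        if action == "drop":
            continue
        out[field] = pseudonymize(value) if action == "pseudonymize" else value
    return out


if __name__ == "__main__":
    raw = {"customer_id": 42, "email": "pat@example.com", "ssn": "000-00-0000", "segment": "gold"}
    print(apply_policy(raw))
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed email addresses.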
Traditional EDW Use Cases
Most BI environments today
§ New technologies can be incorporated into the EDW environment to improve performance and efficiency and to reduce costs
Use cases
§ Production reporting
§ Historical comparisons
§ Customer analysis (next best offer, segmentation, lifetime value scores, churn analysis, etc.)
§ KPI calculations
§ Profitability analysis
§ Forecasting
[Architecture thumbnail: Traditional EDW environment, Data refinery, Data integration platform, Analytic tools & applications, Operational real-time environment, RT analysis engine, Operational systems, BI services]
Investigative Computing Use Cases
New technologies used here include:
o Hadoop, in-memory computing, columnar storage, data compression, appliances, etc.
Use cases
o Data mining and predictive modeling for EDW and real-time environments
o Cause and effect analysis
o Data exploration (“Did this ever happen?” “How often?”)
o Pattern analysis
o General, unplanned investigations of data
[Architecture thumbnail: Data refinery, Data integration platform, Analytic tools & applications, Operational real-time environment, RT analysis engine, Investigative computing platform, Operational systems, BI services]
Operational RT Environment Use Cases
Embedded or callable BI services:
o Real-time fraud detection
o Real-time loan risk assessment
o Optimizing online promotions
o Location-based offers
o Contact center optimization
o Supply chain optimization
Real-time analysis engine:
§ Traffic flow optimization
§ Web event analysis
§ Natural resource exploration analysis
§ Stock trading analysis
§ Risk analysis
§ Correlation of unrelated data streams (e.g., weather effects on product sales)
[Architecture thumbnail: Operational real-time environment, RT analysis engine, Other internal & external structured & multi-structured data, Real-time streaming data, Operational systems, BI services]
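As an illustration of the kind of rule a real-time analysis engine might evaluate for fraud detection, here is a simple sliding-window check in Python: flag a transaction when its amount exceeds a multiple of the customer's recent average. The window size, the 3x multiplier and the `FraudWindow` class are hypothetical; production fraud detection typically relies on trained models rather than a single threshold.

```python
from collections import defaultdict, deque


class FraudWindow:
    """Keeps the last `size` amounts per customer and flags outliers as events arrive."""

    def __init__(self, size=5, multiplier=3.0, min_history=3):
        self.size = size
        self.multiplier = multiplier
        self.min_history = min_history  # don't judge until a few events have been seen
        self.history = defaultdict(lambda: deque(maxlen=self.size))

    def observe(self, customer, amount):
        """Return True if `amount` looks suspicious, then add it to the window."""
        window = self.history[customer]
        suspicious = (
            len(window) >= self.min_history
            and amount > self.multiplier * (sum(window) / len(window))
        )
        window.append(amount)
        return suspicious


if __name__ == "__main__":
    engine = FraudWindow()
    stream = [("c1", 20.0), ("c1", 25.0), ("c1", 22.0), ("c1", 400.0), ("c2", 500.0)]
    for customer, amount in stream:
        print(customer, amount, engine.observe(customer, amount))
```

In this sketch, flagged amounts still enter the window, so a run of large transactions gradually raises the baseline; a real engine might exclude flagged events or route them for review instead.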
BUT – All Components Must Work Together!
[Diagram labels: New sources of data (3rd party data, location data, social data); existing customer data; Data refinery; Enterprise DW; Investigative computing platform; Analytic tools; analyses; analytic models; RT analysis engine; call center dashboard or web event stream; next best customer offer; Operational systems; feedback]
Slide created by Colin White, BI Research, Inc.
Topics
§ Extending the data warehouse architecture for a modern analytics environment
§ Questions
Many Organizations Not Ready for Big Data?
§ Many companies are struggling to get a traditional data warehouse in place and produce basic BI
§ Business users not analytically savvy
§ Minimal governance
§ Chaotic architectures
§ What do you say to these organizations?
Existing Data Warehouse
§ Do organizations have to rip and replace their existing DW to solve big data problems?
§ When do I use a traditional DW versus the Hadoop environment?
§ Does the data hub replace the data warehouse?
Data Integration
§ Where is ETL used and not used?
§ How do enterprises control data blending and virtualization (do they need to)?
§ Is data governance still important?
§ How does it change in this new environment?
New IT Skills
§ To achieve DW optimization…
§ Does IT have to rip and replace their employees?
§ Should they rely on consultants?
§ To what extent?
§ What is needed to move from basic DW to a big data architecture?
Evolving to Advanced Analytics
§ Is it mandatory to hire data scientists?
§ Is training on new technology enough?
§ What else is needed to make the company more analytically-driven?
Upcoming Topics
www.insideanalysis.com
2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
This Month: BIG DATA
May: DATABASE
June: ANALYTICS & MACHINE LEARNING
THANK YOU for your ATTENTION!