Smarter Analytics: Supporting the Enterprise with Automation

DESCRIPTION

The Briefing Room with Barry Devlin and WhereScape

Live Webcast on June 10, 2014

Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=5230c31ab287778c73b56002bc2c51a

The data warehouse is intended to support analysis by making the right data available to the right people in a timely fashion. But conditions change all the time, and when data doesn’t keep up with the business, analysts quickly turn to workarounds. This leads to ungoverned and largely unmanaged side projects, which trade short-term wins for long-term trouble. One way to keep everyone happy is to create an integrated environment that pulls data from all sources and can automate both model development and the delivery of analyst-ready data.

Register for this episode of The Briefing Room to hear data warehousing pioneer and analyst Barry Devlin explain the critical components of a successful data warehouse environment, and how traditional approaches must be augmented to keep up with the times. He’ll be briefed by WhereScape CEO Michael Whitehead, who will showcase his company’s data warehousing automation solutions and discuss how a fast, well-managed and automated infrastructure is the key to empowering faster, smarter, repeatable decision making.

Visit InsideAnalysis.com for more information.

Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

Smarter Analytics: Supporting the Enterprise with Automation

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

eric.kavanagh@bloorgroup.com @eric_kavanagh

Mission

•  Reveal the essential characteristics of enterprise software, good and bad

•  Provide a forum for detailed analysis of today’s innovative technologies

•  Give vendors a chance to explain their product to savvy analysts

•  Allow audience members to pose serious questions... and get answers!

Topics

This Month: ANALYTICS & MACHINE LEARNING

July: INNOVATIVE TECHNOLOGY

August: BIG DATA ECOSYSTEM

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

Analyst: Barry Devlin

Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. With over 30 years of IT experience, he is a widely respected analyst, consultant, lecturer and author. His 2013 book, “Business unIntelligence—Insight and Innovation beyond Analytics and Big Data,” is available as hardcopy and e-book. Barry is founder and principal of 9sight Consulting. He specializes in the human, organizational and IT implications of deep business insight solutions that combine operational, informational and collaborative environments. A regular contributor to BeyeNETWORK and TDWI, Barry is based in Cape Town, South Africa and operates worldwide.

WhereScape

•  WhereScape is a data warehousing software company

•  It offers WhereScape 3D, software for planning and reality-testing data warehousing and business intelligence projects; and WhereScape RED, an integrated development environment used for building, deploying and managing data warehouses and data marts.

•  WhereScape RED allows developers to automate the data warehousing life cycle

Guest: Michael Whitehead

A data warehousing industry veteran, Michael Whitehead has spent more than a decade designing and building commercial data warehouses for customers in a wide variety of industries. Prior to founding WhereScape, Michael had Asia Pacific responsibilities for data warehousing for Sequent Computer Systems, Inc.

Michael Whitehead June 2014

Smarter Analytics

Why were sales down this week versus last year?

Photo: Grocery Store with Class, Walter Watzpatzkowski, 15/1/09

We promoted ice cream but the weather was unreasonably cold

Our competitor ran a better promotion

1990s - Decision support system

(For the time) large amounts of data, stored in various inscrutable file formats and database management systems. Want actionable information? Write a program. One program per analytical problem…. Reporting bureaus

This model’s dysfunctions created the need for data warehousing…

2000s - Enterprise data warehousing

Separate the refinement of raw data – regardless of the source – from the delivery of subsets of that data, to various decision-making constituencies. Build a solid, scalable information delivery infrastructure for the corporation. Support variability, and change, at both ends. Apply appropriate governance, risk management, compliance mechanisms. [And stabilize the supply side of the market, in the process…]

A design pattern for stable, operationalized information refining and delivery

The economic conditions led to a change in demographics of the people walking past my store

2014 - big data technologies

Large amounts of data, stored in various inscrutable file formats and database management systems. Want actionable information? Write a program. One program per analytical problem…. Oh, and batch-oriented. And integrate-it-yourself.

Instead of JCL, Pig. Instead of CICS and Comshare, Cloudera. In what way is this model a leap forward?

HOW DID WE GET HERE?

People built data warehouses that don’t support analytics

2014 – “self service” technologies

Large amounts of data, stored in various inscrutable file formats AND data warehouses. Want actionable information? Create a dataset. One dataset per analytical problem….

The newer tech is great. Is the way it is used a leap forward?

Automation is key for better support of analytics

Photo: Smith Cannery: Extension and Experiment Station Communications Photograph Collection (p120)

STEPS

1.  Identify attributes

2.  Identify business key

3.  Index business key and add a unique constraint

4.  Create surrogate key with auto sequence generation

5.  Index surrogate key

6.  Insert zero surrogate key row with values set for each attribute

7.  Add a modified timestamp column

8.  Write the SQL code to Insert new business keys or Update existing business key rows. Maintain the modified timestamp

9.  Create any other indexes required for querying

10.  Decide best practice for index maintenance during load. Keep in situ or drop and recreate after load.

11.  Document procedure

Etc Etc
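To make the length of that list concrete, here is a minimal sketch of steps 1 to 8 for a single dimension table, written in Python against an in-memory SQLite database. The table `dim_customer`, its columns, and the functions `build_customer_dim` and `load_customers` are hypothetical names chosen for illustration; nothing here comes from the webcast or from any WhereScape product.

```python
import sqlite3


def build_customer_dim(conn):
    """Steps 1-7: create the table, its keys and index, and the zero-key row."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            -- Steps 4-5: surrogate key with auto sequence generation.
            -- SQLite indexes an INTEGER PRIMARY KEY implicitly.
            customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
            customer_code TEXT NOT NULL,   -- step 2: business key
            customer_name TEXT,            -- step 1: attribute
            region        TEXT,            -- step 1: attribute
            modified_at   TEXT NOT NULL    -- step 7: modified timestamp
        );
        -- Step 3: index the business key and make it unique.
        CREATE UNIQUE INDEX ux_dim_customer_code
            ON dim_customer (customer_code);
        -- Step 6: zero surrogate key row with a value for each attribute.
        INSERT INTO dim_customer
            VALUES (0, 'UNKNOWN', 'Unknown', 'Unknown', CURRENT_TIMESTAMP);
    """)


def load_customers(conn, rows):
    """Step 8: update existing business keys, insert new ones."""
    for code, name, region in rows:
        cur = conn.execute(
            "UPDATE dim_customer "
            "SET customer_name = ?, region = ?, modified_at = CURRENT_TIMESTAMP "
            "WHERE customer_code = ?",
            (name, region, code),
        )
        if cur.rowcount == 0:  # business key not seen before
            conn.execute(
                "INSERT INTO dim_customer "
                "(customer_code, customer_name, region, modified_at) "
                "VALUES (?, ?, ?, CURRENT_TIMESTAMP)",
                (code, name, region),
            )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_customer_dim(conn)
    load_customers(conn, [("C1", "Acme", "North"), ("C2", "Bolt", "South")])
    load_customers(conn, [("C1", "Acme Ltd", "North")])  # updates C1 in place
    for row in conn.execute("SELECT * FROM dim_customer ORDER BY customer_key"):
        print(row)
```

Even this toy version leaves out steps 9 to 11, and it covers one table only; the same pattern would have to be repeated by hand for every dimension.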

Really?

What can be automated?

•  Profiling

•  Model conversion

•  Object creation

•  Code generation

•  Indexing

•  Impact analysis

•  Documentation
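As a rough illustration of several items on this list (object creation, code generation, indexing, impact analysis and documentation), the sketch below derives all of them from one small piece of metadata. It is a toy in Python with SQLite, written under assumptions of my own, and not a description of how WhereScape RED or any other tool works; `DimensionSpec`, `generate_ddl`, `generate_docs` and `impact_of_column` are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DimensionSpec:
    """Hypothetical metadata describing one dimension table."""
    name: str
    business_key: str
    attributes: list = field(default_factory=list)


def generate_ddl(spec):
    """Object creation and indexing: emit DDL statements from the metadata.

    Names are interpolated directly, so the spec must come from trusted
    metadata, never from end-user input.
    """
    columns = [
        f"{spec.name}_key INTEGER PRIMARY KEY",
        f"{spec.business_key} TEXT NOT NULL",
    ]
    columns += [f"{attr} TEXT" for attr in spec.attributes]
    columns.append("modified_at TEXT NOT NULL")
    create = f"CREATE TABLE {spec.name} (\n    " + ",\n    ".join(columns) + "\n)"
    index = (
        f"CREATE UNIQUE INDEX ux_{spec.name}_{spec.business_key} "
        f"ON {spec.name} ({spec.business_key})"
    )
    return [create, index]


def generate_docs(spec):
    """Documentation: describe the table from the same metadata."""
    return "\n".join([
        f"Table {spec.name}",
        f"  business key: {spec.business_key}",
        f"  attributes: {', '.join(spec.attributes)}",
    ])


def impact_of_column(column, specs):
    """Impact analysis: list the tables that use a given column name."""
    return [
        s.name for s in specs
        if column == s.business_key or column in s.attributes
    ]


if __name__ == "__main__":
    spec = DimensionSpec("dim_product", "product_code", ["product_name", "category"])
    conn = sqlite3.connect(":memory:")
    for statement in generate_ddl(spec):
        print(statement)
        conn.execute(statement)  # the generated DDL is valid SQLite
    print(generate_docs(spec))
    print(impact_of_column("category", [spec]))
```

The design point is that the table definition, its index, its documentation and the impact report all come from the same `DimensionSpec`, so a change to the metadata propagates to every artifact.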

What will it look like? The new data warehouse

The new data warehouse: Five Key Changes

1.  Pooling – new types of data, staged differently than we’ve staged pampered data, in the past.

2.  A multi-engine “logical” data warehouse: NoSQL → Not Only SQL

3.  Support for discovery, prototyping and evaluation of analytics

4.  Support for continuing data integration, through to the “end use” tier

5.  Automation of the data warehousing platform’s core functionality

Back to best-of-breed, customer-specific integration models

Conclusion

Let’s not stuff it up (again)

•  Data people – challenge ourselves to do more, faster

•  Analysts – don’t give up on the data people

Perceptions & Questions

Analyst: Barry Devlin

Copyright © 2014 9sight Consulting, All Rights Reserved

Dr Barry Devlin Founder & Principal

9sight Consulting

Business unIntelligence: Smarter Analytics: Supporting the Enterprise with Automation

Bloor Briefing Room 10 June 2014

Analytics (and big data) emerged for business with social media and web logs

•  Understanding and tracking sentiment
   –  What do you think? How do you react?
   –  Basic analytics and BI activity on a new data source

•  Real-time insight into and influence on website activities
   –  Why did you abandon your cart?
   –  What would you most likely buy on getting a cross-sell?
   –  Deep, real-time analytics and BI with operational integration

The Internet of Things adds urgency to a new automation of analytics and BI

•  Extends existing processes
   –  Micro-management of supply chains and extension all the way to the consumer
   –  Sourcing and delivery

•  Creates completely new business models
   –  Often depending on analytics
   –  Motor insurance → encouragement & prevention
   –  Hospital care → health monitoring

The biz-tech ecosystem reflects the complexity of today’s business.

[Diagram: Business and Information Technology, with the labels: information abundance and variety; customer interaction and technical savvy; speed of decision and appropriate action; market flexibility and uncertainty; competition; mobile devices; externally-sourced information]

The architecture for the biz-tech ecosystem consists of information pillars.

•  Single architecture for all types of data/information
   –  Mix/match technology as needed
   –  Relational, NoSQL, Hadoop, etc.

•  Integration of sources and stores
   –  Instantiation gathers measures, events, messages and transactions
   –  Assimilation integrates stored info.
   –  Reification virtualizes access

•  Data flows as fast as needed and reconciled when necessary
   –  No unnecessary storage or transformations
   –  (Contrast layered data architecture)

[Diagram labels: Transactions; Human-sourced (information); Machine-generated (data); Process-mediated (data); Context-setting (information); Assimilation; Transactional (data); Events, Measures, Messages; Instantiation; Reification]

Information pillars can be mapped to today’s BI and analytics tools and environments.

•  Process-mediated data
   –  Traditional computing
   –  Via data entry, cleansing processes
   –  Relational databases

•  Machine-generated data
   –  Output of machines and sensors
   –  The Internet of Things
   –  NoSQL, Streaming, (RDBMS)

•  Human-sourced information
   –  Subjectively interpreted record of personal experiences
   –  From Tweets to Videos
   –  Hadoop, Enterprise Content Management

[Diagram labels: Transactions; Human-sourced (information); Machine-generated (data); Process-mediated (data); Context-setting (information); Assimilation; Transactional (data); Events, Measures, Messages; Instantiation; BI; EDW; OLTP; Oper. Analytics; Pred. Analytics]

From BI to Business unIntelligence

•  Information, knowledge and meaning
   –  Understanding real world context

•  Process, predefined and emergent
   –  Automating the creation and use of information

•  Beyond bounded rationality
   –  How decisions are really made

•  http://bit.ly/BunI-Technics : 25% discount with code “BIInsights25”

Thank you!

Dr Barry Devlin, Founder & Principal

9sight Consulting

Additional resources

•  All articles and white papers available at: http://bit.ly/9sight_papers

•  Blogs at: http://bit.ly/BD_Blog

•  Follow me on Twitter: @BarryDevlin

Questions (1)

1.  The Enterprise Data Warehousing architecture of the 2000s (I would say 1990s) was driven by the business need for consistency / reconciliation of data from many sources. It’s perhaps suboptimal for timeliness (real-time data) and maintenance (multiple layers of ETL function). How can the sort of automation you’re proposing help in these two areas?

2.  You compare 1980s and 2014 approaches asking how this model is a “leap forward.” One difference is users’ (data scientists) skills with technology. Wouldn’t automation disempower such users?

3.  What would a warehouse that “supports Analytics” look like?

4.  You say “Automation is the key for better support of analytics,” but how does automation support the agility and flexibility needed for analytics?

5.  A big idea in analytics is “model on read.” Automation typically requires/provides “model on write.” How do you address these very opposite needs?
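For readers less familiar with the terms in question 5, the following sketch contrasts the two approaches on the same three records. It is only an illustration in Python: the sample events and both function names are made up, and SQLite stands in for whatever engine would hold the modeled data.

```python
import json
import sqlite3

# Hypothetical raw events; note that the fields vary from record to record.
RAW_EVENTS = [
    '{"customer": "C1", "amount": 12.5, "channel": "web"}',
    '{"customer": "C2", "amount": 7.0}',
    '{"customer": "C1", "amount": 3.5, "coupon": "ICE10"}',
]


def total_by_customer_on_read(raw_lines):
    """Model on read: keep raw records, impose structure only when querying."""
    totals = {}
    for line in raw_lines:
        event = json.loads(line)
        totals[event["customer"]] = totals.get(event["customer"], 0.0) + event["amount"]
    return totals


def total_by_customer_on_write(raw_lines):
    """Model on write: fit records to a fixed schema at load time, then query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL)")
    for line in raw_lines:
        event = json.loads(line)
        # Fields outside the schema (channel, coupon) are dropped at load time.
        conn.execute(
            "INSERT INTO sales VALUES (?, ?)",
            (event["customer"], event["amount"]),
        )
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
    return dict(conn.execute(query))


if __name__ == "__main__":
    print(total_by_customer_on_read(RAW_EVENTS))
    print(total_by_customer_on_write(RAW_EVENTS))
```

Both functions give the same totals here. The difference is where the modeling decision is made: the first can still answer a later question about `coupon`, while the second has a schema that tooling can generate, index and document up front.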

Questions (2)

6.  Your pooling tier reminds me of the “Data Lake” – of which I’m not a big fan! Why would I want to bring “pampered data” (I assume traditional data) through this pool? It seems like an additional / unnecessary step.

7.  What engines (other than SQL) do you envisage? Which do / will you support?

8.  Can you describe what the linkage between the different engines means? If it is integration, how is it done?

9.  What data integration support do you envisage in the “end use” tier?

10.  Overall, how do you see your existing products evolving to implement the various aspects of this architecture? Does the relational database remain the core component, or do you envisage a more central role for Hadoop, as in Cloudera’s Enterprise Data Hub?

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: ANALYTICS & MACHINE LEARNING

July: INNOVATIVE TECHNOLOGY

August: BIG DATA ECOSYSTEM

THANK YOU for your ATTENTION!
