michael-lamont
A basic introduction to data warehouses, their uses, and their benefits.
Platforms
Implementing a BI platform requires many important choices:
Type of platform
Software tools & technologies
IT usually takes the lead on technology and platform decisions
Business managers should also take part in these decisions; they’re the ones who will actually use the platform
Platforms
BI platforms capture raw operational data and convert it into useful information
The process a platform uses can be simple or complex
The data warehouse is the most common BI platform
Data warehouses have several distinct components that work together
(Diagram: BI Platform, showing Data Sources)
Operational Systems
Organizations usually have dozens of operational systems that support day-to-day transactions
Line-of-business apps:
Human resources
Enterprise Resource Planning
Supply chain
Point of Sale
Operational Systems
Efficient at supporting transactional processes
Poorly suited to business analysis
Can’t really use data from multiple sources
(Diagram: BI Platform, showing Data Sources and the Data Warehouse)
Data Warehouse
Collective repository of data from a company’s operational systems
The data warehouse feeds data into a series of subject-specific databases called data marts
Some “data warehouse” platforms are really just a collection of data marts
(Diagram: BI Platform, showing Data Sources, the Data Warehouse, and Data Marts for HR, Sales, and Finance)
Data Marts
Data marts are subject-specific
HR
Sales
Finance
Marketing
Etc.
The definition of “subject” varies from company to company, depending on each company’s needs
Data Marts
Examples of data marts in a single company:
One that supports the Sales department’s analysis of performance and margins
One that lets the HR department analyze headcount and absence trends
Data Sharing
Data warehouses shouldn’t be a collection of independent data silos
Silos of data are what operational systems already give you
A good data warehouse makes it easy to normalize measures and dimensions
Ensures dimensions & measures have the same meanings across the company
Supports metric calculations across data feeds
Data Sharing
Operational systems can’t calculate many useful metrics because they can’t integrate or share data
Calculating revenue per employee requires data from both the Sales and HR data silos
These metrics are easy to calculate in a data warehouse with shared data and dimensions
More shared data = more powerful analysis
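The revenue-per-employee example can be made concrete with a minimal Python sketch. It assumes, hypothetically, that the Sales and HR data share a conformed `region` dimension. The row layouts and figures are invented for illustration. The calculation works only because both sources use the same dimension values.

```python
# Hypothetical rows from two silos that share a conformed "region" dimension.
sales_rows = [
    {"region": "East", "revenue": 1_200_000},
    {"region": "East", "revenue": 300_000},
    {"region": "West", "revenue": 900_000},
]
hr_rows = [
    {"region": "East", "headcount": 15},
    {"region": "West", "headcount": 12},
]


def revenue_per_employee(sales, hr):
    """Combine Sales revenue and HR headcount on the shared region dimension."""
    revenue = {}
    for row in sales:
        revenue[row["region"]] = revenue.get(row["region"], 0) + row["revenue"]
    headcount = {row["region"]: row["headcount"] for row in hr}
    # Only regions known to both sources (with nonzero headcount) can be compared.
    return {
        region: revenue[region] / headcount[region]
        for region in revenue
        if headcount.get(region)
    }


print(revenue_per_employee(sales_rows, hr_rows))  # {'East': 100000.0, 'West': 75000.0}
```

If HR keyed its data by cost center instead of region, this join would need a mapping step first. That mapping is exactly what a warehouse’s shared dimensions provide.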
Data Integration
Integrating data into a common warehouse is the hardest part of the BI process
Each operational system creates mountains of data in incompatible formats
Extract, Transform, Load (ETL) processes load data from operational systems into the data warehouse
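The three ETL steps can be sketched in Python under some stated assumptions:

- There are two hypothetical sources. One is a point-of-sale CSV export with US-style dates and amounts in cents. The other is an ERP feed with ISO dates and dollar amounts.
- An in-memory SQLite table stands in for the warehouse.

Real ETL tools add scheduling, logging, error handling, and incremental loads, all of which this sketch omits.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source 1: point-of-sale export (CSV, MM/DD/YYYY dates, amounts in cents).
POS_EXPORT = "sale_date,store,amount_cents\n03/01/2024,Boston,1250\n03/02/2024,Denver,980\n"

# Hypothetical source 2: ERP feed (ISO dates, amounts in dollars, different field names).
ERP_FEED = [{"date": "2024-03-01", "site": "Boston", "total": 40.0}]


def extract():
    """Pull raw rows from each operational system in its native format."""
    return list(csv.DictReader(io.StringIO(POS_EXPORT))), ERP_FEED


def transform(pos_rows, erp_rows):
    """Map both formats onto one schema: (ISO date, store, amount in dollars)."""
    unified = []
    for r in pos_rows:
        iso = datetime.strptime(r["sale_date"], "%m/%d/%Y").date().isoformat()
        unified.append((iso, r["store"], int(r["amount_cents"]) / 100))
    for r in erp_rows:
        unified.append((r["date"], r["site"], float(r["total"])))
    return unified


def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sale_date TEXT, store TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))
print(conn.execute("SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store").fetchall())
```

After the transform step, sales from both systems can be summed together, which neither source could do on its own.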
(Diagram: BI Platform, showing Data Sources, ETL Processes, the Data Warehouse, and Data Marts for HR, Sales, and Finance)
Data Integration
Business managers and analysts aren’t usually involved in the technical details of ETL
They do participate in defining the business rules for how data is integrated
Data integration rules are determined by:
Type of analysis to be performed
How well the data supports requirements
Data Analysis
Analysis processes are responsible for assembling charts, graphs, etc. and delivering them to business users
The software packages used for these tasks are called front-end tools
Harvest information from the data warehouse
Present it to users in visual formats
Data Analysis
More advanced analysis tools can be used to explain behavior or uncover hidden trends
The goal of the analysis process is to help decision makers by giving them useful data
Reporting & Analysis
The piece of BI that business users are most familiar with
Primary purpose: put data in the hands of business users
Reporting & analysis processes need to assemble data into formats that hold meaning for business users
Reporting & Analysis
Multidimensional analysis is designed to make data understandable and useful to business users
Tabular grids are an excellent way to consolidate & present data
It’s also important to chart data graphically
Graphs and tables work together to give business users different perspectives on the data
Graphics Example
Dept 1             Dept 2             Dept 3             Dept 4
Tenure  Sick Days  Tenure  Sick Days  Tenure  Sick Days  Tenure  Sick Days
10      8.04       10      9.14       10      7.46       8       6.58
8       6.95       8       8.14       8       6.77       8       5.76
13      7.58       13      8.74       13      12.74      8       7.71
9       8.81       9       8.77       9       7.11       8       8.84
11      8.33       11      9.26       11      7.81       8       8.47
14      9.96       14      8.1        14      8.84       8       7.04
6       7.24       6       6.13       6       6.08       8       5.25
4       4.26       4       3.1        4       5.39       19      12.5
12      10.84      12      9.13       12      8.15       8       5.56
7       4.82       7       7.26       7       6.42       8       7.91
5       5.68       5       4.74       5       5.763      8       6.89
Every department: Avg Tenure 9 years, Avg Sick Days 7.5
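A minimal Python sketch recomputes three statistics for each department: mean tenure, mean sick days, and the tenure/sick-days correlation. It shows numerically why a table of averages alone can mislead. The values appear to be Anscombe’s quartet relabeled as HR data; Dept 3’s last value reads 5.763 here, versus 5.73 in the commonly published set. The `pearson` helper is illustrative and not part of any BI tool.

```python
from statistics import mean

# Tenure (years) and sick days for each department, copied from the table above.
tenure = {
    "Dept 1": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "Dept 2": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "Dept 3": [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    "Dept 4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
}
sick_days = {
    "Dept 1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "Dept 2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74],
    "Dept 3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.763],
    "Dept 4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = {}
for dept in tenure:
    summary[dept] = (
        round(mean(tenure[dept]), 1),
        round(mean(sick_days[dept]), 1),
        round(pearson(tenure[dept], sick_days[dept]), 2),
    )
    print(dept, summary[dept])
```

All four departments print essentially the same numbers. Only a chart of each department reveals that their patterns differ.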
Graphics Example
(Figure: four scatter plots of the tenure and sick-day data, one panel each for Dept 1, Dept 2, Dept 3, and Dept 4)
Business Users
(Diagram: three groups of business users: Power Analysts, Information Consumers, Information Users)
Information Users
Require standard reports
Can be short or extensive
Usually contain charts and tables
Want consistent report formats
No need to “slice and dice” data
Static or very simple dynamic reports
Printed
MS Office document formats (PPT, XLS)
Information Consumers
Want to perform dynamic data queries
Not experts in database design or query tools
Want to be able to pivot and nest data inside an intuitive interface
Interactive ad hoc tools can prompt information users to cross the line into information consumer territory
Power Analysts
Use the full analytical power of the system to do free-form ad hoc analysis
Know the details of database design and query tool software
Create reports for others
Smallest of the three groups of users
Front-End Tools
Present data from the warehouse to business users as reports and interactive data views
Can be grouped into two categories:
Reporting tools
Data exploration tools
Front-End Tools
Reporting paradigm:
Excellent at producing tabular reports
Many mature, stable packages
Web interfaces for wide-scale deployment
Strong printing/scheduling capabilities
Multidimensional data exploration:
Excellent for dealing with OLAP cubes
Support interactive ad hoc analysis
Graphical charts and views
Front-End Tools
Competitive market space
Wide range of available features and functionality
Front-End Tools
Remember: features aren’t benefits
Advanced analysis features are useful to power analysts, but not to information users
Invest time in figuring out your broader BI objectives and the needs of your users
Select solution providers based on those objectives and needs
Data Warehouses
Primary task: support reporting & analysis
Warehouse design & content are driven by business needs
Business people determine what information they need to make better decisions faster
IT implements the warehouse to fit those business needs
Data Warehouses
Business & IT need to be aligned on business requirements
Subject Oriented
Data warehouses organize data into subject-specific data marts
Data marts are NOT silos of data
Data marts gather data from multiple operational systems to support analyses
Ex: product line profitability
Data in the warehouse is shared by the data marts
Consistent Data
Warehouses provide consistent data by using the same dimensions and measures for all data
Consistent: data to be analyzed has the same definitions across the entire company
Achieving data consistency requires both integration and organizational decisions
Consistent Data
Data from multiple operational systems has to be integrated into one common data set for analysis
Problem: different systems may have subtly different definitions of “discount”
Solution: the data warehouse integrates and transforms data based on a consistent business rule
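Here is a minimal sketch of such a business rule. It assumes, hypothetically, that the POS system stores discount as a percentage of list price while the ERP system stores it as dollars off. Storing every discount in dollars is only one possible convention. What matters is that a single rule is applied to every source, and that a source with no rule is rejected rather than loaded as-is.

```python
from dataclasses import dataclass


@dataclass
class SourceOrder:
    source: str        # which operational system produced the row
    list_price: float
    discount: float    # meaning depends on the source system


def discount_in_dollars(order: SourceOrder) -> float:
    """Hypothetical warehouse rule: store every discount as dollars off list price."""
    if order.source == "pos":   # assumed: POS stores a percentage, e.g. 10 means 10%
        return round(order.list_price * order.discount / 100, 2)
    if order.source == "erp":   # assumed: ERP already stores dollars off list price
        return round(order.discount, 2)
    raise ValueError(f"no discount rule defined for source {order.source!r}")


orders = [SourceOrder("pos", 200.0, 10), SourceOrder("erp", 200.0, 20.0)]
print([discount_in_dollars(o) for o in orders])  # both describe a $20 discount
```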
Consistent Data
Problem: source data has different dimension structures
Solution: the warehouse defines uniform dimension designs
Consistent data requires standardized measure & dimension definitions
Everyone in the company needs to “speak the same language” for dimensions & measures
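One common way to impose a uniform dimension design is to map each source’s structure onto the warehouse’s dimension members. The sketch below assumes, hypothetically, that the Sales system uses region codes while HR records state abbreviations. Both are mapped onto one `region` dimension. Unmapped values are rejected rather than passed through silently.

```python
# Hypothetical uniform "region" dimension defined by the warehouse.
REGIONS = {"Northeast", "Southwest"}

# Hypothetical source structures: Sales uses region codes, HR uses state codes.
SOURCE_MAPPINGS = {
    "sales": {"NE": "Northeast", "SW": "Southwest"},
    "hr": {"MA": "Northeast", "NY": "Northeast", "AZ": "Southwest"},
}


def conform_region(source: str, value: str) -> str:
    """Translate a source-specific value into the warehouse's region dimension."""
    region = SOURCE_MAPPINGS[source].get(value)
    if region not in REGIONS:
        # Reject rather than guess, so the mapping gets fixed at the source.
        raise KeyError(f"{source} value {value!r} has no region mapping")
    return region


print(conform_region("sales", "NE"), conform_region("hr", "MA"))
```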
Cleansed Data
Cleansed data: data that has been validated by business & structural rules
Storing cleansed data is a key priority for data warehouses
Data from operational systems is usually uncleansed “dirty data”
Types of Dirty Data
Missing
Information not entered into an order tracking system
Incorrect
A single Walmart store reporting that it sold 50K razor blades in an hour
Data entry errors
Booston, MA
Subtle issues like double-counting
Cleansed Data
ETL processes use business rules to load valid data and cleanse or reject invalid data
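A minimal Python sketch shows how ETL business rules might handle each type of dirty data listed above: missing fields, double-counted records, implausible quantities, and data entry errors such as “Booston, MA”. The reference lists, the correction table, and the `MAX_UNITS_PER_HOUR` threshold are all hypothetical. Real rules would come from the business.

```python
# Hypothetical reference data and thresholds, chosen for illustration only.
KNOWN_CITIES = {"Boston, MA", "Denver, CO"}
CITY_CORRECTIONS = {"Booston, MA": "Boston, MA"}  # known data-entry errors
MAX_UNITS_PER_HOUR = 1_000  # assumed plausibility limit for one store's hourly sales


def cleanse(records):
    """Split raw sales records into (loaded, rejected) using simple business rules."""
    loaded, rejected, seen_ids = [], [], set()
    for rec in records:
        # Missing data: required fields were never entered.
        if rec.get("record_id") is None or rec.get("units") is None:
            rejected.append((rec, "missing field"))
            continue
        # Double-counting: the same record arrived twice.
        if rec["record_id"] in seen_ids:
            rejected.append((rec, "duplicate record"))
            continue
        # Incorrect data: an implausible quantity for one hour of sales.
        if rec["units"] > MAX_UNITS_PER_HOUR:
            rejected.append((rec, "implausible quantity"))
            continue
        # Data entry errors: correct known misspellings, reject unknown cities.
        city = CITY_CORRECTIONS.get(rec.get("city"), rec.get("city"))
        if city not in KNOWN_CITIES:
            rejected.append((rec, "unknown city"))
            continue
        seen_ids.add(rec["record_id"])
        loaded.append({**rec, "city": city})
    return loaded, rejected


raw = [
    {"record_id": 1, "city": "Booston, MA", "units": 3},
    {"record_id": 1, "city": "Boston, MA", "units": 3},
    {"record_id": 2, "city": "Denver, CO", "units": 50_000},
    {"record_id": 3, "city": "Denver, CO", "units": None},
]
loaded, rejected = cleanse(raw)
print(loaded)
print([reason for _, reason in rejected])
```

Keeping the rejected rows together with a reason makes it possible to report them back to the owners of the source systems, instead of silently dropping them.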
Historical Data
Warehouses let you analyze data over specific time periods
They provide users with “snapshots” of data from operational systems
Warehouse data is static, unlike data in operational systems
Warehouse data is refreshed at regular time intervals
Historical Data
Data warehouses are non-volatile
Historical data lets analysts identify trends and exceptions
Ex: comparing year-over-year sales on a quarterly basis
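The year-over-year comparison can be sketched in Python. The sketch first rolls dated sales up into (year, quarter) totals. It then compares each quarter with the same quarter one year earlier. The `history` rows are hypothetical stand-ins for data retained in the warehouse. An operational system that overwrites its data would not have the prior year to compare against.

```python
from collections import defaultdict
from datetime import date

# Hypothetical historical rows kept by the warehouse: (sale date, amount).
history = [
    (date(2023, 2, 10), 100.0), (date(2023, 5, 3), 120.0),
    (date(2024, 1, 20), 130.0), (date(2024, 6, 15), 150.0),
]


def quarterly_totals(rows):
    """Sum amounts into (year, quarter) buckets."""
    totals = defaultdict(float)
    for d, amount in rows:
        totals[(d.year, (d.month - 1) // 3 + 1)] += amount
    return totals


def year_over_year(totals, year):
    """Percent change of each quarter in `year` versus the same quarter a year earlier."""
    changes = {}
    for q in range(1, 5):
        prev, curr = totals.get((year - 1, q)), totals.get((year, q))
        if prev and curr is not None:  # skip quarters missing from either year
            changes[q] = round((curr - prev) / prev * 100, 1)
    return changes


print(year_over_year(quarterly_totals(history), 2024))  # {1: 30.0, 2: 25.0}
```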
Fast Delivery of Data
The warehouse has to provide data to users quickly and efficiently
Database technology and structures need to be fast & efficient
Two types of databases are in common use:
OLAP (OnLine Analytical Processing)
RDBMS (Relational DataBase Management Systems)
OLAP Databases
Benefits of OLAP:
Native support of multidimensional analysis
Fast data retrieval
Pre-processes data as much as possible
Ideal for fast retrieval of aggregated data
OLAP is usually a good candidate for data marts
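A toy cube illustrates the idea of pre-processing data for fast retrieval. A measure is aggregated once for every combination of dimensions, so later queries become lookups instead of scans over detail rows. This is a simplified sketch with hypothetical data and a hypothetical `query` helper. Production OLAP engines use specialized storage and usually pre-aggregate only selected combinations.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("customer", "product", "region")

# Hypothetical detail rows, as a relational store might hold them.
facts = [
    {"customer": "Acme", "product": "Widget", "region": "East", "sales": 100.0},
    {"customer": "Acme", "product": "Gadget", "region": "West", "sales": 50.0},
    {"customer": "Beta", "product": "Widget", "region": "East", "sales": 70.0},
]


def build_cube(rows):
    """Pre-aggregate sales for every subset of dimensions (2^3 = 8 groupings here)."""
    cube = defaultdict(float)
    for row in rows:
        for k in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, k):
                cube[tuple((d, row[d]) for d in dims)] += row["sales"]
    return dict(cube)


def query(cube, **filters):
    """Answer a query by lookup; filter names must be known dimensions."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0.0)


cube = build_cube(facts)
print(query(cube, product="Widget"), query(cube, product="Widget", region="East"), query(cube))
```

The trade-off is the one the slide implies. The cube costs extra storage and load time up front in exchange for fast answers to aggregate questions.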
OLAP Databases
Important recent developments:
Much easier to design OLAP databases
Acquisition costs are extremely low
Small and medium-sized businesses (SMBs) can now use technology that was only available to large enterprises a few years ago
OLAP & Relational Databases
Relational databases often store the underlying data supplied to the OLAP database
The RDBMS stores detailed data; OLAP stores summarized data views
Example: Sales data mart
Relational stores daily sales data
OLAP stores and manages summarized sales data by customer, product, region, etc.
Relational Databases
Relational databases can host data marts without OLAP
They use their own set of dimensions & measures to support analysis
This requires sophisticated front-end tools that can quickly assemble relational data into multidimensional formats
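The sketch below shows how a front-end tool might assemble relational data into a multidimensional view without OLAP. An in-memory SQLite table stands in for a relational data mart, with hypothetical rows. The database aggregates by two dimensions with `GROUP BY`, and the client pivots the result into a region × quarter grid. Real front-end tools do this with far more optimization.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Q1", 100.0), ("East", "Q2", 80.0), ("West", "Q1", 60.0), ("East", "Q1", 20.0)],
)

# Step 1: the relational engine aggregates the measure by two dimensions.
rows = conn.execute(
    "SELECT region, quarter, SUM(amount) FROM sales GROUP BY region, quarter"
).fetchall()

# Step 2: the front end pivots the flat result into a grid (empty cells become 0.0).
regions = sorted({r for r, _, _ in rows})
quarters = sorted({q for _, q, _ in rows})
cells = {(r, q): total for r, q, total in rows}
grid = [[cells.get((r, q), 0.0) for q in quarters] for r in regions]

print("Region", *quarters, sep="\t")
for region, values in zip(regions, grid):
    print(region, *values, sep="\t")
```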
Conclusions
Data warehouse architecture is a flexible, effective decision support platform
The warehouse helps organize and deliver data to decision makers
Brings BI to life through data marts, database technology, ETL tools, and analysis tools
Helps business managers make better decisions faster
Michael Lamont