© Copyright GlobalLogic 2010 1
Connect. Collaborate. Innovate.
Data Warehouse & Business Intelligence
Concepts & Architecture
Sanjeev
Topics To Be Discussed:
Why Do We Need A Data Warehouse?
What Exactly Is A Data Warehouse?
Features Of Data Warehouse
Sources Of Data Warehouse
Data Warehouse Designs
Data Warehouse – Data Usage
Why There Is A Need of Business Intelligence?
DWH & BI - Architecture
Case Study
Why Do We Need A Data Warehouse?
Data Access Problem – Data In “Jail”
The single key to survival in the 1990s (and beyond) is being able to analyze, plan and react to changing business conditions much more rapidly.
To do this, top managers, analysts and knowledge workers in our enterprises need more and better information.
Information technology itself has made possible revolutions in the way that Organizations operate throughout the world today:
More and more powerful computers on everyone's desks, and
Communication networks that span the globe
BUT STILL
Executives and decision makers can't get their hands on critical information that already exists in the organization.
Organizations, large and small, create billions of bytes of data about all aspects of their business: millions of individual facts about their customers, products, operations and people.
But for the most part, this data is locked up in a myriad of computer systems and is exceedingly difficult to get at.
This phenomenon has been described as "data in jail".
Only a small fraction of the data that is captured, processed and stored in the enterprise is actually available to executives and decision makers.
Technologies for the manipulation and presentation of data have advanced dramatically, yet large segments of the enterprise are still "data poor."
Whatever is BETTER, FASTER and CHEAPER is not necessarily FUNCTIONALLY COMPLETE.
Solution – A Data Warehouse
A set of significant new concepts and tools has evolved to give all the key people in the enterprise access to whatever level of information is needed for the enterprise to survive and prosper in an increasingly competitive world.
The term that has come to characterize this new technology is “Data Warehousing.”
Its aims are to provide an Organization with a flexible, effective and efficient means of getting at the sets of data that have come to represent one of the Organization's most critical and valuable assets, and to make sure that enterprise-wide information is available for decision making at all levels, at any point in time.
What Exactly Is A Data Warehouse?
A Data Warehouse is a special kind of database that stores a SUBJECT-ORIENTED, INTEGRATED, TIME-VARIANT, NON-VOLATILE collection of data in support of management's decision-making process.
It is a structured repository of the Organization's historic data, which supports managerial decision making.
It is developed in an evolutionary process by integrating data from non-integrated legacy systems.
Many design elements that optimize transaction processing are inefficient (in several ways) in a data warehouse.
Managerial access to data for decision making requires access mechanisms that would violate many principles of regular DB design, such as Normalization, Security and Integrity.
Key Features of Data Warehouse
Subject Oriented: Data is integrated and loaded by subject.
Integrated: In DWH, data is obtained from various sources and is kept in a consistent format. For example, in an operational system, multiple ID values are generated for the same order in the Order, Accounts Receivable and Product databases. In DWH, one single ID value is used for one order.
Time Variant: In DWH, every data component is stored for a designated time period (3/10/20 yrs), in contrast to operational systems, which hold current data.
Non-volatile: Data is stored at summary level and is not updated frequently, in contrast to transaction-level data, which is updated frequently.
[Figure: subject areas (Cust, Prod, Order); yearly data (1998, 1999, 2000); ODS operations (Create, Update, Read, Delete, Replace, Insert) versus DWH operations (Load, Read)]
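The "Integrated" feature can be illustrated with a small key-mapping step. This is only a sketch under stated assumptions: the system names, the sample IDs, the shared business reference and the helper integrate_order_ids are all hypothetical and do not come from the presentation.

```python
from itertools import count

# Hypothetical input: each operational system knows the same business order
# under its own local ID. Fields: (system, local ID, shared business reference).
source_records = [
    ("order_db", "ORD-17", "PO-2010-001"),
    ("accounts_receivable", "AR-9001", "PO-2010-001"),
    ("product_db", "P-ORD-55", "PO-2010-001"),
    ("order_db", "ORD-18", "PO-2010-002"),
]

def integrate_order_ids(records):
    """Assign one warehouse ID per business order, whatever the source system."""
    next_id = count(1)
    warehouse_id = {}  # business reference -> single DWH ID
    crosswalk = {}     # (system, local ID) -> single DWH ID
    for system, local_id, business_ref in records:
        if business_ref not in warehouse_id:
            warehouse_id[business_ref] = next(next_id)
        crosswalk[(system, local_id)] = warehouse_id[business_ref]
    return crosswalk

crosswalk = integrate_order_ids(source_records)
```

In this sketch the three local IDs of the first order all resolve to the same warehouse ID, which is the behaviour the slide describes for one order in the DWH.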
Sources of Data Warehouse Data
Archives: historic data
Current systems of record: recent history
Operational transactions: future data source
[Figure: the three sources above feeding the DWH]
Various Data Warehouse Designs
Virtual DWH
End users are allowed to access operational databases directly.
Provides flexibility as well as the minimum amount of redundant data that must be loaded and maintained.
Puts the largest unplanned query load on operational systems.
Mostly used where a relatively large class of end users has a high but undefined need to access operational data, and the likely frequency of requests is low.
Central DWH
A single physical database that contains all of the data for a specific functional area, department, division, or enterprise, for a specific time period.
Often selected where there is a common need for informational data.
The data stored in the DWH is accessible from one place and must be loaded and maintained on a regular basis.
Distributed DWH
A data warehouse in which certain components are distributed across a number of different physical databases.
Increasingly, large Organizations are pushing decision making down to lower and lower levels of the Organization and, in turn, pushing the data needed for decision making down (or out) to the LAN or local computer serving the local decision maker.
Distributed DWHs usually involve the most redundant data and, as a consequence, the most complex loading and updating processes.
Enterprise DWH
If an Organization has a single, all-encompassing database, an EDWH is recommended.
If the Organization has fragmented databases, then an enterprise data mart must be constructed for the data warehouse, which will reflect the data after transformations and assist with retrieval and end-user access.
Data Warehouse - Data
Appropriate Use of Data Warehouse Data
Produce reports for long-term trend analysis
Produce reports aggregating enterprise data
Produce reports of multiple dimensions (earned revenue by month by product by branch)
Inappropriate Use of Data Warehouse Data
Replace operational systems
Replace operational systems' reports
Analyze current operational results
Nature of Data Warehouse Data
Data in a DWH is always historic and is static in nature.
It is used to look at the information over periods of time.
It is usually built from the operational data available in the Organization.
The data may not necessarily come from within the Organization.
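A report "of multiple dimensions", such as earned revenue by month by product by branch, amounts to summing a measure over a chosen set of dimensions. The following is a minimal sketch of that idea; the fact rows, the dimension names and the helper revenue_by are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, product, branch, earned_revenue)
facts = [
    ("2010-01", "Loans", "North", 100.0),
    ("2010-01", "Loans", "North", 50.0),
    ("2010-01", "Cards", "South", 30.0),
    ("2010-02", "Loans", "North", 70.0),
]

def revenue_by(rows, *dims):
    """Sum earned revenue grouped by the requested dimensions."""
    index = {"month": 0, "product": 1, "branch": 2}
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[index[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

# The same facts can be viewed at different levels of detail.
by_month_product_branch = revenue_by(facts, "month", "product", "branch")
by_month = revenue_by(facts, "month")
```

Dropping dimensions from the call rolls the same facts up to a coarser report, which is the kind of aggregation the warehouse is meant to serve.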
SQL Is Inadequate – Need of B.I.
SQL is inadequate for analytical applications due to the following reasons:
The conditions in the WHERE clause often contain too many AND and OR conditions, and OR conditions are poorly handled by most RDBMSs.
Statistical functions such as standard deviation are not supported consistently across SQL implementations.
Aggregation over time is not directly supported.
Users often need to pose several related queries to get the desired results, and there is no convenient way to express commonly occurring groups of queries.
Most of the time, the SQL queries are not optimized and hence take a lot of time to produce results.
Many business operations are hard or impossible to express in SQL:
Comparisons (with aggregation)
Multiple aggregations
Reporting features
To overcome the above limitations, some business intelligence is required over the available data, using a separate set of tools that can help with all the required kinds of analysis and with generating reports.
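As an illustration of the kind of analysis that is often handed to a separate tool layer, the sketch below computes a standard deviation and a month-over-month comparison in Python. The monthly revenue figures are hypothetical, and this is one possible approach rather than the tooling the presentation refers to.

```python
from statistics import mean, pstdev

# Hypothetical monthly revenue already aggregated from the warehouse.
monthly_revenue = {
    "2010-01": 100.0,
    "2010-02": 120.0,
    "2010-03": 90.0,
    "2010-04": 130.0,
}

months = sorted(monthly_revenue)
values = [monthly_revenue[m] for m in months]

# Statistical summary over the whole period.
average = mean(values)
spread = pstdev(values)  # population standard deviation

# Comparison with aggregation: change versus the previous month.
change = {
    cur: monthly_revenue[cur] - monthly_revenue[prev]
    for prev, cur in zip(months, months[1:])
}
```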
DWH & Business Intelligence
To overcome the limitations of SQL, Business Intelligence is required for the following tasks:
Data Integration – extracting the data from different heterogeneous sources and storing it in a consistent format at one location (DWH).
Data Transformation – transforming the data into the required format before loading, so that data can be maintained in a consistent format.
Data Marts, Multi-dimensional databases and cubes – creating data marts, multi-dimensional databases and cubes, which will act as a source for various reports and trend analysis.
Data Access & Analysis – making the data available to the end users in the form of reports and dashboards, and helping the data analyst to do different kinds of analysis to support decision making at the senior management level.
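The integration and transformation tasks above can be sketched as a tiny extract-transform-load run. Everything specific here is an assumption made for illustration: the two source systems, their field names and formats, the target customer table, and the helper functions. An in-memory SQLite database stands in for the DWH.

```python
import sqlite3

# Two hypothetical sources that describe customers in inconsistent formats.
crm_rows = [{"cust": "C1", "gender": "M", "joined": "2010-01-15"}]
billing_rows = [{"customer_id": "C2", "sex": "female", "start": "15/02/2010"}]

def transform_crm(row):
    """Expand gender codes; dates are already in ISO format."""
    return (row["cust"], {"M": "Male", "F": "Female"}[row["gender"]], row["joined"])

def transform_billing(row):
    """Capitalize gender text and convert DD/MM/YYYY dates to ISO format."""
    day, month, year = row["start"].split("/")
    return (row["customer_id"], row["sex"].capitalize(), f"{year}-{month}-{day}")

def load(conn, rows):
    """Load the transformed rows into one consistent target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(customer_id TEXT PRIMARY KEY, gender TEXT, start_date TEXT)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
staged = [transform_crm(r) for r in crm_rows] + [transform_billing(r) for r in billing_rows]
load(conn, staged)
loaded = conn.execute(
    "SELECT customer_id, gender, start_date FROM customer ORDER BY customer_id"
).fetchall()
```

Transforming before loading keeps format differences out of the warehouse table, so later reports do not need per-source logic.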
End-To-End Data Warehouse & BI Architecture
Data Sources -> Extraction -> Staging -> Load -> Central DWH -> Data Marts -> Data Access & Analysis
Data Warehouse - Case Study
The Problem - Data in ERP “Jail” – Virginia Tech University
Data structures difficult to understand and inefficient to access for analysis and reports
Data values change, so point-in-time data is lost
Growing backlog of report requests
The Solution – A DWH for ERP Data – Virginia Tech University
Initial charge – Build a data warehouse
Initial vision – Create business view of administrative data for Virginia Tech
[Figure: Data Access Architecture, showing USER, DATA WAREHOUSE and TRANSACTIONAL ERP SYSTEM]
Laying The Foundation
Planning
Surveyed other institutions
Did site visits and interviews
Established scope
Identified first subject area
Drafted project plan
Delivered management briefings
Staffing
DBA
Data Administrator
Data Warehouse Architects
Training Coordinators
Web Application Developers
Other Resources
Hardware
Software
Staff Education and Training
Data Warehouse Institute
Ralph Kimball approach to DWH design
Building The Data Warehouse
Strategy
Build by subject area
Develop iteratively
Design for enterprise
Design
Star Schema
Time Dimension
Transaction Detail
Surrogate Keys
Conformed Dimensions
Slowly Changing Dimensions
The Design – Multi Dimensional
The Design – STAR Schema
[Figure: star schema with a central FACT TABLE surrounded by DIMENSIONS]
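A star schema can be sketched as one fact table whose rows point at dimension tables through surrogate keys. The table names, columns and sample rows below are hypothetical and are not taken from the Virginia Tech design. An in-memory SQLite database is used only so that the sketch runs as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, identified by a surrogate key.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

-- Fact table: transaction detail, referencing the dimensions by key.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);

INSERT INTO dim_time VALUES (1, 2010, 1), (2, 2010, 2);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5), (2, 1, 2.5);
""")

# A typical star join: facts joined to each dimension, then aggregated.
report = conn.execute("""
    SELECT t.year, t.month, p.product_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_key = f.time_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY t.year, t.month, p.product_name
    ORDER BY t.year, t.month, p.product_name
""").fetchall()
```

Every query follows the same shape (fact table joined to the dimensions it needs), which is what makes the structure easy to query.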
Conformed Dimensions
Design - Slowly Changing Data (Dimensions)
There are 3 ways to manage changes in slowly changing dimension data:
Overwrite the changed attribute in the same record
Add a new record for the new value
Use additional fields for old and new values in the same record
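The three options above are commonly known as Type 1, Type 2 and Type 3 slowly changing dimensions. The sketch below shows each one on a single dimension row. The employee record, the field names (including the "current" indicator and the "previous_" prefix) and the helper functions are hypothetical. A real Type 2 design would normally also give the new row its own surrogate key and effective dates, which are omitted here for brevity.

```python
from copy import deepcopy

def scd_overwrite(rows, key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["emp_id"] == key:
            row[attr] = new_value
    return rows

def scd_add_row(rows, key, attr, new_value):
    """Type 2: mark the current record as old and add a new current record."""
    rows = deepcopy(rows)
    current = next(r for r in rows if r["emp_id"] == key and r["current"] == "Y")
    current["current"] = "N"
    new_row = dict(current, current="Y")
    new_row[attr] = new_value
    rows.append(new_row)
    return rows

def scd_add_field(rows, key, attr, new_value):
    """Type 3: keep the previous value in an extra field on the same record."""
    rows = deepcopy(rows)
    for row in rows:
        if row["emp_id"] == key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value
    return rows

employee_dim = [{"emp_id": 7, "rank": "Assistant Professor", "current": "Y"}]
type1 = scd_overwrite(employee_dim, 7, "rank", "Associate Professor")
type2 = scd_add_row(employee_dim, 7, "rank", "Associate Professor")
type3 = scd_add_field(employee_dim, 7, "rank", "Associate Professor")
```

The choice is a trade-off: overwriting keeps the table small but loses history, adding rows keeps full history, and extra fields keep only the most recent previous value.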
Proper standards should be followed while designing the DWH
Object names should be meaningful and standardized
Indicators should be used to simplify the queries
Descriptions should be provided along with each piece of code
Data should be available with business descriptions to make it clear to the end-users
Special Features
External data may be included
Derivations, calculations, aggregations and summary data should be included
Building The Data Warehouse
Development Process
Data Model Design (Erwin)
Source-To-Target mapping
Business Definitions
ETL Development / Testing (Data Stage)
Data Verification
Process Control Checks
Pilot
User Training
Data Access Strategy
Stewardship same as ERP
ERP security definitions leveraged
Warehouse security built as part of ETL
Training precedes access
Query Example
Create a report which will show employee id, name, current hire date, gender, ethnicity, rank and tenure of all full-time minority faculty members.
Result - ERP Query
select spriden_id,
       concat(spriden_last_name, concat(', ', concat(spriden_first_name, concat(' ', spriden_mi)))),
       to_char(pebempl_current_hire_date, 'DD-MON-YYYY'),
       decode(spbpers_sex, 'M', 'Male', 'F', 'Female'),
       stvethn_desc,
       ptrrank_desc,
       ptrtenr_desc
from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenr
where pebempl_empl_status = 'A'
  and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L',
                            '3A','3B','3C','3D','3H','3I','3J','3M')
  and pebempl_pidm = spbpers_pidm
  and (spbpers_sex = 'F' or spbpers_ethn_code != '1')
  and pebempl_pidm = spriden_pidm
  and spriden_change_ind is null
  and spbpers_ethn_code = stvethn_code
  and pebempl_pidm = a.perrank_pidm
  -- keep only the latest rank record per employee
  and a.perrank_action_date = (select max(perrank_action_date)
                               from perrank b
                               where b.perrank_pidm = a.perrank_pidm)
  and a.perrank_rank_code = ptrrank_code
  and pebempl_pidm = c.perappt_pidm
  -- keep only the latest appointment record per employee
  and c.perappt_action_date = (select max(perappt_action_date)
                               from perappt d
                               where c.perappt_pidm = d.perappt_pidm)
  and perappt_tenure_code = ptrtenr_code
Result - DWH Query
select ssn_fin_num, current_full_name, salary_hire_date, gender_desc, ethnicity_desc, rank_desc, tenure_desc
from employee
where current_record_ind = 'Y'
  and active_employee_ind = 'Y'
  and faculty_ind = 'Y'
  and full_time_ind = 'Y'
  and (gender_code = 'F' or ethnicity_code != '1')
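The DWH query is short because the employee table is denormalized and carries Y/N indicator columns. The sketch below runs a reduced form of that query against an in-memory SQLite table to show the filtering at work. The column names follow the DWH query, but the sample rows are invented, only a subset of the columns is created, and SQLite is an assumption made so the sketch is runnable; the presentation does not say which database was used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        ssn_fin_num TEXT, current_full_name TEXT,
        gender_code TEXT, ethnicity_code TEXT,
        current_record_ind TEXT, active_employee_ind TEXT,
        faculty_ind TEXT, full_time_ind TEXT
    )
""")

# Hypothetical sample rows (id, name, gender, ethnicity, then four indicators).
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    ("001", "Doe, Jane", "F", "1", "Y", "Y", "Y", "Y"),  # selected: gender_code 'F'
    ("002", "Roe, Rick", "M", "1", "Y", "Y", "Y", "Y"),  # filtered out by the last condition
    ("003", "Poe, Pat", "M", "2", "Y", "Y", "Y", "N"),   # filtered out: not full time
    ("004", "Lee, Sam", "M", "3", "Y", "Y", "Y", "Y"),   # selected: ethnicity_code != '1'
])

rows = conn.execute("""
    SELECT ssn_fin_num, current_full_name
    FROM employee
    WHERE current_record_ind = 'Y'
      AND active_employee_ind = 'Y'
      AND faculty_ind = 'Y'
      AND full_time_ind = 'Y'
      AND (gender_code = 'F' OR ethnicity_code != '1')
    ORDER BY ssn_fin_num
""").fetchall()
```

Each indicator replaces a join or a code list from the ERP version, so the conditions read much like the business question.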
DWH – Design Features
DWH Metadata System has Business definitions maintained by Data Experts, which are stored with the data
Data Architecture is structured for query
Data can be accessed by various clients
DWH is designed to include historic data
DWH provides a stable business view of data