
© Copyright GlobalLogic 2010

Connect. Collaborate. Innovate.

Data Warehouse & Business Intelligence

Concepts & Architecture

Sanjeev


Topics To Be Discussed:
• Why Do We Need A Data Warehouse?
• What Exactly Is A Data Warehouse?
• Features Of Data Warehouse
• Sources Of Data Warehouse
• Data Warehouse Designs
• Data Warehouse – Data Usage
• Why There Is A Need of Business Intelligence?
• DWH & BI - Architecture
• Case Study

Data Warehouse - Why Do We Need A Data Warehouse?

Data Access Problem – Data In “Jail”

The single key to survival in the 1990s (and beyond) is being able to analyze, plan and react to changing business conditions much more rapidly.

To do this, top managers, analysts and knowledge workers in our enterprises need more and better information.

Information technology itself has made possible revolutions in the way organizations operate throughout the world, through:

• more and more powerful computers on everyone's desks, and
• communication networks that span the globe.

But still, executives and decision makers can't get their hands on critical information that already exists in the organization.



Organizations, large and small, create billions of bytes of data about all aspects of their business: millions of individual facts about their customers, products, operations and people.

But for the most part, this data is locked up in a myriad of computer systems and is exceedingly difficult to get at.

This phenomenon has been described as "data in jail".

Only a small fraction of the data that is captured, processed and stored in the enterprise is actually available to executives and decision makers.

Technologies for manipulating and presenting data have literally exploded, yet large segments of the enterprise are still "data poor."

Whatever is BETTER, FASTER and CHEAPER is not FUNCTIONALLY COMPLETE.


Solution: A Data Warehouse

A set of significant new concepts and tools has evolved to provide all the key people in the enterprise with access to whatever level of information the enterprise needs to survive and prosper in an increasingly competitive world.

The term that has come to characterize this new technology is “Data Warehousing.”

Its purpose is to provide an organization with flexible, effective and efficient means of getting at the sets of data that have come to represent one of the organization's most critical and valuable assets, and to make sure that enterprise-wide information is available for decision making at all levels, at any point in time.


Data Warehouse - What Exactly Is A Data Warehouse?

A data warehouse is a special kind of database that stores a SUBJECT-ORIENTED, INTEGRATED, TIME-VARIANT, NON-VOLATILE collection of data in support of management's decision-making process.

It is a structured repository of the organization's historical data that supports managerial decision making.

It is developed in an evolutionary process by integrating data from non-integrated legacy systems.

Many design elements that optimize transaction processing are inefficient (in several ways) in a data warehouse.

Managerial access to data for decision making requires access mechanisms that would violate many principles of regular database design, such as normalization, security and integrity.

Data Warehouse - Features

Key Features of Data Warehouse:

• Subject Oriented: Data is integrated and loaded by subject (the diagram shows customer, product and order subject areas in the DWH).
• Integrated: In a DWH, data is obtained from various sources and kept in a consistent format. For example, in an operational system, multiple ID values are generated for the same order in the Order, Accounts Receivable and Product databases, but in the DWH a single ID value is used for each order.
• Time Variant: In a DWH, every data component is stored for a designated time period (for example 3, 10 or 20 years), whereas operational systems hold current data (the diagram shows the DWH holding data for 1998, 1999 and 2000).
• Non-volatile: Data is stored at summary level and is not updated frequently, in contrast to transaction-level data, which is updated frequently. (The diagram contrasts an operational data store (ODS), which supports create, read, update, delete, insert and replace operations, with a DWH that is only loaded and read.)
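The "Integrated" feature can be sketched as a key-integration step run during loading. The Python below is a minimal illustration, not the deck's method. It assumes a hypothetical cross-reference that says which local IDs in the Order, Accounts Receivable and Product systems refer to the same business order. From that cross-reference it issues exactly one warehouse order ID per business order.

```python
from typing import Dict, Tuple

# (source system, local order ID); the systems and IDs used here are hypothetical.
SourceKey = Tuple[str, str]


def integrate_order_ids(xref: Dict[SourceKey, str]) -> Dict[SourceKey, int]:
    """Map every source-system order ID onto a single warehouse order ID.

    `xref` says which business order each local ID belongs to; how that match
    is established (e.g. a shared PO number) is an assumption of this sketch.
    """
    warehouse_ids: Dict[str, int] = {}
    mapping: Dict[SourceKey, int] = {}
    for source_key, business_order in sorted(xref.items()):
        if business_order not in warehouse_ids:
            # First time this business order is seen: issue the next warehouse ID.
            warehouse_ids[business_order] = len(warehouse_ids) + 1
        mapping[source_key] = warehouse_ids[business_order]
    return mapping


if __name__ == "__main__":
    xref = {
        ("ORDER", "ORD-1001"): "PO-77",
        ("ACCOUNTS_RECEIVABLE", "INV-5531"): "PO-77",
        ("PRODUCT", "SHP-0042"): "PO-77",
        ("ORDER", "ORD-1002"): "PO-78",
    }
    for key, dw_id in integrate_order_ids(xref).items():
        print(key, "->", dw_id)
```

Sorting the input only makes the numbering deterministic. A real load would also persist the mapping so that later loads reuse the same warehouse IDs.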

Data Warehouse - Sources

Sources of Data Warehouse Data (source → data it supplies to the DWH):
• Archives → historic data
• Current systems of record → recent history
• Operational transactions → future data

Data Warehouse Designs

Various Data Warehouse Designs:

• Virtual DWH
  - End users are allowed to access operational databases directly.
  - Provides flexibility as well as the minimum amount of redundant data that must be loaded and maintained.
  - Puts the largest unplanned query load on operational systems.
  - Mostly used where a relatively large class of end users has many undefined needs to access operational data, but the likely frequency of requests is low.
• Central DWH
  - A single physical database that contains all of the data for a specific functional area, department, division or enterprise, for a specific time period.
  - Often selected where there is a common need for informational data.
  - The data stored in the DWH is accessible from one place and must be loaded and maintained on a regular basis.
• Distributed DWH
  - Certain components of the data warehouse are distributed across a number of different physical databases.
  - Increasingly, large organizations are pushing decision making down to lower and lower levels of the organization and, in turn, pushing the data needed for decision making down (or out) to the LAN or local computer serving the local decision maker.
  - Distributed DWHs usually involve the most redundant data and, as a consequence, the most complex loading and updating processes.
• Enterprise DWH
  - If an organization has a single, all-encompassing database, an EDWH is recommended.
  - If the organization has fragmented databases, an enterprise data mart must be constructed for the data warehouse, which will reflect the data after transformations and assist with retrieval and end-user access.

Data Warehouse - Data

Appropriate Use of Data Warehouse Data:
• Produce reports for long-term trend analysis
• Produce reports aggregating enterprise data
• Produce reports of multiple dimensions (earned revenue by month by product by branch)

Inappropriate Use of Data Warehouse Data:
• Replace operational systems
• Replace operational systems' reports
• Analyze current operational results

Nature of Data Warehouse Data:
• Data in a DWH is always historic and static in nature.
• It is used to look at information over periods of time.
• It is usually built from the operational data available in the organization.
• The data does not necessarily come from within the organization.
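A multi-dimensional report such as "earned revenue by month by product by branch" is, at its core, an aggregation grouped by several dimensions. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in warehouse. The table `revenue_fact`, its columns and the sample rows are all hypothetical.

```python
import sqlite3
from typing import List, Tuple


def build_sample_db() -> sqlite3.Connection:
    """In-memory stand-in for a warehouse fact table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revenue_fact (txn_date TEXT, product TEXT, branch TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO revenue_fact VALUES (?, ?, ?, ?)",
        [
            ("2009-01-05", "Loan", "North", 100.0),
            ("2009-01-20", "Loan", "North", 50.0),
            ("2009-01-11", "Card", "South", 30.0),
            ("2009-02-02", "Loan", "North", 70.0),
        ],
    )
    return conn


def revenue_by_month_product_branch(conn: sqlite3.Connection) -> List[Tuple[str, str, str, float]]:
    """Earned revenue aggregated along three dimensions: month, product, branch."""
    return conn.execute(
        """
        SELECT substr(txn_date, 1, 7) AS month,  -- 'YYYY-MM' taken from ISO dates
               product,
               branch,
               SUM(revenue) AS earned_revenue
          FROM revenue_fact
         GROUP BY month, product, branch
         ORDER BY month, product, branch
        """
    ).fetchall()


if __name__ == "__main__":
    for row in revenue_by_month_product_branch(build_sample_db()):
        print(row)
```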

Data Warehouse & Business Intelligence - SQL Is Inadequate: The Need for B.I.

SQL is inadequate for analytical applications for the following reasons:
• The conditions in the WHERE clause often contain too many AND and OR conditions, and OR conditions are poorly handled by most RDBMSs.
• Statistical functions such as standard deviation are not part of core SQL (SQL-92), so support varies by vendor.
• Aggregation over time is not supported by core SQL (SQL-92).
• Users often need to pose related queries to get the desired results, and there is no convenient way to express commonly occurring groups of queries.
• Most of the time, the SQL queries are not optimized and hence take a lot of time to produce results.
• Many business operations are hard or impossible to express in SQL, such as:
  - comparisons (with aggregation)
  - multiple aggregations
  - reporting features

To overcome these limitations, business intelligence must be applied to the available data using a separate set of tools that can perform all the required kinds of analysis and generate reports.
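The slide's remedy is to hand some of the analysis to a separate BI tool. The sketch below shows that division of labor on a hypothetical `sales` table. SQL (via Python's sqlite3) performs the simple aggregation. Python's `statistics` module then compares each branch against the overall average and computes a population standard deviation. Many SQL dialects do provide a standard-deviation aggregate; the sketch only illustrates moving analysis into a separate layer.

```python
import sqlite3
import statistics
from typing import Dict


def branch_revenue_profile(conn: sqlite3.Connection) -> Dict[str, Dict[str, float]]:
    """SQL does the simple aggregation; the 'BI layer' (plain Python here)
    adds a comparison against an aggregate and a standard deviation."""
    totals = dict(
        conn.execute("SELECT branch, SUM(revenue) FROM sales GROUP BY branch").fetchall()
    )
    values = list(totals.values())
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation across branches
    return {
        branch: {
            "total": total,
            "vs_mean": total - mean,  # comparison with an aggregate
            "z_score": (total - mean) / spread if spread else 0.0,
        }
        for branch, total in totals.items()
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (branch TEXT, revenue REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 100.0), ("North", 50.0), ("South", 30.0), ("South", 60.0), ("East", 120.0)],
    )
    for branch, stats in sorted(branch_revenue_profile(conn).items()):
        print(branch, stats)
```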

DWH & Business Intelligence

To overcome the limitations of SQL, business intelligence is required for the following tasks:
• Data Integration: extracting data from different heterogeneous sources and storing it in a consistent format in one location (the DWH).
• Data Transformation: transforming the data into the required format before loading, so that it can be maintained in a consistent format.
• Data Marts, Multi-dimensional Databases and Cubes: creating data marts, multi-dimensional databases and cubes, which act as a source for various reports and trend analysis.
• Data Access & Analysis: making the data available to end users in the form of reports and dashboards, and helping data analysts perform different kinds of analysis to support decision making at the senior management level.
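The first two tasks, data integration and data transformation, can be sketched as a tiny extract-transform-load step. The sketch assumes two hypothetical sources. One is an HR extract with YYYYMMDD dates and M/F codes. The other is a payroll extract with MM/DD/YYYY dates and spelled-out genders. An in-memory SQLite table stands in for the DWH, and none of these names come from the deck.

```python
import sqlite3
from datetime import datetime
from typing import Dict, List

# Hypothetical extracts from two heterogeneous sources with different conventions.
HR_ROWS = [{"emp": "E1", "hired": "20090105", "sex": "F"}]
PAYROLL_ROWS = [{"employee_id": "E2", "hire_dt": "02/17/2008", "gender": "Male"}]


def transform_hr(row: Dict[str, str]) -> Dict[str, str]:
    """HR source: dates as YYYYMMDD, gender already coded M/F."""
    return {
        "employee_id": row["emp"],
        "hire_date": datetime.strptime(row["hired"], "%Y%m%d").date().isoformat(),
        "gender_code": row["sex"],
    }


def transform_payroll(row: Dict[str, str]) -> Dict[str, str]:
    """Payroll source: dates as MM/DD/YYYY, gender spelled out."""
    return {
        "employee_id": row["employee_id"],
        "hire_date": datetime.strptime(row["hire_dt"], "%m/%d/%Y").date().isoformat(),
        "gender_code": {"Male": "M", "Female": "F"}[row["gender"]],
    }


def load(conn: sqlite3.Connection, rows: List[Dict[str, str]]) -> int:
    """Load the consistently formatted rows into one warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dwh_employee "
        "(employee_id TEXT PRIMARY KEY, hire_date TEXT, gender_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO dwh_employee VALUES (:employee_id, :hire_date, :gender_code)", rows
    )
    return conn.execute("SELECT COUNT(*) FROM dwh_employee").fetchone()[0]


def run_etl(conn: sqlite3.Connection) -> int:
    # Extract (here: in-memory lists) -> transform per source -> load into one place.
    rows = [transform_hr(r) for r in HR_ROWS] + [transform_payroll(r) for r in PAYROLL_ROWS]
    return load(conn, rows)
```

After `run_etl`, both employees share one date format (ISO 8601) and one gender coding. Downstream reports can then rely on that without knowing which source a row came from.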

Data Warehouse & BI Architecture

End-To-End Data Warehouse & BI Architecture:

Data Sources → Extraction → Staging → Load → Central DWH → Data Marts → Data Access & Analysis
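The flow above can be traced end to end in a few functions, one per stage. This is a minimal sketch under two assumptions. The table names (`stg_orders`, `dwh_orders`, `mart_sales_daily`) are hypothetical. A single in-memory SQLite database stands in for every layer, whereas in practice each layer is usually a separate store fed by an ETL tool.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, Optional[float]]  # (order_date, department, amount); hypothetical layout


def extract() -> List[Row]:
    """Stand-in for reading from the operational data sources."""
    return [
        ("2010-01-03", "Sales", 120.0),
        ("2010-01-04", "Sales", 80.0),
        ("2010-01-04", "HR", 15.0),
        ("2010-01-05", "Sales", None),  # a bad row that should not reach the DWH
    ]


def stage(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Land the raw extracts unchanged in a staging table."""
    conn.execute("CREATE TABLE stg_orders (order_date TEXT, department TEXT, amount REAL)")
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)


def load_central_dwh(conn: sqlite3.Connection) -> None:
    """Load only validated detail rows from staging into the central DWH."""
    conn.execute(
        "CREATE TABLE dwh_orders AS SELECT * FROM stg_orders WHERE amount IS NOT NULL"
    )


def build_sales_mart(conn: sqlite3.Connection) -> None:
    """Derive a department-specific, summarized data mart from the central DWH."""
    conn.execute(
        "CREATE TABLE mart_sales_daily AS "
        "SELECT order_date, SUM(amount) AS total FROM dwh_orders "
        "WHERE department = 'Sales' GROUP BY order_date"
    )


def run_pipeline() -> List[Tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    stage(conn, extract())
    load_central_dwh(conn)
    build_sales_mart(conn)
    # Data access & analysis: reports read from the mart, not from the sources.
    return conn.execute(
        "SELECT order_date, total FROM mart_sales_daily ORDER BY order_date"
    ).fetchall()
```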

Data Warehouse - Case Study

The Problem: Data in ERP “Jail” (Virginia Tech University)
• Data structures are difficult to understand and inefficient to access for analysis and reports.
• Data values change, so point-in-time data is lost.
• There is a growing backlog of report requests.

The Solution: A DWH for ERP Data (Virginia Tech University)
• Initial charge: build a data warehouse.
• Initial vision: create a business view of administrative data for Virginia Tech.

Data Access Architecture (diagram layers, top to bottom): User; Data Warehouse; Transactional ERP System.

Laying The Foundation

Staffing:
• DBA
• Data Administrator
• Data Warehouse Architects
• Training Coordinators
• Web Application Developers

Other Resources:
• Hardware
• Software

Planning:
• Surveyed other institutions
• Did site visits and interviews
• Established scope
• Identified first subject area
• Drafted project plan
• Delivered management briefings

Staff Education and Training:
• Data Warehouse Institute
• Ralph Kimball approach to DWH design

Building The Data Warehouse

Strategy:
• Build by subject area
• Develop iteratively
• Design for enterprise

The Design: Multi-Dimensional
• Star schema
• Time dimension
• Transaction detail
• Surrogate keys
• Conformed dimensions
• Slowly changing dimensions
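Two of these design elements, the time dimension and surrogate keys, are easy to show together. A time dimension is typically pre-generated with one row per calendar day, keyed by a meaningless integer rather than by the date itself. Below is a minimal sketch; its column names are assumptions, not the case study's.

```python
from datetime import date, timedelta
from typing import Dict, List, Union

DimRow = Dict[str, Union[int, str]]


def build_time_dimension(start: date, end: date) -> List[DimRow]:
    """One row per calendar day between start and end (inclusive)."""
    rows: List[DimRow] = []
    current, key = start, 1
    while current <= end:
        rows.append({
            "date_key": key,                      # surrogate key: a meaningless integer
            "calendar_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
        key += 1
    return rows
```

Fact rows then carry `date_key` instead of a raw date. Reports can filter or group by year, quarter or weekday without any date arithmetic in the query.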

The Design: STAR Schema

(Diagram: a central FACT TABLE surrounded by DIMENSIONS.)
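Below is a minimal star schema built on hypothetical payroll data. A single fact table holds a measure (`amount`) plus one surrogate foreign key per dimension. The query joins the fact table directly to each dimension. SQLite (through Python's sqlite3) stands in for the warehouse database, and all table and column names are assumptions.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, full_name TEXT, department TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, calendar_date TEXT, year INTEGER);
-- The fact table holds measures plus one foreign key per dimension.
CREATE TABLE fact_payroll (
    employee_key INTEGER REFERENCES dim_employee(employee_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    amount       REAL
);
"""


def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_employee VALUES (?, ?, ?)",
                     [(1, "Doe, Jane", "Biology"), (2, "Roe, Rick", "Physics")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(10, "2009-12-31", 2009), (11, "2010-01-31", 2010)])
    conn.executemany("INSERT INTO fact_payroll VALUES (?, ?, ?)",
                     [(1, 10, 5000.0), (1, 11, 5200.0), (2, 11, 6100.0)])


def payroll_by_department_and_year(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    # Star join: the fact table joins directly to each dimension on its surrogate key.
    return conn.execute("""
        SELECT e.department, d.year, SUM(f.amount)
          FROM fact_payroll f
          JOIN dim_employee e ON e.employee_key = f.employee_key
          JOIN dim_date d     ON d.date_key = f.date_key
         GROUP BY e.department, d.year
         ORDER BY e.department, d.year
    """).fetchall()
```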

Conformed Dimensions
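A conformed dimension is a single dimension table, with the same keys and attribute values, shared by several fact tables. Because the table is shared, results from different fact tables can be combined ("drilled across"). The slide's diagram did not survive, so the sketch below assumes a hypothetical `dim_department` shared by budget facts and expense facts. Each fact table is aggregated to the shared key first and only then joined, which avoids double counting.

```python
import sqlite3
from typing import List, Tuple


def build(conn: sqlite3.Connection) -> None:
    conn.executescript("""
    CREATE TABLE dim_department (department_key INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE fact_budget  (department_key INTEGER, budget_amount REAL);
    CREATE TABLE fact_expense (department_key INTEGER, expense_amount REAL);
    INSERT INTO dim_department VALUES (1, 'Biology'), (2, 'Physics');
    INSERT INTO fact_budget  VALUES (1, 1000.0), (2, 800.0);
    INSERT INTO fact_expense VALUES (1, 400.0), (1, 250.0), (2, 900.0);
    """)


def budget_vs_actual(conn: sqlite3.Connection) -> List[Tuple[str, float, float]]:
    # Drill across: summarize each fact table to the conformed key, then join.
    return conn.execute("""
        SELECT d.department_name, b.budget, e.spent
          FROM dim_department d
          JOIN (SELECT department_key, SUM(budget_amount) AS budget
                  FROM fact_budget GROUP BY department_key) b
            ON b.department_key = d.department_key
          JOIN (SELECT department_key, SUM(expense_amount) AS spent
                  FROM fact_expense GROUP BY department_key) e
            ON e.department_key = d.department_key
         ORDER BY d.department_name
    """).fetchall()
```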

Design: Slowly Changing Data (Dimensions)

There are three ways to manage changes in slowly changing dimension data:
• Overwrite the changed attribute in the same record.
• Add a new record for the new value.
• Use additional fields for the old and new values in the same record.
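The three approaches above are commonly called Type 1, Type 2 and Type 3 slowly changing dimensions. Below is a minimal in-memory sketch on a hypothetical employee dimension. The `current_record_ind` flag mirrors the indicator used in the case study's DWH query; the other field names are assumptions.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class EmployeeDim:
    employee_key: int                          # surrogate key
    employee_id: str                           # natural key from the source system
    department: str
    previous_department: Optional[str] = None  # used only by Type 3
    effective_date: str = "1900-01-01"         # used by Type 2
    current_record_ind: str = "Y"              # used by Type 2


def scd_type1(rows: List[EmployeeDim], employee_id: str, new_dept: str) -> List[EmployeeDim]:
    """Type 1: overwrite the changed attribute in the same record (history is lost)."""
    return [replace(r, department=new_dept) if r.employee_id == employee_id else r for r in rows]


def scd_type2(rows: List[EmployeeDim], employee_id: str, new_dept: str,
              change_date: str) -> List[EmployeeDim]:
    """Type 2: expire the current record and add a new record for the new value."""
    next_key = max(r.employee_key for r in rows) + 1
    out: List[EmployeeDim] = []
    for r in rows:
        if r.employee_id == employee_id and r.current_record_ind == "Y":
            out.append(replace(r, current_record_ind="N"))
            out.append(EmployeeDim(next_key, employee_id, new_dept, effective_date=change_date))
        else:
            out.append(r)
    return out


def scd_type3(rows: List[EmployeeDim], employee_id: str, new_dept: str) -> List[EmployeeDim]:
    """Type 3: keep the old value in an additional field of the same record."""
    return [
        replace(r, previous_department=r.department, department=new_dept)
        if r.employee_id == employee_id
        else r
        for r in rows
    ]
```

Type 2 keeps every historical version. That is what preserves point-in-time data, at the cost of extra rows and a current-record flag that every query must filter on.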

Proper standards should be followed while designing the DWH:
• Object names should be meaningful and standardized.
• Indicators should be used to simplify queries.
• Descriptions should be provided along with each code.
• Data should be available with business descriptions to make it clear to end users.

Special Features:
• External data may be included.
• Derivations, calculations, aggregations and summary data should be included.

Building The Data Warehouse

Development Process:
• Data model design (Erwin)
• Source-to-target mapping
• Business definitions
• ETL development / testing (DataStage)
• Data verification
• Process control checks
• Pilot user training

Data Access Strategy:
• Stewardship same as ERP
• ERP security definitions leveraged
• Warehouse security built as part of ETL
• Training precedes access
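Source-to-target mapping and data verification pair naturally. The mapping is a table of rules, and verification checks the loaded result against the source. Below is a minimal sketch whose mapping rules and checks are hypothetical. The source column names are borrowed from the ERP query in the case study. The case study itself used DataStage for ETL, not Python.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical source-to-target mapping: target column -> (source column, rule).
MAPPING: Dict[str, Tuple[str, Callable[[str], str]]] = {
    "employee_id": ("spriden_id", str.strip),
    "gender_desc": ("spbpers_sex", lambda v: {"M": "Male", "F": "Female"}.get(v, "Unknown")),
    "hire_date": ("pebempl_current_hire_date", str),
}


def apply_mapping(source_rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Transform each source row into a target row, column by column."""
    return [
        {target: rule(row[source]) for target, (source, rule) in MAPPING.items()}
        for row in source_rows
    ]


def verify(source_rows: List[Dict[str, str]], target_rows: List[Dict[str, str]]) -> List[str]:
    """Simple data-verification / process-control checks; an empty list means pass."""
    failures: List[str] = []
    if len(source_rows) != len(target_rows):
        failures.append(
            f"row count mismatch: {len(source_rows)} source vs {len(target_rows)} target"
        )
    if any(not r["employee_id"] for r in target_rows):
        failures.append("missing employee_id in target")
    if any(r["gender_desc"] == "Unknown" for r in target_rows):
        failures.append("unmapped gender code")
    return failures
```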


Query Example

Create a report which will show employee ID, name, current hire date, gender, ethnicity, rank and tenure of all full-time minority faculty.

Result - ERP Query:

```sql
select spriden_id,
       concat(spriden_last_name, concat(', ', concat(spriden_first_name, concat(' ', spriden_mi)))),
       to_char(pebempl_current_hire_date, 'DD-MON-YYYY'),
       decode(spbpers_sex, 'M', 'Male', 'F', 'Female'),
       stvethn_desc,
       ptrrank_desc,
       ptrtenr_desc
  from spriden, spbpers, pebempl, stvethn, perrank a, ptrrank, perappt c, ptrtenr
 where pebempl_empl_status = 'A'
   and pebempl_ecls_code in ('2A','2B','2C','2F','2G','2H','2K','2L',
                             '3A','3B','3C','3D','3H','3I','3J','3M')
   and pebempl_pidm = spbpers_pidm
   and (spbpers_sex = 'F' or spbpers_ethn_code != '1')
   and pebempl_pidm = spriden_pidm
   and spriden_change_ind is null
   and spbpers_ethn_code = stvethn_code
   and pebempl_pidm = a.perrank_pidm
   and a.perrank_action_date = (select max(perrank_action_date)
                                  from perrank b
                                 where b.perrank_pidm = a.perrank_pidm)
   and a.perrank_rank_code = ptrrank_code
   and pebempl_pidm = c.perappt_pidm
   and c.perappt_action_date = (select max(perappt_action_date)
                                  from perappt d
                                 where c.perappt_pidm = d.perappt_pidm)
   and perappt_tenure_code = ptrtenr_code
```

Result - DWH Query (same report):

```sql
select ssn_fin_num, current_full_name, salary_hire_date, gender_desc,
       ethnicity_desc, rank_desc, tenure_desc
  from employee
 where current_record_ind = 'Y'
   and active_employee_ind = 'Y'
   and faculty_ind = 'Y'
   and full_time_ind = 'Y'
   and (gender_code = 'F' or ethnicity_code != '1')
```
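In the DWH version, the indicator flags do the work that the ERP query needed subqueries and code lists for. To see this, the DWH query can be run against a tiny hypothetical `employee` table. The rows are invented, and an `order by` was added only to make the output deterministic.

```python
import sqlite3
from typing import List, Tuple

DWH_QUERY = """
select ssn_fin_num, current_full_name, salary_hire_date, gender_desc,
       ethnicity_desc, rank_desc, tenure_desc
  from employee
 where current_record_ind = 'Y'
   and active_employee_ind = 'Y'
   and faculty_ind = 'Y'
   and full_time_ind = 'Y'
   and (gender_code = 'F' or ethnicity_code != '1')
 order by current_full_name
"""

COLUMNS = ("ssn_fin_num", "current_full_name", "salary_hire_date", "gender_code",
           "gender_desc", "ethnicity_code", "ethnicity_desc", "rank_desc", "tenure_desc",
           "current_record_ind", "active_employee_ind", "faculty_ind", "full_time_ind")

# Hypothetical rows; the last four values are the Y/N indicator flags.
SAMPLE = [
    ("001", "Doe, Jane", "2001-08-10", "F", "Female", "1", "Code 1", "Professor", "Tenured", "Y", "Y", "Y", "Y"),
    ("001", "Doe, Jane", "2001-08-10", "F", "Female", "1", "Code 1", "Assistant Professor", "Not Tenured", "N", "Y", "Y", "Y"),  # old version
    ("002", "Roe, Rick", "1999-01-04", "M", "Male", "2", "Code 2", "Associate Professor", "Tenured", "Y", "Y", "Y", "Y"),
    ("003", "Poe, Sam", "2005-07-01", "M", "Male", "1", "Code 1", "Professor", "Tenured", "Y", "Y", "Y", "Y"),
    ("004", "Lee, Ann", "2008-01-15", "F", "Female", "2", "Code 2", "Lecturer", "Not Tenured", "Y", "Y", "N", "Y"),  # not faculty
]


def minority_faculty_report() -> List[Tuple[str, ...]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE employee ({', '.join(COLUMNS)})")
    placeholders = ", ".join("?" * len(COLUMNS))
    conn.executemany(f"INSERT INTO employee VALUES ({placeholders})", SAMPLE)
    return conn.execute(DWH_QUERY).fetchall()
```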

DWH - Design Features
• The DWH metadata system holds business definitions, maintained by data experts and stored with the data.
• The data architecture is structured for querying.
• Data can be accessed by various clients.
• The DWH is designed to include historical data.
• The DWH provides a stable business view of the data.
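Storing business definitions with the data can be as simple as a metadata table that reporting tools join to the database catalog. The sketch below uses a hypothetical `dw_metadata` table and SQLite's `PRAGMA table_info`. The deck does not describe the case study's actual metadata system.

```python
import sqlite3
from typing import List, Tuple


def build(conn: sqlite3.Connection) -> None:
    conn.executescript("""
    CREATE TABLE employee (ssn_fin_num TEXT, faculty_ind TEXT);
    -- Hypothetical metadata table: business definitions kept alongside the data.
    CREATE TABLE dw_metadata (table_name TEXT, column_name TEXT, business_definition TEXT);
    INSERT INTO dw_metadata VALUES
      ('employee', 'faculty_ind', 'Y if the employee holds a faculty appointment, else N');
    """)


def data_dictionary(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str]]:
    """Pair each physical column with its recorded business definition."""
    if not table.isidentifier():  # the table name is interpolated into the PRAGMA below
        raise ValueError("unexpected table name")
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    definitions = dict(conn.execute(
        "SELECT column_name, business_definition FROM dw_metadata WHERE table_name = ?",
        (table,),
    ).fetchall())
    return [(c, definitions.get(c, "(no business definition recorded)")) for c in columns]
```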
