CISB594 – Business Intelligence Data Warehouse Part I

Reference
• Materials used in this presentation are extracted mainly from the following texts, unless stated otherwise.

Objectives
At the end of this lecture, you should be able to:
• Understand the basic definitions and concepts of data warehouses
• Understand how a data warehouse differs from an operational database
• Describe the characteristics of data warehouse
• Describe data warehouse process overview
• Describe the different types of data warehouse architectures

Data Warehouse
• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time” (Inmon)
• A copy of transaction data specifically structured for query and analysis (Kimball)
• A data warehouse is a repository of an organization's electronically stored data, designed to facilitate reporting and analysis. (Wikipedia)

• A decision support database that is maintained separately from the organization’s operational database

• Supports information processing by providing a solid platform of consolidated, historical data for analysis

• In your own words?

4 main characteristics of data warehousing

1. Subject oriented
• Organized around major subjects, such as sales progress
• Contains only information relevant for decision support
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
• Provides a simple and concise view around particular subject issues

• For example, to learn more about your company's sales, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented (http://docs.oracle.com/)
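A rough sketch of how such a question might be answered against a sales-only store. The `sales` table, its columns, and every sample row are hypothetical and chosen only for illustration; SQLite stands in for a real warehouse DBMS.

```python
import sqlite3

# Hypothetical sales subject area; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, item TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Aminah", "laptop", 2023, 4200.0),
        ("Aminah", "laptop", 2023, 3900.0),
        ("Boon", "laptop", 2023, 5000.0),
        ("Boon", "printer", 2023, 9000.0),
        ("Chandra", "laptop", 2022, 9900.0),
    ],
)

def best_customer(item, year):
    """Return (customer, total) with the highest total spend on `item` in `year`."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND sale_year = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, year),
    ).fetchone()

print(best_customer("laptop", 2023))
```

Because the store holds only sales facts, the question maps onto a single grouped query with no unrelated operational tables in the way.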

2. Integrated
• Constructed by integrating multiple, various data sources
• Must place data from different sources into a consistent format; to do so, the warehouse must deal with naming conflicts and discrepancies
• Data cleaning and data integration techniques are applied
• Ensures consistency in naming conventions among different data sources
• When data is moved to the warehouse, it is converted
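The conversion step might look something like the sketch below. Two hypothetical sources (`crm_rows` and `billing_rows`) name the same fields differently and encode gender and dates differently; each record is converted into one agreed format before loading. All field names, codes, and date formats are assumptions made for illustration.

```python
from datetime import datetime

# Two hypothetical source systems describing the same kind of customer.
crm_rows = [{"cust_name": "Aminah", "sex": "F", "joined": "2021-03-15"}]
billing_rows = [{"CustomerName": "boon ", "gender": "male", "start_dt": "15/04/2020"}]

# One agreed encoding for gender, whatever the source used.
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}

def from_crm(row):
    """Convert a CRM record to the warehouse format."""
    return {
        "customer_name": row["cust_name"].strip().title(),
        "gender": GENDER_CODES[row["sex"].strip().lower()],
        "join_date": datetime.strptime(row["joined"], "%Y-%m-%d").date().isoformat(),
    }

def from_billing(row):
    """Convert a billing record to the same warehouse format."""
    return {
        "customer_name": row["CustomerName"].strip().title(),
        "gender": GENDER_CODES[row["gender"].strip().lower()],
        "join_date": datetime.strptime(row["start_dt"], "%d/%m/%Y").date().isoformat(),
    }

integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
for record in integrated:
    print(record)
```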

3. Time variant (time series)
• Maintains historical data; data for analysis from multiple sources contain multiple time points
• A data warehouse's focus on change over time is what makes it time variant
• The time horizon for the data warehouse is significantly longer than that of operational systems
  – Operational database: current value data
  – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
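A minimal sketch of the difference between "current value" data and time-variant data. The product name, prices, and dates are made up; the point is only that the operational view overwrites while the warehouse view keeps every dated value.

```python
from datetime import date

# Operational view: one current value per product, overwritten on every change.
operational_price = {}

# Warehouse view: every value is kept together with the date it applies from.
price_history = []

def record_price(product, price, as_of):
    operational_price[product] = price              # the old value is lost
    price_history.append((product, as_of, price))   # old values are retained

record_price("widget", 10.0, date(2021, 1, 1))
record_price("widget", 12.0, date(2022, 1, 1))
record_price("widget", 11.5, date(2023, 1, 1))

def price_on(product, day):
    """Return the price in effect on `day`, or None if none was recorded yet."""
    known = [(as_of, price) for p, as_of, price in price_history
             if p == product and as_of <= day]
    return max(known)[1] if known else None

print(operational_price["widget"])            # only the latest value survives
print(price_on("widget", date(2021, 6, 30)))  # a historical question still has an answer
```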

4. Non-volatile
• After data are entered into a data warehouse, users cannot change or update the data.
• Operational update of data does not occur in the data warehouse environment
• Does not require transaction processing, recovery, and concurrency control mechanisms
• Requires only two operations in data accessing:
  – Initial loading of data and access of data
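The two operations can be pictured with a toy class that offers loading and reading and nothing else. `ReadOnlyWarehouse` is a hypothetical name and is not how any particular product enforces non-volatility; it only illustrates the idea.

```python
class ReadOnlyWarehouse:
    """Toy store that supports only the two operations named above."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Loading: rows are appended, never edited in place."""
        self._rows.extend(dict(r) for r in rows)

    def query(self, predicate=lambda row: True):
        """Access: return copies so callers cannot alter stored rows."""
        return [dict(r) for r in self._rows if predicate(r)]

dw = ReadOnlyWarehouse()
dw.load([{"order_id": 1, "amount": 50.0}, {"order_id": 2, "amount": 75.0}])

result = dw.query(lambda r: r["amount"] > 60)
result[0]["amount"] = 0.0   # changes only the caller's copy
print(dw.query(lambda r: r["order_id"] == 2))
```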

Summary of Data Warehouse
• Runs on a DBMS such as Oracle, SQL, DB2 …
• Keeps a large amount of data from different points in time for a long period of time
• Data in the data warehouse cannot be overwritten by users
• Data comes from various sources, internally and externally
• Carefully designed to allow for analysis/pattern discovery on identified subject matter

OLTP
• OLTP (on-line transaction processing)
  – Major task of traditional relational DBMS
  – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
  – Database type: Operational

OLAP
• Online Analytical Processing (OLAP) is a reporting application that provides high-performance analysis and easy reporting on large volumes of data
• The goal of OLAP:
  – multidimensional data analysis,
  – provide fast and flexible data summarization, analysis, and reporting capabilities
  – ability to view trends over time
• Type of database: Data warehouse

OLTP vs OLAP

Aspect             | OLTP                                                                   | OLAP
Users              | Clerk, IT professional                                                 | Knowledge worker
Function           | Day-to-day operations                                                  | Decision support
DB Design          | To suit typical database functions of update, edit, delete; relational | Designed for reporting on subjects; data warehouse
Data               | Current, up-to-date, detailed                                          | Historical, summarized, multidimensional, integrated, consolidated
Usage              | Repetitive, structured                                                 | Ad hoc, unstructured
Access             | Read/write                                                             | Read; lots of scans
Type of Work       | Short, simple transaction                                              | Complex query
# Records Accessed | Tens                                                                   | Millions
# Users            | Thousands                                                              | Hundreds, tens
DB Size            | 100MB-GB                                                               | 100GB-TB

What the database looks like for the two types

• The operational database (relational):

• The data warehouse (star schema):
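The slide's diagram is not reproduced in this text. As a stand-in, here is a minimal sketch of a star schema: one central fact table whose rows point to surrounding dimension tables. The names `fact_sales`, `dim_date`, `dim_product`, and `dim_customer`, and all sample rows, are hypothetical and are not the schema from the original slide.

```python
import sqlite3

# Hypothetical star schema: a central fact table with a foreign key
# to each surrounding dimension table.
star = sqlite3.connect(":memory:")
star.executescript(
    """
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_date(date_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        quantity     INTEGER,
        amount       REAL
    );
    INSERT INTO dim_date VALUES (1, 2023, 1), (2, 2023, 2);
    INSERT INTO dim_product VALUES (1, 'laptop', 'hardware'), (2, 'antivirus', 'software');
    INSERT INTO dim_customer VALUES (1, 'Aminah', 'north'), (2, 'Boon', 'south');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 2, 8000.0), (1, 2, 2, 5, 500.0), (2, 1, 2, 1, 4000.0);
    """
)

# Typical star query: join the fact table to the dimensions it needs,
# then group by descriptive attributes taken from those dimensions.
sales_by_category_month = star.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(sales_by_category_month)
```

Each dimension is one join away from the fact table, which is what gives the layout its star shape.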

Why …
• Can we not operate on the operational database to obtain the answers to our business questions?
• Answer: doing so requires complex query formulation and preparation of data to address the query, and if the operational database is used, the process will be very slow due to complex joins and multiple scans
  – A typical data warehouse query scans thousands or millions of rows. For example, "Find the total sales for all customers last month."
  – A typical OLTP operation accesses only a handful of records. For example, "Retrieve the current order for this customer."
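The two example questions can be put side by side in a small sketch. The `orders` table, its columns, and the generated rows are hypothetical, and "current order" is taken to mean the customer's most recent order, which is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
    "order_month TEXT, amount REAL)"
)
db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# 10,000 hypothetical orders spread over 100 customers and two months.
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(i, i % 100, "2024-05" if i % 2 else "2024-04", 10.0) for i in range(1, 10001)],
)

# OLTP-style operation: fetch the most recent order for one customer.
current_order = db.execute(
    "SELECT order_id, amount FROM orders WHERE customer_id = ? "
    "ORDER BY order_id DESC LIMIT 1",
    (42,),
).fetchone()

# Warehouse-style query: total sales for all customers in one month.
rows_counted, total_sales = db.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders WHERE order_month = ?",
    ("2024-05",),
).fetchone()

print(current_order, rows_counted, total_sales)
```

The first query returns one row for one customer; the second has to aggregate half of the table to produce a single number.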

Ask yourself
• Explain data warehouse. How does it differ from operational database? Provide an example to support your answer
• Explain the 4 main characteristics of data warehouse
• Compare and contrast OLAP to OLTP

Data Warehousing - Concept
• Data mart
  – Smaller and focuses on a particular subject or department.
  – It is a subset of data warehouse/departmental data warehouse
  – A data mart is a smaller DW designed around one problem, organizational function, topic, or other focus area.
• Can be a dependent data mart
  – A subset that is created directly from a data warehouse
  – Ensures that the end user is viewing the same version of the data that are accessed by all other data warehouse users
• Or an independent data mart
  – A small data warehouse designed for a strategic business unit or a department
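One possible way to picture a dependent data mart is as a view defined directly on a warehouse table. The names `warehouse_sales` and `marketing_mart` and the rows are hypothetical, and a view is only one option assumed here for brevity; a dependent mart could equally be a separate copy refreshed from the warehouse.

```python
import sqlite3

edw = sqlite3.connect(":memory:")
edw.executescript(
    """
    CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('marketing', 'campaign A', 1200.0),
        ('marketing', 'campaign B', 800.0),
        ('finance', 'audit', 300.0);
    -- Dependent data mart: a subset defined directly on the warehouse table.
    CREATE VIEW marketing_mart AS
        SELECT product, amount FROM warehouse_sales WHERE department = 'marketing';
    """
)

mart_rows = edw.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)

# A later load into the warehouse is visible through the mart as well,
# so mart users see the same version of the data as warehouse users.
edw.execute("INSERT INTO warehouse_sales VALUES ('marketing', 'campaign C', 500.0)")
mart_total = edw.execute("SELECT SUM(amount) FROM marketing_mart").fetchone()[0]
print(mart_total)
```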

• Enterprise data warehouse (EDW)
  – A large scale data warehouse used across the enterprise for decision support
  – Used to provide data for many types of DSS, including CRM, supply chain management, BPM, KMS etc
• Metadata
  – Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its use.
  – Metadata in layman's terms: Metadata describes other data. It provides information about a certain item's content. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data (http://www.techterms.com/definition/metadata)
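For a database table, one simple form of metadata is the description of its columns. The sketch below reads that description from SQLite using `PRAGMA table_info`; the `fact_sales` table is hypothetical, and real warehouse metadata would usually cover much more (sources, load times, business definitions).

```python
import sqlite3

meta_db = sqlite3.connect(":memory:")
meta_db.execute(
    "CREATE TABLE fact_sales ("
    "sale_id INTEGER PRIMARY KEY, sale_date TEXT NOT NULL, amount REAL)"
)

# PRAGMA table_info returns one row per column:
# (position, name, declared type, not-null flag, default value, primary-key flag)
columns = meta_db.execute("PRAGMA table_info(fact_sales)").fetchall()

column_metadata = {
    name: {"type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _, name, col_type, notnull, _, pk in columns
}

for name, info in column_metadata.items():
    print(name, info)
```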

Data Warehousing Process Overview

The data warehousing process consists of the following steps:
1. Data are imported from various internal and external sources
2. Data are cleansed and organized consistently with the organization’s needs
3a. Data are loaded into the enterprise data warehouse
4a. If desired, data marts are created as subsets of the EDW
or
3b. Data are loaded into data marts
4b. The data marts are consolidated into the EDW
5. Analyses are performed as needed
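The steps above (following the 3a/4a path) can be sketched as a chain of small functions. Plain Python lists stand in for the sources, the EDW, and the data mart; all function names, field names, and sample values are hypothetical.

```python
# Step 1: import data from hypothetical internal and external sources.
def extract():
    internal = [{"region": "North", "amount": "100.50"},
                {"region": "south ", "amount": "200"}]
    external = [{"region": "NORTH", "amount": "50"},
                {"region": "east", "amount": None}]
    return internal + external

# Step 2: cleanse the data and organize it consistently.
def cleanse(rows):
    cleaned = []
    for row in rows:
        if row["amount"] is None:   # drop records that cannot be used
            continue
        cleaned.append({"region": row["region"].strip().lower(),
                        "amount": float(row["amount"])})
    return cleaned

# Step 3a: load the cleansed data into the enterprise data warehouse.
def load(edw, rows):
    edw.extend(rows)
    return edw

# Step 4a: create a data mart as a subset of the EDW.
def build_mart(edw, region):
    return [row for row in edw if row["region"] == region]

# Step 5: perform an analysis as needed.
def total_amount(rows):
    return sum(row["amount"] for row in rows)

edw_rows = load([], cleanse(extract()))
north_mart = build_mart(edw_rows, "north")
print(len(edw_rows), total_amount(north_mart))
```

The 3b/4b path would simply reverse the middle: load per-department marts first, then concatenate them into the EDW.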

The major components of a data warehousing process:
• Data sources. Data are sourced from operational systems and possibly from external data sources.
• Data extraction. Data are extracted using custom-written or commercial software called ETL.
• Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse.
• Data warehouse/Comprehensive database. This is the EDW that supports decision analysis by providing relevant summarized and detailed information.
• Middleware tools. Middleware tools enable access to the data warehouse from a variety of front-end applications.

Data Warehousing Architectures
• There are several basic architectures for data warehousing
• To distinguish the architectures, a data warehouse is divided into three parts:
  – The data warehouse itself
  – Data acquisition (back-end) software, which extracts data from legacy systems and external sources, consolidates it, and loads it into the data warehouse
  – Client (front-end) software, which allows users to access and analyze data from the warehouse

Factors that potentially affect the architecture selection decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Constraints on resources, funding
5. Strategic view of the data warehouse prior to implementation
6. Compatibility with existing systems
7. Perceived ability of the in-house IT staff
8. Technical issues, technology
9. Social/political factors/nature of users

Now ask if ..
You are able to:
• Understand the basic definitions and concepts of data warehouses
• Understand how a data warehouse differs from a database
• Describe the characteristics of data warehouse
• Describe data warehouse process overview
