28
DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence & Data Warehousing

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

Embed Size (px)

Citation preview

Page 1: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-1

David M. Kroenke

Database Processing Chapter 15

Business Intelligence &

Data Warehousing

Page 2: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-2

Business Intelligence (BI) Systems

• Business Intelligence (BI) systems are information systems that assist managers and other professionals:– To analyze current and past activities, and– To predict future events.

• Two broad categories:– Reporting– Data mining

Page 3: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-3

The Relationship ofOperational and BI Applications

Page 4: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-4

Reporting Applications

• Reporting system applications:– Filter, Sort, Group, Simple Calculations using

SQL– Classify entities (customers, products,

employees, etc.)• RFM Analysis

– Deal with critical report delivery

Page 5: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-5

Data Mining Applications

• Data mining applications are used to:– Perform what-if analysis– Make predictions– Facilitate decision making

• Data mining applications use sophisticated statistical and mathematical techniques.

• Report delivery is not as critical.

Page 6: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-6

Need for Data Warehousing• Integrated, company-wide view of high-quality

information (from disparate databases)• Separation of operational and informational systems

and data (for improved performance)

Comparison of Operational and Informational Systems

Page 7: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-7

Definitions• Data Warehouse:

– A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes

– Subject-oriented: e.g. customers, patients, students, products– Integrated: Consistent naming conventions, formats, encoding

structures; from multiple data sources– Time-variant: Can study trends and changes– Non-updatable: Read-only, periodically refreshed

• Data Mart:– A data warehouse that is limited in scope

Page 8: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-8

Data Warehouse vs. Data Mart

Page 9: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-9

Components of a Data Warehouse

Page 10: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-10

Data Warehouse and Data Marts

Page 11: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-11

Data Warehouses and Data Marts:Problems of Using Transaction Data for BI

Page 12: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-12

The ETL Process

• Extract/Capture– Static vs. Incremental

• Transform– Scrub or data cleansing– Data selection, joining, aggregation

• Load and Index– Refresh vs. Update

Page 13: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-13

Data Warehouse ETL Sequence

Page 14: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-14

Components of a Star Schema

Page 15: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-15

Star Schema Example

Page 16: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-16

Issues Regarding Star Schema

• Dimension table keys should be surrogate:– Keys may change over time– Length/format consistency

• Granularity of Fact Table – what level of detail? – Transactional grain – finest level– Aggregated grain – more summarized– Finer grain: better market basket analysis capability, but much more

data (more dimension tables, more rows in fact table)

• Duration of the database – how much history should be kept?– Natural duration – 13 months or 5 quarters– Financial institutions may need longer duration– Older data is more difficult to source and cleanse

Page 17: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-17

Data Warehousing at Wal-Mart

• As of 2000 (Foote and Krishnamurthi, 2001)

– Held 7.5 TB, with plans to reach 24 TB (1TB = 250M pages of text)

– Kept 65 weeks of data– Had invested $4 Billion– Power users generated $12,000/query

• As of 2005 (Wall Street Journal, December 3-4, 2005)

– Held 570 TB (more than Internet’s fixed pages)– Predicted Hurricane Ivan would spur demand for easy

breakfasts• Stocked Florida stores with Pop-Tarts

Page 18: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-18

Reporting Systems:RFM Analysis

• RFM Analysis analyzes and ranks customers according to purchasing patterns:– R = Recent (most recent order)– F = Frequent (how often an order is made)– M = Money (dollar amount of orders)

• Customers are sorted into five groups, each containing 20% of the customers.

• Each group is given a numerical value:– 1 = Top 20%– 2, 3, 4 = Each 20% in between top and bottom 20%– 5 = Bottom 20%

Page 19: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-19

Reporting Systems:RFM Analysis (Continued)

Ajax ordered recently (1), orders often (1) but does not order the most expensive items (3) – Try to sell Ajax more expensive goods!

Bloominghams has not ordered recently (5), but has ordered often (1) and purchased the most expensive items (1).This customer may be looking for a different vendor – better call!

Page 20: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-20

Reporting Systems:OnLine Analytical Processing [OLAP]

• An OLAP report has measures and dimensions:– Measure — A data item of interest.– Dimension — A characteristic of a measure.

• OLAP cube — A presentation of a measure with associated dimensions.– An OLAP cube can have any number of axes.– The terms OLAP cube and OLAP report are

synonymous.

• OLAP allows drill-down — a further division of the data into more detail.

Page 21: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-21

Reporting Systems:OLAP Drill Down:

Product Family by Store Type

Page 22: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-22

Reporting Systems:OLAP Drill Down:

Product Family and Store Location by Store Type

Page 23: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-23

Reporting Systems:OLAP Drill Down:

Store Location and Product Family by Store Type

Page 24: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-24

Data Mining Applications

• Data mining applications use sophisticated statistical and mathematical techniques to find patterns and relationships that can be used to classify and predict.– Unsupervised data mining — Statistical techniques

are used to identify groups of entities with similar characteristics.

• Cluster Analysis– Supervised data mining:

• A model is developed.• Statistical techniques are used to estimate parameter values

of the model.– Regression analysis

Page 25: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-25

Data Mining Applications:The Convergence of the Disciplines

Page 26: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-26

Data Mining Applications:Three Popular Data Mining Techniques

• Decision tree analysis — Classifies entities into groups based on past history.

• Logistic regression — Produces equations that offer probabilities that certain events will occur.

• Neural Networks — Complex statistical prediction techniques

Page 27: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-27

Data Mining Applications:Market Basket Analysis

• Market Basket Analysis — Determines patterns of associated buying behavior.– Support — The probability that two items will be

purchased together.– Confidence — The probability that an item will be

purchased given the fact that the customer has already purchased another particular item.

– Lift — the ratio of confidence to the basic probability that a particular item will be purchased.

Page 28: DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall 15-1 David M. Kroenke Database Processing Chapter 15 Business Intelligence

DAVID M. KROENKE’S DATABASE PROCESSING, 10th Edition © 2006 Pearson Prentice Hall

15-28

Data Mining Applications:Market Basket Analysis Example