sheena-nash
The Data Warehouse
Chapter 6
6.1 Operational Databases
= transactional database
Designed to process individual transactions quickly and efficiently
On-Line Transactional Processing (OLTP), in contrast to the Data Warehouse
Building a database: Data Modeling, Normalization
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships
ERD (Entity Relationship Diagram)
Figure 6.1 A simple entity-relationship diagram
(Entities: Vehicle-Type with attributes Type ID, Make, Year; Customer with attributes Customer ID, Income Range)
Normalization
• First Normal Form (atomic values)
• Second Normal Form (no partial dependency): R (A, B, C, D, E)
• Third Normal Form (no transitive dependency): R (A, B, C, D, E)
The Relational Model
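Order Form (Order No., Order Date, Customer No., Customer Name, Address, Product No., Product Name, Quantity, Unit Price)
Order Form
Order No.:      Order Date:
Customer No.:   Customer Name:   Address:
Product No.   Product Name   Quantity   Unit Price   Amount
1111          MP3            2          60,000       120,000
2115          Blank CD       3          10,000       30,000
Total: 150,000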
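The flat order form above can be split so that every non-key attribute depends on the whole key and on nothing but the key. The sketch below is one possible decomposition, not the slide author's own: the table names, the order number, date, and customer details are made-up placeholders (the form leaves them blank), and it assumes the unit price depends only on the product number.

```python
# Flat rows in the shape of the order form. Product numbers, names, quantities and
# unit prices come from the form; order and customer values are hypothetical.
flat_rows = [
    # order_no, order_date, cust_no, cust_name, address, prod_no, prod_name, qty, unit_price
    (1, "2002-01-15", 7, "J. Doe", "425 Church St", 1111, "MP3", 2, 60000),
    (1, "2002-01-15", 7, "J. Doe", "425 Church St", 2115, "Blank CD", 3, 10000),
]

def normalize(rows):
    """Split flat order rows into four tables keyed by their determinants."""
    orders, customers, products, order_lines = {}, {}, {}, {}
    for (order_no, order_date, cust_no, cust_name, address,
         prod_no, prod_name, qty, unit_price) in rows:
        customers[cust_no] = (cust_name, address)    # depends only on customer number
        orders[order_no] = (order_date, cust_no)     # depends only on order number
        products[prod_no] = (prod_name, unit_price)  # depends only on product number
        order_lines[(order_no, prod_no)] = qty       # depends on the full composite key
    return orders, customers, products, order_lines

orders, customers, products, order_lines = normalize(flat_rows)

# The derived "Amount" and "Total" values are computed, not stored.
total = sum(qty * products[prod_no][1] for (_, prod_no), qty in order_lines.items())
print(total)  # 150000
```

Customer and product details are now stored once each, so changing a product name touches a single row instead of every order line that mentions it.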
Table 6.1a • Relational Table for Vehicle-Type
Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997
Table 6.1b • Relational Table for Customer
Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390
Table 6.2 • Join of Tables 6.1a and 6.1b
Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997
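The join in Table 6.2 can be reproduced with an in-memory SQLite database. This is a minimal sketch: the table and column names (vehicle_type, customer, type_id, and so on) are illustrative choices, and the income ranges are written with a plain hyphen.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle_type (type_id TEXT PRIMARY KEY, make TEXT, year INTEGER);
CREATE TABLE customer (customer_id TEXT PRIMARY KEY, income_range TEXT,
                       type_id TEXT REFERENCES vehicle_type(type_id));
""")

# Rows of Table 6.1a
conn.executemany("INSERT INTO vehicle_type VALUES (?, ?, ?)", [
    ("4371", "Chevrolet", 1995), ("6940", "Cadillac", 2000),
    ("4595", "Chevrolet", 2001), ("2390", "Cadillac", 1997),
])
# Rows of Table 6.1b
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("0001", "70-90K", "2390"), ("0002", "30-50K", "4371"),
    ("0003", "70-90K", "6940"), ("0004", "30-50K", "4595"),
    ("0005", "70-90K", "2390"),
])

# Join on the shared Type ID column to obtain the rows of Table 6.2
joined = conn.execute("""
    SELECT c.customer_id, c.income_range, c.type_id, v.make, v.year
    FROM customer AS c
    JOIN vehicle_type AS v ON c.type_id = v.type_id
    ORDER BY c.customer_id
""").fetchall()

for row in joined:
    print(row)
```

Type ID acts as a foreign key in the customer table; customers 0001 and 0005 both point at the same vehicle row, which is the one-to-many relationship the ERD describes.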
6.2 Data Warehouse Design
OLTP                          Data Warehouse
Process Oriented              Subject Oriented
Normalized                    Denormalized
Day-to-day operation          Historical
Constant Update               Not subject to change (read only)
Lowest level of granularity   Design issue
Figure 6.2 A data warehouse process model
(Components: External Data, Operational Database(s), Independent Data Mart, ETL Routine (Extract/Transform/Load), Data Warehouse, Dependent Data Mart, Extract/Summarize Data, Decision Support System, Report)
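The ETL routine in the process model can be pictured as three small steps. The sketch below is only an illustration under stated assumptions: the record layout, the function names, and the per-customer monthly summary are hypothetical and are not taken from the figure.

```python
from collections import defaultdict

# Hypothetical operational records: (customer_id, purchase_date, amount_in_cents)
operational_rows = [
    ("0001", "2002-01-15", 1450),
    ("0001", "2002-01-15", 2240),
    ("0002", "2002-01-13", 825),
]

def extract(rows):
    """Extract: read the rows out of the operational source."""
    return list(rows)

def transform(rows):
    """Transform: summarize to one total per customer per month."""
    totals = defaultdict(int)
    for customer_id, purchase_date, cents in rows:
        month = purchase_date[:7]  # keep "YYYY-MM"
        totals[(customer_id, month)] += cents
    return dict(totals)

def load(warehouse, summary):
    """Load: add the summarized rows to the warehouse store."""
    warehouse.update(summary)
    return warehouse

warehouse = load({}, transform(extract(operational_rows)))
print(warehouse)
```

The transform step is where the warehouse's level of granularity is decided; here it is monthly totals rather than individual transactions.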
Structuring the Data Warehouse:
• Fact Table (dimension keys + facts)
• Dimension Tables (not normalized; slowly changing dimensions)
(1) Multidimensional Database
(2) Relational Database in Multidimensional Format: Star Schema
Figure 6.3 A star schema for credit card purchases
Fact Table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount
  (sample rows, Cardholder Key / Purchase Key / Amount: 1 / 2 / 14.50; 15 / 4 / 8.25; 1 / 2 / 22.40)
Cardholder Dimension: Cardholder Key, Name, Gender, Income Range
  (1, John Doe, Male, 50 - 70,000; 2, Sara Smith, Female, 70 - 90,000)
Purchase Dimension: Purchase Key, Category
  (1 Supermarket; 2 Travel & Entertainment; 3 Auto & Vehicle; 4 Retail; 5 Restaurant; 6 Miscellaneous)
Location Dimension: Location Key, Street, City, State, Region
  (10, 425 Church St, Charleston, SC, 3)
Time Dimension: Time Key, Month, Day, Quarter, Year
  (Time Key 10: Jan 2002)
The Multidimensionality of the Star Schema
Figure 6.4 Dimensions of the fact table shown in Figure 6.3
(Axes: Purchase Key, Location Key, Time Key; the cell A(Ci, 1, 2, 10) holds the fact for cardholder Ci)
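A star schema is queried by joining the fact table to whichever dimension tables the question needs. The sketch below totals purchase amounts by category with SQLite. The table and column names are illustrative, amounts are stored in cents to avoid floating-point rounding, and the location and time key values in the fact rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE purchase_fact (cardholder_key INTEGER, purchase_key INTEGER,
                            location_key INTEGER, time_key INTEGER,
                            amount_cents INTEGER);
""")

conn.executemany("INSERT INTO purchase_dim VALUES (?, ?)", [
    (1, "Supermarket"), (2, "Travel & Entertainment"), (3, "Auto & Vehicle"),
    (4, "Retail"), (5, "Restaurant"), (6, "Miscellaneous"),
])

# Each fact row = one key per dimension + the measured fact (amount).
# location_key and time_key values here are assumed for illustration.
conn.executemany("INSERT INTO purchase_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 2, 1, 10, 1450),
    (15, 4, 5, 11, 825),
    (1, 2, 3, 10, 2240),
])

totals = conn.execute("""
    SELECT d.category, SUM(f.amount_cents)
    FROM purchase_fact AS f
    JOIN purchase_dim AS d ON f.purchase_key = d.purchase_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
print(totals)
```

Each fact row corresponds to one cell of the multidimensional view, addressed by its combination of dimension keys.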
Additional Relational Schemas
• Snowflake Schema: dimension tables are further subdivided
• Constellation Schema: multiple fact tables share dimension tables
Figure 6.5 A constellation schema for credit card purchases and promotions
Purchase Fact Table: Cardholder Key, Purchase Key, Location Key, Time Key, Amount
  (sample rows, Cardholder Key / Purchase Key / Amount: 1 / 2 / 14.50; 15 / 4 / 8.25; 1 / 2 / 22.40)
Promotion Fact Table: Cardholder Key, Promotion Key, Time Key, Response
  (1, 1, 5, Yes; 2, 1, 5, No)
Cardholder Dimension: Cardholder Key, Name, Gender, Income Range
  (1, John Doe, Male, 50 - 70,000; 2, Sara Smith, Female, 70 - 90,000)
Time Dimension: Time Key, Month, Day, Quarter, Year
  (Time Key 5: Dec 2001; Time Key 8: Jan 2002; Time Key 10: Jan 2002)
Purchase Dimension: Purchase Key, Category
  (1 Supermarket; 2 Travel & Entertainment; 3 Auto & Vehicle; 4 Retail; 5 Restaurant; 6 Miscellaneous)
Location Dimension: Location Key, Street, City, State, Region
  (5, 425 Church St, Charleston, SC, 3)
Promotion Dimension: Promotion Key, Description, Cost
  (1, watch promo, 15.25)
Decision Support: Analyzing the Warehouse Data
• Reporting Data
• Analyzing Data (multidimensional data analysis tool)
• Knowledge Discovery (through data mining)
6.3 On-line Analytical Processing (OLAP)
- Query-based methodology
- Supports data analysis in a multidimensional environment
- Storage methods
(1) Relational data store (star schema)
(2) Multidimensional array data store
OLAP Operations
• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – A higher level of generalization
• Drill-down – A greater level of detail
• Rotation – View data from a new perspective
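Three of these operations (slice, dice, rotation) can be sketched on a tiny cube held in a Python dictionary. This is an illustration only: the function names are hypothetical, and apart from the Dec./Vehicle/Region Two amount of 6,720 the cell values are made up.

```python
# The cube maps (month, category, region) -> purchase amount.
cube = {
    ("Dec", "Vehicle", "Two"): 6720,
    ("Dec", "Vehicle", "One"): 1200,
    ("Dec", "Retail", "Two"): 300,
    ("Nov", "Vehicle", "Two"): 500,
}

def slice_cube(cube, month):
    """Slice: fix one dimension (month) and keep the remaining two."""
    return {(cat, reg): v for (m, cat, reg), v in cube.items() if m == month}

def dice_cube(cube, months, categories, regions):
    """Dice: restrict several dimensions at once to obtain a sub-cube."""
    return {k: v for k, v in cube.items()
            if k[0] in months and k[1] in categories and k[2] in regions}

def rotate(cube):
    """Rotation: reorder the axes to (region, category, month)."""
    return {(reg, cat, m): v for (m, cat, reg), v in cube.items()}

december = slice_cube(cube, "Dec")
vehicle_region_two = dice_cube(cube, {"Nov", "Dec"}, {"Vehicle"}, {"Two"})
rotated = rotate(cube)

print(december[("Vehicle", "Two")])      # 6720
print(sum(vehicle_region_two.values()))  # 7220
```

Roll-up and drill-down change the level of detail along one dimension rather than selecting or reordering cells, so they need a concept hierarchy in addition to the cube itself.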
Figure 6.6 A multidimensional cube for credit card purchases
(Dimensions: Month (Jan. to Dec.); Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle); Region (One to Four). Highlighted cell: Month = Dec., Region = Two, Category = Vehicle, Count = 110, Amount = 6,720)
Concept Hierarchy
A mapping that allows attributes to be viewed from varying levels of detail.
Example (location): Street Address → City → State → Region (most specific to most general)
Figure 6.8 Rolling up from months to quarters
(Dimensions: Time (Q1 to Q4); Category (Supermarket, Miscellaneous, Restaurant, Travel, Retail, Vehicle); Region. Highlighted cell: Month = Oct./Nov./Dec., Region = One, Category = Supermarket)
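Rolling up from months to quarters amounts to applying a concept hierarchy (month to quarter) and summing the cells that land in the same group. The sketch below assumes made-up monthly amounts and a hypothetical roll_up helper.

```python
from collections import defaultdict

# Concept hierarchy for the time dimension: month -> quarter
MONTH_TO_QUARTER = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
    "Jul": "Q3", "Aug": "Q3", "Sep": "Q3",
    "Oct": "Q4", "Nov": "Q4", "Dec": "Q4",
}

# Monthly cube cells: (month, category, region) -> amount (illustrative values)
monthly = {
    ("Oct", "Supermarket", "One"): 100,
    ("Nov", "Supermarket", "One"): 150,
    ("Dec", "Supermarket", "One"): 250,
    ("Jan", "Supermarket", "One"): 80,
}

def roll_up(monthly_cells):
    """Aggregate month-level cells to quarter-level cells."""
    quarterly = defaultdict(int)
    for (month, category, region), amount in monthly_cells.items():
        quarterly[(MONTH_TO_QUARTER[month], category, region)] += amount
    return dict(quarterly)

quarterly = roll_up(monthly)
print(quarterly[("Q4", "Supermarket", "One")])  # 500
```

Drill-down is the reverse direction: it is only possible if the finer-grained monthly cells are still stored, since the quarterly totals alone cannot be split back into months.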
6.4 Excel Pivot Tables for Multidimensional Data Analysis
Figure 6.15 A credit card promotion cube
(Dimensions: Watch Promo, Life Insurance Promo, Magazine Promo, each with values Yes/No. Highlighted cell: Watch Promo = No, Magazine Promo = Yes, Life Insurance Promo = Yes)
Figure 6.16 A pivot table with page variables for credit card promotions
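What a pivot table computes over the promotion cube can be mimicked with a counter: choose a row field and a column field, optionally filter by a page variable, and count the matching records. The response records and the pivot helper below are hypothetical and stand in for the spreadsheet data.

```python
from collections import Counter

# Each record: (watch_promo, magazine_promo, life_insurance_promo); made-up data
responses = [
    ("No", "Yes", "Yes"),
    ("No", "Yes", "Yes"),
    ("Yes", "No", "No"),
    ("Yes", "Yes", "No"),
    ("No", "No", "No"),
]

# The full cube: one count per combination of the three Yes/No attributes
cube = Counter(responses)

def pivot(records, row_idx, col_idx, page=None):
    """Count records by two attributes; `page` optionally filters the records first."""
    selected = [r for r in records if page is None or page(r)]
    return Counter((r[row_idx], r[col_idx]) for r in selected)

# Rows = watch promo, columns = magazine promo, all records
watch_by_magazine = pivot(responses, 0, 1)

# Same layout, with life insurance promo = Yes as the page variable
watch_by_magazine_life_yes = pivot(responses, 0, 1, page=lambda r: r[2] == "Yes")

print(cube[("No", "Yes", "Yes")])             # 2
print(watch_by_magazine[("Yes", "No")])       # 1
print(watch_by_magazine_life_yes)
```

The page variable plays the role of a slice: it fixes one dimension of the cube and leaves a two-dimensional table of counts.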