Copyright © Starsoft Inc, 2000

Data Warehouse Architecture

By

Slavko Stemberger

Some Acronyms/Terms

• OLAP: On-Line Analytical Processing

• ROLAP: Relational OLAP

• OLTP: On-Line Transaction Processing (operational system)

Some Acronyms/Terms (cont…)

• Metadata: Data about data (data dictionary)

• Source System: An operational system that provides data for the data warehouse

• MOLAP: Multidimensional OLAP

Some Acronyms/Terms (cont…)

• Data Warehouse: A queryable source of data

• Data Mart: A logical subset of a data warehouse

• ETL: Extract, Transform and Load

• Data Staging Area: An intermediate storage location used for ETL
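
As a rough illustration of the three ETL steps, the sketch below extracts rows from a pretend source system, transforms them in a staging list, and loads them into an in-memory SQLite table. The `source_rows` data, the `transform` function and the `orders` table are all hypothetical names chosen for this example; they are not part of the original slides.

```python
import sqlite3

# Extract: rows as they might arrive from a source system (hypothetical data).
source_rows = [
    {"order_id": "1001", "customer": " alice ", "amount": "19.90"},
    {"order_id": "1002", "customer": "BOB", "amount": "5.00"},
]

def transform(row):
    """Cleanse one source row: fix types, trim and normalise the name."""
    return (
        int(row["order_id"]),
        row["customer"].strip().title(),
        float(row["amount"]),
    )

# The staging area is just a list here; in practice it is intermediate storage.
staging = [transform(row) for row in source_rows]

# Load: write the cleansed rows into the (in-memory) warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
)
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", staging)

loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```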

Data Structures/Databases

• Hierarchical DB

• Network DB

• Relational DB

• O-O DB

• Dimensional DB

• Flat Files

Modeling Methods

• Dimensional

• Object Oriented (O-O)

• Entity-Relationship (E-R)

Entity-Relationship Modeling

• Instantaneous snapshot of the business

• Removes data redundancy (eliminates update anomalies)

• Shows detailed relationships

• Complex network of entities can be difficult for end-users to understand

• Used for operational systems

Dimensional Modeling

• Data duplication is allowed (in the dimensions)

• Query based

• Easier for users to understand
  – Not as much detail is shown as in E-R

• Used in data warehouses

Dimensional Models

• Star Schema

• Snowflake Schema

• The “Cube”

The “Cube”

• Logical structure of ALL data warehouses

• Can be implemented physically in a relational database (RDB) such as Oracle

• Some view this as limited to data marts
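
One way to picture the cube as a logical structure is a set of cells, each addressed by one member from every dimension. The sketch below is only an illustration under that assumption: the `cube` dictionary, the `AXES` tuple and the `slice_cube` helper are hypothetical names, and the same cells could equally be stored as rows in a relational table.

```python
# Hypothetical cube: each cell is addressed by (month, product, region).
AXES = ("month", "product", "region")

cube = {
    ("2000-01", "Widget", "East"): 100.0,
    ("2000-01", "Gadget", "East"): 50.0,
    ("2000-02", "Widget", "West"): 30.0,
}

def slice_cube(**fixed):
    """Sum the cells whose coordinates match every fixed dimension member."""
    return sum(
        value
        for cell, value in cube.items()
        if all(cell[AXES.index(axis)] == member for axis, member in fixed.items())
    )

widget_total = slice_cube(product="Widget")
jan_east = slice_cube(month="2000-01", region="East")
everything = slice_cube()
```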

Star Schema

• Easy to understand

• Flexible in the types of questions that can be asked

• Supports very large data warehouses

• There is data redundancy (in the dimensions)
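
A minimal sketch of a star schema, assuming a retail example that is not in the original slides: one fact table (`sales_fact`) joined directly to two dimension tables (`product_dim`, `store_dim`). All table, column and function names are hypothetical. Note how `category` is repeated on each product row, which is the redundancy in the dimensions mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,  -- meaningless surrogate key
    product_name TEXT,
    category     TEXT                  -- repeated per product (redundant)
);
CREATE TABLE store_dim (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    region     TEXT
);
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    store_key     INTEGER REFERENCES store_dim(store_key),
    sales_dollars REAL
);
""")
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?)",
    [(1, "Downtown", "East"), (2, "Airport", "West")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 1, 50.0), (3, 2, 20.0), (1, 2, 30.0)],
)

def sales_by(dim_table, key_column, attribute):
    """Total sales grouped by one attribute of one dimension (one join only)."""
    sql = (
        f"SELECT d.{attribute}, SUM(f.sales_dollars) "
        f"FROM sales_fact f JOIN {dim_table} d ON f.{key_column} = d.{key_column} "
        f"GROUP BY d.{attribute} ORDER BY d.{attribute}"
    )
    return conn.execute(sql).fetchall()

by_category = sales_by("product_dim", "product_key", "category")
by_region = sales_by("store_dim", "store_key", "region")
```

The same fact table answers both questions with a single join each, which is one reading of the flexibility claim above.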

Snowflake Schema

• “Normalized” star schema

• More complex than the star schema - harder to understand and work with

• Solves some problems that the star schema cannot
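
To make the "normalized star" idea concrete, the sketch below moves a product's category into its own table, so the category name is stored once. The tables (`category_dim`, `product_dim`, `sales_fact`) and their data are hypothetical. The price of the normalization is visible in the query, which needs one more join than a star schema would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category_dim (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT                 -- stored once, not per product
);
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES category_dim(category_key)
);
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    sales_dollars REAL
);
""")
conn.executemany(
    "INSERT INTO category_dim VALUES (?, ?)", [(1, "Hardware"), (2, "Books")]
)
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?)",
    [(1, "Widget", 1), (2, "Gadget", 1), (3, "Manual", 2)],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)", [(1, 100.0), (2, 50.0), (3, 20.0)]
)

# Two joins are needed to reach the category name from the fact table.
by_category = conn.execute("""
SELECT c.category_name, SUM(f.sales_dollars)
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
JOIN category_dim c ON p.category_key = c.category_key
GROUP BY c.category_name
ORDER BY c.category_name
""").fetchall()
```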

Dimension Tables

• A set of independent variables that affect an observation

• Each variable has a known, relatively small set of values

• 4 - 20 dimensions per data warehouse/data mart is the norm

Dimension Tables (cont…)

• Columns are descriptive and usually textual

• Some numeric values are descriptive
  – A numeric descriptive value should be suspected of being a fact. For example, standard product price may be a fact because it can change, and one can ask “what was the average standard price of the product over the last 12 months?”

Dimension Tables (cont…)

• Time dimension keys may be/should be assigned in the order of the dates in the fact table - this allows physical partitioning

• In general avoid “smart” keys - they should be meaningless

• Avoid production keys

• Dimension keys should be meaningless surrogate keys
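
A minimal sketch of surrogate key assignment, assuming the warehouse simply hands out the next integer the first time it sees a production key. The class name `SurrogateKeyGenerator` and the sample SKU strings are hypothetical. The point is that the warehouse key carries no meaning, while the "smart" production key stays behind as an ordinary attribute.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Map production (source-system) keys to meaningless integer keys."""

    def __init__(self):
        self._next_key = count(1)
        self._by_production_key = {}

    def key_for(self, production_key):
        # Assign a new surrogate key only the first time a production key appears.
        if production_key not in self._by_production_key:
            self._by_production_key[production_key] = next(self._next_key)
        return self._by_production_key[production_key]

product_keys = SurrogateKeyGenerator()
first = product_keys.key_for("SKU-EAST-0042")   # a "smart" production key
second = product_keys.key_for("SKU-WEST-0007")
repeat = product_keys.key_for("SKU-EAST-0042")  # same product, same surrogate
```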

Dimension Tables - Granularity

• Definition: The level of detail of the data

• Keep the grain of the data as small as possible (as detailed as possible)
  – This makes the warehouse more resistant to change
  – It is easier to add attributes to existing dimensions
  – It gives superior results in data mining operations
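
The sketch below illustrates why a fine grain is more resistant to change, using hypothetical transaction-level rows. From these rows both a daily and an hourly summary can be derived; had only daily totals been stored, the hourly question could not be answered later.

```python
from collections import Counter

# Hypothetical transaction-grain facts: (date, hour, units sold).
transactions = [
    ("2000-03-01", 9, 2),
    ("2000-03-01", 14, 3),
    ("2000-03-02", 9, 1),
]

units_per_day = Counter()
units_per_hour = Counter()
for day, hour, units in transactions:
    units_per_day[day] += units    # roll up to the day grain
    units_per_hour[hour] += units  # a different roll-up from the same detail
```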

Dimension Tables - “Types”

• Degenerate

• “Junk”

• Other

• Time

Dimension Tables - Time

• Must be consistent across all fact tables

• Create partial attributes year, month and day and their concatenations (year + month, year + month + day, year + week, …)
  – Without the concatenations, it is difficult to ask for time ranges

• All data marts and warehouses have at least one time dimension
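
A sketch of how such a time dimension might be generated, one row per day, with the partial attributes and their concatenations. The function `build_time_dimension` and the column names are hypothetical, and the use of ISO week numbers for year + week is an assumption. Keys are assigned in date order, and the concatenated `year_month_day` string makes a range filter a simple comparison.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    key = 1  # surrogate keys are assigned in date order
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "time_key": key,
            "year": day.year,
            "month": day.month,
            "day": day.day,
            "year_month": f"{day.year:04d}{day.month:02d}",
            "year_month_day": f"{day.year:04d}{day.month:02d}{day.day:02d}",
            "year_week": f"{iso_year:04d}W{iso_week:02d}",  # ISO week (assumption)
        })
        day += timedelta(days=1)
        key += 1
    return rows

time_dim = build_time_dimension(date(2000, 1, 30), date(2000, 2, 2))

# A time range that crosses a month boundary is one string comparison.
in_range = [
    row["time_key"]
    for row in time_dim
    if "20000131" <= row["year_month_day"] <= "20000201"
]
```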

Dimension Tables - Degenerate

• Dimensions with only one attribute

• Usually a control document id such as order number, invoice number, etc.

• No value in creating a physical table

• Put the id into the fact table
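
A small sketch of a degenerate dimension, assuming an order-line fact table (the table and column names are hypothetical). The order number lives directly in the fact table and there is no matching dimension table, yet it can still be used to group fact rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE order_line_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    order_number TEXT,     -- degenerate dimension: no order_dim table exists
    units        INTEGER
);
""")
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")]
)
conn.executemany(
    "INSERT INTO order_line_fact VALUES (?, ?, ?)",
    [(1, "SO-1001", 2), (2, "SO-1001", 1), (1, "SO-1002", 5)],
)

# Group by the id stored in the fact table; no join is required.
lines_per_order = conn.execute("""
SELECT order_number, COUNT(*), SUM(units)
FROM order_line_fact
GROUP BY order_number
ORDER BY order_number
""").fetchall()
```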

Dimension Tables - “Junk”

• Given: Leftover flags and text attributes

• Possible actions:
  – Put these flags into the fact table
  – Make each one into a dimension
  – Drop them from the design
  – Create one dimension with all combinations of these flags
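
The last option, one dimension holding every combination of the leftover flags, can be sketched as a Cartesian product. The flag names and values below are hypothetical. Three two-valued flags give eight rows, and each fact row then carries a single `junk_key` instead of three separate flags.

```python
from itertools import product

# Hypothetical leftover flags and their possible values.
flags = {
    "is_gift": ("N", "Y"),
    "payment_type": ("cash", "card"),
    "rush_order": ("N", "Y"),
}

# One junk-dimension row per combination of flag values.
junk_dim = [
    {"junk_key": key, **dict(zip(flags, combo))}
    for key, combo in enumerate(product(*flags.values()), start=1)
]

# Lookup used at load time: flag values on a source row -> junk_key.
lookup = {
    tuple(row[name] for name in flags): row["junk_key"] for row in junk_dim
}
```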

Fact Tables

• Dimension keys

• Degenerate dimension keys (if they exist)

• Facts
  – Additive
  – Semi-additive
  – Non-additive
  – None (factless tables)

Facts - Additive

• Can be added across all combinations of dimensions

• Examples: sales in dollars or units

• These are measures of activity
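
A sketch of what additivity means in practice, using hypothetical sales rows. The helper `total_by` sums `sales_dollars` over whatever dimensions are left out of the grouping, and every choice of grouping gives totals that add up to the same grand total.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, store, product, sales_dollars).
DIMENSIONS = ("month", "store", "product")
sales_fact = [
    ("2000-01", "Downtown", "Widget", 100.0),
    ("2000-01", "Airport", "Widget", 30.0),
    ("2000-02", "Downtown", "Gadget", 50.0),
    ("2000-02", "Airport", "Gadget", 20.0),
]

def total_by(*dims):
    """Sum sales, grouped by the named dimensions and added across the rest."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in sales_fact:
        totals[tuple(row[p] for p in positions)] += row[3]
    return dict(totals)

by_month = total_by("month")
by_store = total_by("store")
grand_total = total_by()  # added across every dimension
```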

Facts - Semi-additive/non-additive

• Some may be added across some dimensions but not others
  – e.g. Bank Balance

• Some may not be added at all
  – e.g. Temperature

• These are measures of intensity
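
The bank balance example can be sketched as follows, with hypothetical month-end snapshots. Adding balances across accounts within one month is meaningful; adding one account's balances across months is not, so the sketch averages over time instead (taking the last value is another common choice).

```python
# Hypothetical month-end balance snapshots: (month, account, balance).
balance_fact = [
    ("2000-01", "A", 100.0),
    ("2000-01", "B", 200.0),
    ("2000-02", "A", 150.0),
    ("2000-02", "B", 250.0),
]

def total_balance(month):
    """Additive across the account dimension, within a single month."""
    return sum(balance for m, _, balance in balance_fact if m == month)

def average_balance(account):
    """Not additive across time: average the snapshots instead of summing."""
    values = [balance for _, a, balance in balance_fact if a == account]
    return sum(values) / len(values)

jan_total = total_balance("2000-01")
avg_a = average_balance("A")
```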

Closing

• Other things to look at
  – Mutating dimensions
  – Hierarchical data (e.g. product structures)
  – Security
  – Data Loading
  – Cleansing
  – etc.