78
1 Data Warehousing Data Warehousing • Denis Manley • Enterprise Systems • FT228/3

1 Data Warehousing Denis Manley Enterprise Systems FT228/3

Embed Size (px)

Citation preview

Page 1: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

1

Data WarehousingData Warehousing

• Denis Manley

• Enterprise Systems

• FT228/3

Page 2: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

2

Data Warehousing• A data warehouse is a repository for all an

organisations data. It is designed to present data in very user-friendly formats to end-users.

• Why is Data Warehousing necessary?– think of a typical company that uses computers for

order processing, invoicing, etc.

– Data is gathered for every transaction that occurs. Different departments probably use different systems, each designed for a specific task.

Page 3: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

3

Data Warehousing (cont’d)– There will be lots of different database Tables, in

different formats, on different machines. There will be tie-ups between data items, but those tie-ups may not be obvious outside the applications that manipulate the tables.

– Also, the amount of data is constantly increasing. Some estimates state that the size of a typical data processing system doubles every two years.

– Both standard, and ad-hoc, reports must be catered for, and they must run in a reasonable time-frame.

Page 4: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

4

Data Warehousing (cont’d)

• Some solutions to the issues listed can be implemented in the applications that manage specific data. – E.G. if you want to ask “How much does this customer

owe?” then the original package is probably the one to use.

– But if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, and one application may not provide all of it.

Page 5: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

5

Data Warehousing (cont’d)• The new alternative is a Data Warehouse:

– a D.W. is a way of organising data from a wide range of formats, possibly from different locations, so it’s all available in a single location.

– Once this stage is complete, the collection of data is usually frequently replicated around multiple locations.

– This means users have a local copy of the data they need to inspect. This improves query run-times, and reduces communications overheads.

Page 6: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

6

Data Warehousing (cont’d)

– The data is not only collected and joined, however. What the data means is described in a way that the end-users can easily understand, e.g. using the name “Customer Account Number” instead of “order>cacno” for a field.

– The data must be quickly accessible, so that users can find data quickly and easily.

Page 7: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

7

Data Warehousing process

Operational data in legacy systemse.g. OLTP apps.

Data fusion: Assembles diverse dataData Cleansing: “fixing” data

Meta Data:shows transformations of data,origins of data, etc.

Data Migration:load data and meta-data into warehouse periodically

Data Warehouse

Decision-support analyst queries

Page 8: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

8

Data Warehousing Architecture

• A data warewhouse is an architectural construct of an information system that provides users with current and historical decision support information that is hard to access or present in traditional operational data stores.

• It comprises a number of components illustrated in figure 1

Page 9: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

9

Data Warehousing Architecture

Figure 1 : A datawarehouse architecture

Page 10: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

10

Data Warehousing Architecture

• Data Warehouse database– This is a cornerstone of the data warehouse

(item 2 in figure 1)– Almost always implemented on a relational

database management system– However very large databases, ad hoc

processing and the need for flexible user views e.g. aggregates and drill downs are calling for different technological approaches:

• Parallel relational database designs requiring the use of symmetric multiprocessors, Massively parallel processors and clusters

Page 11: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

11

Data Warehousing Architecture

• Speeding up traditional RDBMS

• The use of Multidimensional databases (MDDBs) which are tightly coupled to on-line analytical processing

Page 12: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

12

Data Warehousing Architecture

• Sourcing, acquisation, cleanup and transformation of data– Implementing data warehouses involves extracting data

from operational systems including legacy systems and putting it into a suitable format.

– The various tools are illustrated as item 1 in figure 1

– These tools perform all the conversions, summarisations, key changes, structural changes, and condensations needed to transform disparate data into information can be used by decision support tools

Page 13: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

13

Data Warehousing Architecture

– It produces programs and control statements required to move data into the data warehouse form multiple operational systems

– Maintains the metadata– Remove unwanted data– Converts to common data names and

definitions– Calculates summaries – Establish defaults for missing data– Keep track of source data definition changes

Page 14: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

14

Data Warehousing Architecture

• The tools have to deal with:– Database hetergeneity: DBMS can be very

different in data models, data access language etc.

– Data hetergeneity: the difference in the way data is defined and used e.g. synonyms and different attributes for the same entity etc.

Page 15: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

15

Data Warehousing Architecture

• Metadata (item 3 figure1): data about data that describes the datawarehouse– Technical metadata: contains information for

warehouse designers and administrators• Information about data sources

• Transformation descriptions

• Rules used to perform data clean up

• Access authorisation, information delivery history, data acquisation history, data access etc

– Business metadata: information that gives users an understanding of the information stored in the data warehouse.

• Queries, reports images

• Data warehouse operational information e.g. data history and ownership

Page 16: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

16

Meta Data Versioning• In the operational environment, there tends to be a

single instance of data and meta data at any one moment in time.

• In the data warehouse environment, there are multiple definitions of data and meta data over an historically long period of time.

• Therefore, versioning of data warehouse meta data is crucial to the success of the data warehouse vis-a-vis the end users’ ability to access and understand the data in the data warehouse.

Page 17: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

17

Guidelines for Metadata Management

• Develop an Information Directory that integrates technical and business metadata.

• Keep the metadata current and accurate!

• Maintain the time variant history of the metadata!

• Provide meaningful descriptions and definitions (use business definitions, not technical definitions, where possible).

• The end users must be educated about what metadata is, how to access it, how to use it, etc.

Page 18: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

18

Meta Data Answers Questions for Users of the Data Warehouse

• How do I find the data I need?

• What is the original source of the data?

• How was this summarization created?

• What queries are available to access the data?

?? How have business

definitions and terms changed over time?

How do product lines vary across organizations?

What business assumptions have been made?

Page 19: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

19

The Role of Meta Data

in the Data Warehouse Architecture

• Know what data you have

and

• You can trust it!

Meta Data enables data to become Meta Data enables data to become information, because with it youinformation, because with it you

Page 20: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

20

Data marts (item 4 figure 1)• A data mart is a data store that is subsidary to a data

warehouse of intergrated data.• The data mart is directed at a partition of data (subject area)

that is created for the use of a dedicated group of users and is sometimes termed a “subject warehouse”

• The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or more often placed on a separate physical store.

• Data marts can be “dependent data marts” when the data is sourced from the data warehouse.

• Independent data marts represent fragmented solutions to a range of business problems in the enterprise, however, such a concept should not be deployed as it doesn’t have the “data intergration” concept that’s associated with data warehouses.

Page 21: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

21

Independent Data Marts

Systems of Record

Extract,Transform,

Clean, Integrate,Summarize,

etc....

Data Mart

Page 22: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

22

Independent Data Marts

Three Times!Extract,

Transform,Clean, Integrate,

Summarize, etc....

Page 23: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

23

Independent Data Marts

$Significant and expensive

duplication of effort and data.

Page 24: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

24

Independent Data Marts

$Maintenance of proliferating unarchitected

marts expensive and cumbersome.

$

Page 25: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

25

Unarchitected Data Marts

?There may be metadata for some marts, but what about consistency & history?

md

md

Page 26: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

26

md

md

Contrast Architected Data Warehouse

Data WarehouseDept’l

Dept’l

Dept’l

Atomic

Systems of Record

Data Access

Dependent (Architected) Departmental “Marts” with the Appropriate Subset of

Metadata

Information Directory

md

Page 27: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

27

Independent Data Marts vs. The Real Thing

Architected to meet organizational as well as departmental requirements

Data and results consistent

Redundancy is managed

Detailed history available for drill-down

Metadata is consistent!

Unarchitected Data Marts

Easy to do, but...

? Are the extracts, transformations, integrations & loads consistent?

? Is the redundancy managed?

? What is the impact on the sources?

Data Warehouse

Atomic Dept’l

md

Page 28: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

28

Independent Data marts• However, such marts are not necessarly all

bad.

• Often a valid solution to a pressing business problem:– Extremely urgent user requirements– The absence of a budget for a full data

warehouse– The decentralisation of business units

Page 29: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

29

Data Warehousing Architecture

• Access Tools (item 5 figure 1)– The principal purpose of the data warehouse is

to provide information for strategic decision making.

– The main tools used to achieve this objective are:

• Data query and reporting tools

• Executive information system tools

• On-line analytical processing tools

• Data mining tools

Page 30: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

30

Data Warehousing Architecture

• Query and reporting tools– Reporting tools

• Production reporting tools e.g. generate operational reports

• Report writers: inexpensive desktop tools

– Managed Query tools• Shield users from the complexities of SQL and

database structures by inserting a metalayer between the users and the database

Page 31: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

31

A Few Definitions

• OLAP– On-Line Analytical Processing– A set of functionality that attempts to facilitate

multidimensional analysis

• Multidimensional Analysis– The ability to manipulate information by a

variety of relevant categories or “dimensions” to facilitate analysis and an understanding of that data

– Has also been called as “drill-down”, “drill-across” and “slice and dice”

Page 32: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

32

A Few Definitions

• Hypercube– A means of visually representing

multidimensional data

• Star Schema– A means of aggregating data based on a

set of known dimensions, attempting to store data multidimensionally in a two dimensional RDBMS

• Snowflake Schema– An extension of the star schema by means

of applying additional dimensions to the dimensions of a star schema in a relational environment

Page 33: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

33

A Few Definitions

• Multidimensional Database–Also known as MDDB or MDDBS–A class of proprietary, non-relational database

management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems

• OLAP Tools–A set of software products that attempt to facilitate

multidimensional analysis–Can incorporate data acquisition, data access, data

manipulation, or any combination thereof

Page 34: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

34

A Few Definitions

• ROLAP– Relational OLAP– Using an RDBMS to implement an OLAP

environment– Typically involves a star schema to provide the

multidimensional capabilities– OLAP tool manipulates RDBMS star schema

data

Page 35: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

35

A Few Definitions

• MOLAP (1)–Multidimensional OLAP

–Using an MDDBS to store and access data

–MDDBS “directly” manages data multidimensionally

–Usually requires proprietary (non-SQL) access tools

• MOLAP (2)–OLAP tool facilitates multidimensional capabilities without

the need for a star schema

–Often utilizes a 3-Tier environment, where middle tier server preprocesses data from an RDBMS

–Some OLAP tools access an RDBMS directly and build “cubes” as a “fat” client

Page 36: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

36

How Can OLAP Be Accomplished?

• Use the Data Warehouse as the architected foundation for the organization’s informational processing requirements

• Use the appropriate design techniques to ensure that the data required is at the appropriate degree of granularity at the atomic level, and the appropriate degree of summarization at the departmental level

• Use the appropriate tools to either access relational data in a multidimensional manner, or manage multidimensional data

Page 37: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

37

Data mining• A CSF for any business is its ability to effectively

use information.• This strategic use can occur from discovering

hidden, undected and frequently valuable information about customers suppliers retailers etc.

• This information can be used to formulate effective business, marketing and sales strategies.

• A relatively new technology that can be used to achieve this strategic advantage is “data mining”“data mining”

Page 38: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

38

Data visualisation• Data visualisation is a method of presenting the

output of the previously mentioned methods in such a way that the problem or solution is clearly visible to domain experts and even casual observers.

• It goes way beyond simple bar and pie charts• It is a collection of complex techniques that focus

on determining the best way to display complex patterns on a 2-D computer monitor.

• Such techniques involve experimenting with various colours, shapes 3-d images, sound and vittual reality to help users really see and feel the problem and solution.

Page 39: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

39

DW administration and management(Item 6 figure 1)

• Managing data warehouses involves:– Security and priority management– Monitoring updates from multiple sources– Data quality checks– Managing and updataing metadata– Replicating, subsetting and distributing data– Backup and recovery– Data warehouse storage management e.g.

capacity planning, hierarchical storage management, purging of aged data.

Page 40: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

40

Information delivery system(Item 7 figure 1)

• The IDS is used to enable the process of subscribing for data warehouse information and having it delivered to one or more destinations of choice according to some user-specified scheduling algorithm

• IDS may be based on time of day or on completion of an external event.

• IDS can be achieved by Client/Server architecture and now by Interner/intranet and World Wide Web.

Page 41: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

41

Top 10“Don’t’s”

ofData Warehousing

Page 42: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

42

10. Pre-selecting Your Technical Environment

This is a very common trap in which many organizations find themselves. It is traditional to select the hardware, software and other physical, technical components of a system as one of the earliest activities.

However, a data warehouse is an unstable environment from a sizing perspective. How do you know the hardware/RDBMS/end user tool is appropriate for your data warehouse before conducting even the first round of analysis?

If at all possible, wait to select your technical environment until after you have analyzed the business requirements for information, data, and potential systems of record.

Page 43: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

43

9. Allowing Access Tool to Determine Data Architecture

This is an extension of #10, but is important enough to list by itself.If you select an end user tool before developing your data architecture, it is very likely that that architecture will suffer at the hand of design requirements delivered by the tool.If you have to sacrifice design requirements in order to meet functional requirements of a tool, it is probably time to put that tool aside and select another one.

Page 44: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

44

8. Unarchitected Data MartsOK. Data marts are good; they are an essential part of the data warehouse architecture. But to build only a data mart and to ignore the rest of the data warehouse (specifically the atomic level data and centralized meta data) will lead you down a path that will be more expensive and deliver less quality of data than the alternative.The alternative is to architect and build the data warehouse incrementally, iteratively. Include data marts as departmental instances of the architecture, and populate them from the atomic level data. This will ensure accuracy across the architecture, and reduce costs by eliminating unnecessary population of stand-alone data marts.

Page 45: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

45

7. Boiling the OceanIt is more efficient to implement the data warehouse in small, achievable and palatable chunks than to try to implement it all at once. When I say “boil the ocean”, I mean trying to do too many things for too many people all at the same time.There is an old adage: “You can have everything; where would you put it all?” The same holds true for a data warehouse. If you try to design, develop and implement a data warehouse that is all-encompassing as your first iteration, how will the users be able to use all that you delivered? And in the mean time, while you’ve been trying to meet all of their needs, you have failed to meet any needs. And users won’t forget that for a long time.

Page 46: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

46

6. “If you build it they will come”

If you design, develop and implement an operational system, such as an order processing system, that new system is typically going to replace an existing system. In other words, the old system goes away and the users have no choice but to use the new one. Not so with the d/w.“If you build it…” implies an analysis that includes only bottom-up activities. It is crucial to the success of a data warehouse that a top-down analysis of user requirements for information be conducted.After that, users must be tutored, mentored and otherwise have their hands held as part of the implementation of the data warehouse. Existence does not guarantee utilization and, therefore, value.

Page 47: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

47

5. Putting ROI before RFI (Requirements for Information)

It is very difficult to quantify the intangible benefits that a data warehouse can provide to an organization. How can you put a price on increased customer loyalty. Somewhere, sometime, someone has probably made this calculation. In most cases, however, the determination of how beneficial the data warehouse will be is based on criteria that was developed for operational systems. Just as you cannot use operational data to meet your strategic informational requirements, it is difficult to calculate the return on investment (ROI) of a data warehouse.

In terms of benefits to the organization, it is more appropriate to concentrate on how well the data warehouse addresses the target users’ requirements for information.

Page 48: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

48

4. No Committed User Involvement

Write this down:The success of any data warehouse is directly proportional to the amount of end user participation!A data warehouse cannot be successful without active participate on the part of the target users. Period. If you do not have user participation, you will find yourself in a situation where you will build it and hope that they will come.If there is no serious user participation in a data warehouse project, you have to seriously question whether or not the organization truly needs a data warehouse.

Page 49: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

49

3. No Dedicated DBAIn many situations the lack of a dedicated database administrator (DBA) has prevented a data warehouse project to be complete 1) on time, or 2) successfully.

“Borrowing” a DBA from the operational “pool” will only result in questions about the nature of the data warehouse data models and database design. It’s too flat, not normalized properly, too much redundancy, and other criticisms are well suited for an operational system’s database design, but not a data warehouse.

Considering that “data” is the first word in “data warehouse”, be sure you have a dedicated database administration resource committed to this important project.

Page 50: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

50

2. No Meta DataMeta data is like documentation and training: Everyone knows it is necessary, but it usually gets dropped somewhere along the route to implementation.For the data warehouse, meta data is more important than just your typical documentation. Remember, in order to turn data into information you have to have the data, know that you have it, be able to access it, and trust it. Meta data is the means by which the users will be able to understand and trust the data. A time-variant record of where data came from, what happened to it along the way, where it is in the data warehouse, and what vehicles exist to access it will spell the difference between success and frustration.

Page 51: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

51

1. Analysis ParalysisThe inability to proceed past a sticking question. Wanting to “boil the ocean” and model/design everything before proceeding with development. Having to resolve political issues surrounding a “standard” or “common” definition. All of these things (and more!) will result in analysis paralysis.

The 80/20 rule is very applicable to the development of a data warehouse. Execute 20% effort to get 80% of the total outcome, then move on to the next set of challenges and opportunities. Many data warehouse failures started when the development team stopped.

Get your hands around an idea, understand what the users’ requirements for information are, and build something that produces something that can be evaluated. Don’t just stand there…do something!

Page 52: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

52

Parallel Data Management• A topic that’s closely linked to Data Warehousing is that

of Parallel Data Management.• The argument goes:

– if your main problem is that your queries run too slowly, use more than one machine at a time to make them run faster (Parallel Processing).

– Oracle uses this strategy in its warehousing products.

• There are two types of parallel processing - Symmetric Multiprocessing (SMP), and Massively Parallel Processing (MPP)

Page 53: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

53

Parallel Data Management (cont’d)

• SMP - means the O.S. runs and schedules tasks on more than one processor without distinction.– in other words, all processors are treated equally in an

effort to get the list of jobs done.

• MPP - more varied in its design, but essentially consists of multiple processors, each running their own program.– the problem with MPP is to harness all these processors

to solve a single problem.

Page 54: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

54

Parallel Data Management (cont’d)

• Regardless of the architecture used, there are still alternatives regarding the use of the parallel processing capabilities.– In normal Transaction processing, each transaction

runs on a separate processor, because transactions are small units of work that run in a reasonable time-span.

– However, the type of analysis carried out in data warehouse applications isn’t like that. Typically you want to run a query that looks at all the data in a set of tables. The problem is splitting that into chunks that can be assigned to the multiple processors.

Page 55: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

55

Parallel Data Management (cont’d)

• There are two possible solutions to this problem: Static and Dynamic Partitioning.– In Static Partitioning you break up the data into a

number of sections. Each section is placed on a different processor with its own data storage and memory. The query is then run on each of the processors, and the results combined at the end to give the entire picture.

– This is like joining a queue in a supermarket. You stay with it until you reach the check-out.

Page 56: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

56

Parallel Data Management (cont’d)

– The main problem with Static Partitioning is that you can’t tell how much processing the various sections need. If most of the relevant data is processed by one processor you could end up waiting almost as long as if you didn’t use parallel processing at all.

– In Dynamic Partitioning the data is stored in one place, and the data server takes care of splitting the query into multiple tasks, which are allocated to processors as they become available.

– This is like the single queue in a bank. As a counter position becomes free the person at the head of the queue takes that position

Page 57: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

57

Parallel Data Management (cont’d)

• With Dynamic Partitioning the performance improvement can be dramatic, but the partitioning is out of the users hands.

Page 58: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

58

2 Tiered Architecture

Shared Global Data Storage Shared Application Logic

End User Functionality Data Display Personal Data Storage Personal Application Logic

Tier 1

Enterprise Server

Tier 2

Client

This slide defines the fundamentals of a two-tiered architecture

Page 59: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

59

2 Tiered Architecture in the DW Environment

Atomic Data AcquisitionOrganizational Level DataSecondary Data AcquisitionDepartmental Level Data

Data Access ProcessingIndividual Level DataData Manipulation

Tier 1

Enterprise Server

Tier 2

Client

This slide depicts where the various components of the data warehouse architecture would fall in a two-tiered technical architecture.

Page 60: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

60

2 Tiered Architecture in the DW Environment

Tier 1

Enterprise Server

Tier 2

Client

Dept’l

Dept’l

Atomic

Ind’l

Information Directory

Ind’l

This is a graphical representation of the two-tiered data warehouse architecture. The atomic level, metadata and departmental levels would reside on the Enterprise Server, while any individual level data would be stored on the client. In this scenario the client is a “thin” one, with little processing beyond query and simple data manipulation.

Page 61: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

61

2 Tiered Architecture in the DW Environment

Dept’l

Dept’l

Atomic

Ind’l

Information Directory

Tier 1

Enterprise Server

Tier 2

“Fat” Client

In this variation, a “fat” client at the second tier facilitates a considerable amount of data manipulation, freeing up capacity on the Enterprise Server for other, more global activities.

Page 62: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

62

2 Tiered Architecture in the DW Environment

Tier 1

Enterprise Server

Tier 2

“Thin” Client

Dept’l

Atomic

Ind’l

Information Directory

Ind’l

Multidimensional Preprocessing

In this variation the second tier is a “thin” client, and volumes of data are pre-processed on the Enterprise Server. This enables access to sophisticated analyses by more than one “thin” client.

Page 63: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

63

3 Tiered Architecture

Shared Global Data Storage

Shared Application Logic

Shared Local Data Shared Application

Logic

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

Client

End User Functionality

Personal Data Storage

Personal Application Logic

In a three-tiered technical environment a “middle” tier has been added for shared data and application logic.

Page 64: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

64

3 Tiered Architecture in the DW Environment

Atomic Data Acquisition

Organizational Level Data

Secondary Data Acquisition

Departmental Level Data

Data Access Processing

Individual Data Data Manipulation

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

Client

In the data warehouse, this middle tier can take on the responsibility for departmental level (data mart) processing. There is little or no impact on the third tier, while the first tier is relieved of some mid-level functionality.

Page 65: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

65

3 Tiered Architecture in the DW Environment

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

Client

Dept’l

Dept’l

Atomic

Ind’l

Local m/dInformation Directory

Ind’l

In this graphical representation of the three-tiered technical architecture, the departmental data has been moved to the middle tier. In reality, departmental instances can still exist on the first tier. In cases where the middle tier is geographically distant from the first tier, it is advisable to provide a subset of local metadata for the users of the middle tier’s data.

Page 66: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

66

3 Tiered Architecture in the DW Environment

Dept’l

Dept’l

Atomic

Ind’l

Local m/dInformation Directory

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

“Fat” Client

This scenario also supports a “fat” client at the third tier, where the individual level data is maintained and manipulated.

Page 67: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

67

3 Tiered OLAP Arch. in the DW Environment

Secondary Data Acquisition

Departmental Level Data

Shared Data Access Processing

Data Access Individual DataData

Manipulation

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

“Thin” Client

Atomic Data Acquisition

Organizational Level Data

One of the biggest benefits of a three-tiered technical architecture for a data warehouse is the ability to facilitate tremendous amounts of pre-processing of data on the middle tier.

Page 68: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

68

3 Tiered OLAP Arch. in the DW Environment

Dept’l

Atomic

Multidimensional Preprocessing

Information Directory

Local m/d

Ind’l

Ind’l

Tier 1

Enterprise Server

Tier 2

Departmental Server

Tier 3

“Thin” Client

In this case, the middle tier pre-processes data for use by many “thin” clients, rather than requiring the individual levels on the third tier to repetitively processes the same data with complex algorithms.

Page 69: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

69

Product CodeProdRef Eff. Date

ProdRef End DateProduct NameUnit PriceProduct CategoryProduct TypeProduct Sub Type

Product RefCustomer IDActivity DateProduct Code

Product NameSales Rep IDQty PurchasedTotal DollarsPromotion Flag

Cust PurchasesCustomerCustomer IDStatus Date

Cust Addr StateCust ZIP CodeCustomer TypeCustomer Status...

Sales Rep ID

Sales Person NameStore ID

Sales Rep RefStore ID

Store NameStore LocationDistribution Channel

Outlet Reference

Cust Averages

Customer IDCust Average Date

Cust Avg. End DateCust Avg. Rev.Cust Longevity

The Atomic Schema

Page 70: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

70

Dimension Table 1

Dimension Table 2

Dimension Table 3

Dimension Table 4

Fact Table

Dimension Key 4

Description 4Aggregatn Lvl 4.1Aggregatn Lvl 4.2Aggregatn Lvl 4.n

Dimension Key 1

Description 1Aggregatn Lvl 1.1Aggregatn Lvl 1.2Aggregatn Lvl 1.n

Dimension Key 2

Description 2Aggregatn Lvl 2.1Aggregatn Lvl 2.2Aggregatn Lvl 2.n

Dimension Key 3

Description 3Aggregatn Lvl 3.1Aggregatn Lvl 3.2Aggregatn Lvl 3.n

Dimension Key 1Dimension Key 2Dimension Key 3Dimension Key 4

Fact 1Fact 2Fact 3Fact 4...Fact n

The Star Schema

Page 71: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

71

Dimension Table 1

Dimension Key 1

Description 1Aggregatn Lvl 1.1Aggregatn Lvl 1.2Aggregatn Lvl 1.n

Dimension Table

• Describes the data that has been organized in the Fact Table

• Key should either be the most detailed aggregation level necessary (e.g. country vs. county), if possible, or...

• Surrogate keys may be necessary, but will decrease the natural value of the key

• Manageable number of aggregation levels

Page 72: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

72

Dimension Key 1Dimension Key 2Dimension Key 3Dimension Key 4

Fact 1Fact 2Fact 3Fact 4...Fact n

Fact Table

Fact Table• Quantifies the data that has been

described by the Dimension Tables

• Key made up of unique combination of values of dimension keys

–ALWAYS contains date or date dimension

• Fact values should be additive

–Aggregations of quantities or amounts from atomic level

–No percentages or ratios

–May be non-additive, time-variant data

Page 73: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

73

Purchases 1

Days of ActivityUnit PriceTotal QuantityTotal DollarsReturned QtyReturned DollarsPromotion Qty

Sales Rep IDProduct CodeCust ZIP CodeCustomer TypeWeek Ending Date

Cust ZIP Code

CityState/ProvinceCountry

Customer Location

Product

Product Code

Product NameProd. CategoryProduct TypeProd Sub Type

Week Ending Date

MonthQuarterYear

Date Information

Customer Type

Cust Type Desc

Customer Type

Selling Responsibility

Sales Rep ID

Sales Rep NameStore IDStore NameStore LocationSales Channel

For Example:

Page 74: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

74

Select E.Month, B.Customer_Type, C.Product_Type,D.Store_Location, sum(A.Total_Quantity)

From Purchases_1 A, Customer_Type B, Product C,Selling_Responsibility D, Date_Information E

Where B.Customer_Type = A.Customer_Type andC.Product_Code = A.Product_Code andD.Sales_Rep_ID = A.Sales_Rep_ID andE.Week_Ending_Date = A.Week_Ending_Date andE.Year = “1996” andC.Product_Category = “V”

Group by E.Month, B.Customer_Type, C.Product_Type,D.Store_Location;

Star Schema Query

Page 75: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

75

Caution!: Overly Complex Dimension

• Number of aggregation levels within the dimension becomes unmanageable

• Logically or functionally incorrect combination of aggregation levels within a dimension

Page 76: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

76

Split overly complex dimension into several logical, more manageable

dimensions, based on business function of aggregation levels

Answer #1: Split Dimension

Page 77: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

77

Answer #2: The Snowflake

• Identify hierarchies of aggregation levels and “dimensionalize” the primary dimension

• Secondary dimensions descriptive of the primary dimension

Page 78: 1 Data Warehousing Denis Manley Enterprise Systems FT228/3

78

Weekly

DateD1

D2

D3

D4

Monthly

DateD1

D2

D3

D4

Answer: Distinct Time Period Fact Tables

• Create separate fact tables to account for different time periods

• Date still part of each fact table key• Same dimension tables used by both fact tables• Improves overall performance (loading and accessing)

for each time period• Will not increase amount of managed redundancy