78
Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database (source of data) It holds more data, which is processed by Analytical and Statistical software The data, or much of it, will probably be historical (also known as Legacy data) And some of the data probably won’t be of the same quality and content - or even consistent

Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Embed Size (px)

Citation preview

Page 1: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 1

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

A Data Warehouse can be thought of a a special type of database - generally a very large database (source of data)

It holds more data, which is processed by Analytical and Statistical software

The data, or much of it, will probably be historical (also known as Legacy data)

And some of the data probably won’t be of the same quality and content - or even consistent

Page 2: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 2

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

A more technical definition would be

‘A data warehouse is a subject-orientated, integrated, time-variant and nonvolatile set of data which supports decision making processes’.

Subject databases are designed around the essential entities of a business (for example customers) rather than applications (for example Sales, Insurance..)

The reference to time-variant means that data sets are organised by time periods (months, quarters..)

Page 3: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 3

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

A data warehouse is a snapshot of an organisation at a given or particular time.

This snapshot is created by the extraction of data from existing systems.

This extracted data must be transformed, cleaned and loaded into the Data Warehouse AND regular snapshots must be made to maintain the relevance and usefulness of the data warehoused.

Page 4: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 4

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

To process, develop and present results from what is normally a large amount of data (many gigabytes or terabtyes is realistic), there is the need to use ‘data management aids’

These aids consist of

Extraction tools

Analysis tools (such as OLAP, which we’ll meet in more details later)

Application tools

Some of the workhorses are DSS, EIS, OLAP, SQL, Data Mining

Page 5: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 5

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

There are some interesting terms in Data Warehousing

Dimensions : A spreadsheet is a 2 dimensional display of data - so are most relational databases.

However Data Warehousing tools can extend this association to a ‘multi dimension’ or a ‘variable dimension’

Do you remember the nominal dimension

the ordinal dimension

and the continuous dimension in statistics ?

Page 6: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 6

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Just in case you don’t --

A nominal variable is an unordered category such as a region, suburb, - - -

An ordinal variable is an ordered category such as an age group

A continuous variable (normally) has a numeric value such as income, passenger-kilometres

Page 7: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 7

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

There is a term ‘hypercube’ which is a smart way of displaying (visualisation - which will appear soon) muiti dimensional data - or more correctly data in a multi-dimension form

A hypercube is a combination of several types of dimensions

An example: Identifier dimensions could be product and store

(which are both nominal) and the variable dimensions could be sales and customers.

You should know the terms ‘dependent’ and ‘independent’ variables

Page 8: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 8

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Very powerful forms of analysis when both dimensions are continuous

And just to finish off this brief introduction to terms :

Data mining is the search for relationships and global patterns which exist in large databases, but are obscured or hidden in what are normally large (or enormous) amounts of data.

Data mining is also associated with advanced machine learning technologies which are used to discover pockets of knowledge hidden in the mass of data

Page 9: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 9

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

And just to finish off this introduction, and give you some feel of what ‘large amounts of data’ are:-

One company in the United States of America - a catalog publisher, reputedly has collected data on 30 million households

The company uses Quadstone (data mining), SAS, SPSS, KnowledgeSeeker.

It runs about 100 marketing campaigns each year

Page 10: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 10

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

There are a number of aspects to consider when developing a database, and many of these are applicable to a Data Warehouse.

The major hurdle is the establishment and agreement by Users of the need to build a Data Warehouse

This can be expanded into these ‘supporter’ groups

1. Understand the users by– Management information needs– Area of Business operation– Responsibilities– Computer literacy

Page 11: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 11

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

This (almost automatically) suggests a Business Management hierarchy - where no one person is responsible for everything

And on this presumption we move onto

2. Determine what decisions people need to make OR would like to make if only they had sufficient supporting information

3. What decisions they have made and are trying to assess the impact of those decisions (can you remember a number of ‘high level’ decisions made in Industry which have been disastrous - but this knowledge arrived too late to reverse or recover lost markets ?)

Page 12: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 12

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

3. Who will most likely make informed decisions using the data warehouse as the source.

It is not accurate to assume that because there is the availability of a data warehouse, all managers or decision makers will use it effectively (or at all)

4. Who could be ‘likely’ or potential new users ?

This simple question opens up the Competitive Appraisal or Enterprise Management Appraisal environment.

How effective and accurate are these Appraisals - how current are they

Who conducted them - how skilled were they ?

- what level of management skills ?

Page 13: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 13

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Another way of stating this is ‘Who has serious potential in the Organisation’

Does this conflict with Privacy and Discrimination principles ?

5. Select the most efficient, effective, processible, and reliable data from when (?) to the most current period to be the content of the Data Warehouse

6. Make life enjoyable for the users – user screens– software icon or template based– match these with the skills of the users

Page 14: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 14

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

7. Match the processes with the current and adaptive skills of the users - people do develop advanced skills

8. Ensure that the data used is accurate, and can be trusted and consistently recognised across all departments, areas, states, countries of the Company

9. Monitor the results and particularly the reactions of the users to the information advantages they are receiving

10. Search, or listen attentively for other sources of data for the particular Data Warehouse

Page 15: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 15

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

11. Be alert to changes in Management - and the effect this could or will have on a Data Warehouse (either an established one, or one being developed. - Senior Management changes do happen quickly at times - especially after a General Meeting where shareholders can either be Happy or Not Happy)

12. Keep the users, and particularly the ‘Upper End of Management’ users satisfied

And, make sure there is an energetic, resourceful and knowledgeable Project Leader (and read the paper on Team Construction at the website).

Page 16: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 16

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

There are some impediments to the design, development and use of a Data Warehouse

1. Not surprisingly, not every organisation is geared to, or is working towards total centralisation

2. The development of Data Warehouse invariably requires the integration of a number of technologies which are not compatible at many levels. This is due, in part, to the ongoing developments and upgrades which are inherent in, and to, any user of Information Technology

Page 17: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 17

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

3. The ‘we want it now’ syndrome - which if allowed to rush the design, development and implementation invariably leads to a poor design, low quality but rapid development, and a Warehouse which doesn’t match expectations

4. The effect of changes both during the development and at the operational level. Conditions do change, and requirements also change and the overall effect is ongoing planned and directed maintenance

5. Data Warehouses process large amounts of data and use, in most cases, high level analysis and reporting tools. This is not a good environment for rapid or immediate response

Page 18: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 18

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

6. Another dimension which has occurred in the past 2 to 3 years is the emphasis on ‘customer information’ which depends on normal IT systems to capture this data - this is a developing skill and early data will not be as complete nor as revealing as ‘newer’ systems

7. The tracking, storing and prediction of customer behaviour may require different analysis tools.

8. Current databases provide ‘On Line Analytical Processing’ capabilities for on line real time applications - the major effort is to associate this with legacy data in a meaningful and seamless manner - and maintain high processing rates

Page 19: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 19

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

So now that the groundwork has been laid, let’s move onto some more ‘ponderables’ of what are regarded as ‘Design Constraints’ - another name for non-avoidable requirements .

1. Visualisation– easily and quickly digested (understood)– simple but informative– recognisable information (names …..)– intuitive

Page 20: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 20

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

So what do these contribute to a Data Warehouse ?

That is not the right question - It should be, what should design provide to make the results of a Data Warehousing process – understandable– easy to reprocess– easy to explore– easy to partition or to hand over to other tools

Page 21: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 21

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

If you look at many of the ‘advanced systems’ available you would be probably convinced that one of the main design objectives is to make the ‘system’ as complex as possible

That’s not good

Many designs are complex and intricate - great skill is needed to source, process, analyse, locate, redimension

Features are not solutions - features are an aid to arrive at a solution

Page 22: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 22

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Comprehension

Smart or overfull screens have two effects– they divert the attention of serious users– they confuse people– they make a process difficult to operate

Screen displays should have more free space

Page 23: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 23

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Screen displays should provide less choice - not all possible choices (or menu items) on one screen. (Many web sites suffer from the overfull problem).

‘Smart’ dashboards do not appeal to many users

Multiple mouse clicks on the same screen mean a complex screen.

Page 24: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 24

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Delivery Speed:

Very much associated with ‘We must have it now!’ syndrome

Acceptable delay tolerance from the end user is <= 0

Designers cannot to expect users to accept a long delay even if – the results are complicated– the amount of data to be processed is large– the process algorithms are complex and/or recursive

Page 25: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 25

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Cost of ‘Implementation’ - from an idea to production

These costs include:

Labour (staff) costs during the design stage

Delay costs during the design stage

and this is before any useful results are produced - possibly a good case for a Return on Investment or Rate of Return calculation ?

The are other incidental costs such as Business staff and management time

Page 26: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 26

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Costs are also connected with the size of the Data Warehouse

A 10 table design may not incur much complexity.

A 100 table is almost certain to

A 1000 table is guaranteed to (and may not even be successful).

Page 27: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 27

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Technology Costs - Hardware and Software

These items should be scaled to the ‘size’ of the known requirements

And they should be able to be extended beyond the first implementation - if it is successful (nothing generates more demands than a successful Data Warehouse - probably due to a ‘wait and see’ attitude)

Hardware should be periodically discarded and replaced with newer and more powerful versions

Page 28: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 28

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Software is the key to fast development, user-friendly systems, and fast delivery of data to the users for their queries of the Data Warehouse applications.

A Data Warehouse works on a 2 phase basis

1. An ‘Extract, Transform and Load’ process (also known as ETL)

2. Front office user queries and reports

Much of the design work is focussed on the processes which will provide relevant and accurate data from the sources so that queries and reports can be produced

Page 29: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 29

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Daily Management and Administrative Costs

This is NOT a NEW phenomenon which occurred with the emergence of Data Warehouses

It is not avoidable but it is manageable and can be controlled

Items which are found here are

Routine loading of data into Fact and Dimension tables

Production of standard reports

Page 30: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 30

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Unexpected but Necessary Amendments to either Data or Processing

Included here are such annoying aspects such as

Late arriving facts

Late arriving dimensions

Corrections to existing data

Other Amendments include

New dimensions

New Dimension Attributes

New facts

The details or summary structure of the data source

Page 31: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 31

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Prevention of Inappropriate or Irrelevant Results

This can be avoided and depends of the ‘correct target’ being selected

Which in turn means that the extensive and continuous business requirements gathering at the beginning and throughout the life of the warehouse

Which presupposes that either a data warehouse application may outlive its purpose

- Or it may be replaced by a newer, different model

Page 32: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 32

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Many years ago a chap called Hamlet (in one of William Shakespeare’s play (of the same name)) was given these words

‘To be, or not to be

That is the question’ (and lots more of course).

The current paraphrase of that is

‘To centralise or not to centralise

That is the question’

The complete quotation is at the end of these overheads

Page 33: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 33

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

There are solid arguments to ‘Centralise’ and there are solid arguments ‘Not to Centralise’.

Centralisation of data (as in a Centralised Data Warehouse) invariably assumes

‘perfect information, control and communication’

An alternative is to create data marts (specialised smaller centres of data) and to tie, or associate, these with an architecture which allows them to work together effectively (much like a distributed database).

Page 34: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 34

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

As with many developments, there are a number of ‘inescapables’ (much like fees and taxes)

Unavoidable realities are a recognition that the ideal model is an ideal, not a practicality nor reality

A realistic model is referred to a ‘descriptive’ - the ideal model is classed as ‘normative’.

So with that little bit of Business philosophy, let’s have a quick look at reality in designing and implementing a data Warehouse

Page 35: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 35

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The opportunities for building a ‘perfect design’ from a centralised position are very limited (in calculus this would be represented by sn 0 )

The major elements of skills, understanding, foresight, wisdom, sufficient, time are not normally available in abundance

We construct, and live with, designs in separate areas (departments) and we learn the requirements of the many users slowly and sometimes painfully. We also need time and patience to understand and recognise the impact of data

Page 36: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 36

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

We develop things incrementally

It’s not surprising then that the weight of experience indicates that a Data Warehouse is basically a decentralised system and may have no ‘core matter’ at all

Don’t get discouraged - our national road systems were not constructed overnight

Neither were our water, power, gas systems nor our Universities - and they are subject to continuous change - so why should there be the unnatural expectation for a Data Warehouse ?

Page 37: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 37

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Another level of complexity is that of many, and incompatible, technologies

It is normal for organisations to have multiple ‘business’ systems which were developed at different time periods and on different computers and with different applications software platforms

To integrate these requires much skill at the communications and applications levels. XML is the current ‘hope’ for clear and multiple different data forms.

Curl is another new object oriented language which has appeared - and there will be others …..

Page 38: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 38

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Most users expect what is known as ‘rapid deployment’ which means the time from inception to delivery of output

Oracle Corporation (and others) suggest that a minimum 6 week period from acceptance and recognition of the need for a Data Warehouse is industry acceptable

What would probably happen if the ‘wait period’ was longer ?

If the deployment period is say 12 months, what are the risks of - budget reallocation ?

- loss of organisation competitive edge ?

- no more work for the Data Warehouse contractor ?

Page 39: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 39

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Continuing Change

One aspect of Business is predictable - change

This makes long range assumptions both dangerous (particularly if they are not updated) and unreliable

Timeliness of Data

The Data Warehouse should ideally reflect both atomic data (the most persistent, lowest level of data)

and real time - which is an expectation that latest data will be less than 60 minutes old, or even 0 minutes old.

There is a term ‘lockstep’ which refers to the ‘in step’ condition of operational systems and Data Warehouse systems.

Page 40: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 40

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The Optimists probably will look for ‘data ahead of its time’, that is before an event happens

We currently have ‘futures’ and ‘futures trading’ so is ‘future data’ all that unrealistic ?

Seamless Integration/Connection

The final requirement is for the Data Warehouse to be able to connect the real-time data without prejudice (seamless) with the static historical data (which hopefully will not alter).

This will give the appearance of a total time base

Page 41: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 41

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The next few overheads will try to explain some current techniques and software which are employed to both minimise the time factor in producing outputs, and also improve or maintain the quality and scope of data

There will also be some thoughts about a ‘new technique’ known as ‘Executive Dashboards’

Page 42: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 42

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

A New Term - OLTP databases

OLAP is the acronym for ‘On Line Transaction Processing’ which links in nicely with the previous overheads regarding the ‘we want it now’ catch cry from Business Managers and Customers (we’ve overlooked these people have we ?)

So now we have OLTP databases and Data Warehouses

Big deal ?

What’s the difference or differences, if any ?

Is it just another expensive IT tool ?

Page 43: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 43

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

OLTP databases run business applications developed around– Siebel– SAP– PeopleSoft (and others) ? Oracle takeover

They are developed for multiple users, many of whom are remote from the processing centre (as if that is something unusual with our current capabilities with communications)

These users require (sometimes ‘demand’)– very rapid response times– very high levels of accuracy of data (why ?)

Page 44: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 44

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

In these conditions, it is reasonable to suggest that there could be thousands of concurrent users (you remember concurrency ?) many, or perhaps all, of whom want to press the Enter key (or Submit a process with the more menu function driven Client devices)

and expect – an immediate response– an accurate response

Page 45: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 45

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Let us digress a bit and ponder on what is going on at the database level

There will be a series of processing events which must occur (quickly, but in a predetermined sequence)

1. Business programming logic and SQL statements (like the SQL you have met) must be executed - this is not avoidable

2. Response data from the appropriate database or databases must be gathered - this will involve at least one CPU, some (or much) sorting, and also the unavoidable Input/Output (I/O) time.

Page 46: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 46

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

When all the detail events have been completed and the business function is performed, the production unit of work is committed, or rolled back. (you have met these conditions earlier)

Just to make things a bit more entertaining, some OLTP

database applications employ batch processing and others will service decision-support queries (with the associated extensive browsing and highly complex calculations required in many cases)

An just to query the slogan ‘Life’s Good’ (or LG), OLTP databases are expected to run thousands of transactions per minute.

Page 47: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 47

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

OLTP database applications also perform– Insert– Update– Delete

functions - and you have had some contact with transactions which affect more than 1 row in a table, and more than 1 table in a database (known as contention management and locking technology)

So you will appreciate that OLTP is a high work level environment - so to expect fast output adds to the transaction load

Page 48: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 48

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Let’s look at Data Warehouse Databases

They assist in Business decision making processes - and of course some (or many) of these are complex

Normally, Data Warehouse databases do not run transactions at high rates

However, they do respond to complex business questions relevant to the available data

These ‘queries’ are delivered to the database via complex SQL or by user friendly query tools

Page 49: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 49

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Users don’t normally expect instantaneous response

Data Warehouse response times tend to be measured in minutes - although some queries will take much longer

There are a number of interesting aspects of a Data Warehouse database

– they tend to be ‘read only’– they are not available 24 x 7 as there needs to be a

data loading time– this data load is done in batch mode in ‘off hours’– ‘parallelism’ is used to improve response time

Page 50: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 50

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

You have probably heard of the need to ‘tune’ software to improve performance (aka response time)

The same consideration applies to Data Warehouses

Performance tuning improvements can provide about 20% improvement by configuring the database manager, the database itself , and tablespace settings.

Further improvements can be achieved by carefully modifying the query access SQL code

Page 51: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 51

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

‘Tuning SQL’ invariably requires that– physical design alterations involving optimum

indexes– indexes are applied to minimise sort (output) times– clustering is used to minimise sort times– automatic Summary Tables are developed– stored procedures are used or implemented– triggers are used

Page 52: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 52

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The term ‘multidimensional data’ is commonly used in Data Warehouse applications

There are 2 basic models

These are– The Star model– The Hypercube

The Star model is similar to a relational database table in construction

The ‘attributes’ are known as ‘facts’

Page 53: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 53

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Product Region

Model Nation

Type District

Colour Product Dealer

Region

Time

Channel Time

Channel Revenue Week

Expenses Year

Units

A Star Structure

Page 54: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 54

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Don’t be mislead

The selection of the ‘components’ and ‘sub-components’ takes considerable time and rationalisation

And agreement

Page 55: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 55

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

A Hypercube is a representation of a multidimensional set of data

These could be the dimensions of such a construction

Dimension Example

Brand Haymes

Store Caulfield East

Customer Segment Home renovation

Product Group Paints

Period September

Variable Number Sold

Page 56: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 56

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The spatial representation of data in a cube can be difficult

2000

1999

ALL Holden Toyota ALL ALL

Red

Page 57: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 57

Operational Security Confidentiality, Integrity, Availability e.g. Password Protection, Biometrics,Firewalls

Tactical Security Confidentiality, Integrity, Availability e.g. Risk Assessment, Incidence Response

Strategic Security Confidentiality, Integrity, Availabilitye.g. Information Protection Policy, Privacy Policy

Data WarehousingData Warehousing

Levels of Abstraction

Security

Quality

Data Application Process Technology

Page 58: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 58

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

An expert in Data Warehouse design and implementation (Ralph Kimball) gives this advice on design a successful Data Warehouse

1. Separate the systems– logically– physically– administratively

Page 59: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 59

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

2. The Data Warehouse project should be considered as 4 distinct and different systems

– Production (source) transaction processing systems– Data Warehouse staging area systems– Data Warehouse presentation systems, including

client/server and Web-based query tools and report writers

– Optional high-end analytic tools supporting data mining, forecasting, scoring or allocating

Page 60: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 60

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The data warehouse manager should NOT be responsible for the source systems which are capturing and processing transactions

The data warehouse manager should not therefore be involved with legal and financial functions

The data warehouse manager should not be involved with rollback and recovery functions of executing systems

Page 61: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 61

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

So now that we have an idea of what a Data Warehouse manager should not get involved with, let’s see what systems a Data Warehouse manager SHOULD get involved with :

1. The Data Staging area:

Production data from many sources is collected, cleaned, combined and eventually delivered to the Data

Warehouse presentation systems

An important feature here is that final data warehouse clients should not ever have access to the staging area

Page 62: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 62

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

This should ensure

– no guarantee for up-time service levels for queries or reports

– no requirement to enforce client-server security– no need to build query performance enhances such as

indexes or aggregations– that logical and physical conflicts between cleaning and

querying just won’t occur– no guarantee of consistency across separate

asynchronous data sources

Page 63: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 63

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

An explanation :

The major structures in the data staging area are– flat files– the entity/relationship schema

These are sourced from the production systems (they are either directly extracted or are derived).

About 90 to 95% of the processing in the staging area is sorting or sequential processing

Page 64: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 64

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

2. The Presentation System

This must be built to support and enhance the query and reporting activities

This system needs to be – simple (where have you heard that before ?)– fast– and most importantly, will produce the ‘right data’ to

satisfy the analysis needs of the end users and stakeholders

Page 65: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 65

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

As a bonus, those requirements which were excluded from the staging area can now be addressed

The high level data structures in the in the presentation level are

– the relational star schema– and the on-line analytical processing data cube

Page 66: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 66

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

3. An operational layer of specific high-end analytic tools

These tools will (normally) use large amounts of data from a Data Warehouse, and generally in batches

Not surprisingly perhaps, the skills of analysis (data mining, forecasting, scoring and allocating tools) use highly specialised formulae and algorithms which are best handled and applied by experts - generally not

the Data Warehouse designer

Page 67: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 67

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Data mining for instance often needs to assess large numbers of ‘observations’ - this can be thousands or hundreds of thousands of observations

And this can be an iterative process - re-use

And high data rates are the order of the day

A safe solution is to hand the set of observations, (just once) to the data-mining team (who would be professional data-mining experts) and to let them apply their skills

Page 68: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 68

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The current tools or trends in presentation are relational star schemas and OLAP data cubes

They have proven to be excellent devices for presentation and also satisfy one of the design criteria which is ‘the ability to present results in an understandable manner’.

Supporting these tools (or these methods of data presentation) are some smart software which allows queries to be developed based on powerful algorithms.

The latest standard for SQL for instance has a ‘cube’ command in its repertoire

Page 69: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 69

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Star schemas are directed at very large data sets with millions or billions of measurements or in lay terms many millions of customers in a customer or product entity.

OLAP cubes, as you have gathered by their construction, are appropriate for smaller data sets where analytic tools can perform complex data comparisons and calculations

And of course, there are hybrid star schema - OLAP systems which combine the best of both worlds and allow for very large data sets in star schema format to be smooth drill-down targets from smaller OLAP data cubes, all within a single interface.

Page 70: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 70

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

The next 2 overheads, which are quite complex and are probably difficult to read, hopefully will give some impression of the muscle in Oracle 9i applied to the design, development and availability of databases and data warehouses

Page 71: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 71

Oracle 9iOracle 9i

Building Infrastructure

Database

Component Based DevelopmentBusiness Rules with BC4J

Model Based DevelopmentDatabase Schemas

Database Form-basedApplications and Reports

BC4J - Oracle JSP

Oracle 9iAS Forms Service

HTTP Server

Oracle 9iASReports Services

Administration

Users

Users

Business Logic Presentation Logic

9i Application Server

Page 72: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 72

Another Meaningful Diagram Another Meaningful Diagram

Other Operational Data

External Data

E-Business SuiteData

ElectronicStorefrontData

WarehouseBuilder

Designing andBuilding the Warehouse

Data Load

9i with OLAPServices

Querying

Mining

Reporting

Analysing

ApplicationServer

Creating ReportsPreparing to Mine dataBuilding BI ApplicationsBuilding Predefined Queries

Developer Suite

Legacy Data

BI Development Tools

BI Application Deployment

Page 73: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 73

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

And to finish, I’m going to introduce a ‘state of the art’ device

It is known as an Executive Dashboard

It is a subset of portal technology (you have used your Monash Portal ??) to provide

– specific– performance based information

to Upper and Senior Management

It can be described as a subset of a Data Warehouse presentation

Page 74: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 74

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

It is a one-screen display (cockpit is the term used) of all critical measurements for managing an organisation which offers

– actionable information– business performance measurements– up to date information on status– up to date information on forecasts against

benchmarks

They are the current facility of the visualisation of Critical or Key Performance Indicators

Page 75: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 75

Some Thoughts on Data WarehousingSome Thoughts on Data Warehousing

Finally :

Just in case you are wondering what the full text of Hamlet’s ‘to be or not to be’ soliloquy is

The next 3 overheads contain his thoughts on ? ? ? ?

Page 76: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 76

Shakespeare’s Hamlet, Prince of Denmark

Shakespeare’s Hamlet, Prince of Denmark

To be, or not to be - that is the question;

Whether ‘tis nobler in the mind to suffer

The slings and arrows of outrageous fortune

Or to take arms against a sea of troubles,

And by opposing them ? To die, to sleep -

No more; and by a sleep to say we end

The heart-ache and the thousand natural shocks

That flesh is heir to. ‘Tis a consummation

Devoutl’y to be wished. To die, to sleep;

To sleep, perchance to dream. Ay, there’s the rub;

For in that sleep of death what dreams may come

When we have shuffled off this mortal coil,

Page 77: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 77

Shakespeare’s Hamlet, Prince of Denmark (Act 3)

Shakespeare’s Hamlet, Prince of Denmark (Act 3)

Must give us pause. There’s the respect

That makes us calamity of so long life;

For who would bear the whips and scorns of time,

Th’ oppressor’s wrong, the proud man’s contumely,

The pangs of despis’d love, the law’s delay

The insolence of office, and the spurns

That patient merit of th’ unworthy takes,

When he himself might his quietus make

With a bare bodkin ? Who would there fardels bear,

To grunt and sweat under a dreary life,

Page 78: Data Warehousing Lec08 / 1 Some Thoughts on Data Warehousing A Data Warehouse can be thought of a a special type of database - generally a very large database

Data Warehousing Lec08 / 78

Shakespeare’s Hamlet, Prince of Denmark (Act 3)

Shakespeare’s Hamlet, Prince of Denmark (Act 3)

But that the dread of something after death -

The undiscover’d country, from whose bourn

No traveller returns - puzzles the will,

And makes us rather bear those ills we have

Than fly to others that we know not of ?

Thus conscience doth make cowards of us all;

And the native hue of resolution

Is sicklied o’er with the pale cast of thought,

And enterprises of great pitch and moment,

With this regard, their currents turn awry

And lose the name of action.