SFDV3007 Chapter 4: Decision Support Systems

SFDV3007 Chapter 4: Decision Support Systems. Overview of Chapter 4 What is decision support? Decision support systems Data warehouse concepts Data warehouse


SFDV3007

Chapter 4: Decision Support Systems


Overview of Chapter 4

• What is decision support?
• Decision support systems
• Data warehouse concepts
• Data warehouse analysis & design
• Online analytical processing (OLAP)


What is decision support?

(Kifer §1.4, 17.1; Silberschatz §22.1)

• Data → timely, relevant, well-visualised information.

• Tune information and presentation to specific purposes (often decision-making).


Decision-making occurs at the operational level

(see also Figure 4–1)

• Very short-term.
• Well-defined inputs.
• Produced by existing applications or simple front-end tools.
• Line managers.
• Operational decision making has to do with the day-to-day operations of an organisation, e.g., should we produce hammers tomorrow?
• Data are clearly defined (e.g., how many hammers do we need to produce?).


Decision-making occurs at the tactical level

(see also Figure 4–1)

• Short-term.
• Less well-defined inputs.
• Middle managers.
• Tactical decision making has to do with short-term planning (weeks → months), e.g., when should we start gearing up for Christmas production? When is the best time to introduce or discontinue a particular product?
• Things start to get less clear at this level, because external factors are starting to influence the data. Data may be coming not only from internal systems, but also from outside, and will probably need some processing to be useful.


Decision-making occurs at the strategic level

(see also Figure 4–1)

• Long-term.
• Ill-defined inputs.
• Often cannot use pre-existing applications ⇒ Decision Support Systems (DSS) or Executive Information Systems (EIS).
• Senior managers.
• Strategic decision making has to do with long-term planning (months → years), e.g., should we expand our product into China?
• The inputs at this level come from both internal and external sources and often include ill-defined or “fuzzy” data, and even rumour and speculation.


Operational vs. decision support queries

Operational
  – How many brass reciprocating hammers do we have in stock?
  – How much electrical twine did we sell yesterday?

Decision support
  – How many brass reciprocating hammers were sold to customers aged 18–25 in large North Island towns over each of the last six months?
  – If we double the advertising budget for electrical twine, how might that affect revenues for the next six months?

The key point here is that operational queries are typically very easy to ask in SQL, whereas decision support queries can be difficult or even impossible to formulate using SQL.
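The contrast can be sketched with Python's built-in sqlite3 module. The tables (stock, customer, sale), their columns and the sample rows are all invented for illustration and are not from the course materials. The first query is a single-row lookup; the second already needs a join, filters on customer attributes and a grouping by month, and it still cannot express the "what if we double the advertising budget" question at all.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stock (product TEXT PRIMARY KEY, qty INTEGER);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, age INTEGER, town TEXT);
    CREATE TABLE sale (product TEXT, customer_id INTEGER, sale_month TEXT, qty INTEGER);
    INSERT INTO stock VALUES ('brass reciprocating hammer', 120);
    INSERT INTO customer VALUES (1, 22, 'Hamilton'), (2, 40, 'Hamilton'), (3, 19, 'Tauranga');
    INSERT INTO sale VALUES
        ('brass reciprocating hammer', 1, '2024-01', 3),
        ('brass reciprocating hammer', 2, '2024-01', 5),
        ('brass reciprocating hammer', 3, '2024-02', 2),
        ('brass reciprocating hammer', 1, '2024-02', 1);
""")

# Operational query: one table, one row, current state.
in_stock = conn.execute(
    "SELECT qty FROM stock WHERE product = ?",
    ("brass reciprocating hammer",),
).fetchone()[0]

# Decision support query: join, filter on customer attributes, aggregate per month.
monthly = conn.execute("""
    SELECT s.sale_month, SUM(s.qty)
    FROM sale s JOIN customer c ON c.id = s.customer_id
    WHERE s.product = 'brass reciprocating hammer'
      AND c.age BETWEEN 18 AND 25
      AND c.town IN ('Hamilton', 'Tauranga')
    GROUP BY s.sale_month
    ORDER BY s.sale_month
""").fetchall()

print(in_stock)
print(monthly)
```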


There is a strong need for DSS

• Modern business very complex.
• Shrinking time frame for decision-making.
• Data from multiple sources (see Figure 4–1):
  – internal vs. external
  – “formal” vs. “informal”
  ⇒ must be sensibly integrated
• External data includes things like economic indicators, share prices, marketing information, competitors’ data (!); anything that’s relevant to the running of the business.
• Informal data includes hearsay, rumour, speculation, gossip, etc.


Components of a DSS (adapted from Rob & Coronel, Figure 12.1)


• The data store is a specially-structured database integrating data from multiple sources.

• Data extraction and filtering extracts, validates and cleans data from many sources. Data cleaning is a non-trivial problem; sometimes it can be faster to re-enter the data manually than to clean it automatically. Cleaning includes things like ensuring that all data have the same format (not just the data type, but also things like unit sizes, e.g., thousands vs. millions of dollars), that values are stored in the correct columns, and even things like adjusting monetary values for differences in exchange rates or inflation.

• Business model data are generated by various modelling algorithms, like a regression model or linear programming.

• The end-user presentation tool is important for visualising information in a useful form (e.g., graphs vs. raw numbers). Poor visualisation can make the difference between a good decision and a bad decision.


There are many types of decision support tool

Basic
  – Ad hoc query tools (SQL).
  – Graph and report generators.
  – Spreadsheets (small data sets only!).

More advanced
  – Data warehouses.
  – Online analytical processing (OLAP). The term “OLAP” was first used by Codd in 1993.


Operational data vs. decision support data


• Operational data are usually stored in online transaction processing (OLTP) databases.

• Decision support data are needed for tactical and strategic decision making.

• The key differences are:
  – Timespan
  – Granularity
  – Dimensionality


Characteristic         Operational data                   Decision support data
---------------------  ---------------------------------  ----------------------------------------------------------------------
Data currency          Current operations;                Historic data; snapshot of company data;
                       real-time data                     time component (week/month/year)
Granularity            Atomic, detailed data              Summarised data
Summarisation level    Low; some aggregation              High; heavily aggregated
Data structure         Highly normalised; mostly RDBMS    Non-normalised; complex structures; some relational, mostly multidimensional
Transaction type       Mostly updates                     Mostly queries
Transaction volumes    High update volume                 Periodic loads and summary calculations
Transaction speed      Updates are critical               Retrievals are critical
Query activity         Low to medium                      High
Query scope            Narrow range                       Broad range
Query complexity       Simple to medium                   Very complex
Data volumes           Hundreds of MiB → GiB              Hundreds of GiB → TiB


Timespan is a key difference

Operational
  – Very short (current transactions).

Decision support
  – Long (past and future).
  – Data may not be current.


Granularity is a key difference

Operational
  – Represent specific transactions (atomic).

Decision support
  – Varying levels of aggregation (atomic → highly summarised).
    • For example, sales aggregated by day, week, month, year, city, region, country, …. Typically, all of these levels of aggregation will be pre-computed so as to speed up querying.
  – Drilling down vs. rolling up.


Dimensionality is a key difference

(Kifer §17.2; Silberschatz §22.2.1)

Operational
  – “Flat” (tables of atomic transactions).

Decision support
  – Many dimensions.
    • However, “dimension” means something different in a DSS context. “Variable of interest” is a more accurate term. In that sense, a single relation stores data about a single item of interest, and can therefore be considered to be a single dimension.
  – Orders by region per quarter (2D).
  – Compare sales of products during the last six months by region, city, store & customer (4D).
  – Multidimensional data may be stored in multidimensional databases (MDD), which store data as multidimensional arrays rather than as tables.


Dimensionality is a key difference

(adapted from Rob & Coronel, Figure 12.13; see also Kifer §17.2 & Silberschatz §22.2.1)


Data warehouses store decision support data (Mannino §14.1.2, 14.1.3; Rob & Coronel Table 12.5)

• Designed and optimised for decision support data.

• Internal structure quite different from operational databases:
  – aggregated
  – denormalised
  – data from multiple internal/external sources


DW data are integrated (Rob & Coronel Table 12.5)

Operational database data
  – Mostly internal sources.
    • Internal source: sales database.
  – Multiple representations.
    • Similar data from different sources can have different representations or meanings. For example, tax numbers may be stored as ##-###-### or as ########, and a Boolean condition may be labelled as T/F, 0/1 or Y/N.

Data warehouse data
  – Both internal and external sources.
    • External sources: share prices, government economic indicators, etc.
  – Transformed, cleaned and summarised during integration.
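As a minimal sketch of the "multiple representations" problem, the helpers below map the tax-number and Boolean variants mentioned above onto one canonical form each. The function names and the choice of canonical form (eight bare digits, Python bool) are assumptions made for illustration; a real integration step would follow whatever conventions the warehouse design specifies.

```python
import re

# Labels treated as true/false; taken from the T/F, 0/1, Y/N variants above.
TRUE_LABELS = {"T", "1", "Y"}
FALSE_LABELS = {"F", "0", "N"}


def normalise_tax_number(raw: str) -> str:
    """Strip separators so '12-345-678' and '12345678' compare equal."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 8:
        raise ValueError(f"unexpected tax number: {raw!r}")
    return digits


def normalise_flag(raw) -> bool:
    """Map the T/F, 0/1 and Y/N labellings onto a single Boolean."""
    label = str(raw).strip().upper()
    if label in TRUE_LABELS:
        return True
    if label in FALSE_LABELS:
        return False
    raise ValueError(f"unexpected Boolean label: {raw!r}")


print(normalise_tax_number("12-345-678"), normalise_flag("y"), normalise_flag(0))
```

Rejecting unrecognised values, rather than guessing, keeps bad source data out of the warehouse and makes cleaning problems visible early.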


DW data are subject-oriented

(Rob & Coronel Table 12.5)

Operational database data
  – Functional or process-oriented (invoices, payments, products).

Data warehouse data
  – Facts or measures organised by major subject areas (sales, marketing, etc.).
  – Held according to dimensions or variables of interest: product, customer, region, …
  – Aggregated data from many operational tables.
  – Queries tuned to specific decision-making needs.


DW data are time-variant

(Rob & Coronel Table 12.5)

Operational database data
  – Current transactions with precise time stamps.

Data warehouse data
  – Time an important dimension for almost all subject areas.
  – Data aggregated by time, e.g., sales by week, month, quarter, year…
  – Historical focus (past and future).


DW data are non-volatile

(Rob & Coronel Table 12.5)

Operational database data
  – Frequent changes ⇒ dynamic.
  – Often archived periodically.
    • The bigger the database, the slower it will be, so archiving periodically will help with database performance.
    • The main side effect of this is that operational databases typically don’t grow that large.

Data warehouse data
  – Read only (occasional batch updates) ⇒ static.
  – Historical data retained ⇒ always growing (GiB → …).
    • Data are generally at least several TiB in size, and are always growing because nothing is deleted.


Drill-down and Roll-up

• The two basic hierarchical operations when displaying data at multiple levels of aggregation are the drill-down and roll-up operations.
• Drill-down refers to the process of viewing data at a level of increased detail.
• Roll-up refers to the process of viewing data with decreasing detail.
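The two operations can be sketched as aggregating the same facts at different depths of a hierarchy. The country → region → city hierarchy, the sample figures and the function name below are invented for illustration; a real warehouse would typically read pre-computed aggregates rather than recompute them on each request.

```python
from collections import defaultdict

# Hypothetical atomic sales facts: (country, region, city, amount).
sales = [
    ("NZ", "North Island", "Auckland", 100),
    ("NZ", "North Island", "Wellington", 60),
    ("NZ", "South Island", "Dunedin", 40),
]

# Location hierarchy, from least to most detailed.
LEVELS = ("country", "region", "city")


def aggregate(rows, level):
    """Total sales grouped at the given level of the location hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for *location, amount in rows:
        totals[tuple(location[:depth])] += amount
    return dict(totals)


by_city = aggregate(sales, "city")        # most detailed view
by_region = aggregate(sales, "region")    # roll up one level
by_country = aggregate(sales, "country")  # roll up again

print(by_region)
print(by_country)
```

Moving from by_country to by_region to by_city is drilling down; moving in the other direction is rolling up.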


Defining a data warehouse in more detail (Kifer §17.6; Silberschatz §22.4.1; Table 4–2)

• Read-only database optimised for data analysis and query processing.
• Data from:
  – “legacy”/archived databases
  – operational databases
  – other sources
• Optimisation includes:
  – decisions on aggregations
  – important dimensions
  – appropriate indexing and physical design

Note that because they are read-only, you can throw as many indexes at them as you like.


Data marts are small, specialised data warehouses

• Focused subset of data.
• “Clusters” of data marts surrounding central enterprise data warehouse?
  – This is useful if the central data warehouse is particularly large. Extracting a smaller subset relating only to the problem at hand enables faster processing and a tighter focus.


Data warehouse analysis is more demanding

(Mannino §14.3.1)

• Some queries may be impossible if not designed for.
• Not as flexible for ad hoc queries.
• Users must identify intended use.
• Data derived from both internal and external sources (e.g., Internet: Yahoo!, Dow Jones, NASDAQ).


The difficulty of DW design

(The Standish Group (1997), “The Meta Myth”; http://standishgroup.com/)

Interviewer: How many data warehouses have you had?

Data warehouser: We have had eight.

Interviewer: To what do you attribute so many warehouses?

Data warehouser: Seven mistakes…


Facts are a key design aspect

(Mannino §14.3.2)

• Facts are the base values that we are interested in monitoring.

• Examples: revenue, profits, cost, number of sales.

• Also known as measures.


Dimensions are a key design aspect

(Mannino §14.3.2)

• A factor/variable that influences the facts.

• The values of the dimensions affect our view of the facts.

• Examples: time, product, customer, salesrep, location.

• Each has attributes.


Time as a dimension (see also Mannino §14.2.3)

• Not as simple as it seems!
• Granularity (unit size): year, month, week, day, hour.
• Alternate units (periodicity): season, financial year, quarter.
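One way to see why time is less simple than it looks is to derive several units from a single date. The sketch below builds one hypothetical time-dimension row; the attribute names are invented, and the financial year is assumed to start on 1 April and to be labelled by the calendar year in which it starts, which will differ between organisations.

```python
from datetime import date


def time_dimension_row(d: date, fy_start_month: int = 4) -> dict:
    """Derive time-dimension attributes from one calendar date.

    fy_start_month is an assumption (April here); adjust to the
    organisation's own financial calendar.
    """
    _iso_year, iso_week, _weekday = d.isocalendar()
    # Label the financial year by the calendar year in which it starts.
    financial_year = d.year if d.month >= fy_start_month else d.year - 1
    return {
        "date": d.isoformat(),
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "iso_week": iso_week,
        "financial_year": financial_year,
    }


row = time_dimension_row(date(2024, 2, 15))
print(row)
```

Note that the same date falls in calendar year 2024 but, under the assumed convention, in financial year 2023; the two hierarchies do not nest inside each other, which is why they are stored as separate attributes.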


Star schemas for relational data warehouses

(Kifer pp. 715–717; Silberschatz §22.4.2; Figure 4–3; see also Data Warehousing Guide ch. 2)

• Central fact table.
• Cluster of related dimension tables.
• Needed because of inadequate physical data independence? (denormalised)
• Partial normalisation leads to a “snowflake” or “starflake” structure (also “constellation”).


• Star schemas are the most common structure used for relational data warehouses.

• The fact table is typically a huge composite of numbers relating to various things, and is heavily denormalised with lots of duplicated values.

• Relational data warehouses are typically denormalised, although this has more to do with a lack of physical data independence in RDBMS.
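A star schema can be sketched in a few lines using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_region, dim_time), their columns and the sample rows are invented for illustration and do not reproduce Figure 4–3; a real fact table would be far larger and would usually carry more dimensions and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per product, region and time period.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);

    -- Central fact table: a foreign key to each dimension plus numeric facts.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product,
        region_id  INTEGER REFERENCES dim_region,
        time_id    INTEGER REFERENCES dim_time,
        units      INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'hammer'), (2, 'twine');
    INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_time    VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales  VALUES
        (1, 1, 1, 10, 100.0),
        (2, 1, 1,  5,  25.0),
        (1, 2, 1,  4,  40.0),
        (1, 1, 2,  6,  60.0);
""")

# Revenue by region per quarter: join the fact table to two of its dimensions.
revenue_by_region_quarter = conn.execute("""
    SELECT r.name, t.quarter, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    JOIN dim_time   t ON t.time_id   = f.time_id
    GROUP BY r.name, t.quarter
    ORDER BY r.name, t.quarter
""").fetchall()

print(revenue_by_region_quarter)
```

Every query follows the same shape: start from the fact table, join only the dimensions of interest, then group by their attributes.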


Three steps to populate a data warehouse

This process is often referred to by the acronym ETL.

Extraction: obtaining data from sources.
Transformation: altering form of data (includes cleaning). Transformation involves ensuring that consistent data formats are used and also fitting imported data into the DW structure. Cleaning involves ensuring that data are correct, accurate and self-consistent.
Loading: adding data to warehouse.

• Possibly intermediate data staging steps.
  – Data staging can happen between each of the three stages. Thus, we might extract the data from the original sources and store it in a temporary staging database before it enters the transformation process.
• Critical for successful data warehouses.
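The three steps can be sketched end to end with the Python standard library. Everything specific here is an assumption for illustration: the CSV source, the column names, the rule that revenue arrives in thousands of dollars, and the fact_revenue target table. The in-memory list returned by extract stands in for a staging area.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: revenue reported in thousands of dollars,
# with inconsistent region labels and one missing measure.
SOURCE_CSV = """region,month,revenue_thousands
north,2024-01,12.5
South ,2024-01,8
north,2024-02,
"""


def extract(text):
    """Extraction: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation and cleaning: consistent labels, units in dollars."""
    clean = []
    for row in rows:
        if not row["revenue_thousands"].strip():
            continue  # cleaning: skip rows with a missing measure
        clean.append((
            row["region"].strip().title(),
            row["month"],
            float(row["revenue_thousands"]) * 1000,
        ))
    return clean


def load(conn, rows):
    """Loading: append the transformed rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_revenue (region TEXT, month TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
staged = extract(SOURCE_CSV)  # staging: raw rows held before transformation
load(conn, transform(staged))
loaded = conn.execute("SELECT * FROM fact_revenue ORDER BY region").fetchall()
print(loaded)
```

Silently skipping the incomplete row is only one possible cleaning policy; logging or quarantining such rows is often preferable.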


Performance tuning for data warehouses

• Complex queries ⇒ denormalisation (fewer joins).
• Mostly read-only + complex queries ⇒ index heavily.
• Other techniques:
  – normalise dimension tables
    • Normalising the dimension tables simplifies filtering operations related to the dimensions.
  – multiple fact tables for different aggregation levels
    • This means that each fact table will be smaller, and thus faster to access.
  – physical tuning: partitioning, replication, etc.


• B-tree indexes and hashing generally useful.
• Bitmap indexes particularly for “counting by category” queries.
  – Bitmap indexes also useful because there are lots of low selectivity columns in the fact table.
• Integrated indexes for dimension tables.
• Function-based indexes could be useful?
• Dimension tables are effectively commonly used “lookup tables”, so storing them (in Oracle) as index-organised tables could be beneficial.
  – The main advantage is that the index (and hence the table) will typically be kept in RAM.
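The idea behind a bitmap index can be sketched with Python integers used as bit strings. This is a conceptual illustration only, not how Oracle stores its bitmap indexes; the column values and function names are invented. Each distinct value gets one bitmap with bit i set when row i holds that value, so "counting by category" becomes counting set bits, and combining conditions becomes a bitwise AND.

```python
# Hypothetical fact-table columns with few distinct values, one entry per row.
regions = ["North", "South", "North", "East", "South", "North"]
channels = ["web", "store", "store", "web", "web", "web"]


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for row_number, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def count_rows(bitmap):
    """Number of rows selected by a bitmap (its count of set bits)."""
    return bin(bitmap).count("1")


region_idx = build_bitmap_index(regions)
channel_idx = build_bitmap_index(channels)

# Count by one category, then by two categories combined with bitwise AND.
north_count = count_rows(region_idx["North"])
north_web_count = count_rows(region_idx["North"] & channel_idx["web"])

print(north_count, north_web_count)
```

The approach pays off only when a column has few distinct values, since one bitmap is kept per value.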


Oracle10g supports data warehouses

(Data Warehousing Guide)

The simple approach
  – Use distribution and replication services.
  – Does not scale well.

Oracle data mart suite
  – Add-on for constructing Oracle data marts.
  – GUI interface.
  – Modules for design, extraction, transformation and loading of data.
  – Third-party tools also available.


• Bitmap & function-based indexes, index-organised tables.
• Bitmap join indexes.
  – Bitmap join indexes speed up queries that join dimension tables to a fact table, because they effectively index the join.
• Other relevant tools:
  – SQL*Loader (possibly in conjunction with Transparent Gateways)
  – export and import (basic)
    • If your data sources are Oracle databases, then you could in theory export from the sources and import into the DW.


OLAP tools enable complex data processing

(Silberschatz §22.2; Figure 4–4)

• Complex analysis of multidimensional data.

• Spreadsheet-like “simplicity”.
• Data stored in warehouse or tool’s internal proprietary database.


OLAP tools have many capabilities

• Data transformation.
• Business modelling.
• Statistical analysis.
• Powerful GUI query facility.
• Visualisation (graphics).


A simple OLAP example using Excel (Kifer §17.3; see also Silberschatz §22.2.3–22.2.5)

• Sales subject area dimensions: customer, salesreps, product, region, time, …

• View sales aggregated by dimensions.
• Dynamically alter presentation:
  – drill down/roll up
  – “slice and dice”
  – “pivot” the table (a pivot table is a data summarisation tool found in data visualisation programs such as spreadsheets)
  – highlight exceptions (e.g., high loss products)
  – invent new columns (e.g., % sales revenue)


“Slice and dice” enables dynamic visualisation


A slice is effectively a restriction to a subset of values in a particular dimension. We can slice on just one dimension, or many dimensions at once.
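Slicing can be sketched on a tiny data cube held as a Python dictionary keyed by coordinates. The dimensions (product, region, quarter), the cell values and the slice_cube helper are all invented for illustration; an OLAP tool would perform the same restriction against its own storage.

```python
# Hypothetical cube: (product, region, quarter) -> units sold.
cube = {
    ("hammer", "North", "Q1"): 10,
    ("hammer", "South", "Q1"): 4,
    ("hammer", "North", "Q2"): 6,
    ("twine", "North", "Q1"): 5,
    ("twine", "South", "Q2"): 7,
}
DIMENSIONS = ("product", "region", "quarter")


def slice_cube(cells, **allowed):
    """Keep only cells whose coordinates lie in the allowed values
    for every dimension named in the keyword arguments."""
    def keep(coords):
        return all(
            coords[DIMENSIONS.index(dim)] in values
            for dim, values in allowed.items()
        )
    return {coords: value for coords, value in cells.items() if keep(coords)}


# Slice on one dimension, then on two dimensions at once.
north_only = slice_cube(cube, region={"North"})
north_q1 = slice_cube(cube, region={"North"}, quarter={"Q1"})

print(sum(north_only.values()), sum(north_q1.values()))
```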


OLAP data may be stored in different ways

(Kifer §17.4; Silberschatz §22.2.2)

• Internal proprietary database (often a multidimensional database, MDD).
• Access external databases (data warehouses):
  – relational (ROLAP)
  – multidimensional (MOLAP)
  – both (hybrid OLAP, HOLAP)


Some OLAP products

• Oracle Business Intelligence and Hyperion BI+.

• IBM Cognos Business Intelligence.
• Business Objects & Crystal Reports.
• JasperSoft Business Intelligence Suite (http://www.jasperforge.org/).


Oracle10g SQL has additional OLAP support

(see also Kifer §17.3.2 and Silberschatz §22.2.3)

• GROUP BY CUBE (<columns>).
• GROUP BY ROLLUP (<columns>).
  – These enable you to build data cubes and aggregated rollups straight out of Oracle.
• GROUPING SETS (different from SQL:1999’s GROUPING function).
• No RANK or PARTITION BY (SQL:1999).
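To show which groupings CUBE and ROLLUP produce, the sketch below emulates them in plain Python rather than running them in Oracle. The function names, the sample rows and the use of None to stand for the NULL that SQL reports in a rolled-up column are all assumptions for illustration. CUBE aggregates over every subset of the listed columns; ROLLUP aggregates over each prefix of the list.

```python
from itertools import combinations

# Roughly what Oracle would compute for, e.g.:
#   SELECT region, product, SUM(units) FROM sales GROUP BY CUBE (region, product)
# (the sales table and its columns are hypothetical).


def cube_groupings(columns):
    """Every subset of the columns, as GROUP BY CUBE would group by."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def rollup_groupings(columns):
    """Every prefix of the columns, as GROUP BY ROLLUP would group by."""
    return [tuple(columns[:size]) for size in range(len(columns), -1, -1)]


def aggregate(rows, columns, measure, groupings):
    """Sum the measure for each grouping; rolled-up columns appear as None."""
    results = {}
    for grouping in groupings:
        for row in rows:
            key = tuple(row[c] if c in grouping else None for c in columns)
            results[key] = results.get(key, 0) + row[measure]
    return results


rows = [
    {"region": "North", "product": "hammer", "units": 10},
    {"region": "North", "product": "twine", "units": 5},
    {"region": "South", "product": "hammer", "units": 4},
]
cols = ("region", "product")

cube_result = aggregate(rows, cols, "units", cube_groupings(cols))
rollup_result = aggregate(rows, cols, "units", rollup_groupings(cols))

print(cube_result[("North", None)], cube_result[(None, "hammer")], cube_result[(None, None)])
```

The per-product totals such as (None, "hammer") appear only in the cube result, because ("product",) is a subset of the columns but not a prefix of them.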


Data mining may find hidden trends

(Kifer §17.7; Silberschatz §22.3)

• OLAP & data warehousing let us identify trends and relationships.

BUT: some relationships too complex or subtle to easily notice.

• Data mining tools claim to sift through databases and find unrecognised relationships and trends.


There are many data mining techniques

(Kifer §17.8–17.11; Silberschatz §22.3)

• Neural networks.
• Complex visualisation.
• Genetic algorithms (evolve a solution).
• Advanced statistical analysis (traditional).


Data mining examples

• Fraud detection (phone, credit card).
• MCI’s statistical profiles.
• Risk assessment for car insurance (FIG).
• NBA strategy analysis.
• But it’s not foolproof…


Summary of Chapter 4

• Decision support
  – decision support vs. operational
  – decision support systems
• Data warehouses
  – characteristics
  – logical & physical design
• Online analytical processing (OLAP)
• Data mining
