Class Agenda: 03/26 – 4/02

Page 1: Class Agenda:  03/26 – 4/02

Class Agenda: 03/26 – 4/02

  • Review Test 1
  • Discuss Analytical Reports (essay questions and class reports)
  • Contrast transactional vs. data warehouse design
  • Discuss process of data warehouse design
  • Contrast different approaches to data warehouse design
  • Design a data warehouse

Page 2: Class Agenda:  03/26 – 4/02

Written Analysis

An analysis requires:
  • Thesis statement;
  • Key points that are used to analyze the thesis statement:
      • Facts or fact-like data (examples, cases, logical reasoning);
  • Interpretation of the key points; and
  • Recommendations.

Page 3: Class Agenda:  03/26 – 4/02

What is a thesis statement?

A one- or two-sentence presentation of the analysis.

A strong thesis statement:
  • Takes a stand.
  • Justifies why there is a controversy and presents the key argument.
  • Expresses one main idea.
  • Is specific.

Page 4: Class Agenda:  03/26 – 4/02

Strong and weak thesis statements

Weak
  • There are benefits and drawbacks to implementing business intelligence in XYZ organization.
  • This paper analyzes the need for initial and ongoing governance of business intelligence within XYZ.
  • Organizations such as XYZ should enhance their customer relationship management by using a business intelligence system.

Strong
  • Implementing an enterprise data warehouse may be costly and time-consuming, but XYZ needs integrated data and business intelligence analytical tools to improve market share.
  • While it may be difficult to gain agreement on the data composition of a data warehouse, that initial implementation effort pales in comparison to the long-term support required to maintain a comprehensive business intelligence system. This paper analyzes how XYZ supports long-term use of their BI system with an effective governance structure.

Page 5: Class Agenda:  03/26 – 4/02

Question #1: How does big data differ from not-so-big-data? Is an organization that is competing on analytics using big data? Is big data required for an organization to be using business intelligence effectively?

Thesis statement possibilities:

1) Organizations use data to support decision making at differing levels; some organizations use complex algorithms that require sophisticated data sets while others are content to view simple reports. An organization may make data-driven decisions without incorporating the use of big data required to support complex statistical algorithms.

2) The concept of “big data” is in its infancy and its definition is based on the context of its use. What one organization might regard as “big data” another would view as “regular old data.” Both organizations could use their relative data sets to support effective decision making.

3) There is a clear difference between organizations that are using “big data” and those that are simply making decisions supported by data sets. Both types of organizations could be using business intelligence systems. However, organizations with big data not only use a large volume of data, but also incorporate both internal and external data, have data sets that change quickly, and gather data from many diverse data sources.

What is the essence of these questions?

How do we differentiate the terms of our field: big data, business intelligence, competing on analytics?

Page 6: Class Agenda:  03/26 – 4/02

Goals for Transaction Database Design

  • Make required data available to support business processes.
  • Protect the integrity of the data.
      • Reduce data redundancy.
      • Prevent data anomalies.
  • Provide for change.
      • Prevent inflexible data structures.
      • Anticipate changes.

Page 7: Class Agenda:  03/26 – 4/02

How do we achieve those goals?

  • Effective systems analysis and design techniques.
  • Relational DBMS.
  • Normalization.
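
To make the link between normalization and the integrity goals concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `placement` tables, their columns, and the sample rows are hypothetical and only loosely modeled on a temp agency; they are not taken from the course material. The sketch shows two things: a customer's city is stored once, so one update is seen by every related row (no update anomaly), and a foreign key rejects a row that refers to a customer who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical normalized tables: each customer attribute lives in one row.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE placement (
    placement_id INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    hours        REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Accounting', 'Dayton')")
conn.executemany(
    "INSERT INTO placement VALUES (?, ?, ?)",
    [(10, 1, 8.0), (11, 1, 6.5)],
)

# One UPDATE changes the city for every placement, because the city is
# not repeated on the placement rows.
conn.execute("UPDATE customer SET city = 'Columbus' WHERE customer_id = 1")
cities = {
    row[0]
    for row in conn.execute(
        "SELECT c.city FROM placement AS p "
        "JOIN customer AS c ON c.customer_id = p.customer_id"
    )
}


def insert_orphan_placement(connection):
    """Try to insert a placement for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO placement VALUES (12, 99, 4.0)")
    except sqlite3.IntegrityError:
        return False  # rejected by the foreign key
    return True


orphan_accepted = insert_orphan_placement(conn)
print(cities, orphan_accepted)
```

If the city were instead copied onto every placement row, the same change would need one update per row, and missing one would leave the data inconsistent.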

Page 8: Class Agenda:  03/26 – 4/02

Goals for data warehouse design

  • Make complete and accurate information easily accessible.
  • Present information consistently.
  • Be adaptive and flexible to change.
  • Provide reasonable and expected performance for information to support decision making.
  • Protect/secure information.

Page 9: Class Agenda:  03/26 – 4/02

How do we achieve those goals?

  • More effective systems analysis and design techniques.
  • Knowledge of required decision support systems.
  • Appropriate DBMS.
  • Appropriate use (or non-use) of normalization.

Page 10: Class Agenda:  03/26 – 4/02

Three different data models

Transaction (operational) data model: Contains current data required by separate and/or integrated operational systems. Supports the transactional processing of the organization. Is frequently used to support day-to-day decision making. 3rd normal form.

Reconciled (enterprise data warehouse) data model: Contains detailed, current data intended to be the single, authoritative source for all decision support applications. Usually in 3rd normal form.

Derived (data mart) data model: Contains data that are selected, formatted and aggregated for end-user decision support applications. Star or snowflake schema. Probably not normalized.

Page 11: Class Agenda:  03/26 – 4/02

Reconciled and Derived Data Models

Reconciled (EDW)
  • Independent of specific decisions
  • Centralized control; usually owned by IT
  • Historical
  • Not usually summarized
  • Normalized
  • Flexible
  • Many data sources
  • Long life
  • Starts large, becomes larger

Derived (Data Mart)
  • Specific decisions
  • One central subject
  • Usually accessed directly by users; usually decentralized into user area
  • Closely defined subject area
  • Detailed and/or summarized
  • Usually denormalized
  • Restrictive – few sources
  • Short life span
  • Starts small, becomes large

Page 12: Class Agenda:  03/26 – 4/02

Two general approaches to design

Enterprise Data Warehouse (Bill Inmon)
  • Focus is on enterprise subjects that will be needed to support comprehensive decision making.
  • Emphasis on creating design that is consistent among subject areas.
  • Implementation is of a data mart.
  • Uses ERD for modeling.
  • Relies on comprehensive blueprint for interrelation of data.

Interrelated Data Marts (Ralph Kimball)
  • Focus is on business subject area for data warehouse.
  • Emphasis on creating simple design that can be implemented quickly.
  • Implementation is of a data mart.
  • Uses “dimensional model” for modeling. Kind of like an ERD with UML-type aspects.
  • Relies on consistent interrelation of data by integration of existing data models.

Page 13: Class Agenda:  03/26 – 4/02

Compare/Contrast Approaches

Similarities:
  • Both focus on subject areas for development of data model.
  • Both require extensive input from data warehouse stakeholders.
  • Both produce a subject-oriented, non-volatile, time-related data warehouse.
  • Both try to quickly implement a prototype data mart.

Differences:
  • Inmon creates a more integrated and consistent data warehouse by attempting to design an enterprise-wide warehouse at the beginning of the first data warehouse project. This is called a “reconciled” DW design.
  • Kimball relies on future project teams referencing existing data warehouse models for new projects.

Page 14: Class Agenda:  03/26 – 4/02

What do both approaches yield?

  • A design for a data mart.
  • The design for a data mart is based on the concept of a data warehouse “cube.”
  • A cube is a logical construct containing a “fact” table that is accessed on multiple “dimension” tables.
  • A fact table contains values that a manager uses to make decisions.
  • A dimension table is used as a reference for the values in the fact table.
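
The fact/dimension idea above can be sketched as a tiny star schema, again with Python's built-in `sqlite3` module. The table names (`fact_placement`, `dim_region`, `dim_job_category`), the measures, and the sample rows are all assumptions made for illustration; the course material does not prescribe them. The fact table holds the numbers a manager would look at, and each dimension table gives one way to slice those numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table referencing two dimension tables.
conn.executescript("""
CREATE TABLE dim_region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL
);
CREATE TABLE dim_job_category (
    job_category_id INTEGER PRIMARY KEY,
    category_name   TEXT NOT NULL
);
CREATE TABLE fact_placement (
    region_id       INTEGER NOT NULL REFERENCES dim_region(region_id),
    job_category_id INTEGER NOT NULL
        REFERENCES dim_job_category(job_category_id),
    hours_billed    REAL NOT NULL,  -- measure
    revenue         REAL NOT NULL   -- measure
);
""")

conn.executemany("INSERT INTO dim_region VALUES (?, ?)",
                 [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO dim_job_category VALUES (?, ?)",
                 [(1, "Bookkeeper"), (2, "Auditor")])
conn.executemany(
    "INSERT INTO fact_placement VALUES (?, ?, ?, ?)",
    [
        (1, 1, 40.0, 1200.0),
        (1, 2, 20.0, 1000.0),
        (2, 1, 30.0, 900.0),
        (1, 1, 10.0, 300.0),
    ],
)


def revenue_by_region(connection):
    """Total the revenue measure along the region dimension."""
    return connection.execute("""
        SELECT r.region_name, SUM(f.revenue)
        FROM fact_placement AS f
        JOIN dim_region AS r ON r.region_id = f.region_id
        GROUP BY r.region_name
        ORDER BY r.region_name
    """).fetchall()


def revenue_by_category(connection):
    """Total the same measure along the job-category dimension."""
    return connection.execute("""
        SELECT c.category_name, SUM(f.revenue)
        FROM fact_placement AS f
        JOIN dim_job_category AS c
          ON c.job_category_id = f.job_category_id
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


print(revenue_by_region(conn))
print(revenue_by_category(conn))
```

Both queries read the same fact rows; only the dimension used for grouping changes, which is the sense in which the cube is viewed from different sides.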

Page 15: Class Agenda:  03/26 – 4/02

Steps of data warehouse design

1. Identify the stakeholders that need data to support their decisions.
2. Define and describe the data needs of those stakeholders.
3. Define the subject area.
4. Choose (EDW and data mart) or just data mart.
5. Select the data of interest.
     • May be internal, external.
     • May be purchased.
     • May be stored in a transaction database – may not.
     • May be generated just for the data warehouse.
6. Add element of time.
7. Add derived data.
8. Determine granularity level.
9. Summarize data.
10. Identify and attempt to solve potential performance issues.

Page 16: Class Agenda:  03/26 – 4/02

How do you identify those people within an organization who require data to support their decision making processes?

Page 17: Class Agenda:  03/26 – 4/02

Define and describe the data needs

  • Usually termed “stakeholder analysis”.
  • Differing levels of decision making require differing sets of data.
      • Internal vs. external data.
      • Integrated vs. non-integrated data.
      • Detailed vs. summarized data.
  • Different stakeholders require different access mechanisms.
      • Online vs. reports.
      • Pre-formatted vs. ad-hoc availability of data.
  • Different stakeholders require different timing.
      • Online, real time vs. delay.
      • Relative size of delay/timeliness is always an issue.

Page 18: Class Agenda:  03/26 – 4/02

Stakeholder Analysis Table Example – TEC

| Stakeholder | Decision Making Responsibilities | Existing Information? | Additional Information? | Availability of Additional Information? |
| --- | --- | --- | --- | --- |
| Marketing and Sales Manager | Identify new markets for TEC services. Identify new geographic regions. | Current markets by region, by job category, by type of work. | Competitors. Size of accounting temp market. Regional unemployment. | Not in existing system and cannot be compiled manually. Maybe web survey? |
| Placement Manager | Determine best applicant for a given position. | Job requirements. Applicant qualifications. Customer satisfaction with past workers. | Employee satisfaction. Anticipated customer needs. | Not in existing system. Will need new surveys and potential survey groups. |
| Finance Analyst | Evaluate relative profitability by worker and customer. | Hourly pay rate, hourly timecards. | More refined categories of customer and worker. | Not available in current systems. |
| Recruiter | Select best advertising outlets for workers. Find best workers. Choose best potential workers. | None. | Customer satisfaction by worker background (education, skills, etc.). Advertising tracking? | Not available in current systems. |

Page 19: Class Agenda:  03/26 – 4/02

Define the subject area

Potential subject areas in common to many businesses:
  • Customers: People and organizations who acquire and/or use the company’s products.
  • Equipment: Machinery, devices, tools and their components.
  • Facilities: Real estate and their components.
  • Sales: Transactions that move a product from company to a customer.
  • Suppliers: Entities that provide a company with goods and services.
  • Products: Goods and services that the company, or its competitors, provide to customers.
  • Materials: Goods and services that the company uses to produce its products.
  • Financials: Information about money that is received, retained, expended, invested or in any way tracked by the company.
  • Human resources: Individuals who perform work for the company – may be employees, contracts, or simply positions.

Page 20: Class Agenda:  03/26 – 4/02

Select the data of interest

  • Use the existing transaction database model.
  • Identify and understand the necessary business decisions.
  • Identify external data that could help support decisions.
  • Use tables to help sort available attributes.
  • Look at appendix 1 of the TEC exercise.

Page 21: Class Agenda:  03/26 – 4/02

Add element of time

  • Data warehouse is a historical model rather than a current “point in time” model.
  • Must have a way to incorporate changes that occur over time.
  • Important issues:
      • Fact table must include a time component.
      • Ranges of time vs. effective period in time.
      • Time also relates to dimension tables.
      • May have to deal with differing time periods. Examples are fiscal years, “holiday rush,” billing cycle, etc.
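
One common way to handle time in a dimension is to keep several dated versions of a row and pick the version that was in effect on the date of each fact. The sketch below illustrates that idea in plain Python. The `WorkerVersion` class, its fields, the sample pay rates, and the July fiscal-year start are all assumptions chosen for illustration, not details from the TEC exercise.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class WorkerVersion:
    """One dated version of a hypothetical worker dimension row."""
    worker_id: int
    pay_rate: float
    effective_from: date          # inclusive
    effective_to: Optional[date]  # exclusive; None means still current


def version_on(versions, worker_id, as_of):
    """Return the version of the worker in effect on `as_of`, or None."""
    for v in versions:
        if v.worker_id != worker_id:
            continue
        still_open = v.effective_to is None or as_of < v.effective_to
        if v.effective_from <= as_of and still_open:
            return v
    return None


def fiscal_year(day, start_month=7):
    """Label a date with a fiscal year.

    Assumes the fiscal year starts on the first day of `start_month`
    and is named after the calendar year in which it ends.
    """
    if start_month > 1 and day.month >= start_month:
        return day.year + 1
    return day.year


# The worker's pay rate changed on 2024-01-01; both versions are kept.
history = [
    WorkerVersion(7, 18.0, date(2023, 1, 1), date(2024, 1, 1)),
    WorkerVersion(7, 20.0, date(2024, 1, 1), None),
]

print(version_on(history, 7, date(2023, 6, 15)).pay_rate)
print(fiscal_year(date(2024, 8, 1)))
```

A fact dated in 2023 is matched with the 2023 pay rate even after the rate changes, so historical reports stay stable. The `fiscal_year` helper shows how the same calendar date can fall into a different reporting period depending on the calendar in use.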

Page 22: Class Agenda:  03/26 – 4/02

Add derived data

  • Derived data includes any kind of calculated field.
  • Examples: total sales; net sales amount; total funds raised; total cost of products.
  • Issues:
      • Must be identified, defined and agreed upon by data warehouse stakeholders.
      • Must be documented in metadata.
      • Must be consistent.
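
The three issues above suggest keeping each derived field in exactly one place, next to its documented definition. The following is a minimal sketch of that idea in Python. The `SaleLine` fields and the particular formula for net sales (gross minus discount minus returns) are assumptions for illustration only; the actual definition is whatever the stakeholders agree on.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SaleLine:
    """A hypothetical detail row from which derived fields are computed."""
    quantity: int
    unit_price: Decimal
    discount: Decimal         # an amount, not a percentage
    returned_amount: Decimal


# Metadata: the agreed definition of each derived field, kept with the code.
DERIVED_FIELD_DEFINITIONS = {
    "gross_sales": "quantity * unit_price",
    "net_sales": "gross_sales - discount - returned_amount",
}


def gross_sales(line):
    return line.quantity * line.unit_price


def net_sales(line):
    # Every report calls this one function, so the figure is consistent.
    return gross_sales(line) - line.discount - line.returned_amount


def total_net_sales(lines):
    return sum((net_sales(line) for line in lines), Decimal("0"))


lines = [
    SaleLine(3, Decimal("10.00"), Decimal("5.00"), Decimal("0.00")),
    SaleLine(1, Decimal("20.00"), Decimal("0.00"), Decimal("20.00")),
]
print(total_net_sales(lines))
```

`Decimal` is used instead of `float` so that money amounts add up exactly; that is a design choice of this sketch rather than a requirement stated in the slides.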

Page 23: Class Agenda:  03/26 – 4/02

Determine granularity level

What are the benefits and drawbacks of a low level of granularity?

What are the benefits and drawbacks of a high level of granularity?

What factors should be considered when determining the level of granularity in the data warehouse?

Page 24: Class Agenda:  03/26 – 4/02

Summarize (aggregate) data

  • What is summarized data?
  • How is data summarized?
  • Does summarized data save disk space?
  • Why summarize data?
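
As a starting point for discussing these questions, here is a small sketch of one way data can be summarized: rolling daily rows up to one row per month and region. The `daily_sales` rows and the choice of month and region as the summary level are made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows at daily granularity: (day, region, amount).
daily_sales = [
    (date(2024, 1, 5), "North", 100),
    (date(2024, 1, 20), "North", 150),
    (date(2024, 1, 20), "South", 80),
    (date(2024, 2, 3), "North", 200),
    (date(2024, 2, 3), "South", 50),
    (date(2024, 2, 17), "South", 70),
]


def summarize_by_month(rows):
    """Aggregate daily amounts into one total per (year, month, region)."""
    totals = defaultdict(int)
    for day, region, amount in rows:
        totals[(day.year, day.month, region)] += amount
    return dict(totals)


monthly_sales = summarize_by_month(daily_sales)

print(len(daily_sales), "detail rows ->", len(monthly_sales), "summary rows")
for key in sorted(monthly_sales):
    print(key, monthly_sales[key])
```

In this sample the summary has fewer rows than the detail and the grand total is unchanged, but the individual daily amounts cannot be recovered from the monthly totals. That trade-off is the same one raised by the granularity questions.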