www.company.com Lab3 CPIT 440 Data Mining and Warehouse


Lab3

CPIT 440 Data Mining and Warehouse


Lab3: Outline
• Introduction to Data Warehouse
– What is a Data Warehouse?
– Difference between Data Warehouse and Database
• Introduction to OLAP operations
– Introduction to cubes
– Cube structure
– OLAP operations
• Exercises


Data Warehouse
• What is a Data Warehouse?

– A data warehouse is a repository of an organization's stored data, designed for query, reporting, and analysis rather than for transaction processing.

– It usually contains historical data derived from transaction data, but it can include data from other sources.

– It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.


Difference between Data Warehouse and Database

• A question we are often asked in the field is: I already have a database, so why do I need a data warehouse? What is the difference between a database and a data warehouse?

Database:
– It is designed to handle transactions.
– It isn't designed to do analytics well.

Data Warehouse:
– It is structured to make analytics fast and easy.
– It exists as a layer on top of another database or databases; it takes the data from all of these databases and creates a layer optimized for and dedicated to analytics.


Introduction to OLAP Operations
• Introduction to cubes:

– A cube is a set of data that is usually constructed from a subset of a data warehouse and is organized and summarized into a multidimensional structure defined by a set of dimensions and measures.

– Cubes are the main objects in online analytical processing (OLAP), a technology that provides fast access to data in a data warehouse.
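To make the idea of a cube concrete, here is a minimal sketch in plain Python. The sales records, the dimension names (year, region, product), and the measures (count, total) are all invented for illustration; a real OLAP server stores and indexes cubes very differently.

```python
from collections import defaultdict

# Hypothetical detail records: (year, region, product, amount).
records = [
    (2023, "East", "Laptop", 1200),
    (2023, "East", "Phone", 800),
    (2023, "West", "Laptop", 1500),
    (2024, "East", "Laptop", 1100),
    (2024, "West", "Phone", 900),
]


def build_cube(rows):
    """Summarize rows into cells keyed by the dimensions (year, region, product).

    Each cell holds the two measures: a record count and a total amount.
    """
    cube = defaultdict(lambda: {"count": 0, "total": 0})
    for year, region, product, amount in rows:
        cell = cube[(year, region, product)]
        cell["count"] += 1
        cell["total"] += amount
    return dict(cube)


cube = build_cube(records)
print(cube[(2023, "East", "Laptop")])
```

Each key of `cube` is one cell of the multidimensional structure, and each value carries the measures for that cell.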


Introduction to OLAP Operations
• Cube Structure:

– Every cube has a schema, which is the set of joined tables in the data warehouse from which the cube draws its source data.

– The central table in the schema is the fact table, the source of the cube's measures.

– The other tables are dimension tables, the sources of the cube's dimensions.


Introduction to OLAP Operations
• Cube Structure
– A cube's structure is defined by its measures and dimensions.
– They are derived from tables in the cube's data source.
– The set of tables from which a cube's measures and dimensions are derived is called the cube's schema.
– Every cube schema consists of a single fact table and one or more dimension tables.
– The cube's measures are derived from columns in the fact table.
– The cube's dimensions are derived from columns in the dimension tables.
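The relationship between a fact table and its dimension tables can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_store), columns, and rows below are hypothetical; the point is only that measures (units, revenue) come from the fact table while the grouping attributes come from the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
-- The fact table holds foreign keys to each dimension plus the measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobile');
INSERT INTO dim_store   VALUES (1, 'Jeddah'), (2, 'Riyadh');
INSERT INTO fact_sales  VALUES (1, 1, 2, 2400.0), (2, 1, 3, 1800.0), (1, 2, 1, 1200.0);
""")

# Measures are aggregated from fact_sales; dimensions supply the group-by columns.
rows = conn.execute("""
SELECT p.category, s.city, SUM(f.units), SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_store   AS s ON s.store_id = f.store_id
GROUP BY p.category, s.city
ORDER BY p.category, s.city
""").fetchall()

for row in rows:
    print(row)
```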


Introduction to OLAP Operations
• Cube Structure
– Star schema: a fact table in the middle connected to a set of dimension tables.
– Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
– Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is also called a galaxy schema.
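The three schema classes differ mainly in how dimension data is laid out. The sketch below uses plain dictionaries as stand-ins for tables; every table name and value is invented for illustration.

```python
# Star schema: one denormalized product dimension table.
star_product = {
    1: {"name": "Laptop", "category": "Computers", "department": "Electronics"},
    2: {"name": "Phone", "category": "Mobile", "department": "Electronics"},
}

# Snowflake schema: the category hierarchy is normalized into its own table.
snow_product = {
    1: {"name": "Laptop", "category_id": 10},
    2: {"name": "Phone", "category_id": 20},
}
snow_category = {
    10: {"category": "Computers", "department": "Electronics"},
    20: {"category": "Mobile", "department": "Electronics"},
}

# Fact constellation: two fact tables share the same product dimension.
fact_sales = [{"product_id": 1, "revenue": 2400.0}]
fact_shipping = [{"product_id": 1, "shipping_cost": 35.0}]


def department_star(product_id):
    """One lookup: the attribute sits directly in the dimension table."""
    return star_product[product_id]["department"]


def department_snowflake(product_id):
    """Two lookups: follow the key from product to category."""
    category_id = snow_product[product_id]["category_id"]
    return snow_category[category_id]["department"]


print(department_star(1), department_snowflake(1))
```

Both functions return the same answer. The snowflake form stores each category once at the cost of an extra join, which is the usual trade-off between the two layouts.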


Introduction to OLAP Operations

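• OLAP Operations:
– Roll up: summarize data by climbing up a dimension hierarchy or by dimension reduction
– Roll down (drill down): reverse of roll-up
  • Go from summarized data to more detailed data, or introduce new dimensions
– Slice and dice: select a subcube by fixing one dimension to a single value (slice) or by constraining two or more dimensions (dice)
– Pivot (rotate): reorient the axes of the cube to view the data from a different perspective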
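The operations listed above can be sketched on a small dictionary-based cube. The cell values, the dimension names in DIMS, and the helper names (roll_up, slice_, dice, pivot) are assumptions made for this example, not the API of any OLAP product.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, quarter, city, product) -> sales amount.
cells = {
    (2023, "Q1", "Jeddah", "Laptop"): 10,
    (2023, "Q2", "Jeddah", "Laptop"): 20,
    (2023, "Q1", "Riyadh", "Phone"): 5,
    (2024, "Q1", "Jeddah", "Phone"): 7,
    (2024, "Q1", "Riyadh", "Laptop"): 8,
}
DIMS = ("year", "quarter", "city", "product")


def roll_up(cube, dims, keep):
    """Sum away every dimension that is not listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cube, dims, dim, value):
    """Fix a single dimension to one value."""
    i = dims.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, dims, **allowed):
    """Keep cells whose value lies in the allowed set for each named dimension."""
    checks = [(dims.index(d), set(vals)) for d, vals in allowed.items()]
    return {k: v for k, v in cube.items() if all(k[i] in vals for i, vals in checks)}


def pivot(table):
    """Swap the two axes of a 2-D view keyed by (row, column)."""
    return {(col, row): v for (row, col), v in table.items()}


by_year_city = roll_up(cells, DIMS, ("year", "city"))
# Drill-down is the reverse: re-aggregate the detailed cells with more dimensions kept.
by_year_quarter_city = roll_up(cells, DIMS, ("year", "quarter", "city"))
only_2023 = slice_(cells, DIMS, "year", 2023)
jeddah_laptops = dice(cells, DIMS, city=["Jeddah"], product=["Laptop"])
by_city_year = pivot(by_year_city)
```

Note that drill-down needs the more detailed data to still be available; it cannot be recovered from the rolled-up result alone.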

[Figure: Roll up and Roll down]

[Figure: Slice and Dice]

[Figure: Pivot (Rotate)]

Exercise 1
• Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit.


Exercise 1
(a) Enumerate three classes of schemas that are popularly used for modeling data warehouses.

Three classes of schemas popularly used for modeling data warehouses are:
• The star schema
• The snowflake schema
• The fact constellation schema


Exercise 1
(b) Draw a schema diagram for the above data warehouse using one of the schema classes listed in part (a).


Exercise 1
(c) Starting with the base cuboid [day; doctor; patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2004?

The operations to be performed are:
• Roll-up on time from day to year.
• Slice for time = 2004.
• Roll-up on patient from individual patient to all.
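The three operations can be traced step by step on a tiny base cuboid. The visits, doctor names, and patient IDs below are made up for illustration; only the sequence of operations comes from the exercise.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base cuboid cells: (day, doctor, patient) -> (count, charge).
base = {
    (date(2004, 1, 5), "Dr. A", "P1"): (1, 100.0),
    (date(2004, 3, 9), "Dr. A", "P2"): (1, 150.0),
    (date(2004, 3, 9), "Dr. B", "P1"): (1, 80.0),
    (date(2005, 2, 1), "Dr. A", "P1"): (1, 120.0),
}

# Step 1: roll up on time from day to year.
by_year = defaultdict(float)
for (day, doctor, patient), (count, charge) in base.items():
    by_year[(day.year, doctor, patient)] += charge

# Step 2: slice for time = 2004.
year_2004 = {key: charge for key, charge in by_year.items() if key[0] == 2004}

# Step 3: roll up on patient from individual patient to all.
totals = defaultdict(float)
for (year, doctor, patient), charge in year_2004.items():
    totals[doctor] += charge
fee_by_doctor = dict(totals)

print(fee_by_doctor)
```

With this sample data the 2005 visit is excluded by the slice, so only the three 2004 visits contribute to the per-doctor totals.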


Exercise 2
• Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures, count and avg. grade.

• When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg. grade measure stores the actual course grade of the student.

• At higher conceptual levels, avg. grade stores the average grade for the given combination.


Exercise 2
(a) Draw a snowflake schema diagram for the data warehouse.


Exercise 2
(b) Starting with the base cuboid [student; course; semester; instructor], what specific OLAP operations should be performed in order to list the average grade of CS courses for each Big University student?

The specific OLAP operations to be performed are:
• Roll-up on course from course id to department.
• Roll-up on student from student id to university.
• Dice on course, student with department = "CS" and university = "Big University".
• Drill-down on student from university to student name.
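The net effect of these operations can be sketched in one pass over a small base cuboid. The students, courses, grades, and the two hierarchy mappings are hypothetical. The sketch also aggregates over semester and instructor, which the listed steps leave implicit, and it computes the average from a running sum and count rather than by averaging averages.

```python
from collections import defaultdict

# Hypothetical base cuboid: (student, course, semester, instructor) -> grade.
base = {
    ("Ali", "CS101", "2004-Fall", "Dr. X"): 90.0,
    ("Ali", "CS202", "2005-Spring", "Dr. Y"): 80.0,
    ("Ali", "MATH101", "2004-Fall", "Dr. Z"): 70.0,
    ("Sara", "CS101", "2004-Fall", "Dr. X"): 100.0,
}

# Hypothetical dimension hierarchies: course -> department, student -> university.
course_department = {"CS101": "CS", "CS202": "CS", "MATH101": "MATH"}
student_university = {"Ali": "Big University", "Sara": "Big University"}


def average_cs_grade_per_student(cube):
    """Dice on department = CS and university = Big University, then average per student."""
    sums = defaultdict(lambda: [0.0, 0])  # student -> [grade sum, grade count]
    for (student, course, semester, instructor), grade in cube.items():
        if course_department[course] != "CS":
            continue
        if student_university[student] != "Big University":
            continue
        acc = sums[student]
        acc[0] += grade
        acc[1] += 1
    return {student: total / n for student, (total, n) in sums.items()}


averages = average_cs_grade_per_student(base)
print(averages)
```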


Exercise 3
• Suppose that a data warehouse consists of the four dimensions date, spectator, location, and game, and the two measures count and charge, where charge is the fee that a spectator pays when watching a game on a given date.

• Spectators may be students, adults, or seniors, with each category having its own charge rate.


Exercise 3
(a) Draw a star schema diagram for the data warehouse.


Exercise 3
(b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should be performed in order to list the total charge paid by student spectators at GM Place in 2004?

The specific OLAP operations to be performed are:
• Roll-up on date from date id to year.
• Roll-up on spectator from spectator id to status.
• Roll-up on location from location id to location name.
• Roll-up on game from game id to all.
• Dice with status = "students", location name = "GM Place", and year = 2004.
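The roll-ups followed by the dice can be sketched as below. The ticket records, the spectator and location IDs, and the second venue name are invented for illustration; only the sequence of operations comes from the exercise.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base cuboid: (date, spectator id, location id, game id) -> charge.
base = {
    (date(2004, 5, 1), "S1", "L1", "G1"): 10.0,
    (date(2004, 6, 2), "S2", "L1", "G2"): 25.0,
    (date(2004, 6, 2), "S3", "L2", "G2"): 10.0,
    (date(2005, 1, 3), "S1", "L1", "G3"): 12.0,
    (date(2004, 7, 4), "S3", "L1", "G1"): 10.0,
}

# Hypothetical dimension hierarchies.
spectator_status = {"S1": "students", "S2": "adults", "S3": "students"}
location_name = {"L1": "GM Place", "L2": "Other Arena"}

# Roll up date to year, spectator to status, location to name, and game to all.
rolled = defaultdict(float)
for (day, spectator, location, game), charge in base.items():
    key = (day.year, spectator_status[spectator], location_name[location])
    rolled[key] += charge

# Dice with status = "students", location name = "GM Place", and year = 2004.
diced = {
    key: charge
    for key, charge in rolled.items()
    if key == (2004, "students", "GM Place")
}
total_charge = diced.get((2004, "students", "GM Place"), 0.0)

print(total_charge)
```

Because the game dimension is rolled up to all, it simply disappears from the key, and the dice leaves a single cell holding the requested total.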