Dr. Abdul Basit Siddiqui Assistant Professor FURC (Lecture Slides Week # 2)

Dwh lecture slides-week3&4


Why a Data Warehouse (DWH)?

- Data recording and storage is growing:
  - Almost every industry has a huge amount of operational data.
- Careful analysis of historical information may result in excellent predictions for the future:
  - Knowledge workers want to turn available data into useful information.
  - They use this information to support strategic decision making.
- Gives a total view of the organization:
  - It is a platform for consolidated historical data for analysis.
  - It stores data of good quality so that knowledge workers can make correct decisions.
- Intelligent decision support is required for decision making.

Data Warehouse & Mining - Spring 2014, 04/15/23

Why a Data Warehouse? (Contd.)

- From a business perspective:
  - It is the latest marketing weapon.
  - Helps to keep customers by learning more about their needs.
  - A valuable tool in today’s competitive, fast-evolving world.


Reason-I: Why a Data Warehouse (DWH)?

Data sets are growing:

How much data is that?

Size    In bytes                      Roughly equivalent to
1 MB    2^20 or 10^6 bytes            Small novel; 3½-inch disk
1 GB    2^30 or 10^9 bytes            Paper reams that could fill the back of a pickup van
1 TB    2^40 or 10^12 bytes           50,000 trees chopped, converted into paper, and printed
2 PB    1 PB = 2^50 or 10^15 bytes    Academic research libraries across the USA
5 EB    1 EB = 2^60 or 10^18 bytes    All words ever spoken by human beings
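The two byte counts given for each unit (a power of 2 and a power of 10) are close but not equal. A minimal sketch of the arithmetic, where the `UNIT_EXPONENTS` table and the helper names are illustrative and not part of the lecture:

```python
# Exponents for each unit: (binary exponent, decimal exponent).
# The table and the function names are assumptions made for this sketch.
UNIT_EXPONENTS = {
    "MB": (20, 6),
    "GB": (30, 9),
    "TB": (40, 12),
    "PB": (50, 15),
    "EB": (60, 18),
}


def binary_bytes(unit):
    """Bytes in one unit using the power-of-2 convention."""
    return 2 ** UNIT_EXPONENTS[unit][0]


def decimal_bytes(unit):
    """Bytes in one unit using the power-of-10 convention."""
    return 10 ** UNIT_EXPONENTS[unit][1]


for unit in UNIT_EXPONENTS:
    ratio = binary_bytes(unit) / decimal_bytes(unit)
    print(f"1 {unit}: 2^n = {binary_bytes(unit)}, 10^n = {decimal_bytes(unit)}, ratio = {ratio:.3f}")
```

The gap between the two conventions widens as the unit grows, which is why a size is best quoted together with the convention in use.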


Reason-I: Why a Data Warehouse (DWH)?

- Size of data sets is going up.
- Cost of data storage is coming down.
- The amount of data the average business collects and stores is doubling every year.
- Total hardware and software cost to store and manage 1 MB of data:
  - 1990: $ 15
  - 2002: ¢ 15 (down 100 times)
  - 2010: < ¢ 1 (down 150 times)
- A few examples:
  - Wal-Mart: 24+ TB
  - France Telecom: 100+ TB
  - CERN: up to 20 PB by 2006
  - Stanford Linear Accelerator Center (SLAC): 500 TB
  - Telenor, Ufone, Mobilink, Warid, Zong: ???
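The two trends on this slide can be checked with simple arithmetic. The sketch below is an illustration only: the function names are made up, and the 24 TB starting point is borrowed from the example list to show what "doubling every year" implies, not to forecast any real system.

```python
def cost_drop_factor(old_cents, new_cents):
    """How many times cheaper storage became (both prices in cents per MB)."""
    return old_cents / new_cents


def size_after_years(initial_tb, years):
    """Data size after `years` of doubling once per year."""
    return initial_tb * 2 ** years


# 1990: $15 = 1500 cents per MB; 2002: 15 cents per MB.
print(cost_drop_factor(1500, 15))

# Hypothetical projection: 24 TB doubling every year for 3 years.
print(size_after_years(24, 3))
```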


Caution!

A Warehouse of Data is NOT a Data Warehouse.


Caution!

Size is NOT Everything.


Reason-2: Why a Data Warehouse (DWH)?

DBMS Approach:
- List of all items that were sold last month?
- List of all makeup items purchased by Sassi?
- The total sales of the last month grouped by branch?
- How many sales transactions occurred during the month of January?
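Questions of the DBMS kind map directly onto SQL over a single table. A minimal sketch using Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample rows are invented for illustration:

```python
import sqlite3

# Hypothetical operational table; schema and rows are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item TEXT, branch TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("lipstick", "Islamabad", "2014-01-05", 500.0),
        ("soap", "Islamabad", "2014-01-09", 120.0),
        ("lipstick", "Lahore", "2014-01-20", 450.0),
        ("tea", "Lahore", "2014-02-02", 300.0),
    ],
)

# "The total sales of the last month grouped by branch?" (January in this data)
total_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE sale_date BETWEEN '2014-01-01' AND '2014-01-31' "
    "GROUP BY branch ORDER BY branch"
).fetchall()

# "How many sales transactions occurred during the month of January?"
january_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE strftime('%m', sale_date) = '01'"
).fetchone()[0]

print(total_by_branch)
print(january_count)
```

Each question has a known shape in advance, so the query can be written once and rerun; that is what separates this group from the questions an intelligent enterprise asks.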

Intelligent Enterprise:
- Which items sell together?
- Which items to stock?
- Where and how to place the items?
- What discounts to offer?
- How best to target customers to increase sales at a branch?
- Which customers are most likely to respond to my next promotional campaign, and why?
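"Which items sell together?" is a different kind of question: it needs all transactions examined together rather than one lookup. One simple way to approach it, shown here as a sketch and not as the method the lecture prescribes, is to count how often each pair of items appears in the same basket. The baskets are made-up sample data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: each set is the items bought in one transaction.
baskets = [
    {"bread", "butter", "tea"},
    {"bread", "butter"},
    {"tea", "sugar"},
    {"bread", "butter", "sugar"},
]

# Count every unordered pair of items that occurs within the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, times_together = pair_counts.most_common(1)[0]
print(most_common_pair, times_together)
```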


Businesses demand Intelligence (BI): complex questions from integrated data, i.e. the “Intelligent Enterprise”.


Reason-3: Why a Data Warehouse (DWH)?

Businesses want much more …
- What happened?
- Why did it happen?
- What will happen?
- What is happening?
- What do you want to happen?


What is a Data Warehouse?

A complete repository of historical corporate data extracted from transaction systems that is available for ad-hoc access by knowledge workers.


What is a Data Warehouse?

Transaction System:
- Management Information System (MIS)
- Could be typed sheets (NOT transaction system)

Ad-Hoc Access:
- Does not have a certain access pattern
- Queries not known in advance
- Difficult to write SQL in advance

Knowledge Workers:
- Typically NOT IT literate (executives, analysts, managers)
- NOT clerical workers
- Decision makers


What is a Data Warehouse?

Inmon’s Definition: A Data Warehouse is a
- Subject-oriented
- Integrated
- Time-variant
- Nonvolatile
collection of data in support of management’s decision-making process.


Another View of a DWH

[Figure: the four properties of a DWH: Subject Oriented, Integrated, Time Variant, Non Volatile]

Subject-oriented

- Data Warehouse is organized around subjects such as sales, product, customer.
- It focuses on modeling and analysis of data for decision makers.
- Excludes data not useful in the decision support process.


Integration

Data Warehouse is constructed by integrating multiple heterogeneous sources.

Data preprocessing is applied to ensure consistency.

[Figure: RDBMS, legacy system, and flat file sources feed the Data Warehouse through data processing and data transformation]
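A common consistency problem during integration is that each source encodes the same attribute differently. The sketch below assumes three hypothetical sources that record gender as a word, a number, and a letter, and maps them to one warehouse encoding. The source names, fields, and mapping are illustrative assumptions.

```python
# Mapping from every source-specific encoding to the single warehouse code.
# The encodings listed here are assumptions made for this sketch.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def normalize_gender(value):
    """Convert any known source encoding to the warehouse code 'M' or 'F'."""
    return GENDER_CODES[str(value).strip().lower()]


# Hypothetical rows from three heterogeneous sources.
rdbms_rows = [{"cust_id": 1, "gender": "Male"}]
legacy_rows = [{"cust_id": 2, "gender": 0}]
flat_file_rows = [{"cust_id": 3, "gender": "f"}]

# Integrate all sources into one uniformly encoded collection.
warehouse = [
    {"cust_id": row["cust_id"], "gender": normalize_gender(row["gender"]), "source": source}
    for source, rows in [
        ("rdbms", rdbms_rows),
        ("legacy", legacy_rows),
        ("flat_file", flat_file_rows),
    ]
    for row in rows
]

for record in warehouse:
    print(record)
```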

Time-variant

Provides information from a historical perspective, e.g. the past 5-10 years.

Every key structure contains, either implicitly or explicitly, an element of time.
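One way to read "every key contains an element of time" is that a warehouse row is identified by a business key plus a date, so several dated versions of the same entity coexist. A minimal sketch with a made-up account-balance history; the data and the `balance_as_of` helper are assumptions for illustration:

```python
from datetime import date

# Key = (account_id, snapshot_date): the time element is explicit in the key.
balances = {
    (101, date(2013, 12, 31)): 5000,
    (101, date(2014, 1, 31)): 5600,
    (101, date(2014, 2, 28)): 4300,
}


def balance_as_of(account_id, as_of):
    """Return the most recent recorded balance on or before `as_of`, else None."""
    snapshot_dates = [
        snap for (acc, snap) in balances if acc == account_id and snap <= as_of
    ]
    if not snapshot_dates:
        return None
    return balances[(account_id, max(snapshot_dates))]


print(balance_as_of(101, date(2014, 2, 15)))
```

Because old snapshots are never overwritten, the same lookup works for any date in the retained history.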


Nonvolatile

- Data once recorded cannot be updated.
- Data Warehouse requires two operations in data accessing:
  - Initial loading of data
  - Access of data

[Figure: the two Data Warehouse operations, load and access]
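The nonvolatile property can be pictured as a store that exposes only the two operations named on the slide. The class below is a toy sketch with invented names; real warehouse products enforce this through load processes and permissions, not through a Python class.

```python
class NonvolatileStore:
    """Toy store offering only load and access; there is no update or delete."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append new rows; tuples are used so loaded rows cannot be modified."""
        self._rows.extend(tuple(row) for row in rows)

    def access(self, predicate=lambda row: True):
        """Return a new list of the rows that satisfy `predicate`."""
        return [row for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([(1, "account opened"), (2, "deposit")])
store.load([(3, "withdrawal")])

all_rows = store.access()
deposits = store.access(lambda row: row[1] == "deposit")
print(all_rows)
print(deposits)
```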

Summary: What is a Data Warehouse?

It is a blend of many technologies, the basic concept being:
- Take all data from different operational systems
- If necessary, add relevant data from industry
- Transform all data and bring into a uniform format
- Integrate all data as a single entity
- Store data in a format supporting easy access for decision support
- Create performance enhancing indices
- Implement performance enhancement joins
- Run ad-hoc queries with low selectivity
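The steps in this summary can be walked through end to end on a tiny scale. The sketch below uses Python's built-in `sqlite3` as a stand-in for the warehouse; the two source systems, their formats, and the `sales_fact` table are assumptions invented for the example.

```python
import sqlite3

# Two hypothetical operational systems with different formats.
system_a = [("2014-01-05", "ISB", "1,200")]  # ISO dates, amounts as formatted text
system_b = [("05/01/2014", "lhr", 800)]      # dd/mm/yyyy dates, lowercase branch codes


def transform_a(row):
    """Bring a system A row into the uniform (date, branch, amount) format."""
    day, branch, amount = row
    return (day, branch.upper(), float(amount.replace(",", "")))


def transform_b(row):
    """Bring a system B row into the uniform (date, branch, amount) format."""
    day, branch, amount = row
    dd, mm, yyyy = day.split("/")
    return (f"{yyyy}-{mm}-{dd}", branch.upper(), float(amount))


# Integrate everything into a single store.
dwh = sqlite3.connect(":memory:")
dwh.execute("CREATE TABLE sales_fact (sale_date TEXT, branch TEXT, amount REAL)")
dwh.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [transform_a(r) for r in system_a] + [transform_b(r) for r in system_b],
)

# Create a performance-enhancing index, then run an ad-hoc query.
dwh.execute("CREATE INDEX idx_sales_branch ON sales_fact (branch)")
rows = dwh.execute(
    "SELECT sale_date, branch, amount FROM sales_fact ORDER BY branch"
).fetchall()
print(rows)
```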


Benefits of Data Warehouse

- High returns on investment.
- Substantial competitive advantage.
- Increased productivity of corporate decision-makers.
- Fast reporting for the decision-making process.
- Reduced reporting load on transactional systems.
- Making institutional data more user-friendly and accessible for knowledge workers.
- Integrated data from different source systems.
- Enabled ‘point-in-time’ analysis and trending over time.
- Helps in identifying and resolving data integrity issues, either in the warehouse itself or in the source systems that collect the data.


Data Warehouse: How is it Different?

1. Decision making is Ad-Hoc


Data Warehouse: How is it Different?

2. Different patterns of hardware utilization


Bus Service vs. Train


Data Warehouse: How is it Different?

3. Combines operational and historic data
- Don’t do data entry into a DWH. OLTP or ERP are the source systems.
- OLTP systems don’t keep history; you cannot get a balance statement more than a year old.
- DWH keeps historical data, even of bygone customers. Why?
- In the context of a bank, we want to know why the customer left. What are the events that led to his/her leaving?
- Why? Customer retention.


Data Warehouse: How is it Different?

How much history? Depends on:
- Industry
- Cost of storing historical data
- Economic value of historical data

Industry and history:
- Telecom calls are far more numerous than bank transactions: 18 months
- Retailers are interested in analyzing yearly seasonal patterns: 65 weeks. Why?
- Insurance companies want to do actuarial analysis, using the historical data in order to predict risk: 7 years

Hence NOT a complete repository of data.


Data Warehouse: How is it Different?

How much history?

Economic value of data vs. storage cost

Is a Data Warehouse a complete repository of data?
