Information Systems Management - IS433 Semester 1, 2015

Database, Data Warehouse and Data Mining

This document contains information in Database, Data Warehouse and Data Mining

Page 1: Database, Data Warehouse and Data Mining

Information Systems Management - IS433

Semester 1, 2015

Page 2: Database, Data Warehouse and Data Mining

Lecture 3

Database, Data Warehouse, and Data Mining

Managing Data to Improve Business Performance

IS433 – Information Management

Page 3: Database, Data Warehouse and Data Mining

Lecture 3 – Learning Objectives

1. Describe how data and document management impact profits and performance.

2. Understand how managers are supported or constrained by data quality.

3. Discuss the functions of databases and database management systems.

4. Understand how logical views of data provide customized support and improve data security.

5. Describe the tactical and strategic benefits of data warehouses, data marts, and data centers.

6. Describe transaction and analytic processing systems.

7. Explain how enterprise content management and electronic records management reduce cost, support business operations, and help companies meet their regulatory and legal requirements.

Page 4: Database, Data Warehouse and Data Mining

IT-Performance Model

Page 5: Database, Data Warehouse and Data Mining

Applebee’s International learns and earns from its data (Case Study)

The Problem:

Over the past decades, businesses have invested heavily in IT infrastructures (e.g., ISs) to capture, store, analyze, and communicate data.

Creating ISs to manage and process data and deploying communication networks do not by themselves generate value, as measured by an increase in profit (profit = revenues – expenses).

The company realized that profit increases when employees learn from data and use data to increase revenues, reduce expenses, or both.

In the learn and earn model…., from their data, they can predict what actions will lead to the greatest increase in net earnings.

Page 6: Database, Data Warehouse and Data Mining

Business uncertainty

What will be monthly demand for Product X over each of the next 3 months?

• Knowing demand for Product X means knowing how much to order. Sales quantity and sales revenues are maximized because there are no inventory shortages or lost sales. Expenses are minimized because there is no unsold inventory.

Which marketing promotions for Product Y are customers most likely to respond to?

• Knowing which marketing promotion will get the highest response rate maximizes sales revenues while avoiding the huge expense of a useless promotion.

Page 7: Database, Data Warehouse and Data Mining

Applebee’s International learns and earns from its data

Applebee’s is the largest casual dining enterprise in the world. As of 2008, there were 2,000 Applebee’s restaurants operating in 49 states and 17 countries, 510 of them company owned.

To make a difference and to build CUSTOMER LOYALTY (return visits), management wanted customers to experience a good time while having a great meal at attractive prices.

To achieve this goal, management had to (1) be able to forecast demand ACCURATELY and (2) become familiar with customers’ experiences and regional food preferences.

For example, knowing which items to add to the menu based on past food preferences helps motivate return visits.

Page 8: Database, Data Warehouse and Data Mining

Problem

Another problem is that it is difficult to bring together huge quantities of data located in different databases in a way that creates value.

Without efficient processes for managing vast amounts of customer data and turning data into usable knowledge, companies can miss critical opportunities to find insights hidden in the data.

Page 9: Database, Data Warehouse and Data Mining

IT Solution

Applebee’s implemented an enterprise data warehouse (EDW) from Teradata (teradata.com) with data analysis capabilities that helped management acquire an accurate understanding of sales, demand, and costs.

EDW is a data repository whose data are analyzed and used throughout the organization to improve responsiveness (to customer) and ultimately net earnings.

Collect data concerning the previous day’s sales from all point-of-sale (POS) systems located at every company-owned restaurant.

Organize this data to report every item sold in 15-min intervals.

Reduce time required to collect POS data from 2 weeks to 1 day.

Respond quickly to guests’ needs and to changes in guests’ preferences.

Help company provide services that attract customers and build loyalty.
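The step of organizing POS data to report every item sold in 15-minute intervals can be sketched as follows. This is a minimal illustration only: the record layout, the item names, and the function names `interval_start` and `items_sold_by_interval` are assumptions made for the example, not details of the Applebee’s or Teradata system.

```python
from collections import Counter
from datetime import datetime


def interval_start(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its reporting interval."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def items_sold_by_interval(sales):
    """Total the quantity sold per (interval start, item)."""
    totals = Counter()
    for ts, item, qty in sales:
        totals[(interval_start(ts), item)] += qty
    return totals


# Hypothetical POS records: (timestamp, menu item, quantity).
pos_sales = [
    (datetime(2008, 5, 1, 12, 3), "burger", 2),
    (datetime(2008, 5, 1, 12, 14), "burger", 1),
    (datetime(2008, 5, 1, 12, 16), "salad", 1),
    (datetime(2008, 5, 1, 12, 29), "burger", 3),
]

report = items_sold_by_interval(pos_sales)
for (start, item), qty in sorted(report.items()):
    print(start.strftime("%H:%M"), item, qty)
```

A report of this shape lets a manager compare demand for each item across intervals of the day.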

Page 10: Database, Data Warehouse and Data Mining

Results

Management can collect and analyze detailed data in near real-time using EDW.

Regional managers can select the best menu offerings and operate more efficiently.

From customer satisfaction surveys, … be able to identify regional preferences, predict product demand and build financial models.

Improved customers’ experience, satisfaction, and loyalty

Increase earnings

Total sales increased by 3.9% over the prior year, and the company opened 16 new restaurants.

Page 11: Database, Data Warehouse and Data Mining

Lessons learned from this case

Learn the importance of timely and detailed data collection and of data analysis that yields insights from the data.

Learn that it is necessary to collect vast amounts of data, organize and store them PROPERLY in one place, and then use the results of analysis to make better marketing and strategic decisions.

Page 12: Database, Data Warehouse and Data Mining

Applebee’s International Learns & Earns

Problem: Huge quantities of data in many Databases.

Solution: Enterprise data warehouse implemented.

Results: Improved profitability.

Page 13: Database, Data Warehouse and Data Mining

Data, Master Data, and Document Management

Page 14: Database, Data Warehouse and Data Mining

Data Management

The basic rule is that, … to maximize earnings, companies invest in data management technologies that increase:

• The opportunity to earn revenues (e.g., customer relationship management)

• The ability to cut expenses (e.g., inventory management)

To improve business processes and performance, managers and decision makers need rapid access to data.

Page 15: Database, Data Warehouse and Data Mining

Data Management

Data management is about the design of data infrastructures to provide employees with complete, timely, accurate, accessible, understandable, and relevant data.

By definition, data management is a structured approach for capturing, storing, processing, integrating, distributing, securing, and archiving data EFFECTIVELY throughout their life cycle.

Page 16: Database, Data Warehouse and Data Mining

Uncertainty: A constraint on managers

The viability of business decisions depends on access to high-quality data, and the quality of data depends on effective approaches to data management.

Too often, managers and information workers are constrained by data that can’t be trusted because they are incomplete, out of context, outdated, inaccurate, inaccessible, or require weeks to analyze.

In those situations, the decision maker faces too much uncertainty to make intelligent business decisions.

Page 17: Database, Data Warehouse and Data Mining

Uncertainty: A constraint on managers

Data errors and inconsistencies lead to mistakes and lost opportunities such as failed deliveries, invoicing blunders, and problems synchronizing data from multiple locations.

They also lead to data analysis errors that result from the use of inaccurate formulas or untested models.

For example, TransAlta is a Canadian power generation company. A spreadsheet mistake led it to buy more US power transmission hedging contracts at higher prices. The data error cost US$24 million.

In the retail sector, the cost of errors due to unreliable and incorrect data alone is estimated to be as high as $40 billion annually.

In the healthcare industry, data errors not only increase healthcare costs by billions of dollars, but also cost thousands of lives. (Read A Closer Look 3.1)

Page 18: Database, Data Warehouse and Data Mining

Cost of Poor Quality Data

Cost of Poor Quality Data = Loss of Business Value + Cost to Prevent Errors + Cost to Correct Errors

Examples:

Loss of business – sales opportunities are missed, orders are returned because wrong items were delivered, or errors frustrate and drive away customers.

Preventing errors – the amount of time employees spend verifying information to avoid mistakes.

Correcting errors – the time and effort required to process corrections to the database.
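As a worked illustration of the cost formula above, the sketch below adds up the three components. All dollar figures are hypothetical values chosen only to show the arithmetic; they do not come from the lecture or any study.

```python
def cost_of_poor_quality(lost_business_value, prevention_cost, correction_cost):
    """Cost of poor-quality data = lost business value
    + cost to prevent errors + cost to correct errors."""
    return lost_business_value + prevention_cost + correction_cost


# Hypothetical annual figures, in dollars.
lost = 120_000        # missed sales, returned orders, lost customers
prevention = 35_000   # staff time spent verifying information
correction = 20_000   # effort spent processing corrections

total = cost_of_poor_quality(lost, prevention, correction)
print(f"Cost of poor-quality data: ${total:,}")
```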

Organizations with at least 1,000 knowledge workers lose ~ $5.7 million annually in time wasted by employees reformatting data as they move among applications.

Page 19: Database, Data Warehouse and Data Mining

Data Life Cycle

Figure: Data life cycle. Data sources and databases (internal data, external data, personal expertise and judgement) feed data storage (data warehouse, data marts), followed by data analytics (OLAP, queries, EIS, DSS, data mining), results (data visualization, decision support, knowledge and its management), and business applications (SCM, CRM, e-commerce, strategy, others).

Page 20: Database, Data Warehouse and Data Mining

General Data Principles

Three general data principles relate to the data life cycle perspective and help to guide IT investment decisions:

1. Principle of diminishing data value – the value of data diminishes as they age.

2. Principle of 90/90 data use – 90% of stored data is hardly accessed after 90 days. Data lose much of their value after 3 months.

3. Principle of data in context – the ability to capture, process, format, and distribute data in real time or faster requires a huge investment in data architecture.

Page 21: Database, Data Warehouse and Data Mining

Data Visualization

To format data into meaningful contexts for users, businesses employ “data visualization” and decision support tools.

Present data in ways that are faster and easier for users to understand.

Data visualization tools are less expensive and easier to manipulate.

• Table provides more precise data, whereas the graph takes much less time to understand.

Page 22: Database, Data Warehouse and Data Mining

Page 23: Database, Data Warehouse and Data Mining

Data Management: Problems and Challenges

Remember that dirty data result in poor business decisions, poor customer service, poor product design, wasteful situations.

Even if data are accurate, timely, and clean, they might not be usable.

Organizations with more than 1,000 workers lose $5.7 million annually in time wasted by employees reformatting data as they move among applications.

Managing, searching for, and retrieving data located throughout the enterprise is a major challenge for various reasons…

Page 24: Database, Data Warehouse and Data Mining

Data Management: Problems and Challenges

The volume of data increases exponentially with time. New data are added rapidly.

Business records must be kept for a long time for auditing or legal reasons, even though the organization may no longer access them.

Page 25: Database, Data Warehouse and Data Mining

Data Management: Problems and Challenges

Data are scattered throughout organizations and are collected and created by many individuals using different methods and devices. Data are frequently stored in multiple servers and locations and also in different computing systems, databases, formats, and human and computer languages.

Data security, quality, and integrity are critical. Legal requirements relating to data differ among countries.

Data are created and used offline without going through quality control (QC), so the validity of the data is questionable.

Data throughout an organization may be redundant and out-of-date, creating a huge maintenance problem.

Page 26: Database, Data Warehouse and Data Mining

Data Management: Problems and Challenges

To deal with these difficulties, organizations invest in data management solutions.

Traditional data management methods can be inefficient, or even unworkable, for queries (e.g., the Applebee’s case).

Data management supports transaction processing by organizing data in one location.

Page 27: Database, Data Warehouse and Data Mining

Master Data Management

Master data management (MDM) is a process whereby companies integrate data from various sources or enterprise applications to provide a more unified view of the data.

In reality, MDM can’t create a single unified version of the data. Realistically, MDM consolidates data from various sources into a master reference file, which then feeds data back to the applications.

A master data reference file is based on data entities. A data entity is anything real or abstract about which a company wants to collect and store data. Common data entities in business include customer, vendor, product, and employee.

Master data entities are the main entities of a company, such as customers, products, suppliers, employees, and assets.

Each department has distinct master data needs. For example, marketing is concerned with pricing, brand, and packaging, whereas production is concerned with costs and schedules.
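The consolidation of records from several applications into a master reference file can be sketched as below. This is a simplified illustration, not a description of any MDM product: the source names, the record fields, and the "most recently updated value wins" rule are all assumptions made for the example.

```python
def build_master_file(sources):
    """Consolidate customer records from several applications into one
    master reference file keyed by customer ID. For each field, keep the
    value from the most recently updated source record (assumed rule)."""
    values = {}    # customer_id -> {field: value}
    stamps = {}    # customer_id -> {field: ISO date of the kept value}
    for records in sources.values():
        for rec in records:
            cid = rec["customer_id"]
            fields = values.setdefault(cid, {})
            dates = stamps.setdefault(cid, {})
            for field, value in rec.items():
                if field in ("customer_id", "updated") or value is None:
                    continue
                # ISO dates (YYYY-MM-DD) compare correctly as strings.
                if field not in dates or rec["updated"] > dates[field]:
                    fields[field] = value
                    dates[field] = rec["updated"]
    return values


# Hypothetical records held by two separate applications.
sources = {
    "crm": [
        {"customer_id": 1, "name": "Ana Lee",
         "email": "ana@old.example", "updated": "2014-03-01"},
    ],
    "billing": [
        {"customer_id": 1, "name": "Ana Lee", "email": "ana@new.example",
         "address": "12 Hill Rd", "updated": "2015-01-15"},
        {"customer_id": 2, "name": "Bo Chan",
         "address": "3 Bay Ave", "updated": "2014-11-30"},
    ],
}

master = build_master_file(sources)
for cid, fields in sorted(master.items()):
    print(cid, fields)
```

In practice the master file would then feed the consolidated values back to the source applications, as the slide describes; that step is omitted here.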

Page 28: Database, Data Warehouse and Data Mining

Benefits of a unified view of customers

Better, more accurate customer data to support marketing, sales, support and service

Better responsiveness to ensure that all employees who deal with customers have up-to-date, reliable information on customers

Better revenue management and more responsive business decisions.

Page 29: Database, Data Warehouse and Data Mining

Transforming data into knowledge

Businesses do not run on raw data, but run on data that have been processed into information and knowledge.

Page 30: Database, Data Warehouse and Data Mining

Transforming Data into Knowledge

Extract, Transform and Load

Page 31: Database, Data Warehouse and Data Mining

Data quality and integrity

The data collection process can create problems concerning the quality of the data being collected.

Regardless of how the data are collected, they need to be validated so users know they can trust them.

Garbage in, garbage out. Garbage in, gospel out is riskier: poor-quality data are trusted and used for planning.

DQ is a measure of the data’s usefulness as well as the quality of decisions based on the data.

Dimensions of DQ: accuracy, accessibility, relevance, timeliness, and completeness.
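Validating data before users rely on them can be sketched as a set of simple checks. The example below covers only three of the dimensions listed above (completeness, accuracy, timeliness); the field names, the 90-day staleness threshold, and the crude email check are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical set of fields every customer record must contain.
REQUIRED_FIELDS = ("customer_id", "email", "last_updated")


def quality_issues(record, today, max_age_days=90):
    """Return a list of data quality problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"incomplete: missing {field}")
    # Accuracy: a very rough format check on the email address.
    email = record.get("email")
    if email and "@" not in email:
        issues.append("inaccurate: malformed email")
    # Timeliness: flag records not updated within the allowed window.
    updated = record.get("last_updated")
    if updated and (today - updated).days > max_age_days:
        issues.append("untimely: record is stale")
    return issues


records = [
    {"customer_id": 1, "email": "ana@example.com",
     "last_updated": date(2015, 3, 1)},
    {"customer_id": 2, "email": "bob.example.com",
     "last_updated": date(2014, 1, 10)},
    {"customer_id": 3, "email": "", "last_updated": date(2015, 2, 20)},
]

today = date(2015, 3, 15)
for rec in records:
    print(rec["customer_id"], quality_issues(rec, today))
```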

Page 32: Database, Data Warehouse and Data Mining

Page 33: Database, Data Warehouse and Data Mining

Data Privacy and Ethical Use

Businesses that collect data about employees, customers, or anyone else have the duty to protect data.

Data should be accessible only to authorized people.

Securing data is difficult and expensive.

To motivate companies to invest in data security, the government has imposed enormous fines and penalties for data breaches.

Page 34: Database, Data Warehouse and Data Mining

Document Management

Business records include contracts, research, accounting documents, memos, customer/client communications, and meeting minutes.

Document management is the automated control of imaged and electronic documents, spreadsheets, voice and email messages, and word processing documents from INITIAL creation to FINAL archiving or destruction.

A document management system (DMS) consists of hardware and software that manage and archive e-documents, convert paper documents to e-documents, and then index and store them according to policy. For example, a policy might keep emails for 7 years and promotions for 1 year, and then discard them.
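A retention policy like the one in the example (emails for 7 years, promotions for 1 year) can be sketched as a lookup table plus a date check. The document records, IDs, and the function name `is_expired` are hypothetical; a real DMS would apply such rules as part of a larger workflow.

```python
from datetime import date

# Retention periods in years, following the example policy above.
RETENTION_YEARS = {"email": 7, "promotion": 1}


def is_expired(doc_type, created, today):
    """True once a document has been kept for its full retention period."""
    years = RETENTION_YEARS[doc_type]
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        expires = created.replace(year=created.year + years, day=28)
    return today >= expires


# Hypothetical document index entries.
documents = [
    {"id": "E-1", "type": "email", "created": date(2007, 5, 1)},
    {"id": "E-2", "type": "email", "created": date(2010, 1, 10)},
    {"id": "P-1", "type": "promotion", "created": date(2014, 3, 1)},
    {"id": "P-2", "type": "promotion", "created": date(2015, 1, 1)},
]

today = date(2015, 6, 1)
to_discard = [d["id"] for d in documents
              if is_expired(d["type"], d["created"], today)]
print(to_discard)
```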

Page 35: Database, Data Warehouse and Data Mining

Statistics by Gartner Group

Most office workers lose up to 500 hours a year looking for documents.

On average, professionals spend 50% of their time looking for information.

The average organization:

• Spends $20 in labor to file each document.

• Spends $120 in labor finding each misfiled document.

• Loses 1 out of every 20 documents.

• Spends 25 hours re-creating each lost document.

The Gartner Group predicts that more than 90% of the organizations will be using a DMS by 2007.

Page 36: Database, Data Warehouse and Data Mining

Figure 3.13 Electronic records management from creation to retention or destruction

Page 37: Database, Data Warehouse and Data Mining

Unstructured business records

Businesses generate volumes of documents, messages, and memos that, by their nature, contain unstructured content that cannot be put into a database.

Many of these materials are business records that must be retained and made available when requested by auditors, investigators, the SEC, the IRS, or other authorities.

To be retrievable, business records must be organized and indexed.

Records that are not needed for current operations or decisions are archived, that is, moved into longer-term storage.

Page 38: Database, Data Warehouse and Data Mining

Business Value of E-Records Management

Companies need to be prepared to respond to an audit, federal investigation, lawsuit, or other legal action against them.

Examples of lawsuits: patent violations, fraud, product safety negligence, theft of intellectual property, breach of contract, wrongful termination, harassment, and discrimination

E-discovery is the process of gathering electronically stored information in preparation for trial, legal or regulatory investigation, or administrative action as required by law.

When a company receives an e-discovery request, the company must produce what is requested—or face charges of obstructing justice or being in contempt of court.

Page 39: Database, Data Warehouse and Data Mining

Companies have incurred huge costs for not responding to e-discovery

Failure to save e-mails resulted in a $2.75 million fine for Philip Morris.

Failure to respond to e-discovery requests cost Bank of America $10 million in fines.

Failure to produce backup tapes and deleted e-mails resulted in a $29.3 million jury verdict against UBS Warburg in the landmark case, Zubulake v. UBS Warburg.

Page 40: Database, Data Warehouse and Data Mining

3.2 File Management System

Records, File

Bit, Byte

Database – Primary key, Secondary key, Foreign key

Page 41: Database, Data Warehouse and Data Mining

Figure 3.6 Hierarchy of data for a computer-based file.

Page 42: Database, Data Warehouse and Data Mining

Limitations

Data redundancy – Because different programmers create different data-manipulating applications, the same data could be duplicated in several files.

Data inconsistency – Actual data values are not synchronized across the various copies of the data. For example, if a customer has several loans and each loan has its own file containing the customer fields (name, address, email, phone), then a change to the customer’s address in only one file creates inconsistencies.

Data isolation – File organization creates silos of data that make it extremely difficult to access data from different applications. For example, a manager who wants to know which products customers bought and which customers own more than 1000 has to filter and integrate data manually from multiple files to get results.

Page 43: Database, Data Warehouse and Data Mining

Limitations

Data security – Securing data is difficult in the file environment because new applications are added to the system. As the number of applications increases, so does the number of people who can access data

Lack of data integrity – In the file environment it is harder to enforce data integrity rules, which include preventing data input errors, e.g., in a Social Security number (SSN) field.

Data concurrency – While one user is updating a record, another user accessing that record at the same time may not get the most current update. To prevent a concurrency problem, applications and data need to be independent of one another. In the file environment, they are dependent.

Page 44: Database, Data Warehouse and Data Mining

Figure 3.8

Computer-based files of this type cause problems such as redundancy, inconsistency, and data isolation.

Page 45: Database, Data Warehouse and Data Mining

3.3 Database and DBMS

Database helps minimize data redundancy, data isolation and data inconsistency.

Data can be shared among users

Security and data integrity are easier to control, and applications are independent of the data they process.

There are two basic types of databases: Centralized and Distributed.
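How a database reduces redundancy and inconsistency can be illustrated with a small relational sketch using Python's built-in `sqlite3` module. The customer details are stored once and each loan refers to them through a foreign key, so an address change is made in one place. The table names, columns, and sample rows are hypothetical and chosen to mirror the loan example from the file-system limitations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE loan (
        loan_id     INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customer(customer_id),  -- foreign key
        amount      REAL NOT NULL
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana Lee', '12 Hill Rd')")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)",
                 [(101, 1, 5000.0), (102, 1, 12000.0)])

# The address is stored once, so one update is seen by every loan.
conn.execute("UPDATE customer SET address = '7 Lake St' WHERE customer_id = 1")
rows = conn.execute("""
    SELECT loan.loan_id, customer.address
    FROM loan JOIN customer USING (customer_id)
    ORDER BY loan.loan_id
""").fetchall()
print(rows)

# Integrity rule: a loan cannot point at a customer that does not exist.
try:
    conn.execute("INSERT INTO loan VALUES (103, 99, 100.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("orphan loan rejected:", rejected)
```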

Page 46: Database, Data Warehouse and Data Mining

Database Types

Figure 3.9 (a) Centralized database. (b) Distributed database with complete or partial copies of the central database in more than one location.

Page 47: Database, Data Warehouse and Data Mining

Centralized Databases

Stores all related files in one location, which makes the data more consistent.

Files are not accessible except via the centralized host computer, where they can be protected more easily from unauthorized access or modification.

Vulnerable to a single point of failure.

If the computer fails, all users are affected.

When users are widely dispersed and must perform data manipulations from distances, they often experience transmission delays.

Page 48: Database, Data Warehouse and Data Mining

Distributed Databases

A replicated database stores complete copies of the entire database in multiple locations. This arrangement provides a backup in case of a failure of, or problems with, the centralized database.

Improves the response time for local users.

Much more expensive to set up and maintain because each replica must be updated as records are added to, modified in, and deleted from any of the databases.

The updates may be done at the end of the day; if they are not done, the various databases will contain conflicting data.

Page 49: Database, Data Warehouse and Data Mining

Distributed Databases

A partitioned database is divided up so that each location has a portion of the entire database, usually the portion that meets users’ local needs.

Provide response speed of localized files without the need to replicate all changes in multiple locations.

Advantage: data can be entered more quickly and kept more accurate by the users immediately responsible for data
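The difference in update cost between the two distributed designs can be sketched with two toy in-memory stores. The class names, location names, and record are hypothetical; real distributed databases add networking, scheduling, and conflict handling that this sketch leaves out.

```python
class ReplicatedStore:
    """Every location holds a complete copy of the database."""

    def __init__(self, locations):
        self.replicas = {loc: {} for loc in locations}

    def write(self, key, value):
        # Each replica must receive the change to stay consistent.
        for replica in self.replicas.values():
            replica[key] = value
        return len(self.replicas)  # number of locations updated


class PartitionedStore:
    """Each location holds only the portion of the data it needs."""

    def __init__(self, locations):
        self.partitions = {loc: {} for loc in locations}

    def write(self, location, key, value):
        # Only the local partition is touched; nothing is copied elsewhere.
        self.partitions[location][key] = value
        return 1  # number of locations updated


replicated = ReplicatedStore(["north", "south", "head_office"])
partitioned = PartitionedStore(["north", "south"])

r_updates = replicated.write("cust-1", {"name": "Ana Lee"})
p_updates = partitioned.write("north", "cust-1", {"name": "Ana Lee"})
print("replicated:", r_updates, "partitioned:", p_updates)
```

The replicated write touches every location, which is the maintenance cost the slides mention; the partitioned write stays local, at the price of the record being unavailable at other locations.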

Page 50: Database, Data Warehouse and Data Mining

Centralized and Distributed Database Architecture

Databases are optimised for transactions and queries

Data entering the databases from POS (point of sale) terminals, scanners, online sales, and other sources are stored in a structured data format, depending upon the type of DBMS.

Databases are optimised for extremely fast processing of queries, or ad hoc user-specific data requests.

Databases need to strike a balance between transaction processing efficiency and query efficiency.

Given these functions, databases cannot be optimised for data mining, complex online analytics processing (OLAP), and decision support.

These limitations led to the introduction of data warehouse technology.

Data warehouses and data marts are optimised for OLAP, data mining, BI, and decision support.

Page 51: Database, Data Warehouse and Data Mining

DBMS

A program that provides access to databases

A DBMS permits an organization to centralize data, manage them efficiently, and provide access to the stored data.

DBMSs range from the simple Microsoft Access to the full-featured Oracle and DB2.

Page 52: Database, Data Warehouse and Data Mining

Major Functions of DBMS

Data filtering and profiling – Inspecting data for errors, inconsistency, redundancy, and incomplete information

Data quality – correcting, standardizing, verifying the integrity of data

Data synchronization – Integrating, matching or linking data from disparate sources

Data enrichment – Enhancing data using information from internal and external data sources

Data maintenance – Checking and controlling data integrity over time

Page 53: Database, Data Warehouse and Data Mining

Copyright 2010 John Wiley & Sons, Inc.

Page 54: Database, Data Warehouse and Data Mining

Data Management

Why does data management matter?

No enterprise can be effective without high quality data that is accessible when needed.

Data that’s incomplete or out of context cannot be trusted.

Organizations with at least 1,000 knowledge workers lose ~ $5.7 million annually in time wasted by employees reformatting data as they move among applications.

What is the goal of data management?

To provide the infrastructure and tools to transform raw data into usable information of the highest quality.

Page 55: Database, Data Warehouse and Data Mining

Data Management

Why is data management difficult and expensive?

Volume of data is increasing exponentially.

Data is scattered throughout the organization.

Data is created and used offline without going through quality control checks.

Data may be redundant and out-of-date, creating a huge maintenance problem.

Page 56: Database, Data Warehouse and Data Mining

Data Management Technologies

Data warehouses – integrate data from multiple databases and data silos and organise them for complex analysis, knowledge discovery, and to support decision making

Data marts – small-scale data warehouses that support a single function or department. Organisations that are unable to invest in data warehousing may start with one or more data marts.

Business Intelligence (BI) – tools and techniques that process data and perform statistical analysis for insight and discovery, that is, to discover meaningful relationships in the data, stay informed in real time, gain insight, detect trends, and identify opportunities and risks.

Page 57: Database, Data Warehouse and Data Mining

Data Management: Current key issues

Master data management (MDM): Processes to integrate data from various sources and enterprise apps in order to create a unified view of the data.

Document management system (DMS): Hardware and software to manage, archive, and purge files and other electronic documents (e-documents).

Most DMSs are workflow software.

Green computing: Efforts to conserve natural resources and reduce effects of computer usage on the environment.

Page 58: Database, Data Warehouse and Data Mining

3.4 Data Warehouses, Data Marts and Data Centers

It’s not necessarily the biggest companies that are the most successful, but the smartest ones.

Being a smart company means having on-demand access to relevant data, understanding them (data visualization), and using what you learn from them to increase productivity and profitability.

Data warehouses support companies and help them make the smartest decisions.

Page 59: Database, Data Warehouse and Data Mining

Data Warehouses

DW is a repository (a type of database) in which data are organized so that they can be readily analyzed using methods such as data mining, decision support, querying and other applications.

Examples include revenue management, CRM, fraud detection, and payroll management applications.

Databases are designed and optimized to store data whereas data warehouses are designed and optimized to respond to analysis questions that are critical for a business.

Page 60: Database, Data Warehouse and Data Mining

Data Warehouses

Databases are online transaction processing (OLTP) systems in which every transaction is recorded quickly.

For example, withdrawals from a bank ATM must be recorded and processed as they occur, in real time. Database systems for banking are designed to ensure that every transaction gets recorded immediately.

Databases are volatile because data are constantly added, edited, or updated.

The volatility caused by transaction processes makes data analysis too difficult.

To overcome this, data are Extracted from designated databases, Transformed and Loaded into a data warehouse.

These data are read-only data. They remain the same until the next scheduled ETL.

Warehouse data are not volatile, so data warehouses are designed as online analytical processing (OLAP) systems.
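The extract, transform, and load (ETL) steps described above can be sketched with two in-memory SQLite databases standing in for the transaction database and the warehouse. The table names, columns, sample rows, and the choice of transformation (standardizing store codes and summing transactions into daily totals) are assumptions for illustration, not a prescribed ETL design.

```python
import sqlite3


def extract(oltp):
    """Extract: read the raw transactions from the operational database."""
    return oltp.execute(
        "SELECT sale_date, store, amount_cents FROM sales").fetchall()


def transform(rows):
    """Transform: standardize store codes and sum to daily totals."""
    totals = {}
    for sale_date, store, amount_cents in rows:
        key = (sale_date, store.strip().upper())
        totals[key] = totals.get(key, 0) + amount_cents
    return [(d, s, t) for (d, s), t in sorted(totals.items())]


def load(warehouse, rows):
    """Load: write the transformed rows into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(sale_date TEXT, store TEXT, total_cents INTEGER)")
    warehouse.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    warehouse.commit()


# Hypothetical operational (OLTP) data with inconsistent store codes.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE sales (sale_date TEXT, store TEXT, amount_cents INTEGER)")
oltp.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2015-03-01", "north", 1250),
    ("2015-03-01", "North ", 800),
    ("2015-03-01", "south", 2000),
    ("2015-03-02", "north", 500),
])

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(oltp)))

result = warehouse.execute(
    "SELECT sale_date, store, total_cents FROM daily_sales "
    "ORDER BY sale_date, store").fetchall()
print(result)
```

Between scheduled ETL runs the `daily_sales` rows are only read, which matches the read-only, nonvolatile character of warehouse data.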

Page 61: Database, Data Warehouse and Data Mining

Trend towards more real-time support from data warehouse

Modern business world is experiencing a growing trend toward real-time data warehousing and analytics.

In the past, data warehousing did not require instant response times or direct customer interaction.

Companies with an active data warehouse will be able to interact appropriately with a customer to provide superior customer service, which in turn enhances the companies’ revenues.

Page 62: Database, Data Warehouse and Data Mining

Benefits of data warehouse

Benefits are both business and IT-related

From business perspective, companies can make better decisions because they have access to better information.

From the IT perspective, DWs deliver information more effectively and efficiently.

Page 63: Database, Data Warehouse and Data Mining

Benefits of data warehouse

Marketing – Use DW for product introductions, product information access, marketing program effectiveness and product line profitability. Maximize per customer profitability

Pricing and contracts – Use data to calculate costs accurately in order to optimize pricing, so that prices are neither too low nor too high.

Forecasting – Visibility of end customer demand

Sales – Determine sales profitability and productivity for all territories and regions.

Financial – Use daily, weekly or monthly results for improved financial management.

Page 64: Database, Data Warehouse and Data Mining

Characteristics of Data Warehouse

Organization : data are organized by subject (customer, vendor, product, price level and region)

Consistency : Data in different databases may be encoded differently (e.g., 0/1 in one system and M/F in another). In the WH, they are coded in a consistent manner.

Time variant : The data are kept for many years so they can be used for identifying trends, forecasting and making comparisons over time

Nonvolatile : Once the data are entered into WH, they are not updated.

Relational : The data WH uses a relational structure.

Page 65: Database, Data Warehouse and Data Mining

Characteristics of Data Warehouse

Client/server: The DW uses a client/server architecture, mainly to give end users easy access to its data.

Web-based: DWs are designed to provide an efficient computing environment for Web-based applications.

Integration: Data from various sources are integrated. Web services are used to support integration.

Real time: DWs can provide real-time capabilities.

Building a Data Warehouse

Building a DW is a very large and expensive undertaking.

An organization needs to address a series of basic questions first:

Does top management support the DW?

Do users support the DW?

Do users want access to a broad range of data? If so, is a single repository or a set of standalone data marts the better choice?

Do users want data access and analysis tools?

Do users understand how to use the DW to solve business problems?

Does the unit have one or more power users who can understand DW technologies?

Suitability

A DW is appropriate for organizations that have some of the following characteristics:

End users need to access large amounts of data.

Operational data are stored in different systems.

The organization employs an information-based approach to management.

The organization serves a large, diverse customer base.

The same data are represented differently in different systems.

Data are stored in highly technical formats that are difficult to decipher.

Extensive end-user computing is performed (many end users performing many activities).

Data marts

A data warehouse can be too expensive for a company to implement.

As an alternative, many firms create a lower-cost, scaled-down version of a data warehouse called a data mart.

Data marts are designed for a strategic business unit or a single department.

They allow for local rather than central control.

They contain less information than a DW.

They respond more quickly and are easier to understand.
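
A minimal sketch of the idea, assuming a data mart that is derived from a larger store by keeping only one department's rows (a mart can also be built standalone, directly from operational sources). The row layout and department names are hypothetical.

```python
# Hypothetical warehouse rows covering several departments.
warehouse = [
    {"department": "marketing", "subject": "campaign", "value": 10},
    {"department": "finance", "subject": "invoice", "value": 20},
    {"department": "marketing", "subject": "customer", "value": 30},
]


def build_data_mart(warehouse_rows, department):
    """Copy only one department's rows into a smaller, separate store."""
    return [dict(row) for row in warehouse_rows if row["department"] == department]


marketing_mart = build_data_mart(warehouse, "marketing")
```

Because the mart holds fewer rows and only one department's subjects, queries against it have less data to scan and users have less to learn.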

Data Center

A data center is the name given to the newer facilities containing mission-critical ISs and components that deliver data and IT services to the enterprise.

Data centers integrate networks, computer systems, and storage devices.

They ensure the availability of power and provide physical and data security.

The newest data centers include temperature and fire controls, physical and digital security, redundant power supplies such as uninterruptible power supplies (UPS), and redundant data communications connections.

Enterprise Content Management

Enterprise content management (ECM) has become an important data management technology, particularly for large and medium-sized organizations.

Includes electronic document management, web content management, digital asset management and electronic records management (ERM).

ERM infrastructures help reduce costs, easily share content across the enterprise, minimize risk, automate expensive time-intensive and manual processes and consolidate multiple web sites onto a single platform.

4 key forces

Four key forces are driving organizations to adopt a strategic, enterprise-level approach to planning and deploying content systems:

Compounding growth of content generated by organizations

The need to integrate that content within business processes

The need to support increasingly sophisticated content access and interaction by business users.

The need to maintain governance and control over content to ensure regulatory compliance and preparedness for legal discovery.

Discovery or the request for information

Nearly 90 percent of US corporations become engaged in lawsuits

The average $1 billion company in the US faces 147 lawsuits.

Each lawsuit involves discovery, or the request for information, which almost always includes requests for email and other electronic communications.

Discovery

Discovery is the process of gathering information in preparation for trial, legal or regulatory investigations, or administrative action as required by law.

When electronic information is involved, the process is called e-discovery.

Several cases in which a company incurred huge costs for not responding to e-discovery are the following:

Failure to save emails resulted in a $2.75 million fine for Philip Morris.

Failure to respond to e-discovery requests cost Bank of America $10 million.

Failure to produce backup tapes and deleted emails resulted in a $29.3 million jury verdict against UBS Warburg.

Managerial Issues

Reducing uncertainty – Requires a data infrastructure that can capture, process and report information in near real-time.

Cost-benefit issues and justification – Some solutions are expensive and justifiable only in large corporations. Smaller organizations can make solutions cost effective if they make use of existing databases rather than creating new ones.

Where to store data physically – Should data be distributed close to their users? This arrangement could speed up data entry and updating, but it could also generate replication and security risks. Should data be centralized for easier control, security, and disaster recovery? This arrangement carries communications and single-point-of-failure risks.

Managerial Issues

Legal issues – Failure to manage electronic records exposes companies to fines from the courts and from regulatory agencies such as the IRS.

Internal or external – Should a firm invest in managing its data internally, or pay an external provider to do so?

Disaster recovery – Can an organization’s business processes (dependent on databases and data WH) recover after an information system disaster?

Data security and ethics – Are the company's customer and other competitive data safe from snooping and sabotage? Are confidential data, such as personnel details, safe from improper or illegal access?

Managerial Issues

Privacy – Storing data in a DW and conducting data mining may result in the invasion of individual privacy. What will the company do to protect individuals?

The legacy data problem – Data in older, perhaps obsolete, databases still need to be available to newer database management systems. Many of the legacy application programs used to access the older data cannot be converted to the new computing environment without considerable expense. There are two approaches to solving this problem:

Create a database front end that can act as a translator from the old system to the new.

Integrate the older applications into the new system so that data can be seamlessly accessed in the original format.
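
The first approach, a front end that translates between the old and new systems, might look roughly like the sketch below. The fixed-width record layout and both class names are hypothetical stand-ins, not part of any real product.

```python
class LegacyCustomerStore:
    """Stand-in for an old system that returns fixed-width text records."""

    def fetch(self, customer_id):
        # Assumed layout: 4-digit id, 10-character name, 1-character gender.
        records = {7: "0007" + "SMITH".ljust(10) + "F"}
        return records[customer_id]


class LegacyFrontEnd:
    """Translates legacy records into the dictionary shape a newer system expects."""

    def __init__(self, legacy_store):
        self._legacy = legacy_store

    def get_customer(self, customer_id):
        raw = self._legacy.fetch(customer_id)
        return {
            "id": int(raw[0:4]),
            "name": raw[4:14].strip().title(),
            "gender": raw[14],
        }


customer = LegacyFrontEnd(LegacyCustomerStore()).get_customer(7)
```

The legacy store is left untouched; only the front end knows the old record format, so newer applications never have to parse it themselves.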

Managerial Issues

Data Delivery – Moving data around an enterprise efficiently is often a major problem.

Quiz

What is the goal of data management?

Explain how having detailed real-time or near real-time data can improve productivity and decision quality.

How are organizations using their data warehouses to improve consumer satisfaction and the company’s profitability?

Link Library

Advizor Solutions, data analytics and visualization http://advizorsolutions.com/

Clarabridge: How Text Mining Works http://clarabridge.com/

SAS Text Miner http://sas.com/

Tableau data visualization software http://tableausoftware.com/data-visualization-software/

EMC Corp., enterprise content management http://emc.com

Oracle DBMS http://oracle.com/

Copyright 2012 John Wiley & Sons, Inc.
