Chapter 11 Data Management: Warehousing, Analyzing, Mining & Visualization

Learning Objectives

Recognize the importance of data, their managerial issues, and their life cycle.

Describe the sources of data, their collection, and quality issues.

Relate data management to multimedia and document management.

Explain the operation of data warehousing and its role in decision support.

Learning Objectives (cont.)

Understand the data access and analysis problem and the data mining and online analytical processing solutions.

Describe data presentation methods and explain geographical information systems, visual simulations, and virtual reality as decision support tools.

Discuss the role and provide examples of marketing databases.

Recognize the role of the Web in data management.

Case: Sears & Data Warehouses

Problem: Sears was caught by surprise in the 1980s as shoppers defected to specialty stores and discount mass merchandisers.

Solution: Sears constructed a single sales information data warehouse, replacing 18 old databases that were packed with redundant, conflicting, and obsolete data.

By 2001, Sears had launched the following Web initiatives:
• An e-commerce home improvement center
• A B2B supply exchange for the retail industry
• An online toy catalog
• And much more

Case: Sears & Data Warehouses (cont.)

Results:
• The ability to monitor sales by item per store enables Sears to create a sharp local market focus.
• Monitoring data on Web-based sales helps Sears plan its marketing and Web advertising.
• Response time to queries has dropped from days to minutes.
• The data warehouse offers Sears employees a tool for making better decisions.
• Sears retailing profits have climbed more than 20% annually since the data warehouse was implemented.

Difficulties of Managing Data

The amount of data increases exponentially.

Data are scattered throughout organizations and are collected by many individuals using several methods and devices.

Only small portions of an organization’s data are relevant for any specific decision.

An ever-increasing amount of external data needs to be considered in making organizational decisions.

Data are frequently stored in several servers and locations in an organization.

Difficulties of Managing Data (cont.)

Raw data may be stored in different computing systems, databases, formats, and human and computer languages.

Legal requirements relating to data differ among countries and change frequently.

Selecting data management tools can be a major problem because of the huge number of products available.

Data security, quality, and integrity are critical yet are easily jeopardized.

Data Life Cycle

Data Sources & Collection

Internal Data. An organization’s internal data are about people, products, services, and processes.

Personal Data. IS users or other corporate employees may document their own expertise by creating personal data.

External Data. There are many sources for external data, ranging from commercial databases to sensors and satellites.

The Internet & Commercial Database Services. Some external data flow to an organization through electronic data interchange (EDI), through other company-to-company channels, or through the Internet.

Data Quality

Data Quality (DQ) is an extremely important issue since quality determines the data’s usefulness as well as the quality of the decisions based on the data.

Data Quality Problems (Strong et al., 1997)

Intrinsic DQ: Accuracy, objectivity, believability, and reputation.

Accessibility DQ: Accessibility and access security.

Contextual DQ: Relevancy, value added, timeliness, completeness, amount of data.

Representation DQ: Interpretability, ease of understanding, concise representation, consistent representation.

Object-Oriented Databases

The object-oriented database is the most widely used of the newest methods of data organization, especially for Web applications.

An object-oriented database is a part of the object-oriented paradigm, which also includes object-oriented programming, operating systems, and modeling.

Object-oriented databases are sometimes referred to as multimedia databases and are managed by special multimedia database management systems.

Document Management

Document management is the automated control of electronic documents, page images, spreadsheets, word processing documents, and complex, compound documents through their entire life cycle within an organization, from initial creation to final archiving.

Benefits of document management:
• Greater control over production, storage, and distribution of documents
• Greater efficiency in the reuse of information
• Control of a document through a workflow process
• Reduction of product cycle times

Case: United Services Automobile Association (USAA)

Problem: USAA is a large insurance company in Texas that serves over 2 million officers. In the 1980s, the company experienced extreme delays in data retrieval and searches.

Solution: Using an environment called Automated Insurance Environment, USAA has been transformed into a completely paperless company.

Results: The system reduces the cost of storing documents, improves customer service, and improves employee productivity. USAA now saves $70,500,000 on the 10,000,000 documents it handles annually.

Data Processing

Data processing in organizations can be viewed as either transactional or analytical.

Transactional: The data in transaction processing systems (TPS) are organized mainly in a hierarchical structure and are centrally processed. These databases and processing systems are known as operational systems.

Analytical: Analytical processing involves analysis of accumulated data, mainly by end-users. It includes DSS, EIS, Web applications, and other end-user activities.

Delivery Systems

A good data delivery system should be able to support:

Easy data access by the end-users themselves.

A quick decision-making process.

Accurate and effective decision making.

Flexible decision making.

Data Warehouses

The purpose of a data warehouse is to establish a data repository that makes operational data accessible in a form readily acceptable for analytical processing activities (e.g., decision support, EIS).

Data warehouses include a companion called metadata, meaning data about data.

Major benefits of data warehouses:
(1) The ability to reach data quickly, as they are located in one place.
(2) The ability to reach data easily, frequently by end-users themselves, using Web browsers.

Characteristics of Data Warehouses

1) Organization. Data are organized by detailed subjects.

2) Consistency. Data in different operational databases may be encoded differently. In the warehouse they will be coded in a consistent manner.

3) Time variant. The data are kept for 5 to 10 years so they can be used for trends, forecasting, and comparisons over time.

4) Non-volatile. Once entered into the warehouse, data are not updated.

5) Relational. The data warehouse uses a relational structure.

6) Client/server. The data warehouse uses a client/server architecture to give end users easy access to its data.
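
As an illustration of the consistency characteristic, the following minimal Python sketch recodes one attribute that two operational systems store differently. The system names ("billing", "crm"), the field names, and the code tables are hypothetical and are not taken from the chapter.

```python
# Hypothetical encodings used by two operational systems for the same attribute.
GENDER_CODES = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "0": "female"},
}


def to_warehouse_record(source, record):
    """Return a copy of record with the gender field recoded consistently."""
    mapping = GENDER_CODES[source]
    cleaned = dict(record)
    # Unrecognized codes are marked "unknown" rather than silently dropped.
    cleaned["gender"] = mapping.get(str(record["gender"]), "unknown")
    cleaned["source"] = source
    return cleaned


rows = [
    to_warehouse_record("billing", {"customer_id": 17, "gender": "F"}),
    to_warehouse_record("crm", {"customer_id": 42, "gender": 1}),
]
```

After loading, both rows carry the same vocabulary ("female", "male") regardless of how the source system encoded the value.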

Data Warehouse Suitability

Data warehousing is most appropriate for organizations in which some of the following apply:
• Large amounts of data need to be accessed by end-users.
• The operational data are stored in different systems.
• An information-based approach to management is in use.
• There is a large, diverse customer base.
• The same data are represented differently in different systems.
• Data are stored in highly technical formats that are difficult to decipher.
• Extensive end-user computing is performed.

Data Marts

An alternative used by many firms is the creation of a lower-cost, scaled-down version of a data warehouse, called a data mart. Data marts are small warehouses designed for a strategic business unit (SBU) or a department.

Two major types of data marts:
1) Replicated (dependent) data marts. Functional subsets of the data warehouse are replicated in smaller databases.
2) Stand-alone data marts. A company can have one or more independent data marts without having a data warehouse.
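
A replicated (dependent) data mart can be sketched with Python's built-in sqlite3 module. The table names, columns, and rows below are invented for illustration; a real warehouse would use its own schema and database product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("toys", "robot", 20.0), ("tools", "drill", 75.0), ("toys", "kite", 8.5)],
)

# The dependent mart replicates only the functional subset one department needs.
conn.execute(
    "CREATE TABLE toys_mart AS "
    "SELECT item, amount FROM warehouse_sales WHERE dept = 'toys'"
)
mart_rows = conn.execute(
    "SELECT item, amount FROM toys_mart ORDER BY item"
).fetchall()
```

A stand-alone mart would look the same from the user's side, except that its rows would be loaded directly from operational systems rather than copied from a warehouse table.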

Knowledge Discovery in Databases (KDD)

KDD is the process of extracting useful knowledge from volumes of data.

It is the subject of extensive research.

KDD’s objective is to identify valid, novel, potentially useful, and ultimately understandable patterns in data.

KDD is useful because it is supported by three technologies that are now sufficiently mature:
• Massive data collection
• Powerful multiprocessor computers
• Data mining algorithms

Evolution of KDD

Stages in the Evolution of Knowledge Discovery

Evolutionary Stage | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | What was my total revenue in the last five years? | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | What were unit sales in New England last March? | Relational databases (RDBMS), structured query language (SQL) | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (early 1990s) | Drill down to Boston? | Online analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Intelligent Data Mining (late 1990s) | What’s likely to happen to Boston unit sales next month? Why? | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery

Source: Courtesy of Accrue Software.
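
The business questions for the Data Access and Data Warehousing stages can be made concrete with a small SQL example, run here through Python's built-in sqlite3 module. The sales table, its columns, the dates, and the figures are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, city TEXT, sale_date TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("New England", "Boston", "2001-03-05", 120),
        ("New England", "Hartford", "2001-03-19", 80),
        ("New England", "Boston", "2001-04-02", 60),
        ("Midwest", "Chicago", "2001-03-11", 200),
    ],
)
march = ("New England", "2001-03-01", "2001-04-01")

# Data Access stage: "What were unit sales in New England last March?"
(march_units,) = conn.execute(
    "SELECT SUM(units) FROM sales "
    "WHERE region = ? AND sale_date >= ? AND sale_date < ?",
    march,
).fetchone()

# Data Warehousing stage: "Drill down to Boston?" (same question, one level lower)
by_city = conn.execute(
    "SELECT city, SUM(units) FROM sales "
    "WHERE region = ? AND sale_date >= ? AND sale_date < ? "
    "GROUP BY city ORDER BY city",
    march,
).fetchall()
```

Both queries are retrospective: they report what already happened. The final stage in the table asks a forward-looking question that a plain query like this cannot answer.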

Tools & Techniques of KDD

Ad-hoc queries allow users to request, in real time, information from the computer that is not available in periodic reports. Such answers are needed to expedite decision making.

Online analytical processing (OLAP) refers to such end-user activities as DSS modeling using spreadsheets and graphics, which are done online.

Ready-made Web-based analysis. Many vendors provide ready-made analytical tools, mostly in finance, marketing, and operations.

Data Mining

Data mining derives its name from the similarities between searching for valuable business information in a large database and mining a mountain for valuable ore.

Data mining technology can generate new business opportunities by providing these capabilities:
• Automated prediction of trends and behaviors. Data mining automates the process of finding predictive information in large databases.
• Automated discovery of previously unknown patterns. Data mining tools identify previously hidden patterns in one step.
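
The chapter does not name a specific prediction algorithm. As a deliberately simple stand-in for "automated prediction of trends", the sketch below fits a straight line to past values by ordinary least squares and extrapolates one period ahead. The function name and the sales figures are illustrative assumptions.

```python
def linear_forecast(values):
    """Fit y = a + b*t by ordinary least squares and predict the next period."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    # Periods are numbered 0..n-1, so the next period is t = n.
    return intercept + slope * n


monthly_units = [100, 110, 120, 130]  # made-up unit sales for four months
next_month = linear_forecast(monthly_units)
```

Production data mining tools typically use richer models, but the structure is the same: learn a pattern from accumulated data, then apply it to a case that has not yet occurred.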

Applications of Data Mining

Data mining is currently being used in the following areas:
• Retailing & sales
• Banking
• Manufacturing & production
• Brokerage & securities trading
• Computer hardware & software
• Insurance
• Police work
• Government & defense
• Airlines
• Health care
• Broadcasting
• Marketing

Text & Web Mining

Text mining is the application of data mining to nonstructured or less structured text files. Text mining helps organizations to do the following:

Find the “hidden” content of documents, including additional useful relationships.

Group documents by common themes.

Web Mining refers to mining tools used to analyze a large amount of data on the Web, such as what customers are doing on the Web—that is, to analyze clickstream data.
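
One basic form of clickstream analysis is counting which page-to-page steps visitors take most often. The sketch below assumes a hypothetical log of (session id, page) pairs in visit order; real Web logs have more fields and need cleaning first.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page) in the order visited.
clicks = [
    ("s1", "/home"), ("s1", "/toys"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/toys"),
    ("s3", "/home"), ("s3", "/tools"), ("s3", "/cart"),
]


def page_transitions(events):
    """Count how often visitors move from one page to the next within a session."""
    last_page = {}
    transitions = Counter()
    for session, page in events:
        if session in last_page:
            transitions[(last_page[session], page)] += 1
        last_page[session] = page
    return transitions


transitions = page_transitions(clicks)
most_common_step = transitions.most_common(1)[0]
```

Here the most frequent step is from "/home" to "/toys", the kind of finding a marketer might use to decide what to feature on the home page.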

Data Visualization

Data visualization refers to the presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, and animation.

Case: Data Visualization Helps Haworth

Problem: Haworth Corporation, a major office furniture manufacturer, has maintained a competitive edge by offering customization. But many customers are unable to visualize the 21 million potential product combinations.

Solution: Computer visualization software enables sales representatives with laptops to show customers exactly what they are ordering.

Results: Reduced time spent between sales reps and CAD operators, and increased customer satisfaction with quicker delivery.

Multidimensionality

Modern data and information may have several dimensions. For example, management may be interested in examining sales figures in a certain city by product, by time period, by salesperson, and by store.

It is important to provide the user with a technology that allows him or her to add, replace, or change dimensions quickly and easily in a table and/or graphical presentation.

The technology of slicing, dicing, and similar manipulations is called multidimensionality.

Multidimensionality (cont.)

Three factors are considered in multidimensionality: dimensions, measures, and time.

Examples of dimensions: products, salespeople, market segments, business units, geographical locations, distribution channels, countries, industries.

Examples of measures: money, sales volume, head count, inventory profit, actual versus forecasted results.

Examples of time: daily, weekly, monthly, quarterly, yearly.
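
Slicing and changing dimensions can be sketched in a few lines of Python. The fact records, the dimension names, and the roll_up helper below are hypothetical; multidimensional database products implement the same idea with precomputed structures.

```python
from collections import defaultdict

# Each fact has dimension values (product, city), a time period, and a measure.
facts = [
    {"product": "desk", "city": "Boston", "quarter": "Q1", "sales": 100},
    {"product": "desk", "city": "Boston", "quarter": "Q2", "sales": 150},
    {"product": "chair", "city": "Boston", "quarter": "Q1", "sales": 40},
    {"product": "desk", "city": "Albany", "quarter": "Q1", "sales": 70},
]


def roll_up(rows, by, measure="sales", **fixed):
    """Total a measure by the chosen dimensions, keeping only rows that match fixed."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[dim] == value for dim, value in fixed.items()):
            totals[tuple(row[dim] for dim in by)] += row[measure]
    return dict(totals)


# Slice: fix one dimension (city) and view the measure by product.
by_product_in_boston = roll_up(facts, by=["product"], city="Boston")
# Change dimensions: view the same measure by city and time period.
by_city_and_quarter = roll_up(facts, by=["city", "quarter"])
```

Swapping the `by` list or the fixed values is all it takes to produce a different view, which is the ease of adding, replacing, or changing dimensions that the slides call for.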

Advantages of Multidimensionality

Data can be presented and navigated with relative ease.

Multidimensional databases are easier to maintain.

Multidimensional databases are significantly faster than relational databases as a result of the additional dimensions and the anticipation of how the data will be accessed by users.

Geographic Information Systems (GIS)

A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps.
– Every record or digital object has an identified geographical location.

Banks are using GIS for plotting the following:
– Branch and ATM locations
– Customer demographics
– Volume and traffic patterns of business activities
– Geographical area served by each branch
– Market potential for banking activities
– Strengths and weaknesses against the competition
– Branch performance
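
Because every record carries a location, simple spatial questions become computable. The sketch below uses the haversine great-circle formula to find the branch nearest to a customer. The branch names and coordinates are invented, and a real GIS would use map layers and spatial indexes rather than a plain dictionary.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical branch records, each tagged with a (latitude, longitude) location.
branches = {
    "Downtown": (42.3601, -71.0589),
    "Airport": (42.3656, -71.0096),
    "Suburb": (42.4473, -71.2245),
}


def nearest_branch(lat, lon):
    """Return the name of the branch closest to the given customer location."""
    return min(branches, key=lambda name: haversine_km(lat, lon, *branches[name]))


closest = nearest_branch(42.3550, -71.0650)
```

The same distance function could support several of the banking uses listed above, such as estimating the geographical area served by each branch.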

Geographic Information Systems (GIS) (cont.)

GIS software varies in its capabilities, from simple computerized mapping systems to enterprise-wide tools for decision support data analysis.

GIS data are available from a wide variety of sources. Government sources (via the Internet and CD-ROM) provide some data, while vendors provide diversified commercial data as well.

GIS & decision making. The graphical format of GIS makes it easy for managers to visualize the data and make decisions.

GIS and the Internet or intranet. Most major GIS software vendors are providing Web access, such as embedded browsers, or a Web/Internet/intranet server that hooks directly into their software.

Emerging GIS applications.

Visual Interactive Modeling (VIM)

Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management decisions on goals such as profit or market share.
– A VIM can be used both for supporting decisions and for training.
– It can represent a static or a dynamic system.

Visual interactive simulation (VIS) is one of the most developed areas in VIM.
– It is a decision simulation in which the end-user watches the progress of the simulation model in an animated form using graphics terminals.

Virtual Reality

Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics delivered to the user through a head-mounted display.

VR applications to date have been used to support decision making indirectly.
– Boeing has developed a virtual aircraft mock-up to test designs.
– At Volvo, VR is used to test virtual cars in virtual accidents.

Data visualization helps financial decision makers by using visual, spatial & aural immersion virtual systems.
– Some stock brokerages have a VR application in which users surf over a landscape of stock futures, with color, hue, and intensity.

Marketing Transaction Database

The marketing transaction database (MTD) combines many of the characteristics of static databases and marketing data sources into a new database that allows marketers to engage in real-time personalization and target every interaction with customers.

• The MTD provides dynamic, or interactive, functions not available with traditional types of marketing databases.
• Exchanging information allows marketers to refine their understanding of each customer continuously.
• Data mining, data warehousing, and MTDs are delivered on the Internet and intranets.

Implementation Examples

The following examples illustrate how companies use data mining and warehousing to support the new marketing approaches:

Alamo Rent-a-Car discovered that German tourists liked bigger cars. So now, when Alamo advertises its rental business in Germany, the ads include information about its larger models.

Au Bon Pain Company discovered that they were not selling as much cream cheese as planned. When they analyzed point-of-sale data, they found that customers preferred small, one-serving packaging.

AT&T and MCI sift through terabytes of customer phone data to fine-tune marketing campaigns and determine new discount calling plans.

Case: Data Mining Powers Wal-Mart

Wal-Mart’s formula for success owes much to the company’s multimillion-dollar investment in data warehousing.

The systems house data on point of sale, inventory, products in transit, market statistics, customer demographics, finance, product returns, and supplier performance.
– The data are used for three broad areas of decision support:
• analyzing trends
• managing inventory
• understanding customers

The data warehouse is available over an extranet to store managers and suppliers.
– In 2001, 5,000 users made over 35,000 database queries each day.

Web-based Data Management Systems

Business intelligence activities, from data acquisition through warehousing to mining, can be performed with Web tools or are interrelated with Web technologies and e-commerce.

E-commerce software vendors are providing Web tools that connect the data warehouse with EC ordering and cataloging systems.
– e.g., Tradelink, a product of Hitachi

Data warehousing and decision support vendors are connecting their products with Web technologies and EC.
– e.g., Comshare’s DecisionWeb, Brio’s Brio One, Web Intelligence from Business Objects, and Cognos’s DataMerchant.

Corporate Portals

Web-based Data Acquisition & Agents

Web-based Data Acquisition
• Traditional data acquisition has become a pervasive element in today’s business environment.
• This acquisition includes both the recording of information from online surveys and questionnaires, and direct measurements taken in the manufacturing environment.

Intelligent Data Warehouse
• The amount of data in the data warehouse can be very large.
• While the organization of data is done in a way that permits easy search, it still may be useful to have a search engine for specific applications.

Managerial Issues

Cost–benefit issues & justification. A cost–benefit analysis must be undertaken before any commitment to new technologies.

Where to store data physically. Should data be distributed close to their sources? Or should data be centralized for easier control?

Legal issues. Data mining gives rise to a variety of legal issues.

The legacy data problem. What should be done with masses of information already stored in a variety of formats, often known as the legacy data acquisition problem?

Managerial Issues (cont.)

Disaster recovery. How well can an organization’s business processes recover after an information system disaster?

Internal or external? Should a firm store & maintain its databases internally or externally?

Data security and ethics. Are the company’s competitive data safe from external snooping or sabotage?

Ethics. Should people have to pay for use of online data?

Privacy. Collecting data in a warehouse and conducting data mining may result in the invasion of privacy.

Data purging. When is it beneficial to “clean house” and purge information systems of obsolete or non–cost-effective data?

Data delivery. Moving data efficiently around an enterprise is also a problem.