
CHAPTER 4 DATABASES AND DATA WAREHOUSES A Gold Mine of Information


Today, Organizations Need...

– Information to compete effectively
– Information just to stay alive in the information age
– Information organized in such a way that you can easily and quickly get to it
– Information-processing tools that help you work with information


Using Databases and Data Warehouses Instead of Shopping Carts

Mervyn's: data warehouse and data mining tools


YOUR FOCUS IN THIS CHAPTER

The Difference Between Logical and Physical Views of Information

Databases and Database Management Systems

How You Can Develop Database Applications

Data Warehouses and Data Mining Tools


THREE THINGS ORGANIZATIONS DO WITH INFORMATION

1. Process information in the form of transactions
2. Use information to make a decision
3. Manage information while it's used


PROCESSING INFORMATION IN THE FORM OF TRANSACTIONS

Examples include payroll processing, order processing, and handling your registration requests for classes.

This is called ONLINE TRANSACTION PROCESSING (OLTP) - the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information.

Operational databases support OLTP.
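A minimal sketch of a single OLTP transaction, written with Python's built-in sqlite3 module. The Part and OrderLine tables, their sample row, and the place_order helper are hypothetical: the point is that gathering the order, processing it, and updating stock either all take effect or none do.

```python
import sqlite3

# Hypothetical order-processing tables, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part (
    PartNumber TEXT PRIMARY KEY,
    QtyOnHand  INTEGER NOT NULL CHECK (QtyOnHand >= 0)
);
CREATE TABLE OrderLine (
    OrderID    INTEGER PRIMARY KEY,
    PartNumber TEXT NOT NULL,
    Qty        INTEGER NOT NULL
);
INSERT INTO Part VALUES ('P100', 5);
""")


def place_order(conn, part_number, qty):
    """Record an order and update stock as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success; rolls back if an exception escapes
            # Gather the input: record the order line.
            conn.execute(
                "INSERT INTO OrderLine (PartNumber, Qty) VALUES (?, ?)",
                (part_number, qty),
            )
            # Update existing information; the CHECK constraint rejects negative stock.
            conn.execute(
                "UPDATE Part SET QtyOnHand = QtyOnHand - ? WHERE PartNumber = ?",
                (qty, part_number),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(place_order(conn, "P100", 3))  # True: 2 units left
print(place_order(conn, "P100", 4))  # False: stock would go negative, so nothing is saved
```

Because the second order fails on the stock update, its already-inserted order line is rolled back too, which is what keeps operational data consistent.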


USING INFORMATION TO MAKE A DECISION

Examples include answering such questions as "How many senior-level marketing majors have not taken statistics?"

This is called ONLINE ANALYTICAL PROCESSING (OLAP) - the manipulation of information to support decision making.

Data warehouses support OLAP.
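The marketing-majors question can be written as a read-only SQL query. This sketch assumes hypothetical Student and Enrollment tables filled with invented rows; against a real data warehouse the table and column names would differ, but the shape of the query (filter, exclude, count) would be similar.

```python
import sqlite3

# Hypothetical registration data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Major TEXT, Level TEXT);
CREATE TABLE Enrollment (StudentID INTEGER, Course TEXT);
INSERT INTO Student VALUES
    (1, 'Marketing', 'Senior'), (2, 'Marketing', 'Senior'),
    (3, 'Marketing', 'Junior'), (4, 'Finance', 'Senior');
INSERT INTO Enrollment VALUES
    (1, 'Statistics'), (2, 'Accounting'), (4, 'Accounting');
""")

# Read-only analysis: nothing is inserted or updated.
answer = conn.execute("""
    SELECT COUNT(*)
    FROM Student AS s
    WHERE s.Major = 'Marketing'
      AND s.Level = 'Senior'
      AND NOT EXISTS (
          SELECT 1 FROM Enrollment AS e
          WHERE e.StudentID = s.StudentID AND e.Course = 'Statistics'
      )
""").fetchone()[0]
print(answer)  # 1: student 2 is the only senior marketing major without statistics
```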


MANAGING INFORMATION WHILE IT'S USED

– Determining who can view or use information
– Specifying how to back up information
– Identifying what storage technologies to use


Most importantly, managing information includes organizing it so that people can logically use it without having to know anything about its physical structure. The difference between logical and physical is key.


In managing information, the physical view deals with the structure of information as it resides on various storage media.

The logical view deals with how knowledge workers view their information needs, and includes such terms as:
– CHARACTER - our smallest unit of information.
– FIELD - group of related characters.
– RECORD - group of related fields.
– FILE - group of related records.
– DATABASE - group of logically associated files.
– DATA WAREHOUSE - information from many databases.


DATABASE - a collection of information that you organize and access according to the logical structure of that information.

A database is actually composed of two parts:
1. The information itself
– the files that are logically associated
2. The logical structure of the information
– called the data dictionary


A Database Is a Collection of Information

Most databases contain two or more files with related information.

The Inventory database (Figure 4.4, page 125) contains two files - Part and Facility.

These two files are logically related because parts are stored in facilities and because you would use both of these files to manage your inventory.


A Database Contains a Logical Structure

You organize and access a database by its logical structure, not its physical position.

DATA DICTIONARY - contains the logical structure of information in a database.

The data dictionary contains the logical properties that describe information in a database.

See Figure 4.5 (page 126) for the data dictionary of the Percentage Markup field in the Inventory database.
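SQLite keeps its own catalog of each table's logical properties, which plays the role of a data dictionary. A sketch, assuming a hypothetical Part table whose columns only loosely mirror the Inventory example (the 0.25 default markup is invented); the describe helper, also hypothetical, reads those properties through PRAGMA table_info without touching any stored records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Part table; the field names only loosely follow the text.
conn.execute("""
CREATE TABLE Part (
    PartNumber       TEXT PRIMARY KEY,
    PartName         TEXT NOT NULL,
    PercentageMarkup REAL NOT NULL DEFAULT 0.25
)
""")


def describe(conn, table):
    """Return {field name: logical properties}, read from SQLite's catalog.

    `table` is interpolated directly, so only pass trusted table names.
    """
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {
        name: {
            "type": ftype,
            "required": bool(notnull),
            "default": default,  # SQL text of the DEFAULT clause, or None
            "primary_key": bool(pk),
        }
        for _cid, name, ftype, notnull, default, pk in rows
    }


for field, props in describe(conn, "Part").items():
    print(field, props)
```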


A Database Has Logical Ties Among the Information

A PRIMARY KEY is a field in a database file that uniquely identifies each record.

A FOREIGN KEY is a primary key of one file that also appears in another file. So, foreign keys specify how files are logically related.

For example, the Part and Facility files are logically related. So, in Figure 4.4 you can see that Facility Number (the primary key for the Facility file) exists in the Part file (where it's a foreign key).


A Database Contains Built-in Integrity Constraints

An INTEGRITY CONSTRAINT is a rule that helps assure the quality of the information in a database.

A registration database at your school includes integrity constraints concerning prerequisites for certain classes.

Our Inventory database includes an integrity constraint that says a part in the Part file cannot be assigned to a facility that does not exist in the Facility file.
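The Part/Facility example can be expressed in SQL. In this sketch the table and column names are assumed rather than copied from Figure 4.4: FacilityNumber is the primary key of Facility and a foreign key in Part, and the REFERENCES clause enforces the integrity constraint that a part cannot be assigned to a facility that does not exist. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; the add_part helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE Facility (
    FacilityNumber TEXT PRIMARY KEY,      -- primary key of Facility
    Manager        TEXT
);
CREATE TABLE Part (
    PartNumber     TEXT PRIMARY KEY,      -- primary key of Part
    FacilityNumber TEXT NOT NULL
        REFERENCES Facility(FacilityNumber)  -- foreign key: the logical tie to Facility
);
INSERT INTO Facility VALUES ('F1', 'Lee');
""")


def add_part(conn, part_number, facility_number):
    """Insert a part; return False if an integrity constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Part VALUES (?, ?)", (part_number, facility_number))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_part(conn, "P1", "F1"))  # True
print(add_part(conn, "P2", "F9"))  # False: facility F9 does not exist
```

The same IntegrityError also guards the primary key: inserting a second part numbered P1 would be rejected.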


DATABASE MANAGEMENT SYSTEM (DBMS) - the software you use to specify the logical organization for a database and access it.

A DBMS contains 5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem


DBMS ENGINE - accepts logical requests from the various other DBMS subsystems, converts them to their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.

Recall that:
– PHYSICAL VIEW deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.
– LOGICAL VIEW focuses on how you need to arrange and access information to meet your particular business needs.
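You can watch a DBMS engine translate a logical request into a physical access strategy. This sketch uses SQLite's EXPLAIN QUERY PLAN on a hypothetical Part table; the physical_plan helper is assumed for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the PRIMARY KEY gives SQLite an index on PartNumber.
conn.execute("CREATE TABLE Part (PartNumber TEXT PRIMARY KEY, PartName TEXT)")


def physical_plan(conn, sql, params=()):
    """Return SQLite's description of how it would physically execute `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


by_key = physical_plan(conn, "SELECT * FROM Part WHERE PartNumber = ?", ("P1",))
by_name = physical_plan(conn, "SELECT * FROM Part WHERE PartName = ?", ("Bolt",))
print(by_key)   # typically an index SEARCH
print(by_name)  # typically a full-table SCAN
```

Both requests look alike at the logical level, yet the engine chooses an index lookup for one and a scan of every record for the other.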


DATA DEFINITION SUBSYSTEM - helps you create and maintain the data dictionary and define the structure of the files in a database.

You use this subsystem to define the logical structure of the information when you first create a database.

Once you've created a database, you use this subsystem to define new fields, delete fields, or change field properties.

Figure 4.5 (page 126) shows this subsystem's screen for the Part file.


DATA MANIPULATION SUBSYSTEM - helps you add, change, and delete information in a database and mine it for valuable information.

This subsystem is most often the primary interface between you as a user and the information contained in a database.

Tools in this subsystem include views, report generators, query-by-example tools, and structured query language.


DATA MANIPULATION TOOLS

VIEW - allows you to see the content of a database file, make whatever changes you want, perform simple sorting, and query to find the location of specific information. See Figure 4.7 page 129.

REPORT GENERATOR - helps you quickly define formats of reports and what information you want to see in a report. See Figures 4.8 and 4.9 page 130.


QUERY-BY-EXAMPLE (QBE) TOOL - helps you graphically design the answer to a question. Figure 4.10 (page 130) shows the QBE for displaying the names and phone numbers of facility managers in charge of parts that cost more than $10.

STRUCTURED QUERY LANGUAGE (SQL) - a standardized fourth-generation language found in most database environments. SQL does the same job as QBE, except that you perform a query by writing a statement instead of pointing, clicking, and dragging.
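As an illustration, the question answered by the QBE in Figure 4.10 can be written as one SQL statement. The Facility and Part columns and the sample rows below are assumptions, not the book's actual data.

```python
import sqlite3

# Hypothetical Facility and Part tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facility (FacilityNumber TEXT PRIMARY KEY, Manager TEXT, Phone TEXT);
CREATE TABLE Part (PartNumber TEXT PRIMARY KEY, Cost REAL, FacilityNumber TEXT);
INSERT INTO Facility VALUES ('F1', 'Lee', '555-0101'), ('F2', 'Ortiz', '555-0102');
INSERT INTO Part VALUES ('P1', 12.50, 'F1'), ('P2', 4.00, 'F2'), ('P3', 15.00, 'F1');
""")

# Names and phone numbers of managers in charge of parts costing more than $10.
rows = conn.execute("""
    SELECT DISTINCT f.Manager, f.Phone
    FROM Facility AS f
    JOIN Part AS p ON p.FacilityNumber = f.FacilityNumber
    WHERE p.Cost > 10
    ORDER BY f.Manager
""").fetchall()
print(rows)  # [('Lee', '555-0101')]
```

The JOIN follows the foreign key from Part to Facility, and DISTINCT keeps a manager from appearing once per qualifying part.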


APPLICATION GENERATION SUBSYSTEM - contains facilities to help you develop transaction-intensive applications. This subsystem includes:
– Tools for creating data entry screens (see Figure 4.12, page 131, for an example)
– Programming languages specific to a particular DBMS
– Interfaces to commonly used programming languages that are independent of any DBMS


DATA ADMINISTRATION SUBSYSTEM - helps you manage the overall database environment by providing facilities for:
– Backup and recovery
– Security management
– Query optimization
– Reorganization
– Concurrency control
– Change management
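As one example of these facilities, Python's sqlite3 module exposes SQLite's online backup API through Connection.backup. A minimal sketch of backup and recovery, with a hypothetical Part table and both copies held in memory for brevity.

```python
import sqlite3

# Hypothetical live database with one table and one row.
live = sqlite3.connect(":memory:")
live.executescript("""
CREATE TABLE Part (PartNumber TEXT PRIMARY KEY, QtyOnHand INTEGER);
INSERT INTO Part VALUES ('P1', 10);
""")

# Backup: copy every page of the live database into a second database.
# A real backup would target a file, e.g. sqlite3.connect("inventory-backup.db")
# (hypothetical file name).
spare = sqlite3.connect(":memory:")
live.backup(spare)

# Simulate damage to the live data.
with live:
    live.execute("DELETE FROM Part")

# Recovery: copy the backup back over the live database.
spare.backup(live)
recovered = live.execute("SELECT PartNumber, QtyOnHand FROM Part").fetchall()
print(recovered)  # [('P1', 10)]
```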


THE RELATIONAL DATABASE MODEL - a database model that uses a series of two-dimensional tables or files to store information.

This is the most popular model. Each table is called a RELATION. A relation contains information about a particular ENTITY CLASS (a concept, such as people, places, or things, about which you wish to store information and that you can identify with a unique key).


Figure 4.14 (page 136) shows a relational database for a video rental store.

The entity classes are Customer, Video, Video Rental, and Distributor.

Notice how these tables are related to each other through the use of foreign keys.

In the Video Rental relation, you'll find a primary key that uses more than one field to create a unique description. This is called a COMPOSITE PRIMARY KEY.

A primary key that uses only one field is called an ATOMIC PRIMARY KEY.
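A sketch of a composite primary key in SQL. It assumes a hypothetical VideoRental table whose key is the combination of CustomerNumber, VideoNumber, and DateRented (illustrative field names, not necessarily those of Figure 4.14), so no single field has to be unique on its own. The rent helper is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE VideoRental (
    CustomerNumber TEXT NOT NULL,
    VideoNumber    TEXT NOT NULL,
    DateRented     TEXT NOT NULL,
    DateReturned   TEXT,
    -- Composite primary key: only the combination of the three fields must be unique.
    PRIMARY KEY (CustomerNumber, VideoNumber, DateRented)
)
""")


def rent(conn, customer, video, date_rented):
    """Record a rental; return False if the composite key is already taken."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO VideoRental (CustomerNumber, VideoNumber, DateRented) "
                "VALUES (?, ?, ?)",
                (customer, video, date_rented),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(rent(conn, "C1", "V1", "2024-01-05"))  # True
print(rent(conn, "C1", "V1", "2024-02-10"))  # True: same customer and video, new date
print(rent(conn, "C1", "V1", "2024-01-05"))  # False: duplicate key
```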


THE OBJECT-ORIENTED (O-O) DATABASE MODEL - a database model that brings together, stores, and allows you to work with both information and procedures that act on the information.

An OBJECT-ORIENTED DATABASE MANAGEMENT SYSTEM (O-O DBMS) is the DBMS software that allows you to develop and work with an O-O database.


This model takes advantage of the concept of an OBJECT - a software module containing information that describes an entity class along with a list of procedures that can act on the information describing the entity class.

Figure 4.15 (page 138) shows the same video rental store using the O-O database model.

Notice that the objects (entity classes) - which include Customer, Video Rental, Video, and Distributor - contain both information and procedures for working with that information.

See Appendix C for more on objects.
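To make the idea of an object concrete, here is a hypothetical Video object written as a Python class: its attributes are the describing information, and its methods are the procedures that act on it. This is only an in-memory illustration of the object concept, not an O-O DBMS, which would also store such objects persistently; all names and values are invented.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical Video object: describing information plus the procedures that act on it."""

    # Information describing the entity class
    video_number: str
    title: str
    rental_rate: float
    times_rented: int = 0
    on_shelf: bool = True

    # Procedures that act on that information
    def check_out(self) -> None:
        if not self.on_shelf:
            raise ValueError(f"{self.video_number} is already rented")
        self.on_shelf = False
        self.times_rented += 1

    def check_in(self) -> None:
        self.on_shelf = True

    def revenue(self) -> float:
        return self.times_rented * self.rental_rate


v = Video("V1", "Casablanca", 3.0)
v.check_out()
v.check_in()
v.check_out()
print(v.times_rented, v.revenue())  # 2 6.0
```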


DEVELOPING YOUR OWN DATABASE

Being able to develop your own database is a part of knowledge worker computing.

Building a database for your personal needs includes the following 4 steps:
1. Defining entity classes and primary keys
2. Defining relationships among entity classes
3. Defining information (fields) for each relation
4. Using a data definition language to create the database

Follow along as we build the database to support the report in Figure 4.16 on page 140.


You own a small business and are interested in tracking employees by the department in which they work, job assignment, and the number of hours assigned.

Each of your employees can be assigned to only one department, but a department may have many employees (a department, however, is not required to have any employees assigned to it). Each employee can be assigned to any number of jobs and a job can have many employees assigned to it, but it's not necessary that any employees be assigned to a certain job.


#1 - DEFINING ENTITY CLASSES AND PRIMARY KEYS

From the report in Figure 4.16, you can identify the entity classes as Employee, Department, and Job.

Now, for each entity class, you must define a primary key that provides a unique description. These include:
• Employee entity class - Emp ID (e.g., 2345 for Smith)
• Department entity class - Dept (e.g., 15)
• Job entity class - Job (e.g., 14 for Acct)


#2 - DEFINING RELATIONSHIPS AMONG ENTITY CLASSES

For this step, use an ENTITY-RELATIONSHIP (E-R) DIAGRAM, a graphical method of representing entity classes and their relationships.

See Figure 4.17 (page 140) for the initial E-R diagram of our database and a listing of E-R diagram symbols.


An Employee must be assigned to a Department. An Employee cannot be assigned to more than one Department. A Department may have many Employees assigned to it. A Department is not required to have any Employees assigned to it.

EMPLOYEE M:1 DEPARTMENT


After building the initial E-R diagram, you must follow the process of normalization.

NORMALIZATION is a process of assuring that a relational database structure can be implemented as a series of two-dimensional tables.

Normalization includes the following 3 steps:
1. Eliminate repeating groups or M:M relationships
2. Assure that each field in a relation depends only on the primary key of that relation
3. Remove all derived fields from the relations


The first rule of normalization states that no M:M relationships can exist.

There is an M:M relationship between Employee and Job. You eliminate it by creating an INTERSECTION RELATION - a relation you create to eliminate a repeating group. An intersection relation has a composite primary key that consists of the primary key fields from the two intersecting relations.

In Figure 4.18 (page 142), we created an intersection relation called Employee-Job to eliminate the M:M relationship.


#3 - DEFINING INFORMATION (FIELDS) FOR EACH RELATION

In this step, you follow rules #2 and #3 of normalization.

Your goal here is twofold:
1. Make sure that the information in each relation is indeed in the correct relation
2. Make sure that the information cannot be derived from other information


To determine if information is in the correct relation, ask: "Does this piece of information depend only on the primary key for this relation?" If the answer is yes, the information is in the correct relation.

In the Employee relation (Figure 4.20, page 144), we currently store Dept Sup. Does Dept Sup depend on Emp ID?

The answer is no - Dept Sup depends on Dept, so it should be in the Department relation.


Derived information - information that can be mathematically determined from other information - should not be stored in your database.

For example, # Emp is a field in the Department relation.

However, we can simply count the number of occurrences of each Dept in the Employee relation and determine the number of employees.

So, we remove # Emp from the database.
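Instead of storing # Emp, a query can derive it whenever it is needed. A sketch with hypothetical Department and Employee tables (Emp ID 2345 for Smith and Dept 15 come from the text; the other rows and names are invented). The LEFT JOIN keeps departments that have no employees, which the business rules allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (Dept TEXT PRIMARY KEY, DeptSup TEXT);
CREATE TABLE Employee (EmpID TEXT PRIMARY KEY, Name TEXT, Dept TEXT);
INSERT INTO Department VALUES ('15', 'Jones'), ('16', 'Kim'), ('17', 'Patel');
INSERT INTO Employee VALUES
    ('2345', 'Smith', '15'), ('2346', 'Garcia', '15'), ('2347', 'Chen', '16');
""")

# "# Emp" is computed on demand rather than stored, so it can never be stale.
# LEFT JOIN keeps departments with no employees; COUNT(e.EmpID) counts them as 0.
emp_counts = dict(conn.execute("""
    SELECT d.Dept, COUNT(e.EmpID)
    FROM Department AS d
    LEFT JOIN Employee AS e ON e.Dept = d.Dept
    GROUP BY d.Dept
""").fetchall())
print(emp_counts)
```

With these sample rows, department 15 has two employees, 16 has one, and 17 has none.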


#4 - USING A DATA DEFINITION LANGUAGE TO CREATE THE DATABASE

The final step is to actually create the relations you identified in steps 1-3. You do this with a data definition language. This step includes:
– Developing a data dictionary
– Defining the various relations
– Defining primary keys and relationships


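Employee Relation

Field Name | Type | Length | Description
Emp ID | Character | 4 | Employee code
Name | Character | 20 |
Dept | Character | 2 |

Job Relation

Field Name | Type | Length | Description
Job | Character | 2 | Job code
Job Name | Character | 10 |

Department Relation

Field Name | Type | Length | Description
Department | Character | 2 | Department code
Dept Sup | Character | 20 |

Employee-Job Relation

Field Name | Type | Length | Description
Emp ID | Character | 4 | Employee code
Job | Character | 2 |
Hours | Number | Single precision |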
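One way to carry out step 4 is to express the relations above as SQL data definition statements. This sketch uses SQLite; the identifiers (for example DeptSup for Dept Sup, EmployeeJob for Employee-Job, and Dept as the Department key) are assumed spellings, SQLite accepts but does not enforce CHAR lengths, and all sample rows except Smith (2345), Dept 15, and Job 14 (Acct) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# SQLite records CHAR(n) lengths but does not enforce them.
conn.executescript("""
CREATE TABLE Department (
    Dept    CHAR(2) PRIMARY KEY,          -- department code
    DeptSup CHAR(20)
);
CREATE TABLE Job (
    Job     CHAR(2) PRIMARY KEY,          -- job code
    JobName CHAR(10)
);
CREATE TABLE Employee (
    EmpID   CHAR(4) PRIMARY KEY,          -- employee code
    Name    CHAR(20),
    Dept    CHAR(2) NOT NULL REFERENCES Department(Dept)  -- exactly one department each
);
CREATE TABLE EmployeeJob (                -- intersection relation for Employee M:M Job
    EmpID   CHAR(4) REFERENCES Employee(EmpID),
    Job     CHAR(2) REFERENCES Job(Job),
    Hours   REAL,
    PRIMARY KEY (EmpID, Job)              -- composite primary key
);
""")

# Sample rows; the supervisor name and hours are invented.
with conn:
    conn.execute("INSERT INTO Department VALUES ('15', 'Jones')")
    conn.execute("INSERT INTO Job VALUES ('14', 'Acct')")
    conn.execute("INSERT INTO Employee VALUES ('2345', 'Smith', '15')")
    conn.execute("INSERT INTO EmployeeJob VALUES ('2345', '14', 10)")

assignments = conn.execute("""
    SELECT e.Name, j.JobName, ej.Hours
    FROM EmployeeJob AS ej
    JOIN Employee AS e ON e.EmpID = ej.EmpID
    JOIN Job AS j ON j.Job = ej.Job
""").fetchall()
print(assignments)  # [('Smith', 'Acct', 10.0)]
```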


DATA WAREHOUSE - a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.

Data warehouses...
– are a logical extension of databases
– support OLAP
– are among the newest and hottest buzzwords and concepts in the IT field


DATA WAREHOUSE FEATURES

Data warehouses combine information from different databases
– Making them a true repository of all an organization's information

Data warehouses are multi-dimensional
– As opposed to 2 dimensions in the relational model
– Often called hypercubes (see Figure 4.23, page 148)

Data warehouses support decision making
– While databases support OLTP, data warehouses support OLAP


DATA MINING TOOLS - the software tools you use to query information in a data warehouse.

QUERY-AND-REPORTING TOOLS - QBE tools, SQL, and report generators.

INTELLIGENT AGENTS - various artificial intelligence tools that form the basis for "information discovery" in OLAP.

MULTIDIMENSIONAL ANALYSIS (MDA) TOOLS - slice-and-dice techniques that allow you to view multidimensional information from different perspectives.
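A toy illustration of slicing a hypercube, in plain Python rather than a real MDA tool. The cube, its three dimensions (product, region, quarter), and the slice_cube and roll_up helpers are hypothetical: a slice fixes one dimension to a single value, and roll_up totals the measure along another dimension so you can view the data from that perspective.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, quarter) -> units sold.
DIMS = ("product", "region", "quarter")
cube = {
    ("Bikes", "East", "Q1"): 120,
    ("Bikes", "West", "Q1"): 80,
    ("Bikes", "East", "Q2"): 150,
    ("Helmets", "East", "Q1"): 60,
    ("Helmets", "West", "Q2"): 40,
}


def slice_cube(cube, dim, value):
    """Keep only the cells where one dimension equals `value` (a slice)."""
    i = DIMS.index(dim)
    return {key: units for key, units in cube.items() if key[i] == value}


def roll_up(cube, dim):
    """Total units along one dimension: view the cube from that perspective."""
    i = DIMS.index(dim)
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[i]] += units
    return dict(totals)


print(roll_up(cube, "quarter"))                                # {'Q1': 260, 'Q2': 190}
print(roll_up(slice_cube(cube, "region", "East"), "product"))  # {'Bikes': 270, 'Helmets': 60}
```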


IMPORTANT CONSIDERATIONS IN USING A DATA WAREHOUSE

– Do you need a data warehouse?
– Do you already have a data warehouse?
– Who will the users be?
– How up-to-date must the information be?
– What data mining tools do you need?


MANAGING THE INFORMATION RESOURCE

How will changes in technology affect organizing and managing information?

What types of database models and databases are most appropriate?

Who should oversee the organization's information?


OVERSEEING YOUR ORGANIZATION'S INFORMATION

CHIEF INFORMATION OFFICER (CIO) is the IT manager who directs all IT systems and personnel while communicating directly with the highest levels of the organization.

DATA ADMINISTRATION plans for, oversees the development of, and monitors the information resource.

DATABASE ADMINISTRATION is responsible for the more technical and operational aspects of managing information in databases.


MANAGING THE INFORMATION RESOURCE

Is information ownership a consideration?

What are the ethics involved in organizing and managing information?

How should databases and database applications be developed and maintained?


TO SUMMARIZE

How we view information:

– The physical view of information deals with how information is physically arranged, stored, and accessed on some type of secondary storage device.

– The logical view of information focuses on how you need to arrange and access information to meet your particular business needs.

A database is a collection of information that you organize and access according to the logical structure of that information.

The data dictionary contains the logical structure of information in a database.


A database management system is the software you use to specify the logical organization for a database and access it.

Popular database models include the relational model and the object-oriented model.

The four steps of developing a personal database application include:
1. Define entity classes and primary keys
2. Define relationships among entity classes
3. Define information (fields) for each relation
4. Use a data definition language to create the database


Data warehouses are a logical collection of information, gathered from many different operational databases, that supports business analysis activities and decision-making tasks.

Data mining tools - the software tools you use to query information in a data warehouse - include query-and-reporting tools, intelligent agents, and multidimensional analysis (MDA) tools.
