21
Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

Embed Size (px)

Citation preview

Page 1: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

Database and Data Warehouse

Module B: Designing and Building a Relational Database

Chapter 3

Page 2: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

2

I. The Hierarchy of Data

• Database p.84-85• File (for Entity Class, Table p.97)• Record (for instance of entity class, Row p.97)• Field (Attribute, Column)

– Data Types (only one type for a field)

• Primary Key p.97– Atomic primary key– Composite primary key

Page 3: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

3

II. Database and DBMS

•Database definition p.86

A collection of information that you organize and access according to the logical structure of that information:

- centralizing and logically relating data,

- minimizing redundancy, and

- being independent from application programs.

Page 4: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

4

• Important Characteristics of a Database p.87-89– A collection of information: logically related tables.– A logical structure: It’s recorded in data

dictionary, e.g., the Design View for Table in Access. A data dictionary is a file that stores definitions of data elements, such as field name and record that contains this field, data type, format, description, ownership, security and so on.

– Logical Tie among Information: Relationships among tables. Two table are connected by common field(s) that are the primary key of one of those tables. Primary key is a field or field(s) that uniquely identify a record in a table. Foreign key is a primary key of one table that also appears in another table as common field(s). (continued)

Page 5: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

5

• Important Characteristics of a Database (continued)– Data Integrity Constraints: rules that help ensure

the quality of the information. For example,– Data entry error prevention: Validation Rule in

Design View of Access Table.– Referential integrity: Enforce referential integrity

for Relationships of tables in Access database. For common fields between two tables, new value(s) must be first assigned to the primary key and then can be used in another table as a value(s) of foreign key (p.89). You have to delete records in the reverse order.

Page 6: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

6

• Five Components of a DBMS p.90 Fig. 3.6– DBMS engine: It accepts logical requests from

other DBMS subsystems and converts them into physical equivalent to actually accesses database.

– Data definition language: specify logical structure of database, e.g., Design View of Table in Access..

– Data manipulation language: an interface helps you add, change, delete and query information in a database. For example, Datasheet View of Table, Report generator, and QBE (Query-by-Example) and SQL (Structured Query Language) for extracting information to answer certain questions.

(continued)

Page 7: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

7

• Five Components of a DBMS (continued)– Application generator: It enables you to create

applications without writing any programs. For Access database applications, you can create tables, queries, reports, forms (p.94 Fig. 3.12), even macros without writing any programming commands.

– Data administration subsystem: p.94– Backup and recovery

– Security control

– Query optimization: minimize response times

– Reorganization: maintain statistics to increase speed and performance of physical access.

– Concurrency control

– Change management: see the impact of structural change. For example, relationships of tables in Access.

Page 8: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

8

What are Differences between Spreadsheets and DBMS?

• Linkage between elements– Spreadsheet between cells in same table– DBMS between elements in different tables

• Orientation– Spreadsheet is toward calculations– DBMS is toward organization and linkage of data

elements• Control

– DBMS has stronger control for data security and integrity.

– Spreadsheet is limited.

Page 9: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

9

III. Database Models p.96• Four database models: hierarchical, network,

relational, and object-oriented database model.• Relational Database Model: Most popular database

model. Data are stored in multiple two-dimensional tables. – A table represents information of for an entity class.– A record stores information of an instance.– A piece of information from one table can be related

to a piece of information from another table if they have common field(s) (primary key and foreign key, p. 98 Fig. 3.14).

Page 10: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

10

What are the four important goals for relational database design?

• Have the capability of storing all pertinent data in the database.– All pertinent data can be found in the database.– First determination of all the attributes (fields).– Then, how many relations (tables).– How many databases.

• Eliminate redundant data.– Splitting relations (tables) is a standard design procedure to eliminate

redundantly duplicated data.• Keep the number of relations (tables) to a minimum.

– Too many relations (tables) in the database may make it cumbersome to use from a user’s standpoint.

• Have the relations (tables) normalized.– Splitting relations (tables) following specific procedure to eliminate

deletion and update problems.

*: Glenn A. Jackson, 1988, “Relational Database Design with Microcomputer Applications”

Page 11: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

11

III. Database Models (continued)• Object Oriented Model: (p.99)

– An object combines information (attributes) of an entity class and procedures (methods) that can act on the information. P.99

– Coupling (P.99 Fig. 4.15): Object “Customer” is coupled to object “Video Rental” if and only if object “Customer” sends a message to object “Video Rental”.

– Emphasis on reuse: p.100 – quality– productivity– shift from “creating” software to “assembling”

software

Page 12: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

12

IV. Develop Your Own Relational Database 1. The Four-Step Procedure for Database Design p.118• Defining entity classes and primary keys: Tables• Defining relationships among entity classes: (Fig. B.1)

– One-to-Many (1:M): “An employee must be assigned to one and only one department while one department can have more than one employee.” Employee is on “Many side” and Department is on “One side”.

– Many-to-Many (M:M): An employee can be assigned to more than one job while one job can be assigned to more than one employee.” It should be split into two one-to-many relationships (Fig. 4.18).

– Drop E-R diagram, use Standard Notation (see project demonstration).

Page 13: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

13

1. The Four-Step Procedure for Database Design

(continued)• Defining data fields for each table: Design View of

Access table.– What information is for an entity class?– Specify properties for error prevention.

• Using a data definition language to create the database: from paper (standard notation) to Access table.

Page 14: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

14

2. Three Normalization Rules p.120

Purpose: to reduce data redundancy.• Rule 1. Eliminating repeating groups or many-to-many

relationship (Design steps 1 & 2)– Split table: one table for one entity– Split many-to-many relationship into two one-to-

many relationships. • Rule 2. Each field in a table depends only on the

primary key and entire primary key (composite primary key p.97) (Design step 3)

• Rule 3. Remove all derived fields (you can later use calculated fields in Access) p.123

Page 15: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

15

3. Design a query (Access Skill Set 1 & Exercise)

• QBE (Query By Example) vs. SQL:– QBE is part of application generator. It

enables you to generate queries without writing any programming commands

– SQL is a query language for programming.– QBE is easier to use. SQL is more flexible and

powerful. Managers may prefer to use QBE.

Page 16: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

16

• To design a query, you need to answer four questions:

– What output do you want to see? (fields to show, how to sort)

– What tables are involved?– What constraints (criteria) are given?– How are the tables joined together?

(common field(s))

Exercise:

Work on Query Exercise

Page 17: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

17

4. Standard Notation (Drop E-R diagram)

(Access Database Project Demonstration)

Department(Dept, Dept Sup)

Employee(Emp ID, Name, Dept)

Employee-Job(Emp ID, Job, Hours)

Job(Job, Job Name)

1

1

1

Legacy System Reengineering

Page 18: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

18

V. Microsoft Access

• Seven objects of Access database: table, query, report, form, page, macro, VB module and Page.– Table: stores data for an entity (e.g., customer,

order, loan, etc.) in rows (records) and columns (fields). A record includes all information about a specific entity. A field is a piece of information in a record.

– Query: answers questions by retrieving certain fields of certain records from one or more tables that satisfy a set of criteria.

Page 19: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

19

– Form: A form provides a user-friendly interface to enter, maintain and view records, and it is based on a table or a query.

– Report: It presents data from tables or queries in the way that you want.

– Macro: a set of Access macro commands that automates tasks.

– Module: a program written in Visual Basic to automate tasks.

– Page: publish database table or query as a Web page.

Page 20: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

20

VI. Data Warehouse and Data Mining p.100

Three Key Features of Data Warehouse• Data warehouses combine information from

different databases (compatibility p.101 Fig. 3.16).• Data warehouses are multidimensional (Hypercube

p.102 Fig. 3.17) while the relational database is represented in a series of two-dimensional tables.

• Data warehouses support decision making: Unlike most databases, data warehouses are not transaction-oriented. Data warehouses support online analytical processing (OLAP vs. OLTP p.102).

Page 21: Database and Data Warehouse Module B: Designing and Building a Relational Database Chapter 3

21

What is Data Mining?

• Data mining uses various tools and algorithms to search large data sets for “hidden” or “buried” relationship not yet identified by management.

• Data Mining Tools– Regression for relationship

– Cluster analysis for distinct statistical segments

– Neural network for a predictive pattern.

– More

* Indiana University