Data Modeling and Database Design

Minder Chen, Ph.D. mchen@gmu.edu

[Cover figure: entity-relationship diagram]

Entity types and attributes:
• Employee: Employee number, First name, Last name, Employee function, Employee salary
• Team: Team number, Specialty
• Division: Division number, Division name, Division address
• Task: Task name, Task cost
• Project: Project number, Project name, Project label, Start date, End date
• Customer: Customer number, Customer name, Customer address, Customer activity, Customer telephone, Customer fax

Relationship labels: is assigned to, contains, staffed by, subcontract, member, is a member of, belongs to

© Minder Chen, 1993~2002 Data Modeling - 2 -

Data Modeling and Database Design Course Outline

• INTRODUCTION
– Introduction to Data Modeling
– Database Development Life Cycle Overview
• ENTITY AND RELATIONSHIP
– Develop the Subject Area Diagram
– Develop Preliminary Data Model: Entity & Relationship Identification
• ATTRIBUTES AND SUBTYPES
– Attributes Identification and Definition
– Develop Fully Attributed Data Model
– Identifiers
– Data Modeling Exercise
– Partitioning and Entity Subtypes
• NORMALIZATION
– Normalization
– Normalization Exercise
– De-normalization
• DATA MODEL EVALUATION AND MAPPING TO RELATIONAL DBMS
– Refine a Data Model: Analysis and Simplification
– Transform to Physical Data Base Design
• PowerDesigner: Data Architect
• Physical DB Design and Data Warehouse DB Design

References

Data Modeling and Database Design
1. Batini, Ceri, Navathe, Conceptual Database Design, Redwood City, CA: The Benjamin/Cummings Publishing Company, Inc., 1992.
2. Teorey, T. J., Database Modeling and Design: The Entity-Relationship Approach, Morgan Kaufmann Publishers, Inc., 1990.
3. Thomas A. Bruce, Designing Quality Databases with IDEF1X Information Models, New York, NY: Dorset House Publishing, 1991.
4. Texas Instruments, A Guide to IE Using IEF, 2nd edition, Part No. 2739756-0001, 1990.
5. Martin, James, Information Engineering Book II: Planning and Analysis, Prentice-Hall Inc., 1989.
6. Dave Ensor, Ian Stevenson, Oracle Design, O'Reilly & Associates, 1997.
7. Rob Gillette, et al., Physical Database Design for Sybase SQL Server, Prentice Hall, 1995.
8. Ralph Kimball, The Data Warehouse Toolkit, Wiley, 1996.

JAD References
1. August, J. H., Joint Application Design: The Group Session Approach to System Design, Englewood Cliffs, NJ: Prentice Hall, Inc., 1991.
2. Wood, J. and Silver, D., Joint Application Design: How to Design Quality Systems in 40% Less Time, New York, NY: John Wiley & Sons, 1989.
3. Andrews, D. C. and Leventhal, N. S., Fusion: Integrating IE, CASE, and JAD: A Handbook for Reengineering the Systems Organization, Englewood Cliffs, NJ: Yourdon Press, 1993.

Data Modeling and Database Design: INTRODUCTION

• Systems Development Life Cycle (SDLC) in a Client/Server Environment

• Introduction to Data Modeling

• Database Development Life Cycle Overview

Rationales for Data Modeling

• Data are the foundation of modern information systems enabled by database technologies.

• Data in an organization exist and can be described independently of how these data are used.

• Data should be managed as a corporate-wide resource.

• The types of data used in an organization do not change very much.

• Data have certain inherent properties which lead to correct structuring.

• If we structure data according to their inherent properties, the structure (i.e., data models) will be stable.

History of Data Modeling

• Importance of Entity-Relationship Modeling Technique
– Database
– Data modeling and enterprise-wide data
– Data quality
– Data updating and accessing tools and procedure
– Data sharing culture

• ER modeling technique was first developed by Peter Chen in 1976

– A conceptual/logical data modeling tool

– A user-oriented approach

– A graphic-based method

• ER modeling technique is the major data modeling method in Information Engineering and is widely supported by most CASE tools.

• Data modeling is the foundation of most database-centered transaction processing systems and data warehouse systems

CSC Development Strategies

• RE-CREATE new business process & systems from scratch

• RE-ENGINEER business process & systems

• RE-DESIGN current systems

• RE-HOST current systems

• RE-IMAGE current systems

[Figure: the strategies are placed on a HIGH-to-LOW scale for Risk, Long Term Reward, Short Term Costs, and Degree of Change]

Distribution of Business Function (Logic)

[Figure: application layers distributed between Client and Server: Presentation Space, Presentation Service, Presentation Logic, Function Logic, Data Logic, Data Service, Data Space]

Client:
• Presentation logic
• Local input validation
• Output production logic
• Local peripheral drivers
• Performance critical processing

Server:
• Functions that access data on the server
• Functions that need input from multiple users
• Functions that coordinate the work of several users

Issues:
• Distribution of data
• Platform-specific capabilities and interoperability
• Connectivity capabilities/platform
• Frequency of change to codes
• Configuration management

C/S Development Methodology

SDLC phase            User Interface    Application Logic           Information & Data Base
Conceptual Analysis   Work Flow         Process Flow                Data Model
Logical Design        Form Sequences    Object Interaction Model    Database Schema
Physical Design       Forms, Screens    Programs, Procedures        Tables, Indexes

(The columns are the parts of the C/S Architecture. Figure annotations: "performance =>", "rules =>".)

Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.

Client/Server Application Development Methodology

Where Do You Start?

[Figure: Requirements; Information & Data Base; Processes/Behavior; Workflow/User Interface; Architecture; Application Design and Development]

Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.

Data Modeling (Data Base Design) Process

[Figure: Information Requirements → Conceptual DB Design → Conceptual (Enterprise) DB Schema → Logical DB Design → Logical DB Schema → Physical DB Design → Physical DB Schema]

A conceptual DB schema is a high-level description of the database, independent of the particular DBMS.

A logical DB schema is a description of the structure of the database that can be processed by a DBMS: relational, network, or hierarchical.

A physical DB schema is a description of the implementation of the database in external memory; it describes the storage structures and access methods used in order to effectively access and maintain data.
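
As a rough illustration of the three schema levels, the sketch below uses Python's built-in sqlite3 module. The Customer and Order entity types appear on later slides; the dictionary layout, table names, column names, and index name are assumptions made only for this example.

```python
import sqlite3

# Conceptual schema: a DBMS-independent description (hypothetical layout).
conceptual = {
    "entity_types": ["Customer", "Order"],
    "relationships": [("Customer", "places", "Order")],
}

# Logical schema: a structure that a relational DBMS can process.
logical_ddl = """
CREATE TABLE customer (
    customer_number INTEGER PRIMARY KEY,
    customer_name   TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_number    INTEGER PRIMARY KEY,
    customer_number INTEGER NOT NULL REFERENCES customer(customer_number)
);
"""

# Physical schema: storage structures and access methods, here a single index.
physical_ddl = "CREATE INDEX idx_order_customer ON customer_order(customer_number);"

conn = sqlite3.connect(":memory:")
conn.executescript(logical_ddl)
conn.execute(physical_ddl)

# Inspect what the DBMS now holds.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
indexes = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"))
print(tables, indexes)
```

The conceptual description never mentions tables or indexes, so it could be mapped to a different DBMS without change.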

Source: Batini, C., Ceri, S., and Navathe, S. B., Conceptual Database Design: An Entity-Relationship Approach, The Benjamin/Cummings Publishing Company, Inc., 1992.

Multiple Perspectives

[Figure: ONE BUSINESS seen from two perspectives. DATA ("We use this data"): EMPLOYEE, ... ACTIVITY ("We do these things"): HIRE EMPLOYEE, PAY EMPLOYEE, PROMOTE EMPLOYEE, FIRE EMPLOYEE, ...]

Data Model (Entity Relationship Diagram)

[Figure: entity-relationship diagram. Entity types: Member, Agreement, Club, Member Order, Product, Promotion. Relationship labels: is enrolled under / applies to; established by / established; sponsors / is sponsored by; is featured in / features; generates / generated by; sells / is sold on; placed by / places]

Entity Relationship Diagram: Subject Area and Entity Type

• Subject Area and Subject Area Diagram

• Entity Types

• Entity Instances

• Finding Entity Types

• Evaluating Entity Types

Subject Area (Submodel)

• A natural area of interest to the business that is centered on a major resource, input, output, or activity of the business.
• It contains a set of entity types.
• We start the data modeling in the ISP (Information Strategy Planning) stage by identifying subject areas with names and descriptions.
• In the BAA (Business Area Analysis) stage, subject areas are used as high-level groupings of entity types.
• Naming: a subject area is a noun in plural form and often has the same name as the central entity type in the subject area.
• Examples:

[Figure: subject area Projects, containing the entity types Project, Project Member, and Task]

Subject Area Diagram

[Figure: subject area diagram. Subject areas: Customers, Orders, Products, Sales-persons, Purchase Orders, Buyers, Raw-materials, Suppliers. Legend: subject area symbol and association line]

Entity Types

• Definition:
– An entity is an object or event, real or abstract, about which we would like to store data. "Entity" is used as an abbreviation of "entity type". It represents a set of entity instances that can be described by the same set of attribute types. The value of the same attribute may differ from one entity instance to another.

• Identifying Entity Types
– What information is required by the business?
– Things that are of interest to the business and that need to be remembered in order to manage and track them.
– Things that belong to the same entity type have common characteristics.

Naming Entity Types

• The name of each entity is in singular form
– a noun
– an adjective + a noun
– a noun + a noun => (noun string)
– an adjective + a noun + a noun
• Examples
– Customer, Customer Order, Product, Hourly Employee, Project, Department, Unfilled Customer Order
• Be clear and concise
• Avoid abbreviations
• Be consistent with the user's terminology
• Identify synonyms
– Customer / Client
– Product / Merchandise
– Supplier / Vendor
– Teacher / Faculty
• Use one name as the official name and document others as aliases

Exercise: Entity Type Naming

• Courses

• Department

• Customer Order

• PO

Properties of Entity Types

• Name

• Description

• Identifier

• Properties: Estimated number (Max., Min., Average) of entity instances

• Expected growth rate of entity instances

• Subject Area in which the Entity Type resides

• Attributes that describe the Entity Types

• Examples of entity type instances

Definition of an Entity Type

• A poor definition of Customer: Anyone that buys something from the company.
– Can an employee be a customer?
– Can a lessor be a customer?
– If the company sold a subsidiary to another company, is the new owner considered a customer?

• A good definition should be:
– Compatible
– Precise
– Concise
– Clear
– Complete

Good Definition

• Compatible
– Customer: An ORGANIZATION that purchases PRODUCTs for personal use.
– Distributor: An ORGANIZATION that purchases PRODUCTs for resale.

• Precise
– With appropriate qualifiers
– Example: An ORGANIZATION is considered to have purchased a PRODUCT when we receive a valid PURCHASE ORDER from it.

• Complete
– ORGANIZATION, PRODUCT, and PURCHASE ORDER need to be defined.

• Concise and Clear
– Use modular definitions

Example of Entity Type Descriptions

Entity Type     Description
Customer        Information about all persons or organizations who purchases
Product         All goods manufactured and sold
Raw-material    Components used to manufacture Products
Supplier        Vendors of Raw-materials
Buyer           Company personnel responsible for purchasing Raw-materials from Suppliers

Entity Type and Entity Instance (Occurrence)

Entity Type    Entity Instance
Vendor         ABC Co.
Employee       John Smith
Course         Intro. to IE
Department     Marketing Department
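
One way to picture the type/instance distinction is a class and its objects. The sketch below is only an analogy: the Employee attributes are taken from the cover diagram, while the employee numbers and the second employee are invented for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Employee:
    """Entity type: one set of attribute types shared by every instance."""
    employee_number: int
    first_name: str
    last_name: str


# Entity instances: same attributes, different values.
john = Employee(1, "John", "Smith")
jane = Employee(2, "Jane", "Doe")  # hypothetical second instance

attribute_names = [f.name for f in fields(Employee)]
print(attribute_names)
print(john, jane)
```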

Exercise: Entity Types or Entity Instances?

• Maryland

• Organization Unit

• Customer

• President

• Bill Clinton

• Department of Commerce

• Address

Finding Entity Types

• Interviews with users

• JAD workshops

• Business forms

• Reports

• Computer files using reverse engineering

• Operation manuals

Where to Look for an Entity Type?

• Tangible or Intangible Things
– The nouns that are used to describe the problem domain will often correspond to the major Entity Types of the system, at least at a high level.
– Examples: Product, Sensor, Employee, Department, and Sales Office.

• Resources
– Any resource that an organization needs to manage should be represented as an Entity Type. Information assists the efficient and effective use of other resources through improved decisions.
– Examples: Inventory, Machine, Bank Account, and Customer.

• Roles Played
– Roles can be played by persons or organizational units.
– Examples: Customers, Managers, and Account representatives.

• Events
– Events are incidents that occur at points in time. An event often involves an interaction between two Entity Types or an action that changes the status of an Entity Type.
– Examples: Sale, Delivery, and Registration of a motor vehicle.

BIAIT: Business Information Analysis and Integration Technique

• Analysis of Orders
• Ordered entities can be a thing, a space, or a skill.
• View the order from supplier side.
• If an organization receives no orders, it has no reason for existing.
• An organization unit can receive multiple types of orders.
• 4 questions about the Supplier:
– Billing (Cash)?
– Deliver Late (Immediate)?
– Profile customer?
– Negotiate price (Fixed)?
• 3 questions about the Ordered Entity:
– Rented (Sold)?
– Tracked?
– Made to order (Stock)?

Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique - The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.

Criteria for Evaluating an Entity Type

• Needs to be remembered by the information system in order for the system to be functional.

• Can be operated on: CREATE, READ, UPDATE, DELETE.

• Has a set of operations/services that always apply to change the status of each occurrence of the Entity Type.

• Carries a set of attributes that always apply to describe each occurrence of the Entity Type.

• Has at least one relationship with another entity type.

• Has more than one entity occurrence (instance).

• Has at least one unique identifier.

• Domain-based requirements: Something that the system must have in order to operate. These may be clearly specified in the problem description or known from subject matter experts.

Entity Relationship Modeling and Diagramming

• Relationships

• Entity Relationship Diagramming Notation

• Attributes

• Identifiers

• Partitioning and Entity Subtypes

Relationship (Type)

• Definition

– A Relationship Type is an association among Entity Types. It indicates that there is a business relationship between these Entity Types.

– Relationship Membership is the participation of an Entity Type in a Relationship.

– In IE, a Relationship Type can involve only two Entity Types (binary relationship). Some other modeling techniques allow n-ary relationships.

• Examples

– CUSTOMER places ORDER

– ORDER is placed by CUSTOMER

– EMPLOYEE works on PROJECT

– PROJECT has project member EMPLOYEE

Pairing (Relationship Instance)

Entity Type    Entity Instances
Student        Student#1, Student#2
Course         Course#A, Course#B, Course#C, Course#D

Relationship            Relationship Pairings
Student takes Course    Student#1 takes Course#A
                        Student#1 takes Course#B
                        Student#1 takes Course#D
                        Student#2 takes Course#A
                        Student#2 takes Course#C
                        Student#2 takes Course#D

• A relationship pairing is a pair of Entity Instances of two Entity Types associated by a Relationship Type between these two Entity Types.
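
A minimal sketch of this slide's data, assuming each pairing is represented as a (student, course) tuple. The representation and the helper name is_valid_pairing are illustrative choices, not part of the original material.

```python
# Entity instances of the two entity types.
students = {"Student#1", "Student#2"}
courses = {"Course#A", "Course#B", "Course#C", "Course#D"}

# Pairings of the relationship type "Student takes Course".
takes = {
    ("Student#1", "Course#A"),
    ("Student#1", "Course#B"),
    ("Student#1", "Course#D"),
    ("Student#2", "Course#A"),
    ("Student#2", "Course#C"),
    ("Student#2", "Course#D"),
}


def is_valid_pairing(pair):
    """A pairing must join one instance of each participating entity type."""
    student, course = pair
    return student in students and course in courses


all_valid = all(is_valid_pairing(p) for p in takes)
print(len(takes), all_valid)
```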

Relationship Instances Grouping

• Definition: A collection of pairings of a Relationship Membership in which an Entity Instance is involved.

• Examples:

– Student#1 takes Course#A, #B, and #D

– Student#2 takes Course#A, #C, and #D

– Course#A is taken by Student#1 and Student#2
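
The groupings in these examples can be derived from the list of pairings. The sketch below assumes pairings are (student, course) tuples; the function name group_by and its position parameter are hypothetical.

```python
from collections import defaultdict

takes = [
    ("Student#1", "Course#A"),
    ("Student#1", "Course#B"),
    ("Student#1", "Course#D"),
    ("Student#2", "Course#A"),
    ("Student#2", "Course#C"),
    ("Student#2", "Course#D"),
]


def group_by(pairings, position):
    """Collect, for each instance at `position` (0 or 1), the instances it is paired with."""
    groups = defaultdict(list)
    for pair in pairings:
        groups[pair[position]].append(pair[1 - position])
    return dict(groups)


courses_by_student = group_by(takes, 0)
students_by_course = group_by(takes, 1)
print(courses_by_student["Student#1"])
print(students_by_course["Course#A"])
```

Grouping from either side of the relationship uses the same pairings; only the membership that serves as the key changes.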

Relationship Cardinality

[Figure: three diagrams between entity types E1 and E2, showing One-to-One (1:1), One-to-Many (1:M), and Many-to-Many (M:N) relationships]

Relationship Cardinality

• The number of Entity Instances involved in the Relationship Instances Grouping in a Relationship Type.

• Three Forms of Cardinality

1. One-to-one (1:1)
DEPARTMENT has MANAGER
Each DEPARTMENT has one and only one MANAGER
Each MANAGER manages one and only one DEPARTMENT

2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) places one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER

3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTOR teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs
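
The three forms can be told apart by counting how many pairings each instance takes part in. The sketch below classifies a non-empty sample of distinct pairings. Cardinality is a business rule, so a sample can only show what has been observed so far. The function name and all instance names are assumptions.

```python
from collections import Counter


def observed_cardinality(pairings):
    """Classify a non-empty list of distinct (left, right) pairings as 1:1, 1:M, or M:N."""
    left_counts = Counter(left for left, _ in pairings)
    right_counts = Counter(right for _, right in pairings)
    left_many = max(left_counts.values()) > 1    # some left instance has several partners
    right_many = max(right_counts.values()) > 1  # some right instance has several partners
    if left_many and right_many:
        return "M:N"
    if left_many or right_many:
        return "1:M"
    return "1:1"


# Hypothetical sample data for the three examples on this slide.
has_manager = [("Sales", "Ann"), ("Marketing", "Bob")]
places = [("Cust1", "Ord1"), ("Cust1", "Ord2"), ("Cust2", "Ord3")]
teaches = [("Inst1", "CourseA"), ("Inst1", "CourseB"), ("Inst2", "CourseA")]

print(observed_cardinality(has_manager))
print(observed_cardinality(places))
print(observed_cardinality(teaches))
```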

Entity Relationship Diagram (ERD): Notations

Graphical Notations

[Figure: Entity-X and Entity-Y joined by a line labeled relationship-description in one direction and reversed-relation-description in the other. Each end of the line carries a cardinality indicator with min and max symbols for zero, one, and many]

Example

[Figure: Department is-managed-by Manager; Manager manages Department]

Translate into two structured statements

Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y
Each Entity-Y reversed-relationship-description (zero-or-one) Entity-X

Optionality of Relationship Memberships

• Whether all entity instances of both entity types need to participate in relationship pairing.

• Optionality:

– Mandatory

– Optional

• Example:

– CUSTOMER membership is optional

– ORDER membership is mandatory

[Figure: CUSTOMER places ORDER; ORDER is placed by CUSTOMER]

Relationship Statements

[Figure: CUSTOMER places ORDER; ORDER is placed by CUSTOMER]

Graphical Notations

Cardinality indicator: one; one or more

Optionality indicator: zero (sometimes); one (always)

Each CUSTOMER sometimes places one or more ORDER.
Each ORDER always is placed by one CUSTOMER.

Each Entity X optionality relationship cardinality Entity Y
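
The statement pattern can be applied mechanically. This sketch fills the pattern from two flags per direction; the function name and parameters are hypothetical, and the wording follows the two example statements on this slide.

```python
def relationship_statement(entity_x, optional, relationship, many, entity_y):
    """Build 'Each X <optionality> <relationship> <cardinality> Y.' from the pattern."""
    optionality = "sometimes" if optional else "always"
    cardinality = "one or more" if many else "one"
    return f"Each {entity_x} {optionality} {relationship} {cardinality} {entity_y}."


forward = relationship_statement("CUSTOMER", True, "places", True, "ORDER")
reverse = relationship_statement("ORDER", False, "is placed by", False, "CUSTOMER")
print(forward)
print(reverse)
```

Reading both statements back to a business user is a quick check that the optionality and cardinality indicators were drawn on the correct ends of the relationship line.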

Defining Relationships

• Name
• Description
• Property
– Cardinality volumes
– Optionality percentage: % of Entity Type X's instances pairing with Entity Type Y's instances
– Transferability: A relationship is transferable if an entity instance can change its pairing within the same relationship.
» TRANSFERABLE: An EMPLOYEE can change to a different DEPARTMENT.
» NON-TRANSFERABLE: An ORDER cannot be transferred to another CUSTOMER.
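
A minimal sketch of how an optionality percentage could be measured from sample data, assuming pairings are (customer, order) tuples. The function name, the position parameter, and all instance names are invented for illustration.

```python
def optionality_percentage(instances, pairings, position):
    """Percentage of `instances` that appear at `position` (0 or 1) in at least one pairing."""
    participating = {pair[position] for pair in pairings}
    return 100.0 * len(participating & set(instances)) / len(instances)


customers = ["Cust1", "Cust2", "Cust3", "Cust4"]
orders = ["Ord1", "Ord2", "Ord3"]
places = [("Cust1", "Ord1"), ("Cust1", "Ord2"), ("Cust3", "Ord3")]

customer_pct = optionality_percentage(customers, places, 0)  # optional membership
order_pct = optionality_percentage(orders, places, 1)        # mandatory membership
print(customer_pct, order_pct)
```

A mandatory membership should always measure 100%; anything lower in real data points to either an optional membership or a data quality problem.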

ERD: More Examples

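(a) Parallel Relationship: Employee manages Project / Project is-managed-by Employee; Employee works-for Project / Project has-project-members Employee

(b) Involuted or Looped Relationship: Part consists-of Part / Part contained-in Part

(c) Customer places Order / Order belongs-to Customer; Order contains Product / Product is-contained-in Order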

ERD: Alternative Notations

[Figure: the relationship "Customer places Order / Order belongs-to Customer" drawn in several alternative notations, including one that labels the ends of the relationship line with 1 and M]

Identifying Relationships

• Association between entity types

• Entity types that are used on the same forms or documents.

• A description in a business document that has a verb that relates two entity types:

– has

– consists of

– uses

Attributes

• Definition
– Characteristics that could be used to describe Entity Types and Relationship Types. However, in IE, relationship types are not allowed to have attributes.
• Naming Conventions:
– Names that have business meaning
– Don't use abbreviations or the possessive case, e.g., PN and Customer's name
– Don't include the entity type name because IEF will prefix the attribute name with the entity type name automatically
– Use standard format: Entity Type Name (Qualifiers) Domain Name, e.g., Customer Name, Employee Starting Date
• Examples
– Customer has customer name, address, and telephone number.
– Product has quantity-on-hand, weight, volume, color, and name.
– Employee has SSN, salary, and birthday.
– Employee-works-for-project has percentage-of-time, starting-date.

Attributes: Notations

[Figure: alternative graphical notations for attributes, illustrated with Student (Student ID, Student Name, Birth date; also studentID, name, phone), enrollment (Student ID, Course no.), and Employee (Employee number, First name, Last name, Employee function, Employee salary)]

Student(Student ID, Student Name, Birth Date)

Finding Attributes: Attributes are identified progressively during the BAA phase.
• Data Analysis
• Activity Analysis
• Interaction Analysis
• Current Systems Analysis

Attribute Value

• Definition

– Attribute Values are instances of Attributes used to describe specific Entity Instances

• Examples

– Customer Number: 011334

– Customer Name: Minder Chen

– State: VA

– Order Total: $23,000

– Sales tax: $250

• An attribute of an entity type should have only one value at any given time. (No repeating group)

• Avoid using a complex coding scheme for an attribute. For example, a PART Number of the form X-XXX-XXX that encodes Part Type, Material, and Sequence Number in a single attribute.

Type & Instance

OBJECT TYPE            OCCURRENCE
Entity Type            Entity Instance
Entity                 Entity Instance
Entity Type            Entity
Relationship (Type)    Pairing (Relationship Instance)
Attribute (Type)       (Attribute) Value

Attribute Source Categories

• Basic
– Definition: An Attribute Value that cannot be deduced or calculated.
– Examples: Student name and Birthday

• Derived
– Definition: The Attribute Value can be calculated or deduced from relationship Groupings or from the values of other Attributes. The value of a Derived Attribute can change whenever the underlying pairings or attribute values change.
– Examples: Student Age, Account Balance, Number of courses taken.

• Designed
– Definition: The Attribute is created to overcome the system constraints. The value of a Designed Attribute does not change.
– Examples: Student ID, Course number.
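
The three categories can be sketched with a small class: basic and designed attributes are stored, while the derived attribute is computed on demand. The Student attributes come from the examples on this slide; the sample values and the age method are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Student:
    student_id: str   # designed: assigned by the system, does not change
    name: str         # basic: cannot be deduced or calculated
    birthday: date    # basic

    def age(self, today: date) -> int:
        """Derived: calculated from the birthday, so it is not stored."""
        had_birthday = (today.month, today.day) >= (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - (0 if had_birthday else 1)


student = Student("S001", "Pat Lee", date(2000, 6, 15))  # hypothetical values
print(student.age(date(2020, 6, 14)), student.age(date(2020, 6, 15)))
```

Computing the derived value instead of storing it avoids having to update it every time the underlying data changes.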

Data Types

Properties of Attributes

• Name
• Description
• Attribute Source Category: Basic, Derived, Designed
• Domain or data type: Text, Number, Date, Time, Timestamp
• Optionality: Mandatory or optional
• Length and/or precision
• Permitted Values (Legal Values)
– Ranges
– A set of values (Code Table)
• Default value or algorithm

Tools such as PowerBuilder have additional properties for a table's columns, called extended attributes:
– Validation Rule
– Editing Format
– Reporting Format
– Column Heading
– Form Label
– Code Table

Composite Attribute

• Definition:

• Example:
– Telephone Number = Area code + Exchange + Extension

• Most CASE tools do not support composite attribute types. In such cases, a composite attribute must be modeled as an entity type.

Domain

• A collection of values which can be taken by one or more attributes.

• Date is the domain for Ordered Date, Student's Birthday, Employee Starting Date.

• A user-defined domain can have customized validation rules and formats.

• CASE tools such as IEF support only the following basic domains:
– Text

– Number

– Date

– Time

– Timestamp

Identifiers

• The identifier of an entity type is a set of attributes and/or relationships whose values can uniquely identify an entity.

• Entity types should have one identifier.
• Identifiers may consist of
– A single attribute: Student ID
– A set of attributes: Student ID + Course ID
– An attribute and a relationship membership (implemented as a foreign key): Order Item No + Order Has Order Item
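The third kind of identifier (an attribute plus a relationship membership) can be sketched in SQL as a composite primary key that includes a foreign key. The example below uses Python's built-in `sqlite3` module; the table and column names (`orders`, `order_item`, `order_no`, `order_item_no`) are assumptions chosen for illustration, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY
);
CREATE TABLE order_item (
    order_no      INTEGER NOT NULL REFERENCES orders(order_no),  -- relationship membership
    order_item_no INTEGER NOT NULL,                              -- attribute
    quantity      INTEGER,
    PRIMARY KEY (order_no, order_item_no)                        -- the identifier
);
INSERT INTO orders VALUES (1);
INSERT INTO orders VALUES (2);
INSERT INTO order_item VALUES (1, 1, 5);
INSERT INTO order_item VALUES (2, 1, 3);  -- same item number, different order
""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO order_item VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert((1, 1, 9))  # repeats the identifier (1, 1)
orphan_ok = try_insert((99, 1, 1))    # refers to an order that does not exist
item_count = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
print(duplicate_ok, orphan_ok, item_count)
```

Order Item No alone is not unique (item 1 exists in both orders); it identifies an order item only together with the order it belongs to.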

Identifying Relationship

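[Figure: ORDERS data model with the entity types customer, order, order item, and product. Relationship labels: places, is placed by, contains, is part of, has, is ordered by. The diagram also shows the symbol for an identifying relationship.]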

Data Modeling Case Study

The following is a description by a pharmacy owner:

"Jack Smith catches a cold and what he suspects is a flu virus. He makes an appointment with his family doctor, who confirms his diagnosis. The doctor prescribes an antibiotic and nasal decongestant tablets. Jack leaves the doctor's office and drives to his local drug store. The pharmacist packages the medication and types the labels for the pill bottles. The label includes information about the customer, the doctor who prescribed the drug, the drug (e.g., Penicillin), when to take it and how often, the content of the pill (250 mg), the number of refills, the expiration date, and the date of purchase."

Please develop a data model for the entities and relationships within the context of pharmacy. Also develop a definition for "prescription". List all your underlying assumptions used in your data models.

Data Modeling Case Study

Given the following narrative description of entities and their relationships, prepare a draft entity relationship diagram (ERD). Be sure to state any reasonable assumptions that you are making.

Burger World Distribution Center serves as a supplier to 45 Burger World franchises. You are involved with a project to build a database system for distribution. Each franchise submits a day-by-day projection of sales for each of Burger World's menu products (the products listed on the menu at each restaurant) for the coming month. All menu products require ingredients and/or packaging items. Based on projected sales for the store, the system must generate day-by-day ingredient needs and then collapse those needs into one-per-week purchase requisitions and shipments.

Data Modeling Process

• List entity types

• Create relationships
– Pick a central entity type

– Work around the neighborhood
» Add entity types to the diagram

» Build relationships among them

– Determine cardinalities of relationships

• Find/Create identifiers for each entity type

• Add attributes to the entity type in the data model

• Analyze and revise the data model

Classifying Attribute and Partitioning

• Entity Subtype: A collection of Entities of the same type to which a narrower definition and additional Attributes and Relationships apply. An Entity Subtype inherits (retains) all the Attributes and Relationships of its parent Entity Type.

• Classifying Attribute: An attribute of the Base Entity Type whose values partition the Entity Instances into Subtypes.

• Partitioning: A basis for subdividing one entity type into subtypes. The process of dividing an Entity Type into several Subtypes based on a Classifying Attribute is called Partitioning.

• The Classifying Attribute is recorded as a property of the Partitioning and it appears on the diagram.

Characteristics of Partitioning

• Optionality:
– Mandatory: Every Entity instance of the Entity Type must fall into one of the Subtype categories.
– Optional: Not every Entity instance of the Entity Type must fall into one of the Subtype categories.

• Entity Life Cycle: The states through which an Entity Type can pass are used for Partitioning.

• Enumeration:
– Fully enumerated
– Not fully enumerated

• Classifying Attributes and Values
– Classifying Attribute: Type
– D: Domestic Subtype
– F: Foreign Subtype

Partitioning and Entity Subtype: Notation

[Figure: subtype notation for the Employee entity type. Labels in the diagram: Employee (ATTRIBUTE: Employee ID, Name, Birthday), Seminar, Lecturer, ATTRIBUTE: Teaching Quality Indicator, Teaches, Staff, Type, Wage, Hourly, Status.]

Alternative Notations for Subtypes

IDEF1X:
employee (employeeID, name, phone)
– full-time-emp (employeeID (FK), salary)
– part-time-emp (employeeID (FK), hourly-rate)
Category discriminator: employee type. Complete Category: all categories shown.

PowerDesigner:
Account (Account Number, Name)
– Savings (Rate)
– Checking (Fees)
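One common way to carry the IDEF1X employee example into tables is a supertype table plus one table per subtype, where each subtype's primary key is also a foreign key to the supertype. The sketch below uses Python's built-in `sqlite3`; the discriminator codes 'F' and 'P', the underscore spellings of the names, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    employeeID    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    phone         TEXT,
    -- classifying attribute (discriminator); the codes are hypothetical
    employee_type TEXT NOT NULL CHECK (employee_type IN ('F', 'P'))
);
CREATE TABLE full_time_emp (
    employeeID INTEGER PRIMARY KEY REFERENCES employee(employeeID),
    salary     REAL
);
CREATE TABLE part_time_emp (
    employeeID  INTEGER PRIMARY KEY REFERENCES employee(employeeID),
    hourly_rate REAL
);
INSERT INTO employee VALUES (1, 'Ann', '555-0100', 'F');
INSERT INTO employee VALUES (2, 'Bob', '555-0101', 'P');
INSERT INTO full_time_emp VALUES (1, 52000);
INSERT INTO part_time_emp VALUES (2, 18.5);
""")

# A subtype row inherits name and phone by joining back to the supertype.
full_timers = conn.execute("""
    SELECT e.name, e.phone, f.salary
    FROM employee e
    JOIN full_time_emp f ON f.employeeID = e.employeeID
""").fetchall()
print(full_timers)
```

Other mappings exist (a single table with nullable subtype columns, for example); this one keeps subtype-specific attributes out of rows they do not apply to.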

Entity Subtype Partitioning

[Figure: Life Cycle Partitioning. The Order entity type is partitioned by Order Status into the subtypes Taken, Scheduled, Shipped, Billed, and Paid.]

Normalization

• A data base is a model or an image of reality.

• Logical Data Base Design is a process of modeling and capturing the end-user views of an application domain and synthesizing them into a data base structure.

• Normalization is a logical data base design method.

• The basis for normalization is the functional dependencies among attributes in a table.

SQL Terminology

Product Table

p_no   product_name   quantity   price
101    Color TV       24         500
201    B&W TV         10         250
202    PC             5          2000

Each line of the table is a Row; each field, such as price, is a Column.

Create a table in SQL

CREATE TABLE Product_table (
    p_no CHAR(5) NOT NULL,
    product_name CHAR(20),
    quantity SMALLINT,
    price DECIMAL(10, 2)
);

SQL Terminology

Set Theory   Relational DB          File        Example
Relation     Table                  File        Product_table
Attribute    Column                 Data item   Product_name
Tuple        Row                    Record      Product_101's info.
Domain       Pool of legal values   Data type   DATE

SQL Principles

• The result of a SQL query is always a table (View or Dynamic Table)

• Rows in a table are considered to be unordered

• Has dominated the market since the late 1980s

• Can be used in interactive programming environments

• Provide both data definition language (DDL) and data manipulation language (DML)

• A non-procedural language

• Can be embedded in 3GL:
– Embedded SQL

– Dynamic SQL

SQL: Data Definition Language (DDL)

• CREATE / DROP: TABLE, VIEW, INDEX, DATABASE

• ALTER TABLE
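A minimal sketch of these DDL statements, run through Python's built-in `sqlite3` module. The object names (`product`, `product_name_idx`, `expensive_product`, `supplier_no`) are assumptions for illustration. SQLite has no CREATE DATABASE statement (a database is simply a file or an in-memory connection), so that statement is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    p_no         CHAR(5) NOT NULL,
    product_name CHAR(20),
    quantity     SMALLINT,
    price        DECIMAL(10, 2)
);
CREATE INDEX product_name_idx ON product (product_name);
CREATE VIEW expensive_product AS
    SELECT p_no, product_name FROM product WHERE price > 1000;
ALTER TABLE product ADD COLUMN supplier_no CHAR(5);
""")

def schema_objects():
    """Return sorted (type, name) pairs for every object in the schema catalog."""
    return sorted(conn.execute("SELECT type, name FROM sqlite_master").fetchall())

before = schema_objects()
conn.executescript("""
DROP VIEW expensive_product;
DROP INDEX product_name_idx;
""")
after = schema_objects()
columns = [row[1] for row in conn.execute("PRAGMA table_info(product)")]
print(before, after, columns)
```

DDL changes the structure of the database (the catalog), not the rows stored in it.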

SQL: Introduction

• A relational data base is perceived by its users as a collection of tables

• E. F. Codd 1969

• Has dominated the market since the late 1980s

• Strengths: – Simplicity

– End-user orientation

– Standardization

– Value-based instead of pointer-based

– Endorsed by major computer companies

• Most CASE products support the development of relational data base centered applications

SQL: Data Manipulation Language (DML)

SELECT   UPDATE   INSERT   DELETE

p_no   product_name   quantity   price
101    Color TV       24         500
201    B&W TV         10         250
202    PC             5          2000

The Generic Form of the SELECT Statement

SELECT [DISTINCT] column(s)
FROM table(s)
[WHERE conditions]
[GROUP BY column(s) [HAVING condition]]
[ORDER BY column(s)]
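The four DML statements can be tried on the Product table with Python's built-in `sqlite3` module. This is a sketch: the table name `product`, the column types, and the particular update and delete conditions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (p_no TEXT, product_name TEXT, quantity INTEGER, price REAL)"
)

# INSERT: load the three rows of the Product table.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [("101", "Color TV", 24, 500), ("201", "B&W TV", 10, 250), ("202", "PC", 5, 2000)],
)
# UPDATE: change a value in an existing row.
conn.execute("UPDATE product SET quantity = quantity - 2 WHERE p_no = '101'")
# DELETE: remove the rows that match a condition.
conn.execute("DELETE FROM product WHERE p_no = '201'")
# SELECT: WHERE filters rows and ORDER BY sorts the result.
rows = conn.execute(
    "SELECT product_name, quantity * price AS stock_value "
    "FROM product WHERE quantity > 0 ORDER BY stock_value DESC"
).fetchall()
print(rows)
```

The result of the SELECT is itself a table: two rows with the columns product_name and stock_value.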

Database Table

• The following code retrieves only the Last Name and the Employee ID where the Employee ID is greater than 5. The records are retrieved in descending order of Employee ID.

SELECT LastName, EmployeeID
FROM Employees
WHERE EmployeeID > 5
ORDER BY EmployeeID DESC

WHERE Clause

• WHERE: Use the WHERE clause to limit the selection. In Microsoft Access SQL, the # symbol delimits literal date values.

SELECT * FROM Employees WHERE LastName = "Smith"

SELECT Employees.LastName FROM Employees WHERE Employees.State IN ('NY','WA')

SELECT OrderID FROM Orders WHERE OrderDate BETWEEN #01/01/93# AND #01/31/93#

Keys

• A key, also called an identifier, is an Attribute or a Composite Attribute that can be used to uniquely identify an instance of an entity type.

• Examples:

Entity Type        Key
Warehouse          Warehouse Number
Product            Product Number
Student            Student ID or SSN
Ship               Name and Port of Registration
Stock of Product   Product Number and Warehouse No.

Types of Key

• Primary Key: An attribute or a set of attributes that has been chosen as the unique identifier of a table and is used as such by the DBMS.

• Candidate (Alternative) Key: An attribute or a set of attributes that could have been used as the primary key of a table.

• Secondary (Index) Key: An attribute or a set of attributes that has been used to construct the data retrieval index.

• Concatenated (Combined or Composite) Key: A set of attributes that has been used as the key.

• Foreign Key: An attribute or a set of attributes in one table that matches the primary key of another table.
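The key types above map onto SQL constraints roughly as in the sketch below (Python's built-in `sqlite3`). The tables follow the Warehouse, Product, Student, and Stock of Product examples; the column names and sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE warehouse (
    warehouse_no INTEGER PRIMARY KEY                 -- primary key
);
CREATE TABLE product (
    product_no INTEGER PRIMARY KEY                   -- primary key
);
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,                  -- primary key
    ssn        TEXT NOT NULL UNIQUE,                 -- candidate (alternative) key
    last_name  TEXT
);
CREATE INDEX student_last_name_idx ON student (last_name);  -- secondary (index) key
CREATE TABLE stock (
    product_no   INTEGER NOT NULL REFERENCES product(product_no),      -- foreign key
    warehouse_no INTEGER NOT NULL REFERENCES warehouse(warehouse_no),  -- foreign key
    quantity     INTEGER,
    PRIMARY KEY (product_no, warehouse_no)           -- concatenated (composite) key
);
INSERT INTO warehouse VALUES (1);
INSERT INTO product VALUES (10);
INSERT INTO student VALUES (1, '000-00-0001', 'Smith');
INSERT INTO stock VALUES (10, 1, 40);
""")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_candidate = rejected("INSERT INTO student VALUES (2, '000-00-0001', 'Jones')")
bad_foreign = rejected("INSERT INTO stock VALUES (10, 99, 5)")
print(dup_candidate, bad_foreign)
```

The secondary key is the only one that allows duplicates: it exists to speed up retrieval, not to identify rows.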

Purposes of Normalization

• Avoid maintenance problems when data are inserted, deleted, or updated:

• Insert: There may be no place to insert new information.

• Delete: Some important information will be lost by deletion.

• Update: Inconsistency may occur because of the existence of data redundancy.

• Provide maximum flexibility to meet future information needs by keeping tables corresponding to object types in their simplified forms.

A Common Sense Approach to Normalization

• Don't rush to put all the information in one table.

• Create a table to correspond to a class of a simple object type that should exist by itself, i.e., "one fact in one place."

• Include common fields (links) as ways of joining information from several related tables.

• Avoid redundancy by using links to retrieve data from related tables.

Normalization Theory

• Normalization is a process of systematically breaking a complex table into simpler ones.

• It is built around the concept of normal forms.

• A relation is in a particular normal form if it satisfies a specific set of constraints, such as dependencies among attributes in the relation.

• For any integer x > 1, if a relation is in x-NF then it is also in (x-1)-NF.

• Higher order normal forms are usually more desirable than lower order normal forms.

• The normalization process usually starts from complex relations, which are usually drawn from existing documents such as business forms.

A Business Form

An Informal Example of Normalization

• A CUSTOMER ORDER contains the following information:
– OrderNo
– OrderDate
– CustNo
– CustAddress
– CustType
– Tax
– Total
– one or more Order-Items, each of which has
» ProductNo
» Description
» Quantity
» UnitPrice
» Subtotal

Solution

Unnormalized table:
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total, 1{ProductNo, Description, Quantity, UnitPrice, Subtotal}n)

Remove repeating group, giving 1st NF:
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total)
(OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)

Remove partial FD, giving 2nd NF:
(ProductNo, Description, UnitPrice)
(OrderNo, ProductNo, Quantity, UnitPrice, Subtotal)

Remove transitive FD, giving 3rd NF:
(OrderNo, OrderDate, CustNo, Tax, Total)
(CustNo, CustAddress, CustType)

Unnormalized Form

• A relation that has multi-valued attributes (repeating groups).

• Normalization Process: Remove Multi-value Attributes
• If an unnormalized relation R has a primary key K and a multi-value attribute M, the normalization process is:
– The multi-value attribute M should be removed from R.
– A new relation will be created with (K, M) as the primary key of the relation.
– There may be some other attributes associated with this new relation.
– R will then be at least in 1NF.

• Example: An Employee relation has an attribute language-spoken. For some employees there may be more than one language that they can speak.

EMP (employeeID, empName, empAddress, (language1, language2, ...))

EMP (employeeID, empName, empAddress)

EMP-LANGUAGE (employeeID, language, skillLevel)
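The EMP example can be sketched in a few lines of Python: the repeating group of languages is moved out of EMP into EMP-LANGUAGE, keyed by (employeeID, language). The sample employees and skill levels are made up for illustration.

```python
# Unnormalized EMP rows: the last field is a repeating group of languages.
# The sample data and skill levels are hypothetical.
emp_unnormalized = [
    (1, "Lee", "12 Oak St", [("English", "fluent"), ("Spanish", "basic")]),
    (2, "Kim", "9 Elm Ave", [("Korean", "fluent")]),
]

emp = []           # EMP (employeeID, empName, empAddress)
emp_language = []  # EMP-LANGUAGE (employeeID, language, skillLevel)

for employee_id, name, address, languages in emp_unnormalized:
    # R keeps its key K and its single-valued attributes.
    emp.append((employee_id, name, address))
    for language, skill_level in languages:
        # The new relation is keyed by (K, M) = (employeeID, language).
        emp_language.append((employee_id, language, skill_level))

print(emp)
print(emp_language)
```

skillLevel is an example of the "other attributes associated with this new relation": it describes the pairing of an employee and a language, not either one alone.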

How Do You Remove the Repeating Groups?

CREATE TABLE MEM_CONDITION (
    MEMBER# VARCHAR2(12) NOT NULL,
    CASE# VARCHAR2(16) NOT NULL,
    DIAG_ARRAY_1 VARCHAR2(6) NOT NULL,
    DIAG_ARRAY_2 VARCHAR2(6) NOT NULL,
    DIAG_ARRAY_3 VARCHAR2(6) NOT NULL,
    DIAG_ARRAY_4 VARCHAR2(6) NOT NULL,
    DIAG_ARRAY_5 VARCHAR2(6) NOT NULL,
    DIAG_EX_ARRAY_1 VARCHAR2(2) NOT NULL,
    DIAG_EX_ARRAY_2 VARCHAR2(2) NOT NULL,
    DIAG_EX_ARRAY_3 VARCHAR2(2) NOT NULL,
    DIAG_EX_ARRAY_4 VARCHAR2(2) NOT NULL,
    DIAG_EX_ARRAY_5 VARCHAR2(2) NOT NULL,
    DRUG_ARRAY_1 VARCHAR2(12) NOT NULL,
    DRUG_ARRAY_2 VARCHAR2(12) NOT NULL,
    DRUG_ARRAY_3 VARCHAR2(12) NOT NULL,
    DRUG_ARRAY_4 VARCHAR2(12) NOT NULL,
    DRUG_ARRAY_5 VARCHAR2(12) NOT NULL,
    LC_ARRAY_1 VARCHAR2(4) NOT NULL,
    LC_ARRAY_2 VARCHAR2(4) NOT NULL,
    LC_ARRAY_3 VARCHAR2(4) NOT NULL,
    LC_ARRAY_4 VARCHAR2(4) NOT NULL,
    LC_ARRAY_5 VARCHAR2(4) NOT NULL,
    MEM_REVIEW VARCHAR2(4) NOT NULL,
    OP# VARCHAR2(4) NOT NULL,
    PROC_ARRAY_1 VARCHAR2(6) NOT NULL,
    PROC_ARRAY_2 VARCHAR2(6) NOT NULL,
    PROC_ARRAY_3 VARCHAR2(6) NOT NULL,
    PROC_ARRAY_4 VARCHAR2(6) NOT NULL,
    PROC_ARRAY_5 VARCHAR2(6) NOT NULL,
    PROV_ARRAY_1 VARCHAR2(12) NOT NULL,
    PROV_ARRAY_2 VARCHAR2(12) NOT NULL,
    PROV_ARRAY_3 VARCHAR2(12) NOT NULL,
    PROV_ARRAY_4 VARCHAR2(12) NOT NULL,
    PROV_ARRAY_5 VARCHAR2(12) NOT NULL,
    REC_TYPE VARCHAR2(2) NOT NULL,
    SP_ARRAY_1 VARCHAR2(4) NOT NULL,
    SP_ARRAY_2 VARCHAR2(4) NOT NULL,
    SP_ARRAY_3 VARCHAR2(4) NOT NULL,
    SP_ARRAY_4 VARCHAR2(4) NOT NULL,
    SP_ARRAY_5 VARCHAR2(4) NOT NULL,
    TRANSCODE VARCHAR2(2) NOT NULL,
    TT_ARRAY_1 VARCHAR2(4) NOT NULL,
    TT_ARRAY_2 VARCHAR2(4) NOT NULL,
    TT_ARRAY_3 VARCHAR2(4) NOT NULL,
    TT_ARRAY_4 VARCHAR2(4) NOT NULL,
    TT_ARRAY_5 VARCHAR2(4) NOT NULL,
    VOID VARCHAR2(2) NOT NULL,
    YMDEFF VARCHAR2(8) NOT NULL,
    YMDEND VARCHAR2(8) NOT NULL,
    YMDTRANS VARCHAR2(8) NOT NULL,
    PRIORITY VARCHAR2(2) NOT NULL
);

Functional Dependency

• Notation: R.X => R.Y

• Definition: Attribute Y of Relation R is functionally dependent on Attribute X of Relation R when each value of R.X is associated with no more than one value of R.Y. R.X and R.Y may be composite attributes.

• Description:
– R.Y is functionally dependent on R.X

– R.X functionally determines R.Y
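The definition can be tested against sample data. The sketch below checks whether every X value in a set of rows is paired with a single Y value; the function name `holds` and the SUPPLIER-PART sample rows are assumptions for illustration. Note that a sample can only refute a functional dependency: whether a dependency truly holds is a business rule, not a property of the rows that happen to be stored.

```python
def holds(rows, x, y):
    """Return True if the functional dependency x => y holds in rows.

    rows is a list of dicts; x and y are tuples of attribute names,
    so composite attributes are supported.
    """
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in x)
        y_value = tuple(row[a] for a in y)
        # Each X value may be associated with at most one Y value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True

# Hypothetical SUPPLIER-PART sample data.
supplier_part = [
    {"supplier#": "S1", "part#": "P1", "qty": 300, "city": "London"},
    {"supplier#": "S1", "part#": "P2", "qty": 200, "city": "London"},
    {"supplier#": "S2", "part#": "P1", "qty": 300, "city": "Paris"},
]

city_by_supplier = holds(supplier_part, ("supplier#",), ("city",))
qty_by_supplier = holds(supplier_part, ("supplier#",), ("qty",))
qty_by_key = holds(supplier_part, ("supplier#", "part#"), ("qty",))
print(city_by_supplier, qty_by_supplier, qty_by_key)
```

Here supplier# determines city, but only the composite (supplier#, part#) determines qty.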

Full & Partial Dependency

• R.A => R.B

• If B is not functionally dependent on any subset of A (other than A itself), B is fully dependent on A in R.

• If B is functionally dependent on a subset of A (other than A itself), B is partially dependent on A in R.

First Normal Form (1NF)

• A relation R is in the first normal form (1NF) if and only if all attributes of any tuple in R contain only atomic values.
• Normalization Process:
– Remove Partial Functional Dependencies
– If R is in 1NF and has a composite primary key (K1, K2), and an attribute P is functionally dependent on K1 (K1 => P) (i.e., P is partially dependent on (K1, K2)), the normalization process is:
– The attribute P should be removed from R and a new relation will be created with K1 as the primary key and P as a non-key attribute.
– A relation that is in 1NF and not in 2NF must have a composite primary key.
• Example
– Supplier-Part relation has attributes supplier#, part#, qty, city, distance, where (supplier#, part#) is the key.
– City is functionally dependent on supplier# alone, so it is only partially dependent on the key.

SUPPLIER-PART (supplier#, part#, qty, city, distance)

SUPPLIER-PART (supplier#, part#, qty)
SUPPLIER (supplier#, city, distance)

Non-loss Decomposition

• Normalization is a reduction (decomposition) process that replaces a relation by suitable projections. Each of the projections is a new relation that is in a further normalized form than the original relation. The collection of projections is equivalent to the original relation.

• The original relation can always be recovered by taking the natural join of these projections.

• Any information that can be derived from the original relation can also be derived from the further normalized relations. The converse is not true.

• The process is reversible because no information is lost in the reduction process.
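A small check of the non-loss property, using Python sets as relations: the SUPPLIER-PART relation is projected onto two relations and then recovered with a natural join on supplier#. The sample tuples are hypothetical.

```python
# Hypothetical SUPPLIER-PART tuples: (supplier#, part#, qty, city, distance)
supplier_part = {
    ("S1", "P1", 300, "London", 50),
    ("S1", "P2", 200, "London", 50),
    ("S2", "P1", 300, "Paris", 400),
}

# Projections (sets remove duplicate tuples, as in relational algebra).
sp = {(s, p, qty) for (s, p, qty, city, dist) in supplier_part}
supplier = {(s, city, dist) for (s, p, qty, city, dist) in supplier_part}

# Natural join of the two projections on the common attribute supplier#.
rejoined = {
    (s1, p, qty, city, dist)
    for (s1, p, qty) in sp
    for (s2, city, dist) in supplier
    if s1 == s2
}

recovered = rejoined == supplier_part
print(recovered, len(supplier))
```

The SUPPLIER projection holds only two tuples, so the city and distance of supplier S1 are now stored once instead of once per part.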

Transitive Dependency

In a relation R,

if R.A => R.B and R.B => R.C

then attribute C is said to be transitively dependent on attribute A.

Second Normal Form (2NF)

• A relation R is in the second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key.
• Normalization Process: Remove Transitive Dependencies
• If R is in 2NF and has two non-key attributes A1 and A2, where A2 is functionally dependent on A1 (A1 => A2), then A2 should be removed from R and a new relation will be created with A1 as the primary key and A2 as a non-key attribute.
• Example
– Supplier relation has attributes supplier#, city, distance, and quality_level, where supplier# is the key and the distance to a supplier can be determined by the city of the supplier.

SUPPLIER (supplier#, city, distance, quality_level)

SUPPLIER (supplier#, city, quality_level)
CITY-DISTANCE (city, distance)

Third Normal Form (3NF)

• A relation R is in the third normal form (3NF) if and only if the non-key attributes (if there are any) are fully dependent on the primary key of R (i.e., R is in 2NF) and are mutually independent.

• Heuristic to Check Whether a Relation Is in 3NF
– All the non-key attributes (which are not multi-value attributes) are dependent on the (primary) key, the whole key, and nothing but the key.

Explanation
• All the non-key attributes have atomic values and are dependent on the key (1NF - No multi-value attribute),
• the whole key (2NF - No Partial Functional Dependency),
• and nothing but the key (3NF - No Transitive Functional Dependency).

Normalization Process

[Figure: a relation with attributes A through H is decomposed step by step. Unnormalized Form, then remove repeating groups to reach 1NF, then remove partial dependencies to reach 2NF, then remove transitive dependencies to reach a set of 3NF relations.]

Normalization: Pros and Cons

• Pros
– Reduce data redundancy & space required
– Enhance data consistency
– Enforce data integrity
– Reduce update cost
– Provide maximum flexibility in responding to ad hoc queries

• Cons
– Many complex queries will be slower because joins have to be performed to retrieve relevant data from several normalized tables
– Programmers/users have to understand the underlying data model of a database application in order to perform proper joins among several tables
– The formulation of multiple-level queries is a nontrivial task.

Join Two Tables

SELECT Categories.CategoryName, Products.ProductName

FROM Categories, Products

WHERE Products.CategoryID = Categories.CategoryID
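The join above can be run against two small tables with Python's built-in `sqlite3`. The sample rows and the ProductID column are assumptions for illustration. The sketch also runs the same join written with the explicit INNER JOIN ... ON syntax, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT);
CREATE TABLE Products (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT,
    CategoryID  INTEGER REFERENCES Categories(CategoryID)
);
INSERT INTO Categories VALUES (1, 'Beverages');
INSERT INTO Categories VALUES (2, 'Produce');
INSERT INTO Products VALUES (1, 'Tea', 1);
INSERT INTO Products VALUES (2, 'Coffee', 1);
INSERT INTO Products VALUES (3, 'Apples', 2);
""")

# Join expressed in the WHERE clause, as on the slide.
where_join = conn.execute("""
    SELECT Categories.CategoryName, Products.ProductName
    FROM Categories, Products
    WHERE Products.CategoryID = Categories.CategoryID
    ORDER BY Products.ProductID
""").fetchall()

# The same join with explicit JOIN syntax.
explicit_join = conn.execute("""
    SELECT Categories.CategoryName, Products.ProductName
    FROM Categories
    INNER JOIN Products ON Products.CategoryID = Categories.CategoryID
    ORDER BY Products.ProductID
""").fetchall()

print(where_join)
print(where_join == explicit_join)
```

The common field CategoryID is the link: it is the primary key of Categories and a foreign key in Products.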

Tables in Relational DB

• Identify Primary Keys and Foreign Keys in the following Tables!!!

Join Tables

SELECT Orders.OrderID, Orders.CustID, LastName, Firstname, Orders.ItemID, Description
FROM Customer, Orders, Inventory
WHERE Customer.CustID = Orders.CustID AND
      Orders.ItemID = Inventory.ItemID
ORDER BY Orders.CustID, Orders.ItemID

Foreign Keys & Primary Keys in a Sample Access Database

An Example of a Complex Query

Please list the name and phone number of customers who have ordered product number 007.

SELECT customer_name, customer_phone
FROM customer
WHERE customer_number IN
    (SELECT customer_number
     FROM orders
     WHERE order_no IN
        (SELECT order_no
         FROM orderItem
         WHERE product_number = 007))
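A runnable sketch of this multiple-level query, using Python's built-in `sqlite3`. Several details are assumptions made for the example: the order table is named `orders` because ORDER is a reserved word in SQL, product numbers are stored as text so that '007' keeps its leading zeros, and the sample rows are invented. The same question is also answered with joins for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_number INTEGER PRIMARY KEY,
    customer_name   TEXT,
    customer_phone  TEXT
);
CREATE TABLE orders (order_no INTEGER PRIMARY KEY, customer_number INTEGER);
CREATE TABLE orderItem (order_no INTEGER, product_number TEXT);
INSERT INTO customer VALUES (1, 'Ann', '555-0100');
INSERT INTO customer VALUES (2, 'Bob', '555-0101');
INSERT INTO orders VALUES (10, 1);
INSERT INTO orders VALUES (11, 2);
INSERT INTO orderItem VALUES (10, '007');
INSERT INTO orderItem VALUES (11, '123');
""")

# Nested form: each subquery is wrapped in parentheses.
nested = conn.execute("""
    SELECT customer_name, customer_phone
    FROM customer
    WHERE customer_number IN
        (SELECT customer_number
         FROM orders
         WHERE order_no IN
            (SELECT order_no
             FROM orderItem
             WHERE product_number = '007'))
""").fetchall()

# Equivalent form using joins; DISTINCT avoids repeating a customer.
joined = conn.execute("""
    SELECT DISTINCT c.customer_name, c.customer_phone
    FROM customer c
    JOIN orders o ON o.customer_number = c.customer_number
    JOIN orderItem i ON i.order_no = o.order_no
    WHERE i.product_number = '007'
""").fetchall()

print(nested)
print(nested == joined)
```

Either way, the query has to walk through three normalized tables, which is the cost the "Cons" of normalization refer to.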

Denormalization

• The process of intentionally backing away from normalization to improve performance. Denormalization should not be the first choice for improving performance and should only be used for fine tuning a database for a particular application.

• Requirements
– Prior normalization
– Knowledge of data usage

• Benefits
– Minimize the need for joins
– Reduce number of tables
– Reduce number of foreign keys
– Reduce number of indices

• Knowledge of Data Usage
– How often are two data items needed together
– How many rows are involved
– How volatile is denormalized data
– How important is visibility of data to users
– What is the minimum response time and frequency of a query

De-normalization: An Example

• Where:
– R1 (ProductNo, SupplierNo, Price)

– R2 (SupplierNo, Name, Address, Phone)

– R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)

• R2 should be kept to prevent data loss.

• Data redundancy in R1*R2 and R2 could cause potential data inconsistency problems if the redundant data in these two tables are not maintained properly.

[Figure: Denormalization. R1 and R2 are joined (R1 JOIN R2) and replaced by the tables R1*R2 and R2.]

Data Model Refinement and Transformation

• Data Model Refinement

• Associative Entity Type

• Removing Many-to-Many Relationships

• Keys

• Transformation to Relational Databases

Refinement of a Data Model: Analysis and Simplification

• Isolated Entity Type

• Solitary Entity Type

• One-to-One Relationship

• Redundant Relationship

• Multi-Valued Attributes

• Attribute with Attributes

• Many-to-Many Relationship

Isolated Entity Type

• An Entity Type that does not participate in a Relationship.

• Since every Entity Type should participate in at least one Relationship, there exist two alternatives:

– Identify a relevant Relationship

– Remove the Entity Type from the model

Solitary Entity Type

• An Entity Type that has only one Entity Instance. Examples: Computer Center, Sales Tax, and Current Order Number. Solitary Entity Types may be too restrictive.

• Alternatives:

– Introduce another Entity Type with a wider scope. Example: Computer Center ==> Organization Unit

– Define it as an Attribute of an Entity Type. Example: Sales Tax ==> Sales Tax of Order

– Define it as a data element in a parameter table. A parameter table has only one row. Example: Current Order Number ==> Current Order Number of Parameter Table
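
The parameter-table alternative can be sketched as follows with Python's sqlite3. The table name `parameter`, the column `current_order_number`, the starting value 1000, and the `CHECK (id = 1)` trick for enforcing a single row are all assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parameter (
        id INTEGER PRIMARY KEY CHECK (id = 1),   -- only one row can ever exist
        current_order_number INTEGER NOT NULL
    );
    INSERT INTO parameter (id, current_order_number) VALUES (1, 1000);
""")

def next_order_number(conn):
    """Increment and return the single Current Order Number value."""
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE parameter SET current_order_number = current_order_number + 1")
        return conn.execute("SELECT current_order_number FROM parameter").fetchone()[0]

first = next_order_number(conn)
second = next_order_number(conn)

# A second row is rejected, so the table stays a one-row parameter table.
try:
    conn.execute("INSERT INTO parameter (id, current_order_number) VALUES (2, 1)")
    second_row_allowed = True
except sqlite3.IntegrityError:
    second_row_allowed = False
```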

Evaluate One-to-One Relationship

[Diagram: "Maybe Incorrect": Purchase Request and Purchase Order as two Entity Types joined by a one-to-one Relationship (becomes / has request). "Correct": a single Purchase Order Entity Type.]

• A one-to-one Relationship between two Entity Types may be unnecessary if they have the same Attributes and Relationships (i.e., they are identical).

• In that case, they should be combined into one Entity Type.

Redundant Relationship

[Diagram: Entity Types customer, order, order item, and product. Relationship labels: places / is placed by; contains / is part of; has / is ordered by; ORDERS / has ordered. One Relationship is marked with the question "Is this relationship redundant?"]

Differences in timing of an entity type in its life cycle:

• Implemented as separate entity types or use subtypes

• Use value of attributes or additional attributes to differentiate them

Redundant Relationship

[Diagram: two examples, labelled Redundant and Non-redundant. Entity Types shown: Product, Warehouse, Stock, Order, Order Line, Order History, Customer. Relationship labels shown: stocks; is held as / holds; is held in / contains; is contained in / contains; is placed by / places.]

Multi-Valued Attribute

• Definition: An Attribute that may have more than one value at a time is called a multi-valued attribute.

• Solution: Create an Entity Type for the multi-valued attribute.

• Example: Languages spoken by an Employee

– Before: Employee(ID, Name, Phone, Languages)

– Employee(111, “John Smith”, 201-999-8888, (English, Chinese))

– After: Employee(ID, Name, Phone)

– Employee(111, “John Smith”, 201-999-8888)

– Employee_language(ID, Language)

– Employee_language(111, English)

– Employee_language(111, Chinese)
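
The example above can be reproduced as a small sketch in Python's sqlite3. The column types and the composite primary key on Employee_language are assumptions; the slides give only the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Phone TEXT);
    -- One row per (employee, language) replaces the multi-valued attribute.
    CREATE TABLE Employee_language (
        ID       INTEGER NOT NULL REFERENCES Employee (ID),
        Language TEXT NOT NULL,
        PRIMARY KEY (ID, Language)
    );
    INSERT INTO Employee VALUES (111, 'John Smith', '201-999-8888');
    INSERT INTO Employee_language VALUES (111, 'English');
    INSERT INTO Employee_language VALUES (111, 'Chinese');
""")

languages = [row[0] for row in conn.execute(
    "SELECT Language FROM Employee_language WHERE ID = ? ORDER BY Language", (111,))]
```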

Attribute with Attributes

• An Attribute that can be described by other Attributes is called an attribute with attributes.

• Example: College Degree of an Employee

– (John Smith has a College Degree in Computer Sciences from George Mason University)

• Solution:

– Create an Entity Type to avoid an Attribute with Attributes.

– Add new attributes to the existing Entity Type.
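
A minimal sketch of the first solution, again in Python's sqlite3. The table Employee_degree and its columns (Degree, Major, University) are hypothetical names read off the John Smith example, not names from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT);
    -- Hypothetical entity type: each attribute of the degree becomes a column.
    CREATE TABLE Employee_degree (
        ID         INTEGER NOT NULL REFERENCES Employee (ID),
        Degree     TEXT NOT NULL,
        Major      TEXT NOT NULL,
        University TEXT,
        PRIMARY KEY (ID, Degree, Major)
    );
    INSERT INTO Employee VALUES (111, 'John Smith');
    INSERT INTO Employee_degree VALUES
        (111, 'College Degree', 'Computer Sciences', 'George Mason University');
""")

degree_row = conn.execute(
    "SELECT e.Name, d.Major, d.University "
    "FROM Employee AS e JOIN Employee_degree AS d ON d.ID = e.ID"
).fetchone()
```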

Associative Entity Type

• An Associative Entity Type is an Entity Type whose existence is meaningful only if it participates in several (>=2) Relationship Types at the same time.

• Associative Entity Types are often introduced to represent additional information in many-to-many Relationships or to decompose a many-to-many Relationship into two one-to-many Relationships.

• Associative Entity Types are also used to represent n-ary Relationships in a binary data model.

Remove Many-to-Many Relationship

Given: [Diagram: Order and Product joined by a many-to-many Relationship (contains / belongs-to).]

Why?

• There is no place to attach Attributes that are required to describe a many-to-many Relationship.

• It is difficult to translate many-to-many Relationships into relational tables automatically.

How? A many-to-many relationship can be decomposed into two one-to-many Relationships by creating an Associative Entity Type between the existing two Entity Types.

[Diagram: Order, Order Line, and Product, with the Relationship labels contains / belongs to and has / is contained in.]

Remove Many-to-Many Relationships: Exercises

Remove the many-to-many relationship from the following ER diagrams:

(a) Supplier and Product: offers / has-sources

(b) Student and Course: takes / is-taken-by

(c) Part and Part: consists-of / is-contained-in

Bills of Material

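[Diagram: Part is related to itself many-to-many (consists-of / is-a-component-in); the Relationship is carried by Product Structure.]

Product-Structure(Parent Part No, Child Part No, Quantity)

Example structure (quantity per parent in parentheses):

A
    B (2)
        D (1)
        E (3)
    C (1)
        D (2)
        F (2)

Parent Part No   Child Part No   Quantity
A                B               2
A                C               1
B                D               1
B                E               3
C                D               2
C                F               2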
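
As an illustration of how the Product-Structure table can be used, the sketch below loads the six rows shown on the slide into SQLite and walks the structure with a recursive query to total the quantity of every component needed for one unit of A. The recursive query and the SQL column names (ParentPartNo, ChildPartNo) are assumptions added for this example; the slides only define the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product_Structure (
        ParentPartNo TEXT NOT NULL,
        ChildPartNo  TEXT NOT NULL,
        Quantity     INTEGER NOT NULL,
        PRIMARY KEY (ParentPartNo, ChildPartNo)
    );
    INSERT INTO Product_Structure VALUES
        ('A','B',2), ('A','C',1), ('B','D',1), ('B','E',3), ('C','D',2), ('C','F',2);
""")

# Start from the direct children of the requested part, then multiply
# quantities down each path of the structure.
EXPLODE = """
    WITH RECURSIVE explosion(part, qty) AS (
        SELECT ChildPartNo, Quantity FROM Product_Structure WHERE ParentPartNo = ?
        UNION ALL
        SELECT ps.ChildPartNo, e.qty * ps.Quantity
        FROM explosion AS e JOIN Product_Structure AS ps ON ps.ParentPartNo = e.part
    )
    SELECT part, SUM(qty) FROM explosion GROUP BY part ORDER BY part
"""
requirements = dict(conn.execute(EXPLODE, ("A",)).fetchall())
```

Part D appears under both B and C, so its total is 2 x 1 + 1 x 2 = 4 per unit of A.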

Using an Associative Entity Type to Represent an N-ary Relationship

[Diagram, first form: Product, Project, and Supplier joined by one 3-ary Relationship, Product Usage (labels: is used in, supplies, uses).]

[Diagram, second form: Product, Project, and Supplier are each connected to Product Usage by a binary Relationship labelled "involved in product usage".]

Product Usage is an Associative Entity Type for a 3-ary Relationship.
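
A sketch of Product Usage as a table, using Python's sqlite3. The key column names and the Quantity attribute are hypothetical; the slide names only the four Entity Types. Each row of Product_Usage records one (product, project, supplier) combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Product  (ProductNo  TEXT PRIMARY KEY);
    CREATE TABLE Project  (ProjectNo  TEXT PRIMARY KEY);
    CREATE TABLE Supplier (SupplierNo TEXT PRIMARY KEY);
    CREATE TABLE Product_Usage (
        ProductNo  TEXT NOT NULL REFERENCES Product (ProductNo),
        ProjectNo  TEXT NOT NULL REFERENCES Project (ProjectNo),
        SupplierNo TEXT NOT NULL REFERENCES Supplier (SupplierNo),
        Quantity   INTEGER,  -- hypothetical attribute of the relationship
        PRIMARY KEY (ProductNo, ProjectNo, SupplierNo)
    );
    INSERT INTO Product  VALUES ('P1');
    INSERT INTO Project  VALUES ('J1'), ('J2');
    INSERT INTO Supplier VALUES ('S1'), ('S2');
    INSERT INTO Product_Usage VALUES ('P1', 'J1', 'S1', 10), ('P1', 'J2', 'S2', 5);
""")

# Which supplier provides product P1 for project J2? Only the 3-way row answers this.
supplier_for_j2 = conn.execute(
    "SELECT SupplierNo FROM Product_Usage WHERE ProductNo = 'P1' AND ProjectNo = 'J2'"
).fetchone()[0]

try:
    conn.execute("INSERT INTO Product_Usage VALUES ('P1', 'J1', 'S9', 1)")  # unknown supplier
    unknown_supplier_accepted = True
except sqlite3.IntegrityError:
    unknown_supplier_accepted = False
```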

Translate Data Models to Relational Tables

Given: [Diagram: Order, Order Line, and Product, with the Relationship labels contains / belongs to and has / is contained in.]

Order
Key: Order#
Attributes: Order date, Customer ID, Sale Person ID

Order Line
Key: Order# + Product#
Attributes: Quantity, Unit Price

Product
Key: Product#
Attributes: Description, Qty-on-hand, Unit Price

Relational Tables Created:

CREATE TABLE ORDER
(OrderNo CHAR(10) NOT NULL,
 OrderDate DATE,
 CustomerID CHAR(10),
 SalePersonID CHAR(10));
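
For illustration, the sketch below creates all three tables of this model in SQLite through Python. It is not the slide's output: the PRODUCT and ORDER_LINE definitions, the column types, and the sample rows are assumptions derived from the key and attribute lists. ORDER is a reserved word in SQL, so the table name is quoted here; many DBMSs would otherwise reject the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE "ORDER" (              -- quoted because ORDER is an SQL keyword
        OrderNo      CHAR(10) NOT NULL PRIMARY KEY,
        OrderDate    DATE,
        CustomerID   CHAR(10),
        SalePersonID CHAR(10)
    );
    CREATE TABLE PRODUCT (
        ProductNo   CHAR(10) NOT NULL PRIMARY KEY,
        Description VARCHAR(80),
        QtyOnHand   INTEGER,
        UnitPrice   NUMERIC(8,2)
    );
    -- Associative table: its key is Order# + Product#, both also foreign keys.
    CREATE TABLE ORDER_LINE (
        OrderNo   CHAR(10) NOT NULL REFERENCES "ORDER" (OrderNo),
        ProductNo CHAR(10) NOT NULL REFERENCES PRODUCT (ProductNo),
        Quantity  INTEGER,
        UnitPrice NUMERIC(8,2),
        PRIMARY KEY (OrderNo, ProductNo)
    );
    INSERT INTO "ORDER" VALUES ('O1', '1998-07-21', 'C1', 'E1');
    INSERT INTO PRODUCT VALUES ('P1', 'Widget', 100, 2.5), ('P2', 'Gadget', 40, 10);
    INSERT INTO ORDER_LINE VALUES ('O1', 'P1', 4, 2.5), ('O1', 'P2', 1, 10);
""")

order_total = conn.execute(
    "SELECT SUM(Quantity * UnitPrice) FROM ORDER_LINE WHERE OrderNo = 'O1'"
).fetchone()[0]
```

Quantity and Unit Price live on ORDER_LINE because they describe the pairing of an order with a product, not either one alone.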

Transformation of Data Models to Relational Database Tables

• All or part of a data (entity-relationship) model can be translated into a normalized database design.

• Objects Created:

– At most one relational database

– One or more relations (tables)

– Data structures (DDL) representing the elements (attributes) and the primary key of each relation

– Data type of each data element

Heuristics of Transformation

• A table is created for each Entity Type in the ER diagram.

• A table is created for each multi-valued attribute.

• Relationship Types are implemented as tables or as foreign keys in other tables.

• Many-to-many Relationship Types are translated into tables.

• Foreign keys are used for implementing one-to-one and one-to-many Relationship Types.

• For one-to-many Relationship Types, the foreign key is placed in the table that represents the Entity Type on the "many" end of the Relationship Type.

• For identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is also part of the PK of the "many" table.

• For non-identifying one-to-many Relationship Types, the PK of the "one" table migrates to the "many" table as a FK, and the FK is a non-key attribute of the "many" table.
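
The last two heuristics can be checked with a short sketch in Python's sqlite3. The table and column names follow the deck's project-management model, but the trimmed column lists and the `key_roles` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DIVISION (DIVNUM INTEGER PRIMARY KEY, DIVNAME TEXT);
    -- Non-identifying: DIVNUM migrates as a FK and stays a non-key column.
    CREATE TABLE EMPLOYEE (
        EMPNUM INTEGER PRIMARY KEY,
        DIVNUM INTEGER NOT NULL REFERENCES DIVISION (DIVNUM)
    );
    CREATE TABLE PROJECT (PRONUM INTEGER PRIMARY KEY, PRONAME TEXT);
    -- Identifying: PRONUM migrates as a FK and is also part of TASK's PK.
    CREATE TABLE TASK (
        PRONUM  INTEGER NOT NULL REFERENCES PROJECT (PRONUM),
        TSKNAME TEXT NOT NULL,
        PRIMARY KEY (PRONUM, TSKNAME)
    );
""")

def key_roles(table):
    """Return (primary-key columns, foreign-key columns) of a table."""
    pk = {row[1] for row in conn.execute(f"PRAGMA table_info({table})") if row[5] > 0}
    fk = {row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    return pk, fk

employee_pk, employee_fk = key_roles("EMPLOYEE")
task_pk, task_fk = key_roles("TASK")
```

In TASK the foreign key is a subset of the primary key; in EMPLOYEE the two sets do not overlap.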

PowerDesigner: Data Architect

[Diagram: Data Architect supports generation and reverse engineering of the CDM and PDM, of the target DBMS (database structure, triggers & stored procedures), and of the target 4GL tool (database structure, extended attributes).]

http://www.powersoft.com/

PowerDesigner

A Sample Conceptual Data Model

[Diagram: Conceptual Data Model. Project: Management. Model: Project Management. Author: User. Version 6.x, 7/21/98.]

Entity types and attributes:

• Division: Division number, Division name, Division address

• Employee: Employee number, First name, Last name, Employee function, Employee salary

• Customer: Customer number, Customer name, Customer address, Customer activity, Customer telephone, Customer fax

• Project: Project number, Project name, Project label

• Team: Team number, Speciality

• Task: Task name, Task cost

• Material: Material number, Material name, Material type

• Participate: Start date, End date

• Activity: Start date, End date

Relationship labels: Is member of, supervises, Is manager of, Uses, Subcontract, composes, composed of

Notations

[Diagram: notation legend. An Entity is drawn as a box holding its name and attributes (Division: Division number, Division name, Division address; Employee: Employee number, First name, Last name, Employee function, Employee salary). A Relationship is drawn as a line between entities; the example shown is One-to-many.]

More on Relationships

[Diagram: Many-to-many cardinality. Employee (Employee number, First name, Last name, Employee function, Employee salary) and Team (Team number, Specialty) are linked by member / is a member of.]

[Diagram: Project (Project number, Project name, Project label) and Task (Task name, Task cost).]

A project 'contains' one or more tasks, and a task's existence is dependent on the project.

Advanced Concepts

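[Diagram: Subtype. Account (Account Number, Name) with the subtypes Savings (Rate) and Checking (Fees).]

[Diagram: Reflexive relationship. Employee (Employee number, First name, Last name, Employee function, Employee salary) is related to itself; Material (Material number, Material name, Material type) is related to itself by composes / composed of.]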

Define Entities

Define Attributes

Check Parameters

Relationship Definition

Dependent (Identifying Relationship)

• Check the box to indicate a dependent relationship. "One to many" and "mandatory" are automatically chosen as the cardinality and optionality.

• At the physical data model level, the parent entity type's primary key (PK) will become part of the dependent child entity type's PK. It is also a foreign key.

Inheritance (Super-Type and Sub-Type)

Generate Physical Data Model

Physical Data Model

Conceptual Data Model:

[Diagram: Employee (Employee number, First name, Last name, Employee function, Employee salary) "belongs to" Division (Division number, Division name, Division address).]

Do not define FK as an attribute.

Transformation

Physical Data Model:

EMPLOYEE (EMPNUM <pk>, DIVNUM <fk>, EMPFNAM, EMPLNAM, EMPFUNC, EMPSAL)

DIVISION (DIVNUM <pk>, DIVNAME, DIVADDR)

Reference: DIVNUM = DIVNUM

DIVNUM automatically migrates as a foreign key.

Dependent Relationship

Conceptual Data Model:

[Diagram: Project (Project number, Project name, Project label) and its dependent Task (Task name, Task cost).]

Transformation

Physical Data Model:

PROJECT (PRONUM <pk>, CUSNUM <fk>, EMPNUM <fk>, ACTBEG, ACTEND, PRONAME, PROLABL)

TASK (PRONUM <pk,fk>, TSKNAME <pk>, ACTBEG, ACTEND, TSKCOST)

Reference: PRONUM = PRONUM

Physical Data Model

[Diagram: Physical Data Model. Project: Management. Model: Project Management. Author: User. Version 6.x, 7/21/98.]

Tables:

DIVISION (DIVNUM <pk>, DIVNAME, DIVADDR)

EMPLOYEE (EMPNUM <pk>, EMP_EMPNUM <fk>, DIVNUM <fk>, EMPFNAM <ak>, EMPLNAM <ak>, EMPFUNC <ak>, EMPSAL)

CUSTOMER (CUSNUM <pk>, CUSNAME, CUSADDR, CUSACT, CUSTEL, CUSFAX)

PROJECT (PRONUM <pk>, CUSNUM <fk>, EMPNUM <fk>, ACTBEG, ACTEND, PRONAME, PROLABL)

TEAM (TEANUM <pk>, TEASPE)

TASK (PRONUM <pk,fk>, TSKNAME <pk>, ACTBEG, ACTEND, TSKCOST)

MATERIAL (MATNUM <pk>, MATNAME, MATTYPE)

PARTICIPATE (PRONUM <pk,fk>, TSKNAME <pk,fk>, EMPNUM <pk,fk>, PARBEG, PAREND)

MEMBER (TEANUM <pk,fk>, EMPNUM <pk,fk>)

USED (MATNUM <pk,fk>, EMPNUM <pk,fk>)

COMPOSE (CPD_MATNUM <pk,fk>, CPN_MATNUM <pk,fk>)

Reference join conditions (one per reference line in the diagram):

PRONUM = PRONUM, TSKNAME = TSKNAME
EMPNUM = EMPNUM
MATNUM = CPN_MATNUM
MATNUM = CPD_MATNUM
DIVNUM = DIVNUM
EMPNUM = EMPNUM
MATNUM = MATNUM
PRONUM = PRONUM
EMPNUM = EMPNUM
EMPNUM = EMP_EMPNUM
EMPNUM = EMPNUM
TEANUM = TEANUM
CUSNUM = CUSNUM

View EMPLOYE_MATERIAL (source tables: MATERIAL, PROJ.EMPLOYEE, USED):

MATERIAL.MATNAME char(30)
PROJ.EMPLOYEE.EMPNUM numeric(5)
PROJ.EMPLOYEE.EMPFNAM char(30)
PROJ.EMPLOYEE.EMPLNAM char(30)
PROJ.EMPLOYEE.EMPFUNC char(30)

References (Relationships at the Physical Data Model)

Referential Integrity

• The arrow is pointing from the table containing the foreign key to the table where the foreign key is used as a primary key.

Deletion Rules

• Update Constraints

• Delete Constraints

–None

–Restrict

–Cascade

–Set null

–Set Default
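
The effect of three of the delete constraints listed above can be observed with a small sketch in Python's sqlite3, which supports ON DELETE RESTRICT, CASCADE, and SET NULL. The helper `delete_parent` and the cut-down DIVISION/EMPLOYEE tables are assumptions for illustration; in particular, EMPLOYEE.DIVNUM is left nullable here so that SET NULL can be shown.

```python
import sqlite3

def delete_parent(rule):
    """Create a parent/child pair with the given delete rule, delete the
    parent row, and report what happened to the child row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(f"""
        CREATE TABLE DIVISION (DIVNUM INTEGER PRIMARY KEY);
        CREATE TABLE EMPLOYEE (
            EMPNUM INTEGER PRIMARY KEY,
            DIVNUM INTEGER REFERENCES DIVISION (DIVNUM) ON DELETE {rule}
        );
        INSERT INTO DIVISION VALUES (10);
        INSERT INTO EMPLOYEE VALUES (1, 10);
    """)
    try:
        conn.execute("DELETE FROM DIVISION WHERE DIVNUM = 10")
    except sqlite3.IntegrityError:
        return "delete rejected"
    return conn.execute("SELECT EMPNUM, DIVNUM FROM EMPLOYEE").fetchall()

results = {rule: delete_parent(rule) for rule in ("RESTRICT", "CASCADE", "SET NULL")}
```

With Restrict the parent delete fails, with Cascade the child row is deleted too, and with Set null the child row survives with an empty foreign key.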

Generation of Oracle SQL DDL

-- ============================================================
-- Database name: PROJECT
-- DBMS name: ORACLE Version 8
-- Created on: 7/21/98 8:59 PM
-- ============================================================

-- ============================================================
-- Table: DIVISION
-- ============================================================
create table ADMIN.DIVISION
(
    DIVNUM numeric(5) not null
        constraint CKC_DIVNUM_DIVISION check (DIVNUM >= '1'),
    DIVNAME char(30) not null,
    DIVADDR char(80) null,
    constraint PK_DIVISION primary key (DIVNUM)
)
/

-- ============================================================
-- Table: CUSTOMER
-- ============================================================
create table PROJ.CUSTOMER
(
    CUSNUM numeric(5) not null
        constraint CKC_CUSNUM_CUSTOMER check (CUSNUM >= '1'),
    CUSNAME char(30) not null,
    CUSADDR char(80) not null,
    CUSACT char(80) null,
    CUSTEL char(12) null,
    CUSFAX char(12) null,
    constraint PK_CUSTOMER primary key (CUSNUM)
)
/

-- ============================================================
-- Table: TEAM
-- ============================================================
create table PROJ.TEAM
(
    TEANUM numeric(5) not null
        constraint CKC_TEANUM_TEAM check (TEANUM >= '1'),
    TEASPE char(80) null,
    constraint PK_TEAM primary key (TEANUM)
)
/

-- ============================================================
-- Table: MATERIAL
-- ============================================================
create table PROJ.MATERIAL
(
    MATNUM numeric(5) not null
        constraint CKC_MATNUM_MATERIAL check (MATNUM >= '1'),
    MATNAME char(30) not null,
    MATTYPE char(30) not null,
    constraint PK_MATERIAL primary key (MATNUM)
)
/

-- ============================================================
-- Table: EMPLOYEE
-- ============================================================
create table PROJ.EMPLOYEE
(
    EMPNUM numeric(5) not null
        constraint CKC_EMPNUM_EMPLOYEE check (EMPNUM >= '1'),
    EMP_EMPNUM numeric(5) null,
    DIVNUM numeric(5) not null,
    EMPFNAM char(30) null,
    EMPLNAM char(30) not null,
    EMPFUNC char(30) null,
    EMPSAL numeric(8,2) null,
    constraint PK_EMPLOYEE primary key (EMPNUM),
    constraint AK_EMP_AK1_EMPLOYEE unique (EMPLNAM, EMPFNAM, EMPFUNC)
)
/

-- ============================================================
-- Index: CHIEF_FK
-- ============================================================
create index PROJ.CHIEF_FK on PROJ.EMPLOYEE (EMP_EMPNUM asc)
/

-- ============================================================
-- Index: BELONGS_TO_FK2
-- ============================================================
create index PROJ.BELONGS_TO_FK2 on PROJ.EMPLOYEE (DIVNUM asc)
/

-- ============================================================
-- Table: PROJECT
-- ============================================================
create table PROJ.PROJECT
(
    PRONUM numeric(5) not null
        constraint CKC_PRONUM_PROJECT check (PRONUM >= '1'),
    CUSNUM numeric(5) not null,
    EMPNUM numeric(5) null,
    ACTBEG timestamp null
        constraint CKC_ACTBEG_PROJECT check (ACTBEG is null or ((activity.begindate < activity.enddate))),
    ACTEND timestamp null
        constraint CKC_ACTEND_PROJECT check (ACTEND is null or ((activity.begindate < activity.enddate))),
    PRONAME char(30) not null,
    PROLABL char(80) null,
    constraint PK_PROJECT primary key (PRONUM)
)
/

-- ============================================================
-- Index: SUBCONTRACT_FK
-- ============================================================
create index PROJ.SUBCONTRACT_FK on PROJ.PROJECT (CUSNUM asc)
/

-- ============================================================
-- Index: IS_RESPONSIBLE_FOR_FK
-- ============================================================
create index PROJ.IS_RESPONSIBLE_FOR_FK on PROJ.PROJECT (EMPNUM asc)
/

-- ============================================================
-- Table: TASK
-- ============================================================
create table PROJ.TASK
(
    PRONUM numeric(5) not null,
    TSKNAME char(30) not null,
    ACTBEG timestamp null
        constraint CKC_ACTBEG_TASK check (ACTBEG is null or ((activity.begindate < activity.enddate))),
    ACTEND timestamp null
        constraint CKC_ACTEND_TASK check (ACTEND is null or ((activity.begindate < activity.enddate))),
    TSKCOST numeric(8,2) not null,
    constraint PK_TASK primary key (PRONUM, TSKNAME),
    constraint CKT_TASK check (
        (task.begindate < min(participate.begindate) and task.enddate < max(participate.enddate)))
)
/

-- ============================================================
-- Index: BELONGS_TO_FK
-- ============================================================
create index PROJ.BELONGS_TO_FK on PROJ.TASK (PRONUM asc)
/

-- ============================================================
-- Table: PARTICIPATE
-- ============================================================
create table PROJ.PARTICIPATE
(
    PRONUM numeric(5) not null,
    TSKNAME char(30) not null,
    EMPNUM numeric(5) not null,
    PARBEG timestamp null
        constraint CKC_PARBEG_PARTICIP check (PARBEG is null or (((task.begindate < min(participate.begindate) and task.enddate < max(participate.enddate)) and (participate.begindate < participate.enddate)))),
    PAREND timestamp null
        constraint CKC_PAREND_PARTICIP check (PAREND is null or (((task.begindate < min(participate.begindate) and task.enddate < max(participate.enddate)) and (participate.begindate < participate.enddate)))),
    constraint PK_PARTICIPATE primary key (PRONUM, TSKNAME, EMPNUM),
    constraint CKT_PARTICIPATE check (
        ((task.begindate < min(participate.begindate) and task.enddate < max(participate.enddate)) and (participate.begindate < participate.enddate)))
)
/

-- ============================================================
-- Index: WORKS_ON_FK
-- ============================================================
create index PROJ.WORKS_ON_FK on PROJ.PARTICIPATE (EMPNUM asc)
/

-- ============================================================
-- Index: IS_DONE_BY_FK
-- ============================================================
create index PROJ.IS_DONE_BY_FK on PROJ.PARTICIPATE (PRONUM asc, TSKNAME asc)
/

-- ============================================================
-- Table: MEMBER
-- ============================================================
create table PROJ.MEMBER
(
    TEANUM numeric(5) not null,
    EMPNUM numeric(5) not null,
    constraint PK_MEMBER primary key (TEANUM, EMPNUM)
)
/

-- ============================================================
-- Index: MEMBER_FK
-- ============================================================
create index PROJ.MEMBER_FK on PROJ.MEMBER (TEANUM asc)
/

-- ============================================================
-- Index: IS_MEMBER_OF_FK
-- ============================================================
create index PROJ.IS_MEMBER_OF_FK on PROJ.MEMBER (EMPNUM asc)
/

-- ============================================================
-- Table: USED
-- ============================================================
create table PROJ.USED
(
    MATNUM numeric(5) not null,
    EMPNUM numeric(5) not null,
    constraint PK_USED primary key (MATNUM, EMPNUM)
)
/

-- ============================================================
-- Index: USED_FK
-- ============================================================
create index PROJ.USED_FK on PROJ.USED (MATNUM asc)
/

-- ============================================================
-- Index: USES_FK
-- ============================================================
create index PROJ.USES_FK on PROJ.USED (EMPNUM asc)
/

-- ============================================================
-- Table: COMPOSE
-- ============================================================
create table PROJ.COMPOSE
(
    CPD_MATNUM numeric(5) not null,
    CPN_MATNUM numeric(5) not null,
    constraint PK_COMPOSE primary key (CPD_MATNUM, CPN_MATNUM)
)
/

-- ============================================================
-- Index: COMPOSES_FK
-- ============================================================
create index PROJ.COMPOSES_FK on PROJ.COMPOSE (CPD_MATNUM asc)
/

-- ============================================================
-- Index: COMPOSED_OF_FK
-- ============================================================
create index PROJ.COMPOSED_OF_FK on PROJ.COMPOSE (CPN_MATNUM asc)
/

alter table PROJ.EMPLOYEE
    add constraint FK_EMPLOYEE_CHIEF_EMPLOYEE
    foreign key (EMP_EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.EMPLOYEE
    add constraint FK_EMPLOYEE_BELONGS_T_DIVISION
    foreign key (DIVNUM) references ADMIN.DIVISION (DIVNUM)
/

alter table PROJ.PROJECT
    add constraint FK_PROJECT_SUBCONTRA_CUSTOMER
    foreign key (CUSNUM) references PROJ.CUSTOMER (CUSNUM)
/

alter table PROJ.PROJECT
    add constraint FK_PROJECT_IS_RESPON_EMPLOYEE
    foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.TASK
    add constraint FK_TASK_BELONGS_T_PROJECT
    foreign key (PRONUM) references PROJ.PROJECT (PRONUM)
/

alter table PROJ.PARTICIPATE
    add constraint FK_PARTICIP_WORKS_ON_EMPLOYEE
    foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.PARTICIPATE
    add constraint FK_PARTICIP_IS_DONE_B_TASK
    foreign key (PRONUM, TSKNAME) references PROJ.TASK (PRONUM, TSKNAME)
/

alter table PROJ.MEMBER
    add constraint FK_MEMBER_MEMBER_TEAM
    foreign key (TEANUM) references PROJ.TEAM (TEANUM)
/

alter table PROJ.MEMBER
    add constraint FK_MEMBER_IS_MEMBER_EMPLOYEE
    foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.USED
    add constraint FK_USED_USED_MATERIAL
    foreign key (MATNUM) references PROJ.MATERIAL (MATNUM)
/

alter table PROJ.USED
    add constraint FK_USED_USES_EMPLOYEE
    foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.COMPOSE
    add constraint FK_COMPOSE_COMPOSES_MATERIAL
    foreign key (CPD_MATNUM) references PROJ.MATERIAL (MATNUM)
/

alter table PROJ.COMPOSE
    add constraint FK_COMPOSE_COMPOSED__MATERIAL
    foreign key (CPN_MATNUM) references PROJ.MATERIAL (MATNUM)
/

Referential Integrity

alter table PROJ.EMPLOYEE
    add constraint FK_EMPLOYEE_CHIEF_EMPLOYEE
    foreign key (EMP_EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.EMPLOYEE
    add constraint FK_EMPLOYEE_BELONGS_T_DIVISION
    foreign key (DIVNUM) references ADMIN.DIVISION (DIVNUM)
/

alter table PROJ.PROJECT
    add constraint FK_PROJECT_SUBCONTRA_CUSTOMER
    foreign key (CUSNUM) references PROJ.CUSTOMER (CUSNUM)
/

alter table PROJ.PROJECT
    add constraint FK_PROJECT_IS_RESPON_EMPLOYEE
    foreign key (EMPNUM) references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.TASK
    add constraint FK_TASK_BELONGS_T_PROJECT
    foreign key (PRONUM) references PROJ.PROJECT (PRONUM)
/

Physical Database Design Activities

• Define Tables & Columns

• Define Keys

• Identify Critical Transactions

• Add Columns: redundant columns; derived data columns

• Manipulate Tables: collapse tables; supertypes & subtypes

• Add Tables: derived data tables

• Handle Integrity Issues: row uniqueness & domain restrictions; referential integrity & generate sequence numbers; derived and redundant data

• Controlling Access

• Manage Objects: sizes; placement

Source: Gillette, Rob, et al., Physical Database Design for Sybase SQL Server, Prentice Hall, 1995.

Architecture of Data Warehouse

[Diagram: data moves from the Corporate Operational Database through Data Replication & Cleansing and Data Bridging/Transformation (data extraction, data filtering, table joining, translation, re-formatting) into the Informational Database (Data Warehouse). The warehouse holds Detailed, Summarized, Derived, and Projected data, both Past and Current, together with a Metadata Info. Directory. End User Access and OLAP front-end Tools (EIS, DSS, Report Writers, Spreadsheets) work against the Data Warehouse.]

Operational vs. Informational Databases

Characteristics | Operational Database | Informational Database
Data content | Current value | Archival data, summarized data, calculated data
Data organization | Application by application | Subject areas across enterprise
Data volatility | Dynamic | Static until refreshed
Data normalization | Fully normalized for transaction processing | Joined views suitable for business analysis
Access frequency | High | Low - Medium
Data update | Updated on a record and field basis | Access only; no direct update
Usage | Highly structured transaction processing | Highly unstructured, heuristic or analytical processing
Response time | Sub-second to 2-3 seconds | Several seconds to minutes

Relational View

Multidimensional View

Excel Pivot Table Wizard

Dimensional Model

Dimension tables:

• Product: Key, Name, Description, Size, Price

• Promotion: Key, Description, Discount, Media

• Market Region: Key, Description, District, Region, Demographics

• Time: Key, Weekday, Holiday, Fiscal

Fact table:

• Sale: Product Key, Market Key, Promotion Key, Time Key; measures: Dollars, Units, Price, Cost

[Diagram: cube with the dimensions Time, Region, and Product.]

Modeling a Data Warehouse

• MDM: Multidimensional Modeling

– A logical model of business information

– Easy to understand

– Applicable to relational and multidimensional databases

– Extremely useful for analysis

– A tried-and-tested technique

• Why?

– An OLTP (On-Line Transaction Processing) design of an order processing system may have dozens or hundreds of tables. It becomes difficult for business managers to understand the design in order to analyze the data.

Approach

• Designed around numeric data:
– values
– counts
– weights
– occurrences
• An example of an MDM problem statement:
– "What is my profitability by customer over time, by organization?"
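A problem statement of this shape translates fairly directly into a grouped aggregate over a fact table. The following is a minimal sketch using Python's built-in `sqlite3` module. The table `sale_fact`, its columns, the sample rows, and the definition of profitability as revenue minus cost are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table: one row per sale, with its dimensions and two measures.
conn.execute("""
    CREATE TABLE sale_fact (
        customer TEXT, organization TEXT, period TEXT,
        revenue REAL, cost REAL
    )
""")
conn.executemany(
    "INSERT INTO sale_fact VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "Retail", "2001", 500.0, 300.0),
        ("Acme", "Retail", "2002", 700.0, 400.0),
        ("Acme", "Wholesale", "2002", 200.0, 250.0),
        ("Zenith", "Retail", "2002", 100.0, 60.0),
        ("Acme", "Retail", "2002", 100.0, 50.0),
    ],
)

# "Profitability by customer over time, by organization":
# each "by" becomes a GROUP BY column; the numeric measure is aggregated.
profit_rows = conn.execute("""
    SELECT customer, period, organization, SUM(revenue - cost) AS profit
    FROM sale_fact
    GROUP BY customer, period, organization
    ORDER BY customer, period, organization
""").fetchall()
```

Each "by" in the question names a dimension, and the numeric quantity being asked about is the measure.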


The Classic Star Schema

Market Dimension: Market ID, description, region, state, district, city

Product Dimension: Product ID, description, supplier ID, brand, color, size

Period Dimension: Period ID, description, year, quarter, month, current flag, resolution, sequence

Fact Table: Market ID, Product ID, Period ID, dollars, units, price

Each dimension is described by its own table, and the facts are arranged in a single large table whose concatenated primary key comprises the individual keys of each dimension.
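The star layout described above can be sketched in SQL as three dimension tables plus one fact table whose primary key concatenates the three dimension keys. The example below uses SQLite through Python's `sqlite3` module. The snake_case table and column names are adapted from the slide, only a few of the listed columns are included, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE market  (market_id  INTEGER PRIMARY KEY, description TEXT, region TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT, brand TEXT);
    CREATE TABLE period  (period_id  INTEGER PRIMARY KEY, description TEXT,
                          year INTEGER, quarter INTEGER);

    -- Fact table: the primary key is the concatenation of the dimension keys.
    CREATE TABLE sales_fact (
        market_id  INTEGER NOT NULL REFERENCES market(market_id),
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        period_id  INTEGER NOT NULL REFERENCES period(period_id),
        dollars REAL, units INTEGER, price REAL,
        PRIMARY KEY (market_id, product_id, period_id)
    );

    -- Invented sample data.
    INSERT INTO market  VALUES (1, 'Boston store', 'East'), (2, 'Seattle store', 'West');
    INSERT INTO product VALUES (1, 'Cola', 'Acme'), (2, 'Tea', 'Acme');
    INSERT INTO period  VALUES (1, 'Q1 2002', 2002, 1), (2, 'Q2 2002', 2002, 2);
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 100.0, 10, 10.0),
        (1, 2, 1,  40.0,  8,  5.0),
        (2, 1, 1,  60.0,  6, 10.0),
        (1, 1, 2, 120.0, 12, 10.0);
""")

# A typical star join: facts joined to the dimensions that supply the grouping attributes.
dollars_by_region_year = conn.execute("""
    SELECT m.region, p.year, SUM(f.dollars)
    FROM sales_fact f
    JOIN market m ON m.market_id = f.market_id
    JOIN period p ON p.period_id = f.period_id
    GROUP BY m.region, p.year
    ORDER BY m.region, p.year
""").fetchall()
```

Because the key is concatenated, a second fact row for the same market, product, and period is rejected, so each combination of dimension values identifies exactly one cell.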


Snow Flake Structure

Product identifier = Product identifier

Brand identifier = Brand identifier

Year identifier = Year identifier

Quarter identifier = Quarter identifier

Month identifier = Month identifier

Week identifier = Week identifier

Country identifier = Country identifier

Region identifier = Region identifier

Time identifier = Time identifier

Customer identifier = Customer identifier

Store identifier = Store identifier

Customer
• Customer identifier <pk> int
• Customer name char(30)
• Customer address char(80)
• Customer activity char(80)
• Customer phone number char(12)
• Customer fax number char(12)

Sale
• Time identifier <fk> int
• Customer identifier <fk> int
• Store identifier <fk> int
• Product identifier <fk> int
• Sale total real
• Sale revenue real

Store
• Store identifier <pk> int
• Region identifier <fk> int
• Store name char(50)
• Store address char(80)
• Store manager char(30)
• Store phone number char(20)
• Store FAX number char(20)
• Store financial services type char(10)
• Store photo services type char(10)

Day
• Week identifier <fk> int
• Time identifier <pk> int
• Date datetime
• Day of week char(30)
• Day number in month int

Product
• Product identifier <pk> int
• Brand identifier <fk> int
• Product description char(80)
• Product category char(30)
• Product unit price int

Region
• Region identifier <pk> int
• Country identifier <fk> int
• Region name char(30)

Country
• Country identifier <pk> int
• Country name char(80)

Year
• Year identifier <pk> int
• Year name char(30)

Month
• Month identifier <pk> int
• Quarter identifier <fk> int
• Month name char(10)

Quarter
• Quarter identifier <pk> int
• Year identifier <fk> int
• Quarter name char(10)

Week
• Week identifier <pk> int
• Month identifier <fk> int
• Week name char(30)
• Week number in year int

Brand
• Brand identifier <pk> int
• Brand name char(30)
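In a snowflake structure each level of a dimension hierarchy lives in its own table, so rolling a measure up to a higher level means walking the chain of foreign keys. The sketch below follows the time hierarchy in the slide (Day, Week, Month, Quarter, Year) to total sales per year. The `dim_` table names, the shortened column names, and the sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per hierarchy level; each level points at its parent.
    CREATE TABLE dim_year    (year_id    INTEGER PRIMARY KEY, year_name TEXT);
    CREATE TABLE dim_quarter (quarter_id INTEGER PRIMARY KEY,
                              year_id    INTEGER REFERENCES dim_year(year_id),
                              quarter_name TEXT);
    CREATE TABLE dim_month   (month_id   INTEGER PRIMARY KEY,
                              quarter_id INTEGER REFERENCES dim_quarter(quarter_id),
                              month_name TEXT);
    CREATE TABLE dim_week    (week_id    INTEGER PRIMARY KEY,
                              month_id   INTEGER REFERENCES dim_month(month_id),
                              week_name  TEXT);
    CREATE TABLE dim_day     (time_id    INTEGER PRIMARY KEY,
                              week_id    INTEGER REFERENCES dim_week(week_id),
                              calendar_date TEXT);
    CREATE TABLE sale        (time_id    INTEGER REFERENCES dim_day(time_id),
                              sale_total REAL);

    -- Invented sample data.
    INSERT INTO dim_year    VALUES (1, '2001'), (2, '2002');
    INSERT INTO dim_quarter VALUES (1, 1, 'Q4'), (2, 2, 'Q1');
    INSERT INTO dim_month   VALUES (1, 1, 'December'), (2, 2, 'January');
    INSERT INTO dim_week    VALUES (1, 1, 'Week 52'), (2, 2, 'Week 1');
    INSERT INTO dim_day     VALUES (1, 1, '2001-12-28'), (2, 2, '2002-01-02'),
                                   (3, 2, '2002-01-03');
    INSERT INTO sale        VALUES (1, 10.0), (2, 20.0), (3, 30.0), (3, 5.0);
""")

# Roll sales up from day to year by joining through every intermediate level.
sales_by_year = conn.execute("""
    SELECT y.year_name, SUM(s.sale_total)
    FROM sale s
    JOIN dim_day     d ON d.time_id    = s.time_id
    JOIN dim_week    w ON w.week_id    = d.week_id
    JOIN dim_month   m ON m.month_id   = w.month_id
    JOIN dim_quarter q ON q.quarter_id = m.quarter_id
    JOIN dim_year    y ON y.year_id    = q.year_id
    GROUP BY y.year_name
    ORDER BY y.year_name
""").fetchall()
```

The five joins are the price of the normalization. In a star schema the same question would need a single join to one denormalized time dimension.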


Steps to Build MDM

• Pick a business subject area
– Weekly sales reports, monthly financial statements, insurance claim costs.
• Ask six fundamental questions:
– What business process is being modeled?
– At what level of detail (granularity) is "active" analysis conducted?
– What do the measures have in common (the "dimensions")?
– What are the dimensions' attributes?
– Are the attributes stable or variable over time, and is their "cardinality" bounded or unbounded?


Issues

• Active analysis
– Mechanical manipulation: pivoting, drilling down, graphing
– Agent-based manipulation: alert reporting, exception reporting
– Workflow manipulation: publishing, distributing documents
• Cardinality means "how many"
– A relational database usually has "unbounded" cardinality
– A multidimensional database usually has "bounded" cardinality. Complete reorganization is needed to change cardinality.
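Two of the active-analysis operations listed above, drilling down and exception reporting, can be sketched over a small set of pre-aggregated cells. The data, the threshold rule, and the function names `roll_up`, `drill_down`, and `exceptions` are all hypothetical. The sketch assumes one cell per (year, quarter) pair.

```python
from collections import defaultdict

# Hypothetical sales cells: (year, quarter, dollars), one cell per (year, quarter).
cells = [
    (2001, "Q1", 100.0), (2001, "Q2", 110.0), (2001, "Q3", 90.0), (2001, "Q4", 120.0),
    (2002, "Q1", 105.0), (2002, "Q2", 40.0), (2002, "Q3", 95.0), (2002, "Q4", 130.0),
]


def roll_up(rows):
    """Summary view: total dollars per year."""
    totals = defaultdict(float)
    for year, _quarter, dollars in rows:
        totals[year] += dollars
    return dict(totals)


def drill_down(rows, year):
    """Mechanical manipulation: expand one year into its quarters."""
    return {quarter: dollars for y, quarter, dollars in rows if y == year}


def exceptions(rows, threshold):
    """Agent-based manipulation: report cells whose dollars fall below a threshold."""
    return [(year, quarter) for year, quarter, dollars in rows if dollars < threshold]


yearly = roll_up(cells)
quarters_2002 = drill_down(cells, 2002)
flagged = exceptions(cells, 50.0)
```

An analyst who sees the lower 2002 total in `yearly` would drill down into `quarters_2002` to find the weak quarter, whereas an exception report surfaces the same cell without anyone having to look for it.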


A Data Model for an Electronic Commerce Application

dept_id = parent_id

sku = sku

pfid = pfid

shopper_id = shopper_id

pfid = pfid

shopper_id = shopper_id

pfid = pfid

order_id = order_id

pfid = pfid

pfid = pfid

dept_id = dept_id

basket
• shopper_id char(32)
• date_changed datetime
• marshalled_order image

dept
• dept_id int
• parent_id int
• name varchar(255)
• description text
• date_changed datetime

product_attribute
• pfid varchar(30)
• attribute_id tinyint
• attribute_index tinyint
• attribute_value varchar(20)

product_family
• pfid varchar(30)
• dept_id int
• manufacturer_id int
• name varchar(255)
• short_description varchar(255)
• long_description text
• image_filename varchar(255)
• intro_date datetime
• date_changed datetime
• list_price int
• monogramable tinyint

product_variant
• sku int
• pfid varchar(30)
• attribute0 tinyint
• attribute1 tinyint
• attribute2 tinyint
• attribute3 tinyint
• attribute4 tinyint

promo_cross
• pfid varchar(30)
• related_pfid varchar(30)
• description varchar(255)

promo_price
• promo_name varchar(255)
• promo_type int
• promo_description text
• promo_rank int
• active int
• date_start datetime
• date_end datetime
• shopper_all int
• shopper_column varchar(64)
• shopper_op varchar(2)
• shopper_value varchar(64)
• cond_all int
• cond_column varchar(64)
• cond_op varchar(2)
• cond_value varchar(64)
• cond_basis char(1)
• cond_min int
• award_all int
• award_column varchar(64)
• award_op varchar(2)
• award_value varchar(64)
• award_max int
• disjoint_cond_award int
• disc_type char(1)
• disc_value real

promo_upsell
• pfid varchar(30)
• related_pfid varchar(30)
• description varchar(255)

receipt
• order_id char(26)
• shopper_id char(32)
• total int
• status tinyint
• date_entered datetime
• date_changed datetime
• marshalled_receipt image

receipt_item
• pfid varchar(30)
• sku int
• order_id char(26)
• row_id int
• quantity int
• adjusted_price int

shopper
• shopper_id char(32)
• created datetime
• name varchar(235)
• password varchar(20)
• street varchar(50)
• city varchar(50)
• state varchar(30)
• zip varchar(15)
• country varchar(20)
• phone varchar(16)
• email varchar(50)
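The join condition dept_id = parent_id suggests that dept refers to itself to form a department hierarchy. Under that reading, a recursive query can list a department together with everything beneath it. The sketch below uses SQLite through Python's `sqlite3` module. It assumes that a top-level department has a NULL parent_id, which the slide does not state, and the department names and the `subtree` helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Reduced version of the dept table; parent_id points back at dept_id.
    CREATE TABLE dept (
        dept_id   INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES dept(dept_id),
        name      TEXT
    );
    -- Invented sample hierarchy; NULL parent marks the root (an assumption).
    INSERT INTO dept VALUES
        (1, NULL, 'Store'),
        (2, 1, 'Clothing'),
        (3, 2, 'Shirts'),
        (4, 1, 'Books');
""")


def subtree(conn, root_id):
    """Return (dept_id, name, depth) for root_id and all of its descendants."""
    return conn.execute("""
        WITH RECURSIVE tree(dept_id, name, depth) AS (
            SELECT dept_id, name, 0 FROM dept WHERE dept_id = ?
            UNION ALL
            SELECT d.dept_id, d.name, tree.depth + 1
            FROM dept d JOIN tree ON d.parent_id = tree.dept_id
        )
        SELECT dept_id, name, depth FROM tree ORDER BY depth, dept_id
    """, (root_id,)).fetchall()


whole_store = subtree(conn, 1)
```

A storefront could use the depth column to indent a department menu, and `subtree(conn, 2)` would restrict the listing to Clothing and its children.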


Attribute 0 of pfid 14 is size; attribute value 1 is Grande, 2 is Tall, and 3 is Short.
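One plausible reading of this note is that product_variant.attribute0 stores an attribute_index, and that product_attribute maps each (pfid, attribute_id, attribute_index) combination to a display value. The sketch below decodes a variant's size under that assumption, using SQLite through Python's `sqlite3` module. The sku values and the `attribute0_label` helper are hypothetical, and only the columns needed for the lookup are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_attribute (
        pfid TEXT, attribute_id INTEGER, attribute_index INTEGER, attribute_value TEXT
    );
    CREATE TABLE product_variant (sku INTEGER PRIMARY KEY, pfid TEXT, attribute0 INTEGER);

    -- Values from the slide: attribute 0 (size) of pfid 14.
    INSERT INTO product_attribute VALUES
        ('14', 0, 1, 'Grande'), ('14', 0, 2, 'Tall'), ('14', 0, 3, 'Short');
    -- Hypothetical variants of product family 14.
    INSERT INTO product_variant VALUES (1001, '14', 1), (1002, '14', 3);
""")


def attribute0_label(conn, sku):
    """Look up the display value of attribute 0 for one variant, or None if unknown."""
    row = conn.execute("""
        SELECT a.attribute_value
        FROM product_variant v
        JOIN product_attribute a
          ON a.pfid = v.pfid
         AND a.attribute_id = 0
         AND a.attribute_index = v.attribute0
        WHERE v.sku = ?
    """, (sku,)).fetchone()
    return row[0] if row else None
```

Storing small integer codes in product_variant keeps the variant rows compact, at the cost of a join whenever the readable value is needed.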


Web-based Build-To-Order Application


Data Model for Build-To-Order Application


http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html

Auction Web Site's Data Model