2
Data design
When we build a new database …
– Relational database design in DBMS
When we transform an existing database …
– Data manipulation (merge, clean, format, etc.)
– Information retrieval
3
What Is a DBMS?
A database is a very large, integrated collection of data.
It models a real-world enterprise.
– Entities (e.g., students, courses)
– Relationships (e.g., Madonna is taking CSC 132)
A Database Management System (DBMS) is a software package designed to maintain and utilize databases.
4
Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
5
Why study Database implementation?
Good job market.
– Web Developer
– SQL Programmer (development DBA)
– Database Administrator (production DBA)
– Data Analyst
Graduate school
– DBMS encompasses most of CS
6
Data Models
A data model is a collection of concepts for describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.
– Main concept: relation, basically a table with rows and columns.
– Every relation has a schema, which describes the columns, or fields.
7
Levels of Abstraction
Many views, a single conceptual (logical) schema, and a single physical schema.
– Views describe how users see the data.
– Conceptual schema defines the logical structure.
– Physical schema describes the files and indexes used.
SQL statement categories: DDL (CREATE, ALTER, DROP); DML (SELECT, INSERT, UPDATE); DCL (GRANT, REVOKE); TCL (COMMIT, SAVEPOINT, ROLLBACK).
[Figure: three layers, top to bottom: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
8
Example: University Database
Conceptual schema:
– Students (sid: string, name: string, login: string, age: integer, gpa: real)
– Courses (cid: string, cname: string, credits: integer)
– Enrolled (sid: string, cid: string, grade: string)
Physical schema:
– Relations stored as unordered files.
– Index on first column of Students.
External schema (view):
– Course_info (cid: string, enrollment: integer)
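The three schema levels above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the PRIMARY KEY declarations, the way Course_info is computed (a count of Enrolled rows per course), and the sample rows are assumptions, not part of the slide.

```python
import sqlite3

# In-memory database; the sample rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: the three relations from the slide.
    -- (PRIMARY KEY choices are an assumption added for this sketch.)
    CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                           age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- External schema: a view exposing only per-course enrollment counts.
    CREATE VIEW Course_info AS
        SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "c1", "A"), ("s2", "c1", "B"), ("s1", "c2", "A")])

# Users of the view never see grades or student ids.
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
print(course_info)  # [('c1', 2), ('c2', 1)]
```

The physical level (file layout, indexes) is chosen by the DBMS and does not appear in the queries at all.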
9
Data Independence
Applications insulated from how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
One of the most important benefits of using a DBMS!
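A minimal sketch of physical data independence, using Python's sqlite3 module: the same query returns the same answer before and after the physical schema changes. The table contents and the index name idx_students_gpa are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("s1", "Ann", 3.9), ("s2", "Bob", 3.1), ("s3", "Cy", 3.6)])

QUERY = "SELECT name FROM Students WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(QUERY).fetchall()
# Change the physical schema only: add an index on gpa.
conn.execute("CREATE INDEX idx_students_gpa ON Students (gpa)")
after = conn.execute(QUERY).fetchall()

# The application query is untouched and its result is unchanged.
print(before == after)  # True
```

Logical data independence is obtained in a similar way through views: applications that query a view keep working when the underlying tables are restructured, as long as the view is redefined accordingly.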
12
Overview of db design
Requirements analysis
– Data to be stored
– Applications to be built
– Operations (most frequent) subject to performance requirements
Conceptual db design
– Description of the data (including constraints)
– By a high-level model such as ER
Logical db design
– Choose a DBMS to implement the design
– Convert the conceptual db design into a database schema
Beyond ER design:
Schema refinement (normalization)
Physical db design
– Analyze the workload
– Indexing
Security design
13
Conceptual design
Issues to consider (the ER model is used at this stage):
– What are the entities and relationships in the enterprise?
– What information about these entities and relationships should we store in the database (i.e., attributes)?
– What are the integrity constraints or business rules that hold?
Solution:
– A database “schema” in the ER model can be represented pictorially (ER diagrams).
– An ER diagram can be mapped into a relational schema.
14
University database
Entities: Students, professors, courses, textbook, classroom, transcript, emails
Attributes: terms, ssn, birthdate, cell phone, account balance, parents, age, gender, gpa, major, classification, grade, name.
15
ER Model Basics
Entity: Real-world object distinguishable from other objects. An entity is described (in a DB) using a set of attributes.
Entity Set: A collection of similar entities, e.g., all employees.
– All entities in an entity set have the same set of attributes.
– Each entity set has a key.
– Each attribute has a domain.
[Figure: entity set Employees with attributes ssn, name, lot]
What should be entities? Attributes?
What is the key? How many keys can one object have?
16
A Universal Data Model for All?
[Figure: ER diagram with entity sets Companies, Employees, and Departments; attribute labels: Name, Business, Name, Location, ssn, Budget]
17
Key
A key is a minimal set of attributes whose values uniquely identify an entity in the set.
Candidate key. Primary key.
18
Entity, Entity Set, Attribute, and Schema
ID or SSN     Name         UserID   Age  GPA
999-38-4431   John Smith   jsmith   21   3.68
999-28-3341   Miki Jordan  mjordan  28   3.45
331-43-4567   David Kim    dkim     25   4.00
535-34-5678   Paul Lee     plee     26   3.89
19
ER Model Basics (Contd.)
Relationship: Association among two or more entities. E.g., Sam works in the Accounting Department.
Relationship Set: Collection of similar relationships. E.g., many individuals work in many different departments.
[Figure: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
20
Entity vs. Entity Set
Example: Student
John Smith (999-21-3415, jsmith@, John Smith, 18, 3.5)
Students in CSC439
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8
21
Example of Keys
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8
Primary key Candidate key
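The definition of a key (a minimal set of attributes whose values uniquely identify an entity) can be explored with a small Python sketch. The column names in ATTRS are assumed labels for the unlabeled rows on the slide, and the fourth row in EXTENDED is hypothetical. Note that a sample of rows can only rule a key out; whether an attribute set really is a key is a statement about every legal instance and must come from the requirements.

```python
from itertools import combinations

# Assumed column labels for the three student rows shown on the slide.
ATTRS = ("ssn", "login", "name", "age", "gpa")
ROWS = [
    ("999-21-3415", "jsmith@", "John Smith", 18, 3.5),
    ("999-31-2356", "jzhang@", "Jie Zhang", 20, 3.0),
    ("999-32-1234", "ajain@", "Anil Jain", 21, 3.8),
]

def is_unique(rows, cols):
    """True if no two rows agree on all the given column positions."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)

def minimal_unique_sets(rows):
    """Attribute sets that are unique in rows and have no unique proper subset."""
    found = []
    for size in range(1, len(ATTRS) + 1):
        for cols in combinations(range(len(ATTRS)), size):
            if is_unique(rows, cols) and not any(set(f) < set(cols) for f in found):
                found.append(cols)
    return [tuple(ATTRS[c] for c in cols) for cols in found]

# On three rows, every single column happens to look like a key.
keys_small = minimal_unique_sets(ROWS)

# Hypothetical second John Smith with the same age and GPA.
EXTENDED = ROWS + [("999-40-0000", "jsmith2@", "John Smith", 18, 3.5)]
keys_extended = minimal_unique_sets(EXTENDED)
print(keys_extended)  # [('ssn',), ('login',)]
```

With the extra row, only ssn and login survive as candidate keys; one of them would then be designated the primary key.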
22
Relationship vs. Relationship Set
John Smith
ITCS3160
(999-21-3415, jsmith@, John Smith, 18, 3.5)
(3160, ITCS, DBMS, J. Fan, 3, Kenn. 236)
Relationship
23
Relationship vs. Relationship Set
999-21-3415, jsmith@, John Smith, 18, 3.5
999-31-2356, jzhang@, Jie Zhang, 20, 3.0
999-32-1234, ajain@, Anil Jain, 21, 3.8
3160, ITCS, DBMS, J. Fan, 3, Kenn. 236
6157, ITCS, Visual DB, J. Fan, 3, Kenn. 236
Relationship set (“Enrolled in”)
Students
Courses
24
Relationship vs. Relationship Set
[Figure: Students (Id, Name, Login, Age, GPA) and Courses (Id, Name, Credit) connected by Enrolled_In, which carries the attributes Grade and Room; callout: “Descriptive attribute”]
25
Example 1
Build an ER diagram for a university database:
– Students
   Have an Id, Name, Login, Age, GPA
– Courses
   Have an Id, Name, Credit Hours
– Students enroll in courses
   Receive a grade
26
Example 2
Build an ER diagram for a hospital database:
– Patients
   Name, Address, Phone #, Age
– Drugs
   Name, Manufacturer, Expiration Date
– Patients are prescribed drugs
   Dosage, # Days
29
Potential Relationship Types
Mary studies in the CS Dept.
Tom studies in the CS Dept.
Jack studies in the CS Dept.
…
The CS Dept has lots of students.
No student in the CS Dept works in any other Dept at the same time.
[Figure: Students and CS Dept connected by the relationship IN; the cardinality on each side is marked “?”]
30
Potential Relationship Types
Mary is taking ITCS3160 and ITCS2212.
Tom is taking ITCS3160 and ITCS2214.
Jack is taking ITCS1102 and ITCS2214.
…
61 students are taking ITCS3160.
120 students are taking ITCS2214.
[Figure: Students and Courses connected by the relationship take; the cardinality on each side is marked “?”]
31
Key Constraints
Consider Works_In: An employee can work in many departments; a dept can have many employees.
[Figure: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
32
Key Constraints
Consider Works_In: An employee can work in at most one department; a dept can have many employees.
Why is "Works_In" not a "key constraint"?
[Figure: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
33
Key Constraints
In contrast, each dept has at most one manager, according to the key constraint on Manages.
[Figure: Employees (ssn, name, lot), Manages (since), Departments (did, dname, budget); callout on Manages: “Key constraint (time constraint): at most one!”]
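One common way to carry the key constraint on Manages into a relational schema is to make did the key of the Manages table, so a department can appear in it at most once. The sketch below uses Python's sqlite3 module; the column types and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees   (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Departments (did INTEGER PRIMARY KEY, dname TEXT, budget REAL);
    -- did alone is the key of Manages: each department has at most one manager.
    CREATE TABLE Manages (
        did   INTEGER PRIMARY KEY REFERENCES Departments(did),
        ssn   TEXT NOT NULL REFERENCES Employees(ssn),
        since TEXT
    );
    INSERT INTO Employees VALUES ('111', 'Ann', 1), ('222', 'Bob', 2);
    INSERT INTO Departments VALUES (10, 'Accounting', 1000.0);
    INSERT INTO Manages VALUES (10, '111', '2020-01-01');
""")

try:
    # A second manager for department 10 violates the key constraint.
    conn.execute("INSERT INTO Manages VALUES (10, '222', '2021-01-01')")
    second_manager_rejected = False
except sqlite3.IntegrityError:
    second_manager_rejected = True

# The constraint is one-directional: Ann may still manage another department.
conn.execute("INSERT INTO Departments VALUES (20, 'Sales', 500.0)")
conn.execute("INSERT INTO Manages VALUES (20, '111', '2022-01-01')")
depts_managed_by_ann = conn.execute(
    "SELECT COUNT(*) FROM Manages WHERE ssn = '111'").fetchone()[0]
print(second_manager_rejected, depts_managed_by_ann)  # True 2
```

For Works_In without a key constraint, the key of the relationship table would instead be the pair (ssn, did).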
34
Participation Constraints
Does every department have a manager?
– If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial).
Every did value in the Departments table must appear in a row of the Manages table (with a non-null ssn value!).
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since). Edge labels: “Total w/ key constraint”, “Partial”, “Total”, “Total”.]
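When Departments participates totally in Manages and also has a key constraint, one possible relational mapping folds Manages into the Departments table and declares the manager's ssn NOT NULL. The following sqlite3 sketch assumes that mapping; the table name Dept_Mgr and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- Departments and Manages combined: one row per department,
    -- and every row must name an existing employee as manager.
    CREATE TABLE Dept_Mgr (
        did    INTEGER PRIMARY KEY,
        dname  TEXT,
        budget REAL,
        ssn    TEXT NOT NULL REFERENCES Employees(ssn),
        since  TEXT
    );
    INSERT INTO Employees VALUES ('111', 'Ann', 1);
""")

def try_insert(sql):
    """Run an INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

ok_with_manager = try_insert(
    "INSERT INTO Dept_Mgr VALUES (10, 'Accounting', 1000.0, '111', '2020-01-01')")
ok_without_manager = try_insert(
    "INSERT INTO Dept_Mgr VALUES (20, 'Sales', 500.0, NULL, NULL)")
ok_unknown_manager = try_insert(
    "INSERT INTO Dept_Mgr VALUES (30, 'IT', 700.0, '999', NULL)")
print(ok_with_manager, ok_without_manager, ok_unknown_manager)  # True False False
```

Total participation without a key constraint (for example, every employee must work in some department) generally cannot be captured by NOT NULL alone and needs more general constraint mechanisms.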
35
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since). Edge labels: “Total & key constraint”, “Total”, “Total”, “Total”.]
What are the policies behind this ER model?
36
[Figure 1: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In (since). Edge labels: “Total w/ key constraint”, “Partial”, “Total”, “Total”.]
[Figure 2: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since) and Works_In.]
Any difference?
37
Weak Entities vs. Owner Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity.
– Owner entity set and weak entity set must participate in a one-to-many relationship set (1 owner, many weak entities).
– Weak entity set must have total participation in this identifying relationship set.
[Figure: Employees (ssn, name, lot), Policy (cost), Dependents (pname, age). Callouts: “Weak entity”, “Identifying relationship”, “Primary key for weak entity”.]
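A weak entity set and its identifying relationship are often mapped to a single table whose key combines the weak entity's own identifying attribute with the owner's primary key. The sqlite3 sketch below assumes that mapping; the table name Dep_Policy, the ON DELETE CASCADE choice, and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE to take effect
conn.executescript("""
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    -- Dependents plus the identifying relationship Policy in one table.
    -- pname alone is not unique; (pname, ssn) is.
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        ssn   TEXT NOT NULL,
        PRIMARY KEY (pname, ssn),
        FOREIGN KEY (ssn) REFERENCES Employees(ssn) ON DELETE CASCADE
    );
    INSERT INTO Employees VALUES ('111', 'Ann', 1), ('222', 'Bob', 2);
    -- Two different dependents named Joe, told apart by their owner.
    INSERT INTO Dep_Policy VALUES ('Joe', 7, 50.0, '111'), ('Joe', 9, 60.0, '222');
""")

# Deleting the owner removes its weak entities as well.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(remaining)  # [('Joe', '222')]
```

The NOT NULL foreign key reflects total participation, and the cascade reflects that a dependent has no identity without its owner.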
38
Ternary Relationship
[Figure 1: Employees (ssn, name, lot), Works_In3, Departments (did, dname, budget), and Duration (from, to)]
[Figure 2: Employees (ssn, name, lot), Works_In (since), Departments (did, dname, budget)]
Why?
39
ISA (“is a”) Hierarchies
[Figure: Employees (ssn, name, lot) with ISA subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid)]
As in C++ or other programming languages, attributes are inherited.
If we declare A ISA B, every A entity is also considered to be a B entity.
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed)
Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no)
Reasons for using ISA:
– To add descriptive attributes specific to a subclass.
– To identify entities that participate in a relationship.
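The programming-language analogy for ISA can be sketched with Python dataclasses. The class names mirror the entity sets on the slide; the attribute types and the sample values for Joe are assumptions.

```python
from dataclasses import dataclass, fields

@dataclass
class Employee:
    ssn: str
    name: str
    lot: int

@dataclass
class HourlyEmp(Employee):      # Hourly_Emps ISA Employees
    hourly_wages: float
    hours_worked: float

@dataclass
class ContractEmp(Employee):    # Contract_Emps ISA Employees
    contractid: str

# Hypothetical entity.
joe = HourlyEmp("123-22-3666", "Joe", 48, hourly_wages=20.0, hours_worked=40.0)

# The subclass carries the inherited attributes plus its own.
attribute_names = [f.name for f in fields(joe)]
print(attribute_names)
# ['ssn', 'name', 'lot', 'hourly_wages', 'hours_worked']

# Every Hourly_Emps entity is also an Employees entity.
print(isinstance(joe, Employee))  # True
```

The analogy is loose: a single Python object belongs to one concrete class, so an allowed overlap (Joe being both hourly and contract) has no direct counterpart here, whereas the ER model lets the designer state it explicitly.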
40
Aggregation
Used when we have to model a relationship involving (entity sets and) a relationship set.
– Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
– Monitors is mapped to a table like any other relationship set.
[Figure: Projects (pid, started_on, pbudget), Sponsors, Departments (did, dname, budget), Employees (ssn, name, lot), Monitors (until); callout: “Aggregation”]
41
Real Database Design
Build an ER diagram for the following information:
– Walmart Stores
   Store Id, Address, Phone #
– Products
   Product Id, Description, Price
– Manufacturers
   Name, Address, Phone #
– Walmart Stores carry products
   Amount in store
– Manufacturers make products
   Amount in factory/warehouses
42
Conceptual Design Using the ER Model
Design choices:
– Should a concept be modeled as an entity or an attribute?
– Should a concept be modeled as an entity or a relationship?
– Identifying relationships: Binary or Ternary? Aggregation?
Always follow the requirements.
43
Entity vs. Attribute
Should address be an attribute of Employees or an entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address information, and the semantics of the data:
If we have several addresses per employee, address must be an entity (since attributes cannot be set-valued).
If the structure (city, street, etc.) is important, e.g., we want to retrieve employees in a given city, address must be modeled as an entity (since attribute values are atomic).
44
Entity vs. Attribute (Contd.)
Works_In2 does not allow an employee to work in a department for two or more periods.
Similar to the problem of wanting to record several addresses for an employee: we want to record several values of the descriptive attributes for each instance of this relationship.
[Figure 1: Employees (ssn, name, lot), Works_In2 (from, to), Departments (did, dname, budget)]
[Figure 2: Employees (ssn, name, lot), Works_In3, Departments (did, dname, budget), Duration (from, to)]
45
Entity vs. Relationship
First ER diagram OK if a manager gets a separate discretionary budget for each dept.
What if a manager gets a discretionary budget that covers all managed depts?
– Redundancy of dbudget, which is stored for each dept managed by the manager.
– Misleading: suggests dbudget is tied to the managed dept.
[Figure 1: Employees (ssn, name, lot), Manages2 (since, dbudget), Departments (did, dname, budget)]
[Figure 2: Employees (ssn, name, lot), IsA, Manager (dbudget), Manages3 (since), Departments (did, dname, budget)]
46
Binary vs. Ternary Relationships
Requirements:
– A policy is not to be owned by more than one employee.
– Every policy must be owned by some employee.
– Dependents is a weak entity set, each identified by pname + policyid.
[Figure (bad design): Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age) connected by the ternary relationship Covers]
47
Binary vs. Ternary Relationships
If each policy is owned by just 1 employee:
– Key constraint on Policies would mean a policy can only cover 1 dependent!
[Figure (bad design): Employees (ssn, name, lot), Policies (policyid, cost), and Dependents (pname, age) connected by the ternary relationship Covers]
[Figure (better design): Employees (ssn, name, lot) and Policies (policyid, cost) connected by Purchaser; Policies and Dependents (pname, age) connected by Beneficiary]
48
Binary vs. Ternary Relationships (Contd.)
The previous example illustrated a case where binary relationships were better than one ternary relationship.
An example in the other direction: a ternary relationship Teaches relates entity sets Students, Professors, and Courses, and has descriptive attributes term and year. No combination of binary relationships is an adequate substitute:
– P teaches S, S takes C, and P teaches C do not necessarily imply that P indeed teaches S in C!
– How do we record term and year?
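The first point can be checked on a tiny made-up instance: splitting a ternary Teaches into three binary relationships and recombining them yields a combination that was never recorded. All professor, student, and course names below are hypothetical.

```python
# Hypothetical instance of Teaches(professor, student, course).
teaches = {
    ("P1", "S1", "C1"),
    ("P1", "S2", "C2"),
    ("P2", "S1", "C2"),
}

# Project the ternary relationship onto three binary relationships.
prof_student = {(p, s) for p, s, c in teaches}    # P teaches S
student_course = {(s, c) for p, s, c in teaches}  # S takes C
prof_course = {(p, c) for p, s, c in teaches}     # P teaches C

# Recombine: every (p, s, c) consistent with all three binary relationships.
rejoined = {
    (p, s, c)
    for p, s in prof_student
    for s2, c in student_course if s2 == s
    for p2, c2 in prof_course if p2 == p and c2 == c
}

# Combinations the binary relationships suggest but Teaches never contained.
spurious = rejoined - teaches
print(spurious)  # {('P1', 'S1', 'C2')}
```

P1 teaches S1, S1 takes C2, and P1 teaches C2 are all true, yet P1 never taught S1 in C2, so the binary relationships cannot stand in for the ternary one.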
49
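[Figure 1: ternary relationship Teaches (term, year) connecting Students, Professors, and Courses]
[Figure 2: binary relationships Teaches (term, year) and Takes (term, year) over Professors, Courses, and Students]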
50
Summary of Conceptual Design
Conceptual design follows requirements analysis.
– Yields a high-level description of data to be stored.
ER model popular for conceptual design.
– Constructs are expressive, close to the way people think about their applications.
Basic constructs: entities, relationships, and attributes (of entities and relationships).
Some additional constructs: weak entities, ISA hierarchies, and aggregation.
Note: There are many variations on the ER model.
51
Summary of ER (Contd.)
Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some foreign key constraints are also implicit in the definition of a relationship set.
– Some constraints (notably, functional dependencies) cannot be expressed in the ER model.
– Constraints play an important role in determining the best database design for an enterprise.
52
Summary of ER (Contd.)
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include:
– Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation.
Ensuring good database design: the resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.