Database management
• Teréz A. Várkonyi
• [email protected]
• +361 666 57 29
• http://uni-obuda.hu/users/varkonyi.teri
• Antal Bejczy Center for Intelligent Robotics
• 82 Kiscelli str.
Requirements
• Exercise for the semester
• 3 lab tests
• 2 theoretical tests
• Oral exam
• Moodle: elearning.uni-obuda.hu
• Teacher changes
Brief contents
• Information processing
• Database management systems
• Relational data model
• Basic terms
• Designing a database
• Normalization
• Relation decomposition
Information processing
• Information: from which data can be derived
– Oral
– Written or printed
– Electronic
• Needs systematization
Steps of information processing
• Gathering useful facts: what kind of knowledge do we need?
• Encoding: e.g. linguistic, magnetic, electronic, etc. (useful for those who know the code)
• Recording
• Utilizing: searching, sorting, grouping, finding correspondence
Difficulties
• What if the collected data is not correct?
• Information is power – keep it secret!
– Bodyguard?
– Secret code?
– Firewall?
• Data transmission
Advantages of computers
• 19th century – Hollerith card
– numerical calculations
– 1890: census in the USA
• 2nd generation: compilers, bigger capacity, first real info processing applications
• 3rd generation: operating systems, bigger storage capacity, parallel computing, more opportunities
Database Management Systems
• New concept: gathering all the data and their relationships into one integrated database
• Answer the questions with this DB
• Giant data collection (needs storing, processing)
• Basic elements
– Entity: e.g. students, courses, etc.
– Relationships: e.g. David attends Database management
DBMS – what for?
• E.g.: registries, banks, Facebook
• Amount of data in the business sector in 2012: 360 GB/person
• Data independence and effective searching
• Data integrity, safety
• Unified administration
• Concurrent access, fault-tolerant processing (quick reboot after crash)
• Control of replication
• Support of quick application development
• Standardization
– Methods
– Programs
– Access
– Etc.
Why learn database management?
• Variety of tasks it can solve
• Information processing: increased need
• Quantity and heterogeneity of data are growing every day
– Digital libraries, interactive videos, e-trade, sensor nets, telecommunication
• Draws on many areas of computer science
– OS, programming, theory, AI, multimedia, logic
Motivation and tool
• Number of users and tasks increases
• Need for ad hoc service
• Need for uniform service
• Course costs 2000€
• Database approach
– Every data item and relationship in one database
– Serve everyone from the same database
– Access might be limited (not to the whole DB)
Basic principle
• We start from the data we have and collect every entity and their relationships into one integrated database.
• Every user can use this database, or a part of it, for answering questions.
Basic terms
• Data model: a collection of concepts for describing the data
• Schema: the description of a dataset in terms of a data model
• Relational data model: the most widely used today
– Relation: table
– Every relation has a schema describing the structure of the relation (attributes)
Example – relational data model
• Relational data model: columns (attributes), rows (records), tables (entities), relations between them
• Schema: Table COURSES with attributes NEPTUN, Course name and Teacher
NEPTUN | Course name | Teacher
NAIAB0SEND | Database management | Teréz A. Várkonyi
NAMIK1ERNM | Kinematics and Dynamics of Industrial Robots | Péter Zentay
Data representation
• What are the boundaries of the questions the database can answer?
• Models the real world: mini world with limitations
• 3 levels:
– Conceptual model: a world described by the DB
– Implementation/representation model: a model understandable for the DBMS (structured records, tables, fields, etc.)
– Physical model: DBMS implemented on the computer (files, programs)
Structure of the DBMS
• The user has permission for a smaller part of the DB: View
[Figure: layers of the DBMS: View1, View2, View3; conceptual model; implementation model; physical model]
Example: university DB
• Conceptual model:
– Student (sid: string, name: string, age: integer, cumulative average: real)
– Subject (subid: string, sname: string, credit: integer)
– Registration (sid: string, subid: string, mark: integer, date: date)
Example: university DB
• Implementation model:
    CREATE TABLE subject (
        subid  VARCHAR(10) NOT NULL PRIMARY KEY,
        sname  VARCHAR(50) NOT NULL,
        credit INT NOT NULL
    )
• Physical model: files containing unsorted data
• View: Teachers can see info about their own courses
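A minimal sketch of the implementation model and a view, run through Python's built-in sqlite3 module. The column types follow the conceptual model above; the composite primary key of registration, the view name subject_marks and the sample rows (including the credit value) are assumptions made only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    sid   VARCHAR(10) NOT NULL PRIMARY KEY,
    name  VARCHAR(50) NOT NULL,
    age   INT,
    cumulative_average REAL
);
CREATE TABLE subject (
    subid  VARCHAR(10) NOT NULL PRIMARY KEY,
    sname  VARCHAR(50) NOT NULL,
    credit INT NOT NULL
);
CREATE TABLE registration (
    sid   VARCHAR(10) NOT NULL REFERENCES student(sid),
    subid VARCHAR(10) NOT NULL REFERENCES subject(subid),
    mark  INT,
    date  DATE,
    PRIMARY KEY (sid, subid, date)  -- assumed key, not given on the slide
);
-- A view exposes only a part of the database to a group of users.
CREATE VIEW subject_marks AS
    SELECT s.sname, r.mark
    FROM registration r JOIN subject s ON s.subid = r.subid;
""")

# Sample data (hypothetical values)
conn.execute("INSERT INTO student VALUES ('S1', 'David', 20, 4.2)")
conn.execute("INSERT INTO subject VALUES ('NAIAB0SEND', 'Database management', 4)")
conn.execute("INSERT INTO registration VALUES ('S1', 'NAIAB0SEND', 5, '2015-01-15')")

view_rows = conn.execute("SELECT sname, mark FROM subject_marks").fetchall()
print(view_rows)
```

The view hides the student identifiers, so a user who may only read subject_marks never sees the full registration table.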
Relational data model
• 4 types of data model
– Hierarchical data model (data trees, 1:N relations)
– Network data model
– Relational data model
– Object-oriented data model
Relational data model
• Relation: table + constraints
• Column headers: attribute/domain
• Rows: data records/tuples
• Database: set of tables
Relationship
• 1:1 (one to one) relationship
– Person & ID number
– Husband & wife
• Rare in the real world
Relationship
• 1:N (one to many) relationship:
– Mother & Children
– Owner & Cars
[Figure: Person, Owns, Car]
Relationship
• M:N (many to many) relationship
– Actor & plays
– Teachers & Students
[Figure: Actor, Acts, Play]
Requirements
• There cannot be two identical rows or columns
• The order of the columns cannot carry information
• Superkey: set of attributes that unambiguously defines the other attributes of every row (NEPTUN+semester for students)
• Key: a superkey which cannot be reduced
Keys as frame
• Basic terms: primary key (Person: ID), foreign key (Owns: Owner's ID), simple/composite key (Person: ID / Owns: Owner ID + Car's plate)
• System of keys = frame of database

OWNERS
ID | Name
1 | John Doe
2 | Jane Doe

CARS
Plate | Type
OMW-123 | Porsche
ABC-234 | Porsche
DEF-456 | Ferrari

OWNS
Plate | Owner
OMW-123 | 1
ABC-234 | 1
DEF-456 | 2
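The three tables above can be sketched with Python's sqlite3 module to show how the keys hold the database together. The lowercase table names, the column types and the helper try_insert are assumptions for illustration; SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE owners (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cars   (plate TEXT PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE owns (
    plate TEXT    NOT NULL REFERENCES cars(plate),   -- foreign key
    owner INTEGER NOT NULL REFERENCES owners(id),    -- foreign key
    PRIMARY KEY (plate, owner)                       -- composite key
);
INSERT INTO owners VALUES (1, 'John Doe'), (2, 'Jane Doe');
INSERT INTO cars VALUES ('OMW-123', 'Porsche'), ('ABC-234', 'Porsche'), ('DEF-456', 'Ferrari');
INSERT INTO owns VALUES ('OMW-123', 1), ('ABC-234', 1), ('DEF-456', 2);
""")


def try_insert(plate, owner):
    """Return True if the row was accepted, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO owns VALUES (?, ?)", (plate, owner))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("DEF-456", 3)  # owner 3 does not exist in owners
john_cars = [row[0] for row in
             conn.execute("SELECT plate FROM owns WHERE owner = 1 ORDER BY plate")]
print(accepted, john_cars)
```

The insert that points at a non-existent owner is rejected, which is exactly the guarantee a foreign key is meant to give.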
Relational algebra
Mathematics
NOOOOOOO!
Basic terms
• Elements: $a$, $b$, $c$
• Sets: $A$, $B$, $C$
• Defining a set:
– enumeration: $A = \{a, b, c\}$, thus $a \in A$
– rules: $B = \{x \mid x \ge 100 \land x \le 1000\}$
• Subset: $A \subset B$ if $\forall a \in A: a \in B$
• Ordered set (vector): $q = \langle a, b, c \rangle$
• Cartesian product: $A \times B$, e.g. $A = \{0, 1\}$, $B = \{a, b\}$, then $A \times B = \{\langle 0, a \rangle, \langle 0, b \rangle, \langle 1, a \rangle, \langle 1, b \rangle\}$
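The same set notions can be tried out in a few lines of Python. This is only an illustration: the variable names are assumptions, and Python tuples stand in for the ordered pairs.

```python
from itertools import product

A = {0, 1}
B = {"a", "b"}

# Set defined by a rule: all integers x with 100 <= x <= 1000
rule_set = {x for x in range(2000) if 100 <= x <= 1000}

# Subset test: every element of {0} is also an element of A
is_subset = {0} <= A

# Cartesian product: every ordered pair (element of A, element of B)
cartesian = set(product(A, B))

print(sorted(cartesian), is_subset, len(rule_set))
```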
Attributes, dependencies, keys
• Attributes: sets A, B, C, D, E
• Entities ~ set of attributes: $\{A, B, C, D, E\}$
• Dependency: function $\{A, B\} \to \{C, D, E\}$, i.e. the "others" {C, D, E} depend on the key {A, B}
• Key: A, B
– Simple
– Composite
• Secondary attributes: C, D, E
Example
• f_worker = {name, institute} → {salary, room}
• Key: name, institute
• Secondary: salary, room
• f_worker = {name, institute} → {salary}
• f_worker = {name, institute} → {room}
Operations with dependencies
• Unify: dependencies whose left-hand sides are equivalent can be merged
• Composition: f_worker = {name, institute} → {salary, room} is equivalent to f_worker = {name, institute} → {salary} & f_worker = {name, institute} → {room}
Relational schema
• Cartesian product of given attributes and dependencies
• Gives the structure of the database
• Relation: the tables with data that fulfill the schema
– Columns: attributes
– Rows: records
Connection of relations, foreign key
• Relation r's attributes can be extended to relation s if the attributes of the key of s ($K_s$) are attributes of r
• $K_s$ is called a foreign key in relation r if
– it is the primary key in s
– the values in r exist in s
• Relationship of s and r is 1:N
Example
OWNERS (S)
ID | Name
1 | John Doe
2 | Jane Doe

CARS (R)
Plate | Type | Owner
OMW-123 | Porsche | 1
ABC-234 | Porsche | 2
DEF-456 | Ferrari | 1
Anomalies
• Insertion anomaly: the superkey needs too much data; if some of it is missing, the row cannot be inserted. Solution: reduce the superkey to a key.
• Update anomaly: a value exists in several places in the database and has to be modified in each place. Solution: store the data in one separate table and modify it once.
• Deletion anomaly: by deleting a row, other important information is lost (different objects stored in one table – not good).
Example
• Relation = {product_code, product_name, product_description, price, supplier, supplier_address}
• Update anomaly: when the address changes, it has to be modified everywhere
• Insertion anomaly: new product without price
• Deletion anomaly: lost contact with the supplier. Shall we delete the products also?
Example no. 2
• Teachers(ID, name, address, telephone, course_name, semester/hours, requirements)
• Two entity sets in one relation
• Solution: divide into relations
– Teachers(ID, name, address, telephone)
– Courses(course_name, semester/hours, requirements)
– Teach(teacher_id, course_name)
Database normalization
Steps of designing a database
• Collect the attributes to be stored!
• Write down the dependencies!
• Know your data model very well!
• To avoid anomalies, normalize!
1 NF
• Every value in every row is a single value
• Does not contain embedded tables/records
• Oracle 8 supports embedded data
• Be careful
Example
In 1 NF:
Name | Field of research
T. A. Várkonyi | Mathematics
T. A. Várkonyi | Computer science

Not in 1 NF:
Name | Field of research
T. A. Várkonyi | Mathematics, Computer science
2 NF
• 1 NF and there cannot be data in the relation that depends only on a part of the key (no partial functional dependency)
• Example: Order(date, buyer_ID, product_ID, product_no, product_description, comments)
• Key: date, buyer_ID, product_ID
• product_ID → product_description
• Solution: create a new table for the products (product_ID, product_description)
3 NF
• 2 NF and there is no secondary attribute in the relation that depends on a secondary attribute (no transitive dependency)
• Example: soft_drink(name, bottle_type, manufacturer_name, manufacturer_address)
• Key: name, bottle_type
• manufacturer_name → manufacturer_address
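The 2 NF and 3 NF conditions can be checked mechanically once the key and the dependencies are written down. The sketch below is a simplification that assumes a single key per relation; the function names are hypothetical and the data are the Order and soft_drink examples from the slides.

```python
def partial_dependencies(key, fds):
    """Dependencies that break 2 NF: a proper part of the key
    determines a secondary (non-key) attribute."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if set(lhs) < key and set(rhs) - key]


def transitive_dependencies(key, fds):
    """Dependencies that break 3 NF: secondary attributes
    determine another secondary attribute."""
    key = set(key)
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(lhs) & key and set(rhs) - key]


order_key = {"date", "buyer_ID", "product_ID"}
order_fds = [({"product_ID"}, {"product_description"})]

drink_key = {"name", "bottle_type"}
drink_fds = [({"manufacturer_name"}, {"manufacturer_address"})]

print(partial_dependencies(order_key, order_fds))     # Order is not in 2 NF
print(transitive_dependencies(drink_key, drink_fds))  # soft_drink is not in 3 NF
```

Each reported dependency names the attributes that should move to a table of their own.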
BCNF (Boyce-Codd NF)
• 3 NF and no part of a key depends on another key or on a secondary attribute
• Example: let's assume that every teacher has only one course to teach: {teacher, year} → course
• Neptun(teacher, year, semester, course, headcount)
• Keys:
– teacher, year, semester
– course, year, semester
Teacher | Year | Semester | Course | Headcount
TA Varkonyi | 2014/2015 | 1 | Database m. | 25
Zsolt Szabo | 2014/2015 | 1 | Database lab | 25
TA Varkonyi | 2014/2015 | 2 | Database m. | 25
Zsolt Szabo | 2014/2015 | 2 | Database lab | 25
TA Varkonyi | 2014/2015 | 3 | Database m. | 25
Solution
Year | Semester | Course | Headcount
2014/2015 | 1 | Database m. | 25
2014/2015 | 1 | Database lab | 25
2014/2015 | 2 | Database m. | 25
2014/2015 | 2 | Database lab | 25
2014/2015 | 3 | Database m. | 25

Teacher | Year | Course
TA Varkonyi | 2014/2015 | Database m.
Zsolt Szabo | 2014/2015 | Database lab
Conclusions
• If a relation is in 0 NF (can be put in tables) and does not contain multi-valued fields then it is in at least 1 NF
• If a relation is in 1 NF and does not contain partial functional dependency then it is in at least 2 NF
• If a relation is in 2 NF and does not contain transitive functional dependency then it is in at least 3 NF
3 NF vs. BCNF
• A relation in 3 NF can fail to be in BCNF only if
– there are several possible keys,
– these keys are composite, and
– there is a common attribute in the keys
Decomposition of relations
Motivation
• Decompose the original relation to several relations to avoid anomalies
• Question: does the new database describe the original model?
• Decomposition has to
– be lossless
– preserve dependencies
Finding the key
• Attribute set $X$ is a key candidate of relation $R$ if
– the functional dependency $X \to R$ holds ($X$ determines every attribute of $R$)
– there is no proper subset of $X$ that could determine the other attributes of relation $R$.
Superkey
• Extending the key with secondary attributes
• Not a minimal key
• Attribute sets that contain key candidates
Closure of an attribute set
• To find new relationships
• Closure $X^+$ of attribute set $X$ based on functional dependency set $F$:
– Find a dependency $Y \to Z$ in $F$ such that $Y \subseteq X$ but $Z \not\subseteq X$. Then extend: $X := X \cup Z$
– Repeat this until there is no possibility to extend $X$.
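The closure loop, and the key test built on it, can be sketched as follows. The function names are hypothetical; the dependencies are the product/order dependencies used in the BCNF example of this deck, read as {ID} → {Name, Price, VATtype} and so on.

```python
def closure(attrs, fds):
    """Extend attrs with every right-hand side whose left-hand side is already covered."""
    result = set(attrs)
    changed = True
    while changed:                       # repeat until nothing can be added
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) >= set(all_attrs)


def is_candidate_key(attrs, all_attrs, fds):
    """A superkey that stops being one as soon as any single attribute is removed."""
    attrs = set(attrs)
    return is_superkey(attrs, all_attrs, fds) and not any(
        is_superkey(attrs - {a}, all_attrs, fds) for a in attrs)


ALL = {"ID", "Name", "Price", "VATtype", "VAT %", "OrderID", "Quantity", "Address"}
FDS = [
    ({"ID"}, {"Name", "Price", "VATtype"}),
    ({"OrderID"}, {"Address"}),
    ({"ID", "OrderID"}, {"Quantity"}),
    ({"VATtype"}, {"VAT %"}),
]

print(sorted(closure({"ID"}, FDS)))
print(is_candidate_key({"ID", "OrderID"}, ALL, FDS))
print(is_candidate_key({"ID", "OrderID", "Name"}, ALL, FDS))  # superkey, but not minimal
```

Removing one attribute at a time is enough for the minimality test, because any superset of a superkey is again a superkey.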
Armstrong axioms
• To find new dependencies in a relation.
• $X$, $Y$, and $Z$ are attribute sets
• A functional dependency is reflexive: if $Y \subseteq X$ then $X \to Y$ (a key defines its own attributes)
• A functional dependency is transitive: if $X \to Y$ and $Y \to Z$ then $X \to Z$
• A functional dependency is augmentive: if $X \to Y$ then $\{X, Z\} \to \{Y, Z\}$
Dependency preservation
• After decomposition, the original dependencies can be inferred from the new relations' dependencies.
• Def.: Decomposition of relation $R$ preserves dependency according to dependency set $F$ if we can logically deduce $F$ from the union of the dependencies of the decomposed relations $R_1, \dots, R_n$ (e.g. by Armstrong axioms or closure).
Lossless decomposition
• By joining the decomposed tables, the original table before normalization can be recreated
• Decompositions into 3 NF and BCNF can always be made lossless
BUT!
• BCNF does not always preserve dependencies
Preserve dependencies - check
Wrong example
• […], […]
• Decomposition of […]: […]
• Non-trivial dependency in […]: […] (transitive, see Armstrong axioms)
• Non-trivial dependency in […]: […]
• By uniting dependency sets […] and […]: […].
• {[…]} cannot be deduced!
Good example
• […], […]
• Decomposition of […]: […]
• Non-trivial dependency in […]: […]
• Non-trivial dependency in […]: […]
• By uniting dependency sets […] and […]: […], the original is gained
Example - BCNF
• […] (City, Street, Postal code)
– not in BCNF because key attribute C depends on non-key attribute P
• Decomposition of […] to BCNF: […], […]
• Lossless (see the proof later)
• Non-trivial dependency in […]: […]
• Non-trivial dependency in […]: […]
• First dependency is lost.
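A minimal sketch of a dependency-preservation check. Everything concrete here is an assumption for illustration: the hypothetical relation R(A, B, C) with F = {A → B, B → C}, its two decompositions, and, for the address relation, the dependencies {City, Street} → Postal code and Postal code → City with the decomposition (Street, Postal code), (Postal code, City). The check uses only closures, so the projected dependency sets never have to be listed.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    """True if every X -> Y in fds follows from the dependencies
    that hold inside the individual decomposed relations."""
    for lhs, rhs in fds:
        reached = set(lhs)
        changed = True
        while changed:
            changed = False
            for rel in decomposition:
                # what the attributes reached so far determine *within* rel
                gained = closure(reached & rel, fds) & rel
                if not gained <= reached:
                    reached |= gained
                    changed = True
        if not rhs <= reached:
            return False
    return True


# Hypothetical relation R(A, B, C)
abc_fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
wrong = preserves_dependencies(abc_fds, [{"A", "B"}, {"A", "C"}])  # B -> C is lost
good = preserves_dependencies(abc_fds, [{"A", "B"}, {"B", "C"}])

# Address relation with assumed dependencies
addr_fds = [({"City", "Street"}, {"Postal code"}), ({"Postal code"}, {"City"})]
addr = preserves_dependencies(addr_fds, [{"Street", "Postal code"}, {"Postal code", "City"}])

print(wrong, good, addr)
```

Under these assumptions the BCNF decomposition of the address relation loses {City, Street} → Postal code, because no single decomposed relation contains all three attributes.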
Conclusions
• BCNF does not always preserve dependencies
• 3 NF always preserves dependencies and is always lossless
• Use 3 NF and check if BCNF preserves dependency
Lossless - check
Lossless
• By joining the decomposed tables, the original table before normalization can be recreated
• The decomposition cannot lead to a bad database structure:
– 3 NF and BCNF are always lossless, otherwise there's no reason to normalize
Example – information loss
$R$
Model Name | Price | Category
a11 | 100 | Canon
s20 | 200 | Nikon
a70 | 150 | Canon

$R_1$
Model Name | Category
a11 | Canon
s20 | Nikon
a70 | Canon

$R_2$
Price | Category
100 | Canon
200 | Nikon
150 | Canon
Recomposition
• Rows marked with * are not in the original relation
• How could we separate them?

$R_1 \bowtie R_2$ (joined on Category)
Model Name | Price | Category
a11 | 100 | Canon
a11 | 150 | Canon *
s20 | 200 | Nikon
a70 | 100 | Canon *
a70 | 150 | Canon

$R$
Model Name | Price | Category
a11 | 100 | Canon
s20 | 200 | Nikon
a70 | 150 | Canon
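The recomposition can be reproduced with plain Python sets. This is a sketch: tuples stand in for rows, and the join on Category is written out as a set comprehension.

```python
# Original relation R(Model Name, Price, Category)
R = {("a11", 100, "Canon"), ("s20", 200, "Nikon"), ("a70", 150, "Canon")}

# Projections used in the decomposition
R1 = {(model, category) for model, _, category in R}   # (Model Name, Category)
R2 = {(price, category) for _, price, category in R}   # (Price, Category)

# Natural join of R1 and R2 on the shared attribute Category
joined = {(model, price, c1)
          for model, c1 in R1
          for price, c2 in R2
          if c1 == c2}

spurious = joined - R   # rows that were never in the original relation
print(sorted(spurious))
```

The join returns five rows instead of three, so the information about which price belongs to which Canon model has been lost.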
Check losslessness of a decomposition
• Let the decomposition of relation $R$ be $D = \{R_1, \dots, R_m\}$ and let $F$ be its dependency set. Let's create table $T$:
– Number of rows := number of relations in $D$ ($m$). 1 row/1 relation.
– Number of columns := number of attributes in the original relation.
• $T(i,k) = a(k)$ if the kth attribute exists in the ith relation
• $T(i,k) = b(i,k)$ otherwise.
Solution – cont.
• Iteration: let's apply the elements $X \to Y$ of dependency set $F$:
– In table T, if two rows are identical in the columns of X, then modify the columns of Y: for each column, if one of the (two) values is a(k), then its pair has to be modified to a(k). If both are b(i,k), then modify one of them to be equal to its pair.
• Decision: finally, if there is at least one row which contains only a values, then the decomposition is lossless. Otherwise, it is not.
Example
[Figures: creating table T; first dependency; second dependency; third dependency (unnecessary)]
BCNF - example
• F_product: {ID} → {Name, Price, VATtype}
• F_order: {OrderID} → {Address}
• F_quantities: {ID, OrderID} → {Quantity}
• F_VAT: {VATtype} → {VAT %}

ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Address
Example – cont.

ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
B(1,1) | B(1,2) | B(1,3) | B(1,4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | B(2,6) | B(2,7) | B(2,8)
B(3,1) | B(3,2) | B(3,3) | B(3,4) | B(3,5) | B(3,6) | B(3,7) | B(3,8)
B(4,1) | B(4,2) | B(4,3) | B(4,4) | B(4,5) | B(4,6) | B(4,7) | B(4,8)

ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | B(3,2) | B(3,3) | B(3,4) | B(3,5) | A(6) | A(7) | B(3,8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)
Example – cont.
• F_product: {ID} → {Name, Price, VATtype}

Before:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | B(3,2) | B(3,3) | B(3,4) | B(3,5) | A(6) | A(7) | B(3,8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)

After:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | B(3,5) | A(6) | A(7) | B(3,8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)
Example – cont.
• F_order: {OrderID} → {Address}

Before:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | B(3,5) | A(6) | A(7) | B(3,8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)

After:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | B(3,5) | A(6) | A(7) | A(8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)
Example – cont.
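• F_quantities: {ID, OrderID} → {Quantity}
• No two rows agree on both ID and OrderID, so the table does not change

ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | B(3,5) | A(6) | A(7) | A(8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)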
Example – cont.
• F_VAT: {VATtype} → {VAT %}

Before:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | B(1,5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | B(3,5) | A(6) | A(7) | A(8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)

After:
ID | Name | Price | VATtype | VAT % | OrderID | Quant. | Addr.
A(1) | A(2) | A(3) | A(4) | A(5) | B(1,6) | B(1,7) | B(1,8)
B(2,1) | B(2,2) | B(2,3) | B(2,4) | B(2,5) | A(6) | B(2,7) | A(8)
A(1) | A(2) | A(3) | A(4) | A(5) | A(6) | A(7) | A(8)
B(4,1) | B(4,2) | B(4,3) | A(4) | A(5) | B(4,6) | B(4,7) | B(4,8)

• The third row now contains only A values, so the decomposition is lossless
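The table-based check walked through in this example can be sketched in Python. The function name is_lossless and the tuple encoding of the a and b symbols are assumptions. When two symbols are made equal, this sketch renames every occurrence of the old symbol in that column, a slightly stricter bookkeeping than changing only the two rows.

```python
def is_lossless(attributes, decomposition, fds):
    # T[i][attr] = ("a", k) if attribute k is in relation i, else ("b", i, k)
    table = [
        {attr: ("a", k) if attr in rel else ("b", i, k)
         for k, attr in enumerate(attributes)}
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    # only pairs of different rows that agree on all columns of X
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        s1, s2 = r1[y], r2[y]
                        if s1 == s2:
                            continue
                        # an "a" symbol wins; two "b" symbols are simply merged
                        new, old = (s1, s2) if s1[0] == "a" else (s2, s1)
                        for row in table:
                            if row[y] == old:
                                row[y] = new
                        changed = True
    # lossless if some row consists of "a" symbols only
    return any(all(sym[0] == "a" for sym in row.values()) for row in table)


ATTRS = ["ID", "Name", "Price", "VATtype", "VAT %", "OrderID", "Quantity", "Address"]
DECOMPOSITION = [
    {"ID", "Name", "Price", "VATtype"},   # product
    {"OrderID", "Address"},               # order
    {"ID", "OrderID", "Quantity"},        # quantities
    {"VATtype", "VAT %"},                 # VAT
]
FDS = [
    ({"ID"}, {"Name", "Price", "VATtype"}),
    ({"OrderID"}, {"Address"}),
    ({"ID", "OrderID"}, {"Quantity"}),
    ({"VATtype"}, {"VAT %"}),
]

lossless = is_lossless(ATTRS, DECOMPOSITION, FDS)
print(lossless)
```

Each merge reduces the number of distinct symbols in the table, so the loop always terminates.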
Thank you for your attention!