MI0034-Database Management System Set

  • 7/22/2019 MI0034-Database Management System Set

    MI0034 Assignment 1

Q1. Differentiate between a traditional file system and a modern database system. Describe the properties of a database and the advantages of a database.

    Answer:

Traditional File Systems vs. Modern Database Management Systems

1. Traditional file system: The system that was followed before the advent of DBMS, i.e., the older way.
   DBMS: The modern way, which has replaced the older concept of the file system.

2. Traditional file system: In traditional file processing, data definition is part of the application program and works only with that specific application.
   DBMS: Data definition is part of the DBMS. It is independent of any one application and can be used with any application.

3. Traditional file system: File systems are design-driven; they require design or coding changes whenever a new kind of data occurs. E.g., in a traditional employee master file with the fields Emp_name, Emp_id, Emp_addr, Emp_design, Emp_dept and Emp_sal, inserting one more column, Emp_Mob (mobile number), requires a complete restructuring of the file or a redesign of the application code, even though all the data except that in the one new column stays the same.
   DBMS: One extra column (attribute) can be added without any difficulty. Minor coding changes in the application program may be required.

4. Traditional file system: Redundant (duplicate) information is kept in many locations, which can result in a loss of data consistency. E.g., employee names might exist both in the Payroll Master File and in the Employee Benefit Master File. If an employee changes his or her last name, the name might be changed in the Payroll Master File but not in the Employee Benefit Master File.
   DBMS: Redundancy is eliminated to the maximum extent, provided the database is properly defined.

5. Traditional file system: Data is scattered across various files, and each file may be in a different format, making it difficult to write new application programs to retrieve the appropriate data.
   DBMS: This problem does not arise, because the data is held in one integrated database and accessed through the DBMS.

6. Traditional file system: Security features have to be coded in the application program itself.
   DBMS: Coding for security requirements is largely unnecessary, as most of them are taken care of by the DBMS.
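To make the Emp_Mob example concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption, since the assignment does not name a particular DBMS. The column names follow the example above. The department value and the mobile number are hypothetical. A single DDL statement adds the attribute, and the existing rows are kept, each getting NULL in the new column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE employee (Emp_id INTEGER PRIMARY KEY, Emp_name TEXT, "
    "Emp_addr TEXT, Emp_design TEXT, Emp_dept TEXT, Emp_sal INTEGER)"
)
# Hypothetical sample row (the department 'IT' is made up).
conn.execute(
    "INSERT INTO employee VALUES (100, 'Prasad', 'Bangalore', 'Project Leader', 'IT', 40000)"
)

# One DDL statement adds the new attribute; no file restructuring is needed.
conn.execute("ALTER TABLE employee ADD COLUMN Emp_Mob TEXT")

# The existing row is kept and simply has NULL (None in Python) for Emp_Mob.
row = conn.execute("SELECT Emp_name, Emp_Mob FROM employee WHERE Emp_id = 100").fetchone()
print(row)  # ('Prasad', None)

# Programs that name their columns explicitly keep working; new ones can use Emp_Mob.
conn.execute("UPDATE employee SET Emp_Mob = '9800000000' WHERE Emp_id = 100")  # hypothetical number
```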

Hence, a database management system is the software that manages a database and is responsible for its storage, security, integrity, concurrency, recovery and access.

The DBMS has a data dictionary, referred to as the system catalog, which stores data about everything it holds, such as names, structures, locations and types. This data is also referred to as metadata.
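SQLite keeps its system catalog in a table called sqlite_master. It is used here only as a convenient example. Other DBMSs expose their catalogs differently, for instance through INFORMATION_SCHEMA views. The minimal sketch below reads metadata about a small student table. The table, column and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (Stud_name TEXT NOT NULL, Class TEXT, Rank_obtained INTEGER)")
conn.execute("CREATE INDEX idx_student_class ON student (Class)")

# The catalog records each schema object's type and name (metadata, not user data).
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('index', 'idx_student_class'), ('table', 'student')]

# Column-level metadata: each PRAGMA row is (cid, name, declared type, notnull, default, pk).
columns = [(name, col_type) for _, name, col_type, *_ in conn.execute("PRAGMA table_info(student)")]
print(columns)  # [('Stud_name', 'TEXT'), ('Class', 'TEXT'), ('Rank_obtained', 'INTEGER')]
```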

    Properties of Database

The following are the important properties of a database:

1. A database is a logical collection of data having some implicit meaning. If the data are not related, it is not called a proper database. E.g., a student studying in Class II got the 5th rank:

Stud_name | Class | Rank obtained
Vijetha | Class II | 5th

2. A database consists of both the data and the description of the database structure and constraints. E.g., the student data above can be described as:

Type | Description
Character | It is the student's name
Alphanumeric | It is the class of the student

3. A database can be of any size and complexity. If we consider the earlier example of the employee database, the names and addresses of the employees may consist of very few records, each with a simple structure. E.g.:

Emp_Name | Emp_ID | Emp_Address | Emp_Desig | Emp_Sal
Prasad | 100 | Shubhodaya, Near Katariguppe Big Bazaar, BSK II stage, Bangalore | Project Leader | 40000
Usha | 101 | #165, 4th main Chamrajpet, Bangalore | Software engineer | 10000
Nupur | 102 | #12, Manipal Towers, Bangalore | Lecturer | 30000
Peter | 103 | Syndicate house, Manipal | IT executive | 15000

Like this, there may be any number (n) of records.

4. A DBMS is considered a general-purpose software system that facilitates the process of defining, constructing and manipulating databases for various applications.

5. A database provides insulation between programs and data, and data abstraction. Data abstraction hides the details of how the data is physically stored and organized, so that programs and users can work with the data of interest regardless of its physical structure.

6. The data in the database is used by a variety of users for a variety of purposes. E.g., in a hospital database management system, the way the patient data is viewed and used by one group of users differs from the way the doctor uses the same data. It may look as though the data is stored separately for the different users, but in fact it is stored in a single database. This property is called multiple views of the database.
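Multiple views are commonly implemented with SQL views. The minimal sketch below uses Python's sqlite3 module with a hypothetical patient table. The column names, view names and rows are illustrative only. The doctor's view shows clinical data and the billing view shows only charges, yet both read from the same stored table, so an update to that table is visible through every view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT,
    diagnosis  TEXT,
    charges    REAL
);
INSERT INTO patient VALUES (1, 'Ravi', 'Fever', 500.0), (2, 'Meena', 'Fracture', 4200.0);

-- Each view exposes a different portion of the same single table.
CREATE VIEW doctor_view  AS SELECT patient_id, name, diagnosis FROM patient;
CREATE VIEW billing_view AS SELECT patient_id, name, charges   FROM patient;
""")

doctor_rows = conn.execute("SELECT * FROM doctor_view ORDER BY patient_id").fetchall()
billing_rows = conn.execute("SELECT * FROM billing_view ORDER BY patient_id").fetchall()

# Data is stored once: changing the base table is immediately visible through the views.
conn.execute("UPDATE patient SET charges = 650.0 WHERE patient_id = 1")
updated_charge = conn.execute("SELECT charges FROM billing_view WHERE patient_id = 1").fetchone()[0]
```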

7. A multi-user DBMS must allow the data to be shared by multiple users simultaneously. For this purpose, the DBMS includes concurrency control software to ensure that updates made to the database by several users at the same time are applied correctly. This property is known as multiuser transaction processing.

    Advantages of using DBMS

1. Redundancy is reduced.

2. Data located on a server can be shared by clients.

3. Integrity (accuracy) can be maintained.

4. Security features protect the data from unauthorized access.

5. Modern DBMSs support Internet-based applications.

6. In a DBMS, the application programs and the structure of the data are independent.

7. Consistency of data is maintained.

8. A DBMS supports multiple views. Because a DBMS has many users, each of whom may use it for different purposes, each user may need to view and manipulate only a portion of the database, depending on their requirements.

Q2. What is the disadvantage of sequential file organization? How do you overcome it? What are the advantages and disadvantages of dynamic hashing?

    Answer:

In sequential file organization, records are arranged in physical sequence by the value of some field, called the sequence field. Often the field chosen is a key field, one with unique values that are used to identify records. The records are simply laid out on the storage device, often magnetic tape, in increasing or decreasing order of the value of the sequence field.

For example, IBM's Sequential Access Method (SAM), among others, uses this organization. This organization is simple, easy to understand and easy to manage, but it is best suited to sequential access, i.e., retrieving records one after another in the same order in which they are stored. It is not good for direct or random access, which means picking out a particular record, because it generally requires passing over the prior records in order to find the target record. It is also not possible to insert a new record in the middle of the file without rewriting the records that follow it. Sequential is the oldest type of file organization and, despite its shortcomings, is well suited to certain applications. Because the records are held in fixed physical positions, retrieving an individual record takes much time.

We can physically order the records of a file on disk based on the values of one of their fields, called the ordering field. This leads to an ordered or sequential file. Records are placed in order in a sequential file, which distinguishes it from an unordered file, where records are stored in the order in which they are inserted, i.e., each new record is placed at the end of the file.

Sequential files have the following advantages:

* Reading the records in order of the ordering field is very efficient, because the records are already stored in that order and no sorting is required.

* Finding the next record from the current record, in order of the ordering field, is also very efficient: it usually requires no additional block access, because the next record is in the same block as the current one.

* When we search on the ordering field, a technique like binary search makes locating a record faster and easier.

On the other hand, to search for a record in an unordered file we may have to check the whole file, because records are stored in no particular order. Sequential files are also known as sorted files.
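The difference between searching an ordered file and an unordered one can be sketched in Python by counting block reads. The sketch assumes fixed-size blocks of 4 records and made-up integer keys. Binary search works here only because the blocks are sorted on the ordering field. An unordered (heap) file has to fall back on the block-by-block linear scan.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # records per disk block (assumed for illustration)

def make_blocks(sorted_keys: List[int]) -> List[List[int]]:
    """Lay out records, sorted on the ordering field, into consecutive blocks."""
    return [sorted_keys[i:i + BLOCK_SIZE] for i in range(0, len(sorted_keys), BLOCK_SIZE)]

def linear_search(blocks: List[List[int]], key: int) -> Tuple[bool, int]:
    """Scan block by block, as an unordered file requires; returns (found, block reads)."""
    reads = 0
    for block in blocks:
        reads += 1
        if key in block:
            return True, reads
    return False, reads

def binary_search(blocks: List[List[int]], key: int) -> Tuple[bool, int]:
    """Binary search over the blocks of an ordered file; returns (found, block reads)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1
        if key < block[0]:
            hi = mid - 1          # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1          # key can only be in a later block
        else:
            return key in block, reads
    return False, reads

blocks = make_blocks(list(range(100, 164)))  # 64 records -> 16 blocks
print(linear_search(blocks, 163))  # (True, 16)
print(binary_search(blocks, 163))  # (True, 5)
```

Binary search needs about log2(16) + 1 block reads here, while the linear scan reads every block before reaching the last key.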

Dynamic Hashing

In dynamic hashing, the access structure is built on the binary representation of the hash value. The number of buckets is not fixed [as in regular hashing] but grows or shrinks as needed. The file can start with a single bucket; once that bucket is full and a new record is inserted, the bucket overflows and is split into two buckets. The records are distributed between the two buckets based on the value of the first [leftmost] bit of their hash values: records whose hash values start with a 0 bit are stored in one bucket, and those whose hash values start with a 1 bit are stored in the other. At this point, a binary tree structure called a directory is built. The directory has two types of nodes:

1. Internal nodes: guide the search; each has a left pointer corresponding to a 0 bit and a right pointer corresponding to a 1 bit.

2. Leaf nodes: hold a pointer to a bucket, i.e., a bucket address.

Each leaf node holds a bucket address. If a bucket overflows, for example when a new record is inserted into the bucket for records whose hash values start with 10 and causes an overflow, then

all records whose hash values start with 100 are placed in the first split bucket, and the second bucket contains those whose hash values start with 101. In this way, the levels of the binary tree can be expanded dynamically.
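The directory scheme above can be sketched in Python. The following assumptions are not from the text: hash values are 8 bits wide and computed as key mod 256, a bucket holds at most 2 records, and keys are integers. Internal nodes route on one bit each (0 goes left, 1 goes right), and leaves hold buckets. An overflowing bucket is split on its next bit, so only that bucket's records are redistributed. The keys in the usage lines are chosen so that the bucket for prefix 10 splits into 100 and 101, as in the example.

```python
from typing import Dict, List, Optional

HASH_BITS = 8         # assumed width of the binary hash value
BUCKET_CAPACITY = 2   # assumed (tiny) bucket size so that splits happen quickly

def hash_bits(key: int) -> str:
    """Hypothetical hash function: key mod 256 written as an 8-bit binary string."""
    return format(key % (1 << HASH_BITS), f"0{HASH_BITS}b")

class Node:
    """A leaf holds a bucket; an internal node has a 0-child (left) and a 1-child (right)."""
    def __init__(self, depth: int) -> None:
        self.depth = depth                       # index of the hash bit this node tests
        self.bucket: Optional[List[int]] = []    # becomes None once the node is internal
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None

class DynamicHashFile:
    def __init__(self) -> None:
        self.root = Node(0)                      # the file starts with a single bucket

    def _leaf(self, bits: str) -> Node:
        node = self.root
        while node.bucket is None:               # walk the directory down to a leaf
            node = node.left if bits[node.depth] == "0" else node.right
        return node

    def insert(self, key: int) -> None:
        leaf = self._leaf(hash_bits(key))
        leaf.bucket.append(key)
        if len(leaf.bucket) > BUCKET_CAPACITY:
            self._split(leaf)

    def _split(self, leaf: Node) -> None:
        if leaf.depth >= HASH_BITS:
            return                               # no bits left: tolerate the overflow
        records, leaf.bucket = leaf.bucket, None # the leaf turns into an internal node
        leaf.left, leaf.right = Node(leaf.depth + 1), Node(leaf.depth + 1)
        for key in records:                      # only this bucket's records are moved
            child = leaf.left if hash_bits(key)[leaf.depth] == "0" else leaf.right
            child.bucket.append(key)
        for child in (leaf.left, leaf.right):    # all records may share the bit: split again
            if len(child.bucket) > BUCKET_CAPACITY:
                self._split(child)

    def search(self, key: int) -> bool:
        return key in self._leaf(hash_bits(key)).bucket

    def buckets(self) -> Dict[str, List[int]]:
        """Map each leaf's bit prefix to the sorted keys in its bucket."""
        out: Dict[str, List[int]] = {}
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.bucket is not None:
                out[prefix] = sorted(node.bucket)
            else:
                stack.append((node.left, prefix + "0"))
                stack.append((node.right, prefix + "1"))
        return out

f = DynamicHashFile()
for key in (128, 160, 1, 192, 176):   # 176 = 10110000 overflows the bucket for prefix "10"
    f.insert(key)
print(f.buckets())
```

After the last insert, the buckets are keyed by the prefixes 0, 100, 101 and 11. The directory grew only along the path that overflowed.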

    Advantages of dynamic hashing:

1. The main advantage is that splitting causes only minor reorganization, since only the records in one bucket are redistributed to the two new buckets.

2. The space overhead of the directory table is negligible.

3. As with extendible hashing, performance does not degrade as the file grows. The main space saving is that no buckets need to be reserved for future growth; rather, buckets can be allocated dynamically.

    Disadvantages:

1. The index (directory) tables can grow rapidly and become too large to fit in main memory. When part of the index table is stored on secondary storage, extra block accesses are required.

2. The directory must be searched before accessing the bucket, resulting in two block accesses instead of one as in static hashing.

3. As with extendible hashing, it involves an additional level of indirection.

Q3. What is a relationship type? Explain the difference among a relationship instance, a relationship type and a relationship set.

    Answer:

In the real world, items have relationships to one another. E.g., a book is published by a particular publisher. The association or relationship that exists between entities relates data items to each other in a meaningful way. A relationship is an association between entities.

A collection of relationships of the same type is called a relationship set.

A relationship type R among n entity types E1, E2, ..., En defines a set of associations among entities of these types. Mathematically, R is a set of relationship instances ri.

E.g., consider a relationship type WORKS_FOR between the two entity types EMPLOYEE and DEPARTMENT, which associates each employee with the department the employee works for. Each relationship instance ri in WORKS_FOR associates one employee entity and one department entity, namely the entities that participate in ri.

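Employees e1, e3 and e6 work for department d1; e2 and e4 work for d2; and e5 and e7 work for d3. Relationship type R is, mathematically, the set of all such relationship instances. In short, the relationship type is the definition (WORKS_FOR between EMPLOYEE and DEPARTMENT), a relationship instance is one particular association (e1 works for d1), and the relationship set is the collection of all current instances of that type.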

    Some instances of the WORKS_FOR relationship
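The three notions can be separated in a few lines of Python. This is a modelling sketch, not DBMS code. The WorksFor record type plays the part of the relationship type. Each WorksFor value is one relationship instance ri. The Python set holding the values is the relationship set, i.e., the current state of the relationship. The instances are exactly the ones listed above. The helper employees_of is a hypothetical name.

```python
from typing import NamedTuple, Set

class WorksFor(NamedTuple):
    """Relationship type WORKS_FOR(EMPLOYEE, DEPARTMENT); each value is one instance r_i."""
    employee: str
    department: str

# Relationship set: all instances of WORKS_FOR currently in the database.
works_for: Set[WorksFor] = {
    WorksFor("e1", "d1"), WorksFor("e3", "d1"), WorksFor("e6", "d1"),
    WorksFor("e2", "d2"), WorksFor("e4", "d2"),
    WorksFor("e5", "d3"), WorksFor("e7", "d3"),
}

def employees_of(department: str) -> Set[str]:
    """Employee entities that participate with the given department in some instance."""
    return {r.employee for r in works_for if r.department == department}

print(sorted(employees_of("d1")))  # ['e1', 'e3', 'e6']
```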

Degree of a relationship type: the number of entity types that participate in it. A unary relationship exists when an association is maintained within a single entity type. A binary relationship exists when two entity types are associated. A ternary relationship exists when three entity types are associated.

    Degree of relationship type

    Role Names and Recursive Relationship

Each entity type that participates in a relationship type plays a particular role in the relationship. The role name signifies the role that a participating entity from the entity type plays in each relationship instance. E.g., in the WORKS_FOR relationship type, the employee plays the role of employee or worker, and the department plays the role of department or employer. However, in some cases the same entity type participates more than once in a relationship type, in different roles. Such relationship types are called recursive.

E.g., the EMPLOYEE entity type participates twice in SUPERVISION: once in the role of supervisor and once in the role of supervisee.
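In a relational schema, a recursive relationship is commonly stored as a foreign key that points back into the same table. The two roles are then told apart by aliasing the table twice in a self-join. The minimal sketch below uses Python's sqlite3 module. The column names (ssn, super_ssn) and the supervision rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    ssn       INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    super_ssn INTEGER REFERENCES employee(ssn)  -- SUPERVISION points back to EMPLOYEE
);
INSERT INTO employee VALUES (1, 'Prasad', NULL), (2, 'Usha', 1), (3, 'Peter', 1);
""")

# Alias s plays the supervisor role and alias e plays the supervisee role.
pairs = conn.execute("""
    SELECT s.name AS supervisor, e.name AS supervisee
    FROM employee AS e
    JOIN employee AS s ON e.super_ssn = s.ssn
    ORDER BY e.ssn
""").fetchall()
print(pairs)  # [('Prasad', 'Usha'), ('Prasad', 'Peter')]
```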

    Constraints on Relationship Types

Relationship types usually have certain constraints that limit the possible combinations of entities that may participate in the relationship instances. E.g., the company may have a rule that each employee must work for exactly one department. The two main types of constraints are the cardinality ratio and participation constraints.

The cardinality ratio specifies the maximum number of entities to which another entity can be associated through a relationship set.

Mapping cardinalities must be one of the following:

One-to-One: An entity in A is associated with at most one entity in B, and vice versa. E.g., an employee can manage only one department, and a department has only one manager.

One-to-Many: An entity in A is associated with any number of entities in B. An entity in B, however, can be associated with at most one entity in A. E.g., each department can be related to numerous employees, but an employee can be related to only one department.

Many-to-One: An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number of entities in A. E.g., many depositors deposit into a single account.

Many-to-Many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A. E.g., an employee can work on several projects, and several employees can work on a project.
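A small Python helper can classify a set of (A, B) relationship instances into the four categories above. Note the limitation: it only shows which ratio one particular relationship state is consistent with. A real cardinality constraint is a rule declared in the schema for all possible states. The function name and the sample pairs are illustrative.

```python
from collections import Counter
from typing import Iterable, Tuple

def cardinality_ratio(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify (a, b) instances from A to B as one-to-one, one-to-many, many-to-one or many-to-many."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)   # how many B entities each A entity is linked to
    per_b = Counter(b for _, b in pairs)   # how many A entities each B entity is linked to
    a_to_many = any(n > 1 for n in per_a.values())
    b_to_many = any(n > 1 for n in per_b.values())
    if a_to_many and b_to_many:
        return "many-to-many"
    if a_to_many:
        return "one-to-many"
    if b_to_many:
        return "many-to-one"
    return "one-to-one"

print(cardinality_ratio([("e1", "d1"), ("e2", "d2")]))               # one-to-one (MANAGES)
print(cardinality_ratio([("d1", "e1"), ("d1", "e3"), ("d2", "e2")]))  # one-to-many
print(cardinality_ratio([("e1", "d1"), ("e3", "d1"), ("e2", "d2")]))  # many-to-one
print(cardinality_ratio([("e1", "p1"), ("e1", "p2"), ("e2", "p1")]))  # many-to-many
```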

Participation Constraints: There are two ways in which an entity can participate in a relationship, i.e., two types of participation.

1. Total: The participation of an entity set E in a relationship set R is said to be total if every entity in E participates in at least one relationship in R. E.g., if every employee must work for a department, the participation of EMPLOYEE in WORKS_FOR is total.

    Some instances of the WORKS_FOR relationship

    Total participation is sometimes called existence dependency.

2. Partial: If only some entities in E participate in relationships in R, the participation of entity set E in relationship set R is said to be partial.

    Some instances of the WORKS_FOR relationship

We do not expect every employee to manage a department, so the participation of EMPLOYEE in the MANAGES relationship type is partial.
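Participation can be checked in a similar way, by comparing an entity set with the entities that actually appear in a relationship set. The minimal Python sketch below uses the WORKS_FOR instances listed earlier and a hypothetical MANAGES set. As with the cardinality helper, this checks one state of the database, not the declared constraint.

```python
from typing import Iterable, Set, Tuple

def participation(entities: Iterable[str], pairs: Iterable[Tuple[str, str]]) -> str:
    """'total' if every entity appears (first position) in at least one instance, else 'partial'."""
    participating: Set[str] = {a for a, _ in pairs}
    return "total" if set(entities) <= participating else "partial"

employees = ["e1", "e2", "e3", "e4", "e5", "e6", "e7"]
works_for = [("e1", "d1"), ("e3", "d1"), ("e6", "d1"), ("e2", "d2"),
             ("e4", "d2"), ("e5", "d3"), ("e7", "d3")]
manages = [("e1", "d1"), ("e2", "d2"), ("e5", "d3")]  # hypothetical managers

print(participation(employees, works_for))  # total
print(participation(employees, manages))    # partial
```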

Weak Entity: Some entity types do not have any key attribute of their own; they are called weak entity types. An entity type that has a key attribute (primary key) is termed a strong entity type. A weak entity type always has total participation [existence dependency] with respect to its strong (owner) entity type.

A weak entity type depends on the existence of another entity. Weak entities are also referred to as child, dependent or subordinate entities, and strong entities as parent, owner or dominant entities. E.g., in a relationship between EMPLOYEE and PARENT, PARENT is a weak entity, as it needs the entity EMPLOYEE for its existence. The entities EMPLOYEE, COMPANY, etc. are strong entities.

Weak entities are represented by a double-lined rectangle.
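When mapped to tables, a weak entity is usually identified by its owner's key plus a partial key of its own. Its existence dependency can be enforced with a foreign key that cascades deletes. The minimal sketch below uses Python's sqlite3 module. The table layout, the partial key parent_name and the sample rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (
    emp_id   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL
);
-- Weak entity: identified by the owner's key plus a partial key (parent_name).
CREATE TABLE parent (
    emp_id          INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
    parent_name     TEXT    NOT NULL,
    relation_to_emp TEXT,
    PRIMARY KEY (emp_id, parent_name)
);
""")
conn.execute("INSERT INTO employee VALUES (100, 'Prasad')")
conn.execute("INSERT INTO parent VALUES (100, 'Ramesh', 'Father')")  # hypothetical row

# Existence dependency: a PARENT row cannot refer to a non-existent EMPLOYEE.
try:
    conn.execute("INSERT INTO parent VALUES (999, 'Sita', 'Mother')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the owner also removes the weak entities that depend on it.
conn.execute("DELETE FROM employee WHERE emp_id = 100")
remaining = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]
print(orphan_rejected, remaining)  # True 0
```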

    Q4. What is SQL? Discuss.

    Answer:

SQL stands for Structured Query Language, which is used for programming the database. It is a non-procedural language, meaning that SQL describes what data to retrieve, delete or insert, rather than how to perform the operation. It is the standard command set used to communicate with an RDBMS.

A SQL query is not necessarily a question to the database. It can be a command to do one of the following:

Create or delete a table.

Insert, modify or delete rows.

Search several rows for specified information and return the result in order.

Modify security information.

SQL statements can be grouped into the following categories:

    1. DDL (Data Definition Language)

    2. DML (Data Manipulation Language)

    3. DCL (Data Control Language)

    4. TCL (Transaction Control Language)

    DDL: Data Definition Language

The DDL statements provide commands for defining relation schemas, i.e., for creating tables, indexes, sequences, etc., and commands for dropping, altering and renaming objects.
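The DDL commands named above can be run through Python's sqlite3 module, as in the sketch below. SQLite is an assumption here. It has no sequences, for instance, and DDL syntax for other objects varies between DBMSs. The table and index names are illustrative. The helper objects reads SQLite's catalog to show the effect of each command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def objects(c: sqlite3.Connection) -> list:
    """List (type, name) of the schema objects recorded in SQLite's catalog."""
    return c.execute("SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()

conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_sal INTEGER)")  # create table
conn.execute("CREATE INDEX emp_sal_idx ON emp (emp_sal)")                                    # create index
conn.execute("ALTER TABLE emp ADD COLUMN emp_dept TEXT")                                    # alter
conn.execute("ALTER TABLE emp RENAME TO employee")                                          # rename
after_rename = objects(conn)
print(after_rename)  # [('index', 'emp_sal_idx'), ('table', 'employee')]

conn.execute("DROP TABLE employee")   # drop: the table and its index disappear
print(objects(conn))  # []
```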

    DML: (Data Manipulation Language)

The DML statements are used to change the data in database tables. The UPDATE, INSERT and DELETE statements, respectively, alter existing rows in a database table, insert new records into a database table, or remove one or more records from a database table.
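The three DML statements are sketched below with Python's sqlite3 module against a cut-down emp table. The rows come from the employee example earlier in this assignment. The salary raise and the deletion threshold are arbitrary. Placeholders (?) pass values separately from the SQL text, which is the usual practice with Python's DB-API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_sal INTEGER)")

# INSERT: add new records.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(100, "Prasad", 40000), (101, "Usha", 10000), (103, "Peter", 15000)])

# UPDATE: alter existing rows (Usha's salary becomes 11000).
conn.execute("UPDATE emp SET emp_sal = emp_sal + 1000 WHERE emp_id = ?", (101,))

# DELETE: remove one or more records (only Usha earns less than 12000).
conn.execute("DELETE FROM emp WHERE emp_sal < ?", (12000,))

conn.commit()
rows = conn.execute("SELECT emp_id, emp_name, emp_sal FROM emp ORDER BY emp_id").fetchall()
print(rows)  # [(100, 'Prasad', 40000), (103, 'Peter', 15000)]
```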

    DCL: (Data Control Language)

The Data Control Language statements are used to grant permissions to a user, revoke permissions from a user, and lock certain permissions for a user. E.g.:

SQL DBA>Revoke Import from Akash;

SQL DBA>Grant all on emp to public;

SQL DBA>Grant select, update on emp to L.Suresh;

SQL DBA>Grant all on emp to Akash with grant option;

Revoke: REVOKE takes away privileges on one or more tables or views.

SQL DBA>Revoke update, delete on emp from L.Suresh;

SQL DBA>Revoke all on emp from Akash;

    TCL: (Transaction Control Language)

TCL statements are used to control transactions. E.g.:

Commit: Make the changes of the current transaction permanent.

Rollback: Discard (cancel) the changes made since the previous commit point.
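COMMIT and ROLLBACK can be seen with Python's sqlite3 module. In its default (legacy) transaction mode, the module opens a transaction implicitly before an INSERT. Its commit() and rollback() methods end that transaction. The minimal sketch below uses a hypothetical emp table with rows from the employee example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")

conn.execute("INSERT INTO emp VALUES (100, 'Prasad')")
conn.commit()                       # COMMIT: make the change permanent

conn.execute("INSERT INTO emp VALUES (101, 'Usha')")
conn.rollback()                     # ROLLBACK: discard changes since the last commit

names = [n for (n,) in conn.execute("SELECT emp_name FROM emp")]
print(names)  # ['Prasad']
```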

    Q5. What is Normalization? Discuss various types of Normal Forms.

    Answer:

Normalization is the process of designing the database structures that store data. It matters because any application ultimately depends on its data structures: if the data structures are poorly designed, the application starts from a poor foundation, and it takes a lot more work to create a useful and efficient application. Normalization is the formal process for deciding which attributes should be grouped together in a relation. Normalization serves as a tool for validating

and improving the logical design, so that the logical design avoids unnecessary duplication of data, i.e., it eliminates redundancy and promotes integrity. In the normalization process, we analyze complex relations and decompose them into smaller, simpler and well-structured relations.

    The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF and 3NF, along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed here.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

    First Normal Form (1NF)

A relation schema R is in first normal form if every attribute of R takes only single, atomic values. Equivalently, the intersection of each row and column contains one and only one value. To transform an un-normalized table (a table that contains one or more repeating groups) into first normal form, we identify and remove the repeating groups within the table.

E.g., consider the relation Dept:

D.Name | D.No | D.Location
R&D | 5 | {England, London, Delhi}
HRD | 4 | Bangalore

Each department can have a number of locations. This relation is not in first normal form, because D.Location is not an atomic attribute: the domain of D.Location contains multiple values.

There is a technique to achieve first normal form: remove the attribute D.Location, which violates first normal form, and place it into a separate relation Dept_location:

Dept
D.No | D.Name
5 | R&D
4 | HRD

Dept_location
D.Location | D.No
England | 5
London | 5
Delhi | 5
Bangalore | 4
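The 1NF step can be sketched in Python, with lists of tuples standing in for relations. The input is the un-normalized Dept relation above. D.Location is modelled as a list because it holds a repeating group. The function name to_1nf is hypothetical.

```python
from typing import List, Tuple

# Un-normalized Dept: (D.Name, D.No, D.Location), where D.Location is a repeating group.
dept_unnormalized: List[Tuple[str, int, List[str]]] = [
    ("R&D", 5, ["England", "London", "Delhi"]),
    ("HRD", 4, ["Bangalore"]),
]

def to_1nf(rows):
    """Split off the multivalued attribute so that every cell holds one atomic value."""
    dept = [(d_no, d_name) for d_name, d_no, _ in rows]                     # Dept(D.No, D.Name)
    dept_location = [(loc, d_no) for _, d_no, locs in rows for loc in locs]  # Dept_location(D.Location, D.No)
    return dept, dept_location

dept, dept_location = to_1nf(dept_unnormalized)
print(dept)           # [(5, 'R&D'), (4, 'HRD')]
print(dept_location)  # [('England', 5), ('London', 5), ('Delhi', 5), ('Bangalore', 4)]
```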

Functional dependency: The concept of functional dependency was introduced by Prof. Codd in 1970 during the emergence of definitions for the three normal forms. A functional dependency is a constraint between two sets of attributes in a relation from a database.

Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y in R (X->Y) if and only if each value of X is associated with exactly one value of Y. X is called the determinant set and Y the dependent attribute.
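The definition translates directly into a check over rows: X->Y is violated as soon as two rows agree on X but differ on Y. Keep in mind that a functional dependency is a statement about all legal states of R. Checking one sample of rows can show that an FD is violated, but it can never prove that it holds. The helper name fd_holds is hypothetical. The rows are taken from the employee table earlier, and the extra conflicting row is made up.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

def fd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """True if, in these rows, every value of X is associated with exactly one value of Y."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        x_val = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False   # same X value, different Y value: X -> Y is violated
    return True

emp = [
    {"Emp_ID": 100, "Emp_Name": "Prasad", "Emp_Desig": "Project Leader"},
    {"Emp_ID": 101, "Emp_Name": "Usha",   "Emp_Desig": "Software engineer"},
    {"Emp_ID": 102, "Emp_Name": "Nupur",  "Emp_Desig": "Lecturer"},
]
print(fd_holds(emp, ["Emp_ID"], ["Emp_Name"]))   # True
conflicting = emp + [{"Emp_ID": 100, "Emp_Name": "Prasad K", "Emp_Desig": "Project Leader"}]
print(fd_holds(conflicting, ["Emp_ID"], ["Emp_Name"]))   # False
```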

    Second Normal Form (2NF)

Second normal form is based on the concept of full functional dependency. A relation R is in second normal form if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

Second normal form (2NF) further addresses the concept of removing duplicative data:

Meet all the requirements of the first normal form.

Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

Create relationships between these new tables and their predecessors through the use of foreign keys.

A partial functional dependency is a functional dependency in which one or more non-key attributes are functionally dependent on part of the primary key. It creates redundancy in that relation, which results in anomalies when the table is updated.
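The sketch below detects a partial dependency and removes it by projection, in Python. The EMP_PROJ-style relation is hypothetical. Its attributes are SSN, Pnumber, Hours and Ename, and its key is {SSN, Pnumber}. Ename depends on SSN alone, which is only part of the key, so it is repeated for every project the employee works on. Projecting onto (SSN, Pnumber, Hours) and (SSN, Ename) stores each name once. As before, checking rows can only illustrate an FD, not prove it.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]

def fd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """True if no two rows agree on X but differ on Y."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        xv, yv = tuple(row[a] for a in x), tuple(row[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False
    return True

def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple]:
    """Relational projection: keep the listed attributes and drop duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}

emp_proj = [
    {"SSN": 1, "Pnumber": 10, "Hours": 20, "Ename": "Prasad"},
    {"SSN": 1, "Pnumber": 20, "Hours": 15, "Ename": "Prasad"},  # Ename repeated
    {"SSN": 2, "Pnumber": 10, "Hours": 40, "Ename": "Usha"},
]

partial = fd_holds(emp_proj, ["SSN"], ["Ename"])   # Ename depends on part of the key
ep1 = project(emp_proj, ["SSN", "Pnumber", "Hours"])
ep2 = project(emp_proj, ["SSN", "Ename"])          # each name now stored once
print(partial, len(ep1), len(ep2))  # True 3 2
```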

    (a) Normalizing EMP_PROJ into 2NF relations

    (b) Normalizing EMP_DEPT into 3NF relations

    Third Normal Form (3NF)

    Third normal form (3NF) goes one large step further:

    Meet all the requirements of the second normal form.

    Remove columns that are not dependent upon the primary key.

This is based on the concept of transitive dependency. We should design relation schemas so that there are no transitive dependencies, because they lead to update anomalies. A functional dependency X->Y in a relation schema R is a transitive dependency if there is a set of attributes Z in R such that X->Z and Z->Y hold, and Z is neither a candidate key nor a subset of any key of R. E.g., the dependency SSN->Dmgr is transitive through Dnum in the Emp_dept relation, because SSN->Dnum and Dnum->Dmgr hold, and Dnum is neither a key nor a subset (part) of the key.

According to Codd's definition, a relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute is transitively dependent on the primary key. The Emp_dept relation is not in 3NF; we can normalize it by decomposing it into E1 and E2.

Note: Transitivity is a property of a mathematical relation: if the relation holds between the first value and the second value, and between the second value and the third value, then it also holds between the first and the third value.
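The SSN->Dnum->Dmgr chain can be illustrated the same way. The Emp_dept rows below are hypothetical. The text does not list the attributes of E1 and E2. This sketch assumes the usual split: the transitively dependent attribute Dmgr moves into its own relation keyed by Dnum. A join on Dnum then rebuilds the original rows, so no information is lost.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]

def fd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """True if no two rows agree on X but differ on Y."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        xv, yv = tuple(row[a] for a in x), tuple(row[a] for a in y)
        if seen.setdefault(xv, yv) != yv:
            return False
    return True

def project(rows: List[Row], attrs: Sequence[str]) -> Set[Tuple]:
    """Relational projection with duplicate elimination."""
    return {tuple(row[a] for a in attrs) for row in rows}

emp_dept = [
    {"SSN": 1, "Ename": "Prasad", "Dnum": 5, "Dmgr": 9},
    {"SSN": 2, "Ename": "Usha",   "Dnum": 5, "Dmgr": 9},  # Dmgr repeated for each employee of dept 5
    {"SSN": 3, "Ename": "Nupur",  "Dnum": 4, "Dmgr": 7},
]

# SSN -> Dnum and Dnum -> Dmgr hold, and Dnum is not a key, so SSN -> Dmgr is transitive.
transitive = fd_holds(emp_dept, ["SSN"], ["Dnum"]) and fd_holds(emp_dept, ["Dnum"], ["Dmgr"])

e1 = project(emp_dept, ["SSN", "Ename", "Dnum"])
e2 = project(emp_dept, ["Dnum", "Dmgr"])          # one row per department

# Natural join of E1 and E2 on Dnum gives back the original tuples (lossless decomposition).
rejoined = {(s, n, d, m) for s, n, d in e1 for d2, m in e2 if d == d2}
original = {tuple(r.values()) for r in emp_dept}
print(transitive, len(e2), rejoined == original)  # True 2 True
```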

    Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

Meet all the requirements of the third normal form.

A relation is in 4NF if it has no multivalued dependencies other than trivial ones or those whose determinant is a superkey.

Multivalued dependencies are a consequence of first normal form, which prohibits attributes having a set of values. If we have two or more independent multivalued attributes in the same relation, we get into a situation where we have to repeat every value of one of the attributes with every value of the other attribute, to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.

Consider a table EMPLOYEE that has the attributes NAME, PROJECT and HOBBY.

An employee can work on more than one project and can have more than one hobby.

An employee's projects and hobbies are independent of one another.

A given project or hobby can be associated with any number of employees.

To keep the relation state consistent, we must have a separate tuple to represent every combination of an employee's projects and hobbies.

The drawback of the EMPLOYEE relation is redundant data, which leads to update anomalies. For example, if we wish to add one more project, Sybase, so that employee B is

handling it, then we must add two more tuples, one for each of B's hobbies. The values Reading and Movies of HOBBY are repeated with each value of PROJECT. This redundancy is undesirable. One way to remove it is to decompose the EMPLOYEE relation into two relations, PROJECT and HOBBY.

Now, if we wish to insert Sybase in the PROJECT relation, only one entry is required.

Definition (MVD): A relation R(X, Y, Z) is said to have a multivalued dependency X->>Y if the set of Y values for a given [X, Z] pair does not depend on Z but depends only on X. We then say "X multi-determines Y" or "Y is multi-dependent on X". Such a dependency is called a multivalued dependency (MVD) and is represented by a double arrow, X->>Y.

We can also define an MVD as follows: for each value of X there is a set of values for Y and a set of values for Z, and the set of values for Y is independent of the set of values for Z.

So wherever two independent one-to-many relationships (A:B and A:C) are mixed in the same relation, a multivalued dependency arises. Multivalued dependencies can be avoided by using fourth normal form.

EMPLOYEE
NAME | PROJECT | HOBBY
A | Microsoft | Cricket
A | Oracle | Music
A | Microsoft | Music
A | Oracle | Cricket
B | INTEL | Movies
B | Sybase | Reading
B | INTEL | Reading
B | Sybase | Movies

Decompose the relation to reduce redundancy:

PROJECT
NAME | PROJECT
A | Microsoft
A | Oracle
B | INTEL
B | Sybase

HOBBY
NAME | HOBBY
A | Cricket
A | Music
B | Movies
B | Reading
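The decomposition can be verified in Python using the EMPLOYEE tuples from the table above. The sketch checks the MVD NAME->>PROJECT: for each name, every project must pair with every hobby. It then projects EMPLOYEE onto PROJECT and HOBBY and confirms that a natural join on NAME rebuilds EMPLOYEE exactly, i.e., the decomposition is lossless. The function name mvd_holds is hypothetical.

```python
from itertools import product
from typing import Set, Tuple

Row = Tuple[str, str, str]  # (NAME, PROJECT, HOBBY)

EMPLOYEE: Set[Row] = {
    ("A", "Microsoft", "Cricket"), ("A", "Oracle", "Music"),
    ("A", "Microsoft", "Music"),   ("A", "Oracle", "Cricket"),
    ("B", "INTEL", "Movies"),      ("B", "Sybase", "Reading"),
    ("B", "INTEL", "Reading"),     ("B", "Sybase", "Movies"),
}

def mvd_holds(rows: Set[Row]) -> bool:
    """NAME ->> PROJECT: for each NAME, the tuples are exactly projects x hobbies."""
    for name in {r[0] for r in rows}:
        mine = {r for r in rows if r[0] == name}
        projects = {r[1] for r in mine}
        hobbies = {r[2] for r in mine}
        if {(name, p, h) for p, h in product(projects, hobbies)} != mine:
            return False
    return True

# 4NF decomposition: project EMPLOYEE onto (NAME, PROJECT) and (NAME, HOBBY).
PROJECT = {(n, p) for n, p, _ in EMPLOYEE}
HOBBY = {(n, h) for n, _, h in EMPLOYEE}

# Natural join on NAME rebuilds the original relation.
rejoined = {(n, p, h) for n, p in PROJECT for n2, h in HOBBY if n == n2}

# Adding a project for B costs one tuple per hobby of B in EMPLOYEE, but one tuple in PROJECT.
tuples_needed_in_employee = len({h for n, h in HOBBY if n == "B"})
print(mvd_holds(EMPLOYEE), rejoined == EMPLOYEE, tuples_needed_in_employee)  # True True 2
```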

The definition of 4NF is violated when a relation has undesirable multivalued dependencies; hence we identify such relations and decompose them into 4NF relations.

Alternate definition: A relation R is said to be in 4NF if, for every MVD A->>B that holds over R, one of the following is true:

B ⊆ A (trivial), or

A ∪ B = R, or

A is a superkey.

The EMPLOYEE relation is not in 4NF because of the non-trivial MVDs NAME->>PROJECT and NAME->>HOBBY (the project and hobby attributes of the EMPLOYEE relation are independent of each other), and NAME is not a superkey of EMPLOYEE. To bring this relation into 4NF, you have to decompose EMPLOYEE into PROJECT and HOBBY.

These normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.

Q6. What do you mean by a shared lock and an exclusive lock? Briefly describe the two-phase locking protocol.

    Answer:

A lock is a restriction on access to data in a multi-user environment. It prevents multiple users from changing the same data simultaneously. If locking is not used, data within the database may become logically incorrect and may produce unexpected results.

Shared Locks: These are used for read-only operations, i.e., operations that do not change or update the data, e.g., the SELECT statement.

Shared locks allow concurrent transactions to read (SELECT) the same data. No other transaction can modify the data while shared locks exist. In many systems, under the default isolation level, shared locks are released as soon as the data has been read.

Exclusive Locks: Exclusive locks are used for data modification operations, such as UPDATE, DELETE and INSERT. They ensure that multiple updates cannot be made to the same resource simultaneously. No other transaction can read or modify data locked by an exclusive lock. Because they are used for write operations, exclusive locks are held until the transaction commits or rolls back.
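The rules for shared (S) and exclusive (X) locks reduce to a compatibility matrix. In the minimal Python sketch below, the names COMPATIBLE and can_grant are hypothetical. A real lock manager would also queue requests that cannot be granted instead of simply refusing them.

```python
from typing import Set

# Compatibility matrix keyed by (mode held by another transaction, mode requested).
COMPATIBLE = {
    ("S", "S"): True,    # many readers can share an item
    ("S", "X"): False,   # a writer must wait for the readers
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested: str, held_by_others: Set[str]) -> bool:
    """Grant only if the requested mode is compatible with every lock other transactions hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

print(can_grant("S", {"S"}))  # True
print(can_grant("X", {"S"}))  # False
print(can_grant("X", set()))  # True
```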

There are three locking operations: read_lock(X), write_lock(X) and unlock(X). A lock associated with an item X, LOCK(X), now has three possible states: "read-locked", "write-locked" or "unlocked". A read-locked item is also called share-locked, because other transactions are allowed to read the item, whereas a write-locked item is called exclusive-locked, because a single transaction exclusively holds the lock on the item.
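The three operations and three lock states can be sketched as a tiny in-memory lock manager in Python. The class and attribute names are hypothetical. Instead of making a transaction wait, a request that cannot be granted returns False. A transaction that is the only reader of an item may upgrade its read lock to a write lock.

```python
from typing import Dict, Set

class LockManager:
    """read_lock / write_lock / unlock over items with three states.

    readers[x] non-empty means "read-locked" (shared); writer[x] set means
    "write-locked" (exclusive); an item in neither map is "unlocked".
    """

    def __init__(self) -> None:
        self.readers: Dict[str, Set[str]] = {}
        self.writer: Dict[str, str] = {}

    def state(self, item: str) -> str:
        if item in self.writer:
            return "write-locked"
        if self.readers.get(item):
            return "read-locked"
        return "unlocked"

    def read_lock(self, txn: str, item: str) -> bool:
        if item in self.writer:
            return self.writer[item] == txn    # only the exclusive holder may read
        self.readers.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn: str, item: str) -> bool:
        if item in self.writer:
            return self.writer[item] == txn
        if self.readers.get(item, set()) - {txn}:
            return False                       # other readers hold a shared lock
        self.readers.pop(item, None)           # upgrade our own read lock, if any
        self.writer[item] = txn
        return True

    def unlock(self, txn: str, item: str) -> None:
        if self.writer.get(item) == txn:
            del self.writer[item]
        readers = self.readers.get(item)
        if readers is not None:
            readers.discard(txn)
            if not readers:
                del self.readers[item]

lm = LockManager()
results = [
    lm.read_lock("T1", "X"),    # True: shared
    lm.read_lock("T2", "X"),    # True: shared with T1
    lm.write_lock("T1", "X"),   # False: T2 still reads X
]
lm.unlock("T2", "X")
results.append(lm.write_lock("T1", "X"))  # True: T1 upgrades to exclusive
results.append(lm.read_lock("T2", "X"))   # False: X is write-locked by T1
print(results, lm.state("X"))  # [True, True, False, True, False] write-locked
```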

    Each record on the lock table will have four fields: