Bt0066 database management system2

1. Explain the statement that relational algebra operators can be composed. Why the ability to compose operators is important?

The Relational Algebra consists of a set of basic operators that take relations as their operands and return the result as a relation. This means that operators can be composed - the output of one operation may be used as input to another operation.

Query languages : Allow manipulation and retrieval of data from a database.

Relational model supports simple, powerful QLs:o Strong formal foundation based on logic.o Allows for much optimization.

Query Languages != programming languages!o QLs not expected to be “Turing complete”.o QLs not intended to be used for complex calculations.o QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

o Relational Algebra : More operational(procedural), very useful for representing execution plans.

o Relational Calculus : Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)

Preliminaries

A query is applied to relation instances, and the result of a query is also a relation instance.

o Schemas of input relations for a query are fixed (but query will run regardless of instance!)

o The schema for the result of a given query is also fixed! Determined by definition of query language constructs.

Positional vs. named-field notation: o Positional notation easier for formal definitions, named-

field notation more readable.

Copy© Milan K Antony | [email protected] | http://solveditpapers.blogspot.in/

mailto:[email protected]

o Both used in SQL

Relational Algebra

Basic operations:o Selection ( ) Selects a subset of rows from relation.o Projection ( ) Deletes unwanted columns from relation.o Cross-product ( ) Allows us to combine two relations.o Set-difference ( ) Tuples in reln. 1, but not in reln. 2.o Union ( ) Tuples in reln. 1 and in reln. 2.

Additional operations:o Intersection, join, division, renaming: Not essential, but

(very!) useful. Since each operation returns a relation, operations can be

composed! (Algebra is “closed”.)

Projection

Deletes attributes that are not in projection list. Schema of result contains exactly the fields in the projection list,

with the same names that they had in the (only) input relation. Projection operator has to eliminate duplicates! (Why??, what

are the consequences?)o Note: real systems typically don’t do duplicate elimination

unless the user explicitly asks for it. (Why not?)

Selection

Selects rows that satisfy selection condition. Schema of result identical to schema of (only) input relation. Result relation can be the input for another relational algebra

operation! (Operator composition.)

Union, Intersection, Set-Difference

All of these operations take two input relations, which must be union-compatible:

o Same number of fields.o `Corresponding’ fields have the same type.

What is the schema of result?

Cross-Product



Each row of S1 is paired with each row of R1. Result schema has one field per field of S1 and R1, with field

names `inherited’ if possible.o Conflict: Both S1 and R1 have a field called sid.

Division

Not supported as a primitive operator, but useful for expressing queries like: Find sailors who have reserved all boats.

Precondition: in A/B, the attributes in B must be included in the schema for A. Also, the result has attributes A-B.

o SALES(supId, prodId); o PRODUCTS(prodId); o Relations SALES and PRODUCTS must be built using

projections. o SALES/PRODUCTS: the ids of the suppliers supplying ALL

products.

Expressing A/B Using Basic Operators

Division is not essential op; just a useful shorthand. o (Also true of joins, but joins are so common that systems

implement joins specially. Division is NOT implemented in SQL).

Idea: For SALES/PRODUCTS, compute all products such that there exists at least one supplier not supplying it.

o x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A.

Since each operation returns a relation, operations can be composed!

2. What is an unsafe query? Give an example and explain why it is important to disallow such queries.

Here is an informal example of an unsafe query. Consider the database of allpeople living in Sweden, and suppose that the Last_Name field may contain any string of up to 64 characters. Then the following is an unsafe query which is easily represented in the domain calculus. “Give



me the set of strings of length at most 64 which do not represent the last name of someone living in Sweden.”The formalization of safety is rather technical,and will not be presented here. For a query designed by hand, it is usually obvious whether or not it is safe. Of course, a query processor must be able to detect safety, or lack thereof.

3. Give a set of FDs for the relation schema R(A,B,C,D) with primary key AB under which R is in 1NF but not in 2NF. . Consider the set of FD: AB → CD and B → C. AB is obviously a key for thisrelation since AB → CD implies AB → ABCD. It is a primary key since there areno smaller subsets of keys that hold over R(A,B,C,D). The FD: B → C violates2NF since:

C B is false; that is, it is not a trivial FDB is not a superkeyC is not part of some key for RB is a proper subset of the key AB (transitive dependency)

4. When it is useful to have replication or fragmentation of data? Explain.

Replication is the process of sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility. It could be data replication if the same data is stored on multiple storage devices, or computation replication if the same computing task is executed many times. A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device.In computing, failover is the capability to switch over automatically to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active application,[1] server, system, or network. Failover happens without human intervention and generally without warning, unlike switchoverFault-tolerance or graceful degradation is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. If its operating quality decreases at all, the



http://en.wikipedia.org/wiki/Computer_system

http://en.wikipedia.org/wiki/System

http://en.wikipedia.org/wiki/Switchover

http://en.wikipedia.org/wiki/Failover#cite_note-0

http://en.wikipedia.org/wiki/Application_software

http://en.wikipedia.org/wiki/Abnormal_end

http://en.wikipedia.org/wiki/Computer_network

http://en.wikipedia.org/wiki/System

http://en.wikipedia.org/wiki/Server_(computing)

http://en.wikipedia.org/wiki/Computer

http://en.wikipedia.org/wiki/Redundancy_(engineering)

http://en.wikipedia.org/wiki/Computing

http://en.wikipedia.org/wiki/Data_storage_device

http://en.wikipedia.org/wiki/Fault-tolerance

http://en.wikipedia.org/wiki/Hardware

http://en.wikipedia.org/wiki/Software

decrease is proportional to the severity of the failure, as compared to a naïvely-designed system in which even a small failure can cause total breakdown. Fault-tolerance is particularly sought-after in high-availability or life-critical systems.

5. What is late binding of methods? Give an example of inheritance that illustrates the need for dynamic binding.

Connecting a method call(i.e. Function Call) to a method body(i.e. Function) is called binding. When binding is performed before the program is run (by the compiler and linker, if there is one), it’s called early binding. You might not have heard the term before because it has never been an option with procedural languages. C compilers have only one kind of method call, and that’s early binding. The other solution is called late binding, which means that the binding occurs at run time, based on the type of object. Late binding is also called dynamic binding or run-time binding. When a language implements late binding, there must be some mechanism to determine the type of the object at run time and to call the appropriate method. That is, the compiler still doesn’t know the object type, but the method-call mechanism finds out and calls the correct method body. The late-binding mechanism varies from language to language, but you can imagine that some sort of type information must be installed in the objects. All method binding in Java uses late binding unless the method is static or final (private methods are implicitly final). This means that ordinarily you don’t need to make any decisions about whether late binding will occur—it happens automatically. Reference Links - Here is a very nice article on Difference Between Early and late Binding The polymorphic assignment p := r is valid because of the above rule. If condition c is false, p will be attached to an object of type POLYGON for the computation of p . perimeter , which will thus use the polygon algorithm. In the opposite case, however, p will be attached to a rectangle; then the computation will use the version redefined for RECTANGLE . This is known as dynamic binding .

6. Describe the circumstances in which you would choose to use Embedded SQL rather than Standalone SQL or only a general-purpose programming language.

Embedded SQL refers to the use of standard SQL statements embedded within a procedural programming language SQL functions are primarily a mechanism for extending the power of SQL to handle attributes of complex data types (like images), or to perform complex



http://en.wikipedia.org/wiki/Life-critical_system

http://en.wikipedia.org/wiki/High-availability

http://en.wikipedia.org/wiki/High-availability

and non-standard operations. Embedded SQL is useful when imperative actions like displaying results and interacting with the user are needed. These cannot be done conveniently in an SQL only environment. Embedded SQL can be used instead of SQL functions by retrieving data and then performing the function's operations on the SQL result. However a drawback is that a lot of query-evaluation functionality may end up getting repeated in the host language codeWhen sorting data to be displayed in a subfile, the Open Query File (OPNQRYF) command is often used. This comes in handy when you are providing the user with the ability to change how he displays the data in the subfile. I agree that OPNQRYF is a powerful way to select and sort data before loading your subfile, but its real strength is sorting and selecting data in a batch environment. I have used it with subfile programs, and, althoughit’s a bit of a pain, it does work, especially if you use optimizing techniques. Another method for providing multiple sorting criteria in your subfiles is to use logical files. One problem with logical files is that you are required to maintain an access path for every sort criterion required by your users. Another problem is the complexity of your subfile load routine when using multiple logical files to provide the users with different looks at the data. That gets ugly.What if you had a tool that allowed you the same select and sort capabilities as logical files and OPNQRYF but did so in a flexible and efficient manner, while allowing you to produce a well-structured, easily maintainable program? Wouldn’t that be awesome? Well, with embedded SQL, that’s just what you get. SQL embedded into your RPG program allows you to sort and select the data as you want, but it also provides the user with some added power.What if multiple users from different departments are going to use your subfile program but, as is usually the case with users, they each want to see the data in a different order? You can certainly accomplish this by including enough logical files in your program to cover every possible sorting criterion, but this may require many logical files, especially if the user wants secondary and tertiary sorts. This may also create unnecessary access path administration for the system if all the logical files are created for this one application. It also makes for an ugly subfile build routine in your RPG program. Another method to allow dynamically sorting subfiles is to use OPNQRYF. You could create one open query for each possible sort or dynamically create the open query based on what the user wants to do. Either way, you would be passing parameters back and forth between the RPG and CL programs to either build the OPNQRYF command or select which specific OPNQRYF command to use. I’ve messed around with the latter method,



and the CL can get cumbersome if you try to give the users complete flexibility by allowing them to sort by more than one field at a time. I love users, but that’s a lot of work just to provide them with a different view of the same data.By embedding SQL, you can dynamically build your SELECT statement based on how the user wants to sort the data, all in one neat little subroutine. You can also use techniques that allow you to provide this capability very efficiently and, with a couple of minor changes, for data on other AS/400s. I’ll tell you how it’s done.

7. What is relational completeness? If a query language is relationally complete, can you write any desired query in that language?

Relational algebra is essentially equivalent in expressive power to relational calculus (and thus first-order logic); this result is known as Codd's theorem. Some care, however, has to be taken to avoid a mismatch that may arise between the two languages since negation, applied to a formula of the calculus, constructs a formula that may be true on an infinite set of possible tuples, while the difference operator of relational algebra always returns a finite result. To overcome these difficulties, Codd restricted the operands of relational algebra to finite relations only and also proposed restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based computer languages. Codd defined the term relational completeness to refer to a language that is complete with respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no adverse effect on the applicability of his relational algebra for database purposes.

8. What types of anomalies are found in relational database

There are several standard types of anomaly in a database. It doesn't need to be a relational database; the same anomalies are present in any database. A properly designed relational database specifically aims to eliminate these anomalies. If the database is not properly normalized, it is susceptible to insertion, update and deletion anomalies. The update anomaly occurs when the same data is stored in several records and a change has to be made. If only some of the records are updated, an update erors occurs - eg we could end up recording two teachers as giving the same class. If you are dealing with hundreds of records, this is quite likely to occur. If you wish to



http://en.wikipedia.org/wiki/Relation_(database)

http://en.wikipedia.org/wiki/Codd's_theorem

http://en.wikipedia.org/wiki/Relational_calculus

record some info in advance of needing it, such as the name of a teacher for a class that at present has no students enrolled, and the teacher name is only recorded next to each student rather than in a separate table, then with no student enrolled, you can't add the teacher info. This is an insertion anomaly. A deletion anomaly occurs not when you delete something by accident, but when deleting a record also removes the only instance of some other data. So if we recorded a class's details along with a student's, have only one student enrolled and then that student withdrew, we would be deleting the class info as well. Normalisation of a relational database is designed to overcome these problems



Education

Bt0066 database management system2