Jyrki-2002 03 25


  • 8/14/2019 Jyrki-2002 03 25


    Instructor office hours in N034

    Thursday 4.4, 13:00-16:00
    Tuesday 9.4, 13:00-16:00
    Thursday 11.4, 13:00-16:00
    Tuesday 16.4, 13:00-16:00
    Thursday 18.4, 13:00-16:00

    Plus you can still rely on our e-mail support.

    Note that there are no weekly exercises in weeks 13 (Easter) and 14.

    c Performance Engineering Laboratory 1

    Apache and MySQL on the same computer

    The IT department (EDB-afdelingen) has encouraged people to use their UID as the port number for both Apache and MySQL, which causes a conflict if you try to run both on the same machine.

    http://www.diku.dk/teaching/2002f/617/Forum/2002.03.18-18:30:57.html

    You should thank Martin Parm for solving all these kinds of problems for you. His time-saving packages are available via our homepage.


    Course book

    Raghu Ramakrishnan, Johannes Gehrke: Database Management Systems, 2nd edition, McGraw-Hill Higher Education (2000)

    "I don't like the book. Every time an argument would take more than three lines, they just write that it is intuitive."

    You will see the reason today. Some of the topics are difficult.


    My expectations

    Instead of all these discussions on organizational matters, I would like to have deep discussions on the scientific contents of our syllabus.

    Which of these two claims is correct?

    1. A key can be found in polynomial time [Ramakrishnan and Gehrke 2000, p. 443]

    2. Finding a key is shown to be NP-complete [Ramakrishnan and Gehrke 2000, p. 456]

    Is this claim correct?

    The existence of a polynomial algorithm for obtaining a lossless-join, dependency-preserving decomposition into 3NF is surprising


    Today's program

    Design theory

    In addition to our textbook, I looked at the following articles when preparing this lecture:

    Catriel Beeri and Philip A. Bernstein, Computational problems related to the design of normal form relational schemas, ACM Transactions on Database Systems 4 (1979), 30-59

    Jiann H. Jou and Patrick C. Fischer, The complexity of recognizing 3NF relation schemes, Information Processing Letters 14 (1982), 187-190

    Claudio L. Lucchesi and Sylvia L. Osborn, Candidate keys for relations, Journal of Computer and System Sciences 17 (1978), 270-279

    Don-Min Tsou and Patrick C. Fischer, Decomposition of a relation scheme into Boyce-Codd normal form, SIGACT News 14 (1982), 23-29

    Design Theory

    In this theory the goal is

    1. to define the desirable properties of relation schemas rigorously and

    2. to show how a relational schema with these desirable properties can be produced mechanically.

    So we are interested in automatic tools (algorithms) for obtaining a database schema with these properties.


    Functional dependencies (FDs)

    A functional dependency $X \to Y$ holds over relation $R$ if, for every allowable instance $r$ of $R$:

    $$\forall t_1 \in r,\ t_2 \in r:\quad \pi_X(t_1) = \pi_X(t_2) \implies \pi_Y(t_1) = \pi_Y(t_2),$$

    i.e., given two tuples in $r$, if the $X$ values agree, then the $Y$ values must also agree. ($X$ and $Y$ are sets of attributes.)

    An FD is a statement about all allowable relations.

    It must be identified based on the semantics of the application in question.

    Given some allowable instance $r$ of $R$, we can check if it violates some FD $f$, but we cannot tell if $f$ holds over $R$!
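The check "does this instance violate $f$?" is purely mechanical. Below is a minimal Python sketch, assuming an instance is a list of dicts from attribute names to values; the function name `violates` and the sample tuples are hypothetical. It can only ever refute an FD: a result of `False` says nothing about other instances of $R$.

```python
from itertools import combinations

def violates(instance, X, Y):
    """True if two tuples of `instance` agree on every attribute in X
    but differ on some attribute in Y (i.e. the instance violates X -> Y)."""
    for t1, t2 in combinations(instance, 2):
        agree_on_x = all(t1[a] == t2[a] for a in X)
        agree_on_y = all(t1[a] == t2[a] for a in Y)
        if agree_on_x and not agree_on_y:
            return True
    return False

# Hypothetical instance over attributes CITY, STREET, ZIP.
r = [
    {"CITY": "c1", "STREET": "s1", "ZIP": "z1"},
    {"CITY": "c1", "STREET": "s2", "ZIP": "z1"},
    {"CITY": "c2", "STREET": "s1", "ZIP": "z2"},
]
print(violates(r, ["ZIP"], ["CITY"]))     # False: no evidence against ZIP -> CITY
print(violates(r, ["CITY"], ["STREET"]))  # True: the first two tuples refute CITY -> STREET
```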


    Logical implication

    We say that $F$ logically implies $X \to Y$, written $F \models X \to Y$, if every relation $r$ for $R$ that satisfies the dependencies in $F$ also satisfies $X \to Y$.

    Theorem: Suppose $R$ is a relation schema and $A$, $B$, and $C$ are some of its attributes. Suppose also that the functional dependencies $A \to B$ and $B \to C$ are known to hold in $R$. Then $A \to C$ must hold in $R$.

    Proof: Suppose $r$ is a relation that satisfies $A \to B$ and $B \to C$, but there are two tuples $t_1$ and $t_2$ in $r$ that agree in the component for $A$ but disagree in $C$. Then we must ask whether $t_1$ and $t_2$ agree on attribute $B$. If not, then $r$ would violate $A \to B$. If they do agree on $B$, then, since they disagree on $C$, $r$ would violate $B \to C$. Either way we reach a contradiction, so no such pair exists. Hence, $r$ must satisfy $A \to C$.


    Closure of dependency sets

    We define $F^+$, the closure of $F$, to be the set of functional dependencies that are logically implied by $F$; i.e.,

    $$F^+ = \{X \to Y \mid F \models X \to Y\}.$$

    Example: Let $R = ABC$ and $F = \{A \to B, B \to C\}$. Then $F^+$ consists of all those dependencies $X \to Y$ such that either

    1. $X$ contains $A$, e.g., $ABC \to AB$, $AB \to BC$, or $A \to C$,

    2. $X$ contains $B$ but not $A$, and $Y$ does not contain $A$, e.g., $BC \to B$, $B \to C$, or $B \to \emptyset$, or

    3. $X \to Y$ is one of the three dependencies $C \to C$, $C \to \emptyset$, or $\emptyset \to \emptyset$.

    Can you prove this?
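The claimed description of $F^+$ can at least be sanity-checked by machine (evidence, not a proof). The sketch below relies on the standard fact that any FD $X \to Y$ not in $F^+$ is violated by some two-tuple relation whose tuples agree exactly on $X^+$, so enumerating pairs of tuples over the domain {0, 1} is enough. The helper names `subsets`, `pair_satisfies`, and `in_example` are hypothetical.

```python
from itertools import combinations, product

R = "ABC"
F = [(frozenset("A"), frozenset("B")),   # A -> B
     (frozenset("B"), frozenset("C"))]   # B -> C

def subsets(attrs):
    """All subsets of `attrs` as frozensets, smallest first."""
    return [frozenset(c) for k in range(len(attrs) + 1)
            for c in combinations(attrs, k)]

def pair_satisfies(t1, t2, X, Y):
    """True if the relation {t1, t2} satisfies X -> Y."""
    agree = lambda S: all(t1[a] == t2[a] for a in S)
    return not agree(X) or agree(Y)

# Every tuple over the domain {0, 1}, as a dict attribute -> value.
tuples = [dict(zip(R, values)) for values in product((0, 1), repeat=len(R))]

# Two-tuple relations satisfying all of F (FD violations are always pairwise).
models = [(t1, t2) for t1 in tuples for t2 in tuples
          if all(pair_satisfies(t1, t2, X, Y) for X, Y in F)]

# X -> Y is implied if no model pair violates it.
F_plus = {(X, Y) for X in subsets(R) for Y in subsets(R)
          if all(pair_satisfies(t1, t2, X, Y) for t1, t2 in models)}

def in_example(X, Y):
    """The three cases listed for this example."""
    if "A" in X:
        return True            # case 1
    if "B" in X:
        return "A" not in Y    # case 2
    return Y <= X              # case 3: X is {} or {C}, and Y is a subset of X

print(len(F_plus))  # 43
print(F_plus == {(X, Y) for X in subsets(R) for Y in subsets(R) if in_example(X, Y)})  # True
```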


    Keys of relations

    Key: If $R$ is a relation schema with attributes $A_1, A_2, \ldots, A_m$ and functional dependencies $F$, a set of attributes $K$ is a key of $R$ if:

    1. $K \to A_1 A_2 \cdots A_m$ is in $F^+$.

    2. For no proper subset of $K$ is (1) true.

    Superkey: If $K$ satisfies (1), then $K$ is a superkey (superset of a key).

    Prime: An attribute is prime if it is a member of at least one key.

    Example: Assume that the non-trivial functional dependencies in the schema with attributes CITY, STREET, and ZIP are $\text{CITY STREET} \to \text{ZIP}$ and $\text{ZIP} \to \text{CITY}$.

    One can check that {CITY, STREET} and {STREET, ZIP} are both keys.
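One way to do this check is by brute force: compute the closure of every subset of {CITY, STREET, ZIP} and keep the minimal superkeys. The sketch below assumes FDs are encoded as (left side, right side) pairs of frozensets; `closure` is a naive fixpoint and `keys` is a hypothetical helper that is exponential in the number of attributes.

```python
from itertools import combinations

U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),  # CITY STREET -> ZIP
     (frozenset({"ZIP"}), frozenset({"CITY"}))]            # ZIP -> CITY

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def keys(U, F):
    """Superkeys with no proper subset that is also a superkey (brute force)."""
    superkeys = [frozenset(c) for k in range(len(U) + 1)
                 for c in combinations(sorted(U), k) if closure(c, F) == U]
    return [K for K in superkeys if not any(S < K for S in superkeys)]

print(sorted(sorted(K) for K in keys(U, F)))
# [['CITY', 'STREET'], ['STREET', 'ZIP']]
```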


    Armstrong's axioms

    To determine keys, and to understand logical implications, we need to compute $F^+$ from $F$, or at least to tell, given $F$ and a functional dependency $X \to Y$, whether $X \to Y$ is in $F^+$. To do so we need a set of inference rules that can be used to generate all the other implied dependencies.

    The following set of rules, often called Armstrong's axioms, are sound and complete inference rules for functional dependencies: every dependency they derive is in $F^+$, and every dependency in $F^+$ can be derived with them. Let $X$, $Y$, and $Z$ be sets of attributes.

    Reflexivity: If $Y \subseteq X$, then $X \to Y$.

    Augmentation: If $X \to Y$, then $XZ \to YZ$ for any $Z$.

    Transitivity: If $X \to Y$ and $Y \to Z$, then $X \to Z$.
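Applying the three axioms until nothing new appears generates exactly $F^+$; that is what soundness and completeness promise. The sketch below does this for the earlier example $R = ABC$, $F = \{A \to B, B \to C\}$. The helper names `subsets` and `armstrong_closure` are hypothetical, and the approach is exponential, so it is only meant for toy schemas.

```python
from itertools import combinations

R = frozenset("ABC")
F = {(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))}

def subsets(attrs):
    """All subsets of `attrs` as frozensets."""
    attrs = sorted(attrs)
    return [frozenset(c) for k in range(len(attrs) + 1)
            for c in combinations(attrs, k)]

def armstrong_closure(R, F):
    """Close F under reflexivity, augmentation and transitivity."""
    derived = set(F)
    # Reflexivity: Y subset of X gives X -> Y.
    derived |= {(X, Y) for X in subsets(R) for Y in subsets(X)}
    while True:
        new = set()
        for X, Y in derived:
            # Augmentation: X -> Y gives XZ -> YZ.
            for Z in subsets(R):
                new.add((X | Z, Y | Z))
            # Transitivity: X -> Y and Y -> W give X -> W.
            for Y2, W in derived:
                if Y2 == Y:
                    new.add((X, W))
        if new <= derived:   # fixpoint: nothing new was derived
            return derived
        derived |= new

F_plus = armstrong_closure(R, F)
print(len(F_plus))  # 43
```

The count agrees with the three cases listed for this example.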


    Reasoning about FDs

    A couple of additional rules (which follow from Armstrong's axioms):

    Union: If $X \to Y$ and $X \to Z$, then $X \to YZ$.

    Decomposition: If $X \to YZ$, then $X \to Y$ and $X \to Z$.

    Example: Contracts(cid, sid, jid, did, pid, qty, value)

    C is a key: $C \to CSJDPQV$

    A project purchases each part using a single contract: $JP \to C$

    A department purchases at most one part from a supplier: $SD \to P$

    $JP \to C$ and $C \to CSJDPQV$ imply $JP \to CSJDPQV$

    $SD \to P$ implies $SDJ \to JP$

    $SDJ \to JP$ and $JP \to CSJDPQV$ imply $SDJ \to CSJDPQV$
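These derivations can be cross-checked by computing which attributes each left side determines: $X \to Y$ follows from $F$ exactly when $Y$ lies in that set. The sketch below uses the single-letter attribute names of the example and a naive fixpoint `closure` helper (a hypothetical name).

```python
U = frozenset("CSJDPQV")
F = [(frozenset("C"), frozenset("CSJDPQV")),  # C is a key
     (frozenset("JP"), frozenset("C")),       # JP -> C
     (frozenset("SD"), frozenset("P"))]       # SD -> P

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

print(closure(frozenset("JP"), F) == U)     # True: JP -> CSJDPQV
print(closure(frozenset("SDJ"), F) == U)    # True: SDJ -> CSJDPQV
print(sorted(closure(frozenset("SD"), F)))  # ['D', 'P', 'S']: SD alone is not a superkey
```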


    Fourth normal form

    4NF: Remove non-trivial multi-valued dependencies. All multi-valued dependencies must be functional dependencies.

    [Carter 2000, Fig. 2.31] [Carter 2000, Fig. 2.32]


    Fifth normal form

    5NF: If, even after 4NF, you can still find a way of splitting the schema apart, other than the trivial decompositions, without losing data, then do it.

    Example: A customer schema with attributes {c_no, sname, address, balance} could be non-loss decomposed into schemas with attributes {c_no, sname}, {c_no, address}, and {c_no, balance}. There would be no point in doing this, though.

    There is no easy method of testing whether a schema is in 5NF other than trying every possible split (apart from the simple decompositions mentioned above) and showing that data would be lost in each case.


    Summary of normal forms

    Form    Action
    1NF     Remove repeating groups
    2NF     Remove partial dependencies
    3NF     Remove non-key dependencies (only Case 1)
    BCNF    Remove non-key dependencies (also Case 2)
    4NF     Remove multi-valued dependencies
    5NF     No non-trivial non-loss decompositions exist

    [Ramakrishnan and Gehrke 2000, Fig. 15.9] [Ramakrishnan and Gehrke 2000, Fig. 15.10]


    Attribute closure

    Computing the closure of a set of FDs can be expensive. (Its size can be exponential in the number of attributes!)

    Typically, we just want to check if a given FD $X \to Y$ is in the closure of a set of FDs $F$. An efficient check exists:

    1. Compute the attribute closure of $X$, denoted $X^+$, with respect to $F$, which is the set of all attributes $A$ such that $X \to A$ is in $F^+$.

    2. Check if $Y$ is a subset of $X^+$.

    Example: Does $F = \{A \to B, B \to C, CD \to E\}$ imply $A \to E$? That is, is $A \to E$ in the closure $F^+$? Equivalently, is $E$ in $A^+$?
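Working the example by hand, applying in each round only those FDs whose left side was already inside the previous round's set:

```latex
\begin{aligned}
A^{+}_{0} &= \{A\} \\
A^{+}_{1} &= \{A, B\}    && \text{(apply } A \to B\text{)} \\
A^{+}_{2} &= \{A, B, C\} && \text{(apply } B \to C\text{)} \\
A^{+}_{3} &= A^{+}_{2}   && (CD \to E \text{ never fires, since } D \notin A^{+}_{2})
\end{aligned}
```

So $A^+ = \{A, B, C\}$, $E \notin A^+$, and $F$ does not imply $A \to E$.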


    Computing the attribute closure

    Input: A set of attributes $X$ and a set of functional dependencies $F$. Let $U$ denote the set of attributes appearing in these.

    Output: The attribute closure of $X$ with respect to $F$.

    Attribute-Closure(X, F)
        i ← 0
        closure_0 ← X
        do
            i ← i + 1
            closure_i ← closure_{i-1}
            for each Y → Z in F
                if Y ⊆ closure_{i-1}
                    closure_i ← closure_i ∪ Z
        while closure_i ≠ closure_{i-1}
        return closure_i
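A direct Python transcription of Attribute-Closure, as a sketch: FDs are assumed to be (Y, Z) pairs of frozensets, and the name `attribute_closure` is hypothetical. Like the pseudocode, each round only applies FDs whose left side is contained in the previous round's closure. The usage lines answer the earlier example.

```python
def attribute_closure(X, F):
    """Attribute closure of X under F, mirroring Attribute-Closure(X, F)."""
    previous = frozenset(X)          # closure_{i-1}
    while True:
        current = set(previous)      # closure_i starts as closure_{i-1}
        for Y, Z in F:
            if Y <= previous:        # Y is a subset of closure_{i-1}
                current |= Z         # closure_i gets closure_i union Z
        current = frozenset(current)
        if current == previous:      # no change: fixpoint reached
            return current
        previous = current

F = [(frozenset("A"), frozenset("B")),   # A -> B
     (frozenset("B"), frozenset("C")),   # B -> C
     (frozenset("CD"), frozenset("E"))]  # CD -> E
A_plus = attribute_closure({"A"}, F)
print(sorted(A_plus))  # ['A', 'B', 'C']
print("E" in A_plus)   # False, so F does not imply A -> E
```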


    Correctness

    Theorem: The closure algorithm correctly decides whether or not an FD $A_1 A_2 \cdots A_k \to B$ follows from a given set of FDs $F$.

    Proof: There are two parts:

    1. We must show that if $A_1 A_2 \cdots A_k \to B$ is asserted by the closure test, i.e., $B \in \{A_1, A_2, \ldots, A_k\}^+$, then $A_1 A_2 \cdots A_k \to B$ holds in any relation that satisfies all the FDs in $F$.

    This part can be proved by induction.

    2. We must show that the closure algorithm does not fail to discover an FD that truly follows from the set of FDs $F$.

    This part can be proved by contradiction: if $B \notin \{A_1, \ldots, A_k\}^+$, then a two-tuple relation whose tuples agree exactly on the attributes of $\{A_1, \ldots, A_k\}^+$ satisfies every FD in $F$ but violates $A_1 A_2 \cdots A_k \to B$.


    Complexity

    Assume that $|U| = m$ and $|F| = n$. Furthermore, assume that the attributes are represented as numbers that can be stored in a machine word. Hence, their manipulation takes $O(1)$ time.

    Let us represent each set of attributes as a balanced binary search tree.

    $Y \subseteq Z$: This can be implemented in $O(m)$ time.

    $Y \cup Z$: This requires at most $O(|Z| \log_2 m)$ time.

    The running time of the closure algorithm is bounded by $O(m \log_2 m + m^2 n)$. Can you improve upon this?
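One known answer to the closing question is a counter-based closure that runs in time linear in the total size of $X$ and $F$; it is usually attributed to Beeri and Bernstein (1979), listed among the lecture's sources. The sketch below follows that idea under two assumptions: Python hash sets stand in for the search trees above (expected $O(1)$ membership), and the names `linear_closure`, `missing`, and `uses` are hypothetical.

```python
from collections import defaultdict, deque

def linear_closure(X, F):
    """Attribute closure with one counter per FD: the FD fires once all
    attributes of its left side are known to be in the closure."""
    missing = [len(set(Y)) for Y, _ in F]   # left-side attributes not yet in the closure
    uses = defaultdict(list)                # attribute -> FDs whose left side contains it
    for i, (Y, _) in enumerate(F):
        for a in set(Y):
            uses[a].append(i)

    result = set(X)
    queue = deque(result)                   # attributes whose effect is not yet propagated

    def fire(i):
        for b in F[i][1]:
            if b not in result:
                result.add(b)
                queue.append(b)

    for i in range(len(F)):                 # FDs with an empty left side fire at once
        if missing[i] == 0:
            fire(i)
    while queue:
        a = queue.popleft()
        for i in uses[a]:
            missing[i] -= 1
            if missing[i] == 0:
                fire(i)
    return frozenset(result)

F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("CD"), frozenset("E"))]
print(sorted(linear_closure({"A"}, F)))       # ['A', 'B', 'C']
print(sorted(linear_closure({"A", "D"}, F)))  # ['A', 'B', 'C', 'D', 'E']
```

Each attribute enters the queue at most once, and each occurrence of an attribute on a left side decrements a counter at most once, which is where the linear bound comes from.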


    Is X a key?

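    Let $\langle U, F\rangle$ be a dependency system. For a superkey $K$ of $\langle U, F\rangle$, an attribute $A$ of $K$ is essential to $K$ if $K \setminus \{A\}$ is not a superkey for $\langle U, F\rangle$.

    1. Check whether the attribute closure of $X$ with respect to $F$ is $U$. If not, $X$ is not a key.

    2. For each attribute $A$ of $X$, check that it is essential to $X$. (Single-attribute removals suffice: every proper subset of $X$ lies inside some $X \setminus \{A\}$, and a subset of a non-superkey is not a superkey.)

    In total, the attribute closure algorithm is called $|X| + 1$ times. That is, this problem can be solved in polynomial time.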
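The two-step test translates directly. A minimal sketch, assuming the CITY/STREET/ZIP dependencies from earlier, FDs as frozenset pairs, and a naive `closure` helper (hypothetical names throughout); it makes $|X| + 1$ closure calls.

```python
U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_key(X, U, F):
    """Step 1: X must determine U. Step 2: every attribute of X is essential."""
    X = frozenset(X)
    if closure(X, F) != U:
        return False
    return all(closure(X - {A}, F) != U for A in X)

print(is_key({"CITY", "STREET"}, U, F))         # True
print(is_key({"CITY", "STREET", "ZIP"}, U, F))  # False: superkey, but not minimal
print(is_key({"ZIP"}, U, F))                    # False: not a superkey
```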


    Find one key

    1. Let $K$ denote a key candidate. Initially, $K \leftarrow U$.

    2. For each attribute in $K$, check whether it is essential to $K$ or not. If a non-essential attribute $A$ is found, let $K \leftarrow K \setminus \{A\}$ and repeat. Otherwise stop and report $K$ as a key.

    In total, the attribute closure algorithm is called at most $O(m^2)$ times, so the problem can be solved in polynomial time.
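A sketch of the shrinking procedure, under the same assumptions as before (FDs as frozenset pairs, hypothetical helper names). Attributes are tried in sorted order, so which key is reported depends on that order.

```python
U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def find_one_key(U, F):
    """Start from K = U and drop non-essential attributes until none remain."""
    K = set(U)
    removed = True
    while removed:
        removed = False
        for A in sorted(K):
            if closure(K - {A}, F) == U:  # A is not essential to K
                K.remove(A)
                removed = True
                break                     # "repeat" with the smaller K
    return frozenset(K)

print(sorted(find_one_key(U, F)))  # ['STREET', 'ZIP']
```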


    Find all keys

    It is known that the number of keys can be exponential in $m$ (the number of attributes) and factorial in $n$ (the number of FDs). That is, finding the set of all keys is computationally infeasible in the worst case, since the output alone can be that large.

    An output-sensitive algorithm is known which computes the set of keys in time polynomial in the size of the input plus the size of the output. If the number of keys is small, this algorithm can be used.
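The output-sensitive algorithm usually cited here is that of Lucchesi and Osborn (1978), listed among the lecture's sources. Its core step: for a known key $K$ and an FD $X \to Y$, the set $X \cup (K \setminus Y)$ is a superkey (it determines $Y$ and already contains the rest of $K$); if it contains no known key, shrink it to a new one. The sketch below follows that idea as an illustration, not as the paper's own formulation; helper names are hypothetical.

```python
U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def minimize(S, U, F):
    """Shrink superkey S to a key; one pass suffices since essentiality is inherited."""
    K = set(S)
    for A in sorted(S):
        if closure(K - {A}, F) == U:
            K.remove(A)
    return frozenset(K)

def all_keys(U, F):
    keys = [minimize(U, U, F)]
    i = 0
    while i < len(keys):              # keys found later are processed too
        K = keys[i]
        for lhs, rhs in F:
            S = lhs | (K - rhs)       # a superkey built from K and lhs -> rhs
            if not any(k <= S for k in keys):
                keys.append(minimize(S, U, F))
        i += 1
    return keys

print(sorted(sorted(k) for k in all_keys(U, F)))
# [['CITY', 'STREET'], ['STREET', 'ZIP']]
```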


    The key of cardinality $k$ problem

    In this problem we are asked to determine whether there exists a key with fewer than $k$ attributes or not.

    Theorem: The key of cardinality $k$ problem is NP-complete.

    Proof: This can be proved by showing that, e.g., the vertex cover problem is polynomially transformable into the key of cardinality $k$ problem.
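The transformation is not spelled out here. One construction that works, offered as an illustrative assumption rather than the one from the literature: create an attribute for every vertex and every edge, add $u \to e$ and $v \to e$ for each edge $e = \{u, v\}$, and add one FD from all edge attributes to all vertex attributes. A vertex cover then determines every edge and hence everything; conversely, an edge attribute in a key can be swapped for one of its endpoints without growing the key. So the smallest key and the smallest vertex cover have the same size. The sketch checks this by brute force on small graphs; all names are hypothetical.

```python
from itertools import combinations

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def vertex_cover_to_fds(vertices, edges):
    """Hypothetical construction: one attribute per vertex and per edge."""
    edge_attrs = [f"e_{u}{v}" for u, v in edges]
    U = frozenset(vertices) | frozenset(edge_attrs)
    F = []
    for (u, v), e in zip(edges, edge_attrs):
        F.append((frozenset({u}), frozenset({e})))          # u -> e
        F.append((frozenset({v}), frozenset({e})))          # v -> e
    F.append((frozenset(edge_attrs), frozenset(vertices)))  # all edges -> all vertices
    return U, F

def smallest_key_size(U, F):
    """Size of a smallest superkey (a smallest superkey is always a key)."""
    for k in range(len(U) + 1):
        if any(closure(c, F) == U for c in combinations(sorted(U), k)):
            return k

def smallest_cover_size(vertices, edges):
    for k in range(len(vertices) + 1):
        if any(all(u in c or v in c for u, v in edges)
               for c in combinations(vertices, k)):
            return k

triangle = ("abc", [("a", "b"), ("b", "c"), ("a", "c")])
path = ("abc", [("a", "b"), ("b", "c")])
for vertices, edges in (triangle, path):
    print(smallest_key_size(*vertex_cover_to_fds(vertices, edges)),
          smallest_cover_size(vertices, edges))
# 2 2
# 1 1
```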


    The prime attribute problem

    In this problem we are given a set $U$ of attributes, a set $F$ of FDs, and one attribute $A$, and the task is to decide whether or not $A$ is prime relative to the dependency system $\langle U, F\rangle$.

    Theorem: The prime attribute problem is NP-complete.

    Proof: Clearly the problem lies in NP (a key containing $A$ is a certificate that can be verified in polynomial time). Hence, it is sufficient to prove that the key of cardinality $k$ problem is polynomially transformable into the prime attribute problem.
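Without a polynomial algorithm to hand, primeness can always be decided by exhaustive search over all subsets of $U$, which is exponential in $m$. The sketch below does this (hypothetical helper names, FDs as frozenset pairs); each key it finds is exactly the kind of certificate the NP membership argument relies on.

```python
from itertools import combinations

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def prime_attributes(U, F):
    """Union of all keys, found by trying every subset of U."""
    primes = set()
    for k in range(len(U) + 1):
        for c in combinations(sorted(U), k):
            K = frozenset(c)
            if closure(K, F) == U and all(closure(K - {A}, F) != U for A in K):
                primes |= K   # K is a key, so all its attributes are prime
    return frozenset(primes)

U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]
print(sorted(prime_attributes(U, F)))  # ['CITY', 'STREET', 'ZIP']
```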


    Is a schema in 3NF?

    Theorem: The problem of deciding whether or not a relation schema is in 3NF under a set of FDs is NP-complete.

    Proof: A reduction of primeness to 3NF.
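Given the prime attributes, the 3NF test itself is cheap; the hard part is primeness. The sketch below assumes the standard result that it suffices to examine the FDs in $F$ (each right-side attribute separately) and uses a brute-force prime computation, so it is exponential overall. Names are hypothetical.

```python
from itertools import combinations

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def prime_attributes(U, F):
    """Union of all keys (exhaustive search, exponential in |U|)."""
    primes = set()
    for k in range(len(U) + 1):
        for c in combinations(sorted(U), k):
            K = frozenset(c)
            if closure(K, F) == U and all(closure(K - {A}, F) != U for A in K):
                primes |= K
    return frozenset(primes)

def is_3nf(U, F):
    """Every X -> A in F needs A in X, X a superkey, or A prime."""
    primes = prime_attributes(U, F)
    for lhs, rhs in F:
        if closure(lhs, F) == U:   # left side is a superkey
            continue
        if any(A not in lhs and A not in primes for A in rhs):
            return False
    return True

U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]
chain = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(is_3nf(U, F))                     # True: ZIP -> CITY is allowed since CITY is prime
print(is_3nf(frozenset("ABC"), chain))  # False: B -> C, and C is not prime
```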


    Is a schema in BCNF?

    Lemma: Let $U$ be a set of attributes and $F$ a set of FDs over $U$. Then the following two statements are equivalent:

    (a) $U$ is in BCNF under $F$.

    (b) For every non-trivial $f \in F$, the left side of $f$ is a superkey.

    Proof: Not difficult.

    Theorem: One can decide whether or not a relation schema is in BCNF under a set $F$ of FDs in polynomial time.

    Proof: Simply check whether the left side of every non-trivial FD in $F$ is a superkey. That is, the attribute closure algorithm has to be called at most $|F|$ times.
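Per the lemma, the BCNF test is one closure computation per FD. A minimal sketch with hypothetical names and FDs as frozenset pairs; on the CITY/STREET/ZIP schema it fails because of $\text{ZIP} \to \text{CITY}$.

```python
def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def is_bcnf(U, F):
    """Every non-trivial FD in F must have a superkey as its left side."""
    return all(rhs <= lhs or closure(lhs, F) == U for lhs, rhs in F)

U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]
print(is_bcnf(U, F))                                         # False
print(is_bcnf(frozenset("AB"), [(frozenset("A"), frozenset("B"))]))  # True
```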


    Lossless-join decomposition into BCNF

    Theorem: Given a set $U$ of attributes and a set $F$ of FDs over $U$, it is possible to produce, in polynomial time, a lossless-join decomposition of $U$ such that every set in the decomposition is in BCNF under $F$.

    The algorithm is complicated. The exponential algorithm using $F^+$ is conceptually simpler.
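The conceptually simpler exponential approach can be sketched as follows: inside a schema $S$, look for a set $X$ whose closure restricted to $S$ adds attributes but is not all of $S$; split $S$ into $X^+ \cap S$ and $X \cup (S \setminus X^+)$, which share exactly $X$ and therefore join losslessly, and recurse. Trying every subset $X$ is what makes it exponential. This is an illustration with hypothetical names, not the polynomial algorithm of the theorem.

```python
from itertools import combinations

def closure(X, F):
    """Attributes determined by X under F (naive fixpoint iteration)."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def bcnf_decompose(S, F):
    """Recursively split S until no part has a BCNF violation."""
    S = frozenset(S)
    for k in range(len(S)):                  # candidate left sides: proper subsets of S
        for c in combinations(sorted(S), k):
            X = frozenset(c)
            X_plus = closure(X, F) & S       # what X determines inside S
            if X_plus != X and X_plus != S:  # non-trivial, and X is not a superkey of S
                R1 = X_plus                  # X -> R1 holds
                R2 = X | (S - X_plus)        # R1 and R2 share exactly X: lossless join
                return bcnf_decompose(R1, F) + bcnf_decompose(R2, F)
    return [S]

U = frozenset({"CITY", "STREET", "ZIP"})
F = [(frozenset({"CITY", "STREET"}), frozenset({"ZIP"})),
     (frozenset({"ZIP"}), frozenset({"CITY"}))]
print(sorted(sorted(p) for p in bcnf_decompose(U, F)))
# [['CITY', 'ZIP'], ['STREET', 'ZIP']]
```

On CITY/STREET/ZIP the result is {CITY, ZIP} and {STREET, ZIP}; the dependency CITY STREET → ZIP can then no longer be checked within a single part.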


    Dependency-preserving decomposition into BCNF

    Theorem: Let $F$ be a set of FDs over a set of attributes $U$. The problem of deciding whether there is a dependency-preserving decomposition of $U$ into BCNF under $F$ is NP-hard.

    Proof: I have not looked at this.
