QueryOptimization_Siao


  • 8/6/2019 QueryOptimization_Siao


    Query Optimization

    CS 157B

    Ch. 14

    Mien Siao


    Outline

    - Introduction
    - Steps in Cost-based Query Optimization
    - Query Flow
    - Projection Example
    - Query Interaction in DBMS
    - Cost-based Query Optimization: Algebraic Expressions


    Introduction

    What is Query Optimization?

    Suppose you were given a chance to visit 15 different pre-selected cities in Europe. The only constraint would be time.

    -> Would you have a plan to visit the cities in any order?


    Plan:

    -> Place the 15 cities in different groups based on their proximity to each other.
    -> Start with one group and move on to the next group.

    The important point here is that you would visit the cities in a more organized manner, and the time constraint mentioned earlier would be dealt with efficiently.


    Starting with System R, most commercial DBMSs use cost-based optimizers.

    The cost estimation should be accurate and easy to compute. It also needs to be logically consistent, so that the least-cost plan is consistently estimated as the lowest.


    Steps in a Cost-based Query Optimization

    1. Parsing
    2. Transformation
    3. Implementation
    4. Plan selection based on cost estimates


    Query Flow

    SQL -> Parser -> Optimizer -> Code Generator/Interpreter -> Processor


    - Query Parser: Verifies the validity of the SQL statement and translates the query into an internal structure using relational calculus.
    - Query Optimizer: Finds the best expression among the various equivalent algebraic expressions. The criterion used is cheapness.
    - Code Generator/Interpreter: Makes calls for the Query Processor as a result of the work done by the optimizer.
    - Query Processor: Executes the calls obtained from the code generator.


    The cost of a physical plan includes processor time and communication time. The most important factor to consider is the number of disk I/Os, because disk access is the most time-consuming action.

    Some other associated costs are:
    - Operations (joins, unions, intersections).
    - The order of operations. Why? Joins, unions, and intersections are associative and commutative, so the same result can be computed in many different orders, each with its own cost.
    - How arguments are stored and passed between operations.

    The factors mentioned above should be limited and minimized when creating the best physical plan.
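
    To make the point about the order of operations concrete, here is a minimal Python sketch that costs every left-deep order of a three-way join. Everything in it is an assumption for illustration: the relation names R, S, T, their cardinalities, the join selectivities, and the cost model (total size of the intermediate results) are not from the slides.

```python
from itertools import permutations

# Hypothetical cardinalities (tuples per relation); not from the slides.
CARD = {"R": 1000, "S": 100, "T": 10}

# Hypothetical join selectivities: the fraction of the cross product that
# survives the join. 1.0 means no join predicate (a plain cross product).
SEL = {
    frozenset(("R", "S")): 0.01,
    frozenset(("S", "T")): 0.1,
    frozenset(("R", "T")): 1.0,
}


def result_size(relations):
    """Estimated number of tuples after joining the given relations."""
    size = 1.0
    for i, rel in enumerate(relations):
        size *= CARD[rel]
        for earlier in relations[:i]:
            size *= SEL[frozenset((rel, earlier))]
    return size


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order.

    The final result is excluded because it is the same for every order.
    """
    return sum(result_size(order[:k]) for k in range(2, len(order)))


costs = {order: plan_cost(order) for order in permutations(CARD)}
best_order = min(costs, key=costs.get)

for order, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(" JOIN ".join(order), "-> intermediate tuples:", round(cost))
```

    Under these made-up numbers, joining S and T first keeps the intermediate result small, while starting with R and T builds a large cross product. All six orders produce the same final result, which is exactly why the optimizer is free to choose among them.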


    Projection Example

    We can fit 5 tuples into 1 block:
    - 5 tuples * 190 bytes/tuple = 950 bytes can fit into 1 block
    - For 20,000 tuples, we would require 4,000 blocks (20,000 tuples / 5 tuples per block = 4,000 blocks)

    With a projection resulting in elimination of column c (150 bytes), we could estimate that each tuple would decrease to 40 bytes (190 - 150 bytes).

    Now, the new estimate will be 25 tuples in 1 block:
    - 25 tuples * 40 bytes/tuple = 1,000 bytes will be able to fit into 1 block
    - With 20,000 tuples, the new estimate is 800 blocks (20,000 tuples / 25 tuples per block = 800 blocks)

    The result is a reduction by a factor of 5.
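
    The block arithmetic above can be checked with a short Python sketch. The block size is an assumption: the slides only show that 950 bytes and 1,000 bytes each fit in one block, which is consistent with 1,024-byte blocks. The helper name blocks_needed is hypothetical.

```python
# Assumption: 1,024-byte blocks. The slides do not state the block size;
# they only show that 950 and 1,000 bytes each fit in a single block.
BLOCK_SIZE = 1024


def blocks_needed(num_tuples: int, tuple_bytes: int,
                  block_bytes: int = BLOCK_SIZE) -> int:
    """Blocks required when a tuple is never split across two blocks."""
    tuples_per_block = block_bytes // tuple_bytes
    # Ceiling division: a partially filled last block still counts.
    return -(-num_tuples // tuples_per_block)


blocks_before = blocks_needed(20_000, 190)        # full 190-byte tuples
blocks_after = blocks_needed(20_000, 190 - 150)   # column c projected away

print(blocks_before, blocks_after, blocks_before / blocks_after)
```

    With these inputs the sketch reproduces the figures from the example: 4,000 blocks before the projection, 800 after, a factor of 5.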


    Query Interaction in DBMS

    How does a query interact with a DBMS?

    - Interactive users
    - Embedded queries in programs written in C, C++, etc.

    What is the difference between these two?


    Interactive Users:

    - When there is an interactive user query, the query goes through the Query Parser, Query Optimizer, Code Generator, and Query Processor each time.


    - In an embedded query, the calls generated by the code generator are stored in the database. Each time the query is reached within the program at run-time, the Query Processor invokes the stored calls in the database.
    - Optimization is independent of execution in embedded queries: it is done when the calls are generated, not each time the query runs.
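
    The difference between the two modes can be illustrated with a toy Python model that only counts how often each stage runs. The class ToyDBMS, its method names, and the stage bodies are all hypothetical placeholders; this is a sketch of the idea, not how any particular DBMS implements it.

```python
class ToyDBMS:
    """Counts stage invocations; the stages themselves do no real work."""

    def __init__(self):
        self.stage_runs = {"parse": 0, "optimize": 0, "generate": 0, "execute": 0}
        self.stored_calls = {}  # stands in for calls "stored in the database"

    def _compile(self, sql):
        # Parser -> Optimizer -> Code Generator
        for stage in ("parse", "optimize", "generate"):
            self.stage_runs[stage] += 1
        return f"calls for: {sql}"

    def _execute(self, calls):
        # Query Processor
        self.stage_runs["execute"] += 1
        return calls

    def run_interactive(self, sql):
        """Interactive query: every stage runs on every submission."""
        return self._execute(self._compile(sql))

    def precompile(self, query_id, sql):
        """Embedded query: generate the calls once and store them."""
        self.stored_calls[query_id] = self._compile(sql)

    def run_embedded(self, query_id):
        """Embedded query at run-time: only the Query Processor runs."""
        return self._execute(self.stored_calls[query_id])


SQL = "SELECT p.pname FROM Patients p"

interactive = ToyDBMS()
for _ in range(3):
    interactive.run_interactive(SQL)

embedded = ToyDBMS()
embedded.precompile("q1", SQL)
for _ in range(3):
    embedded.run_embedded("q1")

print("interactive:", interactive.stage_runs)
print("embedded:   ", embedded.stage_runs)
```

    In this model the interactive path runs the optimizer three times for three executions, while the embedded path runs it once and then reuses the stored calls.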


    Cost-based query Optimization: Algebraic Expressions

    If we had the following query:

    SELECT p.pname, d.dname
    FROM Patients p, Doctors d
    WHERE p.doctor = d.dname
    AND d.dgender = 'M'


    projection
      filter
        join
          Scan(Patients)
          Scan(Doctors)
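
    Read bottom-up, this operator tree corresponds to a single relational algebra expression. The notation below is the standard one (projection, selection, join); the subscripts are taken from the SQL query, with the join condition p.doctor = d.dname and the filter d.dgender = 'M'.

```latex
\pi_{pname,\ dname}\Bigl(
  \sigma_{dgender = \text{'M'}}\bigl(
    Patients \bowtie_{doctor = dname} Doctors
  \bigr)
\Bigr)
```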


    Cost-based query Optimization: Transformation

    Plan 1:

    projection
      filter
        join
          Scan(Patients)
          Scan(Doctors)

    Plan 2:

    projection
      join
        Scan(Patients)
        filter
          Scan(Doctors)
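
    A small Python sketch can show why the transformation is safe. It reads the second tree as applying the filter to Doctors before the join, and evaluates both plans on toy data. The sample rows and the helper names (join, select_male, project) are made up for illustration.

```python
# Toy data; the rows are invented for illustration only.
patients = [
    {"pname": "Ann", "doctor": "Lee"},
    {"pname": "Bob", "doctor": "Kim"},
    {"pname": "Cy", "doctor": "Lee"},
]
doctors = [
    {"dname": "Lee", "dgender": "M"},
    {"dname": "Kim", "dgender": "F"},
]


def join(left, right):
    """Nested-loop join on p.doctor = d.dname."""
    return [{**p, **d} for p in left for d in right if p["doctor"] == d["dname"]]


def select_male(rows):
    """Filter: d.dgender = 'M'."""
    return [row for row in rows if row["dgender"] == "M"]


def project(rows):
    """Projection onto (pname, dname)."""
    return [(row["pname"], row["dname"]) for row in rows]


# First plan: join everything, then filter the joined rows.
joined = join(patients, doctors)
plan1_result = project(select_male(joined))

# Second plan: filter Doctors first, then join the smaller input.
male_doctors = select_male(doctors)
plan2_result = project(join(patients, male_doctors))

print(plan1_result == plan2_result)
print("rows filtered after join:", len(joined))
print("doctor rows entering join:", len(male_doctors))
```

    Both plans return the same rows, but the second one feeds fewer Doctors tuples into the join. On real tables that difference shows up as fewer disk I/Os.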


    Cost-based query Optimization: Implementation

    Plan 1:

    projection
      filter
        natural join
          Scan(Patients)
          Scan(Doctors)

    Plan 2:

    projection
      hash join
        Scan(Patients)
        filter
          Scan(Doctors)


    Cost-based query Optimization: Plan selection based on costs

    Plan 1 (estimated cost = 100 ms):

    projection
      filter
        natural join
          Scan(Patients)
          Scan(Doctors)

    Plan 2 (estimated cost = 50 ms):

    projection
      hash join
        Scan(Patients)
        filter
          Scan(Doctors)
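
    The final step, picking the cheapest candidate, reduces to taking a minimum over the estimated costs. The Python sketch below assumes the two estimates belong to the two plans in the order shown (100 ms for the filter-after-join plan, 50 ms for the hash-join plan); the PhysicalPlan class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalPlan:
    description: str
    estimated_cost_ms: float


# Assumption: the estimates are paired with the plans in slide order.
candidates = [
    PhysicalPlan(
        "projection(filter(natural_join(Scan(Patients), Scan(Doctors))))", 100
    ),
    PhysicalPlan(
        "projection(hash_join(Scan(Patients), filter(Scan(Doctors))))", 50
    ),
]

# Plan selection: keep the candidate with the lowest estimated cost.
chosen = min(candidates, key=lambda plan: plan.estimated_cost_ms)
print(chosen.description, chosen.estimated_cost_ms)
```

    The choice is only as good as the estimates behind it, which is why the cost estimation needs to be accurate and consistent.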