Join Strategies

  • 5/24/2018 Join Stratagies

    Merge Join

    --------------------

    A merge join requires that the rows to be joined be present on the same AMP. If they are not, Teradata either redistributes the rows or duplicates them in spool to make that happen, based on the row hash of the columns used in the join's WHERE clause.

    If the two tables are joined on columns that form the primary index of both tables, the matching rows are already on the same AMP and no redistribution is required.
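
As a rough illustration of why a shared primary index keeps matching rows together, the sketch below assigns rows to AMPs by hashing the primary index value. The hash function (`zlib.crc32`), the AMP count, the sample values, and the name `amp_for` are all assumptions made for illustration; Teradata uses its own row-hash algorithm and hash map.

```python
import zlib

NUM_AMPS = 4  # assumed AMP count, for illustration only


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number (stand-in for Teradata's row hash)."""
    row_hash = zlib.crc32(str(pi_value).encode("utf-8"))
    return row_hash % num_amps


# Two hypothetical tables whose primary index is the same column (Enum).
employee_enums = [1001, 1002, 2345]
department_enums = [2345, 1001]

employee_amps = {enum: amp_for(enum) for enum in employee_enums}
department_amps = {enum: amp_for(enum) for enum in department_enums}

# Equal primary index values hash to the same AMP in both tables,
# so the matching rows can be joined without moving any data.
co_located = all(employee_amps[e] == department_amps[e] for e in department_enums)
print(co_located)
```

Because the AMP is a pure function of the primary index value, a row from either table with the same value always lands in the same place.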

    There are four scenarios for data movement in a merge join:

    Case 1: If the join is UPI = UPI, the rows to be joined are already on the same AMP and redistribution is not required. This is the fastest and most efficient join strategy.

    Case 2: If the join is UPI = non-index column, the rows of the second table have to be redistributed across the AMPs so that they land on the same AMPs as the matching rows of the first table.

    Case 3: If the join is non-index column = non-index column, both tables have to be redistributed so that matching rows lie on the same AMP, and the join then runs on the redistributed data. This strategy is time-consuming because both tables are completely redistributed across all the AMPs.

    Case 4: For a join on the primary index, if the referenced table (the second table in the join) is very small, that table is duplicated (copied) onto every AMP.
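
Once matching rows are on the same AMP, the matching step can be sketched as a classic sort-merge. This is a minimal sketch, not Teradata's implementation: it sorts on the join key itself rather than on row hash, and the function name `merge_join` and the sample rows are made up for illustration.

```python
def merge_join(left, right):
    """Join two lists of (key, payload) tuples that are already sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on the right side, then pair it
            # with every left row that carries the same key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    result.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return result


# Both inputs must be sorted on the join key before merging.
employees = sorted([(30, "Carol"), (10, "Alice"), (20, "Bob"), (20, "Dave")])
departments = sorted([(20, "Sales"), (40, "Legal"), (10, "HR")])
joined = merge_join(employees, departments)
print(joined)
```

Both inputs are read once in key order, which is why each side must be sorted before the merge can start.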

    Nested Join

    -------------------

    A nested join is one of the most precise join plans the optimizer can choose. It applies when a UPI or USI referenced in the query identifies a single row in the first table. The join then uses an index (primary or secondary) on the join column to find one or more matching rows in the second table and returns the matching results.

    Example:

    SELECT EMP.Ename, DEP.Deptno, EMP.salary
    FROM EMPLOYEE EMP,
         DEPARTMENT DEP
    WHERE EMP.Enum = DEP.Enum
    AND EMP.Enum = 2345; -- this results in a nested join
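
The two-step lookup behind a nested join can be sketched with dictionaries standing in for the indexes. The table contents, the dictionary names, and the function `nested_join` are hypothetical; only the column names come from the query above.

```python
# Hypothetical stand-in for the first table, keyed by a unique index on Enum.
employee_by_enum = {
    2345: {"Ename": "Alice", "salary": 50000},
    1001: {"Ename": "Bob", "salary": 42000},
}

# Hypothetical index on the second table's join column: Enum -> matching rows.
department_by_enum = {
    2345: [{"Deptno": 10}],
    1001: [{"Deptno": 20}],
}


def nested_join(enum):
    """Fetch one row by unique index, then probe the second table's index."""
    emp = employee_by_enum.get(enum)  # single-row retrieval from the first table
    if emp is None:
        return []
    matches = department_by_enum.get(enum, [])  # index lookup, no full scan
    return [(emp["Ename"], dep["Deptno"], emp["salary"]) for dep in matches]


rows = nested_join(2345)
print(rows)
```

Neither table is scanned in full: each step is a direct index lookup.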

    Hash Join

    -------------------

    A hash join is one of the plans the optimizer may choose, depending on the join conditions. Functionally it is a close relative of the merge join: in both, the join happens within the same AMP. In a hash join, however, one or both of the tables on that AMP fit completely inside the AMP's memory. The AMP holds the small table in memory and performs the join on row hash.

    The advantages of hash joins are:

    1. They are faster than merge joins because the large table does not need to be sorted.

    2. The join runs between a table held in AMP memory and a table in unsorted spool, so it completes quickly.
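
The build-and-probe pattern behind a hash join can be sketched as follows. This is a generic in-memory hash join under assumed sample data, not Teradata's internal implementation, and it keys on the join value directly rather than on row hash.

```python
from collections import defaultdict


def hash_join(small, large):
    """Build a hash table on the small input, then probe it with the unsorted large input."""
    table = defaultdict(list)
    for key, payload in small:  # build phase: the small table is held in memory
        table[key].append(payload)

    result = []
    for key, payload in large:  # probe phase: the large table is read once, unsorted
        for small_payload in table.get(key, []):
            result.append((key, payload, small_payload))
    return result


departments = [(10, "HR"), (20, "Sales")]  # small table
employees = [(20, "Bob"), (30, "Carol"), (10, "Alice"), (20, "Dave")]  # large, unsorted
joined = hash_join(departments, employees)
print(joined)
```

The large input is never sorted; each of its rows costs one hash lookup, which is the advantage listed in point 1.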

    Exclusion Join

    -------------------------

    The optimizer chooses this type of join when a query uses any of the following:

    - NOT IN

    - EXCEPT

    - MINUS

    - Set subtraction operations

    SELECT EMP.Ename, EMP.salary
    FROM EMPLOYEE EMP
    WHERE EMP.Enum NOT IN
    ( SELECT Enum
      FROM DEPARTMENT DEP
      WHERE Enum IS NOT NULL);

    Make sure the subquery includes an additional IS NOT NULL filter, because a NULL in the NOT IN list causes the query to return no rows.

    For a NOT IN query, an exclusion join has three scenarios:

    Case 1: A match in the NOT IN subquery disqualifies the row.

    Case 2: No match in the NOT IN subquery qualifies the row.

    Case 3: Any unknown result in the NOT IN comparison disqualifies the row (NULL is the typical example of this scenario).
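
The three cases follow from SQL's three-valued logic, which the sketch below models with Python's `None` standing in for both NULL and UNKNOWN. The function names `not_in` and `exclusion_filter` and the sample values are assumptions for illustration.

```python
def not_in(value, candidates):
    """Evaluate `value NOT IN (candidates)` with SQL three-valued logic.

    Returns True, False, or None (None stands for SQL UNKNOWN).
    """
    saw_unknown = False
    for candidate in candidates:
        if value is None or candidate is None:
            saw_unknown = True  # any comparison with NULL is UNKNOWN
        elif value == candidate:
            return False  # Case 1: a match disqualifies the row
    # Case 3 if any comparison was UNKNOWN, otherwise Case 2.
    return None if saw_unknown else True


def exclusion_filter(rows, candidates):
    """Keep only rows for which NOT IN evaluates to TRUE; UNKNOWN rows are dropped."""
    return [row for row in rows if not_in(row, candidates) is True]


employee_enums = [1, 2, 3]
with_null = exclusion_filter(employee_enums, [2, None])  # NULL in the list
without_null = exclusion_filter(employee_enums, [2])     # NULL filtered out
print(with_null, without_null)
```

With a NULL in the list, every non-matching comparison becomes unknown, so no row qualifies. This is why the IS NOT NULL filter in the subquery matters.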
