Home > ASE, Developement > Joins Algorithms

Joins Algorithms

January 28th, 2011 Leave a comment Go to comments

Source : sybase.com and www

Time Complexity of Nested Loop Join Algo : O(N * M)

Time Complexity of a HASH JOIN is O(N + M), where N is the hashed table and M the is lookup table. Hashing and hash lookups have constant complexity.

Time Complexity of a MERGE JOIN is O(N*Log(N) + M*Log(M)): it’s the sum of times to sort both tables plus time to scan them.

Hash Join Algo
====================
The hash join algorithm builds an in-memory hash table of the smaller of its two inputs, and then reads the larger input and probes the in-memory hash table to find matches, which are written to a work table. If the smaller input does not fit into memory, the hash join operator partitions both inputs into smaller work tables. These smaller work tables are processed recursively until the smaller input fits into memory.
The hash join algorithm has the best performance if the smaller input fits into memory, regardless of the size of the larger input. In general, the optimizer will choose hash join if one of the inputs is expected to be substantially smaller than the other.

Merge Join Algo
==================
The merge join algorithm reads two inputs which are both ordered by the join attributes. For each row of the left input, the algorithm reads all of the matching rows of the right input by accessing the rows in sorted order.
If the inputs are not already ordered by the join attributes (perhaps because of an earlier merge join or because an index was used to satisfy a search condition), then the optimizer adds a sort to produce the correct row order. This sort adds cost to the merge join.
One advantage of a merge join compared to a hash join is that the cost of sorting can be amortized over several joins, provided that the merge joins are over the same attributes. The optimizer will choose merge join over a hash join if the sizes of the inputs are likely to be similar, or if it can amortize the cost of the sort over several operations.

Nested Loops Join Algo
==================
The nested loops join computes the join of its left and right sides by completely reading the right hand side for each row of the left hand side. (The syntactic order of tables in the query does not matter, because the optimizer chooses the appropriate join order for each block in the request.)

The optimizer may choose nested loops join if the join condition does not contain an equality condition, or if it is optimizing for first-row time.
Since a nested loops join reads the right hand side many times, it is very sensitive to the cost of the right hand side. If the right hand side is an index scan or a small table, then the right hand side can likely be computed using cached pages from previous iterations. On the other hand, if the right hand side is a sequential table scan or an index scan that matches many rows, then the right hand side needs to be read from disk many times. Typically, a nested loops join is less efficient than other join methods. However, nested loops join can provide the first matching row quickly compared to join methods that must compute their entire result before returning.
Nested loops join is the only join algorithm that can provide sensitive semantics for queries containing joins. This means that sensitive cursors on joins can only be executed with a nested loops join.

Check the Sybase Wiki @ sybasewiki.com
Categories: ASE, Developement Tags: , , , ,
  1. Akash Kumar
    February 7th, 2012 at 03:39 | #1

    Hello everybody ,
    I need some help for a T-Sql update statement using “Abstract plan” (AP) . It is in ASE 15.

    Some information about the tables involved in the query:
    1) There two tables named as “small” and “big”
    2) The table “small” has 1000 rows in it
    3) The table “big” has 20 million rows
    4) Join columns are “Id1” and “Id2”
    5) Both tables have unique non-clustered indexes on these two join columns

    Index on (Id1 + Id2)

    Without the abstract plan clause “plan” optimizer uses nested-loop-join.
    My goal is to use HASH join instead of Nested-Loop-Join.

    So I use abstract plan in the query like below:

    update big
    set Amount= 100.00
    from small s,big b
    where s.Id1 = b.Id1
    and s.Id2 = b.Id2
    plan “(h_join (sort (t_scan s)) (sort (t_scan b)))”

    I am getting message below and optimizer is using nested-Loop_join (please see the query – plan below)

    Abstract Plan (AP) Warning : An error occured while applying the AP:
    ( h_join ( sort ( t_scan s ) ) ( sort ( t_scan b ) ) )
    to the SQL query:
    update big set Amount= 100.00 from small s,big b where s.Id1 = b.Id1 and s.Id2 = b.Id2
    Failed to apply the top operator ‘h_join ‘ of the following AP fragment:
    ( h_join ( sort ( t_scan s ) ) ( sort ( t_scan b ) ) )
    The children of the ‘h_join ‘ operator don’t have t:q!he required properties, as ordering, partitioning.
    The logical relational operation is valid at this point but it can not be implemented by this physical operator.
    The AP property enforcer operators ‘sort’, ‘xchg’ or ‘enforce’ can be used to create the needed physical properties.
    When an ordering is missing, the nl_join or h_join peer physical operators that don’t need an ordering can be used.
    The following template can be used as a basis for a valid AP:
    ( also_enforce ( update ( also_enforce ( join ( also_enforce ( scan ( table ( b big ) ) ) ) ( also_enforce ( scan ( table ( s small ) ) ) ) ) ) ) )
    The optimizer will try to complete the compilation of this query; on success, the query will be executed normally.

    QUERY PLAN FOR STATEMENT 1 (at line 9).
    Optimized using Serial Mode
    Optimized using the Abstract Plan in the PLAN clause.

    STEP 1
    The type of query is UPDATE.

    5 operator(s) under root

    |ROOT:EMIT Operator (VA = 5)
    |
    | |UPDATE Operator (VA = 4)
    | | The update mode is direct.
    | |
    | | |NESTED LOOP JOIN Operator (VA = 3) (Join Type: Left Semi Join)
    | | |
    | | | |SORT Operator (VA = 1)
    | | | | Using Worktable1 for internal storage.
    | | | |
    | | | | |SCAN Operator (VA = 0)
    | | | | | FROM TABLE
    | | | | | big
    | | | | | b
    | | | | | Table Scan.
    | | | | | Forward Scan.
    | | | | | Positioning at start of table.
    | | | | | Using I/O Size 4 Kbytes for data pages.
    | | | | | With LRU Buffer Replacement Strategy for data pages.
    | | |
    | | | |SCAN Operator (VA = 2)
    | | | | FROM TABLE
    | | | | small
    | | | | s
    | | | | Table Scan.
    | | | | Forward Scan.
    | | | | Positioning at start of table.
    | | | | Using I/O Size 4 Kbytes for data pages.
    | | | | With LRU Buffer Replacement Strategy for data pages.
    | |
    | | TO TABLE
    | | big
    | | Using I/O Size 4 Kbytes for data pages.

    Thanks

    • sybanva
      February 9th, 2012 at 08:57 | #2

      Hi Akash,
      Could you please let me know the optimization goal for you server. As join changes as with optimization goal.

      Please mark a mail to sybanva@gmail for faster response. Thank you!

      • Akash Kumar
        February 9th, 2012 at 22:21 | #3

        Thanks for the response. I sent an e-mail to sybanva@gmail.com
        with info you asked.
        Thanks again.

        • Akash Kumar
          February 23rd, 2012 at 23:43 | #4

          Friends,
          Any update on commnet# 1 , 2 and 3

          Regards

  1. April 6th, 2012 at 17:48 | #1