Subselect VS TempTable: Which one is more efficient?

  • -- GIVEN 3 TEMPORDERS

    #TempOrder

    orderId

    -------

    1 --has 10 orderlines shipped

    2 --has 10 orderlines shipped

    3 --has 10 orderlines shipped

    -- GIVEN 1 BILLION ORDERES IN DATABASE AND AT LEAST 10 BILLION ORDERLINES (ALL ORDERS HAS 10 SHIPPED ORDERLINES)

    Order

    orderId

    -------

    1 --has 10 orderlines shipped

    2 --has 10 orderlines shipped

    3 --has 10 orderlines shipped

    4 --has 10 orderlines shipped

    5 --has 10 orderlines shipped

    : --has 10 orderlines shipped

    : --has 10 orderlines shipped

    100000000 --has 10 orderlines shipped (1 billionth record)

    --RETRIEVE TEMPORDER AND ITS SHIIPPED ORDERLINES

    orderId orderLineId

    ------- -----------

    1 1

    1 2

    1 3

    : :

    2 1

    2 2

    2 3

    : :

    3 1

    3 2

    3 3

    : :

    --METHOD1: SUBSELECT

    SELECT TempOL.orderId, TempOL.orderLineId

    FROM #TempOrder TO

    INNER JOIN

    (SELECT OL.orderId, OL.orderLineId FROM OrderLine OL WHERE OL.status = 'Shipped') AS TempOL

    WHERE TempOL.orderId = O.orderId

    --METHOD2: TEMPTABLE

    INSERT INTO #TempOrderWithShipped (orderId, orderLineId)

    SELECT O.orderId, OL.orderId

    FROM #TempOrder O

    INNER JOIN OrderLine OL ON OL.orderId = O.orderId AND OL.status = 'Shipped'

    SELECT * FROM #TempOrderWithShipped

    So the main question is which is more efficient?

    I'm pretty new to SQL, but my shallow analysis is this..

    METHOD1 will read 1 billion x 10 records then connect the 3 records

    METHOD2 will read 3 x 10 records

    which makes METHOD2 more efficient

    I also like METHOD2 as far as code readability

    I'm not looking for coding alternative mainly because the real code that I'm working is much more complicated than the above. But yeah, if you think there's a 3rd way, let me know.

  • You're pretty much spot on with your analysis. In the first example the nested query (or derived table as it's commonly referred to) runs first, the results of which are then filtered in the where clause.

    The first example is roughly equivalent to inserting the results of nested query into a temp table and then running the outer query whilst joining on this temporary table you created. It's not exactly the same but it illustrates the point.

    This isn't to say that derived tables can't be efficient.

    By the way, if you use the SET STATISTICS IO ON option (just run this statement once within the connection you're working in) and then run the queries you'll get details relating to the IO impact of the queries, which can often provide very obvious clues as to efficiency.

  • Although I hate to say it this way, it depends..

    But first, it seems a simple inner join would more than suffice for what you want, just join the header to the details table on the primary keys and specify your filter criteria in the where clause.. Much simpler and no need for sub-select or temp tables..

    Now, on to your question..

    Part of the considerations for me has always been load time and space to load the temp table, both tend to be expensive operations and a sub-select will often win. I have often done it both ways and tested it to see what was better. We also have the case of temp table vs. table variable which is even stickier. I had a case where a process ran much better up to a point with table variables and then slower later on, and slower using temp tables to a point and then much faster later on. This process was rearchitected to use BOTH because at that defined point even with the copy of data from the table variable to the temp table it was still 40 times faster.

    Temp tables tend to be unindexed so you will take a hit on that, and in previous versions even if you defined an index it would never be used due to the way the optimization engine evaluated the batch.

    I'm not sure if I answered your question or just gave you more to think about..

    CEWII

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply