Redundant sort

  • I'm working with tables containing 100's of millions of rows. to simplify the problem, I have 2 tables Transaction and Account, and each Transaction relates to an Account.

    The transaction table is clustered by a unique transaction ID.

    The account table is clustered by an account ID.

    What I 'm trying to to is select a subset of the Transactions using from a subset of the accounts. the subset of Transactions is based on a date period.

    The transaction results table needs to be sorted (I clustered) by the Transaction_ID (this is sort it can be merge joined later). To achieve this I generally create a temp table clustered on the primary key.

    CREATE TABLE #TRANSACTION_SUBSET

    (

    TRANSACTION_ID INT NOT NULL

    ,TRANSACTION_DATE SMALLDATETIME NOT NULL

    )

    CREATE CLUSTERED INDEX __TRANSACTION_PK ON #TRANSACTION_SUBSET (TRANSACTION_ID)

    /* Select Account Subset*/

    SELECT ACCOUNT_ID

    INTO #ACCOUNT_SUBSET

    FROM ACCOUNT WHERE ACCOUNT_STATUS = 'E';

    /* Select Transaction Subset*/

    INSERT #TRANSACTION_SUBSET

    SELECT TRANSACTION_ID,TRANSACTION_DATE

    FROM #ACCOUNT_SUBSET A

    INNER HASH JOIN TRANSACTION T

    ON T.ACCOUNT_ID = A.ACCOUNT_ID

    WHERE TRANSACTION_DATE BETWEEN '2010-01-01' AND '2011-01-01'

    You will note that I used an explicity join hint to force the plan to HASH join the ACCOUNT_SUBSET.

    This execution plan uses the clustered index (ie. the one sorted by TRANSACTION_ID) - and does a hash lookup into the ACCOUNT_SUBSET table.

    What I was expecting, and considering the input is being processed in the same order and my output (#TRANSACTION_SUBSET) - that there would be no sort required - but it does.

    Even though the input and output order are the same - SQL insists on sorting the stream before it outputs it.

    I see this alot - and would have expected the sort to be redundant.

    Can anyone shed any light on why SQL would want to sort - even though there is no requirement to?

    While I've been typing - it's just occurred to me that it may be something to do with parallel queries. I'll investigate that - but in the mean time, if anyone has anythoughts.

    Regards,

    Gary.

  • You are likely onto something with the parallel queries thought (try a MaxDOP of 1 to test it), but remember just because SQL is doing a JOIN with the data "ordered" in the transaction table, it can read the data in any order desired (especially if it is being executed in parallel).

    --------------------------------------------------------------------------
    When you realize you've dug yourself into a hole....Step 1...stop digging.

  • Thanks for you comments.

    I have tested with MaxDOP=1, and it makes no difference.

    However, I did find that by selecting the TOP 100K records, it actually generated the plan without the sort.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply