January 21, 2011 at 1:33 am
I'm working with tables containing 100's of millions of rows. to simplify the problem, I have 2 tables Transaction and Account, and each Transaction relates to an Account.
The transaction table is clustered by a unique transaction ID.
The account table is clustered by an account ID.
What I 'm trying to to is select a subset of the Transactions using from a subset of the accounts. the subset of Transactions is based on a date period.
The transaction results table needs to be sorted (I clustered) by the Transaction_ID (this is sort it can be merge joined later). To achieve this I generally create a temp table clustered on the primary key.
CREATE TABLE #TRANSACTION_SUBSET
(
TRANSACTION_ID INT NOT NULL
,TRANSACTION_DATE SMALLDATETIME NOT NULL
)
CREATE CLUSTERED INDEX __TRANSACTION_PK ON #TRANSACTION_SUBSET (TRANSACTION_ID)
/* Select Account Subset*/
SELECT ACCOUNT_ID
INTO #ACCOUNT_SUBSET
FROM ACCOUNT WHERE ACCOUNT_STATUS = 'E';
/* Select Transaction Subset*/
INSERT #TRANSACTION_SUBSET
SELECT TRANSACTION_ID,TRANSACTION_DATE
FROM #ACCOUNT_SUBSET A
INNER HASH JOIN TRANSACTION T
ON T.ACCOUNT_ID = A.ACCOUNT_ID
WHERE TRANSACTION_DATE BETWEEN '2010-01-01' AND '2011-01-01'
You will note that I used an explicity join hint to force the plan to HASH join the ACCOUNT_SUBSET.
This execution plan uses the clustered index (ie. the one sorted by TRANSACTION_ID) - and does a hash lookup into the ACCOUNT_SUBSET table.
What I was expecting, and considering the input is being processed in the same order and my output (#TRANSACTION_SUBSET) - that there would be no sort required - but it does.
Even though the input and output order are the same - SQL insists on sorting the stream before it outputs it.
I see this alot - and would have expected the sort to be redundant.
Can anyone shed any light on why SQL would want to sort - even though there is no requirement to?
While I've been typing - it's just occurred to me that it may be something to do with parallel queries. I'll investigate that - but in the mean time, if anyone has anythoughts.
Regards,
Gary.
January 21, 2011 at 7:59 am
You are likely onto something with the parallel queries thought (try a MaxDOP of 1 to test it), but remember just because SQL is doing a JOIN with the data "ordered" in the transaction table, it can read the data in any order desired (especially if it is being executed in parallel).
--------------------------------------------------------------------------
When you realize you've dug yourself into a hole....Step 1...stop digging.
January 21, 2011 at 8:12 am
Thanks for you comments.
I have tested with MaxDOP=1, and it makes no difference.
However, I did find that by selecting the TOP 100K records, it actually generated the plan without the sort.
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply