August 23, 2012 at 8:08 am
Hi all,
Please note the real question is at the end.
I have following query, both pto and ph has about 30million rows. The query initially run very slow (3 mins). So I added two index on pto, ph respectively.
SELECT
MAX(ph.txn_date_time)
FROM
pto AS pto WITH (NOLOCK)
INNER JOIN ph AS ph WITH (NOLOCK) ON ph.receipt_id = pto.receipt_id
WHERE
pto.subtype = 'ff'
AND pto.Units_No > 0
AND ph.branch_id = 5
CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto]
(
[SUBTYPE] ASC,
[Units_No] ASC,
[RECEIPT_ID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
CREATE NONCLUSTERED INDEX [IX_ph_branchReceiptTxn] ON [dbo].[ph]
(
[BRANCH_ID] ASC,
[RECEIPT_ID] ASC,
[TXN_DATE_TIME] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
Now the query runs in 350ms. Great. The execution plan is also very simple, it uses the created index from the two tables and did a Hash join on the receipt_id column then a Stream Aggregate to do the MAX(ph.txn_date_time). So every column in the query is covered by the two added index.
The question is why it used a Hash join on the receipt_id column? I mean since RECEIPT_ID in both indexes are sorted the optimizer should have used a merge join. To figure out why I changed the first index to below (put RECEIPT_ID before Units_No).
CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto]
(
[SUBTYPE] ASC,
[RECEIPT_ID] ASC,
[Units_No] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
And now I see the Merge join on the RECEIPT_ID column. The query also runs in 170ms. Now obviously the optimizer think the RECEIPT_ID in both indexes are sorted so a merge join is used. But I don't understand why in the first case it doesn't think so?
Thanks,
Nian
August 23, 2012 at 8:34 am
nzhang6666 (8/23/2012)
Hi all,Please note the real question is at the end.
I have following query, both pto and ph has about 30million rows. The query initially run very slow (3 mins). So I added two index on pto, ph respectively.
SELECT
MAX(ph.txn_date_time)
FROM
pto AS pto WITH (NOLOCK)
INNER JOIN ph AS ph WITH (NOLOCK) ON ph.receipt_id = pto.receipt_id
WHERE
pto.subtype = 'ff'
AND pto.Units_No > 0
AND ph.branch_id = 5
CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto]
(
[SUBTYPE] ASC,
[Units_No] ASC,
[RECEIPT_ID] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
CREATE NONCLUSTERED INDEX [IX_ph_branchReceiptTxn] ON [dbo].[ph]
(
[BRANCH_ID] ASC,
[RECEIPT_ID] ASC,
[TXN_DATE_TIME] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
Now the query runs in 350ms. Great. The execution plan is also very simple, it uses the created index from the two tables and did a Hash join on the receipt_id column then a Stream Aggregate to do the MAX(ph.txn_date_time). So every column in the query is covered by the two added index.
The question is why it used a Hash join on the receipt_id column? I mean since RECEIPT_ID in both indexes are sorted the optimizer should have used a merge join. To figure out why I changed the first index to below (put RECEIPT_ID before Units_No).
CREATE NONCLUSTERED INDEX [IX_pto_subTypeUnitReceipt] ON [dbo].[pto]
(
[SUBTYPE] ASC,
[RECEIPT_ID] ASC,
[Units_No] ASC
)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [Indexes]
And now I see the Merge join on the RECEIPT_ID column. The query also runs in 170ms. Now obviously the optimizer think the RECEIPT_ID in both indexes are sorted so a merge join is used. But I don't understand why in the first case it doesn't think so?
Thanks,
Nian
Look at the data and compare the output from the following queries:
select top (100)
[SUBTYPE],
[RECEIPT_ID],
[Units_No]
from
pto
order by
[SUBTYPE],
[RECEIPT_ID],
[Units_No];
select top (100)
[SUBTYPE],
[RECEIPT_ID],
[Units_No]
from
pto
order by
[SUBTYPE],
[Units_No],
[RECEIPT_ID];
August 23, 2012 at 8:50 am
Hi Lynn,
Just compared the results and they are same.
I don't understand why you suggest doing this. I mean even the data is different but they are still ordered. As far as I know "A Merge Join occurs on tables where the join columns are presorted." is it?
Nian
August 23, 2012 at 9:17 am
nzhang6666 (8/23/2012)
Hi Lynn,Just compared the results and they are same.
I don't understand why you suggest doing this. I mean even the data is different but they are still ordered. As far as I know "A Merge Join occurs on tables where the join columns are presorted." is it?
Nian
Perhaps not enough values were returned. Look at the ORDER BY's, each matches the one of the indexes you created. Come up with a small set of rapidly changing data for all three columns and see what each one would return.
Look at the order of the columns between the two indexes:
[SUBTYPE], [Units_No], [RECEIPT_ID]
[SUBTYPE], [RECEIPT_ID], [Units_No]
The data in the first could look like:
ff,1,101
ff,1,103
ff,2,104
ff,2,102
ff,3,102
The same data in the second:
ff,101,1
ff,102,2
ff,102,3
ff,103,1
ff,104,2
See a difference?
August 23, 2012 at 9:37 am
Hi Lynn,
I got the point. The receipt_id can not guaranteed to be ordered, if it's not the first column in the ordering columns. In this case the subtype is fixed, so receipt_id is guaranteed to be ordered when it is the second column.
Thank you.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply