December 5, 2013 at 9:32 am
Hi
I have a select statement that runs forever because of 'Select Distinct' and it makes tempdb grow very large due to sorting. Does anyone have suggestions on how to improve this select statement?
on CONVERT(VARCHAR, a.app_key) = SUBSTRING(c.user1, 1,6), dbo.dealer d
where c.DocType = 'VO' and c.Crtd_DateTime > '2012-01-01 00:00:00' and c.VendId like '01%'
December 5, 2013 at 12:43 pm
On The server NJACCOUNTING01, look at the indexes on the table marlin_test.dbo.apdoc
an index like this would greatly help the query, i believe:
CREATE INDEX IX_apdoc ON dbo.apdoc(Crtd_DateTime,DocType,VendId) INCLUDE (user1,vendid)
see what indexes are already there, and see if you can replace or modify an existing, or add one like my example above;
add the index, test your query again, and compare the before and after execution plans to see if ti's worthy of permanent promotion.
Lowell
December 5, 2013 at 3:13 pm
Thank you so much for the reply. I need to mention that the tables application.app_key and apdoc.user1 exist on separate databases (db1.application.app_key and db2.apdoc.user1.). I am running the query from db1 using a linked server connection to db2. I created an index on db1.application with (app_key, app_dlr_fkey) and one on db2.apdoc with (vendid, user1). After updating the stats on both tables I displayed the proposed execution plan 99% of the cost was on the Inner Join as a nested loop. I got a result set back pretty quick but got 100's of duplicate rows on user1 and vendid even if I user cross join. Any ideas why?
on CONVERT(VARCHAR(30), a.app_key) = SUBSTRING(c.user1, 1,6) , dbo.dealer d
where c.DocType = 'VO' and c.Crtd_DateTime > '2012-01-01 00:00:00' and c.VendId like '01%'
December 6, 2013 at 12:35 pm
jdbrown239 (12/5/2013)
Thank you so much for the reply. I need to mention that the tables application.app_key and apdoc.user1 exist on separate databases (db1.application.app_key and db2.apdoc.user1.). I am running the query from db1 using a linked server connection to db2. I created an index on db1.application with (app_key, app_dlr_fkey) and one on db2.apdoc with (vendid, user1). After updating the stats on both tables I displayed the proposed execution plan 99% of the cost was on the Inner Join as a nested loop. I got a result set back pretty quick but got 100's of duplicate rows on user1 and vendid even if I user cross join. Any ideas why?Select distinct d.dlr_reference_num,d.dlr_key,a.app_dlr_fkey, CONVERT(VARCHAR(30), a.app_key)AS RapAPP, c.user1, c.vendid from NJACCOUNTING01.marlin_test.dbo.apdoc c inner join dbo.application aon CONVERT(VARCHAR(30), a.app_key) = SUBSTRING(c.user1, 1,6) , dbo.dealer d
where c.DocType = 'VO' and c.Crtd_DateTime > '2012-01-01 00:00:00' and c.VendId like '01%'
It would help us answer this question if you explained the structure of your tables (especially the relations between them - one-to-one, one-to-many, etc.) and attached the actual execution plan. Providing the DDL is most helpful, and if you want some code suggestions, consumable sample data would be appreciated.
Jason Wolfkill
December 6, 2013 at 1:29 pm
one of the things that is going to make this slow is the linked server;
the execution plan will end up copying the entire table over to tempdb, and THEN filter the data witht eh WHERE / do the joins;
that's a lot of data over the wire that's not needed, most likely.
switching to a divide-and-conquor plan, you can get just the data that matches form the linke dserver into a temp table, adn then join on that.
something like this is my first guess, without any DDL to back it up.
--get the minimal data over the wire possible by using EXECUTE AT
INSERT INTO #temp
EXECUTE ( 'SELECT
c.DocType,c.Crtd_DateTime,c.VendId,c.user1,c.vendid
FROM marlin_test.dbo.apdoc c
WHERE c.DocType = ''VO''
AND c.Crtd_DateTime > ''2012-01-01 00:00:00''
AND c.VendId LIKE ''01%'';' ) AT NJACCOUNTING01;
--data was already filtered at linked server, just join and distinct it.
SELECT DISTINCT
d.dlr_reference_num,
d.dlr_key,
a.app_dlr_fkey,
CONVERT(VARCHAR(30), a.app_key)AS RapAPP,
c.user1,
c.vendid
FROM #temp c
INNER JOIN dbo.application a
ON CONVERT(VARCHAR(30), a.app_key) = SUBSTRING(c.user1, 1, 6),
dbo.dealer d
Lowell
December 6, 2013 at 1:38 pm
Lowell (12/6/2013)
one of the things that is going to make this slow is the linked server;the execution plan will end up copying the entire table over to tempdb, and THEN filter the data witht eh WHERE / do the joins;
that's a lot of data over the wire that's not needed, most likely.
switching to a divide-and-conquor plan, you can get just the data that matches form the linke dserver into a temp table, adn then join on that.
something like this is my first guess, without any DDL to back it up.
--get the minimal data over the wire possible by using EXECUTE AT
INSERT INTO #temp
EXECUTE ( 'SELECT
c.DocType,c.Crtd_DateTime,c.VendId,c.user1,c.vendid
FROM marlin_test.dbo.apdoc c
WHERE c.DocType = ''VO''
AND c.Crtd_DateTime > ''2012-01-01 00:00:00''
AND c.VendId LIKE ''01%'';' ) AT NJACCOUNTING01;
--data was already filtered at linked server, just join and distinct it.
SELECT DISTINCT
d.dlr_reference_num,
d.dlr_key,
a.app_dlr_fkey,
CONVERT(VARCHAR(30), a.app_key)AS RapAPP,
c.user1,
c.vendid
FROM #temp c
INNER JOIN dbo.application a
ON CONVERT(VARCHAR(30), a.app_key) = SUBSTRING(c.user1, 1, 6),
dbo.dealer d
I'll vouch for this approach. Generally, it's a good idea to push as much of the filtering as possible to the linked server for two reasons - one, it cuts down the amount of data piped back to the calling server, as Lowell mentioned, and two, the linked server can use statistics, indexes, and metadata that are unavailable to the calling server to construct a more efficient plan for returning its data. I've used the temp table approach Lowell describes for these very reasons.
Jason Wolfkill
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply