April 24, 2024 at 4:12 pm
I have been asked to fine-tune the following SQL that was written by a former employee. The DBA says that using the select tables in inner join subquery is causing heavy resource usage as it selects all the rows from the tables. He wants to know whether I could use the fields at the beginning of the select statement and remove those inner join subqueries. I am not sure whether this is possible and would like to ask the experts here for their comments. Any help is greatly appreciated.
I am including some extract where the DBA pointed out the issues.
Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation from
(select id, pdate, duration
from demo where mneu = 'COST' and pdate >= '01/01/2024' and pdate<='01/31/2024'
) T1
Inner Join
(select vid,elemid, resp, qval from sal where resp <>0
) T2
on T2.elemid = T1.id
Inner Join
(Select apptid, name from tappt
) T3
on T3.apptid = T1.id
) ACTY
Inner Join
(
Select vid,pid,acc,stat from RM where loc not like 'ABC%'
)REM
on REM.vid = ACTY.vid
Inner Join
( Select pit, name from HMA
) HM
On HM.pid = REM.pid
Inner Join
(
Select pid, prm from HMR where desc = 'RET'
) HRM
On HRM.pid = REM.pid
Inner Join
(
Select mdid,emid,qryval from MGE where desc = 'POP'
)LC
on LC.mdid = ACTY.dlocation
April 24, 2024 at 5:54 pm
No idea about performance, and don't know where some of those columns are coming from, but it looks like it could be simplified to something like this, which is at least much simpler to read.
Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation
from demo T1
inner join sal T2 on T2.elemid = T1.id
inner join tappt T3 on T3.apptid = T1.id
inner join RM REM on REM.vid = emp.vid
inner join HMA HM on HM.pid = REM.pid
inner join HMR HRM on HRM.pid = REM.pid
inner join MGE LC on LC.mdid = ACTY.dlocation
where T1.mneu = 'COST' and T1.pdate >= '01/01/2024' and T1.pdate<='01/31/2024'
and T2.resp <> 0
and REM.loc not like 'ABC%'
and HRM.desc = 'RET'
and LC.desc = 'POP'
April 24, 2024 at 6:25 pm
Since all of the joins to the derived tables (selects in the from/join) - there is no benefit to even using the sub-queries. Removing the sub-queries and refactoring to simple join statements - the code becomes much easier to read and manage.
As for the statement from your DBA saying these sub-queries (derived tables) are reading all of the data from the tables - that isn't caused by having the derived tables. That will be due to statistics on each table and the expected number of rows that SQL Server estimates will be returned. If that estimate is larger than about 20% of the table - SQL Server will perform a clustered index scan.
If the columns in the where clause are not indexed - then SQL Server must read every row in that table to determine if that row meets the requirements.
I would look to the execution plan (actual if possible) to see what indexes are being used - and to see if the estimated number of rows matches (or is very close) to actual number of rows. If not, then you probably have an issue with out of date statistics - if that is not the case then maybe you can identify one or more indexes that can be added to improve the performance.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
April 24, 2024 at 8:01 pm
I see "emp." being referenced, but I don't see "emp" anywhere in the supplied query.
You would have to check each of the joined tables to see if there is an index that matches the query against the table. If not, you could create a covering index if/as needed that would prevent a full scan of the table.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
April 24, 2024 at 9:53 pm
"Select emp.vid,pt,dn,dcode,dcat,dtime,dlocation "
IF you are selecting all the columns from just your base table, and non from the other join tables, you could try removing the joins and instead employ the WHERE Exists operation. But as others have said Indexes and statistics (via the query plan) are the place to start.
----------------------------------------------------
April 25, 2024 at 12:35 am
Thanks a lot for all the replies. Much appreciated. I am going to remove the sub queries and see how it works. Will also ask the DBA for the execution plan for the original and the modified query and check the indexes too. Again really appreciate all the replies.
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply