August 22, 2006 at 11:02 am
How can I speed up a query that uses the != operator? Specifically the equation is a!=''. I remember reading something about not allowing nulls on the column will help the != out some by making it sargable(sp?). I can't find the article. Anyone know the ruling on this?
The main query has three columns that are subquerys on big tables that use the != operator. I need to squeek every bit of performance out. I am working on the indexes and I need to know what modifications I need to do to the subquery tables so that they will run as fast as possible.
The subquerys also use IN ('2006-07-01','2006-07-02', etc . . ), any idea on what changes to assure this is sargable also?
Putting the subqueries into a view won't help because the subquery may have extra added onto the WHERE clause at runtime. I have played around with doing this and not shown any increases in speed.
This query is going to be ran a couple hundread, maybe thousands of times a day, so anything will help.
So, in summary, I am just looking for details on the sargability of the IN () and != operators.
Thanks in advance to any responses to this post.
August 22, 2006 at 11:17 am
>>The main query has three columns that are subquerys on big tables that use the != operator.
So, the query is of the form ...
SELECT
(SELECT Column FROM OtherTable WHERE ...) As Column1,
(SELECT Column FROM SomeOtherTable WHERE ) As Column2,
etc etc
??
If so, you'll get cursor-like performance. It would be better to eliminate the sub-SELECTs and instead construct derived tables that are joined to in the FROM ...
Guess you need to post the full query SQL.
August 22, 2006 at 11:32 am
Alright, you asked for it . .
select distinct top 100 percent k.off_cod, k.ssn, d2.fullpart, d2.maxhours, d.mid_ini, rtrim(d.fir_nam) as fir_nam, rtrim(d.las_nam) as las_nam,
isnull((select distinct 1 from exc as e where e.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr,
isnull((select distinct 1 from sch as s where s.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isscherr,
isnull((select distinct 1 from exc as a where a.ssn=k.ssn and charindex('?', swm_cod) > 0 and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isman,
case (
isnull((select distinct 1 from sch as b where b.ssn=k.ssn and b.l_sub<b.l_upd and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) +
isnull((select distinct 1 from exc as e where e.ssn=k.ssn and e.l_sub<e.l_upd and e.swm_cod != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0))
when 0 then 0 else 1 end as sub
from cdat d inner join
crep k on d.ssn=k.ssn inner join
cdat2 d2 on d.ssn=d2.ssn
where (d.ssn > 999999) and (k.ssn > 999999) and k.editlock='xx5555'
order by d.las_nam, d.fir_nam
I have simplified the table and column names for increased readibility. I know it's not easy to read in this forum, but if you copy and paste the text into Notepad, then you will see how it's setup a lot easier.
The EXC and SCH tables have about 4-5 millions rows and are pretty wide. The CDAT, CREP and CDAT2 tables have anywhere from 500-20,000 rows. The query runs in about 2-8 seconds depending on the number of dates supplied and how many people you are running it for. I need it to run in less than 2 seconds preferabily.
Any help is appreciated. Thanks.
August 22, 2006 at 11:33 am
Just a note, but the exc and sch tables are being modified by other users, so I'm not sure if a derived table would help.
I have tried to setup the subquerys into views, even views with indexes, but saw no performance increase.
August 22, 2006 at 11:53 am
This is the problem I'm referring to:
select distinct 1
from exc as e
where e.ssn=k.ssn
and rtrim(err) != ''
and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr
In other words, you want the column [isexcerr] to be either 1 or 0, depending on whether at least 1 record can be found in table [exc] ?
Doing this as a SELECT DISTINCT as a sub-SELECT is a performance nightmare in waiting.
You do it in the FROM as follows:
select distinct top 100 percent
k.off_cod, k.ssn, etc etc,
-- Convert the existence of an ssn in the derived table
-- to a 1 or 0
CASE WHEN dt_exc.ssn IS NULL THEN 0 ELSE 1 END As isexcerr
FROM etc etc
-- Left Join to a derived table to get isexcerr
LEFT JOIN (
SELECT e.ssn
FROM exc as e
WHERE rtrim(err) != ''
AND col_dat in ('2006-07-01','2006-07-02', etc .)
GROUP BY e.ssn
) dt_exc -- name it "dt_exc"
ON (dt_exc.ssn = k.ssn)
August 22, 2006 at 12:43 pm
I have tried to convert it into a query that's all joins instead of the subquerys, but I didn't see a performance increase. It was close, but still a couple of seconds slower. Is this what you were referring to?
If I break the query up into 4 seperate queries that join to the three other tables to eliminate the subqueries, they are faster. But the sum of the 4 seperate queries is more than the time of the original query.
Any idea on the != and the IN () operator's sargability?
August 22, 2006 at 1:13 pm
The RTRIM(err) != '' will definitely not be sargable.
What is in the [err] column that forces you to RTRim it ? Can it have leading spaces also ? If it never has leading spaces and always starts with an alpha-numeric, you could try this instead of not equal to. This can potentially use any index on [err]:
Err LIKE '[a-z,0-9]%'
August 22, 2006 at 1:30 pm
I never like to use IN or NOT IN. I will try to avoid it following way. This aprocah always helped me to improve the performance.
SET NOCOUNT ON
DECLARE @MainTable TABLE
(
MyValue VARCHAR(10),
MyDate DATETIME
)
INSERT @MainTable
SELECT 'MyValue 1', '01/01/2006' UNION
SELECT 'MyValue 2', '01/02/2006' UNION
SELECT 'MyValue 3', '01/03/2006' UNION
SELECT 'MyValue 4', '01/04/2006' UNION
SELECT 'MyValue 5', '01/05/2006' UNION
SELECT 'MyValue 6', '01/06/2006' UNION
SELECT 'MyValue 7', '01/07/2006' UNION
SELECT 'MyValue 8', '01/08/2006' UNION
SELECT 'MyValue 9', '01/09/2006' UNION
SELECT 'MyValue 10', '01/10/2006'
/* Using IN */
SELECT * FROM @MainTable
WHERE
MyDate IN ('01/01/2006', '01/03/2006', '01/08/2006')
/* Avoiding IN */
DECLARE @ParmDates TABLE
(
ParamDate DATETIME
)
INSERT @ParmDates
SELECT '01/01/2006' UNION
SELECT '01/03/2006' UNION
SELECT '01/08/2006'
SELECT A.MyValue,
A.MyDate -- A.OtherColumns
FROM
@MainTable A
JOIN
@ParmDates B
ON
A.MyDate = B.ParamDate
/* Using NOT IN */
SELECT * FROM @MainTable
WHERE
MyDate NOT IN ('01/01/2006', '01/03/2006', '01/08/2006')
/* Avoiding NOT IN */
SELECT A.MyValue,
A.MyDate -- A.OtherColumns
FROM
@MainTable A
LEFT OUTER JOIN
@ParmDates B
ON
A.MyDate = B.ParamDate
WHERE
B.ParamDate IS NULL
Regards,
gova
August 22, 2006 at 1:41 pm
PW - the err column is a varchar(100). I haven't used varchar that much, so I just had the RTRIM on there for safety. Would LIKE '[a-z,0-9]%' be the functional equilivant in my situation?
govi - I originaly didn't have the IN list in the subqueries, it was part of the main join like you are suggesting. This is a web app and all of the values that were in the table were already stored locally, so I just had the web app add those values to the IN clause. That change sped the query up by about 50%.
I have had several iterations of this query and have spent much time trying to optimize it. Thanks for all of your replies.
August 22, 2006 at 2:47 pm
I know it'll be difficult but if this is going to run a lot on your system you should:
Good luck!
Zubeyir
August 22, 2006 at 10:59 pm
It's simple... >''
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply