The dreaded != operator

Question

Post reply

The dreaded != operator

Will1922

SSCertifiable

Points: 7189
More actions
August 22, 2006 at 11:02 am

#153147

How can I speed up a query that uses the != operator? Specifically the equation is a!=''. I remember reading something about not allowing nulls on the column will help the != out some by making it sargable(sp?). I can't find the article. Anyone know the ruling on this?
The main query has three columns that are subquerys on big tables that use the != operator. I need to squeek every bit of performance out. I am working on the indexes and I need to know what modifications I need to do to the subquery tables so that they will run as fast as possible.
The subquerys also use IN ('2006-07-01','2006-07-02', etc . . ), any idea on what changes to assure this is sargable also?
Putting the subqueries into a view won't help because the subquery may have extra added onto the WHERE clause at runtime. I have played around with doing this and not shown any increases in speed.
This query is going to be ran a couple hundread, maybe thousands of times a day, so anything will help.

So, in summary, I am just looking for details on the sargability of the IN () and != operators.

Thanks in advance to any responses to this post.
Live to Throw
Throw to Live
Will Summers

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

PW-201837 SSC-Insane Points: 20805 More actions · Answer 1

>>The main query has three columns that are subquerys on big tables that use the != operator.

So, the query is of the form ...

SELECT

(SELECT Column FROM OtherTable WHERE ...) As Column1,

(SELECT Column FROM SomeOtherTable WHERE ) As Column2,

etc etc

??

If so, you'll get cursor-like performance. It would be better to eliminate the sub-SELECTs and instead construct derived tables that are joined to in the FROM ...

Guess you need to post the full query SQL.

Will1922 SSCertifiable Points: 7189 More actions · Answer 2

Alright, you asked for it . .

select distinct top 100 percent k.off_cod, k.ssn, d2.fullpart, d2.maxhours, d.mid_ini, rtrim(d.fir_nam) as fir_nam, rtrim(d.las_nam) as las_nam,

isnull((select distinct 1 from exc as e where e.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr,

isnull((select distinct 1 from sch as s where s.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isscherr,

isnull((select distinct 1 from exc as a where a.ssn=k.ssn and charindex('?', swm_cod) > 0 and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isman,

case (

isnull((select distinct 1 from sch as b where b.ssn=k.ssn and b.l_sub<b.l_upd and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) +

isnull((select distinct 1 from exc as e where e.ssn=k.ssn and e.l_sub<e.l_upd and e.swm_cod != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0))

when 0 then 0 else 1 end as sub

from cdat d inner join

crep k on d.ssn=k.ssn inner join

cdat2 d2 on d.ssn=d2.ssn

where (d.ssn > 999999) and (k.ssn > 999999) and k.editlock='xx5555'

order by d.las_nam, d.fir_nam

I have simplified the table and column names for increased readibility. I know it's not easy to read in this forum, but if you copy and paste the text into Notepad, then you will see how it's setup a lot easier.

The EXC and SCH tables have about 4-5 millions rows and are pretty wide. The CDAT, CREP and CDAT2 tables have anywhere from 500-20,000 rows. The query runs in about 2-8 seconds depending on the number of dates supplied and how many people you are running it for. I need it to run in less than 2 seconds preferabily.

Any help is appreciated. Thanks.

Live to Throw
Throw to Live
Will Summers

Will1922 SSCertifiable Points: 7189 More actions · Answer 3

Just a note, but the exc and sch tables are being modified by other users, so I'm not sure if a derived table would help.

I have tried to setup the subquerys into views, even views with indexes, but saw no performance increase.

Live to Throw
Throw to Live
Will Summers

PW-201837 SSC-Insane Points: 20805 More actions · Answer 4

This is the problem I'm referring to:

select distinct 1

from exc as e

where e.ssn=k.ssn

and rtrim(err) != ''

and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr

In other words, you want the column [isexcerr] to be either 1 or 0, depending on whether at least 1 record can be found in table [exc] ?

Doing this as a SELECT DISTINCT as a sub-SELECT is a performance nightmare in waiting.

You do it in the FROM as follows:

select distinct top 100 percent

k.off_cod, k.ssn, etc etc,

-- Convert the existence of an ssn in the derived table

-- to a 1 or 0

CASE WHEN dt_exc.ssn IS NULL THEN 0 ELSE 1 END As isexcerr

FROM etc etc

-- Left Join to a derived table to get isexcerr

LEFT JOIN (

SELECT e.ssn

FROM exc as e

WHERE rtrim(err) != ''

AND col_dat in ('2006-07-01','2006-07-02', etc .)

GROUP BY e.ssn

) dt_exc -- name it "dt_exc"

ON (dt_exc.ssn = k.ssn)

Will1922 SSCertifiable Points: 7189 More actions · Answer 5

I have tried to convert it into a query that's all joins instead of the subquerys, but I didn't see a performance increase. It was close, but still a couple of seconds slower. Is this what you were referring to?

If I break the query up into 4 seperate queries that join to the three other tables to eliminate the subqueries, they are faster. But the sum of the 4 seperate queries is more than the time of the original query.

Any idea on the != and the IN () operator's sargability?

Live to Throw
Throw to Live
Will Summers

PW-201837 SSC-Insane Points: 20805 More actions · Answer 6

The RTRIM(err) != '' will definitely not be sargable.

What is in the [err] column that forces you to RTRim it ? Can it have leading spaces also ? If it never has leading spaces and always starts with an alpha-numeric, you could try this instead of not equal to. This can potentially use any index on [err]:

Err LIKE '[a-z,0-9]%'

Roy Rogers SSCertifiable Points: 7712 More actions · Answer 7

I never like to use IN or NOT IN. I will try to avoid it following way. This aprocah always helped me to improve the performance.

SET NOCOUNT ON

DECLARE @MainTable TABLE

(

MyValue VARCHAR(10),

MyDate DATETIME

)

INSERT @MainTable

SELECT 'MyValue 1', '01/01/2006' UNION

SELECT 'MyValue 2', '01/02/2006' UNION

SELECT 'MyValue 3', '01/03/2006' UNION

SELECT 'MyValue 4', '01/04/2006' UNION

SELECT 'MyValue 5', '01/05/2006' UNION

SELECT 'MyValue 6', '01/06/2006' UNION

SELECT 'MyValue 7', '01/07/2006' UNION

SELECT 'MyValue 8', '01/08/2006' UNION

SELECT 'MyValue 9', '01/09/2006' UNION

SELECT 'MyValue 10', '01/10/2006'

/* Using IN */

SELECT * FROM @MainTable

WHERE

MyDate IN ('01/01/2006', '01/03/2006', '01/08/2006')

/* Avoiding IN */

DECLARE @ParmDates TABLE

(

ParamDate DATETIME

)

INSERT @ParmDates

SELECT '01/01/2006' UNION

SELECT '01/03/2006' UNION

SELECT '01/08/2006'

SELECT A.MyValue,

A.MyDate -- A.OtherColumns

FROM

@MainTable A

JOIN

@ParmDates B

ON

A.MyDate = B.ParamDate

/* Using NOT IN */

SELECT * FROM @MainTable

WHERE

MyDate NOT IN ('01/01/2006', '01/03/2006', '01/08/2006')

/* Avoiding NOT IN */

SELECT A.MyValue,

A.MyDate -- A.OtherColumns

FROM

@MainTable A

LEFT OUTER JOIN

@ParmDates B

ON

A.MyDate = B.ParamDate

WHERE

B.ParamDate IS NULL

Regards,
gova

Will1922 SSCertifiable Points: 7189 More actions · Answer 8

PW - the err column is a varchar(100). I haven't used varchar that much, so I just had the RTRIM on there for safety. Would LIKE '[a-z,0-9]%' be the functional equilivant in my situation?

govi - I originaly didn't have the IN list in the subqueries, it was part of the main join like you are suggesting. This is a web app and all of the values that were in the table were already stored locally, so I just had the web app add those values to the IN clause. That change sped the query up by about 50%.

I have had several iterations of this query and have spent much time trying to optimize it. Thanks for all of your replies.

Live to Throw
Throw to Live
Will Summers

Zubeyir SSCrazy Points: 2559 More actions · Answer 9

I know it'll be difficult but if this is going to run a lot on your system you should:

avoid using functions on the joins and where clauses (rtrim like functions)
use exists/not exists instead of in/not in where possible
try to avoid sub-queries (as suggested above by others)

Good luck!

Zubeyir

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 10

It's simple... >''

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)