The dreaded != operator

  • How can I speed up a query that uses the != operator?  Specifically the equation is a!=''.  I remember reading something about not allowing nulls on the column will help the != out some by making it sargable(sp?).  I can't find the article.  Anyone know the ruling on this?

    The main query has three columns that are subquerys on big tables that use the != operator.  I need to squeek every bit of performance out.  I am working on the indexes and I need to know what modifications I need to do to the subquery tables so that they will run as fast as possible.

    The subquerys also use IN ('2006-07-01','2006-07-02', etc . . ), any idea on what changes to assure this is sargable also?

    Putting the subqueries into a view won't help because the subquery may have extra added onto the WHERE clause at runtime.  I have played around with doing this and not shown any increases in speed.

    This query is going to be ran a couple hundread, maybe thousands of times a day, so anything will help.

     

    So, in summary, I am just looking for details on the sargability of the IN () and != operators.

     

    Thanks in advance to any responses to this post.


    Live to Throw
    Throw to Live
    Will Summers

  • >>The main query has three columns that are subquerys on big tables that use the != operator.

    So, the query is of the form ...

    SELECT

      (SELECT Column FROM OtherTable WHERE ...) As Column1,

      (SELECT Column FROM SomeOtherTable WHERE ) As Column2,

    etc etc

    ??

    If so, you'll get cursor-like performance. It would be better to eliminate the sub-SELECTs and instead construct derived tables that are joined to in the FROM ...

    Guess you need to post the full query SQL.

  • Alright, you asked for it . .

    select distinct top 100 percent k.off_cod, k.ssn, d2.fullpart, d2.maxhours, d.mid_ini, rtrim(d.fir_nam) as fir_nam, rtrim(d.las_nam) as las_nam, 

                              isnull((select distinct 1 from exc as e where e.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr,

                              isnull((select distinct 1 from sch as s where s.ssn=k.ssn and rtrim(err) != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isscherr,

                              isnull((select distinct 1 from exc as a where a.ssn=k.ssn and charindex('?', swm_cod) > 0 and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isman,

                          case (

                              isnull((select distinct 1 from sch as b where b.ssn=k.ssn and b.l_sub<b.l_upd and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) +

                              isnull((select distinct 1 from exc as e where e.ssn=k.ssn and e.l_sub<e.l_upd and e.swm_cod != '' and col_dat in ('2006-07-01','2006-07-02', etc . )), 0))

                           when 0 then 0 else 1 end as sub

    from         cdat d inner join

                          crep k on d.ssn=k.ssn inner join

                          cdat2 d2 on d.ssn=d2.ssn

    where     (d.ssn > 999999) and (k.ssn > 999999) and  k.editlock='xx5555'

    order by d.las_nam, d.fir_nam

    I have simplified the table and column names for increased readibility.  I know it's not easy to read in this forum, but if you copy and paste the text into Notepad, then you will see how it's setup a lot easier.

    The EXC and SCH tables have about 4-5 millions rows and are pretty wide.  The CDAT, CREP and CDAT2 tables have anywhere from 500-20,000 rows.  The query runs in about 2-8 seconds depending on the number of dates supplied and how many people you are running it for.  I need it to run in less than 2 seconds preferabily. 

    Any help is appreciated.  Thanks.


    Live to Throw
    Throw to Live
    Will Summers

  • Just a note, but the exc and sch tables are being modified by other users, so I'm not sure if a derived table would help. 

    I have tried to setup the subquerys into views, even views with indexes, but saw no performance increase.


    Live to Throw
    Throw to Live
    Will Summers

  • This is the problem I'm referring to:

    select distinct 1

              from exc as e

              where e.ssn=k.ssn

              and rtrim(err) != ''

              and col_dat in ('2006-07-01','2006-07-02', etc . )), 0) as isexcerr

    In other words, you want the column [isexcerr] to be either 1 or 0, depending on whether at least 1 record can be found in table [exc] ?

    Doing this as a SELECT DISTINCT as a sub-SELECT is a performance nightmare in waiting.

    You do it in the FROM as follows:

    select distinct top 100 percent

      k.off_cod, k.ssn, etc etc,

      -- Convert the existence of an ssn in the derived table

      -- to a 1 or 0

      CASE WHEN dt_exc.ssn IS NULL THEN 0 ELSE 1 END As isexcerr

    FROM etc etc

    -- Left Join to a derived table to get isexcerr

    LEFT JOIN (

      SELECT e.ssn

      FROM exc as e 

      WHERE rtrim(err) != '' 

      AND col_dat in ('2006-07-01','2006-07-02', etc .)

      GROUP BY e.ssn

    ) dt_exc    -- name it "dt_exc"

      ON (dt_exc.ssn = k.ssn)

  • I have tried to convert it into a query that's all joins instead of the subquerys, but I didn't see a performance increase.  It was close, but still a couple of seconds slower.  Is this what you were referring to?

    If I break the query up into 4 seperate queries that join to the three other tables to eliminate the subqueries, they are faster.  But the sum of the 4 seperate queries is more than the time of the original query. 

     

    Any idea on the != and the IN () operator's sargability?


    Live to Throw
    Throw to Live
    Will Summers

  • The RTRIM(err) != '' will definitely not be sargable.

    What is in the [err] column that forces you to RTRim it ? Can it have leading spaces also ? If it never has leading spaces and always starts with an alpha-numeric, you could try this instead of not equal to. This can potentially use any index on [err]:

    Err LIKE '[a-z,0-9]%'

  • I never like to use IN or NOT IN. I will try to avoid it following way. This aprocah always helped me to improve the performance.

    SET NOCOUNT ON

    DECLARE @MainTable TABLE

    (

    MyValue VARCHAR(10),

    MyDate DATETIME

    )

    INSERT @MainTable

    SELECT 'MyValue 1', '01/01/2006' UNION

    SELECT 'MyValue 2', '01/02/2006' UNION

    SELECT 'MyValue 3', '01/03/2006' UNION

    SELECT 'MyValue 4', '01/04/2006' UNION

    SELECT 'MyValue 5', '01/05/2006' UNION

    SELECT 'MyValue 6', '01/06/2006' UNION

    SELECT 'MyValue 7', '01/07/2006' UNION

    SELECT 'MyValue 8', '01/08/2006' UNION

    SELECT 'MyValue 9', '01/09/2006' UNION

    SELECT 'MyValue 10', '01/10/2006'

    /* Using IN */

    SELECT * FROM @MainTable

    WHERE

     MyDate IN ('01/01/2006', '01/03/2006', '01/08/2006')

    /* Avoiding IN */

    DECLARE @ParmDates TABLE

    (

    ParamDate DATETIME

    )

    INSERT @ParmDates

    SELECT '01/01/2006' UNION

    SELECT '01/03/2006' UNION

    SELECT '01/08/2006'

    SELECT A.MyValue,

     A.MyDate -- A.OtherColumns

    FROM

     @MainTable A

    JOIN

     @ParmDates B

    ON

     A.MyDate = B.ParamDate

    /* Using NOT IN */

    SELECT * FROM @MainTable

    WHERE

     MyDate NOT IN ('01/01/2006', '01/03/2006', '01/08/2006')

    /* Avoiding NOT IN */

    SELECT A.MyValue,

     A.MyDate -- A.OtherColumns

    FROM

     @MainTable A

    LEFT OUTER JOIN

     @ParmDates B

    ON

     A.MyDate = B.ParamDate

    WHERE

     B.ParamDate IS NULL

    Regards,
    gova

  • PW - the err column is a varchar(100).  I haven't used varchar that much, so I just had the RTRIM on there for safety.  Would LIKE '[a-z,0-9]%' be the functional equilivant in my situation?

    govi - I originaly didn't have the IN list in the subqueries, it was part of the main join like you are suggesting.  This is a web app and all of the values that were in the table were already stored locally, so I just had the web app add those values to the IN clause.  That change sped the query up by about 50%.

    I have had several iterations of this query and have spent much time trying to optimize it.  Thanks for all of your replies.

     


    Live to Throw
    Throw to Live
    Will Summers

  • I know it'll be difficult but if this is going to run a lot on your system you should:

    • avoid using functions on the joins and where clauses (rtrim like functions)
    • use exists/not exists instead of in/not in where possible
    • try to avoid sub-queries (as suggested above by others)

    Good luck!

    Zubeyir

  • It's simple... >''

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply