Select with IS NULL on column running slow

  • I have a table with following struncture

    column1 int

    column2 int

    column3 int

    column4 decimal

    there are more than 3 million record in table; column4 is NULLable and has NULL value for more than 1.5 million rows.

    when I include column4 IS NULL in where clause it takes more than 3 minutes to return result whereas whithout where clause it takes 15 seconds.

    As far as Indexes are concern so I have clustered index on column1, column2 and column3.

    I tried creating non-clustered index on column4 but there is no difference in response.

    I need to check NULL for column4 while updating record and also for some report.

    Any suggestion to improve performance please..

    [font="Verdana"]--www.sqlvillage.com[/size][/font]

  • It would help if we could see the query (or queries). Also, are there only 4 columns, or are you you telling us about 4 columns. The DDL for the table and some sample data also could help some.

    😎

  • select * from Table1

    where column4 IS NULL

    There are only four columns

    Sample Data:

    1124100065.16

    1156101101.11

    11781015NULL

    118991015NULL

    11989102834NULL

    [font="Verdana"]--www.sqlvillage.com[/size][/font]

  • Query that performs slow:

    select * from Table1

    where column4 IS NULL

    Sample Data in Table:

    1124100065.16

    1156101101.11

    11781015NULL

    118991015NULL

    11989102834NULL

    [font="Verdana"]--www.sqlvillage.com[/size][/font]

  • Mohan Kumar (3/14/2008)


    Query that performs slow:

    select * from Table1

    where column4 IS NULL

    Sample Data in Table:

    1124100065.16

    1156101101.11

    11781015NULL

    118991015NULL

    11989102834NULL

    This query will be slow, you are only checking column4 to see if it is null, and there are 1.5 + million rows, it will do a table scan. An index on column4 will not help.

    What other queries are you attempting that are too slow, you mention something about updating and testing for null. are the update queries also slow?

    😎

  • Just for informational purposes what is the response of the query if you return rows where column4 is not null?

    I am going to guess that the query optimizer is not using an index because the where clause is not selective enough. It thinks that a table scan will be faster than traversing the index. You could try forcing the index by using from table with (index=index on column4).

    You could also try adding column4 to the clustered index since the non-clustered index includes the clustered index anyway.

  • Can you show us the query plans for both?

    thanks

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You could also try adding column4 to the clustered index since the non-clustered index includes the clustered index anyway.

    I agree this may help. This should give you an index scan, but searching the leaf level only aint so bad ;).

    Additionally, you may consider using a default value 0 in you table, instead of allowing nulls. I think the optimizer will be able to choose better plans, when it is looking for numbers and not nulls.

  • Adam,

    I considered suggesting a default value as well, but, especially with numeric data, I prefer to leave null than to put in 0 as they really do mean different things especially when you consider averages. Also 0 could be a valid value and what if you need a count of all items where column4 = 0. I know using null can be a point of contention, but, particularly with numerics, I prefer them over a default or "dummy' value. Even if you include a number that you know will never be used, you still have to program around that number just like a null, although it may help in an index. Do you know if anyone has tested that?

  • Good to point this out. This is one of those situations where it depends on the business. Calculations can a be a tricky topic because users sometimes believe that the data they are looking at is correct, while in fact NULL mathmatics is rendering results invalid.

    For example, say column4 is payment applied and the "big guys" want to determine what the average payment is. The "big guys" should be aware that the average does not include colums where column4 is null.

    Take a look at the examples below:

    In this example the averages are way off

    create table #t(

    col1 int,

    col2 int,

    col3 int,

    col4 int

    )

    insert into #t

    select 1,124,1000,65.16 union all

    select 1,156,101,101.11 union all

    select 1,178,1015,NULL union all

    select 1,1899,1015,NULL union all

    select 1,1989,102834,NULL

    SELECT avg(col4)

    FROM #t

    update #t

    set col4 = 0

    where col4 is null

    SELECT avg(col4)

    FROM #t

    drop table #t

    using a simple select that fails

    In this example, the number of return rows is incorrect.

    select *

    from #t

    where col4 < 100

    Like I said before, this is not black and white. Using a default value is strickly based on the enviroment. You can alway code your queries to manage NULLs.

    Thanks for pointing this out Jack 😀

  • The above post was for everyone chiming in on the post that may not know about NULL calculations in SQL, not for you Jack.

    Addtionally, this site has a good article.

    http://www.sqlservercentral.com/articles/Advanced+Querying/gotchasqlaggregatefunctionsandnull/1947/

  • Adam Haines (3/15/2008)


    The above post was for everyone chiming in on the post that may not know about NULL calculations in SQL, not for you Jack.

    Addtionally, this site has a good article.

    http://www.sqlservercentral.com/articles/Advanced+Querying/gotchasqlaggregatefunctionsandnull/1947/

    Adam,

    Thanks for posting the example. That is something I need to get better at. I think calculations involving Nulls is one of the most misunderstood and miscoded things in SQL Server. Like anything else you need to plan for it so you are ready for it and handle it correctly.

    For example I worked at a paper mill with 5 paper machines and we needed to calculate average production (total, good, bad). Well it was possible for a machine to have 0 good or 0 bad which could be a null returned in a left join which in this case means 0 and we want that averaged in, but if a machine had no production because it was shut down for business reasons we did not want that count in the average. So you need to know the business rules in these cases. As usual, "It Depends"

  • Heh... I think the OP "left the building".

    The other thing is, ( I don't think anyone mentioned this above) you will never get the same speed using SELECT * no matter what you have for indexes (unless you have a full table index to cover, which is just wrong for so many reasons) because the best you can ever do is get an Index Scan.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I also don't think it would matter in an index if the 1.5+ million rows of null in column4 were set to a default of 0 either. It would still need to scan the table to return every row where column4 = 0.

    Last employer, we had an invoice header table with a bit field (0 = current, 1 = historical) that was indexed. This worked great for us as 95% of our queries was for current invoices, and when you have about 20,000 current vs about 1,000,000 historical; the index was a great asset.

    The index would have been crap if 95% of the queries was against historical data.

    😎

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply