count(*) giving wrong results

  • Hello, I had a strange problem and am wondering if anyone can tell me what could have caused it. Running these queries on a certain table in SQL Server 2000 gave the following results:

    select count(*) from Table1

    -> about 19 million

    select count(*) from Table1 where DateColumn < 'Jan 31, 2008'

    -> about 10.5 million

    select count(*) from Table1 where DateColumn >= 'Jan 31, 2008'

    -> about 10.5 million

    (there are actually only 2 different dates in DateColumn, one from the beginning of January and one from the beginning February)

    note: DateColumn is part of the primary key

    After reindexing the table the first query returned the correct result of about 21 million. Does anyone know what the problem could have been and how it could have happened?

    Thanks

  • Your statistics were probably out of date. Running the UPDATE STATISTICS command would've been a better command to use (first) rather than rebuilding the index.

    You should regularly update statistics on your indexes. The optimizer relies on statistics to determine whether or not to use an index when building a query plan. If the statistics are out of date, it can lead to the optimizer ignoring the index when it should be using it.



    Scott Duncan

    MARCUS. Why dost thou laugh? It fits not with this hour.
    TITUS. Why, I have not another tear to shed;
    --Titus Andronicus, William Shakespeare


  • This COULD be the issue:

    From the BOL:

    COUNT(*) returns the number of rows in a specified table without eliminating duplicates. It counts each row separately, including rows that contain null values.

    That means if you have one DATETIME that is NULL, it will be counted in both counts.

    Try:

    SELECT COUNT(*) FROM Table1

    WHERE DATECOLUMN IS NULL

    If the count returns more than 0, that is your issue.

    -SQLBill

  • Thanks for the responses. I forgot to mention but I did try updating the stats before reindexing, but it didn't help. Also I just checked and there are no nulls.

    I did just finish running dbcc checkdb on a copy of the database from before the "fix", and I got a whole bunch of "Missing or invalid key in index" errors for two of the indexes on that table. I guess somehow the indexes got corrupted, now I can try and find out why.

  • SQLBill (2/11/2008)


    That means if you have one DATETIME that is NULL, it will be counted in both counts.

    -SQLBill

    The OP stated that it was part of the PK (Nulls not allowed).

    Actually I would expect that the index was in some way corrupted giving bad results.

    I have also seen strange results with smalldates

  • auaiomrn (2/12/2008)


    I did just finish running dbcc checkdb on a copy of the database from before the "fix", and I got a whole bunch of "Missing or invalid key in index" errors for two of the indexes on that table. I guess somehow the indexes got corrupted, now I can try and find out why.

    Didn't catch this before I replied, but I can guess that confirms my suspicion

  • If there are nulls, the first query should give result more than the other 2 queries combine because nulls should not be counted in the 2 queries.

    Try use DATEDIFF:

    select count(*) from Table1 where DATEDIFF(dd,DateColumn,'Jan 31, 2008')<0

    select count(*) from Table1 where DATEDIFF(dd,DateColumn,'Jan 31, 2008')>=0

    I am afraid that your SQL sees them as strings, not real date.

    The other possibility might be some of dates have values older than the oldest date posible. I am not sure the number, but it might be only see them as 2 digit years.

  • nugroho (2/13/2008)


    If there are nulls, the first query should give result more than the other 2 queries combine because nulls should not be counted in the 2 queries.

    It was part of the PK. PK's cannot be NULL. So that couldn't have been the issue.

    Try use DATEDIFF:

    select count(*) from Table1 where DATEDIFF(dd,DateColumn,'Jan 31, 2008')<0

    select count(*) from Table1 where DATEDIFF(dd,DateColumn,'Jan 31, 2008')>=0

    I am afraid that your SQL sees them as strings, not real date.

    This is a performance nightmare. You just forced a full scan of the table that has the perfect index.

    I agree that with your idea. Just doing 'Jan 31, 2008' will do an implicit conversion to a "SMALLDATETIME" since there is no time in the string. This could cause issues, but since a rebuild fixed the count, that wouldn't have been the issue either.

    select count(*) from Table1 where DateColumn < CAST('Jan 31, 2008' as DATETIME)

    or even better

    select count(*) from Table1 where DateColumn < CONVERT(DATETIME,'Jan 31, 2008',107)

    FYI, If it can be avoided, NEVER wrap an indexed column with a function

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply