June 27, 2005 at 1:45 pm
I have a table where I cannot get a reliable answer for how many rows have a NULL value for a key field.
CREATE TABLE [dbo].[EBTable] (
[EBID] [int] IDENTITY (1, 1) NOT NULL ,
-- Other fields omitted for clarity
[KeyField] [char] (54) NULL ,
CONSTRAINT [PK_EBTable] PRIMARY KEY CLUSTERED ([EBID]) WITH FILLFACTOR = 100 ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE INDEX [IX_EBTable_KeyField] ON [dbo].[EBTable] ([KeyField]) WITH FILLFACTOR = 100 ON [PRIMARY]
GO
-- Counting all rows in the table is always correct
SELECT COUNT(*) FROM dbo.EBTable
-- -----------
-- 12808937
GO
-- Counting rows with non-null values always works
SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NOT NULL
-- -----------
-- 11135154
GO
-- Counting rows with null values doesn't work
SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NULL
-- -----------
-- 10931004
GO
-- Counting both null and non-null at the same time always works
SELECT NullKey, COUNT(*) as Records
FROM (
SELECT CASE WHEN KeyField IS NULL THEN 1 ELSE 0 END AS NullKey
FROM dbo.EBTable
) x
GROUP BY NullKey WITH ROLLUP
NullKey Records
----------- -----------
0 11135154
1 1673783
NULL 12808937
Obviously, if there are 12.8 million rows and over 11 million have a KeyField value it is impossible to have another 10 million with no KeyField value. What's worse is the NULL row count varies from 9+ million to 12+ million at different times.
The execution plan says all queries are using the index on KeyField. I have used DBCC DBREINDEX, and even dropped and recreated the index, but the only effect is a slight variation in the incorrect answer. If the queries are run without the secondary index, forcing a table scan, the answer is correct.
There is no other activity in the table at this time. CHECKTABLE and CHECKDB can't find any errors in the table or database.
Any ideas?
June 27, 2005 at 1:57 pm
Key field has value as 'NULL'.
It is not counted as NULL. It happens when you make a wrong query to insert nulls
SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NULL OR KeyField = NULL OR KeyField = 'NULL'
Added
SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NOT NULL AND KeyField <> NULL AND KeyField <> 'NULL'
might give the results you are expecting
Regards,
gova
June 27, 2005 at 2:30 pm
For all things that are null, err not null, ergo Unknown.
read todays article
http://www.sqlservercentral.com/columnists/mcoles/fourrulesfornulls.asp
June 27, 2005 at 3:23 pm
Note that all queries use "IS NULL" and "IS NOT NULL", never "=NULL" or "<>NULL".
And I can assure you that none of the fields contain the character string 'NULL'
June 27, 2005 at 10:03 pm
I mis-read the original post.
Can you play around like this and post the results
/* Just to analize the data */
SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NULL -- yes million rows will kill the machine
/* logically should return 0 rows */
SELECT A.*
FROM
(SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NOT NULL) A
JOIN (SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NULL) B
ON A.KeyField = B.KeyField
Regards,
gova
June 28, 2005 at 8:14 am
Have you tried to recreate the non clustered index? Checked for table inconsistency? Maybe you get different query plans for your queries and your index/table is corrupt (which could lead to different resuts)?
June 28, 2005 at 8:45 am
As stated initially, I have run DBCC CHECKDB and CHECKTABLE with no errors found. I have also both reindexed and drop/created the secondary index.
An execution plan that involves an index seek on the secondary index returns the wrong count, while one that uses a table scan or an index scan on the secondary index gives the correct count.
The table had to be updated yesterday so there are no more NULL KeyField values, but as an experiment I copied it and set KeyField = NULL in 1,605,243 rows. The first SELECT COUNT(*) WHERE IS NULL returned 12,284,824 so the problem is persistent.
When I look at the execution plan for SELECT COUNT(*) ...IS NULL, if I hover over the arrow coming from the index seek the popup correctly shows an estimated row count of 1,605,243. But when I executed it, this time the answer was 12,572,104.
I tried SELECT <non-key field> ... WHERE KeyField IS NULL. The execution plan used an index scan to build a hash table and used a parallel hash join with the full table to get the results. The estimated row size is 1,605,243 but it returned 12,500,284 rows.
June 28, 2005 at 9:04 am
Hehe, sorry. Obviously I forgot part that before I started answering
I don't see anything else that I might point out though... But since the value of the COUNT with the IS NULL is varying from execution to execution maybe this is a good time to bring in Microsoft PSS.
June 28, 2005 at 12:31 pm
Have you tried your query with (MAXDOP = 1)?
* Noel
June 28, 2005 at 12:47 pm
This is on a 2-CPU (hyperthreaded) server.
With MAXDOP=1, I get the correct result for SELECT COUNT(*) ... IS NULL (1.6 million). With MAXDOP=2,3, or 4 I get various numbers in the 10-12 million range.
June 28, 2005 at 12:58 pm
Hey Noeld, can you explain why the dual processor is screwing this operation??
June 28, 2005 at 1:31 pm
Assuming that the Poster is not using SP4 (most of us aren't ) he just hit and old bug in SMP systems:
http://support.microsoft.com/default.aspx?scid=kb;en-us;814509
* Noel
June 28, 2005 at 1:41 pm
Thanx... aren't I glad I don't have a dual proc .
June 28, 2005 at 1:41 pm
Crap, I just remembered that the new unoperational server is dual proc .
June 28, 2005 at 1:52 pm
Unfortunately for me I have hit that bug times, in the begining it cost me some hair
* Noel
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply