January 11, 2005 at 2:58 am
Hello
Here's a challenge 😉
The following query:
select count(*) from Table_T
returns 3804039
select count(*) from Table_T where datearchived is null
The result is 3753136
running the following:
select count(*) from Table_T where datearchived is null and (batch is not null or batch != '')
*should* return fewer rows? It actually returns 3774090!
We transferred the table via DTS to a Test server, & the problem went away. So, last night we detached the db & copied it to a different location, thus hopefully ensuring we have a 'bit copy' of it. We are in the process of copying this file to the test server, where we will attach it.
The current theory is that the dateArchived column contains a rogue value that is terminating the resultset.
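A quick sanity check for out-of-range values might be something like this (just a sketch using the column above; min/max won't prove anything, but should flag obviously bogus dates):
select min(datearchived) as min_val, max(datearchived) as max_val
from Table_T
where datearchived is not null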
Anyone seen this sort of behaviour before? If so, how did you fix it?
David Jackson
January 11, 2005 at 3:05 am
What is the count if you ask like this?
select count(*) from Table_T where datearchived is null and ISNULL(batch, ' ') <> ' '
(remember there is a difference between ' ' and '' - space and empty string)
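A tiny sketch to see that difference (note that the = and <> operators pad trailing spaces when comparing, so DATALENGTH shows the distinction most clearly):
select datalength('')  as empty_string, -- 0 bytes
       datalength(' ') as single_space  -- 1 byte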
/Kenneth
January 11, 2005 at 3:15 am
Thanks for the response
I had a similar one, but for completeness:
select count(*) from Table_t
select count(*) from Table_t where datearchived is null
select count(*) from Table_t where (batch is not null or batch != '') and datearchived is null
select count(*) from Table_t where (batch is not null or batch != '') and isNull(datearchived, 0) = 0
select count(*) from Table_t where (batch is not null or batch != '') and coalesce(datearchived, 0) = 0
select count(*) from Table_t where datearchived is null and ISNULL(batch, ' ') <> ' '
returns:
-----------
3804063
(1 row(s) affected)
-----------
3753156
(1 row(s) affected)
-----------
3774106
(1 row(s) affected)
-----------
3751496
(1 row(s) affected)
-----------
3751496
(1 row(s) affected)
-----------
3688811
(1 row(s) affected)
David 'baffled' Jackson
January 11, 2005 at 4:01 am
Hmmm.. well.. if you take a closer look at this formulation: (batch is not null or batch != '')
What you ask for here is every row where batch is not null (which includes spaces and empty strings), OR every row where batch is not an empty string. The OR makes the condition far looser than you intended, so rows you meant to exclude slip into the count.
The correct syntax for the question that you want to pose is
select count(*) from Table_t where datearchived is null and ISNULL(batch, ' ') <> ' '
That is, count all rows that have datearchived set to null AND batch is anything but a space or null.
/Kenneth
January 11, 2005 at 4:12 am
Just another thought:
Have you used sp_spaceused @updateusage = true or sp_updatestats yet?
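For reference, the calls would look something like this (the table name is just the one from the thread, used as an illustration):
exec sp_spaceused @objname = 'Table_T', @updateusage = 'true'
exec sp_updatestats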
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/
January 11, 2005 at 4:23 am
Here's a quick demo showing the 'flawed' logic.
create table #t (id int not null, batch char(1) null, arch datetime null )
insert #t select 1, null, null
insert #t select 2, ' ', null
insert #t select 3, 'a', null
insert #t select 4, 'b', getdate()
insert #t select 5, 'c', getdate()
select * from #t
id batch arch
----------- ----- --------------------------
1 NULL NULL
2 NULL
3 a NULL
4 b 2005-01-11 10:44:44.857
5 c 2005-01-11 10:44:44.857
(5 row(s) affected)
select * from #t where arch is null
id batch arch
----------- ----- --------------------------
1 NULL NULL
2 NULL
3 a NULL
(3 row(s) affected)
select * from #t where arch is null and (batch is not null or batch <> ' ')
id batch arch
----------- ----- --------------------------
2 NULL
3 a NULL
(2 row(s) affected)
Oops! Not quite what we intended, though it is exactly what we asked for: arch is null AND (batch is not null OR batch is not a space). (OR is the key word here)
select * from #t where arch is null and isnull(batch, ' ') <> ' '
id batch arch
----------- ----- --------------------------
3 a NULL
(1 row(s) affected)
This is more like it
/Kenneth
January 11, 2005 at 5:39 am
Hello
I followed the above, and I can now see where I am counting rows I didn't intend to. So suddenly I'm a lot more comfortable that my db is not corrupt. Just my logic!
But can you explain why the following two queries return different results?
select count(*) from Table_T where (ISNULL(batch, ' ') <> ' ') and ISNULL(datearchived, 1) = 1
select count(*) from Table_T where (ISNULL(batch, ' ') <> ' ') and datearchived is null
-----------
3687529
(1 row(s) affected)
-----------
3688841
(1 row(s) affected)
Also, when I run the original queries on a test server, I get a smaller number the second time round, unlike my production system, which *nearly* always gives a bigger one. Sometimes it is smaller, which is what I expected.
Dave 'still stumped' Jackson
January 11, 2005 at 6:13 am
Which is the greater amount, the first or the second? I suspect it is the first; if so, check your datearchived column for any values of 1/2/1900, as that is what 1 will be implicitly converted to in order to match datatypes with datearchived.
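A quick way to check for such rows might be something like this (a sketch; CONVERT(datetime, 1) is one day after the 1900-01-01 base date):
select count(*) from Table_T
where datearchived = convert(datetime, 1)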
January 11, 2005 at 9:51 am
No, it is the second.
Actually, we think we have found the problem, & a workaround.
The title of the page at http://support.microsoft.com/default.aspx?scid=kb;en-us;814509#kb4
says it all really, "A Parallel Query with a COUNT Aggregate Function May Return Unexpected Results" May!
The workaround is to turn OFF parallelism with the maxdop option. So:
select count(*) from Letter_T where (ISNULL(batch, ' ') <> ' ') and ISNULL(datearchived, 1) = 1 option (maxdop 1)
select count(*) from Letter_T where (ISNULL(batch, ' ') <> ' ') and datearchived is null option (maxdop 1)
Now both return the same.
You cannot test this on a PC with only 1 CPU, as parallelism is never considered by the optimizer there.
This was why I could only see it on my Production server.
We added a new index last week as the SP in question was performing table scans. This has introduced the problem documented at http://support.microsoft.com/default.aspx?scid=kb;en-us;814509#kb4.
Taking the index off means we now get the lower of the two numbers, which is correct.
It now table scans again, but slow & right is better than quick & wrong!
We will evaluate our other procedures to see if the 'new' index will cause the same problem. If so, we need to add option (maxdop 1) to them too.
Moral: Adding indexes can really hurt selects as well as updates & inserts.
Live & learn. Sigh. This has cost us 5-6 man days & a lot of grief.
I hope posting this steers someone in the right direction quicker than we got there.
--
David Jackson
January 12, 2005 at 2:10 am
The general recommendation (from me and some others) since 7.x is to set parallelism OFF by default, and only turn it on for individual queries proven to benefit from it and also to work as expected.
In other words, parallelism should always be off by default. Your case proves this point yet again.
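For anyone wanting to do that server-wide, the setting is 'max degree of parallelism' (an advanced option); setting it to 1 disables parallel plans for the whole instance, e.g.:
exec sp_configure 'show advanced options', 1
reconfigure
exec sp_configure 'max degree of parallelism', 1
reconfigure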
/Kenneth
January 12, 2005 at 3:36 am
😉
The cost of the SP with parallelism & the index that causes the problem (non-clustered) is 16
without parallelism - 33
without the index, with parallelism - 133! (Why we put the index on in the first place).
without the index, without parallelism - 142.
So the parallelism isn't helping as much as the index.
My problem is that if I apply the index & turn off parallelism, I suspect I'll cripple the app we use, which is written in VB6 & no great performer anyway. Going to my boss & telling him I will slow the system down will not be good.
On the other hand, I cannot guarantee data integrity if I don't do something! I am investigating the hotfix, which I have been running locally for a while. Of course, my PC is single CPU, so I've never seen this problem when testing etc.
I have taken the index off while we do an impact assessment on what other queries could be affected.
The pattern is a COUNT aggregate with an IS NULL (or equivalent) in the WHERE clause, so I can look for those.
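A rough way to search for candidates (just a sketch; on SQL 2000 syscomments splits procedure text into 4000-character chunks, so matches that straddle a chunk boundary can be missed):
select distinct object_name(c.id)
from syscomments c
join sysobjects o on o.id = c.id
where o.xtype = 'P'
  and c.text like '%count(%'
  and c.text like '%is null%'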
--
David Jackson
January 12, 2005 at 6:24 am
Unfortunately, if the choice is between 'fast but not correct' and 'slow but correct', and at the same time 'slow' is deemed as 'not working', you're between a rock and a hard place.
The current situation is in a 'not working' status, and the fix is considered as the same.
What you can try, if turning off parallelism is absolutely impossible, is rewriting the query differently to see if that helps. A different plan may help, though you have no control over what plan the server will choose each time. Since parallelism has been proven to be the culprit here, staying with it on seems risky at best.
/Kenneth
January 12, 2005 at 7:19 am
Taking the non-clustered index off has got this going, albeit at the cost of increased query times for any SPs that access that particular table.
As I said, an impact assessment is ongoing, to wit:
Find any SP's that meet this scenario
Run the select code contained in each, multiple times, to see if it returns different results.
If so, run the same code with the option (maxdop 1) added to see if it fixes it.
Fix it.
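One way to sketch that comparison step in T-SQL (a hypothetical example using the column from the earlier posts):
declare @c1 int, @c2 int
select @c1 = count(*) from Table_T where datearchived is null                    -- parallel plan allowed
select @c2 = count(*) from Table_T where datearchived is null option (maxdop 1)  -- forced serial
if @c1 <> @c2 print 'Counts differ - possible parallel COUNT bug'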
We put together a little VB app that runs code/SP's multiple times, and stores the rowcount of the result set it retrieves. If anyone is interested I'll post the code.
Dave 'really fed up with MS for releasing this' Jackson