March 31, 2010 at 8:14 am
Hugo Kornelis (3/31/2010)
I would agree that, depending on transaction isolation level and @@TRANCOUNT, the results of the three statements might be different on a busy server. But for your code, with the single SELECT, I would consider anything but three equal values a bug.
I don't see why. The COUNT done at the default isolation level could easily read the same values twice or skip values completely. (Ignore the complications introduced by my use of system tables for the sake of an easy example). Three different scans could easily, in general, produce three different results. The fact that it is a single statement doesn't change that.
And in situations such as the scenario that brought us here, where the actual subquery occurs only once in the submitted query and it's SQL Server itself that expands it, I'd consider it even more buggy!
That I can see, but if we accept that the 'compressed' (simple) CASE expression is purely a syntactic convenience, merely shorthand for the searched form, then clearly the subqueries must be evaluated once per expanded reference, because data access makes the expression non-deterministic.
I agree that this whole area is a very odd duck though - much like the multiply-referenced CTE issue.
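To make that shorthand concrete, here's a minimal sketch (the table name is invented for illustration). The simple CASE on top is defined as equivalent to the searched CASE below it, so the subquery ends up referenced once per WHEN branch:
-- Simple ('compressed') CASE with a subquery as the input expression
SELECT CASE (SELECT COUNT(*) FROM dbo.SomeTable)
           WHEN 0 THEN 'empty'
           WHEN 1 THEN 'one row'
           ELSE 'more than one row'
       END;
-- ...which is shorthand for this searched CASE, where the subquery
-- appears (and may be evaluated) once per WHEN clause:
SELECT CASE
           WHEN (SELECT COUNT(*) FROM dbo.SomeTable) = 0 THEN 'empty'
           WHEN (SELECT COUNT(*) FROM dbo.SomeTable) = 1 THEN 'one row'
           ELSE 'more than one row'
       END;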
March 31, 2010 at 8:28 am
Paul White NZ (3/31/2010)
Hugo Kornelis (3/31/2010)
I would agree that, depending on transaction isolation level and @@TRANCOUNT, the results of the three statements might be different on a busy server. But for your code, with the single SELECT, I would consider anything but three equal values a bug.
I don't see why. The COUNT done at the default isolation level could easily read the same values twice or skip values completely.
No. That would be possible if you use NOLOCK or READ_UNCOMMITTED, but not at the default level (READ_COMMITTED).
Three different scans could easily, in general, produce three different results. The fact that it is a single statement doesn't change that.
Except that I believe that when it is a single statement, there should not be three different scans. Especially BECAUSE that could return inconsistent results.
But I think we'll just have to agree to disagree on this one! 😉
March 31, 2010 at 8:42 am
Hugo Kornelis (3/31/2010)
No. That would be possible if you use NOLOCK or READ_UNCOMMITTED, but not at the default level (READ_COMMITTED).
I know you know this! It has been blogged many times, but here's the entry from your 'blog-colleague' Alex K:
Double/missed reads are definitely possible, and indeed quite likely, even at REPEATABLE READ.
Except that I believe that when it is a single statement, there should not be three different scans. Especially BECAUSE that could return inconsistent results. But I think we'll just have to agree to disagree on this one! 😉
Ah right, I see your point. You think the three identical sub-queries should be 'collapsed' into one execution by SQL Server. Sounds distinctly dodgy to me 🙂
March 31, 2010 at 8:58 am
I think my 'distinctly dodgy' comment warrants a little expansion :blush:
AFAIK, SQL Server has never set out to guarantee anything about a query behaving as if it occurred all-at-once at a single point in time. You can approximate this behaviour, indirectly, using the SERIALIZABLE isolation level. Statement-level consistency can also be achieved using READ_COMMITTED_SNAPSHOT. Transaction-level consistency requires full SNAPSHOT. None of them will explicitly perform the sort of 'expression collapsing' you seem to be after - at least not with anything that accesses data. Compute Scalar operators are frequently used to re-use computed constant values in a query plan, of course.
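For anyone who wants to play with those options, here is a minimal sketch; 'MyDatabase' is just a placeholder name. READ_COMMITTED_SNAPSHOT is a database-level setting, while SNAPSHOT and SERIALIZABLE are chosen per session:
-- Statement-level consistency: row-versioned READ COMMITTED for the whole database
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;
-- Transaction-level consistency: enable, then request, SNAPSHOT isolation
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
-- Lock-based approximation of 'all at once' behaviour for a transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;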
March 31, 2010 at 9:14 am
Good question.
Also, I learned a lot more from the discussion between Paul and Hugo.
---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum
March 31, 2010 at 9:32 am
Trey Staker (3/31/2010)
Also, I learned a lot more from the discussion between Paul and Hugo.
Hey thanks, Trey. Hugo's always good for a quality discussion 🙂
March 31, 2010 at 11:06 am
Paul White NZ (3/31/2010)
I think my 'distinctly dodgy' comment warrants a little expansion :blush:
AFAIK, SQL Server has never set out to guarantee anything about a query behaving as if it occurred all-at-once at a single point in time. You can approximate this behaviour, indirectly, using the SERIALIZABLE isolation level. Statement-level consistency can also be achieved using READ_COMMITTED_SNAPSHOT. Transaction-level consistency requires full SNAPSHOT. None of them will explicitly perform the sort of 'expression collapsing' you seem to be after - at least not with anything that accesses data. Compute Scalar operators are frequently used to re-use computed constant values in a query plan, of course.
This is definitely a gray area. In SQL Server, the default transaction mode is autocommit, which means that each statement is committed when it completes (see http://msdn.microsoft.com/en-us/library/ms187878(v=SQL.100).aspx). That has most frequently been explained to me as "each statement being treated as its own transaction". However, Microsoft's documentation does not explicitly state this.
If each statement were indeed treated as its own transaction, then Hugo's query should always return the same results in all three values, because it's a single statement (even though it has subqueries, they are part of a larger single statement) and the Isolation rule would require that the statement see the data as it was before any concurrent query started, or after any concurrent query finished, but not in some intermediate state.
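A quick sketch of the distinction, with an invented table, in case it helps anyone following along: in autocommit mode each statement is its own transaction, while an explicit transaction groups the statements.
-- Autocommit (the default): each UPDATE commits on its own as soon as it completes
UPDATE TheTable SET SomeColumn = 'A' WHERE RowNumber = 1;
UPDATE TheTable SET SomeColumn = 'B' WHERE RowNumber = 2;
-- Explicit transaction: the two UPDATEs commit (or roll back) together
BEGIN TRANSACTION;
    UPDATE TheTable SET SomeColumn = 'A' WHERE RowNumber = 1;
    UPDATE TheTable SET SomeColumn = 'B' WHERE RowNumber = 2;
COMMIT TRANSACTION;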
March 31, 2010 at 11:56 am
sknox (3/31/2010)
..."each statement being treated as its own transaction". However, Microsoft's documentation does not explicitly state this.
http://msdn.microsoft.com/en-us/library/ms175523.aspx does.
If each statement were indeed treated as its own transaction, then Hugo's query should always return the same results in all three values, because it's a single statement (even though it has subqueries, they are part of a larger single statement) and the Isolation rule would require that the statement see the data as it was before any concurrent query started, or after any concurrent query finished, but not in some intermediate state.
The isolation rule says:
Modifications made by concurrent transactions must be isolated from the modifications made by any other concurrent transactions. A transaction either recognizes data in the state it was in before another concurrent transaction modified it, or it recognizes the data after the second transaction has completed, but it does not recognize an intermediate state. This is referred to as serializability because it results in the ability to reload the starting data and replay a series of transactions to end up with the data in the same state it was in after the original transactions were performed.
But, also see: Isolation Levels in the Database Engine
That link is really important in understanding what the various isolation levels do, and do not, guarantee.
Please also read the blog entry I referred to before, showing how data can be double-read or skipped entirely under READ COMMITTED or REPEATABLE READ.
Paul
March 31, 2010 at 12:08 pm
Paul White NZ (3/31/2010)
Hugo Kornelis (3/31/2010)
No. That would be possible if you use NOLOCK or READ_UNCOMMITTED, but not at the default level (READ_COMMITTED).
I know you know this! It has been blogged many times, but here's the entry from your 'blog-colleague' Alex K:
Double/missed reads are definitely possible, and indeed quite likely, even at REPEATABLE READ.
I've got to correct you AND myself now.
The blog post by Alex that you link to proves that you can get inconsistent results, but it does not read a row twice, nor will it skip a row. The issue in Alex's post is that one connection reads the entire table while another connection runs a transaction that consists of two updates to two different rows. When the first connection reads the first row involved in the update, the second connection has not yet started updating it, so it is not locked; it can be read, and the first connection gets the "old" (but at that time still current) value. A few microseconds later, it tries to read the other row, but by that time it has been locked by the pending update. So the read is blocked until the update transaction is either committed or rolled back. The former is what ends up happening, so the "new" value of the row (which at that time is the current and committed value) is read. I must say that I doubt that this is how Codd and ANSI intended relational databases to work, but it is well-known behaviour.
In order to demonstrate reading the same row twice or skipping a row, you'd have to use a script that gets a SELECT COUNT(*) query wrong, even though the concurrent transaction never inserts or deletes rows. As far as I recall, Itzik Ben-Gan was the first to show that this can happen when the NOLOCK hint is used in the SELECT COUNT(*) query. But when I was googling for the article, I found an excellent description of the issue by Tony Rogerson (who credits Itzik, so I guess my memory is still doing at least some things right). Here is a link: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx.
But while searching for this, I also found another article that Tony wrote where he demonstrates that the same thing can indeed happen under the read committed isolation level. This article was new to me, which is why I contradicted you at first, but I was indeed wrong on this, as shown by Tony here: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/16/1345.aspx.
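A rough two-session sketch of that scenario, under the default READ COMMITTED level (the Accounts table and its values are invented; start the SELECT in session 1 just before running session 2):
-- Session 1: scan the table under the default READ COMMITTED level.
-- A row read before the update starts returns its old committed value;
-- a row locked by session 2 blocks the scan until that transaction commits,
-- after which its new value is returned.
SELECT AccountID, Balance
FROM dbo.Accounts
ORDER BY AccountID;
-- Session 2: one transaction updating two different rows.
BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 100000;
COMMIT TRANSACTION;
-- The SELECT can thus return the old value for one row and the new value for
-- the other: a combination that never existed at any single point in time,
-- even though every individual row was read in a committed state.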
March 31, 2010 at 12:22 pm
sknox (3/31/2010)
If each statement were indeed treated as its own transaction, then Hugo's query should always return the same results in all three values, because it's a single statement (even though it has subqueries, they are part of a larger single statement) and the Isolation rule would require that the statement see the data as it was before any concurrent query started, or after any concurrent query finished, but not in some intermediate state.
Well, as already mentioned in my previous post, this is not how SQL Server works. If 10,000 rows are read, each row is read in a committed state, but the 10,000th row can change between the moment the 1st one is read and the moment the 10,000th row itself is read. The only thing that cannot happen is a row being read with uncommitted data. So if a concurrent transaction executes
BEGIN TRANSACTION;
UPDATE TheTable
SET SomeColumn = 'New Value'
WHERE RowNumber = 10000;
ROLLBACK TRANSACTION;
then under read committed, there is no way the 'New Value' can ever be read. Under read uncommitted, or with the NOLOCK hint (which is the same thing), you can get this value (which logically never existed in the database) returned. You could even get data returned that violates a constraint and that is therefore logically IMPOSSIBLE to exist in the database, since at least some time always passes between the moment the change is made and the moment the constraint violation is discovered and the modification is rolled back.
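To show the flip side of that with the same invented table, a second session running while the transaction above is still open (between the UPDATE and the ROLLBACK) could behave as follows:
-- Dirty read: may return 'New Value', even though that change is later rolled back
SELECT SomeColumn FROM TheTable WITH (NOLOCK) WHERE RowNumber = 10000;
-- Default READ COMMITTED: blocks on the exclusive lock until the rollback
-- completes, then returns the old, committed value
SELECT SomeColumn FROM TheTable WHERE RowNumber = 10000;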
But I'm digressing.
The point I am actually trying to make is not so much about how SQL Server actually works, but about how I believe the relational model is intended to work. For modifications, this is known: regardless of how the engine is implemented, it has to act as if all modifications are carried out in a single, indivisible moment in time. This is why UPDATE Table SET ColA = ColB, ColB = ColA; works and actually swaps the columns' values; in a procedural language, you'd need some work area for intermediate storage.
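A tiny self-contained demonstration of that all-at-once behaviour (using a temporary table just for illustration):
CREATE TABLE #Swap (ColA int, ColB int);
INSERT INTO #Swap (ColA, ColB) VALUES (1, 2);
-- Both right-hand sides are evaluated against the row as it was at the start
-- of the statement, so the two values really are swapped:
UPDATE #Swap SET ColA = ColB, ColB = ColA;
SELECT ColA, ColB FROM #Swap;   -- returns ColA = 2, ColB = 1
DROP TABLE #Swap;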
For SELECT, my gut feeling is that the behaviour should be similar. But I do not know if this is actually dictated by the ANSI standard. I do have a copy of it, but it's quite hard to read and I can't invest the time right now. So, whether backed by ANSI or not, my gut feeling is that if the same data is used multiple times in a query, it should use the same value, regardless of concurrent modifications.
(And, of course, if the reference is only actually used once in the query and some internal process of SQL Server duplicates it somewhere on the road from query to execution plan, having the result changed between the first and second execution is only worse).
March 31, 2010 at 12:34 pm
Hugo Kornelis (3/31/2010)
James Lean (3/31/2010)
CirquedeSQLeil (3/30/2010)
However, you will find that many prefer the isnull due to an increase in performance.
Do you have a supporting link or further information on the performance difference between ISNULL and COALESCE? I was unaware of any significant difference between the two?
COALESCE is implemented as a quite literal implementation of its definition in the ANSI standard (which is: COALESCE(a, b, ...) is equivalent to CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ... ELSE NULL END).
So if any of the arguments is a subquery, the implementation of COALESCE((subquery), ...) becomes CASE WHEN (subquery) IS NOT NULL THEN (subquery) ... ELSE NULL END. And the optimizer that sits a bit further down the line apparently does not realize that the two subqueries are identical, so it happily produces a plan that will evaluate the same subquery twice.
Without the subqueries, I have never witnessed a measurable performance difference.
I haven't taken the testing quite to the same level as some of the tests done by others in the community. Adam Machanic summarizes his testing here.
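To illustrate the expansion Hugo describes above, here's a hypothetical example (the Orders table and the predicate are invented):
-- This COALESCE...
SELECT COALESCE((SELECT MAX(Amount) FROM dbo.Orders WHERE CustomerID = 42), 0);
-- ...is expanded per the ANSI definition into this CASE, in which the subquery
-- is referenced twice and can appear twice in the execution plan:
SELECT CASE
           WHEN (SELECT MAX(Amount) FROM dbo.Orders WHERE CustomerID = 42) IS NOT NULL
           THEN (SELECT MAX(Amount) FROM dbo.Orders WHERE CustomerID = 42)
           ELSE 0
       END;
-- ISNULL((SELECT MAX(Amount) FROM dbo.Orders WHERE CustomerID = 42), 0), by
-- contrast, evaluates the subquery only once.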
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 31, 2010 at 12:39 pm
Hugo Kornelis (3/31/2010)
I've got to correct you AND myself now.
Did you read the comments in Alex's blog entry (including mine) or the previous article?
My point there, as here, is that page splits can cause rows to be read twice or not at all.
I felt sure that you and I had discussed this effect before.
Alex's previous article demonstrates COUNT(*) totals going awry, and IIRC Tony R's blog entry is about roughly the same thing, but I am not a fan 🙂
Please do read the comments on Alex K's blog, because I don't want you to think I am making this up as I go along...;-)
March 31, 2010 at 12:40 pm
James Lean (3/31/2010)
Thanks Hugo. That would seem to be a very specific case; I wouldn't have thought it was a good reason to make a general case for using COALESCE over ISNULL.
Personally, I do tend to use COALESCE, even with only two arguments. That way, if you do need to add more arguments in future, you don't have to worry about changing it. Unless I do need to take advantage of the implicit CASTing to the first argument's type, in which case I'll use ISNULL.
As always, I suppose, "it depends". 😉
My preference is for the use of coalesce, especially in scenarios where I must chain several comparisons; I certainly would not prefer to use isnull in that case. Compare:
isnull(isnull(isnull(somevalue,othervalue),othervalue1),othervalue2)
or
coalesce(somevalue,othervalue,othervalue1,othervalue2)
It seems a lot easier to read and understand as well.
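A small runnable comparison, with made-up values, also showing the typing caveat James mentioned:
DECLARE @somevalue   varchar(10) = NULL,
        @othervalue  varchar(10) = NULL,
        @othervalue1 varchar(10) = 'fallback',
        @othervalue2 varchar(10) = 'last';
-- Both of these return 'fallback':
SELECT ISNULL(ISNULL(ISNULL(@somevalue, @othervalue), @othervalue1), @othervalue2);
SELECT COALESCE(@somevalue, @othervalue, @othervalue1, @othervalue2);
-- Caveat: ISNULL takes its result type from the first argument, so if @somevalue
-- were declared varchar(1), the nested ISNULL would silently truncate 'fallback'
-- to 'f', while COALESCE would use the normal data type precedence rules.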
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 31, 2010 at 12:42 pm
Trey Staker (3/31/2010)
Good question.
Also, I learned a lot more from the discussion between Paul and Hugo.
Agreed - good discussion.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
March 31, 2010 at 1:17 pm
CirquedeSQLeil (3/31/2010)
Adam Machanic summarizes his testing here.
In the comments, Adam notes that there seems to be no difference in SQL Server 2008...