January 30, 2008 at 10:21 pm
Comments posted to this topic are about the item
Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5
[/font][/url]
The new article was published on 10 Nov 2009 and may be found at the following URL...
http://www.sqlservercentral.com/articles/T-SQL/68467/
The new article is basically a rewrite of the old article with some fixes and workarounds for some of the things we've encountered in the last year. A second article is in the works for some of the many uses for the "Quirky Update"... there was just too much in the first article to conveniently fit.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 30, 2008 at 10:25 pm
Watch out with those clustered indices...the order is NOT guaranteed to be the index order.
There are a few cases - out of disk space is one I experienced, parallel query execution et al - that cause the leaf data to be out of order.
Regards
Andy Davies
January 30, 2008 at 10:41 pm
Yep... thanks for the feedback. I've seen that, too. But only on SELECTs... and only on non-clustered indexes. I've never seen it fail on UPDATEs on CLUSTERED index hints. That's why I did all the proofing in the article with the Merry-Go-Round index.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2008 at 1:06 am
Yes, I'd definitely agree that the 'Quirky update' approach is the right one. I've always used an 'order by' but never tried clustered index/hints to force the order. It seems pretty robust too. I've used it in Sybase and, I reckon, all versions of SQL Server in the past.
The strange thing about the 'Quirky Update' technique is that, one has experimented with it, one discovers all sorts of problems that it solves. Directory trees, Book indices (Index, Use of, Updates), financial reporting. Whoever put the feature into Sybase was very far-sighted.
Best wishes,
Phil Factor
January 31, 2008 at 1:48 am
Hello Jeff,
very interesting article.
Somehow it throws all my beliefs into the trash bin:
1. The update method with variables is not explicitly supported by Microsoft
2. The clustered index does not guarantee ordered results
3. Order By in derived tables does not guarantee ordered results.
Your "evidence" does not really convince me. It could be just a lucky day (or a lot of them) ;).
If anyone would have some "official" documentation on these 3 topics, that would be great!
Btw: Did anyone try the "Order By" Solution?
It returns wrong results for me, but maybe did not copy the code correctly (somehow the linefeeds
disappear with c/p).
Best Regards,
Chris Büttner
January 31, 2008 at 3:40 am
I think Andy has a point there. On multiprocessor machine SQL Server might use more threads to scan the index and then merge the result sets in one data stream before executing the update, and this could mess up things. I'd add a maxdop 1 option just to be sure. Even with one thread, I'm not sure that SQL Server would start scanning from the beginning of the index if some data pages are already in memory (perhaps because someone has just finished a full scan of the clustered index). We should use an order by to be sure, but it's not allowed on updates. Here's a trick I use in situations like this: I put a 'greater than' condition on the clustered index column(s) (it works on single column keys, it should work on multicolumn key if we use the first column of the index), where the comparison value is less than any value of the index column (say min(IndexColumnName)-1). This forces the engine to traverse the index looking for the first record matching the condition to start the scan from there. You could also avoid the index hint, unless that column also belongs to other indexes.
Interesting, these are the execution plans without and with the ordering condition AccountID>0:
UPDATE dbo.JBMTest
SET @PrevRunBal = RunBal = @PrevRunBal + Amount,
@PrevAcctID = AccountID
FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)
OPTION(MAXDOP 1)
StmtText -------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]))
|--Compute Scalar(DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]))
|--Top(ROWCOUNT est 0)
|--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]))
UPDATE dbo.JBMTest
SET @PrevRunBal = RunBal = @PrevRunBal + Amount,
@PrevAcctID = AccountID
FROM dbo.JBMTest WITH (INDEX(IX_JBMTest_AccountID_Date),TABLOCKX)
WHERE AccountID>0
OPTION(MAXDOP 1)
StmtText --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Clustered Index Update(OBJECT: ([Northwind].[dbo].[JBMTest].[IX_JBMTest_AccountID_Date]), SET: ([@PrevAcctID]=[Expr1005], [JBMTest].[RunBal]=[Expr1004]), DEFINE: ([Expr1004]=[@PrevRunBal]=[@PrevRunBal]+[JBMTest].[Amount], [Expr1005]=[@PrevAcctID]=[JBMTest].[AccountID]), WHERE: ([JBMTest].[AccountID] > 0))
Quite different, but with the same results in terms of I/O statistics:
Table 'JBMTest'. Scan count 1, logical reads 10186, physical reads 0, read-ahead reads 50.
Ok, time to go back to work.
bye
Salvor
January 31, 2008 at 3:59 am
Very good article, nice testcode. :w00t::cool:
There may be some testing needed regarding parallelism,
maybe for now just add a OPTION(MAXDOP 1).
The only remarks I would add to the conclusion is :
- For now it works and it is the fasted propriatary solution for the problem.
Remeber the views with order by clause that suddenly nolonger
externalized the order in SQL2005, but worked so well in SQL2000.
Thank you for sharing the knowledge.:smooooth:
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
January 31, 2008 at 4:33 am
I have to agree with the naysayers unfortunately. It's great at the moment and really does seem to work (I haven't seen it fail yet either). I was always suspicious and slightly annoyed when people used the select top 100 percent with an order by in a view - dirty hack in my opinion. This could go the same way. Having said that, perhaps we could get someone from the MS Query Engine team to elaborate on
* The update syntax of @variable = col = newValue
* What happens if you have multiple of these in the same update. eg
@variable = col = col + @variable
col2 = col + @variable
Does col2 effectively get col + col + @variable or just col + @variable?
* Forced ordering.
I'm going to run some quick tests on my sql 2k5 installation to see if there are any quirks (I'd be very surprised if I found something that was overlooked by the likes of Jeff though 😛 )
January 31, 2008 at 5:17 am
Christian,
The use of variables in Update statements has always been supported by Microsoft. The full syntax for the original Transact SQL Update statement is...
[font="Courier New"]
update [[database.]owner.]{table_name | view_name}
set [[[database.]owner.]{table_name. | view_name.}]
column_name1 =
{expression1 | null | (select_statement)} |
variable_name1 =
{expression1 | null | (select_statement)}
[, column_name2 = {expression2 | null |
(select_statement)}]... |
variable_name2 = {expression1 | null | (select_statement)}
[from [[database.]owner.] {table_name | view_name}
[, [[database.]owner.] {table_name |
view_name}]]...
[where search_conditions]
[/font]
and the classic example that Sybase gives for assigning variables in an update statement is....
[font="Courier New"]
DECLARE @price money
SELECT @price = 0
UPDATE titles
SET total_sales = total_sales + 1,
@price = price
WHERE title_id = 'BU1032'
SELECT @price, total_sales
FROM titles
WHERE title_id = 'BU1032'
total_sales
------------------------ -----------
19.99 4096
[/font]
It all seems to be well documented, so I reckon it is safe to use!
Best wishes,
Phil Factor
January 31, 2008 at 5:38 am
Hi Phil,
you are correct, it obviously is supported.
I misunderstood a section in a book [1] that covered this type of statement.
In this book it was only mentioned that the variable thing is "far from standard",
but obviously not "unsupported".
Thanks for your hint!
[1] Inside Microsoft SQL Server 2005: T-SQL Querying
Best Regards,
Chris Büttner
January 31, 2008 at 6:15 am
Very interesting article.
I want to add my voice to those urging caution with regard to relying on order being forced by a clustered index. According to the SQL standard, the only way to enforce order is with ORDER BY.
The problem with relying on a clustered index is that it isn't supported: no matter how many times you have tested it, it is not guaranteed to work, and data corruption introduced by a few pages not being in the order in which you expect them is likely to be both subtle and really difficult to fix. I can only imagine trying to diagnose the problem... it makes me queasy just thinking about it.
I will consider using the ORDER BY method, which seems to me to be innovative and safe code.
January 31, 2008 at 6:31 am
Phil, Chris,
Indeed no discussion regarding the use of target variables.
The point is, will its content be correct during the setbased work
if the variable is used more then once in the statement.
Suppose you have more than one price ....
(cfr result data for select from views containing an order by ( before the 'fix' and trace) )
For now it seems to work with correct results, but ... you have to
figure out how to force the process order.
I'm glad Jeff cleared it out in this article :w00t:
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
January 31, 2008 at 6:32 am
Did you test using the Clustered Index with the ORDER BY subquery? It seems to me that the ORDER BY Would enforce the order, but since you are using a Clustered Index it wouldn't cause much overhead unless you were going to get bit by a disordered resultset anyway.
--
JimFive
January 31, 2008 at 7:33 am
Since you installed SQL 2K5, have you looked into the "OVER" clause (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/ddcef3a6-0341-43e0-ae73-630484b7b398.htm) and the Ranking Functions (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm)?
I am curious how they perform against your test. I have used the Ranking Functions, but only on result sets of a few thousand rows and it has performed well for me.
Darrin
January 31, 2008 at 8:03 am
good one - no doubt about it. I hadn't yet seen the merry-go-round in action, but it does make sense now.
From what I've seem, the OVER syntax, and the ranking functions give you no advantage whatsoever on anything except for counts in this particular case. It's in my mind one of the failings/shortcoming of that syntax, since with all it does, you STILL have to do triangle joins or this kind of technique to get any kind of running aggregate other than a count (which you get implicitly through the ROW_NUMBER() function). I am pretty sure that's part of the OVER syntax from Ansi that didn't make the 2005 implementation (as I recall, that OTHER database product does running aggregates using its OVER syntax)..
And - just like you'd expect - using a CTE to try to force the order does nothing for you. It operates the same as a view does (inner ORDER BY is ignored).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Viewing 15 posts - 1 through 15 (of 250 total)
You must be logged in to reply to this topic. Login to reply