April 8, 2011 at 8:50 pm
Here's the non-Quirky Update code to do what you want. For those who think a CTE would be better, please read the comments in the code to find out why you might want to reconsider...
--=====================================================================================================================
-- Setup the demonstration. Nothing in this section is a part of the solution. It's just to create test data.
--=====================================================================================================================
--===== Do this demonstration in a nice safe place that everyone has.
USE tempdb;
--===== Conditionally drop all the test tables to make reruns easier
IF OBJECT_ID('tempdb.dbo.tmp_conf_server' ,'U') IS NOT NULL DROP TABLE tempdb.dbo.tmp_conf_server;
--===== Create a simulation of a real table in tempdb
CREATE TABLE dbo.tmp_conf_server
(
RecordedDateTime DATETIME NOT NULL,
ServerName VARCHAR(260) NOT NULL,
LastRestarted DATETIME NOT NULL,
TotalBatchRequests BIGINT NOT NULL,
TotalDeadlocks BIGINT NOT NULL,
TotalCompilations BIGINT NOT NULL,
TotalReCompilations BIGINT NOT NULL,
CONSTRAINT PK_cls_RecordedDateTime_Servername__tmp_conf_server
PRIMARY KEY CLUSTERED (RecordedDateTime ASC, ServerName ASC)
);
--===== Populate the simulated real table with test data
INSERT INTO tempdb.dbo.tmp_conf_server
(RecordedDateTime, ServerName, LastRestarted, TotalBatchRequests, TotalDeadlocks, TotalCompilations, TotalReCompilations)
SELECT '2011-04-05 16:46:46.630','server1','2011-02-25 13:29:00.000',500,12,3050,5555 UNION ALL
SELECT '2011-04-05 17:49:22.350','server1','2011-02-25 13:29:00.000',750,12,4000,6666 UNION ALL
SELECT '2011-04-06 17:34:22.473','server1','2011-02-25 13:29:00.000',810,12,4075,7777 UNION ALL
SELECT '2011-04-05 16:46:47.630','server2','2011-02-27 11:29:00.000',888,5 ,30 ,3333 UNION ALL
SELECT '2011-04-05 17:49:23.350','server2','2011-02-27 11:29:00.000',891,6 ,40 ,5900 UNION ALL
SELECT '2011-04-06 17:34:23.473','server2','2011-02-27 11:29:00.000',950,8 ,7000,5950;
--=====================================================================================================================
-- Solve the problem of getting a "delta" between the current and previous rows for the "INC" columns
-- in a simplified manner and without a "Quirky Update".
--=====================================================================================================================
--===== Conditionally drop all the test tables to make reruns easier
IF OBJECT_ID('tempdb..#tmp_conf_server_Inc','U') IS NOT NULL DROP TABLE #tmp_conf_server_Inc;
--===== Create and populate the table we'll use to calculate the increments with.
-- Notice how ISNULL is used to make a NOT NULL column on the fly.
SELECT RowNum = ISNULL(ROW_NUMBER() OVER (ORDER BY ServerName, RecordedDateTime),0),
ServerName,
RecordedDateTime,
LastRestarted,
TotalBatchRequests,
TotalDeadlocks,
TotalCompilations,
TotalReCompilations,
IncSeconds = CAST(0 AS BIGINT),
IncBatchRequests = CAST(0 AS BIGINT),
IncDeadlocks = CAST(0 AS BIGINT),
IncCompilations = CAST(0 AS BIGINT),
IncReCompilations = CAST(0 AS BIGINT)
INTO #tmp_conf_server_Inc
FROM dbo.tmp_conf_server;
--===== Add a clustered index (as a PK in this case) NOT to do a "Quirky Update" but simply to make the code run faster.
-- As a sidebar, NEVER name a constraint on a #Temp table or it will destroy the ability to do concurrent runs.
ALTER TABLE #tmp_conf_server_Inc
ADD PRIMARY KEY CLUSTERED (RowNum);
--===== You could certainly do this without a Temp Table using a CTE, but it would take twice as long because it would
-- have to execute the CTE twice.
SELECT hi.ServerName,
hi.RecordedDateTime,
hi.LastRestarted,
hi.TotalBatchRequests,
hi.TotalDeadlocks,
hi.TotalCompilations,
hi.TotalReCompilations,
IncSeconds = CASE WHEN hi.ServerName = lo.ServerName THEN DATEDIFF(ss, lo.RecordedDateTime , hi.RecordedDateTime) ELSE NULL END,
IncBatchRequests = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalBatchRequests - lo.TotalBatchRequests ELSE NULL END,
IncDeadlocks = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalDeadlocks - lo.TotalDeadlocks ELSE NULL END,
IncCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalCompilations - lo.TotalCompilations ELSE NULL END,
IncReCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalReCompilations - lo.TotalReCompilations ELSE NULL END
FROM #tmp_conf_server_Inc lo
RIGHT OUTER JOIN #tmp_conf_server_Inc hi ON lo.RowNum+1 = hi.RowNum;
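For reference, here's a minimal sketch of the CTE alternative that the comment above warns about. It produces the same "delta" columns from the same dbo.tmp_conf_server test table; the thing to notice is that the CTE is referenced twice (once as "lo" and once as "hi"), which is why the numbering work typically gets done twice.
--===== Sketch only: the CTE alternative mentioned in the comments above.
--      Because the CTE is referenced twice, SQL Server usually evaluates it twice.
WITH NumberedRows AS
(
 SELECT RowNum = ROW_NUMBER() OVER (ORDER BY ServerName, RecordedDateTime),
        ServerName,
        RecordedDateTime,
        LastRestarted,
        TotalBatchRequests,
        TotalDeadlocks,
        TotalCompilations,
        TotalReCompilations
   FROM dbo.tmp_conf_server
)
 SELECT hi.ServerName,
        hi.RecordedDateTime,
        hi.LastRestarted,
        hi.TotalBatchRequests,
        hi.TotalDeadlocks,
        hi.TotalCompilations,
        hi.TotalReCompilations,
        IncSeconds        = CASE WHEN hi.ServerName = lo.ServerName THEN DATEDIFF(ss, lo.RecordedDateTime, hi.RecordedDateTime) ELSE NULL END,
        IncBatchRequests  = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalBatchRequests - lo.TotalBatchRequests ELSE NULL END,
        IncDeadlocks      = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalDeadlocks - lo.TotalDeadlocks ELSE NULL END,
        IncCompilations   = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalCompilations - lo.TotalCompilations ELSE NULL END,
        IncReCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalReCompilations - lo.TotalReCompilations ELSE NULL END
   FROM NumberedRows lo
  RIGHT OUTER JOIN NumberedRows hi ON lo.RowNum + 1 = hi.RowNum;
On a table of any real size, each reference to the CTE costs you another pass over the data, which is exactly the cost the Temp Table version avoids.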
--Jeff Moden
Change is inevitable... Change for the better is not.
April 10, 2011 at 1:16 am
Jeff Moden (4/8/2011)
Also, be advised. You didn't add the required clustered index to the #Temp table and that may be why your 2 part code didn't work the way you expected. Still, and like I said before, a "Quirky Update" isn't needed for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.
Nadrek,
In case you're wondering, and if I'm not mistaken, the clustered index fields are in the wrong order.
April 10, 2011 at 11:51 am
They're in the incorrect order only on the source table. That's why he copies the rows to another table where he can apply a different clustered index.
Still, you just don't need a QU for this exercise.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 11, 2011 at 9:56 am
No, I'm not trolling; I'm advertising your article! The full context was:
-- Quirky update!!! BEWARE!!! Do not use this for critical things!
-- BEFORE USING, read http://www.sqlservercentral.com/articles/T-SQL/68467/
Jeff Moden (4/8/2011)
Also, be advised. You didn't add the required clustered index to the #Temp table and that may be why your 2 part code didn't work the way you expected. Still, and like I said before, a "Quirky Update" isn't needed for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.
What is the problem with creating the clustered index as the primary key in the CREATE TABLE #tmp_conf_server_Inc statement itself, which was:
CONSTRAINT [PK_cls_RecordedDateTime_Servername__#tmp_conf_server_Inc] PRIMARY KEY CLUSTERED
(
[Anchor],
[ServerName] ASC,
[RecordedDateTime] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, DATA_COMPRESSION = PAGE) ON [PRIMARY]
Was there another problem with my two-part set statement variant?
I see the Rownum based self-join; I'll take a look at how that performs with several hundred thousand rows when I get a chance.
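For anyone who wants to try the same comparison, here's a hedged sketch of one way to inflate the test table from Jeff's post to a few hundred thousand rows. The row counts, server names, and counter arithmetic below are all made-up assumptions, chosen only so the cumulative counters keep increasing over time the way the sample data does.
--===== Sketch only: add roughly 300,000 extra rows (100,000 samples x 3 servers) to the test table.
INSERT INTO tempdb.dbo.tmp_conf_server
        (RecordedDateTime, ServerName, LastRestarted, TotalBatchRequests, TotalDeadlocks, TotalCompilations, TotalReCompilations)
 SELECT RecordedDateTime    = DATEADD(mi, t.N, '2011-04-07'),   -- later than the original sample dates, so no PK collisions
        ServerName          = 'server' + CAST(s.N AS VARCHAR(10)),
        LastRestarted       = '2011-02-25 13:29:00.000',
        TotalBatchRequests  = t.N * 10,   -- made-up cumulative counters that grow with time
        TotalDeadlocks      = t.N / 100,
        TotalCompilations   = t.N * 3,
        TotalReCompilations = t.N * 2
   FROM (SELECT TOP (100000) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
           FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2) t
  CROSS JOIN (SELECT N = 1 UNION ALL SELECT 2 UNION ALL SELECT 3) s;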
April 12, 2011 at 12:01 pm
Nadrek (4/11/2011)
No, I'm not trolling; I'm advertising your article! The full context was:
-- Quirky update!!! BEWARE!!! Do not use this for critical things!
-- BEFORE USING, read http://www.sqlservercentral.com/articles/T-SQL/68467/
"Do not use this for critical things" is an "advertisement"? Not where I come from. 😉
And, yes, I use it for "critical" things.
Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 12, 2011 at 12:11 pm
Jeff Moden (4/12/2011)
"Do not use this for critical things" is an "advertisement"? Not where I come from. 😉
And, yes, I use it for "critical" things.
Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.
Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.
May 11, 2011 at 3:42 pm
Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)
Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.
For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times within a single statement, it is not predictable in which order those assignments will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.
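A minimal, purely hypothetical illustration of that point (the table, column, and variable names below are made up):
-- Hypothetical sketch only: @Total is assigned twice in the same SET list.
-- The order in which the two assignments (and the capture into SomeColumn) are
-- applied for a given row is not guaranteed, so SomeColumn cannot be relied upon here.
DECLARE @Total INT = 0;
UPDATE t
   SET @Total     = @Total + t.Amount,  -- first assignment to @Total
       SomeColumn = @Total,             -- which value of @Total lands here is unpredictable
       @Total     = @Total * 2          -- second assignment to the same variable
  FROM dbo.SomeTable t;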
May 11, 2011 at 5:04 pm
Nadrek (4/12/2011)
Jeff Moden (4/12/2011)
"Do not use this for critical things" is an "advertisement"? Not where I come from. 😉
And, yes, I use it for "critical" things.
Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.
Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.
Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording and would prefer that you not use it again and also change the existing wording.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 11, 2011 at 5:05 pm
ethan.murray (5/11/2011)
Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.
For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times, it is not predictable in which order these statements will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.
Ethan... thank you very much for taking the time to write that up. I appreciate it very much.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 12, 2011 at 1:40 pm
Jeff Moden (5/11/2011)
Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording and would prefer that you not use it again and also change the existing wording.
My apologies; I did not mean any offense. I've edited my code and post with the more precise version.
That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?
May 12, 2011 at 6:58 pm
Nadrek (5/12/2011)
Jeff Moden (5/11/2011)
Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording and would prefer that you not use it again and also change the existing wording.
My apologies; I did not mean any offense. I've edited my code and post with the more precise version.
That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?
Thank you for your consideration.
I'll also go back and try to figure out your question. Thanks, Nadrek.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2011 at 9:20 am
FYI:
I just compared this Quirky Update code to the newest enhancements in the next version of SQL Server (code-named "Denali") CTP3 and found that the Quirky Update currently performs far better than the new enhancements in Denali CTP3 do. (The Quirky Update was not modified to perform the safety check.)
Denali CTP3 code:
-- DENALI CTP3 Running Totals with enhanced over clause.
;
WITH cte AS
(
SELECT TransactionDetailID,
[Date],
AccountID,
Amount,
AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
AccountRunningCount = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
NCID
FROM dbo.TransactionDetail
)
UPDATE td
SET AccountRunningTotal = cte.AccountRunningTotal,
AccountRunningCount = cte.AccountRunningCount
FROM dbo.TransactionDetail td
JOIN cte
ON cte.TransactionDetailID = td.TransactionDetailID
GO
--===== Verify that the running total worked on the
-- TransactionDetail table
USE TEMPDB --Takes 10 seconds on my machine
EXEC dbo.Verify
GO
Results (comparing Quirky Update to new features):
(Not sure why the Quirky Update is showing zero writes - possible bug in Profiler with CTP3?)
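For anyone reading along without the article open, here's a minimal sketch of the Quirky Update side of that comparison, as described in the article this thread discusses. The column data types, the TABLOCKX hint, MAXDOP 1, and a clustered index in (AccountID, Date, TransactionDetailID) order are assumptions taken from the article's rules and the test harness above; this is not the exact tested code, and it omits the safety check.
-- Sketch only: classic Quirky Update running total / running count (no safety check shown).
DECLARE @PrevAccountID       INT   = 0,
        @AccountRunningTotal MONEY = 0,   -- assumed data type for Amount
        @AccountRunningCount INT   = 0;

UPDATE td
   SET @AccountRunningTotal = AccountRunningTotal = CASE WHEN td.AccountID = @PrevAccountID
                                                         THEN @AccountRunningTotal + td.Amount
                                                         ELSE td.Amount END,
       @AccountRunningCount = AccountRunningCount = CASE WHEN td.AccountID = @PrevAccountID
                                                         THEN @AccountRunningCount + 1
                                                         ELSE 1 END,
       @PrevAccountID       = td.AccountID
  FROM dbo.TransactionDetail td WITH (TABLOCKX)  -- rely on the clustered index to drive the update order
OPTION (MAXDOP 1);
The dependence on @PrevAccountID still holding the previous row's value is exactly the kind of assignment-ordering behavior the article's rules exist to police.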
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 14, 2011 at 11:37 am
It's not going to matter... even though I'm seriously disappointed in the performance (as you've measured), the new SUM() is faster than a While Loop, easier to write, does not require a table update, and it's "officially" supported.
I normally don't go near the bleeding edge but I guess I'm going to have to load up Denali CTP3 and do some testing on my own.
Thanks for the early feedback, Wayne. I really appreciate the time you put into this one.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 14, 2011 at 12:11 pm
Wayne, you can avoid the join:
UPDATE target
SET AccountRunningTotal = art,
AccountRunningCount = arc
FROM
(
SELECT
td.AccountRunningTotal,
td.AccountRunningCount,
art = SUM(td.Amount) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID),
arc = COUNT_BIG(*) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID)
FROM dbo.TransactionDetail AS td
) AS target
July 14, 2011 at 1:31 pm
That should run quite a bit faster.
--Jeff Moden
Change is inevitable... Change for the better is not.