May 28, 2015 at 12:56 pm
[font="Comic Sans MS"]A while back, a "quirky update" method was proposed for lightning fast running totals based on the three-part MSSQL UPDATE's SET statement and tally tables. However, some claimed this was not 100% absolutely guaranteed behavior.
How does the new OVER clause compare in terms of performance ?
[/font]
[font="Courier New"]DECLARE @Tbl TABLE
(
pk int not null primary key identity,
N int
)
INSERT INTO @Tbl (N) SELECT TOP 1000 1 FROM syscolumns a CROSS JOIN syscolumns b
SELECT pk, SUM(pk) OVER (ORDER BY pk )
FROM @Tbl
[/font]
May 28, 2015 at 1:12 pm
Done correctly (i.e. with ROWS instead of the default RANGE), the enhanced WINDOWING functionality blows away even the Quirky Update method for Running Totals. Here is a great blog post by Aaron Bertrand on a wide array of Running Totals solutions:
http://sqlperformance.com/2012/07/t-sql-queries/running-totals
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 28, 2015 at 1:14 pm
You could check this article by Dwain Camps https://www.simple-talk.com/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
May 28, 2015 at 1:48 pm
It's funny how Aaron and Dwain have different conclusions. I trust that they both know what they're doing, so I would just repeat the tests myself.
However, I might end up using window functions because they perform really well and are easier to write correctly and share.
May 28, 2015 at 2:48 pm
[font="Comic Sans MS"]Went for these links you provided and played around a bit with the ROW UNBOUNDED part I had neglected.
[/font]
[font="Courier New"]DECLARE @Tbl TABLE
(
pk BIGINT not null primary key identity, -- BIGINT necessary to avoid overflows
N int,
NSum BIGINT
)
IF OBJECT_ID('tempdb.dbo.#T2', 'U') IS NOT NULL
DROP TABLE #T2
-- Populate table with 1 million rows
INSERT INTO @Tbl (N) SELECT TOP 1000000 1 FROM syscolumns a CROSS JOIN syscolumns b
-- Use a temp table to store results to allow viewing the results on last rows without
-- sending to the results window the preceeding 1 million – 20 rows, takes too long
SELECT *
INTO #T2
FROM (
SELECT pk, SUM(pk) OVER (ORDER BY pk ROWS UNBOUNDED PRECEDING) AS NSum FROM @Tbl
) AS x
SELECT TOP 20 * FROM #T2 ORDER BY pk DESC --Last line NSUM: 500,000,500,000
[/font]
[font="Comic Sans MS"]Notes:
1.Using[/font] [font="Courier New"]ROWS UNBOUNDED [/font][font="Comic Sans MS"]took 4 seconds for 1 million rows. 10 million rows took 37 seconds
2.NOT Using[/font] [font="Courier New"]ROWS/RANGE UNBOUNDED [/font][font="Comic Sans MS"]took 22 seconds to run for 1 million rows.
3.Using[/font] [font="Courier New"]RANGE UNBOUNDED [/font][font="Comic Sans MS"]took 22 seconds to run for 1 million rows.[/font]
May 28, 2015 at 3:51 pm
I would suspect that the answer to which one is faster "quirky method" or "windowed function", is going to be heavily dependent of the available indexes.
The "quirky method" relies exclusively clustered index (assuming one exists) to establish the order of the running total. There's no real control beyond that.
The windowed function actually requires that you have, what Itzik Ben-Gan calls, a POC index (Partition, Order & Covering) in order to get the best possible performance.
Based on that, I would expect that the windowed function method to have the edge over the quirky method, only if the correct POC index is in place, otherwise, I'd expect the quirky method to blow the windowed function out of the water.
In addition... The reason I would expect the POC'd windowed function to edge out the quirky, is simply because I'd expect that the POC index would NOT contain all of the columns on the table... Which would allow more rows per page... Which means fewer pages. If the POC index covers the full table, I'd expect the two approaches to have similar performance.
That said, I have not yet had a chance to properly test this hypothesis... I'll add it to my "weekend chores" list, unless someone wants to beat me to the punch.
May 28, 2015 at 3:59 pm
Please share your test with us when you have it. 🙂
May 28, 2015 at 6:46 pm
Jason A. Long (5/28/2015)
I would suspect that the answer to which one is faster "quirky method" or "windowed function", is going to be heavily dependent of the available indexes.The "quirky method" relies exclusively clustered index (assuming one exists) to establish the order of the running total. There's no real control beyond that.
The windowed function actually requires that you have, what Itzik Ben-Gan calls, a POC index (Partition, Order & Covering) in order to get the best possible performance.
Based on that, I would expect that the windowed function method to have the edge over the quirky method, only if the correct POC index is in place, otherwise, I'd expect the quirky method to blow the windowed function out of the water.
In addition... The reason I would expect the POC'd windowed function to edge out the quirky, is simply because I'd expect that the POC index would NOT contain all of the columns on the table... Which would allow more rows per page... Which means fewer pages. If the POC index covers the full table, I'd expect the two approaches to have similar performance.
That said, I have not yet had a chance to properly test this hypothesis... I'll add it to my "weekend chores" list, unless someone wants to beat me to the punch.
Jason. Do you have a link to Itzik's article on the subject? I'm especially interested in the POC index but would love to read about insitu.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2015 at 7:06 pm
So... Just knocked out the 1st test...
I wanted to start off with a simple "Apples to Apples" race on a simple 3 column table with a single key clustered index...
/* ================================================================
Create a simple 1M test table with a single key column clustered index
================================================================ */
IF OBJECT_ID('dbo.RunningTotal_1', 'U') IS NOT NULL
DROP TABLE dbo.RunningTotal_1;
WITH n (n) AS (-- creates a million row tally table
SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)
), Tally (n) AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM n n1, n n2, n n3, n n4, n n5, n n6
)
SELECT
ISNULL(DATEADD(mi, t.n, '2010-01-01'), '1900-01-01') AS DateTimeStamp,--ISNULL causes the resulting column to be "NOT NULL" so that it can be used as the tables PK.
CAST(1.0 AS NUMERIC(19,4)) AS TransactionAmount,
CAST(NULL AS NUMERIC(19,4)) AS RunningTotal
INTO dbo.RunningTotal_1
FROM
Tally t;
ALTER TABLE dbo.RunningTotal_1 ADD CONSTRAINT pk_RT1 PRIMARY KEY CLUSTERED (DateTimeStamp);
I used 3 different test scripts... The 1st is the "Quirky Method" and the other two are variations of the "Windowed Function Method". (I just wanted to make sure there was no difference between updating a derived table vs updating a CTE).
SET NOCOUNT ON;
--===========================================================================
--Quirky Method --
DECLARE @rt NUMERIC(19,4) = 0;
UPDATE dbo.RunningTotal_1 SET @rt = RunningTotal = @rt + TransactionAmount;
--===========================================================================
-- Winndowed SUM Method (Update Derived table)
UPDATE x SET x.RunningTotal = x.RT
FROM (
SELECT
rt1.RunningTotal,
SUM(rt1.TransactionAmount) OVER (ORDER BY rt1.DateTimeStamp ROWS UNBOUNDED PRECEDING) AS RT
FROM
dbo.RunningTotal_1 rt1
) x;
--===========================================================================
-- Winndowed SUM Method (Update CTE)
WITH x AS (
SELECT
rt1.RunningTotal,
SUM(rt1.TransactionAmount) OVER (ORDER BY rt1.DateTimeStamp ROWS UNBOUNDED PRECEDING) AS RT
FROM
dbo.RunningTotal_1 rt1
)
UPDATE x SET x.RunningTotal = x.RT;
Note... In this case, the existing clustered index doing double duty as the POC index... (Partition = <no partition>, Order by = DateTimeStamp (PK) & Covering = <the rest of the columns in the table>)
The results are as follows...
Trials 1, 4 & 7 are the quirky method.
Trials 2, 5 & 8 are windowed / derived table
Trials 3, 6 & 9 are windowed / CTE
As you can tell, it was a blood bath... The quirky method is the winner by a huge margin...
Next test will be with a much wider table in order to "fatten up" the clustered index and blow out the number of pages...
May 28, 2015 at 7:21 pm
Jeff Moden (5/28/2015)
Jason A. Long (5/28/2015)
I would suspect that the answer to which one is faster "quirky method" or "windowed function", is going to be heavily dependent of the available indexes.The "quirky method" relies exclusively clustered index (assuming one exists) to establish the order of the running total. There's no real control beyond that.
The windowed function actually requires that you have, what Itzik Ben-Gan calls, a POC index (Partition, Order & Covering) in order to get the best possible performance.
Based on that, I would expect that the windowed function method to have the edge over the quirky method, only if the correct POC index is in place, otherwise, I'd expect the quirky method to blow the windowed function out of the water.
In addition... The reason I would expect the POC'd windowed function to edge out the quirky, is simply because I'd expect that the POC index would NOT contain all of the columns on the table... Which would allow more rows per page... Which means fewer pages. If the POC index covers the full table, I'd expect the two approaches to have similar performance.
That said, I have not yet had a chance to properly test this hypothesis... I'll add it to my "weekend chores" list, unless someone wants to beat me to the punch.
Jason. Do you have a link to Itzik's article on the subject? I'm especially interested in the POC index but would love to read about insitu.
As far as online articles that cover the POC indexes, this is the best one I'm aware of... SQL Server 2012: How to Write T-SQL Window Functions, Part 3
My 1st exposure to the concept, however, was in his book, Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions.
If you're wandering... Yes, IMO it's well worth the asking price...
May 28, 2015 at 7:57 pm
Jeff Moden (5/28/2015)
Jason A. Long (5/28/2015)
I would suspect that the answer to which one is faster "quirky method" or "windowed function", is going to be heavily dependent of the available indexes.The "quirky method" relies exclusively clustered index (assuming one exists) to establish the order of the running total. There's no real control beyond that.
The windowed function actually requires that you have, what Itzik Ben-Gan calls, a POC index (Partition, Order & Covering) in order to get the best possible performance.
Based on that, I would expect that the windowed function method to have the edge over the quirky method, only if the correct POC index is in place, otherwise, I'd expect the quirky method to blow the windowed function out of the water.
In addition... The reason I would expect the POC'd windowed function to edge out the quirky, is simply because I'd expect that the POC index would NOT contain all of the columns on the table... Which would allow more rows per page... Which means fewer pages. If the POC index covers the full table, I'd expect the two approaches to have similar performance.
That said, I have not yet had a chance to properly test this hypothesis... I'll add it to my "weekend chores" list, unless someone wants to beat me to the punch.
Jason. Do you have a link to Itzik's article on the subject? I'm especially interested in the POC index but would love to read about insitu.
I just noticed... It appears that the forum ate my 1st reply... Not sure how or why that happened.
My 1st exposure to the concept of POC indexes was by reading Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions
Since then, I've also found this... SQL Server 2012: How to Write T-SQL Window Functions, Part 3
Hopefully the post will stick this time...
May 28, 2015 at 8:06 pm
Jeff Moden (5/28/2015)
Jason A. Long (5/28/2015)
I would suspect that the answer to which one is faster "quirky method" or "windowed function", is going to be heavily dependent of the available indexes.The "quirky method" relies exclusively clustered index (assuming one exists) to establish the order of the running total. There's no real control beyond that.
The windowed function actually requires that you have, what Itzik Ben-Gan calls, a POC index (Partition, Order & Covering) in order to get the best possible performance.
Based on that, I would expect that the windowed function method to have the edge over the quirky method, only if the correct POC index is in place, otherwise, I'd expect the quirky method to blow the windowed function out of the water.
In addition... The reason I would expect the POC'd windowed function to edge out the quirky, is simply because I'd expect that the POC index would NOT contain all of the columns on the table... Which would allow more rows per page... Which means fewer pages. If the POC index covers the full table, I'd expect the two approaches to have similar performance.
That said, I have not yet had a chance to properly test this hypothesis... I'll add it to my "weekend chores" list, unless someone wants to beat me to the punch.
Jason. Do you have a link to Itzik's article on the subject? I'm especially interested in the POC index but would love to read about insitu.
Jeff - I've tried to reply to your post twice and the forum has eaten both of them... I'm guessing it doesn't like hyperlinks???
Here's the tag-less version...
My 1st exposure to the concept of a POC index was in Itzik's book, "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"... http://www.amazon.com/Microsoft-High-Performance-Functions-Developer-Reference/dp/0735658366
Since then, I've come across this online... SQL Server 2012: How to Write T-SQL Window Functions, Part 3... http://sqlmag.com/sql-server-2012/sql-server-2012-how-write-t-sql-window-functions-part-3
Let's see if 3rd time is the charm...
May 28, 2015 at 9:09 pm
Jason,
It looks like the auto SPAM post hider got ya. I can see all of your posts but I can't make them visible for you. Not sure what the tripwire on these were other than some type of link.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2015 at 9:37 pm
Jeff Moden (5/28/2015)
Jason,It looks like the auto SPAM post hider got ya. I can see all of your posts but I can't make them visible for you. Not sure what the tripwire on these were other than some type of link.
They do contain links... One to Amazon and one to SQL Server Pro. I know I've posted the same Amazon link in another thread, so it maybe doesn't like SQL Server Pro???
In any case, thanks for looking. I thought I was loosing it there for a second. :crazy:
May 30, 2015 at 8:31 am
Jason, I don't think we can call your test apples-to-apples, at least not "real-world" anyway. At least 95% of the real-world running totals cases I have ever seen are a) partitioned by one or more columns b) do not have any index that would help either method and c) are output queries, not updating a column on a table. Also IIRC there are something like 6 rules that MUST be adhered to when doing the Quirky Update and I think you only have one of them (the clustered index).
Whenever you get around to your next test can you construct it along those lines please? No worries if you are like me and too busy to get it set up! 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 15 posts - 1 through 15 (of 33 total)
You must be logged in to reply to this topic. Login to reply