October 31, 2009 at 3:41 pm
Attachment added to original post... I know I had it there the first time (it was the first thing I did so that I wouldn't forget), but I wonder what happened to it.
I'm also going to do a quick test of Paul's newest 2008 code... trying to let him redeem himself... 😉
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 31, 2009 at 4:31 pm
Thanks Wayne, I've got the attachment. I don't think Paul has to redeem himself. He submitted a very original solution which seems, based on the execution plan, competitive to the other solutions. Maybe the optimizer engine will always be a mystery to us.
Peter
October 31, 2009 at 7:35 pm
Even though I posted this looking for a SQL 2005 solution, I tried out Paul's newest code. And I'm happy to announce that this new bit of code has redeemed him... it turns out to be the fastest code of all. See SSCTest5.trc and SSCTest3.sqlplan in the attachment for the performance data and execution plan. (Note: for this test, I started out with a larger tempdb, and ran the three contenders. None needed to increase it any.) The total time on the million-row test was 3.5 mins, beating Peter's Cross-Join by 45 seconds. It had slightly larger CPU time, and smaller reads. Surprisingly (to me), it had a simpler execution plan. And to top it all off, it was simpler to code. It's just too bad that I need a 2005 solution, or I'd surely be using this.
Now, I need to go study up on the values statement he used in that Cross-Apply. That's one of the things I love about this site... you will always find something else new that you can learn from.
Well done Paul!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 31, 2009 at 11:01 pm
Elegant and fast. Can't imagine a better combination. But I've come to expect that from Paul.
Wayne. Just out of curiosity, how fast would the 2008 solution run without the order by clause? The sort in the execution plan constituted the lion's share of the workload.
Notes to self:
Cross Apply works with any derived table(?)(!) I've used it with subqueries as well as table valued functions, but with the values clause it's just an internal table of constants.
The values clause can include columns which are functions of outside values. From BOL:
The values are specified as a comma-separated list of scalar expressions
The combination produces a the equivalent of a cross join that can calculate new values with no need for CASE expressions. That... just...plain...rocks.
My company will be on 2008 in production before the year is out and that 2008 solution is query is going into my toolbox. We have a huge monthly calculation job that this could speed up significantly. Guys, I want to say again that it's been a real pleasure to follow this thread. My sincere respects go out to you all.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
November 1, 2009 at 9:42 am
Bob Hovious 24601 (10/31/2009)
Wayne. Just out of curiosity, how fast would the 2008 solution run without the order by clause? The sort in the execution plan constituted the lion's share of the workload.
Pretty darn fast. 50 secs.
To be fair, I tried all three without the ordering. Peter's Cross-Join jumps to just under 2 minutes. Bob, for some reason, your solution increased to 16 minutes... must be something going on with my laptop. I'll try again later.
See SSCTest6.trc & SSCTest6.sqlplan for trace and execution plan. The plan for Paul's cross-apply w/ values is pretty impressive.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 1, 2009 at 1:44 pm
Hey everyone,
Although the 2008 method is undoubtedly impressive, I'm very aware of the fact that it is a relatively small tweak to the original (awesome) solution from Peter. He should get the credit for the method really.
I also just wanted to add a couple of points about the performance tests:
1. The UNPIVOT/PIVOT uses a lot of tempdb because the QO inserts a pre-aggregate for the final PIVOT. This appears in the plan as a Hash Match (Partial Aggregate). The full reasons behind it are rather complex, but suffice to say it isn't at all optimal with this data set! Adding an OPTION (ORDER GROUP) to the query removes the tempdb blow-out problem, and speeds things up enormously.
2. The UNPIVOT/PIVOT will never be as fast as the other methods for large data sets. Because it splits the old and new values into separate rows, the intermediate row sizes are twice as large as the other methods. I included it mainly because the concept was interesting, and also to illustrate how clever the QO can be when applying transformations. It will never be the fastest way.
3. Peter's method also suffers at the hands of the optimizer for large sets. The lazy spool on the #Update table is a bit silly - we can fix it by adding OPTION (FORCE ORDER) to the query text, or by creating a (very reasonable) PRIMARY KEY on the RowID column of the #Update table.
The re-written 2008 syntax still produces the best plan, but I thought I should add the above comments to ensure we are fair in our assessments 🙂
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 1, 2009 at 6:34 pm
Don't worry, Paul. There is plenty of respect to go around and I'm taking none of it away from Peter. Or Wayne.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
November 1, 2009 at 7:17 pm
Paul White (11/1/2009)
or by creating a (very reasonable) PRIMARY KEY on the RowID column of the #Update table.
I was going to ask what kind of indexes you'll thought would be of use in this query. I hadn't pushed it since I figured that since I'm working with the entire table, a scan is probably what it would end up with anyway. Guess it's time to test some more...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 2, 2009 at 1:44 am
Bob Hovious 24601 (11/1/2009)
Don't worry, Paul. There is plenty of respect to go around and I'm taking none of it away from Peter. Or Wayne.
Same here. You guys all rock!
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 2, 2009 at 1:53 am
WayneS (11/1/2009)
I was going to ask what kind of indexes you'll thought would be of use in this query. I hadn't pushed it since I figured that since I'm working with the entire table, a scan is probably what it would end up with anyway. Guess it's time to test some more...
Well I'm pretty sure Jeff would argue that every table needs a primary key 😉 but you're right - the index only serves to give the optimizer more information about the data (row_id is unique). The query hints seem to produce the same benefits for no cost.
We could maybe look at doing some stuff with indexed computed columns, but I'd be surprised if any sort of indexing would end up saving more than it cost. In any event, I'd guess that the 2008-enhanced version of Peter's method running a parallel plan will be impossible to beat :w00t:
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 2, 2009 at 6:10 pm
Paul White (11/1/2009)
Although the 2008 method is undoubtedly impressive, I'm very aware of the fact that it is a relatively small tweak to the original (awesome) solution from Peter. He should get the credit for the method really.
You know, it is this attitude that I see so much from the "regulars" here, and it's one of the reasons I love this site. But Paul, none of the others participating in this thread saw this... YOU did. YOU realized that this little tweak could be done, and that it would make a huge difference. So Paul, it might have been a "minor" tweak to an already good solution, but YOU made it, and ended up with the best plan yet (so far...). So, take credit where credit is due.
...but I thought I should add the above comments to ensure we are fair in our assessments.
and these are some excellent comments. My question is... how have you figured out all these different options? They aren't used very much...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
November 3, 2009 at 12:43 pm
Paul White (11/1/2009)
3. Peter's method also suffers at the hands of the optimizer for large sets. The lazy spool on the #Update table is a bit silly - we can fix it by adding OPTION (FORCE ORDER) to the query text, or by creating a (very reasonable) PRIMARY KEY on the RowID column of the #Update table.
Finally got the time to look at the test results. Great job Wayne, also very useful comments Paul. I'm learning all the time. About the lazy spool this. In my original cross join solution I used
cross join
(
select 1 column_no union all select 2 union all select 3
) n
which was changed into
CROSS JOIN (select TOP 30 column_no = row_number() over (order by object_id) from sys.objects) n
for the 30 column problem, which looks like the problem here. The Constant Scan is abandonned and the order of both table scans is reversed.
I suggest using the original code expanded to 30 constants i.e.
CROSS JOIN (
select 1 column_no union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11 union all
select 12 union all
select 13 union all
select 14 union all
select 15 union all
select 16 union all
select 17 union all
select 18 union all
select 19 union all
select 20 union all
select 21 union all
select 22 union all
select 23 union all
select 24 union all
select 25 union all
select 26 union all
select 27 union all
select 28 union all
select 29 union all
select 30
) n
This will again result in a Constant Scan in the inner loop. It takes some coding but it's consistent and no need for query hints.
Peter
November 5, 2009 at 12:42 am
WayneS (11/2/2009)
You know, it is this attitude that I see so much from the "regulars" here, and it's one of the reasons I love this site. But Paul, none of the others participating in this thread saw this... YOU did. YOU realized that this little tweak could be done, and that it would make a huge difference. So Paul, it might have been a "minor" tweak to an already good solution, but YOU made it, and ended up with the best plan yet (so far...). So, take credit where credit is due.
Ok, ok - appropriate credit taken 😀
Just make sure you take credit for the test rigs, testing, and publishing of the results!
WayneS (11/2/2009)
My question is... how have you figured out all these different options? They aren't used very much...
Just past experience with the optimizer! Once you put yourself in its place, and try to reason out why it made the choices it made, it becomes relatively easy to add appropriate hints to lead it in the right direction - as sometimes is necessary. The optimizer generally makes good choices given the information it has available. Sometimes though, the DBA knows something about the data that can only be passed on to the optimizer through the use of hints, extra statistics, or a constraint (like the PK I used, for example). More on this later.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 5, 2009 at 3:34 am
Here is the 2005-syntax equivalent of the 2008 code I posted (it produces an identical plan on 2005 to the 2008 version):
SELECT U.row_id, CA.column_nm, CA.old, CA.new
FROM #Update U
CROSS
APPLY (
SELECT 'column01', CONVERT(SQL_VARIANT, U.column01_old), CONVERT(SQL_VARIANT, U.column01_new) UNION ALL
SELECT 'column02', U.column02_old, U.column02_new UNION ALL
SELECT 'column03', U.column03_old, U.column03_new UNION ALL
SELECT 'column04', U.column04_old, U.column04_new UNION ALL
SELECT 'column05', U.column05_old, U.column05_new UNION ALL
SELECT 'column06', U.column06_old, U.column06_new UNION ALL
SELECT 'column07', U.column07_old, U.column07_new UNION ALL
SELECT 'column08', U.column08_old, U.column08_new UNION ALL
SELECT 'column09', U.column09_old, U.column09_new UNION ALL
SELECT 'column10', U.column10_old, U.column10_new UNION ALL
SELECT 'column11', U.column11_old, U.column11_new UNION ALL
SELECT 'column12', U.column12_old, U.column12_new UNION ALL
SELECT 'column13', U.column13_old, U.column13_new UNION ALL
SELECT 'column14', U.column14_old, U.column14_new UNION ALL
SELECT 'column15', U.column15_old, U.column15_new UNION ALL
SELECT 'column16', U.column16_old, U.column16_new UNION ALL
SELECT 'column17', U.column17_old, U.column17_new UNION ALL
SELECT 'column18', U.column18_old, U.column18_new UNION ALL
SELECT 'column19', U.column19_old, U.column19_new UNION ALL
SELECT 'column20', U.column20_old, U.column20_new UNION ALL
SELECT 'column21', U.column21_old, U.column21_new UNION ALL
SELECT 'column22', U.column22_old, U.column22_new UNION ALL
SELECT 'column23', U.column23_old, U.column23_new UNION ALL
SELECT 'column24', U.column24_old, U.column24_new UNION ALL
SELECT 'column25', U.column25_old, U.column25_new UNION ALL
SELECT 'column26', U.column26_old, U.column26_new UNION ALL
SELECT 'column27', U.column27_old, U.column27_new UNION ALL
SELECT 'column28', U.column28_old, U.column28_new UNION ALL
SELECT 'column29', U.column29_old, U.column29_new UNION ALL
SELECT 'column30', U.column30_old, U.column30_new
) CA (column_nm, old, new)
ORDER BY U.row_id, CA.column_nm;
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
November 5, 2009 at 3:57 am
Finally, here's my script for what I believe to be the absolute fastest possible way to populate the table and return the desired results:
The key point is that all the sorts are eliminated...:w00t:
USE tempdb;
GO
-- Drop the temporary table if it exists
IF OBJECT_ID(N'tempdb..#Update', N'U') IS NOT NULL DROP TABLE #Update;
GO
-- Create the temporary table, complete with primary key
CREATE TABLE #Update
(
row_id BIGINT NOT NULL PRIMARY KEY CLUSTERED,
column01_old DATETIME NOT NULL,
column01_new DATETIME NOT NULL,
column02_old BIT NOT NULL,
column02_new BIT NOT NULL,
column03_old DECIMAL(10,0) NOT NULL,
column03_new DECIMAL(10,0) NOT NULL,
column04_old DATETIME NOT NULL,
column04_new DATETIME NOT NULL,
column05_old BIT NOT NULL,
column05_new BIT NOT NULL,
column06_old DECIMAL(10,0) NOT NULL,
column06_new DECIMAL(10,0) NOT NULL,
column07_old DATETIME NOT NULL,
column07_new DATETIME NOT NULL,
column08_old BIT NOT NULL,
column08_new BIT NOT NULL,
column09_old DECIMAL(10,0) NOT NULL,
column09_new DECIMAL(10,0) NOT NULL,
column10_old DATETIME NOT NULL,
column10_new DATETIME NOT NULL,
column11_old DATETIME NOT NULL,
column11_new DATETIME NOT NULL,
column12_old BIT NOT NULL,
column12_new BIT NOT NULL,
column13_old DECIMAL(10,0) NOT NULL,
column13_new DECIMAL(10,0) NOT NULL,
column14_old DATETIME NOT NULL,
column14_new DATETIME NOT NULL,
column15_old BIT NOT NULL,
column15_new BIT NOT NULL,
column16_old DECIMAL(10,0) NOT NULL,
column16_new DECIMAL(10,0) NOT NULL,
column17_old DATETIME NOT NULL,
column17_new DATETIME NOT NULL,
column18_old BIT NOT NULL,
column18_new BIT NOT NULL,
column19_old DECIMAL(10,0) NOT NULL,
column19_new DECIMAL(10,0) NOT NULL,
column20_old DATETIME NOT NULL,
column20_new DATETIME NOT NULL,
column21_old DATETIME NOT NULL,
column21_new DATETIME NOT NULL,
column22_old BIT NOT NULL,
column22_new BIT NOT NULL,
column23_old DECIMAL(10,0) NOT NULL,
column23_new DECIMAL(10,0) NOT NULL,
column24_old DATETIME NOT NULL,
column24_new DATETIME NOT NULL,
column25_old BIT NOT NULL,
column25_new BIT NOT NULL,
column26_old DECIMAL(10,0) NOT NULL,
column26_new DECIMAL(10,0) NOT NULL,
column27_old DATETIME NOT NULL,
column27_new DATETIME NOT NULL,
column28_old BIT NOT NULL,
column28_new BIT NOT NULL,
column29_old DECIMAL(10,0) NOT NULL,
column29_new DECIMAL(10,0) NOT NULL,
column30_old DATETIME NOT NULL,
column30_new DATETIME NOT NULL,
);
GO
-- Generate test rows (5,000 in this example)
-- There is *no sort* in the plan!
;WITH CTE AS
(
SELECT TOP (5000)
n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM
master.sys.allocation_units A1,
master.sys.allocation_units A2,
master.sys.allocation_units A3
ORDER BY
n ASC
)
INSERT #Update
SELECT CTE.N,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'),
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'),
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01'), 0, 1, CTE.N + 0.1, CTE.N + 0.3,
DATEADD(MINUTE, CTE.N, '2000-01-01'), DATEADD(HOUR, CTE.N, '2000-01-01')
FROM CTE
ORDER BY n ASC;
GO
-- The final query (again no sort!)
SELECT U.row_id, CA.column_nm, CA.old, CA.new
FROM #Update U WITH (TABLOCK)
CROSS
APPLY (
SELECT 'column01', CONVERT(SQL_VARIANT, U.column01_old), CONVERT(SQL_VARIANT, U.column01_new) UNION ALL
SELECT 'column02', U.column02_old, U.column02_new UNION ALL
SELECT 'column03', U.column03_old, U.column03_new UNION ALL
SELECT 'column04', U.column04_old, U.column04_new UNION ALL
SELECT 'column05', U.column05_old, U.column05_new UNION ALL
SELECT 'column06', U.column06_old, U.column06_new UNION ALL
SELECT 'column07', U.column07_old, U.column07_new UNION ALL
SELECT 'column08', U.column08_old, U.column08_new UNION ALL
SELECT 'column09', U.column09_old, U.column09_new UNION ALL
SELECT 'column10', U.column10_old, U.column10_new UNION ALL
SELECT 'column11', U.column11_old, U.column11_new UNION ALL
SELECT 'column12', U.column12_old, U.column12_new UNION ALL
SELECT 'column13', U.column13_old, U.column13_new UNION ALL
SELECT 'column14', U.column14_old, U.column14_new UNION ALL
SELECT 'column15', U.column15_old, U.column15_new UNION ALL
SELECT 'column16', U.column16_old, U.column16_new UNION ALL
SELECT 'column17', U.column17_old, U.column17_new UNION ALL
SELECT 'column18', U.column18_old, U.column18_new UNION ALL
SELECT 'column19', U.column19_old, U.column19_new UNION ALL
SELECT 'column20', U.column20_old, U.column20_new UNION ALL
SELECT 'column21', U.column21_old, U.column21_new UNION ALL
SELECT 'column22', U.column22_old, U.column22_new UNION ALL
SELECT 'column23', U.column23_old, U.column23_new UNION ALL
SELECT 'column24', U.column24_old, U.column24_new UNION ALL
SELECT 'column25', U.column25_old, U.column25_new UNION ALL
SELECT 'column26', U.column26_old, U.column26_new UNION ALL
SELECT 'column27', U.column27_old, U.column27_new UNION ALL
SELECT 'column28', U.column28_old, U.column28_new UNION ALL
SELECT 'column29', U.column29_old, U.column29_new UNION ALL
SELECT 'column30', U.column30_old, U.column30_new
) CA (column_nm, old, new)
ORDER BY U.row_id
OPTION (MAXDOP 1);
I have coded this example for 5K rows (my laptop is no match for Wayne's machine!) but nothing changes with 1M rows...or more.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 15 posts - 31 through 45 (of 50 total)
You must be logged in to reply to this topic. Login to reply