August 22, 2014 at 6:20 am
Hi. I was recently told that the following block of code:
select a.customerid,
a.customerName
into NewTable
from(
select customerid,
customername
from database.dbo.customer
union
select customerid,
customername
from database_1.dbo.customer
) a
is better performing and preferred to:
select customerid,
customername
into NewTable
from database.dbo.customer
union
select customerid,
customername
from database_1.dbo.customer
Can anyone comment on this and help me understand why putting the query in a subquery is better performing or preferable to just putting the results into a table? Thank you.
August 22, 2014 at 8:45 am
August 22, 2014 at 9:23 am
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version.
I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
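In case it helps anyone check the same thing, here is a rough sketch of how I compared the plans beyond eyeballing the graphical ones: run each version once (dropping NewTable in between) and then look at the plan hash SQL Server recorded for each statement. The database and table names are just the placeholders from my first post.
--===== After running both versions once, compare their query_plan_hash values.
--      Identical hashes mean the optimizer produced the same plan shape for both.
SELECT st.text,
       qs.query_plan_hash
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.text LIKE '%NewTable%'
  AND st.text NOT LIKE '%dm_exec_query_stats%' --Exclude this monitoring query itself
ORDER BY qs.last_execution_time DESC;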
August 22, 2014 at 9:34 am
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Because that is the syntax the architect is comfortable with. There is no valid reason that one is preferred over the other. Those two queries are functionally equivalent and it is no surprise that the execution plans are identical.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
August 22, 2014 at 9:50 am
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.
The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
August 22, 2014 at 10:41 am
ChrisM@Work (8/22/2014)
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.
The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.
+1000
August 23, 2014 at 2:53 pm
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Gosh, be careful now. In this case, it works out, but the Execution plans never prove anything having to do with performance. Not even the ACTUAL execution plans. The numbers that show up in an execution plan are frequently just estimates, and the "% of batch" number can be 100% "wrong" (if not wrong, then very deceiving).
First, build the following test table so we have some known example data to work with...
/**********************************************************************************************************************
Purpose:
Create a voluminous test table with various types of highly randomized data.
--Jeff Moden
**********************************************************************************************************************/
--===== Conditionally drop the test table to make reruns easier
IF OBJECT_ID('tempdb..#JBMTest','U') IS NOT NULL
DROP TABLE #JBMTest
;
--===== Create and populate a 1,000,000 row test table.
-- "SomeID" has a range of 1 to 1,000,000 unique numbers
-- "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- "SomeMoney has a range of 0.00 to 100.00 non-unique numbers
-- "SomeDateTime" has a range of >=01/01/2000 and <01/01/2020 non-unique date/times
-- "SomeDate" has a range of >=01/01/2000 and <01/01/2020 non-unique "whole dates"
-- "SomeName" contains random characters at random lengths from 2 to 20 characters
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID())) % 50000 + 1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A')),
SomeMoney = CAST(RAND(CHECKSUM(NEWID())) * 100 AS DECIMAL(9,2)), --Note rounding
SomeDateTime = RAND(CHECKSUM(NEWID())) * DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),
SomeDate = ABS (CHECKSUM(NEWID())) % DATEDIFF(dd,'2000','2020') + CAST('2000' AS DATETIME),
SomeName = RIGHT(NEWID(),ABS(CHECKSUM(NEWID())) % 19 + 2)
INTO #JBMTest
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
ALTER TABLE #JBMTest
ADD CONSTRAINT PK_#JBMTest PRIMARY KEY CLUSTERED (SomeID)
;
Then, from the same window, run this code with the actual execution plan turned on; the second snippet actually takes twice as long as the first.
--===== Execution plan says these are the same...
-- Are they?
SET STATISTICS TIME,IO ON;
SELECT TOP 10
*
FROM #JBMTest
ORDER BY SomeDate;
SET STATISTICS TIME OFF;
PRINT '=================================================='
SET STATISTICS TIME,IO ON;
SET ROWCOUNT 10
SELECT *
FROM #JBMTest
ORDER BY SomeDate
SET ROWCOUNT 0
SET STATISTICS TIME OFF;
PRINT '=================================================='
Both pieces of code return exactly the same information from the test table. The Actual Execution plan says that both will take "50% of batch". But if you look at the "Messages" tab, you see the true story.
(10 row(s) affected)
Table '#JBMTest____________________________________________________________________________________________________________000000018739'. Scan count 5, logical reads 6831, physical reads 0, read-ahead reads 12, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Narrow"] CPU time = 764 ms, elapsed time = 291 ms.[/font]
==================================================
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
(10 row(s) affected)
Table '#JBMTest____________________________________________________________________________________________________________000000018739'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 1716 ms, elapsed time = 1584 ms.[/font]
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
==================================================
The "% of batch% can actually be 100% wrong as in the following code example. If you don't already have a Tally table, this code will build one for you. If you already have one, then skip that part...
This is to build the Tally table...
--===================================================================
-- Create a Tally table from 1 to 11000
--===================================================================
--===== Create and populate the Tally table on the fly.
SELECT TOP 11000
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM Master.sys.ALL_Columns ac1
CROSS JOIN Master.sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
ALTER TABLE dbo.Tally
ADD CONSTRAINT PK_Tally_N
PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
GRANT SELECT ON dbo.Tally TO PUBLIC
;
Using the Tally table above, this next bit of code shows that the execution plan can be quite deceiving. It appears to say that the first query will take 0% of the batch and the second query takes 100% when nearly the exact opposite is true.
/****************************************************************************************
Purpose:
This code demonstrates that the estimated and actual execution plans in SQL Server can
be 100% INCORRECT and that the execution plan should only be relied on to provide hints
as to what may be wrong with a query rather than an absolute indication. This code runs
in SQL Server 2005 and up only.
The code creates a temp table for 10 years worth of dates starting with 2000-01-01 using
two different methods. The first method uses a recursive CTE and the second method uses
an on-the-fly creation of a "Tally" table. The output of each method is directed to a
"throw-away" variable to take display delays out of the picture.
Please see the separate documentation included in this package for additional explanations
and a copy of the actual execution plan.
--Jeff Moden (28 Sep 2009)
****************************************************************************************/
SET NOCOUNT ON
--=======================================================================================
-- Recursive method shown by (Name withheld)
--=======================================================================================
PRINT '========== Recursive method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @DateVal DATETIME
SET @DateVal = '2000-01-01'
;with mycte as
(
select @DateVal AS DateVal
union all
select DateVal + 1
from mycte
where DateVal + 1 < DATEADD(yy, 30, @DateVal)
)
select @Bitbucket = d.dateval
from mycte d
OPTION (MAXRECURSION 0)
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
--=======================================================================================
-- Tally table method by Jeff Moden
--=======================================================================================
PRINT '========== Tally table method =========='
--===== Turn on some performance counters ===============================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
DECLARE @Bitbucket DATETIME --Holds display output so display times aren't measured.
--===== Execute the code being tested ===================================================
DECLARE @StartDate AS DATETIME
SET @StartDate = '2000-01-01'
SELECT TOP (DATEDIFF(dd,@StartDate,DATEADD(yy,30,@StartDate)))
@Bitbucket = @StartDate-1+t.N
FROM dbo.Tally t
ORDER BY N
--===== Turn off the performance counters and print a separator =========================
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
PRINT REPLICATE('=',90)
GO
Here are the run results. Notice that what was marked as 0% of batch takes almost all of the time and that which was marked as 100% of batch takes almost zero time.
========== Recursive method ==========
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Worktable'. Scan count 2, logical reads 65749, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 93 ms, elapsed time = 109 ms.[/font]
==========================================================================================
========== Tally table method ==========
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Tally'. Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
[font="Arial Black"] CPU time = 0 ms, elapsed time = 2 ms.[/font]
==========================================================================================
The reason for that disparity in that last example is that the execution plan is for just the first iteration of the Recursive CTE.
The ONLY way to prove that one method is better than the other is to run them and measure them.
As to the examples you gave, I join the others in asking what proof the DA has that one should be better than the other. As a man much wiser than I once said, "A single test is worth a thousand expert opinions". With that in mind, here's a test for what you posted...
First, here's the code to build 2 test DBs and populate them with test data...
--=====================================================================================================================
-- Create two test DBs. These may be dropped in their entirety when done testing
--=====================================================================================================================
/***********************
DROP DATABASE TestDB;
DROP DATABASE TestDB1
DROP TABLE #TestData
***********************/
CREATE DATABASE TestDB;
CREATE DATABASE TestDB1;
GO
--=====================================================================================================================
-- Create the test data in each DB
--=====================================================================================================================
--===== Create a common collection of data so that the random names are the same for each ID later on.
SELECT TOP 1000000
CustomerID = IDENTITY(INT,1,1)
,CustomerName = CAST(NEWID() AS VARCHAR(36))
INTO #TestData
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
;
--===== Randomly distribute half into one database
SELECT TOP 500000
*
INTO TestDB.dbo.Customer
FROM #TestData
ORDER BY NEWID()
;
--===== Randomly distribute half into the other database.
-- Note that there will be some overlap with the other database.
SELECT TOP 500000
*
INTO TestDB1.dbo.Customer
FROM #TestData
ORDER BY NEWID()
;
--===== Do a little cleanup
DROP TABLE #TestData
;
And here's the code to test the two methods...
--=====================================================================================================================
-- Test each method from the original post for performance
--=====================================================================================================================
SET STATISTICS TIME,IO ON
;
--===== The altered method
select a.customerid,
a.customerName
into NewTable
from(
select customerid,
customername
from TestDB.dbo.customer
union
select customerid,
customername
from TestDB1.dbo.customer
) a
;
SET STATISTICS TIME,IO OFF
;
DROP TABLE dbo.NewTable
SET STATISTICS TIME,IO ON
;
--===== Original method
select customerid,
customername
into NewTable1
from TestDB.dbo.customer
union
select customerid,
customername
from TestDB1.dbo.customer
;
SET STATISTICS TIME,IO OFF
;
DROP TABLE dbo.NewTable1
;
PRINT '============================================================================================='
PRINT '============================================================================================='
GO 5
What I get on my machine is two nearly identical scripts taking turns winning, except for the first iteration, which doesn't count in this kind of testing.
Beginning execution loop
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2184 ms, elapsed time = 1230 ms.
(750114 row(s) affected)
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2089 ms, elapsed time = 852 ms.
(750114 row(s) affected)
=============================================================================================
=============================================================================================
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1982 ms, elapsed time = 1152 ms.
(750114 row(s) affected)
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2061 ms, elapsed time = 1136 ms.
(750114 row(s) affected)
=============================================================================================
=============================================================================================
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2057 ms, elapsed time = 903 ms.
(750114 row(s) affected)
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2045 ms, elapsed time = 1089 ms.
(750114 row(s) affected)
=============================================================================================
=============================================================================================
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2012 ms, elapsed time = 1036 ms.
(750114 row(s) affected)
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1995 ms, elapsed time = 1012 ms.
(750114 row(s) affected)
=============================================================================================
=============================================================================================
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2013 ms, elapsed time = 1089 ms.
(750114 row(s) affected)
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Customer'. Scan count 5, logical reads 3291, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2090 ms, elapsed time = 1081 ms.
(750114 row(s) affected)
=============================================================================================
=============================================================================================
Batch execution completed 5 times.
I have to agree with the others... changing the code on the recommendation from the DA was a waste of time on my machine. That, notwithstanding, neither you nor the DA will know for sure until you run this test and others (for different scenarios).
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2014 at 3:09 pm
ChrisM@Work (8/22/2014)
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Your architect is incorrect as others have pointed out and the time you spent changing all of your relevant queries was pointlessly wasted.
The exercise isn't a complete waste of time however - you're hopefully encouraged by the responses here to ask the next self-appointed expert who comes along to prove their crazy-*** theory before you waste good company time implementing it. You might also question what significant opportunities for improvement this particular database architect was failing to see.
As already said, +1000!
There's nothing wrong with challenging a DA when it comes to performance because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.
It's a real shame that this DA didn't do a substantial bit of testing before issuing this replacement edict. Reminds me of the "experts" that one of my previous companies hired before I joined... they recommended replacing all cursors with Temp Tables and WHILE loops. No one asked for or offered proof. They spent 6 months doing the replacements and the code actually ran slower.
Titles mean nothing in this area. Trust no one when it comes to such recommendations (not even me). Always ask to see some substantial coded proof or create the tests yourself. You could save the company a whole lot of time and aggravation. If you take the time to do some additional "what if" experimentation in the process, you might actually come up with a way to make a real improvement in performance.
Sorry... I get ticked at such "experts". I'll put the soapbox away now.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2014 at 3:11 pm
Sean Lange (8/22/2014)
There is no valid reason that one is preferred over the other.
Actually, there is 😀 ... the original code is both shorter to write and easier to read, not to mention that keeping it wouldn't have cost the extra time of putting in a change that didn't matter. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2014 at 3:18 pm
BTW, for anyone interested but especially for Cathy...
The following two articles explain the basics of how to quickly and easily build a million rows of test data and the resulting code only takes seconds to execute so you can test over and over, if need be. Let your imagination run wild.
http://www.sqlservercentral.com/articles/Data+Generation/87901/
http://www.sqlservercentral.com/articles/Test+Data/88964/
--Jeff Moden
Change is inevitable... Change for the better is not.
August 23, 2014 at 4:48 pm
Jeff Moden (8/23/2014)
DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.
Hmm???
😎
BTW +100 for the soapbox speech
August 24, 2014 at 9:02 am
Jeff Moden (8/23/2014)
There's nothing wrong with challenging a DA when it comes to performance
up to there I agree with you 100%
because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.
And I don't agree with that at all; a DA had better be able to produce a schema design which enables high-performance code to be written, and if the DA knows nothing about code, and in particular nothing about high-performance code, that DA is not going to be able to do that.
Tom
August 24, 2014 at 2:30 pm
Jeff Moden (8/23/2014)
Cathy DePaolo (8/22/2014)
I checked the execution plans of each, and they are the same. I am of the opinion that the subquery is not going to improve performance of the query, and the execution plans seem to prove me correct. I prefer the straight "select into" version because it is less code and, in my opinion, is a cleaner version. I was wondering if anyone in the community knows why I would be told, by a database architect, that the subquery version of this query would be better performing. I was forced to change all of my queries to use the subquery version, rather than a straight select into. I was not given a reason that satisfied me, so I am hoping someone in this community can tell me why I would be directed in this manner.
Thank you for your time.
Gosh, be careful now. In this case, it works out, but the Execution plans never prove anything having to do with performance. Not even the ACTUAL execution plans.
Sorry, but I'm going to disagree with you there.
If two queries have identical execution plans, then they will (in the absence of blocking or waits) perform identically.
I don't mean same costs (costs are always estimated), I mean identical plans. If the operators are the same and the row counts are the same and the number of executions for each operator is the same, then the two queries will be executed the same way because the plan is the 'recipe' given to the execution engine.
It's a good way to tell if two different ways of writing a query are equivalent. If they produce identical plans, then they have to be logically equivalent.
I enjoy doing this to people who insist on 'subqueries in the from clause are BAD' or 'subqueries in the from clause are good' (I've seen both recently), as I can just write the queries with and without the subquery (logically equivalent ways) and then show that the execution plans are absolutely, 100%, completely identical in every way and hence there's no difference between the two queries by the time they reach the query execution engine.
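To make that concrete, here's a minimal sketch of the kind of pair I mean (the table and column names are just placeholders). Run both with the actual execution plan turned on and you get one identical plan for each, because the optimizer flattens the derived table before the plan is ever built.
--===== Without a derived table
SELECT c.CustomerID,
       c.CustomerName
FROM dbo.Customer c
WHERE c.CustomerID < 100;
--===== With a derived table; the plan comes out identical to the one above
SELECT a.CustomerID,
       a.CustomerName
FROM (
       SELECT CustomerID,
              CustomerName
       FROM dbo.Customer
     ) a
WHERE a.CustomerID < 100;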
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 24, 2014 at 2:31 pm
TomThomson (8/24/2014)
Jeff Moden (8/23/2014)
There's nothing wrong with challenging a DA when it comes to performanceup to there I agree with you 100%
because, and with absolutely no malice in my heart, DAs don't actually need to know how to write a lick of code, never mind high performance code, to do their jobs correctly.
And I don't agree with that at all; a DA had better be able to produce a schema design which enables high-performance code to be written, and if the DA knows nothing about code, and in particular nothing about high-performance code, that DA is not going to be able to do that.
We wouldn't want to leave the design of the internals of a building to the builders would we?
😎
August 24, 2014 at 10:44 pm
Jeff Moden (8/23/2014)
Sean Lange (8/22/2014)
There is no valid reason that one is preferred over the other.Actually, there is 😀 ... the first code is both shorter to write and easier to read, not to mention that the original code didn't cost extra to have a change put in that didn't matter. 😉
The "valid" reasons here are subject to personal preference (other then fewer keystrokes). I would not have changed the original because I too think that is easier to read. I certainly would never force somebody to change their code because I don't like the style in which they wrote it. Now if performance was measurably an issue that would be another story.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/