December 29, 2014 at 6:46 pm
sqldriver (12/29/2014)
Jeff Moden (12/29/2014)
sqldriver (12/29/2014)
I'm not exactly sure how to test that, but I do have a 2014 Dev edition installed, and ran the below code. If there's something in particular you'd like me to try to confirm this behavior, let me know.
SET NOCOUNT ON
SET STATISTICS IO, TIME ON
/*
SELECT @@VERSION
Microsoft SQL Server 2014 - 12.0.2456.0 (X64)
Dec 11 2014 17:32:03
Copyright (c) Microsoft Corporation
Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
*/
USE AdventureWorks2014
;
WITH pp
AS ( SELECT p.BusinessEntityID ,
p.Title ,
p.FirstName ,
p.LastName
FROM Person.Person AS p
WHERE p.BusinessEntityID % 2 = 0
)
SELECT *
FROM pp p1
JOIN pp p2 ON p2.BusinessEntityID = p1.BusinessEntityID
/*
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/
;
WITH pp
AS ( SELECT p.BusinessEntityID ,
p.Title ,
p.FirstName ,
p.LastName
FROM Person.Person AS p
WHERE p.BusinessEntityID % 2 = 0
),
pp2 AS ( SELECT p1.BusinessEntityID ,
p1.Title ,
p1.FirstName ,
p1.LastName
FROM pp p1
)
SELECT *
FROM pp2 p1
JOIN pp2 p2 ON p2.BusinessEntityID = p1.BusinessEntityID
/*
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 137 ms.
Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/
EDIT: I had a TOP 10 in one query but not the other.
Just look at the actual execution plan for two sets of blocks that look similar.
See attached; doesn't seem to.
I can test on 2012 later tonight. The laptop I had it installed on is sleeping with the BlackBerrys, so I need a few minutes to dig up the CD and install it somewhere else.
Looks like it does to me. It's doing 2 clustered index scans on the person table.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 29, 2014 at 7:35 pm
Jeff Moden (12/29/2014)
[...earlier exchange snipped; see the previous post...]
Looks like it does to me. It's doing 2 clustered index scans on the person table.
Now I see what you mean, after re-reading your comment about self-joins.
I do not like this.
WITH pp
AS ( SELECT p.BusinessEntityID ,
p.Title ,
p.FirstName ,
p.LastName
FROM Person.Person AS p
WHERE p.BusinessEntityID % 2 = 0
)
SELECT *
FROM pp p1
JOIN pp p2
ON p2.BusinessEntityID = p1.BusinessEntityID
JOIN pp p3
ON p3.BusinessEntityID = p1.BusinessEntityID
JOIN pp p4
ON p4.BusinessEntityID = p1.BusinessEntityID
JOIN pp p5
ON p5.BusinessEntityID = p1.BusinessEntityID
JOIN pp p6
ON p6.BusinessEntityID = p1.BusinessEntityID
JOIN pp p7
ON p7.BusinessEntityID = p1.BusinessEntityID
JOIN pp p8
ON p8.BusinessEntityID = p1.BusinessEntityID
JOIN pp p9
ON p9.BusinessEntityID = p1.BusinessEntityID
JOIN pp p10
ON p10.BusinessEntityID = p1.BusinessEntityID
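-- In the actual plan, each of the ten references to pp above expands to
-- yet another clustered index scan of Person.Person.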
December 29, 2014 at 9:00 pm
Yep... when it comes to self-joins/multiple references, CTEs truly live up to the alternate name of "inline views". Like I said, it's sometimes MUCH more effective to direct the output of what you would have used as a CTE into a table and then multi-reference that table. For big stuff, it also gives you the opportunity to add the right kind of indexes to the table.
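For illustration, here's a minimal sketch of that temp-table approach against the AdventureWorks query from earlier in the thread (the clustered index choice is just an example):

SELECT p.BusinessEntityID ,
       p.Title ,
       p.FirstName ,
       p.LastName
INTO   #pp
FROM   Person.Person AS p
WHERE  p.BusinessEntityID % 2 = 0;

CREATE CLUSTERED INDEX IX_pp ON #pp (BusinessEntityID);

-- Every reference now hits the small indexed temp table;
-- Person.Person itself is scanned exactly once, during the SELECT...INTO.
SELECT *
FROM   #pp AS p1
JOIN   #pp AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;

DROP TABLE #pp;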
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2014 at 12:42 am
Jeff Moden (12/28/2014)
Eirikur Eiriksson (12/27/2014)
Run results
OPERATION                                                       DURATION (µs)
--------------------------------------------------------------- -------------
Eirikur's version with Jeff's tricks                                  2416806
Jeff's version, which also creates and populates the table...         2628008
Eirikur's version                                                     3656208
sqldriver's version                                                  10546826
p.s. I continue to be amazed at the performance of the hardware you're running on, Eirikur. My little 4-processor i5 laptop doesn't stand a chance.
Now you are in for a little surprise, Jeff, as I'm running this on a "slightly tinkered" i5 laptop (E6220): SQL Server 2014, a second-generation Intel SSD, and, most importantly, a set-up of the hardware optimized for SQL Server. Must admit it hums along nicely; it boots from cold in less than seven seconds.
Shifting gears a bit, and just to share a conviction of mine: I've recently been involved in a discussion about throwing heavy iron at problem code rather than fixing the code itself. As impressed as I am with the hardware, this is an outstanding example of how much more ROI one can get by knowing the ropes of T-SQL. The original problem took 16,524,945 µs on my laptop and 10,546,826 µs on that wonderful box of yours. That's an improvement of 36%, which is awesome for "just" a hardware change. Using the times from my lesser-powered laptop, even the version that uses ISNULL() dropped to 3,014,173 µs, which is nearly an 82% improvement. In simpler terms, the hardware upgrade bought roughly a 1.5X improvement, while the change in code that we both made exceeds a 4.4X improvement in performance.
I wonder where those folks on that other discussion I've been having will find a computer that actually runs 4.4X faster, and how much it will cost? And it doesn't appear that MPP will actually help for a query like this one because 1) parallelism is being used even in the original query, and 2) you might have to rewrite the code anyway to take advantage of MPP because it's a different paradigm.
I do agree here, and this is where things get really interesting, although, metaphorically speaking, one is indeed entering the forest of great misconceptions: heavier iron does not necessarily equal better performance for all problems. As an example, running the test harness posted earlier in this thread on a beefy Xeon workstation and on a production-spec, lots-of-cores, buckets-of-memory server reveals that those massive irons do not produce any performance benefit for this problem; they just about match Jeff's old i5 laptop. Running the same harness on a snappy i7 laptop produced roughly a 25% improvement. Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)
December 30, 2014 at 5:48 am
Eirikur Eiriksson (12/30/2014)
I do agree here, and this is where things get really interesting, although, metaphorically speaking, one is indeed entering the forest of great misconceptions: heavier iron does not necessarily equal better performance for all problems. As an example, running the test harness posted earlier in this thread on a beefy Xeon workstation and on a production-spec, lots-of-cores, buckets-of-memory server reveals that those massive irons do not produce any performance benefit for this problem; they just about match Jeff's old i5 laptop. Running the same harness on a snappy i7 laptop produced roughly a 25% improvement. Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)
Exactly. Thanks for doing that bit of analysis. It's exactly the same experience that I've had in the past.
Of course, some folks will ask, "What's the difference between 3.5 and 2.4 seconds?" My answer is 1.1 seconds times the number of times it will be executed each day and, perhaps, customer perception. When you're in a hurry, that 1.1 seconds can seem quite long.
The number of times that something will be executed is really important in deciding whether or not to tune a query. A recent example for me: a customer wanted me to tune a query that executed in only 900 milliseconds. The trouble was, it was going to be executed a quarter million times per run (225,000 seconds, or more than 62 hours), and the run was going to be initiated by a user sitting in front of a computer. I got the run time down to an average of half a millisecond (1800X faster), with a total run time of 125 seconds (just over 2 minutes).
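(Doing the math: 250,000 executions × 0.9 seconds = 225,000 seconds, or about 62.5 hours; at half a millisecond, 250,000 × 0.0005 seconds = 125 seconds.)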
It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?
There's no purchase of iron that could come close to that no matter the cost.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 30, 2014 at 8:22 am
Jeff, so what was the trick?
I guess when you need to execute something only once and it takes 900 milliseconds, you give it no further thought.
Still, getting anything at all to go 1800 times faster is amazing.
And if it takes you 4 hours to get there...
So spill the beans!
Even a simple descriptive outline would be sufficient.
December 31, 2014 at 8:34 am
Michael Meierruth (12/30/2014)
Jeff, so what was the trick?
I guess when you need to execute something only once and it takes 900 milliseconds, you give it no further thought.
Still, getting anything at all to go 1800 times faster is amazing.
And if it takes you 4 hours to get there...
So spill the beans!
Even a simple descriptive outline would be sufficient.
Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. My take on this is that it's not how good one is, but rather how appalling the existing code/config/whatever was. ORM-generated or purely badly written doesn't matter; just plain bad.
December 31, 2014 at 8:41 am
Jeff Moden (12/30/2014)
Eirikur Eiriksson (12/30/2014)
[...quoted paragraph snipped; see above...]
Exactly. Thanks for doing that bit of analysis. It's exactly the same experience that I've had in the past.
Of course, some folks will ask, "What's the difference between 3.5 and 2.4 seconds?" My answer is 1.1 seconds times the number of times it will be executed each day and, perhaps, customer perception. When you're in a hurry, that 1.1 seconds can seem quite long.
The number of times that something will be executed is really important in deciding whether or not to tune a query. A recent example for me: a customer wanted me to tune a query that executed in only 900 milliseconds. The trouble was, it was going to be executed a quarter million times per run (225,000 seconds, or more than 62 hours), and the run was going to be initiated by a user sitting in front of a computer. I got the run time down to an average of half a millisecond (1800X faster), with a total run time of 125 seconds (just over 2 minutes).
It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?
There's no purchase of iron that could come close to that no matter the cost.
Couldn't agree more, but the sad fact is that one often gets less time to improve things than the actual execution time. Up-scaling the hardware is also often not an option; it would require hardware in the terahertz region, which simply does not exist. On the other hand, one must make clear that proper sizing is an entirely different chapter from throwing bigger irons at the problem.
December 31, 2014 at 9:04 am
I'll show some cards here. I don't have the full process documented, but the results were pretty cool. Just a couple of code changes and a little index tuning brought this query from 523ms down to 27ms.
After the indexes, it was a matter of getting rid of a couple of expensive spools, which distinctifying (or uniqueifying :w00t: ) the data helped with, and losing that "subquery as an inner join" thing that I hate. Replacing it with CROSS APPLY helped because it breaks the operation up into smaller chunks, and the memory grants it asks for get reused on each execution.
Or so I hear...
Anyway, this is when Jeff and Eirikur point out something that will make it run in 2ms, and I'll go back to feeling like a first generation monkey.
/*
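Covering index: keys every ServerSummary column the queries below touch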
CREATE NONCLUSTERED INDEX [IX_ServerSummary_ServerID_MeasureDate_RAMPagesPerSec_AvailableMemory_RAMPct]
ON [dbo].[ServerSummary]
(
[ServerID],[MeasureDate],[RAMPagesPerSec],[AvailableMemory],[RAMPct]
) WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
*/
/*
Gets rid of index spool
CREATE NONCLUSTERED INDEX [IX_MeasureDate_ServerID_CPUProcTimePCT] ON [dbo].[ServerProcessorSummary]
(
[MeasureDate] ASC,
[ServerID] ASC,
[CPUProcessorTimePct] ASC
)WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
*/
SET STATISTICS IO, TIME ON
--Yours
SELECT s.ArtifactID ,
s.ServerTypeID ,
ss.RAMPagesPerSec ,
AvgPCore.PCore ,
AVG(ISNULL(ss.RAMPct, 0)) OVER ( PARTITION BY ss.MeasureDate,
ss.ServerID ) ,
AvailableMemory ,
dbo.GetServerHourID(s.ArtifactID, ss.MeasureDate) ,
ss.MeasureDate
FROM dbo.ServerSummary ss
INNER JOIN ( SELECT MeasureDate ,
                    ServerID ,
                    AVG(ISNULL(CPUProcessorTimePct, 0))
                        OVER ( PARTITION BY MeasureDate, ServerID ) AS PCore
             FROM [dbo].[ServerProcessorSummary]
           ) AS AvgPCore ON ss.MeasureDate = AvgPCore.MeasureDate
                        AND ss.ServerID = AvgPCore.ServerID
INNER JOIN dbo.[Server] s ON ss.ServerID = s.ServerID
INNER JOIN DB.dbo.ResourceServer rs ON s.ArtifactID = rs.ArtifactID
WHERE s.ServerTypeID IN ( 1, 3 )
AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
AND ss.MeasureDate NOT IN (
SELECT SummaryDayHour
FROM dbo.SystemLoadSummary )
--Mine
SELECT x.ArtifactID ,
x.ServerTypeID ,
x.RAMPagesPerSec ,
x.PCore ,
AVG(ISNULL(x.RAMPct, 0)) OVER ( PARTITION BY x.MeasureDate,
x.ServerID ) ,
AvailableMemory ,
dbo.GetServerHourID(x.ArtifactID, x.MeasureDate) ,
x.MeasureDate
FROM ( SELECT DISTINCT
ca.MeasureDate ,
ca.ServerID ,
ca.PCore ,
s.ArtifactID ,
s.ServerTypeID ,
ss.RAMPagesPerSec ,
ss.RAMPct ,
ss.AvailableMemory
FROM dbo.ServerSummary ss
INNER JOIN dbo.[Server] s ON ss.ServerID = s.ServerID
INNER JOIN DB.dbo.ResourceServer rs ON s.ArtifactID = rs.ArtifactID
CROSS APPLY ( SELECT MeasureDate ,
                     ServerID ,
                     AVG(ISNULL(CPUProcessorTimePct, 0))
                         OVER ( PARTITION BY MeasureDate, ServerID ) AS PCore
              FROM dbo.ServerProcessorSummary sps
              WHERE ss.MeasureDate = sps.MeasureDate
                AND ss.ServerID = sps.ServerID
            ) AS ca
WHERE s.ServerTypeID IN ( 1, 3 )
AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
AND ss.MeasureDate NOT IN (
SELECT DISTINCT
SummaryDayHour
FROM dbo.SystemLoadSummary )
) AS x
December 31, 2014 at 10:56 am
Michael Meierruth (12/30/2014)
Jeff, so what was the trick?
I guess when you need to execute something only once and it takes 900 milliseconds, you give it no further thought.
Still, getting anything at all to go 1800 times faster is amazing.
And if it takes you 4 hours to get there...
So spill the beans!
Even a simple descriptive outline would be sufficient.
The problem was that they had an EAV table where they wanted to get an exact match on multiple elements. We'll save the discussion about whether I think EAVs are good or bad for another time.
Anyhow, they were using conventional methods of interrogating the table with multiple simultaneous self-joins. I converted it to a cascading CTE, where each CTE referred to the previous CTE and used it as another join to the table. This very quickly isolated just the IDs that met the requirements of the cascade, and the final SELECT, with its final join to the table, very quickly returned all of the rows (joined by ID) for each entity that contained all of the elements that were on individual rows.
Think of it as a "Reverse Triangular Join".
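Jeff didn't post the actual code, so this is only a rough sketch of the shape he describes; the table, column, and element names here are invented:

WITH c1 AS
( SELECT DISTINCT e.EntityID            -- entities having the first element
    FROM dbo.EAV AS e
   WHERE e.Element = 'Color' AND e.ElementValue = 'Red'
),
c2 AS
( SELECT DISTINCT e.EntityID            -- each CTE joins the previous CTE back
    FROM c1                             -- to the table, shrinking the ID list
    JOIN dbo.EAV AS e ON e.EntityID = c1.EntityID
   WHERE e.Element = 'Size' AND e.ElementValue = 'Large'
)
SELECT e.*                              -- final join returns all rows for the
  FROM c2                               -- entities that survived the cascade
  JOIN dbo.EAV AS e ON e.EntityID = c2.EntityID;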
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2014 at 12:12 pm
Eirikur Eiriksson (12/31/2014)
[Michael's question snipped; see above]
Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. My take on this is that it's not how good one is, but rather how appalling the existing code/config/whatever was. ORM-generated or purely badly written doesn't matter; just plain bad.
Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair, and speaking in tongues to the SQL gods.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 31, 2014 at 12:19 pm
Jeff Moden (12/31/2014)
[...inner quotes snipped; see above...]
Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair, and speaking in tongues to the SQL gods.
Touché! Apart from the twiddling of the hair, that's a thing of the past. :rolleyes:
January 6, 2015 at 5:18 am
Michael Meierruth (12/30/2014)
[Michael's question snipped; see above]
Here you go.
January 6, 2015 at 8:28 am
aaron.reese (1/6/2015)
[...quoted exchange snipped; see above...]
Here you go.
The chart didn't have anything for a quarter million executions in a single run, where the run itself could be kicked off dozens of times per day.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 6, 2015 at 4:08 pm
No, but I think 4 hours should be OK. :-)