Faster way to populate number table variation with inline tally?

  • sqldriver (12/29/2014)


    Jeff Moden (12/29/2014)


    sqldriver (12/29/2014)


    I'm not exactly sure how to test that, but I do have a 2014 Dev edition installed, and ran the below code.

    If there's something in particular you'd like me to try to confirm this behavior, let me know.

    SET NOCOUNT ON;
    SET STATISTICS IO, TIME ON;

    /*
    SELECT @@VERSION

    Microsoft SQL Server 2014 - 12.0.2456.0 (X64)
    Dec 11 2014 17:32:03
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
    */

    USE AdventureWorks2014;

    -- One CTE, referenced twice
    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    )
    SELECT *
    FROM   pp AS p1
           JOIN pp AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;

    /*
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- The same CTE wrapped in a second CTE, which is then referenced twice
    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    ),
    pp2 AS
    (
        SELECT p1.BusinessEntityID, p1.Title, p1.FirstName, p1.LastName
        FROM   pp AS p1
    )
    SELECT *
    FROM   pp2 AS p1
           JOIN pp2 AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;

    /*
    SQL Server Execution Times:
       CPU time = 32 ms, elapsed time = 137 ms.
    Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    EDIT: I originally had a TOP 10 in one query but not the other.

    Just look at the actual execution plan for two sets of blocks that look similar.

    See attached; doesn't seem to.

    I can test on 2012 later tonight. The laptop I had it installed on is sleeping with the BlackBerrys, so I'll need a few to dig up the CD and install it somewhere else.

    Looks like it does to me. It's doing 2 clustered index scans on the person table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/29/2014)


    Looks like it does to me. It's doing 2 clustered index scans on the person table.

    Now I see what you mean, after re-reading your comment about self-joins.

    I do not like this.

    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    )
    SELECT *
    FROM   pp AS p1
           JOIN pp AS p2  ON p2.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p3  ON p3.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p4  ON p4.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p5  ON p5.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p6  ON p6.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p7  ON p7.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p8  ON p8.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p9  ON p9.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p10 ON p10.BusinessEntityID = p1.BusinessEntityID;

  • Yep... when it comes to self-joins/multiple references, CTEs truly live up to their alternate name of "inline views". Like I said, it's sometimes MUCH more effective to direct the output of what you would have used as a CTE into a table and then multi-reference that table. For big stuff, it also gives you the opportunity to add the right kind of indexes to the table.
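
    A minimal sketch of that pattern, reusing the Person.Person example from earlier in the thread (the temp table and index names here are illustrative, not from the original posts):

    -- Materialize the would-be CTE once...
    IF OBJECT_ID('tempdb..#pp') IS NOT NULL DROP TABLE #pp;

    SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
    INTO   #pp
    FROM   Person.Person AS p
    WHERE  p.BusinessEntityID % 2 = 0;

    -- ...add "the right kind of index" for the join column...
    CREATE UNIQUE CLUSTERED INDEX cx_pp ON #pp (BusinessEntityID);

    -- ...and every later reference hits the small indexed table instead of
    -- re-scanning Person.Person once per reference.
    SELECT *
    FROM   #pp AS p1
           JOIN #pp AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;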

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/28/2014)


    Eirikur Eiriksson (12/27/2014)


    Run results

    OPERATION                                                      DURATION (µs)
    -------------------------------------------------------------- -------------
    Eirikur's version with Jeff's tricks                                 2416806
    Jeff's version, which also creates and populates the table...        2628008
    Eirikur's version                                                    3656208
    sqldriver's version                                                 10546826

    p.s. I continue to be amazed at the performance of the hardware you're running on, Eirikur. My little 4-processor i5 laptop doesn't stand a chance.

    Now you are in for a little surprise, Jeff, as I'm running this on a "slightly tinkered" i5 laptop (E6220), SQL Server 2014, a second-generation Intel SSD and, most importantly, an optimal set-up of the hardware for the SQL Server. Must admit it hums along nicely; it boots from cold in less than seven seconds 😀

    😎

    Shifting gears a bit, and just to share a conviction of mine, I've recently been involved in a discussion about throwing heavy iron at problem code rather than fixing the code itself. As impressed as I am with the hardware, this is an outstanding example of how much more ROI one can get by knowing the ropes of T-SQL. The original problem took 16,524,945 µs on my laptop and 10,546,826 µs on that wonderful box of yours. That's an improvement of 36%, which is awesome for "just" a hardware change. Using the time from my lesser-powered laptop, even for the version that uses ISNULL(), the time dropped to 3,014,173 µs, which is nearly an 82% improvement. In simpler terms, the hardware improvement allowed for a 1/2X improvement. The change in code that we both made exceeds a 4.4X improvement in performance.

    I wonder where those folks on that other discussion I've been having will find a computer that actually runs 4.4X faster and how much it will cost? 😀 And it doesn't appear that MPP will actually help for a query like this one because 1) parallelism is being used even in the original query and 2) you might actually have to rewrite the code anyway to take advantage of MPP because it's a different paradigm.
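
    For reference, those improvement figures fall straight out of the timings quoted above (all values in microseconds):

    -- Checking the math on the posted timings
    SELECT (16524945 - 10546826) * 100.0 / 16524945 AS hardware_improvement_pct, -- ~36%
           (16524945 - 3014173) * 100.0 / 16524945  AS code_improvement_pct,     -- ~82%
           16524945.0 / 3014173                     AS code_speedup;             -- ~5.5X, comfortably past the 4.4X quoted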

    I do agree here, and this is where things get really interesting, although metaphorically speaking one is indeed entering the forest of great misconceptions. Heavier iron does not necessarily equal better performance for all problems. As an example, running the test harness posted earlier on this thread on a beefy Xeon workstation and on a production-spec lots-of-cores, buckets-of-memory server reveals that those massive irons do not produce any performance benefit for this problem; they just about match Jeff's old i5 laptop. Running the same harness on a snappy i7 laptop produced roughly a 25% improvement. Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)

  • Eirikur Eiriksson (12/30/2014)


    Heavier iron does not necessarily equal better performance for all problems. [...] Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)

    Exactly. Thanks for doing that bit of analysis. It's exactly the same experience that I've had in the past.

    Of course, some folks will ask "what's the difference between 3.5 and 2.4 seconds"? My answer is 1.1 seconds times the number of times it will be executed each day and, perhaps, customer perception. When you're in a hurry, that 1.1 seconds can appear to be quite long.

    The number of times that something will be executed is really important in deciding whether or not to tune a query. A recent example for me is a customer who wanted me to tune a query that executed in only 900 milliseconds. The trouble was, it was going to be executed a quarter million times per run (225,000 seconds, or more than 62 hours), and the run was going to be initiated by a user sitting in front of a computer. I got the run time down to an average of half a millisecond (1800X faster), with a total run time of 125 seconds, or just over 2 minutes.
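
    For anyone checking the math, the totals work out like this:

    -- A quarter million executions per run, before and after tuning
    SELECT 250000 * 900 / 1000.0 / 3600 AS hours_before,  -- 900 ms each: 225,000 sec, ~62.5 hours
           250000 * 0.5 / 1000.0        AS seconds_after, -- 0.5 ms each: 125 sec, just over 2 minutes
           900 / 0.5                    AS speedup;       -- 1800X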

    It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?

    There's no purchase of iron that could come close to that, no matter the cost.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, so what was the trick?

    I guess when you need to execute something only once and it takes 900 milliseconds, you give it no further thought.

    Still, getting anything at all to go 1800 times faster is amazing.

    And if it takes you 4 hours to get there...

    So spill the beans!

    Even a simple descriptive outline would be sufficient.

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. My take on this is that it's not about how good one is but rather how appalling the existing code/config/whatever was. ORM-generated or purely badly written doesn't matter; just plain bad.

    😎

  • Jeff Moden (12/30/2014)


    The number of times that something will be executed is really important in deciding whether or not to tune a query. [...] It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?

    There's no purchase of iron that could come close to that, no matter the cost.

    Couldn't agree more, but the sad fact is that one often gets less time to improve things than the actual execution time. Up-scaling the hardware is also often not an option; it would require hardware in the terahertz region, which simply does not exist. On the other hand, one must make clear that proper sizing is an entirely different chapter from throwing bigger irons at the problem.

    😎

  • I'll show some cards here. I don't have the full process documented, but the results were pretty cool. Just a couple of code changes and a little index tuning brought this query from 523ms to 27ms.

    After the indexes, it was a matter of getting rid of a couple of expensive spools, which distinctifying (or uniqueifying :w00t: ) the data helped with, and losing that subquery-as-an-inner-join thing that I hate. Replacing it with CROSS APPLY helped because it breaks the operations up into smaller chunks, and the memory grants it asks for get reused on each execution.

    Or so I hear...

    Anyway, this is when Jeff and Eirikur point out something that will make it run in 2ms, and I'll go back to feeling like a first-generation monkey.

    /*
    CREATE NONCLUSTERED INDEX [IX_ServerSummary_ServerID_MeasureDate_RAMPagesPerSec_AvailableMemory_RAMPct]
    ON [dbo].[ServerSummary]
        ([ServerID], [MeasureDate], [RAMPagesPerSec], [AvailableMemory], [RAMPct])
    WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
    */

    /*
    Gets rid of index spool
    CREATE NONCLUSTERED INDEX [IX_MeasureDate_ServerID_CPUProcTimePCT]
    ON [dbo].[ServerProcessorSummary]
        ([MeasureDate] ASC, [ServerID] ASC, [CPUProcessorTimePct] ASC)
    WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
    GO
    */

    SET STATISTICS IO, TIME ON;

    --Yours
    SELECT  s.ArtifactID,
            s.ServerTypeID,
            ss.RAMPagesPerSec,
            AvgPCore.PCore,
            AVG(ISNULL(ss.RAMPct, 0)) OVER (PARTITION BY ss.MeasureDate, ss.ServerID),
            AvailableMemory,
            dbo.GetServerHourID(s.ArtifactID, ss.MeasureDate),
            ss.MeasureDate
    FROM    dbo.ServerSummary ss
            INNER JOIN ( SELECT MeasureDate,
                                ServerID,
                                AVG(ISNULL(CPUProcessorTimePct, 0)) OVER (PARTITION BY MeasureDate, ServerID) AS PCore
                         FROM   [dbo].[ServerProcessorSummary]
                       ) AS AvgPCore
                    ON  ss.MeasureDate = AvgPCore.MeasureDate
                    AND ss.ServerID = AvgPCore.ServerID
            INNER JOIN dbo.[Server] s
                    ON ss.ServerID = s.ServerID
            INNER JOIN DB.dbo.ResourceServer rs
                    ON s.ArtifactID = rs.ArtifactID
    WHERE   s.ServerTypeID IN (1, 3)
            AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
            AND ss.MeasureDate NOT IN (SELECT SummaryDayHour FROM dbo.SystemLoadSummary);

    --Mine
    SELECT  x.ArtifactID,
            x.ServerTypeID,
            x.RAMPagesPerSec,
            x.PCore,
            AVG(ISNULL(x.RAMPct, 0)) OVER (PARTITION BY x.MeasureDate, x.ServerID),
            AvailableMemory,
            dbo.GetServerHourID(x.ArtifactID, x.MeasureDate),
            x.MeasureDate
    FROM    ( SELECT DISTINCT -- distinctifying the data gets rid of the expensive spools
                     ca.MeasureDate,
                     ca.ServerID,
                     ca.PCore,
                     s.ArtifactID,
                     s.ServerTypeID,
                     ss.RAMPagesPerSec,
                     ss.RAMPct,
                     ss.AvailableMemory
              FROM   dbo.ServerSummary ss
                     INNER JOIN dbo.[Server] s
                             ON ss.ServerID = s.ServerID
                     INNER JOIN DB.dbo.ResourceServer rs
                             ON s.ArtifactID = rs.ArtifactID
                     CROSS APPLY ( SELECT MeasureDate,
                                          ServerID,
                                          AVG(ISNULL(CPUProcessorTimePct, 0)) OVER (PARTITION BY MeasureDate, ServerID) AS PCore
                                   FROM   dbo.ServerProcessorSummary sps
                                   WHERE  ss.MeasureDate = sps.MeasureDate
                                          AND ss.ServerID = sps.ServerID
                                 ) AS ca -- correlated: evaluated per outer row, replacing the subquery-as-inner-join
              WHERE  s.ServerTypeID IN (1, 3)
                     AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
                     AND ss.MeasureDate NOT IN (SELECT DISTINCT SummaryDayHour FROM dbo.SystemLoadSummary)
            ) AS x;

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    The problem was that they had an EAV (Entity/Attribute/Value) table where they wanted to get an exact match on multiple elements. We'll save the discussion about whether I think EAVs are good or bad for a later time.

    Anyhow, they were using conventional methods of interrogating the table with multiple simultaneous self-joins. I converted it to a cascading CTE, where each CTE referred to the previous CTE and used it as another join back to the table. What this did was very quickly isolate just the IDs that met the requirements of the cascade, and the final SELECT, with its final join to the table, very quickly returned all of the rows (joined by ID) for any entity that contained all of the elements that were on individual rows.

    Think of it as a "Reverse Triangular Join".
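
    A hypothetical sketch of the shape of that query (the dbo.EAV table, its columns, and the element values below are invented for illustration; the actual code wasn't posted):

    -- Each CTE joins back to the EAV table through the previous CTE, so the set
    -- of candidate IDs shrinks at every step of the cascade.
    WITH c1 AS
    (
        SELECT e.EntityID
        FROM   dbo.EAV AS e
        WHERE  e.Attribute = 'Color' AND e.Value = 'Red'
    ),
    c2 AS
    (
        SELECT e.EntityID
        FROM   c1
               JOIN dbo.EAV AS e ON e.EntityID = c1.EntityID
        WHERE  e.Attribute = 'Size' AND e.Value = 'Large'
    ),
    c3 AS
    (
        SELECT e.EntityID
        FROM   c2
               JOIN dbo.EAV AS e ON e.EntityID = c2.EntityID
        WHERE  e.Attribute = 'Weight' AND e.Value = '10kg'
    )
    SELECT e.* -- the final join returns all rows for the entities that survived the cascade
    FROM   c3
           JOIN dbo.EAV AS e ON e.EntityID = c3.EntityID;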

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (12/31/2014)



    Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. [...]

    😎

    Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair and speaking in tongues to the SQL gods. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/31/2014)


    Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair and speaking in tongues to the SQL gods. 😛

    Touché! Apart from the twiddling of the hair; that's a thing of the past :rolleyes:

    😎

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    Here you go.

    http://xkcd.com/1205/

  • aaron.reese (1/6/2015)



    Here you go.

    http://xkcd.com/1205/

    The chart didn't have anything for a quarter million times in a single run that could be executed dozens of times per day. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, but I think 4 hours should be OK. :-)
