Faster way to populate number table variation with inline tally?

  • sqldriver (12/29/2014)


    Jeff Moden (12/29/2014)


    sqldriver (12/29/2014)


    I'm not exactly sure how to test that, but I do have a 2014 Dev edition installed, and ran the below code.

    If there's something in particular you'd like me to try to confirm this behavior, let me know.

    SET NOCOUNT ON;
    SET STATISTICS IO, TIME ON;

    /*
    SELECT @@VERSION

    Microsoft SQL Server 2014 - 12.0.2456.0 (X64)
    Dec 11 2014 17:32:03
    Copyright (c) Microsoft Corporation
    Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)
    */

    USE AdventureWorks2014;

    -- One CTE, referenced twice
    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    )
    SELECT *
    FROM   pp AS p1
           JOIN pp AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;

    /*
    SQL Server Execution Times:
       CPU time = 0 ms, elapsed time = 0 ms.
    Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    -- The same CTE wrapped in a second CTE, which is then referenced twice
    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    ),
    pp2 AS
    (
        SELECT p1.BusinessEntityID, p1.Title, p1.FirstName, p1.LastName
        FROM   pp AS p1
    )
    SELECT *
    FROM   pp2 AS p1
           JOIN pp2 AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;

    /*
    SQL Server Execution Times:
       CPU time = 32 ms, elapsed time = 137 ms.
    Table 'Person'. Scan count 2, logical reads 7638, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    */

    EDIT: I originally had a TOP 10 in one query but not the other.

    Just look at the actual execution plan for two sets of blocks that look similar.

    See attached; doesn't seem to.

    I can test on 2012 later tonight. The laptop I had it installed on is sleeping with the BlackBerrys, so I'll need a few to dig up the CD and install it somewhere else.

    Looks like it does to me. It's doing 2 clustered index scans on the person table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/29/2014)


    Looks like it does to me. It's doing 2 clustered index scans on the person table.

    Now I see what you mean, after re-reading your comment about self-joins.

    I do not like this.

    WITH pp AS
    (
        SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
        FROM   Person.Person AS p
        WHERE  p.BusinessEntityID % 2 = 0
    )
    SELECT *
    FROM   pp AS p1
           JOIN pp AS p2  ON p2.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p3  ON p3.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p4  ON p4.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p5  ON p5.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p6  ON p6.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p7  ON p7.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p8  ON p8.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p9  ON p9.BusinessEntityID  = p1.BusinessEntityID
           JOIN pp AS p10 ON p10.BusinessEntityID = p1.BusinessEntityID;

  • Yep... when it comes to self-joins/multiple references, CTEs truly live up to their alternate name of "inline views". Like I said, it's sometimes MUCH more effective to direct the output of what you would have used as a CTE into a table and then multi-reference that table. For big stuff, it also gives you the opportunity to add the right kind of indexes to the table.
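
    A minimal sketch of that pattern, reusing the Person.Person example from earlier in the thread (the temp table and index names here are illustrative, not from the original posts):

    -- Materialize the would-be CTE once...
    IF OBJECT_ID('tempdb..#pp') IS NOT NULL DROP TABLE #pp;

    SELECT p.BusinessEntityID, p.Title, p.FirstName, p.LastName
    INTO   #pp
    FROM   Person.Person AS p
    WHERE  p.BusinessEntityID % 2 = 0;

    -- ...add "the right kind of index" for the join column...
    CREATE UNIQUE CLUSTERED INDEX cx_pp ON #pp (BusinessEntityID);

    -- ...and every later reference hits the small indexed table instead of
    -- re-scanning Person.Person once per reference.
    SELECT *
    FROM   #pp AS p1
           JOIN #pp AS p2 ON p2.BusinessEntityID = p1.BusinessEntityID;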

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/28/2014)


    Eirikur Eiriksson (12/27/2014)


    Run results

    OPERATION                                                      DURATION (µs)
    -------------------------------------------------------------- -------------
    Eirikur's version with Jeff's tricks                                 2416806
    Jeff's version, which also creates and populates the table...        2628008
    Eirikur's version                                                    3656208
    sqldriver's version                                                 10546826

    p.s. I continue to be amazed at the performance of the hardware you're running on, Eirikur. My little 4-processor i5 laptop doesn't stand a chance.

    Now you are in for a little surprise, Jeff, as I'm running this on a "slightly tinkered" i5 laptop (E6220), SQL Server 2014, a second-generation Intel SSD and, most importantly, an optimal set-up of the hardware for the SQL Server. Must admit it hums along nicely; it boots from cold in less than seven seconds 😀

    😎

    Shifting gears a bit, and just to share a conviction of mine, I've recently been involved in a discussion about throwing heavy iron at problem code rather than fixing the code itself. As impressed as I am with the hardware, this is an outstanding example of how much more ROI one can get by knowing the ropes of T-SQL. The original problem took 16,524,945 µs on my laptop and 10,546,826 µs on that wonderful box of yours. That's an improvement of 36%, which is awesome for "just" a hardware change. Using the time from my lesser-powered laptop, even for the version that uses ISNULL(), the time dropped to 3,014,173 µs, which is nearly an 82% improvement. In simpler terms, the hardware improvement allowed for a 1/2X improvement. The change in code that we both made exceeds a 4.4X improvement in performance.

    I wonder where those folks on that other discussion I've been having will find a computer that actually runs 4.4X faster and how much it will cost? 😀 And it doesn't appear that MPP will actually help for a query like this one because 1) parallelism is being used even in the original query and 2) you might actually have to rewrite the code anyway to take advantage of MPP because it's a different paradigm.
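
    For reference, those improvement figures fall straight out of the timings quoted above (all values in microseconds):

    -- Checking the math on the posted timings
    SELECT (16524945 - 10546826) * 100.0 / 16524945 AS hardware_improvement_pct, -- ~36%
           (16524945 - 3014173) * 100.0 / 16524945  AS code_improvement_pct,     -- ~82%
           16524945.0 / 3014173                     AS code_speedup;             -- ~5.5X, comfortably past the 4.4X quoted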

    I do agree here, and this is where things get really interesting, although metaphorically speaking one is indeed entering the forest of great misconceptions. Heavier iron does not necessarily equal better performance for all problems. As an example, running the test harness posted earlier on this thread on a beefy Xeon workstation and on a production-spec lots-of-cores, buckets-of-memory server reveals that those massive irons do not produce any performance benefit for this problem; they just about match Jeff's old i5 laptop. Running the same harness on a snappy i7 laptop produced roughly a 25% improvement. Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)

  • Eirikur Eiriksson (12/30/2014)


    Heavier iron does not necessarily equal better performance for all problems. [...] Keep in mind that the i7 laptop is around five times more expensive than the old i5 laptop. ;-)

    Exactly. Thanks for doing that bit of analysis. It's exactly the same experience that I've had in the past.

    Of course, some folks will ask "what's the difference between 3.5 and 2.4 seconds"? My answer is 1.1 seconds times the number of times it will be executed each day and, perhaps, customer perception. When you're in a hurry, that 1.1 seconds can appear to be quite long.

    The number of times that something will be executed is really important in deciding whether or not to tune a query. A recent example for me is a customer who wanted me to tune a query that executed in only 900 milliseconds. The trouble was, it was going to be executed a quarter million times per run (225,000 seconds, or more than 62 hours), and the run was going to be initiated by a user sitting in front of a computer. I got the run time down to an average of half a millisecond (1800X faster), with a total run time of 125 seconds, or just over 2 minutes.
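
    For anyone checking the math, the totals work out like this:

    -- A quarter million executions per run, before and after tuning
    SELECT 250000 * 900 / 1000.0 / 3600 AS hours_before,  -- 900 ms each: 225,000 sec, ~62.5 hours
           250000 * 0.5 / 1000.0        AS seconds_after, -- 0.5 ms each: 125 sec, just over 2 minutes
           900 / 0.5                    AS speedup;       -- 1800X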

    It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?

    There's no purchase of iron that could come close to that, no matter the cost.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, so what was the trick?

    I guess when you need to execute something only once and it takes 900 milliseconds, you give it no further thought.

    Still, getting anything at all to go 1800 times faster is amazing.

    And if it takes you 4 hours to get there...

    So spill the beans!

    Even a simple descriptive outline would be sufficient.

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. My take on this is that it's not about how good one is but rather how appalling the existing code/config/whatever was. ORM-generated or purely badly written doesn't matter; just plain bad.

    😎

  • Jeff Moden (12/30/2014)


    The number of times that something will be executed is really important in deciding whether or not to tune a query. [...] It took me over 4 hours to figure that one out. Do you think it was worth it to the customer?

    There's no purchase of iron that could come close to that, no matter the cost.

    Couldn't agree more, but the sad fact is that one often gets less time to improve things than the actual execution time. Up-scaling the hardware is also often not an option; it would require hardware in the terahertz region, which simply does not exist. On the other hand, one must make clear that proper sizing is an entirely different chapter from throwing bigger irons at the problem.

    😎

  • I'll show some cards here. I don't have the full process documented, but the results were pretty cool. Just a couple of code changes and a little index tuning brought this query from 523ms to 27ms.

    After the indexes, it was a matter of getting rid of a couple of expensive spools, which distinctifying (or uniqueifying :w00t: ) the data helped with, and losing that subquery-as-an-inner-join thing that I hate. Replacing it with CROSS APPLY helped because it breaks the operations up into smaller chunks, and the memory grants it asks for get reused on each execution.

    Or so I hear...

    Anyway, this is when Jeff and Eirikur point out something that will make it run in 2ms, and I'll go back to feeling like a first-generation monkey.

    /*
    CREATE NONCLUSTERED INDEX [IX_ServerSummary_ServerID_MeasureDate_RAMPagesPerSec_AvailableMemory_RAMPct]
    ON [dbo].[ServerSummary]
        ([ServerID], [MeasureDate], [RAMPagesPerSec], [AvailableMemory], [RAMPct])
    WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
    */

    /*
    Gets rid of index spool
    CREATE NONCLUSTERED INDEX [IX_MeasureDate_ServerID_CPUProcTimePCT]
    ON [dbo].[ServerProcessorSummary]
        ([MeasureDate] ASC, [ServerID] ASC, [CPUProcessorTimePct] ASC)
    WITH (ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY];
    GO
    */

    SET STATISTICS IO, TIME ON;

    --Yours
    SELECT  s.ArtifactID,
            s.ServerTypeID,
            ss.RAMPagesPerSec,
            AvgPCore.PCore,
            AVG(ISNULL(ss.RAMPct, 0)) OVER (PARTITION BY ss.MeasureDate, ss.ServerID),
            AvailableMemory,
            dbo.GetServerHourID(s.ArtifactID, ss.MeasureDate),
            ss.MeasureDate
    FROM    dbo.ServerSummary ss
            INNER JOIN ( SELECT MeasureDate,
                                ServerID,
                                AVG(ISNULL(CPUProcessorTimePct, 0)) OVER (PARTITION BY MeasureDate, ServerID) AS PCore
                         FROM   [dbo].[ServerProcessorSummary]
                       ) AS AvgPCore
                    ON  ss.MeasureDate = AvgPCore.MeasureDate
                    AND ss.ServerID = AvgPCore.ServerID
            INNER JOIN dbo.[Server] s
                    ON ss.ServerID = s.ServerID
            INNER JOIN DB.dbo.ResourceServer rs
                    ON s.ArtifactID = rs.ArtifactID
    WHERE   s.ServerTypeID IN (1, 3)
            AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
            AND ss.MeasureDate NOT IN (SELECT SummaryDayHour FROM dbo.SystemLoadSummary);

    --Mine
    SELECT  x.ArtifactID,
            x.ServerTypeID,
            x.RAMPagesPerSec,
            x.PCore,
            AVG(ISNULL(x.RAMPct, 0)) OVER (PARTITION BY x.MeasureDate, x.ServerID),
            AvailableMemory,
            dbo.GetServerHourID(x.ArtifactID, x.MeasureDate),
            x.MeasureDate
    FROM    ( SELECT DISTINCT -- distinctifying the data gets rid of the expensive spools
                     ca.MeasureDate,
                     ca.ServerID,
                     ca.PCore,
                     s.ArtifactID,
                     s.ServerTypeID,
                     ss.RAMPagesPerSec,
                     ss.RAMPct,
                     ss.AvailableMemory
              FROM   dbo.ServerSummary ss
                     INNER JOIN dbo.[Server] s
                             ON ss.ServerID = s.ServerID
                     INNER JOIN DB.dbo.ResourceServer rs
                             ON s.ArtifactID = rs.ArtifactID
                     CROSS APPLY ( SELECT MeasureDate,
                                          ServerID,
                                          AVG(ISNULL(CPUProcessorTimePct, 0)) OVER (PARTITION BY MeasureDate, ServerID) AS PCore
                                   FROM   dbo.ServerProcessorSummary sps
                                   WHERE  ss.MeasureDate = sps.MeasureDate
                                          AND ss.ServerID = sps.ServerID
                                 ) AS ca -- correlated: evaluated per outer row, replacing the subquery-as-inner-join
              WHERE  s.ServerTypeID IN (1, 3)
                     AND ss.MeasureDate > DATEADD(dd, -90, GETUTCDATE())
                     AND ss.MeasureDate NOT IN (SELECT DISTINCT SummaryDayHour FROM dbo.SystemLoadSummary)
            ) AS x;

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    The problem was that they had an EAV (Entity/Attribute/Value) table where they wanted to get an exact match on multiple elements. We'll save the discussion about whether I think EAVs are good or bad for a later time.

    Anyhow, they were using conventional methods of interrogating the table with multiple simultaneous self-joins. I converted it to a cascading CTE, where each CTE referred to the previous CTE and used it as another join back to the table. What this did was very quickly isolate just the IDs that met the requirements of the cascade, and the final SELECT, with its final join to the table, very quickly returned all of the rows (joined by ID) for any entity that contained all of the elements that were on individual rows.

    Think of it as a "Reverse Triangular Join".
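
    A hypothetical sketch of the shape of that query (the dbo.EAV table, its columns, and the element values below are invented for illustration; the actual code wasn't posted):

    -- Each CTE joins back to the EAV table through the previous CTE, so the set
    -- of candidate IDs shrinks at every step of the cascade.
    WITH c1 AS
    (
        SELECT e.EntityID
        FROM   dbo.EAV AS e
        WHERE  e.Attribute = 'Color' AND e.Value = 'Red'
    ),
    c2 AS
    (
        SELECT e.EntityID
        FROM   c1
               JOIN dbo.EAV AS e ON e.EntityID = c1.EntityID
        WHERE  e.Attribute = 'Size' AND e.Value = 'Large'
    ),
    c3 AS
    (
        SELECT e.EntityID
        FROM   c2
               JOIN dbo.EAV AS e ON e.EntityID = c2.EntityID
        WHERE  e.Attribute = 'Weight' AND e.Value = '10kg'
    )
    SELECT e.* -- the final join returns all rows for the entities that survived the cascade
    FROM   c3
           JOIN dbo.EAV AS e ON e.EntityID = c3.EntityID;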

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (12/31/2014)



    Quick thought: most of us are doing this regularly, and seeing improvements of three or four orders of magnitude is indeed quite common. [...]

    😎

    Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair and speaking in tongues to the SQL gods. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/31/2014)


    Heh... and then there are other times when it requires hours of sucking down beer popsicles in the presence of dust bunnies in the corner whilst wrapped in your favorite flannel toga, hugging a pillow, endlessly twiddling your hair and speaking in tongues to the SQL gods. 😛

    Touché! Apart from the twiddling of the hair; that's a thing of the past :rolleyes:

    😎

  • Michael Meierruth (12/30/2014)


    Jeff, so what was the trick?


    Here you go.

    http://xkcd.com/1205/

  • aaron.reese (1/6/2015)



    Here you go.

    http://xkcd.com/1205/

    The chart didn't have anything for a quarter million times in a single run that could be executed dozens of times per day. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, but I think 4 hours should be OK. :-)
