Window Functions: Tell me when that changes.

  • j-1064772 (2/18/2016)


    The execution plans do not tell the whole story. I have seen some cases by Gila Monster where the execution was simplified but the load on the server was not improved (roughly the same execution time).

    Agreed, that's why I mention it as a possibility, not a certainty. To be sure about performance improvement, we would need to generate sample data and run tests. I'm not sure that I could generate useful sample data without knowing the real behavior for this data. That's the reason I didn't provide a test as well.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • davidawest00 (2/17/2016)


    SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not

    Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.

    WITH Jobs AS (
        SELECT * FROM msdb.dbo.sysjobsteps
    )
    SELECT step_name
    FROM Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382'

    John

  • John Mitchell-245523 (2/18/2016)


    davidawest00 (2/17/2016)


    SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not

    Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.

    WITH Jobs AS (
        SELECT * FROM msdb.dbo.sysjobsteps
    )
    SELECT step_name
    FROM Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382'

    John

    Just another example to demonstrate that we're telling SQL Server what we want, not how we want it to do it. There's a lot happening behind our code which is not interpreted left to right.

    CREATE TABLE #Example(
        Mystring char(5)
    );

    INSERT INTO #Example VALUES('a'), ('1');

    WITH CTE AS(
        SELECT *
        FROM #Example
        WHERE ISNUMERIC(Mystring) = 1
    )
    SELECT *
    FROM CTE
    WHERE Mystring = 1;

    DROP TABLE #Example;
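A safer sketch of the same filter, assuming SQL Server 2012 or later: TRY_CONVERT returns NULL for non-numeric strings instead of raising an error, so the predicate is safe regardless of the order in which the optimizer evaluates it.

```sql
-- Sketch only (assumes the #Example table above): TRY_CONVERT returns NULL
-- for non-numeric strings, so this filter cannot raise a conversion error
-- no matter when the optimizer chooses to evaluate it.
SELECT *
FROM #Example
WHERE TRY_CONVERT(int, Mystring) = 1;
```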

    Luis C.
  • Thanks for the article.

  • davidawest00 - Wednesday, February 17, 2016 10:35 AM

    I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

    Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct, similar to a derived table or a correlated subquery.
    There are some parallels with recursive CTEs, but that's a different story.
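To illustrate the point, here is the same query written both ways (reusing the sysjobsteps example from John's post above); the optimizer typically treats the two forms identically.

```sql
-- CTE form:
WITH Jobs AS (
    SELECT * FROM msdb.dbo.sysjobsteps
)
SELECT step_name
FROM Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';

-- Equivalent derived-table form:
SELECT step_name
FROM (SELECT * FROM msdb.dbo.sysjobsteps) AS Jobs
WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';
```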

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You can also provide a Default value for the LEAD/LAG Window functions.

    In this case, it means not having to add the extra AND PREVIOUS_CODE IS NOT NULL in the WHERE clause.


    WITH cteSAMPLEDATA
    AS (
       SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM
       (VALUES
       (1, 'ORDER1', 'DS', N'20151001'),
       (2, 'ORDER2', 'DS', N'20151001'),
       (5, 'ORDER2', 'DS', N'20151002'),
       (3, 'ORDER3', 'DS', N'20151001'),
       (6, 'ORDER3', 'AG', N'20151002'),
       (8, 'ORDER3', 'AG', N'20151003'),
       (4, 'ORDER4', 'DS', N'20151001'),
       (7, 'ORDER4', 'AG', N'20151002'),
       (9, 'ORDER4', 'DS', N'20151003')
       )
       AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE)
    ), cteGET_MAX_DATE_AND_CODE
    AS (
    SELECT
       H1.ID,
       H1.ORDER_ID,
       H1.CODE,
       H1.ORDER_DATE,
       MAX(H1.ORDER_DATE) --Get the latest date for each ORDER
          OVER (PARTITION BY ORDER_ID)     AS MAX_DATE,
   LEAD(H1.CODE, 1, H1.CODE) --Get the chronologically previous code; defaults to the current code when there is no earlier row.
      OVER (PARTITION BY ORDER_ID
        ORDER BY ORDER_DATE DESC)    AS PREVIOUS_CODE
    FROM cteSAMPLEDATA H1
    )
    SELECT
       ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE, MAX_DATE
    FROM cteGET_MAX_DATE_AND_CODE
    WHERE ORDER_DATE = MAX_DATE
    AND CODE   != PREVIOUS_CODE
    -- AND PREVIOUS_CODE IS NOT NULL  (no longer needed: the LEAD default makes PREVIOUS_CODE non-NULL)
    ;

  • ChrisM@Work - Friday, March 17, 2017 2:54 AM

    davidawest00 - Wednesday, February 17, 2016 10:35 AM

    I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

    Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct, similar to a derived table or a correlated subquery.
    There are some parallels with recursive CTEs, but that's a different story.

    Amen to that!  I knew if I read down far enough, one of you heavy hitters would pick up on that.  Same thing with a View.  They usually aren't "materialized first" and cannot be treated as if they were a table.  They become a part of the overall code, just as an iTVF usually does.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • adame - Thursday, February 18, 2016 1:38 AM

    Agreed, always look to use new features.

    Nah...you did good.  I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or lead to misuse.  My three least favorite "new" features so far are the newer date and time (including DATETIME2, etc, because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT) with incremental Recursive CTEs following as a very close second.

    Heh... and yeah.... I definitely get the "legacy" stuff.  I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year.  We actually had the hardware for the migration setup and ready nearly 2 years before that.  A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case. 😉

    --Jeff Moden



  • Jeff Moden - Friday, March 17, 2017 9:14 AM

    adame - Thursday, February 18, 2016 1:38 AM

    Agreed, always look to use new features.

    Nah...you did good.  I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or lead to misuse.  My three least favorite "new" features so far are the newer date and time (including DATETIME2, etc, because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT) with incremental Recursive CTEs following as a very close second.

    Heh... and yeah.... I definitely get the "legacy" stuff.  I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year.  We actually had the hardware for the migration setup and ready nearly 2 years before that.  A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case. 😉

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂
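For readers who never had to do it the old way, a rough sketch of the pre-ROW_NUMBER pattern ChrisM describes. Table and column names here are hypothetical, and note that relying on ORDER BY to control IDENTITY assignment in SELECT ... INTO was never guaranteed behavior.

```sql
-- Hypothetical sketch of the pre-2005 technique: number rows with
-- IDENTITY() into a temp table, then mimic PARTITION BY by joining
-- back to the minimum row number per group.
SELECT IDENTITY(int, 1, 1) AS rn, ORDER_ID, ORDER_DATE
INTO #Numbered
FROM dbo.SampleOrders            -- hypothetical source table
ORDER BY ORDER_ID, ORDER_DATE;   -- ordering of IDENTITY values was not guaranteed

SELECT n.ORDER_ID, n.ORDER_DATE,
       n.rn - g.min_rn + 1 AS seq_in_order  -- ~ ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE)
FROM #Numbered AS n
JOIN (SELECT ORDER_ID, MIN(rn) AS min_rn
      FROM #Numbered
      GROUP BY ORDER_ID) AS g
  ON g.ORDER_ID = n.ORDER_ID;
```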


  • Nice article. I love window functions; they make for elegant code. They're my go-to analytical functions for year-on-year type queries.

  • Good example of LEAD. As with all things SQL, there are so many approaches to the problem. I do love window functions, and lead/lag are particularly useful. That said, I might have approached this specific problem differently. Mostly I throw this out there just for alternative thoughts, not as a criticism. 

    One of the things that struck me about the approach outlined is that at the core of the requirement is that we are only interested in the most recent two records for an order. With that in mind, I would have addressed that first, so that the comparison of last value to previous value only has to manage two records per order. You could go so far as to even remove orders with only one record. I would accomplish this with a CTE that uses either ROW_NUMBER or COUNT() OVER () to serially number each set of records per order_id, ordered descending by date. From that CTE, it's easy to throw a filter on the next select where row_number <= 2, and now you've got a record set that doesn't have any records that don't matter. If you really wanted to get fancy, you could filter further to orders that have more than one record. This would mean the record set that you are ultimately scraping for the last-value-to-previous-value comparison has precisely 2 records per order. No more and no less.

    From there a lead function could be used, or you could self join the table, or you could use the pivot operator and compare columns, or you could group on orderId and compare MAX(value) to MIN(value) - any of which I feel could read cleaner, with varying degrees of performance considerations. 

    For me, readability is my first consideration, then performance (assuming results are correct, of course). That said, this is what I visualized immediately (not tested)

    ; WITH ORDERCODES AS
    (
    SELECT
      ID
      , ORDER_ID
      , CODE
      , ORDER_DATE
      , REV_ORDER_SEQUENCE = ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC)
      , TOTAL_ORDER_RECORDS = COUNT(1) OVER (PARTITION BY ORDER_ID)
    FROM
      SAMPLEVALUES
    )
    , ORDERSINERROR AS
    (
    SELECT ORDER_ID
    FROM ORDERCODES OC
    WHERE
      OC.REV_ORDER_SEQUENCE <= 2
      AND OC.TOTAL_ORDER_RECORDS > 1
    GROUP BY ORDER_ID
    HAVING MAX(CODE) <> MIN(CODE)
    )

    SELECT OC.*
    FROM
    ORDERCODES OC INNER JOIN
    ORDERSINERROR OIE
      ON OIE.ORDER_ID = OC.ORDER_ID
      AND OC.REV_ORDER_SEQUENCE <= 2
    ORDER BY
     ORDER_ID
     , ORDER_DATE

  • ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    --Jeff Moden



  • Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.


    Low-hanging fruit picker and defender of the moggies

  • ChrisM@home - Saturday, March 18, 2017 4:07 PM

    Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.

    Could either of you point to an article or description of a "QU solution", please?

    bc

  • bc_ - Wednesday, March 22, 2017 12:43 PM

    ChrisM@home - Saturday, March 18, 2017 4:07 PM

    Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.

    Could either of you point to an article or description of a "QU solution", please?

    Here's the article explaining it.
    http://www.sqlservercentral.com/articles/T-SQL/68467/
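For anyone skimming: "QU" here is the "quirky update" running-total technique that the linked article describes. A heavily hedged sketch of the pattern follows; it depends on undocumented engine behavior (clustered-index order, no parallelism), so read the article's safety rules before using it in anger.

```sql
-- Quirky-update running total, sketched per the linked article.
-- Relies on undocumented behavior: needs a clustered index on the
-- ordering column, TABLOCKX, and MAXDOP 1 to be (relatively) safe.
CREATE TABLE #Ledger (
    ID           int PRIMARY KEY CLUSTERED,
    Amount       money,
    RunningTotal money
);
INSERT INTO #Ledger (ID, Amount) VALUES (1, 10), (2, 5), (3, 20);

DECLARE @Total money = 0;

UPDATE t
SET @Total = RunningTotal = @Total + Amount   -- carries the total row to row
FROM #Ledger AS t WITH (TABLOCKX)
OPTION (MAXDOP 1);

SELECT ID, Amount, RunningTotal FROM #Ledger ORDER BY ID;
DROP TABLE #Ledger;
```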

    Luis C.
