Linking to the Previous Row

  • Aaron Gonzalez (1/23/2009)


    Déjà vu...I did add this article to my briefcase last year...

    Anyway...nice article, David.

    yeah....friday is repeats day!

  • This is a well written, concise article with a very practical application. I've read a bit about CTEs but this is the first time I've seen an example that I have already had to solve the old fashioned way.

  • A CTE is a temporary result set, which is valid for the scope of a single Select Update Insert or Delete statement.

    BEWARE

    This claim is false, and proof is included. A CTE is instead incorporated into the SQL statement BEFORE the statement is evaluated, planned for and finally executed. Look at it as an inline view instead of an intermediate result set from an execution point of view.

    When you execute:

    set statistics io on;

    select * from dbo.PriceCompare;

    you get the following statistics:

    Table 'PriceHistory'. Scan count 23, logical reads 46, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Items'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Now lets create two views of our own just to demonstrate you get identical results:

    create view PriceCompareFakeCTE as

    select

    i.Item

    , ph.ItemId

    , ph.PriceStartDate

    , ph.Price

    , row_number() over ( partition by ph.ItemId order by ph.PriceStartDate ) as rownum

    from

    Items i

    inner join PriceHistory as ph on ph.ItemId = i.ItemId

    ;

    go

    create view dbo.PriceCompareUsingView as

    select

    currow.Item as Item

    , prevrow.Price as OldPrice

    , currow.Price as RangePrice

    , currow.PriceStartDate as StartDate

    , nextrow.PriceStartDate as EndDate

    from

    PriceCompareFakeCTE as currow

    left join PriceCompareFakeCTE as prevrow on

    prevrow.ItemId = currow.ItemId and

    prevrow.rownum = currow.rownum - 1

    left join PriceCompareFakeCTE as nextrow on

    nextrow.ItemId = currow.ItemId and

    nextrow.rownum = currow.rownum + 1

    ;

    go

    and verify using:

    set statistics io on;

    select * from dbo.PriceCompareUsingView;

    you get the following identical statistics:

    Table 'PriceHistory'. Scan count 23, logical reads 46, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Items'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0,

    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The query plans as shown in management studio also display the same structure. Neither view or CTE guarantees the same data is not just read a 2nd time if the optimizer deems that to be more efficient. The read data is already in the cache and storing the intermediate result would take up writes to tempdb.

    It can hurt performance and break code when you assume a CTE guarantees the result is build only once.

    Now the proof

    Here the view with build in CTE from the original article, but now with a newid() function in it that is displayed for the previous, current and next table aliases used in the query.

    alter view dbo.PriceCompare as

    with

    PriceCompare as

    (

    select

    i.Item

    , ph.ItemId

    , ph.PriceStartDate

    , ph.Price

    , row_number() over (partition by ph.ItemId order by ph.PriceStartDate) as rownum

    , newid() as random

    from

    Items as i

    inner join PriceHistory as ph on ph.ItemId = i.ItemId)

    select

    currow.Item as Item

    , prevrow.Price as OldPrice

    , currow.Price as RangePrice

    , currow.PriceStartDate as StartDate

    , nextrow.PriceStartDate as EndDate

    , currow.random as curRandom

    , prevrow.random as prevRandom

    , nextrow.random as nextRandom

    from

    PriceCompare as currow

    left join PriceCompare as prevrow on

    prevrow.ItemId = currow.ItemId and

    prevrow.rownum = currow.rownum - 1

    left join PriceCompare as nextrow on

    nextrow.ItemId = currow.ItemId and

    nextrow.rownum = currow.rownum + 1

    ;

    go

    with the above modified view in place, execute:

    select * from dbo.PriceCompare;

    Now compare the prevRandom and nextRandom columns in your result to that of the curRandom from the actual previous and next record.

    They are different each and every time :), which would not have been the case if the CTE was an intermediate result that is queried multiple times by the joins in the query!

    Request to author

    Please correct the article as people will assume your claim to be true and code accordingly in the future.

  • Mazharuddin Ehsan (3/16/2008)


    I hope to show in this article that the CTEs usefulness is not purely cosmetic.

    You are right david. It would be immensely useful in more complicated situations. I would have to do as follows without the use of CTE in the present example.

    CREATE VIEW [dbo].[PriceCompare] AS

    SELECT

    currow.Item,

    prevrow.Price AS OldPrice,

    currow.Price AS RangePrice,

    currow.PriceStartDate AS StartDate,

    nextrow.PriceStartDate AS EndDate

    FROM

    (SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price,

    ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum

    FROM Items i INNER JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId) currow

    LEFT JOIN

    (SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price,

    ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum

    FROM Items i INNER JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId) nextrow

    ON currow.rownum = nextrow.rownum - 1 AND currow.ItemId = nextrow.ItemId

    LEFT JOIN

    (SELECT i.Item, ph.ItemId, ph.PriceStartDate, ph.Price,

    ROW_NUMBER() OVER (Partition BY ph.ItemId ORDER BY PriceStartDate) AS rownum

    FROM Items i INNER JOIN PriceHistory ph

    ON i.ItemId = ph.ItemId) prevrow

    ON currow.rownum = prevrow.rownum + 1 AND currow.ItemId = prevrow.ItemId

    Too clumsy and reiterative as you see. I am looking forward to your input for 'using CTEs with recursive queries'

    Best regards,

    Maz

    My previous post is proof that what you written is exactly as SQL Server sees it after it fuses the CTE with the main query. Yes it is difficult to spot at a glance the inlined queries are identical, and that is the only reason a CTE is better.

  • A CTE is a temporary result set, which is valid for the scope of a single Select Update Insert or Delete statement.

    A CTE is instead incorporated into the SQL statement BEFORE the statement is evaluated, planned for and finally executed. Look at it as an inline view instead of an intermediate result set from a execution point of view.

    I get your distinction, which in some cases may be an important one. I haven't looked in detail at your example (as it's already Friday afternoon here and I'm winding down!), but I suspect the resultset is regenerated precisely because of the newid() function. Otherwise it would reuse the data.

    But I'd say it's beyond the scope of this article....but not the subsequent discussion! What I mean by that is that the primary goal of the article was to introduce the reader into practical applications for CTEs, by means of a real life example. This can mean simplifying slightly...like when your English teacher says "Never start a sentence with a conjunction." or a colleague might say "Never use cursors". It's good advice until the day when it isn't.

    Perhaps I should have phrased it like Books online, who cleverly uses "thought of" as a veracity disclaimer.

    "A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. " (From BOL)

    So I'll leave the article as it is, but thank you for pointing out that this phrase isn't strictly true!

    Regards,

    David

  • slange (1/23/2009)


    Bill Coale (3/13/2008)


    Also truncate to midnight the start/end date and any date comparisons to the range

    I handle the end date solution a little differently....

    Yes, and I'm looking forward to migrating to SQLServer 2008, with the new data type for just the date with no time.

  • I understand your point of view as well as the theory you put forward as to why it might be different in case of my proof. Therefore I did the query once more on my modified view to see if the statistics and query plan where any different from the original. They were not and that is 100% proof that the original also generates no intermediate results.

    The reasons I feel it is important to change:

    1. Half the discussion goes about using CTEs and you confirmed it is one of the key ingredients that you build your case upon and is in fact a focus.

    2. The article is very easy to correct without negative effects.

    If I am right and I think I am, people new to CTE get put on the wrong track from the start. When they think a CTE is a means to get an intermediate result they will code with that in mind. Also, this forum is about learning and sharing knowledge, so it is important the information is as correct as possible. Therefore I hope you will reconsidder after taking a look at the code I posted. There is a case to be made to keep the article as is and add a correction at the bottom or top. This way the following discussion keeps making sense for new readers.

    Other then that, enjoy your weekend....I sure will 🙂

  • Nice article. Do you have any recommendation regarding CTEs vs. temporary tables in 2005 or 2008?

  • Hello everybody,

    I have wrote about a problem that I have and I thinks who will be on the context of this article.

    I appreciate your help, thanks 😉 :

    http://www.sqlservercentral.com/Forums/Topic642999-362-1.aspx

  • bteague (1/23/2009)


    Nice article. Do you have any recommendation regarding CTEs vs. temporary tables in 2005 or 2008?

    2008 I haven't used yet...so can't say.

    I think you've got to situate CTEs somewhere between temporary tables and subqueries (a select in a select).

    A few points below in no particular order.

    1) The most obvious advantage over both a subquery and a temp table is that CTEs are more readable.

    2) You can put a CTE in a view (like a subquery but unlike a temp table)

    3) You can reference a CTE multiple times in your statement (like a temp table but unlike a subquery.)

    4) CTEs are in scope only for the current statement (like a subquery but unlike a temp table.)

    One really handy thing about CTEs (which I don't think I mention in the article) is the ability to use multiple CTEs for the same select, and also for one CTE to reference another. Thus you can achieve something really quite complex - where doing the same thing with temp tables or subquery would be an absolute nightmare.

    Another point is that with CTEs you don't explicitly define data types, as you usually would with a temp table (unless you "select into".)

    As Peter pointed out a few posts ago, a CTE is more akin to a subquery than a temp table, in that it is constructed when it is referenced and not upfront.

    As for performance specifically of CTEs vs temp tables, I don't have sufficient knowledge to comment - so I won't! (I know that the rownumber() function isn't the worlds fastest - but that's in no way specific to a CTE.)

    Hope this answers at least partially your question.

    I'm currently working on a SQL2000 project and I really, really miss my CTEs, so my final word would be one of caution - CTEs are addictive, withdrawal isn't pretty, and there's no decent rehab!

    Regards,

    David.

  • Absolutely loved the article. I am sure everyone had to do something like this at one time or another. Thank you for sharing this technique and nice explanations.

  • David McKinney (1/25/2009)


    As Peter pointed out a few posts ago, a CTE is more akin to a subquery than a temp table, in that it is constructed when it is referenced and not upfront.

    As for performance specifically of CTEs vs temp tables, I don't have sufficient knowledge to comment - so I won't! (I know that the rownumber() function isn't the worlds fastest - but that's in no way specific to a CTE.)

    I'll say, be careful... CTE's are (currently) re-executed everytime you call them. Temp Tables only suffer the single execution to build them. Further, large Temp Tables can be indexed the way that you need them... CTE's cannot.

    Last, but not least, a CTE is good for a single query. A Temp Table's life span is for the whole proc that made it.

    No, I'm not saying to not use CTE's or to use Temp Tables instead. I'm saying, "It Depends". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good article. I tried to use it to get results of calculations for previous year and quarter but not working.

    Here is the scenario: I want to calculate CPrice, and PPrice

    and display as for Yr = 2002, Qtr = 3, CPrice = (Price/Prod), PPrice = (CPrice of Yr-2002, Qtr-2)

    Yr, Qtr, Price, Prod, CPrice, PPrice

    2002, 2, 120, 100, 1.2, 0

    2002, 3, 150, 50, 3, 1.2

    and so on.....

    How can I use your example "Linking to the Previous Row"?

  • Dave R (1/30/2009)


    Good article. I tried to use it to get results of calculations for previous year and quarter but not working.

    How can I use your example "Linking to the Previous Row"?

    The key is that, after the numbering is applied, the rows you want to compare / link are sequential i.e. you want to link row n to row n-1 or n+1. So if you have got one row per quarter, then you should order by year, quarter. If you've got multiple rows per quarter then you either have to filter with a where clause or partition. I can't tell you which - it depends on your situation.

    I suggest you adopt a step by step approach as in the example, and try to get all relevant data appearing on the same row.

    (It is possible that you don't even need to number - that you already have a numbering system with your year and quarter - but maybe it's easiest to apply a row_number anyway.)

    Hope this helps,

    Let me know how you get on,

    David.

  • Excellent and clearly written!

    I'm always reading CTE articles and this is one of the best, because it explains a situation which is very commonly encountered.

    - Paul

    - Paul

    http://paulpaivasql.blogspot.com/

Viewing 15 posts - 46 through 60 (of 147 total)

You must be logged in to reply to this topic. Login to reply