TSQL Return start/end price and start/end datetime by product name.

  • That works Mark, nice one. So does this:

    ;WITH OrderedData AS (

    SELECT Seq = ROW_NUMBER() OVER (ORDER BY [Name], [DateTime], Price),

    [Name], Price, [DateTime]

    FROM #Temp

    ),

    Calculator AS (

    SELECT fr.Seq, fr.[Name], fr.Price, fr.[DateTime],

    Band = CAST('1' AS INT)

    FROM OrderedData fr

    WHERE fr.Seq = 1

    UNION ALL

    SELECT tr.Seq, tr.[Name], tr.Price, tr.[DateTime],

    Band = CASE WHEN tr.[Name] <> lr.[Name] OR tr.Price <> lr.Price THEN Band+1 ELSE Band END

    FROM Calculator lr

    INNER JOIN OrderedData tr ON tr.Seq = lr.Seq+1

    )

    SELECT a.[Name], a.Price, a.StartDate, b.StartDate AS EndDate

    FROM (

    SELECT Band, [Name], Price, MIN([DateTime]) AS StartDate

    FROM Calculator

    GROUP BY Band, [Name], Price

    ) a

    LEFT JOIN (

    SELECT Band, [Name], Price, MIN([DateTime]) AS StartDate

    FROM Calculator

    GROUP BY Band, [Name], Price

    ) b ON b.[Name] = a.[Name] AND b.Band = a.Band + 1

    ORDER BY a.[Name], a.Price

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • This one, does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 34 scans but only 7 logical reads. If you double the test data, then the CTE method will perform twice as more of reads while the number of reads using "quirky update" will stay the same. Inreasing the volume of data will stop SQL to perform logical reads of CTE worktable and number of reads in "quirky update" increases.

    I guess, it should be tested on the real set of data to check what performs best.

    I also think, that with proper indices (for example unique clustered on Name and Date), it is possible to achieve the acceptable performance of the version (revised & fixed) which uses APPLY, and, from my personal point of view is the most qute one :-D:

    select bs.Name

    ,bs.Price as StartPrice

    ,Min(bs.Date) as StartDT

    ,etp.NextPrice

    ,Min(etp.EndDT) as EndDT

    from #Temp bs

    outer apply (select top 1 et.Name

    ,et.Price as NextPrice

    ,et.Date as EndDt

    from #Temp et

    where et.Name = bs.Name and et.Price != bs.Price and et.Date > bs.Date

    order by et.Date asc

    ) etp

    group by bs.Name, bs.Price, etp.NextPrice

    order by bs.Name, StartDT

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I need to clarify :

    elutin (6/2/2010)


    This one, does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 34 scans but only 7 logical reads. ...

    This should read as:

    This one (submitted by Mark-101232), does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 4 scans but only 7 logical reads by "quirky update" method.

    ...

    Also, Chris-Morris version of CTE just on 29 test data rows in the #Temp performs :

    Table 'Worktable' (CTE). Scan count 5, logical reads 369

    Table '#Temp'. Scan count 60, logical reads 60

    All versions should be tested against the real data. Would be interesting to know which one performs best...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Hi All,

    I have looked at your solutions and tried them against my data and here are the results.

    Bhuvnesh

    Your code worked fine, quite fast - 11 Seconds, 267 rows

    Table 'PowerUpdate'. Scan count 5, logical reads 88, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'PowerItem'. Scan count 5, logical reads 1820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (141337 row(s) affected)

    Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 4, logical reads 4904, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 1, logical reads 1275, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (141337 row(s) affected)

    (267 row(s) affected)

    Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 5, logical reads 1274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    Chris Morris-439714 (1:52pm)

    No end price in your query, 240 rows returned, results not accurate

    (240 row(s) affected)

    Table 'PowerUpdate'. Scan count 10, logical reads 176, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'PowerItem'. Scan count 10, logical reads 3640, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Here are the first two rows of the results

    $49.95 Value Pack0.14112010-02-17 00:05:11.6232010-03-21 09:20:07.430

    $49.95 Value Pack0.14912010-02-08 13:00:02.3572010-02-11 00:05:09.663

    The First Row has the wrong End Date

    Mark-101232

    Your code is fast and apprears accurate, 268 row(s) in ~6 seconds

    Table 'PowerUpdate'. Scan count 10, logical reads 176, physical reads 6, read-ahead reads 18, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'PowerItem'. Scan count 10, logical reads 3622, physical reads 24, read-ahead reads 1632, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Chris Morris-439714 : 4:06pm

    Sorry but this did not work and failed with the following after 1 minute 20 seconds.

    Msg 530, Level 16, State 1, Line 4

    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    Eugene Elutin

    Your code had not completed after 14 minutes

    For anyone who wants the real data that I was working with you can download the data from here.

    http://blog.crowe.co.nz/attachments/powerdata.rar

    I really appreciate the time you all have put into this query. I find it very cool to see the different solutions and hope one day I will be able to help others as you have done with me.

    At this time Marks' code is fast and looks accurate, thanks very much.

    Cheers

    Chris

  • I couldn't believe your performance figures and I've desided to test myself.

    After loading 140707 records into PowerItem table and adding index on PowerUpdateID (which I'm sure you have, otherwise your view will be always slow) I have run the "quirky" update and got 267 rows as results in 3 seconds. This result matches Bhuvnesh's 267 rows and one row short of Mark-101232 268 rows.

    Where this extra row comes from? It does come from situation when the same price change happens multiple times: from 1750 to 1700 then from 1700 to 1750 and then again from 1750 to 1700. Therefore small fix in "quirky" update is required in the final query:

    select Name

    ,NewPrice AS StartPrice

    ,MIN(NewDate) AS StartDate

    ,PrevPrice AS EndPrice

    ,PrevDateAS EndDate

    from #Work

    group by Name, NewPrice, PrevPrice, PrevDate

    order by Name, StartDate ASC

    I am running tests in Express edition on local pc, so my tempdb is on the same drive as main db. In real life, tempdb usually located on dedicated disk, which make additional performance benefits in relation of temp tables.

    Looking into similar perfomance figures Mark CTE version is defenetly a winner as it's more elegant.

    BTW. Version with OUTER APPLY is really slow in this case due to use of non-equal conditions and view which has no helpful indices.

    Cheers,

    Eugene

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I made a small mistake on my test results in that I said that the first result from was "Bhuvnesh's" but it was in fact from "Cold Coffee"

    I did not notice the quote and just grabbed the wrong name

    Cheers

    Chris

  • Thanks Eugene Elutin

    I will check the missing index and look over the TSQL all of you have provided.

    It is an idea I need to get better at

    cheers

    chris

  • Hi again

    I just wanted to let you know that I put the query I liked the most in this thread into a blog post.

    http://blog.crowe.co.nz/blog/archive/2010/06/22/SQL-Server--TSQL-Return-startend-price-and-startend-date.aspx

    The reason was that I appreciated the solution but did not fully understand why it works (the one that I choose)

    So I tried to break it down into steps and hence the post came about.

    Thanks again for your help

    chris

  • NOTE: This may be obsolete by now, I got very busy and just now able to spend a quick couple of minutes on this.

    Please try this, testing it on a larger batch of data.

    If you have have/add index of (Name, Price, Datetime) in that exact order, this query should be super-fast. Even without that, hopefully it will be reasonable πŸ™‚ .

    If not, you might have to put the CTE into a temp table so that it can be indexed.

    ;WITH CtePriceDate AS (

    SELECT Name, Price, MIN(Datetime) AS Datetime,

    ROW_NUMBER() OVER (ORDER BY Name, Price, MIN(Datetime)) AS RowNumber

    FROM #Temp

    GROUP BY Name, Price

    )

    SELECT cpd1.Name, cpd1.Price AS MinPrice, cpd1.Datetime AS MinDate,

    cpd2.Price AS NextPrice, cpd2.Datetime AS NextDatetime

    FROM CtePriceDate cpd1

    LEFT OUTER JOIN CtePriceDate cpd2 ON cpd1.Name = cpd2.Name AND

    cpd1.RowNumber = cpd2.RowNumber - 1

    Scott Pletcher, SQL Server MVP 2008-2010

  • Hi Scott

    I tried your code and got a few issues if I sort by Name, MinDate as can be seen below.

    The first row the next Date Time is actually after the 2nd rows's min date whereas they should be the same.

    Name MinPriceMinDate NextPrice NextDatetime

    $49.95 Value Pack0.16932010-02-06 22:24:30.6630.17162010-03-21 09:20:07.430

    $49.95 Value Pack0.14912010-02-08 13:00:02.3570.15052010-02-12 00:05:09.017

    $49.95 Value Pack0.15182010-02-11 00:05:09.6630.15562010-03-24 00:05:02.980

    $49.95 Value Pack0.15052010-02-12 00:05:09.0170.15182010-02-11 00:05:09.663

    It should have been

    Name Min PriceMin DateTime NextPriceNext Price DateTime

    $49.95 Value Pack0.16932010-02-06 22:24:30.6630.14912010-02-08 13:00:02.357

    $49.95 Value Pack0.14912010-02-08 13:00:02.3570.15182010-02-11 00:05:09.663

    $49.95 Value Pack0.15182010-02-11 00:05:09.6630.15052010-02-12 00:05:09.017

    $49.95 Value Pack0.15052010-02-12 00:05:09.0170.14112010-02-17 00:05:11.623

    $49.95 Value Pack0.14112010-02-17 00:05:11.6230.17162010-03-21 09:20:07.430

    $49.95 Value Pack0.17162010-03-21 09:20:07.4300.14112010-03-22 10:50:03.027

    Thanks for the post

    Chris

  • Sorry about that. I don't have much time to look ... I just do this for a quick minute or two when I need a break. This is my break! -- am I a database-geek or what!! πŸ™‚

    Anyway, please try code below. I think it will work, but I'm not sure about how it will perform. Please let me know on both fronts.

    --add another row so that the price goes back to an earlier price

    --but at a later date, e.g., from $4 to $5 and then back to $4 (!)

    INSERT INTO #Temp (id, [Name], Price, [DateTime])

    SELECT 2998, '$49.95 Value Pack', 0.1505, 'Feb 22 2010 22:05'

    ;WITH CteDatePrice AS (

    SELECT Name, Datetime, Price,

    ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Datetime) AS RowNumber

    FROM #Temp

    ), CteMinPriceDate AS (

    SELECT cdp1.Name, cdp1.Price AS MinPrice, cdp1.Datetime AS MinDate,

    ROW_NUMBER() OVER (ORDER BY cdp1.Name, cdp1.Datetime) AS RowNumber

    FROM CteDatePrice cdp1

    WHERE RowNumber = 1 OR NOT EXISTS(

    SELECT 1

    FROM CteDatePrice cdp2

    WHERE cdp2.Name = cdp1.Name

    AND cdp2.RowNumber = cdp1.RowNumber - 1

    AND cdp2.Price = cdp1.Price)

    )

    SELECT cmpd1.Name, cmpd1.MinPrice, cmpd1.MinDate,

    cmpd2.MinPrice AS NextPrice, cmpd2.MinDate AS NextDate

    FROM CteMinPriceDate cmpd1

    LEFT OUTER JOIN CteMinPriceDate cmpd2 ON cmpd2.Name = cmpd1.Name AND

    cmpd2.RowNumber = cmpd1.RowNumber + 1

    Scott Pletcher, SQL Server MVP 2008-2010

  • I gave your new version a go Scott and I killed it after 40 minutes.

    Chris

  • Yeah, sorry, I knew the first CTE was a killer, but I just didn't have any time today.

    Scott Pletcher, SQL Server MVP 2008-2010

  • As mentioned already, assuming a proper index -- (name, date) INCLUDE price, in that order -- code below should work OK, although I still don't like how poorly the optimizer handles the query plan.

    ;WITH CteMinPriceDate AS (

    SELECT t1.Name, t1.date AS Mindate, t1.Price AS MinPrice,

    ROW_NUMBER() OVER (ORDER BY t1.Name, t1.date) AS RowNumber

    FROM #Temp t1

    WHERE price <> ISNULL((

    SELECT TOP 1 t2.price

    FROM #Temp t2

    WHERE t2.Name = t1.Name

    AND t2.date < t1.date

    ORDER BY date DESC), -999)

    )

    SELECT cmpd1.Name, cmpd1.MinPrice, cmpd1.Mindate,

    cmpd2.MinPrice AS NextPrice, cmpd2.Mindate AS NextDate

    FROM CteMinPriceDate cmpd1

    LEFT OUTER JOIN CteMinPriceDate cmpd2 ON cmpd2.Name = cmpd1.Name AND

    cmpd2.RowNumber = cmpd1.RowNumber + 1

    The query plan indicates it's worse than the other CTE version, but the execution plan analysis indicates it's better. :unsure:

    If you've got time, I'd really appreciate knowing how it performed on your actual data πŸ™‚ .

    Scott Pletcher, SQL Server MVP 2008-2010

  • I killed your last one scott after 14 minutes.

    Just to be sure I ran the original one again from Mark-101232

    WITH CTE1 AS (

    SELECT Name ,Price ,DateTime,

    ROW_NUMBER() OVER(PARTITION BY Name ORDER BY DATETime) AS rn1,

    ROW_NUMBER() OVER(PARTITION BY Name,Price ORDER BY DATETime) AS rn2

    FROM dbo.vPowerData),

    CTE2 AS (

    SELECT Name,Price AS [Min Price],MIN(DateTime) AS [Min DateTime],MAX(rn1) AS maxRN

    FROM CTE1

    GROUP BY Name,Price,rn2-rn1)

    SELECT a.Name,a.[Min Price],a.[Min DateTime],

    b.Price AS [NextPrice],

    b.DateTime AS [Next Price DateTime]

    FROM CTE2 a

    LEFT OUTER JOIN CTE1 b ON b.Name=a.Name AND b.rn1=a.maxRN+1 AND b.DateTime>a.[Min DateTime]

    ORDER BY a.Name,a.[Min DateTime];

    It took 5 seconds

    Chris

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply