June 2, 2010 at 9:06 am
That works Mark, nice one. So does this:
;WITH OrderedData AS (
SELECT Seq = ROW_NUMBER() OVER (ORDER BY [Name], [DateTime], Price),
[Name], Price, [DateTime]
FROM #Temp
),
Calculator AS (
SELECT fr.Seq, fr.[Name], fr.Price, fr.[DateTime],
Band = CAST('1' AS INT)
FROM OrderedData fr
WHERE fr.Seq = 1
UNION ALL
SELECT tr.Seq, tr.[Name], tr.Price, tr.[DateTime],
Band = CASE WHEN tr.[Name] <> lr.[Name] OR tr.Price <> lr.Price THEN Band+1 ELSE Band END
FROM Calculator lr
INNER JOIN OrderedData tr ON tr.Seq = lr.Seq+1
)
SELECT a.[Name], a.Price, a.StartDate, b.StartDate AS EndDate
FROM (
SELECT Band, [Name], Price, MIN([DateTime]) AS StartDate
FROM Calculator
GROUP BY Band, [Name], Price
) a
LEFT JOIN (
SELECT Band, [Name], Price, MIN([DateTime]) AS StartDate
FROM Calculator
GROUP BY Band, [Name], Price
) b ON b.[Name] = a.[Name] AND b.Band = a.Band + 1
ORDER BY a.[Name], a.Price
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 2, 2010 at 9:55 am
This one, does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 34 scans but only 7 logical reads. If you double the test data, then the CTE method will perform twice as more of reads while the number of reads using "quirky update" will stay the same. Inreasing the volume of data will stop SQL to perform logical reads of CTE worktable and number of reads in "quirky update" increases.
I guess, it should be tested on the real set of data to check what performs best.
I also think, that with proper indices (for example unique clustered on Name and Date), it is possible to achieve the acceptable performance of the version (revised & fixed) which uses APPLY, and, from my personal point of view is the most qute one :-D:
select bs.Name
,bs.Price as StartPrice
,Min(bs.Date) as StartDT
,etp.NextPrice
,Min(etp.EndDT) as EndDT
from #Temp bs
outer apply (select top 1 et.Name
,et.Price as NextPrice
,et.Date as EndDt
from #Temp et
where et.Name = bs.Name and et.Price != bs.Price and et.Date > bs.Date
order by et.Date asc
) etp
group by bs.Name, bs.Price, etp.NextPrice
order by bs.Name, StartDT
June 2, 2010 at 10:06 am
I need to clarify :
elutin (6/2/2010)
This one, does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 34 scans but only 7 logical reads. ...
This should read as:
This one (submitted by Mark-101232), does return proper results, However, on the given dataset sample, it performs 3 scans & 26 logical reads (incl. 24 reads of CTE worktable) vs 4 scans but only 7 logical reads by "quirky update" method.
...
Also, Chris-Morris version of CTE just on 29 test data rows in the #Temp performs :
Table 'Worktable' (CTE). Scan count 5, logical reads 369
Table '#Temp'. Scan count 60, logical reads 60
All versions should be tested against the real data. Would be interesting to know which one performs best...
June 3, 2010 at 3:50 am
Hi All,
I have looked at your solutions and tried them against my data and here are the results.
Bhuvnesh
Your code worked fine, quite fast - 11 Seconds, 267 rows
Table 'PowerUpdate'. Scan count 5, logical reads 88, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PowerItem'. Scan count 5, logical reads 1820, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(141337 row(s) affected)
Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 4, logical reads 4904, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 1, logical reads 1275, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(141337 row(s) affected)
(267 row(s) affected)
Table '#work_______________________________________________________________________________________________________________000000000170'. Scan count 5, logical reads 1274, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Warning: Null value is eliminated by an aggregate or other SET operation.
Chris Morris-439714 (1:52pm)
No end price in your query, 240 rows returned, results not accurate
(240 row(s) affected)
Table 'PowerUpdate'. Scan count 10, logical reads 176, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PowerItem'. Scan count 10, logical reads 3640, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Here are the first two rows of the results
$49.95 Value Pack0.14112010-02-17 00:05:11.6232010-03-21 09:20:07.430
$49.95 Value Pack0.14912010-02-08 13:00:02.3572010-02-11 00:05:09.663
The First Row has the wrong End Date
Mark-101232
Your code is fast and apprears accurate, 268 row(s) in ~6 seconds
Table 'PowerUpdate'. Scan count 10, logical reads 176, physical reads 6, read-ahead reads 18, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'PowerItem'. Scan count 10, logical reads 3622, physical reads 24, read-ahead reads 1632, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Chris Morris-439714 : 4:06pm
Sorry but this did not work and failed with the following after 1 minute 20 seconds.
Msg 530, Level 16, State 1, Line 4
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
Eugene Elutin
Your code had not completed after 14 minutes
For anyone who wants the real data that I was working with you can download the data from here.
http://blog.crowe.co.nz/attachments/powerdata.rar
I really appreciate the time you all have put into this query. I find it very cool to see the different solutions and hope one day I will be able to help others as you have done with me.
At this time Marks' code is fast and looks accurate, thanks very much.
Cheers
Chris
June 3, 2010 at 6:10 am
I couldn't believe your performance figures and I've desided to test myself.
After loading 140707 records into PowerItem table and adding index on PowerUpdateID (which I'm sure you have, otherwise your view will be always slow) I have run the "quirky" update and got 267 rows as results in 3 seconds. This result matches Bhuvnesh's 267 rows and one row short of Mark-101232 268 rows.
Where this extra row comes from? It does come from situation when the same price change happens multiple times: from 1750 to 1700 then from 1700 to 1750 and then again from 1750 to 1700. Therefore small fix in "quirky" update is required in the final query:
select Name
,NewPrice AS StartPrice
,MIN(NewDate) AS StartDate
,PrevPrice AS EndPrice
,PrevDateAS EndDate
from #Work
group by Name, NewPrice, PrevPrice, PrevDate
order by Name, StartDate ASC
I am running tests in Express edition on local pc, so my tempdb is on the same drive as main db. In real life, tempdb usually located on dedicated disk, which make additional performance benefits in relation of temp tables.
Looking into similar perfomance figures Mark CTE version is defenetly a winner as it's more elegant.
BTW. Version with OUTER APPLY is really slow in this case due to use of non-equal conditions and view which has no helpful indices.
Cheers,
Eugene
June 3, 2010 at 2:09 pm
I made a small mistake on my test results in that I said that the first result from was "Bhuvnesh's" but it was in fact from "Cold Coffee"
I did not notice the quote and just grabbed the wrong name
Cheers
Chris
June 3, 2010 at 2:12 pm
Thanks Eugene Elutin
I will check the missing index and look over the TSQL all of you have provided.
It is an idea I need to get better at
cheers
chris
June 22, 2010 at 10:22 pm
Hi again
I just wanted to let you know that I put the query I liked the most in this thread into a blog post.
The reason was that I appreciated the solution but did not fully understand why it works (the one that I choose)
So I tried to break it down into steps and hence the post came about.
Thanks again for your help
chris
June 23, 2010 at 12:59 pm
NOTE: This may be obsolete by now, I got very busy and just now able to spend a quick couple of minutes on this.
Please try this, testing it on a larger batch of data.
If you have have/add index of (Name, Price, Datetime) in that exact order, this query should be super-fast. Even without that, hopefully it will be reasonable π .
If not, you might have to put the CTE into a temp table so that it can be indexed.
;WITH CtePriceDate AS (
SELECT Name, Price, MIN(Datetime) AS Datetime,
ROW_NUMBER() OVER (ORDER BY Name, Price, MIN(Datetime)) AS RowNumber
FROM #Temp
GROUP BY Name, Price
)
SELECT cpd1.Name, cpd1.Price AS MinPrice, cpd1.Datetime AS MinDate,
cpd2.Price AS NextPrice, cpd2.Datetime AS NextDatetime
FROM CtePriceDate cpd1
LEFT OUTER JOIN CtePriceDate cpd2 ON cpd1.Name = cpd2.Name AND
cpd1.RowNumber = cpd2.RowNumber - 1
Scott Pletcher, SQL Server MVP 2008-2010
June 23, 2010 at 3:49 pm
Hi Scott
I tried your code and got a few issues if I sort by Name, MinDate as can be seen below.
The first row the next Date Time is actually after the 2nd rows's min date whereas they should be the same.
Name MinPriceMinDate NextPrice NextDatetime
$49.95 Value Pack0.16932010-02-06 22:24:30.6630.17162010-03-21 09:20:07.430
$49.95 Value Pack0.14912010-02-08 13:00:02.3570.15052010-02-12 00:05:09.017
$49.95 Value Pack0.15182010-02-11 00:05:09.6630.15562010-03-24 00:05:02.980
$49.95 Value Pack0.15052010-02-12 00:05:09.0170.15182010-02-11 00:05:09.663
It should have been
Name Min PriceMin DateTime NextPriceNext Price DateTime
$49.95 Value Pack0.16932010-02-06 22:24:30.6630.14912010-02-08 13:00:02.357
$49.95 Value Pack0.14912010-02-08 13:00:02.3570.15182010-02-11 00:05:09.663
$49.95 Value Pack0.15182010-02-11 00:05:09.6630.15052010-02-12 00:05:09.017
$49.95 Value Pack0.15052010-02-12 00:05:09.0170.14112010-02-17 00:05:11.623
$49.95 Value Pack0.14112010-02-17 00:05:11.6230.17162010-03-21 09:20:07.430
$49.95 Value Pack0.17162010-03-21 09:20:07.4300.14112010-03-22 10:50:03.027
Thanks for the post
Chris
June 23, 2010 at 5:12 pm
Sorry about that. I don't have much time to look ... I just do this for a quick minute or two when I need a break. This is my break! -- am I a database-geek or what!! π
Anyway, please try code below. I think it will work, but I'm not sure about how it will perform. Please let me know on both fronts.
--add another row so that the price goes back to an earlier price
--but at a later date, e.g., from $4 to $5 and then back to $4 (!)
INSERT INTO #Temp (id, [Name], Price, [DateTime])
SELECT 2998, '$49.95 Value Pack', 0.1505, 'Feb 22 2010 22:05'
;WITH CteDatePrice AS (
SELECT Name, Datetime, Price,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Datetime) AS RowNumber
FROM #Temp
), CteMinPriceDate AS (
SELECT cdp1.Name, cdp1.Price AS MinPrice, cdp1.Datetime AS MinDate,
ROW_NUMBER() OVER (ORDER BY cdp1.Name, cdp1.Datetime) AS RowNumber
FROM CteDatePrice cdp1
WHERE RowNumber = 1 OR NOT EXISTS(
SELECT 1
FROM CteDatePrice cdp2
WHERE cdp2.Name = cdp1.Name
AND cdp2.RowNumber = cdp1.RowNumber - 1
AND cdp2.Price = cdp1.Price)
)
SELECT cmpd1.Name, cmpd1.MinPrice, cmpd1.MinDate,
cmpd2.MinPrice AS NextPrice, cmpd2.MinDate AS NextDate
FROM CteMinPriceDate cmpd1
LEFT OUTER JOIN CteMinPriceDate cmpd2 ON cmpd2.Name = cmpd1.Name AND
cmpd2.RowNumber = cmpd1.RowNumber + 1
Scott Pletcher, SQL Server MVP 2008-2010
June 23, 2010 at 6:05 pm
I gave your new version a go Scott and I killed it after 40 minutes.
Chris
June 23, 2010 at 6:38 pm
Yeah, sorry, I knew the first CTE was a killer, but I just didn't have any time today.
Scott Pletcher, SQL Server MVP 2008-2010
June 23, 2010 at 7:28 pm
As mentioned already, assuming a proper index -- (name, date) INCLUDE price, in that order -- code below should work OK, although I still don't like how poorly the optimizer handles the query plan.
;WITH CteMinPriceDate AS (
SELECT t1.Name, t1.date AS Mindate, t1.Price AS MinPrice,
ROW_NUMBER() OVER (ORDER BY t1.Name, t1.date) AS RowNumber
FROM #Temp t1
WHERE price <> ISNULL((
SELECT TOP 1 t2.price
FROM #Temp t2
WHERE t2.Name = t1.Name
AND t2.date < t1.date
ORDER BY date DESC), -999)
)
SELECT cmpd1.Name, cmpd1.MinPrice, cmpd1.Mindate,
cmpd2.MinPrice AS NextPrice, cmpd2.Mindate AS NextDate
FROM CteMinPriceDate cmpd1
LEFT OUTER JOIN CteMinPriceDate cmpd2 ON cmpd2.Name = cmpd1.Name AND
cmpd2.RowNumber = cmpd1.RowNumber + 1
The query plan indicates it's worse than the other CTE version, but the execution plan analysis indicates it's better. :unsure:
If you've got time, I'd really appreciate knowing how it performed on your actual data π .
Scott Pletcher, SQL Server MVP 2008-2010
June 23, 2010 at 8:43 pm
I killed your last one scott after 14 minutes.
Just to be sure I ran the original one again from Mark-101232
WITH CTE1 AS (
SELECT Name ,Price ,DateTime,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY DATETime) AS rn1,
ROW_NUMBER() OVER(PARTITION BY Name,Price ORDER BY DATETime) AS rn2
FROM dbo.vPowerData),
CTE2 AS (
SELECT Name,Price AS [Min Price],MIN(DateTime) AS [Min DateTime],MAX(rn1) AS maxRN
FROM CTE1
GROUP BY Name,Price,rn2-rn1)
SELECT a.Name,a.[Min Price],a.[Min DateTime],
b.Price AS [NextPrice],
b.DateTime AS [Next Price DateTime]
FROM CTE2 a
LEFT OUTER JOIN CTE1 b ON b.Name=a.Name AND b.rn1=a.maxRN+1 AND b.DateTime>a.[Min DateTime]
ORDER BY a.Name,a.[Min DateTime];
It took 5 seconds
Chris
Viewing 15 posts - 16 through 30 (of 30 total)
You must be logged in to reply to this topic. Login to reply