May 26, 2014 at 9:43 pm
Comments posted to this topic are about the item Cursor-Killing: Accessing Data in the Next Row
May 27, 2014 at 6:25 am
Yikes, it would never even occur to me to use a cursor in that situation (but I know there are those who do). I have always used a self-join. Pre-2005, in situations where the previous-period key was not a simple arithmetic calculation, I would do something like use a correlated subquery to find the most recent period before the current one.
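For reference, a minimal sketch of that pre-2005 correlated-subquery approach; the PeriodSales table and its columns here are hypothetical, not from the article:

SELECT cur.PeriodID,
       cur.Amount,
       (SELECT p2.Amount
        FROM PeriodSales AS p2
        WHERE p2.PeriodStart = (SELECT MAX(p3.PeriodStart)
                                FROM PeriodSales AS p3
                                WHERE p3.PeriodStart < cur.PeriodStart)) AS PrevAmount
FROM PeriodSales AS cur;

The inner MAX finds the most recent period before the current row, and the outer subquery pulls that period's value onto the current row.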
May 27, 2014 at 8:09 am
It's good to know about LAG and LEAD, which I didn't know about, but I use the approach below when faced with that kind of problem, and I don't have performance issues:
-- Build a per-InvID sequence number (ItemIdent) with a triangular self-join:
-- each T2 row is counted once for every T1 row with an earlier-or-equal Date.
SELECT
    T2.InvID, T2.Date, T2.NewLocationID AS LocationID, COUNT(*) AS ItemIdent
INTO #Temp
FROM
    [Table] AS T1
    INNER JOIN [Table] AS T2 ON T1.InvID = T2.InvID AND T2.Date >= T1.Date
GROUP BY T2.InvID, T2.Date, T2.NewLocationID;

-- Pair each row with the next one in sequence to get days spent per location.
SELECT T1.LocationID, DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation
FROM #Temp AS T1
    LEFT JOIN #Temp AS T2 ON T1.InvID = T2.InvID AND T1.ItemIdent = T2.ItemIdent - 1;
May 27, 2014 at 8:24 am
I don't normally direct others away from the site this forum is on, but this is another good explanation and illustration of LEAD and LAG: http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/
May 27, 2014 at 8:39 am
Pedro Gomes-379488 (5/27/2014)
It's good to know about LAG and LEAD, which I didn't know about, but I use the approach below when faced with that kind of problem, and I don't have performance issues:
SELECT
    T2.InvID, T2.Date, T2.NewLocationID AS LocationID, COUNT(*) AS ItemIdent
INTO #Temp
FROM
    [Table] AS T1
    INNER JOIN [Table] AS T2 ON T1.InvID = T2.InvID AND T2.Date >= T1.Date
GROUP BY T2.InvID, T2.Date, T2.NewLocationID;

SELECT T1.LocationID, DATEDIFF(d, T1.Date, T2.Date) AS QtyDaysInLocation
FROM #Temp AS T1
    LEFT JOIN #Temp AS T2 ON T1.InvID = T2.InvID AND T1.ItemIdent = T2.ItemIdent - 1;
Be careful with this as it's considered hidden RBAR and can be even worse than a cursor. Read the following article on it. http://www.sqlservercentral.com/articles/T-SQL/61539/
May 27, 2014 at 8:53 am
There are certainly many ways to accomplish this and end up with the same results, but performance is the key in deciding what to do. LEAD and LAG are very, very efficient, and their value increases if you need other window functions over the same window, as the window itself can be shared across each (see the sketch below).
Definitely test each case to determine how many seeks/scans are being performed and how the IO looks, especially if you are upgrading to SQL Server 2012 and have new tools at your disposal that can be applied to older code.
A correlated subquery or cursor/WHILE solution may be adequate for small data sets, but data volumes change over time, so exercise caution before making changes now that could end up being a hindrance in the future.
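For illustration, a minimal sketch of that shared window, assuming a hypothetical Inventory table rather than the article's schema:

SELECT
    InvID,
    Date,
    LAG(LocationID)  OVER (PARTITION BY InvID ORDER BY Date) AS PrevLocationID,
    LEAD(LocationID) OVER (PARTITION BY InvID ORDER BY Date) AS NextLocationID
FROM Inventory;

Both functions use an identical OVER clause, so the optimizer can typically satisfy them with a single sort/window spool rather than one per function.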
May 27, 2014 at 9:10 am
I forgot to mention: thank you for the article, Ed. It's nice; I wish I could use it more, but we haven't migrated to 2012 (or 2014). I have to stay with the ROW_NUMBER() version, as it seems to be the best option after the LEAD and LAG option you presented.
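For anyone else stuck pre-2012, a rough sketch of the general ROW_NUMBER() pattern (table and column names are made up here, not the article's exact code):

WITH Numbered AS
(
    SELECT InvID, Date, LocationID,
           ROW_NUMBER() OVER (PARTITION BY InvID ORDER BY Date) AS rn
    FROM Inventory
)
SELECT cur.LocationID, DATEDIFF(d, cur.Date, nxt.Date) AS QtyDaysInLocation
FROM Numbered AS cur
    LEFT JOIN Numbered AS nxt ON nxt.InvID = cur.InvID AND nxt.rn = cur.rn + 1;

The numbering is sequential with no gaps, so the rn + 1 join avoids the triangular-join problem discussed above.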
May 27, 2014 at 12:47 pm
Thanks for the advice. My query is a little more controlled than the cases shown in the article.
I use an equality join and only apply >= within the same ID, which reduces the size of the product.
Besides, I rarely use subqueries in the SELECT clause, only in the FROM clause. I know it doesn't make any difference, but the former seems more prone to getting out of control than the latter. By using subqueries in the FROM clause you become more conscious of the cardinalities involved in your joins.
May 27, 2014 at 1:07 pm
Don't be so sure about the control you get from the limit on the ID. The intermediate rows are still being generated, and you can easily check it with the actual execution plan. With just a few thousand rows, the actual number of rows generated can be huge.
May 28, 2014 at 4:31 am
CELKO (5/27/2014)
I would use a different data model that has a correct temporal interval (start_date, end_date).
I agree completely, Joe. However, I find myself using this technique a lot in order to get someone else's data into that correct model.
You always have to start from where you are rather than where you'd like to be.
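For anyone unfamiliar with it, a minimal sketch of the kind of interval model Joe describes (all names here are illustrative, not from the article):

CREATE TABLE InventoryLocation
(
    InvID      INT  NOT NULL,
    LocationID INT  NOT NULL,
    start_date DATE NOT NULL,
    end_date   DATE NULL,  -- NULL means this is the current location
    CONSTRAINT PK_InventoryLocation PRIMARY KEY (InvID, start_date),
    CONSTRAINT CK_Interval CHECK (end_date IS NULL OR end_date > start_date)
);

With both endpoints stored on the row, days-in-location is a simple DATEDIFF within the row, with no need to reach into the next one.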
June 2, 2014 at 2:56 pm
I look forward to articles like this showcasing the new technology. We're on SQL Server 2008 and have no plans to change any time soon, so I don't have a chance to play with the newer stuff.
I would either have built a temp table with indexes, populated it from the parent table, and then queried the parent table against the temp table using "index + 1" or some such logic (roughly the sketch below), or, as one poster mentioned, used the self-join. However, as was mentioned, the parent table could use a makeover. A "lastupdate" field maintained by an update trigger sounds like a good idea to me.
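Something along these lines, a sketch only (the #Rows temp table and the ParentTable name are made up):

SELECT ROW_NUMBER() OVER (PARTITION BY InvID ORDER BY Date) AS rn,
       InvID, Date, LocationID
INTO #Rows
FROM ParentTable;

CREATE CLUSTERED INDEX IX_Rows ON #Rows (InvID, rn);

SELECT cur.LocationID, DATEDIFF(d, cur.Date, nxt.Date) AS DaysInLocation
FROM #Rows AS cur
    LEFT JOIN #Rows AS nxt ON nxt.InvID = cur.InvID AND nxt.rn = cur.rn + 1;

Materializing the numbering once and indexing it keeps the "index + 1" join to cheap seeks, and ROW_NUMBER() is available on 2008.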
Thanx!
Wallace Houston
Sunnyland Farms, Inc.
"We must endeavor to persevere."
December 18, 2015 at 2:15 am
Window functions are great and work well when you want to access records related in some way to every record in your result set. The only problem with LEAD, LAG, and window functions generally is that they can only access records within your result set. When you need to query only certain records but want access to many other records somehow related to them, I like OUTER APPLY. The performance difference you see between joins and OUTER APPLY comes down to proper indexing.
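A minimal sketch of the OUTER APPLY pattern (again assuming a hypothetical Inventory table, with an index on InvID, Date):

SELECT cur.InvID, cur.Date, nxt.Date AS NextDate
FROM Inventory AS cur
    OUTER APPLY (SELECT TOP (1) i.Date
                 FROM Inventory AS i
                 WHERE i.InvID = cur.InvID
                   AND i.Date > cur.Date
                 ORDER BY i.Date) AS nxt;

With a supporting index, each APPLY probe becomes a single seek, which is why indexing dominates the join-versus-APPLY comparison.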
December 18, 2015 at 5:27 am
Did you also compile a table of results based on 10,000, 1,000,000, and other counts of data?
412-977-3526 call/text
December 18, 2015 at 6:05 am
These specific examples were on smaller data sets, taken from a question asked online prior to this article's original publication. I've tested window functions on other data sets (anywhere from a few rows to billions), and performance testing should be tailored to the expected data growth, whatever that happens to be.
Window functions are very convenient, but I find it important to test thoroughly as they can be performance hogs if the range being windowed is large enough (as that will incur a scan across whatever range is being queried for, or a seek if an index happens to cover everything involved).
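For what it's worth, a quick way to compare candidate queries when testing (standard session settings, nothing article-specific):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run each candidate query here and compare logical reads and CPU time
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Combined with the actual execution plan, this makes the scan-versus-seek behavior easy to see.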
robert.sterbal 56890 (12/18/2015)
Did you also compile a table of results based on 10,000, 1,000,000, and other counts of data?
December 18, 2015 at 7:26 am
Window functions are great and work well when you want to access records related in some way to the current record. The only problem with LEAD, LAG, and window functions generally is that they can only access records within your result set. When you need data from only a few select records, or want access to many other records somehow related to them, or those other records are spread out through what would be a large result set, I favor using derived tables correlated with an OUTER APPLY. The performance difference you see between the join and OUTER APPLY methods here is usually due to proper indexing.