May 11, 2009 at 9:07 pm
Comments posted to this topic are about the item ROW_NUMBER(): An Efficient Alternative to Subqueries
May 12, 2009 at 1:18 am
Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.
Tony.
May 12, 2009 at 2:19 am
And if you're using SQLServer2000, then you'll need to use a "TOP n" correlated subquery, since ROW_NUMBER() is not available.. eg:
select RecentVersion.*
from(select distinct ProductId from dbo.ProductVersion) ProductList
inner join dbo.ProductVersion RecentVersion
on RecentVersion.ProductId = ProductList.ProductId
and cast(RecentVersion.Version as varchar(10)) + '.' + cast(RecentVersion.MinorVersion as varchar(10)) + '.' + cast(RecentVersion.ReleaseVersion as varchar(10)) = (
select top 1 cast(x.Version as varchar(10)) + '.' + cast(x.MinorVersion as varchar(10)) + '.' + cast(x.ReleaseVersion as varchar(10))
from dbo.ProductVersion x
where x.ProductId = RecentVersion.ProductId
order by x.Version desc, x.MinorVersion desc, x.ReleaseVersion desc
)
May 12, 2009 at 4:59 am
Other uses of Row_number() function
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx
Failing to plan is Planning to fail
May 12, 2009 at 6:22 am
Great Article - thank you
May 12, 2009 at 7:29 am
Nice example!
Alternativly you could use a cursor to make it more readable and improve performance over a nested query. That said i dont like cursors and do prefer common table expressions!
SET ROWCOUNT 0
go
IF OBJECT_ID('tempdb..#ProductVersion') IS NULL
BEGIN
CREATE TABLE #ProductVersion
(
ProductID int NOT NULL,
Version int NOT NULL,
MinorVersion int NOT NULL,
ReleaseVersion int NOT NULL,
StandardCost numeric(30, 4) NOT NULL
)
END
DECLARE @ProductId INT
DECLARE A CURSOR FOR
SELECTProductId
FROMProductVersion
GROUP BY ProductId
ORDER BY ProductId
OPEN A
FETCH NEXT FROM A
INTO @ProductId
WHILE@@FETCH_STATUS = 0
BEGIN
INSERT INTO #ProductVersion
SELECTTOP 1 ProductId,
Version,
MinorVersion,
ReleaseVersion,
StandardCost
FROMProductVersion
WHEREProductId = @ProductId
ORDER BY ProductId ASC,
Version DESC,
MinorVersion DESC,
ReleaseVersion DESC
FETCH NEXT FROM A
INTO @ProductId
END
CLOSE A
DEALLOCATE A
SELECT * FROM #ProductVersion
DROP TABLE #ProductVersion
May 12, 2009 at 8:24 am
Thanks for the great article - I thought that you did a good job of explaining how to use the row_number function and comparing its performance to subqueries.
Tim
May 12, 2009 at 9:18 am
How could I leverage this to get, say, the "most current" version of a contract (i.e., either the contract currently in effect or the latest contract even if it was expired) *and* the latest version of the contract (i.e., it doesn't take effect until the future? I have a query using subqueries for this that looks something like this (this is simplified a bit). The row_number() query is much more concise but only returns that max contract date for each contract, which leaves out the current dates for those that have them:
SELECT DISTINCT group_number
,subgroup_number
,ISNULL(
(SELECT MAX(contract_thru_date)
FROM contracts curr1
WHERE curr1.group_number=curr.group_number
AND GETDATE() BETWEEN contract_from_date AND contract_thru_date
,(SELECT MAX(contract_thru_date)
FROM contracts prev
WHERE prev.group_number=curr.group_number
)
) contract_thru_date FROM contracts curr
EDIT: NM, I figured it out. All I had to was add an or statement to the where in the CTE. Great article, came at a good time! :w00t:
May 12, 2009 at 9:38 am
Well done article.
Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?
Thanks.
May 12, 2009 at 9:43 am
tony rogerson (5/12/2009)
Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.
Afraid that I do not understand this. AFAIK, CTEs are table expressions like non-indexed views. If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results. And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results. That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.
And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic. If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.
All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back. Garbage In, Garbage Out.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 12, 2009 at 12:16 pm
There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.
May 12, 2009 at 12:19 pm
JJ B (5/12/2009)
Well done article.Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?
Thanks.
There aren't any problems with the ROW_NUMBER() version of the query that require the NOLOCK hint. I just used it for the slight performance improvement.
May 12, 2009 at 12:20 pm
Francis Rodrigues (5/12/2009)
There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.
Thanks for the clarification. I guess that's what I was trying to get at. Do you mean that ROW_NUMBER() has such bad performance that you need to resort to using NOLOCK? Or is using NOLOCK just a common practice for your queries to try to preempt performance problems?
Thanks.
May 12, 2009 at 12:31 pm
In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a exclude1.Version
left join Production.ProductVersion exclude2
on p.ProductID = exclude2.ProductID
and p.Version = exclude2.Version
and p.MinorVersion > exclude2.MinorVersion
left join Production.ProductVersion exclude3
on p.ProductID = exclude3.ProductID
and p.Version = exclude3.Version
and p.MinorVersion = exclude3.MinorVersion
and p.ReleaseVersion > exclude3.ReleaseVersion
left join Production.ProductVersion exclude4
on p.ProductID = exclude4.ProductID
and p.Version = exclude4.Version
and p.MinorVersion = exclude4.MinorVersion
and p.ReleaseVersion = exclude4.ReleaseVersion
and p.StandardCost > exclude4.StandardCost
where exclude1.Version is null
and exclude2.MinorVersion is null
and exclude3.ReleaseVersion is null
and exclude4.StandardCost is null
[/code]
May 12, 2009 at 2:35 pm
wbrianwhite (5/12/2009)
In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a <, I can never remember without testing it. But something like this should do the trick
I think it's just a matter of ...elegance? To me using Row_Number() statements (and CTEs in general) are easier to write and easier to follow in specific situations (provided the one reading knows what they do). And I can tell you just from testing this morning that the Row_Number() query has a performance improvement over the left joins and isnull tests I had been using previously.
Viewing 15 posts - 1 through 15 (of 60 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy