May 11, 2009 at 9:07 pm
Comments posted to this topic are about the item ROW_NUMBER(): An Efficient Alternative to Subqueries
May 12, 2009 at 1:18 am
Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.
Tony.
May 12, 2009 at 2:19 am
And if you're using SQLServer2000, then you'll need to use a "TOP n" correlated subquery, since ROW_NUMBER() is not available.. eg:
select RecentVersion.*
from(select distinct ProductId from dbo.ProductVersion) ProductList
inner join dbo.ProductVersion RecentVersion
on RecentVersion.ProductId = ProductList.ProductId
and cast(RecentVersion.Version as varchar(10)) + '.' + cast(RecentVersion.MinorVersion as varchar(10)) + '.' + cast(RecentVersion.ReleaseVersion as varchar(10)) = (
select top 1 cast(x.Version as varchar(10)) + '.' + cast(x.MinorVersion as varchar(10)) + '.' + cast(x.ReleaseVersion as varchar(10))
from dbo.ProductVersion x
where x.ProductId = RecentVersion.ProductId
order by x.Version desc, x.MinorVersion desc, x.ReleaseVersion desc
)
May 12, 2009 at 4:59 am
Other uses of Row_number() function
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/08/27/multipurpose-row-number-function.aspx
Failing to plan is Planning to fail
May 12, 2009 at 6:22 am
Great Article - thank you
May 12, 2009 at 7:29 am
Nice example!
Alternativly you could use a cursor to make it more readable and improve performance over a nested query. That said i dont like cursors and do prefer common table expressions!
SET ROWCOUNT 0
go
IF OBJECT_ID('tempdb..#ProductVersion') IS NULL
BEGIN
CREATE TABLE #ProductVersion
(
ProductID int NOT NULL,
Version int NOT NULL,
MinorVersion int NOT NULL,
ReleaseVersion int NOT NULL,
StandardCost numeric(30, 4) NOT NULL
)
END
DECLARE @ProductId INT
DECLARE A CURSOR FOR
SELECTProductId
FROMProductVersion
GROUP BY ProductId
ORDER BY ProductId
OPEN A
FETCH NEXT FROM A
INTO @ProductId
WHILE@@FETCH_STATUS = 0
BEGIN
INSERT INTO #ProductVersion
SELECTTOP 1 ProductId,
Version,
MinorVersion,
ReleaseVersion,
StandardCost
FROMProductVersion
WHEREProductId = @ProductId
ORDER BY ProductId ASC,
Version DESC,
MinorVersion DESC,
ReleaseVersion DESC
FETCH NEXT FROM A
INTO @ProductId
END
CLOSE A
DEALLOCATE A
SELECT * FROM #ProductVersion
DROP TABLE #ProductVersion
May 12, 2009 at 8:24 am
Thanks for the great article - I thought that you did a good job of explaining how to use the row_number function and comparing its performance to subqueries.
Tim
May 12, 2009 at 9:18 am
How could I leverage this to get, say, the "most current" version of a contract (i.e., either the contract currently in effect or the latest contract even if it was expired) *and* the latest version of the contract (i.e., it doesn't take effect until the future? I have a query using subqueries for this that looks something like this (this is simplified a bit). The row_number() query is much more concise but only returns that max contract date for each contract, which leaves out the current dates for those that have them:
SELECT DISTINCT group_number
,subgroup_number
,ISNULL(
(SELECT MAX(contract_thru_date)
FROM contracts curr1
WHERE curr1.group_number=curr.group_number
AND GETDATE() BETWEEN contract_from_date AND contract_thru_date
,(SELECT MAX(contract_thru_date)
FROM contracts prev
WHERE prev.group_number=curr.group_number
)
) contract_thru_date FROM contracts curr
EDIT: NM, I figured it out. All I had to was add an or statement to the where in the CTE. Great article, came at a good time! :w00t:
May 12, 2009 at 9:38 am
Well done article.
Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?
Thanks.
May 12, 2009 at 9:43 am
tony rogerson (5/12/2009)
Using non-deterministic functions like ROW_NUMBER and NEWID need to be done carefully - my blog article: http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/06/12/bug-in-inline-expansion-of-non-deterministic-functions-in-derived-tables-and-cte-s-causes-incorrect-results.aspx explains the serious bug that exists in SQL Server 2005 with using them.
Afraid that I do not understand this. AFAIK, CTEs are table expressions like non-indexed views. If you write them non-deterministically (which is what you do in your blog article), then they return non-deterministic results. And if you reference a non-deterministic table expression multiple times, you are not guaranteed to get the same results. That's straight ANSI SQL to the best of my knowledge and I see no bug in Transact-SQL demonstrated here.
And as far as ROW_NUMBER() it only returns non-deterministic results if the OVER clause parameters are non-deterministic. If you use a deterministic ORDER BY (which also requires that the column set is uniquely orderable), then you will get deterministic results.
All this really demonstrates is that if you input non-deterministic expressions without instantiating them, then you will get non-deterministic results back. Garbage In, Garbage Out.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 12, 2009 at 12:16 pm
There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.
May 12, 2009 at 12:19 pm
JJ B (5/12/2009)
Well done article.Question: I hate to open a can of worms on a different topic, but I'm so curious as to why you specified the nolock hint for the row_number version of the query. Do you know of specific problems that require use of nolock for this type of implementation?
Thanks.
There aren't any problems with the ROW_NUMBER() version of the query that require the NOLOCK hint. I just used it for the slight performance improvement.
May 12, 2009 at 12:20 pm
Francis Rodrigues (5/12/2009)
There are not any issues to my knowledge that require the NOLOCK hint for ROW_NUMBER(). I just used it for performance reasons.
Thanks for the clarification. I guess that's what I was trying to get at. Do you mean that ROW_NUMBER() has such bad performance that you need to resort to using NOLOCK? Or is using NOLOCK just a common practice for your queries to try to preempt performance problems?
Thanks.
May 12, 2009 at 12:31 pm
In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a exclude1.Version
left join Production.ProductVersion exclude2
on p.ProductID = exclude2.ProductID
and p.Version = exclude2.Version
and p.MinorVersion > exclude2.MinorVersion
left join Production.ProductVersion exclude3
on p.ProductID = exclude3.ProductID
and p.Version = exclude3.Version
and p.MinorVersion = exclude3.MinorVersion
and p.ReleaseVersion > exclude3.ReleaseVersion
left join Production.ProductVersion exclude4
on p.ProductID = exclude4.ProductID
and p.Version = exclude4.Version
and p.MinorVersion = exclude4.MinorVersion
and p.ReleaseVersion = exclude4.ReleaseVersion
and p.StandardCost > exclude4.StandardCost
where exclude1.Version is null
and exclude2.MinorVersion is null
and exclude3.ReleaseVersion is null
and exclude4.StandardCost is null
[/code]
May 12, 2009 at 2:35 pm
wbrianwhite (5/12/2009)
In what way was simple set logic unable to perform this operation? It looks like this could be solved with left outer joins and is null tests. I don't have the test db mentioned, so my > may be wrong, it could need to be a <, I can never remember without testing it. But something like this should do the trick
I think it's just a matter of ...elegance? To me using Row_Number() statements (and CTEs in general) are easier to write and easier to follow in specific situations (provided the one reading knows what they do). And I can tell you just from testing this morning that the Row_Number() query has a performance improvement over the left joins and isnull tests I had been using previously.
Viewing 15 posts - 1 through 15 (of 60 total)
You must be logged in to reply to this topic. Login to reply