May 3, 2012 at 8:22 am
Thanks for the book reference.
May 3, 2012 at 8:49 am
novak 32871 (5/3/2012)
I know, but there is already SQL 2012 out. It is a good exercise, but 2000/2003 is showing its age and most clients have already moved. But I understand your point.
Hardly. Some people have moved, but the last surveys I saw earlier this year had the majority (>50%) of databases on SQL 2000/2005.
May 3, 2012 at 11:17 am
The recursive CTE versions of these procedures are much faster on large data sets. Easier to grok, too, IMO:
CREATE PROCEDURE [dbo].[usp_Employee_Select_Down_Param_EmpID]
    @selectEmpID INT = NULL, -- NULL = all
    @maxSupLevel INT = 10    -- NULL = no limit
AS
SET @selectEmpID = ISNULL(@selectEmpID, 0);
-- A NULL @maxSupLevel would otherwise make the filter below reject every
-- row, so treat NULL as "no limit" explicitly.
SET @maxSupLevel = ISNULL(@maxSupLevel, 2147483647);
WITH [workTbl]
AS
(
    -- Anchor: the selected employee at level 0
    SELECT
        [e1].[EmpID]
      , [e1].[EmpName]
      , [e1].[SupID]
      , 0 AS [SupLevel]
    FROM [dbo].[Employee] AS [e1]
    WHERE [e1].[EmpID] = @selectEmpID
    UNION ALL
    -- Recursive member: everyone who reports to a row already found
    SELECT
        [e2].[EmpID]
      , [e2].[EmpName]
      , [e2].[SupID]
      , [w].[SupLevel] + 1 AS [SupLevel]
    FROM [workTbl] AS [w]
    JOIN [dbo].[Employee] AS [e2]
        ON [e2].[SupID] = [w].[EmpID]
)
SELECT
    [EmpID]
  , [EmpName]
  , [SupID]
  , [SupLevel]
FROM [workTbl]
WHERE [SupLevel] <= @maxSupLevel
ORDER BY
    [SupLevel]
  , [SupID]
  , [EmpID]
OPTION (MAXRECURSION 0); -- lift the default 100-level recursion cap
GO
CREATE PROCEDURE [dbo].[usp_Employee_Select_Up_Param_EmpID]
    @selectEmpID INT,     -- required
    @maxSupLevel INT = 10 -- NULL = no limit
AS
SET @maxSupLevel = ISNULL(@maxSupLevel, 2147483647); -- same NULL = "no limit" fix
DECLARE @workTbl TABLE
(
    [EmpID] INT NOT NULL
  , [EmpName] VARCHAR(50) NOT NULL
  , [SupID] INT NULL
  , [SupLevel] INT NOT NULL
);
WITH [workTbl]
AS
(
    -- Anchor: the selected employee at level 0
    SELECT
        [e1].[EmpID]
      , [e1].[EmpName]
      , [e1].[SupID]
      , 0 AS [SupLevel]
    FROM [dbo].[Employee] AS [e1]
    WHERE [e1].[EmpID] = @selectEmpID
    UNION ALL
    -- Recursive member: walk up the chain; levels count down into negatives
    SELECT
        [e2].[EmpID]
      , [e2].[EmpName]
      , [e2].[SupID]
      , [w].[SupLevel] - 1 AS [SupLevel]
    FROM [workTbl] AS [w]
    JOIN [dbo].[Employee] AS [e2]
        ON [w].[SupID] = [e2].[EmpID]
)
INSERT @workTbl
SELECT
    [EmpID]
  , [EmpName]
  , [SupID]
  , [SupLevel]
FROM [workTbl]
WHERE [SupLevel] >= -@maxSupLevel
OPTION (MAXRECURSION 0); -- lift the default 100-level recursion cap
-- Re-base the levels so the highest supervisor found comes out as level 0
DECLARE @minFoundSupLevel INT;
SELECT @minFoundSupLevel = MIN([SupLevel]) FROM @workTbl;
SELECT
    [EmpID]
  , [EmpName]
  , [SupID]
  , [SupLevel] - @minFoundSupLevel AS [SupLevel]
FROM @workTbl
ORDER BY
    [SupLevel]
  , [SupID]
  , [EmpID];
GO
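For anyone who wants to kick the tires, here's a minimal smoke test. It assumes nothing beyond the [dbo].[Employee] shape (EmpID, EmpName, SupID) that the procedures already reference; the table guard, names, and sample rows are made up purely for illustration:
-- Hypothetical smoke test: build a tiny org chart if one doesn't exist yet.
IF OBJECT_ID('dbo.Employee', 'U') IS NULL
BEGIN
    CREATE TABLE dbo.Employee
    (
        EmpID   INT         NOT NULL PRIMARY KEY
      , EmpName VARCHAR(50) NOT NULL
      , SupID   INT         NULL
    );
    INSERT dbo.Employee (EmpID, EmpName, SupID) VALUES (1, 'Alice', NULL); -- root
    INSERT dbo.Employee (EmpID, EmpName, SupID) VALUES (2, 'Bob',   1);
    INSERT dbo.Employee (EmpID, EmpName, SupID) VALUES (3, 'Carol', 1);
    INSERT dbo.Employee (EmpID, EmpName, SupID) VALUES (4, 'Dave',  2);
END;
GO
EXEC dbo.usp_Employee_Select_Down_Param_EmpID @selectEmpID = 1; -- Alice and everyone below her
EXEC dbo.usp_Employee_Select_Up_Param_EmpID   @selectEmpID = 4; -- Dave's chain of command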
May 3, 2012 at 11:58 am
novak 32871 (5/3/2012)
I know, but there is already SQL 2012 out. It is a good exercise, but 2000/2003 is showing its age and most clients have already moved. But I understand your point.
Oooh, 2003, that was an excellent edition of SQL Server!
But very exclusive, hard to find.
:-)
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
May 3, 2012 at 5:53 pm
W. Kevin Hazzard (5/3/2012)
The recursive CTE versions of these procedures are much faster on large data sets. Easier to grok, too, IMO:
[code snipped; see the procedures posted in full above]
Another one with yet another unfounded, unproven claim of performance. This is how rumors get started, folks. Whether it's true or not, without proof, claims of performance should remain highly dubious until proof is actually offered.
At the personal level, I think it's much easier to "grok" things that don't have recursion. :-)
There are now 3 people on this thread who have made unsubstantiated claims of performance. Any of you willing to prove it with code? :-)
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2012 at 6:04 pm
imarran (5/3/2012)
I also have a solution in the form of a view that you can join to that will display upline and downline structure depending on how it is joined to. It's quite technical, but if you guys are keen I will submit it through an article.
I don't know about anyone else, but I'd love to see it. Write that bad boy up! :-)
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2012 at 6:07 pm
Koen Verbeeck (5/3/2012)
novak 32871 (5/3/2012)
I know, but there is already SQL 2012 out. It is a good exercise, but 2000/2003 is showing its age and most clients have already moved. But I understand your point.
Oooh, 2003, that was an excellent edition of SQL Server!
But very exclusive, hard to find.
:-)
I assume 2003 refers to Windows Server 2003 as a platform upon which many people still run SQL Server 2000.
May 3, 2012 at 8:13 pm
Jeff, the WHILE loop-based approach works well if you have a few thousand records or less. And for employee data, that design is likely to provide sufficient performance. If you find a large enough set of hierarchical data in the public domain, I'd put recursive CTEs up against the iterative approach any day. On the comprehension issue, I find recursive CTEs to be very expressive, much easier to understand than row-by-row designs that accomplish the same thing. Not everyone agrees with me, but that's OK. To each his own.
May 3, 2012 at 9:48 pm
Just for the record, I *LIKE* recursive code. I've written a lot of recursive code in various languages over the years, as well as reusable and re-entrant code.
I do want to keep in mind that when working with, and for, people with less experience in those areas, it can be a big jump for them.
I just haven't used the rCTE construct (yet). This blog has given me the confidence to go ahead and do it. For that I thank you all.
Chuck Hoffman
------------
I'm not sure why we're here, but I am sure that while
we're here we're supposed to help each other
------------
May 4, 2012 at 12:20 am
Microsoft's own documentation doesn't claim rCTEs to be faster than any other option.
So that would leave the main use of CTEs to be writing "simpler", more concise SQL (which of course can come down to personal taste, as single-statement scripts can be a nightmare too).
So I'm not sure where people are getting the idea that it is faster. However! What I asked in the first comment in this thread was whether rCTEs are "any worse" than a manual looping structure, i.e. is there any hidden overhead? Essentially they are both looping over the same set of data and joining more rows to it through recursion, correct?
If the performance is equivalent, then it comes down to style, taking into account the following factors: readability, compatibility, and reusability.
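For concreteness, the "manual looping structure" in question would look roughly like this (a hypothetical sketch against the dbo.Employee table used earlier: seed a work table with the starting row, then join one level per pass):
-- Hypothetical iterative equivalent of the "down" rCTE, one level per pass.
DECLARE @work TABLE (EmpID INT, EmpName VARCHAR(50), SupID INT, SupLevel INT);
DECLARE @lvl INT, @rc INT;
SET @lvl = 0;

INSERT @work (EmpID, EmpName, SupID, SupLevel)
SELECT EmpID, EmpName, SupID, 0
FROM dbo.Employee
WHERE EmpID = 1;                        -- starting employee
SET @rc = @@ROWCOUNT;

WHILE @rc > 0
BEGIN
    SET @lvl = @lvl + 1;
    INSERT @work (EmpID, EmpName, SupID, SupLevel)
    SELECT e.EmpID, e.EmpName, e.SupID, @lvl
    FROM dbo.Employee AS e
    JOIN @work AS w ON e.SupID = w.EmpID
    WHERE w.SupLevel = @lvl - 1;        -- only extend from the previous level
    SET @rc = @@ROWCOUNT;
END;

SELECT EmpID, EmpName, SupID, SupLevel
FROM @work
ORDER BY SupLevel, EmpID;
Both versions do a level-at-a-time join; the rCTE just lets the engine manage the work table (an internal spool) instead of an explicit one, so any performance gap comes from how the optimizer handles that spool rather than from a fundamentally different algorithm.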
May 4, 2012 at 6:52 am
Great points, Miika. In my experience with terabytes of hierarchical data in my system, recursive CTEs are faster, especially when going down the tree. Going up creates a single "thread" of records, so the iterative approach tends to be about as fast as recursive CTEs in that case. Going down the tree creates many more records due to the one-to-many relationships that exist.
If Jeff Moden (or anyone) aims me at some public domain data that's hierarchical in nature, I'll load it, test both approaches for traversing up and down, then I'll publish the results. I'd need a million rows or so, I think, to show big differences. The other thing that will show dramatic results is having lots of one-to-many relationships as you move down the tree. That's where the query optimizer in SQL injects most of the performance advantages that recursive CTEs provide, in my experience.
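In the meantime, for anyone who wants to measure rather than guess, a bare-bones harness is enough to get comparable numbers. This is only a sketch; it assumes the procedures posted earlier in the thread, and you should run each test several times and discard the first, cold-cache run:
-- Simple elapsed-time measurement for one approach; repeat for the other.
SET STATISTICS IO ON;   -- logical reads per table
SET STATISTICS TIME ON; -- CPU and elapsed ms per statement

DECLARE @t0 DATETIME;
SET @t0 = GETDATE();

EXEC dbo.usp_Employee_Select_Down_Param_EmpID @selectEmpID = 1, @maxSupLevel = NULL;

PRINT 'Elapsed: ' + CONVERT(VARCHAR(12), DATEDIFF(ms, @t0, GETDATE())) + ' ms';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;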
May 4, 2012 at 7:55 am
W. Kevin Hazzard (5/4/2012)
Great points, Miika. In my experience with terabytes of hierarchical data in my system, recursive CTEs are faster, especially when going down the tree. Going up creates a single "thread" of records, so the iterative approach tends to be about as fast as recursive CTEs in that case. Going down the tree creates many more records due to the one-to-many relationships that exist.
If Jeff Moden (or anyone) aims me at some public domain data that's hierarchical in nature, I'll load it, test both approaches for traversing up and down, then I'll publish the results. I'd need a million rows or so, I think, to show big differences. The other thing that will show dramatic results is having lots of one-to-many relationships as you move down the tree. That's where the query optimizer in SQL injects most of the performance advantages that recursive CTEs provide, in my experience.
How big a hierarchy have you actually worked with? I ask because million-node hierarchies don't exactly grow on trees (no pun intended) on the internet, and I'd like to know what you're basing your claims of performance on.
With that thought in mind, you'll have to use some made-up data. The following stored proc will allow you to do some very quick and flexible testing for small numbers of nodes in a hierarchy and takes just 22 seconds (or less on a good box) to build a "pure" DAG with a million nodes.
--===== Do this in a nice safe place that everyone has.
USE tempdb
;
-- DROP PROCEDURE dbo.BuildTestHierarchy
;
GO
CREATE PROCEDURE dbo.BuildTestHierarchy
/******************************************************************************
 Create a randomized "clean" hierarchy. Each EmployeeID (except the first one,
 of course) is assigned a random ManagerID number which is always less than the
 current EmployeeID. This code runs nasty fast and is great for testing
 hierarchical processing code.

 Usage: (both examples build a million row Adjacency List Hierarchy)
 EXEC dbo.BuildTestHierarchy 1000000
 EXEC dbo.BuildTestHierarchy @pRowsToBuild = 1000000

 Revision History:
 Rev 00 - 28 Apr 2010 - Jeff Moden - Initial creation and test.
 Rev 01 - 15 May 2010 - Jeff Moden - Abort if current DB isn't "tempdb" to
                                     protect users that want to "play".
******************************************************************************/
--===== Declare the I/O parameters
        @pRowsToBuild INT
AS
--===== Make sure that we're in a safe place to run this...
IF DB_NAME() <> N'tempdb'
BEGIN
    RAISERROR('Current DB is NOT tempdb. Run aborted.', 11, 1);
    RETURN;
END;

--===== Conditionally drop the test table so we can do reruns more easily
IF OBJECT_ID('TempDB.dbo.Employee', 'U') IS NOT NULL
    DROP TABLE TempDB.dbo.Employee;

--===== Build the test table and populate it on the fly.
     -- Everything except ManagerID is populated here.
SELECT TOP (@pRowsToBuild)
       ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT 1)), 0) AS EmployeeID,
       CAST(0 AS INT)                                     AS ManagerID,
       CAST(NEWID() AS VARCHAR(36))                       AS EmployeeName,
       (ABS(CHECKSUM(NEWID())) % 12 + 1) * 1000           AS Sales
  INTO TempDB.dbo.Employee
  FROM Master.sys.All_Columns ac1
 CROSS JOIN Master.sys.All_Columns ac2;

--===== Update the test table with ManagerID's. The ManagerID is some random
     -- value which is always less than the current EmployeeID to keep the
     -- hierarchy "clean" and free from "loop backs".
UPDATE TempDB.dbo.Employee
   SET ManagerID = CASE
                      WHEN EmployeeID > 1
                      THEN ABS(CHECKSUM(NEWID())) % (EmployeeID - 1) + 1
                      ELSE NULL
                   END;

--===== Add some indexes that most folks would like to have on such a table
ALTER TABLE TempDB.dbo.Employee
  ADD CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED (EmployeeID);

CREATE INDEX IX_Employee_ManagerID
    ON TempDB.dbo.Employee (ManagerID);
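To exercise it, end the CREATE PROCEDURE batch and run a quick build-and-verify; the expected counts below assume the million-row example from the header:
GO
--===== Build a million-row hierarchy, then run a couple of sanity checks
EXEC dbo.BuildTestHierarchy @pRowsToBuild = 1000000;

SELECT COUNT(*) AS TotalEmployees    -- expect 1,000,000
  FROM TempDB.dbo.Employee;

SELECT COUNT(*) AS RootNodes         -- expect exactly 1 (ManagerID IS NULL)
  FROM TempDB.dbo.Employee
 WHERE ManagerID IS NULL;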
I'd need a million rows or so, I think, to show big differences
See? That's what I'm talking about. A lot of people "think" the rCTE will be faster but I'm not so sure that anyone making that claim on this thread has actually tested to find out for sure. :-)
--Jeff Moden
Change is inevitable... Change for the better is not.
May 5, 2012 at 8:49 pm
Proving yet again that there is always more than one way to skin a cat. This may not be the best way, but it is clearly better than the author's first solution. The rCTE is the most efficient solution, and having co-workers with less experience is not a good enough reason to skip implementing it. The less experienced should challenge themselves, break out of their comfort zones, and take the time to learn.
May 6, 2012 at 7:55 am
@tmarkham1 - I taught programming languages in college for 11 years and loved every minute of it. It was hard work but I found that I learned as much as the students did, not about programming languages per se, but about human nature and the power of exploration. Once in a while, though, a student would teach me something entirely new. And I loved that most of all.
The difference between Chuck's iterative approach and the recursive approach highlights potentially the largest split between developers in the world today. And I don't mean just software development using languages like Java and C#. This applies to database developers, too. We write a lot of code, don't we?
Chuck used the so-called imperative approach in his design: ordered, often iterative steps designed to mutate data in an external structure (i.e. his @workTbl variable). My response was functional and set-based, using recursion and limiting state changes to the inside of the recursive "thread" (i.e. my [workTbl] common table expression). The key point I want to make here is that neither of these approaches is better than the other unless one offers some concrete advantage (to Jeff Moden's point).
I think the recursive approach is better because it's in keeping with the set-based mindset of T-SQL. It's not just WHILE loops that I avoid; I shun cursors for the same reason. The whole idea of set theory (which is what T-SQL is ultimately derived from) is to get out of the mindset of thinking about instances of data and to start thinking of data more categorically. WHILE loops, like cursors, get your brain into "instance mode", making you think at a lower level of abstraction. Recursive CTEs, on the other hand, force your brain into "category mode", making you think at a higher level of abstraction.
When database developers cringe at recursion, most of the time it's because they aren't allowing themselves the luxury of higher abstraction. But this is our greatest advantage in the database realm. We have tools like JOIN, UNION, INTERSECT, etc. that allow us to escape from "instance brain". That escape hatch allows us to handle immense amounts of information with ease. Now, I just need to prove to Jeff that recursive CTEs are better by being faster, too, and we'll have some real fun. :-)
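To put the two mindsets side by side with a trivial (and entirely hypothetical) example, using the test table built earlier in the thread:
-- "Instance mode": a cursor mutates one row per iteration.
DECLARE @id INT;
DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT EmployeeID FROM TempDB.dbo.Employee;
OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE TempDB.dbo.Employee
       SET Sales = Sales + 100
     WHERE EmployeeID = @id;
    FETCH NEXT FROM c INTO @id;
END;
CLOSE c;
DEALLOCATE c;

-- "Category mode": state the transformation once, over the whole set.
UPDATE TempDB.dbo.Employee
   SET Sales = Sales + 100;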
Kudos to Chuck for his article. I loved it which is why I responded in the first place. I hope that he's emboldened by the experience and shares more great ideas in the future.
May 6, 2012 at 8:23 am
You can add my vote to seeing imarran's hierarchy solution using views.
Chuck has stated, in an earlier posting in this thread, that he is supporting SQL Server 2000 databases (amongst others), so CTEs are not available in that environment, irrespective of any performance improvements they may or may not provide. They just do not work.
Dave