September 29, 2011 at 11:48 am
I'm trying to better understand recursive CTE queries, and I've hit a bit of a hurdle. In the recursive portion, I'm trying to build a comma-delimited list of values. I want to end up only with the final (largest) list for each group, but I'm getting every step in the recursion.
(This is my first post, so I apologize if I screw up the tags...)
CREATE TABLE #Orders
(
EmpID INT,
OrderID VARCHAR(30),
OrderDate DATETIME
)
INSERT INTO #Orders
VALUES (101, 95001, '03/01/2011'),
(101, 95026, '03/01/2011'),
(101, 95053, '03/01/2011'),
(104, 95014, '03/06/2011'),
(104, 95018, '03/06/2011'),
(106, 95077, '03/03/2011'),
(106, 95121, '03/03/2011'),
(106, 95085, '03/03/2011'),
(106, 95222, '03/09/2011'),
(115, 95117, '03/05/2011'),
(115, 95141, '03/05/2011'),
(130, 95066, '03/11/2011'),
(142, 95512, '03/19/2011'),
(142, 95588, '03/19/2011'),
(142, 95722, '03/26/2011'),
(142, 95764, '03/26/2011'),
(142, 95848, '03/26/2011')
WITH OrderRows AS (
SELECT ROW_NUMBER() OVER(PARTITION BY EmpID, OrderDate ORDER BY EmpID, OrderID, OrderDate) AS RowNum,
EmpID, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate
FROM #Orders
),
CTE AS (
SELECT RowNum, EmpID, CAST(OrderID AS VARCHAR(500)) AS OrderID, OrderDate
FROM OrderRows
UNION ALL
SELECT O.RowNum, C.EmpID, CAST(C.OrderID + ISNULL(', ' + O.OrderID, '') AS VARCHAR(500)), C.OrderDate
FROM OrderRows O
INNER JOIN CTE C
ON O.EmpID = C.EmpID
AND O.OrderDate = C.OrderDate
AND O.RowNum = C.RowNum + 1
)
SELECT *
FROM CTE C
ORDER BY EmpID
The first CTE (OrderRows) is there to derive a table with a rownum, for joining purposes. The CAST() of OrderID is to get the data types to match in the anchor and recursive portions. The CONVERT() is just to drop the time portion.
So, these are the results I want (don't need RowNum; it's there just for comprehension):
RowNumEmpIDOrderDateOrderID
31013/1/201195001, 95026, 95053
21043/6/201195014, 95018
31063/3/201195077, 95085, 95121
11063/9/201195222
21153/5/201195117, 95141
11303/11/201195066
21423/19/201195512, 95588
31423/26/201195722, 95764, 95848
And these are the results I get:
RowNumEmpIDOrderDateOrderID
11013/1/201195001
21013/1/201195026
31013/1/201195053
31013/1/201195026, 95053
21013/1/201195001, 95026
31013/1/201195001, 95026, 95053
21043/6/201195014, 95018
11043/6/201195014
21043/6/201195018
11063/3/201195077
21063/3/201195085
31063/3/201195121
11063/9/201195222
31063/3/201195085, 95121
21063/3/201195077, 95085
31063/3/201195077, 95085, 95121
21153/5/201195117, 95141
11153/5/201195117
21153/5/201195141
11303/11/201195066
11423/19/201195512
21423/19/201195588
11423/26/201195722
21423/26/201195764
31423/26/201195848
31423/26/201195764, 95848
21423/26/201195722, 95764
31423/26/201195722, 95764, 95848
21423/19/201195512, 95588
Help? ๐
September 29, 2011 at 11:57 am
OK, I just figured out to add
WHERE RowNum = 1
in the anchor portion of the recursive CTE, but I'm still not quite there.
September 29, 2011 at 12:03 pm
Which is more important to you ?
Learning about recursive CTEs
OR
Getting the required query results?
I ask because we can teach you about CTEs or we can show you the better way to get the query result.
The two are not the same thing.
Thanks for the sample data - good work!
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
September 29, 2011 at 12:16 pm
Really, both are important, in a sense. I already HAVE the results in the form I wanted (and not by just deleting results rows); I ended up creating a temp table storing the max(len(OrderID)) for each EmpID/OrderDate group, then joined back to that to just get the rows I wanted.
But I'm sure this could be done in a single query, right? If you're suggesting there is a more elegant solution WITHOUT CTE's, I'd love to see that! But this actually came about as I was spending some time this morning working on understanding CTE's better. (My current puzzle is breaking a string of words into strings of n words each, which is a bit over my head right now.)
So if I had to pick one of your options, it's to understand CTE's better. I already emailed out the results desired anyway. ๐
ETA: Now that I've added in the restriction of RowNum = 1 in the anchor portion of the recursive CTE, I see that I could do what I did before, but just on MAX(RowNum) instead of worrying about the LEN(). Still looking for the single-query solution, though!
September 29, 2011 at 5:26 pm
Hi jeffem,
Ok, so first the CTE: you have not "missed" anything really, but you have found one of the problems associated with using a recursive CTE for this type of job ; knowing which output rows to keep!
I am sure there are as many methods out there as there are varieties of apples, here is one:
It relies on recording the depth of the recursion, simply by adding 1 each time, then on an extra sort to find the lowest level for each group (read EmpId and OrderDate) so that the results can finally be filtered.
Here's this one method:
;WITH OrderRows AS (
SELECT ROW_NUMBER() OVER(PARTITION BY EmpID, OrderDate ORDER BY EmpID, OrderID, OrderDate) AS RowNum,
EmpID, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate
FROM #Orders
),
CTE AS (
SELECT RowNum, EmpID, CAST(OrderID AS VARCHAR(500)) AS OrderID, OrderDate, 1 as OrderCount
FROM OrderRows
UNION ALL
SELECT O.RowNum, C.EmpID, CAST(C.OrderID + ISNULL(', ' + O.OrderID, '') AS VARCHAR(500)), C.OrderDate, 1+OrderCount
FROM OrderRows O
INNER JOIN CTE C
ON O.EmpID = C.EmpID
AND O.OrderDate = C.OrderDate
AND O.RowNum = C.RowNum + 1
), joined_rows AS (
SELECT *, row_number() OVER(PARTITION BY EmpId,OrderDate ORDER BY OrderCount DESC) as Marker
FROM CTE C
)
select EmpId,OrderId,OrderDate
from joined_rows
where Marker = 1
order BY EmpId
Notice the extra column called OrderCount in your CTE (CTE!)? It records the depth of the row (or it's count) so that in the new CTE called "joined_rows", I can generate yet another row number for each EmpID and OrderDate combination, ordered by the OrderCount DESC.
This new "Marker" then contains the value 1 for the rows we need and we can select by that.
This is horrible code, very wasteful and will grind to a halt on a decent size set of data.
Take a look at the execution plan for that recursive ( :sick: :crying: ) cte:
Now, let's take a look at one method (there will be many of these too) for doing this without a recursive CTE:
First get a list of the EmpID and OrderDate combinations you need and then go grab the order list for each one...
SELECT EmpID,OrderId,OrderDate
FROM
(
SELECT EmpID,OrderDate AS OrderDateTime
FROM #Orders
GROUP BY EmpID,OrderDate
) AS EmpDates
OUTER APPLY
(
SELECT ','+OrderId
FROM #Orders AS Orders
WHERE Orders.EmpID = EmpDates.EmpID
AND Orders.OrderDate = EmpDates.OrderDateTime
ORDER BY OrderId
FOR XML PATH('') -- this concatenates the OrderIds
) AS OA(List)
OUTER APPLY
(
-- This strips the leading comma from each list and strips the time element from the OrderDate
SELECT STUFF(OA.List,1,1,''),CONVERT(CHAR(10),OrderDateTime,101)
) AS Result(OrderId,OrderDate)
ORDER BY EmpID
And the execution plan for this : (Indexing is obviously missing from this trivial example)
What can you tell from those plans? Not a lot at first sight, but if you examine them you will see 30 scans of the orders table in the recursive cte and the comparitively low (still not great though) 9 scans in the alternative, xml path method.
Now imagine scaling up the data set to 1 million rows... I bet the cte would not finish.
Hope this helps.
MM
select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);
September 29, 2011 at 10:15 pm
As magoo has said, there are better methods for this type of query than a recursive cte.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
September 30, 2011 at 11:48 am
Seรฑor Magoo,
That is some terrific info and very easy to digest! I greatly appreciate the time you spent on that. It helps me understand CTE's a bit better, and now I see that I need to explore how to best use APPLY some more! Just trying to get more tools in my toolbox.
Thanks again!
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply