Using a Recursive CTE to Generate a List

  • Jeff Moden (6/27/2015)


    I'm still working on it. Everything I come up with has had a hole in it.

    Sorry it's been a while, Jeff - crazy busy times. Next gig booked 🙂

    If you remember, a year or four back quite a few of us were messing around with rCTEs and figuring out what they were good for. I remember trying to put a ROW_NUMBER on the recursive leg and always getting 1 as the result, even when there were a bunch of rows from the last set matching the current set. I think this could provide, for most people, good evidence that the table spool in an rCTE works with rows, not pages or even sets. I'll knock out a fuller sample later if time permits.
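
    In the meantime, here's a quick throwaway sketch of the effect (not from any real schema, just three letters and a counter). Each level of the recursion carries three rows, so you'd expect the ROW_NUMBER on the recursive leg to come back as 1, 2, 3 within each level - instead it's 1 for every row, because the spool feeds the recursive leg one row at a time.

    WITH rCTE AS
    (
        -- Anchor: three rows, so every level of the recursion carries three rows
        SELECT 1 AS lvl, x.label, CAST(1 AS bigint) AS rn
        FROM (VALUES ('a'), ('b'), ('c')) AS x (label)
        UNION ALL
        -- Recursive leg: ROW_NUMBER only ever sees a single row here,
        -- so rn comes back as 1 for every row at every level
        SELECT r.lvl + 1,
               r.label,
               ROW_NUMBER() OVER (PARTITION BY r.lvl ORDER BY r.label)
        FROM rCTE r
        WHERE r.lvl < 3
    )
    SELECT lvl, label, rn
    FROM rCTE
    ORDER BY lvl, label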

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Paul White (6/28/2015)


    The reason I often refer to logical reads of worktables (including spools) as being measured in rows rather than pages is to aid understanding for people who are confused by the large number of worktable logical reads reported compared with reads from regular tables. Certainly, worktables and spools are internally organized similarly (though not identically) to real tables. Specifically, worktables do indeed have (optimized) page storage and either a heap-like or b-tree structure.

    The key difference is in the way reads are reported. Consider 100 rows on a single page in a regular user heap table. Reading all 100 rows will report a single logical read, despite the fact that the same page was actually accessed 100 times. When the rows are organized in an index structure, there may be an extra logical read or two (or three...) because the non-leaf levels of the index are accessed once at the start of the operation. So, the logical reads people are most familiar with represent the number of *pages* read (not the number of rows).

    Spools and worktables, in general, do not apply this page-level reporting policy. Depending on the type of spool (index- or heap-based) and other details, they will report at least one logical read each time a row is processed by the spool. Sometimes this includes writes as well as reads. If the primary spool is an index spool, the operation will also count any non-leaf index page accesses as well, further inflating the numbers.

    The important consequence is that worktables and spools report logical reads differently, so they should not be compared directly with logical reads on user-visible tables. The reads for a worktable or spool will, in general, be proportional to the number of *rows* processed and, in the simplest cases, equal to the number of rows.

    All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). It also explains why running an experiment to do the same thing as a spool with real tables will generally show lower logical reads (example).

    Thanks tons for the links, Paul. I've read them before - unless you've published the equation relating rows to logical reads elsewhere - but the conclusion is the same. I couldn't find the articles at the weekend, but I did find an excellent spool article penned by your mate Rob Farley, so it was time well spent.
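
    If anyone fancies seeing the row-level counting for themselves, a throwaway counter does the job (nothing to do with any production query - just a trivial rCTE). A real table holding a thousand rows would report a handful of logical reads; the worktable behind the spool reports thousands.

    SET STATISTICS IO ON
    -- A trivial rCTE counting to 1000. The stack spool behind it reports at least
    -- one logical read per row it touches, so expect the 'Worktable' logical reads
    -- to be in the thousands rather than the handful of pages a real 1000-row
    -- table would report.
    ;WITH n AS
    (
        SELECT 1 AS i
        UNION ALL
        SELECT i + 1 FROM n WHERE i < 1000
    )
    SELECT MAX(i)
    FROM n
    OPTION (MAXRECURSION 0)
    SET STATISTICS IO OFF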

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Paul White (6/28/2015)


    ...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...

    No kidding. Tuning a query early last week I came across this:

    == SLOW QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 452,015 ms, elapsed time = 214,388 ms.

    == FAST QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 86,157 ms, elapsed time = 26,357 ms.

    ===========================================================

    You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • IdRatherCodeIt (6/29/2015)


    I used both the CTE and the FOR XML approach to retrieve a grouped list of about 938 rows from a source table with 3,458 records (filtered for my use case).

    The CTE took 4:02 minutes, and it dropped any NULL values from the concatenated value due to the aggregation.

    The FOR XML version took 0.26 seconds and kept all my NULL values.

    --pete

    Can you share your code, Pete?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (6/30/2015)


    Paul White (6/28/2015)


    ...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...

    No kidding. Tuning a query early last week I came across this:

    == SLOW QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 452,015 ms, elapsed time = 214,388 ms.

    == FAST QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 86,157 ms, elapsed time = 26,357 ms.

    ===========================================================

    You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.

    Agreed, but (with the understanding that I've not yet seen the execution plan) that has all the smackings of an index seek occurring 4.6 million times, and it might be another area of optimization that could yield yet another significant reduction.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/30/2015)


    ChrisM@Work (6/30/2015)


    Paul White (6/28/2015)


    ...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...

    No kidding. Tuning a query early last week I came across this:

    == SLOW QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 452,015 ms, elapsed time = 214,388 ms.

    == FAST QUERY =========================================================
    Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
    Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
    Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
    (2429008 row(s) affected)
    SQL Server Execution Times:
    CPU time = 86,157 ms, elapsed time = 26,357 ms.

    ===========================================================

    You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.

    Agreed, but (with the understanding that I've not yet seen the execution plan) that has all the smackings of an index seek occurring 4.6 million times, and it might be another area of optimization that could yield yet another significant reduction.

    You're absolutely right - the fast plan has a ton of seeks which the slow plan doesn't have, but the old plan carries 55 million rows almost all the way through whereas the fast plan knocks the rowcount down to about 4.6 million very quickly. If this has whetted your appetite, PM me with an email address and I'll whizz them over to you. They're not really that interesting other than mocking excessive reliance on logical reads.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thought it might be worth adding some bulk test data:

    -- You'll need Jeff Moden's Tally table for this http://www.sqlservercentral.com/articles/T-SQL/62867/
    TRUNCATE TABLE [dbo].[OfficeCounty]
    DELETE [dbo].[County]
    DELETE [dbo].[Office]

    -- Generate 26*26*10 counties (10 counties in each of 676 states)
    INSERT [dbo].[County] ([CountyName], [StateAbbr])
    SELECT 'Area ' + CONVERT(varchar, T3.N) + ' ' + CHAR(T1.N) + CHAR(T2.N) + ' County, ' + CHAR(T1.N) + CHAR(T2.N) AS [CountyName],
           CHAR(T1.N) + CHAR(T2.N) AS [StateAbbr]
    FROM dbo.Tally T1
    INNER JOIN dbo.Tally T2
            ON T2.N BETWEEN ASCII('A') AND ASCII('Z')
    INNER JOIN dbo.Tally T3
            ON T3.N BETWEEN 1 AND 10
    WHERE T1.N BETWEEN ASCII('A') AND ASCII('Z')
    ORDER BY 1

    -- Generate 1 office in each of 676 states
    INSERT [dbo].[Office] ([OfficeName], [Address1], [Address2])
    SELECT CHAR(T1.N) + CHAR(T2.N) + ' Office' AS OfficeName,
           CONVERT(varchar, T1.N - ASCII('A') + 1) + ' ' + CHAR(T1.N) + ' Avenue, ' + CHAR(T1.N) + CHAR(T2.N) AS Address1,
           'State ' + CHAR(T1.N) + CHAR(T2.N) AS Address2
    FROM dbo.Tally T1
    INNER JOIN dbo.Tally T2
            ON T2.N BETWEEN ASCII('A') AND ASCII('Z')
    WHERE T1.N BETWEEN ASCII('A') AND ASCII('Z')

    -- Populate [OfficeCounty]
    INSERT INTO [dbo].[OfficeCounty] ([OfficeID], [CountyID])
    SELECT O.OfficeId,
           C.CountyId
    FROM Office O
    INNER JOIN County C
            ON C.StateAbbr = RIGHT(O.Address1, 2)
    ORDER BY 1, 2

    I then did some tests on the following queries:

    SET STATISTICS TIME, IO ON
    GO
    -- ***************************************************************
    -- TEST 1
    -- Query from article http://www.sqlservercentral.com/articles/Recursive+CTE/99294/
    -- ***************************************************************
    --source CTE:
    WITH office_county_state AS (
        SELECT
            o.OfficeID,
            c.CountyName,
            c.StateAbbr,
            ROW_NUMBER() OVER (PARTITION BY o.OfficeID, c.StateAbbr ORDER BY CountyName) AS rank_county_name,
            COUNT(CountyName) OVER (PARTITION BY o.OfficeID, c.StateAbbr) AS max_rank_county_name
        FROM dbo.Office o
        INNER JOIN dbo.OfficeCounty oc
                ON o.OfficeID = oc.OfficeID
        INNER JOIN dbo.County c
                ON c.CountyId = oc.CountyID
    )
    --recursive CTE:
    , recurs_CTE AS
    (
        --anchor query:
        SELECT
            OfficeID,
            StateAbbr,
            rank_county_name,
            max_rank_county_name,
            CAST(CountyName AS VARCHAR(8000)) AS countynames
        FROM office_county_state
        WHERE rank_county_name = 1
        UNION ALL
        --recursive member:
        SELECT
            d.OfficeID,
            d.StateAbbr,
            d.rank_county_name,
            d.max_rank_county_name,
            CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000))
        FROM recurs_CTE
        INNER JOIN office_county_state d
                ON d.OfficeID = recurs_CTE.OfficeID
               AND d.StateAbbr = recurs_CTE.StateAbbr
        WHERE d.rank_county_name = 1 + recurs_CTE.rank_county_name
    )
    SELECT recurs_CTE.OfficeID,
           recurs_CTE.StateAbbr,
           recurs_CTE.countynames
    FROM recurs_CTE
    -- we filter the resultset from the recursive CTE to get the longest list of countyName
    WHERE recurs_CTE.rank_county_name = recurs_CTE.max_rank_county_name
    GO

    -- ***************************************************************
    -- TEST 2
    -- Using FOR XML with CROSS APPLY
    -- ***************************************************************
    SELECT O.OfficeID,
           S.StateAbbr,
           C.countynames
    FROM dbo.Office O
    CROSS APPLY (SELECT TOP(1)
                        C.StateAbbr
                 FROM dbo.County C
                 INNER JOIN dbo.OfficeCounty OC
                         ON OC.CountyID = C.CountyId
                        AND OC.OfficeID = O.OfficeID
                ) AS S
    CROSS APPLY (SELECT STUFF((SELECT ', ' + CountyName
                               FROM dbo.County C
                               INNER JOIN dbo.OfficeCounty OC
                                       ON OC.CountyID = C.CountyId
                                      AND OC.OfficeID = O.OfficeID
                               ORDER BY CountyName
                               FOR XML PATH(''))
                              , 1, 2, '') AS countynames
                ) AS C
    ORDER BY 2
    GO

    -- ***************************************************************
    -- TEST 3
    -- Using a cursor and temporary table
    -- ***************************************************************
    DECLARE myCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY
    FOR SELECT OfficeId
        FROM dbo.Office

    DECLARE @OfficeId int, @CountyNames varchar(3000), @StateAbbr varchar(5)

    -- Create an empty #Results table with the right column types
    SELECT @OfficeId AS OfficeId, @StateAbbr AS StateAbbr, @CountyNames AS CountyNames
    INTO #Results
    WHERE 1 = 0

    OPEN myCursor
    FETCH NEXT FROM myCursor INTO @OfficeId
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT @CountyNames = '' -- Initialise
        SELECT @CountyNames = @CountyNames + ', ' + C.CountyName,
               @StateAbbr = C.StateAbbr
        FROM dbo.[County] C
        INNER JOIN dbo.[OfficeCounty] OC
                ON OC.CountyId = C.CountyId
               AND OC.OfficeID = @OfficeId
        ORDER BY C.CountyName ASC

        INSERT INTO #Results
        SELECT @OfficeId, @StateAbbr, STUFF(@CountyNames, 1, 2, '')

        FETCH NEXT FROM myCursor INTO @OfficeId
    END
    CLOSE myCursor
    DEALLOCATE myCursor

    SELECT * FROM #Results ORDER BY 2
    DROP TABLE #Results
    GO

    Results for first two queries:

    -- *************************
    -- TEST 1
    -- *************************
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 84 ms, elapsed time = 84 ms.

    (676 row(s) affected)
    Table 'Worktable'. Scan count 6767, logical reads 109756814, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'County'. Scan count 6761, logical reads 236635, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'OfficeCounty'. Scan count 6761, logical reads 87893, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Office'. Scan count 6761, logical reads 47327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 492922 ms, elapsed time = 499470 ms.

    -- *************************
    -- TEST 2
    -- *************************
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    (676 row(s) affected)
    Table 'County'. Scan count 0, logical reads 14872, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 1352, logical reads 29946, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'OfficeCounty'. Scan count 2, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Office'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    SQL Server Execution Times:
    CPU time = 93 ms, elapsed time = 254 ms.

    The rCTE really does not perform at all well with volume data: 500 seconds vs 0.254 seconds, which is about 2,000 times slower.

  • ChrisM@Work (6/30/2015)


    If you remember, a year or four back quite a few of us were messing around with rCTEs and figuring out what they were good for. I remember trying to put a ROW_NUMBER on the recursive leg and always getting 1 as the result, even when there were a bunch of rows from the last set matching the current set.

    If you mean what I think you mean, this is intended (though counter-intuitive!) behaviour - see this Connect item.

  • Paul White (6/30/2015)


    ChrisM@Work (6/30/2015)


    If you remember, a year or four back quite a few of us were messing around with rCTEs and figuring out what they were good for. I remember trying to put a ROW_NUMBER on the recursive leg and always getting 1 as the result, even when there were a bunch of rows from the last set matching the current set.

    If you mean what I think you mean, this is intended (though counter-intuitive!) behaviour - see this Connect item.

    Yes! But my explanation is so much better than yours :p

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (6/30/2015)


    Yes! But my explanation is so much better than yours :p

    Nothing changes 🙂
