June 30, 2015 at 6:43 am
Jeff Moden (6/27/2015)
I'm still working on it. Everything I come up with has had a hole in it.
Sorry it's been a while Jeff, crazy busy times. Next gig booked 🙂
If you remember a year or four back, when quite a few of us were messing around with rCTEs and figuring out what they were good for, I remember trying to put a ROW_NUMBER on the recursive leg of the rCTE and always getting 1 as the result, even if there were a bunch of rows from the last set matching the current set. I think this could provide, for most, good evidence that the table spool in a rCTE works with rows, not pages or even sets. I'll knock out a sample later if time permits.
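A minimal sketch of that behaviour (contrived and table-free; the names are made up): each level of the recursion below receives three rows from the previous level, yet ROW_NUMBER() on the recursive leg reports 1 for every one of them, consistent with the spool feeding the recursive member one row at a time.
WITH rcte AS
(
    -- Anchor: three rows at level 0
    SELECT lvl = 0, id, rn = CAST(0 AS bigint)
    FROM (VALUES (1), (2), (3)) AS v(id)
    UNION ALL
    -- Recursive member: the previous level always holds three rows, but the
    -- spool replays them one by one, so ROW_NUMBER() here returns 1 for each
    SELECT lvl + 1,
           id,
           ROW_NUMBER() OVER (ORDER BY id)
    FROM rcte
    WHERE lvl < 2
)
SELECT lvl, id, rn
FROM rcte
ORDER BY lvl, id;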
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 6:48 am
Paul White (6/28/2015)
The reason I often refer to logical reads of worktables (including spools) as being measured in rows rather than pages is to aid understanding for people who are confused by the large number of worktable logical reads reported, compared with reads from regular tables.
Certainly, worktables and spools are internally organized similarly (though not identically) to real tables. Specifically, worktables do indeed have (optimized) page storage and either a heap-like or b-tree structure. The key difference is in the way reads are reported.
Consider 100 rows on a single page in a regular user heap table. Reading all 100 rows will report a single logical read, despite the fact the same page was actually accessed 100 times. When organized in an index structure, there may be an extra logical read or two (or three...) due to the non-leaf levels of the index being accessed once at the start of the operation. So, the logical reads people are most familiar with represent the number of *pages* read (not the number of rows).
Spools and worktables, in general, do not apply this page-level reporting policy. Depending on the type of spool (index- or heap-based) and other details, they will report at least one logical read each time a row is processed by the spool. Sometimes this includes writes as well as reads. If the primary spool is an index spool, the operation will also count any non-leaf index page accesses as well, further inflating the numbers.
The important consequence is that worktables and spools report logical reads differently; they should not be compared directly with logical reads on user-visible tables. The reads for a worktable or spool will, in general, be proportional to the number of *rows* processed and, in the simplest cases, equal to the number of rows.
All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). It also explains why running an experiment to do the same thing as a spool with real tables will generally show lower logical reads (example).
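As a small illustration of that page-versus-row reporting difference (a sketch only; it assumes nothing more than permission to create a temp table, and the figures are approximate):
SET STATISTICS IO ON;

-- 100 narrow rows fit comfortably on a single page, so scanning this heap
-- reports roughly one logical read even though 100 rows are returned
SELECT TOP (100) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
INTO #OnePage
FROM sys.all_objects;

SELECT COUNT(*) FROM #OnePage;   -- Table '#OnePage...': logical reads ~1

-- A recursive CTE runs through a worktable (spool); its reported logical
-- reads grow with the number of rows it processes - expect hundreds of
-- worktable reads here for only 100 rows
WITH nums AS
(
    SELECT n = 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 100
)
SELECT MAX(n) FROM nums;

DROP TABLE #OnePage;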
Thanks tons for the links, Paul. I've read them before (unless you've published the equation relating rows to logical reads elsewhere), but the conclusion is the same. I couldn't find the articles at the weekend, but I did find an excellent spool article penned by your mate Rob Farley, so it was time well spent.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 6:53 am
Paul White (6/28/2015)
...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...
No kidding. Tuning a query early last week I came across this:
== SLOW QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 452,015 ms, elapsed time = 214,388 ms.
== FAST QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 86,157 ms, elapsed time = 26,357 ms.
===========================================================
You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 6:54 am
IdRatherCodeIt (6/29/2015)
I used both the CTE and FOR XML to retrieve a grouped list of about 938 rows from a table with 3,458 records as the source (filtered for my use case). The CTE took 4:02 minutes and dropped any NULL values in the concatenated value due to the aggregation.
The FOR XML version took 0.26 seconds and kept all my NULL values.
--pete
Can you share your code, Pete?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 7:30 am
ChrisM@Work (6/30/2015)
Paul White (6/28/2015)
...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...
No kidding. Tuning a query early last week I came across this:
== SLOW QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 452,015 ms, elapsed time = 214,388 ms.
== FAST QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 86,157 ms, elapsed time = 26,357 ms.
===========================================================
You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.
Agreed, but (and with the understanding that I've not yet seen the execution plan) that has all the smackings of an index seek occurring 4.6 million times, which might be another area of optimization that could make yet another significant reduction.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 30, 2015 at 8:31 am
Jeff Moden (6/30/2015)
ChrisM@Work (6/30/2015)
Paul White (6/28/2015)
...All this frequently confuses DBAs and developers who are used to tuning performance based solely on logical reads (not a school of thought I subscribe to by the way). ...
No kidding. Tuning a query early last week I came across this:
== SLOW QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 5, logical reads 11,408, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 39,000, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 124,802, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 452,015 ms, elapsed time = 214,388 ms.
== FAST QUERY =========================================================
Table 'tbl_OrdersDataRetrievedFromFact_ForMailing'. Scan count 4,647,316, logical reads 14,866,827, ….
Table 'tbl_B2_CustomerID_Latest_LOOKUP'. Scan count 5, logical reads 38,956, ….
Table 'tbl_B2_CustomerMailing_NKY'. Scan count 5, logical reads 123,202, ….
(2429008 row(s) affected)
SQL Server Execution Times:
CPU time = 86,157 ms, elapsed time = 26,357 ms.
===========================================================
You cannot rely solely on logical reads - and neither of the queries uses a spool. I can share the execution plans privately if anyone's interested. The plan shapes have changed slightly due to index changes so the differences are less drastic but the principle remains.
Agreed, but (and with the understanding that I've not yet seen the execution plan) that has all the smackings of an index seek occurring 4.6 million times, which might be another area of optimization that could make yet another significant reduction.
You're absolutely right - the fast plan has a ton of seeks which the slow plan doesn't have, but the old plan carries 55 million rows almost all the way through whereas the fast plan knocks the rowcount down to about 4.6 million very quickly. If this has whetted your appetite, PM me with an email address and I'll whizz them over to you. They're not really that interesting other than mocking excessive reliance on logical reads.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 9:47 am
Thought it might be worth adding some bulk test data:
-- You'll need Jeff Moden's Tally table for this http://www.sqlservercentral.com/articles/T-SQL/62867/
TRUNCATE TABLE [dbo].[OfficeCounty]
DELETE [dbo].[County]
DELETE [dbo].[Office]
-- Generate 26*26*10 counties (10 counties in each of 676 states)
INSERT [dbo].[County] ([CountyName], [StateAbbr])
SELECT 'Area ' + CONVERT(varchar,T3.N) + ' ' + CHAR(T1.N) + CHAR(T2.N) + ' County, ' + CHAR(T1.N) + CHAR(T2.N) [CountyName],
CHAR(T1.N) + CHAR(T2.N) [StateAbbr]
FROM dbo.Tally T1
INNER JOIN dbo.Tally T2
ON T2.N BETWEEN ASCII('A') AND ASCII('Z')
INNER JOIN dbo.Tally T3
ON T3.N BETWEEN 1 AND 10
WHERE T1.N BETWEEN ASCII('A') AND ASCII('Z')
ORDER BY 1
-- Generate 1 office in each of 676 states
INSERT [dbo].[Office] ([OfficeName], [Address1], [Address2])
SELECT CHAR(T1.N) + CHAR(T2.N) + ' Office' AS OfficeName,
CONVERT(varchar, T1.N - ASCII('A') + 1) + ' ' + CHAR(T1.N) + ' Avenue, ' + CHAR(T1.N) + CHAR(T2.N) AS Address1,
'State ' + CHAR(T1.N) + CHAR(T2.N) AS Address2
FROM dbo.Tally T1
INNER JOIN dbo.Tally T2
ON T2.N BETWEEN ASCII('A') AND ASCII('Z')
WHERE T1.N BETWEEN ASCII('A') AND ASCII('Z')
-- Populate [OfficeCounty]
INSERT INTO [dbo].[OfficeCounty]
([OfficeID]
,[CountyID])
SELECT O.OfficeId,
C.CountyId
FROM Office O
INNER JOIN County C
ON C.StateAbbr = RIGHT(O.Address1, 2)
ORDER BY 1,2
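For anyone wanting to run this from scratch, here is a hypothetical sketch of the three tables the script above assumes - the column names are taken from the INSERTs and queries, but the data types and constraints are guesses rather than the article's actual DDL:
CREATE TABLE dbo.Office
(
    OfficeID   int IDENTITY(1,1) PRIMARY KEY,
    OfficeName varchar(100) NOT NULL,
    Address1   varchar(200) NOT NULL,
    Address2   varchar(200) NULL
);

CREATE TABLE dbo.County
(
    CountyID   int IDENTITY(1,1) PRIMARY KEY,
    CountyName varchar(100) NOT NULL,
    StateAbbr  varchar(5)   NOT NULL
);

CREATE TABLE dbo.OfficeCounty
(
    OfficeID int NOT NULL REFERENCES dbo.Office (OfficeID),
    CountyID int NOT NULL REFERENCES dbo.County (CountyID),
    PRIMARY KEY (OfficeID, CountyID)
);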
I then did some tests on the following queries:
SET STATISTICS TIME, IO ON
GO
-- ***************************************************************
-- TEST 1
-- Query from article http://www.sqlservercentral.com/articles/Recursive+CTE/99294/
-- ***************************************************************
--source CTE:
WITH office_county_state AS (
SELECT
o.OfficeID,
c.CountyName,
c.StateAbbr,
ROW_NUMBER() OVER (PARTITION BY o.OfficeID,c.StateAbbr ORDER BY CountyName) AS rank_county_name,
COUNT(CountyName) OVER (PARTITION BY o.OfficeID,c.StateAbbr) AS max_rank_county_name
FROM dbo.Office o
INNER JOIN dbo.OfficeCounty oc
ON o.OfficeID=oc.OfficeID
INNER JOIN dbo.County c
ON c.CountyId=oc.CountyID
)
--recursive CTE:
,recurs_CTE AS
(
--anchor query:
SELECT
OfficeID,
StateAbbr,
rank_county_name,
max_rank_county_name,
CAST(CountyName AS VARCHAR(8000)) AS countynames
FROM office_county_state
WHERE rank_county_name=1
UNION ALL
--recursive member:
SELECT
d.officeID,
d.StateAbbr,
d.rank_county_name,
d.max_rank_county_name,
CAST(recurs_CTE.countynames + ',' + d.CountyName AS VARCHAR(8000))
FROM recurs_CTE
INNER JOIN office_county_state d
ON d.OfficeID=recurs_CTE.OfficeID
AND d.StateAbbr=recurs_CTE.StateAbbr
WHERE d.rank_county_name=1+recurs_CTE.rank_county_name
)
SELECT recurs_CTE.OfficeID,
recurs_CTE.StateAbbr,
recurs_CTE.countynames
FROM recurs_CTE
-- we filter the resultset from the recursive CTE to get the longest list of countyName
WHERE recurs_CTE.rank_county_name=recurs_CTE.max_rank_county_name
GO
-- ***************************************************************
-- TEST 2
-- Using FOR XML with CROSS APPLY
-- ***************************************************************
SELECT OfficeID,
S.StateAbbr,
C.countynames
FROM dbo.Office O
CROSS APPLY (SELECT TOP(1)
C.StateAbbr
FROM dbo.County C
INNER JOIN dbo.OfficeCounty OC
ON OC.CountyID = C.CountyId
AND OC.OfficeID = O.OfficeID
) AS S
CROSS APPLY (SELECT STUFF((SELECT ', ' + CountyName
FROM dbo.County C
INNER JOIN dbo.OfficeCounty OC
ON OC.CountyID = C.CountyId
AND OC.OfficeID = O.OfficeID
ORDER BY CountyName
FOR XML PATH(''))
,1,2,'') AS countynames
) AS C
ORDER BY 2
GO
-- ***************************************************************
-- TEST 3
-- Using a cursor and temporary table
-- ***************************************************************
DECLARE myCursor cursor LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY
FOR SELECT OfficeId
FROM dbo.Office
DECLARE @OfficeId int, @CountyNames varchar(3000), @StateAbbr varchar(5)
-- Create an empty #Results table with matching column types
SELECT @OfficeId OfficeId, @StateAbbr StateAbbr, @CountyNames CountyNames
INTO #Results WHERE 1=0
OPEN myCursor
FETCH NEXT FROM myCursor INTO @OfficeId
WHILE @@FETCH_STATUS = 0 BEGIN
SELECT @CountyNames = '' -- Initialise
SELECT @CountyNames = @CountyNames + ', ' + C.CountyName,
@StateAbbr = C.StateAbbr
FROM dbo.[County] C
INNER JOIN dbo.[OfficeCounty] OC
ON OC.CountyId= C.CountyId
AND OC.OfficeID = @OfficeId
ORDER BY C.CountyName ASC
INSERT INTO #RESULTS
SELECT @OfficeId, @StateAbbr, STUFF(@CountyNames, 1, 2, '')
FETCH NEXT FROM myCursor INTO @OfficeId
END
CLOSE myCursor
DEALLOCATE myCursor
SELECT * FROM #RESULTS ORDER BY 2
DROP TABLE #RESULTS
GO
Results for the first two queries:
-- *************************
-- TEST 1
-- *************************
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 84 ms, elapsed time = 84 ms.
(676 row(s) affected)
Table 'Worktable'. Scan count 6767, logical reads 109756814, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'County'. Scan count 6761, logical reads 236635, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'OfficeCounty'. Scan count 6761, logical reads 87893, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Office'. Scan count 6761, logical reads 47327, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 492922 ms, elapsed time = 499470 ms.
-- *************************
-- TEST 2
-- *************************
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
(676 row(s) affected)
Table 'County'. Scan count 0, logical reads 14872, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1352, logical reads 29946, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'OfficeCounty'. Scan count 2, logical reads 26, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Office'. Scan count 1, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 93 ms, elapsed time = 254 ms.
The rCTE really does not perform well at all with volume data: 500 seconds vs 0.254 seconds, which is roughly 2,000 times slower.
June 30, 2015 at 9:50 am
ChrisM@Work (6/30/2015)
If you remember a year or four back, when quite a few of us were messing around with rCTEs and figuring out what they were good for, I remember trying to put a ROW_NUMBER on the recursive leg of the rCTE and always getting 1 as the result, even if there were a bunch of rows from the last set matching the current set.
If you mean what I think you mean, this is intended (though counter-intuitive!) behaviour; see this Connect item.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
June 30, 2015 at 9:54 am
Paul White (6/30/2015)
ChrisM@Work (6/30/2015)
If you remember a year or four back, when quite a few of us were messing around with rCTEs and figuring out what they were good for, I remember trying to put a ROW_NUMBER on the recursive leg of the rCTE and always getting 1 as the result, even if there were a bunch of rows from the last set matching the current set.
If you mean what I think you mean, this is intended (though counter-intuitive!) behaviour; see this Connect item.
Yes! But my explanation is so much better than yours :p
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 30, 2015 at 10:19 am
ChrisM@Work (6/30/2015)
Yes! But my explanation is so much better than yours :p
Nothing changes 🙂
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi