March 5, 2010 at 10:32 am
Jeff Moden (3/4/2010)
This is the link John tried to post...http://www.imdb.com/title/tt0151804/
It's a cult classic, totally silly, and (unfortunately) sometimes truer than life in a cube. π
and of course the main plot element is a scheme to skim fractions of pennies from interest rounding.
October 29, 2010 at 10:04 am
Paul White NZ (2/27/2010)
...recursive CTEs can be fast, but it is relative. I sometimes use them to seek down the distinct keys of a large index rather than scanning the whole thing and relying on an aggregate, for example...Paul
I'd really like to see an example of that - in the meantime, here's another simple and obvious use of rCTE's. No timings because I've not yet had the opportunity.
-- Rollup rows, concatenating row values into a new column
DROP TABLE #Test
CREATE TABLE #Test (TestID INT IDENTITY (1,1), Section INT, Word VARCHAR(20))
INSERT INTO #Test (Section, Word) VALUES
(1, 'A'),(1, 'day'), (1, 'in'),(1, 'the'),(1, 'life'),(1, 'of'),(1, 'Ivan'),(1, 'Denisovich'),
(2, 'Silent'),(2, 'Spring')
;WITH PreparedData AS (
SELECT *,
SectionID = ROW_NUMBER() OVER(PARTITION BY Section ORDER BY TestID), -- sets order of words in sentence
SectionSize = COUNT(*) OVER(PARTITION BY Section) -- used to detect last word of sentence
FROM #Test),
Concatenator AS (
SELECT TestID, Section, Word, -- source columns
SectionID, SectionSize, Sentence = CAST(Word AS VARCHAR(100)) -- "working" columns
FROM PreparedData
WHERE TestID = 1
UNION ALL
SELECT
t.TestID, t.Section, t.Word,
t.SectionID, t.SectionSize, Sentence = CAST(CASE WHEN t.SectionID = 1 THEN ISNULL(t.Word, '')
ELSE l.Sentence + ' ' + ISNULL(t.Word, '') END AS VARCHAR(100))
FROM PreparedData t
INNER JOIN Concatenator l ON l.TestID + 1 = t.TestID
)
SELECT
TestID,
Section,
Sentence
FROM Concatenator
WHERE SectionID = SectionSize
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 29, 2010 at 9:15 pm
Chris Morris-439714 (10/29/2010)
I'd really like to see an example of that
I'll post one in a minute.
...in the meantime, here's another simple and obvious use of rCTE's. No timings because I've not yet had the opportunity.
Table Spool mania! It doesn't perform well; I'd stick with the usual XML hack/solution:
SELECT Sections.Section,
Concatenated.sentence
FROM (
SELECT DISTINCT
Section
FROM #Test
) Sections
CROSS
APPLY (
SELECT SPACE(1) + Word
FROM #Test T
WHERE T.Section = Sections.Section
ORDER BY
T.TestID
FOR XML PATH (''),
TYPE
) Concatenator (xml_string)
CROSS
APPLY (
SELECT STUFF(Concatenator.xml_string.value('(./text())[1]', 'VARCHAR(MAX)'), 1, 1, SPACE(0))
) Concatenated (sentence);
October 29, 2010 at 9:54 pm
Super-fast DISTINCT using a recursive CTE:
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (5000000)
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 117329
FROM master.sys.columns C1,
master.sys.columns C2,
master.sys.columns C3;
GO
SET STATISTICS TIME ON;
-- 1591ms CPU
SELECT DISTINCT
data
FROM dbo.Test;
-- 15ms CPU
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT *
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
SET STATISTICS TIME OFF;
GO
DROP TABLE dbo.Test;
The recursive CTE is 100 times more efficient π
October 31, 2010 at 11:53 pm
Paul White NZ (10/29/2010)
Super-fast DISTINCT using a recursive CTE:
Absolutely awesome! What on earth made you look for an alternative to DISTINCT?
On my 8 year old 1.8GHz P4, here's what I get...
(43 row(s) affected)
SQL Server Execution Times:
CPU time = 2938 ms, elapsed time = 3052 ms.
(43 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 3 ms.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 1, 2010 at 12:06 am
Jeff Moden (10/31/2010)
Absolutely awesome! What on earth made you look for an alternative to DISTINCT?
Thanks. It's a fair while ago now, so I don't remember exactly what the circumstances were, but for sure there was a column with very many duplicate values and it just seemed dumb to me that the optimizer would choose to scan the whole index rather than trying something a little more creative. I think it was around the time I was writing my article on paging, so it might have been at least tangentially related to that.
November 1, 2010 at 6:54 am
Damn! There is a trade off. If you don't know there's going to be dupes or not, pray for dupes when you use it. π
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (1000000) --CUT DOWN TO A MILLION FOR THIS TEST
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) --NO DUPES
FROM master.sys.all_columns C1,
master.sys.all_columns C2
GO
------------------------------------------------------------------
DECLARE @Bitbucket INT --TO TAKE THE DISPLAY PROCESSING OUT OF THE PICTURE
SET STATISTICS TIME ON;
-- 1591ms CPU
SELECT DISTINCT
@Bitbucket = data
FROM dbo.Test;
-- 15ms CPU
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT @Bitbucket = data
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
SET STATISTICS TIME OFF;
GO
DROP TABLE dbo.Test;
Still, if you know you have a lot of dupes, this is the berries! π
--Jeff Moden
Change is inevitable... Change for the better is not.
November 1, 2010 at 7:13 am
Paul White NZ (10/29/2010)
...in the meantime, here's another simple and obvious use of rCTE's. No timings because I've not yet had the opportunity.[/quote-0]Table Spool mania! It doesn't perform well; I'd stick with the usual XML hack/solution:
Heh - until you wrap it up nice and warm:
-- make some simple sample data, 800,000 rows
DROP TABLE #Sections
SELECT TOP 100000 Section = ABS(CHECKSUM(NEWID()))
INTO #Sections
FROM dbo.syscolumns a, dbo.syscolumns b
DROP TABLE #Test
CREATE TABLE #Test (TestID INT IDENTITY (1,1), Section INT, Word VARCHAR(20))
INSERT INTO #Test (Section, Word)
SELECT s.Section, d1.Word
FROM #Sections s,
(
SELECT 'A' AS Word UNION ALL
SELECT 'day' UNION ALL
SELECT 'in' UNION ALL
SELECT 'the' UNION ALL
SELECT 'life' UNION ALL
SELECT 'of' UNION ALL
SELECT 'Ivan' UNION ALL
SELECT 'Denisovich'
) d1
ORDER BY s.Section
-- prepare the data for use - part of the solution
DROP TABLE #PreparedData
SELECT
TestID,
Section,
Word,
WordStatus = CAST(CASE WHEN SectionID = 1 THEN 1 WHEN SectionID = SectionSize THEN 9 ELSE 0 END AS TINYINT)
INTO #PreparedData
FROM (
SELECT *,
SectionID = ROW_NUMBER() OVER(PARTITION BY Section ORDER BY TestID), -- sets order of words in sentence
SectionSize = COUNT(*) OVER(PARTITION BY Section) -- used to detect last word of sentence
FROM #Test
) d
CREATE UNIQUE CLUSTERED INDEX CITestID ON #PreparedData ([TestID] ASC)
-- 6 seconds
-- consume the prepared data
;WITH Concatenator AS (
SELECT TestID, Section, Word, -- source columns
WordStatus, Sentence = CAST(Word AS VARCHAR(100)) -- "working" columns
FROM #PreparedData
WHERE TestID = 1
UNION ALL
SELECT
t.TestID, t.Section, t.Word,
t.WordStatus, Sentence = CAST(CASE WHEN t.WordStatus = 1 THEN ISNULL(t.Word, '')
ELSE l.Sentence + ' ' + ISNULL(t.Word, '') END AS VARCHAR(100))
FROM #PreparedData t
INNER JOIN Concatenator l ON l.TestID + 1 = t.TestID
)
SELECT
TestID,
Section,
Sentence
FROM Concatenator
WHERE WordStatus = 9
OPTION (MAXRECURSION 0)
-- 32 seconds: 100,000 (ish) rollup-up output rows
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 1, 2010 at 7:18 am
Jeff Moden (11/1/2010)
Damn! There is a trade off. If you don't know there's going to be dupes or not, pray for dupes when you use it.... Still, if you know you have a lot of dupes, this is the berries! π
Yes, absolutely.
November 1, 2010 at 7:18 am
Jeff Moden (11/1/2010)
Damn! There is a trade off. If you don't know there's going to be dupes or not, pray for dupes when you use it. π
USE tempdb;
GO
DROP TABLE dbo.Test;
GO
CREATE TABLE
dbo.Test
(
data INTEGER NOT NULL,
);
GO
CREATE CLUSTERED INDEX c ON dbo.Test (data);
GO
-- Lots of duplicated values
INSERT dbo.Test WITH (TABLOCK)
(data)
SELECT TOP (1000000) --CUT DOWN TO A MILLION FOR THIS TEST
ROW_NUMBER() OVER (ORDER BY (SELECT 0)) --NO DUPES
FROM master.sys.all_columns C1,
master.sys.all_columns C2
GO
------------------------------------------------------------------
DECLARE @Bitbucket INT --TO TAKE THE DISPLAY PROCESSING OUT OF THE PICTURE
SET STATISTICS TIME ON;
-- 1591ms CPU
SELECT DISTINCT
@Bitbucket = data
FROM dbo.Test;
-- 15ms CPU
WITH RecursiveCTE
AS (
SELECT data = MIN(T.data)
FROM dbo.Test T
UNION ALL
SELECT R.data
FROM (
-- A cunning way to use TOP in the recursive part of a CTE :)
SELECT T.data,
rn = ROW_NUMBER() OVER (ORDER BY T.data)
FROM dbo.Test T
JOIN RecursiveCTE R
ON R.data < T.data
) R
WHERE R.rn = 1
)
SELECT @Bitbucket = data
FROM RecursiveCTE
OPTION (MAXRECURSION 0);
SET STATISTICS TIME OFF;
GO
DROP TABLE dbo.Test;
Still, if you know you have a lot of dupes, this is the berries! π
No kidding. 100-fold improvement is astonishing.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 1, 2010 at 7:32 am
Chris Morris-439714 (11/1/2010)
Heh - until you wrap it up nice and warm...
That helps - as usual - but it's still a bit slower than the (unmodified) XML PATH solution I posted earlier.
My version also benefits from a suitable clustered index: CREATE UNIQUE CLUSTERED INDEX cuq ON #Test (Section, TestID). That makes it much faster, at least on my machine.
Both solutions suck a bit because they can only use a single thread - can you guess how I'd fix that? π
November 1, 2010 at 7:39 am
Paul White NZ (11/1/2010)
Chris Morris-439714 (11/1/2010)
Heh - until you wrap it up nice and warm...That helps - as usual - but it's still a bit slower than the (unmodified) XML PATH solution I posted earlier.
My version also benefits from a suitable clustered index: CREATE UNIQUE CLUSTERED INDEX cuq ON #Test (Section, TestID). That makes it much faster, at least on my machine.
Both solutions suck a bit because they can only use a single thread - can you guess how I'd fix that? π
Nah mate my walking-on-water skills are still underwhelming - don't even drink the stuff, fish f...never mind. Spill the beans! I'll try to understand, promise!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 1, 2010 at 7:44 am
Chris Morris-439714 (11/1/2010)
Nah mate my walking-on-water skills are still underwhelming - don't even drink the stuff, fish f...never mind. Spill the beans! I'll try to understand, promise!
SQLCLR, naturally - using Adam's Query Parallelizer π
November 1, 2010 at 7:47 am
Paul White NZ (11/1/2010)
Chris Morris-439714 (11/1/2010)
Nah mate my walking-on-water skills are still underwhelming - don't even drink the stuff, fish f...never mind. Spill the beans! I'll try to understand, promise!SQLCLR, naturally - using Adam's Query Parallelizer π
Ooh you cheat, that's not a fix that's a different method altogether π
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
November 1, 2010 at 7:59 am
Chris Morris-439714 (11/1/2010)
Ooh you cheat, that's not a fix that's a different method altogether π
Yes I do cheat. Proud of it! :laugh:
Viewing 15 posts - 46 through 60 (of 67 total)
You must be logged in to reply to this topic. Login to reply