the use of CTEs

  • Oh, one more note. In between all the getting carried away with details and terminology and everyone being right and stuff, I forgot to add my two cents about the original question, which (if I remember correctly) was about over-use of CTEs.

    My take on that is that any feature in T-SQL, or in fact any feature in any language, should be used where useful and not used where not. Overuse of any feature is always bad.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Luis Cazares (1/6/2016)


    I'll just want some help from those who prefer subqueries over CTEs to transform one of our favorites functions.

    Here you go:

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8K_NoCTE]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY cteLen.N1),

    Item = SUBSTRING(@pString, cteLen.N1, cteLen.L1)

    FROM(--==== Return start and length (for use in substring)

    SELECT cteStart.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,cteStart.N1),0)-cteStart.N1,8000)

    FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM [dbo].[TallyTable] t

    WHERE t.N <= DATALENGTH(@pString)

    AND SUBSTRING(@pString,t.N,1) = @pDelimiter

    ) cteStart(N1)

    ) cteLen(N1,L1)

    GO

    I kept all the names for the subqueries to make it easier to identify the parts.

    _____________
    Code for TallyGenerator

  • Sergiy (1/6/2016)


    Luis Cazares (1/6/2016)


    I'll just want some help from those who prefer subqueries over CTEs to transform one of our favorites functions.

    Here you go:

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8K_NoCTE]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY cteLen.N1),

    Item = SUBSTRING(@pString, cteLen.N1, cteLen.L1)

    FROM(--==== Return start and length (for use in substring)

    SELECT cteStart.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,cteStart.N1),0)-cteStart.N1,8000)

    FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM [dbo].[TallyTable] t

    WHERE t.N <= DATALENGTH(@pString)

    AND SUBSTRING(@pString,t.N,1) = @pDelimiter

    ) cteStart(N1)

    ) cteLen(N1,L1)

    GO

    I kept all the names for the subqueries to make it easier to identify the parts.

    Unfortunately you failed to recreate the function as originally coded. You are relying on the existence of a tally table in the database. The original DelimitedSplit8K is not dependent on the existence of a tally table. And before you go off and say that one should exist in every database, there have been numerous individuals posting on ssc that have said that they can't create a tally table due to corporate or contractual reasons.

  • Lynn Pettis (1/6/2016)


    Unfortunately you failed to recreate the function as originally coded. You are relying on the existence of a tally table in the database. The original DelimitedSplit8K is not dependent on the existence of a tally table. And before you go off and say that one should exist in every database, there have been numerous individuals posting on ssc that have said that they can't create a tally table due to corporate or contractual reasons.

    Oh, c'mon!

    Everyone knows how to create a Tally table on fly without CTE:

    SELECT TOP 1000000 N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3

    😎

    _____________
    Code for TallyGenerator

  • Sergiy (1/6/2016)


    Luis Cazares (1/6/2016)


    I'll just want some help from those who prefer subqueries over CTEs to transform one of our favorites functions.

    You know, people been splitting delimited lists for many years in SQL2000 and earlier versions.

    You may find it hard to believe - but I may assure you - it's true. 🙂

    I did it myself.

    Some of those split functions I created back then must be running in Prod systems till now.

    :w00t:

    Edit: Yes, Tally was a static table back then.

    I'm not sure it's a bad thing.

    I'm sure people been splitting delimited lists for many years and many continue to do it without the help of tally tables. The problem is that those options (when coded only in T-SQL) and the ones you posted won't perform as well as the tested version of the DelimitedSplit8k.

    The point of posting the function is to show that there are places where a CTE will be clearly more readable than a subquery. That might not be the case every time and people have preferences (such as using leading or trailing commas) which might not change.

    A good developer/DBA should know when to use the best tool for the job and when two or more different tools can the best option. That same person should be able to prove that the chosen tool is actually the best, and not because someone said/wrote it.

    If it wasn't clear before, my point is that the over-use is bad, but not using them is bad as well. Or basically what Hugo said:

    My take on that is that any feature in T-SQL, or in fact any feature in any language, should be used where useful and not used where not. Overuse of any feature is always bad.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sergiy (1/6/2016)


    Lynn Pettis (1/6/2016)


    Unfortunately you failed to recreate the function as originally coded. You are relying on the existence of a tally table in the database. The original DelimitedSplit8K is not dependent on the existence of a tally table. And before you go off and say that one should exist in every database, there have been numerous individuals posting on ssc that have said that they can't create a tally table due to corporate or contractual reasons.

    Oh, c'mon!

    Everyone knows how to create a Tally table on fly without CTE:

    SELECT TOP 1000000 N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3

    😎

    Really, again? Maybe you should read some of Jeff's stuff. He did some testing that shows there is a tipping point when a dynamic tally table becomes more efficient that reading tables from disk.

    I'm sure you also take issue with Itzek Ben-Gan and his code as well then.

  • Sergiy (1/6/2016)


    Lynn Pettis (1/6/2016)


    Unfortunately you failed to recreate the function as originally coded. You are relying on the existence of a tally table in the database. The original DelimitedSplit8K is not dependent on the existence of a tally table. And before you go off and say that one should exist in every database, there have been numerous individuals posting on ssc that have said that they can't create a tally table due to corporate or contractual reasons.

    Oh, c'mon!

    Everyone knows how to create a Tally table on fly without CTE:

    SELECT TOP 1000000 N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3

    😎

    Congratulations, you did it! Now try it without reading from disk or the buffer.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • <Removed> got hit by the Reply bug

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Sergiy (1/6/2016)


    Lynn Pettis (1/6/2016)


    Unfortunately you failed to recreate the function as originally coded. You are relying on the existence of a tally table in the database. The original DelimitedSplit8K is not dependent on the existence of a tally table. And before you go off and say that one should exist in every database, there have been numerous individuals posting on ssc that have said that they can't create a tally table due to corporate or contractual reasons.

    Oh, c'mon!

    Everyone knows how to create a Tally table on fly without CTE:

    SELECT TOP 1000000 N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3

    😎

    Oh come on really? Did you check the query plan?

    Next time consider OVER (ORDER BY (SELECT NULL)) to avoid that sort.

    You can't use that tally table in a function with SCHEMABINDING either. 😉

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Complete version, including non-CTE on-fly Tally table:

    DECLARE @pString VARCHAR(8000), @pDelimiter CHAR(1)

    SET @pString = 'aaaa,,ccc,d,'

    SET @pDelimiter = ','

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY cteLen.N1),

    Item = SUBSTRING(@pString, cteLen.N1, cteLen.L1)

    FROM(--==== Return start and length (for use in substring)

    SELECT cteStart.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,cteStart.N1),0)-cteStart.N1,8000)

    FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM (

    SELECT TOP (DATALENGTH(@pString))

    N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3) t

    WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ) cteStart(N1)

    ) cteLen(N1,L1)

    _____________
    Code for TallyGenerator

  • Sergiy (1/6/2016)


    Complete version, including non-CTE on-fly Tally table:

    DECLARE @pString VARCHAR(8000), @pDelimiter CHAR(1)

    SET @pString = 'aaaa,,ccc,d,'

    SET @pDelimiter = ','

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY cteLen.N1),

    Item = SUBSTRING(@pString, cteLen.N1, cteLen.L1)

    FROM(--==== Return start and length (for use in substring)

    SELECT cteStart.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,cteStart.N1),0)-cteStart.N1,8000)

    FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM (

    SELECT TOP (DATALENGTH(@pString))

    N = ROW_NUMBER() OVER(ORDER BY C1.id)

    FROM syscolumns c1, syscolumns c2, syscolumns c3) t

    WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ) cteStart(N1)

    ) cteLen(N1,L1)

    Did you test this before posting it? This is officially the slowest splitter on the internet.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Luis Cazares (1/6/2016)


    Been following this discussion, but I didn't feel like getting in it.

    I believe that over use of CTEs is as bad as the lack of CTEs. I'd rather hire a developer that knows how to use CTEs than one that doesn't, just because he seems more up to date and willing to learn (They've been available for over 10 years now).

    I'll just want some help from those who prefer subqueries over CTEs to transform one of our favorites functions.

    CREATE FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS ( --==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS ( --==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1) AS( --==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l

    ;

    And last, I'm sure that the ones that over use CTEs, would certainly code like this in previous versions:

    SELECT *

    FROM (

    SELECT column1, column2, column3

    FROM MyTable

    ) t

    And yes, I've seen code like that.

    I don't dislike CTEs by any stretch of the imagination... But a challenge is a challenge... So just for the fun of it...

    CREATE FUNCTION dbo.DelimitedSplit8K_DTs

    (

    @pString VARCHAR(8000),

    @pDelimiter CHAR(1)

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    SELECT

    ItemNumber = ROW_NUMBER() OVER (ORDER BY dtLen.N1),

    Item = SUBSTRING(@pString, dtLen.N1, dtLen.L1)

    FROM (

    SELECT

    dtStart.N1,

    L1 = ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, dtStart.N1),0) - dtStart.N1,8000)

    FROM (

    SELECT N1 = 1 UNION ALL

    SELECT Tally.N + 1

    FROM (

    SELECT TOP (ISNULL(LEN(@pString), 0))

    N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) E1 (n)

    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) E2 (n)

    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) E3 (n)

    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) E4 (n)

    ) Tally

    WHERE

    SUBSTRING(@pString, Tally.N, 1) = @pDelimiter

    ) dtStart

    ) dtLen;

    GO

  • ALTER FUNCTION [dbo].[DelimitedSplit8K_NoCTE]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY cteLen.N1),

    Item = SUBSTRING(@pString, cteLen.N1, cteLen.L1)

    FROM(--==== Return start and length (for use in substring)

    SELECT cteStart.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,cteStart.N1),0)-cteStart.N1,8000)

    FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1

    FROM (

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM (SELECT 1

    FROM (SELECT 1 FROM (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ) a (N) , (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ) b (N)

    ) a (N),

    (SELECT 1 FROM (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ) a (N), (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ) b (N)

    ) b (N)

    ) E4 (N)

    ) t (N)

    WHERE t.N <= DATALENGTH(@pString)

    AND SUBSTRING(@pString,t.N,1) = @pDelimiter

    ) cteStart(N1)

    ) cteLen(N1,L1)

    GO

    Performance is identical to CTE version.

    _____________
    Code for TallyGenerator

  • But you were 11 mins too slow... 😛

  • Eh, I decided to join in the fun and got beat to it (I realized what I posted earlier had CTEs in it)

    Using delimitedsplit8K logic... here's one that's not limited to a char(1) delimiter:

    USE tempdb

    GO

    DECLARE

    @string varchar(8000) = 'xxx<br/>ryyy<br/>rzzz',

    @delimiter varchar(100) = '<br/>';

    SELECT Item = SUBSTRING(@string, l.N1, l.L1)

    FROM

    (

    SELECT s.N1, L1 = ISNULL(NULLIF(CHARINDEX(@delimiter,@string,s.N1),0)-s.N1,8000)

    FROM

    (

    SELECT N1 = 1 UNION ALL

    SELECT t.N+LEN(@delimiter)

    FROM

    (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM

    (

    SELECT 1 FROM (VALUES

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N) -- 90 values

    ) a(x),

    (

    SELECT 1 FROM (VALUES

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),

    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(N) -- 90 values

    ) b(x) -- 8100

    ) t(N)

    WHERE SUBSTRING(@string,t.N,LEN(NULLIF(@delimiter,''))) = @delimiter

    ) s

    ) l;

    GO

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 15 posts - 91 through 105 (of 161 total)

You must be logged in to reply to this topic. Login to reply