Using SELECT INTO a bad practice?

  • allmhuran (6/12/2011)


    Interesting. Ever discovered a pattern for when they're fast or slow? I pretty much abandoned them a while ago, figuring it was just asking a bit too much of the engine. I still use CTEs, but limit them to situations where I'm using what would otherwise be the same subquery 2 or more times in a statement, or for recursive selects.

    If there's a pattern to it, it escapes me. I can show you one pattern that will blow the doors off of just about everything else... I just got done adding this piece of code on an article about using Recursive CTE's for a counting mechanism. Rest assured, the following is NOT a Recursive CTE but it does contain "cascaded CTE's" as you describe...

    --=======================================================================================

    -- Demonstrate the Itzek-Style CROSS JOIN method.

    -- The code is a bit complex but you could turn the CTE's into an iTVF

    -- ("iTVF" = "Inline Table Valued Function")

    --=======================================================================================

    WITH --===== Itzek-Style CROSS JOIN counts from 1 to the number of days needed

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), -- 1*10^1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 1*10^2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 1*10^4 or 10,000 rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), -- 1*10^8 or 100,000,000 rows

    cteTally(N) AS (SELECT TOP (@Days) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)

    SELECT WholeDate = DATEADD(dd,N-1,@StartYear),

    NextDate = DATEADD(dd,N ,@StartYear)

    FROM cteTally

    ;

    Here's another example of cascaded CTE's that smoke just about everything else for a seemingly common problem. It's code to solve the age old quandry about how to make large numbers of constrained unique random numbers for membership cards or whatever. In not only has multiple cascaded CTE's, it even updates the table through one of the cascaded CTE's (which I'd wager that a whole lot of people didn't know could be done).

    --=====================================================================================================================

    -- Create a table of test data. Nothing in this section is a part of the solution.

    -- We're just creating test data so we have something to test for performance.

    -- This just takes a second or two on my 9 year old, single CPU desktop box.

    --=====================================================================================================================

    --===== Do this testing in a nice, safe place that everyone has.

    USE tempdb

    ;

    --===== Conditionally drop the test table to make reruns in SSMS easier

    IF OBJECT_ID('tempdb..#YourTable','U') IS NOT NULL

    DROP TABLE #YourTable

    ;

    --===== Create a million row test table and populate it on the fly

    SELECT TOP 1000000

    ID = IDENTITY(INT,1,1),

    Number = ABS(CHECKSUM(NEWID()))%4000000,

    EntryText = REPLACE(LEFT(NEWID(),12+ABS(CHECKSUM(NEWID()))%12),'-',''),

    MembershipNumber = CAST(NULL AS CHAR(10))

    INTO #YourTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --===== Now, delete all but 200,000 rows in random order so we have gaps in the ID column

    WITH

    cteDeleteableCTE AS

    (

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY NEWID())

    FROM #YourTable

    )

    DELETE FROM cteDeleteableCTE

    WHERE RowNum <= 800000

    ;

    --===== Add the clustered PK to the test table.

    -- We won't name it because it's a temp table.

    ALTER TABLE #YourTable

    ADD PRIMARY KEY CLUSTERED (ID)

    ;

    --=====================================================================================================================

    -- The only way to ensure that you have unique random numbers in a set-based

    -- fashion is to generate more than you need and use only the ones that

    -- haven't been duplicated. Then we need to add a row number to each random

    -- number randomly and update the final table with the random numbers. In

    -- order to pull off the UPDATE, we need something to join on like a sequential

    -- number. Since the ID may not be sequential or may not even start at 1, we

    -- have to generate

    --=====================================================================================================================

    WITH

    --====== E1 through E8 provides a "Pseudo Cursor" of up to 100 Million rows

    E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1

    ), --1*10^1 = 10 Rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --1*10^2 = 100 Rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --1*10^4 = 10,000 Rows

    E8(N) AS (SELECT 1 FROM E4 a, E4 b), --1*10^8 = 100,000,000 Rows

    cteGenRandom10DigitNums(Random10DigitNum) AS

    (--==== Create .1% more random numbers than we need

    SELECT TOP (CAST((SELECT COUNT(ID)*1.001 FROM #YourTable) AS INT))

    Random10DigitNum = CAST(CAST(NEWID() AS BINARY(16)) AS BIGINT)

    FROM E8

    )

    ,

    cteDedupeAndEnumerate AS

    (--==== Use only the random numbers that have no dupes and give them a row number

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY NEWID()),

    Random10DigitNum

    FROM cteGenRandom10DigitNums

    GROUP BY Random10DigitNum

    HAVING COUNT(*) = 1

    ) ,

    cteUpdateableCTE AS

    (--==== Assign row numbers to each row in your table without changing the table

    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY ID),

    MembershipNumber

    FROM #YourTable

    )--==== Update the table with unique random numbers through the updateable cte.

    UPDATE ucte

    SET MembershipNumber = enum.Random10DigitNum

    FROM cteUpdateableCTE ucte

    inner JOIN #MyHead enum

    ON ucte.RowNum = enum.RowNum

    ;

    --=====================================================================================================================

    -- Testing: Show there are no duplicate values that have been created.

    -- No rows will be returned which means there are no duplicates.

    -- Again, this is not a part of the solution from above. It's just a test.

    --=====================================================================================================================

    SELECT MembershipNumber, Dupes = COUNT(*)

    FROM #YourTable

    GROUP BY MembershipNumber

    HAVING COUNT(*) > 1

    ;

    --=====================================================================================================================

    -- Testing: Show the "random order" in relation to the ID column.

    -- Again, this is not a part of the solution from above. It's just a test.

    --=====================================================================================================================

    SELECT *

    FROM #YourTable

    ORDER BY ID

    ;

    --=====================================================================================================================

    -- Testing: Show that all rows have a 10 digit number

    --=====================================================================================================================

    SELECT COUNT(*)

    FROM #YourTable

    WHERE LEN(MembershipNumber) <> 10

    ;

    Like I ssaid before, I've also seen it where cascaded CTE's will suck the life out of a server because it goes bonkers for one reason or another (especially if there's a datatype mismatch... I guess that qualifies as the start of a "pattern").

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • allmhuran (6/12/2011)


    Interesting. Ever discovered a pattern for when they're fast or slow? I pretty much abandoned them a while ago, figuring it was just asking a bit too much of the engine. I still use CTEs, but limit them to situations where I'm using what would otherwise be the same subquery 2 or more times in a statement, or for recursive selects.

    Ah... One more thing. Be real careful about what you do with Recursive CTE's which is nothing more than Hidden RBAR (the code I posted is a part of an article I'm doing on Recursive CTE's as (bad) counters. Here's a chart from that article... the painfully obvious loser in that chart is a Recursive CTE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The performance of the CTEs there isn't surprising to me given that there isn't much "work" for the engine to do. There's no statistics or indexes to play with, no keys, constraints, etc. There's few tables involved and all it's really doing is a cartesian product (cleverly done though, I've seen and used this before for generating large numbers of rows). The times I've found the cascaded (good word) CTEs to be painful is when you're doing it against a "real" schema, often with tables that are common to two or more CTEs, such that the engine gets confused or overwhelmed. To use your phrase, I suppose that's also a "pattern", so I guess I was hoping you might have seen a pattern in the pattern, so to speak.

    I stay clear of deep CTE usage in general, but I will use a recursive CTE when I consider the clarity benefit to be worth the cost. That is to say, when the cost is minimal: simple tree-walking stuff with perhaps two tables involved to define the hierarchy and a limited depth. I didn't know the performance diverged as much as your graphs indicate though, that's quite enlightening.

  • Jeff Moden (6/12/2011)


    allmhuran (6/12/2011)


    Interesting. Ever discovered a pattern for when they're fast or slow? I pretty much abandoned them a while ago, figuring it was just asking a bit too much of the engine. I still use CTEs, but limit them to situations where I'm using what would otherwise be the same subquery 2 or more times in a statement, or for recursive selects.

    If there's a pattern to it, it escapes me. I can show you one pattern that will blow the doors off of just about everything else... I just got done adding this piece of code on an article about using Recursive CTE's for a counting mechanism. Rest assured, the following is NOT a Recursive CTE but it does contain "cascaded CTE's" as you describe...

    Jeff,

    If I get a vote I call them piggy-backed CTE's. Cascaded makes me think of a dish washer.

    Todd Fifield

  • Sheesh! Once again, I am chastened by the "Everything you think you know is wrong!" fairy. I started off writing big CTE expressions, and they crashed under their own weight. I swore off doing SELECT ... INTO statements, because of locking issues, which no longer exist.

    Is nothing sacred anymore?!?

    Kevin

    --
    Please upgrade to .sig 2.0

  • knechod (6/13/2011)


    Sheesh! Once again, I am chastened by the "Everything you think you know is wrong!" fairy. I started off writing big CTE expressions, and they crashed under their own weight. I swore off doing SELECT ... INTO statements, because of locking issues, which no longer exist.

    Is nothing sacred anymore?!?

    Kevin

    The only thing sacred in SQL Server is the simple belief that under just about any circumstance: "It Depends!"


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • tfifield (6/13/2011)


    Jeff Moden (6/12/2011)


    allmhuran (6/12/2011)


    Interesting. Ever discovered a pattern for when they're fast or slow? I pretty much abandoned them a while ago, figuring it was just asking a bit too much of the engine. I still use CTEs, but limit them to situations where I'm using what would otherwise be the same subquery 2 or more times in a statement, or for recursive selects.

    If there's a pattern to it, it escapes me. I can show you one pattern that will blow the doors off of just about everything else... I just got done adding this piece of code on an article about using Recursive CTE's for a counting mechanism. Rest assured, the following is NOT a Recursive CTE but it does contain "cascaded CTE's" as you describe...

    Jeff,

    If I get a vote I call them piggy-backed CTE's. Cascaded makes me think of a dish washer.

    Todd Fifield

    Heh... used in the correct hardware environment, they WILL do your dishes. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply