Is using SELECT INTO a bad practice?

  • Hi everyone,

We have lots of reports which use SELECT INTO to insert data into temp tables. I was told that it's a bad practice. Could anyone please explain why SELECT INTO is a bad practice?

    Thanks,

    Sunny.

I wouldn't say bad practice; you'd have to review the logic to determine whether the data REALLY needs to go into a temp table (like it's being further processed or something). If there's no need for a temp table, then the argument might be made for calling it a bad practice.

I might say using a CTE is a better practice, if it's possible to use one instead.
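For example, where the intermediate result is only read once, a CTE keeps it inline instead of materializing it. This is a hedged sketch with hypothetical table and column names:

```sql
-- Instead of SELECT ... INTO #RecentOrders followed by a query against it:
WITH RecentOrders AS
(
    SELECT OrderID, CustomerID, OrderDate
    FROM   dbo.Orders   -- hypothetical table
    WHERE  OrderDate >= DATEADD(DAY, -30, GETDATE())
)
SELECT CustomerID, COUNT(*) AS OrderCount
FROM   RecentOrders
GROUP  BY CustomerID;
```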

    Lowell


--Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

I think I might go a little further than Lowell: I view it as, at minimum, a questionable practice. You are taking a lot on faith, like the table structure and the datatypes. I might use it when I am writing ad hoc queries and such, but I recommend against it in a sproc that is headed to production - along with not using SELECT * in basically all cases.

Basically, you should know the structure of the table you are inserting records into at design time, and hoping the source tables don't change is way too optimistic for me.

    CEWII

Select Into for temp tables used to be a bad practice in pre-SQL 2000 days because of schema locks in tempdb. The notion that it is a bad practice has a sort of vampiric resistance to being killed, and keeps resurrecting itself all over the place.

    If you use them correctly, by explicitly defining the columns that will go into them, it's not a problem.
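For instance (hypothetical names), listing the columns and CASTing expressions pins the datatypes of the resulting temp table instead of leaving them to whatever the expressions happen to return:

```sql
SELECT  o.OrderID,
        CAST(o.OrderTotal AS decimal(19,4)) AS OrderTotal,  -- type pinned explicitly
        CAST(c.City       AS varchar(60))   AS City
INTO    #OrderSummary
FROM    dbo.Orders    AS o          -- hypothetical tables
JOIN    dbo.Customers AS c
  ON    c.CustomerID = o.CustomerID;
```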

    It's just an old "best/worst practice" datum that refuses to die even though it's been untrue for over a decade.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Out of curiosity, would there be a performance impact from SQL having to get the underlying definition for the table (things like datatype, collation, etc.) vs. defining it outright in the proc, especially for a proc that executes very frequently?

Personally, I have always gone with scripting the entire table to make sure the proc breaks when the underlying table is changed by someone else, especially for tables with columns where implicit conversions are possible.

  • Here are the two options most commonly used:

Select <column list>
Into #MyTempTable
From MyTable;

create table #MyTempTable (column list and definitions);
Insert Into #MyTempTable (column list)
Select <column list>
From MyTable;

    In both cases, SQL Server has to obtain the same metadata about MyTable, whether that's a single table, a complex set of tables and views and UDFs joined together, or something in between.

    In the first instance, it checks during creation of the temp table. In the second, it checks during the insert, and errors out if there are conversion/truncation issues.

    I've seen and done speed tests that show the first has some advantages in most scenarios on performance.

The second has both advantages and disadvantages when it comes to data typing and sizing. Say, for example, you find that you need to expand a "City" column in an addresses table by a few characters because you're now including Welsh towns, and one of them is named "Llanfairpwllgwyngyllgogerychwyrndrobwyll-llantysiliogogogoch" (it's a real name of a real town), which won't quite fit in the varchar(50) you originally defined. Any Select Into statements will automatically pick up the increased size, but Create/Insert/Select won't, and you'll have to rewrite code to accommodate a data-definition change.
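To make the contrast concrete, a hedged sketch with hypothetical names:

```sql
-- Suppose the source column was widened:
-- ALTER TABLE dbo.Addresses ALTER COLUMN City varchar(60);

-- (1) Select Into picks up the new width automatically:
SELECT City
INTO   #A
FROM   dbo.Addresses;

-- (2) Create/Insert/Select still carries the old definition and will
--     raise a truncation error for the longer values until the code
--     is changed:
CREATE TABLE #B (City varchar(50));
INSERT INTO #B (City)
SELECT City FROM dbo.Addresses;
```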

    So, with Select Into, you usually get a slight speed increase (it's tiny but consistent in the tests I've done), and you get simplicity of managing data types/lengths.

If you use Create/Insert/Select, you get errors raised at insert time when incompatibilities are found. That can be an important point of data hygiene.

    But most of the time, Select Into will manage what you need, and if you have a specific data hygiene issue, you'll still get that later in the process, or can explicitly check for it by querying tempdb.sys.columns if you need to. I've never run into a situation yet where Select Into created that kind of problem, but that doesn't mean it can't happen, it just means I personally haven't run into it.

- Gus "GSquared"

  • CELKO (6/10/2011)


>> We have lots of reports which use SELECT INTO to insert data into temp tables. I was told that it's a bad practice. Could anyone please explain why SELECT INTO is a bad practice? <<

    You have to materialize it so it wasted time and disk space. You can use a derived table, a VIEW or a CTE which can be optimized.

It is also non-ANSI syntax. It should be what is called a singleton select if MSA was up to Standards.

    The reason that bad programmers use it is that it mimics mounting a scratch tape in a file system.

    Having said that, there might be a few cases where it is a good idea. I cannot think of any right now.

Once populated, the target table can certainly be optimized and will frequently provide much better performance than any of the 3 methods you mentioned. It's also much more efficient to populate a Temp Table once than it is to call the same CTE or View more than once, because the underlying code will be executed more than once. The reason that excellent programmers use it is to "Divide'n'Conquer" instead of having very large queries, which can be quite inefficient even when written correctly. And it doesn't matter if it doesn't meet ANSI syntax standards, because true code portability is a myth.

    "ScratchPad" technology isn't a bad thing. It was used to enhance speed way back when and, done properly, it still works very, very well today. Further, SQL Server does it all the time in the form of "WorkTables". You'd know that if you actually spent more time studying T-SQL than you do bad mouthing other people. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm not sure what all the concerns are about datatyping with SELECT/INTO because it will create the new table with the same datatypes and sizes as the source tables. It even transfers the IDENTITY and other properties if it's done within the same database. Datatype matching is one of the most important things to performance.
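A quick way to see this for yourself (hypothetical names; run within the same database):

```sql
CREATE TABLE dbo.SourceTable
(
    ID   int IDENTITY(1,1) NOT NULL,
    Name varchar(50)       NOT NULL
);

-- A simple single-table SELECT/INTO carries the IDENTITY property across:
SELECT ID, Name
INTO   #Copy
FROM   dbo.SourceTable;

-- Inspect the copied definition, including is_identity:
SELECT name, is_identity
FROM   tempdb.sys.columns
WHERE  object_id = OBJECT_ID('tempdb..#Copy');
```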

    It's also significantly faster, in most cases, than creating and populating a table in separate steps even if the SELECT/INTO doesn't happen to qualify as "minimally logged".

--Jeff Moden

I use SELECT * INTO from a VIEW a number of times to prep my data warehouse each morning. I whap an index on the tables, and it dramatically improves the ROLAP partition I have specifying today's data.

My view is complex, its query execution time is far too long, and this seems to be the best solution. If this is questionable practice, I'd like to know what a better one is; it seems perfectly adequate for the task of building a data warehouse.

  • It's bad if it's unnecessary for performance.

    A lot of the time when I see select into temp tables I also see cursors processing those temp tables. In other words it can be an indication of bad code. But if you've got some large, multi-step set based problem going on, temp tables will beat the pants off of multiple CTEs.
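For example (hypothetical names), materialising the expensive intermediate step once and indexing it lets several later steps reuse it, whereas a CTE referenced more than once would typically be re-evaluated each time:

```sql
SELECT CustomerID, SUM(Amount) AS Total
INTO   #CustomerTotals
FROM   dbo.Sales          -- hypothetical source table
GROUP  BY CustomerID;

CREATE CLUSTERED INDEX IX_CustomerTotals ON #CustomerTotals (CustomerID);

-- Later steps read the materialised result instead of re-running the aggregate:
SELECT CustomerID, Total
FROM   #CustomerTotals
WHERE  Total > 1000;
```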

  • allmhuran (6/12/2011)


    It's bad if it's unnecessary for performance.

    A lot of the time when I see select into temp tables I also see cursors processing those temp tables. In other words it can be an indication of bad code. But if you've got some large, multi-step set based problem going on, temp tables will beat the pants off of multiple CTEs.

    I've definitely seen that happen but I'll also say "It Depends" because I've seen the opposite be true, as well. 🙂

    I'll definitely agree that trying to replace cursors with Temp Tables and WHILE loops is usually a bad thing. It's like decaffeinated coffee to me... I call both a "Why Bother". 😛

--Jeff Moden

  • SQL seems to have trouble with constructs like...

with c1 as (...),
     c2 as (select ... from c1 ...),
     c3 as (select ... from c2 ...)

    ... etc. Fair enough too, I'd say.

    That's one situation where I'd be looking at temp tables (with the premise here being that there's no logic shortcuts available)

  • allmhuran (6/12/2011)


    SQL seems to have trouble with constructs like...

with c1 as (...),
     c2 as (select ... from c1 ...),
     c3 as (select ... from c2 ...)

    ... etc. Fair enough too, I'd say.

    That's one situation where I'd be looking at temp tables (with the premise here being that there's no logic shortcuts available)

I've used such constructs because they were so fast. Of course, not in all cases. I've also seen it where such constructs are absolute dogs. It still "Depends". 🙂

--Jeff Moden

  • Interesting. Ever discovered a pattern for when they're fast or slow? I pretty much abandoned them a while ago, figuring it was just asking a bit too much of the engine. I still use CTEs, but limit them to situations where I'm using what would otherwise be the same subquery 2 or more times in a statement, or for recursive selects.

  • CELKO (6/10/2011)


>> We have lots of reports which use SELECT INTO to insert data into temp tables. I was told that it's a bad practice. Could anyone please explain why SELECT INTO is a bad practice? <<

    You have to materialize it so it wasted time and disk space. You can use a derived table, a VIEW or a CTE which can be optimized.

It is also non-ANSI syntax. It should be what is called a singleton select if MSA was up to Standards.

    The reason that bad programmers use it is that it mimics mounting a scratch tape in a file system.

    Having said that, there might be a few cases where it is a good idea. I cannot think of any right now.

    That looks to me like a failure to think clearly.

A CTE will not work well if the data is required often and the optimiser is not clever enough to create it only once; the same applies to a view (other than a permanently materialised one). Given the capabilities of the optimiser, materialisation is often the best approach, rather than something to be avoided. Of course, with your strong orientation towards the language standard as opposed to any actual working database management system, you are not aware of what typical optimiser capabilities are. (The standard is, incidentally, a mess which needs to be sorted out but probably won't be: the standards bodies appear to be heavily biased against any attempt to describe a relational algebra/calculus, or even something as simple as a sensible definition and use of NULL, and even more biased against recognising that current and foreseeable limitations on optimisation technology require the addition of features not currently in the standard.) But surely you are aware of that unawareness, and could - if you bothered to try - avoid making rather silly statements that display it only too clearly.

    Tom
