Long ago there were locking / blocking problems with the SELECT INTO statement. That’s not the case anymore and for AdHoc operations and investigation of data SELECT INTO is very helpful.
If you’re not aware of what SELECT INTO is or its benefits here’s an example:
SELECT City, COUNT(1) as CNT
INTO #MyDuplicateCities
FROM Person.Address
GROUP BY City
HAVING COUNT(1) > 1
This query for AdventureWorks will dump all of its results into a table named #MyDuplicateCities. Note that there is no CREATE TABLE statement. The INTO [tablename] will create the table for you.
Running this query a second time will result in failure if you haven’t dropped the #MyDuplicateCities table.
Using this syntax can be really helpful if you just need to do some quick and dirty cleanup; however, it should be avoided for stored procedures. Here’s why…
Have you ever tried to get an Estimated plan from a stored procedure and it results in an ERROR 208 Invalid object name?
When you’re attempting to get an estimated plan the query / stored procedure isn’t actually running. Since the DDL doesn’t exist SQL Server errors with a 208 message and cannot continue to analyze the statement.
Creating the DDL at the beginning of your procedure will always ensure that the estimated plan can be retrieved and as an added bonus the code will be easier to debug for someone that isn’t familiar with the data.
Summing up
Avoid using SELECT INTO for code you’re going to push to production for the following reasons:
- It can make estimated plans fail.
- It’s slightly harder to debug or modify since the schema isn’t easily shown.
Be lazy and efficient when investigating data. For code that you’re not going to reuse SELECT INTO is a champ. It’s faster than building the typing the CREATE TABLE statement and works just as well for AdHoc scenarios.
*Update for Mr Ozar*
When I originally wrote this post I thought I did include a counter point or at least some mention of parallelism for SELECT INTO as a benefit. I think it’s easy to say that in the world of SQL Server “It depends.” and this is always why a talking point like this has two sides and both are valid. I typically always select the highest performing path. I think it’s up to you, the reader, to decide which is best for your environment. Given the new Parallel Insert operation in SQL Server 2016, I’d say why not create your temp table first and have the best of both worlds?