October 30, 2012 at 1:13 pm
The subject line says it all, but just to make it clearer...
I could use a CREATE TABLE #TempTable to define the table explicitly, but I'm wondering if it even matters. Most code I've seen that uses temporary tables does something like this:
INSERT INTO #TempTable
SELECT CustomerID, LastName, FirstName
FROM SomeTable
So, does it really matter?
Kindest Regards, Rod Connect with me on LinkedIn.
October 30, 2012 at 1:40 pm
Hi,
For known columns I use CREATE TABLE and then INSERT.
For creating a table from another table's schema I use INSERT INTO.
I've used both, but honestly I prefer 'insert into' because then you don't have to know all the columns before insertion.
Here, I guess it doesn't matter which one you use.
🙂
October 30, 2012 at 2:21 pm
There's a small confusion.
INSERT INTO won't create a table; SELECT INTO will. Example:
--This will create a table
SELECT column1, column2, column3
INTO #TempTable
FROM MyTable
--This won't
INSERT INTO #TempTable
SELECT column1, column2, column3
FROM MyTable
Now, to answer the question. IMHO, it depends. If you don't care about the structure of your table and need something easy, you can use SELECT INTO. However, creating the table before inserting data will give you more flexibility and control over its structure.
What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?
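For example, a quick sketch of the pre-created version (the datatypes and the extra column are just made up for illustration):
--Pre-creating the table lets you add columns the SELECT doesn't supply
CREATE TABLE #TempTable
(
    column1 INT NOT NULL,
    column2 VARCHAR(50) NULL,
    column3 VARCHAR(50) NULL,
    column4 CHAR(1) NULL --extra column, populated later in the process
);
INSERT INTO #TempTable (column1, column2, column3)
SELECT column1, column2, column3
FROM MyTable;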
October 30, 2012 at 4:57 pm
Luis Cazares (10/30/2012)
What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?
That's an easy thing to do and you have much more control over what the columns may be like than you might imagine.
For example, here's the final SELECT from a CTE that I use to begin the process of converting an Adjacency List to Nested Sets (article comes out Nov 13th, 2012).
SELECT EmployeeID = ISNULL(sorted.EmployeeID,0),
sorted.ManagerID,
HLevel = ISNULL(sorted.HLevel,0),
LeftBower = ISNULL(CAST(0 AS INT),0), --Place holder
RightBower = ISNULL(CAST(0 AS INT),0), --Place holder
NodeNumber = ISNULL(ROW_NUMBER() OVER (ORDER BY sorted.SortPath),0),
NodeCount = ISNULL(CAST(0 AS INT),0), --Place holder
SortPath = ISNULL(sorted.SortPath,sorted.SortPath)
INTO dbo.Hierarchy
FROM cteBuildPath AS sorted
First, all of the ISNULLs make for NOT NULL columns in the final table. That's especially important for the EmployeeID column because it will become the PK for this table using a separate piece of code.
The other thing to notice is that it also creates 3 "place holder" columns in the table for which I have no data at this point in the process. It even controls what the datatype will be for those columns using CAST.
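The separate piece of code for the PK is nothing exotic; a minimal sketch, with the constraint name and options left to taste, would be something like this:
ALTER TABLE dbo.Hierarchy
ADD PRIMARY KEY CLUSTERED (EmployeeID) --works because the ISNULL made EmployeeID NOT NULL
;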
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 5:06 pm
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
October 30, 2012 at 6:03 pm
I did a bit of testing on this a while ago, and could probably recreate the scenarios, but here's where I ended up.
SELECT INTO #tmp
and
CREATE TABLE #Tmp INSERT INTO #tmp SELECT
were functionally equivalent time-wise in a million-row test with varying degrees of spacer columns (REPLICATE('@',200), for example). The concern comes in after that, where any temp table that size almost requires an index.
CREATE TABLE #Tmp CREATE INDEX idx1 INSERT INTO #tmp SELECT
runs about 1.5x - 2x faster (depending on the index) than
SELECT INTO #tmp CREATE INDEX idx1
This is voodoo on a large scale as I can't give you exact reasons why, but I get it generally: indexing the heap after the fact is more expensive than pre-sorting the rows into the index as they're inserted.
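Spelled out, the two patterns looked roughly like this (table, column, and index names are just placeholders):
--Pattern 1: pre-create the table and the index, then insert (the faster one in my tests)
CREATE TABLE #Tmp (ID INT NOT NULL, Spacer CHAR(200) NULL);
CREATE INDEX idx1 ON #Tmp (ID);
INSERT INTO #Tmp (ID, Spacer)
SELECT SomeID, REPLICATE('@',200)
FROM dbo.SomeSourceTable;
--Pattern 2: SELECT/INTO first, then index the heap afterwards (the slower one)
SELECT SomeID AS ID, REPLICATE('@',200) AS Spacer
INTO #Tmp2
FROM dbo.SomeSourceTable;
CREATE INDEX idx1 ON #Tmp2 (ID);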
You can make either version do whatever you want, but my personal preference and practice is, for permanent procs, to always declare the #Tmp directly and then insert into it. If I'm doing one-off ad hocs trying to research some data, I'll swap between the two depending on how many columns are involved, until I nail down my requirements.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
October 30, 2012 at 6:05 pm
Luis Cazares (10/30/2012)
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
Why?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 6:48 pm
Jeff Moden (10/30/2012)
Luis Cazares (10/30/2012)
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
Why?
Why? Because I get paid by line of code. Ok, that's not true.
The reason is much the same as for comments: it makes code more readable.
I won't use Craig's post, but that would have been a good reason if I had known it before.
October 30, 2012 at 7:41 pm
I have to say that it truly depends. For example, I typically only put into a Temp Table exactly what I need. Since I usually end up using the whole table, the best an index would do would be a SCAN instead of a SEEK. With that thought in mind, I frequently don't put any kind of an index on TempTables... not even a PK or CI. In fact, adding an index will sometimes cause queries against the Temp Table to run slower... a lot slower. I just went through such a thing with the code for an article coming out soon. With the Clustered Index that seemed totally logical to have, the code ran in just over 2 minutes. Without the index, the code ran in 54 seconds... nearly twice as fast.
To wit, on a million row table on my humble i5/6GB laptop, the SELECT/INTO edges out the CREATE TABLE by a couple of hundred milliseconds when I don't create an index. That's not the important part to me because it's such a small difference. What's important to me is that I don't have to write the code to create the table to begin with, and that code can sometimes be fairly extensive. If the table is a #Temp Table, then I'm not normally doing any datatyping or making columns NOT NULL with ISNULL. It's all very fast to type because I only need to add two words to a query to get it to build the table... INTO tablename.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 7:55 pm
My apologies. I forgot to post the code for my performance claim.
--http://www.sqlservercentral.com/Forums/Topic1378984-149-1.aspx?Update=1
DECLARE @StartTime DATETIME;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
RAISERROR('========== SELECT/INTO',0,1) WITH NOWAIT;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;
SELECT @StartTime = GETDATE();
SELECT TOP (1000000)
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',
SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex09 = RIGHT(NEWID(),09)
INTO #JBMTest
FROM sys.all_columns t1
CROSS JOIN sys.all_columns t2
;
--===== Add the PK
-- Takes about 1 second to execute.
-- ALTER TABLE #JBMTest
-- ADD PRIMARY KEY CLUSTERED (SomeID)
--;
--===== Duration and HouseKeeping
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);
DROP TABLE #JBMTest
;
GO 5
DECLARE @StartTime DATETIME;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
RAISERROR('========== Precreate Table with Index',0,1) WITH NOWAIT;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;
SELECT @StartTime = GETDATE();
CREATE TABLE [dbo].[#JBMTest](
[SomeID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NULL,
[SomeLetters2] [char](2) NULL,
[SomeCSV] [varchar](80) NULL,
[SomeMoney] [money] NULL,
[SomeDate] [datetime] NULL,
[SomeHex09] [char](9) NULL,
--PRIMARY KEY CLUSTERED
--(
--[SomeID] ASC
--)
)
;
INSERT INTO #JBMTest
SELECT TOP (1000000)
--SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',
SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex09 = RIGHT(NEWID(),09)
FROM sys.all_columns t1
CROSS JOIN sys.all_columns t2
;
--===== Duration and HouseKeeping
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);
DROP TABLE #JBMTest
;
GO 5
RAISERROR('===============================================================',0,1) WITH NOWAIT;
Here are the run results on my i5 laptop. 5 runs each. The test code clears cache at the system level, so make sure you don't run it on a production box.
Beginning execution loop
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:460
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:480
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:647
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:467
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:497
Batch execution completed 5 times.
Beginning execution loop
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:070
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:177
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:113
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:123
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:997
Batch execution completed 5 times.
===============================================================
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2012 at 11:39 am
Jeff Moden,
Please clarify this subject for me:
I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.
Is this true or is it incorrect information?
Thanks,
November 6, 2012 at 11:50 am
Completely false.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 6, 2012 at 1:29 pm
Lee Crain (11/6/2012)
Jeff Moden,Please clarify this subject for me:
I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.
Is this true or is it incorrect information?
Thanks,
Used to be partially true in SQL Server 7.0 and before. Schema locks on tempdb were needed in order to create temp tables back then. In any version of SQL Server from SQL 2000 onward, that is no longer true. But even in the older versions, the locks only lasted as long as it took to create the table, not for the duration of the inserts, if I remember correctly.
On the main subject, I use Select Into when the structure of the temp table is subject to change without notice, or when I'm trying to debug truncation issues and so on. I generally use Create...Insert...Select for temp tables where I want to control the structure of the temp table. In either case, I usually add a conditional Drop Table before the create step.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
MyColumn1 INT NOT NULL);
INSERT INTO #T
(MyColumn1)
VALUES (1),
(2),
(5);
Like that.
Makes it so the script can be run repeatedly for debugging/testing/refactoring.
Breaking it up that way makes it more "self-documenting", in my opinion. I find it much easier to overlook an "INTO" in a Select statement than to overlook a "CREATE TABLE" with some whitespace around it.
Plus habits from SQL 2000 and before, where mixing DDL and DML could cause performance issues - not as important anymore, but it's a habit and I haven't bothered to change it.
So I tend towards explicitly creating temp tables.
One very useful application of Select Into, though, is when you get a truncation error from an ETL process that was working before. Use Select Into, pull the structure of the resulting table, compare it to your target table, and the column with a longer max_length will usually leap right out. If you name the columns the same as the target table, you can even do an Outer Join on the two sets of column definitions and find where the temp table has a bigger scope. Speeds up debugging that kind of thing marvelously when you have a dozen or more varchar columns, and all you get from the SQL engine is "string or binary data would be truncated", with no indication of what column/row you need to look at.
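The comparison itself is just a join on the column metadata. A rough sketch, assuming the target is dbo.TargetTable, the temp copy is #Staging, and the column names match (I've shown a plain join with a length filter here rather than an outer join):
--Build the temp copy with the query's "natural" structure
SELECT *
INTO #Staging
FROM dbo.SourceOfTheETL; --whatever query or table feeds the target
--Find columns where the incoming data is wider than the target allows
SELECT tmp.name AS column_name,
       tmp.max_length AS temp_max_length,
       tgt.max_length AS target_max_length
FROM tempdb.sys.columns AS tmp
JOIN sys.columns AS tgt
  ON tgt.name = tmp.name
 AND tgt.object_id = OBJECT_ID(N'dbo.TargetTable')
WHERE tmp.object_id = OBJECT_ID(N'tempdb..#Staging')
  AND tmp.max_length > tgt.max_length;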
There are lots of other little shortcuts like that where Select Into can save a lot of time and work.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 6, 2012 at 2:30 pm
Thanks, Gail, and thanks, G-Squared, especially for the details regarding debugging SSIS problems.
SSIS is not one of my strong points. I'm going to tuck that advice away for future reference.