October 30, 2012 at 1:13 pm
The subject line says it all, but just to make it clearer...
I could use a CREATE TABLE #TempTable to define the table explicitly, but I'm wondering if it even matters. Most code I've seen that uses temporary tables does something like this:
INSERT INTO #TempTable
SELECT CustomerID, LastName, FirstName
FROM SomeTable
So, does it really matter?
Kindest Regards, Rod Connect with me on LinkedIn.
October 30, 2012 at 1:40 pm
Hi,
For known columns I use CREATE TABLE and then INSERT.
For creating a table from another table's schema I use INSERT INTO.
I've used both, but honestly I prefer 'insert into' because then you don't have to know all the columns before insertion.
Here, I guess it doesn't matter which one you use.
🙂
October 30, 2012 at 2:21 pm
There's a small confusion.
INSERT INTO won't create a table; SELECT INTO will. Example:
--This will create a table
SELECT column1, column2, column3
INTO #TempTable
FROM MyTable
--This won't
INSERT INTO #TempTable
SELECT column1, column2, column3
FROM MyTable
Now, to answer the question. IMHO, it depends. If you don't care about the structure of your table and need something easy, you can use SELECT INTO. However, creating the table before inserting data will give you more flexibility and control over its structure.
What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?
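For example, a quick sketch of the pre-created version (the datatypes and the extra column are just made up for illustration):
--Pre-creating the table lets you add columns the SELECT doesn't supply
CREATE TABLE #TempTable
(
    column1 INT NOT NULL,
    column2 VARCHAR(50) NULL,
    column3 VARCHAR(50) NULL,
    column4 CHAR(1) NULL --extra column, populated later in the process
);
INSERT INTO #TempTable (column1, column2, column3)
SELECT column1, column2, column3
FROM MyTable;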
October 30, 2012 at 4:57 pm
Luis Cazares (10/30/2012)
What happens if you need more columns than the ones that you'll have available in the select? What happens if you want an exact copy of the columns used in a query?
That's an easy thing to do and you have much more control over what the columns may be like than you might imagine.
For example, here's the final SELECT from a CTE that I use to begin the process of converting an Adjacency List to Nested Sets (article comes out Nov 13th, 2012).
SELECT EmployeeID = ISNULL(sorted.EmployeeID,0),
sorted.ManagerID,
HLevel = ISNULL(sorted.HLevel,0),
LeftBower = ISNULL(CAST(0 AS INT),0), --Place holder
RightBower = ISNULL(CAST(0 AS INT),0), --Place holder
NodeNumber = ISNULL(ROW_NUMBER() OVER (ORDER BY sorted.SortPath),0),
NodeCount = ISNULL(CAST(0 AS INT),0), --Place holder
SortPath = ISNULL(sorted.SortPath,sorted.SortPath)
INTO dbo.Hierarchy
FROM cteBuildPath AS sorted
First, all of the ISNULLs make for NOT NULL columns in the final table. That's especially important for the EmployeeID column because it will become the PK for this table using a separate piece of code.
The other thing to notice is that it also creates 3 "place holder" columns in the table for which I have no data at this point in the process. It even controls what the datatype will be for those columns using CAST.
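The separate piece of code for the PK is nothing exotic; a minimal sketch, with the constraint name and options left to taste, would be something like this:
ALTER TABLE dbo.Hierarchy
ADD PRIMARY KEY CLUSTERED (EmployeeID) --works because the ISNULL made EmployeeID NOT NULL
;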
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 5:06 pm
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
October 30, 2012 at 6:03 pm
I did a bit of testing on this a while ago, and could probably recreate the scenarios, but here's where I ended up.
SELECT INTO #tmp
and
CREATE TABLE #Tmp INSERT INTO #tmp SELECT
were functionally equivalent time-wise in a million-row test with varying degrees of spacer columns (REPLICATE('@',200), for example). The concern comes in after that, where any temp table that size almost requires an index.
CREATE TABLE #Tmp CREATE INDEX idx1 INSERT INTO #tmp SELECT
runs about 1.5x - 2x faster (depending on the index) than
SELECT INTO #tmp CREATE INDEX idx1
This is voodoo on a large scale as I can't give you exact reasons why, but I get it generally: indexing the heap after the fact is more expensive than pre-sorting the rows into the index as they're inserted.
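Spelled out, the two patterns looked roughly like this (table, column, and index names are just placeholders):
--Pattern 1: pre-create the table and the index, then insert (the faster one in my tests)
CREATE TABLE #Tmp (ID INT NOT NULL, Spacer CHAR(200) NULL);
CREATE INDEX idx1 ON #Tmp (ID);
INSERT INTO #Tmp (ID, Spacer)
SELECT SomeID, REPLICATE('@',200)
FROM dbo.SomeSourceTable;
--Pattern 2: SELECT/INTO first, then index the heap afterwards (the slower one)
SELECT SomeID AS ID, REPLICATE('@',200) AS Spacer
INTO #Tmp2
FROM dbo.SomeSourceTable;
CREATE INDEX idx1 ON #Tmp2 (ID);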
You can make either version do whatever you want, but my personal preference and practice is, for permanent procs, to always declare the #Tmp directly and then insert into it. If I'm doing one-off ad hocs trying to research some data, I'll swap between the two depending on how many columns are involved, until I nail down my requirements.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
October 30, 2012 at 6:05 pm
Luis Cazares (10/30/2012)
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
Why?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 6:48 pm
Jeff Moden (10/30/2012)
Luis Cazares (10/30/2012)
I knew it was doable, but I believe it's better to have an explicit CREATE TABLE statement for these situations.
Why?
Why? Because I get paid by line of code. Ok, that's not true.
The reason is much the same as for comments: it makes code more readable.
I won't use Craig's post, but that would have been a good reason if I had known it before.
October 30, 2012 at 7:41 pm
I have to say that it truly depends. For example, I typically only put into a Temp Table exactly what I need. Since I usually end up using the whole table, the best an index would do would be a SCAN instead of a SEEK. With that thought in mind, I frequently don't put any kind of an index on TempTables... not even a PK or CI. In fact, adding an index will sometimes cause queries against the Temp Table to run slower... a lot slower. I just went through such a thing with the code for an article coming out soon. With the Clustered Index that seemed totally logical to have, the code ran in just over 2 minutes. Without the index, the code ran in 54 seconds... nearly twice as fast.
To wit, on a million row table on my humble i5/6GB laptop, the SELECT/INTO edges out the CREATE TABLE by a couple of hundred milliseconds when I don't create an index. That's not the important part to me because it's such a small difference. What's important to me is that I don't have to write the code to create the table to begin with, and that code can sometimes be fairly extensive. If the table is a #Temp Table, then I'm not normally doing any datatyping or making columns NOT NULL with ISNULL. It's all very fast to type because I only need to add two words to a query to get it to build the table... INTO tablename.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 30, 2012 at 7:55 pm
My apologies. I forgot to post the code for my performance claim.
--http://www.sqlservercentral.com/Forums/Topic1378984-149-1.aspx?Update=1
DECLARE @StartTime DATETIME;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
RAISERROR('========== SELECT/INTO',0,1) WITH NOWAIT;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;
SELECT @StartTime = GETDATE();
SELECT TOP (1000000)
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',
SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex09 = RIGHT(NEWID(),09)
INTO #JBMTest
FROM sys.all_columns t1
CROSS JOIN sys.all_columns t2
;
--===== Add the PK
-- Takes about 1 second to execute.
-- ALTER TABLE #JBMTest
-- ADD PRIMARY KEY CLUSTERED (SomeID)
--;
--===== Duration and HouseKeeping
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);
DROP TABLE #JBMTest
;
GO 5
DECLARE @StartTime DATETIME;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
RAISERROR('========== Precreate Table with Index',0,1) WITH NOWAIT;
RAISERROR('===============================================================',0,1) WITH NOWAIT;
DBCC FREESYSTEMCACHE ('ALL') WITH NO_INFOMSGS ;
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS ;
SELECT @StartTime = GETDATE();
CREATE TABLE [dbo].[#JBMTest](
[SomeID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NULL,
[SomeLetters2] [char](2) NULL,
[SomeCSV] [varchar](80) NULL,
[SomeMoney] [money] NULL,
[SomeDate] [datetime] NULL,
[SomeHex09] [char](9) NULL,
--PRIMARY KEY CLUSTERED
--(
--[SomeID] ASC
--)
)
;
INSERT INTO #JBMTest
SELECT TOP (1000000)
--SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10',
SomeMoney = ABS(CHECKSUM(NEWID()))%10000 /100.0,
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex09 = RIGHT(NEWID(),09)
FROM sys.all_columns t1
CROSS JOIN sys.all_columns t2
;
--===== Duration and HouseKeeping
PRINT 'Duration = ' + CONVERT(CHAR(12),GETDATE()-@StartTime,114);
DROP TABLE #JBMTest
;
GO 5
RAISERROR('===============================================================',0,1) WITH NOWAIT;
Here are the run results on my i5 laptop. 5 runs each. The test code clears cache at the system level, so make sure you don't run it on a production box.
Beginning execution loop
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:460
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:480
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:647
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:467
===============================================================
========== SELECT/INTO
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:497
Batch execution completed 5 times.
Beginning execution loop
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:070
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:177
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:113
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:04:123
===============================================================
========== Precreate Table with Index
===============================================================
(1000000 row(s) affected)
Duration = 00:00:03:997
Batch execution completed 5 times.
===============================================================
--Jeff Moden
Change is inevitable... Change for the better is not.
November 6, 2012 at 11:39 am
Jeff Moden,
Please clarify this subject for me:
I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.
Is this true or is it incorrect information?
Thanks,
November 6, 2012 at 11:50 am
Completely false.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 6, 2012 at 1:29 pm
Lee Crain (11/6/2012)
Jeff Moden,Please clarify this subject for me:
I'm almost certain that I read somewhere that a "SELECT .... INTO #Table" would automatically create the target table, then lock the TempDB database for the duration of the data insertions.
Is this true or is it incorrect information?
Thanks,
Used to be partially true in SQL Server 7.0 and before. Schema locks on tempdb were needed in order to create temp tables back then. In any version of SQL Server from SQL 2000 onward, that is no longer true. But even in the older versions, the locks only lasted as long as it took to create the table, not for the duration of the inserts, if I remember correctly.
On the main subject, I use Select Into when the structure of the temp table is subject to change without notice, or when I'm trying to debug truncation issues and so on. I generally use Create...Insert...Select for temp tables where I want to control the structure of the temp table. In either case, I usually add a conditional Drop Table before the create step.
IF OBJECT_ID(N'tempdb..#T') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T
(ID INT IDENTITY
PRIMARY KEY,
MyColumn1 INT NOT NULL);
INSERT INTO #T
(MyColumn1)
VALUES (1),
(2),
(5);
Like that.
Makes it so the script can be run repeatedly for debugging/testing/refactoring.
Breaking it up that way makes it more "self-documenting", in my opinion. I find it much easier to overlook an "INTO" in a Select statement than to overlook a "CREATE TABLE" with some whitespace around it.
Plus habits from SQL 2000 and before, where mixing DDL and DML could cause performance issues - not as important anymore, but it's a habit and I haven't bothered to change it.
So I tend towards explicitly creating temp tables.
One very useful application of Select Into, though, is when you get a truncation error from an ETL process that was working before. Use Select Into, pull the structure of the resulting table, compare it to your target table, and the column with a longer max_length will usually leap right out. If you name the columns the same as the target table, you can even do an Outer Join on the two sets of column definitions and find where the temp table has a bigger scope. Speeds up debugging that kind of thing marvelously when you have a dozen or more varchar columns, and all you get from the SQL engine is "string or binary data would be truncated", with no indication of what column/row you need to look at.
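The comparison itself is just a join on the column metadata. A rough sketch, assuming the target is dbo.TargetTable, the temp copy is #Staging, and the column names match (I've shown a plain join with a length filter here rather than an outer join):
--Build the temp copy with the query's "natural" structure
SELECT *
INTO #Staging
FROM dbo.SourceOfTheETL; --whatever query or table feeds the target
--Find columns where the incoming data is wider than the target allows
SELECT tmp.name AS column_name,
       tmp.max_length AS temp_max_length,
       tgt.max_length AS target_max_length
FROM tempdb.sys.columns AS tmp
JOIN sys.columns AS tgt
  ON tgt.name = tmp.name
 AND tgt.object_id = OBJECT_ID(N'dbo.TargetTable')
WHERE tmp.object_id = OBJECT_ID(N'tempdb..#Staging')
  AND tmp.max_length > tgt.max_length;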
There are lots of other little shortcuts like that where Select Into can save a lot of time and work.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
November 6, 2012 at 2:30 pm
Thanks, Gail, and thanks, G-Squared, especially for the details regarding debugging SSIS problems.
SSIS is not one of my strong points. I'm going to tuck that advice away for future reference.