Which one is better select * into # temp from tableA Vs create #temp table insert into...

Question

Which one is better select * into # temp from tableA Vs create #temp table insert into...

Viewing 15 posts - 16 through 30 (of 60 total)

You must be logged in to reply to this topic. Login to reply

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 1

lrutkowski (6/23/2011)
I was always taught to use the create table, insert method. not because it is faster, but because it blocks the sysobjects while the table is being created and the data is inserted. Don't know oif this is still true, but old habits die hard. You could do a select into where 1=2, and then an insert. then the select into would create the table, not insert data, and be very quick.
Leonard

That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 2

tfifield (6/23/2011)
TheSQLGuru (6/23/2011)
sqldba_icon (6/22/2011)
I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.
Please tell me why you want to create an index on the temp table. In 15 years of doing SQL Server work I can probably count on 1 or maybe 2 hands the number of times when an index on a temp table was appropriate for it's given use.
Kevin,
I've found that if the temp table is used to join to the main table on pretty much the same index (especially if the index on the main table is covering), I can get a MERGE instead of HASH JOIN. This sometimes will greatly boost performance.
However, for the most part I don't bother since creating the index on the temp table takes away more than I get back on the JOIN.
Todd Fifield

That could be a valid use. I note that I have seen the optimizer actually SORT data in a much smaller table to utilize a MERGE join in a larger one.

I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index. Oh, and obviously you want said index created with explicit 100% fill factor too.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 3

TheSQLGuru (6/23/2011)
I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.

Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?

1. Create table.

2. Create proper unique clustered index.

3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Leonard Rutkowski SSCrazy Points: 2668 More actions · Answer 4

That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!

yep, like I said, old habits die hard. Right after I posted I googled it, and saw it was fixed after 7. So I guess you can teach an old dog new tricks.:-)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 5

opc.three (6/23/2011)
TheSQLGuru (6/23/2011)
I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.
Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?
1. Create table.
2. Create proper unique clustered index.
3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.

You are giving a very specific case here that was not what I understood the OP was intending nor is it a generic case. But here goes: even for your exact case the optimizer and storage engine too would have to be CODED to KNOW that the SELECT statement doing the population of the INSERT exactly matches the clustered index of the table. I do not know that it is (although logically it SHOULD be). I also wonder if the UNIQUENESS of your particular index might not monkey with things in that the extra check for uniqueness might not get in the way of allowing a good data flow from the SELECT into the table. If everything is set up in the engine to allow direct ordered insert your example may well be faster than populating the table and then building the index afterwards. But in the generic case of non-sorted data it would not nor would it be faster or especially result in less fragmentation.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

tfifield SSCrazy Eights Points: 9655 More actions · Answer 6

opc.three,

On your question to Kevin.

I've bench marked CREATE TABLE/INSERT INTO vs SELECT INTO many times. What I come up with is that the SELECT INTO is always faster - sometimes a little and sometimes by a lot. Jeff Moden has come up with similar results.

My understanding of ORDER BY is that it is applied to the completed record set, so it would seem that it is the CREATE TABLE/INSERT part that is taking longer than SELECT INTO since the ORDER BY is only being applied to the completed record set anyhow.

Most of my bench marks don't involve an index on the temp table - just the creation of it. I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.

Some day I'll bench mark indexing the temp table and possibly write an article about it.

Todd Fifield

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 7

tfifield (6/23/2011)
. . . I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.
. . .

Can I inquire about what percentage of the time you use a temp table that you actually do create one or more indexes on it? And do you routinely examine the query plans and performance implications to see if said index(es) actually improve performance? Again I have seen VERY few real-world cases where they were appropriate.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 8

Thanks for taking the time, I appreciate your responses. You're correct Kevin, the thread definitely took on a slightly different personality once indexes were added to the mix and so my question may have appeared to come out of left field, but I swear there was a technical train of thought there. Also, I could not help myself, with the you and the other good folks on this thread I had to throw it out there because of the learning op 🙂 I accept that SELECT...INTO is preferred over CREATE TABLE followed by INSERT...SELECT in the generic case. What little I know of internals still lands me on "it will depend on the scenario" as to which method will be faster when indexes are involved, but it's not like I have a breadth of experience in this area on my side helping me land there so I definitely look forward to reading your article Todd. If you need a proof-reader or tester drop me a PM.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Eric M Russell SSC Guru Points: 125624 More actions · Answer 9

sqldba_icon (6/21/2011)
Can any one suggest based on there experience which one of these are preferred.
i) select * into #temp from tableA
ii) create table #temp (col 1)
insert into #temp
select col 1 from tableA
The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks

Why do you need to select so many rows into a temporary table?

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 10

This conversation resparked a curiousity on my part.

The following was tested in a 2k8 R1 environment off my local drives. End result, and if you read the notes you'll see why: No indexing, SELECT INTO. If indexing, fully prebuild it. Test yourself of course, but that's now my rule of thumb.

-- DROP TABLE CoreTable

-- DROP TABLE AggregateMe

CREATE TABLE CoreTable (ctID INT IDENTITY( 1, 1) NOT NULL,

ctNCID INT NULL,

ctAttr1 INT NOT NULL,

ctAttr2 INT NOT NULL,

CONSTRAINT PK_CoreTable PRIMARY KEY CLUSTERED (ctID)

)

GO

CREATE INDEX idx_CoreTable_Attr ON CoreTable (ctAttr1, ctAttr2)

GO

CREATE TABLE AggregateMe (ctID INT NOT NULL,

ctNCID INT NOT NULL,

amSumUpVal DECIMAL(20,4) NULL,

amAvgUpVal DECIMAL(20,4) NULL)

GO

CREATE CLUSTERED INDEX cidx_AggregateMe ON AggregateMe (ctID)

GO

-- Generate some test data:

INSERT INTO CoreTable (ctAttr1, ctAttr2)

SELECT

rn%10 AS Attr1,

rn%20 AS Attr2

FROM

(SELECT TOP 100000

Row_number() OVER (ORDER BY RAND()) AS rn

FROM

master..syscolumns AS s1

CROSS JOIN

master..syscolumns AS s2

CROSS JOIN

master..syscolumns AS s3

) AS drv

UPDATE CoreTable SET ctNCID = 100005 - ctID --(we want it in a different order)

-- TRUNCATE TABLE AggregateMe

INSERT INTO AggregateMe

SELECT

ct.ctID,

ct.ctNCID,

CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amSumUpVal,

CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amAvgUpVal

FROM

CoreTable AS ct

CROSS JOIN

(SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5

UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0)

AS drvMultiplier

GO

select top 10 * from CoreTable

SELECT TOP 50 * FROM AggregateMe

GO

-- So, we have our baseline data in place, now, for testing. Run twice.

SET STATISTICS IO, TIME ON

PRINT 'Create Table'

CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

PRINT 'Insert Into'

INSERT INTO #PreBuild

SELECT

ctID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

FROM

AggregateMe AS am

GROUP BY

ctID

HAVING

SUM( AmSumUpVal) > 0

PRINT 'SELECT INTO'

SELECT

ctID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

INTO

#SelectInto

FROM

AggregateMe AS am

GROUP BY

ctID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME, IO OFF

IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

DROP TABLE #PreBuild

IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

DROP TABLE #SelectInto

GO

-- So, in general, no indexes and whatnot on the tables, the SELECT INTO runs at about 10-15ms faster.

-- This is insignificant in general, especially in context of this build.

-- This is only important in hyper-optimization.

-- NEXT!

-- Index discussions.

-- Note the ctNCID is at the exact same selectivity level as ctID on purpose.

-- This is a functionality optimization test, not a real-world scenario. The idea is to prove

-- the underlying theory, without having to include a heuristics discussion.

-- First, building off the clustered index in AggregateMe

SET NOCOUNT ON

PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctID)

SET STATISTICS TIME OFF

PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

INSERT INTO #PreBuild

SELECT

ctID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

FROM

AggregateMe AS am

GROUP BY

ctID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE TABLE #PreBuild2 (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

SET STATISTICS TIME OFF

PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

INSERT INTO #PreBuild2

SELECT

ctID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

FROM

AggregateMe AS am

GROUP BY

ctID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctID)

SET STATISTICS TIME OFF

PRINT 'SELECT INTO'

SET STATISTICS TIME ON

SELECT

ctID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

INTO

#SelectInto

FROM

AggregateMe AS am

GROUP BY

ctID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctID)

SET STATISTICS TIME OFF

IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

DROP TABLE #PreBuild

IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL

DROP TABLE #PreBuild2

IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

DROP TABLE #SelectInto

GO

-- The reason I wrapped the ON/OFF this way is to emphasize the CREATE INDEX having

-- three unique time components involved that must be summed to have the actual full

-- time it took to build the index. In this case ~150 milliseconds, much higher than

-- the extra ~20 milliseconds that it takes to insert into the table if it's pre-built.

-- Since SELECT INTO can't have the clustered pre-built, it takes less time then the

-- prebuilt table with post-built clustered, but is still more time consuming then the

-- full prebuilt.

-- With this test, we've proven that building the clustered index on the temp table

-- before inserting data that is grouped on the clustered index from the source table

-- is faster. Let's test the ctNCID, which we'll index equivalently.

--This code is the exact same as above, just using the ctNCID column instead.

SET NOCOUNT ON

PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE TABLE #PreBuild (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctNCID)

SET STATISTICS TIME OFF

PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'

SET STATISTICS TIME ON

INSERT INTO #PreBuild

SELECT

ctNCID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

FROM

AggregateMe AS am

GROUP BY

ctNCID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE TABLE #PreBuild2 (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))

SET STATISTICS TIME OFF

PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

INSERT INTO #PreBuild2

SELECT

ctNCID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

FROM

AggregateMe AS am

GROUP BY

ctNCID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctNCID)

SET STATISTICS TIME OFF

PRINT 'SELECT INTO'

SET STATISTICS TIME ON

SELECT

ctNCID,

SUM( amSumUpVal) AS SumVal,

AVG( amAvgUpVal) AS AvgVal

INTO

#SelectInto

FROM

AggregateMe AS am

GROUP BY

ctNCID

HAVING

SUM( AmSumUpVal) > 0

SET STATISTICS TIME OFF

PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'

SET STATISTICS TIME ON

CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctNCID)

SET STATISTICS TIME OFF

IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL

DROP TABLE #PreBuild

IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL

DROP TABLE #PreBuild2

IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL

DROP TABLE #SelectInto

GO

-- Skipping the gory details, the fully pre-built STILL wins.

-- Even though it takes on average about 40ms longer to insert all the data,

-- the subsequent clustered index build takes about 180ms.

-- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Orlando Colamatteo SSC Guru Points: 182293 More actions · Answer 11

Gee Craig, I don't know what to say, except wow! I got pulled into an SSIS mess yesterday afternoon so did not have time to do any testing. Even if I did I am not sure I would have produced anything that quick, nicely done. Using your script my findings were similar on a 2008R2 Standard Edition instance.

I tried adding UNIQUE to all clustered indexes as well as ORDER BY to the INSERT...SELECT to see if it added any time but changes to results were insignificant. I may try this with a wider clustering key but I expect I will adopt the default approach in your conclusion.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 12

Craig Farrell (6/23/2011)
This conversation resparked a curiousity on my part.

When I run that code as a single unit, it keeps returning the following error...

There is already an object named '#PreBuild' in the database.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

FTdenali SSCertifiable Points: 5362 More actions · Answer 13

Ever thought of the Query Analyzer?

the first option is best.

curious_sqldba SSC-Dedicated Points: 36502 More actions · Answer 14

Eric M Russell (6/23/2011)
sqldba_icon (6/21/2011)
Can any one suggest based on there experience which one of these are preferred.
i) select * into #temp from tableA
ii) create table #temp (col 1)
insert into #temp
select col 1 from tableA
The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks
Why do you need to select so many rows into a temporary table?

Eric not sure what do u mean by that question but that is not under my control. There could be a way to reduce the number of records but there are bigger problems like the one mentioned in this topic. thanks

FTdenali SSCertifiable Points: 5362 More actions · Answer 15

SQL 2008 come with query execution plan and estimation. try testing this queries and see the estimated results.