June 23, 2011 at 11:13 am
lrutkowski (6/23/2011)
I was always taught to use the create table, insert method. not because it is faster, but because it blocks the sysobjects while the table is being created and the data is inserted. Don't know oif this is still true, but old habits die hard. You could do a select into where 1=2, and then an insert. then the select into would create the table, not insert data, and be very quick.Leonard
That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!
June 23, 2011 at 11:55 am
tfifield (6/23/2011)
TheSQLGuru (6/23/2011)
sqldba_icon (6/22/2011)
I really appreciate and thank all of you for taking some time to reply to my post. I think i will try creating a temp table first, then an index(probably clustered) and then insert data into temp table. Now the amount of data being inserted into temp table is pretty constant about 170K - 200 K records.
Please tell me why you want to create an index on the temp table. In 15 years of doing SQL Server work I can probably count on 1 or maybe 2 hands the number of times when an index on a temp table was appropriate for it's given use.
Kevin,
I've found that if the temp table is used to join to the main table on pretty much the same index (especially if the index on the main table is covering), I can get a MERGE instead of HASH JOIN. This sometimes will greatly boost performance.
However, for the most part I don't bother since creating the index on the temp table takes away more than I get back on the JOIN.
Todd Fifield
That could be a valid use. I note that I have seen the optimizer actually SORT data in a much smaller table to utilize a MERGE join in a larger one.
I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index. Oh, and obviously you want said index created with explicit 100% fill factor too.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 23, 2011 at 12:08 pm
TheSQLGuru (6/23/2011)
I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.
Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?
1. Create table.
2. Create proper unique clustered index.
3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 23, 2011 at 12:53 pm
That myth should have died in sql 7. Sql 11 is coming out and it's not anywhere as near as dead as I'd like to see it!
yep, like I said, old habits die hard. Right after I posted I googled it, and saw it was fixed after 7. So I guess you can teach an old dog new tricks.:-)
June 23, 2011 at 1:06 pm
opc.three (6/23/2011)
TheSQLGuru (6/23/2011)
I also note that it should be better to put the data into the table and THEN build the index. Should be both faster AND result in a non-fragmented index.Kevin, Please help me understand this concept (note my earlier comment about internals). Links, RTFM and lmgtfy.com are fine if it's a can of worms but I figured I would ask. For this pseudocode, why would fragmentation occur? and why would it be faster?
1. Create table.
2. Create proper unique clustered index.
3. Ask for INSERT...SELECT where the SELECT includes an ORDER BY naming the same column order as the unique clustered index.
You are giving a very specific case here that was not what I understood the OP was intending nor is it a generic case. But here goes: even for your exact case the optimizer and storage engine too would have to be CODED to KNOW that the SELECT statement doing the population of the INSERT exactly matches the clustered index of the table. I do not know that it is (although logically it SHOULD be). I also wonder if the UNIQUENESS of your particular index might not monkey with things in that the extra check for uniqueness might not get in the way of allowing a good data flow from the SELECT into the table. If everything is set up in the engine to allow direct ordered insert your example may well be faster than populating the table and then building the index afterwards. But in the generic case of non-sorted data it would not nor would it be faster or especially result in less fragmentation.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 23, 2011 at 1:14 pm
opc.three,
On your question to Kevin.
I've bench marked CREATE TABLE/INSERT INTO vs SELECT INTO many times. What I come up with is that the SELECT INTO is always faster - sometimes a little and sometimes by a lot. Jeff Moden has come up with similar results.
My understanding of ORDER BY is that it is applied to the completed record set, so it would seem that it is the CREATE TABLE/INSERT part that is taking longer than SELECT INTO since the ORDER BY is only being applied to the completed record set anyhow.
Most of my bench marks don't involve an index on the temp table - just the creation of it. I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.
Some day I'll bench mark indexing the temp table and possibly write an article about it.
Todd Fifield
June 23, 2011 at 1:41 pm
tfifield (6/23/2011)
. . . I always create the index after the SELECT INTO as Kevin suggested, since the create time is better. If the temp table is large, then I use the 100% fill factor and it seems to help performance in most cases.. . .
Can I inquire about what percentage of the time you use a temp table that you actually do create one or more indexes on it? And do you routinely examine the query plans and performance implications to see if said index(es) actually improve performance? Again I have seen VERY few real-world cases where they were appropriate.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
June 23, 2011 at 1:47 pm
Thanks for taking the time, I appreciate your responses. You're correct Kevin, the thread definitely took on a slightly different personality once indexes were added to the mix and so my question may have appeared to come out of left field, but I swear there was a technical train of thought there. Also, I could not help myself, with the you and the other good folks on this thread I had to throw it out there because of the learning op 🙂 I accept that SELECT...INTO is preferred over CREATE TABLE followed by INSERT...SELECT in the generic case. What little I know of internals still lands me on "it will depend on the scenario" as to which method will be faster when indexes are involved, but it's not like I have a breadth of experience in this area on my side helping me land there so I definitely look forward to reading your article Todd. If you need a proof-reader or tester drop me a PM.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 23, 2011 at 3:29 pm
sqldba_icon (6/21/2011)
Can any one suggest based on there experience which one of these are preferred.i) select * into #temp from tableA
ii) create table #temp (col 1)
insert into #temp
select col 1 from tableA
The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks
Why do you need to select so many rows into a temporary table?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
June 23, 2011 at 5:31 pm
This conversation resparked a curiousity on my part.
The following was tested in a 2k8 R1 environment off my local drives. End result, and if you read the notes you'll see why: No indexing, SELECT INTO. If indexing, fully prebuild it. Test yourself of course, but that's now my rule of thumb.
-- DROP TABLE CoreTable
-- DROP TABLE AggregateMe
CREATE TABLE CoreTable (ctID INT IDENTITY( 1, 1) NOT NULL,
ctNCID INT NULL,
ctAttr1 INT NOT NULL,
ctAttr2 INT NOT NULL,
CONSTRAINT PK_CoreTable PRIMARY KEY CLUSTERED (ctID)
)
GO
CREATE INDEX idx_CoreTable_Attr ON CoreTable (ctAttr1, ctAttr2)
GO
CREATE TABLE AggregateMe (ctID INT NOT NULL,
ctNCID INT NOT NULL,
amSumUpVal DECIMAL(20,4) NULL,
amAvgUpVal DECIMAL(20,4) NULL)
GO
CREATE CLUSTERED INDEX cidx_AggregateMe ON AggregateMe (ctID)
GO
-- Generate some test data:
INSERT INTO CoreTable (ctAttr1, ctAttr2)
SELECT
rn%10 AS Attr1,
rn%20 AS Attr2
FROM
(SELECT TOP 100000
Row_number() OVER (ORDER BY RAND()) AS rn
FROM
master..syscolumns AS s1
CROSS JOIN
master..syscolumns AS s2
CROSS JOIN
master..syscolumns AS s3
) AS drv
UPDATE CoreTable SET ctNCID = 100005 - ctID --(we want it in a different order)
-- TRUNCATE TABLE AggregateMe
INSERT INTO AggregateMe
SELECT
ct.ctID,
ct.ctNCID,
CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amSumUpVal,
CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) / CONVERT( DECIMAL ( 20, 4), CHECKSUM( NEWID())) AS amAvgUpVal
FROM
CoreTable AS ct
CROSS JOIN
(SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 0)
AS drvMultiplier
GO
select top 10 * from CoreTable
SELECT TOP 50 * FROM AggregateMe
GO
-- So, we have our baseline data in place, now, for testing. Run twice.
SET STATISTICS IO, TIME ON
PRINT 'Create Table'
CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))
PRINT 'Insert Into'
INSERT INTO #PreBuild
SELECT
ctID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
FROM
AggregateMe AS am
GROUP BY
ctID
HAVING
SUM( AmSumUpVal) > 0
PRINT 'SELECT INTO'
SELECT
ctID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
INTO
#SelectInto
FROM
AggregateMe AS am
GROUP BY
ctID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME, IO OFF
IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL
DROP TABLE #PreBuild
IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL
DROP TABLE #SelectInto
GO
-- So, in general, no indexes and whatnot on the tables, the SELECT INTO runs at about 10-15ms faster.
-- This is insignificant in general, especially in context of this build.
-- This is only important in hyper-optimization.
-- NEXT!
-- Index discussions.
-- Note the ctNCID is at the exact same selectivity level as ctID on purpose.
-- This is a functionality optimization test, not a real-world scenario. The idea is to prove
-- the underlying theory, without having to include a heuristics discussion.
-- First, building off the clustered index in AggregateMe
SET NOCOUNT ON
PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE TABLE #PreBuild (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctID)
SET STATISTICS TIME OFF
PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
INSERT INTO #PreBuild
SELECT
ctID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
FROM
AggregateMe AS am
GROUP BY
ctID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE TABLE #PreBuild2 (ctID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))
SET STATISTICS TIME OFF
PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
INSERT INTO #PreBuild2
SELECT
ctID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
FROM
AggregateMe AS am
GROUP BY
ctID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctID)
SET STATISTICS TIME OFF
PRINT 'SELECT INTO'
SET STATISTICS TIME ON
SELECT
ctID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
INTO
#SelectInto
FROM
AggregateMe AS am
GROUP BY
ctID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctID)
SET STATISTICS TIME OFF
IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL
DROP TABLE #PreBuild
IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL
DROP TABLE #PreBuild2
IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL
DROP TABLE #SelectInto
GO
-- The reason I wrapped the ON/OFF this way is to emphasize the CREATE INDEX having
-- three unique time components involved that must be summed to have the actual full
-- time it took to build the index. In this case ~150 milliseconds, much higher than
-- the extra ~20 milliseconds that it takes to insert into the table if it's pre-built.
-- Since SELECT INTO can't have the clustered pre-built, it takes less time then the
-- prebuilt table with post-built clustered, but is still more time consuming then the
-- full prebuilt.
-- With this test, we've proven that building the clustered index on the temp table
-- before inserting data that is grouped on the clustered index from the source table
-- is faster. Let's test the ctNCID, which we'll index equivalently.
--This code is the exact same as above, just using the ctNCID column instead.
SET NOCOUNT ON
PRINT 'CREATE TABLE - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE TABLE #PreBuild (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild (ctNCID)
SET STATISTICS TIME OFF
PRINT 'INSERT INTO - CLUSTERED INDEX BEFORE INSERT, Using ctID'
SET STATISTICS TIME ON
INSERT INTO #PreBuild
SELECT
ctNCID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
FROM
AggregateMe AS am
GROUP BY
ctNCID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE TABLE - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE TABLE #PreBuild2 (ctNCID INT, SumVal Decimal(20,4), AvgVal Decimal(20,4))
SET STATISTICS TIME OFF
PRINT 'INSERT INTO - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
INSERT INTO #PreBuild2
SELECT
ctNCID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
FROM
AggregateMe AS am
GROUP BY
ctNCID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_Prebuild ON #PreBuild2 (ctNCID)
SET STATISTICS TIME OFF
PRINT 'SELECT INTO'
SET STATISTICS TIME ON
SELECT
ctNCID,
SUM( amSumUpVal) AS SumVal,
AVG( amAvgUpVal) AS AvgVal
INTO
#SelectInto
FROM
AggregateMe AS am
GROUP BY
ctNCID
HAVING
SUM( AmSumUpVal) > 0
SET STATISTICS TIME OFF
PRINT 'CREATE INDEX - CLUSTERED INDEX AFTER INSERT, Using ctID'
SET STATISTICS TIME ON
CREATE CLUSTERED INDEX cidx_SelectInto ON #SelectInto (ctNCID)
SET STATISTICS TIME OFF
IF OBJECT_ID( 'tempdb..#PreBuild') IS NOT NULL
DROP TABLE #PreBuild
IF OBJECT_ID( 'tempdb..#PreBuild2') IS NOT NULL
DROP TABLE #PreBuild2
IF OBJECT_ID( 'tempdb..#SelectInto') IS NOT NULL
DROP TABLE #SelectInto
GO
-- Skipping the gory details, the fully pre-built STILL wins.
-- Even though it takes on average about 40ms longer to insert all the data,
-- the subsequent clustered index build takes about 180ms.
-- So, no indexing, go with SELECT INTO. If you need indexing, fully pre-build it.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
June 24, 2011 at 4:47 am
Gee Craig, I don't know what to say, except wow! I got pulled into an SSIS mess yesterday afternoon so did not have time to do any testing. Even if I did I am not sure I would have produced anything that quick, nicely done. Using your script my findings were similar on a 2008R2 Standard Edition instance.
I tried adding UNIQUE to all clustered indexes as well as ORDER BY to the INSERT...SELECT to see if it added any time but changes to results were insignificant. I may try this with a wider clustering key but I expect I will adopt the default approach in your conclusion.
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
June 24, 2011 at 7:47 am
Craig Farrell (6/23/2011)
This conversation resparked a curiousity on my part.
When I run that code as a single unit, it keeps returning the following error...
There is already an object named '#PreBuild' in the database.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 24, 2011 at 8:21 am
Ever thought of the Query Analyzer?
the first option is best.
June 24, 2011 at 8:34 am
Eric M Russell (6/23/2011)
sqldba_icon (6/21/2011)
Can any one suggest based on there experience which one of these are preferred.i) select * into #temp from tableA
ii) create table #temp (col 1)
insert into #temp
select col 1 from tableA
The number of records inserted into temp table would be between 100 K - 400 K. I couldnt find much info online. Any suggestion would be appreciated. Thanks
Why do you need to select so many rows into a temporary table?
Eric not sure what do u mean by that question but that is not under my control. There could be a way to reduce the number of records but there are bigger problems like the one mentioned in this topic. thanks
June 24, 2011 at 8:37 am
SQL 2008 come with query execution plan and estimation. try testing this queries and see the estimated results.
Viewing 15 posts - 16 through 30 (of 59 total)
You must be logged in to reply to this topic. Login to reply