SELECT INTO

Question

SELECT INTO

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 1

Heh... thanks, Donald... I just knew there had to be something else... maybe even some DELETE triggers somewhere in there that no longer fire due to the Truncate...

Anyway, thanks for the feedback!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

But still - I'm kinda seeing what donaldW was getting at (i.e. a substantial increase in performance, much higher than the 40% you mentioned). I ran three tests earlier, and it was along the lines of 4x faster @1M, 5x faster @25M, and almost 7x faster on 100M. (just by modifiying the TOP predicate on your test query.)

I didn't spend enough time trying to figure out why it would be so much faster, but it sounds like it might be worth finding out. I would have figured some kind of extra overhead, but that seems a lot.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 3

Thanks for the feedback, Matt...

I suppose the reason why I only got 40% improvement out of SELECT/INTO is because I'm running an "ancient" 5 year old P5 running at only 1.8 GHZ and my disk system is just an IDE... nothing fancy like SCSI or anything like that.

Anyway... glad to see someone else say that "SELECT/INTO" is NOT a dog... Like I said, it's nasty fast even if you have the FULL recovery mode in play.

The only thing you sometimes have to worry about is if you expect an IDENTITY column to keep the insert order. Turns out that the identities are sometimes formed "before" the order by has a chance and, especially if dates are involved, identities may not be faithful to the ORDER BY of the SELECT/INTO. In those cases, you may have to create the table first.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DonaldW SSCarpal Tunnel Points: 4525 More actions · Answer 4

Jeff said: "The only thing you sometimes have to worry about is if you expect an IDENTITY column to keep the insert order. Turns out that the identities are sometimes formed "before" the order by has a chance and, especially if dates are involved, identities may not be faithful to the ORDER BY of the SELECT/INTO. In those cases, you may have to create the table first."

That's a good point, Jeff. In my case the data was being held for comparison purposes and we didn't have an Identity field. For the kind of situation you reference I wonder if the IDENTITY() function would serve. Say, if you were moving the data from a production table to an archive table and creating it on the fly.

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 5

Sorry... I didn't say it quite correctly... if the table is preformed with an IDENTITY column, you'll be ok... it's actually when you use the SELECT/INTO with the IDENTITY function that you can sometimes run into problems.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DonaldW SSCarpal Tunnel Points: 4525 More actions · Answer 6

Jeff Moden (12/19/2007)
Sorry... I didn't say it quite correctly... if the table is preformed with an IDENTITY column, you'll be ok... it's actually when you use the SELECT/INTO with the IDENTITY function that you can sometimes run into problems.

I'll have to keep that in mind. If you absolutely had to have a specific order to the records would it work to Select the records into an intermediate temp table, applying the Order By, then Select from that table into the final table with the Identity function? Theoretically the read from the intermediate table still might not be in the desired order but practically have you ever seen it not happen? To be honest, I've not looked at output with that in mind before so I can't say for sure.

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 7

You mean like doing the SELECT/INTO from a "derived" table with a "Top 100 Percent"... dunno... haven't tried that, but I will 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

DonaldW SSCarpal Tunnel Points: 4525 More actions · Answer 8

Jeff Moden (12/19/2007)
You mean like doing the SELECT/INTO from a "derived" table with a "Top 100 Percent"... dunno... haven't tried that, but I will 😉

Yes. What I described was addit the data to a temp table but I had thought of a derived table such as you describe. That would be even better, since there's no physical manifestation. I'll try it, too.

Jeff Moden SSC Guru Points: 1004339 More actions · Answer 9

Ok... here's the code that proves what the problem is... it's not my code, but I forget the name of the good fellow that wrote it for me as a proof... I've highlighted the code that causes the out of order identities (also notice that if you comment out the getdate() from both ORDER BY statements the sequences match)...

--Proof that SELECT INTO with an order by may not produce IDENTITIES in the same order

--as if the table were first created...

drop table #insert

drop table #selectinto

--pre-create table with identity

create table #insert

(rowid int identity(1,1)

,xtype tinyint,

ID int,

colID smallint,

refdate datetime,

primary key clustered (xtype,ID,colid))

--Select Into with identity(int,1,1)

select rowid = identity(int, 1,1)

,sc.xtype

,sc.id

,sc.colid

,getdate() as refDate

into#selectinto

fromnorthwind..syscolumns sc

order by sc.xtype

,sc.id

,sc.colid

,getdate()

--same statement as select into but as Insert into pre-created table with identity column

insert #insert (xtype,id,colid,refdate)

selectsc.xtype

,sc.id

,sc.colid

,getdate()

fromnorthwind..syscolumns sc

order by sc.xtype

,sc.id

,sc.colid

,getdate()

PRINT 'Show Sequence differences'

select i.rowid, si.*

from#insert i

join#selectinto si

on si.xtype = i.xtype

and si.id = i.id

and si.colid = i.colid

wherei.rowid <> si.rowid

Of course, I think the real problem is cause by the fact that there's no need to order by GETDATE() at any time. However, it does show that you could have a problem if something indeterminant shows up in the ORDER BY.

To fix it, ya gotta do something like this (probably safer to always do it like this)... again, I've highlighted the code I changed...

--Proof that SELECT INTO with an order by may not produce IDENTITIES in the same order

--as if the table were first created...

drop table #insert

drop table #selectinto

--pre-cretae table with identity

create table #insert

(rowid int identity(1,1)

,xtype tinyint,

ID int,

colID smallint,

refdate datetime,

primary key clustered (xtype,ID,colid))

--Select Into with identity(int,1,1)

select rowid = identity(int, 1,1),d.*

into#selectinto

FROM (

SELECT TOP 100 PERCENT

sc.xtype

,sc.id

,sc.colid

,getdate() as refDate

fromnorthwind..syscolumns sc

order by sc.xtype

,sc.id

,sc.colid

,getdate()

)d

--same statement as select into but as Insert into pre-created table with identity column

insert #insert (xtype,id,colid,refdate)

selectsc.xtype

,sc.id

,sc.colid

,getdate()

fromnorthwind..syscolumns sc

order by sc.xtype

,sc.id

,sc.colid

,getdate()

PRINT 'Show Sequence differences'

select i.rowid, si.*

from#insert i

join#selectinto si

on si.xtype = i.xtype

and si.id = i.id

and si.colid = i.colid

wherei.rowid <> si.rowid

I still think that ordering by GETDATE() is an insane notion, but like I said, it does show that an error is possible if you use anything (perhaps a UDF?) that is indeterminant in the Order By.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)