December 19, 2007 at 6:48 am
Heh... thanks, Donald... I just knew there had to be something else... maybe even some DELETE triggers somewhere in there that no longer fire due to the Truncate...
Anyway, thanks for the feedback!
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2007 at 11:45 am
But still - I'm kinda seeing what donaldW was getting at (i.e. a substantial increase in performance, much higher than the 40% you mentioned). I ran three tests earlier, and it was along the lines of 4x faster @1M, 5x faster @25M, and almost 7x faster on 100M. (just by modifiying the TOP predicate on your test query.)
I didn't spend enough time trying to figure out why it would be so much faster, but it sounds like it might be worth finding out. I would have figured some kind of extra overhead, but that seems a lot.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 19, 2007 at 12:02 pm
Thanks for the feedback, Matt...
I suppose the reason why I only got 40% improvement out of SELECT/INTO is because I'm running an "ancient" 5 year old P5 running at only 1.8 GHZ and my disk system is just an IDE... nothing fancy like SCSI or anything like that.
Anyway... glad to see someone else say that "SELECT/INTO" is NOT a dog... Like I said, it's nasty fast even if you have the FULL recovery mode in play.
The only thing you sometimes have to worry about is if you expect an IDENTITY column to keep the insert order. Turns out that the identities are sometimes formed "before" the order by has a chance and, especially if dates are involved, identities may not be faithful to the ORDER BY of the SELECT/INTO. In those cases, you may have to create the table first.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2007 at 12:21 pm
Jeff said: "The only thing you sometimes have to worry about is if you expect an IDENTITY column to keep the insert order. Turns out that the identities are sometimes formed "before" the order by has a chance and, especially if dates are involved, identities may not be faithful to the ORDER BY of the SELECT/INTO. In those cases, you may have to create the table first."
That's a good point, Jeff. In my case the data was being held for comparison purposes and we didn't have an Identity field. For the kind of situation you reference I wonder if the IDENTITY() function would serve. Say, if you were moving the data from a production table to an archive table and creating it on the fly.
December 19, 2007 at 1:00 pm
Sorry... I didn't say it quite correctly... if the table is preformed with an IDENTITY column, you'll be ok... it's actually when you use the SELECT/INTO with the IDENTITY function that you can sometimes run into problems.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2007 at 1:16 pm
Jeff Moden (12/19/2007)
Sorry... I didn't say it quite correctly... if the table is preformed with an IDENTITY column, you'll be ok... it's actually when you use the SELECT/INTO with the IDENTITY function that you can sometimes run into problems.
I'll have to keep that in mind. If you absolutely had to have a specific order to the records would it work to Select the records into an intermediate temp table, applying the Order By, then Select from that table into the final table with the Identity function? Theoretically the read from the intermediate table still might not be in the desired order but practically have you ever seen it not happen? To be honest, I've not looked at output with that in mind before so I can't say for sure.
December 19, 2007 at 1:44 pm
You mean like doing the SELECT/INTO from a "derived" table with a "Top 100 Percent"... dunno... haven't tried that, but I will 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
December 19, 2007 at 1:49 pm
Jeff Moden (12/19/2007)
You mean like doing the SELECT/INTO from a "derived" table with a "Top 100 Percent"... dunno... haven't tried that, but I will 😉
Yes. What I described was addit the data to a temp table but I had thought of a derived table such as you describe. That would be even better, since there's no physical manifestation. I'll try it, too.
December 20, 2007 at 11:24 am
Ok... here's the code that proves what the problem is... it's not my code, but I forget the name of the good fellow that wrote it for me as a proof... I've highlighted the code that causes the out of order identities (also notice that if you comment out the getdate() from both ORDER BY statements the sequences match)...
--Proof that SELECT INTO with an order by may not produce IDENTITIES in the same order
--as if the table were first created...
drop table #insert
drop table #selectinto
--pre-create table with identity
create table #insert
(rowid int identity(1,1)
,xtype tinyint,
ID int,
colID smallint,
refdate datetime,
primary key clustered (xtype,ID,colid))
--Select Into with identity(int,1,1)
select rowid = identity(int, 1,1)
,sc.xtype
,sc.id
,sc.colid
,getdate() as refDate
into#selectinto
fromnorthwind..syscolumns sc
order by sc.xtype
,sc.id
,sc.colid
,getdate()
--same statement as select into but as Insert into pre-created table with identity column
insert #insert (xtype,id,colid,refdate)
selectsc.xtype
,sc.id
,sc.colid
,getdate()
fromnorthwind..syscolumns sc
order by sc.xtype
,sc.id
,sc.colid
,getdate()
PRINT 'Show Sequence differences'
select i.rowid, si.*
from#insert i
join#selectinto si
on si.xtype = i.xtype
and si.id = i.id
and si.colid = i.colid
wherei.rowid <> si.rowid
Of course, I think the real problem is cause by the fact that there's no need to order by GETDATE() at any time. However, it does show that you could have a problem if something indeterminant shows up in the ORDER BY.
To fix it, ya gotta do something like this (probably safer to always do it like this)... again, I've highlighted the code I changed...
--Proof that SELECT INTO with an order by may not produce IDENTITIES in the same order
--as if the table were first created...
drop table #insert
drop table #selectinto
--pre-cretae table with identity
create table #insert
(rowid int identity(1,1)
,xtype tinyint,
ID int,
colID smallint,
refdate datetime,
primary key clustered (xtype,ID,colid))
--Select Into with identity(int,1,1)
select rowid = identity(int, 1,1),d.*
into#selectinto
FROM (
SELECT TOP 100 PERCENT
sc.xtype
,sc.id
,sc.colid
,getdate() as refDate
fromnorthwind..syscolumns sc
order by sc.xtype
,sc.id
,sc.colid
,getdate()
)d
--same statement as select into but as Insert into pre-created table with identity column
insert #insert (xtype,id,colid,refdate)
selectsc.xtype
,sc.id
,sc.colid
,getdate()
fromnorthwind..syscolumns sc
order by sc.xtype
,sc.id
,sc.colid
,getdate()
PRINT 'Show Sequence differences'
select i.rowid, si.*
from#insert i
join#selectinto si
on si.xtype = i.xtype
and si.id = i.id
and si.colid = i.colid
wherei.rowid <> si.rowid
I still think that ordering by GETDATE() is an insane notion, but like I said, it does show that an error is possible if you use anything (perhaps a UDF?) that is indeterminant in the Order By.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply