SELECT INTO

  • I need to copy the data from one table to another table. which method will provide the optimized performance?

    1) insert into

    or

    2) select * into table from table.

    which method is better???????

  • Change to bulk recovery mode and use "Select into". Select into is minimally logged (see http://msdn2.microsoft.com/en-us/library/ms191244.aspx).

    Regards,

    Andras


    Andras Belokosztolszki, MCPD, PhD
    GoldenGate Software

  • I would do this in two stages.

    SELECT * INTO newtable FROM oldtable WHERE 1=0

    INSERT INTO newtable

    SELECT * FROM oldtable.

    The first statement creates the table in the desired structure but with no records, the 2nd populates it.

    Things to watch out for.

    1. Identity values. INSERT INTO SELECT * won't work if there are identity values.

    2. The size of the table in terms of number of records. If you are talking about millions of records then having a number of batched inserts may be a better bet and won't blow your log file to bits.

  • I think David recommends his method because SELECT/INTO will temporarily lock SysObjects while the data is transferring...

    However, I prefer the SELECT/INTO method for two reasons...

    1. If you CAN temporarily change to the Bulk or Simple recovery modes, there's a lot less logging (other factors must also be true... see Books Online for details).

    2. It's nasty fast even with a couple of million rows... you won't have the locks on SysObjects for long. Other processes will simply wait.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • So why would SELECT INTO not lock sysobjects?

    Surely any form of creation of an object would have a small lock on sysobjects?

    The other point is, how often are records created in sysobjects in any case? Application code doesn't create tables, constraints, procs or functions so surely any locking on sysobjects is irrelevant?

  • So why do you recommend creating the table first?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The main problem SELECT INTO is that is locks sysobjects (especially in tempdb) when lots of procedures make use of temporary tables having to wait until the insert is finished. (if I read it right from previous posts)

  • Yep... know that... see my post above...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    Check the following post http://sql-server-performance.com/Community/forums/p/12553/70520.aspx

    Regards,

    Ahmed

  • Jeff Moden (12/14/2007)


    So why do you recommend creating the table first?

    Using SELECT INTO to create the initial table structure with no records is "blink of an eye" fast.

    SELECT INTO to populate the entire recordset is painfully slow compared to INSERT INTO. In addition you can batch up your INSERT INTOs if you want to.

  • SELECT INTO to populate the entire recordset is painfully slow compared to INSERT INTO

    Not real sure where you're coming up with that... I agree that SELECT/INTO will hold a lock on the sysObjects table longer than the Create/Insert/Select... but, if you run the following code a couple of times, you'll see that SELECT/INTO is consistently about 40% faster than INSERT/SELECT and it doesn't matter what the recovery mode is...

    SET NOCOUNT ON

    --=============================================================================

    -- Million row test for SELECT/INTO

    --=============================================================================

    IF OBJECT_ID('dbo.jbmTest','U') IS NOT NULL

    DROP TABLE dbo.jbmTest

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    --===== Create and populate the Test table on the fly

    SELECT TOP 1000000

    sc1.ID AS ID1, sc2.ID AS ID2

    INTO dbo.jbmTest

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    SELECT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration SELECT/INTO'

    GO

    --=============================================================================

    -- Million row test for INSERT/SELECT

    --=============================================================================

    IF OBJECT_ID('dbo.jbmTest','U') IS NOT NULL

    DROP TABLE dbo.jbmTest

    DECLARE @StartTime DATETIME

    SET @StartTime = GETDATE()

    --===== Create the empty table

    SELECT sc1.ID AS ID1, sc2.ID AS ID2

    INTO dbo.jbmTest

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    WHERE 1 = 0

    --===== Populate the table using INSERT/SELECT

    INSERT INTO dbo.jbmTest

    SELECT TOP 1000000

    sc1.ID AS ID1, sc2.ID AS ID2

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    SELECT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration INSERT/SELECT'

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Indeed, the select into is faster in the example

    Select into: avg 1-2 seconds

    Table 'Worktable'. Scan count 201, logical reads 6574, physical reads 0, read-ahead reads 0.

    Table 'jbmTest'. Scan count 0, logical reads 1, physical reads 0, read-ahead reads 0.

    Table 'syscolumns'. Scan count 2, logical reads 45, physical reads 2, read-ahead reads 19.

    Insert into: 3 seconds

    Table 'Worktable'. Scan count 201, logical reads 6574, physical reads 0, read-ahead reads 0.

    Table 'jbmTest2'. Scan count 0, logical reads 1000001, physical reads 0, read-ahead reads 0.

    Table 'syscolumns'. Scan count 2, logical reads 45, physical reads 1, read-ahead reads 17.

    The select into has the advantage that no checking is required on doubles/constraints because there can't be none.

    That might explain why insert into requires 1000001 logical reads. (even though there are no constraints/indexes specified?)

  • Jeff Moden (12/14/2007)


    I think David recommends his method because SELECT/INTO will temporarily lock SysObjects while the data is transferring...

    However, I prefer the SELECT/INTO method for two reasons...

    1. If you CAN temporarily change to the Bulk or Simple recovery modes, there's a lot less logging (other factors must also be true... see Books Online for details).

    2. It's nasty fast even with a couple of million rows... you won't have the locks on SysObjects for long. Other processes will simply wait.

    You ain't kidding about fast. I inherited a stored procedure that ran like a lame dog when dealing with a million or so records - took about ten hours to process. :sick:

    I tracked it down to an Insert Into. I switched it to a Select...Into and the time dropped to 15 minutes for the entire procedure. 😎 Needless to say, the folks I work for were VERY happy about it.

    If there's any other way to wring additional speed out of such an operation I'd LOVE to hear about it! 😀

  • 10 hours to 15 minutes? Heh... I like SELECT/INTO a lot but I've not seen that type of performance gain just by switching like that before... Are you sure you're not overlooking something else that you did?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/18/2007)


    10 hours to 15 minutes? Heh... I like SELECT/INTO a lot but I've not seen that type of performance gain just by switching like that before... Are you sure you're not overlooking something else that you did?

    You know, now that you mention it I also change a Delete From to a Truncate Table. I'd forgotten that until I just reviewed the code to answer your question. Ah, the power of minimally logged operations!

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply