SELECT * INTO table2 FROM table1 - Performance Impact - What do you think?

  • Hi All,

    Data: 9 million records

    RAM: 144 GB

    What about the performance of SELECT * INTO table2 FROM table1 under the bulk-logged recovery model?

    Here it is taking around 1 hour.

    Does anyone have any idea how to get this down to 1-10 minutes?

    Regards,

    Varun

  • If possible, use a WHERE clause to limit the data instead of copying all 9 million rows.

    Pre-allocate the storage space required so SQL Server doesn't need to grow the files during the load. A sketch follows below.
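
    A minimal sketch of the pre-allocation idea, assuming hypothetical database and logical file names (MyDb, MyDb_Data, MyDb_Log); size the files to comfortably hold the copy before running it:

        -- Grow the data and log files up front so autogrow doesn't interrupt the load.
        -- Database name, logical file names, and sizes are placeholders.
        ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, SIZE = 120GB);
        ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log,  SIZE = 20GB);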

  • What is the total size of these 9 million rows?



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL

  • sp_spaceused output (values in KB):

    reserved: 60,656,288
    data: 32,717,008
    index_size: 27,924,440
    unused: 14,840
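
    (That output comes from a call along these lines; the table name is a placeholder:)

        -- Reports reserved, data, index, and unused space for one table, in KB.
        EXEC sp_spaceused N'dbo.table1';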

  • If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations.

    I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT, along the lines of the sketch below.

    I would also consider testing data compression on both tables.
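
    A rough sketch of that approach, assuming an INT id column; the filegroup, table, column, and boundary values are all placeholders:

        -- Spread the target across filegroups (one per disk) and page-compress it.
        CREATE PARTITION FUNCTION pf_id (INT)
            AS RANGE RIGHT FOR VALUES (3000000, 6000000);

        CREATE PARTITION SCHEME ps_id
            AS PARTITION pf_id TO (FG1, FG2, FG3);

        CREATE TABLE dbo.table2
        (
            id      INT          NOT NULL,
            payload VARCHAR(100) NULL
        )
        ON ps_id (id)
        WITH (DATA_COMPRESSION = PAGE);

        INSERT INTO dbo.table2 (id, payload)
        SELECT id, payload
        FROM dbo.table1;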

  • Is that for the DB or the table?

  • I would look into SSIS and play with the fast-load options and the batch size to get an optimal solution. Consider dropping the indexes on the target table and recreating them after the load; a sketch follows below.

    Edit: no files are involved, so forget BULK INSERT.
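
    A minimal sketch of the drop-and-recreate idea; index, table, and column names are placeholders:

        -- Load into the table without the index, then rebuild it afterwards.
        DROP INDEX ix_table2_id ON dbo.table2;

        -- ... run the bulk load here ...

        CREATE NONCLUSTERED INDEX ix_table2_id
            ON dbo.table2 (id);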

  • Daxesh Patel (6/13/2012)


    If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations.

    I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.

    I would also consider testing data compression on both tables.

    As the DB is in BULK_LOGGED mode, SELECT ... INTO is minimally logged, whereas INSERT INTO ... SELECT is fully logged, so it will write more to the transaction log, resulting in a potentially longer execution time.
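
    (A quick way to confirm the recovery model; the database name is a placeholder:)

        -- Shows SIMPLE, FULL, or BULK_LOGGED for the database in question.
        SELECT name, recovery_model_desc
        FROM sys.databases
        WHERE name = N'YourDatabase';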

  • As already mentioned, it would be better to run this in batches (see the sketch below). However, is your database used in the evening? Are there any downtime periods for the application that would limit the impact on users?

    If the DB is not used in the evening, that would be an ideal time to run your script to populate the other table.
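
    A hypothetical batching sketch, keyed on an ascending id column; table, column, and batch-size values are placeholders:

        -- Copy rows in 500k chunks so each transaction stays small and restartable.
        DECLARE @batch INT = 500000,
                @maxId INT = 0;

        -- Resume from whatever has already been copied.
        SELECT @maxId = ISNULL(MAX(id), 0) FROM dbo.table2;

        WHILE EXISTS (SELECT 1 FROM dbo.table1 WHERE id > @maxId)
        BEGIN
            INSERT INTO dbo.table2 (id, payload)
            SELECT TOP (@batch) id, payload
            FROM dbo.table1
            WHERE id > @maxId
            ORDER BY id;

            -- Advance the watermark to the highest id copied so far.
            SELECT @maxId = MAX(id) FROM dbo.table2;
        END;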

    ***The first step is always the hardest *******

  • anthony.green (6/13/2012)


    Daxesh Patel (6/13/2012)


    [quoted text snipped]

    As the DB is in BULK_LOGGED mode, SELECT ... INTO is minimally logged, whereas INSERT INTO ... SELECT is fully logged, so it will write more to the transaction log, resulting in a potentially longer execution time.

    Agreed, but overall performance should be tested under different scenarios. SELECT * INTO creates the new table in the default filegroup, so you cannot take advantage of partitions. With the log file on a different disk, the extra logging may cost less overall than the gain in write performance. Again, I don't know what infrastructure is available.

    If enough disks are available, this is worth trying; if not, then yes, SELECT ... INTO is the best option and not much can be done beyond improving the read side (e.g. compression).

    Also, this operation can be minimally logged...

    This is from BOL

    Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging

    You can use INSERT INTO <target_table> SELECT <columns> FROM <source_table> to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.

    Minimal logging for this statement has the following requirements:

    - The recovery model of the database is set to simple or bulk-logged.
    - The target table is empty or is a nonempty heap.
    - The target table is not used in replication.
    - The TABLOCK hint is specified for the target table.

    Rows that are inserted into a heap as the result of an insert action in a MERGE statement may also be minimally logged.

    Unlike the BULK INSERT statement, which holds a less restrictive Bulk Update lock, INSERT INTO…SELECT with the TABLOCK hint holds an exclusive (X) lock on the table. This means that you cannot insert rows using parallel insert operations.
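
    A minimal sketch of that pattern, assuming dbo.table2 was pre-created as an empty heap (all names are placeholders):

        -- TABLOCK makes the insert eligible for minimal logging under
        -- simple or bulk-logged recovery, per the requirements above.
        INSERT INTO dbo.table2 WITH (TABLOCK)
        SELECT *
        FROM dbo.table1;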

  • Daxesh Patel (6/13/2012)


    [nested quotes and BOL excerpt snipped - see the previous posts]

    This is also assuming that there are other benefits to the partitioning. If the OP is simply trying to speed things up even though the current operation is acceptable, partitioning may cause more problems than it solves. Also, the OP states the machine has 144 GB of RAM, but we don't know how the instance is configured to use that memory. There are many variables here that need to be considered to give a proper answer.

    Jared
    CE - Microsoft

  • All of your information is valid, but performance is not improving.

    Sometimes batches take a much longer time 🙂

  • How many processors/cores? What is your instance's maxdop setting? What is the instance's max memory setting?

    Jared
    CE - Microsoft

  • 24 processors

    Max degree of parallelism: 0

    Max server memory: 2147483647 MB (the default)

    Any clue?

    Yeah, first set your server's max server memory to a sensible value. For 144GB of memory, that would likely be around 130GB (see the sp_configure sketch below).
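
    A minimal sketch, assuming ~130 GB is the right cap for this box (the setting is in MB):

        -- Cap SQL Server's memory so the OS keeps some headroom.
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 133120;  -- 130 GB
        RECONFIGURE;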

    Taking a few steps back... Why is a SELECT ... INTO of 9 million rows being run often enough that its performance is a concern? What's the reasoning and background?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
