June 13, 2012 at 6:59 am
Hi All,
Data: 9 million records
RAM: 144 GB
What about the performance of SELECT * INTO table2 FROM table1 under the bulk-logged recovery model?
Here it is taking around 1 hour.
Does anyone have any ideas for getting it down to 1-10 minutes?
Regards,
Varun
June 13, 2012 at 7:07 am
If possible, use a WHERE clause to limit the data instead of copying all 9 million rows.
Pre-allocate the storage space required so SQL Server doesn't need to grow the files during the load.
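If you do pre-grow the files, a minimal sketch (the database and logical file names MyDB, MyDB_Data and MyDB_Log are hypothetical, and the sizes are only illustrative; size them from the space figures reported by sp_spaceused):

-- Grow the files once, up front, to avoid repeated autogrow pauses during the load.
ALTER DATABASE MyDB
    MODIFY FILE (NAME = MyDB_Data, SIZE = 120GB, FILEGROWTH = 1GB);

ALTER DATABASE MyDB
    MODIFY FILE (NAME = MyDB_Log, SIZE = 20GB, FILEGROWTH = 1GB);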
June 13, 2012 at 7:21 am
sp_spaceused information:
reserved: 60,656,288 KB
data: 32,717,008 KB
index_size: 27,924,440 KB
unused: 14,840 KB
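For reference, those columns come straight from sp_spaceused; run it with no argument for database totals, or with a table name for a single table (the table name below is just the one from the original question):

EXEC sp_spaceused;                 -- database-level totals
EXEC sp_spaceused N'dbo.table1';   -- reserved / data / index_size / unused for one table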
June 13, 2012 at 7:27 am
If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations.
I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.
I would also consider testing data compression on both tables.
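A rough sketch of that approach, assuming a hypothetical integer partitioning column (id), filegroups FG1 through FG4 already created on separate disks, and page compression; the boundary values are purely illustrative:

-- Spread the rows across four filegroups, one per disk.
CREATE PARTITION FUNCTION pf_id (int)
    AS RANGE LEFT FOR VALUES (2250000, 4500000, 6750000);

CREATE PARTITION SCHEME ps_id
    AS PARTITION pf_id TO (FG1, FG2, FG3, FG4);

-- Target table created ahead of time, partitioned and page-compressed.
CREATE TABLE dbo.table2
(
    id int NOT NULL
    -- ... remaining columns to match dbo.table1 ...
)
ON ps_id (id)
WITH (DATA_COMPRESSION = PAGE);

-- Load the data (fully logged in this form; see the TABLOCK discussion later in the thread).
INSERT INTO dbo.table2 -- (column list)
SELECT *               -- matching column list
FROM dbo.table1;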
June 13, 2012 at 7:28 am
Is that for the DB or the table?
June 13, 2012 at 7:29 am
I would look into SSIS and play with the fast load options and the batch size to get an optimal solution. Consider dropping and recreating the indexes on the target table (sketched below).
Edit: no files are involved, so forget BULK INSERT.
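The index part of that, as a sketch with a hypothetical nonclustered index name and key column:

-- Drop the index before the load so SQL Server doesn't maintain it row by row.
DROP INDEX IX_table2_SomeColumn ON dbo.table2;

-- ... run the load here (SSIS fast load, INSERT ... SELECT, etc.) ...

-- Rebuild it once, after the data is in place.
CREATE NONCLUSTERED INDEX IX_table2_SomeColumn
    ON dbo.table2 (SomeColumn);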
June 13, 2012 at 7:30 am
Daxesh Patel (6/13/2012)
If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations. I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.
I would also consider testing data compression on both tables.
As the DB is in bulk-logged mode, SELECT ... INTO is minimally logged, whereas INSERT INTO ... SELECT is fully logged, so more information will be written to the transaction log, resulting in a potentially longer execution time.
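For context, a minimal sketch of what the original statement presumably looks like, with the database already in bulk-logged recovery (the database name is hypothetical):

ALTER DATABASE MyDB SET RECOVERY BULK_LOGGED;

-- SELECT ... INTO creates dbo.table2 and, under bulk-logged (or simple) recovery,
-- its data pages are minimally logged.
SELECT *
INTO   dbo.table2
FROM   dbo.table1;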
June 13, 2012 at 7:37 am
As already mentioned, it would be better to run this in batches (a batched approach is sketched below). However, is your database used in the evening? Are there any downtime periods for the application that would limit the impact on users?
If the DB is not used in the evening, that would be an ideal time to run your script to populate the other table.
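A rough sketch of the batched copy, assuming a hypothetical ever-increasing integer key (id) on table1 and a pre-created dbo.table2; the batch size is only illustrative:

-- Copy in 500,000-row chunks so each transaction, and its log usage, stays small.
DECLARE @BatchSize int = 500000;
DECLARE @MinId int, @MaxId int;

SELECT @MinId = MIN(id), @MaxId = MAX(id) FROM dbo.table1;

WHILE @MinId <= @MaxId
BEGIN
    INSERT INTO dbo.table2 -- (column list)
    SELECT *               -- matching column list
    FROM dbo.table1
    WHERE id >= @MinId
      AND id <  @MinId + @BatchSize;

    SET @MinId = @MinId + @BatchSize;
END;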
***The first step is always the hardest *******
June 13, 2012 at 7:40 am
anthony.green (6/13/2012)
Daxesh Patel (6/13/2012)
If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations. I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.
I would also consider testing data compression on both tables.
As the DB is in bulk-logged mode, SELECT ... INTO is minimally logged, whereas INSERT INTO ... SELECT is fully logged, so more information will be written to the transaction log, resulting in a potentially longer execution time.
Agree with this, but overall performance should be tested under different scenarios. SELECT * INTO will create the new table in the default filegroup, so you cannot take advantage of partitions. With the log file on a different disk, the extra logging may cost less in overall execution time than what is gained in write performance. Again, I don't know what infrastructure is available.
If enough disks are available, this is worth trying; if not, then yes, SELECT INTO is the best option and not much can be done except improving the read side (e.g. compression).
Also, this operation can be bulk logged...
This is from BOL
Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging
You can use INSERT INTO <target_table> SELECT <columns> FROM <source_table> to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is empty or is a nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
Rows that are inserted into a heap as the result of an insert action in a MERGE statement may also be minimally logged.
Unlike the BULK INSERT statement, which holds a less restrictive Bulk Update lock, INSERT INTO…SELECT with the TABLOCK hint holds an exclusive (X) lock on the table. This means that you cannot insert rows using parallel insert operations.
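In T-SQL terms, the minimally logged form described above looks roughly like this (table names follow the original question; remember the target must be an empty table or a heap, and the database must be in simple or bulk-logged recovery):

-- TABLOCK takes an exclusive table lock but allows the insert to be minimally logged
-- when the requirements listed above are met.
INSERT INTO dbo.table2 WITH (TABLOCK)
SELECT *
FROM   dbo.table1;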
June 13, 2012 at 8:04 am
Daxesh Patel (6/13/2012)
anthony.green (6/13/2012)
Daxesh Patel (6/13/2012)
If the table holds a large amount of data, I think SELECT * INTO should be avoided due to its limitations. I would create the new table ahead of time with partitions (each partition on a different disk) and use INSERT INTO ... SELECT.
I would also consider testing data compression on both tables.
As the DB is in bulk-logged mode, SELECT ... INTO is minimally logged, whereas INSERT INTO ... SELECT is fully logged, so more information will be written to the transaction log, resulting in a potentially longer execution time.
Agree with this, but overall performance should be tested under different scenarios. SELECT * INTO will create the new table in the default filegroup, so you cannot take advantage of partitions. With the log file on a different disk, the extra logging may cost less in overall execution time than what is gained in write performance. Again, I don't know what infrastructure is available.
If enough disks are available, this is worth trying; if not, then yes, SELECT INTO is the best option and not much can be done except improving the read side (e.g. compression).
Also, this operation can be bulk logged...
This is from BOL
Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging
You can use INSERT INTO <target_table> SELECT <columns> FROM <source_table> to efficiently transfer a large number of rows from one table, such as a staging table, to another table with minimal logging. Minimal logging can improve the performance of the statement and reduce the possibility of the operation filling the available transaction log space during the transaction.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is empty or is a nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
Rows that are inserted into a heap as the result of an insert action in a MERGE statement may also be minimally logged.
Unlike the BULK INSERT statement, which holds a less restrictive Bulk Update lock, INSERT INTO…SELECT with the TABLOCK hint holds an exclusive (X) lock on the table. This means that you cannot insert rows using parallel insert operations.
This also assumes there are other benefits to the partitioning. If the OP is simply trying to speed things up even though the current operation is acceptable, partitioning may cause more problems than it solves. Also, the OP states the machine has 144 GB of RAM, but we don't know how the instance is configured to use that memory. There are many variables here that need to be considered to give a proper answer.
Jared
CE - Microsoft
June 13, 2012 at 8:07 am
All of your information is valid, but performance is not improving.
Sometimes the batches take much longer 🙂
June 13, 2012 at 8:10 am
How many processors/cores? What is your instance's MAXDOP setting? What is the instance's max memory setting?
Jared
CE - Microsoft
June 13, 2012 at 8:17 am
24 processors
Max degree of parallelism: 0
Max server memory: 2147483647
Any clue?
June 13, 2012 at 8:35 am
Yeah, first set your server's max server memory to a sensible value. For 144 GB of memory, that would likely be around 130 GB.
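A sketch of doing that, assuming roughly 130 GB (133120 MB) is the value you settle on:

-- max server memory is an advanced option and is specified in MB.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max server memory (MB)', 133120;
RECONFIGURE;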
Taking a few steps back... Why is a SELECT ... INTO of 9 million rows being run often enough that its performance is a concern? What's the reasoning and background?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability