How can I improve an INSERT INTO ... SELECT query?

  • Hi everyone,

    I have an SSIS package that reads data from XML files and loads it into temporary tables. After some changes are made to this new data within the temporary tables, it is moved to the main permanent tables in my database. To move the data to the permanent tables, I am using an "INSERT INTO ... SELECT" query, and I delete the data from the temporary tables once it has been transferred.

    I have 20 XML files, and I am noticing a huge difference in this query's execution time as each new file is loaded. I assume the rows are accumulating in the permanent tables and hence it takes longer to insert new data each time. I have created indexes and analyzed the execution plans. When the first file was loaded, the "INSERT INTO ... SELECT" query took just 2 minutes; by the time the SSIS package is loading the last few files, it takes up to 40 minutes.

    I can provide more information to help you understand my case if anything is missing from the above, and if required I can provide the SQL queries I am executing.

    My question is basically: how can I improve this timing? An index scan is used when selecting the data for the insert, because the query has to carry all the rows over to the permanent tables; I am not using a WHERE clause or any other filter.

    The number of rows in each file is up to 100,000,0.

  • Are you inserting from the temporary tables to the same 'permanent' table in each case? Or are there different targets?

    100,000,0 is not a number – please clarify.

    I'd suggest doing the initial inserts into permanent 'staging' tables, rather than temporary tables. TRUNCATE the tables before doing the initial inserts and call dedicated stored procedures to move the data from the staging tables to the permanent table.
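
    Something like this for the reset step (the staging table name here is just illustrative):

    -- TRUNCATE deallocates whole data pages rather than logging each row
    -- delete, so it clears a table far faster than DELETE does
    TRUNCATE TABLE dbo.StagingTable;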

    As you are inserting all data, indexes will slow things down rather than speed them up.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin wrote:

    Are you inserting from the temporary tables to the same 'permanent' table in each case? Or are there different targets?

    Yes, the same table.

    For example, I have a table named "Individual", and I have created a temporary table with exactly the same columns, named "Individual_TEMP".

    First, all the new rows go into the TEMP table; there I change the things I want to, and then I move the data to Individual using:

    "Insert into individual <column names>

    select <column names>

    from Individual_TEMP"

    And then I run "DELETE FROM Individual_TEMP" to clear the TEMP table before loading the next file.

    I'd suggest doing the initial inserts into permanent 'staging' tables, rather than temporary tables. TRUNCATE the tables before doing the initial inserts and call dedicated stored procedures to move the data from the staging tables to the permanent table.

    These are staging tables; I was just using the term 'temporary' because data is moved through them temporarily.

    Can you please tell me which dedicated stored procedures you are talking about?

    As you are inserting all data, indexes will slow things down rather than speed them up.

    If I drop all the indexes from the permanent tables and recreate them after all the files' data has been loaded, will that help?

  • Referring to staging tables as temporary tables is confusing!

    For each of the staging tables, create a stored procedure. This stored proc should take care of 'changing the things you want' and inserting to the target table.
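
    A skeleton for one of these procedures might look something like this (the procedure name and column names are only placeholders for your own; the table names follow your earlier post):

    CREATE PROCEDURE dbo.Load_Individual
    AS
    BEGIN
        SET NOCOUNT ON;

        -- 'Change the things you want' on the staged rows first, e.g.
        -- UPDATE dbo.Individual_TEMP SET SomeColumn = ... ;

        -- Then move the transformed rows to the permanent table
        INSERT INTO dbo.Individual (Col1, Col2)
        SELECT Col1, Col2
        FROM dbo.Individual_TEMP;
    END;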

    The pattern I would use would be as follows:

    1. TRUNCATE all staging tables involved in the job. (TRUNCATE is significantly faster than DELETE.)
    2. Import from XML files to staging tables. Importing several files in parallel is likely to give you a performance lift, though testing is required to be sure.
    3. Call your stored procedures in sequence.

    You should test disabling indexes before inserting and re-enabling them afterwards; I would expect this to improve performance. If you do disable indexes, you may also find that running step (3) in parallel gives you further gains.
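
    Something along these lines for the index handling (the index name is illustrative; note that disabling a clustered index makes the table inaccessible, so only disable the nonclustered ones):

    -- Disable the nonclustered index so the insert does not have to maintain it
    ALTER INDEX IX_Individual_Surname ON dbo.Individual DISABLE;

    -- ... run the INSERT INTO ... SELECT from the staging table here ...

    -- A disabled index must be rebuilt before it can be used again
    ALTER INDEX IX_Individual_Surname ON dbo.Individual REBUILD;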


  • OK, I will try all the things you mentioned and report back.

    One more question, though: when we assign a primary key to a column, a clustered index is created on that column by default, but I know that in future I will be using a different column or columns to select and update my data.

    So is it normal to drop the primary key's clustered index and create your own clustered index on the columns you want?

  • When you create a PK, you have the option of making it clustered or nonclustered.

    Without seeing some DDL, I cannot advise on the suitability of creating a clustered PK on multiple columns.


  • Jobs90312 wrote:

    One more question, though: when we assign a primary key to a column, a clustered index is created on that column by default, but I know that in future I will be using a different column or columns to select and update my data.

    So is it normal to drop the primary key's clustered index and create your own clustered index on the columns you want?

    No, it's not normal to drop the PK. What is normal is to define the PK as nonclustered in the first place, as per the example below.

    CREATE TABLE xx (f1 int NOT NULL, f2 int);  -- the PK column must be NOT NULL

    ALTER TABLE xx ADD CONSTRAINT pk_xx PRIMARY KEY NONCLUSTERED (f1);

    CREATE CLUSTERED INDEX CI_xx ON xx (f2);
