How does the target table size impact the insert performance

  • I have a target table which has been growing by 2 million records every day for the past 30 days. This job used to complete in 15 minutes, but nowadays it's taking far too long (10 hours). I have tried reducing the commit size by inserting the data in chunks, and I can see some improvement, but I still need to optimize these writes.
    My target table has only one index (the clustered index, on an IDENTITY(1,1) primary key).
    I am using INSERT INTO ... SELECT FROM ... as it reduces the logging.
    Currently, the target table has 500 million records.
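
    At a high level, the chunked load follows this pattern - a simplified sketch with invented table, column, and key names, not the actual production code:

    DECLARE @BatchSize int    = 100000,
            @LastId    bigint = 0,
            @Rows      int    = 1;

    DECLARE @Batch TABLE (Id bigint);

    WHILE @Rows > 0
    BEGIN
        DELETE FROM @Batch;

        INSERT INTO dbo.TargetTable (SourceId, Col1, Col2)
        OUTPUT inserted.SourceId INTO @Batch (Id)
        SELECT TOP (@BatchSize) s.Id, s.Col1, s.Col2
        FROM dbo.SourceTable s
        WHERE s.Id > @LastId
        ORDER BY s.Id;                            -- batches advance in key order

        SET @Rows = @@ROWCOUNT;

        IF @Rows > 0
            SELECT @LastId = MAX(Id) FROM @Batch; -- move the watermark forward
    END;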

    Any suggestions would be greatly appreciated.

  • Is the target database having to grow during your insert times? What are your autogrow settings?

    Joie Andrew
    "Since 1982"

  • Joie Andrew - Sunday, March 3, 2019 4:25 AM

    Is the target database having to grow during your insert times? What are your autogrow settings?

    If the autogrowth is on a percentage setting, this is to be expected.
    😎
    What are the sizes of the database files?
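
    If it helps, the sizes and growth settings can be read with a query along these lines (run it in the target database):

    SELECT name,
           size * 8 / 1024 AS size_mb,            -- size is stored in 8 KB pages
           CASE WHEN is_percent_growth = 1
                THEN CONCAT(growth, ' percent')
                ELSE CONCAT(growth * 8 / 1024, ' MB')
           END AS growth_setting
    FROM sys.database_files;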

  • Joie Andrew - Sunday, March 3, 2019 4:25 AM

    Is the target database having to grow during your insert times? What are your autogrow settings?

    MDF Size: 700 gigs
    LDF Size: 70 GB
    MDF - Auto Growth - 512 MB Unlimited
    LDF - Auto Growth - 512 MB, Limited to 1900 Gigs

  • Eirikur Eiriksson - Sunday, March 3, 2019 8:01 AM

    Joie Andrew - Sunday, March 3, 2019 4:25 AM

    Is the target database having to grow during your insert times? What are your autogrow settings?

    If the autogrowth is on a percentage setting, this is to be expected.
    😎
    What are the sizes of the database files?

    MDF Size: 700 gigs
    LDF Size: 70 GB
    MDF - Auto Growth - 512 MB Unlimited
    LDF - Auto Growth - 512 MB, Limited to 1900 Gigs

  • I think there may be more to this than just a simple insert.
    Would it be possible to post the actual execution plan of that insert here? (If this is part of a proc, you would need to get it from a trace.)

    It would also help if you could post the DDL of that table and the specific code you are using to do the insert (including your "batch chunk" code).

    Regarding your autogrowth: 512 MB is possibly a bit small for a database of that size. Depending on how much it grows per week, I would change it to cover 4 to 8 weeks of growth.
    I would also make sure that IFI (instant file initialization) is enabled for the instance service account - see https://www.brentozar.com/blitz/instant-file-initialization/
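
    On recent builds (2016 SP1 and later, if memory serves) you can also check IFI straight from T-SQL, assuming you have VIEW SERVER STATE permission:

    SELECT servicename,
           instant_file_initialization_enabled    -- 'Y' means IFI is on
    FROM sys.dm_server_services;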

  • Are you sure there is only one index on the table?
    Does the SELECT part of your insert reference the target table itself (or join to any other tables)?
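
    You can verify the index count with something like this (substitute your actual table name):

    SELECT i.name, i.type_desc, i.is_primary_key
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID(N'dbo.YourTargetTable')
      AND i.type > 0;   -- type 0 is the heap entry, not a real index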

  • These may seem like random questions, but:
    1) Is the database set for Read Committed Snapshot Isolation?
    2) Are there any deletes on the table?
    3) Are there any old open connections to the database (over SPID 50)? 

    If the answer to both 1) and 3) is yes, kill the old/inactive SPIDs and see if that resolves the problem.
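
    For what it's worth, 1) and 3) can be checked with queries like these (generic names; adjust the database name):

    -- 1) Is Read Committed Snapshot Isolation on?
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = N'YourDatabase';

    -- 3) Any long-lived user sessions still connected?
    SELECT session_id, login_time, last_request_end_time, status
    FROM sys.dm_exec_sessions
    WHERE is_user_process = 1
    ORDER BY login_time;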

    Leo
    Nothing in life is ever so complicated that with a little work it can't be made more complicated.

  • @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all your questions.
    Today I tried loading the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all your questions.
    Today I tried loading the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

    Testing on non-prod is not a valid test unless the target DB is exactly the same size (or near enough) and the volume of data is the same.

    What does your DBA see when it's taking 10 hours? Any experienced DBA would hardly allow a query to run for that long without doing a full investigation, finding the root cause of the issue, and looking at alternative ways of improving it.

    So unless your query is doing an INSERT INTO ... SELECT FROM tbl WITHOUT any joins to any other table, the issue is most likely your query, and/or the way you are doing the loop, and/or other load on the server. There are a few more options, but without the query/plan it's nearly impossible to help.

    A simple loop/insert is never privileged information that can't be shared.
    As for the explain plan, you can use SentryOne Plan Explorer (a must-have for any developer/DBA, and free), which will anonymize the plan, making it suitable for sharing.

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.
    @Jonathan AC Roberts - Thanks for your response. Yes, there is only one clustered index on the table.
    @Leo.Miller - My answer is no to all your questions.
    Today I tried loading the same data in a non-prod environment, and it completed in 10 minutes. After seeing this, I suspected the PROD server's performance, but the DBA confirmed that the server's health is good.
    Digging all the way around...

    An obfuscated (preferably actual) plan would be ideal; failing that, an obfuscated query. Neither is difficult or time-consuming to prepare.
    Without one of these, everybody is guessing.

    "Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Rocky's - Wednesday, March 6, 2019 5:22 PM

    @frederico_fonseca - Thanks for your response. Unfortunately, I can't post the query here.

    You could at least give us an idea of what the insert is doing.
    Is it just inserting from another table?:
    INSERT INTO myTable1 (Col1, ...)
    SELECT Col1, ... FROM myTable2

    or is it inserting from another set of tables with joins?:
    INSERT INTO myTable1 (Col1, ...)
    SELECT Col1, ...
    FROM myTable2
    INNER JOIN myTable3 ON ...
    INNER JOIN myTable4 ON ...

    It could be the query that's selecting the data, rather than the table the data is being inserted into, that's causing the problem.
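
    One quick way to tell which side is slow (again with made-up names): run the SELECT on its own into a temp table and compare timings. If the SELECT alone takes hours, the target table is not your problem:

    SET STATISTICS TIME, IO ON;

    SELECT Col1, ...
    INTO #Staging          -- SELECT ... INTO a temp table is a cheap write target
    FROM myTable2
    INNER JOIN myTable3 ON ...;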

  • First, let's clear up the notion about reducing logging... the only way that you can reduce logging during INSERT/SELECT is if a bunch of things happen...

    1.  The database must be in either the BULK LOGGED or SIMPLE Recovery Model.  If the database is normally in the FULL Recovery Model, then DO NOT USE THE SIMPLE RECOVERY MODEL because it'll break the transaction log file chain, which will need to be restarted with either a FULL or DIFF backup after changing back to the FULL Recovery Model.
    2.  If the table has Non-Clustered Indexes on it, you might be in deep Kimchie when it comes to minimal logging even with the use of Trace Flag 610 or the other mumbo-jumbo that MS claims will allow you to do such a thing.
    3.  You CAN get minimal logging for a table with a Clustered Index but a bunch of things need to happen.
    3.1 You MUST use WITH (TABLOCK) on the INSERT statement.
    3.2 You MUST do the insert in the same order as the Clustered Index (CI).  With the CI being on an IDENTITY column, that will be easy.
    3.3 This part is undocumented but I've found it to be very true... if the Insert/Select query that you're using to populate the table has any variables in it, then you almost always have to use OPTION(RECOMPILE) so that the values of the variables are actually assigned in the query plan as run-time constants. I'll sketch the shape of such an insert right after this list.
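
    Putting 3.1 through 3.3 together, the shape of the insert would be something like this (invented names, and it assumes the BULK LOGGED or SIMPLE Recovery Model from item 1):

    DECLARE @LoadDate date = '20190301';          -- example variable (see 3.3)

    INSERT INTO dbo.TargetTable WITH (TABLOCK)    -- 3.1: TABLOCK is required
           (Col1, Col2)
    SELECT s.Col1, s.Col2
    FROM dbo.SourceTable s
    WHERE s.LoadDate = @LoadDate
    OPTION (RECOMPILE);                           -- 3.3: the variable's value becomes a run-time constant
    -- 3.2 takes care of itself here because the Clustered Index is on an IDENTITY
    -- column, so new rows always append at the logical end of the index.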

    The size of the table doesn't matter here.  If you're inserting the same amount of data at the end of the CI (thanks to the identity column), then the same amount of data should always take the same amount of time, especially if you have "Instant File Initialization" enabled.  Log file growth will cause problems, as it always has and always will, because of the VLF formatting it has to do, but you seem to have that handled because you used growth in MB rather than percent for both the MDF and the LDF.

    All that being said, none of it matters because those are just requirements.  What we need you to do is...
    1.  Post the full DDL for both the source and destination tables.  Make sure that you include the definitions for all indexes, constraints, any related indexed views (and their indexes), triggers, and the definitions of any functions that may appear in computed columns.
    2.  Post the exact code that you're using for the INSERT/SELECT.
    3.  Identify what method you use to update stats (especially on the source table) and how often you do it.
    4.  As the others have stated, attaching an "Actual" execution plan would be incredibly helpful.  The guessing on our part will only continue if you can't provide it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One more thing to add... IF you're using SET IDENTITY_INSERT tablename ON, then THAT's the problem with performance.  It's a well-known fact (and there is a posted plea with MS to fix it) that the setting causes the data to materialize in TempDB even when all the other requirements for minimal logging have been correctly implemented.  Until they fix it, there is basically nothing you can do that will help when appending rows to a larger table.
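
    In other words, if your load looks anything like this (invented names), that one setting explains the slowdown all by itself:

    SET IDENTITY_INSERT dbo.TargetTable ON;   -- forces the source rows to spool in TempDB

    INSERT INTO dbo.TargetTable WITH (TABLOCK) (Id, Col1, Col2)
    SELECT s.Id, s.Col1, s.Col2
    FROM dbo.SourceTable s;

    SET IDENTITY_INSERT dbo.TargetTable OFF;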

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
