Performance issue (e.g. top 60000 takes 1000x longer than top 50000 on insert)

  • Hi all

    Short version:

    USE StagingDBtemp -- new empty database with the same design as StagingDB, created for no reason other than troubleshooting this

    SET IDENTITY_INSERT Incoming ON;

    INSERT INTO Incoming (ID, DataField1, DataField2)
    SELECT TOP 60000 ID, DataField1, DataField2
    FROM StagingDB..Incoming; -- note: ID is an auto-incrementing INT and the PK, and there are no other indexes on the table

    SET IDENTITY_INSERT Incoming OFF;

    The query above takes very long. With TOP 50000 it runs in a split second, but as soon as I make it TOP 60000 it takes ±20 minutes. Running only the SELECT is fine (it returns data in under 1 second even without a TOP clause, with the current 88k rows in the Incoming table).

    Longer version / More background:

    I'm busy with an integration project for one of my clients (running SQL 2012 SP4, Enterprise edition).

    The basic design: a web service API writes incoming data into an Incoming table. Data generally doesn't live in the Incoming table for very long, but there is some volume flowing through it. Each row either gets marked as not for our use and archived (i.e. inserted into a separate archive table and then deleted), or processed (this involves some transformation steps and ultimately an insert of the transformed data into a separate table) and then archived. All of these processes happen in stored procs. Once a month there is a peak period where a few million rows go in and out of the Incoming table over the course of a few days, typically in 5-6 batches of between 5k and 1M rows each.
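
    To illustrate the archive step (this is not the actual proc code, and the table/column names below are made up), it is essentially a single set-based statement like:

    -- Hypothetical sketch: archive rows flagged as not-for-use in one atomic statement,
    -- avoiding a separate INSERT + DELETE pair and any race between them.
    -- Assumes the ID column on the archive table is not an IDENTITY column.
    DELETE FROM Incoming
    OUTPUT deleted.ID, deleted.DataField1, deleted.DataField2
    INTO IncomingArchive (ID, DataField1, DataField2)
    WHERE MarkedForArchive = 1; -- assumed flag column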

    The same design is followed for another very similar client and has been running in production for several months without any performance issues. We went live with the new client a few days ago, but performance is terrible. I've been investigating and experiencing some extremely weird behaviour.

    On Sunday evening I ran the entire process (with ±10 million rows in the Incoming table; this was larger because it was the first batch, and we typically won't have this kind of volume in one batch) and it took extremely long (±6 hours). At the other client we seldom have data sets this big, but I know from experience that ±2 million rows typically runs in under 30 minutes. So I thought Sunday night's batch was slow, but since it was more data and unlikely to repeat, I wasn't too worried. However, it's getting worse.

    For instance, this evening we're trying to process 88k rows in the Incoming table, but after the process had been running for 2 hours I stopped it to troubleshoot. I had a look at the most expensive recent queries and identified a few troublesome ones. One of these is an UPDATE...FROM statement with two JOINs to other tables (both very small, <1000 rows) that takes very long to execute with the current 88k rows in the Incoming table (>10 minutes). When I replace the UPDATE portion with just a SELECT, the SELECT executes in ±3 seconds.
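
    The exact statement isn't important here, but the shape of it is something like this (placeholder table/column names, not the real schema):

    -- Hypothetical shape of the slow statement: UPDATE...FROM with two small joined tables
    UPDATE i
    SET    i.DataField2 = m.MappedValue
    FROM   Incoming i
    JOIN   LookupA a ON a.Code = i.DataField1
    JOIN   MappingB m ON m.LookupID = a.ID;

    -- The equivalent SELECT, which executes in ±3 seconds:
    SELECT i.ID, m.MappedValue
    FROM   Incoming i
    JOIN   LookupA a ON a.Code = i.DataField1
    JOIN   MappingB m ON m.LookupID = a.ID;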

    I then noticed similar behaviour on inserts. In pure desperation I decided to see if it would help to move all the data to a new DB (one with no history but the same table design, created through a create script with all the objects), and I noticed the same quirk on the insert into it, as illustrated by the code sample above. I'm attaching the actual query plans for the TOP 50000 and TOP 60000 queries (the TOP 50000 one executes in <1 sec, the TOP 60000 took almost 20 minutes).

    Once the data is in the new DB, the entire process on the 88k rows still runs very slowly (I started it 20 minutes ago and it's still running; I would have expected it to take <1 minute based on my experience at the other client).

    I'm busy downloading a copy of the DB to try and further troubleshoot locally, but in the meantime any pointers would be much appreciated.

  • Attachments seem to have not worked... Trying again with both in a zip file.

    Further update: I took a backup of the database, restored it on a different server (2019 Dev edition), and ran the entire process there in 56 seconds. On the client's server it has now been running for more than an hour and a half and is still going...

    I can't believe I only checked this now, but I just noticed that the client's server memory completely maxes out while the process runs, and the server has only 4GB of memory (the dev server I tried the experiment on has 32GB), so I suspect this may be why it's going wrong.
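
    A quick way to confirm this while the process is running is to check workspace memory grants (sys.dm_exec_query_memory_grants is a standard DMV, available on 2012):

    -- Queries currently holding or waiting for a workspace memory grant.
    -- granted_memory_kb is NULL for queries still queued waiting for their grant.
    SELECT session_id,
           requested_memory_kb,
           granted_memory_kb,
           wait_time_ms
    FROM   sys.dm_exec_query_memory_grants
    ORDER BY wait_time_ms DESC;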

    I'll follow up with the client in the morning and see if they can allocate a bit more RAM.

    Any comments on possible other causes / contributing factors would be highly appreciated. I don't exactly want to insist the client upgrade their server only to find that it doesn't help.

  • If you look at the message on the INSERT it says:

    "The query had to wait 1164 seconds for MemoryGrant during execution"

    That's nearly 20 minutes.

    https://techcommunity.microsoft.com/t5/sql-server/understanding-sql-server-memory-grant/ba-p/383595

    Try adding this to the end of the INSERT query:

    OPTION (MAXDOP 1)
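
    Applied to the INSERT from your first post, that would look something like this (a serial plan requests a much smaller memory grant than a parallel one):

    SET IDENTITY_INSERT Incoming ON;

    INSERT INTO Incoming (ID, DataField1, DataField2)
    SELECT TOP 60000 ID, DataField1, DataField2
    FROM StagingDB..Incoming
    OPTION (MAXDOP 1); -- force a serial plan, shrinking the requested memory grant

    SET IDENTITY_INSERT Incoming OFF;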
  • Jonathan AC Roberts wrote:

    If you look at the message on the INSERT it says:

    "The query had to wait 1164 seconds for MemoryGrant during execution"

    That's nearly 20 minutes.

    https://techcommunity.microsoft.com/t5/sql-server/understanding-sql-server-memory-grant/ba-p/383595

    Try adding this to the end of the INSERT query:

    OPTION (MAXDOP 1)

     

    Thanks, I missed that message. Clearly a memory issue then! I will try the MAXDOP suggestion as well. For now it has completed (in 4 hours 49 minutes) while I got some sleep, so hopefully I can get them to increase the memory before the next run.

  • I would add an ORDER BY Id clause to your SELECT, to help ensure consistency between the two versions of the query.
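
    For example (without an ORDER BY, TOP is nondeterministic, so the 50000-row and 60000-row runs aren't guaranteed to read the same rows):

    INSERT INTO Incoming (ID, DataField1, DataField2)
    SELECT TOP 60000 ID, DataField1, DataField2
    FROM StagingDB..Incoming
    ORDER BY ID; -- deterministic row set, in clustered PK order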

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • There are actually a couple of compounding problems with the code... and it's not because of the code... it's because of the stupid stuff MS has done.

    First, there's a problem with SET IDENTITY_INSERT ON... it forces a full sort in tempdb even when a sort isn't needed.

    Second, tempdb now behaves like the old Trace Flag 1117 by default, so all files grow to the same size. That makes the SET IDENTITY_INSERT ON problem above even worse.

    All of that takes on the form of "memory" and, as you've probably noticed, especially with the small amount of RAM you're talking about, you're probably banging the hell out of a swap file.

    {EDIT}  The latter issue above is probably not a problem on 2012, but the forced sort will still be.
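
    If you want to see how tempdb is actually configured on that box, something like this shows the file count, sizes and growth settings (sys.master_files reports size in 8KB pages):

    -- tempdb data and log files with current size and growth settings
    SELECT name,
           type_desc,
           size * 8 / 1024 AS size_mb,
           CASE WHEN is_percent_growth = 1
                THEN CAST(growth AS varchar(10)) + ' %'
                ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
           END AS growth_setting
    FROM   sys.master_files
    WHERE  database_id = DB_ID('tempdb');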

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
