SQL 2017 Batch insert performance degradation

  • Hi Everyone!

    This one's a bit strange so bear with me:

    I'm seeing extreme insert performance degradation after an insert was interrupted in a dirty fashion (the transaction was not committed and the connection was not closed properly).

    Pre incident, insert performance was roughly 100m rows/hour with 500k batches (about 20 seconds/batch).

    Post incident, batch inserts with more than ~80k rows will never complete (left it running for more than a day) whereas an insert with 20k rows completes in ~95 seconds but that results in an insert performance of only around 18m rows/hour. The machine the SQL server is installed on does not experience any load. As far as I can tell the server is idling.

    What trips me up about this is that dropping the database or even reinstalling the server instance does not resolve the issue. I simply can't reproduce the performance I was seeing pre incident. On top of that, I had observed this exact phenomenon with Azure SQL instances, except that dropping the database "solved" the issue there.

    The way this issue presents itself indicates that the problem is not actually with the SQL server but rather on the insert side / the driver. However I've tested basically all possible ways to connect to a MS SQL instance from python with the same result.

    The SQL Server 2017 instance is updated and running on a fairly powerful dedicated machine with NVMe drives located in the same datacenter as the client machine.

    The inserts are executed using:

    - ubuntu 16.04
    - Python 2.7
    - pyodbc 4.0.23 (also tested pymssql+freetds and pypyodbc)
    - Microsoft ODBC driver 17.2.0.1

  • Issue is somewhat resolved. There seems to be bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
    Github issue here: https://github.com/mkleehammer/pyodbc/issues/476

  • MadOwl - Sunday, October 21, 2018 7:27 AM

    Issue is somewhat resolved. There seems to be bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
    Github issue here: https://github.com/mkleehammer/pyodbc/issues/476

    What is "pyodbc"?  Some form of ETL application?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Sunday, October 21, 2018 1:21 PM

    MadOwl - Sunday, October 21, 2018 7:27 AM

    Issue is somewhat resolved. There seems to be bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
    Github issue here: https://github.com/mkleehammer/pyodbc/issues/476

    What is "pyodbc"?  Some form of ETL application?

    Python ODBC module 🙂

  • frederico_fonseca - Sunday, October 21, 2018 2:00 PM

    Jeff Moden - Sunday, October 21, 2018 1:21 PM

    MadOwl - Sunday, October 21, 2018 7:27 AM

    Issue is somewhat resolved. There seems to be bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
    Github issue here: https://github.com/mkleehammer/pyodbc/issues/476

    What is "pyodbc"?  Some form of ETL application?

    Python ODBC module 🙂

    K.  Thanks.  Guess they should change the name of it to "PyroDBC" for when stuff like this catches fire. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It seems the issue ended up being unixODBC version 2.3.1. Upgrading to 2.3.7 resolved the problem for me.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply