October 20, 2018 at 3:10 pm
Hi Everyone!
This one's a bit strange so bear with me:
I'm seeing extreme insert performance degradation after an insert was interrupted in a dirty fashion (the transaction was not committed and the connection was not closed properly).
Pre-incident, insert performance was roughly 100M rows/hour using 500K-row batches (about 20 seconds per batch).
Post-incident, batch inserts of more than ~80K rows never complete (I left one running for more than a day), whereas an insert of 20K rows completes in ~95 seconds, which works out to an insert rate of only around 18M rows/hour. The machine SQL Server is installed on is under no load; as far as I can tell the server is idling.
What trips me up about this is that dropping the database, or even reinstalling the server instance, does not resolve the issue. I simply can't reproduce the performance I was seeing pre-incident. On top of that, I have observed this exact phenomenon with Azure SQL instances, except that dropping the database "solved" the issue there.
The way this issue presents itself suggests the problem is not actually with SQL Server but rather on the insert side / in the driver. However, I've tested basically all the ways to connect to an MS SQL instance from Python, with the same result.
The SQL Server 2017 instance is updated and running on a fairly powerful dedicated machine with NVMe drives located in the same datacenter as the client machine.
The inserts are executed using:
- Ubuntu 16.04
- Python 2.7
- pyodbc 4.0.23 (also tested pymssql+freetds and pypyodbc)
- Microsoft ODBC driver 17.2.0.1
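For anyone unfamiliar with the setup being described, a minimal sketch of the batched-insert pattern follows. The connection string, table, and column names are placeholders I've invented for illustration, not details from this thread; `fast_executemany` is the pyodbc option that sends parameter arrays instead of row-by-row inserts.

```python
def chunked(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


def insert_rows(rows, batch_size=20000):
    """Insert rows in batches. DSN and table names are placeholders."""
    import pyodbc  # imported here so chunked() stays importable without pyodbc
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret"
    )
    cur = conn.cursor()
    cur.fast_executemany = True  # batch parameter arrays instead of row-by-row
    for batch in chunked(rows, batch_size):
        cur.executemany("INSERT INTO dbo.MyTable (id, val) VALUES (?, ?)", batch)
        conn.commit()  # commit per batch, so an interrupt loses at most one batch
    conn.close()
```

Committing per batch also limits how much an improperly closed connection (as described above) can leave uncommitted.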
October 21, 2018 at 7:27 am
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
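For context, IGNORE_DUP_KEY = ON tells SQL Server to discard duplicate-key rows with a warning ("Duplicate key was ignored.") instead of failing the whole statement; it's that warning path the driver has to handle. A minimal sketch of such a table definition, with hypothetical table and column names:

```python
def dup_ignoring_table_ddl(table_name="dbo.MyTable"):
    """Return CREATE TABLE DDL whose primary key silently drops duplicates.

    Table and column names are placeholders, not taken from the thread.
    """
    return (
        "CREATE TABLE " + table_name + " (\n"
        "    id  INT NOT NULL,\n"
        "    val NVARCHAR(100) NULL,\n"
        "    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (id)\n"
        "        WITH (IGNORE_DUP_KEY = ON)\n"
        ")"
    )
```

With this option off, a single duplicate key in a 500K-row batch would abort the entire INSERT, which is why it's a popular choice for bulk loads.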
October 21, 2018 at 1:21 pm
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2018 at 2:00 pm
Jeff Moden - Sunday, October 21, 2018 1:21 PM
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
Python ODBC module 🙂
October 21, 2018 at 4:25 pm
frederico_fonseca - Sunday, October 21, 2018 2:00 PM
Jeff Moden - Sunday, October 21, 2018 1:21 PM
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
Python ODBC module 🙂
K. Thanks. Guess they should change the name of it to "PyroDBC" for when stuff like this catches fire. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2018 at 7:02 am
It seems the issue was ultimately caused by unixODBC version 2.3.1. Upgrading to 2.3.7 resolved the problem for me.