October 20, 2018 at 3:10 pm
Hi Everyone!
This one's a bit strange so bear with me:
I'm seeing extreme insert performance degradation after an insert was interrupted in a dirty fashion (the transaction was not committed and the connection was not closed properly).
Pre-incident, insert performance was roughly 100M rows/hour using 500K-row batches (about 20 seconds per batch).
Post-incident, batch inserts of more than ~80K rows never complete (I left one running for more than a day), whereas an insert of 20K rows completes in ~95 seconds, which works out to an insert rate of only around 18M rows/hour. The machine SQL Server is installed on is under no load; as far as I can tell the server is idling.
What trips me up about this is that dropping the database, or even reinstalling the server instance, does not resolve the issue. I simply can't reproduce the performance I was seeing pre-incident. On top of that, I have observed this exact phenomenon with Azure SQL instances, except that dropping the database "solved" the issue there.
The way this issue presents itself suggests the problem is not actually with SQL Server but rather on the insert side / in the driver. However, I've tested basically all the ways to connect to an MS SQL instance from Python, with the same result.
The SQL Server 2017 instance is updated and running on a fairly powerful dedicated machine with NVMe drives located in the same datacenter as the client machine.
The inserts are executed using:
- Ubuntu 16.04
- Python 2.7
- pyodbc 4.0.23 (also tested pymssql+freetds and pypyodbc)
- Microsoft ODBC driver 17.2.0.1
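For anyone unfamiliar with the setup being described, a minimal sketch of the batched-insert pattern follows. The connection string, table, and column names are placeholders I've invented for illustration, not details from this thread; `fast_executemany` is the pyodbc option that sends parameter arrays instead of row-by-row inserts.

```python
def chunked(rows, batch_size):
    """Yield successive batches of at most batch_size rows."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


def insert_rows(rows, batch_size=20000):
    """Insert rows in batches. DSN and table names are placeholders."""
    import pyodbc  # imported here so chunked() stays importable without pyodbc
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=myserver;DATABASE=mydb;UID=user;PWD=secret"
    )
    cur = conn.cursor()
    cur.fast_executemany = True  # batch parameter arrays instead of row-by-row
    for batch in chunked(rows, batch_size):
        cur.executemany("INSERT INTO dbo.MyTable (id, val) VALUES (?, ?)", batch)
        conn.commit()  # commit per batch, so an interrupt loses at most one batch
    conn.close()
```

Committing per batch also limits how much an improperly closed connection (as described above) can leave uncommitted.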
October 21, 2018 at 7:27 am
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
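For context, IGNORE_DUP_KEY = ON tells SQL Server to discard duplicate-key rows with a warning ("Duplicate key was ignored.") instead of failing the whole statement; it's that warning path the driver has to handle. A minimal sketch of such a table definition, with hypothetical table and column names:

```python
def dup_ignoring_table_ddl(table_name="dbo.MyTable"):
    """Return CREATE TABLE DDL whose primary key silently drops duplicates.

    Table and column names are placeholders, not taken from the thread.
    """
    return (
        "CREATE TABLE " + table_name + " (\n"
        "    id  INT NOT NULL,\n"
        "    val NVARCHAR(100) NULL,\n"
        "    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (id)\n"
        "        WITH (IGNORE_DUP_KEY = ON)\n"
        ")"
    )
```

With this option off, a single duplicate key in a 500K-row batch would abort the entire INSERT, which is why it's a popular choice for bulk loads.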
October 21, 2018 at 1:21 pm
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 21, 2018 at 2:00 pm
Jeff Moden - Sunday, October 21, 2018 1:21 PM
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
Python ODBC module 🙂
October 21, 2018 at 4:25 pm
frederico_fonseca - Sunday, October 21, 2018 2:00 PM
Jeff Moden - Sunday, October 21, 2018 1:21 PM
MadOwl - Sunday, October 21, 2018 7:27 AM
Issue is somewhat resolved. There seems to be a bug in how pyodbc handles duplicate primary keys on tables with the IGNORE_DUP_KEY = ON option.
Github issue here: https://github.com/mkleehammer/pyodbc/issues/476
What is "pyodbc"? Some form of ETL application?
Python ODBC module 🙂
K. Thanks. Guess they should change the name of it to "PyroDBC" for when stuff like this catches fire. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2018 at 7:02 am
It seems the issue was ultimately caused by unixODBC version 2.3.1. Upgrading to 2.3.7 resolved the problem for me.