March 28, 2024 at 3:57 pm
Hello people, I am an SQL beginner and I have a problem with the upsert operation.
I am not sure how relevant this is, but to give some context: I am trying to write batches of messages from Azure Event Hubs (EH) to MS SQL Server using Azure Functions written in Python.
I send about 500,000 messages to the Event Hub and then start the Function to write them from there to a table named orders on SQL Server.
The problem is that this operation takes far too long (or at least I think so): about an hour and a half to write all the records to the orders table.
I don't think the problem is with the configuration of the Function itself, because I am using a very similar Function to do the same upsert into a Postgres database, and for the same number of messages it takes about 10 minutes. For Postgres I used the ON CONFLICT DO UPDATE syntax.
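For reference, the Postgres statement is roughly of this shape (simplified here; it assumes order_id is the primary key on the Postgres side as well, and the %(...)s placeholders are filled in by the Python driver):

-- simplified sketch of the Postgres upsert; assumes order_id is also the primary key there
INSERT INTO orders (order_id, item_id, user_id, quantity, customer)
VALUES (%(order_id)s, %(item_id)s, %(user_id)s, %(quantity)s, %(customer)s)
ON CONFLICT (order_id) DO UPDATE
SET item_id = EXCLUDED.item_id,
    user_id = EXCLUDED.user_id,
    quantity = EXCLUDED.quantity,
    customer = EXCLUDED.customer;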
Also, when I looked a little more closely at the logs, I noticed this message coming up often: "Transaction (Process ID) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
Of course, the transaction is rerun by the Function, and eventually all messages get written to the database, but I think the large number of deadlocks is the reason the performance is so bad.
At first I used a MERGE statement for the SQL Server upsert, and it performed badly, so after reading around I switched to an UPDATE/INSERT combination, which was supposed to perform better, but so far it doesn't; the two are pretty similar.
I came across some very well-written articles about this subject and about the best patterns for an upsert in SQL Server. For example, I read that @@ROWCOUNT can be used as a condition to skip the insert or the update part of the statement and speed it up that way, but I cannot use that because I am writing records to the table in batches rather than one row at a time.
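If I understood those articles correctly, the single-row pattern they describe looks roughly like this (a sketch of the pattern, not my actual code; @order_id and the other variables stand for the values of one message):

-- single-row upsert: try the UPDATE first, INSERT only if nothing was updated
BEGIN TRANSACTION;

UPDATE orders WITH (UPDLOCK, SERIALIZABLE)
SET item_id = @item_id,
    user_id = @user_id,
    quantity = @quantity,
    customer = @customer
WHERE order_id = @order_id;

IF @@ROWCOUNT = 0
BEGIN
    INSERT INTO orders (order_id, item_id, user_id, quantity, customer)
    VALUES (@order_id, @item_id, @user_id, @quantity, @customer);
END

COMMIT TRANSACTION;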
This is some general info:
My orders table looks like this:
column_name data_type
order_id bigint
item_id int
user_id bigint
quantity int
customer varchar
order_id is primary key.
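Written out as DDL it is essentially this (the varchar length here is just illustrative):

CREATE TABLE orders (
    order_id  BIGINT       NOT NULL PRIMARY KEY,  -- clustered primary key
    item_id   INT          NULL,
    user_id   BIGINT       NULL,
    quantity  INT          NULL,
    customer  VARCHAR(100) NULL                   -- length shown is illustrative
);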
This is how my UPDATE/INSERT statement looks:
UPDATE orders WITH (UPDLOCK, SERIALIZABLE)
SET orders.item_id = src.item_id,
    orders.user_id = src.user_id,
    orders.quantity = src.quantity,
    orders.customer = src.customer
FROM orders
INNER JOIN (
    SELECT * FROM (VALUES (%(order_id)s, %(item_id)s, %(user_id)s, %(quantity)s, %(customer)s))
        AS src(order_id, item_id, user_id, quantity, customer)
) AS src ON orders.order_id = src.order_id;

INSERT INTO orders (order_id, item_id, user_id, quantity, customer)
SELECT * FROM (VALUES (%(order_id)s, %(item_id)s, %(user_id)s, %(quantity)s, %(customer)s))
    AS src(order_id, item_id, user_id, quantity, customer)
WHERE NOT EXISTS (
    SELECT 1 FROM orders
    WHERE orders.order_id = src.order_id
);
I would like to know whether there is any way to improve this statement, or whether there is some alternative to it: for example, a pattern for MERGE that I missed, or something else entirely?
I am sorry if you see this question twice; I am posting again because for some reason it was removed the first time, probably because I don't have enough points.
Thanks much for any help!
March 29, 2024 at 4:10 pm
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
March 29, 2024 at 4:28 pm
It's difficult to know much based on the explanation. The SQL code provided isn't valid SQL Server syntax as-is (afaik); it appears to be an embedded string whose parameters are replaced by the Python driver prior to execution. Azure Functions are similar to API endpoints, no? What's the full data flow? Generally, JSON data gets sent to an endpoint, there are some data conversions, and then one or more SQL statements are executed. There are ways to do that without Python, without data conversions, and without multiple executions. Basically: send the JSON to SQL Server as a stored procedure parameter.
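To sketch what I mean (this assumes SQL Server 2016 or later for OPENJSON and CREATE OR ALTER; the procedure name, schema, JSON paths, and varchar length are just illustrative, matching the table described above):

CREATE OR ALTER PROCEDURE dbo.usp_upsert_orders
    @json NVARCHAR(MAX)   -- the whole batch as a JSON array
AS
BEGIN
    SET NOCOUNT ON;

    -- shred the JSON array into a relational source set once per batch
    SELECT order_id, item_id, user_id, quantity, customer
    INTO #src
    FROM OPENJSON(@json)
    WITH (
        order_id  BIGINT       '$.order_id',
        item_id   INT          '$.item_id',
        user_id   BIGINT       '$.user_id',
        quantity  INT          '$.quantity',
        customer  VARCHAR(100) '$.customer'   -- length is illustrative
    );

    BEGIN TRAN;

    -- update existing rows, then insert the missing ones, in one short transaction
    UPDATE o
    SET o.item_id  = s.item_id,
        o.user_id  = s.user_id,
        o.quantity = s.quantity,
        o.customer = s.customer
    FROM dbo.orders AS o WITH (UPDLOCK, SERIALIZABLE)
    INNER JOIN #src AS s
        ON o.order_id = s.order_id;

    INSERT INTO dbo.orders (order_id, item_id, user_id, quantity, customer)
    SELECT s.order_id, s.item_id, s.user_id, s.quantity, s.customer
    FROM #src AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.orders AS o WHERE o.order_id = s.order_id);

    COMMIT;
END;

The Function then serializes each batch to a JSON array and makes a single EXEC call per batch, instead of one round trip per message, which also keeps each upsert in one short transaction.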
No one shall be able to drive us from the paradise that Cantor has created for us.