April 27, 2020 at 1:35 pm
With all the reading and analysis that I've been doing based on Paul Randal's Wait Stats articles, I've come to the conclusion that the issue is an IO subsystem issue... But how do I just tell customers or my bosses: "just add better disks"?
For some reason, that doesn't go over well. 🙂
April 27, 2020 at 1:38 pm
The trigger is already in place, and I can't just change the logic to insert the deleted records when we've already started inserting the inserted records.
The diff is necessary because our software can update many tables in one process... It doesn't check whether an update is actually needed before updating, so I have to do that check in the trigger to minimize the number of entries in the audit table.
I have to try to make the best of all the restrictions I have...
I'm not sure who the post above is directed to. I made no changes to your code (well, except to make the test data creation a whole lot faster) and the final update of all 1 million rows only took 8 seconds. How is that a problem?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2020 at 1:40 pm
With all the reading and analysis that I've been doing based on Paul Randal's Wait Stats articles, I've come to the conclusion that the issue is an IO subsystem issue... But how do I just tell customers or my bosses: "just add better disks"?
For some reason, that doesn't go over well. 🙂
Do what I just did. Tell them it only takes 8 seconds on a different box. How long does it take your test code to run the final update on your production box for a million rows?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2020 at 1:41 pm
That response was for Aaron... after my first response to him, I noticed that I didn't answer all of his questions.
April 27, 2020 at 1:56 pm
When the software is installed on one of the local test boxes, the update takes 9 minutes, and on my own test box it took 4 hours (my test box isn't as powerful).
My bosses questioned this because, without the trigger, it takes 30 seconds on the system that takes 9 minutes with the trigger, which they find unacceptable.
April 27, 2020 at 2:07 pm
There is no question that the trigger is adding time to the problem. Joining a million rows between two logical tables that probably spilled to disk, and then evaluating more than 200 ORed <> conditions in the WHERE clause, is going to be a problem, and that's why I asked how many columns there are.
The bottom line is that you can't replace a 30-ton dump truck with an ox cart. They're doing the comparisons in the wrong place. The trigger isn't the place for this.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2020 at 2:18 pm
I agree with you, but to keep the trigger from spilling, I would think that inserting smaller chunks at a time into the audit table might help.
The thing is, I'm not sure it's even possible to create a cursor within a trigger to do that... and if it is possible, I'm not sure it would perform any better.
April 27, 2020 at 3:51 pm
The cursor won't help. It would just spread the update over time at the expense of slower commit times for the client, which only moves the problem, and it might leave the app unresponsive for the user.
Ultimately, the work of writing these changes from table x to table y is the load. For Jeff, this is 8 seconds, but that's a single test. If it takes minutes to do those writes on a test box, it's worth understanding the resource differences between the machines. If there aren't enough IOPS available to do the writes, there aren't.
You can examine disk reads/writes per second and queue length, and get latencies from the DMVs. If the test or prod box is slow, it needs new disks. Capture some metrics from a box that works well, or maybe Jeff can post some, and then compare across machines. That can help provide the "you need better disks" rationale.
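As a starting point, something along these lines against sys.dm_io_virtual_file_stats gives per-file average latencies you can compare between boxes (just a sketch; adjust to taste):

-- Average read/write latency per database file, similar in spirit to
-- Paul Randal's published script. Run on each box and compare.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_writes,
        CASE WHEN vfs.num_of_reads  = 0 THEN 0
             ELSE vfs.io_stall_read_ms  / vfs.num_of_reads  END AS avg_read_latency_ms,
        CASE WHEN vfs.num_of_writes = 0 THEN 0
             ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
          ON mf.database_id = vfs.database_id
         AND mf.file_id     = vfs.file_id
ORDER BY avg_write_latency_ms DESC;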
April 27, 2020 at 6:23 pm
So I moved my tests to another machine and ran the Paul Randal script that measures IO latencies... My machine shows over 200 ms of IO read and write latency for tempdb, while this new environment shows 1 ms for both.
This new environment has SSD drives, but the script still takes 2 seconds without the trigger and 23 seconds with the trigger.
When analyzing the execution plan, all the time is on the "Table Insert"... The original script in Prod, for 3 million rows and 200 columns, goes from 30 seconds without the trigger to 7 minutes with the trigger, which they find unacceptable... They want me to find a way to reduce the time spent in the trigger...
So now that I have a baseline where my bottleneck isn't the disks, I will work on the prod trigger (which is more complicated than the example given here) to improve its other parts...
What else, besides memory and disks, can affect the execution plan's "Table Insert" that I could work on to improve the performance of the trigger?
April 27, 2020 at 8:23 pm
I've always felt that it was worth it to pay for a second audit tool that is external to the database. Auditing in the same database mixes data and metadata, in violation of a basic principle of data modeling, and is about as dumb as keeping a spare set of keys in your glove compartment so that if you get locked out of your car you'll always know where the second set of keys is. What happens to your audit trail when your database crashes? Gee, it goes down in flames too!
Please post DDL and follow ANSI/ISO standards when asking for help.
April 27, 2020 at 8:44 pm
I've always felt that it was worth it to pay for a second audit tool that is external to the database. Auditing in the same database mixes data and metadata, in violation of a basic principle of data modeling, and is about as dumb as keeping a spare set of keys in your glove compartment so that if you get locked out of your car you'll always know where the second set of keys is. What happens to your audit trail when your database crashes? Gee, it goes down in flames too!
Now THAT is something that I whole-heartedly agree with, Joe!
--Jeff Moden
Change is inevitable... Change for the better is not.
April 27, 2020 at 8:47 pm
Very good point. The thing is that we kept all the audit info in text files until we introduced the auditing tables. With our 30-year-old, still-evolving software, although it's doable, I don't see how we could have two databases. That would introduce complexity in reports and processes... especially now that we also offer SQL Azure as a DB option, and in Azure you can't just access a different database as easily as in an on-premises installation.
There's also a backup taken that can always be restored.
April 27, 2020 at 9:11 pm
Getting back to your original problem... I think that a large part of the pain you're going through is that recording which "fields" changed in the VARCHAR(MAX) column constitutes an "expansive update". Now, that's not so bad on heaps like you have insofar as page splits go, but it DOES mean that you're writing all the data three times and the VARCHAR(MAX) column twice... once to the log file, once to the heap, and then the update causes forwarded rows, which is another full set of writes to the heap along with pointer updates and the recording of the updated VARCHAR(MAX) column.
Actually, it might be writing some of the data more times than that, because you're also writing to the NCI, which gets written to the log and to the index, and then, when the main rows are forwarded in the heap, the heap pointers in the NCI have to be updated to match, and that's also fully logged.
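If you want to confirm that, something like this against sys.dm_db_index_physical_stats should show how many forwarded records the heap has picked up after the update (the database and table names are placeholders):

-- forwarded_record_count is NULL in LIMITED mode, so use SAMPLED or DETAILED.
SELECT  index_type_desc,
        page_count,
        forwarded_record_count,
        avg_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats
            (DB_ID(N'YourDatabase'), OBJECT_ID(N'dbo.YourTable'), NULL, NULL, 'SAMPLED');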
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2020 at 4:05 pm
Jeff: Not quite sure I understand.
In the meantime, I don't know how I overlooked the following...
The update of 3 million rows took 27 seconds without my original trigger. While doing some tests, I created my own Inserted and Deleted tables by running SELECT * INTO myInserted FROM PriceTable (and the same for MyDeleted)... each one took 3 minutes. That means I can't go lower than the 3 minutes it takes just to insert the rows (it's the bare minimum).
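For reference, the rough baseline test was along these lines (PriceTable is the real table; the scratch copies are only there for timing):

-- Time a plain copy of the table, with no trigger logic involved at all.
SET STATISTICS TIME ON;

SELECT * INTO dbo.myInserted FROM dbo.PriceTable;  -- roughly 3 minutes here
SELECT * INTO dbo.MyDeleted  FROM dbo.PriceTable;  -- roughly 3 minutes here

SET STATISTICS TIME OFF;

-- Clean up the scratch copies afterwards.
DROP TABLE dbo.myInserted;
DROP TABLE dbo.MyDeleted;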
With the trigger, it took 7 minutes as stated earlier, so I need to work on the extra 4 minutes that come from the trigger.
One of the first things my original trigger does is verify whether there are any differences between the inserted and deleted data... otherwise I exit the trigger. That check alone takes 1 minute...
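Just to show the kind of check I mean, a stripped-down sketch of that early exit (assuming all the columns are types that EXCEPT can compare) looks something like this:

-- Inside the AFTER UPDATE trigger, before building any audit rows:
IF NOT EXISTS (SELECT * FROM inserted
               EXCEPT
               SELECT * FROM deleted)
    RETURN;  -- no column values actually changed, so there is nothing to audit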
Lesson learned: just because your execution plan shows an operator with a high cost doesn't mean that's where the issue is.
I will continue troubleshooting and tweaking the trigger to get the best performance. I now realize that I can't go faster than a bare insert takes.